20.2. Semi-automated conversion from 4.x to 5.x

The DocBook project has provided an XSL stylesheet that converts a DocBook 4.3 .xml file to DocBook 5.0. It comes from There is a local copy in:


However, this stylesheet removes all entities, and we use entities extremely heavily here to give single points of definition for names, URLs, and other textual units. So the conversion takes three passes: first convert every entity reference to something that doesn't look like an entity, then run the output of that step through the db4-upgrade.xsl stylesheet, then convert the disguised entity references back to real entity references.
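The disguise-and-restore idea can be sketched in a few lines of Python. This is only an illustration of the technique, not the enthide and entshow scripts themselves: the function names, the regular expression used for entity names, and the choice to leave the five predefined XML entities alone are all assumptions made for this example.

```python
import re

# Assumption: the five predefined XML entities are left untouched,
# because an XML parser resolves them without any DOCTYPE.
PREDEFINED = frozenset({"amp", "lt", "gt", "quot", "apos"})

# Assumption: a simplified pattern for entity names (letters, digits,
# underscore, period, hyphen). Character references such as &#160;
# do not match and are therefore left alone.
_NAME = r"[A-Za-z_][\w.\-]*"
_REF = re.compile(r"&(" + _NAME + r");")
_HIDDEN = re.compile(r"\[\[\[(" + _NAME + r")\]\]\]")


def hide_entities(text):
    """Rewrite each &Entity; as [[[Entity]]] (hypothetical helper)."""
    def disguise(match):
        name = match.group(1)
        if name in PREDEFINED:
            return match.group(0)      # keep &amp;, &lt;, etc. as they are
        return "[[[" + name + "]]]"
    return _REF.sub(disguise, text)


def show_entities(text):
    """Rewrite each [[[Entity]]] back to &Entity; (hypothetical helper)."""
    return _HIDDEN.sub(r"&\1;", text)
```

Applying show_entities to the result of hide_entities gives back the original text, provided the document contains no literal text of the form [[[name]]] to begin with. A sketch like this would also disguise references inside the DOCTYPE's own entity declarations, which does no harm here because the declarations are restored from a saved copy later in the procedure.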

  1. You will need these two Python scripts.

  2. Add two namespace declarations to the root <article> tag, so that it now looks like this:

    <article xmlns="" version="5.0"

  3. Preserve all your entity definitions. Using a text editor, make a copy of the DOCTYPE of your old .xml file. For example, if your old file is called spec43.xml, save its DOCTYPE as spec43.type.

  4. Use the enthide script to change each entity reference &Entity; to the form [[[Entity]]]. To continue the example:

    enthide <spec43.xml >spec5.a

  5. Use the converter to convert that file to 5.0. The disguised entity references will not be affected. Continuing the example (assuming bash shell syntax):

    xsltproc db4-upgrade.xsl spec5.a >spec5.b 2>errors

    You may want to examine the errors file to see if the stylesheet complained about anything.

  6. In file spec5.b, edit the top of the file to look like this, using the ENTITY declarations from spec43.type:

    <!DOCTYPE book
        <!ENTITY ...>

  7. Replace the disguised entity references with real ones:

    entshow <spec5.b >spec5.xml

    The spec5.xml file is now in DocBook 5.0 format with all entities preserved.
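One way to gain confidence that all entities really were preserved is to compare the entity references in the old and new files. The following Python sketch is an assumption-laden illustration, not part of the procedure above: the function names and the simplified entity-name pattern are hypothetical. It reports any entity whose reference count changed, any disguised reference left unrestored, and any reference that has no declaration in the saved DOCTYPE.

```python
import re
from collections import Counter

# Assumption: predefined XML entities need no declaration, so skip them.
PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Assumption: simplified patterns for entity names; parameter entities
# (declared with %) are deliberately not matched by ENTITY_DECL.
ENTITY_REF = re.compile(r"&([A-Za-z_][\w.\-]*);")
ENTITY_DECL = re.compile(r"<!ENTITY\s+([A-Za-z_][\w.\-]*)\s")
LEFTOVER = re.compile(r"\[\[\[[A-Za-z_][\w.\-]*\]\]\]")


def entity_refs(text):
    """Count references to each non-predefined general entity."""
    names = ENTITY_REF.findall(text)
    return Counter(n for n in names if n not in PREDEFINED)


def check_conversion(old_text, new_text, doctype_text):
    """Return a list of human-readable problems; empty means none found."""
    old_refs = entity_refs(old_text)
    new_refs = entity_refs(new_text)
    problems = []
    # Every entity should be referenced as often after as before.
    for name in sorted(set(old_refs) | set(new_refs)):
        if old_refs[name] != new_refs[name]:
            problems.append("%s: %d reference(s) before, %d after"
                            % (name, old_refs[name], new_refs[name]))
    # No disguised reference should survive the final step.
    for marker in sorted(set(LEFTOVER.findall(new_text))):
        problems.append("%s: disguised reference was not restored" % marker)
    # Every reference should have a declaration in the saved DOCTYPE.
    declared = set(ENTITY_DECL.findall(doctype_text))
    for name in sorted(set(new_refs) - declared):
        problems.append("%s: referenced but not declared" % name)
    return problems

# Example call, using the file names from the steps above:
#   problems = check_conversion(open("spec43.xml").read(),
#                               open("spec5.xml").read(),
#                               open("spec43.type").read())
```

A count that differs is not necessarily an error, since the stylesheet may legitimately drop or restructure content, but each reported line is worth checking against the errors file from step 5.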