
15. Breaking your document into multiple files

For larger documents, it is often convenient to break the document into more than one file, so you can work on a specific chapter or section by itself.

This is easy to do with another type of entity declaration, an external entity, that you can put inside the square brackets of your <!DOCTYPE> declaration. Here's the general form:

    <!ENTITY new-name SYSTEM 'filename'>

This defines a new entity named new-name, which you refer to in your document as “&new-name;”. Wherever this entity reference appears, the contents of the file filename are inserted at that point.

Here's an example. Suppose you want to break your document up into four files: a top-level file named mydoc.xml and three subsidiary files named head.xml, body.xml, and tergum.xml. The entity name does not have to match the file name; here, the entity named tail refers to tergum.xml. File mydoc.xml might look like this:

    <!DOCTYPE article PUBLIC '-//OASIS//DTD DocBook XML V4.3//EN'
      'http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd'
      [ <!ENTITY head SYSTEM 'head.xml'>
        <!ENTITY body SYSTEM 'body.xml'>
        <!ENTITY tail SYSTEM 'tergum.xml'>
      ]
    >
    <article>
      <articleinfo>
        …  <!-- Usual article info content here -->
      </articleinfo>
      &head;
      &body;
      &tail;
    </article>
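
For illustration, a subsidiary file such as head.xml might contain nothing but the markup to be inserted. This is only a sketch: the section id, title, and text are hypothetical placeholders, not taken from a real document. Note that the file has no <!DOCTYPE> declaration of its own, and that every element opened in the file is also closed in it.

    <!-- head.xml: hypothetical content, for illustration only -->
    <section id='head-section'>
      <title>Head</title>
      <para>
        Text of the first section goes here.
      </para>
    </section>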

There is one drawback to this method. If you use special character entities such as &deg; (the degree symbol, °) in the subsidiary files, your HTML and PDF output files will still build normally. However, some editing tools (such as emacs nxml-mode) will no longer validate these character entities: they see only the current file, and a subsidiary file has no <!DOCTYPE> declaration to tell them where the entities are defined.

The workaround is to replace each such entity with its numeric form, which gives the character's Unicode value in hexadecimal. For example, the degree symbol entity “&deg;” can also be expressed as “&#x00b0;”. For a complete list of all special character entities in both name and numeric form, see Section 18, “Special characters”.
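
As a small sketch of this workaround, a paragraph in one of the subsidiary files might use the numeric form in place of the named entity. The sentence itself is a hypothetical example. Because the numeric form needs no entity declaration, an editor that sees only this file can still check it.

    <!-- Hypothetical paragraph in body.xml, for illustration only -->
    <para>
      Store the samples at 4&#x00b0;C.
    </para>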