Abstract

Describes the Document Type Definition (DTD) notation used to specify the schema of an SGML or XML document type.

This publication is available in Web form and also as a PDF document. Please forward any comments to tcc-doc@nmt.edu.

Table of Contents

1. What is a DTD?
1.1. Definitions
2. Where does a DTD live?
2.1. Linking an XML file to an external DTD
2.2. Including the DTD inside your XML file
3. Types of DTD declarations
4. Element declarations
4.1. Declaring empty elements
4.2. Elements with text content only
4.3. Elements with mixed content
5. Attribute declarations
5.1. Tokenized attributes
5.2. Enumerated attributes
6. Declaring and using entities
6.1. General entities
6.2. Character entities
6.3. Parameter entities
6.4. Binary (non-parsed) entities
7. Notation declarations

1. What is a DTD?

The purpose of a Document Type Definition or DTD is to define the structure of a document encoded in XML (Extensible Markup Language).

For introductory material about XML, see the XML help page.

It is possible to build and use files containing XML tags without ever defining which tags are legal. However, if you want to ensure that files conform to a known structure, writing a DTD is the preferred method.

Two definitions:

  • A well-formed file is one that obeys the general XML rules for tags: tags must be properly nested, opening and closing tags must be balanced, and empty tags must end with '/>'.

  • A valid file is not only well-formed; it also conforms to a DTD that specifies which tags it may use, what attributes those tags can contain, and which tags can occur inside which other tags, among other properties.

The advantage of a valid file is that its contents are more predictable for applications that want to process or present that file. The DTD ensures that only certain tags can be used in certain places.
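
The well-formedness rules above can be checked mechanically. The following is a minimal sketch in Python, assuming the standard library's xml.etree.ElementTree parser; the helper name is_well_formed and the sample strings are invented for illustration. This parser checks well-formedness only and does not validate a file against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Properly nested, balanced tags, and an empty tag ending in "/>".
good = "<memo><to>Staff</to><urgent/></memo>"

# Improperly nested: <a> is closed before the <b> inside it.
bad_nesting = "<a><b></a></b>"

# Unbalanced: <to> is opened but never closed.
bad_balance = "<memo><to>Staff</memo>"

for sample in (good, bad_nesting, bad_balance):
    print(sample, "->", is_well_formed(sample))
```

Only the first sample passes. Checking validity as well would require a validating parser and a DTD, which this sketch does not attempt.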

1.1. Definitions

We need to review some terminology before proceeding:

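  • A proper XML name must start with a letter or underbar (_). The remaining characters may be letters, underbars, digits, hyphens (-), or periods (.). Colons are also permitted, but they are reserved for use with XML namespaces.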

  • A tag is one of the XML constructs used to mark up documents. All tags start with a less-than symbol (<) and end with a greater-than symbol (>).

  • An element is a section of an XML document that acts as a unit. It may be either an empty element or an element with content.

  • An empty element consists of a single tag of the form

        <gi.../>

    where gi is the tag type (or “generic identifier”), and the tag may include attributes. Note the slash before the closing “>”; this marks the element as empty.

  • An opening tag begins a section of an XML document that ends with the corresponding closing tag. An opening tag has this form:

        <gi...>

    where gi is the tag type (or “generic identifier”), and the tag may include attributes. A closing tag has the form:

        </gi>

  • The content is everything between the opening tag and its corresponding closing tag. The content may be other elements, plain text, or a mixture of both.
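
The terms defined above can be made concrete with a short Python sketch. It assumes the standard library's re and xml.etree.ElementTree modules; the pattern NAME_RE, the helper is_simple_name, and the sample document are hypothetical and exist only for illustration. The name pattern is an ASCII-only approximation: it also accepts periods, which XML permits, and it ignores colons and non-ASCII letters.

```python
import re
import xml.etree.ElementTree as ET

# ASCII-only approximation of the XML name rule (an assumption of this
# sketch): a letter or underbar first, then letters, digits, underbars,
# hyphens, or periods.
NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")


def is_simple_name(s):
    """Return True if s matches the simplified name pattern above."""
    return NAME_RE.fullmatch(s) is not None


# A sample document: one element with content, holding a text-only
# element and an empty element that carries an attribute.
doc = '<note priority="high"><to>Staff</to><attachment file="a.txt"/></note>'
root = ET.fromstring(doc)

print(root.tag)      # the generic identifier of the outermost element
print(root.attrib)   # attributes given in its opening tag

to = root.find("to")
print(to.text)       # text content between <to> and </to>

att = root.find("attachment")
print(att.text, len(att))  # an empty element has no text and no children

for candidate in ("memo", "_x-1", "1abc", "-abc"):
    print(candidate, is_simple_name(candidate))
```

Here note is an element with content, to contains only text, and attachment is an empty element; the last two candidate names are rejected because they start with a digit and a hyphen.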