Introduction to DTD

DTD stands for Document Type Definition.

A DTD is a description of the content for a family of XML files. It is part of the XML 1.0 specification. It lets you describe the rules governing a document's structure and content, and verify that a given document instance conforms to them.

The purpose of a DTD is to define the legal building blocks of an XML document. It defines the document structure with a list of legal elements and attributes. A DTD can be declared inline in your XML document, or as an external reference.

This is useful because without a DTD we can only check the syntax of an XML document, that is, whether it is “well formed”. With the help of a DTD we can make sure it is both “well formed” and “valid”.

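Well formed, in relation to XML, means that the markup follows the syntax rules of the XML specification: for example, there is exactly one root element, every start tag has a matching end tag, elements are properly nested, and attribute values are quoted. A document that breaks any of these rules cannot be parsed.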

Note: An XML Parser is software that reads XML documents and interprets or “parses” the code according to the XML standard. A parser is needed to perform actions on XML. For example, a parser would be needed to compare an XML document to a DTD.
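As a minimal sketch of a well-formedness check, the snippet below uses the non-validating parser in Python's standard library (xml.etree.ElementTree). The helper name is_well_formed and the two sample strings are assumptions made for illustration. This parser only checks syntax and never compares the document to a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample documents, used only for this illustration.
good = "<note><to>Anne</to><from>Larry</from></note>"
bad = "<note><to>Anne</from></note>"  # <to> is closed by </from>

print(is_well_formed(good))
print(is_well_formed(bad))
```

The first document parses; the second is rejected because its tags do not match.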

Valid XML Documents vs Well formed XML documents

Validation is the process of checking a document against a DTD (more generally against a set of construction rules).

The simplest way I found to explain this is through a list:

  • XML documents that adhere to the XML standard are considered well formed
  • XML documents that adhere to a DTD are considered valid
  • All valid XML documents are well formed
  • A well formed XML document is valid only if it passes validation against a DTD

There are two types of DTD declarations: internal and external.

Internal DTD

<?xml version="1.0"?>
<!DOCTYPE note [
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
<note>
<to>Anne</to>
<from>Larry</from>
<heading>Reminder</heading>
<body>Buy milk!</body>
</note>

The DTD above is interpreted like this:

!DOCTYPE note defines that the root element of this document is "note"
!ELEMENT note defines that the "note" element must contain exactly four child elements, in this order: "to", "from", "heading", "body"
!ELEMENT to defines the "to" element to be of type "#PCDATA" (parsed character data, that is, text)
!ELEMENT from defines the "from" element to be of type "#PCDATA"
!ELEMENT heading defines the "heading" element to be of type "#PCDATA"
!ELEMENT body defines the "body" element to be of type "#PCDATA"
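To make the difference between well formed and valid concrete, here is a rough sketch that hand-codes the rules of the note DTD in Python. It is not a DTD validator: a validating parser would read the declarations itself, while this sketch hard-codes them and checks only the element names, their order, and that the children contain no nested elements. The names NOTE_CHILDREN and matches_note_model and the sample strings are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Child order required by <!ELEMENT note (to,from,heading,body)>
NOTE_CHILDREN = ["to", "from", "heading", "body"]


def matches_note_model(xml_text: str) -> bool:
    """Hand-coded check of the note rules; not a general DTD validator."""
    root = ET.fromstring(xml_text)  # raises ParseError if not well formed
    if root.tag != "note":
        return False
    if [child.tag for child in root] != NOTE_CHILDREN:
        return False
    # (#PCDATA) allows text only, so the children may not contain elements.
    return all(len(child) == 0 for child in root)


# Hypothetical sample documents; both are well formed.
in_order = ("<note><to>Anne</to><from>Larry</from>"
            "<heading>Reminder</heading><body>Buy milk!</body></note>")
swapped = ("<note><from>Larry</from><to>Anne</to>"
           "<heading>Reminder</heading><body>Buy milk!</body></note>")

print(matches_note_model(in_order))
print(matches_note_model(swapped))
```

Both sample documents are well formed, but only the first follows the order the DTD requires, so only the first would be valid.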

External DTD

This is an XML document with the same structure, using an external DTD:

<?xml version="1.0"?>
<!DOCTYPE note SYSTEM "note.dtd">
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>

And here is the file "note.dtd", which contains the DTD:

<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
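As a small sketch of how a program can tell the two declaration types apart, the snippet below reads the DOCTYPE with Python's standard xml.dom.minidom module. The function name describe_doctype and the shortened sample documents are assumptions for illustration. minidom does not validate and does not load the external file; it only reports what the DOCTYPE declares.

```python
from xml.dom import minidom


def describe_doctype(xml_text: str) -> str:
    """Report whether a document declares an internal or external DTD."""
    doctype = minidom.parseString(xml_text).doctype
    if doctype is None:
        return "no DOCTYPE"
    if doctype.systemId:
        # A SYSTEM identifier points at an external DTD file.
        return f"external DTD for <{doctype.name}>: {doctype.systemId}"
    return f"internal DTD for <{doctype.name}>"


# Hypothetical, shortened versions of the two kinds of document.
internal = "<!DOCTYPE note [<!ELEMENT note (#PCDATA)>]><note>Buy milk!</note>"
external = '<!DOCTYPE note SYSTEM "note.dtd"><note>Buy milk!</note>'

print(describe_doctype(internal))
print(describe_doctype(external))
```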