I. Document Type Definition (DTD): A set of rules that defines 1) the legal set of elements for an XML document, 2) the legal set of attributes for each element, 3) the legal data content for each element and attribute, and 4) the order and number of times in which subelements must occur.
    A. Originally created as part of SGML (Standard Generalized Markup Language) and uses SGML syntax
    B. Advantages
        1. Allows you to specify the elements that may appear in an XML file, the order of the elements, and the number of the elements
        2. Allows you to specify the attributes that may appear with an element and the datatypes for those attributes
    C. Disadvantages
        1. Cannot specify the datatypes of element values (e.g., int, bool, string, float)
        2. Cannot specify more than one definition for an element. It is conceivable that you want an element to have different meanings depending on which element is its parent. For example, a name element might represent the name of a customer if the parent element is an account element and might represent the name of a bank if the parent element is a bank element. You might want a first/last name representation for the customer name and a title/stock ticker/acronym representation for the bank name but be unable to do so because DTD allows only one global definition for each element.
        3. It is hard to specify that the number of occurrences of an element can be in a range (e.g., 1-3).
    D. DTD declaration in an XML file:
           <!DOCTYPE Root-Element SYSTEM "filename.dtd">
       OR
           <!DOCTYPE Root-Element [DTD specification]>
       Example: <!DOCTYPE books SYSTEM "bookstore.dtd">
        1. The SYSTEM keyword tells the processor to fetch an external document (i.e., the DTD is external)
        2. The file reference is a Uniform Resource Identifier (URI): the example file, bookstore.dtd, resides in the same directory as the XML file
        3. The second type of DTD declaration is called an internal declaration since the DTD specification is physically included in the XML file
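The two declaration forms in section D can be told apart programmatically. The following is a minimal sketch, assuming Python's standard-library expat bindings; the helper name doctype_info and the tiny one-element documents are made up for illustration. Expat does not fetch or validate against bookstore.dtd here; it only reports what the DOCTYPE declaration says.

```python
import xml.parsers.expat


def doctype_info(xml_text):
    """Return (root name, system id, has internal subset) for a document.

    Hypothetical helper for illustration; it only inspects the DOCTYPE
    declaration and does not load or validate against the DTD.
    """
    found = {}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        found["info"] = (name, system_id, bool(has_internal_subset))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_text, True)
    return found.get("info")


# External declaration: the DTD lives in a separate file named by a URI.
external = '<!DOCTYPE books SYSTEM "bookstore.dtd"><books/>'

# Internal declaration: the DTD rules sit between [ and ] in the XML file.
internal = '<!DOCTYPE books [<!ELEMENT books EMPTY>]><books/>'

external_info = doctype_info(external)
internal_info = doctype_info(internal)
print(external_info)
print(internal_info)
```

In both cases the first field is the root element name given in the DOCTYPE; the external form carries a system identifier, while the internal form carries the rules themselves.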
    E. Sample bookstore XML specification and its DTD specification, from XML Web Development With PHP by Thomas Myer.
        1. Notice that the DTD file starts with an XML header, although the header may be omitted
    F. Element Declarations
        1. Syntax: <!ELEMENT element-name content-type>
           Example: <!ELEMENT bookInfo (title, author, publisher, isbn)>
            a. An element name may be declared only once; two declarations with the same name are not allowed
        2. Content Types
            a. ANY: Allows any content, including text or elements. Basically the same as unstructured XML.
            b. EMPTY: Contains no content
            c. Mixed content: allows parsed character data (#PCDATA), optionally interspersed with child elements
                i. Syntax: <!ELEMENT name (#PCDATA)> or <!ELEMENT name (#PCDATA | Child1 | Child2 | ... | ChildN)*>
                ii. Parsed character data
                    1) XML entity references (discussed later) will get expanded
                    2) Tags will be recognized by a parser
                iii. The * is required when child elements are mixed with #PCDATA, and #PCDATA must come first in the list. You lose the ability to sequence the child elements in this case
                iv. Example: <!ELEMENT author (#PCDATA | publisher)*> allows the author element to include both parsed character data and publisher elements. There may be any number of publisher elements interspersed with character values. For example:
                        <author>
                            <publisher>John Wiley</publisher>
                            <publisher>McGraw Hill</publisher> A big CS publisher
                            <publisher>Bantam Books</publisher> Paperback publisher
                        </author>
            d. Element content
                i. Syntax: <!ELEMENT name (child-list)>
                   Example: <!ELEMENT bookInfo (title, author, publisher, isbn)>
                ii. The order in which elements appear on the child-list determines the order in which they must appear in the XML document
                iii. Specifying the number of occurrences
                    1) *: 0 or more
                    2) +: 1 or more
                    3) ?: 0 or 1 (i.e., optional)
                    4) |: choice; exactly one of the listed alternatives appears (combine with * or + to allow the alternatives repeatedly and in any order)
    G. Attribute Declarations
        1. Syntax: <!ATTLIST element-name attr-name datatype default-value>
           Example: <!ATTLIST customer custType CDATA #REQUIRED>
        2. Common Datatype Values
            a. CDATA: character data. The following characters are written using special forms: < as &lt;, > as &gt;, & as &amp;, and " as &quot;. Note that CDATA as an attribute type is different from a CDATA section (<![CDATA[ ... ]]>) in element content. Entity references in a CDATA attribute value are expanded, whereas entity references inside a CDATA section are not
            b. ID: the attribute value is a unique identifier for its element; no two ID values in a document may be the same.
                i. Typically used by programs that process a document. Not typically used by XSLT files
                ii. Only one ID attribute is allowed per element
                iii. ID values must start with a letter or underscore (_)
            c. Enumerated list: (value1 | value2 | ... | valueN)
                i. Example: <!ATTLIST customer type (Employee | Cust) "Cust">
                ii. Note that enumerated values do not appear in quotes but the default value does
                iii. Enumerated values must be single words; they cannot be multiple-word strings
        3. Default Value
            a. #REQUIRED: user must always provide a value
            b. #IMPLIED: attribute is optional and no default value is provided if the attribute is omitted
            c. #FIXED: attribute is optional but if it appears it must take the fixed value that is provided in the declaration
               Example: <!ATTLIST employee retired CDATA #FIXED "true">
            d. value: attribute is optional in the document. If it does not appear then the default value given in the declaration is used
               Example: <!ATTLIST employee class CDATA "staff">
    H. Entity Declarations: Provide macros that expand into longer text
        1. Example: <!ENTITY UT "University of Tennessee">
        2. General entities: meant to be used in XML documents only
            a. Syntax: <!ENTITY entity-name "replacement string">
            b. Usage in an XML document: &entity-name;
            c. Example: <address>&UT; 37996</address>
        3. Parameter entities: meant to be used in DTD documents
            a. Syntax: <!ENTITY % entity-name "replacement string">
            b. Usage in a DTD document: %entity-name;
            c. Example:
                   <!ENTITY % acceptable_markup "(#PCDATA | b | i | u)*">
                   <!ELEMENT paragraph %acceptable_markup;>
                   <!ELEMENT abstract %acceptable_markup;>
                   <!ELEMENT summary %acceptable_markup;>
            d. Useful when the same content will appear in different elements. Any change to the content requires only one edit as opposed to multiple edits.
        4. External entity: Entities you've seen thus far are internal. External entities allow you to integrate external files that include either replacement strings or non-XML resources, such as JPEG images. We won't discuss external entities.
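The effect of a general entity and of a default attribute value can be observed with a small script. This is a sketch only, assuming Python's standard-library xml.etree.ElementTree; the staff document below is hypothetical and reuses the UT entity and the employee class default from the examples above. ElementTree does not validate against the DTD; it relies on the underlying expat parser reading the declarations in the internal DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical document. The internal DTD declares a general entity (UT)
# and a default value for the class attribute of employee elements.
DOC = """<!DOCTYPE staff [
  <!ENTITY UT "University of Tennessee">
  <!ATTLIST employee class CDATA "staff">
]>
<staff>
  <employee class="faculty"><address>&UT; 37996</address></employee>
  <employee><address>Knoxville</address></employee>
</staff>"""

root = ET.fromstring(DOC)
employees = root.findall("employee")

# The &UT; reference is replaced by the entity's replacement string.
address = employees[0].find("address").text

# The first employee sets class explicitly; the second omits it, so the
# parser supplies the default value from the ATTLIST declaration.
classes = [employee.get("class") for employee in employees]

print(address)
print(classes)
```

Because the parser is non-validating, a document that violates the DTD (for example, an undeclared element) would still parse; checking those rules requires a validating parser.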