• XML = Extensible Markup Language
    1. Extensible: You can define your own tags
    2. Markup: Markup your data using tags

  • Motivation: To provide a way of describing data. HTML only provides a way to format data for visual presentation
    1. Since data is marked up, different applications can use the same XML file for different purposes. For example, a marked up address book could be used to generate a postal mailing list, an email mailing list, or a list of telephone numbers for telemarketers (ugh!).

    2. XML uses plain text so the files are human readable
    3. XML files are devoid of presentation rules so presentation rules can be independently specified and an XML file can be displayed in different ways depending on need. For example, you can specify presentation rules for rendering an XML file in HTML, PDF, or raw text.
    4. Easily parsed: XML files must meet well-formedness conditions, which are discussed later, that allow parsers to be easily written for XML files
    5. Hierarchical: The data in XML files form hierarchical trees so the data can be internally represented and manipulated as a tree

  • Components of XML
    1. XML file: The marked up data
    2. Document Type Definition (DTD) or Schema: Describe formatting and content of the tags
    3. XSLT (Extensible Stylesheet Language Transformations) and CSS (Cascading Style Sheets): Describe rules for presenting data. XSLT allows XML files to be transformed into other presentation formats such as raw text, HTML, or PDF.
  • Sample XML Files
    • Here is a simple example XML file:
          <?xml version="1.0"?>
          <letter>
            <to>Mom</to>
            <from>Tom</from>
            <message>Happy Mother's Day</message>
          </letter>
          

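As a rough illustration of how an application might read the letter above, here is a minimal sketch using Python's standard `xml.etree.ElementTree` module. The helper name `letter_fields` is made up for this example; it is one possible way to read the file, not the only one.

```python
import xml.etree.ElementTree as ET

# The sample letter from above, stored as a string.
LETTER_XML = """<?xml version="1.0"?>
<letter>
  <to>Mom</to>
  <from>Tom</from>
  <message>Happy Mother's Day</message>
</letter>
"""

def letter_fields(xml_text):
    """Return a dict mapping each child tag of the root element to its text."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}

root = ET.fromstring(LETTER_XML)
fields = letter_fields(LETTER_XML)

print(root.tag)           # name of the single root element
print(fields["to"])       # text inside <to>
print(fields["message"])  # text inside <message>
```

The root element `letter` is the top of the tree and `to`, `from`, and `message` are its children.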
    • Here is a more complex XML file with multiply nested elements:
          <?xml version="1.0"?>
          <addressbook>
          <contact>
            <name> Brad Vander Zanden </name>
            <address>
              <street> 2400 Craghead Lane </street>
            </address>
            <city> Knoxville </city>
            <state> TN </state>
            <zip> 37920 </zip>
          </contact>
          <contact>
            <name> Mickey Mouse </name>
            <address>
              <pobox> 2485 </pobox>
              <apt> 303 </apt>
              <aptname> Walt Disney Penthouse Suites </aptname>
            </address>
            <city> Orlando </city>
            <state> Florida </state>
            <zip> 20201 </zip>
          </contact>
          </addressbook>
          
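The sketch below suggests how two different applications could use the same address book for different purposes: one lists contact names and the other builds postal mailing labels. It assumes Python's standard `xml.etree.ElementTree` module. The helper names (`text_of`, `contact_names`, `mailing_labels`) are hypothetical and exist only for this example.

```python
import xml.etree.ElementTree as ET

# The address book from above, stored as a string.
ADDRESSBOOK_XML = """<?xml version="1.0"?>
<addressbook>
<contact>
  <name> Brad Vander Zanden </name>
  <address>
    <street> 2400 Craghead Lane </street>
  </address>
  <city> Knoxville </city>
  <state> TN </state>
  <zip> 37920 </zip>
</contact>
<contact>
  <name> Mickey Mouse </name>
  <address>
    <pobox> 2485 </pobox>
    <apt> 303 </apt>
    <aptname> Walt Disney Penthouse Suites </aptname>
  </address>
  <city> Orlando </city>
  <state> Florida </state>
  <zip> 20201 </zip>
</contact>
</addressbook>
"""

def text_of(element, tag):
    """Return the stripped text of the first child named `tag`, or ''."""
    return (element.findtext(tag) or "").strip()

def contact_names(root):
    """Application 1: a plain list of contact names."""
    return [text_of(contact, "name") for contact in root.findall("contact")]

def mailing_labels(root):
    """Application 2: one multi-line postal label per contact."""
    labels = []
    for contact in root.findall("contact"):
        # Every element nested under <address> becomes one label line.
        address_lines = [(child.text or "").strip() for child in contact.find("address")]
        city = text_of(contact, "city")
        state = text_of(contact, "state")
        zip_code = text_of(contact, "zip")
        city_line = f"{city}, {state} {zip_code}"
        labels.append("\n".join([text_of(contact, "name"), *address_lines, city_line]))
    return labels

root = ET.fromstring(ADDRESSBOOK_XML)
names = contact_names(root)
labels = mailing_labels(root)
print(names)
print(labels[0])
```

Both functions walk the same tree. Neither needs to know how the other intends to present the data.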
  • Guidelines for structuring data: Make data as fine-grained as possible. For example, rather than
          <name>Brad Vander Zanden</name>
    try
          <name>
            <firstname>Brad</firstname>
            <lastname>Vander Zanden</lastname>
          </name>
    You never know which application might need to use your data, and the finer-grained you make it, the less guessing and parsing the application must do. Suppose an application needs to separate a name into a first name and a last name and is given only the string "Brad Vander Zanden". It would have to guess whether "Vander" is part of the first name, a middle name, or part of the last name. Explicit firstname and lastname elements eliminate the guess.
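
To make the guessing problem concrete, the following sketch compares a naive split of the single-string name with a direct lookup in the structured version. It assumes Python's standard `xml.etree.ElementTree` module. The function names and the "last word is the last name" heuristic are illustrative assumptions, not a recommended technique.

```python
import xml.etree.ElementTree as ET

COARSE = "<name>Brad Vander Zanden</name>"
FINE = (
    "<name>"
    "<firstname>Brad</firstname>"
    "<lastname>Vander Zanden</lastname>"
    "</name>"
)

def guess_last_name(xml_text):
    """Heuristic guess: treat the final whitespace-separated word as the last name."""
    full_name = ET.fromstring(xml_text).text
    return full_name.split()[-1]

def read_last_name(xml_text):
    """No guessing: read the text of the <lastname> element."""
    return ET.fromstring(xml_text).findtext("lastname")

guessed = guess_last_name(COARSE)
exact = read_last_name(FINE)
print(guessed)  # the heuristic drops "Vander"
print(exact)    # the structured data gives the full last name
```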

  • Structure of an XML file
    1. Elements: Appear as tags (e.g. <name>). An element may contain both text and nested elements. For example:
          <address>
            1678 Cardiff Rd
            <street> <number>1678</number> <road>Cardiff</road> <suffix>Rd</suffix> </street>
            Columbus, OH 43221
            <city>Columbus</city> <state>OH</state> <zipcode>43221</zipcode>
          </address>
    2. Attributes: Appear as name/value pairs inside an element's start tag, after the element name.
      • The value must be quoted
      • Example: <name id="1">
      • In general, attributes should be used sparingly. An attribute value is a flat string that cannot contain nested elements, and applications that process the file may ignore attributes, so any content that a user might care about should be placed within elements.
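
The sketch below shows one way a program might see attributes and mixed text/element content, assuming Python's standard `xml.etree.ElementTree` module. The text `Brad` inside `<name id="1">` is an assumption added so the element has content. In this library, text before the first child is stored in `.text` and text that follows a child's end tag is stored in that child's `.tail`; other XML libraries expose mixed content differently.

```python
import xml.etree.ElementTree as ET

NAME_XML = '<name id="1">Brad</name>'
ADDRESS_XML = (
    "<address>1678 Cardiff Rd"
    "<street><number>1678</number><road>Cardiff</road><suffix>Rd</suffix></street>"
    "Columbus, OH 43221"
    "<city>Columbus</city><state>OH</state><zipcode>43221</zipcode>"
    "</address>"
)

# Attributes: name/value pairs taken from the start tag.
name = ET.fromstring(NAME_XML)
name_id = name.get("id")        # attribute values are always strings
all_attributes = name.attrib    # dict of every attribute on the element

# Mixed content: text and child elements inside the same element.
address = ET.fromstring(ADDRESS_XML)
leading_text = address.text                    # text before the first child
trailing_text = address.find("street").tail    # text right after </street>
child_tags = [child.tag for child in address]  # direct children only

print(name_id, all_attributes)
print(leading_text)
print(trailing_text)
print(child_tags)
```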

  • Elements of a well-formed XML file

    1. Every start tag is closed by an end tag of the same name, written with a leading / (e.g. <name> is closed by </name>)

      • An element without data (i.e., an empty element) can be written as a single tag terminated with />.
      • Example: <gender />
    2. Tags are properly nested
    3. There is a single, top-level root element: Well-formed XML files form a tree in which each node represents an element.
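    4. The file should begin with an XML declaration that gives the version number. XML 1.0 makes the declaration optional but recommended; when present, it must be the very first thing in the file.
      • Example: <?xml version="1.0"?>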
    5. Attribute values are enclosed in quotes, either double ("") or single ('')
    6. Tags are case-sensitive
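
One way to try out these rules is to hand small snippets to a parser and see which ones it rejects. The sketch below assumes Python's standard `xml.etree.ElementTree` module, which raises `ParseError` when the input is not well formed. The helper `is_well_formed` and the sample snippets are hypothetical and written only for this illustration.

```python
import xml.etree.ElementTree as ET

DECLARATION = '<?xml version="1.0"?>'

def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False if it raises ParseError."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

# Each snippet is prefixed with the XML declaration before being parsed.
samples = {
    "properly nested": "<letter><to>Mom</to></letter>",
    "empty tag": "<person><gender /></person>",
    "single-quoted attribute": "<name id='1'></name>",
    "missing end tag": "<letter><to>Mom</letter>",
    "improper nesting": "<a><b></a></b>",
    "two root elements": "<a></a><b></b>",
    "unquoted attribute": "<name id=1></name>",
    "case mismatch": "<Name></name>",
}

results = {label: is_well_formed(DECLARATION + body) for label, body in samples.items()}

for label, ok in results.items():
    print(f"{label}: {'well formed' if ok else 'rejected'}")
```

The first three snippets are accepted. The remaining five each break one of the rules listed above and are rejected.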