• XML = Extensible Markup Language
    1. Extensible: You can define your own tags
    2. Markup: Mark up your data using tags

  • Motivation: To provide a way of describing data. HTML, by contrast, only provides a way to format data for visual presentation.
    1. Since data is marked up, different applications can use the same XML file for different purposes. For example, a marked-up address book could be used to generate a postal mailing list, an email mailing list, or a list of telephone numbers for telemarketers (ugh!).

    2. XML uses plain text, so the files are human-readable
    3. XML files are devoid of presentation rules, so presentation rules can be specified independently and an XML file can be displayed in different ways depending on need. For example, you can specify presentation rules for rendering an XML file in HTML, PDF, or raw text.
    4. Easily parsed: XML files must meet well-formedness conditions, which are discussed later, that make it easy to write parsers for XML files
    5. Hierarchical: The data in an XML file forms a hierarchical tree, so the data can be internally represented and manipulated as a tree
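
To make the "same file, different purposes" point concrete, here is a minimal sketch that uses Python's standard xml.etree.ElementTree module. The address book and its element names (contact, name, email, phone) are hypothetical and chosen only for illustration. Two small functions read the same marked-up data and produce two different lists.

```python
import xml.etree.ElementTree as ET

# Hypothetical address book; the element names are illustrative assumptions.
ADDRESS_BOOK = """<?xml version="1.0"?>
<addressbook>
  <contact>
    <name>Mom</name>
    <email>mom@example.com</email>
    <phone>555-0100</phone>
  </contact>
  <contact>
    <name>Tom</name>
    <email>tom@example.com</email>
    <phone>555-0101</phone>
  </contact>
</addressbook>
"""

root = ET.fromstring(ADDRESS_BOOK)


def email_list(book):
    """Purpose 1: an email mailing list."""
    return [contact.findtext("email") for contact in book.findall("contact")]


def phone_list(book):
    """Purpose 2: a list of (name, telephone number) pairs."""
    return [
        (contact.findtext("name"), contact.findtext("phone"))
        for contact in book.findall("contact")
    ]


print(email_list(root))
print(phone_list(root))
```

Neither function needs to know about the fields it does not use, which is what lets several applications share one file.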

  • XML Versus Relational Databases
    1. XML is preferred when
      1. Data is loosely structured, with many different attributes for each subtype (e.g., addresses often exhibit high variability--home addresses, apartment addresses, PO boxes, business addresses--although we try to accommodate addresses in a relational database because addresses are so ubiquitous).
      2. You are storing data in "documents," as Word or Excel do. In this case, each instance of an entity, such as a description of a house for an architectural program, is stored in a separate file.
      3. Data must be human-readable and editable.
    2. Relational databases are preferred when
      1. Data is highly structured
      2. A large quantity of data needs to be stored, in which case it's better to store much of it in binary form rather than text form with lots of markup tags.
      3. Instances of an entity, such as employees, can be combined in a single file.

  • Components of XML
    1. XML file: The marked up data
    2. Document Type Definition (DTD) or Schema: Describes which tags may appear, how they may be nested, and what content they may hold
    3. XSLT (Extensible Stylesheet Language Transformations) and CSS (Cascading Style Sheets): Describe rules for presenting data--XSLT allows XML files to be transformed into other formats such as raw text or HTML, and into PDF by way of XSL-FO formatting objects.
  • Sample XML Files
    • Here is a simple example XML file:
          <?xml version="1.0"?>
          <letter>
            <to>Mom</to>
            <from>Tom</from>
            <message>Happy Mother's Day</message>
          </letter>
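
As a rough illustration of how an application might read this file, the following sketch parses the same letter with Python's standard xml.etree.ElementTree module. It is one possible approach and is not tied to any particular XML tool.

```python
import xml.etree.ElementTree as ET

LETTER = """<?xml version="1.0"?>
<letter>
  <to>Mom</to>
  <from>Tom</from>
  <message>Happy Mother's Day</message>
</letter>
"""

# fromstring() returns the root element, here <letter>.
letter = ET.fromstring(LETTER)

# Iterating over an element visits its child elements in document order.
for child in letter:
    print(child.tag, "->", child.text)

# findtext() looks up a child by tag name and returns its text.
message = letter.findtext("message")
```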
          

    • Here is a more complex XML file with multiply nested elements:
          <?xml version="1.0"?>
          <addressbook>
          <contact>
            <name> Brad Vander Zanden </name>
            <address>
              <street> 2400 Craghead Lane </street>
            </address>
            <city> Knoxville </city>
            <state> TN </state>
            <zip> 37920 </zip>
          </contact>
          <contact>
            <name> Mickey Mouse </name>
            <address>
              <pobox> 2485 </pobox>
              <apt> 303 </apt>
              <aptname> Walt Disney Penthouse Suites </aptname>
            </address>
            <city> Orlando </city>
            <state> Florida </state>
            <zip> 20201 </zip>
          </contact>
          </addressbook>
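
The nesting in this file maps directly onto a tree. The sketch below, which assumes Python's standard xml.etree.ElementTree module, walks one contact from the address book recursively and builds an indented outline. The helper name outline is made up for this example. Note that the text inside tags such as <name> Mickey Mouse </name> includes the surrounding spaces, so the sketch trims it.

```python
import xml.etree.ElementTree as ET

ADDRESS_BOOK = """<?xml version="1.0"?>
<addressbook>
<contact>
  <name> Mickey Mouse </name>
  <address>
    <pobox> 2485 </pobox>
    <apt> 303 </apt>
    <aptname> Walt Disney Penthouse Suites </aptname>
  </address>
  <city> Orlando </city>
  <state> Florida </state>
  <zip> 20201 </zip>
</contact>
</addressbook>
"""


def outline(element, depth=0):
    """Return one indented line per element, with its text trimmed."""
    text = (element.text or "").strip()
    line = "  " * depth + element.tag + (": " + text if text else "")
    lines = [line]
    for child in element:
        # Each child is itself the root of a subtree.
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(ADDRESS_BOOK)
print("\n".join(outline(root)))
```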
          
  • Guidelines for structuring data: Make data as fine-grained as possible. For example, rather than
          <name>Brad Vander Zanden</name>
    try
          <name>
            <firstname>Brad</firstname>
            <lastname>Vander Zanden</lastname>
          </name>
    You never know which application might need to use your data, and the finer-grained you make it, the less guessing and parsing the application must do. In the above example, suppose an application needs to separate a name into a first name and a last name and is given only the string "Brad Vander Zanden". It would have to guess whether "Vander" is part of the first name, a middle name, or part of the last name. Explicit firstname and lastname elements eliminate the guess.
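
The guessing problem can be seen in a few lines of Python. This is only a sketch using the standard xml.etree.ElementTree module. The "last word is the last name" rule is an assumed, deliberately naive heuristic that an application might fall back on when given the coarse-grained form.

```python
import xml.etree.ElementTree as ET

coarse = ET.fromstring("<name>Brad Vander Zanden</name>")
fine = ET.fromstring(
    "<name><firstname>Brad</firstname><lastname>Vander Zanden</lastname></name>"
)

# Coarse-grained: the application must guess where the last name starts.
# Naive heuristic (an assumption): the final word is the last name.
guess_first, guess_last = coarse.text.rsplit(" ", 1)
print(guess_first, "|", guess_last)   # Brad Vander | Zanden  (wrong split)

# Fine-grained: the markup already says which part is which.
first = fine.findtext("firstname")
last = fine.findtext("lastname")
print(first, "|", last)               # Brad | Vander Zanden
```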

  • Structure of an XML file
    1. Elements: Appear as tags (e.g., <name>). Text data and child elements may both be nested within the same element (this is called mixed content). For example:
          <address>
            1678 Cardiff Rd
            <street>
              <number>1678</number>
              <road>Cardiff</road>
              <suffix>Rd</suffix>
            </street>
            Columbus, OH 43221
            <city>Columbus</city>
            <state>OH</state>
            <zipcode>43221</zipcode>
          </address>
    2. Attributes: Appear as name/value pairs inside the start tag, after the element name.
      • The value must be quoted
      • Example: <name id="1">
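      • In general, attributes should be used sparingly. An attribute can hold only a single text value with no nested structure, and applications that display an XML file often show only element content, so any content that a user might have an interest in should be placed within elements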
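
Here is a small sketch of how elements, mixed content, and attributes look to a program, assuming Python's standard xml.etree.ElementTree module. In that library, text that comes before an element's first child is stored in the element's text field, and text that follows a child's end tag is stored in that child's tail field. The <contact> wrapper is added only for illustration.

```python
import xml.etree.ElementTree as ET

# An attribute: a quoted name/value pair inside the start tag.
contact = ET.fromstring('<contact><name id="1">Mickey Mouse</name></contact>')
name = contact.find("name")
print(name.get("id"))    # 1  (attribute values are always strings)
print(name.attrib)       # {'id': '1'}

# Mixed content: text and child elements inside the same element.
address = ET.fromstring(
    "<address>1678 Cardiff Rd "
    "<street><number>1678</number><road>Cardiff</road></street>"
    " Columbus, OH 43221 "
    "<city>Columbus</city></address>"
)
street = address.find("street")
leading_text = address.text.strip()    # text before the first child
trailing_text = street.tail.strip()    # text right after </street>
print(leading_text)      # 1678 Cardiff Rd
print(trailing_text)     # Columbus, OH 43221
```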

  • Properties of a well-formed XML file

    1. Every start tag is closed by an end tag of the same name, written with a leading / (e.g., <name> ... </name>)

      • Tags without data (i.e., empty tags) can instead be closed within the tag itself by ending it with />.
      • Example: <gender />
    2. Tags are properly nested
    3. There is a single, top-level root element: Well-formed XML files form a tree where each node represents an element.
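    4. The file should begin with an XML declaration that gives the version number. The declaration is optional in XML 1.0, but when present it must be the very first thing in the file.
      • Example: <?xml version="1.0"?>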
    5. Attribute values are enclosed in quotes, either double ("...") or single ('...')
    6. Tags are case-sensitive
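
One way to see these rules in action is to hand small snippets to a parser and note which ones it rejects. The sketch below assumes Python's standard xml.etree.ElementTree module, which raises ParseError on input that is not well formed. The helper is_well_formed and the sample strings are made up for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


samples = {
    "ok": "<letter><to>Mom</to></letter>",
    "empty tag": "<person><gender /></person>",
    "missing end tag": "<letter><to>Mom</letter>",
    "bad nesting": "<a><b></a></b>",
    "two roots": "<a/><b/>",
    "unquoted attribute": "<name id=1>Brad</name>",
    "case mismatch": "<Name>Brad</name>",
}

for label, text in samples.items():
    print(label, "->", is_well_formed(text))
```

Only the first two samples are accepted. Each of the others breaks one of the rules listed above.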