  • Background: An XML file can be stored in memory as a tree, with each node corresponding to an element (tag) in the XML file. For example, consider the following XML markup from the w3schools.com website:

      <bookstore>
        <book category="COOKING">
          <title lang="en">Everyday Italian</title>
          <author>Giada De Laurentiis</author>
          <year>2005</year>
          <price>30.00</price>
        </book>
        <book category="CHILDREN">
          <title lang="en">Harry Potter</title>
          <author>J K. Rowling</author>
          <year>2005</year>
          <price>29.99</price>
        </book>
        <book category="WEB">
          <title lang="en">XQuery Kick Start</title>
          <author>James McGovern</author>
          <author>Per Bothner</author>
          <author>Kurt Cagle</author>
          <author>James Linn</author>
          <author>Vaidyanathan Nagarajan</author>
          <year>2003</year>
          <price>49.99</price>
        </book>
        <book category="WEB">
          <title lang="en">Learning XML</title>
          <author>Erik T. Ray</author>
          <year>2003</year>
          <price>39.95</price>
        </book>
      </bookstore>

    Here is a way to visualize the file as an XML tree:
    
                         |--title
                |--book--|--author
                |        |--year
                |        |--price
                |
                |        |--title
                |--book--|--author
                |        |--year
                |        |--price
                |
    bookstore---|        |--title
                |        |--author
                |        |--author
                |--book--|--author
                |        |--author
                |        |--author
                |        |--year
                |        |--price
                |
                |        |--title
                |--book--|--author
                         |--year
                         |--price
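    A minimal sketch of building that in-memory tree, assuming Python's standard xml.etree.ElementTree module (the notes do not name a parser). BOOKSTORE_XML and outline() are hypothetical names; the string is the bookstore example above condensed to one <book> per line. The sketch prints each element's tag indented by its depth, which reproduces the shape of the diagram.

```python
import xml.etree.ElementTree as ET

# BOOKSTORE_XML (hypothetical name): the notes' bookstore example, one <book> per line.
BOOKSTORE_XML = """<bookstore>
<book category="COOKING"><title lang="en">Everyday Italian</title><author>Giada De Laurentiis</author><year>2005</year><price>30.00</price></book>
<book category="CHILDREN"><title lang="en">Harry Potter</title><author>J K. Rowling</author><year>2005</year><price>29.99</price></book>
<book category="WEB"><title lang="en">XQuery Kick Start</title><author>James McGovern</author><author>Per Bothner</author><author>Kurt Cagle</author><author>James Linn</author><author>Vaidyanathan Nagarajan</author><year>2003</year><price>49.99</price></book>
<book category="WEB"><title lang="en">Learning XML</title><author>Erik T. Ray</author><year>2003</year><price>39.95</price></book>
</bookstore>"""


def outline(elem, depth=0):
    """Return one line per element, indented by the element's depth in the tree."""
    lines = ["    " * depth + elem.tag]
    for child in elem:  # iterating an Element yields its child elements in document order
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(BOOKSTORE_XML)  # root is the <bookstore> element
print("\n".join(outline(root)))
```

    Iterating over an Element yields only its child elements; text such as "Everyday Italian" is kept in each element's .text attribute rather than as a separate node.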
    
  • XPath purpose: provide a way to identify or retrieve various nodes in an XML tree
    1. In XSLT, XPath is used to match the nodes that a stylesheet transforms, for example into HTML for display
    2. In a scripting language, XPath returns the set of nodes matching a given path expression. You can think of it as a query language for XML trees.

  • XPath uses path expressions to navigate through the tree.
    1. Path expressions mimic Unix file system syntax
      1. e.g., /bookstore/book/title matches all title elements whose parent is book and whose grandparent is the root element bookstore
      2. / by itself selects the root (document) node; /element selects the root element by name (e.g., /bookstore)
      3. pathnames that start with / specify absolute locations
      4. pathnames that do not start with / specify relative locations from the current node (a program using XPath has presumably navigated to this node).
      5. each slash-separated part of a path expression is called a step. For example:
        	     step1/step2/.../stepn
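      One way to try path steps from a script, assuming Python's xml.etree.ElementTree. Its find()/findall() methods accept a limited subset of XPath and always evaluate a path relative to the element they are called on, so the sketch rewrites the absolute path /bookstore/book/title as "check that the root element is bookstore, then follow book/title". BOOKSTORE_XML is a hypothetical name for the example document.

```python
import xml.etree.ElementTree as ET

# BOOKSTORE_XML (hypothetical name): the notes' bookstore example, one <book> per line.
BOOKSTORE_XML = """<bookstore>
<book category="COOKING"><title lang="en">Everyday Italian</title><author>Giada De Laurentiis</author><year>2005</year><price>30.00</price></book>
<book category="CHILDREN"><title lang="en">Harry Potter</title><author>J K. Rowling</author><year>2005</year><price>29.99</price></book>
<book category="WEB"><title lang="en">XQuery Kick Start</title><author>James McGovern</author><author>Per Bothner</author><author>Kurt Cagle</author><author>James Linn</author><author>Vaidyanathan Nagarajan</author><year>2003</year><price>49.99</price></book>
<book category="WEB"><title lang="en">Learning XML</title><author>Erik T. Ray</author><year>2003</year><price>39.95</price></book>
</bookstore>"""

root = ET.fromstring(BOOKSTORE_XML)

# Absolute path /bookstore/book/title: ElementTree cannot start from the document
# node, so check the root element's name, then follow the remaining steps.
assert root.tag == "bookstore"
titles = root.findall("book/title")  # step "book", then step "title"
print([t.text for t in titles])

# Relative path: evaluated from whichever node the program has navigated to.
current = root.find("book")  # first <book> child, "Everyday Italian"
print([a.text for a in current.findall("author")])
```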
        	     

    2. Path operators
      1. nodename: selects all children of the current node that have that name (e.g., bookstore)
      2. /: selects from the root node (e.g., /bookstore)
      3. //: selects matching nodes at any depth below the node on its left; a leading // searches the whole document (e.g., bookstore//book, //book)
      4. .: selects the current node
      5. ..: selects the parent node (e.g., ../book selects all book elements that are children of the parent)
      6. @: selects attributes (e.g., //@lang selects all attributes named lang in the document)
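      A sketch of the operators above, again assuming Python's xml.etree.ElementTree. ElementTree supports ., .., and // (written .// when starting from an element), but its paths can only select elements, so //@lang is emulated by reading each element's attrib dictionary. Its .. can also climb no higher than the element findall() was called on, which is why ../book is written starting from the root. BOOKSTORE_XML is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# BOOKSTORE_XML (hypothetical name): the notes' bookstore example, one <book> per line.
BOOKSTORE_XML = """<bookstore>
<book category="COOKING"><title lang="en">Everyday Italian</title><author>Giada De Laurentiis</author><year>2005</year><price>30.00</price></book>
<book category="CHILDREN"><title lang="en">Harry Potter</title><author>J K. Rowling</author><year>2005</year><price>29.99</price></book>
<book category="WEB"><title lang="en">XQuery Kick Start</title><author>James McGovern</author><author>Per Bothner</author><author>Kurt Cagle</author><author>James Linn</author><author>Vaidyanathan Nagarajan</author><year>2003</year><price>49.99</price></book>
<book category="WEB"><title lang="en">Learning XML</title><author>Erik T. Ray</author><year>2003</year><price>39.95</price></book>
</bookstore>"""

root = ET.fromstring(BOOKSTORE_XML)  # <bookstore> plays the role of the current node

# //  ElementTree needs a leading "." here: ".//author" searches the whole subtree
all_authors = root.findall(".//author")

# .   the current node itself
itself = root.findall(".")

# ..  ../book evaluated at the first book, written from the root: first book -> parent -> book children
books_from_first = root.findall("book[1]/../book")

# @   paths cannot select attribute nodes, so //@lang is emulated with .attrib
lang_values = [elem.get("lang") for elem in root.iter() if "lang" in elem.attrib]

print(len(all_authors), itself[0].tag, len(books_from_first), lang_values)
```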

    3. Predicates: Sometimes you want to further restrict the "candidate" set of elements selected by a step. Predicates filter the candidate set so that only specific elements are selected.
      1. predicates are written in square brackets [] after the step they filter, like an array index (e.g., book[1])
      2. types of predicates include:
        1. positional predicates
          • absolute positions can be specified using integer indices starting from 1 (older versions of Internet Explorer, starting with IE5, used 0 for the first node, but the W3C standard starts from 1)
          • last(): returns the position of the last node in the candidate set, so /bookstore/book[last()] selects the last book element
          • ranges: the position() function returns each node's position in the candidate set, so you can use inequalities to select a range of positions. For example, /bookstore/book[position()<3] returns the first two book elements
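          A sketch of positional predicates, assuming Python's xml.etree.ElementTree, whose paths accept [n], [last()], and [last()-n] after a tag name (positions start at 1, as in the W3C standard). ElementTree has no position() function, so the range /bookstore/book[position()<3] is emulated with a list slice. BOOKSTORE_XML is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# BOOKSTORE_XML (hypothetical name): the notes' bookstore example, one <book> per line.
BOOKSTORE_XML = """<bookstore>
<book category="COOKING"><title lang="en">Everyday Italian</title><author>Giada De Laurentiis</author><year>2005</year><price>30.00</price></book>
<book category="CHILDREN"><title lang="en">Harry Potter</title><author>J K. Rowling</author><year>2005</year><price>29.99</price></book>
<book category="WEB"><title lang="en">XQuery Kick Start</title><author>James McGovern</author><author>Per Bothner</author><author>Kurt Cagle</author><author>James Linn</author><author>Vaidyanathan Nagarajan</author><year>2003</year><price>49.99</price></book>
<book category="WEB"><title lang="en">Learning XML</title><author>Erik T. Ray</author><year>2003</year><price>39.95</price></book>
</bookstore>"""

root = ET.fromstring(BOOKSTORE_XML)

first = root.find("book[1]")                   # /bookstore/book[1]
last = root.find("book[last()]")               # /bookstore/book[last()]
second_to_last = root.find("book[last()-1]")   # /bookstore/book[last()-1]

# /bookstore/book[position()<3]: positions 1 and 2 are list indices 0 and 1
first_two = root.findall("book")[:2]

print(first.findtext("title"), "|", last.findtext("title"), "|", second_to_last.findtext("title"))
print([b.findtext("title") for b in first_two])
```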

        2. existence predicates: existence predicates allow you to select elements that have a specified attribute or sub-element. For example:
          • //title[@lang] selects all titles which have a lang attribute defined.
          • //book[price]/title selects the title of all books that have a price element.
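          Existence predicates are among the forms Python's xml.etree.ElementTree supports directly: [@attr] keeps elements with that attribute and [tag] keeps elements with such a child. The sketch below assumes that module and the hypothetical BOOKSTORE_XML constant; the [@category] pair shows that the test applies to the element the predicate is attached to.

```python
import xml.etree.ElementTree as ET

# BOOKSTORE_XML (hypothetical name): the notes' bookstore example, one <book> per line.
BOOKSTORE_XML = """<bookstore>
<book category="COOKING"><title lang="en">Everyday Italian</title><author>Giada De Laurentiis</author><year>2005</year><price>30.00</price></book>
<book category="CHILDREN"><title lang="en">Harry Potter</title><author>J K. Rowling</author><year>2005</year><price>29.99</price></book>
<book category="WEB"><title lang="en">XQuery Kick Start</title><author>James McGovern</author><author>Per Bothner</author><author>Kurt Cagle</author><author>James Linn</author><author>Vaidyanathan Nagarajan</author><year>2003</year><price>49.99</price></book>
<book category="WEB"><title lang="en">Learning XML</title><author>Erik T. Ray</author><year>2003</year><price>39.95</price></book>
</bookstore>"""

root = ET.fromstring(BOOKSTORE_XML)

titles_with_lang = root.findall(".//title[@lang]")        # //title[@lang]
priced_titles = root.findall(".//book[price]/title")       # //book[price]/title
books_with_category = root.findall(".//book[@category]")   # category is an attribute of <book>
titles_with_category = root.findall(".//title[@category]") # ... not of <title>, so nothing matches

print(len(titles_with_lang), len(priced_titles), len(books_with_category), len(titles_with_category))
```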

        3. value predicates: value predicates allow you to select elements that have an attribute or sub-element with a particular value. For example:
          • //title[@lang='en']: All titles with a value of 'en' for the lang attribute.
          • /bookstore/book[price>35.00]: All books in bookstore with a price greater than $35.00.

          The allowable operators in value expressions include:

          • relational operators (=, !=, <, >, <=, >=)
          • arithmetic operators (+, -, *, div, mod); use div rather than / for division, since / is reserved in XPath for separating steps
          • boolean operators and, or
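          A sketch of value predicates, assuming Python's xml.etree.ElementTree. Its predicates handle string equality tests such as [@lang='en'] or [year='2005'], but not numeric comparisons or boolean operators, so book[price>35.00] and the combined test below are emulated in Python by converting the price text with float(). BOOKSTORE_XML is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# BOOKSTORE_XML (hypothetical name): the notes' bookstore example, one <book> per line.
BOOKSTORE_XML = """<bookstore>
<book category="COOKING"><title lang="en">Everyday Italian</title><author>Giada De Laurentiis</author><year>2005</year><price>30.00</price></book>
<book category="CHILDREN"><title lang="en">Harry Potter</title><author>J K. Rowling</author><year>2005</year><price>29.99</price></book>
<book category="WEB"><title lang="en">XQuery Kick Start</title><author>James McGovern</author><author>Per Bothner</author><author>Kurt Cagle</author><author>James Linn</author><author>Vaidyanathan Nagarajan</author><year>2003</year><price>49.99</price></book>
<book category="WEB"><title lang="en">Learning XML</title><author>Erik T. Ray</author><year>2003</year><price>39.95</price></book>
</bookstore>"""

root = ET.fromstring(BOOKSTORE_XML)

english_titles = root.findall(".//title[@lang='en']")  # //title[@lang='en'] (native)
books_2005 = root.findall("book[year='2005']")         # child <year> text equals '2005' (native)

# /bookstore/book[price>35.00], emulated: ElementTree cannot compare numbers in a path
expensive = [b for b in root.findall("book") if float(b.findtext("price")) > 35.00]

# /bookstore/book[year=2005 or price>45], emulated with Python's `or`
mixed = [b for b in root.findall("book")
         if b.findtext("year") == "2005" or float(b.findtext("price")) > 45]

print([b.findtext("title") for b in expensive])
```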

      3. It is acceptable to have a predicate at intermediate steps of a path expression. For example:
        /bookstore/book[price>35.00]/title: Select the titles of all books in bookstore with a price greater than $35.00.
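        A sketch of predicates at intermediate steps, assuming Python's xml.etree.ElementTree and the hypothetical BOOKSTORE_XML constant. When the predicate is one ElementTree supports, the path simply continues after it; for the numeric test in the example above, the filter is emulated in Python and the title step is taken from each surviving book.

```python
import xml.etree.ElementTree as ET

# BOOKSTORE_XML (hypothetical name): the notes' bookstore example, one <book> per line.
BOOKSTORE_XML = """<bookstore>
<book category="COOKING"><title lang="en">Everyday Italian</title><author>Giada De Laurentiis</author><year>2005</year><price>30.00</price></book>
<book category="CHILDREN"><title lang="en">Harry Potter</title><author>J K. Rowling</author><year>2005</year><price>29.99</price></book>
<book category="WEB"><title lang="en">XQuery Kick Start</title><author>James McGovern</author><author>Per Bothner</author><author>Kurt Cagle</author><author>James Linn</author><author>Vaidyanathan Nagarajan</author><year>2003</year><price>49.99</price></book>
<book category="WEB"><title lang="en">Learning XML</title><author>Erik T. Ray</author><year>2003</year><price>39.95</price></book>
</bookstore>"""

root = ET.fromstring(BOOKSTORE_XML)

# Native: /bookstore/book[@category='WEB']/author, the path continues after the predicate
web_authors = [a.text for a in root.findall("book[@category='WEB']/author")]

# Emulated: /bookstore/book[price>35.00]/title
expensive_titles = [b.findtext("title") for b in root.findall("book")
                    if float(b.findtext("price")) > 35.00]

print(web_authors)
print(expensive_titles)
```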

    4. Wildcards can be used to select unknown XML elements. For example, we might want to select all elements in a bookstore, such as cds, dvds, and books, without knowing all the element types in advance.

      1. * matches any element node
      2. @* matches any attribute node
      3. node() matches any node of any kind

      Some examples:

      • /bookstore/*: selects all child elements of bookstore
      • //*: selects all elements in the document
      • //title[@*]: selects all title elements that have at least one attribute
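      A sketch of the wildcard examples, assuming Python's xml.etree.ElementTree. It supports * in paths but not @* or node(), so //title[@*] is emulated by checking for a non-empty attrib dictionary. Because ".//*" searches below the element it is called on, the root element is added back to match //*. BOOKSTORE_XML is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# BOOKSTORE_XML (hypothetical name): the notes' bookstore example, one <book> per line.
BOOKSTORE_XML = """<bookstore>
<book category="COOKING"><title lang="en">Everyday Italian</title><author>Giada De Laurentiis</author><year>2005</year><price>30.00</price></book>
<book category="CHILDREN"><title lang="en">Harry Potter</title><author>J K. Rowling</author><year>2005</year><price>29.99</price></book>
<book category="WEB"><title lang="en">XQuery Kick Start</title><author>James McGovern</author><author>Per Bothner</author><author>Kurt Cagle</author><author>James Linn</author><author>Vaidyanathan Nagarajan</author><year>2003</year><price>49.99</price></book>
<book category="WEB"><title lang="en">Learning XML</title><author>Erik T. Ray</author><year>2003</year><price>39.95</price></book>
</bookstore>"""

root = ET.fromstring(BOOKSTORE_XML)

children = root.findall("*")                   # /bookstore/* -> the four <book> elements
all_elements = [root] + root.findall(".//*")   # //* also includes <bookstore> itself

# //title[@*], emulated: a non-empty attrib dict means "has at least one attribute"
titles_with_attr = [t for t in root.iter("title") if t.attrib]

print(len(children), len(all_elements), len(titles_with_attr))
```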

    5. Select multiple paths using the | operator: The | operator returns the union of the node sets selected by two complete path expressions ("the nodes matched by this path or by that path"). For example, to select both the books and the dvds in our bookstore, we could write:
      	 /bookstore/book | /bookstore/dvd
      	 
      Each side of | must be a complete path: /bookstore/book | dvd would combine /bookstore/book with the dvd children of the current node. A parenthesized union can be followed by further steps, so we can select the price element of all books and dvds using the following XPath (equivalently, /bookstore/book/price | /bookstore/dvd/price):
      	 (/bookstore/book | /bookstore/dvd)/price
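      Python's xml.etree.ElementTree (assumed here) has no | operator, so this sketch emulates the union with a hypothetical helper, xpath_union, that merges node lists, drops duplicates, and returns the nodes in document order, as an XPath node set is returned by most APIs. The sample document has no <dvd> elements, so those operands contribute nothing. BOOKSTORE_XML is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# BOOKSTORE_XML (hypothetical name): the notes' bookstore example, one <book> per line.
BOOKSTORE_XML = """<bookstore>
<book category="COOKING"><title lang="en">Everyday Italian</title><author>Giada De Laurentiis</author><year>2005</year><price>30.00</price></book>
<book category="CHILDREN"><title lang="en">Harry Potter</title><author>J K. Rowling</author><year>2005</year><price>29.99</price></book>
<book category="WEB"><title lang="en">XQuery Kick Start</title><author>James McGovern</author><author>Per Bothner</author><author>Kurt Cagle</author><author>James Linn</author><author>Vaidyanathan Nagarajan</author><year>2003</year><price>49.99</price></book>
<book category="WEB"><title lang="en">Learning XML</title><author>Erik T. Ray</author><year>2003</year><price>39.95</price></book>
</bookstore>"""

root = ET.fromstring(BOOKSTORE_XML)


def xpath_union(context, *node_lists):
    """Hypothetical helper emulating XPath's |: merge, de-duplicate, sort by document order."""
    position = {elem: i for i, elem in enumerate(context.iter())}  # iter() walks in document order
    merged = {elem for nodes in node_lists for elem in nodes}
    return sorted(merged, key=position.__getitem__)


# /bookstore/book | /bookstore/dvd
books_or_dvds = xpath_union(root, root.findall("book"), root.findall("dvd"))

# (/bookstore/book | /bookstore/dvd)/price
prices = xpath_union(root, root.findall("book/price"), root.findall("dvd/price"))

# Out-of-order, overlapping operands still yield each node once, in document order.
mixed = xpath_union(root, root.findall("book[@category='WEB']"),
                    root.findall("book[year='2005']"), root.findall("book[1]"))
print([b.findtext("title") for b in mixed])
```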
      	 
    6. XPath Axes: An axis designates the initial candidate set of nodes relative to the current node (for example, its children, its ancestors, or its following siblings). This candidate set can be further winnowed using a node test or predicate. The compact operators introduced earlier (., .., @, //) are abbreviations for axis steps.

      1. useful when processing an XML document in a script.
      2. less useful for XSLT.
      3. use with care, because there is often a comparable and more compact notation that specifies the same candidate set
      4. most useful axes

        1. ancestor: all ancestors of the current node (its parent, grandparent, and so on up to the root)
        2. ancestor-or-self: the ancestors of the current node plus the current node
        3. attribute: all the attributes for the current node (@* is more compact)
        4. child: all children of the current node (* is more compact)
        5. descendant: all descendants of the current node (.// is more compact, e.g., .//book for descendant::book)
        6. descendant-or-self: all descendants of the current node plus the current node
        7. following-sibling: all sibling nodes that follow the current node
        8. parent: the parent node (.. is more compact)
        9. preceding-sibling: all sibling nodes that precede the current node
        10. self: the current node (. is more compact)
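        Python's xml.etree.ElementTree (assumed here) accepts only abbreviated paths, and its elements do not store a link to their parent, so this sketch emulates three axes with a child-to-parent dictionary. The names parent_of, ancestors, following_siblings, and preceding_siblings are hypothetical helpers, as is BOOKSTORE_XML. ancestors() lists the nearest ancestor first (the axis's proximity order); the sibling helpers return document order.

```python
import xml.etree.ElementTree as ET

# BOOKSTORE_XML (hypothetical name): the notes' bookstore example, one <book> per line.
BOOKSTORE_XML = """<bookstore>
<book category="COOKING"><title lang="en">Everyday Italian</title><author>Giada De Laurentiis</author><year>2005</year><price>30.00</price></book>
<book category="CHILDREN"><title lang="en">Harry Potter</title><author>J K. Rowling</author><year>2005</year><price>29.99</price></book>
<book category="WEB"><title lang="en">XQuery Kick Start</title><author>James McGovern</author><author>Per Bothner</author><author>Kurt Cagle</author><author>James Linn</author><author>Vaidyanathan Nagarajan</author><year>2003</year><price>49.99</price></book>
<book category="WEB"><title lang="en">Learning XML</title><author>Erik T. Ray</author><year>2003</year><price>39.95</price></book>
</bookstore>"""

root = ET.fromstring(BOOKSTORE_XML)

# Hypothetical child -> parent map (the root element never appears as a key).
parent_of = {child: parent for parent in root.iter() for child in parent}


def ancestors(elem):
    """ancestor axis: parent, grandparent, ... up to the root (nearest first)."""
    chain = []
    while elem in parent_of:
        elem = parent_of[elem]
        chain.append(elem)
    return chain


def following_siblings(elem):
    """following-sibling axis: siblings after elem, in document order."""
    siblings = list(parent_of[elem])
    return siblings[siblings.index(elem) + 1:]


def preceding_siblings(elem):
    """preceding-sibling axis: siblings before elem, returned in document order."""
    siblings = list(parent_of[elem])
    return siblings[:siblings.index(elem)]


year = root.find("book[3]/year")  # current node: <year> of "XQuery Kick Start"
print([e.tag for e in ancestors(year)])
print([e.tag for e in following_siblings(year)])
print([e.tag for e in preceding_siblings(year)])
```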

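      5. Formal syntax for an axis step:
        	    axisname::nodetest[predicate]
        where nodetest is an element name (e.g., book), * or a node-type test such as node(), and the [predicate] part is optional.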
                   
      6. Examples

        • child::book: the set of nodes to select from is the children of the current node and the children to select are book nodes. book by itself retrieves the same set of nodes.
        • child::node(): the set of nodes to select from is the children of the current node, and every child node is kept, including text and comment nodes. node() by itself retrieves the same set of nodes; * (short for child::*) retrieves only the child elements.
        • child::*/child::price: in the first step, the set of nodes to select from is the children of the current node, and every child element is kept. In the second step you examine each element from the first step, take its children, and winnow them down to the price elements. */price retrieves the same set of nodes.
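        A sketch of the abbreviated equivalents, assuming Python's xml.etree.ElementTree, which accepts only the abbreviated forms (book, *, */price), not the child:: syntax. It also cannot express child::node(): ElementTree keeps text as strings on .text and .tail rather than as nodes, as the last line shows. BOOKSTORE_XML is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# BOOKSTORE_XML (hypothetical name): the notes' bookstore example, one <book> per line.
BOOKSTORE_XML = """<bookstore>
<book category="COOKING"><title lang="en">Everyday Italian</title><author>Giada De Laurentiis</author><year>2005</year><price>30.00</price></book>
<book category="CHILDREN"><title lang="en">Harry Potter</title><author>J K. Rowling</author><year>2005</year><price>29.99</price></book>
<book category="WEB"><title lang="en">XQuery Kick Start</title><author>James McGovern</author><author>Per Bothner</author><author>Kurt Cagle</author><author>James Linn</author><author>Vaidyanathan Nagarajan</author><year>2003</year><price>49.99</price></book>
<book category="WEB"><title lang="en">Learning XML</title><author>Erik T. Ray</author><year>2003</year><price>39.95</price></book>
</bookstore>"""

root = ET.fromstring(BOOKSTORE_XML)  # treat <bookstore> as the current node

books = root.findall("book")       # child::book           (abbreviated: book)
elements = root.findall("*")       # child::*              (abbreviated: *)
prices = root.findall("*/price")   # child::*/child::price (abbreviated: */price)

# child::node() would also return text nodes; ElementTree instead stores the
# newline before the first <book> as a plain string in root.text.
print(len(books), [e.tag for e in elements], [p.text for p in prices], repr(root.text))
```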