Scripting Lecture notes -- XML Schemas
Schemas Vs DTDs
- DTDs are good for specifying the structure of an XML document. They
are best suited to files that are primarily text.
- Schemas are good for specifying the organization of XML documents that
contain a great deal of specifically typed data
- Schemas are very verbose and seem to be oriented toward
business programmers. Types can be restricted using regular
expressions, or using very Cobol-like syntax. With the
Cobol-like syntax, it takes a large number of statements
to specify relatively simple types.
- Schemas can be used with a validator, like xmllint, to
typecheck a file, thus obviating the necessity of doing it
in your program
- Unlike DTD files, Schema files are themselves XML files
One drawback of Schemas is that they do not allow entities to be easily
declared, and entities can only be used as attribute values.
Understanding Schema Headers
Unlike DTDs, Schemas must always be external to the XML file. Both the
schema and the xml file start with fairly complicated looking syntax. Here's
a breakdown of what that syntax means:
- Here's a sample header from the books2.xsd
schema file:
Here's what each line means:
- The xml element is the standard one that starts all xml files
- The xs:schema element is the root element of the schema file. The
xs prefix indicates that the schema element belongs to the namespace
bound to the xs prefix
- xmlns:xs="http://www.w3.org/2001/XMLSchema": indicates
that the elements and data types used in the schema come from
the "http://www.w3.org/2001/XMLSchema" namespace.
- It also
specifies that the elements and data types that come from the
"http://www.w3.org/2001/XMLSchema" namespace should be prefixed
with xs.
- The namespace should be specified exactly as above since
schema processors recognize the XML Schema vocabulary by this
exact name
- targetNamespace="http://www.wiley.com": The namespace we
are creating. The elements we
are defining will be placed in this namespace. By convention, the
namespace name will be based on your company's URL. Often the
namespace name will actually be a valid URL that points to a document
that describes the namespace. For example,
"http://www.w3.org/2001/XMLSchema" points to a document that explains
the elements of an XML schema.
- xmlns="http://www.wiley.com": The name of the default
namespace. If an element is not prefixed with anything, it is
assumed to belong to this namespace
- elementFormDefault="qualified": Indicates that every element
declared in this schema, including locally declared ones, must be
namespace qualified in the XML file, although if the XML file declares
the namespace to be the default namespace, no prefix will be necessary
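Putting the attributes described above together, a schema header of this shape might look like the following. This is a sketch reconstructed from the descriptions in the list, not a verbatim copy of books2.xsd, and the XML declaration's version attribute is an assumption.

```xml
<?xml version="1.0"?>
<!-- Reconstructed sketch: attribute values are taken from the list above -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.wiley.com"
           xmlns="http://www.wiley.com"
           elementFormDefault="qualified">

  <!-- element and type declarations go here -->

</xs:schema>
```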
- Here's a sample header from the books2.xml
file:
Here's how each line is interpreted:
- The xml tag is the standard beginning of any xml file
- The books tag indicates that books is the root element
- xmlns="http://www.wiley.com": indicates that the
default namespace is "http://www.wiley.com".
- xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance":
Makes the XML Schema Instance namespace available. Once again
you must specify the namespace exactly as shown here, since the
validator recognizes the schemaLocation attribute by this exact
namespace name.
- xsi:schemaLocation="http://www.wiley.com books2.xsd":
With the xsi namespace declared, you can now use the schemaLocation
attribute to tell the validator where to find the schema file for
the "http://www.wiley.com" namespace. Note that the string holds
two values separated by whitespace: the name of the namespace and then
the URL for the XML schema to use for that namespace.
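Assembled from the attributes just described, the opening of an XML file that uses the schema might look like the sketch below. It is reconstructed from the descriptions above rather than copied from books2.xml, and the contents of the books element are left out.

```xml
<?xml version="1.0"?>
<!-- Reconstructed sketch: attribute values are taken from the list above -->
<books xmlns="http://www.wiley.com"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.wiley.com books2.xsd">

  <!-- child elements of books go here -->

</books>
```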
Defining Elements
Elements are defined using content models. A content model defines the
type of content that can be contained in an element.
The four content models for XML schema elements are:
- text: contains text only, but the text may be typed.
- element: contains child elements
- mixed: contains elements and text
- empty: contains no content
An element with the first type of model is called a simple element.
An element with any other type of model, or with both text and attributes,
is called a complex element.
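To make the four content models concrete, here is a small sketch with one declaration per model. All element names (price, author, summary, emphasis, pagebreak) are hypothetical and chosen only for illustration; the schema omits a target namespace to stay short.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- text: a simple element holding typed text -->
  <xs:element name="price" type="xs:decimal"/>

  <!-- element: child elements only -->
  <xs:element name="author">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="first" type="xs:string"/>
        <xs:element name="last" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <!-- mixed: text interleaved with child elements -->
  <xs:element name="summary">
    <xs:complexType mixed="true">
      <xs:sequence>
        <xs:element name="emphasis" type="xs:string"
                    minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <!-- empty: no content at all -->
  <xs:element name="pagebreak">
    <xs:complexType/>
  </xs:element>

</xs:schema>
```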
XML Types
XML Schema defines a type hierarchy that starts with the root type anyType.
From this root type are derived complex types and simple types. Simple types
are textual data that are constrained in some way, such as to be a boolean
value or an integer. XML Schema provides a wide variety of simple types, which
are described further in the next section. Complex types allow you to specify
aggregate types that consist of sub-elements, as well as elements that
contain attributes.
Simple Types
Simple elements can be associated with a simple type as follows:
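A declaration of this kind might look like the following sketch. The element names (title, pages, published) are hypothetical; the types are built-in XML Schema simple types.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- each simple element pairs a name with a built-in simple type -->
  <xs:element name="title" type="xs:string"/>
  <xs:element name="pages" type="xs:positiveInteger"/>
  <xs:element name="published" type="xs:date"/>
</xs:schema>
```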
XML Schema defines a large number of built-in simple types; the W3C
XML Schema Part 2: Datatypes recommendation lists them all.
Before deriving your own type, check the built-in types first. For example,
there is a standard date type, so you should use that one in preference to a
custom one.
You can also derive types by using restrictions or extensions and introducing
the new type with the element xs:simpleType. The constraints
applied in a restriction are called facets. You almost always will use
restrictions when deriving types. Extensions are typically used in the
context of associating attributes with simple types. Occasionally you might
use an extension to add children elements to a
complex type. (An extension cannot add values to a previously enumerated
string type; a union type is needed for that.) An example of using
extensions to add an attribute to
a simple type is shown later in the notes.
Here is an example that limits a string
to one of two enumerated values:
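As a sketch of such a restriction, the named type below accepts only two strings. The type name coverType, the element name cover, and the two values are hypothetical stand-ins, not taken from books2.xsd.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- named type: a string limited to one of two enumerated values -->
  <xs:simpleType name="coverType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="hardback"/>
      <xs:enumeration value="paperback"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- because the type is named, any number of elements can use it -->
  <xs:element name="cover" type="coverType"/>

</xs:schema>
```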
Notice that you can name a type, so that it can be used in multiple places.
You can create either anonymous, "inline" types, or external, named types.
Here is an example of an inline, anonymous type:
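An inline type is written directly inside the element declaration and has no name attribute, so it cannot be reused elsewhere. The sketch below assumes a hypothetical age element limited to the range 0 to 150; the bounds are illustrative.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:element name="age">
    <!-- anonymous type: note the missing name attribute on xs:simpleType -->
    <xs:simpleType>
      <xs:restriction base="xs:integer">
        <xs:minInclusive value="0"/>
        <xs:maxInclusive value="150"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>

</xs:schema>
```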
Here are some of the common restrictions that one employs on integers,
floats, decimals and strings. Decimals are arbitrary precision numbers, while
floats are supposed to conform to IEEE's floating point standard; decimal
numbers are comparable to Cobol's packed decimal numbers.
Restriction | Explanation | string | integer | float | decimal |
length | String must be exactly this number of chars | x | | | |
minLength | Minimum number of chars in string | x | | | |
maxLength | Maximum number of chars in string | x | | | |
pattern | Perl-style regular expression | x | x | x | x |
enumeration | Constrains the value space to a specified set of values | x | x | x | x |
minInclusive | Minimum possible value for a number, including the specified number | | x | x | x |
maxInclusive | Maximum possible value for a number, including the specified number | | x | x | x |
minExclusive | Minimum possible value for a number, excluding the specified number | | x | x | x |
maxExclusive | Maximum possible value for a number, excluding the specified number | | x | x | x |
totalDigits | Maximum number of digits in the number | | x | | x |
fractionDigits | Maximum number of fractional digits in the number | | | | x |
A complete list of restrictions can be found
here
in Section 10 (Constraining Facets) and a table showing which restrictions
can be used with which simple types can be found in Section 11. Some more
examples can be found in books2.xsd.
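The sketch below shows a few of these facets in use. The type names (isbn10Type, priceType, stateCodeType) and the specific limits are hypothetical and are not taken from books2.xsd.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- pattern: nine digits followed by a digit or the letter X -->
  <xs:simpleType name="isbn10Type">
    <xs:restriction base="xs:string">
      <xs:pattern value="[0-9]{9}[0-9X]"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- numeric facets: greater than 0, at most 6 digits, at most 2 after the point -->
  <xs:simpleType name="priceType">
    <xs:restriction base="xs:decimal">
      <xs:minExclusive value="0"/>
      <xs:totalDigits value="6"/>
      <xs:fractionDigits value="2"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- length: exactly two characters -->
  <xs:simpleType name="stateCodeType">
    <xs:restriction base="xs:string">
      <xs:length value="2"/>
    </xs:restriction>
  </xs:simpleType>

</xs:schema>
```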
Attributes
Attributes are introduced just like elements, with a name and a type.
For example:
An attribute declaration may itself carry one of three attributes, which
determine whether a value must be supplied and what value is used otherwise:
- default="value": optional attribute that will be given the
default value if the user does not provide one
- fixed="value": a fixed value that is given to the attribute
and that cannot be changed by the user.
- use="required": an attribute whose value must be provided
by the user.
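The three options can be sketched as follows. Attribute declarations live inside a complex type, so the example wraps them in a hypothetical book element; the attribute names and values are likewise illustrative.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
      </xs:sequence>
      <!-- must be supplied in every book element -->
      <xs:attribute name="isbn" type="xs:string" use="required"/>
      <!-- optional; treated as "en" when omitted -->
      <xs:attribute name="language" type="xs:string" default="en"/>
      <!-- optional; if present, the value must be exactly "print" -->
      <xs:attribute name="format" type="xs:string" fixed="print"/>
    </xs:complexType>
  </xs:element>

</xs:schema>
```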
Complex Types
Complex types are divided into two groups: those with simple content and
those with complex content. Simple content is used when you want to have
an element with a simple type and attributes.
Complex content is used
when you want your element to have child elements. Of course complex content
is also allowed to have attributes.
xs:simpleContent
While you typically use restrictions to derive new simple types, you
typically use extensions to derive new types based on simpleContent.
The reason is that you use an extension to add attributes to simple
types. Here is an example where I take the age that I defined earlier
and associate it with an element named antique that has an attribute
named quality:
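A declaration along these lines might look like the sketch below. The definition of the age type is a stand-in assumption (a non-negative integer), included only so the fragment is self-contained, and the type of the quality attribute is assumed to be a plain string.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- assumed stand-in for the age type referred to in the text -->
  <xs:simpleType name="age">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="0"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:element name="antique">
    <xs:complexType>
      <xs:simpleContent>
        <!-- the extension keeps age as the text content and adds an attribute -->
        <xs:extension base="age">
          <xs:attribute name="quality" type="xs:string"/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

An element such as <antique quality="good">120</antique> would conform to this declaration.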
Note how the type is introduced with the tag xs:complexType.
xs:complexContent
If I want my type to contain children elements, then I need to use
xs:complexContent.
This tag allows me to use one of
three compositor elements to specify the structure of the element:
- sequence: indicates that the elements must occur in the specified order
- choice: indicates that any one of the elements may occur
- all: indicates that the elements may occur in any order, each at most
once; an element may be left out only if it is declared with minOccurs="0"
Within each of these compositor tags, you list the elements that may occur.
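For brevity, you may omit the xs:complexContent tag when it appears
inside a xs:complexType tag, because content that appears inside
such a tag is assumed by default to be complexContent. (When
xs:complexContent is written out, the compositor goes inside its
xs:restriction or xs:extension child.)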
For example:
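The following sketch uses the abbreviated form, with a sequence compositor placed directly inside xs:complexType and each child declared by a name/type pair. The element names are hypothetical.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:element name="book">
    <xs:complexType>
      <!-- children must appear in exactly this order -->
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="author" type="xs:string"/>
        <xs:element name="price" type="xs:decimal"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
```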
or
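A declaration can also point at elements that are declared globally, as direct children of xs:schema, instead of declaring them in place. The sketch below assumes hypothetical title and author elements.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- global declarations that other declarations can refer to -->
  <xs:element name="title" type="xs:string"/>
  <xs:element name="author" type="xs:string"/>

  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <!-- ref reuses the global declarations above -->
        <xs:element ref="title"/>
        <xs:element ref="author"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
```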
Notice the use of ref rather than a name/type pair.
The ref attribute must have a value equal to the name of
an element that is declared globally (as a direct child of xs:schema),
either earlier or later in the document
(i.e., the element's name attribute must be the same as the ref
attribute's value).
Additionally, you can specify the number of occurrences of an element or
whether mixed content is allowed with the following attributes:
- minOccurs, maxOccurs: the minimum or maximum number of times an element
may occur. Use "unbounded" for an unbounded number of times. If you
want something to occur an exact number of times, you must use minOccurs
and maxOccurs, providing both with the same value. For example:
- mixed="true": specifies that both children elements and text are allowed.
For example, if I want to define a question type in which I could
embed a points element as well as a statement of the problem, I might
write:
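Both attributes can be sketched in one small schema. The question type follows the description above; the quiz and grader elements, and the requirement of exactly two graders, are hypothetical additions that illustrate the occurrence attributes.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- mixed="true": the problem statement is free text around the points element -->
  <xs:complexType name="questionType" mixed="true">
    <xs:sequence>
      <xs:element name="points" type="xs:positiveInteger"/>
    </xs:sequence>
  </xs:complexType>

  <xs:element name="quiz">
    <xs:complexType>
      <xs:sequence>
        <!-- one or more questions -->
        <xs:element name="question" type="questionType"
                    minOccurs="1" maxOccurs="unbounded"/>
        <!-- exactly two graders: minOccurs and maxOccurs get the same value -->
        <xs:element name="grader" type="xs:string"
                    minOccurs="2" maxOccurs="2"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

A question such as <question>What is 2 + 2? <points>5</points></question> would be accepted by this type.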
If I want to include attributes with complex types, they must be listed
after the compositor element. For example:
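The ordering can be sketched with a hypothetical contact element: the choice compositor comes first and the attribute declaration follows it. Placing the xs:attribute line before xs:choice would make the schema invalid.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:element name="contact">
    <xs:complexType>
      <!-- compositor first: exactly one of email or phone -->
      <xs:choice>
        <xs:element name="email" type="xs:string"/>
        <xs:element name="phone" type="xs:string"/>
      </xs:choice>
      <!-- attribute declarations come after the compositor -->
      <xs:attribute name="preferred" type="xs:boolean" default="false"/>
    </xs:complexType>
  </xs:element>

</xs:schema>
```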
Empty Elements
Empty elements may be specified with or without attributes, but they
require complexTypes. Here are two,
one with and one without attributes:
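A minimal sketch of both cases, using the hypothetical element names image and linebreak:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- empty element with an attribute, e.g. <image src="cover.png"/> -->
  <xs:element name="image">
    <xs:complexType>
      <xs:attribute name="src" type="xs:anyURI" use="required"/>
    </xs:complexType>
  </xs:element>

  <!-- empty element without attributes, e.g. <linebreak/> -->
  <xs:element name="linebreak">
    <xs:complexType/>
  </xs:element>

</xs:schema>
```

In both cases the complex type declares no compositor and no simple content, which is what makes the element empty.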