This homework is designed to give you practice:

  1. creating DTD files,
  2. creating Schema files, and
  3. designing XML files.


Helpful Hints

  1. Here is a quick reference to XML schema data types and facets.

  2. xmllint offers an alternative to a browser for checking whether the DTDs and Schemas that you develop for this assignment work. The format of the command is:
    	xmllint -dtdvalid DTDfile xmlfile
    	   or
    	xmllint -schema Schemafile xmlfile
    	
    xmllint is installed on the CS department's Linux workstations.

  3. If you want to play around with xmllint on example XML, schema, and DTD files, you can go to ~bvz/cs594/hw/hw6 and copy the files family.xml, family.dtd, and family.xsd to your directory. The family.xml file is set up to work with a schema. You will need to remove the schema declarations from the file to make it work with a DTD.
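As a rough illustration of what "schema declarations" usually look like, here is a minimal sketch of a schema-bound XML file. The contents are hypothetical: the child elements and values are invented, and the real family.xml may declare its schema differently (for example, with a namespace and xsi:schemaLocation).

```xml
<?xml version="1.0"?>
<!-- Hypothetical instance file; element names and values are made up. -->
<!-- The two attributes on the root element are the schema declarations: -->
<!--   xmlns:xsi binds the xsi prefix to the XML Schema instance namespace -->
<!--   xsi:noNamespaceSchemaLocation names the schema file to validate against -->
<family xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="family.xsd">
  <person>
    <name>Pat</name>
  </person>
</family>
```

A DTD treats xmlns:xsi and xsi:noNamespaceSchemaLocation as ordinary attributes, so a DTD that does not declare them will reject the root element. Deleting both attributes, leaving a bare <family> start tag, is typically all that is needed before validating against the DTD.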


Problems

  1. (20 points) Design a Schema file named cdcatalog.xsd for the following XML specification:

    <catalog>
      <artist>*
        <retired>: boolean
        <country>: string
        <name>
          <first>: string
          <last>: string
        </name>
        <cd>*
          <title>: string
          <company>: string
          <price>: decimal number with up to 4 total digits and up to 2 decimal digits (i.e., dd.dd). Example numbers might be 5.2, 10.90, 3.31, .97, 0.97, 3, 3., etc. The minimum value is .01 and the maximum value is 99.99, which you can specify as being between 0 and 100, exclusive.
          <yearReleased>: an integer between 1900 and 2006, inclusive
          <dateAcquired>: a date in the form mm/dd/yy

    cdcatalog contains an example XML file that satisfies this schema.
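The kinds of restrictions used in this specification (digit limits, numeric ranges, and fixed text formats) are expressed in XML Schema as facets on a simple type. The sketch below shows the general technique on an unrelated, hypothetical inventory document; every element name, type name, and limit in it is invented for illustration and is not part of the assignment.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical type: a decimal with at most 5 digits in total and at
       most 3 after the decimal point, strictly between 0 and 50 -->
  <xs:simpleType name="weightType">
    <xs:restriction base="xs:decimal">
      <xs:totalDigits value="5"/>
      <xs:fractionDigits value="3"/>
      <xs:minExclusive value="0"/>
      <xs:maxExclusive value="50"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Hypothetical type: an integer in an inclusive range -->
  <xs:simpleType name="shelfType">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="1"/>
      <xs:maxInclusive value="40"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Hypothetical type: a string that must match a regular expression,
       here a part code such as AB-1234 -->
  <xs:simpleType name="codeType">
    <xs:restriction base="xs:string">
      <xs:pattern value="[A-Z]{2}-\d{4}"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Root element: zero or more items, each using the types above -->
  <xs:element name="inventory">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="item" minOccurs="0" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="code" type="codeType"/>
              <xs:element name="weight" type="weightType"/>
              <xs:element name="shelf" type="shelfType"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

A pattern facet must match the entire value, so no start or end anchors are needed. Defining each restriction as a named simple type keeps the element declarations short and lets the same type be reused.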

  2. Consider the following XML specification:

    <payroll>
      <employee>*
        <name>
          <first>: string of up to 15 alphabetical characters, spaces, or hyphens (-). The string must start with an upper-case letter.
          <middle>?: upper-case letter
          <last>: string of up to 15 alphabetical characters, spaces, or hyphens (-). The string must start with an upper-case letter.
        <spouse>? -- name is same type as employee name
          <first>
          <middle>?
          <last>
        <child>* -- name is same type as employee name
          <first>
          <middle>?
          <last>
        <tax-status> married | single | headOfHousehold | separated
        <ssn>: A nine-digit number of the form ddd-dd-dddd (e.g., 865-57-2934)
          ssn attribute: name=type; values=(assigned | original); default="original"
        <salary>: A 9-digit number of the form ddddddd.dd with a minimum value of 0 and a maximum value of 2000000, inclusive.
        <date-of-birth>: A date type
        <manager> | <staff>
          <manager>
            attribute for manager: name=title; value=string; required
            <group>: string
            <yrsAtRank>: an integer between 0 and 50, inclusive
          <staff>
            <skill>+: up to 5 skills, each being a string

    payroll contains an example XML file that satisfies this specification.

    1. (20 points) Design a DTD file named employee.dtd for the above XML file. Use character data for all the simple elements (elements with data content as opposed to sub-elements) and do not worry about trying to limit skills to 5 or fewer. Instead require there to be at least one skill.
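For reference, the DTD constructs this problem calls for (occurrence indicators, alternatives, character data, and attributes with defaults) look like the following. This is a minimal sketch on an unrelated, hypothetical library document with the DTD written as an internal subset so the file is self-contained; none of the names come from the assignment.

```xml
<?xml version="1.0"?>
<!DOCTYPE library [
  <!-- * means zero or more, + one or more, ? optional, | a choice -->
  <!ELEMENT library (book*)>
  <!ELEMENT book (title, subtitle?, author+, (hardcover | paperback))>

  <!-- Simple elements hold parsed character data -->
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT subtitle (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
  <!ELEMENT hardcover (#PCDATA)>
  <!ELEMENT paperback (#PCDATA)>

  <!-- A required string attribute, and an enumerated one with a default -->
  <!ATTLIST book isbn CDATA #REQUIRED>
  <!ATTLIST title lang (en | fr) "en">
]>
<library>
  <book isbn="0-000-00000-0">
    <title>An Example Title</title>
    <author>A. Writer</author>
    <paperback>12.50</paperback>
  </book>
</library>
```

A file with an internal subset like this can be checked with xmllint --valid --noout. To use the declarations as a standalone DTD file instead, save only the lines between the square brackets.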

    2. (30 points) Design a schema file named employee.xsd for the above XML file.
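Several parts of this specification map onto schema features beyond simple facets: a structure that is defined once and reused, a fixed list of allowed values, an element with both text and an attribute, a choice between two elements, and a cap on repetitions. The sketch below shows one way to write each of these on an unrelated, hypothetical order document; all names and limits are invented, and other designs are equally reasonable.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- A named complex type, defined once and reused by several elements -->
  <xs:complexType name="addressType">
    <xs:sequence>
      <xs:element name="street" type="xs:string"/>
      <xs:element name="unit" type="xs:string" minOccurs="0"/>
      <xs:element name="city" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>

  <!-- A fixed list of allowed values -->
  <xs:simpleType name="sizeType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="small"/>
      <xs:enumeration value="large"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Text content plus an enumerated attribute with a default value -->
  <xs:complexType name="trackingType">
    <xs:simpleContent>
      <xs:extension base="xs:string">
        <xs:attribute name="carrier" default="ground">
          <xs:simpleType>
            <xs:restriction base="xs:string">
              <xs:enumeration value="ground"/>
              <xs:enumeration value="air"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:attribute>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>

  <xs:element name="order">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="billTo" type="addressType"/>
        <xs:element name="shipTo" type="addressType" minOccurs="0"/>
        <xs:element name="size" type="sizeType"/>
        <xs:element name="tracking" type="trackingType"/>
        <!-- Exactly one of the two alternatives must appear -->
        <xs:choice>
          <xs:element name="pickup" type="xs:string"/>
          <xs:element name="delivery">
            <xs:complexType>
              <xs:sequence>
                <!-- One to three notes -->
                <xs:element name="note" type="xs:string" maxOccurs="3"/>
              </xs:sequence>
              <xs:attribute name="window" type="xs:string" use="required"/>
            </xs:complexType>
          </xs:element>
        </xs:choice>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Because minOccurs defaults to 1, setting only maxOccurs="3" on note gives "at least one, at most three". An attribute can be added to a text-only element only through simpleContent with an extension, as in trackingType.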

  3. Behold the following CS140 exam that I gave a number of years ago. I want you to design a set of tags and an XML hierarchy that can be used to encode the exam instructions, questions, and answers (I know that the answers are not shown, but pretend there is an answer associated with each question). Place your answer in a file named exam.txt. I have provided a few hints at the end of this question to help guide you in the right direction, but the problem is deliberately a bit vague so that you are forced to exercise some judgement, just as you would have to do in the real world.

    Your solution should show me the hierarchy of element tags you will use. Place an asterisk (*), plus (+), or question mark (?) next to an element if it can occur 0 or more times, 1 or more times, or 0 or 1 times, respectively. If you require one of two elements, you can write <element1> | <element2> and use the following syntax:

    <element1> | <element2>
      <element1>
        ...
      <element2>
        ...

    Hints:

    1. Formatting, such as horizontal rules and question numbers/letters, should not be part of your structure.
    2. Only use attributes if you want points deducted.
    3. Use the format shown in problem 1 to write your XML specification.
    4. Do not worry about the types of the data contained in the elements. If it makes you happy, assume they are parsed character data. I only care about the way you structure your data.
    5. Things you might want to think about:

      • What appear to be the major parts of the exam? The major parts might serve as tags.
      • Do the questions seem to fall into certain types of categories? The categories might serve as tags.
      • Do the questions seem to share some parts in common? How could you cleanly handle the fact that questions share some parts in common and have some differences?


What to Submit

Use the same submission script you have used thus far this semester to submit the following files:

  1. cdcatalog.xsd
  2. employee.dtd
  3. employee.xsd
  4. exam.txt