Homework 2


Announcements

  1. 1/30: A sample executable can be found in ~bvz/cs461/hw/hw2/graph.
  2. 1/30: Make sure you have test files that include error tokens and labels that contain physical newlines, as opposed to a \n escape sequence inside the string. For example, this sample file has both error tokens and a label with a physical newline:
    node1 ??? "brad\nwent
    to the store"
    

This homework is designed to give you practice with writing a lexical scanner. You will use flex to generate a scanner for a graph language that allows a user to programmatically specify a directed graph. In the next few assignments, you will be writing a simple translator that translates this language to dot format, which is a language accepted by several graph generating tools, including dot and graphviz.

The following section shows a couple of sample graph programs. It is followed by a specification of the grammar and a description of your assignment.


Sample Graph Programs

Here is a sample program that generates the prerequisite graph for the new CS program that starts Fall Semester 2011:

direction = vertical;

nodestyle required [color = green, shape = box] CS102 CS140 CS160
      CS302 CS311 CS312 CS360 CS361 CS365 CS400;
nodestyle math [color = orange, shape = ellipse] MATH141 MATH142
      MATH241 MATH251 ECE313;
nodestyle elective [color = blue, shape = ellipse] CS340 CS370 CS425
      CS456 CS440 CS461 CS462 CS465 CS471 CS472 CS480;

edgestyle mathPrereq [color = orange, fontsize = 10];

CS102 = "CS102\nC++";
CS140 = "CS140\nC++";
CS160 = "CS160\nC";
CS302 = "CS302\nC++";
CS360 = "CS360\nC";
CS361 = "CS361\nC";
CS365 = "CS365\nJava";
CS370 = "CS370\nPython/Matlab";
CS465 = "CS465\nScripting,SQL";

CS102 -> CS140, CS160;
CS140 -> CS302, CS311, CS370;
CS160 -> CS360;
CS302 -> CS340, CS360, CS365, CS425, CS456, CS461;
CS311 -> CS312, CS440, CS465;
CS312 -> CS480;
CS360 -> CS361, CS400;
CS361 -> CS462;
CS365 -> CS465;
CS370 -> CS471, CS472;
MATH251 -> CS370 mathPrereq;
ECE313 -> CS425 mathPrereq;
MATH141 -> MATH142;
MATH142 -> MATH241, MATH251, CS311 mathPrereq, ECE313;

A few notes about this example:

  1. The direction directive tells the graph layout algorithm to lay out the graph vertically.
  2. The nodestyle directive tells the graph layout algorithm to draw the ensuing nodes with that particular style.
  3. Node names can only be single words that start with an upper/lowercase letter. By default, the node's label will be its node name. However, an optional string label can be supplied for the node's label using the '=' operator.
  4. The edgestyle directive declares an edge style that can be applied to edges that are defined later in the graph specification.
  5. The -> operator specifies a directed edge from the first node to the second node. The three edges that are followed by mathPrereq will be drawn in the mathPrereq edge style.
  6. The blank lines between sections are optional.

Finite State Machine

Here is another graph specification. It defines a simple finite state machine that recognizes a division operator (/), a block C comment (/* ... */), or a single-line C comment (// ...). The start state should be on the left side of the graph, but graphviz, the program I used to lay out the graph, breaks cycles arbitrarily. You could add a directive to graphviz to force the start state to be on the left, but I have not put such a directive into the graph language you will be specifying, so the start state ends up in the wrong position:

direction = horizontal;

nodestyle startState [color = orange] start;
nodestyle finalState [shape = doublecircle] state4 state1 state6;

state1 = "/";
state2 = "/*";
state3 = "/* . *";
state4 = "/* . */";
state5 = "//";
state6 = "// . \n";

start -> state1 "/";
state1 -> state2 "*",
          state5 "/",
          start "not * or /";
state2 -> state3 "*",
          state2 "not *";
state3 -> state3 "*",
          state4 "/",
          state2 "not * or /";
state5 -> state6 "\n",
          state5 ".";

As you might have guessed, the direction directive tells the graph layout algorithm to lay out the graph horizontally, and the string next to each directed edge is the label for that edge. I put the individual elements of the adjacency lists for states 1, 2, 3, and 5 on separate lines for readability's sake. Each list could have been placed on a single line, but it would have been less readable.

The Graph Grammar

Here is the grammar that formally specifies the graph specification language:

graph => direction? nodeStyleList edgeStyleList nodeLabelList adjacencyList
direction => direction = vertical ;
          |  direction = horizontal ;
nodeStyleList => (nodeStyle ;)*
edgeStyleList => (edgeStyle ;)*
nodeLabelList => (nodeLabel ;)*
nodeStyle => nodestyle STYLE_NAME? [ attributeList ] nodeList
edgeStyle => edgestyle STYLE_NAME [ attributeList ]

attributeList => attribute (, attribute)*
attribute => color = PROPERTY_NAME
          |  shape = PROPERTY_NAME
          |  fontname = PROPERTY_NAME 
          |  fontsize = NUMBER
nodeList => NODE_NAME (, NODE_NAME)*
nodeLabel => NODE_NAME = "NODE_LABEL"
adjacencyList => NODE_NAME -> NODE_NAME "EDGE_LABEL"? STYLE_NAME?
                          (, NODE_NAME "EDGE_LABEL"? STYLE_NAME?)*

Here are a few notes to help you interpret the grammar:

  1. All boldfaced names are terminals (tokens).
  2. All lowercase names are non-terminals.
  3. All operator and punctuation symbols are enclosed in quotes ('').
  4. All keywords are boldfaced and lowercase.
  5. All tokens that have a lexeme associated with them are boldfaced and uppercase.
  6. A NODE_NAME, STYLE_NAME, and PROPERTY_NAME are each a single word: any string that starts with a lower/uppercase alphabetic letter and is followed by one or more lowercase letters, uppercase letters, digits, or '_' (i.e., C identifiers). Note that prohibiting a node name from starting with a number or using an operator symbol does not preclude the node's label from starting with a number or using an operator symbol.
  7. Your scanner cannot distinguish between node names, style names, and property names, so just declare a name token for these three types of names in your scanner. In a later assignment, you will write a parser which will be able to decide whether a name token should be a node name, style name, or property name.
  8. The style name really is optional for a nodestyle, but not for an edgestyle.
  9. A NODE_LABEL and an EDGE_LABEL may be multiple words. You will probably need to use a state to recognize a NODE_LABEL and an EDGE_LABEL. Both of them start with a '"', so you can use a '"' to start your state. Note that from a user's point of view, it seems unnecessary to put quotes around a node label. However, if a node label did not start with a '"', it would be difficult for your scanner to distinguish between a one-word property name and a multiple-word node label.
  10. Your scanner cannot distinguish between a node label and an edge label, so just define a label token. In a later assignment, you will write a parser which will be able to decide whether a label token should be a node label or an edge label.
  11. A NUMBER is any positive integer that has no leading 0.
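
Notes 6, 7, and 11 can be summarized as a small word classifier. The following is a minimal Python sketch, not a flex specification, that may be handy for working out the expected output of a test file by hand. The names KEYWORDS, NAME_RE, NUMBER_RE, and classify are hypothetical, the keyword set is inferred from the lowercase terminals in the grammar, and note 6 is read literally (a letter followed by one or more further characters).

```python
import re

# Reserved words inferred from the grammar above (an assumption of this sketch).
KEYWORDS = {
    "direction", "vertical", "horizontal", "nodestyle", "edgestyle",
    "color", "shape", "fontname", "fontsize",
}

# Note 6, read literally: a letter, then one or more letters, digits, or '_'.
NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]+")
# Note 11: a positive integer that does not begin with 0.
NUMBER_RE = re.compile(r"[1-9][0-9]*")


def classify(word: str) -> str:
    """Return 'KEYWORD', 'NAME', 'NUMBER', or 'ERROR' for one whole word."""
    if word in KEYWORDS:  # reserved words take priority over names
        return "KEYWORD"
    if NAME_RE.fullmatch(word):
        return "NAME"
    if NUMBER_RE.fullmatch(word):
        return "NUMBER"
    return "ERROR"


if __name__ == "__main__":
    for w in ["nodestyle", "CS102", "mathPrereq", "10", "012"]:
        print(w, classify(w))
```

This sketch judges one whole whitespace-delimited word at a time. A generated scanner instead matches the longest prefix of the remaining input, so it may split a word that this function simply reports as ERROR.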

Specifications for this Assignment

You should use flex to generate a lexical scanner for the above grammar. For this assignment your scanner should run independently of the parser, so you will need to write a main procedure that calls yylex and writes out the tokens as they are encountered. Your scanner should simply echo the tokens to the output, one per line. It is okay to echo operator symbols (e.g., ->, =), punctuation marks (e.g., ; and ,), and reserved keywords (e.g., color, nodestyle) as they appear. For tokens that have lexeme values, I would like your program to echo both the token name and the lexeme value. Please use the following token names:

  1. NAME for any of NODE_NAME, STYLE_NAME, or PROPERTY_NAME
  2. LABEL for any of NODE_LABEL or EDGE_LABEL
  3. NUMBER for any number

Also please use the following format for token/lexeme pairs:

token: lexeme value

For example:

NAME: CS160
LABEL: CS160      // really "CS160\nC", but this is how it will appear on 
C                 // the output
NUMBER: 10
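
The LABEL example above suggests that the two-character sequence \n inside a label is written out as an actual newline. A minimal Python sketch of that conversion and of the token/lexeme format follows. The names unescape_label and format_token are hypothetical, and only \n is handled because no other escape sequence is described here.

```python
def unescape_label(raw: str) -> str:
    """Turn each backslash-n pair in a label's text into a real newline."""
    return raw.replace("\\n", "\n")


def format_token(token: str, lexeme: str) -> str:
    """Format a token/lexeme pair as 'token: lexeme'."""
    return f"{token}: {lexeme}"


if __name__ == "__main__":
    # The text between the quotes of "CS160\nC" (a backslash followed by n).
    print(format_token("LABEL", unescape_label(r"CS160\nC")))
    print(format_token("NUMBER", "10"))
```

A label that already contains a physical newline passes through unescape_label unchanged, so both kinds of newline end up looking the same on the output.
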

As a short example, given the graph specification:

direction = horizontal;

nodestyle startState [color = orange] start;

state1 = "/";
start -> state1 "brad";

your scanner should output:

direction
=
horizontal
;
nodestyle
NAME: startState
[
color
=
NAME: orange
]
NAME: start
;
NAME: state1
=
"
LABEL: /
"
;
NAME: start
->
NAME: state1
"
LABEL: brad
"
;
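
To check a listing like the one above against your own test files, a small reference scanner can help. The following Python sketch is only an illustration of the two-state idea from note 9 (inside a label versus outside a label); it is not the flex specification you are asked to write. The names KEYWORDS, TOKEN_RE, and scan are hypothetical, the keyword set is inferred from the grammar, and raising ValueError on unrecognized input is a placeholder because the expected output for error tokens is not described here.

```python
import re
from typing import List

# Reserved words inferred from the grammar (an assumption of this sketch).
KEYWORDS = {"direction", "vertical", "horizontal", "nodestyle", "edgestyle",
            "color", "shape", "fontname", "fontsize"}

# Outside a label: whitespace, operators and punctuation, words, numbers.
TOKEN_RE = re.compile(r"\s+|->|[=;,\[\]]|[A-Za-z][A-Za-z0-9_]+|[1-9][0-9]*")


def scan(text: str) -> List[str]:
    """Return the output lines for a graph specification."""
    out: List[str] = []
    pos = 0
    in_label = False  # True while between an opening and a closing quote
    while pos < len(text):
        if in_label:
            end = text.find('"', pos)
            if end == -1:
                raise ValueError("unterminated label")
            # Physical newlines are kept; backslash-n becomes a newline.
            out.append("LABEL: " + text[pos:end].replace("\\n", "\n"))
            out.append('"')
            pos, in_label = end + 1, False
        elif text[pos] == '"':
            out.append('"')
            pos, in_label = pos + 1, True
        else:
            m = TOKEN_RE.match(text, pos)
            if m is None:
                # Placeholder: error-token output is not described here.
                raise ValueError(f"unrecognized input at offset {pos}")
            word, pos = m.group(), m.end()
            if word.isspace():
                continue
            if word in KEYWORDS or not word[0].isalnum():
                out.append(word)  # keywords, operators, punctuation
            elif word[0].isdigit():
                out.append("NUMBER: " + word)
            else:
                out.append("NAME: " + word)
    if in_label:
        raise ValueError("unterminated label")
    return out


if __name__ == "__main__":
    print("\n".join(scan('state1 = "/";\nstart -> state1 "brad";\n')))
```

The single in_label flag plays the role that a scanner state would play: quotes switch it on and off, and everything between them is collected as one LABEL regardless of spaces or physical newlines.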

Submission Instructions

Put your lex specification in a file named graph.lex and submit it using the 461_submit script.