Homework 2

<head>
<link rel="stylesheet" type="text/css" href="../cs461_hw.css" />
</head>

<center>
<h1>Homework 2</h1>
</center>
<hr>
<h2> Announcements </h2>
<ol>
<li> 1/30: A sample executable can be found in ~bvz/cs461/hw/hw2/graph.
<li> 1/30: Make sure you have test files that include error tokens and labels
     that contain physical newlines, as opposed to the two-character
     <tt>\n</tt> escape inside the string. For example, this sample file has both
     error tokens and a label with a physical newline:
<pre>
node1 ??? "brad\nwent
to the store"
</pre>
</ol>
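<p>
The difference between the two kinds of newline is easy to get wrong when building a test file by hand. The following is a minimal Python sketch, not part of the assignment, that generates the sample input above. The function names and the idea of generating the file from a script are assumptions made for illustration.

```python
def make_test_input():
    # "\\n" is the two-character escape (a backslash followed by n) that
    # appears inside the label text; "\n" is a physical newline that splits
    # the label across two lines of the file.
    return 'node1 ??? "brad\\nwent\nto the store"\n'


def write_test_file(path):
    # Hypothetical helper: writes the sample input to the given path.
    with open(path, "w") as f:
        f.write(make_test_input())
```

Calling <tt>write_test_file</tt> with a file name of your choice produces a two-line file whose first line ends inside the quoted label.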
<hr>
This homework is designed to give you practice with writing a lexical
scanner. You will use flex to generate a scanner for a graph language that
allows a user to programmatically specify a directed graph. In the next
few assignments, you will be writing a simple translator that translates
this language to <tt>dot</tt> format, which is a language accepted by
several graph generating tools, including <tt>dot</tt> and <tt>graphviz</tt>.
<p>
The following section shows a couple of sample graph programs. It is
followed by a specification of the grammar and your assignment.
<hr>
<h2> Sample Graph Programs </h2>
<p>
Here is a sample program that generates the <a href="cspreq.jpg">pre-requisite graph</a> for
the new CS program that starts Fall Semester 2011:
<pre>
direction = vertical;

nodestyle required [color = green, shape = box] CS102 CS140 CS160
      CS302 CS311 CS312 CS360 CS361 CS365 CS400;
nodestyle math [color = orange, shape = ellipse] MATH141 MATH142
      MATH241 MATH251 ECE313;
nodestyle elective [color = blue, shape = ellipse] CS340 CS370 CS425
      CS456 CS440 CS461 CS462 CS465 CS471 CS472 CS480;

edgestyle mathPrereq [color = orange, fontsize = 10];

CS102 = "CS102\nC++";
CS140 = "CS140\nC++";
CS160 = "CS160\nC";
CS302 = "CS302\nC++";
CS360 = "CS360\nC";
CS361 = "CS361\nC";
CS365 = "CS365\nJava";
CS370 = "CS370\nPython/Matlab";
CS465 = "CS465\nScripting,SQL";

CS102 -> CS140, CS160;
CS140 -> CS302, CS311, CS370;
CS160 -> CS360;
CS302 -> CS340, CS360, CS365, CS425, CS456, CS461;
CS311 -> CS312, CS440, CS465;
CS312 -> CS480;
CS360 -> CS361, CS400;
CS361 -> CS462;
CS365 -> CS465;
CS370 -> CS471, CS472;
MATH251 -> CS370 mathPrereq;
ECE313 -> CS425 mathPrereq;
MATH141 -> MATH142;
MATH142 -> MATH241, MATH251, CS311 mathPrereq, ECE313;
</pre>
A few notes about this example:
<p>
<ol>
<li> The <b>direction</b> directive tells the graph layout algorithm to lay out
     the graph vertically.
<li> The <b>nodestyle</b> directive tells the graph layout algorithm to
     draw the ensuing nodes with that particular style.
<li> Node names can only be single words that start with an upper/lowercase
     letter. By default, the node's label will be its node name. However,
     an optional string label can be supplied for the node's label using
     the '=' operator.
<li> The <b>edgestyle</b> directive declares an edge style that can be
     applied to edges that are defined later in the graph specification.
<li> The <b>-></b> operator specifies a directed edge from the first node
     to the second node. The three edges followed by the mathPrereq style
     name will be drawn in the mathPrereq edge style.
<li> The blank lines between sections are optional. 
</ol>
<hr>
<h3> Finite State Machine </h3>
<p>
Here is another graph specification
that defines a simple <a href="fsm.jpg">finite state machine</a> which
recognizes a division operator (/), an open-ended C comment (/* ... */),
or a single-line C comment (// ...). The start state should be on the left
side of the graph, but graphviz, the program I used to lay out the graph,
breaks cycles arbitrarily. You could add an additional directive to graphviz
to force the start state to be on the left, but I have not put such a
directive into the graph language you will be specifying, and hence you are
left with the start state in the wrong position:
<pre>
direction = horizontal;

nodestyle startState [color = orange] start;
nodestyle finalState [shape = doublecircle] state4 state1 state6;

state1 = "/";
state2 = "/*";
state3 = "/* . *";
state4 = "/* . */";
state5 = "//";
state6 = "// . \n";

start -> state1 "/";
state1 -> state2 "*",
          state5 "/",
          start "not * or /";
state2 -> state3 "*",
          state2 "not *";
state3 -> state3 "*",
          state4 "/",
          state2 "not * or /";
state5 -> state6 "\n",
          state5 ".";
</pre>
As you might have guessed, the <b>direction</b> directive tells the graph
layout algorithm to lay out the graph horizontally, and the string next
to each directed edge is the label for that edge. I
put the individual elements of the
adjacency lists for states 1, 2, 3, and 5 on separate lines for
readability's sake. Each list could have been placed on a single line, but
it would have been less readable.
<hr>
<h2> The Graph Grammar </h2>
<p>
Here is the grammar that formally specifies the graph specification
language:
<pre>
graph => direction? nodeStyleList edgeStyleList nodeLabelList adjacencyList
direction => <b>direction = vertical</b> ;
          |  <b>direction = horizontal</b> ;
nodeStyleList => (nodeStyle ;)<sup>*</sup>
edgeStyleList => (edgeStyle ;)<sup>*</sup>
nodeLabelList => (nodeLabel ;)<sup>*</sup>
nodeStyle => <b>nodestyle</b> <b>STYLE_NAME</b>? <b>[</b> attributeList <b>]</b> nodeList
edgeStyle => <b>edgestyle</b> <b>STYLE_NAME</b> <b>[</b> attributeList <b>]</b>

attributeList => attribute (, attribute)<sup>*</sup>
attribute => <b>color = PROPERTY_NAME</b>
          |  <b>shape = PROPERTY_NAME</b>
          |  <b>fontname = PROPERTY_NAME </b>
          |  <b>fontsize = NUMBER</b>
nodeList => <b>NODE_NAME</b> (, <b>NODE_NAME</b>)<sup>*</sup>
nodeLabel => <b>NODE_NAME</b> = "<b>NODE_LABEL</b>"
adjacencyList => <b>NODE_NAME -> NODE_NAME "EDGE_LABEL"? STYLE_NAME?
                          (, NODE_NAME "EDGE_LABEL"? STYLE_NAME?)</b><sup>*</sup>
</pre>
Here are a few notes to help you interpret the grammar:
<p>
<ol>
<li> All boldfaced names are terminals (tokens).
<li> All lowercase names are non-terminals.
<li> All operator and punctuation symbols are enclosed in quotes ('').
<li> All keywords are boldfaced and lowercase.
<li> All tokens that have a lexeme associated with them are boldfaced and
     uppercase.
<li> A NODE_NAME, STYLE_NAME, or PROPERTY_NAME is a single word:
     any string that starts with
     a lowercase or uppercase alphabetic letter and is followed by zero or more lowercase letters,
     uppercase letters, digits, or '_' (i.e., C identifiers). Note that
     prohibiting a node name from starting with a number or using an
     operator symbol does not preclude the node's label from starting with a
     number or using an operator symbol.
<li> Your scanner cannot distinguish between node names, style names, and
     property names, so just declare a name token for these three types
     of names in your scanner. In a later
     assignment, you will write a parser which
     will be able to decide whether a name token should be
     a node name, style name, or property name.
<li> The style name really is optional for a nodestyle, but not for an
     edgestyle.
<li> A NODE_LABEL and an EDGE_LABEL may be multiple words. You will probably
     need to use a state to recognize a NODE_LABEL and an EDGE_LABEL. 
     Both of them start with a '"', so you can use a " to start your state.
     Note that from a user's point of view, it seems unnecessary to put
     quotes around a node label. However, if I did not start a node label with 
     a &quot; it would be difficult for
     your scanner to distinguish between a one word property name and a multiple
     word node label. 
<li> Your scanner cannot distinguish between a node label and an edge label,
     so just define a label token. In a later
     assignment, you will write a parser which
     will be able to decide whether a label token should be
     a node label or an edge label.
<li> A number is any positive integer with no leading 0.
</ol>
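<p>
The name and number rules above can be written down as regular expressions. The following is a minimal Python sketch using the standard <tt>re</tt> module, not the flex specification you will submit. It assumes that a single letter is a valid name (as it is for C identifiers) and that 0 by itself is not a number because it is not positive. The identifiers <tt>NAME_RE</tt>, <tt>NUMBER_RE</tt>, <tt>is_name</tt>, and <tt>is_number</tt> are hypothetical.

```python
import re

# Assumed pattern for NODE_NAME, STYLE_NAME, and PROPERTY_NAME:
# a letter followed by any number of letters, digits, or underscores.
NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")

# Assumed pattern for NUMBER: a positive integer with no leading 0.
NUMBER_RE = re.compile(r"[1-9][0-9]*")


def is_name(text):
    # True only if the whole string is a name.
    return NAME_RE.fullmatch(text) is not None


def is_number(text):
    # True only if the whole string is a number.
    return NUMBER_RE.fullmatch(text) is not None
```

Under these assumptions, <tt>CS160</tt> and <tt>state_1</tt> are names, while <tt>1abc</tt> is not, and <tt>10</tt> is a number, while <tt>010</tt> is not.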
<hr>
<h2> Specifications for this Assignment </h2>
<p>
You should use flex to generate a lexical scanner for the above grammar.
For this assignment your scanner should run independently of the parser,
so you will need to write a main procedure to call <tt>yylex</tt> and
you will need to write out the tokens as you encounter them. For this
assignment your scanner should simply echo the tokens to the output, one
per line. It is okay to echo the operator symbols (e.g., ->, =), punctuation
marks (e.g., ; and ,), and reserved keywords (e.g., color, nodestyle). 
For tokens that have lexeme values, I would like your program
to echo out both the token name and the lexeme value. Please use the following
token names:
<p>
<ol>
<li> NAME for any of NODE_NAME, STYLE_NAME, or PROPERTY_NAME
<li> LABEL for any of NODE_LABEL or EDGE_LABEL
<li> NUMBER for any number
</ol>
<p>
Also please use the following format for token/lexeme pairs:
<pre>
token: lexeme value
</pre>
For example:
<pre>
NAME: CS160
LABEL: CS160      // really "CS160\nC", but this is how it will appear on 
C                 // the output
NUMBER: 10
</pre>
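<p>
The LABEL line above implies that the two-character <tt>\n</tt> escape inside a label is written to the output as a real newline. The following is a minimal Python sketch of that formatting. The function names are hypothetical, and it assumes <tt>\n</tt> is the only escape that needs translating, which the examples suggest but do not state.

```python
def decode_label(raw):
    # Assumption: only the two-character escape \n (backslash, n) is
    # translated; every other character is copied unchanged.
    return raw.replace("\\n", "\n")


def format_token(token, lexeme):
    # Produces the "token: lexeme value" form requested above.
    return f"{token}: {lexeme}"
```

For the label text <tt>CS160\nC</tt>, <tt>format_token("LABEL", decode_label(...))</tt> yields <tt>LABEL: CS160</tt> followed by <tt>C</tt> on the next line, which is how it appears in the example above.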
As a short example, given the graph specification:
<pre>
direction = horizontal;

nodestyle startState [color = orange] start;

state1 = "/";
start -> state1 "brad";
</pre>
your scanner should output:
<pre>
direction
=
horizontal
;
nodestyle
NAME: startState
[
color
=
NAME: orange
]
NAME: start
;
NAME: state1
=
"
LABEL: /
"
;
NAME: start
->
NAME: state1
"
LABEL: brad
"
;
</pre>
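<p>
To check your understanding of the expected output, here is a rough Python sketch of a scanner for well-formed input. It is not a flex specification and not a reference solution. The keyword set is taken from the boldfaced lowercase terminals in the grammar; the patterns and the names <tt>KEYWORDS</tt>, <tt>TOKEN_SPECS</tt>, and <tt>scan</tt> are assumptions. Because the output format for error tokens is not given here, the sketch simply raises an exception on anything it does not recognize, including a label that contains a physical newline.

```python
import re

# Keywords assumed from the boldfaced lowercase terminals in the grammar.
KEYWORDS = {"direction", "vertical", "horizontal", "nodestyle", "edgestyle",
            "color", "shape", "fontname", "fontsize"}

# Patterns are tried in order at the current position.
TOKEN_SPECS = [
    ("ws", re.compile(r"\s+")),
    ("label", re.compile(r'"[^"\n]*"')),
    ("word", re.compile(r"[A-Za-z][A-Za-z0-9_]*")),
    ("number", re.compile(r"[1-9][0-9]*")),
    ("op", re.compile(r"->|[=;,\[\]]")),
]


def scan(text):
    """Return the output lines for a well-formed graph specification."""
    out = []
    pos = 0
    while pos != len(text):
        for kind, pattern in TOKEN_SPECS:
            m = pattern.match(text, pos)
            if m is not None:
                break
        else:
            # Error handling is left unspecified in this sketch.
            raise ValueError(f"unrecognized input at offset {pos}")
        pos = m.end()
        lexeme = m.group()
        if kind == "ws":
            continue
        if kind == "label":
            # Echo both quotes and translate the \n escape to a newline.
            body = lexeme[1:-1].replace("\\n", "\n")
            out.extend(['"', f"LABEL: {body}", '"'])
        elif kind == "word":
            out.append(lexeme if lexeme in KEYWORDS else f"NAME: {lexeme}")
        elif kind == "number":
            out.append(f"NUMBER: {lexeme}")
        else:
            out.append(lexeme)
    return out
```

Printing <tt>"\n".join(scan(spec))</tt> for the short graph specification above reproduces the output listed above, one token per line. In flex, the label case would more likely be handled with a start state, as suggested in the grammar notes.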
<hr>
<h2> Submission Instructions </h2>
<p>
Put your lex specification in a file named <b>graph.lex</b> and submit it
using the 461_submit script.