Homework 3 Solutions

Differences between LL and LR parsing
1. LR
2. LL
3. LL
4. LR
5. 1. LL = Left-to-right, Left-most derivation
  2. LR = Left-to-right, Right-most derivation
The "1" in LR(1) indicates that 1 token of look-ahead is required in order to parse. In other words, the parser is allowed to consult the next token in the input stream, as well as whatever is on the stack, in order to figure out whether to shift or to reduce.
It derives the empty string
A shift-reduce conflict occurs when a bottom-up parser cannot decide whether to shift the next token onto the stack or to replace the topmost symbols on the stack with the left hand side non-terminal of a production (a reduction). Two examples of shift-reduce conflicts arise with associativity and operator precedence in expression grammars. In both cases, the parser is unsure whether to shift an operator token onto the stack or to reduce the topmost symbols on the stack to an expression. Another source of shift-reduce conflicts is the "dangling else" problem, in which the parser is unsure whether to shift an "else" onto the stack, thus extending the current conditional, or to reduce what is on the stack to a conditional statement, thus closing the previous conditional.
Recursive descent parsers are excellent for:
1. command-based languages in which each line of the program starts with a unique command name, and
2. recognizing control constructs that are labeled with a unique keyword, such as for loops, while loops, if-then-else conditionals, and switch statements.
Two idioms in context-free grammars that cannot be parsed top-down are:
1. left-recursive grammars
2. grammars in which two productions associated with the same lefthand side non-terminal generate strings with a common prefix.
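
The first idiom can usually be rewritten so that a top-down parser can handle it. The following is a minimal sketch of the standard transformation for immediate left recursion (A -> A α | β becomes A -> β A_tail, A_tail -> α A_tail | ε). The function name, the "_tail" suffix, and the list-of-lists grammar encoding are assumptions made for illustration only.

```python
def eliminate_immediate_left_recursion(nonterminal, alternatives):
    """Rewrite A -> A a | b as A -> b A_tail, A_tail -> a A_tail | epsilon.

    `alternatives` is a list of right-hand sides, each a list of symbols.
    An empty list stands for epsilon.
    """
    # Right-hand sides that start with the non-terminal itself, minus that symbol.
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == nonterminal]
    # Right-hand sides that do not start with the non-terminal.
    others = [alt for alt in alternatives if not alt or alt[0] != nonterminal]
    if not recursive:
        return {nonterminal: alternatives}  # nothing to rewrite
    tail = nonterminal + "_tail"  # hypothetical name for the new non-terminal
    return {
        nonterminal: [alt + [tail] for alt in others],
        tail: [alt + [tail] for alt in recursive] + [[]],
    }


# expr -> expr + term | term
result = eliminate_immediate_left_recursion(
    "expr", [["expr", "+", "term"], ["term"]]
)
print(result)
```

The rewritten grammar generates the same strings, but every production for expr now begins with a symbol other than expr, so a recursive descent routine no longer calls itself without consuming input.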
1. A pushdown automaton (PDA) is the name of the machine generated by a bottom-up parser
2. A stack is the thing that augments the finite state machine generated by a scanner (i.e., a PDA is a finite state machine augmented with a stack).
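
To see why the stack matters, here is a small sketch of a recognizer for balanced parentheses, a language that no finite state machine can recognize because the nesting depth is unbounded. It is only an illustration of "state machine plus stack", not the table-driven PDA that a parser generator would build; the function name is hypothetical.

```python
def balanced(text):
    """Recognize balanced parentheses with a single state plus a stack."""
    stack = []
    for ch in text:
        if ch == "(":
            stack.append(ch)   # push: remember an unmatched "("
        elif ch == ")":
            if not stack:
                return False   # ")" with nothing left to match
            stack.pop()        # pop: this ")" closes the most recent "("
        else:
            return False       # any other character is rejected
    return not stack           # accept only if every "(" was closed


print(balanced("(()())"))
print(balanced("(()"))
```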

Scott 2.13

Parse tree

Rightmost derivation

stmt => subr_call
     => id ( arg_list )
     => id ( expr args_tail )
     => id ( expr , arg_list )
     => id ( expr , expr args_tail )
     => id ( expr , expr )   // application of args_tail -> epsilon
     => id ( expr , primary expr_tail )
     => id ( expr , primary )  // application of expr_tail -> epsilon
     => id ( expr , id )
     => id ( primary expr_tail , id )
     => id ( primary , id )   // application of expr_tail -> epsilon
     => id ( id , id )

The grammar is not LL(1) because of two problems:
1. the two productions "stmt -> assignment" and "stmt -> subr_call" both derive strings that begin with an id, so one token of look-ahead cannot choose between them.
2. the two productions "primary -> id" and "primary -> subr_call" both derive strings that begin with an id, so the same problem arises for primary.

We can factor both sets of productions to make the grammar LL(1). In the first case we work as follows:

Original grammar:

stmt -> assignment
     -> subr_call   

assignment -> id := expr
subr_call -> id ( arg_list )

Factor out the id that is common to assignment and subr_call and move it
up to stmt:

stmt -> id rest_of_statement
rest_of_statement -> assignment
                  -> subr_call
assignment -> := expr
subr_call -> ( arg_list )

Once we fix the production for subr_call, we have almost fixed the problem with primary. Here is how we work out the problem with primary:

Original grammar:

primary -> id
        -> subr_call
        -> ( expr )

subr_call -> id ( arg_list )

We need to factor out the id from subr_call and move it up to primary:

primary -> id rest_of_primary
        -> ( expr )

rest_of_primary -> epsilon
                -> subr_call

subr_call -> ( arg_list )

Notice that we had already modified subr_call in a similar fashion at the end of the first fix. Fortunately, the two modifications agree and we don't have to perform any further modifications.
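
As a sanity check, the factored grammar can be turned directly into a recursive descent recognizer that only ever looks at the next token. The sketch below is one way to do that under some assumptions: the productions for expr, expr_tail, arg_list, and args_tail are the ones used in the derivation above, the operator set (+, -, *, /) is assumed from the textbook exercise, and the token spellings, the Parser class, and the accepts helper are hypothetical names chosen for this example.

```python
OPS = {"+", "-", "*", "/"}  # assumed operator tokens


class ParseError(Exception):
    pass


class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$$"]  # "$$" marks end of input
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def match(self, expected):
        if self.peek() != expected:
            raise ParseError(f"expected {expected!r}, saw {self.peek()!r}")
        self.pos += 1

    def stmt(self):                    # stmt -> id rest_of_statement
        self.match("id")
        self.rest_of_statement()

    def rest_of_statement(self):       # -> assignment | subr_call
        if self.peek() == ":=":
            self.assignment()
        elif self.peek() == "(":
            self.subr_call()
        else:
            raise ParseError(f"unexpected {self.peek()!r}")

    def assignment(self):              # assignment -> := expr
        self.match(":=")
        self.expr()

    def subr_call(self):               # subr_call -> ( arg_list )
        self.match("(")
        self.arg_list()
        self.match(")")

    def expr(self):                    # expr -> primary expr_tail
        self.primary()
        self.expr_tail()

    def expr_tail(self):               # expr_tail -> op expr | epsilon
        if self.peek() in OPS:
            self.pos += 1
            self.expr()

    def primary(self):                 # primary -> id rest_of_primary | ( expr )
        if self.peek() == "id":
            self.match("id")
            self.rest_of_primary()
        elif self.peek() == "(":
            self.match("(")
            self.expr()
            self.match(")")
        else:
            raise ParseError(f"unexpected {self.peek()!r}")

    def rest_of_primary(self):         # rest_of_primary -> subr_call | epsilon
        if self.peek() == "(":
            self.subr_call()

    def arg_list(self):                # arg_list -> expr args_tail
        self.expr()
        self.args_tail()

    def args_tail(self):               # args_tail -> , arg_list | epsilon
        if self.peek() == ",":
            self.match(",")
            self.arg_list()


def accepts(tokens):
    """Return True if the token list is a complete stmt."""
    parser = Parser(tokens)
    try:
        parser.stmt()
        parser.match("$$")
        return True
    except ParseError:
        return False


print(accepts(["id", "(", "id", ",", "id", ")"]))   # the call from the derivation
print(accepts(["id", ":=", "id", "+", "id", "(", "id", ")"]))
```

Every routine chooses its production by inspecting a single token with peek, which is what the LL(1) property requires.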

Scott 2.18
1. 1. First(Es) = First(E) = { atom, ', ( }
  2. Follow(E) = { $$, atom, ', (, ) }: You compute the Follow set of a non-terminal by 1) finding all the productions in which it appears on the RHS, 2) for each such production, finding the symbol(s) that immediately follow the non-terminal, and 3) adding the First sets of these symbols to the Follow set of the non-terminal. There are a couple of tricky cases:
    1. If the non-terminal appears at the end of a RHS, such as in "E -> ' E", then you must also add the Follow set of the LHS non-terminal to the Follow set of the non-terminal.
    2. If the non-terminal is followed by a symbol that can generate an empty string, as in "E -> ( E Es )" or "Es -> E Es" (Es can generate an empty string), then you must also add the symbol's Follow set, in this case Follow(Es), to the Follow set of the non-terminal.
    In this problem, E appears in four productions: "P -> E $$", "Es -> E Es", "E -> ' E", and "E -> ( E Es )". We can apply our rules as follows:
    1. P -> E $$: $$ derives a string starting with $$, so we add $$ to Follow(E)
    2. Es -> E Es: We add First(Es) to Follow(E). The previous part of this problem showed that First(Es) is { atom, ', ( }, so we add these three tokens to Follow(E). Es can also derive an empty string, so we must add Follow(Es) to Follow(E). The next part of the solution shows you how to compute Follow(Es), which is { ) }, so we add ')' to Follow(E).
    3. E -> ' E: Since E appears at the end of the production, we add the Follow set of the LHS non-terminal to Follow(E). In this case the LHS non-terminal is E, so we are adding Follow(E) to Follow(E). This is a circular expression and adds no useful symbols to Follow(E).
    4. E -> ( E Es ): Here E is again followed by Es, which is the case we already handled in item 2, so nothing new is added.
  3. First the answer:
```
Predict(Es -> ε) = First(ε) U Follow(Es)
                 = Follow(Es)
                 = { ) }
```
    Next the explanation. Since First(ε) is the empty set, we are left with only Follow(Es). To compute Follow(Es), we find the productions in which Es appears on the RHS:
```
E -> ( E Es )    // add ) to Follow(Es)
Es -> E Es       // add Follow(Es) to Follow(Es)--this is circular so nothing
                 // more gets added
```
    In the first production, Es is followed by a right parenthesis, so we add a ) to Follow(Es). In the second production, Es appears at the end of the production, and hence we add in the Follow set for the non-terminal on the LHS, which is also Es. This is circular and does not add any new information.
2. Parse tree
3. Leftmost derivation of (cdr '(a b c)) $$:
```
    P => E $$
      => ( E Es ) $$
      => ( atom(cdr) Es ) $$
      => ( atom(cdr) E Es ) $$
      => ( atom(cdr) ' E Es ) $$
      => ( atom(cdr) ' ( E Es ) Es ) $$
      => ( atom(cdr) ' ( atom(a) Es ) Es ) $$
      => ( atom(cdr) ' ( atom(a) E Es ) Es ) $$
      => ( atom(cdr) ' ( atom(a) atom(b) Es ) Es ) $$
      => ( atom(cdr) ' ( atom(a) atom(b) E Es ) Es ) $$
      => ( atom(cdr) ' ( atom(a) atom(b) atom(c) Es ) Es ) $$
      => ( atom(cdr) ' ( atom(a) atom(b) atom(c) ) Es ) $$
      => ( atom(cdr) ' ( atom(a) atom(b) atom(c) ) ) $$
```
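
The First and Follow sets used in this problem can be cross-checked mechanically. The sketch below assumes the grammar is P -> E $$, E -> atom | ' E | ( E Es ), Es -> E Es | ε (the productions quoted in the explanation above) and runs the usual fixed-point iteration. The dictionary encoding and the names GRAMMAR, compute_sets, and first_of are assumptions for illustration. Following the convention used in this solution, ε is tracked separately as "nullable" rather than stored in the First sets.

```python
GRAMMAR = {
    "P": [["E", "$$"]],
    "E": [["atom"], ["'", "E"], ["(", "E", "Es", ")"]],
    "Es": [["E", "Es"], []],  # [] stands for epsilon
}


def compute_sets(grammar):
    nonterminals = set(grammar)
    nullable = set()
    first = {n: set() for n in nonterminals}
    follow = {n: set() for n in nonterminals}

    def first_of(symbols):
        """Return (First set, can-derive-epsilon) for a sequence of symbols."""
        result = set()
        for sym in symbols:
            if sym not in nonterminals:   # a terminal ends the scan
                result.add(sym)
                return result, False
            result |= first[sym]
            if sym not in nullable:
                return result, False
        return result, True

    changed = True
    while changed:                        # repeat until nothing new is added
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                f, eps = first_of(rhs)
                if not f <= first[lhs]:
                    first[lhs] |= f
                    changed = True
                if eps and lhs not in nullable:
                    nullable.add(lhs)
                    changed = True
                for i, sym in enumerate(rhs):
                    if sym in nonterminals:
                        # First of what follows sym, plus Follow(lhs) if the
                        # rest of the production can derive epsilon.
                        f, eps = first_of(rhs[i + 1:])
                        new = f | (follow[lhs] if eps else set())
                        if not new <= follow[sym]:
                            follow[sym] |= new
                            changed = True
    return first, follow, nullable


first, follow, nullable = compute_sets(GRAMMAR)
print("First(Es)  =", sorted(first["Es"]))
print("Follow(E)  =", sorted(follow["E"]))
print("Follow(Es) =", sorted(follow["Es"]))
```

Because Es is the only nullable non-terminal, Predict(Es -> ε) is simply the computed Follow(Es).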
The two productions are those that could potentially be recognized at this point in the parse, and the . marks where we are within each production. The portion to the left of the . represents what we have already seen, which is on the top of the stack, and the portion to the right is what remains to be seen if we are ultimately to recognize this production. For example, the first production represents an addition, and the . indicates that we have already seen the expression that forms the left operand and that we still need to see the addition sign and the expression that forms the right operand.