SLR Parse Table Construction


These notes show how to construct an SLR(1) parse table for the following grammar:
(1) E -> E + E
(2) E -> E * E
(3) E -> ( E )
(4) E -> -E
(5) E -> id
We will assume that the order of precedence is -, *, + (- has the highest precedence) and that the operators are left associative. The grammar already has a single start symbol, E, but we introduce a special start symbol, E', and a special production, E' -> E, which makes it easier to decide when to accept: the parser accepts once it has recognized a complete E and the input string is exhausted. Note that E has multiple productions whereas E' has only a single production. We will designate the production for E' as production 0.


State Construction

The first thing we must do is construct states for our parser and we do so by creating sets of items, starting with the production E'->E:

Set 1: E' -> .E 
       E  -> .E + E | .E * E | .(E) | .-E | .id
The second line of items is produced by computing the closure of E'->.E. Whenever the . immediately precedes a nonterminal, every production of that nonterminal must be added to the closure with the . at the start of its right hand side. Here the . precedes E, so all of E's productions are added. The items on line 2 add nothing further, because the only nonterminal that appears just to the right of a . in them is E, and we have already added its productions to the closure.
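The closure computation just described can be sketched in a few lines of Python. This is only an illustration, not part of the original notes: the representation of an item as a (production number, dot position) pair and the names GRAMMAR, closure, and show are assumptions made for the example.

```python
# Grammar from the notes; production 0 is the augmented production E' -> E.
GRAMMAR = [
    ("E'", ("E",)),
    ("E", ("E", "+", "E")),
    ("E", ("E", "*", "E")),
    ("E", ("(", "E", ")")),
    ("E", ("-", "E")),
    ("E", ("id",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Close a set of items, where an item is (production number, dot position)."""
    result = set(items)
    worklist = list(result)
    while worklist:
        prod, dot = worklist.pop()
        rhs = GRAMMAR[prod][1]
        # Only a nonterminal immediately after the dot contributes new items.
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for number, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (number, 0) not in result:
                    result.add((number, 0))
                    worklist.append((number, 0))
    return frozenset(result)


def show(item):
    """Render an item in roughly the dotted notation used in these notes."""
    prod, dot = item
    lhs, rhs = GRAMMAR[prod]
    return lhs + " -> " + " ".join(rhs[:dot] + (".",) + rhs[dot:])


# Set 1 is the closure of the single item E' -> .E
set1 = closure({(0, 0)})
for item in sorted(set1):
    print(show(item))
```

Running the sketch prints one line per item of set 1, that is, E' -> .E followed by the five E productions with the . at the front.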

The next set of items is derived by computing Goto(1, E), which is the set of items derived by moving the . over the E in the items in set 1 where the . immediately precedes the E. We must then compute the closure of these items:

Set 2: E' -> E.
       E  -> E .+ E | E .* E
The three items in set 2 are obtained from the three items in set 1 with a . immediately preceding the E. No more items can be added to the closure because in none of the three items does the . precede a nonterminal.
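The Goto computation can be sketched the same way: move the . over the chosen symbol in every item where it immediately precedes that symbol, then take the closure. As before, the item encoding and the names GRAMMAR, closure, and goto are assumptions made for this illustration.

```python
# Grammar from the notes; production 0 is the augmented production E' -> E.
GRAMMAR = [
    ("E'", ("E",)),
    ("E", ("E", "+", "E")),
    ("E", ("E", "*", "E")),
    ("E", ("(", "E", ")")),
    ("E", ("-", "E")),
    ("E", ("id",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Close a set of (production number, dot position) items."""
    result = set(items)
    worklist = list(result)
    while worklist:
        prod, dot = worklist.pop()
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for number, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (number, 0) not in result:
                    result.add((number, 0))
                    worklist.append((number, 0))
    return frozenset(result)


def goto(items, symbol):
    """Move the dot over `symbol` wherever it immediately precedes it, then close."""
    moved = set()
    for prod, dot in items:
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] == symbol:
            moved.add((prod, dot + 1))
    return closure(moved)


set1 = closure({(0, 0)})
set2 = goto(set1, "E")     # E' -> E.  |  E -> E.+E  |  E -> E.*E
set3 = goto(set1, "(")     # E -> (.E) plus all of E's productions
set5 = goto(set1, "id")    # E -> id.
```

Note that goto returns an empty set when no item has the . before the requested symbol, which is how a caller can tell that there is no transition.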

Sets 3, 4, and 5 are obtained by moving the . over the (, -, and id respectively in Set 1:

Set 3: E -> (.E) | .E+E | .E*E | .(E) | .-E | .id
Set 4: E -> -.E | .E+E | .E*E | .(E) | .-E | .id
Set 5: E -> id.
In Sets 3 and 4 we have added to the set all productions that can derive part of an E, which is all of E's productions.

We have now completed computing the Goto sets for set 1, so we move on to set 2 and compute Goto(2, +) and Goto(2, *), since set 2 contains one item in which the . immediately precedes a + and one in which it immediately precedes a *:

Set 6: E -> E+.E | .E+E | .E*E | .(E) | .-E | .id
Set 7: E -> E*.E | .E+E | .E*E | .(E) | .-E | .id
At this point take a moment to think about what these sets and items represent by thinking about set 6. We reach set 6 if we have already seen E+ (i.e., E+ is on the stack). We are expecting to see E next. Now of course we will not immediately see an E. We will first see one or more terminals that will hopefully get reduced to an E. So we have to add to set 6 all of the possible strings that we might see as we read the next symbols in the input stream. The strings that we might see are any strings that can be generated by E and hence we add all of E's productions to set 6. By adding these productions we can see that we will be able to proceed if the next input symbol is a (, -, or id. All other input symbols, such as +, *, or $ (end of input), will cause an error.

Moving on, we need to compute the sets generated by the items in Set 3, and these are the items generated by the closure of Goto(3,E), Goto(3,(), Goto(3,-), and Goto(3,id):

Set 8: E -> (E.) | E.+E | E.*E
Set 9 = Set 3: E -> (.E) | .E+E | .E*E | .(E) | .-E | .id
Set 10 = Set 4: E -> -.E | .E+E | .E*E | .(E) | .-E | .id
Set 11 = Set 5: E -> id.
Notice that sets 9, 10, and 11 duplicate pre-existing sets, so we discard them and reuse their numbers for the new sets derived below. Also note that set 8 obtains its three items from set 3 by simply moving the . past the E. No further items can be added to the closure because in each of the three items the . precedes a terminal symbol (namely ), +, or *), and the closure only adds productions for a nonterminal that follows the . in some item.

Now I'm going to derive the remaining sets without much further explanation: Only one new set is generated from set 4 and that is Goto(4,E):

Set 9 = Goto(4,E): E -> -E. | E.+E | E.*E 
The other Gotos are Goto(4,(), Goto(4,-), and Goto(4,id), which generate sets 3, 4, and 5 respectively.

Set 5 cannot generate any new sets because its only item has the . at the end of the production, meaning that it will be a state that reduces a production to a nonterminal.

Set 6 and set 7 each generate one new set from Goto(6,E) and Goto(7,E) respectively. The other Gotos will duplicate sets 3, 4, and 5.

Set 10 = Goto(6,E): E -> E+E. | E.+E | E.*E
Set 11 = Goto(7,E): E -> E*E. | E.+E | E.*E
Set 8 has the items E -> (E.) | E.+E | E.*E. Goto(8,)) will generate a unique set:
Set 12 = Goto(8,)): E -> (E).
while Goto(8,+) and Goto(8,*) will generate sets 6 and 7 respectively.

Set 9 has the items E -> -E. | E.+E | E.*E. Nothing can be done with the first item because it indicates that a reduction is in order (the reduction should be E -> -E). Goto(9,+) and Goto(9,*) will generate the pre-existing sets 6 and 7.

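Sets 10, 11, and 12 also do not generate any new sets. From either set 10 or set 11, the Gotos on + and * generate the pre-existing sets 6 and 7 respectively. Set 12 has no Gotos at all because its only item has the . at the end of the production.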

Since we have now gone through all of our sets without generating any new sets, we have our complete set of states, with each state corresponding to one of the sets:

State 1: E' -> .E 
         E -> .E + E | .E * E | .(E) | .-E | .id 
State 2: E' -> E.  
         E -> E .+ E | E .* E 
State 3: E -> (.E) | .E+E | .E*E | .(E) | .-E | .id 
State 4: E -> -.E | .E+E | .E*E | .(E) | .-E | .id
State 5: E -> id.  
State 6: E -> E+.E | .E+E | .E*E | .(E) | .-E | .id 
State 7: E -> E*.E | .E+E | .E*E | .(E) | .-E | .id 
State 8: E -> (E.) | E.+E | E.*E 
State 9: E -> -E. | E.+E | E.*E
State 10: E -> E+E. | E.+E | E.*E 
State 11: E -> E*E. | E.+E | E.*E 
State 12: E -> (E).
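The whole state construction can be checked mechanically with a worklist loop that keeps applying Goto until no new set appears. The sketch below is an illustration under stated assumptions: items are (production number, dot position) pairs, and the symbol order in SYMBOLS was chosen by hand so that the states come out with the same numbers as in these notes; a different order would produce the same twelve sets with different numbers.

```python
# Grammar from the notes; production 0 is the augmented production E' -> E.
GRAMMAR = [
    ("E'", ("E",)),
    ("E", ("E", "+", "E")),
    ("E", ("E", "*", "E")),
    ("E", ("(", "E", ")")),
    ("E", ("-", "E")),
    ("E", ("id",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}
SYMBOLS = ["E", "+", "*", "(", ")", "-", "id"]  # order picked to match the notes


def closure(items):
    """Close a set of (production number, dot position) items."""
    result = set(items)
    worklist = list(result)
    while worklist:
        prod, dot = worklist.pop()
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for number, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (number, 0) not in result:
                    result.add((number, 0))
                    worklist.append((number, 0))
    return frozenset(result)


def goto(items, symbol):
    """Move the dot over `symbol` wherever it immediately precedes it, then close."""
    moved = {(p, d + 1) for p, d in items
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol}
    return closure(moved)


def build_states():
    """Return (states, transitions); states are numbered from 1 as in the notes."""
    states = [closure({(0, 0)})]
    transitions = {}
    index = 0
    while index < len(states):          # states grows while we iterate
        for symbol in SYMBOLS:
            target = goto(states[index], symbol)
            if not target:
                continue                # no item has the dot before this symbol
            if target not in states:
                states.append(target)   # a genuinely new set becomes a new state
            transitions[(index + 1, symbol)] = states.index(target) + 1
        index += 1
    return states, transitions


states, transitions = build_states()
print(len(states), "states")
```

The transitions dictionary maps a (state, symbol) pair to the state reached by Goto, for example transitions[(8, ")")] for Goto(8,)).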

Constructing the Action Table

We can construct action entries for each state by considering the items in each state. If an item contains a . before a terminal or nonterminal then the action for that terminal or nonterminal will be a shift and a transition to the state represented by the Goto function for the current state and that symbol. For example, in state 1 we have the three items E'->.E, E->.E+E, E->.E*E and Goto(1,E) is 2. So the action in state 1 for E is to shift E onto the stack and move to state 2. Similarly Goto(1,() is 3 so the action in state 1 for ( is to shift ( onto the stack and move to state 3.

If an item has its . at the end of the production then we want to reduce the right hand side of the production to the left hand side nonterminal, and then shift based on the top state left on the stack and the nonterminal on the left hand side of the production. The stack consists of (state, symbol) pairs and we will pop off a number of pairs equal to the number of symbols on the right hand side of the production. We do not want to reduce for every symbol in the input string, however. We only want to reduce if the next symbol in the input string is a symbol that could legitimately follow the portion of the string we have already recognized. In other words, we only want to reduce for those symbols in the Follow set for the nonterminal. For example, in state 5 we want to reduce id to E. The Follow set for E is {*, +, ), $$} where $$ means "end of input", so we will only reduce if the next symbol in the input stream is one of these symbols. You can easily compute the Follow set for E by looking at the productions: the only terminals that follow an E in the five productions for E are +, *, and ), and $$ is included because E ends the production E' -> E, so anything that can follow E' (only end of input) can also follow E.
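The Follow set quoted above can also be computed by a small fixed-point iteration. The sketch below is a simplification that is only valid because this grammar has no empty productions; a general implementation would also have to track nullable symbols. The function names first_sets and follow_sets are chosen for the example.

```python
# Grammar from the notes; production 0 is the augmented production E' -> E.
GRAMMAR = [
    ("E'", ("E",)),
    ("E", ("E", "+", "E")),
    ("E", ("E", "*", "E")),
    ("E", ("(", "E", ")")),
    ("E", ("-", "E")),
    ("E", ("id",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_sets():
    """First sets, assuming no production has an empty right hand side."""
    first = {nt: set() for nt in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            head = rhs[0]
            new = first[head] if head in NONTERMINALS else {head}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def follow_sets(start="E'", end="$$"):
    """Follow sets, again assuming there are no empty productions."""
    first = first_sets()
    follow = {nt: set() for nt in NONTERMINALS}
    follow[start].add(end)              # end of input can follow the start symbol
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            for i, symbol in enumerate(rhs):
                if symbol not in NONTERMINALS:
                    continue
                if i + 1 < len(rhs):    # something follows: use its First set
                    nxt = rhs[i + 1]
                    new = first[nxt] if nxt in NONTERMINALS else {nxt}
                else:                   # symbol ends the production: inherit Follow(lhs)
                    new = follow[lhs]
                if not new <= follow[symbol]:
                    follow[symbol] |= new
                    changed = True
    return follow


follow = follow_sets()
print(sorted(follow["E"]))
```

For this grammar the result for E agrees with the set given in the text, and Follow(E') contains only $$.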

We need the operator precedence and left associativity rules that we stated at the beginning of this problem to resolve several shift-reduce conflicts:

  1. State 9, E -> -E. | E.+E | E.*E: If the next symbol is either a '+' or a '*', then there is a conflict between reducing -E to E and shifting the symbol onto the stack. The precedence of - over + or * tells us that it is right to reduce -E to E, because - should bind more tightly than + or *. For example -a+b should be interpreted as (-a)+b rather than -(a+b) (the parentheses are added to show how the expression is interpreted--in the first expression a is negated and added to b while in the second expression a and b are summed and then the sum is negated). So when we have seen -a, which corresponds to state 9 with -E on the stack, and we are looking at '+' as the next symbol, you can see that the correct action is to reduce, which leads to the first interpretation, rather than shift, which would lead to the second interpretation.

  2. State 10, E -> E+E. | E.+E | E.*E: If the next symbol is a '+', we need to decide whether to reduce E+E to E or to shift + onto the stack. Because we stated that operators are left associative, we should reduce E+E to E. If we have the expression a+b+c, reducing will cause us to correctly evaluate the expression as (a+b)+c rather than as a+(b+c). If we shifted the +, we would obtain the second expression.

    If instead of seeing a '+' we see a '*', then we have a shift/reduce conflict between shifting the '*' onto the stack, thus obtaining the interpretation E+(E*E), or reducing E+E to E, thus obtaining the interpretation (E+E)*E. Since '*' has precedence over '+', the correct interpretation is the first one, and hence we want to shift '*' onto the stack.

  3. State 11, E -> E*E. | E.+E | E.*E: If the next symbol is a '+', then we have a shift/reduce conflict between shifting the '+' onto the stack, thus obtaining the interpretation E*(E+E), or reducing E*E to E, thus obtaining the interpretation (E*E)+E. Since '*' has precedence over '+', the correct interpretation is the second one, and hence we want to reduce E*E to E.

    If instead of seeing a '+' we see a '*', then we have a shift/reduce conflict between shifting the '*' onto the stack, thus obtaining the interpretation E*(E*E), or reducing E*E to E, thus obtaining the interpretation (E*E)*E. Since '*' is left associative, the correct interpretation is the second one, and hence we want to reduce E*E to E.
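The three cases above all follow one rule, which can be written as a short function. This is a sketch of the usual precedence and associativity rule, not code from the notes: the numeric precedence levels and the name resolve are assumptions, and the function only covers left-associative operators, as stated at the start of these notes.

```python
# Higher number means tighter binding; the order -, *, + comes from the notes,
# the numbers themselves are arbitrary.
PRECEDENCE = {"+": 1, "*": 2, "-": 3}


def resolve(rule_operator, lookahead):
    """Resolve a shift/reduce conflict between a completed rule and a lookahead.

    rule_operator is the operator in the production that could be reduced
    (for example "+" for E -> E+E.), and lookahead is the next input symbol.
    """
    if PRECEDENCE[rule_operator] > PRECEDENCE[lookahead]:
        return "reduce"   # the rule binds tighter than the incoming operator
    if PRECEDENCE[rule_operator] < PRECEDENCE[lookahead]:
        return "shift"    # the incoming operator binds tighter
    return "reduce"       # same precedence: left associativity favors reducing


# The conflicts discussed for states 9, 10, and 11:
print(resolve("-", "+"), resolve("-", "*"))   # state 9
print(resolve("+", "+"), resolve("+", "*"))   # state 10
print(resolve("*", "+"), resolve("*", "*"))   # state 11
```

Only state 10 with lookahead '*' resolves to a shift; every other conflict in this grammar resolves to a reduction.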

In the following action table, the entries mean the following things:

  1. s state: Shift (push) the current input symbol onto the stack, push the state number onto the stack, and change to the specified state.
  2. r production(#pairs): Reduce by the numbered production and pop the specified number of symbol/state pairs, which will be in parentheses. The symbol that will be reduced to is on the left hand side of the production. Use the topmost state that is left on the stack and the new nonterminal to determine which state to transition to. Push the new nonterminal and the new state onto the stack.
  3. acc: Accept the input string
  4. err: An error has occurred. There is no valid transition from the current state based on the next symbol in the input string.

The action table for the grammar will look as follows:

State  +      *      (      )      -      id     $$     E
1      err    err    s3     err    s4     s5     err    s2
2      s6     s7     err    err    err    err    acc    err
3      err    err    s3     err    s4     s5     err    s8
4      err    err    s3     err    s4     s5     err    s9
5      r5(1)  r5(1)  err    r5(1)  err    err    r5(1)  err
6      err    err    s3     err    s4     s5     err    s10
7      err    err    s3     err    s4     s5     err    s11
8      s6     s7     err    s12    err    err    err    err
9      r4(2)  r4(2)  err    r4(2)  err    err    r4(2)  err
10     r1(3)  s7     err    r1(3)  err    err    r1(3)  err
11     r2(3)  r2(3)  err    r2(3)  err    err    r2(3)  err
12     r3(3)  r3(3)  err    r3(3)  err    err    r3(3)  err

Comments:

  State 9: shift/reduce conflict on either + or * resolved by reduction because - has precedence over + and *.
  State 10: shift/reduce conflict on + resolved by reduction because + is left associative; shift/reduce conflict on * resolved by shifting '*' onto the stack because * has precedence over +.
  State 11: shift/reduce conflict on + resolved by reduction because * has precedence over +; shift/reduce conflict on * resolved by reduction because * is left associative.
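As a cross-check, the action table can be derived mechanically from the item sets, the Follow set of E, and the precedence rules. The sketch below is one way to do that and makes several assumptions of its own: items are (production number, dot position) pairs, FOLLOW_E is copied from the text rather than computed, the numeric precedence levels are arbitrary, and the symbol order is chosen so that the state numbers agree with these notes.

```python
GRAMMAR = [  # production 0 is the augmented production E' -> E
    ("E'", ("E",)), ("E", ("E", "+", "E")), ("E", ("E", "*", "E")),
    ("E", ("(", "E", ")")), ("E", ("-", "E")), ("E", ("id",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}
TERMINALS = ["+", "*", "(", ")", "-", "id", "$$"]
SYMBOLS = ["E", "+", "*", "(", ")", "-", "id"]   # order picked to match the notes
FOLLOW_E = {"+", "*", ")", "$$"}                 # taken from the text
PREC = {"+": 1, "*": 2, "-": 3}                  # higher binds tighter


def closure(items):
    result, worklist = set(items), list(items)
    while worklist:
        prod, dot = worklist.pop()
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for number, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (number, 0) not in result:
                    result.add((number, 0))
                    worklist.append((number, 0))
    return frozenset(result)


def goto(items, symbol):
    return closure({(p, d + 1) for p, d in items
                    if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol})


def build_states():
    states, transitions, index = [closure({(0, 0)})], {}, 0
    while index < len(states):
        for symbol in SYMBOLS:
            target = goto(states[index], symbol)
            if target:
                if target not in states:
                    states.append(target)
                transitions[(index + 1, symbol)] = states.index(target) + 1
        index += 1
    return states, transitions


def build_action_table():
    states, transitions = build_states()
    table = {}
    for number, items in enumerate(states, start=1):
        row = {symbol: "err" for symbol in TERMINALS + ["E"]}
        for symbol in row:                       # shifts come from the Goto function
            if (number, symbol) in transitions:
                row[symbol] = "s" + str(transitions[(number, symbol)])
        for prod, dot in items:
            rhs = GRAMMAR[prod][1]
            if dot < len(rhs):
                continue                         # not a completed item
            if prod == 0:
                row["$$"] = "acc"                # E' -> E. with the input exhausted
                continue
            rule_op = next((s for s in rhs if s in PREC), None)
            for look in FOLLOW_E:                # reduce only on Follow(E)
                shift_wins = (row[look].startswith("s")
                              and PREC.get(rule_op, 0) < PREC.get(look, 0))
                if not shift_wins:               # ties reduce: left associativity
                    row[look] = "r%d(%d)" % (prod, len(rhs))
        table[number] = row
    return table


table = build_action_table()
print(table[10])
```

Each row of the result is a dictionary keyed by the column symbols, so table[10]["*"] is the entry for state 10 on input '*'.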


Executing the Parser

To show how the parse table would be used by a parser, we will trace the execution of a parser on the string:

(id + id) * (id + id + -(id + id))
A short explanation follows the table. Note that the parser begins execution by pushing state 1, the start state, onto the stack.

Input                       Stack                   Action
(id+id)*(id+id+-(id+id))$$  1                       Shift ( and move to state 3
id+id)*(id+id+-(id+id))$$   1(3                     Shift id and move to state 5
+id)*(id+id+-(id+id))$$     1(3id5                  Reduce by E->id, pop 1 pair, shift E, and move to state 8
+id)*(id+id+-(id+id))$$     1(3E8                   Shift + and move to state 6
id)*(id+id+-(id+id))$$      1(3E8+6                 Shift id and move to state 5
)*(id+id+-(id+id))$$        1(3E8+6id5              Reduce by E->id, pop 1 pair, shift E, and move to state 10
)*(id+id+-(id+id))$$        1(3E8+6E10              Reduce by E->E+E, pop 3 pairs, shift E, and move to state 8
)*(id+id+-(id+id))$$        1(3E8                   Shift ) and move to state 12
*(id+id+-(id+id))$$         1(3E8)12                Reduce by E->(E), pop 3 pairs, shift E, and move to state 2
*(id+id+-(id+id))$$         1E2                     Shift * and move to state 7
(id+id+-(id+id))$$          1E2*7                   Shift ( and move to state 3
id+id+-(id+id))$$           1E2*7(3                 Shift id and move to state 5
+id+-(id+id))$$             1E2*7(3id5              Reduce by E->id, pop 1 pair, shift E, and move to state 8
+id+-(id+id))$$             1E2*7(3E8               Shift + and move to state 6
id+-(id+id))$$              1E2*7(3E8+6             Shift id and move to state 5
+-(id+id))$$                1E2*7(3E8+6id5          Reduce by E->id, pop 1 pair, shift E, and move to state 10
+-(id+id))$$                1E2*7(3E8+6E10          Reduce by E->E+E, pop 3 pairs, shift E, and move to state 8
+-(id+id))$$                1E2*7(3E8               Shift + and move to state 6
-(id+id))$$                 1E2*7(3E8+6             Shift - and move to state 4
(id+id))$$                  1E2*7(3E8+6-4           Shift ( and move to state 3
id+id))$$                   1E2*7(3E8+6-4(3         Shift id and move to state 5
+id))$$                     1E2*7(3E8+6-4(3id5      Reduce by E->id, pop 1 pair, shift E, and move to state 8
+id))$$                     1E2*7(3E8+6-4(3E8       Shift + and move to state 6
id))$$                      1E2*7(3E8+6-4(3E8+6     Shift id and move to state 5
))$$                        1E2*7(3E8+6-4(3E8+6id5  Reduce by E->id, pop 1 pair, shift E, and move to state 10
))$$                        1E2*7(3E8+6-4(3E8+6E10  Reduce by E->E+E, pop 3 pairs, shift E, and move to state 8
))$$                        1E2*7(3E8+6-4(3E8       Shift ) and move to state 12
)$$                         1E2*7(3E8+6-4(3E8)12    Reduce by E->(E), pop 3 pairs, shift E, and move to state 9
)$$                         1E2*7(3E8+6-4E9         Reduce by E->-E, pop 2 pairs, shift E, and move to state 10
)$$                         1E2*7(3E8+6E10          Reduce by E->E+E, pop 3 pairs, shift E, and move to state 8
)$$                         1E2*7(3E8               Shift ) and move to state 12
$$                          1E2*7(3E8)12            Reduce by E->(E), pop 3 pairs, shift E, and move to state 11
$$                          1E2*7E11                Reduce by E->E*E, pop 3 pairs, shift E, and move to state 2
$$                          1E2                     Accept the input string

The difficult part of the trace to follow may be the reductions. The first reduction occurs in the third row of the table when we reduce by E->id, pop 1 pair, shift E, and move to state 8. The entry in the action table for state 5 on input symbol + is r5(1). This entry dictates that the parser reduce by production 5, which is E->id and pop 1 pair off the stack. When the parser does so, the top state left on the stack is state 3. The action table for state 3 on input symbol E is to shift E and move to state 8, which is what the parser does, as shown by the stack in row 4 of the table.

A more interesting reduction is the first reduction by E->E+E. At this point the contents of the stack are 1(3E8+6E10 and the next input symbol is ). The action table entry for state 10 on symbol ) is r1(3), which means that the parser should reduce by production 1, which is E->E+E, and pop 3 pairs off the stack. Hence the parser pops off the pairs E10, +6, and E8. The stack now contains 1(3. The action entry for state 3 on symbol E indicates that the parser should shift E and move to state 8, which is what it does, as shown by the stack in the next row of the table.
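The trace can be reproduced with a small table-driven driver. The sketch below hard-codes the action table from these notes as strings and follows the same conventions: the stack alternates states and symbols, a reduction pops the stated number of pairs, and the E column is then consulted to "shift" E. The names ROWS, ACTION, and parse are chosen for the example, and the input is assumed to be already split into tokens drawn from the grammar's terminals.

```python
COLUMNS = ["+", "*", "(", ")", "-", "id", "$$", "E"]
ROWS = {  # the action table from these notes, one row per state
    1: "err err s3 err s4 s5 err s2",
    2: "s6 s7 err err err err acc err",
    3: "err err s3 err s4 s5 err s8",
    4: "err err s3 err s4 s5 err s9",
    5: "r5(1) r5(1) err r5(1) err err r5(1) err",
    6: "err err s3 err s4 s5 err s10",
    7: "err err s3 err s4 s5 err s11",
    8: "s6 s7 err s12 err err err err",
    9: "r4(2) r4(2) err r4(2) err err r4(2) err",
    10: "r1(3) s7 err r1(3) err err r1(3) err",
    11: "r2(3) r2(3) err r2(3) err err r2(3) err",
    12: "r3(3) r3(3) err r3(3) err err r3(3) err",
}
ACTION = {state: dict(zip(COLUMNS, row.split())) for state, row in ROWS.items()}


def parse(tokens):
    """Return the list of table entries used, ending in 'acc'.

    Raises SyntaxError when the table entry is 'err'.
    """
    tokens = list(tokens) + ["$$"]      # append the end-of-input marker
    stack = [1]                         # state 1 is the start state
    position = 0
    trace = []
    while True:
        state, look = stack[-1], tokens[position]
        entry = ACTION[state][look]
        trace.append(entry)
        if entry == "acc":
            return trace
        if entry == "err":
            raise SyntaxError("unexpected %r in state %d" % (look, state))
        if entry.startswith("s"):       # shift: push the symbol and the new state
            stack += [look, int(entry[1:])]
            position += 1
        else:                           # reduce: entry looks like "r1(3)"
            pairs = int(entry[entry.index("(") + 1:-1])
            del stack[-2 * pairs:]      # pop that many (symbol, state) pairs
            goto_entry = ACTION[stack[-1]]["E"]   # every production reduces to E
            stack += ["E", int(goto_entry[1:])]


trace = parse("( id + id ) * ( id + id + - ( id + id ) )".split())
print(len(trace), "steps:", " ".join(trace))
```

The returned list has one entry per row of the trace table, so its first few entries (s3, s5, r5(1), s6) correspond to the first four rows.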