CS360 Lecture notes -- Assembler Lecture #2: Procedures, Stack Frames, Spilling


This lecture is a continuation of computer organization & stack frames. The focus is on procedure calls.

Look at the following C code (p1.c):

int a()
{
  return 1;
}

main()
{
  int i;

  i = a();
}

This compiles into assembler that looks like the following:

    a:
	mov #1 -> %r0
	ret
    main:
	push #4
	jsr a
	st %r0 -> [fp]
	ret
Both of these are straightforward. main() first allocates one local variable on the stack and then executes "jsr a", which means "jump to subroutine a". Once a() returns, main() stores the value of register r0 into the local variable i, which lives at the address held in the frame pointer. Then main() returns. a() is just as simple: it returns the value 1 by putting it in register r0 and then executing ret.

This seems simple, but what goes on when jsr and ret are executed is a little trickier. This is what happens:

When "jsr" is executed, (pc+4) and then the current value of fp are pushed onto the stack. Then the fp is set to the current sp, and the pc is set to the address of the first instruction of the named procedure. This is all done atomically by the computer's hardware. After jsr has taken effect, we are in a new stack frame, and the pc points at the first instruction of a().

When "ret" is executed, the sp is first set to the current fp. Then the fp is popped off the stack: the sp is incremented by 4, and the fp is set to the value now on top of the stack. Finally, the pc is popped off the stack in the same way, which increments the sp by 4 again. (The stack grows toward lower addresses, so pushing decrements the sp and popping increments it.) Like "jsr", this is all done atomically by the hardware. When "ret" completes, the pc points at the instruction after the original "jsr", and the stack frame of the calling procedure has been restored.
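To make those two paragraphs concrete, here is a minimal C sketch of what jsr and ret do to the stack. It is only a toy model, not part of the lecture code: memory is a small int array indexed by (byte address / 4), sp points at the first free word (which is what "st %r0 -> [sp]--" implies), and the names machine, push_word, pop_word, do_jsr and do_ret are all made up for illustration.

```c
#include <assert.h>

/* Hypothetical toy model of the lecture's machine. Addresses are in bytes;
   sp points at the first free word and the stack grows toward address 0. */
enum { MEM_BYTES = 256 };

struct machine {
    int mem[MEM_BYTES / 4];
    int sp, fp, pc;
};

static void push_word(struct machine *m, int value)
{
    assert(m->sp >= 0 && m->sp < MEM_BYTES);
    m->mem[m->sp / 4] = value;   /* like: st value -> [sp]-- */
    m->sp -= 4;
}

static int pop_word(struct machine *m)
{
    m->sp += 4;                  /* like: ld ++[sp] -> value */
    assert(m->sp >= 0 && m->sp < MEM_BYTES);
    return m->mem[m->sp / 4];
}

/* jsr: save the return address and the caller's fp, then open a new frame. */
void do_jsr(struct machine *m, int target)
{
    push_word(m, m->pc + 4);
    push_word(m, m->fp);
    m->fp = m->sp;
    m->pc = target;
}

/* ret: discard the callee's frame, then restore fp and pc. */
void do_ret(struct machine *m)
{
    m->sp = m->fp;
    m->fp = pop_word(m);
    m->pc = pop_word(m);
}
```

If you call do_jsr() and then look at mem[(fp+4)/4] and mem[(fp+8)/4], you should find the old fp and the return address, in that order. Calling do_ret() right afterwards puts sp, fp and pc back where they were, with the pc advanced by 4.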

Let's look at it pictorially (you should use jassem.tcl to do this for yourself). At the start, the stack and registers look as follows:

              Stack                              registers
        |----------------|                     |-----------------------|
        |                |                     |                       | r0
        |                |                     |                       | r1
        |    .....       |                     |                       | r2
        |    unused      |                     |                       | r3
        |    unused      |                     |                       | r4
        |    unused      |           /-------- |                       | sp
        |    unused      | <------------------ |                       | fp
        |     used       |                     | main: push #4         | pc
        |     ....       |                     |-----------------------|
        |--------------- |
First, the sp is decremented by 4 to allocate the local variable i:
              Stack                              registers
        |----------------|                     |-----------------------|
        |                |                     |                       | r0
        |                |                     |                       | r1
        |    .....       |                     |                       | r2 
        |    unused      |                     |                       | r3 
        |    unused      |                     |                       | r4 
        |    unused      | <------------------ |                       | sp
        |    main: i     | <------------------ |                       | fp
        |     used       |                     | main+4: jsr a         | pc
        |     ....       |                     |-----------------------|
        |--------------- |
Now jsr is called. This pushes pc+4 and the value of the fp on the stack, and sets the fp to the new sp, and pc to a:
              Stack                              registers
        |----------------|                     |-----------------------|
        |                |                     |                       | r0
        |                |                     |                       | r1
        |    .....       |                     |                       | r2 
        |    unused      | <---\---\           |                       | r3 
    /---| old fp         |      \   \          |                       | r4 
    |   | old pc: main+8 |       \   \-------  |                       | sp
    \-->|    main: i     |        \----------- |                       | fp
        |     used       |                     | a: mov #1 -> %r0      | pc
        |     ....       |                     |-----------------------|
        |--------------- |
Note now that we have a new stack frame for a, and the pc is executing a. The first thing it does is load 1 into r0:
              Stack                              registers
        |----------------|                     |-----------------------|
        |                |                     |       1               | r0
        |                |                     |                       | r1
        |    .....       |                     |                       | r2
        |    unused      | <---\---\           |                       | r3
    /---| old fp         |      \   \          |                       | r4 
    |   | old pc: main+8 |       \   \-------  |                       | sp
    \-->|    main: i     |        \----------- |                       | fp
        |     used       |                     | a+4: ret              | pc
        |     ....       |                     |-----------------------|
        |--------------- |
Then a() executes "ret". "ret" sets the sp to the fp (which changes nothing here, since the two are already equal), and then pops the fp and the pc off the stack. When it's done, we're back in main()'s stack frame, executing the instruction after the jsr:
              Stack                              registers
        |----------------|                     |-----------------------|
        |                |                     |       1               | r0
        |                |                     |                       | r1
        |    .....       |                     |                       | r2
        |    unused      |                     |                       | r3
        | old fp         |                     |                       | r4
        | old pc: main+8 |<------------------- |                       | sp
        |    main: i     |<------------------- |                       | fp
        |     used       |                     | main+8: st %r0->[fp]  | pc
        |     ....       |                     |-----------------------|
        |--------------- |
Note that the "old fp" and "old pc" values are not erased. However, since they now lie above the top of the stack, they should never be referenced. Now the "st %r0 -> [fp]" gets executed, and the machine state looks like:
              Stack                              registers
        |----------------|                     |-----------------------|
        |                |                     |       1               | r0
        |                |                     |                       | r1
        |    .....       |                     |                       | r2
        |    unused      |                     |                       | r3
        | old fp         |                     |                       | r4
        | old pc: main+8 |<------------------- |                       | sp
        |    main: i: 1  |<------------------- |                       | fp
        |     used       |                     | main+12: ret          | pc
        |     ....       |                     |-----------------------|
        |--------------- |
Now main() is over, and calls "ret". You can imagine what this does -- the stack is set up so that when main() calls ret, control returns to the operating system and the process goes away.

Now, make sure you go over this with Jassem. The program is in p1.jas, and you should see exactly what I have shown above, only you get to see all the memory addresses as well.


This next example shows a procedure with arguments and local variables (p2.c):

int a(int i)
{
  int j;

  j = i+1;
  return j;
}

main()
{
  int i;

  i = a(5);
}

This gets compiled into code like the following:

    a:
	push #4
	ld [fp+12] -> %r0
	add %r0, %g1 -> %r0
	st %r0 -> [fp]
	ld [fp] -> %r0
	ret
    main:
	push #4 
	mov #5 -> %r0
	st %r0 -> [sp]--
	jsr a
	pop #4 
	st %r0 -> [fp]
	ret
The only real difference between this example and the last is the argument to a(). It should be clear how a() allocates its local variable j on the stack by decrementing the stack pointer with "push #4". j is then referenced as the location pointed to by the frame pointer. The calling procedure passes arguments by pushing them onto the stack in reverse order (here there is only one argument) and then calling jsr; after the call returns, it pops them off again with "pop #4". The procedure knows how to reference its arguments -- they start at the memory location 12 bytes above the fp. Why? Well, [fp] is the first word of the new frame, [fp+4] holds the old frame pointer, and [fp+8] holds the pc to return to when the procedure is over. Thus, if the arguments are pushed onto the stack directly before calling jsr, they start at [fp+12].
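If you want to convince yourself of the [fp+12] offset without Jassem, here is a rough C sketch that hand-translates p2's instructions onto an array. Everything about the model is an assumption made for illustration: the 64-word memory, the starting frame at address 200, and the helper names st_push, ld_pop, a_body and call_a. The return pc is pushed as a dummy 0, since the C code does not need it.

```c
#include <assert.h>

/* Hypothetical model: byte addresses into a small word array; sp points at
   the first free word and the stack grows toward lower addresses. */
static int mem[64];
static int sp, fp, r0;

static void st_push(int v)          /* like: st v -> [sp]-- */
{
    assert(sp >= 0 && sp < 256);
    mem[sp / 4] = v;
    sp -= 4;
}

static int ld_pop(void)             /* like: ld ++[sp] -> v */
{
    sp += 4;
    assert(sp >= 0 && sp < 256);
    return mem[sp / 4];
}

/* Body of a(), entered after jsr has pushed the return pc and the old fp. */
static void a_body(void)
{
    sp -= 4;                        /* push #4: allocate j at [fp]   */
    r0 = mem[(fp + 12) / 4];        /* ld [fp+12] -> %r0: argument i */
    r0 = r0 + 1;                    /* add %r0, %g1 -> %r0           */
    mem[fp / 4] = r0;               /* st %r0 -> [fp]: j = i+1       */
    r0 = mem[fp / 4];               /* ld [fp] -> %r0: return j      */
}

/* main()'s call sequence for i = a(arg); returns the value stored in i. */
int call_a(int arg)
{
    sp = fp = 200;                  /* arbitrary starting frame      */
    sp -= 4;                        /* push #4: allocate i at [fp]   */
    r0 = arg;                       /* mov #arg -> %r0               */
    st_push(r0);                    /* st %r0 -> [sp]--              */

    st_push(0);                     /* jsr: push return pc (dummy)   */
    st_push(fp);                    /*      push old fp              */
    fp = sp;                        /*      open the new frame       */
    a_body();
    sp = fp;                        /* ret: drop a()'s locals        */
    fp = ld_pop();                  /*      restore fp               */
    (void)ld_pop();                 /*      restore pc (ignored)     */

    sp += 4;                        /* pop #4: discard the argument  */
    mem[fp / 4] = r0;               /* st %r0 -> [fp]: i = result    */
    return mem[fp / 4];
}
```

In this model the argument is stored at address 196 and a()'s frame pointer ends up at 184, so the load from [fp+12] finds it, and call_a(5) returns 6.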

This program is in p2.jas, and you should trace through it using Jassem. Make sure you understand how main pushes its argument on the stack, how a finds the argument, and what happens on jsr/ret.


Register Spilling

One important thing that has to be decided is whether a procedure may use a register without worrying about its current value (as a() does with r0), or whether it must first save the register on the stack before using it. This matters. Suppose, for example, that main() uses register r3, then calls "jsr a", and afterwards expects r3 to hold the same value. Then a() and any procedures that a() calls must either not use r3 at all, or save r3's value before using it and restore it when they are done.

The act of saving a register's value before the body of a procedure call and restoring it afterwards is called spilling. Different machines and compilers handle spilling in different ways. For example, older CISC architectures sometimes had a spill-mask that would be part of a procedure call. This specified which registers should be spilled, and the machine actually did the spilling for you.

What we do on our machine is a typical spilling solution: Procedures can use r0 and r1 without worrying about their values. However, registers r2 through r4 must be spilled if a procedure uses them.

Here's an example (spill1.c):

int a(int i, int j)
{
  int k;

  k = (i+2)*(j-5);
  return k;
}

main()
{
  int i;

  i = a(44, 22);
}

To compile arithmetic expressions into assembler, it's useful to turn them into trees. For example, the expression (i+2)*(j-5) above becomes:

          *
        /   \
       +     -
      / \   / \
     i   2 j   5

In order to evaluate the tree, you need to do a postorder traversal (or, if you think of the edges as pointing upward, a topological sort of the tree). Arithmetic has to be done register-to-register, so the value of each node must be in a register when it is used. You (the compiler) must figure out an ordering of instructions that is legal, and then an assignment of nodes to registers so that you don't reuse a register unless you can be sure that you don't need its value any more.
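Here is a small C sketch of that postorder traversal, in case it helps to see it as code. It only evaluates the tree and records the order in which the operators are visited; it does not assign registers. The node layout and the names eval_postorder and eval_k are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical expression-tree node: leaves hold a value,
   interior nodes hold an operator and two children. */
struct node {
    char op;                 /* '+', '-', '*', or 0 for a leaf */
    int value;               /* used only by leaves            */
    const struct node *left, *right;
};

/* Postorder evaluation: both children first, then the operator itself.
   Each operator is appended to order[] as it is applied. */
int eval_postorder(const struct node *n, char *order, size_t *count)
{
    if (n->op == 0)
        return n->value;
    int l = eval_postorder(n->left, order, count);
    int r = eval_postorder(n->right, order, count);
    order[(*count)++] = n->op;
    switch (n->op) {
    case '+': return l + r;
    case '-': return l - r;
    case '*': return l * r;
    }
    assert(!"unknown operator");
    return 0;
}

/* Builds the tree for (i+2)*(j-5) and evaluates it. */
int eval_k(int i, int j, char order[3])
{
    struct node li = {0, i, NULL, NULL}, l2 = {0, 2, NULL, NULL};
    struct node lj = {0, j, NULL, NULL}, l5 = {0, 5, NULL, NULL};
    struct node plus  = {'+', 0, &li, &l2};
    struct node minus = {'-', 0, &lj, &l5};
    struct node times = {'*', 0, &plus, &minus};
    size_t count = 0;
    return eval_postorder(&times, order, &count);
}
```

With i = 44 and j = 22, the operators come out in the order +, -, *, and the result is 46 * 17 = 782. That is the same order in which the assembler has to perform the add, the sub and the mul.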

For example, in the above expression, suppose you do the (i+2) calculation first and hold the result in r0. Then you can't use r0 to calculate (j-5). For that reason, you are going to have to use r2, and because you are using r2, you'll have to spill it onto the stack. I do this at the beginning of a procedure. Then at the end, I "unspill" it by reading it back from the stack.

The code is in spill1.jas, which I've reproduced below. You may use jassem.tcl to step through this.

a:
    push #4              / Allocate k
    st %r2 -> [sp]--     / Spill r2

    ld [fp+12] -> %r0
    mov #2 -> %r1
    add %r0, %r1 -> %r0  / Calculate (i+2) and put the result in r0

    ld [fp+16] -> %r1
    mov #5 -> %r2
    sub %r1, %r2 -> %r1  / Calculate (j-5) and put the result in r1

    mul %r0, %r1 -> %r0
    st %r0 -> [fp]       / Do k = r0 * r1

    ld [fp] -> %r0
    ld ++[sp] -> %r2     / Unspill r2
    ret

main:

    push #4              / Allocate i
 
    mov #22 -> %r0       / Push arguments onto the stack in reverse order
    st %r0 -> [sp]--
    mov #44 -> %r0       
    st %r0 -> [sp]--
    jsr a
    pop #8               / Always pop the arguments off the stack after jsr

    st %r0 -> [fp]
    ret

Note that you have to spill r2 onto the stack after allocating the local variable. Otherwise, k will not be at [fp]. Think about it.
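If thinking about it doesn't do the trick, this little C sketch works out where k lands in each case. It assumes what the code above implies: on entry sp equals fp, sp points at the first free word, and both "push #4" and "st %r2 -> [sp]--" consume the word at [sp]. The function name address_of_k is made up.

```c
#include <assert.h>

/* Hypothetical helper: given the frame pointer on entry to a procedure
   (where sp == fp), return the address at which the single local k ends up.
   sp points at the first free word, so the next word handed out is [sp]. */
int address_of_k(int fp, int spill_first)
{
    int sp = fp;
    int k_addr;

    assert(fp % 4 == 0);
    if (spill_first) {
        sp -= 4;          /* st %r2 -> [sp]--  takes the word at [fp]   */
        k_addr = sp;      /* push #4 then hands out the next word       */
        sp -= 4;
    } else {
        k_addr = sp;      /* push #4 hands out the word at [fp]         */
        sp -= 4;
        sp -= 4;          /* st %r2 -> [sp]--  takes the word at [fp-4] */
    }
    return k_addr;
}
```

With fp = 100, allocating first puts k at 100, which is [fp]. Spilling first puts r2 at [fp] and pushes k down to 96, so "st %r0 -> [fp]" would overwrite the saved r2 instead of setting k.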


An example to show why spilling matters

Take a look at spill2.c:

int a(int i, int j)
{
  int k;

  k = (i+2)*(j-5);
  return k;
}

main()
{
  int i;

  i = (a(10, 20) + a(30, 40));
}

You'll note that a() is exactly the same. The only difference is that we're calling a() twice, and adding up the return values. Think about that for a minute -- where should you store the return value of the first call to a()? You can't store it in r0 or r1 because making a procedure call will destroy them (we have to assume that). Therefore, you have to store it in a higher register, like r2. You know that's ok, because a() will make sure that r2's value is unchanged. Here's the code for main() (in spill2.jas). You'll notice that main() spills r2 as well, because if any procedure uses r2, r3 or r4, it must spill them.

main:

    push #4              / Allocate i
    st %r2 -> [sp]--     / Spill r2
 
    mov #20 -> %r0       / Call a(10, 20) and store the result in r2
    st %r0 -> [sp]--
    mov #10 -> %r0
    st %r0 -> [sp]--
    jsr a
    pop #8
    mov %r0 -> %r2       

    mov #40 -> %r0       / Call a(30, 40) and add the result to r2
    st %r0 -> [sp]--
    mov #30 -> %r0
    st %r0 -> [sp]--
    jsr a
    pop #8
    add %r0, %r2 -> %r0
    st %r0 -> [fp]
   
    ld ++[sp] -> %r2     / Unspill r2
    ret

Once again, I urge you to trace through this code with jassem to see how the spilling works.
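As a quick sanity check on the reasoning, here is a C sketch that models the register convention with a plain array. It is an illustration only: reg[0] through reg[4] stand in for r0 through r4, call_a mimics how a() uses r0, r1 and r2 in spill1.jas, and sum_of_calls parks the first return value in whichever register you pick.

```c
#include <assert.h>

/* Hypothetical register file: reg[0]..reg[4] stand in for r0..r4. */
static int reg[5];

/* a(i, j) following the lecture's convention: r0 and r1 are scratch,
   r2 is saved on entry and restored on exit. */
static void call_a(int i, int j)
{
    int saved_r2 = reg[2];        /* spill r2            */
    reg[0] = i;
    reg[1] = 2;
    reg[0] = reg[0] + reg[1];     /* r0 = i+2            */
    reg[1] = j;
    reg[2] = 5;
    reg[1] = reg[1] - reg[2];     /* r1 = j-5            */
    reg[0] = reg[0] * reg[1];     /* r0 = (i+2)*(j-5)    */
    reg[2] = saved_r2;            /* unspill r2          */
}

/* Computes a(10,20) + a(30,40), parking the first result in reg[keep]. */
int sum_of_calls(int keep)
{
    assert(keep >= 0 && keep < 5);
    call_a(10, 20);
    reg[keep] = reg[0];
    call_a(30, 40);
    return reg[0] + reg[keep];
}
```

Parking the first result in r2 gives the right answer, 180 + 1120 = 1300. Parking it in r1 gives 1155, because the second call leaves j-5 = 35 in r1, and parking it in r0 gives 2240, because the second return value overwrites it.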


What do you do when you run out of registers?

Try spill3.c on for size:

int a(int i)
{
  return i+5;
}

main()
{
  int i;

  i = ( (a(2)+a(3)) * (a(4)+a(5)) + (a(10)+a(11)) * (a(12)+a(13)) ) *
      ( (a(6)+a(7)) * (a(8)+a(9)) + (a(14)+a(15)) * (a(16)+a(17)) );
}

Yuck. Here's the expression tree:

You can see I've labeled it with the registers that you can use if you do the calculation in post-order, from left to right. You'll see that we've run out of registers!

Below, I show how you handle that -- you spill the intermediate value shown as "Spill". That allows you to use r2 again, and you no longer run out of registers. Before you do the last multiplication, you unspill the value into a register:

Did you really want to see the assembler for that? It's in spill3.jas, and it's not that hard to read. Here's the crucial code, which spills the result of the sum (marked "Spill" in the picture above) onto the stack. I start with the call to a(13). When it's done, you perform the multiplication and addition, and then spill the result of the addition onto the stack. Then you start working on the right side of the equation (starting with a(6)):

    ...
    mov #13 -> %r0
    st %r0 -> [sp]--
    jsr a
    pop #4
    add %r0, %r4 -> %r0

    mul %r3, %r0 -> %r0   / Multiplication, then Addition, then spill
    add %r2, %r0 -> %r0   
    st %r0 -> [sp]--

    mov #6 -> %r0          / a(6)+a(7)
    st %r0 -> [sp]--
    jsr a
    pop #4
    mov %r0 -> %r2
    ...

At the end, when you're done with a(17), you do the multiplication and addition. You have one more multiplication, but its operand is the one spilled to the stack. You unspill it and perform the multiplication. Then you're done!

    ...
    mov #17 -> %r0
    st %r0 -> [sp]--
    jsr a
    pop #4
    add %r0, %r4 -> %r0

    mul %r3, %r0 -> %r0   / Multiplication, then addition, then unspill and multiply
    add %r2, %r0 -> %r0   
    ld ++[sp] -> %r1
    mul %r0, %r1 -> %r0

    st %r0 -> [fp]        / Store the result into i

    ld ++[sp] -> %r4      / Unspill before returning
    ld ++[sp] -> %r3
    ld ++[sp] -> %r2
    ret

You can run jassem on this -- it's a bit cumbersome, but I have important screen shots. Here is the state just before the crucial spill:

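You can double-check yourself -- a(i) simply adds 5 to i, so the value about to be spilled is:

      (a(2)+a(3)) * (a(4)+a(5)) + (a(10)+a(11)) * (a(12)+a(13))
    = (7+8) * (9+10) + (15+16) * (17+18)
    = 285 + 1085
    = 1370 = 0x55a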

After the spill, 0x55a goes onto the stack:

I continue stepping to the "unspill":

And at the end, i has been set to 0x3009e4 = 3148260. Is that right? I'll let you double-check it yourself, but it is indeed correct!
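If you'd rather let the C compiler do the double-checking, here is a short sketch that computes the two factors separately, the same way the assembler does: the left factor is the value that gets spilled, and the right factor is computed while the left one sits on the stack. The function names left_factor, right_factor and final_i are made up for this check.

```c
#include <assert.h>

static int a(int i) { return i + 5; }

/* Left factor: the value that gets spilled to the stack. */
int left_factor(void)
{
    return (a(2) + a(3)) * (a(4) + a(5)) + (a(10) + a(11)) * (a(12) + a(13));
}

/* Right factor: computed while the left factor sits on the stack. */
int right_factor(void)
{
    return (a(6) + a(7)) * (a(8) + a(9)) + (a(14) + a(15)) * (a(16) + a(17));
}

/* The final multiplication, done after the unspill. */
int final_i(void)
{
    int spilled = left_factor();
    int right = right_factor();
    assert(spilled == 0x55a);     /* the value seen on the stack */
    return right * spilled;
}
```

The left factor is 1370 (0x55a), the right factor is 2298, and their product is 3148260, which is 0x3009e4 in hex.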