CS360 Lecture notes -- Assembler Lecture #2

Directory: /blugreen/homes/plank/cs360/notes/Assembler2

Lecture notes: http://www.cs.utk.edu/~plank/plank/classes/cs360/360/notes/Assembler2/lecture.html

This lecture is a continuation of computer organization & stack frames. The focus is on procedure calls.

Look at the following C code:

int a()
{
  return 1;
}

main()
{
  int i;

  i = a();
}

This compiles into assembler that looks like the following:

    a:
	mov #1 -> %r0
	ret
    main:
	push #4
	jsr a
	st %r0 -> [fp]
	ret

So, both of these are straightforward. Main() first allocates one variable on the stack, and then calls "jsr a", which means jump to subroutine a. Once a returns, it stores the value in register r0 to the local variable i, which has been allocated to be the variable pointed to by the frame pointer. Then main() exits. A() is straightforward as well. It returns the value 1 by storing it in register r0, and then returning.

This seems simple, but what goes on when jsr and ret are called is a little trickier. This is what happens:

When "jsr" is called, (pc+4) and the current value of fp are both stored on the top of the stack. Then, the fp is changed to be the current sp, and pc is changed to be the location of the first instruction of the named procedure. This is done atomically by the computer's hardware. After jsr has taken effect, we are in a new stack frame, and the pc is executing a().

When "ret" is called, the sp is changed to be the current fp. Then the fp is popped off the stack: it is set to be the top stack value, and the sp is incremented by 4. Finally, the pc is popped off the stack: it is set to be the top stack value, and the sp is again incremented by 4. Like "jsr", this is all done atomically by the hardware. When "ret" completes, the pc is set to be the instruction after the original "jsr", and the stack frame of that procedure has been restored.
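The two paragraphs above can be sketched in C as a toy model. This is only an illustration, not jassem's implementation: the `Machine` struct, `push_word`, `pop_word`, and the 64-word memory are assumptions made for the sketch. It follows the convention used in the diagrams of these notes, where the stack grows toward lower addresses and the sp names the first unused word.

```c
#include <assert.h>
#include <stdint.h>

enum { MEM_WORDS = 64 };          /* hypothetical memory size for the sketch */

typedef struct {
    int32_t mem[MEM_WORDS];       /* mem[k] models the word at byte address 4*k */
    int32_t sp, fp, pc;           /* byte addresses */
} Machine;

/* Store at [sp], then move sp down one word (like "st ... -> [sp]--"). */
static void push_word(Machine *m, int32_t value)
{
    assert(m->sp >= 0 && m->sp / 4 < MEM_WORDS);
    m->mem[m->sp / 4] = value;
    m->sp -= 4;
}

/* Move sp up one word, then load from [sp] (like "ld ++[sp] -> ..."). */
static int32_t pop_word(Machine *m)
{
    m->sp += 4;
    assert(m->sp >= 0 && m->sp / 4 < MEM_WORDS);
    return m->mem[m->sp / 4];
}

/* jsr: save the return address and the caller's fp, then start a new frame. */
static void jsr(Machine *m, int32_t target)
{
    push_word(m, m->pc + 4);
    push_word(m, m->fp);
    m->fp = m->sp;
    m->pc = target;
}

/* ret: discard the callee's frame, then restore fp and pc. */
static void ret(Machine *m)
{
    m->sp = m->fp;
    m->fp = pop_word(m);
    m->pc = pop_word(m);
}
```

Starting from sp = 196, fp = 200 and pc = 104, calling `jsr` leaves fp = sp = 188 with the return address 108 and the old fp 200 stored just above it. Calling `ret` afterwards restores pc = 108, fp = 200 and sp = 196.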

Let's look at it pictorially (you should use jassem.tcl to do this for yourself). At the start, the stack and registers look as follows:

              Stack                              registers
        |----------------|                     |-----------------------|
        |                |                     |                       | r0
        |                |                     |                       | r1
        |    .....       |                     |                       | r2
        |    unused      |                     |                       | r3
        |    unused      |                     |                       | r4
        |    unused      |           /-------- |                       | sp
        |    unused      | <------------------ |                       | fp
        |     used       |                     | main: push #4         | pc
        |     ....       |                     |-----------------------|
        |--------------- |

First, the sp is decremented by 4 to allocate the local variable i:

              Stack                              registers
        |----------------|                     |-----------------------|
        |                |                     |                       | r0
        |                |                     |                       | r1
        |    .....       |                     |                       | r2 
        |    unused      |                     |                       | r3 
        |    unused      |                     |                       | r4 
        |    unused      | <------------------ |                       | sp
        |    main: i     | <------------------ |                       | fp
        |     used       |                     | main+4: jsr a         | pc
        |     ....       |                     |-----------------------|
        |--------------- |

Now jsr is called. This pushes pc+4 and the value of the fp on the stack, and sets the fp to the new sp, and pc to a:

              Stack                              registers
        |----------------|                     |-----------------------|
        |                |                     |                       | r0
        |                |                     |                       | r1
        |    .....       |                     |                       | r2 
        |    unused      | <---\---\           |                       | r3 
    /---| old fp         |      \   \          |                       | r4 
    |   | old pc: main+8 |       \   \-------  |                       | sp
    \-->|    main: i     |        \----------- |                       | fp
        |     used       |                     | a: mov #1 -> %r0      | pc
        |     ....       |                     |-----------------------|
        |--------------- |

Note now that we have a new stack frame for a, and the pc is executing a. The first thing it does is load 1 into r0:

              Stack                              registers
        |----------------|                     |-----------------------|
        |                |                     |       1               | r0
        |                |                     |                       | r1
        |    .....       |                     |                       | r2
        |    unused      | <---\---\           |                       | r3
    /---| old fp         |      \   \          |                       | r4 
    |   | old pc: main+8 |       \   \-------  |                       | sp
    \-->|    main: i     |        \----------- |                       | fp
        |     used       |                     | a+4: ret              | pc
        |     ....       |                     |-----------------------|
        |--------------- |

Then a calls "ret". "Ret" sets the sp to the fp (which changes nothing in this case, because they are already equal), and then pops the fp and the pc off the stack. When it's done we're back to main()'s stack frame, and executing the next instruction after the jsr:

              Stack                              registers
        |----------------|                     |-----------------------|
        |                |                     |       1               | r0
        |                |                     |                       | r1
        |    .....       |                     |                       | r2
        |    unused      |                     |                       | r3
        | old fp         |                     |                       | r4
        | old pc: main+8 |<------------------- |                       | sp
        |    main: i     |<------------------- |                       | fp
        |     used       |                     | main+8: st %r0->[fp]  | pc
        |     ....       |                     |-----------------------|
        |--------------- |

Note the "old fp" and "old pc" don't get changed. However since they are "above the stack", they should not get referenced. Now the "st %r0 -> [fp]" gets executed, and the machine state looks like:

              Stack                              registers
        |----------------|                     |-----------------------|
        |                |                     |       1               | r0
        |                |                     |                       | r1
        |    .....       |                     |                       | r2
        |    unused      |                     |                       | r3
        | old fp         |                     |                       | r4
        | old pc: main+8 |<------------------- |                       | sp
        |    main: i: 1  |<------------------- |                       | fp
        |     used       |                     | main+12: ret          | pc
        |     ....       |                     |-----------------------|
        |--------------- |

Now main() is over, and calls "ret". You can imagine what this does -- the stack is set up so that when main() calls ret, control returns to the operating system and the process goes away.

Now, make sure you go over this with Jassem. The program is in p1a.jas, and you should see exactly what I have shown above, only you get to see all the memory addresses as well.

This next example shows a procedure with arguments and local variables:

int a(int i)
{
  int j;

  j = i+1;
  return j;
}

main()
{
  int i;

  i = a(5);
}

This gets compiled into code like the following:

    a:
	push #4
	ld [fp+12] -> %r0
	add %r0, %g1 -> %r0
	st %r0 -> [fp]
	ld [fp] -> %r0
	ret
    main:
	push #4 
        mov #5 -> %r0
        st %r0 -> [sp]--
	jsr a
	pop #4 
	st %r0 -> [fp]
	ret

The only real difference between this example and the last is the argument to a(). It should be clear how a allocates its local variable j on the stack by decrementing the stack pointer ("push #4"). j is then referenced as the location pointed to by the frame pointer. Arguments are passed by the calling procedure by pushing them onto the stack in reverse order (here there is only one argument), and then calling jsr. After the call returns, the caller removes them again with "pop #4". The procedure knows how to reference the arguments -- they start at the memory location 12 bytes ahead of the fp. Why? Well, [fp] is the beginning of the frame, [fp+4] holds the old frame pointer, and [fp+8] holds the pc to return to when the procedure is over. Thus, if the arguments are pushed onto the stack directly before calling jsr, then they start at [fp+12].
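The [fp+12] rule can be checked with a small C model of the caller's side of a call. This is a sketch under assumptions: `push_word`, `enter_call` and `load_arg` are hypothetical helper names, memory is modeled as an array of 4-byte words, and the sp names the first unused word as in the diagrams above.

```c
#include <assert.h>
#include <stdint.h>

/* Store at [sp], then move sp down one word (like "st ... -> [sp]--"). */
static void push_word(int32_t *mem, int32_t *sp, int32_t value)
{
    mem[*sp / 4] = value;
    *sp -= 4;
}

/* Caller side: push the arguments last-to-first, then do what jsr does
 * (push the return pc and the caller's fp). Returns the callee's fp. */
static int32_t enter_call(int32_t *mem, int32_t *sp, int32_t fp,
                          const int32_t *args, int nargs, int32_t return_pc)
{
    for (int k = nargs - 1; k >= 0; k--)
        push_word(mem, sp, args[k]);
    push_word(mem, sp, return_pc);
    push_word(mem, sp, fp);
    return *sp;
}

/* Callee side: argument k (0-based) lives at [fp + 12 + 4*k]. */
static int32_t load_arg(const int32_t *mem, int32_t fp, int k)
{
    assert(k >= 0);
    return mem[(fp + 12 + 4 * k) / 4];
}
```

Because the arguments are pushed in reverse order, the first argument ends up closest to the frame, at [fp+12], and each later argument is 4 bytes further on.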

This program is in p2.jas, and you should trace through it using Jassem. Make sure you understand how main pushes its argument on the stack, how a finds the argument, and what happens on jsr/ret.

Register Spilling

One important thing that has to be decided is whether a procedure may use a register without worrying about its current value (like a() does with r0), or whether a procedure should first save the register on the stack before using it. This matters. Suppose, for example, that the main routine uses register r3, then calls "jsr a", and afterwards expects r3 to have the same value. Then a() and any procedures that a() calls must make sure not to use r3, or to save r3's value before using it and restore it when they are done.

The act of saving a register's value before the body of a procedure call and restoring it afterwards is called spilling. Different machines and compilers handle spilling in different ways. For example, older CISC architectures sometimes had a spill-mask that would be part of a procedure call. This specified which registers should be spilled, and the machine actually did the spilling for you.

What we do on our machine is a typical spilling solution: Procedures can use r0 and r1 without worrying about their values. However, registers r2 through r4 must be spilled if a procedure uses them.

Here's an example:

int a(int i, int j)
{
  int k;

  k = (i+2)*(j-5);
  return k;
}

If you think about it, there's no way to do that arithmetic using only r0 and r1. So you must spill r2 onto the stack at the beginning of the procedure, and restore it before returning:

a:
    push #4
    st %r2 -> [sp]--     / spill %r2

    ld [fp+12] -> %r0
    mov #2 -> %r1
    add %r0, %r1 -> %r0
    ld [fp+16] -> %r1
    mov #5 -> %r2
    sub %r1, %r2 -> %r1
    mul %r0, %r1 -> %r0
    st %r0 -> [fp]
    ld [fp] -> %r0

    ld ++[sp] -> %r2    / unspill %r2
    ret

Note that you have to spill r2 onto the stack after allocating the local variable. Otherwise, r2's saved value would occupy [fp], and k would not be at [fp]. Think about it.
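The spilling discipline can be illustrated with a toy C model of the register file. This is a sketch only: the `Cpu` struct, its small spill stack and the name `call_a` are assumptions made for the illustration, and the body mirrors the assembler above one instruction at a time.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    int32_t r[5];        /* r0..r4 */
    int32_t stack[16];   /* hypothetical spill area */
    int top;             /* number of words currently spilled */
} Cpu;

/* Models a(i, j) = (i+2)*(j-5): free to clobber r0 and r1, must preserve r2. */
static int32_t call_a(Cpu *c, int32_t i, int32_t j)
{
    assert(c->top < 16);
    c->stack[c->top++] = c->r[2];      /* st %r2 -> [sp]--   (spill)   */

    c->r[0] = i;                       /* ld [fp+12] -> %r0            */
    c->r[1] = 2;                       /* mov #2 -> %r1                */
    c->r[0] = c->r[0] + c->r[1];       /* add %r0, %r1 -> %r0          */
    c->r[1] = j;                       /* ld [fp+16] -> %r1            */
    c->r[2] = 5;                       /* mov #5 -> %r2 (clobbers r2)  */
    c->r[1] = c->r[1] - c->r[2];       /* sub %r1, %r2 -> %r1          */
    c->r[0] = c->r[0] * c->r[1];       /* mul %r0, %r1 -> %r0          */

    c->r[2] = c->stack[--c->top];      /* ld ++[sp] -> %r2   (unspill) */
    return c->r[0];
}
```

A caller that leaves a value in r2 before the call finds the same value there afterwards, even though the procedure used r2 as scratch space in between.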

Some code with pointers

Pointers require some care -- my advice with pointers is to go slowly, and think precisely. Here are the different types of pointers, and how you deal with them:

Simple Pointer Dereferencing. For example:
```
a(int *p)
{
  return *p;
}
```
Here you simply get the pointer's value into a register, and then dereference the register (i.e. load the value stored at the memory address that the register holds). Here's the above example:
```
a:
    ld [fp+12] -> %r0    / Get p into r0
    ld [r0] -> %r0       / dereference r0
    ret
```
A twist on this is when pointer arithmetic is involved -- then you must remember to multiply by the size of the item being pointed to. For example:
```
int a(int *p)
{
  return *(p+2);
}
```
Compiles into:
```
a:
   ld [fp+12] -> %r0
   mov #8 -> %r1
   add %r0, %r1 -> %r0
   ld [r0] -> %r0
   ret  
```
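The scaling that the compiler inserts can be made visible in C by doing the byte arithmetic by hand. This is a minimal sketch; `load_at` is a hypothetical helper name, and on the machine in these notes `sizeof(int)` is 4, which is where the constant 8 for `p+2` comes from.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Load the int at p + n by computing the byte address explicitly:
 * (byte address of p) + n * sizeof(int). */
static int load_at(const int *p, size_t n)
{
    const unsigned char *bytes = (const unsigned char *)p;
    int value;

    assert(p != NULL);
    memcpy(&value, bytes + n * sizeof(int), sizeof value);
    return value;
}
```

For any valid index n, `load_at(p, n)` yields the same value as `*(p + n)`; the C compiler performs the multiplication by the element size for you.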
Array Dereferencing. Array dereferencing is much like pointer dereferencing. You multiply the array index by the item's size, then add it to the top of the array (the address of element 0). Then dereference that value. For example, look at the following piece of code:
```
a(int *p)
{
  int i;
  
  i = p[0];
  i = p[5];
  i = p[i];
}
```
These are three types of array dereferencing. In the first, we will simply load p into r0 and dereference it. In the second, we load p, add 20, and dereference. In the third, we have to multiply i by 4, add that to p, then dereference:
```
a:
   push #4

   ld [fp+12] -> %r0
   ld [r0] -> %r0
   st %r0 -> [fp]

   ld [fp+12] -> %r0
   mov #20 -> %r1
   add %r0, %r1 -> %r0
   ld [r0] -> %r0
   st %r0 -> [fp]

   ld [fp] -> %r0
   mov #4 -> %r1
   mul %r0, %r1 -> %r0
   ld [fp+12] -> %r1
   add %r0, %r1 -> %r0
   ld [r0] -> %r0
   st %r0 -> [fp]

   ret
```

Note, if the compiler already knows values, it can do multiplication and addition at compile time, rather than putting it into the code. This is why, in the code above, we could directly move 20 into r1 and add it, rather than moving 5 into one register, 4 into another and multiplying them.

This concept is also at work when you declare an array as a local variable:

main()
{
  int a[5];

  a[2] = 3;
}

We allocate a by calling "push #20". The compiler knows, at compile time, that:

a[0] will be at [fp-16]
a[1] will be at [fp-12]
a[2] will be at [fp-8]
a[3] will be at [fp-4]
a[4] will be at [fp]

Therefore, the above program becomes:

main:
   push #20
   mov #3 -> %r0
   st %r0 -> [fp-8]
   ret
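The table of offsets above follows a simple rule, sketched here in C. The function name `element_offset` is hypothetical, and the sketch assumes 4-byte ints and that the array is the only local in the frame, so that its last element sits at [fp].

```c
#include <assert.h>

/* Offset from fp of element i of an n-element int array whose last
 * element is at [fp]: element 0 is at fp - 4*(n-1), and each later
 * element is 4 bytes higher. */
static int element_offset(int n, int i)
{
    assert(0 <= i && i < n);
    return -4 * (n - 1) + 4 * i;
}
```

With n = 5 this gives -16 for a[0], -8 for a[2] and 0 for a[4]. When the index is a constant, the compiler can evaluate this expression itself and emit a single store such as "st %r0 -> [fp-8]".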

However, if the compiler is not dealing with constant indices, it must perform multiplication and addition to the top of the array before dereferencing it. For example:

int a(int i)
{
  int b[5];

  return b[i];
}

This compiles into:

a:
   push #20
   ld [fp+12] -> %r0       / Multiply i by 4 and put into r0  
   mov #4 -> %r1
   mul %r0, %r1 -> %r0  

   mov #-16 -> %r1         / Put the top of the array in r1
   add %r1, %fp -> %r1

   add %r1, %r0 -> %r0     / Add 4*i to the top of the array

   ld [r0] -> %r0          / Dereference it and return
   ret

Addresses are similar -- you don't dereference the pointer:

int *a(int p)
{
  return &p;
}

compiles into:

a:
   mov #12 -> %r0
   add %r0, %fp -> %r0
   ret

Performing double-indirections is also straightforward -- you just have to think it through:

int a(int **arr, int i, int j)
{
  return arr[i][j];
}

compiles into:

a:
   st %r2 -> [sp]--         / Spill r2 because you'll need it

   ld [fp+16] -> %r0
   mov #4 -> %r1
   mul %r0, %r1 -> %r0
   ld [fp+12] -> %r1
   add %r0, %r1 -> %r0
   ld [r0] -> %r0          / arr[i] is now in r0

   ld [fp+20] -> %r1
   mov #4 -> %r2
   mul %r1, %r2 -> %r1
   add %r0, %r1 -> %r0
   ld [r0] -> %r0          / arr[i][j] is now in r0
   ld ++[sp] -> %r2
   ret
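The two loads in the assembler above correspond to two separate dereferences in C. The sketch below spells them out; `load2d` is a hypothetical name, and the intermediate variable `row` plays the part of r0 after the first load.

```c
#include <assert.h>
#include <stddef.h>

/* arr[i][j] for an int ** is two loads, each preceded by scaled addition. */
static int load2d(int **arr, int i, int j)
{
    assert(arr != NULL);
    int *row = *(arr + i);     /* first load: arr[i], itself a pointer */
    return *(row + j);         /* second load: arr[i][j], an int       */
}
```

Note that this only applies to a pointer to pointers. A true two-dimensional array such as `int m[3][4]` is laid out contiguously and needs only one load.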

Finally, here are two additional pieces of code with pointers. The first (look at psimp.c) does some straightforward pointer and array operations:

main()
{
  int *a, a2[3], i;

  i = 6;
  a = &i;
  a2[1] = i+2;
  *a = 200;
  *(a2+2) = i+5;
}

Now, to compile this, you need to first figure out where all the locals are going to be. In the code below, I will put them in the following locations:

i will be [fp].
a2[0], a2[1] and a2[2] will be [fp-12], [fp-8] and [fp-4].
a will be [fp-16].

Note that this means:

&i will be fp.
a2 will be fp-12.
*a will be [[fp-16]]. Note, you can't really do that in assembler. Instead you will load [fp-16] into a register and dereference that register.
a2+2 will be fp-4. Remember -- that's pointer arithmetic.

Here's the assembler:

main:
	push #20             / Allocate locals
        st %r2 -> [sp]--     / Spill r2

        mov #6 -> %r0        / i = 6
        st %r0 -> [fp]

        st %fp -> [fp-16]    / a = &i

        ld [fp] -> %r0       / a2[1] = i+2
        mov #2 -> %r1
        add %r0, %r1 -> %r0
        st %r0 -> [fp-8]

        mov #200 -> %r0      / *a = 200
        ld [fp-16] -> %r1
        st %r0 -> [r1]

        ld [fp] -> %r0       / *(a2+2) = i+5
        mov #5 -> %r1
        add %r0, %r1 -> %r0
        mov #-12 -> %r1
        add %fp, %r1 -> %r1
        mov #8 -> %r2
        add %r1, %r2 -> %r1
        st %r0 -> [r1]

        ld ++[sp] -> %r2      / Unspill r2
        ret
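One way to check a hand trace of this program is to run the same statements in C and look at the final values. The wrapper `run_psimp` and its `out` parameter are assumptions added for the sketch; the five statements in the middle are the ones from psimp.c.

```c
#include <assert.h>

/* Runs the body of psimp.c and reports i, a2[1] and a2[2] through out. */
static void run_psimp(int out[3])
{
    int *a, a2[3], i;

    i = 6;
    a = &i;
    a2[1] = i + 2;       /* i is still 6 here, so a2[1] becomes 8      */
    *a = 200;            /* a points at i, so this changes i itself    */
    *(a2 + 2) = i + 5;   /* i is now 200, so a2[2] becomes 205         */

    assert(a == &i);     /* a was never reassigned */
    out[0] = i;
    out[1] = a2[1];
    out[2] = a2[2];
}
```

The values to look for in the jassem trace are therefore 200 at [fp], 8 at [fp-8] and 205 at [fp-4].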

Look at p3.c:

int *a(int *x)
{
  x[0] += x[2];
  return x+1;
}

main()
{
  int array[3];
  int *ip;

  array[0] = 8;
  array[1] = 9;
  array[2] = 10;

  ip = a(array);
  *ip = *ip+1;
}

Convince yourself that when you finish running it, ip should be pointing at element array[1], and the elements of array have the following values:

array[0] is 18
array[1] is 10
array[2] is 10
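If you would rather confirm these values by running the code than by reasoning alone, the sketch below wraps main's body in a function that reports the results. The wrapper `run_p3` and its `out` parameter are assumptions added for the illustration; `a` is copied from p3.c.

```c
#include <assert.h>

static int *a(int *x)
{
    x[0] += x[2];
    return x + 1;
}

/* Runs main's body from p3.c, copies the array into out, and returns
 * the index of the element that ip ends up pointing at. */
static int run_p3(int out[3])
{
    int array[3];
    int *ip;

    array[0] = 8;
    array[1] = 9;
    array[2] = 10;

    ip = a(array);           /* array[0] becomes 8 + 10 = 18 */
    *ip = *ip + 1;           /* array[1] becomes 9 + 1 = 10  */

    assert(ip == &array[1]);
    for (int k = 0; k < 3; k++)
        out[k] = array[k];
    return (int)(ip - array);
}
```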

Now, compiling this into assembler is a bit tricky. First, here's a:

a:
    st %r2 -> [sp]--      / Spill r2
    st %r3 -> [sp]--      / Spill r3
    ld [fp+12] -> %r0     / Load x into r0
    ld [r0] -> %r1        / Load x[0] into r1
    ld [fp+12] -> %r2     / Load x[2] into r2
    mov #8 -> %r3
    add %r2, %r3 -> %r2
    ld [r2] -> %r2 
    add %r1, %r2 -> %r2   / Add them and put the result into r2
    st %r2 -> [r0]        / Store r2 into x[0]

    ld [fp+12] -> %r0     / return x+1
    mov #4 -> %r1
    add %r0, %r1 -> %r0

    ld ++[sp] -> %r3      / Restore r3
    ld ++[sp] -> %r2      / Restore r2
    ret

A uses registers r2 and r3, so the first thing it does is spill them to the stack. Next, we load x[0] into r1 and x[2] into r2. Then we add them and store the result back into x[0].

The second statement adds four to x (pointer arithmetic: one int is four bytes) and returns that value (puts it into r0). Then a restores the spilled registers and returns.

Here's main:

main:
    push #16           / array starts at fp-8, ip is at [fp-12]
    mov #8 -> %r0
    st %r0 -> [fp-8]
    mov #9 -> %r0
    st %r0 -> [fp-4]
    mov #10 -> %r0
    st %r0 -> [fp]
 
    mov #-8 -> %r0
    add %r0, %fp -> %r0
    st %r0 -> [sp]--
    jsr a
    pop #4
    st %r0 -> [fp-12]

    ld [fp-12] -> %r0
    ld [r0] -> %r1
    add %g1, %r1 -> %r1
    st %r1 -> [r0]
    ret

First, main allocates 16 bytes worth of local variables. The compiler decides that array will start at fp-8. Ip will be stored at fp-12. This means that array[0] will be at fp-8. Array[1] will be at fp-4, and array[2] will be at fp. Given that, the setting of the three array values is straightforward.

Next, we put array (fp-8) on the stack, and call a. We store the result into ip ([fp-12]). Finally, we add one to *ip.

As before, trace through this with jassem.tcl, and make sure you understand everything.

This last section shows the code for the following main:

main(int argc, char **argv)
{
  char *s; 

  s = malloc(atoi(argv[1]));

  read(0, s, 10);

  write(1, s, strlen(s));

}

You'll note that main is just like any procedure. It assumes that its arguments have been pushed onto the stack in reverse order, starting with [fp+12]. Also, read and write, although system calls, look like regular procedure calls. The difference is that the code for read() and write() that ends up in the instructions of the program actually makes system calls into the operating system.

Below is the assembler for main(). It should be very straightforward. Note that the return value of strlen() is pushed onto the stack as the 3rd argument to write(), just as the return value of atoi() is pushed as the argument to malloc(). Note also how argv[1] is found. Argv is a pointer to an array of pointers. Thus, argv[1] is the value stored 4 bytes past the address held in argv.

    main:
	push #4         	/ allocate s

	ld [fp+16] -> %r0	/ put argv into r0
	mov #4 -> %r1     
	add %r1, %r0 -> %r0
	ld [r0] -> %r1	        / put argv[1] into r1
	st %r1 -> [sp]--	/ push r1 onto the stack
	jsr atoi		/ call atoi(argv[1])
	pop #4         

	st %r0 -> [sp]--	/ push return value onto the stack
	jsr malloc		/ call malloc
	pop #4         

	st %r0 -> [fp]		/ store return value in s

	                	/ push args to read on the stack:
	mov #10 -> %r0		/ 10
	st %r0 -> [sp]--	 
        ld [fp] -> %r0          / s
	st %r0 -> [sp]--
	mov #0 -> %r0		/ 0
	st %r0 -> [sp]--	 
	jsr read		/ call read
	pop #12 

        ld [fp] -> %r0          / push argument to strlen 
	st %r0 -> [sp]--       
	jsr strlen		/ call strlen
	pop #4

                                / push args to write() on stack
	st %r0 -> [sp]--	/ strlen(s)
        ld [fp] -> %r0          / s
	st %r0 -> [sp]--         
	mov #1 -> %r0		/ 1
	st %r0 -> [sp]--	 
	jsr write		/ call write
	pop #12

	ret

I'm not going to show the stack after each instruction. You should trace it yourself. Unfortunately, since read, etc. are not implemented in jassem, you can't use it here. Note how the stack must be adjusted after every procedure call to get the arguments off the stack. Were the compiler to optimize, then you could save some operations. For example, after the "jsr atoi", you could skip the pop and the push, overwrite the argument slot in place, and just do:

	st %r0 -> [sp+4]
	jsr malloc

However, unoptimized compilers will produce inefficient, albeit easy-to-read, code.