Look at the following C code:

    int a()
    {
      return 1;
    }

    main()
    {
      int i;

      i = a();
    }

This compiles into assembler that looks like the following:

    a:
        mov #1 -> %r0
        ret

    main:
        push #4
        jsr a
        st %r0 -> [fp]
        ret

So, both of these are straightforward. Main() first allocates one variable on the stack, and then calls "jsr a", which means jump to subroutine a. Once a returns, it stores the value in register r0 to the local variable i, which has been allocated to be the variable pointed to by the frame pointer. Then main() exits. A() is straightforward as well. It returns the value 1 by storing it in register r0, and then returning.
This seems simple, but what goes on when jsr and ret are called is a little trickier. This is what happens:
When "jsr" is called, (pc+4) and the current value of fp are both stored on the top of the stack. Then, the fp is changed to be the current sp, and pc is changed to be the location of the first instruction of the named procedure. This is done atomically by the computer's hardware. After jsr has taken effect, we are in a new stack frame, and the pc is executing a().
When "ret" is called, the sp is changed to be the current fp. Then the fp is popped off the stack: it is set to be the top stack value, and the sp is incremented by 4. Finally, the pc is popped off the stack: it is set to be the top stack value, and the sp is again incremented by 4. (The stack grows toward lower addresses, so pushing decrements the sp and popping increments it.) Like "jsr", this is all done atomically by the hardware. When "ret" completes, the pc is set to be the instruction after the original "jsr", and the stack frame of that procedure has been restored.
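To make the two descriptions above concrete, here is a rough C sketch of what jsr and ret do to sp, fp and pc. The Machine struct, the 64-word memory, and the helper names push_word and pop_word are made up for this illustration and are not part of jassem. The sketch assumes the conventions visible elsewhere in these notes: a push stores at [sp] and then decrements sp by 4, and a pop increments sp by 4 and then loads.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_WORDS 64            /* hypothetical tiny memory, in 4-byte words */

/* Hypothetical machine state; the field names mirror the registers in the text. */
typedef struct {
    uint32_t mem[MEM_WORDS];    /* backing store for the stack */
    uint32_t sp, fp, pc;        /* byte addresses */
} Machine;

/* Store a word at [sp], then decrement sp by 4 (like "st ... -> [sp]--"). */
static void push_word(Machine *m, uint32_t value)
{
    assert(m->sp % 4 == 0 && m->sp >= 4 && m->sp / 4 < MEM_WORDS);
    m->mem[m->sp / 4] = value;
    m->sp -= 4;
}

/* Increment sp by 4, then load the word at [sp] (like "ld ++[sp] -> ..."). */
static uint32_t pop_word(Machine *m)
{
    m->sp += 4;
    assert(m->sp % 4 == 0 && m->sp / 4 < MEM_WORDS);
    return m->mem[m->sp / 4];
}

/* jsr: save the return address and the caller's fp, then start a new frame. */
void jsr(Machine *m, uint32_t target)
{
    push_word(m, m->pc + 4);    /* address of the instruction after the jsr */
    push_word(m, m->fp);        /* caller's frame pointer */
    m->fp = m->sp;              /* the new frame begins at the current sp */
    m->pc = target;
}

/* ret: throw away the callee's frame, then restore fp and pc. */
void ret(Machine *m)
{
    m->sp = m->fp;
    m->fp = pop_word(m);
    m->pc = pop_word(m);
}
```

With sp = 200, fp = 204 and pc = 104, calling jsr leaves the old fp at [fp+4] and the return address 108 at [fp+8]. A following ret puts sp, fp and pc back to 200, 204 and 108.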
Let's look at it pictorially (you should use jassem.tcl to do this for yourself). At the start, the stack and registers look as follows:

          Stack                                   registers
    |----------------|                     |-----------------------|
    |                |                     |                       | r0
    |                |                     |                       | r1
    |     .....      |                     |                       | r2
    |     unused     |                     |                       | r3
    |     unused     |                     |                       | r4
    |     unused     |           /-------- |                       | sp
    |     unused     | <------------------ |                       | fp
    |     used       |                     | main: push #4         | pc
    |     ....       |                     |-----------------------|
    |----------------|

First, the sp is decremented by 4 to allocate the local variable i:

          Stack                                   registers
    |----------------|                     |-----------------------|
    |                |                     |                       | r0
    |                |                     |                       | r1
    |     .....      |                     |                       | r2
    |     unused     |                     |                       | r3
    |     unused     |                     |                       | r4
    |     unused     | <------------------ |                       | sp
    |    main: i     | <------------------ |                       | fp
    |     used       |                     | main+4: jsr a         | pc
    |     ....       |                     |-----------------------|
    |----------------|

Now jsr is called. This pushes pc+4 and the value of the fp on the stack, and sets the fp to the new sp, and pc to a:

          Stack                                   registers
    |----------------|                     |-----------------------|
    |                |                     |                       | r0
    |                |                     |                       | r1
    |     .....      |                     |                       | r2
    |     unused     | <---\---\           |                       | r3
/---|     old fp     |      \   \          |                       | r4
|   | old pc: main+8 |       \   \-------- |                       | sp
\-->|    main: i     |        \----------- |                       | fp
    |     used       |                     | a: mv #1 -> %r0       | pc
    |     ....       |                     |-----------------------|
    |----------------|

Note now that we have a new stack frame for a, and the pc is executing a. The first thing it does is load 1 into r0:

          Stack                                   registers
    |----------------|                     |-----------------------|
    |                |                     | 1                     | r0
    |                |                     |                       | r1
    |     .....      |                     |                       | r2
    |     unused     | <---\---\           |                       | r3
/---|     old fp     |      \   \          |                       | r4
|   | old pc: main+8 |       \   \-------- |                       | sp
\-->|    main: i     |        \----------- |                       | fp
    |     used       |                     | a+4: ret              | pc
    |     ....       |                     |-----------------------|
    |----------------|

And then it calls "ret". "Ret" sets the sp to the fp (which changes nothing in this case, since they are already equal), and then pops the fp and the pc off the stack. When it's done we're back to main()'s stack frame, and executing the next instruction after the jsr:

          Stack                                   registers
    |----------------|                     |-----------------------|
    |                |                     | 1                     | r0
    |                |                     |                       | r1
    |     .....      |                     |                       | r2
    |     unused     |                     |                       | r3
    |     old fp     |                     |                       | r4
    | old pc: main+8 | <------------------ |                       | sp
    |    main: i     | <------------------ |                       | fp
    |     used       |                     | main+8: st %r0->[fp]  | pc
    |     ....       |                     |-----------------------|
    |----------------|

Note the "old fp" and "old pc" don't get changed. However since they are "above the stack", they should not get referenced. Now the "st %r0 -> [fp]" gets executed, and the machine state looks like:

          Stack                                   registers
    |----------------|                     |-----------------------|
    |                |                     | 1                     | r0
    |                |                     |                       | r1
    |     .....      |                     |                       | r2
    |     unused     |                     |                       | r3
    |     old fp     |                     |                       | r4
    | old pc: main+8 | <------------------ |                       | sp
    |   main: i: 1   | <------------------ |                       | fp
    |     used       |                     | main+12: ret          | pc
    |     ....       |                     |-----------------------|
    |----------------|

Now main() is over, and calls "ret". You can imagine what this does -- the stack is set up so that when main() calls ret, control returns to the operating system and the process goes away.
Now, make sure you go over this with Jassem. The program is in p1a.jas, and you should see exactly what I have shown above, only you get to see all the memory addresses as well.

    int a(int i)
    {
      int j;

      j = i+1;
      return j;
    }

    main()
    {
      int i;

      i = a(5);
    }

This gets compiled into code like the following:

    a:
        push #4
        ld [fp+12] -> %r0
        add %r0, %g1 -> %r0         / j = i+1 (%g1 supplies the constant 1)
        st %r0 -> [fp]
        ld [fp] -> %r0
        ret

    main:
        push #4
        mov #5 -> %r0
        st %r0 -> [sp]--
        jsr a
        pop #4
        st %r0 -> [fp]
        ret

The only real difference between this example and the last is the argument to a(). It should be clear how a allocates its local variable j on the stack by decrementing the stack pointer. j is then referenced as the location pointed to by the frame pointer. Arguments are passed by the calling procedure by pushing them onto the stack in reverse order (here there is only one argument), and then calling jsr. The procedure knows how to reference the arguments -- they start at the memory location 12 bytes ahead of the fp. Why? Well, [fp] is the beginning of the frame, [fp+4] holds the old frame pointer, and [fp+8] holds the pc to return to when the procedure is over. Thus, if the arguments are pushed onto the stack directly before calling jsr, then they start at [fp+12]. After a returns, main removes the argument from the stack with "pop #4".
This program is in p2.jas, and you should trace through it using Jassem. Make sure you understand how main pushes its argument on the stack, how a finds the argument, and what happens on jsr/ret.
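The offset rule for arguments can be written down as a one-line formula. The sketch below is only an illustration: the names WORD and arg_offset are made up, and it assumes every argument occupies one 4-byte stack slot, which is the case for every example in these notes.

```c
#include <assert.h>

enum { WORD = 4 };   /* every stack slot in these notes is 4 bytes */

/* Byte offset from fp of argument k, where k = 0 is the first argument.
   The three words skipped are [fp] (start of the frame), [fp+4] (old fp)
   and [fp+8] (return pc). */
int arg_offset(int k)
{
    assert(k >= 0);
    return 3 * WORD + k * WORD;
}
```

So arg_offset(0) is 12 and arg_offset(1) is 16. Pushing the arguments in reverse order is what makes the first one end up closest to the frame.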
The act of saving a register's value before the body of a procedure call and restoring it afterwards is called spilling. Different machines and compilers handle spilling in different ways. For example, older CISC architectures sometimes had a spill-mask that would be part of a procedure call. This specified which registers should be spilled, and the machine actually did the spilling for you.
What we do on our machine is a typical spilling solution: Procedures can use r0 and r1 without worrying about their values. However, registers r2 through r4 must be spilled if a procedure uses them.
Here's an example:

    int a(int i, int j)
    {
      int k;

      k = (i+2)*(j-5);
      return k;
    }

If you think about it, there's no way to do that arithmetic using only r0 and r1. So you must spill r2 onto the stack at the beginning of the procedure, and restore it before returning:

    a:
        push #4
        st %r2 -> [sp]--            / spill %r2
        ld [fp+12] -> %r0           / r0 = i+2
        mov #2 -> %r1
        add %r0, %r1 -> %r0
        ld [fp+16] -> %r1           / r1 = j-5
        mov #5 -> %r2
        sub %r1, %r2 -> %r1
        mul %r0, %r1 -> %r0         / r0 = (i+2)*(j-5)
        st %r0 -> [fp]
        ld [fp] -> %r0
        ld ++[sp] -> %r2            / unspill %r2
        ret

Note that you have to spill r2 onto the stack after allocating the local variable. Otherwise, k will not be at [fp]. Think about it.
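If it helps, the spill and unspill can be mimicked in C. This is a toy model, not how jassem works internally: the Regs struct, its small spill stack, and the function name a_sim are all invented for the illustration. The point is only that the caller's value in r2 survives the call even though the procedure uses r2 as scratch space.

```c
#include <assert.h>

/* Hypothetical register file plus a small stack used only for spills. */
typedef struct {
    int r[5];        /* r0 .. r4 */
    int stack[8];
    int top;         /* number of words currently on the spill stack */
} Regs;

/* Computes (i+2)*(j-5) with the same register usage as the assembler above. */
int a_sim(Regs *m, int i, int j)
{
    assert(m->top < 8);
    m->stack[m->top++] = m->r[2];        /* spill r2 */

    m->r[0] = i;
    m->r[1] = 2;
    m->r[0] = m->r[0] + m->r[1];         /* r0 = i+2 */
    m->r[1] = j;
    m->r[2] = 5;                         /* r2 is clobbered here */
    m->r[1] = m->r[1] - m->r[2];         /* r1 = j-5 */
    m->r[0] = m->r[0] * m->r[1];         /* r0 = (i+2)*(j-5) */

    m->r[2] = m->stack[--m->top];        /* unspill r2 */
    return m->r[0];
}
```

If r2 holds 77 before the call, a_sim(&m, 3, 9) returns 20 and r2 still holds 77 afterwards.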

    a(int *p)
    {
      return *p;
    }

Here you simply get the pointer's value into a register, and then dereference the register (i.e. load the value stored at the address it holds). Here's the above example:

    a:
        ld [fp+12] -> %r0           / Get p into r0
        ld [r0] -> %r0              / dereference r0
        ret

A twist on this is when pointer arithmetic is involved -- then you must remember to multiply by the size of the item being pointed to. For example:

    int a(int *p)
    {
      return *(p+2);
    }

Compiles into:

    a:
        ld [fp+12] -> %r0           / Get p into r0
        mov #8 -> %r1               / 2 * sizeof(int) = 8
        add %r0, %r1 -> %r0
        ld [r0] -> %r0              / dereference p+2
        ret
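The scaling that the compiler inserts can be spelled out in C by doing the address arithmetic in bytes. This is just a sketch to show the idea, and load_scaled is a made-up name. On the machine in these notes an int is 4 bytes, so n = 2 gives the byte offset 8 seen above.

```c
#include <assert.h>
#include <stddef.h>

/* Computes *(p + n) the way the assembler does: scale n by the element
   size, add that byte offset to the address, then load. */
int load_scaled(const int *p, size_t n)
{
    const unsigned char *addr = (const unsigned char *)p;  /* p as a byte address */

    addr += n * sizeof(int);       /* the "mov #8 / add" step when n is 2 */
    return *(const int *)addr;     /* the final "ld [r0] -> %r0" */
}
```

For any int array v, load_scaled(v, 2) gives the same value as *(v+2).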

    a(int *p)
    {
      int i;

      i = p[0];
      i = p[5];
      i = p[i];
    }

These are three types of array dereferencing. In the first, we will simply load p into r0 and dereference it. In the second, we load p, add 20, and dereference. In the third, we have to multiply i by 4, add that to p, then dereference:

    a:
        push #4
        ld [fp+12] -> %r0           / i = p[0]
        ld [r0] -> %r0
        st %r0 -> [fp]
        ld [fp+12] -> %r0           / i = p[5]
        mov #20 -> %r1
        add %r0, %r1 -> %r0
        ld [r0] -> %r0
        st %r0 -> [fp]
        ld [fp] -> %r0              / i = p[i]
        mov #4 -> %r1
        mul %r0, %r1 -> %r0
        ld [fp+12] -> %r1
        add %r0, %r1 -> %r0
        ld [r0] -> %r0
        st %r0 -> [fp]
        ret
This concept is also at work when you declare an array as a local variable:

    main()
    {
      int a[5];

      a[2] = 3;
    }

We allocate a by calling "push #20". The five elements occupy [fp-16] through [fp], so the compiler knows, at compile time, that a[0] is at [fp-16] and a[2] is at [fp-8]:

    main:
        push #20
        mov #3 -> %r0
        st %r0 -> [fp-8]
        ret

However, if the compiler is not dealing with constant indices, it must multiply the index by the element size and add the result to the top of the array before dereferencing it. For example:

    int a(int i)
    {
      int b[5];

      return b[i];
    }

This compiles into:

    a:
        push #20
        ld [fp+12] -> %r0           / Multiply i by 4 and put into r0
        mov #4 -> %r1
        mul %r0, %r1 -> %r0
        mov #-16 -> %r1             / Put the top of the array in r1
        add %r1, %fp -> %r1
        add %r1, %r0 -> %r0         / Add 4*i to the top of the array
        ld [r0] -> %r0              / Dereference it and return
        ret

Addresses are similar -- you don't dereference the pointer:

    int *a(int p)
    {
      return &p;
    }

compiles into:

    a:
        mov #12 -> %r0
        add %r0, %fp -> %r0
        ret

Performing double-indirections is also straightforward -- you just have to think it through:

    int a(int **arr, int i, int j)
    {
      return arr[i][j];
    }

compiles into:

    a:
        st %r2 -> [sp]--            / Spill r2 because you'll need it
        ld [fp+16] -> %r0
        mov #4 -> %r1
        mul %r0, %r1 -> %r0
        ld [fp+12] -> %r1
        add %r0, %r1 -> %r0
        ld [r0] -> %r0              / arr[i] is now in r0
        ld [fp+20] -> %r1
        mov #4 -> %r2
        mul %r1, %r2 -> %r1
        add %r0, %r1 -> %r0
        ld [r0] -> %r0              / arr[i][j] is now in r0
        ld ++[sp] -> %r2
        ret

Finally, here are two additional pieces of code with pointers. The first (look at psimp.c) does some straightforward pointer and array operations:

    main()
    {
      int *a, a2[3], i;

      i = 6;
      a = &i;
      a2[1] = i+2;
      *a = 200;
      *(a2+2) = i+5;
    }

Now, to compile this, you need to first figure out where all the locals are going to be. In the code below, I will put them in the following locations:

    i:   [fp]
    a2:  [fp-12] through [fp-4]
    a:   [fp-16]

    main:
        push #20                    / Allocate locals
        st %r2 -> [sp]--            / Spill r2
        mv #6 -> %r0                / i = 6
        st %r0 -> [fp]
        st %fp -> [fp-16]           / a = &i
        ld [fp] -> %r0              / a2[1] = i+2
        mv #2 -> %r1
        add %r0, %r1 -> %r0
        st %r0 -> [fp-8]
        mv #200 -> %r0              / *a = 200
        ld [fp-16] -> %r1
        st %r0 -> [r1]
        ld [fp] -> %r0              / *(a2+2) = i+5
        mv #5 -> %r1
        add %r0, %r1 -> %r0
        mv #-12 -> %r1
        add %fp, %r1 -> %r1
        mv #8 -> %r2
        add %r1, %r2 -> %r1
        st %r0 -> [r1]
        ld ++[sp] -> %r2            / Unspill r2
        ret
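One way to check a hand trace of this code is to run the same five statements natively and look at the locals afterwards. The wrapper below is only a convenience for doing that: the PsimpResult struct and the run_psimp name are invented here and do not appear in psimp.c.

```c
#include <assert.h>

/* Hypothetical struct used only to report the final values of the locals. */
typedef struct {
    int i;
    int a2_1;
    int a2_2;
    int a_points_to_i;   /* 1 if a == &i at the end */
} PsimpResult;

PsimpResult run_psimp(void)
{
    int *a, a2[3], i;
    PsimpResult r;

    i = 6;
    a = &i;
    a2[1] = i + 2;       /* i is still 6 here */
    *a = 200;            /* writes through a, so i changes */
    *(a2 + 2) = i + 5;   /* same slot as a2[2]; uses the new i */

    r.i = i;
    r.a2_1 = a2[1];
    r.a2_2 = a2[2];
    r.a_points_to_i = (a == &i);
    return r;
}
```

The interesting step is "*a = 200": since a holds the address of i, the later "i+5" sees 200 rather than 6, so the slot at [fp-4] should end up holding 205 in the trace.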

    int *a(int *x)
    {
      x[0] += x[2];
      return x+1;
    }

    main()
    {
      int array[3];
      int *ip;

      array[0] = 8;
      array[1] = 9;
      array[2] = 10;
      ip = a(array);
      *ip = *ip+1;
    }

Convince yourself that when you finish running it, ip should be pointing at element array[1], and the elements of array have the following values:

    array[0] = 18
    array[1] = 10
    array[2] = 10
Now, compiling this into assembler is a bit tricky. First, here's a:

    a:
        st %r2 -> [sp]--            / Spill r2
        st %r3 -> [sp]--            / Spill r3
        ld [fp+12] -> %r0           / Load x into r0
        ld [r0] -> %r1              / Load x[0] into r1
        ld [fp+12] -> %r2           / Load x[2] into r2
        mov #8 -> %r3
        add %r2, %r3 -> %r2
        ld [r2] -> %r2
        add %r1, %r2 -> %r2         / Add them and put the result into r2
        st %r2 -> [r0]              / Store r2 into x[0]
        ld [fp+12] -> %r0           / return x+1
        mov #4 -> %r1
        add %r0, %r1 -> %r0
        ld ++[sp] -> %r3            / Restore r3
        ld ++[sp] -> %r2            / Restore r2
        ret

A uses registers r2 and r3, so the first thing it does is spill them to the stack. Next, we load x[0] into r1 and x[2] into r2. Then we add them and store the result back into x[0].
The second statement adds four to x (pointer arithmetic) and returns that value (puts it into r0). Then a restores the spilled registers and returns.
Here's main:

    main:
        push #16                    / array = fp-8, ip = fp-12
        mov #8 -> %r0               / array[0] = 8
        st %r0 -> [fp-8]
        mov #9 -> %r0               / array[1] = 9
        st %r0 -> [fp-4]
        mov #10 -> %r0              / array[2] = 10
        st %r0 -> [fp]
        mov #-8 -> %r0              / push array (fp-8) as the argument
        add %r0, %fp -> %r0
        st %r0 -> [sp]--
        jsr a
        pop #4
        st %r0 -> [fp-12]           / ip = a(array)
        ld [fp-12] -> %r0           / *ip = *ip+1
        ld [r0] -> %r1
        add %g1, %r1 -> %r1
        st %r1 -> [r0]
        ret

First, main allocates 16 bytes worth of local variables. The compiler decides that array will start at fp-8. Ip will be stored at fp-12. This means that array[0] will be at fp-8. Array[1] will be at fp-4, and array[2] will be at fp. Given that, the setting of the three array values is straightforward.
Next, we put array (fp-8) on the stack, and call a. We store the result into ip ([fp-12]). Finally, we add one to *ip.
As before, trace through this with jassem.tcl, and make sure you understand everything.
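As a cross-check on the trace, the same statements can be run natively. In the sketch below the procedure a() is renamed bump_first, and main()'s body is moved into a helper called run_example so that the final state can be inspected. Both names are invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Same body as a() in the notes, under a different (made-up) name. */
int *bump_first(int *x)
{
    x[0] += x[2];
    return x + 1;
}

/* Runs main()'s statements on a caller-supplied array and returns the
   index of the element that ip ends up pointing at. */
ptrdiff_t run_example(int array[3])
{
    int *ip;

    array[0] = 8;
    array[1] = 9;
    array[2] = 10;
    ip = bump_first(array);
    *ip = *ip + 1;
    return ip - array;
}
```

After the call, the returned index is 1 and the array holds 18, 10 and 10. Those are the values you should see at [fp-8], [fp-4] and [fp] when the jassem trace finishes.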

    main(int argc, char **argv)
    {
      char *s;

      s = malloc(atoi(argv[1]));
      read(0, s, 10);
      write(1, s, strlen(s));
    }
You'll note that main is just like any procedure. It assumes that its arguments have been pushed onto the stack in reverse order, starting with [fp+12]. Also, read and write, although system calls, look like regular procedure calls. The difference is that the code for read() and write() that ends up in the instructions of the program actually makes system calls into the operating system.
Below is the assembler for main(). It should be very straightforward. Note that the return value of strlen() is pushed onto the stack as the 3rd argument to write(), just as the return value of atoi() is pushed as the argument to malloc(). Note also how argv[1] is found. Argv is a pointer to an array. Thus, argv[1] is the value stored 4 bytes past the address held in argv.

    main:
        push #4                     / allocate s
        ld [fp+16] -> %r0           / put argv into r0
        mov #4 -> %r1
        add %r1, %r0 -> %r0
        ld [r0] -> %r1              / put argv[1] into r1
        st %r1 -> [sp]--            / push r1 onto the stack
        jsr atoi                    / call atoi(argv[1])
        pop #4
        st %r0 -> [sp]--            / push return value onto the stack
        jsr malloc                  / call malloc
        pop #4
        st %r0 -> [fp]              / store return value in s

                                    / push args to read on the stack:
        mov #10 -> %r0              / 10
        st %r0 -> [sp]--
        ld [fp] -> %r0              / s
        st %r0 -> [sp]--
        mov #0 -> %r0               / 0
        st %r0 -> [sp]--
        jsr read                    / call read
        pop #12

        ld [fp] -> %r0              / push argument to strlen
        st %r0 -> [sp]--
        jsr strlen                  / call strlen
        pop #4
                                    / push args to write() on stack
        st %r0 -> [sp]--            / strlen(s)
        ld [fp] -> %r0              / s
        st %r0 -> [sp]--
        mov #1 -> %r0               / 1
        st %r0 -> [sp]--
        jsr write                   / call write
        pop #12
        ret

I'm not going to show the stack after each instruction. You should trace it yourself. Unfortunately, since read, etc. are not implemented in jassem, you can't use it here. Note how the stack must be adjusted after every procedure call to get the arguments off the stack. Were the compiler to optimize, it could save some operations. For example, after the "jsr atoi", it could skip the pop and the push, overwrite the argument slot that is still on the stack, and just do:

        st %r0 -> [sp+4]
        jsr malloc

However, unoptimized compilers will produce inefficient, albeit easy-to-read, code.