Look at the following C code:
int a()
{
return 1;
}
main()
{
int i;
i = a();
}
This compiles into assembler that looks like the following:
a:
mov #1 -> %r0
ret
main:
push #4
jsr a
st %r0 -> [fp]
ret
So, both of these are straightforward. First, main() allocates
one variable on the stack, and then executes "jsr a", which means
"jump to subroutine a". Once a returns, main() stores the
value in register r0 into the local variable i, which has
been allocated at the location pointed to by the frame pointer.
Then main() exits. The procedure a() is straightforward as well. It
returns the value 1 by storing it in register r0, and then
returning. This seems simple, but what goes on when jsr and ret are executed is a little trickier. This is what happens:
When "jsr" is executed, (pc+4), which is the address of the instruction after the jsr, is pushed onto the stack, followed by the current value of fp. Then the fp is set to the current sp, and the pc is set to the location of the first instruction of the named procedure. This is all done atomically by the computer's hardware. After jsr has taken effect, we are in a new stack frame, and the pc is executing a().
When "ret" is executed, the sp is first set to the current fp. Then the fp is popped off the stack: the sp is incremented by 4, and the fp is set to the value at the new top of the stack. Finally, the pc is popped off the stack the same way: the sp is again incremented by 4, and the pc is set to the value there. Like "jsr", this is all done atomically by the hardware. When "ret" completes, the pc is set to the instruction after the original "jsr", and the calling procedure's stack frame has been restored.
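As a cross-check of the description above, here is a toy C model of jsr and ret. It is only a sketch: the Machine struct, its word-array memory, and the helper names (word, push, pop, push_bytes, jsr, ret) are our own assumptions, not part of jassem. Following the stack pictures below, push stores at [sp] and then moves sp down 4 bytes, and jsr pushes the return address before the old fp.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the jsr/ret conventions described above (not jassem itself).
   Memory is an array of 32-bit words addressed in bytes: the word at byte
   address addr lives in mem[addr / 4]. */
#define MEM_WORDS 256

typedef struct {
    int32_t mem[MEM_WORDS];
    int32_t sp, fp, pc;
} Machine;

/* Return a pointer to the word at a (word-aligned, in-range) byte address. */
static int32_t *word(Machine *m, int32_t addr) {
    assert(addr >= 0 && addr % 4 == 0 && addr / 4 < MEM_WORDS);
    return &m->mem[addr / 4];
}

/* st x -> [sp]-- : store at [sp], then move sp down one word. */
static void push(Machine *m, int32_t x) {
    *word(m, m->sp) = x;
    m->sp -= 4;
}

/* ld ++[sp] -> x : move sp up one word, then load from [sp]. */
static int32_t pop(Machine *m) {
    m->sp += 4;
    return *word(m, m->sp);
}

/* push #n : allocate n bytes of locals by moving sp down. */
static void push_bytes(Machine *m, int32_t n) {
    m->sp -= n;
}

/* jsr target : save the return address and the old fp, start a new frame. */
static void jsr(Machine *m, int32_t target) {
    push(m, m->pc + 4);   /* address of the instruction after the jsr */
    push(m, m->fp);       /* caller's frame pointer */
    m->fp = m->sp;        /* new frame begins at the current sp */
    m->pc = target;
}

/* ret : drop the frame, then pop fp and pc in the reverse order. */
static void ret(Machine *m) {
    m->sp = m->fp;
    m->fp = pop(m);
    m->pc = pop(m);
}
```

For example, starting with sp = fp = 800, push_bytes(&m, 4) followed by jsr(&m, 200) with pc = 4 leaves 800 (the old fp) at [fp+4] and 8 (main+8) at [fp+8]; ret then restores fp = 800, pc = 8, and sp = 796.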
Let's look at it pictorially (you should use jassem.tcl to do this for yourself). At the start, the stack and registers look as follows:
Stack                             Registers
|----------------|                |----------------------|
|                |                |                      | r0
|                |                |                      | r1
| .....          |                |                      | r2
| unused         |                |                      | r3
| unused         |                |                      | r4
| unused         |                |                      | sp
| unused         |  <-- sp, fp    |                      | fp
| used           |                | main: push #4        | pc
| ....           |                |----------------------|
|----------------|
First, the sp is decremented by 4 to allocate the local variable i:
Stack                             Registers
|----------------|                |----------------------|
|                |                |                      | r0
|                |                |                      | r1
| .....          |                |                      | r2
| unused         |                |                      | r3
| unused         |                |                      | r4
| unused         |  <-- sp        |                      | sp
| main: i        |  <-- fp        |                      | fp
| used           |                | main+4: jsr a        | pc
| ....           |                |----------------------|
|----------------|
Now jsr is called. This pushes pc+4 and the
value of the fp on the stack, and sets the fp to
the new sp, and pc to a:
Stack                             Registers
|----------------|                |----------------------|
|                |                |                      | r0
|                |                |                      | r1
| .....          |                |                      | r2
| unused         |  <-- sp, fp    |                      | r3
| old fp         |                |                      | r4
| old pc: main+8 |                |                      | sp
| main: i        |  <-- old fp    |                      | fp
| used           |                | a: mov #1 -> %r0     | pc
| ....           |                |----------------------|
|----------------|
Note now that we have a new stack frame for a,
and the pc is executing a.
The first thing it does is load 1 into r0:
Stack                             Registers
|----------------|                |----------------------|
|                |                | 1                    | r0
|                |                |                      | r1
| .....          |                |                      | r2
| unused         |  <-- sp, fp    |                      | r3
| old fp         |                |                      | r4
| old pc: main+8 |                |                      | sp
| main: i        |  <-- old fp    |                      | fp
| used           |                | a+4: ret             | pc
| ....           |                |----------------------|
|----------------|
Then a executes "ret". "ret" sets the sp to the fp
(which changes nothing here, because a allocated no
local variables), and then pops the fp and the pc off
the stack. When it's done, we're back in main()'s stack
frame, executing the next instruction after the jsr:
Stack                             Registers
|----------------|                |----------------------|
|                |                | 1                    | r0
|                |                |                      | r1
| .....          |                |                      | r2
| unused         |                |                      | r3
| old fp         |                |                      | r4
| old pc: main+8 |  <-- sp        |                      | sp
| main: i        |  <-- fp        |                      | fp
| used           |                | main+8: st %r0->[fp] | pc
| ....           |                |----------------------|
|----------------|
Note that the "old fp" and "old pc" values are not erased. However, since they are
now "above the stack" (past sp), they should not be referenced.
Now the "st %r0 -> [fp]" gets executed, and the machine state looks like:
Stack                             Registers
|----------------|                |----------------------|
|                |                | 1                    | r0
|                |                |                      | r1
| .....          |                |                      | r2
| unused         |                |                      | r3
| old fp         |                |                      | r4
| old pc: main+8 |  <-- sp        |                      | sp
| main: i: 1     |  <-- fp        |                      | fp
| used           |                | main+12: ret         | pc
| ....           |                |----------------------|
|----------------|
Now main() is over, and calls "ret". You can imagine
what this does -- the stack is set up so that when main() calls
ret, control returns to the operating system and the process
goes away.
Now, make sure you go over this with Jassem. The program is in p1a.jas, and you should see exactly what I have shown above, except that you also get to see all the memory addresses.
int a(int i)
{
int j;
j = i+1;
return j;
}
main()
{
int i;
i = a(5);
}
This gets compiled into code like the following:
a:
push #4
ld [fp+12] -> %r0
add %r0, %g1 -> %r0
st %r0 -> [fp]
ld [fp] -> %r0
ret
main:
push #4
mov #5 -> %r0
st %r0 -> [sp]--
jsr a
pop #4
st %r0 -> [fp]
ret
The only real difference between this example and
the last is the argument to a(). It should be
clear how a allocates its local variable j on the
stack by decrementing the stack pointer (push #4). j is
then referenced as the location pointed to by the
frame pointer. (The register %g1 used in the add
always holds the value 1.) Arguments are passed by the
calling procedure by pushing them onto the stack
in reverse order (here there is only one
argument), and then calling jsr; once the call
returns, the caller pops them off again (pop #4).
The procedure knows how to reference the arguments:
they start at the memory location 12 bytes above
the fp. Why? Well, [fp] is the beginning of the
frame (the first local variable). [fp+4] holds the
old frame pointer, and [fp+8] holds the pc to return
to when the procedure is over. Thus, if the
arguments are pushed onto the stack directly
before calling jsr, they start at [fp+12].
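The [fp+12] rule can be checked with another small C sketch. As before, the Machine struct and the helper names (at, push, call, arg) are hypothetical, invented for illustration; call pushes the arguments last-to-first and then does what jsr does.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of argument passing (not jassem itself): the word at byte
   address addr lives in mem[addr / 4]. */
#define MEM_WORDS 256

typedef struct {
    int32_t mem[MEM_WORDS];
    int32_t sp, fp, pc;
} Machine;

static int32_t *at(Machine *m, int32_t addr) {
    assert(addr >= 0 && addr % 4 == 0 && addr / 4 < MEM_WORDS);
    return &m->mem[addr / 4];
}

/* st x -> [sp]-- */
static void push(Machine *m, int32_t x) {
    *at(m, m->sp) = x;
    m->sp -= 4;
}

/* Caller side: push the arguments last-to-first, then do what jsr does. */
static void call(Machine *m, const int32_t *args, int nargs, int32_t target) {
    for (int k = nargs - 1; k >= 0; k--)
        push(m, args[k]);
    push(m, m->pc + 4);   /* jsr: return address, ends up at [fp+8] */
    push(m, m->fp);       /* jsr: old fp, ends up at [fp+4] */
    m->fp = m->sp;
    m->pc = target;
}

/* Callee side: argument k (counting from 0) is at [fp + 12 + 4k]. */
static int32_t arg(Machine *m, int k) {
    return *at(m, m->fp + 12 + 4 * k);
}
```

For a call like a(2, 7), arg(m, 0) reads [fp+12] and returns 2, and arg(m, 1) reads [fp+16] and returns 7. Pushing in reverse order is what puts the first argument closest to the frame.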
This program is in p2.jas, and you should trace through it using Jassem. Make sure you understand how main pushes its argument on the stack, how a finds the argument, and what happens on jsr/ret.
The act of saving a register's value before the body of a procedure call and restoring it afterwards is called spilling. Different machines and compilers handle spilling in different ways. For example, older CISC architectures sometimes had a spill-mask that would be part of a procedure call. This specified which registers should be spilled, and the machine actually did the spilling for you.
What we do on our machine is a typical spilling solution: procedures may overwrite r0 and r1 without saving them, so a caller cannot expect their values to survive a call. However, registers r2 through r4 must be spilled if a procedure uses them, so that the caller sees them unchanged.
Here's an example:
int a(int i, int j)
{
int k;
k = (i+2)*(j-5);
return k;
}
If you think about it, there's no way to do that arithmetic using
only r0 and r1 (at least, not without parking intermediate results
in memory). So you must spill r2 onto the stack at the beginning of
the procedure, and restore it before returning:
a:
push #4
st %r2 -> [sp]-- / spill %r2
ld [fp+12] -> %r0
mov #2 -> %r1
add %r0, %r1 -> %r0
ld [fp+16] -> %r1
mov #5 -> %r2
sub %r1, %r2 -> %r1
mul %r0, %r1 -> %r0
st %r0 -> [fp]
ld [fp] -> %r0
ld ++[sp] -> %r2 / unspill %r2
ret
Note that you have to spill r2 onto the stack after allocating
the local variable. Otherwise, k will not be at [fp]. Think
about it.
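If the "Think about it" is not obvious, this sketch tracks only addresses through the two possible prologue orders for a(). The Layout struct and the prologue function are illustrative names, not jassem features; the sketch relies on the fact that sp == fp on entry, because jsr has just set fp to sp.

```c
#include <assert.h>

/* Where do k and the saved %r2 land for each prologue order?  Only
   addresses are tracked.  On entry sp == fp, because jsr just set fp = sp. */
typedef struct {
    int k_addr;      /* address of the local variable k */
    int spill_addr;  /* address where %r2 is saved */
} Layout;

static Layout prologue(int fp, int spill_first) {
    assert(fp % 4 == 0);  /* frames are word-aligned */
    int sp = fp;
    Layout l;
    if (spill_first) {
        l.spill_addr = sp;  /* st %r2 -> [sp]-- stores at [sp] ... */
        sp -= 4;            /* ... and then moves sp down one word */
        l.k_addr = sp;      /* push #4: k gets the next free word */
    } else {
        l.k_addr = sp;      /* push #4: k gets the first free word, [fp] */
        sp -= 4;
        l.spill_addr = sp;  /* st %r2 -> [sp]--: saved below k, at [fp-4] */
    }
    return l;
}
```

With the order used above (spill_first == 0), prologue(1000, 0) puts k at 1000, which is [fp], and the saved r2 at 996. Spilling first swaps them, so k would sit at [fp-4] and every reference to k would have to change.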
int a(int *p)
{
return *p;
}
Here you simply get the pointer's value into a register, and then
dereference the register (i.e. load the value it points to from
memory). Here's the assembler for the example above:
a:
ld [fp+12] -> %r0 / Get p into r0
ld [r0] -> %r0 / dereference r0
ret
A twist on this is when pointer arithmetic is involved -- then you
must remember to multiply by the size of the item being pointed
to. For example:
int a(int *p)
{
return *(p+2);
}
Compiles into:
a:
ld [fp+12] -> %r0
mov #8 -> %r1
add %r0, %r1 -> %r0
ld [r0] -> %r0
ret
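To see the scaling from the C side, this sketch measures how far p + n moves in bytes, and then reads *(p+2) by adding 2 * sizeof(int) bytes by hand, as the mov #8 above does. The helper names are ours. It runs on the host, so sizeof(int) is whatever the host uses (commonly 4, matching the jassem machine, but not guaranteed).

```c
#include <assert.h>
#include <stddef.h>

/* How many bytes does p + n move?  n * sizeof(*p), so the scale factor
   depends on the pointed-to type.  p + n must stay within the array. */
static ptrdiff_t int_step_bytes(int *p, int n) {
    return (char *)(p + n) - (char *)p;
}

static ptrdiff_t double_step_bytes(double *p, int n) {
    return (char *)(p + n) - (char *)p;
}

/* a(int *p) { return *(p+2); } with the scaling written out by hand,
   the way the compiled code adds 8 (two 4-byte ints) before the load. */
static int third_int(int *p) {
    return *(int *)((char *)p + 2 * sizeof(int));
}
```

Given int xs[4] = {10, 20, 30, 40}, third_int(xs) returns 30, the same as *(xs + 2).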
a(int *p)
{
int i;
i = p[0];
i = p[5];
i = p[i];
}
These are three types of array dereferencing. In the first, we will simply
load p into r0 and dereference it. In the second, we load
p, add 20, and dereference. In the third, we have to multiply i
by 4, add that to p, then dereference:
a:
push #4 / allocate i
ld [fp+12] -> %r0 / i = p[0]
ld [r0] -> %r0
st %r0 -> [fp]
ld [fp+12] -> %r0 / i = p[5]
mov #20 -> %r1
add %r0, %r1 -> %r0
ld [r0] -> %r0
st %r0 -> [fp]
ld [fp] -> %r0 / i = p[i]
mov #4 -> %r1
mul %r0, %r1 -> %r0
ld [fp+12] -> %r1
add %r0, %r1 -> %r0
ld [r0] -> %r0
st %r0 -> [fp]
ret
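The same three dereferences can be written in C with the byte arithmetic made explicit. This is a sketch with hypothetical helper names (load_at, a_by_hand); WORD stands in for the 4-byte int of the jassem machine but uses the host's sizeof(int), and the sketch assumes every index it reads is in bounds and non-negative.

```c
#include <assert.h>
#include <stddef.h>

/* WORD plays the role of the 4-byte int on the jassem machine; the host's
   sizeof(int) is used so the sketch runs anywhere. */
#define WORD sizeof(int)

/* ld [base + byte_offset] : fetch the int stored byte_offset bytes past base. */
static int load_at(const int *base, size_t byte_offset) {
    return *(const int *)((const char *)base + byte_offset);
}

/* The body of a(int *p), with each index turned into a byte offset. */
static int a_by_hand(const int *p) {
    int i;
    i = load_at(p, 0);                 /* p[0]: no offset, just dereference */
    i = load_at(p, 5 * WORD);          /* p[5]: constant offset, 20 on jassem */
    assert(i >= 0);                    /* sketch only handles non-negative i */
    i = load_at(p, (size_t)i * WORD);  /* p[i]: multiply i by the size at run time */
    return i;
}
```

The first two offsets are known at compile time; only the third needs a multiply when the program runs, which is exactly the mov #4 / mul pair in the assembler.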
This concept is also at work when you declare an array as a local variable:
main()
{
int a[5];
a[2] = 3;
}
We allocate a by executing "push #20". The compiler knows, at
compile time, that a[0] is at [fp-16], so a[2] is at [fp-8]:
main:
push #20
mov #3 -> %r0
st %r0 -> [fp-8]
ret
However, if the compiler is not dealing with constant indices, it must multiply the index by 4 and add the result to the address of the top of the array (a[0]) before dereferencing it. For example:
int a(int i)
{
int b[5];
return b[i];
}
This compiles into:
a:
push #20
ld [fp+12] -> %r0 / Multiply i by 4 and put into r0
mov #4 -> %r1
mul %r0, %r1 -> %r0
mov #-16 -> %r1 / Put the top of the array in r1
add %r1, %fp -> %r1
add %r1, %r0 -> %r0 / Add 4*i to the top of the array
ld [r0] -> %r0 / Dereference it and return
ret
Taking an address is similar, except that you compute the address and don't dereference it:
int *a(int p)
{
return &p;
}
compiles into:
a:
mov #12 -> %r0
add %r0, %fp -> %r0
ret
Performing double-indirections is also straightforward -- you just
have to think it through:
int a(int **arr, int i, int j)
{
return arr[i][j];
}
compiles into:
a:
st %r2 -> [sp]-- / Spill r2 because you'll need it
ld [fp+16] -> %r0 / r0 = 4*i
mov #4 -> %r1
mul %r0, %r1 -> %r0
ld [fp+12] -> %r1
add %r0, %r1 -> %r0
ld [r0] -> %r0 / arr[i] is now in r0
ld [fp+20] -> %r1 / r1 = 4*j
mov #4 -> %r2
mul %r1, %r2 -> %r1
add %r0, %r1 -> %r0
ld [r0] -> %r0 / arr[i][j] is now in r0
ld ++[sp] -> %r2 / Unspill r2
ret
Finally, here are two additional pieces of code with pointers. The first (look at psimp.c) does some straightforward pointer and array operations:
main()
{
int *a, a2[3], i;
i = 6;
a = &i;
a2[1] = i+2;
*a = 200;
*(a2+2) = i+5;
}
Now, to compile this, you need to first figure out where all the locals
are going to be. In the code below, I will put i at [fp], a2 at [fp-12]
(so a2[0], a2[1], and a2[2] are at [fp-12], [fp-8], and [fp-4]), and
a at [fp-16]:
main:
push #20 / Allocate locals
st %r2 -> [sp]-- / Spill r2
mov #6 -> %r0 / i = 6
st %r0 -> [fp]
st %fp -> [fp-16] / a = &i
ld [fp] -> %r0 / a2[1] = i+2
mov #2 -> %r1
add %r0, %r1 -> %r0
st %r0 -> [fp-8]
mov #200 -> %r0 / *a = 200
ld [fp-16] -> %r1
st %r0 -> [r1]
ld [fp] -> %r0 / *(a2+2) = i+5
mov #5 -> %r1
add %r0, %r1 -> %r0
mov #-12 -> %r1 / r1 = a2 (fp-12)
add %fp, %r1 -> %r1
mov #8 -> %r2 / r1 = a2+2, i.e. 8 bytes further
add %r1, %r2 -> %r1
st %r0 -> [r1]
ld ++[sp] -> %r2 / Unspill r2
ret
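Running psimp's body in C shows the effect of the aliasing between a and i. psimp() and PsimpState are names made up for this sketch; the values it reports are the ones the assembler above leaves in [fp], [fp-8], and [fp-4].

```c
#include <assert.h>

/* Final state of psimp.c's locals (a2[0] is never set, so it is omitted). */
typedef struct {
    int i;
    int a2_1, a2_2;
    int a_points_at_i;
} PsimpState;

static PsimpState psimp(void) {
    int *a, a2[3], i;
    i = 6;
    a = &i;
    a2[1] = i + 2;      /* 8: computed before *a changes i */
    *a = 200;           /* writes through a, so i becomes 200 */
    *(a2 + 2) = i + 5;  /* 205: same as a2[2] = i + 5 */
    PsimpState s = { i, a2[1], a2[2], a == &i };
    return s;
}
```

The call leaves i == 200, a2[1] == 8, and a2[2] == 205: the store through a changes i before the last line reads it.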
int *a(int *x)
{
x[0] += x[2];
return x+1;
}
main()
{
int array[3];
int *ip;
array[0] = 8;
array[1] = 9;
array[2] = 10;
ip = a(array);
*ip = *ip+1;
}
Convince yourself that when you finish running it, ip should
be pointing at element array[1], and the elements of
array have the following values: array[0] = 18, array[1] = 10,
and array[2] = 10.
Now, compiling this into assembler is a bit tricky. First, here's a:
a:
st %r2 -> [sp]-- / Spill r2
st %r3 -> [sp]-- / Spill r3
ld [fp+12] -> %r0 / Load x into r0
ld [r0] -> %r1 / Load x[0] into r1
ld [fp+12] -> %r2 / Load x[2] into r2
mov #8 -> %r3
add %r2, %r3 -> %r2
ld [r2] -> %r2
add %r1, %r2 -> %r2 / Add them and put the result into r2
st %r2 -> [r0] / Store r2 into x[0]
ld [fp+12] -> %r0 / return x+1
mov #4 -> %r1
add %r0, %r1 -> %r0
ld ++[sp] -> %r3 / Restore r3
ld ++[sp] -> %r2 / Restore r2
ret
Procedure a uses registers r2 and r3, so the first thing it does is spill them
to the stack. Next, we load x[0] into r1 and x[2] into r2. Then we
add them and store the result back into x[0].
The second statement, return x+1, adds four to x (pointer arithmetic: one int is 4 bytes) and returns that value by putting it into r0. Then a restores the spilled registers and returns.
Here's main:
main:
push #16 / array = fp-8, ip = fp-12
mov #8 -> %r0
st %r0 -> [fp-8]
mov #9 -> %r0
st %r0 -> [fp-4]
mov #10 -> %r0
st %r0 -> [fp]
mov #-8 -> %r0
add %r0, %fp -> %r0
st %r0 -> [sp]--
jsr a
pop #4
st %r0 -> [fp-12]
ld [fp-12] -> %r0
ld [r0] -> %r1
add %g1, %r1 -> %r1
st %r1 -> [r0]
ret
First, main allocates 16 bytes worth of local variables. The compiler
decides that array will start at fp-8, and that ip will
be stored at fp-12.
This means that array[0] will be at fp-8,
array[1] will be at fp-4, and
array[2] will be at fp.
Given that, setting the three array values is straightforward.
Next, we push the address of array (fp-8) onto the stack, and call a. We store the result into ip ([fp-12]). Finally, we add one to *ip.
As before, trace through this with jassem.tcl, and make sure you understand everything.
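If you want to check the results with a real C compiler before tracing the assembler, here is the same program with its final state exposed. The wrapper name run is hypothetical; a is unchanged from the example.

```c
#include <assert.h>

static int *a(int *x) {
    x[0] += x[2];   /* x[0] = 8 + 10 */
    return x + 1;   /* points at x[1] */
}

/* main()'s body, with ip handed back to the caller for inspection. */
static void run(int array[3], int **ip_out) {
    int *ip;
    array[0] = 8;
    array[1] = 9;
    array[2] = 10;
    ip = a(array);
    *ip = *ip + 1;
    *ip_out = ip;
}
```

After run(array, &ip), ip == &array[1], and array holds 18, 10, 10.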
main(int argc, char **argv)
{
char *s;
s = malloc(atoi(argv[1]));
read(0, s, 10);
write(1, s, strlen(s));
}
You'll note that main is just like any procedure. It assumes that its arguments have been pushed onto the stack in reverse order, starting with [fp+12]. Also, read and write, although system calls, look like regular procedure calls. The difference is that the code for read() and write() that ends up in the program's instructions actually makes system calls into the operating system.
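Because main is an ordinary procedure, argv is just its second argument (at [fp+16]), and finding argv[1] is ordinary pointer arithmetic. This sketch does that load by hand; second_arg is a made-up name. On the host, sizeof(char *) is often 8 rather than the 4 bytes used on the jassem machine.

```c
#include <assert.h>
#include <stddef.h>

/* argv points to an array of char pointers, so argv[1] is the pointer
   stored one pointer-size past the address held in argv. */
static char *second_arg(char **argv) {
    return *(char **)((char *)argv + sizeof(char *));
}
```

For any argv with at least two entries, second_arg(argv) returns the same pointer as argv[1].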
Below is the assembler for main(). It should be very straightforward. Note that the return value of strlen() is pushed onto the stack as the 3rd argument to write(), and likewise the return value of atoi() is pushed as the argument to malloc(). Note also how argv[1] is found: argv is a pointer to an array of pointers, so argv[1] is the word stored 4 bytes after the address that argv holds.
main:
push #4 / allocate s
ld [fp+16] -> %r0 / put argv into r0
mov #4 -> %r1
add %r1, %r0 -> %r0
ld [r0] -> %r1 / put argv[1] into r1
st %r1 -> [sp]-- / push r1 onto the stack
jsr atoi / call atoi(argv[1])
pop #4
st %r0 -> [sp]-- / push return value onto the stack
jsr malloc / call malloc
pop #4
st %r0 -> [fp] / store return value in s
/ push args to read on the stack:
mov #10 -> %r0 / 10
st %r0 -> [sp]--
ld [fp] -> %r0 / s
st %r0 -> [sp]--
mov #0 -> %r0 / 0
st %r0 -> [sp]--
jsr read / call read
pop #12
ld [fp] -> %r0 / push argument to strlen
st %r0 -> [sp]--
jsr strlen / call strlen
pop #4
/ push args to write() on stack
st %r0 -> [sp]-- / strlen(s)
ld [fp] -> %r0 / s
st %r0 -> [sp]--
mov #1 -> %r0 / 1
st %r0 -> [sp]--
jsr write / call write
pop #12
ret
I'm not going to show the stack after each instruction. You should
trace it yourself. Unfortunately, since read() and the other library
calls are not implemented in jassem, you can't use it here.
Note how the stack must be adjusted after every
procedure call to get the arguments off the stack. If the compiler
optimized, it could save some operations. For example, after the
"jsr atoi", instead of popping atoi's argument and pushing the return
value, it could simply overwrite the argument slot and call malloc:
st %r0 -> [sp+4]
jsr malloc
However, unoptimized compilers produce inefficient, albeit easy to read, code.