We'll assume that our machine has 8 general-purpose registers in the CPU. All are 4 bytes, and can be read or written by the user. The first five are named r0, r1, r2, r3, r4. The last three registers are special: sp (the stack pointer), fp (the frame pointer), and pc (the program counter).
Additionally, the computer has three read-only registers, which always contain the same values:
Finally, the computer also has two special registers that the user cannot access directly:
In other words, if the pc contains the value 0x2040, then the IR is executing the instruction contained in the 4 bytes starting at memory address 0x2040.
Assembly code is a readable encoding of instructions. A program called an assembler converts assembly code into the proper 0's and 1's that compose the program. If you call gcc with the -S flag, it will produce a .s file containing the assembly code for that C program. Without the -S flag, it produces the machine instructions directly.
(Try it on a Linux machine -- the assembly code it produces looks much more like this than what Solaris machines produce.)
    ld mem -> %reg        Load the value of the register from memory.
    st %reg -> mem        Store the value of the register into memory.

There are a few ways to address memory:
    st %r0 -> i           Store the value of register r0 into the memory
                          location of global variable i.

    st %r0 -> [r1]        Treat the value of register r1 as a pointer to a
                          memory location, and store the value of r0 in
                          that memory location.

    st %r0 -> [fp+4]      Treat the value of the frame pointer as a pointer
                          to a memory location, and store the value of r0
                          in the memory location 4 bytes after that
                          location. You can use any value, positive or
                          negative, not just 4. However, you cannot use a
                          register (i.e. you can't do st %r0 -> [fp+r2]).
                          This only works with the frame pointer. It does
                          not work with any other register.

    st %r0 -> [sp]--      Treat the value of register sp as a pointer to a
                          memory location, store the value of r0 into that
                          memory location, and then subtract 4 from the
                          value of sp.

    st %r0 -> ++[sp]      Treat the value of register sp as a pointer to a
                          memory location. First, add 4 to that value,
                          then store the value of r0 into that memory
                          location.
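To make the register-based addressing modes concrete, here is a minimal C sketch of the four store forms that go through a register. It is only an illustration under stated assumptions: the 64-word memory array and the names mem, store_word, st_indirect, st_fp_offset, st_sp_postdec and st_sp_preinc are hypothetical and are not part of the machine described here.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_WORDS 64               /* tiny stand-in for the address space (assumption) */

uint32_t mem[MEM_WORDS];           /* memory, one array slot per 4-byte word */
uint32_t sp, fp;                   /* stack and frame pointers, as byte addresses */

/* Write a 4-byte value at a byte address, which must be word aligned. */
static void store_word(uint32_t addr, uint32_t value)
{
    assert(addr % 4 == 0 && addr / 4 < MEM_WORDS);
    mem[addr / 4] = value;
}

/* st %r0 -> [r1]: the register r1 holds the target address. */
void st_indirect(uint32_t r0, uint32_t r1) { store_word(r1, r0); }

/* st %r0 -> [fp+off]: constant offset (positive or negative) from fp. */
void st_fp_offset(uint32_t r0, int32_t off) { store_word(fp + (uint32_t)off, r0); }

/* st %r0 -> [sp]--: store at sp, then subtract 4 from sp. */
void st_sp_postdec(uint32_t r0) { store_word(sp, r0); sp -= 4; }

/* st %r0 -> ++[sp]: add 4 to sp, then store at the new sp. */
void st_sp_preinc(uint32_t r0) { sp += 4; store_word(sp, r0); }
```

Note the asymmetry between the last two forms: [sp]-- uses the old value of sp as the address, while ++[sp] uses the updated one.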
    mov %reg -> %reg      Copy a register's value to another
    mov #val -> %reg      register, or set its value to a constant.

All arithmetic goes from register to register:
    add %reg1, %reg2 -> %reg3       Add reg1 & reg2 and put the sum in reg3.
    sub %reg1, %reg2 -> %reg3       Subtract reg2 from reg1.
    mul %reg1, %reg2 -> %reg3       Multiply reg1 & reg2.
    idiv %reg1, %reg2 -> %reg3      Do integer division of reg2 into reg1.
    imod %reg1, %reg2 -> %reg3      Do reg1 mod reg2.

There are two special instructions that let you perform addition and subtraction on the stack pointer:
    push %reg             Subtract the value contained in reg, or the
    push #val             constant val, from the stack pointer.

    pop %reg              Add the value of %reg or #val to the stack
    pop #val              pointer.
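As a rough C sketch of these two instructions: push and pop here only move the stack pointer. Unlike the push instruction on many real machines, nothing is written to memory. The starting value of sp is an assumption chosen to match the top of the address space used in these notes.

```c
#include <stdint.h>

uint32_t sp = 0x80000000u;   /* stack pointer; this starting value is an assumption */

/* push #val (or push %reg): subtract val from sp, reserving val bytes. */
void push(uint32_t val) { sp -= val; }

/* pop #val (or pop %reg): add val to sp, releasing val bytes. */
void pop(uint32_t val) { sp += val; }
```

For example, push(8) followed later by pop(8) reserves room for two 4-byte values and then gives it back.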
    jsr a                 Call the subroutine starting at instruction a.
    ret                   Return from a subroutine.

There are also "compare" and "branch" instructions, which are how you implement for and if statements, but I won't go over them yet.
Finally, there are also "directives" which are not really code, but specify that memory must be allocated for variables. For example:
    .globl i              Allocate 4 bytes in the globals segment for the
                          variable i.
The program counter points to where the instruction register must go to load its value. On normal instructions, the pc is incremented by 4 so that the next instruction can be loaded. On control instructions, the pc gets a new value, allowing the machine to call subroutines, perform "if-then" statements, etc.
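The way the pc drives execution can be sketched as a small loop in C. This is an illustration only: the instruction kinds NORMAL, JUMP and HALT, the struct instr layout and the function run are hypothetical stand-ins, and HALT exists just so the loop can stop.

```c
#include <stdint.h>

/* Hypothetical instruction kinds: just enough to show how the pc changes. */
enum kind { NORMAL, JUMP, HALT };

struct instr {
    enum kind kind;
    uint32_t target;   /* new pc value; used only when kind == JUMP */
};

/* Run the instructions in code[], where code[k] lives at byte address 4*k,
   starting at address pc. Returns the number of instructions executed. */
int run(const struct instr *code, uint32_t pc)
{
    int executed = 0;

    for (;;) {
        struct instr ir = code[pc / 4];      /* load the IR from the address in pc */
        executed++;
        if (ir.kind == HALT) break;
        if (ir.kind == JUMP) pc = ir.target; /* control instruction: pc gets a new value */
        else pc += 4;                        /* normal instruction: step to the next one */
    }
    return executed;
}
```

With a four-instruction program whose second instruction jumps to address 12, the instruction at address 8 is never loaded into the IR.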
Generally, a process treats memory like a huge array of bytes. However, the bytes are organized logically into units of 4 bytes each, as that is the size of the registers. We assume that this memory is of size 0x80000000 (this is hexadecimal). Some machines, like our sparcs, assume it is 0x100000000, but we'll assume the smaller size here. The code starts at address 0 (typically it starts at a higher address, but for simplicity, we'll say zero here). The globals follow the code, and the heap follows the globals. Note that as a program executes, the heap and stack might grow and shrink, but the code and globals stay the same size. The stack grows from back to front, starting at address 0x80000000 (actually, starting at 0x7fffffff), and growing towards the heap. In between the heap and stack is unused memory:
The program's address space:

    |--------------------------|<------- 0x00000000
    |                          |
    |          code            |
    |                          |
    |--------------------------|
    |                          |
    |         globals          |
    |                          |
    |--------------------------|
    |                          |
    |          heap            |
    |                          |
    |                          |
    |vvvvvvvvvvvvvvvvvvvvvvvvvv|
    |       (grows down)       |
    |                          |
      ....  Unused memory ....
    |                          |
    |       (grows up)         |
    |^^^^^^^^^^^^^^^^^^^^^^^^^^|
    |                          |
    |          stack           |
    |                          |
    |--------------------------|<------- 0x80000000
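The sizes involved are easy to check with a little arithmetic. The following C sketch works out how big an address space of 0x80000000 bytes is; the function names are hypothetical, and first_stack_word simply assumes that the stack's first 4-byte word is the last aligned word below 0x80000000.

```c
#include <stdint.h>

#define ADDRESS_SPACE 0x80000000u   /* size assumed in these notes */

/* Size of the address space in megabytes (1 MB = 1024 * 1024 bytes). */
uint32_t size_in_mb(void) { return ADDRESS_SPACE / (1024u * 1024u); }

/* Number of 4-byte words in the address space. */
uint32_t size_in_words(void) { return ADDRESS_SPACE / 4u; }

/* Address of the last aligned 4-byte word, which covers bytes
   0x7ffffffc through 0x7fffffff. */
uint32_t first_stack_word(void) { return ADDRESS_SPACE - 4u; }
```

So the address space is 2048 MB, or 2 GB, which is far more than the code, globals, heap and stack of a typical program will ever occupy.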
    int i;
    int j;

    main()
    {
      i = 1;
      j = 2;
      j = i + j;
    }

Will compile into the following assembly code:
            .globl i
            .globl j
    main:
            mov #1 -> %r0           / i = 1
            st %r0 -> i
            mov #2 -> %r0           / j = 2
            st %r0 -> j
            ld i -> %r0             / j = i + j
            ld j -> %r1
            add %r0,%r1 -> %r1
            st %r1 -> j
            ret

This code is pretty straightforward. Each statement in C has a corresponding set of instructions in assembler. Unless your compiler is smart, it will produce inefficient code. For example, you can probably see that:
            .globl i
            .globl j
    main:
            mov #1 -> %r0
            mov #2 -> %r1
            add %r0,%r1 -> %r1
            st %r1 -> j
            st %r0 -> i
            ret

would work just as well, and has fewer instructions. If you call gcc with the -O flag, it will attempt to optimize your code so that it has fewer instructions. However, normally, gcc simply produces straightforward, unoptimized code.
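One way to convince yourself that the two instruction sequences are equivalent is to model the registers and globals as plain C variables and mirror each instruction by hand. This is only a sketch: the function names unoptimized and optimized are hypothetical, and the ret instruction is left out.

```c
/* Globals and registers of the machine, modeled as plain C variables. */
int i, j;
int r0, r1;

/* The straightforward translation: 8 instructions before the ret. */
void unoptimized(void)
{
    r0 = 1;              /* mov #1 -> %r0      */
    i = r0;              /* st %r0 -> i        */
    r0 = 2;              /* mov #2 -> %r0      */
    j = r0;              /* st %r0 -> j        */
    r0 = i;              /* ld i -> %r0        */
    r1 = j;              /* ld j -> %r1        */
    r1 = r0 + r1;        /* add %r0,%r1 -> %r1 */
    j = r1;              /* st %r1 -> j        */
}

/* The shorter version: 5 instructions, each global is stored once. */
void optimized(void)
{
    r0 = 1;              /* mov #1 -> %r0      */
    r1 = 2;              /* mov #2 -> %r1      */
    r1 = r0 + r1;        /* add %r0,%r1 -> %r1 */
    j = r1;              /* st %r1 -> j        */
    i = r0;              /* st %r0 -> i        */
}
```

Both versions leave i equal to 1 and j equal to 3. The shorter one saves its instructions by keeping the values in registers instead of storing them to memory and loading them back.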
Now, suppose instead that we had the following code to run:
    main()
    {
      int i, j;

      i = 1;
      j = 2;
      j = i + j;
    }

Since i and j are local variables, they must come from temporary storage: the stack. How does the stack work? It is governed by the sp and fp registers. The sp and fp designate what is known as a "frame" on the stack. The fp points to the bottom of the frame, and the sp points to the top. All memory locations above (less than) the stack pointer are considered unused. Thus, we can get new temporary memory by decrementing the sp, which puts more memory locations into the current stack frame.
For example, when a procedure is first called, these two registers point to the same place in the stack. The frame is considered empty.
           ....
    |----------------|
    |     unused     |
    |----------------|
    |     unused     | <------------- sp, fp
    |----------------|
    |      used      |
    |----------------|
    |      used      |
    |----------------|
           ....
    |----------------|
    |      used      |
    |----------------| <---------- Location 0x80000000

To allocate room for the two local variables i and j, we decrement the stack pointer by 8. This allocates two 4-byte quantities in the current stack frame. By convention, we'll call the lower one j, and the upper one i. This is something that the compiler defines. We could just as easily have called the lower one i, and the upper one j:
           ....
    |----------------|
    |     unused     | <------------- sp
    |----------------|
    |       i        |
    |----------------|
    |       j        | <------------- fp
    |----------------|
    |      used      |
    |----------------|
    |      used      |
    |----------------|
           ....
    |----------------|
    |      used      |
    |----------------| <---------- Location 0x80000000

Now, the code for main() is just like before, only instead of accessing i and j as global variables, we access them as offsets to the frame pointer.
    main:
            push #8                 / This allocates i and j
            mov #1 -> %r0
            st %r0 -> [fp-4]        / Set i to 1
            mov #2 -> %r0
            st %r0 -> [fp]          / Set j to 2
            ld [fp-4] -> %r0
            ld [fp] -> %r1
            add %r0,%r1 -> %r1      / Add i and j and put the result
            st %r1 -> [fp]          /   back into j
            ret

Let's look at what happens when main() is executed:
          Stack
    |----------------|
    |                |
    |                |
    |                |
    |                |
    |                |
    |                |
    |                |                     registers
    |                |                |-----------------|
    |                |                |                 | r0
    |                |                |                 | r1
    |     .....      |                |                 | r2
    |     unused     |                |                 | r3
    |     unused     |                |                 | r4
    |     unused     |      /-------- |                 | sp
    |     unused     | <------------- |                 | fp
    |      used      |                |      main       | pc
    |      ....      |                |-----------------|
    |----------------|

Note that the fp and sp point to the base of the empty stack frame. The pc points to the beginning of the main routine. This is the instruction "push #8". When this is done executing, we have the following:
          Stack                            registers
    |----------------|                |-----------------|
    |                |                |                 | r0
    |                |                |                 | r1
    |     .....      |                |                 | r2
    |     unused     |                |                 | r3
    |     unused     | <-----------\  |                 | r4
    |       i        |              \-|                 | sp
    |       j        | <------------- |                 | fp
    |      used      |                |    main + 4     | pc
    |      ....      |                |-----------------|
    |----------------|

Space has been allocated on the current stack frame for i and j, and the pc has been incremented. It now points to the instruction "mov #1 -> %r0". This puts the machine into the following state:
          Stack                            registers
    |----------------|                |-----------------|
    |                |                |        1        | r0
    |                |                |                 | r1
    |     .....      |                |                 | r2
    |     unused     |                |                 | r3
    |     unused     | <-----------\  |                 | r4
    |       i        |              \-|                 | sp
    |       j        | <------------- |                 | fp
    |      used      |                |    main + 8     | pc
    |      ....      |                |-----------------|
    |----------------|

Now, the pc points to "st %r0 -> [fp-4]". When this is done, the location for i is set to the value 1:
          Stack                            registers
    |----------------|                |-----------------|
    |                |                |        1        | r0
    |                |                |                 | r1
    |     .....      |                |                 | r2
    |     unused     |                |                 | r3
    |     unused     | <-----------\  |                 | r4
    |      i=1       |              \-|                 | sp
    |       j        | <------------- |                 | fp
    |      used      |                |    main + 12    | pc
    |      ....      |                |-----------------|
    |----------------|

After the next two instructions, the state of the machine will be:
            mov #2 -> %r0
            st %r0 -> [fp]

          Stack                            registers
    |----------------|                |-----------------|
    |                |                |        2        | r0
    |                |                |                 | r1
    |     .....      |                |                 | r2
    |     unused     |                |                 | r3
    |     unused     | <-----------\  |                 | r4
    |      i=1       |              \-|                 | sp
    |      j=2       | <------------- |                 | fp
    |      used      |                |    main + 20    | pc
    |      ....      |                |-----------------|
    |----------------|

Finally, the last 4 instructions do:
            ld [fp-4] -> %r0
            ld [fp] -> %r1
            add %r0,%r1 -> %r1
            st %r1 -> [fp]

          Stack                            registers
    |----------------|                |-----------------|
    |                |                |        1        | r0
    |                |                |        3        | r1
    |     .....      |                |                 | r2
    |     unused     |                |                 | r3
    |     unused     | <-----------\  |                 | r4
    |      i=1       |              \-|                 | sp
    |      j=3       | <------------- |                 | fp
    |      used      |                |    main + 36    | pc
    |      ....      |                |-----------------|
    |----------------|
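The whole trace can also be replayed in C by modeling a small piece of the stack as an array and executing one statement per instruction. This is a sketch under stated assumptions: the 64-byte modeled stack, the starting value of sp and fp, and the names stack_words, word_at and run_main are all hypothetical.

```c
#include <stdint.h>

#define STACK_TOP  0x80000000u      /* top of the address space, as in these notes */
#define STACK_SIZE 64               /* bytes of stack modeled here (an assumption) */

static int32_t stack_words[STACK_SIZE / 4];
uint32_t sp, fp;                    /* stack and frame pointers, as byte addresses */
int32_t r0, r1;

/* Map a byte address near the top of memory to a slot in stack_words[]. */
static int32_t *word_at(uint32_t addr)
{
    return &stack_words[(addr - (STACK_TOP - STACK_SIZE)) / 4];
}

/* Execute main() from the notes, one C statement per instruction. */
void run_main(void)
{
    sp = fp = STACK_TOP - 16;       /* empty frame; 12 "used" bytes sit at higher addresses */
    sp -= 8;                        /* push #8                     */
    r0 = 1;                         /* mov #1 -> %r0               */
    *word_at(fp - 4) = r0;          /* st %r0 -> [fp-4]   (i = 1)  */
    r0 = 2;                         /* mov #2 -> %r0               */
    *word_at(fp) = r0;              /* st %r0 -> [fp]     (j = 2)  */
    r0 = *word_at(fp - 4);          /* ld [fp-4] -> %r0            */
    r1 = *word_at(fp);              /* ld [fp] -> %r1              */
    r1 = r0 + r1;                   /* add %r0,%r1 -> %r1          */
    *word_at(fp) = r1;              /* st %r1 -> [fp]     (j = 3)  */
}
```

After run_main() returns, r0 holds 1, r1 holds 3, the word at fp-4 (i) holds 1, the word at fp (j) holds 3, and sp sits 8 bytes below fp, matching the final picture above.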
You can get this file locally at /blugreen/homes/plank/cs360/bin/jassem.tcl, or on the web at http://www.cs.utk.edu/~plank/plank/classes/cs360/360/bin/jassem.tcl .
The nice thing about tcl/tk is that it works on Unix, Windows, and Macintosh. To use jassem.tcl on our machines, simply run:

    UNIX> wish ~plank/cs360/bin/jassem.tcl

Wish should be installed on all of our machines.
To use jassem.tcl on a Windows or Macintosh machine, you will need to install tcl/tk. This is free -- get the code from www.scriptics.com.
In running jassem.tcl, the first thing you do is load a program, such as p1-g.jas (the program above that adds global variables) or p1.jas (the program above that adds local variables). You should see a picture of the system -- stack, registers, globals and code, as below:
Now you can step through the program, looking at everything as you go. Make sure you understand each step as you go through it. This is a very helpful tool.
>1. Does the term uniprocessor mean that we have only one CPU, or is there
>   something else I should understand?

That is correct -- one CPU (as opposed to a parallel processor that has many CPUs).
>2. In the register <--> memory operation
>       st %r0 --> [r1]
>   the notes said "... treat the value of register r1 as a pointer to a memory
>   location ...". Except for the pc, fp and the sp do we ever know where in
>   memory a register is pointing ?

You can assume that the pc points to the code, and the fp/sp both point to memory in the stack. Even these assumptions can be violated in some systems if you are doing complex stuff (I won't go into it). Otherwise, you cannot assume that a register is pointing to a specific memory segment. r1's pointer can point to the code, globals, heap or stack.
>3. Does each process assume that the address space is 0x80000000 ? If this is
>   the number of bytes it looks huge to me. I got exactly 2048Mb. The hydras,
>   for example, have only 96Mb of RAM.
>   I got 2048 by doing
>       8*(16^7)/(1024^2).
>   Am I making any wrong assumptions ? Things don't seem right.

Yes, the process assumes that memory is an array of 2 GB. However, it won't use all 2 GB. In particular, the addresses between the stack and heap are unused, and they compose the bulk of the address space. Even though a machine may have much less than 2 GB of RAM, the system is set up to look as though each process can access 2 GB. This is called "virtual memory", and is something that we'll learn about in CS560. In a few weeks, we'll see how your interface to memory is limited. In particular, usually your code and globals segment is smaller than a megabyte. On my machine, the OS does not allow the stack to grow larger than 8 MB or so (type "limit" and look at "stacksize"), and it does not allow the heap to grow too much larger than 96 MB. If you don't believe me -- try it:
This (test1.c) tests the heap:
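    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
      char *s;

      /* The number of bytes to allocate is the only argument. */
      if (argc != 2) { fprintf(stderr, "usage: test1 bytes\n"); exit(1); }

      s = (char *) malloc(atoi(argv[1]));
      if (s == NULL) { perror("malloc"); exit(1); }
      printf("malloc %d worked\n", atoi(argv[1]));
      return 0;
    }

(This is on hydra1a)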
    UNIX> test1 90000000
    malloc 90000000 worked
    UNIX> test1 100000000
    malloc 100000000 worked
    UNIX> test1 110000000
    malloc: Not enough memory
    UNIX>

This (test2.c) tests the stack:
    #include <stdio.h>

    int main()
    {
      char s[9000000];   /* a local array larger than the stack limit */

      printf("s = 0x%lx\n", (unsigned long) s);
      printf("%d\n", *s);
      return 0;
    }

(again on hydra1a)
    UNIX> limit stacksize
    stacksize       8192 kbytes
    UNIX> test2
    Segmentation fault (core dumped)
    UNIX>

(If I change the 9000000 to 8000000, I get):

    UNIX> test2
    s = 0xef85da08
    0
    UNIX>
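A program can also ask for its own stack limit instead of relying on the shell's "limit" command. The sketch below uses the POSIX getrlimit() call with RLIMIT_STACK; the helper name stack_limit_kb and its return-value convention are assumptions made for this example.

```c
#include <sys/resource.h>

/* Return the soft stack-size limit in kilobytes, 0 if the limit is
   unlimited, or -1 if the query fails. */
long stack_limit_kb(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0) return -1;
    if (rl.rlim_cur == RLIM_INFINITY) return 0;
    return (long)(rl.rlim_cur / 1024);
}
```

On a machine configured like the one above, this would be expected to report the same 8192 kbytes that "limit stacksize" shows, though the value depends entirely on how the system and shell are set up.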