Learning Objectives

RISC-V Assembly

RISC-V assembly works like any other assembly language and closely resembles MIPS assembly. As in any assembly, a program is a list of instructions, each of which moves us one small step closer to the solution.

We will be using the riscv-g++ compiler and linking C++ files with assembly files. You will write the assembly files, and the C++ files help make the lab a little bit easier.

Assembly Files

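Assembly files end in .S (capital S). The compiler driver runs every stage (preprocessing, compiling, assembling, and linking), but when we pass it a file ending in a capital .S, it runs the C preprocessor on the file and then goes straight to the assembling stage, skipping compilation.

We can also use a lowercase .s, but that skips the preprocessor stage as well, so preprocessor directives such as #include and #define will not be processed. In this class, use a capital S.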

vim myfile.S

RISC-V Register File

RISC-V has 32 integer registers (x0 through x31) and 32 floating-point registers (f0 through f31). Through their ABI names, some of these registers are reserved for certain purposes. Registers whose names start with t (temporary) can be used for any purpose, but their values are not preserved across function calls. Registers whose names start with a (argument) pass arguments to a function. Registers whose names start with s (saved), except sp, the stack pointer, are preserved across function calls.
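
As a quick reference, here is a small C++ sketch that maps each integer register number to its standard ABI name and reports whether a called function must preserve it. The helper names abi_name and preserved_across_calls are hypothetical, written only for illustration; gp and tp are special-purpose registers that lab code should not touch, so this sketch simply reports them as not preserved.

```cpp
#include <cassert>
#include <string>

// Maps an integer register number (x0-x31) to its standard ABI name.
std::string abi_name(int x) {
    static const char* const names[32] = {
        "zero", "ra", "sp",  "gp",  "tp", "t0", "t1", "t2",  // x0-x7
        "s0",   "s1", "a0",  "a1",  "a2", "a3", "a4", "a5",  // x8-x15
        "a6",   "a7", "s2",  "s3",  "s4", "s5", "s6", "s7",  // x16-x23
        "s8",   "s9", "s10", "s11", "t3", "t4", "t5", "t6"   // x24-x31
    };
    assert(x >= 0 && x < 32);
    return names[x];
}

// True for registers a function must restore before it returns:
// sp (x2) and the saved registers s0-s1 (x8-x9) and s2-s11 (x18-x27).
bool preserved_across_calls(int x) {
    return x == 2 || x == 8 || x == 9 || (x >= 18 && x <= 27);
}
```

For example, abi_name(10) returns "a0" and abi_name(2) returns "sp", while preserved_across_calls(5) (t0) is false.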

Integer Instructions

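RISC-V provides integer arithmetic and logic instructions as well as a few memory instructions. RISC-V is a load/store architecture: only load and store instructions access memory, so the operands of integer instructions must be registers (or, for instructions such as addi, a small immediate value).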

Instruction             Description
lb t0, 8(sp)            Loads (dereferences) the value at memory address (sp + 8) into register t0. lb = load byte, lh = load halfword, lw = load word, ld = load doubleword.
sb t0, 8(sp)            Stores the value in register t0 to memory address (sp + 8). sb = store byte, sh = store halfword, sw = store word, sd = store doubleword.
add a0, t0, t1          Adds the value of t0 to the value of t1 and stores the sum in a0.
addi a0, t0, -10        Adds the immediate value -10 to the value of t0 and stores the sum in a0.
sub a0, t0, t1          Subtracts the value of t1 from the value of t0 and stores the difference in a0.
mul a0, t0, t1          Multiplies the value of t0 by the value of t1 and stores the product in a0.
div a1, s3, t3          Divides the value of s3 (numerator) by the value of t3 (denominator) and stores the quotient in a1.
rem a1, s3, t3          Divides the value of s3 (numerator) by the value of t3 (denominator) and stores the remainder in a1.
and a3, t3, s3          Performs a bitwise AND of t3 and s3 and stores the result in a3.
or a3, t3, s3           Performs a bitwise OR of t3 and s3 and stores the result in a3.
xor a3, t3, s3          Performs a bitwise XOR of t3 and s3 and stores the result in a3.
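
The table covers ordinary operands. As a sketch of what div and rem compute on 64-bit operands, the C++ below follows the rules of the RISC-V M extension (the function names rv_div and rv_rem are hypothetical). The quotient is truncated toward zero, and the remainder takes the sign of the numerator. Instead of trapping, division by zero gives a quotient of -1 (all bits set) and a remainder equal to the numerator; the one overflow case, the most negative value divided by -1, gives back the numerator with a remainder of 0.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// What "div rd, numerator, denominator" leaves in rd (signed, 64-bit).
std::int64_t rv_div(std::int64_t numerator, std::int64_t denominator) {
    if (denominator == 0) return -1;  // division by zero: all bits set
    if (numerator == std::numeric_limits<std::int64_t>::min() && denominator == -1)
        return numerator;             // overflow: the result is the numerator
    return numerator / denominator;   // C++ also truncates toward zero
}

// What "rem rd, numerator, denominator" leaves in rd (signed, 64-bit).
std::int64_t rv_rem(std::int64_t numerator, std::int64_t denominator) {
    if (denominator == 0) return numerator;  // division by zero: the numerator
    if (numerator == std::numeric_limits<std::int64_t>::min() && denominator == -1)
        return 0;                            // overflow: remainder is 0
    return numerator % denominator;          // sign follows the numerator
}
```

For example, rv_div(-7, 2) is -3 and rv_rem(-7, 2) is -1, and rv_div(5, 0) is -1 rather than a crash.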

Since RISC-V is a reduced instruction set, it leaves out many instructions whose job another instruction can already do. For example, there is no real neg a0, a1 (two's complement negation) instruction in the hardware. However, it is equivalent to sub a0, zero, a1. In other words, 0 - a1 is the same as -a1.
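
To check the claim that 0 - a1 equals -a1, here is a small C++ sketch (function names hypothetical) that negates a 64-bit value the way sub does (0 - x) and, for comparison, the two's complement way (flip the bits, add one). It works in unsigned arithmetic, which wraps around like the hardware and avoids C++'s undefined behavior on signed overflow; converting the result back to a signed type assumes wrap-around conversion, which C++20 guarantees and GCC and Clang already provide.

```cpp
#include <cassert>
#include <cstdint>

// neg a0, a1 is really sub a0, zero, a1: compute 0 - x.
std::int64_t neg_via_sub(std::int64_t x) {
    return static_cast<std::int64_t>(0 - static_cast<std::uint64_t>(x));
}

// The same value computed as "flip every bit, then add one".
std::int64_t neg_via_not(std::int64_t x) {
    return static_cast<std::int64_t>(~static_cast<std::uint64_t>(x) + 1);
}
```

Both functions agree for every input, including the most negative value, which (as on the hardware) negates to itself.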

Pseudo Instructions

The assembler provides several pseudoinstructions, which expand into real instructions. For example, neg above is a pseudoinstruction: whenever the assembler reads it, it automatically expands it into the sub instruction. Other pseudoinstructions used in this lab include li (load immediate), la (load address), mv (copy a register), j (jump), call, ret, and the branches bgt and ble.
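
To make the idea of expansion concrete, here is a C++ sketch of hypothetical helpers that rewrite a few pseudoinstructions into the real instructions they stand for. The expansions shown for mv, neg, not, bgt, and ble are the standard ones; pseudoinstructions such as li, la, and call can expand into one or more instructions depending on their operands, so they are left out of this sketch.

```cpp
#include <cassert>
#include <string>

// Expands a two-register pseudoinstruction "op rd, rs" into a real instruction.
std::string expand(const std::string& op, const std::string& rd, const std::string& rs) {
    if (op == "mv")  return "addi " + rd + ", " + rs + ", 0";   // copy = add 0
    if (op == "neg") return "sub " + rd + ", zero, " + rs;      // 0 - rs
    if (op == "not") return "xori " + rd + ", " + rs + ", -1";  // flip all bits
    return "";  // not covered by this sketch
}

// bgt and ble are blt and bge with the two operands swapped.
std::string expand_branch(const std::string& op, const std::string& rs1,
                          const std::string& rs2, const std::string& label) {
    if (op == "bgt") return "blt " + rs2 + ", " + rs1 + ", " + label;  // a > b  is  b < a
    if (op == "ble") return "bge " + rs2 + ", " + rs1 + ", " + label;  // a <= b is  b >= a
    return "";  // not covered by this sketch
}
```

For example, expand("neg", "a0", "a1") yields "sub a0, zero, a1", matching the equivalence described above.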

Floating Point Instructions

The floating-point instructions are prefixed with an f, such as fld and fsw for floating-point load doubleword and floating-point store word, respectively. The floating-point instructions come in two flavors: (1) single-precision (a 32-bit float) and (2) double-precision (a 64-bit double). For arithmetic instructions, you select the data size by adding a suffix: either .s (for single-precision) or .d (for double-precision).

# Load two single-precision values from the stack
flw     ft0, 0(sp)
# ft0 now contains the 4-byte float loaded from memory address sp + 0
flw     ft1, 4(sp)
# ft1 now contains the 4-byte float loaded from memory address sp + 4
fadd.s  ft2, ft0, ft1
# ft2 is now ft0 + ft1

Notice that in the code above, we used the fadd.s instruction to tell the RISC-V processor to add two single-precision values (ft0 and ft1) and store the sum as a single-precision value in ft2.

We can convert between double and single precision using the instructions fcvt.d.s (convert from single into double) or the fcvt.s.d (convert from double to single).
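
In C++ terms, fcvt.d.s and fcvt.s.d correspond to converting between float and double. The sketch below assumes IEEE 754 float and double, which is what the RISC-V F and D extensions use, and the default round-to-nearest rounding mode; the function names are hypothetical, chosen to echo the instructions.

```cpp
#include <cassert>

// fcvt.d.s: single -> double. Every float is exactly representable
// as a double, so this conversion never loses information.
double fcvt_d_s(float x) { return static_cast<double>(x); }

// fcvt.s.d: double -> single. The value is rounded to the nearest
// float, so precision can be lost.
float fcvt_s_d(double x) { return static_cast<float>(x); }
```

For example, converting 0.1 to float and back does not give 0.1 exactly, while converting 0.5f to double and back is exact. The sizes also explain the offsets in the flw example: a float is 4 bytes, so consecutive floats sit at 0(sp) and 4(sp).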

Branching Instructions

Branching instructions let us jump to different parts of our code. Without them, the CPU could only execute one instruction after another, top to bottom. With jumps and branches, we can go to any instruction, even out of order!

Jumps and branches are how function calls, loops, and conditionals are implemented in assembly. Branching refers specifically to the "conditional jump" instructions, such as beq, bne, bgt, bge, blt, and ble for branch if equal, not equal, greater than, greater than or equal, less than, and less than or equal, respectively. (bgt and ble are pseudoinstructions: the assembler turns them into blt and bge with the operands swapped.)

The branching instructions take three operands: the two registers to compare, and a label marking the instruction to jump to if the comparison is true. If the comparison is false, the branch does nothing and the CPU continues with the next instruction below it.

    # t0 = 0 (the loop counter i), t2 = 10 (the loop bound)
    li      t0, 0
    li      t2, 10
loop_head:
    bge     t0, t2, loop_end
    # Repeated code goes here
    addi    t0, t0, 1
    j       loop_head
loop_end:

The assembly code above implements the following C++ loop.

for (int i = 0; i < 10; i++) {
    // Repeated code goes here.
}

Notice that I used the "contrary" view of the condition. In a for loop, as long as the condition holds true, we execute the body of the loop. In assembly, I took the opposite. I'm saying if t0 is greater than or equal to t2 (>= is the opposite of <), then jump OUT of the loop and be done.

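Taking the contrary view saves instructions: while the loop condition holds, the branch is not taken and execution simply falls through into the body, so a single branch is enough to leave the loop. Testing the original condition directly would need one branch into the body plus an extra jump out of the loop.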
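
The same loop can be written in C++ with goto, which makes the one-to-one mapping with the assembly visible: each statement carries the instruction it stands for in a comment. This is a sketch for illustration only (the function name run_loop and the body_runs counter are hypothetical); real C++ code should use the for loop.

```cpp
#include <cassert>

// The assembly loop, one C++ statement per instruction.
int run_loop() {
    long t0 = 0;                  // li   t0, 0
    long t2 = 10;                 // li   t2, 10
    int body_runs = 0;            // counts how often the body executes
loop_head:
    if (t0 >= t2) goto loop_end;  // bge  t0, t2, loop_end (contrary of i < 10)
    ++body_runs;                  // repeated code goes here
    t0 = t0 + 1;                  // addi t0, t0, 1
    goto loop_head;               // j    loop_head
loop_end:
    return body_runs;
}
```

Calling run_loop() returns 10, the same number of iterations as the for loop.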

Using the Stack

The stack is used for local memory storage. It grows from the bottom (high memory addresses) toward the top (low memory addresses), and a dedicated register called sp (stack pointer) holds the address of the current top of the stack.

Whenever we use a saved register, or want to preserve a temporary register across a function call, we must save it on the stack. To allocate space on the stack, we subtract from sp; to deallocate, we add to it. Notice that we don't "clean" the stack. This is why uninitialized local variables in C++ are considered "garbage": whatever was last left in that stack memory is still there.

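The stack MUST stay aligned to 8 bytes, meaning we must always subtract and add a multiple of 8 from/to sp. (The standard RISC-V calling convention is stricter: sp must be a multiple of 16 whenever a function is entered.)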

addi    sp, sp, -8
sd      ra, 0(sp)
call    printf
ld      ra, 0(sp)
addi    sp, sp, 8

The code above saves the return address on the stack, calls printf, and then when printf returns, we load the old value of the return address back off the stack, and then deallocate by adding 8.
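
To see why deallocating does not erase anything, here is a toy C++ model of the stack. It is an illustration only (the ToyStack type and its member names are hypothetical, not how a real CPU or the C++ runtime manages memory): a byte array where sp starts at the high end, moves down to allocate, and moves back up to deallocate.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

struct ToyStack {
    static constexpr std::size_t kSize = 64;
    unsigned char mem[kSize] = {};
    std::size_t sp = kSize;  // empty stack: sp sits just past the highest byte

    // addi sp, sp, delta (delta must be a multiple of 8 in this class)
    void addi_sp(long delta) {
        assert(delta % 8 == 0);
        sp = static_cast<std::size_t>(static_cast<long>(sp) + delta);
    }
    // sd value, offset(sp): copy 8 bytes into the stack
    void sd(std::uint64_t value, std::size_t offset) {
        std::memcpy(&mem[sp + offset], &value, sizeof(value));
    }
    // ld offset(sp): copy 8 bytes back out of the stack
    std::uint64_t ld(std::size_t offset) const {
        std::uint64_t value;
        std::memcpy(&value, &mem[sp + offset], sizeof(value));
        return value;
    }
};
```

For example, calling addi_sp(-8), sd(0x1234, 0), addi_sp(8), and then addi_sp(-8) again leaves ld(0) returning 0x1234: nothing cleaned the slot, which is exactly the "garbage" an uninitialized local variable would see.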

C++ to Assembly Conversion

A compiler's job is to convert .cpp files into assembly files. An assembler then assembles each assembly file into machine code stored in an object file, and a linker links all of the object files together into an executable or a library.

We know that our C++ code boils down to assembly, so whatever we can do in C++, we can also do in assembly. I showed how to write a for loop above; now let's take a look at the other C++ constructs.


Functions

A function is just a label marking its very first instruction. The application binary interface (ABI) specifies which registers hold which parameters and how values are passed back and forth. In addition, functions typically have a prologue, which sets up a stack frame for local storage, and an epilogue, which usually reloads the saved registers and the return address and restores the stack pointer before returning.

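// C++ side: extern "C" makes the linker look for the plain name my_function
extern "C" void my_function();

# Assembly side
.section .text
.global my_function
my_function:
    # Prologue
    addi    sp, sp, -32
    sd      ra, 0(sp)
    sd      a0, 8(sp)
    sd      s0, 16(sp)
    sd      s1, 24(sp)

    # Function body goes here

    # Epilogue
    ld      ra, 0(sp)
    ld      a0, 8(sp)
    ld      s0, 16(sp)
    ld      s1, 24(sp)
    addi    sp, sp, 32
    ret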

This code first allocates 32 bytes from the stack, enough for four 8-byte registers. Notice that I subtract all of the space I need from the stack first, then store the values, run my code, and finally execute the epilogue. This is why the store and load instructions take an offset: each register gets its own 8-byte slot in the frame (0, 8, 16, and 24 bytes above sp).

Another thing to note is that I'm saving the caller-saved registers I need to keep (ra and a0). After any function call, we must consider every caller-saved register destroyed; that includes all temporary, argument, and return address registers. I also saved some saved registers (s0 and s1) above, because if we use a saved register, we are required to put its original value back before we return.

We want exactly one prologue and one epilogue per function, so that whenever we call other functions, our data sits in a single, well-defined stack frame. (In programming languages courses, you will hear much more about stack frames.) So, we allocate ALL of the space the function needs at once, then store into it.
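
One way to picture the frame is as a C++ struct laid over the 32 bytes starting at sp. The struct name MyFunctionFrame is hypothetical and exists only for illustration; the static_asserts check that each member's offset matches the offset used by the sd and ld instructions in the prologue and epilogue above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Each member is one 8-byte slot of the 32-byte frame.
struct MyFunctionFrame {
    std::uint64_t ra;  // sd ra, 0(sp)
    std::uint64_t a0;  // sd a0, 8(sp)
    std::uint64_t s0;  // sd s0, 16(sp)
    std::uint64_t s1;  // sd s1, 24(sp)
};

static_assert(sizeof(MyFunctionFrame) == 32, "addi sp, sp, -32 allocates this much");
static_assert(offsetof(MyFunctionFrame, ra) == 0, "ra slot");
static_assert(offsetof(MyFunctionFrame, a0) == 8, "a0 slot");
static_assert(offsetof(MyFunctionFrame, s0) == 16, "s0 slot");
static_assert(offsetof(MyFunctionFrame, s1) == 24, "s1 slot");
```

Saving one more register would mean adding a member, and changing the allocation in the prologue and the deallocation in the epilogue to match the new size.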

Conditionals work the same way: we branch past any block whose condition does not hold.

    bne     t0, zero, 1f
    # Code goes here if t0 == 0
    j       2f
1:
    bne     t1, zero, 1f
    # Code goes here if t1 == 0
    j       2f
1:
    # Code goes here if t0 != 0 and t1 != 0
2:
    # Dumping point is here.

The assembly code above mirrors the following C++ code.

if (!t0) {
    // Code goes here if t0 == 0
}
else if (!t1) {
    // Code goes here if t1 == 0
}
else {
    // Code goes here if t0 != 0 and t1 != 0
}
// Dumping point is here.

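If you don't remember, 1f refers to the nearest numeric label 1 FORWARD of (after) the current instruction. This is the opposite of 1b, which refers to the nearest numeric label 1 BACKWARD of (before) it. Because the assembler always picks the nearest match, the same number can be reused as a label several times in one file.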
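
As with the loop, the if / else if / else chain can be written in C++ with goto so that each statement lines up with one instruction. In this sketch the function name which_block and its return codes (0, 1, or 2, naming the block that ran) are hypothetical, added only so the control flow can be checked; C++ labels cannot be numbers, so 1 and 2 become label_1a, label_1b, and label_2.

```cpp
#include <cassert>

int which_block(long t0, long t1) {
    int ran;
    if (t0 != 0) goto label_1a;  // bne t0, zero, 1f
    ran = 0;                     // code goes here if t0 == 0
    goto label_2;                // j   2f
label_1a:
    if (t1 != 0) goto label_1b;  // bne t1, zero, 1f
    ran = 1;                     // code goes here if t1 == 0
    goto label_2;                // j   2f
label_1b:
    ran = 2;                     // code goes here if t0 != 0 and t1 != 0
label_2:
    return ran;                  // dumping point
}
```

For example, which_block(0, 5) returns 0, which_block(3, 0) returns 1, and which_block(3, 4) returns 2.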

Using Printf

Printf requires that its first parameter be a C-style, null-terminated string, which we can create using the .asciz assembler directive (it stores the string followed by a zero byte). The following code gives an example of how to use printf.

.section .rodata
prompt: .asciz "Value of t0 = %ld and value of t1 = %ld\n"
.section .text
    addi    sp, sp, -8
    sd      ra, 0(sp)
    la      a0, prompt
    mv      a1, t0
    mv      a2, t1
    call    printf
    ld      ra, 0(sp)
    addi    sp, sp, 8

The code above puts the first parameter to printf, the format string we want to output, in a0 (la loads the string's address). We also want to output the values of t0 and t1, so those need to be moved into the next parameter registers, a1 and a2, respectively.
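
For comparison, the assembly above performs the C++ call printf("Value of t0 = %ld and value of t1 = %ld\n", t0, t1): the format string goes in a0, t0 in a1, and t1 in a2. %ld matches the registers because long is 64 bits on 64-bit RISC-V Linux. The sketch below (function name hypothetical) uses std::snprintf instead, so the text can be checked rather than printed. Note that snprintf takes a buffer and its size first, so for that call those two would occupy a0 and a1 and the format string would move to a2.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Produces the same text the printf call above would print.
std::string format_like_printf(long t0, long t1) {
    char buffer[128];
    std::snprintf(buffer, sizeof(buffer),
                  "Value of t0 = %ld and value of t1 = %ld\n", t0, t1);
    return buffer;
}
```

For example, format_like_printf(3, -7) returns "Value of t0 = 3 and value of t1 = -7\n".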

Any time you see a function call, you should be thinking about saving the return address register, like I did above. I might not start off by using the stack, but every time I type "call", my fingers automatically start typing the code to save the RA (return address) register. Also, remember to always deallocate before you return!

Application Binary Interface (ABI)

We have 8 argument registers, a0 through a7. These carry the first 8 non-floating-point parameters passed to a function. That includes pointers, where aX contains a memory address, and values passed by value, where aX contains the actual value. Floating-point parameters use their own registers, fa0 through fa7.

The ABI further states that we have to return an integer value via a0 or a floating point value via fa0.

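If a function mixes integer and floating-point parameters, each kind fills its own registers in order: integer and pointer parameters take the next unused register from a0 through a7, and floating-point parameters take the next unused register from fa0 through fa7. For example, consider the following prototype.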

float func(int a, int *b, float c);

This function requires that int a be in register a0, that a1 hold the pointer b (the memory address that b points to), and that the value of float c be in fa0. Since func returns a float, the result must be put into fa0 before executing the ret instruction.
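
The assignment rule can be sketched in a few lines of C++. The ParamKind enum and assign_arg_registers helper are hypothetical, and the sketch is deliberately simplified: it assumes at most 8 integer/pointer and 8 floating-point parameters, treats float and double alike, and ignores structs and variadic functions such as printf, which the full calling convention handles differently.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class ParamKind { Integer, Pointer, Float };

// Returns the argument register assigned to each parameter, in order.
std::vector<std::string> assign_arg_registers(const std::vector<ParamKind>& params) {
    std::vector<std::string> regs;
    int next_int = 0;    // index of the next free aN register
    int next_float = 0;  // index of the next free faN register
    for (ParamKind p : params) {
        if (p == ParamKind::Float) {
            regs.push_back("fa" + std::to_string(next_float++));
        } else {
            regs.push_back("a" + std::to_string(next_int++));
        }
    }
    return regs;
}
```

For the prototype above, assign_arg_registers({ParamKind::Integer, ParamKind::Pointer, ParamKind::Float}) yields a0, a1, fa0.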


Take note that we use a0, a1, ..., a7 for every size: byte, word, doubleword, etc. Remember that we select the data size by choosing the instruction. For float versus double, we choose the .s versus the .d version of the instruction. For example, fadd.s fa0, ft0, ft1 adds single-precision values and fadd.d fa0, ft0, ft1 adds double-precision values.