4. Writing Inline Assembly

In embedded programming, there are occasions when you need to get right down to the machine level. Inline assembly is often used for this purpose. This section provides a primer on writing inline assembly code and integrating it with your C programs.

1. Basic Assembly Instructions

Before diving into inline assembly, it's useful to know some basic assembly instructions. The table below summarizes some of the most commonly used ARM Cortex-M4 assembly instructions:

Instruction	Description	Example	Plain English Description
`MOV`	Copies a value into a register	`MOV R0, R1`	Copy the value in R1 into R0 (R1 is unchanged)
`ADD`	Adds values	`ADD R0, R0, #1`	Add 1 to the value in R0
`SUB`	Subtracts values	`SUB R0, R0, #1`	Subtract 1 from the value in R0
`MUL`	Multiplies values	`MUL R0, R1, R2`	Multiply R1 and R2, store result in R0
`BL`	Branch with Link (calls a function)	`BL my_function`	Call the function `my_function`
`BX`	Branch to the address held in a register (used to return from a function)	`BX LR`	Return from the function by jumping to the address in LR
`LDR`	Load from memory	`LDR R0, [R1]`	Load value at address in R1 into R0
`STR`	Store to memory	`STR R0, [R1]`	Store the value in R0 at address in R1
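`CMP`	Compares values and sets the condition flags	`CMP R0, #0`	Compare the value in R0 with 0
`BNE`	Branch if Not Equal (uses the flags set by a previous CMP)	`BNE loop_start`	If the last comparison was not equal (here, R0 is not 0), jump to `loop_start`
`BEQ`	Branch if Equal (uses the flags set by a previous CMP)	`BEQ exit_loop`	If the last comparison was equal (here, R0 is 0), jump to `exit_loop`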
`B`	Unconditional Branch	`B loop_end`	Jump to `loop_end` unconditionally
`PUSH`	Push register onto stack	`PUSH {R0}`	Push R0 onto the stack
`POP`	Pop value from stack into register	`POP {R0}`	Pop top of stack into R0
`NOP`	No Operation	`NOP`	Do nothing (the processor may remove a NOP from the pipeline, so it is not a reliable way to add delay)
`AND`	Bitwise AND	`AND R0, R0, R1`	Perform AND on R0 and R1, store in R0
`ORR`	Bitwise OR	`ORR R0, R0, R1`	Perform OR on R0 and R1, store in R0
`EOR`	Bitwise Exclusive OR	`EOR R0, R0, R1`	Perform XOR on R0 and R1, store in R0
`LSL`	Logical Shift Left	`LSL R0, R0, #2`	Shift R0 left by 2 bits
`LSR`	Logical Shift Right	`LSR R0, R0, #2`	Shift R0 right by 2 bits

Please refer to Chapter 3 ("The Cortex-M4 Instruction Set") of the ARM Cortex-M4 Generic User Guide for a complete list of instructions.
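As a sketch of how CMP and the conditional branches combine into a loop, the hypothetical function sum_to_n below adds n + (n - 1) + ... + 1. It assumes GCC-style inline assembly on a Thumb-2 core such as the Cortex-M4 and borrows the operand syntax (%0, "+r", clobbers) covered in section 3; 1: and 2: are GNU assembler local labels, referenced as 1b (backward) and 2f (forward). On any other target, a plain-C fallback computes the same value, so the function can also be checked on a host PC.

```c
#include <stdint.h>

/* Hypothetical example: returns n + (n - 1) + ... + 1 (modulo 2^32). */
uint32_t sum_to_n(uint32_t n)
{
    uint32_t acc = 0;
#if defined(__arm__) && defined(__thumb2__)
    /* Assumes GCC-style inline assembly on a Thumb-2 core such as the Cortex-M4. */
    __asm volatile (
        "1:                 \n\t"
        "CMP  %1, #0        \n\t"   /* compare the counter with 0; sets the flags */
        "BEQ  2f            \n\t"   /* counter == 0: leave the loop */
        "ADD  %0, %0, %1    \n\t"   /* acc = acc + counter */
        "SUB  %1, %1, #1    \n\t"   /* counter = counter - 1 */
        "B    1b            \n\t"   /* jump back to the CMP */
        "2:"
        : "+r" (acc), "+r" (n)      /* both are read and modified */
        :                           /* no input-only operands */
        : "cc");                    /* CMP changes the condition flags */
#else
    /* Host fallback with the same result. */
    while (n != 0) {
        acc += n;
        n -= 1;
    }
#endif
    return acc;
}
```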

Square Brackets

Note that the square brackets [ ] are used to denote memory access, specifically dereferencing the address stored in a register.

R1: When you see just the register (e.g., R1), it refers to the value stored in that register.
[R1]: When you see a register inside square brackets (e.g., [R1]), it means you're working with the value stored in memory at the address that is in R1.

For example:

LDR R0, [R1]: This instruction will load into R0 the value stored in memory at the address contained in R1.
STR R0, [R1]: This instruction will store the value in R0 into the memory location whose address is stored in R1.
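In C terms, the register inside the brackets acts like a pointer, and [R1] acts like *ptr. The hypothetical helpers load_word and store_word below make that correspondence explicit. The assembly path assumes GCC-style inline assembly on a Thumb-2 target (operands are explained in section 3); other targets use the equivalent pointer dereference.

```c
#include <stdint.h>

/* Hypothetical helper mirroring LDR R0, [R1]: addr plays the role of R1. */
uint32_t load_word(const volatile uint32_t *addr)
{
#if defined(__arm__) && defined(__thumb2__)
    /* Assumes GCC-style inline assembly on a Thumb-2 core. */
    uint32_t value;                                  /* plays the role of R0 */
    __asm volatile ("LDR %0, [%1]" : "=r" (value) : "r" (addr) : "memory");
    return value;
#else
    return *addr;                                    /* [R1] corresponds to *addr */
#endif
}

/* Hypothetical helper mirroring STR R0, [R1]. */
void store_word(volatile uint32_t *addr, uint32_t value)
{
#if defined(__arm__) && defined(__thumb2__)
    __asm volatile ("STR %0, [%1]" : : "r" (value), "r" (addr) : "memory");
#else
    *addr = value;                                   /* [R1] = R0 */
#endif
}
```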

Curly Braces

Curly braces {} in ARM assembly are typically used for register lists, especially in instructions that work with multiple registers at once. These braces can be used to specify a range or a list of registers.

For example:

PUSH {R0, R1, R2}: This will push the contents of registers R0, R1, and R2 onto the stack.
POP {R0, R1, R2}: This will pop three words off the stack into R0, R1, and R2; the word at the lowest address (the top of the stack) goes into R0.

You can also specify ranges:

PUSH {R0-R3}: This will push R0, R1, R2, and R3 onto the stack.
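The order in which registers land on the stack is fixed by register number, not by how the list is written: PUSH moves SP down by 4 bytes per register and stores the lowest-numbered register at the lowest address, and POP reads them back the same way. The plain-C model below sketches that behavior. The names model_stack, model_push, and model_pop are hypothetical, SP is simplified to a word index, and there is no overflow checking; it simulates the rule rather than running real PUSH/POP instructions.

```c
#include <stdint.h>

#define MODEL_STACK_WORDS 16u   /* hypothetical stack size for the model */

/* Software model of a full-descending stack (the kind PUSH/POP use on Cortex-M).
 * sp is a word index pointing at the most recently pushed word;
 * sp == MODEL_STACK_WORDS means the stack is empty. */
typedef struct {
    uint32_t mem[MODEL_STACK_WORDS];
    uint32_t sp;
} model_stack;

/* Model of PUSH {list}: regs[] holds the listed registers in ascending
 * register-number order (e.g. R0, R1, R2). No overflow checking. */
void model_push(model_stack *s, const uint32_t *regs, uint32_t count)
{
    s->sp -= count;                      /* SP drops by one word per register */
    for (uint32_t i = 0; i < count; i++)
        s->mem[s->sp + i] = regs[i];     /* lowest register -> lowest address */
}

/* Model of POP {list}: the inverse of model_push. */
void model_pop(model_stack *s, uint32_t *regs, uint32_t count)
{
    for (uint32_t i = 0; i < count; i++)
        regs[i] = s->mem[s->sp + i];     /* lowest address -> lowest register */
    s->sp += count;                      /* SP rises by one word per register */
}
```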

Immediate Values

In ARM assembly, the # symbol marks an immediate value: a constant encoded directly in the instruction.

`#`:

Example: MOV R0, #4
In this example, the immediate value 4 is loaded directly into register R0.

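`=`:

LDR Rd, =constant is a pseudo-instruction for loading a 32-bit constant that may not fit in an immediate operand. The = tells the assembler to get the constant into the register however it can: if a MOV or MVN instruction can encode it, the assembler may use one of those instead; otherwise it stores the constant in a literal pool (a data area placed near the code) and generates a PC-relative LDR that loads it.
Example: LDR R1, =0x12345678
In this case, 0x12345678 is too large to fit into an immediate operand, so the assembler stores the constant in a literal pool and generates a PC-relative LDR instruction to load it into R1. The examples in the next subsection write the constant as =#0x20001000; the assembler used there accepted the extra #, as the disassembly shows, but the usual form is =0x20001000.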
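Whether a constant fits depends on the instruction encoding. As a sketch, the hypothetical function below checks the Thumb-2 "modified immediate" rule used by 32-bit data-processing instructions such as ADD, AND, ORR, and MOV on the Cortex-M4: an 8-bit value, an 8-bit value repeated in one of three byte patterns, or an 8-bit value with its top bit set rotated right by 8 to 31 bits. (MOVW can additionally encode any 16-bit value.) It is not an assembler API; it only illustrates why neither 0x12345678 nor the 0x20001000 address used in the next subsection can be an immediate operand.

```c
#include <stdint.h>

/* Rotate left; r must be in 1..31 so neither shift is undefined. */
static uint32_t rol32(uint32_t x, unsigned r)
{
    return (x << r) | (x >> (32u - r));
}

/* Hypothetical checker: returns 1 if v is a valid Thumb-2 modified immediate. */
int is_thumb2_modified_imm(uint32_t v)
{
    uint32_t lo = v & 0xFFu;           /* byte in bits 0-7  */
    uint32_t b1 = (v >> 8) & 0xFFu;    /* byte in bits 8-15 */

    if (v <= 0xFFu)             return 1;  /* 0x000000XY */
    if (v == lo * 0x00010001u)  return 1;  /* 0x00XY00XY */
    if (v == b1 * 0x01000100u)  return 1;  /* 0xXY00XY00 */
    if (v == lo * 0x01010101u)  return 1;  /* 0xXYXYXYXY */

    /* Otherwise: an 8-bit value 1bbbbbbb rotated right by 8..31 bits.
     * Undo the rotation by rotating left and test the result. */
    for (unsigned r = 8; r <= 31; r++) {
        uint32_t unrotated = rol32(v, r);
        if (unrotated >= 0x80u && unrotated <= 0xFFu)
            return 1;
    }
    return 0;
}
```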

2. Inline Assembly: Basic Syntax

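To include assembly code in a C program compiled with GCC, use the __asm volatile(...) statement. The volatile qualifier tells the compiler that the statement has side effects, so it must not delete it or move it out of a loop even if its results appear unused, which is crucial when we're doing low-level hardware manipulation. (Statements with no output operands, like the ones in this subsection, are treated as volatile anyway; writing volatile explicitly does no harm.)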

Basic Example

You can write one instruction per __asm statement:

__asm volatile("LDR R1,=#0x20001000");
__asm volatile("LDR R2,=#0x20001004");
__asm volatile("LDR R0,[R1]");
__asm volatile("LDR R1,[R2]");
__asm volatile("ADD R0,R0,R1");
__asm volatile("STR R0,[R2]");

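This code reads the 32-bit values at addresses 0x20001000 and 0x20001004, adds them, and stores the sum back at 0x20001004. Be aware that these statements use R0, R1, and R2 without telling the compiler, and GCC does not guarantee that register contents survive from one asm statement to the next; this style is only safe in a minimal program like this one, where main does nothing else.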

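Another way to write this is as a single __asm statement, with each instruction in its own string literal ending in \n\t. A single statement also guarantees that the instructions are emitted together, with no compiler-generated code in between: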

__asm volatile("LDR R1,=#0x20001000\n\t"
               "LDR R2,=#0x20001004\n\t"
               "LDR R0,[R1]\n\t"
               "LDR R1,[R2]\n\t"
               "ADD R0,R0,R1\n\t"
               "STR R0,[R2]\n\t");

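Note the \n\t at the end of each instruction. The C compiler joins adjacent string literals into one string and passes that text to the assembler without interpreting it, so the \n is what ends each instruction. Without it, the instructions would run together on a single line (for example, "LDR R1,=#0x20001000LDR R2,=#0x20001004"), and the assembler would reject it. The \t only indents the next instruction in the generated assembly output, for readability.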
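The effect of leaving out the separators can be checked in plain C, since it is the C compiler that joins adjacent string literals. In the sketch below, the array names and the count_line_breaks helper are hypothetical; each array holds exactly the text that an asm statement built from the same literals would hand to the assembler.

```c
#include <stddef.h>
#include <string.h>

/* Adjacent string literals are joined by the C compiler, so these arrays hold
 * the same text an asm statement built from these literals would pass on. */
static const char with_separators[] =
    "LDR R0,[R1]\n\t"
    "ADD R0,R0,R1\n\t";

static const char without_separators[] =
    "LDR R0,[R1]"
    "ADD R0,R0,R1";            /* one malformed line: "LDR R0,[R1]ADD R0,R0,R1" */

/* Hypothetical helper: counts the '\n' characters, i.e. the line breaks
 * that end assembler statements. */
size_t count_line_breaks(const char *text)
{
    size_t count = 0;
    for (const char *p = strchr(text, '\n'); p != NULL; p = strchr(p + 1, '\n'))
        count++;
    return count;
}
```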

Build the code and check the disassembly (located at ./Debug/002InlineAssembly.list) to see how the compiler has translated your inline assembly code into machine code:

08000204 <main>:
 */

#include <stdint.h>

int main(void)
{
 8000204:   b480        push    {r7}
 8000206:   af00        add r7, sp, #0
    __asm volatile("LDR R1,=#0x20001000");
 8000208:   4903        ldr r1, [pc, #12]   ; (8000218 <main+0x14>)
    __asm volatile("LDR R2,=#0x20001004");
 800020a:   4a04        ldr r2, [pc, #16]   ; (800021c <main+0x18>)
    __asm volatile("LDR R0,[R1]");
 800020c:   6808        ldr r0, [r1, #0]
    __asm volatile("LDR R1,[R2]");
 800020e:   6811        ldr r1, [r2, #0]
    __asm volatile("ADD R0,R0,R1");
 8000210:   4408        add r0, r1
    __asm volatile("STR R0,[R2]");
 8000212:   6010        str r0, [r2, #0]
    for(;;);
 8000214:   e7fe        b.n 8000214 <main+0x10>
 8000216:   0000        .short  0x0000
 8000218:   20001000    .word   0x20001000
 800021c:   20001004    .word   0x20001004

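Each inline instruction appears in the listing next to the machine code generated for it. Note that each LDR Rn,=#... pseudo-instruction became a PC-relative ldr reading from the literal pool placed right after the function (the .word entries at 0x8000218 and 0x800021c). To see the code in action, use the debugger to step through the instructions one by one. First open the Memory window and set the words at addresses 0x20001000 and 0x20001004 to some test values: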

[Figure: Memory window with test values at 0x20001000 and 0x20001004]

Now step through the instructions while watching the Registers window. After the final STR, R0 should hold the sum of the two values, and the word at 0x20001004 should have been overwritten with that sum:

[Figure: Registers window showing the result]
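As a sketch of a more robust version of this example, the hypothetical add_words function below performs the same read-add-store but lets the compiler choose the registers through operands, declares its memory access with a "memory" clobber (both covered in section 3), and marks its temporaries "=&r" (early-clobber) so they never share a register with the address inputs that are still needed. It assumes GCC-style inline assembly on a Thumb-2 target and falls back to plain C elsewhere.

```c
#include <stdint.h>

/* Hypothetical helper: *b = *a + *b, the same read-add-store as the example
 * above, without hard-coded registers or addresses. */
void add_words(const volatile uint32_t *a, volatile uint32_t *b)
{
#if defined(__arm__) && defined(__thumb2__)
    /* Assumes GCC-style inline assembly on a Thumb-2 core such as the Cortex-M4. */
    uint32_t x, y;
    __asm volatile (
        "LDR %0, [%2]     \n\t"   /* x = *a */
        "LDR %1, [%3]     \n\t"   /* y = *b */
        "ADD %0, %0, %1   \n\t"   /* x = x + y */
        "STR %0, [%3]"            /* *b = x */
        : "=&r" (x), "=&r" (y)    /* early-clobber: written before %3 is last read */
        : "r" (a), "r" (b)
        : "memory");              /* reads and writes memory through a and b */
#else
    /* Host fallback with the same effect. */
    *b = *a + *b;
#endif
}
```

On the board, add_words((const volatile uint32_t *)0x20001000, (volatile uint32_t *)0x20001004) performs the same operation as the listing above.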

3. Inline Assembly: Input and Output

Instead of hard-coding values into the assembly instructions, you can pass them in as input parameters. You can also return values from the assembly code as output parameters.

Syntax

GCC calls this form extended asm. Its basic syntax, with input and output operands, looks like this:

__asm volatile (
    "assembly code"
    : "constraint_for_output" (output_var)
    : "constraint_for_input" (input_var)
);

assembly code: The assembly instructions, with the operands referred to as %0, %1, and so on.
constraint_for_output: A string such as "=r" that tells the compiler where the output operand may be placed (for example, in a general-purpose register) and how it is used (= means write-only).
output_var: The C variable that receives the output value.
constraint_for_input: The same kind of string for an input operand, such as "r".
input_var: The C expression whose value is passed in as the input.
Multiple operands in either list are separated by commas.

Constraints

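Each operand needs a constraint that says how it is used and where it may live: in a register, in memory, or as an immediate constant. The compiler uses the constraints to pick a concrete register, address, or value to substitute for each %n placeholder. Some common constraints are: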

Input Constraints

Constraint	Description
`"r"`	Operand should be stored in a general-purpose register
`"m"`	Operand is a memory operand
`"i"`	Immediate integer operand whose value is known at compile time
`"g"`	Operand can be a register, memory location, or immediate integer

Output Constraints

Constraint	Description
`"=r"`	Write-only operand stored in a general-purpose register
`"=m"`	Write-only operand as a memory operand
`"+r"`	Read-write operand in a general-purpose register
`"+m"`	Read-write operand in a memory location

These tables are not exhaustive but cover some of the most commonly used constraints for inline assembly in GCC. Constraints can also be more specific, depending on what you're trying to do and which assembly language you're working with.
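To contrast "+r" with "=r", here is a sketch of two hypothetical helpers, assuming GCC-style inline assembly on a Thumb-2 target with a plain-C fallback elsewhere: add_one reads and writes the same operand, while xor_words writes an output computed from two separate inputs.

```c
#include <stdint.h>

/* "+r": the same operand is read and then written. */
uint32_t add_one(uint32_t v)
{
#if defined(__arm__) && defined(__thumb2__)
    /* Assumes GCC-style inline assembly on a Thumb-2 core. */
    __asm volatile ("ADD %0, %0, #1" : "+r" (v));
#else
    v += 1;
#endif
    return v;
}

/* "=r" output with two "r" inputs: %0 is only written, %1 and %2 only read. */
uint32_t xor_words(uint32_t a, uint32_t b)
{
    uint32_t result;
#if defined(__arm__) && defined(__thumb2__)
    __asm volatile ("EOR %0, %1, %2" : "=r" (result) : "r" (a), "r" (b));
#else
    result = a ^ b;
#endif
    return result;
}
```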

Example

Here's a simple example:

int foo = 10, result;
__asm volatile (
    "ADD %0, %1, %2"
    : "=r" (result)  // Output
    : "r" (foo), "i" (20)  // Inputs
);

In this example, the ADD instruction adds foo and the constant 20 and stores the sum in result. %0, %1, and %2 are placeholders numbered in the order the operands are listed, outputs first: %0 is result, %1 is foo, and %2 is the constant 20, which the "i" constraint substitutes as the immediate #20.

We can again build and check the disassembly to see how the compiler has translated our inline assembly code into machine code:

    int foo = 10, result;
 8000216:   230a        movs    r3, #10
 8000218:   607b        str r3, [r7, #4]
    __asm volatile (
 800021a:   687b        ldr r3, [r7, #4]
 800021c:   f103 0314   add.w   r3, r3, #20
 8000220:   603b        str r3, [r7, #0]
        "ADD %0, %1, %2"
        : "=r" (result)  // Output
        : "r" (foo), "i" (20)  // Inputs
    );

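Here the compiler keeps foo and result in stack slots ([r7, #4] and [r7, #0]). It loads foo into R3 for %1 and also chose R3 for the output %0, so the ADD became add.w r3, r3, #20 (machine code f103 0314, a 32-bit Thumb-2 encoding); the #20 is the immediate value 20. The result is then stored from R3 into result's stack slot.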
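GCC may assign an output operand the same register as an input (in the listing above, R3 serves both foo and result), because it assumes all inputs are read before any output is written. When an assembly sequence writes an output and then reads an input again, mark the output with the early-clobber modifier "=&r" so it gets a register of its own. A sketch with a hypothetical function, assuming GCC-style inline assembly on a Thumb-2 target and a plain-C fallback elsewhere:

```c
#include <stdint.h>

/* Hypothetical example: returns 2*a + b. The first ADD writes %0 and the
 * second ADD reads %1 again, so %0 must not share a register with %1. */
uint32_t twice_a_plus_b(uint32_t a, uint32_t b)
{
    uint32_t result;
#if defined(__arm__) && defined(__thumb2__)
    /* Assumes GCC-style inline assembly on a Thumb-2 core. */
    __asm volatile (
        "ADD %0, %1, %2   \n\t"   /* result = a + b (writes %0 early) */
        "ADD %0, %0, %1"          /* result = result + a (reads %1 again) */
        : "=&r" (result)          /* early-clobber: gets its own register */
        : "r" (a), "r" (b));
#else
    result = a + b;
    result += a;
#endif
    return result;
}
```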

Clobbers: Notifying the Compiler about Side Effects

When you write inline assembly, the compiler isn't aware of what your assembly code is doing. In some cases, your assembly code may alter the state of certain registers or memory locations that the compiler might otherwise assume are unchanged. This is known as "clobbering."

To handle this, you need to inform the compiler explicitly about any such side effects. This is done using the clobber list, a part of the inline assembly syntax.

Here's the layout again for context:

__asm volatile (
    "assembly code"
    : "output" (output_var)
    : "input" (input_var)
    : "clobbered_reg_1", "clobbered_reg_2"
);

Commonly Used Clobber Identifiers

"cc": Stands for "condition code." If your assembly code changes the condition flags (on the Cortex-M4, the N, Z, C, and V flags in the APSR), list "cc" so the compiler does not rely on flag values it set before the asm statement.
"memory": Tells the compiler that the assembly code may read or write memory other than the listed operands. The compiler then stores any values it is holding in registers back to memory before the statement and reloads them from memory afterward, instead of relying on cached copies.

Example:

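__asm volatile (
    "LDR  %0, [%1]\n\t"    // load the word at address my_input
    "ADDS %0, %0, #1"      // add 1; ADDS also updates the condition flags
    : "=r" (my_output)
    : "r" (my_input)
    : "cc", "memory"
);

Here my_input is a pointer (for example, const uint32_t *) and my_output a uint32_t. The clobber list tells the compiler two things the operands do not: ADDS changes the condition flags ("cc"), and LDR reads memory through a pointer that the compiler cannot see being dereferenced ("memory"), so any pending writes to that memory must be completed before the statement runs. Using %0 and %1 instead of hard-coded R0 and R1 also lets the compiler know exactly which registers the code uses. Together, this ensures that the compiler generates correct, safe code around your inline assembly.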
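A "memory" clobber is also useful on its own. An asm statement with an empty template and only a "memory" clobber emits no instructions but acts as a compiler barrier: the compiler may not move memory accesses across it. The sketch below, assuming GCC or Clang (the names shared_buffer, buffer_ready, compiler_barrier, and publish are hypothetical), uses one to keep a data store ahead of the store to its ready flag in the generated code. It constrains only the compiler; it does not emit a hardware barrier instruction such as DMB.

```c
#include <stdint.h>

/* Hypothetical shared data, e.g. polled by other code. */
uint32_t shared_buffer[4];
uint32_t buffer_ready;

/* Empty template plus "memory" clobber: no instructions are emitted, but the
 * compiler must assume memory was read or written here, so it cannot move
 * memory accesses across this point or keep memory values cached in registers. */
static inline void compiler_barrier(void)
{
    __asm volatile ("" ::: "memory");
}

void publish(uint32_t value)
{
    shared_buffer[0] = value;
    compiler_barrier();   /* the compiler emits the store above before the one below */
    buffer_ready = 1;
}
```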