8. Stack

1. Introduction to Stack Memory

What is Stack Memory?

Stack memory is a specialized part of the main memory used in computer systems and microcontrollers like the ARM Cortex-Mx series. It operates in a Last-In-First-Out (LIFO) manner, making it an efficient temporary storage for data and return addresses.

Role in Function Calls, Interrupts, and Exceptions

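One of the most vital roles of the stack is to support function calls, interrupts, and exceptions. When an interrupt or exception occurs, the processor automatically pushes the return address and other registers onto the stack. For a function call, the return address goes into the Link Register (LR), and the called function pushes LR and any registers it must preserve onto the stack when needed. This allows the interrupted or calling code to resume exactly where it left off.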

Stack Operations: PUSH and POP

Stack operations are typically performed using PUSH and POP instructions, which use the Stack Pointer (SP), register R13 in the ARM Cortex-Mx series. PUSH decrements SP to allocate space for new data, and POP increments SP to release it. Each register transferred moves SP by 4 bytes (one 32-bit word), so PUSH {R4, LR} decrements SP by 8.
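To make the SP arithmetic concrete, here is a small host-side C model of PUSH and POP with a register list. It is a sketch rather than ARM code: the sim_push/sim_pop names, the 64-word ram array and the byte-offset SP are assumptions for illustration. It follows the Cortex-M rule that a multi-register PUSH stores the lowest-numbered register at the lowest address, so pushing R4 and LR moves SP down by 8 bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical simulated stack: 64 words of "SRAM" and a full descending SP. */
#define SIM_WORDS 64u
static uint32_t ram[SIM_WORDS];
static uint32_t sp = SIM_WORDS * 4u; /* byte offset; starts one past the top */
static uint32_t regs[16];            /* simulated R0..R15 */

/* PUSH {list}: SP drops by 4 bytes per register; lowest register at lowest address. */
void sim_push(uint16_t reglist) {
    uint32_t count = 0;
    for (int r = 0; r < 16; r++) if (reglist & (1u << r)) count++;
    assert(count * 4u <= sp);  /* would run below the simulated region */
    sp -= count * 4u;
    uint32_t addr = sp;
    for (int r = 0; r < 16; r++) {
        if (reglist & (1u << r)) { ram[addr / 4u] = regs[r]; addr += 4u; }
    }
}

/* POP {list}: reload in the same order, then SP rises by 4 bytes per register. */
void sim_pop(uint16_t reglist) {
    uint32_t addr = sp;
    for (int r = 0; r < 16; r++) {
        if (reglist & (1u << r)) {
            assert(addr < SIM_WORDS * 4u);  /* popping more than was pushed */
            regs[r] = ram[addr / 4u];
            addr += 4u;
        }
    }
    sp = addr;
}
```

Pushing R4 and LR (bits 4 and 14 of the list) from an SP of 256 leaves SP at 248, with R4 stored at offset 248 and LR at 252; the matching POP restores both and returns SP to 256.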

Stack Pointer Initialization

After reset, the Stack Pointer (SP) refers to the Main Stack Pointer (MSP), which the processor loads from the first word of the vector table. During execution, Thread mode code can be switched to the Process Stack Pointer (PSP), typically for multitasking under an RTOS.

SRAM Organization

In most systems, SRAM (Static Random-Access Memory) is broken down into several segments:

Global Data: This area is used for storing static variables and program data.
Heap: Used for dynamic memory allocation, such as when you use functions like malloc in C.
Stack: Used for function calls, interrupts, and exceptions.

Memory Management

The size of each part of the SRAM (Global Data, Heap, and Stack) is usually decided at compile time or during system initialization. The stack generally starts at the top of the available memory (i.e., the highest address) and grows downwards, whereas the heap starts at the bottom (after the global data segment) and grows upwards. If either part becomes full, it could result in a stack overflow or heap exhaustion, leading to undefined behavior or system crashes.
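The collision between heap and stack can be guarded against in the allocator. The sketch below is modeled loosely on the _sbrk helper that newlib-based STM32 projects use to grow the heap, but the sim_sbrk name, the HEAP_START address and the reserved stack size are assumptions; it only compares addresses and never touches memory.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout (addresses only, no real memory is touched). */
#define RAM_END        0x20020000u  /* one past the last SRAM byte (stack top) */
#define MIN_STACK_SIZE 0x400u       /* bytes reserved for the downward stack   */
#define HEAP_START     0x20001000u  /* first byte after global data (assumed)  */

static uint32_t heap_end = HEAP_START;  /* current top of the upward-growing heap */

/* Returns the old heap end, or 0 if growing would enter the reserved stack area. */
uint32_t sim_sbrk(uint32_t increment) {
    const uint32_t stack_limit = RAM_END - MIN_STACK_SIZE;  /* lowest stack address */
    assert(heap_end <= stack_limit);
    if (increment > stack_limit - heap_end) {
        return 0;  /* heap exhaustion: would collide with the stack region */
    }
    uint32_t prev = heap_end;
    heap_end += increment;
    return prev;
}
```

A real implementation may also compare against the current stack pointer, since the stack can grow past its reserved minimum.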

The global data segment (initialized .data and zero-initialized .bss) has a fixed size that is determined at build time from the program's global and static variables. In SRAM it usually starts at the lowest address and precedes the heap. Since it does not grow or shrink during program execution, it cannot overflow at run time the way the stack and heap can; if the variables do not fit, the linker typically reports an error instead.

So, in essence, the SRAM could be visually imagined as divided into three main regions: Global Data at the bottom, followed by the Heap, and the Stack at the top. Each has its unique characteristics and operational rules, but together they allow for the versatile memory management essential for modern computing tasks.

2. Stack Operation Models

In ARM Cortex-Mx processors, the stack uses a Full Descending Stack Model. This is something we cannot change. Below is an in-depth look at what these models mean.

Ascent and Descent Models

Descending Stack Model (ARM Cortex-Mx): In the Descending Stack Model, the stack pointer starts at the highest address allocated for the stack and moves towards lower addresses as new elements are pushed onto the stack. This is the only model ARM Cortex-Mx supports.
Ascending Stack Model: Contrarily, in the Ascending Stack Model, the stack starts at the lowest address and moves upwards as elements are pushed onto the stack. This model is not applicable to ARM Cortex-Mx.

Full and Empty Models

Full Stack Model (ARM Cortex-Mx): The ARM Cortex-Mx uses a Full Stack Model, meaning that the stack pointer points to the last item that has been pushed onto the stack. When you push a new item, the stack pointer is decremented first and the item is then stored at the new location. The name 'full' refers to the slot SP points at: it always holds valid data (the most recent item).
Empty Stack Model: In the Empty Stack Model, the stack pointer points to the next free location. When an item is pushed, it is stored first, and then the stack pointer is updated. The name 'empty' refers to the slot SP points at: it does not yet hold data.
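The two choices (full or empty, descending or ascending) differ only in the order of "move SP" and "store". The following C sketch models a single 4-byte push under each combination; the stack_model type and model_push function are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: where does one push write, and where does SP end up? */
typedef struct { bool full; bool descending; } stack_model;

/* Pushes one 4-byte word; returns the address written and updates *sp. */
uint32_t model_push(stack_model m, uint32_t *sp) {
    assert(!m.descending || *sp >= 4u);  /* avoid wrapping below address 0 */
    uint32_t addr;
    if (m.full) {
        /* Full: move SP first, then store at the new SP (SP -> last item). */
        *sp = m.descending ? *sp - 4u : *sp + 4u;
        addr = *sp;
    } else {
        /* Empty: store at SP first, then move SP (SP -> next free slot). */
        addr = *sp;
        *sp = m.descending ? *sp - 4u : *sp + 4u;
    }
    return addr;
}
```

With SP at 0x20020000, a full descending push (the Cortex-Mx model) writes to 0x2001FFFC and leaves SP there, while an empty descending push writes to 0x20020000 itself and then moves SP to 0x2001FFFC.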

Summary

In summary, the ARM Cortex-Mx uses a Full Descending Stack Model. This means that the stack pointer starts at the highest address allocated for the stack and moves towards lower addresses as new elements are pushed onto the stack. The stack pointer points to the last item that has been pushed onto the stack. When you push a new item, the stack pointer decrements first and then the item is stored at that location.

3. Stack Placement in ARM Cortex-Mx

In ARM Cortex-Mx microcontrollers, the stack is an essential part of the SRAM (Static Random-Access Memory). It is generally placed at the topmost section of the SRAM to facilitate its growth towards lower memory addresses. Below are the key aspects that define stack placement:

Starting Address

The stack's starting address is usually the highest address of the SRAM. This is in line with the Full Descending Stack Model used by ARM Cortex-Mx processors, which allows the stack to grow downwards.

Stack Pointer Initialization

Upon reset, the stack pointer (SP), also known as R13 in ARM Cortex-Mx, refers to the Main Stack Pointer (MSP). The processor loads the MSP from the first word of the vector table, which normally holds the top of the stack, so SP already points to where the stack starts before the first instruction runs.
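The reset behavior can be modeled in a few lines of C. This is a host-side sketch under stated assumptions: the vector table is represented by a plain array, and the core_state and model_reset names are hypothetical. On the real core, word 0 of the vector table becomes the MSP (with its two lowest bits forced to zero) and word 1, the reset vector, becomes the PC, with bit 0 indicating Thumb state.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the registers set up from the vector table at reset. */
typedef struct { uint32_t msp; uint32_t pc; } core_state;

core_state model_reset(const uint32_t *vector_table) {
    assert(vector_table);
    core_state s;
    s.msp = vector_table[0] & ~3u;  /* initial MSP (e.g. _estack); SP is word aligned */
    s.pc  = vector_table[1] & ~1u;  /* Reset_Handler; bit 0 only marks Thumb state    */
    return s;
}
```

For a table starting with 0x20020000 and a hypothetical Reset_Handler address of 0x08000199, the model yields MSP = 0x20020000 and PC = 0x08000198.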

Memory Division

The SRAM is divided into different sections such as global data, heap, and stack. The stack typically starts at the 'top' (highest address) and grows downwards, while the heap starts at the 'bottom' (lowest address) and grows upwards.

Size of the Stack

The size of the stack is generally defined at compile-time or during system initialization. It is crucial to allocate enough space for the stack to avoid stack overflow, which could lead to undefined behavior or system crashes.

4. Banked Stack Pointer Registers

In ARM Cortex-Mx processors, two stack pointer registers are available: the Main Stack Pointer (MSP) and the Process Stack Pointer (PSP). They are called banked because both sit behind the same register name, R13 (SP): at any moment only one of them is visible as SP, selected by the processor mode and the CONTROL register. This lets system code and application tasks use separate stacks.

Main Stack Pointer (MSP)

The MSP is the default stack pointer used by the processor. It's activated on reset and is generally used in privileged code such as the operating system kernel or interrupt service routines (ISRs). The key features of MSP are:

Initialization: Upon system reset, the processor loads the MSP from the first entry of the vector table, which usually holds the start address of the stack (the end of SRAM).
Privileged Code: The MSP is typically used by privileged, system-level code such as the kernel. Note that stack pointer selection and privilege level are independent: privilege in Thread mode is controlled by the nPRIV bit of the CONTROL register, not by which stack pointer is in use.
Interrupt Handling: Exception and interrupt handlers run in Handler mode, which always uses the MSP. This gives system-level code a stable stack that is separate from any task stacks.

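At reset, the processor loads the MSP from the first word of the vector table, which the startup file startup_stm32f446retx.s fills with _estack. Its Reset_Handler then sets the stack pointer explicitly as well: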

Reset_Handler:
  ldr   r0, =_estack
  mov   sp, r0          /* set stack pointer */

The symbol _estack is defined in the linker script STM32F446RETX_FLASH.ld:

/* Highest address of the user mode stack */
_estack = ORIGIN(RAM) + LENGTH(RAM); /* end of "RAM" Ram type memory */

_Min_Heap_Size = 0x200; /* required amount of heap */
_Min_Stack_Size = 0x400; /* required amount of stack */

/* Memories definition */
MEMORY
{
  RAM     (xrw)    : ORIGIN = 0x20000000,   LENGTH = 128K
  FLASH    (rx)    : ORIGIN = 0x8000000,   LENGTH = 512K
}

The line _estack = ORIGIN(RAM) + LENGTH(RAM); sets _estack to the end of RAM: 0x20000000 + 128K = 0x20020000, which is one byte past the last RAM address. Because the stack is full descending, the first push decrements SP to 0x2001FFFC before storing, so nothing is ever written at 0x20020000 itself. _estack is therefore the starting value for the Main Stack Pointer (MSP).
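The arithmetic behind these linker symbols can be checked with a short C sketch. The constants mirror the RAM origin, RAM length and _Min_Stack_Size from the linker script excerpt; the function names are hypothetical, and stack_floor_addr() only shows where the reserved minimum stack would end, not a limit the hardware enforces.

```c
#include <assert.h>
#include <stdint.h>

/* Values mirrored from the linker script excerpt. */
#define RAM_ORIGIN     0x20000000u
#define RAM_LENGTH     (128u * 1024u)  /* 128K = 0x20000 bytes */
#define MIN_STACK_SIZE 0x400u          /* _Min_Stack_Size      */

/* _estack = ORIGIN(RAM) + LENGTH(RAM): one past the last RAM byte. */
uint32_t estack_addr(void) { return RAM_ORIGIN + RAM_LENGTH; }

/* Lowest address the reserved minimum stack reaches (hypothetical helper). */
uint32_t stack_floor_addr(void) { return estack_addr() - MIN_STACK_SIZE; }
```

This gives _estack = 0x20020000, an 8-byte-aligned value, and a reserved stack that extends down to 0x2001FC00. The first 32-bit push lands at 0x2001FFFC, the last word of RAM.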

Process Stack Pointer (PSP)

The PSP is an optional stack pointer meant for user-level tasks or threads in a multitasking environment. Enabling the PSP allows for easier context switching and task isolation. Here's more on the PSP:

User-Level Code: The PSP is generally used for running application-level or user-level code.
Optional Use: After reset, Thread mode uses the MSP. Software must initialize the PSP and then select it, typically when an operating system wants separate stacks for its tasks.
Context Switching: In Real-Time Operating Systems (RTOS), the PSP is beneficial for efficient context switching between different tasks or threads.

Switching Between MSP and PSP

The SPSEL bit of the CONTROL register selects which stack pointer Thread mode uses: 0 selects the MSP and 1 selects the PSP. Handler mode always uses the MSP regardless of this bit. This is particularly useful with an RTOS, where the kernel and interrupt handlers run on the MSP while each task has its own PSP-based stack.

5. Example

We can set the SPSEL bit in the CONTROL register to switch between the MSP and PSP. The following code snippet shows how to switch to the PSP:

__attribute__((naked)) void switchToPSP() {
    __asm volatile(".equ SRAM_END, (0x20000000 + (128 * 1024))");
    __asm volatile(".equ PSP_START, (SRAM_END-512)");
    __asm volatile("LDR R0, =PSP_START");
    __asm volatile("MSR PSP, R0");        // PSP starts 512 bytes below the top of SRAM
    __asm volatile("MRS R0, CONTROL");
    __asm volatile("ORR R0, R0, #0x02");  // Set CONTROL->SPSEL, keep the other bits
    __asm volatile("MSR CONTROL, R0");
    __asm volatile("ISB");                // Required after an MSR that changes the stack pointer
    __asm volatile("BX LR");              // Return to main()
}

__attribute__((naked)): This attribute tells the compiler not to add any prologue or epilogue to the function. In other words, it won't automatically push/pop registers or set up/tear down the stack frame, which is why the function must end with its own BX LR. This is useful when you need full control over the function's behavior, like when you're managing stack pointers manually.
.equ: It stands for "equate" and sets a symbolic name equal to a value, effectively creating an assembler constant. Here, .equ SRAM_END, (0x20000000 + (128 * 1024)) sets SRAM_END to 0x20020000, the end of the 128 KB SRAM, so you can use SRAM_END instead of writing out the whole value each time. PSP_START then becomes SRAM_END - 512 = 0x2001FE00: the top 512 bytes are left for the MSP, and the PSP stack grows downwards from just below them.
From the ARM Cortex-M4 Generic User Guide, we can see that the SPSEL bit is bit 1 of the CONTROL register:

[Figure: Cortex-M4 CONTROL register. Bit 0 nPRIV, bit 1 SPSEL (0 = MSP, 1 = PSP in Thread mode), bit 2 FPCA.]
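Since SPSEL is a single bit, switching stacks is a matter of setting or clearing bit 1 while leaving the other CONTROL bits alone: bit 0 (nPRIV, Thread mode privilege) and, on cores with an FPU, bit 2 (FPCA). The helpers below are a hypothetical host-side sketch of that read-modify-write; on the target the value would be read and written with MRS and MSR (CMSIS wraps these as __get_CONTROL() and __set_CONTROL()), followed by an ISB.

```c
#include <assert.h>
#include <stdint.h>

/* CONTROL register bit positions (Cortex-M4). */
#define CONTROL_nPRIV (1u << 0)  /* 1 = Thread mode is unprivileged      */
#define CONTROL_SPSEL (1u << 1)  /* 1 = Thread mode uses PSP, 0 = MSP    */
#define CONTROL_FPCA  (1u << 2)  /* 1 = floating-point context is active */

/* Select PSP for Thread mode, leaving nPRIV and FPCA untouched. */
uint32_t control_select_psp(uint32_t control) { return control | CONTROL_SPSEL; }

/* Select MSP again for Thread mode. */
uint32_t control_select_msp(uint32_t control) { return control & ~CONTROL_SPSEL; }
```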

Let's take a look at the code in main.c:

#include <stdio.h>

// Function to add four integers
int addNumbers(int a, int b, int c, int d) {
    return a + b + c + d;
}

// Function to switch to Process Stack Pointer (PSP)
__attribute__((naked)) void switchToPSP() {
    __asm volatile(".equ SRAM_END, (0x20000000 + (128 * 1024))");
    __asm volatile(".equ PSP_START, (SRAM_END-512)");
    __asm volatile("LDR R0, =PSP_START");
    __asm volatile("MSR PSP, R0");        // PSP starts 512 bytes below the top of SRAM
    __asm volatile("MRS R0, CONTROL");
    __asm volatile("ORR R0, R0, #0x02");  // Set CONTROL->SPSEL, keep the other bits
    __asm volatile("MSR CONTROL, R0");
    __asm volatile("ISB");                // Required after an MSR that changes the stack pointer
    __asm volatile("BX LR");              // Return to main()
}

// Function to generate SVC exception
void generateSVCException() {
    __asm volatile("SVC #0x2");
}

int main(void) {
    switchToPSP();  // Switch to PSP

    int result = addNumbers(1, 4, 5, 6);  // Runs in Thread mode on the PSP: any registers or locals addNumbers saves go to the PSP.

    printf("Result = %d\n", result);

    generateSVCException(); // On SVC entry the hardware stacks R0-R3, R12, LR, PC and xPSR onto the PSP (the stack in use); SVC_Handler itself then runs in Handler mode on the MSP.

    while (1);  // Infinite loop
}

void SVC_Handler(void) {
    printf("In SVC_Handler\n");
}
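Which stack is in use at each point of main.c follows from two inputs: the processor mode and CONTROL.SPSEL. The sketch below encodes that rule; the stack_ptr enum and active_sp function are hypothetical. After switchToPSP(), main() and addNumbers() run in Thread mode with SPSEL = 1 and therefore use the PSP. When SVC #0x2 is taken, the hardware stacks the exception frame on the PSP (the stack that was active), and SVC_Handler then runs in Handler mode on the MSP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { USE_MSP, USE_PSP } stack_ptr;

/* Handler mode always uses MSP; Thread mode follows CONTROL.SPSEL (bit 1). */
stack_ptr active_sp(bool handler_mode, uint32_t control) {
    if (handler_mode) return USE_MSP;
    return (control & (1u << 1)) ? USE_PSP : USE_MSP;
}
```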

6. Function Call and AAPCS Standard

When dealing with ARM architectures like Cortex-Mx, understanding the function call convention is essential, especially for low-level programming tasks. The Procedure Call Standard for the Arm Architecture (AAPCS) defines how functions pass arguments, return results, and use registers and the stack, so that code from different compilers and hand-written assembly can call each other correctly.

Register Usage

Under AAPCS, registers R0-R3 pass the first four 32-bit arguments to a function (a 64-bit argument occupies an even-numbered register pair, R0:R1 or R2:R3). Further arguments are placed on the stack. A 32-bit return value is returned in R0, and a 64-bit value in R0 and R1.

Stack Alignment

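AAPCS requires the stack pointer to be 4-byte aligned at all times and 8-byte aligned at every public function interface (for example, when calling an externally visible function). This keeps 8-byte types such as double and uint64_t correctly aligned when they are stored on the stack.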
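The alignment rule is easy to check with bit masks. The following C sketch uses hypothetical helper names; it shows why an SP ending in 0x4 or 0xC is word-aligned but not 8-byte aligned, which is one reason compiled prologues usually push an even number of registers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 8-byte alignment is required at public function calls. */
bool sp_ok_at_call(uint32_t sp) { return (sp & 7u) == 0; }

/* Round an SP value down to an 8-byte boundary (the stack grows down). */
uint32_t align_sp_down8(uint32_t sp) {
    assert((sp & 3u) == 0);  /* word alignment must hold at all times */
    return sp & ~7u;
}
```

An SP of 0x2001FFFC fails the 8-byte check and would be rounded down to 0x2001FFF8.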

The Stack Frame

During a function call, the caller executes a BL (or BLX) instruction, which places the return address (the address of the instruction after the call) in the Link Register (LR) and branches to the callee. If the caller needs values held in caller-saved registers after the call, it saves them first. The callee uses the stack for local variables and, if it modifies any callee-saved registers or makes further calls, pushes those registers (and LR) in its prologue. When the function finishes, it restores them, releases its stack space, and returns to the caller, typically with BX LR or by popping the saved LR into PC.

AAPCS and Mixing Assembly with C

Understanding AAPCS is critical when mixing assembly with C/C++ code. Hand-written assembly must follow the same rules for register usage and stack management that compiler-generated code expects, so that assembly routines can be called from C, and can call C functions, without corrupting registers or the stack.

VFP and Advanced Features

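Not every Cortex-M core has a Floating-Point Unit (FPU), but some do, such as the Cortex-M4 in the STM32F446. AAPCS defines a VFP variant for these cores: with the hard-float ABI (GCC -mfloat-abi=hard), floating-point arguments are passed in the FPU registers S0-S15 and results are returned in S0 (D0 for double), while the soft-float variants pass them in core registers.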

By adhering to AAPCS, both low-level and high-level code can coexist and operate as expected, making it an essential standard for ARM-based development.

Caller-Saved and Callee-Saved Registers

In the context of AAPCS and ARM architecture, understanding who saves and restores what registers—either the caller or the callee—is crucial for correct and efficient function calls. This responsibility is generally categorized into "caller-saved" and "callee-saved" registers.

Caller-Saved Registers

Also known as "scratch registers," these are registers that a called function is allowed to overwrite. If the caller needs their values after a call, it must save them before the call and restore them afterwards. In ARM Cortex-Mx, these are R0-R3 and R12. They are typically used for argument passing and short-lived temporaries, precisely because any call may change them.

Callee-Saved Registers

These are registers that a called function must preserve across function calls. If a called function wants to use these registers for its own purposes, it must save the original values and restore them before returning. In ARM Cortex-Mx, the callee-saved registers are R4-R11; with the hard-float ABI, S16-S31 are callee-saved as well.

Special Registers

LR (Link Register): Every BL overwrites LR, so a function that calls other functions must save its own LR, usually by pushing it in its prologue.
SP (Stack Pointer): Generally managed by the compiler, and you usually don't need to worry about saving or restoring it unless you're doing low-level stack manipulation.

Too Many Arguments

If a function has more arguments than can fit into the available registers, the additional arguments are typically passed on the stack.

For ARM Cortex-M using the AAPCS standard, the first four 32-bit arguments are placed in R0, R1, R2, and R3. If there are more than four arguments, the caller places the fifth, sixth, and later arguments on the stack.

Here's a simple example to illustrate:

void myFunction(int a, int b, int c, int d, int e, int f) {
    // Do something
}

int main() {
    myFunction(1, 2, 3, 4, 5, 6);
    return 0;
}

In this example:

a would go in R0
b would go in R1
c would go in R2
d would go in R3
e and f would be placed on the stack by the caller: e at [SP] and f at [SP + 4] at the moment of the call
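The register assignment above can be expressed as a small algorithm. This C sketch follows the base (soft-float) AAPCS rules for 4-byte and 8-byte integer arguments only: a 64-bit argument starts at an even register (R0 or R2), arguments that no longer fit go on the stack in order, 64-bit stack slots are 8-byte aligned, and core registers are never back-filled once skipped or once an argument has gone to the stack. Structures, arguments split between registers and stack, and floating-point (VFP) registers are deliberately left out, and the arg_loc and assign_args names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Where one argument ends up (integer and pointer types only). */
typedef struct {
    int in_regs;   /* 1 if passed in core registers                    */
    int first_reg; /* R0..R3 index when in_regs, otherwise -1          */
    int stack_off; /* byte offset from SP at the call, otherwise -1    */
} arg_loc;

/* sizes[i] must be 4 (int, pointer) or 8 (long long, int64_t). */
void assign_args(const int *sizes, size_t n, arg_loc *out) {
    int ncrn = 0;  /* next core register number   */
    int nsaa = 0;  /* next stacked argument offset */
    for (size_t i = 0; i < n; i++) {
        assert(sizes[i] == 4 || sizes[i] == 8);
        int words = sizes[i] / 4;
        if (sizes[i] == 8) ncrn = (ncrn + 1) & ~1;  /* 64-bit: even register pair */
        if (ncrn + words <= 4) {
            out[i] = (arg_loc){ 1, ncrn, -1 };
            ncrn += words;
        } else {
            ncrn = 4;                                   /* no back-filling of R0-R3 */
            if (sizes[i] == 8) nsaa = (nsaa + 7) & ~7;  /* 8-byte aligned slot      */
            out[i] = (arg_loc){ 0, -1, nsaa };
            nsaa += sizes[i];
        }
    }
}
```

For myFunction(1, 2, 3, 4, 5, 6) this gives R0-R3 for a-d, e at [SP] and f at [SP + 4]. For a function taking (int, long long), the int uses R0 and the long long uses R2:R3, leaving R1 unused.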

If the caller has local variables whose values must survive the call, it keeps them in callee-saved registers (R4-R11) or saves them on the stack, because R0-R3 and R12 may be overwritten by the callee.

Remember, using the stack for extra arguments or local storage is slower than using registers, so there's a performance trade-off. But the stack allows you to handle cases where you need more storage than the limited set of registers can provide.

7. Stack Activities During Function Calls, Interrupts, and Exceptions

The stack is a crucial part of the program's memory that aids in the execution of function calls, handling of interrupts, and exceptions. Understanding its behavior during these events is essential for efficient programming and troubleshooting.

Function Calls

Argument Passing: Registers are the first choice for passing arguments. If more arguments are present than registers, the stack is used.
Return Address: The return address is usually stored in a special register called the Link Register (LR in ARM architectures). However, there are cases where the LR itself might need to be saved to the stack to preserve it. This typically happens if the function in question makes additional function calls ('nested' or 'recursive' function calls). In such cases, the LR would get overwritten, so it is pushed onto the stack to preserve its value.
Local Variables: Registers are the first choice for storing them, especially in optimized code. However, if a function has more local variables than there are available registers, or if arrays or large data structures are involved, then the stack is used for storage.
Caller/Callee Saved Registers: Registers that need to be preserved across the function call are saved onto the stack.

Interrupts

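Context Saving: When an interrupt occurs, the processor automatically pushes eight registers onto the current stack: R0-R3, R12, LR, the return address (PC), and xPSR. On cores with an FPU and an active floating-point context, S0-S15 and FPSCR are stacked as well. This is what allows normal operation to resume after the interrupt service routine (ISR) finishes.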
Interrupt Handling: The ISR is then called, during which local variables get allocated on the stack.
Context Restoration: The ISR returns by branching to the special EXC_RETURN value that the hardware placed in LR on entry. This makes the hardware pop the saved frame from the stack and resume the interrupted activity.
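The automatically saved context has a fixed layout. The sketch below models the basic 8-word frame (without the floating-point extension or the optional alignment padding word) being stacked on a simulated full descending stack; exc_frame and model_stack_frame are hypothetical names, and the memory is a plain byte array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Basic exception stack frame as laid out in memory, lowest address first. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} exc_frame;

/* Model of hardware stacking: SP drops by 32 bytes and the frame is written
 * starting at the new SP. 'mem' is simulated SRAM; sp is a byte offset into it. */
uint32_t model_stack_frame(uint8_t *mem, uint32_t sp, const exc_frame *f) {
    assert(sp >= sizeof *f);
    sp -= (uint32_t)sizeof *f;
    memcpy(mem + sp, f, sizeof *f);
    return sp;  /* SP value seen by the handler's caller-side frame */
}
```

Stacking from an SP offset of 256 leaves SP at 224, with R0 at the new SP and the stacked PC 24 bytes above it, which is how fault handlers typically locate the address of the interrupted instruction.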

Exceptions

Exception Entry: Similar to interrupts, an exception also results in automatic saving of processor state onto the stack.
Exception Handling: The corresponding exception handler function gets called. Again, local variables within this function are stored on the stack.
Exception Exit: Upon completion, the original state is restored from the stack.

Special Cases

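Tail Calls: When a call is the last action of a function, an optimizing compiler can emit a plain branch (B) instead of BL, so the callee reuses the caller's stack frame and the stack does not grow. Tail-recursive functions benefit in the same way.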
Coroutines: These are special cases where stacks may be manually managed, although this is quite rare in typical ARM Cortex-Mx programming.
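The difference between a tail call and an ordinary call is visible in the source. In the C sketch below, gcd's recursive call is in tail position, so GCC at -O2 (which enables -foptimize-sibling-calls) can compile it into a branch or loop with constant stack use; at -O0 every call still uses BL and a new frame. The C standard does not guarantee this optimization, and the function names are only illustrative.

```c
#include <assert.h>

/* Tail call: nothing remains to be done after the recursive call returns. */
unsigned gcd(unsigned a, unsigned b) {
    if (b == 0) return a;
    return gcd(b, a % b);
}

/* Not a tail call: the multiply happens after the call returns, so a plain
 * sibling-call branch is not possible (a compiler may still rewrite it). */
unsigned long long factorial(unsigned n) {
    return n <= 1 ? 1ull : n * factorial(n - 1);
}
```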

8. Stack Initialization

When an ARM Cortex-Mx system starts up, the processor loads the Main Stack Pointer from the first entry of the vector table. The startup code places the stack's initial address there, usually the end of SRAM as defined in the linker script (often with the label _estack), and many startup files also load SP explicitly in Reset_Handler. For instance, you may see this in a startup file:

ldr   sp, =_estack  /* set stack pointer */

This sets the Main Stack Pointer (MSP) to point to the starting address of the stack. As the system operates, this stack pointer will move (typically downwards) to make room for local variables, return addresses, and to save processor state during function calls, interrupts, or exceptions.

Tips for Effective Stack Management

Assess Your Application: Determine the amount of stack space needed for the worst-case scenario during your application's runtime.
Understand the Stack Model: Familiarize yourself with your processor's stack model. For instance, ARM Cortex-Mx uses a Full Descending Stack Model.
Decide Stack Placement: Choose where to place the stack in RAM: at the end, in the middle, or in external memory.
Consider Secondary Initialization: In some applications, you might start with internal RAM and then switch to external SDRAM. If this is the case, initialize the SDRAM in the main or startup code and then update the stack pointer to point to it.
Initialize Properly: If you're using an ARM Cortex-Mx processor, ensure that the first entry in the vector table contains the initial stack address (MSP). Your project's startup code usually handles this for you.
Leverage Linker Scripts: You can use linker scripts to define boundaries for the stack, heap, and other RAM areas. The startup code typically uses these settings to initialize the stack pointer.
RTOS Considerations: In cases involving an RTOS, the kernel might use MSP for its own stack and configure PSP for user tasks.
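Assessing the worst case usually means walking the call graph and adding up frame sizes along the deepest path. This C sketch assumes an acyclic graph (no recursion) and per-function frame sizes such as those GCC reports with -fstack-usage; the call_graph type, worst_case_from function, and the numbers in the usage example are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FUNCS 8

/* frame[f]: bytes of stack used by function f itself.
 * calls[f][g] != 0 if f calls g. */
typedef struct {
    size_t n;
    unsigned frame[MAX_FUNCS];
    unsigned char calls[MAX_FUNCS][MAX_FUNCS];
} call_graph;

/* Deepest stack use starting at function f. Assumes no recursion. */
unsigned worst_case_from(const call_graph *g, size_t f) {
    assert(f < g->n);
    unsigned deepest_callee = 0;
    for (size_t c = 0; c < g->n; c++) {
        if (g->calls[f][c]) {
            unsigned d = worst_case_from(g, c);
            if (d > deepest_callee) deepest_callee = d;
        }
    }
    return g->frame[f] + deepest_callee;
}
```

With main (16 bytes) calling f1 (40 bytes) and f2 (24 bytes), and f2 calling f3 (8 bytes), the worst case from main is 16 + 40 = 56 bytes. Interrupts come on top of this: each exception adds at least its 32-byte stacked frame plus its handler's own usage.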

9. Stack Overflow

A stack overflow occurs when the stack grows beyond the memory region allocated for it. Because stacks in ARM Cortex-Mx systems grow downwards, an overflow happens when the stack pointer moves below the lowest address reserved for the stack, so that further pushes overwrite other parts of memory, such as the heap or global data sections.

When a stack overflow occurs, undefined behavior can ensue, potentially causing data corruption or system crashes. Detecting a stack overflow can be challenging, and approaches may include:

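Hardware support: Systems with a Memory Protection Unit (MPU) can mark a small guard region just below the stack as inaccessible, so an overflowing push triggers a MemManage fault instead of silently corrupting memory. ARMv8-M cores such as the Cortex-M33 also provide stack limit registers (MSPLIM and PSPLIM).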
Software Checks: Inserting canary values at the boundaries of the stack and checking their integrity regularly.
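Static Analysis: Tools can estimate worst-case stack usage from the call graph (GCC's -fstack-usage reports per-function frame sizes), but recursion, function pointers, and interrupt nesting make exact bounds hard for all but the simplest code.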
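Canary checks and usage measurement can share one technique, often called stack painting: fill the stack region with a known pattern at startup, then inspect it later. The sketch below runs on a plain array standing in for the stack region; the fill value and function names are hypothetical. RTOSes such as FreeRTOS provide similar checks (for example uxTaskGetStackHighWaterMark()).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xDEADBEEFu  /* hypothetical fill pattern */

/* Paint the whole stack region with a known pattern before it is used. */
void stack_paint(uint32_t *region, size_t words) {
    assert(words > 0);
    for (size_t i = 0; i < words; i++) region[i] = STACK_FILL;
}

/* The stack grows down from region[words - 1] toward region[0], so count the
 * untouched words from the bottom; the rest is the high-water mark in bytes. */
size_t stack_high_water_bytes(const uint32_t *region, size_t words) {
    size_t untouched = 0;
    while (untouched < words && region[untouched] == STACK_FILL) untouched++;
    return (words - untouched) * sizeof(uint32_t);
}

/* Canary check: if the lowest word changed, the stack reached its limit. */
int stack_canary_intact(const uint32_t *region) { return region[0] == STACK_FILL; }
```

A pushed value that happens to equal the pattern makes the high-water estimate slightly low, so treat it as a lower bound.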