12. Fault Handling and Analysis on ARM Cortex-Mx

1. Introduction to Processor Faults

What is a Fault?

A fault in the context of ARM Cortex-Mx processors is an abnormal condition that disrupts the normal flow of a program. It can be caused by various issues, such as an illegal memory access or a trapped divide-by-zero. When a fault occurs, the processor suspends normal execution and invokes the corresponding fault exception handler.

Faults vs. System Exceptions

Faults are specific kinds of system exceptions. While system exceptions encompass a broad category of interrupt and non-interrupt events, faults are a subset that focuses on errors and abnormal conditions. In ARM Cortex-Mx, faults can be categorized into types such as HardFault, MemManage, BusFault, and UsageFault.

Types of Faults

Exception Number	Exception Name	Priority	Description	Enabled by Default?	What Causes It
3	HardFault	-1	General fault	Yes	Escalated configurable faults, bus error on vector fetch
4	MemManage	Configurable	Memory protection fault	No	Illegal memory access (MPU or XN violation)
5	BusFault	Configurable	Bus access fault	No	Error response on the bus during instruction fetch or data access
6	UsageFault	Configurable	Invalid instruction or operation	No	Divide-by-zero, undefined instruction

Note: If not configured, the default priority of all faults except HardFault is 0.

The table above summarizes key information about the different types of faults: each has an exception number, a name, a priority level, a description, whether it is enabled by default, and the conditions that trigger it. Only HardFault has a fixed, negative priority (-1); the other three have configurable priorities of 0 or higher. Because lower numbers are treated as higher priority on ARM Cortex-Mx, HardFault always outranks the configurable faults.

Exception Number: A unique ID for each exception.
Exception Name: The official name of the exception.
Priority: A number indicating the urgency of the exception. Lower numbers mean higher priority.
Description: A brief outline of what this fault is designed to catch.
Enabled by Default?: Indicates if this fault is active right after a reset.
What Causes It: Describes the conditions that trigger this fault.
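
To make the "lower number means higher priority" rule concrete, here is a minimal sketch in C. The helper name can_preempt and the enum are made up for this illustration, and the model deliberately ignores priority grouping and the masking registers; the three fixed values are the ones the architecture assigns to Reset, NMI, and HardFault.

```c
#include <stdbool.h>

/* Fixed priorities defined by the architecture. All other exceptions
   take a configurable priority of 0 or greater. */
enum {
    PRIO_RESET     = -3,
    PRIO_NMI       = -2,
    PRIO_HARDFAULT = -1
};

/* Hypothetical helper: true if an exception with priority new_prio may
   preempt code currently running at priority current_prio. */
static bool can_preempt(int new_prio, int current_prio)
{
    return new_prio < current_prio;  /* lower value = more urgent */
}
```

With this model, a HardFault (-1) can preempt any handler running at a configurable priority, while a configurable fault at priority 0 cannot preempt another handler that is also at priority 0.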

2. HardFault

Introduction

HardFault is a special system exception in ARM Cortex-Mx processors. It has the third-highest fixed priority, with a priority value of -1, making it one of the most urgent exceptions that the processor can encounter. HardFault serves as a catch-all for various types of faults that either aren't caught by other fault handlers or aren't enabled.

Causes of HardFault

Here are some of the primary reasons that a HardFault can occur:

Escalation of Configurable Fault Exceptions: If other configurable fault handlers like MemManage, BusFault, and UsageFault are not enabled or properly configured, those faults escalate to a HardFault.
Bus Error During Vector Fetch: Occurs when there's an issue with fetching an interrupt vector, usually because of an invalid address.
Execution of Breakpoint Instruction: A HardFault is triggered if a breakpoint instruction is executed while both halt mode and the debug monitor are disabled.
Executing SVC Instruction Inside SVC Handler: The execution of the Supervisor Call (SVC) instruction within an SVC handler can also trigger a HardFault.
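
The first cause, escalation, can be modeled on a host machine with a small sketch. The function handler_for and the enum are hypothetical names for this illustration; the SHCSR enable-bit positions are the ones used later in these notes. The model only covers the "handler not enabled" case and ignores escalation caused by a fault occurring inside a handler of equal or higher priority.

```c
#include <stdint.h>

/* SHCSR enable bits for the three configurable faults. */
#define SHCSR_MEMFAULTENA (1UL << 16)
#define SHCSR_BUSFAULTENA (1UL << 17)
#define SHCSR_USGFAULTENA (1UL << 18)

/* Exception numbers from the table in section 1. */
enum fault { HARDFAULT = 3, MEMMANAGE = 4, BUSFAULT = 5, USAGEFAULT = 6 };

/* Hypothetical model: which handler runs for a fault raised in Thread
   mode, given the current SHCSR value. A configurable fault whose enable
   bit is clear escalates to HardFault. */
static enum fault handler_for(enum fault raised, uint32_t shcsr)
{
    uint32_t enable_bit;

    switch (raised) {
    case MEMMANAGE:  enable_bit = SHCSR_MEMFAULTENA; break;
    case BUSFAULT:   enable_bit = SHCSR_BUSFAULTENA; break;
    case USAGEFAULT: enable_bit = SHCSR_USGFAULTENA; break;
    default:         return HARDFAULT;
    }
    return (shcsr & enable_bit) ? raised : HARDFAULT;
}
```

For example, handler_for(USAGEFAULT, 0) yields HARDFAULT, which is why an undefined instruction lands in the HardFault handler on a freshly reset device.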

HardFault Status Register (HFSR)

The HardFault Status Register (HFSR) contains various bits that help diagnose the cause of a HardFault. Here are some of the significant bits:

FORCED Bit (30): This bit is set when an escalated configurable fault occurs. It indicates that a configurable fault was not handled and has thus resulted in a HardFault.
DEBUGEVT Bit (31): This bit is set when a HardFault is caused by debug events like breakpoints.
VECTTBL Bit (1): Indicates a bus fault during vector fetch. This bit is set if the fault was triggered during the fetching of an interrupt vector.

To understand these bits in greater detail, you can refer to the ARM Cortex-Mx Generic User Guide, specifically Section 4.3.11, "HardFault Status Register":

[Figure: HardFault Status Register (HFSR) bit assignments]
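
As a rough sketch of how the three bits above could be decoded, the following C fragment turns a raw HFSR value into named flags. The struct and function names are assumptions for this example. On a target, the raw value would be read from the HFSR at address 0xE000ED2C; here it is passed in as a parameter so the logic can be tried on a host machine.

```c
#include <stdbool.h>
#include <stdint.h>

#define HFSR_VECTTBL  (1UL << 1)   /* bus fault during vector fetch */
#define HFSR_FORCED   (1UL << 30)  /* escalated configurable fault  */
#define HFSR_DEBUGEVT (1UL << 31)  /* debug event, e.g. breakpoint  */

/* Hypothetical container for the decoded bits. */
struct hfsr_flags {
    bool vecttbl;
    bool forced;
    bool debugevt;
};

static struct hfsr_flags decode_hfsr(uint32_t hfsr)
{
    struct hfsr_flags flags;

    flags.vecttbl  = (hfsr & HFSR_VECTTBL)  != 0;
    flags.forced   = (hfsr & HFSR_FORCED)   != 0;
    flags.debugevt = (hfsr & HFSR_DEBUGEVT) != 0;
    return flags;
}
```

When forced is set, the real cause is recorded in the status register of the fault that escalated, so that is the next place to look.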

3. MemManage Fault

Introduction

The MemManage Fault is another type of fault exception specific to the ARM Cortex-Mx processors. Unlike HardFault, which has a fixed priority, MemManage Fault has configurable priority, allowing you more flexibility in setting its urgency within your system. By default, MemManage Fault is disabled and needs to be explicitly enabled in the system configuration.

System Handler Control and State Register (SHCSR)

To enable MemManage Fault handling, you'll need to modify the System Control Block's System Handler Control and State Register (SCB->SHCSR). Specifically, set the MEMFAULTENA bit (bit 16) to 1. This action will enable the MemManage Fault and allow the processor to enter the MemManage exception handler when the fault conditions are met.

[Figure: System Handler Control and State Register (SHCSR) bit assignments]

Causes of MemManage Fault

MemManage Faults can occur under various circumstances, mainly related to memory management issues:

Memory Access Violation: Triggered when there's a violation in memory access rules, either set by the processor or by the Memory Protection Unit (MPU).
Unprivileged Access to Privileged Memory: Occurs when an unprivileged task or mode tries to access memory regions reserved for privileged operations.
Writing to Read-Only Memory: If an attempt is made to write to a memory region marked as read-only, a MemManage Fault is generated.
eXecute Never Violation: Trying to execute program code from a memory region marked with the eXecute Never (XN) attribute will also result in a MemManage Fault. This attribute is typically set for peripheral regions, external device regions, and the Private Peripheral Bus (PPB).

4. BusFault

Introduction

BusFault is another system exception specific to ARM Cortex-Mx processors, aimed at detecting errors during memory accesses on the system bus. Like the MemManage Fault, BusFault is also configurable in terms of priority and is disabled by default. It must be explicitly enabled to function.

System Handler Control and State Register (SHCSR)

To enable BusFault, you need to go to the System Control Block's System Handler Control and State Register (SCB->SHCSR). In this register, set the BUSFAULTENA bit (bit 17) to 1. Doing so will enable BusFault handling in the system.

Causes of BusFault

BusFault can occur under various scenarios, typically related to erroneous memory access on the system bus:

Error Response from Processor Bus Interface: Occurs when the processor's bus interface returns an error during memory access. This can happen during instruction fetch or during data read/write operations to memory devices.
Vector Fetch Escalation: If a bus fault occurs during a vector fetch operation, it will escalate to a HardFault. This prevents the system from invoking an incorrect interrupt service routine.
Invalid or Restricted Memory Access: When the processor's bus interface attempts to access invalid or restricted memory locations, and the memory device sends back an error response, a BusFault is generated.
Device Not Ready for Memory Transfer: A BusFault can occur if the memory device, such as SDRAM connected via a DRAM controller, is not ready to accept memory transfers.
Unprivileged Access to PPB: An unprivileged access to the Private Peripheral Bus (PPB) region can also result in a BusFault.
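
The BusFault Status Register (BFSR), the second byte of the CFSR, records which of these situations occurred. The following sketch, with hypothetical helper names, extracts that byte and checks whether the BusFault Address Register (BFAR, at 0xE000ED38) can be trusted. It is meant as an illustration of the bit layout, not a complete fault analyzer.

```c
#include <stdbool.h>
#include <stdint.h>

/* BFSR occupies bits [15:8] of CFSR; masks below are relative to the byte. */
#define BFSR_IBUSERR     (1U << 0)  /* bus error on instruction fetch */
#define BFSR_PRECISERR   (1U << 1)  /* data bus error, stacked PC points at the culprit */
#define BFSR_IMPRECISERR (1U << 2)  /* data bus error reported late, e.g. a buffered write */
#define BFSR_UNSTKERR    (1U << 3)  /* bus error while unstacking on exception return */
#define BFSR_STKERR      (1U << 4)  /* bus error while stacking on exception entry */
#define BFSR_BFARVALID   (1U << 7)  /* BFAR holds the faulting address */

/* Hypothetical helper: extract the BFSR byte from a raw CFSR value. */
static uint8_t bfsr_from_cfsr(uint32_t cfsr)
{
    return (uint8_t)((cfsr >> 8) & 0xFFU);
}

/* Hypothetical helper: true only when the fault was precise and the
   fault address register is flagged as valid. */
static bool bfsr_address_known(uint8_t bfsr)
{
    return (bfsr & BFSR_PRECISERR) && (bfsr & BFSR_BFARVALID);
}
```

An imprecise error is the harder case to debug, because the instruction that caused it has already retired by the time the fault is taken.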

5. UsageFault

Introduction

UsageFault is another configurable exception type specific to ARM Cortex-Mx processors, designed to catch invalid operations at the instruction level. By default, this exception type is disabled, and if not specifically configured, its default priority is set to 0.

System Handler Control and State Register (SHCSR)

To enable UsageFault, you'll need to modify the System Control Block's System Handler Control and State Register (SCB->SHCSR). Set the USGFAULTENA bit (bit 18) to 1. Doing so will enable UsageFault handling in the system.

Causes of UsageFault

UsageFault can be triggered under several different scenarios, most of which are related to invalid or illegal operations:

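Execution of Undefined Instruction: A UsageFault occurs when the processor tries to execute an opcode that is not a valid instruction. A related case is an invalid execution state: the Cortex-M4 only supports the Thumb instruction set, so an attempt to switch to ARM state (a branch to an address that clears the T-bit) also raises a UsageFault.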
Floating-Point Instruction Without FPU: If an attempt is made to execute a floating-point instruction when the Floating-Point Unit (FPU) is not enabled, a UsageFault is triggered.
Returning to Thread Mode During Active Exception/Interrupt: Trying to switch back to thread mode while an exception or interrupt is still active will also result in a UsageFault.
Unaligned Multiple Load/Store: Load-multiple and store-multiple instructions (LDM, STM) always generate a UsageFault when the address is unaligned, regardless of configuration.
Divide by Zero: A UsageFault is generated on an integer divide by zero, but only if the divide-by-zero trap (DIV_0_TRP in the Configuration and Control Register, CCR) is enabled. If it is not enabled, the division returns a quotient of zero without triggering a fault.
Unaligned Memory Access Trap: By default, ordinary unaligned word and halfword accesses are handled by the hardware and do not fault. Setting the unaligned-access trap (UNALIGN_TRP in the CCR) makes these accesses generate a UsageFault as well.
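
The two optional traps are controlled by bits in the Configuration and Control Register (CCR, at 0xE000ED14): bit 3 is UNALIGN_TRP and bit 4 is DIV_0_TRP. The sketch below computes the new register value from the old one. The helper name ccr_enable_traps is an assumption for this example, and the register access itself is shown only as a comment so the logic can be checked on a host machine.

```c
#include <stdint.h>

#define CCR_UNALIGN_TRP (1UL << 3)  /* trap unaligned word/halfword accesses */
#define CCR_DIV_0_TRP   (1UL << 4)  /* trap integer divide by zero */

/* Hypothetical helper: return the CCR value with both traps enabled,
   leaving all other bits untouched. */
static uint32_t ccr_enable_traps(uint32_t ccr)
{
    return ccr | CCR_UNALIGN_TRP | CCR_DIV_0_TRP;
}

/* On a target this might be applied as:
 *
 *     volatile uint32_t *pCCR = (volatile uint32_t *) 0xE000ED14;
 *     *pCCR = ccr_enable_traps(*pCCR);
 */
```

Keep in mind that the traps only reach the UsageFault handler if UsageFault itself is enabled in the SHCSR; otherwise they escalate to HardFault.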

6. Exercise: Configuring Faults

We can find the memory address of the SHCSR by looking at the System Control Block (SCB) in the ARM Cortex-M4 Generic User Guide:

[Figure: System Control Block register summary showing the SHCSR address]

This shows that the SHCSR is located at address 0xE000ED24. We can use this address to access the SHCSR in our code:

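#include <stdint.h>

int main(void)
{
    // Enable all configurable exceptions: usage fault, mem manage fault, and bus fault
    // volatile: every access must actually reach the hardware register
    volatile uint32_t *pSHCSR = (volatile uint32_t *) 0xE000ED24;
    *pSHCSR |= (1U << 16);  // Enable MemManage Fault (MEMFAULTENA)
    *pSHCSR |= (1U << 17);  // Enable BusFault (BUSFAULTENA)
    *pSHCSR |= (1U << 18);  // Enable UsageFault (USGFAULTENA)

    // rest of code...
}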

Alternatively, we can use the CMSIS macro SCB->SHCSR to access the SHCSR; we will do that in another course. Note that the code above enables all three configurable faults: MemManage, BusFault, and UsageFault.

Now let's try to execute an undefined instruction:

int main(void)
{
    // Enable all configurable exceptions: usage fault, mem manage fault, and bus fault
    // Not Shown

    // Force the processor to execute an undefined instruction
    // (0x20010000 must be a valid SRAM address on the target)
    volatile uint32_t *pSRAM = (volatile uint32_t *) 0x20010000;
    *pSRAM = 0xFFFFFFFF;  // Undefined instruction value

    // Bit 0 of the target address is set to select Thumb state
    void (*funcPtr) (void);
    funcPtr = (void (*)(void)) 0x20010001;
    funcPtr();  // Execute undefined instruction

    // rest of code...
}
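
The function pointer above is set to 0x20010001 rather than 0x20010000 because bit 0 of a branch target selects the execution state: it must be 1 for Thumb. With bit 0 clear, the jump would raise an invalid-state UsageFault instead of the undefined-instruction fault we are after. A small sketch of that rule, with helper names invented for this example:

```c
#include <stdint.h>

/* Hypothetical helper: turn a code address into a value suitable for a
   function pointer on a Thumb-only core by setting bit 0. */
static uint32_t thumb_entry(uint32_t code_addr)
{
    return code_addr | 1U;
}

/* Hypothetical helper: nonzero if a branch to this value keeps the
   processor in Thumb state. */
static int targets_thumb_state(uint32_t branch_target)
{
    return (branch_target & 1U) != 0;
}
```

The instruction is still fetched from the even address 0x20010000; bit 0 only ends up in the T-bit of the program status register.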

Run and debug the code. You will see that the processor enters the UsageFault handler:

[Figure: debugger Fault Analyzer window showing the UsageFault]

However, since we don't always have access to a nice debugger with a Fault Analyzer window, we need to print the UsageFault Status Register (UFSR) to the console. We can do that by adding the following code to the UsageFault handler:

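#include <stdint.h>
#include <stdio.h>

void UsageFault_Handler(void)
{
    // UFSR is the upper halfword of CFSR (0xE000ED28), so read it with a 16-bit access
    volatile uint16_t *pUFSR = (volatile uint16_t *) 0xE000ED2A;

    printf("Exception : UsageFault\n");
    printf("UFSR = %x\n", (unsigned int) *pUFSR);

    while (1) {}
}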
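
A raw hexadecimal UFSR value still has to be looked up by hand. As a possible refinement, the sketch below maps the UFSR bits to their names. The table and the function ufsr_report are assumptions for this example; the bit positions are those documented for the UFSR.

```c
#include <stdint.h>
#include <stdio.h>

struct ufsr_bit {
    uint16_t    mask;
    const char *name;
};

static const struct ufsr_bit ufsr_bits[] = {
    { 1U << 0, "UNDEFINSTR" },  /* undefined instruction */
    { 1U << 1, "INVSTATE"   },  /* invalid state, e.g. T-bit cleared */
    { 1U << 2, "INVPC"      },  /* invalid exception return */
    { 1U << 3, "NOCP"       },  /* coprocessor (FPU) not available */
    { 1U << 8, "UNALIGNED"  },  /* unaligned access */
    { 1U << 9, "DIVBYZERO"  },  /* divide by zero with trap enabled */
};

/* Hypothetical helper: print the name of every flag that is set and
   return how many flags were set. */
static int ufsr_report(uint16_t ufsr)
{
    int count = 0;

    for (size_t i = 0; i < sizeof ufsr_bits / sizeof ufsr_bits[0]; i++) {
        if (ufsr & ufsr_bits[i].mask) {
            printf("  %s\n", ufsr_bits[i].name);
            count++;
        }
    }
    return count;
}
```

Called from the handler with the value read from the UFSR, this should report UNDEFINSTR for the undefined-instruction experiment above.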

It might be useful to also analyze the stack frame. We can adjust our UsageFault_Handler function to print the stack frame as well:

__attribute__ ((naked)) void UsageFault_Handler(void)
{
    // Extract the value of MSP (base address of the stack frame)
    __asm ("MRS r0, MSP");
    __asm ("B UsageFault_Handler_c");
}

void UsageFault_Handler_c(uint32_t *pBaseStackFrame)
{
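    // UFSR is the upper halfword of CFSR, so read it with a 16-bit access
    volatile uint16_t *pUFSR = (volatile uint16_t *) 0xE000ED2A;

    printf("Exception : UsageFault\n");
    printf("UFSR = %x\n", (unsigned int) *pUFSR);

    printf("pBaseStackFrame = %p\n", (void *) pBaseStackFrame);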
    printf("Value of R0 = %lx\n", pBaseStackFrame[0]);
    printf("Value of R1 = %lx\n", pBaseStackFrame[1]);
    printf("Value of R2 = %lx\n", pBaseStackFrame[2]);
    printf("Value of R3 = %lx\n", pBaseStackFrame[3]);
    printf("Value of R12 = %lx\n", pBaseStackFrame[4]);
    printf("Value of LR = %lx\n", pBaseStackFrame[5]);
    printf("Value of PC = %lx\n", pBaseStackFrame[6]);
    printf("Value of XPSR = %lx\n", pBaseStackFrame[7]);

    while (1) {}
}
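
The handler above always reads MSP, which is correct as long as the faulting code was running on the main stack. If the fault is taken from Thread mode code that uses the process stack, the frame lives on PSP instead, and bit 2 of the EXC_RETURN value held in LR on handler entry tells the two cases apart. The sketch below captures the frame layout and that check. The struct and function names are invented for this example, and the layout shown is the basic 8-word frame without floating-point state.

```c
#include <stdint.h>

/* Layout of the basic frame the core pushes on exception entry,
   in ascending address order. */
struct exception_frame {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
};

#define EXC_RETURN_SPSEL (1UL << 2)  /* 0: frame on MSP, 1: frame on PSP */

/* Hypothetical helper: nonzero if the stacked frame is on the process stack. */
static int frame_is_on_psp(uint32_t exc_return)
{
    return (exc_return & EXC_RETURN_SPSEL) != 0;
}

/* Hypothetical helper: copy the eight stacked words into a named struct. */
static struct exception_frame read_frame(const uint32_t *base)
{
    struct exception_frame frame = {
        base[0], base[1], base[2], base[3],
        base[4], base[5], base[6], base[7]
    };
    return frame;
}
```

The stacked pc field is usually the most useful entry: it holds the address of the instruction that was executing when the fault was taken.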

The __attribute__((naked)) tells the compiler that the function will not require prologue/epilogue sequences. This is useful in low-level system programming where we need precise control over the generated assembly. When a function is declared naked, the compiler doesn't generate any of the usual function entry and exit code (like saving registers onto the stack). It's a direct handover to the programmer to control the behavior of the function at the assembly level.

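In this case, UsageFault_Handler is written as a naked function because we need to read the MSP (Main Stack Pointer) before anything else touches the stack. A normal function prologue may push registers and so move the stack pointer, in which case MSP would no longer point to the base of the stack frame that the processor saved on exception entry. The naked handler copies MSP into r0, which is the first argument register under the ARM calling convention, and immediately branches to UsageFault_Handler_c, which therefore receives the frame address as pBaseStackFrame.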

Also note that we don't need the volatile qualifier on these __asm statements. In GCC, a basic __asm statement (one without operands) is implicitly volatile: the compiler assumes it can have side effects and will not optimize it away. This differs from extended __asm statements with output operands, which the compiler may remove if the outputs are unused, and from regular C or C++ code, where volatile is needed to stop the compiler from optimizing out accesses to certain variables.