System calls and switching between exception levels are intricate topics, full of small details that are often overlooked. Let’s dive in.
EL0 <-> EL1 switching
Hello There!
Let’s continue on our journey of understanding the mammoth that is the ARM64 architecture.
In the previous article, we went through the different exception levels. I’d encourage you to skim it again if you want to jog your memory. In this one, we will take a deep dive into switching between two of the exception levels: EL0 and EL1.
We’ll walk through the process of a system call from EL0 to EL1 and the return journey, focusing on context switching, the vector table, synchronous exception handling, and key registers like ESR_EL1 and ELR_EL1. We will look at code snippets to see how an implementation might work in practice. Finally, we will piece the snippets together to arrive at the full switching flow.
Quick Recap
ARMv8-A organizes privilege levels into Exception Levels (EL0 to EL3).
- EL0 is where user applications run — unprivileged, restricted access.
- EL1 is typically the OS kernel, with more privileges like accessing system registers.
- Switching between these levels happens during exceptions, like system calls, interrupts, or faults.
- A system call (via the svc instruction) is a synchronous exception that moves execution from EL0 to EL1. The return to EL0 is handled by the eret instruction.
So that’s it. Thanks for reading! Not quite though, right? Our interest is in the details, and that’s where we are headed.
Triggering the System Call from EL0
Let’s start with a basic example. Say I’m writing a user program in EL0 and I want to make a system call to write to a file (like Linux’s write syscall). The svc instruction is my gateway. It generates a synchronous exception, causing the processor to switch to EL1 and jump to a predefined handler in the vector table.
Here’s a simple EL0 code snippet to trigger a system call:
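A minimal sketch of such a call, written in C with GCC/Clang inline assembly rather than as a standalone assembly file. It assumes Linux on AArch64. The helper name sys_write_demo is my own, and on any other platform the function falls back to libc's syscall() so that the sketch still runs.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: issue write(fd, buf, len) the way an EL0 program would. */
static long sys_write_demo(int fd, const void *buf, unsigned long len)
{
#if defined(__aarch64__) && defined(__linux__)
    register long x8 __asm__("x8") = 64;         /* syscall number: write on Linux ARM64 */
    register long x0 __asm__("x0") = fd;         /* argument 0, also receives the result */
    register long x1 __asm__("x1") = (long)buf;  /* argument 1 */
    register long x2 __asm__("x2") = (long)len;  /* argument 2 */

    /* svc #0 raises the synchronous exception that moves execution to EL1. */
    __asm__ volatile("svc #0"
                     : "+r"(x0)
                     : "r"(x1), "r"(x2), "r"(x8)
                     : "memory", "cc");
    return x0;  /* bytes written, or a negative errno value */
#else
    /* Not AArch64 Linux: let libc issue the platform's own syscall instruction. */
    return syscall(SYS_write, fd, buf, len);
#endif
}
```

Calling sys_write_demo(1, "hi\n", 3) writes three bytes to standard output. One difference to keep in mind: the raw svc path reports failure as a negative errno value in x0, while the libc fallback returns -1 and sets errno.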
What’s happening here?
I load the system call number (64 for write in Linux ARM64) into x8, set up arguments in x0–x2 (per the Linux ABI), and issue svc #0. The immediate value (#0) is mostly ignored in practice but can be used by the kernel to differentiate types of supervisor calls. When svc executes, the processor does the following (in hardware, with no software intervention):
- Switches to EL1.
- Saves the return address (for svc, the address of the instruction that follows it) to ELR_EL1 (Exception Link Register for EL1).
- Saves the processor state (PSTATE) to SPSR_EL1 (Saved Program Status Register for EL1).
- Jumps to the vector table entry for synchronous exceptions.
The svc instruction doesn’t save general-purpose registers (x0–x30). That’s on the kernel to handle, or you risk clobbering user state.
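To make the hardware steps concrete, here is a small software model of them in C. This is only a sketch for reasoning about the sequence, not how a CPU or kernel is written: the struct cpu_model type and model_take_svc function are names I made up, PSTATE is flattened into a single word, and 0x400 is the offset of the "Synchronous, Lower EL, AArch64" vector entry.

```c
#include <stdint.h>

/* Hypothetical model of the state involved in taking an exception to EL1. */
struct cpu_model {
    uint64_t pc;        /* program counter */
    uint64_t pstate;    /* processor state, flattened into one word */
    unsigned el;        /* current exception level (0..3) */
    uint64_t x[31];     /* general-purpose registers x0..x30 */
    uint64_t vbar_el1;  /* base address of the EL1 vector table */
    uint64_t elr_el1;   /* Exception Link Register */
    uint64_t spsr_el1;  /* Saved Program Status Register */
};

#define SYNC_LOWER_EL_A64_OFFSET 0x400u

/* Model an svc executed at EL0, with the svc instruction located at cpu->pc. */
static void model_take_svc(struct cpu_model *cpu)
{
    cpu->spsr_el1 = cpu->pstate;  /* PSTATE -> SPSR_EL1 */
    cpu->elr_el1  = cpu->pc + 4;  /* return address: the instruction after svc */
    cpu->el       = 1;            /* execution continues at EL1 */
    cpu->pc       = cpu->vbar_el1 + SYNC_LOWER_EL_A64_OFFSET;
    /* x[0..30] are left untouched: saving them is the kernel's job. */
}
```

After the call, every entry in x still holds the user's value, which is exactly why the handler has to save them before using any register itself.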
Homework: Imagine a malicious user passes garbage in the registers (e.g., an invalid pointer in x1). The kernel needs robust validation to avoid crashes. Can you come up with modifications to our snippet to achieve that?
The Vector Table and Synchronous Exception Handling
When the svc instruction fires, the processor uses the Vector Base Address Register (VBAR_EL1) to find the vector table, a table of exception handlers in EL1. VBAR_EL1 is programmed during the boot phase; we will cover that in a future article on the boot flow. The table has entries for different exception types, including synchronous exceptions like system calls.
For a 64-bit EL0 app, the processor jumps to the “Synchronous, Lower EL, AArch64” entry.
Code snippet of what a vector table would look like:
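A real table has to be written in assembly, so the block below is only a C sketch of its layout, useful for checking offsets. The enum and function names are my own. The layout it encodes is the architectural one: four groups (current EL with SP_EL0, current EL with SP_ELx, lower EL in AArch64, lower EL in AArch32), each with four entries (synchronous, IRQ, FIQ, SError) of 0x80 bytes.

```c
#include <stdint.h>

/* Where the exception came from: selects one of four 0x200-byte groups. */
enum vec_source { CUR_EL_SP0, CUR_EL_SPX, LOWER_EL_A64, LOWER_EL_A32 };

/* What kind of exception it is: selects one of four entries in the group. */
enum vec_type { VEC_SYNC, VEC_IRQ, VEC_FIQ, VEC_SERROR };

#define VEC_ENTRY_SIZE  0x80u   /* 128 bytes per entry */
#define VEC_TABLE_ALIGN 0x800u  /* 2KB, i.e. 2^11 */

/* Address the processor branches to for a given source and type. */
static uint64_t vector_entry(uint64_t vbar, enum vec_source src, enum vec_type type)
{
    return vbar + ((uint64_t)src * 4u + (uint64_t)type) * VEC_ENTRY_SIZE;
}

/* VBAR_EL1 must point at a 2KB-aligned table. */
static int vbar_is_aligned(uint64_t vbar)
{
    return (vbar & (VEC_TABLE_ALIGN - 1u)) == 0;
}
```

With this layout, vector_entry(vbar, LOWER_EL_A64, VEC_SYNC) evaluates to vbar + 0x400, which is where an svc from a 64-bit EL0 program lands.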
The vector table must be aligned to a 2KB boundary (hence .align 11). Each entry is 128 bytes, allowing for small handlers or branches to larger ones. For our system call, the processor jumps to sync_lower_el. The handler saves critical registers and calls a syscall handler. The eret instruction returns execution to EL0.
Context Switching
In EL1, the kernel saves the EL0 context (general-purpose registers, etc.) to ensure the user program can resume. On return, it restores the context before eret.
Context save/restore sub-routines:
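In a kernel these routines are assembly that pushes registers onto the EL1 stack. As a stand-in, here is a C sketch of the frame they would build, assuming a plain 256-byte layout with x0–x30 stored in order. The struct el0_frame type and both function names are hypothetical, and the register file is modeled as an ordinary array.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stack frame holding the EL0 general-purpose registers. */
struct el0_frame {
    uint64_t x[31];  /* x0..x30: 31 * 8 = 248 bytes */
    uint64_t pad;    /* round up to 256 so the stack pointer stays 16-byte aligned */
};

_Static_assert(sizeof(struct el0_frame) == 256, "frame must be 256 bytes");

/* Copy the live registers into the frame on entry to the kernel. */
static void save_context(struct el0_frame *frame, const uint64_t regs[31])
{
    memcpy(frame->x, regs, sizeof frame->x);
}

/* Copy the saved values back into the registers just before eret. */
static void restore_context(uint64_t regs[31], const struct el0_frame *frame)
{
    memcpy(regs, frame->x, sizeof frame->x);
}
```

Between the two calls the kernel is free to use any register, because whatever it overwrites is replaced from the frame on the way out.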
The key here is to store all the registers (x0–x30) in a stack frame (256 bytes): 31 registers at 8 bytes each take 248 bytes, and rounding up to 256 keeps the stack pointer 16-byte aligned. Why all registers? The kernel might clobber them (we don’t want to put any restrictions on the kernel, do we now?), and the user expects its registers to be preserved (except x0, which carries the return value). The restore function pops the registers back. You’d see very similar code in any major kernel source out there.
Homework: Major kernels also save and restore some extra registers (e.g., floating-point and NEON/SVE registers). Analyze the Linux source and jot down those differences.
Parsing ESR_EL1
The Exception Syndrome Register (ESR_EL1) identifies the exception cause. Remember, the hardware only identifies that a synchronous exception came from a lower EL and transfers control to the matching vector table entry, which is sync_lower_el. We have to determine whether it is indeed an svc call, and for that we decode the exception syndrome register.
For a system call, it confirms an svc instruction and provides details.
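A minimal sketch of that decoding in C. It assumes the raw ESR_EL1 value has already been read (in a kernel, with mrs) and is passed in as an integer. The macro and function names are my own; the field positions are the architectural ones.

```c
#include <stdint.h>

#define ESR_EC_SHIFT  26           /* Exception Class lives in bits 31:26 */
#define ESR_EC_MASK   0x3Fu
#define ESR_ISS_MASK  0x01FFFFFFu  /* Instruction Specific Syndrome, bits 24:0 */
#define ESR_EC_SVC64  0x15u        /* svc executed in AArch64 state */

/* Extract the Exception Class field. */
static unsigned esr_ec(uint64_t esr)
{
    return (unsigned)((esr >> ESR_EC_SHIFT) & ESR_EC_MASK);
}

/* True if the exception was caused by an AArch64 svc instruction. */
static int esr_is_svc64(uint64_t esr)
{
    return esr_ec(esr) == ESR_EC_SVC64;
}

/* For an svc, the low 16 bits of the ISS hold the instruction's immediate. */
static unsigned esr_svc_imm(uint64_t esr)
{
    return (unsigned)(esr & ESR_ISS_MASK & 0xFFFFu);
}
```

For example, svc #0 typically shows up as 0x56000000: EC is 0x15, bit 25 (the instruction-length bit) is set, and the immediate is 0. A handler would check esr_is_svc64 first and route anything else, such as a data abort, to a different path.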
The Exception Class (EC, bits 31:26) is 0x15 for svc from AArch64. The ISS field occupies bits 24:0; for svc, its low 16 bits (15:0) hold the svc immediate.
ELR_EL1 and Returning to EL0
ELR_EL1 holds the return address (usually the instruction after svc). The kernel can modify it for special cases, for example to redirect the return to a signal handler.
The eret instruction restores PC from ELR_EL1 and state from SPSR_EL1.
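The return can be modeled the same way as the entry. The block below is a hedged C sketch of what eret does at EL1, not real kernel code: struct ret_model and model_eret are hypothetical names, and PSTATE is again flattened into one word. It relies on the target exception level being encoded in bits 3:2 of SPSR_EL1.

```c
#include <stdint.h>

/* Hypothetical model of the state eret consumes and updates. */
struct ret_model {
    uint64_t pc;        /* program counter */
    uint64_t pstate;    /* processor state, flattened into one word */
    unsigned el;        /* current exception level */
    uint64_t elr_el1;   /* where to resume */
    uint64_t spsr_el1;  /* state to resume with */
};

/* Model eret executed at EL1. */
static void model_eret(struct ret_model *cpu)
{
    cpu->pc     = cpu->elr_el1;                             /* ELR_EL1 -> PC */
    cpu->pstate = cpu->spsr_el1;                            /* SPSR_EL1 -> PSTATE */
    cpu->el     = (unsigned)((cpu->spsr_el1 >> 2) & 0x3u);  /* M[3:2] is the target EL */
}
```

If the kernel writes a different address into elr_el1 before calling model_eret, execution resumes there instead of after the svc, which is the mechanism behind the special cases mentioned above.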
Complete Example
Let’s piece everything together:
Homework: I’ve deliberately kept a subtle bug in the above snippet. Can you catch it?
Hint - it is related to return values being correctly communicated back.
In the next one, we will talk about switching between the other exception levels, and even between the secure and non-secure worlds. We will take a simple use case and trace its flow. See you in the next one!