I am going to explain the anatomy of assembly code using RISC-V as an example, since it’s a clean, modern instruction set architecture (ISA) that’s easy to understand. Assembly is a low-level programming language whose instructions correspond directly to the machine instructions a processor executes.
It is human-readable but closely tied to the hardware. Let us break it down step by step and then dive into RISC-V examples.
Instructions
- These are the basic operations the CPU executes (e.g., `add`, `load`, `jump`).
- In assembly, each line typically represents one instruction.
- Instructions consist of an opcode (operation code) and operands (data or registers to operate on).
Registers
- Small, fast storage locations inside the CPU.
- RISC-V has 32 general-purpose registers, labeled `x0` to `x31`. Some have special purposes (e.g., `x0` is always zero).
- Assembly code often manipulates data in registers rather than memory for speed.
Operands
- Can be registers, immediate values (constants), or memory addresses.
- Example: `add x1, x2, x3` uses registers as operands; `addi x1, x2, 5` uses an immediate value (5).
Syntax
- Typically: `[opcode] [destination], [source1], [source2]`.
- RISC-V uses a consistent format, often with commas separating operands.
Labels
- Symbolic names for memory addresses, used for jumps or branches (e.g., loops or function calls).
- Example: loop: marks a spot in the code.
Comments
- Ignored by the assembler, used for human readability.
- In RISC-V, comments start with `#`.
Directives
- Commands to the assembler (not CPU instructions), like `.data` to define data or `.text` for code sections.
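To tie these pieces together, here is a minimal annotated sketch showing a directive, a label, two instructions with their operands, and comments. The label name `start` and the register choices are assumptions made for illustration.

```asm
    .text                   # directive: the lines that follow are code
start:                      # label: names the address of the next instruction
    addi x5, x0, 10         # opcode addi; destination x5; sources x0 and the immediate 10
    add  x6, x5, x5         # opcode add; destination x6; sources x5 and x5
```

After these two instructions, `x5` holds 10 and `x6` holds 20.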
RISC-V Instruction Types
RISC-V has a reduced instruction set, meaning it keeps things simple with a few key instruction formats:
- R-type: Register-to-register operations (e.g., arithmetic).
- I-type: Immediate operations (e.g., add a constant), as well as loads and `jalr`.
- S-type: Store instructions (save to memory).
- B-type: Branch instructions (conditional jumps).
- U-type: Upper immediate (large constants).
- J-type: Jump instructions (unconditional jumps).
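As a rough illustration, here is one instruction of each format. The registers, offsets, and the label `target` are arbitrary choices for this sketch, and the sequence is meant to show the syntax of each format rather than to form a meaningful program.

```asm
    add  x5, x6, x7        # R-type: x5 = x6 + x7
    addi x5, x6, 10        # I-type: x5 = x6 + 10
    sw   x5, 8(x6)         # S-type: store x5 at address x6 + 8
    beq  x5, x6, target    # B-type: branch to target if x5 == x6
    lui  x5, 0x12345       # U-type: x5 = 0x12345000 (immediate placed in the upper 20 bits)
    jal  x1, target        # J-type: jump to target, saving the return address in x1
target:
    nop                    # placeholder instruction at the branch/jump destination
```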
Code Examples
Now, let us see this in action with examples.
Examples of RISC-V Assembly Code
Basic Arithmetic (R-type and I-type)
Let’s add two numbers stored in registers and then add a constant.
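A minimal sketch of such a sequence follows; the register numbers and the values 3, 4, and 10 are assumptions chosen purely for illustration.

```asm
    addi x5, x0, 3       # x5 = 3 (first number)
    addi x6, x0, 4       # x6 = 4 (second number)
    add  x7, x5, x6      # R-type: x7 = x5 + x6 = 7
    addi x7, x7, 10      # I-type: x7 = x7 + 10 = 17
```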
- `add` is an R-type instruction: it operates on three registers.
- `addi` is an I-type instruction: it uses two registers and an immediate value.
- Anatomy: `[opcode] [destination], [source1], [source2 or immediate]`.
Loading and Storing Data (I-type and S-type)
Let’s load a value from memory into a register and store it back elsewhere.
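A small sketch of this pattern is shown here. It assumes `x5` already holds a valid, word-aligned base address; the register choices are illustrative.

```asm
    lw   x6, 0(x5)       # x6 = 32-bit word at address x5 + 0
    sw   x6, 4(x5)       # store x6 to address x5 + 4 (the next word)
```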
- `lw` (load word) fetches 32 bits from memory into a register.
- `sw` (store word) writes a register’s value to memory.
The number (e.g., 0 or 4) is an offset added to the base address in the register.
Branching (B-type)
Let’s write a simple loop that increments a counter until it hits 5.
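One way such a loop might be written is sketched here; the registers `x5` (counter) and `x6` (limit) are assumptions for illustration.

```asm
    addi x5, x0, 0       # counter = 0
    addi x6, x0, 5       # limit = 5
loop:
    beq  x5, x6, exit    # leave the loop once counter == limit
    addi x5, x5, 1       # counter = counter + 1
    j    loop            # unconditional jump back to the top of the loop
exit:
    nop                  # execution continues here with x5 == 5
```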
- `beq` (branch if equal) compares two registers and jumps if they’re equal.
- `j` (jump) performs an unconditional jump; it is a pseudo-instruction that assembles to the J-type `jal` with `x0` as the destination.
- `loop:` and `exit:` are labels marking addresses.
Function Call (J-type and I-type)
Let’s call a subroutine to double a number.
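A sketch of such a call follows. It uses the standard calling convention, in which `x1` (`ra`) holds the return address and `x10` (`a0`) carries the argument and the result; the labels `main`, `double`, and `done` and the input value 7 are assumptions for illustration.

```asm
main:
    addi x10, x0, 7      # argument: x10 = 7
    jal  x1, double      # jump to double, saving the return address in x1
    j    done            # execution resumes here after the call, with x10 == 14
double:
    add  x10, x10, x10   # x10 = x10 + x10
    jalr x0, 0(x1)       # return: jump to the address saved in x1
done:
    nop                  # placeholder for whatever follows the call
```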
- `jal` (jump and link) jumps to a label and saves the return address.
- `jalr` (jump and link register) returns by jumping to the address saved in `x1`.
- By convention, `x1` (`ra`) holds the return address, while `x10` (`a0`) carries the first argument and the return value.
Data Section (Directives)
Let’s define some data and use it.
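A minimal fragment along these lines is sketched here; the label `value`, the constant 42, and the registers are assumptions for illustration.

```asm
    .data
value:
    .word 42             # reserve one 32-bit word initialized to 42

    .text
    la   x5, value       # x5 = address of value (pseudo-instruction)
    lw   x6, 0(x5)       # x6 = 42, loaded from memory
```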
- `.data` and `.text` are directives telling the assembler where data and code go.
- `la` (load address) is a pseudo-instruction that simplifies getting a label’s address.
Putting It All Together
Here’s a small program to sum numbers 1 to 5:
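One possible version is sketched here. The roles assigned to `x1` (running sum), `x2` (current number), and `x3` (loop bound) are assumptions; note that this standalone fragment uses `x1` as an ordinary register, which would not be safe inside a function that needs its return address.

```asm
    addi x1, x0, 0       # x1 = sum = 0
    addi x2, x0, 1       # x2 = i = 1
    addi x3, x0, 6       # x3 = bound (one past the last number)
loop:
    add  x1, x1, x2      # sum = sum + i
    addi x2, x2, 1       # i = i + 1
    bne  x2, x3, loop    # repeat while i != 6
    # the loop falls through here with x1 == 15
```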
- Uses registers `x1`, `x2`, `x3`.
- Combines I-type (`addi`), R-type (`add`), and B-type (`bne`).
Things to remember about RISC-V Assembly
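- Fixed-length instructions: All base instructions are 32 bits, making decoding simple (the optional compressed extension adds 16-bit forms).
- Load/store architecture: Only load and store instructions (such as `lw` and `sw`) access memory; arithmetic uses registers.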
- Minimalist design: Fewer instructions than complex ISAs like x86, but still powerful.
Hopefully, this breakdown gave you a solid grasp of assembly code anatomy through RISC-V.