Rough guide to ARM (Compilers)
Registers
The ARM has (for our purposes) 16 registers plus a status word.
Register | Role
r0–r3 | Scratch registers
r4 | Static link
r5–r10 | Temporaries
r11 = fp | Frame pointer
r12 = ip | Linkage temp
r13 = sp | Stack pointer
r14 = lr | Link register
r15 = pc | Program counter
A procedure receives its first four parameters in registers r0–r3, and need not preserve these registers. It must preserve the values of registers r4–r10.
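As a minimal sketch, the register roles and the preservation rule can be written down as a C table. The helper names `reg_number`, `param_register` and `callee_must_preserve` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Register names indexed by register number, following the table above. */
static const char *const reg_name[16] = {
    "r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7",
    "r8", "r9", "r10", "fp", "ip", "sp", "lr", "pc"
};

/* Hypothetical helper: the number of a register given its name, or -1. */
int reg_number(const char *name)
{
    for (int i = 0; i < 16; i++)
        if (strcmp(reg_name[i], name) == 0)
            return i;
    return -1;
}

/* Parameter i (counting from 0) arrives in register r<i> when i < 4;
   -1 means it is passed in memory instead. */
int param_register(int i)
{
    assert(i >= 0);
    return i < 4 ? i : -1;
}

/* True for r4-r10, whose values a procedure must preserve. */
bool callee_must_preserve(int r)
{
    return r >= 4 && r <= 10;
}
```

For example, `reg_number("ip")` gives 12 and `reg_number("sp")` gives 13, matching the table.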
Stack frame layout
Offset | Contents (higher addresses first)
fp+56 and up | Other incoming parameters
fp+52 | Parameter 3
fp+48 | Parameter 2
fp+44 | Parameter 1
fp+40 | Parameter 0
fp+36 | Return address
fp+32 | Saved sp
fp+28 | Dynamic link
fp+24 | Saved register r10
... | ...
fp+4 | Saved register r5
fp+0 | Static link
fp-4, fp-8, fp-12, ... | Local variable space
..., sp+8, sp+4, sp+0 | Other outgoing parameters
Note that the frame head is 40 bytes instead of 16 as in Keiko, and its contents are laid out differently. The only thing that matters for the code in the procedure body is that the static link is in a different place. Parameters are addressed at positive offsets from fp and local variables at negative offsets from fp. If the procedure calls others that have more than four words of parameters, then extra space is reserved for these outgoing parameters; this space is addressed at positive offsets from sp.
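The fp-relative offsets in the layout follow from the order in which the prologue stores registers: ten words of frame head, lowest register at the lowest address. Here is a small C sketch that computes them. The constant and function names are illustrative and are not taken from any actual compiler.

```c
#include <assert.h>

/* fp-relative byte offsets of the fixed slots in the frame head. */
enum {
    STATIC_LINK_OFF  = 0,   /* saved r4 */
    DYNAMIC_LINK_OFF = 28,  /* caller's fp */
    SAVED_SP_OFF     = 32,  /* caller's sp, stored via ip */
    RETURN_ADDR_OFF  = 36,  /* saved lr */
    FRAME_HEAD_SIZE  = 40   /* r4-r10, fp, ip, lr: ten 4-byte words */
};

/* Saved register r<k>, for 4 <= k <= 10, lives at fp + 4*(k-4). */
int saved_reg_offset(int k)
{
    assert(k >= 4 && k <= 10);
    return 4 * (k - 4);
}

/* Parameter i (from 0) lives at fp + 40 + 4*i, whether it arrived in a
   register (i < 4) or was placed on the stack by the caller (i >= 4). */
int param_offset(int i)
{
    assert(i >= 0);
    return FRAME_HEAD_SIZE + 4 * i;
}

/* The n-th word of local variable space (n >= 1) is at fp - 4*n. */
int local_offset(int n)
{
    assert(n >= 1);
    return -4 * n;
}
```

So `param_offset(4)` is 56, which is where the "other incoming parameters" begin in the layout.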
The first four words of incoming arguments arrive in registers r0–r3, and we save them into the stack frame as part of the procedure prologue. We also save the values of the other registers so that they can be restored when the procedure returns. It is therefore safe for our caller to keep values in registers r4–r10 across the procedure call.[1]
As an extension to the ARM calling convention, we pass the static link to a procedure in r4, and it gets saved in the frame of the procedure we call. If the static link is the constant 0, however, we know the procedure won't refer to it, so we don't bother to pass it.
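A code generator following this convention might emit calls roughly as below. This is a hypothetical sketch: `emit_call` and its parameters are invented for illustration. The static link is given as the name of a register holding it, for instance fp when a procedure calls one nested directly inside it, or as NULL when the static link is the constant 0.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: write into buf the instructions that call proc, passing
   the static link in r4. link names the register holding the static link,
   or is NULL when the static link is the constant 0 and is not passed. */
void emit_call(char *buf, size_t size, const char *proc, const char *link)
{
    assert(buf != NULL && proc != NULL);
    if (link != NULL && strcmp(link, "r4") != 0)
        snprintf(buf, size, "mov r4, %s\nbl %s\n", link, proc);
    else
        snprintf(buf, size, "bl %s\n", proc);   /* nothing to move */
}
```

With `link` set to `"fp"` and `proc` set to `"_Q"`, this produces `mov r4, fp` followed by `bl _Q`.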
Each procedure starts with code like this, using special instructions to save up to four incoming parameters and the values of registers r4 to r10 in the stack frame:
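_P:
    mov ip, sp                          @ keep the caller's sp in ip
    stmfd sp!, {r0-r1}                  @ save two words of incoming parameters
    stmfd sp!, {r4-r10, fp, ip, lr}     @ save registers, dynamic link, old sp, return address
    mov fp, sp                          @ fp now points at the saved r4 (static link)
    sub sp, sp, #8                      @ reserve 8 bytes of local variable space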
The stmfd instructions store multiple registers (selected by a bitmap in the instruction) and adjust the value of sp. The first one saves two words of incoming parameters, and the second saves the other registers that must be preserved by the procedure call.[2][3]
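The prologue varies only in the procedure name, the number of parameter registers saved and the size of the local space, so it is easy to generate. The following hypothetical C helper, `emit_prologue`, is a sketch and not the course compiler's code. It applies the rule from footnote [2] by rounding 1 or 3 parameter words up to 2 or 4. Together with the 40-byte frame head, this keeps sp a multiple of 8, assuming the local space is also a multiple of 8.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: write into buf the prologue for a procedure with nparams
   words of parameters and `locals` bytes of local variable space. */
void emit_prologue(char *buf, size_t size, const char *name,
                   int nparams, int locals)
{
    assert(nparams >= 0 && locals >= 0 && size > 0);
    int nsave = nparams > 4 ? 4 : nparams;   /* only r0-r3 carry parameters */
    if (nsave % 2 == 1)
        nsave++;                             /* save 2 or 4, never 1 or 3 */

    snprintf(buf, size, "%s:\n\tmov ip, sp\n", name);
    if (nsave > 0)
        snprintf(buf + strlen(buf), size - strlen(buf),
                 "\tstmfd sp!, {r0-r%d}\n", nsave - 1);
    snprintf(buf + strlen(buf), size - strlen(buf),
             "\tstmfd sp!, {r4-r10, fp, ip, lr}\n\tmov fp, sp\n");
    if (locals > 0)
        snprintf(buf + strlen(buf), size - strlen(buf),
                 "\tsub sp, sp, #%d\n", locals);
}
```

Calling `emit_prologue(buf, sizeof buf, "_P", 2, 8)` reproduces the example prologue for `_P`.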
A procedure ends with a single instruction that resets the registers to their previous values; by resetting sp this destroys the stack frame, and by resetting pc it returns to the caller.
    ldmfd fp, {r4-r10, fp, sp, pc}      @ restore registers, sp and fp, and return
    .ltorg                              @ place for this procedure's constant table
The .ltorg directive identifies a place in the code where the assembler can put a table of large constants for use in ldr= instructions.
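To see why the single ldmfd undoes the prologue, here is a toy C simulation of the two block-transfer instructions over a small word-addressed memory. It models only the full-descending stmfd/ldmfd forms used above. The memory size, function names, and the choice of two parameter words plus 8 bytes of locals (taken from the example) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

enum { FP = 11, IP = 12, SP = 13, LR = 14, PC = 15, MEMWORDS = 256 };
static uint32_t mem[MEMWORDS];            /* toy memory, indexed by address/4 */

/* stmfd base!, {mask}: store the listed registers just below the base,
   lowest-numbered at the lowest address, then move the base down. */
void stmfd(uint32_t r[16], int base, uint16_t mask)
{
    int count = 0;
    for (int i = 0; i < 16; i++)
        if (mask & (1u << i))
            count++;
    uint32_t start = r[base] - 4u * count, addr = start;
    for (int i = 0; i < 16; i++)
        if (mask & (1u << i)) {
            mem[addr / 4] = r[i];
            addr += 4;
        }
    r[base] = start;                      /* writeback, the '!' */
}

/* ldmfd base, {mask}: load the listed registers upwards from the base,
   with no writeback; the address is fixed before any register changes. */
void ldmfd(uint32_t r[16], int base, uint16_t mask)
{
    uint32_t addr = r[base];
    for (int i = 0; i < 16; i++)
        if (mask & (1u << i)) {
            r[i] = mem[addr / 4];
            addr += 4;
        }
}

#define R4_R10 0x07F0u                    /* bits 4..10 */

/* The example prologue: two parameter words and `locals` bytes. */
void prologue(uint32_t r[16], uint32_t locals)
{
    r[IP] = r[SP];                                       /* mov ip, sp */
    stmfd(r, SP, 0x0003u);                               /* {r0-r1} */
    stmfd(r, SP, R4_R10 | 1u << FP | 1u << IP | 1u << LR);
    r[FP] = r[SP];                                       /* mov fp, sp */
    r[SP] -= locals;                                     /* sub sp, sp, #locals */
}

/* ldmfd fp, {r4-r10, fp, sp, pc} */
void epilogue(uint32_t r[16])
{
    ldmfd(r, FP, R4_R10 | 1u << FP | 1u << SP | 1u << PC);
}
```

Running `prologue` and then `epilogue` restores r4–r10, fp and sp to their original values and leaves the old lr in pc. The saved ip slot is what makes sp come back: it holds the caller's sp.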
Instructions
Moves
We can put a small constant in a register, or move from one register to another, with the mov instruction:
mov r1, #1 | r1 := 1
mov r1, r2 | r1 := r2
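The instruction encoding decides what counts as a "small" constant. A data-processing immediate is an 8-bit value rotated right by an even number of bits. The following C sketch of the test uses made-up function names. It shows why 256 fits but 257, 0x1FE and 1000000 need another route.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Rotate a 32-bit word left by n bits, 0 <= n < 32. */
static uint32_t rol32(uint32_t x, unsigned n)
{
    return n == 0 ? x : (x << n) | (x >> (32 - n));
}

/* True if v is an 8-bit value rotated right by an even amount, i.e. can
   be written as the #immediate operand of mov. v = imm8 ROR rot exactly
   when v ROL rot = imm8, so try each even rotation. */
bool fits_mov_immediate(uint32_t v)
{
    for (unsigned rot = 0; rot < 32; rot += 2)
        if (rol32(v, rot) <= 0xFF)
            return true;
    return false;
}
```

Rotations may wrap around, so 0xF000000F is encodable. 0x1FE is not encodable, because bringing it into the low 8 bits would need an odd rotation.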
For larger constants, and for symbolic addresses that are assumed to be 32 bits, there's a form of the ldr instruction:
ldr r1,=#1000000 | r1 := 1000000
ldr r1,=#_x | r1 := _x
(The second of these sets r1 to the address _x, not the contents of that address.)
The mnemonic ldr is a reminder that this instruction loads the value from a place in memory that is allocated by the assembler. The instruction that is actually assembled uses pc-relative addressing. At the end of each procedure, we use the .ltorg directive to identify a place where the assembler can safely dump the table of constants it has collected.
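Here is a rough C model of what the assembler does with these constants. It collects them in a table, sharing duplicates as a typical assembler might, and assembles each ldr as a load relative to pc, which reads as the instruction's own address plus 8. The ldr offset field is 12 bits, so a table must lie within 4095 bytes of every instruction that uses it. This limit is one reason to dump the table after each procedure rather than once per file. The names `struct pool`, `pool_add` and `ldr_pc_offset` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { POOL_MAX = 64 };

/* Hypothetical literal pool: the constants waiting for the next .ltorg. */
struct pool {
    uint32_t value[POOL_MAX];
    int count;
};

/* Index of v in the pool, adding it if it is not there yet. */
int pool_add(struct pool *p, uint32_t v)
{
    for (int i = 0; i < p->count; i++)
        if (p->value[i] == v)
            return i;
    assert(p->count < POOL_MAX);
    p->value[p->count] = v;
    return p->count++;
}

/* Offset for `ldr rN, [pc, #offset]` in an instruction at insn_addr that
   loads the word at lit_addr; pc reads as insn_addr + 8. */
int32_t ldr_pc_offset(uint32_t insn_addr, uint32_t lit_addr)
{
    int32_t off = (int32_t)(lit_addr - (insn_addr + 8));
    assert(off >= -4095 && off <= 4095);   /* 12-bit offset field */
    return off;
}
```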
Loads and stores
Loads and stores are performed using the ldr and str instructions, which transfer a 4-byte word from or to memory, and the variants ldrb and strb, which transfer a single (unsigned) byte. The available addressing modes form the address by adding a register and a constant, or two registers, optionally shifting one of the registers left by a constant number of bits.
ldr r0, [fp, #48] | r0 := mem4[fp+48]
str r2, [r0, r1] | mem4[r0+r1] := r2
ldr r3, [r1, r2, LSL #2] | r3 := mem4[r1+r2*4]
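The addressing forms in the table can be modelled in C over a toy byte-addressed memory. The memory array and helper names are invented for this sketch. Little-endian byte order is assumed, as on typical ARM Linux systems. The shift in the scaled form counts bits, so LSL #2 multiplies the index by 4.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t mem[64];                   /* toy byte-addressed memory */

/* [rn, #imm] */
uint32_t ea_imm(uint32_t rn, int32_t imm) { return rn + (uint32_t)imm; }

/* [rn, rm] (k = 0) and [rn, rm, LSL #k] */
uint32_t ea_reg(uint32_t rn, uint32_t rm, unsigned k) { return rn + (rm << k); }

uint32_t ldr(uint32_t addr)               /* load a word, little-endian */
{
    assert(addr % 4 == 0 && addr + 4 <= sizeof mem);
    return (uint32_t)mem[addr] | (uint32_t)mem[addr + 1] << 8
         | (uint32_t)mem[addr + 2] << 16 | (uint32_t)mem[addr + 3] << 24;
}

void str(uint32_t addr, uint32_t v)       /* store a word, little-endian */
{
    assert(addr % 4 == 0 && addr + 4 <= sizeof mem);
    for (int i = 0; i < 4; i++)
        mem[addr + i] = (uint8_t)(v >> (8 * i));
}

uint32_t ldrb(uint32_t addr)              /* load one byte, zero-extended */
{
    assert(addr < sizeof mem);
    return mem[addr];
}
```

With r1 = 16 and r2 = 3, `ea_reg(16, 3, 2)` gives 28, the address of element 3 of a word array starting at 16.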
ALU operations
Most ALU operations work on three registers (a destination and two sources), or on a destination register, a source register and an immediate constant.
Binary operations: add, sub, eor, and, orr, lsl, lsr, asr
Unary operations: neg, mvn
mul: requires all three operands to be in registers.[4]
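As a sketch of what these operations compute on 32-bit values, here they are in C. The enum and function names are made up for illustration, and shift amounts are limited to 0..31 for simplicity. Arithmetic wraps modulo 2^32. lsr shifts in zeros, while asr shifts in copies of the sign bit.

```c
#include <assert.h>
#include <stdint.h>

enum alu_op { ADD, SUB, EOR, AND, ORR, LSL, LSR, ASR };

/* Binary ALU operations on 32-bit words; arithmetic wraps modulo 2^32. */
uint32_t alu(enum alu_op op, uint32_t a, uint32_t b)
{
    switch (op) {
    case ADD: return a + b;
    case SUB: return a - b;
    case EOR: return a ^ b;
    case AND: return a & b;
    case ORR: return a | b;
    case LSL: assert(b < 32); return a << b;
    case LSR: assert(b < 32); return a >> b;   /* zeros shifted in */
    case ASR: assert(b < 32);                  /* sign bits shifted in */
        return (b > 0 && (a & 0x80000000u)) ? (a >> b) | ~(0xFFFFFFFFu >> b)
                                            : a >> b;
    }
    return 0;
}

uint32_t neg(uint32_t a) { return 0u - a; }              /* two's-complement negation */
uint32_t mvn(uint32_t a) { return ~a; }                  /* bitwise NOT */
uint32_t mul(uint32_t a, uint32_t b) { return a * b; }   /* low 32 bits of the product */
```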
Branches
b: unconditional branch.
cmp followed by beq etc.: the cmp sets the four condition bits (N, Z, C and V) in the status register, and the conditional branches interpret these bits.
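The four bits are N (negative), Z (zero), C (carry) and V (overflow). As a sketch in C with illustrative names, this shows how cmp sets them and how some common condition suffixes read them. The condition definitions are the standard ARM ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct flags { bool n, z, c, v; };

/* cmp a, b: compute a - b, discard it, and set the condition bits. */
struct flags cmp(uint32_t a, uint32_t b)
{
    uint32_t d = a - b;
    struct flags f;
    f.n = d >> 31;                        /* result negative */
    f.z = d == 0;                         /* result zero */
    f.c = a >= b;                         /* no borrow (unsigned a >= b) */
    f.v = ((a ^ b) & (a ^ d)) >> 31;      /* signed overflow */
    return f;
}

enum cond { EQ, NE, LT, GE, GT, LE, LO, HS, HI, LS };

/* Does condition c hold for these flags? */
bool holds(enum cond c, struct flags f)
{
    switch (c) {
    case EQ: return f.z;
    case NE: return !f.z;
    case LT: return f.n != f.v;           /* signed < */
    case GE: return f.n == f.v;           /* signed >= */
    case GT: return !f.z && f.n == f.v;
    case LE: return f.z || f.n != f.v;
    case LO: return !f.c;                 /* unsigned < */
    case HS: return f.c;                  /* unsigned >= */
    case HI: return f.c && !f.z;
    case LS: return !f.c || f.z;
    }
    return false;
}
```

Comparing -1 with 100 satisfies both LT (signed) and HS (unsigned). That difference is exploited by the bounds checks described under Quirks.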
bl and blx for procedure calls, with blx allowing a call to a procedure whose address is in a register. Both save the return address in the lr register.
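Here is a toy C model of the two call instructions. It assumes that pc holds the address of the call instruction itself, with instructions 4 bytes long. It ignores the Thumb interworking that blx performs when bit 0 of the target address is set. The struct and function names are illustrative. The return address left in lr is what the prologue saves at fp+36 and what the epilogue's ldmfd reloads into pc.

```c
#include <assert.h>
#include <stdint.h>

enum { LR = 14, PC = 15 };
struct cpu { uint32_t r[16]; };           /* r[PC] = address of current instruction */

/* bl target: return address (next instruction) into lr, then jump. */
void bl(struct cpu *m, uint32_t target)
{
    m->r[LR] = m->r[PC] + 4;
    m->r[PC] = target;
}

/* blx rm: the same, but the target address comes from register rm. */
void blx(struct cpu *m, int rm)
{
    uint32_t target = m->r[rm];           /* read the target first */
    m->r[LR] = m->r[PC] + 4;
    m->r[PC] = target;
}
```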
Indexed jumps in case statements use ldrlo pc, [pc, ip, LSL #2], combining conditional execution, unsigned comparison, jumping by loading pc, addressing relative to pc (with an 8-byte implicit offset), use of ip, and scaled addressing. Treat it as magic!
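For readers who want to look behind the magic, here is a C model. It assumes a particular layout: ip already holds the case index minus the lowest label, and a jump table of .word entries follows a branch to the default case. That layout is an assumption, chosen so that pc, which reads as the address of the ldrlo plus 8, points exactly at the first table entry. The function name `case_jump` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout, with ip = case index minus lowest label:
 *         cmp   ip, #n
 *   A:    ldrlo pc, [pc, ip, LSL #2]
 *   A+4:  b     default
 *   A+8:  .word L0, L1, ..., L(n-1)
 * mem is a toy word memory indexed by address/4. Returns the next pc. */
uint32_t case_jump(const uint32_t *mem, uint32_t a, uint32_t ip, uint32_t n)
{
    if (ip < n) {                          /* "lo": unsigned comparison */
        uint32_t pc_value = a + 8;         /* pc reads 8 bytes ahead */
        return mem[(pc_value + (ip << 2)) / 4];
    }
    return a + 4;                          /* condition fails: go on to b default */
}
```

Because the comparison is unsigned, a negative index, like one too large, falls through to the default branch.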
Quirks
Conditional execution: not just the branch instructions but most others too can be made conditional. We use moveq etc. to implement comparisons that yield a Boolean value: for example, the sequence
    cmp r2, r3
    mov r4, #0
    moveq r4, #1
first sets r4 to 0, then sets it to 1 if r2 = r3; the net effect is to set r4 to the Boolean value of the comparison.
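The same pattern works for any condition suffix, so a code generator can produce it from a template. `emit_compare` below is a hypothetical helper sketched in C, not part of any real compiler. Plain mov (without the s suffix) leaves the condition bits alone, so it is safe for the destination to be one of the compared registers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: emit code setting dst to 1 if `a cond b` holds, else 0.
   cond is a two-letter condition suffix such as "eq", "ne", "lt" or "ge". */
void emit_compare(char *buf, size_t size, const char *cond,
                  const char *dst, const char *a, const char *b)
{
    assert(strlen(cond) == 2);
    snprintf(buf, size, "cmp %s, %s\nmov %s, #0\nmov%s %s, #1\n",
             a, b, dst, cond, dst);
}
```

`emit_compare(buf, sizeof buf, "eq", "r4", "r2", "r3")` reproduces the three-instruction sequence shown above.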
We also use movhs and blhs to implement bounds checks: the sequence
    cmp r1, #100
    movhs r0, #23
    blhs check
invokes check(23) to print "array bound error on line 23" if r1 < 0 or r1 >= 100. This code sneakily uses unsigned comparisons, exploiting the fact that numbers that are negative in the two's-complement representation are treated as large positive numbers in the unsigned representation. Both the mov and the bl are made conditional on the comparison finding that r1 is Higher or Same compared to 100. The time wasted by the two unexecuted instructions is probably less than would be taken by a conditional branch around them.
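Here is the unsigned trick in C. For a signed index i and a bound n >= 0, the single unsigned test (uint32_t)i >= (uint32_t)n agrees with the two signed tests i < 0 || i >= n. The function names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The bounds check written the obvious way, with two signed tests. */
bool out_of_bounds_two_tests(int32_t i, int32_t n)
{
    return i < 0 || i >= n;
}

/* The same check as one unsigned comparison (the "hs" condition):
   a negative i becomes a value >= 2^31, which is never below n. */
bool out_of_bounds_unsigned(int32_t i, int32_t n)
{
    assert(n >= 0);
    return (uint32_t)i >= (uint32_t)n;
}
```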
The second operand of arithmetic instructions can be shifted with LSL or with other shift operations.
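As a sketch of the shifted second operand, here are the four immediate shift kinds in C. For example, `add r0, r1, r2, LSL #2` computes r1 + (r2 << 2). Names are illustrative, and shift amounts are restricted to 1..31. The real encodings have special cases at 0 and 32, such as ROR #0 meaning RRX.

```c
#include <assert.h>
#include <stdint.h>

enum shift { SH_LSL, SH_LSR, SH_ASR, SH_ROR };

/* The value of a shifted operand "rm, <kind> #n", for 1 <= n <= 31. */
uint32_t shifted(uint32_t x, enum shift kind, unsigned n)
{
    assert(n >= 1 && n <= 31);
    switch (kind) {
    case SH_LSL: return x << n;
    case SH_LSR: return x >> n;
    case SH_ASR: return (x >> n) | ((x & 0x80000000u) ? ~(0xFFFFFFFFu >> n) : 0);
    case SH_ROR: return (x >> n) | (x << (32 - n));
    }
    return 0;
}

/* add rd, rn, rm, <kind> #n  computes  rn + shifted(rm) */
uint32_t add_shifted(uint32_t rn, uint32_t rm, enum shift kind, unsigned n)
{
    return rn + shifted(rm, kind, n);
}
```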
Footnotes
[1] There's no need for a leaf routine (one that calls no others) to save r0–r3 in memory, and there's no need to save any of r4–r10 that are not used in the procedure. The potential savings here are huge, but we ignore them for simplicity.
[2] If there are 1 or 3 parameters, we save 2 or 4 words respectively, in order to maintain stack alignment.
[3] The value in the scratch register ip is the initial value of sp, which will be restored into sp on return. The value in lr is the return address, which will be restored into pc.
[4] Division is done by an out-of-line function to get the correct treatment of negative numbers.