Rough guide to ARM (Compilers)

Copyright © 1993–2025 J. M. Spivey

Registers

The ARM has (for our purposes) 16 registers plus a status word.

r0–r3    Scratch registers
r4       Static link
r5–r10   Temporaries
r11=fp   Frame pointer
r12=ip   Linkage temp
r13=sp   Stack pointer
r14=lr   Link register
r15=pc   Program counter

A procedure receives its first four parameters in registers r0–r3, and need not preserve these registers. It must preserve the values of registers r4–r10.

Stack frame layout

     Other
     incoming
 +56 parameters

 +52 Parameter 3
 +48 Parameter 2
 +44 Parameter 1
 +40 Parameter 0

 +36 Return address
 +32 Saved sp
 +28 Dynamic link

 +24 Saved register r10
     ...
  +4 Saved register r5

fp:  Static link
       
  -4 Local
  -8 variable
 -12 space

  +8 Other
  +4 outgoing
sp:  parameters

Note that the frame head is 40 bytes instead of 16 as in Keiko, and the contents appear in a different layout. The only thing that matters for the code in the procedure body is that the static link is in a different place. Parameters are addressed at positive offsets from fp and local variables at negative offsets from fp. If the procedure calls others that have more than four words of parameters, then extra space is reserved for these outgoing parameters; this space is addressed at positive offsets from sp.
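The offsets in the diagram follow a simple pattern, which the sketch below spells out. The function names are hypothetical, chosen for illustration only; the numbers come from the layout above.

```python
# Byte offsets from fp in the frame layout shown above.
# All names here are illustrative, not part of any real compiler.
WORD = 4

# Contents of the frame head, lowest address first.  The slot for r4
# holds the static link, the slot for fp the dynamic link, and the slot
# for lr the return address.
SAVED_REGS = ["r4", "r5", "r6", "r7", "r8", "r9", "r10", "fp", "sp", "lr"]
FRAME_HEAD = WORD * len(SAVED_REGS)   # 40 bytes


def saved_offset(reg):
    """Offset from fp of the slot that holds a saved register."""
    return WORD * SAVED_REGS.index(reg)


def param_offset(i):
    """Offset from fp of incoming parameter i, counting from 0."""
    return FRAME_HEAD + WORD * i


def local_offset(k):
    """Offset from fp of the k'th word of local space, counting from 1."""
    return -WORD * k
```

For instance, param_offset(2) is 48 and saved_offset("lr") is 36, matching the Parameter 2 and Return address rows of the diagram.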

The first four words of incoming arguments arrive in registers r0–r3, and we save them into the stack frame as part of the procedure prologue. We also save the values of the other registers so that they can be restored when the procedure returns; it is therefore safe for our caller to keep values in these registers across the procedure call.[1]

As an extension to the ARM calling convention, we pass the static link to a procedure in r4, and that gets saved in the frame of the procedure we call. If the static link is the constant 0, however, we know the procedure won't refer to it, so don't bother to pass it.

Each procedure starts with code like this, using special instructions to save up to four incoming parameters and the values of registers r4 to r10 in the stack frame:

_P:
	mov ip, sp
	stmfd sp!, {r0-r1}
	stmfd sp!, {r4-r10, fp, ip, lr}
	mov fp, sp
	sub sp, sp, #8

The stmfd instructions store multiple registers (selected by a bitmap in the instruction) and adjust the value of sp. The first one saves two words of incoming parameters, and the second saves the other registers that must be preserved across the procedure call.[2][3]

A procedure ends with a single instruction that resets the registers to their previous values; by resetting sp this destroys the stack frame, and by resetting pc it returns to the caller.

	ldmfd fp, {r4-r10, fp, sp, pc}
	.ltorg

The .ltorg directive identifies a place in the code where the assembler can place a table of large constants for use in ldr = instructions.
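To see how the entry and exit sequences cooperate, here is a rough Python model of them, not an exact account of the hardware. The dictionaries regs and mem and all the function names are assumptions made for illustration; the instruction sequences are the ones shown above, for a procedure that saves two parameter words and reserves 8 bytes of local space.

```python
CALLEE_SAVED = ["r4", "r5", "r6", "r7", "r8", "r9", "r10"]


def stmfd(regs, mem, base, names):
    """Model 'stmfd base!, {names}': push onto a full descending stack."""
    addr = regs[base] - 4 * len(names)
    regs[base] = addr              # the '!' writes the new address back
    for name in names:             # lowest-numbered register, lowest address
        mem[addr] = regs[name]
        addr += 4


def ldmfd(regs, mem, base, names):
    """Model 'ldmfd base, {names}' with no writeback to the base."""
    addr = regs[base]
    loaded = {}
    for name in names:
        loaded[name] = mem[addr]
        addr += 4
    regs.update(loaded)            # fp, sp and pc all change together


def prologue(regs, mem):
    regs["ip"] = regs["sp"]                                # mov ip, sp
    stmfd(regs, mem, "sp", ["r0", "r1"])                   # stmfd sp!, {r0-r1}
    stmfd(regs, mem, "sp", CALLEE_SAVED + ["fp", "ip", "lr"])
    regs["fp"] = regs["sp"]                                # mov fp, sp
    regs["sp"] -= 8                                        # sub sp, sp, #8


def epilogue(regs, mem):
    ldmfd(regs, mem, "fp", CALLEE_SAVED + ["fp", "sp", "pc"])


def demo():
    """Run prologue, clobber r4-r10, run epilogue; register values are made up."""
    regs = {"r%d" % i: 100 + i for i in range(11)}
    regs.update(fp=0x8000, ip=0, sp=0x7000, lr=0x1234, pc=0)
    before = dict(regs)
    mem = {}
    prologue(regs, mem)
    frame = regs["fp"]
    for name in CALLEE_SAVED:      # the procedure body may overwrite these
        regs[name] = -1
    epilogue(regs, mem)
    return before, frame, mem, regs
```

In this model the word saved from ip lands in the slot labelled "Saved sp", so the final ldmfd restores sp to its value on entry and thereby discards the saved parameters as well as the rest of the frame.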

Instructions

Moves

We can put a small constant in a register, or move from one register to another with the mov instruction:

mov r1, #1    r1 := 1
mov r1, r2    r1 := r2

For larger constants, and symbolic addresses that are assumed to be 32 bits, there's a form of ldr instruction:

ldr r1, =1000000    r1 := 1000000
ldr r1, =_x         r1 := _x

(The second of these sets r1 to the address _x, not the contents of that address.) The mnemonic ldr is a reminder that this instruction loads the value from a place in memory that is allocated by the assembler. The actual instruction that is assembled uses pc-relative addressing; at the end of each procedure, we use the .ltorg directive to identify a place where the assembler can safely dump the table of constants that it has assembled.
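What counts as a "small" constant is not spelled out here. In the classic 32-bit ARM encoding, an immediate operand is an 8-bit value rotated right by an even number of bits, and the sketch below tests for that. The function name is hypothetical, and whether the course compiler applies exactly this test is an assumption.

```python
MASK32 = 0xFFFFFFFF


def fits_immediate(value):
    """True if value is an 8-bit quantity rotated right by an even amount.

    This is the classic ARM data-processing immediate rule; using it to
    choose between mov and the ldr = form is an assumption, not something
    the text states.
    """
    value &= MASK32
    for rot in range(0, 32, 2):
        # Rotating left by rot undoes a rotate right by rot.
        undone = ((value << rot) | (value >> (32 - rot))) & MASK32
        if undone < 256:
            return True
    return False
```

Under this rule 1, 256 and 0xFF000000 can be used directly with mov, whereas 257 and 1000000 cannot and would need the ldr = form.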

Loads and stores

Loads and stores are performed using the ldr and str instructions that transfer a 4-byte word from or to memory (and the variants ldrb and strb that transfer a single (unsigned) byte). The available addressing modes form the address by adding a register and a constant or two registers, optionally shifting one of the registers left by a constant number of bits.

ldr r0, [fp, #48]           r0 := mem4[fp+48]
str r2, [r0, r1]            mem4[r0+r1] := r2
ldr r3, [r1, r2, LSL #2]    r3 := mem4[r1+r2*4]
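The three addressing modes above compute their addresses as in the following sketch. The function and its parameter names are illustrative only.

```python
def effective_address(base, offset=0, index=0, shift=0):
    """Address formed from a base register plus either a constant offset
    or an index register shifted left by a constant number of bits."""
    return (base + offset + (index << shift)) & 0xFFFFFFFF


# ldr r0, [fp, #48] with fp = 0x1000
print(hex(effective_address(0x1000, offset=48)))
# ldr r3, [r1, r2, LSL #2] with r1 = 0x2000, r2 = 3: element 3 of an
# array of 4-byte words that starts at 0x2000
print(hex(effective_address(0x2000, index=3, shift=2)))
```

A shift of 2 bits multiplies the index by 4, which is why the scaled mode suits arrays of words.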

ALU operations

Most ALU operations take three registers, or a destination register, a source register and an immediate constant.

add sub eor and orr lsl lsr asr

Unary operations: neg, mvn

mul – requires all 3 operands in registers.[4]

Branches

b – unconditional branch.

cmp followed by beq etc.; the cmp sets four condition bits in the status register, and the conditional branches interpret these bits.

bl and blx for procedure calls, blx allowing call of procedure with address in a register. Both save return address in lr register.

Indexed jumps in case statements use ldrlo pc, [pc, ip, LSL #2], combining conditional execution, unsigned comparison, jumping by loading pc, addressing relative to pc (with 8-byte implicit offset), use of ip, scaled addressing. Treat it as magic!
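For readers who would rather not treat it as magic, here is a rough model of what the instruction does. It assumes that a cmp of ip against the number of cases comes just before it, that the word after it is a branch to the default case, and that the jump table starts immediately after that; this layout and all the names in the sketch are assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF


def case_jump(instr_addr, index, ncases, mem):
    """Next pc after 'cmp ip, #ncases; ldrlo pc, [pc, ip, LSL #2]'.

    instr_addr is the address of the ldrlo instruction, index the value
    in ip, and mem a dictionary from addresses to words.
    """
    index &= MASK32                      # the comparison is unsigned
    if index < ncases:                   # condition 'lo' holds
        pc_value = instr_addr + 8        # pc reads as this instruction + 8
        return mem[pc_value + (index << 2)]
    return instr_addr + 4                # not executed: fall through


# ldrlo at 0x100, branch to the default case at 0x104, table at 0x108.
table = {0x108: 0x200, 0x10C: 0x240, 0x110: 0x280}
print(hex(case_jump(0x100, 1, 3, table)))    # in range: taken from the table
print(hex(case_jump(0x100, 7, 3, table)))    # out of range: falls through
```

Because the comparison is unsigned, a negative index is treated as a very large one and also falls through to the default case.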

Quirks

Conditional execution: it's not just the branch instructions but most others too that can be made conditional. We use moveq etc. to implement comparisons that yield a Boolean value: for example, the sequence

cmp r2, r3
mov r4, #0
moveq r4, #1

first sets r4 to 0, then sets it to 1 if r2 = r3; the net effect is to set r4 to the boolean value of the comparison.

We also use movhs and blhs to implement bounds checks: the sequence

cmp r1, #100
movhs r0, #23
blhs check

invokes check(23) to print "array bound error on line 23" if r1 < 0 or r1 >= 100. This code sneakily uses unsigned comparisons, exploiting the fact that numbers that are negative in the two's-complement representation are treated as large positive numbers in the unsigned representation. Both the mov and the bl are made conditional on the comparison finding that r1 is Higher or Same compared to 100. The time wasted by the two unexecuted instructions is probably less than would be taken by a conditional branch around them.
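The unsigned-comparison trick can be checked with a few lines of Python that reinterpret a 32-bit two's-complement value as unsigned. The function names are made up for this sketch.

```python
def as_unsigned(x):
    """Reinterpret a 32-bit two's-complement integer as unsigned."""
    return x & 0xFFFFFFFF


def out_of_bounds(index, bound):
    """The 'hs' (unsigned higher or same) test made by cmp index, #bound."""
    return as_unsigned(index) >= bound


for i in (-1, 0, 99, 100):
    print(i, out_of_bounds(i, 100))
```

A single unsigned test thus catches both index < 0 and index >= bound, because as_unsigned(-1) is 0xFFFFFFFF, far above any sensible bound.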

Second operand of arithmetic instructions can be shifted with LSL or with other shift operations.

Footnotes

  1. There's no need for a leaf routine (one that calls no others) to save r0–r3 in memory; and there's no need to save any of r4–r10 that are not used in the procedure. The potential savings here are huge, but we ignore them for simplicity.
  2. If there are 1 or 3 parameters, then we save 2 or 4 registers in order to maintain stack alignment.
  3. Value in scratch ip register is the initial value of sp, which will be restored into sp on return. The value in lr is the return address, which will be restored into pc.
  4. Division done by out-of-line function to get correct treatment of negative numbers