Note: The mercurial server is disabled at the moment while I investigate whether it can run with an acceptably low CPU load – Mike.
Implementing a recursive subroutine (Digital Systems)
In C, pointer-linked binary trees can be represented using this datatype:
typedef struct node *tree; struct node { int data; tree left, right; };
A variable t
declared with
tree t;
stores a pointer to a node; the data
field of that node can be written t->data
, an abbreviation for (*t).data
, in which the *
denotes the passage from a pointer to the structure it accesses, and the .data
denotes selecting the data
field of that record. (In Java, Scala and similar languages – and in Hakell too – the operation written *
here is implicit.) Each node in the tree will be a structure with three words, one for the data
and one each for the two child pointers, with data
at offset 0 in the sructure, and the two pointers left
and right
at offsets 4 and 8 respectively. If a tree is built using memory that is dynamically allocated using the library function malloc
, it's likely each node will be bigger than this; it may be prefixed by a header that says how large the structure is, so that it can be released by free
, and it may be padded to a larger size in order to reduce the number of different sizes the malloc
has to deal with. None of that need concern us, because a pointer to the node is the address of its first field, and that field will be data
.
Given that type definition, we can write the following recursive function that adds up the integer labels in a tree.
int sum(tree t) { if (t == NULL) return 0; else return t->data + sum(t->left) + sum(t->right); }
That's the same function we would write in Haskell with someting like
sum :: Tree -> Int sum (Node x lt rt) = x + sum lt + sum rt sum Null = 0
How can we write the function in assembly language?
The heart of the subroutine will be a sequence containing two recursive calls to the same routine, with their results added together and added also to the value of t->data
. The key insight is that whatever values must survive the recursive calls sum(t->left)
and sum(t->right)
must not live in the registers r0
to r3
across those calls. In this subroutine, we can idenfify two such values: the parameter t
, a pointer that arrives in r0
, but can't stay there when the parameters of the recursive calls are in their turn computed into the same register; and the growing total where we add together the data
field of t
and the results of the recursive calls. Those results also appear in r0
after the calls, but must be moved from there quickly.
So, in spite of the fact that the whole routine uses (as will emerge) use no more than three registers, we must make two of them be in the range r4
to r7
. We'll use r4
to hold on to the incoming parameter t
, and r5
to compute the total that will become the result. We should adopt the following skeleton for our subroutine.
sum: push {r4, r5, lr} @ Save registers <body of sum> pop {r4, r5, pc} @ Restore and return
ARM coding conventions demand that we save registers on the stack as the very first action of the subroutine, and restore them right at the end. We need to preserve the incoming values of r4
and r5
, which in the case of recursive calls will be working data for the outer instance of the function. We also need to preserve the return address from the lr
register, because lr
will be trashed by subsequent subroutine calls. That value will be restored into pc
at the moment the suroutine returns.
For the body of the subroutine, we start with an if-then-else structure that tests whether the argument is zero. representing the NULL
pointer.
<body of sum>= cmp r0, #0 @ Test if t == NULL bne sum_calc @ If not, go to calculate movs r0, #0 @ Empty tree has sum 0 b sum_exit @ Finished sum_calc: <calculate sum into r0> sum_exit:
We're left with the hard part, calculating the sum of a non-trivial tree. After saving the pointer t
in r4
and initialising r5
with the data
field of the structure, we write two recursive calls. For each one, we fetch a pointer to the relevant subtree into r0
, invoke the sum
subroutine recursively, and add the result of the recursive call (which is in r0
when it returns) to the value in r5
. We can sneakily put the result of the final addition back into r0
ready to return it ourselves.
The addressing mode written [r4, #4]
is especially useful here, because it allows us to add together the address of a node (in r4
) and the offset 4 between the start of the node and its left
field, as part of the instruction that fetches the value of that field.
<calculate sum into r0>= movs r4, r0 @ Save t in r4 ldr r5, [r4, #0] @ Get data field in r5 ldr r0, [r4, #4] @ Get left subtree in r0 bl sum @ Recusively compute its sum adds r5, r5, r0 @ Add to total ldr r0, [r4, #8] @ Get right subtree in r0 bl sum @ Second recursive call adds r0, r5, r0 @ Put grand total in t0
There's no particular advantage in fetching the data
field at the start, rather than adding it at the end, but no special disadvantage either, given that r5
is ours to use throughout the subroutine.
The code could be optimised a bit, at the expense of moving away from a style that is understood by the debugger, by moving in the push
and pop
instructions so that they surround only the part of the code where r4
and r5
are used, namely the fragment <calculate sum into r0>
. This saves the time needed to save and restore these registers when our task is merely to answer that the sum of the empty tree is 0. But what is not needed (and not an improvement) is to surround each recursive call with its own push/pop
pair. Saving registers at the top and restoring them at the bottom makes the registers available to us throughout the subroutine; if we call other subroutines (including making recursive calls) then it's the responsibility of the subroutine we call to preserve the values we are keeping in those registers. There is a possible style where registers are saved on the stack around each call, but that's not the ARM style, and it's not generally as efficient. Details: if we move the push/pop
pair inwards, then no registers have been saved before the then
part of the routine, so it should end with bx lr
rather than a pop instruction.
I have to say for completeness that the sequence
cmp r0, #0 @ Test if t == NULL bne sum_calc @ If not, go to calculate movs r0, #0 @ Empty tree has sum 0
can also be optimised a bit: if we reach the third instruction, then we know r0
contains 0 already, so we don't have to set it to 0, and that instruction can be omitted. That feels wrong, because the first 0 is a null pointer, and the second 0 is zero as a number. But the computer doesn't know the difference!
It's also the case that a C compiler is free to reorder the members of a structure and insert padding between them, so that there's no guarantee that the layout we've assumed in our assembly language program will actually match the one chosen by the C compiler. That might matter if we want to write a tree-processing program partly in C and partly in assembly language. However, although the compiler has the freedom to pad and reorder structures, actual compilers rarely do so. The potential gains in efficiency are small, and reordering structures gets in the way of using them to interface with hardware, with operating systems that are compiled differently, and (as in our case) subroutines hand-written in assembly language.