Note: The mercurial server is disabled at the moment while I investigate whether it can run with an acceptably low CPU load – Mike.
Thunder tutorial
Thunder[1] is based on a register machine with a RISC-like instruction set where arithmetic instructions specify two source registers and a destination, or a source register, an immediate operand and a destination. Loads and stores are separate from arithmetic, and support a few simple addressing modes.
Note: the code contained in this tutorial will work on x86, ARM and amd64. For amd64, there are complications due to the fact that the virtual machine can access only the bottom 2GB of the address space, so it's best to stick with x86 for now.
Another page has a comprehensive list of Thunder instructions.
Touring fact.c
The code shown here is part of the example program thunder/fact.c
included with Thunder. Build instructions are given below.
First, we need to get a piece of C gobbledegook out of the way:
typedef int (*funcp)(int);
This defines funcp
to be the type of pointers to functions that take an int
parameter and return an int
result.[2]
Then we can define compile
to be a function with no parameters that returns this type: it does so by using Thunder to create a factorial function and return it.
funcp compile(void) { void *entry;
We'll use void *
for the type of addresses in the generated code. In this case entry
is the entry point for the factorial function.
vmlabel lab1 = vm_newlab(), lab2 = vm_newlab(); vmreg r0 = vm_ireg[0], r1 = vm_ireg[1];
Now we create two label objects for use in the code. We will define these labels at appropriate points below, and can use them as targets for branch instructions either before or after defining them. Everything is fixed up behind the scenes so that the branches lead to the right place.
We also name two registers for use in the code. There are two classes of registers in Thunder: integer and floating point, and the integer registers are divided into the callee-save registers vm_ireg[0..vm_nvreg)
and the caller-save registers vm_ireg[vm_nvreg..vm_nireg)
. Thus, vm_nvreg
is the number of callee-save registers, guaranteed to be preserved across each function call, and vm_nireg
is the total number of integer registers, including any caller-save registers that are not preserved across function calls. The floating point registers vm_freg[0..vm_nfreg)
are all caller-save.
This scheme is convenient, because it's safe to use callee-save registers as if they were caller-save, but not vice versa. Thunder guarantees that all callee-save registers that are used in a function body are saved on entry to the function and restored on exit, either by saving them all regardless, or by monitoring register use in the function body and saving registers as needed.
All existing implementations of Thunder provide at least 2 callee-save registers, at least 7 integer registers altogether, and at least 6 floating-point registers. Smart applications (like the OBC JIT) have their own scheme for managing registers, and can use additional registers if they exist. Simpler applications may use a fixed number of registers for fixed purposes, and can safely ignore any registers they don't need. It's wise to include an assertion like assert(vm_nvreg >= 2 && vm_nireg >= 4)
in the main program to document the assumptions made by a program, and protect against surprises if future implementations provide fewer registers than expected.
The program continues.
entry = vm_begin("fact", 1); vm_gen(GETARG, r0, 0); vm_gen(MOV, r1, 1);
Here we begin compiling a Thunder procedure: its name is fact
, and it has one (integer) argument. Thunder supports up to three incoming arguments, all of integer or pointer type, and three outgoing arguments for function calls that appear in the body of a function. We save the entry address of the function as the value of entry
. The name fact
doesn't greatly matter, except that there is a debugging feature that lets you save the code for the procedure in a file named fact.dump
.
The second line is the first instruction in the body of the procedure, generated by calling the interface function vm_gen
that takes a variable number of arguments. The instruction GETARG r0, 0
fetches the first argument (numbered 0) and puts in register r0
. This argument could, according to the host architecture, be passed on the stack or in a register – we don't know – but the GETARG
instruction is implemented in the appropriate way. Incoming arguments are guaranteed to remain available until the first instruction in the function body that is not a GETARG
instruction, but not after that. Most applications fetch the arguments immediately on entry and save them in registers or memory. The third line adds another instruction, setting register r1
to 1.[3]
Now begins the main loop of the factorial function, which repeatedly decrements r0
towards zero, and keeps a running product in r1
. What you see here is a sequence of instructions, each adding one line to the assembly-language program we are putting together.
vm_label(lab1); // Label lab1 vm_gen(BEQ, r0, 0, lab2); // Branch to lab2 if r0 = 0 vm_gen(MUL, r1, r1, r0); // Set r1 = r1 * r0 vm_gen(SUB, r0, r0, 1); // Set r0 = r0 = 1 vm_gen(JUMP, lab1); // Jump back to lab1 vm_label(lab2); // Label lab2
The first and last lines define the two labels we created earlier; these labels are used in the BEQ
and JUMP
instructions that make the loop. I've added comments above to describe the effect of each instruction.
Notice that we don't concern ourselves here with the nature of the host machine: it may or may not have an instruction that compares a register with a constant and branches if they are equal, and if it doesn't, then it's the job of the Thunder implementation on that machine to make the same effect using the instructions that do exist, perhaps a compare instruction followed by a branch on condition codes.
After the loop, we need to return the value in register r1
. We do this by moving it to a special register vm_ret
, and making this the last action in the function.
vm_gen(MOV, vm_ret, r1);
The register vm_ret
may or may not be the same as one of the other caller-save registers provided for our use, so moving a value to vm_ret
may destroy the value of a random register. For this reason, the move to vm_ret
should come just before the function returns.
The compile
function ends as follows:
vm_end(); return entry; }
First, the call to vm_end()
completes and seals the function we have just put together. There may be interaction with the operating system to make sure the memory it occupies is executable, and to flush the memory caches so that the code appears ready for execution. The funtion compile
returns the entry point of the newly compiled function. In the main program, we can write:
int main(int argc, char *argv[]) { funcp fact = compile(); printf("The factorial of 10 is %d\n", fact(10)); return 0; }
As you can see, the object that is returned by compile
can be called just as if it were a function written in C.
A recursive function
The program fact.c
contains a second definition of the factorial function, this time using recursion. This illustrates the fact that recursion is possible for generated functions, but at a more mundane level also shows the code needed for a function call. The function compile2
begins as follows.
funcp compile2(void) { void *entry; vmlabel lab1 = vm_newlab(), lab2 = vm_newlab(); vmreg r0 = vm_ireg[0], r1 = vm_ireg[1]; assert(vm_nvreg >= 2); entry = vm_begin("fact2", 1); vm_gen(GETARG, r0, 0);
As before, we declare a couple of labels and a couple of registers; this time we assume the registers to be callee-save, hence the assertion. Now we find the body of the factorial function. First we deal with the case n = 0
by returning the constant 1.
vm_gen(BNE, r0, 0, lab1); vm_gen(MOV, vm_ret, 1); vm_gen(JUMP, lab2);
If the base case does not apply, then we must make a recursive call. To implement a function call, we first compute into registers any arguments that are not constants; in this case, we comptute n-1
in r1
.
vm_label(lab1); vm_gen(SUB, r1, r0, 1);
Now we are ready for the call. First comes a PREP
instruction, specifying the number of outgoing arguments; then a sequence of ARG
instructions specifying the arguments from right to left. In this case, we write vm_gen(ARG, r1)
to pass a value from a register, but there is also a form of the ARG instruction that passes an integer constant. Finally, there is a CALL
instruction to specify the procedure to call; this time, we are calling a fixed function, namely the one we are in the process of defining, but there is also an instruction to call a function whose address is in a register.
vm_gen(PREP, 1); vm_gen(ARG, r1); vm_gen(CALL, entry);
A number of safety precautions are needed in order to be sure that the function call will work on all architectures. First, only ARG
instructions appear between the PREP
and the CALL
; that means computing any non-constant arguments in advance. Second, we must be finished with GETARG
instructions for the incoming arguments of a function before the PREP
, because some architectures put incoming and outgoing arguments in the same registers.
The result of the call appears in the special register vm_ret
, which may be the same as one of the other caller-save registers. We should save it elsewhere before using any of these registers again, and before making another call. In our case, we can safely read the result of the recursive call and write our own result in the same multiply instruction, then immediately return.
vm_gen(MUL, ret, r0, ret);
The compiling function ends in a similar way as before.
vm_label(lab2); vm_end(); return (funcp) entry; }
Both these examples have compiling functions that are straight-line sequences of procedure calls, so they always compile the same sequence of instructions. But the possibility is immediately present of using more elaborate compiling functions, including ones that translate a sequence of instructions in some other code, such as Keiko, or compiling functions that are defined by recursion over an abstract syntax tree.
(This example doesn't access data in memory, so doesn't use the load and store instructions that Thunder provides. There's another example in the same file that uses these things, and also floating point.)
Other features
- Local storage by using
vm_begin_locals(name, nargs, locsize)
, then addressing the local storage at positive offsets from the special regiservm_base
. This is not used by the Keiko JIT, so may not be implemented on some targets.
Operand sizes
Machines commonly restrict the size of values that may appear as immediate fields of instructions. This isn't a problem on x86, where 32-bit immediate values are supported, but it is a problem on the ARM, where immediates larger than 256 are not always allowed, according to a complicated scheme that makes 256 and 260 representable but 257 not. Thunder imposes no such restrictions, and you can freely write
vm_gen(ADD, r0, r0, 257);
If the architecture requires it, then Thunder will insert code that loads the constant into a (hidden) register before the ADD
operation. It's also possible to load from arbitrary addresses, not just from ones that will fit in the offset field of an instruction. So you can write
vm_gen(LDW, r0, (int) &var);
if var
is any (statically-allocated) variable. On x86, this generates a mov
instruction containing a 32-bit address; on ARM, we get an instruction that loads the address into a spare register then an indirect load through that register; luckily, there is an ARM convention that register IP
is reserved for this kind of manoeuvre. It works out either way, without us having to worry about it at the Thunder level. On amd64, things are a bit more complicated, and we have to make sure that var
has an address that fits in 32 bits.
Debugging
By setting appropriate debugging flags, it's possible to see the Thunder instructions together with the machine instructions for the host that they generate. Here is a trace of the iterative factorial function being assembled on the x86. Thunder instructions have upper-case mnemonics, and the corresponding x86 instructions are in lower case and indented a bit more. Flouting both the Intel and the AT&T conventions, these x86 instructions are shown in a style where the destination is on the left, memory operands are written offset(reg)
and immediate operands are written #imm
. This is debugging output, not input, so you will never have to write in this form.
You will see that the routine begins with four registers being pushed on the stack so as to preserve the values they had in the caller; they are popped again as part of the translation of the RET instruction at the end. Registers V0
and V1
correspond to BX
and SI
on the x86, notated as rBX
and rSI
here to avoid the confusion about EBX
vs BX
or BL
in Intel assembler.
The GETARG
operation fetches the argument from the stack by addressing relative to rSP
and puts it in a register. Each of the other instructions has a straightforward translation into x86 machine code, most as a single instruction. In the case of the MUL
instruction, this is possible because one of the input registers is also used for the output, but the Thunder translator would have inserted a move instruction if needed. The conditional branch BEQ
is translated into a test
instruction followed by a branch on condition codes, and the SUB
instruction is able to use the special form dec rBX
that assembles to a single byte.
--- push rBP [55] --- push rBX [53] --- push rSI [56] --- push rDI [57] --- GETARG V0, 0 --- mov rBX, 20(rSP) [8b 5c 24 14] --- MOV V1, 1 --- mov rSI, #1 [be 01 00 00 00] --- L1: --- BEQ V0, 0, L2 --- test rBX, rBX [85 db] --- je L2 [0f 84 00 00 00 00] --- MUL V1, V1, V0 --- imul rSI, rBX [0f af f3] --- SUB V0, V0, 1 --- dec rBX [4b] --- JUMP L1 --- jmp L1 [e9 ef ff ff ff] --- L2: --- MOV I0, V1 --- mov rAX, rSI [8b c6] --- pop rDI [5f] --- pop rSI [5e] --- pop rBX [5b] --- pop rBP [5d] --- ret [c3]
Looking more closely, you will see the numeric machine code for each instruction shown in square brackets. This information is mainly of interest to those who write the translator, piecing together the output bit by painful bit, as in the end the binary output is all that matters. There are a couple of points worth noting, however.
In the je
instruction, there is a two-byte opcode 0f 84
, followed by a four-byte offset that is shown as zero, because at the time the instruction is assembled, the location of label L2
is still unknown. The proper offset will be patched in later when L2
does become known. In contrast, as a backward jump, the jmp
instruction shows the offset to reach L1
as -17, or ef ff ff ff
in little-endian order, and that's correct because there are precisely 17 bytes of object code between L1
and L2
.
In difficult cases, where the irregularity of the x86 architecture gets in the way, it is sometimes necessary for the object code to do a rain dance, where some registers are pushed on the stack before the operation, used for their magical properties, then popped again afterwards. This particularly affects (a) shift instructions with a variable shift amount, where the x86 insists on having the shift amount in the rCX
register (and calling it CL
for the purpose); and (b) single-byte stores, where the source register must be one of rAX
, rCX
, rDX
or rBX
(aka AL
, CL
, DL
, BL
) and not rSI
, rDI
or rBP
. It's all taken care of in the translation, and it's best not to worry about it. The story changes in amd64 mode, where the gods of AMD or Intel have revealed special spells to make rain without the need for a dance.
Build instructions
To build Thunder and the example program for the host:
$ git clone https://github.com/Spivoxity/thunder.git $ cd thunder $ autoreconf $ ./configure --disable-m64x32 --enable-debug $ make fact $ ./fact
The --disable-m64x32
flag causes the program to be built as a 32-bit application, avoiding all the complexities of mixing 32-bit and 64-bit addressing. It will uses gcc -m32
, and requires approriate development tools and libaries to be installed.
To build for the ARM and run on an emulator:
$ sudo apt-get install gcc-arm-linux-gnueabihf qemu-user $ cd thunder/armtest $ make fact $ ./run ./fact
- ↑ Why is it called Thunder? Because it's intended as a successor to GNU Lightning! https://www.youtube.com/watch?v=bFjL3DZ9V0I
- ↑ You can avoid the typedef by writing
int (*compile(void))(int) { ... }
andreturn (int (*)(int)) entry
instead of what follows. But only Ken Thompson will ever go out with you if you do, and he's dead now. - ↑ The function
vm_gen
is actually a portmanteau for a family of functions with names likevm_gen2ri
, meaning "generate an instruction with 2 operands, one a Register and the other an Integer", so that the callvm_get(MOV, r1, 1)
actually meansvm_gen2ri(MOV, r1, 1)
. A fancy overloading mechanism identifies which member of the family is needed for each call, making the code more readable and less fiddly to type. I mention this fact only because, like all overloading mechanisms I've seen, it reacts to even trivial mistakes by spitting out incomprehensible error messages. The mechanism in this case involves varargs macros and the_Generic
feature introduced in C11.