Keiko assembly language (Compilers)
Syntax
This section gives the syntax of Keiko assembly language programs in the form that is accepted by the bytecode assembler/linker oblink. The style of syntax description is similar to that used in the Kernighan & Ritchie book on C: a syntactic category is followed by a sequence of alternatives, each on a separate line. A subscript opt indicates that a construct is optional.
Lexical conventions
- Each element appears on its own line (nl is used below to denote a line boundary).
- Blank lines and lines beginning with # are ignored.
- Identifiers can be any sequence of non-blank characters, including e.g. Files.Read. It's wise to avoid identifiers that begin with a digit or minus sign, as in some contexts these may be interpreted as numeric constants.
The operands of instructions are described as instances of the class constant. In most contexts, constants may be specified in decimal or hexadecimal (as in 0x1234abcd), or may be symbols defined elsewhere in the program.
constant:
    decimal-constant
    hexadecimal-constant
    ident
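The three alternatives for a constant can be told apart by inspecting the token. The following is a minimal sketch, not the logic used by oblink: the helper name classify_constant and the regular expressions are assumptions, including the assumption that a decimal constant may carry a leading minus sign.

```python
import re

# Hypothetical patterns: illustrative assumptions, not taken from the oblink sources.
DECIMAL = re.compile(r"-?[0-9]+\Z")
HEXADECIMAL = re.compile(r"0x[0-9a-fA-F]+\Z")

def classify_constant(token):
    """Classify one operand token as 'hexadecimal', 'decimal' or 'ident'."""
    if HEXADECIMAL.match(token):
        return "hexadecimal"
    if DECIMAL.match(token):
        return "decimal"
    # Anything else is treated as a symbol defined elsewhere in the program.
    return "ident"
```

Under these assumptions an identifier such as Files.Read is classified as a symbol, whereas a name made up only of digits would be taken for a number, which is why such names are best avoided.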
Files
A Keiko file contains a heading that gives the name of the module and lists (in IMPORT directives) other modules that it depends upon. A compiler that outputs Keiko code can generate a checksum for the public interface of a module and embed this checksum in each other module that uses it, and the assembler/linker will then check across all modules in a program that the checksums are consistent. Unused checksums can be replaced by 0. The module header also contains a count of source lines in the module that is used to allocate counters for line-count profiling; this too can be replaced by 0 if profiling is not going to be used on the program.
file:
    heading body_opt
heading:
    module-directive imports_opt endhdr-directive
module-directive:
    MODULE ident checksum linecnt nl
imports:
    import-directive
    import-directive imports
import-directive:
    IMPORT ident checksum nl
endhdr-directive:
    ENDHDR nl
The body of a module consists of multi-line procedures interspersed with other single-line directives that (among other things) allocate global storage.
body:
    phrase
    phrase body
phrase:
    directive
    procedure
Directives
Directives appear between the procedures of a program.
directive:
    DEFINE ident nl
    WORD constant nl
    LONG constant nl
    FLOAT float nl
    DOUBLE float nl
    STRING hex-string nl
    GLOVAR ident integer nl
    PRIMDEF ident ident type-string nl
- A DEFINE directive defines a symbol at the current location in the data segment. That location is the address of any following data item created with another directive such as WORD, FLOAT or STRING.
- The WORD, LONG, FLOAT and DOUBLE directives each contribute a numeric constant to the data segment, allowing global data tables to be initialised; the table can be accessed through a label defined by a preceding DEFINE directive.
- A STRING directive contributes a sequence of characters, specified by a hexadecimal string, to the data segment. If a terminating null character is needed, then this should be included in the hex string. The length of the string is padded to a multiple of 4 bytes. When convenient, it is possible to build up a string in several parts by giving multiple successive STRING directives, provided the length of all but the last directive is a multiple of 4 to prevent padding.
- A GLOVAR directive allocates space of a specified size in the bss segment, and defines a symbol with its address. The size is rounded up to a multiple of 4, so that the current location in the bss segment is always aligned.
- A PRIMDEF directive declares a named primitive whose definition is a C subroutine. A directive such as PRIMDEF Math.sqrt sqrtf FF declares a primitive that will be named Math.sqrt in the Keiko program, and interfaces to the standard C library function sqrtf, which the type string FF describes as taking a single float argument and yielding a float result. Some implementations of Keiko are able to link dynamically to libraries containing C functions, and others require an interpreter containing the primitives to be compiled specially.
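The padding rules for STRING and GLOVAR can be sketched as follows. This is an illustrative helper for a code generator, not part of any Keiko tool: the function names, the ASCII encoding, the chunk size of 32 bytes and the use of upper-case hex digits are all assumptions.

```python
def string_directives(text, chunk=32):
    """Return STRING directive lines for text followed by a terminating null.

    chunk must be a multiple of 4 so that every directive except the last
    contributes a multiple of 4 bytes and so receives no padding.
    """
    if chunk <= 0 or chunk % 4 != 0:
        raise ValueError("chunk must be a positive multiple of 4")
    data = text.encode("ascii") + b"\x00"   # the null must be given explicitly
    return ["STRING " + data[i:i + chunk].hex().upper()
            for i in range(0, len(data), chunk)]

def padded_length(nbytes):
    """Round a size in bytes up to the next multiple of 4."""
    return (nbytes + 3) & ~3
```

For example, string_directives("Hello", chunk=4) splits the six bytes of the null-terminated string into a 4-byte directive and a 2-byte directive, and only the second is padded by the assembler.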
Procedures
Each procedure has a heading that gives its name and some other information. This is followed by a sequence of mingled Keiko machine instructions and pseudo-operations. The pseudo-operations typically assemble into an entry in the procedure's constant pool, together with an instruction that loads the constant onto the stack.
procedure:
    proc-directive body_opt end-directive
proc-directive:
    PROC ident integer integer constant nl
body:
    element
    element body
end-directive:
    END nl
element:
    pseudo-operation
    instruction
- A PROC directive begins a procedure. The three arguments are:
  - The size of the procedure's local variable space in bytes; this should be a multiple of 4.
  - The maximum number of values pushed on the evaluation stack during the procedure, counting most types as one value, but long integers and doubles as two values. This argument is not currently used by implementations of the Keiko machine, and can be replaced by zero; the only possible disadvantage in future Keiko implementations is that stack overflow might not be detected promptly. At present, the stack overflow check leaves a generous margin of space for each procedure to use.
  - A garbage collector map for the stack frame. If the Keiko machine is built without the optional garbage collector, or if the stack frame of the procedure contains no pointers into the heap, then this argument can be zero. If garbage collection is enabled, then every procedure that stores pointers in its frame must have a garbage collector map, which will be either a bitmap expressed as a hexadecimal constant, or the address of a program written in a special mini-language that describes the layout of the frame. This mini-language is described elsewhere.
- An END directive ends a procedure and can be followed by further material that appears between procedures.
Instructions
instruction:
    opcode operands_opt nl
operands:
    constant
    constant operands
Each instruction has an optional list of operands, which (depending on the instruction) can be integer constants, assembler symbols, and labels. Details of the instructions and what operands they take appear later in this document.
Pseudo-operations
These pseudo-operations should appear inside a Keiko procedure; most behave like instructions but also contribute additional information to the current procedure.
pseudo-operation:
    LABEL ident nl
    CONST constant nl
    GLOBAL ident nl
    FCONST float nl
    DCONST float nl
    QCONST constant nl
    STKMAP constant nl
    LINE integer nl
- The LABEL pseudo-op defines its argument as a label for the next instruction in the procedure. Labels can be arbitrary identifiers and have a scope that is the whole of the current procedure. They are used only in branch instructions, and do not have a value that can be stored in a variable.
- The next few pseudo-ops act as instructions that push a value on the stack, but are capable of handling 32-bit or 64-bit values that are stored out of line in the constant pool for the procedure. CONST pushes an arbitrary integer or address; GLOBAL is similar, but restricted to the addresses of globals; FCONST, DCONST and QCONST push float, double and long integer constants respectively, with the double and long integer constants taking up two stack slots. These pseudo-ops are typically translated by the Keiko assembler into LDKW or LDKD instructions that reference a slot it has allocated in the constant pool. As a special case, CONST pseudo-ops that contain a small constant are translated into PUSH instructions that use an inline constant, either encoded directly in the opcode byte, or following it as the next one or two bytes of the instruction stream. All this is hidden from programmers and compilers by the Keiko assembler.
- The STKMAP pseudo-op specifies a pointer map for the evaluation stack that holds at an immediately following CALL instruction. Any pointer values on the evaluation stack that are used as arguments to the procedure call will be covered by the procedure's own stack map, so this pseudo-op is needed only in the rare case where other values near the bottom of the evaluation stack will persist over the call. These stack maps are gathered for the whole procedure and used by the assembler to compile a stack map table that, alongside the code and the constant pool, forms part of the runtime representation of the procedure. If the Keiko machine is built without a garbage collector, then naturally enough these stack maps can be omitted.
- The LINE pseudo-op marks a source line, with an argument that is the line number. It adds the line number to a table that the assembler includes with the object program, and also generates an LNUM instruction in the code. The LNUM instructions are used both by the Keiko profiler, which can count how many times each line is executed, and by debuggers, which can replace them with BREAK instructions to implement breakpoints.
Load and store instructions
These instructions are named according to a convention where the last letter of the mnemonic identifies the size and type of data: W for a 4-byte word, but also C for a byte, S for a 2-byte halfword, F for a single-precision floating point number, D for double-precision floating point, and Q for an 8-byte integer. Single bytes are treated as unsigned, but halfwords are sign-extended on loading, in agreement with the CHAR and SHORTINT types of the Oberon language for which Keiko was originally designed. The distinction between integer and floating-point values is made primarily to help very simple JIT translators with register allocation.
LOCAL n
- Push the address of a local at a constant offset (positive or negative) from the frame pointer.
OFFSET
- Expect an address and an integer offset on the stack; pop them, add them, and push the result.
INDEXS, INDEXW, INDEXD
- Similar to OFFSET, except that the integer offset is multiplied by 2, 4, or 8 respectively before the addition.
LOADW
- Expect an address on the stack; pop it, and push the 4-byte contents of the address.
LOADS, LOADC, LOADF
- Like LOADW, except that the value loaded is a 2-byte signed integer (LOADS), a single unsigned byte (LOADC), or a single-precision float (LOADF).
LOADD, LOADQ
- Like LOADW, except that the value loaded is a double-precision float (LOADD) or an 8-byte integer (LOADQ). Each of these types takes up two slots on the evaluation stack. The two halves of each value are loaded separately, so that the values in memory need only 4-byte alignment.
STOREW
- Expect a 4-byte integer value and an address on the stack; pop them, and store the value at the address.
STORES, STOREC, STOREF
- Like STOREW, except the value stored is a 2-byte integer (STORES), a single byte (STOREC) or a single-precision float (STOREF).
STORED, STOREQ
- Like STOREW, except that the value stored is a double-precision float or an 8-byte integer, each of which occupies two slots on the evaluation stack. Again, only 4-byte alignment of the target address is required.
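The difference between the unsigned byte load and the sign-extending halfword load can be modelled in a few lines. This is a sketch of the semantics described above, not of the interpreter itself: memory is represented as a Python bytes object, the function names are hypothetical, and little-endian byte order is assumed.

```python
def load_c(memory, addr):
    """LOADC: a single byte, treated as unsigned (0..255)."""
    return memory[addr]

def load_s(memory, addr):
    """LOADS: a 2-byte halfword, sign-extended (-32768..32767)."""
    return int.from_bytes(memory[addr:addr + 2], "little", signed=True)

def load_w(memory, addr):
    """LOADW: a 4-byte word, read here as a signed integer."""
    return int.from_bytes(memory[addr:addr + 4], "little", signed=True)

def index_address(base, index, size):
    """INDEXS/INDEXW/INDEXD address arithmetic with size 2, 4 or 8."""
    return base + index * size
```

With memory holding the bytes FF FF 00 00, the byte load yields 255, the halfword load yields -1, and the word load yields 65535.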
The addressing operators and load/store instructions listed above form a complete set, and the remaining instructions listed below are just shorthand for combinations of them. Compilers may combine the basic instructions into these shorthands, and implementations of Keiko may choose to implement some of the combinations directly, reducing the size of the binary code and speeding up the bytecode interpreter, which can achieve more in each cycle. Some of the rarer combinations may not in fact be implemented with their own bytecodes; for them, the assembler partially or completely expands the shorthands into their underlying primitive instructions. For example, the usual implementation of Keiko implements LDGF directly, in a single instruction that occupies 2 or 3 bytes, but re-expands LDGD x into the equivalent sequence GLOBAL x; LOADD, partly because the operation LOADD is already quite expensive, so the expense of forming the address in a separate instruction is proportionally less significant.
LDLW n, LDLS n, LDLC n, LDLF n, LDLD n, LDLQ n
- Load local, equivalent to LOCAL n followed by LOADW, LOADS, etc. Note that the signed offset n must fit into 2 bytes; otherwise the compiler generating the Keiko code must explicitly use the equivalent sequence LOCAL 0; CONST n; OFFSET; LOADW, etc.
STLW n, STLS n, STLC n, STLF n, STLD n, STLQ n
- Store local, equivalent to LOCAL n; STOREW, etc. Again, the signed offset n must fit in 2 bytes.
LDGW x, LDGS x, LDGC x, LDGF x, LDGD x, LDGQ x
- Load global, equivalent to GLOBAL x; LOADW, etc.
STGW x, STGS x, STGC x, STGF x, STGD x, STGQ x
- Store global, equivalent to GLOBAL x; STOREW, etc.
LDNW n, LDNS n, LDNC n, LDNF n, LDND n, LDNQ n
- Load indexed, equivalent to CONST n; OFFSET; LOADW, etc., with signed offset n fitting in 2 bytes.
STNW n, STNS n, STNC n, STNF n, STND n, STNQ n
- Store indexed, equivalent to CONST n; OFFSET; STOREW, etc., with signed offset n fitting in 2 bytes.
LDXW, LDXS, LDXC, LDXF, LDXD, LDXQ
- Double-indexed load, equivalent to CONST 4; TIMES; OFFSET; LOADW, etc., with a scale factor equal to the size of the value loaded.
STXW, STXS, STXC, STXF, STXD, STXQ
- Double-indexed store, equivalent to CONST 4; TIMES; OFFSET; STOREW, etc., with a scale factor equal to the size of the value being stored.
ADJUST n
- Add constant offset, equivalent to CONST n; OFFSET, provided n fits in two signed bytes.
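The equivalences listed above are regular enough to be generated mechanically. The sketch below is a hypothetical helper, not the expansion code of the Keiko assembler; it maps a shorthand mnemonic to the sequence of basic instructions given in the text, and the name expand and the table SIZES are assumptions made for illustration.

```python
# Sizes in bytes implied by the final letter of each mnemonic.
SIZES = {"C": 1, "S": 2, "W": 4, "F": 4, "D": 8, "Q": 8}

def expand(mnemonic, operand=None):
    """Expand a shorthand such as 'LDLW' into a list of basic instructions."""
    prefix, kind = mnemonic[:-1], mnemonic[-1]
    if kind not in SIZES:
        raise ValueError("unknown size letter in " + mnemonic)
    if prefix in ("LDL", "STL"):
        address = ["LOCAL %s" % operand]
    elif prefix in ("LDG", "STG"):
        address = ["GLOBAL %s" % operand]
    elif prefix in ("LDN", "STN"):
        address = ["CONST %s" % operand, "OFFSET"]
    elif prefix in ("LDX", "STX"):
        # The scale factor equals the size of the value loaded or stored.
        address = ["CONST %d" % SIZES[kind], "TIMES", "OFFSET"]
    else:
        raise ValueError("not a load/store shorthand: " + mnemonic)
    basic = ("LOAD" if prefix.startswith("LD") else "STORE") + kind
    return address + [basic]
```

For instance, expand("STXD") gives the four-instruction sequence CONST 8; TIMES; OFFSET; STORED. The helper does not check that an offset fits in 2 bytes; a real code generator would have to fall back on the longer explicit sequence in that case.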
Integer arithmetic
PLUS, MINUS, TIMES
- Pop two integers from the stack, combine them with an arithmetic operation, and push the result.
UMINUS
- Unary minus; pop an integer and push the integer with the same magnitude and opposite sign.
DIV, MOD
- Integer division and modulo, defined with truncation towards minus infinity.
INC, DEC
- Integer increment and decrement, equivalent to CONST 1; PLUS or CONST 1; MINUS.
AND, OR, NOT
- Boolean operations; the integer arguments are interpreted as false if zero, true if non-zero, and the result is either 0 or 1.
BITAND, BITOR, BITXOR, BITNOT
- Bitwise logical operations.
LSL, LSR, ASR, ROR
- Shifts and rotations, with left shift (LSL), both logical (LSR) and arithmetic (ASR) right shifts, and right rotation (ROR). All expect an operand and a shift amount on the stack, and operate on 32-bit words.
EQ, NEQ, LT, GT, LEQ, GEQ
- Integer comparisons, popping two integer arguments and pushing a Boolean result, either 0 or 1.
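Two points above are easy to get wrong when targeting Keiko from a language with C-like arithmetic: DIV and MOD round towards minus infinity, and the shifts operate on 32-bit words. The following sketch models both. The function names are hypothetical, shift amounts are assumed to lie in the range 0 to 31, and c_div is included only to show the contrasting behaviour of division that truncates towards zero.

```python
MASK = 0xFFFFFFFF

def to_signed(x):
    """Interpret the low 32 bits of x as a signed 32-bit integer."""
    x &= MASK
    return x - (1 << 32) if x & 0x80000000 else x

def keiko_div(a, b):
    return a // b            # Python's // already rounds towards minus infinity

def keiko_mod(a, b):
    return a % b             # consistent with //: a == b * (a // b) + a % b

def c_div(a, b):
    """Division truncating towards zero, as in C, for comparison."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q

def lsl(x, n):
    return to_signed(x << n)

def lsr(x, n):
    return to_signed((x & MASK) >> n)    # zeros are shifted in from the left

def asr(x, n):
    return to_signed(x) >> n             # copies of the sign bit are shifted in

def ror(x, n):
    x &= MASK
    return to_signed((x >> n) | (x << (32 - n)))
```

Under these definitions, dividing -7 by 2 gives quotient -4 and remainder 1, whereas the truncating division gives -3.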
Miscellaneous operations
INCL n
- Increment a local variable, equivalent to LDLW n; CONST 1; PLUS; STLW n. The offset n must fit in two bytes.
DECL n
- The same as INCL, but decrementing instead of incrementing.
DUP k
- Push on the top of the stack a copy of the (single-word) value that is k items from the top, where k is 0, 1, or 2.
SWAP
- Swap the top two (single-word) values on the stack.
POP n
- Pop n items from the stack, where n < 256.
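The stack manipulation instructions can be modelled on a Python list whose last element is the top of the evaluation stack. This is a sketch of the behaviour described above under that modelling assumption; the function names are hypothetical.

```python
def dup(stack, k):
    """DUP k: push a copy of the value k items below the top (k = 0, 1 or 2)."""
    if k not in (0, 1, 2):
        raise ValueError("DUP operand must be 0, 1 or 2")
    stack.append(stack[-1 - k])

def swap(stack):
    """SWAP: exchange the top two values."""
    stack[-1], stack[-2] = stack[-2], stack[-1]

def pop(stack, n):
    """POP n: discard the top n values, where n < 256."""
    if not 0 <= n < 256:
        raise ValueError("POP operand must be less than 256")
    del stack[len(stack) - n:]
```

Note that DUP 0 duplicates the top of the stack itself, since the value zero items from the top is the top item.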
Conditional and unconditional branches
JEQ lab, JNEQ lab, JLT lab, JGT lab, JLEQ lab, JGEQ lab
- Expect two integers on the stack; pop and compare them, and branch to lab if the relevant condition is satisfied.
JEQZ lab, JNEQZ lab, JLTZ lab, JGTZ lab, JLEQZ lab, JGEQZ lab
- Expect an integer on the stack; pop it and compare it with zero, and branch to lab if the relevant condition is satisfied.
JUMP lab
- Jump to lab.
The next few instructions are intended for implementing case statements. The TESTGEQ instruction can be used to build a binary tree of comparisons involving a value k that remains on the stack. At the leaves of the tree, JCASE and JRANGE instructions permit the relevant case to be identified quickly.
JCASE n; CASEL lab1; ...; CASEL labn
- A JCASE n instruction must be followed by a table of n labels written as operands of CASEL. The instruction expects an integer k on the stack; it pops the integer, and if 0 <= k < n branches to the corresponding case label; otherwise execution continues with the next instruction.
JRANGE lab
- Expect three integers k, lo and hi on the stack; pop them and branch to lab if lo <= k <= hi.
TESTGEQ lab
- Expect two integers k and x on the stack; pop x but leave k on the stack, branching to lab if k >= x.
Long integer operations
QPLUS, QMINUS, QTIMES, QDIV, QMOD
- Binary operations on long integers.
QUMINUS
- Unary minus on long integers.
QINC, QDEC
- Increment and decrement long integers.
QEQ, QNEQ, QLT, QGT, QLEQ, QGEQ
- Comparisons on long integers.
QJEQ, QJNEQ, QJLT, QJGT, QJLEQ, QJGEQ
- Conditional branches on long integers.
Floating point arithmetic
FPLUS, FMINUS, FTIMES, FDIV
- Binary arithmetic operations on floats.
FUMINUS
- Unary minus on floats.
DPLUS, DMINUS, DTIMES, DDIV
- Binary arithmetic operations on doubles.
DUMINUS
- Unary minus on doubles.
FEQ, FNEQ, FLT, FGT, FLEQ, FGEQ
- Comparisons on floats.
DEQ, DNEQ, DLT, DGT, DLEQ, DGEQ
- Comparisons on doubles.
Floating point conditional branches
There are ten different conditional branches involving comparison of two single-precision floats (and another ten for double precision), because the treatment of anomalous NaN values means the jump-if-less-than-or-equal is different from jump-if-not-greater-than. The standard implementation of Keiko, like the JVM, implements these using one of two floating-point comparison instructions (with different treatment of NaN) followed by an integer conditional branch.
FJEQ lab, FJNEQ lab, FJLT lab, FJGT lab, FJLEQ lab, FJGEQ lab, FJNLT lab, FJNGT lab, FJNLEQ lab, FJNGEQ lab
- Expect two single-precision floats on the stack; pop and compare them, and jump to lab if the condition is satisfied.
DJEQ lab, DJNEQ lab, DJLT lab, DJGT lab, DJLEQ lab, DJGEQ lab, DJNLT lab, DJNGT lab, DJNLEQ lab, DJNGEQ lab
- Similar instructions, but with double-precision operands.
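The reason for the extra negated branches can be seen by comparing the two conditions directly when one operand is a NaN. In this sketch the function names are hypothetical and each returns whether the corresponding branch would be taken; Python floats are double precision, but the NaN behaviour is the same for single precision.

```python
import math

def jump_if_leq(a, b):
    """Condition for a jump-if-less-than-or-equal branch."""
    return a <= b

def jump_if_not_gt(a, b):
    """Condition for a jump-if-not-greater-than branch."""
    return not (a > b)

nan = math.nan
ordinary = (jump_if_leq(1.0, 2.0), jump_if_not_gt(1.0, 2.0))
with_nan = (jump_if_leq(nan, 1.0), jump_if_not_gt(nan, 1.0))
```

For ordinary operands the two conditions agree, but every ordered comparison involving a NaN is false, so with a NaN operand the first branch is not taken while the second is.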
Conversions
A small but sufficient set of conversions is provided; other conversions must go via the integer type.
CONVNF
- Expect an integer on the stack; pop it, convert it to a single-precision floating point approximation to the same value, and push the result.
CONVND
- Convert an integer to double-precision floating point.
CONVFN
- Convert a single-precision float to an integer, discarding the fractional part.
CONVDN
- Convert a double-precision float to an integer, discarding the fractional part.
CONVFD
- Convert from single-precision to double-precision floating point.
CONVDF
- Convert from double-precision to single-precision floating point.
CONVNC
- Convert an integer to an unsigned character by masking off all but the bottom 8 bits.
CONVNS
- Convert an integer to a signed 2-byte integer by masking off all but the bottom 16 bits, then sign extending from 16 to 32 bits.
CONVNQ
- Convert from a 32-bit to a 64-bit integer with sign extension.
CONVQN
- Convert from a 64-bit to a 32-bit integer.
CONVQD
- Convert a 64-bit integer to double-precision floating point.
CONVDQ
- Convert a double-precision float to a 64-bit integer.
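The narrowing integer conversions and the float-to-integer conversion can be modelled directly from the descriptions above. The function names below are hypothetical, and the sketch does not address values too large to represent in the target type.

```python
def convnc(x):
    """CONVNC: keep only the bottom 8 bits, giving an unsigned character."""
    return x & 0xFF

def convns(x):
    """CONVNS: keep the bottom 16 bits, then sign-extend from 16 to 32 bits."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x

def convfn(x):
    """CONVFN or CONVDN: discard the fractional part of a float."""
    return int(x)
```

Discarding the fractional part rounds towards zero, so converting -2.7 gives -2; this differs from the rounding towards minus infinity used by DIV.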
Runtime checks
BOUND line
- Expect integers index and bound on the stack; pop bound, but leave index on the stack. If the relationship 0 <= index < bound is not satisfied, signal an array bound error on line.
NCHECK line
- Expect a pointer on the stack; leave it there, but if it is null, signal a null pointer error on line.
GCHECK line
- Expect a pointer on the stack; leave it there, but if it is non-null, signal an error involving the assignment of a local procedure to a procedure-valued variable.
ZCHECK line, FZCHECK line, DZCHECK line, QZCHECK line
- Expect an integer, a single- or double-precision float, or a long integer on the stack; leave it there, but if it is zero, signal a divide-by-zero error on line.
ERROR n line
- Signal error n on line. Predefined constants such as E_CAST allow the message to be chosen from a standard list: see the function message in source file xmain.c.
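The checks share a pattern: they inspect the top of the stack, leave the checked value in place, and signal an error tagged with a source line number. The sketch below models three of them. The exception class, the function names, the message texts and the representation of a null pointer as 0 are all assumptions made for illustration.

```python
class KeikoRuntimeError(Exception):
    """Hypothetical exception standing in for a Keiko runtime error."""

def bound(stack, line):
    """BOUND line: pop the bound, leave the index, check 0 <= index < bound."""
    limit = stack.pop()
    index = stack[-1]
    if not 0 <= index < limit:
        raise KeikoRuntimeError("array bound error on line %d" % line)

def ncheck(stack, line):
    """NCHECK line: leave the pointer on the stack, but reject null (0 here)."""
    if stack[-1] == 0:
        raise KeikoRuntimeError("null pointer error on line %d" % line)

def zcheck(stack, line):
    """ZCHECK line: leave the value on the stack, but reject zero."""
    if stack[-1] == 0:
        raise KeikoRuntimeError("divide-by-zero error on line %d" % line)
```

Because the checked value stays on the stack, a compiler can place BOUND between the code that computes an index and the indexed load or store that uses it, without duplicating the index first.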
More bits and pieces
ALIGNC, ALIGNS
- Expect a one-byte or two-byte quantity on top of the stack. Adjust its alignment in a way that is appropriate if it is to become a procedure parameter. These operations are no-ops on little-endian architectures like x86 and ARM, but act as shifts on big-endian architectures.
FIXCOPY
- Expect two pointers dst and src and a count n on the stack; pop them and copy n bytes from src to dst.
FLEXCOPY
- Expect on the stack a pointer to the location in the stack frame where the address of a flexible array parameter is stored, and an integer giving its size in bytes; pop them, allocate space for the parameter in the stack frame of the current procedure, copy the data across, and replace the parameter address with the address of the copy.
LNUM n
- Note the beginning of the code for source line n. This instruction has no effect normally, but can be used for line-count profiling and to support breakpoints in a debugger. The value n must fit in 2 bytes.
Procedure call
The procedure call instructions provided by the standard Keiko machine are slightly different from the PCALL instruction used in the Compilers course: in fact PCALL n is equivalent to CALL (n+1), because the always-present static link is treated as an extra parameter. Other compilers targeting Keiko do not pass the static link as a parameter, but instead via a special 'secret' place. This makes it possible for global procedures to ignore any dummy static link they may be passed, and for calls to known global procedures to avoid passing a static link at all, saving time and space.
CALL n, CALLW n, CALLF n, CALLD n, CALLQ n
- Call a procedure with n arguments and no result (CALL), a one-word result that may be an integer or a pointer (CALLW), a single-precision (CALLF) or double-precision (CALLD) floating point result, or a 64-bit integer result (CALLQ). The arguments should previously have been pushed on the evaluation stack, with double-precision float and 64-bit integer arguments counting double, and followed by the procedure address. These arguments become part of the stack frame of the procedure, and the procedure address and the arguments are popped when the procedure returns.
STATLINK
- Expect a pointer to a frame base on the stack; pop it and save it in a secret place. The STATLINK instruction should appear just before the code that pushes the procedure address for the call (and that code should not itself involve other procedure calls).
SAVELINK
- This instruction must be the first in a procedure that expects a static link. It moves the link from the secret place to its proper location in the stack frame of the procedure.
RETURN