Note: The mercurial server is disabled at the moment while I investigate whether it can run with an acceptably low CPU load – Mike.
Bytecode instructions for OBC
For release 2.9
This list contains the instructions that are generated by the compiler: first, a basic set that allows all necessary operations to be expressed, and contains more-or-less the instructions that are used by the compiler's code generator. Then we can add the packed instructions that are used by the compiler's peephole optimizer; each packed instruction corresponds to a short sequence of basic instructions that frequently occur together.
Basic instructions
Addressing, load and Store
PUSH n
- Push the small constant n onto the evaluation stack. Like all constants that appear as operands of instructions, n must fit into 16 bits.
CONST n
- The constant n can be a decimal or hexadecimal constant or an assembler symbol; its value is pushed on the stack. The assembler implements this instruction by placing n in the constant pool for the current procedure and generating an LDKW instruction to fetch it at runtime.
FCONST x
,DCONST x
,QCONST n
- Push a floating-point, double-precision or long-integer constant.
LOCAL n
- Push the address of a local variable that is at offset n from the frame head.
LOADC
,LOADS
,LOADW
,LOADD
- Pop an address and push the contents of the (8, 16, 32, 64)-bit location it names.
STOREC
,STORES
,STOREW
,STORED
- Pop an address and a one- or two-word value and store the value into the (8, 16, 32, 64)-bit location named by the address.
PLUSA
- Pop an integer offset and an address and push their sum. This instruction behaves like integer addition, but is kept separate to help the JIT translator.
DUP n
- Duplicate the one-word item that is
n
words from the top of the stack – soDUP 0
duplicates the top element on the stack.
SWAP
- Swap two one-word items at the top of the stack.
POP n
- Pop
n
one-word items from the stack.
Integer arithmetic
All arithmetic instructions expect their operands on the evaluation stack and leave their result on the stack.
PLUS
,MINUS
,TIMES
,DIV
,MOD
- Binary arithmetic operations on integers. (
DIV
andMOD
are guaranteed to give mathematically correct results).
UMINUS
- Unary minus on integers
EQ
,LT
,GT
,LEQ
,GEQ
,NEQ
- Integer comparisons with a Boolean result
AND
,OR
,NOT
- Boolean operations. As in C, these treat any non-zero operand as true, and produce a result that is either 1 or 0.
BITAND
,BITOR
,BITXOR
,BITNOT
- Bitwise boolean operations.
BIT
- Given a number between 0 and 31, this generates a word with the only corresponding bit set. The LSB is numbered 0 and the MSB is numbered 31.
LSL
,LSR
,ASR
- Shift instructions.
QPLUS
,QMINUS
,QTIMES
,QUMINUS
,QDIV
,QMOD
- Long integer (64 bit) instructions.
QEQ
,QLT
,QGT
,QLEQ
,QGEQ
,QNEQ
- Long integer (64 bit) comparisons.
Floating point
FPLUS
,FMINUS
,FTIMES
,FDIV
,FUMINUS
- Floating point arithmetic.
FEQ
,FLT
,FGT
,FLEQ
,FGEQ
,FNEQ
- Floating-point comparisons with a Boolean result.
DPLUS
,DMINUS
,DTIMES
,DDIV
,DUMINUS
- Double-precision arithmetic.
DEQ
,DLT
,DGT
,DLEQ
,DGEQ
,DNEQ
- Double-precision comparisons with a Boolean result.
Conversions
CONVNF
- Convert integer to floating point.
CONVND
- Convert integer to double precision.
CONVFD
- Convert floating point to double precision.
CONVDF
- Convert double precision to floating point.
CONVNC
- Convert integer to unsigned character, masking off unwanted bits.
CONVNS
- Convert integer to signed short, sign extending the result from 16 to 32 bits.
CONVNQ
- Convert integer to long integer
CONVQN
- Convert long integer to integer
CONVQD
- Convert long integer to double precision
Jumps
Labels in the assembler are written as decimal integers.
JUMP lab
- Jump to label
lab
.
JUMPF lab
,JUMPT lab
- Pop a Boolean value and jump to label
lab
if the Boolean is false or if it is true, respectively.
LABEL lab
An assembler directive: attach the label lab
to the next instruction.
Runtime checks
BOUND n
- Pop an array bound B and check the index i now at the top of the stack. If the test 0 ≤ i < B fails, report and array bound error on line n.
NCHECK n
- Check a pointer p on top of the stack. If it is null, report a null pointer error on line n.
GCHECK n
- Check a static link p on top of the stack. If it is not null, report the error that a local procedure is being assigned to a procedure variable.
ZCHECK n
,FZCHECK n
,DZCHECK n
,QZCHECK n
- Check a divisor d on top of the stack. If it is zero, report the error of division by zero.
ERROR e n
- Report runtime error e on line n. The code e should be one of those listed in function
message
in the filexmain.c
.
Language-specific operations
These instructions perform tasks that correspond to specific constructs in Oberon. They may or may not help with the implementation of other languages.
TYPETEST n
- At the top of the stack are the address of a record descriptor, and below it the address of a record. Pop these, test whether the decriptor is the ancestor of the record at level n, and push a Boolean result.
JCASE n
,CASEL lab
- The instruction
JCASE n
must be followed byn
of theCASEL
instructions. It pops an integeri
from the stack. If0 <= i < n
, the program continues at the corresponding label; otherwise, it continues with the instruction following the last followingCASEL
instruction.
JRANGE lab
- The instruction expects three integers
x
,lo
andhi
on the stack. Iflo <= x <= hi
, the program contiunes at labellab
, otherwise, it continues with the next instruction as usual.
FIXCOPY
- This instruction expects two address
a
andb
and an integern
on the stack. It copies a block ofn
bytes from addressb
to addressa
.
FLEXCOPY
- This instruction expects to find on the stack the address of an open array parameter and a size in bytes. It allocates space in the stack frame to copy the parameter, then overwrites the parameter with its new address.
Assembler directives
PROC name nargs fsize gcmap
,END
- @PRIMDEF name prim nargs fsize gcmap
DEFINE name
WORD n
,FLOAT x
,DOUBLE x
,LONG n
STRING hex
GLOBAL name size
LINE n
MODULE m
,IMPORT m
,PRIM p
,ENDHDR
Packed instructions
For compactness, brevity and speed, the compiler has a peephole optimizer that replaces common sequences of instructions by single instructions that have the same effect. Some of these instructions are realised by the bytecode assembler and interpreter in a form that is more compact and faster than the original sequence.
LDLC n
,LDLS n
,LDLW n
,LDLD n
- Load from local variable;
LDLs n
is equivalent toLOCAL n / LOADsW
.
STLC n
,STLS n
,STLW n
,STLD n
- Store to local variable;
STLs n
is equivalent toLOCAL n / STOREs
.
LDGC x
,LDGS x
,LDGW x
,LDGD x
- Load from global variable;
LDGs x
is equivalent toCONST x / LOADs
. Here x is a symbol that refers to an address in global storage, and the instruction loads from that address.
STGC x
,STGS x
,STGW x
,STGD x
- Store to global variable;
STGs x
is equivalent toCONST x / STOREs
.
JEQ lab
,JLT lab
,JGT lab
,JLEQ lab
,JGEQ lab
,JNEQ lab
- Integer compare and branch;
JEQ lab
is equivalent toEQ / JTRUE lab
, etc.
JEQZ lab
,JNEQZ lab
,JLTZ lab
,JGTZ lab
,JLEQZ lab
,JGEQZ lab
- Integer compare with zero and branch;
JEQZ lab
is equivalent toPUSH 0 / EQ / JTRUE lab
, etc.
FJEQ lab
,FJLT lab
,FJGT lab
,FJLEQ lab
,FJGEQ lab
,FJNEQ lab
- Floating point compare and branch;
FJEQ lab
is equivalent toFEQ / JTRUE lab
, etc.
DJEQ lab
,DJLT lab
,DJGT lab
,DJLEQ lab
,DJGEQ lab
,DJNEQ lab
- Double-precision compare and branch;
DJEQ lab
is equivalent toDEQ / JTRUE lab
, etc.
QJEQ lab
,QJLT lab
,QJGT lab
,QJLEQ lab
,QJGEQ lab
,QJNEQ lab
- Long integer compare and branch;
QJEQ lab
is equivalent toQEQ / JTRUE lab
, etc.
INC
,DEC
- Increment and decrement;
INC
is equivalent toPUSH 1 / PLUS
, etc.
INCL n
,DECL n
- Increment or decrement local variable;
INCL n
is equivalent toLDLW n / INC / STLW n
, etc.
QINC
,QDEC
- Long integer increment and decrement;
QINC
is equivalent toQCONST 1 / QPLUS
, etc.
BITSUB
- Bitwise difference, equivalent to
BITNOT / BITAND
INDEXC
,INDEXS
,INDEXW
,INDEXD
- Scaled addition;
INDEXs
is equivalent toPUSH n / TIMES / PLUSA
where n is 1, 2, 4 or 8.
LDIC
,LDIS
,LDIW
,LDID
- Indexed load;
LDIs
is equivalent toINDEXs / LOADs
.
LDNW n
,STNW n
- Constant indexed load and store;
LDNW n
is equivalent toPUSH n / LDIW
, etc.
LDEW n
,STEW n
- Load and store in enclosing frame;
LDEW n
is equivalent toLDLW -4 / LDNW n
, andSTEW n
toLDLW -4 / STNW n
. These instructions perform the operations needed to load and store one-word quantities in the stack frame of the statically enclosing procedure by following one link in the static chain.
TESTGEQ lab
- The sequence
DUP 0 / CONST n / JGEQ lab
can be abbreviated asCONST n / TESTGEQ lab
. This is used in translating CASE statements.
==Procedure call and return
CALL n
,CALLW n
,CALLD n
STKMAP n
LINK
,SAVELINK
RETURN
,RETURNW,
RETURND@
ALIGNC
,ALIGNS
Support
These instructions are not used directly by the compiler, but are introduced by the assembler in the code it produces for other instructions.
FCMP
,DCMP
LDKW n
,LDKD n
JPROC n
RESULTW,
RESULTD@
SLIDEW
,SLIDED
Older text
Various parts of the system deal with bytecode instructions:
- The back end of the compiler.
- The assembler/linker.
- The bytecode interpreter.
- JIT translators.
However, the sets of instructions that can appear at various stages vary a bit. The compiler generates code using instructions from a basic set, but there's a peephole optimiser that packs together common sequences into single instructions. For example, the code generator might translate the expression x+1 into (its internal representation of) LOCAL 12 / LOADW / CONST 1 / PLUS
, and the peephole optimiser will compress this into LDLW 12 / INC
. The peephole optimiser also introduces out-of-line constants for values that will not fit in the 16 bits that is the largest operand supported by the bytecode.
Next in the tool-chain comes the assembler/linker, which makes two significant changes: first, it may have multiple encodings for each instruction as opcodes; and second, some less common instructions generated by the compiler have no encoding, but are expanded by the assembler into equivalent short sequences of instructions. This releases more of the 256 opcodes for encoding more common instructions. For an example of the first kind of change, there are single-byte instructions that push small constants (perhaps 0 to 6) onto the stack, then there is a two-byte sequence that can push any value between -128 and 127, and there is a three-byte sequence that can push values between -32768 and 32767. Any larger values will have been replaced by out-of-line constants by the compiler.
An example of the second kind of change is provided by floating point branches. In place of (for example) the instruction FJLT lab
(which pops two floats x and y and branches to lab
if x<y
), the assembler generates the sequence FCMP / JLTZ lab
, first comparing the two floats and pushing a comparison indicator -1, 0, or +1, then popping this and branching to lab
if it is negative.
A sensible approach in a JIT translator is first to decode the opcodes into instructions, then to reverse the packing process that was done by the peephole optimiser, getting code that is expressed in terms of a small number of basic operations. These can then be used as the basis of instruction selection in a way that is more general than can be acheived by fixed translation of each instruction that is present in the raw bytecode.
Core instructions
These are the instructions as generated by the compiler before the peephole optimiser does its packing. They are also a reasonable set to implement directly in a JIT translator, after expanding packed instructions into their constituent parts.
CONST n -- Push constant LDKW n -- Push word from constant pool LOCAL n -- Push address of local PLUSA -- Add address and offset LOADs -- Pop address and push contents (s = W, S, C, D) STOREs -- Pop address then value, and store DUP n SWAP POP n PLUS, MINUS, TIMES, DIV, MOD -- Integer arithmetic UMINUS -- Integer unary minus AND, OR, NOT -- Boolean operators BITAND, BITOR, BITXOR, BITSUB, BITNOT -- Bitwise operators BIT -- Pop n, push (1 << n). [Note that BIT(32) = 0] LSL, LSR -- Logical shifts ASR -- Arithmetic shift right EQ, LT, GT, LEQ, GEQ, NEQ -- Integer comparisons JUMPF, JUMPT -- Pop boolean and jump if true or false JUMP -- Unconditional jump JCASE, JRANGE -- Special jumps for CASE statements FPLUS, FMINUS, FTIMES, FDIV -- Floating-point arithmetic FUMINUS -- Floating point unary minus FEQ, FLT, FGT, FLEQ, FGEQ, FNEQ -- Floating-point comparisons DPLUS, DMINUS, DTIMES, DDIV -- Double-precision arithmetic DUMINUS -- Double-precision unary minus DEQ, DLT, DGT, DLEQ, DGEQ, DNEQ -- Double-precision comparisons CONVNF, CONVFN -- Convert between integer and floating point CONVND, CONVDN -- Convert between integer and double precision CONVFD, CONVDF -- Convert between float and double CONVNC -- Convert integer to character CONVNS -- Convert integer to short BOUND -- Array bound check NCHECK -- Null pointer check GCHECK -- Check for global procedure ZCHECK -- Zero divisor check FZCHECK -- Floating-point zero divisor check DZCHECK -- Double-precision zero divisor check ERROR -- Runtime error ALIGNC, ALIGNS -- Align argument in 32-bit word FIXCOPY -- Copy fixed-size aggregate FLEXCOPY -- Allocate and copy open array parameter TYPETEST -- Class membership test CALL, CALLW, CALLD -- Procedure call TCALL, TCALLW, TCALLD -- Tail call LINK -- Pass static link SAVELINK -- Save static link in frame RETURN, RETURNW, RETURND -- Return from procedure LNUM -- Line number for debugging and profiling
Packed instructions
These instructions are introduced by the peephole optimiser as compact equivalents for sequences of common instructions.
INDEXs = PUSH s / BINOP Times / BINOP PlusA where s can be W=4, S=2, C=1, D=8 LDLs n = LOCAL n / LOADs STLs n = LOCAL n / STOREs LDGs n = LDKW n / LOADs STGs n = LDKW n / STOREs LDIs = INDEXs / LOADs STIs = INDEXs / STOREs LDNW n = CONST n / LDIW STNW n = CONST n / STIW LDEW n = LDLW -4 / LDNW n STEW n = LDLW -4 / STNW n INCL n = LDLW n / INC / STLW n DECL n = LDLW n / DEC / STLW n JEQZ lab, etc. = CONST 0 / JEQ lab, etc. TESTGEQ lab = DUP 0 / JLT lab JTYPET m lab = TYPETEST m / JUMPT lab JTYPEF m lab = TYPETEST m / JUMPF lab
Support instructions
The assembler and bytecode interpreter spend lots of opcodes on common instructions like "load local", so that a fair number of locals can be accessed with a fast, single-byte instruction. To free up opcodes for this, they implement other, less common instructions with an equivalent short sequence of others. These extra instructions are there to help with this process.
FCMP -- Compare floating point values and push comparison indicator DCMP -- Ditto for double-precision values
Because these instructions are never generated by the compiler, but only be the assembler as part of a canned sequence, it's OK for a JIT translator to handle them only in restricted circumstances; for example, FCMP
is always followed by an integer comparison with zero or a conditional jump.