Ramblings on the Toy Language Part 1

Posted on February 21, 2016

Since the slow start of my senior year, I've been gradually laying the foundations of my toy language's VM. Much like a real machine, or any functioning system, the VM can be broken down into smaller, individual components that work together. Once you have a rough layout of the components and how they interact, you can write unit tests for each one and verify its behavior in isolation. Of course, certain components are more reusable than others. For this installment, I'm going to be talking about the instruction interpreter.

The Instruction Interpreter

This is the unit that takes a series of bytes and gives them meaning to the machine. Instructions are single-byte opcodes, followed by arguments whose length depends on the opcode's definition. For example, a binary addition operator such as ADD has 0 parameters, so the instruction is exactly 1 byte long. On the other hand, operators such as GOTO require an address parameter. Its length is therefore 1 byte for the opcode plus however many bytes it takes to represent an address. For example, the mnemonic form of such an instruction is written as follows:

GOTO 0x04

The above instruction assumes that the address space 'width' is 1 byte, so this instruction takes 1 (opcode length) + 1 (argument length) = 2 bytes. This width must be defined before execution, during the initialization of the interpreter.
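To make the arithmetic concrete, here is a minimal sketch of the length calculation. The table of operand counts and the function name are made up for illustration; only ADD and GOTO, the single-byte opcode, and the configurable address width come from the description above.

```python
# Every opcode occupies exactly one byte.
OPCODE_SIZE = 1

# Assumed for illustration: how many address operands each mnemonic takes.
ADDRESS_OPERANDS = {"ADD": 0, "GOTO": 1}


def instruction_length(mnemonic: str, address_width: int) -> int:
    """Return the encoded size of one instruction, in bytes."""
    return OPCODE_SIZE + ADDRESS_OPERANDS[mnemonic] * address_width


print(instruction_length("ADD", 1))   # 1
print(instruction_length("GOTO", 1))  # 2
print(instruction_length("GOTO", 4))  # 5
```

With a wider address space the same GOTO simply grows: a 4-byte width gives 1 + 4 = 5 bytes, which is why the width has to be fixed before anything is decoded.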

The instruction interpreter has to look up the metadata associated with each instruction in order to load the correct values into references in memory. Much like how you read a book and remember important 'keyed' bits of information that chain together a story, the instruction interpreter lets the program turn the words of the story into something that can be put into memory; but more on memory later. There are a few more important points that we must cover here.
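As a rough sketch of that metadata lookup, the decoder below reads one opcode byte, consults a table to find out how many address operands follow, and reads them. The opcode numbers (0x01, 0x02), the `OpInfo` type, and the big-endian byte order are all assumptions of mine, not something the VM has committed to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OpInfo:
    """Metadata for one opcode (hypothetical layout)."""
    mnemonic: str
    address_operands: int


# Hypothetical opcode numbering, chosen only for this example.
OPCODE_TABLE = {
    0x01: OpInfo("ADD", 0),
    0x02: OpInfo("GOTO", 1),
}


def decode(code: bytes, offset: int, address_width: int):
    """Decode the instruction at `offset`.

    Returns (mnemonic, operands, offset of the next instruction).
    """
    info = OPCODE_TABLE[code[offset]]  # raises KeyError on an unknown opcode
    cursor = offset + 1
    operands = []
    for _ in range(info.address_operands):
        raw = code[cursor:cursor + address_width]
        if len(raw) < address_width:
            raise ValueError("instruction is truncated")
        # Assumption: multi-byte addresses are stored big-endian.
        operands.append(int.from_bytes(raw, "big"))
        cursor += address_width
    return info.mnemonic, operands, cursor


print(decode(bytes([0x02, 0x04]), 0, address_width=1))  # ('GOTO', [4], 2)
```

Keeping the operand counts in a table rather than in the decoding code means adding an opcode is a one-line change, and the table itself is easy to unit test.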

Once the instructions are decoded, they must be traversable by address. This is required by the 'Program Counter', the unit that keeps track of the current instruction being interpreted. It must be able to increment, decrement, and branch to arbitrary locations in memory when invoked by instructions like GOTO.
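A program counter along those lines could look something like the sketch below. The class and method names are hypothetical, and I'm assuming that the address one past the last byte is a valid "end of program" position.

```python
class ProgramCounter:
    """Tracks the address of the instruction being interpreted."""

    def __init__(self, code_size: int, start: int = 0) -> None:
        self.code_size = code_size
        self.address = 0
        self.jump(start)

    def jump(self, target: int) -> None:
        """Branch to an absolute address, as GOTO would."""
        # Assumption: target == code_size means "ran off the end", which is allowed.
        if not 0 <= target <= self.code_size:
            raise IndexError(f"address {target:#04x} is outside the program")
        self.address = target

    def advance(self, byte_count: int) -> None:
        """Move by a relative amount; negative values step backwards."""
        self.jump(self.address + byte_count)


pc = ProgramCounter(code_size=8)
pc.advance(1)    # step over a 1-byte instruction such as ADD
pc.jump(0x04)    # GOTO 0x04
pc.advance(-2)   # step backwards two bytes
print(pc.address)  # 2
```

Routing `advance` through `jump` keeps the bounds check in one place, so a bad branch target fails loudly instead of reading garbage.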

On top of the instruction interpreter, there must be the logic that actually implements the definition of every single opcode. That logic, in turn, must communicate with many interfaces that implement architecture-specific functions and external library calls.
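One way to organize the per-opcode logic is a dispatch table that maps each opcode byte to a handler function. The sketch below is a toy, not the real VM: the opcode numbers are invented, PUSH and HALT exist only so the example has something to run, a 1-byte address width is assumed, and the operand stack is my own assumption about how ADD gets its inputs.

```python
# Hypothetical opcode numbers; PUSH and HALT are added purely for this demo.
PUSH, ADD, GOTO, HALT = 0x00, 0x01, 0x02, 0xFF


def op_push(code, pc, stack):
    stack.append(code[pc + 1])  # 1-byte immediate operand
    return pc + 2


def op_add(code, pc, stack):
    right = stack.pop()
    left = stack.pop()
    stack.append(left + right)
    return pc + 1


def op_goto(code, pc, stack):
    return code[pc + 1]  # assumes a 1-byte address width


# Each handler takes (code, pc, stack) and returns the next pc.
HANDLERS = {PUSH: op_push, ADD: op_add, GOTO: op_goto}


def run(code: bytes, max_steps: int = 1000) -> list:
    """Interpret `code` until HALT and return the final stack."""
    stack = []
    pc = 0
    for _ in range(max_steps):
        opcode = code[pc]
        if opcode == HALT:
            return stack
        handler = HANDLERS.get(opcode)
        if handler is None:
            raise ValueError(f"unknown opcode {opcode:#04x} at address {pc}")
        pc = handler(code, pc, stack)
    raise RuntimeError("step limit reached; possible infinite loop")


# PUSH 2; PUSH 3; GOTO 8; PUSH 99 (skipped); ADD; HALT
program = bytes([PUSH, 2, PUSH, 3, GOTO, 8, PUSH, 99, ADD, HALT])
print(run(program))  # [5]
```

The step limit is just a guard for the demo, since a GOTO that targets itself would otherwise spin forever.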

Helpful diagram


That's one small part of the big picture. The instruction interpreter covers a critical and fundamental part of the problem of getting a machine to compute. More posts rambling about the other components will follow when I get the time to type out my thoughts.