Sugar ISA, Parser, and Emulator

This project achieves two goals.

Formally design and specify a distinctive RISC-ISA, including assembly syntax, for educational purposes.
Emulate assembly parser and program execution using Java.

Personal Agenda

This project was implemented for my CV, to demonstrate my understanding of computer architecture at a very detailed level, and to also showcase my skills in software engineering, data structures and object-oriented programming. Most of these concepts were taught to me from courses at the Australian National University, such as COMP2300 and COMP2100.

Examples of Sugar assembly code snippets can be found here.

ISA

Sugar Instruction Set Architecture (ISA) is designed for the simplicity of its assembly code, and for education purposes, as it does not have any real-world practical uses. Sugar is a RISC ISA, inspired by ARM and ANU's QuAC ISAs. The assembly is purposely designed to use C-style syntax, for uniqueness and user-friendliness. For the full specification, see the specifications.

See below for an example of Sugar assembly code that calculates the factorial of a number.

MAIN:
    push lr;

    r1 = 10;
    call FAC;
    goto END;

// r1 holds n, returns n! through r1.
FAC:
    push lr;
    r2 = 2;
    compare r1, r2;
    goton BASE_CASE;

    push r1;
    r1 -= 1;
    call FAC;
    pop r2;
    r1 *= r2;

    pop lr;
    return;

BASE_CASE:
    r1 = 1;
    pop lr;
    return;

END:
    pop lr;

User Guide

To use the emulator, first create a SugarExecutor instance.

import xyz.haroldgao.sugarisa.execute.SugarExecutor;

public class Example{
  
    public static void main(String[] args){
        
        String assembly = "r1 = r2 + 50;";
        Sugar executor = SugarExecutor.load(assembly);  
          
    }
  
}

Then, with the instance, use SugarExecutor#next() to execute a single instruction, or SugarExecutor#execute();, which executes all instructions until it reads a termination instruction. When any of these methods are called, an instance of ArchitecturalState is returned, which contains a maximum 4GB of emulated memory, along with a register file. ArchitecturalState#read(int) can be called to read from memory, while ArchitecturalState#read(Register) can be used to read a register. Similarly, ArchitecturalState#write(int, int), and ArchitecturalState#write(Register, int) can be used to write to memory, or the register file.

import xyz.haroldgao.sugarisa.execute.ArchitecturalState;
import xyz.haroldgao.sugarisa.execute.Register;
import xyz.haroldgao.sugarisa.execute.SugarExecutor;

public class Example {

  public static void main(String[] args) {

    String assembly = "r1 = r2 + 50;";
    Sugar executor = SugarExecutor.load(assembly);

    ArchitecturalState state = executor.execute();

    //Should print 50.
    System.out.println(state.read(Register.R1));

  }

}

Parser

This project includes a parser which is capable of converting Sugar assembly into a list of internal representation of instructions. This is achieved through the following pipeline.

For this section, the following example will be used to explain this process.

FUNCTION: //Comment
    r2 = r3;
    goto END;
END:
    ;

The Tokeniser converts the code into a list of tokens. The assembly string in our example will be converted to the following tokens.

tokens: [
    {label, "FUNCTION"},
    {colon, ":"},
    {comment, "Comment"}, 
    {keyword, "r2"}, 
    {equals, "="}, 
    {keyword, "r3"}, 
    {terminator, ";"},
    {keyword, "goto"}, 
    {label, "END"},
    {terminator, ";"}, 
    {label, "END"},
    {colon, ":"}, 
    {terminator, ";"}
]

The tokeniser takes care of whitespaces and invalid combinations of characters. 2. A Linker then removes all comment tokens, splits the list based on the terminator token, removes and links each label to the program address of the instruction it is pointing to. Hence, we have the following results.

instructions: [
    [
        {keyword, "r2"},
        {equals, "="}, 
        {keyword, "r3"},
        {terminator, ";"}
    ],
    [
        {keyword, "goto"}, 
        {label, "END"}, 
        {terminator, ";"}
    ],
    [
        {terminator, ";"}
    ]
]

labelMap:
    "FUNCTION" -> 0
    "END" -> 12

For each set of tokens for a specific instruction, The Parser then takes the map produced by the linker and the tokens as input, and parses the tokens based on a single instance of a decision-parse-tree. This data structure recurses through the list of tokens and its children, while potentially saving data to a ParseState. Functional interfaces also generalises the process of error handling, and returning instructions. See SugarParseTree for the singleton construction of the tree specifically for Sugar. Hence, our example above will be converted to the following list of instructions.

instructions: [
    {set r2, r3},
    {goto END},
    {set r0, r0}
]

(Note that the instruction represented by a single semicolon is specifically {set r0, r0}, which does nothing.)

Program Execution

Memory

Sugar's memory is 32-bit addressable, which gives us an address space of 4 gigabytes. Obviously if we allocate each possible memory address at once, the program becomes very inefficient. To solve this problem, a HashMap between integers and byte arrays are used, where each key represents the upper half of a memory address, and each byte array is a chunk of $2^{16}$ possible memory slots. If more chunks of memory need to be allocated, the program will automatically do so. This memory emulator is not idiot-proof, if desired, one can purposely hack the "smart" memory allocation by writing a non-zero byte to an entry in the byte array identified by each possible upper half of memory address. In that case, the emulator becomes very inefficient.

Register File

The register-file is trivially a mutable integer array with 16 entries.

Instruction Transformation

With the same example as above, but transformed to a set of instructions, this section describes how program execution is emulated. First, observe that the Instruction class contains an abstract method defining its binary representation. This obviously depends on the instruction, and with the list of instructions, the binaries are computed and stored in an emulated memory. The program memory of Sugar always starts at 0x00000000, with no definitive bound for simplicity. Then, the SugarExecutor uses the emulated memory and register file to execute instructions sequentially, reverse-engineering each binary back to the internal representation in Java.

Name		Name	Last commit message	Last commit date
Latest commit History 131 Commits
examples		examples
items		items
src		src
.gitignore		.gitignore
README.md		README.md
grammar.md		grammar.md
newgrammar.md		newgrammar.md
pom.xml		pom.xml
specifications.md		specifications.md
tokens.md		tokens.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Sugar ISA, Parser, and Emulator

Personal Agenda

ISA

User Guide

Parser

Program Execution

Memory

Register File

Instruction Transformation

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

LaFriska/sugar-isa

Folders and files

Latest commit

History

Repository files navigation

Sugar ISA, Parser, and Emulator

Personal Agenda

ISA

User Guide

Parser

Program Execution

Memory

Register File

Instruction Transformation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages