Vous êtes sur la page 1sur 9

Building a compiler for a custom MIPS-like

instruction set

Vittorio Giovara
Fabio Margaglia
Francesco Meo
Lorenzo Simone

document revision 1.2


Contents

1 Introduction 2

2 Instruction List 3
2.1 Instruction Rules . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.1.1 Arithmetical Operations . . . . . . . . . . . . . . . . . 4
2.1.2 Logical Operations . . . . . . . . . . . . . . . . . . . . 4
2.1.3 Memory Operations . . . . . . . . . . . . . . . . . . . 4
2.2 Jump Operations . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.2.1 Branch Operations . . . . . . . . . . . . . . . . . . . . 4
2.2.2 Shift Operations . . . . . . . . . . . . . . . . . . . . . 5
2.2.3 Special Operations . . . . . . . . . . . . . . . . . . . . 5

3 Compiler structure 6
3.1 The JFlex scanner . . . . . . . . . . . . . . . . . . . . . . . . 6
3.2 The CUP parser . . . . . . . . . . . . . . . . . . . . . . . . . 7
3.2.1 Instruction Encoding and Emulation . . . . . . . . . . 8
3.2.2 Error Handling . . . . . . . . . . . . . . . . . . . . . . 8

1
Chapter 1

Introduction

The aim of this report is to provide documentation about the use of the
instruction set of a custom MIPS-like processor and about the internal code
of the compiler.
This document will list the implemented instructions and their assembly-
like structure. Afterwards it will present the use of the compiler itself as
well as the explanation of the involved code about the scanner and parsing
technique.
The compiler comes with an emulator, that preforms simulated execution
of the compiled code.

2
Chapter 2

Instruction List

• NOP • ST8

• ADDU • JMP
• ADDS • JAL
• SUBU • JZ
• SUBS
• JNZ
• MULU
• JLZ
• MULS
• JGZ
• DIVU
• JLEZ
• DIVS
• JGEZ
• AND
• LLS
• OR
• LRS
• XOR
• ALS
• LD32

• LD16 • ARS

• LD8 • LR

• ST32 • RR

• ST16 • MOV

3
2.1 Instruction Rules
2.1.1 Arithmetical Operations
parameter 1 : Destination Register (Rd 1 or Rt 2 )

parameter 2 : Source Register (Rs )

parameter 3 : Source Register (Rt ) or Immediate

2.1.2 Logical Operations


parameter 1 : Destination Register (Rd or Rt )

parameter 2 : Source Register (Rs )

parameter 3 : Source Register (Rt ) or Immediate

2.1.3 Memory Operations


LOAD
parameter 1 : Destination Register (Rt )

parameter 2 : Source Register (Rs )

parameter 3 : Immediate

STORE
parameter 1 : Source Register (Rt )

parameter 2 : Destination Register (Rs )

parameter 3 : Immediate

2.2 Jump Operations


parameter : Immediate or Register

2.2.1 Branch Operations


parameter 1 : Register (Rs )

parameter 2 : Immediate or Register (Rt )


1
in register to register operations
2
in register to immediate operations

4
2.2.2 Shift Operations
parameter 1 : Destination Register (Rd )

parameter 2 : Source Register (Rs )

parameter 3 : Register (Rt ) 3

2.2.3 Special Operations


MOV
parameter 1 : Destination Register (Rt or Rd )

parameter 2 : Immediate or Register (Rt )

NOP
No arguments needed.

3
Defines the shift or rotate amount

5
Chapter 3

Compiler structure

The compiler needs 3 arguments to be passed by command line:

Input file: The source file for the assembly-like code;

Verbosity level: Selects the quantity of ouput message printed;

Emulation level: Turns on or off the emulation.

The verbosity level is a number between 0 (meaning no output at all)


and 4 (meaning that every single step of the scanner and of the parser
are notified); error messages are always printed. When none is specified, a
default is set to 1.
The emulation level is a flag that manages the emulation execution, 0
deactivates this feature, 1 activates it and 2 gives information about the
program counter. When none is specified, a default is set to 0.
The output file on which the binary encoding is saved is always named
code.bin.

3.1 The JFlex scanner


The scanner initially defines two additional contexes for handling the com-
ments: one for single line comments (active from // to a new line), and the
other for multi line comments (enclosed in /* and */).
Then it describes the syntactic rules, using the following macros:

nl The new line symbol for different operating systems;

ignored List of charaters or symbols that can be safely ignored;

register The structure for registers (capital R followed by a number be-


tween 0 and 31);

integer Structure for integers (optional sign followed by a natural number);

6
separator The character that separates the instruction arguments;
eol The character that concludes the instruction.

The actual operations are not defined by macros but are aggregated:
for example mathematicatical operations that can be signed or unsigned
are described by the same rule (eg. the addition is described as ADD(U|S)
not as ADDU and ADDS but inside that rule the two possible symbols are
differenciated.
Every rule passes a symbol to CUP, with some attributes: the number
for the register and the kind of operation for the base instructions.

3.2 The CUP parser


The parser defines a terminal symbol for each possible operation, and the set
of arguments (INT, COMA, REG, EOL); then it describes the productions
in the following way:

• op −→ any possible operation


translates from terminal symbol to non-terminal saving the string at-
tribute;
• register −→ REG
passes the attribute to top level ;
• arguments
−→ register COMA register COMA INT EOL
for register-to-immediate instructions
−→ register COMA register COMA register EOL
for register-to-register instructions;
−→ register COMA INT EOL or register COMA register EOL
for conditional branch instructions;
−→ register EOL or INT EOL
for unconditional branch instructions;
−→ EOL
for the no-operation instruction.
• line −→ op arguments
joins the operation with its arguments;
• instruction −→ line
renames the non terminal symbol and discards the attributes;
• list −→ instruction or list instruction
allows to parse more than one instruction;

7
• program −→ list
renames the non terminal symbol and produces the starting symbol.

3.2.1 Instruction Encoding and Emulation


When line gets produced (operation and arguments are available) the in-
struction is both encoded and emulated. According to the type of arguments
in the production arguments a status variable (flag) is set up and it changes
the output encoding and emulation result.
In order to perform the encoding, a file named rules encoding is loaded
in the program and generates an hash map in which each instruction is as-
sociated with its binary code; the string used to determine the type of oper-
ation is used as a key to the hash map. When an instruction is encoded the
integer values are converted in binary string with the function printBin()
and placed in the output string.
As for the emulation, the registers and the cache memory are emulated
using two integer vectors of size 32 and 2561 respectively and both are initial-
ized with initRegandMem(). Then using the same structure of the encoding
section it proceeds to perfom a the detected operation by getting the integer
values of the register (used as vector index) and immediate numbers and to
print on screen the output result.
The emulation is capable of handlint unsigned integer numbers (whose
native type is absent in Java) through the use of unsignedOperation()
function.

3.2.2 Error Handling


The main focus of error handling is to force the parser to continue reading
from the file instead of aborting execution; so after an error the parser must
not stop.
In order to achieve this behaviour, special terminal symbol error is used
and inserted in productions that can cause an unexpected behaviour. As a
matter of fact the error symbol is present in the arguments production for
correctly handling wrong argument formats like inserting more than three
arguments.
However error correction is also obtained using once again the flag
structure: since a different argument format gives a different value to this
variable, it is possible to recognise a wrong argument format just by checking
its value: for example in mathematical operations that can handle only two
registers and an immediate (flag is 0) or three registers (flag is 1), only those
two values are accepted while all the other configurations are rejected.

1
this is due to the actual processor that has such dimesion because of synthesis -
constraints.