rv8
RISC-V simulator for x86-64
rv8 is a RISC-V simulation suite comprising a high performance x86-64 binary translator, a user mode
simulator, a full system emulator, an ELF binary analysis tool and ISA metadata:
About
The rv8 simulator suite contains libraries and command line tools for creating instruction opcode maps, C
headers and source containing instruction set metadata, instruction decoders, a JIT assembler, LaTeX
documentation, a metadata based RISC-V disassembler, a histogram tool for generating statistics on RISC-V
ELF executables, a RISC-V proxy syscall simulator, a RISC-V full system emulator that implements the RISC-V
1.9.1 privileged specification and an x86-64 binary translator.
The rv8 binary translation engine works by interpreting code while profiling it for hot paths. Hot paths are
translated on the fly to native code. The translation engine maintains a call stack to allow runtime inlining of
hot functions. A jump target cache is used to accelerate returns and indirect calls through function pointers.
The translator supports hybrid binary translation and interpretation to handle instructions that do not have
native translations. Currently ‘IM’ code is translated and ‘AFD’ is interpreted. The translator supports RVC
compressed code.
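The interpret-then-translate policy described above can be sketched as a per-address execution counter. This is a minimal model, not rv8's implementation: the threshold value, the `run` function and the idea of feeding it a list of guest block addresses are all assumptions made for illustration.

```python
from collections import Counter

HOT_THRESHOLD = 3  # assumed value; the source does not state rv8's threshold


def run(block_trace, threshold=HOT_THRESHOLD):
    """Replay a list of guest block addresses through a toy hybrid engine.

    Returns (translated, interpreted, native): the set of addresses that
    became hot, and how many blocks ran in each mode.
    """
    counts = Counter()      # profile: executions seen per guest address
    translated = set()      # addresses that already have native code
    interpreted = native = 0
    for pc in block_trace:
        if pc in translated:
            native += 1     # hot path: run the translated code
            continue
        interpreted += 1    # cold path: interpret and profile
        counts[pc] += 1
        if counts[pc] >= threshold:
            translated.add(pc)  # translate on the fly once hot
    return translated, interpreted, native
```

Calling `run([0x100] * 5 + [0x200])` interprets the block at 0x100 until it crosses the threshold and then counts its remaining executions as native, while the block at 0x200 stays interpreted.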
https://rv8.io/ 1/28
14/04/2018 rv8 - RISC-V simulator for x86-64
The rv8 suite includes a full system emulator that implements the RISC-V privileged ISA with support for
interrupts, MMIO (memory mapped input output) devices, a soft MMU (memory management unit) with
separate instruction and data TLBs (translation lookaside buffers). The full system emulator has a simple
integrated debugger that allows setting breakpoints, single stepping and disassembling instructions as they
are executed.
The rv8 user mode simulator is a single address space implementation of the RISC-V ISA that implements a
subset of the RISC-V Linux syscall ABI (application binary interface) and delegates system calls to the
underlying native host operating system. The user mode simulator can run RISC-V Linux binaries on non-
Linux operating systems via system call emulation. The current user mode simulator implements a small
number of Linux system calls to allow running RISC-V Linux ELF static binaries.
The rv-bin tool contains a meta-data driven disassembler and a histogram tool for analysing static register
usage and static instruction usage.
The rv-meta tool is able to generate opcode maps, instruction decoders, source, headers and instruction set
listing LaTeX from ISA metadata. The following is an example of PDF output:
RISC-V instruction set metadata is available here. The linked page shows an example RISC-V Instruction Set
Reference generated by the rv-meta tool. rv8 has a simple extensible generator framework that allows
reflection on the instruction set metadata to generate a number of different output formats.
Installation
rv8 supports the following target architecture and host operating system combinations:
Target
RV32IMAFDC
RV64IMAFDC
Privileged ISA 1.9.1
Host
Linux (Debian 9.0 x86-64, Ubuntu 16.04 x86-64, Fedora 25 x86-64) (stable)
macOS (Sierra 10.11 x86-64) (stable)
FreeBSD (11 x86-64) (alpha)
Please read the RISC-V toolchain installation instructions in the riscv-gnu-toolchain repository. To
experiment with the RISC-V toolchain online try the RISC-V Compiler Explorer.
Building riscv-gnu-toolchain
rv8 has minimal external dependencies besides a C++14 compiler, the C/C++ standard libraries and the
asmjit submodule.
Building rv8
Running rv8
The riscv64-unknown-elf newlib toolchain is required for building the rv8 test cases and this build step depends
on the RISCV environment variable.
$ cd rv8
$ export RISCV=/opt/riscv/toolchain
$ make test-build
$ make test-sim
$ make test-sys
$ rv-jit build/riscv64-unknown-elf/bin/test-dhrystone
Optimisations
The rv8 binary translator performs JIT (Just In Time) translation of RISC-V code to x86-64 code. This is a
challenging problem for many reasons; the principal challenge is that RISC-V has 31 integer
registers while x86-64 has only 16 integer registers.
Register allocation
rv8 solves the register set size problem by using static register allocation and spilling registers to memory
(L1 cache); a future version may use a dynamic register allocator. A significant amount of performance is
lost because the guest code's register allocation takes advantage of the larger number of available registers
and less frequent stack spills. It is not possible for the translator to rearrange memory and registers for
optimal stack spills as memory accesses must be translated precisely. The additional registers are translated
as x86-64 memory operands (which produce load and store micro-ops) or, in some circumstances, explicit mov
instructions.
The remaining unallocated registers are stored in a memory spill area accessed using the rbp register. e.g.
qword [rbp+0xF8] would be used to access t4.
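As a rough illustration of how a spilled register could be turned into an rbp-relative operand, the sketch below assumes one 8-byte slot per integer register, laid out in register-number order. The base offset of 0x10 is not stated in the source; it is inferred from the single t4 example above (t4 is x29), so treat both constants and the `spill_operand` helper as assumptions.

```python
# Standard RISC-V integer register ABI names, indexed by register number.
ABI_NAMES = (
    "zero ra sp gp tp t0 t1 t2 s0 s1 a0 a1 a2 a3 a4 a5 a6 a7 "
    "s2 s3 s4 s5 s6 s7 s8 s9 s10 s11 t3 t4 t5 t6"
).split()

SLOT_SIZE = 8       # one 64-bit slot per register (assumption)
SPILL_BASE = 0x10   # inferred so that t4 lands at 0xF8 (assumption)


def spill_operand(name):
    """Return the x86-64 memory operand for a spilled RISC-V register."""
    index = ABI_NAMES.index(name)              # e.g. t4 -> 29
    offset = SPILL_BASE + SLOT_SIZE * index    # byte offset from rbp
    return f"qword [rbp+0x{offset:X}]"
```

Under these assumptions `spill_operand("t4")` reproduces the operand quoted in the text.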
Translator temporaries
The rv8 translator needs to use several host registers to point to translator internal structures and for use
as temporary registers for the emulation of many instructions. For example, a store instruction requires the
use of two temporary registers if both register operands are in the spill area. The translator uses the
following x86-64 host registers as temporaries leaving 12 registers available for mapping to RISC-V
registers:
rbp - pointer to the register spill area and jump target cache
rsp - pointer to the host stack to allow procedure calls
rax - translator temporary register
rcx - translator temporary register
The rv8 translator makes use of CISC memory operands to access registers residing in the memory backed
register spill area, which resides in L1 cache. The complex memory operands end up being cracked back into
micro-ops in the CISC pipeline; however, the use of complex memory operands helps increase instruction
density, which increases performance due to better use of I$ (instruction cache).
There are many combinations of instruction expansions depending on whether a register is mapped to a live
register, is memory backed and whether there are two or three operands. A three operand RISC-V
instruction is translated into a move and a destructive two operand x86-64 instruction. Temporary
registers are used if both operands are memory backed. The principle is to maintain the densest possible
mapping to the x86-64 ISA.
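The operand combinations above can be sketched for a single instruction, a three operand RISC-V ADD. This is an illustrative lowering, not rv8's emitter: only the ra to rdx mapping and the t4 spill slot come from the text, the other table entries and the `lower_add` helper are hypothetical, and the zero register is ignored.

```python
# ra -> rdx is from the text; the other mappings are assumed for illustration.
HOST_REG = {"ra": "rdx", "a0": "r8", "a1": "r9"}
# t4 at 0xF8 is from the text; the other slots are assumed for illustration.
SPILL_SLOT = {"t4": 0xF8, "t5": 0x100, "t6": 0x108}
TEMP = "rax"  # translator temporary register


def in_memory(reg):
    return reg not in HOST_REG


def operand(reg):
    """Host register name, or an rbp-relative memory operand if spilled."""
    if in_memory(reg):
        return f"qword [rbp+0x{SPILL_SLOT[reg]:X}]"
    return HOST_REG[reg]


def lower_add(rd, rs1, rs2):
    """Lower 'add rd, rs1, rs2' to destructive two operand x86-64 text."""
    if rd == rs2 and rd != rs1:
        rs1, rs2 = rs2, rs1                      # add is commutative
    if rd == rs1:                                # already destructive
        if in_memory(rd) and in_memory(rs2):     # x86 has no mem, mem form
            return [f"mov {TEMP}, {operand(rs2)}",
                    f"add {operand(rd)}, {TEMP}"]
        return [f"add {operand(rd)}, {operand(rs2)}"]
    if in_memory(rd):                            # build the result in the temporary
        return [f"mov {TEMP}, {operand(rs1)}",
                f"add {TEMP}, {operand(rs2)}",
                f"mov {operand(rd)}, {TEMP}"]
    return [f"mov {operand(rd)}, {operand(rs1)}",   # move, then destructive add
            f"add {operand(rd)}, {operand(rs2)}"]
```

For example, `lower_add("a0", "a1", "ra")` yields a move followed by a register add, while `lower_add("t4", "t4", "t5")` needs the temporary because both operands are memory backed.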
Indirect calls through function pointers cannot be statically translated as the target address of their
translation is not known at the time of translation. rv8 employs a trace cache which is a hashtable of guest
program addresses to native code addresses. A full trace cache lookup is relatively slow because it requires
saving caller-save registers and calling into C++ code. To accelerate indirect calls through function pointers,
a small assembly stub looks up the target address in a sparse 1024 entry direct mapped L1 translation
cache, and falls back to a slow translation cache miss path that saves registers and calls into the translator
code to populate the L1 translation cache so that the next indirect call can be accelerated.
The direct mapped L1 translation cache is indexed by bits[10:1] of the guest address. Bit zero can be ignored
because RISC-V instructions must start on a 2-byte boundary.
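The two level lookup can be modelled in a few lines. The index computation follows the text (1024 entries, bits[10:1] of the guest address); the `TranslationCache` class, its field names and the use of a Python dict for the full trace cache are assumptions for illustration, whereas the real fast path is an assembly stub.

```python
L1_ENTRIES = 1024  # direct mapped, as described in the text


def l1_index(guest_pc):
    """Bits [10:1] of the guest address; bit 0 is always zero."""
    return (guest_pc >> 1) & (L1_ENTRIES - 1)


class TranslationCache:
    def __init__(self):
        self.l1 = [None] * L1_ENTRIES  # entries are (guest_pc, native_addr)
        self.full = {}                 # slow path: guest pc -> native address
        self.slow_lookups = 0

    def insert(self, guest_pc, native_addr):
        self.full[guest_pc] = native_addr

    def lookup(self, guest_pc):
        entry = self.l1[l1_index(guest_pc)]
        if entry is not None and entry[0] == guest_pc:
            return entry[1]            # fast path hit
        self.slow_lookups += 1         # miss: fall back to the full lookup
        native = self.full.get(guest_pc)
        if native is not None:
            # Populate L1 so the next indirect call to this target is fast.
            self.l1[l1_index(guest_pc)] = (guest_pc, native)
        return native
```

Because the cache is direct mapped, two guest addresses that are 2048 bytes apart share an index and evict each other, which is why each entry stores the full guest address as a tag.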
Inline caching
Returns also make use of the L1 translation cache; however, a procedure call made inside a hot trace can
be inlined. The translator maintains a call stack to keep track of return addresses. Upon reaching an inlined
procedure RET (jalr zero, ra) instruction, the link register (ra in RISC-V, rdx in the x86 translation) is compared
against the caller's known return address and if it matches, control flow continues along the return path. In
the case that the function is not inlined, the regular L1 translation cache is used to look up the address of the
translated code.
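A small model of the return check, under stated assumptions: the tuple based pseudo-ops and the function names `translate_return` and `execute_return` are hypothetical, and the L1 lookup is passed in as a plain callable rather than emitted as native code.

```python
def translate_return(call_stack):
    """Translation time: choose how to translate 'jalr zero, ra'.

    call_stack holds the return addresses of calls inlined into the trace.
    """
    if call_stack:
        expected = call_stack.pop()           # the caller's known return address
        return ("guard_ra_equals", expected)  # inlined return with a guard
    return ("l1_cache_jump",)                 # not inlined: use the L1 cache


def execute_return(op, ra, l1_lookup):
    """Run time: follow the inlined path only if ra matches the guard."""
    if op[0] == "guard_ra_equals" and ra == op[1]:
        return "continue_inline"
    return l1_lookup(ra)                      # regular indirect jump path
```

The guard is needed because the callee may have changed ra, so the inlined return path is only valid when the run time value matches the address recorded during tracing.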
The translator performs lazy translation of the source program during tracing and when it reaches
branches, it can only link both sides of the branch if a translation already exists for the not taken side
of the branch. To accelerate branch tail exits, the translator emits a relative branch to a trampoline that
returns to the tracer main loop, and the tracer adds the branch to a table of branch fixup addresses indexed
by target guest address. If the branch target is hot, once it has been translated, all relative branches that
point to tail exit trampolines will be relinked to branch directly to the translated native code.
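The fixup table can be sketched as a mapping from guest target address to the branch sites waiting on it. The `Linker` class, the string used to stand in for the trampoline and the integer "site" identifiers are all hypothetical; real code would patch relative displacements in the emitted machine code.

```python
from collections import defaultdict

TRAMPOLINE = "tail_exit_trampoline"  # stand-in for the trampoline address


class Linker:
    def __init__(self):
        self.translations = {}            # guest pc -> native address
        self.fixups = defaultdict(list)   # guest target pc -> pending branch sites
        self.branch_target = {}           # branch site -> current destination

    def emit_branch(self, site, guest_target):
        native = self.translations.get(guest_target)
        if native is not None:
            self.branch_target[site] = native      # link directly
        else:
            self.branch_target[site] = TRAMPOLINE  # exit to the tracer main loop
            self.fixups[guest_target].append(site)

    def add_translation(self, guest_pc, native):
        self.translations[guest_pc] = native
        # Relink every branch that was waiting on this guest address.
        for site in self.fixups.pop(guest_pc, []):
            self.branch_target[site] = native
```

Indexing the table by target address means a newly translated block can relink all of its pending predecessors in one pass, without scanning the emitted code.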
Macro-op fusion
The rv8 translator implements an optimisation known as macro-op fusion whereby specific patterns of
adjacent instructions are translated into a smaller sequence of host instructions. The macro-op fusion
pattern matcher has potential to increase performance further with the addition of common patterns. The
following is a list of macro-op fusion patterns that are currently implemented in rv8:
ADDIW rd, rs1, imm12; SLLI rd, rs1, 32; SRLI rd, rs1, 32;
Fused into 32-bit zero extending ADD instruction.
SRLI r1, rs, imm12; SLLI r2, rs, 64 - imm12; OR r1, r1, r2;
Fused into 64-bit ROR with one residual SHL or SHR temporary
SRLIW r1, rs, imm12; SLLIW r2, rs, 32 - imm12; OR r1, r1, r2;
Fused into 32-bit ROR with one residual SHL or SHR temporary
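The 64-bit rotate pattern in the list above can be checked with a small model that compares the three instruction sequence against a rotate plus one residual shift. This is a sketch of the equivalence only; the function names are hypothetical and the reference rotate is written bit by bit so that it does not simply restate the shift and OR formula.

```python
MASK64 = (1 << 64) - 1


def ror64(x, n):
    """Reference rotate right, one bit at a time."""
    x &= MASK64
    for _ in range(n % 64):
        x = (x >> 1) | ((x & 1) << 63)
    return x


def unfused(rs, n):
    """SRLI r1, rs, n; SLLI r2, rs, 64 - n; OR r1, r1, r2."""
    r1 = (rs & MASK64) >> n
    r2 = (rs << (64 - n)) & MASK64
    r1 = r1 | r2
    return r1, r2


def fused(rs, n):
    """One host ROR plus the residual shift that keeps r2 correct."""
    r1 = ror64(rs, n)
    r2 = (rs << (64 - n)) & MASK64   # residual SHL temporary
    return r1, r2
```

Both functions return the pair (r1, r2), so comparing them checks the rotated result and the temporary register that the fused sequence must still produce.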
A technique known as deoptimisation can be employed to allow elision of temporary registers in macro-op
fusion patterns assuming the translator sees the register killed within its translation window.
Deoptimisation requires that the optimised translation has an accompanying deoptimisation sequence to
fill in elided register values, and this is played back in the case of a fault (device or debug interrupt) so that
the visible machine state precisely matches that which the ISA dictates. rv8 does not presently implement
deoptimisation, however it may be necessary to allow more sophisticated optimisations.
In addition to the register allocation problem, rv8 has to make sure that 32-bit operations on registers are
sign extended instead of zero extended. The normal behaviour of 32-bit operations on x86-64 is to zero
the upper 32 bits of the destination register (bits 32 to 63), whereas RISC-V sign extends bit 31 into bits
32 to 63. One potential optimisation is lazy sign extension. It may be possible in a future version of the JIT
translation engine to elide redundant sign extension operations; however, it is important that the register
state precisely matches the semantics of the ISA before executing an instruction that may cause a fault,
e.g. loads and stores.
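The difference between the two extension rules is easiest to see on a 32-bit addition that overflows into bit 31. The helper names below are hypothetical; the sketch models register contents as unsigned 64-bit integers and is not taken from rv8.

```python
def zext32(x):
    """x86-64 rule: a 32-bit result clears bits 32 to 63."""
    return x & 0xFFFFFFFF


def sext32(x):
    """RISC-V rule: bit 31 of a 32-bit result is copied into bits 32 to 63."""
    x &= 0xFFFFFFFF
    return (x | 0xFFFFFFFF00000000) if (x & 0x80000000) else x


def riscv_addw(a, b):
    return sext32(a + b)   # what the guest register must hold after ADDW


def x86_add32(a, b):
    return zext32(a + b)   # what a 32-bit host ADD leaves in the 64-bit register
```

Applying `sext32` to the host result gives the guest view again, which is the job an explicit sign extending move (for example MOVSXD on x86-64) performs after a 32-bit host operation.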
The benchmarks below contain digest algorithms and ciphers which can take advantage of bit manipulation
instructions such as rotate and bswap. Present day compilers detect rotate and byte swap bitwise logical
operations by matching intermediate representation patterns that can be lowered directly to bit
manipulation instructions such as ROR, ROL, BSWAP on x86-64. This approach has the benefit of accelerating
code that does not use inline assembly or compiler builtin functions. RISC-V currently lacks bit manipulation
instructions; however, there are proposals to add them in the B extension. The following is a typical byte
swap pattern.
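A representative 32-bit byte swap written with shifts and masks is sketched here. It shows the general shape of the idiom that compilers recognise and is not necessarily the exact listing used by the benchmarks; the function name `bswap32` is chosen for illustration.

```python
def bswap32(x):
    """Reverse the byte order of a 32-bit value using shifts and masks."""
    return (((x & 0x000000FF) << 24) |
            ((x & 0x0000FF00) << 8) |
            ((x & 0x00FF0000) >> 8) |
            ((x & 0xFF000000) >> 24))
```

On a target with a byte swap instruction the whole expression can be lowered to a single BSWAP, whereas without one each shift, mask and OR becomes a separate instruction.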
rv8 implements rotate macro-op fusion which can translate two shift instructions and one OR instruction
with the correct offsets into one shift and one rotate. The rotate macro-op fusion needs to create the
residual temporary register side effects so that the register file contents are precisely matched, as it cannot
easily prove the residual temporary register is not later used. Deoptimisation would be required to elide the
temporary register.
Measurement
A future goal is to quantify the factors that contribute to the performance differences between native x86-
64 code and translated RISC-V code, so future benchmarks should measure:
Benchmarks
The following section contains benchmark runtime and instructions per second results comparing the
QEMU and rv8 JIT engines against native x86. This section also contains runtime neutral results comparing
total retired RISC-V instructions to x86 micro-ops. The benchmark programs are compiled for aarch64,
arm32, riscv64, riscv32, x86-64 and x86-32. See the Benchmarks Results page for the complete result set
including optimisation level comparisons, macro-op fusion performance, executable file sizes, dynamic
register and instruction usage charts.
Benchmark source
rv8 - https://github.com/rv8-io/rv8/
rv8-bench - https://github.com/rv8-io/rv8-bench/
qemu-riscv - https://github.com/riscv/riscv-qemu/
musl-riscv-toolchain - https://github.com/rv8-io/musl-riscv-toolchain/
Benchmark metrics
Runtimes
Instructions Per Second
Benchmark details
Compiler details
The following compiler architectures, versions, compile options and runtime libraries are used to run the
benchmarks:
Measurement details
Runtimes
[Figure: Runtime (secs), 64-bit -O3, for aes, bigint, dhrystone, miniz, norx, primes, qsort and sha512. Series: native-x86-64-O3-runtime, rv8-riscv64-O3-runtime, qemu-riscv64-O3-runtime, qemu-aarch64-O3-runtime.]
[Figure: Runtime (secs), 64-bit -O2, for aes, bigint, dhrystone, miniz, norx, primes, qsort and sha512. Series: native-x86-64-O2-runtime, rv8-riscv64-O2-runtime, qemu-riscv64-O2-runtime, qemu-aarch64-O2-runtime.]
[Figure: Runtime (secs), 64-bit -Os, for aes, bigint, dhrystone, miniz, norx, primes, qsort and sha512. Series: native-x86-64-Os-runtime, rv8-riscv64-Os-runtime, qemu-riscv64-Os-runtime, qemu-aarch64-Os-runtime.]
[Figure: Runtime (secs), 32-bit -O3, for aes, bigint, dhrystone, miniz, norx, primes, qsort and sha512. Series: native-x86-32-O3-runtime, rv8-riscv32-O3-runtime, qemu-riscv32-O3-runtime, qemu-arm32-O3-runtime.]
[Figure: Runtime (secs), 32-bit -O2, for aes, bigint, dhrystone, miniz, norx, primes, qsort and sha512. Series: native-x86-32-O2-runtime, rv8-riscv32-O2-runtime, qemu-riscv32-O2-runtime, qemu-arm32-O2-runtime.]
[Figure: Runtime (secs), 32-bit -Os, for aes, bigint, dhrystone, miniz, norx, primes, qsort and sha512. Series: native-x86-32-Os-runtime, rv8-riscv32-Os-runtime, qemu-riscv32-Os-runtime, qemu-arm32-Os-runtime.]
Instructions per second in millions comparing qemu, rv8 and native x86:
[Figure: Instructions per second (MIPS) qemu, rv8 and native 64-bit -O3. Y-axis: MIPS (Millions of Instructions Per Second). Benchmarks: aes, bigint, dhrystone, miniz, norx, primes, qsort, sha512. Series: native-x86-64-O3-mips, rv8-riscv64-O3-mips, qemu-riscv64-O3-mips.]
[Figure: Instructions per second (MIPS) qemu, rv8 and native 64-bit -Os. Y-axis: MIPS (Millions of Instructions Per Second). Benchmarks: aes, bigint, dhrystone, miniz, norx, primes, qsort, sha512. Series: native-x86-64-Os-mips, rv8-riscv64-Os-mips, qemu-riscv64-Os-mips.]
[Figure: Instructions per second (MIPS) qemu, rv8 and native 32-bit -O3. Y-axis: MIPS (Millions of Instructions Per Second). Benchmarks: aes, bigint, dhrystone, miniz, norx, primes, qsort, sha512. Series: native-x86-32-O3-mips, rv8-riscv32-O3-mips, qemu-riscv32-O3-mips.]
[Figure: Instructions per second (MIPS) qemu, rv8 and native 32-bit -Os. Y-axis: MIPS (Millions of Instructions Per Second). Benchmarks: aes, bigint, dhrystone, miniz, norx, primes, qsort, sha512. Series: native-x86-32-Os-mips, rv8-riscv32-Os-mips, qemu-riscv32-Os-mips.]
Logging
rv-sim and rv-sys support the ability to log instructions (--log-instructions), register values (--log-operands) and rv-sys
can log page table walks (--log-pagewalks).
$ rv-sim -l build/riscv64-unknown-elf/bin/hello-world-pcrel
0000000000000000000 core-0 :0000000000010078 (4501 ) mv a0, zero
0000000000000000001 core-0 :000000000001007a (00000597) auipc a1, pc + 0
0000000000000000002 core-0 :000000000001007e (02658593) addi a1, a1, 38
0000000000000000003 core-0 :0000000000010082 (4631 ) addi a2, zero, 12
0000000000000000004 core-0 :0000000000010084 (4681 ) mv a3, zero
0000000000000000005 core-0 :0000000000010086 (04000893) addi a7, zero, 64
Hello World
0000000000000000006 core-0 :000000000001008a (00000073) ecall
0000000000000000007 core-0 :000000000001008e (4501 ) mv a0, zero
0000000000000000008 core-0 :0000000000010090 (4581 ) mv a1, zero
0000000000000000009 core-0 :0000000000010092 (4601 ) mv a2, zero
0000000000000000010 core-0 :0000000000010094 (4681 ) mv a3, zero
0000000000000000011 core-0 :0000000000010096 (05d00893) addi a7, zero, 93
Tracing
The rv-jit program supports the ability to log RISC-V instructions along with the dynamically translated x86-
64 assembly and machine code (--log-jit-trace). This mode is useful for JIT translation debugging and
optimisation analysis.
Histograms
The rv-sim and rv-sys programs support the ability to record and print histograms. Program counter
frequency (--pc-usage-histogram), dynamic instruction frequency (--instruction-usage-histogram) and dynamic register
usage (--register-usage-histogram) are supported.
The rv-bin program via the histogram subcommand has the ability to print static instruction frequency and
static register usage.
Sample output from rv-sim with the --register-usage-histogram and --instruction-usage-histogram options
$ rv-sim --register-usage-histogram \
--instruction-usage-histogram \
build/riscv64-unknown-elf/bin/test-aes
Linux
This section describes how to build and boot a Linux image in the full system emulator.
Please read the RISC-V toolchain installation instructions in the riscv-gnu-toolchain repository. The riscv64-
unknown-elf newlib toolchain is required for building the rv8 test cases and the riscv64-unknown-linux-gnu glibc
toolchain is required for building busybox which is used to create the Linux image that runs in the full system
emulator.
$ cd riscv-gnu-toolchain
$ make linux
$ cd rv8
$ export RISCV=/opt/riscv/toolchain
$ export PATH=${PATH}:${RISCV}/bin
$ make linux
To start Linux, we execute bbl (the Berkeley Boot Loader) which performs early machine set up and then
passes control to an embedded Linux kernel. After kernel initialisation, busybox is then executed from the
initramfs as pid 1 (init). The Linux image and the initramfs are combined together and linked into bbl as the
boot payload.
$ rv-sys build/riscv64-unknown-elf/bin/bbl
vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
vvvvvvvvvvvvvvvvvvvvvvvvvvvv
rrrrrrrrrrrrr vvvvvvvvvvvvvvvvvvvvvvvvvv
rrrrrrrrrrrrrrrr vvvvvvvvvvvvvvvvvvvvvvvv
rrrrrrrrrrrrrrrrrr vvvvvvvvvvvvvvvvvvvvvvvv
rrrrrrrrrrrrrrrrrr vvvvvvvvvvvvvvvvvvvvvvvv
rrrrrrrrrrrrrrrrrr vvvvvvvvvvvvvvvvvvvvvvvv
rrrrrrrrrrrrrrrr vvvvvvvvvvvvvvvvvvvvvv
rrrrrrrrrrrrr vvvvvvvvvvvvvvvvvvvvvv
rr vvvvvvvvvvvvvvvvvvvvvv
rr vvvvvvvvvvvvvvvvvvvvvvvv rr
rrrr vvvvvvvvvvvvvvvvvvvvvvvvvv rrrr
rrrrrr vvvvvvvvvvvvvvvvvvvvvv rrrrrr
rrrrrrrr vvvvvvvvvvvvvvvvvv rrrrrrrr
rrrrrrrrrr vvvvvvvvvvvvvv rrrrrrrrrr
rrrrrrrrrrrr vvvvvvvvvv rrrrrrrrrrrr
rrrrrrrrrrrrrr vvvvvv rrrrrrrrrrrrrr
rrrrrrrrrrrrrrrr vv rrrrrrrrrrrrrrrr
rrrrrrrrrrrrrrrrrr rrrrrrrrrrrrrrrrrr
rrrrrrrrrrrrrrrrrrrr rrrrrrrrrrrrrrrrrrrr
rrrrrrrrrrrrrrrrrrrrrr rrrrrrrrrrrrrrrrrrrrrr
/#
References
RISC-V Foundation