
CellSim Tutorial

CellSim: a Modular Simulator for Heterogeneous Chip Multiprocessors


Alejandro Rico (1), David Ródenas (1,2), Felipe Cabarcas (1,3), Xavier Martorell (1,2), Alex Ramírez (1,2), Eduard Ayguadé (1,2)

(1) Universitat Politècnica de Catalunya
(2) Barcelona Supercomputing Center
(3) Universidad de Antioquia

PACT Tutorial September 15, 2007

Motivation

Multicore architectures have been the design adopted by computer architects for the present and the near future. Current commercial general-purpose processors are composed of homogeneous cores. Heterogeneous chip multiprocessor architectures have more performance potential than homogeneous ones.

The Cell Processor is a heterogeneous chip multiprocessor by Sony, Toshiba and IBM, which is part of a gaming product. We have developed CellSim, a heterogeneous chip multiprocessor simulator that currently implements the Cell Processor.

Outline

The Cell Processor overview
UNISIM overview
CellSim overview
CellSim installation
Cell application compilation
Tutorial application
CellSim reconfiguration and restructure
Adding a new instruction

Questions to be answered in the tutorial

What is the Cell Processor?
  How does the Cell work?
  What does a Cell application look like?
  How does one compile and execute a Cell application?

What is CellSim?
  How does CellSim work?
  What are the CellSim features and capabilities?
  How does one reconfigure CellSim?
  How does one restructure CellSim?
  How does one add a new instruction to the Cell compiler and to CellSim?


The Cell Processor


Cell Processor
  1 PPE
  8 SPEs
  4-ring EIB
  1 MIC
  1 I/O

PowerPC Processor Element (PPE)

PPU:
  64-bit PowerPC-compatible ISA
  Dual issue
  Dual threaded
  In order

L2 Cache:
  512 KB

Synergistic Processor Element (SPE)

SPU:
  Newly architected SIMD ISA
  128 registers of 128 bits (accessed as 16/8, 8/16, 4/32, 2/64, 1/128)
  Dual issue
  In order
  No branch prediction (uses hints)

LS:
  256 KB of local memory for code and data
  Mapped in the global address space to be accessible from outside the SPE

MFC:
  DMA controller that can be programmed from the local SPU or from other devices outside the SPE

The SPU accesses the LS directly, but it has to program DMA transfers to access any other device on the chip.
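The register access modes listed above (16/8, 8/16, 4/32, 2/64, 1/128) can be pictured as different lane views over the same 128 bits. The following is a minimal host-side sketch in portable C, not SPU code and not taken from CellSim; the names `vreg128`, `vreg_splat_w` and `vreg_add_w` are hypothetical and chosen only for illustration.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one 128-bit register: the same 16 bytes
 * viewed as 16 x 8-bit, 8 x 16-bit, 4 x 32-bit or 2 x 64-bit lanes. */
typedef union {
    uint8_t  b[16];
    uint16_t h[8];
    uint32_t w[4];
    uint64_t d[2];
} vreg128;

static_assert(sizeof(vreg128) == 16, "a register holds 128 bits");

/* Replicate one 32-bit value into all four word lanes. */
vreg128 vreg_splat_w(uint32_t x)
{
    vreg128 r;
    for (size_t i = 0; i < 4; i++)
        r.w[i] = x;
    return r;
}

/* SIMD-style add: each of the four 32-bit lanes is added independently
 * (unsigned wrap-around on overflow). */
vreg128 vreg_add_w(vreg128 a, vreg128 b)
{
    vreg128 r;
    for (size_t i = 0; i < 4; i++)
        r.w[i] = a.w[i] + b.w[i];
    return r;
}
```

For example, adding `vreg_splat_w(3)` and `vreg_splat_w(4)` yields a register whose four word lanes all hold 7. Which bytes of `b[]` hold that value depends on the host byte order, so the sketch makes no claim about byte-level layout.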

Element Interconnect Bus (EIB)
  4-ring topology
  Up to 3 outstanding transfers per ring
  Up to a total of 8 outstanding transfers
  Coherent/non-coherent access
  128-byte DMA transfers
  32-bit MMIO access
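The two transfer limits above interact: four rings with three transfers each would allow twelve, but the total is capped at eight. A toy admission check makes this concrete. This is only a sketch under that reading of the limits; it does not describe how the real EIB arbiter or CellSim decides, and `eib_model`, `eib_try_start` and `eib_finish` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Limits quoted in the text. */
enum { EIB_RINGS = 4, EIB_MAX_PER_RING = 3, EIB_MAX_TOTAL = 8 };

/* Hypothetical bookkeeping: outstanding transfers per ring and overall. */
typedef struct {
    int per_ring[EIB_RINGS];
    int total;
} eib_model;

/* Try to start a transfer on `ring`; returns true if both the per-ring
 * and the global limit still have room, and records the transfer. */
bool eib_try_start(eib_model *eib, int ring)
{
    assert(eib != NULL);
    if (ring < 0 || ring >= EIB_RINGS)
        return false;
    if (eib->per_ring[ring] >= EIB_MAX_PER_RING)
        return false;
    if (eib->total >= EIB_MAX_TOTAL)
        return false;
    eib->per_ring[ring]++;
    eib->total++;
    return true;
}

/* Mark one transfer on `ring` as completed, if any is outstanding. */
void eib_finish(eib_model *eib, int ring)
{
    assert(eib != NULL);
    if (ring >= 0 && ring < EIB_RINGS && eib->per_ring[ring] > 0) {
        eib->per_ring[ring]--;
        eib->total--;
    }
}
```

Starting from a zero-initialized `eib_model`, a fourth transfer on the same ring is refused, and once eight transfers are outstanding a ninth is refused even on an idle ring until `eib_finish` releases one.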

Cell execution: the PPE runs the OS

1. The operating system runs on the PPE.


Cell execution: the PPE starts the programs and manages the SPEs

2. Applications start executing on the PPE.

PPE code

    int main()
    {
        /* Load the SPU program image. */
        spe_id = spe_open_image(spubin);
        /* Start it as a thread on an SPE. */
        spe_create_thread(gid, spe_id, &param, &env, -1, 0);
        /* Block until the SPE thread finishes. */
        spe_wait(spe_id);
    }


Cell execution: PPE create_thread starts programs in the SPEs

3. The application can execute tasks in the SPEs (a thread executing in the PPE waits until SPE execution completes).

PPE code

    int main()
    {
        spe_id = spe_open_image(spubin);
        spe_create_thread(gid, spe_id, &param, &env, -1, 0);
        /* The PPE thread waits here until the SPE program completes. */
        spe_wait(spe_id);
    }

SPE code

    int main()
    {
        do_operations();
    }

Cell application compilation

PPU .c + CellSim libspe -> ppu-gcc -> ppu binary
SPU .c -> spu-gcc -> spu binary

Two compilers are needed since the ISAs of the PPU and SPU are different. The applications contain two different binaries. It is also possible to embed the SPU binary into the PPU one.
