
VHDL-BASED SIMULATION ENVIRONMENT FOR PROTEO NOC

David Sigüenza-Tortosa, IDCS-Tampere University of Technology, Finland, siguenzd@cc.tut.fi


Jari Nurmi, IDCS-Tampere University of Technology, Finland, jari.nurmi@tut.fi

Abstract
The purpose of this paper is to present the work that has
been carried out for the creation of a simulation
environment of our Network-on-Chip (NoC) architecture,
called "Proteo".
In an Intellectual Property (IP) based design methodology, the interconnection structures may also be treated as IPs. The Proteo project aims at creating a library of pre-designed communication blocks that can be selected from a component library and configured by automated tools. The network implements packet switching in a hierarchical topology. We have created a high level model of our network in VHDL, allowing mixed-abstraction level simulation of our synthesizable code for validation.

Introduction
As feature dimensions scale down to the deep submicron regime (below 0.25 µm), the integration density is not limited by the individual feature sizes, e.g., of circuit metallization layers, but by electrical phenomena such as capacitive and inductive crosstalk between the interconnect lines [1,2]. In this environment, communication within logic blocks will still be synchronous, but between them it will become asynchronous in order to solve the problem of clock skew and delay. This is the Globally-Asynchronous Locally-Synchronous (GALS) paradigm [3].
New flexible and configurable communication channel architectures need to be identified. These communication channels will not form dedicated buses as currently implemented on-chip and on PCBs, due to noise, scalability and speed constraints. Thus, the overall communication scheme will resemble computer networking more than traditional bus-based design [4].

One of the major obstacles in reaching the complexities of today's ICs has been the lag in the development of design tools and methods. The current focus in IC design is on extensive reuse of readily designed circuit blocks, called Intellectual Property (IP) blocks or Virtual Components (VC) [5].
Another major concern is the increasing cost of testing an IC. It is being argued that current design goals of performance and small size may not be economical in the near future and that a stronger emphasis on manufacturability is needed [6].
The Virtual Socket Interface Alliance (VSIA) is an organization whose goal is to promote IP block reuse and integration solutions [7]. One of their first efforts has been the definition of a standard interface [8] for IP blocks.
Methodologies for the separated design of system functionality and communication have been proposed [9,10]. This effectively clears the way for independent general-purpose network design.
Various architectures for inter-block communication have already been developed and commercially used in SoC designs. An overview of the currently available bus-based options can be found in [11]. Several researchers have proposed other packet-switched network architectures, like Hemani [12], Guerrier [13] and Dally [14].

Our Project
The problem is to design an interconnection mechanism for systems built from heterogeneous blocks, providing an adequate level of performance for a given application. The solution will consist of a network of some type, a set of protocols and a standard interface for accessing the network, along with the implementation of software tools for the automation of the integration process.
"Proteo" is the name we gave to our network proposal. This project focuses on researching new protocols and architectures and on implementing synthesizable blocks, leaving aside the development of the software tools.
Flexibility and transparency of the network are traded for resources, mainly storage capability. This should not be a big problem in future gigascale chips. We think it is wise to study how to set in place all these resources in such a way that they can be shared with other system level tasks. Interesting applications will consist of generic mechanisms for system test and fault tolerance. In addition, power-saving techniques may be implemented using network resources.

The network will be built from parameterized IP blocks, providing a very flexible structure. As we want our NoC to be directly integrable with available cores, the VSIA Virtual Component Interface (VCI) recommendation [8] was adopted as exemplifying the current bus-oriented standards for interface design.
Nodes will be customized by sizing their internal buffers, enabling or disabling protocol features, etc. One of our most important goals is to design a highly scalable network, both in terms of number of nodes (IPs) and in performance. Latency and bandwidth goals are set to less than 1 µs and up to 2 Gb/s, respectively. We do not want to focus (initially) on any specific application; our desire is to create a general purpose NoC. This requires scalability and programmability.

The Proteo Architecture


A more detailed description of the architecture of our
network can be found in [15].
The basic hardware elements in our network are hosts,
nodes and links. Every host corresponds to an IP block that
will be connected to the network using a dedicated node as
a wrapper. Our nodes present a VSIA-compliant interface. This imposes a series of constraints on our design, the principal one being the adoption of a memory-mapped communication paradigm, instead of a more natural node-to-node approach. In Proteo, request packets contain an address field. Nodes check this field in the incoming packets against their assigned range of addresses and accept or bypass them according to the result.
Links and nodes are available as part of a library. They include parameters to customize their number of channels and dimensions, their interface options, supported data sizes and protocol features, based on requirements of functionality, throughput and Quality of Service (QoS).
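
The address check described above can be illustrated with a minimal Python sketch. It is not the Proteo implementation: the class name NodeConfig, its fields and the helper functions are hypothetical, and the address ranges are made-up values chosen only to show the accept/bypass decision.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class NodeConfig:
    """Hypothetical per-node parameters; names are illustrative only."""
    name: str
    addr_lo: int  # first address mapped to the host behind this node
    addr_hi: int  # last address mapped to the host (inclusive)


def handle_request(node: NodeConfig, address: int) -> str:
    """Return 'accept' if the address lies in the node's range, else 'bypass'."""
    return "accept" if node.addr_lo <= address <= node.addr_hi else "bypass"


def deliver(nodes: List[NodeConfig], address: int) -> Optional[str]:
    """Visit the nodes in order and return the name of the first one that accepts."""
    for node in nodes:
        if handle_request(node, address) == "accept":
            return node.name
    return None  # no node is mapped to this address


# Assumed example: two nodes with adjacent, non-overlapping address ranges.
ring = [NodeConfig("n0", 0x0000, 0x0FFF), NodeConfig("n1", 0x1000, 0x1FFF)]
print(deliver(ring, 0x1234))
```

In this sketch a request addressed to 0x1234 is bypassed by the first node and accepted by the second, which mirrors the memory-mapped paradigm imposed by the interface standard.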

Our target domain is that of heterogeneous systems, with many different types of IPs coworking in the same chip. The system is divided into clusters, using a hierarchical network. This will comprise multiple subnets with different performance, topologies, packet formats, etc. The subnets are typically point-to-point structures, so each link can be effectively tuned to its individual traffic requirements.
Currently, the topology being explored is a hierarchical network built from a system-wide bi-directional ring and several subnets with ring, star or bus topology (Fig. 1). The use of regular topologies allows easy routing and direct replication of blocks throughout the system.
The architecture of a typical node is inspired by the SCI standard [16], which is a standard interface developed for multiprocessor systems. SCI implements a rich set of mechanisms covering most of the needs of high performance systems. Our node architecture extends the basic SCI architecture to allow a configurable number of dimensions and channels (Fig. 2). We have chosen a highly modular structure that makes its configuration and tuning easy.
The links provide a high level interface, so they are treated as modular elements and tuned independently. At this point, the links are the only asynchronous elements in our design, although we are currently studying including more asynchronous elements in the node architecture.
Current synthesis results show that a large node with 4 I/O pairs (layers) and one channel per layer, implemented in a 0.18 µm technology, would be about one-third of the size of a small microprocessor core. FIFOs account for around three-quarters of the total node size, because we have used register banks to implement them. Optimisation will include replacing these with specialised memory blocks. We think the results are acceptable.

Figure 2 Proteo node architecture (3 layers + 2 channels).

Figure 1 Example topology.
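
To illustrate why a regular topology such as the bi-directional system ring makes routing easy, the following Python sketch picks the shorter direction around a ring using modular arithmetic. The shortest-direction policy, the function name ring_hops and the clockwise node numbering are assumptions made for the example; the paper does not specify the routing rule.

```python
from typing import Tuple


def ring_hops(src: int, dst: int, n_nodes: int) -> Tuple[str, int]:
    """Return (direction, hops) for the shorter way around a bi-directional ring.

    Nodes are assumed to be numbered 0..n_nodes-1 in clockwise order.
    Ties are resolved in favour of the clockwise direction.
    """
    cw = (dst - src) % n_nodes   # hops when travelling clockwise
    ccw = (src - dst) % n_nodes  # hops when travelling counter-clockwise
    return ("cw", cw) if cw <= ccw else ("ccw", ccw)


# On an assumed 8-node ring, node 0 reaches node 6 in two counter-clockwise hops.
print(ring_hops(0, 6, 8))
```

The decision needs only the source and destination indices and the ring size, so the same routing block could be replicated unchanged at every node.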

The Proteo Model


The first tool we needed in our project was a model of our network. Its requirements are:

- It must be easily modifiable and extensible, so we can use it to compare the behavior of different design choices.
- It must be relatively lightweight, so that large networks can be simulated.
- As we develop a synthesizable version of the different blocks, we should be able to back-annotate the information we gather from the physical implementation in the high level model.
- Models at other levels of abstraction can be easily co-simulated. We could use the model for validation of the final synthesizable blocks.

The approach we have chosen is to implement this high level model (HLM) in VHDL, borrowing some of the ideas described in [17]. However, here we are more interested in the practical aspects than in the theoretical consistency of the approach.
In VHDL, signals are not just variables: they have an
inherent time-related behavior. For a VHDL signal, it takes
some time to change its value. The VHDL simulator evaluates the value of all system signals iteratively in so-called
delta steps, which have no effect on simulated time. This
mechanism was introduced to allow the simulation of highly
concurrent systems deterministically. By moving all the
information transactions needed to implement a specific
functionality to this delta-time domain, we are able to model
any behavior in zero-time, that is to say, all the blocks have,
in principle, no delay at all. Special blocks are in charge of
synchronizing the flow of information to specific events,
like clock edges, in a fully controllable way. An example of
the delta-mechanism in action is in Figure 3.
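
The delta-step mechanism can be mimicked outside VHDL to make the idea concrete. The Python sketch below is a toy two-phase evaluator written for illustration, not a VHDL simulator and not part of the Proteo environment; the names Signal and settle are hypothetical. Assignments are deferred to the next delta step, and steps repeat until no signal changes, while simulated time never advances.

```python
class Signal:
    """A value holder with VHDL-like deferred assignment (illustrative only)."""

    def __init__(self, name, value):
        self.name = name
        self.value = value       # value visible during the current delta step
        self._pending = None     # value scheduled for the next delta step
        self._scheduled = False

    def assign(self, value):
        """Schedule a new value; it becomes visible only in the next delta step."""
        self._pending = value
        self._scheduled = True


def settle(signals, processes, max_deltas=1000):
    """Run delta steps until no signal changes; return the number of steps used.

    `processes` is a list of (sensitivity_list, callable) pairs. Simulated
    time is not modelled here because it does not advance during delta steps.
    """
    deltas = 0
    while True:
        # Update phase: apply every scheduled assignment at once.
        changed = set()
        for sig in signals:
            if sig._scheduled:
                if sig._pending != sig.value:
                    sig.value = sig._pending
                    changed.add(sig)
                sig._scheduled = False
        if not changed:
            return deltas
        deltas += 1
        if deltas > max_deltas:
            raise RuntimeError("signals did not settle")
        # Evaluation phase: wake every process sensitive to a changed signal.
        for sensitivity, process in processes:
            if any(sig in changed for sig in sensitivity):
                process()


# Three zero-delay blocks in a chain: a -> b -> c.
a, b, c = Signal("a", 0), Signal("b", 0), Signal("c", 0)
processes = [
    ([a], lambda: b.assign(a.value + 1)),
    ([b], lambda: c.assign(b.value + 1)),
]
a.assign(1)
steps = settle([a, b, c], processes)
print(steps, c.value)
```

In this example the change on a reaches c after three delta steps, and every process sees a consistent snapshot of the signals within each step, which is what makes the evaluation order irrelevant.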

The main advantage of having this kind of model is that its functionality and timing are effectively separated as orthogonal dimensions, so it is possible to modify one of them without affecting the other. Timing information collected from synthesis of blocks and subsystems can be back-annotated to the model by just changing the value of the delay parameters.
The reason for using VHDL to model the network at
this level of abstraction is the possibility we have now to cosimulate the HLM together with the synthesizable models
(SM) which are the final product. It has the great advantage
over other options of allowing us to use our experience with
the language and the tools.
If our goal were just the simulation of highly abstract
models, the SystemC [18] class library would be a better
choice. But cosimulation was one of our main requirements
and there were no SystemC-synthesis tools available at the
time of starting this project.
Model Structure
Our HLM has three principal components: hosts, nodes
and hubs. Currently, we don't include the links in our simulations, so the network behaves like a completely synchronous system.
Differences between the HLM and the SM are:

- The level of abstraction in the description. The HLM model uses enumerated and numeric types, and high level operators and data structures.
- Abstraction of node and sub-block interfaces. They consist of a record-type signal and bi-directional handshake signal only.
- No data content. The payload of the packets is modelled only by its size.
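
As a rough illustration of the abstract interface listed above, the Python sketch below models a packet as a record with an enumerated kind, an address and a payload size, with no data content, plus a simple handshake flag. All names (PacketKind, Packet, Port) and the RESPONSE kind are assumptions made for the example; the actual HLM record fields are not given in this paper.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class PacketKind(Enum):
    """Enumerated packet type (the RESPONSE member is an assumption)."""
    REQUEST = "request"
    RESPONSE = "response"


@dataclass(frozen=True)
class Packet:
    """Abstract packet record: the payload is represented by its size only."""
    kind: PacketKind
    address: int
    size: int  # payload length in words; no data content is carried


@dataclass
class Port:
    """One record-type channel plus a handshake flag (illustrative only)."""
    packet: Optional[Packet] = None
    busy: bool = False

    def offer(self, packet: Packet) -> bool:
        """Sender side: place a packet if the port is free."""
        if self.busy:
            return False
        self.packet, self.busy = packet, True
        return True

    def take(self) -> Optional[Packet]:
        """Receiver side: remove the packet and release the port."""
        packet, self.packet = self.packet, None
        self.busy = False
        return packet


port = Port()
accepted = port.offer(Packet(PacketKind.REQUEST, address=0x1000, size=4))
```

Because the record carries a size instead of data, a large transfer costs the model no more memory or copying than a small one.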

Hosts
Host models are generic Bus Functional Models (BFM) of processors or other blocks, and we classify them as active or passive, depending on whether they act as requesters or servers.
The input to an active host is specified in a text file with an STI extension. The activity of the block is recorded in another text file with a LOG extension. A simple script checks the LOG file and extracts some statistics from it.
STI files can be written by hand or generated automatically by a script from a text file with a TRF extension, containing the statistical parameters needed to generate the stimuli using random functions.
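
The generation of stimuli from statistical parameters could look like the Python sketch below. The STI and TRF file formats are not described in this paper, so everything here is hypothetical: the parameter names, the exponential inter-arrival model and the "time address size" line layout are assumptions chosen only to show the principle.

```python
import random


def generate_stimuli(n, mean_gap, size_range, addr_range, seed=0):
    """Return n stimulus lines of the assumed form '<time> <address> <size>'.

    mean_gap   -- mean number of cycles between consecutive requests
    size_range -- (min, max) packet size, inclusive
    addr_range -- (min, max) target address, inclusive
    seed       -- seed for the random generator, so runs are reproducible
    """
    rng = random.Random(seed)
    time = 0
    lines = []
    for _ in range(n):
        # Exponentially distributed gap, rounded and forced to at least 1 cycle.
        time += max(1, round(rng.expovariate(1.0 / mean_gap)))
        address = rng.randint(*addr_range)
        size = rng.randint(*size_range)
        lines.append(f"{time} 0x{address:08X} {size}")
    return lines


for line in generate_stimuli(3, mean_gap=10, size_range=(1, 8),
                             addr_range=(0x0000, 0xFFFF), seed=1):
    print(line)
```

Fixing the seed keeps a generated stimulus set reproducible, which matters when the same transactions must be replayed against different models.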

Figure 3 Delta steps during protocol simulation.

Nodes and Hubs


Figure 4 Information flows in a node and associated delays.

Each version of the HLM node is developed as a prototype for its SM counterpart, and as a consequence, they both share the same internal structure. The same testbenches used for validating the HLM node can be used afterwards to simulate the SM.
The functionality of a node can be viewed as a multiplexing and demultiplexing of three different data flows (Fig. 4). We associate a delay-generator block with each of these flows. The delay formulae used for each one are included in the figure. After forwarding a packet, the multiplexing block enters a rejection state in which it does not accept any more packets. The duration of this state is a function of the size of the last accepted packet. In this way we can emulate wormhole routing. The time it takes for (the header of) a packet to travel from the source node to its target is a function of the number of nodes between them only, not its size.
Hubs are of similar conception to nodes, but they are
not connected to any host. They are used to interconnect
rings or to connect buses and stars to the main ring.
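
The rejection-state idea used to emulate wormhole routing can be sketched in Python as follows. The real delay formulae appear only in Figure 4 and are not reproduced here, so the linear formulas below (one cycle per word of occupancy, a fixed delay per node for the header) and the names MuxModel and header_latency are assumptions for illustration.

```python
class MuxModel:
    """Multiplexing block that rejects packets while the last one 'drains'."""

    def __init__(self, cycles_per_word=1):
        self.cycles_per_word = cycles_per_word  # assumed occupancy per word
        self.busy_until = 0                     # end of the rejection state

    def try_forward(self, now, size):
        """Accept a packet at time `now` unless still in the rejection state."""
        if now < self.busy_until:
            return False
        # The rejection state lasts in proportion to the accepted packet size.
        self.busy_until = now + size * self.cycles_per_word
        return True


def header_latency(hops, node_delay=2):
    """Header travel time depends on the number of nodes crossed, not on size."""
    return hops * node_delay


mux = MuxModel()
print(mux.try_forward(0, size=4))  # accepted, block is busy until t = 4
print(mux.try_forward(3, size=1))  # rejected, still draining the first packet
print(mux.try_forward(4, size=1))  # accepted again
```

Since the two quantities are plain parameters, figures obtained from synthesis could replace the assumed values without touching the functional part of the model.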
Auxiliary Scripts
Apart from the scripts we use for stimulus generation and results analysis, we have developed additional scripts for network generation and test-bench configuration. All scripts are written in Python or Perl. The files used for describing a particular network configuration have a NET extension. NET files contain four sections:

1. Interface configurations, in which we configure the VCI connection to the hosts, using a list of parameters.
2. Node configurations, in which the communication requirements are specified for each node using parameters like: number of I/O links, number of channels per link, protocol options, routing table and FIFO lengths.
3. Topology description, entered as a list of point-to-point connections.
4. Test bench description (optional), including configuration of HLM hosts and model selection for nodes.

The output consists of two VHDL files: one of them containing the network entity and the other one the testbench.
Simulations
Simulations performed on an HP 9000/785 workstation using Modelsim version 5.6b show that the HLM is lighter to simulate than the SM (Fig. 5). The networks simulated are unidirectional rings and they perform the same series of 1000 transactions defined by an STI file. The SM has not yet achieved the same level of functionality as the HLM, so the figures are expected to differ even more in the future.

Figure 5 Comparison of simulation times for HLM and SM.

For realising mixed-abstraction level simulations, we need wrappers. Interconnecting the HLM and the SM requires information insertion and deletion at the model boundaries. This is done at the wrappers. They realise the conversion between synthesizable and abstract data types, and the protocol translation. In Figure 6 there is a diagram showing the configuration for the mixed-level simulation of a three-node ring, with the data types involved. It can be seen that each of the three wrappers involved has a slightly different interface and functionality. This is one disadvantage of this method of simulation.
There is no automated way to generate the wrappers and it has to be done manually, but they can be reused, as long as the interfaces and protocols remain unchanged.

Proteo Novelties And Future Work

Figure 6 Mixed-level simulation using wrappers.

Our network proposal is the first one, to our knowledge, that implements the VSIA interface standards natively. Another interesting point in this project is our vision that the Network-on-Chip can help to partially solve some of the problems associated with gigascale integration, providing some generic mechanisms, which can therefore be offered as a standard for every design. For example, the connection/disconnection of blocks in a system for power saving purposes could be implemented as a network mechanism.
Future plans include:

- Demonstrate the feasibility of complex hierarchical networks using our approach.
- Characterise network and protocol performance by means of simulation.
- Finish the implementation of the basic set of building blocks and gather low level statistics.
- Build a prototype board including several processors and FPGAs implementing our network.

Conclusion
The future of highly integrated systems is pointing at a network-on-chip as a solution to the problems of interconnection, productivity and heterogeneity. We are trying to extend our NoC proposal to the fields of testing, fault tolerance and low-power techniques.
We needed a simulation environment to experiment and learn about network design. We have used VHDL and scripting languages to create a simulation environment for our own designs. In this way we have the possibility to validate real synthesizable models using the same set of testbenches developed for the high level simulations.

Acknowledgements
We would like to thank Mikko Alho and Juha Pirttimäki for performing the simulation and synthesis runs, and Ilkka Saastamoinen for his comments.

References
[1] D. Sylvester and K. Keutzer, "A Global Wiring Paradigm For Deep Submicron Design", IEEE Transactions on Computer Aided Design of Integrated Circuits and Systems, pages 242-252, February 2000.
[2] J. Cong, Lei He, Kei-Yong Khoo, Cheng-Kok Koh and Zhigang Pan, "Interconnect Design For Deep Submicron ICs", in Proceedings of ICCAD, San Jose, USA, November 1997.
[3] J. Muttersbach, T. Villiger, H. Kaeslin, N. Felber and W. Fichtner, "Globally-Asynchronous Locally-Synchronous Architectures To Simplify The Design Of On-Chip Systems", in Proceedings of the 12th Annual IEEE International ASIC/SOC Conference, Washington DC, USA, September 1999.
[4] M. Sgroi, M. Sheets, A. Mihal, K. Keutzer, S. Malik and A. Sangiovanni-Vincentelli, "Addressing The System-On-A-Chip Interconnect Woes Through Communication-Based Design", in Proceedings of DAC, Las Vegas, USA, June 2001.
[5] M. Hunt and J. A. Rowson, "Blocking In A System On A Chip", IEEE Spectrum, pages 35-41, November 1996.
[6] W. Maly, H. Heineken, J. Khare and P. K. Nag, "Design For Manufacturability In Submicron Domain", in Proceedings of ICCAD, San Jose, USA, November 1996.
[7] M. Birnbaum and H. Sachs, "How VSIA Answers The SoC Dilemma", IEEE Computer, pages 42-49, June 1999.
[8] VSI Alliance, "Virtual Component Interface Standard", http://www.vsi.org, April 2001.
[9] J. A. Rowson and A. Sangiovanni-Vincentelli, "Interface-Based Design", in Proceedings of DAC, Anaheim, USA, June 1997.
[10] G. Nicolescu, Sungjoo Yoo and A. A. Jerraya, "Mixed-Level Cosimulation For Fine Gradual Refinement Of Communication In SoC Design", in Proceedings of DATE, Munich, Germany, March 2001.
[11] E. Salminen, V. Lahtinen, K. Kuusilinna and T. Hämäläinen, "Overview Of Bus-Based System-On-Chip Interconnections", in Proceedings of ISCAS, Scottsdale, USA, May 2002.
[12] A. Hemani, A. Jantsch, S. Kumar, A. Postula, J. Berg, M. Millberg and D. Lindqvist, "Network On A Chip: An Architecture For Billion Transistor Era", in Proceedings of NORCHIP, Turku, Finland, November 2000.
[13] P. Guerrier and A. Greiner, "A Generic Architecture For On-Chip Packet-Switched Interconnections", in Proceedings of DATE, Paris, France, March 2000.
[14] W. J. Dally and B. Towles, "Route Packets, Not Wires: On-Chip Interconnection Networks", in Proceedings of DAC, Las Vegas, USA, June 2001.
[15] I. Saastamoinen, D. Sigüenza-Tortosa and J. Nurmi, "Interconnect IP Node For Future System-On-Chip Designs", in Proceedings of DELTA 2002, Christchurch, New Zealand, January 2002.
[16] IEEE, "The Scalable Coherent Interface", March 1992.
[17] J. M. Schoen (editor), "Performance And Fault Modeling With VHDL", Prentice-Hall Inc., 1992.
[18] OSCI, http://www.systemc.org.
