[8] Jae S. Lim, Two-Dimensional Signal and Image Processing, Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 1990.
[9] Ingrid M. Verbauwhede, Chris J. Scheers, and Jan M. Rabaey, “Specification and Support for Multidimensional DSP in the Silage Language,” Proceedings of ICASSP ’94.
The author would like to acknowledge various people at U.C. Berkeley for numerous ideas and thought-provoking discussions. I would like to thank my research advisor, Professor Edward A. Lee, without whom I would undoubtedly never have embarked on this project. I would also like to thank the members of Professor Lee’s research group, especially Sun-Inn Shih and Tom Parks, for their input on MDSDF and their assistance in implementing the domain in Ptolemy. Lastly, I thank my parents for their encouragement, love, support, and nagging throughout my years in school.
7.0 References
[1] E.A. Lee, “Multidimensional Streams Rooted in Dataflow,” Proceedings of the IFIP Working Conference on Architectures and Compilation Techniques for Fine and Medium Grain Parallelism, Jan. 20-22, 1993, IFIP Transactions A (Computer Science and Technology), 1993, Vol. A-23, pp. 295-306.
[2] J. Buck, S. Ha, E.A. Lee, and D.G. Messerschmitt, “Ptolemy: a Framework for Simulating and Prototyping Heterogeneous Systems,” International Journal of Computer Simulation, special issue on “Simulation Software Development,” January 1994.
[3] E.A. Lee and D.G. Messerschmitt, “Synchronous Data Flow,” Proceedings of the IEEE, Vol. 75, No. 9, pp. 1235-1245, September 1987.
[4] E.A. Lee and D.G. Messerschmitt, “Static Scheduling of Synchronous Data Flow Programs for Digital Signal Processing,” IEEE Transactions on Computers, Vol. C-36, No. 1, pp. 24-35, January 1987.
[5] Shuvra S. Bhattacharyya and E.A. Lee, “Memory Management for Synchronous Dataflow Programs,” Memorandum No. UCB/ERL M92/128, Electronics Research Laboratory, U.C. Berkeley, November 18, 1992.
[6] Shuvra S. Bhattacharyya and E.A. Lee, “Scheduling Synchronous Dataflow Graphs for Efficient Looping,” Journal of VLSI Signal Processing, Vol. 6, No. 3, pp. 271-288, December 1993.
The syntax is very similar to the normal syntax used to access the block directly assigned to the firing, except that we can pass negative and positive arguments to getFloatInput() and getInput() to access data backwards or forwards in the data space, respectively.
This concept is equally valid in the multidimensional case. Although it is not currently implemented this way, we should be able to have the destination portholes of a fork share one geodesic, so that we need not keep multiple copies of the data in separate geodesics for each output arc of the fork.
go {
	// get a scalar entry from the buffer
	double& out = output.getFloatOutput();
	out = 0;
	int tap = 0;
Currently, MDSDF supports a limited method of accessing data with indices into the past and future of the “current” data block. As we mentioned before, every star firing is mapped to a specific block in the data space. If the star also wishes to access data outside that block, it can do so, with some limitations: the star can only access data blocks within the current buffer, and data outside the current buffer is considered zero. We do not support dependencies across iterations, so a star firing at the last column of the current iteration buffer cannot force a subsequent iteration firing to produce the data for a forward reference. Similarly, a star that is the first firing of an iteration cannot access data from the buffer of the previous iteration. The syntax for making such references is shown in the code fragment for the MDSDFFIR star below:
defstate {
	name { firstRowIndex }
	type { int }
	default { "-1" }
	desc { The index of the first row of tap values }
}
defstate {
	name { lastRowIndex }
	type { int }
	default { 1 }
delete &input1;
delete &input2;
delete &result;
}
}
Notice how we have declared the types of each porthole. The MDSDF stars use the types
COMPLEX_MATRIX, FIX_MATRIX, FLOAT_MATRIX, and INT_MATRIX, in contrast to the SDF
stars that act on the PMatrix class objects, which have portholes declared to be of type
COMPLEX_MATRIX_ENV, FIX_MATRIX_ENV, FLOAT_MATRIX_ENV, and INT_MATRIX_ENV. The
SDF matrix types have the ENV extension because the matrix particles in SDF use the Envelope
structures to hold the matrices being transferred. The MDSDF star uses states that allow the user
to change the dimensions of the inputs and outputs for the star as needed. The dimensions are
declared in the setup() method, as we mentioned before. It is important to note how the calls to
getInput() and getOutput() have been cast to the appropriate return type needed. Type
checking is performed by the system during scheduling, so these casts should match the ones
declared for the porthole types or else unexpected results will occur. The last thing to note is how we delete the submatrices used to access the data buffers at the end of the go() method. This is necessary because the submatrices are currently allocated by the getInput() and getOutput() methods whenever they are called, and no pointers to those submatrices are ever stored (unlike particles). Thus, to prevent memory leaks, the submatrices must be deleted by the stars that created them. The memory for the data actually referenced by the submatrices is not affected, since the submatrices are simply access structures and do not allocate any memory of their own for storage purposes.
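This ownership rule can be sketched in plain C++. The names below are hypothetical, not the actual Ptolemy classes: the point is only that deleting the access structure returned by a getInput()/getOutput()-style call frees the view object alone, never the buffer it points into.

```cpp
#include <cassert>

// Hypothetical sketch: a submatrix is a lightweight view into a separately
// owned data buffer. Deleting the view releases only the view object,
// never the buffer storage it references.
struct FloatSubMatrixSketch {
    double* data;       // points into the porthole's buffer; not owned
    int startRow, startCol, rows, cols, bufferCols;
    double& entry(int r, int c) {
        return data[(startRow + r) * bufferCols + (startCol + c)];
    }
};

// Mimics getInput()/getOutput(): a fresh view is allocated on every call,
// so the caller (the star's go() method) must delete it.
FloatSubMatrixSketch* getViewSketch(double* buffer, int bufferCols,
                                    int r0, int c0, int rows, int cols) {
    return new FloatSubMatrixSketch{buffer, r0, c0, rows, cols, bufferCols};
}
```

Data written through the view survives deletion of the view, which is exactly why the star, and not the buffer, is responsible for the delete.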
4) ANYSIZE rows or columns are resolved by following the input porthole with ANYSIZE rows or columns and assigning to it the corresponding row or column dimension of the output porthole connected to it. If that output porthole itself has ANYSIZE rows or columns (as in the case of cascaded fork stars), then that star is resolved first, following the rules given here, until we find an output porthole that has determinate row and column dimensions.
[Figure: delay (1,1) on the arc from actor A, producing (2,1) blocks, to actor B, consuming (2,2) blocks; the grid shows where firings A[0,0] through A[0,5] land in the buffer.]
FIGURE 40. Buffer evolution of an MDSDF system with delay.
We have implemented the ability to support stars that have portholes with specifications that are (ANYSIZE, ANYSIZE). The rules for resolving the size that the porthole uses are as follows:
1) No star can have more than one input porthole with ANYSIZE rows or columns.
2) A star with ANYSIZE rows or columns on an output porthole must have an input port-
hole that also has ANYSIZE rows or columns.
We have created a slightly more complex schedule class for the MDSDF domain. The SDFSchedule class was essentially a sequential list of pointers to stars. An MDSDFSchedule needs to know more than just the order of the stars. The schedule entries must also record the firing index of each star, since the firing index is the only way to determine how a particular firing of a star is mapped into the data space. This index is produced when the schedule is created and then stored along with the star pointer in a cell called the MDSDFScheduleEntry. The index stored is not just one (row, column) pair but actually a (row, column) start index and a (row, column) end index range. This allows us to express larger schedules more efficiently: we are essentially storing the syntax used to express single-processor schedules, like A[0,0]-[4,4]B[0,0]-[2,2], instead of storing each firing of the star as one entry. For a multiprocessor scheduler, we will need to develop new structures to represent such schedules.
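A minimal sketch of such a schedule cell, with hypothetical names (the real MDSDFScheduleEntry stores a star pointer and additional bookkeeping), shows how one compact range entry stands in for many individual firings:

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch of an MDSDF schedule cell: a star (here just a name)
// plus a (row,column) start index and a (row,column) end index, encoding a
// whole range of firings such as A[0,0]-[2,2] in a single entry.
struct ScheduleEntrySketch {
    std::string star;
    int startRow, startCol, endRow, endCol;

    // Expand the compact range into the individual firing indices it
    // represents, in row-by-row order.
    std::vector<std::pair<int, int>> firings() const {
        std::vector<std::pair<int, int>> out;
        for (int r = startRow; r <= endRow; ++r)
            for (int c = startCol; c <= endCol; ++c)
                out.push_back({r, c});
        return out;
    }
};
```

For a {3,3} repetition of a star A, one entry A[0,0]-[2,2] expands to nine firings rather than occupying nine schedule slots.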
When the column dimension of the delay is greater than zero, we increment the column dimension of the buffer by multiples of the original column size of the buffer. We cannot simply increment by the size of the column delay because, as discussed before, we want submatrices to access proper subsets of the buffer storage. For example, for the system shown in Figure 40, we use a buffer with twice the column size that would be needed if there were no delay. The row size of the buffer is one greater than the row size that would be needed if there were no delay. The column size of the buffer when there is a delay in the system must be a multiple of the original column size because we want both the input and output submatrices to access proper subsets of the buffer. Since it is possible that either submatrix might access the entire original buffer as its block size, we need the column dimension of the modified buffer to always be a multiple of the original buffer size. Similarly, if the system has actors that access data in the “past” along the column dimension, we must use a buffer size that has a column dimension that is
The buffers of the portholes are also still around in MDSDF, but they are not used to store
data that is being transferred on the arc. Instead, to maintain backward compatibility with some
stars that we copied from SDF that had to use the % operator and required a particle input, we have
created a % operator for MDSDF portholes that will create a temporary matrix particle and copy
the data from the submatrix that would normally have been accessed. This temporary matrix par-
ticle is stored in the porthole’s buffer and is deleted when the porthole is deleted. Currently, the
only case of this being used is to support the MDSDFTkText stars, which expect inputs of the
Particle class. The star does not care what the dimensions of the data are, or even that it is a
matrix. The reason for our modification is that the TclTk stars utilize Ptolemy kernel code that
we did not want to duplicate or modify just for the MDSDF case.
In summary, although submatrices are similar to particles in that they could be reused instead of being created and deleted repeatedly for every iteration, the primary difference in the way we treat submatrices is that we never buffer them in the portholes or geodesics. It might be possible to buffer the submatrices used by a star in that star’s portholes, which would give us the advantage of maintaining pointers to all the submatrices used, so that the system could recover their memory instead of forcing the star to do so. However, this would add the complexity of maintaining a two-dimensional buffer. In our first attempt at implementing an MDSDF simulation domain, we did not think this extra complexity would provide enough benefit to be justified.
[Figure: a Geodesic with its buffer of Particles between two portholes, and the Plasma that recycles empty particles.]
FIGURE 39. Close-up of connections for data transfer between actors in the SDF simulation domain.
The arc connecting portholes is implemented using the geodesic structure, which also has a buffer that
acts as a FIFO queue. The particles go into the geodesic buffer when the source actor has finished
firing to produce the data. The particles move from the geodesic buffer to the buffer of the desti-
nation porthole when the destination actor is ready to fire. After the destination actor has fired, the
“empty” particles are returned to the plasma, which acts as a repository of empty particles that can
be reused by the source porthole.
We felt that this system of having three buffers (one in each porthole and one in the geode-
sic) per arc would be too inefficient for MDSDF. Many of the systems described in MDSDF have
large rate changes, which results in a large number of particles flowing through the system if we
use the old style of implementation. An example of such a system would be an image processing
graph, where we wished to work at the pixel level. A typically sized image would generate thou-
sands of particles of data if treated at such a level. This inefficiency is not inherent to SDF. On the
contrary, SDF systems in general have very desirable qualities, such as the ability to make static
schedules and perform static buffer allocation for them. These qualities have been implemented
for SDF code generation domains, but not for the SDF simulation domain. MDSDF has similar
qualities, so we have designed the MDSDF simulation domain to take advantage of these qualities
to reduce the amount of buffering overhead in the system.
We mentioned in the previous section that stars in MDSDF access the data space of the
buffer using submatrix structures instead of through particles like SDF stars. These submatrices
are not buffered at all, but are created and deleted as needed when the star requests one for input
or output purposes (it might be even more efficient to allocate a submatrix plasma to store
“empty” submatrices so that we can reuse allocated memory for the structures). For example, a
star that generates data would first request from the output porthole a submatrix to access the out-
put buffer using the getOutput() method of that porthole. That star could then write to the
entries of that submatrix using the standard matrix operations. Similarly, a star that receives input
from another star could get access to the data using the getInput() method of its input porthole.
This is in contrast to the standard SDF style of using the % operator of the portholes to access the
current particle or any previously received particles in its buffer. We will illustrate how stars
access these submatrices in a future section. Here, we want to emphasize that there are no buffers
of particles or submatrices for data transfer purposes at all in the MDSDF simulation domain
implementation. The storage for the data that passes on an arc is allocated by the geodesic as one
large mother matrix. The stars at either end of the arc will access subsets of the memory allocated
for the mother matrix using submatrices.
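The mother-matrix arrangement can be sketched as follows. All names here are hypothetical, not the actual Ptolemy classes; the sketch shows only the central idea that both ends of an arc address one shared storage through views, with no copying of data:

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch: the geodesic owns one large "mother matrix", and each
// star firing reads or writes a rectangular block of it through a view.
struct MotherMatrixSketch {
    int rows, cols;
    std::vector<double> data;
    MotherMatrixSketch(int r, int c) : rows(r), cols(c), data(r * c, 0.0) {}
    double& at(int r, int c) { return data[r * cols + c]; }
};

struct BlockView {
    MotherMatrixSketch* m;
    int r0, c0;  // top-left corner of this firing's block
    double& entry(int r, int c) { return m->at(r0 + r, c0 + c); }
};

// Firing (i,j) of a star with block size (br,bc) maps to the block whose
// top-left corner is (i*br, j*bc) in the mother matrix.
BlockView blockForFiring(MotherMatrixSketch& m, int i, int j, int br, int bc) {
    return BlockView{&m, i * br, j * bc};
}
```

A source star writes through one view and the destination star reads the same storage through another, which is why no particles need to flow on the arc at all.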
In the example, the three variables A, B, and C have been previously declared to be of type
FloatSubMatrix. The assignment operator has been overloaded to allow us to assign all entries
of a matrix to be the same value, as shown in the first code statement. We can also use the []
operator to access an entry of the matrix at a specific row and column, as shown in the next three
code statements. The last code statement shows how we can use the * operator, which we have
defined to implement matrix multiplication, on two source matrices A and B, and the result of that
operation is then assigned to the destination matrix C. The ability to define operators for the
Matrix and SubMatrix classes gives us the ability to treat matrices simply by their variable
names and operate on them as if they were a new data type in the system.
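A miniature, hypothetical version of this operator interface might look like the following. The real Matrix/SubMatrix classes are considerably richer; this sketch only demonstrates the three operations described above: scalar assignment, entry access, and matrix multiplication.

```cpp
#include <cassert>
#include <vector>

// Hypothetical miniature of the Matrix operator interface described above.
class FloatMatrixSketch {
public:
    FloatMatrixSketch(int r, int c) : rows_(r), cols_(c), d_(r * c, 0.0) {}

    // A = 1.0; assigns the value to every entry.
    FloatMatrixSketch& operator=(double v) {
        for (double& x : d_) x = v;
        return *this;
    }
    // A[r][c] accesses one entry: operator[] returns a pointer to row r.
    double* operator[](int r) { return &d_[r * cols_]; }
    const double* operator[](int r) const { return &d_[r * cols_]; }

    // C = A * B; standard matrix multiplication.
    FloatMatrixSketch operator*(const FloatMatrixSketch& b) const {
        FloatMatrixSketch c(rows_, b.cols_);
        for (int i = 0; i < rows_; ++i)
            for (int j = 0; j < b.cols_; ++j) {
                double s = 0.0;
                for (int k = 0; k < cols_; ++k) s += (*this)[i][k] * b[k][j];
                c[i][j] = s;
            }
        return c;
    }
private:
    int rows_, cols_;
    std::vector<double> d_;
};
```

With these operators, star code can manipulate matrices by variable name exactly as if they were built-in numeric types.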
We can solve these equations to generate the repetition counts for each actor, which are A{1,1}, B{1,1}, Mult{4,1}, Add{4,1}, Fork{4,1}, and C{4,1}. Thus, for one iteration period, actors A and B fire once each and the other actors all fire four times. The actors that fire four times each consume data down the rows of one column.
Using the scheduling rules we presented previously, the schedule for the vector inner
product system is A[0,0]B[0,0]Mult[0,0]-[3,0](AddFork)[0,0]-[3,0]C[0,0]-[3,0]. The schedule uses a
short-hand notation to group the pair of sequential firings of the Add actor followed by the Fork
actor. That sequence is executed four times, from index [0,0] to [3,0]. The Add actor can fire the first time because it has an initial data block provided by the delay on its upper input. After its first firing, it needs the output of the Fork actor to continue. Thus, the pair Add and Fork must fire
together in series. After one iteration, the Add gets reset because its first input comes from a new
column, which again has an initial delay value. The final result is that for each iteration, the sys-
tem computes the inner product of the two vectors provided by actors A and B. We could make
the system into a galaxy, and provide a different pair of input vectors for each call of this galaxy.
FIGURE 36. Buffer usage for two iterations of an MDSDF system with constrained delays, where the column size of the buffer is a multiple of the column size the buffer would have if there were no delays.
[Figure: delay (1,2) on the arc from actor A, producing (2,2) blocks, to actor B, consuming (4,4) blocks; 5-row by 6-column buffer grids shown for Iteration 1 and Iteration 2.]
We can see that the source actor produces submatrices that are always subsets of the buffer
space. If the column size of the buffer is increased by a multiple of the original column size of the
[Figure: 7-row by 7-column buffer grids shown for Iteration 1 and Iteration 2.]
FIGURE 34. Buffer usage in two iterations of an MDSDF system with delays.
Notice how in the second iteration, the submatrices for firings B[0,2] and B[1,2] are no
longer proper subsets of the buffer space. Similarly, firing A[0,6] will produce data into a subma-
trix that wraps around the boundary of the buffer space. In order to support such modulo address-
ing in the submatrices, their design would need to be much more complex, and the methods to
access each entry of the submatrices would be much slower. These problems also exist in the first
finite block definition we gave previously, but not in the second definition given above where the
delay block size was a multiple of the input block size.
In an attempt to simplify the system and especially to keep the implementation of the sub-
matrices as fast and efficient as possible, we chose not to support modulo addressing. We wanted
submatrices to always access proper subsets of the buffer space. In order to do this, we had to
adopt a constraint such that the number of column delays specified must always be a multiple of
the column dimension of the input to the arc with the delay. This causes the column delays to behave like initial firings of the source actor onto the buffer space, and results in the submatrices used by the source actor always fitting as proper subsets of the buffer space. Unfortunately, this constraint is not sufficient to guarantee that the destination actor will use a submatrix that is a proper subset of the buffer space.
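The two rules discussed in this section can be sketched with a pair of small helpers (hypothetical names, and a plausible reading of the padding rule rather than the exact Ptolemy computation): column delays must be a multiple of the input block's column dimension, and the delayed buffer's column count is rounded up to a multiple of the no-delay buffer's column count.

```cpp
#include <cassert>

// Hypothetical sketch: a column delay is legal only when it is a multiple
// of the column dimension of the input block on that arc.
bool columnDelayAllowed(int delayCols, int inputBlockCols) {
    return delayCols % inputBlockCols == 0;
}

// Hypothetical sketch of the padding rule: the needed column count
// (original columns plus the delay offset) is rounded up to the next
// multiple of the original column count, so every submatrix remains a
// proper subset of the buffer.
int delayedBufferCols(int originalCols, int delayCols) {
    int needed = originalCols + delayCols;
    int mult = (needed + originalCols - 1) / originalCols;  // ceiling divide
    return mult * originalCols;
}
```

For instance, a one-column delay on a two-column buffer needs three columns, which rounds up to four, i.e. twice the original column size, matching the doubling seen in the Figure 40 example.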
An additional constraint was needed, such that the number of columns in the buffer with
delays is always a multiple of the number of columns of the original buffer with no delays. This is
because there are instances where the source or destination actor works on the entire original
buffer space, thus increasing the number of columns in the buffer only by the number of column
We notice that, similar to what happens with delays in SDF, there is left-over data in the buffer that will never be consumed, and the buffer size must be large enough to accommodate this extra data. In the row dimension, the delay causes the last row of data produced by the source actor never to be consumed. Currently, we simply enlarge the buffer by the number of row delays to give the producer a place to put the data generated. We could discard the data after this, or it might even be possible to discard it immediately when it is created so that we do not have to buffer it, but this would require the submatrix of the producer to be smart enough to know that the data being generated should be discarded. We feel the cost of this modification is not worth the savings at this time. The extra column data left unconsumed in the first iteration by column delays cannot be discarded in this way because subsequent iterations would consume it.
As we just showed, the column delays also increase the number of columns needed in the
buffer, but this increase in column size results in much more complex problems than the increase
in row size caused by the row delays. The problems have to do with determining how much to
increase the column size of the buffer. If we simply increase the number of columns of the buffer
by an amount equal to the number of column delays (the method used for the row delays), we
encounter a problem that has to do with the implementation of the submatrices used to access sub-
interpretation actually changes the schedule generated for the system. Again, this definition may
be useful in some cases, but we felt that it was not the “correct” extension of SDF delays since
SDF delays do not change the number of times an actor is repeated in each iteration period
(although delays might cause some data generated by an actor to be unused and left on the queue).
The last definition we present is the one presented in [1] and is the one we have adopted in
our implementation. This interpretation of two-dimensional delays is one in which the delay
dimensions cause a two-dimensional offset of the data generated by the source actor relative to
the data that is consumed by the destination actor. This is similar to considering the two-dimen-
sional delay specifications as boundary conditions on the data space. The two-dimensional speci-
fication of the delay, (Nrow delays, Ncolumn delays), is interpreted such that Nrow delays is the number
of rows of initial delay values and Ncolumn delays is the number of columns of initial delay values.
Although it is possible in SDF to specify non-zero initial values for delays, in the current imple-
The notation we use for specifying a two-dimensional delay is similar to how we specify
the portholes of a MDSDF actor. This is seen in Figure 30, in which we have specified the delay
[FIGURE 30: actor A, producing (2,2) blocks, connected to actor B, consuming (3,3) blocks, with a delay of (1,1) on the arc.]
distorts the data space so that it is even unclear how the data from subsequent firings of actor A
should be placed in the data space. Although a limited definition (where we limit the dimensions
of the delay to be some multiple of the input dimensions) of such finite block delays might be use-
ful in some cases, we do not think this is the “correct” definition of multidimensional delays.
Another possible way to define 2-D delays is as multiples of the input dimensions. In SDF, delays were a count of how many initial particles to place on the arc, so if we consider MDSDF actors to produce arrays, we might consider delays to be a count of the number of initial arrays. This definition would be similar to the previous one when we limit the delay dimensions to be multiples of the input dimension. For the previous system, the data space would look like the diagram in
4.3 Delays
Delays are a common feature in one-dimensional signal processing systems, but their extension to multiple dimensions is not trivial and can cause many problems for both scheduling and buffer management. In one-dimensional SDF, delays on an arc are usually implemented as initial particles in the buffer associated with that arc. The initial particles act as offsets in the data stream between the source and destination actors, as shown in Figure 29. Effectively, the output of actor A has been offset by the number of particles set by the delay.
[Figure: an SDF arc from actor A (producing 2 particles) to actor B (consuming 3 particles) with 2 delays; the initial particles offset B’s reads B0, B1, B2 relative to A’s output.]
Unfortunately, the extension to more than one dimension is not so simple. In our attempts
at implementing multidimensional delays, we were at first uncertain how to even define them. We
see at least two ways to interpret the meaning of a delay on a multidimensional arc, and we have
adopted the definition that seems more logical and attractive to us, but we still had to limit its
functionality to aid us in implementation. It is not yet clear to us whether our definition is the
“correct” one, but more experience in using MDSDF to model real problems should settle the
matter. For now, we will present the various alternative definitions and go into more detail about
the definition we have adopted. We will explain some of the problems we found in implementing
our definition and the restrictions we had to place on it to simplify our implementation.
[Figure: firings A[0,0] and B[1,0] shown mapped onto the data space.]
that firing A[0,0] produces data that correspond to buffer locations d[0,0], d[0,1], d[1,0], d[1,1],
where d represents the two-dimensional buffer. Similarly, firing B[1,0] requires that buffer loca-
tions d[0,3], d[0,4], d[0,5], d[1,3], d[1,4], d[1,5], d[2,3], d[2,4], d[2,5] all have valid data before
it can fire. We can also tell that firing B[1,0] requires firings A[0,1], A[0,2], A[1,1], and A[1,2] to pre-
cede it. The problem is how to determine such dependencies quickly, without resorting to a two-
dimensional state-space search to verify that the required data buffer entries are available. In a
single processor scheduler, given the simplifications we mentioned before based on the fixed row-
by-row execution order of firings, the problem is solved by simply keeping a pointer to the loca-
tion of the last “valid” row and column in the buffer. Any row above the last valid row (lvr) is assumed to have its data already filled by the source star, and any column to the left of the last valid column (lvc) is similarly assumed to be valid.
For example, after firing A[2,1], lvr = 5 and lvc = 3 (see Figure 28). To check whether fir-
ing B[0,0] is runnable, we simply check the location of lvr and lvc. We know that actor B expects
(3,3) blocks of data, and since this is the [0,0]th firing, we need lvr >= 2 and lvc >= 2. Similarly,
firing B[1,1] would not be runnable in this example since we need lvr >= 5 and lvc >= 5.
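This test can be sketched as a small helper (hypothetical name): firing [i,j] of a star consuming (blockRows, blockCols) blocks needs the buffer to be valid through row (i+1)·blockRows − 1 and column (j+1)·blockCols − 1.

```cpp
#include <cassert>

// Hypothetical sketch of the last-valid-row / last-valid-column runnability
// test used by the single processor MDSDF scheduler.
bool firingRunnable(int i, int j, int blockRows, int blockCols,
                    int lvr, int lvc) {
    int neededRow = (i + 1) * blockRows - 1;  // last row the firing consumes
    int neededCol = (j + 1) * blockCols - 1;  // last column the firing consumes
    return lvr >= neededRow && lvc >= neededCol;
}
```

With lvr = 5 and lvc = 3 and B consuming (3,3) blocks, B[0,0] needs lvr >= 2 and lvc >= 2 and is runnable, while B[1,1] needs lvr >= 5 and lvc >= 5 and is not, matching the example above.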
[FIGURE 26: actor A, producing (2,2) blocks, connected to actor B, consuming (3,3) blocks.]
For example, consider the universe of Figure 26. Using the techniques from the previous
section on calculating the row and column repetitions, it is easy to determine that actor A needs to
be fired {3,3} times and actor B {2,2} times for one complete iteration. Since actor A can fire a
total of nine times, we will schedule it to do so immediately, before the four firings of actor B.
Using the row-by-row scheduling rule we mentioned above, we schedule the first three row fir-
ings of actor A, starting from firing A[0,0] and incrementing in the column dimension, and then
proceed to the next two rows. At completion of scheduling, the schedule that our simple single
processor MDSDF scheduler generates is
A[0,0]A[0,1]A[0,2]A[1,0]A[1,1]A[1,2]A[2,0]A[2,1]A[2,2]B[0,0]B[0,1]B[1,0]B[1,1].
From the experience of using our MDSDF scheduler on systems with large two-dimensional rate
changes, it became clear that a shorthand notation for such a schedule is needed because there are
often many firings of each actor per iteration (especially for systems like image processing). For
the single processor case, when we know that there is a specific order of firings, we can use the
shorthand notation A[0,0]-[2,2]B[0,0]-[1,1] to represent the above schedule. We still have the problem
of determining when the destination actor can fire. In the one-dimensional SDF case, the solution
was to simply count the number of particles on the buffer between the actors. In the previous
example, actor B was runnable when the buffer had enough particles, and when it fired, it would
remove the first NB particles from the buffer. The seemingly simple extension to working on a
two-dimensional data stream actually results in a quite complex problem. We cannot simply talk
about “when is star B runnable?” We need to talk about a specific instance of the firing of star B,
like “when is the instance of B[0,0] runnable?” This is because of the fact that the buffers between
MDSDF actors can no longer be represented as simple FIFO queues and each firing of a MDSDF
star has a fixed block of data that it needs to produce or consume, depending on its firing index.
The difference between the two SDF schedules has to do with the fact that the second
schedule defers the last firing of actor A when it realizes that actor B was runnable after the first
two firings of actor A. This “smarter” schedule has the advantage of being able to use a smaller
buffer between the two actors. For the example above, the first schedule requires a buffer of size
six, while the second schedule requires a buffer of size four. There is a cost in using the second
schedule that has to do with the fact that the first schedule can be written so that it uses less memory for the code than the second schedule. This is because the first schedule can be expressed as a
executes three times and the code for actor B is placed inside a loop that executes twice. If we try
to loop the second schedule, the best we can do is A2(AB), which requires us to repeat the code
for actor A an extra time (note that in real DSP systems, code for modules is often repeated rather than called as functions, since function calls are slower and take stack memory as well).
Considerable work has been done on how to schedule SDF graphs to minimize the two often
opposing criteria of code size and buffer size [5,6].
The critical problem to solve in generating any schedule is knowing when the destination
actor has enough data to fire. This is not too difficult a problem to solve in the SDF case where all
buffers are modeled as FIFO queues. A simple scheduler for SDF graphs simply keeps track of
the number of particles at the input to an actor. If an actor has no inputs, then it is always runnable
and can be added to the schedule. So, source actors are always runnable. Otherwise, the only con-
dition for an SDF actor with inputs to be runnable is that there are enough particles on each of its
input buffers to satisfy the number required. Thus, an SDF scheduler can determine when an actor
is runnable simply by keeping track of the number of particles on the buffer.
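The particle-counting rule can be sketched for a single arc as follows (hypothetical names; a real SDF scheduler tracks a count per arc across the whole graph):

```cpp
#include <cassert>

// Hypothetical sketch: the SDF scheduler models each arc as a FIFO and only
// needs its particle count to decide runnability.
struct ArcSketch {
    int particles = 0;
};

// A consuming actor is runnable once each input arc holds enough particles;
// a source actor (no inputs) is always runnable.
bool runnable(const ArcSketch& in, int consumes) {
    return in.particles >= consumes;
}

void fireProducer(ArcSketch& out, int produces) { out.particles += produces; }
void fireConsumer(ArcSketch& in, int consumes)  { in.particles -= consumes; }
```

This is the simplicity MDSDF loses: a single integer per arc suffices in SDF, whereas MDSDF must reason about which two-dimensional block each specific firing index consumes.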
In SDF, it is possible to specify a system such that its balance equations have no integer
repetition solutions. This situation is called sample rate inconsistency [4]. An example of such a
system is shown in Figure 23. Since actor A has a one-to-one production/consumption ratio with
[FIGURE 23: an SDF system with actors A, B, and C; A feeds both B and C with one-to-one rates, while B produces 2 particles per firing on an arc from which C consumes 1.]
actors B and C, they should have the same number of repetitions in one iteration period. Unfortu-
nately, actor B produces twice as many particles per firing as actor C consumes, which implies
that actor C should fire twice as often as actor B in one iteration. Thus, there is an inconsistency in
the number of repetitions for each actor in one iteration.
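The balance check underlying this inconsistency can be sketched as follows (hypothetical names; a real scheduler solves for the repetition vector rather than testing candidates). An arc is balanced when the source repetitions times its production rate equals the destination repetitions times its consumption rate; a sample-rate-inconsistent system admits no positive integer vector that balances every arc.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch: one balance equation per arc,
// r[src] * produced == r[dst] * consumed.
struct BalanceArc {
    int src, dst;        // actor indices
    int produced, consumed;
};

// Check whether a candidate repetition vector balances all arcs.
bool balances(const std::vector<BalanceArc>& arcs, const std::vector<int>& r) {
    for (const BalanceArc& a : arcs)
        if (r[a.src] * a.produced != r[a.dst] * a.consumed) return false;
    return true;
}
```

For the system of Figure 23 (A→B at 1:1, A→C at 1:1, B→C at 2:1), satisfying the A arcs forces all repetitions equal, which then violates the B→C equation, so no candidate balances all three arcs.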
It is also possible to specify MDSDF systems with sample rate inconsistencies. The user needs to be even more careful when specifying MDSDF systems because it is possible for sample rate inconsistencies to occur in both dimensions. An example of an MDSDF system with sample rate inconsistencies is shown in Figure 24.
[FIGURE 24: an MDSDF system with sample rate inconsistencies in both dimensions; actors A, B, and C exchange blocks of sizes such as (2,1), (1,2), and (2,1) that admit no consistent integer repetition solution.]
A related problem is when a user defines a non-executable system due to insufficient data
on an input for the first iteration. This situation, which we term a deadlock condition, can occur in
systems with feedback, as shown in the SDF system of Figure 25. For the first firing of actor A, it
[FIGURE 25: an SDF feedback system in which actor A and a Fork are connected in a loop with unit rates; without an initial delay on the loop, A can never fire.]
In Figure 21, the system has only one arc, so there is only the single balance equation.
[FIGURE 21: actor A, producing NA particles per firing, connected to actor B, consuming NB particles per firing.]
rA NA = rB NB
The unknowns rA and rB are the minimum repetitions of each actor that are required to maintain
balance on each arc. NA and NB are the number of output and input particles produced and con-
sumed by actors A and B respectively. The scheduler first calculates the smallest non-zero integer
solutions for the unknowns, which we saw to be rA = 3 and rB = 2 for the universe of Figure 8.
The MDSDF extended universe differs because we no longer consider the arcs connecting
the actors to be a FIFO queue but rather a two-dimensional data space. We adopt a similar defini-
tion of an iteration for the MDSDF case such that at the end of one iteration, the consumption of
data should be balanced with the production so that all buffers are returned to the same state as at
the beginning of the iteration. In terms of repetitions, this definition involves a simple extension
so that there are now two sets of balance equations, one for each dimension:
rA,rows NA,rows = rB,rows NB,rows
rA,cols NA,cols = rB,cols NB,cols
Each equation can be solved independently to find the row repetitions and column repeti-
tions for each actor. We consider this two-dimensional repetition specification to represent the
number of row firings and the number of column firings for that actor in one iteration. We use the
curly brace notation {row firings, column firings} to denote the repetitions of a MDSDF actor.
The product rowfirings × columnfirings gives us the total number of repetitions of that actor
in one iteration period.
Many of the problems in developing a workable MDSDF specification concern the task of scheduling an MDSDF system. Part of the complexity of implementing MDSDF is that so many of the issues are interrelated: a design decision in one area has a major impact on many others.
We will present the discussion by scheduling topic, first summarizing how the problem is
defined and solved in SDF, and then presenting the MDSDF definition and solution. This discus-
sion will be more formal than what we presented in Section 2.0. The reader is referred to
[3],[4],[5] for a more complete presentation of SDF topics.
FIGURE 18. An SDF implementation of the 2-D FFT that reveals the data parallelism only awkwardly.
For example, Figure 19 shows an MDSDF system that implements a two-dimensional FIR filtering system [7][8]. We use a very small image size so that we can show the data space diagram more easily in Figure 20. Here, we show that the designer has the ability to choose dif-
The first example computes the two-dimensional Fast Fourier Transform (FFT) of an image. One easy way to compute a two-dimensional FFT is by
row-column decomposition, where we apply a 1-D FFT to all the columns of the image and then
to all the rows [7][8]. This simple concept is straightforwardly expressed in MDSDF as we see in
the figure. The diagram shows how we can use the graphical hierarchy of Ptolemy to implement
the 2-D FFT as a module made of the two 1-D FFT components. The 1-D FFT stars of the 2-D
FFT galaxy are identical, except that we have specified the inputs and outputs to work along the
columns and rows of the image, respectively.
We could describe something similar in SDF, but we would be limited to either working
with the entire image (as in Figure 17) or adding a series of matrix-vector conversions and trans-
FIGURE 17. A monolithic SDF implementation: an Image star passes a single particle holding a 256x256 matrix to a 2D FFT star, whose output particle (also a 256x256 matrix) goes to a viewer.
positions to manipulate the 1-D vectors to the correct orientation (as shown in Figure 18). The
first alternative is not very attractive because we would not be able to take advantage of the data
parallelism in the algorithm for multiprocessor scheduling, especially the data parallelism that the
MDSDF system reveals. The second alternative is also unattractive because it is quite cumber-
some and awkward to have all the data manipulation stars that do not really contribute to under-
standing the algorithm. The two-dimensional image, considered in SDF as a single monolithic
matrix, needs to be converted to a series of vectors so that we can apply the 1-D FFT star on the
rows. Then, the vectors must be collected again into a large data block and then transposed and
Such a delay, specified as (1,0), is defined to have one row and no columns. This implies that the entire first row of the data space is set to the
initial value of zero. Thus, at every iteration, the Add actor will have its upper input reset, which
is equivalent to resetting the output result C at the beginning of each iteration. This example
shows one of the features of our interpretation of two-dimensional delay specifications as infinite
along a row or column.
In the last chapter, we introduced how MDSDF can reveal data parallelism in a system. We now present a couple of more interesting examples from the field of two-dimensional signal processing.
C = 0;                        /* accumulator initialized only once */
for (counter = 0; counter < iterationCount; counter++) {
    for (i = 0; i < 4; i++) {
        C += A[i] * B[i];     /* C carries over between iterations */
    }
}
FIGURE 13. C++ code for vector inner product SDF system.
Suppose we would like to make this into a module such that each time the system is run, it computes the inner product of two four-entry vectors. The problem is that, because of the stream orientation of the system, there is no way to reset the accumulator output C. A second iteration of the system would have C accumulate the sum of the inner product of the first pair of vectors with the inner product of the second pair of vectors.
One possible way to make the system do what we desire is if we could somehow reset the
delay at every iteration. A delay is usually considered to be an initial particle on the arc and we set
its value to be zero. This is how the first iteration computes the inner product correctly because it
essentially sets the initial value of C to be zero. If we could have the delay insert another initial
particle at every iteration, this would achieve the functionality we desire. To do this in SDF, we
often had to resort to various tricks to hardwire a reset to actors or delays in order to implement
this controlled reset of nested loops.
MDSDF can implement such functionality by using the fact that successive iterations are
along a new column in the data space. By using our definition of a delay as an entire row or col-
umn of initial values in the data space, we can implement the inner product function as shown in
Figure 14. Here, all the input/output specifications of the actors in the SDF version have been
augmented with a second dimension. The specification of the second dimension in most of these extensions has been set to one, which implies a trivial use of the second dimension. It is primarily the specification of the two-dimensional delay, and the implicit use of a new column for each successive iteration, that makes this system different. The effect of the two-dimensional delay is best illustrated by a diagram of the data space buffer for the arc containing the delay.
FIGURE 11. Precedence graph and data distribution for system of Figure 10.
Because each block of data produced by actor A is arranged as a column of the data space, the two output values of each firing of actor A are distributed between the two firings of actor B. So even though the actors in the SDF and MDSDF systems produce and consume the same numbers of data values, and the schedules for the two systems are similar in that actor A fires three times and actor B fires twice in both, the data distribution of the two systems is quite different. Note that the MDSDF model is more general, since it can express the dataflow of the SDF system by varying one of the dimensions and keeping the other fixed at one. We could also express the precedence graph of Figure 11 in SDF, but we would have to lay out the system exactly as shown, using five nodes connected exactly as in Figure 11. This makes it clear that MDSDF is a more expressive model of dataflow, able to express a larger set of systems more compactly than SDF.
FIGURE 12. An SDF system for computing a vector inner product.
FIGURE 8. An SDF system in which actor A produces two particles per firing and actor B consumes three.
We can formalize this more clearly by looking at the precedence graph and the distribution
of data for the above system. These are shown in Figure 9. Since the arc connecting the two actors
FIGURE 9. Precedence graph and data distribution for the system of Figure 8.
is considered to be a FIFO queue, the particles produced by successive firings of actor A are consumed in order by actor B, as shown in both the precedence graph and the data distribution diagram. The data distribution diagram is similar to the two-dimensional data space buffer diagrams we have shown for MDSDF systems, but here the buffer is only a one-dimensional stream. The leftmost entry, labeled d0, is the first particle in the stream; therefore, d0 and d1 are the first two particles generated by the first firing of actor A.
Figure 10 shows a possible MDSDF extension of the previous system. Again, actor A pro-
duces two data values each time it fires and actor B consumes three, but the extra information
inherent in the dimensions specified for their portholes results in a much different distribution of
data between the two actors.
FIGURE 10. A possible MDSDF extension of the previous system: actor A produces (2,1) blocks and actor B consumes (1,3) blocks. A second diagram shows the same system with a (1,1) delay on the arc.
to buffer locations d[1,1], d[1,2], d[2,1], d[2,2]. We will discuss the effects of two-dimensional
delays on scheduling, and the other complexities they introduce, in Section 4.0. We note that another possible interpretation of a two-dimensional delay specification is simply as one fixed-size data block with the given dimensions, instead of an infinite stream along each dimension.
We feel that our interpretation is the proper extension of SDF delays and has some useful advan-
tages over other interpretations, as we shall show in the next chapter.
Note that the firing index of an actor is directly associated with a fixed location in the data space, but the two are not exactly equivalent: we need to know the size of the blocks produced or consumed by the actor to determine the exact mapping between a firing instance of the actor and its corresponding region of the data space.
Additionally, an important feature about the above firing sequence is the fact that the two
sets of firings for actor A and actor B could have clearly been scheduled for parallel execution. In
other words, we can see from the data space diagram that the three firings of actor A are indepen-
dent and can be executed in parallel. Similarly, once all three firings of A are complete and the
data they produce are available, the two firings of actor B are also data independent and can be
scheduled for parallel execution. We will give more examples of this important aspect of MDSDF
in the next chapter.
For a second iteration of the schedule, we can see in Figure 5 that the data space of the
second iteration is laid alongside the data space of the first, incremented along the column dimen-
sion. This was a design decision, to increment along the column dimension rather than the row
dimension. We even considered defining a two-dimensional iteration count, so that we could iter-
ate in both dimensions. We do not know if this latter definition is needed, and all the systems we
have implemented thus far have been definable using just the column incrementation definition of
a schedule iteration. One issue that is clear is the fact that if there are no delays in the system and
there are no actors in the system that require access to “past data” (delays and accessing past data
will be described next), then each iteration is self-contained, in the sense that all data produced is
consumed in the same iteration. The next iteration of the schedule can reuse the same buffer space
as the previous iteration, so the buffer can be of constant size. So although the index of the data
increases as the firing indices increase for each iteration, we do not need an ever increasing buffer
to represent the data space. This is essentially a two-dimensional extension of static SDF buffer-
ing (see [5] for a discussion of static one-dimensional SDF buffering). The index space increases
in the column dimension for each iteration, but the actual buffer is from the same memory loca-
tions.
The last two basic features of MDSDF that we must explain deal with dependency of an
actor on data that is “before” or “after” in the two-dimensional data space. In SDF, the model of
interpreting the arcs as FIFO queues implies an ordering of where particles are in time. Therefore,
we could discuss how stars could access data in the “past.” In MDSDF, since one of our main
goals is to take advantage of multiprocessor scheduling, we do not impose a time ordering along
the two dimensions of the data buffer for one iteration (note that there is an ordering between the
data of successive iterations). Therefore, for lack of a better term, we use “before” or “past” and
“after” or “future” in each dimension to refer to data locations with lower or higher index, respec-
tively, in each dimension. So data location d[0,0] is before d[0,1] in the column dimension but not
the row dimension.
FIGURE 4. A possible MDSDF extension of the SDF system of Figure 1: actor A produces (2,1) blocks and actor B consumes (1,3) blocks.
Figure 4 shows a possible MDSDF extension to the SDF system of Figure 1. Actor A still
produces two data values, but they are now considered to be arranged as a block that has dimen-
sions of two rows and one column. Similarly, actor B still consumes at each firing three data val-
ues, but these three values are required to be structured as a block with dimensions of one row and
three columns. The underlying data space for this system would look like:
                     columns
             0     1     2     3     4     5   ...
   rows  0  [     B[0,0]     ][     B[0,1]     ]
         1  [     B[1,0]     ][     B[1,1]     ]
            |-- Iteration 1 --|-- Iteration 2 --|
Here, the figure shows how the underlying data space has two rows and many columns. First look
at the section marked as Iteration 1. This section of the data space is of size two rows by three col-
umns, which is the lowest common multiple of the row and column dimensions of the two actors
in Figure 4. The first firing of actor A, which we denote with a firing index using square brackets,
is A[0,0] (note the starting index in each dimension is zero), and is mapped to the data space as a
two row by one column block at location d[0,0] and d[1,0], where d represents the underlying
data space. We notice that since actor B needs data blocks that have three columns, the only way actor A can fulfill such a demand is by firing two more times along the column dimension. These two firings are denoted A[0,1] and A[0,2], and their associated data blocks are the two columns next to that of firing A[0,0]. Once the three firings of A have produced the data, now considered as a
In MDSDF, the graphical notation is extended by adding an extra dimension to the input/output specifications of each porthole of a star. An MDSDF star in our current two-dimensional implementation has input and output portholes with two numbers that specify the dimensions of the data they consume or generate, respectively. These specifications are given as a (row, column) pair, and we use parentheses to denote this pair. For example, Figure 3 shows an MDSDF star that has one output that generates data with dimensions of two rows by one column.
FIGURE 3. An MDSDF star with a (2,1) output specification.
Unlike the SDF case, which can support two-dimensional data objects using the Matrix class, the data generated by an MDSDF star is not a self-contained monolithic structure but is considered part of an underlying two-dimensional indexed data space. SDF is able to transmit two-dimensional data objects, such as matrices, using the MatrixParticle construct. However, these data objects are of fixed size, and all actors working on the data stream must be aware of the size of the object (usually through parameters of the star) and can only manipulate each particle of the stream individually. The input/output specifications of an MDSDF star, on the other hand, simply give directions on how to arrange the data consumed or produced by the star. For an output data block, once the data has been generated it no longer has a fixed-size structure, and the system is free to rearrange or combine data generated from multiple firings of the source star into a differently sized data block.
Another way of looking at the specification of the dimensions of the data generated or consumed by an MDSDF star is to consider the specification as the size of a window into an underlying data space.
FIGURE 1. A simple SDF system with its components labeled: nodes (actors/stars) A and B, particles, and portholes. Actor A produces two particles per firing and actor B consumes three.
Actors are connected together by arcs that represent FIFO queues. The arcs are attached to
an actor at a location called a porthole. An actor can have more than one input or output porthole.
The numbers along the arc connecting the two actors specify the number of particles generated or
consumed by each star every time it executes (also called a star firing in Ptolemy). In the above
example, actor A generates two particles at each firing and actor B consumes three particles.
The fact that the number of inputs and outputs for every actor in a SDF system is known at
compile time gives the scheduler of the SDF domain (note that SDF is just one model of computa-
tion supported by Ptolemy, each of which is called a domain) the ability to generate a compile-
time schedule for simulation and code generation purposes. This schedule is called a periodic
admissible sequential schedule (PASS). A PASS is a sequence of actor firings that executes each
actor at least once, does not deadlock, and produces no net change in the number of particles on
each arc. Thus, a PASS can be repeated any number of times with a finite buffer size, and more-
over, the maximum size of the buffer for each arc is a constant that is determined by the exact
sequence of actor firings in the schedule. We call each of these repetitions of the PASS an itera-
tion.
SDF systems also support the concept of feedback and delays. A delay is depicted by a
diamond on an arc, as shown in Figure 2. The delay is specified by an integer whose value is
interpreted as a sample offset between the input and the output. It is implemented simply as an initial particle on the arc between the two actors, so that the first particle consumed by actor B when it fires is the value of the delay (most often this value is zero, but Ptolemy allows the user to give the delay a different initial value).
FIGURE 2. An SDF system with a delay, depicted as a diamond, on the arc between actors A and B.
This report discusses some of the issues that arose during the development of a multidi-
mensional synchronous dataflow (MDSDF) domain in Ptolemy. The initial goal was to implement
support for a two-dimensional extension of the synchronous dataflow (SDF) domain that could
simulate MDSDF systems on a single processor system. Therefore, throughout this paper, the
term MDSDF will most often refer only to a two-dimensional implementation, although we hope
that many of the ideas can be generalized to higher dimensions. In implementing a simulation
environment running on a single processor machine, we made a number of simplifying assump-
tions, which we will explain in this paper. We will also discuss some of the difficulties we foresee
in implementing a full multiprocessor version.
Because MDSDF is closely related to one-dimensional SDF, we will contrast the two models throughout this report. Chapter 2 will explain the graphical representation used
for SDF in Ptolemy and the terms we use to describe the components of an SDF system. We will
also introduce the graphical notation of MDSDF and explain how the two differ. Chapter 3 will
present the features of MDSDF with a series of example systems. Chapter 4 will discuss in more
detail the attributes of an MDSDF system and the problems in implementing a simulation domain.
Chapter 5 will discuss the low-level implementation issues involved in the creation of the
MDSDF simulation domain in Ptolemy, covering design issues such as data representation, buff-
ering, schedule representation, and writing stars for the MDSDF domain. Chapter 6 will conclude
with a summary of what has been accomplished and the areas that still need to be worked on.
In SDF and other graphical models of one-dimensional dataflow, the data transferred
between functional blocks (or actors) is of simple form, i.e. a single value that can be a floating-
point number, an integer, a fixed-point number, or a complex number. In Ptolemy, these values are
June 6, 1994