Assessing StreamIt SPADE

Stream Programming Language: Assessing StreamIt and SPADE
Thang M. Le
David R. Cheriton School of Computer Science
University of Waterloo
thang.le@uwaterloo.ca

ABSTRACT
Stream programming languages offer an appealing approach: they expose the parallelism in stream programs at the language level and leave all optimization and parallelization to the compiler. StreamIt and SPADE are two representatives of this approach. Although the two languages share some common ideas, they differ in their compilers and optimizations. These similarities and differences make an interesting case study. Using Kahn Process Networks as a guideline, we assess the two languages in terms of their language designs, compilers and optimizations.

Categories and Subject Descriptors
D.3.3 [Programming Languages]: Language Constructs and Features - Stream programming structure. D.3.4 [Programming Languages]: Processors - Compilers, Optimization.

General Terms
Algorithms, Languages, Theory

Keywords
StreamIt, SPADE, stream programming language, parallelism, Kahn Process Networks

1. INTRODUCTION
Stream processing is a broad field spanning several research areas, including dataflow systems, reactive systems, synchronous concurrent algorithms, signal processing systems and real-time systems [12]. A stream is defined as an infinite sequence of data. A stream program continuously performs computations as data arrives and produces the same output stream whenever it is given the same input stream. This is the determinacy constraint of stream programs. At the same time, because stream programs are computationally intensive, their computations must be executed in parallel to produce output efficiently. These characteristics make it challenging to design stream programming languages and compilers that exploit the parallelism in stream applications.

Although a general theory of stream processing has not been studied in depth, there has been much notable research on parallel processing techniques for data streams. Researchers at Stanford have built STREAM [2], a data stream management system (DSMS) which uses the Continuous Query Language to facilitate parallel execution of queries. STREAM is useful for demand-driven systems operating on stream data. However, for data-driven stream processing, where an application must perform computationally intensive tasks on a data stream, a DSMS is less useful. Examples of such applications are sound processing, image decoding and cryptographic encryption. For these applications, the stream programming paradigm offers an appealing approach: it exposes the parallelism in stream programs at the language level and leaves all optimization and parallelization to the compiler. StreamIt and SPADE are two representatives of this approach.

StreamIt [13] is a stream programming language based on the Java language. StreamIt differs from other stream programming languages in its basic constructs, which expose the parallelism and communication of a streaming application without depending on the topology or granularity of the underlying architecture. Various optimizations are performed by the StreamIt compiler, which targets the Raw architecture [14]. Among commercial software products, SPADE [3], an IBM stream programming language, shares many features with StreamIt. The language has a richer set of built-in features to meet the demands of its high-end users. Programs written in SPADE are executed on System S, a large-scale distributed data stream processing middleware.

Even though the two languages share some common ideas, they differ in their compilers and optimizations. These similarities and differences make an interesting case study. In order to evaluate the two languages properly with respect to stream processing, we use Kahn Process Networks (KPNs) to guide our study. KPN is a well-known model in stream processing research. It possesses some attractive properties which we will use to discuss the correctness of a stream programming language.

Our study starts with a brief discussion of Kahn Process Networks in section 2. In section 3, we use KPN to assess various aspects of the two languages. Section 4 studies the different approaches the two languages take to optimization and parallelization. We conclude our study in section 5.
2. Kahn Process Networks

The study of stream processing languages is closely related to the study of parallel computation, and specifically to parallel programming languages. The main interest is the ability to exploit parallelism in stream programs. Kahn's simple language for parallel programming [7] has become an important model in this domain. In his semantics, a program is mapped to a network of processes connected by channels. This network is known as a Kahn Process Network (KPN). In this network, a process is a computing station which executes a sequential program. A channel is a unidirectional communication line connecting two processes. Data is transmitted through a channel on a First-In First-Out basis. Transmission is assumed to be reliable, and the computing stations are assumed to have an unbounded amount of memory. A number of constraints must be satisfied in order to construct a KPN correctly:
(1) Channels are the only means of transferring data between two processes. This constraint enforces the share-nothing policy in KPN.
(2) Two processes must transmit data over a dedicated channel.
(3) At any given time, a process is either computing or waiting for input on one of its input lines. This means that reading from an empty channel blocks the process until sufficient data arrives, while writing to a channel is non-blocking.
(4) The function computed by a process must be a continuous function [7].
It was proved by Lynch and Stark [10] that determinate programs compute continuous functions. Programs that compute functions built up by recursive definition from base functions [11] are determinate; hence, these programs immediately satisfy the last constraint.

Using the fixed-point equation principle (Kahn's principle) [7], Kahn proved that a KPN is determinate. Determinacy is a very strong property which is often used to reason about the correctness of a parallel programming language. Readers can refer to [8] for the formal definition of determinacy and its importance in parallel computation. Monotonicity and continuity follow immediately from Kahn's proof. These three properties of KPN (determinacy, monotonicity and continuity) constitute the correct behavior of a KPN: they guarantee that its processes can be executed in parallel while the system still produces continuous and correct output. On this basis, we consider a parallel programming language sound if it strictly follows KPN. In the next section, we evaluate the correctness of the StreamIt and SPADE languages based on KPN.
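The channel discipline described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not Kahn's formal language: threads stand in for processes, unbounded `queue.Queue` objects stand in for FIFO channels, and the `DONE` marker, `producer`, `adder` and `run_network` are hypothetical names introduced here so that the example terminates (Kahn's streams are unbounded).

```python
import queue
import threading

DONE = object()  # hypothetical end-of-stream marker, only so the demo terminates


def producer(values, out):
    """A process with no inputs: writes each value to its output channel."""
    for v in values:
        out.put(v)  # writing never blocks because the queue is unbounded
    out.put(DONE)


def adder(left, right, out):
    """Reads one token from each input channel (blocking) and writes the sum."""
    while True:
        a = left.get()   # blocks until a token is available on this channel
        b = right.get()  # the process cannot test a channel for emptiness
        if a is DONE or b is DONE:
            out.put(DONE)
            return
        out.put(a + b)


def run_network():
    """Builds a three-process network and returns the tokens on the output."""
    # One dedicated FIFO channel per producer/consumer pair.
    c1, c2, c3 = queue.Queue(), queue.Queue(), queue.Queue()
    processes = [
        threading.Thread(target=producer, args=([1, 2, 3], c1)),
        threading.Thread(target=producer, args=([10, 20, 30], c2)),
        threading.Thread(target=adder, args=(c1, c2, c3)),
    ]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    result = []
    while True:
        item = c3.get()
        if item is DONE:
            return result
        result.append(item)
```

Because every read blocks and no process shares state with another, the output of `run_network()` should not depend on how the threads happen to be scheduled, which is the determinacy property discussed above.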
C++ and Java. SPADE uses value semantics, because it is more natural and more efficient to treat stream data as pure copies than as objects with an identity in memory [6]. In addition, the language is required to interact with higher-level programming tools and other languages, such as the System S IDE and StreamSQL, as well as with the lower-level System S programming APIs. Currently, the SPADE toolkit has built-in support for all fundamental stream-relational operations. These operators can be used to implement any relational query. For non-relational operations, SPADE supports user-defined operations, which extend the basic operators with user-specific implementations.

In SPADE, an operator is the basic unit of computation. While a StreamIt filter can execute any sequential program, SPADE strictly defines different types of operators, each created for a different purpose. These types include Functor, Aggregate, Sort, Punctor and Delay. For the meaning of these operators, please refer to [3]. To support network topologies of operations, SPADE provides Split and Join operators, which are analogous to SplitJoin in StreamIt. In addition, SPADE supports Barrier, which joins all inputs and emits output only when data has arrived on each of its input streams. The Delay operator delays the flow of a stream by a user-supplied time interval.

SPADE does not have an explicit pipeline connection between two operators. In fact, the way two operators are connected in SPADE is handled at runtime with the help of the Data Graph Manager (DGM) [3], a component of System S. A connection between two operators can be established explicitly using a hardcoded link, which is similar to Pipeline in StreamIt. Alternatively, two operators can be connected implicitly, based on the type compatibility of their output and input ports. SPADE adopts the UIMA [15] framework for its type system.
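The split/join topology shared by both languages (SplitJoin in StreamIt, Split and Join operators in SPADE) can be sketched as follows. This is an illustrative Python model over finite lists, not StreamIt or SPADE syntax; `roundrobin_split`, `roundrobin_join` and `splitjoin` are hypothetical helper names, and each branch is assumed to apply a simple one-in, one-out function.

```python
def roundrobin_split(stream, n):
    """Distribute items over n branches, one item per branch in turn."""
    branches = [[] for _ in range(n)]
    for i, item in enumerate(stream):
        branches[i % n].append(item)
    return branches


def roundrobin_join(branches):
    """Take one item from each branch in turn until a branch runs dry."""
    out = []
    iterators = [iter(b) for b in branches]
    while True:
        for it in iterators:
            try:
                out.append(next(it))
            except StopIteration:
                return out


def splitjoin(stream, branch_filters):
    """Split round-robin, apply one filter per branch, then join round-robin."""
    branches = roundrobin_split(stream, len(branch_filters))
    processed = [[f(x) for x in b] for f, b in zip(branch_filters, branches)]
    return roundrobin_join(processed)
```

For example, `splitjoin(range(6), [lambda x: x * 10, lambda x: -x])` sends the even-indexed items through the first branch and the odd-indexed items through the second, then interleaves the results in the original order.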
StreamIt filters and SPADE operators are analogous to the processes of a KPN. Communication links in both languages are equivalent to the channels of a KPN. Clearly, the design of both languages is structured around computation units and connecting channels. Given a program written in either language, one can easily construct its KPN representation by mapping filters or operators to Kahn processes and connection links to Kahn channels. We now verify that the KPN representations of programs written in these two languages satisfy all the KPN constraints mentioned above.

Both languages apply a share-nothing policy among computation units, and communication channels are used to transfer data among units. Hence, the first KPN constraint holds in both languages. The second constraint, which states that each channel connecting two processes must be dedicated, also holds in both languages. In more detail, each filter in StreamIt has a single input and a single output. Hence, it accepts only one incoming channel and produces one outgoing channel, and these channels are clearly dedicated to this filter. The StreamIt splitter takes data from its single input channel and sends it out in parallel on its output channels, so these channels are dedicated to the splitter. The StreamIt joiner can be either RoundRobin or Null. The Null joiner never violates the constraint. The RoundRobin joiner receives input data from different channels separately. Hence, these incoming channels are separate and dedicated. Similarly, in SPADE, Functor, Aggregate, Sort, Punctor and Delay are all operators with a single input and a single output. Hence, the communication channels of these operators are dedicated. The SPADE splitter routes incoming data to different output streams, so its channels are dedicated. The join
3. StreamIt & SPADE - Languages

StreamIt and SPADE are languages designed for stream programming. The two languages share the same approach: a stream program is described as a directed graph whose vertices represent computation units and whose edges are channels transmitting data from output ports to input ports. We will show that a program written in these languages has a KPN representation.

StreamIt adopts the Java language for its syntax and grammar. The basic unit of computation in StreamIt is the Filter, which computes a sequential program. A filter has one input channel and one output channel, which are FIFO queues used to receive and send data. Filters can be connected using different constructs to produce different network topologies. StreamIt supports three constructs: Pipeline, SplitJoin and FeedbackLoop. The Pipeline construct is used to build a sequence of filters. The SplitJoin construct consists of a splitter and a joiner. The splitter specifies how data received on the input port is distributed to a set of parallel output ports, and the joiner defines how data received from the parallel input ports is joined on the output port. Splitters which distribute data equally among output ports in round-robin sequence are RoundRobin splitters. Correspondingly, RoundRobin joiners join data from parallel input ports in round-robin fashion. The last construct is the FeedbackLoop, which provides a way to create recursive parallel programs in which the outputs of downstream filters become the inputs of upstream filters.

The SPADE language is built from the ground up with its own syntax and structure. The language bases its syntax on
operator receives data from different input channels separately to perform its join logic. Hence, its channels are dedicated.

For the third constraint, at the language level, we can safely assume that a computation unit with a single input stream becomes active only when input data is available on that stream. Otherwise, it blocks waiting for input data to arrive. The joiners are exceptions: care must be taken that the joiners do not perform merging logic. In parallel computation, merge is a non-determinate operator, as shown by the well-known Brock-Ackerman merge anomaly. The difference between a join operation and a merge operation is that a join necessarily waits for input data to arrive on all of its parallel input streams, while a merge does not wait and sends out data as soon as data is available on any of its input streams. According to both language specifications, the joiners of both languages require input data to be available on all of their input streams. Hence, the third KPN constraint holds with respect to joiner behavior.

For the last KPN constraint, it is enough to show that the computation units of both languages are primitive recursive functions. By definition, a primitive recursive function is a recursive function of a base operation [11]. The RoundRobin joiner in StreamIt simply writes data out to its output stream. Hence, this joiner is determinate. The joiner in SPADE evaluates its match clause, which is a primitive operation. Hence, this joiner is also determinate. By theorem 1 in [7], we can conclude that the use of splitters in both languages does not break the properties of KPN. The SPADE Aggregate and Sort operators are higher-order functions taking primitive functions as their inputs. Hence, they are primitive recursive functions and, as a result, determinate. The Barrier operation in SPADE is another form of the join operator. Since the join operator is determinate, so is the Barrier operator. The Functor operator needs to evaluate a filter condition, which is a boolean expression. Hence, this operator is determinate. Similarly, the Punctor operator only needs to evaluate the condition specified in the punctuation. Hence, this operator is determinate. The Delay operation is clearly determinate.

For the StreamIt filter, we are unable to determine whether the filter is determinate, because a filter can contain arbitrary logic written by developers. If this logic is non-determinate, the filter becomes non-determinate. This is a source of uncertainty in StreamIt. If a stream application is broken down into filters such that some filters are non-determinate while the application requires determinate behavior, these filters cannot be safely executed in parallel. In order to protect its users, the StreamIt compiler needs a mechanism to detect these non-determinate filters. One approach is Hoare's method of monitors, mentioned in Kahn's paper [7].
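The distinction between a join and a merge drawn in this section can be made concrete with a small Python sketch. It is a simplified model under stated assumptions, not the StreamIt or SPADE implementation: token arrivals are represented as a hypothetical list of `(channel, value)` pairs in arrival order, and `merge` and `join` are illustrative names.

```python
from collections import deque


def merge(arrivals):
    """Non-determinate merge: forwards tokens in whatever order they arrive."""
    return [value for _, value in arrivals]


def join(arrivals, n_inputs):
    """Join: emits one token per input, and only once every input has a token."""
    buffers = [deque() for _ in range(n_inputs)]
    out = []
    for channel, value in arrivals:
        buffers[channel].append(value)
        while all(buffers):  # an empty deque is falsy, so this waits on all inputs
            out.extend(b.popleft() for b in buffers)
    return out


# The same two input streams, delivered with two different timings.
timing_a = [(0, "a1"), (1, "b1"), (0, "a2"), (1, "b2")]
timing_b = [(1, "b1"), (1, "b2"), (0, "a1"), (0, "a2")]
```

With these inputs, `join` yields the same output for both timings, while the output of `merge` follows the arrival order and therefore differs between them.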
The StreamIt compilation process comprises eight phases: KOPI Front-end, SIR Conversion, Graph Expansion, Scheduling, Partitioning, Layout, Communication Scheduling and Code Generation. Of these phases, scheduling, partitioning and layout are the most relevant to stream processing.

In the scheduling phase, the StreamIt compiler calculates an initialization schedule and a steady-state execution schedule. By definition, a steady-state schedule of a stream application is one in which all computation units can be applied repeatedly in a regular and predictable order such that the amount of data buffered in all input and output streams is unchanged after the schedule completes [5]. This phase is important to ensure that computation units can be safely executed with a limited amount of memory. Hence, this phase addresses the first limitation of KPN. StreamIt uses the peek, pop and push rates declared for each filter to make these calculations feasible.

The partitioning phase involves various optimizations to achieve a high degree of parallelism in StreamIt applications. During this phase, the StreamIt compiler applies a set of fusion, fission and reordering transformations to the filters to achieve a desired throughput. The fusion transformation merges small filters into a larger filter, to balance computation among filters and to make room for larger filters to be split. In contrast, the fission transformation increases parallelism by duplicating a stateless filter a number of times and connecting the copies with a RoundRobin splitter and a RoundRobin joiner. The fission transformation employs different techniques to exploit data parallelism, which addresses the second limitation of KPN.
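The steady-state condition described above can be worked through for the simplest case, a linear pipeline. The sketch below is an assumption-laden illustration rather than the StreamIt scheduler: it considers only pop and push rates (peek rates, which matter for the initialization schedule, are ignored), assumes all rates are positive integers, and uses `math.lcm`, which requires Python 3.9 or later. The function name `steady_state_repetitions` is hypothetical.

```python
from fractions import Fraction
from math import lcm


def steady_state_repetitions(rates):
    """Return the smallest firing counts that leave every buffer unchanged.

    rates: list of (pop, push) pairs, one per filter, in pipeline order.
    For neighbours u -> d the balance condition is
        reps[u] * push[u] == reps[d] * pop[d].
    """
    reps = [Fraction(1)]
    for (_, push_up), (pop_down, _) in zip(rates, rates[1:]):
        reps.append(reps[-1] * push_up / pop_down)
    # Scale the rational solution to the smallest all-integer one.
    scale = lcm(*(r.denominator for r in reps))
    return [int(r * scale) for r in reps]
```

For a hypothetical pipeline with rates `[(1, 2), (3, 1), (2, 1)]`, the first filter must fire three times for every two firings of the second and one firing of the third: 3 x 2 = 2 x 3 tokens cross the first channel and 2 x 1 = 1 x 2 tokens cross the second.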
Beyond this, the StreamIt compiler also applies many advanced algorithms which allow it to exploit parallelism at various levels, including task parallelism, pipeline parallelism and data parallelism. Readers may refer to [5] for more detail on how StreamIt achieves these types of parallelism.

The layout phase concerns the efficiency of mapping filters to cores on the target architecture. The goal of this phase is to minimize communication costs and synchronization. As mentioned in [5], achieving data parallelism comes with a higher communication cost. A new technique was developed to address this communication overhead. It involves two steps. In the first step, the compiler fuses as many nodes as possible, as long as the new node remains stateless. This reduces unnecessary communication across cores. Then, a new fission transformation named judicious fission is applied to the new stateless nodes. Judicious fission duplicates only enough stateless nodes to fill idle processors. This minimizes communication overhead by preventing too many duplicated stateless filters from being assigned across cores. These two steps have become two new phases of the StreamIt compilation process: Coarsen Granularity and Judicious Fission [5]. The presence of these two phases addresses the third limitation of KPN.

While the StreamIt compiler and its optimizations target the Raw architecture, SPADE optimizations target a cluster of machines running System S. The SPADE compiler is responsible for exploiting parallelism in SPADE programs and for assigning operators to machines in the cluster. In turn, the SPADE compiler relies on System S for scheduling and load balancing. Because of this, we can assume that the first limitation of KPN, unbounded memory, is addressed by the System S scheduler. Parallelism exploitation in SPADE is limited compared with StreamIt.
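Both fission and judicious fission, as described above, apply only to stateless filters. The following Python sketch suggests why; it is an illustration under simplifying assumptions, not the StreamIt compiler's transformation. A filter is modelled as a one-in, one-out function created by a factory, and `run`, `fiss`, `doubler` and `running_sum` are hypothetical names. Sending item i to copy i % k and emitting results in index order models a RoundRobin splitter followed by a RoundRobin joiner.

```python
def run(filter_factory, stream):
    """Run a single instance of the filter over the whole stream."""
    f = filter_factory()
    return [f(x) for x in stream]


def fiss(filter_factory, stream, k):
    """Run k independent copies of the filter, fed and drained round-robin."""
    copies = [filter_factory() for _ in range(k)]
    return [copies[i % k](x) for i, x in enumerate(stream)]


def doubler():
    """Stateless filter: each output depends only on the current input."""
    return lambda x: 2 * x


def running_sum():
    """Stateful filter: each output depends on all inputs seen so far."""
    total = 0

    def step(x):
        nonlocal total
        total += x
        return total

    return step
```

Fissing `doubler` two ways leaves the output unchanged, whereas fissing `running_sum` gives each copy only part of the history and so changes the result.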
There are two types of optimizations performed by the SPADE compiler: operator grouping optimization which focuses on
4. StreamIt & SPADE - Compilers & Optimizations

We have just shown that the two languages follow KPN. Hence, their computation units can be executed concurrently. However, the degree of parallelism offered by KPN is unsatisfactory, because the model has a number of limitations that need to be addressed. First, KPN assumes that computing stations have unbounded memory, which is unrealistic. Second, the model only exploits parallelism among computations. Stream data arrives in large volumes, and exploiting data parallelism yields better throughput. Third, the model does not address communication overhead, which in practice has a large impact on the throughput of a system. In this section, we assess the compilers and optimizations of both languages with a focus on these limitations.
pipeline parallelism and execution model optimization which exploits task parallelism. Since SPADE applications run on a cluster, their communication cost is much higher than that of StreamIt programs executed on tiled processors. Hence, both SPADE optimizations are tailored toward achieving better throughput while keeping transmission overhead low. In order to reduce communication cost, SPADE employs a profiling framework [3] to collect statistics, which are a key ingredient of an effective optimization strategy. Various metrics are collected to measure input/output ratios and CPU consumption. These metrics are used extensively by the SPADE fusion optimizer to exploit pipeline parallelism. By formulating the fusion problem in terms of these statistics, the GreedyFuse algorithm was constructed as an effective fusion strategy that minimizes inter-node communication costs.

The SPADE execution model optimization is much simpler: it duplicates computation units to take advantage of multi-threaded execution when necessary. Our concern is that threads are problematic because of their non-determinism [9]. Supporting multi-threaded operators is the weak point of SPADE. Specifically, preempting a running thread to allow an independent thread to execute the same logic on the same computing node violates the third constraint of KPN. With the multi-threaded execution model, a SPADE system might no longer be determinate. Given that SPADE applications mostly perform relational operations, we think the need for determinacy can be relaxed.

The lack of a fission transformation prevents SPADE from truly exploiting data parallelism. The split/aggregate/join architectural pattern [1] was studied as a way to take advantage of coarse-grained data parallelism in SPADE applications. We expect this pattern to initiate a more complete study of data parallelism in SPADE in the future.
6. REFERENCES
[1] H. Andrade, B. Gedik, K. L. Wu, P. S. Yu. Processing High Data Rate Streams in System S. Journal of Parallel and Distributed Computing, Volume 71, Issue 2, February 2011.
[2] A. Arasu, B. Babcock, S. Babu, J. Cieslewicz, M. Datar, K. Ito, R. Motwani, U. Srivastava and J. Widom. STREAM: The Stanford Data Stream Management System. InfoLab, Stanford University, Menlo Park, CA, Technical Report 2004-20, Mar. 2004.
[3] B. Gedik, H. Andrade, K. L. Wu, P. S. Yu, M. Doo. SPADE: The System S Declarative Stream Processing Engine. Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, 2008.
[4] M. I. Gordon, W. Thies, M. Karczmarek, J. Lin, A. S. Meli, A. A. Lamb, C. Leger, J. Wong, H. Hofmann, D. Maze and S. Amarasinghe. A Stream Compiler for Communication-Exposed Architectures. ASPLOS-X: Proceedings of the 10th International Conference on Architectural Support for Programming Languages and Operating Systems, 2002.
[5] Michael Gordon. Compiler Techniques for Scalable Performance of Stream Programs on Multicore Architectures. Ph.D. thesis, MIT, 2010.
[6] M. Hirzel, H. Andrade, B. Gedik, V. Kumar, G. Losa, M. Mendell, H. Nasgaard, R. Soule and K. L. Wu. SPADE Language Specification, 2009. http://www.google.ca/url?sa=t&rct=j&q=spade%20language%20specification&source=web&cd=2&ved=0CCsQFjAB&url=http%3A%2F%2Fresearcher.ibm.com%2Ffiles%2Fushirzel%2Ftr09-rc24830-spade.pdf&ei=1UV9T7uLIa29QSs0tmUDQ&usg=AFQjCNEBXtxsn0veaHuxs1mPlnLWzbRw7Q&cad=rja
[7] Gilles Kahn. The Semantics of a Simple Language for Parallel Programming. IFIP Congress 1974: 471-475.
[8] R. M. Karp, R. E. Miller. Parallel Programming Schemata: A Mathematical Model for Parallel Computation. IEEE Conference Record of the Eighth Annual Symposium on Switching and Automata Theory (SWAT 1967), 1967.
[9] Edward A. Lee. The Problem with Threads. Computer, Volume 39, Issue 5, May 2006.
[10] N. A. Lynch and E. W. Stark. A Proof of the Kahn Principle for Input/Output Automata. Information and Computation 82, 81-92, 1989.
[11] John McCarthy. A Basis for a Mathematical Theory of Computation. Proceedings of the Western Joint Computer Conference, 1961.
[12] R. Stephens. A Survey of Stream Processing. Acta Informatica, 34:491-541, 1997.
[13] W. Thies, M. Karczmarek and S. Amarasinghe. StreamIt: A Language for Streaming Applications. In Proceedings of the International Conference on Compiler Construction, Grenoble, France, 2002.
[14] E. Waingold, M. Taylor, D. Srikrishna, V. Sarkar, W. Lee, V. Lee, J. Kim, M. Frank, P. Finch, R. Barua, J. Babb, S. Amarasinghe and A. Agarwal. Baring It All to Software: Raw Machines. IEEE Computer, 30(9), 1997.
[15] IBM UIMA. http://www.research.ibm.com/UIMA/, Aug 200
5. Conclusion
In this paper, we have used KPN to guide our study of two languages, StreamIt and SPADE. Adopting KPN in our study offers some merits. On the one hand, KPN has elegant properties which are needed to ensure the determinacy of a stream programming language. Throughout this paper, we have applied KPN to discuss the correctness of the two languages. On the other hand, KPN has certain limitations in exploiting parallelism. These limitations led us through the different techniques used in the compilers and optimizations of the two languages.

Our study has covered the most important features of StreamIt and SPADE, and we have contrasted them to identify their similarities and differences. As languages, both embrace the same approach and are close to each other, although SPADE has a richer set of advanced language features than StreamIt. In terms of compilers and optimizations, StreamIt employs more advanced techniques to exploit parallelism in its different forms: task parallelism, pipeline parallelism and data parallelism. The two compilers target different domains: the StreamIt compiler targets multi-core processors, while the SPADE compiler targets a cluster of machines. This difference in domain might result in some restrictions on the parallelization of stream applications in SPADE. It would be interesting to see how a program written in StreamIt behaves in a cluster environment.
