
Model-based hardware/software synthesis for Wireless Sensor Network applications

Shahzad Ahmad Butt, Electronics Department, Politecnico di Torino, Italy, Shahzad.Butt@polito.it
Parinaz Sayyah, Electronics Department, Politecnico di Torino, Italy, Parinaz.Sayyah@polito.it
Luciano Lavagno, Electronics Department, Politecnico di Torino, Italy, Luciano.Lavagno@polito.it

Abstract: In recent years the spectrum of wireless sensor network applications has grown dramatically. Sensor nodes range from very cheap nodes for control-dominated applications like temperature, pressure, area and health monitoring to more powerful and sophisticated nodes used for audio and video surveillance. Implementing a low-power hardware/software platform for computationally intensive applications such as video surveillance is challenging. Fast analysis of hardware/software trade-offs and accurate estimation of different cost functions become essential to improve the final quality of results and time to market. In this paper we start from a realistic application domain, namely sound-triggered wireless security cameras, and we show how one can start from an algorithm modeled and validated using Simulink and, using commercial state-of-the-art tools, explore various possible implementations for a key component, an FFT module. We show how estimates of the various aspects of the cost function can be obtained quickly, directly from the C code generated from Simulink. The implementation results that we obtained are close to previously reported hand-optimized results, showing the feasibility of our approach for trade-off analysis.

Keywords: Model-based Design; Hardware and Software Implementation; Wireless Sensor Networks; Fast Fourier Transform

I. INTRODUCTION

Real-time applications are implemented as a combination of hardware and software components. Therefore performance evaluation at the early stages of the system design gives an engineer an adequate understanding of the application and lets them determine the best hardware and software partitioning.
Signal processing applications are especially challenging when they need to target low-power and low-energy domains with very specialized application needs, such as the broad and heterogeneous field of Wireless Sensor Networks (WSNs). The platform-based approach taken to develop other wireless applications, such as cellular handsets and smart phones, is at odds with the very diverse requirements of this field, and with the need to absolutely minimize power consumption in order to maximize battery life or rely only on energy scavenging.

Recent approaches to WSN node design [1] radically depart from the traditional lightweight microcontroller plus radio platform paradigm that has been used so far to develop applications for this domain. The rationale is twofold. On one hand, a 16- or 32-bit CPU is cheap enough in terms of both area and power consumption to become a reasonable choice, and hence to provide substantial computing power, even on low-cost nodes. On the other hand, dedicated hardware provides such huge advantages in terms of cost and power, and WSN nodes for a given application promise to be so widespread, that it makes sense to quickly develop application-specific hardware platforms for them. Significant hardware content is an essential enabler of complex WSN applications, since it is the only mechanism available to increase battery life and reduce radio power by performing most computations and signal analysis locally.

In this paper we present our design experience, considering audio/video surveillance networks as a realistic compute-intensive case study. In this domain, given the energy cost of video sensing, compression and radio transmission, it is essential both to turn on video capture only when needed and to perform video compression and radio protocol handling in hardware. Thus sophisticated audio signal processing techniques, with much lower power and energy requirements than video handling, are used to detect when something interesting is happening, and motion detection techniques are also adopted to further reduce the computation load of full video encoding and transmission [2].
As a first step, we focused on one of the most compute-intensive portions of an audio-detection system, namely the Fast Fourier Transform (FFT). The FFT is an essential component of any frequency-domain analysis, which is used in tight coupling with time-domain detection algorithms to distinguish interesting sounds (such as the noise of an animal, a gunshot or the steps of an intruder) from the background (such as wind, cars or bird calls). In this paper, we take the FFT block from the Simulink [10] library, we generate C code from it using Real-Time Workshop, and then we perform very simple profiling and hardware implementation analysis steps, in order to modify the C code minimally and yet obtain a reasonable, feasible estimate of the cost and performance of a block. One key advantage of this approach with respect to much earlier high-level estimation work [5] is that we guarantee that an implementation with that cost and performance exists, even though it may not be optimal. Thus we provide a good starting point for further manual optimization steps, already starting from a nearly-optimal overall solution.

The key contribution of this paper is an example of how to use C code generation from Simulink and high-level synthesis to perform fast and accurate design space exploration for a WSN application, aimed at minimizing energy consumption within given performance constraints.

II. RELATED WORK

There has been a lot of research on designing low-power platforms for sensor nodes, but there is limited past work on model-based hardware/software trade-off analysis for sensor network applications.

[8] proposes an ultra-low-power system architecture for sensor networks based on the philosophy of driving the system by events and distributing the events among different small blocks, so that an event activates only a small part of the hardware at any time, while the other blocks are power gated. [7] describes a low-power processor and software stack for sensor network applications, tuned for saving energy. [9] emphasizes designing application-specific hardware that can reduce network energy and delay. The basic idea of this application-specific design is to avoid general-purpose hardware and use more dedicated and distributed hardware for specific functionality, which promotes fine-grained Vdd-gating. [14] investigates the usage of power-gated hardware tasks and reports a gain of two orders of magnitude over a conventional low-power MSP430 [15] microcontroller implementation. [16] provides a design space exploration platform to characterize the processor and ISA for an application or a specific class of applications. It also analyzes code footprints and memory access patterns. The result is that the area of the Application-Specific Instruction set Processor is very high, unless a very careful co-design methodology for processor and memory architecture is used.

In [23] the authors propose a WSN power management scheme that detects a sound in the time domain and uses it to gate the activity in the rest of the system. The idea is similar to our approach, but we use a more expensive and more precise frequency-domain approach. [24] proposes an automatic system-level low-power synthesis technique based on mapping fine-grained software tasks, extracted from an ANSI-C executable model, to hardware micro-tasks that are power gated to save energy. The firing of hardware tasks is done by a system monitor based on guard conditions. Their technique seems more suited to control-dominated applications, in which relatively simple computations are triggered by hardware or software events, while our technique, based on a high-level synthesis tool which can handle both control-dominated and data-dominated specifications, is more general purpose. [25] discusses many state-of-the-art power optimization techniques for WSNs at the hardware and network protocol layers. It concludes by proposing a mixed ASP (application-specific processor) and ASIC based hardware platform for sensor network applications, which could be co-designed using our approach.

In the area of model-based signal processing hardware and software design from Simulink, [18] describes an algorithm for buffer memory optimization that can be used in our context to reduce the cost of a multi-process concurrent implementation in hardware. [19] describes a heavily manual refinement procedure for both hardware and software that is quite similar to the design flow implied by the use of HDL Coder from The MathWorks, while we provide a mostly automated flow, in which only some minor manual code optimization and rewriting steps are required. [20] describes an automated library-based method for interface generation that could be used in our case to ease and speed up HW/SW interface design, while we focus mostly on providing a fast and effective path to HW synthesis. [21] uses a very low-level mapping of Simulink time instants to clock cycles, and is useful only for rapid prototyping. Finally, [22] describes at a very high level the need for using high-level synthesis when using FPGAs to implement embedded hardware, but then describes only the internal details of a proprietary tool from Altera, while we describe more generally how to use a generic C/C++/SystemC based high-level synthesis tool to efficiently implement (part of) a Simulink design.

III. MODEL BASED HARDWARE/SOFTWARE CO-DESIGN

Model-based design starting from block-level languages (such as Simulink or UML) or extended Finite State Machine languages (such as Stateflow [12] or UML) has been used for decades to provide faster time to market and better verification. It achieves this objective by decoupling functional specification, using a platform-independent model (PIM), from the generation of an implementable platform-dependent model (PDM) using various forms of direct mapping and compilation. Simulink, Stateflow, Embedded Real Time Coder (ERT) and Hardware Description Language (HDL) Coder from The MathWorks, as well as related hardware and software synthesis tools from companies such as dSPACE, Synopsys, and Xilinx, provide a fast path to such modeling and implementation both as hardware and software. However, they are mostly based on a 1-1 mapping between PIM elements (blocks, signals, states, transitions, etc.) and PDM elements (C or synthesizable VHDL statements). While this provides a good level of transparency (i.e. the designer can directly control the cost and performance of the implementation), it also defeats the goals of model-based design, because the PIM must be polluted with platform-dependent architectural aspects, which make it hard to re-use. For example, HDL Coder provides essentially a single VHDL implementation for each member of a subset of Simulink blocks, such as the Fast Fourier Transform that is described in this paper. This means that architectural trade-offs exploring various combinations of clock period, latency, and sharing must be written by hand, working essentially at the

Register Transfer Level, and thus making hardware/software trade-offs as well as design space exploration in hardware much more time-consuming.

Techniques from software compilation and process scheduling, partitioning and refactoring can be used to optimize the software implementation [4]. In this paper, we focus mostly on the use of high-level synthesis to broaden the trade-off space of hardware implementations, starting essentially from the same C code that is generated by Simulink Embedded Coder for software implementation. We show that both hardware and software derived from that code are not too far, in terms of performance, cost and power, from manually optimized designs.

Hardware/software trade-offs, high-level synthesis and model-based design are broadly studied topics, and a thorough review of that work is impossible in the limited space of this paper (see [17] for a good overview). However, to the best of our knowledge, this is the first time that these techniques have been applied to a case study from the WSN domain, and that the Simulink-generated software code has been used for hardware implementation via high-level synthesis.

IV. CASE STUDY

Still images and video streams are very expensive to transmit over a wireless link, especially in a battery-powered setting. Thus we consider a multi-modal (audio+video) surveillance scenario that uses audio information, which can be processed locally at low energy cost, as a trigger for a camera to capture any abnormal situation, as illustrated in Figure 1. Event detection analysis is applied to the signals received by the microphones; if a sound event is detected, trigger signals are transmitted from the audio sensors to the cameras through the wireless medium. Then the scene can be captured (possibly with cameras pointed toward the sound source) and sent wirelessly to the central hub for further processing, such as intruder identification. In this way microphones can continuously acquire audio signals in noisy environments while cameras can mostly stay in sleep mode.

Figure 1. Multi-modal wireless surveillance

Due to the complexity of this scenario, we started by studying the first building block of the above system, which is also the first step of every sound analysis, namely event detection. The audio information is sampled at 44.1 kHz with 16-bit precision, using low-cost, small-size, omni-directional microphones. Several sound detection algorithms exist, such as time-based, frequency-based and statistical event detection [3]. The area of specific regions of the audio spectrum is used in adaptive threshold calculations. Frequency-based detection is very effective, but due to its computational complexity it has not been used in recent implementations [3]. In this paper we try to accurately estimate the cost of different hardware and software implementations of the FFT, which is the computational core of frequency-based audio detection.

We first did a very coarse performance analysis of a software implementation, just to check its feasibility. More detailed results are presented in Section VI.B. For the chosen sampling frequency of 44.1 kHz, an FFT with frame size 256 must be completed every 5.78 ms. The minimum frequency at which an ARM Cortex M3 processor core can satisfy the performance requirements is 4 MHz, while at 72 MHz (the maximum for the implementation that we use, from STMicroelectronics) the FFT completes in 348.1 us. This slack can be exploited by using the low-power modes of the processor when it is inactive, and using a DMA to manage the A/D conversion.

In the following, we will consider a Fast Fourier Transform Simulink block configured to compute a complex radix-2 decimation-in-time (DIT) FFT on 16-bit signed fixed-point data. The twiddle factors are computed statically and stored in a look-up table, because computing them on the fly would be much more energy-consuming.

V. SIMULINK MODEL TO HARDWARE

We obtained a set of hardware implementations of the FFT from the Simulink model by:
- Generating C++ code using the Embedded Real Time Coder tool from The MathWorks.
- Encapsulating the code into a SystemC cycle- and bit-accurate interface protocol.
- Using a high-level synthesis tool, namely CtoSilicon [13] from Cadence Design Systems, to generate a set of architectural implementations representing various points in the area/performance/power trade-off space.

A. Code Generation from Simulink

The tuning of ERT consisted of selecting the parameters required to generate code that is at the same time efficient and readable, so that it can be easily modified to be acceptable for high-level synthesis (HLS). Most commercial HLS tools take SystemC as input, hence we selected the Encapsulated C++ code generation option. We also used the Single output update function generation option. This option generates a function that can be called periodically and is responsible for single-step model processing and output updates. Single output update functions can be easily converted to a SystemC clocked process for HLS.

Some optimizations should also be enabled during code generation. Enabling the block reduction optimization can eliminate some unnecessary run-time calculations by creating arrays of constants and eliminating dead code. Parameter inlining should be disabled, so that the generated model can still be parameterized (e.g. as a SystemC template, or via #define pre-processor directives).

B. Code Modification for Synthesis

The generated code requires some changes in order to become synthesizable. First of all, it needs to be encapsulated as a SystemC model. The SystemC encapsulation step requires proper bit-width control for data input and output ports, placement of arrays as global or local variables, and I/O signal declaration for interface purposes. The optimal bit widths of the SystemC data types for data ports can be determined from the simulation and verification of the fixed-point model done in Simulink. The code must also be changed so that it satisfies synthesizability requirements.
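To make the bit-width discussion concrete, the following minimal sketch shows one radix-2 decimation-in-time butterfly on 16-bit signed fixed-point data with a precomputed twiddle factor, matching the FFT configuration of the case study. It is not the code generated by ERT: the Q15 scaling, the division by two at every stage, the saturation policy and the names `cplx16`, `sat16` and `butterfly` are all assumptions chosen for illustration.

```cpp
#include <cstdint>

// Hypothetical complex sample in Q15 format: real value = raw / 32768.
struct cplx16 {
    int16_t re;
    int16_t im;
};

// Clamp a wide intermediate result back into the 16-bit signed range.
inline int16_t sat16(int64_t v) {
    if (v > 32767) return 32767;
    if (v < -32768) return -32768;
    return static_cast<int16_t>(v);
}

// One radix-2 DIT butterfly: a' = (a + w*b) / 2, b' = (a - w*b) / 2.
// The division by two per stage is an assumed scaling policy that keeps
// the results inside 16 bits; generated code may scale differently.
inline void butterfly(cplx16& a, cplx16& b, cplx16 w) {
    const int64_t half = 1 << 14;  // rounding constant for the Q15 shift
    // Complex product t = w * b, accumulated in 64 bits, rescaled to Q15.
    const int64_t tr = (int64_t(w.re) * b.re - int64_t(w.im) * b.im + half) >> 15;
    const int64_t ti = (int64_t(w.re) * b.im + int64_t(w.im) * b.re + half) >> 15;
    const int64_t ar = a.re;
    const int64_t ai = a.im;
    a.re = sat16((ar + tr) >> 1);
    a.im = sat16((ai + ti) >> 1);
    b.re = sat16((ar - tr) >> 1);
    b.im = sat16((ai - ti) >> 1);
}
```

For example, with a = 0.5, b = 0.25 and a twiddle factor close to 1.0 (raw values 16384, 8192 and 32767), the call leaves 12288 in `a.re` and 4096 in `b.re`, i.e. 0.375 and 0.125. In a SystemC encapsulation the same widths would typically be expressed with bit-accurate types such as `sc_int<16>`, with the intermediate widths taken from the fixed-point Simulink simulation rather than fixed at 64 bits as in this sketch.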
In our case, the FFT model works on frames of data, so we decided to organize the computation in the main SystemC thread as shown in Figure 2. The interface that we implemented includes proper handshake signals to communicate with the outside world, and could easily be replaced with a different protocol (e.g. AHB). The encapsulated C++ code from ERT already uses two member functions called getOut and setIn, which more or less follow the Transaction Level Modeling (TLM) guidelines and make this kind of interface adaptation easy. Then, the reset behavior of the module should be identified in the generated code and included at the beginning of the SystemC thread that implements the computation.

Figure 2. Structure of main SystemC process (::single_step clocked thread containing an infinite while loop: inlined setIn; core FFT processing, consisting of input data scrambling and butterfly operations; inlined setOut; global input and output buffer arrays; twiddle table; other supporting member functions)

C. Further Code Optimization

The next step, which also requires a source code review and some C++ profiling, is to develop some understanding of the code structure, for example by identifying the most computationally intensive loops and the memory access patterns that they require. The analysis of data storage and its access patterns in various loops is very useful during micro-architecture selection, and can significantly affect performance, as we will discuss next.

Although the code as modified above is synthesizable, the results may be very inefficient, because it has a structure that makes automated code generation easier and may make the code more readable, but it contains details that are not needed for implementation. For example, the code is broken down into several functions that are called from a single model stepping function, and the data exchange between these functions is often done through intermediate arrays. The loops performing data movements between local and global arrays should be removed, leaving only the arrays that are strictly necessary. Sometimes the HLS tool can perform these transformations automatically (by flattening arrays to scalars and unrolling loops), but very often these optimizations need to be performed manually with a text editor; they could be automated by changing the ERT C code generator from Simulink.

D. Hardware Implementation via HLS

The top-level micro-architectural decisions that must be made before starting the scheduling step in CtoSilicon are:
- Whether to inline functions or not (which determines whether their control FSM is shared or not),
- Whether to unroll loops or not (which determines the amount of data-level parallelism exposed to the scheduler),
- How to allocate memories and memory ports to ensure the required processing bandwidth.

The FFT algorithm performs a lot of memory accesses to execute butterflies and write their results. Hence insufficient memory bandwidth can significantly degrade performance. In our case the performance target was taken from the existing optimized implementation obtained from HDL Coder (which was manually designed as a parameterized, optimized RTL block by The MathWorks). We used it as the baseline both to set the synthesis constraints and to evaluate the results.

We increased the memory bandwidth, by using memories with 2 or 3 read ports and 1 write port, until we matched the overall processing rate of the HDL Coder implementation. Then we considered loops, and found a small number that can be unrolled, because they did not contain memory accesses. We chose not to experiment with a pipelined implementation, because it would have required significant source code changes to separate the butterflies into layers, and we could already meet our performance target (for an audio processing application) without it. In our case, performance was limited by memory accesses, so we defined the number of clock cycles to be used by each iteration of a loop based on the available memory bandwidth, in order to maximize throughput.

VI. EXPERIMENTAL RESULTS

A. Hardware Implementation

TABLE I shows the required throughput for different nominal audio sampling rates and the throughput that we could achieve by using high-level synthesis as described in the previous section. Since the achieved throughput is much higher than the requirement, in the power and energy consumption analysis we assumed that power shut-off is used in order not to consume dynamic or leakage power when the FFT is done computing.

TABLE I. REQUIRED VS ACHIEVED THROUGHPUT

Required throughput   Clock Freq.   Achieved throughput (Samples / Cycle)
(Samples / Cycle)     (MHz)         2 port memory    3 port memory
.00016                50            0.059            0.083
.00013                50            0.059            0.083

We performed logic synthesis of the FFT processor using a 90nm technology library from UMC, obtaining the results shown in TABLE II. The SRAM-based implementation of memories (1K 16-bit words) in this case is much more efficient than the flip-flop based implementation.

TABLE II. AREA AND ENERGY CONSUMPTION VERSUS MEMORY PORTS

No. of memory     Cell Area      Energy Consumption (uJ)
read ports        (lib units)    Logic    Memory   Total
2 (FF-based)      552            .21      .065     .27
3 (FF-based)      694            .24      .064     .30
2 (SRAM-based)    125            .016     .019     .035

Note how the 30% performance increase obtained by adding a memory port is more significant than the 20% area cost. Power consumption increases by about 5% (mostly due to the memories), but because of the increase in performance, energy decreases by about 30%, at the cost of 20% in area for one additional memory port.

We tried running the same hardware implementation (optimized for 72MHz) with different clock frequencies. This resulted in a different power supply duty cycle, since a lower frequency implies less dynamic power, but also a shorter power shut-off, and thus more leakage. The result of this exploration is shown in Figure 3 through Figure 5, motivating our choice of 50MHz as the clock frequency in the rest of the section. At that frequency, power is shut off about 98-99% of the time. Note that at lower frequencies, leakage power dominates energy consumption. This could be avoided, in a real design, by using high-Vt libraries, which are slower but have much less leakage.

Figure 3. HW Logic and Memory Energy consumption

Figure 4. HW Leakage and Dynamic Energy Consumption

Finally, in TABLE III we compare our synthesized FFT processor with the RTL design from HDL Coder. The hand-optimized RTL from The MathWorks is, as expected, smaller, faster and consumes less power than our implementation. However, our HLS implementation was obtained in just two weeks, by a designer without previous exposure to FFT implementation or to high-level synthesis techniques. We report only the results of the FF-based memory implementation, because supporting the I/O protocol of the UMC-provided SRAMs would have required changing the RTL, while HLS automatically supports any memory protocol.

TABLE III. COMPARISON WITH OPTIMIZED RTL AT 50MHZ

                        Cell Area                Energy Consumption   Throughput
                        (lib units, FF based)    (uJ)                 (Samples / Cycle)
HDL Coder               439                      .19                  .179
High Level Synthesis    552                      .27                  .083

It would be easy to rewrite the SystemC code so that it can be synthesized with comparable efficiency. We felt that the current implementation is good enough to enable a designer to make decisions, based on other application requirements, about whether to go with a HW or a SW implementation, and which area/performance/power trade-off to choose (e.g. 2-ported vs. 3-ported memory) for a given application area.

B. Software implementation

TABLE IV. MANUAL VS. AUTOMATED CODE PERFORMANCE

               ERT Code (Cycles)   Manual Code (Cycles)
256 Points     25,000              20,000
1024 Points    120,000             99,000

TABLE V. HARDWARE/SOFTWARE COMPARISON

               Area (lib units,   Power (mW)   Time (us)   Energy (uJ)
               SRAM-based)
SW (70 MHz)    372                88           348         30
HW (70 MHz)    150                0.67         61          .041
SW (50 MHz)    372                55           504         27
HW (50 MHz)    151                0.40         87          .035

The software implementation was obtained directly from the ERT Coder output, and we used profiling to analyze its performance and power consumption on a variety of embedded 32-bit processors from ARM. Here we report the results for a Cortex M3 implementation from STMicroelectronics. The

performance of the code generated by ERT is very similar to that of the hand-optimized code discussed in [6], as shown in TABLE IV, which reports the number of clock cycles required to perform an FFT. In the rest of this section, we consider the 256-point code, in order to be consistent with the hardware implementation. Memory occupation is 3 KB for code, 0.5 KB for read-only data, and 2.5 KB for read/write data. This results in a total memory size of 6 KB, which we used as the area estimate of the SW implementation. This means that we did not consider the area of the processor itself (which is shared among several parts of the application), and only considered the additional area due to the memory occupation, which is required only if the FFT component is used in the application. For area estimation we assumed that code and data all reside in SRAM (with the same technology used for the HW implementation).

VII. CONCLUSIONS AND FUTURE WORK

In this paper we presented a method to experiment with both HW/SW trade-offs and HW design space exploration options. It is based on using the embedded software generated by Embedded Real Time Coder from a Simulink (and/or Stateflow) diagram as a starting point for both SW implementation and HW implementation, the latter via high-level synthesis. We show that the results obtained by this methodology are good enough for initial high-level decisions, and we claim that with some code restructuring and further experiments, one can approach the quality of manual design. We apply the methodology to one performance- and energy-critical block from a Wireless Sensor Network-based surveillance application, where balancing hardware cost with energy consumption and real-time performance requirements is essential.

In the future, we plan to extend our work to multi-block Simulink diagrams, where we will need to explore the possibility of partitioning the HW implementation into separate processes and/or modules, to increase performance.
REFERENCES

[1] Muhammad Adeel Pasha, Steven Derrien, Olivier Sentieys. A Complete Design-Flow for the Generation of Ultra Low-Power WSN Node Architectures Based on Micro-Tasking. Design Automation Conference (DAC), 2010 47th ACM/IEEE.
[2] C. Clavel, T. Ehrette, G. Richard. Events Detection for Audio-Based Surveillance Application, Multimedia and Expo, 2005. ICME 2005.
[3] Alan F. Smeaton, Mike McHugh. Towards Event Detection in an Audio-Based Sensor Network, Proceedings of the Third ACM International Workshop on Video Surveillance & Sensor Networks, 2005.
[4] Bart Kienhuis, Edwin Rijpkema, Ed Deprettere. Compaan: Deriving Process Networks from Matlab for Embedded Signal Processing Architectures, Proceedings of the International Workshop on Hardware/Software Co-design, 2000.
[5] William Fornaciari, Paolo Gubian, Donatella Sciuto, Cristina Silvano. Power Estimation of Embedded Systems: A Hardware/Software Codesign Approach, IEEE Transactions on VLSI, 6(2), 266-275, June 1998.
[6] Ivan Mellen. FFT Library v. 2.0 Benchmark. (http://www.embeddedsignals.com/ARM.htm)
[7] Y. Xu; L. Liu; P. Shen; T. Lv; X. Li, "Low power processor design for wireless sensor network applications," Wireless Communications, Networking and Mobile Computing, 2005. Proceedings.
[8] Hempstead, M.; Tripathi, N.; Mauro, P.; G. Wei; Brooks, D., "An ultra low power system architecture for sensor network applications," Computer Architecture, 2005. ISCA '05.
[9] M. Lyons and D. Brooks. "Application-Specific Hardware Design for Wireless Sensor Network Energy and Delay Reduction," Workshop on Optimizations for DSP and Embedded Systems (ODES), April 2008.
[10] Simulink - Simulation and Model-Based Design. http://www.mathworks.com/products/simulink/
[11] The MathWorks - Real-Time Workshop Embedded Coder - Generate C and C++ code optimized for embedded systems. http://www.mathworks.com/products/rtwembedded/
[12] Stateflow - Design and simulate state machines and control logic. http://www.mathworks.com/products/stateflow/
[13] C-to-Silicon Compiler - Next-generation high-level synthesis for design and verification. http://www.cadence.com/products/sd/silicon_compiler/
[14] Pasha, M.A.; Derrien, S.; Sentieys, O., "Toward ultra low-power hardware specialization of a Wireless Sensor Network node," Multitopic Conference, 2009. INMIC 2009. IEEE 13th International, pp.1-6, 14-15 Dec. 2009. doi: 10.1109/INMIC.2009.5383135
[15] T. Instruments, MSP430 User's Guide, Texas Instruments, Tech. Rep., 2006.
[16] Mysore, S.; Agrawal, B.; Chong, F.T.; Sherwood, T., "Exploring the Processor and ISA Design for Wireless Sensor Network Applications," VLSI Design, 2008. VLSID 2008. 21st International Conference on, pp.59-64, 4-8 Jan. 2008. doi: 10.1109/VLSI.2008.72
[17] G. De Micheli, R. Ernst, W. Wolf, Readings in HW/SW co-design, Morgan Kaufmann, 2001.
[18] S-I Han, X. Guerin, S-I Chae, A.A. Jerraya, Buffer memory optimization for video codec application modeled in Simulink, Proceedings of the Design Automation Conference 2006.
[19] K. Popovici, A.A. Jerraya, Simulink-based HW/SW co-design flow for heterogeneous MPSOC, Proceedings of the Summer Computer Simulation Conference 2007.
[20] Y. Atat, N-E Zergainoh, Simulink-based MPSOC design: new approach to bridge the gap between algorithm and architecture design, Proceedings of the International Symposium on VLSI, 2007.
[21] M. Struebuehr, M. Jaentsch, C. Haubelt, J. Teich, Semi-automatic generation of mixed HW/SW prototypes from Simulink models, Proceedings of the GI/ITG/GMM workshop, 2008.
[22] S. Perry, Model-based design needs high-level synthesis, Proceedings of Design Automation and Test in Europe 2009.
[23] Noguchi, H.; Takagi, T.; Yoshimoto, M.; Kawaguchi, H., "An ultralow-power VAD hardware implementation for intelligent ubiquitous sensor networks," Signal Processing Systems, 2009. SiPS 2009. IEEE Workshop on, pp.214-219, 7-9 Oct. 2009. doi: 10.1109/SIPS.2009.5336254
[24] Pasha, M.A.; Derrien, S.; Sentieys, O., "System Level Synthesis for Ultra Low-Power Wireless Sensor Nodes," Digital System Design: Architectures, Methods and Tools (DSD), 2010 13th Euromicro Conference on, pp.493-500, 1-3 Sept. 2010. doi: 10.1109/DSD.2010.88
[25] Jingxian Wu; Smith, S.C., "Integrated software-hardware design for ultra-low power infrastructure monitoring," Intelligent Transportation Systems, 2009. ITSC '09. 12th International IEEE Conference on, pp.1-8, 4-7 Oct. 2009. doi: 10.1109/ITSC.2009.5309696
