ABSTRACT
The wavelet transform provides an alternative approach to signal processing that is especially suitable for the analysis of spatial and spectral locality. However, the wavelet transform requires a large amount of computation. To meet the requirement of fast computation in many real-time applications, dedicated hardware is necessary, and VLSI implementations of the algorithms are preferred. For 2-D or multidimensional transforms, the memory needed to store intermediate signals is large and is therefore an important design factor. High-speed calculation of the 2-D DWT that meets the timing requirements of real-time applications is consequently an important task. Folded structures have been able to reduce the storage requirement and hardware consumption to some extent.
A reconfigurable Discrete Wavelet Transform architecture is intended to meet the diverse computing requirements of advanced multimedia systems. Designing an efficient VLSI architecture is important in order to reduce computation time and thereby increase speed. Memory size can be reduced by reusing the same memory. The architecture can be reconfigured to perform 1-D or 2-D DWT with different bandwidth and throughput requirements. Implementing the 2-D DWT on a reconfigurable block is challenging because the order in which the row processors produce results does not match the order in which the column processors consume them. In this work, a thorough literature review is planned to understand the state of the art and the trends in DWT architectures. The simulation of a small existing architecture for 1-D DWT using ModelSim is also planned. The design, simulation and synthesis of an efficient reconfigurable architecture for 1-D and 2-D DWT is planned for the major project.
Fig 1.1 Time-frequency tilings for a simple discrete-time signal . (a) Sine wave plus impulse. (b) Expansion onto the identity basis. (c) Discrete time Fourier series. (d) Local discrete-time Fourier series. (e) Discrete-time wavelet series
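The continuous wavelet transform is defined as
$$\gamma(s,\tau) = \int f(t)\,\psi^{*}_{s,\tau}(t)\,dt$$
where * denotes complex conjugation. This equation shows how a function $f(t)$ is decomposed into a set of basis functions $\psi_{s,\tau}(t)$ called the wavelets. The variables $s$ and $\tau$ are the new dimensions, scale and translation, after the wavelet transform.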
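The wavelets are generated from a single basic wavelet $\psi(t)$, the mother wavelet, by scaling and translation:
$$\psi_{s,\tau}(t) = \frac{1}{\sqrt{s}}\,\psi\!\left(\frac{t-\tau}{s}\right)$$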
1.1.2 SCALING FUNCTIONS
Any function $f(x)$ can be analyzed as a linear combination of real-valued expansion functions $\varphi_k(x)$:
$$f(x) = \sum_k a_k\,\varphi_k(x)$$
where $k$ is an integer index of summation (finite or infinite), the $a_k$ are the real-valued expansion coefficients and $\{\varphi_k(x)\}$ forms an expansion set. Consider the set of expansion functions composed of integer translations and binary scalings of the real, square-integrable function $\varphi(x)$, so that
$$\varphi_{r,s}(x) = 2^{r/2}\,\varphi(2^{r}x - s) \qquad (1.1)$$
where $r, s \in \mathbb{Z}$ (the integer space) and $\varphi(x) \in L^2(\mathbb{R})$ (the square-integrable real space). In the above equation, $s$ controls the translation in integer steps and $r$ controls the amplitude, as well as the width of the function in the x-direction. Increasing $r$ by one decreases the width by one-half and increases the amplitude by $\sqrt{2}$.
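The effect of r and s in equation (1.1) can be checked numerically. The sketch below assumes the Haar scaling function (equal to 1 on [0, 1) and 0 elsewhere) as an illustrative choice of the scaling function; the helper names `haar_phi`, `phi_rs` and `support_and_energy` are hypothetical and not part of the report.

```python
import math


def haar_phi(x):
    """Haar scaling function (assumed example): 1 on [0, 1), 0 elsewhere."""
    return 1.0 if 0.0 <= x < 1.0 else 0.0


def phi_rs(x, r, s):
    """Binary-scaled, integer-translated copy: 2^(r/2) * phi(2^r * x - s)."""
    return 2.0 ** (r / 2.0) * haar_phi(2.0 ** r * x - s)


def support_and_energy(r, s, step=1.0 / 1024, lo=-4.0, hi=4.0):
    """Estimate the support width and the L2 energy of phi_rs on a uniform grid."""
    n = int((hi - lo) / step)
    width = 0.0
    energy = 0.0
    for i in range(n):
        v = phi_rs(lo + i * step, r, s)
        if v != 0.0:
            width += step
        energy += v * v * step
    return width, energy


if __name__ == "__main__":
    for r in (0, 1, 2):
        width, energy = support_and_energy(r, 0)
        print(f"r={r}: width={width}, peak={phi_rs(0.0, r, 0):.4f}, energy={energy:.4f}")
```

Each increment of r halves the measured width and multiplies the peak by the square root of two, so the energy of the function stays the same.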
Fig 1.2 Subspace Relationship of Scaling Functions
1.1.3 WAVELET FUNCTIONS
A set of integer-translated and binary-scaled functions that spans the difference subspace between two adjacent scaling-function subspaces is defined as a set of wavelet functions. If we consider two adjacent subspaces $V_r$ and $V_{r+1}$, the set of wavelets spanning the subspace $W_r$ between them is given as
$$\psi_{r,s}(x) = 2^{r/2}\,\psi(2^{r}x - s) \qquad (1.2)$$
where $s \in \mathbb{Z}$ and $\psi(x) \in L^2(\mathbb{R})$. It may be noted that although the functional forms of equations (1.1) and (1.2) are the same, the scaling functions and the wavelet functions differ by their spanning subspaces. The relationship between scaling and wavelet function spaces is illustrated in Fig 1.2.
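The relationship between the scaling and wavelet subspaces can be illustrated with a small numerical check. The sketch assumes the Haar scaling function and Haar wavelet as example functions; `dilate` and `inner` are hypothetical helpers introduced only for this illustration. It checks that the wavelet is orthogonal to the scaling function at the same scale, and that a scaling function one level finer is the sum of a coarse scaling component and a wavelet component.

```python
import math


def haar_phi(x):
    """Haar scaling function (assumed example)."""
    return 1.0 if 0.0 <= x < 1.0 else 0.0


def haar_psi(x):
    """Haar wavelet (assumed example): +1 on [0, 0.5), -1 on [0.5, 1)."""
    if 0.0 <= x < 0.5:
        return 1.0
    if 0.5 <= x < 1.0:
        return -1.0
    return 0.0


def dilate(f, r, s):
    """Return x -> 2^(r/2) * f(2^r * x - s), the form shared by (1.1) and (1.2)."""
    return lambda x: 2.0 ** (r / 2.0) * f(2.0 ** r * x - s)


def inner(f, g, step=1.0 / 1024, lo=-2.0, hi=2.0):
    """Riemann-sum approximation of the inner product of f and g."""
    n = int((hi - lo) / step)
    return sum(f(lo + i * step) * g(lo + i * step) for i in range(n)) * step


phi00 = dilate(haar_phi, 0, 0)  # coarse scaling function
psi00 = dilate(haar_psi, 0, 0)  # wavelet at the same scale
phi10 = dilate(haar_phi, 1, 0)  # scaling function one level finer

if __name__ == "__main__":
    print("<phi00, psi00> =", inner(phi00, psi00))
    x = 0.25
    print(phi10(x), (phi00(x) + psi00(x)) / math.sqrt(2))
```

For the Haar case the finer scaling function equals (phi00 + psi00) divided by the square root of two at every point, which is one concrete instance of the finer subspace being built from the coarser subspace plus the wavelet subspace.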
Fig 1.5 Splitting the signal spectrum with an iterated filter bank
SUMMARY
This chapter presented a study of wavelet transforms and their applications in image processing. The main limitation of the DCT is blocking artifacts at lower bit rates. The advantage of the wavelet transform is its good time-frequency localization, which matters because the level of detail within an image varies from location to location. Some locations contain significant details, where a finer resolution is required for analysis, and there are other locations where a coarser resolution representation suffices. A multi-resolution representation of an image gives a complete idea of the extent of the details existing at different locations.
A novel parallel stripe-based scanning method, based on an analysis of the dependency graph of the lifting scheme, is proposed by Yusong Hu [2]. The elimination of the frame memory and the small temporal memory lead to a significant reduction in overall size; the design has a regular structure and achieves 100% hardware utilization. The overlapped stripe-based scanning method used in the proposed design has a scalable stripe width, with 7 columns overlapped between two adjacent stripes. The data in the overlapped columns are processed by a newly proposed partial processor in the rDWT of the first-level (Level 1) DWT. With this approach, the temporal memory of the Level 1 DWT, which occupies a dominating chip area in other lifting-based architectures, is eliminated at the expense of reading seven more pixels per cycle and a few extra arithmetic resources. The elimination of the temporal memory results in significant memory savings. The architecture is pipelined and, unlike the existing folded multi-level DWT architectures, does not require frame memories; the flipping method is applied in the design to shorten the critical path delay.
Basant Kumar Mohanty [3] proposed a design strategy for deriving memory-efficient architectures for multilevel 2-D DWT. He applied the design scheme to a convolution-based generic architecture for the computation of three-level 2-D DWT based on Daubechies filters. The proposed structure does not involve a frame buffer.
A hardware-efficient parallel FIR filter structure is implemented by K. K. Parhi [4], achieving high speed and low computation time. Higher processing speed can be achieved by using parallel FIR filter structures, but the hardware cost increases. The proposed design can also save a large number of multipliers and storage elements. The throughput rate is improved by a factor of 4 by the proposed design, but the hardware cost increases by a factor of around 3.
A novel DWT architecture that can be reconfigured to adapt to different input sizes is proposed by Qing Sun [5]. The architecture can be reconfigured into three modes covering 1-D and 2-D DWT. The reconfigurable architecture is mainly based on the convolution approach, which has better scalability. To minimize the critical path, the multipliers and adders are pipelined, and data dependencies are overcome by a loop unrolling technique.
An efficient architecture for the two-dimensional discrete wavelet transform (2-D DWT) is proposed by Po-Cheng Wu [6]. The architecture includes a transform module, a RAM module, and a multiplexer. In the first-level decomposition, the multiplexer selects data from the input image. The transform module decomposes the input image into four subbands. The advantage of such a scheme is that the data flow is very regular. The transform module is tree-structured and comprises two stages: stage 1 performs horizontal filtering, and stage 2 performs vertical filtering.
An efficient multi-input/multi-output VLSI architecture (MIMOA) for the two-dimensional lifting-based discrete wavelet transform is proposed by Xin Tian [7]. The computing time of MIMOA is reduced with only a small increase in hardware cost, and MIMOA has the least consumption of hardware and on-chip memory. However, the main advantage of this architecture is that it provides a variety of hardware implementations to meet different processing speed requirements by selecting different throughput rates.
A novel 2-D DWT architecture composed of two 1-D DWT cores and a 2 × 2 transposing register array is proposed by Yeong-Kang Lai [8]. In this architecture, each 1-D DWT core consumes two input data and produces two output coefficients per cycle, and its critical path takes only one multiplier delay. Two coefficients in the same column are scanned along the row direction and fed into the column processor with the proposed parallel scanning method. The designed architecture is flexible, and the two 1-D processors can be configured to perform the 5/3 filter and the 9/7 filter efficiently.
Chao-Tsung Huang [9] proposed the flipping structure for the lifting-based discrete wavelet transform. It can provide a variety of hardware implementations that improve, and possibly minimize, the critical path as well as the memory requirement of the lifting-based discrete wavelet transform by flipping conventional lifting structures. The timing problem is due to the accumulation of timing delays from the input node to the computation node in each computing unit, so the accumulation can be released by eliminating the multipliers on the path from the input node to the computation node. This can be achieved by flipping each computing unit with the inverse of the multiplier coefficient. Moreover, the computation nodes can be split into two parts: one is the summation of the multiplication results from the register nodes and the other is the adder on the accumulative path. The timing accumulation can be greatly reduced by flipping the original lifting-based architectures. Another advantage of flipping structures is that no additional multipliers are required if the computing units are all flipped.
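The flipping idea can be illustrated on a single lifting unit. The following is only a sketch under assumed notation: a unit with one multiplier coefficient $a$ applied to the sum of two neighbouring inputs $x_0$ and $x_2$ and added to $x_1$. These symbols are illustrative and are not taken from [9].

$$y = x_1 + a\,(x_0 + x_2) \quad\Longrightarrow\quad \frac{y}{a} = \frac{1}{a}\,x_1 + (x_0 + x_2)$$

In the flipped form the only multiplication, by $1/a$, is applied to the $x_1$ branch, so the chain from $x_0$ and $x_2$ to the output passes through adders alone. The unit now produces $y/a$ instead of $y$, and this factor has to be compensated in a later stage, for example in the following coefficients or in the final scaling step.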
Francescomaria Marino [10] proposed two scalable architectures that perform the discrete wavelet transform (DWT) of an N-sample sequence in only N/2 clock cycles. This result is achieved by means of carefully balanced pipelining. First, Architecture 1 and Architecture 2 can be employed to process twice as fast as other architectures working at the same clock frequency (high-speed utilization). Second, they can be employed at half the clock frequency while reaching the same performance as other architectures. This second possibility allows the supply voltage and the power dissipation to be reduced by factors of two and four, respectively, with respect to other architectures (low-power utilization).
A reconfigurable array targeting the discrete wavelet transform for system-on-chip applications is proposed by Georgi [11]. Reconfigurable architectures are highly suitable for complex algorithms that are part of changing standards such as JPEG2000. Reconfigurable arrays target one particular domain of applications, for which they provide higher performance than generic Field Programmable Gate Arrays (FPGAs). The proposed reconfigurable array is flexible enough to implement different lifting-based and integer-based DWT algorithms.
Architecture DWT category Data Scanning Frame Memory Temporal Memory Flipping Parallel Architecture
Basant jiang Convolution convolution Line based no Yes yes yes Line based yes yes ---
The literature survey shows that two main types of architecture are used to implement the DWT: convolution-based and lifting-based. While the convolution-based architectures are implemented with FIR filter banks, the lifting-based architectures are implemented by factorizing the filter banks into several lifting steps followed by a scaling step. Both types of architecture are composed of arithmetic resources, such as multipliers, adders and multiplexers, and storage resources. The storage resources include transposition memory, temporal memory and frame memory. Compared to the convolution-based architectures, the lifting-based architectures possess the merits of lower computational complexity and higher memory efficiency, but suffer from a long critical path.
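As a point of comparison for the convolution-based approach, the sketch below computes one DWT level with a two-channel FIR filter bank followed by downsampling by two. It assumes the two-tap Haar filters purely to keep the example short (the surveyed designs use longer filters), and the function names `fir_decimate` and `dwt_convolution` are hypothetical.

```python
import math

# Haar analysis filters: an illustrative assumption, not the filters of any surveyed design.
H0 = [1 / math.sqrt(2), 1 / math.sqrt(2)]   # low-pass taps
H1 = [1 / math.sqrt(2), -1 / math.sqrt(2)]  # high-pass taps


def fir_decimate(x, taps):
    """Apply an FIR filter and keep every second output sample.

    The taps are applied without time reversal, out[m] = sum_k taps[k] * x[2m + k];
    this sign convention is an assumption of the sketch.
    """
    out = []
    for n in range(0, len(x) - len(taps) + 1, 2):
        acc = 0.0
        for k, h in enumerate(taps):
            acc += h * x[n + k]  # one multiply-accumulate per tap
        out.append(acc)
    return out


def dwt_convolution(x):
    """One DWT level: returns (approximation, detail) coefficients."""
    return fir_decimate(x, H0), fir_decimate(x, H1)


if __name__ == "__main__":
    approx, detail = dwt_convolution([4.0, 6.0, 10.0, 12.0])
    print(approx, detail)
```

Every output sample costs one multiplication per filter tap in each channel, which is the arithmetic cost that the lifting factorization aims to reduce.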
CHAPTER 3
ARCHITECTURES FOR DWT
3.1 INTRODUCTION
In recent years, researchers have proposed a number of VLSI architectures for the DWT. There are two approaches to computing the 2-D DWT: the convolution-based approach and the lifting-based approach. A simple separable approach computes the 2-D DWT by processing the horizontal direction followed by the vertical direction, or vice versa, by cascading 1-D DWT devices. However, this approach requires a transposition memory to hold the intermediate results of the 1-D DWT. The lifting scheme is used to reduce the arithmetic complexity and so help achieve real-time signal processing.
The discrete wavelet transform is increasingly used for image coding because it supports features such as progressive image transmission by quality and by resolution, ease of compressed-image manipulation and region-of-interest coding. The DWT has also been adopted in the JPEG 2000 standard due to its favourable characteristics, such as multi-resolution representation and the ability to decorrelate large images.
Compared to the convolution-based architectures, the lifting-based architectures possess the merits of lower computational complexity and higher memory efficiency, but suffer from a long critical path. The flipping method proposed by Huang et al. reduces the critical path length of the lifting-based architecture. Further, memory is also a dominant factor in the size of a 2-D DWT architecture. The (5, 3) filter mode consists of one predict step and one update step, while the (9, 7) filter mode is performed by applying the predict and update steps two times. Intermediate values generated by the predict and update steps must be saved for the next step. In 1996, Sweldens presented the lifting scheme for a fast DWT, which can be easily implemented in hardware due to its significantly reduced computations.
There are two main methods to produce and implement wavelet transforms, based on either frequency-domain or time-domain features. The frequency-based method is Filter Banks (FB) and the time-based one is the Lifting Scheme (LS).
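The separable row-then-column computation and the role of the transposition memory can be sketched in software. The example assumes an unnormalized Haar averaging/differencing step as the 1-D transform, chosen only for brevity, and even image dimensions; `haar_1d`, `transpose` and `dwt_2d` are hypothetical names.

```python
def haar_1d(samples):
    """One-level 1-D transform (assumed Haar averages/differences): [low half | high half]."""
    half = len(samples) // 2
    low = [(samples[2 * i] + samples[2 * i + 1]) / 2.0 for i in range(half)]
    high = [(samples[2 * i] - samples[2 * i + 1]) / 2.0 for i in range(half)]
    return low + high


def transpose(matrix):
    """Swap rows and columns; in hardware this is the job of the transposition memory."""
    return [list(col) for col in zip(*matrix)]


def dwt_2d(image):
    """One-level separable 2-D DWT: horizontal pass, transpose, vertical pass."""
    rows_done = [haar_1d(row) for row in image]      # row processor output
    buffered = transpose(rows_done)                  # intermediate results held for reordering
    cols_done = [haar_1d(col) for col in buffered]   # column processor output
    return transpose(cols_done)                      # back to the original orientation


if __name__ == "__main__":
    for row in dwt_2d([[1.0, 2.0], [3.0, 4.0]]):
        print(row)
```

The intermediate `buffered` array is needed because the row pass emits data row by row while the column pass consumes it column by column, which is the ordering mismatch that the scanning methods discussed in this chapter try to handle with less storage.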
but with half size. So the s signal is an approximation for x and is called the approximation coefficient. Note that the detail and approximation coefficients (d, s) in the lifting scheme are the same as the high-pass and low-pass outputs in FB, respectively. Based on the above description, the prediction block computes
$$d[n] = x_o[n] + P(x_e[n])$$
and the update block computes
$$s[n] = x_e[n] + U(d[n])$$
The equations for the P and U functions are determined by the implemented wavelet; the number and arrangement of P and U blocks in the lifting structure also differ for various types of wavelets. The final outputs are obtained by scaling:
$$H[n] = K_0\, d[n], \qquad L[n] = K_1\, s[n]$$
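A minimal sketch of one predict/update pair with the final scaling is given below. It assumes, for simplicity, that P and U act on a single sample each, and it uses the Haar choices P(e) = -e and U(d) = d/2 as an example; for most wavelets P and U also use neighbouring samples. The function names and the default scaling factors K0 = K1 = 1 are assumptions of this sketch.

```python
def lifting_forward(x, predict, update, k0=1.0, k1=1.0):
    """d[n] = xo[n] + P(xe[n]); s[n] = xe[n] + U(d[n]); H = K0 * d; L = K1 * s."""
    xe = x[0::2]  # even-indexed samples
    xo = x[1::2]  # odd-indexed samples
    d = [o + predict(e) for o, e in zip(xo, xe)]
    s = [e + update(dn) for e, dn in zip(xe, d)]
    high = [k0 * v for v in d]
    low = [k1 * v for v in s]
    return low, high


def lifting_inverse(low, high, predict, update, k0=1.0, k1=1.0):
    """Undo the scaling, then the update, then the predict step, and merge."""
    d = [v / k0 for v in high]
    s = [v / k1 for v in low]
    xe = [sn - update(dn) for sn, dn in zip(s, d)]
    xo = [dn - predict(e) for dn, e in zip(d, xe)]
    x = []
    for e, o in zip(xe, xo):
        x.extend([e, o])
    return x


def haar_p(e):
    """Assumed example predictor: predict the odd sample from the even one."""
    return -e


def haar_u(d):
    """Assumed example update: turn the even sample into the pair average."""
    return d / 2.0


if __name__ == "__main__":
    low, high = lifting_forward([4.0, 6.0, 10.0, 12.0], haar_p, haar_u)
    print(low, high)  # low == [5.0, 11.0], high == [2.0, 2.0]
    print(lifting_inverse(low, high, haar_p, haar_u))
```

Because the inverse simply runs the same P and U blocks with the signs reversed, perfect reconstruction holds for any choice of P and U, which is one reason the structure maps well to hardware.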
The lifting scheme has been developed as a flexible tool suitable for constructing second-generation wavelets. It is composed of three basic operation stages: split, predict and update. Fig. 3.2 shows the lifting scheme of the wavelet filter operating on a one-dimensional signal.
The three basic steps in lifting-based DWT are:
Split step: the signal is split into even and odd points, so that the high correlation between adjacent pixels can be exploited in the following predict step. Each pair of input samples x(n) is split into an even sample x(2n) and an odd sample x(2n+1).
Predict step: the even samples are multiplied by the predict factor and the results are added to the odd samples to generate the detail coefficients (dj). The detail coefficients correspond to high-pass filtering.
Update step: the detail coefficients computed by the predict step are multiplied by the update factors and the results are added to the even samples to obtain the coarse coefficients (sj). The coarse coefficients give the low-pass filtered output.
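The split, predict and update steps can be sketched for the (5, 3) filter, which needs one predict and one update step. The sketch assumes the commonly used (5, 3) factors of -1/2 for predict and 1/4 for update, floating-point arithmetic without the rounding used in integer implementations, an even-length input, and symmetric extension at the borders; the function names are hypothetical.

```python
PREDICT = -0.5  # assumed (5, 3) predict factor
UPDATE = 0.25   # assumed (5, 3) update factor


def lift_53_forward(x):
    """One level of (5, 3) lifting: returns (coarse s, detail d)."""
    assert len(x) >= 2 and len(x) % 2 == 0
    even = x[0::2]  # split step
    odd = x[1::2]
    half = len(even)
    d = []
    for n in range(half):  # predict step
        right = even[n + 1] if n + 1 < half else even[n]  # symmetric extension
        d.append(odd[n] + PREDICT * (even[n] + right))
    s = []
    for n in range(half):  # update step
        left = d[n - 1] if n > 0 else d[0]  # symmetric extension
        s.append(even[n] + UPDATE * (left + d[n]))
    return s, d


def lift_53_inverse(s, d):
    """Invert the update step, then the predict step, then merge even and odd samples."""
    half = len(s)
    even = []
    for n in range(half):
        left = d[n - 1] if n > 0 else d[0]
        even.append(s[n] - UPDATE * (left + d[n]))
    x = []
    for n in range(half):
        right = even[n + 1] if n + 1 < half else even[n]
        odd = d[n] - PREDICT * (even[n] + right)
        x.extend([even[n], odd])
    return x


if __name__ == "__main__":
    ramp = [float(v) for v in range(8)]
    s, d = lift_53_forward(ramp)
    print("coarse:", s)
    print("detail:", d)
    print("reconstructed:", lift_53_inverse(s, d))
```

On a linear ramp the interior detail coefficients are zero, because each odd sample is predicted exactly by the mean of its two even neighbours; only the border sample, where the extension is used, produces a non-zero detail.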
An architecture using the block-based scanning method was proposed by Cheng and Parhi. In their design, an image is divided into blocks whose size depends on the filter length, and the pixels are scanned row by row within each block and block by block horizontally across the image. The design has high throughput but its transposition memory and temporal memory are large. Additionally, it requires a large amount of arithmetic resources due to its convolution-based architecture.
Fig 3.4 Block based scan
3.2.3 STRIPE BASED SCAN
The stripe-based scanning method was first introduced by Chiu et al. In this method, an image is partitioned into a number of stripes, each several columns wide, and the data is scanned row by row within each stripe. A modified stripe-based scanning method with overlapped columns between every pair of stripes was proposed by Huang et al. for an efficient bandwidth-memory trade-off. The internal memory size is reduced at the expense of longer computation time and slightly larger external bandwidth.
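The scan order described above can be sketched as a generator of pixel coordinates. The parameter names `stripe_width` and `overlap` and the function `stripe_scan` are hypothetical, and the sketch only models the order in which pixels are read, not any particular published architecture.

```python
def stripe_scan(height, width, stripe_width, overlap=0):
    """Yield (row, col) pairs stripe by stripe, row by row within each stripe.

    Adjacent stripes share `overlap` columns, so those columns are read twice.
    """
    assert 0 <= overlap < stripe_width
    step = stripe_width - overlap  # columns by which each new stripe advances
    start = 0
    while True:
        end = min(start + stripe_width, width)
        for r in range(height):
            for c in range(start, end):
                yield (r, c)
        if end >= width:
            break
        start += step


if __name__ == "__main__":
    print(list(stripe_scan(2, 4, 2)))
    reads = len(list(stripe_scan(2, 4, 2, overlap=1)))
    print("pixel reads with one overlapped column:", reads)
```

With overlap, the total number of pixel reads exceeds the number of pixels in the image, which corresponds to the larger external bandwidth mentioned above.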
SUMMARY
A literature survey of the existing DWT architectures has been carried out. The comparison between the two types of architecture is summarized below:
1) In both the convolution-based and the lifting-based architectures, the area is dominated by the memory.
2) The data scanning method has a significant impact on memory size, as it decides how the data flows and how the computations are scheduled.
3) The lifting-based architecture demands fewer arithmetic resources than the convolution-based architecture.
4.1 INTRODUCTION
In the past few years, wavelet transforms have become an active topic of research. Discrete and continuous wavelet transforms have been widely used in signal and multimedia processing. Due to the high performance and flexibility of reconfigurable computing systems, it is very attractive to design a reconfigurable architecture for discrete and continuous wavelet transforms that supports a wide range of wavelet filters.
Reconfigurable systems are a novel computing paradigm that allows different trade-offs between flexibility and performance. Typical reconfigurable computing systems consist of arrays of reprogrammable logic blocks and flexible interconnect. Such architectures distinguish themselves from traditional microprocessor architectures in that reconfigurable computing systems work in a completely parallel manner and exhibit an inherent computational density advantage over microprocessors.
SUMMARY
The demand for portable image/video applications has greatly increased in recent years. At the core of these applications is image/video compression technology. The DWT is one of the algorithms that has been developed for the compression of digital image/video data. Reconfigurable architectures are highly suitable for complex algorithms. By using reconfigurable architectures, higher performance and scalability can be achieved.
CHAPTER 5 IMPLEMENTATION
In this work, ModelSim SE 6.4 will be used to simulate the 1-D DWT architectures, and the simulation results will be examined in the form of waveforms. Functional simulation uses ModelSim to compile the Verilog or VHDL source. ModelSim timing simulation is then run using the timing simulation model. The design can be written in either VHDL or Verilog; here Verilog is chosen.
CONCLUSION
The Discrete Wavelet Transform (DWT), which is based on sub-band coding, yields a fast computation of the wavelet transform. It is easy to implement and reduces the computation time and resources required. In the CWT, signals are analyzed using a set of basis functions that are related to each other by simple scaling and translation. In the DWT, a time-scale representation of the digital signal is obtained using digital filtering techniques. Scaling and wavelet functions can analyze a continuous-valued, square-integrable signal at multiple resolutions. The scaling functions provide approximations, or low-pass filtering, of the signal, and the wavelet functions add the details at multiple resolutions, or perform high-pass filtering of the signal. Although the theory in this report is presented for continuous, one-dimensional signals, it may be extended to discrete two-dimensional signals, which is required for multi-resolution image analysis and coding.
A study of different DWT architectures has been carried out. The main requirements for designing and implementing the DWT as a VLSI architecture are power efficiency, area and memory. Either scheme, convolution or lifting, can give better performance in terms of speed or area.
Reconfigurable architectures are mainly preferred for their better performance and scalability. They will be useful in advanced multimedia communication systems. Due to the high performance and flexibility of reconfigurable computing systems, it is very attractive to design a reconfigurable architecture for the discrete wavelet transform that supports a wide range of wavelet filters.
REFERENCES
1. Chih-Hsien Hsia and Jen-Shiun Chiang, "Memory-Efficient Hardware Architecture of 2-D Dual-Mode Lifting-Based Discrete Wavelet Transform," IEEE Transactions on Circuits and Systems for Video Technology, vol. 23, no. 4, pp. 671-683, 2013.
2. Yusong Hu and Ching Chuen Jong, "A Memory-Efficient Scalable Architecture for Lifting-Based Discrete Wavelet Transform," IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 60, no. 8, pp. 4975-4987, 2013.
3. Basant Kumar Mohanty and P. K. Meher, "Memory-Efficient High-Speed Convolution-based Generic Structure for Multilevel 2-D DWT," IEEE Transactions on Circuits and Systems for Video Technology, vol. 23, no. 2, pp. 353-363, 2013.
4. Chao Cheng and K. K. Parhi, "High Speed VLSI Implementation of 2-D DWT," IEEE Transactions on Signal Processing, vol. 56, no. 1, pp. 385-391, 2008.
5. Qing Sun and Jiang Jiang, "A Reconfigurable Architecture for 1-D and 2-D Discrete Wavelet Transform," IEEE Transactions on Field Programmable Custom Computing Machines, vol. 3, pp. 81-84, 2013.
6. Po-Cheng Wu and Liang-Gee Chen, "An Efficient Architecture for Two-Dimensional Discrete Wavelet Transform," IEEE Transactions on Circuits and Systems for Video Technology, vol. 11, no. 4, pp. 536-544, 2001.
7. Xin Tian, Lin Wu, Yi-Hua Tan, and Jin-Wen Tian, "Efficient Multi-Input/Multi-Output VLSI Architecture for Two-Dimensional Lifting-Based Discrete Wavelet Transform," IEEE Transactions on Computers, vol. 60, no. 8, pp. 1207-1208, 2011.
8. Yeong-Kang Lai, Lien-Fei Chen, and Yui-Chih Shih, "A High-Performance and Memory-Efficient VLSI Architecture with Parallel Scanning Method for 2-D Lifting-Based Discrete Wavelet Transform," IEEE Transactions on Consumer Electronics, vol. 55, no. 2, pp. 400-407, 2009.
9. Chao-Tsung Huang, Po-Chih Tseng, and Liang-Gee Chen, "Flipping Structure: An Efficient VLSI Architecture for Lifting-Based Discrete Wavelet Transform," IEEE Transactions on Signal Processing, vol. 52, no. 4, pp. 1080-1089, 2004.
10. Francescomaria Marino, David Guevorkian, and Jaakko T. Astola, "Highly Efficient High-Speed/Low-Power Architectures for the 1-D Discrete Wavelet Transform," IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, vol. 47, no. 12, pp. 1492-1502, 2000.