
Reverberation and Dereverberation
By Staff Technical Writer

In this article we summarize recent AES convention papers dealing with reverberation and its artificial generation, analysis, and enhancement. How can some of the characteristics of reverberation be measured in a perceptually relevant way? How can reverberation be removed successfully from other wanted audio signals? It is also important to consider ways in which reverberation may be implemented on platforms designed for interactive virtual environments, where computational resources may be shared with visual processing and numerous changing sources have to be processed in real time.
ARTIFICIAL REVERBERATION

Although artificial reverberation processors have been in existence for many years, research is still continuing into more efficient and better-sounding algorithms. Vickers et al., in Frequency Domain Artificial Reverberation Using Spectral Magnitude Decay (AES 121st paper 6926), explore the concept of producing artificial reverberation in the frequency domain. Most existing algorithms, they point out, are based either on feedback-delay networks in the time domain or on convolution in the frequency domain. The former have a relatively low computational cost and provide control over some perceptually relevant parameters, but it is expensive to implement multiband equalizers in the feedback paths, so any control over frequency-dependent decay tends to be limited to a few bands. Those based on convolution, on the other hand, are very effective at simulating specific physical spaces, but it is more difficult to control individual parameters, and such systems are computationally expensive for long decay times. In theory, the method described by the authors allows detailed control over the frequency-dependent decay time and requires less memory than feedback-delay networks.
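As a point of reference for the feedback-delay-network approach mentioned above, the following is a minimal sketch of a generic FDN reverberator, not an algorithm from any of the papers reviewed here; the delay lengths, feedback matrix, and gain mapping are illustrative assumptions.

```python
import numpy as np

def fdn_reverb(x, fs=48000, delays=(1031, 1327, 1523, 1801), rt60=1.5, dry=0.7, wet=0.3):
    """Toy four-line feedback delay network (FDN) reverb."""
    delays = np.asarray(delays)
    # Gain per delay line chosen so recirculation loses 60 dB after rt60 seconds.
    g = 10.0 ** (-3.0 * delays / (rt60 * fs))
    # Orthogonal (Hadamard-type) feedback matrix keeps the loop lossless
    # apart from the per-line gains above.
    A = 0.5 * np.array([[1, 1, 1, 1],
                        [1, -1, 1, -1],
                        [1, 1, -1, -1],
                        [1, -1, -1, 1]], dtype=float)
    bufs = [np.zeros(d) for d in delays]      # circular delay-line buffers
    idx = np.zeros(len(delays), dtype=int)    # read/write positions
    y = np.zeros(len(x))
    for n, xn in enumerate(x):
        outs = np.array([bufs[i][idx[i]] for i in range(4)])  # delay-line outputs
        y[n] = dry * xn + wet * outs.sum()
        feedback = A @ (g * outs) + xn        # mix the lines and re-inject the input
        for i in range(4):
            bufs[i][idx[i]] = feedback[i]
            idx[i] = (idx[i] + 1) % delays[i]
    return y
```

The orthogonal feedback matrix keeps the loop energy-preserving, so the decay rate is set entirely by the per-line gains; a frequency-dependent decay would require filters in those feedback paths, which is exactly the cost the authors are trying to avoid.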

The authors use a technique inspired by the phase vocoder, shown in Fig. 1, which essentially consists of a time-to-frequency transform based on the short-time Fourier transform (STFT), followed by modifications to the phase and magnitude components of the frequency spectrum, followed by an inverse transform and reconstruction of the time-domain signal. They investigate a range of techniques for generating reverberation and time-freezing effects, mainly based on the accumulation of successive frames of spectral magnitude and phase information and the progressive attenuation of the magnitude components over time. They find that the processing of phase information in successive frames during the decay is crucial to the generation of perceptually natural reverberation. For this reason they have to generate an artificial phase signal that can be combined with the accumulated magnitude response, such that the reverb's impulse response resembles noise with an exponential decay. Some of the issues to be overcome include establishing the right tradeoff between phase coherence and randomization, as well as avoiding roughness and periodicity in the decay structure. The authors find that an algorithm with phase randomization applied at the output works quite effectively, although the echo density does not necessarily increase with time. However, they argue that since late reverberation is normally said to begin where individual reflections cannot be heard, this does not seem to be a problem.
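To make the idea concrete, here is a rough sketch of frequency-domain reverberation by accumulating and decaying per-bin spectral magnitudes with randomized phase. It is not the authors' algorithm: the window, hop size, single broadband decay constant, and crude wet/dry normalization are all assumptions made for illustration.

```python
import numpy as np

def spectral_decay_reverb(x, fs=48000, nfft=2048, hop=512, rt60=2.0, wet=0.4):
    """Frequency-domain reverb sketch: per-bin magnitude accumulation with
    exponential decay and randomized phase, resynthesized by overlap-add."""
    win = np.hanning(nfft)
    # Per-frame decay factor giving a 60 dB drop after rt60 seconds.
    decay = 10.0 ** (-3.0 * hop / (rt60 * fs))
    nbins = nfft // 2 + 1
    acc_mag = np.zeros(nbins)                            # accumulated magnitudes
    x = np.concatenate([x, np.zeros(int(rt60 * fs))])    # room for the tail
    y = np.zeros(len(x) + nfft)
    rng = np.random.default_rng(0)
    for start in range(0, len(x) - nfft, hop):
        X = np.fft.rfft(x[start:start + nfft] * win)
        acc_mag = np.maximum(np.abs(X), decay * acc_mag)  # accumulate and decay
        phase = rng.uniform(-np.pi, np.pi, nbins)         # randomized phase
        y[start:start + nfft] += np.fft.irfft(acc_mag * np.exp(1j * phase), nfft) * win
    y = y[:len(x)]
    peak = np.max(np.abs(y)) + 1e-12
    return (1 - wet) * x + wet * y / peak                 # crude wet/dry mix
```

A frequency-dependent decay would simply replace the scalar decay factor with a per-bin vector, which is the kind of detailed control the authors are after.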

Fig. 1. Phase vocoder signal flow (courtesy Vickers et al.): the input x is windowed, transformed by an FFT, converted from rectangular to polar form, the magnitude and phase are modified, and the result is converted back to rectangular form, inverse transformed, windowed, and overlap-added to form the output y


Fig. 2. Spring propagation modes: (a) transverse, (b) longitudinal, (c) torsional (Figs. 2-6 courtesy Abel et al.)

Fig. 3. Helical spring with driver and receiver at either end

Fig. 4. Typical spring reverb impulse response and spectrogram

Fig. 5. (a) Left- and right-going waves are processed separately using delay, dispersion, and attenuation filters. (b) Separate sections of the spring reverb are connected using scattering junctions.

The quality of the output is found to be reasonably good, though not as high as that of the best reverberation devices based on feedback-delay networks in the time domain or on convolution in the frequency domain. For this reason they plan further work on aspects such as the control of modal density and the elimination of unwanted perceptual artifacts.

An alternative to the above types of digital reverberation, stemming from the early days of audio effects, is the spring reverberator. Springs were originally used because of the delay between a signal applied to an exciter at one end of a spring and its arrival at a receiver at the other end. Abel et al., in Spring Reverb Emulation Using Dispersive Allpass Filters in a Waveguide Structure (AES 121st paper 6954), attempt to analyze and emulate the performance of these classic devices using digital waveguide models. They explain that springs are approximately linear and time invariant at typical operating levels for audio systems, so they can be studied by observing their impulse responses. Springs can propagate waves in longitudinal, transverse, or torsional modes (as shown in Fig. 2). Modern devices typically use the torsional mode, with two or three independent springs operating in parallel. Sometimes there are multiple elements connected in series, or one part of the spring is wound in the opposite direction to the other, leading to scattering at the junctions. A magnetic driver at one end twists the spring so as to set up a propagating torsional wave, which is detected by a similar pickup at the other end, as shown in Fig. 3. The impulse response, shown in Fig. 4, tends to show a series of decaying repetitions, with each reflected impulse exhibiting considerable and increasing smearing in the time domain.




Fig. 6. Comparison between measured (upper) and modeled (lower) impulse response spectrograms for one particular spring reverb

Fig. 7. Overall system diagram for modified-CMA-based monaural dereverberation (courtesy Huang and Kyriakakis)

This smearing appears to be caused by low frequencies propagating faster than high frequencies (see the spectrogram in Fig. 4), which tends to turn each impulse into a chirp and eventually into a noiselike sound. Little energy seems to propagate above 4 kHz through most of the springs that the authors tested. The models the authors tried for emulating spring reverbs had a structure based on that shown in Fig. 5, consisting of a number of spring sections connected using scattering junctions.
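The dispersive behavior described above can be illustrated with a toy model, not the authors' waveguide structure: each trip along the "spring" is represented by a bulk delay plus a cascade of first-order allpass sections, whose group delay rises with frequency, so every echo is smeared into a chirp. The section count, allpass coefficient, echo spacing, and loop gain below are arbitrary assumptions.

```python
import numpy as np
from scipy.signal import lfilter

def dispersive_echo(x, n_allpass=200, a=0.6, n_echoes=8, spacing=2400, g=0.7):
    """Toy dispersive echo chain: each trip passes the signal through a
    cascade of first-order allpass sections (which delay high frequencies
    more than low ones, smearing each echo into a chirp) plus a bulk
    delay, and is attenuated by g."""
    out = np.zeros(len(x) + n_echoes * spacing)
    trip = x.copy()
    for k in range(1, n_echoes + 1):
        for _ in range(n_allpass):              # extra dispersion for each trip
            trip = lfilter([a, 1.0], [1.0, a], trip)
        start = k * spacing                     # bulk delay for this trip
        out[start:start + len(x)] += (g ** k) * trip
    return out
```

Feeding an impulse through this chain produces a train of progressively more smeared chirps, qualitatively similar to the measured response in Fig. 4.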


The results were reasonably successful, and good perceptual equivalence is claimed between the models and the original devices they aimed to emulate. One example of the comparison between the spectrograms of measured and modeled devices is shown in Fig. 6, for the Accutronics Type 8.

DEREVERBERATION

Dereverberation is the identification and removal of reverberation from other wanted audio signals using digital signal processing. Many different methods have been tried. One of these is known as blind dereverberation, in which the algorithm only has access to the received signal and has no knowledge of the dry signal or of the acoustical environment giving rise to the reverberation. Huang and Kyriakakis try a novel approach based on blind deconvolution in Blind Dereverberation of Audio Signals Using a Modified Constant Modulus Algorithm (AES 121st paper 6974). A constant modulus algorithm (CMA) is used in conjunction with a linear predictive coding (LPC) filter to dereverberate a monophonic audio signal. One of the problems they found in their previous work on this topic was that typical constant modulus algorithms assume that the input signal is a statistically independent and identically distributed sequence with a sub-Gaussian distribution, that is, one with negative kurtosis. (A Gaussian distribution is like the statistical distribution of a white noise signal; a statistical distribution essentially describes the likelihood that a signal will take a certain amplitude, and kurtosis describes the peakedness of the distribution.) Many audio signals, however, are quite the opposite: they have non-white characteristics and positive kurtosis. For this reason the authors employed LPC analysis. Linear predictive coding attempts to predict the current sample from a weighted sum of previous samples. The error, or residual, of this prediction is used as the input to the modified CMA algorithm, because LPC residuals tend to have a whiter statistical character than the audio signal itself, with lower kurtosis. Because of the linear nature of these algorithms under the circumstances employed here, the blind deconvolution filter thereby derived can be applied directly to the reverberant speech signal. This process is shown in Fig. 7. The authors found that this process improved upon the results of a previous study in which they had used a conventional CMA-based method, in that the speech signal tested had less reverberation, but they speculate that coloration and nonstationary properties of the speech signal might be two problematic factors. They also note that real room impulse responses are non-minimum phase, which could present difficulties with the type of algorithm employed here.
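To illustrate just the LPC pre-whitening step described above (not the modified CMA itself, which is the authors' contribution), here is a sketch using the standard autocorrelation method; the prediction order and the whole-signal rather than frame-by-frame analysis are simplifying assumptions.

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def lpc_residual(x, order=12):
    """Whiten a signal with linear prediction: fit LPC coefficients by the
    autocorrelation (Yule-Walker) method and return the prediction residual,
    which has a flatter spectrum and lower kurtosis than the input."""
    x = x - np.mean(x)
    # Biased autocorrelation up to the LPC order
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)]) / len(x)
    a = solve_toeplitz((r[:order], r[:order]), r[1:order + 1])  # predictor coefficients
    # Residual e[n] = x[n] - sum_k a_k x[n-k], computed with an FIR inverse filter
    e = lfilter(np.concatenate(([1.0], -a)), [1.0], x)
    return e, a
```

The residual's whiter, lower-kurtosis character is what makes it a better match for the statistical assumptions of the constant modulus algorithm in the blind deconvolution stage.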



In A Perceptual Measure for Assessing and Removing Reverberation from Audio Signals (AES 120th paper 6702), Zarouchas et al. treat reverberation as a distortion or degradation of an otherwise dry monophonic audio signal. Unlike the method of Huang and Kyriakakis, the process they employ for dereverberation needs access both to the source signal and to the reverberant version of it. A computational auditory masking model (CAMM) is employed to model the perception of reverberant signals, in a similar way to sound-quality measurement approaches such as NMR (noise-to-mask ratio) and PAQM (perceptual audio quality measure) used for measuring the perceptual significance of codec distortions. As shown in Fig. 8, the source and reverberant versions of the signal are both processed by the CAMM; a decision device then subtracts the distortion due to reverberation on the basis of the just-noticeable difference between the internal perceptual representations of the two versions. In this way only the perceptually significant reverberation is removed. The authors report preliminary success with this approach, which appears able to reduce the perceived reverberance of a range of different material.

ANALYZING REVERBERATION

In two papers from the AES 121st Convention (6928 and 6981), Bitzer, Extra, and colleagues present the results of monaural and binaural analysis tools applied to artificial reverberation. They attempted to find physical metrics that could be used to predict the perceived quality of artificial reverberation devices, first of all using traditional monaural measures of the reverberation impulse response. The metrics employed included the energy-decay curve and the energy-decay relief, the latter being based on Jot's 3-D surface plots of decay time against frequency.

Fig. 8. Dereverberation using a computational auditory masking model (CAMM) (courtesy Zarouchas et al.)

In order to split the audio frequency range into bands, filters based on third-octaves were used, except that Bark-scaled bands were employed below 500 Hz. These were used to derive the reverberation time and the early decay time in different bands. Echo density was also measured, as it has been shown to be an important factor governing the perceived quality of reverberation. The authors also examined short-time histograms of the probability distribution of the decays to determine their whiteness (see above), as this is known to correspond to good-sounding reverberation. The autocorrelation function (the similarity of a signal to itself at different time lags), as proposed by Griesinger, is a good way of finding out whether there are repetitive features in the reverberant decays, as these tend to result in annoying coloration. Objective clarity and time variance were also measured. The authors applied these metrics to a range of free and commercial reverberation devices and plug-ins and found that they were able to observe the degree of complexity of the different algorithms. They also conducted a listening test to get some initial information about the sound quality of different reverbs. Initial results suggest that there is a relationship between the objective measures and perceived quality, but these metrics cannot currently replace a listening test. A graininess and metallic quality was observed in the sound of some reverbs, particularly on drum material, which pointed the authors toward ideas for future development of the metrics.
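The reverberation time and early decay time mentioned above are conventionally derived from the energy-decay curve obtained by Schroeder backward integration of the impulse response. A minimal sketch follows; the band filtering and the exact evaluation ranges used by the authors are not given in the article, so the -5 to -35 dB fit below is an assumption.

```python
import numpy as np

def energy_decay_curve(h):
    """Schroeder backward integration of an impulse response, in dB."""
    edc = np.cumsum(h[::-1] ** 2)[::-1]
    return 10.0 * np.log10(edc / edc[0] + 1e-300)

def decay_time(h, fs, lo_db=-5.0, hi_db=-35.0):
    """Fit a line to the EDC between two levels and extrapolate to -60 dB
    (e.g. -5 to -35 dB gives a T30-style reverberation-time estimate)."""
    edc = energy_decay_curve(h)
    idx = np.where((edc <= lo_db) & (edc >= hi_db))[0]
    t = idx / fs
    slope, intercept = np.polyfit(t, edc[idx], 1)   # decay rate in dB per second
    return -60.0 / slope
```

An early-decay-time-style estimate uses the 0 to -10 dB portion of the same curve instead.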

Examining binaural measures in a similar way, Bitzer and Extra attempted to find metrics that would model some of the spatial characteristics of artificial reverberation. IACC-based measures, as well as lateral early decay time (LEDT), room level (early) (RLE), and a time-angle phase scope, were employed. The latter two were found to be useful coarse predictors of the quality of selected reverberation algorithms. RLE is a measure that was introduced by Trautmann and enhanced by Blauert. It is based on an analysis of short time segments of the first 80 ms of the binaural impulse response and attempts to measure the energy in each segment at a dominant angle of incidence, compared with the frontal energy. The time-angle phase scope is based on the traditional Lissajous-figure display sometimes used to analyze the coherence of stereo audio channels, but in this case it is applied to binaural impulse responses separated into 40-ms overlapping blocks. The result is a representation of the energy and direction of the signal over time.

Abel and Huang provide further insight into the development of metrics for predicting reverberation quality in A Simple, Robust Measure of Reverberation Echo Density (AES 121st paper 6985). They work on a premise similar to that of Bitzer, Extra, and colleagues: that the late part of reverberation has a smooth quality, a Gaussian probability distribution, and a high echo density, the result of a suitably diffuse, mixed soundfield. Echo density can be measured as a function of time, they assert, by taking the standard deviation of the impulse response in each time window, counting the number of reflection taps that lie outside this standard deviation, and normalizing by the count expected for Gaussian noise. In the early part of the decay, where there are relatively few but prominent reflections, the standard deviation is large, so a smaller proportion of reflections is classified as outliers. During the later, diffuse part of the decay there is a dense pattern of overlapping reflections approximating Gaussian noise. The authors found that a time window of 20 to 30 ms worked well, as it is long enough to contain at least a few reflections and short enough to have sufficient time resolution for psychoacoustic purposes.
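A direct sketch of the measure as described: the fraction of taps in each window lying outside one standard deviation is divided by the fraction expected for Gaussian noise (erfc(1/sqrt(2)), roughly 0.317), so fully diffuse reverberation yields values near 1. The 25-ms window and 50% overlap are assumptions within the authors' suggested range.

```python
import numpy as np
from scipy.special import erfc

def echo_density_profile(h, fs, win_ms=25.0):
    """Normalized echo density vs. time for an impulse response h:
    fraction of taps in each window exceeding the window's standard
    deviation, divided by the fraction expected for Gaussian noise."""
    expected = erfc(1.0 / np.sqrt(2.0))        # ~0.3173 for a Gaussian signal
    win = int(win_ms * 1e-3 * fs)
    times, density = [], []
    for start in range(0, len(h) - win, win // 2):   # 50% overlapping windows
        seg = h[start:start + win]
        sd = np.std(seg)
        times.append((start + win / 2) / fs)
        density.append(np.mean(np.abs(seg) > sd) / expected)
    return np.array(times), np.array(density)
```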



The resulting plots of echo density according to the new measure were found to discriminate well between different diffusion settings of artificial reverberators (see Fig. 9) and to be robust to different equalization and decay parameters. It is claimed that since traditional acoustical parameters for measuring reverberation have not included anything that accounts for the temporal structure of reflections, this new approach could be a useful indicator of the time-domain quality or texture of a reverberant signal.

Fig. 9. Echo density profiles of feedback delay network impulse responses at different diffusion settings: 0.2 (red), 0.4 (green), 0.6 (blue), and 1.0 (black), with RT60 = 1.0s and roughly highpass equalization (courtesy Abel and Huang)

REVERBERATION ENHANCEMENT IN UPMIXING

Upmixing is the term commonly applied to systems that attempt to derive multichannel stereo signals from two-channel or matrixed multichannel program material. Usher, in A New Upmixer for Enhancement of Reverberance Imagery in Multichannel Loudspeaker Audio Scenes (AES 121st paper 6965), attempts to deal with the problem that most upmixing approaches are designed on the assumption that stereo images are created using amplitude-panned sources.

Such upmixers tend to look for highly correlated material between the left and right channels, assuming this represents front sources, and to extract less correlated signals to drive the rear/side loudspeakers, assuming these are diffuse sound or reverberation. However, as Usher points out, some stereophonic recording techniques use spaced microphones or time-delay panning, and these recordings are not handled well by conventional upmixers. He devised three design criteria for a new upmixer: (1) spatial distortion of the source image in the upmixed audio scene should be minimized; (2) reverberance imagery should have a homogeneous distribution in the horizontal plane, and in particular reverberance image directional strength should be high from lateral (90°) directions; (3) the new system should not be dispreferred to a conventional 2/0 system. The system he developed uses an adaptive filter and time-delay combination that is able to adjust the input signals in both frequency and time before the difference signal between the stereo channel pair is calculated (see Fig. 10). By means of this approach it was possible to upmix a range of different stereo recordings to four-channel surround (using two additional loudspeakers at 120°), such that diagonally opposite loudspeakers showed almost zero correlation but side pairs had non-zero correlation. His conjecture that this would maximize the spatial fidelity of the front source image while locating the reverberation images to the sides was largely supported by listening tests, and the upmixed results were generally preferred to the original two-channel versions.
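Usher's upmixer is only described conceptually here (Fig. 10), so the following is a generic illustration of the adaptive-filter idea rather than his design: an NLMS filter predicts one channel from the other, and the prediction error, the part the two channels do not share, is treated as ambience/reverberance that could feed the additional loudspeakers. The filter length and step size are arbitrary assumptions, and the time-delay and frequency-adjustment stages of Usher's system are omitted.

```python
import numpy as np

def nlms_ambience(left, right, taps=512, mu=0.1, eps=1e-8):
    """Predict the right channel from the left with an NLMS adaptive FIR
    filter; the prediction error is the poorly correlated (ambient) part,
    which an upmixer could route to side/rear loudspeakers."""
    w = np.zeros(taps)
    ambience = np.zeros(len(right))
    for n in range(taps, len(right)):
        xvec = left[n - taps:n][::-1]          # most recent input samples first
        e = right[n] - np.dot(w, xvec)         # prediction error = ambience estimate
        w += mu * e * xvec / (np.dot(xvec, xvec) + eps)   # NLMS coefficient update
        ambience[n] = e
    return ambience
```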

Fig. 10. Conceptual diagram of Usher's upmixer





REVERBERATION IN ARTIFICIAL SCENE SYNTHESIS

Jot and Trivi take a novel approach to the synthesis of reverberation for interactive virtual environments in Scene Description Model and Rendering Engine for Interactive Virtual Acoustics (AES 120th paper 6660). They try to differentiate between the reverberation and reflection modeling that might be required for interactive virtual acoustics and that required for applications such as architectural acoustics. (The application in question is the Creative EAX environmental audio programming interface.) In particular, the authors are keen to develop a means of generating reflections and reverberation that has minimal computational cost, so they aim to reduce the complexity of the algorithms used, based on the principle of plausibility. They argue, for example, that when large numbers of virtual sound sources are being processed, an individual reflection rarely has a critical effect on perception compared with the direct-path components. They therefore opt to prioritize the allocation of resources toward improving the shared reverberation process and the control of per-source reverberator feeds, only attempting to improve the simulation of early or discrete reflections when necessary and if resources allow. In realistic virtual worlds, they assert, audio accuracy is not as crucial as in some other applications, because visual cues often dominate. Furthermore, virtual audio environments are not attempting to simulate existing situations, so the requirement becomes one of plausibility rather than accuracy: auditory cues should be sufficiently valid and believable to support the accompanying visual information. The priority order for resources is therefore: (1) direct-path components; (2) reverberation of the listener's environment; (3) reverberation from other environments (such as adjacent spaces); (4) refinement of early or discrete reflections. For this reason a reverberation system built on a physical model of the scene, with accurate rendering of individual reflections, is out of the question. The authors prefer traditional algorithms based on feedback-delay networks or possibly frequency-domain convolution with a measured or synthetic reverb impulse response.
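The priority scheme above amounts to routing every source through one shared reverberator via per-source send gains rather than giving each source its own reverb. The skeleton below is an illustrative assumption of that architecture, not code from the paper; any mono reverberator (such as the FDN sketch earlier) could stand in for the shared process.

```python
import numpy as np

def render_scene(sources, reverb):
    """Mix many sources through a single shared reverberator.

    sources: list of (dry_signal, direct_gain, reverb_send_gain) tuples.
    reverb:  any mono-in/mono-out reverberator function.
    """
    length = max(len(sig) for sig, _, _ in sources)
    direct = np.zeros(length)
    send_bus = np.zeros(length)
    for sig, g_direct, g_send in sources:
        direct[:len(sig)] += g_direct * sig      # priority 1: per-source direct path
        send_bus[:len(sig)] += g_send * sig      # cheap per-source reverb feed
    wet = reverb(send_bus)                        # priority 2: one shared reverb process
    out = np.zeros(max(length, len(wet)))
    out[:length] += direct
    out[:len(wet)] += wet
    return out
```

Because the reverberator runs once on the summed bus, its cost stays fixed as sources are added; only the cheap per-source sends scale with the scene.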
Editor's note: The papers reviewed in this article, and all AES papers, can be purchased online at www.aes.org/publications/preprints/search.cfm and www.aes.org/journal/search.cfm. AES members also have free access to a large number of past technical review articles such as this one and other tutorials from AES conventions and conferences; go to www.aes.org/tutorials/.

