Dereverberation Dereverberation
By Staff Technical Writer
In this article we summarize recent AES convention papers dealing with reverberation and its artificial generation, analysis, and enhancement. How can some of the characteristics of reverberation be measured in a perceptually relevant way? How can reverberation be removed successfully from other wanted audio signals? It is also important to consider ways in which reverberation may be implemented in platforms designed for interactive virtual environments, where computational resources may be shared with visual processing and numerous changing sources have to be processed in real time.
J. Audio Eng. Soc., Vol. 55, No. 3, 2007 March
ARTIFICIAL REVERBERATION
Although artificial reverberation processors have existed for many years, research continues into more efficient and better-sounding algorithms. Vickers et al., in "Frequency Domain Artificial Reverberation Using Spectral Magnitude Decay" (AES 121st paper 6926), explore the concept of producing artificial reverberation in the frequency domain. Most extant algorithms, they point out, are based either on feedback-delay networks in the time domain or on convolution in the frequency domain. The former have a relatively low computational cost and provide control over some perceptually relevant parameters, but it is expensive to implement multiband equalizers in the feedback paths, so any control over frequency-dependent decay tends to be limited to a few bands. Algorithms based on convolution in the frequency domain, on the other hand, are very effective at simulating specific physical spaces, but individual parameters are more difficult to control, and such systems are computationally expensive for long decay times. In theory, the method described by the authors would allow for detailed control over the frequency-dependent decay time and require less memory than feedback-delay networks.
The authors use a technique inspired by the phase vocoder, shown in Fig. 1, which essentially consists of a time-to-frequency transform based on the short-time Fourier transform (STFT), followed by modifications to the phase and magnitude components of the frequency spectrum, followed by an inverse transform and reconstruction of the time-domain signal. They investigate a range of techniques for generating reverberation and time-freezing effects, mainly based on the accumulation of successive frames of spectral magnitude and phase information and the successive attenuation of the magnitude components over time. They find that the processing of phase information in successive frames during the decay is crucial to the generation of a perceptually natural reverberation. For this reason they have to generate an artificial phase signal that can be combined with the accumulated magnitude response, such that the reverb's impulse response resembles noise with an exponential decay. Some of the issues to be overcome here include establishing the right tradeoff between phase coherence and randomization, as well as avoiding roughness and periodicity in the decay structure. The authors find that an algorithm with phase randomization applied at the output works quite effectively, although the echo density does not necessarily increase with time. However, they argue that since late reverberation is normally said to begin where individual reflections cannot
Fig. 1. Phase vocoder (courtesy Vickers et al.). Signal flow: x, windows, fft(fftshift()), X, rectangular to polar (mag, phase), modifications, polar to rectangular, Y, fftshift(real(ifft)), windows, overlap-add, y.
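The signal flow of Fig. 1 can be illustrated with a minimal sketch. This is not the algorithm of Vickers et al.; it only shows the general idea of accumulating STFT magnitudes with a per-frame decay and resynthesizing with randomized phase by overlap-add. The function name, the Hann window, the frame and hop sizes, and the uniform phase randomization are all assumptions made for illustration, and the output level is not normalized.

```python
import numpy as np

def spectral_decay_reverb(x, fs, rt60=1.5, n_fft=1024, hop=256, seed=0):
    """Toy frequency-domain reverb (illustrative only).

    rt60 may be a scalar or an array with one value per rfft bin
    (length n_fft // 2 + 1) for frequency-dependent decay.
    """
    rng = np.random.default_rng(seed)
    rt60 = np.asarray(rt60, dtype=float)
    window = np.hanning(n_fft)
    # Per-frame gain chosen so magnitudes fall by 60 dB after rt60 seconds.
    g = 10.0 ** (-3.0 * hop / (rt60 * fs))
    # Append silence so the decaying tail has room to ring out.
    tail = int(rt60.max() * fs)
    x = np.concatenate([np.asarray(x, dtype=float), np.zeros(tail + n_fft)])
    n_frames = 1 + (len(x) - n_fft) // hop
    y = np.zeros(len(x))
    acc = np.zeros(n_fft // 2 + 1)  # accumulated magnitude per bin
    for m in range(n_frames):
        start = m * hop
        frame = x[start:start + n_fft] * window
        mag = np.abs(np.fft.rfft(frame))
        acc = g * acc + mag  # attenuate old energy, add the new frame
        # Assumed phase model: independent uniform random phase per bin.
        phase = rng.uniform(-np.pi, np.pi, acc.shape)
        frame_out = np.fft.irfft(acc * np.exp(1j * phase), n_fft)
        y[start:start + n_fft] += frame_out * window  # overlap-add
    return y
```

Passing an array for `rt60` gives each bin its own decay time, which is the kind of detailed frequency-dependent control the frequency-domain approach is said to allow.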
Fig. 2. Spring propagation modes: (a) transverse, (b) longitudinal, (c) torsional. (Figs. 2–6 courtesy Abel et al.)
Fig. 5. (a) Left- and right-going waves are processed separately using delay, dispersion, and attenuation filters. (b) Separate sections of the spring reverb are connected using scattering junctions.
be heard, this does not seem to be a problem. The quality of the output is found to be reasonably good, though not as high as that of the best reverberation devices based on feedback-delay networks in the time domain or on convolution in the frequency domain. For this reason they plan further work on aspects such as the control of modal density and the elimination of unwanted perceptual artifacts.
An alternative to the above types of digital reverberation, stemming from the early days of audio effects, is the spring reverberator. Springs were originally used because they introduce a delay between the signal applied to an exciter at one end and the signal picked up by a receiver at the other end. Abel et al., in "Spring Reverb Emulation Using Dispersive Allpass Filters in a Waveguide Structure" (AES 121st paper 6954), attempt to analyze and emulate the performance of these classic devices using digital waveguide models. They explain that springs are approximately linear and time invariant at typical operating levels for audio systems, so they can be studied by observing their impulse responses. Springs can propagate waves in longitudinal, transverse, or torsional modes (as shown in Fig. 2). Modern devices typically use the torsional mode with two or three independent springs operating in parallel. Sometimes there are multiple elements connected in series, or one part of the spring is wound in the opposite direction to the other, leading to scattering at the junctions. A magnetic driver at one end turns the spring so as to set up a propagating torsional wave through the spring, which is detected by a similar pickup at the other end, as shown in Fig. 3. The impulse response, shown in Fig. 4, tends to show a series of decaying repetitions, but with each reflected impulse having considerable and increasing smearing in the time domain. This appears to be caused by
Fig. 6. Comparison between measured (upper) and modeled (lower) impulse response spectrograms for one particular spring reverb
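The frequency-dependent propagation delay that smears a spring's reflections into chirps can be illustrated with a cascade of first-order allpass filters. The sketch below is not the model of Abel et al.; the filter form, the coefficient value, and the number of sections are assumptions chosen only to show the effect. With a positive coefficient, the group delay of each section is smallest at low frequencies, so low frequencies emerge from the cascade first.

```python
import numpy as np

def allpass1(x, a):
    """First-order allpass H(z) = (a + z^-1) / (1 + a z^-1)."""
    y = np.zeros_like(x)
    x_prev = 0.0
    y_prev = 0.0
    for n, xn in enumerate(x):
        # Difference equation: y[n] = a x[n] + x[n-1] - a y[n-1]
        y[n] = a * xn + x_prev - a * y_prev
        x_prev, y_prev = xn, y[n]
    return y

def dispersive_chain(x, a=0.6, n_sections=100):
    """Pass x through n_sections identical allpass sections in series.

    The magnitude response stays flat; only the delay varies with
    frequency (hypothetical parameter values, for illustration).
    """
    y = np.asarray(x, dtype=float).copy()
    for _ in range(n_sections):
        y = allpass1(y, a)
    return y

# Usage: an impulse is spread into an upward-sweeping chirp.
impulse = np.zeros(1024)
impulse[0] = 1.0
chirp = dispersive_chain(impulse)
```

Because every section is allpass, the energy of the impulse is preserved and only its distribution in time changes.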
Fig. 7. Overall system diagram for modified-CMA-based monaural dereverberation (courtesy Huang and Kyriakakis)
low frequencies propagating faster than high frequencies (see the spectrogram in Fig. 4), which tends to turn impulses into chirps and eventually into a noiselike sound. Little energy seems to propagate above 4 kHz through most of the springs that the authors tested. The models the authors tried for emulating spring reverbs had a structure based on that shown in Fig. 5, consisting of a number of spring sections connected using scattering junctions. The results were reasonably successful, and good perceptual equivalence is claimed between the models and the original devices they aimed to emulate. One example of the comparison between the spectrograms of measured and modeled devices is shown in Fig. 6, for the Accutronics Type 8.
DEREVERBERATION
Dereverberation is the identification and removal of reverberation from other wanted audio signals, using digital signal processing. Many different methods
Fig. 8. Dereverberation using a computational auditory model (CAMM) (courtesy Zarouchas et al.)
audio frequency range into bands, filters based on third octaves were used, except that Bark-scaled bands were employed below 500 Hz. These were used to derive the reverberation time and the early decay time in different bands. Echo density was also measured, as this has been shown to be an important factor governing the perceived quality of reverberation. The authors also examined short-time histograms of the probability distribution of the decays to determine their whiteness (see above), as this is known to correspond to good-sounding reverberation. The autocorrelation function (the similarity of the signal to itself at different time intervals), as proposed by Griesinger, is a good way of finding out whether there are repetitive features in the reverberant decays, as these tend to result in annoying coloration. Objective clarity and time variance were also measured. The authors used these metrics on a range of free and commercial reverberation devices and plug-ins and found that they were able to observe the degree of complexity in different algorithms. They also conducted a listening test to get some initial information about the sound quality of different reverbs. Initial results suggest that there is a relationship between the objective measures and perceived quality, but these metrics cannot currently replace a listening test. Graininess and a metallic quality were observed in the sound of some reverbs, particularly on drum sounds, which pointed the authors toward future development ideas for metrics.
Examining binaural measures in a similar way, Bitzer and Extra attempted to find metrics that would model some of the spatial characteristics of artificial reverberation. IACC-based measures, as well as lateral early decay time (LEDT), room level (early) (RLE), and a time-angle phase scope were employed. The latter two were found to be useful coarse predictors of the quality of selected reverberation algorithms. RLE is a measure that was introduced by Trautmann and enhanced by Blauert. It is based on an analysis of short-time segments of the first 80 ms of the binaural impulse response. It attempts to measure the energy in each segment at a dominant angle of incidence, compared with the frontal energy. The time-angle phase scope is based on the traditional Lissajous figure display sometimes used to analyze the coherence of stereo audio channels, but in this case it is applied to binaural impulse responses separated into 40-ms overlapping blocks. The result is a representation of the energy and direction of the signal over time.
Abel and Huang provide further insight into the development of metrics for predicting reverberation quality in "A Simple, Robust Measure of Reverberation Echo Density" (AES 121st paper 6985). They work from a premise similar to that of Bitzer and Extra and colleagues, namely that the late part of reverberation has a smooth quality, a Gaussian probability distribution, and a high echo density. This is the result of a suitably diffuse, mixed soundfield. Echo density can be measured as a function of time, they assert, by looking at the standard deviation of the impulse response in each time window, counting the number of reflection taps that lie outside this standard deviation, and normalizing by the number expected for Gaussian noise. In the early part of the decay, where there are relatively few but prominent reflections, the standard deviation is large, so a smaller number of reflections are classified as outliers. During the later, diffuse part of the decay there will be a dense pattern of overlapping reflections approximating Gaussian noise. The authors found that a time window of 20 to 30 ms worked well as it is long enough to contain at least a few reflections and short enough
Fig. 9. Echo density profiles of feedback delay network impulse responses at different diffusion settings: 0.2 (red), 0.4 (green), 0.6 (blue), and 1.0 (black), with RT60 = 1.0 s and roughly highpass equalization (courtesy Abel and Huang)
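The echo density measure described by Abel and Huang can be sketched roughly as follows. This is a simplified reading of the description in the text, not the authors' implementation: it assumes a rectangular sliding window, treats the impulse response as zero-mean when computing the standard deviation, and evaluates the profile only every `hop` samples. The function name and default values are hypothetical. For Gaussian noise the expected fraction of samples lying more than one standard deviation from zero is erfc(1/sqrt(2)), about 0.317, which is used as the normalizer.

```python
import math
import numpy as np

def echo_density_profile(h, fs, window_ms=20.0, hop=64):
    """Return (times in seconds, normalized echo density) for impulse response h."""
    h = np.asarray(h, dtype=float)
    half = max(1, int(round(window_ms * 1e-3 * fs / 2)))
    # Fraction of Gaussian samples lying beyond one standard deviation.
    expected = math.erfc(1.0 / math.sqrt(2.0))
    centers = np.arange(0, len(h), hop)
    profile = np.zeros(len(centers))
    for i, n in enumerate(centers):
        seg = h[max(0, n - half): n + half + 1]
        # Assumption: zero-mean signal, so the RMS stands in for the std.
        sigma = np.sqrt(np.mean(seg ** 2))
        if sigma > 0.0:
            outside = np.mean(np.abs(seg) > sigma)  # fraction of outlier taps
            profile[i] = outside / expected
    return centers / fs, profile
```

Under these assumptions the profile stays near zero for a window containing a few isolated reflections and approaches 1 as the response becomes noiselike.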
to have sufficient time resolution for psychoacoustic purposes. The resulting plots of echo density according to the new measure were found to discriminate well between different diffusion settings of artificial reverberators (see Fig. 9) and to be robust to different equalization and decay parameters. It is claimed that since traditional acoustical parameters for measuring reverberation have not included anything that accounts for the temporal structure of reflections, this new approach could be a useful indicator of the time-domain quality or texture of a reverberant signal.
REVERBERATION ENHANCEMENT IN UPMIXING
Upmixing is the term commonly applied to systems that attempt to derive multichannel stereo signals from two-channel or matrixed multichannel program material. Usher, in "A New Upmixer for Enhancement of Reverberance Imagery in Multichannel Loudspeaker Audio Scenes" (AES 121st paper 6965), attempts to deal with the problem that most upmixing approaches are designed on the assumption that stereo images are created using amplitude-panned sources.
Such upmixers tend to look for highly correlated material between left and right channels, assuming this represents front sources, extracting less correlated signals, and using them to drive rear/side loudspeakers, assuming they are diffuse sound or reverberation. However, as Usher points out, some stereophonic recording techniques use spaced microphones or time-delay panning, and these recordings are not handled well by conventional upmixers. He devised three design criteria for a new upmixer: (1) spatial distortion of the source image in the upmixed audio scene should be minimized; (2) reverberance imagery