Académique Documents
Professionnel Documents
Culture Documents
1
MQA Ltd., Huntingdon, PE29 6YE, UK
2
Algol Applications Ltd., BN44 3QG, UK
On the topic of high-performance audio, there remains disagreement over the ways in which
sound quality might benefit from higher sample-rates or bit-depths in a digital path. Here
we consider the hypothesis that if a digital pathway includes any unintended or undithered
quantizations, then several types of errors are imprinted, whose nature will change with
increased sampling rate and wordsize. Although dither methods for ameliorating quantization
error have been well understood in the literature for some time, these insights are not always
applied in practice. We observe that it can be rare for a performance to be captured, produced,
and played back with a chain “flawless” in this regard. The paper includes an overview of
digital sampling and quantization with additive, subtractive, and noise-shaped dither. The
paper also discusses more advanced topics such as cascaded quantizers, fixed and floating-
point arithmetic, and time-domain aspects of quantization errors. The paper concludes with
guidelines and recommendations, including for the design of listening tests.
1 MODULATION NOISE
4
Sound reproduction systems suffer from commonly mea-
sured technical defects, such as frequency response irreg-
ularities, non-linear waveform distortion, and background 2
noise. While non-linear distortion depends on signal level,
Steps (Δ)
we prefer system noise to be stationary, as a background,
0
independent of the signal.
If an audible noise is stationary, we can perceive it as
a separate “object” in the signal. But there are technical -2
defects that can add so-called “modulation noise,” an error
responsive to the music waveform or envelope.
In analog systems, simple examples include level- -4
1 σ2
Level (dBFS)
0 none 0th moment -96
1 Rectangular 2 σ2 1st moment
2 Triangular 3 σ2 2nd moment
3 Parabolic 4 σ2 3rd moment
– Etc.
-120
→∞ Gaussian n/a
statistically independent of the signal. Fig. 2. Showing spectra for a 1 kHz tone @ –79 dBFS at a sample
Combining multiple independent RPDF streams yields rate of 48 kHz and quantized to 16b with: no dither (orange),
a series of dither types, summarized in Table 1, each with RPDF (violet) or TPDF (green).
unique statistical properties and differing effects on both
measurement instruments and human listeners. An impor-
1 kHz -79 dBFS @ 96 kHz
tant case is the second-order combination, TPDF (triangular 16b quantized
probability distribution function) dither. -72 No dither
RPDF
The literature strongly suggests that TPDF is the pre- TPDF
Noise-shaped
ferred choice for audio—higher order providing no dis- Shaper 'P'
cernible benefit while adding unnecessary noise. The quan- -96 Shaper 'G'
MAFM-shaped TPDF
-108
Subtractive MAFM-shaped
-120
-120
-144
-132
0Hz 5kHz 10kHz 15kHz 20kHz 25kHz -168
0Hz 4kHz 8kHz 12kHz 16kHz 20kHz 24kHz
Fig. 4. Showing a 4 kHz tone @ –110 dBFS, reproduced in a 16-
2 kHz -79 dBFS tone; Fs=96 kHz
bit channel using TPDF dither. Without dither, there is no output Original
for an input signal below –96 dB. -72 16-bit quantisation using:
No dither
TPDF
RPDF
Subtractive TPDF
Shaped TPDF
-96 Subtractive Shaped
Count
5k
cessed in the digital domain, by even the simplest gain-
change operation, the output signal ends up with wider
wordwidth than the input.7
At each step the dither must be adequate. But is TPDF
always the best? And at which level do we apply it? What if
the signal is processed without a large accumulator? What
about systems where the signal is moved from fixed to 0
-1000 -500 0 500 1000
floating-point and back? These thorny topics are tackled Level (Δ)
next and also in Sec. 11.
Fig. 6. Histogram of typical well-behaved CD recording.
access to original equipment and can feed it with test tones, 40k
synchronous averaging can reveal faults more robustly.
3.0k
8k
Count
2.0k
6k
1.5k
4k
1.0k
-200 -150 -100 -50 0 50 100 150 200 -40 -30 -20 -10 0 10 20 30 40
Level (Δ)
Level (D)
Fig. 8. A recording where the level seems to have been reduced
Fig. 10. An example histogram from a recording that reveals
without dither (spike spacing suggests by 0.1 dB).
two more fundamental problems (typical in early A/D convert-
ers) where the MSB and other bits are mismatched, and droop
occurs in a preceding analog sample-and-hold.
2.0k
1.5
0.6
1.0
0.4
0.2 0.5
Mean (Δ)
0.0
0.0
D
-1.5
-0.8 145 150 155 160
4800 4850 4900 Samples
Samples
0.35
0.15
30
Signal (undithered) pletely the combination of gain change and quantizer (the
20
Original maximum deviation of about 0.002 being too small to
1.0 quantizer
0.9 quantizer see on the plot). However, for the two quantizers in cas-
10 Both quantizers
Error x 10 (offset for clarity) cade (with gain change but no dither prior to the second
0 1.0 quantizer quantizer) there is a wavy component in the transfer char-
0.9 quantizer
-10 Both quantizers acteristic; see Fig. 14 (magenta trace).13
-20 The wavy component represents distortion which we re-
Level (Δ)
in each case, the quantization error (multiplied by 10 and 4.6. Can Analog Noise Provide Necessary
offset for clarity). Dither?
Fig. 13 is undithered while Fig. 14 is a simulated syn-
Analog noise generally has a Gaussian probability dis-
chronous average showing the mean (expected) values
tribution and if (as is typical) its rms amplitude is greater
when TPDF dither of pk-pk amplitude 2 is added be-
than one or two steps of a single quantizer then the quanti-
fore the first quantization.
zation of a signal containing the noise will be substantially
The TPDF dither has the correct amplitude to perfectly
linearized, i.e., the mean value will be correct and the MSE
linearize the plain quantizer and to linearize almost com-
will be substantially constant [15].
From this truth has grown the myth that if an equipment
12
A gain 10/9 (≈ + 0.915 dB) is applied, then the signal is is generally fed from an analog source there will be “plenty”
requantized with stepsize , then, for plotting purposes, a gain
of 9/10 (≈ – 0.915 dB) is applied to restore the original signal
13
level. This sequence is equivalent to quantization with a stepsize of The qualitative similarity to Fig. 11 will not have escaped the
0.9 . reader’s attention.
of dither to cover internal quantizations that therefore do By contrast, with TPDF generated by subtracting two
not need to be dithered individually. RPDF random numbers, then the quantizer preferably uses
This reasoning is of course false: it does not consider the a rounding operation to avoid a DC offset.
likely accumulation of washboard distortion. DSP operations are often carried out in fixed-point pro-
cessors. A well-known 24-bit platform can multiply two
4.7. Dither versus Bit-Depth 24-bit numbers, the result ending up in a 56-bit accumula-
As noted earlier, dither should ideally be applied at each tor. After operations are accumulated it will be necessary
significant quantization but unfortunately washboard ef- to quantize to a 24-bit number and it is important that the
fects will not be eliminated if the same dither is applied dither number has enough bits, as shown later in Sec. 11.6,
to each quantization in a chain. For the system to work and is scaled correctly in the 56-bit accumulator.
as intended, each dither should be statistically independent Other fixed-point platforms exist including asymmetric
of the others. Generating statistically independent multibit engines in integrated circuit filtering or gain stages (where
dither streams is, however, computationally intensive and, signal and coefficient word-sizes differ). We have seen pro-
in some cases more expensive than the signal-processing cessors operating with 16, 18, 24, and 32-bit word widths
itself. An alternative is to increase the bit-depth sufficiently and intermediate numbers in silicon implementation.
that quantization distortion is below any plausible thresh- Much of the theory of quantization has been based on
old of audibility. To determine the required bit-depth is integer models. In the last two decades it has become
extremely difficult—at very low levels the distortion, in more common to encounter signal paths and processors
isolation, is not directly audible as such. However, over that use floating-point arithmetic. A significant advantage
many years of working on this topic, we have anecdotes of a floating-point (FP) representation is the large dynamic
from more than one source that a new algorithm or piece of range, so designers do not have to worry about overshoots
equipment has been judged sonically “not quite right” and in internal processing. Another is that quantization artefacts
that, on investigation, an undithered quantization has been tend to reduce with signal level so dithering of internal pro-
found around the 22nd , 23rd or even 24th bit; and that smiles cessing is arguably less necessary.
returned to the listeners’ faces when this was corrected. As examples, 32-bit FP is seen in studio workstations,
If quantizing an analog or other high-precision source, audio desktop software, plug-ins, and intermediate files.
quantization distortion will reduce in amplitude commen- 32-bit FP is also used in computer or “phone” operating
surately with the bit depth, for example, 24 dB lower at 20 systems.
bits than if the quantizer were at the 16th bit. Further, er- As specified by the IEEE 754 standard, a 32-bit float
rors from a single undithered quantizer are likely to be less can represent all the values that a 24-bit or even a 25-bit
correlated with the original audio if quantizing to a higher fixed-point integer format can represent, and more too. It
bit depth, since the quantization grid will then be finer so is therefore tempting to assume that the floating format is
each sample is more likely to arrive at a “random” position “clearly better.” However:
relative to the grid.14
With only 16 bits a reverberation tail from a low-noise • Adding dither within the format is problematic be-
original recording can easily remain in a similar position cause without special hardware some steps may not
relative to the grid for several samples in succession, so a be accessible (well described in [12] and [13]).15
similar quantization error will apply to each sample, result- • Conversion from a 32-bit float back to a 24-bit
ing in a “gritty” sound. fixed16 requires care and understanding and perfec-
With more than one quantizer in the chain, increasing bit tion may be impossible in some architectures.17
depth will reduce the amplitude of quantization artefacts,
but the decorrelation advantage cannot now be guaranteed
because of the likely “washboard” effects. To avoid such 15
A key problem is having access to internal signals during
effects, one may either apply an independent dither prior arithmetic operations. In [13] the authors suggest that custom
to each quantization or increase the bit-depth to the point hardware offers the only “perfect” solution.
where quantization errors become insignificant, but that 16
This operation is inevitably needed to create a distribution file
strategy is inefficient for distribution. or to pass over an interface, including to a converter.
17
To convert from 32-bit float to 24-bit fixed, dither (preferably
4.8. Fixed and Floating-Point Arithmetic TPDF) needs to be added appropriate to the 24-bit stepsize be-
cause, for small signals, the floating resolution is much higher.
The theory of quantization is straightforward in princi- The sum of signal plus dither should ideally not be stored back
ple, but correct implementation typically requires precise to a 32-bit float before rounding to integer, otherwise at high sig-
knowledge of the platform. For example, RPDF where the nal levels the dither is compromised by the coarser quantization,
dither is a random number between 0 and 1 requires a quan- leading to the complications of “Discrete Dither” and potentially
tizer which truncates towards –∞ (to cancel the dither’s DC giving rise to a “23-bit-ism” (c.f., Sec. 11.4). If programming in
offset of 0.5 ). a high-level language it may be hard to know the details of inter-
mediate storage. One may ideally convert to 64 bits first to avoid
these problems. Failing that, an undithered conversion to 32-bit
14
This is the converse of the observation made in the last para- fixed, followed by a dithered rounding to 24 bits is a possible
graph of Sec. 4.5. option.
Variance Estimate (Δ )
2
0.8
Some platforms now use 64-bit FP and there is very much
10 trials RPDF
less risk. In this case the mantissa is 53 bits, so computa- 0.6
tional noise and undithered quantizations can be at a very
0.4
low level in the FP domain. Even so, dither is needed when
transitioning back to a 24-bit fixed-point representation. 0.2
It is possible that some (not all) of the non-linear behavior
analyzed in Fig. 11 manifests inadequate conversions from 0.0
Variance Estimate (Δ )
2
zation may be a bridge between amplitude and temporal
0.8
errors, as we shall illustrate in this section.
10 trials TPDF
Dither is implicitly an averaging process and applies 0.6
neatly to continuous signals; it seems to be effective at
removing errors we hear and measure providing the 1st and 0.4
20
This is distinct from the ERB discussed in Sec. 3.2.
Amplitude (Δ16)
96 kHz
48 kHz -72dB
-120
Bin Level (dBFS)
-0.8
The top and bottom groups illustrate typical delta-sigma
A/D and D/A converters (expanded from Fig. 1 in [4]).
-0.6
These days professional A/D converters normally use dither
-0.4 for internal processing. Unfortunately, the same cannot be
-0.2 said of all IC D/A converters; sometimes the use of dither is
a configuration option, in others it is simply omitted (even
0.0
Δ
-120
errors within plug-ins (such as under-sampling or “16-in-
24b” quantization).
The resulting export (file) goes to distribution and may
arrive without transcoding. If it is to make an optical disk,
-144 we need to be sure the pressing plant doesn’t compress
the audio or drop the level (to “be on the safe side”)—as
analyzed in Sec. 4.1.22
0Hz 4kHz 8kHz 12kHz 16kHz 20kHz 24kHz
The penultimate group includes processes commonly
seen during playback. Cost, convenience, expediency, in-
Fig. 19. (Upper) shows the error signal when the signal of Fig. 18 flexibility of operating systems, bandwidth restrictions in
encounters an undithered 16-bit quantizer and (lower), its spec- domestic wireless distribution, etc., all conspire to increase
trum.
the number of signal processing steps.
In summary, between the high-speed modulators (in red)
we have highlighted 19 stages where the signal can be
manipulated. If just one has an undithered quantization,
Analog In
Modulator
1 - 4 bits
Decimator
(filter)
Decimator
(filter)
HPF
(opon) 24b PCM then modulation noise and temporal blur can appear. What
are the chances of purity?
Typical ADC
Fundamental Limits
Hearing theshold (tones)
Hearing Threshold (UEN)
Microphone limit
-96
96 96kHz 17b 'violet' shaper
Coding channels (Gain=108 dBspl) 192/24
24 Hz -90 dB test in workstation CD 44.1kHz 16b
72 96kHz 24b
Original 192kHz 24b
+3 dB gain 1x shaper in Fig. 5 (upper)
-120 -3 dB gain Observed Limits
48 Min Recording Noise
Bin Level (dBFS)
dB SPL
Export using forced TPDF dither 24 Shaper P
-3 dB with 24-bit Shaper G
-144 -3 dB with 23-bit
0
CD Channel
-24
-168
-48
Thermal limit 96/24
-72
-192
at lower levels—yet we are not suggesting that the error
0Hz 24kHz 48kHz 72kHz 96kHz signals in Fig. 21 would be detectable in isolation—even
though it can have audible consequences on content. This
Fig. 21. Showing FFT analysis of the test signals used to probe a
quantization error in a workstation. is a situation that requires further understanding of auditory
grouping mechanisms and further confirming experiments.
still has “grainy structure” in the noisefloor. Only in the 23-
bit case were listeners happy that the sound had not been
degraded by the simple mastering process.23 8 HIGH-RESOLUTION
This was a useful episode because the problem was dis-
In Sec. 0 we speculated that if a chain has defective quan-
covered by relaxed listening after mastering and the content
tizations in any part of a digital path, then the resulting errors
was not especially low noise, the background spectrum be-
(manifesting as distortion and/or modulation noise) change
ing equivalent to 14 bits (see Fig. 8 in [4]).
with the inter-related variables of sampling rate, modula-
tion index, and wordsize. And therefore, might confound
7 AUDIBILITY ANALYSIS experiments to determine, in isolation, the significance of
higher sample rates or increased word-widths.
Psychoacoustic modeling will indicate a high probability
For example, when we increase the sample rate from
that the quantization distortion and noise shown in Figs. 2,
48 to 96 kHz or to 192 kHz are we responding to finer
3, and 5 will be detectable at modest playback gains—the
temporal resolution (including lower noise modulation with
noise exceeding the normal threshold. There are appropriate
faster convergence—Sec. 5.1)? Is it that quantization noise
examples in [27] and [33].
lowers 3 dB with each doubling, or that some quantization
Fig. 22 plots the data of Fig. 3 adjusted for a playback
errors move outside the conventional “audio band” (see Sec.
gain of 108 dBSPL, alongside some fundamental limits
3)? Are we responding to shorter filter chains and hence
and channel capacities. We can see that at this gain, the
lower computational noise within A/D and D/A converters
quantization products are predicted to be audible based on
(Secs. 5.3 and 6)?
the simple monaural psychoacoustic model. Higher gain
For many questions on high resolution, factors tend to be
increases the likelihood and vice-versa.
inter-related and enquiry needs careful framing and inter-
But we encounter several situations where mastering en-
pretation of results.
gineers or equipment designers are reacting to distortions
24
Note that the thermal limit can be emulated by a 17.5-bit
23
This story is related because we suspect the least bias happens noise-shaper in a 96 kHz channel (or 17 bits at 192 kHz). Fig. 22
in an unwitting experiment—no one expected a problem. shows a 96-kHz/17-bit example (appropriately in violet)!
8.1 Designing Listening Tests Sec. 4.4 demonstrated the ability of TPDF dither to avoid
The topic of high-resolution listening tests has proven to modulation noise, while in Sec. 4.5 we advocated the use of
be very challenging, for example, to compare sample rates. independent dithers for each significant quantization within
It is difficult to isolate a single parameter (e.g., Fs) and an equipment and illustrated the “washboard” distortion
avoid inadvertently incorporating other variables associated that can result if this is not done.
with the reconfiguration of signal paths at different rates or Sec. 4.6 challenged the widespread assumption that ana-
changes in quantization. log noise can generally be relied upon to dither a complete
Sec. 13.7 includes some references including the meta- system.
study by Reiss [41] and one in which great pains were taken Sec. 4.7 considered the tactic of increasing bit-depth as
to isolate parameters [42]. a possibly more cost-effective alternative to adding dither
Moving forward, it seems logical that to design a lis- within an equipment (but not for a distribution file). In
tening experiment with the minimum risk of accidental Sec. 4.8 we pointed out that, while 64-bit floating-point
variables, we would do better with a very short signal path arithmetic can be helpful, the 32-bit equivalent does not
from microphone to loudspeaker, passing through only the necessarily solve all quantization problems.
variable under test. In Sec. 5 we examined quantization from the time-
Fig. 20 reminds us how difficult it can be to “clean the domain perspective, reminding us that errors in a quantized
pipe” in everyday music distribution—the problem is much system evolve over time and that analysis often assumes
more acute for a test. an averaging process. We showed an existence region for
Without clear proof from measurement, we can’t arbitrar- modulation noise and how it varies with sample rate, noise-
ily pass a signal through any “black box” such as a work- shaping, additive or subtractive dither.
station, or between fixed and floating-point representation, In Sec. 5.2 we illustrated how signals that move very
nor can we select a converter without a clear understanding slowly with respect to an undithered quantization grid can
of its internal signal path. introduce tonal errors at much higher frequencies—adding
So, the test setup should be carefully measured in the test so-called “birdies.”
configuration. We can use long-duration signals and SA to Sec. 5.3 illustrated how an undithered quantizer encoun-
check for non-linearities and autocorrelation methods to tering a filtered signal can add a noise error proportional to
document the system impulse response. Until we have a the length of the filter kernel, at least in principle impacting
more complete understanding of the human hearing pro- both amplitude and time domains.
cess, we should take care to avoid all known errors without Sec. 6 illustrated the complexity of a typical record-
expedient or pre-conceived exclusions. ing/reproducing chain and its many opportunities for
For a simple test of sample rate discrimination, maybe we incorrectly-dithered quantization. As an example, we re-
should select converters that are discrete, or implemented in ported how an audible problem with a highly-regarded pro-
FPGA, just so that each processing step can be forensically fessional workstation (when used with its default settings)
examined before investing in the biggest cost of a listening was resolved by adding TPDF dither at the 23rd bit.
test, which is the listeners’ time? Sec. 7 examined the theoretical audible consequences of
a measured spectrum from a 16-bit undithered quantization,
concluding that further work is needed to explain why some
9 TUTORIAL SUMMARY listeners apparently react to distortions that should not be
audible according to simplistic criteria.
In Sec. 1 we pointed out that modulation noise can Sec. 8 returned to the topic of high resolution and argued
be insidious. It may not always be directly audible as for listening tests to be designed with greater care to isolate
such, but experience suggests it can modify our percep- some variables than has been typical in the past, especially
tion by subtly changing the way perceived objects separate considering that the effect of a quantization will be different
from each other or from the background. The paper high- at different sample rates.
lights several points in a digital chain where this defect
can arise.
Sec. 2 laid the foundations for discussion of quantization 10 CONCLUDING REMARKS
artifacts and in Sec. 3 we showed how by using dither,
such artifacts can be rendered substantially independent of 10.1 A Gentle Art?
the input signal, leading to zero measured distortion and It wasn’t whimsy that titled this paper “The Gentle Art
also the ability to resolve signals “below the LSB.” We of Dithering.” Practitioners of high-resolution techniques
indicated that subtractive dither can achieve this advantage know that valued sonic attributes of “high resolution” in-
without incurring a noise penalty and how spectral shaping clude transparency, sound separation, the reproduction of
can further reduce the audibility of quantization noise. space and acoustic in the performance, etc. Many of the
Sec. 4 covered some advanced dither topics. We showed sounds that are “lost” with lower resolution don’t prevent
how histogram analysis can sometimes reveal problems in us following the song; they are low level yet contribute to
an existing recording, how “synchronous averaging” can the interested listener’s enjoyment or engagement.
be used to characterize a complete equipment and we high- It is well established from experience that many times
lighted the difference between these two investigative tools. we can identify an improvement in signal processing, not
10.2 CONCLUSION Fig. 23. RPDF requantization of a 17-bit signal to 16 bits using a
discrete binary dither. Upper: an original 17-bit sample lies on an
Despite having been understood for over three decades, in even 17-bit grid point. Lower: sample lies on an odd grid point.
practice we too often see a careless and expedient approach Red arrows show the changes in sample values from dithering;
to dithering multibit PCM. Unless dither is used correctly at black arrows show the further changes from quantization.
each stage in a digital recording and playback, transparency
will suffer.
It is often asked: “Why should I add noise to my record-
11 APPENDIX: DISCRETE DITHER
ing” or, “How can adding noise make things clearer?” This
paper gives a tour through these questions as well as aspects Up to now we have assumed that RPDF and TPDF
of time-domain and highlighting the need for care if a sig- dithers are available as continuous distributions. Now we
nal is moved within or between integer and floating-point ask whether there is compromise in using digital dither
representations. samples having only a finite number of dither bits.
By illustrating “washboard” distortion we can be re- It may seem “obvious” that if we have a signal already
minded that the strategy of “hoping the errors will all be quantized to 24 bits and we requantize to 16, there is no
covered by the microphone noise” can come unstuck down- point in generating dither that extends beyond the 24th bit.
stream. That is, in fact, correct for RPDF dither, but we shall show
The paper reminds us that bigger numbers are not al- that TPDF performance is compromised very slightly; more
ways better—a 16-bit delivery where dither is used imagi- severely as the number of bits shaved off is reduced. Ac-
natively can easily carry modern recordings without impact- cordingly, we shall start by considering the limiting case
ing noisefloor (see Fig. 8 in [4]) and may even be preferred where the input wordlength is to be reduced by just one bit.
to one using 24-bits but where implementation had assumed
that “the errors were too low to hear.” Fig. 22 and footnote
11.1 Continuous and Discrete RPDF
24 remind us that a 96 kHz 18-bit channel can encode down
Quantization
to the thermal noise of our planet’s atmosphere.
We hope a few common misconceptions or regrettable For definiteness we consider requantization of a 17-bit
decisions have been pointed out.25 signal to 16 bits. In Fig. 23, horizontal axes represent signal
Dither should not be looked on as an added noise but an amplitude. The lower axis is labeled (pink) in quanta of the
essential lubricant. If noise-floor is a problem, then noise- original 17-bit signal and the upper axis is labeled (black)
shaping—or even better, subtractive dither—provide the in quanta of the 16-bit quantized signal. The pink rectan-
perfect solutions. gles show the probability distributions that would result if
We hope that, in some small way, this paper can remind continuous RPDF dither were added to the original sample
us all to be vigilant. shown as a red blob. In contrast, a 1-bit discrete dither will
oscillate randomly between two values (black open circles)
separated by half a 16-bit quantum.
After quantization by a rounding quantizer, an even 17-
25 bit value (upper diagram) is mapped to the same value
Including: “I don’t need to add dither because there is so
much noise coming in from the microphone”; “I don’t need to
as itself while an odd 17-bit input sample is mapped to a
add dither because my incoming and outgoing signals are both random choice between the two nearest 16-bit grid points.
24-bit.”; “Dithering-down is only used for reducing wordsize”; This scheme preserves expected value (the first moment
“We can’t add dither to our D/A (SRC, A/D) chip because if of error is zero) while minimizing the mean-square error
we do it will reduce the dynamic-range specification and not be (MSE, aka second moment of error). However, the resulting
selected.” MSE is not constant, being zero for even 17-bit sample
Fig. 24. Akin to Fig. 23 but using a ternary dither with probabilities Fig. 25. Akin to Fig. 24 but avoiding decision thresholds.
given by a discretized TPDF. The grey arrows show ambiguous
quantization when the dithered sample is on a decision threshold
(dotted vertical lines). 11.3 Avoiding Decision Thresholds
With continuous dither, it is not important how the sub-
sequent rounding (or truncation) operation will behave at a
values but 1/4 2 for odd values where refers to the step quantization decision threshold because the probability of a
size of the 16-bit grid. This is similar to the situation with dithered sample landing precisely on a threshold is vanish-
continuous dither, cf., Fig. 12 (lower). ingly small. This is not the case with discrete dither, and we
need either to avoid thresholds or to consider the threshold
behavior carefully. In particular, the “ideal” random up or
11.2 TPDF Dither down quantization of threshold values will not be obtained
TPDF dither attempts to balance the MSEs from even and by default.
odd 17-bit samples and so to eliminate modulation noise. A random up/down choice can be forced by adding a
The pink triangles in Fig. 24 show the PDFs that would smaller binary perturbation as shown in Fig. 25.27
result from adding continuous TPDF dither to the signal.
The open black circles are the dithered values resulting 11.4 Rounding Modes
from adding discrete ternary dither taking the 17-bit values Common rounding modes are:
(–1, 0, 1) with probabilities (1/4 , 1/2, 1/4 ). Thus, the central
dithered value has twice the probability of each of the two
outliers, irrespective of whether the 17-bit input is even or r Truncate towards zero (not recommended)
odd. r Truncate towards ±∞ (floor(), ceil())
The new phenomenon is that some of the dithered values r Round ties28 consistently (up or down)
now lie on decision thresholds, shown as vertical dotted r Round ties to even (IEEE 754)
lines.
We show firstly an ideal case where threshold values are “Truncation towards zero” must be avoided but unfortu-
quantized up or down with equal probability, as shown by nately it can happen unintentionally because some program-
the grey arrows in Fig. 24. The probabilities of quantiz- ming languages silently do it when a floating-point value is
ing the 17-bit value “0” to the 16-bit values (–1, 0, 1) are assigned to an integer variable.29 If decision thresholds have
(1/8 , 3/4 , 1/8 ) respectively26 , from which we deduce an abso- been avoided (or will occur with negligible probability) then
lute error of with probability 1/4 and zero with probability the remaining modes are all fine regarding distortion and
3/ , leading to an MSE of 1/ 2 .
4 4 modulation noise.
For odd 17-bit input values the quantized result is a ran-
dom choice between the two nearest 16-bit values, so the 27
This combination of TPDF and binary dither can be imple-
absolute error is always 1/2 so the MSE is again 1/4 2 .
mented as the sum of a binary dither at the 17th bit and a two-bit
The MSE is thus independent of signal and there is no
RPDF dither occupying the 17th and 18th bit positions. Six per-
modulation noise. turbed values (black open circles) are thus generated with proba-
bilities proportional to (1, 1, 2, 2, 1, 1): (1, 0, 1) ⊗ (1, 1, 1, 1) =
(1, 1, 2, 2, 1, 1).
28
“Ties” refers to values on a decision threshold.
26 29
The same probabilities are obtained using continuous dither. Truncation towards zero creates a kink in the transfer function
They correspond to the areas of the pink triangle bounded by the akin to classic crossover distortion. It affects all sample values,
decision thresholds in Fig. 24 (upper). not just those on a decision threshold.
Fig. 26. Using floor in place of round in Fig. 24. Fig. 27. Use of a two-bit rectangular dither to achieve the same
result as Fig. 26.
[36] S. R. Norsworthy, R. Schrier, and G. C. Temes [48] A. S. Bregman, Auditory Scene Analysis: The Per-
(eds.), Delta-Sigma Data Converters: Theory, Design and ceptual Organization of Sound (The MIT Press, 1990).
Simulation (IEEE Press, 1997). [49] E. Zwicker, and H. Fastl, Psychoacoustics, Facts
[37] https://en.wikipedia.org/wiki/Washboard_ and Models (Springer, Berlin, 1990), vol. 22.
(musical_instrument)
[38] https://en.wikipedia.org/wiki/Moir%C3%A9_
pattern 14.2. Natural sounds and ethology
[39] https://en.wikipedia.org/wiki/IEEE_754#
[50] M. S. Lewicki “Efficient Coding of Natural
Rounding_rules
Sounds,” Nature Neurosci., vol. 5, pp. 356–363 (2002).
[40] https://en.wikipedia.org/wiki/Quantization_(signal_
DOI:http://dx.doi.org/10.1038/nn831
processing)#Mid-riser_and_mid-tread_uniform_
[51] E. C. Bluvas and T. Q. Gentner, “Attention to Natu-
quantizers
ral Auditory Signals,” Hearing Research, vol. 305, pp. 10–
18 (2013). https://doi.org/10.1016/j.heares.2013.08.007
13.7. Listening tests and evaluation [52] E. C. Smith and M. S. Lewicki, “Efficient Audi-
tory Coding,” Nature, vol. 439, pp. 978–982 (2006 Feb.).
[41] J. D. Reiss, “A Meta-Analysis of High Resolu-
https://doi.org/10.1038/nature04485
tion Audio Perceptual Evaluation,” J. Audio Eng. Soc.,
vol. 64, pp. 364–379 (2016 Jun.). https://doi.org/10.17743/
jaes.2016.0015 14.3. Auditory neuroscience
[42] H. M. Jackson, M. D. Capp, and J. R. Stu-
art, “The Audibility of Typical Digital Audio Filters [53] W. A. Yost et al., “Auditory Perception of Sound
in a High-Fidelity Playback System,” presented at the Sources,” in Springer Handbook of Auditory Research,
137th Convention of the Audio Engineering Society vol. 29 (Springer Science+Business Media, 2008).
(2014 Oct.), convention paper 9174. http://www.aes.org/e- [54] J. Schnupp et al., Auditory Neuroscience: Making
lib/browse.cfm?elib=17497 Sense of Sound (MIT Press, 2011).
[43] A. Pras and C. Guastavino, “Sampling Rate Dis- [55] A. Rees and A. R. Palmer (eds.), The Oxford Hand-
crimination: 44.1 kHz vs. 88.2 kHz,” presented at the book of Auditory Science: The Auditory Brain, vol. 2 (OUP,
128th Convention of the Audio Engineering Society 2010).
(2010 May), convention paper 8101. http://www.aes.org/e- [56] D. Oertel et al., “Integrative Functions in the Mam-
lib/browse.cfm?elib = 15398 malian Auditory Pathway,” in Springer Handbook of Audi-
[44] T. Nishiguchi and K. Hamasaki, “Differences tory Research, vol.15 (Springer Verlag, 2002).
of Hearing Impressions among Several High Sam- [57] F. Rieke et al., Spikes: Exploring the Neural Code
pling Digital Recording Formats,” presented at the (MIT Press, 1997).
118th Convention of the Audio Engineering Society
(2005 May), convention paper 6469. http://www.aes.org/e-
lib/browse.cfm?elib=13185 14.4. Audio temporal discrimination
[45] W. Woszczyk, “Physical and Perceptual Consid-
erations for High-Resolution Audio,” presented at the [58] T. Deneux et al., “Temporal Asymmetries in Au-
115th Convention of the Audio Engineering Society ditory Coding and Perception Reflect Multi-Layered Non-
(2003 Oct.), convention paper 5931. http://www.aes.org/e- linearities,” Nature Communications, (2016 Sep.). DOI:
lib/browse.cfm?elib = 12372 https://doi.org/10.1038/ncomms12682
[46] S. Yoshikawa, S. Noge, M. Ohsu, S. Toyama, H. [59] D. A. Abrams, “Population Responses in Pri-
Yanagawa, and T. Yamamoto, “Sound Quality Evalua- mary Auditory Cortex Simultaneously Represent the
tion of 96-kHz Sampling Digital Audio,” presented at Temporal Envelope and Periodicity Features in Natural
the 99th Convention of the Audio Engineering Society Speech,” Hearing Research, vol. 348, pp. 31–43 (2017).
(1995 Oct.), convention paper 4112. http://www.aes.org/e- https://doi.org/10.1016/j.heares.2017.02.010
lib/browse.cfm?elib=7654 [60] K. Krumbholz, R. D. Patterson, et al., “Microsecond
Temporal Resolution in Monaural Hearing without Spectral
Cues,” J. Acoust. Soc. Amer., vol. 113, no. 5, 2790–2800
14. BIBLIOGRAPHY (2003). https://doi.org/10.1121/1.1547438
14.1. Human auditory perception [61] P. Heil and H. Neubauer, “A Unifying Basis
for Auditory Thresholds Based on Temporal Summa-
[47] C. J. Plack (ed.), The Oxford Handbook of Auditory tion,” PNAS, vol. 100, no. 10, pp. 6151–6156 (2003).
Science: Hearing, vol. 3 (OUP, 2010). https://doi.org/10.1073/pnas.1030017100
THE AUTHORS
J. Robert (Bob) Stuart was born in 1948. He studied Bob joined AES in 1971, has been a fellow since 1992,
electronic engineering and acoustics at the University and is a member of ASA, IEEE, and the Hearing Group at
of Birmingham and took an M.Sc. in operations re- Cambridge.
search at Imperial College, London. While at Birming- Bob has a deep interest in music and spends a good deal
ham he studied psychoacoustics under Professor Jack of time listening to live and recorded material.
Allison, which began a lifelong fascination with the •
subject. Peter Craven was born in 1948. He studied maths and
In 1977 he co-founded Meridian Audio and served as then astrophysics at Oxford University while collaborating
CTO until early 2015. In 2014 he founded MQA Ltd. where with Michael Gerzon and others at the Oxford University
he is currently full time as Chairman and CTO. Tape Recording Society on making live recordings and on
At the request of Hiro Negishi and Raymond Cooke, audio quality topics generally, including the ideas (1972)
Bob chaired the advocacy group Acoustic Renaissance for leading to the Ambisonic Soundfield Microphone.
Audio between 1994 and 2002. After some years in research and academia, Peter became
In the 1990s he worked with Michael Gerzon and Peter independent in 1981, specializing in digital signal process-
Craven on lossless compression and was instrumental in its ing for audio. Further collaborations with Gerzon resulted
adoption for optical discs. in a seminal paper on noise-shaping in 1989. Work on room
Bob has contributed to DVD-Audio and BluRay equalization with B&W loudspeakers was followed by col-
standards and has served on the technical committees laboration with Bob Stuart on noise shaping and buried
of the National Sound Archive, JAS, and the ADA data, leading eventually to the MLP lossless compression
(Japan). system and, more recently, to the MQA music delivery
Bob’s professional interests are the furthering of ana- system.
log and digital audio and developing understanding of A collector of pre-war coarse-groove 78 r.p.m. records,
human auditory perception mechanisms relevant to live Peter Craven seeks a synergy between modern high defi-
and recorded music. His specialties include the au- nition multichannel technology and the simpler recording
ditory sciences and the design of analog and digi- techniques of yesteryear.
tal electronics, loudspeakers, audio coding, and signal Bob and Peter have worked together on audio topics
processing. since they first met in 1975.