
HDTV Audio
In the December ‘07 issue, we examined the various ways to hook
up pieces of your home entertainment system to your HDTV. We
specifically focused on the different video interfaces. We’ll continue
now with the choices for passing audio from one device to another.

by Jeff Mazur

Once again, the most common connection by far is the standard analog stereo pair using RCA jacks and cables. With good quality cable and connectors, this method can provide excellent results. The most common issue with analog audio connections is their susceptibility to picking up hum and/or other extraneous signals, especially from components within your system (or perhaps from the ham operator who lives next door!). To solve this issue — as well as complete the total conversion to binary 1s and 0s — there are three basic ways to pass audio signals digitally between devices: coax, optical, and HDMI.

(Figures 2-4 are courtesy of Wikipedia, the free encyclopedia, licensed to the public under the GNU Free Documentation License.)

S/PDIF (Sony/Philips Digital Interconnect Format)

Named after the two companies that developed this interface, S/PDIF is a means to carry audio between devices in a digital format. The signals can be carried over standard 75 ohm coaxial cable using RCA jacks (or BNC connectors in professional equipment) or via optical fiber (glass or plastic, usually terminated with F05 connectors). See Figure 1.

FIGURE 1. Digital audio connections (top, coax and bottom, optical).

The optical connection — created by Toshiba and also known as TOSLINK — uses 1 mm fiber terminated in a 5 mm connector. While earlier cables were restricted to less than 15 feet, you can now buy high quality TOSLINK cables up to 100 feet in length. TOSLINK can carry data signals of up to 125 Mbits/s, which allows for three audio channels. However, it is usually used to carry a single pair of stereo audio signals.

As an electrical signal, S/PDIF is represented by a roughly 1V digital pulse train using Biphase Mark Code (BMC) to carry the audio data. While no specific sampling rate or bit depth is specified in the standard, audio is usually carried as either 48 kHz (DAT) or 44.1 kHz (CD) data with either 20 or 24 bit samples. We'll describe the actual data format in a moment.

HDMI

We've already discussed the HDMI interface that can carry digital video between devices. HDMI also includes support for up to eight channels of uncompressed digital audio at a 192 kHz sample rate with
24 bits/sample, as well as compressed streams such as Dolby Digital or DTS. HDMI also supports one-bit audio, such as that used on Super Audio CDs, at rates up to 11.3 MHz. With version 1.3, HDMI now also supports lossless compressed streams such as Dolby TrueHD and DTS-HD Master Audio.

Digital Audio Basics

Digital audio connections can be used to connect various components of your home entertainment system, such as from a cable or satellite STB (Set Top Box) to the TV. Since audio is transmitted digitally in the ATSC DTV signal, this will often be the best choice. Other components (e.g., a CD player) also handle audio natively in a digital form. However, devices that handle audio as an analog signal — including the equipment used to record or create TV audio at its source — must first convert the analog signal to digital. This process is known as digitizing and is a good place to start when discussing digital audio.

To digitize an analog signal, we basically perform two separate functions. First, the signal is sampled at regular intervals to determine its value at each discrete point in time. This is usually the function of a sample-and-hold circuit. Next, each sample is quantized, or converted from an analog voltage to a particular digital representation of that value.

The sampling rate determines what frequencies can be carried digitally; information theory tells us that only frequencies below one-half of the sampling frequency (also referred to as the Nyquist frequency) can be represented accurately. Signals above this limit will cause extraneous frequencies (i.e., distortion) to appear due to an effect known as aliasing. In other words, we need at least two samples per cycle of the highest frequency we wish to digitize.
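To see how a too-low sampling rate creates distortion, here is a minimal Python sketch (our own illustration, not from the article) that predicts where a tone above the Nyquist limit folds back into the representable band:

import math  # not strictly needed here, but handy if you extend the sketch

def alias_frequency(f_signal, f_sample):
    """Return the apparent (aliased) frequency after sampling.

    Any component above f_sample/2 folds back into the band
    0..f_sample/2, which is why out-of-band signals appear as distortion.
    """
    f = f_signal % f_sample          # wrap into one sampling period
    return min(f, f_sample - f)      # fold around the Nyquist frequency

# A 30 kHz tone sampled at 48 kHz (Nyquist = 24 kHz) shows up at 18 kHz
print(alias_frequency(30_000, 48_000))   # -> 18000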
The quantization of each sample determines how many bits will be used to represent it. The more bits, the higher the precision of each sample. This translates into the dynamic range of a signal, or the difference between its lowest and highest values. Under ideal conditions, it also represents the maximum signal-to-noise ratio (SNR), which is related to the number of bits by the following formula:

SNR = 20 log(2^N) ≈ (6 × N) dB

where N = number of bits.

For example, a 20-bit converter theoretically could obtain an SNR of 120 dB (if there are no other sources of noise). In practice, the maximum signal level is usually reduced by 20 dB of headroom to prevent clipping. This still leaves an SNR of approximately 100 dB. In comparison, normal audio tape typically only achieves an SNR of about 60 dB.
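As a quick sanity check on that formula, a few lines of Python (our own illustration, not from the article):

import math

def ideal_snr_db(bits):
    """Maximum SNR of an ideal converter: 20 * log10(2**bits)."""
    return 20 * math.log10(2 ** bits)

print(round(ideal_snr_db(16)))  # -> 96 (CD audio)
print(round(ideal_snr_db(20)))  # -> 120, or ~100 dB after 20 dB of headroom
print(round(ideal_snr_db(24)))  # -> 144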
As you can see, digitizing an analog signal is all about compromise. You need to sample at a high enough rate so as not to miss changes in the signal that occur between the samples. And you need enough bits to represent each sample so that the difference between the actual analog value and its closest digital representation (a.k.a., quantization error) is not very much. Of course, increasing either of these values means that there will be more digital data that needs to be carried and processed.

On the positive side, once a signal has been digitized it can be transmitted much more efficiently and without many of the side effects of noise and distortion present in the communication channel used. More importantly, it can be compressed digitally so that redundant and/or unessential data can be discarded. This is one of the main reasons that our TV signals are undergoing the transition to digital.

PCM

There are many ways to represent each sample as a digital signal. The most common technique is known as Pulse-Code Modulation (PCM). This approach simply takes the output from an Analog-to-Digital Converter (ADC) and places the bits into a continuous bitstream.
FIGURE 2. Analog-to-digital conversion of a signal using Pulse Code Modulation (PCM).

Figure 2 shows a sine wave (in red) that is sampled and quantized using simple PCM. At each sample point, the digital representation of the signal's analog value is sampled and then held until the next sample point. This produces an approximation of the original signal, which is easily encoded as digital data. For example, if the sine wave in Figure 2 is quantized into 16 values (i.e., four bits), we would generate the following data samples: 1001, 1011, 1100, 1101, 1110, 1110, 1111, 1111, 1111, 1110, etc.
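A four-bit quantizer like this is easy to sketch in Python. This is our own illustration; the sample points and the unsigned coding are assumptions, so the exact codes differ slightly from the figure:

import math

def quantize(x, bits=4):
    """Map an analog value in -1.0..+1.0 to an unsigned binary code."""
    levels = 2 ** bits                    # 16 levels for four bits
    code = int((x + 1.0) / 2.0 * levels)  # scale -1..+1 onto 0..levels
    code = max(0, min(levels - 1, code))  # clamp so +1.0 stays at 1111
    return format(code, f"0{bits}b")

# Sample the rising quarter of a sine wave, as in Figure 2
print([quantize(math.sin(2 * math.pi * n / 32)) for n in range(8)])
# -> ['1000', '1001', '1011', '1100', '1101', '1110', '1111', '1111']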
We could transmit these PCM samples as four-bit parallel data with a separate clock signal to indicate when each sample was taken. This is cumbersome, however, and requires the use of multi-conductor cables. Most data transmission today is done in a serial fashion. This requires that each bit of the PCM sample be clocked out onto a single serial data line. At the receiving end of this data stream, a shift register will convert the serial data back into parallel data words. To keep the receiver in sync with the transmitter, some form of clock recovery is necessary.

One of the easiest ways to do this is to make sure that the serial data changes polarities at least once during each bit-time. This is the basis for several different coding schemes, including Biphase Mark Code (BMC) — the signaling method used by both TOSLINK and the professional digital audio format established by, and referred to as, AES/EBU (Audio Engineering Society and the European Broadcasting Union).

With BMC, the data stream changes value at the beginning of each data bit. A logic 1 is represented by having the stream change value again during the middle of its bit time; it does not change for a logic 0 (see Figure 3). BMC coding provides easy synchronization since there is at least one change in polarity for every bit. Also, the polarity of the actual signal is not important since information is conveyed by the number of transitions of the data signal.

FIGURE 3. Serialization of digital data using Biphase Mark Coding (BMC).

Another advantage of BMC is that the average DC value of the data stream is zero, thus reducing the necessary transmitting power and minimizing the amount of electromagnetic noise produced by the transmission line. All these positive aspects are achieved at the expense of using a symbol rate that is double the actual data rate.
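Here is a minimal Python sketch of a BMC encoder (our own illustration): every bit boundary toggles the line, and a logic 1 adds a second toggle mid-bit, so each data bit becomes two half-bit symbols — the doubled symbol rate mentioned above.

def bmc_encode(bits, level=0):
    """Encode a list of 0/1 data bits as Biphase Mark Code.

    Returns two half-bit line levels per data bit.
    """
    out = []
    for bit in bits:
        level ^= 1            # always toggle at the start of a bit
        out.append(level)
        if bit:
            level ^= 1        # toggle again mid-bit for a logic 1
        out.append(level)
    return out

print(bmc_encode([1, 0, 1, 1, 0]))
# -> [1, 0, 1, 1, 0, 1, 0, 1, 0, 0] (polarity itself carries no meaning)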
Transmission Protocol

S/PDIF and its professional cousin, AES/EBU, were designed primarily to support two channels of PCM encoded audio at 48 kHz (or possibly 44.1 kHz) with 20 bits per sample. Sixteen-bit data is handled by setting the unused bits to zero; 24-bit data can be achieved by using four auxiliary bits to expand the data samples. The low-level protocol used by both S/PDIF and AES/EBU is the same, with the exception of a single Channel Status bit.

To create a digital stream, we break the continuous audio data into smaller packets or blocks. Each block is further divided into 192 frames. Note, however, that these frames have nothing to do with frames of video. In fact, when digital audio is combined with digital video signals, there are a number of steps that must be taken to make them compatible. First off, both digitizing clocks must be synchronized to a common 27 MHz timebase. Even so, a frame of NTSC video has a duration of:

1 / 29.97 = 33.366… ms

At 48 kHz, an audio frame has a duration of:

1 / 48,000 = 20.833… µs

This makes a complete audio block 192 × 20.833 µs = 3,999.4 µs. The number of audio samples per video frame, however, is not an integer:

33,366 / 20.833 = 1,601.6 audio samples/video frame

Because of this, it takes a total of five video frames before a whole number of audio samples corresponds to a whole number of video frames (8,008 audio samples per five video frames). Some video frames are given 1,602 samples while others are only given 1,601. This relationship is detailed in Figure 4.
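The five-frame pattern is easy to verify with a few lines of Python (our own check, reproducing the numbers above exactly by using the true NTSC frame rate of 30000/1001 Hz):

from fractions import Fraction

fps = Fraction(30000, 1001)          # exact NTSC frame rate (~29.97 Hz)
samples_per_frame = 48000 / fps      # audio samples in one video frame

print(samples_per_frame)             # -> 8008/5, i.e., 1601.6
print(float(5 * samples_per_frame))  # -> 8008.0 samples per five frames
# Broadcast gear doles these out as 1602 + 1601 + 1602 + 1601 + 1602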
Each audio frame consists of two subframes: one for each of the two discrete audio channels. Furthermore, as shown in Figure 4, each subframe contains 32 bits — 20 audio sample bits plus 12 extra bits of metadata.

There is a single Channel Status bit in each subframe, making 192 bits per channel in every audio block. This means that there are 192 / 8 = 24 bytes available in each block for higher level metadata. In S/PDIF, the first six bits are organized into a control code. The meaning of these bits is:

bit  if 0             if 1
0    Consumer         Professional
1    Normal           Compressed data
2    Copy Prohibit    Copy Permitted
3    Two Channels     Four Channels
4    —                —
5    No Pre-emphasis  Pre-emphasis
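As an illustration (our own sketch, with hypothetical field names standing in for the table above), the six control bits could be unpacked like this in Python:

def parse_spdif_control(bits):
    """Decode the first six channel status bits of an S/PDIF block.

    `bits` is a sequence of six 0/1 values, bit 0 first.
    """
    return {
        "professional": bool(bits[0]),  # 0 = consumer, 1 = professional
        "compressed":   bool(bits[1]),  # 0 = normal PCM, 1 = compressed data
        "copy_ok":      bool(bits[2]),  # 0 = copy prohibited, 1 = permitted
        "four_channel": bool(bits[3]),  # 0 = two channels, 1 = four channels
        # bits[4] is unused
        "pre_emphasis": bool(bits[5]),  # 0 = none, 1 = pre-emphasis
    }

print(parse_spdif_control([0, 0, 1, 0, 0, 0]))
# -> consumer PCM, copying permitted, two channels, no pre-emphasis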
In AES/EBU, the 24 bytes are used as follows:

• Byte 0: Basic control data — sample rate, compression, emphasis modes.

• Byte 1: Indicates if the audio stream is stereo, mono, or some other combination.

• Byte 2: Audio word length.

• Byte 3: Used only for multichannel applications.

• Byte 4: Suitability of the signal as a sampling rate reference.

• Byte 5: Reserved.

• Bytes 6–9 and 10–13: Two slots of four bytes each for transmitting ASCII characters.

• Bytes 14–17: Four-byte/32-bit sample address, incrementing every frame.

• Bytes 18–21: As above, but in time-of-day format (numbered from midnight).

• Byte 22: Contains information about the reliability of the audio block.

• Byte 23: CRC (Cyclic Redundancy Check) for error detection. The absence of this byte implies interruption of the data stream before the end of the audio block, which is therefore ignored.

FIGURE 4. Packetization of data in digital audio streams.

Figure 4 Bit Descriptions

Bits 0 to 3
These do not actually carry any data but they facilitate clock recovery and subframe identification. They are not BMC encoded, so they are unique in the data stream and easier to recognize, but they don't represent real bits. Their structure minimizes the DC component on the transmission line. Three preambles are possible:

X (or M): 11100010 if the previous state was "0;" 00011101 if it was "1."

Y (or W): 11100100 if the previous state was "0;" 00011011 if it was "1."

Z (or B): 11101000 if the previous state was "0;" 00010111 if it was "1."

They are called X, Y, Z in the AES standard; M, W, B in IEC 958 (an AES extension). The eight-bit preambles are transmitted in the same time allocated to four (BMC encoded) bits at the start of each subframe.

Bits 4 to 7
These bits can carry auxiliary information such as a low-quality auxiliary audio channel for producer talkback or studio-to-studio communication. Alternately, they can be used to enlarge the audio word length to 24 bits, although the devices at either end of the link must be able to use this non-standard format.

Bits 8 to 27
These bits carry the 20 bits of audio information starting with the LSB and ending with the MSB. If the source provides fewer than 20 bits, the unused LSBs will be set to a logical "0" (for example, for the 16-bit audio read from CDs, bits 8–11 are set to 0).

Bits 28 to 31
These bits carry associated status bits as follows:

• V (28) Validity bit: Set to zero if the audio sample word data are correct and suitable for D/A conversion. Otherwise, the receiving equipment is instructed to mute its output during the presence of defective samples. It is used by players when they have problems reading a sample.

• U (29) User bit: Any kind of data, such as running time, song, track number, etc. One bit per audio channel per frame forms a serial data stream.

• C (30) Channel status bit: Its structure depends on whether AES/EBU or S/PDIF is used (see text).

• P (31) Parity bit: For error detection. A parity bit is provided to permit the detection of an odd number of errors resulting from malfunctions in the interface. If set, it indicates an even parity.
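The parity convention is simple to express in code. A minimal Python sketch (our own illustration; the assumption that parity covers bits 4 through 31 while excluding the non-BMC preamble is ours, not spelled out in the article):

def parity_bit(subframe_bits):
    """Compute P (bit 31) so that bits 4-31 hold an even number of ones.

    `subframe_bits` holds bits 4..30 (aux, audio, V, U, C) as 0/1 values;
    the 4-bit preamble is excluded because it is not BMC encoded.
    """
    return sum(subframe_bits) % 2   # 1 if one more is needed to make it even

toy_payload = [1, 0, 0, 1, 1]       # toy bit list, not a real subframe
print(parity_bit(toy_payload))      # -> 1, making the total count of ones even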

AC-3

As previously mentioned, raw PCM data would require a large bandwidth to transmit. For surround sound, this would require approximately six channels × 48,000 samples/s × 20 bits ≈ 5.76 Mb/s. With appropriate compression, however, this can be reduced to 384 Kb/s.
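Worked out in Python (our own arithmetic check of the figures above):

channels, sample_rate, bits = 6, 48_000, 20

raw_bps = channels * sample_rate * bits
print(raw_bps / 1e6)       # -> 5.76 (Mb/s of raw PCM)
print(raw_bps / 384_000)   # -> 15.0, the compression ratio AC-3 achieves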

Dolby Digital — officially known as AC-3 (Adaptive Transform Coder 3) — is the compression scheme used to transmit audio within the ATSC DTV data stream. It can represent up to five full bandwidth (20 Hz–20 kHz) channels of surround sound (Right Front, Center, Left Front, Right Rear, and Left Rear), along with one low frequency channel (20 Hz–120 Hz) for subwoofer-driven effects. This is often referred to as 5.1 surround sound.

A complete description of the AC-3 standard and its use in ATSC transmission is quite complex and beyond the scope of this article. You can download the entire ATSC audio standards document (A/52B) using the link given under Further Info. However, there are some interesting details worth mentioning here.

ATSC Audio Details

Unlike analog NTSC, audio does not take a backseat to video in ATSC. Quite a bit of the standard is devoted to how sound will be delivered to the viewer. We've already seen how 5.1 surround sound can be transmitted with each DTV channel. Other parameters in the audio metadata can be used to enhance the viewing experience. One of these parameters is known as dialnorm.

The purpose of dialnorm is to equalize the sound levels when changing from one program to another. The value of this parameter — which is embedded within the audio stream — is meant to indicate the level of average spoken dialog within the complete audio program. This is then used to control the decoder compression gain within the HDTV receiver. If set properly, it will maintain a consistent dialog level between program elements and when changing from one channel to another, hence the abbreviation of "dialog normalization."

The dialnorm parameter ranges in integer values from 31 (where decoder gain remains at unity) to a value of one (where decoder gain is reduced by 30 dB). Unfortunately, many producers and broadcasters currently do not provide a proper dialnorm value in their programs. This is partly due to the complexity and variability of actually measuring the dialog level properly. Thus, you may still find wildly varying levels between channels.
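The mapping from dialnorm to attenuation is linear in dB, which a couple of lines of Python make concrete (our own illustration of the endpoints given above):

def dialnorm_attenuation_db(dialnorm):
    """Gain reduction applied by the decoder for a given dialnorm value.

    31 leaves gain at unity (0 dB); 1 reduces gain by 30 dB.
    """
    if not 1 <= dialnorm <= 31:
        raise ValueError("dialnorm must be 1..31")
    return 31 - dialnorm

print(dialnorm_attenuation_db(31))  # -> 0 dB (unity gain)
print(dialnorm_attenuation_db(24))  # -> 7 dB of attenuation
print(dialnorm_attenuation_db(1))   # -> 30 dB, the maximum reduction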
Other Audio Services

The ATSC standard also provides for alternate audio channels by allowing multiple AC-3 elementary streams within the full transport stream. As such, each alternate audio channel can have up to 5.1 channels of its own to provide a complete audio service. It is also possible for the alternate audio to consist of a single channel intended to be combined with other channels from a different stream (although not all HDTVs are capable of this).

One obvious use for an alternate audio channel would be to convey the dialog in a different language, much like the SAP (Secondary Audio Programming) service currently available on NTSC channels. Because there can be any number of audio streams, this would allow multiple languages to be transmitted at the same time.

The ATSC standard also identifies several types of audio signals that can be transmitted. These are specified in Table 5.7 of the A/52 document (see Table 1).

Table 1. Bit Stream Modes

bsmod  acmod      Type of Service
000    Any        Main audio service: Complete main (CM)
001    Any        Main audio service: Music and effects (ME)
010    Any        Associated service: Visually impaired (VI)
011    Any        Associated service: Hearing impaired (HI)
100    Any        Associated service: Dialog (D)
101    Any        Associated service: Commentary (C)
110    Any        Associated service: Emergency (E)
111    001        Associated service: Voice over (VO)
111    010 - 111  Main audio service: Karaoke

A complete main (CM) channel represents the main audio service with dialog, music, and effects. This is the normal audio program, which can be monaural (one channel), stereo (two channel), or surround sound (5.1 channel) where available. A music and effects (ME) channel contains only those respective portions of the audio, without dialog. This would be useful when supplying a program in multiple languages; the single ME service would be combined with various other streams containing only a dialog (D) service for each language.

The visually impaired (VI) service is designed to allow a separate audio channel to contain a narrative description of the program content. Also known as video described, this aids a person who is blind or otherwise visually impaired to comprehend what is happening on the screen. Likewise, the hearing impaired (HI) service is provided to aid those with slight hearing loss. Unlike captioning, which can provide audio content for those who are completely deaf, the HI service is designed to provide more intelligible audio by processing (compressing) the dialog channel and emphasizing it over the music and effects.

While the dialog service contains actual program dialog from the speaking actors, an additional commentary (C) service can be added to provide further information. This is like many DVDs which offer a special audio track with the director's or actors' comments while you watch the movie.

The emergency (E) service is a special, high priority channel which can be used to convey vital announcements, similar to the Emergency Alert System (EAS). Whenever an E service signal is present, the receiver will automatically mute and/or replace the normal audio channels with the E channel audio.

The voice over (VO) and karaoke services allow an additional channel to be added to an existing AC-3 stream without requiring the audio to be decoded (i.e., uncompressed) back to baseband PCM audio data, mixed, and then re-encoded. Local stations could use this to add their own audio tags to programming supplied by their network.

Lip Sync

Because audio and video are processed separately by various circuits which can delay the signals significantly, special attention is needed to keep these parts of a presentation in sync. When they drift apart past a certain threshold, the discrepancy becomes very noticeable and objectionable.

Technically called audio/video sync, this quality is often referred to as lip sync (not to be confused with a Milli Vanilli performance). A/V sync errors are becoming a significant problem in the digital television industry because of the use of large amounts of video signal processing in television production and broadcasting, and because of fixed-pixel, progressive television displays such as Plasma, LCD, and DLP sets.

Studies have shown that "When audio precedes video by five video fields (83 ms), viewers evaluate people on television more negatively (e.g., less interesting, more unpleasant, less influential, more agitated, less successful). Viewers can accurately tell when a television segment is in perfect sync, and when it is five fields out of sync." See the Reeves and Voelker reference under Further Info.

Furthermore, there is a larger tolerance for audio that is delayed in comparison to the video. This is a phenomenon that we are all used to when we watch a fireworks display or, to a larger degree, an electrical storm. We see the effect before we hear it. Of course, this is due to a totally different reason: the difference in velocity between light and sound waves. But if you've ever had to watch a program with significant A/V
sync error, you know how annoying it can be.

Good engineering practices specify that the audio should never lead the video by more than 15 milliseconds or lag by more than 45 milliseconds.
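Those tolerances are easy to encode as a check. A short Python sketch (our own illustration; the sign convention, positive when audio leads, is an assumption):

def av_sync_ok(audio_lead_ms):
    """True if lip sync is within good broadcast practice.

    audio_lead_ms > 0 means the audio arrives before the video;
    audio may lead by at most 15 ms and lag by at most 45 ms.
    """
    return -45.0 <= audio_lead_ms <= 15.0

print(av_sync_ok(10))    # -> True  (10 ms lead is acceptable)
print(av_sync_ok(-30))   # -> True  (30 ms lag is acceptable)
print(av_sync_ok(83))    # -> False (five fields of lead: very noticeable)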
To keep the audio and video signals in sync, Presentation Time Stamps (PTS) are added to the transport stream packets. This allows the MPEG decoder in the receiver to re-assemble the packets correctly and keep the audio and video (and captions, etc.) in sync.
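MPEG time stamps count ticks of a 90 kHz clock, so comparing the audio and video PTS values gives the current offset directly. A minimal sketch (our own illustration, with made-up tick values):

PTS_CLOCK_HZ = 90_000   # MPEG PTS values count ticks of a 90 kHz clock

def av_offset_ms(video_pts, audio_pts):
    """Positive result = audio is scheduled ahead of (leads) the video."""
    return (video_pts - audio_pts) / PTS_CLOCK_HZ * 1000

print(av_offset_ms(video_pts=180_000, audio_pts=179_100))  # -> 10.0 ms lead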
When the audio and video packets are multiplexed together, they can be sent up to one second apart. Fortunately, most of the other delays in the transport stream affect audio and video together. However, if you consider the delays encountered in encoding, buffering, multiplexing, transmission, demultiplexing, decoder buffering, decoding, and presentation, there can be over five seconds of delay between the broadcast input and your TV display. You can easily see this by switching between one of your local station's analog and digital channels.
Even if the receiver in an HDTV decodes a perfectly synchronized signal, there still can be a difference in the picture and sound when viewed. This is because TVs now have lots of computing power and use it to enhance HD, as well as SD, pictures. They have large video buffers and DSP (Digital Signal Processing) chips to perform resolution changes (mapping the incoming video resolution to the native resolution of the display device) and correction for progressive display of interlaced sources (de-interlacing and 3:2 pull-down removal). They can also perform image enhancement to reduce specific artifacts of the display (e.g., Sony's Digital Reality Creation).

Some of these processes add considerable delay, especially when they need to examine multiple video fields to perform their function. This can cause noticeable A/V sync errors. Some HDTVs now have user adjustments to compensate for this (see Figure 5). NV

FIGURE 5. Lip sync adjustment on an HDTV.

Glossary of Useful Terms

ATSC: Advanced Television System Committee — The organization and name of the digital television standard adopted in the US.

DTV: Digital Television.

DAT: Digital Audio Tape.

HDMI: High-Definition Multimedia Interface — A method of connecting components using a single cable that carries digital video signals along with multichannel digital audio.

HDTV: High Definition TeleVision — Part of the new Digital Television standards, those formats that have either 720 or 1080 lines of vertical resolution.

MPEG: Motion Picture Experts Group — Standard for transmitting compressed audio and video.

NTSC: National Television System Committee — The organization and name of the analog television standard currently used in the US.

Further Info

Digital Audio Compression Standard (AC-3, E-AC-3) Revision B
www.atsc.org/standards/a_52b.pdf

"Effects of Audio-Video Asynchrony on Viewer's Memory, Evaluation of Content and Detection Ability" by Reeves and Voelker
www.lipfix.com/file/doc/reeves_and_voelker_paper.pdf
