
NES APU Sound Hardware Reference
--------------------------------
This reference covers Nintendo Entertainment System (NES) sound hardware in as
much detail as I know. It is intended primarily to assist in the implementation
of emulators and might also be useful as a programmer reference.
Tables, diagrams, and formulas are formatted for a mono-spaced font, like
Courier.
The latest version is kept at http://www.slack.net/~ant/nes-emu/apu_ref.txt

-----
Intro
-----
This reference is based on the results of tests I have run on a 1988-model US
NTSC NES which contains the G revision of the 2A03 CPU/APU and the NES-CPU-07
version of the main board. PAL hardware will be covered once tests are
performed on it. Feel free to incorporate this information in references and
other documentation.
While implementing a NES sound emulator, even after reading the available
documentation I still had many unanswered questions, so I made a simple
development cartridge to test on a real NES. This was very successful and
revealed many new details. My notes consisted of differences from existing
documentation, but this didn't seem to be a very reliable way to release my
findings, so I decided to write a concise reference.
For those familiar with NESSOUND.TXT and DMC.TXT, the following differences
should be specifically noted:
Corrections:
- DMC table entry $D should be $2A0 instead of $2A8
- Frame sequencer
- Square's duty generator
Clarifications:
- DMC
- Triangle's linear counter
- Length Counter operation and status register behavior
It should go without saying that the model presented here probably doesn't
match the actual logic gate arrangement in the NES. It makes no difference how
the hardware is implemented, as long as its behavior matches what is described
here.
Corrections, questions and additions are welcome. I keep up with the forum at
http://nesdev.parodius.com/ and can be contacted via blargg at the mail.com
domain.

-----
To Do
-----
- See if comprehensive emulator test ROM is even practical.
- Probe complete power-up state and reset state.
- Test PAL hardware to determine APU frame rate, DMC and noise period tables.
- Check complete behavior of each unit of each channel to be sure common units
behave the same for all channels.
- Double-check details on NES hardware again.
- Determine post-DAC filtering done before output.

--------
Overview
--------
The APU is composed of five channels: square 1, square 2, triangle, noise, and
the delta modulation channel (DMC). Each has a variable-rate timer clocking a
waveform generator, and various modulators driven by low-frequency clocks from
a frame sequencer. The DMC plays samples while the other channels play
waveforms. The waveform channels have duration control, some have a volume
envelope unit, and a couple have a frequency sweep unit.
Square 1/Square 2
    $4000/4 ddle nnnn   duty, loop env/disable length, env disable, vol/env period
    $4001/5 eppp nsss   enable sweep, period, negative, shift
    $4002/6 pppp pppp   period low
    $4003/7 llll lppp   length index, period high
Triangle
    $4008   clll llll   control, linear counter load
    $400A   pppp pppp   period low
    $400B   llll lppp   length index, period high
Noise
    $400C   --le nnnn   loop env/disable length, env disable, vol/env period
    $400E   s--- pppp   short mode, period index
    $400F   llll l---   length index
DMC
    $4010   il-- ffff   IRQ enable, loop, frequency index
    $4011   -ddd dddd   DAC
    $4012   aaaa aaaa   sample address
    $4013   llll llll   sample length
Common
    $4015   ---d nt21   length ctr enable: DMC, noise, triangle, square 2, 1
    $4017   fd-- ----   5-frame cycle, disable frame interrupt
Status (read)
    $4015   if-d nt21   DMC IRQ, frame IRQ, length counter statuses

------
Basics
------
Hexadecimal values are prefixed by a $ except for some single-hex-digit
sequences where it's clear that they are hex. Bits are numbered from 0 to 7,
corresponding with the least to most significant bits of a byte; bit n has a
binary weight of 2^n.
A flag is a two-state variable that can be either set or clear. When
implemented in a bit, clear = 0 and set = 1.
A divider outputs a clock every n input clocks, where n is the divider's
period. It contains a counter which is decremented on the arrival of each
clock. When it reaches 0, it is reloaded with the period and an output clock is
generated. Resetting a divider reloads its counter without generating an output
clock. Changing a divider's period doesn't affect its current count.
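The divider described above can be sketched in a few lines of Python. This is
only an illustrative model: the class and method names are hypothetical, and it
assumes the counter starts out loaded with the period.

```python
class Divider:
    """Illustrative model of a divider (names are hypothetical)."""

    def __init__(self, period):
        self.period = period
        self.counter = period  # assumption: starts fully loaded

    def set_period(self, period):
        # Changing the period does not affect the current count.
        self.period = period

    def reset(self):
        # Reload the counter without generating an output clock.
        self.counter = self.period

    def clock(self):
        """Consume one input clock; return True if an output clock is generated."""
        self.counter -= 1
        if self.counter == 0:
            self.counter = self.period
            return True
        return False
```

With a period of 3, every third call to clock() returns True.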
A sequencer generates a series of values or events based on the repetition of a
series of steps, starting with the first. When clocked the next step of the
sequence is generated.
In the block diagrams, the triangular symbol is a control gate; if control is
non-zero, the input is passed unchanged to the output, otherwise the output is
0.
     control
        |
        v
       |\
in  -->| >-- out
       |/

Except for the status register, all other registers are write-only. The "value
of the register" refers to the last value written to the register.
The NTSC NES has a master clock based on a 21.47727 MHz crystal which is
divided by 12 to obtain a ~1.79 MHz clock source. Both clocks are used by the
APU.
The CPU's IRQ line is level-sensitive, so the APU's interrupt flags must be
cleared once a CPU IRQ is acknowledged, otherwise the CPU will immediately be
interrupted again once its inhibit flag is cleared.
In general, the APU is a collection of many independent units which are always
running in parallel. Modification of a channel's parameter usually affects only
one sub-unit and doesn't take effect until that unit's next internal cycle
begins.
Each section begins with an overview and an optional block diagram, which
provide a framework for the information that follows. In order to reduce
ambiguity, there is very little re-statement of information.

---------------
Frame Sequencer
---------------
The frame sequencer contains a divider and a sequencer which clocks various
units.
The divider generates an output clock rate of just under 240 Hz, and appears to
be derived by dividing the 21.47727 MHz system clock by 89490. The sequencer is
clocked by the divider's output.
On a write to $4017, the divider and sequencer are reset, then the sequencer is
configured. Two sequences are available, and frame IRQ generation can be
disabled.
mi-- ---- mode, IRQ disable
If the mode flag is clear, the 4-step sequence is selected, otherwise the
5-step sequence is selected and the sequencer is immediately clocked once.
f = set interrupt flag
l = clock length counters and sweep units
e = clock envelopes and triangle's linear counter
mode 0: 4-step  effective rate (approx)
---------------------------------------
    - - - f      60 Hz
    - l - l     120 Hz
    e e e e     240 Hz

mode 1: 5-step  effective rate (approx)
---------------------------------------
    - - - - -   (interrupt flag never set)
    l - l - -    96 Hz
    e e e e -   192 Hz
At any time if the interrupt flag is set and the IRQ disable is clear, the
CPU's IRQ line is asserted.
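As a rough sketch of the two sequences, the following Python model returns the
events of each step as a string of the letters used above. The class and
constant names are hypothetical, and the divider itself is not modeled; clock()
stands for one divider output.

```python
MASTER_CLOCK_HZ = 21_477_270   # 21.47727 MHz NTSC crystal
FRAME_DIVIDER = 89_490         # divider period given in the text

# Events per step: 'e' = envelopes and linear counter,
# 'l' = length counters and sweep units, 'f' = set interrupt flag.
SEQUENCES = {
    0: ["e", "le", "e", "fle"],      # mode 0: 4-step
    1: ["le", "e", "le", "e", ""],   # mode 1: 5-step
}


class FrameSequencer:
    def __init__(self):
        self.mode = 0
        self.irq_disable = False
        self.step = 0

    def write_4017(self, value):
        """Reset and configure; return the events of the immediate clock, if any."""
        self.mode = (value >> 7) & 1
        self.irq_disable = bool(value & 0x40)
        self.step = 0
        # In 5-step mode the sequencer is clocked once right away.
        return self.clock() if self.mode == 1 else ""

    def clock(self):
        """Advance one step and return that step's events."""
        steps = SEQUENCES[self.mode]
        events = steps[self.step]
        self.step = (self.step + 1) % len(steps)
        return events
```

MASTER_CLOCK_HZ / FRAME_DIVIDER evaluates to just under 240, matching the rate
quoted above.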

--------------
Length Counter
--------------
A length counter allows automatic duration control. Counting can be halted and
the counter can be disabled by clearing the appropriate bit in the status
register, which immediately sets the counter to 0 and keeps it there.
The halt flag is in the channel's first register. For the square and noise
channels, it is bit 5, and for the triangle, bit 7:
--h- ---- halt (noise and square channels)
h--- ---- halt (triangle channel)
Note that the bit position for the halt flag is also mapped to another flag in
the Envelope Generator (noise and square) or Linear Counter (triangle).
Unless disabled, a write to the channel's fourth register immediately reloads the
counter with the value from a lookup table, based on the index formed by the
upper 5 bits:
iiii i--- length index
bits     bit 3
7-4     0    1
---------------
 0     $0A  $FE
 1     $14  $02
 2     $28  $04
 3     $50  $06
 4     $A0  $08
 5     $3C  $0A
 6     $0E  $0C
 7     $1A  $0E
 8     $0C  $10
 9     $18  $12
 A     $30  $14
 B     $60  $16
 C     $C0  $18
 D     $48  $1A
 E     $10  $1C
 F     $20  $1E
See the clarifications section for a possible explanation of the values in the
left column of the table.
When clocked by the frame sequencer, if the halt flag is clear and the counter
is non-zero, it is decremented.
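A minimal Python sketch of the length counter follows. The table is the one
above, flattened so that the upper 5 bits of the written value index it
directly; the class and method names are hypothetical.

```python
# Indexed by (value >> 3): bits 7-4 select the row, bit 3 the column.
LENGTH_TABLE = [
    0x0A, 0xFE, 0x14, 0x02, 0x28, 0x04, 0x50, 0x06,
    0xA0, 0x08, 0x3C, 0x0A, 0x0E, 0x0C, 0x1A, 0x0E,
    0x0C, 0x10, 0x18, 0x12, 0x30, 0x14, 0x60, 0x16,
    0xC0, 0x18, 0x48, 0x1A, 0x10, 0x1C, 0x20, 0x1E,
]


class LengthCounter:
    def __init__(self):
        self.enabled = False
        self.halt = False
        self.count = 0

    def set_enabled(self, enabled):
        # Driven by the channel's bit in a $4015 write.
        self.enabled = enabled
        if not enabled:
            self.count = 0

    def write_fourth_register(self, value):
        # Ignored while the counter is disabled.
        if self.enabled:
            self.count = LENGTH_TABLE[(value >> 3) & 0x1F]

    def clock(self):
        # Called on the frame sequencer's 'l' events.
        if not self.halt and self.count > 0:
            self.count -= 1
```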

---------------
Status Register
---------------
The status register at $4015 allows control and query of the channels' length
counters, and query of the DMC and frame interrupts. It is the only register
which can also be read.
When $4015 is read, the status of the channels' length counters and bytes
remaining in the current DMC sample, and interrupt flags are returned.
Afterwards the Frame Sequencer's frame interrupt flag is cleared.
if-d nt21
i: IRQ from DMC
f: frame interrupt
d: DMC sample bytes remaining > 0
n: noise length counter > 0
t: triangle length counter > 0
2: square 2 length counter > 0
1: square 1 length counter > 0
When $4015 is written to, the channels' length counter enable flags are set,
the DMC is possibly started or stopped, and the DMC's IRQ occurred flag is
cleared.
---d nt21 DMC, noise, triangle, square 2, square 1
If d is set and the DMC's DMA reader has no more sample bytes to fetch, the DMC
sample is restarted. If d is clear then the DMA reader's sample bytes remaining
is set to 0.
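For illustration, the value returned by a read of $4015 could be assembled as
in the Python sketch below. The function and parameter names are hypothetical,
and the side effect of the read (clearing the frame interrupt flag afterwards)
is left to the caller.

```python
def read_status(dmc_irq, frame_irq, dmc_bytes_remaining,
                noise_length, triangle_length, square2_length, square1_length):
    """Build the if-d nt21 status byte from the current unit states."""
    status = 0
    if dmc_irq:
        status |= 0x80  # i
    if frame_irq:
        status |= 0x40  # f
    if dmc_bytes_remaining > 0:
        status |= 0x10  # d
    if noise_length > 0:
        status |= 0x08  # n
    if triangle_length > 0:
        status |= 0x04  # t
    if square2_length > 0:
        status |= 0x02  # 2
    if square1_length > 0:
        status |= 0x01  # 1
    return status
```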

------------------
Envelope Generator
------------------
An envelope generator can generate a constant volume or a saw envelope with
optional looping. It contains a divider and a counter.
A channel's first register controls the envelope:
--ld nnnn loop, disable, n
Note that the bit position for the loop flag is also mapped to a flag in the
Length Counter.
The divider's period is set to n + 1.
When clocked by the frame sequencer, one of two actions occurs: if there was a
write to the fourth channel register since the last clock, the counter is set
to 15 and the divider is reset, otherwise the divider is clocked.
When the divider outputs a clock, one of two actions occurs: if loop is set and
counter is zero, it is set to 15, otherwise if counter is non-zero, it is
decremented.
When disable is set, the channel's volume is n, otherwise it is the value in
the counter. Unless overridden by some other condition, the channel's DAC
receives the channel's volume value.
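The envelope behavior above can be sketched as follows. This is a simplified
Python model with hypothetical names; the divider is kept as a plain count
whose period is n + 1, and it is assumed to start at 1.

```python
class Envelope:
    def __init__(self):
        self.loop = False
        self.disable = False
        self.n = 0
        self.counter = 0
        self.divider = 1      # assumption: initial divider count
        self.restart = False  # set by a write to the fourth register

    def write_first_register(self, value):
        self.loop = bool(value & 0x20)
        self.disable = bool(value & 0x10)
        self.n = value & 0x0F

    def write_fourth_register(self):
        self.restart = True

    def clock(self):
        """Called on the frame sequencer's 'e' events."""
        if self.restart:
            self.restart = False
            self.counter = 15
            self.divider = self.n + 1   # reset the divider
            return
        self.divider -= 1
        if self.divider == 0:           # divider outputs a clock
            self.divider = self.n + 1
            if self.loop and self.counter == 0:
                self.counter = 15
            elif self.counter > 0:
                self.counter -= 1

    def volume(self):
        return self.n if self.disable else self.counter
```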

-----
Timer
-----
All channels use a timer which is a divider driven by the ~1.79 MHz clock.
The noise channel and DMC use lookup tables to set the timer's period. For the
square and triangle channels, the third and fourth registers form an 11-bit
value and the divider's period is set to this value *plus one*.
llll llll low 8 bits of period (third register)
---- -hhh upper 3 bits of period (fourth register)
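As a small worked example, the divider period for the square and triangle
channels can be computed from the two register values like this (the function
name is hypothetical):

```python
def timer_period(third_register, fourth_register):
    """Return the divider period: the 11-bit register value plus one."""
    raw = ((fourth_register & 0x07) << 8) | (third_register & 0xFF)
    return raw + 1
```

The upper 5 bits of the fourth register (the length index) are ignored here.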

----------
Sweep Unit
----------
The sweep unit can adjust a square channel's period periodically. It contains a
divider and a shifter.
A channel's second register configures the sweep unit:
eppp nsss enable, period, negate, shift
The divider's period is set to p + 1.
The shifter continuously calculates a result based on the channel's period. The
channel's period (from the third and fourth registers) is first shifted right
by s bits. If negate is set, the shifted value's bits are inverted, and on the
second square channel, the inverted value is incremented by 1. The resulting
value is added with the channel's current period, yielding the final result.
When the sweep unit is clocked, the divider is *first* clocked and then if
there was a write to the sweep register since the last sweep clock, the divider
is reset.
When the channel's period is less than 8 or the result of the shifter is
greater than $7FF, the channel's DAC receives 0 and the sweep unit doesn't
change the channel's period. Otherwise, if the sweep unit is enabled and the
shift count is greater than 0, when the divider outputs a clock, the channel's
period in the third and fourth registers is updated with the result of the
shifter.
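The shifter calculation and the silencing condition can be sketched in Python
as below. The function names are hypothetical, and the sketch assumes ordinary
signed integers, so inverting the shifted value behaves like subtracting it and
then subtracting one more.

```python
def sweep_result(period, shift, negate, channel):
    """Return the shifter's result for an 11-bit period.

    channel is 1 or 2 (square 1 or square 2).
    """
    delta = period >> shift
    if negate:
        delta = ~delta        # invert bits; equals -delta - 1
        if channel == 2:
            delta += 1        # square 2 increments the inverted value
    return period + delta


def sweep_silences(period, result):
    """True when the DAC receives 0 and the period is left unchanged."""
    return period < 8 or result > 0x7FF
```

For a period of $100 and a shift of 1 with negate set, square 1 yields $7F
while square 2 yields $80.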

--------------
Square Channel
--------------
               +---------+    +---------+
               |  Sweep  |--->|Timer / 2|
               +---------+    +---------+
                    |              |
                    |              v
                    |         +---------+    +---------+
                    |         |Sequencer|    | Length  |
                    |         +---------+    +---------+
                    |              |              |
                    v              v              v
+---------+        |\             |\             |\          +---------+
|Envelope |------->| >----------->| >----------->| >-------->|   DAC   |
+---------+        |/             |/             |/          +---------+
There are two square channels beginning at registers $4000 and $4004. Each
contains the following: Envelope Generator, Sweep Unit, Timer with
divide-by-two on the output, 8-step sequencer, Length Counter.
$4000/$4004: duty, envelope
$4001/$4005: sweep unit
$4002/$4006: period low
$4003/$4007: reload length counter, period high
In addition to the envelope, the first register controls the duty cycle of the
square wave, without resetting the position of the sequencer:
dd-- ---- duty cycle select
d   waveform sequence
---------------------
     _       1
0   - ------ 0 (12.5%)

     __      1
1   -  ----- 0 (25%)

     ____    1
2   -    --- 0 (50%)

    _  _____ 1
3    --      0 (25% negated)

When the fourth register is written to, the sequencer is restarted.
The sequencer is clocked by the divided timer output.
When the sequencer output is low, the DAC receives 0.
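Read off the diagram above, the four duty sequences and the chain of gates in
the block diagram might be modeled like this in Python. The names are
hypothetical, and sweep_silenced stands for the sweep unit's silencing
condition.

```python
# One entry per sequencer step; 1 = high, 0 = low.
DUTY_SEQUENCES = [
    [0, 1, 0, 0, 0, 0, 0, 0],  # 12.5%
    [0, 1, 1, 0, 0, 0, 0, 0],  # 25%
    [0, 1, 1, 1, 1, 0, 0, 0],  # 50%
    [1, 0, 0, 1, 1, 1, 1, 1],  # 25% negated
]


def square_dac_input(duty, step, volume, length_count, sweep_silenced):
    """Return the value fed to the square channel's DAC."""
    if sweep_silenced:
        return 0
    if DUTY_SEQUENCES[duty][step & 7] == 0:
        return 0
    if length_count == 0:
        return 0
    return volume
```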

--------------
Linear Counter
--------------
The Linear Counter serves as a second more-accurate duration counter for the
triangle channel. It contains a counter and an internal halt flag.
Register $4008 contains a control flag and reload value:
crrr rrrr control flag, reload value
Note that the bit position for the control flag is also mapped to a flag in the
Length Counter.
When register $400B is written to, the halt flag is set.
When clocked by the frame sequencer, the following actions occur in order:
1) If halt flag is set, set counter to reload value, otherwise if counter
is non-zero, decrement it.
2) If control flag is clear, clear halt flag.
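The two ordered steps above translate fairly directly into code. The following
Python sketch uses hypothetical names and models only the linear counter, not
the shared length counter flag.

```python
class LinearCounter:
    def __init__(self):
        self.control = False
        self.reload_value = 0
        self.halt = False
        self.counter = 0

    def write_4008(self, value):
        self.control = bool(value & 0x80)
        self.reload_value = value & 0x7F

    def write_400b(self):
        self.halt = True

    def clock(self):
        """Called on the frame sequencer's 'e' events."""
        # 1) Reload if halted, otherwise count down to zero.
        if self.halt:
            self.counter = self.reload_value
        elif self.counter > 0:
            self.counter -= 1
        # 2) Halt is only cleared while the control flag is clear.
        if not self.control:
            self.halt = False
```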

----------------
Triangle Channel
----------------
               +---------+    +---------+
               |LinearCtr|    | Length  |
               +---------+    +---------+
                    |              |
                    v              v
+---------+        |\             |\         +---------+    +---------+
|  Timer  |------->| >----------->| >------->|Sequencer|--->|   DAC   |
+---------+        |/             |/         +---------+    +---------+
The triangle channel contains the following: Timer, 32-step sequencer, Length
Counter, Linear Counter, 4-bit DAC.
$4008: length counter disable, linear counter
$400A: period low
$400B: length counter reload, period high
When the timer generates a clock and the Length Counter and Linear Counter both
have a non-zero count, the sequencer is clocked.
The sequencer feeds the following repeating 32-step sequence to the DAC:
F E D C B A 9 8 7 6 5 4 3 2 1 0 0 1 2 3 4 5 6 7 8 9 A B C D E F
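A short Python sketch of the sequence and its gating, with hypothetical names,
might look like this:

```python
# 15 down to 0, then 0 up to 15.
TRIANGLE_SEQUENCE = list(range(15, -1, -1)) + list(range(16))


def clock_triangle(step, length_count, linear_count):
    """Advance the sequencer on a timer clock only if both counters are non-zero."""
    if length_count > 0 and linear_count > 0:
        step = (step + 1) % 32
    return step
```

When either counter is zero the step is simply held, so the DAC keeps receiving
TRIANGLE_SEQUENCE[step].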
At the lowest two periods ($400B = 0 and $400A = 0 or 1), the resulting
frequency is so high that the DAC effectively outputs a value half way between
7 and 8.

-------------
Noise Channel
-------------
+---------+    +---------+    +---------+
|  Timer  |--->| Random  |    | Length  |
+---------+    +---------+    +---------+
                    |              |
                    v              v
+---------+        |\             |\         +---------+
|Envelope |------->| >----------->| >------->|   DAC   |
+---------+        |/             |/         +---------+
The noise channel starts at register $400C and contains the following: Length
Counter, Envelope Generator, Timer, 15-bit right shift register with feedback,
4-bit DAC.
$400C: envelope
$400E: mode, period
$400F: reload length counter
Register $400E sets the random generator mode and timer period based on a 4-bit
index into a period table:
m--- iiii mode, period index
i timer period
----------------
0 $004
1 $008
2 $010
3 $020
4 $040
5 $060
6 $080
7 $0A0
8 $0CA
9 $0FE
A $17C
B $1FC
C $2FA
D $3F8
E $7F2
F $FE4
The shift register is clocked by the timer and the vacated bit 14 is filled
with the exclusive-OR of *pre-shifted* bits 0 and 1 (mode = 0) or bits 0 and 6
(mode = 1), resulting in 32767-bit and 93-bit sequences, respectively.
When bit 0 of the shift register is set, the DAC receives 0.
On power-up, the shift register is loaded with the value 1.
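The shift register update can be sketched in Python as follows. The function
names are hypothetical; sequence_length() is only a helper for checking how
many clocks it takes for a given starting value to recur.

```python
def clock_shift_register(state, mode):
    """Clock the 15-bit shift register once and return the new state."""
    tap = 6 if mode else 1
    # XOR of pre-shifted bit 0 with bit 1 (mode 0) or bit 6 (mode 1).
    feedback = (state ^ (state >> tap)) & 1
    return (state >> 1) | (feedback << 14)


def noise_dac_input(state, volume, length_count):
    """Return the value fed to the noise channel's DAC."""
    if state & 1 or length_count == 0:
        return 0
    return volume


def sequence_length(seed, mode):
    """Count clocks until the register returns to the seed value."""
    state = clock_shift_register(seed, mode)
    count = 1
    while state != seed:
        state = clock_shift_register(state, mode)
        count += 1
    return count
```

Starting from the power-up value 1 in mode 0, sequence_length() returns 32767.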

------------------------------
Delta Modulation Channel (DMC)
------------------------------
+----------+    +---------+
|DMA Reader|    |  Timer  |
+----------+    +---------+
     |               |
     |               v
+----------+    +---------+     +---------+     +---------+
|  Buffer  |----| Output  |---->| Counter |---->|   DAC   |
+----------+    +---------+     +---------+     +---------+
The DMC can output samples composed of 1-bit deltas and its DAC can be directly
changed. It contains the following: DMA reader, interrupt flag, sample buffer,
Timer, output unit, 7-bit counter tied to 7-bit DAC.
$4010: mode, frequency
$4011: DAC
$4012: address
$4013: length
On power-up, the DAC counter contains 0.
Register $4010 sets the interrupt enable, loop, and timer period. If the new
interrupt enabled status is clear, the interrupt flag is cleared.
il-- ffff interrupt enabled, loop, frequency index
f period
----------
0 $1AC
1 $17C
2 $154
3 $140
4 $11E
5 $0FE
6 $0E2
7 $0D6
8 $0BE
9 $0A0
A $08E
B $080
C $06A
D $054
E $048
F $036
A write to $4011 sets the counter and DAC to a new value:
-ddd dddd new DAC value
Sample Buffer
-------------
The sample buffer either holds a single sample byte or is empty. It is filled
by the DMA reader and can only be emptied by the output unit, so once loaded
with a sample it will be eventually output.
DMA Reader
----------
The DMA reader fills the sample buffer with successive bytes from the current
sample, whenever it becomes empty. It has an address counter and a bytes
remaining counter (the "bytes counter" below).
When the DMC sample is restarted, the address counter is set to register $4012
* $40 + $C000 and the bytes counter is set to register $4013 * $10 + 1.
When the sample buffer is in an empty state and the bytes counter is non-zero,
the following occur: The sample buffer is filled with the next sample byte read
from memory at the current address, subject to whatever mapping hardware is
present (the same as CPU memory accesses). The address is incremented; if it
exceeds $FFFF, it is wrapped around to $8000. The bytes counter is decremented;
if it becomes zero and the loop flag is set, the sample is restarted (see
above), otherwise if the bytes counter becomes zero and the interrupt enabled
flag is set, the interrupt flag is set.
When the DMA reader accesses a byte of memory, the CPU is suspended for 4 clock
cycles.
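As a worked example of the address and length arithmetic, here is a small
Python sketch with hypothetical function names:

```python
def sample_start_address(reg_4012):
    """Address counter value when the sample is restarted."""
    return 0xC000 + (reg_4012 & 0xFF) * 0x40


def sample_byte_count(reg_4013):
    """Bytes counter value when the sample is restarted."""
    return (reg_4013 & 0xFF) * 0x10 + 1


def next_sample_address(address):
    """Increment the address counter, wrapping past $FFFF to $8000."""
    address += 1
    return 0x8000 if address > 0xFFFF else address
```

So samples can start anywhere from $C000 to $FFC0 in steps of $40 bytes, and
are between 1 and $FF1 bytes long.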
Output Unit
-----------
The output unit continually outputs complete sample bytes or silences of equal
duration. It contains an 8-bit right shift register, a counter, and a silence
flag.
When an output cycle is started, the counter is loaded with 8 and if the sample
buffer is empty, the silence flag is set, otherwise the silence flag is cleared
and the sample buffer is emptied into the shift register.
On the arrival of a clock from the timer, the following actions occur in order:
1) If the silence flag is clear, bit 0 of the shift register is applied to
the DAC counter: if bit 0 is clear and the counter is greater than 1, the
counter is decremented by 2, otherwise if bit 0 is set and the counter is less
than 126, the counter is incremented by 2.
2) The shift register is clocked.
3) The counter is decremented. If it becomes zero, a new cycle is started.
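The delta step applied to the DAC counter can be sketched as below. The names
are hypothetical, and play_byte() simply runs one non-silent output cycle of 8
timer clocks over a single sample byte.

```python
def apply_delta(dac_counter, bit):
    """Apply one delta bit to the 7-bit DAC counter."""
    if bit == 0:
        if dac_counter > 1:
            dac_counter -= 2
    elif dac_counter < 126:
        dac_counter += 2
    return dac_counter


def play_byte(dac_counter, sample_byte):
    """Run one output cycle, least significant bit first."""
    shift_register = sample_byte & 0xFF
    for _ in range(8):
        dac_counter = apply_delta(dac_counter, shift_register & 1)
        shift_register >>= 1   # the shift register is clocked
    return dac_counter
```

Note that the counter is left untouched rather than clamped when a step would
take it out of range.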

----------
DAC Output
----------
The DACs for each channel are implemented in a way that causes non-linearity
and interaction between channels, so calculation of the resulting amplitude is
somewhat involved.
The normalized audio output level is the sum of two groups of channels:
output = square_out + tnd_out

                      95.88
square_out = -----------------------
                    8128
             ----------------- + 100
             square1 + square2

                      159.79
tnd_out = ------------------------------
                      1
          ------------------------ + 100
          triangle   noise    dmc
          -------- + ----- + -----
            8227     12241   22638

where triangle, noise, dmc, square1 and square2 are the values fed to their
DACs. The dmc ranges from 0 to 127 and the others range from 0 to 15. When the
sub-denominator of a group is zero, its output is 0. The output ranges from 0.0
to 1.0.
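The two formulas can be written directly in Python as a floating-point sketch.
The function names are hypothetical; the zero checks implement the rule that a
group outputs 0 when its sub-denominator is zero.

```python
def square_out(square1, square2):
    total = square1 + square2
    if total == 0:
        return 0.0
    return 95.88 / (8128.0 / total + 100.0)


def tnd_out(triangle, noise, dmc):
    total = triangle / 8227.0 + noise / 12241.0 + dmc / 22638.0
    if total == 0:
        return 0.0
    return 159.79 / (1.0 / total + 100.0)


def mix(square1, square2, triangle, noise, dmc):
    """Normalized output level for the given DAC inputs."""
    return square_out(square1, square2) + tnd_out(triangle, noise, dmc)
```

With every channel at its maximum (15 for the waveform channels, 127 for the
DMC), mix() comes out very close to 1.0.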

Implementation Using Lookup Table
---------------------------------
The formulas can be efficiently implemented using two lookup tables: a 31-entry
table for the two square channels and a 203-entry table for the remaining
channels (due to the approximation of tnd_out, the numerators are adjusted
slightly to preserve the normalized output range).
square_table [n] = 95.52 / (8128.0 / n + 100)
square_out = square_table [square1 + square2]
The latter table is approximated (within 4%) by using a base unit close to the
DMC's DAC.
tnd_table [n] = 163.67 / (24329.0 / n + 100)
tnd_out = tnd_table [3 * triangle + 2 * noise + dmc]
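A minimal Python sketch of building and using the two tables follows. The
names are hypothetical; entry 0 of each table is set to 0 explicitly, since the
formula divides by n.

```python
# 31 entries: square1 + square2 ranges from 0 to 30.
SQUARE_TABLE = [0.0] + [95.52 / (8128.0 / n + 100.0) for n in range(1, 31)]

# 203 entries: 3 * triangle + 2 * noise + dmc ranges from 0 to 202.
TND_TABLE = [0.0] + [163.67 / (24329.0 / n + 100.0) for n in range(1, 203)]


def mix_lookup(square1, square2, triangle, noise, dmc):
    """Approximate normalized output level using the lookup tables."""
    return (SQUARE_TABLE[square1 + square2]
            + TND_TABLE[3 * triangle + 2 * noise + dmc])
```

In an emulator the tables would normally be computed once at start-up and
scaled to the output sample format.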

Linear Approximation
--------------------
A linear approximation can also be used, which results in slightly louder DMC
samples, but otherwise fairly accurate operation since the wave channels use a
small portion of the transfer curve. The overall volume will be reduced due to
the headroom required by the DMC approximation.
square_out = 0.00752 * (square1 + square2)
tnd_out = 0.00851 * triangle + 0.00494 * noise + 0.00335 * dmc
This linear approximation neglects the attenuating effect the DMC has when its
DAC is in the upper level. This factor can be calculated using the main formula
to form a ratio, and precalculated into a 128-entry lookup table.
                 tnd_out(triangle=15,dmc=d) - tnd_out(triangle=0,dmc=d)
attenuation(d) = ------------------------------------------------------
                              tnd_out(triangle=15,dmc=0)

-------------------
Unreliable Behavior
-------------------
(The following behaviors probably don't need to be emulated due to their
unreliability, since stable code will avoid invoking them and since their
behavior is somewhat difficult to predict precisely.)
If the frame IRQ is set just as register $4015 is being read, it seems to be
ignored (similar to polling $2002 for the vbl flag).
The DMC's DMA reader seems to check for an empty buffer every few CPU cycles,
rather than every cycle or continuously.
Writing to the DAC register ($4011) while a sample is playing sometimes has no
effect, probably because the DMC's output unit is clocking the counter at the
same moment as the write.

--------------
Clarifications
--------------
(The following are meant only as re-statements of the main content, rather than
additions of new content.)
Because the envelope loop and length counter disable flags are mapped to the
same bit, the length counter can't be used while the envelope is in loop mode.
The same applies to the triangle channel, where the linear counter and length
counter are both controlled by the same bit in register $4008.
Unlike the other waveform channels, the triangle channel is silenced by
stopping its waveform at whatever phase it's at, rather than causing zero to be
sent to its DAC.
The length counter table seems to be set up for standard note durations for 4/4
time at 180 bpm and 150 bpm (a beat of 40 or 48 length counter clocks at
roughly 120 Hz). If bit 3 is 0, the following results (Dn is bit n of the
fourth channel register):
        180bpm  150bpm
D6-D4   D7=0    D7=1    note
--------------------------------------------
$00      10      12     16th
$01      20      24     8th
$02      40      48     4th (one beat)
$03      80      96     half
$04     160     192     whole
$05      60      72     4th dotted
$06      14      16     8th triplet (*3 = a 4th)
$07      26      32     4th triplet (*3 = a half)

-------------
Collaborators
-------------
Brad Taylor's NESSOUND.TXT and DMC.TXT as a starting point for testing.
NTSC NES for testing on.
Nesdev forum for feedback.
xodnizel for testing results, correction to DMC table, feedback.
Bloopaws/Draci for feedback, possible explanation of length counter table
values.

-------
History
-------
2003.12.01
Made development cartridge and started testing on NES hardware.
2003.12.14
Started project.
2003.12.20
A few draft sections were posted to Nesdev or e-mailed privately.
2004.01.02
Draft version posted to Nesdev. Corrected tnd_table formula.
2004.01.02
Corrected incorrect "correction" to tnd_table formula. Double-checked them.
2004.01.03
Corrected envelope flag name to "disable" (it was named "enable").
Added effective frequencies of frame sequencer outputs.
Added Overview, Unreliable Behavior, Clarifications, and Collaborators
sections.
2004.01.04
Adjusted linear approximation (difficult to find a compromise).
A few minor edits.
2004.01.30
First release. Probably won't be doing much with it for a while.
