
Lecture 2

High-Speed I/O

Mark Horowitz
Computer Systems Laboratory
Stanford University
horowitz@stanford.edu
Copyright © 2007 by Mark Horowitz, with material from
Stefanos Sidiropoulos, and Vladimir Stojanovic

M Horowitz EE371 Lecture 2 1


Readings

• Readings
– Techniques for High-speed Implementation of Nonlinear
Cancellation, Sanjay Kasturia and Jack H. Winters

• Overview:
– Your project will be the design of a circuit that processes the
input data from a high-speed I/O. This processing is
generally done in a mixed signal manner today, but your job
will be to build a digital implementation of the algorithm. This
lecture will try to give you some background about why I/O
rates are important, and what issues need to be resolved to
achieve high performance. The next lecture will discuss the
operation of the circuit you need to build.



Computers Today

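[Figure: block diagram of the links in a computer. Blocks: CPU, Memory, Memory Controller, Graphics System, Display Controller, I/O Controller, Storage, Network. Link labels: FSB, HT; DDR, RDRAM, FBDIMM; AGP, PCI-E; DVI, HDMI; PCI-X, PCI-E; PCI*, *ATA, USB. Bandwidth labels: >1GB/s, >4GB/s, >4GB/s, >0.1GB/s.]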


Speed of Light:
The Difference Between I/O and On-Chip Wires
• First question:
– Why is I/O different from on-chip wires?
• Both send signals to each other
• Gates send data to each other all the time
– Don’t generally worry about signals, or delay
– Model the connection between gates as a capacitor
• Sometimes a capacitor/resistor network
• Answer:
– On-chip, ignore the speed of light, assume “c” infinite
• For external wires can’t make that assumption
– Wire connecting the pins is not an equipotential
– References are different



Finite Speed of Light Ramifications

• Signals must have delay in reaching destination


– Td = L/ν, bits arrive at a different time than when sent
– Thus must determine ‘right’ time to sample them
• Wires store energy
– Current is set by the geometry of wire (what else?)
• Signal can’t see termination resistor (causality)
• V/I for the line is called the impedance, Z < 300 Ω
– When signal is traveling on the wire
• Power goes into the wire before it hits load
• Since energy is conserved, wire must be storing energy
• Signal is ALWAYS a pair of currents



Link Issues

• Signaling: getting the bit to the receiver

[Figure: transmitter (Tx) driving a channel into a receiver (Rx), with a termination resistor RTERM at each end]

• Timing: Determining which bit is which

[Figure: bit stream 1 0 0 1 0 1 0 with the tbit/2 point marked]

Transmission Lines

Figure from John Poulton


• Wire where you notice ‘c’ is finite
– Current flows in one terminal
– And flows out the other

• Energy is stored in E and B fields


– But can model with L, C
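
A minimal sketch of the L, C model, assuming a lossless line. With inductance and capacitance per unit length, the standard results are Z = sqrt(L/C) and ν = 1/sqrt(LC), and the wire delay is Td = length/ν. The per-meter values and the 0.5 m length below are hypothetical, picked to give a 50 Ω line; they are not from the lecture.

```python
import math

def line_params(l_per_m, c_per_m):
    """Return (impedance in ohms, velocity in m/s) of a lossless line."""
    z0 = math.sqrt(l_per_m / c_per_m)
    velocity = 1.0 / math.sqrt(l_per_m * c_per_m)
    return z0, velocity

# Hypothetical per-unit-length values (assumption, not lecture data).
L_PER_M = 350e-9   # henries per meter
C_PER_M = 140e-12  # farads per meter
LENGTH_M = 0.5     # hypothetical trace length

z0, velocity = line_params(L_PER_M, C_PER_M)
td = LENGTH_M / velocity  # Td = L/v from the earlier slide

print(f"Z = {z0:.1f} ohm, v = {velocity:.3e} m/s, Td = {td * 1e9:.2f} ns")
```

With these numbers the signal travels at a bit under half the free-space speed of light, so half a meter of trace costs several nanoseconds, which is many bit times at multi-Gb/s rates.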



Problems : Material Loss

[Figure: loss in GETEK for a 1 m, 8 mil microstrip trace; transfer function H(s) vs. frequency]

• PCB Loss : skin & dielectric loss
– Skin Loss ∝ √f
– Dielectric loss ∝ f : a bigger issue at higher f
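
A rough sketch of how the two loss terms trade off, assuming attenuation in dB can be written as k_skin·√f + k_diel·f. The coefficients are hypothetical placeholders, not GETEK data; only the √f and f scaling comes from the slide.

```python
import math

def skin_loss_db(f_hz, k_skin):
    """Skin-effect loss, proportional to sqrt(f)."""
    return k_skin * math.sqrt(f_hz)

def dielectric_loss_db(f_hz, k_diel):
    """Dielectric loss, proportional to f."""
    return k_diel * f_hz

def crossover_hz(k_skin, k_diel):
    """Frequency above which dielectric loss exceeds skin loss."""
    return (k_skin / k_diel) ** 2

# Hypothetical coefficients (assumptions for illustration only).
K_SKIN = 1e-4  # dB per sqrt(Hz)
K_DIEL = 5e-9  # dB per Hz

f_cross = crossover_hz(K_SKIN, K_DIEL)
for f in (1e8, f_cross, 1e9, 5e9):
    print(f"{f / 1e9:5.2f} GHz: skin {skin_loss_db(f, K_SKIN):5.2f} dB, "
          f"dielectric {dielectric_loss_db(f, K_DIEL):5.2f} dB")
```

Because the dielectric term grows linearly while the skin term grows only as the square root, the dielectric term always wins above the crossover frequency, whatever the actual coefficients are.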

Dealing With Current Return/References

• Wire Utilization:

– Single ended: shared signal return path

– Differential: explicit signal return path

– “Pseudo” differential: receiver compares the signal (+) against a shared ref (-)



Transmission Lines

Reflected: $\Gamma = \dfrac{Z_2 - Z_1}{Z_1 + Z_2}$        Transmitted: $T = \dfrac{2 Z_2}{Z_1 + Z_2}$

[Figure: line of impedance Z1 joined to a line of impedance Z2]

Two constraints govern behavior at any junction:


• Voltages are equal
– They are electrically connected
• Power is conserved
– Energy flowing into the junction equals the transmitted plus the reflected energy
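
The two constraints can be checked numerically. This is a small sketch using the reflected and transmitted voltage fractions from the slide; the 50 Ω and 75 Ω values are hypothetical examples.

```python
def junction(z1, z2):
    """Voltage fractions reflected and transmitted where a Z1 line meets Z2."""
    reflected = (z2 - z1) / (z1 + z2)
    transmitted = 2 * z2 / (z1 + z2)
    return reflected, transmitted

Z1, Z2 = 50.0, 75.0  # hypothetical impedances in ohms
reflected, transmitted = junction(Z1, Z2)

# Constraint 1: voltages on both sides of the junction are equal.
# Left side carries incident (1.0) plus reflected; right side carries transmitted.
voltage_left = 1.0 + reflected
voltage_right = transmitted

# Constraint 2: incident power equals reflected plus transmitted power (P = V^2 / Z).
power_in = 1.0 / Z1
power_out = reflected ** 2 / Z1 + transmitted ** 2 / Z2

print(reflected, transmitted, voltage_left, voltage_right, power_in, power_out)
```

Setting Z2 equal to Z1 gives a reflected fraction of zero, which is why a matched termination absorbs the whole signal.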



High-Speed Wires Are Point to Point

• Can’t split a wire to go to two locations
– You will get a reflection from the junction
– Z1 will see impedance discontinuity

[Figure: line Z1 splitting into two Z2 branches]



At High Speeds, Vias are Stubs

[Figure: top layer signaling results in a large via stub]

• Signal energy splits at via
– If the via is short, it can be modeled as a capacitive load
– Causes a reflection in the signal
• The higher the frequency, the more sensitive you are to stubs

Backplane Environment

[Figure: backplane signal path. Labels: package, on-chip parasitic (termination resistance and device loading capacitance), package via, line card trace, line card via, backplane connector, backplane via, backplane trace.]

• Line attenuation
• Reflections from stubs (vias)



Backplane Channel

• Loss is variable
– Same backplane
– Different lengths
– Different stubs
• Top vs. Bot
• Attenuation is large
– >30dB @ 3GHz
– But is that bad?

[Figure: attenuation [dB] (0 to -60) vs. frequency [GHz] (0 to 10) for four channels: 9" FR4; 26" FR4; 9" FR4, via stub; 26" FR4, via stub]



Inter-Symbol Interference (ISI)

• Channel is low pass
– Our nice short pulse gets spread out
• Dispersion – short latency
(skin-effect, dielectric loss)
• Reflections – long latency
(impedance mismatches: connectors, via stubs, device parasitics, package)

[Figure: pulse response, amplitude (0 to 1) vs. time (0 to 3 ns), Tsymbol=160ps]

ISI

[Figure: overlapping pulse responses of neighboring symbols, amplitude vs. symbol time (0 to 18); the corrupted middle sample is marked “Error!”]

• Middle sample is corrupted by
– 0.2 trailing ISI (from the previous symbol),
– 0.1 leading ISI (from the next symbol) resulting in 0.3 total ISI
• As a result middle symbol is detected in error
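
A small worked version of this example, as a sketch. It assumes 2-level signaling with symbols of +1 and -1 and a slicer threshold at zero; the 0.2 trailing and 0.1 leading ISI values come from the slide, while the main sample amplitude of 0.25 is a hypothetical value chosen so that the 0.3 of total ISI is enough to flip the decision.

```python
def received_samples(symbols, pre, main, post):
    """One sample per symbol for a pulse response with one leading and one trailing tap."""
    samples = []
    for n, s in enumerate(symbols):
        prev_s = symbols[n - 1] if n > 0 else 0
        next_s = symbols[n + 1] if n + 1 < len(symbols) else 0
        # pre: leading ISI from the next symbol; post: trailing ISI from the previous one.
        samples.append(pre * next_s + main * s + post * prev_s)
    return samples

def slice_2pam(sample):
    """Decide +1 or -1 with the threshold at zero."""
    return 1 if sample > 0 else -1

symbols = [+1, -1, +1]  # the middle symbol is the victim
samples = received_samples(symbols, pre=0.1, main=0.25, post=0.2)
decisions = [slice_2pam(y) for y in samples]

print(samples)
print(decisions)
```

The middle sample is 0.1 + 0.2 - 0.25, which lands just above zero, so the -1 is read as +1. Any main sample smaller than the 0.3 of total ISI fails the same way for this pattern.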

Equalization For Loss :
Goal is to Flatten Response

• Channel is band-limited
• Equalization: boost high frequencies, or attenuate low frequencies

Equalization Mechanisms
[Figure: two panels, “No equalization” and “Tx equalization”; amplitude vs. symbol time (0 to 18)]

• Tx equalization
– Pre-filter the pulse with the inverse of the channel
– Filters the low freq. to match attenuation of high freq.
• Rx feedback equalization
– Subtract the error from the signal

Removing ISI

[Figure: link with a linear transmit equalizer (anticausal and causal taps on the Tx data, 50Ω terminated outputs outP/outN) driving the channel, and a decision-feedback equalizer at the receiver (sampled data, deadband, feedback taps, TapSel logic)]

• Transmit and Receive Equalization


– Changes signal to correct for ISI
– Initial work was at transmitter
J. Zerbe et al, "Design, Equalization and Clock Recovery for a 2.5-10Gb/s 2-PAM/4-PAM Backplane
Transceiver Cell," IEEE Journal Solid-State Circuits, Dec. 2003.

Transmit Equalization – Headroom Constraint

[Figure: transmit equalizer with anticausal and causal taps on the Tx data, under a peak power constraint, driving the channel; plot of attenuation [dB] (0 to -25) vs. frequency [GHz] (0 to 2.5), unequalized and equalized]

• Amplitude of equalized signal depends on the channel
• Transmit DAC has limited voltage headroom
• Unknown target signal levels
– Harder to make adaptive equalization work
• Need to tune the equalizer and receive comparator levels
– If you have multi-level signals
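
A sketch of the headroom constraint, assuming the transmit equalizer is an FIR filter whose taps are scaled so the sum of their magnitudes is 1. That scaling keeps the worst-case output within the DAC range. The tap values are hypothetical, and `n_pre` (the number of anticausal taps) is a name introduced here for illustration.

```python
def normalize_taps(taps):
    """Scale taps so the sum of magnitudes is 1 (peak output stays within +/-1)."""
    peak = sum(abs(t) for t in taps)
    return [t / peak for t in taps]

def tx_fir(symbols, taps, n_pre):
    """Filter +/-1 symbols. taps[n_pre] is the main tap; earlier taps weight
    future symbols (anticausal), later taps weight past symbols (causal)."""
    out = []
    for n in range(len(symbols)):
        acc = 0.0
        for k, w in enumerate(taps):
            idx = n + n_pre - k
            if 0 <= idx < len(symbols):
                acc += w * symbols[idx]
        out.append(acc)
    return out

taps = normalize_taps([-0.1, 1.0, -0.3])  # hypothetical raw tap weights
low_freq = tx_fir([1] * 8, taps, n_pre=1)       # long run of identical bits
high_freq = tx_fir([1, -1] * 4, taps, n_pre=1)  # alternating bits

print(f"low-frequency amplitude:  {low_freq[4]:.3f}")
print(f"high-frequency amplitude: {high_freq[4]:.3f}")
```

The alternating pattern uses the full swing while the long run is sent at a reduced level, so the filter equalizes by attenuating low frequencies. How much the levels shrink depends on the taps, which is why the received amplitude depends on the channel.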

Removing Interference at Receiver

• Could also build a linear filter


– Could have gain in the filter
– But either it would need to be analog and have gain
– Or need high-speed A/D
• And real multiplication
• Sum (ai*xi)
– Increases channel noise too
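
A sketch of the Sum (ai*xi) computation and of why a linear receive filter increases noise. It assumes the noise on each input sample is independent with equal RMS value, in which case the output noise RMS is scaled by sqrt(Sum ai^2). The tap weights and sample values are hypothetical.

```python
import math

def fir_output(weights, samples):
    """Sum(a_i * x_i): one output of the linear filter."""
    return sum(a * x for a, x in zip(weights, samples))

def noise_gain(weights):
    """RMS noise scaling for independent, equal-variance noise on each sample."""
    return math.sqrt(sum(a * a for a in weights))

weights = [-0.2, 1.5, -0.4]   # hypothetical taps with gain on the main sample
samples = [0.05, 0.25, 0.10]  # hypothetical received samples

y = fir_output(weights, samples)
gain = noise_gain(weights)
print(f"filter output = {y:.3f}, noise RMS scaled by {gain:.3f}")
```

Each term is a real multiplication of a multi-bit sample by a multi-bit weight, which is what makes the digital version expensive at the bit rate.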



High Frequency Channel Noise: Crosstalk

• Many sources
– On-chip
– Package
– PCB traces
– Inside connector
• Differential signaling can help
– Minimize xtalk generation & make effects common-mode
• Both NEXT & FEXT
– NEXT very destructive if RX and TX pairs are adjacent
• Full-swing TX coupling into attenuated RX signal
• Effect on SNR is multiplied by signal loss
– Simple solution : group RX/TX pairs in connector
– NEXT typically 3-6%, FEXT typically 1-3%
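
A back-of-the-envelope sketch of why NEXT is multiplied by signal loss. It assumes the adjacent transmitter has the same full swing as the far-end transmitter; the 20 dB channel loss is a hypothetical value, while the 3-6% NEXT range comes from the slide.

```python
def amplitude_ratio(loss_db):
    """Fraction of the transmitted amplitude left after loss_db of attenuation."""
    return 10 ** (-loss_db / 20.0)

def next_relative_to_signal(next_fraction, channel_loss_db):
    """NEXT amplitude as a fraction of the attenuated received signal."""
    received = amplitude_ratio(channel_loss_db)
    return next_fraction / received

CHANNEL_LOSS_DB = 20.0  # hypothetical channel loss

ratio_low = next_relative_to_signal(0.03, CHANNEL_LOSS_DB)
ratio_high = next_relative_to_signal(0.06, CHANNEL_LOSS_DB)
print(f"NEXT is {ratio_low:.0%} to {ratio_high:.0%} of the received signal")
```

A few percent of coupling looks harmless against a full-swing signal, but after 20 dB of loss it becomes a large fraction of what the receiver actually sees.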



Subtract Out Residual Interference

• Called Decision feedback equalization (DFE)
– Subtracts error from input
– No attenuation
• Problem with DFE
– Need to know interfering bits
– ISI must be causal
• Problem - latency in the decision circuit
• Receive latency + DAC settling < bit time
– Can increase allowable time by loop unrolling
• Receive next bit before the previous is resolved

[Figure: feedback equalization; amplitude vs. symbol time (0 to 18)]
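
A minimal one-tap DFE sketch, assuming +1/-1 symbols, a slicer threshold at zero, and a receiver that knows the trailing ISI weight exactly. The channel values (main sample 0.25, trailing ISI 0.3) are hypothetical, chosen so that a plain slicer fails.

```python
def channel(symbols, main, post):
    """Sampled channel output with one trailing (causal) ISI term."""
    out = []
    prev = 0
    for s in symbols:
        out.append(main * s + post * prev)
        prev = s
    return out

def plain_slicer(samples):
    """Decide each sample on its own, threshold at zero."""
    return [1 if y > 0 else -1 for y in samples]

def dfe(samples, post):
    """Subtract the ISI of the previous decision before slicing."""
    decisions = []
    d_prev = 0  # no decision exists before the first symbol
    for y in samples:
        corrected = y - post * d_prev
        d = 1 if corrected > 0 else -1
        decisions.append(d)
        d_prev = d  # must be available before the next sample is sliced
    return decisions

symbols = [1, -1, -1, 1, -1, 1, 1, -1]
samples = channel(symbols, main=0.25, post=0.3)
plain = plain_slicer(samples)
equalized = dfe(samples, post=0.3)

print("sent:     ", symbols)
print("no DFE:   ", plain)
print("with DFE: ", equalized)
```

The dependence of each decision on `d_prev` is the feedback loop whose latency must fit inside one bit time. The correction only works for trailing ISI, since the bits that cause leading ISI have not been decided yet.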



One Bit Loop Unrolling (for 2 level signal)

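K.K. Parhi, "High-Speed architectures for algorithms with quantizer loops," IEEE International Symposium on Circuits and Systems, May 1990

[Figure: 2PAM signal constellation for the channel 1 + αD. The nominal levels +1 and −1 each split into two: 1 + α, 1 − α, −1 + α, −1 − α. Two slicers with thresholds +α and −α sample x_n on dClk and produce d_n | d_{n−1} = 1 and d_n | d_{n−1} = 0; the previous decision d_{n−1}, held in a flip-flop (D Q), selects between them.]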

• Instead of subtracting the error


– Move the slicer level to include the interference
– Slice for each possible level, since previous value unknown
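
A sketch of one-bit loop unrolling for the 1 + αD channel, assuming +1/-1 symbols (with -1 playing the role of "0") and a known starting symbol of +1. The value α = 0.4 and the helper names are hypothetical. Both slicers run on every sample; the previous decision only selects which result to keep.

```python
def channel_1_plus_alpha_d(symbols, alpha, s_init=1):
    """x_n = s_n + alpha * s_{n-1}; s_init is the assumed symbol before the first."""
    out = []
    prev = s_init
    for s in symbols:
        out.append(s + alpha * prev)
        prev = s
    return out

def unrolled_receiver(samples, alpha, d_init=1):
    """Slice against both shifted thresholds, then select using d_{n-1}."""
    decisions = []
    d_prev = d_init
    for x in samples:
        d_if_prev_one = 1 if x > +alpha else -1   # assumes d_{n-1} = +1
        d_if_prev_zero = 1 if x > -alpha else -1  # assumes d_{n-1} = -1
        d = d_if_prev_one if d_prev == 1 else d_if_prev_zero
        decisions.append(d)
        d_prev = d
    return decisions

ALPHA = 0.4  # hypothetical ISI weight
symbols = [1, -1, -1, 1, 1, -1, 1, -1]
samples = channel_1_plus_alpha_d(symbols, ALPHA)
decisions = unrolled_receiver(samples, ALPHA)

print(decisions)
```

Compared with subtracting the ISI first, the comparison no longer waits on the previous decision. Only the final selection does, which shortens the critical loop at the cost of a second slicer.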



More Bits/Hz

• Multi-level signaling (aka PAM)


– Convert extra voltage margin to more bits

– Works well when the noise is small


• Need even more signal processing
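
A short sketch of the trade behind multi-level signaling, assuming equally spaced levels within a fixed peak-to-peak swing. Each symbol carries log2(levels) bits, but the spacing between adjacent levels shrinks to swing/(levels - 1). The helper names are introduced here for illustration.

```python
import math

def bits_per_symbol(levels):
    """Bits carried by one symbol with the given number of levels."""
    return math.log2(levels)

def eye_height(swing, levels):
    """Spacing between adjacent levels for a fixed peak-to-peak swing."""
    return swing / (levels - 1)

def margin_penalty_db(levels):
    """Voltage margin lost relative to 2-level signaling, in dB."""
    return 20 * math.log10(levels - 1)

for levels in (2, 4):
    print(f"{levels}-PAM: {bits_per_symbol(levels):.0f} bit(s)/symbol, "
          f"eye = {eye_height(1.0, levels):.3f} of swing, "
          f"penalty = {margin_penalty_db(levels):.2f} dB")
```

Going from 2 to 4 levels doubles the bits per symbol but leaves each eye with a third of the swing, which is why it only pays off when the noise is small.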

Internal Speed Limitation

• Links need good quality clocks with low jitter


– That means you want them to settle fully to both Vdd and Gnd
– If you make the clock too fast, it will not “rail”
– And that means it will be prone to jitter

• So one limitation for links is internal clock rate


– For power efficiency want FO on clock to be around 4
– Need pulse width 3-4 times the slowest gate
– Gives around 8 FO4 clock

• For higher speed bit rates


– Need to generate multiple bits/clock
– Use non-static CMOS clock circuits (CML & inductors)
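
The clock-rate arithmetic can be sketched as follows, assuming the slowest gate has a delay of about one FO4 and the clock has a 50% duty cycle, so the period is twice the pulse width. The 30 ps FO4 delay is a hypothetical value; the real number depends on the process.

```python
FO4_PS = 30.0        # hypothetical FO4 delay in picoseconds
PULSE_WIDTH_FO4 = 4  # slide: pulse width 3-4 times the slowest gate
BITS_PER_CLOCK = 2   # one bit on each clock edge

clock_period_fo4 = 2 * PULSE_WIDTH_FO4       # high phase plus low phase
clock_period_ps = clock_period_fo4 * FO4_PS
clock_ghz = 1000.0 / clock_period_ps

bit_time_fo4 = clock_period_fo4 / BITS_PER_CLOCK
bit_rate_gbps = BITS_PER_CLOCK * clock_ghz

print(f"clock: {clock_period_fo4} FO4 = {clock_period_ps:.0f} ps ({clock_ghz:.2f} GHz)")
print(f"bit time: {bit_time_fo4:.0f} FO4, bit rate: {bit_rate_gbps:.2f} Gb/s")
```

Sending more than one bit per clock is what lets the bit time drop below the 8 FO4 clock period without making the clock itself any faster.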

Simple Demultiplexing Receiver

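[Figure: receiver input (in, ref) feeding two clocked paths, each a preconditioning stage (pre) followed by a latch on clk, producing Data_E and Data_O]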

• 2-1 demux at the input


• Preconditioning stage: filter/integrate, can be clocked to avoid ISI
– Reject CM
– Sometimes not used
• Latch makes decision (4-FO4)



Simple Multiplexing Transmitter

• DDR: send a bit per clock edge
• Critical issues:
– 50% duty cycle
– Tbit > 4-FO4

[Figure: multiplexer selecting between Data_E and Data_O; plot of output pulse width closure (%) (0 to 30) vs. bit time (normalized to FO4) (1 to 5)]



I/O Clocking Issues

• Remember the clocking issues:


– Long path constraint (setup time)
– Short path constraint (hold time)
– Need to worry about them for I/O as well

• For I/O need to worry about a number of delays


– Clock skew between chips
– Data delay between chips
• Can be larger than a clock cycle (speed of light)
– Clock skew between external clock and internal clock
• This can be very large if not compensated
• It is essentially the insertion delay of the clock tree



System Clocking: Simple Synchronous Systems

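[Figure: synchronous system with external clock CKX, buffer delays d1 and d2, on-chip clocks CKC1 and CKC2, data DI, and on-chip logic; timing diagram of CKX, CKC1, CKC2, and DI]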

• Long bit times compared to on chip delays:


– Rely on buffer delays to achieve adequate timing margin



PLLs: Creating Zero Delay Buffers

[Figure: PLL/DLL between external clock CKX and on-chip clock CKC, with data DI and on-chip logic; timing diagram of CKX, CKC, and DI]

• On-chip clock might be a multiple of system clock:


– Synthesize on-chip clock frequency
• On-chip buffer delays do not match
– Cancel clock buffer delay



Used to Argue About PLLs vs DLLs

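[Figure: PLL: phase detector (PD) comparing ref clk with the ÷N-divided VCO output, loop filter, VCO producing clk. DLL: phase detector (PD) comparing ref clk with the VCDL output, loop filter, VCDL producing clk.]

• PLL (VCO): second/third order loop
– Stability is an issue
– Frequency synthesis easy
– Ref. clk jitter gets filtered
– Phase error accumulates
• DLL (VCDL): first order loop
– Stability guaranteed
– Frequency synthesis problematic
– Ref. clk jitter propagates
– Phase error does not accumulate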

After Many Years of Research

• And many papers and products


• One can mess up either a DLL or PLL
– Each has its own strengths and weaknesses
• If designed correctly, either will work well
– Jitter will be dominated by other sources
• Many good designs have been published
– It is now a building block that is often reused
– We all have our favorites, mine is the dual-loop design

• And yes, people use ring oscillators


– Still an open question about how much LC helps (in system)

Clocking Structures

• Synchronous: same frequency and phase
– Conventional buses
• Mesochronous: same frequency, unknown phase
– Fast memories
– Internal system interfaces
– MAC/Packet interfaces
• Plesiochronous: almost the same frequency
– Mostly everything else today

[Figure: synchronous: one clock F0, equal delays t; mesochronous: one clock F0, delays tA ≠ tB; plesiochronous: two clocks, F1 ≈ F2]
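
A small sketch of what "almost the same frequency" costs in a plesiochronous link. With a frequency offset between F1 and F2, the sampling phase drifts by one full bit time every 1/offset bits, so the receiver has to keep tracking phase. The 100 ppm offset and 5 Gb/s rate are hypothetical values.

```python
def bits_per_slip(offset_ppm):
    """Number of bits after which the phase has drifted by one bit time."""
    return 1e6 / offset_ppm

def seconds_per_slip(offset_ppm, bit_rate_hz):
    """Time for the phase to drift by one bit time."""
    return bits_per_slip(offset_ppm) / bit_rate_hz

OFFSET_PPM = 100.0  # hypothetical frequency offset between F1 and F2
BIT_RATE_HZ = 5e9   # hypothetical bit rate

n_bits = bits_per_slip(OFFSET_PPM)
t_slip = seconds_per_slip(OFFSET_PPM, BIT_RATE_HZ)
print(f"one bit of drift every {n_bits:.0f} bits ({t_slip * 1e6:.1f} us)")
```

In a mesochronous link the offset is zero, so the phase only has to be found once rather than tracked continuously.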



Source Synchronous Systems
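[Figure: source clock CKSRC sent alongside the data; at the receiver a PLL/DLL takes it as ref and produces CKRCV, which clocks the data receiver (rcvr) ahead of the logic. Timing diagram: CKSRC, data D0 D1 D2 D3, CKRCV.]

– Position on-chip sampling clock at the optimal point, i.e. maximize “timing” margin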
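
A sketch of what maximizing timing margin means, assuming the data is valid between two known instants within the bit time and the sampling clock can be placed at a set of discrete phases. All the numbers and helper names are hypothetical.

```python
def timing_margin(sample_t, eye_open_t, eye_close_t):
    """Distance from the sampling instant to the nearer edge of the data eye."""
    return min(sample_t - eye_open_t, eye_close_t - sample_t)

def best_phase(phases, eye_open_t, eye_close_t):
    """Pick the candidate phase with the largest timing margin."""
    return max(phases, key=lambda t: timing_margin(t, eye_open_t, eye_close_t))

# Hypothetical eye: data valid from 20 ps to 140 ps within the bit time.
EYE_OPEN_PS, EYE_CLOSE_PS = 20, 140
phases = list(range(0, 160, 10))  # candidate clock phases in 10 ps steps

chosen = best_phase(phases, EYE_OPEN_PS, EYE_CLOSE_PS)
margin = timing_margin(chosen, EYE_OPEN_PS, EYE_CLOSE_PS)
print(f"sample at {chosen} ps, margin {margin} ps on each side")
```

The best point is the middle of the eye, where the setup-side and hold-side margins are equal. A real receiver has to find it without knowing the eye edges in advance.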

Serial Link Circuit

[Figure: receiver (rcvr) sampling DIN with clock CKR from a CDR, feeding the logic; timing diagram of DIN (D0 D1) and CKR]

– Recover incoming data fundamental frequency

– Position sampling clock at the “optimal” point
