
Int. Journal on High Speed Electronics and Systems, vol. 11, No. 1, pp. 257-306 (2001)

RSFQ TECHNOLOGY: PHYSICS AND DEVICES

PAUL BUNYK, KONSTANTIN LIKHAREV, and DMITRY ZINOVIEV
State University of New York at Stony Brook, Stony Brook, NY 11794-3800, U.S.A.
Rapid Single-Flux-Quantum (RSFQ) logic, based on the representation of digital bits by single quanta of magnetic flux in superconducting loops, may combine several-hundred-GHz speed with extremely low power dissipation (close to $10^{-18}$ Joule/bit) and very simple fabrication technology. The drawbacks of this technology include the necessity of deep (liquid-helium-level) cooling of RSFQ circuits and the rudimentary level of the currently available fabrication and testing facilities. The objective of this paper is to review RSFQ device physics and to discuss briefly the prospects for future development of this technology in light of the tradeoff between its advantages and handicaps.

1. Introduction

The most authoritative industrial forecast, the International Technology Roadmap for Semiconductors (ITRS),1 predicts that the current spectacular growth in density of semiconductor digital integrated circuits (Moore's Law) will continue for at least one more decade, increasing the integration scale by almost three orders of magnitude by the end of that period. The outlook for speed is, however, entirely different: the document predicts that the exponential growth of microprocessor clock frequency, which was typical for the last two decades, will slow to a crawl right after the recent crossing of the 1 GHz frontier. If the anticipated mass production of integrated circuits with sub-100-nm features does not materialize because of skyrocketing fabrication costs, the prospects for multi-GHz operation of mainstream CMOS logic circuits will seem even bleaker.

In order to explore alternative ways to overcome the integrated circuit speed saturation, it is important to recognize that this problem has virtually nothing to do with the intrinsic switching speed of semiconductor transistors. The intrinsic switching time of a modern silicon MOSFET is below 10 ps.2 This does not mean, however, that such transistors may enable digital integrated circuits with a 10-ps-scale clock cycle. In fact, a dominant share of the much longer, 1-ns-scale clock cycle in modern integrated circuits is spent on recharging the interconnect capacitances ($C$) by the on-currents ($I$) of logic gate transistors. (The relative contribution of the gate capacitance to $C$ is almost negligible.)1,3

The most apparent way to speed up the recharging process is to increase the output current $I$, e.g., by using transistors with wider channels. This, however, immediately leads to growth of the dynamic power consumption. A convenient measure of this power is its average density (power per unit area)

C:\User\LIKHAREV\RSFQ\Reviews\IJHSES_01\ReprintDec01.doc

$$P_0 = C_0 V^2 f_C, \tag{1}$$

where $V$ is the logic swing (typically close to the power supply voltage $V_{DD}$), $f_C$ is the clock frequency, and $C_0$ is the effective total interconnect capacitance per unit area. The latter parameter is virtually independent of the active device parameters (for current interconnect technologies, $C_0 \sim 10^{-8}$ F/cm$^2$).4 As a result, $P_0$ is very large even now: modern 1-GHz microprocessors burn above 50 watts on a 1-cm$^2$-scale chip area.5 This power density should be compared to the ~0.1 W/cm$^2$ power flux of direct sunlight or the ~10 W/cm$^2$ of a kitchen hot plate. Removing such enormous power from a chip without overheating it presents a very serious technical challenge. The ITRS1 foresees that no more than 185 W will be removed from a chip even by the year 2014, indicating that there are virtually no known reserves in this direction. In addition, $C_0$ will continue to grow due to the increase in the number of wiring levels (see, e.g., Fig. 7 of Ref. 4).

The only visible path to higher speed in semiconductor digital devices, while keeping power within acceptable limits, is to decrease the power supply voltage $V_{DD}$ while keeping the on-current $I$ fixed. In CMOS circuits, this reduction requires an increase of the ratio $I/V_{DD}$, i.e. the transistor transconductance $g_m$, via shortening the MOSFET gate. This is a very arduous (and increasingly expensive) process which will require the commercial introduction of radically new patterning techniques to implement minimum features smaller than ~100 nm. Even if this size reduction is implemented, the speed improvements will be rather marginal.1

The problem of heat removal is not limited to CMOS circuits. Novel semiconductor devices like heterojunction bipolar transistors6 or resonant tunneling diodes7 demonstrate even more remarkable internal switching speed (beyond 100 GHz), due to their low internal parasitics. However, they do obey the lower power bound expressed by Eq. (1) and have voltage swings $V$ comparable to those of CMOS transistors. As a result, an attempt to use them in faster VLSI circuits would lead to the same heat-limited performance saturation.

A remarkable opportunity to solve the speed saturation problem is provided by superconductor integrated circuits, which can operate above 100 GHz. The goal of this paper is to give a brief review of this opportunity and to discuss the prospects and problems of the so-called Rapid Single-Flux-Quantum (RSFQ) logic, which is currently the focus of work in this field. A review of current and possible near-term applications of this technology from the system point of view is given by D. Brock in another article of this special issue. Somewhat more technical, albeit already somewhat outdated, reviews may be found in Refs. 8, 9; for popular accounts of RSFQ technology, see Refs. 10, 11. Current research in this field may be followed conveniently via the proceedings of the biennial Applied Superconductivity Conferences, which are published in special issues of the IEEE Transactions on Applied Superconductivity, and via the Web home pages of several groups.12
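The scaling expressed by Eq. (1) can be checked with a few lines of arithmetic. The sketch below is only an illustration: the value of $C_0$ and the 1-GHz clock are taken from the text, while the logic swing of 2 V is an assumed round number that the text does not specify.

```python
def power_density_w_per_cm2(c0_f_per_cm2, v_swing, f_clock_hz):
    """Eq. (1): P0 = C0 * V^2 * f_C, returned in W/cm^2."""
    return c0_f_per_cm2 * v_swing ** 2 * f_clock_hz


C0 = 1e-8        # F/cm^2, interconnect capacitance per unit area quoted in the text
F_CLOCK = 1e9    # Hz, the 1-GHz case discussed in the text
V_SWING = 2.0    # V, ASSUMED logic swing (not given in the text)

p0 = power_density_w_per_cm2(C0, V_SWING, F_CLOCK)
p0_half_swing = power_density_w_per_cm2(C0, V_SWING / 2, F_CLOCK)

print(f"P0 at V = {V_SWING} V: {p0:.1f} W/cm^2")
print(f"P0 at V = {V_SWING / 2} V: {p0_half_swing:.1f} W/cm^2")
```

Because the power density is quadratic in the swing, halving $V$ at fixed $C_0$ and $f_C$ cuts it by a factor of four, which is why lowering $V_{DD}$ is singled out as the main remaining lever.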


2. Superconductor Digital Electronics

2.1. Superconductor transmission lines

The main advantage of superconductor digital circuits may lie not in active devices, but in interconnects. Superconductors have very low ac loss below the gap frequency

$$f_\Delta = 2\Delta(T)/h, \tag{2}$$

where $\Delta(T)$ is the superconductor energy gap (at $T \lesssim T_c/2$, $\Delta(T) \approx 1.76\,k_BT_c$, where $T_c$ is its critical temperature).13,14 For the superconductor most practical for integrated circuit fabrication, niobium, $T_c$ is close to 9 K and $f_\Delta$ to 700 GHz. As a result, on-chip superconducting transmission lines (Fig. 1) with very thin dielectric layers, $d \sim 0.1$ μm, have very low attenuation for picosecond signals, and may be used to transfer picosecond waveforms over any on-chip distance.15,16 Since the insulation thickness of such lines may be much smaller than the strip width $w$, the electromagnetic field is very well localized in the narrow gap(s) between the strips and ground(s), so that such interconnects also have very low crosstalk.17

In order to achieve such low crosstalk in semiconductor transistor technology at comparable frequencies, one would need to use ground planes at a similarly small distance $d$ from the strips. However, if implemented with normal metals, such interconnects would have very high attenuation, since $d$ would be comparable to the skin depth $\delta$. In superconductors, the dissipative field penetration by the skin depth $\delta$ is replaced with non-dissipative, frequency-independent field penetration by the so-called London depth $\lambda_L$, about 0.1 μm for niobium thin films.

[Figure 1: panels (a) and (b); dimension labels w, t, d, d'.]

Fig. 1. In superconductor electronics, on-chip interconnects such as (a) microstrips and (b) striplines with submicron dielectric layers may have negligible attenuation and dispersion for picosecond waveform transfer over distances of a few cm. Lines with $\min[d, d', \lambda_L] \ll w$ also have virtually negligible crosstalk even if the wiring pitch is close to the strip width $w$.
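As a quick numerical check of the gap-frequency estimate in Sec. 2.1, the following sketch evaluates $f_\Delta = 2\Delta/h$ with the low-temperature estimate $\Delta \approx 1.76\,k_BT_c$. The function name is hypothetical and the physical constants are the standard SI values; only $T_c \approx 9$ K comes from the text.

```python
H = 6.62607015e-34     # Planck constant, J*s
K_B = 1.380649e-23     # Boltzmann constant, J/K


def gap_frequency_hz(t_c_kelvin, gap_ratio=1.76):
    """Gap frequency 2*Delta/h, with Delta ~ gap_ratio * k_B * T_c.

    The ratio 1.76 is the low-temperature estimate quoted in the text;
    it is only valid well below T_c.
    """
    delta = gap_ratio * K_B * t_c_kelvin
    return 2.0 * delta / H


f_nb = gap_frequency_hz(9.0)   # niobium, T_c close to 9 K
print(f"Gap frequency for T_c = 9 K: {f_nb / 1e9:.0f} GHz")
```

The result lands within a few percent of the "close to 700 GHz" figure quoted above; the exact number depends on the value taken for $T_c$.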

2.2. Josephson junctions

The second important component in the arsenal of superconductor electronics is the Josephson junction, a two-terminal device which physically is just a weak contact between two superconductor electrodes.13,14,18 For present practice, the most important type of such a contact is a niobium-trilayer (Nb/AlOx/Nb) tunnel junction19,20 in which the weak contact between two niobium thin-film electrodes is provided by tunneling through a ~1-nm-thick layer of aluminum oxide.

Josephson junctions feature very unusual dynamics18 because of the macroscopic quantum nature of charge carriers (Cooper pairs) in superconductors.13,14 In contrast to Fermi particles (single electrons and holes) in normal metals and semiconductors, Cooper pairs have integer spin and hence obey Bose statistics. As a result, they form a coherent condensate which may be described with a single wavefunction

$$\psi(\mathbf{r}, t) = |\psi| \exp\{i\chi(\mathbf{r}, t)\}. \tag{2}$$

The wavefunction amplitude $|\psi|$, which is proportional to the square root of the Cooper pair density, is almost constant inside Josephson junction electrodes, but the wavefunction phase $\chi$ exhibits fascinating dynamics. Indeed, plugging Eq. (2) with $|\psi| = \mathrm{const}$ into the Schrödinger equation

$$i\hbar\,\partial\psi/\partial t = H\psi, \tag{3}$$

for superconductors in equilibrium, when the Hamiltonian operator $H$ is just the Cooper pair energy $E = 2e\mu + \mathrm{const}$ (where $\mu$ is the local Fermi level, i.e. the electrochemical potential), we get the following equation for the phase evolution:

$$\partial\chi(\mathbf{r}, t)/\partial t = -(2e/\hbar)\,\mu(\mathbf{r}, t). \tag{4}$$
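The step from Eqs. (2) and (3) to Eq. (4) can be spelled out as follows. This is a sketch in which the condensate phase is written as $\chi$ and the additive constant in $E$ is dropped, on the assumption that it only adds the same uniform drift to the phase at every point.

```latex
\begin{align*}
i\hbar\,\frac{\partial}{\partial t}\Bigl(|\psi|\,e^{i\chi}\Bigr)
  &= i\hbar\,\Bigl(i\,\frac{\partial\chi}{\partial t}\Bigr)|\psi|\,e^{i\chi}
   = -\hbar\,\frac{\partial\chi}{\partial t}\,\psi
   && \text{(since } |\psi| = \mathrm{const}\text{)},\\
H\psi &= E\,\psi = 2e\mu\,\psi,\\
\Longrightarrow\quad
\frac{\partial\chi}{\partial t} &= -\frac{2e}{\hbar}\,\mu(\mathbf{r},t).
\end{align*}
```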

Subtracting Eqs. (4) written for two arbitrarily fixed points inside the superconductor electrodes, we get the fundamental relation between the phase drop $\varphi \equiv \chi_1 - \chi_2$ and the voltage drop $V = \mu_2 - \mu_1$ between these points:

$$d\varphi/dt = (2e/\hbar)V(t). \tag{5}$$

The last equation has been experimentally confirmed to be accurate to at least the 15th decimal place and is presently used in the legal definition of the volt. The spatial and temporal dynamics of the phase $\chi(\mathbf{r}, t)$ affects all properties of superconductors; in particular, it determines the flow of Cooper pairs (supercurrent) in a Josephson junction:13,14,18

$$I_S = I_C \sin\varphi. \tag{6}$$

Here $\varphi$ is the phase drop across the junction, while $I_C$ is its critical current, which depends on its area and barrier transparency. According to Eqs. (5), (6), for small perturbations of the supercurrent $I_S$ a Josephson junction behaves as a (nonlinear) dynamic inductance

$$L_J \equiv V/(dI/dt) = L_C/\cos\varphi, \tag{7a}$$

where

$$L_C \equiv \hbar/2eI_C. \tag{7b}$$
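To give a feeling for the numbers in Eq. (7), the sketch below evaluates the Josephson inductance for two illustrative critical currents. It uses the identity $\hbar/2e = \Phi_0/2\pi$, with $\Phi_0 = h/2e$ the magnetic flux quantum; the function name and the chosen currents are assumptions made for this example.

```python
import math

PHI_0 = 2.067833848e-15   # magnetic flux quantum h/2e, Wb


def josephson_inductance(i_c, phi=0.0):
    """Eq. (7): L_J = L_C / cos(phi), with L_C = hbar/(2e*I_C) = PHI_0/(2*pi*I_C)."""
    l_c = PHI_0 / (2.0 * math.pi * i_c)
    return l_c / math.cos(phi)


for i_c_ua in (125.0, 250.0):   # illustrative critical currents, in microamperes
    l_c = josephson_inductance(i_c_ua * 1e-6)
    print(f"I_C = {i_c_ua:.0f} uA -> L_C = {l_c * 1e12:.2f} pH")
```

For critical currents of the order of 100 μA the characteristic inductance is thus a few picohenries, and it grows as the phase approaches $\pi/2$.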

However, for large signals the Josephson junction dynamics is substantially more complex. For a fair description of this dynamics, one should take into account three other components of the current through the junction:

$$I(t) = I_C \sin\varphi + C\,dV/dt + V/R + I_f(t), \tag{8}$$

where $C$ is the junction capacitance and $R$ its normal resistance. (Generally, $1/R$ is the sum of a nonlinear "quasiparticle" conductance $G_N(V)$ of the junction itself and a linear conductance of an external shunt, but in present-day circuits the latter term dominates, so that $R$ may be considered constant.) The (typically small) term $I_f(t)$ gives the Langevin description of current noise in the normal resistance $R$. For externally shunted junctions, a fair description of this noise is given by the Nyquist formula, which may be written as either

$$\langle I_f^2 \rangle = (4k_BT/R)\,\Delta f, \tag{9a}$$

or

$$\langle I_f(t)\,I_f(t') \rangle = (2k_BT/R)\,\delta(t - t'). \tag{9b}$$

The system of equations (6), (8) gives an implicit relation between the current and voltage in Josephson junctions. Its analysis18 shows that these junctions allow generation of various picosecond waveforms. Moreover, due to the fundamental relation (5), the junctions may recover weak incoming pulses, restoring their waveforms to a nominal value. In addition, the effective impedance $Z \sim R$ of a Josephson junction may be matched to that of on-chip superconductor interconnect lines (Fig. 1), ensuring effective insertion of generated signals into the line and reception of signals incident from the line. Also, these devices operate with low signal voltages ($V \sim I_CR \sim 1$ mV), and as a result their scale of power dissipation, $P \sim V^2\,\mathrm{Re}(Z^{-1}) \sim I_C^2R$, is extremely low, typically of the order of 1 μW. Moreover, junctions may be in their superconducting state, with no dissipation at all, most of the time, so that the average power dissipation may be reduced well below 1 μW per logic gate even at 100-GHz-scale frequencies.

Combined in one device, all these features enable extremely fast digital signal processing together with very low dissipation. As a result, device integration and chip packaging can be extremely dense, which further reduces signal propagation delays in multi-chip systems.

Finally, in contrast to most advanced semiconductor devices, the fabrication technology of niobium-based integrated circuits is very simple. Though these circuits are usually formed on readily available, standard silicon wafers, they do not require any silicon processing (like ion implantation or high-temperature diffusion), but rather just the deposition of several metallization layers, including superconductor interconnects and one or two normal-metal layers for Josephson junction shunting and biasing (Fig. 2).


2.3. Latching logics

The recognition of the advantages of superconductor integrated circuits has motivated several attempts to develop a practical Josephson junction digital technology, among them the large-scale IBM effort in the USA (1969-1983)21 and the MITI project in Japan (1981-1990).22 These projects should be credited for several important contributions to superconductor electronics. However, they were terminated without commercialization of the technology because the achieved circuit speed (clock frequency ~ 1 GHz by 1990) was only marginally higher than that of the contemporary semiconductor transistor circuits, and could hardly justify the necessary helium cooling. The main factor limiting the speed was the unfortunate choice of so-called latching (or "voltage-state") circuitry based on the properties of unshunted Josephson tunnel junctions.
[Figure 2: cross-section of the layer stack on a Si substrate; layer labels I1A, R2, R3, M3, I2, M2, I1, M1, I0, M0.]

Fig. 2. A commercially available 10-level niobium-based process23 for superconductor integrated circuits includes 4 superconductor metal layers M for wiring, a device definition layer I1A for Josephson junctions, a resistor layer R2, a gold layer R3 for the contact pads, and three sets of vias to connect the conducting layers. Figure courtesy of HYPRES, Inc.

As can be readily shown13,14,18 from Eqs. (5) and (8), such junctions, when biased with a dc current within the range $-I_C < I < +I_C$, have two different states: a superconducting state with vanishing voltage drop $V$ across the junction, and a resistive state with $V \approx V_g \equiv 2\Delta(T)/e$ (for niobium-trilayer junctions, $V_g \approx 2.8$ mV). In latching logic, the superconducting state is used to denote binary 0, while the resistive state represents binary 1. Here comes the problem: switching from 0 to 1 may be rather fast, a few picoseconds for junctions with high critical current density $j_C = I_C/A$ (of a few kA/cm$^2$). However, the reciprocal switching (1 → 0) is much more complex18 and should be long, of the order of one nanosecond, to avoid errors. Recently, several relatively simple latching circuits have been tested at clock frequencies of a few GHz;24,25 this speed is, however, considerably lower (by a factor of 10 or so) than that of modern RSFQ circuits (see below) at the same level of fabrication. This is essentially the price paid for an attempt to mimic the representation of information by dc voltage, which is the only option in semiconductor electronics, but is very unnatural for superconductors with their macroscopic quantum dynamics.


From the practical point of view, another problem of latching logic was even more formidable. Most latching devices must be driven by an external clock signal which also provides the necessary power. The total current needed to run an LSI circuit could reach many amperes, and feeding integrated circuits with such huge currents at multi-GHz frequencies would create severe crosstalk between the off-chip segments of ac power and signal lines.26

2.4. SFQ logics

An alternative approach to using superconductors for computing is based on their natural property13,14 to quantize the magnetic flux $\Phi = \int B_n\,dA$ through any closed superconducting loop in multiples of the flux quantum $\Phi_0$. Indeed, let us plug Eq. (5), written for the two end points of an almost closed loop, into Faraday's induction law for this loop:

$$d\Phi/dt = V. \tag{10}$$

Integration of the resulting equation over time yields the relation between the magnetic flux and the Josephson phase difference:

$$\varphi = 2\pi\Phi/\Phi_0, \tag{11}$$

where the fundamental constant combination

$$\Phi_0 \equiv h/2e \approx 2\times10^{-15}\ \mathrm{Wb} \tag{12}$$

is called the magnetic flux quantum. (Due to the relation (11), the variable $\Phi(\mathbf{r}, t) \equiv (\Phi_0/2\pi)\chi(\mathbf{r}, t)$ is frequently referred to as "flux" at a given point of the circuit, even if it does not belong to any specific superconductor loop.) Now, closing the ends of the loop, we have to require that the wavefunctions at these two (now identical) points coincide, up to a possible phase difference that is a multiple of $2\pi$. Then Eq. (11) immediately yields the flux quantization:

$$\Phi = n\Phi_0, \quad n = 0, \pm 1, \pm 2, \ldots \tag{13}$$

Since 1961, this prediction has been repeatedly verified experimentally with high accuracy. Evidently, digital information can be coded by certain values of the integer $n$; for example, the flux states with $n = 0$ and $n = 1$ may be used to represent binary zero and one, respectively.

If a superconducting loop is made of a bulk superconductor, switching between the different flux states requires the suppression and restoration of superconductivity in at least some cross-section of the loop; the latter process would take too much time (~100 ps for niobium). However, if the loop is interrupted with a Josephson junction, switching may be performed much faster (for niobium-based junctions, in a fraction of a picosecond).

Let us consider the simplest, but representative, single-flux-quantum (SFQ) circuit with just one Josephson junction (Fig. 3), usually called a SQUID (standing for Superconducting QUantum Interference Device). In order to describe it, one should combine Eqs. (8), (11) with the usual equation for the total magnetic flux through the loop:

$$\Phi = \Phi_{ex} - LI, \tag{14}$$

where $L$ is the loop inductance, and $\Phi_{ex}$ is the external magnetic flux. (In practical SFQ circuits, it is frequently more convenient to create this flux by passing an external current $I_{ex}$ through a part of the loop: $\Phi_{ex} = MI_{ex}$, where $M$ is the inductance of this part; see Fig. 3a.) Neglecting the small noise $I_f(t)$ for a while, we get the following simple stationary relation between the external and total magnetic flux:

$$\varphi + \lambda \sin\varphi = \varphi_{ex}, \tag{15}$$

where we have used the convenient definitions

$$\lambda \equiv 2\pi LI_C/\Phi_0, \qquad \varphi_{ex} \equiv 2\pi\Phi_{ex}/\Phi_0. \tag{16}$$

[Figure 3: (a) schematic of a superconductor (e.g., Nb) loop with inductance L, driven by the external current I_ex and interrupted by a Josephson junction (e.g., Nb/AlOx/Nb); (b) plots of the phase and the voltage V(t) versus time.]

Fig. 3. (a) The simplest SFQ circuit (SQUID), which may serve as a generator of single SFQ pulses, and (b) the dynamics of its switching at the moment when the externally applied flux, induced by the slowly changing current $I_{ex}$, reaches its threshold value (see Fig. 4). In panel (b), time is in units of $\tau_0$ (defined by Eq. (17) below) and voltage in units of $I_CR$. The inductive parameter $\lambda$ equals $2\pi$; the shunting parameter $\beta_c$ (defined by Eq. (18) below) equals 1.

Equation (15) shows that if the $LI_C$ product is small ($\lambda < 1$), the Josephson phase $\varphi$ (and hence the total magnetic flux $\Phi$ through the loop) is a unique function of $\varphi_{ex}$, i.e. of the applied flux. This means that the insertion of a Josephson junction with small critical current suppresses the flux quantization effect described by Eq. (13), since the loop is virtually broken by the Josephson junction. Such loops are called non-quantizing.


However, if the product $LI_C$ is large enough compared to $\Phi_0$ ($\lambda > 1$), the Josephson phase difference $\varphi$, and hence the total magnetic flux $\Phi$ and the persistent supercurrent $I = I_C \sin\varphi$, may have several stable stationary states, $\varphi_n \approx 2\pi n$, i.e. $\Phi_n \approx n\Phi_0$, for the same external field $\Phi_{ex}$; see Fig. 4. This means that the insertion of a Josephson junction with a sufficiently large critical current retains the flux quantization effect (the loop is quantizing), but modifies it. In particular, the difference between the neighboring values of $\Phi$ is somewhat smaller than $\Phi_0$, but if $\lambda$ is not very close to 1, this reduction is small. The Josephson junction also limits the number of stable flux states to $N \sim \lambda/\pi$; in typical RSFQ circuits, $\lambda$ is close to $2\pi$ (i.e., $LI_C \sim \Phi_0$), and one can conveniently work with only two flux states. Moreover, by fixing the dc flux bias at $\Phi_0/2$ (see the dashed vertical line in Fig. 4), these two states ($n = 0$ and $n = 1$) may be equilibrated, i.e. provided with equal energy and stability.

[Figure 4: plot of $\Phi/\Phi_0$ versus $\Phi_{ex}/\Phi_0$, with branches labeled n = 0 and n = 1.]

Fig. 4. RF SQUID: total magnetic flux $\Phi$ as a function of the applied flux $\Phi_{ex}$, as given by Eq. (15), for the $LI_C$ product typical for quantizing loops in RSFQ circuits ($\lambda = 2\pi$, i.e. $LI_C = \Phi_0 > \Phi_0/2\pi$). Arrows indicate flux-state switching induced by a slow change of the external field; for the dynamics, see Fig. 3b. Solid points show the two stable states at the equilibrating value $\Phi_{ex} = \Phi_0/2$.
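The difference between non-quantizing and quantizing loops can be illustrated by solving the stationary relation (15) numerically. The sketch below scans for roots of $\varphi + \lambda\sin\varphi = \varphi_{ex}$ and keeps the stable ones. The stability test used here, $1 + \lambda\cos\varphi > 0$ (a minimum of the loop energy), is the standard criterion but is not spelled out in the text; the function name, scan range, and grid density are arbitrary choices made for this example.

```python
import math


def stable_states(lam, phi_ex, phi_min=-4 * math.pi, phi_max=6 * math.pi, n=20000):
    """Stable solutions of phi + lam*sin(phi) = phi_ex (Eq. (15)).

    Roots are bracketed on a uniform grid and refined by bisection.
    A root is kept only if 1 + lam*cos(phi) > 0 (assumed stability test).
    """
    def f(p):
        return p + lam * math.sin(p) - phi_ex

    step = (phi_max - phi_min) / n
    roots = []
    a, fa = phi_min, f(phi_min)
    for k in range(1, n + 1):
        b = phi_min + k * step
        fb = f(b)
        if (fa < 0) != (fb < 0):            # sign change inside [a, b]
            lo, hi, flo = a, b, fa
            for _ in range(60):              # bisection refinement
                mid = 0.5 * (lo + hi)
                fm = f(mid)
                if flo * fm <= 0:
                    hi = mid
                else:
                    lo, flo = mid, fm
            root = 0.5 * (lo + hi)
            if 1.0 + lam * math.cos(root) > 0:
                roots.append(root)
        a, fa = b, fb
    return roots


# Quantizing loop of Fig. 4 at the equilibrating bias (phi_ex = pi):
quantizing = stable_states(2 * math.pi, math.pi)
# Non-quantizing loop (lam < 1) at the same bias:
non_quantizing = stable_states(0.5, math.pi)

print("lam = 2*pi:", [round(p / (2 * math.pi), 3) for p in quantizing], "(flux in units of Phi_0)")
print("lam = 0.5 :", [round(p / (2 * math.pi), 3) for p in non_quantizing], "(flux in units of Phi_0)")
```

For $\lambda = 2\pi$ the two stable states are symmetric about $\Phi_0/2$ and are separated by roughly $0.86\,\Phi_0$, somewhat less than a full flux quantum, in line with the remark above; for $\lambda < 1$ only a single state exists.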

Switching between the two states may be achieved by changing $\Phi_{ex}$ (i.e., by changing $I_{ex}$; see Fig. 3). At large values of inductance ($\lambda \gg 1$) this switching may be conveniently understood as the result of the Josephson junction current exceeding its critical current $I_C$. Any current beyond this value cannot be carried by the supercurrent (6) alone, so that eventually the normal current $I_N = V/R$ should pick up the difference; since according to the basic Eq. (5), $V \propto d\varphi/dt$, the Josephson phase difference $\varphi$ should start moving beyond the critical value $\pi/2$, decreasing $I_S$ and leading to the further growth of $I_N$, $V$, and $d\varphi/dt$. This positive-feedback (exponential) growth of the phase change speed ends only when the phase has come close to its initial value ($0 < \varphi < \pi/2$) plus $2\pi$, i.e. when the $2\pi$ leap of the phase has been performed; see Figs. 3b and 4. As follows from Eqs. (5) and (8), the time of the phase leap is of the order of a few units of

$$\tau = \max[\tau_0, RC], \qquad \tau_0 \equiv L_C/R = \hbar/2eI_CR = \Phi_0/2\pi I_CR. \tag{17}$$

In order to have non-oscillatory transient dynamics, with the unambiguous selection of the final state (and hence avoid the dynamics complications which have killed the latching logics), the time constants should be related as $RC \lesssim \tau_0$; this relation may be expressed as

$$\beta_c \equiv RC/\tau_0 = (2\pi/\Phi_0)\,I_CR^2C \lesssim 1, \quad \text{or} \quad R \lesssim (\Phi_0/2\pi I_CC)^{1/2}. \tag{18}$$
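The switching dynamics described above can be reproduced qualitatively with a small numerical experiment. Combining Eqs. (5), (8), and (14) (noise neglected) and measuring time in units of $\tau_0$ gives the dimensionless equation $\beta_c\ddot\varphi + \dot\varphi + \sin\varphi = (\varphi_{ex} - \varphi)/\lambda$, where $\dot\varphi$ is the voltage in units of $I_CR$. The sketch below integrates it for $\lambda = 2\pi$, $\beta_c = 1$ (the values quoted for Fig. 3) while $\varphi_{ex}$ is ramped slowly. The ramp and hold times, the final value of $\varphi_{ex}$, the step size, and the hand-written fourth-order Runge-Kutta integrator are all choices made for this illustration, not the authors' simulation setup.

```python
import math


def simulate_squid(lam=2 * math.pi, beta_c=1.0, phi_ex_final=3 * math.pi,
                   t_ramp=300.0, t_hold=100.0, dt=0.01):
    """Integrate beta_c*phi'' + phi' + sin(phi) = (phi_ex(t) - phi)/lam with RK4.

    Time is in units of tau_0; v = dphi/dt is the voltage in units of I_C*R.
    phi_ex rises linearly to phi_ex_final over t_ramp, then stays constant.
    Returns a list of (t, phi, v) samples.
    """
    def phi_ex(t):
        return phi_ex_final * min(t / t_ramp, 1.0)

    def deriv(t, phi, v):
        acc = ((phi_ex(t) - phi) / lam - math.sin(phi) - v) / beta_c
        return v, acc

    n_steps = int(round((t_ramp + t_hold) / dt))
    t, phi, v = 0.0, 0.0, 0.0
    trace = [(t, phi, v)]
    for _ in range(n_steps):
        k1p, k1v = deriv(t, phi, v)
        k2p, k2v = deriv(t + dt / 2, phi + dt / 2 * k1p, v + dt / 2 * k1v)
        k3p, k3v = deriv(t + dt / 2, phi + dt / 2 * k2p, v + dt / 2 * k2v)
        k4p, k4v = deriv(t + dt, phi + dt * k3p, v + dt * k3v)
        phi += dt / 6 * (k1p + 2 * k2p + 2 * k3p + k4p)
        v += dt / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
        t += dt
        trace.append((t, phi, v))
    return trace


trace = simulate_squid()
v_peak = max(v for _, _, v in trace)
t_peak = max(trace, key=lambda s: s[2])[0]
print(f"final phase / 2*pi : {trace[-1][1] / (2 * math.pi):.3f}")
print(f"peak voltage       : {v_peak:.2f} (units of I_C*R) at t = {t_peak:.0f} tau_0")
```

In this sketch the phase follows the lower branch of Fig. 4 until the applied flux passes its threshold, then leaps to the next branch while a single voltage pulse is generated, and finally settles without further switching.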

For stand-alone tunnel junctions this condition is not satisfied (unless their critical current density is extremely high, see Sec. 4.8 below); this is why in SFQ circuits the junctions are externally shunted with thin-film resistors (Fig. 5) to decrease the total value of $R$.
[Figure 5: layout with labels: ground contact, Nb/AlOx/Nb trilayer, counter electrode, base electrode, shunt resistor.]

Fig. 5. Typical layout of an externally shunted Josephson junction assembly. The whole circuit is fabricated over a common, mostly unpatterned ground plane (not shown). Figure courtesy of HYPRES, Inc.

The shunt resistance is selected to provide just the critical damping ($\beta_c \approx 1$), since further reduction of $R$ would only slow down the switching. At this value of $\beta_c$, the transient time constant may be expressed as

$$\tau_0 = RC = (\Phi_0C/2\pi I_C)^{1/2}. \tag{19}$$

For tunnel junctions, $\tau_0$ does not formally depend on the junction area $A$ (since both $C$ and $I_C$ are proportional to $A$), but only on the ratio of the critical current density $j_C = I_C/A$ to the specific capacitance $C_0 = C/A$. The density $j_C$ depends exponentially on the tunnel barrier thickness and hence may be readily adjusted to a desirable level within very broad limits, by changing the aluminum oxidation time and oxygen pressure. Its practical choice is determined by two important limitations on the junction critical current. On one hand, thermal fluctuations (9) may toggle an SFQ circuit from one state into another spontaneously. Detailed analyses18 show that in order to make the corresponding bit error rate sufficiently low, $I_C$ should satisfy the relation

$$I_C \gtrsim 3I_T \ln(1/2\varepsilon_0), \tag{20a}$$

where $I_T$ is the current equivalent of the thermal fluctuation scale $k_BT$:

$$I_T \equiv (2\pi/\Phi_0)\,k_BT. \tag{20b}$$

For the usual temperature of operation of niobium-based circuits ($T$ = 4-5 K), $I_T$ is about 0.2 μA, so that for a reasonably low bit error rate, $\varepsilon_0 \sim 10^{-30}$, $I_C$ should not be less than 50 μA. As will be discussed in Sec. 3.3 below, during the junction switching the effect of fluctuations is even more dangerous, so that $I_C$ should be even somewhat larger, above ~100 μA. On the other hand, excessively large values of $I_C$ lead to excessive power dissipation (and eventually may result in local overheating). Indeed, in the stationary state Josephson junctions do not dissipate energy, because at $\varphi(t) = \mathrm{const}$ the voltage $V$ across the junction vanishes completely; see Eq. (5). The power $P = IV$ is finite (i.e., energy is dissipated) only during the transient, leading to the energy loss

$$E = \int IV\,dt = (\Phi_0/2\pi)\int I\,d\varphi \sim (\Phi_0/2\pi)\,I_C \cdot 2\pi = I_C\Phi_0 \tag{21}$$

per each SFQ switching event. A reasonable compromise between the fluctuation limitations and power growth is to have $I_C$ of the smallest junction about $I_u$ = 125 μA (for this value, Eq. (21) gives $E \approx 2\times10^{-19}$ joule per junction switching event) and hence the critical current density

$$j_C \approx I_u/A_{min}, \tag{22}$$

where $A_{min}$ is the smallest junction area available with the given patterning technology. Table 1 lists the values of $\tau_0$ and other key parameters of SFQ circuits for several values of $A_{min}$. (This scaling has been confirmed in numerous experiments.) The table shows that superconductivity allows a natural implementation of very fast bistable devices with extremely low power consumption.

Now, a major problem is how to pass the information about the flux state (or equivalently about the Josephson phase difference $\varphi$) of one loop to other similar cells. (Again, $V = 0$ for each of these states, so that usual interconnects would carry no information at all.) There have been two approaches to this problem, which may be called static and dynamic SFQ, respectively. In the former approach, the information is passed quasistatically via superconducting wires. Unfortunately, the unavoidable inductance $L$ of a wire causes a Josephson phase drop along it:

$$\Delta\varphi = (2\pi/\Phi_0)\int V(t)\,dt = (2\pi/\Phi_0)\,LI, \tag{23}$$

similar to the voltage drop along normal-metal wires. As a result, only a fraction of the initial phase signal reaches the destination; hence, a phase amplifier is needed. Such a device, named the Parametric Quantron, was suggested33 in 1976. Its simplest version is similar to the SQUID shown in Fig. 3a, but includes a Josephson junction whose critical current $I_C$ can be externally controlled. (Either a long junction or two or more lumped junctions in parallel may be used for this control.18) Modulation of $I_C$ by an external clock signal may create a local Josephson phase and energy gain, thus enabling control of the final state by a relatively small input signal from similar cells. Parametric-Quantron-based circuits may have several interesting properties, including the ability to process digital information reversibly,34 with energy dissipation per bit well below both the apparent thermodynamic limit $k_BT\ln 2$ and the apparent quantum limit $\sim\hbar/\tau$.35 Unfortunately, detailed analyses36-41 have shown that the critical parameter margins of practical Parametric Quantron circuits are rather low. Moreover, the lack of long passive interconnects makes information transfer over few-mm-scale distances prohibitively slow.
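The two scales that bound the choice of the critical current, the thermal current of Eq. (20b) and the switching energy of Eq. (21), are easy to evaluate. The sketch below is a minimal illustration; the temperature of 4.2 K is an assumption within the 4-5 K range quoted in the text, and the function names are hypothetical.

```python
import math

PHI_0 = 2.067833848e-15   # magnetic flux quantum, Wb
K_B = 1.380649e-23        # Boltzmann constant, J/K


def thermal_current(temperature_k):
    """Eq. (20b): I_T = (2*pi/PHI_0) * k_B * T, in amperes."""
    return 2.0 * math.pi * K_B * temperature_k / PHI_0


def switching_energy(i_c):
    """Eq. (21): E ~ I_C * PHI_0 dissipated per 2*pi phase leap, in joules."""
    return i_c * PHI_0


T = 4.2        # K, ASSUMED operating temperature (text quotes 4-5 K)
I_U = 125e-6   # A, critical current of the smallest junction (from the text)

i_t = thermal_current(T)
e_sw = switching_energy(I_U)

print(f"I_T  = {i_t * 1e6:.2f} uA")
print(f"E    = {e_sw:.2e} J per switching event")
print(f"E / (k_B T) = {e_sw / (K_B * T):.0f}")
```

The ratio $E/k_BT$ equals $2\pi I_C/I_T$, so keeping $I_C$ several hundred times larger than $I_T$ is what keeps the switching energy thousands of times above the thermal scale.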


Table 1. Scaling of niobium-trilayer junctions with $\beta_c$ = 1.

| Fabrication technology | HYPRES23 | TRW31 | Assumed in COOL core design32 | Deep-submicron (see Sec. 5.3) |
| Minimum junction size $F = A_{min}^{1/2}$ (μm) | 3.5 | 1.75 | 0.8 | 0.3 |
| Critical current density $j_C$ (kA/cm$^2$) | 1 | 4 | 20 | 150 |
| Specific capacitance $C_0$ (μF/cm$^2$) | 5 | 6 | 7 | 8 |
| Voltage scale $I_CR$ (mV) | 0.3 | 0.5 | 1.1 | 2.0 |
| Time scale $\tau_0 = (\Phi_0C_0/2\pi j_C)^{1/2}$ (ps) | 1.1 | 0.7 | 0.3 | 0.17 |
| Power scale $I_u^2R$ (μW) | 0.04 | 0.07 | 0.14 | 0.25 |
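The entries of Table 1 follow from Eqs. (17), (19), and (22). The sketch below recomputes the time, voltage, size, and power scales from the tabulated $j_C$ and $C_0$, assuming $I_u$ = 125 μA and exactly critical damping. The helper names are hypothetical, and the results agree with the tabulated values only to within a few tens of percent, presumably because the table is rounded or based on slightly different inputs.

```python
import math

PHI_0 = 2.067833848e-15   # magnetic flux quantum, Wb
I_U = 125e-6              # A, critical current of the smallest junction (from the text)

# (technology, j_C in kA/cm^2, C_0 in uF/cm^2, tau_0 listed in Table 1 in ps)
TABLE_1 = [
    ("HYPRES", 1.0, 5.0, 1.1),
    ("TRW", 4.0, 6.0, 0.7),
    ("COOL core design", 20.0, 7.0, 0.3),
    ("Deep-submicron", 150.0, 8.0, 0.17),
]


def junction_scales(j_c_ka_cm2, c0_uf_cm2):
    """Return (tau_0 [s], I_C*R [V], junction side [um], I_u^2*R [W])."""
    j_c = j_c_ka_cm2 * 1e3                                    # A/cm^2
    c0 = c0_uf_cm2 * 1e-6                                     # F/cm^2
    tau_0 = math.sqrt(PHI_0 * c0 / (2.0 * math.pi * j_c))     # Eq. (19) per unit area
    v_c = PHI_0 / (2.0 * math.pi * tau_0)                     # Eq. (17) solved for I_C*R
    side_um = math.sqrt(I_U / j_c) * 1e4                      # Eq. (22), cm -> um
    return tau_0, v_c, side_um, I_U * v_c


results = {}
for name, j_c, c0, tau_listed in TABLE_1:
    tau_0, v_c, side_um, power = junction_scales(j_c, c0)
    results[name] = (tau_0, v_c, side_um, power)
    print(f"{name:17s} F = {side_um:4.2f} um  I_C*R = {v_c * 1e3:4.2f} mV  "
          f"tau_0 = {tau_0 * 1e12:4.2f} ps (listed {tau_listed})  "
          f"P = {power * 1e6:4.2f} uW")
```

Because $\tau_0$ depends only on the ratio $C_0/j_C$, raising the critical current density is the route to faster junctions, while the minimum junction size shrinks as $j_C^{-1/2}$ at fixed $I_u$.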

2.5. Dynamic SFQ: The basic idea

Currently, we are in the midst of a new attempt to develop a competitive superconductor digital technology, using dynamic SFQ devices. The basic idea of these devices is to use transient dynamics for information transfer. Indeed, according to Faraday's induction law $V = d\Phi/dt$, during the switching between the neighboring flux states (Fig. 4) a short voltage pulse is formed across the junction (Fig. 3b). Since for SFQ circuits the flux change is quantized ($\Delta\Phi \approx \Phi_0$), so is the pulse area:

$$\int V(t)\,dt \approx \Phi_0 \approx 2\ \mathrm{mV \cdot ps}. \tag{24}$$

For typical critically shunted Josephson junctions, the FWHM switching time is of the order of $4\tau_0$ (Fig. 3b), i.e. a few picoseconds, so the amplitude of the pulse, $V_{max} \approx \Phi_0/4\tau_0 \sim 1.5\,I_CR$, is of the order of a millivolt; for more specific numbers, see Table 1. In dynamic single-flux-quantum circuits these SFQ pulses are passed to other devices along either passive superconductor transmission lines (Fig. 1) or, if current/power gain is needed, active Josephson transmission lines (see Sec. 3.1 below).
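The numerical factor in the pulse amplitude estimate follows directly from the definition of $\tau_0$ in Eq. (17). As a short worked step, with the 1.1 ps value taken from the first column of Table 1 purely for illustration:

```latex
\begin{align*}
V_{max} \approx \frac{\Phi_0}{4\tau_0}
  = \frac{\Phi_0}{4}\cdot\frac{2\pi I_C R}{\Phi_0}
  = \frac{\pi}{2}\, I_C R \approx 1.57\, I_C R,
\qquad
\left.\frac{\Phi_0}{4\tau_0}\right|_{\tau_0 = 1.1\ \mathrm{ps}}
  \approx \frac{2.07\ \mathrm{mV\cdot ps}}{4.4\ \mathrm{ps}}
  \approx 0.47\ \mathrm{mV}.
\end{align*}
```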


Dynamic SFQ circuits are very attractive because the pulses can be naturally generated, reproduced/recovered, memorized and processed with simple SFQ devices whose speed is much higher, and energy dissipation much smaller, than that of the latching logic. Another feature which distinguishes dynamic SFQ circuits from other logics using two-terminal devices is the pulse nature of the signals. For such picosecond signals, even an inductance of a few pH may provide substantial isolation between the circuit input and output. For more usual signals, such as voltage steps in semiconductor electronics, three-terminal devices like transistors are essential to provide sufficient isolation. In contrast, RSFQ circuits with their return-to-zero signals may be quite robust despite using just two-terminal devices, Josephson junctions, eliminating the need for superconductor transistors.

Historically, some prototype dynamic SFQ circuits were discussed by several authors since the mid-1970s.42-48 It was only in 1985-86, however, that a complete family of dynamic SFQ logic circuits, broadly known as RSFQ, was suggested.49,50 The first simple devices of this family were experimentally implemented in 1986-1990; see the early review.8 Since 1991, the RSFQ idea has been adopted by several groups in the United States and other countries, and its development has started to progress rapidly. At the last Applied Superconductivity Conference in Virginia Beach, VA (September 2000), almost 100 papers on various aspects of RSFQ technology were presented by more than 15 groups from all over the globe. However, to our knowledge, no RSFQ-based system has been commercialized at the time of this writing.

3. RSFQ Devices

Since the variety of RSFQ devices developed by now is quite large, for this review we have picked a limited subset which is currently used in our FLUX project.51 The goal of this project is the demonstration of the first RSFQ general-purpose microprocessor; as a result, the subset is quite representative. Though nominal values of circuit parameters may be readily estimated from the similarity between any particular closed loop inside an RSFQ circuit and the simple SQUID (Fig. 3a), quality design requires their intensive numerical optimization (see Sec. 4.7 below) using a specific design criterion. The parameters cited below are the result of such optimization for the broadest noise margins when operating at a relatively modest clock frequency $f_c = (75\tau_0)^{-1}$. Other optimization criteria, more relevant to different design goals, may lead to somewhat different parameter values and even different schematics. Descriptions of some other RSFQ devices can be found in a Web-browsable library52 and in the original literature.

All RSFQ devices may be divided into 2 groups:
- asynchronous components with no internal memory, which generate an output SFQ pulse immediately upon the arrival of an input pulse, and
- synchronous (clocked) devices with internal memory, where the generation of an output pulse may be delayed substantially after the arrival of data SFQ pulse(s), until the arrival of one more SFQ pulse playing the role of the clock.


3.1. Asynchronous components

3.1.1. Josephson transmission line

Figure 5 shows a few segments of the simplest active RSFQ component, the Josephson Transmission Line (JTL), which had been repeatedly discussed in the literature long before the full RSFQ family was suggested in 1985.

Fig. 5. Josephson transmission line: (a) schematic of a 4-stage fragment and (b) typical layout of a 2-stage fragment. Nominal parameters: IC1 = IC2 = … = IC = 250 μA, I1 = I2 = … = IDC = 175 μA, L2 = L3 = … = L = 4.0 pH (LIC ≈ 0.5 Φ0). For a finite-length JTL fragment, the edge inductances (L1 and L5) are half of the internal inductance L. Figure (b) courtesy of HYPRES, Inc.

In the initial state of the line, the equal dc supply currents I1 = I2 = … = IDC ≈ 0.7 IC feed the Josephson junctions, creating, in accordance with Eq. (6), equal dc phase drops φ1 = φ2 = … = arcsin(IDC/IC) ≈ π/3 across each junction. Now, let an SFQ pulse propagate from left to right in Fig. 5a. There are two alternative but equivalent languages which describe the dynamics of the JTL (or any other RSFQ device). In the phase-current language, the SFQ pulse propagating from input (A) to output (B) induces a 2π leap of the Josephson phase difference (similar to those discussed above in the context of the simple SQUID loop) across one junction of the line at a time. For brevity, we say that the corresponding junction switches, although the voltage across the junction vanishes both before and after this brief event (see Eq. (5)).


Let us start from the moment when junction J1 switches. For the loop comprising that junction, inductance L2 and junction J2, this event is equivalent to the insertion of an additional external flux Φ0.⁵³ This causes an immediate increase of the current in the loop by ΔI ≈ Φ0/(LJ1 + L2 + LJ2), where LJ are the effective inductances of the Josephson junctions (see Eq. (7)). In the JTL, the segment inductances L2 = L3 = … = L are made low: LIC < Φ0 (non-quantizing loops). Thus the new value of the current through J2, which is the sum of ΔI and the dc current IDC ≈ 0.7 IC, exceeds IC. As a result, this junction switches just like its predecessor, with a delay τD ≈ 4τ0. Now φ2 ≈ π/3 + 2π, i.e. the difference between φ1 and φ2 is small again, while the difference between φ2 and φ3 is close to 2π. For the loop J1-L2-J2 this means a reduction of the effective external flux by Φ0, with all the junction currents below IC, so that the cell becomes dormant; for the next loop (J2-L3-J3), however, it means an increase of the flux by Φ0 and the beginning of a similar switching process in J3. We see that the extra flux quantum propagates along the JTL, with a delay of τD ≈ 4τ0 per cell, as if it were crossing the Josephson junctions one by one. The latter description corresponds to the magnetic flux language of description of RSFQ circuits; below we will use whichever language is more convenient in the particular situation.

Notice that the amplitude of the SFQ pulse is the same on each junction, despite the dissipation of energy E ~ ICΦ0 (21) during each switching event. (The necessary energy is picked up from the dc power supply providing the dc currents I1, I2, ….) This recovery/amplification of the SFQ pulse in the JTL (and all other RSFQ devices) is due to the fundamental quantization of flux and hence of the SFQ pulse area (see Eq. (24)).
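The switching argument above can be checked against the nominal JTL parameters of Fig. 5. The following Python sketch is only an illustration: it assumes that Eq. (7), which lies outside this section, is the usual small-signal Josephson inductance LJ = Φ0/(2π IC cos φ), and all variable names are ours. The linear estimate is crude, since LJ changes as the phase evolves, but it shows that the current step pushes J2 well above its critical current.

```python
import math

PHI0 = 2.067833848e-15  # magnetic flux quantum, Wb

# Nominal JTL parameters from the caption of Fig. 5
IC = 250e-6   # critical current of each junction, A
IDC = 175e-6  # dc bias current per junction, A
L = 4.0e-12   # inductance of one internal segment, H

# Non-quantizing loop condition: L*IC should stay below one flux quantum
loop_ratio = L * IC / PHI0

# dc phase drop across each junction in the initial state
phi_dc = math.asin(IDC / IC)

# ASSUMPTION: Eq. (7) is taken to be the small-signal Josephson inductance
LJ = PHI0 / (2.0 * math.pi * IC * math.cos(phi_dc))

# Current increment in the loop J1-L2-J2 after J1 has switched
delta_i = PHI0 / (LJ + L + LJ)

# Current through J2 right after that event, to be compared with IC
i_j2 = IDC + delta_i

# Energy scale dissipated in one switching event, E ~ IC*PHI0
e_switch = IC * PHI0

print(f"L*IC/PHI0         = {loop_ratio:.2f}")
print(f"Josephson LJ      = {LJ * 1e12:.2f} pH")
print(f"loop current step = {delta_i * 1e6:.0f} uA")
print(f"current in J2     = {i_j2 * 1e6:.0f} uA (IC = {IC * 1e6:.0f} uA)")
print(f"switching energy  = {e_switch:.1e} J")
```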
However, this quantization leaves free the current scale of the SFQ pulse (and hence its impedance and energy scales), which may be regulated by the choice of IC (at fixed LIC product and IDC/IC ratio).⁵⁴ The choice of IC indicated in Fig. 5 corresponds to the standard I/O impedance in the FLUX microprocessor.

3.1.2. Splitter and merger

Two straightforward generalizations of the JTL are the SFQ pulse splitter (Fig. 6) and merger (Fig. 7). These devices are necessary, in particular, to complement RSFQ gates with their limited fan-in and fan-out (see Sec. 3.3). (In the future, the fan-out may be increased by incorporating these components into the gates; this goal seems feasible, but still has to be achieved.)

In the splitter, switching of junction J0 causes a Φ0 increase of the effective flux applied to both branches L4-J1 and L5-J2. The resulting current increase ΔI adds to the dc bias current and exceeds the critical currents of J1 and J2. As a result, both these junctions switch virtually simultaneously, supplying SFQ pulses to the corresponding outputs (Q1 and Q2).

In the reciprocal, merger circuit (Fig. 7), an input pulse (arriving, say, from input A) switches junction J4, first causing a Φ0 increase of the external flux applied to the non-quantizing four-junction loop J0-J1-J2-J4. In junctions J0 and J1 the resulting current increase ΔI adds to the dc bias current IDC (in J2, ΔI and IDC have opposite directions) and exceeds the critical current of J1. (Although IC0 ≈ IC1, the former junction is inductively shunted by the input circuit connected to B.) J1 switches, applying an additional flux to the loop consisting of J3, L2, L1, and a parallel connection of two junction pairs J0-J1 and J2-J4, and hence increasing the current through J3 (which in fact starts switching immediately after J4). This increment accelerates the switching of J3, which forms the output SFQ pulse at Q. If the input pulse arrives at B, the device dynamics is similar. If two SFQ pulses arrive at A and B almost simultaneously, either one or two of them pass to the circuit output. In the latter case, one of these pulses can be readily eliminated by a special circuit at the input of the next latch. With this addition, the device shown in Fig. 7 may be used as an asynchronous OR gate.⁵⁶
Fig. 6. SFQ pulse splitter (fork). Nominal parameters: IC0 = 250 μA, IC1 = IC2 = 163 μA, I0 = 400 μA, L1 = 1.98 pH, L2 = 1.68 pH, L3 = 0.84 pH, L4 = L5 = 0.79 pH.

Fig. 7. SFQ pulse merger (asynchronous OR gate). Nominal parameters: IC0 = IC3 = IC4 = 144 μA, IC1 = IC2 = 150 μA, I1 = 313 μA, L0 = L1 = L3 = 1.97 pH, L2 = 0.53 pH, L2 = 4.47 pH.

3.1.3. Transmitter and receiver for passive transmission lines

Figures 8 and 9 show two components of a transceiver for passive superconductor transmission lines (cf. Fig. 1). Both circuits are essentially short JTL segments with Josephson junction critical currents changing along the line. This provides a reasonable matching of the effective circuit I/O impedance for this particular RSFQ device family, R = Ru/2 = 1.9 Ω (where Ru ≡ Φ0/2πτ0Iu = 3.8 Ω), to the higher impedance Z = 4.6 Ω of the transmission lines. This transformation allows narrower lines and thus considerable chip real estate savings. (A small series resistor R1 in the receiver prevents undesirable dc interaction between the transmitter and receiver: without the resistor, a supercurrent could flow along the superconductor strip.)

3.1.4. Asynchronous component characterization

Table 2 gives a summary of the most important time parameters of the components described above, listed in units of τ0 (see Eq. (17)). Parameter τD is the full time delay of the SFQ pulse measured, e.g., from the instant of maximum current through the input terminal to that through the output terminal. Parameter δ, which is defined as

$\delta^2 \equiv \sum_i I_{Ci}^2 \left( \partial \tau_D / \partial I_{Ci} \right)^2$,    (25)

characterizes the sensitivity of the delay τD to random, independent variations of the critical currents ICi of all Josephson junctions of the circuit. (We will need it for the discussion of junction parameter spread effects in Sec. 3.3.)
Fig. 8. SFQ pulse transmitter. Nominal parameters: IC1 = 175 μA, IC2 = 125 μA, I1 = 212 μA, L1 = 1.97 pH, L2 = 1.58 pH, L3 = 3.95 pH, L4 = 0.66 pH.

Fig. 9. SFQ pulse receiver. Nominal parameters: IC1 = 125 μA (besides the explicitly shown resistors, this junction is unshunted), IC2 = 175 μA, I1 = 212 μA, L1 = 1.79 pH, L2 = 3.95 pH, L3 = 1.58 pH, L4 = 1.97 pH, R1 = 0.71 Ω, R2 = 7.4 Ω.

Table 2. Speed and jitter parameters of the key asynchronous RSFQ devices (in units of τ0). Merger parameters are for the case of well separated input pulses; if the pulses are simultaneous the delay is lower.

Component        Pulse delay τD    Delay sensitivity δ    Jitter δt
JTL (1 stage)    4.0               5.9                    0.065
Splitter         9.5               7.0                    0.11
Merger           9.0               9.2                    0.17
Transmitter      3.7               3.2                    0.09
Receiver         6.0               7.6                    0.11

The last of the listed parameters is the r.m.s. fluctuation ("jitter") δt of the time delay due to thermal fluctuations. It has been calculated for the usual operation temperature T = 4.2 K, using a method similar to, but somewhat more precise than, that discussed in detail in Refs. 55, 56. (In fact, these methods are strictly valid for βc << 1, while for practical circuits with βc = 1 they give a slightly exaggerated jitter estimate.)
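To get a feeling for the absolute numbers, the entries of Table 2 can be converted from units of τ0 to picoseconds. The sketch below is illustrative: it assumes that τ0 follows from the relation Ru = Φ0/(2π τ0 Iu) with Ru = 3.8 Ω and the current unit Iu = 125 μA used in this paper, and the helper jtl_chain additionally assumes that the jitter contributions of successive stages are statistically independent and therefore add in quadrature. All names are ours.

```python
import math

PHI0 = 2.067833848e-15  # magnetic flux quantum, Wb
I_U = 125e-6            # current unit Iu, A
R_U = 3.8               # resistance unit Ru, Ohm

# ASSUMPTION: tau0 is recovered from Ru = PHI0 / (2*pi*tau0*Iu)
TAU0 = PHI0 / (2.0 * math.pi * I_U * R_U)  # seconds

# Table 2: (pulse delay, delay sensitivity, jitter), all in units of tau0
TABLE2 = {
    "JTL (1 stage)": (4.0, 5.9, 0.065),
    "Splitter": (9.5, 7.0, 0.11),
    "Merger": (9.0, 9.2, 0.17),
    "Transmitter": (3.7, 3.2, 0.09),
    "Receiver": (6.0, 7.6, 0.11),
}

def to_ps(value_in_tau0):
    """Convert a time given in units of tau0 to picoseconds."""
    return value_in_tau0 * TAU0 * 1e12

def jtl_chain(n_stages):
    """Delay and r.m.s. jitter (ps) of n JTL stages in series.

    ASSUMPTION: stage jitters are independent, so they add in quadrature.
    """
    delay, _, jitter = TABLE2["JTL (1 stage)"]
    return to_ps(n_stages * delay), to_ps(math.sqrt(n_stages) * jitter)

print(f"tau0 = {to_ps(1.0):.3f} ps")
for name, (delay, _, jitter) in TABLE2.items():
    print(f"{name:14s} delay {to_ps(delay):5.2f} ps, jitter {to_ps(jitter):.3f} ps")
```

For example, jtl_chain(4) returns the delay and jitter of the 4-stage fragment shown in Fig. 5a under these assumptions.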

3.2. Latches

In contrast to the asynchronous devices, which work very similarly to the usual combinational logic in semiconductor digital electronics, clocked devices have internal memory and should be formally treated as finite state machines.⁵⁷ We will start the review of these devices with the simplest species, latches (flip-flops).


3.2.1. D flip-flop

Figure 10 shows the simplest RSFQ latch, the D flip-flop,⁵⁰ built around a quantizing loop J1-L2-L3-J2 which may be in either of two equilibrium flux states, 0 and 1. Since L2 > L3, in the initial state 0 the dc supply current I0 flows to the ground mostly through junction J2, creating a sub-critical Josephson phase drop φ2 ≈ π/3 across this junction, while the phase drop across J1 is small. This is why an SFQ pulse arriving from the data terminal D (passing through a buffer stage L7-J5-L5 which is similar to one JTL stage and provides impedance matching) switches junction J2, inserting the extra external flux Φ0 into the quantizing loop. However, since the loop inductance is large (L2IC1 ≈ Φ0), the resulting clockwise current ΔI ≈ Φ0/(LJ1 + L2 + L3 + LJ2) ≲ Φ0/L2 is insufficient to bring the total current through J1 above its critical current value IC1. Hence the circuit now remains in its other stationary flux state, 1. In this state the persistent current ΔI circulates in the quantizing loop clockwise; in J2 it subtracts from the initial dc bias current, making this junction almost unbiased (φ2 << π/2). On the contrary, in J1 the persistent current adds to the initial dc bias, creating a sub-critical phase drop φ1 ≈ π/3. This is why an SFQ pulse arriving at the clock input C (via a buffer stage L6-J6-L4) switches junction J1 rather than J2. As a result of this switching, an output SFQ pulse is formed across junction J1, while the flip-flop returns to its initial flux state 0. The buffer stage L1-J0-L0 passes the generated SFQ pulse to the output terminal Q. As a bottom line in the magnetic language, a flux quantum incident from input D enters the quantizing loop and is stored there until it is released by the clock pulse and is free to propagate to the next RSFQ circuit.
Fig. 10. D flip-flop. Nominal parameters: IC0 = 276 μA, IC1 = 268 μA, IC2 = 269 μA, IC5 = IC6 = 250 μA, I0 = 250 μA, I1 = 128 μA, I2 = 175 μA, I3 = 220 μA, L0 = L6 = L7 = 1.97 pH, L1 = 3.08 pH, L2 = 6.58 pH, L3 = 1.32 pH, L5 = 0.66 pH. Here and below, line indicates a large (quantizing) inductance.

Fig. 11. D2 flip-flop. Nominal parameters: IC1 = IC2 = IC3 = IC5 = IC7 = IC8 = IC9 = 250 μA, IC4 = IC6 = 259 μA, I0 = I2 = 161 μA, I2 = 296 μA, L0 = L5 = L9 = 1.97 pH, L1 = L10 = 1.31 pH, L2 = L11 = 1.47 pH, L6 = 1.71 pH, L7 = 2.52 pH.

Notice that the clock pulse, generated by the buffer junction J6, is applied to junctions J3 and J1 in series. In the regular operation cycle described above, J3 is dc biased less than J1 and is not switched. However, if the clock pulse arrives when the flip-flop is in its 0 state (which happens if no data pulse arrives between two sequential clock pulses), J1 is only weakly biased, and the clock switches junction J3 instead, without any consequence for the quantizing loop. In the magnetic language, the clock pulse flux quantum drops out of the circuit across junction J3.
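At the pulse level, the D flip-flop described above is a two-state machine. The following Python sketch models only this logical behavior, not the junction dynamics; the class and method names are ours, and the handling of a second data pulse arriving within the same clock period (ignored here) is an assumption, since the text does not describe that case.

```python
class DFlipFlop:
    """Pulse-level (behavioral) model of the RSFQ D flip-flop of Fig. 10."""

    def __init__(self):
        self.state = 0  # flux state of the quantizing loop: 0 or 1

    def data(self):
        """Data pulse at D: J2 switches and a flux quantum enters the loop."""
        # ASSUMPTION: a data pulse arriving in state 1 leaves the state unchanged.
        self.state = 1

    def clock(self):
        """Clock pulse at C: return True if an SFQ pulse is sent to output Q."""
        if self.state == 1:
            self.state = 0  # J1 switches: output pulse, loop returns to state 0
            return True
        return False        # J3 switches: the clock flux quantum drops out


def run(bits):
    """Feed one optional data pulse per clock period and collect the outputs."""
    ff = DFlipFlop()
    outputs = []
    for bit in bits:
        if bit:
            ff.data()
        outputs.append(int(ff.clock()))
    return outputs


print(run([1, 0, 1, 1, 0]))
```

In this idealized model each output bit simply repeats the data bit of the same clock period.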


3.2.2. D2 flip-flop

Figure 11 shows a more complex latch called the D2 flip-flop.⁵⁸ This device still has only one quantizing loop, formed by junction J5, inductances L6 and L7 in series, and a parallel connection of two junction pairs, J4 and J1, and J6 and J9. This loop may reside in one of two flux states, 0 and 1. As in the D flip-flop, in the initial 0 state the input junction J5 is sub-critically biased, so that the SFQ pulse arriving from data input A switches this junction, and the whole quantizing loop, into the opposite flux state 1. In this state, branches J4-J1 and J6-J9 carry sub-critical currents (in Fig. 11, directed up for junctions J4 and J1, and down for junctions J6 and J9). Junction J3 protects the quantizing loop from the effect of a possible second data pulse during the same clock cycle, as J3 does with the extra clock pulse in the D flip-flop described above (Fig. 10).

If now a clock SFQ pulse arrives, e.g., from input T0, it passes the buffer stage L0-J0-L1 and then switches junction J1. (If the flip-flop were in state 0, junction J6 would be unbiased, and J2 would be switched instead, dropping the input flux quantum out of the circuit.) As a result of this switching, an SFQ pulse is sent to output terminal Q0, and the effective flux applied to the non-quantizing loop is increased by Φ0. Simultaneously, the current through junction J6 increases beyond IC6, and it is switched, completing the transient process. (Notice that the whole process of the four-junction loop switching is similar to that in the merger, which was discussed in Sec. 3.1.2 above; see Fig. 7.) If the clock pulse arrives from T1 rather than from T0, the transient process is similar, leading to the sequential switching of junctions J8 and J9, with the output SFQ pulse sent to Q1 and the quantizing loop returned to its initial state 0. Thus the D2 flip-flop sends the trapped data (if it has arrived after the previous clock pulse) to the output corresponding to the clock input.

3.2.3. TRS flip-flop

This device (Fig. 12) is one more truncation of the B flip-flop.⁶⁰ Its quantizing loop is formed by inductances L7 and L8 and branches of two non-quantizing four-junction loops (J1-J6 in parallel with J8-J11, and J3-J7 in parallel with J9-J14). Each loop operates like that in the merger (Fig. 7) or the D2 flip-flop (Fig. 11). For example, if an SFQ pulse arrives from terminal S1, the right non-quantizing loop lets a flux quantum into the quantizing loop by switching junctions J3 and then J9. The reset may be achieved by applying a pulse to input S0; this leads to the flux quantum extraction through the left non-quantizing loop by successive switching of J1 and J8. (The former switching also leads to the formation of an SFQ pulse at the destructive output terminal QD.) Another way to switch this flip-flop is to feed its toggle input T with a pulse; in this case the quantizing loop always switches to the state opposite to the one it had before the pulse. Every second toggle leads to the generation of an output pulse at output Q.

A slight modification of the device (the addition of one more output) turns it into a so-called T1 flip-flop, which may be quite useful for application in decimation filters⁶¹ and as the main component of a single-bit full adder.⁶⁰ In the FLUX project we, however, use the TRS flip-flop only as a basic cell of the clock controller (similar to that described in Ref. 62, but with the additional option of running a pre-set number of cycles).


Fig. 12. TRS flip-flop. Nominal parameters: IC0 = 264 μA, IC1 = 228 μA, IC2 = 250 μA, IC3 = 216 μA, IC4 = 375 μA, IC5 = 266 μA, IC6 = 210 μA, IC7 = 126 μA, IC8 = 276 μA, IC9 = 146 μA, IC10 = 280 μA, IC11 = IC13 = 250 μA, IC12 = 226 μA, IC14 = 229 μA, IC15 = 280 μA, I0 = 219 μA, I1 = 156 μA, I2 = 293 μA, I3 = 130 μA, I4 = I6 = 125 μA, I5 = 340 μA, I7 = 293 μA, L0 = L3 = L6 = L9 = L14 = 1.98 pH, L1 = 1.24 pH, L2 = 5.28 pH, L4 = 5.97 pH, L5 = 1.76 pH, L7 = 3.16 pH, L8 = 1.13 pH, L10 = 2.43 pH, L11 = 1.08 pH, L12 = 1.10 pH, L13 = 1.13 pH.

3.2.4. NDRO memory cell

Any latch is essentially a single-bit memory cell, but in the flip-flops considered above the information readout is always destructive. Figure 13 shows a more complex cell enabling non-destructive readout (NDRO) of the stored bit.⁶³ The cell is built around the quantizing loop J2-L2-L3-J9, which is switched from 0 to 1 by a pulse from input SET1 and switched back by a pulse from SET0, similarly to the D flip-flop discussed above. (The latter process yields an output pulse at the auxiliary output Q0.) The NDRO readout is enabled by an additional circuit including a series connection of two additional Josephson junctions, J3 and J7, which is nested on the quantizing loop. If the loop is in state 0, the Josephson phase φn at the nesting point (between inductances L2 and L3) is small, and thus all the junctions of the string have small phase drops and hence carry little supercurrent. (An additional dc current I5 makes the phase drop across J3 slightly negative.) As a result, if an SFQ pulse arrives at terminal RD and is applied to junctions J5 and J7 in series, the former junction is switched, and no output signal is developed (as required for the function READ 0). However, if the quantizing loop has been switched into flux state 1, the Josephson phase drop across J9 is close to 2π. As a result, phase φn (which is close to the mean of φ2 and φ9) is close to π. This phase drop is divided between junctions J3 and J7, so that these two junctions are now sub-critically biased. As a result, the pulse from RD switches J7 rather than J5, developing an SFQ pulse at the NDRO output Q. The transient is completed by switching J3 in the opposite direction, so that the final value of φn, and hence the quantizing loop state, is not affected by the NDRO process.
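The logical behavior of the NDRO cell can be summarized by a small pulse-level model. This Python sketch is illustrative and the names are ours; it assumes that redundant pulses (SET1 in state 1, SET0 in state 0) are absorbed without changing the state and without producing a Q0 pulse, which the text does not spell out.

```python
class NDROCell:
    """Pulse-level (behavioral) model of the NDRO memory cell of Fig. 13."""

    def __init__(self):
        self.state = 0  # flux state of the quantizing loop J2-L2-L3-J9

    def set1(self):
        """Pulse at SET1: store binary 1."""
        # ASSUMPTION: a repeated SET1 pulse has no further effect.
        self.state = 1

    def set0(self):
        """Pulse at SET0: reset to 0; return True if a pulse appears at Q0."""
        # ASSUMPTION: the auxiliary output Q0 fires only if a 1 was stored.
        had_one = self.state == 1
        self.state = 0
        return had_one

    def read(self):
        """Pulse at RD: return True if a pulse appears at Q; state is kept."""
        return self.state == 1


cell = NDROCell()
cell.set1()
reads = [cell.read() for _ in range(3)]  # repeated non-destructive readout
print(reads, cell.state)
```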


Fig. 13. NDRO memory cell. Nominal parameters: IC0 = 235 μA, IC1 = 291 μA, IC2 = 259 μA, IC3 = 126 μA, I5 = 350 μA, IC6 = 318 μA, IC7 = 235 μA, IC8 = 303 μA, IC9 = 375 μA, I0 = 134 μA, I1 = 129 μA, I2 = I6 = 127 μA, I3 = 140 μA, I4 = 251 μA, I5 = 163 μA, L0 = 1.52 pH, L1 = L5 = L7 = L10 = 1.98 pH, L2 = 2.68 pH, L3 = 1.68 pH, L4 = 4.31 pH, L6 = 1.08 pH, L8 = 2.65 pH, L9 =

3.3. Clocked gates

3.3.1. Standard RSFQ logic

Before discussing particular logic gates, a signaling protocol in RSFQ circuits should be clearly defined. It should differ from that of the ordinary combinational logic accepted in semiconductor electronics, because of two inter-related factors:
- the "return-to-zero" nature of SFQ pulses, and
- the natural internal memory of quantizing SFQ loops.
Most RSFQ circuits implemented so far have been based on the standard RSFQ protocol⁴⁹ illustrated schematically in Fig. 14a.

[Fig. 14 labels: (a) signals D1, D2, …, DN, C and Q versus time, with the intervals τCD, τDD and τDC marked within one clock period; (b) a gate with data inputs D1, D2, …, DN, clock input C and output Q.]

Fig. 14. (a) The standard RSFQ protocol and (b) a typical clocked gate. Timing parameters shown in (a) are discussed in detail in the text below.

In this system, a signal in a data line is treated as binary 1 if it carries an SFQ pulse within the given clock period (see signal D1). On the contrary, the absence of a pulse during this time interval (see signal D2) is understood as binary 0. More generally, any RSFQ circuit using the standard protocol may be considered as a connection of asynchronous components and clocked gates ("elementary cells"⁸ or logic latches).⁶⁴ Such a gate has a few (typically two) internal states and functionally may be considered as an explicit or implicit integration of combinational logic and a latch. Input SFQ pulses change the state of the latch, which stores this information until the arrival of the clock pulse. This pulse triggers the output signal(s) and resets the cell into its initial state. For example, within this protocol, the D flip-flop (see Sec. 3.2.1 above) may be called a clocked YES gate. Let us discuss the implementation of a few other logic functions.

3.3.2. Inverter

The RSFQ inverter⁸,⁶⁹ (Fig. 15) is built around a quantizing loop J2-L3-J3-L2 which may be set (0→1 switched) by a data pulse arriving from D, very much as in the devices described above. However, the quantizing loop is now not directly connected to the common ground, but is separated from it by an additional junction (J1). As a result, when a clock pulse arrives from terminal C, it is applied to J2 and J1 in series. If the quantizing loop is in its flux state 1, junction J2 is sub-critically biased, and the clock switches it, producing an SFQ pulse across it, but no pulse at the gate output Q. Such an output pulse only appears if, by the arrival of the clock pulse, the quantizing loop was in its 0 state, i.e., if no data pulse(s) has arrived at the device input since the previous clock pulse (which has reset the loop into the 0 state).
Fig. 15. Clocked inverter. Nominal parameters: IC0 = 295 μA, IC1 = 268 μA, IC2 = 235 μA, IC3 = 141 μA, IC4 = 129 μA, IC5 = 269 μA, IC6 = 248 μA, IC7 = 146 μA, I0 = 125 μA, I1 = 130 μA, I2 = 191 μA, I3 = 126 μA, I4 = 215 μA, L0 = L7 = L8 = 1.97 pH, L1 = 3.08 pH, L2 = 0.58 pH, L3 = 6.34 pH, L4 = 1.66 pH, L5 = 1.63 pH, L6 = 0.66 pH.

3.3.3. XOR gate

Figure 16 shows a clocked XOR gate.⁸,⁷⁰,⁷¹ This device is very similar to the merger (Fig. 7), but the main loop (including junctions J6 and J5, inductance L7, and two parallel branches L6-J3-L4-J2 and L11-J10-L14-J9) is now quantizing. If the loop has been reset to its initial state 0, junctions J2 and J9 are both sub-critically biased, so that an SFQ pulse incident from either input A or input B switches the loop to the opposite flux state 1 (e.g., in the case of a pulse from A, by sequentially switching junctions J2 and J10, exactly as was described in Sec. 3.1.2). This switching adds the persistent current ΔI to the dc current flowing through junction J6, and biases it sub-critically. If only one data pulse has arrived before the arrival of the clock pulse C, the latter pulse switches J6 rather than J4, forming an output SFQ pulse at terminal Q. However, if both pulses A and B arrive before C, the second of these pulses increases the persistent current in the quantizing loop beyond the critical current of junction J5, and this junction switches, letting the extra flux out of the quantizing loop. As a result, the gate returns to state 0, with junction J6 virtually unbiased, so that the clock pulse switches J4 rather than J6, and no output pulse is formed at output Q. The same happens if no data pulses have arrived during the clock period, so that the full truth table of the XOR function is faithfully implemented.
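The XOR behavior described above can be expressed as a two-state machine working under the standard RSFQ protocol. The Python sketch below is a pulse-level illustration only; the names are ours, and the reset of the loop by the clock pulse is taken from the general protocol description of Sec. 3.3.1 rather than from a detailed analysis of the circuit.

```python
from itertools import product


class ClockedXor:
    """Pulse-level (behavioral) model of the clocked XOR gate of Fig. 16."""

    def __init__(self):
        self.state = 0  # flux state of the quantizing loop

    def _data_pulse(self):
        # First pulse: loop goes 0 -> 1. Second pulse: J5 switches and lets
        # the extra flux out, so the loop returns to 0.
        self.state ^= 1

    def pulse_a(self):
        self._data_pulse()

    def pulse_b(self):
        self._data_pulse()

    def clock(self):
        """Clock pulse: return True if an SFQ pulse appears at output Q."""
        fired = self.state == 1  # J6 switches in state 1, J4 in state 0
        self.state = 0           # ASSUMPTION: the clock resets the cell
        return fired


def truth_table():
    """Evaluate the gate for all input combinations within one clock period."""
    table = {}
    for a, b in product((0, 1), repeat=2):
        gate = ClockedXor()
        if a:
            gate.pulse_a()
        if b:
            gate.pulse_b()
        table[(a, b)] = int(gate.clock())
    return table


print(truth_table())
```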
Fig. 16. Clocked XOR. Nominal parameters: IC0 = IC1 = IC7 = IC8 = 250 μA, IC2 = IC9 = 271 μA, IC3 = IC10 = 293 μA, IC4 = 210 μA, IC5 = 214 μA, IC6 = 255 μA, I0 = I4 = 206 μA, I1 = I5 = 119 μA, I2 = 255 μA, I3 = 236 μA, L0 = L2 = L10 = L12 = 1.98 pH, L1 = 0.29 pH, L3 = L13 = 3.98 pH, L4 = L14 = 1.43 pH, L5 = 1.47 pH, L6 = L11 = 4.23 pH, L7 = 0.53 pH, L8 = 4.02 pH, L9 = 1.33 pH.

3.3.4. AND gate

Finally, Figure 17 shows a clocked AND gate.⁷⁰ Its inputs are fed into two similar D flip-flops (cf. Fig. 10) that operate exactly as was described in Sec. 3.2 above. The clock pulse passes through the buffer stage L4-J5-L6 and is applied simultaneously to the output junctions of both D flip-flops. If both data bits A and B have arrived by that time, junctions J1 and J11 switch simultaneously and provide a current large enough to force the switching of junction J8 and the formation of the output pulse, first across that junction and then, after passing the buffer stage L8-L9-J9-L10, at output terminal Q. On the other hand, if only one input pulse (say, A) has arrived before the clock, the additional current through J8 is not sufficient for switching that junction, and J4 is switched instead, letting the additional flux quantum out of the circuit, with no output pulse formation. Finally, if no data pulse has arrived during the given clock period, neither of junctions J1, J11 is switched (J3 and J6 are switched instead by the clock pulse), and no output pulse is formed either.

3.3.5. Clocked component characteristics

As evident from Fig. 14, clocked flip-flops and logic gates cannot be fully characterized (as asynchronous components can) by just the delay τD between the input (in this case clock) pulse and the output pulse; at least three more time constants have to be included:
- the minimum value of the interval τDC between the last of the data pulses (DN in Fig. 14a) and the next clock pulse, at which the device operates correctly,
- the minimum value of the interval τCD between the clock pulse and the first data pulse (D1 in Fig. 14a), and
- the minimum value(s) of the interval(s) τDD between the data pulses.
(Notice that in all single-input-bit gates, such as the inverter, the data-to-data interval is not defined. Moreover, some two-input-bit gates, like the AND and XOR discussed above, may operate at an arbitrary data-to-data interval τDD. In these cases, in all the forthcoming formulas (τDD)min should be set to 0.)
Fig. 17. Clocked AND. Nominal parameters: IC0 = IC11 = 199 μA, IC1 = IC12 = 266 μA, IC2 = IC10 = 234 μA, IC3 = IC6 = 231 μA, IC4 = IC7 = IC8 = 190 μA, IC5 = IC9 = 250 μA, I0 = I3 = 171 μA, I1 = 278 μA, I2 = 238 μA, L0 = L13 = 1.37 pH, L1 = L14 = 1.00 pH, L2 = L15 = 4.34 pH, L3 = L6 = L11 = 0.72 pH, L3 = L12 = 0.99 pH, L4 = 2.03 pH, L5 = 0.62 pH, L7 = 0.69 pH, L8 = 2.25 pH, L9 = 0.80 pH, L10 = 1.88 pH.

Table 3 shows the timing parameters for the clocked devices discussed above. Since the sum τCD + τDD + τDC defines the clock period, its minimum value determines the maximum clock frequency

$(f_c)_{\max} = 1/(\tau_{CD} + \tau_{DD} + \tau_{DC})_{\min}$    (26)

of the gate in the absence of fluctuations.

Table 3. Basic speed parameters of RSFQ clocked devices (flip-flops and logic gates) in units of τ0. Notice that in some gates, including AND and XOR, the data-to-data interval τDD may be arbitrarily small, while in single-bit circuits like the inverter and simple flip-flops, this interval is not defined. For the NDRO cell, this parameter depends on the data signal order; the first value is for SET preceding RESET.

Device          τD      δ       δt       (τDC)min   (τCD)min   (τDD)min
D flip-flop     11.0    11.5    0.11     10         11         n/a
D2 flip-flop    14.1    12.0    0.15     14         14         0
NDRO cell       14.5    15.3    0.12     26         38         17/26
Inverter        16.0    15.1    0.19     12         17         n/a
XOR             14.4    13.0    0.14     12         12         0
AND             22.7    14.0    0.145    0          27         0
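Equation (26) can be applied directly to the entries of Table 3. The following Python sketch is a rough illustration for isolated gates without fluctuations: it assumes that τ0 follows from Ru = Φ0/(2π τ0 Iu) with Ru = 3.8 Ω and Iu = 125 μA, and it sets an undefined data-to-data interval to zero as prescribed in the text. The dictionary and function names are ours.

```python
import math

PHI0 = 2.067833848e-15  # magnetic flux quantum, Wb
I_U = 125e-6            # current unit Iu, A
R_U = 3.8               # resistance unit Ru, Ohm

# ASSUMPTION: tau0 is recovered from Ru = PHI0 / (2*pi*tau0*Iu)
TAU0 = PHI0 / (2.0 * math.pi * I_U * R_U)  # seconds

# Table 3: minimum intervals (tau_DC, tau_CD, tau_DD) in units of tau0;
# None marks a data-to-data interval that is not defined for the device.
MIN_INTERVALS = {
    "D flip-flop": (10, 11, None),
    "D2 flip-flop": (14, 14, 0),
    "NDRO cell (SET before RESET)": (26, 38, 17),
    "NDRO cell (RESET before SET)": (26, 38, 26),
    "Inverter": (12, 17, None),
    "XOR": (12, 12, 0),
    "AND": (0, 27, 0),
}

def max_clock_ghz(tau_dc, tau_cd, tau_dd):
    """Maximum clock frequency of Eq. (26), in GHz, without fluctuations."""
    tau_dd = 0 if tau_dd is None else tau_dd  # undefined interval counts as 0
    period_s = (tau_dc + tau_cd + tau_dd) * TAU0
    return 1e-9 / period_s

for device, intervals in MIN_INTERVALS.items():
    print(f"{device:30s} {max_clock_ghz(*intervals):6.1f} GHz")
```

These figures characterize individual gates only; jitter, fabrication spreads and clock distribution, discussed in the text, reduce the usable clock frequency.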

The main effect of thermal fluctuations If(t) is the finite probability pS of decision errors occurring at the moment of SFQ pulse arrival at the cell, in addition to the storage errors which occur during the passive waiting time.⁵⁵ (The latter errors are characterized by a rate; see Eq. (21) and its discussion.) A unified description of both types of errors may be achieved in terms of degradation of RSFQ device noise margins.⁵⁵ Even if fluctuation-induced errors are negligible, an RSFQ cell operates correctly only within a certain range of each parameter, including notably the Josephson junction critical currents, which are the most sensitive parameters in RSFQ circuit fabrication technologies. In the device set described in this paper, the noise margins are at least 35% if each IC is varied individually and about 30% if all critical currents are changed simultaneously and proportionally.⁷² As the clock frequency approaches a certain maximum value, the margins shrink (see solid lines in Fig. 18).
[Fig. 18 axes: IC (vertical, with the nominal value marked) versus clock frequency (horizontal, from 0 to (fc)max); the boundaries are labeled decision errors, timing errors and storage errors.]

Fig. 18. Typical operation window of an RSFQ circuit (schematically). Solid lines: boundaries in the absence of fluctuations; dotted lines: levels of a fixed bit error rate. Dashed arrows explain the definition of the maximal clock frequency for a particular choice of ICi (and other parameters) in the presence of fluctuations.

For a typical RSFQ device and a given bit error rate, storage errors decrease the parameter margins region slightly from one side, while decision errors cause a considerably larger degradation of the operation region from the other side (Fig. 18). For fc ≲ (fc)max/2 and a small deviation ΔI of IC from the deterministic boundary of the operation region, the decision error probability does not depend on the clock frequency and may be described by Gaussian statistics:

$dp/dI_C = (2\pi)^{-1/2} (\delta I_D)^{-1} \exp\{-(\Delta I)^2 / 2(\delta I_D)^2\}$,    (27a)

i.e.

$p = (1/2)\,[1 - \mathrm{erf}(\Delta I / \sqrt{2}\,\delta I_D)]$,    (27b)

with a frequency-independent r.m.s. fluctuation δID. An analytical theory for the gray zone width δID due to thermal and quantum fluctuations has been developed⁷³ and confirmed experimentally⁷⁴,⁷⁵ only for a simple circuit (a balanced comparator) which may serve as a model of the RSFQ decision-making component. Nevertheless, this model is in semi-quantitative agreement with the results of numerical modeling⁷⁶,⁷⁷ and experimental studies⁵⁵,⁷⁸⁻⁸¹ of various gates. It shows that for thermal fluctuations

$\delta I_D \approx (32\alpha/\pi)^{1/4}\, I_T^{1/2} I_C^{1/2}$,    (27c)

where α is a dimensionless parameter depending on the SFQ pulse shape. (For the usual RSFQ design style described in the above device examples, α ≈ 0.2, but if necessary this parameter may be reduced to αmin ≈ fcτ0/2 using additional shunting of some Josephson junctions.) For niobium-based implementations of RSFQ logic, the typical values of IC are close to 150 μA (see Figs. 5-17 above), so that for operation at the liquid helium temperature T = 4.2 K (IT ≈ 0.17 μA) and α ≈ 0.2, δID ≈ 5 μA, so that in the middle of the parameter range (ΔI/IC ≈ 35%) the decision error probability at low frequencies given by Eq. (27) is reasonably low (p ≈ 10⁻²⁰). Nevertheless, any further decrease of critical currents or increase of operation temperature makes the decision error rate unacceptably high for most digital applications. (This explains, in particular, our choice of the current unit Iu = 125 μA, which is essentially the lowest value of critical current we use in our designs.) In particular, this effect excludes operation of RSFQ circuits based on high-temperature superconductors at temperatures above ~10 K; for details, see Sec. 4.5 of Ref. 9.

The bit error rate grows as the clock frequency approaches its maximum deterministic value fmax. In this case the decision error probability may be expressed as the sum of those due to violations of each of the critical intervals shown in Fig. 14a:

$p = p_{DC} + p_{CD} + p_{DD} = \sum_{i = DC,\,CD,\,DD} (1/2)\,[1 - \mathrm{erf}(\Delta_i / \sqrt{2}\,\delta t_i)]$,    (28)

where Δi are the timing noise margins

$\Delta_i \equiv \tau_i - (\tau_i)_{\min}$.    (29)
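The thermal decision-error estimate of Eqs. (27b,c) can be reproduced numerically. The Python sketch below is illustrative and rests on stated assumptions: the thermal current is taken as IT = 2π kB T/Φ0 (this definition is not given in this section), Eq. (27c) is read as δID ≈ (32α/π)^(1/4) (IT IC)^(1/2), and all function names are ours. Because the Gaussian tail is extremely steep, the result should be read as an order-of-magnitude guide only: rounding the gray zone width from about 6 μA to 5 μA changes p by many orders of magnitude.

```python
import math

PHI0 = 2.067833848e-15  # magnetic flux quantum, Wb
K_B = 1.380649e-23      # Boltzmann constant, J/K

def thermal_current(temperature):
    """ASSUMPTION: thermal current unit IT = 2*pi*kB*T/PHI0, in amperes."""
    return 2.0 * math.pi * K_B * temperature / PHI0

def gray_zone(alpha, i_t, i_c):
    """Gray zone width, as Eq. (27c) is read in the lead-in, in amperes."""
    return (32.0 * alpha / math.pi) ** 0.25 * math.sqrt(i_t * i_c)

def decision_error(delta_i, gray):
    """Decision error probability of Eq. (27b); erfc keeps precision for tiny p."""
    return 0.5 * math.erfc(delta_i / (math.sqrt(2.0) * gray))

I_C = 150e-6          # typical critical current, A
ALPHA = 0.2           # pulse-shape parameter quoted in the text
MARGIN = 0.35 * I_C   # deviation from the boundary, middle of the range

i_t = thermal_current(4.2)
gray = gray_zone(ALPHA, i_t, I_C)
p_formula = decision_error(MARGIN, gray)   # with the computed gray zone
p_rounded = decision_error(MARGIN, 5e-6)   # with the rounded value of 5 uA
p_10k = decision_error(MARGIN, gray_zone(ALPHA, thermal_current(10.0), I_C))

print(f"IT(4.2 K)  = {i_t * 1e6:.3f} uA")
print(f"gray zone  = {gray * 1e6:.1f} uA")
print(f"p (formula)= {p_formula:.1e}")
print(f"p (5 uA)   = {p_rounded:.1e}")
print(f"p (10 K)   = {p_10k:.1e}")
```

The last line illustrates how quickly the error probability grows with temperature at fixed critical current.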

Parameters δti in Eq. (28) have the physical sense of the r.m.s. jitter of the time intervals between the pulses arriving at the decision-making part of the RSFQ gate: last-datum-to-clock, clock-to-first-datum, and first-datum-to-last-datum, respectively. Equation (28) justifies the special name "timing errors"⁵⁵ for decision errors in this region. If an RSFQ circuit should operate as fast as possible, timing errors become the major factor limiting the circuit parameter margins. With the usual requirement of a very low bit error rate, Eq. (28) can be simplified using the well-known asymptotic expansion for the reciprocal error function: if p = (1/2)[1 − erf(x/√2)] and p << 1, then

$x \approx x_a(p) \equiv \{2 \ln [2\pi^{1/2}\, p\, \ln^{1/2}(p^{-1})]^{-1}\}^{1/2}$.    (30)
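The asymptotic expression of Eq. (30) can be compared with a direct numerical inversion of p = (1/2)[1 − erf(x/√2)]. The Python sketch below is illustrative: it reads Eq. (30) as xa(p) = {2 ln[2π^(1/2) p ln^(1/2)(1/p)]⁻¹}^(1/2), which is an assumption about the garbled original, uses only the standard library, and the function names are ours.

```python
import math

def tail_probability(x):
    """p = (1/2)[1 - erf(x/sqrt(2))], via erfc to keep precision for tiny p."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def x_exact(p, lo=0.0, hi=40.0):
    """Invert tail_probability by bisection (it decreases monotonically in x)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if tail_probability(mid) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def x_asymptotic(p):
    """Asymptotic estimate x_a(p) for p << 1, in the form assumed in the lead-in."""
    log_inv_p = math.log(1.0 / p)
    argument = 2.0 * math.sqrt(math.pi) * p * math.sqrt(log_inv_p)
    return math.sqrt(2.0 * math.log(1.0 / argument))

for p in (1e-23, 0.2 / 5000, 0.2 / 5e6):
    print(f"p = {p:.1e}: exact x = {x_exact(p):.3f}, asymptotic x = {x_asymptotic(p):.3f}")
print(f"p at x = 11: {tail_probability(11.0):.1e}")
```

The second and third arguments correspond to the ratio q/N for a yield of 80% on chips with 5,000 and 5 million gates, the cases considered later in this section.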

For a modest value $p = 10^{-23}$ (corresponding to a 6-month average interval between errors of a 100-thousand-gate chip), $x_a(p) \approx 10$, while even a slight increase of this number, to say $x_a(p) = 11$, gives a much lower error rate $p \approx 2\times10^{-28}$, corresponding, e.g., to a similar reliability of a large, 5-billion-gate computer, even without any circuit redundancy.

Noise margins may be degraded further by random deviations of the real circuit parameters (notably of critical currents $I_C$) from the optimal values, due to fabrication process imperfections. In contrast to the fluctuations discussed above, these deviations $\delta I_C$ do not change in time and rarely follow the Gaussian distribution (27a,b) exactly. This distribution may be used, however, for a crude estimate of these effects. Assuming that the critical current deviations are independent, and using the definition (25) of the sensitivity parameter $\alpha$, the r.m.s. (time-independent) variation of a circuit component's time delay may be presented as

$$\delta t = \alpha\,\delta I_C/I_C, \qquad (31)$$

where $\delta I_C$ is the r.m.s. critical current spread. The maximum deviation on a chip with $N$ similar components, which needs to be fabricated with a yield of $(1 - q)$, may be estimated using Eq. (30): $\delta t_{max} = x_a(q/N)\,\alpha\,\delta I_C/I_C$. For example, if the desired fabrication yield is 80% ($q$ = 0.2), for a 5,000-gate chip we get $x_a(q/N) \approx 4$, while for a 5-million-gate chip, $x_a(q/N) \approx 5$. Assuming a similar rate $p$ for all three types of timing errors, we may now write the following requirement for the noise margins:

$$\delta_{DC} > x_a(p/3)\,\delta t_{DC} + x_a(q/N)\,\alpha_{DC}\,\delta I_C/I_C,$$
$$\delta_{CD} > x_a(p/3)\,\delta t_{CD} + x_a(q/N)\,\alpha_{CD}\,\delta I_C/I_C, \qquad (32a)$$
$$\delta_{DD} > x_a(p/3)\,\delta t_{DD} + x_a(q/N)\,\alpha_{DD}\,\delta I_C/I_C, \qquad (32b)$$

so that the minimum clock period increases, in comparison with Eq. (26), by the sum of the right-hand parts of Eqs. (32). (Again, in all single-input-bit gates and some two-input-bit gates, the data-to-data interval $\tau_{DD}$ may be arbitrary. In all these cases, $p_{DD} = 0$ and $\delta_{DD} = 0$, so the restriction expressed by Eq. (32b) should be ignored, and the factors $p/3$ in Eq. (32a) replaced by $p/2$.)

Even for the imperfect present-day fabrication technologies, the relative r.m.s. spread $\delta I_C/I_C$ is as low as 1 to 2%. For this case, Table 3 and Eq. (32) show that noise margins are consumed more by the thermally-induced jitter (with $x_a(p/2) \approx 10$) than by the fabrication spreads (with $x_a(q/N) \approx 5$). Another conclusion which might be made from these data is that the pulse jitter and the fabrication-induced deviation introduced by clocked gates are both much smaller than $(\tau_{CD})_{min}$ and $(\tau_{DC})_{min}$, so that the clock frequency decrease enforced by timing errors seems to be relatively small. In practical circuits, however, a very substantial (and usually dominant) contribution to jitter is provided by asynchronous circuit components, in particular the clock distribution circuits. This issue will be discussed in Sec. 4.2 below.

3.4. I/O interface components

3.4.1. Input stage (DC/SFQ converter)

The single-junction circuit shown in Fig. 3a may serve as a rudimentary input stage ("DC/SFQ converter"), but its operation may be improved69 by using two additional Josephson junctions J0 and J2 in Fig. 19. Junction J0 allows the quantizing loop (L1-J0-J1) to be reset to its initial state 0 when the input current $I_{ex}(t)$ is ramped down, without disturbing junction J1 (which is left exclusively for positive switching events), while junction J2 together with inductors L1, L2 forms an output buffer stage, bringing the output impedance to the nominal value used in this particular logic set.

3.4.2. Output stage (SFQ/DC converter)

When the ultrafast processing of digital information in an RSFQ circuit is completed, the results may be transferred to the usual (non-return-to-zero) form using an "SFQ/DC converter" shown in Fig. 20. It is based on a 2-junction quantizing loop (J3-L4-L5-J5), similar to those used in other RSFQ flip-flops and logic gates. Input SFQ pulses, after passing a buffer stage (L0-J0-L1-L2), are fed into each arm of the loop and ensure toggling of its flux state (0 → 1 → 0 ...) by every pulse. (This part of the converter is essentially an RSFQ T flip-flop, a device similar to, but simpler than, the TRS flip-flop described in Sec. 3.2.)82
Fig. 19. DC/SFQ converter. IC1 = IC2 = 125 µA, IC3 = 162 µA, 0 < Iex(t) < 250 µA, I2 = 275 µA, L0 = 8.41 pH, L1 = L2 = 1.32 pH, L3 = L4 = 1.97 pH.

Fig. 20. SFQ/DC converter. IC0 = 212 µA, IC1 = 288 µA, IC2 = 156 µA, IC3 = 138 µA, IC4 = 125 µA, IC5 = 350 µA, IC6 = 163 µA, I0 = 106 µA, I1 = 150 µA, I2 = 181 µA, L0 = 1.98 pH, L1 = L2 = 0.66 pH, L3 = 0.79 pH, L4 = 1.58 pH, L5 = 2.89 pH.

Two readout junctions J4 and J6 are nested on the flip-flop very much like in the NDRO cell (see Fig. 13 and its discussion in Sec. 3.2), but now the bias current $I_2$ is higher. As a result, when the flip-flop is in state 1 (with $\Phi \approx \Phi_0$) and the Josephson phase at the nesting point is close to $\pi$, $I_2$ exceeds the maximum supercurrent which can be transferred from the current injection point. Hence junctions J4 and J6 have to carry part of the dc current in the form of normal current through their resistances $R$, providing a finite output dc voltage $V_{out} \approx I_2R/2$. On the contrary, when the flip-flop is in state 0, junctions J4 and J6 stay superconducting and $V_{out} \propto d\phi_4/dt = d\phi_6/dt = 0$.84 The resulting voltage signal, with a swing of several hundred microvolts, is sufficient to be transferred from the cryostat using a copper cable and amplified to the standard semiconductor transistor level by inexpensive room-temperature semiconductor amplifiers. Such a simple output interface can have a bandwidth of at least 100 megabits per second per channel.30 This rate may be increased to at least 1 gigabit per second, and quite possibly to ~10 Gbps, using an additional on-chip Josephson amplifier to a few-millivolt level. Such an amplifier may be based either on a latching (e.g., HUFFLE-type) circuit30 or a non-latching, multi-junction output stage.85-87


4. RSFQ Technology Development: Problems Real and Imaginary

4.1. Connectivity

There are at least three ways to connect RSFQ components:
- If the components may be laid out directly next to each other, they may be connected directly (if designed properly, see below).
- If the distance between two components is not negligible, they may be connected with active lines, JTLs (Fig. 5). These lines may be arbitrarily long, but introduce a considerable signal delay of about $4\tau_0$ per stage and substantial jitter (Table 2). For example, in the 1.75-µm technology,33 the signal speed in these lines is about 10 µm/ps. Besides that, these lines are relatively wide (for the just mentioned technology, about 25 µm) and require a substantial dc current supply and, as a rule, three metallization levels (including the ground plane, see Fig. 5).
- Due to the factors listed above, passive superconducting transmission lines (Fig. 1) should be used for all long-distance connections: practically, any connections longer than the combined transceiver length (for the 1.75-µm technology, about 50 µm). These lines feature a much faster signal propagation speed,

$$v \approx c\,[d/\varepsilon(d + 2\lambda)]^{1/2}, \qquad (33)$$

where $c$ is the speed of light, and $d$ and $\varepsilon$ are the insulation layer thickness and dielectric constant, respectively. In typical cases, $v$ is close to 100 µm/ps, i.e. an order of magnitude higher than in a JTL. Additional advantages of the passive line include the virtual absence of jitter (besides that of the transceiver circuits), relatively small width (down to ~10 µm in a 1.75-µm technology), and lower metallization layer consumption (down to two superconductor layers, besides the line crossing points). The drawback of these lines is the necessity to use transceivers (Figs. 8 and 9), which consume chip area of the order of that of a typical gate and introduce additional latency. Our recent design experience51 indicates the need for the integration of the transceivers with RSFQ gates, a task that is certainly doable but still has to be completed.

Regardless of the interconnect choice, RSFQ components should be designed in a way allowing their direct connection either to each other or to transmission line transceivers, without parameter re-optimization. All the circuits presented in this paper, which feature the necessary I/O buffer stages, do satisfy this important condition.

4.2. Timing and jitter

For any 100-GHz-scale digital technology, clock distribution and other timing issues are extremely important, since even a few-ps jitter in the data and/or clock distribution path may lead to an unacceptable bit error rate. This is why various clock distribution schemes for RSFQ circuits have been discussed in detail in several papers.8,56,88 Requirements for such schemes may be formulated using a very general sketch of an RSFQ circuit fragment shown in Fig. 21.
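Since picosecond-scale timing budgets depend directly on interconnect latency, the speed figures quoted in Sec. 4.1 can be illustrated with a short sketch of Eq. (33). The layer parameters below (a 0.2-µm insulator with dielectric constant 4 and a penetration depth of 0.09 µm) are assumed, illustrative values not given in the text, and the function names are hypothetical.

```python
import math

C_UM_PER_PS = 299.792458  # speed of light in vacuum, in µm/ps

def passive_line_speed(d_um, eps, lam_um):
    """Eq. (33): v = c * sqrt(d / (eps * (d + 2*lambda))), in µm/ps."""
    return C_UM_PER_PS * math.sqrt(d_um / (eps * (d_um + 2.0 * lam_um)))

def flight_time_ps(length_um, speed_um_per_ps):
    """Time of flight over a given interconnect length."""
    return length_um / speed_um_per_ps

# Assumed (illustrative) layer parameters, not taken from the paper.
v_passive = passive_line_speed(d_um=0.2, eps=4.0, lam_um=0.09)
JTL_SPEED = 10.0  # µm/ps, the JTL figure quoted for the 1.75-µm technology

if __name__ == "__main__":
    print("passive line speed, µm/ps:", v_passive)
    print("1 mm over passive line, ps:", flight_time_ps(1000.0, v_passive))
    print("1 mm over JTL, ps:", flight_time_ps(1000.0, JTL_SPEED))
```

With these assumed values the passive line comes out roughly ten times faster than the JTL, in line with the order-of-magnitude statement in Sec. 4.1; transceiver latency is not included in this estimate.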


Fig. 21. General scheme of timing of a two-bit RSFQ gate F, showing three racing loops (data-to-data, clock-to-data, and data-to-next-clock).

Let the two data bits D and D' be initially stored in clocked devices L and L'. (Each of them may be either just a latch or a logic gate.) Clock pulses, following with a period $f_c^{-1}$, originate (or are split) at some point C. After passing through generally different chains of asynchronous components (e.g., splitters) with time delays $t_i$ (Q stages, to gate F), $t'_i$ (P stages, to latch L'), and $t''_i$ (R stages, to latch L), each pulse triggers the latches L, L', and finally gate F. Signals from the latches generally also pass a few asynchronous components, with delays $\tau_i$ (N stages, from L) and $\tau'_i$ (M stages, from L'), before landing at gate F. At the ideal choice of time delays of each circuit, in the absence of fluctuations, these delays should be related as (Fig. 14a)

$$\sum_{i=1}^{R} t''_i + \sum_{i=1}^{N} \tau_i + (\tau_{DC})_{min} = \sum_{i=1}^{Q} t_i,$$
$$\sum_{i=1}^{Q} t_i + (\tau_{CD})_{min} = \sum_{i=1}^{P} t'_i + \sum_{i=1}^{M} \tau'_i + f_c^{-1}, \qquad (34)$$
$$\sum_{i=1}^{P} t'_i + \sum_{i=1}^{M} \tau'_i + (\tau_{DD})_{min} = \sum_{i=1}^{R} t''_i + \sum_{i=1}^{N} \tau_i.$$

(Generally, these sums should include the input and output delays of the clocked components; however, in the typical case when at least some of the asynchronous component numbers N, M, P, Q, and R are large, those contributions are minor.) In this case, the highest clock frequency given by Eq. (26) is achieved.

However, in the real world, the mutual jitter of signal and data pulses grows as they propagate through the circuit components. Assuming that the thermal fluctuations in the Josephson junctions of the circuit are independent (as estimates show they should be), for the full r.m.s. jitters we get

$$(\delta t_{DC})^2 = \sum_{i=1}^{R} (\delta t''_i)^2 + \sum_{i=1}^{N} (\delta\tau_i)^2 + \sum_{i=1}^{Q} (\delta t_i)^2,$$
$$(\delta t_{CD})^2 = \sum_{i=1}^{Q} (\delta t_i)^2 + \sum_{i=1}^{P} (\delta t'_i)^2 + \sum_{i=1}^{M} (\delta\tau'_i)^2, \qquad (35)$$
$$(\delta t_{DD})^2 = \sum_{i=1}^{P} (\delta t'_i)^2 + \sum_{i=1}^{M} (\delta\tau'_i)^2 + \sum_{i=1}^{R} (\delta t''_i)^2 + \sum_{i=1}^{N} (\delta\tau_i)^2.$$


We see that the jitter values scale crudely as $(R+N+Q)^{1/2}\delta t$, $(Q+P+M)^{1/2}\delta t$, and $(P+M+R+N)^{1/2}\delta t$ for the DC, CD, and DD intervals, respectively, where $\delta t \sim 0.1\tau_0$ is the average jitter of a typical asynchronous component (see Table 2). (These estimates are exact if all the asynchronous stages are similar.) A similar analysis is valid for the time-independent random variations of timing intervals introduced by imperfect fabrication (see Sec. 3.3 above). It means that the sensitivities $\alpha_{CD}$, $\alpha_{DC}$ (and possibly $\alpha_{DD}$) participating in Eq. (32) should be calculated as

$$\alpha_{DC}^2 = \sum_{i=1}^{R} \alpha_i^2 + \sum_{i=1}^{N} \alpha_i^2 + \sum_{i=1}^{Q} \alpha_i^2,$$
$$\alpha_{CD}^2 = \sum_{i=1}^{Q} \alpha_i^2 + \sum_{i=1}^{P} \alpha_i^2 + \sum_{i=1}^{M} \alpha_i^2, \qquad (36)$$
$$\alpha_{DD}^2 = \sum_{i=1}^{P} \alpha_i^2 + \sum_{i=1}^{M} \alpha_i^2 + \sum_{i=1}^{R} \alpha_i^2 + \sum_{i=1}^{N} \alpha_i^2,$$

where each sum runs over the components of the corresponding chain, and scale as $(R+N+Q)^{1/2}\alpha$, $(Q+P+M)^{1/2}\alpha$, and $(P+M+R+N)^{1/2}\alpha$, respectively, where $\alpha$ is the sensitivity parameter for each component, as defined by Eq. (25).

The jitter scaling imposes substantial restrictions on RSFQ circuit design. For example, an attempt to implement timing of a 64-bit-wide array of RSFQ gates performing a parallel calculation by $m$ = 64 sequential splitters of a master clock signal would lead to an r.m.s. jitter (relative to the clock source) of about $m^{1/2}(\delta t)_{splitter} = 8(\delta t)_{splitter} \approx 0.9\tau_0$. An almost similar jitter will accumulate in the 64 stages of logic (see the 3rd column of Table 3), so that the net r.m.s. jitter will be close to $2\tau_0$. Multiplying it by $x_a(p/2) = 10$, we see from Eq. (32) that this factor alone adds as much as ~$20\tau_0$, i.e., more than 2 typical gate delays, to the clock period. Such performance degradation may be unacceptable in many cases. This is why in our current project51 we use trees for clock distribution, thus replacing the factor $m^{1/2}$ in the total r.m.s. jitter by the much smaller factor $(\log_2 m)^{1/2}$.

Quality design of RSFQ circuits requires the use of Eqs. (32), (36) for a more exact calculation of the parameter margin degradation (and hence of the maximum clock frequency) than the simple estimate given above. So far, to our knowledge, such a calculation has been carried out for only a few circuits, including notably a pipelined parallel fixed-point adder.56 The results show, for example, that the requirement of a $10^{-25}$ bit error rate increases the minimum clock cycle from a noise-free value of $22\tau_0$ to as much as $46\tau_0$. An additional 1.5% spread of the Josephson junction currents (with a requirement of a high, 99% circuit fabrication yield) increases the period by an additional $8\tau_0$, and thus reduces the maximum clock frequency to about 30 GHz for a 1.75-µm technology or 60 GHz for a 0.8-µm technology.

We believe the maximum clock frequency may be similar for all RSFQ-based fixed-point and floating-point functional units. However, our preliminary estimates show51 that branch condition handling in general-purpose microprocessors may require somewhat longer clock periods (up to $100\tau_0$) unless innovative architecture solutions, taking into account the peculiarities of RSFQ logic, are used.
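The scaling arguments of this subsection can be put into a few lines of arithmetic. The sketch below is only illustrative: the function names are hypothetical, the splitter jitter is taken as $0.9\tau_0/8$ from the 64-splitter example, and the value of $\tau_0$ for the 1.75-µm technology is an assumption inferred from the quoted 30-GHz limit at a $(46 + 8)\tau_0$ clock period rather than a number stated in the text.

```python
import math

def chain_jitter(m, dt):
    """r.m.s. jitter after m similar, independent stages in series."""
    return math.sqrt(m) * dt

def tree_jitter(m, dt):
    """r.m.s. jitter at a leaf of a binary clock tree serving m gates."""
    return math.sqrt(math.log2(m)) * dt

def period_penalty(xa, net_jitter):
    """Clock period increase x_a * jitter implied by Eq. (32)."""
    return xa * net_jitter

def clock_frequency_ghz(period_in_tau0, tau0_ps):
    """Clock frequency for a period expressed in units of tau_0."""
    return 1000.0 / (period_in_tau0 * tau0_ps)

# Splitter jitter in units of tau_0, from 8 * dt = 0.9 tau_0.
DT_SPLITTER = 0.9 / 8
# Assumed tau_0 (ps), inferred from 30 GHz at a 54 tau_0 period.
TAU0_PS_175 = 1000.0 / (30.0 * 54.0)

if __name__ == "__main__":
    print("64 sequential splitters:", chain_jitter(64, DT_SPLITTER))
    print("64-leaf binary tree    :", tree_jitter(64, DT_SPLITTER))
    print("penalty for 2 tau_0 net jitter:", period_penalty(10, 2.0))
    print("noise-free adder, GHz  :", clock_frequency_ghz(22, TAU0_PS_175))
    print("with noise and spread  :", clock_frequency_ghz(46 + 8, TAU0_PS_175))
```

Under these assumptions the tree reduces the clock-path jitter by a factor of $(64/6)^{1/2} \approx 3.3$ relative to the sequential chain.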


4.3. The memory problem

The NDRO memory cell shown in Fig. 13 allows very fast (a-few-picosecond) read and write operations. However, this cell is rather bulky: it requires, with buffer stages, 10 Josephson junctions per bit, and its layout in a 4-metal-layer technology (Fig. 2) takes a chip area of about $1{,}000F^2$, where $F$ is the minimum junction size. As a result, such cells are quite suitable for logic registers, but impractical for even relatively small on-chip memories (say, L1 caches). A much more practical solution for those memories is to use the compact, four-Josephson-junction, flux-transition memory cells developed by NEC for latching logic.89 The most important drawback of this memory is its relatively long access time, limited by the double time of flight of signals along the word and bit lines through the memory cell matrix. For a 3×3 mm² matrix, this time is about 120 ps: much longer than for the NDRO cell, but still much shorter than for SRAM-based semiconductor cache memories. The flux-transition memory may be adjusted for operation with RSFQ logic by replacing:
- the readout circuit (dc SQUID) with a similar circuit using shunted Josephson junctions,
- the ac-powered line drivers with dc-powered HUFFLE-type drivers, and
- the latching decode logic with a pipelined RSFQ decoder.32

Preliminary experiments90 show that with just 5 metallic layers the memory cell area may be close to ~$200F^2$. Estimates show that with two more wiring levels the area may be reduced to ~$100F^2$. This means that with the modest 0.8-µm technology, a density of 1 Mb/cm² is achievable, while the introduction of a deep-submicron fabrication technology may give 16-Mb chips (for more on this, see Sec. 5.3 below).
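The density figures above follow from simple area arithmetic, sketched below. The function names are hypothetical, and the estimate is an upper bound that assumes the whole chip area is tiled with memory cells (no decoders, drivers, or wiring overhead), which is our simplification rather than a statement from the text.

```python
UM2_PER_CM2 = 1.0e8  # 1 cm^2 expressed in µm^2

def cell_area_um2(area_in_f2, f_um):
    """Cell area in µm^2 for a cell occupying area_in_f2 * F^2."""
    return area_in_f2 * f_um ** 2

def density_bits_per_cm2(area_in_f2, f_um):
    """Upper-bound bit density, ignoring all peripheral circuitry."""
    return UM2_PER_CM2 / cell_area_um2(area_in_f2, f_um)

if __name__ == "__main__":
    for area_in_f2 in (1000, 200, 100):
        bits = density_bits_per_cm2(area_in_f2, 0.8)
        print(f"{area_in_f2} F^2 at F = 0.8 µm: {bits:.3g} bits/cm^2")
```

For a $100F^2$ cell at $F$ = 0.8 µm the bound is about 1.6 Mb/cm², consistent with the 1 Mb/cm² figure once peripheral circuitry is allowed for.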
These estimates show that the much exaggerated "superconductor memory problem" (the fact that only very small memories have been implemented so far) has hardly anything to do with the physics or technology of superconductor integrated circuits: since the early 1980s, funding for work in this direction has been practically unavailable in the United States because of some strange twist of administrative wisdom.

4.4. Magnetic flux trapping

Another frequently cited problem of superconductor electronics is flux quantum trapping in the superconducting ground plane. The physical reason for this effect is that magnetic flux quanta may exist not only in superconducting loops (Sec. 2.4), but also in continuous superconductors (especially thin superconducting films), where they take the form of so-called Abrikosov vortices.13,14 The vortex in a thin film is virtually axially-symmetric, with its axis perpendicular to the film plane. It can be imagined as a bundle of magnetic field lines, with the total flux equal to $\Phi_0$ (see Eq. (12)). The bundle's radius is limited by the persistent supercurrent circulating around it, to $\Lambda \approx \max[\lambda, 2\lambda^2/t]$, where $t$ is the film thickness. For a typical Nb ground plane, $\Lambda \approx 0.1$ µm. The magnetic field penetration into a continuous superconductor becomes possible because the flux-shielding supercurrent increases toward the vortex center, eventually becoming so large that it suppresses the film superconductivity in a central spot with a radius close to the so-called coherence distance $\xi$. For typical Nb ground plane films, $\xi$ is somewhat smaller than 0.1 µm. The Abrikosov vortex has a positive self-energy

$$E_0 \approx (\Phi_0^2/2\pi\mu_0\Lambda)\,\ln(1.5\Lambda/\xi). \qquad (37)$$

In a typical Nb film, $E_0$ is quite substantial, of the order of $10^{-17}$ joule, i.e. about $10^5$ K in temperature units. This is why thermally-induced self-nucleation of vortices at temperatures below the critical temperature $T_C$ is virtually impossible. However, if a superconductor integrated circuit is cooled from room temperature to $T < T_C$ in a substantial magnetic field $B$, unavoidable small variations of $T_C$ over the ground plane film cause superconductivity to arise in random spots first. Merging, these spots form superconducting loops encircling magnetic flux lines and preventing their escape from the film. As a result, even as the temperature drops well below $T_C$, the film is left with residual flux lines in the form of quantized vortices with 2D density $n \sim B/\Phi_0$, trapped on intentional and occasional inhomogeneities of the film (pinning centers). If even one vortex happens to sit too close to a loop of an RSFQ circuit, the magnetic field of the vortex may offset the bias flux in the loop and disturb the circuit operation.

The problem may be solved by a combination of two measures. First, the external magnetic field may be reduced to a few nanotesla using a simple system of degaussed magnetic shields, thus decreasing the trapped vortex density to ~10 cm$^{-2}$. Additionally, holes are patterned in free areas of the ground plane, close to all RSFQ gates. (These holes should not protrude under the superconductor interconnects, to avoid signal propagation disturbances.) A nearby Abrikosov vortex is attracted by such a hole, and tends to slip into it at the moment of its formation ($T \approx T_C$), especially if chip cooling through the critical point is carried out slowly, at a rate of the order of a few K/s. Holes in the form of moats surrounding each RSFQ gate or SFQ memory cell work best,92, 93 but cutting out nearly all free parts of the ground plane also gives acceptable results. The rule of thumb is to have at least one hole of size $a$ much larger than the vortex radius at a distance of not more than a few micrometers from each RSFQ circuit loop.94

For present-day RSFQ circuits, with their relatively low integration scale, the described combination of methods works quite well. It remains to be seen whether the currently accepted procedures are sufficient for degaussing of future VLSI RSFQ chips, but the authors feel that, if necessary, each of the components of the procedure may be improved considerably.

4.5. DC current recycling

A real problem awaiting a solution is the recycling of the dc power current. While the dc current necessary for powering a single RSFQ device is quite modest, of the order of 100 µA per Josephson junction (see Figs. 5-17), the total current necessary for powering a VLSI RSFQ circuit may be much higher than the value which can be comfortably passed into a helium cryostat by simple copper leads (a few amperes per lead). Hence, the dc current has to be recycled, i.e. used for powering several fragments of the circuit. For this purpose, the fragments should be connected in series for the dc current, excluding the usual (galvanic) means of signal transfer between them.
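The scale of the recycling problem can be illustrated with the figures quoted above. In the sketch below the chip size of one million junctions and the 3-A limit per lead are assumed example values (the text gives only "of the order of 100 µA per junction" and "a few amperes per lead"), and the function names are hypothetical.

```python
import math

BIAS_PER_JUNCTION_A = 100e-6  # order-of-magnitude dc bias per junction

def total_bias_current_a(n_junctions):
    """Total dc bias current if all junctions are fed in parallel."""
    return n_junctions * BIAS_PER_JUNCTION_A

def series_fragments_needed(n_junctions, max_lead_current_a):
    """Number of equal, serially biased fragments that keeps the
    current through a single lead below max_lead_current_a."""
    ratio = total_bias_current_a(n_junctions) / max_lead_current_a
    # The small offset guards against floating-point round-up.
    return max(1, math.ceil(ratio - 1e-9))

if __name__ == "__main__":
    n = 1_000_000          # assumed VLSI-scale junction count
    lead_limit_a = 3.0     # assumed limit per copper lead
    print("parallel bias current, A:", total_bias_current_a(n))
    print("series fragments needed :", series_fragments_needed(n, lead_limit_a))
```

For these assumed numbers the chip would draw about 100 A if biased in parallel, so the current would have to be reused across a few tens of serially connected fragments.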


Apparently, the problem may be solved by SFQ pulse transfer through superconducting thin-film transformers. Such transformers, which are already used in some SFQ circuits (see, e.g., Refs. 89, 90), may be formed by a pair of overlapping short superconducting strips. The issues to be addressed on this way include possible excitation of parasitic low-frequency oscillations in tank circuits formed by the transformer inductances and the large capacitances between the circuit fragments. Probably, these resonances may be successfully damped by special thin-film resistors.

4.6. I/O issues

As was discussed in Sec. 3.4, existing DC/SFQ and SFQ/DC converters, complemented with superconductor drivers, may allow RSFQ circuits to be interfaced with the semiconductor electronics environment at frequencies up to ~10 GHz, i.e. about the highest frequency attainable in such an environment. The thermal load imposed on the helium-level cryosystem by high-frequency I/O channels based on copper cables may be quite modest (of the order of 100 µW per channel), and may present a serious challenge only for extremely-large-scale systems with millions of I/O channels.31

Several groups have reported successful experiments95-98 on optical interfacing between superconductor chips and room-temperature devices. Unfortunately, for the output channels the thermal load includes considerable power dissipation in the amplifiers which are necessary to boost the signal energy from ~$10^{-18}$ J/bit in RSFQ circuits to ~$10^{-12}$ J/bit in optical channels. Input channels do not have this problem and may be quite simple,97 but for systems with a comparable number of room-temperature inputs and outputs, it hardly makes sense to employ two different I/O technologies. As a result, electric cables, possibly using high-temperature superconductor wires between the helium and nitrogen stages to reduce the thermal load, may be the best I/O option for RSFQ systems.
Another important direction of recent work in superconductor electronics has been the development of fast communication channels between superconducting chips. Though present-day experimental results are still in the range of a few GHz,99-103 analyses104-106 and the first experimental results107 show that the prospects are good for implementing superconductor multichip modules (MCMs), using for example multi-flux-quantum104 or even single-flux-quantum105-107 pulses transmitted over superconducting microstrip lines on silicon-based MCMs. In the former, more conservative, design the bandwidth may reach ~30 Gbps per channel, while in the latter case it may be as high as that of the on-chip RSFQ circuitry (~100 Gbps).

4.7. Design and testing tools

During the past decade, there has been rapid progress in the development of software tools for computer-aided design of RSFQ circuits. For example, our Stony Brook team has developed such tools as the Josephson junction circuit simulator PSCAN108,109, the circuit optimizer COWBOY109, and the quasi-2D inductance matrix calculator LMETER110 with a back-annotator to PSCAN, called LM2SCH. Some other RSFQ groups have created their own design tools. A useful review of these tools can be found in Ref. 111 and on the Web.112 These tools are still insufficient for VLSI RSFQ circuit design; in particular, good layout synthesis tools still have to be developed.

Testing tools also need additional development. As a typical example of the present state of the art, our group has developed an automated multi-channel circuit tester OCTOPUX which can perform measurements of a diced chip at a rate of up to 300 kHz.113 The software support developed for this system allows relatively sophisticated tests of RSFQ circuits; for example, statistics of parameter spreads and thermal noise may be studied automatically with good accuracy using a special RSFQ circuit with only a few contact pads.114 RSFQ circuit testing at multi-10-GHz frequencies may be carried out using special RSFQ on-chip testers.115-117 (Because of the unique speed of RSFQ devices, their testing by high-speed room-temperature equipment, which can only be extended to a few GHz, is hardly worth the effort.) Still, tools for comprehensive testing of RSFQ chips before wafer dicing have to be developed.

4.8. Submicron RSFQ technology

Future VLSI RSFQ technology requires deep-submicron (e.g., 0.3-µm) Josephson junctions, which involve several new issues. Indeed, Eq. (22) shows that as $A_{min}$ is reduced to ~0.3×0.3 µm², $j_C$ exceeds 100 kA/cm². At this stage, junction physics and RSFQ circuit scaling become somewhat different from those of the few-micron junctions described above, due to two factors.

(1) In tunnel Josephson junctions, the intrinsic normal (quasiparticle) conductance $G_N$ is proportional to $I_C$, so that the $I_CR_N$ product is constant:

$$(I_CR)_{max} = a(T/T_C)\,\Delta(T)/e, \quad a(0) \sim 1. \qquad (38)$$

(Within the classical BCS theory of direct tunneling between superconductors,13 $a(0) = \pi/2$, though in practical niobium-trilayer junctions this constant may be some 30% lower, apparently due to proximity effects at the interface between niobium and the unoxidized fraction of the aluminum layer.) Plugging Eqs. (22) and (38) into Eq. (18) (with the experimental values of the specific capacitance listed in Table 1), we see that at $j_C$ of about 100 kA/cm² the junctions become naturally overdamped: their intrinsic value of $\beta_C$ becomes less than 1, so that the junctions may be used in RSFQ circuits even without any external shunting.118 As is evident from Fig. 5, this allows the circuit density to be increased quite dramatically: according to estimates,121 by a factor of ~3 in terms of the minimum junction area, if an adequate number (~8) of superconductor layers is used. In this case the RSFQ IC density becomes comparable with that of CMOS circuits with the same $F$ and the same functionality, while retaining much higher speed and simpler fabrication technology.

(2) Experiments122,123 have shown that the transport properties of niobium-trilayer Josephson junctions with $j_C \gtrsim 10$ kA/cm² differ considerably from those calculated from the theory of direct tunneling.119 Until recently, it was feared that these deviations were due to rare microshorts of the aluminum oxide layer (which is extremely thin, below 1 nm, in these high-$j_C$ junctions). This could mean that the junctions were inherently irreproducible. However, recent experiments123 have shown a reasonably small on-chip spread of $I_C$ for junctions with $j_C$ as high as 210 kA/cm². Moreover, in a very recent work124 the properties of these junctions were quantitatively explained (Fig. 22) by the so-called multiple-Andreev-reflection (MAR) theory of the Josephson effect125-127, with account of a rather general statistical distribution128,129 of the electron mode transparencies. This MBSB distribution may be interpreted as a result of resonant tunneling via random localized electron states in a disordered aluminum oxide barrier.124 If this conclusion is correct, it will mean that high-$j_C$ Josephson junctions may be inherently very reproducible (with r.m.s. critical current spreads below 1% even for deep-submicron junctions), giving every hope for the possibility of high-yield fabrication of VLSI RSFQ circuits.
[Figure 22: two panels, for junction areas of 0.5×0.5 µm² and 1×1 µm², with $j_C$ = 210 kA/cm² in both cases; legend: data, MAR + MBSB, MAR + Dorokhov.]

Fig. 22. DC I-V curves for two Josephson junction samples of different area, with the critical current suppressed by a magnetic field in order to reveal the details of quasiparticle transfer. Solid lines: experimental results141. Dashed lines: MAR theory using the MBSB distribution128,129 of transparencies. Dotted lines: MAR theory using an alternative, Dorokhov distribution. In the absence of the magnetic field, the junctions exhibit a critical current whose temperature dependence is also very well described by the MAR theory. (After Ref. 126.)

Despite these advances and hopes, much research remains to be done in this field. For example, there are still no RSFQ circuit simulators (like PSCAN) which would adequately describe the specific dynamics of self-shunted, high-$j_C$ junctions. (The dynamics is only semi-qualitatively described by Eq. (8), which is used in existing simulators.) Also, the decision bit error rate for these junctions is determined by a combination of quantum and shot noise rather than by thermal fluctuations. In fact, Eqs. (17) and (38) show that the time scale $\tau_0$ for the self-shunted junctions ($R = R_N$) approaches the fundamental value

$$(\tau_0)_{min} = \hbar/2a(0)\Delta(0) \approx 0.2\,\hbar/k_BT_C, \qquad (39)$$

about 0.17 ps for niobium (see the last column of Table 1). (By the way, this brings the highest operation frequency of a simple RSFQ device, a digital frequency divider based on the T flip-flop, close to 800 GHz. This prediction was confirmed in recent experiments130 where 770-GHz operation was demonstrated using 210-kA/cm² junctions.) For the typical operation temperature $T$ = 4.2 K $\approx 0.5\,T_C$, this value of $\tau_0$ brings the junctions beyond (though not too far from) the thermal-to-quantum fluctuation crossover131

$$\hbar/\tau_0 \approx 2\pi k_BT_0. \qquad (40)$$
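The numbers following from Eqs. (39) and (40) are easy to check. The sketch below assumes a niobium critical temperature of 9.2 K (a value not quoted in this section) and reads Eq. (40) as $T_0 = \hbar/2\pi k_B\tau_0$; the function names are hypothetical.

```python
import math

HBAR = 1.054571817e-34  # reduced Planck constant, J*s
K_B = 1.380649e-23      # Boltzmann constant, J/K

def tau0_min_s(t_c_kelvin):
    """Eq. (39) in its approximate form: tau_0 = 0.2 * hbar / (k_B * T_C)."""
    return 0.2 * HBAR / (K_B * t_c_kelvin)

def crossover_temperature_k(tau0_s):
    """Eq. (40) solved for T_0: T_0 = hbar / (2 * pi * k_B * tau_0)."""
    return HBAR / (2.0 * math.pi * K_B * tau0_s)

T_C_NB = 9.2  # assumed critical temperature of niobium, K

if __name__ == "__main__":
    tau0 = tau0_min_s(T_C_NB)
    t0 = crossover_temperature_k(tau0)
    print("tau_0 (ps):", tau0 * 1e12)
    print("T_0 (K)   :", t0)
    print("T_0 / T at 4.2 K:", t0 / 4.2)
```

With the ideal coefficient 0.2 the crossover temperature comes out above the 4.2-K operating point, which is the sense in which the junctions are "beyond" the crossover; a larger practical $\tau_0$ (e.g., due to a reduced $a(0)$) lowers $T_0$ proportionally.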

A good theoretical understanding of decision errors due to quantum fluctuations,73 confirmed experimentally,75 exists only within a simple "Resistively Shunted Junction" (RSJ) model of Josephson junction dynamics, described by Eq. (8) with constant $R$ and thermal-equilibrium fluctuation sources,18 which ignores the shot noise and the dynamic peculiarities of MAR. This theory indicates that the bit error rate in the quantum limit may be fairly well estimated from that in the classical limit with the replacement $T \to (\pi/2)^{1/2}T_0$, where $T_0$ is the crossover temperature given by Eq. (40). Since for unshunted niobium junctions at 4.2 K the $T_0/T$ ratio is about 1.5, and the noise margin degradation scales as $T^{1/2}$ (see Eq. (27c)), the r.m.s. jitter due to quantum fluctuations in unshunted high-$j_C$ junctions (with a pulse-shape coefficient of 0.2) should be about 60% larger than that due to thermal fluctuations (see Secs. 3 and 4). For the best studied case of the integer adder56 it means that the maximum clock frequency of its operation with a low bit error rate should be about 100 GHz (instead of the 130 GHz which could be naively anticipated from the $\tau_0$ scaling).

It may be expected that a quantitative analysis of fluctuations in high-$j_C$ junctions with MAR transport will give a close result. Using this assumption, requirements for the reproducibility of the high-$j_C$ junction fabrication technology may be formulated. As follows from the data in Tables 2 and 3, for nearly all RSFQ components the $\delta t/\alpha$ ratio is close to 0.01 for thermal fluctuations at 4.2 K. According to Eq. (32) with $x_a(p/3)/x_a(q/N) \approx 10/5 \approx 2$, it means that the clock frequency degradation due to the quantum fluctuations is crudely equivalent to that due to a $1.6 \times 2 \times 0.01 \approx 3\%$ r.m.s. spread of the critical currents. For 0.3-µm junctions this means that in order to avoid additional speed degradation at high yield, the junction linear size should be reproduced with a $3\sigma$ spread below about 15 nm. Such accuracy has already been achieved in modern photolithography,1 although its applicability to Josephson junction fabrication still has to be confirmed experimentally. (In experiments,141 junctions were defined by direct e-beam writing, which is too slow for practical VLSI fabrication.)
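The step from a 3% critical current spread to the 15-nm linewidth requirement can be made explicit. The short derivation below is a sketch under our own simplifying assumptions: a square junction of side $a$, a uniform critical current density, and a critical current spread caused by size variations only.

```latex
% Assumptions (not from the text): square junction of side a,
% uniform j_C, and I_C variations caused only by size variations.
\begin{align*}
  I_C = j_C a^2
  \quad &\Rightarrow \quad
  \frac{\delta I_C}{I_C} = 2\,\frac{\delta a}{a}, \\
  \frac{\delta I_C}{I_C} \approx 3\%
  \quad &\Rightarrow \quad
  \frac{\delta a}{a} \approx 1.5\%, \\
  a = 0.3~\mu\text{m}
  \quad &\Rightarrow \quad
  \delta a \approx 4.5~\text{nm (r.m.s.)},
  \qquad 3\,\delta a \approx 13.5~\text{nm}.
\end{align*}
```

The result, about 13.5 nm, agrees with the quoted requirement of a spread below about 15 nm at the three-standard-deviation level.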


5. Future Prospects

5.1. Immediate opportunities

The main practical drawback of niobium-based RSFQ circuits is the necessity of cooling them to helium temperatures (4 to 5 K). Currently, closed-cycle refrigerators for this temperature range are somewhat costly (~$30,000) and bulky (~100 kg), though their inconvenience relative to other fluid-based refrigeration systems is frequently exaggerated. Recent rapid progress in cryocooler technology indicates that the cost per unit may be reduced to below ~$1,000 when they are produced in volume.132 Nevertheless, the necessity of deep refrigeration imposes hard conditions on applications of RSFQ technology in practical digital electronics. Crudely speaking, there is no hope of using this technology for any application which may be implemented using the mainstream, room-temperature CMOS ICs. However, for several important military and commercial applications the necessity of helium cooling of RSFQ circuits may be more than compensated by their unparalleled speed, even if they are implemented using currently existing (or slightly upgraded) fabrication technology. What follows is a very brief review of these applications.

5.1.1. Analog-to-digital converters

The first successes in this area61,133,134 and analysis of possible improvements135 allow us to believe that there are good prospects for the implementation, within the next few years, of unique RSFQ ADCs with, e.g., a 16-bit signal-to-noise ratio and 100-MHz analog signal bandwidth. This is considerably better than what has been achieved with the best semiconductor ADCs.136 Hopefully, this advantage will be sufficient for the practical introduction of RSFQ ADCs in radar and wireless communication systems, in particular in software-defined radio.137

5.1.2. Digital-to-analog converters

There are very good prospects for extending the recent progress in this area138-140 to develop in the next few years, for example, a multi-chip 20-bit DAC with settling time below 1 μs, accuracy better than 0.001 ppm, and output voltage approaching 1 volt. These converters may serve, in particular, as ac voltage calibrators in metrological systems, with performance much higher than that of alternative devices,141 and at a lower cost, since the RSFQ DAC may use a simple and cheap MHz-range rf reference source rather than a complex picosecond pulse sequence synthesizer. This simplicity may allow the DACs to compete with traditional Josephson standards of dc voltage, which require expensive, high-power sources of stable multi-GHz reference signals. Other possible applications of RSFQ DACs include arbitrary waveform generation in radars and secure communication systems.
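The converter targets quoted in Secs. 5.1.1 and 5.1.2 can be put on a common numerical footing with a short calculation. The sketch below is only an illustration under stated assumptions: it reads the "16-bit signal-to-noise ratio" as the SNR of an ideal 16-bit quantizer driven by a full-scale sine wave (the textbook 6.02 N + 1.76 dB rule), and it takes the DAC full-scale output to be exactly 1 V. The function names are hypothetical and do not come from the paper.

```python
def ideal_snr_db(bits: int) -> float:
    """SNR (dB) of an ideal N-bit quantizer with a full-scale sine input."""
    return 6.02 * bits + 1.76


def lsb_volts(bits: int, full_scale_volts: float) -> float:
    """Voltage step corresponding to one least-significant bit."""
    return full_scale_volts / 2 ** bits


# Assumption: "16-bit SNR" means the ideal-quantizer SNR for 16 bits.
adc_snr_db = ideal_snr_db(16)

# Assumption: the DAC output "approaching 1 volt" is taken as exactly 1 V.
dac_full_scale_v = 1.0
dac_lsb_v = lsb_volts(20, dac_full_scale_v)

# 0.001 ppm is a relative accuracy of 1e-9.
dac_error_v = 0.001e-6 * dac_full_scale_v

print(f"ideal 16-bit SNR:        {adc_snr_db:.2f} dB")
print(f"20-bit LSB at 1 V:       {dac_lsb_v * 1e6:.3f} uV")
print(f"0.001 ppm of 1 V output: {dac_error_v * 1e9:.3f} nV")
```

Under these assumptions the quoted accuracy corresponds to an absolute error far smaller than one 20-bit step, which is consistent with the metrological (voltage calibrator) use mentioned in the text.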

5.1.3. Digital SQUIDs

In contrast to present-day analog SQUIDs, with slew rates below ~10^6 flux quanta per second,142 their digital counterparts143 will use superfast on-chip feedback providing slew rates beyond 10^10 Φ0/s. This feature may allow electronic subtraction of interference and, as a result, operation of SQUIDs without external magnetic shields, which are now a major component of system cost. When implemented, digital SQUIDs may rapidly replace their analog counterparts in most application areas, and help, in particular, to move biomedical applications of these devices144 from research centers to medical clinics.

5.1.4. Digital autocorrelators

These devices can combine unprecedented bandwidth with very small size and power consumption. The first 16-channel prototype of such a correlator has already been tested.117 However, the number of channels of such systems still has to be increased to values of practical interest for radio astronomy and other applications (1,024 channels and beyond). For this, current fabrication technology should be improved, at least modestly, to allow a higher integration scale, at least ~100K Josephson junctions per chip.

5.1.5. Pseudo-random signal circuits

Some circuits of this class, including pseudo-random number generators, modulators and demodulators, may be relatively simple (hundreds of elementary cells) and thus implemented using the current niobium-trilayer technology, while providing a decisive speed advantage over the semiconductor competition in spread-spectrum communication systems, e.g., in 3G wireless communication systems like CDMA. The first RSFQ pseudo-random generators have already been designed, fabricated and tested.145-147

5.2. Long term prospects

The circuits and systems listed above, as interesting and important as they may be, nevertheless occupy hardly more than narrow niches in the immense electronics market. With the transfer to a submicron, VLSI RSFQ technology (see Sec. 4.8 above), many other applications will become possible. These applications notably include:

5.2.1. Ultrafast digital switching

Preliminary studies indicate148 that RSFQ circuits can be used for the implementation of digital switching cores combining unparalleled speed with very low power consumption and (as a result) high circuit density. For example, a 128×128-channel self-routing Batcher-banyan core for a 424-bit (ATM) packet payload, implemented in a 0.8-μm technology, could provide throughput close to 100 Gbit per channel, dissipate about 10 mW of power and fit on a single 1×1 cm² chip. To our knowledge, no semiconductor, electro-optical, or fully-optical system could provide comparable performance. The traditional switches for digital communications, however, include large memory components, mostly to search for the physical address of the packet inside the switch using the destination address carried in the packet header. The feasibility of their implementation using RSFQ technology, or the feasibility of new switch architectures, still has to be explored.

5.2.2. Digital signal processing

RSFQ technology seems uniquely suited for several types of digital signal and image processing including motion estimation, digital Fourier and cosine transforms, etc., for applications in communication systems and high-definition digital television. As an illustration of the possible speed of such processing, an RSFQ fixed-point 32-bit multiplier would be able to provide throughput close to 60 billion operations per second (gigaops)56 in comparison with just a few gigaops for modern CMOS DSPs. The estimated power consumption of a floating-point RSFQ DSP is close to 50 μW per gigaflops, a number to be compared with approximately 1 W per gigaflops for the best prospective CMOS DSP-based systems such as IBM's Blue Gene.149 Several RSFQ blocks important for DSP applications have already been designed.56,68,71,150

5.2.3. High-performance general-purpose computing

According to the ITRS,1 by the year 2006 high-performance microprocessors may reach a clock frequency of 2 to 3.5 GHz, and a microprocessor assembly featuring up to 200 million transistors may be placed on a single ~500-mm² chip dissipating up to 160 watts. The peak performance of such a multiprocessor CMOS chip can be crudely estimated as 10 to 100 gigaflops. Notice that this estimate is based on a very optimistic assumption of 70-nm CMOS fabrication technology, for which there are still "no known solutions".1 On the other hand, preliminary design work32,151-153 shows that an RSFQ microprocessor using a much more conservative, 0.3-μm fabrication technology to place just about 30 million Josephson junctions on a chip of comparable area, and operating at a clock frequency of about 90 GHz, would be able to provide a peak performance of approximately 2,000 gigaflops, while dissipating power below 1 watt. This dramatic advantage may be used on at least two system size scales:

(a) Unique petaflops-scale systems. Achieving a peak performance of 1 petaflops would take 10 to 100 thousand of the advanced CMOS chips discussed above, with a total power consumption of the order of 10 MW. The management of power of such proportions would take a sizeable building. The significant (microsecond-scale) latency of inter-processor communication in a system of such a physical size would make the system stall for programs where inter-processor communication is a large enough fraction of the computation process. The problems associated with semiconductor processors have stimulated a search for alternative approaches to petaflops-scale computing, in particular, the Hybrid Technology MultiThreaded architecture (HTMT) project based on the use of RSFQ technology for most number crunching and inter-processor communications (see Fig. 23). Our preliminary design work on the RSFQ COOL core32,151-153 for the HTMT computer system indicates that the 1-petaflops peak performance might be reached with just 500 logic chips (plus about 2,000 fast superconductor memory chips), with aggregate power dissipation in the core below 1 kW. Though removal of such power from the cryostat would require a large-scale closed-cycle cryocooler (helium recondenser) consuming about 300 kW, this is still considerably less than what would be required for a CMOS-based system. Even more important, the cryocooler would be remote, allowing the RSFQ core to be compacted into a 1-m³ volume. As a result, the simulated average latency of the inter-processor communication network (including both switching delays and signal time-of-flight) is as low as 20 ns,153 apparently enabling the system as a whole to sustain a sub-petaflops performance on many real-life computer programs.

(b) Personal teraflops-scale computing (PeT). A much larger potential market (up to $10B/yr worldwide) may exist for high-performance desktop-scale systems (personal workstations and corporate servers) with just a few RSFQ VLSI chips, flip-chip-mounted on a single superconductor-wired MCM. Estimates151 show that a PeT computer would be able to sustain a few-teraflops-scale performance, while its cost, at a production volume in the 100,000s per year, may be below $100K. This would provide at least an order-of-magnitude price-to-performance advantage over semiconductor competition.
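The chip counts and power figures quoted for petaflops-scale systems in item (a) follow from simple division, and it may help to see the arithmetic spelled out. The sketch below is an illustrative check only: it uses the per-chip figures given in the text (10 to 100 gigaflops and up to 160 W for CMOS; about 2,000 gigaflops and below 1 W for RSFQ), treats the quoted power values as upper bounds, and adds a free-space speed-of-light time-of-flight across 1 m as a lower bound on signal delay in a 1-m³ core. The variable and function names are hypothetical.

```python
PETAFLOPS = 1e15              # target peak performance, flop/s
SPEED_OF_LIGHT = 299_792_458  # m/s, used only as a lower bound on delay


def chips_needed(per_chip_gigaflops: float) -> float:
    """Number of chips required to reach 1 petaflops peak performance."""
    return PETAFLOPS / (per_chip_gigaflops * 1e9)


# CMOS: 10 to 100 gigaflops per chip, up to 160 W per chip (ITRS-based estimate).
cmos_chips_min = chips_needed(100)
cmos_chips_max = chips_needed(10)
cmos_power_min_mw = cmos_chips_min * 160 / 1e6  # megawatts
cmos_power_max_mw = cmos_chips_max * 160 / 1e6  # megawatts

# RSFQ: about 2,000 gigaflops per chip, below 1 W per chip.
rsfq_chips = chips_needed(2000)
rsfq_logic_power_w = rsfq_chips * 1.0           # upper bound, logic chips only

# Cryocooler wall power (about 300 kW) per watt removed from a core below 1 kW.
cooling_overhead = 300e3 / 1e3

# Lower bound on one-way signal time-of-flight across a 1-m core.
flight_time_ns = 1.0 / SPEED_OF_LIGHT * 1e9

print(f"CMOS chips: {cmos_chips_min:.0f} to {cmos_chips_max:.0f}")
print(f"CMOS power: {cmos_power_min_mw:.1f} to {cmos_power_max_mw:.1f} MW")
print(f"RSFQ logic chips: {rsfq_chips:.0f}, logic power < {rsfq_logic_power_w:.0f} W")
print(f"cooling overhead: about {cooling_overhead:.0f} W per W at 4 K")
print(f"time-of-flight across 1 m: at least {flight_time_ns:.1f} ns")
```

The CMOS range of 1.6 to 16 MW matches the "order of 10 MW" stated in the text, and 500 RSFQ logic chips at under 1 W each stay within the quoted core budget of 1 kW, with the remainder available to the memory chips.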
[Figure 23 (image not reproduced). Labeled components include the RSFQ COOL core with nitrogen and helium cooling, fiber channel interfaces, hard disk array, servers, tape silo array (20 silos), WDM source, 980 nm optical pumps (20 cabinets), optical amplifiers, WAN gateways, a monitor and control computer with data acquisition cards, and front end servers with consoles (x4); the scale label reads 40 m.]

Fig. 23. Side view of the HTMT petaflops computer room: a conceptual design. (Picture courtesy of J. Morookian and L. Bergman, Jet Propulsion Laboratory.)

6. Conclusion

Assuming that the problems outlined in Sec. 4 are successfully and promptly solved, we may place crude year tags on the RSFQ technology levels characterized in Table 1, thus obtaining the expected RSFQ learning curve (Fig. 24). This plot is compared with the ITRS predictions for the mainstream CMOS technology.1
[Figure 24 (plot not reproduced). Vertical axis: clock frequency, 100 MHz to 1 THz. Horizontal axis: year, 1996 to 2011. Curves: Si CMOS (photolithography historic trend and the ITRS 1999 forecast,1 with points labeled 0.14 μm, 0.09 μm, 0.065 μm, 0.045 μm and 0.02 μm, and a range marked "no known solutions" (e-beam lithography?)); Nb RSFQ (the authors' forecast, with points labeled 3.5 μm / 5K JJs, 1.75 μm / 100K JJs, 0.8 μm / 1M JJs and 0.3 μm / 10M JJs); HTS RSFQ (??). Application labels along the RSFQ curve: ADC, DAC, DSQUID, etc.; DSP; FLUX microprocessor; petaflops computing, PeT computers, etc.]

Fig. 24. An optimistic version of the expected progress of clock frequency of high-performance semiconductor and superconductor LSI circuits. The numbers near the points show the necessary minimum feature size and (for RSFQ) the anticipated integration scale. Dashed lines on the right indicate the ranges where forecasts seem rather uncertain. Dotted lines on the left show the CMOS historic trend.

It is probably evident that the RSFQ speed advantage is so great that even if the time tags in this (quite subjective) forecast are somewhat misplaced, the potential value of RSFQ as possibly the fastest practical digital technology can hardly be questioned. After the transfer to deep-submicron design rules and multiple wiring levels, RSFQ logic circuits may also be the densest (for the given patterning technology level). In addition, a breakthrough in the technology of Josephson junctions based on high-temperature superconducting (HTS) materials155,156 may make it possible for RSFQ systems to operate at even higher speed, though probably not at much higher temperatures. We believe that with a relatively modest government and/or industrial effort, RSFQ could be established as the leading digital technology for high-performance computing, wireless communications and precise instrumentation. However, if this support does not arrive very soon, the current momentum may be lost, and then it will take much more time and money to revive this technology when its remarkable advantages are finally broadly recognized.

Acknowledgments

Fruitful discussions with numerous colleagues, and valuable comments by T. Claeson, M. Dorojevets and V. Semenov are gratefully acknowledged. The authors are grateful to D. Brock (HYPRES), and J. Morookian and L. Bergman (JPL) for their kind permission to use previously unpublished figures. This work was supported in part by DoD and NASA via JPL.

References
1. The International Technology Roadmap for Semiconductors, 1999 Version, 2000 Update, available on the Web at public.itrs.net/.
2. See, e.g., C. Wann, F. Assaderaghi, and Y. Taur, "High-performance 0.07-micrometer CMOS with 9.5 ps Gate Delay and 150 GHz fT," IEEE El. Dev. Lett. 18 (1997) 625-627.
3. G. A. Sai-Halasz, "Performance trends in high-end processors," Proc. of IEEE 83 (1995) 20-36.
4. S. Borkar, "Design challenges for technology scaling," IEEE Micro (1999) 23-29.
5. A fine collection of modern microprocessor specifications may be found on the Web at www.geek.com/procspec/procspec.htm
6. G. Raghavan, M. Sokolich, and W. E. Stanchina, "Indium phosphide ICs unleash the high-frequency spectrum," IEEE Spectrum 37 (2000) No. 7, 47-52.
7. S. L. Rommel, T. E. Dillon, M. W. Dashiell, H. Feng, J. Kolodzey, P. R. Berger, P. E. Thompson, K. D. Hobart, R. Lake, A. C. Seabaugh, G. Klimeck, and D. K. Blanks, "Room temperature operation of epitaxially grown Si/Si0.5Ge0.5/Si resonant interband tunneling diodes," Appl. Phys. Lett. 73 (1998) 2191-2193.
8. K. K. Likharev and V. K. Semenov, "RSFQ logic/memory family: A new Josephson-junction digital technology for sub-terahertz-clock-frequency digital systems," IEEE Trans. on Appl. Supercond. 1 (1991) 3-28.
9. K. K. Likharev, "Superconductor devices for ultrafast computing," in: H. Weinstock (ed.) Applications of Superconductivity, Kluwer, Dordrecht, 2000, pp. 247-294.
10. K. Likharev, "Superconductors speed up computation," Phys. World (1997) No. 5, 39-43.
11. D. K. Brock, E. Track, and J. M. Rowell, "Superconductor ICs: the 100-GHz second generation," IEEE Spectrum 37 (2000) No. 12, 40-46.
12. In particular, the Stony Brook RSFQ group page gamayun.physics.sunysb.edu/RSFQ/RSFQ.html has links to most of these sites.
13. See, e.g., M. Tinkham, Introduction to Superconductivity, 2nd ed., McGraw-Hill, New York, 1996.
14. T. Van Duzer and C. W. Turner, Principles of Superconducting Circuits, Elsevier, New York, 1981.
15. R. L. Kautz, "Picosecond pulses on superconducting striplines," J. Appl. Phys. 49 (1978) 308-314.
16. S. V. Polonsky, V. K. Semenov, and D. F. Schneider, "Transmission of single-flux-quantum pulses along superconducting microstrip lines," IEEE Trans. on Appl. Supercond. 3 (1993) 2598-2600.
17. M. Currie, R. Sobolewski, and T. Y. Hsiang, "High-frequency crosstalk in superconductor microstrip waveguide interconnects," ibid. 9 (1999) 3602-3605.
18. K. K. Likharev, Dynamics of Josephson Junctions and Circuits, Gordon and Breach, New York, 1986.

19. M. Gurvitch, M. A. Washington, and H. A. Huggins, "High refractory Josephson tunnel junctions utilizing thin aluminum layers," Appl. Phys. Lett. 42 (1983) 472-475.
20. S. Hasuo, T. Imamura, and N. Fujimaki, "Recent advances in Josephson junction devices," Fujitsu Techn. J. 24 (1988) 284-292; Y. Tarytani, M. Hirado, and U. Kawabe, "Niobium-based integrated circuit technologies," Proc. IEEE 77 (1989) 1164-1176.
21. "Josephson computer technology: An IBM research project," IBM J. Res. Devel. 24 (1980) No. 5.
22. S. Hasuo, S. Kotani, A. Inoue, and N. Fujimaki, "High speed Josephson processor technology," IEEE Trans. on Magn. 27 (1991) 2602-2609.
23. HYPRES Design Rules, available from HYPRES, Inc., 175 Clearbrook Rd., Elmsford, NY 10523, U.S.A., and from the Web site www.hypres.com.
24. M. Jeffery, W. Perold, and T. Van Duzer, "Superconducting complementary output switching logic operating at 5-10 Gbps," Appl. Phys. Lett. 69 (1996) 2746-2749.
25. Y. Hashimoto, S. Yorozu, H. Numata, M. Koike, M. Tanaka, and S. Tahara, "High-speed testing of Josephson logic circuits by an on-chip signal-pattern generator," in: Ext. Abst. of Int. Supercond. Electronics Conf., PTB, Berlin, 1997, pp. 269-271.
26. This problem is avoided in dc-powered latching circuits of the HUFFLE family. Originally proposed long ago,27 these devices were later improved substantially28,29 so that operation of single gates was demonstrated at frequencies up to 6 GHz. These circuits share, however, one more drawback of latching circuits: relatively high power consumption. Their power at a few GHz is of the order of 10 μW per gate, i.e., at least two orders of magnitude higher than in RSFQ technology. As a result, HUFFLE devices are not a match for RSFQ in basic logic circuits; however, they may be quite useful for some auxiliary functions, e.g., as amplifiers in the superconductor/semiconductor electronics interfaces30 and as drivers in superconductor memories.32 Indeed, since the round-trip time-of-flight through, say, a 1-cm-long memory drive line is about 200 ps, the limited speed of the latching circuits is not such a big problem as it is in logic. Notice that there have also been several other directions of superconductor digital electronics, including dual-rail voltage-state logic, almost similar SAIL devices, and Josephson field-effect transistors, which eventually ran into a dead end; for their critical review, see, e.g., Ref. 9.
27. A. F. Hebard, S. S. Pei, L. N. Dunkleberger, and T. A. Fulton, "A DC powered Josephson flip-flop," IEEE Trans. on Magn. 15 (1979) 408-411.
28. Y. Hatano, H. Nagaishi, S. Yano, K. Nakahara, H. Yamada, S. Kominami, and M. Hirano, "An all DC-powered Josephson logic circuit," IEEE J. of Solid State Circuits 26 (1991) 1123-1132.
29. H. Hasegawa, H. Nagaishi, S. Kominami, H. Yamada, and T. Nishino, "A DC-powered Josephson logic family that uses hybrid unlatching flip-flop logic elements (HUFFLES)," IEEE Trans. on Appl. Supercond. 5 (1995) 3504-3510.
30. D. F. Schneider, J.-C. Lin, S. V. Polonsky, V. K. Semenov, and C. A. Hamilton, "Broadband interfacing of superconducting digital systems to room temperature electronics," ibid. 5 (1995) 3152-3155.
31. L. Abelson, Q. P. Herr, G. L. Kerber, M. Leung, and S. Tighe, "Manufacturability of superconductor electronics for a petaflops-scale computer," IEEE Trans. on Appl. Supercond. 9 (1999) 3202-3207.
32. M. Dorojevets, P. Bunyk, D. Zinoviev, and K. Likharev, "COOL-0: Design of an RSFQ subsystem for petaflops computing," ibid. 9 (1999) 3606-3614.
33. K. K. Likharev, "Dynamics of some single-flux-quantum devices. I. Parametric Quantron," IEEE Trans. on Magn. 13 (1976) 242-244.
34. C. Bennett, "Logical reversibility of computation," IBM J. Res. Devel. 17 (1973) 525-532.
35. K. Likharev, "Classical and quantum limitations on energy consumption at computation," Int. J. Theor. Phys. 21 (1982) 311-326.
36. K. K. Likharev, S. V. Rylov, and V. K. Semenov, "Reversible conveyor computation in array of parametric quantrons," IEEE Trans. Magn. 21 (1985) 947-950.
37. S. V. Rylov and V. K. Semenov, "Superconductor quantum interferometers as elements with controllable, sign changeable inductance, and their use in parametric quantrons," Sov. Microelectronics 17 (1988) 109-116.
38. K. F. Loe and E. Goto, "Analysis of flux input and output Josephson pair device," IEEE Trans. Magn. 21 (1985) 884-887.
39. M. Hosoya, W. Hioe, J. Casas, R. Kamikawai, Y. Harada, Y. Wada, H. Nakane, R. Suda and E. Goto, "Flux quantum parametron: A single quantum flux device for Josephson supercomputer," IEEE Trans. on Appl. Supercond. 1 (1991) 77-89.

40. M. Hosoya and W. Hioe, "Margin analysis of quantum parametron logic gates," ibid. 3 (1993) 3022-3028.
41. R. Suda, R. Kamikawai, Y. Wada, W. Hioe, M. Hosoya, and E. Goto, "QFP wiring problem - Introduction and analytical considerations," IEEE Trans. on CAD/ICAS 13 (1994) 48-56.
42. K. K. Likharev, "Properties of a superconducting ring closed with a weak link as a device with several stable states," Radio Eng. and Electron. Phys. 19 (1974) No. 7, 109-115.
43. K. Nakajima, Y. Onodera, and Y. Ogawa, "Logic design of Josephson network," J. Appl. Phys. 47 (1976) 1620-1627.
44. K. Nakajima and Y. Onodera, "Logic design of Josephson network II," ibid. 49 (1978) 2958-2963.
45. J. P. Hurrell and A. H. Silver, "SQUID digital electronics," in: B. S. Deaver Jr. et al. (eds.), Future Trends in Superconductive Electronics, AIP, New York, pp. 437-447.
46. J. P. Hurrell, D. C. Pridmore-Brown, and A. H. Silver, "A/D conversion with unlatched SQUIDs," IEEE Trans. Electron. Dev. 27 (1980) 1887-1896.
47. K. Nakajima, G. Oya, and Y. Sawada, "Fluxoid motion in phase mode Josephson switching system," IEEE Trans. on Magn. 19 (1983) 1201-1204.
48. G. Oya, M. Yamashita, and Y. Sawada, "Single flux quantum 4JL-interferometer operated in the phase mode," ibid. 21 (1985) 880-883.
49. K. K. Likharev, O. A. Mukhanov, and V. K. Semenov, "Resistive single flux quantum logic for the Josephson-junction technology," in: H. Hahlbohm and H. Lübbig (eds.) SQUID'85, W. de Gruyter, Berlin, 1985, pp. 1103-1108.
50. K. K. Likharev, O. A. Mukhanov, and V. K. Semenov, "Ultimate performance of RSFQ logic circuits," IEEE Trans. on Magn. 23 (1987) 759-762.
51. M. Dorojevets, P. Bunyk, and D. Zinoviev, "FLUX Project: Design of a 20-GHz 16-bit Ultrapipelined Processor Prototype Based on 1.75-um LTS RSFQ Technology," Report #1EK04, 2000 Applied Superconductivity Conference, to be published in IEEE Trans. on Appl. Supercond. 11 (2001) No. 2.
52. P. Bunyk, A. Rylyakov, K. Likharev, P. Litskevitch, and D. Zinoviev, SUNY RSFQ Cell Library, available on the Web at gamayun.physics.sunysb.edu/RSFQ/Lib/.
53. For a single-junction loop (Fig. 3) this equivalence is clear from Eq. (11). A very similar equation may be written for any loop of an RSFQ device.
54. If necessary, SFQ pulse current and energy may be boosted by an exponential ramp-up of the junction critical currents along the array; see Fig. 9 and its discussion.
55. A. V. Rylyakov and K. K. Likharev, "Pulse jitter and timing errors in RSFQ circuits," IEEE Trans. on Appl. Supercond. 9 (1999) 3539-3544.
56. P. Bunyk and P. Litskevich, "Case study in RSFQ design: Fast pipelined parallel adder," ibid. 9 (1999) 3714-3720.
57. See, e.g., C. H. Roth, Fundamentals of Logic Design, West Publishing Co., St. Paul, MN, 1985.
58. This device was suggested by one of the authors.59 It may also be considered as a truncation of a more general device, the B flip flop, suggested earlier.60
59. D. Zinoviev, Design and partial implementation of RSFQ-based Batcher-banyan switch and support tools, PhD thesis, SUNY at Stony Brook, Aug. 1997.
60. S. Polonsky, V. Semenov, and A. Kirichenko, "Single-flux-quantum B flip flop and its possible applications," IEEE Trans. on Appl. Supercond. 4 (1994) 9-18.
61. J. C. Lin, V. K. Semenov, and K. K. Likharev, "Design of an SFQ-counting analog-to-digital converter," ibid. 5 (1995) 2252-2259.
62. J.-C. Lin and V. Semenov, "Timing circuits for RSFQ digital systems," ibid. 5 (1995) 3472-3477.
63. O. A. Mukhanov, S. V. Rylov, V. K. Semenov, and S. V. Vyshensky, "RSFQ logic arithmetic," IEEE Trans. on Magn. 25 (1989) 857-860.
64. Several different approaches to RSFQ computing have been suggested. For example, in the dual-rail approach65-68 two lines are used to carry exactly one SFQ pulse during each clock period: binary information is coded according to which line carries this pulse. This approach may be stretched to implement completely asynchronous ("delay-insensitive") circuits66,68 which fire outputs as soon as the later of the input pulses has arrived. Preliminary comparison of sizable circuits56,68 indicates that delay-insensitive circuits require more hardware (measured in, say, the number of Josephson junctions) for the same function, though they provide some improvement in the maximum processing rate (throughput), while their processing delay (latency) is almost the same as in architectures with an explicit local clock.
65. Z. J. Deng, N. Yoshikawa, S. R. Whiteley, and T. Van Duzer, "Data-driven self-timed RSFQ digital integrated circuits and systems," IEEE Trans. on Appl. Supercond. 7 (1997) 3634-3637.

66. P. Patra, S. Polonsky, and D. S. Fussel, "Delay insensitive logic for RSFQ superconductor technology," in: Proc. of 3rd Int. Symp. on Adv. Res. in Asynchronous Circ. and Syst. (Async97), IEEE Comp. Soc., Los Alamitos, CA, pp. 42-53.
67. A. V. Rylyakov and S. V. Polonsky, "All-digital 1-bit RSFQ autocorrelator for radioastronomy applications: Design and experimental results," IEEE Trans. on Appl. Supercond. 8 (1998) 14-19.
68. Y. Kameda, S. V. Polonsky, M. Maezawa, and T. Nanya, "Self-timed parallel adders based on DI RSFQ primitives," IEEE Trans. on Appl. Supercond. 9 (1999) 4040-4045.
69. A. Kidiyarova-Shevchenko, A. Kirichenko, S. Polonsky, and P. Shevchenko, "New elements of the RSFQ logic memory (Part 2)," in: Ext. Abstr. of the 3rd Int. Supercond. Electron. Conf., Glasgow, UK, 1991, pp. 200-203.
70. S. V. Polonsky, V. K. Semenov, P. I. Bunyk, A. F. Kirichenko, A. Yu. Kidiyarova-Shevchenko, O. A. Mukhanov, P. N. Shevchenko, D. F. Schneider, D. Yu. Zinoviev, and K. K. Likharev, "New RSFQ circuits," IEEE Trans. on Appl. Supercond. 3 (1993) 2566-2567.
71. S. V. Polonsky, J. C. Lin, and A. V. Rylyakov, "RSFQ arithmetic blocks for DSP applications," ibid. 5 (1995) 2823-2826.
72. A bipolar power supply may increase the margins to approximately 40%.
73. T. V. Filippov, "The quantum dissipation properties of a Josephson balanced comparator," Rus. Microelectron. 25 (1996) 250-256.
74. T. V. Filippov, Yu. A. Polyakov, V. K. Semenov, and K. K. Likharev, "Signal resolution of RSFQ comparators," IEEE Trans. on Appl. Supercond. 5 (1995) 2240-2243.
75. V. K. Semenov, T. V. Filippov, Yu. A. Polyakov, and K. K. Likharev, "SFQ balanced comparators at a finite sampling rate," ibid. 7 (1997) 3617-3621.
76. J. Satchell, "Stochastic simulation of SFQ logic," ibid. 7 (1997) 3315-3318.
77. M. Jeffery, P. Y. Xie, S. R. Whiteley, and T. Van Duzer, "Monte Carlo and thermal noise analysis of ultra-high-speed high temperature superconductor digital circuits," ibid. 9 (1998) 4095-4098.
78. B. Ruck, Y. Chong, R. Dittmann, A. Engelhardt, B. Oelze, E. Sodtke, W. E. Booij, and M. G. Blamire, "Measurement of the error rate of single flux quantum circuits with high temperature superconductors," ibid. 9 (1999) 3850-3853.
79. E. J. Dean, P. D. Dresselhaus, J. X. Przybysz, A. H. Miklich, A. H. Worsham, and S. V. Polonsky, "Bit error rate measurements for GHz code generator circuits," ibid. 9 (1999) 3598-3601.
80. Q. P. Herr, M. W. Johnson, and M. J. Feldman, "Temperature-dependent bit-error rate of a clocked superconducting digital circuit," ibid. 9 (1999) 3594-3597.
81. A. M. Herr, M. J. Feldman, and M. Bocko, "Timing jitter and bit errors in a 64-bit circular shift register," IEEE Trans. on Appl. Supercond. 9 (1999) 3721-3724.
82. Preliminary versions of this device were suggested45 and implemented83 well before the invention of the full RSFQ logic set.
83. C. A. Hamilton and F. L. Lloyd, "100 GHz binary counter based on the dc SQUIDs," IEEE Electron. Dev. Lett. 3 (1982) 335-338.
84. In fact, the output part of the SFQ/DC converter is just a galvanically-coupled version of a superconductor quantum magnetometer ("dc SQUID"13,14); more usual, magnetically-coupled versions of the SFQ/DC converter are also possible.
85. J. X. Przybysz, J. D. McCambridge, P. D. Dresselhaus, A. H. Worsham, and E. J. Dean, "Dewar-to-dewar data transfer at GHz rates," IEEE Trans. on Appl. Supercond. 9 (1999) 2981-2984.
86. R. D. Sandell, J. W. Spargo, and M. Leung, "High data rate switch with amplifier chip," ibid. 9 (1999) 2985-2988.
87. O. A. Mukhanov, S. V. Rylov, D. V. Gaidarenko, N. B. Dubash, and V. V. Borzenets, "Josephson output interfaces for RSFQ circuits," ibid. 7 (1997) 2826-2831.
88. K. Gaj, E. G. Friedman, and M. J. Feldman, "Timing of multi-gigahertz RSFQ digital circuits," J. of VLSI Signal Processing Systems 16 (1997) 247-276.
89. S. Tahara, I. Ishida, S. Nagasawa, M. Hidaka, H. Tsuge, and Y. Wada, "4-Kbit Josephson nondestructive read-out RAM operated at 580 ps and 6.7 mW," IEEE Trans. on Magn. 27 (1991) 2626-2633.
90. S. Nagasawa, S. Tahara, H. Numata, and A. Tsuchida, "Miniaturized vortex transitional Josephson memory cell by a vertical integrated device structure," IEEE Trans. on Appl. Supercond. 4 (1994) 19-24.
91. The trapping may happen in other layers of a superconductor integrated circuit as well, but since these layers are usually patterned into narrow interconnecting wires, the effect is much less probable.

92. S. Bermon and T. Gheewala, "Moat-guarded Josephson SQUIDs," IEEE Trans. Magn. 19 (1983) 1160-1164.
93. S. Nagasawa, H. Numata, C. Kato, and S. Tahara, "Evaluation of trapped magnetic flux for Josephson 4-Kbit RAMs," in Ext. Abstr. of Int. Supercond. Electron. Conf., Nagoya, Japan, 1995, pp. 192-194.
94. Spectacular experimental images of flux quanta trapped on random pinning centers and intentional holes may be found in the paper M. Jeffery, T. Van Duzer, J. Kirtley, and M. B. Ketchen, "Magnetic imaging of moat-guarded superconducting electronic circuits," Appl. Phys. Lett. 67 (1995) 1769-1771.
95. K. Nakahara, H. Nagaishi, H. Hasegawa, S. Kominami, H. Yamada, and T. Nishino, "Optical input-output interface system for Josephson-junction integrated circuits," ibid. 4 (1994) 223-227.
96. L. A. Bunz, R. Robertazzi, and S. Rylov, "An optically coupled superconducting analog to digital converter," ibid. 7 (1997) 2972-2974.
97. J. F. Bulzacchelli, H.-S. Lee, A. Sotiris, J. A. Misewich, and M. B. Ketchen, "Optoelectronic clocking system for testing RSFQ circuits up to 20 GHz," ibid. 7 (1997) 3301-3306.
98. D. Gupta, D. V. Gaidarenko, and S. V. Rylov, "A 16-bit analog-to-digital converter module with optical output," ibid. 9 (1999) 3030-3033.
99. S. Tanahashi, T. Kubo, K. Kawabata, R. Jikuhara, G. Kaji, M. Terasawa, H. Nakagawa, M. Aoyagi, I. Kurosawa, and S. Takada, "Superconductor wiring in multi-chip module for Josephson LSI circuits," Jpn. J. Appl. Phys. 32, pt. 2 (1993) L898-L900.
100. T. Ogashiwa, H. Nakagawa, H. Akimoto, H. Shigyo, and S. Takada, "Flip-chip bonding using superconducting solder bump," Jpn. J. Appl. Phys. 34, pt. 1 (1995) 4043-4046.
101. R. D. Sandell, G. Akerling, and A. D. Smith, "Multichip packaging for high-speed superconducting circuits," IEEE Trans. on Appl. Supercond. 5 (1995) 3160-3163.
102. B. J. Dalrymple, M. Leung, R. D. Sandell, and J. Spargo, "Multi-Gb/s operation of flipped chip MVTL circuits," ibid. 7 (1997) 2693-2696.
103. J. X. Przybysz, D. L. Miller, S. S. Martinet, J. H. Kang, A. H. Worsham, and M. L. Farich, "Interface circuits for chip-to-chip data transfer at GHz rates," ibid. 7 (1997) 2657-2660.
104. S. P. Polonsky and D. F. Schneider, "Toward broadband communications between RSFQ chips," ibid. 7 (1997) 2818-2821.
105. M. Maezawa, M. Yamamori, and A. Shoji, "A novel approach to chip-to-chip communications using single flux quantum pulse," ibid. 9 (1999) 4049-4052.
106. H. Toepfer, T. Lingel, F. H. Uhlman, and M. Aoyagi, "Numerical studies of interchip pulse transmission for complex RSFQ systems," ibid. 9 (1999) 3725-3728.
107. M. Maezawa, H. Yamamori, and A. Shoji, "Chip-to-chip communication using a single flux quantum pulse," ibid. 10 (2000) 1603-1605.
108. S. V. Polonsky, V. K. Semenov, and P. N. Shevchenko, "PSCAN - Personal superconductor circuit analyzer," Supercond. Sci. Technol. 4 (1991) 667-669.
109. S. Polonsky, P. Shevchenko, A. Kirichenko, D. Zinoviev, and A. Rylyakov, "PSCAN96: New software for simulation and optimization of complex RSFQ circuits," IEEE Trans. on Appl. Supercond. 7 (1997) 2685-2689.
110. P. I. Bunyk and S. V. Rylov, "Automated calculation of mutual inductance matrices of multilayer superconductor integrated circuits," in Abstr. of Int. Supercond. Electronics Conf., NIST, Boulder, CO (1993); LMETER is available at http://gamayun.physics.sunysb.edu/~paul/lmeter/lmeter.html under the conditions of Gnu Public License (GPL).
111. K. Gaj, Q. P. Herr, V. Adler, A. Krasniewski, E. G. Friedman, and M. J. Feldman, "Tools for the computer-aided design of multigigahertz superconducting digital circuits," IEEE Trans. on Appl. Supercond. 9 (1999) 18-38.
112. K. Gaj, Survey of SDE Design Tools, http://henry.ee.rochester.edu:8080/~sde/cad/survey.html.
113. D. Zinoviev and Yu. Polyakov, "Octopux: An advanced automated setup for testing superconductor circuits," IEEE Trans. on Appl. Supercond. 7 (1997) 3240-3243.
114. V. K. Semenov, Yu. A. Polyakov, and W. Chao, "Extraction of impacts of fabrication spreads and thermal noise on operation of superconducting digital circuits," ibid. 9 (1999) 4040-4033.
115. A. F. Kirichenko, O. A. Mukhanov, and A. I. Ryzhikh, "Advanced on-chip test technology for RSFQ circuits," ibid. 7 (1997) 3438-3441.
116. Q. P. Herr, K. Gaj, A. M. Herr, N. Vukovic, C. A. Mancini, M. Bocko, and M. J. Feldman, "High speed testing of a four-bit RSFQ decimation digital filter," ibid. 7 (1997) 2975-2978.
117. A. V. Rylyakov, D. F. Schneider, and Yu. A. Polyakov, "A fully integrated 16-channel RSFQ autocorrelator operating at 11 GHz," ibid. 9 (1999) 3623-3627.

118. Other Josephson junctions which may be overdamped without external shunting include superconductor microbridges119 and double-tunnel-barrier junctions with a normal-metal interlayer (so-called SISIS structures)120. However, in order to reach substantial values of the I_C R product (~ 1 mV, cf. Table 1), and also to suppress the strong temperature sensitivity of microbridges, they should have a length below 100 nm, reproducible with a 3σ spread of about 5 nm, which imposes very hard demands on patterning technology. In order to reach the same goal using SISIS junctions, their critical current density should be increased to about 20 kA/cm². This is already close to the level necessary for the usual single-barrier junctions, which should be inherently more reproducible.
119. For a general review, see K. Likharev, Superconducting weak links, Rev. Mod. Phys. 51 (1979) 101-160.
120. A. Brinkman, D. Gassel, A. A. Golubov, M. Yu. Kupriyanov, M. Siegel, and H. Rogalla, Double-barrier Josephson junctions: Theory and experiment, Report at the 2000 Applied Superconductivity Conference, to be published in IEEE Trans. on Appl. Supercond. 11 (2001) No. 2.
121. Y. Naveh, D. Averin, and K. Likharev, Physics of high-jc Josephson junctions and prospects of their RSFQ VLSI applications, Report #4EL03, 2000 Applied Superconductivity Conference; to be published in IEEE Trans. on Appl. Supercond. 11 (2001) No. 2.
122. A. W. Kleinsasser, R. E. Miller, W. H. Mallison, and G. D. Arnold, Observation of multiple Andreev reflections in superconducting tunnel junctions, Phys. Rev. Lett. 72 (1994) 1738-1741.
123. V. Patel and J. E. Lukens, Self-shunted Nb/AlOx/Nb Josephson junctions, IEEE Trans. on Appl. Supercond. 9 (1999) 3247-3250.
124. Y. Naveh, V. Patel, D. V. Averin, K. K. Likharev, and J. E. Lukens, Universal distribution of transparencies in highly conductive Nb/AlOx/Nb junctions, Phys. Rev. Lett. 85 (2000) 5404.
125. A. Furusaki and M. Tsukada, A unified theory of clean Josephson junctions, Physica B 165&166 (1990) 967-968.
126. C. W. J. Beenakker and H. van Houten, Josephson current through a superconducting point contact shorter than the coherence length, Phys. Rev. Lett. 66 (1991) 3056-3059.
127. D. Averin and A. Bardas, Josephson effect in a single quantum channel, Phys. Rev. Lett. 75 (1995) 1831-1834.
128. J. A. Melsen and C. W. J. Beenakker, Reflectionless tunneling through a double-barrier junction, Physica B 203 (1994) 219.
129. K. M. Schep and G. E. W. Bauer, Transport through dirty interfaces, Phys. Rev. Lett. 78 (1997) 3015.
130. W. Chen, A. V. Rylyakov, V. Patel, J. E. Lukens, and K. K. Likharev, Rapid single flux quantum T flip-flop operating at 770 GHz, IEEE Trans. on Appl. Supercond. 9 (1999) 3212-3215.
131. See, e.g., A. O. Caldeira and A. Leggett, Quantum tunneling in a dissipative system, Ann. Phys. 149 (1983) 374.
132. See, e.g., Proceedings of the 5K Cryocooler Workshop, July 24-25, 1995, Elmsford, NY, available from HYPRES, Inc., phone 914-592-1190.
133. S. V. Rylov, D. K. Brock, D. V. Gaidarenko, A. F. Kirichenko, J. M. Vogt, and V. K. Semenov, High-resolution ADC using modulation-demodulation architecture, IEEE Trans. on Appl. Supercond. 9 (1999) 3016-3019.
134. V. K. Semenov, Yu. A. Polyakov, and T. V. Filippov, Superconductor delta ADC with on-chip decimation filter, ibid. 9 (1999) 3026-3029.
135. O. Mukhanov, D. Brock, W. Li, D. Gupta, J. Vogt, V. Semenov, T. Filippov, and Y. Polyakov, Superconductive High Resolution ADC, Report #2EK01, 2000 Applied Superconductivity Conference, to be published in IEEE Trans. on Appl. Supercond. 11 (2001), No. 2.
136. R. H. Walden, Analog-to-digital converter survey and analysis, IEEE J. on Sel. Areas in Commun. 17 (1999) 539-550; see also Web site www.hrl.com/TECHLABS/micro/ADC/adc.html.
137. E. B. Wikborg, V. K. Semenov, and K. K. Likharev, RSFQ front end for a software radio receiver, IEEE Trans. on Appl. Supercond. 9 (1999) 3615-3618.
138. V. K. Semenov, P. N. Shevchenko, and Yu. A. Polyakov, Digital-to-analog converter based on processing of SFQ pulses, in Ext. Abstr. of Int. Supercond. Electron. Conf., PTB, Berlin, 1997, pp. 320-322.
139. H. Sasaki, S. Kiryu, F. Hirayama, T. Kikuchi, M. Maezawa, A. Shoji, and S. Polonsky, RSFQ-based D/A converter for AC voltage standard, IEEE Trans. on Appl. Supercond. 9 (1999) 3561-3564.
140. V. K. Semenov, Yu. A. Polyakov, and E. Wikborg, Flux multiplier and its metrology applications, Report #2EI13, 2000 Applied Superconductivity Conference; to be published in IEEE Trans. on Appl. Supercond. 11 (2001), No. 2.

141. S. P. Benz, C. A. Hamilton, T. E. Harvey, L. A. Christian, and J. X. Przybysz, Pulse-driven Josephson digital/analog converter, IEEE Trans. on Appl. Supercond. 8 (1998) 42-47.
142. See, e.g., J. Clarke, Low- and high-Tc SQUIDs and some applications, in: H. Weinstock (ed.) Applications of Superconductivity, Kluwer, Dordrecht, 2000, pp. 1-60.
143. V. K. Semenov and Yu. A. Polyakov, Fully integrated digital SQUID, in Ext. Abstr. of Int. Supercond. Electron. Conf., PTB, Berlin, 1997, pp. 329-331.
144. J. Vrba, Multichannel SQUID biomagnetic systems, in: H. Weinstock (ed.) Applications of Superconductivity, Kluwer, Dordrecht, 2000, pp. 61-138.
145. A. Kidiyarova-Shevchenko and D. Zinoviev, RSFQ pseudo random generator and its possible applications, IEEE Trans. on Appl. Supercond. 5 (1995) 2820-2822.
146. J. Kang, A. H. Worsham, and J. Przybysz, 4.6 GHz SFQ shift register and SFQ pseudorandom bit sequence generator, ibid. 5 (1995) 2827-2830.
147. P. D. Dresselhaus, E. J. Dean, A. H. Worsham, J. X. Przybysz, and S. V. Polonsky, Modulation and demodulation of 2 GHz pseudo random binary sequence using SFQ digital circuits, ibid. 9 (1999) 3585-3589.
148. D. Yu. Zinoviev and K. K. Likharev, Feasibility study of RSFQ-based self-routing nonblocking switches, IEEE Trans. on Appl. Supercond. 7 (1997) 3155-3163.
149. D. Clark, Blue Gene and the race toward petaflops capacity, IEEE Concurrency 8 (2000) 5-9.
150. O. A. Mukhanov and A. F. Kirichenko, Implementation of a FFT radix 2 butterfly using serial RSFQ multiplier-adders, IEEE Trans. on Appl. Supercond. 5 (1995) 2461-2464.
151. M. Dorojevets, P. Bunyk, D. Zinoviev, and K. Likharev, RSFQ Computing: The Quest for Petaflops, in: Future Trends in Microelectronics, S. Luryi, J. Xu, and A. Zaslavsky (eds.), Wiley, New York, 1999, pp. 193-206.
152. M. Dorojevets, P. Bunyk, D. Zinoviev, and K. Likharev, Superconductor Electronic Devices for Petaflops Computing, FED Journal 10 (2000) 3-14.
153. M. Dorojevets, P. Bunyk, D. Zinoviev, and K. Likharev, COOL-1: The Next Step in RSFQ Computer Design, Physica B 280 (2000) 495-496.
154. L. Wittie, D. Zinoviev, G. Sazaklis, and K. Likharev, CNET: RSFQ switching network for petaflops-scale computing, IEEE Trans. on Appl. Supercond. 9 (1999) 4034-4039.
155. For a review, see, e.g., M. Yu. Kupriyanov and K. K. Likharev, Josephson effect in high-temperature superconductors, Sov. Phys. Usp. 33 (1990) 340-364.
156. To our knowledge, the most reproducible technology of HTS junction fabrication was described by B. D. Hunt, M. G. Forrester, J. Talvacchio, and R. M. Young, High-resistance HTS edge Josephson junctions for digital circuits, IEEE Trans. on Appl. Supercond. 9 (1999) 3362-3365. The achieved critical current spreads are still insufficient for VLSI applications.
