Introduction
Table 1. Error code bits required to provide single-error correction and double-error detection (minimum Hamming code distance: 4).
Table 2. Redundancy and coding efficiency for common word lengths.
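
The check-bit counts that a table like Table 1 lists can be reproduced with a short calculation. The sketch below is illustrative only: it assumes the standard extended Hamming construction (the smallest r with 2^r >= m + r + 1, plus one overall parity bit to reach distance 4), and it assumes redundancy means k/m and coding efficiency means m/(m + k). Those two definitions and all function names are assumptions, not taken from the article.

```python
def sec_ded_check_bits(m: int) -> int:
    """Smallest k giving single-error correction and double-error detection
    for m data bits, assuming an extended Hamming code (minimum distance 4)."""
    if m < 1:
        raise ValueError("m must be a positive integer")
    r = 1
    # Single-error correction needs 2**r >= m + r + 1 (Hamming bound).
    while 2 ** r < m + r + 1:
        r += 1
    # One extra overall parity bit raises the minimum distance from 3 to 4.
    return r + 1


def redundancy_and_efficiency(m: int) -> tuple[int, float, float]:
    """Return (k, k/m, m/(m+k)); the two ratio definitions are assumptions."""
    k = sec_ded_check_bits(m)
    return k, k / m, m / (m + k)


if __name__ == "__main__":
    for m in (8, 16, 32, 64):
        k, redundancy, efficiency = redundancy_and_efficiency(m)
        print(f"m={m:2d} k={k} n={m + k:2d} "
              f"redundancy={redundancy:.1%} efficiency={efficiency:.1%}")
```

Under these assumptions the check-bit count grows by one each time the word length doubles, which is why the redundancy percentage falls as m increases.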
Cost and power requirements

The percentage of redundancy required, however, decreases rapidly as word length (m) increases. (For common word lengths, the ratios are shown in Table 2.) Similarly, coding efficiency increases rapidly.

The redundancy ratio is a fairly good measure of the cost increase resulting from the addition of error detection and correction to a memory system. This ratio exactly represents the relative cost of the additional storage required to store the error-code bits, but it does not include the cost of the parity generators, comparator, decoder, and error-correction logic. However, the quantity of these elements is fixed for any number of words. For a small-size memory, EDAC logic represents a larger percentage of the total cost. For a large-size memory, EDAC logic recedes to a smaller percentage of the total cost. To illustrate this point, the cost of the EDAC logic was less than 5% of the cost of a 64K x 32-bit MOS memory and only 1% of the cost of a 256K x 32-bit memory.

The increase in memory subsystem power is approximately proportional to the amount of additional storage plus a slight increase in the logic power for the EDAC circuits.

Figures 2 and 3 are provided as a convenient means of rapidly determining the MTBF for a wide range of memory sizes. In Figure 2 memory size in number of words stored is related to the number of RAM devices required, with word length as the parameter. Specifically, the figure is based on the equation d = nw/b, where d represents the number of devices, n the word length, w the number of words, and b the number of bits per device.

Figure 3 is a log/log plot relating the number of devices to the mean time between failures, with device failure rate as the parameter. This nomograph is based on the MTBF equation. For example, for a 16K x 16-bit memory, Figure 2 indicates that 256 1K devices would be employed. From Figure 3 read off the MTBF of 3900 hours. Similarly, a 256K x 32-bit memory has an MTBF of 120 hours, using the same device reliability. It should be noted that Figure 3 is based solely on the RAM devices and does not include other components making up the total memory. In effect, this nomograph reveals relative MTBFs for various size memories over a range of device failure rates.

The dominant cause of memory failure is the chip failure rate; therefore, Figures 2 and 3 provide a simple and fairly accurate estimate of performance.
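
The two nomograph lookups can be approximated numerically. The sketch below applies the device-count relation d = nw/b from the text. The MTBF equation behind Figure 3 is not reproduced in this section, so the code assumes the simplest series model with a constant device failure rate, MTBF = 1/(d x lambda). The failure rate of 1e-6 per hour is a hypothetical value chosen only because it lands near the hours quoted in the text. All function names are illustrative.

```python
def device_count(word_length_bits: int, words: int, bits_per_device: int) -> int:
    """d = n*w/b from the text, rounded up to a whole number of devices."""
    total_bits = word_length_bits * words
    return (total_bits + bits_per_device - 1) // bits_per_device


def series_mtbf_hours(devices: int, failure_rate_per_hour: float) -> float:
    """MTBF of identical devices in series with a constant failure rate.
    Assumed model: MTBF = 1 / (d * lambda)."""
    return 1.0 / (devices * failure_rate_per_hour)


K = 1024
ASSUMED_FAILURE_RATE = 1e-6  # hypothetical, failures per device-hour

if __name__ == "__main__":
    small = device_count(16, 16 * K, 1 * K)   # 16K x 16-bit memory, 1K x 1 devices
    large = device_count(32, 256 * K, 1 * K)  # 256K x 32-bit memory, 1K x 1 devices
    for d in (small, large):
        print(d, "devices,", round(series_mtbf_hours(d, ASSUMED_FAILURE_RATE)), "hours")
```

With the assumed rate, 256 devices give about 3906 hours and 8192 devices give about 122 hours, close to the 3900 and 120 hours read from Figure 3. The agreement suggests, but does not prove, that the nomograph uses a similar series model.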
[Figure 2. Memory size in words versus number of RAM devices required, with word length as the parameter. Horizontal axes: number of 1K x 1 MOS RAM devices and number of 4K x 1 MOS RAM devices (K = 1024).]
[Figure 3. Number of devices versus MTBF in hours (log/log plot), with device failure rate as the parameter.]
[Table column headings: data bits (m); error code bits (k); total word (m + k = n); 2m / [(m + k)(m + k - 1)].]
October 1976 47
This expression results from the binomial expansion of $(R + Q)^m$. The first term, $P_{0e} = R^m$, is the probability of all devices operating successfully. The second term, $P_{1e} = mR^{m-1}Q$, is the probability of only one defective device. The third term, $P_{2e} = m(m-1)R^{m-2}Q^2/2!$, is the probability of only two defective devices, and so on. The last term, $P_{me} = Q^m$, is the probability of all m devices failing [8].

From this expansion the probability of a single-bit error in the memory increment of m bits is given by

$$P_{1e} = mR^{m-1}(1 - R).$$

The probability of a double-bit error in the error-coded increment of m + k bits is given by

$$P_{2e} = \frac{(m + k)(m + k - 1)}{2} R^{m+k-2}(1 - R)^2.$$

Consider the probability of more than one error in the increment of memory:

$$P_{>1e} = 1 - P_{0e} - P_{1e} = 1 - R^m - mR^{m-1}(1 - R),$$

where $R(t) = e^{-\lambda t}$. The reliability function for a single increment is:

$$R_I(t) = 1 - P_{>1e} = R^m + mR^{m-1}(1 - R) = (1 - m)R^m + mR^{m-1} = (1 - m)e^{-m\lambda t} + me^{-(m-1)\lambda t}$$

(increment reliability).

The reliability function for a larger memory matrix of N increments is

$$[R_I(t)]^N = \left[(1 - m)e^{-m\lambda t} + me^{-(m-1)\lambda t}\right]^N,$$

where N is an integer $\geq 1$.

Assuming the reliability implications of EDAC are limited to the page level of memory organization, the total memory subsystem reliability $R_s$ is obtained from the reliability product rule [12] and is given by

$$R_s(t) = [R_I(t)]^N \prod_{j=1}^{L} R_j(t),$$

where L is the number of series components which contribute to the memory failure, $R_I(t)$ is the reliability of an increment, and $R_j(t)$ is the reliability of the subsystem components.
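
The binomial terms and the reliability product can be checked numerically. The sketch below evaluates the individual terms of the expansion of (R + Q)^m, the increment reliability (zero or one failed device), and the subsystem product. It assumes exponential reliability for the series components as well as for the RAM devices. The function names and the sample values (a 39-bit increment, a failure rate of 1e-6 per hour, 64 increments) are illustrative assumptions, not figures from the article.

```python
import math


def error_count_probability(m: int, j: int, r: float) -> float:
    """Term j of the binomial expansion of (R + Q)**m:
    probability that exactly j of m devices have failed."""
    q = 1.0 - r
    return math.comb(m, j) * r ** (m - j) * q ** j


def device_reliability(failure_rate: float, t: float) -> float:
    """R(t) = exp(-lambda * t) for a single device."""
    return math.exp(-failure_rate * t)


def increment_reliability(m: int, failure_rate: float, t: float) -> float:
    """R_I(t) = (1 - m) R**m + m R**(m - 1): at most one failed device."""
    r = device_reliability(failure_rate, t)
    return (1 - m) * r ** m + m * r ** (m - 1)


def subsystem_reliability(m, n_increments, failure_rate, t, series_rates=()):
    """R_s(t) = [R_I(t)]**N * prod_j R_j(t).
    Series components are assumed exponential with the given rates."""
    r_series = math.prod(math.exp(-rate * t) for rate in series_rates)
    return increment_reliability(m, failure_rate, t) ** n_increments * r_series


if __name__ == "__main__":
    m, lam, t = 39, 1e-6, 1000.0  # hypothetical sample values
    r = device_reliability(lam, t)
    p0 = error_count_probability(m, 0, r)
    p1 = error_count_probability(m, 1, r)
    print("P0e + P1e      :", p0 + p1)
    print("R_I(t)         :", increment_reliability(m, lam, t))
    print("R_s(t), N = 64 :", subsystem_reliability(m, 64, lam, t))
```

The first two printed values should agree to within rounding, which confirms that the closed form for the increment reliability is just the sum of the first two binomial terms.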
The subsystem $\mathrm{MTBF}_s$ is

$$\mathrm{MTBF}_s = \int_0^\infty [R_I(t)]^N \prod_{j=1}^{L} R_j(t)\,dt = \int_0^\infty \left[(1 - m)e^{-m\lambda t} + me^{-(m-1)\lambda t}\right]^N \prod_{j=1}^{L} R_j(t)\,dt.$$

Since the MTBF expression is formidable, we will look for an easier means of measuring improvement in reliability. A graphical means of interpreting the MTBF is given in Figure 6.

Curves similar to Figure 6 can be generated for any size memory from the curves plotted in Figure 5 by means of the following method (since the probability of more than two errors is small, we omit them): From Figure 5 determine the probability of a double error $P_{2e}$ for a number of time intervals T and construct a table for $P_{2e}^N$. Two or three points are all that is necessary for a log/log graph. The table below for Figure 6 is based on a 32-bit 64K word memory.

T       P2e      N     (1 - P2e)^N    P2e^N = 1 - (1 - P2e)^N
1020    0.001    64    0.938046       0.061954
4020    0.01     64    0.525582       0.474418

References

1. William W. Peterson and E. J. Weldon, Jr., Error-Correcting Codes, MIT Press, Cambridge, Massachusetts, 1972.
2. Shu Lin, An Introduction to Error Correcting Codes, Prentice Hall, Englewood Cliffs, New Jersey, 1970.
3. Elwyn R. Berlekamp, Algebraic Coding Theory, McGraw Hill, New York, 1968.
4. R. W. Hamming, "Error Detecting and Error Correcting Codes," Bell System Technical Journal, Vol. 29, No. 2 (April 1950), pp. 147-160.
5. Frederick J. Hill and Gerald R. Peterson, Introduction to Switching Theory and Logical Design, John Wiley and Sons, Inc., New York, 1968.
6. Jack Goldberg, Karl N. Levitt, and John Wensley, "An Organization for a Highly Survivable Memory," IEEE-TC, Vol. C-23, No. 7 (July 1974), pp. 693-705.
7. J. M. Wiesen, "Mathematics of Reliability," Proc., 6th National Symposium on Reliability and Quality Control, January 1960.
8. Bertram L. Amstadter, Reliability Mathematics: Fundamentals; Practices; Procedures, McGraw Hill, New York, 1971.
9. Military Standardization Handbook, Reliability Prediction of Electronic Equipment, MIL-HDBK-217B, 20 September 1974.
10. W. C. Carter, D. C. Jessep, and A. Wadia, "Error-Free Decoding for Failure-Tolerant Memories," Proc. IEEE Computer Group Conference, June 1970.
11. W. C. Carter, K. A. Duke, and D. C. Jessep, Jr., "Lookaside Techniques for Minimum Circuit Memory Translators," IEEE-TC, Vol. C-22, No. 3 (March 1973).
12. Randall C. Cork, "Reliability with Error-Detecting and Correcting Codes in Semiconductor Memories," Ph.D. dissertation, Arizona State University, 1975.