Académique Documents
Professionnel Documents
Culture Documents
Author(s): E. R. Berlekamp
Source: Journal of the Royal Statistical Society. Series A (General), Vol. 135, No. 1 (1972), pp.
44-73
Published by: Blackwell Publishing for the Royal Statistical Society
Stable URL: http://www.jstor.org/stable/2345039
Accessed: 16/09/2010 10:07
Your use of the JSTOR archive indicates your acceptance of JSTOR's Terms and Conditions of Use, available at
http://www.jstor.org/page/info/about/policies/terms.jsp. JSTOR's Terms and Conditions of Use provides, in part, that unless
you have obtained prior permission, you may not download an entire issue of a journal or multiple copies of articles, and you
may use content in the JSTOR archive only for your personal, non-commercial use.
Please contact the publisher regarding any further use of this work. Publisher contact information may be obtained at
http://www.jstor.org/action/showPublisher?publisherCode=black.
Each copy of any part of a JSTOR transmission must contain the same copyright notice that appears on the screen or printed
page of such transmission.
JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of
content in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms
of scholarship. For more information about JSTOR, please contact support@jstor.org.
Royal Statistical Society and Blackwell Publishing are collaborating with JSTOR to digitize, preserve and
extend access to Journal of the Royal Statistical Society. Series A (General).
http://www.jstor.org
J. R. Statist. Soc. A, 44 [Part 1,
(1972), 135, Part 1, p. 44
By E. R. BERLEKAMP
Bell TelephoneLaboratories,Inc.4,
SUMMARY
This paperpresentsa surveyof coding theory for statisticiansand mathema-
ticians who have some familiarity with modern algebra, finite fields and
possibly some acquaintancewith block designs. No knowledgeof stochastic
processes, informationtheory or communicationstheory is required.
Keywords: CODING THEORY
1. INTRODUCTION
1.1. Background
WE assume that the reader is familiar with the following properties of finite fields:
There exists a finite field having q elements if and only if q is a prime power. The prime
of which q is a power is called the characteristicof thefield. The field with q elements
is unique. It is called the Galoisfield of order q and is denoted by GF(q). Addition,
subtraction, multiplication and division (except by 0) in this field can be performed by
standard techniques. The nonzero elements of any finite field form a cyclic multipli-
cative group. If o is a generator of this group, then oxt= 1 if and only if i is a multiple
of q- 1. In GF(q) the polynomial xq-1 I factors into linear factors:
q-2
xq -1 -l (X_(-ot).
i=:O
means that for all n, Cn - iABni Similarly, the generating function equation
A(z) = C(z)/B(z) holds if and only if A(z)B(z) = C(z). In particular,
co
E atiZi = 1// (1-Z)
i=o
is a generating function identity whose validity does not depend on the convergence
of the sum for any particular values of a or z.
The reader should also know that there now exists a collection of good algorithms
for solving algebraic equations over finite fields, including those of Berlekamp (1970)
and Chapter 6 of Berlekamp (1968). It is not necessary for the reader to be familiar
with the details of these algorithms, but be should realize that it is possible to factor
(into irreducible factors) any polynomial of degree m over GF(q) in about (m log q)3
computations. In this paper, we are therefore able to consider a computational
problem "solved" when it is reduced to the problem of finding the roots of some
polynomial in some finite field.
It follows that there exists a k-dimensional linear binary code of block length n and
minimum nonzero distance d for any values of n, k and d which satisfy this inequality.
48 BERLEKAMP- Survey of Coding Theory [Part 1,
If u> 2p, the central limit theorem assures us that in a sufficiently long block of n
digits the weight of the channel error word will almost certainly be less than un/2, so
that if t = un/2, a t-error-correcting decoding algorithm will decode correctly with
very high probability. Therefore, on a channel with crossover probability p < i, the
Gilbert bound assures us that there exists a sequence of codes, having longer and
longer lengths but each having the same rate, with the property that the probability
of error using bounded distance decoding algorithms approaches zero as the code
length approaches infinity.
Using more complicated arguments which we shall not reproduce here, it can also
be shown that no sufficientlylong block code (linear or nonlinear) which has minimum
nonzero weight d = un can have more than about 2R*(u)n codewords. The functions
R(u) and R*(u) are plotted in Fig. 1. The derivation of R*(u), originallydue to P. Elias,
is given in Chapter 13 of Berlekamp (1968). Tighter upper bounds on the minimum
distance of binary block codes of short to moderate lengths have recently been obtained
by Johnson (1970), but the new bounds still have the same asymptotic form as the
Elias bound.
There are several values of n and d for which it is known that the biggest block
code of length n and distance d is nonlinear. For example, the biggest linear code with
n = 15 and d = 5 has 27 codewords, but the biggest nonlinear code has 28. However,
it is not known whether the asymptotic lower bound of Fig. 1 can be raised for
non-linear codes or whether the upper bound of Fig. 1 can be lowered for linear
codes.
4.
0;1
_ R Rate e
FiG. 1.
The columns of the parity check matrix of a Hamming code must be the set of all
nonzero binary m-tuples. The ordering of these columns is arbitrary, but some
orders have certain advantages over others. The most convenient orderings, called
cyclic orderings,are based on the cyclic multiplicative group of the Galois field GF(2m).
It is well known that this field, like all finite fields, contains at least one primitive
element, which we call as, which satisfies the equation oci= 1 if and only if i is a
multiple of 2m- 1. The powers of otinclude all nonzero field elements, and each may
be expressedas a sum of the basis elements1, ot,a2, ..., am-'. The representationof
each power of ot may be uniquely determined from the minimal polynomial of ox,
which has degree m.
50 BERLEKAMP- Survey of Coding Theory [Part 1,
(Y5 = (X2 + ol
of6= o 3 + o2
oX7o= 03 + of+
ot8= oc2 +
ot 14 =013 +
o 15 =
The ith column of the cyclic Hamming code may be taken as HM= oi, where o&iis
represented as an m-dimensional binary column vector, [Ao,A1, ...., Ami,]T, where the
A's are the binary numbers determined by the equation o&i= Em='A' )o4. For
example, if m = 4 and c4+c +1 = 0, we have
H= 1 0 0 1 1 0 1 0 1 1 111]
0 10 1 1 0 1 0 1 1 1 1 0 0 0
_0 0 1 0 0 1 1 0 1 0 1 1 1 1 0_
formed by taking any cyclic shift of the columns of H still defines the same code. In
fact, the conventional definition of the parity-check matrix of a cyclic Hamming code
is not the one given above, but rather
H [O, gl, ..., On-1].
The coefficients of the words are correspondingly labelled from 0 to n-1. The
polynomial whose coefficients are the components of a codeword C is denoted by
C(z) = >2C*zi. Clearly C(z) is a codeword if and only if C(ac)= 0. But oais a root
of a binary polynomial if and only if that polynomial is a multiple of the minimal
polynomial of oa, which we denote by M(z). In general, M(z) = Him1(z- o2i). In
the present example, M(z) = z4 + Z + 1. Thus, the 21"codewords of the binary cyclic
Hamming code of length 15 may be taken as the 21"binary polynomials of degree
< 15 which are multiples of M(z) = Z4 + Z + 1.
1972] BERLEKAMP- Survey of Coding Theory 51
g(z) = fI (z -a,i
ieK
If K is taken as an arbitrary set of integers mod n, then the coefficients of the g(z)
defined by this equation will lie in GF(qm),but they may not lie in the subfield GF(q).
The necessary and sufficient condition that all coefficients of g(z) lie in GF(q) is that
K be closed under multiplication by q mod n. In other words, qK = K. The quotient
of zn -1 divided by g(z) is conventionally denoted by h(z). It is given by the formula
h (z)=n I(z - i),
ieK
where K is the set of integers mod n which are not in E. In other words, K and R are
complementary sets.
The dimension of the linear cyclic q-ary code with generator polynomial g(z) is
n - degg. Since the degree of g(z) is the number of elements in the set K, the dimension
of the code is the number of elements in the set K. This is often denoted by k = IK!.
52 BERLEKAMP- Survey of Coding Theory [Part 1,
go g1 g2 .. gn-k 0 0 o]
G= go g ...g ... gn-k ? ... 0
[ ...g0 g1 .. . .. gn-k
H [ hk ... h2 h0 ho ? ? ... 0
0 O ... hk .. .. .. . . ho
The proof that GHT = 0 follows directly from the equation g(z) h(z) = z_- 1. Because
of the form of the above parity-check matrix, h(z) is often called the parity-check
polynomial. In practice this is sometimes not the most convenient basis of the row
space orthogonal to G. In a previous example we considered the binary Hamming
code of length15,for whichg(z) = 1+z+z4 andh(z) = zll+z8+z7+z5+z3+z2+z+ 1.
The rows of the parity-check matrix given in that example are now seen to be various
cyclic shifts of h(z). This is no accident, for all 24 words in the null-space of G,
excepting the all-zero word, are cyclic shifts of h(z). Any set of 4 linearly independent
shifts of h(z) could be chosen as the rows of H.
If gaug(z) is a polynomial not divisible by z- 1, and gexp(z) = (z- l)gaug(z), then
the pair of cyclic codes with these two generator polynomials are often closely related.
The code generated by gexp(z) is called the expurgated code and the code generated
by gaug(z)is called the augmentedcode. The expurgatedcode is clearly a subcode of the
augmented code. Starting from the generator matrix of the augmented code, we may
form an extended cyclic code of length n + 1 by annexing an overall parity check to the
cyclic code generated by g,ug(z). The generator and check matrices of the extended
code are as follows:
0
0
Hext = Haug
0
111 ... 11
1972] BERLEKAMP- Survey of Coding Theory 53
We may also form the lengthenedcyclic code of length n+ 1 from the cyclic code
generated by g,xP(z). Its generator and check matrices are given by
0
0
Glen = Gexp
0
11 ... 1 -n
yln
L y~~/n.
It is easily verified that Gext HT= 0, GienHen = 0 and that G HeT 0. It
follows that the code obtained by lengthening Gexp is identical to the code obtained
by extending Gaug. In some cases it is possible to show that the extended code is
invariant under a transitive group of permutations of its coordinates, and this then
enables us to obtain certain information about the weight structure of the original
(unextended) code from the following theorem of Prange:
Prange's Theorem. Let aj denote the number of codewords of weight j in an
original binary code of length n. If the extended code is invariant under a transitive
group of permutations of its codewords, then
do ~~~N
-4 n
02i-i
AZ1
0
0
?2i
We shall later show that the rows of this N x N matrix are linearly dependent,
and in fact the dimension of the row space of this matrix is N/2. The restriction that
n=_+ 1 mod 8 is equivalent to the assumption that 2 is a quadratic residue modulo n.
The motivation for labelling the N rows and columns of G with symbols from
GF(n) u {oo}is the following theorem.
Theorem. Every extended binary QR code is invariant under the permutation
Cu" C-.-., where the arithmetic operations on the subscripts are performed in
GF(n) U {oo}with + 0-1 =ox _+ ro'- = O.
Proof. We claim that if u# 0
GU,v+Go,v if u E R1,
Gu,v+Go,v+G,v if ucER.
This equation follows immediately if v = 0 or so or if u = v. To check the other
cases, we notice that the necessary and sufficient condition that G-u 1__-1 + Gu,v = 0
is that (v-u) (-v-? + u-1)E Ro. But
Since [fQ(o)]2 = [f(x2i)] = f(x1), it follows that fQtxJ)E GF(2) for all j. If j E Ro, then
f(Oc) = f(o6), but if j E R1, then f(Oyi) = 1 +f(c() because f(Q) +f(O&')
+ 1 is the sum of
all nth roots of units. By proper choice of the primitive element cx, we can assume
that f(ac) = 0, and that the greatest common divisor off(x) and xn -1 is given by
g(x) = :r (x -a, ).
ieRo
This equation provides an alternative definition of the augmented QR codes directly
in terms of the roots of their generator polynomials. This equation can also be used
to define the augmented q-ary quadratic residue code of length n whenever n is prime
and q is a quadratic residue modulo n.
56 - Surveyof CodingTheory
BERLEKAMP [Part 1,
But the weight of the product mod (xn -1) cannot exceed the product of the weights,
so we deduce that the minimum weight, d, of the augmented binary QR code of length
n satisfies the inequality d2> n.
We have thus shown that the extended binary QR codes, each of which has rate
21 , have minimum distances approaching oo with increasing block length, although
=
the bound is disappointingly weak. The Gilbert bound assures us that there exist
arbitrarily long codes of rate 2 whose distances grow linearly with block lengths.
However, it is not known whether or not any subsequence of the QR codes has this
desirable property.
Although several short QR codes are known to have minimum distances greater
than 2 + IN, the actual distances of all sufficiently long QR codes are unknown. The
best asymptotic lower bound known for their minimum distances is VN. Further
known results about QR codes are given in Section 15.2 of Berlekamp (1968).t
Notice how the bound d>4N depends on Prange's theorem. If C(?)(x) were
divisible by (x-1), then we would have C(0)(x)C(1)(x)=0modxn-1, from which
nothing could be concluded. Thus, without the transitive group of permutations
which preserve the extended code, we could only deduce that the minimum odd weight
was > Vn. In some cyclic codes, the minimum odd weight is actually much greater
than the minimum nonzero even weight. For example, it is known that the cyclic
binary code of length 25 whose generator polynomial's roots include ci for all i in the
set JC= {0, 5, 10, 15, 20} has minimum odd weight 5, but the minimum nonzero even
weight is only 2.
6. THE AFFINE GROUPS
Although the extended quadratic residue codes are the only codes which are known
to be invariant under the projective unimodular subgroup of the linear fractional
group, many cyclic codes are known which have extensions that are invariant under a
differenttransitive group of permutations. The length of each q-ary cyclic code whose
extension is invariant under this type of group is of the form n = qm-1. The co-
ordinates of the cyclic code are associated with the nonzero symbols in GF(qm). The
extended code is invariant under the cyclic shift, which fixes 0 and cycles the nonzero
coordinates according to the permutation Cu,- Cv, v = ocu in GF(qm). The group of
all cyclic shifts is just the multiplicative group of GF(qm), C.-> Cv, v = fu, where
u, v e GF(qm) and / e GF(qm)- {0}. The extended cyclic code may also be invariant
under the additive group in GF(qm). This group is more conveniently called the
translationalgroup, C. -* Cv, v = u +/3, v, u, 3 e GF(qm). If the extended cyclic code is
invariant under both cyclic shifts and translations, it is invariant under the group
generated by these permutations. This is the little affinegroup, each of whose permuta-
tions is of the form C,-> Cv, v = au + b, u, v, b E GF(qm) and a e GF(qm)- {0}. The
order of the little affine group is qm(qm- 1). In certain cases the code is also invariant
under the big affine group, whose general permutation is of the form C.-C,
v = A u + b, where now u, v and b are m-dimensional vectors over GF(q), and A is an
invertible m x m q-ary matrix. The order of the big affine group is qmHIm j(qm qi),
and it includes the little affine group as a subgroup. The little affine group is doubly
transitive for every q, and the big affine group is triply transitive if and only if q = 2.
The following well-known number theoretic result of Lucas (1878) proves useful in
characterizingcyclic codes whose extensions are invariant under the little affinegroup:
t The third printing of this book contains a substantially more up-to-date listing of known
minimum distances of QR codes than either of the first or second printings.
58 BERLEKAMP- Survey of Coding Theory [Part 1,
J = E JiA ?<JP9 j
i
then
Nimodp.
A proof may be found on p. 113 of Berlekamp (1968). Since the lemma implies that
( =) 0 if andonlyif Ji < Ni for all i, it is naturalto introducethenotationJc N if and
only if every digit of the p-ary expansion of J is no greater than the corresponding
digit of the p-ary expansion of N. It is easily seen that this relationship is transitive.
We say that J is a p-ary descendant of N if and only if Jc N.
Theorem. [Kasami et al. (1966).] If q is a power of the prime p, then the extended
cyclic code of length N = qm, obtained from the expurgated cyclic code of length
n = qm- 1 with generator polynomial given by
g(x) = HI(x - oi), a a primitive element in GF(qm)
iEK
is invariant under the little affine group if and only if the set 9 contains all of its own
p-ary descendants.
Proof. Let C(x) = En-1O C xi be an arbitrary q-ary polynomial of degree <n,
and let S(C,j) denote the value assumed by this polynomial at x = 0xj.Then
S(C,j) = Cot.
i
Clearly C(x) is a codeword of the expurgated cyclic code if and only if S(C,j) = 0
for all jE. If j7A0, then Oi= 0 and S(C,j) is also given by
S (C,j)= E Cuui.
ueGF(q"m)
If we define 00 = 1, and
S(C,0)= E Cu0u
ueGF(q")
then the condition S(C, 0) = 0 is necessary and sufficientfor COto be the overall parity
check.
We now let TA(C,j) be S(C',j), where C' is the translation of C by an element
A E GF(qm). Specifically,
TA(C,])= E C U1
ueGF(qm)
= z ev+)
veGF(qm)
veGF(qm) i I
1972] - Survey of Coding Theory
BERLEKAMP 59
But GF(ql")has characteristicp, so we may restrict the sum on i to the p-ary descen-
dants of j, interchange the order of summation and obtain
Now if ? contains all of its own descendants, then S(C, i) vanishes for every i cj
wheneverj A-K and C is a codeword. Similarly, T,(C,j) = 0 for all j eKJ, thus implying
that the translation of C is another codeword. Since C might have been any codeword,
the extended code is invariant under the translational group.
On the other hand, if X does not contain all of its own descendants, then we may
select a codeword C for which S(C,j) = 0 if] e- J, Si = 1 if]j 2.t We may also select
a j E KCwhich contains all of its descendants except some i -j for which j-i is a
power of p. For this j, we have
TA(C,j)= EAk Pk,
k
divides TX(C,j). But the degree of TX(C, j) is less than j, which is not more than
n = qm- 1. It follows that there exists a A E GF(qm) which translates C into a vector
which is not a codeword. Q.E.D.
This theorem provides a rather easy test to decide whether or not any given
extended cyclic code is invariant under the little affine group. We write out the m-digit
q-ary expansion of each of the numbers of the set X, then expand the q-ary expansions
into p-ary expansions if necessary, and then check to see whether X contains all of its
own descendants. Since multiplication of a number by q modulo qm- 1 is merely a
cyclic shift of the m-digit q-ary expansion of that number, it is natural to represent
the elements of X in terms of their m-digit q-ary expansions whenever n = qm-1. The
condition that qIC= J merely requires that the cyclic shifts of the m-digit q-ary
representation of each number in X must be another number in E.
If j is a number relatively prime to qm- 1, then the permutation u -* uj in GF(qm)
permutes each extended cyclic code into an equivalent extended cyclic code. This
permutation corresponds to multiplying the set KCby j. Since ancestry is generally
not preserved by this permutation, it follows that some extended cyclic codes which
are not invariant under the little affine group are equivalent to other cyclic codes
which are. Of course, any code equivalent to a code which is invariant under the
little affine group is itself invariant under a doubly transitive permutation group
isomorphic to the little affine group. In fact, the digits of such a code can still be
directly associated with the elements of GF(pm) in such a way as to leave the code
invariant under the little affine group. However, this correspondence is not the one
assumed in the theorem, which associates the overall parity check with 0 e GF(qm)
and then associates each of the successive digits of the cyclic code with the successive
powers of the primitive element a. If one associates the successive digits of the code
with the successive powers of a wrong primitive element, then the code is not invariant
under the translational group because the addition in GF(qm)is not correctly defined.
t The proof that in fact there exists such a codeword uses the Mattson-Solomon polynomial,
which appears later in this paper, in the proof of the BCH bound.
60 BERLEKAMP- Survey of Coding Theory [Part 1,
0 1 a a2 a3 a4 aN-2
where each of the last m rows is the m-dimensional binary vector obtained by repre-
senting each element of GF(2m) in terms of some m-dimensional basis over GF(2).
We may also write the same G in terms of its rows as
1
c(l)
G= C(2)
C(m)
is a unit vector. The proof follows from the fact that the 2m columns of the last m
rows of the G matrix each contains a different m-dimensional binary vector. The
column consisting of m ones occurs only once, and each of the other 2m-1 columns
contains at least one zero entry. This property is preserved when 1 is added to any
subset of the rows.
Since the 2m unit vectors are linearly independent, it follows that the 2m vectors
of the form
m
I(CMi+ Ai 1)
On the other hand, the dimension of the rth order RM code is IKI= E (.),
becausethereare ('.) integersas,0 < o,< 2m- 1, whichhave w2(CX)
= m-i. We have
thus provedthe followingresult:
Theorem.The codewordsof the rth orderRM code are the polynomialsof degree
at most r in m binary variables.
The canonicalbasis for the rth order RM code consists of the productsof all
combinationof at most r C's. In termsof the co-ordinatizationof the code, each of
thesebasis vectorsis the indicatorfunctionof a certainaffinesubspaceof dimension
k m - r. It is possible to select anotherbasis which consists entirelyof indicator
functionsof (m- r)-dimensionalaffinesubspaces. To do this, we simply represent
each affinesubspaceof big dimensionas the union of smalleraffinesubspaces. For
= CM CM2+ C(M)(C(2)
example, CM1) + 1), so that if r>2, the canonical basis vector
C(), which is the indicatorfunctionof an (m- 1)-dimensionalaffinesubspace,may
be replaced by C(M)(C(2)
+ 1), which is the indicator function of an (m - 2)-dimensional
affine subspace. The purpose of selectinga basis consistingentirely of indicator
functionsof (m- r)-dimensionalaffinesubspacesis to show that the rth orderRM
code is the smallestlinearcode containingall such codewords. It also follows that
a co-ordinatepermutationpreservesthe rth orderRM code if and only if it preserves
all (m - r)-dimensional subspaces. This group is well known.
Theorem.The group of all permutationsof co-ordinateswhich reservethe rth
orderRM code, 0 < r < m-1, is the big affinegroup.
Trivial exceptionsoccur when r = 0 or m- 1, and in these cases the code is
preservedby the full symmetricgroup of all permutationsof co-ordinates. When
r = 0, there are only two codewords, 0 and 1; when r = m -1, every word of even
weightis a codeword.
1972] BERLEKAMP- Survey of Coding Theory 63
Although we have shown that the rth order RM code contains codewords of
weight d = 2m-r, we have not yet shown that this is the minimum weight of the non-
zero codewords. This fact may be proved in several ways, and the astute reader may
be able to devise an elegant proof of his own. However, we shall adopt a more devious
proof, which is constructive from the decoder's point of view. We shall exhibit a
decoding procedure and prove that it works whenever the error pattern has weight
less than d/2.
7. THRESHOLD DECODING
In general, the decoder wishes to separate the received word, R, into the sum of the
transmitted codeword and the channel's error word, E. To this end, he can compute
the dot product PRT for various vectors P. If P is in the null space of the code, then
P is said to be a parity-check equation and PRT = PET. The decoder must attempt
to determine E from his knowledge of PET for various P's in the null space of the
code. Since the various components of E are given by UET for various unit vectors
U, we may rephrase the decoder's problem as the determination of QET for certain
vectors Q not in the null space of the code from his knowledge of PET for P's which
are in the null space of the code. The eventual goal is the determination of QET for
the unit vectors Q, but a good set of subgoals is often the determination of QET for
certain Q's which have smaller weights than any nonzero vector in the null space of
the code.
In the particular case of Reed-Muller codes, the vectors P in the null space of the
code are easily characterized. One immediate consequence of the definition of GRM
codes is that the orthogonal complement of the rth order GRM code is equivalent to
the ((q - l) m -1- r) order GRM code. It follows that the null space of the
((q -1) m -1- r)th order code contains the indicator function of every affine subspace
of dimension > r. However, the null space does not contain the indicator functions
of the r-dimensional affine subspaces.
Let S be an r-dimensional affine subspace of GF(2m) and let Q(S) be its indicator
function. The entire space of m-dimensional binary vectors may be partitioned into
2m-r disjoint sets, each of which is a translate of S. We denote the j = 2m-r -1
translates which are distinct from S by T1,T2,.. ., Ti, and their indicator functions by
Q(T1),Q(T2),...,Q(Tj). Now since S is an r-dimensional affine subspace and Ti is
one of its translates, S U Ti is an (r + 1)-dimensional affine subspace. It follows that
Q(S) +Q(T) is in the null space of the rth order RM code. Thus, although the decoder
cannot determine Q(S) ET directly, he can determine Q(S) ET+ Q(T) ET for
= 1,2,..., 2m-r-1. For each i, this provides an estimatet of Q(S) ET namely
Theorem. If w(E) < 2m-r-1 and if PETis knownfor everyP whichis the indicator
functionof an (r+ l)-dimensionalaffinesubspace,then QETmay be determinedby
majority vote among the 2m-r- 1 values of PET for which POQ = Q.
Oncethe decoderhas computedQETfor those Q's whichare indicatorfunctions
of r-dimensionalsubspaces,he can then use the same procedureto determineQET
for those Q whichare indicatorfunctionsof (r- 1)-dimensionalsubspaces,and con-
tinue until he finds TJET for the unit vectorsU. If w(E)< 2m-r-1,every election is
correctlydecidedand the decoder'sfinal evaluationof E is correct. It follows that
the rth orderRM code has minimumdistanced> 2m-r- 1. Sinceall codewordshave
even weightsand the indicatorfunctionsof (m- r)-dimensionalaffinesubspacesare
codewords,it is clearthat d=2m-=.
From a practicalpoint of view, thresholddecodingis most attractivewhen r is
small. For moderatevaluesof r the numberof intermediatesubspaceswhose parity
checksmust be computedcan be very large.
Notice that thresholddecodingsucceedsin correctingmanyerrorwordsof weight
> d/2. If d and N are large, then most errorpatternsof weight slightlymore than
d/2 will not be evenlydistributedamongthe translatesof any r-dimensionalsubspace
S. Less than half of these 2m-r- 1 translateswill contain an odd numberof errors
and the electionswill determinethe correctvalues of Q(S) ET for all r-dimensional
subspacesS.
If m is odd, the (mi- 1)/2 orderRM code of lengthN = 2m has rate 2, and distance
d = VN. Thus,the RM codes includea sequenceof codes of increasinglengths,each
of whichhas rate i, and the minimumdistancesof these codes comparefavourably
withthe bestlowerboundknownfor the minimumdistancesof the extendedquadratic
residuecodes,for whichd> VN. However,thislowerboundon the minimumdistance
of QR codes appearsto be weak, and in all knowncases the QR codes are actually
as good as or better than the RM codes, when goodness is measuredin terms of
minimumdistance.Thethresholddecodingalgorithmis the majorpracticalattraction
of RM codes. No comparablygood methodfor decodinglong QR codes is known,
even whernw(E) < I(N)/2.
The kntownpropertiesof RM codes enableus to obtainupperand lowerbounds
for the minimumdistancesof various other binarycyclic codes of lengths 2m- 1.
By examiningthe set of roots of the code'scheckpolynomial,we can find its biggest
RM subcodeand its least RM supercode.If
w2(a)<m-rVaeIR and m-r'<w2(a)Va eK,
then the code is a supercodeof the expurgatedrth orderRM code and a subcodeof
the r'th orderaugmentedRM code. Its minimumdistance,d, is then boundedby
2m-r - 1 ? dA 2m-r. In practice, these bounds are often very crude. A stronger
techniquefor boundingthe minimumdistanceof an arbitrarylinear cyclic binary
code was introducedby Bose and Chaudhuri(1960)and Hocquenghem(1959).
8. BCH BouND
Theorem.If k contains dBCEL-1 consecutiveelements,modulo n, and if n is
relativelyprimeto q, then thereare no nonzerocodewordsof weightless than dBcH
in the linearcyclicq-arycode of lengthn and generatorpolynomial
g(x) = H (x- 00),
ier
Proof: Mattson and Solomon (1961): We first obtain an expression for the co-
efficients of the codeword C(x) in terms of the values C(ai). It is evident that
n-1
= Cj ij
j_O
C, = E
z C(ci) a-
niK
There are dBCH -1 consecutive elements not in K. By appropriate choice of b we
may translate this gap to n-(dBCH - 1), n - (dBCH- 2), ..., n -1. In other words, we
choose b so that for all i E K+ b, O< i - n-dBCH. We then replace i by i-b to obtain
c,,lt
Cl,= E C((ai-b) a6-*l
n ieK+b
or
C, =-F(U-9
McEliece's theorem
If K does not contain any j-tuple, xl, x2, ..., Xj, for which 2=j xi = 0,t then the
weight of every codeword in the binary cyclic code generated by g(z) = IcIy (z - Ai)
is a multiple of 2i.
Corollary 1. All weights in the expurgated binary QR code of length n =-1 mod 8
or the extended QR code of length N_ 0 mod 8 are divisible by 4.
Proof. In this case, - 1 is a nonresidue, the negative of every residue is a non-
residue, and no pair of residues (or nonresidues) can sum to 0.
Corollary 2. All weights in the rth order RM code of length 2m are divisible by
2(m-1)/r
Proof. Multiply K by -1 so that each integer in K has at most r ones in its binary
expansion. In general,W2(i+ 1) ? W2(i) + W2(), from which it follows that the binary
expansion of the sum of any j elements of K has at most jr ones. Ifj] (m - 1)/r, then
jr < m, so the sum cannot be congruent to 0 mod 2m- 1 because the binary expansion
of 2m-1 has m ones.
For j = 1, the McEliece theorem is straightforward. In this case, 0 e K, sO 0 e
and ao = 1 is a root of every codeword. Thus C(1) = 0 in GF(2), which implies that
the weight of C is even.
Even for j= 2, the McEliece theorem is relatively deep. The special case given
as Corollary 1 was earlier proved by Gleason, but the general McEliece theorem for
j = 2 was first given by Solomon and McEliece (1966). The generalization to j 3
appeared in the thesis of McEliece (1967).
The McEliece theorem enables us to strengthen the bounds on the minimum
distances of several codes. According to Corollary 1, all weights of the extended
binary QR code of length 24 are divisible by 4, so from Prange's theorem we conclude
that the minimum distance of the augmented QR code of length 23 must be congruent
to -1 mod 4. Since both the square root bound for QR codes and the BCH bound
give d) 5, it follows that d) 7. A simple counting argument,
(23) + (23) + (23) + (23) 211
shows that no coset can have weight greater than three, so in fact d = 7.
The first Kasami-Tokura code is the primitive BCH code of length 127 and
dd=29, for which K=0011101, 0011111, 0101011, 0101111, 0110111, 0111111,
1111111, and their cyclic shifts, making the dimension of the code 43. It is easily
verified that any other 7-bit binary number has at least one cyclic shift less than 29.
Since every element of K has at least four ones in its binary expansion, this BCH code
is a subcode of the third-order augmented RM code. By Corollary 2 and Prange's
theorem, all weights in this code are congruent to 0 or -1 mod 4. Thus, d# 29, so
d) 31. But the BCH code is also a supercode of the fourth-order RM code, so d = 31.
No primitive binary BCH code is known whose minimum distance exceeds the
smallest value compatible with both the BCH bound and the McEliece theorem, but
a number of other techniques have appeared in the literature for obtaining slight
improvements of the lower bounds on the minimum distances of certain imprimitive
BCH codes. The work of Mattson and Solomon (1961) and Cerveira (1968) yields
conditions sufficient to conclude that d) dBose + 1 in certain cases. These conditions,
too complicated to be presented here, were later refined by Lum and Chien (1968).
t The sum of the x's is taken modulo n.
68 BERLEKAMP- Survey of Coding Theory [Part 1,
Hartmann and Tzeng (1970) obtained separate bounds for the minimum odd weight
and the minimum nonzero even weight. A summary of these and other results on the
minimum distances of imprimitive BCH codes is given by Hartmann (1970).
The rates and distances of long BCH codes turn out to be related in a very
interesting way. If we consider a sequence of BCH codes of increasing block lengths
in which the distances grow linearly with block length (i.e. d = un, where u is fixed
while d and n go to infinity), then we find that the number of information symbols
increases as a power of the block length. More precisely, if we let I(q, n, d) denote the
number of information symbols in the q-ary BCH code of length n and designed
distance d, then Berlekamp (1968, Chapter 12) has shown that
I(2, 2m_- 1,u21n)Z ns(u)
in the sense that
where if e > 0, A may be bounded in the interval - - E < A < + E for all sufficiently
large n. Since 86 approaches 1 as 8 approaches 0, we have the cruder approximation
ln R-
logn
1.0 Il
0.9s
0.8
0.7
0.6
0.4 I II0
0.3
0.2
0.1
0 .
0 0.0001 0.0010 0.001i 0.0100 0.0101 0.0110 0.0111 0.100
u(in binary)
FIG. 3.
This approximation may be obtained from our earlier assertion that R 1ns(uV)- and
the fact that for small u, s(u) 1- u/ln 4. The behaviour of the singular function
s(u) for small u is really quite remarkable. Although s'(u) = 0 for almost all positive
u, s'(0) = - I/In4.
The approximation d - 2n In R-1/log n shows that the distances of long BCH codes
are much better than RM codes or the best lower bounds known on the distances
of QR codes. At R = 2 the distance of BCH codes grows not as Vn, but as n In 4/log n.
70 BERLEKAMP- Survey of Coding Theory [Part 1,
Si= E(ci) = Y
yj(e)i
j
The error word is then characterized by its error locations, Xi = o'PiE GF(qm), and its
error values, Yj e GF(q). In order to determine the error locations, the decoder first
attempts to find the coefficients of the error locator polynomial
a(Tz) = 1I (-Xj Z) (I -zalli)
The degree of a(z) is the weight of E(x), and the reciprocal roots of a(z) reveal the
locations of the errors. In order to find the error values, the decode first attempts to
find the coefficients of the error evaluator polynomial,
cw(z)= a(z) + YjXj z HI(1-Xi z).
The degree of co is not greater than the degree of a. Once a, w and the error locations
are known, the error values Ykmay then be determined by evaluatingw(z) at z = X-1
which yields the formula.
Thus, the determination of E(x) from a(z) and w(z) is a straightforward procedure;
the decoder's major problem is to find the coefficients of a(z) and w(z). To obtain
relations between a(z) and w(z) and the known Sl, S2,..., we divide co by a to obtain
-=1+ Xi E+Zm.+
=1
a j I1-Xjz m=1
If we define the generating function S(z) = SmlSm zm we obtain the equation
(1 + S) a = w. Of course, the decoder does not know all of the coefficients of 1 + S; he
generally knows only S., S2, ..., S2t. He must therefore attempt to obtain a and
from the key equation,
(1 + S) a=co modz2+1.
1972] BERLEKAMP- Survey of Coding Theory 71
The best method known for solving this equation is by a sequence of successive
approximations, o(J),CO, a(1) ( *(1) . ,u5(2t), wo(2t), each pair of which solves an equation
of the form (1 + S) r(k) =W(k)mod zk+l. It is also necessary to give a recursive
definition of an integral-valued function D(k) which serves as an upper bound on
dego(k)and degCO(k). To obtain u(k+l) and w(k+l) from (k) and Cl, an additional
pair of polynomials, 7(k) and y(k), are needed. For certain applications of the algorithm
to decoding nonBCH codes, it is also convenient (although not strictly necessary) to
introduce another function, B(k), which allows the algorithm to break ties accordingly
as B(k) = 0 or 1.
The complete algorithm is as follows:
An algorithmfor solving the key equation over anyfield
Initially define ()- = 1, T(?)= 1, y(o) - 0, D(O) = 0, B(O) =, co(?)(0)= 1. Proceed
recursivelyas follows. If Sk?l is unknown, stop; otherwise define A(k) as the coefficient
of zk+l in the product (1+ S) u(k) and let
u(k+l) = a(k) - A k) zr(k)
-(k)
c(k?1) = - AIk) zy(k).
If (k) =0 or if D(k) > (k+ 1)/2, or if A(k)#0 and D(k) = (k+ 1)/2 and B(k) 0, set
D(k + 1) = D(k),
B(k + 1) = B(k),
7(k+l) = ZT(k)
(k +1) = zy(k)
But if Af(k)A0 and either D(k) < (k + 1)/2 or D(k) = (k + 1)/2 and B(k) 1, set
D(k+ 1) =k+ -D(k)?
B(k+ 1)= l-B(k),
7(k+1) a
a(k)
(k+1) =
The most widely known property of this algorithm is that if a and co are a pair of
relatively prime polynomials which solve the equation (1+ S) or_ modz2t+l, with
deg c ? deg ac t, then a = U(21) and = COW). In other words, if the weight of the
error word is no greater than t, then this algorithm succeeds in decoding the t-error-
correcting BCH code. More general results, which in certain situations allow
modifications of the algorithm to correct many error words of weight t + 1 or t + 2
in t-error-correctingBCH codes, are given by Berlekamp (1968).
As mentioned earlier in this paper, these results allow one to compute the probability
of decoding failure and the probability of decoding error when one of these codes is
decoded by a t-error-correctingdecoding algorithm, where t may be any number less
than half of the code's minimum distance.
There are so many recent weight enumeration theorems that no attempt will be
made to state or prove any of them here. This section is intended only as a guide to
the literature. Details of all results which are not otherwise referenced may be found
in Chapter 16 of Berlekamp (1968).
Weight enumerators for several classes of codes have been obtained by relatively
"direct" methods. Using a variety of combinatorial arguments, several groups of
researcherst have independently found the formulas for the weight enumerators of a
certain class of nonbinary codes which includes all BCH codes for which n = q- 1.
Applying the classical results of Dickson on quadratic forms over a field of character-
istic two, Berlekamp and Sloane (1970) obtained formulae for the weight enumerators
of the second-order Reed-Muller codes. Many researchers have determined several
weight enumerators by computer search, the most recent being Chen (1970).
Most of the codes whose weight enumerators have been determined directly,
including the second order Reed-Muller codes, have rather low rates. In some sense,
the direct methods succeed in obtaining the weight enumerators only because there
are relatively few codewords. Fortunately, however, the MacWilliams identities
allow us to obtain the weight enumerator of a high-rate code whenever we know the
weight distribution of its orthogonal complement, which is a low-rate code. The
MacWilliams theorem is the most fundamental result known about weight distribu-
tions. This result states that the weight enumerator of any linear code is determined
by the weight enumerator of its orthogonal complement, and it gives an explicit
method for calculating either from the other. Another form of the MacWilliam
identities, due to Pless, gives the first r moments of the weight distribution of any
linear code in terms of the number of words of each weight < r in the orthogonal
complement. If one can determine the number of codewords of all but s weights in
any given code, and if one can also determine the number of codewords of all weights
<s in the orthogonal complement, then it is possible to solve the Pless identities for
the complete weight enumerators of both codes.
This indirect approach, which requires an initial study of the weight distributions
of both a given code and its orthogonal complement, has recently proved quite
fruitful. Various authors including Pless have used this technique to determine the
weight distributions of various quadratic residue codes. Kasami was the first to apply
this approach to classes containing an infinite number of codes. The most recent
generalizations of Kasami's results are given by Berlekamp (1970). The classes of
codes whose weights are enumerated in that paper include all primitive BCH codes
of distance < 8 and all primitive BCH codes which contain fewer than 2m9/6codewords
of length 2m-1.
Partial weight enumerators are also known for certain other codes. For example,
Kasami and Tokura (1970) have obtained formulae for the number of codewords
of each weight < 2d in the rth-order RM code of minimum distance d.