
EE388 Modern Coding Theory

From Sudoku to Sparse Graph Codes


Andrea Montanari Lecture 1-2 - 4/3/2012

Sudoku became a world-wide craze in November 2004, when the first puzzle appeared in the pages of The Times. Where did it come from? Here we want to explore the hypothesis that Sudoku was originally an error-correcting code (used by aliens? by the Incas? by the army?) This will be a pretext to review some basic terminology. We will then introduce sparse graph ensembles and figure out how they can be generated. The homework will consist of writing a program that generates elements from a standard regular ensemble.

Definition

A Sudoku scheme of order $\ell$ is a square of side $\ell^2$ (thus including $\ell^4$ cells) divided into $\ell^2$ square blocks, each of side $\ell$. Below is an example of a partially filled grid of order $\ell = 3$.

[Figure: a partially filled $9 \times 9$ Sudoku grid of order $\ell = 3$.]

A Sudoku solution is an assignment of integers in $\{1, 2, \dots, \ell^2\}$ (numbers) to the cells in the grid that satisfies three conditions: (i) for each row, the numbers appearing in that row must be distinct; (ii) for each column, the numbers appearing in that column must be distinct; (iii) for each of the $\ell^2$ squares, the numbers appearing in that square must be distinct. Notice that any set of $\ell^2$ distinct symbols could play the role of the numbers in $\{1, \dots, \ell^2\}$. We shall call any Sudoku solution a codeword. The set of all solutions will be called the codebook (sometimes, for short, the code) and denoted by $\mathcal{C}$, or $\mathcal{C}(\ell)$ whenever it is necessary to specify its order. The codebook size will be denoted by $|\mathcal{C}|$.

Rate
Here is how Sudoku can be used to convey information. First write your information as a (very long!) binary string. Then chop the string into blocks of length $L = \log_2 |\mathcal{C}|$ (for simplicity, think of $L$ as an integer). Each block consists of a number $z$ between $0$ and $|\mathcal{C}| - 1 = 2^L - 1$ written in binary notation. This is encoded as the $z$-th Sudoku solution (say, in lexicographic order) and then transmitted.

The rate of this code is the number of information bits conveyed per channel use. This is a bit ambiguous. Argue that a good mathematical definition would be
$$R(\ell) = \frac{\log_2 |\mathcal{C}(\ell)|}{\ell^4 \log_2 \ell^2} \,. \qquad (1)$$

The number of Sudoku solutions of order 2 is $|\mathcal{C}(2)| = 288$, thus yielding $R(2) \approx 0.255310$. A counting tour de force recently led to the result $|\mathcal{C}(3)| = 6{,}670{,}903{,}752{,}021{,}072{,}936{,}960$, which corresponds to $R(3) \approx 0.282354$. Note that $|\mathcal{C}(\ell)|/(\ell^2)^{\ell^4}$ is the probability that, filling the cells with i.i.d. uniformly random numbers, one gets a solution. A naive guess consists in approximating this with the product of the probabilities that each constraint is satisfied. Show that this yields
$$R_{\text{naive}}(\ell) = 1 - \frac{3}{2\log\ell} + O(\ell^{-2}) \,. \qquad (2)$$
It is reasonable to conjecture that $R(\ell) \to 1$ as $\ell \to \infty$.
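As a quick sanity check (not part of the original notes), here is a short Python snippet evaluating definition (1) for the two known codebook sizes; the function name is mine:

import math

def sudoku_rate(ell, codebook_size):
    # R(ell) = log2 |C(ell)| / (ell^4 * log2 ell^2), as in Eq. (1)
    return math.log2(codebook_size) / (ell**4 * math.log2(ell**2))

print(sudoku_rate(2, 288))                      # ~0.255310
print(sudoku_rate(3, 6670903752021072936960))   # ~0.282354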

Errors
A codeword transmitted through a noisy channel can be corrupted in two ways. The first consists in erasures and is widespread in magazines:

[Figure: an order-2 Sudoku solution, and the same grid with several entries erased.]

A probabilistic model for this channel would be the following: each entry in the grid is erased independently with probability $\epsilon$. A more severe type of noise would consist in flipping the numbers.

[Figure: an order-2 Sudoku solution, and the same grid with several entries flipped to different values.]

Again, within a probabilistic setting we might assume that each entry is changed independently with probability $\epsilon$ into a uniformly random element of $\{1, \dots, \ell^2\}$. In what sense is the second type of noise worse (for the same value of $\epsilon$) than the first one? The formalization is provided by the notion of physical degradation.

Distance
Given two Sudoku solutions $x, x'$, their (Hamming) distance $d(x, x')$ is the number of positions in which they differ. The code minimum distance is the minimum distance between any two distinct codewords:
$$d_{\min}(\ell) = \min\{ d(x, x') : x, x' \in \mathcal{C}(\ell), \; x \neq x' \} \,. \qquad (3)$$

Show that $d_{\min}(\ell) = 4$ for any $\ell$. I must confess I know a proof only for $\ell = 2, 3$ (look at the grid below), but I am pretty sure it is true in general. Can you find a proof for $\ell \geq 4$?

[Figure: an order-3 Sudoku solution.]

Suppose somebody erases entries maliciously from your Sudoku solution. What is the maximum number of erasures such that you can be sure to reconstruct the original solution? What about flips?

Decoding
Any algorithm for solving Sudoku can be used as a decoder: take the output of the noisy channel, and apply the algorithm to get a complete valid solution (a codeword). I am not a good Sudoku solver and normally use the following (pretty dumb) algorithm, which works for correcting erasures: if there is a row, a column or a square with only one empty cell, I fill it. I repeat and hope for the best.

In order to formalize this, let me introduce some notation. I let $V$ denote the set of $\ell^4$ cells and write $i, j, \dots \in V$ to indicate specific cells. The set of rows plus columns plus squares will be indicated by $C$ (for constraints), and I'll write $a, b, \dots \in C$ for its elements. We have $|C| = 3\ell^2$. The set of cells appearing in a given constraint $a$ is denoted by $\partial a$. Vice versa, the set of constraints that involve cell $i$ is $\partial i$.

Iterative Scanning (a partially filled grid $G$)
1: Set flag = false
2: For $a \in C$
3:   If $\partial a$ includes a unique empty cell
4:     Fill it
5:     Set flag = true
6: End for
7: If flag = true
8:   Goto 1
9: Else
10:  Return current grid

Can you invent a similar algorithm to correct flips?

Notice in passing that the notation introduced above naturally leads to a bipartite graph, called the factor graph of the problem. This has vertex sets $V$ and $C$, and an edge $(i, a)$ whenever $i \in \partial a$ or (equivalently) $a \in \partial i$. Below is the factor graph for $\ell = 2$ (squares correspond to constraints, circles to variables).

[Figure: the factor graph for $\ell = 2$.]
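For concreteness, here is a minimal Python sketch of Iterative Scanning (my own rendering, not from the notes). The grid is a flat list of length $\ell^4$ with None marking empty cells; constraints() enumerates the sets $\partial a$; all names are mine:

from itertools import product

def constraints(ell):
    # Enumerate the index sets "partial a": rows, columns, squares.
    n = ell * ell
    for r in range(n):
        yield [r * n + c for c in range(n)]            # row r
    for c in range(n):
        yield [r * n + c for r in range(n)]            # column c
    for br, bc in product(range(ell), repeat=2):       # square (br, bc)
        yield [(br * ell + r) * n + (bc * ell + c)
               for r, c in product(range(ell), repeat=2)]

def iterative_scanning(grid, ell):
    # If a constraint has a unique empty cell, fill it; repeat until no progress.
    n = ell * ell
    progress = True
    while progress:
        progress = False
        for cells in constraints(ell):
            empty = [i for i in cells if grid[i] is None]
            if len(empty) == 1:
                used = {grid[i] for i in cells if grid[i] is not None}
                (missing,) = set(range(1, n + 1)) - used
                grid[empty[0]] = missing
                progress = True
    return grid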

Error probability
It would be nice to have an idea of how well the iterative algorithm does. We need to define some performance measure. Two well-known such measures are: (i) the block error probability $P_B$: this is the probability that the original grid is not reconstructed, or is reconstructed incorrectly; (ii) the symbol (or bit) error probability $P_b$: this is the probability that a symbol is not reconstructed correctly, averaged over the symbol position. In both cases we assume a uniformly random codeword (Sudoku solution) is transmitted.

In order to formalize these definitions, we need some more notation. We let $x = \{x_i : i \in V\}$ be a uniformly random Sudoku solution (this is a vector with $\ell^4$ entries in $[\ell^2]$). The output of the noisy channel will be denoted by $y = \{y_i : i \in V\}$ (this is a vector with entries $y_i \in \{*, 1, \dots, \ell^2\}$, where $*$ denotes an erasure). Finally, the output of the iterative algorithm on input $y$ will be $\hat{x}(y) = \{\hat{x}_i(y) : i \in V\}$. We have the definitions
$$P_B(\ell, \epsilon) \equiv \mathbb{P}\{\hat{x}(y) \neq x\} \,, \qquad (4)$$
$$P_b(\ell, \epsilon) \equiv \frac{1}{|V|} \sum_{i \in V} \mathbb{P}\{\hat{x}_i(y) \neq x_i\} \,. \qquad (5)$$

How does $P_B(\ell, \epsilon)$ behave for small $\epsilon$?
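To get a feel for this, one can estimate $P_B(\ell, \epsilon)$ by Monte Carlo. Below is a sketch reusing the iterative_scanning function from the previous snippet; for simplicity it transmits one fixed codeword rather than a uniformly random one, and the function name and defaults are my choices:

import random

def estimate_PB(solution, ell, eps, trials=10000):
    # Erase each entry independently with prob. eps, decode, count block errors.
    errors = 0
    for _ in range(trials):
        y = [None if random.random() < eps else v for v in solution]
        if iterative_scanning(y, ell) != solution:
            errors += 1
    return errors / trials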

Dense graphs
Sudoku is not a convenient error-correcting code for a number of reasons. Can you name one? Linear codes are somewhat more convenient. One such code is defined by an $n(1-R) \times n$ binary matrix $\mathbb{H}$. The codebook is
$$\mathcal{C} = \left\{ x \in \{0,1\}^n : \mathbb{H}x = 0 \mod 2 \right\} \,. \qquad (6)$$

Such a code can also be completely described by a bipartite graph, namely the graph whose adjacency matrix is $\mathbb{H}$. The parity check ensemble is a randomized construction of the matrix $\mathbb{H}$, first introduced by Elias. It consists in taking each entry $\mathbb{H}_{ij}$ to be 0 or 1 independently of the others with probability 1/2. We might also call this the high-density parity check (HDPC) ensemble. Here the density is defined as the fraction of ones in the parity check matrix. An example of a graph from this ensemble (with blocklength $n = 16$ and design rate $R = 1/4$) is below.

[Figure: a factor graph from the parity check ensemble with $n = 16$ and $R = 1/4$.]

How would you generate a random element from this ensemble? How much memory does it take to store it?
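One possible answer, sketched in Python with numpy (the function name is mine): draw every entry i.i.d. Bernoulli(1/2). Storing the matrix explicitly takes $n(1-R) \cdot n$ bits, i.e. $\Theta(n^2)$ memory.

import numpy as np

def hdpc_matrix(n, R, rng=None):
    # Each entry of the n(1-R) x n parity check matrix is iid Bernoulli(1/2).
    rng = rng or np.random.default_rng()
    m = round(n * (1 - R))
    return rng.integers(0, 2, size=(m, n), dtype=np.uint8)

H = hdpc_matrix(16, 0.25)   # dense: Theta(n^2) bits of memory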

By the way, do you know how a pseudo-random number generator works? This ensemble can be used to communicate over the BEC($\epsilon$) at vanishing error probability if $\epsilon < \epsilon_{\rm Sh} \equiv 1 - R$. In proving this one assumes to have enough time to solve a linear system over $n$ binary variables. A simpler algorithm is provided by the following greedy decoder, which is completely analogous to the one for solving Sudoku. We shall call it hereafter the peeling algorithm.

Peeling Decoder (a partially erased codeword $y$)
1: Set $x = y$
2: Set flag = false
3: For $a \in C$
4:   If $\partial a$ includes exactly one erased position
5:     Set its value to the sum (mod 2) of the other variables in $a$
6:     Set flag = true
7: End for
8: If flag = true
9:   Goto 2
10: Else
11:  Return current guess $x$

How do you think the dense ensemble will perform under this algorithm?
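A minimal Python rendering of the Peeling Decoder (my sketch, with my names): checks is a list of index sets $\partial a$, and None marks an erased position in $y$:

def peeling_decoder(y, checks):
    # Whenever a check contains exactly one erasure, restore it by parity (Eq. 6).
    x = list(y)
    progress = True
    while progress:
        progress = False
        for cells in checks:
            erased = [i for i in cells if x[i] is None]
            if len(erased) == 1:
                x[erased[0]] = sum(x[i] for i in cells if x[i] is not None) % 2
                progress = True
    return x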

Sparse graphs
A good idea is to make the factor graph sparse. We will say that the graph is sparse if the fraction of ones in the parity check matrix is $O(1/n)$. Equivalently, the average degree of the graph is $O(1)$. A simple random ensemble can be defined in this way: for each variable node $i \in [n]$ and each check node $a \in [m]$, draw an edge $(i, a)$ independently with probability $\gamma/n$. Here $\gamma$ is a density parameter that is kept fixed as $n \to \infty$. An example is drawn below.

[Figure: a factor graph drawn from this sparse ensemble.]

How much memory does it take to store such a graph? What is the problem with such a code?
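As a sketch of the generation step in Python (names mine; $\gamma$ as above): the adjacency-list representation holds $O(\gamma n)$ edges in expectation, in contrast with the $\Theta(n^2)$ bits of the dense ensemble.

import random

def sparse_factor_graph(n, m, gamma):
    # Draw each edge (i, a) independently with probability gamma / n.
    return [(i, a) for i in range(n) for a in range(m)
            if random.random() < gamma / n]

edges = sparse_factor_graph(n=1600, m=1200, gamma=3.0)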

Almost one-dimensional graphs


Let's try something different (here explained for the case $R = 1/4$). Arrange variable nodes and check nodes on a line of length $3n$ ($= 4m$) and connect each check node to the 4 variable nodes that follow it on the line. When you arrive at the end of the line, wrap it around. In the figure below I sketched this construction, but placed nodes on two rows to reduce the messiness. What is the problem with this graph?

[Figure: the almost one-dimensional construction, with variable and check nodes on two rows.]

Regular graphs
In the regular ensemble the factor graph is random with degree $l$ at variable nodes and degree $k$ at check nodes. The (design) rate is $R = 1 - l/k$. We also refer to this as the $(l, k)$ ensemble. Ideally the graph would be uniformly random. How would you generate a uniformly random graph from this ensemble? In practice one uses a modified ensemble defined by the following generation procedure (we refer here to the $(3, 4)$ case). First draw 3 half-edges for each variable node, and 4 half-edges for each check node. Then draw a uniformly random permutation over $3n$ elements (how would you do this?). Finally connect half-edges on the two sides according to this permutation. The resulting graph might have double edges. What is the expected number of double edges? What is the probability that the graph does not contain double edges? How would you deal with them? An important property of such graphs (that we will study in greater detail) can be glimpsed from the following question: what is the expected number of loops of length 4? A sketch of the generation procedure is given below.
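Here is one possible Python sketch of the modified $(l, k)$ generation procedure (this anticipates the homework; all names are mine). Shuffling the check-side half-edges implements the uniformly random permutation:

import numpy as np
from collections import Counter

def regular_graph(n, l=3, k=4, rng=None):
    # Configuration model: l half-edges per variable, k per check,
    # matched through a uniformly random permutation. May create double edges.
    assert (n * l) % k == 0
    rng = rng or np.random.default_rng()
    m = n * l // k                                 # number of check nodes
    var_stubs = np.repeat(np.arange(n), l)         # variable-side half-edges
    chk_stubs = np.repeat(np.arange(m), k)         # check-side half-edges
    rng.shuffle(chk_stubs)                         # uniformly random permutation
    return list(zip(var_stubs, chk_stubs))

def double_edges(edges):
    # Number of excess parallel edges between the same (variable, check) pair.
    return sum(c - 1 for c in Counter(edges).values() if c > 1)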

SUMMARY
At the end of this week you should know about:
1. Code, blocklength, rate, factor graph.
2. Binary memoryless symmetric channels.
3. Linear code ensembles.
4. Minimum distance.
5. Block error probability and symbol (bit) error probability.
6. Why low-density parity check codes are better than high-density ones.
7. How to generate graphs from low-density ensembles.

The material for points 1-4 (and some more) can be found in Sections 1.1-1.6 and 1.7.1 of Richardson, Urbanke, Modern Coding Theory (Cambridge), hereafter referred to as MCT. The material for points 6-7 can be found in MCT, Sections 3.7, 3.8. Also, complementary reading in M. Mézard and A. Montanari, Information, Physics, and Computation, Oxford University Press, Chapter 9 (available at http://www.stanford.edu/~montanar/BOOK/book.html).

WORK
Write a program that generates a random graph from the $(3, 4)$ ensemble. Estimate the average number of double edges for a few values of $n$. I expect to receive:
1. A print-out of the code.
2. A short plain-English description of the data structure used to store the graph.
3. A plot of the average number of double edges as a function of $n$.
