
Contents

1 Introduction: Sets, Functions and Relations
    1.1 Introduction
    1.2 Functions
    1.3 Equivalence Relations
    1.4 Finite and Infinite Sets
    1.5 Cardinal Numbers of Sets
    1.6 Power set of a set
    1.7 Exercises
    1.8 Partially Ordered Sets
    1.9 Lattices
    1.10 Boolean Algebras
        1.10.1 Introduction
        1.10.2 Examples of Boolean algebras
    1.11 Atoms in a Lattice
    1.12 Exercises

2 Combinatorics
    2.1 Introduction
    2.2 Elementary Counting Ideas
        2.2.1 Sum Rule
        2.2.2 Product Rule
    2.3 Combinations and Permutations
    2.4 Stirling's Formula
    2.5 Examples in simple combinatorial reasoning
    2.6 The Pigeon-Hole Principle
    2.7 More Enumerations
        2.7.1 Enumerating permutations with constrained repetitions
    2.8 Ordered and Unordered Partitions
        2.8.1 Enumerating the ordered partitions of a set
    2.9 Combinatorial Identities
    2.10 The Binomial and the Multinomial Theorems
    2.11 Principle of Inclusion-Exclusion
    2.12 Euler's φ-function
    2.13 Inclusion-Exclusion Principle and the Sieve of Eratosthenes
    2.14 Derangements
    2.15 Partition Problems
        2.15.1 Recurrence relations p(n, m)
    2.16 Ferrer Diagrams
        2.16.1 Proposition
        2.16.2 Proposition
        2.16.3 Proposition
    2.17 Solution of Recurrence Relations
    2.18 Homogeneous Recurrences
    2.19 Inhomogeneous Equations
    2.20 Repertoire Method
    2.21 Perturbation Method
    2.22 Solving Recurrences using Generating Functions
        2.22.1 Convolution
    2.23 Some simple manipulations
        2.23.1 Solution of recurrence relations
        2.23.2 Some common tricks
    2.24 Illustrative Problems

3 Basics of Number Theory
    3.1 Introduction
    3.2 Divisibility
    3.3 gcd and lcm of two integers
    3.4 Primes
    3.5 Exercises
    3.6 Congruences
    3.7 Complete System of Residues
    3.8 Linear Congruences and Chinese Remainder Theorem
    3.9 Lattice Points Visible from the Origin
    3.10 Exercises
    3.11 Some Arithmetical Functions
    3.12 Exercises
    3.13 The big O notation

4 Mathematical Logic
    4.1 Preliminaries
    4.2 Fully Parenthesized Propositions and their Truth Values
    4.3 Validity, Satisfiability and Related Concepts
    4.4 Normal forms
        4.4.1 Conjunctive and Disjunctive Normal Forms
    4.5 Compactness
    4.6 The Resolution Principle in Propositional Calculus
    4.7 Predicate Calculus: Basic Ideas
    4.8 Formulas in Predicate Calculus
        4.8.1 Free and bound Variables
    4.9 Interpretation of Formulas of Predicate Calculus
        4.9.1 Structures
        4.9.2 Truth Values of formulas in Predicate Calculus
    4.10 Equivalence of Formulas in Predicate Calculus
    4.11 Prenex Normal Form
    4.12 The Expansion Theorem

5 Algebraic Structures
    5.1 Introduction
    5.2 Matrices
    5.3 Addition, Scalar Multiplication and Multiplication of Matrices
        5.3.1 Transpose of a Matrix
        5.3.2 Inverse of a Matrix
        5.3.3 Symmetric and Skew-symmetric matrices
        5.3.4 Hermitian and Skew-Hermitian matrices
        5.3.5 Orthogonal and Unitary matrices
    5.4 Groups
        5.4.1 Group Tables
    5.5 A Group of Congruent Transformations (Also called Symmetries)
    5.6 Another Group of Congruent Transformations
    5.7 Subgroups
    5.8 Cyclic Groups
    5.9 Lagrange's Theorem for Finite Groups
    5.10 Homomorphisms and Isomorphisms of Groups
    5.11 Properties of Homomorphisms of Groups
    5.12 Automorphism of Groups
    5.13 Normal Subgroups
    5.14 Quotient Groups (or Factor Groups)
    5.15 Basic Isomorphism Theorem for Groups
    5.16 Exercises
    5.17 Rings
        5.17.1 Rings, Definitions and Examples
        5.17.2 Units of a ring
    5.18 Integral Domains
    5.19 Exercises
    5.20 Fields
    5.21 Characteristic of a Field
    5.22 Vector Spaces
        5.22.1 Examples of Vector Spaces
    5.23 Subspaces
    5.24 Spanning Sets
    5.25 Linear Independence and Base
    5.26 Bases of a Vector Space
    5.27 Dimension of a Vector Space
    5.28 Exercises
    5.29 Solutions of Linear Equations and Rank of a Matrix
    5.30 Solutions of Linear Equations
    5.31 Solutions of Nonhomogeneous Linear Equations
    5.32 LUP Decomposition
        5.32.1 Computing an LU Decomposition
    5.33 Exercises
    5.34 Finite Fields
    5.35 Factorization of Polynomials over Finite Fields
        5.35.1 Exercises
    5.36 Mutually Orthogonal Latin Squares [MOLS]

6 Graph Theory
    6.1 Introduction
    6.2 Basic definitions and ideas
        6.2.1 Types of Graphs
        6.2.2 Two Interesting Applications
        6.2.3 First Theorem of Graph Theory
    6.3 Representations of Graphs
    6.4 Basic Ideas in Connectivity of Graphs
        6.4.1 Some Graph Operations
        6.4.2 Vertex Cuts, Edge Cuts and Connectivity
        6.4.3 Vertex Connectivity and Edge-Connectivity
    6.5 Trees and their properties
        6.5.1 Basic Definition
        6.5.2 Sum of distances from a leaf of a tree
    6.6 Spanning Tree
        6.6.1 Definition and Basic Results
        6.6.2 Minimum Spanning Tree
        6.6.3 Algorithm PRIM
        6.6.4 Algorithm KRUSKAL
    6.7 Independent Sets and Vertex Coverings
        6.7.1 Basic Definitions
    6.8 Vertex Colorings of Graphs
        6.8.1 Basic Ideas
        6.8.2 Bounds for χ(G)

7 Coding Theory
    7.1 Introduction
    7.2 Binary Symmetric Channels
    7.3 Linear Codes
    7.4 The Minimum Weight (Hamming Weight) of a Code
    7.5 Hamming Codes
    7.6 Standard Array Decoding
    7.7 Sphere Packings
    7.8 Extended Codes
    7.9 Syndrome Decoding
    7.10 Exercises

8 Cryptography
    8.1 Introduction
    8.2 Some Classical Cryptosystems
        8.2.1 Caesar Cryptosystem
        8.2.2 Affine Cryptosystem
        8.2.3 Private Key Cryptosystems
        8.2.4 Hacking an affine cryptosystem
    8.3 Encryption Using Matrices
    8.4 Exercises
    8.5 Other Private Key Cryptosystems
        8.5.1 Vigenère Cipher
        8.5.2 The One-Time Pad
    8.6 Public Key Cryptography
        8.6.1 Working of Public Key Cryptosystems
        8.6.2 RSA Public Key Cryptosystem
        8.6.3 The ElGamal Public Key Cryptosystem
        8.6.4 Description of ElGamal System
    8.7 Primality Testing
        8.7.1 Nontrivial Square Roots (mod n)
        8.7.2 Prime Number Theorem
        8.7.3 Pseudoprimality Testing
        8.7.4 The Miller-Rabin Primality Testing Algorithm
        8.7.5 Miller-Rabin Algorithm (a, n)
    8.8 The Agrawal-Kayal-Saxena (AKS) Primality Testing Algorithm
        8.8.1 Introduction
        8.8.2 The Basis of AKS Algorithm
        8.8.3 Notation and Preliminaries
        8.8.4 The AKS Algorithm

9 Finite Automata
    9.1 Introduction
    9.2 Languages
    9.3 Regular Expressions and Regular Languages
    9.4 Finite Automata: Definition
        9.4.1 The Product Automaton and Closure Properties
    9.5 Nondeterministic Finite Automata
        9.5.1 Nondeterminism
        9.5.2 Definition of NFA
    9.6 Subset Construction: Equivalence of DFA and NFA
    9.7 Closure of Regular Languages Under Concatenation and Kleene Star
    9.8 Regular Expressions and Finite Automata
    9.9 DFA State Minimization
    9.10 Myhill-Nerode Relations
        9.10.1 Isomorphism of DFAs
        9.10.2 Myhill-Nerode Relation
        9.10.3 Construction of the DFA from a given Myhill-Nerode Relation
        9.10.4 Myhill-Nerode Relation and the Corresponding DFA
    9.11 Myhill-Nerode Theorem
        9.11.1 Notion of Refinement
        9.11.2 Myhill-Nerode Theorem and Its Proof
    9.12 Non-regular Languages
        9.12.1 An Example
        9.12.2 The Pumping Lemma

Chapter 1 Introduction: Sets, Functions and Relations


1.1 Introduction

In this chapter, we recall some of the basic facts about sets, functions, relations and lattices. We are sure that the reader is already familiar with most of these, which are usually taught in high school algebra, with the exception of lattices. We also assume that the reader is familiar with the basics of real and complex numbers. If A and B are sets and A ⊆ B (that is, A is a subset of B, and A may be equal to B), then the complement of A in B is the set B \ A consisting of all elements of B not belonging to A. The sets A and B are equal if A ⊆ B and B ⊆ A.

Definition 1.1.1: By a family of sets we mean an indexed collection of sets.

For instance, F = {A_α}_{α∈I} is a family of sets: for each α ∈ I, there is a set A_α of F. Assume that each A_α is a subset of a set X. Such a set X certainly exists, since we can take X = ⋃_{α∈I} A_α. For each α ∈ I, denote by A_α′ the complement X \ A_α of A_α in X. We then have the celebrated laws of De Morgan.

Theorem 1.1.2 (De Morgan's laws): Let {A_α}_{α∈I} be a family of subsets of a set X. Then

(i) (⋃_{α∈I} A_α)′ = ⋂_{α∈I} A_α′, and

(ii) (⋂_{α∈I} A_α)′ = ⋃_{α∈I} A_α′.

Proof. We prove (i); the proof of (ii) is similar. Let x ∈ (⋃_{α∈I} A_α)′. Then x ∉ ⋃_{α∈I} A_α, and therefore x ∉ A_α for each α ∈ I. Hence x ∈ A_α′ for each α, and consequently x ∈ ⋂_{α∈I} A_α′. Thus (⋃_{α∈I} A_α)′ ⊆ ⋂_{α∈I} A_α′. Conversely, assume that x ∈ ⋂_{α∈I} A_α′. Then x ∈ A_α′ for each α ∈ I, and therefore x ∉ A_α for each α ∈ I. Thus x ∉ ⋃_{α∈I} A_α, and hence x ∈ (⋃_{α∈I} A_α)′. Consequently, ⋂_{α∈I} A_α′ ⊆ (⋃_{α∈I} A_α)′. This proves (i).

Definition 1.1.3: A family {A_α}_{α∈I} is called a disjoint family of sets if whenever α ∈ I, β ∈ I and α ≠ β, we have A_α ∩ A_β = ∅. For instance, if A1 = {1, 2}, A2 = {3, 4} and A3 = {5, 6, 7}, then {A_α}_{α∈I}, I = {1, 2, 3}, is a disjoint family of sets.
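The following small Python sketch (ours, not from the text; the universe X and the family are arbitrary choices) checks both De Morgan laws mechanically on a concrete finite family of subsets.

    # A minimal sketch: De Morgan's laws on a finite family of subsets
    # of X = {0, ..., 9}.
    X = set(range(10))
    family = [{0, 1, 2}, {2, 3, 4}, {5, 6}]      # an arbitrary family A_alpha
    union = set().union(*family)
    intersection = set.intersection(*family)
    complement = lambda A: X - A
    # (i) complement of the union = intersection of the complements
    assert complement(union) == set.intersection(*map(complement, family))
    # (ii) complement of the intersection = union of the complements
    assert complement(intersection) == set().union(*map(complement, family))
    print("De Morgan's laws verified for this family")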

1.2 Functions

Definition 1.2.1: A function (also called a map, a mapping, or a single-valued function) f : A → B from a set A to a set B is a rule by which to each a ∈ A there is assigned a unique element f(a) ∈ B. f(a) is called the image of a under f. For example, if A is a set of students of a particular class and, for a ∈ A, f(a) denotes the height of a, then f : A → R⁺, the set of positive real numbers, is a function.

Definition 1.2.2: Two functions f : A → B and g : A → B are called equal if f(a) = g(a) for each a ∈ A.

Definition 1.2.3: If E is a subset of A, then the image of E under f : A → B is {f(a) : a ∈ E}. It is denoted by f(E).

Definition 1.2.4: A function f : A → B is one-to-one (or 1-1 or injective) if for a1 and a2 in A, a1 ≠ a2 implies that f(a1) ≠ f(a2). Equivalently, this means that f(a1) = f(a2) implies that a1 = a2. Hence f is 1-1 iff distinct elements of A have distinct images in B under f. As an example, let A denote the set of 1,000 students of a college, and B the set of positive integers. For a ∈ A, let f(a) denote the exam registration number of a. Then f(a) ∈ B, and clearly f is 1-1. On the other hand, if for the above sets A and B, f(a) denotes the age of the student a, then f is not 1-1.

Definition 1.2.5: A function f : A → B is called onto (or surjective) if for each b ∈ B, there exists at least one a ∈ A with f(a) = b (that is, the image f(A) = B). For example, let A denote the set of integers Z and B the set of even integers. If f : A → B is defined by setting f(a) = 2a, then f : A → B is onto. Again, if f : R → R⁺ ∪ {0} (the set of non-negative reals) is defined by f(x) = x², then f is onto but not 1-1.

Definition 1.2.6: A function f : A → B is bijective (or is a bijection) if it is both 1-1 and onto. The function f : Z → 2Z defined by f(a) = 2a is bijective. An injective (respectively surjective, bijective) mapping is referred to as an injection (respectively surjection, bijection).

Definition 1.2.7: Let f : A → B and g : B → C be functions. The composition of g with f, denoted by g ∘ f, is the function g ∘ f : A → C defined by (g ∘ f)(a) = g(f(a)) for a ∈ A. As an example, let A = Z denote the set of integers, B = N ∪ {0}, where N is the set of natural numbers {1, 2, . . .}, and C = N. If f : A → B is given by f(a) = a², a ∈ A, and g : B → C is defined by g(b) = b + 1, b ∈ B, then h = g ∘ f : A → C is given by h(a) = g(f(a)) = g(a²) = a² + 1, a ∈ Z = A.

Definition 1.2.8: Let f : A → B be a function. For F ⊆ B, the inverse image of F under f, denoted by f⁻¹(F), is the set of all a ∈ A with f(a) ∈ F. In symbols: f⁻¹(F) = {a ∈ A : f(a) ∈ F}. For example, if f assigns to each student of a school the standard (class) to which he or she belongs, and F = {1, 2}, then f⁻¹(F) is the set of students who are either in the 1st standard or in the 2nd standard.

Theorem 1.2.9: Let f : A → B, and let X1, X2 ⊆ A and Y1, Y2 ⊆ B. Then the following statements are true:

(i) f(X1 ∪ X2) = f(X1) ∪ f(X2),
(ii) f(X1 ∩ X2) ⊆ f(X1) ∩ f(X2),
(iii) f⁻¹(Y1 ∪ Y2) = f⁻¹(Y1) ∪ f⁻¹(Y2), and
(iv) f⁻¹(Y1 ∩ Y2) = f⁻¹(Y1) ∩ f⁻¹(Y2).

Proof. We prove (iv); the proofs of the other statements are similar. So assume that a ∈ f⁻¹(Y1 ∩ Y2), where a ∈ A. Then f(a) ∈ Y1 ∩ Y2, and therefore f(a) ∈ Y1 and f(a) ∈ Y2. Hence a ∈ f⁻¹(Y1) and a ∈ f⁻¹(Y2), and therefore a ∈ f⁻¹(Y1) ∩ f⁻¹(Y2). The converse is proved just by retracing the steps.


Note that, in general, we may not have equality in (ii). Here is an example where equality does not hold. Let A = {1, 2, 3, 4, 5} and B = {6, 7, 8}. Let f(1) = f(2) = 6, f(3) = f(4) = 7, and f(5) = 8. Let X1 = {1, 2, 4} and X2 = {2, 3, 5}. Then X1 ∩ X2 = {2}, and so f(X1 ∩ X2) = {f(2)} = {6}. However, f(X1) = {6, 7} and f(X2) = {6, 7, 8}. Therefore f(X1) ∩ f(X2) = {6, 7} ≠ f(X1 ∩ X2). We next define a family of elements and a sequence of elements in a set X.

Definition 1.2.10: A family {x_i}_{i∈I} of elements x_i in a set X is a map x : I → X, where for i ∈ I, x(i) = x_i ∈ X. I is the indexing set of the family (in other words, for each i ∈ I, there is an element x_i ∈ X of the family).

Definition 1.2.11: A sequence {x_n}_{n∈N} of elements of X is a map x : N → X. In other words, a sequence in X is a family in X where the indexing set is the set N of natural numbers. For example, {2, 4, 6, . . .} is the sequence of even positive integers.
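As a small illustration (ours, not the book's), the counterexample above can be checked mechanically; a finite function is just a Python dict.

    f = {1: 6, 2: 6, 3: 7, 4: 7, 5: 8}      # the function of the example above

    def image(f, E):
        """f(E) = {f(a) : a in E}."""
        return {f[a] for a in E}

    def preimage(f, F):
        """The inverse image: {a : f(a) in F}."""
        return {a for a in f if f[a] in F}

    X1, X2 = {1, 2, 4}, {2, 3, 5}
    print(image(f, X1 & X2))                 # {6}
    print(image(f, X1) & image(f, X2))       # {6, 7}: strictly larger, as claimed
    print(preimage(f, {6, 7}))               # {1, 2, 3, 4}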

1.3 Equivalence Relations

Definition 1.3.1: The Cartesian product X × Y of two (not necessarily distinct) sets X and Y is the set of all ordered pairs (x, y), where x ∈ X and y ∈ Y. In symbols: X × Y = {(x, y) : x ∈ X, y ∈ Y}.

In the ordered pair (x, y), the order of x and y is important, whereas the unordered pairs (x, y) and (y, x) are equal; as ordered pairs, they are equal if and only if x = y. For instance, the pairs (1, 2) and (2, 1) are not equal as ordered pairs, while they are equal as unordered pairs.

Definition 1.3.2: A relation R on a set X is a subset of the Cartesian product X × X. If (a, b) ∈ R, then we say that a is related to b under R, and we also denote this fact by aRb. For example, if X = {1, 2, 3}, the set R = {(1, 1), (1, 2), (2, 2)} is a relation on X. One of the important concepts in the realm of relations is the equivalence relation.

Definition 1.3.3: A relation R on a set X is an equivalence relation on X if

(i) R is reflexive, that is, (a, a) ∈ R for each a ∈ X,
(ii) R is symmetric, that is, if (a, b) ∈ R then (b, a) ∈ R, and
(iii) R is transitive, that is, if (a, b) ∈ R and (b, c) ∈ R, then (a, c) ∈ R.

We denote by [a] the set of elements of X which are related to a under R; in other words, [a] = {x ∈ X : (x, a) ∈ R}. [a] is called the equivalence class defined by a in the relation R.

Example 1.3.4: On the set N of positive integers, let aRb mean that a | b (a is a divisor of b). Then R is reflexive and transitive but not symmetric. It is clear that similar examples can be constructed.

Example 1.3.5 (Example of an equivalence relation): On the set Z of integers (positive integers, negative integers and zero), set aRb iff a − b is divisible by 5. Clearly R is an equivalence relation on Z.

Definition 1.3.6: A partition P of a set X is a collection P of nonvoid subsets of X whose union is X, such that the intersection of any two distinct members of P is empty.

Theorem 1.3.7: Any equivalence relation R on a set X induces a partition on X in a natural way.

Proof. As above, let [x] denote the class defined by x. We show that the classes [x], x ∈ X, define a partition on X. First of all, each x of X belongs to the class [x], since (x, x) ∈ R. Hence

X = ⋃_{x∈X} [x].

We now show that if (x, y) ∉ R, then [x] ∩ [y] = ∅. Suppose on the contrary that [x] ∩ [y] ≠ ∅, and let z ∈ [x] ∩ [y]. This means that z ∈ [x] and z ∈ [y]; hence (z, x) ∈ R and (z, y) ∈ R. This of course means that (x, z) ∈ R and (z, y) ∈ R, and hence by transitivity (x, y) ∈ R, a contradiction. Thus {[x] : x ∈ X} forms a partition of X.

Example 1.3.8:


Let X = Z, the set of integers, and let (a, b) ∈ R iff a − b is a multiple of 5. Then clearly R is an equivalence relation on Z. The equivalence classes are:

[0] = {. . . , −10, −5, 0, 5, 10, . . .},
[1] = {. . . , −9, −4, 1, 6, 11, . . .},
[2] = {. . . , −8, −3, 2, 7, 12, . . .},
[3] = {. . . , −7, −2, 3, 8, 13, . . .},
[4] = {. . . , −6, −1, 4, 9, 14, . . .}.

Note that [5] = [0], and so on. Then the collection {[0], [1], [2], [3], [4]} of equivalence classes forms a partition of Z.
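A short Python sketch (ours, not from the text) groups a finite window of integers into the residue classes just listed; every integer in the window lands in exactly one class.

    # Grouping integers into the residue classes mod 5 described above.
    from collections import defaultdict

    classes = defaultdict(list)
    for a in range(-10, 15):
        classes[a % 5].append(a)
    for r in sorted(classes):
        print(f"[{r}] contains {classes[r]}")
    # The five lists are pairwise disjoint and together cover the range,
    # exactly as Theorem 1.3.7 predicts.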

1.4 Finite and Infinite Sets

Definition 1.4.1: Two sets are called equipotent if there exists a bijection between them. Equivalently, if A and B are two sets, then A is equipotent to B if there exists a bijection φ : A → B from A onto B.

If φ : A → B is a bijection from A to B, then φ⁻¹ : B → A is also a bijection. Again, if φ : A → B and ψ : B → C are bijections, then ψ ∘ φ : A → C is also a bijection. Trivially, the identity map i : A → A, defined by i(a) = a for each a ∈ A, is a bijection on A. Hence if X is a nonvoid set and P(X) is the power set of X, that is, the collection of all subsets of X, then for A, B ∈ P(X), if we set ARB iff there exists a bijection from A onto B, then R is an equivalence relation on P(X). Equipotent sets have the same cardinal number or cardinality.


Let Nn denote the set {1, 2, . . . , n}. Nn is called the initial segment defined with respect to n.

Definition 1.4.2: A set S is finite if S is equipotent to Nn for some positive integer n; otherwise S is called an infinite set.

If S is equipotent to Nn, then the number of elements in it is n. Hence if n ≠ m, Nn is not equipotent to Nm. Consequently, no finite set can be equipotent to a proper subset of itself. For instance, the set of trees in a garden is finite, whereas the set Z of integers is infinite. Any subset of a finite set is finite, and therefore any superset of an infinite set is infinite. (If S is an infinite set and S ⊆ T, then T must be infinite; otherwise S, being a subset of a finite set T, must be finite.)

Theorem 1.4.3: Let S be a finite set and f : S → S. Then f is 1-1 iff f is onto.

Proof. Suppose f : S → S is 1-1, and let T = f(S). If T ≠ S, then f is a bijection from S onto the proper subset T of S. Hence S and T have the same number of elements, a contradiction to the fact that T is a proper subset of the finite set S. Hence f must be onto.

Conversely, assume that f is onto. Suppose f is not 1-1; then there exists at least one element of S having at least two preimages. For each s ∈ S choose a preimage s′ ∈ S under f, and let S′ be the set of all such s′. Clearly, if s and t are distinct elements of S, then s′ ≠ t′. Hence S′ is a proper subset of S (a second, unchosen preimage belongs to S but not to S′). Moreover, the function σ : S → S′ defined by σ(s) = s′ is a bijection. Thus S is bijective with the proper subset S′ of S, a contradiction to the fact that S is a finite set.

Example 1.4.4: We show by means of examples that the conclusions in Theorem 1.4.3 may not be true if S is an infinite set. First, take S = Z, the set of integers, and f : Z → Z defined by f(a) = 2a. Clearly f is 1-1 but not onto (the image of f being the set of even integers). Next, let R be the set of real numbers, and let f : R → R be defined by

f(x) = x − 1 if x > 0,
f(x) = 0 if x = 0,
f(x) = x + 1 if x < 0.

Clearly f is onto; however, f is not 1-1 since f(−1) = f(0) = f(1) = 0.

Theorem 1.4.5: The union of any two finite sets is finite.

Proof. First we show that the union of any two disjoint finite sets is finite. Let S and T be two disjoint finite sets of cardinalities n and m respectively. Then S is equipotent to Nn, and T is equipotent to Nm = {1, 2, . . . , m}. Clearly T is also equipotent to the set {n + 1, n + 2, . . . , n + m}. Hence S ∪ T is equipotent to {1, . . . , n} ∪ {n + 1, . . . , n + m} = N_{n+m}, and so S ∪ T is also a finite set. By induction it follows that the union of a disjoint family of a finite number of finite sets is finite.

We now show that the union of any two finite sets is finite. Let S and T be any two finite sets. Then S ∪ T is the union of the three pairwise disjoint sets S \ T, S ∩ T and T \ S, and hence is finite.

Corollary 1.4.6: The union of any finite number of finite sets is finite.

Proof. By induction on the number of finite sets.

1.5 Cardinal Numbers of Sets

In this section, we briefly discuss the cardinal numbers of sets. Recall Section 1.4.

Definition 1.5.1: A set A is equipotent to a set B if there exists a bijection f from A onto B. Equipotence between members of a collection of sets S is an equivalence relation on S. As mentioned before, the sets in the same equivalence class are said to have the same cardinality or the same cardinal number. Intuitively it must be clear that equipotent sets have the same number of elements. The cardinal number of any finite set is a positive integer, while the cardinal numbers of infinite sets are denoted by certain symbols. The cardinal number of the infinite set N (the set of positive integers) is denoted by ℵ₀ (aleph nought); ℵ is the first character of the Hebrew alphabet.

Definition 1.5.2: A set is called denumerable if it is equipotent to N (equivalently, if it has cardinal number ℵ₀). A set is countable if it is finite or denumerable. It is uncountable if it is not countable (clearly, any uncountable set must be infinite).

Lemma 1.5.3: Every infinite set contains a denumerable subset.

Proof. Let X be an infinite set and let x1 ∈ X. Then X1 = X \ {x1} is an infinite subset of X (if not, X = X1 ∪ {x1} is a union of two finite subsets of X and is therefore finite by Corollary 1.4.6). As X1 is infinite, X1 has an element x2, and X1 \ {x2} = X \ {x1, x2} is infinite. Suppose we have found distinct elements x1, x2, . . . , xn in X with Xn = X \ {x1, . . . , xn} infinite. Then there exists x_{n+1} in Xn so that Xn \ {x_{n+1}} is infinite. By induction, there exists a denumerable subset {x1, x2, . . . , xn, x_{n+1}, . . .} of X.

Theorem 1.5.4: A set is infinite iff it is equipotent to a proper subset of itself.

Proof. That no finite set can be equipotent to a proper subset of itself has already been observed (Section 1.4). Assume that X is infinite. Then, by Lemma 1.5.3, X contains a denumerable subset X0 = {x1, x2, . . .}. Let Y = (X \ X0) ∪ {x2, x3, . . .} = X \ {x1}.

Then the mapping φ : X → Y defined by

φ(x) = x if x ∈ X \ X0, and φ(x) = x_{n+1} if x = xn, n ≥ 1

(so that φ(x1) = x2, φ(x2) = x3, and so on) is a 1-1 map of X onto Y, and therefore an equipotence (that is, a bijection). Thus X is equipotent to the proper subset Y = X \ {x1} of X.

Notation. We denote the cardinality of a set X by |X|. If X is a set of, say, 17 elements and Y is a set of 20 elements, then |X| < |Y| and there exists a 1-1 mapping of X to Y. Conversely, if X and Y are finite sets and |X| < |Y|, then there exists a 1-1 map from X to Y. These ideas can be generalized to any two arbitrary sets.

Definition 1.5.5: Let X and Y be any two sets. Then |X| ≤ |Y| iff there exists a 1-1 mapping from X to Y.

Suppose we have |X| ≤ |Y| and |Y| ≤ |X|. If X and Y are finite sets, it is clear that X and Y have the same number of elements, that is, |X| = |Y|. The same result holds good even if X and Y are infinite sets. This result is known as the Schröder-Bernstein theorem.

Theorem 1.5.6 (Schröder-Bernstein): If X and Y are sets such that |X| ≤ |Y| and |Y| ≤ |X|, then |X| = |Y|.

For the proof of Theorem 1.5.6, we need a lemma.

Lemma 1.5.7: Let A be a set and let A1 and A2 be subsets of A such that A ⊇ A1 ⊇ A2. If |A| = |A2|, then |A| = |A1|.

Proof. If A is a finite set, then |A| = |A2| gives A = A2, as A2 is a subset of A. Hence A = A1 = A2, and therefore |A| = |A1|. So assume that A is an infinite set. |A| = |A2| means that there exists a bijection φ : A → A2. Let φ(A1) = A3 ⊆ A2. We then have

A1 ⊇ A2 ⊇ A3, and |A1| = |A3|.   (1.1)

So starting with A ⊇ A1 ⊇ A2 and |A| = |A2|, we get (1.1). Starting with (1.1) and using the same argument, we get

A2 ⊇ A3 ⊇ A4, and |A2| = |A4|.   (1.2)

Note that the bijection from A2 to A4 is given by the same map φ. In this way, we get a sequence of sets A ⊇ A1 ⊇ A2 ⊇ A3 ⊇ . . . with |A_i| = |A_{i+2}| for each i (writing A0 = A). Moreover, φ maps A \ A1 onto A2 \ A3, A1 \ A2 onto A3 \ A4, A2 \ A3 onto A4 \ A5, and so on (see Figure 1.1); once again, the bijections are given by the same map φ. Let P = A ∩ A1 ∩ A2 ∩ . . . . Then

A = (A \ A1) ∪ (A1 \ A2) ∪ (A2 \ A3) ∪ . . . ∪ P, and   (1.3)
A1 = (A1 \ A2) ∪ (A2 \ A3) ∪ (A3 \ A4) ∪ . . . ∪ P,   (1.4)

where the sets on the right are pairwise disjoint.

Figure 1.1: The nested sets A ⊇ A1 ⊇ A2 ⊇ A3, with φ carrying A \ A1 onto A2 \ A3.

Moreover, |A \ A1| = |A2 \ A3|, |A1 \ A2| = |A3 \ A4|, and so on. Hence, mapping each piece A_{2i} \ A_{2i+1} of (1.3) onto the piece A_{2i+2} \ A_{2i+3} of (1.4) by φ, and keeping the remaining pieces A_{2i+1} \ A_{2i+2} and P fixed, we obtain a bijection of A onto A1. Thus |A| = |A1|.

Proof of the Schröder-Bernstein theorem. By hypothesis |X| ≤ |Y|. Hence there exists a 1-1 map φ : X → Y. Let φ(X) = Y′ ⊆ Y. As |Y| ≤ |X|, there exists a 1-1 map ψ : Y → X. Let ψ(Y) = X′ and ψ(Y′) = X″. Then, as Y′ ⊆ Y and ψ is 1-1, ψ(Y′) ⊆ ψ(Y), that is, X″ ⊆ X′. Thus |X| = |Y′| = |X″|, and X ⊇ X′ ⊇ X″. By Lemma 1.5.7, |X| = |X′|, and since |Y| = |X′|, we have |X| = |Y|.

For a different proof of Theorem 1.5.6, see Chapter 6.
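A tiny sketch (ours, not from the text) illustrates the map φ of Theorem 1.5.4 in the simplest case: X = {0, 1, 2, . . .} is equipotent to its proper subset Y = X \ {0} via the shift n ↦ n + 1.

    # An infinite set equipotent to a proper subset of itself.
    phi = lambda n: n + 1
    window = range(10)                  # a finite window of the infinite set X
    print([phi(n) for n in window])     # [1, 2, ..., 10]: misses 0, lands in Y
    # phi is 1-1 (n+1 = m+1 implies n = m) and onto Y (y has preimage y-1),
    # so X and the proper subset Y have the same cardinality.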

1.6 Power set of a set

We recall the definition of the power set of a given set from Section 1.4.

Definition 1.6.1: The power set P(X) of a set X is the set of all subsets of X.

For instance, if X = {1, 2, 3}, then

P(X) = {∅, {1}, {2}, {3}, {1, 2}, {2, 3}, {3, 1}, {1, 2, 3} = X}.

The empty set ∅ and the whole set X, being subsets of X, are elements of P(X). Now each subset S of X is uniquely defined by its characteristic function χ_S : X → {0, 1} defined by

χ_S(x) = 1 if x ∈ S, and χ_S(x) = 0 if x ∉ S.

Conversely, every function f : X → {0, 1} is the characteristic function of a unique subset S of X. Indeed, if S = {x ∈ X : f(x) = 1}, then f = χ_S.

Definition 1.6.2: For sets X and Y, denote by Y^X the set of all functions f : X → Y.

Theorem 1.6.3: |X| < |P(X)| for each nonvoid set X.
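The following Python sketch (ours, not from the text; the choice X = {1, 2, 3} is arbitrary) realizes the correspondence between subsets and 0/1-valued functions, and shows |P(X)| = 2^|X|.

    # The bijection between subsets of X and 0/1 assignments on X.
    from itertools import product

    X = [1, 2, 3]
    power_set = [frozenset(x for x, bit in zip(X, bits) if bit)
                 for bits in product((0, 1), repeat=len(X))]
    print(len(power_set))              # 8 == 2**3, so |P(X)| = 2^|X| > |X|

    def chi(S):
        """Characteristic function of the subset S of X, as a dict."""
        return {x: int(x in S) for x in X}

    print(chi(frozenset({1, 3})))      # {1: 1, 2: 0, 3: 1}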

Proof. The assertion of Theorem 1.6.3 is that there exists a 1-1 function from X to P(X) but none from P(X) to X. First of all, the mapping f : X → P(X) defined by f(x) = {x} ∈ P(X) is clearly 1-1. Hence |X| ≤ |P(X)|. Next, suppose there exists a 1-1 map from P(X) to X. Then by the Schröder-Bernstein theorem, there exists a bijection g : P(X) → X. This means that the mapping g : S ↦ g(S) = x ∈ X is a bijection. Now the element x may or may not belong to S. Call x ∈ X ordinary if x ∈ S, that is, if x is a member of the subset S of X whose image under the map g is x. Otherwise, call x extraordinary. Let A be the subset of X consisting of all the extraordinary elements of X. Then A ∈ P(X). (Note: A may be the empty set; still, A ∈ P(X).) Let g(A) = a ∈ X. Is a an ordinary element or an extraordinary element of X? Well, if we assume that a is ordinary, then a ∈ A; but then a is extraordinary, as A is the set of extraordinary elements. Suppose we now assume that a is extraordinary; then a ∉ A, and so a is an ordinary element of X, again a contradiction. These contradictions show that there exists no 1-1 mapping from P(X) to X (X ≠ ∅), and so |P(X)| > |X|.
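The same diagonal idea, phrased for a map X → P(X), can be watched in action on a finite set; the sketch below (ours, not from the text; the map f is an arbitrary choice) builds the set D = {x : x ∉ f(x)}, which f necessarily misses.

    # Cantor's diagonal set for a finite X: no map X -> P(X) is onto.
    X = {1, 2, 3}
    f = {1: frozenset(), 2: frozenset({1, 2}), 3: frozenset({2})}
    D = frozenset(x for x in X if x not in f[x])
    print(D)                    # frozenset({1, 3})
    print(D in f.values())      # False: D has no preimage under f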

1.7 Exercises

1. State true or false (with reason):
   (i) Parallelism is an equivalence relation on the set of all lines in the plane.
   (ii) Perpendicularity is an equivalence relation on the set of all lines in the plane.
   (iii) A finite set can be equipotent to a proper subset of itself.

2. Prove the statements (i), (ii) and (iii) in Theorem 1.2.9.

3. Prove that the set Z of integers is denumerable.

4. (a) Prove that the denumerable union of denumerable sets is denumerable.

   (b) Prove that a countable union of (nonvoid) denumerable sets is denumerable.

5. Prove that the set of all rational numbers is denumerable.

6. Let M_n = {m ∈ N : m is a multiple of n}. Find
   (i) ⋃_{n∈N} M_n, (ii) M_n ∩ M_m, (iii) ⋂_{n∈N} M_n, (iv) ⋂_{p a prime} M_p.

7. Let f : A → B and g : B → C be functions. Prove:
   (i) If f and g are 1-1, then so is g ∘ f.
   (ii) If f and g are onto, then so is g ∘ f.
   (iii) If g ∘ f is 1-1, then f is 1-1.
   (iv) If g ∘ f is onto, then g is onto.

8. Give an example of a relation which is (i) reflexive and symmetric but not transitive, (ii) reflexive and transitive but not symmetric.

9. Does there exist a relation which is not reflexive but both symmetric and transitive?

10. Let X be the set of all ordered pairs (a, b) of integers with b ≠ 0. Set (a, b) ∼ (c, d) in X iff ad = bc. Prove that ∼ is an equivalence relation on X. What is the class to which (1, 2) belongs?

11. Give a detailed proof of Corollary 1.4.6.

1.8 Partially Ordered Sets

Definition 1.8.1: A relation R on a set X is called antisymmetric if, for a, b ∈ X, (a, b) ∈ R and (b, a) ∈ R together imply that a = b. For instance, the relation R defined on N, the set of natural numbers, by setting (a, b) ∈ R iff a | b (a divides b), is an antisymmetric relation. However, the same relation defined on Z* = Z \ {0}, the set of nonzero integers, is not antisymmetric. For instance, 5 | (−5) and (−5) | 5, but 5 ≠ −5.

Definition 1.8.2: A relation R on a set X is called a partial order on X if it is (i) reflexive, (ii) antisymmetric and (iii) transitive. A partially ordered set is a set with a partial order defined on it.

Examples
1. Let R be defined on the set N by setting aRb iff a | b. Then R is a partial order on N.

2. Let X be a nonempty set. Define a relation R on P(X) by setting ARB in P(X) iff A ⊆ B. Then P(X) is a partially ordered set with respect to the above partial order.

It is customary to denote a general partial order on a set X by ≤. We then write that (X, ≤) is a partially ordered set, or a poset in short.
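As a quick illustration (ours, not the book's; the range 1..12 is an arbitrary finite segment), the following Python sketch verifies the three partial-order axioms for divisibility on an initial segment of N.

    # Checking that divisibility is a partial order on {1, ..., 12}.
    N = range(1, 13)
    divides = lambda a, b: b % a == 0
    assert all(divides(a, a) for a in N)                          # reflexive
    assert all(a == b for a in N for b in N
               if divides(a, b) and divides(b, a))                # antisymmetric
    assert all(divides(a, c) for a in N for b in N for c in N
               if divides(a, b) and divides(b, c))                # transitive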


Figure 1.2: The Hasse diagram of P({1, 2, 3}).

Every poset (X, ≤) can be represented pictorially by means of its Hasse diagram. This diagram is drawn in the plane by taking the elements of X as points of the plane and representing the fact that a ≤ b in X by placing b above a and joining a and b by a line segment. As an example, take S = {1, 2, 3}, X = P(S), the power set of S, and ≤ to stand for ⊆. Figure 1.2 gives the Hasse diagram of (X, ⊆). Note that ∅ ⊆ {1, 2}, since ∅ is a subset of {1, 2}. However, we have not drawn a line between ∅ and {1, 2}; a (broken) line exists from ∅ to {1, 2} via {1} or {2}. When there is no relation between a and b of X, both a and b can appear in the same horizontal level.

Definition 1.8.3: A partial order ≤ on X is a total order (or linear order) if for any two elements a and b of X, either a ≤ b or b ≤ a holds. For instance, if X = {1, 2, 3, 4} and ≤ is the usual "less than or equal to", then (X, ≤) is a totally ordered set, since any two elements of X are

comparable. The Hasse diagram of (X, ≤) is given in Figure 1.3.

Figure 1.3: Hasse diagram of (X, ≤), X = {1, 2, 3, 4} (a single chain with 1 at the bottom and 4 at the top).

The Hasse diagrams of all lattices with five elements are given in Figure 1.4.

Figure 1.4: Hasse diagrams of the lattices with five elements.


If S has at least two elements, then (P(S), ⊆) is not a totally ordered set. Indeed, if a, b ∈ S, a ≠ b, then {a} and {b} are incomparable (under ⊆) elements of P(S).

Definition 1.8.4 (Converse Relation): If f : A → B is a relation, the relation f⁻¹ : B → A is called the converse of f, provided that (b, a) ∈ f⁻¹ iff (a, b) ∈ f. We note that if (X, ≤) is a poset, then (X, ≥) is also a poset. Here a ≥ b in (X, ≥) is defined by b ≤ a in (X, ≤).

Definition 1.8.5: Let (X, ≤) be a poset.

(i) a is called a greatest element of the poset if x ≤ a for each x ∈ X. An element b ∈ X is called a smallest element if b ≤ x for each x ∈ X. If a and a′ are greatest elements of X, then by definition a ≤ a′ and a′ ≤ a, and so by antisymmetry a = a′. Thus a greatest (smallest) element, if it exists, is unique. The greatest and least elements of a poset, whenever they exist, are denoted by 1 and 0 respectively. They are called the universal elements of the poset.

(ii) An element a ∈ X is called a minimal element of X if there exists no element c ∈ X such that c < a (that is, c ≤ a and c ≠ a). b ∈ X is a maximal element of X if there exists no element c of X such that c > b (that is, b ≤ c, b ≠ c). Clearly, the greatest element of a poset is a maximal element and the least element a minimal element.

Example 1.8.6: Let (X, ⊆) be the poset where X = { {1}, {2}, {1, 2}, {2, 3}, {1, 2, 3} }. In X, {1} and {2} are minimal elements, {1, 2, 3} is the greatest element (and the only maximal element), but there is no smallest element.

Definition 1.8.7: Let (X, ≤) be a poset and Y ⊆ X.

(i) x ∈ X is an upper bound for Y if y ≤ x for all y ∈ Y.
(ii) x ∈ X is a lower bound for Y if x ≤ y for all y ∈ Y.
(iii) The infimum of Y is the greatest lower bound of Y, if it exists. It is denoted by inf Y.
(iv) The supremum of Y is the least upper bound of Y, if it exists. It is denoted by sup Y.

Example 1.8.8: If X = [0, 1] and ≤ stands for the usual ordering of the reals, then 1 is the supremum of X and 0 is the infimum of X (here we have taken Y = X). Instead, if we take X = (0, 1), then X has neither an infimum nor a supremum in X. However, if X = R and Y = (0, 1), then 1 and 0 are the supremum and infimum of Y respectively. Note that the supremum and infimum of Y, namely 1 and 0, do not belong to Y.
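Returning to Example 1.8.6, the minimal, maximal, greatest and smallest elements of that small poset can be found by brute force; the sketch below (ours, not from the text) uses Python's built-in subset comparisons on frozensets.

    # Extremal elements of the poset of Example 1.8.6 under inclusion.
    X = [frozenset(s) for s in ({1}, {2}, {1, 2}, {2, 3}, {1, 2, 3})]

    minimal = [a for a in X if not any(b < a for b in X)]
    maximal = [a for a in X if not any(b > a for b in X)]
    greatest = [a for a in X if all(b <= a for b in X)]
    smallest = [a for a in X if all(a <= b for b in X)]
    print(minimal)    # [{1}, {2}]      two minimal elements
    print(maximal)    # [{1, 2, 3}]     the only maximal element
    print(greatest)   # [{1, 2, 3}]     a greatest element exists
    print(smallest)   # []              no smallest element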

1.9 Lattices

We now define a lattice.

Definition 1.9.1: A lattice L = (L, ∧, ∨) is a nonempty set L together with two binary operations ∧ (called meet or intersection or product) and ∨ (called join or union or sum) that satisfy the following axioms: for all a, b, c ∈ L,

(L1) a ∧ b = b ∧ a; a ∨ b = b ∨ a, (Commutative law)
(L2) a ∧ (b ∧ c) = (a ∧ b) ∧ c; a ∨ (b ∨ c) = (a ∨ b) ∨ c, (Associative law)
(L3) a ∧ (a ∨ b) = a; a ∨ (a ∧ b) = a. (Absorption law)

Now, by (L3), a ∨ (a ∧ b) = a; in particular, a ∨ (a ∧ a) = a, and hence again by (L3), a ∧ a = a ∧ (a ∨ (a ∧ a)) = a. Similarly, a ∨ a = a for each a ∈ L.

Theorem 1.9.2: The relation "a ≤ b iff a ∧ b = a" in a lattice (L, ∧, ∨) defines a partial order on L.

Proof. (i) Trivially a ≤ a, since a ∧ a = a in L. Thus ≤ is reflexive on L.

(ii) If a ≤ b and b ≤ a, we have a ∧ b = a and b ∧ a = b. Hence a = b, since by (L1), a ∧ b = b ∧ a. This proves that ≤ is antisymmetric.

(iii) Finally we prove that ≤ is transitive. Let a ≤ b and b ≤ c, so that a ∧ b = a and b ∧ c = b.

Now a ∧ c = (a ∧ b) ∧ c = a ∧ (b ∧ c) (by (L2)) = a ∧ b = a, and hence a ≤ c. Thus (L, ≤) is a poset.

The converse of Theorem 1.9.2 is as follows:

Theorem 1.9.3: Any partially ordered set (L, ≤) in which any two elements have an infimum and a supremum in L is a lattice under the operations a ∧ b = inf(a, b) and a ∨ b = sup(a, b).

Proof. Follows from the definitions of supremum and infimum.

Examples of Lattices. For a nonvoid set S, (P(S), ∩, ∪) is a lattice. Again, for a positive integer n, define D_n to be the set of divisors of n, and let a ≤ b in D_n mean that a | b, that is, a is a divisor of b. Then a ∧ b = (a, b), the gcd of a and b, and a ∨ b = [a, b], the lcm of a and b, and (D_n, ∧, ∨) is a lattice (see Chapter 3 for the definitions of gcd and lcm). For example, if n = 20, Figure 1.5 gives the Hasse diagram of the lattice D₂₀ = {1, 2, 4, 5, 10, 20}. It has the least element 1 and the greatest element 20.

Figure 1.5: The lattice D₂₀ (least element 1 at the bottom, greatest element 20 at the top).

We next give the Duality Principle valid in lattices.
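A short Python sketch (ours, not from the text) computes meets and joins in D₂₀ as gcd and lcm, and confirms that they stay inside the divisor set.

    # Meets and joins in the divisor lattice D20.
    from math import gcd

    D20 = [d for d in range(1, 21) if 20 % d == 0]   # [1, 2, 4, 5, 10, 20]
    meet = gcd
    join = lambda a, b: a * b // gcd(a, b)           # lcm
    print(meet(4, 10), join(4, 10))                  # 2 20
    # Meets and joins of divisors of 20 are again divisors of 20,
    # so (D20, gcd, lcm) is a lattice.
    assert all(meet(a, b) in D20 and join(a, b) in D20
               for a in D20 for b in D20)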

Duality Principle
In any lattice (L, ∧, ∨), any formula or statement involving the operations ∧ and ∨ remains valid if we replace ∧ by ∨ and ∨ by ∧. The statement got by the replacement is called the dual statement of the original statement. The validity of the duality principle lies in the fact that in the set of axioms for a lattice, any axiom obtained by such a replacement is also an axiom. Consequently, whenever we want to establish a statement and its dual, it is enough to establish one of them. Note that the dual of the dual statement is the original statement. For instance, the statement a ∧ (b ∨ c) = (a ∧ b) ∨ (a ∧ c) implies the dual statement a ∨ (b ∧ c) = (a ∨ b) ∧ (a ∨ c).

Definition 1.9.4: A subset L′ of a lattice L = (L, ∧, ∨) is a sublattice of L if (L′, ∧, ∨) is a lattice (that is, L′ is closed under the meet and join of L).

A subset S of a lattice (L, ∧, ∨) need not be a sublattice even if it is a poset with respect to the order defined by a ≤ b iff a ∧ b = a. For example, let L be the lattice of all subsets of a vector space V, and let S be the collection of all subspaces of V. Then S is, in general, not a sublattice of L, since the union of two subspaces of V need not be a subspace of V.

Lemma 1.9.5: In any lattice L = (L, ∧, ∨), the operations ∧ and ∨ are isotone, that is, for a, b, c in L, if b ≤ c, then a ∧ b ≤ a ∧ c and a ∨ b ≤ a ∨ c.

Proof. We have (see Exercise 5 of 1.12)

a ∧ b = a ∧ (b ∧ c) = (a ∧ b) ∧ c (by (L2)) ≤ a ∧ c (as a ∧ b ≤ a).

Similarly (or by duality), a ∨ b ≤ a ∨ c.

Lemma 1.9.6: Any lattice satisfies the two distributive inequalities:

(i) x ∧ (y ∨ z) ≥ (x ∧ y) ∨ (x ∧ z), and

(ii) x ∨ (y ∧ z) ≤ (x ∨ y) ∧ (x ∨ z).

Proof. We have x ∧ y ≤ x, and x ∧ y ≤ y ≤ y ∨ z. Hence x ∧ y ≤ inf(x, y ∨ z) = x ∧ (y ∨ z). Also x ∧ z ≤ x, and x ∧ z ≤ z ≤ y ∨ z. Thus x ∧ z ≤ x ∧ (y ∨ z). Therefore, x ∧ (y ∨ z) is an upper bound for both x ∧ y and x ∧ z, and hence greater than or equal to their least upper bound, namely (x ∧ y) ∨ (x ∧ z). This proves (i); the second statement follows by duality.

Lemma 1.9.7: The elements of a lattice satisfy the modular inequality:

x ≤ z implies x ∨ (y ∧ z) ≤ (x ∨ y) ∧ z.

Proof. We have x ≤ x ∨ y and x ≤ z. Hence x ≤ (x ∨ y) ∧ z. Also, y ∧ z ≤ y ≤ x ∨ y and y ∧ z ≤ z, whence y ∧ z ≤ (x ∨ y) ∧ z. These together imply that x ∨ (y ∧ z) ≤ (x ∨ y) ∧ z.

Aliter. By Lemma 1.9.6, x ∨ (y ∧ z) ≤ (x ∨ y) ∧ (x ∨ z) = (x ∨ y) ∧ z, as x ∨ z = z when x ≤ z.

Distributive and Modular Lattices


Two important classes of lattices are the distributive lattices and the modular lattices. We now define them.

Definition 1.9.8:

A lattice (L, ∧, ∨) is called distributive if the two distributive laws

a ∧ (b ∨ c) = (a ∧ b) ∨ (a ∧ c), and
a ∨ (b ∧ c) = (a ∨ b) ∧ (a ∨ c)

hold for all a, b, c ∈ L. Note that in view of the duality that is valid for lattices, if one of the two distributive laws holds in L, then the other would automatically remain valid.

Example 1.9.9 (Examples of Distributive Lattices): (i) (P(S), ∩, ∪); (ii) (N, gcd, lcm). (Here a ∧ b = (a, b), the gcd of a and b, and a ∨ b = [a, b], the lcm of a and b.)

Example 1.9.10 (Examples of Nondistributive Lattices): (i) the diamond lattice of Figure 1.6(a); (ii) the pentagonal lattice of Figure 1.6(b). (In the diamond lattice, 0 < a, b, c < 1 with a, b, c mutually incomparable; in the pentagonal lattice, 0 < a < c < 1 and 0 < b < 1, with b incomparable to both a and c.)

Figure 1.6: Hasse diagram of the diamond and pentagonal lattices

In the diamond lattice, a ∧ (b ∨ c) = a ∧ 1 = a, while (a ∧ b) ∨ (a ∧ c) = 0 ∨ 0 = 0 (≠ a). In the case of the pentagonal lattice, a ∨ (b ∧ c) = a ∨ 0 = a, while (a ∨ b) ∧ (a ∨ c) = 1 ∧ c = c (≠ a).
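The diamond computation above can be reproduced by brute force; the sketch below (ours, not from the text; element names and the encoding of the order are our choices) derives meets and joins from the order relation of M₃ and exhibits the failure of distributivity.

    # The diamond lattice M3 (0 < a, b, c < 1, a, b, c pairwise incomparable)
    # is not distributive: a ^ (b v c) = a but (a ^ b) v (a ^ c) = 0.
    elems = '0abc1'
    order = {('0', x) for x in elems} | {(x, x) for x in elems} | \
            {(x, '1') for x in elems}
    leq = lambda x, y: (x, y) in order

    def meet(x, y):  # greatest common lower bound
        return max((z for z in elems if leq(z, x) and leq(z, y)),
                   key=lambda z: sum(leq(w, z) for w in elems))

    def join(x, y):  # least common upper bound
        return min((z for z in elems if leq(x, z) and leq(y, z)),
                   key=lambda z: sum(leq(w, z) for w in elems))

    a, b, c = 'a', 'b', 'c'
    print(meet(a, join(b, c)))             # a, since b v c = 1
    print(join(meet(a, b), meet(a, c)))    # 0, so distributivity fails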

Complemented Lattice
Definition 1.9.11: A lattice L with 0 and 1 is complemented if for each element a ∈ L there exists at least one element b ∈ L such that a ∧ b = 0 and a ∨ b = 1.

Example 1.9.12: (1) Let L = P(S), the power set of a nonvoid set S. If A ∈ L, then A has a unique complement B = S \ A. Here 0 is the empty set and 1 is the whole set S.

(2) The complement of an element need not be unique. For example, in the lattice of Figure 1.7(a), both a and c are complements of b, since b ∧ a = b ∧ c = 0 and b ∨ a = b ∨ c = 1.

(3) Not every lattice with 0 and 1 is complemented. In the lattice of Figure 1.7(b), a has no complement.

That the diamond lattice and the pentagonal lattice (of Figure 1.6) are crucial in the study of distributive lattices is the content of Theorem 1.9.13.

Figure 1.7: (a) a five-element lattice in which b has two complements a and c; (b) a lattice with 0 and 1 in which a has no complement.

Theorem 1.9.13: A lattice is distributive iff it does not contain a sublattice isomorphic to the diamond lattice or the pentagonal lattice.

The necessity of the condition in Theorem 1.9.13 is trivial, but the proof of sufficiency is more involved. (For a proof, see ????) However, a much simpler result is the following:

Theorem 1.9.14: If a lattice L is distributive, then for a, b, c ∈ L, the equations a ∧ b = a ∧ c and a ∨ b = a ∨ c together imply that b = c.

Proof. Assume that L is distributive, and suppose that a ∧ b = a ∧ c and a ∨ b = a ∨ c. We have to show that b = c. Now

b = b ∨ (a ∧ b) (by absorption)
  = b ∨ (a ∧ c) (by hypothesis)
  = (b ∨ a) ∧ (b ∨ c) (by distributivity)
  = (a ∨ c) ∧ (b ∨ c) (by hypothesis)
  = (a ∧ (b ∨ c)) ∨ (c ∧ (b ∨ c)) (by distributivity)
  = (a ∧ (b ∨ c)) ∨ c (by absorption)
  = ((a ∧ b) ∨ (a ∧ c)) ∨ c (by distributivity)
  = ((a ∧ c) ∨ (a ∧ c)) ∨ c (by hypothesis)
  = (a ∧ c) ∨ c = c (by absorption).

We now consider another important class of lattices called modular lattices.

We now consider another important class of lattices called modular lattices.

Modular Lattices
Denition 1.9.15: A lattice is modular if it satises the following modular identity: x z x (y z ) = (x y ) z. Hence the modular lattices are those lattices for which equality holds in Lemma 1.9.7. We prove in Chapter 5 that the normal subgroups of any group form a modular lattice. The pentagonal lattice of Figure 1.7 is nonmodular since in it a c, a (b c) = a 0 = a while (a b) c = 1 c = c(= a). In fact, the following result is true. Theorem 1.9.16: Any nonmodular lattice L contains the pentagonal lattice as a sublattice. Proof. As L is nonmodular, there exist elements a, b, c in L such that a < c and a (b c) = (a b) c. But the modular inequality (Lemma 1.9.7) holds for any lattice. Hence a < c and a (b c) < (a b) c.

Chapter 1 Introduction: Sets, Functions and Relations

34

Set x = a (b c), and y = (a b) c, so that x < y . Now x b = a (b c) b = a (b c) b = a b = y b = a ((b c) b) (by L3). ab y b x by Figure 1.8: By duality, x b = y b. Now since y c, b y b c x, the lattice of Figure 1.8 is the pentagonal lattice contained in L. A consequence of Theorems 1.9.13 and 1.9.16 is that every distributive lattice is modular (since L is distributive L does not contain the pentagonal lattice as a sublattice L is modular). The diamond lattice is an example of a modular lattice that is not distributive. Another example is the lattice of all vector subspaces of a vector space V . (See Exercise 1.12.) Theorem 1.9.17: In a distributive lattice L, an element can have at most one complement. Proof. Suppose x has two complements y1 , y2 . Then x y1 = 0 = x y2 , and x y1 = 1 = x y2 .

As L is distributive, by Theorem 1.9.14, y1 = y2 .

Chapter 1 Introduction: Sets, Functions and Relations

35

A consequence of Theorem 1.9.17 is that any complemented lattice having an element without a unique complement is not distributive. For example, the lattice of Figure 1.7 is not distributive since c has two complements, namely, a and b.

1.10
1.10.1

Boolean Algebras
Introduction

A Boolean algebra is an abstract mathematical system primarily used in computer science and in expressing the relationships between sets. This system was developed by the English mathematician George Boole in 1850 to permit an algebraic manipulation of logical statements. Such manipulation can demonstrate whether or not a statement is true and show how a complicated statement be rephrased in a similar, more convenient form without losing its meaning. Denition 1.10.1: A complemented distributive lattice is a Boolean algebra. Hence a Boolean algebra B has the universal elements 0 and 1 and that every element x of B has a complement x , and since B is a distributive lattice, by Theorem 1.9.17, x is unique. The Boolean algebra B is symbolically represented as (B, , , 0, 1, ).

Chapter 1 Introduction: Sets, Functions and Relations

36

1.10.2

Examples of Boolean algebras

1. Let S be a nonvoid set. Then (P (S ), , , , S, ) is a Boolean algebra. Here if A P (S ), A = S \ A is the complement of A in S . 2. Let B n denote the set of all binary sequences of length n. For (a1 , . . . , an ) and (b1 , . . . , bn ) B n , set (a1 , . . . , an ) (b1 , . . . , bn ) = (min(a1 , b1 ), . . . , min(an , bn )), (a1 , . . . , an ) (b1 , . . . , bn ) = (max(a1 , b1 ), . . . , max(an , bn )), and (a1 , . . . , an ) = (a1 , . . . , an ), where 0 = 1 and 1 = 0. Note that the zero element is the n-vector (0, 0, . . . , 0), and, the unit element is (1, 1, . . . , 1). For instance, if n = 3, x = (1, 1, 0) and

y = (0, 1, 0), then x y = (0, 1, 0), x y = (1, 1, 0), and x = (0, 0, 1). Theorem 1.10.2 (DeMorgans Laws): Any Boolean algebra B satises DeMorgans laws: For any two elements a, b B, (a b) = a b , Proof. We have by distributivity, (a b) (a b ) = (a (a b )) (b (a b )) = ((a a ) (a b )) ((b a ) (b b )) = (0 (a b )) ((b a ) 0) = (a b ) (b a ) = (a b a ) (b b a ) = (a a b) (b b a ) = (0 b) (0 a ) (see Exercise 1.12 #4) and (a b) = a b .

Chapter 1 Introduction: Sets, Functions and Relations =00 = 0 (since a a = 0 = b b ). Similarly, (a b) (a b ) = (a (a b )) (b (a b )) = ((a a ) b ) ((b b ) a ) = (1 b ) (1 a ) = 1 1 = 1.

37

(by distributivity) (since a b = b a )

Hence the complement of a b is a b . In a similar manner, we can show that (a b) = a b . Corollary 1.10.3: In a Boolean algebra B , for a, b B ,a b i a b Proof. a b a b = b (a b) = a b = b a b . Theorem 1.10.4: In a Boolean algebra B , we have for all a, b B , a b i a b = 0 i a b = 1. Proof. Since (a b ) = a b and 0 = 1, it is enough to prove the rst part of the theorem. Now a b (by isotone property) a b b b = 0. a b 0 a b = 0. Conversely, let a b = 0. Then a = a 1 = a (b b ) = (a b) (a b ) = a b a b.

Chapter 1 Introduction: Sets, Functions and Relations

38

Next we briey discuss Boolean subalgebras and Boolean isomorphisms. These are notions similar to subgroups and group-isomorphisms.

Boolean Subalgebras
Denition 1.10.5: A Boolean subalgebra of a Boolean algebra B = (B, , , 0, 1, ) is a subset B1 of B such that (B1 , , , 0, 1, ) is itself a Boolean algebra with the same elements 0 and 1 of B .

Boolean Isomorphisms
Denition 1.10.6: A Boolean homomorphism from a Boolean algebra B1 to a Boolean algebra B2 is a map f : B1 B2 such that for all a, b in B1 , (i) f (a b) = f (a) f (b), (ii) f (a b) = f (a) f (b), and (iii) f (a ) = (f (a)) . Conditions (i) and (ii) imply that f is a lattice homomorphism from B1 to B2 while condition (iii) tells that f takes the complement of an element (which is unique in a Boolean algebra) in B1 to the complement in B2 of the image of that element. Theorem 1.10.7: Let f : B1 B2 be a Boolean homomorphism. Then (i) f (0) = 0, and f (1) = 1;

Chapter 1 Introduction: Sets, Functions and Relations (ii) f is isotone. (iii) The image f (B1 ) is a Boolean subalgebra of B2 . Proof. Straightforward. Example 1.10.8:

39

Let S = {1, 2, . . . , n}, and let A be the Boolean algebra (P (S ) = A, , , ), and let B be the Boolean algebra dened by the set of all functions from S to the set [0, 1]. Any such function is a sequence (x1 , . . . , xn ) where xi = 0 or 1. Let , and be as in Example 2 of Section 1.10.1. Now consider the map f : A = P (S ) B = [0, 1]S dened as follows: For X S (that is, X P (S ) = A), f (X ) = (x1 , x2 , . . . , xn ), where xi = 1 or 0 according to whether i X or not. For X, Y P (S ), f (X Y ) = the binary sequence having 1 only in the places common to X and Y = f (X ) f (Y ) as per the denitions in Example 2 of Section 1.10.1 Similarly, f (X Y ) = the binary sequence having 1 in all the places corresponding to the 1s in the set X Y = f (X ) f (Y ). Further, f (X ) = f (S \ X ) = the binary sequence having 1s in the places where X had zeros, and zeros in the places where X had 1s = (f (X ) ). f is 11 since distinct binary sequences in B arise out of distinct subsets of S . Finally, f is onto, since any binary sequence in B is the image of the corresponding subset (that is, the subset corresponding to the places of the sequence with 1) of X . Thus f is a Boolean isomorphism. Example 1.10.9: Let A be a proper Boolean subalgebra of B = P (S ). Then if f : A B is

Chapter 1 Introduction: Sets, Functions and Relations the identity function, f is a lattice homomorphism since, f (A1 A2 ) = A1 A2 = f (A1 ) f (A2 ), and f (A1 A2 ) = A1 A2 = f (A1 ) f (A2 ). However f (A ) = f (complement of A in A)= f () = = 0B , while (f (A)) = A = B |A = . Hence f (A ) = f (A ), and f is not a Boolean homomorphism.

40

1.11

Atoms in a Lattice

Denition 1.11.1: An element a of a lattice L with zero is called an atom of L if a = 0 and for all b L, 0 < b a b = a. That is to say, a is an atom if there is no nonzero b strictly less than a. Denition 1.11.2: An element a of a lattice L is called join-irreducible if a = b c, then a = b or a = c; otherwise, a is join-reducible. Lemma 1.11.3: Every atom of a lattice with zero is join-irreducible. Proof. Let a be an atom of a lattice L, and let a = b c, a = b. Then b b c = a, but since b = a and a is an atom of L, b = 0, and hence a = c. Lemma 1.11.4: Let L be a distributive lattice and c L be join-irreducible. If c a b,


then c ≤ a or c ≤ b. In particular, the same result is true if c is an atom of L.

Proof. As c ≤ a ∨ b and L is distributive, c = c ∧ (a ∨ b) = (c ∧ a) ∨ (c ∧ b). As c is join-irreducible, this means that c = c ∧ a or c = c ∧ b, that is, c ≤ a or c ≤ b. The second statement follows immediately from Lemma 1.11.3.

Definition 1.11.5:

(i) Let L be a lattice and a and b, with a ≤ b, be any two elements of L. Then the closed interval [a, b] is defined as [a, b] = {x ∈ L : a ≤ x ≤ b}.

(ii) Let x ∈ [a, b]. x is said to be relatively complemented in [a, b] if x has a complement y in [a, b], that is, x ∧ y = a and x ∨ y = b. If all intervals [a, b] of L are complemented, then the lattice L is said to be relatively complemented.

(iii) If L has a zero element and all elements in [0, b] have complements in [0, b] for every nonzero b in L, then L is said to be sectionally complemented.

Our next theorem is crucial for the proof of the representation theorem for finite Boolean algebras.

Theorem 1.11.6: The following statements are true:

(i) Every Boolean algebra is relatively complemented.

(ii) Every relatively complemented lattice is sectionally complemented.


(iii) In any finite sectionally complemented lattice, each nonzero element is a join of finitely many atoms.

Proof. (i) Let [a, b] be an interval in a Boolean algebra B, and x ∈ [a, b].

We have to prove that x has a complement in [a, b]. Now, as B is a Boolean algebra, it is a complemented lattice, and hence there exists x′ in B such that x ∧ x′ = 0 and x ∨ x′ = 1. Set y = b ∧ (a ∨ x′). Then y ∈ [a, b]. Also, y is a complement of x in [a, b], since

x ∧ y = x ∧ (b ∧ (a ∨ x′)) = (x ∧ b) ∧ (a ∨ x′) = x ∧ (a ∨ x′) (as x ∈ [a, b], x ≤ b) = (x ∧ a) ∨ (x ∧ x′) (as B is distributive) = (x ∧ a) ∨ 0 = a (as a ≤ x), and

x ∨ y = x ∨ (b ∧ (a ∨ x′)) = (x ∨ b) ∧ (x ∨ (a ∨ x′)) (again by distributivity) = b ∧ ((x ∨ x′) ∨ a) = b ∧ 1 = b (since x ≤ b, x ∨ x′ = 1, and 1 ∨ a = 1).

Hence x is complemented in the interval [a, b].

(ii) If L is relatively complemented, then [0, b] is complemented for each b ∈ L (take a = 0). Hence L is sectionally complemented.

(iii) Let a be a nonzero element of a finite sectionally complemented lattice L. As L is finite, there are only finitely many atoms p1, . . . , pn in L such that pi ≤ a, 1 ≤ i ≤ n; let b = p1 ∨ · · · ∨ pn. Now, b ≤ a, since b is the least upper bound of p1, . . . , pn, while a is an upper bound of p1, . . . , pn. Suppose b ≠ a. Then b has a nonzero complement, say c, in the section [0, a], since we have assumed that L is sectionally complemented. Let

p be an atom such that p ≤ c (≤ a). Then p ∈ {p1, . . . , pn}, as by assumption p1, . . . , pn are the only atoms with pi ≤ a. Hence p = p ∧ b ≤ c ∧ b = 0 (as c is the complement of b in [0, a]), a contradiction. Hence a = b = p1 ∨ · · · ∨ pn.

An immediate consequence of Theorem 1.11.6 is the following result.

Corollary 1.11.7: In any finite Boolean algebra, every nonzero element is a join of atoms.

We end this section with the representation theorem for finite Boolean algebras, which says that any finite Boolean algebra may be thought of as the Boolean algebra P(S) defined on a finite set S.

Theorem 1.11.8 (Representation theorem for finite Boolean algebras): Let B be a finite Boolean algebra and A the set of its atoms. Then there exists a Boolean isomorphism B ≅ P(A).

Proof. For b ∈ B, define A(b) = {a ∈ A : a ≤ b}, so that A(b) is the set of the atoms of B that are less than or equal to b. Then A(b) ∈ P(A). Now define φ : B → P(A) by setting φ(b) = A(b). We now prove that φ is a Boolean isomorphism. We first show that φ is a lattice homomorphism, that is, for b1, b2 ∈ B, φ(b1 ∨ b2) = φ(b1) ∪ φ(b2), and φ(b1 ∧ b2) = φ(b1) ∩ φ(b2).

Equivalently, we show that A(b1 ∨ b2) = A(b1) ∪ A(b2), and A(b1 ∧ b2) = A(b1) ∩ A(b2).

Let a be an atom of B. Then a ∈ A(b1 ∧ b2) ⟺ a ≤ b1 ∧ b2 ⟺ a ≤ b1 and a ≤ b2 ⟺ a ∈ A(b1) ∩ A(b2). Similarly, a ∈ A(b1 ∨ b2) ⟺ a ≤ b1 ∨ b2 ⟺ a ≤ b1 or a ≤ b2 (as a is an atom, a is join-irreducible by Lemma 1.11.3, and B, being a Boolean algebra, is a distributive lattice; now apply Lemma 1.11.4) ⟺ a ∈ A(b1) or a ∈ A(b2) ⟺ a ∈ A(b1) ∪ A(b2). Next, as regards complementation, a ∈ φ(b′) ⟺ a ∈ A(b′) ⟺ a ≤ b′ ⟺ a ∧ b = 0 (by Theorem 1.10.4) ⟺ a ≰ b ⟺ a ∉ A(b) ⟺ a ∈ A \ A(b) = (A(b))′. Thus A(b′) = (A(b))′.

Finally, φ(0) = the set of atoms in B that are ≤ 0 = ∅ (as there are none), the zero element of P(A), and φ(1) = the set of atoms in B that are ≤ 1 = the set of all atoms in B = A, the unit element of P(A). All that remains is to show that φ is a bijection. By Corollary 1.11.7, any b ∈ B is a join, say b = a1 ∨ · · · ∨ an (of a finite number n), of atoms a1, . . . , an of B. Hence ai ≤ b, 1 ≤ i ≤ n. Suppose φ(b) = φ(c), that is, A(b) = A(c). Then each ai ∈ A(b) = A(c), and so ai ≤ c for each i, and hence b ≤ c. In a similar manner, we can show that c ≤ b, and hence b = c. In other words, φ is injective. Finally, we show that φ is surjective. Let C = {c1, . . . , ck} ∈ P(A), so that C is a set of atoms of B. Set b = c1 ∨ · · · ∨ ck. We show that φ(b) = C,


and this would prove that φ is onto. Now ci ≤ b for each i, and so by the definition of φ, φ(b) = {atoms c ∈ A with c ≤ b} ⊇ C. Conversely, if a ∈ φ(b), then a is an atom with a ≤ b = c1 ∨ · · · ∨ ck. Therefore a ≤ ci for some i, by Lemma 1.11.4. As ci is an atom and a ≠ 0, this means that a = ci ∈ C. Thus φ(b) = C.

1.12 Exercises

1. Draw the Hasse diagrams of all the 15 essentially distinct lattices with six elements.

2. Show that the closed interval [a, b] is a sublattice of the lattice ⟨R, inf, sup⟩.

3. Give an example of a lattice with no zero element and with no unit element.

4. In a lattice, show that (i) a ∨ b ∨ c = c ∨ b ∨ a, (ii) (a ∨ b) ∨ c = (a ∨ c) ∨ (b ∨ c).

5. Prove that in a lattice, a ≤ b ⟺ a ∧ b = a ⟺ a ∨ b = b.

6. Show that every chain is a distributive lattice.

7. Show that the three lattices of Fig. 1.9 are not distributive.

Figure 1.9: Three lattices, shown in panels (a), (b) and (c).

8. Show that the lattice of Fig. 1.9 (c) is not modular.

9. Show that the lattice of all subspaces of a vector space is not distributive.

10. Which of the following lattices are (i) distributive, (ii) modular, (iii) modular but not distributive? (a) D160 (b) D20 (c) D36 (d) D40.

11. Give a detailed proof of Theorem 1.10.7.

Chapter 2 Combinatorics
2.1 Introduction

Combinatorics is the science (and to some extent, the art) of counting and enumeration of configurations (it is understood that a configuration arises every time objects are distributed according to certain predetermined constraints). Just as arithmetic deals with integers (with the standard operations), algebra deals with operations in general, analysis deals with functions, geometry deals with rigid shapes and topology deals with continuity, so does combinatorics deal with configurations. The word combinatorial was first used in the modern mathematical sense by Gottfried Wilhelm Leibniz (1646–1716) in his Dissertatio de Arte Combinatoria (Dissertation Concerning the Combinatorial Arts). Reference to combinatorial analysis is found in English in 1818 in the title Essays on the Combinatorial Analysis by P. Nicholson (see Jeff Miller's Earliest known uses of the words of Mathematics, Society for Industrial and Applied Mathematics, U.S.A.). In his book [4], C. Berge points out the following interesting aspects of combinatorics:

1. study of the intrinsic properties of a known configuration;

2. investigation into the existence or non-existence of a configuration with specified properties;

3. counting and obtaining exact formulas for the number of configurations satisfying certain specified properties;

4. approximate counting of configurations with a given property;

5. enumeration and classification of configurations;

6. optimization, in the sense of obtaining a configuration with specified properties so as to minimize an associated cost function.

Basic combinatorics is now regarded as an important topic in Discrete Mathematics. Principles of counting appear in various forms in Computer Science and Mathematics, especially in the analysis of algorithms and in probability theory.

2.2 Elementary Counting Ideas

We begin with some simple ideas of counting using the Sum Rule and the Product Rule, and by obtaining permutations and combinations of finite sets of objects.



2.2.1 Sum Rule

This is also known as the principle of disjunctive counting. If a finite set X is the union of pairwise disjoint nonempty subsets S1, S2, . . . , Sn, then |X| = |S1| + |S2| + · · · + |Sn|, where |X| denotes the number of elements in the set X.

2.2.2 Product Rule

This is also known as the principle of sequential counting. If S1, S2, . . . , Sn are nonempty finite sets, then the number of elements in the cartesian product S1 × S2 × · · · × Sn is |S1| · |S2| · · · |Sn|. For instance, consider the set A = {a, b, c, d, e} and the set B = {f, g, h}. The cardinality |A × B| of the set A × B is |A| · |B| = 5 · 3 = 15. The proof of the Product Rule is straightforward. One way to do it is to prove the rule for two sets and then use induction on the number of sets. The Sum Rule and the Product Rule are basic, and they are applied mechanically in many situations.

Example 2.2.1: Assume that a car registration system allows a registration plate to consist of one, two or three English alphabets followed by a number (not zero or starting with zero) having a number of digits equal to the number of alphabets. How many possible registrations are there? There are 26 possible alphabets and 10 possible digits, including 0. By the

Product Rule, there are 26, 26 × 26 and 26 × 26 × 26 possibilities of alphabet combination(s) of length 1, 2 and 3 respectively, and 9, 9 × 10 and 9 × 10 × 10 permissible numbers. Each occurrence of a single alphabet can be combined with any of the nine single-digit numbers 1 to 9. Similarly, each occurrence of a double (respectively, triple) alphabet can be combined with any of the allowed ninety two-digit numbers from 10 to 99 (respectively, nine hundred three-digit numbers from 100 to 999). Hence the numbers of registrations of lengths 2, 4 and 6 characters are 26 × 9 (= 234), 676 × 90 (= 60 840) and 17 576 × 900 (= 15 818 400) respectively. Since these possibilities are mutually exclusive and together they exhaust all the possibilities, by the Sum Rule the total number of possible registrations is 234 + 60 840 + 15 818 400 = 15 879 474.

Example 2.2.2: Consider tossing 100 distinguishable dice. By the Product Rule, it follows that there are 6^100 ways of their falling.

Example 2.2.3: A self-dual 2-valued Boolean function is one whose definition remains unchanged if we change all the 0s to 1s and all the 1s to 0s simultaneously. How many such functions in n variables exist?

As an example, we consider a Boolean function in 3 variables:

Values of Variables   Function Value
0 0 0                 1
0 0 1                 1
0 1 0                 0
0 1 1                 0
1 0 0                 1
1 0 1                 1
1 1 0                 0
1 1 1                 0

The function value is what is to be assigned in defining the function. As the function is required to be self-dual, it is evident that an arbitrary assignment (of 0s and 1s) of values cannot be made. Self-duality requires that changing all 0s to 1s and all 1s to 0s does not change the function definition. Thus, if the values of the variables are assigned as 0 0 0 and the function value is 1, then for the assignment (of variables) 1 1 1 the function value must be the complement of 1, as shown in the table. It is thus easy to see that only 2^(n−1) independent assignments to the function value can be made. Thus, the total number of self-dual 2-valued Boolean functions is 2^(2^(n−1)).

2.3 Combinations and Permutations

In obtaining combinations of various objects, the general idea is one of selection. However, in obtaining permutations of objects, the idea is to arrange the objects in some order. Consider the five given objects a, a, a, b, c, abbreviated as {3·a, 1·b, 1·c}. The numbers 3, 1, 1 denoting the multiplicities of the objects are called the repetition numbers of the objects. The 3-combinations are a a a, a a b, a a c

and a b c. This ignores the ordering of objects. On the other hand, the 3-permutations are a a a, a a b, a b a, b a a, a a c, a c a, c a a, a b c, a c b, b a c, b c a, c a b and c b a. When we allow unlimited repetitions of objects, we denote the repetition number by ∞. Consider the 3-combinations possible from {∞·a, ∞·b, ∞·c, ∞·d}. There are 20 of them. On the other hand, there are 4^3 or 64 3-permutations. We use the following notations.

P(n, r) = the number of r-permutations of n distinct elements without repetitions.

C(n, r) = the number of r-combinations of n distinct elements without repetitions.

The following are basic results:

P(n, r) = n!/(n − r)!  and  C(n, r) = n!/(r!(n − r)!),

where n! (read as n-factorial) is the product 1 · 2 · 3 · · · n.

2.4 Stirling's Formula

A useful approximation to the value of n! for large values of n was given by James Stirling in 1730. This formula says that when n is large, n! can be approximated by

Sn = √(2πn) (n/e)^n.

That is, lim_{n→∞} n!/Sn = 1. The following table gives the values of n! and the corresponding Sn for the indicated n; it also indicates the percentage error in Sn. For standard generalizations of n!, see [39].

Table 2.1: Sn for the indicated n and percentage error in Sn

n      n!             Sn             Percentage error
8      40 320         39 902         1.0357
9      362 880        359 537        0.9213
10     3 628 800      3 598 696      0.8296
11     39 916 800     39 615 625     0.7545
12     479 001 600    475 687 486    0.6919
13     6 227 020 800  6 187 239 475  0.6389
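Table 2.1 is easy to regenerate. A quick sketch (the helper name stirling is ours):

    import math

    def stirling(n):
        # Sn = sqrt(2*pi*n) * (n/e)^n
        return math.sqrt(2 * math.pi * n) * (n / math.e) ** n

    for n in range(8, 14):
        exact = math.factorial(n)
        err = 100 * (exact - stirling(n)) / exact   # percentage error
        print(n, exact, round(stirling(n)), f"{err:.4f}")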


The following two examples are variations of permutations.

Example 2.4.1: Given n objects, it is required to arrange them in a circle. In how many ways is this possible? Number the objects as 1, 2, . . . , n. Keeping object 1 fixed, it is possible to obtain (n − 1)! different arrangements of the remaining objects. Note that, for any arrangement, it is possible to shift the position of object 1 while simultaneously shifting (rotating) all the other objects. Thus, the total number of arrangements is just (n − 1)!.

Example 2.4.2: Enumerating r-permutations of n objects with unlimited repetitions is easy. We consider r boxes, each to be filled by one of n possibilities. Thus the answer is U(n, r) = n^r.

It is easy to see that C(n, r) can also be expressed as n(n − 1)(n − 2) · · · (n − r + 1)/r!. The numerator is often denoted by [n]_r, which is a polynomial in n of degree r. Thus we can write [n]_r = s_r^0 + s_r^1 n + s_r^2 n^2 + · · · + s_r^r n^r. By definition, the coefficients s_r^k are the Stirling Numbers of the first kind.

These numbers can be calculated from the following formulas:

s_r^0 = 0,  s_r^r = 1,
s_{r+1}^k = s_r^{k−1} − r s_r^k.    (2.1)

Proof. By definition, [x]_{r+1} = [x]_r (x − r). Again by definition, we have from the above equality

· · · + s_{r+1}^k x^k + · · · = (· · · + s_r^{k−1} x^{k−1} + s_r^k x^k + · · ·)(x − r).

Equating the coefficients of x^k on both sides gives the required recurrence.

From the above relations we can build the following table:

s_r^k   k=0   1    2    3    4
r=1     0     1    0    0    0
r=2     0     −1   1    0    0
r=3     0     2    −3   1    0
r=4     0     −6   11   −6   1
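The recurrence (2.1) translates directly into a table-building routine. A sketch (the array layout s[r][k] is our choice):

    def stirling_first_kind(rmax):
        # s[r][k] = coefficient of x^k in [x]_r = x(x-1)...(x-r+1)
        s = [[0] * (rmax + 1) for _ in range(rmax + 1)]
        s[0][0] = 1                      # [x]_0 = 1 (empty product)
        for r in range(rmax):
            for k in range(1, rmax + 1):
                # recurrence (2.1): s_{r+1}^k = s_r^{k-1} - r * s_r^k
                s[r + 1][k] = s[r][k - 1] - r * s[r][k]
        return s

    for row in stirling_first_kind(4)[1:]:
        print(row)
    # [0, 1, 0, 0, 0]
    # [0, -1, 1, 0, 0]
    # [0, 2, -3, 1, 0]
    # [0, -6, 11, -6, 1]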

2.5 Examples in simple combinatorial reasoning

Clever reasoning forms a key part of most combinatorial problems. We illustrate this with some typical examples.

Example 2.5.1: Show that C(n, r) = C(n, n − r). The left-hand side of the equality denotes the number of ways of choosing r objects from n objects. Each such choice leaves out (n − r) objects. This is exactly equivalent to choosing (n − r) objects, leaving out r objects, which is the right-hand side.

Example 2.5.2: Show that C(n, r) = C(n − 1, r − 1) + C(n − 1, r).

The left-hand side is the number of ways of selecting r objects from out of n objects. To see this, we proceed in a different manner. We mark one of the n objects as X. In the selected r objects, (a) either X is included or (b) X is excluded. The two cases (a) and (b) are mutually exclusive and totally exhaustive. Case (a) is equivalent to selecting (r − 1) objects from (n − 1) objects, while case (b) is equivalent to selecting r objects from (n − 1) objects.

Example 2.5.3: There are a roads from city A to city B, b roads from city B to city C, c roads from city C to city D, e roads from city A to city C, d roads from city B to city D and f roads from city A to city D. In how many ways can one travel from city A to city D and come back to city A while visiting at least one of city B and/or city C at least once?

Starting from city A, the different routes leading to city D are shown in the following tree diagram. It follows that the total number of ways of going to city D from city A is (abc + ad + ec + f). The tree diagram also suggests (from the leaves to the root) that the number of ways of going from city D to city A is (abc + ad + ec + f). Therefore, the total number of ways of going from city A to city D and back is (abc + ad + ec + f)^2. The number of ways of going directly from city A to city D and back to city A directly is f^2. Hence the number of ways of going from city A to city D and back, while visiting city B and/or city C at least once, is (abc + ad + ec + f)^2 − f^2.

Figure 2.1: Tree diagram of the routes from city A to city D.

Example 2.5.4: Show that P(n, r) = r · P(n − 1, r − 1) + P(n − 1, r). The left-hand side is the number of ways of arranging r objects from out of n objects. This can be done in the following way. Among the n objects, we mark one object as X. The selected arrangement either includes X or does not include X. In the former case, we first arrange (r − 1) objects from among the (n − 1) objects other than X and then introduce X in any of the r positions. This gives the first term on the right-hand side. In the latter case, we simply arrange r objects from out of the (n − 1) objects excluding X. This gives the second term on the right-hand side.

Example 2.5.5: Count the number of simple undirected graphs with a given set V of n vertices. Obviously, V contains C(n, 2) = n(n − 1)/2 unordered pairs of vertices. We may include or exclude each pair as an edge in forming a graph with vertex set V. Therefore, there are 2^C(n,2) simple graphs with vertex set V.

In the above counting, we have not considered isomorphism (see Chapter 6). Thus, although there are 64 simple graphs having four vertices, there are only eleven of them distinct up to isomorphism.

Example 2.5.6: Let S be a set of 2n distinct objects. A pairing of S is a partition of S into 2-element subsets; that is, a collection of pairwise disjoint 2-element subsets whose union is S. How many different pairings of S are there?

Method 1: We pick an element, say x, from S. The number of ways to select x's partner, say y, is (2n − 1). (Now {x, y} forms a 2-element subset.) Consider now the (2n − 2) elements in S \ {x, y}. We pick any element, say u, from S \ {x, y}. The number of ways to select u's partner, say v, is (2n − 3). Thus, the total number of ways of picking {x, y} and {u, v} is (2n − 1)(2n − 3). Extending the argument in a similar way and applying the Product Rule, the total number of ways of partitioning S into 2-element subsets is given by (2n − 1)(2n − 3) · · · 5 · 3 · 1.

Method 2: Consider the n 2-element subsets (boxes shown as braces) shown below:

{·, ·}, {·, ·}, . . . , {·, ·}

numbered respectively as 1, 2, . . . , n. We form a 2-element subset of S and assign it to box 1. There are C(2n, 2) ways of doing this. Next, we form a 2-element subset from the remaining (2n − 2) elements and assign it to box 2. This can be done in C(2n − 2, 2) ways. It is easy to see that the total number of ways to form the different 2-element subsets of S is C(2n, 2) C(2n − 2, 2) C(2n − 4, 2) · · · C(4, 2) C(2, 2).


However, in the above, we have considered the ordering of the various 2-element subsets also. Since the ordering is immaterial, the number of ways of partitioning S into 2-element subsets is

(1/n!) [C(2n, 2) C(2n − 2, 2) C(2n − 4, 2) · · · C(4, 2) C(2, 2)].

This expression is the same as the one obtained in Method 1 above.

Example 2.5.7: Let S = {1, 2, . . . , (n + 1)}, where n ≥ 2, and let T = {(x, y, z) ∈ S^3 : x < z and y < z}. Show, by counting |T| in two different ways, that

Σ_{1≤k≤n} k^2 = C(n + 1, 2) + 2C(n + 1, 3).    (2.2)

We first show that the number on the left-hand side of (2.2) is |T|: in selecting (x, y, z), we first fix z = 2. We then have ordered 3-tuples of the form (·, ·, z). The number of ways of filling the blanks is 1 × 1, as z is greater than both x and y. Next we fix z = 3 to get ordered 3-tuples of the form (·, ·, z). As both 2 and 1 are less than the fixed z, the number of ways of filling the blanks is 2 × 2 (whence we get the four ordered 3-tuples (1, 1, 3), (1, 2, 3), (2, 1, 3) and (2, 2, 3)). The argument can be extended by fixing z = 4, . . . , n + 1. Thus the total number of different ordered 3-tuples of the required type is simply 1 · 1 + 2 · 2 + · · · + n · n = |T|.

We next show that the number on the right-hand side of (2.2) also represents |T|: we consider two mutually exclusive and totally exhaustive cases, namely x = y and x ≠ y, in the 3-tuples of interest. For the first case, we select only 2 integers from S, take the larger to be z and the smaller to be both x as well as y. This can be done in C(n + 1, 2) ways. For the second case, we select 3 integers, take the largest to be z, and assign the remaining two to be x and y in two possible ways (note that each selection will produce two ordered 3-tuples). The total number of ordered 3-tuples that are possible in this way is 2C(n + 1, 3). Hence the number of elements in T is C(n + 1, 2) + 2C(n + 1, 3).

Example 2.5.8: A sequence of (mn + 1) distinct integers u1, u2, . . . , u_{mn+1} is given. Show that the sequence contains either a decreasing subsequence of length greater than m or an increasing subsequence of length greater than n (this result is due to P. Erdős and G. Szekeres (1935)).

We present the proof as in [4]. Let li(−) be the length of the longest decreasing subsequence with first term ui, and let li(+) be the length of the longest increasing subsequence with first term ui. Assume that the result is false. Then ui ↦ (li(−), li(+)) defines a

mapping of {u1, u2, . . . , u_{mn+1}} into the Cartesian product {1, 2, . . . , m} × {1, 2, . . . , n}. This mapping is injective since, if i < j, then

ui > uj ⟹ li(−) > lj(−) ⟹ (li(−), li(+)) ≠ (lj(−), lj(+)),
ui < uj ⟹ li(+) > lj(+) ⟹ (li(−), li(+)) ≠ (lj(−), lj(+)).

Hence |{u1, u2, . . . , u_{mn+1}}| ≤ |{1, 2, . . . , m} × {1, 2, . . . , n}|, and therefore mn + 1 ≤ mn, which is impossible.

2.6 The Pigeon-Hole Principle

One deceptively simple counting principle that is useful in many situations is the Pigeon-Hole Principle. This principle is attributed to Johann Peter

Dirichlet in the year 1834, although he apparently used the German term Schubfachprinzip. The French term is le principe de tiroirs de Dirichlet, which can be translated as the principle of the drawers of Dirichlet.

The Pigeon-Hole Principle: If n objects are put into m boxes and n > m (m and n are positive integers), then at least one box contains two or more objects.

A stronger form: If n objects are put into m boxes and n > m, then some box must contain at least ⌈n/m⌉ objects.

Another form: Let k and n be two positive integers. If at least kn + 1 objects are distributed among n boxes, then one of the boxes must contain at least k + 1 objects.

We now illustrate this principle with some examples.

Example 2.6.1: Show that among a group of 7 people there must be at least four of the same sex. We treat the 7 people as 7 objects. We create 2 boxes: one (say, box 1) for the objects corresponding to the females and one (say, box 2) for the objects corresponding to the males. Thus, the 7 (= 3 · 2 + 1) objects are put into two boxes. Hence, by the Pigeon-Hole principle, there must be at least 4 objects in one box. In other words, there must be at least four people of the same sex.

Example 2.6.2: Given any five points chosen within a square of side length 2 units, prove that there must be two points which are at most √2 units apart.

Subdivide the square into four small squares, each with side of length 1 unit. By the pigeon-hole principle, at least two of the chosen points must be in (or on the boundary of) one small square. But then the distance between these two points cannot exceed the diagonal length, √2, of the small square.

Example 2.6.3: Let A be a set of m positive integers. Show that there exists a nonempty subset B of A such that the sum Σ_{x∈B} x is divisible by m.

We make use of the congruence relation. By a ≡ b (mod m), we mean that m divides (a − b). A basic property we will use is that if a ≡ r (mod m) and b ≡ r (mod m), then a ≡ b (mod m). Let A = {a1, a2, . . . , am}. Consider the following m subsets of A and the sums of their respective elements:

Set A1 = {a1}: sum of the elements is a1.
Set A2 = {a1, a2}: sum of the elements is a1 + a2.
Set A3 = {a1, a2, a3}: sum of the elements is a1 + a2 + a3.
. . .
Set Am = {a1, a2, . . . , am}: sum of the elements is a1 + a2 + · · · + am.

If any of the sums is exactly divisible by m, then the corresponding set is the required subset B. Therefore, we will assume that none of the above sums is divisible by m. We thus have

a1 ≡ r1 (mod m)
a1 + a2 ≡ r2 (mod m)
a1 + a2 + a3 ≡ r3 (mod m)

. . .
a1 + a2 + · · · + am ≡ rm (mod m),

where each ri (1 ≤ i ≤ m) is in {1, 2, . . . , (m − 1)}. Now, we consider (m − 1) boxes numbered 1 through (m − 1), and we distribute the integers r1 through rm to these boxes so that rj goes into box i if rj = i. Since m integers go into (m − 1) boxes, by the Pigeon-Hole principle there must be one box containing two of them, say ri and rj, and these must be the same, say r. That is, we must have

a1 + a2 + · · · + ai ≡ r (mod m)  and  a1 + a2 + · · · + aj ≡ r (mod m),

where j > i without loss of generality. Therefore, m divides the difference (a1 + a2 + · · · + aj) − (a1 + a2 + · · · + ai) = a_{i+1} + · · · + aj. Accordingly, Aj \ Ai is the required subset B.
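The argument is constructive: scanning the prefix sums and watching for a repeated residue finds B. A sketch (the function name is ours; we also record residue 0 for the empty prefix, which covers the case where some prefix sum is itself divisible by m):

    def subset_sum_divisible(a):
        # returns a nonempty sublist of a whose sum is divisible by m = len(a)
        m = len(a)
        seen = {0: 0}               # residue -> index of prefix attaining it
        total = 0
        for j, x in enumerate(a, start=1):
            total += x
            r = total % m
            if r in seen:           # two prefixes share a residue: pigeonhole
                i = seen[r]
                return a[i:j]       # a_{i+1} + ... + a_j is divisible by m
            seen[r] = j

    print(subset_sum_divisible([3, 7, 11, 5]))   # [11, 5]; 16 is divisible by 4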

2.7 More Enumerations

We now consider the interesting case of enumerating combinations with unlimited repetitions. Let the distinct objects be a1, a2, a3, . . . , an. We are interested in selecting r-combinations from {∞·a1, ∞·a2, . . . , ∞·an}. Any r-combination will be of the form {x1·a1, x2·a2, . . . , xn·an}, where x1, x2, . . . , xn are the repetition numbers, each being a non-negative integer, and they add up to r. Thus the number of r-combinations of {a1, a2, a3, . . . , an} is equal to the number of solutions of x1 + x2 + x3 + · · · + xn = r. For each xi we put a bin and assign xi balls to that bin; the numbers of balls add up to r. Thus the number of solutions to the above equation

is equal to the number of ways of placing r indistinguishable balls in n bins numbered 1 through n. We now make the following observation. Consider 10 objects and consider the 7-combinations from them. One solution is (3 0 0 2 0 0 0 2 0 0). Corresponding to this (distribution of balls in bins) we can form a unique binary number as follows: first we separate the integers by a 1 (imagine this as a vertical bar); then we ignore the zero entries and put an appropriate number of zeros in the place of each non-zero integer. For the above example we get 000 1 1 1 00 1 1 1 1 00 1 1. Generalizing this, we can say that the number of ways of placing r indistinguishable balls into n numbered bins is equal to the number of binary numbers with (n − 1) 1s and r 0s. Counting such binary numbers is easy: we have (n − 1 + r) positions, and we have to choose r positions to be occupied by the 0s (the remaining (n − 1) positions get filled by 1s). This can be done in C(n − 1 + r, r) ways. We can now state the result: let V(n, r) be the number of r-combinations of n distinct objects with unlimited repetitions. We have V(n, r) = C(n − 1 + r, r) = C(n − 1 + r, n − 1).

Example 2.7.1: The number of integral solutions of x1 + x2 + · · · + xn = r, xi > 0 for all admissible values of i, is equal to the number of ways of distributing r similar balls into n numbered bins with at least one ball in each bin. This is equal to C(n − 1 + (r − n), r − n) = C(r − 1, r − n).
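V(n, r) agrees with direct enumeration. A sketch:

    from math import comb
    from itertools import combinations_with_replacement

    n, r = 4, 3        # 3-combinations of {a, b, c, d} with repetition
    sel = list(combinations_with_replacement("abcd", r))
    print(len(sel))                         # 20
    assert len(sel) == comb(n - 1 + r, r)   # V(n, r) = C(n-1+r, r)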

2.7.1 Enumerating permutations with constrained repetitions

We begin with an illustrative example. Consider the problem of obtaining the 10-permutations of {3·a, 4·b, 2·c, 1·d}. Let x be the number of such permutations. If the 3 a's are replaced by a1, a2, a3, it is easy to see that we will get 3!x permutations. Further, if the 4 b's are replaced by b1, b2, b3, b4, then we will get (4!)(3!)x permutations. In addition, if we replace the 2 c's by c1, c2, we will get (2!)(4!)(3!)x permutations. In the process we have generated the set {a1, a2, a3, b1, b2, b3, b4, c1, c2, d}, the elements of which can be permuted in 10! ways. Therefore (2!)(4!)(3!)x = 10!, and hence x = 10!/((3!)(4!)(2!)).

We now generalize the above example. Let q1, q2, . . . , qt be nonnegative integers such that n = q1 + q2 + · · · + qt. Also let a1, a2, . . . , at be t distinct objects. Let P(n; q1, q2, . . . , qt) denote the number of n-permutations of the n-combination {q1·a1, q2·a2, . . . , qt·at}. By an argument similar to the above example, we have

P(n; q1, q2, . . . , qt) = n!/(q1! q2! · · · qt!) = C(n, q1) C(n − q1, q2) C(n − q1 − q2, q3) · · · C(n − q1 − q2 − · · · − q_{t−1}, qt).

By substituting the formula for each term in the product, the last expression can be simplified to the previous expression.

2.8 Ordered and Unordered Partitions

Let S be a set with n distinct elements and let t be a positive integer. A t-part partition of the set S is a set {A1, A2, . . . , At} of t subsets of S, namely A1, A2, . . . , At, such that S = A1 ∪ A2 ∪ · · · ∪ At and Ai ∩ Aj = ∅ for i ≠ j. This refers to an unordered partition. An ordered partition of S is, firstly, a partition of S; secondly, there is a specified order on the subsets. For example, the ordered partitions of S = {a, b, c, d} of the type (1, 1, 2) are given below:

({a}, {b}, {c, d}) ({b}, {a}, {c, d}) ({a}, {c}, {b, d}) ({c}, {a}, {b, d}) ({a}, {d}, {b, c}) ({d}, {a}, {b, c}) ({b}, {c}, {a, d}) ({c}, {b}, {a, d}) ({b}, {d}, {a, c}) ({d}, {b}, {a, c}) ({c}, {d}, {a, b}) ({d}, {c}, {a, b})

Here, our concern is with the number of such partitions rather than the actual list itself.

2.8.1 Enumerating the ordered partitions of a set

The number of ordered partitions of a set S of the type (q1, q2, . . . , qt), where |S| = n, is P(n; q1, q2, . . . , qt) = n!/(q1! q2! · · · qt!). We see this by choosing the q1 elements to occupy the first subset in C(n, q1) ways, the q2 elements for the second subset in C(n − q1, q2) ways, etc. Thus,

the number of ordered partitions of the type (q1, q2, . . . , qt) is C(n, q1) C(n − q1, q2) C(n − q1 − q2, q3) · · · C(n − q1 − q2 − · · · − q_{t−1}, qt), which is equal to n!/(q1! q2! · · · qt!).

Example 2.8.1: In the game of bridge, the four players N, E, S and W are seated in a specified order and are each dealt a hand of 13 cards. In how many ways can the 52 cards be dealt to the four players? We see that the order counts. Therefore, the number of ways is 52!/(13!)^4.

Example 2.8.2: To show that (n^2)!/(n!)^n is an integer. Consider a set of n^2 elements. The number of its ordered partitions of the type (n, n, . . . , n), with n parts, is (n^2)!/(n!)^n, which has to be an integer.

We can also consider partitioning a set of objects into different classes. For example, consider the four objects a, b, c, d. We can partition the set {a, b, c, d} into two classes: one class containing two subsets with one and three elements, and another class containing two subsets each with two elements (these are the only possible classes). The partitioning gives the following:

({a}, {b, c, d}), ({b}, {a, c, d}), ({c}, {a, b, d}), ({d}, {a, b, c}), ({a, b}, {c, d}), ({a, c}, {b, d}), ({a, d}, {b, c}).

The number of partitions of a set of n objects into m classes, where n ≥ m, is denoted by S_n^m, which is called the Stirling Number of the second kind. S_n^m

is also the number of distinct ways of arranging a set of n distinct objects into a collection of m identical boxes, not allowing any box to be empty (if empty boxes were permitted, then the number would be S_n^1 + S_n^2 + · · · + S_n^m).

It easily follows that S_n^1 = S_n^n = 1. Also,

S_{n+1}^k = S_n^{k−1} + k S_n^k,    1 < k < n.

Proof. Consider the partitions of (n + 1) objects into k classes. We have the following two mutually exclusive and totally exhaustive cases:

(i) The (n + 1)-th object is the sole member of a class: in this case, we simply form the partitions of the remaining n objects into k − 1 classes and attach the class containing the sole member. The number of partitions thus formed is S_n^{k−1}.

(ii) The (n + 1)-th object is not the sole member of any class: in this case, we first form the partitions of the remaining n objects into k classes. This gives S_n^k partitions. In each such partition we then add the (n + 1)-th object to one of the k classes. We thus get k S_n^k partitions of the required type.

From (i) and (ii) above, the result follows.


The interpretation of S_n^m as the number of partitions of {1, . . . , n} into exactly m classes also yields another recurrence. If we remove the class (say c) containing n, and if there are r elements in the class c, then we get a partition of (n − r) elements into (m − 1) classes. The class c can be chosen in C(n − 1, r − 1) possible ways. Hence,

S_n^m = Σ_{r=1}^{n} C(n − 1, r − 1) S_{n−r}^{m−1}.

We can use the above relations to build the following table:


S_n^m   m=1   2    3    4    5    6    7
n=1     1     0    0    0    0    0    0
n=2     1     1    0    0    0    0    0
n=3     1     3    1    0    0    0    0
n=4     1     7    6    1    0    0    0
n=5     1     15   25   10   1    0    0
n=6     1     31   90   65   15   1    0
n=7     1     63   301  350  140  21   1
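The first recurrence rebuilds this table row by row. A sketch:

    def stirling_second_kind(nmax):
        S = [[0] * (nmax + 1) for _ in range(nmax + 1)]
        S[0][0] = 1     # the empty set has one partition into zero classes
        for n in range(nmax):
            for k in range(1, nmax + 1):
                # (n+1)-th object alone, or added to one of k existing classes
                S[n + 1][k] = S[n][k - 1] + k * S[n][k]
        return S

    table = stirling_second_kind(7)
    for n in range(1, 8):
        print(table[n][1:])
    # last row (n = 7): [1, 63, 301, 350, 140, 21, 1]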

2.9 Combinatorial Identities

There are many interesting combinatorial identities. Some of these identities are suggested by the Pascal's Triangle shown in Fig. 2.2. Pascal's triangle is constructed by first writing down three 1s in the form of a triangle (this corresponds to the first two rows in the figure). Any number (other than the 1s at the ends) in any other row is obtained by summing up the two numbers in the previous row that are positioned immediately before and after it. The 1s at the ends of any row are simply carried over. We consider below some well-known combinatorial identities. The identities 2 through 5 can be seen to appear in the Pascal's triangle.

Newton's Identity: C(n, r) C(r, k) = C(n, k) C(n − k, r − k), for integers n ≥ r ≥ k ≥ 0.

The left-hand side consists of selecting two sets: first a set A of r objects and then a set B of k objects from the set A.

C(0,0)=1
C(1,0)=1 C(1,1)=1
C(2,0)=1 C(2,1)=2 C(2,2)=1
C(3,0)=1 C(3,1)=3 C(3,2)=3 C(3,3)=1
C(4,0)=1 C(4,1)=4 C(4,2)=6 C(4,3)=4 C(4,4)=1

Figure 2.2: Pascal's triangle.

For example, it is the number of ways of selecting a committee of r people out of a set of n people and then choosing a subcommittee of k people. The right-hand side can be viewed as selecting the k subcommittee members in the first place and then adding (r − k) people, to form the committee, from the remaining (n − k) people. A special case: C(n, r) · r = n · C(n − 1, r − 1).

Pascal's Identity: C(n, r) = C(n − 1, r) + C(n − 1, r − 1). This is attributed to M. Stifel (1486–1567) and to Blaise Pascal (1623–1662). The proof of the identity is easy.

Diagonal Summation: C(n, 0) + C(n + 1, 1) + C(n + 2, 2) + · · · + C(n + r, r) = C(n + r + 1, r).

The right-hand side is equal to the number of ways to distribute r indistinguishable balls into (n + 2) numbered boxes. But the balls may also be distributed as follows: fix a value for k, where 0 ≤ k ≤ r; for each k, distribute k balls in the first (n + 1) boxes and then the remainder in the last box. This can be done in Σ_{k=0}^{r} C(n + k, k) ways.

Row Summation: C(n, 0) + C(n, 1) + · · · + C(n, r) + · · · + C(n, n) = 2^n.

Consider finding the number of subsets of a set with n elements. We take n bins (one for each element). We indicate the picking (respectively, not picking) of an element by putting a 1 (respectively, a 0) in the corresponding bin. It is thus easy to see that the total number of possible subsets is equal to the total number of ways of filling up the bins (with a 1 or with a 0). This is equal to 2^n, the right-hand side of the above equality. Now, the various possible subsets can also be counted as the number of subsets with 0 elements, the number of subsets with 1 element, and so on. This way of enumeration leads to the expression on the left-hand side (note: the proof does not use the Binomial Theorem).

Row Square Summation: [C(n, 0)]^2 + [C(n, 1)]^2 + · · · + [C(n, r)]^2 + · · · + [C(n, n)]^2 = C(2n, n).

Let S be a set with 2n elements. The right-hand side above counts the number of n-combinations of S. Now, partition S into two subsets A and B, each with n elements. Then an n-combination of S is a union of an r-combination of A and an (n − r)-combination of B, for r = 0, 1, . . . , n. For any r, there are C(n, r) r-combinations of A and C(n, n − r) (n − r)-combinations of B. Thus, by the Product Rule, there are C(n, r) C(n, n − r) n-combinations obtained by taking r elements from A and (n − r) elements from B. Since C(n, n − r) = C(n, r), we have C(n, r) C(n, n − r) = [C(n, r)]^2 for each r. Then the Sum Rule gives the left-hand side.

2.10 The Binomial and the Multinomial Theorems

Theorem 2.10.1: Let n be a positive integer. Then, for all elements x and y belonging to a commutative ring with unit element (with the usual operations + and ·),

(x + y)^n = C(n, 0)x^n + C(n, 1)x^{n−1}y + C(n, 2)x^{n−2}y^2 + · · · + C(n, r)x^{n−r}y^r + · · · + C(n, n)y^n.

Note: For the definition of a commutative ring, see Chapter 5. For the present, it is enough to think of x and y as real numbers.

Proof. The inductive proof is well known. Below, we present the combinatorial proof. We write the left-hand side explicitly as consisting of n factors: (x + y)(x + y) · · · (x + y). We select an x or a y from each factor (x + y). This gives terms of the form x^{n−r}y^r for each r = 0, 1, . . . , n. We collect all such terms with similar exponents on x and y and sum them up. This sum is then the coefficient of the term of the form x^{n−r}y^r in the expansion of (x + y)^n. For any given r, to get the term x^{n−r}y^r we select r of the y's from the n factors (x gets chosen from the remaining n − r factors). This can be done in C(n, r) ways. This then is the coefficient of x^{n−r}y^r, as required by the theorem.

The binomial coefficients (of the type C(n, r)) appearing above occur in the Pascal's triangle. For a fixed n, we can obtain the ratio of the (k + 1)-st binomial coefficient of order n to the k-th: C(n, k + 1)/C(n, k) = (n − k)/(k + 1).

This ratio is larger than 1 if k < (n − 1)/2 and is less than 1 if k > (n − 1)/2. Therefore, we can infer that the biggest binomial coefficient must occur in the middle. We use Stirling's approximation to estimate how big the binomial coefficients are:

C(n, n/2) = n!/((n/2)!)^2 ≈ √(2πn) (n/e)^n / {√(πn) (n/2e)^{n/2}}^2 = 2^n √(2/(πn)).

Corollary 2.10.2: Using the Binomial Theorem we can get expansions for (1 + x)^n and (1 − x)^n. If we set x = 1 and y = −1 in the Binomial Theorem, we get C(n, 0) − C(n, 1) + C(n, 2) − · · · + (−1)^n C(n, n) = 0. We can write this as

C(n, 0) + C(n, 2) + C(n, 4) + · · · = C(n, 1) + C(n, 3) + C(n, 5) + · · · = S.

Let S be the common sum. Then, by a previous identity (see row summation), adding the two series, we get 2S = 2^n, or S = 2^{n−1}. The combinatorial interpretation is easy: if S is a set with n elements, then the number of subsets of S with an even number of elements is equal to the number of subsets of S with an odd number of elements, and each of these is equal to 2^{n−1}.

Example 2.10.3: To show that 1 · C(n, 1) + 2 · C(n, 2) + 3 · C(n, 3) + · · · + n · C(n, n) = n 2^{n−1} for each positive integer n. We use Newton's Identity, r C(n, r) = n C(n − 1, r − 1), to replace each term on the left-hand side; the expression on the left then reduces to n [C(n − 1, 0) + · · · + C(n − 1, n − 1)] = n 2^{n−1},

giving the expression on the right.

The Multinomial Theorem: This concerns the expansion of multinomials of the form (x1 + x2 + · · · + xt)^n. Here, the role of the binomial coefficients is taken over by the multinomial coefficients

P(n; q1, q2, . . . , qt) = n!/(q1! q2! · · · qt!),

where the qi's are non-negative integers and Σ qi = n (recall that the multinomial coefficients enumerate the ordered partitions of a set of n elements of the type (q1, q2, . . . , qt)).

Example 2.10.4: By long multiplication we can get (x1 + x2 + x3)^3 = x1^3 + x2^3 + x3^3 + 3x1^2 x2 + 3x1^2 x3 + 3x1 x2^2 + 3x1 x3^2 + 3x2^2 x3 + 3x2 x3^2 + 6x1 x2 x3. To get the coefficient of, say, x2 x3^2, we choose x2 from one of the factors and x3 from the remaining two. This can be done in C(3, 1) · C(2, 2) = 3 ways; therefore the required coefficient should be 3.

Example 2.10.5: Find the coefficient of x1^4 x2^5 x3^6 x4^3 in (x1 + x2 + x3 + x4)^18. The product will occur as often as x1 can be chosen from 4 out of the 18 factors, x2 from 5 out of the remaining 14 factors, x3 from 6 out of the remaining 9 factors, and x4 from the last 3 factors.

Therefore the coefficient of x1^4 x2^5 x3^6 x4^3 must be C(18, 4) C(14, 5) C(9, 6) C(3, 3) = 18!/(4! 5! 6! 3!).

Generalization of the above yields the following theorem:

Theorem 2.10.6 (The Multinomial Theorem): Let n be a positive integer. Then for all x1, x2, . . . , xt we have

(x1 + x2 + · · · + xt)^n = Σ P(n; q1, q2, . . . , qt) x1^{q1} x2^{q2} · · · xt^{qt},

where the summation extends over all sets of non-negative integers q1, q2, . . . , qt with q1 + q2 + · · · + qt = n.

To count the number of terms in the above expansion, we note that each term of the form x1^{q1} x2^{q2} · · · xt^{qt} corresponds to a selection of n objects with repetitions from t distinct types. There are C(n + t − 1, n) ways of doing this. This then is the number of terms in the above expansion.

Example 2.10.7: In (x1 + x2 + x3 + x4 + x5)^10, the coefficient of x1^2 x3 x4^3 x5^4 is P(10; 2, 0, 1, 3, 4) = 10!/(2! 0! 1! 3! 4!) = 12 600. There are C(10 + 5 − 1, 10) = C(14, 10) = 1001 terms in the above multinomial expansion.

Corollary 2.10.8: In the multinomial theorem, if we let x1 = x2 = · · · = xt = 1, then for any positive integer t we have t^n = Σ P(n; q1, q2, . . . , qt), where the summation extends over all sets of non-negative integers q1, q2, . . . , qt with Σ qi = n.
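The multinomial coefficient and the term count are both easy to compute. A sketch checking Example 2.10.7 (the helper name multinomial is ours):

    from math import comb, factorial

    def multinomial(qs):
        # P(n; q1, ..., qt) = n! / (q1! q2! ... qt!)
        n, denom = sum(qs), 1
        for q in qs:
            denom *= factorial(q)
        return factorial(n) // denom

    print(multinomial((2, 0, 1, 3, 4)))   # 12600
    n, t = 10, 5
    print(comb(n + t - 1, n))             # 1001 terms in the expansion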

2.11 Principle of Inclusion-Exclusion

The Sum Rule stated earlier (see Section 2.2.1) applies only to disjoint sets. A generalization is the Inclusion-Exclusion principle, which applies to non-disjoint sets as well. We first consider the case of two sets. If A and B are finite subsets of some universe U, then |A ∪ B| = |A| + |B| − |A ∩ B|. By the Sum Rule,

|A ∪ B| = |A ∩ B′| + |A′ ∩ B| + |A ∩ B|.    (2.3)

Also, we have

|A| = |A ∩ B| + |A ∩ B′|,    (2.4)
|B| = |A ∩ B| + |A′ ∩ B|,    (2.5)

so that |A| + |B| = |A ∩ B′| + |A′ ∩ B| + 2|A ∩ B|. From (2.3), (2.4) and (2.5), we get |A ∪ B| = |A| + |B| − |A ∩ B|.

Example 2.11.1: From a group of ten professors, in how many ways can a committee of five members be formed so that at least one of professor A or professor B is included?

Let A1 and A2 be the sets of committees that include professor A and professor B respectively. Then the required number is |A1 ∪ A2|. Now, |A1| = C(9, 4) = 126 = |A2| and |A1 ∩ A2| = C(8, 3) = 56. Therefore it follows that |A1 ∪ A2| = 126 + 126 − 56 = 196.

The Inclusion-Exclusion Principle for the case of three sets can be stated as follows: if A, B and C are three finite sets, then |A ∪ B ∪ C| = |A| + |B| + |C| − |A ∩ B| − |A ∩ C| − |B ∩ C| + |A ∩ B ∩ C|. The result can be obtained easily using a Venn diagram. We now state a more general form of the Inclusion-Exclusion Principle.

Theorem 2.11.2: If A1, A2, . . . , An are finite subsets of a universal set, then

|A1 ∪ A2 ∪ · · · ∪ An| = Σ |Ai| − Σ |Ai ∩ Aj| + Σ |Ai ∩ Aj ∩ Ak| − · · · + (−1)^{n−1} |A1 ∩ A2 ∩ · · · ∩ An|;    (2.6)

the second summation on the right-hand side is taken over all the 2-combinations {i, j} of the integers {1, 2, . . . , n}, the third summation is taken over all the 3-combinations {i, j, k} of the integers {1, 2, . . . , n}, and so on. Thus, for n = 4, there are 4 + C(4, 2) + C(4, 3) + 1 = 2^4 − 1 = 15 terms on the right-hand side. In general there are C(n, 1) + C(n, 2) + C(n, 3) + · · · + C(n, n) = 2^n − 1 terms on the right-hand side. (Note: the term C(n, 0) is missing; this accounts for the −1.)

Proof. The proof by induction is boring! Here we give a proof based on combinatorial arguments. We must show that every element of A1 ∪ A2 ∪ · · · ∪ An is counted exactly once on the right-hand side of (2.6). Suppose that an element x ∈ A1 ∪ A2 ∪ · · · ∪ An is in exactly m (an integer ≥ 1) of the sets considered on the right-hand side of (2.6); for definiteness, say x ∈ A1, x ∈ A2, . . . , x ∈ Am and x ∉ A_{m+1}, . . . , x ∉ An.

Then x will be counted in each of the terms |Ai| for i = 1, . . . , m; in other words, x will be counted C(m, 1) times in the Σ |Ai| term on the right-hand side of (2.6). Also, x will be counted C(m, 2) times in the Σ |Ai ∩ Aj| term on the right-hand side of (2.6), since there are C(m, 2) pairs of sets Ai, Aj where x is in both Ai and Aj. Likewise, x is counted C(m, 3) times in the Σ |Ai ∩ Aj ∩ Ak| term, since there are C(m, 3) 3-combinations of Ai, Aj and Ak such that x ∈ Ai, x ∈ Aj and x ∈ Ak. Continuing in this manner, we see that on the right-hand side of (2.6), x is counted

C(m, 1) − C(m, 2) + C(m, 3) − · · · + (−1)^{m−1} C(m, m)

times. Now we must show that this last expression is 1. Expanding (1 − 1)^m by the Binomial Theorem, we get 0 = C(m, 0) − C(m, 1) + C(m, 2) − · · · + (−1)^m C(m, m). Using the fact that C(m, 0) = 1 and transposing all other terms to the left-hand side of the above equation, we get the required relation.

2.12 Euler's φ-function

If n is a positive integer, then by definition φ(n) is the number of integers x such that 1 ≤ x ≤ n and n and x are relatively prime (note: two positive integers are relatively prime if their gcd is 1). For example, φ(30) = 8, because the eight integers 1, 7, 11, 13, 17, 19, 23 and 29 are the only positive integers less than 30 and relatively prime to 30.

Let U = {1, 2, . . . , n} and let p1, p2, . . . , pk be the distinct primes dividing n. Let Ai be the subset of U consisting of those integers divisible by pi. The integers in U relatively prime to n are those in none of the subsets A1, A2, . . . , Ak. So,

φ(n) = |A1′ ∩ A2′ ∩ · · · ∩ Ak′| = |U| − |A1 ∪ A2 ∪ · · · ∪ Ak|.

If d divides n, then there are n/d multiples of d in U. Hence |Ai| = n/pi, |Ai ∩ Aj| = n/(pi pj), . . . , |A1 ∩ A2 ∩ · · · ∩ Ak| = n/(p1 p2 · · · pk). Thus, by the Inclusion-Exclusion Principle,

φ(n) = n − Σ_i n/pi + Σ_{1≤i<j≤k} n/(pi pj) − · · · + (−1)^k n/(p1 p2 · · · pk).

This is equal to the product

φ(n) = n (1 − 1/p1)(1 − 1/p2) · · · (1 − 1/pk).

It turns out that computing φ(n) is as hard as factoring n. The following is a beautiful identity involving the Euler φ-function, due to Smith (1875):

| φ((1, 1)) φ((1, 2)) . . . φ((1, n)) |
| φ((2, 1)) φ((2, 2)) . . . φ((2, n)) |  = φ(1) φ(2) · · · φ(n),
| . . .      . . .     . . .  . . .   |
| φ((n, 1)) φ((n, 2)) . . . φ((n, n)) |

where (a, b) denotes the gcd of the integers a and b.
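The product formula gives a direct way to compute φ(n) once the prime factors are found. A sketch using trial division (the function name is ours):

    def phi(n):
        # n * product of (1 - 1/p) over the distinct primes p dividing n
        result, m, p = n, n, 2
        while p * p <= m:
            if m % p == 0:
                result -= result // p     # multiply result by (1 - 1/p)
                while m % p == 0:
                    m //= p
            p += 1
        if m > 1:                         # leftover prime factor
            result -= result // m
        return result

    print(phi(30))   # 8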

2.13 Inclusion-Exclusion Principle and the Sieve of Eratosthenes

The Greek mathematician Eratosthenes, who lived in Alexandria in the 3rd century B.C., devised the sieve technique to get all primes between 2 and n. The method starts by writing down all the integers from 2 to n in the natural order. Then, starting with the smallest, that is 2, every second number thereafter (these are the multiples of 2) is crossed out. Next, starting with the smallest uncrossed number, that is 3, every third number thereafter (these are the multiples of 3) is crossed out. This procedure is repeated until a stage is reached when no more numbers can be crossed out. The surviving uncrossed numbers are the required primes.

We now compute how many integers between 1 and 1000 are not divisible by 2, 3, 5 and 7; that is, how many integers remain after the first 4 steps of the sieve method. Let U be the set of integers x such that 1 ≤ x ≤ 1000. Let A1, A2, A3, A4 be the sets of elements (of U) divisible by 2, 3, 5 and 7 respectively. Then A1 ∪ A2 ∪ A3 ∪ A4 is the set of numbers in U that are divisible by at least one of 2, 3, 5 and 7. Hence the required number is |(A1 ∪ A2 ∪ A3 ∪ A4)′|. We know that

|A1| = ⌊1000/2⌋ = 500;  |A2| = ⌊1000/3⌋ = 333;
|A3| = ⌊1000/5⌋ = 200;  |A4| = ⌊1000/7⌋ = 142;
|A1 ∩ A2| = ⌊1000/6⌋ = 166;  |A1 ∩ A3| = ⌊1000/10⌋ = 100;
|A1 ∩ A4| = ⌊1000/14⌋ = 71;  |A2 ∩ A3| = ⌊1000/15⌋ = 66;
|A2 ∩ A4| = ⌊1000/21⌋ = 47;  |A3 ∩ A4| = ⌊1000/35⌋ = 28;
|A1 ∩ A2 ∩ A3| = ⌊1000/30⌋ = 33;  |A1 ∩ A2 ∩ A4| = ⌊1000/42⌋ = 23;
|A1 ∩ A3 ∩ A4| = ⌊1000/70⌋ = 14;  |A2 ∩ A3 ∩ A4| = ⌊1000/105⌋ = 9;
|A1 ∩ A2 ∩ A3 ∩ A4| = ⌊1000/210⌋ = 4.

Then |A1 ∪ A2 ∪ A3 ∪ A4| = (500 + 333 + 200 + 142) − (166 + 100 + 71 + 66 + 47 + 28) + (33 + 23 + 14 + 9) − 4 = 772. Therefore, |(A1 ∪ A2 ∪ A3 ∪ A4)′| = 1000 − 772 = 228.
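A brute-force check of this count (a sketch):

    count = sum(1 for x in range(1, 1001)
                if all(x % p != 0 for p in (2, 3, 5, 7)))
    print(count)   # 228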

2.14 Derangements

Derangements are permutations of {1, . . . , n} in which none of the n elements appears in its natural place. Thus, (i1, i2, . . . , in) is a derangement if i1 ≠ 1, i2 ≠ 2, . . . , and in ≠ n. Let Dn denote the number of derangements of (1, 2, . . . , n). It follows that D1 = 0; D2 = 1, because (2, 1) is a derangement; and D3 = 2, because (2, 3, 1) and (3, 1, 2) are the only derangements of (1, 2, 3). We will determine Dn using the Inclusion-Exclusion principle. Let U be the set of all n! permutations of {1, 2, . . . , n}. For each i, let Ai be the set of permutations such that element i is in its correct place; that is, all permutations (b1, b2, . . . , bn) such that bi = i. Evidently, the set of derangements is precisely the set A1′ ∩ A2′ ∩ A3′ ∩ · · · ∩ An′; the number of elements in it is Dn. Now, the permutations in A1 are all of the form (1, b2, . . . , bn), where (b2, . . . , bn) is a permutation of {2, . . . , n}. Thus |A1| = (n − 1)!. Similarly it follows that |Ai| = (n − 1)!.

Likewise, A1 ∩ A2 is the set of permutations of the form (1, 2, b3, . . . , bn), so that |A1 ∩ A2| = (n − 2)!. We can similarly argue that |Ai ∩ Aj| = (n − 2)! for all admissible pairs of values of i, j, i ≠ j. For any integer k, where 1 ≤ k ≤ n, the permutations in A1 ∩ A2 ∩ · · · ∩ Ak are of the form (1, 2, . . . , k, b_{k+1}, . . . , bn), where (b_{k+1}, . . . , bn) is a permutation of (k + 1, . . . , n). Thus |A1 ∩ A2 ∩ · · · ∩ Ak| = (n − k)!. More generally, we have |A_{i1} ∩ A_{i2} ∩ · · · ∩ A_{ik}| = (n − k)! for {i1, i2, . . . , ik} a k-combination of {1, 2, . . . , n}. Therefore,

Dn = |A1′ ∩ A2′ ∩ · · · ∩ An′| = |U| − |A1 ∪ A2 ∪ · · · ∪ An| = n! − C(n, 1)(n − 1)! + C(n, 2)(n − 2)! − · · · + (−1)^n C(n, n) · 0!.

Thus,

Dn = n! (1 − 1/1! + 1/2! − 1/3! + · · · + (−1)^n (1/n!)).

We know that e^{−1} = 1 − (1/1!) + (1/2!) − (1/3!) + · · · + (−1)^n (1/n!) + (−1)^{n+1} (1/(n + 1)!) + · · ·, or e^{−1} = (Dn/n!) + (−1)^{n+1} (1/(n + 1)!) + (−1)^{n+2} (1/(n + 2)!) + · · ·, so that

|e^{−1} − (Dn/n!)| ≤ (1/(n + 1)!) + (1/(n + 2)!) + · · ·
≤ (1/(n + 1)!)[1 + (1/(n + 2)) + (1/(n + 2)^2) + · · ·]
= (1/(n + 1)!)[1/{1 − (1/(n + 2))}] = (1/(n + 1)!)[1 + 1/(n + 1)].
We can get a quick approximation to Dn in terms of the exponential e.

As n → ∞, (n + 1)! → ∞ at a faster rate; thus, for large values of n, we may regard n!/e as a good approximation to Dn. For example, D8 = 14833, and the value of 8!/e is about 14832.89906.

A different approach leads to a recurrence relation for Dn. In considering the derangements of {1, . . . , n}, n ≥ 2, we can form two mutually exclusive and totally exhaustive cases as follows:

(i) The integer 1 is displaced to the k-th position (1 < k ≤ n) and k is displaced to the 1st position: in this case, the total number of such derangements is equal to the number of derangements of the set of n − 2 numbers {2, 3, . . . , k − 1, k + 1, . . . , n}. The required number is thus D_{n−2}.

(ii) The integer 1 is displaced to the k-th position (1 < k ≤ n) but k is displaced to a position different from the 1st position: these are precisely the derangements of the set of n − 1 numbers {k, 2, 3, . . . , k − 1, k + 1, . . . , n}, where any such derangement must displace the integer k from the 1st position. The required number is thus D_{n−1}.

We note that in the above argument, k can be any one of 2, 3, . . . , n; that is, it can take (n − 1) possible values. Thus we have Dn = (n − 1)(D_{n−1} + D_{n−2}). By simple algebraic manipulations, it can be shown that Dn − nD_{n−1} = (−1)^n for n ≥ 2. This recurrence relation, when solved with the initial condition D1 = 0, gives

Dn = n! Σ_{i=1}^{n−1} (−1)^{i+1}/(i + 1)!.
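Both recurrences are easy to run. A sketch comparing Dn with n!/e:

    import math

    def derangements(n):
        # D_n = (n - 1)(D_{n-1} + D_{n-2}), with D_1 = 0, D_2 = 1
        if n == 1:
            return 0
        d_prev, d = 0, 1          # D_1, D_2
        for k in range(3, n + 1):
            d_prev, d = d, (k - 1) * (d + d_prev)
        return d

    print(derangements(8))              # 14833
    print(math.factorial(8) / math.e)   # about 14832.899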

2.15 Partition Problems

Several interesting combinatorial problems arise in connection with the partitioning of integers. These are collectively referred to as partition problems. We denote by p(n, m) the number of partitions of an integer n into m parts. We write n = λ1 + λ2 + · · · + λm and specify λ1 ≥ λ2 ≥ · · · ≥ λm ≥ 1.

For example, the partitions of 2 are 2 and 1 + 1, so p(2, 1) = p(2, 2) = 1. The partitions of 3 are 3, 2 + 1 and 1 + 1 + 1, and so p(3, 1) = p(3, 2) = p(3, 3) = 1. Similarly, the partitions of 4 are 4, 3 + 1, 2 + 2, 2 + 1 + 1 and 1 + 1 + 1 + 1, so p(4, 2) = 2 and p(4, 1) = p(4, 3) = p(4, 4) = 1.

2.15.1 Recurrence relations for p(n, m)

The following recurrence relations can be used to compute p(n, m):

p(n, 1) + p(n, 2) + · · · + p(n, k) = p(n + k, k),    (2.7)
p(n, 1) = p(n, n) = 1.    (2.8)

Proof. The second formula is obvious by definition. We proceed to prove the first formula. Let A be the set of partitions of n having m parts, m ≤ k; each partition in A can be considered as a k-tuple (padded with zeros when m < k). Let B be the set of partitions of n + k into k parts. Define a mapping φ : A → B by setting

φ(λ1, λ2, . . . , λm) = (λ1 + 1, λ2 + 1, . . . , λm + 1, 1, 1, . . . , 1).

Since λ1 + λ2 + · · · + λm = n, the sequence on the right side above

gives a partition of n + k into k parts. (Note that if m = k, the trailing 1s in the partition of n + k will be absent.) Clearly the mapping φ is bijective. Hence |A| = p(n, 1) + p(n, 2) + · · · + p(n, k) = |B| = p(n + k, k).

The equations (2.7) and (2.8) allow us to compute the p(n, m)'s recursively. For example, the values of p(n, m) for n ≤ 6 and m ≤ 6 are given by the following array:

p(n, m)   m=1   2   3   4   5   6
n=1       1     0   0   0   0   0
n=2       1     1   0   0   0   0
n=3       1     1   1   0   0   0
n=4       1     2   1   1   0   0
n=5       1     2   2   1   1   0
n=6       1     3   3   2   1   1
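Read backwards, (2.7) says p(N, k) = Σ_{m=1}^{k} p(N − k, m) for N > k, which together with (2.8) computes the array. A sketch:

    from functools import lru_cache

    @lru_cache(maxsize=None)
    def p(n, m):
        # partitions of n into exactly m parts, via (2.7) and (2.8)
        if m < 1 or m > n:
            return 0
        if m == 1 or m == n:
            return 1                  # relation (2.8)
        return sum(p(n - m, j) for j in range(1, m + 1))   # relation (2.7)

    for n in range(1, 7):
        print([p(n, m) for m in range(1, 7)])
    # last row (n = 6): [1, 3, 3, 2, 1, 1]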

2.16 Ferrer Diagrams

We next illustrate the idea of a Ferrer diagram to represent a partition. Consider a partition such as 5 + 3 + 2 + 2. We represent it diagrammatically as shown in Fig. 2.3. The representation, as seen in the diagram, has one row for each part; the number of squares in a row is equal to the size of the part it represents; an upper row has at least as many squares as there are in a lower row; and the rows are aligned to the left. The partition obtained by rendering the columns (of a given partition) as rows is known as the conjugate partition of the given partition. For example, from the Ferrer diagram it follows (by turning the figure by 90° clockwise and taking a mirror reflection) that the conjugate partition of 5 + 3 + 2 + 2 is 4 + 4 + 2 + 1 + 1.

Figure 2.3: Ferrer diagram of the partition 5 + 3 + 2 + 2.

Figure 2.4: Set-square arrangement of the partition 9 + 3 + 1 (all parts unequal and odd), giving the self-conjugate partition 5 + 3 + 3 + 1 + 1.
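Conjugation is just a column count of the Ferrer diagram. A sketch:

    def conjugate(parts):
        # row j of the conjugate = number of parts that are >= j
        return [sum(1 for p in parts if p >= j)
                for j in range(1, max(parts) + 1)]

    print(conjugate([5, 3, 2, 2]))     # [4, 4, 2, 1, 1]
    sc = [5, 3, 3, 1, 1]
    print(conjugate(sc) == sc)         # True: 5+3+3+1+1 is self-conjugate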

2.16.1 Proposition

The number of partitions of n into k parts is equal to the number of partitions of n into parts the largest of which is k. It is easy to see that we can establish a bijection between the set of partitions of n having k parts and the set of partitions of n having k as the largest part. This can be seen using the Ferrer diagram.

2.16.2 Proposition

The number of self-conjugate partitions of n is the same as the number of partitions of n with all parts unequal and odd.

Consider the Ferrer diagram associated with a partition of n with all parts unequal and odd. We can obtain a new Ferrer diagram by placing the squares of each row in a set-square arrangement, as shown in Fig. 2.4. The new Ferrer diagram defines a self-conjugate partition. Similarly, reversing the argument, each self-conjugate partition corresponds to a unique partition with all parts unequal and odd.

2.16.3 Proposition

The number of partitions of n is equal to the number of partitions of 2n which have exactly n parts.

Any partition of n can have at most n parts. Treat each partition as an n-tuple (λ1, λ2, . . . , λn) where, in general, from some i onwards, all of λ_{i+1} to λn will be 0. Now add 1 to each λj, 1 ≤ j ≤ n; this defines a partition of 2n (as we are adding n 1s) which has exactly n parts. It is easy to see that we have a bijection from the set of the original n-tuples to the set of n-tuples formed as above. Hence the result follows.

2.17 Solution of Recurrence Relations

Different types of recurrence relations (also known as difference equations) arise in many enumeration problems and in the analysis of algorithms. We illustrate many types of recurrence relations and the commonly adopted techniques and tricks used to solve them. We also give the general technique, based on generating functions, for solving recurrence relations. [20], [9] and [56] provide good material on solving recurrence relations.

Example 2.17.1: The number Dn of derangements of the integers (1, 2, . . . , n), as we have seen in Section 2.14, satisfies the recurrence relation Dn − nD_{n−1} = (−1)^n for n ≥ 2, with D1 = 0. The easiest way to solve this recurrence relation is to rewrite it as

Dn/n! − D_{n−1}/(n − 1)! = (−1)^n/n!,

which is easy to solve (the left-hand side telescopes).

Chapter 2 Combinatorics Example 2.17.2:

87

The sorting problem asks for an algorithm that takes as input a list or an array of n integers and that sorts them that is, arranges them in nondecreasing (or nonincreasing) order. One algorithm, call it procedure Mergesort(n), does this by splitting the given list of n integers into two sublists of n/2 and n/2 integers, applies the procedure on the sublists to sort them and merges the sorted sublists (note the recursive formulation). If the time taken by procedure Mergesort(n) in terms of the number of comparisons is denoted by T (n) then T (n) is known to satisfy the following recurrence relation: T (n) = T n 2 +T n 2 + n 1, with T (2) = 1

The above method is characteristic of divide-and-conquer algorithms where a main problem of size n is split it into b (b > 1) subproblems of size say, n/c (c > 1); the subproblems are solved by further splitting; the splitting stops when the size of the subproblems are small enough to be solved by a simple technique. A divide-and-conquer algorithm combines the solutions to the subproblems to yield a solution to the original problem. Thus there is a non-recursive cost (denoted by the function f (.) ) associated in splitting and/or combining (the solutions). Thus we generaly get a recurrence relation of the type, T (n) = bT (n/c) + f (n) Example 2.17.3: Consider the problem of nding the number of binary sequences of length n which do not contain two consecutive 1s. Let wn be the number of such sequences. Let un be the number of such

Chapter 2 Combinatorics

88

sequences whose last digit is a 1. Also, let vn be the number of such sequences whose last digit is a 0. Obviously, wn = un + vn . Consider extending any sequence of the required type from length n 1 to length n. We have the following two possibilities: (i) If a sequence of length n 1 ends with a 1 then, we can append a 0 but not a 1. (ii) If a sequence of length n 1 ends with a 0 then, we can append either a 0 or a 1. It is not dicult to reason that un and vn can be counted as given by the following recurrence relations: vn = vn1 + un1 These lead to the equations, vn = vn1 + vn2 and un = un1 + un2 and un = vn1 .

which when added give the recurrence relation, wn = wn1 + wn2 . This equation is the same as that for Fibonacci numbers, which can be solved with the initial conditions w1 = 2 and w2 = 3. Example 2.17.4: A circular disk is divided into n sectors. There are p dierent colors (of paints) to color the sectors so that no two adjacent sectors get the same color. We are interested in the number of ways of coloring the sectors.

Chapter 2 Combinatorics

89

2 3 4

Let un be the number of ways to color the disk in the required manner. This number clearly depends upon both n and p. We form a recurrence relation in n using the following reasoning as given in [56]. We construct two mutually exclusive and exhaustive cases: (i) The sectors 1 and 3 are colored dierently. In this case, removing sector 2 gives a disk of n 1 sectors. Sector 2 can be colored in p 2 ways. (ii) The sectors 1 and 3 are of the same color. In this case, removing sector 2 gives a disk of n 2 sectors as sectors 1 and 3 being of the same color can be fused as one. Sector 2 can be colored using any of the p 1 colors (i.e., excluding the color of sector 1 or sector 3). For each coloring of sector 2, we can color the disk of n 2 sectors in un2 ways. Thus, we have the following recurrence relation: un = (p 2)un1 + (p 1)un2 with the initial conditions, u2 = p(p 1)andu3 = p(p 1)(p 2). The solution can be obtained as, un = (p 1)[(p 1)n1 + (1)n ].

Chapter 2 Combinatorics

90

2.18

Homogeneous Recurrences

The recurrence relations in the above examples can be solved using the techniques described in this section. In general, we can consider equations of the form, an = f (an1 , an2 , . . . , ani ), where n i

Here we have a recurrence with nite history as an depends on a xed number of earlier values. If an depends on all the previous values then the recurrence is said to have a full history. In this section, we here consider equations of the form, c0 an + c1 an1 + ... + ck ank = 0. (2.9)

The above equation is linear as it does not contain terms like ani anj , a2 ni and so on; it is homogeneous as the linear combination of ani is zero; it is with constant coecients because the ci s are constants. For example, the well-known Fibonacci Sequence 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, . . . is dened by the linear homogeneous recurrence, fn = fn1 + fn2 , when n 2 with the initial conditions, f0 = 0 and f1 = 1. It is easy to see that if fn and gn are solutions to (2.9), then so does a linear combinations of fn and gn say pfn + qgn , where p and q are constants. To solve (2.9), we try an = xn where x is an unknown constant. If an = xn is used in (2.9) we should have, c0 xn + c1 xn1 + . . . + ck xnk = 0. The solution x = 0 is trivial; otherwise, we must have p(x) co xk + c1 xk1 + . . . + ck = 0.

Chapter 2 Combinatorics

91

The equation p(x) = 0 is known as the characteristic equation which can be solved for x. Example 2.18.1: Consider the Fibonacci Sequence as above. We can write the recurrence relation as fn fn1 fn2 = 0. (2.10)

The characteristic equation is x2 x 1 = 0. The roots r1 , r2 of this equation are given by, r1 = (1 + 5)/2 and r2 = (1 5)/2.
n n So, the general solution of (2.10) is, fn = c1 r1 + c 2 r2 .

Using the initial conditions f0 = 0 and f1 = 1 we get, c1 + c2 = 0 and c1 r1 + c2 r2 = 1 which give, c1 = 1/ 5 and c2 = 1/ 5. Thus the nth Fibonacci number fn1 is given by, 1 1 + 5 n1 1 1 5 fn1 = 2 2 5 5
n1

It is interesting to observe that the solution of the above recurrence which has integer coecients and integer initial values, involves irrational numbers. Example 2.18.2: Consider the relation, an 6an1 + 11an2 6an3 = 0, with a0 = 1, a1 = 3 and a2 = 5. From the given equation we directly write its chracteristic equation as, x3 6x2 + 11x 6 = 0, that is, (x 1)(x 2)(x 3) = 0. n>2

Chapter 2 Combinatorics

92

Thus the roots of the characteristic equation are, 1, 2 and 3 and the solution should be of the form, an = A1n + B 2n + C 3n . Applying the initial conditions, we get, a0 = 1 = A + B + C ; a1 = 3 = A + 2B + 3C ; and a2 = 5 = A + 4B + 9C

which yield, A = 2, B = 4 and C = 1. Hence the solution is given by an = 2 + 4.2n 3n . We next deal with the case when the characteristic equation has multiple roots. Let the characteristic polynomial p(x) be, p(x) = co xk + c1 xk1 + + ck . Let r be a multiple root occurring two times; that is, (x r)2 is a factor of p(x). We can write, p(x) = (x r)2 q (x), where q (x) is of degree (k 2).

For every n k consider the nth degree polynomials, un (x) = c0 xn + c1 xn1 + . . . + ck xnk and vn (x) = c0 nxn + c1 (n 1)xn1 + + ck (n k )xnk . We note that, vn (x) = xun (x). Now, un (x) = xnk p(x) = xnk (x r)2 q (x) = (x r)2 [xnk q (x)] So, un (r) = 0, which implies that vn (r) = run (r) = 0, for all n k . That is, c0 nrn + c1 (n 1)rn1 + + ck (n k )rnk = 0. From (2.9), we conclude an = nrn is also a solution to the given recurrence. More generally, if root r has multiplicity m, then an = rn , an = nrn , an = n2 rn , . . . , an = nm1 rn all are distinct solutions to the recurrence.

Chapter 2 Combinatorics Example 2.18.3:

93

Consider the recurrence, an 11an1 + 39an2 45an3 = 0, when n > 3 with a0 = 0, a1 = 1 and a2 = 2. We can write the characteristic equation as x3 11x2 + 39x 45 = 0, the roots of which are 3, 3 and 5. Hence, the general solution can be written as an = (A + Bn)3n + C 5n . Using the initial conditions, we get a0 = A + C = 0; a1 = 3(A + B ) + 5C = 1; and a2 = 9(A + 2B ) + 25C = 2. This gives, A = 1, B = 1 and C = 1. Hence the required solution is given by, an = (1 + n)3n + 5n .

2.19

Inhomogeneous Equations

Inhomogeneous equations are more dicult to handle and linear combinations of the dierent solutions may not be a solution. We begin with a simple case. c0 an + c1 an1 + + ck ank = bn p(n). Here the left-hand side is same as before; on the right-hand side, b is a constant and p(n) is a polynomial in n of degree d. The following example is illustrative. Example 2.19.1:

Chapter 2 Combinatorics Consider the recurrence relation, an 3an1 = 5n Multiplying by 5, we get 5an 15an1 = 5n+1 Replacing n by n 1, we get 5an1 15an2 = 5n Subtracting (2.12) from (2.11), an 8an1 + 15an2 = 0

94

(2.11)

(2.12) (2.13)

The corresponding characteristic polynomial is x2 8x+15 = (x3)(x5) and so we can write the general solution as, an = A3n + B 5n (2.14)

We can observe that 3n and 5n are not solutions to (2.11)! The reason is that (2.11) implies (2.13), but (2.13) does not imply (2.11) and hence they are not equivalent. From the original equation we can write a1 = 3a0 + 5, where a0 is the initial condition. From (2.14) we get, A + B = a0 and 3A + 5B = a1 = 3a0 + 5. Therefore, we should have, A = a0 5/2 and B = 5/2. Hence, an = [(2a0 5)3n + 5n+1 ]/2.

2.20

Repertoire Method

The repertoire method (illustrated in [60]) is one technique that works by trial and error and may be useful in certain cases. Consider for example, an nan1 + (n 2)an2 = (2 n), with the initial condition a1 = 0 and a2 = 1. We relax the above recurrence by writing a general function, say f (n) on the right-hand side of the above

Chapter 2 Combinatorics equation. Thus, we consider the equation, an nan1 + (n 2)an2 = f (n).

95

We now try various possible candidate solutions for an , evaluate the lefthand side above and check to see if it yields the required value for f (n) the required f (n) should be (2 n). We tabulate the work as follows: Row No. 1 2 3 Suggested an 1 -1 n Resulting f (n) Resulting initial conditions 0 a1 = 1 and a2 = 1 0 a1 = 1 and a2 = 1 (2-n) a1 = 1 and a2 = 2

We note that row 1 and row 2 are not linearly independent. We also note that in row 3, we have an = n which gives the correct f (n) but the initial conditions are not correct. We can subtract row 1 from row 3 which gives an = (n 1) with the correct initial conditions. Thus, an = (n 1) is the required solution. Next we consider the following example. an (n 1)an1 + nan2 = (n 1) for n > 1; a0 = a1 = 1. We generalize this as, an (n 1)an1 + nan2 = f (n) for n > 1; a0 = a1 = 1. As before, we try various possibilities for an and look for the resulting f (n) to get a repertoire of recurrences. We summarize the work in the following table: Row No. 1 2 3 Suggested an 1 n n2 Resulting f (n) Resulting initial conditions 2 a0 = a1 = 1 n1 a0 = 0 and a1 = 1 n+1 a0 = 0 and a1 = 1

Chapter 2 Combinatorics

96

From row 1 and row 3, by subtraction we nd that an = (n2 1) is a solution because it results in f (n) = (n 1); however it gives the initial conditions a0 = 1 and a1 = 0. This solution and the solution in the second row are linearly independent. We combine these to give an = (n n2 + 1) which gives the right initial conditions.

2.21

Perturbation Method

The Perturbation Method is another technique to approximate the solution to a recurrence. The method is nicely illustrated in the following example given in [60]. Consider the following recurrence: a0 = 1; a1 = 2 and an+1 = 2an + an1 /n2 , n > 1 (2.15)

In the perturbation step, we note that the last term contains the 1 / n2 factor; we therefore reason that it will bring a small contribution to the recurrence; hence, approximately, an+1 = 2an This yields an = 2n which is an approximate solution. To correct the error involved, we consider the exact recurrence, bn+1 = 2bn , n > 0; with b0 = 1.

Then, bn = 2n which is exact. We now compare the two quantities an and bn . Let, Pn = an /bn = an /2n which gives an = 2n Pn .

Using the last relation in (2.15) we get, 2n+1 Pn+1 = 2 2n Pn + 2n1 Pn1 /n2 when n > 1.

Chapter 2 Combinatorics Therefore, Pn+1 = Pn + (1/4n2 )Pn1 , n > 0; P0 = 1 Clearly the Pn s are increasing. We see that, Pn+1 Pn (1 + (1/4n2 )), n 1, so that, Pn+1
n

97

[1 + (1/4k 2 )].
k=1

The innite product 0 corresponding to the right hand side above converges monotonically to 0 =

(1 + (1/4k 2 )) = 1.46505 . . .

k=1

Thus Pn is bounded above by 0 and as it is increasing, it must converge to a constant. We have thus proved that, an 2n , for some constant < 1.46505.

2.22

Solving Recurrences using Generating Functions

We next consider a general technique to solve linear recurrence relations. This is based on the idea of generating functions. We begin with an introduction to generating functions. An innite sequence {a0 , a1 , a2 , . . .} can be represented as a power series in an auxiliary variable z . We write, A(z ) = a0 + a1 z + a2 z 2 + . . . =
k 0

ak z k . . .

(2.16)

When it exists, A(z ) is called the generating function of the sequence {a0 , a1 , a2 , . . .}. It may be noted that the theory of innite series says that:

Chapter 2 Combinatorics

98

(i) If the series converges for a particular value of z0 of z then it converges for all values of z such that |z | < |z0 |. (ii) The series converges for some z = 0 if and only if the sequence {|an |n/2 } is bounded. (If this condition is not satised, it may be that the sequence {an /n!} is convergent.) In practice, the convergence of the series may simply be assumed. When a solution is discovered by any means, however sloppy, it may be justied independently say, by using induction. We use the notation [z n ]A(z ) to denote the coecient of z n in A(z ); that is, an = [z n ]A(z ). The following are easy to check: 1. If the given innite sequence is {1, 1, 1, . . .} then it follows that the corresponding generating function A(z ) is given by, A(z ) = 1 + z + z 2 + z 3 + . . . = 1/(1 z ). 2. If the given innite sequence is {1, 1/1!, 1/2!, 1/3!, 1/4!, . . .} then A(z ) = 1 + z/1! + z 2 /2! + z 3 /3! + z 4 /4! + . . . = ez . 3. If the given innite sequence is {0, 1, 1/2, 1/3, 1/4, . . .} then, A(z ) = z + z 2 /2 + z 3 /3 + z 4 /4 + . . . = ln(1/(1 z )).

2.22.1

Convolution

Let A(z ) and B (z ) be the generating functions of the sequences {a0 , a1 , a2 , . . .} and {b0 , b1 , b2 , . . .} respectively. The product A(z )B (z ) is the series, (a0 + a1 z + a2 z 2 + . . .) (b0 + b1 z + b2 z 2 + . . .)

Chapter 2 Combinatorics = a0 b0 + (a0 b1 + a1 b0 )z + (a0 b2 + a1 b1 + a2 b0 )z 2 + . . . It is easily seen that, [z n ]A(z )B (z ) =


n k=0

99

ak bnk . Therefore, if we wish to

evaluate any sum that has the general form


n

cn =
k=0

ak bnk

(2.17)

and if the generating functions A(z ) and B (z ) are known then we have cn = [z n ]A(z )B (z ). The sequence {cn } is called the convolution of the sequences {an } and {bn }. In short, we say that the convolution of two sequences corresponds to the product of the respective generating functions. We illustrate this with an example. From the Binomial Theorem, we know that, (1 + z )r is the generating function of the sequence {C (r, 0), C (r, 1), C (r, 2), . . .}. Thus we have, (1 + z )r =
k 0

C (r, k )z k

and (1 + z )s =
k 0

C (s, k )z k

By multiplication, we get (1 + z )r (1 + z )s = (1 + z )r+s . Equating the coecients of z n on both the sides we get, C (r, k )C (s, n k ) = C (r + s, n)

k=0

which is the well-known Vandermonde convolution. When bk = 1 (for all k = 0, 1, 2, . . . ) , then from (2.12) we get, cn =
k=0

ak .

Thus, convolution of a given sequence with the sequence {1, 1, 1, . . .} gives a sequence of sums. The following examples are illustrative of this fact.

Chapter 2 Combinatorics Example 2.22.1:

100

By taking the convolution of the sequence {1, 1, 1, . . .} (whose generating function is 1/(1 z ) ) with itself we can immediately deduce that 1/(1 - z)2 is the generating function of the sequence {1, 2, 3, 4, 5, . . .}. Example 2.22.2: We can easily see that 1/(1 + z ) is the generating function for the sequence {1, 1, 1, 1, . . .}. Therefore 1/(1 + z )(1 z ) or 1/(1 z 2 ) is the generating function of the sequence {1, 0, 1, 0, . . .} which is the convolution of the sequences {1, 1, 1, . . .} and {1, 1, 1, 1, . . .}.

2.23

Some simple manipulations

In the following, we assume F (z ) and G(z ) to be the respective generating functions of the innite sequences {fn } and {gn }. 1. For some constants u and v we have, uF (z ) + vG(z ) = u
n0

fn z n + v
n0

gn z n =
n0

(ufn + vgn )z n ,

which is therefore the generating function of the sequence {ufn + vgn } . 2. Replacing z by cz where c is a constant, we get G(cz ) =
n0

gn (cz )n =
n0

c n gn z n

which is the generating function for the sequence {cn gn }. Thus 1/(1 cz ) is the generating function of the sequence {1, c, c2 , c3 , . . .}.

Chapter 2 Combinatorics

101

3. Given G(z ) in a closed form we can get G (z ). Term by term dierentiation (when possible) of the innite sum of G(z ) yields, G (z ) = g1 + 2g2 z + 3g3 z 2 + 4g4 z 3 + . . . Thus G (z ) represents the innite sequence {g1 , 2g2 , 3g3 , 4g4 , . . .} i.e., {(n+ 1)gn+1 }. Thus, with a shift, we have brought down a factor of n into the terms of the original sequence {gn }. Equivalently, zG (z ) is the generating function for {ngn }. 4. By term by term integration (when possible) of the innite sum of G(z ), we have
z

G(t)dt = g0 z + (1/2)g1 z 2 + (1/3)g2 z 3 + (1/4)g3 z 4 + . . . =


0 n1

(1/n)gn1 z n

Thus by integrating G(z ) we get the generating function for the sequence {gn1 /n}. Example 2.23.1: It is required to nd the generating function of the sequence {12 , 22 , 32 , . . .}. We have seen above that 1/(1 x)2 is the generating function of the sequence {1, 2, 3, . . .}. By dierentiation, we can see that 2/(1 x)3 is the generating function of the sequence {2.1, 3.2, 4.3, . . .}. In this sequence, the term with index k is (k + 2)(k + 1) which can be written as (k + 1)2 + k + 1. We want the sequence {ak } where ak = (k + 1)2 . By subtracting the generating function for the sequence {1, 2, 3, . . .} from that for the sequence {2.1, 3.2, 4.3, . . .} , we get the required answer as [2/(1 x)3 ] [1/(1 x)2 ].

Chapter 2 Combinatorics

102

2.23.1

Solution of recurrence relations

Consider the following simultaneous recurrence relations, an + 2an1 + 4bn1 = 0 and bn 4an1 6bn1 = 0 (2.18) (2.19)

with a0 = 1 and b0 = 0. We multiply both the sides of equation (2.18) by z n and take the sum over n 1. This gives, an z n + 2z
n1 n1

an1 z n1 + 4z

n1

bn1 z n1 = 0. bn z n = 0.
n1

or,
n1

an z n + 2z
n1

an z n + 4z

The last equation can be written as, G(z ) 1 + 2zG(z ) + 4zH (z ) = 0 (2.20)

where the initial condition a0 = 1 has been used and it is assumed that G(z ) and H (z ) are the generating functions of the sequences {a0 , a1 , a2 , . . .} and {b0 , b1 , b2 , . . .} respectively. In a similar manner, from (2.19) we can obtain, H (z ) 4zG(z ) 6zH (z ) = 0 Equations (2.20) and (2.21) can be solved to yield, G(z ) = (1 6z )/(1 2z )2 and H (z ) = 4z/(1 2z )2 . (2.21)

From these closed forms it is easy to obtain, [z n ]G(z ) = an = 2n (1 2n) and [z n ]H (z ) = bn = n2n+1 Next we consider the following recurrence relation. an+2 3an+1 + 2an = n, with a0 = a1 = 1.

Chapter 2 Combinatorics

103

We multiply both sides of the above equation by z n+2 and sum over all n obtaining, an+2 z n+2 3z
n0

an+1 z n+1 + 2z 2
n0 n0

an z n =

nz n+2 .

n0

If we dene G(z ) = as,

an z n then the above equation can then be written

(G(z ) z 1) 3z (G(z ) 1) + 2z 2 G(z ) = z 3 /(1 z )2 . Note that in obtaining the above we have used the initial conditions a0 = a1 = 1 and we have used the fact that the innite sequence {0, 0, 0, 1, 2, 3, . . .} has the generating function z 3 /(1 z )2 . From the above, we get G(z ) as, G(z ) = {z 3 /(1 z )2 (1 3z + 2z 2 )} + {(1 2z )/(1 3z + 2z 2 )}. To get [z n ]G(z ) , we rst express the right-hand side of G(z ) above in terms of partial fractions. We get, G(z ) = 1 1 + (1 2z ) (1 z )2 1 . (1 z )3

Using innite series expansions of the terms on the right, it is easy to check that, [z n ]G(z ) = an = 2n (n2 + n)/2. We note that we can describe a sequence by a recurrence relation and then try to use the recurrence relation (as in the above examples) to obtain an equation in the associated generating function. The following exercise illustrates that, in general, we may end up in a dierential equation involving the generating function.

Chapter 2 Combinatorics Example 2.23.2: Consider the following sequence: a0 = 1, a1 = 1/2, a2 = (1 3)/(2 4), a4 = (1 3 5 7)/(2 4 6 7)

104

a3 = (1 3 5)/(2 4 6),

Obtain the generating function for the sequence {an }.

2.23.2

Some common tricks

We now illustrate some common manipulations in dealing with certain recurrences. 1. Consider the recurrence, an = nan1 + n(n 1)an2 , where n > 1 with a1 = 1 and a0 = 0. Dividing both the sides by n! gives the Fibonacci relation in an /n! which can be readily solved with the given initial conditions. 2. Consider the non-linear recurrence, an = (an1 an2 ), n > 1 with a1 = 2

and a0 = 1. We take the logarithm on both the sides and set bn = log an . This gives, bn = (bn1 + bn2 )/2, n > 1 with b1 = 1 and b0 = 0

which is a linear recurrence with constant coecients. 3. A slightly more tricky case is the following recurrence: an = 1/(1 + an1 ), n > 0 with a0 = 1 The rst few iterations give a0 = 1, a1 = 1/2, a2 = 2/3, a3 = 3/5, a4 = 5/8 etc. We can easily recognize the Fibonacci numbers in these ratios. This

Chapter 2 Combinatorics suggests the substitution an = bn1 /bn which yields the equation, bn1 /bn = 1/(1 + bn2 /bn1 ), n > 0 with b1 = 1 and b0 = 1 which reduces to, bn = bn1 + bn2 which can be easily solved.

105

2.24

Illustrative Problems

We end this chapter with two illustrative problems. Problem 2.24.1 (Counting binary trees): We consider this problem as given in [49]. We recursively dene a binary tree as: a binary tree is empty (having no vertex) or it consists of a distinguished vertex, called the root, together with an ordered pair of binary trees called the left subtree and the right subtree. It can be noted that this denition admits a binary tree to have only a left subtree or only a right subtree. The problem is to count the number bn of binary trees with exactly n vertices. By exhaustive listing we nd b0 = 1, b1 = 1, b2 = 2 and b3 = 5. Let B (z ) be the generating function for the sequence {b0 , b1 , b2 , b3 , . . .}. By denition, B (z ) = b0 + b1 z + b2 z 2 + b3 z 3 + . . .. For n 1 the number of binary trees with n vertices can be enumerated as the number of ordered pairs of the form (B1 , B2 ) where B1 and B2 are binary trees that together have exactly (n 1) vertices i.e., if B1 has k vertices then B2 will have (n k 1) vertices where k can take the values 0, 1, 2, 3, . . . , (n 1). Therefore the number bn of such ordered pairs is given

Chapter 2 Combinatorics by, bn = b0 bn1 + b1 bn2 + + bn1 b0 where n 1.

106

The right-side above can be recognized to be the coecient of z n1 in the product B (z )B (z ) = {B (z )}2 . Hence bn should be the coecient of z n in z {B (z )}2 . We note that z {B (z )}2 is the generating function for the sequence {b1 , b2 , b3 , . . .} which is the same as the sequence generated by B (z ) except that B (z ) also generates b0 = 1. We thus have, B (z ) = 1 + z {B (z )}2 If z is such a real number so that the power series B (z ) converges then B (z ) will also be a real number; then the above quadratic can be solved to give, B (z ) = 1+ (1 4z ) 2z or 1 (1 4z ) 2z

yielding two possibilities. We thus have, zB (z ) = 1+ (1 4z ) 2 or 1 (1 4z ) 2

Dierentiating the right-side of the above equation with respect to z and letting z 0, we see that the rst expression tends to 1 whereas the second expression tends to 1. Now, zB (z ) = b0 z + b1 z 2 + b2 z 3 + , and its derivative with respect to z is b0 + 2b1 z + 3b2 z 2 + , which tends to 1 as z 0. Therefore we must have, zB (z ) = 1 (1 4z )1/2 2 or B (z ) = 1 (1 4z )1/2 2z

Chapter 2 Combinatorics

107

By expanding (1 2z )1/2 in innite series we can get the expansion for B (z ) and get, bn = [z n ]B (z ) = 1 2n n+1 n

The numbers bn are known as Catalan numbers. Problem 2.24.2: Average-case analysis of a simple algorithm to nd the maximum in a list. This problem is to do an average-case analysis of a trivial algorithm. Let X [1 . . . n] be an array of n distinct positive real numbers. The following pseudo-code FindMax returns in the variable max, the largest element

in the array X [1 . . . n]:


max:= -1; for i := 1 to n do

if max < X [i] then max := X [i];

The problem is to nd the average-case time complexity of the above codesegment. This essentially involves nding the average number of times the assignment max := X[i] is executed . FindMax can be written as the following assembly-language like code using a reduced set of pseudo-code

statements:
max := 1; i := 0; 1: i := i + 1; if i > n then goto 2; if max X [i] then goto 1;

Chapter 2 Combinatorics
max := X [i]; goto 1;

108

2: - - -

Thus on a sequential computer, a compiled form of FindMax will execute: (i) a xed number of assignments to initialize the variables max and i (ii) (n+1) comparisons of the form i > n? (iii) (n+1) increments of the index i (iv) n comparisons of the form max X[i]? (v) a variable number (between 1 and n ) of assignments max := X[i]. We can thus conclude that the time, tmaxfn of FindMax has to be of the form: tmaxfn = c0 + c1 n + c2 EXCH[X] where EXCH[X] is the number of times the instruction max := X[i] is executed ( i.e., the number of exchanges that has taken place) and c0 , c1 and c2 are the implementation constants dependent on the machine where the code runs. We note that EXCH[X] is 1 if X[1] is the largest element. Also, EXCH[X] takes the maximum value n when the array X is already sorted in the increasing order. To get an estimate of the expected value of EXCH[X] we introduce the permutation model. In this model we will assume that the array X is a permutation of the integers (1, . . . , n). Then each permutation can occur (be an input to FindMax) with equal probability 1/n!. Let sn,k be the number of those permutations wherein EXCH[X] is k . Then, if pn,k denotes the

Chapter 2 Combinatorics probability that EXCH[X] = k we have, pn,k = sn,k n!

109

The expected value, exch[X] of EXCH[X] is given by, exch[X] =


k=1

1 ksn,k = n! n!

ksn,k .
k=1

(2.22)

To get the sum on the right-side above, we consider all those permutations 1 2 3 . . . n of (1, . . . , n) wherein the value EXCH[X] is exactly k (by denition, there are exactly sn,k of these). With respect to these types of permutations, we reason that the following two cases can occur: (a) the last element n is equal to n: in this case 1 2 . . . n1 should have produced exactly k 1 exchanges because the last element being the largest will surely force one more exchange. Thus the number of permutations in this case is sn1,k1 . (b) the last element n is not equal to n: in this case n is one of 1, 2, 3, . . . , (n 1). Then 1 2 . . . n1 should have produced exactly k exchanges because the element (being less than the maximum) will not be able to force an exchange. In this case the number of permutations is (n 1)sn1,k . Thus we have, sn,k = sn1,k1 + (n 1)sn1,k We introduce the following generating function Sn (x) for each n:
n

(2.23)

Sn (x) =
k=1

sn,k xk .

(2.24)

Multiplying both the sides of (2.23) by xk and summing over k from 1 through n and using the denition (2.12) we get, Sn (x) = xSn1 (x) + (n 1)Sn1 (x) = (x + n 1)Sn1 (x) (2.25)

Chapter 2 Combinatorics

110

From the denition (2.24) we nd S1 (x) = x. Then from (2.25) we get S2 (x) = x(x + 1) etc. In general we nd that the explicit form of Sn (x) is given by Sn (x) = From (2.24) we get,
n Sn (x) n n1 j =0

(x + j )

(2.26)

=
k=1

ksn,k x

k 1

which gives

Sn (1)

=
k=1

ksn,k

(2.27)

Also from (2.26) we get, Sn (1) = From (2.23) we have, 1 exch[X] = n!


n n1 j =0

(x + j ) = n!

(2.28)

ksn,k =
k=1

Sn (1) Sn (1)

(2.29)

The second equality above uses (2.27) and (2.29). The derivative of (2.26) after taking natural logarithm on both the sides gives,
Sn (x) 1 1 1 = + + + Sn (x) x (x + 1) (x + n 1)

Substituting x = 1, this yields, Sn (1)/Sn (1) = Hn , the nth Harmonic number

which is the value of exch[X]. Hence the average time FindMax takes under the permutation model is
G tmaxfAV = c0 + c1 n + c2 Hn . n

Exercises
1. Let A = {a1 , a2 , a3 , a4 , a5 } be a set of ve integers. Show that for any permutation a1 , a2 , a3 , a4 , a5 of A, the product, (a1 a1 )(a2 a2 ) (a5 a5 )

Chapter 2 Combinatorics is always divisible by 2.

111

2. This problem concerns one instance of what is known as Langfords Problem. A 27-digit sequence includes the digits 1 through 9, three times each. There is just one digit between the rst two 1s and between the last two 1s. There are just two digits between the rst two 2s and between the last two 2s and so on. The problem asks to nd all such sequences. The na ve method is: Step 1. Generate the rst/next sequence of 27-digits using the digits 1 through 9, each digit occuring exactly three times. Step 2. Check if the current sequence satises the given constraint. Step 3. Output the current sequence as a solution if it satises the constraint. Step 4. If all sequences have been generated then stop; else go to Step 1. How many sequences are examined in the above method? Bonus: How is it possible to do better? 3. How many ways are there to choose three or more people from a set of eleven people? 4. Three boys and four girls are to sit on a bench. The boys must sit together and the girls must sit together. In how many ways can this be done? 5. An urn contains 5 red marbles, 2 blue marbles and 5 green marbles. Assume that marbles of the same color are indistinguishable. How

Chapter 2 Combinatorics many dierent sequences of marbles of length four can be chosen?

112

6. Let X = {1, 2, 3, 4, . . . , (2n 1)} and let |X | (n + 1). Prove the following result (due to P. Erd os): There are two numbers a, b X , with a < b such that a divides b. In the above problem, if we prescribe |X | = n, will the above result be still true? 7. Find the approximate value (correct to three decimal places) of (1.02)10 . (Hint: Use the Binomial Theorem.) 8. Without using the general formula for (n) above, reason that when n = p ( is a natural number) is a prime power, (n) can be expressed as p (1 1/p). 9. If m and n are relatively prime, then argue that (mn) = (m)(n). 10. For an arbitrary natural number n prove that, is over all natural numbers dividing n). 11. If p is a prime and n is a natural number, then argue that, (pn) = p(n) if p divides n and (pn) = (p 1) (n), otherwise. 12. Argue that the number of partitions of a number n into exactly m terms is equal to the number of partitions of n m into no more than m terms. 13. Find the number of ways in which eight rooks may be placed on a conventional 8 8 chessboard so that no rook can attack another and the white diagonal is free of rooks.
d|n

(d) = n (the sum

Chapter 2 Combinatorics

113

14. What is the number An of ways of going up a staircase with n steps if we are allowed to take one or two steps at a time? 15. Consider the evaluation of n n determinant by the usual method of expansion by cofactors. If fn is the number of multiplications required then argue that fn satises the recurreence fn = n(fn1 + 1), n > 1, f1 = 0

Solve the above recurrence and hence show that fn en!, for all n > 1. 16. Without using induction prove that
n

i=1

1 i(i + 1)(i + 2) = n(n + 1)(n + 2)(n + 3) 4

17. This problem is due to H. Larson (1977). Consider the innite sequence an with a1 = 1, a5 = 5, a12 = 144 and an + an+3 = 2an+2 . Prove that an is the nth Fibonacci number. 18. Solve the recurrence relation an = 7an1 13an2 3an3 + 18an4 where, a0 = 5, a1 = 3, a2 = 6, a3 = 21. [Solution: Characteristic equation is (x + 1)(x 2)(x 3)2 , an = 2(1)n + 2n + 2 3n n3n , n 0.] 19. Consider the following recurrence: an 6an1 7an2 = 0, n5

with a3 = 344, a4 = 2400. Show that an = 7n + (1)n+1 , n 3.

Chapter 2 Combinatorics 20. Consider the recurrence relation an = 6an1 9an2 ,

114

with a0 = 2 and a1 = 3. Show that the associated generating function g (x) is given by g (x) = Hence obtain an . Solution: an = (2 n) 3n , n 0. 21. Let N be the number of strings of length n made up of the letters x, y and z, where z occurs an even number of times.
1 n Show that N = 2 (3 + 1).

2 9x . 1 6x + 9x2

22. How many sequences of length n can be composed from a, b, c, d in such a way that a and b are never neighboring elements? (Hint: Let xn = number of such sequences that start from a or b; let yn = number of such sequences that start from c or d; form two recurrences involving xn and yn .)

Chapter 3 Basics of Number Theory


3.1 Introduction

In this chapter, we present some basics of number theory. These include divisibility, primes, congruences, some number-theoretic functions and the Euclidean algorithm for nding the gcd of two numbers. We also explain the big O notation and polynomialtime algorithms. We show, as examples, that the Euclidean algorithm and the modular exponentiation algorithm are polynomialtime algorithms. We denote by Z, the set of integers and N, the set of positive numbers.

3.2

Divisibility

Denition 3.2.1: Let a and b be any two integers and a = 0. Then b is divisible by a (equivalently, a is a divisor of b) if there exists an integer c such that b = ac.

115

Chapter 3 Basics of Number Theory

116

If a divides b, it is denoted by a | b. If a does not divide b, we denote it by a b. Theorem 3.2.2: (i) a | b implies that a | bc for any integer c. (ii) If a | b and b | c, then a | c. (iii) If a divides b1 , b2 , . . . , bn , then a divides b1 x1 + + bn xn for integers x1 , . . . , xn . (iv) a | b and b | a imply that a = b. The proofs of these results are trivial and are left as exercises. (For instance, to prove (iii), we note that if a divides b1 , . . . , bn , there exist integers c1 , c2 , . . . , cn such that b1 = ac1 , b2 = ac2 , . . . , bn = acn . Hence b1 x1 + + bn xn = a(c1 x1 + + cn xn ), and so a divides b1 x1 + bn xn ). Theorem 3.2.3 (The Division Algorithm): Given any integers a and b with a = 0, there exist unique integers q and r such that b = qa + r, 0 r < |a|

Proof. Consider the arithmetic progression . . . b 3|a|, b 2|a|, b |a|, b, b + |a|, b + 2|a|, b + 3|a|, . . . with common dierence |a| and extended innitely in both the directions. Certainly, this sequence contains a least non-negative integer r. Let this term be b + q |a|, q Z. Thus b + q |a| = r (3.1)

Chapter 3 Basics of Number Theory

117

and, its previous term, namely, r |a| = 0 so that r < |a|. If a > 0, then (3.1) gives b = qa + r, 0 r < |a| = a,

while if a < 0, |a| = a so that b = qa + r, 0 r < |a|. It is clear that the numbers q and r are unique. Theorem 3.2.3 gives the division algorithm, that is, the process by means of which division of one integer by a nonzero integer is made. An algorithm is a step by step procedure to solve a given mathematical problem in nite time. We next present Euclids algorithm to determine the gcd of two numbers a and b. Euclids algorithm (Euclid B.C. ) is the rst known algorithm in the mathematical literature. It is just the usual algorithm taught in High School Algebra.

3.3

The Greatest Common Divisor (gcd) and the Least Common Multiple (lcm) of two integers

Denition 3.3.1: Let a and b be two integers, at least one of which is not zero. A common divisor of a and b is an integer c(= 0) such that c | a and c | b. The greatest common divisor of a and b is the greatest of the common divisors of a and b. It is denoted by (a, b). If c divides a and b, then so does c. Hence (a, b) > 0 and is uniquely dened. Moreover, if c is a common divisor of a and b, that is, if c | a and

Chapter 3 Basics of Number Theory c | b, then a = a c and b = b c for integers a and b . Hence (a, b) = c(a , b )

118

so that c | (a, b). Thus any common divisor of a and b divides the gcd of a and b. Hence (a, b) is the least common divisor of a and b that is divisible by every common divisor of a and b. Moreover, (a, b) = (a, b). Proposition 3.3.2: If c | ab and (c, b) = 1, then c | a. Proof. By hypothesis c | ab. Trivially c | |ac. Hence c is a common divisor of ab and ac. Hence c is a divisor of (ab, ac) = a(b, c) = a, as (b, c) = 1. Denition 3.3.3: If a, b and c are nonzero integers and if a | c and b | c, then c is called a common multiple of a and b. The least common multiple (lcm) of a and b is the smallest of the positive common multiples of a and b and is denoted by [a, b]. As in the case of gcd, [a, b] = [a, b].

Euclids Algorithm
Since (a, b) = (a, b), we may assume without loss of generality that a > 0 and b > 0 and that a > b (If a = b, then (a, b) = (a, a) = a). By Division

Chapter 3 Basics of Number Theory Algorithm (Theorem 3.2.3), there exist integers q1 and r1 such that a = q 1 b + r1 , Next divide b by r1 and get b = q 2 r1 + r2 , Next divide r1 by r2 and get r1 = q 3 r2 + r3 , 0 r3 < r2 . 0 r2 < r1 . 0 r1 < b.

119

(3.2)

(3.3)

(3.4)

At the (i + 2)-th stage, we get the equation ri = qi+2 ri+1 + ri+2 , 0 ri+2 < ri+1 . (3.5)

Since the sequence of remainders r1 , r2 , . . . is strictly decreasing, this procedure must stop at some stage, say, rj 1 = qj +1 rj . Then (a, b) = rj . Proof. First we show that rj is a common divisor of a and b. To see this we observe from equation (3.6) that rj | rj 1 . Now the equation (3.5) for i = j 2 is rj 2 = q j rj 1 + rj . (3.7) (3.6)

Since rj | rj 1 , rj divides the expression on the right side of (3.7), and so rj | rj 2 . Going backward, we get successively that rj divides rj 1 | rj 2 , . . . , r1 , b and a. Thus rj | a and rj | b. Next, let c | a and c | b. Then from equation (3.2), c | r1 , and this when substituted in equation (3.3), gives c | r2 . Thus the successive equations of

Chapter 3 Basics of Number Theory

120

the algorithm yield c | a, c | b, c | r1 , c | r2 , . . . , c | rj . Thus any common divisor of a and b is a divisor of rj . Consequently, rj = (a, b).

Extended Euclidean Algorithm


Theorem 3.3.4: If rj = (a, b), then it is possible to nd integers x and y such that ax + by = rj Proof. The equations preceding (3.6) are rj 3 = q j 1 rj 2 + rj 1 , rj 2 = q j rj 1 + rj . and (3.9) (3.8)

Equation (3.9) expresses rj in terms of rj 1 and rj 2 while the equation preceding it expresses rj 1 in terms of rj 2 and rj 3 . Thus rj = rj 2 q j rj 1 = rj 2 qj (rj 3 qj 1 rj 2 ) = (1 + qj qj 1 ) rj 2 qj rj 3 . Thus we have expressed rj as a linear combination of rj 2 and rj 3 , the coefcients being integers. Working backward, we get rj as a linear combination of a, b with the coecients being integers. The process given in the proof of Theorem 3.8 is known as the Extended Euclidean Algorithm.

Chapter 3 Basics of Number Theory Corollary 3.3.5:

121

If (a, m) = 1, then there exists an integer u such that au 1(mod m) and any two such integers are congruent modulo m. Proof. By the Extended Euclidean Algorithm, there exist integers u and v such that au + mv = 1. This however means that au 1(mod m). The second part is trivial as (a, m) = 1. Example 3.3.6: Find integers x and y so that 120x + 70y = 1. We apply Euclids algorithm to a = 120 and b = 70. We have 120 = 1 70 + 50 70 = 1 50 + 20 50 = 2 20 + 10 20 = 2 10 Hence gcd(120, 70) = 10.

Now starting from the last but one equation and going backward, we get 10 = 50 2 20 = 50 2(70 1 50) = 3 50 2 70 = 3 (120 1 70) 2 70 = 3 120 5 70.

Therefore x = 3 and y = 5 full the requirement.

Chapter 3 Basics of Number Theory

122

3.4

Primes

Denition 3.4.1: An integer n > 1 is a prime if its only positive divisors are 1 and n. A natural number greater than 1 which is not a prime is a composite number. Naturally, 2 is the only even prime. 3, 5, 7, 11, 13, 17, . . . are all odd primes. The composite numbers are 4, 6, 8, 9, 10, . . . Theorem 3.4.2: Every integer n > 1 can be expressed as a product of primes, unless the number n itself is a prime. Proof. The result is obvious for n = 2, 3 and 4. So assume that n > 4 and apply induction. If n is a prime there is nothing to prove. If n is not a prime, then n = n1 n2 , where 1 < n1 < n and 1 < n2 < n. By induction hypothesis, both n1 and n2 are products of primes. Hence n itself is a product of primes. (Note that the prime factors of n need not all be distinct.) Suppose in a prime factorization of n, the distinct prime factors are p1 , p2 , . . . , pr , and that pi is repeated i times, 1 i r. Then
r 1 2 n = p 1 p2 pr .

(3.10)

We now show that this factorization is unique in the sense that in any prime factorization, the prime factors that occur are the same and that the prime powers are also the same except for the order of the prime factors. For instance, 200 = 23 52 , and the only other way to write it in the form (3.10) is 52 23 .

Chapter 3 Basics of Number Theory Theorem 3.4.3 (Unique factorization theorem): Every positive integer n > 1 can be expressed uniquely in the form
r 1 2 n = p 1 p2 pr

123

where pi , 1 i r, are distinct primes; the above factorization is unique except for the order of the primes. To prove Theorem 3.4.3, we need a property of primes. Lemma 3.4.4: If p is a prime such that p | (ab), but p a, then p | b. Proof. As p a, (p, a) = 1. Now apply Proposition 3.3.2. Note 3.4.5: Theorem 3.4.4 implies that if p is a prime and p a and p b, then p (ab). More generally, p a1 , p a2 , . . . p an imply that p (a1 a2 an ). Consequently if p | (a1 a2 an ), then p must divide at least one ai , 1 i n. Proof. (of Theorem 3.4.3) Suppose
1 2 j s r 1 2 n = p 1 p 2 p r = q1 q2 qj qs

(3.11)

are two prime factorizations of n, where the pi s and qi s are all primes. As
1 r r 1 . Hence by Note 3.4.5, p1 must divide some p1 | (p 1 pr ), p1 | q1 qs

qj . As p1 and qj are primes and p1 | qj , p1 = qj . Cancelling p1 on both the sides, we get


1 2 j 1 1 2 r p p2 p 1 r = q1 q2 qj 1 s . qs

(3.12)

Chapter 3 Basics of Number Theory

124

Now argue as before with p1 if 1 1 1. If 1 < j , this procedure will result in the relation
1 2 j r 2 p 2 p r = q1 q2 qj 1 s . qs

(3.13)

Now p1 divides the right hand expression of (3.13) and so must divide the left hand expression of (3.13). But this is impossible as the pi s are distinct
1 primes. Hence 1 = j . Cancellation of p 1 on both sides of (3.11) yields

1 2 j 1 j +1 s r 2 p 2 pr = q1 q2 qj 1 qj +1 qs .

Repetition of our earlier argument gives p2 = one of the qi s, 1 i s, say, qk , k = j . Hence 2 = k and so on. This shows that each pi = some qt and
1 2 t r i that i = t so that p i = qt . Cancellation of p1 followed by p2 , . . . , pr

on both sides will leave 1 on the left side expression of (3.11) and so the right side expression of (3.11) should also reduce to 1. The unique factorization of numbers enables us to compute the gcd and lcm of two numbers. Let a and b be any two integers 2. Let p1 , . . . , pr be the primes which divide at least one of a and b. Then a and b can be written uniquely in the form
r 1 2 a = p 1 p2 pr r 1 2 b = p 1 p2 pr ,

where i s and j s are nonnegative integers. (Taking i s and j s to be nonnegative integers in the prime factorizations of a and b, instead of taking them to be positive, enables us to use the same prime factors for both a and b. For instance, if a = 72 and b = 45, we can write a and b as, a = 23 32 50

Chapter 3 Basics of Number Theory and b = 20 32 51 . Then clearly,


r

125

(a, b) =
i=1 r

pi pi
i=1

min(i , i )

and

[a, b] =

max(i , i )

We next establish two important properties of prime numbers. Theorem 3.4.6 (Euclid): The number of primes is innite. Proof. The proof is by contradiction. Suppose there are only nitely many primes, say, p1 , p2 , . . . , pr . Then the number n = 1 + p1 p2 pr is larger than each pi , 1 i r, and hence composite. Now any composite number is divisible by some prime. But none of the primes pi , 1 i r, divides n. (For, if pi divides n, then pi | 1, an impossibility). Hence the number of primes is innite. Theorem 3.4.7 (Nagell): There are arbitrarily large gaps in the sequence of primes. In other words, for any positive integer k 2, there exist k consecutive composite numbers. Proof. The k numbers (k + 1)! + 2, (k + 1)! + 3, . . . , (k + 1)! + k, (k + 1)! + (k + 1) are consecutive. They are all composite since (k + 1)! + 2 is divisible by 2, (k + 1)! + 3 is divisible by 3 and so on. In general (k + 1)! + j is divisible by j for each j , 2 j k + 1.

Chapter 3 Basics of Number Theory Denition 3.4.8:

126

Two numbers a and b are coprime or relatively prime if they are prime to each other, that is, if (a, b) = 1.

3.5

Exercises

1. For any integer n, show that n2 n is divisible by 2, n3 n by 6 and n5 n by 30. 2. Show that (n, n + 1) = 1 and that [n, n + 1] = n(n + 1). 3. Use the unique factorization theorem to prove that for any two positive integers a and b, (a, b)[a, b] = ab, and that (a, b) | [a, b]. (Remark: This shows that if (a, b) = 1, then [a, b] = ab. More generally, if {a1 , a2 , . . . , ar }, is any set of positive integers, then (a1 , a2 , . . . , ar ) divides [a1 , a2 , . . . , ar ]. Here (a1 , a2 , . . . , ar ) and [a1 , a2 , . . . , ar ] denote respectively the gcd and lcm of the numbers a1 , a2 , . . . , ar . 4. Prove that no integers x, y exist satisfying x + y = 100 and (x, y ) = 3. Do x, y exist if x + y = 99, (x, y ) = 3 ? 5. Show that there exist innitely many pairs (x, y ) such that x + y = 72 and (x, y ) = 9. Hint: One choice for (x, y ) is (63, 9). Take x prime to 8, that is, (x , 8) = 1 and take y = 8 x . Now use the pairs (x , y ). 6. If a+b = c, show that (a, c) = 1, i (b, c) = 1. Hence show that any two consecutive Fibonacci numbers are coprime. (The Fibonacci numbers

Chapter 3 Basics of Number Theory

127

Fn are dened by the recursive relation Fn = Fn1 + Fn2 , where F0 = 1 = F1 . Hence the Fibonacci sequence is {1, 1, 2, 3, 5, 8, . . .}. 7. Find (i) gcd (2700, 15120). (ii) lcm [2700, 15120]. 8. Determine integers x and y so that (i) 180x + 72y = 36. (ii) 605x + 96y = 67. 9. For a positive integer n, show that there exist integers a and b such that n is a multiple of (a, b) = d and ab = n i d2 | n. (Hint: By Exercise 3 above, (a, b)[a, b] = ab = n and that (a, b) | [a, b]. Hence d2 | n. Conversely, if d2 | n, n = d2 c. Now take d = a and dc = b.) 10. Prove that amn 1 is divisible by am 1, when m and n are positive integers. Hence show that if an 1 is prime, a 2, then n must be prime. 11. Prove that
2N 1 n=1

1 is not an integer for n > 1. 2j 1

12. Let pn denote the n-th prime (p1 = 2, p2 = 3 and so on). Prove that pn > 22 . 13. Prove that if an = 22 + 1, then (an , an + 1) = 1 for each n 1. (Hint: Set 22 = x).
ar 1 14. If n = pa 1 pr is the prime factorization of n, show that d(n), the
n n n

number of distinct divisors of n is (a1 + 1) (ar + 1).

Chapter 3 Basics of Number Theory

128

3.6

Congruences

A congruence is a division with reference to a number or a function. The congruence relation has a notational convenience that could be employed in making addition, subtraction, multiplication by constants and division in some special cases. Denition 3.6.1: Given integers a, b and n (= 0), a is said to be congruent to b modulo n, if a b is divisible by n, that is, a b is a multiple of n. In symbols, it is denoted by a b (mod n), and is read as a is congruent to b modulo n. The number n is the modulus of the congruence. Denition 3.6.2: If f (x), g (x) and h(x) (= 0) are any three polynomials with real coecients then by f (x) g (x) (mod h(x)), we mean that f (x) g (x) is divisible by h(x) over R, that is to say, there exists a polynomial q (x) with real coecients such that f (x) g (x) = q (x)h(x). The congruence given in Denition 3.6.1 is numerical congruence while that given in Denition 3.6.2 is polynomial congruence . We now concentrate on numerical congruence. Trivially, a b (mod m), i a b mod (m) . Hence we assume without loss of generality that the modulus of any numerical congruence is a positive integer. Proposition 3.6.3: 1. a b (mod m) i b a (mod m).

Chapter 3 Basics of Number Theory 2. If a b(mod m), and b c(mod m), then a c(mod m).

129

3. If a b (mod m), then for any integer k , ka kb (mod m). In particular, taking k = 1, we have a b (mod m), whenever a b ( mod m). 4. If a b (mod m), and c d (mod m), then (a) a c b d (mod m), and (b) ac bd (mod m). Proof. We prove only 3 and 4; the rest follow immediately from the denition.

3. If a b (mod m), then a b is divisible by m, and hence so is k (a b) = ka kb. Thus ka kb (mod m). 4. If a b and c d are multiples of m, say, a b = km and c d = k m for integers k and k , then (a + c) (b + d) = (a b) + (c d) = km + k m = (k + k )m, a multiple of m. This means that a + c b + d ( mod m). Similarly a c b d (mod m). Next, if a b (mod m), then by (3), ac bc (mod m). But then c d (mod m) gives bc bd ( mod m). Hence ac bd (mod m). Proposition 3.6.4: If ab ac (mod m), and (a, m) = 1, then b c (mod m). Proof. ab ac (mod m) gives that a(b c) = km for some integer k . As a | a(b c), a | km. But (a, m) = 1. Hence a | k so that k = ak , k Z, and

Chapter 3 Basics of Number Theory

130

a(b c) = km = ak m. This however gives that b c = k m, and therefore b c (mod m). Corollary 3.6.5: If ab ac (mod m), then b c (mod
m ), d

where d = (a, m).

Proof. As d = (a, m), d | a and d | m. Therefore, a = da and m = dm , where (a , m ) = 1. Then ab ac (mod m) gives that da b da c ( mod m), that is da b da c ( mod dm ) and therefore a b a c(mod m ). But (a , m ) = 1. Hence b c. (mod m ). Proposition 3.6.6: If (a, m) = (b, m) = 1, then (ab, m) = 1. Proof. Suppose (ab, m) = d > 1 and p is a prime divisor of d. Then p | m and p | ab. But p | ab means that p | a or p | b as p is prime. If p | a, then (a, m) p > 1, while if p | b, (b, m) p > 1. This shows that (ab, m) = 1. Proposition 3.6.7: If ax 1 (mod m) and (a, m) = 1, then (x, m) = 1. Proof. Suppose (x, m) = d > 1. Let p be a prime divisor of d. Then p | x and p | m. This however means, since ax 1 = km for some k Z, that p | 1, a contradiction. (3.14) (3.15)

Chapter 3 Basics of Number Theory Proposition 3.6.8:

131

If a b (mod mi ), 1 i r, then a b (mod [m1 , . . . , mr ]), where [m1 , . . . , mr ] stands for the lcm of m1 , . . . , mr . Proof. The hypothesis implies that a b is a common multiple of m1 , . . . , mr and hence it is a multiple of the least common multiple of m1 , . . . mr . (Because if a b = i mi , 1 i r, and mi has the prime factorization mi = p1 1 p2 2 pt t , 1 i r, then pj j | mi for each j , in 1 j t and
i for each i, 1 i r. (Here the exponents j are nonnegative integers. This i i i i

enables us to take the same prime factors p1 , . . . , pt for all the mi s. See Section 3.4.) Hence mi is divisible by pj
maxi i j

for each j in 1 j t.

Thus each of these t numbers divides a b and they are pairwise coprime. Hence a b is divisible by their product. But their product is precisely the lcm of m1 , . . . , mr . (See Section 3.4.)

3.7

Complete System of Residues

A number b is called a residue of a number a modulo m if a b (mod m). Obviously, b is a residue of b modulo m. Denition 3.7.1: Given a positive integer m, a set S = {x1 , . . . , xm } of m numbers is called a complete system of residues modulo m if for any integer x, there exists a unique xi S such that x xi (mod m). We note that no two numbers xi and xj , i = j , in S are congruent modulo m. For if xi xj (mod m), then since xi xi (mod m) trivially, we have

Chapter 3 Basics of Number Theory

132

a contradiction to the fact that S is a complete residue system modulo m. Conversely, it is easy to show that any set of m numbers no two of which are congruent modulo m forms a complete residue system modulo m. In particular, the set {0, 1, 2, . . . , m 1} is a complete residue system modulo m. Next, suppose that (x, m) = 1 and x xi (mod m). Then xi is also prime to m. (If xi and m have a common factor p > 1 then p | x as x xi = km, k Z. This however means that (x, m) p > 1, a contradiction to our assumption.) Thus if S = {x1 , . . . , xm } is a complete system of residues modulo m and x xi (mod m), then (x, m) = 1 i (xi , m) = 1. Then deleting the numbers xj of S that are not coprime to m, we get a subset S of S consisting of a set of residues modulo m each of which is relatively prime to m. Such a system is called a reduced system of residues modulo m. For instance, taking m = 10, {0, 1, 2, . . . . , 9} is a a complete system of residues modulo 10, while S = {1, 3, 7, 9} is a reduced system of residue modulo 10. The numbers in S are all the numbers that are less than 10 and prime to 10.

Eulers -Function
Denition 3.7.2: The Euler function (n) (also called the totient function) is dened to be the number of positive integers less than n and prime to n. It is also the cardinality of a reduced residue system modulo n. We have seen earlier that (10) = 4. We note that (12) is also equal to 4 since 1, 5, 7, 11 are all the numbers less than 12 and prime to 12. If p is a prime, then all the numbers in {1, 2, . . . , p 1} are less than p and prime to

Chapter 3 Basics of Number Theory

133

p and so (p) = p 1. We now present Eulers theorem on the -function. Theorem 3.7.3 (Euler): If (a, n) = 1, then a(n) ( mod n). Proof. Let r1 , . . . , r(n) be a reduced residue system modulo n. Now

(ri , n) = 1 for each i, 1 i (n). Further, as (a, n) = 1, by Proposition 3.6.6, (ari , n) = 1. Moreover, if i = j , ari arj (mod n). For, ari arj ( mod n) implies (as (a, n) = 1), by virtue of Proposition 3.6.4, that ri rj ( mod n), a contradiction to the fact that r1 , . . . , r(n) is a reduced residue system modulo n. Hence ari , . . . , ar(n) is also a reduced residue system modulo n and
(n)

i=1

This gives that a(n)

(n) i=1 ri

(ari )

(n)

j =1

(n) j =1 rj

rj ( mod n).
(n) i=1 ri ,

(mod n). Further (ri , n) = 1 for n = 1 by Proposition 3.6.6.

each i = 1, 2, . . . , (n) gives that Consequently, by Proposition 3.6.4,

a(n) 1 ( mod n). Corollary 3.7.4 (Fermats Little Theorem): If n is a prime and (a, n) = 1, then an1 1 (mod n). Proof. If n is a prime, then (n) = n 1. Now apply Eulers theorem (Theorem 3.7.3). We see more properties of the Euler function (n) in Section 3.11. Another interesting theorem in elementary number theory is Wilsons theorem.

Chapter 3 Basics of Number Theory Theorem 3.7.5:

134

If u [1, m 1] is a solution of the congruence ax 1 (mod m), then all the solutions of the congruence are given by u + km, k Z. In particular, there exists a unique u [1, m 1] such that au 1(mod m). Proof. Clearly, u+km, k Z, is a solution of the congruence ax 1 (mod m) because a(u + km) = au + akm au 1 (mod m). Conversely, let ax0 1 ( mod m). Then a(x0 u) 0 (mod m), and therefore by Proposition 3.6.7, (x0 u) 0 (mod m) as (a, m) = 1 in view of au 1 (mod m). Hence x0 = u + km for some k Z. The proof of the latter part is trivial. Theorem 3.7.6 (Wilsons theorem): If p is a prime, then (p 1)! 1 ( mod p). Proof. The result is trivially true if p = 2 or 3. So let us assume that prime p 5. We look at (p 1)! = 1 2 (p 1). Now 1 1 ( mod p), p 1 1 ( mod p). Hence it is enough if we prove that 2 3 (p 2) 1 ( mod p), since the multiplication of the three congruences (See Proposition 3.6.3) will yield the required result. Now, as p ( 5) is an odd prime, the cardinality of L is even, where L = {2, 3, . . . , p 2}. For each i L, by virtue of Corollary 3.3.5, there and

Chapter 3 Basics of Number Theory

135

exists a unique j , 1 j p 1 such that ij 1 (mod p). Now j = 1, and j = p 1. If j = p 1, then ij = i(p 1) i (mod p) and therefore i 1 (mod p). This means that p | (i + 1). This is not possible as i L . Also ij = ji. Moreover j = i since j = 1 implies that i2 1 (mod p) and therefore p | (i 1) or p | (i + 1). However this is not possible as this will imply that i = 1 or (p 1). Thus each i L can be paired o with a unique p3 j L such that ij 1 (mod p). In this way we get congruences. 2 p3 Multiplying these congruences, we get 2 2 3 (p 2) 1 1 1 ( mod p) Example 3.7.7: As an application of Wilsons theorem, we prove that 712!+1 0 (mod 719). Proof. Since 719 is a prime, Wilsons theorem implies that 718! + 1 0 ( mod 719). We now rewrite 718! in terms of 712 as 718! = (712)! 713 714 715 716 717 718 = 712!(719 6)(719 5) (719 1) = 712! (M (719) + 6!) . M (719) stands for a multiple of 719. 712! 6! ( mod 719) 712! 720 ( mod 719) 712! (719 + 1) ( mod 719) (712! 719) + 712! ( mod 719) 712! ( mod 719).

Chapter 3 Basics of Number Theory Thus 712! + 1 718! + 1 0 (mod 719)

136

If a b (mod m), then by Proposition 3.6.3, a2 b2 (mod m), and so a3 b3 (mod m) and so on. In general, ar br (mod m) for every positive integer r and hence, again by Proposition 3.6.3, tar tbr for every integer t. In particular, if f (x) = a0 + a1 x + + an xn is any polynomial in x with integer coecients, then f (a) = a0 + a1 a + a2 a2 + + an an a0 + a1 b + a2 b2 + + an bn = f (b)(mod m). We state this result as a theorem. Theorem 3.7.8: If f (x) is a polynomial with integer coecients, and a b(mod m), then f (a) f (b) (mod m).

3.8

Linear Congruences and Chinese Remainder Theorem

Let f (x) be a polynomial with integer coecients. By a solution of the polynomial congruence f (x) 0 ( mod m), (3.16)

we mean an integer x0 with f (x0 ) 0 (mod m). If x0 y0 (mod m), by Theorem 3.7.8, f (x0 ) f (y0 ) (mod m), and hence y0 is also a solution of the congruence (3.16). Hence, when we speak of all the solutions of (3.16), we consider congruent solutions as forming a class. Hence by the number of solutions of a polynomial congruence, we mean the number of distinct congruence classes of solutions. Equivalently, it is the number of incongruent

Chapter 3 Basics of Number Theory

137

solutions modulo m of the congruence. Since any set of incongruent numbers modulo m is of cardinality at most m, the number of solutions of any polynomial congruence modulo m is at most m. The congruence (3.16) is linear if f (x) is a linear polynomial. Hence a linear congruence is of the form ax b ( mod m) (3.17)

It is not always necessary that a congruence modulo m has m solutions. In fact, a congruence modulo m may have no solution or less than m solutions. For instance, the congruence 2x 1 (mod 6) has no solution since 2x 1 is an odd integer and hence cannot be a multiple of 6. The congruence x3 1 ( mod 7) has exactly 3 solutions given by x 1, 2, 4 (mod 7). Theorem 3.8.1: Let (a, m) = 1 and b an integer. Then the linear congruence ax b ( mod m) has exactly one solution. Proof. (See also Corollory 3.3.5.) The numbers 1, 2, . . . , m form a complete residue system modulo m. Hence, as (a, m) = 1, the numbers a 1, a 2, , a m also form a complete residue system modulo m. Now any integer is congruent modulo m to a unique integer in a complete residue system modulo m (by denition of a complete residue system). Hence b is congruent modulo m to a unique a i, 1 i m. Thus there exists a unique x {1, 2, . . . m} such that ax b ( mod m) (3.18)

Chapter 3 Basics of Number Theory

138

If (a, m) = 1, taking b = 1 in Theorem 3.8.1, we see that there exists a unique x in {1, 2, . . . m} such that ax 1 ( mod m) This unique x is called the reciprocal of a modulo m. We have seen in Theorem 3.8.1 that if (a, m) = 1, the congruence has exactly one solution. What happens if (a, m) = d? Theorem 3.8.2: Let (a, m) = d. Then the congruence ax b ( mod m) (3.19)

has a solution i d | b. If d | b, the congruence has exactly d solutions. The d solutions are given by x0 , x0 + m/d, x0 + 2m/d, . . . , x0 + (d 1)m/d, where x0 is the unique solution in {1, 2, . . . , m/d} of the congruence a b x d d mod m . d (3.20)

Proof. Suppose x0 is a solution of the congruence (3.19). Then ax0 = b + km, k Z. As d | a and d | m, d | b. Conversely, if d | b, let b = db0 . Further, if a = da0 and m = dm0 . Then the congruence (3.19) becomes a0 dx db0 ( mod dm0 ) and therefore a0 x b0 ( mod m0 ) (3.21)

Chapter 3 Basics of Number Theory

139

where (a0 , m0 ) = 1. But the latter has a unique solution x0 {1, 2, . . . m0 = m/d}. Hence a0 x0 b0 (mod m0 ). So x0 is also a solution of (3.19). Assume now that d | b. Let y be any solution of (3.19). Then ay b (mod m). Also ax0 b (mod m). Hence ay ax0 (mod m) so that a0 dy a0 dx0 (mod (m/d)d). Hence a0 y a0 x0 (mod (m/d)). As d = (a, m), (a0 , m/d) = 1. So by Proposition 3.6.4, y x0 (mod (m/d)) and so y = x0 + k (m/d) for some integer k . But k r (mod d) for some r, 0 r < d. This gives k m rm (mod m). Thus x0 + k (m/d) x0 + r(m/d) ( d d mod m), 0 r < d and so y is congruent modulo m to one of the numbers in (3.20).

Chinese Remainder Theorem


Suppose there are more than one linear congruences. In general, they need not possess a common solution. (In fact, as seen earlier, even a single linear congruence may not have a solution.) The Chinese Remainder Theorem ensures that if the moduli of the linear congruences are pairwise coprime, then the simultaneous congruences all have a common solution. To start with, consider congruences of the form x bi (mod mi ). Theorem 3.8.3: Let m1 , . . . , mr be positive integers that are pairwise coprime, that is, (mi , mj ) = 1 whenever i = j . Let b1 , . . . , br be arbitrary integers. Then the system of congruences x b1 ( mod m1 ) . . .

Chapter 3 Basics of Number Theory x br ( mod mr ) has exactly one solution modulo M = m1 mr . Proof. Let Mi = M/mi , 1 i r. Then, by hypothesis,
r

140

(Mi , mi ) =
k=1 k =i

mk , mi = 1.

Hence each Mi has a unique reciprocal Mi modulo mi , 1 i r. Let


x = b 1 M1 M1 + + b r Mr M r .

(3.22)

Now mi divides each Mj , j = i. Hence, taking modulo mi on both sides of (3.22), we get x bi Mi Mi ( mod mi ) bi ( mod mi ) as Mi Mi 1 ( mod mi ). Hence x is a common solution of all the r congruences. We now show that x is unique modulo M . In fact, if y is another common solution, we have y bi ( mod mi ), and, therefore, x y ( mod mi ), 1 i r. 1 i r,

This means, as the mi s are pairwise coprime, that x y ( mod M ). We now present the general form of the Chinese Remainder Theorem.

Chapter 3 Basics of Number Theory Theorem 3.8.4 (Chinese Remainder Theorem):

141

Let m1 , . . . , mr be positive integers that are pairwise coprime. Let b1 , . . . , br be arbitrary integers and let integers a1 , . . . , ar satisfy (ai , mi ) = 1, 1 i r. Then the system of congruences a1 x b1 ( mod m1 ) . . .

ar x br ( mod mr ) has exactly one solution modulo M = m1 m2 mr . Proof. As (ai , mi ) = 1, ai has a unique reciprocal ai modulo mi so that ai ai 1 (mod mi ). Then the congruence ai x bi (mod mi ) is equivalent to ai ai x ai bi (mod mi ), that is, to x ai bi (mod mi ), 1 i r. By Theorem 3.8.3, these congruences have a common unique solution x modulo M = m1 mr . Because of the equivalence of the two sets of congruences, x is a common solution to the given set of r congruences as well.

3.9

Lattice Points Visible from the Origin

A lattice point of the plane is a point both of whose cartesian coordinates (with reference to a pair of rectangular axes) are integers. For example, (2, 3) is a lattice point while (2.5, 3) is not. A lattice point (a, b) is said to be visible from another lattice point (a , b ) if the line segment joining (a , b ) with (a, b) contains no other lattice point. In other words, it means that there is no lattice point that obstructs the view of (a, b) from (a , b ). It is clear that (1, 0) and (0, 1) are the only lattice points on the coordinate

Chapter 3 Basics of Number Theory

142

axes visible from the origin. Further, the point (2, 3) is visible from the origin, but (2, 2) is not (See Figure 3.1). Hence we consider here lattice points (a, b) not on the coordinate axes but visible from the origin. Without loss of generality, we may assume that a 1 and b 1. y y (2,3) (2,2) (1,1) O x O Figure 3.2:

(a, b) (a , b ) x

Figure 3.1: Lattice points visible from the origin

Lemma 3.9.1: The lattice point (a, b) (not belonging to any of the coordinate axes) is visible from the origin i (a, b) = 1. Proof. As mentioned earlier, assume without loss of generality, that a 1 and b 1. Similar argument applies in the other cases. Suppose (a, b) = 1. Then (a, b) must be visible from the origin. If not, there exists a lattice point (a , b ) with a < a and b < b in the segment b b joining (0, 0) with (a, b). (See Figure 3.2) Then = (= slope of the line a a joining (0, 0) with (a, b)) so that ba = b a. Now a | b a and so a | ba . But (a, b) = 1 and hence by Proposition 3.3.2, a | a . But this is a contradiction since a < a. Next assume that (a, b) = d > 1. Then a = da , b = db for positive

Chapter 3 Basics of Number Theory

143

integers a , b . Then the lattice point (a , b ) lies on the segment joining (0, 0) with (a, b), and since a < a and b < b, (a, b) is not visible from the origin. Corollary 3.9.2: The lattice point (a, b) is visible from the lattice point (c, d) i (a c, b d) = 1. Proof. Shift the origin to (c, d) through parallel axes. Then, the new origin is (c, d) and the new coordinates of the original point (a, b) with respect to the new axes are (a c, b d). Now apply Lemma 3.9.1. We now give an application of the Chinese remainder theorem to the set of lattice points visible from the origin. Theorem 3.9.3: The set of lattice points visible from the origin contains arbitrarily large square gaps. That is, given any positive integer k , there exists a lattice point (a, b) such that none of the lattice points (a + r, b + s), is visible from the origin. 1 r k, 1 s k,

Proof. Let {p1 , p2 , . . .} be the sequence of primes. Given the positive integer k , construct a k by k matrix M whose rst row is the sequence of rst k primes p1 , p2 , . . . , pk , the second row is the sequence of next k primes,

Chapter 3 Basics of Number Theory y (a + t, b + k ) (a + k, b + k )

144

(a + r, b + s) (a, b)

(a + k, b + t) x

Figure 3.3: Lattice points visible from (a, b) namely pk+1 , . . . , p2k , and so on. p p2 1 M : pk+1 pk+2 . . . . . .

. . . ps . . . pk . . . p k +s ... . . .

. . . p2k . . .

Let mi (resp. Mi ) be the product of the k primes in the i-th row (resp. column) of M . Then for i = j , (mi , mj ) = 1 and (Mi , Mj ) = 1 because in the products mi and mj (resp. Mi and Mj ), there is no repetition of any prime. Now by Chinese Remainder Theorem, the set of congruences x 1 ( mod m1 ) x 2 ( mod m2 ) . . .

x k ( mod mk )

has a unique common solution a modulo m1 mk . Similarly, the system y 1 ( mod M1 ) y 2 ( mod M2 )

Chapter 3 Basics of Number Theory . . . y k ( mod Mk )

145

has a unique common solution b modulo M1 Mk . Then a r (mod mr ), and b s (mod Ms ), 1 r, s k . Hence a + r is divisible by the product of all the primes in the r-th row of M , and similarly b + s is divisible by the product of all the primes in the s-th column of M . Hence the prime common to the r-th row and s-th column of M divides both a + r and b + s. In other words (a + r, b + s) = 1. So by Lemma 3.9.1, the lattice point (a + r, b + s) is not visible from the origin. Now any lattice point inside the square is of the form (a + r, b + s), 0 < r < k , 0 < s < k . (For 1 r k 1, 1 s k 1, we get lattice points inside the square while for r = k , or s = k , we get lattice points on the boundary of the square.)

3.10

Exercises

1. If a is prime to m, show that 0 a, 1 a, 2 a, . . . , (m 1) a form a complete system of residues mod m. Hence show that for any integer b and for (a, m) = 1, the set {b, b + a, b + 2a, . . . , b + (m 1)a} forms a complete system of residues modulo m. n(n). 2. Show that the sum of the numbers less than n and prime to n is 1 2 3. Prove that 18! + 1 is divisible by 437. 4. If p and q are distinct primes, show that pq1 + q p1 1 is divisible by pq .

Chapter 3 Basics of Number Theory

146

5. If p is a prime, show that 2(p 3)! + 1 0 (mod p). [Hint: (p 1)! = (p 3)!(p 2)(p 1) 2(p 3)!]. 6. Solve: 5x 2 (mod 6). [Hint: See Theorem 3.8.1]. 7. Solve: 3x 2 (mod 6). [Hint: Apply Theorem 3.8.2]. 8. Solve: 5x 10 (mod 715). [Hint: Apply Chinese Remainder Theorem]. 9. Solve the simultaneous congruences: (i) x 1 (mod 3), x 2 (mod 4); 2x 1 (mod 5). (ii) 2x 1 (mod 3), 3x 1 (mod 4); x 2 (mod 5). 10. Prove the converse of Wilsons theorem, namely, if (n 1)! + 1 0 ( mod n), then n is prime. [Hint: Prove by contradiction].

3.11

Some Arithmetical Functions

Arithmetical functions are real or complex valued functions dened on N, the set of positive integers. We have already come across the arithmetical function (n), the Eulers totient function. In this section, we look at the basic properties of the arithmetical functions (n), the M obius function (n) and the divisor function d(n). Denition 3.11.1: An arithmetical function or a number-theoretical function is a function whose domain is the set of natural numbers and codomain is the set of real or complex numbers.

Chapter 3 Basics of Number Theory

147

The M obius Function (n)


Denition 3.11.2: The M obius function (n) is dened as follows: (1) = 1;
ar 1 If n > 1, and n = pa 1 pr is the prime factorization of n,

For instance, if n = 5 11 13 = 715, a product of three distinct primes, (n) = (1)3 = 1, while if n = 52 11 13 or n = 73 13, (n) = 0. Most of these arithmetical functions have nice relations connecting n and the divisors of n. Theorem 3.11.3: If n 1, we have (d) =
d|n

(1)r if a1 = a2 = = ar = 1 (n) = (that is, if n is a product of r distinct primes) 0 otherwise (that is, n has a square factor > 1).

(Recall that for any real number x, x stands for the oor of x, that is, the greatest integer not greater than x. (For example, 15 = 7). ) 2 Proof. If n = 1, (1) is, by denition, equal to 1 and hence the relation (3.23) is valid. Now assume that n > 1 and that p1 , . . . , pr are the distinct
ar 1 prime factors of n. Then any divisor of n is of the form pa 1 pr , where

1 if n = 1 1 = n 0 if n > 1.

(3.23)

Chapter 3 Basics of Number Theory

148

each ai 0 and hence the divisors of n for which the -function has nonzero
r 1 values are the numbers in the set {p 1 pr : i = 0 or 1, 1 i r } =

{1; p1 , . . . , pr ; p1 p2 , p1 p3 , . . . , pr1 pr ; . . . ; p1 p2 pr }. Now (1) = 1; (pi ) = (1)1 = 1; (pi pj ) = (1)2 = 1; (pi pj pk ) = (1)3 = 1 and so on. Further the number of terms of the form pi is so on. Hence if n > 1, (d) = 1 n n n + (1)r + r 2 1
n 1

, of the form pi pj is

n 2

and

d|n

= (1 1)r = 0

A relation connecting and


Theorem 3.11.4: If n 1, we have (n) =
d|n

(d)

n d

(3.24)

Proof. If (n, k ) = 1, then

1 (n, k)

1 1

= 1, while if (n, k ) > 1,

1 (n, k)

a positive number less than 1 = 0. Hence


n

(n) =
k=1

1 . (n, k )

Now replacing n by (n, k ) in Theorem 3.11.3, we get (d) =


d|(n, k)

1 . (n, k )

Hence
n

(n) =
k=1 d|(n, k)

(d)

Chapter 3 Basics of Number Theory


n

149 (3.25)

=
k=1
d|n

(d).
d|k

For a xed divisor d if n, we must sum over all those k in the range 1 k n which are multiples of d. Hence if we take k = qd, then 1 q n/d. Therefore (3.25) reduces to
n/d

(n) =
d|n q =1

(d)
n/d

=
d|n

(d)
q =1

1=
d|n

(d)

n d

Theorem 3.11.5: If n 1, we have


d|n

(d) = n.

Proof. For each divisor d of n, let A(d) denote those numbers k , 1 k n, such that (k, n) = d. Clearly, the sets A(d) are pairwise disjoint and their union is the set {1, 2, . . . , n}. (For example, if n = 6, d = 1, 2, 3 and 6. Moreover A(1) = {k : (k, n) = 1} = {set of numbers n and prime to n} = {1, 5}. Similarly, A(2) = {2, 4}, A(3) = {3}, A(4) = = A5 , and A6 = {6}. Clearly, the sets A(1) to A(6) are pairwise disjoint and their union is the set {1, 2, 3, 4, 5, 6}). Then if |A(d)| denotes the cardinality of the set A(d),
d|n

| A(d) |= n

(3.26)

But (k, n) = d i

k n = 1 and 0 < k n, that is, i 0 < k n . Hence , d d d d , there is a 1 1 correspondence between the elements in A(d) if we set q = k d

, where (q, n/d) = 1. The number of and those integers q satisfying 0 < q n d such q s is (n/d). [Note: If q = n/d, then, (q, n/d) = (n/d, n/d) = n/d = 1

Chapter 3 Basics of Number Theory i d = n. In this case, q = 1 = (1) = (3.26) becomes (n/d) = n.
d|n

150 n . Thus | A(d) |= (n/d) and d

But this is equivalent to of n, so does n/d.

d|n

(d) = n since as d runs through all the divisors

As an example, take n = 12. Then d runs through 1, 2, 3, 4, 6 and 12. Now (1) = (2) = 1, (3) = (4) = (6) = 2, and (12) = 4 and
d|n

(d) =

(1) + (2) + (3) + (4) + (6) + (12) = (1 + 1) + (2 + 2 + 2) + 4 = 12 = n.

A Product Formula for (n)


We now present the well-known product formula for (n) expressing it as a product extended over all the distinct prime factors of n. Theorem 3.11.6: For n 2, we have (n) = n
p|n

1 p

(3.27)

p=a prime

Proof. We use the formula (n) =

d|n

(d) n of Theorem 3.11.4 for the d

proof. Let p1 , . . . , pr be the distinct prime factors of n. Then 1 1 p


r

=
i=1

p|n p=a prime

1 pi

= 1

1 + pi

i=j

1 pi pj

1 + , pi pj pk (3.28)

where, for example, the sum

1 pi pj pk

is formed by taking distinct prime

divisors pi , pj and pk of n. Now, by denition of the -function, (pi ) = 1,

Chapter 3 Basics of Number Theory

151

(pi pj ) = 1, (pi pj pk ) = 1 and so on. Hence the sum on the right side of (3.28) is equal to 1+
pi

(pi ) (pi pj ) + + = pi pi pj p ,p
i j

d|n

(d) , d

since all the other divisors of n, that is, divisors which are not products of distinct primes, contain a square and hence their -values are zero. Thus n
p|n

1 p

=
d|n

(d)

n = (n) (by Theorem 3.11.4) d

p=a prime

The Euler -function has the following properties. Theorem 3.11.7: (i) (pr ) = pr pr1 for prime p and r 1. (ii) (mn) = (m)(n)(d/(d)), where d = (m, n). (iii) (mn) = (m)(n), if m and n are relatively prime. (iv) a | b implies that (a) | (b). (v) (n) is even for n 3. Moreover, if n has k distinct odd prime factors, then 2k | (n). Proof. (i) By the product formula, (pr ) = pr 1 (ii) We have (mn) = mn
p|mn

1 p

= p r p r 1 .
1 p

p=a prime

. If p is a prime that divides

mn, then p divides either m or n. But then there may be primes which divide both m and n and these are precisely the prime factors of (m, n).

Chapter 3 Basics of Number Theory

152

Hence if we look at the primes that divide both m and n separately, the primes p that divide (m, n) = d occur twice. Therefore 1 1 p
p|m

1 p

p|n

1
1 p

1 p

p|mn

p|(m, n)

(m) (n) n = m (by the product formula) (d) d d 1 (m)(n) = mn (d) This gives the required result since the term on the left is (iii) If (m, n) = 1, then d in (ii) is 1. Now apply (ii). (iv) If a | b, then every prime divisor of a is a prime divisor of b. Hence a
p|a

1 (mn). mn

1 p

|b

p|b

1 p

. This however means that (a) | (b).

(v) If n 3, either n is a power of 2, say, 2r , r 2, or else n = 2k m, where m 3 is odd. If n = 2r , (n) = 2r1 , an even number. In
ks 1 the other case, by (iii), (n) = (2k )(m). Now if m = pk 1 ps is

the prime factorisation of m, by (iii) (m) =


s i=1

s i=1

i (pk i ) = by (i)

i 1 pk (pi 1). Now each pi 1 is even and hence 2s is a factor of i

(n). There are other properties of the three arithmetical functions described above as also other arithmetical functions not described here. The interested reader can consult [1, 2, 54]. We now present an application of Theorem 3.11.5.

Chapter 3 Basics of Number Theory

153

An Application
We prove that (1, 1) (1, 2) . . . (1, n) (2, 1) (2, 2) . . . (2, n) = (1)(2) (n). . . . ... . . . . . . (n, 1) (n, 2) . . . (n, n) Proof. Let D be the diagonal matrix (1) 0 0 (2) . . . . . . 0

Then det D = (1)(2) (n). Dene the n by n matrix A = (aij ) by 1 if i | j aij = 0 otherwise. entries equal to 1. Hence det A = 1 = det At . Set S = At DA. Then det S = det At det D det A = 1 ((1)(2) (n)) 1 = (1)(2) (n).

0 ... 0 ... . . . (n)

...

Then A is an upper triangular matrix (See Denition ??) with all diagonal

We now show that S = (sij ), where sij = (i, j ). This would prove our statement.

Chapter 3 Basics of Number Theory

154

Now At = (bij ), where bij = aji . Hence if D is the matrix (d ), then the (i, j )-th entry of S is given by:
n n

sij =
=1 =1

bi d aj .

Now d = 0 if = , and d = () for each . Therefore


n

sij =
=1 n

bi d aj bi aj ()
=1 n

= =
=1

ai aj ()

Now by denition ai = 0 i i and ai = 1 if | i. Hence the nonzero terms of the last sum are given by those that divide i as well as j . Now | i and | j i | (i, j ). Thus sij = is, by Theorem 3.11.5, (i, j ). (). But the sum on the right
|(i, j )

3.12

Exercises

1. Find those n for which (n) | n. 2. An arithmetical function f is called multiplicative if it is not identically zero and f (mn) = f (m)f (n) whenever (m, n) = 1. It is completely multiplicative if f (mn) = f (m)f (n) for all positive integers m and n. Prove the following: (i) is multiplicative but not completely multiplicative.

Chapter 3 Basics of Number Theory (ii) is multiplicative but not completely multiplicative.

155

(iii) d(n), the number of positive divisors of n is not even multiplicative. (iv) If f is multiplicative, then prove that (d)f (d) =
d|n
p|n

1 p

p= a prime

3. Prove that the sum of the positive integers less than n and prime to n 1 is n(n). 2 4. Let (n) denote the sum of the divisors of n. Prove that (n) is multiar r plicative. Hence prove that if n = pa 1 pr is the prime factorization

of n, then (n) =

i=1

i +1 1 pa i . pi 1

3.13

The big O notation

The big O notation is used mainly to express an upper bound for a given arithmetical function in terms of a another simpler arithmetical function. Denition 3.13.1: Let f : N C be an arithmetical function. Then f (n) is O(g (n)) (read big O of g (n)), where g (n) is another arithmetical function provided that there exists a constant K > 0 such that |f (n)| K |g (n)| for all n N. More generally, we have the following denition for any real-valued function.

Chapter 3 Basics of Number Theory Denition 3.13.2:

156

Let f : R C be a real valued function. Then f (x) = O g (x) , where g : R C is another function if there exists a constant K > 0 such that |f (x)| K |g (x)| for each x in R. An equivalent formulation of Denition 3.13.1 is the following. Denition 3.13.3: Let f : N C be an arithmetical function. Then f (n) is O(g (n)), where g (n) is another arithmetical function if there exists a constant K > 0 such that |f (n)| K |g (n)| for all n n0 , for some positive integer n0 (3.29)

Clearly, Denition 3.13.1 implies Denition 3.13.3. To prove the converse, assume that (3.29) holds. Choose positive numbers c1 , c2 , . . . , cn0 1 such that |f (1)| < c1 |g (1)|, |f (2)| < c2 |g (2)| . . . , |f (n 1)| < cn1 |g (n 1)|. Let K0 =max (c1 , c2 , . . . , cn0 1 , K ). Then |f (n)| K0 |g (n)| for each n N. This is precisely Denition 3.13.1. The time complexity of an algorithm is the number of bit operations required to execute the algorithm. If there is just one input, and n is the size of the input, the time complexity is a function T (n) of n. In order to see that T (n) is not very unwieldy, usually, we try to express T (n) = O(g (n)), where g (n) is a known less complicated function of n. The most ideal situation is where g (n) is a polynomial in n. Such an algorithm is known as a polynomial

Chapter 3 Basics of Number Theory

157

time algorithm. We now give the denition of a polynomialtime algorithm where there are, not just one, but k inputs. Denition 3.13.4: Let n1 , . . . , nr be positive integers and let ni be a ki -bit integer (so that the size of ni is ki ), 1 i r. An algorithm to perform a computation involving n1 , . . . , nr is said to be a polynomialtime algorithm if there exist nonnegative integers m1 , . . . mr such that the number of bit operations required to perform
m1 mr the algorithm is the O(k1 . . . kr ).

Recall that the size of a positive integer is the number of bits in it. For instance, 8 = (1000) and 9 = (1001). So both 8 and 9 are of size 4. In fact all numbers n such that 2k1 n < 2k are k -bits. Taking logarithms with respect to base 2, we get k 1 log2 n < k and hence k 1 log2 n < k , so that log2 n = k 1.Thus k = 1 + log2 n and hence k is O(log2 n). Thus we have proved the following result. Theorem 3.13.5: The size of n is O(log2 n). Note that in writing O(log n), the base of the logarithm is immaterial. For, if the base is b, then any number that is O(log2 n) is O(logb n) and vice verse. This is because log2 n = logb n, log2 b, and log2 b can be absorbed in the constant K of Denition 3.13.1. Example 3.13.6: Let g (n) be a polynomial of degree t. Then g (n) is O(nt ).

Chapter 3 Basics of Number Theory Proof. Let g (n) = a0 nt + a1 nt1 + + at , ai R. Then |g (n)| |a0 |nt + |a1 |nt1 + . . . + |at | nt (|a0 | + |a1 | + . . . + |at |)
t

158

=Knt ,

where K =
i=0

|ai |.

Thus g (n) is O(nt ). We now present two examples of a polynomial time algorithm. We have already described (see Section 3.3) Euclids method of computing the gcd of two positive integers a and b. Theorem 3.13.7: Euclids algorithm is a polynomial time algorithm. Proof. We show that Euclids algorithm of computing the gcd (a, b), a > b, can be performed in time O(log3 a) . Adopting the same notation as in (3.5), we have rj = qj +2 rj +1 + rj +2 , Now the fact that qj +2 1 gives, rj rj +1 + rj +2 > 2rj +2 , Hence rj +2 <
1 r 2 j

0 rj +2 < rj +1 .

for each j . This means that the remainder in every

other step in the Euclidean algorithm is less than half of the original remainder. Hence if a = O(2k ), then are at most k steps in the Euclidean

Chapter 3 Basics of Number Theory

159

algorithm. Now, how about the number of arithmetic operations in each step? In equation (3.5), the number rj is divided by rj +1 and the remainder rj +2 is computed. Since both rj and rj +1 are numbers less than a they are O(log a). Hence equation (3.5), involves O(log2 a) bit operations. Since there are k = O(log a) such steps, the total number of bit operations is (O(log3 a)). Next we show that the modular exponentiation ac (mod m) for positive integers a, c and m can be performed in polynomial time. Here we can take without loss of generality that a < m (Because if a a (mod m), then ac (a )c (mod m), where a can be taken to be < m). We now show that ac (mod m) can be computed in O(log a log2 m) bit operations. Note that O(log a) is O(log m), Write c in binary. Let c = (bk1 bk2 b1 b0 ) in the binary scale. Then c = bk1 2k1 + bk2 2k2 + + b0 20 , and therefore ac = abk1 2 a bk 2 2
k 2 k 1

ab1 2 ab0 , where each bi = 0 or 1. We now compute ac (mod m)

recursively by reducing the number computed at each step by mod m. Set y 0 = a bk 1 = a


2 bk 2 y1 = y0 a = abk1 2 abk2 2 bk 3 y2 = y1 a = a bk 1 2 a bk 2 2 a bk 3
2

. . .
2 bki2 yi+1 = yi a = a bk 1 2 a bk 2 2
i i1

. . .
2 bk k yk 1 = yk = a bk 1 2 2 a
k 1

abki2
k 2

a bk 2 2

a b0

Chapter 3 Basics of Number Theory = a ( bk 1 2


k 1 +b k2 ++b ) 0 k 2 2

160 = ac ( mod m).

There are k 1 steps in the algorithm. Note that yi+1 is computed by squaring yi and multiplying the resulting number by 1 if bki2 = 0 or else multiplying the resulting number by a if bki2 = 1. Now yi (mod m) being a O(log m)
2 number, to compute yi , we make O(log2 m) = O(t2 ) where t = O(log2 m) 2 bit operations. yi being t-bit, yi is a 2t or (2t + 1) bit number and so 2 it is also a O(t)-bit number. Now we reduce yi modulo m, that is, we 2 divide the O(t) number yi by the O(t) number m. Hence this requires an

additional O(t2 ) bit operations. Thus in all we have performed until now O(t2 ) + O(t2 ) bit operations, that is O(t2 ) bit operations. Having computed
2 yi (mod m), we next multiply it by a0 or a1 . As a is O(log2 m) = O(t),

this requires O(t2 ) bit operations. Thus in all, computation of yi+1 from yi requires O(t2 ) bit operations. But then there are k 1 = O(log2 c) steps in the algorithm. Thus the number of bit operations in the computation of
2 ac (mod m) is O(log2 c log2 2 m) = O (kt ). Thus the algorithm is a polynomial

time algorithm. Next, we give an algorithm that is not a polynomialtime algorithm. (Sieve of Eratosthenes)

Chapter 4 Mathematical Logic


The study of logic can be traced back to the ancient Greek philosopher Aristotle (384322 B.C ). Modern logic started seriously in mid-19th century mainly due to the British mathematicians George Boole and Augustus de Morgan. The German mathematician and philosopher Gottlob Frege (1848 1925) is widely regarded as the founder of modern mathematical logic. Logic is implicit in every form of common reasoning. It is concerned with the relationships between language (syntax), reasoning (deduction and computation) and meaning (semantics). A simple and popular denition of logic is that it is the analysis of the methods of reasoning. In the study of these methods, logic is concerned with the form of arguments rather than the contents or the meanings associated with the statements. To illustrate this point, consider the following arguments: (a) All men are mortal. Socrates is a man. Therefore Socrates is a mortal. (b) All cyclones are devastating. Bettie is a cyclone. Therefore Bettie is devastating. 161

Chapter 4 Mathematical Logic

162

Both (a) and (b) have the same form : All A are B ; x is an A; therefore x is a B . The truth or the falsity of the premise and the conclusion is not the primary concern. Again, consider the following pattern of argument: The program gave incorrect output because of incorrect input or a bug. The input has been validated to be correct. Therefore, the incorrect output is due to a bug. We can extract the pattern of reasoning: A occurs due to B or due to C ; B is shown to be absent; therefore, A has occurred due to C . In general, whether a given set of premises logically lead to a conclusion is of interest. Logic provides the formal basis to the theory of Computer Science (e.g., Computability, Decidability, Complexity and Automata Theory) and Mathematics (e.g., Set Theory). The classical theory of computation has it origins in the works of logicians such as G odel, Turing, Church, Kleene, Post and others in the 1930s. In Computer Science, the methods employed (primarily, programming activity) in its study are themselves rooted in logic. Digital logic is at the heart of the operational functions in all modern digital computers. Logical techniques have been successful in understanding and formalizing algorithms, program development, program specication. and program verication. More practical examples can be cited. Predicate Calculus has been successfully employed in the design of relational database query languages. The programming language PROLOG is a practical realization of programming with logic. Also, relational databases can be extended with logical rules to give deductive databases. The focus of the present chapter is on the rst principles of propositional calculus and predicate calculus (together, they comprise rst-order logic).

Chapter 4 Mathematical Logic We start with informal ideas in the next section.

163

4.1

Preliminaries

In simple terms, an assertion can be dened as a statement. Consider the following examples: This topic is very interesting I am writing this sentence using English alphabets x + y < z A proposition is an assertion which is either true or false but not both at any time. Thus the following statements are (mathematical) propositions: 2 is an even number 52! is always less than 100! If x, y and z are the sides of a triangle, then x + y = z An assertion need not be always a proposition. This can be illustrated by the following two examples: (i) Let C be any input-free computational procedure. Assume that there is a decision procedure HALT( ) such that HALT(C) returns true if C halts and returns false otherwise. Now consider the procedure given

below:
procedure ABSURD; begin if HALT(ABSURD) then while true do print Running

end

Chapter 4 Mathematical Logic

164

Next, consider the assertion, procedure ABSURD never halts it is not too dicult to reason that this assertion cannot be assigned a truth value consistent with the behaviour of procedure ABSURD. In this case the absurdity of procedure ABSURD arises because it is assumed that HALT( ) exists with the stated behaviour. (ii) In English, call a word homological if the words meaning applies to itself; otherwise, call it heterological. For instance, the words English, erudite, polysyllabic are all homological because English is an English word, erudite is in the learned peoples vocabulary, and polysyllabic is made of more than one syllable. By a similar reasoning it follows that German, monosyllabic and indecipherable are all heterological. Now consider the following statement: Heterological is heterological. It is not dicult to reason that a truth-value true or false cannot be assigned to the above assertion (consistent with the dened meanings) and hence it is not a proposition. In the subsequent discussions we are mostly concerned with dierent propositions and their truth values (the values can be true denoted by or false denoted by ) rather than their actual meanings. In such cases it is convenient to denote the propositions simply by letters of English.. Thus we can speak of a propositional variable p whose truth value is True i.e., . Using dierent types of logical connectives, we form compound statements from simple statements. We say that atomic statements are those that have no connectives. Logical connectives (or operators) are precisely

Chapter 4 Mathematical Logic

165

dened by specifying appropriate truth tables which dene the value of a compound statement, given the values for the propositional variables. The basic connectives are (read not, indicating negation), (read and, indicating conjunction) and (read or, indicating disjunction) are dened by the following truth tables: p p p q pq p q pq

The logical connectives (called exclusive-or), (called implication) and (called biconditional) are dened by the following truth tables: p q pq p q pq p q pq

value .

The truth value of p q is if and only if exactly one of p or q has a truth In the statement, p q we say that p is the premise, hypothesis or antecedent and we say q is the conclusion or consequence. To understand the meaning of p q , we rst remark that, no relationship is to be assumed between p and q , unlike in ordinary reasoning using the verb implies. Let p denote I have money and let q denote I own a car. Then, it appears that the implication, I have money implies I dont own a car is false if we

Chapter 4 Mathematical Logic

166

realize that money is sucient to buy and therefore own a car. By a similar reasoning, the implication, I have money implies I own a car appears to be true. The implication, I dont have money implies I dont own a car appears to be not a false statement. The implication, I dont have money implies I own a car is not clear. If money is a necessary prerequisite to buy, and therefore own a car, then this last implication is false. But since p and q are unrelated, we can, in a relaxed manner, reason that in the absence of money also, owning a car may be possible. We make this allowance and take the implication to be true. If p q is true then p is said to be a stronger assertion than q . For example, consider the implications, (x is a positive integer) (x is an integer) ( ABC is equilateral) ( ABC is isosceles). It certainly follows that the premise is a stronger assertion. The assertion, ( ABC is equilateral) ( ABC has two equal angles) can be made stronger in two dierent ways as given below: ( ABC is isosceles) ( ABC has two equal angles) ( ABC is equilateral) ( ABC has three equal angles). This means that we can strengthen p q by making q stronger or p weaker. We note that the expression p q is exactly equivalent to the expression p q in the sense that both have identical truth table. Given the statement p q we say that the converse statement is q p while the contrapositive statement is q p. We can verify using the truth table that if p q

Chapter 4 Mathematical Logic

167

then we have q p. This can be illustrated by the following examples:

Example 4.1.1: f (x) = x2 f (x) = 2x has the contrapositive form f (x) = 2x f (x) = x2 Example 4.1.2: a 0 a is real has the contrapositive form a is not real a < 0. The biconditional of p and q has the truth-value if p and q both have the truth-value or both have the truth-value ; otherwise, the biconditional has truth-value .

4.2

Fully Parenthesized Propositions and their Truth Values

Using the dierent logical connectives introduced above, given any set of prepositional variables, we can form meaningful fully parenthesized propositions or formulas . Below, we rst dene the rules which describe the syntactic structure of such formulas: (i) T and F are constant propositions. (ii) Assertions denoted by small letters such as p, q etc., are atomic propositions or formulas (which are dened to take values or ). (iii) If P is any (atomic or not) proposition then ( P ) is also a proposition.

Chapter 4 Mathematical Logic

168

(iv) If P and Q are any propositions then so are (P Q), (P Q), (P Q), (P Q) and (P Q). Note: (i) An atomic formula or the negation of an atomic formula is called a literal. (ii) When there is no confusion, we relax the above rules and we will denote (P ) simply by P , (R S ) simply by R S etc. (iii) If one formula is a substring of another, we say that it is a subformula of the otheran atomic subformula if it is atomic e.g., if R = (A B ) ( B C ) , then trivially R itself is a subformula. Also, (A B ), ( B C ), A, B , B and C are subformulas. A, B and C are atomic subformulas. The rules given above specify the syntax for propositions formed using the dierent logical connectives. The meaning or semantics of such formulas is given by specifying how to systematically evaluate any fully parenthesized proposition. The following cases arise in evaluating a fully parenthesized proposition: Case 1. The value of the proposition T is ; the value of the proposition F is . Case 2. The values of ( P ), (P Q), (P Q), (P Q), (P Q) and (P Q), when the values of P and Q are specied, are determined by using the corresponding operators truth table (described above). In the truth tables above, we allow p to be any formula P and q to be any formula Q.

Chapter 4 Mathematical Logic

169

Case 3. The value of a formula with more than one operator is found by repeatedly applying the Case 2 to the subformulas and replacing every subformula by its value until the given proposition is reduced to or . The following examples illustrate the systematic evaluation of propositions. Example 4.2.1: Consider evaluating (T T ) F . We rst substitute the values for T and F and get the proposition ( ) . This is rst reduced to ( ) by evaluating ( ) and then is reduced to . Example 4.2.2: To construct the truth table for the proposition (p q ) (q p) , we proceed in stages by assigning all possible combinations of truth values to the prepositions p and q . We then evaluate the inner propositions and nally determine the truth-value of the given proposition. This is summarized in the following truth table which contains the intermediate results also. p q pq qp (p q) (q p)

We note that the given formula has truth value whenever p and q have identical truth values. Example 4.2.3:

Chapter 4 Mathematical Logic To construct the truth table for the proposition

170 (p q ) ( r) p we

have to consider all combinations of truth values to p, q and r. This results in the following table. p q r (p q) ( r) (p q) ( r) (p q) ( r) p

The last two examples above illustrate that, given a propositional formula we can determine its truth value by considering all possible combinations of truth values to its constituent atomic variables. On the other hand given the propositional variables (such as p, q , r, etc.), for each possible combination of truth values to these variables we can dene a functional value by assigning a corresponding truth value (conventionally denoted in Computer Science by 0 or 1) to the function. This corresponds to the denition of a Boolean function. Denition 4.2.4: A Boolean function is a function f : {0, 1}n {0, 1}, where n 1.

Thus a truth table can be considered as a tabular representation of a Boolean function and every propositional formula denes a Boolean function

Chapter 4 Mathematical Logic

171

by its truth table. The notion of assigning a combination of truth values and to atomic variables is captured by a truth-assignment, A dened as follows: Denition 4.2.5: A truth-assignment A is a function from the set of atomic variables to the set {, } of truth values. A given proposition is said to be well-dened with respect to a truthassignment if each atomic variable in the proposition is associated with either the value or the value . Thus any well-dened proposition can be evaluated by the technique described in the above examples. On the other hand, if A is a truth-assignment and P is a formula, we say A is appropriate to P if each atomic variable of P is in the domain of A. Note that we can extend the idea of Awe have already done this in the evaluation of propositions considered above under cases 1, 2 and 3 above. We repeat this in a more formal way in the following denition. Denition 4.2.6: Let A be any function from the set of atomic variables to {, }. Then, we extend the notion of A as follows:

1.

A(R S ) = if A(R) = otherwise.

= or

A(S ) =

2.

A(R S ) = if A(R) = otherwise.

= and A(S ) =

Chapter 4 Mathematical Logic 3. A( R) = if A(R) = otherwise. 4. A(R S )= if A(R) = otherwise. 5. A(R S )= if A(R) = A(S ) = otherwise. 6. A(R S ) = if A(R) = A(S ) = otherwise. = or A(S ) = =

172

4.3

Validity, Satisability and Related Concepts

. Let P be a given formula and let A be a truth-assignment appropriate to P . If A(P ) = then we say that A veries P ; if A(P ) = then we say that A falsies P . Also if we have a set S = {P1 , P2 , . . . , Pk } of formulas and if A is appropriate to each Pi (1 i n) then we say that A veries S if A(Pi ) = for all i. If A veries a formula or a set of formulas then A is said to be a model for that formula or set of formulas. Note that any truth-assignment is a model for the empty set. A formula or a set of formulas is satisable if it has at least one model; otherwise the formula or set of formulas is unsatisable. We remark that the general problem of determining whether formulas in propositional calculus are satisable is a tough problemno one has found and ecient algorithm

Chapter 4 Mathematical Logic

173

for this problem but no one has proved that an ecient solution does not exist. In classical complexity theory, the problem is termed NP-Complete. A formula P is valid if every truth-assignment appropriate to P , veries P . We then call P , a tautology or a universally valid formula. Trivially T is a tautology and F is not. It easily follows that the proposition (P P ) is a tautology. It is easy to see that if S and T are equivalent, then (S T ) is a tautology. A tautology can be established by constructing a truth table as the following example shows. Example 4.3.1: To show that ( p) (p q ) is a tautology. We do this by constructing the following truth-table: p q ( p) (p q) p (p q)

If P is a formula and if, for all truth-assignments A appropriate to P , if A(P ) = then P is called a contradiction. Obviously the negation of a contradiction is a tautology. We also say that a statement P (not atomic, in general) tautologically implies a statement Q (not atomic, in general) if and only if (P Q) is a tautology. For example it is easy to verify that (p q ) p and p (p q ) q are tautological implications. The following tautological

Chapter 4 Mathematical Logic implications are well-known: (i) Addition: p (p q ) (ii) Simplication: (p q ) p (iii) Modus Ponens: (iv) Modus Tollens: p (p q ) q (p q ) q p p (p q ) q (p q ) (q p) (p r)

174

(v) Disjunctive Syllogism: (vi) Hypothetical Syllogism:

Consider any formula P with n distinct atomic subformulas. There are only 2n dierent truth-assignments possible to the n atomic.subformulas. It is quite possible that there may be a syntactically dierent (in particular, shorter) formula Q such that under any of the 2n possible truth-assignments, P and Q both evaluate to or both evaluate to . In such a case P and Q are equivalent. As an extreme case, consider the formula ( p) (p q ) of Example 4.3.1 above. This formula was shown to be a tautology and so it is equivalent to T . In general if we show that P is equivalent to a syntactically shorter formula Q, we can construct the truth-table for Q (and hence for P ) with lesser eort. The laws of equivalence, stated below, will be useful in transforming a given P to an equivalent Q. Denition 4.3.2: Two propositional formulas P and Q are said to be equivalent (and we write P Q), if and only if they are assigned the same truth-value by every truth assignment appropriate to both.

Chapter 4 Mathematical Logic

175

We note that P Q is a shorthand for the assertions that P and Q are equivalent formulas. On the other hand, the formula (P Q) is not an assertion. It is easy to see that (logical) equivalence between two formulas is an equivalence relation on the set of formulas. It is also easy to see that any two tautologies are equivalent and any two unsatisable formulas are equivalent. This means that two formulas with dierent atomic subformulas can be equivalent. The following laws of equivalence are well-known: i) Idempotence of and ii) Commutativity Laws : (a) P (P P ) and (b) P (P P ) : (a) (P Q) (Q P ), (b) (P Q) (Q P ) and (c) (P Q) (Q P ) iii) Associativity Laws : (a) (P Q) R P (Q R) and (b) (P Q) R P (Q R) iv) Distributive Laws : (a) P (Q R) (P Q) (Q R) and (b) P (Q R) (P Q) (Q R) v) de Morgans Laws : (a) (P Q) ( P Q) and (b) (P Q) ( P Q) vi) vii) viii) ix) x) xi) Law of Negation Law of the excluded Middle Law of Contradiction Law of Implication : ( P ) P : P P T : P P F : (P Q) ( P Q) (a) P P P , (b) P T T ,

Equivalence or Law of Equality: P Q (P Q) (Q P ) Laws of OR-Simplication :

Chapter 4 Mathematical Logic

176 (c) P F P and (d) P (P Q) P

xii)

Laws of AND-Simplication

(a) P P P , (b) P T P , (c) P F F and (d) P (P Q) P

xiii) xiv)

Law of Exportation Law of Contrapositivity

: P (Q R) (P Q) R : (P Q) ( Q P )

Obviously, each of the above laws, of the form P Q, can be proved by proved by constructing the truth-tables for P and for Q separately and by checking that they are identical. Example 4.3.3: We show that it is possible to use the laws of equivalence to check if a formula is a tautology. Consider the formula P where, P = (p q ) ( p (q r) ( q r)

Eliminating the main implication (using the law of implication), we get, (p q ) p (q r) Next, using de Morgans law, we get, (p q ) p (q r) ( q r) Application of the law of negation gives, (p q ) p (q r) ( q r) Again, using the laws of implication and negation, we get, ( p q ) p (q r) (q r) ( q r)

Chapter 4 Mathematical Logic

177

Using laws of Associativity, Commutativity and OR-simplication, we get, p p q r This last formula, by laws of Excluded Middle and OR-simplication evaluates to T. We therefore conclude that the original formula P is a tautology. Let P Q and let R be any formula involving P ; say, R = (A P ) (C P ) . In the process of the evaluation of R, every occurrence of P will result in a truth-value. In each such occurrence of P in R, the same truth-value will result if any (all) occurrence(s) of P is (are) replaced by Q in R. Thus, in general, if R is the formula that results by replacing some occurrence of P by Q within R, then we will have R R .

4.4

Normal forms

Given the formulas F1 , F2 , . . . , Fn , we write, (a) (b) n i=1 Fi for (F1 F2 . . . Fn ) n i=1 Fi for (F1 F2 . . . Fn )

this is possible because the associativity laws imply that in nested conjunctions or disjunctions the grouping of the subformulas is immaterial. We use the above notations even when n = 1, in which case, both representations simply denote F1 . We call (a) and (b) respectively the disjunction and conjunction of the formulas F1 , F2 , . . . , Fn .

Chapter 4 Mathematical Logic

178

4.4.1

Conjunctive and Disjunctive Normal Forms

Denition 4.4.1: A formula is in conjunctive normal form (CNF) if it is a conjunction of disjunctions of literals. Similarly, a formula is in disjunctive normal form (DNF) if it is a disjunction of conjunctions of literals. For example, the formula (a b) ( a c) is in DNF and the formula (a b) (b c a) (a c) is in CNF. The formulas (a b) and (a c) are both in CNF as well as in DNF. (Notation: if F = G, then let F = G; if F is not a a negation of any formula, let F = F ; this ts well with the case if F is atomic say A; then A = A and A = A) Theorem 4.4.2: Every formula has at least one CNF and at least one DNF representation. Furthermore, there is an algorithm that transforms any given formula into a CNF or a DNF as desired. Proof. The proof is by induction on the number of occurrences of the logical connectives in the given formula. A formula with no logical connectives is atomic and is already in CNF or DNF. We now assume that the theorem is true for formulas with k or fewer occurrences of logical connectives. Let F be the formula with (k + 1) occurrences of logical connectives. The following cases arise:

(a) F = G for some formula G with k occurrences of logical connectives.

Chapter 4 Mathematical Logic By induction hypothesis, we can nd both CNF and DNF of G. If
n mi

179

i=1

j =1

Gij

is a CNF of G then by de Morgans Law,


n mi

i=1

j =1

Gij

is a DNF of F . Similarly a CNF of F can be obtained from a DNF of G. (b) F = (G1 G2 ) for some formulas G1 and G2 each of which has k or fewer occurrences of logical connectives. To obtain a DNF of F we simply take the disjunction of a DNF of G1 and DNF of G2 . To obtain a CNF of F , nd CNFs of G1 and G2 , such as
i=1 m n

Hi and

j =1

Ji , where H1 , . . . ,

Hm and J1 , . . . , Jn are disjuncts of literals, then by Distributivity Law


m

i=1

j =1

(Hi Ji )

is equivalent to (G1 G2 ). Each formula (Hi Ji ) is a disjunction of literals, so this formula is a CNF of F. (c) F = (G1 G2 ) for some formulas G1 and G2 . This case is entirely analogous to case (b) above. (d) F = (G1 G2 ). As (G1 G2 ) can be written in the equivalent form G1 G2 . we can apply the reasoning of case (b) (e) F = (G1 G2 ). The proposition can be written as (G1 G2 ) (G2 G1 ). We can apply case (c) now.

Chapter 4 Mathematical Logic

180

4.5

Compactness

In Mathematics, any property that can be inferred of a countably innite set, on the basis of nite approximations to that set is called the compactness phenomenon. Compactness is a fundamental property of formulas in propositional calculus which is stated in the following theorem: Theorem 4.5.1: A set of formulas is satisable if and only if each of its nite subsets is satisable. (Alternatively, if a set of formulas is unsatisable, some nite subset of the given set must be unsatisable.) Proof. Given an innite set of formulas which is satisable, it easily follows that any of its nite subsets is satisable. Proving the converse is harder. Now, let S be an innite set of formulas such that each nite subset of S is satisable. Let A1 , A2 , A3 , . . . , An , be the listing of all atomic subformulas in S . For each n 0, denote by sn , the set of all formulas whose atomic formulas are among A1 , A2 , A3 , . . . , An . Obviously in sn there are innitely many formulas. However, it is not too dicult to see that there are only 22 equivalence classes in sn where all the formulas in any one class are all equivalent to one another. Since each nite subset of S is satisable S sn is also satisable for each n. To see this, we choose from sn only one representative from the 22 equivalence classes and any truth-assignment verifying the representative subset veries S sn also. As S sn is satisable for each n = 0, 1, 2, ... there is a truth-assignment An appropriate to sn such that An veries S sn . But, this does not mean that there is a k such that Ak veries S ; in fact, none of the An need even
n n

Chapter 4 Mathematical Logic

181

be dened on all the atomic subformulas of S . We, however, claim that from {A0 , A1 , A2 , . . .} we can construct a truth-assignment A that does verify S . For each n 0, this construction species how to build, a set Un of truth-assignments. The construction species: (a) For all n > 0, Un is a subset of Un1 . (b) Un is an innite subset of {An , An+1 , An+2 , . . .}. (c) For 1 n m, any two truth-assignments in Um agree on the truth-value assigned to the atomic formula An . We set U0 = {A0 , A1 , A2 , . . .}this means, for n = 0 (a) through (c) are true. Once Un has been dened, we dene Un+1 as follows: By (b), Un contains Ai for innitely many i n + 1. That is, the set {i | i n + 1 and Ai Un } is innite. This set can be partitioned into two disjoint subsets say J1 and J2 where: J1 = {i | i n + 1 and Ai Un and Ai veries An+1 } and J2 = {i | i n + 1 and Ai Un and Ai does not verify An+1 } At least, one of J1 and J2 must be innite because the union of two nite sets is nite. Let the innite one of these (or any one if both are innite) be J (= J1 or J2 , whichever is innite; = either, if both J1 and J2 are innite) and let Un+1 = {Ai | i J }. Then, Un+1 is denitely a subset of Un ; also, Un+1 contains Ai for innitely many i > n + 1. Further, all the truth-assignments in Un+1 agree on A1 , . . . , An and on An+1 . We now dene a truth-assignment, A : {A1 , A2 , A3 , . . .} {, } as follows: A(Am ) is the common value of B(Am ) for all B Un , n m (such a B(Am ) exists by (a) and (c) ). The truth-assignment A veries S : if any

Chapter 4 Mathematical Logic

182

formula F S , then F S sm for some m and so An veries F for all n m. Hence, B veries F for all B Un , if n m; and hence, A which agrees with all these B Un for all n m on formulas in sm also veries F.

4.6

The Resolution Principle in Propositional Calculus

We rst introduce the idea of a clause and a clause set. Consider a CNF formula F = (p q ) ( r s t) . There are other equivalent ways of writing F e.g., (q p) ( r t s) . All such other syntactical forms of F and F itself can therefore be just regarded as a set of set of literals. For example, F is captured by the set {p, q }, {t, s, r} . Formally, a clause is a nite set of literals. Each disjunction of literals corresponds to a clause and each nonempty clause corresponds to one or more disjunction of literals. We also allow the empty set as a clause, in which case, it does not correspond to any formula. We write

for the empty clause.

A clause set is a set of clauses (empty clause allowed), possibly empty and may be innite. Every CNF formula naturally corresponds to a clause set and every nite clause set not containing the empty clause and not itself empty corresponds to one or more formulas in CNF. The empty clause set is not the same as the empty clause, although they are identical when considered as sets. We write to denote the empty clause set. We can carry forward the notion of a truth-assignment A as appropriate to a clause set. If S is a clause set and every atomic formula in S is in the

Chapter 4 Mathematical Logic

183

domain of A then we say that A is appropriate to S . The truth-value of any clause in S or the truth-value of S as a whole can be analogously determined by taking the truth-value of a corresponding formula for the clause or the clause set. We also say that A veries a clause if and only if A veries at least one of its members and we say A veries a clause set if and only if A veries each of its members. Example 4.6.1: Let S be the clause set {p, q }, { r} . If a truth-assignment A veries p,

veries q and veries r, then, A also veries S . On the other hand, if A veries all of p, q and r then it follows that A does not verify S . It follows that any truth-assignment A does not verify

because has

no members for A to verify. But, any truth-assignment A veries because does not verify {} for any A. Thus, the dierence between A veries each clause in (of which there is none). On the other hand, A

and is

apparent. We can also say that two given clause sets are equivalent if any truth-assignment appropriate each assigns to both, the same truth-value. It follows that a clause or a clause set may be satisable, unsatisable or a tautology. The empty clause set is a tautology but the empty clause is unsatisable. Any clause set containing the empty clause is unsatisable. The clause set {p}, { p} , although not empty, is also unsatisable. The resolution rule can be illustrated with a simple example. Consider the clause set S = {p, q, r}, {q, r, s} . To verify S , any appropriate

truth-assignment must verify {p, q, r} and must also verify {q, r, s}. This, it is not dicult to reason, leads to the conclusion that the clause set {p, q, s}

Chapter 4 Mathematical Logic

184

must also be veried by the same truth- assignment. In fact the clause set {p, q, r}, {q, r, s}, {p, q, s} can be shown to be equivalent to S . We say that the clause set {p, q, s} is a resolvent of the clause sets {p, q, r} and {q, r, s}. Denition 4.6.2: Let C1 and C2 be two clauses. Then the clause D is a resolvent of C1 and C2 if and only if for some literal l we have l C1 and l C2 and D = C1 \{l} C2 \{ l} . Thus a resolvent of two clauses is any clause obtained by striking out a complementary pair of literals, one from each clause and merging the remaining literals into a single clause. Evidently, from C1 and C2 , in general it is possible to produce more than one D.

The Resolution Rule


Let S be a clause set with two or more elements and let D be a resolvent of any two clauses in S . Then S S {D}. Proof. Let A be a truth-assignment appropriate to S . Clearly, if A veries S {D} then A veries S as well. Now, let it be that A veries S and let D = C1 \{l} C2 \{ l} where C1 , C2 S, l C1 and l C2 . As A cannot verify both l and l, in order that A veries both C1 and C2 , it must verify some literal in C1 \{l} C2 \{ l} i.e., some literal in D. Starting with a clause set S , the resolution rule allows us to build a new clause set S by adding all resolvents to S (in the sense that S and S are

Chapter 4 Mathematical Logic

185

equivalent). Now we can replace S by S and add new resolvents again and we can repeat this procedure as long as it is possible. We precisely formalize this in the following: Let S be a nite clause set with two or more elements. We dene, AddRes(S ) = S {D | D is a resolvent of two clauses in S }. Evidently, AddRes(S ) S . Now, we let AddRes0 (S ) to be S itself. We can then dene, AddResi+1 (S ) = AddRes (AddRes i (S )) for each i 0. Finally we let, AddRes (S ) = { AddRes i (S )) | i 0}. In other words, AddRes (S ) is the closure of S under the operation AddRes of adding all resolvents of clauses already present. Since each clause in S is nite, there are only a nite number of clauses that can be formed by the atomic formulas appearing in S . Hence only a nite number of resolvents can ever be formed starting from a nite S . So there exists an i > 0 such that, AddResi+1 (S ) =AddResi (S ). Then AddRes (S ) = AddRes i (S ). By induction, it follows that S AddRes (S ). Determining AddRes (S ) by repeated application of the resolution rule can be used to nd out whether a clause set is satisable. In turn, this implies that we can have a computational procedure to determine whether a formula is satisable. This is possible because of the following theorem: Theorem 4.6.3 (The Resolution Theorem): A clause set S is unsatisable if and only if

AddRes(S ).

Proof. Clearly, if AddRes (S ) then AddRes (S ) and so S is unsatisable. To prove the converse, assume that S is unsatisable. By the Compactness

Chapter 4 Mathematical Logic Theorem, some nite subset of S is unsatisable.

186

Let An (n 1) denote all possible atomic formulas appearing in S . Let Cn be the set of all clauses that can be constructed using only A1 , A2 , . . . , An . We denote {} by C0 . Then there is an n > 0 such that S Cn and hence AddRes (S ) Cn is unsatisable. We now claim that for each k = n, n 1, . . . , 0, and for each truthassignment A appropriate to {A1 , A2 , . . . , An } there exists some clause C in AddRes (S ) Ck such that A does not verify C . That is, for every truthassignment that assigns truth-values to the rst k atomic formulas, there is some clause in AddRes (S ) that contains only these atomic formulas and is falsied by the truth-assignment. Since the only clause that can be falsied when k = 0 is

, it follows that AddRes(S ).

We apply induction to prove the above claim. For k = n, the claim is immediate, since AddRes (S ) Cn is unsatisable. Now suppose that the claim holds for some k + 1 but fails for k . Then there is some truth-assignment A appropriate to {A1 , A2 , . . . , Ak } such that A veries AddRes (S ) Ck . Now let A1 and A2 be two truth-assignments that assign the same truth values as A to A1 , A2 , . . . , Ak and are such that A1 veries Ak+1 but A2 falsies Ak+1 . By the induction hypothesis there are clauses C1 , C2 AddRes (S ) Ck+1 such that A1 does verify C1 and A2 does not verify C2 . Now both C1 and C2 must contain Ak+1 as otherwise one of them would be in AddRes (S ) Ck and would be falsied by A as well as by A1 or by A2 also. It is easy to reason that C1 must contain Ak+1 as its member but not Ak+1 while C2 must contain Ak+1 and not Ak+1 . Thus, in any case, A1 must verify C1 or A2 must verify C2 . Then we can produce a

Chapter 4 Mathematical Logic

187

resolvent D from C1 and C2 as, D = C1 \{ Ak+1 } C2 \{Ak+1 } . Now, D is in Ck because all occurrences of Ak+1 and Ak+1 are discarded from C1 and C2 in forming D. Also, D is in AddRes (S ) because it is a resolvent of clauses in AddRes (S ). Therefore, D is in AddRes (S ) Ck . Further, A does not verify D; for if it does then either A veries D = C1 \{ Ak+1 } or C2 \{Ak+1 } and then A1 veries C1 or A2 veries C2 . But this is a contradiction because we assumed that A veries AddRes (S ) Ck and here we have got a D AddRes (S ) Ck such that A does not verify D. This completes the induction and therefore the proof. The resolution theorem can be used to determine the satisability of a given formula. The following procedure precisely does this by brute-force: Step 1: Convert the given formula into a CNF. Step 2: From the CNF formula, get the corresponding clause set, say S . Step 3: Compute AddRes1 (S ), AddRes2 (S ), . . . , until AddResi+1 (S ) = AddRes i (S ) for some i = 1. Step 4: If

AddResi(S ) then conclude that S is unsatisable;

otherwise S is satisable. Strictly speaking, not all the intermediate resolvents in AddRes (S ) are really needed to check whether illustrative. Example 4.6.4: Show that the following formula is unsatisable: (A B ) (A C ) (B C ) ( A B ) ( A C ) ( B C )

AddRes(S ).

The following example is

Chapter 4 Mathematical Logic The given formula corresponds to the clause set, {A, B }, {A, C }, {B, C }, { A, B }, { A, C }, { B, C } .

188

Starting with the given clause set, the following self-explanatory tree diagram gives the intermediate resolvents that are needed to show that also a resolvent belonging to AddRes (S ): {A, C } {A, B } {B, C } {A, B } {A, C } {B, C }

is

{C, B }

{B, C }

{C }

{C }

The following exercise gives an important result where the resolution technique holds the key.

4.7

Predicate Calculus Basic Ideas

Formulas in propositional calculus are nite and are limited in expressive power. There is no way of making an assertion in a single formula that covers innite similar cases. For example, even certain simple assertions in Mathematics do not t into the language of propositional calculus. Thus assertions such as, x = 3 , y z and x + y > z are not propositions

Chapter 4 Mathematical Logic

189

as truth-values or cannot be assigned to them. However, if integer values are assigned to the variables x, y , z in the above assertions then each assertion becomes a proposition. Similarly it is possible to consider sentences in English where pronouns and improper nouns act as variables, as in the assertions, He is tall and blonde (x is tall and blonde) and She does not smoke (y does not smoke) These assertions can be regarded as templates or patterns, expressing relationships between objectsthese templates are technically called predicates. Using a uniform notation, we can write the above assertions as, EQ(x, 3), GTE(y, z ), SGT(x, y, z ), TNB(x), DOSNSMOKE(y ). We further note that in the above, we implicitly assume that x, y , z are integers in one case whereas in another, we assume x, y to be the persons (or names of persons). In Predicate Calculus, we make general statements about objects in a xed set, called the Universe. In an assertion like P (x, y ), P is called the Predicate sign and x and y are called variables. More technically, P (x, y ) is simply written as P xy . An assertion such as P (x1 , . . . , xn ), where x1 , . . . , xn are variables is said to be an n-place predicate. Note that if P is an n-place predicate constant and values C1 , . . . , Cn from the Universe are assigned to each of the individual variables, the result is a proposition. Predicates are commonly used in control statements in programming languages. For example, consider the statement in a Pascal-like language, if ( x > 3 ) then goto L ; During program execution, when the if- statement is encountered, the current value of x is substituted to determine whether (x > 3) evaluates to true or false (and then the program control is transferred conditionally).

Chapter 4 Mathematical Logic

190

Restricting the Universe of discourse to positive integers, now consider the assertions, For all even integers x, we have x > 1 If x is an even integer there exists integers y , z such that x = y + z .

Both the above assertions are true and hence these are propositions. To succinctly express the above, we need two special symbols (read for-all ) and (read there-exists ) respectively known as universal quantier and existential quantier . We can now write the above propositions as, x (x mod 2 = 0) (x > 1) xy z (x mod 2) = 0 (y + z ) = x

Taking mankind as the universe of discourse, if Lxy is used to denote x loves y then the formula, x(Lxx yLxy ) can be interpreted to mean the statement, Anyone who loves himself loves someone. Let M (x) stand for x is a man; let T (y ) stand for y is a truck; and let D(x, y ) stan for x drives y . Consider the following, a formula in predicate calculus: x M (x) y T (y ) D(x, y ) this says, for all x, if x is a man then there exists a y such that y is a truck and x drives y . In other words, it says every man drives at least one truck. The following formula, y T (y ) x M (x) D(x, y ) says that every truck is driven by at least one man. We remark that parentheses add to clarication. For example, the formula T (y ) D(x, y ) can be

Chapter 4 Mathematical Logic

191

interpreted in two ways. It can denote T (y ) D(x, y ) which means, y is not a truck and x drives y . Alternately, it can denote, T (y ) D(x, y ) which means, y is not a truck that x drives. More generally, we can make statements about relations that are not explicitly specied. For example, xyP (x, y ) P (y, x) simply states that P is a symmetric relation; the formula x P (x, x) can be interpreted to mean that P is not reexive. Apart from predicate signs and variables, predicate calculus also allows function signs, Thus, if f is a function sign corresponding to a binary function f (x, y ) where x and y are variables (denoting objects in a xed universe), then xyP f xyf yf yx (which may be informally written as) xyP f (x, y ), f y, f (y, x) is a legal formula. Formulas in Predicate Calculus turn out to be true or false depending on the interpretation of predicates and function signs. Thus x P xx is true if and only if P is interpreted as a binary relation that is not reexive. Also, the formula xy zP xy P yz P xz is true if and only if P is interpreted as a transitive relation. In the universe of discourse U , for any predicate P and for any element m U , the formula x P (x) P (m) is always true. In the sequel, we will prefer to use the informal style of writing the formulas to provide some clarity.

4.8

Formulas in Predicate Calculus

We now dene the syntax of formulas in predicate calculus.

Chapter 4 Mathematical Logic

192

1. We use P, Q, R, . . . to denote predicate signs; f, g, h . . . to denote function signs; and x, y, z, . . . to denote variables. A 0-place function sign stands for a specic element in the Universewe call these constant signs and denote them by a, b, c, . . . 2. We next dene terms inductively as: (a) Every variable is a term. (b) If f is an n-place function sign and t1 , . . . , tn are terms, then f (t1 , . . . , tn ) is also a term. 3. Atomic formulas are dened as: If P is an n-place predicate sign and t1 , . . . , tn are terms, then P (t1 , . . . , tn ) is an atomic formula. 4. Formulas are dened as follows: (a) Atomic formulas are formulas. (b) If F and G are any formulas, then so are (F G), (F G) and F . (c) If F is any formula and x is a variable then xF and xF are formulas. A subformula of a formula is a substring of that formula which is itself a formula. The matrix of a formula is the result of deleting all occurrences of quantiers and the variables immediately following those occurrences of quantiers. It is easy to see that the result of the process is a formula. As in propositional calculus, we call (F G) the disjunction of F and G; we call (F G), the conjunction of F and G; we call F , the negation of F .

Chapter 4 Mathematical Logic

193

Also, we introduce the notions of the conditional and biconditional of two formulas: (F G) is used as an abbreviation for ( F G) and (F G) is used as an abbreviation for (F G) (G F ) . The scope of an occurrence of the negative sign or a quantier in a formula is another syntactic notion. Any occurrence of or or in a formula F refers to or embraces a particular subformula of F . The scope of an occurrence of in F is that uniquely determined subformula G such a that the occurrence of is the leftmost symbol of G. Similarly the scope of an occurrence of or is that unique formula G such that for some variable x, the occurrence of the or is the leftmost symbol of xG or xG. Consider the formula, (xP (x, y )) (xQ(x) x Q(x)). The scope of is P (x, y ). The scope of rst occurrence of is Q(x) and that of the second occurrence of is Q(x). The scope of occurrence of is Q(x).

4.8.1

Free and bound Variables

An occurrence of a variable in a formula is said to be free if it is governed by no quantier (in that formula) containing that variable. Otherwise, the variable is said to be bound. We illustrate this idea in an informal way. Consider the predicate i(x i > 0), where i is specied to be an integer between m and n(m < n) and x is an integer. The predicate is true if both x and m are greater than 0 or if x and n are less than 0. Hence it is equivalent to the predicate, (x > 0 m > 0) (x < 0 n < 0) Thus the truth value of the predicate depend on the values of m, n and x but not on the value of i. It is obvious that the truth value of the predicate does

Chapter 4 Mathematical Logic

194

not change if all occurrence of i are replaced by a new variable, say j . The variable i is bound to the quantier in the predicate. The variables m, n and x are free in the predicate. A predicate such as (i > 0)(i(xi > 0)) can be confusing. In such cases we rewrite the predicate to remove the ambiguity. For example, we can rewrite this last predicate as (i > 0) (j (x j ) > 0). More precisely the free variable of a formula are dened inductively as follows: (a) The free variables of an atomic formula are all the variables occurring in it. (b) The free variables of the formulas (F G) or (F G) are the free variables of F and the free variables of G; the free variables of F are the free variables of F (c) The free variables of xF and xF are the free variables of F , except for x (if x happens to be a free variable in F ). When there are no free occurrences of any variable in a formula, the formula is called a closed formula or sentence.

4.9

Interpretation of Formulas of Predicate Calculus

A formula in predicate calculus can be considered to be true or false depending on the interpretation. Any interpretation for a formula must contain sucient information to determine if the formula is true or not. This point can be understood by considering a statement such as all students have

Chapter 4 Mathematical Logic

195

passed (may be written as sP (s), where s is a variable denoting a student, P is the predicate have passed). To decide if this statement is true we need to know who the students are i.e., we need to know a universe of discourse. Also, we need to know who has failed. That is we need some type of assignment of the predicate have passed. More generally, we must have interpretations for predicate and functions signs as relations and functions respectively. Also, we may have to interpret some of the variables as particular objects. To do this, we rst dene the notion of a structure.

4.9.1

Structures

A structure A is a pair ([A],IA ). where [A] is any nonempty set called the universe of A and IA is a function whose domain is a set of predicate and function signs. Specically - if P is an n-place predicate sign in the domain of IA , then IA (P ) is an n-ary relation on [A]. - if f is an n-place function sign in the domain of IA , then IA (f ) is a function from [A]n to [A]. We also write P A for IA (P ) and f A for IA (f ). In the case the domain of IA is nite, say {P1 , . . . , Pm , f1 , . . . , fn } we also write A as the (m + n + 1)-tuple:
A A A A ). , . . . , fn , f1 , . . . , Pm ([A], P1

If A is structure and F is a formula such that each predicate letter and function sign of F is assigned a valued by IA , then A is said to be appropriate to F .

Chapter 4 Mathematical Logic Example 4.9.1:

196

Let P be a 2-place predicate and f be a 1-place function sign and let F be the formula xP (x, f (x)). The structure A = ([A], P A , f A ), dened as follows is appropriate to F : [A] = {0, 1, 2, . . .}, fA the set of natural numbers N

P A = {(m, n)| m, n N and m < n} is the successor function i.e., f (n) = n + 1 for each n N.

We regard F as true in the structure A since every number is less than its successor. If we dene a new structure B which is the same as A except that, P A = {(m, n)| m, n N and m > n}, then F is false in B. The formula P (f (x), y ) cannot be regarded as true or false in A or in B without knowing what x and y are. The next subsection formalizes these ideas.

4.9.2

Truth Values of formulas in Predicate Calculus

Let F be a given formula and let A be a structure appropriate to F . Let be a function with a domain that includes all variables of F and with a range that is a subset of [A]. We then say is appropriate to F and A. Let t be a term and G be a formula that can be constructed by using the variables, predicate signs and function signs of F . Now, given F , A and , for each t or G, we dene A(t) in [A] or A(G) in {, } as follows: 1. (a) If x is a variable of F , then A(x) = (x).

Chapter 4 Mathematical Logic (b) If t1 , . . . , tn are terms and f is an n-place function of F , then A(f (t1 , . . . , tn )) = f A (A(t1 ) , . . . , A(tn ) )

197

In other words, A(f (t1 , . . . , tn )) is calculated by rst nding A(t1 ) , . . . , A(tn ) which are members of [A] and then applying to these values, the function, f A : [A]n [A]. We require one or more auxiliary denition. If is as described above, x is a variable of F and a is a member of [A], then we take [x/a] to be the function which is identical to except that (x) = a (whatever (x) might be). We now dene the value of A(G) considering the dierent cases. We let G and H to be formulas that can be constructed by using the variables, predicate signs and functions signs of F 2. (a) If t1 , . . . , tn are terms and P is an n-place predicate sign of F , then, if A(t1 ) , . . . , A(tn ) P A A(P (t1 , . . . , tn )) = otherwise if A(G) = , or A(H ) = (b) A (G H ) = otherwise if A(G) = , and A(H ) = (c) A (G H ) = otherwise if A(G) = (d) A( G) = otherwise if A(G)[x/a] = , for each a [A] (e) A( xG) = otherwise

Chapter 4 Mathematical Logic if A(G)[x/a] = ,

198 for some a [A]

(f)

A( xG) =

This completes the task of evaluating the truth values of formulas in predicative calculus. We write A Example 4.9.2: Let us consider the structure A of Example 4.9.1 and the formula P (x, f (y )). Let be the function from {x, y } to [A] = N such that, (x) = 1 (y ) = 2

otherwise

G if and only if A(G) = .

Now A(y ) = (y ) = 2 and A(f (y )) = f A (A(y )) = f A (2) = 2 + 1 = 3. Next A P (x, f (y ))

= T if and only if A(x) , A(f (y )) P A . That is, if and

only if (1,3) P A which in indeed true. Hence A Example 4.9.3:

P (x, f (y )).

For the structure A and the function as in example 4.9.2, let us consider the formula xP (x, f (x)). By denition, A xP (x, f (x)) of A(x, f (x))[x/a] = for each a [A] = N. Now A P (x, f (x))
[x/a]

= if and only

= if and only if A(x)[x/a] , A(f (x)[x/a] ) P A . xP (x, f (x)).

We see that A(x)[x/a] = a and A(f (x))[x/a] = f A A(f (x))[x/a] = f A (a) = a + 1. Since (a, a + 1) P A for each a N, we conclude A Example 4.9.4: Let L be a 2-place predicate sign and f , a 2-place function sign. Let x and

Chapter 4 Mathematical Logic y be variables. Consider the structure A, where, [A] = N LA = {(m, n)|m < n, f A (m, n) = m + n Let be the function from {x, y } to N such that (x) = 5, (y ) = 2. Consider the formula x yL(x, f (x, y )). Now A x yL(x, f (x, y )) if and only if A yL(x, f (x, y )) i.e., if and only if A L(x, f (x, y )) i.e., if and only if A(x)[x/a][y/b] , A f (x, y ) i.e., if and only if a, f A (A(x)[x/a][y/b] , A(y )[x/a][y/b] ) LA i.e., if and only if a, f A (a, b) LA for each a, b LA for each a, b N LA for each a, b N = for each a, b [A] = N = for each a [A] = m, n N}

199

[x/a]

[x/a][y/b]

[x/a][y/b]

Chapter 4 Mathematical Logic i.e., if and only if (a, a + b) LA for each a, b N

200

But there is a value of (a, a + b) namely (0, 0) N such that (0, 0) LA . Therefore A

x yL(x, f (x, y )).

It is easy to reason that if x has no free occurrences in F ,then the value of A(F ) is independent of the value of (x). We write A A

G if and only if A(G) = . If F is a closed formula, then

F for some appropriate if and only if A

F for every appropriate . F

In this case we simply write A

F and we say A is a model for F . If A

for every A appropriate to F , then F is said to be valid. Valid formulas in predicate calculus play the same role as tautologies in propositional calculus. A closed formula is said to be satisable if it has at least one model; other wise it is unsatisable. We remark that there is no method similar to the true-table approach to check if a formula in predicate calculus is satisable.

4.10

Equivalence of Formulas in Predicate Calculus

Two given formulas F and G are equivalent if and only if, for every structure A and function appropriate to both F and G, A(F ) = A(G) .As in propositional calculus, write F G if F and G are equivalent. It should be obvious that is an equivalence relation on the set of sentences. All the laws of equivalence in propositional calculus (seen earlier)

Chapter 4 Mathematical Logic

201

such as associativity and commutativity laws for and , DeMorgans laws, the law of double negation etc. continue to hold good in predicate calculus also. In addition, we also have the following important equivalences. Lemma 4.10.1:(a) For any formula F and variable x, xF x F xF x F (b) For any formulas F and G variable x, such that x has no occurrence in G we have, (xF G) x(F G) (xF G) x(F G) (xF G) x(F G) (xF G) x(F G)

(c) For any formulas F and G and any variable x, x(F G) (xF xG) x(F G) (xF xG)

Proof. (a) We prove that xF x F . The other equivalence can be proved in a similar way. A( xF ) =

Chapter 4 Mathematical Logic if and only if A(xF ) = , i.e., if and only if A(F )[x/a] = for some a [A] i.e., if and only if A( F )[x/a] = for some a [A] i.e., if and only if A(x F ) = (b) Consider the formula (xF G) x(F G). A((xF G)) = if and only if i.e., if and only if, A(xF ) = or A(G) =

202

A(F )[x/a] = for each a [A] or A(G) =

i.e., if and only if, for each a [A] either A(F )[x/a] = or A(G)[x/a] = (since x has no free occurrences in G) i.e., if and only ifA (F G) i.e., if and only if,
[x/a]

= for each a [A]

A(F G)[x/a] = for each a [A]

i.e., if and only if, A(x(F G)) = The other equivalence may be proved along similar lines. (c) We consider the rst equivalence x(F G) (xF xG). Now A x(F G)

= for each a [A]

if and only if A((F G)[x/a] = , i.e., if and only if A(F )[x/a] = , and A(G)[x/a] = ,

for each a [A]

Chapter 4 Mathematical Logic i.e., if and only if A(F )[x/a] = , and A(G)[x/a] = , if and only if A(xF ) = , if and only if A (xF xG)

203 for each a [A] for each a [A] and A(xG) =

The other equivalence may be proved similarly.

4.11

Prenex Normal Form

A formula is in prenex form (or prenex normal form) if and only if all quantiers (if any) occur at the extreme left without intervening parentheses. The prenex form is Q1 v1 . . . Qn vn G where G is a quantier-free formula and each Qi is either or . The formula F1 = (xP x P y ) is not in prenex form. The equivalent formula F2 = x(P x P y ) is a prenex form. Given any formula that is not in prenex form, we can systematically use the previous lemma to successively move quantiers to the left (in a sequence of logically equivalent formulas) until nally a prenex formula is obtained. In this process, it may be necessary to rename variables so that no variables is both free and bound in the same formula. Thus to transform any formula into an equivalent prenex form,we carry out the following steps: Step 1 Rename the variable, if necessary, so that no variable is both free and bound and so that there is at most once occurrence of a quantier with any particular variable (we get what is called as a rectied formula).

Chapter 4 Mathematical Logic

204

Step 2 Apply the equivalences of Lemma 9.10.2 to move the quantiers to the left end. It is easy to see that a give n formula may have dierent prenex forms. Example 4.11.1: We wish to get the prenex form of the formula. ( xP (x, y ) xR(x, y )) The given formula is equivalent to the rectied formula ( xP (x, y )zR(z, y )). This formula is equivalent to ( x P (x, y ) zR(z, y )) which is of the form ( xF G) with x having no free occurrence in G, which is equivalent to x(F G). So the formula is equivalent to x( P (x, y ) zP (z, y )) x( zP (z, y ) P (x, y )) x z (P (z, y ) P (x, y )) which is in prenex form.

4.12

The Expansion Theorem

Given a formula F in predicate calculus,. it can be reduced in a systematic way to a countable set of formulas without quantiers or variables. The set of formulas can therefore be regarded as formulas in propositional calculus. That is, for each F we can generate E (F ), the collection of quantier-free formulas. E (F ) is known as th Herbrand expansion of F . The way to obtain E (F ) from F is best illustrated by considering an example. Let F = ( y xP (y, x) z w P (w, z )). Let A be a structure appropriate to F . In the Herbrand expansion, for A to be a model for F we prescribe values of existentially quantied variables corresponding to various

Chapter 4 Mathematical Logic

205

ways of substituting values for the universally quantied variables so that the matrix urns out to be true in each case. For the above case, we rst select a 1-place function sign f and a 0-place function sign a. We next replace F by its functional form, yP (y, f (y )) w P (w, a) Here f (y ) denotes a choice for the object x corresponding to y such that P (y, x) holds. For each y , there can be many possible xwe simply say that there must be at least one choice for x. Similarly a stands for some xed object such that P (w, a) holds, whatever be the value of w. To make P (y, x) true we can also take y to be the object f (a) itself and take x to be f (f (a)), and so on. The Herbrand Universe of F is the set of terms that can be formed from a and f , namely, {a, f (a), f (f (a)), . . .} It turns out that in order to test whether F is satisable, it is sucient to consider the functional form and to consider values for y and w drawn from the Herbrand universe. Moreover, it is not necessary to consider arbitrary interpretations for the function signs f and a. It suces to interpret f syntactically; that is, f is simply considered as a function from the Herbrand Universe to itself. To get the Herbrand expansion of F , we rst obtain the matrix F of the functional form of F . For the above example, F = P (y, f (y )) P (w, a) By substituting dierent values for y , we get the Herbrand expansion as

Chapter 4 Mathematical Logic follows:

206

P (a, f (a)) P (a, a) P f (a), f (f (a)) P (a, a) P (a, f (a)) P (f (a), a) . . . P f (a), f (f (a)) P (f (a), a)

= F [y/a, w/a] = F [y/f (a), w/a] = F [y/a, w/f (a)] = F [y/f (a), w/f (a)]

Let us take the formula G = ( y xP (y, x) z w P (a, w). The corresponding functional form is, yP (y, f (y )) w P (a, w). We have the Herbrand expansion as,

P (a, f (a)) P (a, a) P f (a), f (f (a)) P (a, a) P (a, f (a)) P (a, f (a)) . . .

We can see that G is unsatisable because in th Herbrand expansion both P (a, f (a)) and P (a, f (a)) must be true. We state the main result of this section without proof. Theorem 4.12.1 (The Expansion Theorem):

Chapter 4 Mathematical Logic

207

A closed formula is satisable if and only if its Herbrand expansion is satisable. The expansion theorem gives a procedure to detect unsatisable formulas in predicate calculus. A formula is unsatisable if and only if its Herbrand expansion is unsatisable. By Compactness theorem for propositional calculus, the expansion is unsatisable if and only if some nite subset of it is unsatisable. These two facts together suggest the following computational procedure for testing unsatisability. Generate the Herbrand expansion in small portions in some systematic way. At periodic intervals stop and test (using truth-tables for example) whether the portion generated is unsatisable. If the original formula is unsatisable, then the fact will be discovered at some point. However if the original formula is satisable this procedure may not halt.

Exercises
1. The Sheer stroke (called the nand operator in Computer Science) and the Pierce arrow (called the nor operator in Computer Science) are dened by the following truth tables: p q pq p q pq

Express p, p q and p q in terms of and operators.

Chapter 4 Mathematical Logic

208

2. If P = (A B ) (B C ) (A C ) then show that a DNF of P is ( A B ) ( B C ) ( A C ) . 3. Show that: (a) P (Q R) (P Q) (P R) (b) (P Q) ( P Q) (c) (P Q) Q, if P is unsatisable. 4. Show that the formula formula c (b a) . 5. Let S be a nite clause set such that |C | 2 for each C S . Show that the resolution technique provides a polynomial-time decision procedure for determining the satisability of S . 6. Prove the following equivalences: (i) x(P x xP x) xy (P x P y ) (ii) x(xP x P x) xy (P y P x) 7. Obtain the prenex normal form of x yR(x, y ) y S (x, y ) yR(x, y ) P (a b c) (c a) is equivalent to the

8. Show that the denitions 4, 5 and 6 in Denition 4.2.6 are direct consequences of the other denitions.

Chapter 5 Algebraic Structures


5.1 Introduction

In this chapter, the basic properties of the fundamental algebraic structures, namely, matrices, groups, rings, vector spaces and elds are presented. In addition, the basic properties of nite elds that are so basic to nite geometry, coding theory and cryptography are also discussed.

5.2

Matrices

A complex matrix A of type (m, n) or an m by n complex matrix is an arrangement of mn complex numbers in m rows and n columns in the form: a11 a12 . . . a21 a22 . . . A= . . . . . . . . am1 am2 . . . 209 a1n a2n . . . . amn

Chapter 5 Algebraic Structures

210

A is usually written in the shortened form A = (aij ), where 1 i m and 1 j n. If m = n, A is a square matrix of order n. aij is the (i, j )-th entry of A. All the matrices that we consider in this section are complex matrices.

5.3

Addition, Scalar Multiplication and Multiplication of Matrices

If A = (aij ) and B = (bij ) are two m by n matrices, then A + B is the m by n matrix (aij + bij ), and for a scalar (that is, a complex number) , A = (aij ). Further, if A = (aij ) and B = (bij ) is an n by p matrix, then the product AB is dened to be the m by p matrix (cij ), where, cij = ai1 b1j + ai2 b2j + + ain bnj = the scalar product of the i-th row vector Ri of A
and the j -th column vector Cj of B . Thus Cij = Ri Cj . Both Ri and Cj are vectors of length n. It is well-

known that the matrix product satises both the distributive laws and the associative laws, namely, for matrices A, B and C , A(B + C ) = AB + AC, (A + B )C = AC + BC, (AB )C = A(BC ) whenever these sums and products are dened. and

Chapter 5 Algebraic Structures

211

5.3.1

Transpose of a Matrix

If A = (aij ) is an m by n matrix, then the n by m matrix (bij ), where Bij = aji is called the transpose of A. It is denoted by At . Thus At is obtained from A by interchanging the row and column vectors of A. For instance, if 147 123 A = 4 5 6 , then At = 2 5 8 . 369 789 (i) (At ) =A,

It is easy to check that

and whenever the product AB is dened

(ii) (AB )t =B t At ,

5.3.2

Inverse of a Matrix

Let A = (aij ) be an n by n matrix, that is, a matrix of order n. Let Aij be the cofactor of aij in det A(= determinant of A). Then the matrix (Aij )t of order n is called the adjoint (or adjucate) of A, and denoted by adj A. Theorem 5.3.1: For any square matrix A of order n, A(adj A) = (adj A)A = (det A)In , where In is the identity matrix of order n. (In is a matrix of order n in which the n diagonal entries are 1 and the remaining entries are 0). Proof. By a property of determinants, we have ai1 A1j + ai2 A2j + + ain Anj = A1i a1j + A2i a2j + + Ani anj = det A or 0

Chapter 5 Algebraic Structures

212

according to whether i = j or i = j . We note that in adj A, (Aj 1 , . . . , Anj ) is the j -th column vector and (A1j , . . . , Ajn ) is the j -th row vector. Hence, actual multiplication yields det A 0 . . . 0 0 det A . . . 0 A(adj A) = (adj A)A = = (det A)In . . . . . . . . . . . . . . . . 0 0 . . . det A

Corollary 5.3.2: Let A be a nonsingular matrix that is, (det A = 0). Set A1 = Then AA1 = A1 A = In , where n is the order of A. The matrix A1 , as dened in Corollary 5.3.2, is called the inverse of the (nonsingular) matrix A. If A, B are square matrices of the same order with AB = I , then B = A1 and A = B 1 . This is seen by premultiplying the equation AB = I by A1 and postmultiplying it by B 1 . Note that A1 and B 1 exist since taking determinants of both sides of AB = I , we get det(AB ) = det A det B = det I = 1 and hence det A = 0 as well as det B = 0.
1 (det A). adj A

5.3.3

Symmetric and Skew-symmetric matrices

A matrix A is said to be symmetric i A = At . A is skew-symmetric i A = At . Hence if A = (aij ), then A is symmetric if aij = aji for all i and j ; it

Chapter 5 Algebraic Structures

213

is skew-symmetric if aij = aji for all i and j . Clearly, symmetric and skewsymmetric matrices are square matrices. If A = (aij ) is skew-symmetric, then aii = aii , and hence aii = 0 for each i. Thus in a skew-symmetric matrix, all the diagonal entries are zero.

5.3.4

Hermitian and Skew-Hermitian matrices

Let H = (hij ) denote a complex matrix. The conjugate H of H is the matrix (hij ). The conjugate-transpose of H is the matrix H = (H )t = (H t ) =
(hji ) = (h ij ). H is Hermitian i H = H ; H is skew-hermitian i H = H .

For example, the matrix H = S =


i 1+2i 1+2i 5i

1 2+3 i 23i 5

is Hermitian, while the matrix

is skew-hermitian. Note that the diagonal entries of a

skew-hermitian matrix are purely imaginary.

5.3.5

Orthogonal and Unitary matrices

A real matrix (that is, a matrix whose entries are real numbers) P of order n is called orthogonal if P P t = In . If P P t = In , then P t = P 1 . Thus the inverse of an orthogonal matrix is its transpose. Further as P 1 P = In , we also have P t P = In . If R1 , . . . , Rn are the row vectors of P , the relation P P t = In implies that Ri Rj = ij , where ij = 1 if i = j , and ij = 0 if i = j . A similar statement also applies to the column vectors of P . As an
cos sin example, the matrix ( sin cos ) is orthogonal. Indeed, if (x, y ) are cartesian

coordinates of a point P referred to a pair of rectangular axes and if (x , y ) are the coordinates of the same point P with reference to a new set of rectangular axes got by rotating the original axes through an angle about the origin,

Chapter 5 Algebraic Structures then x = x cos + y sin y = x sin + y cos , x cos sin x = y sin cos y

214

that is,

so that rotation is eected by an orthogonal matrix.

Again, if (l1 , m1 , n1 ), (l2 , m2 , n2 ) and (l3 , m3 , n3 ) are the direction cosines of three mutually orthogonal directions referred to an orthogonal coordinate system in the Euclidean 3-space, then the matrix
l1 m1 n1 l2 m2 n2 l3 m3 n3

is orthogonal.

In passing, we mention that rotation in higher dimensional Euclidean spaces is dened by means of an orthogonal matrix. A complex matrix U of order n is called unitary if U U = In . Again, this means that U U = In . Also a real unitary matrix is simply an orthogonal matrix. The unit matrix is both orthogonal as well as unitary. For example, the matrix U =
1 5 1+2i 42i 24i 2i

is unitary.

Exercises 5.2
1. If A =
3 4 1 1

, prove by induction that Ak =

1+2k 4k k 12k

for any posi-

tive integer k .
N cos sin cos n sin n 2. If M = ( = ( sin cos ), prove that M sin n cos n ), n N .

3. Compute the transpose, adjoint and inverse of the matrix

1 1 0 0 1 1 1 0 1

2 1 1 3 4. If A = ( 2 2 ), show that A 3A + 8I = 0. Hence compute A .

5. Give two matrices A and B of order 2, so that

Chapter 5 Algebraic Structures (i) AB = BA (ii) (AB )t = AB

215

6. Prove: (i) (AB )t = B t At ; (ii) If A and B are nonsingular, (AB )1 = B 1 A1 . 7. Prove that the product of two symmetric matrices is symmetric i the two matrices commute. 8. Prove: (i) (iA) = iA (ii) H is Hermitian i iH is skew-Hermitian. 9. Show that every real matrix is the unique sum of a symmetric matrix and a skew-symmetric matrix. 10. Show that every complex matrix is the unique sum of a Hermitian and a skew-Hermitian matrix.

5.4

Groups

Groups constitute an important basic algebraic structure that occur very naturally not only in mathematics but also in many other elds such as physics and chemistry. In this section, we present the basic properties of groups. In particular, we discuss abelian and nonabelian groups, cyclic groups, permutation groups and homomorphisms and isomorphisms of groups. We establish Lagranges theorem for nite groups and the basic isomorphism theorem for groups.

Chapter 5 Algebraic Structures

216

Abelian and Nonabelian Groups


Denition 5.4.1: A binary operation on a nonempty set S is a map. S S S , that is, for every ordered pair (a, b) of elements of S , there is associated a unique element a b of S . A binary system is a pair (S, ), where S is a nonempty set and is a binary operation on S . The binary system (S, ) is associative if is an associative operation on S , that is, for all a, b, c in S , (a b) c = a (b c) Denition 5.4.2: A semi group is an associative binary system. An element e of a binary system (S, ) is an identity element of S if a e = e a = a for all a S . We use the following standard notations: Z =the set of integers (positive integers, negative integers and zero) Z+ =the set of positive integers =the set of natural numbers {1, 2, . . . , n, . . .} = N Q =the set of rational numbers Q+ =the set of positive rational numbers Q =the set of nonzero rational numbers R =the set of real numbers R =the set of nonzero real numbers C =the set of complex numbers C =the set of nonzero complex numbers

Chapter 5 Algebraic Structures

217

Examples
1. (N, ) is a semigroup, where denotes the usual multiplication.

2. The operation subtraction is not a binary operation on N (for example, 35 / N). 3. (Z, ) is a binary system which is not a semigroup since the associative law is not valid in (Z, ); for instance, 10 (5 8) = (10 5) 8. We now give the denition of a group. Denition 5.4.3: A group is a binary system (G, ) such that the following axioms are satised: (G1 ): The operation is associative on G, that is, for all a, b, c G, (a b) c = a (b c). (G2 ): (Existence of identity)There exists an element e G (called an identity element of G with respect to the operation ) such that a e = e a = a for all a G. (G3 ): (Existence of inverse) To each element a G, there exists an element a1 G (called an inverse of a) with respect to the operation ) such that a a1 = a1 a = e. Before proceeding to examples of groups, we show that identity element e, and inverse element a1 of a, given in Denition 5.4.3 are unique. Suppose G has two identities e and f with respect to the operation . Then e=ef (as f is an identity of (G, ))

Chapter 5 Algebraic Structures =f (as e is an identity of (G, )).

218

Next, let b and c be two inverses of a in (G, ). Then b = b e = b (a c) = (b a) c by the associativity of =ec = c. Thus henceforth we can talk of The identity element e of the group (G, ), and The inverse element a1 of a in (G, ). If a G, then a a G; also, a a (n times) G. We denote a a (n times) by an . Further, if a, b G, a b G, and (a b)1 = b1 a1 . (Check that (a b)(a b)1 = (a b)1 (a b) = e). More generally, if a1 , a2 , . . . , an G,
1 1 1 n 1 then (a1 a2 an )1 = a = (a1 )n = n an1 aa , and hence (a )

(written as) an . Then the relation am+n = am an holds for all integers m and n, with a0 = e. In what follows, we drop the group operation in (G, ), and simply write group G, unless the operation is explicitly needed. Lemma 5.4.4: In a group, both the cancellation laws are valid, that is, if a, b, c are elements of a group G with ab = ac, then b = c (left cancellation law), and if ba = ca, then b = c (right cancellation law). Proof. If ab = ac, premultiplication by a1 gives a1 (ab) = a1 (ac). So by the associative law, (a1 a)b = (a1 a)c, and hence eb = ec. This implies that b = c. The other cancellation is proved similarly.

Chapter 5 Algebraic Structures Denition 5.4.5:

219

The order of a group G is the cardinality of G. The order of an element a of a group G is the least positive n such that an = e, the identity element of G. If no such n exists, the order of a is taken to be innity. Denition 5.4.6 (Abelian Group): A group G is called abelian (after Abel) if the group operation of G is commutative, that is, ab = ba for all a, b G. A group G is nonabelian if it is not abelian, that is, there exists a pair of elements x, y in G with xy = yx.

Examples of Abelian Groups


1. (Z, +) is an abelian group, that is, the set Z of integers is an abelian group under the usual addition operation. The identity element of this group is O, and the inverse of a is a. (Z, +) is often referred to as the additive group of integers. Similarly, (Q, +), (R, +), (C, +) are all additive abelian groups. 2. The sets Q , R and C are groups under the usual multiplication operation. 3. Let G = C[0, 1], the set of complex-valued continuous functions dened on [0, 1]. G is an abelian group under addition. Here, if f, g C[0, 1], then f + g is dened by (f + g )(X ) = f (X ) + g (X ). The zero function T is the identity element of the group while the inverse of f is f . 4. For any positive integer n, let Zn = {0, 1, . . . , n 1}. Dene addition + in

Chapter 5 Algebraic Structures

220

Zn as congruent modulo n addition, that is, if a, b Zn , then a + b = c, where c Zn and a + b c ( mod n). Then (Zn , +) is an abelian group. For instance, if n = 5, then in Z5 , 2 + 2 = 4, 2 + 3 = 0, 3 + 3 = 1 etc. 5. Let G = {r : r = rotation of the plane about the origin through an angle in the anticlockwise sense }. Then if we set r r = r+ (that is, rotation through followed by rotation through = rotation through + ), then (G, ) is a group. The identity element of (G, ) is r0 , while (r )1 = r , the rotation of the plane about the origin through an angle in the clockwise sense.

Examples of Nonabelian Groups


1. Let G = GL(n, R), the set of all n by n nonsingular matrices with real entries. Then G is an innite nonabelian group under multiplication. 2. Let G = SL(n, Z) be the set of matrices of order n with integer entries having determinant 1. G is again an innite nonabelian multiplicative group. (Note that if A SL(n, Z), then A1 = Section 5.3.2) 3. Let S4 denote the set of all 1-1 maps f : N4 N4 , where N4 = {1, 2, 3, 4}. If denotes composition of maps, then (S4 , ) is a nonabelian group of order 4! = 24. (See Section*** for more about such groups). For instance, let 123 f = 412 4 . 3
1 (adj A) det A

SL(n, Z)

since det A = 1 and all the cofactors of the entries of A are integers. (See

Chapter 5 Algebraic Structures

221

Here the parantheses notation signies the fact that the image under f of a number in the top row is the corresponding number in the bottom row. For instance, f (1) = 4, f (2) = 1 and so on. Let 123 g= 312 4 4 123 Then g f = 431 4 . 2

Note that (g f )(1) = g f (1) = g (4) = 4, while (f g )(1) = f g (1) = f (3) = 2, and hence f g = g f . In other words, S4 is a nonabelian group. The identity element of S4 123 I= 123 is the map 123 4 and f 1 = 234 4 4 . 1

5.4.1

Group Tables

The structure of a nite group G can be completely specied y means of its group table (sometimes called multiplication table ). This is formed by listing the elements of G in some order as {g1 , . . . , gn }, and forming an n by n double array (gij ), where gij = gi gj , 1 i j n. It is customary to take g1 = e, the identity element of G.

Examples Continued
9. a [Kleins 4-group K4 ] This is a group of order 4. If its elements are e, a, b, c, the group table of K4 is given by Table 5.1. It is observed from the Table that ab = ba = c, and a(ba)b = a(ab)b

Chapter 5 Algebraic Structures This gives c2 = (ab)(ab) = a(ba)b = a(ab)b = a2 b2 = ee = e. Thus every element of K4 other than e is of order 2.

222

5.5

A Group of Congruent Transformations (Also called Symmetries)

We now look at the congruent transformations of an equilateral triangle ABC . Assume without loss of generality that the side BC of the triangle is horizontal so that A is in the vertical through the middle point D of BC . Let us denote the rotation of the triangle about its centre through an angle of 120 in the anticlockwise sense and let f denote the ipping of the triangle about the vertical through the middle point of the base. f interchanges the base vertices and leaves all the points of the vertical through the third vertex unchanged. Then f r denotes the transformation r followed by f and so on. e e a b c e a b c a a e c b b b c e a c c b a e

Table 5.1: Group Table of Kleins 4-group.

Chapter 5 Algebraic Structures A r C f C

223

Thus f r leaves B xed and ips A and C in ABC . There are six congruent transformations of an equilateral triangle and they form a group as per the following group table. r3 = e r3 = e r r2 f rf r2 f e r r2 f rf r2 f r r r2 e fr f rf r2 r2 e r f r2 r2 f f f f rf r2 f e r r2 rf rf r2 f f r2 e r r2 f r2 f f rf r r2 e

Group Table of the Dihedral group D3 For instance, r2 f r and rf are obtained as follows: A r C f C r2 B

C A

B A f

A B r

Chapter 5 Algebraic Structures

224

Thus r2 f r = rf , and similarly the other products can be veried. The resulting group is known as the dihedral group D3 .

5.6

Another Group of Congruent Transformations

Let D4 denote the transformations that leave a square invariant. If r denotes a rotation of 90 about the centre of the square in the anticlockwise sense, and f denotes the ipping of the square about one of its diagonals, then the dening relations for D4 are given by: r4 = e = f 2 = (rf )2 . D4 is the dihedral group of order 2 4 = 8. Both D3 and D4 are nonabelian groups. The dihedral group Dn of order n is dened in a similar fashion as the group of congruent transformations of a regular polygon of n sides. The groups Dn , n 3, are all nonabelian. Dn is of order 2n for each n.

5.7

Subgroups

Denition 5.7.1: A subset H of a group (G, ) is a subgroup of (G, ) if (H ) is a group, under the operation of G. Denition 5.7.1 shows that the group operation of a subgroup H of G is the same as that of G.

Chapter 5 Algebraic Structures

225

Examples of Subgroups
1. Z is a subgroup of (Q, +). 2. 2Z is a subgroup of (Z, +). (Here 2Z denotes the set of even integers). 3. Q is a subgroup of (R , ). (Here denotes multiplication. 4. Let H be the subset of maps of S4 (See Example 3 of Section 5.4) that x 1, that is, H = {f S4 : f (1) = 1}. Then H is a subgroup of S4 . Note that the set N of natural numbers does not form a subgroup of (Z, +).

Subgroup Generated by an Element


Denition 5.7.2: LEt S be a nonempty subset of a group G. The subgroup generated by S in G, denoted by < S >, is the intersection of all subgroups of G containing S . Before we proceed to the properties of < S >, we need a result. Proposition 5.7.3: The intersection of any family of subgroups of G is a subgroup of G. Proof. Consider a family {G }I of subgroups of G, and let H = G . If a, b H ,then a, b G for each I , and since G is a subgroup of G, a b G for each H . Therefore ab H , and similarly a1 H and e H . The associative law holds in H , a subset of G, as it holds in G. Thus H is a subgroup of G. Corollary 5.7.4:

Chapter 5 Algebraic Structures

226

Let S be a nonempty subset of a group G. Then < S > is the smallest subgroup of G containing S . Proof. By denition, < S > is a subgroup of G containing S . If < S > is not the smallest subgroup of G containing S , then there exists a subgroup H of G such that S H < S >. But, by denition of < S >, any subgroup that contains S must contain < S >. Hence < S > H . Thus H =< S >, and so < S > is the smallest subgroup of G containing S .

5.8

Cyclic Groups

Denition 5.8.1: Let G be a group and a, an element of of G. Then the subgroup generated by a in G is {a} , that is, the subgroup generated by the singleton subset {a}. It is also denoted simply by a . By Corollary 5.7.4, a is the smallest subgroup of g containing a. As a < a >, all the powers of an , n Z, also belong to < a >. But then, as may be checked easily, the set {an : n Z} of powers of a is already a subgroup of G. Hence a = {an : n Z}. Note that a0 = e, the identity element of G, and an = (a1 )n , the inverse of an . This makes am an = am+n for all integers m and n. The subgroup A = a of G is called the subgroup generated by a, and a is called a generator of A. Now since {an : n Z} = {(a1 ) : n Z}, a1 is also a generator of a . Suppose a is of nite order m in a . This means that m is the least positive integer with the property that am = e. Then the elements

Chapter 5 Algebraic Structures

227

a1 = a, a2 , . . . , am1 , am = e are all distinct. Moreover, for for any integer m, by Euclidean algorithm (see....), there are integers q and r such that m = qn + r, Then am = a(qn+r) = (an )q ar = eq ar = ear = ar , and hence am < a >. Thus in this case, a = a, a2 , . . . , am1 , am = e In the contrary case, there exists no positive integer m such that am = e. Then all the powers ar : r N an distinct. If not, there exist integers r and s, r = s, such that ar = as . Suppose r > s. Then r s > 0 and the equation ar = as gives as ar = as as , and therefore ars = a0 = e, a contradiction. In the rst case, the cyclic group < a > is of order m while in the latter case, it is of innite order. 0 r < n, 0 r < n.

Examples of cyclic Groups


1. The additive group of integers (Z, +) is an innite cyclic group. It is generated by 1 as well as 1. 2. The group of n-th roots of unity, n 1. Let G be the set of n-th roots of unity so that G= , 2 , . . . , n = 1; = cos 2 2 + i sin n n .

Then G is a cyclic group of order n generated by , that is, G =< >. In fact k , 1 k n, also generates k i (k, n) = 1. (See****). Hence

Chapter 5 Algebraic Structures

228

the number of generators of G is (n). As G is a cyclic group of order n, it follows that for each positive integer n, there exists a cyclic group of order n. If G =< a >= {an : n Z}, then since for any two integers n and m, an am = an+m = am an , G is abelian. In other words, every cyclic group is abelian. However, the converse is not true. K4 , the Kleins 4-group (See Table 5.1 of Section 5.4) is abelian but not cyclic since K4 has no element of order 4. Theorem 5.8.2: Any subgroup of a cyclic group is cyclic. Proof. Let G =< a > be a cyclic group, and H , a subgroup of G. If H = {e}, then H is trivially cyclic. So assume that H = {e}. As the elements of G are powers of a, an H for some nonzero integer n. Then its inverse a1 also belongs to H , and of n and n at least one of them is a positive integer. Let s be the least positive integer such that as H . (recall that H = {e} as per our assumption). We claim that H =< as >, the cyclic subgroup of G generated by as . To prove this we have to show that each element of H is a power of as . Let g be any element of H . As g G, g = am for some integer m. By division algorithm, m = qs + r, 0 r < s.

Hence ar = amqs = am (as )q H as am H and as H . Thus ar H . This however implies, by the choice of s, r = 0 (otherwise ar H with 0 < r < s). Hence ar = a0 = e = am (as )q , and therefore, g = am =

Chapter 5 Algebraic Structures (as )q ,

229

q Z. Thus every element of H is a power of as and so H < as >.

Now since as H , all powers of as also H , and so < as > H . Thus H =< as > and therefore H is cyclic. Denition 5.8.3: Let S be any nonempty set. A permutation on S is a bijective mapping from S to S . Lemma 5.8.4: If 1 and 2 are permutations on S , then the map = 1 2 dened on S by 1 2 (s) = (s) = 1 (2 (s)) , sS

is also a permutation on S . is one-to-one as both 1 and 2 are one-to-one; it is onto as both 1 and 2 are onto. 2 s1 s2 S 2 2 (s1 ) 2 (s1 ) S S 1 1 1 (2 s1 ) = (s1 ) 1 (2 s2 ) = (s2 )

Proof. Indeed, we have, for s1 , s2 in S , (s1 ) = (s2 ) gives that 1 (2 s1 ) = 2 (2 s2 ). This implies, as 1 is 1 1, 2 s1 = 2 s2 . Again, as 2 is 1 1, this gives that s1 = s2 . Thus is 1 1. For a similar reason, is onto. Let B denote the set of all bijections on S . Then it is easy to verify that (B, ), where is the composition map, is a group. The identity element of this group is the identity function e on S .

Chapter 5 Algebraic Structures

230

The case when S is a nite set is of special signicance. So let S = {1, 2, . . . , n}. The set P of permutations of S forms a group under the composition operation. Any P can be conveniently represented as: 1 2 ... n . : (1) (2) . . . (n) Then 1 is just the permutation (1) (2) . . . (n) 1 : 1 2 ... n

What is the order of the group P? Clearly (1) has n choices, namely, any one of 1, 2, . . . , n. Having chosen (1), (2) has n 1 choices (as is 1 1, (1) = (2)). For a similar reason, (3) has n 2 choices and so on, and nally (n) has just one left out choice. Thus the total number of permutations on S is n (n 1) (n 2) 2 1 = n! In other words, the group P of permutations on a set of n elements is of order n! It is denoted by Sn and is called the symmetric group of degree n. Any subgroup of Sn is called a permutation group of degree n.

Example
Let S = {1, 2, 3, 4, 5}, and let and S5 be given by 12345 12345 . , = = 52413 23541 Then 12345 , = 13425

1 2 3 4 5 and 2 = = 35142

Chapter 5 Algebraic Structures Denition 5.8.5:

231

A cycle in Sn is a permutation Sn that can be represented in the form (a1 , a2 , . . . , ar ), where the ai , 1 i r, r n, are all in S , and (ai ) = ai+1 , 1 i r 1, and (ar ) = a1 , that is, each ai is mapped cyclically to the next element (or) number ai+1 and xes the remaining ai s. For example, 1324 S4 , then can be represented by (132). Here leaves 4 if = 3214 xed. 1234567 . Clearly, p is the Now consider the permutation p = 3421657 product of the cycles (1324)(56)(7) = (1324)(56). Since 7 is left xed by p, 7 is not written explicitly. In this way, every permutation on n symbols is a product of disjoint cycles. The number of symbols in a cycle is called the length of the cycle. For example, the cycle (1 3 2 4) is a cycle of length 4. A cycle of length 2 is called a transposition. For example, the permutation (1 2) is a transposition. It maps 1 to 2, and 2 to 1. Now consider the product of transpositions t1 = (13), t2 = (12) and t3 = (14). We have 1 t1 t2 t3 = (13)(12)(14) = 3 1 3 2 1 1 2 4 1 123 4 = 431 1 4 = (1423) . 2 and so on.

To see this, note that

(t1 t2 t3 )(4) = (t1 t2 )(t3 (4)) = (t1 t2 )(1) = t1 (t2 (1)) = t1 (2) = 2,

In the same way, any cycle (a1 a2 . . . an ) = (a1 an )(a1 an1 ) (a1 a2 ), a product of transpositions. Since any permutation is a product of disjoint cycles and

Chapter 5 Algebraic Structures

232

any cycle is a product of transpositions, it is clear that any permutation is a product of transpositions. Now in the expression of a cycle as a product of transpositions, the number of transpositions need not be unique. For instance, (12)(12) = identity permutation, and (1324) = (14)(12)(13) = (12)(12)(14)(12)(13). However, this number is always odd or always even. Theorem 5.8.6: Let be any permutation on n symbols. Then in whatever way is expressed as a product of transpositions, the number of transpositions is always odd or always even. Proof. Assume that is a permutation on {1, 2, . . . , n}. Let the product P = (a1 a2 )(a1 a3 ) (a1 an ) (a2 a3 ) (a2 an ) ... ...

(an1 an )

1 a1 =
1i<j n

a2 an
2 . a2 2 an

(ai aj ) = det a2 1

n1 n1 n1 a1 a2 an

Any transposition (ai aj ) applied to the product P changes P to P as this amounts to the interchange of the i-th and j -th columns of the above de-

Chapter 5 Algebraic Structures

233

terminant. Now when applied to P has a denite eect, namely, either it changes P to P or leaves P unchanged. In case changes P to P , must always be a product of an odd number of transpositions; otherwise it must be the product of an even number of transpositions. Denition 5.8.7: A permutation is odd or even according to whether it is expressible as the product of an odd number or even number of transpositions. Example 5.8.8: 123456789 Let = 451237986 = (13)(15)(12)(14)(69)(67) = a product of an even number of transpositions

Then = (14253)(679)(8)

Hence is an even permutation.

5.9

Lagranges Theorem for Finite Groups

We now establish the most famous basic theorem on nite groups, namely, Lagranges theorem. For this, we need the notion of left and right cosets of a subgroup. Denition 5.9.1:

Chapter 5 Algebraic Structures

234

Let G be a group and H , a subgroup of G. For a G, the left coset aH of a in G is the subset {ah : h H } of G. The right coset Ha of a is dened in an analogous manner. Lemma 5.9.2: Any two left cosets of a subgroup H of a group G are equipotent (that is, have he same cardinality). Moreover, they are equipotent to H . Proof. Let aH and bH be two left cosets of H in G. Consider the map : aH bH

dened by (ah) = bh, h H . is 1 1 since (ah1 ) = (ah2 ), for h1 , h2 H , implies that bh1 = bh2 and therefore h1 = h2 . Clearly, is onto. Thus is a bijection of aH onto bH . In other words, aH and bH are equipotent. SInce eH = H , H itself is a left coset of H and so all left cosets of H are equipotent to H . Lemma 5.9.3: The left coset aH is equal to H i a H . Proof. If a H , then aH = {ah : h H } H as ah H . Further if b H , then a1 b H , and so a(a1 b) H . But a(a1 b) = b. Hence b aH , and therefore aH = H . (In particular if H is a group, then multiplication of the elements of H by any element a H just gives a permutation of H .) Conversely, if aH = H , then a = ae aH , as e H . Example 5.9.4: It is not necessary that aH = Ha for all a G. For example, consider S3 ,

Chapter 5 Algebraic Structures

235

the symmetric group of degree 3. The 3! = 6 permutations of S3 are given by 1 2 3 1 2 3 1 2 3 , = (23), = (13) e= 123 132 321 S3 = 1 2 3 1 2 3 1 2 3 = (12), = (123), = (132). 213 231 312 aH = {(123)e, Ha = {e(123), so that aH = Ha. Proposition 5.9.5: The left cosets aH and bH are equal i a1 b H . Proof. If aH = bH , then there exist h1 , h2 H such that ah1 = bh2 , and
1 1 1 1 therefore a1 b = h1 h 2 H , h2 H . If a b H , let a b = h H so

Let H be the subgroup {e, (12)}. For a = (123), we have (123)(12) = (13)} , (12)(123) = (23)}

and

that b = ah and bH = (ah)H = a(hH ) = aH by Lemma ??. Lemma 5.9.6: Any two left cosets of the same subgroup of a group are either identical or disjoint. Proof. Suppose aH and bH are two left cosets of the subgroup H of a group G, where a, b G. If aH and bH are disjoint there is nothing to prove. Otherwise, aH bH = ,and therefore, there exist h1 , h2 H with ah1 = bh2 .

Chapter 5 Algebraic Structures

236

1 This however means that a1 b = h 1 h2 H . So by Proposition 5.9.5,

aH = bH . Example 5.9.7: For the subgroup H of Example 5.9.4, we have seen that (123)H = {(123), (13)}. Now (12)H = {(12)e, (12)(12)} = {(12), e} = H , and hence (123)H (12)H = . Also (23)H = {(23)e, (23)(12)} = {(23), (132)}, and (13)H = {(13)(123), (13)((13)} = {(123), e} = (12) H The last equation holds since (13)1 (123) = (13)(123) = (12) H (Refer to Proposition 5.9.5). Theorem 5.9.8: [Lagranges Theorem, after the French mathematician J. L. Lagrange]Lagrange The order of any subgroup of a nite group G divides the order of G. Proof. Let H be a subgroup of the nite group G. We want to show that o(H )| o(G). We show this by proving that the left cosets of H in G form a partition of G. First of all, if g is any element of G, g = ge gH . Hence every element of G is in some left coset of H . Now by Lemma 5.9.6, the distinct cosets of H are pairwise disjoint and hence form a partition of G. Again, by Lemma ??, all the left cosets of H have the same cardinality as H , namely, o(H ). Thus if there are l left cosets of H in G, we have l o(H ) = o(G) Consequently, o(H ) divides o(G). Denition 5.9.9: Let H be a subgroup of a group G. Then the number (may be innite) of (5.1)

Chapter 5 Algebraic Structures left cosets of H in G is called the index of H in G and denoted by iG (H )

237

If G is a nite group, then equation 5.1 in the proof of Theorem 5.9.8 shows that o(G) = o(H )iG (H ) Example 5.9.10: [An application of Lagranges theorem] If p is a prime, and n any positive integer, then n|(pn 1) First we prove a lemma. Lemma 5.9.11: If m 2 is a positive integer, and S , the set of positive integers less than m and prime to it, then S is a multiplicative group modulo m. Proof. If (a, m) = 1 and (b, m) = 1, then (ab, m) = 1. For if p is a prime factor of ab and m, then as p divides ab, p must divide either a or b, say, p|a. Then (a, m) p, a contradiction. Moreover, if ab c( mod m), 1 c < m, then (c, m) = 1. Thus ab( mod m) = c S . Also as (1, m) = 1, 1 S . Now for any a S , by Euclidean algorithm, there exists b N such that ab 1( mod m). Then (b, m) = 1 (if not, there exists a prime p with p|b and p|m, then p|1, a contradiction.) Thus a has an inverse b( mod m) in S . Thus S is a multiplicative group modulo m, and o(S ) = (m). Proof. [of 5.9.10] We apply Lemma 5.9.11 by taking m = pn 1. Let H = {1, p, p2 , . . . , pn1 }. All the numbers in H are prime to m and hence H S (as dened in Lemma 5.9.10). Further pj pnj = pn = 1( mod m). Hence

Chapter 5 Algebraic Structures

238

every element of H has an inverse modulo m. Therefore (as the other group axioms are trivially satised byH ), H is a subgroup of order n of S . By Lagranges theorem, o(H )|o(S ), and so n|(pn 1). As an application of Lagranges theorem, we have the following result. Theorem 5.9.12: Any group of prime order is cyclic.

5.10

Homomorphisms and Isomorphisms of Groups

Consider the two groups: G1 = the multiplicative group of the sixth roots of unity = { 6 = 1, , 2 , . . . , 5 : = a primitive sixty root of unity} and G2 = the additive group of Z = {0, 1, 2, . . . , 5} G1 is a multiplicative group while G2 is an additive group. However, structurewise, they are just the same. By this we mean that if we can make a suitable identication of the elements of the two groups, then they behave in the same manner. If we make the correspondence i i, we see that i j i + j as i j = i+j , when i + j is taken modulo 6. For instance, in G1 , 3 4 = 7 = 1 , while in G2 , 3 + 4 = 1 as 7 1( mod 6). The order of in G1 = 6 = the (additive) order of 1 in G2 . G1 has {1, 2 , 4 } as a subgroup while G2 has {0, 2, 4} as a subgroup and so on. It is clear that we can replace 6 by any positive integer n and a similar result holds good.

Chapter 5 Algebraic Structures

239

In the above situation, we say G1 and G2 are isomorphic groups. We now formalize the above concept. Denition 5.10.1: Let G and G be groups (distinct or not). A homomorphism from G to G1 is a map f : G G such that f (ab) = f (a)f (b). (5.2)

In Denition 5.10.1 the multiplication operation has been used to denote the group operations in G1 and G2 . If, for instance, G1 is an additive group and G2 is a multiplicative group, the equation 5.2 should be changed to f (a + b) = f (a)f (b) and so on. Denition 5.10.2: An isomorphism from a group G to a group G is bijective homomorphism from G to G , that is, it is a map f : G G which is both a bijection and a group homomorphism. It is clear that if f : G G is an isomorphism from G to G , then f 1 : G G is an isomorphism from G G. Hence if there exists a group isomorphism from G to G , we can say without any ambiguity that G and G are isomorphic groups. A similar statement cannot be made for group homomorphism. If G is isomorphic to G , we write: G G

Chapter 5 Algebraic Structures Examples

240

1. Let G = (Z, +), and G = (nZ, +). (nZ is the set got by multiplying all integers by n). The map f : G G dened by f (m) = mn, m G, is a group homomorphism from G onto G . 2. Let G = (R, +) and G = (R+ , ). The map f : G G dened by f (x) = ex , x G, is a group homomorphism from G onto G . 3. Let G = (Z, +), and G = (Z Z, +). The map f : G G dened by f (n) = (0, n), n Z is a homomorphism from G to G . 4. Let G = R2 = R R, the real plane with addition + group operation (this (x, y ) + (x , y ) = (x + x , y + y ), and PX : R2 R be dened by PX (x, y ) = x, the projection of R2 on the X-axis. PX is a homomorphism from (R2 , +) to (R, +). We remark that the homomorphism in Example 3 above is not onto while those in Examples 1, 2 and 4 are onto. The homomorphisms in Examples 1 and 2 are isomorphisms. The isomorphism in Example 1 is an isomorphism of G onto a proper subgroup of G. We now check that the map f of Example 2 is an isomomorphism. First it is a group homomorphism since f (x + y ) = ex+y = ex ey = f (x)f (y ). (Note that the group operation in G is addition while that in G is multiplication). Next we check that f is 1 1. In fact, f (x) = f (y ) gives ex = ey , and therefore exy = 1. This means, as the domain of f is R, x y = 0, and hence x = y . Finally, f is onto. If y R+ , then there exists x such that

Chapter 5 Algebraic Structures

241

ex = ey ; in fact, x = l0 ge y , is the unique preimage of y . Thus f is a 1 1, onto group homomorphism and hence it is a group isomorphism.

5.11

Properties of Homomorphisms of Groups

Let f : G G be a group homomorphism. Then f satises the following properties: Property 1 f (e) = e , that is, the image of the identity element e of G under f is the identity element e of G . Proof. For x G, the equation xe = x in G gives, as f is a group homomorphism, f (xe) = f (x)f (e) = f (x) = f (x)e in G . As G is a group, both the cancellation laws are valid in G . Hence cancellation of f (x) gives f (e) = e . Property 2 The image f (a1 ) of the inverse of an element a of G is the inverse of f (a) in G , that is, f (a1 ) = (f (a))1 . Proof. The relation aa1 = e in G gives f (aa1 ) = f (e). But by Property 1, f (e) = e , and as f is a homomorphism, f (aa1 ) = f (a)f (a1 ). Thus f (a)f (a1 ) = e in G . This implies that f (a1 ) = (f (a))1 . Property 3 The image f (G) G is a subgroup of G . In other words, the homomorphic image of a group is a group. Proof. (i) Let f (a), f (b) f (G), where a, b G. Then f (a)f (b) =

f (ab) f (G), as ab G.

Chapter 5 Algebraic Structures

242

(ii) The associative law is valid in f (G). As f (G) G and G , being a group, satises the associative law. (iii) By Property 1, the element f (e) f (G) acts as the identity element of f (G). (iv) Let f (a) f (G), a G. By Property 2, (f (a))1 = f (a1 ) f (G), as a1 G. Thus f (G) is a subgroup of G . Theorem 5.11.1: Let f : G G be a group homomorphism and K = {a G : f (a) = e }, that is K is the set of all those elements of G that are mapped by f to the identity element e of G. Then K is a subgroup of G. Proof. By Exercise***, it is enough to check that if a, b K , then ab1 K . Now f (ab1 ) = (as f is a group homomorphism) f (a)f (b1 ) = f (a) (f (b))1 = e (e )1 = e e = e = e and hence ab1 K . Thus K is a subgroup of G. Denition 5.11.2: The subgroup K dened in the statement of Theorem 5.11.1 is called the kernel of the group homomorphism f . As before, let f : G G be a group homomorphism. Property 4 For a, b G, f (a) = f (b) i ab1 K , the kernel of f .

Chapter 5 Algebraic Structures Proof. f (a) = f (b) f (a) (f (b))1 = e f (a)f (b1 ) = e f (ab1 ) = e ab1 K. Property 5 f is a 1 1 map i K = {e}. (By Property 2)

243

Proof. Let f be 1 1, a K . Then by the denition of the kernel, f (a) = e . But e = f (e), by Property 1. Thus f (a) = f (e), and this implies, as f is 1 1, that a = e. Conversely, assume that K = {e}, and let f (a) = f (b). Then, by Property 4, ab1 K . Thus ab1 = e and so a = b. Hence f is 1 1. Property 6 A group homomorphism f : G G is an isomorphism i f (G) = G and K (= the kernel of f )= {e}. Proof. f is an isomorphism i f is a 1 1, onto homomorphism. Now f is 1 1 i K = {e}, by Property 5. Further, f is onto i f (G) = G . Property 7 (Composition of homomorphisms) Let f : G G and g : G G be group homomorphisms. Then the composition map h = gof : G G is also a group homomorphism. Proof. h is a group homomorphism i h(ab) = h(a)h(b) for all a, b G. Now h(ab) = (gof )(ab) = g (f (ab))

Chapter 5 Algebraic Structures = g (f (a)f (b)) , = g (f (a)gf (b)) , = (g f ) (g f )(b) = h(a)h(b). as f is a group homomorphism as g is a group homomorphism

244

5.12

Automorphism of Groups

Denition 5.12.1: An automorphism of a group G is an isomorphism of G onto itself. Example 5.12.2: Let G = { 0 = 1, , 2 }, be the group of cube roots of unity, where =
+ i sin 23 . Let f : G G be dened by f ( ) = 2 . To make f a group cos 23

homomorphism, we have to set f ( 2 ) = f ( ) = f ( )f ( ) = 2 2 = , and f (1) = f ( 3 ) = (f ( ))3 = ( 2 )3 = ( 3 )2 ) = 13 = 1. In other words, the homomorphism f : G G is uniquely dened on G once we set f ( ) = 2 . Clearly, f is onto. Further, only 1 is mapped to 1 by f , while the other two elements and 2 are moved by f . Thus Ker f = {1}. So by Property 7, f is an isomorphism of G onto G, that is, an automorphism of G. Our next theorem shows that there is a natural way of generating at least one set of automorphisms of a group. Theorem 5.12.3: Let G be a group and a G. The map fa : G G dened by fa (x) = axa1 is an automorphism of G.

Chapter 5 Algebraic Structures Proof. First we show that fa is a homomorphism. In fact, for x, y G, fa (xy ) = a(xy )a1 by the denition of fa

245

= a(xa1 ay )a1 = (axa1 )(aya1 ) = fa (x)fa (y ). Thus fa is a group homomorphism. Next we show that fa is 1 1. Suppose for x, y G, fa (x) = fa (y ). This gives axa1 = aya1 , and so by the two cancellation laws that are valid in a group, x = y . Finally, if y G, then a1 ya G, and fa (a1 ya) = a(a1 ya)a1 = (aa1 )y (aa1 ) = eye = y , and so f is onto. Thus f is an automorphism of the group G. Denition 5.12.4: An automorphism of a group G that is a map of the form fa for some a G is called an inner automorphism of G.

5.13

Normal Subgroups

Denition 5.13.1: A subgroup N of a group G is called a normal subgroup of G (equivalently, N is normal in G) if aN a1 N for each a G. (5.3)

In other words, N is normal in G if N is left invariant by the inner automorphisms fa for each a G. We state this observation as a proposition.

Chapter 5 Algebraic Structures Proposition 5.13.2:

246

The normal subgroup of a groups G are those subgroups of G that are left invariant by all the inner automorphisms of G. Now the condition aN a1 N for each a G shows, by replacing a by a1 , a1 N (a1 )1 = a1 N a N . The latter condition is equivalent to N aN a1 for each a G. (5.4)

The conditions (5.3) and (5.4) give the following equivalent denition of a normal subgroup. Denition 5.13.3: A subgroup N of a group G is normal in G i aN a1 = N (equivalently, aN = N a) for every a G. Examples 1. Let G = S3 , the group of 3! = 6 permutations on {1, 2, 3}. Let N = {e, (123), (132)}. Then N is a normal subgroup of S3 . First of all note that N is a subgroup of G. In fact, we have (123)2 = (132), (132)2 = (123), and (123)(132) = e. Let a S3 . Hence (123)1 = (132) and (132)1 = (123). So let

If a N, then aN = N = N a (See 5.9.3).

a S \ N . Hence a = (12), (23) or (13). If a = (12), then aN a1 = {(12)e, (12)(123)(12), (12)(132)(12)} = {e, (132), (123)}. In a similar manner, we have (23)N (23) = N and (13)N (13) = N . Thus N is a normal subgroup of S3 . 2. Let H = {e, (12)} S3 . ThenH is is a subgroup of G that is not normal

Chapter 5 Algebraic Structures in S3 . In fact, if a = (23), we have aHa1 = (23) {e, (12)} (23) = {(23)e(23), (23)(12)(23)} = {e, (13)} = H Hence H is not a normal subgroup of S3 . Denition 5.13.4:

247

The centre of a group G consists of those elements of G each of which commutes with all the elements of G. It is denoted by C (G). Thus C (G) = {x G : xa = ax for each a G} For example, C (S3 ) = {e}, that is, the centre of S3 is trivial. Also, it is easy to see that the centre of an abelian group G is G itself. Clearly the trivial subgroup {e} is normal in G and G is normal in G. (Recall that aG = G for each a G). Proposition 5.13.5: The centre C (G) of a group G is a normal subgroup of G. Proof. We have for a G, aC (G)a1 = aga1 : g C (G) = (ag )a1 : g C (G) = g (aa1 ) : g C (G) = {g : g C (G)} = C (G)

Chapter 5 Algebraic Structures Theorem 5.13.6:

248

f : G G be a group homomorphism. Then H = Ken f is a normal subgroup of G. Proof. We have, for a G, aKa1 = {aka1 : h K }. Now f (aka1 ) = f (a)f (k )f (a1 ) = f (a)e f (a1 ) = f (a)f (a1 ) = f (a)(f (a))1 e . Hence aka1 K for each k K and so aKa1 Ker f = K for every a G. This implies that H is a normal subgroup of G.

5.14

Quotient Groups (or Factor Groups)

Let G be a group, and H a normal subgroup of G. Let G/H (read G modulo H ) be the set of all left cosets of H . Recall that when H is a normal subgroup of G, there is no distinction between the left coset aH and the right coset Ha of H . The fact that H is a normal subgroup of G enables us to dene a group operation in G/H . We set, for any two cosets aH and bH of H in G, aH bH = (ab)H . This denition is well-dened. By this we mean that if we take dierent representative elements instead of a and b to dene the cosets aH and bH , still we end up with the same product. To be precise, let aH = a1 H, and bH = b1 H (5.5)

Then (ab)H = (aH )(bH ) = (a1 H )(b1 H ) = (a1 b1 )H (5.6) because (5.5) implies that a1 a1 H and b1 b1 H,

Chapter 5 Algebraic Structures

249

and so (ab)1 (a1 b1 ) = b1 (a1 a1 )b1 = b1 hb1 ) (h = a1 a1 H ). (5.7) Now we apply Property ???. Thus the product of two (left) cosets of H in G is itself a left coset of H . Further for a, b, c G, (aH )(bH cH ) = (aH )(bc H ) = (abc)H = ((ab)H )cH = (aH bH )cH. Thus the binary operation dened in G/H satises the associative law. Further eH = H acts as the identity element of G/H as (aH )(eH ) = (ae)H = aH = (ea)H = eH aH Finally, the inverse of aH is a1 H since (aH )(a1 H ) = (aa1 )H = eH = H, and for a similar reason (a1 H )(aH ) = H . Thus G/H is called a group under this binary operation. G/H is called the quotient group or factor group of G modulo H . Example 5.14.1: We now present an example of a quotient group. Let G = (R2 , +), the additive group of points of the plane R2 . (If (x1 , y1 ) and (x2 , y2 ) are two points of R2 , their sum (x1 , y1 )+(x2 , y2 ) is dened as (x1 + x2 , y1 + y2 ). The identity element of this group (0, 0) and the inverse of (x, y ) is (x, y )). Let H be the subgroup: {(x, 0) : x R} = X-axis. If (a, b) is any point of R2 , then (a, b) + H = {(a, b) + (x, 0) = (a + x, b) : x R}

Chapter 5 Algebraic Structures

250

=line through (a, b) parallel to X-axis. Clearly (a, b) + H = (a , b ) + H , then ((a a), (b b) H ) =X-axis and therefore the Y-coordinate b b = 0 and so b = b. In other words, the line through (a, b) and the line through (a , b ), both parallel to the X-axis, are the same i b = b , as is expected (See Figure 5.15. For this reason, this line may be taken as (0, b) + H . Thus the cosets of H in R ar the lines parallel to X-axis and therefore the elements of the quotient group R/H are the lines parallel to the X-axis. If (a, b) + H and (a , b ) + H are two elements of R/H , we dene their sum to be (a + a , b + b ) + H = (0, b + b ) + H , the line through (0, b + b ) parallel to the X-axis. Note that (R, +) is an abelian group and so H is a normal subgroup of R2 . Hence the above sum is welldened. The above addition denes a group structured on the set of lines parallel to the X-axis, that is, the elements of R/H . The identity element of the quotient group is the X-axis = H , and the inverse of (0, b) + H is (0, b) + H . Our next result exhibits the importance of factor groups.

5.15

Basic Isomorphism Theorem for Groups

If there exists a homomorphism f from a group G onto a group G with kernel K then G/K G .

Chapter 5 Algebraic Structures Y (0, b + b ) (0, b ) (a, b) (0, b) x O (x, 0) Figure 5.15 X (a + x, b) (a + x, b )

251

Proof. We have to prove that there exists an isomorphism from the factor group G/K onto G (observe that the factor group G/K is dened, as K is a normal subgroup of G (See Theorem 5.13.6). Dene : G/ K G by (gK ) = f (G). The mapping is pictorially depicted in Figure 5.15. First we need to establish that is a well dened map. This is because, it is possible that gK = g K with g = g in G. Then as per our denition of , (gK ) = f (G), while (g K ) = f (g ). Hence our denition of will be valid only if f (G) = f (g ). Now gK = g K implies that g 1 g K . Let g 1 g = k K . Then f (k ) = e , the identity element of G . Moreover, e = f (g 1 g ) = f (g 1 )f (g ) = f (g )1 f (g ) (as f is a group homomorphism, by Property 2 holds). Thus f (g ) = f (g ), and f is well dened. We next show that is a group isomorphism. (i) is a group homomorphism: We have for g1 K, g2 K in G/K , (g1 Kg2 K ) = ((g1 g2 )K )

Chapter 5 Algebraic Structures = f (g1 g2 ), by the denition of as f is a group homomorphism

252

= f (g1 )f (g2 ),

= (g1 K )(g2 K ) Thus is a group homomorphism. (ii) is 1 1: Suppose (g1 K ) = (g2 K ), where g1 K, g2 K G/K . This gives that f (g1 ) = f (g2 ), and therefore f (g1 )f (g2 )1 = e , the
1 1 identity element of G . But f (g1 )f (g2 )1 = f (g1 )f (g2 ) = f (g1 g2 ). 1 1 Hence g1 g2 = e , and so g1 g 2 K , and consequently, g1 K = g2 K

(by Property 4). Thus is 1 1. (iii) is onto: Let g G . As f is onto, there exists g G with f (g ) = g . Now gK G/K , and (gK ) = f (g ) = g . Thus is onto and hence is an isomorphism. Let us see as to what this isomorphism means with regard to the factor group R2 /H given in Example 5.14.1. Dene f : R2 R by f (a, b) = (0, b), the projection of the point (a, b) R2 on the Y-axis= R. The identity element of the image group is the origin (0, 0). Clearly K is the set of all points (a, b) R2 that are mapped to (0, 0), that is, the set of those points of R2 whose projections on the Y-axis coincide with the origin. Thus K is the X-axis(= R). Now : G/K = R2 /R G is dened by ((a + b) + K ) = f (a, b) = (0, b). This means that all points of the line through (a, b) parallel to the X-axis are mapped to their common projection on the Y-axis, namely, the point (0, b). Thus the isomorphism between G/K and G is obtained by mapping each line parallel to the X-axis to the point where the line meets the Y-axis.

Chapter 5 Algebraic Structures

253

We now consider another example. Let G = Sn , the symmetric group of degree n, and G = {1, 1}, the multiplicative group of two elements (with multiplication dened in the usual way). 1 is the identity element of G and the inverse of 1 is 1. Dene f : G G by setting 1 if Sn is an even permutation, that is, if An f ( ) = 1 if Sn is an odd permutation.

Recall that An is a subgroup of Sn . Now |Sn | = n! and |An | = n!/2. Further, if is an odd permutation in Sn , and An , then is an odd permutation, and hence |An | = n!/2. Let Bn denote the set of odd permutations in Sn . Then Sn = An Bn , An Bn = , and An = Bn for each Bn . Thus Gn /An has exactly two distinct cosets of An , namely, An and Sn \ An = Bn . The mapping f : Sn {1, 1} dened by f () = 1 or 1 according to whether the permutation is even or odd clearly denes a group homomorphism. The kernel K of this homomorphism is An , and we have Sn \ An {1, 1}. The isomorphism is obtained by mapping the coset An to 1 or 1 according to whether is an even or an odd permutation.

5.16

Exercises

1. Let G = SL(n, C) be the set of all invertible complex matrices A of order n. IF the operation denotes matrix multiplication, show that G is a group under .
0 2. Let G denote the set o fall real matrices of the form ( a b 1 ) with a = 0.

Show that G is a group under matrix multiplication.

Chapter 5 Algebraic Structures 3. Which of the following semigroups are groups? (i). (Q, ) (ii). (R , ) (iii). (Q, +) (iv). (R , ) (v). The set of all 2 by 2 real matrices under matrix multiplication.
0 (vi). The set of all 2 by 2 real matrices of the form ( a b 1)

254

4. Prove that a nite cancellative semigroup is a group, that is, if H is a nite semigroup in which both the cancellation laws are valid (that is, ax = ay implies that x = y , and xa = ya implies that x = y , a, x, y H ), is a group. (Hint: If H = {x1 , x2 , . . . , xn }, consider aH and Ha and apply Ex. 7 above. 5. Prove that in semigroup G in which the equations ax = b and yc = d, are solvable in G, where a, b, c, d G, is a group. 6. In the group GL(2, C) of 2 2 complex nonsingular matrices, nd the order of the following elements:
0 (i) ( 1 0 i ) 1 (ii) ( 1 0 1) i 0 (iii) ( 0 i )

(iv)

2+3i 1+2i 1i 32i

7. Let G be a group, and : G G be dened by (g ) = g 1 , g G. Show that is an automorphism of G i G is abelian. 8. Prove that any group of even order has an element of order 2. (Hint: o(a) = 2 i a = a1 . Pair of such elements (a, a1 ).

Chapter 5 Algebraic Structures

255

9. Give an example of a noncyclic group each of whose proper subgroup is cyclic. 10. Show that no group can be the set union of two of its proper subgroups. 11. Show that S = {3, 5} generates the group (Z, +). 12. Give an example of an innite nonabelian group. 13. Show that the permutation T = {12345} and S = (25)(34) generate a subgroup of order 10 in S5 . 14. Find if the following permutations are odd: (i) (123)(456) (ii) (1546)(2)(3)
2 3 4 5 6 7 8 9 (iii) ( 1 2 5 4 3 1 7 6 9 8).

15. If G = {a1 , . . . , an } is a nite abelian group of order n, show that (a1 a2 . . . an )2 = e. 16. Let G = {a R : a = 1}. If the binary operation is dened in G by a b = a + b + ab for all a, b G, show that (G, ) is a group. 17. Let G = {a R : a = 1}. If the binary operation is dened on G by a b = a + b ab for all a, b G, show that (G, ) is not a group. 18. Let = (i1 i2 . . . ir ) be a cycle in Sn of length r. Show that the order of (= the order of the group generated by ) is r. 19. Prove that the centre of a group is a normal subgroup G. 20. Let , , be permutations in S4 dened by
2 3 4 = (1 1 4 3 2 ), 2 3 4 = (1 2 1 4 3 ), 2 3 4 = (1 3 1 2 4 ).

Find (i) 1 ,

(ii) 1

(iii) 1 .

Chapter 5 Algebraic Structures 21. Show that any innite cyclic group is isomorphic to (Z, +).

256

22. Show that the set {ein : n Z} forms a multiplicative group. Show that this is isomorphic to (Z, +). Is this group cyclic? 23. Find a homomorphism of the additive group of integers to itself that is not onto. 24. Give an example of a group that is isomorphic to one of its proper subgroups. 25. Prove that (Z, +) is not isomorphic to (Q, +). (Hint: Suppose an isomorphism : Z Q. Let (5) = a Q. Then b Q with 2b = a. Let x Z be the preimage of b. Then 2x = 5 in Z, which is not true.) 26. Prove that the multiplicative groups R and C are not isomorphic. 27. Give an example of an innite group in which element is of nite order. 28. Give the group table of the group S3 . From the table, nd the centre of S3 . 29. Show that if a subgroup H of a group G is generated by a subset S of G, then H is a normal subgroup i aSa1 < S > for each a G. 30. Show that a group G is abelian i the centre C (G) of G is G. 31. Let G be a group. Let [G, G] denote the subgroup of G generated by all elements of G of the form aba1 b1 (called the commutator of a and b) for all pair of elements a, b G. Show that [G, G] is a normal subgroup of G. [Hint: For c G, we have c(aba1 b1 )c1 = (cac1 )(cbc1 )(cc1 )1 [G, G]. Now apply Exercise 29.]

Chapter 5 Algebraic Structures 32. Show that a group G is abelian [G, G] = {e}.

257

Remark: [G, G] is called the commutator subgroup of G and it is for this reason, we take the subgroup generated by the commutators of G. There is no known elementary counter example. For a counter example, see for instance, Rotman, Theory of Groups. 33. Let G be the set of all roots of unity, that is, G = { C : n = 1 for some n N}. Prove that G is an abelian group that is not cyclic. 34. If A and B are normal subgroups of a group G such that A B = {e}. Then show that for a A and b B , ab = ba. 35. If H is the only subgroup of a given nite order in a group G, show that H is normal in G. 36. Show that any subgroup of a group G of index 2 is normal in G. 37. Prove that the subgroup {e, (13)} of S3 is not normal in S3 . 38. Prove that the subgroup {e, (123), (132)} is a normal subgroup of S3 .

5.17

Rings

The study of commutative rings arose as a natural abstraction of the algebraic properties of the set of integers while that of the elds arose out of the sets of rational, real and complex numbers. We begin with the denition of a ring and then proceed on to establish some of its basic properties.

Chapter 5 Algebraic Structures

258

5.17.1

Rings, Denitions and Examples

Denition 5.17.1: A ring is a set A with two binary operations, denoted by + and (called addition and multiplication respectively) satisfying the following axioms: R1 : (A, +) is an abelian group. (The identity element of (A, +) is denoted by 0). R2 : is associative, that is, a (b c) = (a b) c for all a, b, c A. R3 : For all a, b, c A, a (b + c) = a b + a c (left distribution law) (a + b) c = a c + b c (right distribution law). It is customary to write ab instead of a b. Examples of Rings 1. A = Z, the set of all integers with the usual addition + and the usual multiplication taken as . 2. A = 2Z, the set of even integers with the usual addition and multiplication. 3. A = Q, R or C with the usual addition and multiplication. 4. A = Zn = {0, 1, 2, . . . , n 1}, the set of integers modulo n, where + and denote addition and multiplication taken modulo n. (For instance, if A = Z5 , then in Z5 , 3 + 4 = 7 = 2, and 3 4 = 12 = 2 as both 7 and 12 are congruent to 2 modulo 5).

Chapter 5 Algebraic Structures

259

5. A = Z[X ], the set of polynomials in the indeterminate X with integer coecients with addition + and multiplication dened in the usual way. 6. A = Z + i 3Z = a + ib 3 : a, b Z C. Then A is a ring with the usual + and in C. 7. (Ring of Gaussian integers). Let A = Z+iZ = {a + ib : a, b Z} C. Then with the usual addition + and multiplication in C, A is a ring. 8. (A ring of functions). Let A = C [0, 1], the set of all complex-valued continuous functions on [0, 1]. For t [0, 1], and f, g A, set (f + g )(t) = f (t) + g (t), and (f g )(t) = f (t)g (t). Then it is clear that both f + g and f g are in A. It is easy to check that A is a ring. Denition 5.17.2: A ring A is called commutative if for all a, b A, ab = ba. Hence if A is a noncommutative ring, there exists a pair of elements x, y A with xy = yx. All the rings given above in Examples 1 to 8 are commutative rings. We now present an example of a noncommutative ring. Example 9: Let A = M2 (Z), the set of all 2 by 2 matrices with integers

as entries. A is a ring with the usual matrix addition as + and usual matrix multiplication as . It is a noncommutative ring since M =
11 00

,N =

00 11

are in A, but M N = N M .

Chapter 5 Algebraic Structures

260

Unity element of a ring


Denition 5.17.3: An element e of a ring A is called a unity element of A if ea = ae = a for all a A. A unity element of A, if it exists, must be unique. For, if e and f are unity elements of A, then, ef = e as f is a unity or an identity element of A, ef = f as e is a unity element of A.

Therefore e = f . Hence if a ring A has a unity element e, we can refer to it as the unity element e of A. For the rings in Examples 1, 3 and 7 above, the number 1 is the unity element. For the ring C [0, 1] of Example 8 above, the function 1 C [0, 1] dened by 1(t) = 1 for all t [0, 1] acts as the unity element. For the ring M2 (Z) of Example 9., the matrix
10 01

acts as a the unity element. A ring

may not have a unity element. For instance, the ring 2Z in Example 5.17.1 above has no unity element.

5.17.2

Units of a ring

An element a of a ring A with unit element e is called a unit in A if there exist elements b and c in A such that ab = e = ca. Proposition 5.17.4: If a is a unit in a ring A with unity element e, and if ab = ca = e, then b = c. Proof. We have, b = eb = (ca)b = c(ab) = ce = c.

Chapter 5 Algebraic Structures

261

We denote the element b(= c) described in Proposition 5.17.4 as the inverse of a and denote it by a1 . Thus if a is a unit in A, then there exists an element a1 A such that aa1 = a1 a = e. Clearly, a1 is unique. Proposition 5.17.5: The units of a ring A (with identity element) form a group under multiplication. Proof. Exercise. Units of the ring Zn Let a be a unit in the ring Zn . (See Example 5.17.1 above). Then there exists an x Zn such that ax = 1 in Zn , or equivalently, ax 1( mod n). But this implies that ax 1 = bn for some integer b. Hence (a, n) = the gcd of a and n = 1. (Because if an integer c > 1 divides both a and n, then it should divide 1). Conversely, if (a, n) = 1, by Euclidean algorithm (see Section ??), there exist integers x and y with ax + ny = 1, and therefore ax 1( mod n). This however means that a is a unit in Zn . Thus the set U of units of Zn consists precisely of those integers in Zn , that are relatively prime to n. By Denition 3.7.2, |U | is (n), where is the Euler function.

Zero Divisors
In the ring Z of integers, a is a divisor of c if there exists an integer b such that ab = c. As Z is a commutative ring, we simply say that a is a divisor of c and not a left divisor or right divisor of c. Taking c = 0, we have the following more general denition.

Chapter 5 Algebraic Structures Denition 5.17.6:

262

A left zero divisor in a ring A is a non-zero element a of A such that there exists a non-zero element b of A with ab = 0 in A. a A is a right zero divisor in A if ca = 0 for some c A, c = 0. If A is commutative ring, a left zero divisor a in A is automatically a right zero divisor in A and vice versa. In this case, we simply call a a zero divisor in A.

Examples
1. If a = 2 in Z4 , then a is a zero divisor in Z4 , as 2 2 = 4 = 0 in Z4 .
1 2. In the ring M2 (Z), the matrix [ 1 0 0 ] is a right zero divisor as

0 0 1 1 0 [0 0 1][0 0] = [0 0]

0 and [ 0 0 1 ] is not the zero matrix of M2 (Z).

3. If p is a prime, then every non-zero element of Zp is a unit. This follows from the fact that if 1 a < p, then (a, p) = 1. Hence no a Zp , a = 0 is a zero divisor in Zp . Theorem 5.17.7: The following statements are true for any ring A. (i) a0 = 0a for any a A. (ii) a(b) = (a)b = (ab) for all a, b A. (iii) (a)(b) = ab for all a, b A.

Chapter 5 Algebraic Structures Proof. Exercise.

263

5.18

Integral Domains

An integral domain is an abstraction of the algebraic structure of the ring of integers. Denition 5.18.1: An integral domain A is a commutative ring with unity element having no divisors of zero.

Examples
1. The rings Z and Z + iZ are both integral domains. 2. The ring 2Z of even integers is not an integral domain (Why?) even though it has no zero divisor. 3. Let Z Z = {(a, b) : a Z, b Z}. For (a, b), (c, d) in Z Z, dene (a, b) (c, d) = (a c, b d), and(a, b) (c, d) = (ac, bd). Clearly Z Z is a commutative ring with zero element (0, 0) and unity element (1, 1) but not an integral domain as it has zero divisors. For instance, (1, 0) (0, 1) = (0, 0).

5.19

Exercises

1. Prove that Zn , n 2, is an integral domain i n is a prime.

Chapter 5 Algebraic Structures 2. Give the proof of Proposition 5.17.5. 3. Determine the group of units of the ring (i) Z, (ii) M2 (Z), (iii) Z + iZ, (iv) Z + i 3Z.

264

4. Let A be a ring, and a, b1 , b2 , . . . , bn A. Then show that a(b1 + b2 + + bn ) = ab1 + ab2 + abn . (Hint: Apply induction on n). 5. Let A be a ring, and a, b A. Then show that for any positive integer n, n(ab) = (na)b = a(nb). (na stands for the element a + a + (n times) of A). 6. Show that no unit of a ring A can be a zero divisor in A. 7. (Denition: A subset B of a ring A is a subring of A if B is a ring with respect to the binary operations + and of A). Prove: (i) Z is a subring of Q. (ii) Q is a subring of R. (iii) R is a subring of C. 8. Prove that any ring A with identity element and cardinality p, where p is a prime, is commutative. (Hint: Verify that the elements 1, 1 + 1, . . . , 1 + 1 + + 1 (p times) are all distinct elements of A).

5.20

Fields

We now discuss the fundamental properties of elds and then go on to develop the properties of nite elds that are basic to coding theory and cryptography.

Chapter 5 Algebraic Structures

265

If rings are algebraic abstractions of the set of integers, elds are algebraic abstractions of the sets Q, R and C (as mentioned already). Denition 5.20.1: A eld is a commutative ring with unity element in which every non-zero element is a unit. Hence if F is a eld, and F = F \ {0}, the set of non-zero elements of F , then every element of F is a unit in F . Hence F is a group under the multiplication operation of F . Conversely, if F is a commutative ring with unit element and if F is a group under the multiplication operation of F , then every element of F is a unit under the multiplication operation of F , and hence F is a eld. This observation enables one to give an equivalent denition of a eld. Denition 5.20.2 (Equivalent denition): A eld is a commutative ring F with unit element in which the set F of non-zero elements is a group under the multiplication operation of F . Every eld is an integral domain. To see this all that we have to verify is that F has no zero divisors. Indeed, if ab = 0, a = 0, then a1 exists in F and so we have 0 = a1 (ab) = (a1 a)b = b in F . However, not every integral domain is a eld. For instance, the ring Z of integers is an integral domain but not a eld. (Recall that the only non-zero integers which are units are 1 and 1.)

Chapter 5 Algebraic Structures

266

5.21

Characteristic of a Field

Denition 5.21.1: A eld F is called nite if |F |, the cardinality of F , is nite; otherwise, F is an innite eld. Let F be a eld whose zero and unity elements are denoted by 0F and 1F respectively. A subeld of F is a subset F of F such that F is also a eld with the same addition and multiplication operations of F . This of course means that the zero and unity elements of F are the same as those of F . It is clear that the intersection of any family of subelds of F is again a subeld of F . Let P denote the intersection of the family of all subelds of F . Naturally, the subeld P is the smallest subeld of F . Because if P is a subeld of F that is properly contained in P , then P P P , a
=

contradiction. This smallest subeld P of F is called the prime eld of F . Necessarily, 0F P and 1F P . As 1F P , the elements 1F , 1F + 1F = 2 1F , 1F + 1F + 1F = 3 1F and, in general, n 1F , n N, all belong to P . There are then two cases to consider: Case 1: The elements n 1F , n N, are all distinct. In this case, the subeld P itself is an innite eld and therefore F is an innite eld. Case 2: The elements n 1F , n N, are not all distinct. In this case, there exist r, s N with r > s such that r 1F = s 1F , and therefore, (r s) 1F = 0, where r s is a positive integer. Hence there exists a least positive integer p such that p 1F = 0. We claim that p is a prime number. If not, p = p1 p2 , where p1 and p2 are positive integers less than p. Then 0 = p 1F = (p1 p2 ) 1F = (p1 1F )(p2 1F ) gives, as F is a eld, either

Chapter 5 Algebraic Structures

267

p1 1F = 0 or p2 1F = 0. But this contradicts the choice of p. Thus p is prime. Denition 5.21.2: The characteristic of a eld F is the least positive integer p such that p1F = 0 if such a p exists; otherwise, F is said to be of characteristic zero. A eld of characteristic zero is necessarily innite (as its prime eld already is). A nite eld is necessarily of prime characteristic. However, there are innite elds with prime characteristic. Note that if a eld F has characteristic p, then px = 0 for each x F .

Examples
(i) The elds Q, R and C are all of characteristic zero. (ii) The eld Zp of integers modulo a prime p is of characteristic p. (iii) For a eld F , denote by F [X ] the set of all polynomials in X over F , that is, polynomials whose coecients are in F . F [X ] is an integral domain and the group of units of F [X ] = F . (iv) The eld Zp (X ) of rational functions of the form a(X ) , where a(X ) b(X ) and b(X ) are polynomials in X over Zp , and b(X ) = 0, is an innite eld of (nite) characteristic p. Theorem 5.21.3: Let F be a eld of (prime) characteristic p. Then for all x, y F , (x y )p = xp y p , and (xy )p = xp y p .
n n n n n n

Chapter 5 Algebraic Structures

268

Proof. We apply induction on n. If n = 1, (by binomial theorem which is valid for any commutative ring with unit element). (x + y )p = xp + p p p 1 xy p1 + y p x y + + p1 1 p = xp + y p , since p| , 1ip1 i

(5.1)

So assume that (x + y )p = xp + y p . Then (x + y )p


n+1 n n n

= (x + y )p = xp + y p
n n

p
n

by induction assumption (by (5.1)) (5.2)

= (xp )p + (y p )p = xp
n n+1

+ yp

n+1

Next we consider (x y )p . If p = 2, then y = y and so the result is valid. If p is an odd prime, change y to y in (5.2). This gives (x y )p = xp + (y )p
n n n n

= xp + (1)p y p = xp y p , since (1)p = 1.


n n n

5.22

Vector Spaces

In Section 5.20, we discussed some of the basic properties of elds. In the present section, we look at the fundamental properties of vector spaces. We follow up this discussion with a section on nite elds.

Chapter 5 Algebraic Structures

269

While the three algebraic structuresgroups, rings and eldsare natural generalizations of integers and real numbers, the algebraic structure vector space is a natural generalization of the 3-dimensional Euclidean space. We start with the formal denition of a vector space. To dene a vector space, we need two objects: (i) a set V of vectors, and (ii) a eld F of scalars. In the case of Euclidean 3-space, V is the set of vectors, each vector being an ordered triple (x1 , x2 , x3 ) of real numbers and F = R, the eld of real numbers. The axioms for a vector space that are given in Denition 5.22.1 below are easily seen to be generalizations of the properties of R3 . Denition 5.22.1: A vector space (or linear space) V over a eld F is a nonvoid set V whose elements satisfy the following axioms: (A) V has the structure of an additive abelian group. (B) For every pair of elements and v , where F and v V , there exists an element v V called the product of v by such that (i) (v ) = ( )v for all , F and v V , and (ii) 1v = v for each v V (Here 1 is the unity element of the eld F ). (C) (i) For F , and u, v in V , (u + v ) = u + v , that is, multiplication by elements of F is distributive over addition in V . (ii) For , F and v V , ( + )v = v + v , that is multiplication of elements of V by elements of F is distributive over addition in F.

Chapter 5 Algebraic Structures

270

If F = R, V is called a real vector space; if F = C, then V is called a complex vector space. When an explicit reference to the eld F is not required, we simply say that V is a vector space (omitting the words over the eld F ). The product v, F, v V , is often referred to as scalar multiplication, being the scalar.

5.22.1

Examples of Vector Spaces

1. Let V = R3 , the set of ordered triples x1 , x2 , x3 of real numbers. Then R3 is a vector space over R (as mentioned earlier) and hence R3 is a real vector space. More generally, if V = Rn , n 1, the set of ordered n-tuples (x1 , . . . , xn ) of real numbers, then Rn is a real vector space. If x = (x1 , . . . , xn ) and y = (y1 , . . . , yn ) Rn , then x + y = (x1 + y1 , . . . , xn + yn ), and x = (x1 , . . . , xn ). The zero vector of Rn is (0, . . . , 0). Rn is known as the real ane space of dimension n. 2. Let V = Cn , the set of ordered n-tuples (z1 , . . . , zn ) of complex numbers. Then V is a vector space over R as well as over C. We note that the real vector space Cn and the complex vector space Cn are essentially dierent spaces despite the fact that the underlying set of vectors is the same in both the cases. 3. R is a vector space over Q. 4. F [X ], the ring of polynomials in X over the eld F , is a vector space over F . 5. The set of solutions of a homogeneous linear ordinary dierential equation with real coecients forms a real vector space. The reason is that

Chapter 5 Algebraic Structures any such dierential equation has the form dn y dn1 y + C + + Cn1 y = 0. 1 dxn dxn1

271

(1)

Clearly, if y1 (x) and y2 (x) are two solutions of the dierential equation (1), then so is y (x) = 1 y1 (x) + 2 y2 (x), 1 , 2 R. It is now easy to verify that the axioms of a vector space are satised.

5.23

Subspaces

The notion of a subspace of a vector space is something very similar to the notions of a subgroup, subring and a subeld. Denition 5.23.1: A subspace W of a vector space V over F is a subset W of V such that W is also a vector space over F with addition and scalar multiplication as dened for V . Proposition 5.23.2: A non-void subset W of a vector space V is a subspace of V i for all u, v W and , F , u + v W Proof. If W is a subspace of V , then, as W is a vector space over F (with the same addition and scalar multiplication as in V ), u W and v W , and therefore u + v W . Conversely, if the condition holds, then it means that W is an additive subgroup of V (. . .). Moreover, taking = 0, we see that for each F ,

Chapter 5 Algebraic Structures

272

u W , u W . As W V , all the axioms of a vector space are satised by W and hence W is a subspace of V .

An example of a subspace
Let W = {(a, b, 0) : a, b R}. Then W is a subspace of R3 . (To see this, apply Denition 5.23.1). Geometrically, this means that the xy -plane of R3 (that is, the set of points of R3 with the z -coordinate zero) is a subspace of R3 . Proposition 5.23.3: If W1 and W2 are subspaces of a vector space V , then W1 W2 is also a subspace of V . More generally, the intersection of any family of subspaces of a vector space V is also a subspace of V . Proof. Let u, v W = W1 W2 , and , F . Then u + v belongs to W1 as well as to W2 by Proposition 5.23.2, and therefore to W . Hence W is a subspace of V again by Proposition 5.23.2. The general case is similar.

5.24

Spanning Sets

Denition 5.24.1: Let S be a subset of a vector space V over F . By the subspace spanned by S , denoted by < S >, we mean the smallest subspace of V that contains S . If < S >= V , we call S a spanning set of V . Clearly, there is at least one subspace of V containing S , namely, V . Let

Chapter 5 Algebraic Structures

273

S denote the collection of all subspaces of V containing S . Then W is also a subspace of V containing S . Clearly, it is the smallest subspace of V containing S , and hence < S >= W .
W S W S

Example 5.24.2: We shall determine the smallest subspace of R3 containing the vectors (1, 2, 1) and (2, 3, 4). Clearly, W must contain the subspace spanned by (1, 2, 1), that is, the line joining the origin (0, 0, 0) and (1, 2, 1). Similarly, W must also contain the line joining (0, 0, 0) and (2, 3, 4). These two distinct lines meet at the origin and hence dene a unique plane through the origin, and this is the subspace spanned by the two vectors (1, 2, 1) and (2, 3, 4). (See Proposition 5.24.3 below. Proposition 5.24.3: Let S be a subset of a vector space V over F . Then < S >= L(S ), where L(S ) = 1 s1 + 2 s2 + + r sr : si S, 1 i r and i F, 1 i r, r N = set of all nite linear combinations of vectors of S over F . Proof. First, it is easy to check that L(S ) is a subspace of V . In fact, let u, v L(S ) so that u = 1 s1 + + r sr , v = 1 s1 + + t st where si S , i F for each i and j F for each j . Hence if , F , then u + v = (1 )s1 + + (r )sr + (1 )s1 + + (t )st L(S ). and

Chapter 5 Algebraic Structures

274

Hence by Proposition 5.23.2, L(S ) is a subspace of V . Further 1s = s L(S ) for each s S , and hence L(S ) contains S . But by denition, < S > is the smallest subspace of V containing S . Hence < S > L(S ). Now, let W be any subspace of V containing S . Then any linear combination of vectors of S is a vector of W , and hence L(S ) W . In other words, any subspace of V that contains the set S must contain the subspace L(S ). Once again, as < S > is the smallest subspace of V containing S , L(S ) < S >. Thus < S > = L(S ). Note : If S = {u1 , . . . , un } is a nite set, then < S > = < u1 , . . . , un >= subspace of linear combinations of u1 , . . . , un over F . In this case, we say that S generates the subspace < S > or S is a set of generators for < S >. Also L(S ) is called the linear span of S in V . Proposition 5.24.4: Let u1 , . . . , un and v be vectors of a vector space V . Suppose that v < u1 , u2 , . . . , un >. Then < u1 , . . . , un > = < u1 , . . . , un ; v >. Proof. Any element 1 u1 + + n un , i F for each i, of < u1 , . . . , un > can be rewritten as 1 u1 + + n un + 0 v and hence belongs to < u1 , . . . , un ; v >. Thus < u1 , . . . , un > < u1 , . . . , un ; v > Conversely, if w = 1 u1 + + n un + v < u1 , . . . , un ; v >,

Chapter 5 Algebraic Structures then as v < u1 , . . . , un >, v = 1 u1 + + n un , i F and therefore w = (1 u1 + + n un ) + (1 u1 + + n un )


n

275

=
i=1

(i + i ) ui < u1 , . . . , un > .

Thus < u1 , . . . , un ; v >< u1 , . . . , un > and therefore < u1 , . . . , un > = < u1 , . . . , un ; v >

Corollary 5.24.5: If S is any nonempty subset of a vector space V , and v < S >, then < S {v } > = < S >. Proof. v < S > implies, by virtue of Proposition 5.24.3, v is a linear combination of a nite set of vectors in S . Now the rest of the proof is as in the proof of Proposition 5.24.4.

5.25

Linear Independence and Base

Let V = R3 , and e1 = (1, 0, 0), e2 = (0, 1, 0) and e3 = (0, 0, 1) in R3 . If v = (x, y, z ) is any vector of R3 , then v = xe1 + ye2 + ze3 . Trivially, this is the only way to express v as a linear combination of e1 , e2 , e3 . For this reason, we call {e1 , e2 , e3 } a base for R3 . We now formalize these notions. Denition 5.25.1: (i) A nite subset S = {v1 , . . . , vn } of vectors of a vector space V over a eld F is said to be linearly independent if the equation 1 v1 + 2 v2 + + n vn = 0, i F

Chapter 5 Algebraic Structures implies that i = 0 for each i.

276

In other words, a linearly independent set of vectors admits only the trivial linear combination between them, namely, 0 v1 + 0 v2 + + 0 vn = 0 In this case we also say that the vectors v1 , . . . , vn are linearly independent over F . In the above equation, the zero on the right refers to the zero vector of V while the zeros on the left refer to the scalar zero, that is, the zero element of F . (ii) An innite subset S of V is linearly independent in V if every nite subset of vectors of S is linearly independent. (iii) A subset S of V is linearly dependent over F if it is not linearly independent over F . This means that there exists a nite subset {v1 , . . . , vn } of S and a set of scalars 1 , . . . , n , not all zero, in F such that 1 v1 + + n vn = 0. If {v1 , . . . , vn } is linearly independent over F , we also note that the vectors v1 , . . . , vn are linearly independent over F . Remark 5.25.2: (i) The zero vector of V forms a linearly dependent set since it satises the nontrivial equation 1 0 = 0, where 1 F and 0V. (ii) Two vectors V are linearly dependent over F i one of them is a scalar multiple of the other.

Chapter 5 Algebraic Structures

277

(iii) If v V and v = 0, then {v } is linearly independent (since for F, v = 0 implies that = 0). (iv) The empty set is always taken to be linearly independent. Proposition 5.25.3: Any subset T of a linearly independent set S of a vector space is linearly independent. Proof. First assume that S is a nite subset of V . We can take T =

{v1 , . . . , vr } and S = {v1 , . . . , vr ; vr+1 , . . . , vn }, n r. The relation 1 v1 + r vr = 0, i F is equivalent to the condition that (1 v1 + r vr ) + (0 vr+1 + 0 vn ) = 0. But this implies, as S is linearly independent over F , i = 0, 1 i n. Hence T is linearly independent. If S is an innite set and T S , then any nite subset of T is a nite subset of S and hence linearly independent. Hence T is linearly independent over F . A restatement of Proposition 5.25.3 is that any superset ( V ) of a linearly dependent subset of V is linearly dependent. Corollary 5.25.4: If v L(S ), then S {v } is linearly dependent.

Chapter 5 Algebraic Structures

278

Proof. By hypothesis, there exist v1 , . . . , vn in S , and 1 , . . . , n F such that 1 v1 + n vn + (1)v = 0. Hence {v1 , . . . , vn ; v } is linearly dependent and so by Proposition 5.25.3, S {v } is linearly dependent.

Examples
1. C is a vector space over R. The vectors 1 and i of C are linearly independent over R. In fact, if , R, then 1+i=0 gives that + i = 0, and therefore = 0 = . One can check that {1+ i, 1 i} is also linearly independent over R, while {2+ i, 1+ i, 1 i} is linearly dependent over R. The last assertion follows from the fact that if u = 2 + i, v = 1 + i and w = 1 i, then u + w = 3 and v + w = 2 so that 2(u + w) = 3(v + w) and giving 2u 3v w = 0. 2. The innite set of polynomials S = {1, X, X 2 , . . .} in the vector space R[X ] of polynomials in X with real coecients is linearly independent. Recall that an innite set S is linearly independent i every nite subset of S is linearly independent. So consider a nite subset {X i1 , X i2 , . . . , X in } of S . The equation 1 X i1 + 2 X i2 + + n X in = 0 (A)

Chapter 5 Algebraic Structures

279

where the scalars i , 1 i n, all belong to R, implies that the polynomial on the left side of equation (A) is the zero polynomial and hence must be zero for every real value of X . In other words, every real number is a zero (root) of this polynomial. This is possible only if each i is zero. Hence the set S is linearly independent over R.

5.26

Bases of a Vector Space

Denition 5.26.1: A basis (or base) of a vector space V over a eld F is a subset B of V such that (i) B is linearly independent over F , and (ii) B spans V ; in symbols, < B >= V . Condition (ii) implies that every vector of V is a linear combination of (a nite number of) vectors of B while condition (i) ensures that the expression is unique. Indeed, if u = 1 u1 + + n un = 1 u1 + + n un , where the ui s are all in B , then (1 1 )u1 + +(n n )un = 0. The linear independence of the vectors u1 , . . . , un , mean i i = 0, that is, i = i for each i. Notice that we have taken the same u1 , . . . , un in both the expressions as we can always add terms with zero coecients. For example, if u = 1 u1 + 2 u2 = 1 u1 + 2 u2 , u = 1 u1 + 2 u2 + 0 u1 + 0 u2 = 0 u1 + 0 u2 + u1 + u2 . then

Chapter 5 Algebraic Structures Example 5.26.2:

280

The vectors e1 = (1, 0, 0), e2 = (0, 1, 0) and e3 = (0, 0, 1) form a basis for R3 . This follows from the following two facts. 1. {e1 , e2 , e3 } is linearly independent in R3 . In fact, ! e1 + 2 e2 + 3 e3 = 0, i R, implies that (1 , 2 , 3 ) = 1 (1, 0, 0) + 2 (0, 1, 0) + 3 (0, 0, 1) = (0, 0, 0), and hence i = 0, 1 i 3. 2. < e1 , e2 , e3 >= R3 . To see this, any vector of < e1 , e2 , e3 >= 1 e1 + 2 e2 + 3 e3 = (1 , 2 , 3 ) R3 and, conversely, any (1 , 2 , 3 ) in R3 is 1 e1 + 2 e2 + 3 e3 and hence belongs to < e1 , e2 , e3 >.

5.27

Dimension of a Vector Space

Denition 5.27.1: By a nite-dimensional vector space, we mean a vector space that can be generated (or spanned) by a nite number of vectors in it. Our immediate goal is to establish that any nite-dimensional vector space has a nite basis and that any two bases of a nite-dimensional vector space have the same number of elements. Lemma 5.27.2: No nite-dimensional vector space can have an innite basis. Proof. Let V be a nite-dimensional vector space with a nite spanning set S = {v1 , . . . , vn }. Suppose to the contrary V has an innite basis B . Then,

Chapter 5 Algebraic Structures

281

as B is a basis, vi is a linear combination of a nite subset Bi , 1 i n of B . Let B = Bi . Then B is also a nite subset of B . As B B , B is
i=1 n

linearly independent and further, as each v V is a linear combination of v1 , . . . , vn , v is also a linear combination of the vectors of B . Hence B is also a basis for V . If x B |B , then x L(B ) and so B {x} is a linearly dependent subset of the linearly independent set B , a contradiction. Lemma 5.27.3: A nite sequence {v1 , . . . , vn } of non-zero vectors of a vector space V is linearly dependent i for some k , 2 k n, vk is a linear combination of its preceding vectors. Proof. In one direction, the proof is trivial; if vk < v1 , . . . , vk1 >, then by Proposition 5.25.3, {v1 , . . . vk1 ; vk } is linearly dependent and so is its superset {v1 , . . . , vn }. Conversely, assume that {v1 , v2 , . . . , vn } is linearly dependent. As v1 is a non-zero vector, {v1 } is linearly independent (See (iii) of Remark 5.25.2). Hence there must exist a k, 2 k n such that {v1 , . . . , vk1 } is linearly independent while {v1 , . . . , vk } is linearly dependent since at worst k can be n. Hence there exists a set of scalars 1 , . . . , k , not all zero, such that 1 v1 + + k vk = 0. Now k = 0; for if k = 0, there exists a nontrivial linear relation connecting v1 , . . . , vk1 contradicting the fact that {v1 , . . . , vk1 } is linearly independent. Thus
1 1 vk = k 1 v1 k k1 vk1 .

Chapter 5 Algebraic Structures

282

Lemma 5.27.3 implies, by Proposition 5.24.4, that under the stated conditions on vk , < v1 , . . . , vk , . . . , vn >=< v1 , . . . , vk1 , vk+1 , . . . , vn >=< v1 , . . . , vk , . . . , vn >, where the

symbol upon vk indicates that the vector vk should be deleted.

We next prove a very important property of nite-dimensional vector spaces. Theorem 5.27.4: Any nite-dimensional vector space has a basis. Moreover, any two bases of a nite-dimensional vector space have the same number of elements. Proof. Let V be a nite-dimensional vector space. By Lemma 5.27.2, every basis of V is nite. Let S = {u1 , . . . , um }, and T = {v1 , . . . , vn } be any two bases of V . We want to prove that m = n. Now v1 V =< u1 , . . . , um >. Hence the set S1 = {v1 ; u1 , . . . , um } is linearly dependent. By Lemma 5.27.3, there exists a vector ui1 {u1 , . . . , um } such that < v1 ; u1 , . . . , um >=< v1 ; u1 , . . . ui1 , . . . , um > . Now consider the set of vectors S2 = {v2 , v1 ; u1 , . . . , , ui1 , . . . , um } = {v2 , v1 ; u1 , . . . um }/{ui1 }. As v2 V = < v1 ; u1 , . . . , ui1 , . . . , um >, there exists a vector ui2 {u1 , . . . , ui1 , . . . , um } such that ui2 is a linear combination of the vectors preceding it in the sequence S2 . (Such a vector cannot be a vector of T as every subset of T is linearly independent). Hence if S3 = {v1 , v2 ; u1 , . . . , ui1 , . . . ui2 , . . . , um }

Chapter 5 Algebraic Structures = {v1 , v2 ; u1 , . . . , ui1 , . . . ui2 , . . . , um }/ ui1 , ui2 , < S3 > = V.

283

Thus every time we introduce a vector from T , we are in a position to delete a vector from S . Hence |T | |S |, that is, n m. Interchanging the roles of the bases S and T , we see, by a similar argument, that m n. Thus m = n. Note that we have actually shown that any nite spanning subset of a nite-dimensional vector space V does indeed contain a nite basis of V . Theorem 5.27.4 makes the following denition unambiguous. Denition 5.27.5: The dimension of a nite-dimensional vector space is the number of elements in any one of its bases. If V is of dimension n over F , we write dimF V = n or, simply, dim V = n, when F is known.

Examples
1. Rn is of dimension n over R. In fact, it is easy to check that the set of vectors S = {e1 = (1, 0, . . . , 0), e2 = (0, 1, 0, . . . , 0), . . . , en = (0, . . . , 0, 1)} is a basis for Rn over R. 2. Cn is of dimension n over C.

Chapter 5 Algebraic Structures

284

3. Cn is of dimension 2n over R. In fact, if ek Cn has 1 in the k -th position and 0 in the remaining positions, and fk Cn has i = 1 in the k -th position and 0 in the remaining positions, then S = {e1 , . . . , en ; f1 , . . . fn } forms a basis of Cn over R. (Verify!) 4. Let Pn (X ) denote the set of real polynomials in X with real coecients of degrees not exceeding n. Then B = {1, X, X 2 , . . . , X n } is a basis for Pn (X ). Hence dimR Pn (X ) = n + 1. 5. The vector space R over Q is innite-dimensional. This can be seen as follows: Suppose dimQ R = n (nite). Then R has a basis {v1 , . . . , vn } over Q. Hence R = {1 v1 + + n vn : i Q for each i} But as Q is countable, the number of such linear combinations is countable (See Section ??). This is a contradiction as R is uncountable. Thus R is innite-dimensional over Q. Proposition 5.27.6: Any maximal linearly independent subset of a nite-dimensional vector space V is a basis for V . Proof. Let B be a maximal linearly independent subset of V , that is, B is not a proper subset of B , where B is linearly independent in V . Suppose B is not a basis for V . This means, by the denition of a basis for a vector space, that there exists a vector x in V such that x < B >. Then B {x}

Chapter 5 Algebraic Structures

285

must be a linearly independent subset of V ; for, suppose B {x} is linearly dependent. Then there exist v1 , . . . , vn in B satisfying a nontrivial relation 1 v1 + + n vn + x = 0, i F, for each i and F .

Now, = 0, as otherwise, {v1 , . . . , vn } would be linearly dependent over F . Thus x = 1 (1 v1 + + n vn ) < B >, a contradiction. Thus B {x} is linearly independent. But then the fact that B {x} B violates the maximality of B. Hence B must be a basis for V .

5.28

Exercises

1. Show that Z is not a vector space over Q. 2. If n N, show that the set of all real polynomials of degree n does not form a vector space over R (under usual addition and scalar multiplication of polynomials). 3. Which of the following are vector spaces over R? (i) V1 = {x, y, z R3 such that y + z = 0}. (ii) V1 = {x, y, z R3 such that y + z = 1}. (iii) V1 = {x, y, z R3 such that y 0}. (iv) V1 = {x, y, z R3 such that z = 0}. 4. Show that the dimension of the vector space of all m by n real matrices over R is mn. [Hint: For m = 2, n = 3, the matrices 100 010 001 000 000 000 000 , 000 , 000 , 100 , 010 , 001 form a basis for the space of all 2 by 3 real matrices. Verify this rst].

Chapter 5 Algebraic Structures

286

5. Prove that a subspace of a nite-dimensional vector space is nitedimensional. 6. Show that the vector space of real polynomials in X is innite-dimensional over R. 7. Find a basis and the dimension of the subspace of R4 spanned by the vectors u1 = (1, 2, 2, 0), u2 = (2, 4, 0, 1) and u3 = (4, 8, 4, 1). 8. Find the dimension of the subspace of R3 spanned by v1 = (2, 3, 7), v2 = (1, 0, 1), v3 = (1, 1, 2) and v4 = (0, 1, 3). [Hint: v1 = 3v3 v2 and v4 = v3 v2 .] 9. State with reasons whether each of the following statements is true or false: (i) A vector space V can have two disjoint subspaces. (ii) Every vector space of dimension n has a subspace of dimension m for each m n. (iii) A two-dimensional vector space has exactly three subspaces. (iv) In a vector space, any two generating (that is, spanning) subsets are disjoint. (v) If n vectors of a vector space V span a subspace U of V , then dim U = n.

Chapter 5 Algebraic Structures

287

5.29

Solutions of Linear Equations and Rank of a Matrix

Let

be an m by n matrix over a eld F . To be precise, we take F = R, the eld of real numbers. Let R1 , R2 , . . . , Rm be the row vectors and C1 , C2 , . . . , Cn the column vectors of A. Then each Ri Rn and each Cj Rm . The row space of A is the subspace < R1 , . . . , Rm > of Rn , and its dimension is the row rank of A. Clearly, (row rank of A) m since any m vectors of a vector space span a subspace of dimension at most m. The column space of A and the column rank of A ( n) are dened in an analogous manner. We now consider three elementary row transformations (or operations) dened on the row vectors of A: (i) Rij interchange of the i-th and j -th rows of A. (ii) kRi multiplication of the i-th row vector Ri of A by a non-zero scalar (real number) k . (iii) Ri + cRj addition to the i-th row of A, c times the j -th row of A, c being a scalar. The elementary column transformations are dened in an analogous manner. The inverse of each of these three transformations is again a transformation of the same type. For instance, the inverse of Ri + cRj is got by additing to the new i-th row, c times Rj .

a11 a12 . . . a1n a21 a22 . . . a2n A= . . . . . . . . . am1 am2 . . . amn

Chapter 5 Algebraic Structures

288

Let A be a matrix obtained by applying a nite sequence of elementary row transformations to a matrix A. Then the row space of A = row space of A , and hence, row rank of A = dim( row space of A) = dim( row space of A ) = row rank of A . Now a matrix A is said to be in row-reduced echelon form if: (i) The leading non-zero entry of any non-zero row (if any) of A is 1. (ii) The leading 1s in the non-zero rows of A occur in increasing order of their columns. (iii) Each column of A containing a leading 1 of a row of A has all its other entries zero. (iv) The non-zero rows of A precede its zero rows, if any. Now let D be a square matrix of order n. The three elementary row (respectively column) operations considered above do not change the singular or nonsingular nature of D. In other words, if D is a row-reduced echelon form of D, then D is singular i D is singular. In particular, D is nonsingular i D = In , the identity matrix of order n. Hence if a row-reduced echelon form A of a matrix A has r non-zero rows, the maximum order of a nonsingular square submatrix of A is r. This number is called the rank of A. Denition 5.29.1: The rank of a matrix A is the maximum order of a nonsingular square submatrix of A. Equivalently, it is the maximum order of a nonvanishing determinant minor of A.

Chapter 5 Algebraic Structures Example 5.29.2: Find the row-reduced echelon form of 12 2 1 A = 3 3 66 3 1 2 4 1 4 3 . 6

289

As the leading entry of R1 is 1, we perform the operations R2 2R1 ; R3 3R1 ; R4 6R1 (where Ri stands for the i-th row of A). This gives 1 2 3 1 0 3 7 6 A1 = 0 3 7 6 . 0 6 14 12
1 1 R2 (that is, replace R2 by 3 R2 ). This gives Next perform 3 1 2 3 1 0 1 7/3 2 A1 = 0 3 7 6 . 0 6 14 12

Now perform R1 2R2 (that is, replace R1 by R1 2R2 etc.); R3 + 3R2 ; R4 + 6R2 . This gives the matrix 1 0 A2 = 0 0 0 5/3 1 7/3 0 0 0 0 3 2 . 0 0

A2 is the row-reduced echelon form of A. Note that A2 is uniquely determined by A. Since the maximum order of a non-singular submatrix of A2 is 2, rank of A = 2. Moreover, row space of A2 =< R1 , R2 (of A2 ) >. Clearly R1 and R2 are linearly independent over R since for 1 , 2 R, 1 R1 + 2 R2 = (1 , 2 , 51 /3 + 72 /3, 31 22 ) = 0 = (0, 0, 0, 0) implies that 1 = 0 = 2 . Thus the row rank of A is 2 and therefore the column rank of A is also 2. Remark 5.29.3: Since the last three rows of A1 are proportional (that is, one row is a multiple

Chapter 5 Algebraic Structures

290

of the other two), any 3 3 submatrix of A2 will be singular. Since A1 has a


2 nonsingular submatrix of order 2, (for example, ( 1 0 3 ] ), A1 is of rank 2 and

we can conclude that A is also of rank 2.

5.30

Solutions of Linear Equations

Consider the system of linear homogeneous equations X1 + 2X2 + 3X3 X4 = 0 2X1 + X2 X3 + 4X4 = 0 3X1 + 3X2 + 2X3 + 3X4 = 0 6X1 + 6X2 + 4X3 + 6X4 = 0 . These equations are called homogeneous because if (X1 , X2 , X3 , X4 ) is a solution of these equations, then so is (kX1 , kX2 , kX3 , kX4 ) for any scalar k . Trivially (0, 0, 0, 0) is a solution. We express these equations in the matrix form AX = 0, where 1 2 A = 3 6 2 1 3 6 3 1 2 4 1 4 3 , 6 X 1 X2 X= X , 3 X4 0 0 and 0 = 0 . 0 (5.4) (5.3)

If X1 and X2 are any two solutions of (5.4), then so is aX1 + bX2 for scalars a and b since A(aX1 + bX2 ) = a(AX1 ) + b(AX2 ) = a 0 + b 0 = 0. Thus the set of solutions of (5.4) is (as X Rn ) a vector subspace of Rn , where n = the number of indeterminates in the equations (5.3). It is clear that the three elementary row operations performed on a system of homogeneous linear equations do not alter the set of solutions of the

Chapter 5 Algebraic Structures

291

equations. Hence if A is the row reduced echelon form of A, the solution sets of AX = 0 and A X = 0 are the same. In the Example 5.29.2, A = A2 . Hence the equations A X = 0 are: X1 5 X3 + 3X4 = 0, 3 7 X2 + X3 2X4 = 0 3 5 X1 = X3 3X4 3 7 X2 = X3 + 2X4 3 so that X 5/3 3 1 X 2 7 / 3 2 X = X = X3 1 + X4 0 . 3 1 X4 0 and
3 2 0 1

and

These give

Thus the space of solutions of AX = 0 is spanned by the two linearly independent vectors
5/3 7/3 1 0

and hence is of dimension 2. This number

2 corresponds to the fact that X1 , X2 , X3 , and X4 are all expressible in terms of X3 and X4 . Here X1 and X2 correspond to the identity submatrix of order 2 of A , that is, the rank of A. Also, dimension 2 = 4 2 = (number of indeterminates) (rank of A). The general case is clearly similar where the system of equations is given by ai1 X1 + + ain Xn = 0, 1 i m. These are given by the matrix equation AX = 0, where A is the m by n matrix
X1

(aij ) of coecients and X = Theorem 5.30.1:

Xn

. . .

; we state it as a theorem.

The solution space of a system of homogeneous linear equations is of dimension n r, where n is the number of unknowns and r is the rank of the matrix A of coecients.

Chapter 5 Algebraic Structures

292

5.31

Solutions of Nonhomogeneous Linear Equations

A system of nonhomogeneous linear equations is of the form a11 X1 + a12 X2 + + a1n Xn = b1 . . . . . . . . . . . .

am1 X1 + am2 X2 + + amn Xn = bm . These m equations are equivalent to the single matrix equation AX = B, where A = (aij ) is an m by n matrix and B is a non-zero column vector of length m. It is possible that such a system of equations has no solution at all. For example, consider the system of equations X1 X2 + X3 = 2 X1 + X2 X3 = 0 3X1 = 6.

From the last equation, we get X1 = 2. This, when substituted in the rst two equations, yields X2 + X3 = 0, X2 X3 = 2 which are mutually contradictory. Such equations are called inconsistent equations. When are the equations represented by AX = B consistent? Theorem 5.31.1: The equations AX = B are consistent if and only if B belongs to the column space of A.

Chapter 5 Algebraic Structures


1

293 . . . such

Proof. The equations are consistent i there exists a vector X0 =

that AX0 = B . But this happens i 1 C1 + + n Cn = B , where C1 , . . . , Cn are the column vectors of A, that is, i B belongs to the column space of A. Corollary 5.31.2: The equations represented by AX = B are consistent i rank of A = rank of (A, B ). [(A, B ) denotes the matrix obtained from A by adding one more column vector B at the end. It is called the matrix augmented by B ]. Proof. By the above theorem, the equations are consistent i B < C1 , . . . , Cn >. But this is the case, by Proposition 5.24.4, i < C1 , . . . , Cn >=< C1 , . . . , Cn , B >. The latter condition is equivalent to, column rank of A = column rank of (A, B ) and consequently to rank of A = rank of (A, B ). We now ask the natural question. If the system AX = B is consistent, how to solve it? Theorem 5.31.3: Let X0 be any particular solution of the equation AX = B . Then, the set of all solutions of AX = B is given by {X0 + U }, where U varies over the set of solutions of the auxilliary equation AX = 0. Proof. If AU = 0, then A(X0 + U ) = AX0 + AU = B +0 = B , so that X0 + U is a solution of AX = B . Conversely, let X1 be an arbitrary solution of AX = B , so that AX1 = B .

Chapter 5 Algebraic Structures

294

Then AX0 = AX1 = B gives that A(X1 X0 ) = 0. Setting X1 X0 = U , we get, X1 = X0 + U , and AU = 0. As in the case of homogeneous linear equations, the set of solutions of AX = B remains unchanged when we perform any nite number of elementary row transformations on A, provided we take care to perform simultaneously the same operations on the matrix B on the right. As before, we row-reduce A to its echelon form A0 and the equation AX = B gets transformed into its equivalent equation A X0 = B0 .

5.32

LUP Decomposition

Denition 5.32.1: By an LUP decomposition of a square matrix A we mean an equation of the form P A = LU (5.5)

where P is a permutation matrix, L, a unit-triangular matrix and U , an upper triangular matrix. (See Chapter 1 for denitions). Suppose we have determined matrices P, L and U so that equation (5.5) holds. The equation AX = b is equivalent to (P A)X = P b in that both have the same set of solutions X . This is because P 1 exists and hence (P A)X = P b is same as AX = b. Now set P b = b . Then P AX = P b gives LU X = b . (5.6)

Hence if we set U X = Y (a column vector), then (5.6) becomes LY = b . We know from Section 5.31 how to solve LY = b . A solution Y of this equation, when substituted in U X = Y gives X again by the same method.

Chapter 5 Algebraic Structures

295

5.32.1

Computing an LU Decomposition

We rst consider the case when A = (aij ) is a nonsingular matrix of order n. We begin by obtaining an LU decomposition for A; that is, an LUP decomposition with P = In in (5.5). The process by which we obtain the LU decomposition for A is known as Gaussian elimination . Assume that a11 = 0. a 11 a21 A= . . . an1 We write a12 . . . a22 . . . . . . an2 . . . a1n a11 a12 . . . a1n a2n a21 . , = . . . . . A an1 ann

where A is a square matrix of order n 1. We can now factor A as a a ... a1n 11 12 0 ... 0 1 0 a21 /a11 . t vw . . . A . . In1 a11 an1 /a11 0
a21

where v =

an1

. . .

and wt = (a12 . . . a1n ). Note that vwt is also a matrix of vwt is called the Schur complement of A a11

order n 1. The matrix A1 = A with respect to a11 .

We now recursively nd an LU decomposition of A. If we assume that A1 = L U , where L is unit lower-triangular and U is upper-triangular, then 1 0 a11 wt A= v/a11 In1 0 A1 t a 1 0 w 11 = v/a11 In1 0 LU a11 wt 1 0 = v/a11 L 0 U

Chapter 5 Algebraic Structures = LU, where L = 1 0

296 (5.7)

v/a11 L 0 U equations on the right of (5.7) can be veried by routine block multiplication of matrices (See ???). This method is based on the supposition that a11

, and U =

a11 w

. The validity of the two middle

and all the leading entries of the successive Schur complements are all nonzero. If a11 is zero, we interchange the rst row of A with a subsequent row having a non-zero rst entry. This amounts to premultiplying both sides by the corresponding permutation matrix P yielding the matrix P A on the left. We now proceed as with the case when a11 = 0. If a leading entry of a subsequent Schur complement is zero, once again we make interchanges of rowsnot just the rows of the relevant Schur complement but the full rows got from A. This again amounts to premultiplication by a permutation matrix. Since any product of permutation matrices is a permutation matrix, this process nally ends up with a matrix P A, where P is a permutation matrix of order n. We now present two examples, one to obtain the LU decomposition when it is possible and another to determine the LUP decomposition. Example 5.32.2: Find the LU decomposition of A = Here a11 = 2, v = Therefore v/a11 = transpose of w. Hence the Schur complement of A is A1 = Now the Schur complement of A1 is
7 4 7 7 13 16 10 13 15 2 1 3 4 2 6 2 4 2 6 3 7 7 10 1 4 13 13 2 7 16 15

, wt = [3, 1, 2].
6 2 4 3 1 2 9 3 6

, and so vwt /a11 =

, where wt denotes the

6 2 4 3 1 2 9 3 6

1 2 3 4 12 14 1 10 9

Chapter 5 Algebraic Structures


14 4 2, 3 ] = [ 12 14 ] [ 8 12 ] = [ 4 2 ]. A2 = [ 12 10 9 ] [ 1 ] [ 10 9 2 3 8 6

297

This gives the Schur complement of A2 as A3 = (6) (2)(2) = (2) = (1)(2) = L3 U3 , where L3 = (1) is lower unit triangular and U3 = (2) is upper triangular. Tracing back we get, 0 1 A2 = v /a L 2 11 3 10 = 21 This gives, A1 = Consequently, 1 2 A = 1 3 0 0 0 1 2 3 2 2 0 1 4 1 0 0 1 2
t 4 w2 0 U3

42 0 2 = L2 U2 . 123 042 . 002 3 1 0 0 1 2 4 0 2 3 2 = LU, 2

100 410 121 02 00 0 0 0 1

where 1 2 L = 1 3 2 0 U = 0 0 0 1 4 1 3 1 0 0 0 0 1 2 1 2 4 0

is unit lower-triangular (see Ch. 1 for denition), and

is upper-triangular.

Example 5.32.3:

2 4 Find the LUP decomposition of A = 2 6

3 6 7 10

1 4 13 13

2 7 16 . 15

Chapter 5 Algebraic Structures Suppose we proceed as before: The Schur complement of A is

298

0 2 3 6 4 7 624 6 4 7 2 7 13 16 3 1 2 7 13 16 1 3 1 2 = 4 12 14 . [ ]= A1 = 1 10 9 936 3 10 13 15 10 13 15 Since the leading entry is zero, we interchange the rst row of A1 with some other row. Suppose we interchange the rst and third rows of A1 . This amounts to considering the matrix P A instead of A1 , where P =
1 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0

Note that the rst row of A2 corresponds to the second row of A and the last row of A2 to the fourth row of A. This means that the Schur complement of P A (instead of A) is A1 = The Schur complement of A1 is 12 14 4 12 14 40 36 28 22 A2 = 2 3 0 [10 9] = 2 3 0 0 = 2 3 . The Schur complement of A2 is A3 = 3
2 28 1 10 9 4 12 14 0 2 3

. We now proceed with A1 as before.

[22] = 3

11 10 = = (1)(10/7) = L3 U3 , 7 7

where L3 = [1] and U3 =

10 . Hence 7 A2 = 1 0 2 1 28 28 22 . 0 10 7 1 10 9 0 28 22 . 0 0 10 7

This gives

1 0 0 A1 = 4 1 0 1 1 0 14 1 3 1 2

Thus

where L and U are the rst and second matrices in the product. Notice that we have interchanged the second and fourth rows of A while computing L.

0 0 0 23 1 2 1 0 0 0 1 10 9 4 1 0 0 0 28 22 = LU, 1 0 1 0 0 0 10 14 7

Chapter 5 Algebraic Structures

299

We have assumed, to start with, that A is a nonsingular matrix. Even if A is invertible, it is possible that A has no LU decomposition. For example, 01 the nonsingular matrix 1 0 has no LU decomposition. If A is singular it is possible that not only a column of a Schur complement but even the full column corresponding to it may be zero. In that case, we have to interchange columns. This would result in a matrix of the form AP rather than P A. It is also possible that we get the form P1 AP2 where P1 and P2 are both permutation matrices. Example 5.32.4: Solve the system of linear equations 2X1 + 3X2 + X3 + 2X4 = 10 4X1 + 6X2 + 4X3 + 7X4 = 25 2X1 + 7X2 + 13X3 + 16X4 = 40 6X1 + 10X2 + 13X3 + 15X4 = 50. These equations are equivalent to AX = B , where A is the matrix of Example 5.32.3 and B =
10 25 40 50

(5.8)

. If we interchange any pair of rows of (A|B ),

it amounts to interchanging the corresponding equations. However, this will in no way alter the solution. Hence the solutions of (5.8) are the same as solutions of P AX = P B , where P is the permutation matrix obtained in Example 5.32.3. Thus the solutions are the same as the solutions of LU X = B , where L and M are again those obtained in Example 5.32.3 and B is got from B by interchanging the second and fourth entries. Now set U X = Y so that the given equations become LY = B , where

Chapter 5 Algebraic Structures Y =


Y1 Y2 Y3 Y4

300

. This gives 1 3 1 2 0 0 0 10 1 0 0 Y1 Y 2 = 50 . 4 1 0 40 Y3 1 25 1 Y4 0 14

These are equivalent to

Y1 3Y1 + Y2

= 10 = 50

Y1 + 4Y2 +Y3 = 40 2Y1 1 Y3 +Y4 = 25, 14

and we get Y1 = 10, Y2 = 20, Y3 = 50 and Y4 = 10/7. Substituting these values in U X = Y , we get 23 1 0 1 10 0 0 28 00 0 2 10 X 1 9 20 X 2 50 . 22 = 10 X3 10 X4 7 7

These give

2X1 + 3X2 + X3 + 2X4 = 10 X2 + 10X3 + 9X4 = 20 28X3 22X4 = 50 10 10 X4 = 7 7

Solving backward, we get X1 = 2, X2 = X3 = X4 = 1.

Chapter 5 Algebraic Structures

301

5.33

Exercises

1. Examine if the following equations are consistent. X1 + X2 + X3 + X4 = 0 2X1 X2 + 3X3 + 4X4 = 1 3X1 + 4X3 + 5X4 = 2 2. Solve the system of homogeneous linear equations: 4X1 + 4X2 + 3X3 5X4 = 0 X1 + X2 + 2X3 3X4 = 0 2X1 + 2X2 X3 + X4 = 0 X1 + X2 + 2X3 3X4 = 0 Show that the solution space is of dimension 2. 3. Solve: X1 + X2 + X3 + X4 = 0 X1 + 3X2 + 2X3 + 4X4 = 0 2X1 + X3 X4 = 0 4. Solve by using LUP decomposition: (i) 2X1 + 3X2 5X3 + 4X4 = 8 3X1 + X2 4X3 + 5X4 = 10 7X1 + 3X2 2X3 + X4 = 10

Chapter 5 Algebraic Structures 4X1 + X2 X3 + 3X4 = 10 (ii) 3X1 2X2 + X3 = 7 X1 + X2 + X3 = 12 X1 + 4X2 X3 = 3 (iii) 2X1 + 4X2 5X3 + X4 = 8 4X1 + 5X3 X4 = 16 4X1 + 2X2 + X4 = 5 6X1 + 4X2 10X3 + 7X4 = 13

302

5.34

Finite Fields

In this section, we discuss the basic properties of nite elds. Finite elds are fundamental to the study of codes and cryptography. Recall that a eld F is nite if |F | is nite. |F | is the order of F . The characteristic of a nite eld F , as seen in Section 5.21, is a prime number and the prime eld P of F is a eld of p elements. P consists of the p elements 1F , 2 1F = 1F + 1F , . . . , p 1F = 0F . Clearly, F is a vector space over P . If the dimension of F over P is n, then n is nite. Hence F has a basis {u1 , . . . , un } of n elements over P . This means that each element v F is a unique linear combination of u1 , . . . , un , say, v = 1 u1 + 2 u2 + + n un , i P, 1 i n.

Chapter 5 Algebraic Structures

303

For each i, i can take |P | = p values, and so there are p p (n times) = pn distinct elements in F . Thus we have proved the following result. Theorem 5.34.1: The order of a nite eld is a power of a prime number. Finite elds are known as Galois elds after the French mathematician Evariste Galois (18111832) who rst studied them. A nite eld of order q is denoted by GF (q ). We now look at the converse of Theorem 5.34.1. Given a prime power pn (where p is a prime), does there exist a eld of order pn ? The answer to this question is in the armative. We give below two dierent constructions that yield a eld of order pn . Theorem 5.34.2: Given pn (where p is a prime), there exists a eld of pn elements. Construction 1: Consider the polynomial X p X Zp [X ] of degree pn . (Recall that Zp [X ] stands for the ring of polynomials in X with coecients from the eld Zp of p elements). The derivative of this polynomial is pn X p
n 1 n

1 = 1 Zp [X ],
n

and is therefore relatively prime to it. Hence the pn roots of X p X are all distinct. (Here, though no concept of the limit is involved, the notion of the derivative has been employed as though it is a real polynomial). It is known [28] that the roots of this polynomial lie in an extension eld K Zp .

Chapter 5 Algebraic Structures K is also of characteristic p. If a and b any two roots of X p X , then ap = a, Now by Theorem 5.21.3, (a b)p = ap bp , and, by the commutativity of multiplication in K , ap bp = (ab)p ,
n n n n n n n n

304

and bp = b.

and so a b and ab are also roots of X p X . Moreover, if a is a non-zero root of X p X , then so is a1 since (a1 )p = (ap )1 = a1 . Also the associative and distributive laws are valid for the set of roots since they are all elements of the eld K . Finally 0 and 1 are also roots. In other words, the pn roots of X p X Zp [X ] form a eld of order pn . Construction 2: Let f (X ) = X n + a1 X n1 + an Zp [X ] be a polynomial of degree n irreducible over Zp The existence of such an irreducible polynomial (with leading coecient 1) of degree n is guaranteed by a result (see [5]) in Algebra . Let F denote the ring of polynomials in Zp [X ] reduced modulo f (X ) (that is, if g (X ) Zp [X ], divide g (X ) by f (X ) and take the remainder g1 (X ) which is 0 or of degree less than n). Then every non-zero polynomial in F is a polynomial of Zp [X ] of degree at most n 1. Moreover, if a0 X n1 + + an and b0 X n1 + + bn are two polynomials in F of degrees at most n 1, and if they are equal, then, (a0 b0 )X n1 + + (an bn ) is the zero polynomial of F , and hence is a multiple of f (X ) in Zp [X ]. As degree of f is n, this is possible only if ai = bi , 0 i n. Now if a0 + a1 X +
n n n n

Chapter 5 Algebraic Structures

305

+ an1 X n1 is any polynomial of F , ai Zp and hence has p choices. Hence the number of polynomials of the form a0 X n1 + + an Zp [X ] is pn . We now show that F is a eld. Clearly, F is a commutative ring with unit element 1(= 0 X n1 + + 0 X + 1). Hence we need only verify that if a(X ) F is not zero, then there exists b(X ) F with a(X )b(X ) = 1. As a(X ) = 0, and f (X ) is irreducible over Zp , the gcd (a(X ), f (X )) = 1. So by Euclidean algorithm (Section 3.3), there exist polynomials C (X ) and g (X ) in Zp [X ] such that a(X )C (X ) + f (X )g (X ) = 1 (5.9)

in Zp [X ]. Now there exists C1 (X ) F with C1 (X ) C (X )( mod f (X )). This means that there exist a polynomial h(X ) in Zp [X ] with C (X ) C1 (X ) = h(X )f (X ), and hence C (X ) = C1 (X ) + h(X )f (X ). Substituting this in (1) and taking modulo f (X ), we get, a(X )C1 (X ) = 1 in F . Hence a(X ) has C1 (X ) as inverse in F . Thus every non-zero element of F has a multiplicative inverse in F , and so F is a eld of pn elements. We have constructed a eld of pn elements in two dierent waysone, as the eld of roots of the polynomial X p X Zp [X ], and the other, as the eld of polynomials in Zp [X ] reduced modulo the irreducible polynomial f (X ) of degree n over ZP . Essentially, there is not much of a dierence between the two constructions, as our next theorem shows. Theorem 5.34.3: Any two nite elds of the same order are isomorphic under a eld isomorn

Chapter 5 Algebraic Structures phism. Example 5.34.4:

306

Take p = 2 and n = 3. The polynomial X 3 + X + 1 of degree 3 is irreducible over Z2 . (If it is reducible, one of the factors must be of degree 1, and it must be either X or X + 1 = X 1 Z2 [X ]. But 0 and 1 are not roots of X 3 + X + 1 Z2 [X ]). The 23 = 8 polynomials over Z2 reduced modulo X 3 + X + 1 are: 0, 1, X, X + 1, X 2 , X 2 + 1, X 2 + X, X 2 + X + 1 and they form a eld. (Note that X 3 = X + 1, X 3 + X = 1 and X 3 + X = 0). We have, for instance, (X 2 + X + 1) + (X + 1) = X 2 and (X 2 + 1)(X + 1) = X 3 + X 2 + X + 1 = X 2 . Also (X + 1)2 = X 2 + 1. We know that if F is a eld, the set F of non-zero elements of F is a group. In the case when F is a nite eld, F has an additional algebraic structure. Theorem 5.34.5: If F is a nite eld, F (the set of non-zero elements of F ) is a cyclic group. Proof. We know (as F is a eld), F is a group. Hence we need only show that F is generated by a single element. Let be an element of the group F of maximum order, say, k . Necessarily, k q 1, where q = |F |. Choose F , = , 1. Let o( ) (= the order l of ) = l. Then l > 1. We rst show that l|k . Now o( (k,l) ) = , where (k, l) l (k, l) denotes the gcd of k and l. Further, as 0(), o( (k,l) ) = k, = (k, l)

Chapter 5 Algebraic Structures 1, we have o( (k,l) ) = o() o( (k,l) ) = k

307

l = [k, l], the lcm of k and l. (k, l) But, by our choice, the maximum order of any element of F is k . Therefore [k, l] = k which implies that l|k . But l = o( ). Therefore k = 1. Thus for each of the q 1 elements of x of F , xk = 1 and so is a root of xk 1. This means that, as |F | = q 1, k = q 1. Thus o() = q 1 and so F is the cyclic group generated by . Denition 5.34.6: (i) A primitive element of a nite eld F is a generator of the cyclic group F . (ii) A monic polynomial is a polynomial with leading coecient 1. For example, X 2 + 2X + 1 R[X ] is monic while 2X 2 + 1 is not. (iii) Let F be a nite eld with prime eld P . A primitive polynomial in F [X ] over P is the minimal polynomial in P [X ] of a primitive element of F . A minimal polynomial in P [X ] of an element F is a monic polynomial of least degree in P [X ] having as a root. Clearly the minimal polynomial of any element of F in P [X ] is irreducible over P . Let be a primitive element of F , and f (X ) = X n + a1 X n1 + + an P[X ] be the primitive polynomial of . Then any polynomial in P [] of degree n or more can be reduced to a polynomial in P [] of degree at most n 1. Moreover, no two distinct polynomials of P [] of degree at most n 1 can be equal; otherwise would be a root of a polynomial of degree less than n over P . Hence all the polynomials of the form a0 + a1 + + an1 n1 , ai P

Chapter 5 Algebraic Structures

308

in P [] are all distinct and |P []| = pn where p = |P |. These pn elements constitute a subeld F of F and F . But then F F and hence F = F . Thus |F | = |F | = pn . As is a primitive element of F this means that F = 0; , 2 , . . . , p Example 5.34.7: Consider the polynomial X 4 + X + 1 Z2 [X ]. This is irreducible over Z2 (Check that it can have no linear or quadratic factor in Z2 [X ]). Let be a root (in an extension eld of Z2 ) of this polynomial so that 4 + + 1 = 0. This means that 4 = + 1. We now prove that is a primitive element of a eld of 16 elements over Z2 by checking that the 15 powers , 2 , . . . , 15 are all distinct and that 15 = 1. Indeed, we have 1 = 2 = 2 3 = 3 4 = + 1 5 = 4 = ( + 1) = 2 + 6 = 5 = 3 + 2 7 = 6 = 4 + 3 = 3 + + 1 8 = 7 = 4 + (2 + ) = ( + 1) + (2 + ) = 2 + 1 9 = 8 = 3 + 10 = 9 = 4 + 2 = 2 + + 1
n 1

=1 .

Chapter 5 Algebraic Structures 11 = 10 = 3 + 2 +

309

12 = 11 = 4 + (3 + 2 ) = ( + 1) + (3 + 2 ) = 3 + 2 + + 1 13 = 12 = 4 + (3 + 2 + ) = ( + 1) + (3 + 2 + ) = 3 + 2 + 1 14 = 13 = 4 + (3 + ) = ( + 1) + (3 + ) = 3 + 1 15 = 14 = 4 + = ( + 1) + = 1 Thus F = {0} F = {0, , 2 , . . . , 15 = 1} and so is a primitive element of F = GF (24 ). We observe that a polynomial irreducible over a eld F need not be primitive over F . For instance, the polynomial f (X ) = X 4 +X 3 +X 2 +X +1 Z2 [X ] is irreducible over Z2 but it is not primitive. To check that f (X ) is irreducible, verify that F (X ) has no linear or quadratic factor over Z2 . Next, for any root of f (X ), check that 5 = 1 so that o() < 15, and f (X ) is not primitive over Z2 . Recall that if f (X ) were a primitive polynomial, some

root of f (X ) should be a primitive element of GF (24 ) .

5.35

Factorization of Polynomials over Finite Fields

Let be a primitive element of the nite eld F = GF (pn ), where p is a prime. Then F = , 2 , . . . , p
n 1

= 1 , and for any x F , xp = x.

Hence for each i, 1 i pn 1, (i )p = i .


n

Chapter 5 Algebraic Structures This shows that there exists a least positive integer t such that ip Then set Ci = i, pi, p2 i, . . . , pt i , 0 i pn 1.
t+1

310 = i .

The sets Ci are called the cyclotomic cosets modulo p dened with respect to F and . Now, corresponding to the coset Ci , 0 i pn 1, consider the polynomial fi (X ) = (X i )(X ip )(X ip ) (X ip ). The coecients of fi are elementary symmetric functions of i , ip , . . . , ip
t 2 t

and if denotes any of these coecients, then satises the relation p = . Hence Zp and fi (X ) Zp [X ] for each i, 0 i pn 1. Each element of Ci determines the same cyclotomic coset, that is, Ci = Cip = Cip2 = = Cipt . Moreover, if j / Ci , Ci Cj = . This gives a factorization of X p X into irreducible factors over Zp . In fact, X p X = X (X p Xp
n 1 n n 1 n

1), and

1 = (x )(X 2 ) (X p =
i j C i

n 1

(X j ) ,

where the rst product is taken over all the distinct cyclotomic cosets. What is more, each polynomial fi (X ) is irreducible over Zp as shown below. To see this, assume that g (X ) = a0 + a1 X + + ak X k F [X ]. Then g (X )
p p p p k p = ap 0 + a1 X + + ak (X )

= a0 + a1 X p + + ak X kp = g (X p ).

Chapter 5 Algebraic Structures

311

Consequently, if is a root of g , g ( ) = 0, and therefore 0 = (g ( ))p = g ( p ), that is, p is also a root of g (X ). Hence if j Ci and j is a root of fi (X ), then all the powers k , k Ci , are roots of fi (X ). Hence any non-constant irreducible factor of fi (X ) over Zp must contain all the terms (X j ), j Ci as factors. In other words, g (X ) is irreducible over Zp . Thus the determination of the cyclotomic cosets yields a simple device to factorize X p X into irreducible factors over ZP . We illustrate this fact by an example. Example 5.35.1: We factorize X 2 X into irreducible factors over Z2 . Let be a primitive element of the eld GF (24 ). As a primitive polynomial of degree 4 over Z2 having as a root, we can take (See Example 5.34.7) X 4 + X + 1. The cyclotomic cosets modulo 2 w.r.t. GF (24 ) and are: C0 = {0} C1 = 1, 2, 22 = 4, 23 = 8 C3 = {3, 6, 12, 9} C5 = {5, 10} C7 = {7, 14, 13, 11} . Note that C2 = C1 = C4 , and so on. Thus
15
4 n

(Note: 24 = 16 1( mod 15))

16

X = X (X

15

1) = X

i=1

X i X i

= X X 0

iC1

X i

iC3

Chapter 5 Algebraic Structures X i X i

312

iC5

iC7

= X (X + 1) X 4 + X + 1

X4 + X3 + X2 + X + 1 X4 + X3 + 1 . (5.10)

X2 + X + 1

In computing the products, we have used the relation 4 + + 1 = 0, that is, 4 = + 1. Hence, for instance, X i = X 5 X 10

iC5

= X 2 5 + 10 X + 15 = X2 + 2 + + 2 + + 1 X + 15 = X 2 + X + 1. The six factors on the right of Equation (5.10) are all irreducible over Z2 . The minimal polynomials of , 3 and 7 are all of degree 4 over Z2 . However, while and 7 are primitive elements of GF (24 ) (so that the polynomials X 4 + X +1 and X 4 + X 3 +1 are primitive), 3 is not (even though its minimal polynomial is also of degree 4). Primitive polynomials ???? are listed in [44].

5.35.1

Exercises

1. Construct the following elds: GF (24 ), GF (25 ) and GF (32 ). 2. Show that GF (25 ) has no GF (23 ) as a subeld 3. Factorize X 2 + X and X 2 + X over Z2 .
3 5

Chapter 5 Algebraic Structures 4. Factorize X 3 X over Z3 .


2

313

5. Using Theorem 5.34.5, prove Fermats little Theorem that for any prime p, ap1 1 ( mod p), for a 0 ( mod p).

5.36

Mutually Orthogonal Latin Squares [MOLS]

In this section, we show, as an application of nite elds, the existence of n 1 mutually orthogonal latin squares of order n. A latin square of order n is a double array L of n rows and n columns in which the entries belong to a set S of n elements such that no two entries of the same row or column of L are equal. Usually, we take S to be the set {1, 2, . . . , n} but this is not always essential.
2 For example, [ 1 2 1 ] and 1 2 3 2 3 1 3 1 2

are latin squares of orders 2 and 3 re-

spectively. Let L1 = (aij ), and L2 = (bij ) be two latin squares of order n with entries in S . We say that L1 and L2 are orthogonal latin squares if the n2 ordered pairs (aij , bij ) are all distinct. For example, L1 = and L2 =
1 2 3 3 1 2 2 3 1 1 2 3 2 3 1 3 1 2

are orthogonal latin squares of order 3 since the nine or-

dered pairs (1, 1), (2, 2), (3, 3); (2, 3), (3, 1), (1, 2); (3, 2), (1, 3), (2, 1) are all
2 2 1 distinct. However if M1 = [ 1 2 1 ] and M2 = [ 1 2 ], then the 4 ordered pairs

(1, 2), (2, 1), (2, 1) and (1, 2) are not all distinct. Hence M1 and M2 are not orthogonal. The study of orthogonal latin squares started with Euler, who had proposed the following problem of 36 ocers. The problem asks for an arrangement of 36 ocers of 6 ranks and from 6 regiments in a square formation of size 6 by 6. Each row and column of this arrangement are to contain only one ocer of each rank and only one ocer from each regiment.

Chapter 5 Algebraic Structures

314

We label the ranks and the regiments from 1 through 6, and assign to each ocer an ordered pair of integers in 1 through 6. The rst component of the ordered pair corresponds to the rank of the ocer and the second component his regiment. Eulers problem then reduces to nding a pair of orthogonal latin squares of order 6. Euler conjectured in 1782 that there exists no pair of orthogonal latin squares of order n 2( mod 4). Euler himself veried the conjecture for n = 2, while Tarry in 1900 veried it for n = 6 by a systematic case by case analysis. But the most signicant result with regard to the Euler conjecture came from Bose, Shrikande and Parker who disproved the conjecture by establishing that if n 2( mod 4) and n > 6, then there exists a pair of orthogonal latin squares of order n. A set {L1 , . . . , Lt } of t latin squares of order n on S is called a set of mutually orthogonal latin squares (MOLS) if Li and Lj are orthogonal whenever i = j . It is easy to see [59] that the number t of MOLS of order n is bounded by n 1. Further, any set of n 1 MOLS of order n is known to be equivalent to the existence of a nite projective plane of order n. A long standing conjecture is that if n is not a prime power, then there exists no complete set of MOLS of order n. We now show that if n is a prime power, there exists a set of n 1 MOLS of order n. (Equivalently, this implies that there exists a projective plane of any prime power order, though we do not prove this here). Theorem 5.36.1: Let n = pk , where p is a prime and k is a positive integer. Then for n 3, there exists a complete set of MOLS of order n.

Chapter 5 Algebraic Structures

315

Proof. By Theorem 5.34.2, we know that there exists a nite eld GF (pk ) = GF (n) = F , say. Denote the elements of F by a0 = 0, a1 = 1, a2 , . . . , an1 . Dene the n 1 matrices A1 , . . . , An1 of order n by At = (at ij ) = (at aij ), 0 i, j n 1; and 1 t n 1,

where at aij = at ai + aj . The entries at ij are all elements of the eld F . We claim that each At is a latin square. Suppose, for instance, two entries of
t some i-th row of At , say at ij and ail are equal. This implies that

at ai + aj = at ai + al , and hence aj = al . Consequently j = l. Thus all the entries of the i-th row of At are distinct. For a similar reason, no two entries of the same column of At are equal Hence At is a latin square. We next claim that {A1 , . . . , An1 } is a set of MOLS. Suppose 1 r < u n 1. Then Ar and Au are orthogonal. For suppose that
u r u ar ij , aij = ai j , ai j .

This means that ar ai + aj = ar ai + aj , and au ai + aj = au ai + aj . Subtraction gives (ar au )ai = (ar au )ai and hence, as ar = au , ai = ai . Consequently, i = i and j = j . thus Ar and Au are orthogonal.

Chapter 6 Graph Theory


6.1 Introduction

Graph theory, broadly speaking, studies properties between a given set of objects with some structure. Informally, a graph is a set of objects called vertices (or points ) connected by links called edges (or lines ). A graph is usually depicted by means of a diagram in the plane in which the vertices are denoted by points of the plane and edges by lines joining certain pairs of vertices. Graph theory has its origins in recreational mathematics and its use as a tool to solve practical problems. Since 1930, graph theory has received considerable attention as a mathematical discipline. This is due to its wide range of practical applications in many areas of the society. In fact, in recent times, its importance has seen phenomenal growth in view of its wide-range connections to Computer Science. The earliest known recorded result in graph theory is due to the Swiss mathematician Leonhard Euler (17011783) in 1736. The city of K onigsberg (now Kaliningrad) has seven bridges linking two islands A and B and the 316

Chapter 6 Graph Theory

317

banks C and D of the Pregal river (later called the Pregolya). The people of K onigsberg wondered if it was possible to take a stroll across the seven bridges, crossing each bridge exactly once and returning to the starting point. Euler showed that it was not possible.

C (land) Pregel river A (island) B (island)

D (land)

Figure 6.1: In 1859, William Rowan Hamilton (18051865) discovered the idea of a cycle that visits each vertex of a graph exactly once. Hamiltons game was a wooden regular dodecahedron with 20 vertices (labeled by cities). The objective was to nd a cycle traveling along the edges of the solid so that each city was visited exactly once. Fig. 6.2 below is the planar graph for a dodecahedronthe required cycle is designated by dark edges. Such a cycle is now called a Hamilton cycle.

Chapter 6 Graph Theory

318

Figure 6.2: Trees are special kinds of connected graphs that contain no cycles. In 1847, G. B. Kirchho (18241877) rst used trees in his work on electrical networks. It is also known that Arthur Caley (18211895) used trees systematically in his attempts to enumerate the isomers of the saturated hydrocarbons. Many problems in graph theory are easy to pose although their solutions may require research eort. Perhaps the most famous one is the Four Color Theorem (FCT) which states that every planar map can be colored using not more than four colors so that regions sharing a common boundary are colored dierent. The nal proof of the FCT, given by Kenneth Appel and Wolfgang Haken in 1976 required 1200 hours of computer time. The present chapter gives an exposure to some of the basic ideas in graph theory.

Chapter 6 Graph Theory

319

6.2
6.2.1

Basic denitions and ideas


Types of Graphs

Denition 6.2.1: A graph G consists of a vertex (or point) set V (G) = {v1 , . . . , vn } and an edge (or line) set E (G) = {e1 , . . . , em }, where each edge consists of an unordered pair of vertices. If e is an edge of G, then it is represented by the unordered pair {a, b} (denoted by ab when no confusion arises). We call a and b, the endpoints of e. If an edge e = {u, v } E (G), then u and v are said to be adjacent. A null graph on n vertices, denoted by Nn , has no edges. The empty graph is denoted by and has vertex set (and therefore edge set ). A loop is an edge whose endpoints are the same. Parallel edges or multiple edges are edges that have the same pair of endpoints. A simple graph is one that has no loops and no multiple edges. The order of a graph G, denoted by n(G) or simply n, is the number of vertices in V (G). A graph of order 1 is called trivial. The size of a graph G, denoted by m(G) or simply m, is the number of edges in E (G). If a graph G has nite order and nite size, then G is said to be a nite graph. Unless stated otherwise, we consider only simple nite graphs. It is possible to assign a direction or orientation to each edge in a graph we then treat an edge e (with endpoints u and v ) as an ordered pair (u, v ) or (v, u) often denoted by uv or vu . Denition 6.2.2: A directed graph or a digraph G consists of a vertex set V (G) = {v1 , . . . , vn }

Chapter 6 Graph Theory 2 1 4 5 4 1 3 G1 V (G1 ) = {1, 2, 3, 4, 5} E (G1 ) = {1, 2}, {2, 3}, {4, 5} 7 5 6

320

2 G2 3 V (G2 ) = {1, 2, 3, 4, 5, 6, 7} E (G2 ) = (1, 2), (2, 3), (1, 4), (3, 6), (6, 5), (6, 7), (7, 6)

Figure 6.3: and an arc set A(G) = {e1 , . . . , em } where each arc is an ordered pair of vertices. A simple digraph is one in which each ordered pair of vertices occurs at most once as an arc. We indicate an arc as (u, v ) or uv where u and v are the endpoints. Note that this is dierent from the arc vu with same endpoints. If e = (u, v ), then u is called the tail and v is called the head of e. In depicting a digraph, we assign a direction, marked by an arrow to each edge. Thus an arc uv of a digraph will be a line from u to v with an arrow in the direction from u to v . Figures 6.3 and 6.4 show some examples of

graphs and digraphs. G1 is a graph where the set of vertices {1,2,3} forms one component and the set of vertices {4,5} forms another component. We thus have a graph that is not connected. The connected components of such a graph may be studied separately. G2 is an example digraph. Consider the graphs G3 and G4 . We can make the following correspondence (denoted by the double arrow ) between V (G3 ) and V (G4 ): 1 a, 2 b, 3 d, 4d

Chapter 6 Graph Theory 6 5 2 1 8 3 a 7 b f e g h c

321

4 G3 V (G3 ) = {1, 2, 3, 4, 5, 6, 7, 8} E (G3 ) = {1, 4}, {4, 8}, {8, 5}, {5, 1}, {1, 2}, {5, 6}, {8, 7}, {4, 3}, {2, 3}, {3, 7}, {7, 6}, {6, 2} Figure 6.4: 5 e, 6 f,

d G4 V (G4 ) = {a, b, c, d, e, f, g, h} E (G4 ) = {ab, bc, cd, da, ef, f g, gh, he, ae, bf, cg, dh}

7 g,

8h

Letting i, j {1, 2, 3, 4, 5, 6, 7, 8} and x, y {a, b, c, d, e, f, g, h} we can easily check that if i x and j y then ij E (G3 ) if and only if xy E (G4 ). In other words, G3 and G4 are the same graph. More formally we have the following denition. Denition 6.2.3: Two simple graphs G and H are isomorphic if there is a bijection : V (G) V (H ) such that {u, v } E (G) if and only if {(u), (v )} E (H ). Exercise 6.2.4: Show that the Petersen graph P (most commonly drawn as G1 below) is isomorphic to the graphs G2 , G3 and G4 below:

Chapter 6 Graph Theory

322

G1

G2 Figure 6.5:

G3

G4

Exercise 6.2.5: Show that the Petersen graph is isomorphic to the following graph Q: The vertex set V (Q) is the set of unordered pairs of numbers (i, j ), i = j, 1 i, j 5. Two vertices {i, j } and {k, l} (i, j, k, l {1, 2, . . . , 5}) form an edge if and only if {i, j } {k, l} = . Example 6.2.6: Let V be a set cardinality n (vertices). From V we can obtain
n 2

n(n1) 2

unordered pairs (these can be treated as possible edges). Each subset of these
n ordered pairs denes a simple graph and hence there are 2( 2 ) simple graphs

with vertex set V . If |V | = 4, then there are 64 simple graphs on four vertices. However, they fall into only eleven isomorphism classes as shown in Fig 6.6.

Chapter 6 Graph Theory

323

G1

G2

G3

G4

G5

G6

G11

G10

G9 Figure 6.6:

G8

G7

Consider the pairs Gi and G12i in Fig 6.6. We see that G12i is obtained from Gi by removing its edge(s) and introducing all edges not in Gi . This is the idea of complementation of a graph. Note that G6 is isomorphic to its complement (formal denition follows). While there are only 11 non-isomorphic simple graphs on six vertices, there are as many as 1044 non-isomorphic simple graphs on seven vertices! When the order and the size of graphs are small (small graphs), it is usually easy to check for isomorphism. However, the problem of deciding whether two given graphs are isomorphic or not, is dicult in general. Denition 6.2.7: of a simple graph G is the simple graph with vertex set The complement G ) = V (G) and edge set E (G ) dened thus: uv E (G ) if and only if V (G ) = {uv |uv uv / E (G). That is, E (G / E (G)} Example 6.2.8: : The following graph G is isomorphic to its complement G

Chapter 6 Graph Theory 1 5 6 2 1 5 6 2

324

8 4 G

7 3 4

7 3 G

Figure 6.7: The mapping for the isomorphism is: 1 4, Exercise 6.2.9: and H are Show that two graphs G and H are isomorphic if and only if G isomorphic. Denition 6.2.10: A simple graph is called self-complementary if it is isomorphic to its own complement. The complement of the null graph Nn is a graph with n vertices in which every distinct pair of vertices is an edge. Such a graph is called a complete graph or a clique. The complete graph on n vertices is denoted by Kn . It
1 n(n 1) edges. easily follows that Kn has 2

26 31 47 52 68 73 85

Exercise 6.2.11: The line-graph L(G) of a given graph G = V (G), E (G) is the simple graph whose vertices are the edges of G with ef E L(G) if and only if two edges

Chapter 6 Graph Theory

325

e and f in G have a common endpoint. Prove that the Petersen graph is the complement of the line-graph of K5 . Exercise 6.2.12: Prove that if G is a self-complementary graph with n vertices, then n is either 4t or 4t + 1, for some integer t (Hint: consider the number of edges in Kn ) A subgraph of a graph G is a graph H such that V (H ) V (G) and E (H ) E (G). In notation, we write H G and say that H is a subgraph of G. For S V (G), the induced subgraph G[S ] or < S > of G is a subgraph H of G such that V (H ) = S and E (H ) contains all edges of G whose endpoints are in S (see Fig 6.8).

G1

G2

Figure 6.8: G1 is an induced subgraph of G; G2 is not Note that a complete graph may have many subgraphs that are not cliques but every induced subgraphs of a complete graph is a clique. The components of a graph G are its maximal connected subgraphs. A component is nontrivial if it contains an edge. An independent set of a graph G is a vertex subset S V (G) such that no two vertices of S are adjacent in G. It is easy to check that a clique of G and vice versa. is an independent set of G We next introduce a special class of graphs, called bipartite graphs. A

Chapter 6 Graph Theory

326

graph G is called bipartite if V (G) can be partitioned into two subsets X and Y such that each edge of G has one endpoint in X and the other in Y . We express this by writing G = G(X, Y ). A complete bipartite graph G is a bipartite graph G(X, Y ) whose edge set consists of all possible pairs of vertices having one endpoint in X and the other in Y . If X has m vertices and Y has n vertices such a graph is denoted by Km,n . Note that Km,n is isomorphic to Kn,m . It is easy to see that Km,n has mn edges. Example 6.2.13: The graph of Fig 6.9.(a) is the 3-cube. It is a bipartite graph (though not complete). Fig. 6.9.(b) is a redrawing of Fig. 6.9.(a) exhibiting the bipartitions X = {x1 , x2 , x3 , x4 } and Y = {y1 , y2 , y3 , y4 }. Graphs of Fig. 6.9.(c) are isomorphic to the complete bipartite graph K3,3 . x1 y2 x2 x3 y4 (a) y3 x4 (b) y1 y2 y3 y4 y1 x1 x2 x3 x4

(c) Three drawings of the complete bipartite graph K3,3 Figure 6.9:

Chapter 6 Graph Theory Exercise 6.2.14:

327

Let G be the graph whose vertex set is the set of binary strings of length n (n 1) . A vertex x in G is adjacent to vertex y of G if and only if x and y dier exactly in one position in their binary representation. Prove that G is a bipartite graph. We next introduce the notion of a walk and related concepts. Denition 6.2.15: A walk of length k in a graph G is a non-null alternating sequence v0 e1 v1 e2 . . . ek vk of vertices and edges of G (starting and ending with vertices) such that ei = vi1 vi , for all i. A trail is a walk in which no edge is repeated. A path is a walk with no repeated vertex. A (u, v )- walk is one whose rst vertex is u and last vertex v (u and v are the end vertices of the walk). A walk or trail is closed if it has length at least one and has its endvertices the same. A cycle is a closed path. A cycle on n vertices is denoted by Cn (where the vertices are unlabeled). Note that in a simple graph a walk is completely specied by its sequence of vertices. We now formally state the notion of connectedness. Denition 6.2.16: A graph G is connected if it has a (u, v )-path for each pair u, v V (G). Exercise 6.2.17: Let G be a simple graph. Show that if G is not connected, then its comple-

Chapter 6 Graph Theory is connected. ment G

328

6.2.2

Two Interesting Applications

We now illustrate how the above concepts can be applied to problems. (a) A Party Problem Six people are at a party. Show that there are three people who all know each other or there are three people who are mutually strangers. Perhaps, the easiest way to solve the problem is using graph theory. Consider the complete graph K6 . We associate the six people with the six vertices of K6 . We color the edges joining two vertices black if the corresponding people know each other. If two people do not know each other, we color the edge joining the corresponding vertices grey. If there are three people who know (dont know) each other, then we should have a black (grey) triangle in K6 . Given an assignment of colors to all edges of K6 , a subgraph H is called monochromatic if all edges of H have the same color. The party problem can now be posed as follows: If we arbitrarily color the edges of K6 black or grey, then there must be a monochromatic clique on three vertices. Let u, v, w, x, y, z be the vertices of K6 . An arbitrary vertex, say u in K6 has degree 5. So when we color the edges incident with u, we must use the color black or grey at least three times. Without loss of generality, assume that the three edges are colored black as shown in Fig. 6.10. Let these edges be uv , ux and uw. If any one of the edges vw, vx or xw is now colored black, we get the required black triangle. Hence we suppose all these edges

Chapter 6 Graph Theory are colored grey. But then this gives a grey triangle. u v y x Figure 6.10: w z

329

We remark that the generalization of the above party problem leads to Ramsey Theory. However we will not develop the theory in this book. (b) Proof of Schr oder-Bernstein Theorem An interesting application of bipartite graphs appears in the proof of Schr oderBernstein theorem (See Theorem 1.5.6) We begin with a result which we state without proof: A graph G is bipartite i no cycle of G is odd. Let f : X Y and g : Y X be 11 maps. Form a bipartite graph G = G(X, Y ) with bipartition (X, Y ). If x X , set (x, f (x)) E (G) and if y Y , set y, g (y ) E (G). The components of G could only be of one of the following four types (See Fig. 6.11): x g (f (x)) d x = g (y ) x = g (y )

f (x) f g f (x)

y Figure 6.11:

y = f (x )

Chapter 6 Graph Theory

330

(i) a one-way innite path starting at a vertex x X . Such a path is of the form x, f (x), g f (x) , f g f (x) , .

(ii) a one-way innite path starting at a vertex y Y . Such a path is of the form y , g (y ), f g (y ) , . (iii) two-way innite paths with vertices alternating between X and Y . Such a path is of the form . . . , x, y, x , y , . . . , where x, x , . . . X , and y, y . . . Y . (iv) an even cycle of the form x1 y1 x2 y2 , . . . , xn , yn x 1 , where xi X and yj Y for each i and j . It is now easy to set up a bijection from X onto Y . If z is a vertex of a component of type (i), set f (z ) if z X g (z )

(z ) =

if z Y.

A similar statement applies for vertices of a component of type (ii). In type (iii), dene (x) = y , (x ) = y and so on. Finally, in type (iv), if the component is x1 y1 x2 y2 . . . xn yn x1 , set (xi ) = yi , 1 i n. Then : X Y is a bijection from X onto Y . Thus X and Y are equipotent sets.

Chapter 6 Graph Theory

331

6.2.3

First Theorem of Graph Theory

We next come to the rst theorem of graph theoryrst we introduce the notion of degree of a vertex. An edge e of a graph G is said to be incident with a vertex v if v is an endvertex of e. We then say, v is incident with e. Two edges e and f having a common vertex v are said to be adjacent. Denition 6.2.18: Given a graph G, let v V (G). The degree d(v ) or dG (v ) of v is the number of edges of G incident with v . A vertex of degree 1 in a graph G is called an end-vertex of G or a leaf of G. We denote the maximum degree in a graph G by (G); the minimum degree is denoted by (G). A graph is regular if (G) = (G). Also, a graph is k -regular if (G) = (G) = k . It is easy to check that a k -regular graph with n vertices has nk/2 edges. The complete graph Kn is (n 1) regular. The complete bipartite graph Kn,n is n-regular. The Petersen graph is 3-regular. A 3-regular simple connected graph is called a cubic graph. It turns out that cubic graphs on n vertices exist for even values of n. Note that K3,3 is a cubic graph. Some cubic graphs are shown in Fig. 6.12.

Figure 6.12:

Chapter 6 Graph Theory Exercise 6.2.19:

332

Consider any k -regular graph G for odd k . Prove that the number of edges in G is a multiple of k . Exercise 6.2.20: Prove that every 5-regular graph contains a cycle of length at least six. Theorem 6.2.21 (First Theorem of graph theory or Degree-Sum Formula): For any graph G, d(v ) = 2m(G)
v V (G)

Proof. When the degrees of all the vertices are summed up, each edge is counted twice. Hence the result. By the Degree-Sum Formula, the average vertex degree is 2m(G)/n(G), where n(G) is the order of G. Here, (G) 2m(G)/n(G) (G) A vertex of a graph is called odd or even depending on whether its degree is odd or even. Corollary 6.2.22 (Handshake Lemma): In any graph G, there is an even number of odd vertices. Proof. Let A and B be respectively the set of odd and even vertices of G. Then for each u B , d(u) is even and so Degree-Sum Formula d(u) +
u B w A

d(u) is also even. By the


u B

d(w) =
v V (G)

d(v ) = 2m(G)

Chapter 6 Graph Theory This gives,


w A

333 d(v ) = an even number


u B

d(w) = 2m(G)

(being the dierence of two even numbers) Hence the result. This can be interpreted thus: the number of participants at a birthday party each of who shake hands with an odd number of other participants is always even.

6.3

Representations of Graphs

A graph G = (V (G), E (G)) can be represented as a collection of adjacency lists or as an adjacency matrix (dened below). For sparse graphs for which |E (G)| is much less compared to |V (G)|2 , the adjacency list representation is preferred. For dense graphs for which |E (G)| is close to |V (G)|2 , the adjacency matrix representation is good. Denition 6.3.1: The adjacency list representation of a graph G = V (G), E (G) consists of an array Adj of |V (G)|. For each vertex u of G, Adj[u] points to the list of all vertices v that are adjacent to u. For a directed graph Adj[u] points to the list of all vertices v such that uv

is an arc in E (G). It is easy to see that the adjacency list representation of a graph (directed or undirected) has the desirable property that the amount of memory it requires is O(max(|V |, |E |)) = O(|V | + |E |) Given an adjacency list representation of a graph, to determine if an edge

Chapter 6 Graph Theory

334

uv is present in the graph, the only way is to search the list Adj[u] for v . The process of determining the presence (or absence) of an edge uv is much simpler in the adjacency-matrix (dened below) representation of a graph. Denition 6.3.2: To represent a graph G = V (G), E (G) , we rst number the vertices of G by 1, 2, . . . , |V | in some arbitrary manner. The adjacency matrix of G is then the |V | |V | matrix A = (aij ) when, 1 if ij E (G) aij = 0 otherwise. elements aij as, aij = 1 if

The above denition applies to directed graphs also where we specify the

Note that a graph may have many adjacency lists and adjacency matrices because the numbering of the vertices is arbitrary. However, all the representations yield graphs that are isomorphic. It is then possible to study properties of graphs that do not dependent on the labels of the vertices. Theorem 6.3.3: Let G be a graph with n vertices v1 , . . . , vn . Let A be the adjacency matrix of G with this labeling of vertices. Let Ak be the result of multiplication of k (a positive integer) copies of A. Then the (i, j )th entry of Ak is the number of dierent (vi , vj )-walks in G of length k .

0 otherwise.

ij A(G)

Chapter 6 Graph Theory

335

Proof. We prove the result by induction on k . For k = 1, the theorem follows from the denition of the adjacency matrix of G, since a walk of length 1 from vi to vj is just the edge vi vj . We now assume that the result is true for Ak1 (k > 1). Let Ak1 = (bij ) i.e., bij is the number of dierent walks of length k 1 from vi to vj . Let Ak = (cij ). We want to prove that cij is the number of dierent walks of length k from vi to vj . By denition, Ak = Ak1 A and by the denition of matrix multiplication,
n

cij =
r =1 n

(i, r)th element ofAk1 (r, j )th element of A bir arj

=
r =1

Now every (vi , vj )-walk of length k consists of a (vi , vr )-walk of length k 1 for some r followed by the edge vr vj . Now arj = 1 or 0 according as vr is adjacent to vj or not. Now, by induction hypothesis, the number of (vi , vr )walks of length k 1 is the (i, r)th entry bir of matrix Ak1 . Hence the total number of (vi , vj )-walks of length k is,
n

bir arj = cij .


r =1

Hence the result is true for A . The next theorem uses the above result to determine whether or not a graph is connected. Theorem 6.3.4: Let G be a graph with n vertices v1 , . . . , vn and let A be the adjacency matrix of G. Let B = (bij ) be the matrix given by, B = A + A2 + . . . + An1 .

Chapter 6 Graph Theory

336

Then G is connected if and only if for every pair of distinct indices i, j , bij = 0; that is, G is connected if and only if B has no zero elements o the main diagonal. Proof. Let aij denote the (i, j )th entry of Ak (k = 1. . . . , n 1). We then have, bij = aij + aij + . . . + aij
(k ) (1) (2) (n1) (k )

By Theorem 6.3.3, aij denotes the number of distinct walks of length k from vi to vj . Thus, bij = (number of dierent (vi , vj )-walks of length 1) + (number of dierent (vi , vj )-walks of length 2) +

+ (number of dierent (vi , vj )-walks of length n 1) In other words, bij is the number of dierent (vi , vj )-walks of length less than n. Assume that G is connected. Then for every pair i, j (i = j ) there is a path from vi to vj . Since G has only n vertices, any path is of length at most n 1. Hence there is a path of length less than n from vi to vj . This implies that bij = 0. Conversely, assume that bij = 0 for every pair i, j (i = j ). Then from the above discussion it follows that there is at least one walk of length less than n, from vi to vj . This holds for every pair i, j (i = j ) and therefore we conclude that G is connected. Exercise 6.3.5: Let A be the adjacency matrix of a connected graph G with n vertices; is it

Chapter 6 Graph Theory

337

always necessary to compute up to An1 to conclude that G is connected? Justify your answer. Let G be the simple connected graph shown in Fig. 6.13. 1 3

2 Figure 6.13:

We label the vertices of G so that the edge {1, 2} is necessarily present in G. The graph G can now be regarded as a collection of edges from the lexicographically ordered sequence. {1, 2} {1, 3} {1, 4} {2, 3} {2, 4} {3, 4} Indicating the presence or absence of an edge in G by 1 or 0 respectively, the above sequence yields the binary string, 1 0 0 1 1 1

which represents the number 39 in the decimal system. Thus we can uniquely represent a graph by a number.

6.4
6.4.1

Basic Ideas in Connectivity of Graphs


Some Graph Operations

Let G = (V, E ) be a graph. We dene graph operations that lead to new graphs from G.

Chapter 6 Graph Theory (a) Edge addition: Let uv E (G). Then the graph, G + uv = V (G), E (G) {uv } (b) Edge deletion: Let e E (G). Then the graph G e = V (G), E (G) \ {e} (c) Vertex deletion: Let v be a vertex of G. Then the graph G v = V (G) \ {v }, {e E (G)|e not incident with v } i.e., we delete the vertex v and all edges having v as an endvertex. (d) Edge subdivision: G%e = V (G) {z }, E (G) \ xy xz, zy

338

where e = xy E (G) and z / V (G) is a new vertex; we insert a new vertex z on the edge xy . (e) Deletion of a set of vertices: Let W V (G). The graph G \ W is the graph obtained from G by deleting all the vertices of W as also each edge that has at least one vertex in W . The above operations are illustrated in Fig 6.14.

Chapter 6 Graph Theory u w G + uv v

339

u w G

v e w

Ge

Gw

G% e

G \W, where W = {w, w } Figure 6.14:

6.4.2

Vertex Cuts, Edge Cuts and Connectivity

Denition 6.4.1: In a connected graph G, a subset V V (G) of is a vertex cut of G if G V is disconnected. It is a k -vertex cut if |V | = k . V is called a separating set of vertices of G. A vertex v of a connected graph G is a cut vertex of G if {v } is a vertex cut of G.

Chapter 6 Graph Theory Denition 6.4.2:

340

Let G be a nontrivial graph. Let S be a proper nonempty subset of V . Let ] denote the set of all edges of G having one endpoint in S and the other [S, S . A set of edges of G of the form [S, S ] is called an edge cut of G. An in S edge e E (G) is a cut edge of G if {e} is an edge cut of G. An edge cut of cardinality k is called a k -edge cut of G. If e is a cut edge of a connected graph, then G e has exactly two components. Example 6.4.3: Consider the graph in Fig. 6.15. w u v y Figure 6.15: {v } and {w, x} are vertex cuts. The edge subsets, wy, xy , uv edges. The following theorem characterizes a cut vertex of G. Theorem 6.4.4: A vertex v of a connected graph G with at least three vertices is a cut vertex of G if and only if there exist vertices u and w of G, distinct from v , such that v is in every (u, w)-path in G. Proof. If v is a cut vertex G, then G v is disconnected. Let G1 and G2 be the two components of G \ v . Choose u, w such that u V (G1 ) and and z x

{xz } are all edge cuts. Vertex v is a cut vertex. Edges uv and xz are cut

Chapter 6 Graph Theory

341

w V (G2 ). Then every (u, w)-path in G must contain v as otherwise u and w would belong to the same component of G v . Conversely, assume that the condition of the theorem holds. Then the deletion of v splits every (u, w)-path in G and hence u and w lie in dierent components of G v . So G v is disconnected and therefore v is a cut vertex of G. The following two theorems characterize a cut edge of a graph. Theorem 6.4.5: In a connected graph G, an edge e = uv of is a cut edge of G if and only if e does not belong to any cycle of G. ] = {e} be the partition Proof. Let e = uv be a cut edge of G and let [S, S

. If e belongs to a of V (G) dened by G e so that u S and v S

] must contain at least one more edge contradicting that cycle of G then [S, S

]. Hence e cannot belong to a cycle. {e} = [S, S Conversely assume that e is not a cut edge of G. Then G e is connected and hence there exists a (u, v )-path P in G e. Then P together with the edge e forms a cycle in G. Theorem 6.4.6: In a connected graph G, an edge e = uv is a cut edge if and only if there exist vertices x and y such that e belongs to every (x, y )-path in G. Proof. Let e = uv be a cut edge in G. Then G e has two components, say G1 and G2 . Let x V (G1 ) and y V (G2 ). Then there is no (x, y )-path in G e. Hence every (x, y )-path in G must contain e.

Chapter 6 Graph Theory

342

Conversely, assume that there exist vertices u and v satisfying the condition of the theorem. Then there exists no (x, y )-path in G e and this means that G e is disconnected. Hence e is a cut edge of G Exercise 6.4.7: Prove or disprove: Let G be a simple connected graph with |V (G)| 3. Then G has a cut edge if and only if it has a cut vertex.

6.4.3

Vertex Connectivity and Edge-Connectivity

We next introduce two parameters of a graph which in a way, measure the connectedness of a graph. Denition 6.4.8: Let G be a nontrivial connected graph having at least a pair of non-adjacent vertices. The minimum k for which there exists a k -vertex cut is called the vertex connectivity or simply the connectivity of G; it is denoted by (G). If G has no pair of nonadjacent vertices, that is if G is a complete graph of order n, then (G) is dened to be n 1. Note that the removal of any set of n 1 vertices of Kn results in a K1 . A subset of vertices or edges of a connected graph G is said to disconnect the graph if its deletion results in a disconnected graph. Denition 6.4.9: The edge connectivity of a connected graph G is the smallest k for which there exists a k -edge cut ( i.e., an edge cut containing k edges). The edge connectivity of G is denoted by (G).

Chapter 6 Graph Theory

343

If (G) is the connectivity of a graph G, then there exists a set of (G) edges whose deletion results in a disconnected graph and no subset of edges in G of size less than (G) has this property. Thus we have the following denition Denition 6.4.10: A graph G is r-connected if (G) r. G is r-edge connected if (G) r. The parameters (G), (G) and (G) are related by the following inequalities. Theorem 6.4.11: For a connected graph G, (G) (G) (G) Proof. By our denition of (G) and (G), (G) 1, 1 and (G) 1. Let E be an edge cut of G with (G) edges and let uv E . For each edge of E that does not have both u and v as endpoints, remove an endpoint that is dierent from u and v . If there are t such edges, at most t vertices would have been removed. If the resulting graph is disconnected, then (G) t (G). Otherwise, there will remain a subset of edges of E having u and v as end vertices, the removal of which will disconnect the graph. Hence additional removal of one of u or v will disconnect the graph or a trivial graph will result. In the process, a set of at most t + 1 vertices would have been removed and so (G) t + 1 (G). Finally it is clear that (G) (G); in fact, if v is a vertex of G with dG (v ) = (G), then the set of (G) edges incident with v forms the edge cut [{v }, V \ {v }] of G. Thus (G) (G).

Chapter 6 Graph Theory

344

We now characterize 2-connected graphs. Two paths from a vertex u to another vertex v (u = v ) are internally disjoint if they have no common vertex except the vertices u and v . The following theorem due to Whitney characterizes 2-connected graphs. Theorem 6.4.12: A graph G with at least three vertices is 2-connected if and only if there exists a pair of internally disjoint paths between any pair of distinct vertices. Proof. For any two distinct vertices u, v in G, assume that G has at least two internally disjoint (u, v )-paths. Let w be any vertex of G. Then w is not a cut vertex of G. If not, by Theorem 6.4.12, there exist vertices u and v of G such that every (u, v )-path in G contains w. Hence w is not a cut vertex of G and therefore G is 2-connected. Conversely, assume that G is 2connected. We apply induction on d(u, v ) to prove that G has two internallydisjoint (u, v )-paths. When d(u, v ) = 1, the graph G uv is connected, since (G) (G) 2. Any (u, v )-path in G uv is internally disjoint in G from the (u, v )-path consisting of the edge uv . Thus u and v are connected by two internally-disjoint paths in G. Now we apply induction on d(u, v ). Let d(u, v ) = k > 1 and assume that G has internally-disjoint (x, y )-paths whenever 1 d(x, y ) < k . Let w be the vertex appearing before v on a shortest (u, v )-path. Since d(u, w) = k 1, by induction hypothesis, G has two internally disjoint (u, w)-paths, say P and Q. (see Fig. 6.16).

Chapter 6 Graph Theory P z u w Q v R

345

Figure 6.16: Since G w is connected, a (u, v )-path, say R, must exist in G w. If R avoids P or Q, we are done; otherwise let z be the last vertex of R belonging to P Q. (see Fig 6.16). Without loss of generality, assume that z P . We combine the (u, z )-subpath of P (with (z, v )-section of R) to obtain a (u, v )-path internally-disjoint from the (u, v )-path Q (w, v )-path in G. We next give further characterizations of 2-connected graphs. Lemma 6.4.13 (Expansion Lemma): Let G be a k -connected graph and let G be obtained from G by adding a new vertex x and making it adjacent to at least k vertices of G. Then G is also k -connected. Proof. Let S be a separating set of G . We have to show that |S | k . If x S , then S \ {x} separates G and so |S {x}| k . Hence |S | k + 1. If x / S , then if S separates G, then |S | k . Otherwise, S does not separate G but separates G and so the neighbor set of x in G is contained in S. Theorem 6.4.14: For a graph G with |V (G)| 3, the following conditions characterize 2-

Chapter 6 Graph Theory connected graphs and are all equivalent. (a) G is connected and has no cut vertex.

346

(b) For all u, v V (G), u = v , there are two internally disjoint (u, v )-paths. (c) For all u, v V (G), u = v , there is a cycle through u and v (d) (G) 2 and every pair of edges in G lies on a common cycle. Proof. Equivalence of (a) and (b) follows from Theorem 6.4.12. Any cycle containing vertices u and v corresponds to a pair of internally disjoint (u, v )paths. Therefore (b) and (c) are equivalent. Next we shall prove that (d) (c). Let x, y V (G) be any two vertices in G. We consider edges of the type ux and uy (since (G) > 1) or ux and wy . By (d) these edges lie on a common cycle and hence x and y lie on a common cycle (see Fig. 6.17). x u u y w y

Figure 6.17: Therefore (d) (c). For proving (c) (d), suppose that G is 2-connected and let uv, xy E (G). We add to G, the vertices w with neighborhood {u, v } and z with

Chapter 6 Graph Theory

347

neighborhood {x, y }. By the Expansion Lemma, the the resulting graph G is 2-connected and hence w, z lie on a common cycle C in G . Since w, z each have degree 2, this cycle contains the paths u, w, v and x, z, y but not uv and xy . We replace the paths u, w, v and x, z, y in C by the edges uv and xy to obtain a desired cycle in G. We conclude this section after introducing the notion of a block. Denition 6.4.15: A graph G is nonseparable if it is nontrivial, connected and has no cut vertices. A block of a graph is a maximal nonseparable subgraph of G. Note that if G has no cut vertices then G itself is a block. Example 6.4.16: A graph G is shown in Fig 6.18.(a) and Fig 6.18.(b) shows its blocks B1 , B2 , B3 and B4 .

Chapter 6 Graph Theory b a c b a c (b) Blocks of G Figure 6.18: h d d f e g

348

d (a) Graph G

f h g e f f

Let G be a connected graph with |V (G)| 3. The following statements are straightforward. 1 Each block of G with at least three vertices is a 2-connected subgraph of G. 2 Each edge of G belongs to one of its blocks and hence G is the union of its blocks. 3 Any two blocks of G have at most one vertex in common; such a vertex, if it exists, is a cut vertex of G. 4 A vertex of G that is not a cut vertex belongs to exactly one of its blocks. 5 A vertex of G is a cut vertex of G if and only if it belongs to at least two blocks of G.

Chapter 6 Graph Theory

349

6.5
6.5.1

Trees and their properties


Basic Denition

The notion of a tree in graph theory is a simple and fundamental concept with important applications in Computer Science. Denition 6.5.1: A tree is a connected graph that has no cycle (that is, acyclic). A forest is an acyclic graph; each component of a forest is a tree. Fig. 6.19 shows an arbitrary tree. The graphs of Fig 6.20 show all (unlabeled) trees with at most ve vertices.

Figure 6.19: An arbitrary tree.

Chapter 6 Graph Theory

350

1 vertex 2 vertices 3 vertices

4 vertices

5 vertices Figure 6.20: Trees with at most ve vertices. A tree can be characterized in dierent ways. We do this in the next theorem. Theorem 6.5.2 (Tree characterizations): Given a graph G = (V, E ), the following conditions are equivalent: (i) G is a tree. (ii) There is a unique path between any pair of vertices in G. (iii) G is connected and each edge of G is a cut edge. (iv) G is acyclic and the graph formed from G by adding an edge (that is, a graph of the form G + e where e joins a pair of nonadjacent vertices of G) contains a unique cycle. (v) G is connected and |V (G)| = |E (G)| + 1.

Chapter 6 Graph Theory Before proving Theorem 6.5.2, we rst prove two lemmas. Lemma 6.5.3: Any tree with at least two vertices contains at least two leaves.

351

Proof. Given a tree T = (V, E ), let P = (v0 , e1 , v1 , . . . , ek , vk ) be a longest path of T . Clearly, length of P is at least 1 and so v0 = vk . We claim that both v0 and vt are leaves. This is done by contradiction. If v0 is not a leaf, then there exists an edge e = v0 v where e = e1 . Then either v is in P or v is not in P : if v is in P (that is, v = vi , i 2) then the edge e together with the section of the path P from v0 to vi forms a cycle in T , a contradiction to the fact that T is acyclic. If v is not in P , then that would contradict the choice of P . Lemma 6.5.4: Let v be a leaf in a graph G. Then G is a tree if and only if G v is a tree. Proof. We rst assume that G is a tree. Let x, y be two vertices of G v . Since G is connected x and y are connected by a path in G. This path contains only two vertices, viz., x and y , of degree 1; so it does not contain v . Consequently it is completely contained in G v and hence G v is connected. As G is acyclic, G v is also acyclic and so it is a tree. We now assume that G v is a tree. As v is a leaf, there exists an edge uv incident on it. Clearly, u (G v ) and (G v ) {uv } has no cycle. As G v is connected (because it is a tree), (G v ) {uv } = G is connected. We now prove Theorem 6.5.2

Chapter 6 Graph Theory

352

Proof of Theorem 6.5.2. We prove that each of the statements (ii) through (v) is equivalent to statement (i). The proofs go by induction on the number of vertices of G, using Lemma 6.5.4. For the induction basis, we observe that all the statements (i) through (v) are valid if G contains a single vertex only. We rst show that (i) implies all of (ii) to (v). Let G be a tree with at least two vertices and let v be a leaf and let v be the vertex adjacent to v in G. By the induction hypothesis, we assume that G v satises (ii) to (v). Now the validity of (ii), (iii) and (v) for G is obvious. For (iv), since G is connected, any two vertices x, y V (G) is connected by a path P and if xy / E (G), then P + xy creates a cycle. Therefore (i) implies (iv) as well. We now prove that each of the conditions (ii) to (v) implies (i). In (ii) and (iii) we already assume connectedness. Also, a graph satisfying (ii) or (iii) cannot contain a cycle: for (ii), this is because two vertices in a cycle are connected by two distinct paths and for (iii), the reason is that by omitting an edge in a cycle we obtain a connected graph. Thus (ii) implies (i) and (iii) also implies (i). To verify that (iv) implies (i), it suces to check that G is connected. If x, y V (G), then either xy E (G) or the graph G + xy contains a unique cycle. Necessarily, as G is a cyclic, this cycle must contain the edge xy . Now removal of the edge xy from this cycle give a path from x to y in G. Thus G is connected. We nally prove that (v) implies (i). Let G be a connected graph satisfying |V (G)| = |E (G)| + 1 2. The sum of the degrees of all vertices is 2|V (G)| 2. This means that not all vertices can have degree 2 or more. Since all degrees are at least 1 (by connectedness) there exists a vertex v of degree exactly 1, that is, a leaf of G. The graph G = G v is

Chapter 6 Graph Theory

353

again connected and it satises |V (G )| = |E (G )| + 1. Hence it is a tree by the induction hypothesis and thus G is a tree as well.

6.5.2

Sum of distances from a leaf of a tree

Denition 6.5.5: Let G be a graph and let u, v be a pair of vertices connected in a path in G. Then the distance from u to v , denoted by dG (u, v ) or simply d(u, v ), is the least length (in terms of the number of edges) of a (u, v )-path in G. If G has no (u, v )-path, we dene d(u, v ) = . Denition 6.5.6: The diameter of a connected graph G (denoted by diam (G)) is dened as the maximum of the distances between pairs of vertices of G. In symbols, diam (G) = max d(u, v ).
u,v V (G)

Since in a tree any two vertices are connected by a unique path, the diameter of a tree T is the length of a longest path in T . We next prove a theorem that gives a bound on the sum of the distances of all the vertices of a tree T from a given vertex of T . Theorem 6.5.7: Let u be a vertex of a tree T with n vertices. Then, d(u, v ) n . 2

v V (G)

Proof. The proof goes by induction on the number of vertices of G. The result holds trivially for n = 2. Let n > 2. The graph T u is a forest with

Chapter 6 Graph Theory

354

components T1 , . . . , Tk , where k > 1. As T is connected, u has a neighbor in each Ti ; also, since T has no cycles, u has exactly one neighbor say vi , in each Ti (the situation is shown in Fig. 6.21)

T1

v1

u v3 v2 T2 T3

Figure 6.21: If v V (Ti ), then the unique (u, v )-path in T passes through vi and we have, dT (u, v ) = 1 + dTi (vi , v ) Letting ni = |Ti |, we obtain dT (u, v ) = ni +
v V (Ti ) v V (Ti )

dTi (vi , v )

By the induction hypothesis, dTi (vi , v ) ni 2

v V (Ti ) k

if we now sum the formula for distances from u over all the components of T u, we obtain (as
i=1

ni = n 1), dT (u, v ) (n 1) + ni 2

v V ( T )

Chapter 6 Graph Theory


k

355 since ni = n 1, and the right hand side

We note that
i=1

ni 2

n1 2

counts the edges in Kn1 and the left hand side counts the edges in a subgraph of Kn1 (a disjoint union of cliques). Hence we have, dT (u, v ) (n 1) + n1 2 = n 2

v V (T )

6.6
6.6.1

Spanning Tree
Denition and Basic Results

We now introduce the notion of a spanning tree. Denition 6.6.1: A spanning subgraph of a graph G is a subgraph H of G such that V (H ) = V (G). A spanning tree of G is a spanning subgraph of G which is a tree. It is not dicult to show that every connected graph has a spanning tree. Note that not every spanning subgraph is connected and a connected subgraph of G need not be a spanning subgraph. Exercise 6.6.2: Prove that a graph G is connected if and only if it has a spanning tree. By labeling the vertices of K4 , we can easily see that it has 16 dierent spanning trees (see Fig. 6.22). However, K4 has only two non-isomorphic unlabeled spanning trees namely K1,3 and P4 .

Chapter 6 Graph Theory 1 2 1 2 1 2 1 2

356

4 1

3 2

4 1

3 2

4 1

3 2

4 1

3 2

4 1

3 2

4 1

3 2

4 1

3 2

4 1

3 2

4 1

3 2

4 1

3 2

4 1

3 2

4 1

3 2

Figure 6.22: We state a general theorem due to the English mathematician Arthur Caley without proof. Theorem 6.6.3 (A. Caley (1889)): The complete graph Kn has nn2 dierent labeled spanning trees. Corollary 6.6.4 (to Theorem 6.5.7): If u is a vertex of a connected graph G of order n then d(u, v ) n 2

v V (G)

Chapter 6 Graph Theory

357

Proof. Let T be a spanning tree of G. Every (u, v )-path in T also appears in G (G may have other (u, v )-paths that are shorter than those in T ). Therefore dG (u, v ) dT (u, v ), for every vertex v . This implies
v V (G)

dG (u, v )
v V (G)

dT (u, v )
v V (G) n 2

But Theorem 6.5.7, we have Hence

dT (u, v )

. n 2

v V (G)

dG (u, v )

Exercise 6.6.5: Let G be a graph with exactly one spanning tree. Prove that G is a tree. Exercise 6.6.6: If = k , then show that a graph G has a spanning tree with at least k leaves. The sum of the distances over all pairs of distinct vertices a graph G is known as the Wiener Index of G, denoted by W (G). Thus W (G) =
u,v V (G)

d(u, v ).

Chemical molecules can be modeled as graphs by treating the atoms as vertices and the atomic bonds as edges. Wiener index was originally introduced by Harold Wiener who observed correlation between this index and the boiling point of parans. Consider the path Pn of n vertices labeled 1 through n wherein vertex i to connected to vertex i 1 (1 < i n). W (Pn1 ) 1 2 3 4 (n 2) (n 1) Figure 6.23: W (Pn ) n

Chapter 6 Graph Theory From Fig. 6.23 it easily follows that W (Pn ) = W (Pn1 ) + yielding Pn = Exercise 6.6.7: Prove the following: (i) W (Kn ) =
n 2 n+1 3

358

n(n 1) 2

(ii) W (Km,n ) = (m + n)2 mn m n. (iii) W (C2n ) = (2n)3 /8. (iv) W (C2n+2 ) = (2n + 2)(2n + 1)(2n)/8. (v) Wiener index of the Petersen Graph is 75. We next give an algorithm for nding a spanning tree of a graph.

Algorithm SP-TREE
Let G = (V, E ) be the given graph, where |V (G)| = n and |E (G)| = m. Let (e1 , e2 , . . . , em ) be the edge sequence of G labeled in some way. We now successively construct the subsets E0 , E1 , . . . of G. Set E0 = . From Ei1 , the set Ei is obtained as follows: Ei1 {ei }, if the graph (V (G), Ei1 {ei }) has no cycle Ei = Ei1 , otherwise

We stop if Ei has already n 1 edges or if i = m (i.e., all edges have been considered).

Chapter 6 Graph Theory We prove that algorithm SP-TREE is correct. Proposition 6.6.8:

359

If algorithm SP-TREE outputs a graph T with n 1 edges, then T is a spanning tree of G. If T has k < (n 1) edges, then G is a disconnected graph with n k components. Proof. From the way the sets Ei are constructed the graph T contains no cycle. If k = |E (T )| = n 1, then by Theorem 6.5.2 (v), T is a tree and hence it is a spanning tree. If k < n 1, then T is a disconnected graph whose every component is a tree. It is easy to reason that it has n k components. We prove that the vertex sets of the components of the graph T coincide with those of the components of the graph G. Assume the contrary and let x and y be vertices lying in the same component of G but in distinct components of T . Let C be the component of T containing the vertex x. Consider some path, (x = x0 , e1 , x1 , e2 , . . . , ek , xk = y ) from x to y in G as shown in Fig. 6.24.

e x C xi y

Figure 6.24: Let i be last index for which xi is contained in the component C . Obvi-

Chapter 6 Graph Theory

360

ously i < k and have xi+1 / C . The edge e = xi xi+1 thus does not belong to T and so it had to form a cycle with some edges already selected into T at some stage of the algorithm. Therefore the graph T + e also contains a cycle; but this is impossible as e connects two distinct components of T . This gives the desired contradiction.

6.6.2

Minimum Spanning Tree

The design of electronic circuits using integrated chips often requires that the pins of several components be at the same potential. This is achieved by wiring them together. To interconnect a set of n pins, we can use an arrangement of n 1 wires, each connecting two points. Of the various arrangements on a circuit board, the one that uses the least amount of wire is desirable. The above wiring problem can be modeled thus: we are given a connected graph G = (V (G), E (G)), where V (G) corresponds to the set of pins and E (G) corresponds to the possible interconnections. Associated with each edge uv E (G), we have a weight w(u, v ) specifying the cost (amount of wire needed) to connect u and v . We then wish to nd a spanning tree T (V, E (T )) of G whose total weight, W (E (T )) =
u v E ( T )

w(u, v )

is minimum. Such a tree T is called a minimum spanning tree (MST) of the graph G. We now present what is popularly known as Prims algorithm (alternatively called Jarniks algorithm) for nding the MST of a graph.

Chapter 6 Graph Theory

361

6.6.3

Algorithm PRIM

Let the input graph be G = (V, E ), with |V (G)| = n, |E (G)| = m. We will successively construct the sets V0 , V1 , V2 , . . . V of vertices and the sets E1 , E1 , E2 , . . . E of edges as given below. Let E0 = and V0 = {v }, where v is an arbitrary vertex. Having constructed Vi1 and Ei1 , we next nd an edge ei = xi yi E (G) such that, (i) xi Vi1 and yi V \ Vi1 and (ii) ei is an edge of minimum weight. We now set, Vi = Vi1 {yi } and Ei = Ei1 {ei }. If no such edge exists, the algorithm terminates. Let Et denote the set for which the algorithm has stopped. The algorithm outputs the graph T = (V, Et ) as the MST. Proof of correctness of Algorithm PRIM. Let T = (V, Et ) be the output of algorithm PRIM. Let the edges of Et be numbered e1 , e2 , . . . , en1 in the order in which they have been added to T by the algorithm. Assume that T is not a MST; let T be some MST. Let k (T ) denote the index for which all the edges e1 , e2 , . . . , ek E (T ) but ek+1 / E (T ). Among all MSTs, we select the one which has the maximum value k . Let this MST be T = (V, E ). Dene k = k (T ). Let us now consider the stage in the algorithms execution when the edge ek+1 has been added to T . Let Tk = (Vk , Ek ) be the tree formed by the addition of the edges e1 , . . . , ek . Then ek+1 = xy where x V (Tk ) and y / V (Tk ). Consider the graph T + ek+1 . This graph contains some cycle C -such a cycle necessarily contains the edge ek+1 .

Chapter 6 Graph Theory

362

The cycle C consists of the edge ek+1 = xy , plus a path, say P connecting the vertices x and y in the spanning tree T . At least one edge of the path P has one vertex in the set Vk and the other vertex not in Vk . Let e be such an edge. Obviously e is dierent from ek+1 (see Fig. 6.25) and also e E and ek+1 / E.

x ek+1 y

edges in P Figure 6.25: Both the e and ek+1 connect a vertex of Vk with a vertex outside Vk and by the edge selection rule in the algorithm we get w(ek+1 ) w(e).

+ eK +1 ) e. This graph has n 1 edges Now consider the graph T = (T

and is connected as can be easily seen. Hence it is a spanning tree. Now we ) w(e) + w(eK +1 ) w(E ) and thus T is an MST as have w(E (T )) = w(E hence T must be a MST. We next present another elegant algorithm to nd the MST of a weighted graph. This is Kruskals algorithm.

). This is a contradiction to the choice of T and well, but with k (T ) > k (T

Chapter 6 Graph Theory

363

6.6.4

Algorithm KRUSKAL

Given a weighted graph G, let us denote the edges of G by the sequence {e1 , e2 , . . . , em } where w(e1 ) w(e2 ) , . . . , w(em ). For the above ordering of edges, execute algorithm SP-TREE. Exercise 6.6.9: Prove that algorithm KRUSKAL does produce an MST.

6.7
6.7.1

Independent Sets and Vertex Coverings


Basic Denitions

Denition 6.7.1: Recall (see Section 6.2.1 on page 319) that an independent set of a graph G is a subset S V (G) such that no two vertices of S are adjacent in G. S is a maximum independent set of G if G has no independent set S with |S | > |S |. A maximal independent set of G is an independent set that is not a proper subset of an independent set of G.

Chapter 6 Graph Theory p u v t s Figure 6.26: r q

364

In Fig 6.26, {v } and {p, q, r, s, t, u} are both maximal independent sets. The latter set is also a maximum independent set. Denition 6.7.2: A subset K V (G) is called a covering of G if every edge of G is incident with at least one vertex of K . A covering K is minimum if there is no covering K of G such that |K | < |K |; it is minimal if there is no covering K of G such that K is a proper subset of K . u

y z x w

Figure 6.27: Wheel W5 In the graph W5 of Fig 6.27, {u, v, w, x, y } is a covering of W5 ; {u, w, x, z } is a minimal covering.

Chapter 6 Graph Theory Theorem 6.7.3:

365

In a graph G = V (G), E (G) , a subset S V (G) is independent if V (G) \ S is a covering of G. Proof. In proof, we note that S is independent if and only if no two vertices of S are adjacent in G. Hence every edge of S must be incident to a vertex of V (G) \ S . Therefore V (G) \ S is a covering of G. Denition 6.7.4: The number of vertices in a maximum independent set of G is called the independence number of G and is denoted by (G). The number of vertices in a minimum covering of G is the covering number of G and is denoted by (G). Corollary 6.7.5: For a graph G of order n, (G) + (G) = n. Proof. Let S be a maximum independent set of G. By Theorem 6.7.3, V (G) \ S is a covering of G and therefore |V (G) \ S | = n (G) (G) or n (G)+ (G). Similarly, let K be a minimum covering of G. Then V (G) \ K is an independent set and so |V (G) \ K | = n (G) (G) or n (G)+ (G). The two inequalities together imply that (G) + (G) = n. Exercise 6.7.6: Prove that the smallest possible maximal independent set in a d-regular graph is n/(d + 1). Exercise 6.7.7:

Chapter 6 Graph Theory

366

Show that a graph G having all degrees at most d satises the following inequality: (G) V (G) d+1

6.8
6.8.1

Vertex Colorings of Graphs


Basic Ideas

We begin with a simple problem, known as the storage problem. The Chemistry department of a college wants to store various chemicals so that incompatible chemicals (two chemicals are incompatible when they cause a violent reaction when brought together) are stored in dierent rooms. The college is interested in knowing the minimum number of rooms required to store all the chemicals so that no two incompatible chemicals are in the same room. To solve the above problem, we form a graph G = V (G), E (G) , where V (G) corresponds to the chemicals and uv E (G) if and only if the chemicals corresponding to u and v are incompatible. Then any set of compatible chemicals corresponds to an independent set of G. Thus a safe storing scheme of chemicals corresponds to a partition of V (G) into independent subsets of G. The cardinality of such a minimum partition of V (G) is then the required number of rooms. This minimum cardinality is called the chromatic number of the graph G. Denition 6.8.1: The chromatic number (G) of a graph G is the minimum number of independent subsets that partition the vertex set of G. Any such minimum

Chapter 6 Graph Theory partition is called a chromatic partition of V (G).

367

We can interpret the above described storage problem as a vertex coloring problem: we are asked to color the vertices of a graph G so that no two adjacent vertices receive the same color; that is, no two incompatible chemicals are marked by the same color. If we use the minimum number of colors, we have solved the problem. Denition 6.8.2: A k -coloring of a graph G is a labeling f : V (G) {1, . . . , k }. The labels are interpreted as colors; all vertices with the same color form a color class. A k -coloring f is proper if an edge uv is in E (G) i f (u) = f (v ). A graph G is k -colorable if it has a proper k -coloring. We will call the labels colors because their numerical value is not important. Note that (G) is then the minimum number of colors needed for a proper coloring of G. We also say G is k -chromatic to mean (G) = k . It is obvious that (Kn ) = n. Further (G) = 2 if and only if G is bipartite having at least one edge. We can also reason that (Cn ) = 2 if n is even and (Cn ) = 3 if n is odd. Exercise 6.8.3: Prove that (G) = 2 if and only if G is a bipartite graph with at least one edge. The Petersen graph, P has chromatic number 3. Fig. 6.28 shows a proper 3-coloring of P using three colors. Certainly, P is not 2-colorable, since it contains an odd cycle.

Chapter 6 Graph Theory 1 3 3 1 1 2 2 1 Figure 6.28: 3 2

368

Since each color class of a graph G is an independent set, we can see that, (G) |V (G)|/(G), where (G) is the independence number of G. Denition 6.8.4: A graph G is called critical if for every proper subgraph H of G, we have (H ) < (G). Also, G is called k -critical if it is k -chromatic and critical. The above denition holds for any graph. When G is connected it is equivalent to the condition that (G e) < (G) for each edge e E (G); but then this is equivalent to saying (G e) = (G) 1. If (G) = 1, then G is either trivial or totally disconnected. Hence G is 1-critical if and only if G is K1 . Also (G) = 2 implies that G is bipartite and has at least one edge. Hence G is 2-critical if and only if G is K2 . Exercise 6.8.5: Prove that every critical graph is connected. Exercise 6.8.6: Show that if G is k -critical, then for any v V (G) and e E (G), (G v ) = (G e) = k 1.

Chapter 6 Graph Theory

369

It is clear that any k -chromatic graph contains a k -critical subgraph. This can be seen by removing vertices and edges in succession, whenever possible, without decreasing the chromatic number. Theorem 6.8.7: If G is k -critical, then (G) k 1. Proof. Suppose (G) k 2. Let v be a vertex of minimum degree in the graph G. Since G is k -critical, (G v ) = (G) 1 = k 1 (refer Exercise 6.8.6). Hence in any proper (k 1)-coloring of G v , at most (k 2) colors alone would have been used to color the neighbors of v in G. Thus there is at least one color, say c, that is left out of these (k 1) colors. If v is given that color c, a proper (k 1)-coloring of G is obtained. This is impossible since G is k -chromatic. Hence (G) (k 1).

6.8.2

Bounds for (G)

We begin with the following corollary to Theorem 6.8.7 Corollary 6.8.8: For any graph (G) 1 + (G). Proof. Let G be a k -chromatic graph and let H be a k -critical subgraph of G. Then (H ) = (G) = k . By Theorem 6.8.7, (H ) k 1 and hence k 1 + (H ) 1 + (H ) 1 + (G). We can also show that the above result is implied by the greedy coloring

Chapter 6 Graph Theory algorithm given below.

370

Algorithm GREEDY-COLORING
Given V (G), let v1 , v2 , . . . , vn be a vertex ordering, where |V (G)| = n. Color the vertices in this order, assigning to vi the smallest-indexed label (color) not already used on its lower-indexed neighbors. In a vertex ordering, each vertex v has at most (G) neighbors prior to v in the ordering. So the algorithm GREEDY-COLORING cannot be forced to use more than (G)+1 colors. (reason: v has at most (G) vertices adjacent to it; for coloring these adjacent vertices we need (G) colors and a dierent color for coloring v itself). This constructively proves that (G) (G) + 1. If G is an odd cycle, then (G) = 3 = 2+1 = 1+(G); if G is a complete graph Kk , (G) = k = 1 + (k 1) = 1 + (G). That these are the only two extremal families of graphs for which (G) = 1 + (G) is asserted by the following theorem, which we state without proof. Theorem 6.8.9 (Brooks Theorem): If G is a connected graph other than a complete graph or an odd cycle, then (G) (G). Exercise 6.8.10: If (G) = k , then show that G contains at least k vertices each of degree at least k 1.

Chapter 7 Coding Theory


7.1 Introduction

Coding Theory has its origin in communication engineering. But with Shannons seminal paper of 1948 [?], it has been greatly inuenced by mathematics with a variety of mathematical techniques to tackle its problems. Algebraic coding theory uses a great deal of matrices, groups, rings, elds, vector spaces, algebraic number theory and, not to speak of, algebraic geometry. In algebraic coding, each message is regarded as a block of symbols taken from a nite alphabet. On most occasions, these are elements of Z2 = {0, 1}. Each message is then a nite string of 0s and 1s. For instance, 00110111 is a message. Usually, the messages get transmitted through a communication channel. It is quite possible that such channels are subjected to noises, and consequently, the messages changed. The purpose of an error correcting code is to add redundancy symbols to the message, based on, of course, on some rule so that the original message could be retrieved even though it is garbled. Any communication channel looks as in Figure 7.1. The rst box of the 371

Chapter 7 Coding Theory

372 Original message 1101

Message 1101

Encoder 1101001

Channel noise

Received message 0100001

Decoder 1101001

Figure 7.1: The Communication Channel channel indicates the message. It is then transmitted to the encoder which adds a certain number of redundancy symbols. In Figure 7.1, they are 001 which when added to the message 1101 gives the coded message 1101001. Because of channel noise, the coded message gets distorted and the received message in Fig. 7.1 is 00101001. This message then enters the decoder. The decoder applies the decoding algorithm and retrieves the coded message using the added redundancy symbols. From this, the original message is read o in the last box. The decoder has thus corrected two errors, that is, error in two places. The eciency of a code is the number of errors it can correct. A code is perfect if it can correct all of its errors. It is k -error-correcting if it can correct k or fewer errors. The aim of coding theory is to devise ecient codes. Its importance lies in the fact that erroneous messages could prove to be disastrous. It is relatively easier to detect errors than to correct them. Sometimes, even detection may prove to be helpful as in the case of a feedback channel, that is, a channel that has a provision for retransmission of the messages. Suppose the message 1111 is sent through a feedback channel. If a single error occurs and the received message is 0111, we can ask for a feedback twice and may get 1111 on both the occasions. We then conclude that the received message should have been 1111. Obviously, this method is not perfect as the

Chapter 7 Coding Theory 0 p q p 0

373

q Figure 7.2:

original message could have been 0011. On the other hand, if the channel is two-way, that is, it can detect errors so that the receiver knows the places where the errors have occurred and also contains the provision for feedback, then it can prove to be more eective in decoding the received message.

7.2

Binary Symmetric Channels

One of the simplest channels is the binary symmetric channel (BSC). This channel has no memory and it simply transmits two symbols 0 and 1. It has the property that the probability that a transmitted message is received correctly is q while the probability that it is not is p = 1 q . This is pictorially represented in Figure 7.2. Before considering an example of a BSC, we rst give the formal denition of a code. Denition 7.2.1: A code C of length n over a eld F is a set of vectors in F n , the space of ordered n-tuples over F . Any element of C is called a codeword of C. As an example, the set of vectors, C = {(10110), (00110), (11001), (11010)} is a code of length n over the eld Z2 . This code C has four codewords.

Chapter 7 Coding Theory

374

Suppose a binary codeword of length 4 (that is, a 4-digit codeword) is sent through a BSC with probability q = 0.9. Then the probability that the sent word is received correctly is q 4 = (0.9)4 = 0.6561. We now consider another code, namely, the Hamming (7, 4)-code. (See Section 7.5 below). This code has as its words the binary vectors (1000 001), (0100 011), (0010 010), (0001 111), of length 7 and all of their linear combinations over the eld Z2 . The rst four positions are information positions and the last three are the redundancy positions. There are in all 24 = 16 codewords in the code. We shall see later that this code can correct one error. Hence the probability that a received vector yields the transmitted vector is q 7 +7pq 6 , where the rst term corresponds to the case of no error and the term 7pq 6 corresponds to a single error in each of the seven possible positions. As q = 0.9, q 7 + 7pq 6 = 0.4783 + 0.3720 = 0.8503, which is quite large compared to the probability 0.6561 arrived at earlier in the case of a BSC. Hamming code is an example of a class of codes called Linear Codes. we now present some basic facts about linear codes.

7.3

Linear Codes

Denition 7.3.1: An [n, k ]-linear code C over a nite eld F is a k -dimensional subspace of V n , the vector space of ordered n-tuples over F . If F has q elements, that is, F = GF (q ), the [n, k ]-code will have q k codewords. The codewords of C are all of length n as they are n-vectors over

Chapter 7 Coding Theory F . k is the dimension of C. Denition 7.3.2:

375

A nonlinear code of length n over a eld F is just a subset of the vector space F n over F . C is a binary code if F = Z2 . A linear code C is best represented by any one of its generator matrices. Denition 7.3.3: A generator matrix of a linear code C over F is a matrix whose row vectors form a basis for C over F . If C is an [n, k ]-linear code over F , then a generator matrix of C is a k by n matrix G over F whose row vectors form a basis for C. For example, consider the binary code C1 with generator matrix 10011 G1 = 0 1 0 1 0 . 00101

Clearly, all the three row vectors of G1 are linearly independent over Z2 . Hence C1 has 23 = 8 codewords. The rst three columns of G1 are linearly independent over Z2 . Therefore, the rst three positions of any codewords of C1 may be taken as information positions and the remaining two as redundancy positions. In fact, the positions corresponding to any three linearly independent columns of G1 may be taken as information positions and the rest redundancies. Now any word of X of C1 is given by X = x1 R1 + x2 R2 + x3 R3 , (8.1)

Chapter 7 Coding Theory

376

where x1 , x2 , x3 are all in Z2 and R1 , R2 , R3 are the three row vectors of G1 in order. Hence by (8.1), X = (x1 , x2 , x3 , x1 + x2 , x1 + x3 ). If we take X = (x1 , x2 , x3 , x4 , x5 ), we have the relations x4 = x1 + x2 , x5 = x1 + x3 . and (8.2)

In other words, the rst redundancy coordinate of any codeword is the sum of the rst two information coordinates of that word while the next redundancy coordinate is the sum of the rst and third information coordinates. Equations (8.2) are the parity-check equations of the code C1 . They can be rewritten as x1 + x2 x4 = 0, x1 + x3 x5 = 0. In the binary case, equations (8.3) become x1 + x2 + x4 = 0, x1 + x3 x5 = 0. and (8.4) and (8.3)

In other words, the vector X = (x1 , x2 , x3 , x4 , x5 ) C1 i its coordinates satisfy (8.4). Equivalently, X C1 i it is orthogonal to the two vectors (11010) and (10101). If we take these two vectors as the row vectors of a matrix H1 , then H1 is the 2 by 5 matrix 11010 . H1 = 10101

H1 is called a parity-check matrix of the code C1 . The row vectors of H1

are orthogonal to the row vectors of G1 . (Recall that two vectors X =

Chapter 7 Coding Theory

377

(x1 , . . . , xn ) and Y = (y1 , . . . , yn ) of the same length n are orthogonal if their inner product (= scalar product < X, Y > = x1 y1 + + xn yn ) is zero). Now if a vector v is orthogonal to u1 , . . . , uk , then it is orthogonal to any linear combination of u1 , . . . , un . Hence the row vectors of H1 which are orthogonal to the row vectors of G1 are orthogonal to all the vectors of the row space of G1 , that is, to all the vectors of C1 . Thus C1 = X Z2 5 : H1 X t = 0 = Null space of the matrix H1 , when X t is the transpose of X . The orthogonality relations H1 X t = 0 give the parity-check conditions for the code C1 . These conditions x the redundancy positions, given the message positions of any codeword. A similar result holds good for any linear code. Thus any linear code over a eld F is either the row space of one of its generator matrices or the null space of its corresponding parity-check matrix. So far we have been considering binary linear codes. We now consider linear codes over an arbitrary nite eld F . As mentioned in Denition 7.3.1, an [n, k ] linear code C over F is a k -dimensional subspace of F n , the space of all ordered n-tuples over F . If {u1 , . . . , uk } is a basis of C over F , every word of C is a unique linear combination 1 u1 + + k uk , i F for each i.

Since i can take q values for each i, 1 i k , C has q q q (k times) = q k codewords. Let G be the k by n matrix over F having u1 , . . . , uk of F n as its row vectors. Then as G has k (= dimension of C) rows and all the k rows form a linearly independent set over F , G is a generator matrix of C. Consequently,

Chapter 7 Coding Theory

378

C is the row space of G over F . The null space of C is the space of vectors X F n which are orthogonal to all the words of C. In other words, it is the dual space C of C. As C is of dimension k over F , C is of dimension n k over F . Let {X1 , . . . , Xnk } be a basis of C over F . If H is the matrix whose row vectors are X1 , . . . , Xnk , then H is a parity-check matrix of C. It is an (n k ) by n matrix. Thus C = row space of G = null space of H = X F n : HX t = 0 . Theorem 7.3.4: Let G = (Ik |A) be a generator matrix of a linear code C over F , where Ik is the identity matrix of order k over F , and A is a k by (n k ) matrix over F . Then a generator matrix of C is given by H = At |Ink over F . Proof. Each row of H is orthogonal to all the rows of G since (by block multiplication, See(...), A GH t = [Ik |A] I = A + A = 0. nk Recall that in the example following Denition 7.3.3, k = 3, n = 5, and 11 G1 = (I3 |A) , where A = 1 0 01 while H1 = At |I2 = At |I2 over Z2 .

Chapter 7 Coding Theory Corollary 7.3.5:

379

G = [Ik |A] is a generator matrix of a code C of length n i H = [At |Ink ] is a parity-check matrix of C.

7.4

The Minimum Weight (Hamming Weight) of a Code

Denition 7.4.1: The weight W (v ) of a codeword v of a code C is the number of nonzero coordinates in v . The minimum weight of C is the least of the weights of its nonzero codewords. The weight of the zero vector of C is naturally zero. Example 7.4.2: As an example, consider the binary code C2 with generator matrix 10110 G2 = 0 1 1 0 1 = [I2 |A] . C2 has four codewords. Its three nonzero words are u1 = (10110),

u2 = (01101), and u3 = u1 + u2 = (11011). Their weights are 3, 3 and 4 respectively. Hence the minimum weight of C2 is 3. A parity-check matrix of C2 is, (refer Theorem 7.3.4) H2 = [At|I52 ] = At |I3 = Denition 7.4.3: Let X, Y F n . The distance d(X, Y ), also called the Hamming distance between X and Y , is dened to be the number of places in which X and Y dier. 11110 10010 01001 (as F = Z2 ).

Chapter 7 Coding Theory

380

If X and Y are codewords of a linear code C, over a eld F , then X Y is also in C and has nonzero coordinates only at the places where X and Y dier. Accordingly, if X and Y are words of a linear code C, then d(X, Y ) = wt (X Y ). We state this result as our next theorem. Theorem 7.4.4: The minimum distance of a linear code C is the minimum weight of a nonzero codeword of C. Thus for the linear code C2 of Example 7.4.2, the minimum distance is 3. The function d(X, Y ) dened in Equation (8.5) does indeed dene a distance function (that is, a metric) on C. That is to say, it has the following three properties: For all X, Y, Z in C, (i) d(X, Y ) 0, and d(X, Y ) = 0 i X = Y . (ii) d(X, Y ) = d(Y, X ). (iii) d(X, Z ) d(X, Y ) + d(Y, Z ). (8.5)

7.5

Hamming Codes

Hamming codes are binary linear codes. They can be dened either by their generator matrices or by their parity-check matrices. We prefer the latter. Let us start by dening the [7, 4]-Hamming code H3 . The seven column vectors of its parity-check matrix H are the binary representations of the

Chapter 7 Coding Theory

381

numbers 1 to 7 written in such a way that the last three of its column vectors form I3 , the identity matrix of order 3. Thus H= 1110100 1101010 . 1011001

The columns of H are the binary representations of the numbers 7, 6, 5, 3; 4, 2, 1 respectively. As H is of the form [At |I3 ], the generator matrix of H3 is given by 1 0 G = [I4 |A] = 0 0 0 1 0 0 0 0 1 0 0 0 0 1 1 1 1 0 1 1 0 1 1 0 1 . 1

H3 is of length 23 1 3 = 4, and dimension 4 = 7 3 = 23 1 3. What is the minimum distance of H3 ? One way of nding it is to list all the 24 1 nonzero codewords (See Theorem 7.4.4). However, a better way of determining it is the following. The rst row of G is of weight 4 while the remaining rows are of weight 3. The sum of any two or three of these row vectors as well as the sum of all the four row vectors of G are all of weight at least 3. Hence the minimum distance of H3 is 3.

7.6

Standard Array Decoding

We now write the coset decomposition of Z7 2 with respect to the subspace H3 .


7 7 (Recall that H3 is a subgroup of the additive group Z7 2 ). As Z2 has 2 vectors,

and H3 has 24 codewords, the number of cosets of H3 in Z2 is 27 /24 = 23 . (See ....). Each coset is of the form X + H3 = {X + v : v H3 }. Any two cosets are either identical or disjoint. The vector X is a representative of the cosetX + H3 . The zero vector is a representative of the coset H3 . If X and Y are each of weight 1, the coset X + H3 = Y + H3 , since X Y is of weight

Chapter 7 Coding Theory

382

at most 2 and hence does not belong to H3 (as H3 is of minimum weight 3). Hence the seven vectors of weight 1 in Z7 2 together with the zero vector dene 8 = 23 pairwise disjoint cosets exhausting all the 23 24 = 27 vectors of Z7 2 . These eight vectors (namely, the seven vectors of weight one and the zero vector) are called coset leaders. We now construct a double array (Figure 7.3) of vectors of Z7 2 with the cosets dened by the eight coset leaders (mentioned above). The rst row is the coset dened by the zero vector, namely, H3 . Figure 7.3 gives the standard array for the code H3 . If u is the message vector (that is, codeword) and v is the received vector, then v u = e is the error vector. If we assume that v has one or no error, then e is of weight 1 or 0. Accordingly e is a coset leader of the standard array. Hence to get u from v , we subtract e from v . In the binary case (as e = e), u = v + e. For instance, if in Figure 7.3, v = (1100 110), then v is present in the second coset for which the leader is e = (1000 000). Hence the message is u = v + e = (0100 110). This incidentally shows that H3 can correct single errors. However, if for instance, u = (0100 110) and v = (1000 110), then e = (1100 000) is of weight 2 and is not a coset leader of the standard array of Figure 7.3. In this case, the standard array decoding of H3 will not work as it would wrongly decode v as (1000 110) (0000 001) = (1000 111) H3 . (Notice that v is present in the last row of Figure 7.3). The error is due to the fact that v has two errors and not just one. Standard array decoding is therefore maximum likelihood decoding. The general Hamming code Hm is dened analogous to H3 . Its parity-check matrix H has the binary representations of the numbers 1, 2, 3, . . . , 2m 1 as its column vectors. Each such vector is a vector of

Chapter 7 Coding Theory

383

length m. Hence H is a m by 2m 1 binary matrix and the dimension of Hm is 2m 1 m = 1 (number of columns in H ) (number of rows in H )). In other words, Hm is a [2m 1, 2m 1 m] linear code over Z2 . Notice that the matrix H has rank m since H contains Im as a submatrix.

Chapter 7 Coding Theory

coset leader

C = H3

(0000000) (1000111) (0100110) (0010101) (0001011) (1100001) (1010010) (1001100) (0110011) (0101101) (0011010) (0111000) (1011001) (1101010) (1110100) (1111111) (1000000) (0000111) (1100110) (1010101) (1001011) (0100001) (0010010) (0001100) (1110011) (1101101) (1011010) (1111000) (0011001) (0101010) (0110100) (0111111) (0100000) (1100111) (0000110) (0110101) (0101011) (1000001) (1110010) (1101100) (0010011) (0001101) (0111010) (0011000) (1111001) (1001010) (1010100) (1011111) (0010000) (1010111) (0110110) (0000101) (0011011) (1110001) (1000010) (1011100) (0100011) (0111101) (0001010) (0101000) (1001001) (1111010) (1100100) (1101111) (0001000) (1001111) (0101110) (0011101) (0000011) (1101001) (1011010) (1000100) (0111011) (0100101) (0010010) (0110000) (1010001) (1100010) (1111100) (1110111) (0000100) (1000011) (0100010) (0010001) (0001111) (1100101) (1010110) (1001000) (0110111) (0101001) (0011110) (0111100) (1011101) (1101110) (1110000) (1111011) (0000010) (1000101) (0100100) (0010111) (0001001) (1100011) (1010000) (1001110) (0110001) (0101111) (0011000) (0111010) (1011011) (1101000) (1110110) (1111101)

Figure 7.3:

384

Chapter 7 Coding Theory H : 0 . . . 0 0 . . . 1 0 . . . 1 H : 1 0 0 1 . . . . . . 0 0

385

1 0 1 1 1 0

1 1

j k (a)

a b (b)

Figure 7.4: The minimum distance of Hm , m 2 is 3. This can be seen as follows. Re2 call that Hm = X Z2
m 1

: HX t = 0 . Let i, j, k denote respectively the

numbers of the columns of H in which the m-vectors (0. . . 011), (00. . . 0101) and (0. . . 0110) are present. (see Figure 7.4 (a)). Let v be the binary vector of length 2m 1 which has 1 in the i-th, j -th and k -th positions and zero at other positions. Clearly, v is orthogonal to all the row vectors of H and hence belongs to Hm . Hence Hm has a word of weight 3. Further, Hm has no word of weight 2 or 1. Suppose Hm has a word u of weight 2. Let a, b be the positions where u has 1 (Figure 7.4 (b)). As the columns of H are distinct, the a-th and b-th columns of H are not identical. It is clear that u is not orthogonal to all the row vectors of H (For instance, u is not orthogonal to the rst row of H ). Hence Hm has no word of weight 2. Similarly, it has no word of weight 1. Thus the minimum weight of Hm is 3.

Chapter 7 Coding Theory

386

7.7

Sphere Packings

As before, let F n denote the vector space of all ordered n-tuples over F . Recall (Section 7.4) that F n is a metric space with the Hamming distance between vectors of F n as the metric. Denition 7.7.1: In F n , the sphere with centre X and radius r is the set S (X, r) = {Y F n : d(X, Y ) r} F n . Denition 7.7.2: An r-error-correcting linear code C is perfect if the spheres of radius r with the words of C as centres are pairwise disjoint and their union is F n . The above denition is justied because if v is a received vector that has at most r errors, then v is at a distance at most r from a unique codeword u of C, and hence v belongs to the unique sphere S (u, r); then, v will be decoded as u. Theorem 7.7.3: The Hamming code Hm is a single-error-correcting perfect code. Proof. Hm is code of dimension 2m 1 m over Z2 and hence, has 2(2
m 1m)

words. Now if v is any codeword of Hm , S (v, 1) contains v (which is at distance zero from v ) and the 2m 1 codewords got from v (which is of length 2m 1) by altering each position once at a time. Thus S (v, 1) contains 1 + (2m 1) = 2m words of Hm . Consequently, the union of the spheres S (v, 1) as v varies over Hm contains 22
m 1m

2m vectors. But this number

Chapter 7 Coding Theory is = 22


m 1

387

= the number of vectors in F n , where n = 2m 1.

Out next theorem shows that the minimum distance d of a linear code stands as a good measure of the code. Theorem 7.7.4: If C is a linear code of minimum distance d, then C can correct t = or fewer errors. Proof. It is enough to show that the spheres of radius t centred at the codewords of C are pairwise disjoint. Indeed, if u and v are in C, and if z B (u, t) B (v, t) (See Figure 7.5), then z d1 2

Figure 7.5:

d(u, v ) d(u, z ) + d(z, v ) t + t = 2t d 1 < d, a contradiction to the fact that d is the minimum distance of C. For instance, for the Hamming code Hm , d = 3, and therefore t = d1 = 1, and so Hm is single error-correcting, as observed earlier in 2 Section 7.6.

Chapter 7 Coding Theory

388

7.8

Extended Codes

Let C be a binary linear code of length n. We can extend this code by adding an overall parity-check at the end. This means, we add a zero at the end of each word of even weight in C and add 1 at the end of every word of odd weight. This gives an extended code C of length n + 1. For looking at some of the properties of C , we need a lemma. Lemma 7.8.1: Let w denote the weight function of a binary code C. Then w(X + Y ) = w(X ) + w(Y ) 2(X Y ) where X Y is the number of common 1s in X and Y . Proof. Let X and Y have common 1s in the i1 , i2 , . . . , ip -th positions so that X Y = p. Let X have 1s in the i1 , . . . , ip and j1 , . . . , jq -th positions and Y in the i1 , . . . , ip and l1 , . . . , ir -th positions. Then w(X ) = p + q , w(Y ) = p + r and w(X + Y ) = q + r. The proof is now clear. Coming back to the extended code C , by denition, it is an even weight code, that is, every word of C is of even weight. In fact, if X and Y are in C , then the RHS of Equation (8.6) is even and therefore w(X + Y ) is even. A generator matrix of C is obtained by adding an overall parity-check to the rows of a generator matrix of C. Thus a generator matrix of the extended code H3 is

(8.6)

1 0 0 0

0 1 0 0

0 0 1 0

0 0 0 1

1 1 1 0

1 1 0 1

1 0 1 1

0 1 1 1

Chapter 7 Coding Theory

389

7.9

Syndrome Decoding

Let C be an [n, k ]-linear code over GF (q ) = F . The standard array decoding scheme requires storage of q n vectors of F n and also comparisons of a received vector with the coset leaders. The number of such comparisons is at most q nk , the number of distinct cosets in the standard array. Hence any method that makes a sizeable reduction in storage and the number of comparisons is to be welcomed. One such method is given by the syndrome-decoding scheme. Denition 7.9.1: The syndrome of a vector Y F n with respect to a linear [n, k ]-code over F with parity-check matrix H is the vector HY t . As H is an (n k ) by n matrix, the syndrome of Y is a column vector of length n k . We denote the syndrome of Y by S (Y ). For instance, the syndrome of Y = (1110 001) with respect to the 1 1 1111100 1 1101010 0 = 1 0 1 1 0 0 1 0 0 1 Theorem 7.9.2: Two vectors of F n belong to the same coset in the standard array decomposition of a linear code C i they have the same syndrome. Proof. Let u and v belong to the same coset a + C, a F n , of C. Then u = a + X and v = a + Y , where X, Y are in C. Then S (u) = S (a + X ) = Hamming code H3 is

1 0 . 1

Chapter 7 Coding Theory

390

H (a + X )t = Hat + HX t = S (a) (Recall that as X C, S (X ) = HX t = 0). Similarly S (v ) = S (a). Thus S (a) = S (b). Conversely, let S (u) = S (v ). Then Hut = Hv t and therefore H (u v )t = 0. This means that u v C, and hence the cosets u + C and v + C are equal. Theorem 7.9.2 shows that the syndromes of all the vectors of F n are determined by the syndromes of the coset leaders of the standard array of C. In case C is an [n, k ]-binary linear code, there are 2nk cosets and therefore the number of distinct syndromes is 2nk . Hence in contrast to standardarray decoding, it is enough to store 2nk vectors (instead of 2n vectors) in the syndrome decoding. For instance, if C is an [100, 30]-binary linear code, it is enough to store the 270 syndromes instead of the 2100 vectors in Z100 2 , a huge saving indeed.

7.10

Exercises

0 1 1 1 1. Find all the codewords of the binary code with generator matrix [ 1 1 0 0 1 1 ].

Find a parity-check matrix of the code. Write down the parity-check equations. 2. Show by means of an example that the syndrome of a vector depends on the choice of the parity-check matrix. 3. Decode the received vector (1100 011) in H3 using (i) the standard array decoding, and (ii) syndrome decoding.
7 4. How many vectors of Z7 2 are there in S (u, 3), where u Z2 ?

Chapter 7 Coding Theory

391

5. How many vectors of F n are there in S (u, 3), where u F n , and |F | = q . 6. Show that a t-error-correcting binary perfect [n, k ]-linear code satises the relation n n n + + + 0 1 t = 2nk .

More generally, show that a t-error-correcting perfect [n, k ]-linear code over GF (q ) satises the relation n n n + + + 0 1 t = q nk .

7. Show that there exists a set of eight binary vectors of length 6 such that the distance between any two of them is at least 3. 8. Show that it is impossible to nd nine binary vectors of length 6 such that the distance between any two of them is at least 3. 9. Show that the function d(X, Y ) dened in Section 7.4 is indeed a metric. d 10. Show that a linear code of minimum distance d can detect at most 2 errors.

Chapter 8 Cryptography
8.1 Introduction

Cryptography is the science of transmitting messages in a secured fashion. Naturally, it has become an important tool in this information age. It has already entered into our day-to-day life through e-commerce, e-banking etc., not to speak of defence. To make a message secure, the sender usually sends the message in a disguised form. The intended receiver removes the disguise and then reads o the original message. The original message of the sender is the plaintext and the disguised message is the ciphertext. The plaintext and the ciphertext are usually written in the same alphabet. The plaintext and the ciphertext are divided, for the sake of computational convenience, into units of a xed length. The process of converting a plaintext to a ciphertext is known as enciphering or encryption , and the reverse process is known as deciphering or decryption . A message unit may consist of a single letter or any ordered k -tuple, k 2. Each such unit is converted into a number in a suitable arith392

Chapter 8 Cryptography

393

metic and the transformations are then carried out on this set of numbers. An enciphering transformation f converts a plaintext message unit P (given by its corresponding number)into a number that represents the corresponding ciphertext message unit C while its inverse transformation, namely, the deciphering transformation just does the opposite by taking C to P . We assume that there is a 11 correspondence between the set of all plaintext units P to the set C of all ciphertext units. Hence each plaintext unit gives rise to a unique ciphertext unit and vice versa. This can be represented symbolically by P C , where C stands for the enciphering transformation. Such a set up is known as a cryptosystem .
f f 1

8.2
8.2.1

Some Classical Cryptosystem


Caesar Cryptosystem

One of the earliest of the cryptosystems is the Caesar cryptosystem attributed to Julius Caesar of 1st century. In this cryptosystem, the alphabet is the set of English characters A, B , C , . . . ,X , Y , Z labelled 0, 1, 2, . . . , 23, 24, 25 respectively so that 0 corresponds to A, 1 corresponds to B and so on, and nally 25 corresponds to Z . In this system, each message unit is of length 1 and hence consists of a single character. The encryption (transformation) f P C is given by f (a) = a + 3 (mod 26). (8.1)

Chapter 8 Cryptography A 0 N 13 B 1 O 14 C 2 P 15 D 3 Q 16 E 4 R 17 F 5 S 18 G 6 T 19 H 7 U 20 I 8 V 21 J 9 W 22 K 10 X 23 L 11 Y 24 M 12 Z 25

394

Figure 8.1: while the decryption (transformation) f 1 C P is given by f 1 (b) = b 3 (mod 26). (8.2)

Figure 8.1 gives the 11 correspondence between the characters A to Z and the numbers 1 to 25. For example, the word OKAY corresponds to the number sequence (14) (10) (0) (24) and this gets transformed, by eqn. (8.1), to (17)(13)(3)(1) and so the corresponding ciphertext is RNDB. The deciphering transformation applied to RNDB then gives back the message OKAY.

8.2.2

Ane Cryptosystem

Suppose we want to encrypt the message I LIKE IT. In addition to the English characters, we have in the message two spaces in between words. So we add space to our alphabet by assigning to it the number 26. We now do arithmetic modulo 27 instead of 26. Suppose, in addition, each such message unit is an ordered pair (sometimes called a digraph). Then each unit corresponds to a unique number in the interval [0, 272 1]. Now, in the message, I LIKE IT, the number of characters including the two spaces is 9, an odd number. As our message units are ordered pairs, we add an extra blank space at the end of the message. This makes the number of characters

Chapter 8 Cryptography 10 and hence the message can be divided into 5 units U1 , U2 , . . . , U5 : |I | |LI | |KE | |I | |T |

395

where U1 = I etc. (Here, stands for space). Now U1 corresponds to the number (see Figure 8.1) 8 271 + 26 = 242. Assume that the enciphering transformation that acts now on ordered pairs is given by C aP + b (mod 272 ) (8.3)

where a and b are in the ring Z27 , a = 0 and (a, 27) = 1. In eqn. (8.3), P and C denote a pair of corresponding plaintext and ciphertext units. The Extended Euclidean Algorithm [?] ensures that as (a, 27) = 1, a has a unique inverse a1(mod 27) so that aa1 1 (mod 272 ). This enables us to solve for P in terms of C from the Congruence (8.3). Indeed, we have P a1 (C b) (mod 272 ) As a specic example, let us take a = 4 and b = 2. Then C 4P + 2 (mod 272 ). Further as (4, 272 ) = 1, 4 has a unique inverse (mod 272 ); in fact 41 = 182 as 4 (182 ) 1(mod 272 ). This when substituted in congruence (8.4) gives P 182 (C 2) (mod 272 ). Getting back to P =I = 252 in I LIKE IT, we get C 4 252 + 2 (mod 272 ) (8.4)

Chapter 8 Cryptography 271 (mod 272 ).

396

Now 271 = 10 27 + 1, and therefore it corresponds to the ordered pair KB in the ciphertext. Similarly, LI, KE and (space)I and T(space) correspond to ?????????? respectively. Thus the ciphertext that corresponds to the plaintext I LIKE IT is KB, . . . , . . . , . . . , . To get back the plaintext, we apply the inverse transformation (8.4). As the numerical equivalent of KB is 271, relation (8.4) gives P (182 ) (271 2) = 1863 405 ( mod 272 ) and this, as seen earlier, corresponds to I (). An equation of the form C = aP + b is known as an ane transformation. Hence such cryptosystems are called ane cryptosystems . In the Ceasar cryptosystem and the ane cryptosystem, in the transformation f (a) a + 3(mod 26), 3 is known as the key of the transformation. In the transformation given by eqn. (8.3), there are two keys, namely, a and b.

8.2.3

Private Key Cryptosystems

In the Caesar cryptosystem and the ane cryptosystem, the keys are known to the sender and the receiver in advance. That is to say that whatever information does the sender has with regard to his encryption, it is shared by the receiver. For this reason, these cryptosystems are called private key cryptosystems.

8.2.4

Hacking an ane cryptosystem

Suppose an intruder I (that is a person other than the sender A and the receiver B ) who has no knowledge of the private keys wants to hack the

Chapter 8 Cryptography

397

message, that is, decipher the message stealthily. We may suppose that the type of cryptosystem used by A and B of the system including the unit length, though not the keys, are known to I . Such an information may get leaked out over a passage of time or may be obtained even by spying. How does I go about hacking it? He does it by a method known as frequency analysis. Assume for a moment that the message units are of length 1. Look at a long string of the ciphertext and nd out the most-repeated character, the next most-repeated character and so on. Suppose, for the sake of precision, they are U, V, X, . . . Now in the English language, the most common characters of the alphabet of 27 letters consisting of the English characters A to Z and space are known to be space and E. Then space and E of the plaintext correspond to U and V of the ciphertext respectively. If the cryptosystem used is the ane system given by equation, C = aP + b (mod 27), we have and Subtraction yields 22a 1 (mod 27) (8.5) 20 = a 26 + b (mod 27) 21 = a 4 = b (mod 27).

As (22, 27) = 1, (8.5) has a unique solution a = 11. This gives b = 21 4a = 21 44 = 23 = 4 ( mod 27). The cipher has thus been hacked. Suppose now the cryptosystem is based on an ane transformation C = aP + b with unit length 2. If the same alphabet consisting of 27 characters of this section (namely, A to Z and space) is used, each unit corresponds to

Chapter 8 Cryptography

398

a unique nonnegative integer less that 272 . Suppose the frequency analysis of the ciphertext reveals that the most commonly occurring ordered pairs are CB and DX in their decreasing orders of their frequencies. The decryption transformation is of the form P a C + b (mod 272 ) (8.6)

Here a and b are the enciphering keys and a , b are the deciphering keys. Now it is known that in English language, the most frequently occurring order pairs, in their decreasing orders of their frequencies, are E(space) and S(space). Symbolically, E(space) CA, S(space) CX. Writing these in terms of their numerical equivalents we get (4 27) + 26 = 134 (2 27) + 0 =54, (18 27) + 26 = 512 (3 27) + 23=104. These, when substituted in (8.5), give the congruences: 134 54a + b (mod 729), 512 104a + b (mod 729). Subtraction gives 50a 378 (mod 729). As (50, 729) = 1, this congruence has a unique solution by the Extended Euclidean Algorithm [?]. In fact, a =?????????? and therefore b =?????????? and and (8.7) and

Chapter 8 Cryptography

399

Thus the deciphering keys a and b have been determined and the cryptosystem has been hacked. In our case, the gcd(50, 729) happened to be 1 and hence we had no problem in determining the deciphering key. If not, we have to try all the possible solutions for a and take the plaintext that is meaningful. Instead, we can also continue with our frequency analysis and compare the next most-repeated ordered pairs in the plaintext and ciphertext and get a third congruence and try for a solution in conjunction with one or both of the earlier congruences. If these also fail, we may have to adopt adhoc techniques to determine a and b .

8.3

Encryption Using Matrices

Assume once again that the message units are ordered pairs in the same alphabet of size 27 of Section 8.2. We can use 2 by 2 matrices over the ring Z27 to set up a private key cryptosystem in this case. In fact if A is any 2 by 2 matrix with entries from Z27 , and (X, Y ) is any plaintext unit, we X encipher it as B = A , where B is again a 2 by 1 matrix and therefore Y X , we have the equations a ciphertext unit of length 2. If B = Y X = A Y Y X X and = A1 . Y Y X

(8.8)

Chapter 8 Cryptography

400

The rst equation of (8.8) gives the encryption while the second gives the decryption. Notice that A1 must be taken in Z27 . For A1 to exist, we must have (det A, 27) = 1. If this were not the case, we may have to try once 21 As an example, take A = . Then det A = 2, and (det A, 27) = 43 7 (2, 27) = 1. Hence 2 (mod 27) exists and 21 = 14 Z27 . This gives A1 15 13 42 14 3 1 = = = 14 25 1 56 28 4 2 over Z27 . (8.9) again adhoc methods.

Thus the ciphertext of HEAD is SNDJ. We can decipher SNDJ in

Suppose, for instance, we want to encipher HEAD using the above matrix 7 transformation. We proceed as follows: HE corresponds to the vector , 4 0 and AD to the vector . Hence the enciphering transformation gives 3 the corresponding ciphertext as 0 7 70 A = A , A (mod 27) 3 4 43 21 0 21 7 = , (mod 27) 43 3 43 4 3 18 = , (mod 27) 9 40 D S 3 18 = , = , J N 9 13

Chapter 8 Cryptography exactly the same manner by taking A1 in Z27 . This gives the plaintext 15 13 3 18 , as given by (8.9) A1 , A1 , where A1 = 25 1 9 13 Therefore the plaintext is 0 7 162 450 , (mod 27) = , 3 4 84 463

401

and this corresponds to HEAD.

8.4

Exercises
5 in Z27 . 7

3. Encipher the word MATH using the matrix A of Exercise 1 above as the enciphering matrix in the alphabet A to Z of size 26. Check your result by deciphering your ciphertext. 4. Solve the simultaneous congruences x y = 4 (mod 26) 7x 4y = 10 (mod 26). 5. Encipher the word STRIKES using an ane transformation C 4P + 7 ( mod 272 ) acting on units of length 2 over an alphabet of size 27 consisting of A to Z and the exclamation mark ! with 0 to 25 and 26 as the corresponding numerals.

12 3 in Z29 . 2. Find the inverse of A = 5 17

17 1. Find the inverse of A = 8

Chapter 8 Cryptography

402

6. Suppose that we know that our adversary is using a 2 by 2 enciphering matrix with a 29-letter alphabet, where A to Z have numerical equivalents 0 to 25, (space)=26, ?=27 and !=28. We receive the message (space) C ? Y C F ! Q, T W I U M H Q V. Suppose we know by some means that the last four letters of the plaintext are our adversarys signature MIKE. Determine the full plaintext.

8.5

Other Private Key Cryptosystems

We now describe two other private key cryptosystems.

8.5.1

Vigenere Cipher

In this cipher, the plaintext is in the English alphabet. The key consists of an ordered set of letters for some xed positive integer d. The plaintext is divided into message units of length d. The ciphertext is obtained by adding the key to each message unit using modulo 26 addition. For example, let d = 3 and the key be XYZ. If the message is ABANDON, the ciphertext is obtained by taking the numerical equivalence of the plaintext, namely, 010 (13) 3 (14) 14 (13), and the adding modulo 26 numerical equivalence of XYZ, namely, (23) (24) (25) of the key. This yields (23) (25) (25) (36) (27) (39) (37) (37) (mod 26)

Chapter 8 Cryptography = (23) (25) 25 (10) 1 (13) (11) (11) = X Z Z K B N L L as the ciphertext.

403

8.5.2

The One-Time Pad

This was introduced by G. S. Vernam in 1926. The alphabet for the plaintext is the set of 26 English characters. If the message M is of length N , the key K is generated as a pseudorandom sequence of characters of also of the same length N . The ciphertext is then obtained by the equation C M + K (mod 26) Notwithstanding the fact that the key K is as long as the message M , the system has its own drawbacks. (i) There are only standard methods of generating pseudorandom sequences from out of , and their number is not large. (ii) The long private keyK must be communicated to the receiver in advance. Despite these drawbacks, this cryptosystem is allegedly used in some highest levels of communication such as the Washington-Moscow hotline [?]. There are several other private key cryptosystems. The interested reader can see the references [?].

Chapter 8 Cryptography

404

8.6

Public Key Cryptography

All cryptosystems described so far are private key cryptosystems. This means that some one who has enough information to encipher messages has enough information to decipher messages as well. As a result, in private key cryptography, any two persons in a group who want to communicate messages in a secret way must have exchanged keys in a safe way (for instance, through a trusted courier). In 1976, the face of cryptography got altered radically with the invention of public key cryptography by Die and Hellman [?]. In this cryptosystem, the encryption can be done by any one. But the decryption can be done only by the intended recipient who alone is in possession of a secret key. At the heart of this cryptography is the concept of a one-way function. Roughly speaking, a one-way function is a 1-1 function f which is such that whenever k is given, it is possible to compute f (k ) rapidly while it is extremely dicult to compute the inverse of f in a reasonable amount of time. There is no way of asserting that such and such a function is a one-way function since the computations depend on the technology of the daythe hardware and the software. So what passes on for a one-way function today may fail to be a one-way function a few years hence. As an example of a one-way function, consider two large primes p and q each having at least 500 digits. Then it is easy to compute their product n = pq . However, given n, there is no ecient factoring algorithm as on date that would give p and q in a reasonable amount of time. The same problem of forming the product pq with p and q having 100 digits had passed on for a one-way function in the 1980s but is no longer so today.

Chapter 8 Cryptography

405

8.6.1

Working of Public Key Cryptosystems

A public key cryptosystem works in the following way: Each person A in a group has a public key PA and a secret key SA . The public keys are made public as in a telephone book with PA given against the name A. A computes his own secret key SA and keeps it within himself. The security of the system rests on the fact that no person of the group other than A or an intruder would be able to nd out SA . The keys PA and SA are chosen to be inverses of each other in that for any message M , (PA SA ) M = M = (SA PA ) M. Transmission of Messages Suppose A wants to send a message M to B in a secure fashion. The public key of B is SB which is known to every one. A sends PB M to B . Now, to decipher the message, B applies SB to it and gets SB (PB M ) = (SB PB )M = M . Note that none other than B can decipher the message sent by A since B alone is in possession of SB . Digital Signature Suppose A wants to send some instruction to a bank (for instance, transfer an amount to Mr. C from out of his account). If the intended message to the bank is M , A applies his secret key SA to M and sends SA M to the bank. He also gives his name for identication. The bank applies As public key PA to it and gets the message PA (SA M ) = (PA SA )M = M . This procedure also authenticates As digital signature. This is in fact the method adopted in credit cards.

Chapter 8 Cryptography

406

We now describe two public key cryptosystems. The rst is RSA, after their inventors, Rivest, Shamir and Adleman. In fact, Die and Hellman, though they invented public key cryptography in 1976, did not give the procedure to implement it. Only Rivest Shamir and Adleman did it in 1978, two years later.

8.6.2

RSA Public Key Cryptosystem

Suppose there is a group of people who want to communicate between themselves secretly. In such a situation, RSA is the most commonly used public key cryptosystem. The length of the message units is xed in advance as also the alphabet in which the cryptosystem is operated. If for instance, the alphabet consists of the English characters and the unit length is k , then any message unit is represented by a number less than 26k = (say) N . Description of RSA We now describe RSA. 1. Each person A (traditionally called Alice) chooses two large distinct primes p and q and computes their product n = pq , where p and q are so chosen that n > N . 2. Each A chooses a small positive integer e, 1 < e < (n), such that (e, (n)) = 1, where (the Euler function) (n) = (pq ) = (p)(q ) = (p 1)(q 1). (e is odd as (n) is even). 3. As (e, (n)) = 1, by Extended Euclidean algorithm [?], e has a multi-

Chapter 8 Cryptography plicative inverse d modulo n, that is, ed 1 (mod (n)).

407

4. A (Alice) gives the ordered pair (n, e) as her public key and keeps d as her private (secret) key. 5. Encryption P (M ) of the message unit M is done by P (M ) M e (mod n), while decryption S (M ) of the cipher text unit M is given by S (M ) M (mod n)
d

(8.10)

(8.11)

Thus both P and S (of A) act on the ring Zn . Before we establish the correctness of RSA, we observe that d (which is computed using the Extended Euclidean algorithm) can be computed in O(log3 n) time. Further powers M e and M d modulo n in eqns. (8.10) and (8.11) can also be computed in O(log3 n) time [?]. Thus all computations in RSA can be done in polynomial time. Theorem 8.6.1 (Correctness of RSA): Equations (8.10) and (8.11) are indeed inverse transformations. Proof. We have S (P (M )) = (S (M e )) M ed (mod n). Hence it suces to show that M ed M (mod n).

Chapter 8 Cryptography Now, by the denition of d, ed 1 (mod (n)). But (n) = (pq ) = (p)(q ) = (p 1)(q 1), and therefore, ed = 1 + k (p 1)(q 1) for some integer k . Hence M ed = M 1+k(p1)(q1) = M M k(p1)(q1) By Fermats little theorem, if (M, p) = 1, M p1 1 (mod p) and therefore, M ed = M M 1+k(p1)(q1) M (mod p). If however (M, p) = 1, then (M, p) = p, and trivially M ed M (mod p). Hence in both the cases, M ed M (mod p). For a similar reason, M ed M (mod q ).

408

(8.12)

(8.13)

As p and q are distinct primes, the congruences (8.12) and (8.13) imply that M ed M (mod pq ) M (mod n)

Chapter 8 Cryptography

409

The above description shows that if Bob wants to send the message M to Alice, he will send it as M e(mod n) using the public key of Alice. To decipher the message, Alice will raise this number to the power d and get M ed M (mod n), the original message sent by Bob. The security of RSA rests on the supposition that none other than Alice can determine the private key d of Alice. A person can compute d if he/she knows (n) = (p 1)(q 1) = n (p + q ) + 1, that is to say, if he/she knows the sum p + q . For this, he should know the factors p and q of n. Thus, in essence, the security of RSA is based on the assumption that factoring a large number n that is a product of two distinct primes is dicult. However to quote Koeblitz [?], no one can say with certainty that breaking RSA requires factoring n. In fact, there is even some indirect evidence that breaking RSA cryptosystem might not be quite as hard as factoring n. RSA is the public key cryptosystem that has had by far the most commercial success. But, increasingly, it is being challenged by elliptic curve cryptography.

8.6.3

The ElGamal Public Key Cryptosystem

We have seen that RSA is based on the premise that factoring a very larger integer which is a product of two large primes p and q is dicult compared to forming their product pq . In other words, given p and q , nding their product is a one-way function. ElGamal public key cryptosystem uses a dierent one-way function, namely, a function that computes the power of an element of a large nite group G. In other words, given G, g G, g = e, and a positive integer a, ElGamal cryptosystem is based on the assumption that computation of g a = b G is easy while given b G and g G, it is

Chapter 8 Cryptography dicult to nd the exponent a. Denition 8.6.2:

410

Let G be a nite group and b G. If y G, then the discrete logarithm of y with respect to base b is any non-negative integer x less than o(G), the order of G, such that bx = y , and we write logb y = x. As per the denition, logb y may or may not exist. However, if we take G = Fq , the group of nonzero elements of a nite eld Fq of q elements and g , a generator of the cyclic group Fq (See [?]), then for any y Fq , the discrete logarithm logg y exists. Example 8.6.3:
5 is a generator of F17 . In F17 , the discrete logarithm of 12 with respect to base 5 is 10. In symbols: log5 12 = 10. In fact, in F17 ,

5 = 51 = 5, 52 = 8, 6, 13, 14, 2, 10, 58 = 1, 12, 9, 11, 4, 3, 15, 7, 516 = 1 This logarithm is called discrete as it is taken in a nite group.

8.6.4

Description of ElGamal System

The ElGamal system works in the following way: All the users in the system agree to work in an already chosen large nite eld Fq . A generator g of Fq is xed once and for all. Each message unit is then converted into a number Fq . For instance, if the alphabet is the set of English characters and if each message unit is of length 3, then the message unit BCD will have the numerical equivalent 262 1 + 261 2 + 3 ( mod q ). It is clear that in order that these numerical equivalents of the message units are all distinct, q should be

Chapter 8 Cryptography

411

quite large. In our case, q 263 . Now each user A in the system randomly chooses an integer a = aA , 0 < a < q 1, and keeps it as his or her secret key. A declares g a Fq as his public key. If B wants to send the message unit M to A, he chooses a random positive integer k , k < q 1, and sends the ordered pair g k , M g ak (8.14)

to A. Since B knows k and g a is the public key of A, B can compute g ak . How will A decipher B s message? She will rst raise the rst number of the ordered pair given in (8.14) to the power a and compute it in Fq . She will then divide the second number M g ak of the pair by g ak and get M . A can do this as she has a knowledge of a. An intruder who gets to know the pair (g k , M g ak ) cannot nd a = loggk (g ak ) Fq , since the security of the system rests on the fact that nding discrete logarithm is dicult, that is, given h and ha in Fq , there is no ecient algorithm to determine a. There are other public key cryptosystems as well. The interested reader can refer to [?].

8.7
8.7.1

Primality Testing
Nontrivial Square Roots (mod n)

We have seen that the most commonly applied public key cryptosystem, namely, the RSA is built up on very large prime numbers (numbers having, say, 500 digits and more). So there arises the natural question: Given a large positive integer, how do we know that it is a prime or not. A primality test is a test that tells if a given number is a prime or not.

Chapter 8 Cryptography

412

Let n be a prime, and a a positive integer with a2 1 ( mod n). Then a is called a square root mod n. This means that n divides (a 1)(a + 1), and so, n|(a 1) or n|(a + 1); in other words a 1 ( mod n). Conversely, if a 1 ( mod n), then a2 1 ( mod n). Hence a prime number has only the trivial square root 1 and 1 modulo n. However, the converse is not true, that is, there exist composite numbers m having only trivial square roots modulo m, for instance, m = 10 is such a number. On the other hand 11 is nontrivial square root of the composite number 20 since 11 1 ( mod 20). Consider the modular exponentiation algorithm which determines ac(mod n) in O(log2 n log c) time [?]. At any intermediate stage of the algorithm, the output i is squared, taken modulo n and then multiplied to a or 1 as the case may be. If the square i2(mod n) of the output i is 1 modulo n, then already we have determined a nontrivial square root modulo n, namely, i. Therefore we can immediately conclude that n is not a prime and therefore a composite number. This is one of the major steps in the Miller-Rabin Primality Testing Algorithm to be described below.

8.7.2

Prime Number Theorem

For a positive real number x, let (x) denote the number of primes less than or equal to x. The Prime Number Theorem states that (x) is asymptotic to x/ log x; in symbols, (x) x/ log x. Here the logarithm is with respect to base e. Consequently, (n) n/ log n, or, equivalently,
(n) n

1 . log n

In

other words, in order to nd a 100-digit prime one has to examine roughly loge 10100 230 randomly chosen 100-digit numbers for primality. (This gure may drop down by half if we omit even numbers).

Chapter 8 Cryptography

413

8.7.3

Pseudoprimality Testing

Fermats Little Theorem (FLT) states that if n is prime, then for each a, 1 a n 1, an1 1 (mod n). (8.15)

Note that for any given a, an1 ( mod n) can be computed in polynomial time using the repeated squaring method [?]. However, the converse of Fermats Little Theorem is not true. This is because of the presence of Carmichal numbers. A Carmichal number is a composite number n satisfying (8.15) for each a prime to n. They are sparse but are innitely many (??????????). The rst few Carmichal numbers are 561, 1105, 1729. Since we are interested in checking if a given large number n is prime or not, n is certainly odd and hence (2, n) = 1. Consequently, if 2n1 1 (mod n), we can conclude with certainty, in view of FLT, that n is composite. However, if 2n1 1 ( mod n), n may be a prime or not. If it is not a prime, then it is a pseudoprime with respect to base b. Denition 8.7.1: n is called a pseudoprime to base a, where (a, n) = 1, if (i) n is composite, and (ii) an1 1 ( mod n) In this case, n is also called a base a pseudoprime. Base-2 Pseudoprime Test Given an odd positive integer n, check if 2n1 1 ( mod n). If yes, n is composite. If not, 2n1 1 ( mod n) and n may be a prime.

Chapter 8 Cryptography

414

But then there is a chance that n is not a prime. How often does this happen? For n < 10, 000, there are only 22 pseudoprimes to base 2. They are 341, 561, 645, 1105, . . . . Using better estimates due to Carlo Pomerance (See [?]), we can conclude that the chance of a randomly chosen 50 digit (resp. 100-digit) number satises (8.15) but fails to be a prime is < 106 (resp.< 1013 ). More generally, if (a, n) = 1, 1 < a < n, the pseudoprime test with reference to base a checks if an1 1 ( mod n). If true, a is composite; if not, a may be a prime.

8.7.4

The Miller-Rabin Primality Testing Algorithm

If we choose each a with (a, n) = 1, then we may have to choose (n) base numbers a in the worst case. Instead, the Miller-Rabin Test works as follows: (i) It tries several randomly chosen base values a instead of just one a. (ii) While computing each modular exponentiation an1 ( mod n), it stops as soon as it notices a nontrivial square root of 1 ( mod n) and outputs composite. We now proceed to present the pseudocode for the MillerRabin test. The code uses an auxiliary procedure WITNESS such that WITNESS (a, n) is TRUE i a is a witness to the compositeness of n. We now present and justify the construction of WITNESS.

WITNESS (a, n)

Chapter 8 Cryptography

415

1 let (bk , bk1 , . . . , b0 ) be the binary representation of n 1 2 d 1 3 for i k downto 0 4 5 6 7 8 9 10 11 do x d d (d, d) ( mod n) if d = 1 and x = n 1 then return TRUE if bi = 1 then d (d, d) ( mod n) if d = 1 then return TRUE

12 return FALSE

Description of WITNESS (a, n)


Line 1 determines the binary representation of n. Lines 3 to 9 compute d as an1 ( mod n). This is done using the modular exponentiation method (??????????) and can be done in polynomial time. Lines 6 and 7 check if a nontrivial square root of 1 is present; if yes, then declares the test that n is composite. Lines 10 and 11 return TRUE if an1(mod n) 1. We again conclude that n is composite. If FALSE is returned in line 12, we conclude that n is either a prime or a pseudoprime to base a. WITNESS (a, n) gives a decisive conclusion only in the case when n is composite.

Chapter 8 Cryptography

416

Correctness of WITNESS (a, n)


If line 7 returns TRUE, then, x is a nontrivial square root modulo n and so n is composite (See Section 8.7). If WITNESS returns TRUE in line 11, then it has checked that an1 1(mod n) and therefore by Fermats Little Theorem, a is composite. If line 12 returns FALSE, then an1 1(mod n), and therefore a is either a prime or a pseudoprime to base a (as mentioned before).

8.7.5

Miller-Rabin Algorithm (a, n)

The Miller-Rabin algorithm is a probabilistic algorithm to test the compositeness of n. It chooses at random s base numbers in {1, 2, . . . , n 1} from a random-number generating process: RANDOM (1, n 1). It then uses WITNESS (a, n) as an auxiliary procedure and checks if a is composite.

MILLER-RABIN (a, n)

1 for j 1 to s 2 3 4 do a RANDOM (1, n 1) if WITNESS (a, n) then return COMPOSITE Definitely Almost surely

5 return PRIME

Chapter 8 Cryptography

417

Description and Correctness of Miller-Rabin (a, n)


The main loop (beginning of line 1) picks s random values of a {1, 2, . . . , n} (line 2). If one of as so picked is a witness to the compositeness of n (line 3), then Miller-Rabin outputs COMPOSITE in line 4. If no witness is found in line 3, Miller-Rabin outputs PRIME in line 5. If Miller-Rabin outputs PRIME in line 5, then there may be an error in the procedure. This error may not depend on n but rather on the size of s and the luck in the random choice of the base a. However, this error is rather small. In fact, it can be shown that the error is less than 1/2s . Hence if the number of witnesses s is large, the error is comparatively small.

8.8

The Agrawal-Kayal-Saxena (AKS) Primality Testing Algorithm

8.8.1

Introduction

The Miller-Rabin primality testing algorithm is a probabilistic algorithm that uses Fermats Little Theorem. Another probabilistic algorithm is due to Solovay and Strassen. It uses the fact that if n is an odd prime, then a a n1 a 2 ( )(mod n), where ( ) stands for the Legendre symbol. The Millern n Rabin primality test is known to be the fastest randomized primality testing algorithm, to within constant factors. However, the question of determining a polynomial time algorithm (See ??????????for denition) to test if a given number is prime or not was remaining unsolved till July 2002. In August 2002, Agrawal, Kayal and Saxena of the Indian Institute of Tech-

Chapter 8 Cryptography

418

nology, Kanpur, India made a sensational revelation that they have found a polynomial time algorithm for primality testing. Actually, their algorithm (log7.5 n) time (Recall from ?????????? that O (f (n)) stands for works in O O f (n) (polynomial in log f (n)) . It is based on a generalization of Fermats Little Theorem to polynomial rings over nite elds. Notably, the correctness proof of their algorithm requires only simple tools of algebra. In the following section, we present the details of AKS algorithm in ???

8.8.2

The Basis of AKS Algorithm

The AKS algorithm is based on the following identity for prime numbers which is a generalization of Fermats Little Theorem. Lemma 8.8.1: Let a Z, n N, n 2, and (a, n) = 1. Then n is prime if and only if (X + a)n X n + a (mod n) Proof. We have (X + a) = X + If n is prime, each term
n i n n n1 n=1

(8.16)

n X n1 ai + an . i

, 1 i n 1, is divisible by n. Further, as

(a, n) = 1, by Fermats Little Theorem, an a ( mod n). This establishes (8.16). If n is composite, n has a prime factor q < n. Let q k ||n (that is, q k |n but q k+1 | n). Now consider the term We have n q = n(n 1) (n q + 1) . 1 2q
n q

X nq aq in the expansion of (X + a)n .

Chapter 8 Cryptography Then q k | term


n q n q

419 , as q k ||n , n(n 1) (n q + 1) must be

. For if q k

n q

divisible by q , a contradiction. Hence q k , and therefore n does not divide the X nq aq . This shows that (X + a)n (X n + a) is not identically zero

over Zn . The above identity suggests a simple test for primality: given input n, choose an a and test whether the congruence (8.16) is satised, However this takes time (n) because we need to evaluate n coecients in the LHS in the worst case. A simple way to reduce the number of coecients is to evaluate both sides of (8.16) modulo a polynomial of the form X r 1 for an appropriately chosen small r. In other words, test if the following equation is satised: (X + a)n = X n + a (mod X r 1, n) (8.17)

From Lemma 8.8.1, it is immediate that all primes n satisfy eqn. (8.17) for all values of a and r. The problem now is that some composites n may also satisfy the eqn. (8.17) for a few values of a and r (and indeed they do). However, we can almost restore the characterization: we show that for an appropriately chosen r if the eqn. (8.17) is satised for several as, then n must be a prime power. It turns out that the number of such as and the appropriate r are both bounded by a polynomial in log n, and this yields a deterministic polynomial time algorithm for testing primality.

8.8.3

Notation and Preliminaries

Fp denotes the nite eld with p elements, where p is a prime. Recall that if p is prime and h(x) is a polynomial of degree d irreducible over Fp , then Fp [X ]/ (h(X )) is a nite eld of order pd . We will use the notation, f (X ) = g (X )

Chapter 8 Cryptography

420

(mod h(X ), n) to represent the equation f (X ) = g (X ) in the ring Zn [X ]/ (h(X )), that is, if the coecients of f (X ), g (X ) and h(X ) are reduced modulo n, then h(X ) divides f (X ) g (X ). O f (n) poly in log f (n)

f (n) stands for As mentioned earlier, for any function f (n) of n, O . For example,

(logk n) = O logk n poly log logk n O = O logk n poly log log n = O logk+ n for any > 0.

All logarithms in this section are with respect to base 2. Given r N, a Z with (a, r) = 1, the order of a modulo r is the smallest number k such that ak 1(mod r). It is denoted as Or (a). (r) is the Eulers quotient function. Since by Fermats theorem, a(r) 1 ( mod r), and since aOr (a) 1 ( mod r), by the denition of Or (a), we have Or (a) | (r). We need the following simple fact about the lcm of the rst m natural numbers (See ?????????? for a proof). Lemma 8.8.2: Let lcm (m) denote the lcm of the rst m natural numbers. Then for m 7, lcm (m) 2m .

Chapter 8 Cryptography

421

8.8.4

The AKS Algorithm

Input, integer n > 1 1 If (n = ab for a N and b > 1), output COMPOSITE 2 Find the smallest r such that Or (n) > 4 log2 n 3 If 1 < (a, n) < n for some a r, output COMPOSITE 4 If n r, output PRIME 5 For a = 1 to 2 (r) log n, do if (X + a)n = X n + a ( mod X n 1, n), output COMPOSITE 6 output PRIME Theorem 8.8.3: The AKS algorithm returns PRIME i n is prime. We now prove Theorem 8.8.3 through a sequence of lemmas. Lemma 8.8.4: If n is prime, then the AKS algorithm returns PRIME. Proof. If n is prime, we have to show that AKS will not return COMPOSITE in steps 1, 3 and 5. Certainly, the algorithm will not return COMPOSITE in step 1. Also, if n is prime, there exists no a such that 1 < (a, n) < n, so that the algorithm will not return COMPOSITE in step 3. By Lemma 8.8.1, the for loop in step 5 cannot return COMPOSITE. Hence the algorithm will identify n as PRIME either in step 4 or in step 6. We now consider the steps when the algorithm returns PRIME, namely, steps 4 and 6. Suppose the algorithm returns PRIME in step 4. Then n must be prime. If n were composite, n = n1 n2 , where 1 < n1 , n2 < n.

Chapter 8 Cryptography

422

then as n r, if we take a = n1 , we have a r. So in step 3, we would have had 1 < (a, n) = a < n, a r. Hence the algorithm would have output COMPOSITE in step 3 itself. Thus we are left out with only one case, namely, the case if the algorithm returns PRIME in step 6. For the purpose of subsequent analysis, we assume this to be the case. The algorithm has two main steps (namely, 2 and 5). Step 2 nds an appropriate r and step 5 veries eqn. (8.17) for a number of as. We rst bound the magnitude of r. Lemma 8.8.5: There exists an r 16 log5 n + 1 such that Or (n) > 4 log2 n. Proof. Let r1 , . . . , rt be all the numbers such that Ori (n) 4 log2 n for each i and therefore ri divides i = (nOri (n) 1) for each i. Now for each i, i divides the product P =
4 logr n i=1

(ni 1) < n16 log

= (2log n )16 log


t

= 216 log n .

(Note that we have used the fact that


i=1

(ni 1) < nt , the proof of which

follows readily by induction on t). As ri divides i and i divides P for each i, 1 i t, the lcm of the ri s also divides P . Hence (lcm of the ri s)< 216 log n . However, by Lemma 8.8.2, lcm 1, 2, . . . , 16 log5 n 216 2
log5 n 5

Hence there must exist a number r in 16 log5 n + 1 such that Or (n) > 4 log2 n.

1, 2, . . . , 16 log5 n , that is, r

Let p be a prime divisor of n. We must have p > r. For, if p r, (then

Chapter 8 Cryptography

423

as p < n), n would have been declared COMPOSITE in step 3, while if p = n r, n would have been declared PRIME in step 4. This forces that (n, r) = 1. Otherwise, there exists a prime divisor p of n and r, and hence p r, a contradiction as seen above. Hence (p, r) is also equal to 1. We x p and r for the remainder of this section. Also, let l = 2 (r) log n. Step 5 of the algorithm veries l equations. Since the algorithm does not output COMPOSITE in this step (recall that we are now examining step 6), we have (X + a)n = X n + a (mod X r 1, n) for every a, 1 a l. This implies that (X + a)n = X n + a (mod X r 1, p) for every a, 1 a l. By Lemma 8.8.1, we have (X + a)p = X p + a (mod X r 1, p) (8.19) (8.18)

for 1 a b. Comparing eqn. (8.18) with (8.19), we notice that n behaves like prime p. We give a name to this property: Denition 8.8.6: For polynomial f (X ) and number m N, m is said to be introspective for f (X ) if [f (X )]m = f (X m ) (mod X r 1, p). It is clear from eqns. (8.18) and (8.19) that both n and p are introspective for X + a, 1 a l. Our next lemma shows that introspective numbers are closed under multiplication.

Chapter 8 Cryptography Lemma 8.8.7: If m and m are introspective numbers for f (X ), then so is mm . Proof. Since m is introspective for f (X ), we have f (X m ) = (f (X ))m (mod X r 1, p) and hence [f (X m )]m = f (X )mm (mod X r 1, p).

424

(8.20)

Also, since m is introspective for f (X ), we have f Xm

= f (X )m (mod X r 1, p).

Replacing X by X m in the last equation, we get f X mm

= f (X m )m (mod X mr 1, p).

and hence f X mm

f (X m )m (mod X r 1, p)

(8.21)

(since X r 1 divides X mr 1). Consequently from (8.20) and (8.21), (f (X ))mm = f X mm Thus mm is introspective for f (X ). Next we show that for a given number m, the set of polynomials for which m is introspective is closed under multiplication. Lemma 8.8.8: If m is introspective for both f (X ) and g (X ), then it is also introspective for the product f (X )g (X ). Proof. The proof follows from the equation: [f (X ) g (X )]m = [f (X )]m [g (X )]m = f (X m )g (X m ) (mod X r 1, p).

(mod X r 1, p).

Chapter 8 Cryptography

425

Eqns. (8.18) and (8.19) together imply that both n and p are introspective for (X + a). Hence by Lemmas 8.8.7 and 8.8.8, every number in the set I = {ni pj : i, j 0} is introspective for every polynomial in the set P =
l a=1

(X + a)ea : ea 0 . We now dene two groups based on the sets I

and P that will play a crucial role in the proof. The rst group consists of the set G of all residues of numbers in I modulo
r. Since both n and p are prime to r, so is any number in I . Hence G Zn ,

the multiplicative group of residues mod r that are relatively prime to r. It is easy to check that G is a group. (Only thing that requires a verication is that ni pj has a multiplicative inverse in G. Since nOr (n) 1 ( mod r), there exists i , 0 i < Or (n) such that ni = ni . Hence inverse of ni (= ni ) is n(Or (n)i) . A similar argument applies for p as pOr (n) = 1. (p being a prime divisor of r, (p, r) = 1). Let |G| = the order of the group G = t (say). As G is generated by n and p modulo r and since Or (n) > 4 log2 n, t > 4 log2 n. To dene the second group, we need some basic facts about cyclotomic polynomials over nite elds. Let Qr (X ) be the r-th cyclotomic polynomial over the eld Fp (?????[?]). Then Qr (X ) divides X r 1 and factors into irreducible factors of the same degree d = Or (p). Let h(X ) be one such irreducible factor of degree d. Then F = Fp [X ]/(h(X )) is a eld. The second group that we want to consider is the group generated by X + 1, X + 2, . . . , X + l in the multiplicative group F of nonzero elements of the eld F . Hence it consists of simply the residues of polynomials in P modulo h(X ) and p. Denote this group by G . We claim that the order of G is exponential in either t = |G| or l. Lemma 8.8.9:

Chapter 8 Cryptography |G| min 2l 1, 2t .

426

Proof. First note that h(X ) | Qr (X ) and Qr (X ) | (X r 1). Hence X may be taken as a primitive r-th root of unity in F = Fp [X ]/(h(X )). We claim that (*) if f (X ) and g (X ) are polynomials of degree less than t and if f (X ) = g (X ) in P , then their images in F (got by reducing the coecients modulo p and then taking modulo h(X ) ) are distinct. To see this, assume that f (X ) = g (X ) in the eld F (that is, the images of f (X ) and g (X ) in the eld F are the same). Let m I . Recall that every number of I is introspective with respect to every polynomial in P . Hence m is introspective with respect to both f (X ) and g (X ). This means that f (X )m = f (X m ) (mod X r 1, p), and g (X )m = g (X m ) (mod X r 1, p). Consequently, f (X m ) = g (X m ) (mod X r 1, p), and since h(X ) | (X r 1), f (X m ) = g (X m ) (mod h(X ), p). In other words f (X m ) = g (X m ) in F , and therefore X m is a root of the polynomial Q(Y ) = f (Y ) g (Y ) for each m G. As X is a primitive r-th root of unity and for each m G, (m, r) = 1, X m is also a primitive r-th root of unity. Since each m G is reduced modulo r, the powers X m , m G, are all distinct. Thus there are at least t = |G| primitive r-th roots of unity X m , m G, and each one of them is a root of Q(Y ). But since f and g are of degree less than t, Q(Y ) has degree less than t. This contradiction shows that f (X ) = g (X ) in F .

Chapter 8 Cryptography

427

We next observe that the numbers 1, 2, . . . , t are all distinct in Fp . This is because l = 2 r) log n < 2 r log n t <2 r (as t > 4 log2 n) 2 <r
(as t < r; recall that G is a subgroup of Zn )

< p (by assumption on p). Hence {1, 2, . . . , l} {1, 2, . . . , p 1}. This shows that the elements X + 1, X + 2, . . . , X + l are all distinct in Fp [X ] and therefore in Fp [X ]/h(X ) = F . If t l, then all the possible products of the polynomials in the set {X + 1, X + 2, . . . X + t} except the one containing all the t of them are all distinct and of degree less than t. Their number is 2t 1 and all of them belong to P . By (*), their images in F are all distinct, that is, |G| 2t 1. If t > l, then there exist at least 2l such polynomials (namely, the product of all subsets of {X + 1, . . . , X + l}). These products are all of degree at most l and hence of degree < t. Hence in this case, |G| 2l . Thus |G| min 2t 1, 2l . Finally we show that if n is not a prime power, then |G| is bounded above by an exponential function of t = |G|. Lemma 8.8.10:
1 2 t n . If n is not a prime power, |G| 2

= {ni pj : 0 i, j t}. If n is not a prime power (recall Proof. Set I = (t + 1)2 > t. When reduced mod r, that p|n), the number of terms in I give elements of G. But |G| = t. Hence there exist at least the elements of I

Chapter 8 Cryptography

428

which become equal when reduced modulo r. Let two distinct numbers in I them be m1 , m2 with m1 > m2 . So we have (since r divides (m1 m2 ) ), X m1 = X m2 (mod X r 1) Let f (X ) P . Then [f (X )]m1 = f (X m1 ) (mod X r 1, p) = f (X m2 ) (mod X r 1, p) by (8.22) = [f (X )]m2 (mod X r 1, p) = [f (X )]m2 (mod h(X ), p) since h(X ) | (X r 1). This implies that [f (X )]m1 = [f (X )]m2 in the eld F . (8.23) (8.22)

Now f (X ) when reduced modulo (h(X ), p) yields an element of G . Thus every polynomial of G is a root of the polynomial Q1 (Y ) = Y m1 Y m2 over F .

Thus there are at least |G| distinct roots in F . Naturally, |G| degree of Q1 (Y ). Now the degree of Q1 (Y ) = m1 (as m1 > m2 )

= (n p)t the greatest number in I < n 2


2 t

p<
t

n2 t n2 = t < 2 2 n2 t . Hence |G| m1 < 2

n , 2 .

since p | n,

and p = n

Chapter 8 Cryptography

429

Lemma 8.8.9 gives a lower bound for |G| while Lemma 8.8.10 gives an upper bound for |G|. These bounds enable us to prove the correctness of the algorithm.

Chapter 9 Finite Automata


9.1 Introduction

The broad areas of automata, formal languages, computability and complexity comprise the core of contemporary Computer Science theory. Progress in these areas has given a formal meaning to computation, which itself received serious attention in the 1930s. Automata theory provides a rm basis for the mathematical models of computing devices; nite automaton is one such model. Ideas of simple nite automata have been implicitly used in electromechanical devices for over a century. A formal version of nite automata appeared in 1943 in the McCulloch-Pitts model of neural networks. The 1950s saw intensive work on nite automata (often under the name sequential machines), including their recognizing power and equivalence to regular expressions. A nite automaton is a greatly restricted model of a modern computer. The main restriction is that it has only limited memory (which cannot be expanded) and has no auxiliary storage. The theory of nite automata is rich 430

Chapter 9 Finite Automata

431

and elegant. In 1970, with the development of the UNIX operating system, practical applications of nite automata have appearedin lexical analysis (lex), in text searching (grep) and in Unix-like utilities (awk). Extensions to nite automata have been proposed in the literature. For example, the B uchi automata is one that accepts innite input sequences. We begin the notion of languages which is freely used in later discussions. We then introduce regular expressions. Subsequent sections deal with nite automata and their properties.

9.2

Languages

An alphabet is any nite set of symbols or letters. If we are considering ASCII characters as our alphabet we can build text as understood commonly. With the alphabet {0, 1}, we can build binary numbers. We will denote an arbitrary alphabet by the Greek letter . Given , a string (or any sentence or a sequence) over is any nite-length sequence of symbols of . e.g. if = {0, 1}, we can construct strings ending in 0; that is, we can prescribe the sequence, 0, 10, 100, 110, 1000, 1100, 1010, 1110, . . . (these strings can be precisely described in another waythey are binary numbers divisible by 2) The length of a string x, denoted by |x| is dened to be the number of symbols in it. For example |1010| = 4. The null string or empty string is a unique string of length 0 (over ). It is denoted by . Thus || = 0. The set of all strings over an alphabet is denoted by . For example, if = {a},

Chapter 9 Finite Automata then, {a} = {, a aa, aaa, aaaa, . . .}

432

Consider the alphabet A comprising both upper and lower case of the 26 letters of the English alphabet as well as the punctuation marks, blank and question mark. That is, A = {a, b, . . . , A, B, . . . , b, ?} The sentence, This b is b a b sentence. is a sequence in A . Similarly, the sentence, sihT b si b ynnuf. is a sequence in A . The objective in dening a language is to selectively pick certain sequences of , that make sense in some way or that satisfy some property. Denition 9.2.1: Let be a nite set, the alphabet. A language over is a subset of the set . Note that , and denote some language! It is not dicult to reason that since is nite, is a countably innite set (the basic idea is thus: for each k 0, all strings of length k are enumerated before all strings of length k + 1; strings of length exactly k are enumerated lexicographically, once we x some ordering of the symbols in ).

Chapter 9 Finite Automata

433

Since languages are sets, they can be combined by union, intersection and dierence. Given a language A we also dene the complement in as = A = {x | x A} . A In other words, A is \ A. If A and B are languages over , their concatenation is a language A B or simply AB dened as, AB = {xy | x A and y B } . The powers of An of a language A are inductively dened A = {} An+1 = AAn

For example, {a, b}n is the set of strings over {a, b} of length n. The closure or the Kleene star or the asterate A of a language A is the set of all strings obtained by concatenating zero or more strings from A. This is exactly equivalent to taking the union of all nite powers of A. That is A = An
n0

We also dene A+ to be the union of all nonzero powers of A: A+ = AA Exercise 9.2.2: The language L is dened over the alphabet {a, b} recursively as follows:

i) L

Chapter 9 Finite Automata ii) x in L, xa, bx and abx are also in L iii) nothing else is in L


Show that L is the set of all elements of {a, b}* except those containing the substring aab.

A primary issue in the theory of computation is the representation of languages by finite specifications. As we are interested in languages that may have an infinite number of strings, the issue is quite challenging. Also, for many languages, characterizing all the strings by a simple unique property may not be possible. In any case, given a specification of a language, we are mostly interested in the following problems: (i) how do we automatically generate the strings of the given language? (ii) how do we recognize whether a given string belongs to the given language? Answers to these questions comprise much of the subject matter of the study of automata and formal languages. Models of computation, such as finite automata, pushdown automata, Turing machines or the λ-calculus, evolved before modern computers came into existence. In parallel, and independently, the formal notions of grammar and language (the Chomsky hierarchy, a hierarchy of language classification) were developed. The equivalence between languages and automata is now well understood. Our concern here is restricted to what are called regular languages, described precisely by regular expressions (answering question (i) above). We are also concerned with the corresponding model of computation, called finite automata or finite state machines (which will answer question (ii) above).
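Since languages are plain sets of strings, the operations of this section can be experimented with directly on small examples. The sketch below is our own illustration (the function names are invented); the Kleene star is necessarily truncated to a length bound, since the full star of any nonempty language is infinite.

def concat(A, B):
    """AB = {xy | x in A and y in B}."""
    return {x + y for x in A for y in B}

def power(A, n):
    """A^0 = {''}; A^(n+1) = A . A^n."""
    result = {""}
    for _ in range(n):
        result = concat(A, result)
    return result

def star_up_to(A, max_len):
    """The strings of A* of length at most max_len."""
    result, frontier = {""}, {""}
    while frontier:
        frontier = {x + y for x in frontier for y in A
                    if len(x + y) <= max_len} - result
        result |= frontier
    return result

print(sorted(power({"a", "b"}, 2)))     # ['aa', 'ab', 'ba', 'bb'], i.e. {a, b}^2
print(sorted(star_up_to({"ab"}, 6)))    # ['', 'ab', 'abab', 'ababab']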


9.3 Regular Expressions and Regular Languages

To concisely describe an identifier in a programming language, we may say the following: an identifier (in some programming language) is a letter followed by zero or more letters or digits. This can be syntactically described as
identifier = letter (letter + digit)*
The expression on the right is a typical regular expression. Interpreting juxtaposition as concatenation, + as union and * as the Kleene star, we get the desired meaning. We generalize the idea in the above example: we can construct a language using the operations of union, concatenation and Kleene star on simple languages. Given Σ, for any a ∈ Σ, we can form a simple language {a}; we also allow the empty language ∅ and {ε} to be simple languages.

Definition 9.3.1: Regular expressions (and the corresponding languages) over Σ are exactly those expressions that can be constructed from the following rules:
1. ∅ is a regular expression, denoting the empty language.
2. ε is a regular expression, denoting the language {ε}, the language containing only the empty string.
3. If a ∈ Σ, then a is a regular expression denoting the language {a}, the language with only one string (comprising the single symbol a).


4. If R and S are regular expressions denoting the languages LR and LS respectively, then:
i) (R) + (S) is a regular expression denoting the language LR ∪ LS.
ii) (R)·(S) is a regular expression denoting the language LR · LS.
iii) (R)* is a regular expression denoting the language LR*.

Languages that can be obtained by the above rules 1 through 4 are what are called regular languages over Σ, which we define formally later. Regular expressions obtained as above are fully parenthesized. We can relax this requirement if we agree to precedence rules: the Kleene star has the highest precedence, then concatenation, and + has the lowest precedence (we use parentheses when necessary).

Example 9.3.2: We write a + b*c to mean (a + ((b*)·c)), a regular expression over {a, b, c}.

Example 9.3.3: The regular expression (a + b)* cannot be written as a* + b*, because they denote different languages.

Example 9.3.4: To reason about the language represented by the regular expression (a + b)*a, we note that (a + b)* denotes strings of any length (including 0) formed by taking a or b. This is then concatenated with the symbol a. Therefore the language consists of all possible strings over {a, b} ending in a.
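The denotations in rules 1 through 4 can be computed directly, at least when restricted to strings up to a length bound. The sketch below is our own; the nested-tuple encoding of expressions is an assumption made for illustration, not the book's notation.

def lang(e, max_len):
    """Strings of length <= max_len in the language of e, where e is
    None (empty language), a string ('' for epsilon, or one symbol),
    ('+', R, S) for union, ('.', R, S) for concatenation, ('*', R)."""
    if e is None:
        return set()
    if isinstance(e, str):
        return {e} if len(e) <= max_len else set()
    if e[0] == '+':
        return lang(e[1], max_len) | lang(e[2], max_len)
    if e[0] == '.':
        L, R = lang(e[1], max_len), lang(e[2], max_len)
        return {x + y for x in L for y in R if len(x + y) <= max_len}
    if e[0] == '*':
        L = lang(e[1], max_len) - {""}
        result, frontier = {""}, {""}
        while frontier:
            frontier = {x + y for x in frontier for y in L
                        if len(x + y) <= max_len} - result
            result |= frontier
        return result
    raise ValueError(e)

# Example 9.3.4: (a + b)* a, the strings over {a, b} ending in a.
e = ('.', ('*', ('+', 'a', 'b')), 'a')
print(sorted(lang(e, 3)))  # ['a', 'aa', 'aaa', 'aba', 'ba', 'baa', 'bba']

Comparing lang(R, n) and lang(S, n) for growing n gives a quick (though of course not conclusive) way to spot-check equivalences such as those in Exercise 9.3.5 below.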


If two regular expressions R and S denote the same language, we write R = S and we say R and S are equivalent.

Exercise 9.3.5: Let Σ = {a, b}. Reason that
(i) the language denoted by (a + b)* is {a, b}*;
(ii) (aa + ab + ba + bb)* = ((a + b)(a + b))*.

Every regular expression can be regarded as a way of generating the members of a language. There are, however, some languages which have simple descriptions even though they cannot be described by regular expressions. For example, the language {0ⁿ1ⁿ | n ≥ 1} can be shown to be not regular. Thus regular expressions are limited in their power to specify languages in general.

9.4 Finite Automata Definition

In the previous section, we saw regular expressions as language generators. We now describe a simple automaton, or abstract machine, to recognize strings that belong to a language generated by a regular expression. In the discussions that follow, Σ should be obvious from the context. Consider the following simple diagram, which depicts a finite automaton M1:

[Figure 9.1: The automaton M1. The start state q0 loops on 1 and moves to q1 on 0; the accepting state q1 (double circle) loops on 0 and returns to q0 on 1.]

Let 1010 be the input to the machine. We start in state q0 (marked by an arrow), and we see a 1 in the input (scanned from left to right). On seeing the 1 we make the transition as indicated and stay in state q0. On seeing the second symbol, 0, we follow the transition from state q0 and move to state q1. In state q1 we see the third symbol, 1. We make the transition to state q0. In state q0 we see the last symbol, 0, and make the transition to state q1, which is an accepting state (marked by a double circle). We say that the input string 1010 is accepted by the automaton M1. It is not difficult to reason that M1 accepts exactly the strings described by (0 + 1)*0, i.e., all strings ending in 0. Let us next consider the following automaton M2:

[Figure 9.2: The automaton M2. The accepting start state q0 loops on 0 and moves to q1 on 1; q1 loops on 1 and returns to q0 on 0.]

We can easily reason that M2 accepts the empty string and those strings that end in 0. We now consider the following automaton M3, which solves a practical problem: testing a number for divisibility by 3.

[Figure 9.3: Automaton M3, a divisibility-by-3 tester. From the start state s and the states q0, q1, q2, each digit read moves the machine according to the digit's residue class {0, 3, 6, 9}, {1, 4, 7} or {2, 5, 8}; q0 is the accept state.]

The automaton recognizes all strings over Σ = {0, 1, 2, 3, . . . , 9} that are exactly divisible by 3. Note that at any time, in scanning an input, we are in:
state q0, if the partial string scanned so far (i.e., the number corresponding to it) is congruent to 0 (mod 3);
state q1, if the partial string scanned so far is congruent to 1 (mod 3); and
state q2, if the partial string scanned so far is congruent to 2 (mod 3).

Given a machine M, if A is the set of all strings accepted by M, we say that the language of the machine M is A. This is concisely written as L(M) = A.

Exercise 9.4.1: By trial and error, design a finite automaton that precisely recognizes the following language A:
A = {w | w is a string that has an equal number of occurrences of 01 and 10 as substrings}.

We now give the definition of a (deterministic) finite state automaton (DFA).

Definition 9.4.2: A (deterministic) finite automaton M is a 5-tuple
M = (Q, Σ, δ, s, F),
where
Q is a finite set of states
Σ is a finite alphabet
δ : Q × Σ → Q is the transition function
s ∈ Q is the start state
F ⊆ Q is the set of accept (or final) states.

Note that δ, the transition function, defines the rules for moving through the automaton as depicted pictorially. It can equivalently be described by a table. If M is in state q and sees input a, it moves to state δ(q, a). No move on ε is allowed; also, δ(q, a) is uniquely specified.

Example 9.4.3:

The following automaton M4 accepts strings ending in 11:
M4 = (Q, Σ, δ, s, F)
where Q = {a, b, c}, Σ = {0, 1}, s = a and F = {c}. δ is given by the following table (columns are states, rows are input symbols; for example, δ(a, 1) = b):

δ      a    b    c
0      a    a    a
1      b    c    c
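A transition table like the one above is exactly the data a program needs to run the automaton. The following sketch is our own illustration (the dictionary encoding is an assumption, not the book's notation); it executes M4 by repeatedly applying δ, which is precisely the multistep function δ̂ defined next.

def run_dfa(delta, start, accept, w):
    """Follow delta one symbol at a time; accept iff the final state is in accept."""
    q = start
    for symbol in w:
        q = delta[(q, symbol)]
    return q in accept

# M4 from the table above: accepts binary strings ending in 11.
delta4 = {('a', '0'): 'a', ('a', '1'): 'b',
          ('b', '0'): 'a', ('b', '1'): 'c',
          ('c', '0'): 'a', ('c', '1'): 'c'}

for w in ['11', '011', '110', '']:
    print(repr(w), run_dfa(delta4, 'a', {'c'}, w))   # True, True, False, False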

We now define δ̂, the multistep version of δ:
δ̂ : Q × Σ* → Q.
By definition, for any q ∈ Q, δ̂(q, ε) = q, and for any string x ∈ Σ* and symbol a ∈ Σ,
δ̂(q, xa) = δ(δ̂(q, x), a).
Thus δ̂(q, x) is the state the automaton M will end up in when started in state q, fed the input x, and allowed transitions according to δ. Note that δ̂ and δ agree on strings of length one:
δ̂(q, a) = δ(δ̂(q, ε), a) = δ(q, a), since a = εa.
Formally, a string x is accepted by the automaton M if δ̂(s, x) ∈ F, and is rejected by the automaton M if δ̂(s, x) ∉ F. We can now formally define L(M), the language of the machine:
L(M) = {x | δ̂(s, x) ∈ F}

Definition 9.4.4: A language A is said to be regular if A = L(M) for some DFA M.

9.4.1 The Product Automaton and Closure Properties

Given two regular languages A and B, we now show that A ∩ B, A ∪ B and Ā are also regular. Let A and B be regular languages. Then there must be automata M1 and M2 such that L(M1) = A and L(M2) = B. Let
M1 = (Q1, Σ, δ1, s1, F1)
M2 = (Q2, Σ, δ2, s2, F2)
To show that A ∩ B is regular, we have to show that there exists an automaton M3 such that L(M3) = A ∩ B. We claim that M3 can be constructed as given below. Let
M3 = (Q3, Σ, δ3, s3, F3), where
Q3 = Q1 × Q2 = {(p, q) | p ∈ Q1 and q ∈ Q2}
F3 = F1 × F2 = {(p, q) | p ∈ F1 and q ∈ F2}
s3 = (s1, s2)
δ3((p, q), a) = (δ1(p, a), δ2(q, a))
The automaton M3 is called the product of M1 and M2. The operation of M3 should be intuitively clear. On an input x ∈ Σ*, it simulates the moves of M1 and M2 simultaneously and reaches an accept state only if x is accepted by both M1 and M2. A formal argument is given below.

Lemma 9.4.5: We first show that for all x ∈ Σ*,
δ̂3((p, q), x) = (δ̂1(p, x), δ̂2(q, x)).
The proof is by induction on |x|. If x = ε,
δ̂3((p, q), ε) = (p, q) = (δ̂1(p, ε), δ̂2(q, ε)).
Now assume that the lemma holds for x. We show that it holds for xa also, where a ∈ Σ:
δ̂3((p, q), xa)
  = δ3(δ̂3((p, q), x), a)                    [definition of δ̂3]
  = δ3((δ̂1(p, x), δ̂2(q, x)), a)             [induction hypothesis]
  = (δ1(δ̂1(p, x), a), δ2(δ̂2(q, x), a))      [definition of δ3]
  = (δ̂1(p, xa), δ̂2(q, xa))                  [definition of δ̂1 and δ̂2]

We now show that L(M3) = L(M1) ∩ L(M2). For all x ∈ Σ*,
x ∈ L(M3)
  ⇔ δ̂3(s3, x) ∈ F3                          [definition of acceptance]
  ⇔ δ̂3((s1, s2), x) ∈ F1 × F2               [definition of s3 and F3]
  ⇔ (δ̂1(s1, x), δ̂2(s2, x)) ∈ F1 × F2        [Lemma 9.4.5]
  ⇔ δ̂1(s1, x) ∈ F1 and δ̂2(s2, x) ∈ F2       [definition of set product]
  ⇔ x ∈ L(M1) and x ∈ L(M2)                 [definition of acceptance]
  ⇔ x ∈ L(M1) ∩ L(M2)                       [definition of intersection]


We next argue that regular sets are closed under complementation. To do this, we take the DFA M, where L(M) = A, and interchange its sets of accept and reject states. The modified automaton accepts a string x exactly when x is not accepted by M. So we have constructed the modified automaton to accept Ā. By one of De Morgan's laws, A ∪ B is the complement of Ā ∩ B̄; therefore the regular languages are closed under union also. In the sequel, we shall give only intuitively appealing informal correctness arguments rather than rigorous formal proofs as done above.
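The product construction is mechanical, and writing it out makes the definition concrete. The sketch below is our own (the dictionary encoding of transition functions and the two example machines are assumptions); it pairs a machine for "ends in 0" with one for "even length" and checks their intersection.

def product_dfa(d1, s1, F1, d2, s2, F2):
    """Build M3 = M1 x M2: pair the states, move componentwise, accept on F1 x F2."""
    sigma = {a for (_, a) in d1}
    Q1 = {p for (p, _) in d1}
    Q2 = {q for (q, _) in d2}
    delta = {((p, q), a): (d1[(p, a)], d2[(q, a)])
             for p in Q1 for q in Q2 for a in sigma}
    return delta, (s1, s2), {(p, q) for p in F1 for q in F2}

# M1: state 'y' iff the last symbol read was 0, so L(M1) = strings ending in 0.
d_end0 = {('n', '0'): 'y', ('n', '1'): 'n', ('y', '0'): 'y', ('y', '1'): 'n'}
# M2: state = parity of the number of symbols read; L(M2) = even-length strings.
d_even = {(q, a): (q + 1) % 2 for q in (0, 1) for a in '01'}

delta, start, accept = product_dfa(d_end0, 'n', {'y'}, d_even, 0, {0})
for w in ['10', '110', '0', '1010']:
    q = start
    for a in w:
        q = delta[(q, a)]
    print(w, q in accept)   # True exactly when w ends in 0 AND |w| is even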

9.5 Nondeterministic Finite Automata

9.5.1 Nondeterminism

A DFA can be made more flexible by adding a feature called nondeterminism. This feature allows the machine to make moves that are only partially determined by the current state and input symbol. That is, from a current state and input symbol, the next state can be any one of several possible legal states. Also, the machine may change its state by making an ε-transition, i.e., there may be a transition from one state to another without reading any new input symbol. It turns out, however, that every nondeterministic finite automaton (NFA) is equivalent to a DFA.


An NFA is said to accept its input string x if it is possible to start in the start state and scan x, moving according to the transition rules and making a series of choices (for moving to the next state) that eventually leads to one of the accepting states when the end of x is reached. Since there are many choices for going to the next state, there may be many paths through the NFA in response to the input x; some may lead to accept states and some may lead to reject states. The NFA is said to accept x if at least one computation path on x starting from the start state leads to an accept state. It should be noted that in the NFA model itself there is no mechanism to determine which transition to make in response to the next symbol from the input. We illustrate these ideas with an example. Consider the following automaton N1:

[Figure 9.4: The automaton N1. The start state q0 loops on 0 and 1 and also moves to q1 on 1; q1 moves to q2 and q2 moves to q3 on either 0 or 1; q3 is the accept state.]

In the automaton N1, from the state q0 there are two transitions on the symbol 1. So N1 is indeed nondeterministic. It is easy to reason that N1 accepts all strings over {0, 1} that contain a 1 in the third position from the end. On an input, say, 01110100, one computation can be such that we always stay in state q0. But the computation where we stay in state q0 till we read the first five symbols and then move to states q1, q2 and q3 on reading the next three symbols accepts the string 01110100. Therefore N1 accepts the string 01110100.


9.5.2 Definition of NFA

The definition is very similar to that of a DFA, except that we need to describe the new way of making transitions. In an NFA the input to the transition function is a state plus an input symbol or the empty string; the transition is to a set of possible next (legal) states. Let P(Q) be the power set of Q. Let Σ_ε denote Σ ∪ {ε}, for any alphabet Σ. The formal definition is as follows:

Definition 9.5.1: A nondeterministic finite automaton is a 5-tuple (Q, Σ, δ, s, F), where
Q is a finite set of states
Σ is a finite alphabet
δ : Q × Σ_ε → P(Q) is the transition function
s ∈ Q is the start state and
F ⊆ Q is the set of accept states.

We can now formally state the notion of computation for an NFA. Let N = (Q, Σ, δ, s, F) be an NFA and let w be a string over the alphabet Σ. Then we say that N accepts w if we can write w as w = w1w2 · · · wk, where each wi, 1 ≤ i ≤ k, is in Σ_ε, and there exists a sequence of states q0, q1, . . . , qk such that
(i) q0 = s
(ii) qi+1 ∈ δ(qi, wi+1), for i = 0, . . . , k − 1
(iii) qk ∈ F
Condition (i) states that the machine starts in its start state.


Condition (ii) states that, on reading the next symbol in the current state, the next state is any one of the allowed legal states. Condition (iii) states that the machine exhausts its input and ends up in one of the final states.
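Acceptance "if some choice sequence works" can be decided without guessing: track the set of all states the NFA could possibly be in. The sketch below is our own illustration (dictionary encoding assumed; '' stands for ε); it checks N1 from Figure 9.4.

def eps_closure(delta, states):
    """States reachable from `states` by zero or more epsilon-moves."""
    closure, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, ''), ()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure

def nfa_accepts(delta, start, accept, w):
    """True iff some computation path on w ends in an accept state."""
    current = eps_closure(delta, {start})
    for a in w:
        moved = set()
        for q in current:
            moved |= delta.get((q, a), set())
        current = eps_closure(delta, moved)
    return bool(current & accept)

# N1 of Figure 9.4: guesses that the 1 just read is third from the end.
d1 = {('q0', '0'): {'q0'}, ('q0', '1'): {'q0', 'q1'},
      ('q1', '0'): {'q2'}, ('q1', '1'): {'q2'},
      ('q2', '0'): {'q3'}, ('q2', '1'): {'q3'}}
print(nfa_accepts(d1, 'q0', {'q3'}, '01110100'))  # True: third from the end is 1
print(nfa_accepts(d1, 'q0', {'q3'}, '0001'))      # False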

9.6 Subset Construction: Equivalence of DFA and NFA

Definition 9.6.1: Two automata M and N are said to be equivalent if L(M) = L(N).

A fundamental fact is that deterministic and non-deterministic finite automata recognize the same class of languages and are therefore equivalent. Thus, corresponding to an NFA there exists a DFA; the DFA may require more states. The equivalence can be proved using what is known as the subset construction, which can be intuitively described as follows. Let N be the given NFA. We start with a pebble in the start state of N. We then track the moves in N using its transition function. Assume that we have scanned y, a prefix of the input string, and that the next symbol is b. We now make all b-moves (as indicated by δ) from every (current) state where we have a pebble to each destination state to which we can move on b or on ε. In each of these destination states we place a pebble, and we remove the pebbles off the current states. Let P be the set of current states and P′ be the set of destination states. Then we can build a transition function for a DFA which has P as one state and which moves to the state P′ on seeing b from state P. This intuitive idea can be precisely refined and captured. We first need a preliminary computation before we describe the subset construction.


Let ε-CLOSURE(q), q ∈ Q, be the set of states of N built by applying the following rules: (i) q is added to ε-CLOSURE(q); (ii) if r1 is in ε-CLOSURE(q) and there is an edge labelled ε from r1 to r2, then r2 is also added to ε-CLOSURE(q), if it is not already there. Rule (ii) is repeated until no more new states can be added to ε-CLOSURE(q). Thus ε-CLOSURE(q) is the set of states that can be reached from q on zero or more ε-transitions alone. If T is a set of states, then we define ε-CLOSURE(T) as the union, over all states q in T, of ε-CLOSURE(q). The procedure given in Fig. 9.1 illustrates the computation of ε-CLOSURE(T) using a stack.
begin
    push all states of T onto STACK;
    ε-CLOSURE(T) := T;
    while (STACK not empty) do
    begin
        pop q, the top element of STACK, off STACK;
        for (each state r with an edge labelled ε from q to r) do
            if (r is not in ε-CLOSURE(T)) then
            begin
                add r to ε-CLOSURE(T);
                push r onto STACK
            end
    end
end



Figure 9.1: Computing ε-CLOSURE

We now give the subset construction procedure. That is, we are given an NFA N; we are required to construct a DFA D equivalent to N. Initially let ε-CLOSURE(s) be a state (the start state) of D, where s is the start state of N. We assume that initially each state of D is unmarked.

We now execute the following procedure.

begin
    while (there is an unmarked state q = {r1, r2, . . . , rn} of D) do
    begin
        mark q;
        for (each input symbol a ∈ Σ) do
        begin
            let T be the set of states to which there is a transition
                on a from some state ri in q;
            x := ε-CLOSURE(T);
            if (x has not yet been added to the set of states of D) then
                make x an unmarked state of D;
            add a transition from q to x labelled a, if not already present
        end
    end
end

Figure 9.2: The subset construction procedure
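The two procedures combine naturally into a worklist algorithm. The sketch below is our own Python rendering (DFA states are frozensets of NFA states; '' stands for ε); run on the NFA of Exercise 9.6.5 below, which accepts strings ending in 01, it produces exactly the three-state DFA asked for there.

def subset_construction(delta, start, accept, sigma):
    """Return (dfa_delta, dfa_start, dfa_accept); DFA states are frozensets."""
    def closure(states):
        out, stack = set(states), list(states)
        while stack:
            q = stack.pop()
            for r in delta.get((q, ''), ()):
                if r not in out:
                    out.add(r)
                    stack.append(r)
        return frozenset(out)

    s0 = closure({start})
    dfa_delta, seen, unmarked = {}, {s0}, [s0]
    while unmarked:
        S = unmarked.pop()                    # "mark" S
        for a in sigma:
            T = set()
            for q in S:                       # all a-moves out of S ...
                T |= delta.get((q, a), set())
            x = closure(T)                    # ... followed by epsilon-moves
            dfa_delta[(S, a)] = x
            if x not in seen:
                seen.add(x)
                unmarked.append(x)
    return dfa_delta, s0, {S for S in seen if S & accept}

# NFA accepting strings ending in 01 (Exercise 9.6.5 below):
d = {('q0', '0'): {'q0', 'q1'}, ('q0', '1'): {'q0'}, ('q1', '1'): {'q2'}}
dd, s0, F = subset_construction(d, 'q0', {'q2'}, '01')
print(sorted(sorted(S) for S in {s for (s, _) in dd}))
# [['q0'], ['q0', 'q1'], ['q0', 'q2']], the three DFA states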

We thus have the following theorem.

Theorem 9.6.2: Every NFA has an equivalent DFA.


Thus NFAs give an alternative characterization of regular languages: a language is regular if and only if some NFA recognizes it.

Exercise 9.6.3: The following example illustrates the fact that, given an NFA with ε-transitions, we can get another NFA with no ε-transitions.

[Figure 9.5: Two NFAs over the states q0, q1, q2, q3: one using ε-transitions and an equivalent one without them.]

Argue that the NFAs given above are equivalent.

Exercise 9.6.4: Argue that both the following NFAs accept the language (01* + 0*1).

[Figure 9.6: Two NFAs for the same language: one with states q0, q1, q3 and one with states p0, p1.]

Exercise 9.6.5:


Consider the following NFA, which accepts all strings ending in 01 (interpreted as binary integers, it accepts all integers of the form 4x + 1, where x is any integer).

[Figure 9.7: NFA in which the start state q0 loops on 0 and 1, q0 moves to q1 on 0, and q1 moves to the accept state q2 on 1.]

Show that by applying the subset construction procedure we get the following equivalent DFA.

[Figure 9.8: DFA with states {q0}, {q0, q1} and {q0, q2} (accepting): on 0 every state moves to {q0, q1}; on 1, {q0, q1} moves to {q0, q2}, while {q0} and {q0, q2} move to {q0}.]

Exercise 9.6.6:


Consider the following NFA, which accepts all strings of 0s and 1s such that the nth symbol from the end is 1.

[Figure 9.9: NFA in which the start state q0 loops on 0 and 1, q0 moves to q1 on 1, and qi moves to qi+1 on 0 and 1 for i = 1, . . . , n − 1; qn is the accept state.]

Argue that the above NFA is a bad case for the subset construction.

9.7 Closure of Regular Languages Under Concatenation and Kleene Star

We first show that the class of regular languages is closed under the concatenation operation. Given two regular languages A and B, let N1 and N2 be two NFAs such that L(N1) = A and L(N2) = B. From N1 and N2 we construct a new NFA N to recognize AB, as suggested by the following figure.


[Figure 9.10: The NFA N for AB: N1 (with start state s1) is followed by N2 (with start state s2), and each accept state of N1 gets an ε-transition to s2.]

The key idea is that we make the start state s1 of N1 the start state s of N. From each accept state of N1 we make an ε-transition to the start state s2 of N2. The accept states of N are the accept states of N2 only. Let
N1 = (Q1, Σ, δ1, s1, F1)
N2 = (Q2, Σ, δ2, s2, F2)
We construct N to recognize AB as follows:
N = (Q, Σ, δ, s1, F2)
where
(i) Q = Q1 ∪ Q2
(ii) we define δ so that for any q ∈ Q and a ∈ Σ_ε,

δ(q, a) =
    δ1(q, a)            if q ∈ Q1 and q ∉ F1
    δ1(q, a)            if q ∈ F1 and a ≠ ε
    δ1(q, a) ∪ {s2}     if q ∈ F1 and a = ε
    δ2(q, a)            if q ∈ Q2

Finally, we show that the class of regular languages is closed under the star operation. Given a regular language A, we wish to prove that the language A* is also regular. Let N be an NFA such that L(N) = A. We modify N, as suggested by the following figure, to build an NFA N′:

[Figure 9.11: The NFA N′: a new accepting start state s′ with an ε-transition to the start state s of N, and an ε-transition from each accept state of N back to s.]

The NFA N′ accepts any input that can be broken into several pieces, each piece being accepted by N. In addition, N′ also accepts ε, which is a member of A*. Let N = (Q, Σ, δ, s, F) be an NFA that recognizes A. We construct N′ to recognize A* as follows:
N′ = (Q′, Σ, δ′, s′, F′)

where
Q′ = Q ∪ {s′}
F′ = F ∪ {s′}
For any q ∈ Q′ and a ∈ Σ_ε, we define

δ′(q, a) =
    δ(q, a)            if q ∈ Q and q ∉ F
    δ(q, a)            if q ∈ F and a ≠ ε
    δ(q, a) ∪ {s}      if q ∈ F and a = ε
    {s}                if q = s′ and a = ε
    ∅                  if q = s′ and a ≠ ε

We thus have the following theorem:

Theorem 9.7.1:

The class of languages accepted by finite automata is closed under intersection, complementation, union, concatenation and Kleene star.

Proof. Follows from Subsection 9.4.1 and the above discussions.

9.8 Regular Expressions and Finite Automata

Although regular expressions and finite automata appear to be quite different, they are equivalent in their descriptive power. Thus any regular expression can be converted to a finite automaton, and given any finite automaton, we can get a regular expression capturing the language recognized by it.

Lemma 9.8.1:

If a language A is described by a regular expression R, then it is regular; that is, there is an NFA to recognize A.

Proof. Given a regular expression R, we can construct an equivalent NFA N by considering the following four cases:

1. R = ∅. Then L(R) = ∅ and the following NFA recognizes L(R).

[Figure 9.12: An NFA with a start state and no accept state.]

2. R = ε. Then L(R) = {ε} and the following NFA recognizes L(R).

[Figure 9.13: An NFA whose start state is also its accept state.]

3. R = a, for some a ∈ Σ. Then L(R) = {a} and the following NFA recognizes L(R):

[Figure 9.14: A start state with a single transition on a to an accept state.]

4. R can be one of the following (R1 and R2 are regular expressions):
i) R1 + R2
ii) R1 · R2

iii) R1*

For this case we simply construct the equivalent NFA following the techniques used in proving the closure properties (union, concatenation and Kleene star) of regular languages. The following example provides a concrete illustration.

Example 9.8.2: Consider the regular expression (ab + aab)*. We proceed to construct an NFA to recognize the corresponding language.

Step 1: We construct NFAs for a and b.

[Figure 9.15: Elementary NFAs for a and for b.]

Step 2: We use the above NFAs to construct NFAs for ab and aab.

[Figure 9.16: NFAs for ab and aab, obtained by the concatenation construction.]

Step 3: Using the above NFAs for ab and aab, we construct an NFA for ab + aab.

[Figure 9.17: NFA for ab + aab, obtained by the union construction.]

Step 4: From the above NFA for ab + aab, we build the required NFA for (ab + aab)*.

[Figure 9.18: NFA for (ab + aab)*, obtained by the Kleene star construction.]
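Steps 1 through 4 are an instance of a completely mechanical translation. The sketch below is our own rendering of it in Python (the tuple encoding of expressions and all names are our inventions); each constructor introduces a fresh start and accept state and wires the subautomata together with ε-transitions, exactly as in the closure constructions of Section 9.7.

from itertools import count

def _add(d, q, a, targets):
    d.setdefault((q, a), set()).update(targets)

def regex_to_nfa(e, fresh=None):
    """(delta, start, accept) for e given as: None (empty language), a string
    ('' for epsilon or one symbol), ('+', R, S), ('.', R, S) or ('*', R).
    Epsilon-transitions are labelled ''."""
    fresh = fresh if fresh is not None else count()
    s, f = next(fresh), next(fresh)
    d = {}
    if e is None:
        pass                                   # no edges: accepts nothing
    elif isinstance(e, str):
        _add(d, s, e, {f})                     # one edge labelled '' or a symbol
    elif e[0] == '*':
        d1, s1, f1 = regex_to_nfa(e[1], fresh)
        d.update(d1)
        _add(d, s, '', {s1, f})                # enter the loop, or accept epsilon
        _add(d, f1, '', {s1, f})               # go around again, or leave
    else:
        d1, s1, f1 = regex_to_nfa(e[1], fresh)
        d2, s2, f2 = regex_to_nfa(e[2], fresh)
        d.update(d1)
        d.update(d2)
        if e[0] == '+':                        # union: branch to either machine
            _add(d, s, '', {s1, s2}); _add(d, f1, '', {f}); _add(d, f2, '', {f})
        else:                                  # concatenation: chain the machines
            _add(d, s, '', {s1}); _add(d, f1, '', {s2}); _add(d, f2, '', {f})
    return d, s, f

# (ab + aab)* as in Example 9.8.2; nfa_accepts is the checker from the
# sketch in Section 9.5.
e = ('*', ('+', ('.', 'a', 'b'), ('.', ('.', 'a', 'a'), 'b')))
delta, s, f = regex_to_nfa(e)
for w in ['', 'ab', 'aab', 'abaab', 'aa']:
    print(repr(w), nfa_accepts(delta, s, {f}, w))  # True, True, True, True, False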

Lemma 9.8.3: If a language is regular (that is, it is recognized by a finite automaton), then it is described by a regular expression.

Proof. Let L be accepted by a DFA M = (Q, Σ, δ, s, F). Then L = {x | δ̂(s, x) ∈ F}. Let F = {q1, q2, . . . , qk}. For every qi ∈ F we can consider the language Lqi given by
Lqi = {x | δ̂(s, x) = qi}.

Then L = Lq1 ∪ Lq2 ∪ · · · ∪ Lqk.

If Lqi is described by a regular expression rqi, then L can be described by the regular expression rq1 + rq2 + · · · + rqk (recall: finite unions of regular sets are regular). For any two states p, q ∈ Q, let L(p, q) be the language defined as
L(p, q) = {x | δ̂(p, x) = q}.
It is therefore sufficient to show that L(p, q) is described by a regular expression. We now look at the number of distinct states through which M passes in moving from state p to state q. It turns out that it is more convenient to consider, for each k, a specific set of k states, and to consider the set of strings that cause M to go from state p to state q by going through only states in that set. If k is large enough, this set of strings will be all of L(p, q). Let us relabel the states of M using the integers 1 through n, where n = |Q|. For a string x ∈ Σ*, we say that x represents a path from state p to state q going through state t if there exist strings y, z (both ≠ ε) such that x = yz, δ̂(p, y) = t and δ̂(t, z) = q. Now, for j ≥ 0, let
L(p, q, j) = the set of strings corresponding to paths from state p to state q that go through no state numbered higher than j.
We have L(p, q) = L(p, q, n), because no string in L(p, q) can go through a state numbered higher than n, for there are no states numbered greater than n.


The problem now is to show that L(p, q, n) can be described by a regular expression. (This will be true if we can show that L(p, q, j) is described by a regular expression for every j with 0 ≤ j ≤ n.) We proceed to do this by induction. For the basis step, we show that L(p, q, 0) can be described by a regular expression. By definition, L(p, q, 0) represents the set of strings corresponding to paths from state p to state q going through no state numbered higher than 0; this means going through no state at all, which means that the string can contain no more than one symbol. Therefore L(p, q, 0) ⊆ Σ ∪ {ε}. More explicitly,

L(p, q, 0) = {a ∈ Σ | δ(p, a) = q}          if p ≠ q
L(p, q, 0) = {a ∈ Σ | δ(p, a) = p} ∪ {ε}    if p = q.

It is easy to reason that L(p, q, 0) can be described by a regular expression. The induction hypothesis is that, for some k ≥ 0 and for all states p, q with 1 ≤ p, q ≤ n, the language L(p, q, k) can be described by a regular expression. Assuming the hypothesis, we wish to show that for every p, q in the same range, the language L(p, q, k + 1) can also be described by a regular expression. We note that for k ≥ n, L(p, q, k + 1) = L(p, q, k), and so we assume k < n. By definition, a string x ∈ L(p, q, k + 1) if it represents a path from p to q that goes through no state numbered higher than k + 1. This is possible in two ways. First, the path does not go higher than k, so x ∈ L(p, q, k). Second, the path goes through the state k + 1 (and nothing higher).


In this case, in general, the path goes from state p to the state k + 1 (for the first time), then possibly loops from state k + 1 back to itself (zero or more times), and then goes from the state k + 1 to the state q. This means we can write x as yzw, where y corresponds to the path from state p to the first visit of state k + 1, z corresponds to the looping in state k + 1, and w corresponds to the path from state k + 1 to state q. We note that in each of the two parts y and w, and in each of the loops making up z, the path does not go through any state higher than k. Therefore,
y ∈ L(p, k + 1, k), z ∈ L(k + 1, k + 1, k)*, w ∈ L(k + 1, q, k)

Considering both the above cases, we have
x ∈ L(p, q, k) ∪ L(p, k + 1, k) L(k + 1, k + 1, k)* L(k + 1, q, k).
Since x is any string in L(p, q, k + 1), we can write
L(p, q, k + 1) = L(p, q, k) ∪ L(p, k + 1, k) L(k + 1, k + 1, k)* L(k + 1, q, k).
The expression on the right-hand side can be described by a regular expression if the individual languages in it can be described by regular expressions (and this is so, by the induction hypothesis). Therefore L(p, q, k + 1) can be described by a regular expression.

Theorem 9.8.4: A language is regular if and only if some regular expression describes it.

Proof. Follows from Lemmas 9.8.1 and 9.8.3 above.


9.9 DFA State Minimization

Consider the regular expression (a + b)*abb. If we construct the NFA for this and apply the subset construction algorithm, we will get the following DFA:

[Figure 9.19: A five-state DFA for (a + b)*abb obtained from the subset construction.]

The above DFA has five states. The following DFA with only four states also accepts the language described by (a + b)*abb:

[Figure 9.20: An equivalent four-state DFA with states A, B, C, D.]

The above example suggests that a given DFA can possibly be simplified to give an equivalent DFA with fewer states. We now give an algorithm that provides a general method of reducing the number of states of a DFA.

Let the given DFA be M = (Q, Σ, δ, s, F).

We assume that from every state there is a transition on every input (if q is a state not conforming to this, then introduce a new dead state d with transitions from q to d on the inputs that are not already present; also add transitions from d to d on all inputs). We say that a string w distinguishes a state q1 from a state q2 if δ̂(q1, w) ∈ F and δ̂(q2, w) ∉ F, or vice versa. The minimization procedure works on M by finding all groups of states that can be distinguished by some input string. Those groups of states that cannot be distinguished are then merged to form a single state for the entire group. The algorithm works by keeping a partition of Q such that each group consists of states which have not yet been distinguished from one another, and such that any pair of states chosen from different groups have been found distinguishable by some input. Initially the two groups are F and Q \ F. The fundamental step is to take a group of states, say A = {q1, q2, . . . , qk}, and some input symbol a, and consider δ(qi, a) for every qi ∈ A. If these transitions are to states that fall into two or more different groups of the current partition, then we must split A into subsets so that the transitions from each subset of A are all confined to a single group of the current partition. For example, let δ(q1, a) = t1 and δ(q2, a) = t2, and let t1 and t2 be in different groups. Then we must split A into at least two subsets so that one subset contains q1 and another contains q2. Note that t1 and t2 are distinguished by some string w, and so q1 and q2 are distinguished by the string aw.


The algorithm repeats the process of splitting groups until no more groups need to be split. The following fact can be formally proved: if there exists a string w such that δ̂(q1, w) ∈ F and δ̂(q2, w) ∉ F, or vice versa, then q1 and q2 cannot be in the same group; if no such w exists, then q1 and q2 can be in the same group.

Algorithm DFA Minimization
Input: A DFA M = (Q, Σ, δ, s, F) with transitions defined for all states.
Output: A DFA M′ such that L(M′) = L(M).

1. We construct a partition Π of Q. Initially Π consists of F and Q \ F only. We next refine Π to a new partition Π_new using the procedure Refine-Π given in Fig. 9.3; that is, Π_new consists of the groups of Π, each split into one or more subgroups. If Π_new ≠ Π, we replace Π by Π_new and repeat the procedure Refine-Π. If Π_new = Π, then we terminate the process of refining Π. Let G1, G2, . . . , Gk be the final groups of Π.

procedure Refine-Π
begin
    for (each group G of Π) do
    begin
        partition G into subgroups such that two states q1 and q2 of G
            are in the same subgroup iff for all a ∈ Σ, δ(q1, a) and
            δ(q2, a) are in the same group of Π;
        /* in the worst case, a state will be in a subgroup by itself */
        place all subgroups so formed in Π_new
    end
end

Figure 9.3: Procedure Refine-Π

2. For each Gi in Π we pick a representative, an arbitrary state in Gi. The representatives will be the states of the DFA M′. Let qi be the representative of Gi and, for a ∈ Σ, let δ(qi, a) ∈ Gj (note that Gj can be the same as Gi). Let qj be the representative of Gj. Then in M′ we add a transition from qi to qj on a. Let the initial state of M′ be the representative of the group containing the initial state s of M, and let the final states of M′ be the representatives which are in F.

3. If M′ has a dead state d, then remove d from M′. Also remove any state not reachable from the initial state. Any transitions from other states to d become undefined.

Example 9.9.1: The following DFA recognizes the language described by the regular expression (0 + 1)*10.

[Figure 9.21: The DFA, with states numbered 1 to 7, obtained for (0 + 1)*10.]

Applying the above algorithm gives the final partition of states as {1, 2, 4}, {5, 3, 7} and {6}. The minimized DFA is given below:

[Figure 9.22: The minimized three-state DFA with states {1, 2, 4}, {5, 3, 7} and {6}.]

Section 9.11 will answer the question of the uniqueness of the minimal DFA.
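The refinement loop is short enough to state in full. The sketch below is our own compact version; the example DFA is also ours (the canonical machine for "ends in 10" with one deliberately duplicated state), not the seven-state DFA of Figure 9.21.

def minimize(delta, accept, sigma):
    """Partition the states into groups of indistinguishable states by
    repeatedly splitting, starting from {accepting, non-accepting}."""
    states = {p for (p, _) in delta} | set(delta.values())
    partition = [g for g in (set(accept), states - set(accept)) if g]
    while True:
        index = {q: i for i, g in enumerate(partition) for q in g}
        refined = []
        for g in partition:
            buckets = {}
            for q in g:
                sig = tuple(index[delta[(q, a)]] for a in sorted(sigma))
                buckets.setdefault(sig, set()).add(q)
            refined.extend(buckets.values())
        if len(refined) == len(partition):   # no group was split: done
            return refined
        partition = refined

# "Ends in 10", with q3 a redundant copy of q0:
d = {('q0', '0'): 'q0', ('q0', '1'): 'q1',
     ('q1', '0'): 'q2', ('q1', '1'): 'q1',
     ('q2', '0'): 'q3', ('q2', '1'): 'q1',
     ('q3', '0'): 'q0', ('q3', '1'): 'q1'}
print(minimize(d, {'q2'}, '01'))   # merges q0 and q3: groups {q0,q3}, {q1}, {q2}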


9.10 Myhill-Nerode Relations

9.10.1 Isomorphism of DFAs

We begin with the notion of isomorphism of DFAs. Let D1 and D2 be two DFAs, where
D1 = (Q1, Σ, δ1, s1, F1)
D2 = (Q2, Σ, δ2, s2, F2)
The DFAs D1 and D2 are said to be isomorphic if there is a one-to-one, onto mapping f : Q1 → Q2 such that
(i) f(s1) = s2
(ii) f(δ1(p, a)) = δ2(f(p), a) for all p ∈ Q1 and a ∈ Σ
(iii) p ∈ F1 if and only if f(p) ∈ F2, for all p ∈ Q1.
Conditions (i), (ii) and (iii) imply that D1 and D2 are essentially the same automaton up to renaming of states. Therefore they accept the same input set. We can show that the minimal state DFA corresponding to the set it accepts is unique up to isomorphism. This can be done by a beautiful correspondence between DFAs with input alphabet Σ and certain equivalence relations on Σ*.

9.10.2 Myhill-Nerode Relation

Let M be the following DFA with no inaccessible states:
M = (Q, Σ, δ, s, F)

Let L(M) = R ⊆ Σ*. That is, R is the regular set accepted by M. We define an equivalence relation ≈M on Σ*:

For any two strings x, y ∈ Σ*, we say x ≈M y if and only if δ̂(s, x) = δ̂(s, y). It is easy to see that ≈M is an equivalence relation on Σ*. Thus the automaton M induces the equivalence relation ≈M on Σ*. It is interesting to note that ≈M also satisfies the following properties:

(a) ≈M is a right congruence: for any x, y ∈ Σ* and a ∈ Σ, if x ≈M y then xa ≈M ya. To see this, assume that x ≈M y. Then
δ̂(s, xa) = δ(δ̂(s, x), a)    [by definition]
          = δ(δ̂(s, y), a)    [by assumption]
          = δ̂(s, ya)          [by definition]
Hence xa ≈M ya.

(b) ≈M refines R: for any x, y ∈ Σ*, if x ≈M y then (x ∈ R ⇔ y ∈ R). By definition, x ≈M y means δ̂(s, x) = δ̂(s, y), which is either an accept state or a reject state. Therefore either both x and y are accepted, or both x and y are rejected. Stated another way, every equivalence class induced by ≈M has all its elements in R or none of its elements in R.

(c) ≈M is of finite index: that is, the number of equivalence classes induced by ≈M is finite.

Corresponding to each state q ∈ Q, there is exactly one equivalence class, given by
{x ∈ Σ* | δ̂(s, x) = q}

The number of states is finite, and hence the number of equivalence classes is also finite.

Definition 9.10.1: An equivalence relation ≈ on Σ* is a Myhill-Nerode relation for R, a regular set, if it satisfies the properties (a), (b) and (c) above; that is, ≈ is a right congruence of finite index refining R.

The interesting fact about this definition is that it characterizes exactly the relations on Σ* that are ≈M for some automaton M. That is, we can construct M from ≈M using only the fact that ≈M is a Myhill-Nerode relation.

9.10.3 Construction of the DFA from a given Myhill-Nerode Relation

We first show the construction of the DFA M≈, for a given R ⊆ Σ*, from any given Myhill-Nerode relation ≈ for R. (Here R is not assumed to be regular.) Given any string x, its equivalence class is defined by
[x] = {y | y ≈ x}
Note that there are infinitely many strings, but only finitely many equivalence classes, by property (c) above. We now define the DFA M≈ = (Q, Σ, δ, s, F) as follows:
Q = {[x] | x ∈ Σ*}

s = [ε]
F = {[x] | x ∈ R}    (well defined, because x ∈ R iff [x] ∈ F, by property (b))
δ([x], a) = [xa]     (well defined, by property (a))

We now have to show that L(M≈) = R.

Lemma 9.10.2: δ̂([x], y) = [xy].

Proof. The proof is by induction on |y|. For the basis, we note that
δ̂([x], ε) = [x] = [xε].
In the induction step, we consider δ̂([x], ya). By definition,
δ̂([x], ya) = δ(δ̂([x], y), a)
            = δ([xy], a)       [using the induction hypothesis]
            = [xya].

Theorem 9.10.3: L(M≈) = R.

Proof. For any string x,
x ∈ L(M≈) ⇔ δ̂([ε], x) ∈ F    [definition of acceptance]
           ⇔ [x] ∈ F          [using Lemma 9.10.2]
           ⇔ x ∈ R            [by definition of F]

9.10.4 Myhill-Nerode Relation and the Corresponding DFA

We have carried out the following constructions: (i) given M, we defined ≈M, a Myhill-Nerode relation; (ii) given ≈ (a Myhill-Nerode relation), we defined M≈. We will now show that the constructions (i) and (ii) are inverses up to isomorphism of automata.

Lemma 9.10.4: Let ≈ be a Myhill-Nerode relation for R ⊆ Σ*. Let M≈ be the corresponding DFA. From M≈, if we now define the corresponding Myhill-Nerode relation, say ≈′, it is identical to ≈.

Proof. Given ≈, let the corresponding DFA be M≈, defined as M≈ = (Q, Σ, δ, s, F). Then, for any two strings x, y ∈ Σ*, by definition,
x ≈′ y ⇔ δ̂(s, x) = δ̂(s, y)
       ⇔ δ̂([ε], x) = δ̂([ε], y)
       ⇔ [x] = [y]            [by Lemma 9.10.2]
       ⇔ x ≈ y.
Hence ≈′ is identical to ≈.

Lemma 9.10.5:

Let the DFA for R be M, with no inaccessible states. Let the corresponding Myhill-Nerode relation be ≈M. From ≈M, if we construct the corresponding DFA, say N, it is isomorphic to M.

Proof. Let
M = (Q, Σ, δ, s, F) and N = (Q′, Σ, δ′, s′, F′).
By construction, for the DFA N,
[x] = {y | y ≈M x} = {y | δ̂(s, y) = δ̂(s, x)}
Q′ = {[x] | x ∈ Σ*}
s′ = [ε]
F′ = {[x] | x ∈ R}
δ′([x], a) = [xa]
We now have to show that N and M are isomorphic under the map
f : Q′ → Q, where f([x]) = δ̂(s, x).
By the definition of ≈M, [x] = [y] iff δ̂(s, x) = δ̂(s, y). So f is well defined on the equivalence classes induced by ≈M. Also, f is one-to-one. Since M has no inaccessible states, f is onto.

To argue that f is an isomorphism, we need to show that
(i) f(s′) = s
(ii) f(δ′([x], a)) = δ(f([x]), a), and
(iii) [x] ∈ F′ iff f([x]) ∈ F.

We show these as follows:

(i) f(s′) = f([ε])             [definition of s′]
          = δ̂(s, ε)            [definition of f]
          = s                   [definition of δ̂]

(ii) f(δ′([x], a)) = f([xa])            [definition of δ′]
                   = δ̂(s, xa)           [definition of f]
                   = δ(δ̂(s, x), a)      [definition of δ̂]
                   = δ(f([x]), a)        [definition of f]

(iii) [x] ∈ F′ iff x ∈ R                [definition of F′ and property (b)]
               iff δ̂(s, x) ∈ F          [definition of acceptance, since L(M) = R]
               iff f([x]) ∈ F            [definition of f]

We thus have the following theorem.

Theorem 9.10.6: Let Σ be a finite alphabet. Up to isomorphism of automata, there is a one-to-one correspondence between DFAs (with no inaccessible states) over Σ accepting R ⊆ Σ* and Myhill-Nerode relations for R on Σ*.

Theorem 9.10.6 implies that we can deal with regular sets and finite automata in terms of a few simple algebraic properties.


9.11 Myhill-Nerode Theorem

9.11.1 Notion of Refinement

Definition 9.11.1: A relation r1 is said to refine another relation r2 if r1 ⊆ r2, considered as sets of ordered pairs. That is, r1 refines r2 if, for all x, y, whenever x r1 y holds then x r2 y also holds.

For equivalence relations ≡1 and ≡2, this means that for every x, the ≡1-class of x is included in the ≡2-class of x. For example, the equivalence relation i ≡ j (mod 6) on the integers refines the equivalence relation i ≡ j (mod 3). We will now show that there exists a coarsest Myhill-Nerode relation ≡R for any given regular set R; that is, any other Myhill-Nerode relation for R refines ≡R. The relation ≡R corresponds to the unique minimal DFA for R. Property (b) of the definition of Myhill-Nerode relations says that a Myhill-Nerode relation for R refines the equivalence relation with the two equivalence classes R and Σ* \ R. The relation of refinement between equivalence relations is a partial order:
it is reflexive: every relation refines itself;
it is antisymmetric: if ≡1 refines ≡2 and ≡2 refines ≡1, then ≡1 and ≡2 are the same relation;
it is transitive: if ≡1 refines ≡2 and ≡2 refines ≡3, then ≡1 refines ≡3.
Note that if ≡1 refines ≡2, then ≡1 is the finer and ≡2 the coarser of the two relations.


The identity relation on any set S, {(x, x) | x ∈ S}, is the finest equivalence relation on S. Also, the universal relation on any set S, {(x, y) | x, y ∈ S}, is the coarsest equivalence relation on S.

9.11.2 Myhill-Nerode Theorem and Its Proof

Let R ⊆ Σ*, regular or not. The equivalence relation ≡R on Σ* is defined in terms of R as follows: for any x, y ∈ Σ*,
x ≡R y if and only if ∀z ∈ Σ* (xz ∈ R ⇔ yz ∈ R).
That is, two strings x and y are equivalent under ≡R if, whenever we append any other string z to both of them, the resulting strings xz and yz are either both in R or both not in R. It is easy to reason that ≡R is an equivalence relation for any R. The following Lemma 9.11.2 will show that ≡R, for any R (regular or not), satisfies the properties (a) and (b) of Myhill-Nerode relations, and that it is the coarsest such relation on Σ*. In case R is regular, ≡R is of finite index and is therefore a Myhill-Nerode relation for R. In fact, it is the coarsest possible Myhill-Nerode relation for R, and it corresponds to the unique minimal finite automaton for R.

Lemma 9.11.2: Let R ⊆ Σ* be any set, regular or not. Let the relation ≡R be defined as above: for any x, y ∈ Σ*, x ≡R y if and only if ∀z ∈ Σ* (xz ∈ R ⇔ yz ∈ R). Then ≡R is a right congruence refining R, and it is the coarsest such relation on Σ*.

Proof. We first show that ≡R is a right congruence:

Taking z = aw, with a ∈ Σ and w ∈ Σ*, in the definition of ≡R, we have
x ≡R y ⇒ ∀a ∈ Σ ∀w ∈ Σ* (xaw ∈ R ⇔ yaw ∈ R)
       ⇒ ∀a ∈ Σ (xa ≡R ya).
We next show that ≡R refines R. Taking z = ε in the definition of ≡R,
x ≡R y ⇒ (x ∈ R ⇔ y ∈ R).
We now show that ≡R is the coarsest such relation; that is, any other equivalence relation ≡ satisfying properties (a) and (b) refines ≡R:
x ≡ y ⇒ ∀z (xz ≡ yz)              [by induction on |z| and using property (a)]
      ⇒ ∀z (xz ∈ R ⇔ yz ∈ R)      [by property (b)]
      ⇒ x ≡R y                     [by definition of ≡R].

We are now ready to state and prove the Myhill-Nerode Theorem.

Theorem 9.11.3 (Myhill-Nerode Theorem): Let R ⊆ Σ*. The following statements are equivalent:
(i) R is regular;
(ii) there exists a Myhill-Nerode relation for R;
(iii) the relation ≡R is of finite index.

Proof. We first show (i) ⇒ (ii): given a DFA M for R (which exists because R is regular), we can construct ≈M, a Myhill-Nerode relation for R (as shown in Section 9.10.2).


We next show (ii) ⇒ (iii): by Lemma 9.11.2, any Myhill-Nerode relation ≈ for R is of finite index and refines ≡R; therefore ≡R is of finite index (a relation refined by a finite-index relation has at most as many classes, hence finite index). We finally show (iii) ⇒ (i): if ≡R is of finite index, then it is a Myhill-Nerode relation for R, and given ≡R we can construct the corresponding DFA M≡R for R. Since ≡R is the unique coarsest Myhill-Nerode relation for R, a regular set, it corresponds to the DFA for R with the fewest states among all DFAs for R.
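The relation ≡R can be probed experimentally even before building any automaton: two strings belong to different classes exactly when some extension z separates them. The following sketch is our own (and only approximate: it tests extensions up to a fixed length, so classes that only a longer z would split may appear merged). For R = the nonempty binary strings whose value is divisible by 3, it finds four classes, matching the four states of the minimal DFA.

from itertools import product

def approx_classes(in_R, sigma, max_word, max_ext):
    """Group words of length <= max_word by acceptance of all extensions
    of length <= max_ext; each group approximates one class of the
    relation defined above."""
    strings = lambda n: ("".join(p) for k in range(n + 1)
                         for p in product(sigma, repeat=k))
    exts = list(strings(max_ext))
    classes = {}
    for x in strings(max_word):
        signature = tuple(in_R(x + z) for z in exts)
        classes.setdefault(signature, []).append(x)
    return list(classes.values())

in_R = lambda w: w != "" and int(w, 2) % 3 == 0
for cls in approx_classes(in_R, "01", 3, 3):
    print(cls[:5])
# Four classes: the empty string alone, and the strings whose binary
# value is congruent to 0, 1 or 2 (mod 3)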

9.12 Non-regular Languages

9.12.1 An Example

We now show that the power of finite automata is limited. Specifically, we show that there exist languages that are not regular. Consider the set A given by
A = {aⁿbⁿ | n ≥ 0} = {ε, ab, aabb, aaabbb, . . .}.
The intuitive argument to show that there exists no DFA M such that L(M) = A goes thus: if such an M exists, then when passing through the centre point between the a's and the b's, M has to remember how many a's it has seen. It has to do this for arbitrarily long strings aⁿbⁿ (n may be arbitrarily large, much larger than the number of states). This is an unbounded amount of information, and it is not possible to remember it with only finite memory. The formal argument is given below.


Assume that A is regular, and assume that a DFA M exists such that L(M) = A. Let k be the number of states of this assumed DFA M. Consider the action of M on the input aⁿbⁿ, where n ≥ k. Let the start state be s, and let the machine reach a final state r after scanning the input string aⁿbⁿ.

[Figure 9.23: M scanning the a-block and then the b-block of aⁿbⁿ, from the start state s to the final state r.]

Since n ≥ k, by the pigeon-hole principle there must exist some state, say p, that M enters more than once while scanning the a's. We break the string aⁿbⁿ into three pieces u, v, w, where v is the string of a's scanned between the two occurrences of entry into state p. This is depicted below:

[Figure 9.24: The run of M on aⁿbⁿ split as u (from s to p), v (a loop of a's from p back to p), and w (from p to the final state r).]

Note that we have assumed |v| > 0. We then have
δ̂(s, u) = p
δ̂(p, v) = p
δ̂(p, w) = r ∈ F
We now show that the substring v can be deleted and the resulting string will still be (erroneously) accepted:
δ̂(s, uw) = δ̂(δ̂(s, u), w) = δ̂(p, w) = r ∈ F.
The acceptance is erroneous because, after deleting v, the number of a's in the resulting string is strictly less than the number of b's.


This is a contradiction, and so the assumption that there exists a DFA M to recognize A is wrong. In other words, A is not regular. Note that we could also insert (pump in) extra copies of v, and the resulting string would be erroneously accepted. We formalize the idea used in the above example in the form of a theorem called the pumping lemma.

9.12.2 The Pumping Lemma

The theorem commonly known as the pumping lemma is about a special property of regular languages; languages that are not regular lack the property. The property states that each sufficiently long string of a regular language contains a section that can be repeated any number of times, with the resulting string still remaining in the language.

Theorem 9.12.1 (Pumping Lemma): Let A be a regular set. Then there is a number p (the pumping length) such that any string w ∈ A with |w| ≥ p may be divided into three parts, w = xyz, satisfying the following conditions:
(i) xyⁱz ∈ A for all i ≥ 0,
(ii) |y| > 0, and
(iii) |xy| ≤ p.

Proof. We start with an informal approach to the proof. Let M = (Q, Σ, δ, s, F) be a DFA such that L(M) = A. Let us take p to be |Q|, the number of states. If for all w ∈ A it happens that |w| < p, then the theorem is vacuously true, because the three conditions need to hold only for strings of length at least p.


Let w ∈ A be such that |w| = n ≥ p. Consider the sequence of states (starting with s) that M goes through in accepting w. It starts in state s, then goes to, say, q3, then say q21, then say q10, and so on, till it reaches a final state, say q14, at the end of the last symbol of w. Since |w| = n, the sequence of states s, q3, q21, q10, . . . , q14 has length n + 1. Also, as n ≥ p, n + 1 is greater than p, the number of states of M. By the pigeon-hole principle, the sequence of states must contain a repeated state. The following figure depicts the situation; state q10 is repeated:

[Figure 9.25: The run of M on w = w1w2 · · · wn, passing through s, q3, q21, q10, . . . , q10, . . . , q14, with the state q10 repeated.]

We divide w into three parts thus: x is the part of w appearing before the first occurrence of q10; y is the part of w between the two occurrences of q10; z is the remaining part of w, coming after the second occurrence of q10. Therefore,
δ̂(s, x) = q10
δ̂(q10, y) = q10
δ̂(q10, z) = q14

Let us pump in a copy of y into w; we get the string xyyz. The extra copy of y will start in state q10 and will end in q10. So z will still start in state q10 and will go to the accept state q14. Thus xyyz will be accepted. Similar reasoning shows that the string xyⁱz, i > 0, will be accepted. It is easy to reason that xz is also accepted. Thus condition (i) is satisfied. Since y is the part occurring between the two occurrences of state q10, |y| > 0, and so condition (ii) is satisfied. To get condition (iii), we make sure that q10 is the first repetition in the sequence.


By the pigeon-hole principle, the first p + 1 states in the sequence must contain a repetition. Therefore |xy| ≤ p.

We now formalize the above ideas. As before, let w = w1w2w3 · · · wn, n ≥ p. Let r1 (= s), r2, r3, . . . , rn+1 be the sequence of states that M enters while processing w, so that ri+1 = δ(ri, wi), 1 ≤ i ≤ n. This sequence of states has length n + 1, which is at least p + 1. By the pigeon-hole principle, there must be two identical states among the first p + 1 states in the sequence. Let the first occurrence of the repeated state be ra and the second occurrence of the same state be rb. Because rb occurs among the first p + 1 states starting at r1, we have b ≤ p + 1. Now let
x = w1 w2 · · · w_{a−1}
y = w_a · · · w_{b−1}
z = w_b · · · w_n
As x takes M from state r1 to ra, y takes M from ra to rb (= ra), and z takes M from ra to rn+1, an accept state, M must accept xyⁱz for i ≥ 0 (see the informal argument). We know that a ≠ b, so |y| > 0. Also b ≤ p + 1, and so |xy| ≤ p. Thus all the conditions of the theorem are satisfied.
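The proof is constructive: from any DFA and any accepted string at least as long as the number of states, the split w = xyz can be computed by recording the state sequence and cutting at the first repeated state. The sketch below is our own (the ends-in-10 DFA is also ours); it checks that every pumped variant xyⁱz is still accepted.

def pump_split(delta, start, w):
    """Return (x, y, z) with w = xyz and y spanning the first repeated
    state of the run, so |xy| <= number of states and |y| > 0."""
    seq = [start]
    for a in w:
        seq.append(delta[(seq[-1], a)])
    first = {}
    for i, q in enumerate(seq):
        if q in first:                       # second visit: cut the loop out
            i0, i1 = first[q], i
            return w[:i0], w[i0:i1], w[i1:]
        first[q] = i
    return None                              # w shorter than the state count

# DFA for binary strings ending in 10:
d = {('q0', '0'): 'q0', ('q0', '1'): 'q1',
     ('q1', '0'): 'q2', ('q1', '1'): 'q1',
     ('q2', '0'): 'q0', ('q2', '1'): 'q1'}

x, y, z = pump_split(d, 'q0', '110110')
print((x, y, z))                             # ('1', '1', '0110')
for i in range(4):                           # xz, xyz, xyyz, xyyyz all accepted
    w = x + y * i + z
    q = 'q0'
    for a in w:
        q = d[(q, a)]
    print(w, q == 'q2')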

