Bulletin of Mathematical Biology (2007) 69: 215–243
DOI 10.1007/s11538-006-9119-3

ORIGINAL ARTICLE

An Extended RNA Code and its Relationship to the Standard Genetic Code: An Algebraic and Geometrical Approach
Marco V. José a,∗, Eberto R. Morgado b, Tzipe Govezensky a

a Theoretical Biology Group, Instituto de Investigaciones Biomédicas, Universidad Nacional Autónoma de México, México D.F. 04510, México
b Facultad de Matemática, Física y Computación, Universidad Central “Marta Abreu” de Las Villas, Santa Clara, Cuba

Received: 19 September 2005 / Accepted: 23 February 2006 / Published online: 2 November 2006
© Society for Mathematical Biology 2006

Abstract An algebraic and geometrical approach is used to describe the primaeval RNA code and a proposed Extended RNA code. The former consists of all codons of the type RNY, where R means purines, Y pyrimidines, and N any of them. The latter comprises the 16 codons of the type RNY plus the codons obtained by reading the RNA code in the second (NYR type) and the third (YRN type) reading frames. In each of these reading frames there are 16 triplets, which altogether complete a set of 48 triplets that specify 17 out of the 20 amino acids, including AUG, the start codon, and the three known stop codons. The other 16 codons do not pertain to the Extended RNA code and constitute the union of the triplets YYY and RRR, which we define as the RNA-less code. The codons in each of the three subsets of the Extended RNA code are represented by a four-dimensional hypercube, and the set of codons of the RNA-less code is portrayed as a four-dimensional hyperprism. Remarkably, the union of these four symmetrical, pairwise disjoint sets comprises precisely the already known six-dimensional hypercube of the Standard Genetic Code (SGC) of 64 triplets. These results suggest a plausible evolutionary path by which the primaeval RNA code could have originated the SGC, via the Extended RNA code plus the RNA-less code. We argue that the life forms that probably obeyed the Extended RNA code were intermediate between the ribo-organisms of the RNA World and the last common ancestor (LCA) of the Bacteria, Archaea, and Eucarya, that is, the cenancestor. A general encoding function, E, which maps each codon to its corresponding amino acid or to the stop signal, is also derived. In 45 out of the 64 cases, this function takes the form of a linear transformation F, which projects the whole six-dimensional hypercube onto a four-dimensional hyperface formed by all triplets that end in cytosine. In the remaining 19 cases the function E adopts the form of an affine transformation, i.e., the composition of F with a particular translation. Graphical representations of the four local encoding functions and of E are illustrated and discussed. For every amino acid and for the stop signal, a single triplet, among those that specify it, is selected as a canonical representative. From this mapping a graphical representation of the 20 amino acids and the stop signal is also derived. We conclude that the general encoding function E represents the SGC itself.

Keywords Primaeval RNA code · Standard genetic code · Evolution of the genetic code · Extended RNA codes · Algebra and geometry

∗ Corresponding author. E-mail address: marcojose@biomedicas.unam.mx (M. V. José).

1. Introduction

The current genetic code is considered to be nearly universal. This code is written in an alphabet of four letters (C, A, U, G), grouped into words three letters long, called triplets or codons. Each of the 64 codons specifies one of the 20 amino acids or else serves as a punctuation mark signaling the end of a message. Given 64 codons and 20 amino acids plus a punctuation mark, there are 21^64 ≈ 4 × 10^84 possible genetic codes. Is there something special about the one code that governs all life on Earth? Francis Crick (1968) argued that the Standard Genetic Code (SGC) need not be special at all; it could be nothing more than a “frozen accident.” This concept is not far away from the idea that sometime there was an age of miracles. However, when the SGC was compared to a computer-generated random sample of one million alternatives, the natural code emerged as superior to every random permutation with a single exception (Freeland and Hurst, 1998). Recently, numerical experiments with hand-crafted genetic codes analyzed in silico showed statistical properties (such as information content, scaling and autocorrelation properties) inferior to those of the SGC (García et al., 2004).

It is widely accepted that there was an age in the origin of life in which RNA played the role of both genetic material and main agent of catalytic activity (e.g. Woese, 1967; Crick, 1968; Kenneth and Ellington, 1995). This period is known as the RNA World (Gilbert, 1986; Gesteland et al., 1999). Investigations on the minimal gene set that is necessary and sufficient to sustain the existence of cellular life are consistent with the notion that the last common ancestor (LCA) of the three primary kingdoms (Archaea, Eucarya, and Bacteria) had an RNA genome (Mushegian and Koonin, 1996; Hutchinson et al., 1999; Gil et al., 2002).

However, the quasi-species concept of Eigen and Schuster (1977) demonstrated that the accuracy of replication places limits on the size of the genome that can be maintained by selection. The higher the error rate during replication, the smaller the maximum permissible genome size. Thus, replication fidelity was a strong limiting feature in the RNA World. On the other hand, sequence similarities shared by many ancient, large proteins found in all three kingdoms of life suggest that considerable fidelity already existed in the operative genetic system of their LCA, but such fidelity is unlikely, given Eigen’s limit, to be found in RNA-based genetic systems (Lazcano, 1995; Lazcano and Miller, 1996). The cenancestor probably had a DNA genome (Becerra et al., 1997).

To our knowledge, there have been no systematic studies that relate the RNA code (Crick, 1968; Eigen and Schuster, 1977; Konecny et al., 1995) with the SGC. The main question that we address in this work is whether, via our algebraic and geometrical approach, we can shed some light on the problem of how the primaeval RNA code could have evolved to generate the SGC. To this end, we search for symmetries and patterns in both the SGC and the RNA code. We hypothesize that in order to allow further evolution of the RNA genetic code, the constraint of having an intact message in only one reading frame has to be relaxed, given that translational and transcriptional errors were probably of great importance early in the history of life, when the machinery of protein synthesis was imprecise. Therefore, we consider not a strict comma-less code as proposed by Crick et al. (1957) but rather an RNA code which can be translated in the first, second and third reading frames.

The article is organized as follows. First, we show that each reading frame can be represented as a four-dimensional hypercube (A3, A3.1) as derived by concepts of combinatorial geometry. Interestingly, the three four-dimensional hypercubes and the right hyperprism (A9.3) associated to the RNA-less code can be inserted as pairwise disjoint four-dimensional affine subspaces in the six-dimensional hypercube (Coxeter, 1973; Fig. 5). Next, we define an encoding function for each of the three four-dimensional hypercubes, associated to the Extended RNA code, and an encoding function for the RNA-less code. We also derive the general encoding function of the SGC, and we show that this function is an integration of the different encoding functions of the above-mentioned four sets. This encoding function adopts different forms for different subsets of codons, and it represents the SGC itself. Finally, we discuss our findings in terms of the origin and evolution of the SGC. For the interested reader, an Appendix about the concepts of algebra and geometry, which is referred to throughout this work, is provided at the end of this article.

1.1. Theoretical background

The standard table of codon assignments derives from the obvious representation of the triplet code as a 4 × 4 × 4 cube. Several authors (e.g., Jiménez-Montaño et al., 1996), observing that 64 is equal not only to 4^3 but also to 2^6, suggested organizing the codon table as a six-dimensional hypercube or six-dimensional vector space over the binary field. In previous works (Sánchez et al., 2004a,b, 2005), two algebras were presented to reflect the relationship between codon assignment and the physicochemical aspects of the amino acids. Here, we used the following ordered assignment of the nucleotide bases: C ↔ 00, A ↔ 01, U ↔ 10, G ↔ 11, which also represent the integers 0, 1, 2, 3 in the binary numerical system. With this ordering, the purines A and G are represented by the odd numbers 1 and 3, whereas the pyrimidines C and U are represented by the even numbers 0 and 2. This assignment of the duplets 00, 01, 10, 11 to the letters C, A, U, G may be done in 24 = 4! ways. Given that C and G, and A and U, are complementary to each other in double-stranded RNA, it is convenient to select the ordering in such a way that 00 is complementary to 11, and 01 to 10. Hence, we have only eight possible selections: (C, A, U, G), (C, U, A, G), (A, C, G, U), (A, G, C, U), (U, C, G, A), (U, G, C, A), (G, A, U, C), and (G, U, A, C).
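The count of eight admissible orderings can be checked by brute force. The following is a minimal sketch, not part of the original article: the helper names (`complementary_orderings`, `to_sextuple`) are illustrative assumptions, and complementarity is taken to mean the C-G and A-U pairs.

```python
from itertools import permutations

# Complementary base pairs assumed here: C-G and A-U.
COMPLEMENT = {"C": "G", "G": "C", "A": "U", "U": "A"}
DUPLETS = ("00", "01", "10", "11")

def complementary_orderings():
    """Orderings of the four bases in which the duplets 00/11 and 01/10
    are assigned to complementary bases (order[k] receives DUPLETS[k])."""
    keep = []
    for order in permutations("CAUG"):
        if COMPLEMENT[order[0]] == order[3] and COMPLEMENT[order[1]] == order[2]:
            keep.append(order)
    return keep

orderings = complementary_orderings()

# The ordering selected in the text: C=00, A=01, U=10, G=11.
CHOSEN = dict(zip(("C", "A", "U", "G"), DUPLETS))

def to_sextuple(codon):
    """Binary sextuple of a triplet under the selected ordering."""
    return "".join(CHOSEN[base] for base in codon)

print(len(orderings))        # number of admissible orderings
print(to_sextuple("AUG"))    # sextuple of the start codon
```

Under these assumptions the enumeration keeps 8 of the 24 permutations, and the selected ordering (C, A, U, G) is among them.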

From these orderings, we have selected the first, but it is possible to show that the remaining ones lead to the same results.

The 64 sextuples of zeros and ones generate the so-called six-dimensional hypercube, which is a vector space over the binary field Z2 = {0, 1} (A3.1). In the vector space structure of the hypercube the addition is the so-called sum modulo 2, bit by bit, also called the XOR operation, widely used in symbolic logic. This sum, for the elements of the field Z2, is defined in Table 1. When Table 1 is translated to the nucleotide bases it gives Table 2, which is the operation table of a known abelian group of order 4, the so-called Klein Four Group. This group is isomorphic (A3) to the group of all the symmetries of a plane rectangle (not including the square).

Table 1 Sum modulo 2 in the field Z2
+   0   1
0   0   1
1   1   0

Table 2 Sum modulo 2 of nucleotide bases
+   C   A   U   G
C   C   A   U   G
A   A   C   G   U
U   U   G   C   A
G   G   U   A   C

The vector space (Z2)^6 is an orthotope, according to the terminology of Coxeter (1973) (A3). In the hypercube, as in every orthotope, all the angles between adjacent edges are supposed to be right angles, but in our pictures many of them look acute or obtuse. This is so because the picture is a projection of a six-dimensional figure onto a plane. The same happens with some of the angles of a three-dimensional cube when they are portrayed in a plane.

2. Results

2.1. The RNA code and its three reading frames: The extended code

A primaeval RNA World was proposed (Gilbert, 1986), whose code comprises the codons with an RNY pattern (R - purine, Y - pyrimidine, N - any nucleotide). Since the primitive translational apparatus may have been imperfect, shifts in the reading frame probably occurred, and codons with NYR and YRN patterns emerged. These three patterns altogether are here defined as the codons of the Extended RNA code. This extended code includes 48 triplets which specify 17 amino acids, including AUG, the start codon, and the three known stop codons.
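The counts quoted for the Extended RNA code can be verified directly against the standard genetic code. This is a minimal sketch under stated assumptions: the 64-letter string is the standard translation table written in the conventional U, C, A, G codon order (one-letter amino acid symbols, `*` for stop), and the function names `pattern` and `subset` are illustrative, not taken from the article.

```python
from itertools import product

# Standard genetic code in the conventional U, C, A, G codon order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product("UCAG", repeat=3), AMINO)}

PURINES = set("AG")

def pattern(codon):
    """Purine (R) / pyrimidine (Y) pattern of a codon, e.g. 'AUG' -> 'RYR'."""
    return "".join("R" if base in PURINES else "Y" for base in codon)

def subset(codon):
    """Assign a codon to RNY, NYR, YRN or the remaining YYY/RRR class."""
    p = pattern(codon)
    if p[0] == "R" and p[2] == "Y":
        return "RNY"
    if p[1] == "Y" and p[2] == "R":
        return "NYR"
    if p[0] == "Y" and p[1] == "R":
        return "YRN"
    return "RNA-less"  # only the patterns YYY and RRR remain

groups = {}
for codon in CODE:
    groups.setdefault(subset(codon), []).append(codon)

extended = groups["RNY"] + groups["NYR"] + groups["YRN"]
extended_amino = {CODE[c] for c in extended} - {"*"}
stops = sorted(c for c in extended if CODE[c] == "*")
missing = set(AMINO) - {"*"} - extended_amino

print({name: len(members) for name, members in groups.items()})
print(len(extended), len(extended_amino), stops, sorted(missing))
```

With this table the three amino acids absent from the 48 extended codons are F, K and E (Phe, Lys, Glu), which are the ones later supplied by the YYY and RRR triplets.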

. Y. that . these pairs of codons specify the same amino acid. U. and Z belong to the set {C. and we defined it as the RNA-less code or complementary code of the Extended RNA code. the second (NYR pattern) and the third (YRN pattern) sets correspond to readings of the primaeval RNA code at the second and the third reading frames. Now we will consider the combinatorial geometry. the start codon. RRY is obtained from YYY by the addition of the triplet AAC. and the fourth set corresponds to the RNA-less code (YYY and RRR patterns). In this context. The remaining 16 codons can only be obtained by mutations other than by frame-shift readings. associated to the canonical vector e4 . every pair consisting of triplets that have. which represents transversions in the first and the second nucleotides. In order to describe algebraically and geometrically all sets. For example. each of them formed by triplets having the same central nucleotide. The first set corresponds to the so-called primaeval RNA code (RNY pattern). For every pair (i . associated to the structure of the vector space.1). not only the same central nucleotide. that is. 1b) in which every amino acid corresponds to an edge of the associated cube. these subclasses correspond to coordinated planes or faces of the cube. contained in the six-dimensional hypercube. First reading frame The 16 triplets of the RNY pattern code for eight amino acids and are considered as the primaeval RNY code (Crick. 1a) and RRY (Fig. The eight elements of a subset correspond to a three-dimensional coordinated affine subspace (A4.1) or vectorial cube. We say that the reading and translation of sequences of codons of the form RNY defines the first reading frame (FRF) in the RNA World. In most cases.Bulletin of Mathematical Biology (2007) 69: 215–243 219 acids. but also the same first nucleotide. 2.1).1) between triplets on the same edge is 1.1). where X. For every triplet (i . Eigen and Schuster. we may consider two subclasses. 
of the canonical base. G}. Each cube is obtained from the other by means of the translation t4 . only one is a vector subspace (A1. can be partitioned into four sets of 16 triplets each. k). Finally. 1978). we introduce the following notational convention. that is. and the three known stop codons. The other seven subsets are three-dimensional affine subspaces obtained from YYY by translations (A9. For any vector space the associated geometry is defined as the family of all the affine subspaces or linear varieties of the given space (A2). The fourth set is the union of two subsets: YYY and RRR. 2004. that is. This is so since the Hamming distance (He et al. including AUG. the addition of the vector e2 which involves a transversion (A10. we denote as ti the associated translation: v → v + ei . j . RYY is obtained from YYY by the addition of the triplet ACC.2. Each of the first three sets can be partitioned into two subsets by replacing the N by Y or R. the 64 triplets XYZ of the SGC. the vector e2 + e4 . and that is the subset YYY. j ). They constitute coordinated lines or edges of the hypercube. an ordinary cube. 1968.2) in the first nucleotide. The set of codons of the form RNY is the union of the cubes RYY (Fig. we denote as ti jk the composed function ti ◦ t j ◦ tk and so on (A9. which contains the null vector CCC . A. each of the subclasses is partitioned into two pairs. for the set RNY we obtain the two subsets RYY and RRY. For each of the eight subsets. respectively. we denote as ti j the composed function ti ◦ t j . From the eight subsets. A6. For every vector ei .
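The translations used in this construction are plain bitwise sums of triplets. The sketch below is only an illustration under the assignment C = 00, A = 01, U = 10, G = 11; the helper names (`add_bases`, `add_codons`, `translate`, `family`) are assumptions introduced for this example.

```python
BITS = {"C": "00", "A": "01", "U": "10", "G": "11"}
BASE = {bits: base for base, bits in BITS.items()}

def add_bases(a, b):
    """Sum modulo 2 (XOR), bit by bit, of two bases."""
    total = int(BITS[a], 2) ^ int(BITS[b], 2)
    return BASE[format(total, "02b")]

def add_codons(c1, c2):
    """Componentwise sum of two triplets, i.e. vector addition in (Z2)^6."""
    return "".join(add_bases(a, b) for a, b in zip(c1, c2))

def translate(codons, vector):
    """Image of a set of triplets under the translation v -> v + vector."""
    return {add_codons(c, vector) for c in codons}

def family(pattern):
    """All triplets matching a pattern made of R (purine) and Y (pyrimidine)."""
    choices = {"R": "AG", "Y": "CU"}
    return {a + b + c
            for a in choices[pattern[0]]
            for b in choices[pattern[1]]
            for c in choices[pattern[2]]}

# Operation table of the bases in the order C, A, U, G (compare with Table 2).
table2 = [[add_bases(row, col) for col in "CAUG"] for row in "CAUG"]

ryy_from_yyy = translate(family("YYY"), "ACC")   # vector e2
rry_from_yyy = translate(family("YYY"), "AAC")   # vector e2 + e4
rry_from_ryy = translate(family("RYY"), "CAC")   # translation t4
```

Each of the three translated sets coincides with the cube named in the text, which is a quick way to confirm that ACC, AAC and CAC encode e2, e2 + e4 and e4, respectively.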

The Hamming distance between a vertex and its image under the translation t4 is equal to 1. It means that the set of codons of the form RNY constitutes a four-dimensional hypercube, which is a coordinated affine subspace (A4), or a four-dimensional hyperface of the six-dimensional hypercube of the 64 triplets (Fig. 1c). The hypercube is a generalization of a three-dimensional cube in n dimensions, also called a measure polytope. It is a symmetrical regular polytope with mutually orthogonal (A8) sides, and is therefore an orthotope (Coxeter, 1973).

Fig. 1 RNY code. Graphical representation of the subsets (a) RYY and (b) RRY. Note that in each cube the four amino acids correspond to 4 (solid edges) out of the 12 edges of the cube. (c) First Reading Frame: RNY = RYY ∪ RRY.

2.3. The second reading frame

If in a sequence of triplets of the form RNY, the reading starts at the second nucleotide N, ignoring the first nucleotide R, due to a slippage, then triplets of the form NYR will be translated. The set of triplets of the form NYR is disjoint from the set RNY. The 16 triplets of the form NYR specify eight amino acids, five of which were also found in the FRF. Then, the triplets of the form NYR give rise to only three new amino acids.
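The effect of a slippage on an RNY message can be illustrated by reading one sequence in its three frames. This is a minimal sketch with hypothetical helper names (`random_rny_sequence`, `read_frame`, `matches`); the random sequence is only test data, not a biological sequence.

```python
import random

def random_rny_sequence(n_codons, seed=0):
    """Concatenation of n_codons random triplets of the form RNY."""
    rng = random.Random(seed)
    return "".join(rng.choice("AG") + rng.choice("CAUG") + rng.choice("CU")
                   for _ in range(n_codons))

def read_frame(seq, offset):
    """Complete triplets read from seq starting at offset 0, 1 or 2."""
    return [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]

def matches(codon, template):
    """True if codon fits a template made of R, Y and N (any nucleotide)."""
    allowed = {"R": "AG", "Y": "CU", "N": "CAUG"}
    return all(base in allowed[t] for base, t in zip(codon, template))

seq = random_rny_sequence(10)
first = read_frame(seq, 0)    # first reading frame
second = read_frame(seq, 1)   # reading starts at the second nucleotide
third = read_frame(seq, 2)    # reading starts at the third nucleotide

print(all(matches(c, "RNY") for c in first))
print(all(matches(c, "NYR") for c in second))
print(all(matches(c, "YRN") for c in third))
```

Any concatenation of RNY triplets behaves this way, because the shifted triplets always pick up the Y of one codon and the R of the next.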

that is. 2b).. Each cube can be derived . Graphical representation of the subsets (a) YYR and (b) RYR. which involves transversions in the first and third nucleotides. In this set. The set of codons of the form NYR is the union of the cubes YYR (Fig. that is. those of the FRF and the SRF. the start codon AUG ensues (Konecny et al. YYR is obtained from YYY by addition of the codon CCA. In (b) not all the amino acids correspond to edges of the cube. RYR is obtained from YYY by the addition of the triplet ACA. 2a) and RYR (Fig. that is. 1995). We say that the reading and translation of sequences of codons of the form NYR defines the second reading frame (SRF) in the RNA World. Note that in the cube YYR the amino acid Leu corresponds to a face and in the cube RYR not every amino acid corresponds to an edge of the associated cube. (c) Second Reading Frame: NYR = YYRURYR. as an extension of the original RNY code. rather there are two amino acids (Met and Ile) which correspond to single vertexes. and Leu correspond to a face (four connected solid edges).Bulletin of Mathematical Biology (2007) 69: 215–243 221 a) b) c) Fig. and their correspondence with their 11 amino acids. rise to only three new amino acids. 2 NYR code. We consider the union of the sets of codons of the form RNY and NYR. Note that in (a). two amino acids correspond to 2 (solid edges) out of the 12 edges of the cube. of the vector e6 which represents a transversion in the third nucleotide. the addition of the vector e2 + e6 .

The Hamming distance between a vertex and its image, under the translation t2, is equal to 1. It means that the set of codons of the form NYR also constitutes a four-dimensional hypercube, which is a coordinated affine subspace, or four-dimensional hyperface of the six-dimensional hypercube of the 64 triplets (Fig. 2c). The hypercube NYR is the image of RNY under the non-singular linear transformation e1 → e5, e2 → e6, e3 → e1, e4 → e2, e5 → e3, e6 → e4, which is defined by an even permutation of the canonical base. This function maps every triplet XYZ onto the triplet YZX. The matrix of this linear transformation is orthogonal with determinant 1, and it can also be interpreted as a double rotation in the six-dimensional Euclidean vector space R^6 (A7).

2.4. The third reading frame

If in a sequence of triplets of the form RNY, the reading starts at the third nucleotide Y, ignoring the first (R) and the second (N) nucleotides, due to a slippage, then triplets of the form YRN will be translated. We say that the reading and translation of sequences of codons of the form YRN defines the third reading frame (TRF) in the RNA World. The set of triplets of the form YRN is disjoint from the sets RNY and NYR. Thirteen out of the 16 triplets of the form YRN code for six new amino acids, without repetitions of any of those found in the FRF or the SRF. The other three codons are the so-called stop codons. Then, the triplets of the form YRN give rise to six new amino acids, which complete a set of 17 out of the 20 primary amino acids. We will consider the union of the sets of codons of the form RNY, NYR and YRN, that is, those associated to the first, second and third reading frames, and their correspondence with 17 amino acids and stop codons, as an extension of the primaeval RNY code.

The set of codons of the form YRN is the union of the cubes YRY (Fig. 3a) and YRR (Fig. 3b). In the YRY cube every amino acid corresponds to an edge of the associated cube, but this is not the case in the YRR cube. YRY is obtained from YYY by addition of the codon CAC, that is, of the vector e4, which represents a transversion in the second nucleotide. YRR is obtained from YYY by the addition of the triplet CAA, that is, the addition of the vector e4 + e6, which involves transversions in the second and third nucleotides. Each cube is obtained from the other by the translation t6, associated to the canonical vector e6, that is, the addition of the codon CCA to every triplet. The Hamming distance between a vertex and its image, under the translation t6, is equal to 1. It means that the set of codons of the form YRN also constitutes a four-dimensional hypercube, which is a coordinated affine subspace, or a four-dimensional hyperface of the six-dimensional hypercube of the 64 triplets (Fig. 3c). The hypercube YRN is the image of NYR under the non-singular linear transformation e1 → e5, e2 → e6, e3 → e1, e4 → e2, e5 → e3, e6 → e4, which is the same basis permutation, or double rotation, which converts RNY into NYR.

In summary, the set of 48 codons of the form RNY, NYR and YRN comprises the Extended RNA code, where 45 of them specify 17 of the primary amino acids, and the other three codons correspond to a termination signal.
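The basis permutation that carries one reading frame into the next can be checked on the binary sextuples. The following sketch assumes the assignment C = 00, A = 01, U = 10, G = 11 and uses illustrative names (`PERM`, `apply_perm`, `rotate`, `sign`); the determinant of a permutation matrix is taken to be the sign of the permutation.

```python
from itertools import product

BITS = {"C": "00", "A": "01", "U": "10", "G": "11"}
BASE = {bits: base for base, bits in BITS.items()}

# Images of the canonical basis vectors: e_i -> e_PERM[i] (1-based indices).
PERM = {1: 5, 2: 6, 3: 1, 4: 2, 5: 3, 6: 4}

def apply_perm(vec):
    """Apply the linear map e_i -> e_PERM[i] to a 6-bit string."""
    out = ["0"] * 6
    for i, bit in enumerate(vec, start=1):
        out[PERM[i] - 1] = bit
    return "".join(out)

def rotate(codon):
    """Image of a triplet under the basis permutation."""
    bits = apply_perm("".join(BITS[b] for b in codon))
    return "".join(BASE[bits[i:i + 2]] for i in (0, 2, 4))

def sign(perm):
    """Sign of a permutation, computed from the lengths of its cycles."""
    seen, sgn = set(), 1
    for start in perm:
        length, j = 0, start
        while j not in seen:
            seen.add(j)
            j = perm[j]
            length += 1
        if length and length % 2 == 0:
            sgn = -sgn
    return sgn

def family(template):
    """All triplets matching a template made of R, Y and N."""
    choices = {"R": "AG", "Y": "CU", "N": "CAUG"}
    return {"".join(t) for t in product(*(choices[s] for s in template))}

print(rotate("AUG"), sign(PERM))
```

The permutation splits into two cycles of length three, (1 5 3) and (2 6 4), which is why it is even and why applying it three times returns every triplet to itself.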

Fig. 3 YRN code. Graphical representation of the subsets (a) YRY and (b) YRR. Note that in (a), four amino acids correspond to 4 (solid edges) out of the 12 edges of the cube. In (b) two amino acids correspond to an edge of the cube (Arg and Gln), Trp corresponds to only one vertex, and the three stop codons belong to this cube connected by two solid edges. (c) Third Reading Frame: YRN = YRY ∪ YRR.

2.5. The codons of the RNA-less World

The remaining 16 triplets belong to the cubes YYY or RRR, that is, those composed of only pyrimidines or only purines, but they do not enter into the composition of the Extended RNA code. For this reason, we will call them the triplets of the RNA-less World. If we consider the reading and translation of sequences of codons of the form YYY or RRR, these triplets code for eight amino acids, five of which are repetitions of those found in the Extended RNA code. Then, the triplets of the form YYY or RRR give rise to only three new amino acids. This addition completes the set of the 20 amino acids. The union of the Extended RNA code and its complementary RNA-less code constitutes the six-dimensional hypercube of the 64 triplets with the 20 amino acids and a termination mark.

The 16 codons of the RNA-less code represent the union of the cubes YYY (Fig. 4a) and RRR (Fig. 4b).

Recall that YYY is a vector subspace of the whole vector space. Each cube can be obtained from the other by the translation t246 associated to the vector e2 + e4 + e6, that is, the addition of the codon AAA to every triplet, which performs a transversion in every nucleotide component. In contrast to the hypercubes of the Extended RNA code, the Hamming distance between a vertex and its image, under the translation t246, is equal to 3.

Fig. 4 The RNA-less code: YYY ∪ RRR. Graphical representation of the subsets (a) YYY and (b) RRR. Note that, in both cubes (a) and (b), the four amino acids also correspond to 4 (solid edges) out of the 12 edges of the cube. (c) RNA-less code.
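The contrast between the translation t246 and the single-coordinate translations can be made concrete with Hamming distances. This is a minimal sketch under the assignment C = 00, A = 01, U = 10, G = 11; the helper names are assumptions introduced for the example.

```python
from itertools import product

BITS = {"C": "00", "A": "01", "U": "10", "G": "11"}
BASE = {bits: base for base, bits in BITS.items()}

def to_bits(codon):
    return "".join(BITS[b] for b in codon)

def add_codons(c1, c2):
    """Sum modulo 2, bit by bit, of two triplets."""
    total = int(to_bits(c1), 2) ^ int(to_bits(c2), 2)
    bits = format(total, "06b")
    return "".join(BASE[bits[i:i + 2]] for i in (0, 2, 4))

def hamming(c1, c2):
    """Number of binary coordinates in which two triplets differ."""
    return sum(a != b for a, b in zip(to_bits(c1), to_bits(c2)))

def family(pattern):
    """All triplets matching a pattern made of R (purine) and Y (pyrimidine)."""
    choices = {"R": "AG", "Y": "CU"}
    return {"".join(t) for t in product(*(choices[p] for p in pattern))}

YYY, RRR = family("YYY"), family("RRR")
rna_less = YYY | RRR

# t246 is the addition of AAA (vector e2 + e4 + e6); t4 is the addition of CAC.
image_of_yyy = {add_codons(c, "AAA") for c in YYY}
distances_t246 = {hamming(c, add_codons(c, "AAA")) for c in rna_less}
distances_t4 = {hamming(c, add_codons(c, "CAC")) for c in family("RYY")}

# The union YYY + RRR is closed under addition, as a vector subspace must be.
closed = all(add_codons(a, b) in rna_less
             for a, b in product(rna_less, repeat=2))
```

Closure under addition, together with the presence of the null vector CCC, is what distinguishes this set of 16 triplets from the three hypercubes of the Extended RNA code, none of which contains CCC.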

It means that the set of codons of the RNA-less code does not lead to a four-dimensional hypercube. The set is a four-dimensional subspace of the six-dimensional hypercube, but it is not, as in the other cases, a four-dimensional hyperface; rather, it is a right hyperprism of height 3 (Fig. 4c and A9.3). Interestingly, the result of the union of the pairwise disjoint sets of the Extended RNA code plus the RNA-less code is the six-dimensional hypercube of the 64 codons of the SGC (Fig. 5).

Fig. 5 A graph representation diagram of the Boole lattice of the 64 triplets of the SGC. The six-dimensional hypercube can be envisaged as composed of: yellow • RNY (primaeval RNA code), orange • NYR (RNA code in the SRF), purple • YRN (RNA code in the TRF), black • YYY and RRR (RNA-less code).

3. Encoding functions

So far, it has been customary to represent the 64 codons of the SGC in various graphical ways (e.g., the six-dimensional hypercube (Jiménez-Montaño et al., 1996), the icosahedron or dodecahedron (White, undated), or the standard table of the genetic code), but without any reference to an encoding function that maps each triplet to its corresponding amino acid or stop signal. Actually, none of these representations is the genetic code.

Herein, we derive encoding functions for the Extended RNA code, the RNA-less code, and the SGC.

3.1. An auxiliary linear function for the different encoding functions

In Table 3, the correspondence of the ordered set of triplets that specify every amino acid is illustrated. The ordering of the set of triplets is the linear order determined by the selected order of the bases (C, A, U, G), so that the triplet at the left-most position is the one with the minimum value. We select the representation of every amino acid by only one triplet, the first. For example, the representative of Ala is the triplet GCC. For the three stop codons we select as representative the triplet UAA.

Table 3 The correspondence between the ordered set of triplets and every amino acid and stop codons

Group     Amino acid      Cube  Symbol  Sextuple  Set of triplets
RF 1st    Threonine       1     Thr     010000    ACC, ACA, ACU, ACG
          Isoleucine      1     Ile     011000    AUC, AUA, AUU
          Alanine         1     Ala     110000    GCC, GCA, GCU, GCG
          Valine          1     Val     111000    GUC, GUA, GUU, GUG
          Asparagine      2     Asn     010100    AAC, AAU
          Serine          2     Ser     011100    AGC, AGU, UCC, UCA, UCU, UCG
          Aspartic acid   2     Asp     110100    GAC, GAU
          Glycine         2     Gly     111100    GGC, GGA, GGU, GGG
RF 2nd    Proline         3     Pro     000000    CCC, CCA, CCU, CCG
          Leucine         3     Leu     001000    CUC, CUA, CUU, CUG, UUA, UUG
          Methionine      4     Met     011011    AUG
RF 3rd    Histidine       5     His     000100    CAC, CAU
          Arginine        5     Arg     001100    CGC, CGA, CGU, CGG, AGA, AGG
          Tyrosine        5     Tyr     100100    UAC, UAU
          Cysteine        5     Cys     101100    UGC, UGU
          Glutamine       6     Gln     000101    CAA, CAG
          Stop            6     Stop    100101    UAA, UAG, UGA
          Tryptophan      6     Trp     101111    UGG
RNA-less  Lysine          7     Lys     010101    AAA, AAG
          Glutamic acid   7     Glu     110101    GAA, GAG
          Phenylalanine   8     Phe     101000    UUC, UUU

Biologically, it is well known that most mutations in the third base do not change the corresponding amino acid (the wobble hypothesis), and therefore we started with a function that maps the third nucleotide of a codon to zero. Hence, we consider the linear transformation, or endomorphism, F (A9.2) of the vector space (Z2)^6,

F: (x, y, z, t, u, v) → (x, y, z, t, 0, 0),

which carries every triplet XYZ over to the triplet XYC, that is, it changes the third nucleotide to the nucleotide C. The endomorphism F belongs to a class of endomorphisms which are called projections and are characterized by the property of idempotency, that is, F ◦ F = F. The kernel of F is the two-dimensional subspace of vectors of the form (0, 0, 0, 0, u, v), that is, the triplets of the form CCZ. Consequently, the kernel of F is a face of the hypercube. The image of F, denoted by Im(F), is the set of triplets of the form XYC, that is, those that end with the nucleotide C. It is a four-dimensional subspace, in fact, a vectorial four-dimensional hypercube or a hyperface of the six-dimensional hypercube. It is the solution subspace of the homogeneous linear system u = 0, v = 0, with unknowns x, y, z, t, u, v, which are the components of a generic vector in (Z2)^6.
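The stated properties of the projection F can be verified exhaustively over the 64 triplets. This is a minimal sketch under the assignment C = 00, A = 01, U = 10, G = 11, in which zeroing the last two coordinates amounts to replacing the third base by C; the names `F`, `add_codons` and `CODONS` are illustrative.

```python
from itertools import product

BITS = {"C": "00", "A": "01", "U": "10", "G": "11"}
BASE = {bits: base for base, bits in BITS.items()}
CODONS = ["".join(t) for t in product("CAUG", repeat=3)]

def add_codons(c1, c2):
    """Sum modulo 2, bit by bit, of two triplets."""
    b1 = "".join(BITS[b] for b in c1)
    b2 = "".join(BITS[b] for b in c2)
    bits = format(int(b1, 2) ^ int(b2, 2), "06b")
    return "".join(BASE[bits[i:i + 2]] for i in (0, 2, 4))

def F(codon):
    """Projection (x, y, z, t, u, v) -> (x, y, z, t, 0, 0): XYZ -> XYC."""
    return codon[:2] + "C"

idempotent = all(F(F(c)) == F(c) for c in CODONS)
linear = all(F(add_codons(a, b)) == add_codons(F(a), F(b))
             for a, b in product(CODONS, repeat=2))
image = sorted({F(c) for c in CODONS})
kernel = sorted(c for c in CODONS if F(c) == "CCC")

print(idempotent, linear, len(image), kernel)
```

The image has 16 elements and the kernel 4, in agreement with the dimensions 4 and 2 given in the text (4 + 2 = 6).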

This set is a four-dimensional coordinated vector subspace, in fact, a four-dimensional hyperface of the whole six-dimensional hypercube generated by the canonical vectors e1, e2, e3, e4. We denote this hypercube as NNC (Fig. 6).

Fig. 6 The hypercube NNC, image of the function F, which projects the whole six-dimensional hypercube onto its subspace of triplets of the form NNC. Note that the blue three-dimensional cube is the image of F1.

3.2. The encoding function for the primaeval RNY code

In the four-dimensional hypercube of codons of the form RNY, representing the FRF of the primaeval RNA code, we define a function F1, which projects the whole hypercube onto the cube RNC. This means that for every edge associated to the same amino acid, its vertex that ends with the nucleotide C is selected. The triplet that is selected as the canonical representative of each amino acid is the one which belongs to the cube RNC. This representative triplet in the cube RNC, if compared with the other triplet on the same edge, is the smaller one, according to the linear order of the set of triplets as derived from the selected ordering in the set {C, A, U, G}. In other words, the function F1 assigns to each of the 16 triplets of the set RNY the same triplet if it ends with C, or else, while preserving the first and second nucleotides, the third one is changed to C if it is U. The function F1 is the restriction, to the hypercube RNY, of the linear transformation F: XYZ → XYC. Note that the cube RNC, image of the function F1 in the set RNY, is the intersection of the four-dimensional hypercubes RNY and NNC.

Actually, we are taking the cube RNC as a canonical representation of the set of eight amino acids specified by the FRF. The explicit definition of the function F1 is:

F1:
ACC, ACU → ACC (for Threonine)
AUC, AUU → AUC (for Isoleucine)
GCC, GCU → GCC (for Alanine)
GUC, GUU → GUC (for Valine)
AAC, AAU → AAC (for Asparagine)
AGC, AGU → AGC (for Serine)
GAC, GAU → GAC (for Aspartic acid)
GGC, GGU → GGC (for Glycine)

3.3. The encoding function for the triplets of the second reading frame

In the hypercube of codons of the form NYR, associated to the SRF of the Extended RNA code, we define a function F2, which assigns to every set of triplets encoding for the same amino acid the smallest one, according to the linear order in the whole set of triplets. The explicit definition of the function F2 is:

F2:
CUA, CUG, UUA, UUG → CUA (for Leucine)
UCA, UCG → UCA (for Serine)
CCA, CCG → CCA (for Proline)
GUA, GUG → GUA (for Valine)
AUG → AUG (for Methionine)
AUA → AUA (for Isoleucine)
GCA, GCG → GCA (for Alanine)
ACA, ACG → ACA (for Threonine)

We note that all of the images of the function F2, with the only exception of AUG, belong to the cube NYA, which is a three-dimensional hyperface of the hypercube NYR. But the function F2 is not exactly a linear projection of NYR onto NYA, as is the case of F1 in the hypercube RNY. Actually, it is so with only three exceptions: UUG, UUA and AUG. The function F2 is related to the linear transformation F, described above, in the following way. In most cases, for 13 out of the 16 triplets, F2 coincides with the restriction to NYR of the affine transformation t6 ◦ F, that is, the composition of F with the translation t6. For UUG and UUA, F2 behaves as the composition t16 ◦ F, whereas for AUG it acts as the composition t56 ◦ F. Here, t16 and t56 are the composed translations t1 ◦ t6 and t5 ◦ t6, respectively.

3.4. The encoding function for the triplets of the third reading frame

In the hypercube of codons of the form YRN, associated to the TRF of the Extended RNA code, we define a function F3, which assigns to every set of triplets encoding for the same amino acid the smallest one, according to the linear order in the whole set of triplets.

The explicit definition of the function F3 is:

F3:
UGC, UGU → UGC (for Cysteine)
CGC, CGU, CGA, CGG → CGC (for Arginine)
UAC, UAU → UAC (for Tyrosine)
CAC, CAU → CAC (for Histidine)
UGG → UGG (for Tryptophan)
UAA, UAG, UGA → UAA (for Stop codons)
CAA, CAG → CAA (for Glutamine)

Note that four out of the seven images of the function F3 belong to the cube YRC. In fact, for 10 triplets, the function F3 coincides with the restriction to YRN of the linear transformation F: XYZ → XYC. For UAG, UAA, CAG and CAA, F3 coincides with the restriction to YRN of the affine transformation t6 ◦ F. For UGG, F3 coincides with the restriction to YRN of the affine transformation t56 ◦ F. For UGA, F3 coincides with the restriction to YRN of the affine transformation t36 ◦ F.

3.5. The encoding function for the triplets of the RNA-less code

In the vector subspace of the codons of the form RRR or YYY, associated to the RNA-less code, we can define a function F4, which assigns to every set of triplets encoding for the same amino acid the smallest one, according to the linear order in the whole set of triplets. The explicit definition of the function F4 is:

F4:
CCC, CCU → CCC (for Proline)
CUC, CUU → CUC (for Leucine)
UCC, UCU → UCC (for Serine)
UUC, UUU → UUC (for Phenylalanine)
AAA, AAG → AAA (for Lysine)
AGA, AGG → AGA (for Arginine)
GAA, GAG → GAA (for Glutamic acid)
GGA, GGG → GGA (for Glycine)

For the triplets CCU, CCC, CUU, CUC, UCU, UCC, UUU and UUC, the function F4 acts as the restriction to the set YYY of the linear transformation F. However, for the triplets AAG, AAA, AGG, AGA, GAG, GAA, GGG and GGA, F4 acts as the restriction to RRR of the affine transformation t6 ◦ F.

A graphical representation of the encoding functions F1, F2, F3, and F4 is shown in Fig. 7. In order to build this plot, we have considered the lexicographic order of the triplets used above and we have assigned to this order the corresponding integer values from 0 to 63. The abscissa represents each of the 64 codons, which are mapped according to the encoding function of each subset, resulting in their respective representative codons. The linearity of the encoding functions is apparent even considering a few departures. However, note that several amino acids have only one representative codon, but some of them appear in two (e.g. Pro appears when F2 and F4 are applied) or even three (e.g. Ser appears when F1, F2 and F4 are applied) of the subsets; hence these amino acids have two or three representative codons. Then a given amino acid will appear at different ordinate values.

one representative codon but some of them appear in two (e.g. Pro appears when F2 and F4 are applied) or even three (e.g. Ser appears when F1, F2 and F4 are applied) of the subsets, hence these codons have two or three representative codons. Then a given amino acid will appear at different ordinate values.

Fig. 7 Local encoding functions F1, F2, F3 and F4. A graphical representation of the local encoding functions F1, F2, F3 and F4 (abscissa: codon in lexicographic order; ordinate: representative codon).

3.6. The general encoding function for amino acids and stop codons in the hypercube of 64 triplets

As it is well known, 61 out of the 64 triplets code for the 20 primary amino acids that are the building blocks of proteins. We have defined in the set of 64 triplets a structure of a vector space over the binary field Z2 = {0, 1}, as well as a combinatorial geometry, associated to the vectorial structure, also called six-dimensional hypercube. It is done by means of the assignment C ↔ 00, U ↔ 10, A ↔ 01, G ↔ 11, which leads to the representation of each triplet as a sextuple of zeros and ones, that is, as a vector of the six-dimensional vector space (Z2)^6. This correspondence involves the addition operation and the vectorial algebraic structure in the set of triplets. Herein, we derive an algebraic function, which assigns to every triplet its associated amino acid or its termination mark. In Table 3, the triplets

which code for each amino acid and the stop signal are listed according to their lexicographic order. The first triplet in each row, the one which is in the left-most position (marked in bold characters), is taken as the canonical representative of the corresponding amino acid. We define the encoding function as that which assigns to every triplet the left-most triplet in every row. Thus, it turns out that the endomorphism F coincides, in most of the cases, with the desired encoding function denoted by E. In fact, it coincides for 45 (grey characters) out of the 64 triplets. The other 19 triplets (black characters) are those for which the encoding function E takes the form of an affine transformation, that is, the composition of F with a suitable translation.

The 45 triplets encode for 15 amino acids; their canonical codons are of the type NNC. Eight out of the 19 codons specify the remaining five amino acids and their canonical codons are of the type NNA or NNG. Three out of the 19 codons specify the stop signal and the remaining eight codons correspond to the three amino acids encoded by six codons whose canonical representatives are of the type NNC. Note that for 8 out of the 20 amino acids there are special subsets of the sets of their associated triplets for which F alone is not the encoding function E, as shown in Table 4. There are five amino acids and the stop signal whose canonical codons end in A or G and therefore they require specific translations. There are three amino acids encoded by six codons, some of which are mapped directly by F but others require a translation. The latter is mainly due to a change in the first nucleotide of their codons. For the stop signal, there are also two special sets. In Table 4, we show the encoding functions for every special set.

Table 4 Triplets for which the function F requires an affine transformation

Amino acid   Canonical triplet   Special set          Encoding function
Arg          CGC                 AGA, AGG             t2 ◦ F
Gln          CAA                 CAA, CAG             t6 ◦ F
Glu          GAA                 GAA, GAG             t6 ◦ F
Leu          CUC                 UUA, UUG             t1 ◦ F
Lys          AAA                 AAA, AAG             t6 ◦ F
Met          AUG                 AUG                  t56 ◦ F
Ser          AGC                 UCC, UCU, UCA, UCG   t1234 ◦ F
Trp          UGG                 UGG                  t56 ◦ F
Stop         UAA                 UAA, UAG             t6 ◦ F
Stop         UAA                 UGA                  t36 ◦ F

In order to summarize the function E, a graphical representation built in the same way as Fig. 7 is shown in Fig. 8. In the abscissa, values from 0 to 63 were assigned to each codon according to the selected lexicographic order, and in the ordinate the value corresponding to the canonical codon for each amino acid or the termination mark (image of the function E) is given. The 45 codons that specify 15 amino acids which are directly mapped by the linear function F are represented by circles. The remaining 19 codons, represented by crosses, require a particular translation. In this encoding function a single canonical codon corresponds to each amino acid in contrast to the above-mentioned four encoding functions. The overall shape is still linear and we remark that this function represents the actual SGC.
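As a minimal sketch, assuming only the bit assignment C ↔ 00, U ↔ 10, A ↔ 01, G ↔ 11 and the special sets of Table 4, the function E can be written as F followed by an optional translation. The helper names (`to_vector`, `translate`, `SPECIAL`, `E`) are illustrative and do not come from the article; the standard genetic code table used for the consistency check is the usual textbook table, brought in here as an outside assumption.

```python
# Sketch of the general encoding function E = (translation) o F.
BITS = {"C": (0, 0), "U": (1, 0), "A": (0, 1), "G": (1, 1)}
BASE = {bits: base for base, bits in BITS.items()}

def to_vector(codon):
    """Represent a codon as a 6-tuple over Z2."""
    return tuple(b for base in codon for b in BITS[base])

def to_codon(vec):
    """Inverse of to_vector."""
    return "".join(BASE[vec[i:i + 2]] for i in (0, 2, 4))

def F(codon):
    """Linear map F: XYZ -> XYC (the last two coordinates become zero)."""
    return codon[:2] + "C"

def translate(codon, indices):
    """Translation t_{ij...}: flip the listed coordinates (1-based)."""
    vec = list(to_vector(codon))
    for i in indices:
        vec[i - 1] ^= 1
    return to_codon(tuple(vec))

# Special sets and their translations, transcribed from Table 4.
SPECIAL = {
    ("AGA", "AGG"): (2,),
    ("CAA", "CAG"): (6,),
    ("GAA", "GAG"): (6,),
    ("UUA", "UUG"): (1,),
    ("AAA", "AAG"): (6,),
    ("AUG",): (5, 6),
    ("UCC", "UCU", "UCA", "UCG"): (1, 2, 3, 4),
    ("UGG",): (5, 6),
    ("UAA", "UAG"): (6,),
    ("UGA",): (3, 6),
}
TRANSLATION = {c: idx for codons, idx in SPECIAL.items() for c in codons}

def E(codon):
    """F alone for 45 codons, a translation composed with F for the other 19."""
    return translate(F(codon), TRANSLATION.get(codon, ()))

# Standard genetic code in the usual U, C, A, G table order ('*' marks stop).
ORDER = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
SGC = {a + b + c: AMINO[16 * i + 4 * j + k]
       for i, a in enumerate(ORDER)
       for j, b in enumerate(ORDER)
       for k, c in enumerate(ORDER)}

# A codon and its image under E should carry the same meaning, and there
# should be one canonical codon per amino acid plus one for the stop signal.
consistent = all(SGC[E(c)] == SGC[c] for c in SGC)
canonical = {E(c) for c in SGC}
print(consistent, len(canonical))
```

Under these assumptions the check confirms that every codon is sent to a codon with the same meaning, which is one way to read the statement that E represents the actual SGC.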

Fig. 8 Plot of the general encoding function E in its two main forms (codons mapped by F; codons mapped by F and a translation). The graphical representation of the general encoding function E of the SGC (abscissa: codon in lexicographic order; ordinate: canonical codon).

4. The graph representation diagram of amino acids and the stop signal

Note from Table 3 that the vertexes of the four-dimensional hypercube NNC (Fig. 6) correspond to triplets that code for 15 out of the 20 amino acids. Amongst the 16 triplets which are labels of the vertexes, there is only one, the triplet UCC, which is not the canonical representative of any amino acid. It codes for Ser, but this amino acid is already represented by the triplet AGC, which is the first in its list. For this reason, we deleted the vertex with label UCC and its four adjacent edges in the last graph diagram. The other five amino acids, which are not represented in the hypercube NNC, are Gln, Glu, Lys, Met, and Trp, whose canonical triplets are, respectively, CAA, GAA, AAA, AUG, and UGG (Table 3). The first three, CAA, GAA, and AAA, belong to the hypercube NNA, which is the image of NNC under the translation t6, that is, the addition of the triplet CCA. These three triplets are the unitary images of the faces CAN, GAN and AAN, respectively, under the affine transformation t6 ◦ F. The other two triplets, AUG and UGG, belong to the hypercube NNG, which is the image of NNC under the translation t56, that is, the addition of the triplet CCG. As before, these two triplets are the unitary images of the faces AUN and UGN, respectively, under the affine transformation t56 ◦ F. In Fig. 9, we show a graph representation diagram of

the 20 amino acids, with an additional vertex, that represents the class of the three stop codons. It is a phenotypic graphical image of the hypercube NNC, without the vertex UCC, nor its adjacent edges, with the addition of six external vertexes, five of which correspond to amino acids, whose canonical representative triplets do not belong to the hypercube NNC, and another vertex, which represents the stop signal. We observe that the vertex which represents the stop signal is adjacent to the amino acids Glu, Gln and Tyr, and that the distance between the amino acids Met and Trp is equal to 3.

Fig. 9 The phenotypic graph of the 20 amino acids and the stop signal. A graph representation diagram of the 20 amino acids and the stop signal.

5. Discussion

In this work, we propose an Extended RNA code as derived from the RNA code as originally proposed by Eigen (1977) and later used by several authors (e.g., Konecny et al., 1995). The canonical RNA code consists of only RNY codons, which comprise 16 out of the 64 possible triplets and which codify for eight amino acids. By allowing readings starting at the SRF and TRF positions of the RNY code, codons of the type NYR and YRN appear. Each of these types corresponds to one set of 16 elements. These three sets are disjoint when they are pairwise compared. Altogether these three sets comprise 45 triplets which code for 17 out of the

20 amino acids, and they also include the three stop codons. This is what we call the Extended RNA code. Given the RNA code, by allowing reading slippages in the other two reading frames, 32 new triplets (which codify for nine new amino acids) and three stop triplets (which specify a stop signal) emerge.

The RNY code can be graphically represented as a four-dimensional hypercube that results from the union of the disjoint sets RYY and RRY, each of them being a three-dimensional cube. The NYR and YRN sets are also represented by four-dimensional hypercubes that result from the union of the disjoint sets YYR and RYR, and the union of the disjoint sets YRY and YRR, respectively, each of them being a three-dimensional cube. Thus, each four-dimensional hypercube is isomorphic and isometric (A6.2) to each other as affine subspace of the whole six-dimensional hypercube, and they are pairwise disjoint. In the RNY code every amino acid is coded by two neighbor triplets located in an edge whereas in the NYR and YRN codes there are departures from this regularity: some amino acids are now encoded by four triplets and others are encoded by only one.

The remaining 16 triplets, which we call RNA-less code or complementary code of the Extended RNA code, cannot be derived by frame-shift readings but rather by other types of mutations such as insertions, deletions and substitutions. These 16 triplets constitute two disjoint sets, YYY and RRR, each of them being a three-dimensional cube. The union of the cubes YYY and RRR produces a four-dimensional vector subspace which is not a hyperface of the six-dimensional hypercube. However, this subspace is isomorphic as affine subspace to the three hypercubes of the Extended RNA code but it is not isometric to any of them. As a consequence, the union of the three hypercubes of the Extended RNA code and the vector subspace of the RNA-less code makes up the whole six-dimensional hypercube of 64 triplets. Conversely, we can decompose the six-dimensional hypercube as consisting of the patterns RNY (primaeval RNA code), plus NYR, and YRN (Extended RNA code), and YYY and RRR (RNA-less code) (Fig. 5). These results suggest a plausible evolutionary path from which the primaeval RNA code could have originated the six-dimensional hypercube, via the Extended RNA code and the addition of the RNA-less code.

Interestingly, it has been found that the order of triplet frequencies RNY > RNR > YNY > YNR is a general attribute of coding sequences (Eigen et al., 1985) and they proposed that this order may reflect the evolution of the genetic code from an RNY structure, providing a comma-free readout via wobble intermediates to the present form. Alternatively, the steps RNY plus RNR and YNY could also form another extended RNA code. Our present results do not offer any clue about a chronological order in which the different encoding subsets could have led to the current SGC. We can hypothesize that the point in which genetically encoded protein translation started to evolve corresponds most likely to a breakthrough organism obeying an Extended RNA code after the RNA World and prior to the cenancestor, since in fact, frame-reading mistranslations conferred obviously evolutionary advantages. Natural selection does not generate novelties from scratch. It works on what already exists, either transforming a system to give it new functions or combining several systems to produce a more elaborate one. It innovates with what it has at hand and this process has been recognized as the evolutionary tinker (Jacob, 1977). Notably, some amino

acids of the RNY are also coded by triplets that appear in the SRF, and there are new triplets which altogether specify nine new amino acids. Konecny et al. (1995) first noted that by allowing reading slippages there were two hidden messages in the RNY code which are AUG and CAU which are found in the SRF and TRF, respectively. It is also worth mentioning that in present-day DNA viruses such as vaccinia virus, genes are transcribed in a frame-shift fashion (Keck et al., 1990). In the vaccinia virus, a single mRNA can be translated in three different reading frames which encode three trans-activators that are required for late transcription (Keck et al., 1990). In other words, a single messenger RNA (mRNA) is able to encode three different proteins because messages contain three distinct putative translation initiation sites. In the context of the frozen concept, we can say that considering the symmetries of both the Extended RNA code and the RNA-less code, the primaeval RNY code was already frozen and that it evolved like a replicating and growing icicle.

To our knowledge, this is the first time in which the SGC is expressed as a mathematical function which maps each triplet onto its corresponding amino acid or stop signal. Given the uneven degeneracy of the genetic code it is appealing that the general encoding function is almost linear. The phenotypic graph of amino acids is also a novel finding whose image resembles, in part, any of the three four-dimensional hypercubes of the Extended RNA code. In the search of producing synthetic life in the laboratory (Hutchinson III et al., 1999; Szathmáry, 2005) our encoding functions may be used as a guide to understand the difference between a tinkered-together genome and an engineered one.

Appendix A. Mathematical and biological background

We assume that the reader is familiar with the concept of a vector space over a scalar field K, also called a K-vector space, and with that of vector subspace of a vector space, as well as with the concept of a linear transformation or linear endomorphism of a vector space. We recall that every vector space is an abelian group for the addition operation.

A.1. Concept of affine space and its dimension

Definition A.1.1. For a vector space V we call affine subspace of V, or linear variety contained in V, every subset of the form v + W, where W is a vector subspace and v is a fixed vector. The subspace W is called the associated vector subspace of the linear variety v + W.

Definition A.1.2. The dimension of a linear variety is defined as the dimension of its associated vector subspace.

Remarks. The set v + W is also called a coset or adjoint class of the subgroup W, and it contains the element v. The associated vector subspace W, for a linear variety, is unique, but the vector v may be any of the elements of the set v + W. When two vectors u and v define the same linear variety, it means that their

difference vector u − v belongs to W. The linear varieties or affine subspaces, for a fixed linear subspace W, are the equivalence classes of the equivalence relation R_W, defined as the set of all the ordered pairs (u, v) ∈ V × V, such that u − v ∈ W. If we take v as the null vector 0, or as some vector which belongs to W, the identity v + W = W is obtained. Consequently, all the vector subspaces are also affine subspaces, in fact, the affine subspaces that contain the null vector.

A.2. The associated geometry to a vector space

Definition A.2.1. We call the associated geometry of a K-vector space V the set or family of all the affine subspaces or linear varieties contained in V.

A.2.2. Concepts of point, line, and plane in a geometry

Definition A.2.2. The points, lines and planes in a geometry are, respectively, the affine subspaces of dimension 0, 1, and 2.

Thus, all the unitary sets are affine subspaces of dimension zero, also called points of the associated geometry. The affine subspaces of dimension 1 are called the lines of the geometry and those of dimension 2 are called the planes of the geometry. The other affine subspaces, the k-dimensional for 2 < k < n, if n is the dimension of V, are generically called hyperplanes of the geometry. The only affine subspace of dimension n is the whole space V.

A.2.3. The concept of parallelism

Definition A.2.3. We say that two affine subspaces v + W and u + U are parallel if W is a subspace of U, or U is a subspace of W. In particular if W = U, that is, if the affine subspaces have the same associated subspaces, having the same dimension, they are parallel. According to this definition we can talk of parallel lines, parallel planes, a line parallel to a plane, parallel cubes, etc.

A.3. The concept of n-dimensional hypercube

Let us consider a vector space of the form V = K^n, where K is a field. In this case, the vectors are represented as n-tuples (x1, x2, ..., xn), where the xi are elements of K. The elements xi, for i ∈ {1, 2, ..., n}, are called the coordinates of the vector. The canonical base of K^n is the ordered set of vectors e1 = (1, 0, ..., 0), e2 = (0, 1, 0, ..., 0), ..., en = (0, 0, ..., 1). It is easy to show that these vectors are linearly independent and that they generate the space, that is, they form a base of the vector space. As it is well known, every n-dimensional K-vector space is isomorphic to the space K^n, K being a field. The isomorphism may be defined by the matching of any of the bases of V with the canonical base of K^n. Then we say that the space has been coordinatized. From standard courses of linear algebra, it is known that every affine subspace v + W, in an n-dimensional vector space V, provided with a coordinate system, is the solution set of a linear system

of n equations with n unknowns, being the linear subspace W the solution set of the associated homogeneous system, i.e., the system with the same left parts and the right members equal to zero. The dimension of W is equal to n − r, where r denotes the rank of the associated matrix.

Definition A.3.1. In the particular case where K is the binary field Z2 = {0, 1}, of two elements, the vector space K^n is called n-dimensional hypercube.

The name hypercube comes from analogy with the three-dimensional case, where the eight triplets of zeros and ones represent the points which are the vertexes of a cube or regular hexahedron, one of whose vertexes is the null vector 0 = (0, 0, 0), and the vertexes which are adjacent to it coincide with the extremes of the vectors e1, e2 and e3 of the canonical base. The n-dimensional hypercube (Z2)^n is a regular polytope (Coxeter, 1973). It is also called an orthotope because the angle of two adjacent edges is a right angle, and all the edges have length 1. In Fig. 5 of the main text, the six-dimensional hypercube of the 64 codons is illustrated.

A.4. The concept of coordinated affine subspaces

Definition A.4.1. In a vector space K^n we will call coordinated affine subspace every affine subspace whose associated subspace is generated by one or some vectors of the canonical base. In fact, a coordinated affine subspace is the solution set of a system whose associated matrix is diagonal. The vectorial coordinated lines are usually called coordinated axes and the vectorial coordinated planes simply coordinated planes.

Definition A.4.2. A coordinated affine subspace of the hypercube is called a hyperface of the hypercube (Z2)^6. The zero-dimensional hyperfaces are the vertexes of the hypercube whereas the one-dimensional hyperfaces are the edges of it. The two-dimensional hyperfaces are simply the faces of the hypercube.

A.5. The concept of norm or length of a vector in the spaces (Zp)^n

Let us consider a vector space of the form (Zp)^n, being p a prime number, where (Zp) denotes the Galois Field of p elements, which are remainders of the integer division by p. The assignment to every integer of its remainder in the integer division by p is the so-called reduction modulo p.

Definition A.5.1. We will call norm or length of a vector v = (x1, x2, ..., xn) the ordinary sum $|v| = x_1^2 + x_2^2 + \cdots + x_n^2$, computed over the non-negative integers, which is always a nonnegative integer. In the case of the binary field (Z2), the norm is simply the number of ones that appear in the n-tuple. In this case, it is also called the weight of the vector.
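As a small illustration of Definition A.5.1 in the binary case, the sketch below computes the norm (weight) of codons represented in (Z2)^6. It assumes the assignment C = 00, U = 10, A = 01, G = 11 used in the main text; the names `to_vector` and `norm` are hypothetical helpers, not part of the article.

```python
from collections import Counter
from itertools import product

# Bit assignment taken from the main text.
BITS = {"C": (0, 0), "U": (1, 0), "A": (0, 1), "G": (1, 1)}

def to_vector(codon):
    """Represent a codon as a 6-tuple of zeros and ones."""
    return tuple(b for base in codon for b in BITS[base])

def norm(vec):
    """Sum of squared coordinates, computed over the ordinary integers."""
    return sum(x * x for x in vec)

# Over Z2 the norm is just the number of ones (the weight).
for codon in ("CCC", "CUA", "AUG", "GGG"):
    print(codon, to_vector(codon), norm(to_vector(codon)))

# Number of codons of each weight in the six-dimensional hypercube.
weights = Counter(norm(to_vector("".join(c))) for c in product("CUAG", repeat=3))
print(sorted(weights.items()))
```

Because the codon-to-vector assignment is a bijection onto the 64 vertexes, the weight counts follow the binomial coefficients of 6.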

In the more general case of a vector space (Zp)^n, we can define the concept of scalar or inner product in the following way:

Definition A.5.2. We call scalar or inner product of the vectors v = (x1, x2, ..., xn) and w = (y1, y2, ..., yn) the integer $\langle v, w \rangle = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n$.

Note that the norm or length of a vector coincides with the scalar product of the vector with itself. It can be proved that the inner product and the length or norm are related in the following way. For every pair (u, v) of arbitrary vectors u and v, the following equality holds: $|u + v| = |u| + |v| + 2\langle u, v \rangle$.

A.5.3. The concept of distance in the spaces (Zp)^n

Definition A.5.3. We define the distance between the vectors v and w as the norm of the difference vector v − w, being the subtraction the inverse operation of the addition in the vector space (Zp)^n.

In vector spaces of the type (Z2)^n, the so-called hypercubes, this distance is the so-called Hamming distance, widely used in coding theory and in cryptology. Actually, the Hamming distance is the number of places in which both vectors differ, and geometrically, visualizing the space as a graph, it means the minimal number of edges between the two vertexes represented by the n-tuples. In the case of the hypercube (Z2)^n, the affine coordinated subspaces are called hyperfaces of the hypercube. In general, the affine coordinated lines are usually called edges, and the affine coordinated planes are called faces of the hypercube.

A.6. Transformations that preserve the distance

Definition A.6.1. A transformation f of the vector space V = (Z2)^n in itself is called isometric if it preserves the distance between every two elements, i.e., if $d(f(u), f(v)) = d(u, v)$ for every pair (u, v) ∈ V × V.

Definition A.6.2. An isometric linear transformation will be called a linear isometry of the vector space.

It is easy to show that a linear isometry preserves the length or norm of every element of the space.

A.7. The concept of permutation transform or multiple rotation in the vector space V = (Zp)^n

As the only vectors of length 1 in the vector space are the canonical vectors e1, e2, ..., en, every linear isometry performs a permutation on the set {e1, e2, ..., en}. For this reason, the linear isometries will also be called permutation transforms. The matrix of a linear isometry with respect to the canonical base is a matrix obtained from the identity matrix by a permutation of its columns. This kind of matrices, called permutation matrices, have the property that the inverse of any of them is equal to its transpose.
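The following sketch, written under the assumption n = 6 and with an arbitrarily chosen permutation, checks two of the statements above on the whole hypercube: the distance defined as the norm of the difference vector agrees with the Hamming distance, and a coordinate permutation is an isometry. All function names and the constant `PERM` are illustrative.

```python
from itertools import product

def add(u, v):
    """Vector addition in (Z2)^n."""
    return tuple((a + b) % 2 for a, b in zip(u, v))

def norm(v):
    """Sum of squared coordinates over the ordinary integers."""
    return sum(x * x for x in v)

def distance(u, v):
    """Norm of the difference vector; over Z2, subtraction equals addition."""
    return norm(add(u, v))

def hamming(u, v):
    """Number of places in which both vectors differ."""
    return sum(a != b for a, b in zip(u, v))

def permute(v, perm):
    """Coordinate permutation: a linear map given by a permutation matrix."""
    return tuple(v[p] for p in perm)

CUBE = list(product((0, 1), repeat=6))
PERM = (2, 0, 1, 5, 3, 4)  # arbitrary illustrative permutation

same_metric = all(distance(u, v) == hamming(u, v) for u in CUBE for v in CUBE)
isometry = all(
    distance(permute(u, PERM), permute(v, PERM)) == distance(u, v)
    for u in CUBE for v in CUBE
)
print(same_metric, isometry)
```

An exhaustive check over the 64 vertexes is cheap here, which is why the sketch tests every pair rather than sampling.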

As every permutation is a composition of pairwise disjoint cycles, every linear isometry can be interpreted as a composition of local rotations, that is, a multiple rotation in the vector space. As the set (Zp)^n can be viewed as a subset of the R-vector space R^n, being R the field of real numbers, a multiple rotation in (Zp)^n can also be interpreted as a multiple rotation in R^n.

Definition A.7.1. The linear isometries of the vector space (Zp)^n will also be called permutation transforms or multiple rotations.

A.8. The concept of orthogonality in a vector space (Zp)^n

Definition A.8.1. We say that two vectors v and w of the space (Zp)^n are orthogonal or perpendicular, one to each other, if their scalar product is equal to zero.

According to this definition the vectors of the canonical base, in any vector space of the form (Zp)^n, are orthogonal, and they are, additionally, of length 1. This kind of base is usually called orthonormal base. It can be proved that a linear isometry preserves not only the distance between vectors, but also the scalar product of any two vectors of the vector space. Hence, a linear isometry preserves the orthogonality of any pair of orthogonal vectors of the vector space.

A.9. The concept of translation and affine transformations in a vector space

Definition A.9.1. Given a vector v of a vector space V, we call a translation associated to the vector v the bijective function tv : u → u + v, from V to V.

For translations tv and tw, associated to the vectors v and w, the composition tv ◦ tw is the translation tv+w, associated to the sum v + w of both vectors. It means that the set T of all the translations of the vector space V is a group, isomorphic to the additive group of V, being the isomorphism, one to one, from V onto T, given by the function: v → tv. T is, in fact, an abelian group which is a subgroup of the symmetric group S(V) of all the bijections of the set V with itself.

Special notation. In the hypercube (Z2)^n, we will denote as ti the translation associated to the vector ei of the canonical base, as tij the translation associated to the vector ei + ej, as tijk the translation associated to the vector ei + ej + ek, and so on, where i, j, k ∈ {1, 2, ..., n}.

Definition A.9.2. We call affine transformation every composed function of the form tv ◦ F, where F is a linear transformation or linear endomorphism of the vector space and tv is the associated translation of a fixed vector v.

It is clear that an affine transformation tv ◦ F is bijective if, and only if, its linear component F is also bijective. It is easy to prove that every affine transformation carries a linear variety v + W onto another linear variety, whose dimension is less than or equal to that of v + W. If the affine transformation is bijective, and only in this case, the dimension is preserved.
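A short sketch of these notions in (Z2)^6 follows. It assumes the endomorphism F of the main text (XYZ → XYC, i.e. the last two coordinates set to zero) as the linear component; the helpers `e`, `t` and `add` are hypothetical names that mirror the special notation ti, tij.

```python
from itertools import product

N = 6
CUBE = list(product((0, 1), repeat=N))

def add(u, v):
    """Vector addition in (Z2)^6."""
    return tuple((a + b) % 2 for a, b in zip(u, v))

def e(*indices):
    """Sum of canonical vectors with 1-based indices, e.g. e(5, 6) = e5 + e6."""
    return tuple(1 if i + 1 in indices else 0 for i in range(N))

def t(*indices):
    """Translation t_{ij...}: u -> u + (e_i + e_j + ...)."""
    v = e(*indices)
    return lambda u: add(u, v)

def F(u):
    """Linear endomorphism XYZ -> XYC: zero the last two coordinates."""
    return u[:4] + (0, 0)

# t_v o t_w = t_{v+w}: here t3 o t6 = t36.
composition_ok = all(t(3)(t(6)(u)) == t(3, 6)(u) for u in CUBE)

# t6 o F is not bijective because F is not: the six-dimensional hypercube
# is carried onto a four-dimensional hyperface with 16 points.
image = {t(6)(F(u)) for u in CUBE}
print(composition_ok, len(image))
```

The image computed at the end is the set of vectors whose last two coordinates are (0, 1), which illustrates how a non-bijective affine transformation lowers the dimension.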

If v + W is a k-dimensional hyperface and u is any vector of the hypercube (Z2)^n, then the set u + v + W is also a k-dimensional hyperface. It can be proved that the union of the k-dimensional hyperfaces v + W and ei + v + W, when ei does not belong to W, is a (k + 1)-dimensional hyperface.

Definition A.9.3. The union of the hyperfaces v + W and u + v + W, when u is orthogonal to W and has length h greater than 1, is called a right hyperprism of height h.

A right hyperprism is obtained from a hyperface by the adjunction of a translation of it, to a distance h greater than 1. If the translation is to a distance equal to 1, a (k + 1)-dimensional hyperface is obtained. A right hyperprism of height h, with h greater than 1, is an affine subspace, but it is not a hyperface of the hypercube.

A.10. Some biological concepts and their mathematical representation

A.10.1. Mutations

Definition A.10.1. The substitution in a triplet of one or more of its three nucleotide bases within the set {C, U, A, G} is called a mutation. We say that the mutation is simple if the substitution involves only one base of the triplet.

In our work, we have assigned to every nucleotide a numerical pair in the following way: C = 00, U = 10, A = 01, and G = 11. It follows that every triplet XYZ is represented as a sextuple (x, y, z, t, u, v) of zeros and ones. According to our representation of triplets as elements of the vector space (Z2)^6, every mutation may be represented by means of a translation tv, associated to some vector v. For instance, if the triplet CUA is changed for the triplet CUG, that is, the nucleotide A is replaced by the nucleotide G, it means that the vector (0, 0, 1, 0, 0, 1) is changed to the vector (0, 0, 1, 0, 1, 1) and this transformation may be represented by the addition of the vector e5 = (0, 0, 0, 0, 1, 0), that is, by the translation t5, associated to it. In this particular case the mutation is simple, since it consists in the change of only one base.

A.10.2. Transitions and transversions

The nucleotides A and G are called purines, and the nucleotides C and U are called pyrimidines. This terminology is due to their chemical composition.

Definition A.10.2. A codon mutation is called a transition when it changes a base to a base of the same class, that is, purine to purine or pyrimidine to pyrimidine, and it is called a transversion when the change is from one class to another, that is, purine to pyrimidine or pyrimidine to purine.

For instance, if the triplet CUA changes to the triplet CUG, by substitution of the base A by the base G, it is a transition, since A and G are both purines. On the other hand, if CUA is changed to CUU, then the purine A is changed to the pyrimidine U, that is, it is a transversion.
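The example above can be reproduced with a few lines of code. The sketch assumes the assignment C = 00, U = 10, A = 01, G = 11 and uses illustrative helper names (`mutation_vector`, `classify`); it also cross-checks the bit-level rule against the purine/pyrimidine classes.

```python
# Bit assignment and purine class taken from the text.
BITS = {"C": (0, 0), "U": (1, 0), "A": (0, 1), "G": (1, 1)}
PURINES = {"A", "G"}

def to_vector(codon):
    """Represent a codon as a 6-tuple of zeros and ones."""
    return tuple(b for base in codon for b in BITS[base])

def mutation_vector(source, target):
    """Vector v such that the translation t_v carries source onto target."""
    return tuple(a ^ b for a, b in zip(to_vector(source), to_vector(target)))

def classify(source, target):
    """Label each changed codon position as a transition or a transversion."""
    v = mutation_vector(source, target)
    labels = []
    for i in (0, 2, 4):
        pair = v[i:i + 2]
        if pair == (1, 0):        # only the odd coordinate of the pair flips
            labels.append("transition")
        elif pair != (0, 0):      # the even coordinate of the pair flips
            labels.append("transversion")
    return labels

print(mutation_vector("CUA", "CUG"), classify("CUA", "CUG"))
print(mutation_vector("CUA", "CUU"), classify("CUA", "CUU"))

# Cross-check the bit rule against the chemical classes for every base pair.
agrees = all(
    classify("CC" + a, "CC" + b)
    == ["transition" if (a in PURINES) == (b in PURINES) else "transversion"]
    for a in BITS for b in BITS if a != b
)
print(agrees)
```

The design choice here is to read the class of a substitution directly from the translation vector, so that no lookup of the bases themselves is needed once the codons are coordinatized.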

A.11. Algebraic representations of transitions and transversions

Notice that the basic transitions in the hypercube (Z2)^6 are represented by the translations t1, t3 and t5, which have odd subindexes, while the basic transversions are represented by the translations t2, t4, and t6 of even subindexes. The basic transitions, represented by the translations t1, t3, and t5, that is, those associated to the canonical vectors e1, e3 and e5, are obtained, respectively, as additions of the triplets UCC, CUC and CCU, whereas the basic transversions represented by the translations t2, t4, and t6 are obtained by additions of the triplets ACC, CAC and CCA (see Table 2), according to the addition operation illustrated in Table 2.

The eight transitions are represented by the translations t0, t1, t3, t5, t13, t15, t35 and t135, being t0 the identity function. The set of mutations which are transitions is closed under the composition, and the identity function may be considered as a transition. Hence, from the algebraic point of view, the set of all the transitions is a group of order 8, subgroup of the group T of 64 possible mutations. As T is an abelian group, the subgroup of pure transitions is a normal subgroup of T. All the other mutations are transversions or compositions of transversions with transitions.

A.12. Complementary bases

For biological reasons, the pyrimidine C and the purine G are considered complementary bases, one to each other, and are represented by the numerical pairs 00 and 11, respectively. They also form a pair in the double helix structure. Analogously, the purine A and the pyrimidine U are considered to be complement of each other. For this reason, they have been represented by complementary numerical pairs 10 and 01. When U is changed by T (thymine), the base A is paired with T in the double helix structure of DNA. According to our numerical representation, the Hamming distance between a base and its complementary is always equal to 2, for they differ in two of their components.

In the vector space of the 64 triplets we can define a bijective function, which assigns to every triplet XYZ the triplet X′Y′Z′, where the components X′, Y′, and Z′ are the complements of X, Y, and Z, respectively. This triplet is usually called the complementary codon or the anticodon of XYZ. The mentioned function is the translation tM, defined by the vector M, that is, the vector M = (1, 1, 1, 1, 1, 1), which is the sum e1 + e2 + e3 + e4 + e5 + e6 of the six vectors of the canonical base. It means the addition of the triplet GGG to XYZ, that is, the addition of M to the vector (x, y, z, t, u, v) associated to the triplet. In the Boolean structure of the hypercube (Z2)^6 this transformation is the so-called Boolean negation, and it consists in the substitution of every zero by ones and every one by zeros. The sum of a triplet with its complementary gives, as result, the triplet GGG. The translation tM performs a transversion in each of the three components of the triplet. Notice that the Hamming distance between a codon and its complementary is always equal to 6, that is, the maximum of all the possible distances in the six-dimensional hypercube.
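As a hedged illustration of the two constructions in this part of the appendix, the sketch below builds the complement as the translation by M and enumerates the vectors of the pure transitions. It assumes the assignment C = 00, U = 10, A = 01, G = 11; the names `complement` and `TRANSITIONS` are introduced only for this example.

```python
from itertools import product

# Bit assignment taken from the text.
BITS = {"C": (0, 0), "U": (1, 0), "A": (0, 1), "G": (1, 1)}
BASE = {bits: base for base, bits in BITS.items()}

def to_vector(codon):
    """Represent a codon as a 6-tuple of zeros and ones."""
    return tuple(b for base in codon for b in BITS[base])

def to_codon(vec):
    """Inverse of to_vector."""
    return "".join(BASE[vec[i:i + 2]] for i in (0, 2, 4))

def add(u, v):
    """Vector addition in (Z2)^6."""
    return tuple((a + b) % 2 for a, b in zip(u, v))

M = (1, 1, 1, 1, 1, 1)  # vector of the triplet GGG

def complement(codon):
    """Translation t_M: Boolean negation of every coordinate."""
    return to_codon(add(to_vector(codon), M))

# Pure transitions: vectors whose coordinates 2, 4 and 6 are all zero,
# i.e. sums of e1, e3 and e5 only.
TRANSITIONS = {v for v in product((0, 1), repeat=6)
               if v[1] == v[3] == v[5] == 0}
closed = all(add(a, b) in TRANSITIONS for a in TRANSITIONS for b in TRANSITIONS)

print(complement("AUG"), len(TRANSITIONS), closed)
```

Closure under addition, together with the presence of the null vector, is what makes the eight transition vectors a subgroup in this representation.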

Acknowledgments

M.V. José was financially supported by PAPIIT-IN216106, UNAM, Mexico, and by the Multiproject: Tecnologías para la Universidad de la Información y la Computación, UNAM. E.R. Morgado was supported by Universidad Central "Marta Abreu" de Las Villas, Santa Clara, Cuba. We thank Giselle Morgado Aviñó for preparing the figures of the manuscript. We also thank Rafael Camacho for helpful suggestions.

