THE JOURNAL OF BIOLOGICAL CHEMISTRY
Vol. 259, No. 3, Issue of February, pp. 1695-1702, 1984
© 1984 by The American Society of Biological Chemists, Inc.
Printed in U.S.A.

Complete Sequence of the Staphylococcal Gene Encoding Protein A
A GENE EVOLVED THROUGH MULTIPLE DUPLICATIONS*
(Received for publication, August 4, 1983)

Mathias Uhlén‡§¶, Bengt Guss§, Björn Nilsson‡¶, Sten Gatenbeck‡, Lennart Philipson§‖, and Martin Lindberg‡**
From the ‡Department of Biochemistry, Royal Institute of Technology, S-100 44 Stockholm, Sweden and the §Department of Microbiology, University of Uppsala, The Biomedical Center, Box 581, S-751 23 Uppsala, Sweden

The gene coding for protein A from Staphylococcus aureus has been isolated by molecular cloning, and a subclone containing a 1.8-kilobase insert was found to give a functional protein A in Escherichia coli. The complete nucleotide sequence of the insert, including the structural gene and the 5' and 3' flanking sequences, has been determined. Starting from a TTG initiator codon, an open reading frame comprising 1527 nucleotides gives a preprotein of 509 amino acids and a predicted Mr = 58,703. The structural gene is flanked on both sides by palindromic structures followed by a stretch of T residues, suggesting transcriptional termination signals. Thus, it appears that protein A is translated from a monocistronic mRNA. The sequence reveals extensive internal homologies involving a 58-amino acid unit, responsible for IgG binding, repeated 5 times and an 8-amino acid unit, possibly responsible for binding to the cell wall of S. aureus, repeated 12 times. Comparisons between the repeated regions show a marked preference for silent mutations, indicating an evolutionary pressure to keep the amino acid sequence preserved. The structure of the gene also suggests how the gene has evolved.

Evolution by gene duplication is a well known phenomenon among eukaryotic genes. The globin clusters, the immunoglobulins, and the interferon genes probably all have ancestral genes which have been duplicated and then diverged into functionally distinct genes (1). Examples of internally repetitive sequences have also been reported; rabbit skeletal tropomyosin contains a 7-residue amino acid periodicity throughout the molecule (2), and similar repeats have been reported for chicken fibronectin (3) and mammalian serum albumin (4). Among prokaryotes, most reports of duplicated genes have involved in vitro constructions (5), which seem to be stable in Escherichia coli, but dramatically unstable in Bacillus subtilis (6). However, the amino acid sequences of a few cell wall-bound proteins from Gram-positive bacteria have revealed remarkable periodicity, i.e. staphylococcal protein A (7, 8) and streptococcal M protein (9).
We have earlier reported on the molecular cloning of the gene for staphylococcal protein A in E. coli (10). This protein interacts with the Fc (constant part of immunoglobulins) domain of several immunoglobulins from many species including man and has therefore been used extensively for quantitative and qualitative immunological techniques (11).
Amino acid sequence analysis of protein A revealed two functionally distinct regions of the molecule (7, 8). Both regions have remarkably repetitive structures. The NH2-terminal part contains four or five homologous IgG-binding units consisting of approximately 58 amino acids each. The COOH-terminal part, which is thought to bind to the cell wall of Staphylococcus aureus, consists of several repeats of an octapeptide (Glu-Asp-Gly-Asn-Lys-Pro-Gly-Lys) (8).
In a previous report (10), we determined the nucleotide sequence of the promoter region, as well as the region coding for the NH2-terminal part of the protein. Here we report the complete nucleotide sequence of the protein A gene including the 5' and 3' flanking regions from the S. aureus strain 8325-4. The structural gene is 1,527 nucleotides long, giving a preprotein consisting of 509 amino acids and a Mr = 58,703. The repetitive structure of the gene has been clarified, which suggests how the gene has evolved.

* The costs of publication of this article were defrayed in part by
the payment of page charges. This article must therefore be hereby
marked “advertisement” in accordance with 18 U.S.C. Section 1734
solely to indicate this fact.
¶ Supported by grants from the Swedish National Board for Technical Development.
‖ Present address, European Molecular Biology Laboratory, Heidelberg, Federal Republic of Germany.
** Supported by grants from the Swedish Medical Research Council and Pharmacia Fine Chemicals, Uppsala.

EXPERIMENTAL PROCEDURES

Bacterial Strains and Plasmids-E. coli strains HB101 (12) and GM161 (13) were used as bacterial hosts. The plasmid vectors were pBR322 (14), pTR262 (15), and pEMBL9 (16).
DNA Preparations-Plasmid DNA was prepared by the alkaline extraction method (17). Transformation of E. coli was made as described by Morrison (18). Restriction endonucleases, T4 DNA ligase (New England Biolabs), alkaline phosphatase, and T4 polynucleotide kinase (Boehringer-Mannheim) were used according to the suppliers' recommendations.
Isolation of the 2.15-kilobase DNA fragment containing the entire protein A gene was made by digesting the plasmid pSPA3 (10) with EcoRV. The digested material was electrophoresed on a 5% polyacrylamide gel, and the 2.15-kilobase fragment was eluted electrophoretically. The isolated fragment was passed over an anion exchange column, eluted, and precipitated with ethanol. The precipitated material was washed in 80% ethanol, dried, resuspended in water, and used for DNA sequence analyses.
DNA Sequencing Determinations-DNA fragments were sequenced by the method of Maxam and Gilbert (19) or Sanger et al. (20). The samples were analyzed on 6, 8, and 20% denaturing polyacrylamide gels using the thermostatic LKB Macrophor system.
Computer Analysis-All the sequencing analyses were performed on a Hewlett-Packard desktop computer (HP-85) equipped with a HP7225A plotter. The software was constructed by M. Uhlén.
RESULTS AND DISCUSSION

DNA Sequence-We have earlier reported that the protein A gene from S. aureus strain 8325-4 is located on a 1.8-kilobase insert of staphylococcal DNA cloned in the plasmid

pBR322 (21). Expression of the gene was demonstrated in E. coli. The plasmid was designated pSPA8 and is shown schematically in Fig. 1. The sequence of the promoter region and the 5' end of the structural gene has been reported (10) as well as the sequence of the repetitive region X which probably is responsible for the cell wall binding of the protein in S. aureus.² Using the strategy outlined in Fig. 2C, the entire insert was sequenced according to the method of Maxam and Gilbert (19). It was not possible to obtain sequence on both strands in all parts of the gene, and therefore additional sequencing using the enzymatic method (20, 16) was performed in order to confirm the sequence in these parts. As no palindromic sequence indicating transcription termination was found in the 3' end of the gene, the sequence a few hundred nucleotides downstream from the EcoRV site on the original plasmid pSPA1 (10) was determined using both methods (19, 20). The complete nucleotide sequence of the protein A gene is shown in Fig. 3. Note that the previously published sequence of Löfdahl et al. (10) lacks one of the three thymidines at position 183-185.
Starting from a TTG codon at nucleotide 184, there is an open reading frame of 1,527 nucleotides terminating in a TAG stop codon at nucleotide 1,711. Although we have not shown that the codon at nucleotide 184 is the translational start, there are several reasons to postulate this. First, TTG is a common start codon in Gram-positive bacteria (21), unlike E. coli in which it is very rare (22). Second, this codon is preceded by a possible Shine-Dalgarno sequence (23) that has many features in common with other Gram-positive ribosomal binding sequences (24); 8 out of 11 nucleotides are complementary to the 3' end of B. subtilis 16 S rRNA. In addition, the space between the last G in this sequence and the start codon is seven nucleotides, also similar to other Gram-positive genes (24, 25). Third, this start codon gives a putative signal peptide with a reasonable size (36 amino acids) and structure (a few basic residues followed by a stretch of 23 hydrophobic residues), similar to other Gram-positive genes (25). The preprotein, including the putative signal peptide, consists of 509 amino acids giving a Mr = 58,703.
Two upstream overlapping promoter sequences similar to the consensus sequences (TTGACA and TATAAT) of prokaryotes (26) have been indicated in Fig. 3, although the first -35 sequence shows relatively poor complementarity (only three out of six) with TTGACA. The gene is both preceded and followed by palindromic sequences indicating transcription terminations. These are indicated in Fig. 3, and the possible mRNA hairpin structures that can be formed are schematically drawn in Fig. 4. Both palindromes are followed by a T-rich stretch of residues (TTTATTTT). Although we do not have any experimental data to show where the transcription of the protein A mRNA starts or terminates, it thus appears likely that protein A is translated from a monocistronic mRNA.
Amino Acid Sequence-The amino acid sequence deduced from the DNA sequence as well as amino acids that differ in the partial protein sequence established in Sjödahl (27) are also indicated in Fig. 3. The amino acid numbering starts with the alanine at nucleotide 292 which has been shown to be the first amino acid of the mature protein A.¹ The stop codon at nucleotide 1,711 thus gives a mature protein A of 473 amino acids and a Mr = 52,752. Among the IgG-binding regions D, A, B, and C, a high degree of homology exists and only 4 out of the 235 amino acids comprising all four regions vary. All these changes can be explained by single point mutations. The partial amino acid sequence of region X also shows high similarity to the deduced sequence although about 10% of the amino acids are different. Since the DNA sequence was obtained from strain 8325-4 and the protein sequence from strain Cowan I, the divergence is probably due to strain variation.
Amino Acid Composition-Attempts to determine the protein sequence of protein A have involved digestion of staphylococcal cell walls with lysostaphin (28) or analyzing protein A from mutant bacteria which secrete the product (8). A direct comparison of structures from deduced and purified proteins is difficult, due to strain differences and proteolytic digestion during isolation of the protein. In order to compare the sequences deduced from the DNA sequence with those obtained experimentally, the amino acid compositions of different parts of the protein, as deduced from the DNA sequence, are tabulated in Table I. The amino acid compositions of purified protein A from different strains of S. aureus are also presented in Table I. According to Sjödahl (27) and Lindmark et al. (8), only a few amino acids NH2-terminal

FIG. 1. Structure of plasmid pSPA8 with relevant restriction sites. The protein A gene is contained in a 1.8 kilobase TaqI-EcoRV insert in the plasmid pBR322. Boxes show the positions of the replication origin (ORI) and the genes coding for protein A (PROT A) and β-lactamase (AMP).

FIG. 2. Restriction map and sequencing strategy of the insert. A, partial restriction map of the corresponding DNA sequence. B, schematic drawing of the gene coding for protein A with its different regions. S is a signal sequence, A-D are IgG-binding regions, E is a region homologous to A-D, and X is the COOH-terminal part of protein A which lacks IgG-binding activity. C, sequencing strategy of the 1.8-kilobase insert.

¹ M. Uhlén and U. Hellman, unpublished results.
² Guss, B., Uhlén, M., Nilsson, B., Lindberg, M., Sjöquist, J., and Sjödahl, J. (1984) Eur. J. Biochem., in press.
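The coordinates quoted for the reading frame can be cross-checked with simple arithmetic. The sketch below is illustrative only; it assumes 1-based nucleotide numbering and that the 1,527-nucleotide open reading frame excludes the TAG stop codon, which is the reading consistent with a stop at nucleotide 1,711. The helper name `codons` is hypothetical.

```python
# Coordinates quoted in the text (assumed 1-based nucleotide positions).
ORF_START = 184       # first base of the TTG initiator codon
ORF_LENGTH = 1527     # open reading frame, assumed to exclude the stop codon
MATURE_START = 292    # first base of the codon for the NH2-terminal alanine


def codons(n_nucleotides: int) -> int:
    """Number of whole codons in a stretch; rejects out-of-frame lengths."""
    if n_nucleotides % 3 != 0:
        raise ValueError(f"{n_nucleotides} nucleotides is not a multiple of 3")
    return n_nucleotides // 3


preprotein_length = codons(ORF_LENGTH)                      # residues in the preprotein
stop_codon_start = ORF_START + ORF_LENGTH                   # first base of the stop codon
signal_peptide_length = codons(MATURE_START - ORF_START)    # residues before the alanine
mature_length = preprotein_length - signal_peptide_length   # residues in mature protein A

print(preprotein_length, stop_codon_start, signal_peptide_length, mature_length)
# prints: 509 1711 36 473
```

The four values agree with the 509-residue preprotein, the stop codon at nucleotide 1,711, the 36-residue putative signal peptide, and the 473-residue mature protein reported in the text.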

FIG. 3. Complete nucleotide sequence of the protein A gene with the deduced amino acid sequence. (The sequence figure could not be recovered from the source text.)

. . the DNA sequence and its deduced amino acid segenes. it appears likely that the secreted form of protein A from strain A676 does contain region E.. At present. amino acids 1366. the plasmid-coded genes have a marked preference of the protein or if it reflects genomic differences. UUC (Phe). similar to theoverall GC content difference in size and amino acid composition is due to proof the Bacillus species involved.. Homology Plot Analysis-In order to search for homologous Codon Usage-The codon usage for the preprotein of protein A is compared in Table I1 with other Gram-positive regions. size of the deduced protein from 8325-4 is larger than two Table I1 shows that among the chromosomal genes the independent determinationsof the protein from Cowan I even codon usage is randomly distributed. 'From Lindmark et al..DNA Sequence of Staphylococcal Protein A 1698 TABLE I A. . to exhibit aclear preference for third position A/U bases with In contrast. according to the theory. The amino acid composition. terminal sequence of this protein (8) fits well with the NH2. 3. Their hypothesis predicts that efficient in-phase translation is faof region D in protein A isolated from cell walls of Cowan I. -3 ' C -T. . of a mature protein A lacking 107 amino is 32%. respectively. In teolysis both in the NH2-terminal andCOOH-terminal parts contrast. e From Movitz (2). it is unclear if this degenerate third base is 42%.are the codon pairs which. cilitated by proper choice of degenerate codewords. as deduced from can be found. aureus from different strains of A-T 6-C T-A C-G C C T-A CT-A A 5' f5 I ~ T -A T T T-A. reported for A676 (8). Table I shows that the imal codon-anticodon interaction energy. aureus which is 30-33% (34). 3. and only 21% C (18/85) amino acids. 3. selection for C is DNA sequence starting at nucleotide 292 in Fig.. . (8). and AGC (Ser). 
and the However.Two of these exceptions can be explained by the Grosjean terminus of protein A from strain 8325-4 when determined and Fiers (32) hypothesis. However. . The size preferred. Total a genes and plasmid-coded genes by the four putative proteins encoded by the staphylococcal plasmid vector pC194 (26).. . . dMature protein A except COOH-terminal part. . The numbers to Grosjean and Fiers (33). similar to theGC content of chromosomal DNA from acids in the COOH-terminal part shows good agreement with S. The GC content at the thirdbase of the codons the DNA sequence. which is 42-47% (34)..3 8 + .ATCATCT/" " TTTATTTTAC. The per cent G/C of the if region E is omitted (A-E). . the DNA sequence does not adapting to theoverall GC content of the host cell with some contain the COOH-terminal -Val-Ala-Lys which has been exceptions. 4. isolated by lysostaphin treatment of bacteria. extracellular protein A produced by a methicillin-resistant strain. AAC (Asn). the codon usage the composition of purified protein A from strain A676 as of the proteinA gene shows a preference for A/U bases shown in Table I. amino acids 1-473 in Fig. (8). 8 From Lindmark et al. - 851 Amino acids Deduced protein A from 8325-4 Purified protein A Prot-A" Mat-Ab A-E' A-Xd Cowan I' Cowan I' A67W - ~ TTTATTTTAT . Furthermore. Hypothetical secondary structures at the 5' and 3' Also indicated by or . Although the repetitive nature of the protein Agene makes statistical analysis risky. among the four codon both by Edman degradation of the purified protein' and by pairs in which.codon pairs marked in Table I1 are most dependent on maxtained due to a blocked terminus (27). Every point in 5 I . . The protein A gene of Cowan I has recently been cloned in our laboratory. not preferred. this nucleotide is indeed chosen 64% of the time of protein A from A676 would then indicate that the protein (67/105). FIG. The NH2.a few exceptions. only 22% G/C. amino acids 57-473. . 
according regions flanking theprotein A coding sequence. by highly expressed genes. the exact NHAerminal sequence could not be ob. Amino acid composition of deduced protein A gene or purified protein S. it seems which will help to clarify this point. for A/U bases. /T A-T C-G A-T A-T C -G G-C A-T G-CA C-G A-T C -G G-C T-A A-T A-T A-r 69 Lysine Histidine 7 Arginine 6 Aspartic acid 105 10 Threonine Serine 22 25 78 Glutamic acid 31 Proline 33 Glycine 42 Alanine 15 Valine Methionine 6 10 13 14 18 Isoleucine Leucine 31 36 41 9 Tyrosine 14 1412 Phenylalanine 65 7 5 103 85 7 78 30 28 18 38 12 6 62 6 4 91 7 17 18 67 2727 26 3131 10 5 27 8 7 14 13 45 3 5 2 20 6870 2426 3136 4 3 28 29 5 12 52 4 5 82 5 65 27 30 22 34 5 2 911 27 5 12 53 4 4 83 6 16 48 3 4 82 4 16 64 30 8 3 12 7 3 4 4 473 417 366 381 366 395 509 Protein A including the signal peptide. In contrast. Chromosomal genes are represented by four Bacillus quence were scanned by a computer program. .TTAAGCC ' B. the four codon pairs with predicted is truncated at theCOOH-terminal lacking approximately 80 selection for U show a reversed ratio. Mature protein A except region E. Therefore. . T T. mainly following the Grosjean-Fiers (33) rules for highly expressed genes. isolated by lysostaphin treatment of bacteria. are most likely to be preferred or refer to nucleotides in Fig. * Mature protein A.
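The palindromic sequences flanking the gene are interpreted in the text as possible mRNA hairpins, each followed by a T-rich stretch (TTTATTTT). A minimal sketch of how such inverted repeats can be located is given below. It is not the procedure used by the authors; the function names, the stem and loop sizes, and the test sequence are all hypothetical, and only perfect stems are reported.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def find_hairpins(seq: str, stem: int = 6, min_loop: int = 3, max_loop: int = 8):
    """Return (start, loop_length) for every perfect inverted repeat.

    A hit means seq[start:start+stem] can pair with the stretch that begins
    loop_length bases after it, i.e. the second arm is the reverse
    complement of the first arm.
    """
    hits = []
    for i in range(len(seq) - 2 * stem - min_loop + 1):
        left = seq[i:i + stem]
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop
            right = seq[j:j + stem]
            if len(right) < stem:
                break  # second arm would run past the end of the sequence
            if right == reverse_complement(left):
                hits.append((i, loop))
    return hits


# Hypothetical test sequence: a 6-bp stem, a 4-base loop, then the T-rich
# motif quoted in the text. It is not taken from the protein A gene.
toy = "AA" + "GCCTGA" + "TTTT" + "TCAGGC" + "TTTATTTT"
print(find_hairpins(toy))
```

A real terminator search would also tolerate mismatches in the stem and score the stability of the structure; this sketch only shows the inverted-repeat test itself.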

e Per cent G/C in the third degenerate base. B . Structure of IgG-bindingRegions-The IgG-binding regions of protein A have been defined by trypsin cleavage of the mature proteininto functional IgG-binding units D. A. e Four putative proteins of pC194 (32). 6 are the sequences flanking the repetitive regions. These results strongly support the previously suggested hypothesis (27) of an evolutionary pressure in these regions keeping the amino acid sequence preserved. 5A are more broken than those in Fig. although the lastfive amino acids of region C’ (292296) are changed compared to region B’. and C (7. 6 is . The same holds for the other end of the repetitive region located in the beginning of region E’. the part of the gene coding for the signal peptide (S) as well as the promoter region (5’) seems to be totally unrelated to theIgGbinding regions (E. the boundary of these regions has been moved 15 nucleotides towards the 3’ end of the gene. licheniforrnis penicillinase (31). 5. the homology plots represents an identical residue (1). _ Prep 9 1 6 1 Tyr UAU UAC Term UAA UAG His CAU CAC Gin 33CAA CAG Asn AAU 31AAC Lys AAA AAG Asp GAU 35GAC Glu 59GAA GAG UGU Cys UGC Term UGA Trp UGG Arg CGU CGC CGA CGG Ser AGU 17 AGC Arg AGA AGG GGU Gly 36GGC GGA GGG Sum Per cent G/c‘ 29 9 8 1 0 0 6 1 38 2 20 45 51 18 21 19 37 1 0 0 0 0 3 3 0 0 3 12 0 0 18 14 1 0 49 33 27 8 46 20 17 1 16 6 43 12 56 12 22 5 19 10 7 4 9 4 1 3 0 13 3 11 4 11 2 3 3 509 32 1654 42 655 22 35 68 79 26 81 35 2 2 - 35 18 5 10 9 19 11 14 22 - Protein A including the signal peptide (preprotein). The eight codon pairs which aremost likely to be preferred (+) or not preferred (-) by highly expressedgenes (331. and AUA (Ile) are omitted. B and C ) located in the middle of the gene. A . and B. more than half of the nucleotides (8/15) are homologous. This choice is of course arbitrary as the5’ end and the 3‘ end of the repetitive region have diverged slightly. Recently. 
which means that many of the nucleotide changes between the codons in the homologous regions have occurred in bases giving no amino acid change. Although the first three amino acids are different from region D’. In Fig. 6 the sequence of the regions are aligned to enable comparisons. 5B.1699 DNA Sequence of Staphylococcal Protein A TABLE I1 Phe UUU uuc Leu UUA UUG cuu CUC CUA CUG Ile AUU AUC AUA Met AUG Val GUU GUC GUA GUG Ser UCU ucc Pro UCA UCG CCU ccc CCA CCG Thr ACU ACC ACA ACG Ala GCU GCC GCA GCG Prot-A” Chromb Plasmid‘ 2 12 20 5 7 1 6 2 8 9 1 6 5 2 6 2 5 0 3 2 21 45 20 34 22 31 7 3 31 38 30 12 29 21 21 21 30 20 21 31 22 16 11 11 25 13 16 48 45 29 36 40 38 39 11 35 13 10 4 5 4 27 5 18 12 12 1 14 4 16 1 7 4 10 5 3 1 14 4 15 5 0 8 2 5 1 4 0 25 1 11 5 . subtilis SpoOF (30). There exists a nine-nucleotide insertion in region E’ giving three amino acid residues (59-61) not homologous to the otherregions. flanked by unique sequences without homology in the 5’ and the 3’ ends of the structural gene. A changed nucleotide compared to region B’ in Fig. As already pointed out in the homology analysis (Fig. A and B ) these regions seem to be nonhomologous to the IgG-binding regions. Comparisons between the plots show that the homology lines in Fig. The nucleotide triplets and thededuced amino acids are compared in Fig. B. which disappear when no homology exists. UGG (Trp). amyloliquefaciens a-amylase (25). subtilis a-amylase (29). a line of identity occurs from the left upper corner to theright lower corner. The cleavage points for trypsin are marked with arrows. The sum of four Bacillus chromosomal genes. The plots reveal two structurally distinct regions with internal homology. As the start codons are yet to be identified. The codons AUG (Met). we showed (10)that strain 8325-4 also contains a fifth region E homologous to the four repetitive regions earlier identified by protein sequencing. 27). the total open reading frames are taken into account. 
and homologous repeats show up as parallel lines. Also shown in Fig. indicating a relationship. A and 8. D. However. B. Thus. five out of nine nucleotides are identical. 5. The partof the gene coding for the COOH-terminal part of region X as well as the 3’ flanking sequence seems to be unrelated to both the repetitious region X and the IgGbinding regions.respectively. As the sequence is compared with itself. In order to achieve maximal homology. B.
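The per cent G/C in the third codon position reported in Table II can be computed from a codon count table. The sketch below assumes, following the table footnote, that AUG (Met), UGG (Trp), and AUA (Ile) are omitted; the function name and the toy counts are hypothetical and are not the values of Table II. The last two lines re-derive the two ratios quoted in the text for the Grosjean-Fiers codon pairs.

```python
# Codons left out of the third-base statistic, as stated in the Table II footnote.
EXCLUDED = {"AUG", "UGG", "AUA"}


def third_base_gc_percent(codon_counts: dict) -> float:
    """Per cent of codons (weighted by count) ending in G or C."""
    total = 0
    gc = 0
    for codon, count in codon_counts.items():
        if codon in EXCLUDED:
            continue
        total += count
        if codon[2] in "GC":
            gc += count
    if total == 0:
        raise ValueError("no codons left after exclusions")
    return 100.0 * gc / total


# Hypothetical counts for illustration only.
toy_counts = {"GAA": 6, "GAG": 2, "AAU": 1, "AAC": 1, "AUG": 5}
print(third_base_gc_percent(toy_counts))   # 30.0

# Ratios quoted in the text for codon pairs with predicted selection for C or U.
print(round(100 * 67 / 105))   # 64 (C chosen where C is predicted)
print(round(100 * 18 / 85))    # 21 (C chosen where U is predicted)
```

Whether stop codons were counted in the published percentages is not stated in the recovered text, so the sketch simply counts whatever codons it is given.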

marked with an asterisk, and a changed amino acid is underlined. Table III summarizes the amino acid changes and Table IV the codon changes between the regions.
A comparison of the five regions with respect to mutual relationship reveals a pronounced "homology gradient" along the protein molecule, i.e. the closer the location of two regions, the higher the degree of homology. As already pointed out by Sjödahl (27), one interpretation of this phenomenon is that the primordial structural gene coding for the IgG-binding part of protein A has been subjected to stepwise gene duplications involving only one region followed by a period in which point mutations have occurred, thus generating slightly dissimilar nucleotide and amino acid sequences. As a result of these evolutionary events, a homology gradient will evolve. The fact that codons (Table IV) have changed much faster than amino acids (Table III) indicates that an evolutionary pressure exists to keep the

FIG. 5. Dot matrix comparisons of the protein A sequence. A, the entire nucleotide sequence and the immediate 5' and 3' flanking sequences are compared with itself. Each dot represents the center of a three-base identity, and direct repeats appear as parallel lines across the grid. B, the deduced amino acid sequence compared with itself.

FIG. 6. Comparisons of the IgG-binding regions and flanking regions. The sequences of the repetitive regions have been aligned to achieve maximal homology. The comparison is based on region B', and a nucleotide is marked with an asterisk and an amino acid is underlined when different from the B' region. The cleavage points for trypsin are marked with arrows.
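The dot matrix of Fig. 5A, in which each dot marks the center of a three-base identity, can be sketched as follows. This is not the HP-85 program written for the study; the function name, the window handling, and the toy sequence are assumptions made for illustration.

```python
def dot_matrix(seq_a: str, seq_b: str, window: int = 3) -> set:
    """Return the (row, column) centers of every exact window-long identity."""
    half = window // 2
    # Index every window of seq_b once so each window of seq_a is a lookup.
    positions = {}
    for j in range(len(seq_b) - window + 1):
        positions.setdefault(seq_b[j:j + window], []).append(j)
    dots = set()
    for i in range(len(seq_a) - window + 1):
        for j in positions.get(seq_a[i:i + window], []):
            dots.add((i + half, j + half))
    return dots


# Hypothetical sequence with one 8-base direct repeat, 10 bases apart.
toy = "GG" + "ACGTTGCA" + "TT" + "ACGTTGCA" + "CC"
dots = dot_matrix(toy, toy)

# Self-comparison always yields the main diagonal (the "line of identity").
on_diagonal = all((k, k) in dots for k in range(1, len(toy) - 1))
# The direct repeat shows up as a short line parallel to the diagonal.
on_parallel = all((k, k + 10) in dots for k in range(3, 9))
print(on_diagonal, on_parallel)   # True True
```

In a self-comparison the plot is symmetric about the main diagonal, so every repeat produces a pair of parallel lines, one on each side.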

As discussed above. the corresponding residues are 183-192 and 198-211. D’. 3. 6. 192 (Leu). and a changed aminoacid is underlined. a changed nucleotide is marked with an asterisk. In region B’. 7.The numbers refer to the amino acids in Fig. As seen in Fig. Acomparison of the 12 repeated units reveals striking homologies. The 3’ end of the repetitive region is obviously located at amino acid 392 (see Fig. As seen in Fig. Structural studies based on the cleavage with trypsin (7. The six first amino acids (Lys-Pro-Gly-Lys-Glu- . This pressure is evenmore pronounced when comparing the residues in these a-helices that interact with IgG. suggesting an evolutionary pressure to keep these residues intact. 206-207 (Ile-Glu).) followed by a constant region coding for 81 amino acids (Xc). Structuralstudies of protein A have suggested that 11 amino acids of the IgG-bindingregions are essentialfor binding to theF. there are striking homologies in these two a-helices between the different regions. there is a strong pressure tokeep these amino acids preserved. giving the amino acids 5961. Region E’ D’ A’ B’ E’ D‘ 0 31 A’ B‘ C‘ 25 26 36 31 0 21 25 28 2536 2128 0 14 30 26 25 14 0 20 C’ Total 30 20 0 118 105 101 86 115 amino acidsequencepreserved. part of the immunoglobulins (35). 6. Apart from the mutual homology between the five regions. a u r e u s Cowan I and 8325-4. The cleavage point for trypsin which defines region X (7. 203 (Asn). the 24-nucleotide repeats are aligned and a mutual comparison was performed. S t r u c t u r e of Region X-The repetitive nature of region X is indicated as multiple lines in Fig. 7). which has been observed in protein A both fromS. not clearly defined since the 12 last nucleotides. The changes observed are often outside the two helical areas. there also seem to exist internalhomologies in each region as revealed by traces of lines in Fig. however. thereis a serine insteadof aspargine at position 70. Since region C’ terminates at amino acid 296. 
The comparison is based on region XI. In region B’. Since the number of total changes of codons is lowest for region B’ (Table IV). A and B . but this region has obviously diverged in the COOH-terminal end. 5. The boundarybetween region C’ and region X is. and an altered nucleotide is marked with an asterisk and an altered amino acid is underlined. are identical with the corresponding amino acids of region X1 (Fig. this region was chosen for the comparison in Fig. in regions E’. region. 7. Mostof these amino acids are assumed tobe located in two a-helical regions (35). the changed His-Leu. A and B. to Asn-Met. the repetitive part of region X consists of exactly 12 units each with a length of 24 nucleotides. generating a few amino acids identical with region XI. A comparison nucleotide by nucleotide reveals that 14 out of 18 bases are identical between these two regions. 7. and A’. for instance. 20) is immediately before amino acid 292 (Glu). these amino acids are 184-186 (Gln-Gln-Asn). Comparison of the repetitive units of region X and flanking regions. 7) which is directly followed by the constant 209 C‘ 2’37 x1 305 x2 313 x3 32 1 x4 329 x5 337 X6 345 x7 353 X8 361 x9 369 x10 377 x11 385 x12 3’) 3 FIG. 5. coding forthe last four amino acids of region C‘. Another subregion of interest is the nine-nucleotide insert.Again. 188-189 (Phe-Tyr). 6. giving an approximately 300-base pair repetitive region (X. The sequences of the repetitive region have been aligned to achieve maximal homology.5 times.In Fig. structurally the octapeptideof region X seems tobe repeated 12. 20) have suggested that region X starts at amino acid 292 which differs five amino acids from the boundary chosen in Fig. and 210 (Lys). 
Region E’ D‘ A’ E‘ D’ A’ B‘ C‘ 0 11 14 11 0 12 14 21 11 17 12 7 0 5 15 7 B‘ C’ Total 57 46 11 5 0 10 21 17 15 10 0 40 41 64 TABLE IV Comparison of codons of the ZgG-binding regions The values listedrepresentthenumber of changednucleotide triplets of identically positioned codons when the regions are campared in pairs. Clearly. the end of region C’ is probably related to the otherIgG-binding regions.1701 DNA Sequence of Staphylococcal ProteinA TABLE 111 Comparison of amino acids of the ZgG-binding regions The values listed represent the number of changed amino acids of identically positioned residueswhen the regions are compared in pairs. This subregion (residues 57-62) is possibly related to other regionslike amino acids 4-9 in the beginning of region E’. Therefore. Hence. but all the other 49 residues are identical. the nucleotide sequence coding for amino acids 179 (Lys) to 188 (Phe) and196 (AAC)to 205 (Phe) all withinregion B contains 24 identical outof 30 nucleotides. at position 193-194 of region B’.
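The pairwise counts in Tables III and IV (changed amino acids and changed codons at identically positioned codons) can be computed as sketched below. The sketch assumes an ungapped, in-frame alignment and the standard genetic code; the function name is hypothetical, and the two test sequences are invented codings of the octapeptide Glu-Asp-Gly-Asn-Lys-Pro-Gly-Lys, not sequences taken from the gene.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def compare_regions(dna_a: str, dna_b: str) -> tuple:
    """Return (changed codons, changed amino acids) for two aligned regions."""
    if len(dna_a) != len(dna_b) or len(dna_a) % 3 != 0:
        raise ValueError("regions must be in frame and of equal length")
    codon_changes = 0
    amino_acid_changes = 0
    for k in range(0, len(dna_a), 3):
        codon_a, codon_b = dna_a[k:k + 3], dna_b[k:k + 3]
        if codon_a != codon_b:
            codon_changes += 1
            if CODON_TABLE[codon_a] != CODON_TABLE[codon_b]:
                amino_acid_changes += 1
    return codon_changes, amino_acid_changes


# Hypothetical 24-nucleotide units (8 codons each).
unit_1 = "GAAGACGGCAACAAACCTGGCAAA"   # E D G N K P G K
unit_2 = "GAAGATGGTGGCAAGCCTGGTAAA"   # E D G G K P G K

codon_changes, aa_changes = compare_regions(unit_1, unit_2)
silent_changes = codon_changes - aa_changes
print(codon_changes, aa_changes, silent_changes)   # 5 1 4
```

A large excess of changed codons over changed amino acids, as in this toy pair, is the pattern the text interprets as pressure to preserve the amino acid sequence.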

Boca Raton. the codon coding for the first lysine is changed periodically (1977) Gene (Amst.. (1980) Gene (Amst. Neugebauer. R. The two last 7. 80.. S. M. Fasman. (1982) Semin. G. B. 11. Sci. A. M. 2577-2588 32.401-410 sequence. R. U h l h .. pp.M. Moran.800-804 27. 74. How this evolved at 20.. J..10071013 26.. Dis.) 12.10. Bolivar. (1981) Gene (Amst.) 2. M. Grosjean. 4 7 4 5 signs of a 48-nucleotide repeat. Beachey. P. U.157-252 12. J. Kawamura. (1982) Gene (Amst.H. 4... andYamada. 258. and Philipson. Fishetti. A. Vol. Shimotsu.H. but the nucleotide sequence of Acad. Movitz. all occurring in a wobble position and therefore representing 11. and Manjula. M. Natl. R. 150. L. the evolution of the repetitive partof region 18. 347353 Biol.) 302. Palva. FL 35. E. Asn. 2.. H. L. John Sjoquist for critical comments and advice. A.and Fiers. C. B.. 658-662 31. and Gilbert. Crosa.. A. J. I. 815825 33. (1977) Eur. Sullivan. L.M. A... H.237-249 30. 697-701 ing thesix conserved amino acids in the12 X. Genet. and Lindberg. 199-209 34. Johnson. I. (1983) Proc. 2361-2370 . and Kaariiiinen. and Falkow. 11. M. Riedel. J.) 23. Langone.. Sci. S. L. Sjodahl. (1979) Methods Enzymol. thewobble base A/G in 14. T. Nilsson. J. T. 80. Greene. Maxam. S. Acad. Hirano.. X probably involved stepwise gene duplications of an ances. may help in 22.775-782 Asp) are identical throughout the X. 68.95-113 in regions X7 to X12. Acknowledgments-We are grateful 50 Dr. Seyer..369-378 for proteins with similar repeated structures.623-628 of this extremely conserved octapeptide is not known. T. 74. U. Jeffreys. Swanberg.. J.. the molecular level is unclear.. H. Philipson. W. L. Pettersson. (1981) Nucleic Acids Res. N. Although the biological function 74. W. Hartley. and Weisblum. Yang. Biol. Inc. Academic Press..19. 343-351 amino acids are changed in a regular pattern between Asn8.. U. 32. 6). Movitz. A. and Coulson... M. (1983) Nucleic Acids Res. 34-38 duplications. Chem. gradient throughout the region. 
(1983) Proc. 471-490 28. 1-45 resolving the molecular events causingstepwise multiple DNA 23. Mol. H.. B. L. (1977) Proc. N. Biochem. K.. Bacteriol. A. C. 68.. (1981) J. G. G. (1981) in Genetic Engineering (Williamson. Acad. Betlach. and Henner. (1983) Proc. A. U. Shine. (1983) J. Y.5463-5467 the protein A gene from other strains. (1976) Eur. J.. and Roulland-Dussoix. R. J.. 80... Zmmunol. D. Kalkkinen. silent mutations. J. Natl.. I. We also thank Dr.) 13. Nicklen. (1983) Gene (Amst. M. Birnboim. Sci. M. M.(1983) Microbiol. Biochem. Pastan. V. 24. or Asn-Lys. (1977) Proc. (ed) (1976) CRC Handbook of Biochemistry and Molecular Biology: Nucleic Acids Section 3rd Ed. (1973) Mol.DNA Sequence of Staphylococcal Protein A 1702 6. REFERENCES 1. and Gregori. J. Rodriquez. L... Biochem.. B. Kobayashi. H. (1982) Semin.. In conclusion. Hence. R. S..(1979) J. 69-183. 15131523 24-nucleotide sequence. D. Horinouchi. Sanger.) 18.. Natl.. Acad. and between Asn and Gly in regions 5 to 10 (seeFig. Sci. 78. (1969) J. S. and Saito. there arealso 13. C. aswell as genes coding 21. 1 2 7 . 9. Galizzi. Poteete.. and aminoacid 7 is changed periodically 15. J. 256. K. H. 73. and Schaller.46-50 4. de Crombrugghe. Boyer. ed) Vol. Roberts. (1982) J. Guss.. 411-418 3. Acad.11283-11291 25. and Dalgarno. W. M. J. Gen.. Andras Gaal for introducing us to the thermostatic LKBMacrophor system and Dr. Chem. Dis. R. Boyer. 139. pp. and Losick. R. (1977) Eur. Natl. A. S. Deisenhofer. H. A . F. R. 1-48. W. I. A. B. and Rabinowitz. 7. Biol.(1982) Adu. Natl. C. Stephen Fahnestock for a correction of the nucleotide sequence. 41. and Sjoquist.. 4. Kozak.. Morrison. J. F. Heyneker. C. Tanaka. B. We thank Hans-Olof Pettersson and Bjorn Jansson for skillful technical assistance and ChristinaPellettieri and Gerd Benson for patient secretarial help.. W.)254. (1975) Nature (Lord. H. CRC Press. D. F. J . clearly 9.. Sjodahl. S. Yamada. (1981) Biochemistry 20.560-564 tral 24.UhlBn. C... (1981) Proc. S. Sci. J. 
Xr although the gradient must 11. M. (1979) Nucleic Acids Res. 291-299 29. (1983) Nucleic Acids Res. S. J. Lofdahl.. J. P. F. Reu. D. H.326-331 A. Sci. 7657-7661 5. Bacteriol. L. Acad. Infect. there has been a strong pressure to preserve its amino acid Infect. A. J . S. Takkinen. 47. H. Lindmark.. S. L. Biochm. A. region. Gatenbeck. McLaughlin. Guss.. I. K. 123-127 There also seems to besomeevidencefor a homology 16. and Kang. Soderlund. 12 nucleotides have changed when compar. Bachman. Gly-Asn. K. S. s.1645-1655 be based on a 48-nucleotide repeat rather than the primordial 17. Murray.. and Doly. 459-472 Apart from the distinct24-nucleotide repeat. R. Sprengel.. Natl. Marinus. 78.... U. L. Thus. Y. (1983) Nature (Lond. compartments. U. Cesareni. Ohno. Lindberg. (1977) Eur.....or 48-nucleotide long sequence. New York 2. Y. Dente.. and Cortese. H.
