THE JOURNAL OF BIOLOGICAL CHEMISTRY
Vol. 259, No. 3, Issue of February
© 1984 by The American Society of Biological Chemists, Inc.
Printed in U.S.A.
Complete Sequence of the Staphylococcal Gene Encoding Protein A
A GENE EVOLVED THROUGH MULTIPLE DUPLICATIONS*
(Received for publication, August 4, 1983)
Mathias Uhlén‡§¶, Bengt Guss§, Björn Nilsson‡¶‖, Sten Gatenbeck‡, Lennart
From the ‡Department of Biochemistry, Royal Institute of Technology, S-100 44 Stockholm, Sweden and the §Department of Microbiology, University of Uppsala, The Biomedical Center, Box 581, S-751 23 Uppsala, Sweden
The gene coding for protein A from Staphylococcus aureus has been isolated by molecular cloning. A subclone containing a 1.8-kilobase insert was found to give a functional protein A in Escherichia coli. The complete nucleotide sequence of the insert has been determined, including the 5' and 3' flanking sequences. Starting from a TTG codon, an open reading frame of 1527 nucleotides gives a preprotein of 509 amino acids and a predicted Mr = 58,703. The structural gene is flanked on both sides by palindromic structures followed by a stretch of T residues, suggesting transcriptional termination signals. Thus, it appears that protein A is translated from a monocistronic mRNA. The sequence reveals extensive internal homologies. A 58-amino acid unit responsible for IgG binding is repeated 5 times, and an 8-amino acid unit, possibly responsible for binding to the cell wall of S. aureus, is repeated 12 times. Comparisons between the repeated regions show a marked preference for silent mutations, indicating an evolutionary pressure to keep the amino acid sequence preserved. The structure of the gene also suggests how the gene has evolved.

Evolution by gene duplication is a well known phenomenon among eukaryotic genes. The globin clusters, the immunoglobulins, and the interferon genes probably all have ancestral genes that have been duplicated and then diverged into functionally distinct genes (1). Internally repetitive sequences have also been reported. Rabbit skeletal tropomyosin contains a 7-residue amino acid periodicity throughout the molecule (2), and similar repeats have been reported for chicken fibronectin (3) and mammalian serum albumin (4). Among prokaryotes, most reports of duplicated genes have involved in vitro constructions (5). These seem to be stable in Escherichia coli but dramatically unstable in Bacillus subtilis (6). However, the amino acid sequences of a few cell wall-bound proteins from Gram-positive bacteria have revealed remarkable periodicity, i.e. staphylococcal protein A (7, 8) and streptococcal M protein (9).

We have earlier reported on the molecular cloning of the gene for staphylococcal protein A in E. coli (10). This protein interacts with the Fc domain (the constant part) of several immunoglobulins from many species, including man. It has therefore been used extensively in quantitative and qualitative immunological techniques (11). Amino acid sequence analysis of protein A revealed two functionally distinct regions of the molecule (7, 8), both with remarkably repetitive structures. The NH2-terminal part contains four or five homologous IgG-binding units of approximately 58 amino acids each. The COOH-terminal part is thought to bind to the cell wall of S. aureus and consists of several repeats of an octapeptide (Glu-Asp-Gly-Asn-Lys-Pro-Gly-Lys) (8).

In a previous report (10), we determined the nucleotide sequence of the promoter region, as well as the region coding for the NH2-terminal part of the protein. Here we report the complete nucleotide sequence of the protein A gene from S. aureus strain 8325-4, including the 5' and 3' flanking regions. The structural gene is 1,527 nucleotides long, giving a preprotein of 509 amino acids and an Mr = 58,703. The repetitive structure of the gene has been clarified, and it suggests how the gene has evolved.
* The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
¶ Supported by grants from the Swedish National Board for Tech.
‖ Present address, European Molecular Biology Laboratory, Heidelberg, Federal Republic of Germany.
** Supported by grants from the Swedish Medical Research Council and Pharmacia Fine Chemicals, Uppsala.
Bacterial Strains and Plasmids. E. coli strains HB101 (12) and GM161 (13) were used as bacterial hosts. The plasmid vectors were pBR322 (14), pTR262 (15), and pEMBL9 (16).
DNA Preparations. Plasmid DNA was prepared by the alkaline extraction method (17). Transformation of E. coli was carried out as described by Morrison (18). Restriction endonucleases, T4 DNA ligase (New England Biolabs), alkaline phosphatase, and T4 polynucleotide kinase (Boehringer-Mannheim) were used according to the suppliers' recommendations.
The 2.15-kilobase DNA fragment containing the entire protein A gene was isolated by digesting the plasmid pSPA3 (10) with EcoRV. The digested material was electrophoresed on a 5% polyacrylamide gel, and the 2.15-kilobase fragment was eluted electrophoretically. The isolated fragment was passed over an anion exchange column, eluted, and precipitated with ethanol. The precipitated material was washed in 80% ethanol, dried, resuspended in water, and used for DNA sequence analyses.
DNA Sequence Determinations. DNA fragments were sequenced by the method of Maxam and Gilbert (19) or that of Sanger et al. (20). The samples were analyzed on 6, 8, and 20% denaturing polyacrylamide gels using the thermostatic LKB Macrophor system.
Computer Analysis. All the sequence analyses were performed on a Hewlett-Packard desktop computer (HP-85) equipped with an HP7225A plotter. The software was written by M. Uhlén.
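DNA Sequence. We have earlier reported that the protein A gene from S. aureus strain 8325-4 is located on a 1.8-kilobase insert of staphylococcal DNA cloned in the plasmid pBR322. The plasmid was designated pSPA8 and is shown schematically in Fig. 1. The sequence of the promoter region and the 5' end of the structural gene has been reported (10). The sequence of the repetitive region X, which probably is responsible for the cell wall binding of the protein, has also been reported. Using the strategy outlined in Fig. 2, the entire insert was sequenced by the method of Maxam and Gilbert (19). It was not possible to obtain sequence on both strands in all parts of the gene. Additional sequencing using the enzymatic method (20) was therefore performed to confirm the sequence in these parts. In addition, the sequence a few hundred nucleotides downstream from the EcoRV site on the original plasmid pSPA1 (10) was determined using both methods (19, 20). The complete nucleotide sequence of the protein A gene is shown in Fig. 3. Note that the previously published sequence (10) lacks one of the three thymidines at position ….

FIG. 1. Structure of plasmid pSPA8 with relevant restriction sites. Boxes show the positions of the replication origin (ORI) and of the genes coding for protein A (PROT A) and β-lactamase (AMP). The protein A gene is contained in a 1.8-kilobase TaqI-EcoRV insert in the plasmid pBR322.

FIG. 2. Restriction map and sequencing strategy of the 1.8-kilobase insert. A, schematic drawing of the gene coding for protein A with its different regions. S is a signal sequence, A-D are IgG-binding regions, E is a region homologous to A-D, and X is the COOH-terminal part of protein A, which lacks IgG-binding activity. B, partial restriction map of the corresponding DNA sequence and the sequencing strategy.

Starting from a TTG codon at nucleotide 184, there is an open reading frame of 1,527 nucleotides that terminates in a TAG stop codon at nucleotide 1,711. We have not shown that the codon at nucleotide 184 is the translational start, but there are three reasons to postulate this. First, this codon is preceded by a possible Shine-Dalgarno sequence (23) that has many features in common with other Gram-positive ribosomal binding sequences (24). Of its 11 nucleotides, 8 are complementary to the 3' end of B. subtilis 16 S rRNA. The spacing of seven nucleotides between the last G of this sequence and the start codon is also similar to that of other Gram-positive genes (24, 25). Second, TTG is a common start codon in Gram-positive bacteria (21), unlike in E. coli, where it is very rare (22). Third, this start codon gives a putative signal peptide of reasonable size (36 amino acids) and structure (a few basic residues followed by a stretch of 23 hydrophobic residues), similar to those of other Gram-positive genes (25). The preprotein thus consists of 509 amino acids, giving an Mr = 58,703. Cleavage of the signal peptide leaves a mature protein that starts with the alanine at nucleotide 292. Together with the stop codon at nucleotide 1,711, this gives a mature protein A of 473 amino acids and an Mr = 52,….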
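The coordinates given above fix the lengths of the preprotein, the signal peptide and the mature protein. The following minimal Python sketch redoes that arithmetic from the 1-based nucleotide positions of Fig. 3. The helper names `orf_summary` and `codons_before_stop` are hypothetical and are not the HP-85 software used in this work. On a toy sequence, the sketch also shows how an in-frame scan from a TTG start stops at the first stop codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def orf_summary(start_nt, stop_nt, mature_start_nt):
    """Derive codon counts from 1-based nucleotide coordinates.

    start_nt        first base of the start codon (TTG at 184 in Fig. 3)
    stop_nt         first base of the stop codon (TAG at 1,711)
    mature_start_nt first base of the first mature codon (Ala at 292)
    """
    coding_nt = stop_nt - start_nt          # bases before the stop codon
    signal_nt = mature_start_nt - start_nt  # bases coding for the signal peptide
    if coding_nt % 3 or signal_nt % 3:
        raise ValueError("coordinates are not in the same reading frame")
    preprotein_aa = coding_nt // 3
    signal_aa = signal_nt // 3
    return {
        "coding_nt": coding_nt,
        "preprotein_aa": preprotein_aa,
        "signal_aa": signal_aa,
        "mature_aa": preprotein_aa - signal_aa,
    }


def codons_before_stop(seq, start):
    """Count codons read from 0-based `start` until the first in-frame stop.

    Returns None if the frame runs off the end without meeting a stop codon.
    """
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return (i - start) // 3
    return None


print(orf_summary(start_nt=184, stop_nt=1711, mature_start_nt=292))
# {'coding_nt': 1527, 'preprotein_aa': 509, 'signal_aa': 36, 'mature_aa': 473}

# Toy sequence (not from protein A): TTG AAA GCT TAG, read from index 2.
print(codons_before_stop("CCTTGAAAGCTTAGCC", 2))  # 3
```

The toy sequence also contains an out-of-frame TGA (indices 3-5), which the scan skips because it only inspects codons in the chosen frame.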
FIG. 3. DNA sequence of the protein A gene. The DNA sequence was obtained from strain 8325-4 and the partial amino acid sequence (27) from strain Cowan I. Amino acids that differ in the partial protein sequence are indicated. The amino acid numbering starts with the alanine at nucleotide 292, which has been shown to be the first amino acid of the mature protein A. The putative promoter sequences and the possible Shine-Dalgarno sequence are indicated.

FIG. 4. Hypothetical secondary structures at the 5' and 3' regions flanking the protein A coding sequence.

Two upstream, overlapping promoter sequences similar to the prokaryotic consensus sequences TTGACA and TATAAT (26) are indicated in Fig. 3. The first -35 sequence, however, shows relatively poor homology with TTGACA (three out of six). We do not have any experimental data to show where the transcription of the protein A mRNA starts or terminates. However, the gene is both preceded and followed by palindromic sequences that could act as transcription terminators. The possible mRNA hairpin structures are drawn schematically in Fig. 4. Both palindromes are followed by a T-rich stretch of residues (TTTATTTT). It therefore appears likely that protein A is translated from a monocistronic mRNA.
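The promoter and ribosome-binding arguments rest on two simple counts. One is how many positions of a candidate hexamer agree with a consensus such as TTGACA. The other is how many bases of an mRNA stretch can pair with the 3' end of 16 S rRNA. The following minimal Python sketch makes both counts explicit. The helper names are hypothetical, and the example sequences are invented toy strings rather than the protein A or B. subtilis sequences. For simplicity the sketch counts only Watson-Crick pairs and ignores G·U wobble pairing, which is an assumption.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def consensus_matches(site, consensus):
    """Number of positions where an aligned site equals the consensus base."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must be the same length")
    return sum(a == b for a, b in zip(site.upper(), consensus.upper()))


def paired_positions(mrna_5to3, rrna_3to5):
    """Watson-Crick pairs between an mRNA stretch written 5'->3' and an
    rRNA stretch written 3'->5', aligned position by position."""
    if len(mrna_5to3) != len(rrna_3to5):
        raise ValueError("stretches must be the same length")
    pairs = zip(mrna_5to3.upper(), rrna_3to5.upper())
    return sum((a, b) in WATSON_CRICK for a, b in pairs)


# Hypothetical -35 hexamer: 3 of 6 positions match TTGACA.
print(consensus_matches("TTCAAT", "TTGACA"))  # 3
# Hypothetical Shine-Dalgarno-like stretch against a hypothetical rRNA 3' end.
print(paired_positions("AGGAGG", "UCCUCC"))   # 6
```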
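Amino Acid Sequence. Fig. 3 shows the amino acid sequence deduced from the DNA sequence. It also shows the amino acids that differ in the partial protein sequence established by Sjödahl (27). Among the IgG-binding regions D, A, B, and C, a high degree of homology exists, although about 10% of the amino acids are different. A direct comparison of structures from deduced and purified proteins is difficult, because the DNA sequence was obtained from strain 8325-4 and the protein sequence from strain Cowan I. The divergence is probably due to strain variation.

TABLE I
Amino acid composition of deduced protein A or purified protein A from different strains of S. aureus
Deduced protein A from strain 8325-4: Prot-A (a), Mat-A (b), A-E (c), A-X (d). Purified protein A: Cowan I (e), Cowan I (f), A676 (g).
(a) Protein A including the signal peptide.
(b) Mature protein A.
(c) Mature protein A except region E.
(d) Mature protein A except the COOH-terminal part.
(e) From Movitz.
(f) From Lindmark et al.
(g) From Lindmark et al.

Amino Acid Composition. Attempts to determine the sequence of protein A have involved either digestion of staphylococcal cell walls with lysostaphin (28) or analysis of protein A from mutant bacteria that secrete the product (8). To compare the sequences deduced from the DNA sequence with those obtained experimentally, the amino acid compositions of different parts of the protein are tabulated in Table I. The protein deduced from 8325-4 is larger than the protein from Cowan I, which was isolated by lysostaphin treatment of bacteria and determined independently twice. This holds even if region E is omitted (A-E, amino acids 57-473). For Cowan I, the exact NH2-terminal sequence could not be obtained because the terminus is blocked (27). At present, it is unclear whether the difference in size and amino acid composition reflects genomic differences or proteolysis in both the NH2-terminal and COOH-terminal parts of the protein. The protein A gene of Cowan I has recently been cloned in our laboratory, which will help to clarify this point.

Strain A676 is methicillin-resistant and produces an extracellular protein A. Its purified protein agrees well in amino acid composition with a mature protein A lacking 107 amino acids in the COOH-terminal part (A-X, amino acids 1-366) (Table I). The NH2-terminal sequence of this protein (8) fits well with the NH2 terminus of protein A from strain 8325-4. For strain 8325-4, this terminus was determined both by Edman degradation of the purified protein and from the DNA sequence. It therefore appears likely that the secreted form of protein A from strain A676 does contain region E. The comparison would then indicate that this protein is truncated at the COOH terminus, lacking approximately 80 amino acids. However, the DNA sequence does not contain the COOH-terminal -Val-Ala-Lys that has been reported for A676 (8).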
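Table I compares residue counts over different stretches of the protein, such as amino acids 57-473 or 1-366. The following minimal Python sketch shows how such a composition can be tabulated from a deduced sequence using 1-based, inclusive residue numbering. The helper names are hypothetical, and the example uses an invented toy peptide rather than protein A.

```python
from collections import Counter


def segment(protein, first, last):
    """Residues first..last (1-based, inclusive), e.g. amino acids 57-473."""
    if not 1 <= first <= last <= len(protein):
        raise ValueError("segment out of range")
    return protein[first - 1:last]


def composition(protein):
    """Residue counts keyed by one-letter amino acid code."""
    return Counter(protein.upper())


toy = "AKADNKFNKE"  # toy peptide, not protein A
print(segment(toy, 2, 4))        # KAD
print(dict(composition(toy)))    # {'A': 2, 'K': 3, 'D': 1, 'N': 2, 'F': 1, 'E': 1}
```

Converting the one-letter keys to the residue names used in Table I, and subtracting one composition from another, are straightforward extensions that are not shown here.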
TABLE II
Codon usage
The eight codon pairs that are most likely to be preferred (+) or not preferred (-) by highly expressed genes (33) are marked. The codons AUG (Met), UGG (Trp), and AUA (Ile) are omitted.
(a) Protein A including the signal peptide (preprotein).
(b) The sum of four Bacillus chromosomal genes: B. amyloliquefaciens α-amylase (25), B. subtilis α-amylase (29), B. subtilis SpoOF (30), and B. licheniformis penicillinase (31).
(c) Four putative proteins of pC194 (32).
(d) Per cent G/C in the third degenerate base.

Codon Usage. Table II compares the codon usage of the protein A preprotein with that of other Gram-positive genes. Chromosomal genes are represented by four Bacillus genes. Plasmid-coded genes are represented by the four putative proteins encoded by the staphylococcal plasmid vector pC194 (32). Among the chromosomal genes, the codon usage is randomly distributed, which matches the overall GC content of the Bacillus species involved (42-47%) (34). In contrast, the plasmid-coded genes have a marked preference for A/U bases, which matches the GC content of chromosomal DNA from S. aureus (30-33%) (34). The protein A gene also prefers A/U bases and thus adapts to the overall GC content of the host cell, with some exceptions, for example AAC (Asn), UUC (Phe), and AGC (Ser). The per cent G/C at the degenerate third base is given for each group of genes in Table II.

Two of these exceptions can be explained by the hypothesis of Grosjean and Fiers (33), which predicts that efficient in-phase translation is facilitated by proper choice of degenerate codewords. The codon pairs marked in Table II are those most dependent on maximal codon-anticodon interaction energy. According to the theory, they are therefore the ones most likely to be preferred, or avoided, by highly expressed genes. Among the four codon pairs in which selection for C is predicted, C is indeed chosen 64% of the time (67/105). In contrast, the four codon pairs with predicted selection for U show the reverse ratio, with only 21% C (18/85). The repetitive nature of the protein A gene makes statistical analysis risky. Nevertheless, the codon usage appears to follow mainly the Grosjean-Fiers rules for highly expressed genes.
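The codon-usage figures above are simple proportions: a per cent G/C at the third codon position, and ratios such as 67/105 and 18/85. The following minimal Python sketch computes both. The helper names are hypothetical. The choice of which codons to skip when scoring the third base is an assumption of the sketch: it skips the non-degenerate ATG and TGG and the stop codons, which may differ from the exact rule used for Table II. The input coding sequence is a toy string, not protein A.

```python
# Assumption: non-degenerate codons (ATG, TGG) and stop codons are not scored.
EXCLUDED = {"ATG", "TGG", "TAA", "TAG", "TGA"}


def third_position_gc(cds, excluded=EXCLUDED):
    """Per cent G/C at the third base of in-frame codons, skipping `excluded`."""
    usable = len(cds) - len(cds) % 3
    thirds = [cds[i + 2] for i in range(0, usable, 3)
              if cds[i:i + 3].upper() not in excluded]
    if not thirds:
        raise ValueError("no codons left to score")
    return 100.0 * sum(b.upper() in "GC" for b in thirds) / len(thirds)


def percent(part, whole):
    return 100.0 * part / whole


print(round(third_position_gc("ATGAAAGCCTTCTGGTAA"), 1))  # 66.7 (toy sequence)
print(round(percent(67, 105)))  # 64: C chosen where C is predicted
print(round(percent(18, 85)))   # 21: C chosen where U is predicted
```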
FIG. 5. Dot matrix comparisons of the protein A sequence. A, the entire nucleotide sequence and the immediate 5' and 3' flanking sequences compared with itself. Each dot represents the center of a three-base identity. B, the deduced amino acid sequence compared with itself. Every point represents an identical residue. Because each sequence is compared with itself, a line of identity runs from the upper left corner to the lower right corner. Direct repeats appear as parallel lines across the grid, and these lines disappear where no homology exists. The cleavage points for trypsin are marked with arrows.

Homology Plot Analysis. To search for homologous regions, a computer program was used to scan the DNA sequence and its deduced amino acid sequence (Fig. 5). The plots reveal two structurally distinct regions with internal homology, flanked by unique sequences without homology at the 5' and 3' ends of the structural gene. The homology lines in Fig. 5A are more broken than those in Fig. 5B. This means that many of the nucleotide changes between codons in the homologous regions have occurred in bases where they cause no amino acid change.
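The dot matrix of Fig. 5A can be illustrated with a short sketch. The following minimal Python version follows the rule stated in the legend: a dot is placed at the center of every identical three-base stretch shared by the two sequences. It is a generic reimplementation under that assumption, not the authors' HP-85 program, and `dot_matrix` and `render` are hypothetical names. With a toy tandem repeat, the self-comparison gives the main diagonal plus parallel diagonals offset by the repeat period.

```python
from typing import List


def dot_matrix(a: str, b: str, word: int = 3) -> List[List[bool]]:
    """Mark the centre of every identical `word`-long stretch shared by a and b."""
    dots = [[False] * len(b) for _ in range(len(a))]
    half = word // 2
    for i in range(len(a) - word + 1):
        for j in range(len(b) - word + 1):
            if a[i:i + word] == b[j:j + word]:
                dots[i + half][j + half] = True
    return dots


def render(dots: List[List[bool]]) -> str:
    """Draw the matrix as text: '*' for a dot, '.' for no identity."""
    return "\n".join("".join("*" if d else "." for d in row) for row in dots)


seq = "ACGTACGTAC"  # toy tandem repeat with a 4-base period, not protein A
print(render(dot_matrix(seq, seq)))
```

A larger word length gives a cleaner plot at the cost of missing short, interrupted similarities. The three-base window of Fig. 5A is what makes silent third-position changes break the lines.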
FIG. 6. Comparisons of the IgG-binding regions and flanking regions. The sequences of the repetitive regions have been aligned to achieve maximal homology. A nucleotide that differs from the B' region is marked with an asterisk, and a differing amino acid is underlined. The cleavage points for trypsin are marked with arrows.

TABLE III
Comparison of amino acids of the IgG-binding regions
The values listed represent the number of changed amino acids at identically positioned residues when the regions are compared in pairs.

TABLE IV
Comparison of codons of the IgG-binding regions
The values listed represent the number of changed triplets at identically positioned codons when the regions are compared in pairs.

Structure of IgG-binding Regions. The IgG-binding regions of protein A have been defined by trypsin cleavage of the mature protein into the functional IgG-binding units D, A, B, and C (7, 27). We recently showed (10) that strain 8325-4 also contains a fifth region, E, homologous to the four repetitive regions identified earlier by protein sequencing. Fig. 6 aligns the sequences of the regions to enable comparison. To achieve maximal homology, the boundaries of these regions have been moved 15 nucleotides towards the 3' end of the gene, giving regions E', D', A', B', and C'. This choice is arbitrary, because the 5' and 3' ends of the repetitive region have diverged slightly.

Fig. 6 also shows the sequences flanking the repetitive regions. As already noted in the homology analysis (Fig. 5, A and B), these flanking sequences seem to be nonhomologous to the IgG-binding regions. The part of the gene coding for the signal peptide (S) and the promoter region (5') seem to be totally unrelated to the IgG-binding regions (E', D', A', B', and C'). The same holds at the other end of the repetitive region. The part of the gene coding for the COOH-terminal part of region X and the 3' flanking sequence both seem to be unrelated to the repetitious region X and to the IgG-binding regions.

Table III summarizes the amino acid changes between the regions and Table IV the codon changes. Codons (Table IV) have changed much faster than amino acids (Table III), which indicates an evolutionary pressure to keep the amino acid sequence preserved. This strongly supports the previously suggested hypothesis (27) of such a pressure in these regions. Region B' has the lowest total number of codon changes (Table IV), so it was chosen as the reference for the comparison in Fig. 6.

Comparing the mutual relationships of the five regions reveals a pronounced "homology gradient" along the protein molecule: the closer two regions are located, the higher their degree of homology. One interpretation is that the structural gene coding for the IgG-binding part of protein A underwent stepwise gene duplications, each involving only one region and each followed by a period of point mutation. Such a series of events would produce a homology gradient, with slightly dissimilar nucleotide and amino acid sequences.
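Tables III and IV count, for each pair of aligned regions, the amino acids that differ and the codons that differ. The gap between the two counts is the number of silent codon changes. The following minimal Python sketch computes both counts with the standard genetic code. It also includes a per cent identity helper that could be used to tabulate the homology gradient. The helper names are hypothetical, and the sketch assumes gap-free alignments, so an insertion such as the nine nucleotides in region E' would have to be removed before comparison. The example regions are toy sequences.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code; codons enumerated in TCAG order (first, second, third base).
CODE = {
    x + y + z: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, x in enumerate(BASES)
    for j, y in enumerate(BASES)
    for k, z in enumerate(BASES)
}


def compare_regions(r1, r2):
    """Compare two aligned, gap-free coding regions codon by codon.

    Returns (changed_codons, changed_amino_acids); their difference is the
    number of codon changes that leave the amino acid unchanged (silent).
    """
    if len(r1) != len(r2) or len(r1) % 3:
        raise ValueError("regions must be aligned and a whole number of codons")
    changed_codons = changed_aa = 0
    for i in range(0, len(r1), 3):
        c1, c2 = r1[i:i + 3].upper(), r2[i:i + 3].upper()
        if c1 != c2:
            changed_codons += 1
            if CODE[c1] != CODE[c2]:
                changed_aa += 1
    return changed_codons, changed_aa


def percent_identity(s1, s2):
    """Per cent identical positions between two aligned, equal-length sequences."""
    if len(s1) != len(s2) or not s1:
        raise ValueError("sequences must be aligned and non-empty")
    return 100.0 * sum(a == b for a, b in zip(s1, s2)) / len(s1)


# Toy regions (not protein A): Lys-Glu-Gly-Asn versus Lys-Glu-Gly-Lys.
print(compare_regions("AAAGAAGGTAAC", "AAGGAAGGCAAA"))  # (3, 1)
print(percent_identity("ACGT", "ACGA"))                # 75.0
```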
FIG. 7. Comparison of the repetitive units of region X. The 24-nucleotide repeats are aligned and compared with each other, using region X1 as the reference. A changed nucleotide is marked with an asterisk, and a changed amino acid is underlined.

Structural studies of protein A have suggested that 11 amino acids of the IgG-binding regions are essential for binding to the Fc part of the immunoglobulins (35). Most of these amino acids are assumed to lie in two α-helical regions, which correspond to residues 183-192 and 198-211 in region B'. In region B', these amino acids are 184-186 (Gln-Gln-Asn), 188-189 (Phe-Tyr), 192 (Leu), 203 (Asn), 206-207 (Ile-Glu), and 210 (Lys). Fig. 6 shows that these residues are largely conserved between the regions. One exception is a serine instead of asparagine at position 70. This conservation suggests an evolutionary pressure to keep these residues intact, and the observed changes often lie outside the two helical areas. There are striking homologies in the two α-helices between the different regions. The pressure is even more pronounced for the residues in these α-helices that interact with IgG.

Apart from the mutual homology between the five regions, there also seem to be internal homologies within each region, visible as traces of lines in Fig. 5. For instance, region B' contains two stretches, coding for amino acids 179 (Lys) to 188 (Phe) and 196 (Asn) to 205 (Phe). These stretches share 24 identical nucleotides out of 30. Another subregion of interest is the nine-nucleotide insertion in region E'. It codes for three amino acid residues (59-61) that are not homologous to the other regions. This subregion (residues 57-62) may be related to other regions, such as amino acids 4-9 at the beginning of region E'. A nucleotide-by-nucleotide comparison shows that 14 of the 18 bases in these two subregions are identical.

Structural studies based on cleavage with trypsin (7) have suggested that region X starts at amino acid 292 (Glu), because the trypsin cleavage point that defines region X lies immediately before this residue. Region C' terminates at amino acid 296, so this differs by five amino acids from the boundary chosen in Fig. 6. The boundary between region C' and region X is therefore not clearly defined. The last 12 nucleotides of region C' code for a few amino acids identical with those of region X1. The last five amino acids of region C' (292-296) are changed compared with region B', yet more than half of the corresponding nucleotides (8/15) are homologous. The end of region C' is therefore probably related to the other IgG-binding regions.

Structure of Region X. The repetitive nature of region X appears as multiple lines in Fig. 5. As seen in Fig. 7, the repetitive part of region X consists of exactly 12 units, each 24 nucleotides long. Together they form a repetitive region of approximately 300 base pairs (Xr), followed by a constant region coding for 81 amino acids (Xc). The 3' end of the repetitive region is clearly located at amino acid 392 (Fig. 7), and the constant region follows directly. In terms of structure, the octapeptide of region X thus seems to be repeated 12 times.
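The region X numbers can be checked against each other, and the "changes in wobble positions" argument can be made concrete. The following minimal Python sketch does both. For the arithmetic it assumes that the repeats run from just after the end of region C' (residue 296) to residue 392, a boundary the text notes is not clearly defined. The helpers `split_units` and `variable_codon_positions` are hypothetical. The short units in the example are toy strings, not the region X sequence.

```python
def split_units(seq, unit_len=24):
    """Cut a tandem-repeat region into consecutive units of `unit_len` bases."""
    if len(seq) % unit_len:
        raise ValueError("region is not a whole number of units")
    return [seq[i:i + unit_len] for i in range(0, len(seq), unit_len)]


def variable_codon_positions(units):
    """Codon position (1, 2 or 3) of every column that differs among aligned units."""
    return [col % 3 + 1 for col in range(len(units[0]))
            if len({u[col] for u in units}) > 1]


# Bookkeeping from the text (assumed boundaries: C' ends at 296, repeats end at 392).
repeat_aa = 392 - 296
print(repeat_aa, repeat_aa // 8, repeat_aa * 3)  # 96 12 288
print(392 + 81)                                   # 473, the mature length

# Toy 6-base units (not from region X): both differences fall at codon position 3.
print(variable_codon_positions(split_units("AAAGGCAAGGGT", 6)))  # [3, 3]
```

The 288 base pairs obtained here are consistent with the approximately 300-base pair repetitive region (Xr), and 12 × 24 = 288 as well.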
A comparison of the 12 repeated units reveals striking homologies. The first six amino acids (Lys-Pro-Gly-Lys-Glu-Asp) are identical throughout the X region. Across the 12 X units, 12 nucleotides have changed in the codons for these six conserved amino acids. All of these changes occur in a wobble position and are therefore silent mutations. Hence, there has been a strong pressure to preserve the amino acid sequence of these residues. The last two amino acids change in a regular pattern (for example Gly-Asn or Asn-Lys). Amino acid 7 alternates periodically between Asn and Gly in regions X5 to X10 (see Fig. 7). The wobble base (A or G) of the codon for the first lysine also changes periodically in regions X7 to X12. Apart from the distinct 24-nucleotide repeat, there are also signs of a 48-nucleotide repeat. There also seems to be some evidence for a homology gradient throughout the region, although this gradient must be based on the 48-nucleotide repeat rather than on the primordial 24-nucleotide sequence. The evolution of the repetitive part of region X thus probably involved stepwise gene duplications of an ancestral sequence 24 or 48 nucleotides long. The biological function of this extremely conserved octapeptide is not known, but there has clearly been a strong pressure to preserve its amino acid sequence. How this evolved at the molecular level is unclear. Nucleotide sequences of the protein A gene from other strains, and of genes coding for proteins with similar repeated structures, may help to resolve the molecular events behind stepwise multiple DNA duplications.

Acknowledgments. We are grateful to Dr. John Sjöquist for critical comments and advice. We also thank Dr. Andras Gaal for introducing us to the thermostatic LKB Macrophor system and Dr. Stephen Fahnestock for a correction of the nucleotide sequence. We thank Hans-Olof Pettersson and Björn Jansson for skillful technical assistance and Christina Pellettieri and Gerd Benson for patient secretarial help.

REFERENCES
1. Jeffreys, A. J. (1981) in Genetic Engineering (Williamson, R., ed) Vol. 2, pp. 1-48, Academic Press, New York
3. Hirano, H., Yamada, Y., Sullivan, M., de Crombrugghe, B., Pastan, I., and Yamada, K. M. (1983) Proc. Natl. Acad. Sci. U. S. A. 80, 46-50
8. Lindmark, R., Movitz, J., and Sjöquist, J. (1977) Eur. J. Biochem. 74, 623-628
10. Löfdahl, S., Guss, B., Uhlén, M., Philipson, L., and Lindberg, M. (1983) Proc. Natl. Acad. Sci. U. S. A. 80, 697-701
11. Langone, J. J. (1982) Adv. Immunol. 32, 157-252
12. Boyer, H. W., and Roulland-Dussoix, D. (1969) J. Mol. Biol. 41, 459-472
13. Marinus, M. G. (1973) Mol. Gen. Genet. 127, 47-55
14. Bolivar, F., Rodriguez, R. L., Greene, P. J., Betlach, M. C., Heyneker, H. L., Boyer, H. W., Crosa, J. H., and Falkow, S. (1977) Gene (Amst.) 2, 95-113
15. Roberts, T. M., Swanberg, S. L., Poteete, A., Riedel, G., and Backman, K. (1980) Gene (Amst.) 12, 123-127
16. Dente, L., Cesareni, G., and Cortese, R. (1983) Nucleic Acids Res. 11, 1645-1655
17. Birnboim, H. C., and Doly, J. (1979) Nucleic Acids Res. 7, 1513-1523
18. Morrison, D. A. (1979) Methods Enzymol. 68, 326-331
19. Maxam, A. M., and Gilbert, W. (1977) Proc. Natl. Acad. Sci. U. S. A. 74, 560-564
20. Sanger, F., Nicklen, S., and Coulson, A. R. (1977) Proc. Natl. Acad. Sci. U. S. A. 74, 5463-5467
22. Kozak, M. (1983) Microbiol. Rev. 47, 1-45
23. Shine, J., and Dalgarno, L. (1975) Nature (Lond.) 254, 34-38
24. McLaughlin, J. R., Murray, C. L., and Rabinowitz, J. C. (1981) J. Biol. Chem. 256, 11283-11291
25. Takkinen, K., Pettersson, R. F., Kalkkinen, N., Palva, I., Söderlund, H., and Kääriäinen, L. (1983) J. Biol. Chem. 258, 1007-1013
27. Sjödahl, J. (1977) Eur. J. Biochem. 78, 471-490
31. Neugebauer, K., Sprengel, R., and Schaller, H. (1981) Nucleic Acids Res. 9, 2577-2588
32. Horinouchi, S., and Weisblum, B. (1982) J. Bacteriol. 150, 815-825
33. Grosjean, H., and Fiers, W. (1982) Gene (Amst.) 18, 199-209
34. Fasman, G. D. (ed) (1976) CRC Handbook of Biochemistry and Molecular Biology: Nucleic Acids Section, 3rd Ed., CRC Press, Boca Raton, FL
35. Deisenhofer, J. (1981) Biochemistry 20, 2361-2370