J. Biol. Chem.-1984-Uhlén-1695-702

THE JOURNAL OF BIOLOGICAL CHEMISTRY
Vol. 259, No. 3, Issue of February, pp. 1695-1702, 1984
© 1984 by The American Society of Biological Chemists, Inc.
Printed in U.S.A.
Complete Sequence of the Staphylococcal Gene Encoding Protein A
A GENE EVOLVED THROUGH MULTIPLE DUPLICATIONS*
(Received for publication, August 4, 1983)
Mathias Uhlén‡§¶, Bengt Guss§, Björn Nilsson‡¶, Sten Gatenbeck‡, Lennart Philipson§‖, and Martin Lindberg‡**
From the ‡Department of Biochemistry, Royal Institute of Technology, S-100 44 Stockholm, Sweden and the §Department of Microbiology, University of Uppsala, The Biomedical Center, Box 581, S-751 23 Uppsala, Sweden
The gene coding for protein A from Staphylococcus aureus has been isolated by molecular cloning, and a subclone containing a 1.8-kilobase insert was found to give a functional protein A in Escherichia coli. The complete nucleotide sequence of the insert, including the structural gene and the 5' and 3' flanking sequences, has been determined. Starting from a TTG initiator codon, an open reading frame comprising 1527 nucleotides gives a preprotein of 509 amino acids and a predicted Mr = 58,703. The structural gene is flanked on both sides by palindromic structures followed by a stretch of T residues, suggesting transcriptional termination signals. Thus, it appears that protein A is translated from a monocistronic mRNA. The sequence reveals extensive internal homologies involving a 58-amino acid unit, responsible for IgG binding, repeated 5 times and an 8-amino acid unit, possibly responsible for binding to the cell wall of S. aureus, repeated 12 times. Comparisons between the repeated regions show a marked preference for silent mutations, indicating an evolutionary pressure to keep the amino acid sequence preserved. The structure of the gene also suggests how the gene has evolved.

Evolution by gene duplication is a well known phenomenon among eukaryotic genes. The globin clusters, the immunoglobulins, and the interferon genes probably all have ancestral genes which have been duplicated and then diverged into functionally distinct genes (1). Examples of internally repetitive sequences have also been reported; rabbit skeletal tropomyosin contains a 7-residue amino acid periodicity throughout the molecule (2), and similar repeats have been reported for chicken fibronectin (3) and mammalian serum albumin (4). Among prokaryotes, most reports of duplicated genes have involved in vitro constructions (5), which seem to be stable in Escherichia coli, but dramatically unstable in Bacillus subtilis (6). However, the amino acid sequences of a few cell wall-bound proteins from Gram-positive bacteria have revealed remarkable periodicity, i.e. staphylococcal protein A (7, 8) and streptococcal M protein (9).
We have earlier reported on the molecular cloning of the gene for staphylococcal protein A in E. coli (10). This protein interacts with the Fc (constant part of immunoglobulins) domain of several immunoglobulins from many species including man and has therefore been used extensively for quantitative and qualitative immunological techniques (11). Amino acid sequence analysis of protein A revealed two functionally distinct regions of the molecule (7, 8). Both regions have remarkably repetitive structures.
The NH2-terminal part contains four or five homologous IgG-binding units consisting of approximately 58 amino acids each. The COOH-terminal part which is thought to bind to the cell wall of Staphylococcus aureus consists of several repeats of an octapeptide (Glu-Asp-Gly-Asn-Lys-Pro-Gly-Lys) (8).
In a previous report (10), we determined the nucleotide sequence of the promoter region, as well as the region coding for the NH2-terminal part of the protein. Here we report the complete nucleotide sequence of the protein A gene including the 5' and 3' flanking regions from the S. aureus strain 8325-4. The structural gene is 1,527 nucleotides long giving a preprotein consisting of 509 amino acids and a Mr = 58,703. The repetitive structure of the gene has been clarified which suggests how the gene has evolved.
* The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
¶ Supported by grants from the Swedish National Board for Technical Development.
‖ Present address, European Molecular Biology Laboratory, Heidelberg, Federal Republic of Germany.
** Supported by grants from the Swedish Medical Research Council and Pharmacia Fine Chemicals, Uppsala.
EXPERIMENTAL PROCEDURES
Bacterial Strains and Plasmids-E. coli strains HB101 (12) and GM161 (13) were used as bacterial hosts. The plasmid vectors were pBR322 (14), pTR262 (15), and pEMBL9 (16).
DNA Preparations-Plasmid DNA was prepared by the alkaline extraction method (17). Transformation of E. coli was made as described by Morrison (18). Restriction endonucleases, T4 DNA ligase (New England Biolabs), alkaline phosphatase, and T4 polynucleotide kinase (Boehringer-Mannheim) were used according to the suppliers' recommendations.
Isolation of the 2.15-kilobase DNA fragment containing the entire protein A gene was made by digesting the plasmid pSPA3 (10) with EcoRV. The digested material was electrophoresed on a 5% polyacrylamide gel, and the 2.15-kilobase fragment was eluted electrophoretically. The isolated fragment was passed over an anion exchange column, eluted, and precipitated with ethanol. The precipitated material was washed in 80% ethanol, dried, resuspended in water, and used for DNA sequence analyses.
DNA Sequencing Determinations-DNA fragments were sequenced by the method of Maxam and Gilbert (19) or Sanger et al. (20). The samples were analyzed on 6, 8, and 20% denaturing polyacrylamide gels using the thermostatic LKB Macrophor system.
Computer Analysis-All the sequencing analyses were performed on a Hewlett-Packard desktop computer (HP-85) equipped with a HP7225A plotter. The software was constructed by M. Uhlén.
RESULTS AND DISCUSSION
DNA Sequence-We have earlier reported that the protein A gene from S. aureus strain 8325-4 is located on a 1.8-kilobase insert of staphylococcal DNA cloned in the plasmid
pBR322 (21). The plasmid was designated pSPA8 and is shown schematically in Fig. 1. Expression of the gene was demonstrated in E. coli. The sequence of the promoter region and the 5' end of the structural gene has been reported (10) as well as the sequence of the repetitive region X which probably is responsible for the cell wall binding of the protein in S. aureus.¹
Using the strategy outlined in Fig. 2C, the entire insert was sequenced according to the method of Maxam and Gilbert (19). It was not possible to obtain sequence on both strands in all parts of the gene, and therefore additional sequencing using the enzymatic method (20, 16) was performed in order to confirm the sequence in these parts. As no palindromic sequence indicating transcription termination was found in the 3' end of the gene, the sequence a few hundred nucleotides downstream from the EcoRV site on the original plasmid pSPA1 (10) was determined using both methods (19, 20). The complete nucleotide sequence of the protein A gene is shown in Fig. 3. Note that the previously published sequence of Löfdahl et al. (10) lacks one of the three thymidines at position 183-185.

FIG. 1. Structure of plasmid pSPA8 with relevant restriction sites. The protein A gene is contained in a 1.8-kilobase TaqI-EcoRV insert in the plasmid pBR322. Boxes show the positions of the replication origin (ORI) and the genes coding for protein A (PROT A) and β-lactamase (AMP).

FIG. 2. Restriction map and sequencing strategy of the insert. A, schematic drawing of the gene coding for protein A with its different regions. S is a signal sequence, A-D are IgG-binding regions, E is a region homologous to A-D, and X is the COOH-terminal part of protein A which lacks IgG-binding activity. B, partial restriction map of the corresponding DNA sequence. C, sequencing strategy of the 1.8-kilobase insert.

Starting from a TTG codon at nucleotide 184, there is an open reading frame of 1,527 nucleotides terminating in a TAG stop codon at nucleotide 1,711. The preprotein, including the putative signal peptide, consists of 509 amino acids giving a Mr = 58,703. Although we have not shown that the codon at nucleotide 184 is the translational start, there are several reasons to postulate this. First, TTG is a common start codon in Gram-positive bacteria (21), unlike E. coli in which it is very rare (22). Second, this start codon gives a putative signal peptide with a reasonable size (36 amino acids) and structure (a few basic residues followed by a stretch of 23 hydrophobic residues). Third, this codon is preceded by a possible Shine-Dalgarno sequence (23) that has many features in common with other Gram-positive ribosomal binding sequences (24). 8 out of 11 nucleotides are complementary to the 3' end of B. subtilis 16 S rRNA, similar to other Gram-positive genes (25). In addition, the space between the last G in this sequence and the start codon is seven nucleotides, also similar to other Gram-positive genes (24, 25).
Two upstream overlapping promoter sequences similar to the consensus sequences (TTGACA and TATAAT) of prokaryotes (26) have been indicated in Fig. 3, although the first -35 sequence shows relatively poor complementarity (only three out of six) with TTGACA. The gene is both preceded and followed by palindromic sequences indicating transcription terminations. These are indicated in Fig. 3, and the possible mRNA hairpin structures that can be formed are schematically drawn in Fig. 4. Both palindromes are followed by a T-rich stretch of residues (TTTATTTT). Although we do not have any experimental data to show where the transcription of the protein A mRNA starts or terminates, it thus appears likely that protein A is translated from a monocistronic mRNA.
Amino Acid Sequence-The amino acid sequence deduced from the DNA sequence as well as amino acids that differ in the partial protein sequence established in Sjödahl (27) are also indicated in Fig. 3. Among the IgG-binding regions D, A, B, and C, a high degree of homology exists and only 4 out of the 235 amino acids comprising all four regions vary. All these changes can be explained by single point mutations. Since the DNA sequence was obtained from strain 8325-4 and the protein sequence from strain Cowan I, the divergence is probably due to strain variation. The partial amino acid sequence of region X also shows high similarity to the deduced sequence although about 10% of the amino acids are different.¹ The amino acid numbering starts with the alanine at nucleotide 292 which has been shown to be the first amino acid of the mature protein A.² The stop codon at nucleotide 1,711 thus gives a mature protein A of 473 amino acids and a resulting Mr = 52,752.
Amino Acid Composition-Attempts to determine the protein sequence of protein A have involved digestion of staphylococcal cell walls with lysostaphin (28) or analyzing protein A from mutant bacteria which secrete the product (8). In order to compare the sequences deduced from the DNA sequence with those obtained experimentally, the amino acid compositions of different parts of the protein, as deduced from the DNA sequence, are tabulated in Table I. The amino acid compositions of purified protein A from different strains of S. aureus are also presented in Table I. A direct comparison of structures from deduced and purified proteins is difficult, due to strain differences and proteolytic digestion during isolation of the protein. According to Sjödahl (27) and Lindmark et al. (8), there are only a few amino acids NH2-terminal

¹ Guss, B., Uhlén, M., Nilsson, B., Lindberg, M., Sjöquist, J., and Sjödahl, J. (1984) Eur. J. Biochem., in press.
² U. Hellman, unpublished results.
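
The coordinates quoted for the open reading frame can be cross-checked with a few lines of arithmetic. The following is a minimal sketch in Python, not part of the original analysis software; the function name `orf_summary` is hypothetical, and the only inputs are the three nucleotide positions stated in the text (TTG at 184, an open reading frame of 1,527 nucleotides, and the alanine codon at 292).

```python
# Coordinates quoted in the text (1-based nucleotide positions in Fig. 3).
ORF_START = 184        # first base of the TTG initiator codon
ORF_LENGTH = 1527      # nucleotides in the open reading frame
MATURE_START = 292     # first base of the codon for alanine 1 of mature protein A


def orf_summary(orf_start, orf_length, mature_start):
    """Derive codon counts and the stop-codon position from ORF coordinates.

    Hypothetical helper written for illustration only.
    """
    if orf_length % 3 or (mature_start - orf_start) % 3:
        raise ValueError("coordinates are not in the same reading frame")
    preprotein_aa = orf_length // 3
    signal_aa = (mature_start - orf_start) // 3
    return {
        # The stop codon begins immediately after the last sense codon.
        "stop_position": orf_start + orf_length,
        "preprotein_aa": preprotein_aa,
        "signal_peptide_aa": signal_aa,
        "mature_aa": preprotein_aa - signal_aa,
    }


summary = orf_summary(ORF_START, ORF_LENGTH, MATURE_START)
print(summary)
```

Under these assumptions the derived values agree with those given in the text: a stop codon at nucleotide 1,711, a preprotein of 509 amino acids, a 36-residue signal peptide, and a mature protein of 473 amino acids.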
FIG. 3. Nucleotide sequence of the protein A gene and the deduced amino acid sequence (figure content not recoverable from the extracted text).
TABLE I
Amino acid composition of deduced protein A gene or purified protein A from different strains of S. aureus
Columns: Amino acids; Deduced protein A from 8325-4 (Prot-Aᵃ, Mat-Aᵇ, A-Eᶜ, A-Xᵈ); Purified protein A (Cowan Iᵉ, Cowan Iᶠ, A676ᵍ)
69
Lysine
Histidine
7
Arginine
6
Aspartic acid
105
10
Threonine
Serine
22
25
78
Glutamic acid
31
Proline
33
Glycine
42
Alanine
15
Valine
Methionine
6
10 13 14 18
Isoleucine
Leucine 31 36 41
9
Tyrosine
14 1412
Phenylalanine
65
7
5
103
85
7
78
30
28
18
38
12
6
62
6
4
91
7
17
18
67
2727
26
3131
10
5
27
8
7
14 13
45
3
5
2
20
6870
2426
3136
4
3
28
29
5
12
52
4
5
82
5
65
27
30 22
34
5
2
911
27
5
12
53
4
4
83
6
16
48
3
4
82
4
16
64
30
8
3
12
7
3
Total: Prot-A 509, Mat-A 473, A-E 417, A-X 366, Cowan Iᵉ 381, Cowan Iᶠ 366, A676 395
ᵃ Protein A including the signal peptide.
ᵇ Mature protein A, amino acids 1-473 in Fig. 3.
ᶜ Mature protein A except region E, amino acids 57-473.
ᵈ Mature protein A except COOH-terminal part, amino acids 1-366.
ᵉ From Movitz (28), isolated by lysostaphin treatment of bacteria.
ᶠ From Lindmark et al. (8), isolated by lysostaphin treatment of bacteria.
ᵍ From Lindmark et al. (8), extracellular protein A produced by a methicillin-resistant strain.
of region D in protein A isolated from cell walls of Cowan I. However, the exact NH2-terminal sequence could not be obtained due to a blocked terminus (27). Table I shows that the size of the deduced protein from 8325-4 is larger than two independent determinations of the protein from Cowan I even if region E is omitted (A-E). At present, it is unclear if this difference in size and amino acid composition is due to proteolysis both in the NH2-terminal and COOH-terminal parts of the protein or if it reflects genomic differences. The protein A gene of Cowan I has recently been cloned in our laboratory, which will help to clarify this point.
In contrast, it appears likely that the secreted form of protein A from strain A676 does contain region E. The NH2-terminal sequence of this protein (8) fits well with the NH2-terminus of protein A from strain 8325-4 when determined both by Edman degradation of the purified protein² and by DNA sequence starting at nucleotide 292 in Fig. 3. The size of protein A from A676 would then indicate that the protein is truncated at the COOH-terminal lacking approximately 80 amino acids. The amino acid composition, as deduced from the DNA sequence, of a mature protein A lacking 107 amino acids in the COOH-terminal part shows good agreement with the composition of purified protein A from strain A676 as shown in Table I. However, the DNA sequence does not contain the COOH-terminal -Val-Ala-Lys which has been reported for A676 (8).

FIG. 4. Hypothetical secondary structures at the 5' and 3' regions flanking the protein A coding sequence. The numbers refer to nucleotides in Fig. 3.

Codon Usage-The codon usage for the preprotein of protein A is compared in Table II with other Gram-positive genes. Chromosomal genes are represented by four Bacillus genes and plasmid-coded genes by the four putative proteins encoded by the staphylococcal plasmid vector pC194 (32). Also indicated by + or - are the codon pairs which, according to Grosjean and Fiers (33), are most likely to be preferred or not preferred, respectively, by highly expressed genes. Their hypothesis predicts that efficient in-phase translation is facilitated by proper choice of degenerate codewords, and the codon pairs marked in Table II are most dependent on maximal codon-anticodon interaction energy.
Table II shows that among the chromosomal genes the codon usage is randomly distributed. The per cent G/C of the degenerate third base is 42%, similar to the overall GC content of the Bacillus species involved, which is 42-47% (34). In contrast, the plasmid-coded genes have a marked preference for A/U bases, only 22% G/C. Although the repetitive nature of the protein A gene makes statistical analysis risky, it seems to exhibit a clear preference for third position A/U bases with a few exceptions, UUC (Phe), AAC (Asn), and AGC (Ser). Two of these exceptions can be explained by the Grosjean and Fiers (33) hypothesis. Furthermore, among the four codon pairs in which, according to the theory, selection for C is preferred, this nucleotide is indeed chosen 64% of the time (67/105). In contrast, the four codon pairs with predicted selection for U show a reversed ratio, and only 21% C (18/85) can be found. The GC content at the third base of the codons is 32%, similar to the GC content of chromosomal DNA from S. aureus which is 30-33% (34). Therefore, the codon usage of the protein A gene shows a preference for A/U bases adapting to the overall GC content of the host cell with some exceptions, mainly following the Grosjean-Fiers (33) rules for highly expressed genes.
Homology Plot Analysis-In order to search for homologous regions, the DNA sequence and its deduced amino acid sequence were scanned by a computer program. Every point in
TABLE II
Phe UUU
uuc
Leu UUA
UUG
cuu
CUC
CUA
CUG
Ile AUU
AUC
AUA
Met AUG
Val GUU
GUC
GUA
GUG
Ser UCU
ucc
Pro
UCA
UCG
CCU
ccc
CCA
CCG
Thr ACU
ACC
ACA
ACG
Ala GCU
GCC
GCA
GCG
Prot-A
Chromb
Plasmid
2
12
20
5
7
1
6
2
8
9
1
6
5
2
6
2
5
0
3
2
21
45
20
34
22
31
7
3
31
38
30
12
29
21
21
21
30
20
21
31
22
16
11
11
25
13
16
48
45
29
36
40
38
39
11
35
13
10
4
5
4
27
5
18
12
12
1
14
4
16
1
7
4
10
5
3
1
14
4
15
5
8
2
5
1
4
0
25
1
11
5
, _
Prep
9
1
6
1
Tyr
UAU
UAC
Term UAA
UAG
His
CAU
CAC
Gin 33CAA
CAG
Asn
AAU
31AAC
Lys
AAA
AAG
Asp
GAU
35GAC
Glu 59GAA
GAG
UGU
Cys
UGC
Term UGA
Trp
UGG
Arg
CGU
CGC
CGA
CGG
Ser
AGU
17 AGC
Arg
AGA
AGG
GGU
Gly
36GGC
GGA
GGG
Sum
Per cent G/c
29
9
8
1
0
0
6
1
38
2
20
45
51
18
21
19
37
1
0
0
0
0
3
3
0
0
3
12
0
0
18
14
1
0
49
33
27
8
46
20
17
1
16
6
43
12
56
12
22
5
19
10
7
4
9
4
1
3
0
13
3
11
4
11
2
3
3
Sum: Prot-A 509, Chrom 1654, Plasmid 655
Per cent G/C: Prot-A 32, Chrom 42, Plasmid 22
35
68
79
26
81
35
2
2
-
35
18
5
10
9
19
11
14
22
ᵃ Protein A including the signal peptide (preprotein).
ᵇ The sum of four Bacillus chromosomal genes, B. amyloliquefaciens α-amylase (25), B. subtilis α-amylase (29), B. subtilis Spo0F (30), and B. licheniformis penicillinase (31).
ᶜ Four putative proteins of pC194 (32). As the start codons are yet to be identified, the total open reading frames are taken into account.
ᵈ The eight codon pairs which are most likely to be preferred (+) or not preferred (-) by highly expressed genes (33).
ᵉ Per cent G/C in the third degenerate base. The codons AUG (Met), UGG (Trp), and AUA (Ile) are omitted.
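
The third-base G/C percentage reported in Table II can be computed from a table of codon counts as sketched below. This is an illustrative Python sketch under stated assumptions, not the program used for the original analysis: the function name `third_base_gc_percent` and the toy counts are hypothetical, and only the three codons named in the footnote are excluded. The two ratios quoted in the text (67/105 and 18/85) are recomputed at the end.

```python
# Codons excluded from the calculation, as listed in the Table II footnote.
OMITTED_CODONS = {"AUG", "UGG", "AUA"}


def third_base_gc_percent(codon_counts):
    """Per cent of codons whose third base is G or C, weighted by usage."""
    total = 0
    gc_total = 0
    for codon, count in codon_counts.items():
        codon = codon.upper().replace("T", "U")   # accept DNA or RNA spelling
        if codon in OMITTED_CODONS:
            continue
        total += count
        if codon[2] in "GC":
            gc_total += count
    if total == 0:
        raise ValueError("no countable codons supplied")
    return 100.0 * gc_total / total


# Hypothetical counts for illustration only; these are not values from Table II.
toy_counts = {"UUU": 3, "UUC": 1, "AUG": 5, "GAA": 4, "GAG": 2}
toy_gc3 = third_base_gc_percent(toy_counts)
print(toy_gc3)

# Ratios quoted in the text for the Grosjean-Fiers codon pairs.
c_preferred_percent = round(100 * 67 / 105)
u_preferred_percent = round(100 * 18 / 85)
print(c_preferred_percent, u_preferred_percent)
```

In the toy example the AUG entry is skipped, so 3 of the 10 remaining codons end in G or C. The two rounded ratios reproduce the 64% and 21% given in the text.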
the homology plots represents an identical residue (1). The nucleotide triplets and the deduced amino acids are compared in Fig. 5, A and B, respectively. As the sequence is compared with itself, a line of identity occurs from the left upper corner to the right lower corner, and homologous repeats show up as parallel lines, which disappear when no homology exists. The plots reveal two structurally distinct regions with internal homology, flanked by unique sequences without homology in the 5' and the 3' ends of the structural gene. Thus, the part of the gene coding for the signal peptide (S) as well as the promoter region (5') seems to be totally unrelated to the IgG-binding regions (E, D, A, B, and C) located in the middle of the gene. The part of the gene coding for the COOH-terminal part of region X as well as the 3' flanking sequence seems to be unrelated to both the repetitious region X and the IgG-binding regions. Comparisons between the plots show that the homology lines in Fig. 5A are more broken than those in Fig. 5B, which means that many of the nucleotide changes between the codons in the homologous regions have occurred in bases giving no amino acid change. These results strongly support the previously suggested hypothesis (27) of an evolutionary pressure in these regions keeping the amino acid sequence preserved.
Structure of IgG-binding Regions-The IgG-binding regions of protein A have been defined by trypsin cleavage of the mature protein into functional IgG-binding units D, A, B, and C (7, 27). Recently, we showed (10) that strain 8325-4 also contains a fifth region E homologous to the four repetitive regions earlier identified by protein sequencing. In Fig. 6 the sequences of the regions are aligned to enable comparisons. In order to achieve maximal homology, the boundary of these regions has been moved 15 nucleotides towards the 3' end of the gene. This choice is of course arbitrary as the 5' end and the 3' end of the repetitive region have diverged slightly. However, although the last five amino acids of region C (292-296) are changed compared to region B, more than half of the nucleotides (8/15) are homologous, indicating a relationship. The same holds for the other end of the repetitive region located in the beginning of region E. Although the first three amino acids are different from region D, five out of nine nucleotides are identical. The cleavage points for trypsin are marked with arrows. There exists a nine-nucleotide insertion in region E giving three amino acid residues (59-61) not homologous to the other regions. Also shown in Fig. 6 are the sequences flanking the repetitive regions. As already pointed out in the homology analysis (Fig. 5, A and B) these regions seem to be nonhomologous to the IgG-binding regions.
A changed nucleotide compared to region B in Fig. 6 is
FIG. 5. Dot matrix comparisons of the protein A sequence. A, the entire nucleotide sequence and the immediate 5' and 3' flanking sequences are compared with itself. Each dot represents the center of a three-base identity, and direct repeats appear as parallel lines across the grid. B, the deduced amino acid sequence compared with itself.
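
The self-comparison behind a dot matrix of this kind can be sketched in a few lines of Python. This is a hypothetical re-implementation for illustration, not the HP-85 program described under "Computer Analysis": the function names are assumptions, the toy sequence is invented rather than taken from Fig. 3, and each dot is indexed by the start of the matching three-residue window rather than its center.

```python
from collections import Counter, defaultdict


def dot_matrix(sequence, window=3):
    """Return the set of (i, j) window starts whose `window` residues are identical."""
    sequence = sequence.upper()
    positions = defaultdict(list)
    for start in range(len(sequence) - window + 1):
        positions[sequence[start:start + window]].append(start)
    dots = set()
    for starts in positions.values():
        # Every pair of occurrences of the same word gives one dot.
        for i in starts:
            for j in starts:
                dots.add((i, j))
    return dots


def diagonal_counts(dots):
    """Count dots on each off-diagonal, keyed by the offset j - i (j > i)."""
    return Counter(j - i for i, j in dots if j > i)


# Hypothetical 12-base unit repeated three times; not a sequence from the paper.
toy_unit = "GAAGACGGCAAC"
toy_sequence = toy_unit * 3
dots = dot_matrix(toy_sequence)
offsets = diagonal_counts(dots)
print(offsets.most_common(3))
```

A direct repeat of period p fills the diagonal at offset p, so the most heavily populated offsets correspond to the parallel lines seen in the plot. The same function accepts an amino acid string, which gives the kind of comparison shown in panel B.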
FIG. 6. Comparisons of the IgG-binding regions and flanking regions. The sequences of the repetitive regions have been aligned to achieve maximal homology. The comparison is based on region B', and a nucleotide is marked with an asterisk and an amino acid is underlined when different from the B' region. The cleavage points for trypsin are marked with arrows.
marked with an asterisk, and a changed amino acid is underlined. Table III summarizes the amino acid changes and Table IV the codon changes between the regions. A comparison of the five regions with respect to mutual relationship reveals a pronounced "homology gradient" along the protein molecule, i.e. the closer the location of two regions, the higher the degree of homology. As already pointed out by Sjödahl (27), one interpretation of this phenomenon is that the primordial structural gene coding for the IgG-binding part of protein A has been subjected to stepwise gene duplications involving only one region followed by a period in which point mutations have occurred, thus generating slightly dissimilar nucleotide and amino acid sequences. As a result of these evolutionary events, a homology gradient will evolve. The fact that codons (Table IV) have changed much faster than amino acids (Table III) indicates that an evolutionary pressure exists to keep the

TABLE III
Comparison of amino acids of the IgG-binding regions
The values listed represent the number of changed amino acids of identically positioned residues when the regions are compared in pairs.
Region: E, D, A, B, C
0
11
14
11
0
12
14
21
11
17
12
7
0
5
15
Total
57
46
11
5
0
10
21
17
15
10
0
40
41
64
TABLE IV
Comparison of codons of the IgG-binding regions
The values listed represent the number of changed nucleotide triplets of identically positioned codons when the regions are compared in pairs.
Region
E
D
0
31
A
B
C
25
26
36
31
0
21
25
28
2536
2128
0
14
30
26
25
14
0
20
Total
30
20
0
118
105
101
86
115
amino acid sequence preserved. Since the number of total changes of codons is lowest for region B (Table IV), this region was chosen for the comparison in Fig. 6.
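
The pairwise counts of the kind summarized in Tables III and IV (changed amino acids versus changed codons) can be obtained from two aligned, in-frame sequences as sketched below. This is a minimal Python sketch assuming the standard genetic code; the function name `compare_regions` and the two short sequences are hypothetical and are not taken from Fig. 6.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def compare_regions(seq_a, seq_b):
    """Count changed codons, changed amino acids, and silent codon changes."""
    seq_a = seq_a.upper().replace("U", "T")
    seq_b = seq_b.upper().replace("U", "T")
    if len(seq_a) != len(seq_b) or len(seq_a) % 3:
        raise ValueError("sequences must be aligned, in frame, and of equal length")
    codon_changes = amino_acid_changes = 0
    for start in range(0, len(seq_a), 3):
        codon_a = seq_a[start:start + 3]
        codon_b = seq_b[start:start + 3]
        if codon_a != codon_b:
            codon_changes += 1
            if CODON_TABLE[codon_a] != CODON_TABLE[codon_b]:
                amino_acid_changes += 1
    return {
        "codon_changes": codon_changes,
        "amino_acid_changes": amino_acid_changes,
        "silent_changes": codon_changes - amino_acid_changes,
    }


# Hypothetical aligned fragments (Lys-Pro-Gly-Asn versus Lys-Pro-Gly-Gly).
result = compare_regions("AAACCTGGCAAC", "AAGCCTGGCGGC")
print(result)
```

In this toy pair the AAA to AAG change is silent and the AAC to GGC change alters the amino acid. A large excess of codon changes over amino acid changes between two regions is the pattern the text interprets as pressure to preserve the protein sequence.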
Structural studies of protein A have suggested that 11 amino acids of the IgG-binding regions are essential for binding to the Fc part of the immunoglobulins (35). Most of these amino acids are assumed to be located in two α-helical regions (35). In region B, the corresponding residues are 183-192 and 198-211. As seen in Fig. 6, there are striking homologies in these two α-helices between the different regions, suggesting an evolutionary pressure to keep these residues intact. The changes observed are often outside the two helical areas, for instance, the changed His-Leu, at position 193-194 of region B, to Asn-Met, in regions E, D, and A. This pressure is even more pronounced when comparing the residues in these α-helices that interact with IgG. In region B, these amino acids are 184-186 (Gln-Gln-Asn), 188-189 (Phe-Tyr), 192 (Leu), 203 (Asn), 206-207 (Ile-Glu), and 210 (Lys). As seen in Fig. 6, there is a serine instead of asparagine at position 70, but all the other 49 residues are identical. Clearly, there is a strong pressure to keep these amino acids preserved.
Apart from the mutual homology between the five regions, there also seem to exist internal homologies in each region as revealed by traces of lines in Fig. 5, A and B. Hence, the nucleotide sequence coding for amino acids 179 (Lys) to 188 (Phe) and 196 (AAC) to 205 (Phe) all within region B contains 24 identical out of 30 nucleotides. Another subregion of interest is the nine-nucleotide insert, giving the amino acids 59-61, which has been observed in protein A both from S. aureus Cowan I and 8325-4. This subregion (residues 57-62) is possibly related to other regions like amino acids 4-9 in the beginning of region E. A comparison nucleotide by nucleotide reveals that 14 out of 18 bases are identical between these two regions.
Structure of Region X-The repetitive nature of region X is indicated as multiple lines in Fig. 5, A and B, giving an approximately 300-base pair repetitive region (Xr) followed by a constant region coding for 81 amino acids (Xc). In Fig. 7, the 24-nucleotide repeats are aligned and a mutual comparison was performed. Again, a changed nucleotide is marked with an asterisk, and a changed amino acid is underlined. The 3' end of the repetitive region is obviously located at amino acid 392 (see Fig. 7) which is directly followed by the constant
FIG. 7. Comparison of the repetitive units of region X and flanking regions. The sequences of the repetitive region have been aligned to achieve maximal homology. The comparison is based on region X1, and an altered nucleotide is marked with an asterisk and an altered amino acid is underlined. The cleavage point for trypsin which defines region X (7, 20) is immediately before amino acid 292 (Glu). The numbers refer to the amino acids in Fig. 3.
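
The residue numbering of the repeat units in region X follows from the boundaries given in the text: region C ends at amino acid 296, each unit is an octapeptide (24 nucleotides), there are 12 units, and the mature protein has 473 amino acids. The Python sketch below simply tabulates these positions; the function name `repeat_unit_starts` is hypothetical.

```python
UNIT_LENGTH_AA = 8           # octapeptide, i.e. 24 nucleotides per unit
NUMBER_OF_UNITS = 12
REGION_C_LAST_RESIDUE = 296  # last amino acid of region C
MATURE_PROTEIN_LENGTH = 473  # amino acids in mature protein A


def repeat_unit_starts(first_residue, unit_length, number_of_units):
    """First residue number of each tandem repeat unit."""
    return [first_residue + k * unit_length for k in range(number_of_units)]


unit_starts = repeat_unit_starts(REGION_C_LAST_RESIDUE + 1, UNIT_LENGTH_AA, NUMBER_OF_UNITS)
repeat_end = unit_starts[-1] + UNIT_LENGTH_AA - 1
repeat_length_nt = 3 * UNIT_LENGTH_AA * NUMBER_OF_UNITS
constant_region_aa = MATURE_PROTEIN_LENGTH - repeat_end

for number, start in enumerate(unit_starts, start=1):
    print(f"X{number}: residues {start}-{start + UNIT_LENGTH_AA - 1}")
print(repeat_end, repeat_length_nt, constant_region_aa)
```

With these inputs the repetitive part ends at amino acid 392, spans 288 nucleotides (the "approximately 300-base pair" region Xr), and leaves 81 amino acids for the constant region Xc, in agreement with the values stated in the text.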
region. Since region C terminates at amino acid 296, the repetitive part of region X consists of exactly 12 units each with a length of 24 nucleotides. The boundary between region C and region X is, however, not clearly defined since the 12 last nucleotides, coding for the last four amino acids of region C, are identical with the corresponding amino acids of region X1 (Fig. 7).
Structural studies based on the cleavage with trypsin (7, 20) have suggested that region X starts at amino acid 292 which differs five amino acids from the boundary chosen in Fig. 7. As discussed above, the end of region C is probably related to the other IgG-binding regions, but this region has obviously diverged in the COOH-terminal end, generating a few amino acids identical with region X1. Therefore, structurally the octapeptide of region X seems to be repeated 12.5 times.
A comparison of the 12 repeated units reveals striking homologies. The six first amino acids (Lys-Pro-Gly-Lys-Glu-Asp) are identical throughout the Xr region. The two last amino acids are changed in a regular pattern between Asn-Asn, Gly-Asn, or Asn-Lys. Although the biological function of this extremely conserved octapeptide is not known, clearly there has been a strong pressure to preserve its amino acid sequence. Hence, 12 nucleotides have changed when comparing the six conserved amino acids in the 12 Xr compartments, all occurring in a wobble position and therefore representing silent mutations.
Apart from the distinct 24-nucleotide repeat, there are also signs of a 48-nucleotide repeat. Thus, the wobble base A/G in the codon coding for the first lysine is changed periodically in regions X7 to X12, and amino acid 7 is changed periodically between Asn and Gly in regions 5 to 10 (see Fig. 6).
There also seems to be some evidence for a homology gradient throughout the Xr region, although the gradient must be based on a 48-nucleotide repeat rather than the primordial 24-nucleotide sequence.
In conclusion, the evolution of the repetitive part of region X probably involved stepwise gene duplications of an ancestral 24- or 48-nucleotide long sequence. How this evolved at the molecular level is unclear, but the nucleotide sequence of the protein A gene from other strains, as well as genes coding for proteins with similar repeated structures, may help in resolving the molecular events causing stepwise multiple DNA duplications.

Acknowledgments-We are grateful to Dr. John Sjöquist for critical comments and advice. We thank Hans-Olof Pettersson and Björn Jansson for skillful technical assistance and Christina Pellettieri and Gerd Benson for patient secretarial help. We also thank Dr. Andras Gaal for introducing us to the thermostatic LKB Macrophor system and Dr. Stephen Fahnestock for a correction of the nucleotide sequence.

REFERENCES
1. Jeffreys, A. J. (1981) in Genetic Engineering (Williamson, R., ed) Vol. 2, pp. 1-48, Academic Press, New York
2. Fishetti, V. A., and Manjula, B. N. (1982) Semin. Infect. Dis. 4, 411-418
3. Hirano, H., Yamada, Y., Sullivan, M., de Crombrugghe, B., Pastan, I., and Yamada, K. M. (1983) Proc. Natl. Acad. Sci. U. S. A. 80, 46-50
4. Ohno, S. (1981) Proc. Natl. Acad. Sci. U. S. A. 78, 7657-7661
5. Hartley, I. L., and Gregori, T. J. (1981) Gene (Amst.) 13, 347-353
6. Tanaka, T. (1979) J. Bacteriol. 139, 775-782
7. Sjödahl, J. (1977) Eur. J. Biochem. 73, 343-351
8. Lindmark, R., Movitz, J., and Sjöquist, J. (1977) Eur. J. Biochem. 74, 623-628
9. Beachey, E. H., Seyer, I. M., and Kang, A. H. (1982) Semin. Infect. Dis. 4, 401-410
10. Löfdahl, S., Guss, B., Uhlén, M., Philipson, L., and Lindberg, M. (1983) Proc. Natl. Acad. Sci. U. S. A. 80, 697-701
11. Langone, J. J. (1982) Adv. Immunol. 32, 157-252
12. Boyer, H. W., and Roulland-Dussoix, D. (1969) J. Mol. Biol. 41, 459-472
13. Marinus, M. G. (1973) Mol. Gen. Genet. 127, 47-55
14. Bolivar, F., Rodriguez, R. L., Greene, P. J., Betlach, M. C., Heyneker, H. L., Boyer, H. W., Crosa, J. H., and Falkow, S. (1977) Gene (Amst.) 2, 95-113
15. Roberts, T. M., Swanberg, S. L., Poteete, A., Riedel, G., and Bachman, K. (1980) Gene (Amst.) 12, 123-127
16. Dente, L., Cesareni, Y., and Cortese, R. (1983) Nucleic Acids Res. 11, 1645-1655
17. Birnboim, H. C., and Doly, J. (1979) Nucleic Acids Res. 7, 1513-1523
18. Morrison, D. A. (1979) Methods Enzymol. 68, 326-331
19. Maxam, A. M., and Gilbert, W. (1977) Proc. Natl. Acad. Sci. U. S. A. 74, 560-564
20. Sanger, F., Nicklen, S., and Coulson, A. R. (1977) Proc. Natl. Acad. Sci. U. S. A. 74, 5463-5467
21. Uhlén, M., Nilsson, B., Guss, B., Lindberg, M., Gatenbeck, S., and Philipson, L. (1983) Gene (Amst.) 23, 369-378
22. Kozak, M. (1983) Microbiol. Rev. 47, 1-45
23. Shine, J., and Dalgarno, L. (1975) Nature (Lond.) 254, 34-38
24. McLaughlin, J. R., Murray, C. L., and Rabinowitz, C. (1981) J. Biol. Chem. 256, 11283-11291
25. Takkinen, K., Pettersson, R. F., Kalkkinen, N., Palva, I., Söderlund, H., and Kääriäinen, L. (1983) J. Biol. Chem. 258, 1007-1013
26. Johnson, W. C., Moran, C. P., and Losick, R. (1983) Nature (Lond.) 302, 800-804
27. Sjödahl, J. (1977) Eur. J. Biochem. 78, 471-490
28. Movitz, J. (1976) Eur. J. Biochem. 68, 291-299
29. Yang, M., Galizzi, A., and Henner, D. (1983) Nucleic Acids Res. 11, 237-249
30. Shimotsu, H., Kawamura, F., Kobayashi, Y., and Saito, H. (1983) Proc. Natl. Acad. Sci. U. S. A. 80, 658-662
31. Neugebauer, K., Sprengel, R., and Schaller, H. (1981) Nucleic Acids Res. 9, 2577-2588
32. Horinouchi, S., and Weisblum, B. (1982) J. Bacteriol. 150, 815-825
33. Grosjean, H., and Fiers, W. (1982) Gene (Amst.) 18, 199-209
34. Fasman, G. D. (ed) (1976) CRC Handbook of Biochemistry and Molecular Biology: Nucleic Acids Section 3rd Ed., Vol. II, pp. 69-183, CRC Press, Inc., Boca Raton, FL
35. Deisenhofer, J. (1981) Biochemistry 20, 2361-2370
