3, Issue of February
THE JOURNAL OF BIOLOGICAL CHEMISTRY
© 1984 by The American Society of Biological Chemists, Inc.
pp. 1695-1702, 1984
Printed in U.S.A.
The gene coding for protein A from Staphylococcus aureus has been isolated by molecular cloning, and a subclone containing a 1.8-kilobase insert was found to give a functional protein A in Escherichia coli. The complete nucleotide sequence of the insert, including the structural gene and the 5' and 3' flanking sequences, has been determined. Starting from a TTG initiator codon, an open reading frame comprising 1527 nucleotides gives a preprotein of 509 amino acids and a predicted Mr = 58,703. The structural gene is flanked on both sides by palindromic structures followed by a stretch of T residues, suggesting transcriptional termination signals. Thus, it appears that protein A is translated from a monocistronic mRNA. The sequence reveals extensive internal homologies involving a 58-amino acid unit, responsible for IgG binding, repeated 5 times and an 8-amino acid unit, possibly responsible for binding to the cell wall of S. aureus, repeated 12 times. Comparisons between the repeated regions show a marked preference for silent mutations, indicating an evolutionary pressure to keep the amino acid sequence preserved. The structure of the gene also suggests how the gene has evolved.

Evolution by gene duplication is a well known phenomenon among eukaryotic genes. The globin clusters, the immunoglobulins, and the interferon genes probably all have ancestral genes which have been duplicated and then diverged into functionally distinct genes (1). Examples of internally repetitive sequences have also been reported; rabbit skeletal tropomyosin contains a 7-residue amino acid periodicity throughout the molecule (2), and similar repeats have been reported for chicken fibronectin (3) and mammalian serum albumin (4). Among prokaryotes, most reports of duplicated genes have involved in vitro constructions (5), which seem to be stable in Escherichia coli, but dramatically unstable in Bacillus subtilis (6). However, the amino acid sequences of a few cell wall-bound proteins from Gram-positive bacteria have revealed remarkable periodicity, i.e. staphylococcal protein A (7, 8) and streptococcal M protein (9).

We have earlier reported on the molecular cloning of the gene for staphylococcal protein A in E. coli (10). This protein interacts with the Fc (constant part of immunoglobulins) domain of several immunoglobulins from many species including man and has therefore been used extensively for quantitative and qualitative immunological techniques (11).

Amino acid sequence analysis of protein A revealed two functionally distinct regions of the molecule (7, 8). Both regions have remarkably repetitive structures. The NH2-terminal part contains four or five homologous IgG-binding units consisting of approximately 58 amino acids each. The COOH-terminal part, which is thought to bind to the cell wall of Staphylococcus aureus, consists of several repeats of an octapeptide (Glu-Asp-Gly-Asn-Lys-Pro-Gly-Lys) (8).

In a previous report (10), we determined the nucleotide sequence of the promoter region, as well as the region coding for the NH2-terminal part of the protein. Here we report the complete nucleotide sequence of the protein A gene including the 5' and 3' flanking regions from the S. aureus strain 8325-4. The structural gene is 1,527 nucleotides long, giving a preprotein consisting of 509 amino acids and a Mr = 58,703. The repetitive structure of the gene has been clarified, which suggests how the gene has evolved.
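The reading-frame arithmetic quoted above can be checked directly: 1,527 nucleotides correspond to exactly 509 codons, so the stated length does not include the stop codon. The following is a minimal sketch for illustration only; the function `orf_codon_count` and the toy sequence are hypothetical and are not taken from the protein A gene or from the original analysis.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def orf_codon_count(seq, start):
    """Count codons from `start` up to, but not including, the first in-frame stop."""
    count = 0
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return count
        count += 1
    return count  # no in-frame stop codon was found

# Lengths quoted in the text: 1,527 nucleotides and 509 amino acids.
codons_in_gene = 1527 // 3
print(codons_in_gene, 1527 % 3)

# Toy sequence (made up): a TTG initiator, three further codons, then a TAA stop.
toy = "AATTGAAACAACGTTAAGG"
start = toy.find("TTG")
toy_count = orf_codon_count(toy, start)
print(toy_count)
```

The toy count includes the TTG initiator itself, in the same way that the 509 residues of the preprotein include the initiating residue.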
EXPERIMENTAL PROCEDURES
Guss, B., Uhlén, M., Nilsson, B., Lindberg, M., Sjöquist, J., and Sjödahl, J. (1984) Eur. J. Biochem., in press.
TABLE I
Column headings recovered: Amino acids; Purified protein A.
Rows: Lysine, Histidine, Arginine, Aspartic acid, Threonine, Serine, Glutamic acid, Proline, Glycine, Alanine, Valine, Methionine, Isoleucine, Leucine, Tyrosine, Phenylalanine, Total.
Values in extraction order (column alignment lost):
69 7 6 105 10 22 25 78 31 33 42 15 6 10 13 14 18 31 36 41 9 14 1412
65 7 5 103 85 7 78 30 28 18 38 12 6 62 6 4 91 7 17 18 67 2727 26 3131 10 5 27 8 7 14 13
45 3 5 2 20 6870 2426 3136 4 3 28 29 5 12 52 4 5 82 5 65 27 30 22 34 5 2 911 27 5 12
53 4 4 83 6 16 48 3 4 82 4 16 64 30 8 3 12 7 3
Totals: 473 417 366 381 366 395 509
a Protein A including the signal peptide.
b Mature protein A, amino acids 1-473 in Fig. 3.
c Mature protein A except region E, amino acids 57-473.
d Mature protein A except COOH-terminal part, amino acids 1-366.
e From Movitz (2), isolated by lysostaphin treatment of bacteria.
f From Lindmark et al. (8), isolated by lysostaphin treatment of bacteria.
g From Lindmark et al. (8), extracellular protein A produced by a methicillin-resistant strain.
Codon usage table. Column headings recovered: Prot-A; Chrom (footnote b); Plasmid.
Codons listed:
Phe UUU UUC; Leu UUA UUG CUU CUC CUA CUG; Ile AUU AUC AUA; Met AUG; Val GUU GUC GUA GUG; Ser UCU UCC UCA UCG; Pro CCU CCC CCA CCG; Thr ACU ACC ACA ACG; Ala GCU GCC GCA GCG;
Tyr UAU UAC; Term UAA UAG; His CAU CAC; Gln CAA CAG; Asn AAU AAC; Lys AAA AAG; Asp GAU GAC; Glu GAA GAG; Cys UGU UGC; Term UGA; Trp UGG; Arg CGU CGC CGA CGG; Ser AGU AGC; Arg AGA AGG; Gly GGU GGC GGA GGG; Sum; Per cent G/C.
Values in extraction order (column alignment lost):
2 12 20 5 7 1 6 2 8 9 1 6 5 2 6 2 5 0 3 2 21 45 20 34 22 31 7 3 31 38
30 12 29 21 21 21 30 20 21 31 22 16 11 11 25 13 16 48 45 29 36 40 38 39 11 35 13 10 4 5
4 27 5 18 12 12 1 14 4 16 1 7 4 10 5 3 1 14 4 15 5 8 2 5 1 4 0 25 1 11 5
9 1 6 1
Values printed beside codons in the extracted text: CAA 33; AAC 31; GAC 35; GAA 59; AGC 17; GGC 36.
29 9 8 1 0 0 6 1 38 2 20 45 51 18 21 19 37 1 0 0 0 0 3 3 0 0 3 12 0 0
18 14 1 0 49 33 27 8 46 20 17 1 16 6 43 12 56 12 22 5 19 10 7 4 9 4 1 3 0 13
3 11 4 11 2 3 3
509 32 1654 42 655
22 35 68 79 26 81 35 2 2 - 35 18 5 10 9 19 11 14 22
FIG. 5. Dot matrix comparisons of the protein A sequence. A, the entire nucleotide sequence and the immediate 5' and 3' flanking sequences are compared with itself. Each dot represents the center of a three-base identity, and direct repeats appear as parallel lines across the grid. B, the deduced amino acid sequence compared with itself.
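The self-comparison described in this legend can be sketched in a few lines. The code below is an illustrative reconstruction under stated assumptions, not the program used by the authors: `dot_matrix` and `render` are hypothetical names, the toy sequence is made up, and a dot is placed at the center of every pair of identical three-base words.

```python
def dot_matrix(seq, window=3):
    """Return the set of (i, j) centers of identical `window`-base words in `seq`."""
    half = window // 2
    positions = {}
    for i in range(len(seq) - window + 1):
        # Record the center index of each word, grouped by word content.
        positions.setdefault(seq[i:i + window], []).append(i + half)
    dots = set()
    for centers in positions.values():
        for a in centers:
            for b in centers:
                dots.add((a, b))
    return dots

def render(seq, dots):
    """Draw the matrix as text, with '*' for a dot and '.' for no identity."""
    n = len(seq)
    return "\n".join(
        "".join("*" if (i, j) in dots else "." for j in range(n))
        for i in range(n)
    )

toy = "ACGTTGACGTTG"  # made-up sequence: a 6-base unit repeated twice
dots = dot_matrix(toy)
print(render(toy, dots))
```

In the printed grid the main diagonal is the trivial self-identity, and the direct repeat shows up as a second line parallel to it at an offset equal to the repeat length (6 in this toy case).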
FIG. 6. Comparisons of the IgG-binding regions and flanking regions. The sequences of the repetitive regions have been aligned to achieve maximal homology. The comparison is based on region B', and a nucleotide is marked with an asterisk and an amino acid is underlined when different from the B' region. The cleavage points for trypsin are marked with arrows.
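The marking convention in this legend (an asterisk wherever an aligned sequence differs from the reference region) can be illustrated as follows. This is a minimal sketch; `mark_differences` is a hypothetical helper and both sequences are made up rather than taken from Fig. 6.

```python
def mark_differences(reference, other):
    """Return a line with '*' under each position where `other` differs from `reference`."""
    if len(reference) != len(other):
        raise ValueError("sequences must be aligned to equal length")
    return "".join("*" if a != b else " " for a, b in zip(reference, other))

ref = "GCTGATAACAAA"    # hypothetical reference stretch
other = "GCTGACAATAAA"  # hypothetical aligned stretch from another region
marks = mark_differences(ref, other)
print(other)
print(marks)
```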
marked with an asterisk, and a changed amino acid is underlined. Table III summarizes the amino acid changes and Table IV the codon changes between the regions. A comparison of the five regions with respect to mutual relationship reveals a pronounced "homology gradient" along the protein molecule, i.e. the closer the location of two regions, the higher the degree of homology. As already pointed out by Sjödahl (27), one interpretation of this phenomenon is that the primordial
Region: E D A B C
Values in extraction order (row and column alignment lost): 0 11 14 11 0 12 14 21 11 17 12 7 0 5 15 11 5 0 10 21 17 15 10 0
Total: 57 46 40 41 64
TABLE IV
Comparison of codons of the IgG-binding regions
The values listed represent the number of changed nucleotide triplets of identically positioned codons when the regions are compared in pairs.

Region     E     D     A     B     C
E          0    31    25    26    36
D         31     0    21    25    28
A         25    21     0    14    30
B         26    25    14     0    20
C         36    28    30    20     0
Total    118   105   101    86   115
amino acid sequence preserved. Since the number of total changes of codons is lowest for region B (Table IV), this region was chosen for the comparison in Fig. 6.
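The quantity tabulated in Table IV (changed triplets at identically positioned codons) and the distinction between silent and amino acid-changing substitutions can be sketched as below. This is an illustration under stated assumptions, not the authors' procedure: `codon_changes` is a hypothetical function, the standard genetic code is assumed, and the two sequences are invented.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def codon_changes(seq_a, seq_b):
    """Return (changed_codons, silent_changes) for two aligned in-frame sequences."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3:
        raise ValueError("sequences must be aligned, in frame, and of equal length")
    changed = silent = 0
    for i in range(0, len(seq_a), 3):
        a, b = seq_a[i:i + 3], seq_b[i:i + 3]
        if a != b:
            changed += 1
            if CODON_TABLE[a] == CODON_TABLE[b]:
                silent += 1  # different triplet, same amino acid
    return changed, silent

# Invented example: Ala-Asp-Asn-Lys compared with Ala-Asp-Asn-Gln.
region_1 = "GCTGATAACAAA"
region_2 = "GCAGATAATCAA"
result = codon_changes(region_1, region_2)
print(result)
```

Here three of the four codons differ, but only one of the three changes alters the encoded amino acid; the other two are silent.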
Structural studies of protein A have suggested that 11 amino acids of the IgG-binding regions are essential for binding to the Fc part of the immunoglobulins (35). Most of these amino acids are assumed to be located in two α-helical regions (35). In region B, the corresponding residues are 183-192 and 198-211. As seen in Fig. 6, there are striking homologies in these two α-helices between the different regions, suggesting an evolutionary pressure to keep these residues intact. The changes observed are often outside the two helical areas, for instance, the changed His-Leu, at position 193-194 of region B, to Asn-Met, in regions E, D, and A. This pressure is even more pronounced when comparing the residues in these α-helices that interact with IgG. In region B, these amino acids are 184-186 (Gln-Gln-Asn), 188-189 (Phe-Tyr), 192 (Leu), 203 (Asn), 206-207 (Ile-Glu), and 210 (Lys). As seen in Fig. 6, there is a serine instead of asparagine at position 70, but all the other 49 residues are identical. Clearly, there is a strong pressure to keep these amino acids preserved.
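The figure of 49 identical residues follows from the positions listed for region B: they amount to ten residues per region, and five regions are compared. A short check, using only the numbers given in the text (the variable names are illustrative):

```python
# IgG-interacting positions listed in the text for region B, as inclusive ranges.
contact_ranges = [(184, 186), (188, 189), (192, 192), (203, 203), (206, 207), (210, 210)]
regions = ["E", "D", "A", "B", "C"]

per_region = sum(end - start + 1 for start, end in contact_ranges)
total_compared = per_region * len(regions)
identical = total_compared - 1  # one serine-for-asparagine exchange is reported

print(per_region, total_compared, identical)
```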
Apart from the mutual homology between the five regions, there also seem to exist internal homologies in each region as revealed by traces of lines in Fig. 5, A and B. Hence, the nucleotide sequence coding for amino acids 179 (Lys) to 188 (Phe) and 196 (AAC) to 205 (Phe), all within region B, contains 24 identical out of 30 nucleotides. Another subregion of interest is the nine-nucleotide insert, giving the amino acids 59-61, which has been observed in protein A both from S. aureus Cowan I and 8325-4. This subregion (residues 57-62) is possibly related to other regions like amino acids 4-9 in the beginning of region E. A comparison nucleotide by nucleotide reveals that 14 out of 18 bases are identical between these two regions.
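The two comparisons above are position-by-position identity counts over stretches of equal length (30 and 18 nucleotides). A minimal sketch of such a count, with a hypothetical helper `identity` and a made-up pair of sequences, followed by the two ratios quoted in the text expressed as percentages:

```python
def identity(a, b):
    """Return (matching_positions, length) for two aligned sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x == y for x, y in zip(a, b)), len(a)

# Made-up example: five of six positions agree.
example = identity("AAGCTT", "AAGGTT")
print(example)

# Ratios quoted in the text.
quoted = [(24, 30), (14, 18)]
percentages = [round(100 * m / n, 1) for m, n in quoted]
for (m, n), pct in zip(quoted, percentages):
    print(f"{m}/{n} = {pct}%")
```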
Structure of Region X: The repetitive nature of region X is indicated as multiple lines in Fig. 5, A and B, giving an approximately 300-base pair repetitive region (Xr) followed by a constant region coding for 81 amino acids (Xc). In Fig. 7, the 24-nucleotide repeats are aligned and a mutual comparison was performed. Again, a changed nucleotide is marked with an asterisk, and a changed amino acid is underlined. The 3' end of the repetitive region is obviously located at amino acid 392 (see Fig. 7) which is directly followed by the constant
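[Fig. 7 alignment residue; residue numbers and repeat labels in extraction order: 209, 237, X1, 305, X2, 313, X3, 321, X4, 329, X5, 337, X6, 345, X7, 353, X8, 361, X9, 369, X10, 377, X11, 385, X12, 3) 3]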
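The layout of the repetitive part of region X can be worked out from figures given in the text: an 8-amino acid (24-nucleotide) unit, 12 repeats, and a 3' end at amino acid 392. The sketch below assumes, for illustration only, that all twelve units are exactly eight residues long and contiguous; the variable names are hypothetical.

```python
UNIT_AA = 8                 # octapeptide repeat (24 nucleotides)
N_REPEATS = 12              # number of repeats stated in the text
LAST_REPEAT_RESIDUE = 392   # 3' end of the repetitive region

# Assumption: contiguous repeats of identical length.
first_residue = LAST_REPEAT_RESIDUE - UNIT_AA * N_REPEATS + 1
starts = [first_residue + UNIT_AA * k for k in range(N_REPEATS)]
repeat_nucleotides = UNIT_AA * 3 * N_REPEATS

print(starts)
print(repeat_nucleotides)
```

Under this assumption the repeats span 288 nucleotides, consistent with the "approximately 300-base pair" repetitive region, and the computed start positions from the second repeat onward (305, 313, ..., 385) agree with the residue numbers that survive from Fig. 7.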