
Copyright © 1991 by the Genetics Society of America

Circumsporozoite Protein Genes of Malaria Parasites (Plasmodium spp.): Evidence for Positive Selection on Immunogenic Regions
Austin L. Hughes¹
Center for Demographic and Population Genetics, The University of Texas Health Science Center, Houston, Texas 77225
Manuscript received May 30, 1990
Accepted for publication October 10, 1990

ABSTRACT
The circumsporozoite (CS) protein is a cell surface protein of the sporozoite, the stage of the life cycle of malaria parasites (Plasmodium spp.) that infects the vertebrate host. Analysis of DNA sequences supports the hypothesis that in Plasmodium falciparum, positive Darwinian selection favors diversity in the T cell epitopes (peptides presented to T cells by host MHC molecules) of the CS protein. In gene regions encoding T cell epitopes of P. falciparum, the rate of nonsynonymous nucleotide substitution is significantly higher than that of synonymous substitution, whereas this is not true of other gene regions. Furthermore, nonsynonymous nucleotide substitutions in these regions cause a change of amino acid residue charge significantly more frequently than expected by chance. By contrast, in Plasmodium cynomolgi, the same regions show no evidence of positive selection, and residue charge is conserved. The CS protein has a central repeat region, which is the target of host antibodies. In P. falciparum, the amino acid sequence of the repeat region is conserved within and between alleles. In P. cynomolgi, on the other hand, there is evidence that positive selection has favored evolution of two different repeat types within a given allele.

¹ Current address: Department of Biology and Institute of Molecular Evolutionary Genetics, 208 Mueller Laboratory, The Pennsylvania State University, University Park, Pennsylvania 16802.

Genetics 127: 345-353 (February, 1991)

Several recent papers have reported that positive Darwinian selection acts on genes encoding proteins involved in recognition of foreign antigens by the vertebrate immune system (HUGHES and NEI 1988, 1989; TANAKA and NEI 1989). So far, no study has used similar methods to test the hypothesis that genes of pathogenic organisms are subject to corresponding selection for the ability to evade the host's defense mechanisms. I test this hypothesis in the case of the circumsporozoite (CS) protein genes of malaria parasites (Protista: Sporozoa: Plasmodium spp.).

The CS protein is expressed only on the surface of the sporozoite, the stage of the Plasmodium life history that is infective to the vertebrate host, and has long been known to be a target of host immune responses (CLYDE et al. 1973, 1975; NARDIN et al. 1982; GODSON et al. 1983; ZAVALA et al. 1983). Recent DNA sequence data have revealed considerable polymorphism at the CS protein locus in Plasmodium species, and several authors have suggested that this polymorphism may be the result of positive selection. For example, in the simian parasite Plasmodium cynomolgi, there is considerable polymorphism in the repeat region of the CS protein, which is the target of host antibodies (GALINSKI et al. 1987). ENEA and ARNOT (1988) suggest that selection may have promoted this polymorphism. In the human parasite Plasmodium falciparum, variation among alleles in nonrepeat regions has been shown to involve only differences at first and second codon positions (McCUTCHAN, GOOD and MILLER 1989). Since differences at first and second codon positions are generally nonsynonymous and since the nonrepeat regions include peptides presented by major histocompatibility complex (MHC) molecules to T cells, it has been argued that this polymorphism is caused by positive selection to evade T cell recognition (GOOD et al. 1988a; McCUTCHAN, GOOD and MILLER 1989). However, since the number of nucleotide differences involved is small, this interpretation has been questioned (ARNOT 1989).

Because evidence regarding positive selection on CS protein genes has so far been inconclusive, I conducted detailed statistical tests of this hypothesis by examining patterns of nucleotide substitution in immunologically important regions of CS protein genes.
DNA SEQUENCES ANALYZED

In Plasmodium, fertilization occurs in the midgut of the insect host and is directly followed by meiosis. The haploid products of meiosis develop into sporozoites; these migrate to the mosquito's salivary glands, where they mature and eventually infect a subsequent vertebrate host. The CS protein covers the entire surface of the mature sporozoite (YOSHIDA et al. 1980; FINE et al. 1984), accounting in one species for 10-20% of the protein synthesized by the sporozoite (COCHRANE et al. 1982). The CS gene, which is present in a single copy per haploid genome, lacks introns and

TABLE 1
Circumsporozoite protein gene sequences used in analyses
Amino acid residues 9NR 5NR Repeats

Species (strain)
P. falciparum

Host

Allele References

Human LE5

N/A

P. yoelii P. berghei P. cynomolgi

Murine rodents Murine rodents Monkeys (Macaca)

P. knowlesi

Monkeys (Macaca) Human

P. vivax

104 123 133 125 118 138 NK65 92 B 96 C 96 G 97 L 97 M/N 208 98 N 97 H 144(12) 99 B (9) 180 95

NF54 T4 Wel 7G8

(4) 160 (4) 176 (4) 176 (4) 184 (4) 164
(6, 4) 136(8,4) (9, 172 16) 195 (9, 17) 182(11) 180 (6, 11) (4) (9) 144

N/A 125 125 125 125


11 1 100 110

107 123
101

109
110 120

103

DE LA CRUZ et al. (1987, 1988); CASPERS et al. (1989); DEL PORTILLO, NUSSENZWEIG and ENEA (1987); LOCKYER and SCHWARZ (1987); DE LA CRUZ, LAL and MCCUTCHAN (1987); LAL et al. (1987); LANAR (~~~~); GALINSKI et al. (1987); GALINSKI et al. (1987); GALINSKI et al. (1987); GALINSKI et al. (1987); GALINSKI et al. (1987); SHARMA et al. (1985); SHARMA et al. (1985); ARNOT, BARNWELL and STEWART (1988)

Numbers in parentheses are lengths of amino acid repeat units. N/A = complete sequence not available.

encodes a protein which varies in length between species but is usually around 400 amino acid residues long. The CS gene can be divided into three regions: (1) the 5' nonrepeat region (5'NR); (2) a central repeat region, consisting of one or two short motifs (4-12 codons in length) repeated in tandem numerous times; (3) the 3' nonrepeat region (3'NR). Table 1 lists DNA sequences analyzed in this paper; where data are available, the number of amino acid residues encoded by each of these gene regions is given.

Evidence from a number of different host and parasite species indicates that the repeat region is the target of antibodies against the CS protein (GODSON et al. 1983; ZAVALA et al. 1983; BALLOU et al. 1985). In the case of P. falciparum, epitopes presented by class II MHC molecules and recognized by helper T cells have been identified. These consist of two peptides (T cell epitopes 1 and 2, henceforth TCE) encoded in the 3'NR region of the gene (GOOD, BERZOFSKY and MILLER 1988; GOOD et al. 1987, 1988; LOCKYER, MARSH and NEWBOLD 1989). So far, regions presented by class I MHC molecules and recognized by cytotoxic T cells have not been identified experimentally, although it has been argued that they are likely to have similar properties to the helper T cell epitopes and thus to overlap them (GOOD et al. 1987).

The number of repeats in the repeat region may vary between alleles of one species, and in some species (such as P. cynomolgi) the length of each repeated unit varies widely between alleles (Table 1). For these reasons, the repeat region cannot be reliably aligned even between alleles of the same species. The nonrepeat regions were aligned at the protein level by the method of GOTOH (1986) (Figures 1 and 2). In both 5'NR and 3'NR, not all sequences are complete. Also, in some parts of the 5'NR, homology between different species is low. So that all analyses would be based on a comparable data set, 65 aligned codons from the 5'NR and 78 aligned codons from the 3'NR (including 36 codons in the TCE) were used in analyses (Figures 1 and 2).

FIGURE 1.-Alignment of the N-terminal region of the CS protein from six Plasmodium species (encoded by the 5'NR region of the gene). Regions analyzed in this paper are underlined in the top sequence.
RESULTS

Nucleotide substitution in nonrepeat regions: I computed the number of synonymous differences per synonymous site (pS) and the number of nonsynonymous differences per nonsynonymous site (pN) in pairwise comparisons among available sequences. pS and pN were computed separately for aligned codons in

TABLE 2
Percent synonymous (pS) and nonsynonymous (pN) differences in different regions of circumsporozoite protein genes

                              5'NR (N = 65)                   3'NR (excluding TCE) (N = 42)   TCE (N = 36)
Comparison (No.)              pS            pN                pS            pN                pS            pN
P. falciparum
  vs. P. falciparum (10)      0.0 ± 0.0     0.6 ± 0.5         0.0 ± 0.0     0.8 ± 0.6         0.0 ± 0.0     5.4 ± 1.6*** b
  vs. P. yoelii (5)           73.4 ± 7.0    42.5 ± 4.0***     45.1 ± 10.0   27.8 ± 4.4 a      72.3 ± 9.3    42.0 ± 5.3**
  vs. P. berghei (5)          82.4 ± 6.0    49.1 ± 4.0***     50.5 ± 10.0   29.5 ± 4.5 b      69.5 ± 9.6    42.6 ± 5.3*
  vs. P. cynomolgi (25)       64.8 ± 7.7    56.2 ± 4.0        57.5 ± 10.0   36.4 ± 4.8 b      69.6 ± 9.8    38.5 ± 5.2** b
  vs. P. knowlesi (10)        63.5 ± 7.6    59.6 ± 4.0        55.1 ± 10.0   36.9 ± 4.8 c      66.4 ± 9.7    46.1 ± 5.4 a
  vs. P. vivax (5)            60.5 ± 7.7    59.0 ± 4.0        53.5 ± 10.0   36.0 ± 4.8 c      64.1 ± 10.0   46.1 ± 5.4
P. yoelii
  vs. P. berghei (1)          27.0 ± 7.0    24.0 ± 3.4        12.3 ± 6.7    6.9 ± 2.5 c       15.5 ± 7.5    17.0 ± 4.1
  vs. P. cynomolgi (5)        75.9 ± 7.1    60.4 ± 4.0*       46.1 ± 9.9    25.8 ± 4.3 c      77.5 ± 8.8    45.7 ± 5.4* a
  vs. P. knowlesi (2)         70.2 ± 7.3    58.9 ± 4.0        48.1 ± 10.0   27.9 ± 4.5 c      94.7 ± 4.7    43.8 ± 5.5*** a
  vs. P. vivax (1)            77.7 ± 6.6    67.5 ± 3.8        36.8 ± 9.6    25.3 ± 4.3 c      81.6 ± 8.1    45.0 ± 5.4*** c
P. berghei
  vs. P. cynomolgi (5)        74.3 ± 7.2    54.7 ± 4.0*       55.0 ± 10.0   30.9 ± 4.6* c     89.4 ± 6.6    41.0 ± 5.3*** a
  vs. P. knowlesi (2)         69.6 ± 7.3    58.4 ± 4.0        54.6 ± 10.0   33.4 ± 4.7        98.5 ± 2.5    35.6 ± 5.2*** c
  vs. P. vivax (1)            75.2 ± 6.8    57.7 ± 4.0*       46.5 ± 10.0   30.8 ± 4.6 c      93.2 ± 5.2    38.0 ± 5.3*** a
P. cynomolgi
  vs. P. cynomolgi (10)       10.0 ± 3.7    5.0 ± 1.3         5.2 ± 3.1     2.2 ± 0.9         1.6 ± 1.7     3.3 ± 1.3
  vs. P. knowlesi (10)        13.8 ± 3.6    14.6 ± 2.6        20.8 ± 7.8    14.4 ± 3.4        21.8 ± 8.5    22.3 ± 4.4
  vs. P. vivax (5)            27.9 ± 6.7    16.3 ± 2.8        17.9 ± 7.3    9.8 ± 2.8         13.0 ± 6.9    14.6 ± 3.7
P. knowlesi
  vs. P. knowlesi (1)         4.8 ± 3.4     2.0 ± 1.0         0.0 ± 0.0     1.0 ± 1.0         0.0 ± 0.0     1.2 ± 1.2
  vs. P. vivax (2)            31.7 ± 7.2    22.4 ± 3.3        25.2 ± 8.7    13.7 ± 3.4        20.9 ± 8.5    20.2 ± 4.3
All comparisons (105)         47.5 ± 5.1    38.8 ± 2.5        37.5 ± 6.3    23.5 ± 2.7* c     49.3 ± 6.7    30.5 ± 3.2** a

Values are percent synonymous difference at synonymous sites (pS) and percent nonsynonymous difference at nonsynonymous sites (pN) ± SE. N = number of codons compared. pS significantly different from pN at (*) 5% level; (**) 1% level; (***) 0.1% level. pN significantly different from pN in 5'NR at (a) 5% level; (b) 1% level; (c) 0.1% level.
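The two proportions tabulated in Table 2 can be illustrated with a simplified counting scheme in the spirit of NEI and GOJOBORI (1986). The sketch below is not the author's program: the function names are hypothetical, the sequences are assumed to be aligned, gap-free and free of stop codons, and the treatment of stop codons (changes to a stop count as nonsynonymous; pathways through a stop are ignored) is an assumption made here for illustration.

```python
from itertools import permutations

BASES = "TCAG"
# Standard genetic code in TCAG order (TTT, TTC, TTA, TTG, TCT, ...); "*" marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def syn_sites(codon):
    """Expected number of synonymous sites in one codon."""
    total = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                # Assumption: a change to a stop codon counts as nonsynonymous.
                if CODE[mutant] == CODE[codon]:
                    total += 1 / 3
    return total


def codon_differences(c1, c2):
    """(synonymous, nonsynonymous) differences, averaged over mutational pathways."""
    diff_pos = [i for i in range(3) if c1[i] != c2[i]]
    syn_total = non_total = 0.0
    n_paths = 0
    for order in permutations(diff_pos):
        current, syn, non = c1, 0, 0
        for pos in order:
            step = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[step] == "*":
                break  # Assumption: pathways through stop codons are ignored.
            if CODE[step] == CODE[current]:
                syn += 1
            else:
                non += 1
            current = step
        else:
            # Reached only when the pathway completed without hitting a stop codon.
            syn_total += syn
            non_total += non
            n_paths += 1
    if n_paths == 0:
        raise ValueError("no pathway free of stop codons")
    return syn_total / n_paths, non_total / n_paths


def p_distances(seq1, seq2):
    """Return (pS, pN) as proportions for two aligned coding sequences."""
    codons1 = [seq1[i:i + 3] for i in range(0, len(seq1), 3)]
    codons2 = [seq2[i:i + 3] for i in range(0, len(seq2), 3)]
    # Synonymous sites are averaged over the two sequences; the rest are nonsynonymous.
    syn = sum(syn_sites(a) + syn_sites(b) for a, b in zip(codons1, codons2)) / 2
    non = 3 * len(codons1) - syn
    sd = nd = 0.0
    for a, b in zip(codons1, codons2):
        s, n = codon_differences(a, b)
        sd += s
        nd += n
    # Raises ZeroDivisionError if the sequences contain no synonymous sites.
    return sd / syn, nd / non
```

For the toy pair GGTAAA and GGAAAA (one silent change in a glycine codon), this gives pS = 0.75 and pN = 0; multiplying by 100 gives the percent scale used in the table. Standard errors (NEI and JIN 1989) are not covered by this sketch.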

the 5'NR, the 3'NR excluding the TCE, and the TCE (Table 2). NEI and GOJOBORI's (1986) method was used to estimate pS and pN. Standard errors of mean pS and pN were computed by the method of NEI and JIN (1989).

In the comparison among the five P. falciparum alleles, there are no synonymous differences in the 5'NR or 3'NR. Only in the TCE, however, is pN significantly higher than pS (Table 2), and in P. falciparum, pN in the TCE is significantly higher than pN in the other regions. Comparison among alleles in P. cynomolgi and between the two P. knowlesi alleles does not reveal a similarly elevated pN in the TCE. In between-species comparisons, the TCE seems to be a relatively conserved region, since pS significantly exceeds pN in this region in a majority of such comparisons. The 5'NR is the least conserved region in between-species comparisons; in most cases, pN in this region is significantly higher than that in the other two regions.

ARNOT (1989) suggested that the lack of synonymous differences among P. falciparum alleles may be due to some factor that prevents synonymous substitution at this locus. It is known that an extreme bias in G + C content (leading to either very high or very low levels of G + C at third-codon positions) is associated with a reduction in the rate of synonymous substitution (WOLFE, SHARP and LI 1989). Plasmodium species are known to have low G + C content in genomic DNA (WEBER 1988), and in regions of CS genes analyzed in Table 2 a similar bias is apparent (Table 3). Third-position G + C content is particularly low in the TCE of P. falciparum and the two rodent parasites (Table 3). This G + C content bias may well lower the rate of synonymous substitution in these species, but it does not seem to have totally eliminated synonymous substitution in this region, since there are synonymous differences among these species in the TCE (Table 2).

LOCKYER, MARSH and NEWBOLD (1989) sequenced the TCE only from a number of CS alleles from P. falciparum. Combining their sequence data with that previously published, it was possible to compare the TCE (36 codons) for 16 alleles. In this comparison pS = 0.5 ± 0.8, and pN = 5.7 ± 1.3. pS and pN are significantly different at the 1% level, and the value of pN in the TCE for this expanded data set is very close to that obtained for the five complete alleles analyzed in Table 2. These data also provide evidence that synonymous substitutions can occur in the TCE of P. falciparum.

In P. cynomolgi and Plasmodium knowlesi, there is no evidence that the same regions which form the TCE of P. falciparum are under positive selection. These

TABLE 3
Mean G + C content (in %) at all positions and at third-codon positions (3d) in different regions of circumsporozoite protein genes

                           5'NR (N = 65)    3'NR (excluding TCE) (N = 42)    TCE (N = 36)
Species (No. alleles)      3d      All      3d      All                      3d      All
P. falciparum (5)          10.8    33.9     19.0    31.1                     16.7    30.0
P. yoelii (1)              24.6    32.8     11.9    33.3                     25.0    33.3
P. berghei (1)             29.2    31.3     16.7    27.0                     25.0    33.3
P. cynomolgi (5)           34.2    42.2     31.0    44.3                     32.8    38.5
P. knowlesi (2)            40.0    47.2     28.6    46.0                     27.8    38.0
P. vivax (1)               40.0    47.2     28.6    46.0                     27.8    38.0

N = number of aligned codons analyzed.
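The two quantities reported in Table 3 are simple base counts. A minimal sketch follows, assuming an in-frame coding sequence whose length is a multiple of three; the function name and the example sequence are hypothetical and are not taken from the CS genes.

```python
def gc_content(seq):
    """Return (G+C % at all positions, G+C % at third-codon positions)."""
    seq = seq.upper()
    third = seq[2::3]  # every third base, starting at codon position 3

    def percent_gc(bases):
        return 100.0 * sum(base in "GC" for base in bases) / len(bases)

    return percent_gc(seq), percent_gc(third)


# Hypothetical three-codon example: ATG AAA GGC
all_gc, third_gc = gc_content("ATGAAAGGC")
```

For a region analyzed in several alleles, the table reports the mean over alleles, which would correspond to averaging the values returned for each allele.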

regions may not be T cell epitopes for other species, but instead certain other sections of the 3'NR may serve as T cell epitopes. If such epitopes are under similar positive selection to the TCE of P. falciparum, one way to look for them is to look for regions showing pN > pS. In P. cynomolgi, a search for such a region produced a candidate T cell epitope corresponding to the portion of the 3'NR that is 5' to T cell epitope 1 of P. falciparum plus T cell epitope 1 of P. falciparum (30 codons; see Figure 2). In this region, in the comparison among the five available P. cynomolgi alleles, pS = 0.0 ± 0.0 and pN = 3.8 ± 1.6. The difference between pS and pN in this region is significant at the 5% level. By contrast, in the remainder of the 3'NR outside this region, in the comparison among all P. cynomolgi alleles, pS = 6.1 ± 3.0 and pN = 2.1 ± 0.9. In this case, the difference between pS and pN is not statistically significant.

Conservative and radical amino acid replacements: Under positive selection favoring diversity at the protein level, diversity with respect to a particular amino acid property may be favored; and thus amino acid replacements that are radical (nonconservative) with respect to this property will occur with a disproportionate frequency. MONOS et al. (1984) noted that class I MHC alleles show an exceptionally large number of charge differences and speculated that these differences may be important in determining differences among alleles with respect to their peptide binding capacities.
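The distinction between conservative and radical replacements with respect to charge can be made concrete with a small classifier. The grouping used below (His, Lys and Arg as positive; Asp and Glu as negative; all other residues as neutral) is an assumption made for illustration; the text does not list the charge classes, and the function names are hypothetical.

```python
# Assumed charge classes (one-letter amino acid codes); not stated in the text.
POSITIVE = set("HKR")
NEGATIVE = set("DE")


def charge_class(residue):
    """Return '+', '-' or '0' for a one-letter amino acid code."""
    if residue in POSITIVE:
        return "+"
    if residue in NEGATIVE:
        return "-"
    return "0"


def is_radical(residue1, residue2):
    """A replacement is radical with respect to charge if the charge class changes."""
    return charge_class(residue1) != charge_class(residue2)
```

Under these assumptions a Lys to Glu replacement is radical, whereas Lys to Arg or Leu to Ile is conservative.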
FIGURE 2.-Alignment of the C-terminal region of the CS protein from six Plasmodium species (encoded by the 3'NR region of the gene). + indicates amino acid residues in the P. falciparum T cell epitopes. The region analyzed in this paper is underlined in the top sequence.

HUGHES, OTA and NEI (1990) tested this idea statistically by a method which classifies nonsynonymous nucleotide sites (NEI and GOJOBORI 1986) as conservative or radical (with respect to an amino acid property of interest) and, in comparing two DNA sequences, estimates the number of conservative nonsynonymous nucleotide differences per conservative nonsynonymous site (pNC) and the number of radical nonsynonymous nucleotide differences per radical nonsynonymous site (pNR). If pNC > pNR, the amino acid property of interest is conserved. If pNC = pNR, nonsynonymous substitutions occur at random with respect to the property, and thus there is no particular constraint with respect to that property. If pNR > pNC, then selection favors diversification between the sequences compared with respect to the property. In the case of class I MHC genes of humans and mice, pNR > pNC with respect to amino acid residue charge in the binding cleft, indicating that in this region nonsynonymous nucleotide substitutions causing a charge change occur more frequently than expected by chance (HUGHES, OTA and NEI 1990).

If positive selection on the TCE of P. falciparum favors the ability of this region to evade binding by MHC molecules, it might be predicted that in the TCE also nonsynonymous substitutions causing charge change occur more frequently than expected by chance. I tested this prediction by applying the method of HUGHES, OTA and NEI (1990) to nonrepeat regions of Plasmodium CS genes (Table 4). In the TCE of P. falciparum, pNR with respect to charge exceeds pNC by a ratio of over 9:1, whereas in other gene regions in this species pNC and pNR are not significantly different. In P. cynomolgi, by contrast, pNC is significantly greater than pNR in the TCE. Also in several between-species comparisons, pNC is significantly greater than pNR in the TCE. Outside the TCE, by contrast, pNC and pNR tend to be about the same in most comparisons (Table 4). These results support the hypothesis that in P. falciparum selection favors diversification of charge profile in the TCE.

A similar analysis was applied to the candidate T cell epitope for P. cynomolgi mentioned above. In the

TABLE 4
Percent conservative (pNC) and radical (pNR) nonsynonymous nucleotide difference in different regions of circumsporozoite protein genes

                              5'NR (N = 65)                   3'NR (excluding TCE) (N = 42)   TCE (N = 36)
Comparison (No.)              pNC           pNR               pNC           pNR               pNC           pNR
P. falciparum
  vs. P. falciparum (10)      0.8 ± 0.8     0.5 ± 0.5         1.3 ± 1.0     0.0 ± 0.0         0.9 ± 0.9     9.7 ± 3.1**
  vs. P. yoelii (5)           43.3 ± 5.6    49.6 ± 5.6        26.1 ± 5.6    30.3 ± 7.1        45.0 ± 9.3    39.0 ± 7.3
  vs. P. berghei (5)          49.5 ± 5.8    48.7 ± 5.5        26.7 ± 5.7    33.7 ± 7.3        46.1 ± 7.6    39.1 ± 7.3
  vs. P. cynomolgi (25)       55.0 ± 5.7    58.4 ± 5.6        31.8 ± 5.9    43.1 ± 7.6        48.8 ± 7.7    28.9 ± 6.6*
  vs. P. knowlesi (10)        59.7 ± 5.6    59.4 ± 5.5        30.1 ± 5.9    47.0 ± 7.9        58.4 ± 7.6    34.5 ± 7.0*
  vs. P. vivax (5)            55.6 ± 5.6    62.4 ± 5.6        29.1 ± 5.8    46.3 ± 7.9        61.6 ± 7.4    30.9 ± 6.8**
P. yoelii
  vs. P. berghei (1)          18.9 ± 4.6    28.4 ± 4.9        8.2 ± 3.5     4.9 ± 3.4         15.2 ± 7.5    18.8 ± 6.0
  vs. P. cynomolgi (5)        54.3 ± 5.8    66.1 ± 5.4        23.7 ± 5.4    29.0 ± 7.1        50.8 ± 7.6    40.6 ± 7.6
  vs. P. knowlesi (2)         55.6 ± 5.8    61.8 ± 5.4        26.8 ± 5.7    29.7 ± 7.3        47.2 ± 7.6    40.4 ± 7.5
  vs. P. vivax (1)            59.6 ± 5.7    75.1 ± 4.9*       23.2 ± 5.4    28.6 ± 7.3        50.0 ± 7.6    39.7 ± 7.6
P. berghei
  vs. P. cynomolgi (5)        57.5 ± 5.9    52.5 ± 5.5        31.4 ± 6.0    30.1 ± 7.1        52.3 ± 7.6    29.5 ± 7.0*
  vs. P. knowlesi (2)         57.7 ± 5.9    59.0 ± 5.4        34.4 ± 6.1    32.0 ± 7.3        43.3 ± 7.5    28.0 ± 6.8
  vs. P. vivax (1)            56.1 ± 5.8    59.2 ± 5.5        31.6 ± 6.0    29.7 ± 7.2        47.7 ± 7.6    28.1 ± 7.0
P. cynomolgi
  vs. P. cynomolgi (10)       4.8 ± 1.7     5.1 ± 1.8         3.1 ± 1.5     1.0 ± 1.0         6.6 ± 2.7     0.0 ± 0.0*
  vs. P. knowlesi (10)        13.9 ± 3.7    15.3 ± 3.8        17.8 ± 4.8    9.3 ± 4.5         30.6 ± 6.9    14.1 ± 5.3
  vs. P. vivax (5)            16.6 ± 4.1    15.8 ± 4.0        13.4 ± 4.3    4.4 ± 3.1         24.1 ± 6.3    4.9 ± 3.4**
P. knowlesi
  vs. P. knowlesi (1)         1.4 ± 1.4     2.5 ± 1.7         1.7 ± 1.7     0.0 ± 0.0         2.4 ± 2.4     0.0 ± 0.0
  vs. P. vivax (2)            20.7 ± 4.7    24.0 ± 4.8        15.7 ± 4.7    10.5 ± 5.0        30.9 ± 7.1    9.5 ± 4.5*
All comparisons (105)         37.4 ± 3.5    40.0 ± 3.6        21.8 ± 3.4    26.0 ± 4.4        37.6 ± 4.8    23.5 ± 4.1*

Values are percent conservative nonsynonymous difference at conservative nonsynonymous sites (pNC) and percent radical nonsynonymous difference at radical nonsynonymous sites (pNR) ± SE. N = number of codons compared. pNC significantly different from pNR at (*) 5% level; (**) 1% level.
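The significance markers in Table 4 compare two estimates that each carry a standard error. The text does not state which test was used; the sketch below assumes a two-tailed normal (z) approximation with independent estimates, so its levels need not agree with the published markers in every row. The function name is hypothetical.

```python
import math


def z_test(p1, se1, p2, se2):
    """Two-tailed z test for the difference between two estimates with standard errors."""
    z = (p1 - p2) / math.sqrt(se1 ** 2 + se2 ** 2)
    # Two-tailed probability under a standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Within-P. falciparum comparison in the TCE (Table 4): pNR = 9.7 ± 3.1, pNC = 0.9 ± 0.9
z, p_value = z_test(9.7, 3.1, 0.9, 0.9)
```

Under this approximation the P. falciparum TCE comparison gives z of about 2.7 and a two-tailed probability below 0.01, in line with the (**) marker in the table.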

putative epitope, pNC = 6.0 ± 2.8 and pNR = 1.3 ± 1.2; in this case the difference between pNC and pNR is not statistically significant. In the remainder of the 3'NR, pNC = 3.7 ± 1.6 and pNR = 0.0 ± 0.0; here the difference between pNC and pNR is significant at the 5% level. Thus, there is no evidence of positive selection favoring charge profile diversity in the putative T cell epitope of P. cynomolgi. However, it appears that the entire C-terminal region of the CS protein is under fairly strong constraint with respect to charge in P. cynomolgi (Table 4).

Evolution of repeat regions: Because alignment of the repeat region of CS genes is not possible between species, repeat regions were analyzed within species for the two species with the most available allelic sequences, P. falciparum and P. cynomolgi. In the case of P. falciparum, the basic repeat unit encodes a 4-amino acid unit with the consensus sequence Asn-Ala-Asn-Pro. The number of repeat units varies among alleles, presumably as a result of deletion and duplication due to unequal intralocus crossing over. The difference between alleles in the number of repeat units makes any alignment of this region arbitrary. Rather than attempting to align alleles, I computed pS and pN between individual repeat units, both within and between alleles, as a way of examining the type of natural selection acting on the repeat region (Table 5). pS was found to exceed pN in the comparison of repeat units both within and between alleles. In fact, mean pS and mean pN in within-allele comparisons are almost identical to those in between-allele comparisons (Table 5). These results indicate that the amino acid sequence of the repeat unit is conserved within and between alleles in P. falciparum.

In P. cynomolgi, the length of the repeat unit varies among alleles, and several alleles have two separate repeat types which differ in length (Table 1). Alignment of the repeat region in this species is therefore difficult. However, GALINSKI et al. (1987) have identified a 4-amino acid core sequence, homologs to which can be found in all P. cynomolgi repeat types. When pS and pN are calculated for this core sequence within and between repeat types, a very different pattern emerges from that seen in P. falciparum repeat units (Table 6). In P. cynomolgi, pN exceeds pS in comparisons between different repeat types of the same allele. This suggests that positive selection favors diversification of the amino acid sequence of the core repeat unit within alleles. On the other hand, in the comparison between alleles, pS and pN are about equal.
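The numbers of comparisons listed in Table 5 follow from comparing every pair of repeat units. A minimal sketch, with hypothetical function names: n units within one allele give n(n - 1)/2 pairs, and alleles with n and m units give n × m between-allele pairs. The unit counts used are those given in Table 5 for the four alleles whose counts are legible (LE5, NF54, T4 and Wel).

```python
def within_allele_pairs(n_units):
    """Number of distinct pairs of repeat units within a single allele."""
    return n_units * (n_units - 1) // 2


def between_allele_pairs(n_units_a, n_units_b):
    """Number of pairs formed by one repeat unit from each of two alleles."""
    return n_units_a * n_units_b


# Repeat-unit counts as listed in Table 5.
units = {"LE5": 40, "NF54": 44, "T4": 44, "Wel": 46}

lE5_within = within_allele_pairs(units["LE5"])                      # 780
le5_vs_nf54 = between_allele_pairs(units["LE5"], units["NF54"])     # 1760
```

These counts reproduce the values in the table, for example 780 comparisons within LE5 and 1760 between LE5 and NF54.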

TABLE 5
Percent synonymous (pS) and nonsynonymous (pN) differences in comparisons among repeat units (4 codons) within and between P. falciparum alleles

Allele (No. repeat units)   Comparison (No.)    pS             pN
LE5 (40)                    vs. LE5 (780)       60.8 ± 18.9    4.8 ± 3.1**
                            vs. NF54 (1760)     52.9 ± 17.5    4.1 ± 2.6**
                            vs. T4 (1760)       52.4 ± 17.5    3.8 ± 2.4**
                            vs. Wel (1840)      54.3 ± 17.8    4.3 ± 2.7**
                            vs. 7G8 (1640)      54.5 ± 17.6    4.2 ± 2.7**
NF54 (44)                   vs. NF54 (946)      44.0 ± 16.0    3.6 ± 2.3*
                            vs. T4 (1936)       42.0 ± 15.6    3.1 ± 2.0*
                            vs. Wel (2024)      46.2 ± 16.9    3.7 ± 2.3*
                            vs. 7G8 (1804)      46.1 ± 16.8    3.7 ± 2.3*
T4 (44)                     vs. T4 (946)        41.8 ± 16.4    2.7 ± 1.8*
                            vs. Wel (2024)      45.5 ± 16.8    3.3 ± 2.1*
                            vs. 7G8 (1804)      45.3 ± 16.7    3.3 ± 2.1*
Wel (46)                    vs. Wel (1035)      48.7 ± 17.8    3.9 ± 2.4*
                            vs. 7G8 (1886)      48.3 ± 17.4    3.8 ± 2.4*
7G8 (41)                    vs. 7G8 (820)       49.2 ± 17.8    3.9 ± 2.4*
Means
  Between alleles (a)                           48.6           3.7
  Within alleles                                48.5 ± 17.4    3.7 ± 2.4***

Values are mean percent synonymous difference at synonymous sites (pS) and mean percent nonsynonymous difference at nonsynonymous sites (pN) ± SE. pS is significantly different from pN at (*) 5% level; (**) 1% level; (***) 0.1% level.
(a) A method is not available for estimating SE of between-allele means.

This suggests that once two alleles have evolved different core repeat types from each other, there is not continued selection for further diversification of the core unit between the two alleles. Of course, repeat types in P. cynomolgi differ in numerous other ways besides the sequence of the core repeat unit; duplication and deletion of codons must also have occurred numerous times, thereby giving rise to further differences among repeat types in addition to those examined here.
DISCUSSION

In P. falciparum, the fact that nonsynonymous substitutions are concentrated in the TCE and occur nonrandomly with respect to residue charge in this region is evidence favoring the hypothesis that positive selection for evading MHC recognition is operating at this locus. In P. falciparum, the CS gene shows an unusual degree of G + C content bias, particularly in the TCE. This bias may lower the rate of synonymous substitution in this region, but it cannot explain the concentration of nonsynonymous differences in the TCE nor the preponderance of charge changes in this region. If P. falciparum has fairly recently switched to the human from a nonprimate host (MCCUTCHAN et al. 1984), this species may have adapted to a host whose MHC polymorphism was already high. Selection on the CS locus favoring adaptation to a pre-existing high host MHC polymorphism would be a form of balancing selection. The observation that in P. falciparum allelic types in the TCE region are worldwide in their distribution (LOCKYER, MARSH and NEWBOLD 1989) is consistent with such a mode of selection.

ENEA and ARNOT (1988) state that it is "unpalatable" to accept that CS genes of P. falciparum and P. cynomolgi are under different selective pressures. Nonetheless, the present analyses reveal certain differences between these two species. The gene regions that encode the TCE for P. falciparum are apparently under positive selection in that species but not in P. cynomolgi. Moreover, there is some evidence that there may be positive selection on another region of the 3'NR in P. cynomolgi. Since different MHC molecules recognize different short peptides from a given foreign protein (SETTE et al. 1989), it would not be surprising to find that the MHCs of different mammalian species exert selective pressure on different regions of the CS molecule. Furthermore, in the repeat region, P. falciparum and P. cynomolgi seem to be under quite different selection pressures. In the former species, the repeat sequence is conserved both within and between alleles, whereas in the latter species there is evidence that positive selection may have favored the evolution of a new repeat type within certain alleles.

The biological role of the repeat region of the CS protein has been somewhat controversial. As ENEA and ARNOT (1988) point out, it is well known by immunologists that a simple amino acid polymer is likely to elicit a strong T cell-independent antibody response by a vertebrate host. Thus, they argue, the role of the repeat region is precisely to elicit such a response, which is apparently ineffective in conferring lasting immunity on the host. T cell-dependent immunity against nonrepeat regions of the CS protein would be much more effective (VERGARA et al. 1986), but the induction of the T cell-independent response to the repeat region appears to prevent the development of a T cell-dependent response (ENEA and ARNOT 1988). Experimental results showing that the presence of anti-CS antibodies does not confer immunity (WEISS et al. 1989) are consistent with this hypothesis regarding the function of the repeat region.

On the other hand, since antibodies against the CS repeat region can immobilize sporozoites, a series of mutations leading to formation of a new repeat type might be favored by selection because this repeat type would be less effectively attacked by antibodies against the prevailing type (ENEA and ARNOT 1988). This would explain why in comparison between core repeat sequences of P. cynomolgi alleles, the proportion of nonsynonymous differences tends to be higher than that of synonymous differences (Table 6). Once it has

Genes

Protein Circumsporozoite
TABLE 6

35 1

Percent synonymous(ps) and nonsynonymous (p.) differences in comparisons among core repeat units (4 codons) within and between P. cynomolgi alleles
Allele (No. core repeat

units)

(No.)

p.s

PN

B (17)

Within allele Repeat type 1 Repeat type 2 1 vs. 2 All comparisons Between alleles
vs. vs. G vs. L vs. M/N

16.6 ± 13.3 4.5 ± 6.0 13.3 ± 10.3 13.7 ± 10.9 17.4 ± 13.0 16.4 ± 11.1 7.9 ± 6.6 43.6 ± 22.1 18.4 ± 14.5 0.0 ± 0.0 13.4 ± 12.7 17.2 ± 14.0 22.1 ± 14.2 12.5 ± 11.0 49.6 ± 23.7 13.2 ± 11.7 8.6 ± 7.8 31.1 ± 18.3
0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0 0.0 ± 0.0

2.3 ± 2.0 10.0 ± 5.1 35.7 ± 15.2 19.4 ± 8.1 28.5 ± 11.4 35.6 ± 12.7 34.1 ± 11.6* 25.1 ± 10.8

M/N (52)

Within allele Repeat type 1 Repeat type 2 1 vs. 2 All comparisons Between alleles vs. G vs. L vs. M/N Within allele All comparisons Between alleles vs. L vs. M/N Within allele Repeat 1 Repeat 2 1 vs. 2 All comparisons Between alleles vs. M/N Within allele All comparisons

5.3 ± 5.0 0.0 ± 0.0 23.2 ± 13.5 9.2 ± 5.8

22.6 ± 12.6 38.2 ± 13.2 21.8 ± 11.4


0.0 ± 0.0

39.1 ± 14.0 14.0 ± 10.0


6.1 ± 5.6 0.0 ± 0.0 58.4 ± 16.5*** 24.5 ± 7.7**

36.1 ± 21.4 27.2 ± 14.9

46.9 ± 13.1 4.4 ± 3.3

Means Within alleles Within repeat types (B, C, and L) Within repeat types (all alleles) Between repeat types (B, C, and L) Between alleles

8.6 ± 8.5 22.6 ± 13.5 6.9 ± 9.4 30.7

5.2 ± 3.8 4.2 ± 3.5 44.7 ± 15.1** 30.3

Values are mean percent synonymous difference at synonymous sites (pS) and mean percent nonsynonymous difference at nonsynonymous sites (pN) ± SE. pS is significantly different from pN at (*) 5% level; (**) 1% level; (***) 0.1% level. A method is not available for estimating SE of between-allele means.

evolved, such a new repeat type may be duplicated within the allele by unequal crossing over; finally, this process will lead to the production of an allele whose repeats are entirely of the new type. Once a new repeat type is formed, there would be no further selection for change in its amino acid sequence; this would explain the fact that in P. cynomolgi, synonymous differences predominate within repeat types. Note that, on this model, P. cynomolgi alleles having two different repeat types represent transitional stages in the spread of new repeat types.

The reasons for the differences between P. falciparum and P. cynomolgi in the way the CS genes have evolved are not fully understood at present, but it may be worthwhile to mention some possible differences between their host species that may be correlated with different selective pressures. Polymorphism at human MHC loci is very high, suggesting that the long-term effective population number for humans is quite high (NEI and HUGHES 1990). The Macaca species that serve as hosts for P. cynomolgi may not have as high effective population sizes and therefore may have more limited MHC polymorphism. Thus, in P. falciparum selection on the CS genes may mainly have arisen as a result of adaptation to a host with high MHC polymorphism, whereas in P. cynomolgi the host's MHC may have been less important as a source of selection, giving a correspondingly greater role to the avoidance of host antibody defenses.
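As a rough illustration of the kind of comparison summarized in Table 6, the sketch below tests whether a mean pN differs from a mean pS given their standard errors. It assumes a two-tailed Z test on the difference, with the two estimates treated as independent; this is an assumption made for illustration and is not necessarily the procedure used to produce the table. The function name `z_test_ps_pn` and the input values are hypothetical.

```python
import math


def z_test_ps_pn(p_s, se_s, p_n, se_n):
    """Two-tailed Z test for the difference p_n - p_s.

    Assumes (for illustration only) that the two estimates are
    independent, so the variance of the difference is the sum of the
    squared standard errors.
    """
    se_diff = math.sqrt(se_s ** 2 + se_n ** 2)
    if se_diff == 0.0:
        raise ValueError("standard error of the difference is zero; test undefined")
    z = (p_n - p_s) / se_diff
    # Two-tailed tail probability of a standard normal deviate.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical percent differences and standard errors (not taken from Table 6).
z, p = z_test_ps_pn(p_s=5.0, se_s=4.0, p_n=45.0, se_n=10.0)
print(f"Z = {z:.2f}, two-tailed P = {p:.4f}")
```

A positive Z indicates that nonsynonymous differences exceed synonymous ones, the pattern interpreted in the text as evidence of positive selection; the test is undefined when both standard errors are zero, as for some within-repeat-type comparisons.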
This research was supported by National Institutes of Health grants R01 GM43940 and R01 GM20293 and by National Science Foundation grant BSR8807910.

LITERATURE CITED

ARNOT, D. E., 1989 Malaria and the major histocompatibility complex. Parasitol. Today 5: 138-142.
ARNOT, D. E., J. W. BARNWELL and M. J. STEWART, 1988 Does biased gene conversion influence polymorphism in the circumsporozoite protein-encoding gene of Plasmodium vivax? Proc. Natl. Acad. Sci. USA 85: 8102-8106.
BALLOU, W. R., J. ROTHBARD, R. A. WIRTZ, R. W. GORE, I. SCHNEIDER, M. R. HOLLINGDALE, R. L. BEAUDOIN, W. L. MALOY, L. H. MILLER and W. T. HOCKMEYER, 1985 Immunogenicity of synthetic peptides from circumsporozoite protein of Plasmodium falciparum. Science 228: 996-999.
CASPERS, P., R. GENTZ, H. MATILE, J. R. PINK and F. SINIGAGLIA, 1989 The circumsporozoite protein gene from NF54, a Plasmodium falciparum isolate used in malaria vaccine trials. Mol. Biochem. Parasitol. 35: 185-190.
CLYDE, D. F., H. MOST, V. C. MCCARTHY and J. P. VANDENBERG, 1973 Immunization of man against sporozoite-induced falciparum malaria. Am. J. Med. Sci. 266: 169-177.
CLYDE, D. F., V. C. MCCARTHY, R. M. MILLER and W. E. WOODWARD, 1975 Immunization of man against falciparum and vivax malaria by use of attenuated sporozoites. Am. J. Trop. Med. Hyg. 24: 397-401.
COCHRANE, A. H., F. SANTORO, V. NUSSENZWEIG, R. W. GWADZ and R. S. NUSSENZWEIG, 1982 Monoclonal antibodies identify the protective antigens of sporozoites of Plasmodium knowlesi. Proc. Natl. Acad. Sci. USA 79: 5651-5655.
DE LA CRUZ, V. F., A. A. LAL and T. F. MCCUTCHAN, 1987 Sequence variation in putative functional domains of Plasmodium falciparum: implications for vaccine development. J. Biol. Chem. 262: 11935-11939.
DE LA CRUZ, V. F., W. L. MALOY, L. H. MILLER, A. A. LAL, M. F. GOOD and T. F. MCCUTCHAN, 1988 Lack of cross-reactivity between variant T cell determinants from malaria circumsporozoite protein. J. Immunol. 141: 2456-2460.
DEL PORTILLO, H. A., R. S. NUSSENZWEIG and V. ENEA, 1987 Circumsporozoite protein gene of a Plasmodium falciparum strain from Thailand. Mol. Biochem. Parasitol. 24: 289-294.
ENEA, V., and D. ARNOT, 1988 The circumsporozoite gene in Plasmodia, pp. 5-11 in Molecular Genetics of Parasitic Protozoa, edited by M. J. TURNER and D. ARNOT. Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.
FINE, E., M. AIKAWA, A. H. COCHRANE and R. S. NUSSENZWEIG, 1984 Immunoelectron microscopic observations on Plasmodium knowlesi sporozoites: localization of protective antigen and its precursors. Am. J. Trop. Med. Hyg. 33: 220-226.
GALINSKI, M. R., D. E. ARNOT, A. H. COCHRANE, J. W. BARNWELL, R. S. NUSSENZWEIG and V. ENEA, 1987 The circumsporozoite gene of the Plasmodium cynomolgi complex. Cell 48: 311-319.
GODSON, G. N., J. ELLIS, P. SVEC, D. H. SCHLESINGER and V. NUSSENZWEIG, 1983 Identification and chemical synthesis of a tandemly repeated immunogenic region of Plasmodium knowlesi circumsporozoite protein. Nature 305: 29-33.
GOOD, M. F., J. A. BERZOFSKY and L. H. MILLER, 1988 The T cell response to the malaria circumsporozoite protein: an immunological approach to vaccine development. Annu. Rev. Immunol. 6: 663-688.
GOOD, M. F., W. L. MALOY, M. N. LUNDE, H. MARGALIT, J. L. CORNETTE, G. L. SMITH, B. MOSS, L. H. MILLER and J. A. BERZOFSKY, 1987 Construction of synthetic immunogen: use of new T-helper epitope on malaria circumsporozoite protein. Science 235: 1059-1062.
GOOD, M. F., D. POMBO, I. A. QUAKY, E. M. RILEY, R. A. HOUGHTEN, A. MENON, D. W. ALLINGS, J. A. BERZOFSKY and L. H. MILLER, 1988 Human T-cell recognition of the circumsporozoite protein of Plasmodium falciparum: immunodominant T-cell domains map to the polymorphic regions of the molecule. Proc. Natl. Acad. Sci. USA 85: 1199-1203.
GOTOH, O., 1986 Alignment of three biological sequences with an efficient traceback procedure. J. Theor. Biol. 121: 327-337.
HUGHES, A. L., and M. NEI, 1988 Pattern of nucleotide substitution at major histocompatibility complex class I loci reveals overdominant selection. Nature 335: 167-170.
HUGHES, A. L., and M. NEI, 1989 Nucleotide substitution at major histocompatibility complex class II loci: evidence for overdominant selection. Proc. Natl. Acad. Sci. USA 86: 958-962.
HUGHES, A. L., T. OTA and M. NEI, 1990 Positive Darwinian selection promotes charge profile diversity in the antigen binding cleft of class I MHC molecules. Mol. Biol. Evol. 7: 515-524.
LAL, A. A., V. F. DE LA CRUZ, J. A. WELSH, Y. CHAROENVIT, W. L. MALOY and T. F. MCCUTCHAN, 1987 Structure of the gene encoding the circumsporozoite protein of Plasmodium yoelii. J. Biol. Chem. 262: 2937-2940.
LANAR, D. E., 1990 Sequence of the circumsporozoite gene of Plasmodium berghei ANKA clone and NK65 strain. Mol. Biochem. Parasitol. 39: 151-154.
LOCKYER, M. J., K. MARSH and C. I. NEWBOLD, 1989 Wild isolates of Plasmodium falciparum show extensive polymorphism in T cell epitopes of the circumsporozoite protein. Mol. Biochem. Parasitol. 37: 275-280.
LOCKYER, M. J., and R. T. SCHWARZ, 1987 Strain variation in the circumsporozoite protein gene of Plasmodium falciparum. Mol. Biochem. Parasitol. 22: 101-108.
MCCUTCHAN, T. F., M. F. GOOD and L. H. MILLER, 1989 Polymorphism in the circumsporozoite (CS) protein of Plasmodium falciparum. Parasitol. Today 5: 143-146.
MCCUTCHAN, T. F., J. B. DAME, L. H. MILLER and J. BARNWELL, 1984 Evolutionary relatedness of Plasmodium species as determined by the structure of DNA. Science 225: 808-811.
MONOS, D. S., W. A. TEKOLF, S. SHAW and H. L. COOPER, 1984 Comparison of structural and functional variation in class I HLA molecules: the role of charged amino acid substitution. J. Immunol. 132: 1379-1385.
NARDIN, E. H., V. NUSSENZWEIG, R. S. NUSSENZWEIG, W. E. COLLINS, K. T. HARINASUTA, P. TAPCHAISRI and Y. CHOMCHARN, 1982 Circumsporozoite (CS) proteins of human malaria parasites Plasmodium falciparum and Plasmodium vivax. J. Exp. Med. 156: 20-30.
NEI, M., and T. GOJOBORI, 1986 Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol. 3: 418-426.
NEI, M., and A. L. HUGHES, 1990 Polymorphism and evolution of major histocompatibility complex loci in mammals, in Evolution at the Molecular Level, edited by R. K. SELANDER, A. G. CLARK and T. S. WHITTAM. Sinauer, Sunderland, MA (in press).
NEI, M., and L. JIN, 1989 Variances of the average numbers of nucleotide substitutions within and between populations. Mol. Biol. Evol. 6: 290-300.
SETTE, A., S. BUUS, E. APPELLA, J. A. SMITH, R. CHESNUT, C. MILES, S. M. COLON and H. M. GREY, 1989 Prediction of major histocompatibility complex binding regions of protein antigens by sequence pattern analysis. Proc. Natl. Acad. Sci. USA 86: 3296-3300.
SHARMA, S., P. SVEC, G. H. MITCHELL and G. N. GODSON, 1985 Diversity of circumsporozoite antigen genes from two strains of the malarial parasite Plasmodium knowlesi. Science 229: 779-782.
TANAKA, T., and M. NEI, 1989 Positive Darwinian selection observed at the variable-region genes of immunoglobulins. Mol. Biol. Evol. 6: 447-459.
VERGARA, U., R. GWADZ, D. SCHLESINGER, V. NUSSENZWEIG and A. FEREIRA, 1986 Multiple non-repeated epitopes on the circumsporozoite protein of Plasmodium falciparum. Mol. Biochem. Parasitol. 14: 283-292.
WEBER, J. L., 1988 Molecular biology of malaria parasites. Exp. Parasitol. 66: 143-170.
WEISS, W. R., M. F. GOOD, M. R. HOLLINGDALE, L. H. MILLER and J. A. BERZOFSKY, 1989 Genetic control of immunity to Plasmodium yoelii sporozoites. J. Immunol. 143: 4263-4266.
WOLFE, K. H., P. M. SHARPE and W.-H. LI, 1989 Mutation rates vary among regions of the mammalian genome. Nature 337: 283-285.
YOSHIDA, N., R. S. NUSSENZWEIG, P. POTOCNJAK, V. NUSSENZWEIG and M. AIKAWA, 1980 Hybridoma produces protective antibodies directed against the sporozoite stages of malaria parasite. Science 207: 71-73.
ZAVALA, F., A. H. COCHRANE, E. H. NARDIN, R. S. NUSSENZWEIG and V. NUSSENZWEIG, 1983 Circumsporozoite proteins of malaria parasites contain a single immunodominant region with two or more identical epitopes. J. Exp. Med. 157: 1947-1957.
Communicating editor: A. G. CLARK