
Gene 247 (2000) 265–277
www.elsevier.com/locate/gene

Differences in HERV-K LTR insertions in orthologous loci of humans and great apes
Yuri B. Lebedev a, *, Oksana S. Belonovitch a, Natalia V. Zybrova a, Paul P. Khil a, Sergey G. Kurdyukov a, Tatyana V. Vinogradova a, Gerhard Hunsmann b, Eugene D. Sverdlov a
a Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, 16/10 Miklukho-Maklaya St., Moscow 117871, Russia
b German Primate Centre, Department for Virology and Immunology, Kellnerweg 4, D-37077 Goettingen, Germany
Received 23 August 1999; received in revised form 2 December 1999; accepted 25 January 2000

Abstract
The classification of the long terminal repeats (LTRs) of the human endogenous retrovirus HERV-K (HML-2) family was refined according to diagnostic differences between the LTR sequences. The mutation rate was estimated to be approximately equal for LTRs belonging to different families and branches of human endogenous retroviruses (HERVs). An average mutation rate, calculated from differences between LTRs of the same HERV, was found to be 0.13% per million years (Myr). Using this value, the ages of different LTR groups belonging to the HML-2 LTR subfamily were found to vary from 3 to 50 Myr. Orthologous potential LTR-containing loci from different primate species were PCR amplified using primers corresponding to the genomic sequences flanking the LTR integration sites. This allowed us to estimate the phylogenetic times of LTR integrations in primate lineages during evolution and to demonstrate that they are in good agreement with the LTR ages calculated from the mutation rates. Human-specific integrations were demonstrated for some very young LTRs. The possible involvement of LTRs and HERVs in primate evolution is discussed. © 2000 Elsevier Science B.V. All rights reserved.
Keywords: HERV-K; HML-2; Human endogenous retroviruses; Human genome; LTR; Primate evolution

Abbreviations: HERV(s), human endogenous retrovirus(es); LTR(s), long terminal repeat(s); Myr, million years (ago); PCR, polymerase chain reaction; TEs, transposable elements.
* Corresponding author. Tel.: +7-095-330-6992; fax: +7-095-330-6538. E-mail address: yuri@humgen.siobc.ras.ru (Y.B. Lebedev)
0378-1119/00/$ - see front matter © 2000 Elsevier Science B.V. All rights reserved. PII: S0378-1119(00)00062-7

1. Introduction
Rapid progress of the Human Genome Project and related achievements in technologies for gene identification, mapping and sequencing have opened new horizons for revealing the molecular events that underlie speciation and, in particular, the genetic causes of the divergence of humans and great apes during evolution. One of the most exciting questions in this field is what differences between human and ape genomes make these related species so phenotypically different. To answer this question one has to compare the human genome with the genomes of ape species such as orangutan, gorilla and, of course, the closest human relative, the chimpanzee. The next step would be to associate the genomic variations with interspecies differences at the level of expressed proteins/enzymes, including tissue specificity and/or inducibility of various genes. Chimpanzee (Pan troglodytes) and man (Homo sapiens) have been thoroughly compared, both as organisms and using available biochemical and genetic data, and since the classical work of King and Wilson (1975) it has been widely accepted that human proteins and genes are about 99% identical to their chimpanzee counterparts. This remarkably low level of difference led King and Wilson (1975) to formulate the concept of regulatory evolution, which suggests that a relatively small number of genetic changes in systems controlling gene expression may account for the major organisational differences between humans and chimpanzees. Since then, new highly efficient techniques for the structural analysis of proteins and nucleic acids have been developed, and comparisons of a large number of new sequences have confirmed that orthologous sequences of human and chimpanzee are indeed at least 98.5% identical [see comment in Gibbons (1998)]. It has been found, however, that there are quite a number of qualitative differences between the two genomes, including the absence of some chimpanzee DNA stretches from the human genome, and vice versa. The differences include: (i) differences in chromosome organisation (Nickerson and Nelson, 1998; Schwartz et al., 1998); (ii) different relative copy numbers, locations, and functional status of individual genes within multigene families (Schwartz et al., 1998; Trask et al., 1998); (iii) differences within gene coding regions, sometimes leading to greatly different repertoires of gene products in tissues of humans and great apes (Chou et al., 1998); (iv) differences in the number and distribution of interspersed repeats, especially those derived from transposable elements (TEs), which occupy about 35% of the human genome (Smit, 1996). For example, the number of Alu family repeats was estimated to be as high as 910 000 in the human genome, and 330 000, 410 000, and 580 000 in the genomes of chimpanzee, gorilla, and orang-utan, respectively. The numbers of KpnI (LINE-1) copies in these genomes are 107 000, 51 000, 64 000, and 84 000, respectively. These results were considered evidence for a large number of insertion and/or deletion events of these DNA sequences during the evolution of higher primates (Benit et al., 1999). The data obtained for different species clearly demonstrate that such repeat insertions and deletions can change the structure of genes, thus leading to new, modified proteins. In addition, they were shown to affect gene regulation, causing changes in the tissue specificity of gene expression which, in turn, can affect embryonic development [Miller and Zeller 1997; reviewed in Britten (1996, 1997), Kidwell and Lisch (1997) and Yoder et al. (1997)]. Moreover, if the elements carry their own genes, they can enrich the genome with new genetic information, such as genes for reverse transcriptase or viral resistance factors (see below).
They can also change the stability of the genome by introducing recombination hot spots (Mighel et al., 1997). In most known cases newly inserted elements have deleterious effects, up to and including hereditary diseases caused by insertion mutations. However, the number of described cases in which retroelement sequences were shown to impart useful traits to the host is constantly growing (Britten 1996; Schulte et al., 1996). In general, TEs can provide sequence motifs for nucleosome positioning, DNA methylation, transcriptional enhancers, poly(A) addition and splice sites, and can add new amino acid codons to open reading frames. Retropositions might thus serve as a major pacemaker of evolution that continues to change gene expression or genome stability. We are interested in the possible role of the most sophisticated TEs, human endogenous retroviruses (HERVs), in human genome evolution. HERV-related sequences probably represent footprints of ancient germ-cell retroviral infections (Steinhuber et al., 1995; Leib-Mosch and Seifarth, 1996; Lower et al., 1996; Patience et al., 1997) and now occupy up to 1% of the human genome. HERVs, which vary in primary structure and abundance, are thought to have been inserted into the germ line at different times between <10 and 60 Myr ago (Steinhuber et al., 1995). Along with nearly full-length HERV elements, HERV-related sequences also include solitary HERV long terminal repeats (LTRs) with no retroviral genes attached. Some HERVs are transcriptionally active (Harris, 1998), and although the genomes of many HERVs are corrupted by termination codons, deletions or frameshift mutations, recent studies have revealed HERV protein expression or the formation of virus-like particles. In particular, HERV elements could affect the human genome through (1) expression of retroviral genes, (2) rearrangement of genome loci following the retroposition of HERVs, or (3) the ability of LTRs to regulate nearby genes (Leib-Mosch and Seifarth, 1996; Lower et al., 1996; Patience et al., 1997). The numerous solitary LTRs contain a variety of transcriptional regulatory elements, such as promoters, enhancers, hormone-responsive elements, and polyadenylation signals. Therefore, LTRs are potentially able to cause significant changes in the expression patterns of nearby genes and can be considered good candidates for causative agents in speciation. Revealing differences in HERV integration sites between the genomes of great apes and man could provide deeper insight into the role of retroviruses in the human–chimpanzee divergence. In this report several HERV-K LTR-containing loci of the human genome are compared with the orthologous loci of apes, the differences in LTR content between these species are demonstrated, and human-specific integrations are described.

2. Materials and methods
2.1. Cosmid clones
Cosmid clones from chromosome 19 (Chr19)- and Chr21-specific libraries were kindly provided by Dr L. Ashworth and Dr A. Carrano (LLNL, USA), and by Dr K. Gardiner, respectively. HERV-K LTRs were identified within the cosmid clones as previously reported (Lavrentieva et al., 1998). Cosmid DNA was prepared with a Wizard Plus Minipreps DNA Purification System (Promega) according to the manufacturer's recommendations.
2.2. Oligonucleotide primers
Oligonucleotide primers for PCR amplification and sequencing were synthesised using a Milligen 7500 DNA synthesiser as described (Vinogradova et al., 1997). The primers used are listed in Table 1.

Table 1
Oligonucleotide primers used to amplify HERV sequences

No.  Sequence (5'→3')                              Designation
Suppression set
1    GTAATACGACTCACTATAGGGCAGCGTGGTCGCGGCCGAGGT    T7Not1 suppression adapter
2    ACCTCGGC                                      Not2 adapter
3    GTAATACGACTCACTATAGGGC                        T7 A1-primer
4    AGCGTGGTCGCGGCCGAGGT                          Not1 A2-primer
Primers specific for U3 termini of the LTRs
5    TGTTTTTGTGAGCTCAAGGTTGGG                      192oR T2-primer
6    TGTTTCAGAGAGCACGGGGTTGGG                      192yR T2-primer
7    AACCTTGATTCAATACAACACATG                      214oR T1-primer
8    AACCCTGAGTTGACACAGCACATG                      214yR T1-primer
Primers specific for U5 termini of the LTRs
9    TCCTCCRTATGCTGAACGCTGGTTCC                    915yF T1-primer
10   TATGCTGAGCGCCGGTCCC                           922oF T1-primer
11   TGAGCGCCGGTCCCCTGGGCC                         927oF T2-primer
12   TGAACGCTGGTTCCCTGGGCC                         927yF T2-primer
Loci-specific primers
     AGTCTGACAGGAATGGAACTGC                        ltr12-F
     CACCACTGCCAGCTCAATC                           ltr12-R
     CTCAATCCATTGCACACTGC                          ltr18-F
     GGTGGAAATTGTGGCCTG                            ltr18-R
     ATGCTCGAAACTACCTGCACTT                        ltr30-F
     ATTATGCAACCTGGGTCTGTCA                        ltr30-R
     CGTGCTAAGAGTTATCCACACC                        ltr31-F
     TGTGTATTTGCTCACTCGCTG                         ltr31-R
     GCTGGAATGGAGGTATTATTGT                        ltr32-F
     AAAGTAACTGCCACTTGTGAAAC                       ltr32-R
     GGCTGGCTTTTCAGGTCG                            ltr41-F
     GTCAGTGGCTGCCTGCTGATTTG                       ltr41-R
     GTGTTTGAGAAGCTCCTGCC                          ltr47-F
     AATCGAGGAACCGGAAGTG                           ltr47-R
     TTCAAGCAGGAAGTCACC                            ltr50b-F
     ACACATGGCGTGTAAAGTC                           ltr50b-R
     CATGGGGAGACAAGCCATC                           ltr69-F
     TGTTGGCCTCAGCGTACC                            ltr69-R
     AAATGACTGATACTAATCCAACCAC                     ltr70-F
     TGGCAGGGACACAGTGAGG                           ltr70-R
     CTCCCATTTTAATTTAGCACCG                        2508-F
     CCTTTGACCTGTTGAAGTGATG                        2508-R
     CCTGGCATACAACACTTAACGT                        0041-F
     CAGGGCCAGGATTTGAAC                            0041-R
     CCAGTGCCACAAGGTCAG                            5612-F1
     CCGATTCCCCATTCATTCCAG                         5612-F2
     AAGAATGGCAGCGTTGATG                           5612-R1
     GTTGATGCCTGTCCCTCTGCC                         5612-R2
     TTGGGATGACCAGTAACCG                           6684-F1
     AGGGAACCAGCGCACACAGC                          6684-F2
     CATCTCTGGGCTAAGGCATC                          6684-R1
     TCAGTCCCACAAAGGCATCAGT                        6684-R2
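Primer 9 contains the IUPAC ambiguity code R (A or G), so it presumably represents a mixture of two sequences. The minimal Python sketch below, whose helper names are our own and not part of the original work, expands a degenerate primer into its concrete variants and reports their GC fraction. It assumes standard IUPAC codes and says nothing about how the primers were actually designed.

```python
from itertools import product
from typing import List

# Standard IUPAC nucleotide codes (a subset sufficient for the primers in Table 1).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "N": "ACGT",
}


def expand_degenerate(primer: str) -> List[str]:
    """Return every concrete sequence encoded by a (possibly degenerate) primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


def gc_fraction(seq: str) -> float:
    """Fraction of G + C bases in a concrete (non-degenerate) sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


primer9 = "TCCTCCRTATGCTGAACGCTGGTTCC"  # oligonucleotide 9 (915yF T1-primer), Table 1
for variant in expand_degenerate(primer9):
    print(variant, f"GC = {gc_fraction(variant):.2f}")
```

Expanding before computing GC content keeps the GC helper simple, since it only ever sees concrete bases.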

2.3. Preparation of adapter-ligated DNA
500 ng of cosmid DNA was digested in 50 μl of restriction buffer containing 20 U of EcoRI, PstI or AluI restriction enzyme at 37°C for 90 min, and then incubated for a further 90 min after addition of 10 U of fresh restriction enzyme. The termini of the fragments were filled in with the Klenow fragment of DNA polymerase under standard conditions. Ligation to adapters was carried out in 30 μl of a buffer containing 50 mM Tris–HCl, pH 7.6, 10 mM MgCl2, 0.5 mM ATP, 10 mM dithiothreitol, 2 mM adapter (oligonucleotides 1 and 2, Table 1) and 5 U of T4 DNA ligase (Life Technologies). Samples were incubated at 16°C overnight, and the reactions were terminated by heating the reaction mixtures at 75°C for 5 min. DNA was then separated from the primers with a QIAquick DNA Purification Kit (Qiagen, CA) and eluted with 50 μl of sterile water.


2.4. LTR-flanking DNA amplification by PCR
35 ng of adapter-ligated DNA was amplified using 5 pmol each of the A1 and T1 primers and AmpliTaq DNA Polymerase (PerkinElmer Cetus) in a standard PCR medium: 50 mM KCl, 10 mM Tris–HCl (pH 9.0), 2.5 mM MgCl2, 0.2 mM each dNTP, in a final volume of 25 μl. The T1 primer (7, 8, 9, or 10; Table 1) was selected according to the priming direction and the LTR structure. Thermocycling conditions were 30 s denaturation at 94°C, 30 s annealing at 60°C and 40 s elongation at 72°C, for 17 cycles (OmniGene thermocycler, Hybaid, UK). The products of the first PCR step were diluted 1000-fold and amplified in a second PCR round using 10 pmol of the A2 and T2 primers (oligonucleotides 5, 6, 11, or 12, depending on the primers used in the first round). The PCR conditions were the same as in the first round, except that the number of amplification cycles was 20–25. The resulting PCR products were analysed by electrophoresis in a 2% agarose gel.
2.5. Sequencing
Templates for sequencing were obtained in the second round of PCR amplification using the A2 and T2 primers. PCR products were purified using a QIAquick-spin PCR Purification Kit (QIAGEN). PCR fragments were sequenced manually with the fmol Sequencing System (Promega) using A2 and T2 primers labelled with [γ-32P]ATP and polynucleotide kinase. Complementary strand sequences were aligned using the DNAsis and Gene Runner programs.
2.6. Sequence analysis
LTR flanking sequences were analysed with the advanced BLAST and standard RepeatMasker programs. LTR searching and extraction, preparation of LTRs, alignment, and alignment refinement using the Clustal, GDE, GeneDoc and Phylip programs were performed as described previously (Lavrentieva et al., 1998).
2.7. Genomic PCR
10 ng of DNA purified from human or primate blood samples was PCR amplified (27–32 cycles) using 10 pmol each of specific primers and AmpliTaq DNA Polymerase (PerkinElmer Cetus).
2.8. Primate sequences analysed
DNA templates purified from the following species were obtained through the Gene Bank of Primates.

Species                                  No. of samples
Homo sapiens                             20
Pongidae and Hylobatidae
  Chimp (Pan troglodytes)                5
  Gorilla (Gorilla gorilla)              4
  Orang-utan (Pongo abelii)              3
  Gibbon (Hylobates syndactylus)         1
  Gibbon (Hylobates lar)                 2
Old World monkey
  Macaca arctoides                       1
  Macaca mulatta                         2
  Mandrillus sphinx                      2
  Papio hamadryas                        3
  Colobus guereza                        1
New World monkey
  Callimico goeldii                      2
  Callithrix pygmaea                     2
  Saimiri sciureus                       2
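The two-round nested PCR of Section 2.4 can be written down as a small data structure, which makes the pairing of first- and second-round LTR primers explicit. This Python sketch is illustrative only: the class and function names are hypothetical, the second round is given 25 cycles (the upper end of the stated 20–25), and ramp times and any initial or final holds (not described in the text) are ignored. Of the primer pairings, only 8→6 and 9→12 are stated in the Fig. 2 legend; 7→5 and 10→11 are our inference from the o/y designations in Table 1.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: int
    seconds: int


@dataclass(frozen=True)
class PcrRound:
    primers: str
    cycles: int
    input_dilution: int  # fold dilution of the template taken into this round


# Cycling steps from Section 2.4; the same steps are used in both rounds.
CYCLE: Tuple[Step, ...] = (
    Step("denaturation", 94, 30),
    Step("annealing", 60, 30),
    Step("elongation", 72, 40),
)

ROUND_1 = PcrRound(primers="A1 + T1", cycles=17, input_dilution=1)
# The text gives 20-25 cycles; the upper bound is used here (assumption).
ROUND_2 = PcrRound(primers="A2 + T2", cycles=25, input_dilution=1000)

# First-round T1 oligonucleotide -> nested second-round T2 oligonucleotide (Table 1 numbers).
# 8 -> 6 and 9 -> 12 follow the Fig. 2 legend; 7 -> 5 and 10 -> 11 are inferred (assumption).
NESTED_T2: Dict[int, int] = {7: 5, 8: 6, 9: 12, 10: 11}


def programmed_time_s(rnd: PcrRound) -> int:
    """Time spent at the programmed temperatures, ignoring ramps and initial/final holds."""
    return rnd.cycles * sum(step.seconds for step in CYCLE)


def nested_primer(t1: int) -> int:
    """Return the T2 oligonucleotide to pair with a given first-round T1 oligonucleotide."""
    if t1 not in NESTED_T2:
        raise ValueError(f"oligonucleotide {t1} is not a first-round T1-primer")
    return NESTED_T2[t1]


for rnd in (ROUND_1, ROUND_2):
    minutes = programmed_time_s(rnd) / 60
    print(f"{rnd.primers}: {rnd.cycles} cycles, {minutes:.1f} min at programmed temperatures")
```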

3. Results and discussion
3.1. Average ages of different HERV-K (HML-2) LTR branches
Phylogenetic analysis shows that most HERVs entered the genome early in primate evolution: most HERV families entered and/or were amplified in the germ line after the separation of Old and New World monkeys. Therefore, their age can be estimated as 30–50 Myr (Leib-Mosch and Seifarth, 1996; Lower et al., 1996). However, some HERV-related sequences were detected in New World monkeys, and these are older than 45 Myr (Simpson et al., 1996). There are data indicating that some HERVs might have been integrated into the genome even earlier, more than 60 Myr ago, i.e. before the divergence of prosimians and New World monkeys (Anderssen et al., 1997). In our previous work we classified human HERV-K retroviral LTRs into groups according to their divergence (Lavrentieva et al., 1998). The accumulation of new LTR sequences in databases allowed us to improve the previous classification and to identify additional branches (see Table 2). For example, the previously uniform K branch was subdivided into two closely related but distinct branches, K1 and K2, which share some diagnostic substitutions and differ in others. This refinement made the branches more homogeneous, allowing us to deduce more reliable consensus sequences and, as a result, to calculate intragroup divergences more accurately. These divergences can be used to calculate the age of the branch ancestor (master or source) gene, whose retropositions gave birth to the branch members. Such calculations are possible provided that the average rate of divergence is the same for different branches, as observed for other retroelements. Another prerequisite is knowledge of the LTR mutation

Table 2
Calculated average divergences within HERV-K LTR subfamilies and branches, and estimated time of their ancestors' insertion a

Subfamily   Av. internal     Branch   No. of LTRs   Av. internal        Estimated time (T) of
            divergence (%)            in branch     divergence D (%)    master gene insertion (Myr)
LTR I       21               I-E      17 (11) b     8.7                 33.5
                             I-D      3             6.5                 25.0
                             I-S      8             10.4                40.0
                             I-P      7             9.2                 35.4
                             I-K1     7             11.2                43.1
                             I-K2     10            13.8                53.1
                             I-Y      12            10.7                41.2
                             I-A      17            11.3                43.5
                             I-I      5             6.1                 23.5
                             I-X      5             7.7                 29.6
LTR II c                     II-H     4             6.5                 25.0
                             II-O     6             8.6                 33.1
                             II-N     6             5.7                 21.9
                             II-V     14            8.2                 31.5
                             II-T     12            4.2                 16.2
                             II-L1    5             2.7                 10.4
                             II-L2    6             2.5                 9.6
                             II-L3    10            1.6                 6.2
                             II-L4    6 (4) b       0.9                 3.5

a The mutation rate of 0.13%/Myr was used for the group age calculations by the formula T = D/(2 × 0.13), where D is the divergence value (%) and T is the time (Myr) passed since the integration event. The factor 2 is used because the average divergence values presented in Table 2 correspond to the average of the differences between each pair of LTRs in a given group, which is expected to be twice the average number of mutations accumulated by each LTR during its evolution.
b Two numbers are given in this cell: the total number of entries and, in brackets, the number of sequences excluding recombinant ones.
c Analysis revealed an additional group, called II-B, consisting of three LTRs with an internal divergence of 11.9%. However, based on diagnostic mutations, this group can be assigned to subfamily II of LTRs, which has considerably lower intragroup divergence. The reason(s) for this discrepancy remain unknown, and the group may be further split into several subgroups as more LTR sequences become available. At this point we do not consider this group.

rate. Unfortunately, mutation rates vary considerably for different genome constituents. For instance, a rate of 0.273%/Myr was determined for the α-enolase pseudogene, whereas quite different rates were reported for other pseudogenes: 1.26% for γ-globin, 0.43% for a lactate dehydrogenase, and 0.1% for α- and β-globin [reviewed in Minghetti and Dugaiczyk (1993)]. Finally, Britten (1994) used a mutation rate for Alu repeats of 0.13–0.16%/Myr. Authors working with HERVs often use different mutation rates for HERV age estimations. For instance, Mager and Freeman (1995), Anderssen et al. (1997), and we in our previous work used rates of 0.12%/Myr, 0.2%/Myr and 0.26%/Myr, respectively. Here we estimated an average mutation rate for LTRs using well-documented divergences either among LTRs belonging to the same retrovirus, when the time of integration was known from phylogenetic analysis, or among orthologous LTRs in different species. In the first case the LTRs were probably identical at the time of retroelement integration (Dangel et al., 1995) and then independently accumulated mutations, the number of which should increase with the time passed since the insertion event. Therefore, the differences between them can be used to calculate mutation rates if the ERV insertion time is known or can be estimated. Table 3 shows examples of divergences between the 5'- and 3'-LTRs flanking an ERV sequence, and the corresponding calculated mutation rates, for the HERV-K(C4) element located in the complement C4 loci and detected in the human genome as well as at the same sites in the genomes of some other primates (Dangel et al., 1995). The divergences between the 5'- and 3'-LTRs of HERV-K(C4) were calculated for human (9.1%), orang-utan (10.1%), and Old World monkeys (10.5%) (Dangel et al., 1995). According to the authors, the virus integration occurred after the divergence of New World monkeys, i.e. around 45 Myr ago (see footnote c to Table 3). Therefore, the mutation rates can be calculated from these data by the formula r = D/(2T), where D is the divergence value (%) and T the time (Myr) passed since the integration event, i.e. in this case 45 Myr. The factor 2 in the denominator is used because the differences between the LTRs are the sum of the mutations in both LTRs. On the other hand, the divergences between orthologous LTRs of HERVs in different species (i.e. between 3'-LTRs or between 5'-LTRs integrated in the same


Table 3
Intra- and inter-species percent divergence of the HERV-K(C4) LTRs (Dangel et al., 1995) and calculated mutation rates (%/Myr) a

Species b   Hsa(5')      Hsa(3')      Ppy(5') c    Ppy(3')      OWm(5')
Hsa(3')     9.1 (0.10)
Ppy(5')     2.4 (0.09)
Ppy(3')                  5.5 (0.21)   10.1 (0.11)
OWm(5')     7.2 (0.13)                8.7 (0.16)
OWm(3')                  8.6 (0.15)                8.7 (0.16)   10.5 (0.12)

a The figures in brackets represent mutation rate values (% per Myr) calculated as D/2T (see explanation in Section 3.1).
b Hsa, humans; Ppy, orang-utan; OWm, Old World monkey (Dangel et al., 1995).
c The branching data for primate evolution were averaged from three estimates (Sibley and Ahlquist 1987; Britten, 1994; Takahata and Satta, 1997): New World monkeys 45 Myr; Old World monkeys 28 Myr; gibbon 18 Myr; orang-utan 13 Myr; gorilla 8 Myr; chimpanzee 5.6 Myr.

positions in the human and ape genomes) allow one to estimate the mutation rate independently using the same formula, but with a T value corresponding to the time passed since the split of the species under comparison. For example, assuming that Old World monkeys branched off 28 Myr ago, the divergence of 7.2% between the 5'-LTRs of humans and Old World monkeys corresponds to a mutation rate of 0.13%/Myr. The figures calculated in this way are presented in Table 3. Similar data taken from Mager and Freeman (1995) for LTRs belonging to the same proviruses of the HERV-H superfamily gave values of 0.1% for HERV-H(cH-4) and 0.12% for RTVL-H3. The average mutation rate of 0.13%/Myr obtained for LTRs from all the data above is very close to the rate for Alu sequences (Britten 1994). We also calculated the mutation rate using the HERV-K LTR intrabranch average sequence divergences between LTRs belonging to different HERVs reported by Medstrand and Mager (1998). Dividing these divergences by the phylogenetic time of emergence of the corresponding group gave an independent estimate of the average mutation rate, 0.12%/Myr, which is in good agreement with the values obtained above. Medstrand and Mager (1998) demonstrated that intrabranch divergences were roughly proportional to group age. This important observation means that different branches evolved at similar rates, i.e. the majority of members of different branches were under the same selective pressure in the genome. Using the value of 0.13%/Myr for the LTR mutation rate, the ages of the different branches were calculated (Table 2). Along with very old branches (as old as 50 Myr), there were also young groups aged about 3–6 Myr (II-L4 and II-L3, respectively). A broad spectrum of ages was also demonstrated for the HERV-H LTR superfamily (Anderssen et al., 1997) and for another set of HERV-K (HML-2) family LTRs (Medstrand and Mager, 1998).
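The two formulas used in this section are inverses of each other: r = D/(2T) gives a rate from a divergence of known age, and T = D/(2r) gives an age from a divergence and a rate. The Python sketch below, with hypothetical function names, reproduces some of the Table 3 rates and Table 2 ages from the published D and T values. The mean pairwise divergence helper is our own assumption: it uses uncorrected p-distances over gap-free columns, which may differ from the distance measure actually used in the Phylip-based analysis.

```python
from itertools import combinations
from typing import Sequence


def mutation_rate(divergence_pct: float, time_myr: float) -> float:
    """Rate r (%/Myr) from divergence D (%) accumulated over T Myr: r = D / (2T).

    The factor 2 reflects independent mutation of both compared copies."""
    return divergence_pct / (2 * time_myr)


def insertion_age(divergence_pct: float, rate_pct_per_myr: float = 0.13) -> float:
    """Age T (Myr) of a branch master gene from the mean pairwise divergence D: T = D / (2r)."""
    return divergence_pct / (2 * rate_pct_per_myr)


def mean_pairwise_divergence(aligned: Sequence[str]) -> float:
    """Mean uncorrected p-distance (%) over all pairs of aligned sequences.

    Columns with a gap ('-') in either sequence of a pair are skipped."""
    if len(aligned) < 2:
        raise ValueError("need at least two sequences")
    distances = []
    for a, b in combinations(aligned, 2):
        columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
        if not columns:
            raise ValueError("a pair of sequences shares no ungapped column")
        mismatches = sum(1 for x, y in columns if x != y)
        distances.append(100.0 * mismatches / len(columns))
    return sum(distances) / len(distances)


# Human 5'- vs 3'-LTR of HERV-K(C4): D = 9.1%, T = 45 Myr (Table 3)
print(round(mutation_rate(9.1, 45), 2))   # 0.1
# Human vs Old World monkey 5'-LTRs: D = 7.2%, T = 28 Myr (Table 3)
print(round(mutation_rate(7.2, 28), 2))   # 0.13
# Branch II-L4: D = 0.9% (Table 2)
print(round(insertion_age(0.9), 1))       # 3.5
# Toy alignment of three short fragments (illustrative data, not from the paper)
print(round(mean_pairwise_divergence(["ACGTACGT", "ACGTACGA", "ACGAACGA"]), 1))  # 16.7
```

Dividing by 2T rather than T matters: without the factor 2, every rate in Table 3 would double.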
Though the time uncertainty is high, as discussed in a previous paper (Lavrentieva et al., 1998), the estimates in Table 2 suggest that the LTRs of the youngest group, II-L, emerged in the human genome about 3–6 Myr ago, approximately at the time of the split between the hominid and chimpanzee lineages. Other groups were integrated in the genome earlier, at different times (Table 2). We suggest that many of the old LTRs should be in orthologous positions in the genomes of all hominoids, though one could also expect some differences in integration sites among primate species if retroposition of these old elements continued after branching events. Moreover, the human genome is expected to contain a greater proportion of members of the youngest groups. To check these indirect conclusions and to confirm the presence or absence of the LTR at a given site, sequence analysis of the LTR integration sites in different species is required. Accordingly, we sequenced the flanking DNA of some LTRs integrated at different positions of the human genome. In addition, LTR flanking sequences can already be found in databases. These sequences were used for PCR analysis of genomic DNA from different primate species.
3.2. Isolation of the LTR flanking sequences
The procedure used for the isolation of flanking sequences was based on the PCR-suppression effect (PS-effect; Siebert et al., 1995) (see Fig. 1). LTR-containing cosmid DNAs were digested with a restriction enzyme (R) and tagged with an adapter consisting of a pair of complementary oligonucleotides (1 and 2, Table 1) of unequal lengths. After filling in with DNA polymerase, the termini of each adapter-ligated restriction fragment are converted into inverted repeats (Fig. 1, line B). As a result, single-stranded restriction fragments possess self-complementary termini capable of forming intramolecular stem-loop structures (Fig. 1, line C). Moreover, the ligated adapter is a long GC-rich oligonucleotide (40 nt) that facilitates and strengthens the


Fig. 1. A scheme of the use of the PCR suppression effect for amplification of LTR flanking regions. (A) Schematic representation of a genomic DNA region containing an LTR. Vertical lines marked R designate restriction endonuclease recognition sites; grey boxes, positions of LTRs. (B) DNA fragments with ligated suppression adapters. Open boxes designate short oligonucleotides complementary to the 3'-end of the 40 nt T7Not1 suppression adapter; two light-shaded boxes mark the parts of the adapter corresponding to the A1 and A2 primers (Table 1). (C) Pan-handle structures formed by single-stranded DNA fragments arising at the denaturation step. Dark-shaded boxes designate the ends filled in by Taq DNA polymerase before the first denaturation step and complementary to the adapter. Positions of the A1 and T1 primers in the pan-handle structure are indicated by arrows with the corresponding symbols. (D) PCR fragments with different termini formed through amplification with T1 and A1 primers. (E) PCR fragments obtained by nested PCR using A2 and T2 primers.

pairing of its self-complementary ends. Consequently, PCR amplification of such DNA fragments using only the A1-primer, which corresponds to the outermost parts of the termini, will be suppressed. However, PCR will take place when two primers are used simultaneously: the A1-primer and a T-primer complementary to a single-stranded part of the stem-loop structure of the fragment (Fig. 1, line C, left). In this case the newly synthesised PCR products will have two different, non-self-complementary termini unable to form stem-loop structures (Fig. 1, line D).

These PCR products are not subject to the PS-effect and thus can be efficiently amplified using the A1 and T primer pair. To increase the specificity of amplification, nested PCR with the A2 and T2 primers was used (Fig. 1, line E). This mechanism ensures efficient amplification of only those fragments that contain the targeted sequence. Fig. 2 shows an example of the two-step PCR. The technique allowed us to amplify LTR flanking sequences specifically for primary structure determination.
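Because suppression PCR amplifies only the stretch between the LTR-specific T-primer and the nearest adapter-tagged restriction end, the expected product size can be predicted from a known flanking sequence. The Python sketch below is a hypothetical illustration, not the authors' procedure: it uses the recognition sites and cut positions of the three enzymes from Section 2.3 (EcoRI G^AATTC, PstI CTGCA^G, AluI AG^CT), ignores the few nucleotides added by filling in sticky ends, and treats the adapter-derived length as a free parameter.

```python
from typing import List, Optional

# Recognition site and top-strand cut offset within the site for the enzymes of Section 2.3.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),  # G^AATTC
    "PstI": ("CTGCAG", 5),   # CTGCA^G
    "AluI": ("AGCT", 2),     # AG^CT (blunt)
}


def cut_sites(seq: str, enzyme: str) -> List[int]:
    """0-based top-strand indices i such that the enzyme cuts between bases i-1 and i."""
    site, offset = ENZYMES[enzyme]
    seq = seq.upper()
    cuts = []
    hit = seq.find(site)
    while hit != -1:
        cuts.append(hit + offset)
        hit = seq.find(site, hit + 1)
    return cuts


def expected_product(seq: str, primer_5p: int, strand: str, enzyme: str,
                     adapter_nt: int = 0) -> Optional[int]:
    """Predicted length (nt) of the T-primer/A-primer product, or None if no product.

    primer_5p  -- top-strand index of the base matching the primer's 5' end
    strand     -- '+' if the primer extends toward higher indices, '-' otherwise
    adapter_nt -- length contributed by the adapter up to the A-primer (hypothetical)
    """
    cuts = cut_sites(seq, enzyme)
    if strand == "+":
        ahead = [c for c in cuts if c > primer_5p]
        return min(ahead) - primer_5p + adapter_nt if ahead else None
    if strand == "-":
        behind = [c for c in cuts if c <= primer_5p]
        return primer_5p + 1 - max(behind) + adapter_nt if behind else None
    raise ValueError("strand must be '+' or '-'")


# Toy flank: EcoRI sites start at indices 10 and 26, so the cuts fall at 11 and 27.
toy = "C" * 10 + "GAATTC" + "T" * 10 + "GAATTC" + "G" * 4
print(cut_sites(toy, "EcoRI"))                  # [11, 27]
print(expected_product(toy, 12, "+", "EcoRI"))  # 15
print(expected_product(toy, 20, "-", "EcoRI"))  # 10
```

A None result corresponds to the case where no restriction end lies in the priming direction, so no adapter is available for the A-primer and no product is expected.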


Fig. 2. Specific PCR amplification of the genomic DNA flanking the LTR within cosmid R30306. Cosmid R30306, containing a Chr19q12 DNA fragment with LTR 31 (see Fig. 3), was digested with EcoRI, and the restriction fragments were ligated to the adapter (oligonucleotides 1 and 2, Table 1). (A) Schematic representation of an LTR with its U3, R and U5 regions and the genomic flanking regions. The designations of the A primers used for the two-step PCR amplification correspond to those in Fig. 1. The LTR-specific primers used for amplification of the U3 and U5 flanks are marked T1r, T2r and T1f, T2f, respectively. (B) Specific PCR amplification of the LTR flanking region adjacent to the U3 terminus of the LTR. In the first PCR step, the A1-primer (oligonucleotide 3, Table 1), corresponding to the 5'-outermost part of the ligated adapter, and the T1r-primer targeted at the U3 region of the LTR (oligonucleotide 8, Table 1; Fig. 1) were used. The PCR product generated in the first PCR step (lane R1) was re-amplified with the T2r (oligonucleotide 6, Table 1) and A2 (oligonucleotide 4) primers. The resulting PCR product is shown in lane R2. (C) Specific amplification of the U5 flanking region of the LTR. As in (B), but with primers T1f (oligonucleotide 9, Table 1) and T2f (oligonucleotide 12, Table 1) corresponding to the U5 region of the LTR. The first-step (lane R1) and second-step (lane R2) PCR products are shown. The figures on the right of the gel images indicate the molecular masses of the marker (M).

3.3. Sequence features of the LTR flanks
The LTR flanking sequences obtained (Fig. 3) could be subdivided into two categories: (A) rather long unique sequences on both sides of the LTR and (B) sequences containing representatives of known families of interspersed repeated elements, such as Alu, LINE or some retroviral genes, on one or both sides of the LTR. So far, only five of the 15 LTRs analysed here appear to be integrated into unique genomic sequences. Analysing the data available in the databases, we found that about 75% of LTRs detected


Fig. 3. Strategy of LTR flank sequencing. DNA restriction fragments used as templates for PCR amplification are designated by bold horizontal lines; the sites of EcoRI (RI), AluI (AI) or PstI (PI) are indicated at the corresponding ends of the fragments. Grey boxes show positions of LTRs; U3, R and U5 regions of LTRs are indicated within the boxes. Repetitive elements within DNA fragments are designated by variously shaded boxes, their types being indicated under the boxes. Arrows designate directions of sequencing, and their lengths approximate the lengths of the sequences obtained. Lengths of several large LTR flanking fragments are indicated under the corresponding parts of the fragments. Locations of LTRs on human chromosomes are shown at the right ends of DNA fragments. The designations of the LTRs are shown in brackets on the right of the figure. Locations of sequences homologous to retroviral gag and env genes are indicated.

in the human genomic sequences were integrated within or less than 200 bp from other repeats, whereas only 25% were integrated more than 200 bp away from other known repeats (P. Khil, unpublished results). This distribution reflects the known tendency of retroposons to be reiteratively inserted into or next to pre-existing retroelements (Kidwell and Lisch 1997). Frequent coincidences of HERVs and their LTRs with Alu and LINE repeats in the human genome were also noticed earlier by us and other authors (Baban et al., 1996; Khil et al., 1997). The extent of reiterative integration into pre-existing elements is sometimes very impressive. For example, over 50% of the maize genome is represented by retroelements (Kidwell and Lisch, 1997). However, the maize genome was not disrupted, because the highly repetitive elements were mostly targeted at intergenic


Fig. 4. Examples of PCR amplification of genomic DNAs from different primate species with primers corresponding to sequences flanking individual LTRs in the human genome. A chromosome 19 ideogram with the mapped HERV-K (HML-2) LTRs (triangles) is depicted in the upper part of the figure. The results of amplification of the LTR-containing loci are as follows. (A) Locus containing LTR41 (Table 4). The F1 (LTR41-F, Table 1) and R1 (LTR41-R, Table 1) primers for amplification are shown as arrows together with the lengths of the LTR flanks including the primer sequences. (B) LTR50b locus (Table 4). The primers for amplification were F2 (LTR50b-F, Table 1) and R2 (LTR50b-R, Table 1). Other designations as in (A). (C) LTR70 locus (Table 4). The primers for amplification were F3 (LTR70-F, Table 1) and R3 (LTR70-R, Table 1). Expected lengths of the PCR products for LTRs integrated in the loci are marked over the double-headed arrows.

Y.B. Lebedev et al. / Gene 247 (2000) 265277 Table 4 Integration of individual HERV-K (HLM-2) LTRs in primate genomes LTR number a LTR branch Accession Chromosome location Primate speciesb Hu 50b 30 32 31 41 47 69 18 70 12 II-L3 II-L2 II-L2 II-T II-N II-V II-O II-O I-E I-P I-P I-Y I-Y II-B AC002508 L47334 AF017229 AB005612 AC005524 AF017223 bc52374 AC006684 AC006129 AC006115 AP000041 AC007204 AC003682 AF017186 7q31.17q31.2 19q13.2 19q12.0 21q22.2 19q12.0 19q12.0 19q13.1 21q22.3 19q13.2 19q13.4 21q21 19p12.0 19q13.4 19p13.1 + + + + + + + + + + + + + + Ch + + + + + + + + + + + Gor + + + + + + + + + + + Oran + + + + + + + + + Gib + ? + + + + + + OWm + + + + NWm ? ? /?d /?d +/?e +/?e <5.6 <5.6 <5.6 813 813 1828 1318 <18 1828 1828 ~45 2845 2845 ~45

275

Integration timec (Myr)

a Numbering for chromosome 19 is according to the map published by us earlier ( Vinogradova et al., 1997). The same designations for corresponding loci in Fig. 3 are given in brackets. b + means successful PCR amplication, indicates the presence of a short PCR product with the length corresponding to the site lacking the LTR, and symbol ? designates no PCR fragment detected, possibly due to mismatches between genomic and primer DNAs. c Branching data for primate evolution see in the footnote to Table 3. d Here three species of the New World monkeys gave dierent results: Callimico goeldii and Callithrix pigmae produced short PCR fragments lacking LTRs, whereas Saimiri sciureus did not reveal any PCR amplication. e Here three species of the New World monkeys gave dierent results: Callimico goeldii and Callithrix pigmae produced long LTR-containing PCR fragments, whereas Saimiri sciureus did not reveal any PCR amplication.

regions, and a multitude of them was nested within other elements ( Kidwell and Lisch, 1997). From this point of view it seems unlikely that the emergence of new LTRs within such dumps could cause signicant evolutionary changes in the genome. Therefore, for further analysis we have chosen the LTRs integrated into unique genomic regions which seem to have more chances of being of evolutionary importance. 3.4. Phylogenetic analysis of the LTR-containing loci The choice of such orphan LTRs for phylogenetic analysis was dictated also by technical reasons, since the use of unique primers allows one to produce unique products during the whole genome PCR amplication. Four such isolated LTRs (3032 and 21q22.3) were taken from the list in Fig. 3. Other LTRs were chosen from the sequences determined by us earlier or available in the GeneBank. A list of the LTRs used is given in Table 4. The primers specic for the LTRs integration sites were synthesised and used for amplication of the genomic DNA from dierent primate species. The results are exemplied in Fig. 4. The gures show that the amplication of genomic DNAs with the primers corresponding to the sites anking LTR30 and LTR50b of the II-L group gave the expected long PCR product only in the case of human DNA ( Fig. 4B ). The products with DNAs from

chimpanzee, gorilla and orang-utan were shorter and corresponded to the same site but lacking LTR. The LTR integrations into these loci thus seem to be human specic and occurred probably less than 5.6 Myr ago. Amplication of ape DNAs with the primers designed for the human sequence suggests high conservation of these orthologous sites. Fig. 4A gives another example where the amplication of a long product was observed with human, chimp, gorilla and orang-utan but not with gibbon, guenon and marmoset DNAs. Therefore, this integration occurred within 1318 Myr ago, the integration site being also rather conserved. This LTR (LTR41) belongs to the II-O branch ( Table 2, Fig. 5). Its master gene appeared about 33.2 Myr ago and was probably still active when the integration of LTR41 occurred. Another possibility is that LTR41 was passively transferred by duplication of a human genome section containing this LTR. Such an event would mean transfer of a very short genomic fragment containing a solitary LTR within the genomic sequence of about 100 bp which lies between 3 ends of the primers used for PCR amplication, so it seems rather improbable. The third example in Fig. 4C concerns an even older LTR of the I-Y branch with the putative master gene integrated about 42 Myr ago. Here the long PCR product was observed down to some Old World monkeys, thus this LTR must be older than 28 Myr. In such a way the ages for several dierent LTR integrations were determined and summarised in Table 4 and Fig. 5.
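The bracketing reasoning used above (a long product in a species means the LTR predates that lineage's split from humans; an empty-site product means it integrated afterwards) can be sketched in a few lines. This is a minimal Python illustration, not the authors' procedure: the split times (5.6, 8, 13, 18, 28 and 45 Myr) are assumptions read off the ranges in the last column of Table 4, the function name `bracket_integration` is hypothetical, and results are coded as in footnote b of Table 4 ('+' long product, '-' short empty-site product, '?' no product).

```python
from typing import Optional

# Assumed split times (Myr) of each lineage from the human lineage, inferred
# from the integration-time ranges in Table 4; closest lineage first.
DIVERGENCE_MYR = [
    ("Ch", 5.6),
    ("Gor", 8.0),
    ("Oran", 13.0),
    ("Gib", 18.0),
    ("OWm", 28.0),
    ("NWm", 45.0),
]


def bracket_integration(pattern: dict[str, str]) -> tuple[float, Optional[float]]:
    """Return (min_age, max_age) in Myr for an LTR present in humans.

    pattern maps a lineage to '+', '-' or '?'. A shared LTR ('+') must predate
    that lineage's split (raises the minimum age); an empty site ('-') means
    the LTR integrated after that split (caps the maximum age). '?' results
    and missing lineages are uninformative. max_age is None if unconstrained.
    """
    min_age = 0.0
    max_age: Optional[float] = None
    for lineage, split in DIVERGENCE_MYR:
        result = pattern.get(lineage, "?")
        if result == "+":
            min_age = max(min_age, split)
        elif result == "-":
            max_age = split if max_age is None else min(max_age, split)
    if max_age is not None and min_age > max_age:
        # e.g. an LTR shared with monkeys but absent from gibbon
        raise ValueError("inconsistent pattern: LTR shared beyond an empty site")
    return min_age, max_age


# Pattern described for LTR41 (Fig. 4A): present down to orang-utan,
# empty site in gibbon, guenon (OWm) and marmoset (NWm).
ltr41 = {"Ch": "+", "Gor": "+", "Oran": "+", "Gib": "-", "OWm": "-", "NWm": "-"}
print(bracket_integration(ltr41))  # (13.0, 18.0)
```

For a human-specific pattern such as that of LTR30 or LTR50b (empty sites in chimpanzee, gorilla and orang-utan) the sketch returns (0.0, 5.6), i.e. the "<5.6" entry. Skipping '?' results is a deliberate choice here: a failed PCR says nothing about the presence or absence of the LTR.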


Fig. 5. Subfamilies and branches of HERV-K LTRs. Arrows labelled with the corresponding branch names indicate the estimated ages at which propagation started. Dotted squares and open circles at the ends of the arrows designate the LTR-I and LTR-II subfamilies, respectively. An evolutionary tree of the primate lineage is presented at the bottom of the figure. Arrows on the tree mark the times of LTR insertion in the loci, with the locus names added at the arrow tops; dark and light arrows designate LTRs belonging to the LTR-I and LTR-II subfamilies, respectively.
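Fig. 5 sets two kinds of age side by side: branch (master gene) ages derived from sequence divergence, and insertion times bracketed phylogenetically. A minimal sketch of that comparison follows. It assumes that a branch age is the mean divergence of the branch members from their consensus divided by the average rate of 0.13% per Myr given in the Abstract, with divergence accumulating along a single lineage; whether the paper applies the rate in exactly this way is an assumption. It also assumes that an LTR copied from a master gene cannot be older than that master gene. All function names are hypothetical.

```python
# Average LTR mutation rate (% per Myr) reported in the Abstract.
RATE_PERCENT_PER_MYR = 0.13


def age_from_divergence(divergence_percent: float,
                        rate: float = RATE_PERCENT_PER_MYR) -> float:
    """Age in Myr implied by a mean divergence (%) from a branch consensus.

    Assumption: divergence accumulated along one lineage at a constant rate.
    """
    return divergence_percent / rate


def divergence_for_age(age_myr: float,
                       rate: float = RATE_PERCENT_PER_MYR) -> float:
    """Inverse of age_from_divergence: expected divergence (%) for an age."""
    return age_myr * rate


def consistent(master_gene_age: float, min_integration_age: float) -> bool:
    """True if the LTR's phylogenetic minimum age does not exceed the age of
    its master gene (an LTR cannot predate the gene that produced it)."""
    return min_integration_age <= master_gene_age


# Master gene ages quoted in the text; minimum integration ages from Table 4.
checks = {
    "LTR41 (II-O)": (33.2, 13.0),
    "LTR70 (I-Y)": (42.0, 28.0),
}
for name, (branch_age, min_age) in checks.items():
    print(name, consistent(branch_age, min_age))
print(round(divergence_for_age(33.2), 2))  # 4.32
```

Under these assumptions both examples are consistent, and a 33.2 Myr master gene corresponds to roughly 4.3% divergence from the branch consensus. A check of this kind only flags impossible combinations; it cannot detect the milder ordering discrepancies between branches discussed below, because an individual LTR may integrate long after its master gene appeared.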

The phylogenetically determined time at which a particular LTR appeared in the primate genome is a property of that individual LTR and differs from the age of its group (Table 2), which was evaluated from intragroup sequence divergence. Nevertheless, the integration times estimated for individual LTRs (last column of Table 4) are generally in accord with the figures obtained from the internal divergence of the LTR branches (Table 2): LTRs belonging to older groups appear earlier in primate evolution. The exceptions are the LTRs of the II-O and II-V branches. The phylogenetically determined age of the II-V branch representatives is less than that of two II-O branch LTRs, although the V branch is somewhat older than the O branch. This discrepancy may reflect the difference between the age of the master gene, determined from intrabranch divergence, and the ages of individual LTRs: the integration time of an individual LTR can differ considerably from the time at which its master gene appeared. The results also demonstrate that some LTRs are integrated at human genomic loci but are absent from the orthologous loci of other primates.

Such human-specific integrations could be important for understanding speciation in the human lineage. Until recently, only SINE-R.C2, a member of the HERV LTR-related family of short interspersed SINE-R repeats, had been shown to be human specific (Steinhuber et al., 1995). While this work was in progress, Medstrand and Mager (1998) demonstrated human-specific integration of another set of LTRs, suggesting that human-specific LTR integration is not a rare event. Identifying the genes that may be regulated by LTRs, and assessing the actual contribution of LTRs to their regulation, will be the next step in understanding the role of LTRs in the evolution and regulation of the human genome.

Acknowledgements

The authors are indebted to Christian Roos for providing primate DNA samples from the Gene Bank of Primates, jointly supported by the Department of Medical Genetics of the Ludwig-Maximilians-University, Munich, and the German Primate Centre, Goettingen. The authors thank Dr B. Glotov for helpful discussions and assistance in preparing the manuscript. The work was supported by grant 98-04-48798 of the Russian Foundation for Basic Research, HHMI International Research Scholars award 75195-544201, and partly by the Human Genome State Project of Russia and INTAS-96-1710.

References
Anderssen, S., Sjøttem, E., Svineng, G., Johansen, T., 1997. Comparative analyses of LTRs of the ERV-H family of primate-specific retrovirus-like elements located from marmoset, African green monkey and man. Virology 234, 14–30.
Baban, S., Freeman, J.D., Mager, D.L., 1996. Transcripts from a novel human KRAB zinc finger gene contain spliced Alu and endogenous retroviral segments. Genomics 33, 463–472.
Benit, L., Lallemand, J.B., Casella, J.F., Philippe, H., Heidmann, T., 1999. ERV-L elements: a family of endogenous retrovirus-like elements active throughout the evolution of mammals. J. Virol. 73, 3301–3308.
Britten, R.J., 1994. Evidence that most human Alu sequences were inserted in a process that ceased about 30 million years ago. Proc. Natl. Acad. Sci. USA 91, 6148–6150.
Britten, R.J., 1996. DNA sequence insertion and evolutionary variation in gene regulation. Proc. Natl. Acad. Sci. USA 93, 9374–9377.
Britten, R.J., 1997. Mobile elements inserted in the distant past have taken on important functions. Gene 205, 177–182.
Chou, H.H., Takematsu, H., Diaz, S., Iber, J., Nickerson, E., Wright, K.L., Muchmore, E.A., Nelson, D.L., Warren, S.T., Varki, A., 1998. A mutation in human CMP-sialic acid hydroxylase occurred after the Homo–Pan divergence. Proc. Natl. Acad. Sci. USA 95, 11751–11756.
Dangel, A.W., Baker, B.J., Mendoza, A.R., Yu, C.Y., 1995. Complement component C4 gene intron 9 as a phylogenetic marker for primates: long terminal repeats of the endogenous retrovirus ERVK(C4) are a molecular clock of evolution. Immunogenetics 42, 41–52.
Gibbons, A., 1998. Which of our genes make us human? Science 281, 1432–1434.
Harris, J.R., 1998. Placental endogenous retrovirus (ERV): structural, functional and evolutionary significance. Bioessays 20, 307–316.
Khil, P.P., Kostina, M.B., Azhikina, T.L., Kolesnik, T.B., Lebedev, Y.B., Sverdlov, E.D., 1997. Structural characteristics of four long terminal repeats (LTR) of human endogenous retroviruses and features of their integration sites. Russ. J. Bioorg. Chem. 23, 406–411.
Kidwell, M.G., Lisch, D., 1997. Transposable elements as sources of variation in animals and plants. Proc. Natl. Acad. Sci. USA 94, 7704–7711.
King, M.C., Wilson, A.C., 1975. Evolution at two levels in humans and chimpanzees. Science 188, 107–116.
Lavrentieva, I., Khil, P., Vinogradova, T., Akhmedov, A., Lapuk, A., Shakhova, O., Lebedev, Y., Monastyrskaya, G., Sverdlov, E.D., 1998. Subfamilies and nearest-neighbour dendrogram for the LTRs of human endogenous retroviruses HERV-K mapped on human chromosome 19: physical neighbourhood does not correlate with identity level. Hum. Genet. 102, 107–116.
Leib-Mösch, C., Seifarth, W., 1996. Evolution and biological significance of human retroelements. Virus Genes 11, 133–145.
Löwer, R., Löwer, J., Kurth, R., 1996. The viruses in all of us: characteristics and biological significance of human endogenous retrovirus sequences. Proc. Natl. Acad. Sci. USA 93, 5177–5184.
Mager, D.L., Freeman, J.D., 1995. HERV-H endogenous retroviruses: presence in the New World branch but amplification in the Old World primate lineage. Virology 213, 395–404.
Medstrand, P., Mager, D.L., 1998. Human-specific integrations of the HERV-K endogenous retrovirus family. J. Virol. 72, 9782–9787.
Mighel, A.J., Markham, A.F., Robinson, P.A., 1997. Alu sequences. FEBS Lett. 417, 1–5.
Miller, M., Zeller, K., 1997. Alternative splicing in lecithin:cholesterol acyltransferase mRNA: an evolutionary paradigm in humans and great apes. Gene 190, 309–313.
Minghetti, P.P., Dugaiczyk, A., 1993. The emergence of new DNA repeats and the divergence of primates. Proc. Natl. Acad. Sci. USA 90, 1872–1876.
Nickerson, E., Nelson, D.L., 1998. Molecular definition of pericentric inversion breakpoints occurring during the evolution of humans and chimpanzees. Genomics 50, 368–372.
Patience, C., Wilkinson, D.A., Weiss, R.A., 1997. Our retroviral heritage. Trends Genet. 13, 116–120.
Schulte, A.M., Lai, S., Kurtz, A., Czubayko, F., Riegel, A.T., Wellstein, A., 1996. Human trophoblast and choriocarcinoma expression of the growth factor pleiotrophin attributable to germline insertion of an endogenous retrovirus. Proc. Natl. Acad. Sci. USA 93, 14759–14764.
Schwartz, A., Chan, D.C., Brown, L.G., Alagappan, R., Pettay, D., Disteche, C., McGillivray, B., de la Chapelle, A., Page, D.C., 1998. Reconstructing hominid Y evolution: X-homologous block, created by X–Y transposition, was disrupted by Yp inversion through LINE–LINE recombination. Hum. Mol. Genet. 7, 1–11.
Sibley, C.G., Ahlquist, J.E., 1987. DNA hybridization evidence of hominoid phylogeny: results from an expanded data set. J. Mol. Evol. 26, 99–121.
Siebert, P.D., Chenchik, A., Kellogg, D.E., Lukyanov, K.A., Lukyanov, S.A., 1995. An improved PCR method for walking in uncloned genomic DNA. Nucleic Acids Res. 23, 1087–1088.
Simpson, G.R., Patience, C., Löwer, R., Tönjes, R.R., Moore, H.D., Weiss, R.A., Boyd, M.T., 1996. Endogenous D-type (HERV-K) related sequences are packaged into retroviral particles in the placenta and possess open reading frames for reverse transcriptase. Virology 222, 451–456.
Smit, A.F.A., 1996. The origin of interspersed repeats in the human genome. Current Opin. Genet. Dev. 6, 743–748.
Steinhuber, S., Brack, M., Hunsmann, G., Schwelberger, H., Dierich, M.P., Vogetseder, W., 1995. Distribution of human endogenous retrovirus HERV-K genomes in humans and different primates. Hum. Genet. 96, 188–192.
Takahata, N., Satta, Y., 1997. Evolution of the primate lineage leading to modern humans: phylogenetic and demographic inferences from DNA sequences. Proc. Natl. Acad. Sci. USA 94, 4811–4815.
Trask, B.J., Friedman, C., Martin-Gallardo, A., Rowen, L., Akinbami, C., Blankenship, J., Collins, C., Giorgi, D., Iadonato, S., Johnson, F., Kuo, W.L., Massa, H., Morrish, T., Naylor, S., Nguyen, O.T., Rouquier, S., Smith, T., Wong, D.J., Youngblom, J., van den Engh, G., 1998. Members of the olfactory receptor gene family are contained in large blocks of DNA duplicated polymorphically near the ends of human chromosomes. Hum. Mol. Genet. 7, 13–26.
Vinogradova, T., Volik, S., Lebedev, Y., Shevchenko, Y., Lavrentyeva, I., Khil, P., Grzeschik, K.H., Ashworth, L.K., Sverdlov, E.D., 1997. Positioning of 72 potentially full size LTRs of human endogenous retroviruses HERV-K on the human chromosome 19 map. Occurrence of the LTRs in human gene sites. Gene 199, 255–264.
Yoder, J.A., Walsh, C.P., Bestor, T.H., 1997. Cytosine methylation and the ecology of intragenomic parasites. Trends Genet. 13, 335–340.
