Mitochondrial Genome Diversity in the Tubalar, Even, and Ulchi: Contribution to Prehistory of Native Siberians and Their Affinities to Native Americans
Rem I. Sukernik,* Natalia V. Volodko, Ilya O. Mazunin, Nikolai P. Eltsov, Stanislav V. Dryomov, and Elena B. Starikovskaya
Laboratory of Human Molecular Genetics, Institute of Molecular and Cellular Biology, Siberian Branch of the Russian Academy of Sciences, Novosibirsk 630090
KEY WORDS mtDNA lineages; population/evolutionary history; native Siberia
ABSTRACT To fill remaining gaps in mitochondrial DNA diversity in the least surveyed eastern and western flanks of Siberia, 391 mtDNA samples (144 Tubalar from Altai, 87 Even from northeastern Siberia, and 160 Ulchi from the Russian Far East) were characterized via high-resolution restriction fragment length polymorphism/single nucleotide polymorphism analysis. The subhaplogroup structure was extended through complete sequencing of 67 mtDNA samples selected from these and other related native Siberians. Specifically, we have focused on the evolutionary histories of the derivatives of M and N haplogroups, putatively reflecting different phases of settling Siberia by early modern humans. Population history and phylogeography of the resulting mtDNA genomes, combined with those from previously published data sets, revealed a wide range of tribal- and region-specific mtDNA haplotypes that emerged or diversified in Siberia before or after the last glacial maximum, 18 kya. Spatial distribution and ages of the east and west Eurasian mtDNA haploclusters suggest that anatomically modern humans that originally colonized Altai derived from macrohaplogroup N and came from Southwest Asia around 38,000 years ago. The derivatives of macrohaplogroup M, which largely emerged or diversified within the Russian Far East, came along with subsequent migrations to West Siberia millennia later. The last glacial maximum played a critical role in the timing and character of the settlement of the Siberian subcontinent. Am J Phys Anthropol 148:123-138, 2012. © 2012 Wiley Periodicals, Inc.
Studies of present-day world populations, especially those based on maternally inherited mitochondrial DNA (mtDNA), suggest that modern humans expanded from East Africa around 70 kya (thousand years ago). They dispersed along the Southwest-Southeast Asia shore and reached the Siberian Pacific 38-37 kya (reviewed by Pope and Terrel, 2007). In the other direction, archaic populations, including Neanderthals and a previously unknown hominin who lived in Denisova Cave in Altai, southwestern Siberia, around 40 kya, were replaced by modern humans no later than 30 kya (Krause et al., 2007, 2010b; Derevianko, 2011). During the last Ice Age (43-12 kya), a special feature of Siberia was its relative isolation from the rest of Eurasia and the New World, making it a case study for understanding the main patterns of cultural and biological adaptation and subsequent dispersals to the New World. At the height of the last glacial maximum (LGM), the earliest Siberians were largely confined to their strongholds south of the 56°N parallel, which were areas of continuous occupation (Finlanson and Cartion, 2007; Kuzmin, 2008; Graf, 2009). The current distribution of maternal lineages across the world indicates that Siberia lacks autochthonous M and N lineages, the Eurasian descendants of African L3 that are abundant along the proposed southern coastal migration route (reviewed by Forster and Matsumura, 2005; Mellars, 2006). Many phylogenies across Siberia/Beringia based on complete mtDNA sequences have been reconstructed and sequence-divergence estimates obtained (Derbeneva et al., 2002a; Starikovskaya et al., 2005; Derenko et al., 2007; Perego et al., 2010; Volodko et al., 2008). However, it is still not clear where in the southern extent of Siberia the M and N offshoots arose, or when and how they spread over the higher latitudes, setting the stage for colonization of the Americas. As these and related studies have progressed, the Altai-Sayan Mountain system, represented by the Tubalar, immediate descendants of autochthonous hunters and gatherers, emerged as a strategic area of very high relevance to these issues. However, the lack of completely sequenced mtDNA genomes has precluded elucidation of Altaic prehistory, potentially preserved in the intrinsic diversity of mtDNA lineages. This is also true for the genetic history of the Tungusic world (Pakendorf et al., 2007), in East Siberia represented largely by the Evenki and Even, sparse groups of nomadic reindeer herders and hunters, and the Ulchi, in recent traditional times a
Grant sponsor: Wenner-Gren Foundation for Anthropological Research; WG Int. Res. Grant number: 65; Grant sponsor: Russian Foundation for Basic Research; Grant numbers: 06-04-48182 and 09-04-00183; Grant sponsor: Russian Foundation for Humanitarian Research; Grant number: 08-01-0356.
*Correspondence to: Rem I. Sukernik, Prospekt Lavrentyeva 10, Novosibirsk 630090. E-mail: sukernik@mcb.nsc.ru, sukernik@gmail.com
Received 11 November 2011; accepted 13 February 2012
DOI 10.1002/ajpa.22050
Published online 4 April 2012 in Wiley Online Library (wileyonlinelibrary.com).
well-defined tribe of hunters and fishermen dispersed along the lakes and the reaches of the Lower Amur (Starikovskaya et al., 2005). In this study, we continued filling gaps in mtDNA genome diversity, which has remained poorly sampled in aboriginal Siberian populations with long histories preserved in remote pockets of the subcontinent. The phylogeographic approach and the time-dependent molecular clock principle have been applied to uncover the geographic distribution of mtDNA lineages on the tree and the antiquity of the lineages, especially those that are restricted to a particular area (Soares et al., 2009; 2010). Newly obtained entire mtDNA sequences were integrated with those previously published, and updated age estimates were generated to add important details to the prehistory of Native Siberians and their affinities to Native Americans.
areas are shown in Figure 1, and a brief description of each population follows. Tubalar. The Tubalar are immediate descendants of hunting-gathering bands who for ages inhabited the coniferous forest (taiga) refuge in the Altai-Sayan Mountains, encompassing the northern coast of Lake Teletskoye, the Upper Biya River, and the Isha River (the Upper Ob River basin). On the northeastern border of their range, the Tubalar were closely related to the Chelkan, a similar hunting-gathering tribe that has dwindled during the last decades. Because the Tubalar and Chelkan lived in geographic isolation in a remote part of the Altai-Sayan, they retained their traditional way of life and tribal integrity until almost the middle of the 20th century (Levin and Potapov, 1964; Potapov, 1972). The Tubalar and Chelkan are grouped with northern Altaians, and they differ from the numerous southerners (Altai-kizhi or Altai proper) in culture, language, and physical appearance. Originally, northern Altaians spoke a dialect of the Uralic language family, but later adopted a Turkic dialect of the Altaic language (Radloff, 1883). The present-day Tubalar number 1,500 members, and the whole group is still subdivided into several exogamous patriclans confined to their ancestral territory. Venous blood samples were drawn from elderly people residing in a dozen tiny villages: Urlu-Aspak, Paspaul, Uimen, Kara-Koksha, Inyrga, Salganda, Tunzha, Sankin-ail, Tuloi, Kebezen, Artybash, and Pyzha (Choisky and Turochak districts, Altai Republic, Russian Federation). This report is based on the mtDNA genome diversity of the 144 Tubalar samples. Previously published
[TABLE 1. mtDNA lineages with their RFLP (SNP) and HVS-I haplotypes in the Tubalar (n = 144), Even (n = 87), and Ulchi (n = 160). The table body did not survive extraction. Lineage labels still legible: Y1a, X2e, B4b1a, B5b2, F1a, F1b, F2, R9b, HV9, H8, U2e, U4a, U4b, C4a2, C4b, C5, D4c2, D4e4, D4e5, D4g2b, D4h, D4i2, D4j, G2a1, M7, and M9a1.]
Mutations were scored relative to the revised Cambridge Reference Sequence, rCRS (Andrews et al., 1999). RFLP sites are numbered from the first nucleotide of the enzyme recognition sequence. The restriction enzymes are given using the following single-letter code: a = AluI; c = DdeI; e = HaeIII; f = HhaI; g = HinfI; h = HpaI; j = MboI; k = RsaI; l = TaqI; n = HaeII; o = HincII; s = AccI; u = MseI; v = AvrII; w = Tsp509I. Three different mutations, 1715, 1719, or 1718+A, create the same site loss, -1715 DdeI (verified through sequencing). The presence/absence of the associated 10394 DdeI/10397 AluI sites is denoted in parentheses as (+/+), (-/-), or (+/-). 8281d9 = 9-bp COII/tRNALys deletion. A minus sign (-) indicates the absence of a restriction site. Mutations are transitions, unless the base change is specified explicitly. Insertions are specified by + with the inserted nucleotide. Single nucleotide polymorphisms (SNPs) in the coding region verified through sequencing are shown in brackets. Only those nucleotide positions between 16,013 and 16,520 that differ from the rCRS are shown. Founding RFLP/SNP/HVS-I haplotypes are shown in boldface. Additional mutations in the coding region are shown in parentheses and were identified or confirmed by sequencing.
mtDNA data from 72 Tubalar admixed with Chelkan (Starikovskaya et al., 2005) were revised and supplemented by new Tubalar samples. As a result, each of the communities sharing the river or lake was represented more or less equally in the total sample.
Even. In recent traditional times, the Even, formerly called Lamut, were a large group of reindeer herders and hunters, numbering thousands of individuals, who spoke a language of the northern Tungusic branch of the Altaic linguistic family. They spread over a vast territory stretching from the Upper Yana River in the west to the Sea of Okhotsk in the east. The Even share a common genetic heritage with the Evenki, formed through the mixing of northern aboriginal Siberians and southern populations from former Manchuria (Levin and Potapov, 1964; Janhunen, 1996). This study includes 87 mtDNA samples. Previously published mtDNA data on 18 Even-Evenki from the Sea of Okhotsk coast (Starikovskaya et al., 2005) were revised and supplemented by 69 new Even samples drawn from elderly persons currently residing in the villages of Chokurdakh (Allaikh district, Sakha-Yakut Republic), and Nelkan and Djigda (Ayan-Maysky District, Khabarovsk Region). While Chokurdakh is located on the left bank of the Indigirka River in its lower course, Nelkan and Djigda are far apart to the southeast, in the Maya-Aldan River area. The majority of samples come from individuals who reported Even maternal ancestry, but quite a few blood donors from Chokurdakh were uncertain about their Even or Yukaghir ancestral continuity. Likewise, a few individuals from the Maya River reported mixed Even-Evenki origin. Efforts were made to avoid taking blood from individuals who had Turkic-speaking Yakut on their maternal side. Taking into account the family history of each participant, including the birthplace of the maternal grandmother and the language she spoke, the entire sample represented a subset of the Tungusic-speakers who a century ago were wandering in small aggregates within the vast area between the Yana, Indigirka, and Kolyma River upper reaches in the west and the Sea of Okhotsk coast in the east.
Ulchi. Historically, the Ulchi are a well-defined tribe of hunters and fishermen dispersed along the lakes and the reaches of the Lower Amur. They speak a language of the Tungusic-Manchu group (Levin and Potapov, 1964; Black, 1988; Krauss, 1988). Previously published mtDNA data obtained from 87 elderly Ulchi residing in Old and New Bulava, two neighboring villages (Starikovskaya et al., 2005), were revised and supplemented by 73 new samples collected in Bogorodskoe and Nizhniy Gavan villages (Ulchi district, Khabarovsk Region) in September 2009. Hence, the total Ulchi sample consisted of 160 individuals, with little admixture with the Nivkhi, Negidal, and Udegey.

TABLE 2. Frequencies (%) of mtDNA lineages in 19 Siberian populations
Populations in column order (sample size): Tubalar (144), Mansi (98), Ket (38), Nganasan (39), Tuvan (95), Tofalar (46), Evenki (53), Yukaghir (82), Even (87), Ulchi (160), Udegey (46), Negidal (33), Nivkhi (56), Itelmen (47), Koryak (147), Chukchi (182), and the Eskimos of Sireniki (37), Chaplin (50), and Naukan (39).
[Each row lists the nonzero frequencies in column order. The empty cells of the original layout did not survive extraction, so the values are not assigned to populations here.]
A2a 2.7 33.7 43.3 72.0 33.3
A2b 13.9 27.0 18.0 41.0
A4 6.9 3.1 2.6 1.1 7.5
A8 5.3 1.1 6.5 2.7
N2a 2.6
W 2.1
N9a 4.2 3.2
N9b 4.3 30.4
Y1 5.8 42.9 8.7 21.2 66.1 4.4 8.8
X2e 0.7
B4a 5.1 4.3
B4b1a 6.3 1.1
B5 2.1 0.6 12.1
F1 1.4 1.0 23.7 4.1 8.7 1.9 5.8 3.1
F2 2.8
R9b 0.7
HV 0.7
H 2.8 14.3 10.5 4.1
V 1.0
J 12.3 2.1
T 7.2
U2e 1.4
U4a 0.7 5.1 21.1 12.8
U4b 14.7 8.1 2.6 1.1
U4c 3.1 5.3 7.7
U5a 6.3 4.1 5.3 2.1
U7a 5.1
K 3.1
C1a 0.6
C4a 2.1 6.1 13.2 10.3 19.7 39.1 43.4 12.2 15.9 1.9 2.2 3.0 0.7
C4b 13.9 10.2 2.6 20.5 15.6 10.9 20.7 41.6 12.5 3.1 15.2 6.1 21.8 6.9
C5 4.2 1.0 20.5 7.2 10.9 17.0 12.2 3.5 6.8 6.1 13.0 13.6 11.9
C7 1.2
Z1 1.4 2.6 2.6 1.1 10.9 2.4 2.3 0.6 6.5 5.4
M8a 15.2 3.7 15.2
D4a 0.6 6.1
D4b1 7.0 2.1 1.9 3.6 1.4 5.0 25.7
D4b2 4.1 0.6
D3 1.0 17.9 1.1 1.2 2.3 1.2
D4c 1.1 1.2 1.2
D4e4 1.9 3.5 1.9
D4e5 1.2
D2a 2.0 29.7 10.0
D4g 0.6
D4h 2.5
D4i 4.9 3.5
D4j 7.0 1.0 2.6 5.1 3.2 1.9 19.4 1.9 6.1
D4l 1.4 3.1 1.9 2.4 4.5 3.0
D4m 2.1 1.1 2.4 2.3 0.6 23.2
D4o 4.2 2.6 1.2 8.1 9.1 5.4
D4* 2.0
D5a 2.1 1.0 1.1 1.9 1.2 3.5 0.6
D5c 2.8 1.2
G1 14.7 10.3 7.5 27.3 5.4 69.6 42.9 26.6
G2 2.1 6.1 6.3 1.2
G3 4.1
M7 1.0 3.2 2.5 19.5
M9 0.6 8.7
mtDNA analysis
Genomic DNAs were extracted from buffy coats by using standard procedures. In the first step, mtDNA variation was surveyed by digestion with a battery of restriction enzymes, sequencing of HVS-I of the control region, and typing of diagnostic single nucleotide polymorphisms (SNPs) in the coding region. Founding restriction fragment length polymorphism (RFLP)/SNP/HVS-I haplotypes are in boldface, as shown in Table 1. The complete sequencing procedure entailed polymerase chain reaction amplification of eleven overlapping mtDNA templates, which were sequenced in both directions with BigDye terminator chemistry (PE Applied Biosystems) on an ABI Prism 3130 DNA Analyzer. Trace files were analyzed with the Sequencher software (version 4.5, GeneCode Corporation). Mutations were scored relative to the revised Cambridge Reference Sequence (rCRS) (Andrews et al., 1999).
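As an illustration of scoring mutations against a reference, the following minimal Python sketch lists substitutions in the notation used in this paper, where transitions are written as the position alone and other changes carry the derived base. It is not the authors' pipeline: the function name `score_variants` is hypothetical, and the sketch assumes two pre-aligned sequences of equal length with no insertions or deletions.

```python
# Purine<->purine and pyrimidine<->pyrimidine changes are transitions.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def score_variants(reference, sample, first_position=1):
    """List substitutions of `sample` relative to `reference`.

    Hypothetical helper: both sequences must already be aligned and of
    equal length; indels are not handled. Transitions are reported as the
    position only, transversions as position plus derived base.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    variants = []
    for index, (ref_base, new_base) in enumerate(zip(reference.upper(), sample.upper())):
        if ref_base == new_base or "N" in (ref_base, new_base):
            continue  # identical base or ambiguous call: nothing to score
        position = index + first_position
        if (ref_base, new_base) in TRANSITIONS:
            variants.append(str(position))
        else:
            variants.append(f"{position}{new_base}")
    return variants


# Toy sequences, not real mtDNA: A>G at 1, T>A at 4, C>T at 6.
print(score_variants("ACGTAC", "GCGAAT"))
```

With real data, `first_position` would be set to the rCRS coordinate of the first aligned base.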
GenBank accession number FJ493500 FJ147306 FJ147321 FJ147307 HM776708 FJ493503 HM776709 HM776710 HM776711 HM776712 HM776713 HM776714 HM776715 FJ493516 FJ147308 FJ147309 FJ147310 FJ147311 FJ493504 FJ147312 FJ493505 FJ493506 FJ493507 FJ147313 FJ147314 FJ147315 FJ147316 FJ858877 FJ147322 FJ493508 FJ147317 FJ493517 FJ858878 FJ147318 FJ493514 FJ493515 FJ493509 FJ493510 FJ493511 FJ493512 FJ493513 HM044854 JN375993 FJ858879 FJ858880 FJ858881 FJ858882 FJ493501 GQ376202 FJ858883 FJ858884 FJ858885 FJ858886 FJ493502 HM153529 HM153530 FJ147319 FJ147320 FJ858887 FJ858888 HM044855 HM044856 HM153527 HM153528 FJ858889 HM776716 HM776717
Estimates of coalescence time. Age estimates are based on ρ-statistics (Forster et al., 1996; Saillard et al., 2000). Incidentally, ρ is an objective measure of the depth of a node within the tree and is an indicator of the age of the mutations defining the branch leading to that node. Converting ρ to time requires assumptions about the consistency of mutation rates on different branches of the phylogenetic tree, and there is some discrepancy deriving from the differences in the mutation rates used (briefly reviewed by Pereira et al., 2011). Here, we used recently improved rates for the entire mtDNA molecular clock, considering one coding-region substitution every 3,624 years, while assuming 7,884 years per synonymous transition (Soares et al., 2009). To define the genetic relationships between populations, principal component analysis (PCA) was performed using Statistica software, version 6.0 (StatSoft).
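The ρ calculation can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation: ρ is taken as the mean number of mutations separating the sampled sequences from the ancestral node, and its standard error follows the branch-based form usually attributed to Saillard et al. (2000). The function names and the toy tree are hypothetical. Only the linear synonymous rate quoted above (7,884 years per synonymous transition) is used for the conversion to years; any time-dependent correction applied to the complete-genome clock is not reproduced here.

```python
from math import sqrt

# Rate quoted in the text (Soares et al., 2009): years per synonymous transition.
YEARS_PER_SYNONYMOUS_TRANSITION = 7884


def rho_and_sigma(branches, n_samples):
    """Return (rho, sigma) for a clade rooted at the node of interest.

    `branches` is an iterable of (n_descendants, n_mutations) pairs, one per
    branch below the node: how many sampled sequences descend from the branch
    and how many mutations it carries.
    rho   = sum(n_i * l_i) / n
    sigma = sqrt(sum(n_i**2 * l_i)) / n
    """
    branches = list(branches)
    rho = sum(n * length for n, length in branches) / n_samples
    sigma = sqrt(sum(n * n * length for n, length in branches)) / n_samples
    return rho, sigma


def age_in_years(rho, sigma, years_per_mutation=YEARS_PER_SYNONYMOUS_TRANSITION):
    """Linear conversion of rho and its standard error to years."""
    return rho * years_per_mutation, sigma * years_per_mutation


# Toy star-like clade (hypothetical): four sequences, one private mutation each.
rho, sigma = rho_and_sigma([(1, 1), (1, 1), (1, 1), (1, 1)], n_samples=4)
age, error = age_in_years(rho, sigma)
print(rho, sigma, age, error)
```

Describing the tree as a list of branches keeps the sketch independent of any particular tree data structure; internal branches shared by several sequences simply have a larger `n_descendants`.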
Complete genome 14.4 28.8 17.3 20.5 15.9 38.7 18.7 16.4 25.0 12.4 6.0 53.0 6.7 1.5 11.4 17.5 18.6 18.0 17.3 18.7 22.0 14.5 28.8 29.1 24.8 23.1 17.0 13.9 12.2 13.4 2.6 16.3 19.7 5.1 39.9 16.0 21.7 25.8 8.2 13.4 24.9 20.4 9.1 6.9 1.6 (10.6; 18.3) (17.3; 40.9) (9.8; 25.3) (11.3; 30.0) (13.3; 18.5) (28.2; 49.5) (12.3; 25.3) (12.0; 20.9) (15.0; 35.5) (5.9; 19.1) (3.3; 8.8) (32.1; 75.1) (0.6; 13.1) (-0.7; 3.8) (6.1; 16.8) (10.8; 24.3) (13.8; 23.4) (11.5; 24.7) (10.8; 24.1) (13.2; 24.3) (10.8; 33.5) (11.4; 17.6) (23.1; 34.6) (19.9; 38.6) (17.5; 32.3) (14.1; 32.5) (8.8; 25.4) (7.3; 20.6) (5.2; 19.5) (2.7; 24.7) (0.9; 4.3) (9.8; 23) (12.0; 27.8) (0.7; 9.7) (25.3; 55.2) (10.8; 21.4) (15.3; 28.4) (17.8; 34.1) (5.1; 11.5) (5.0; 22.2) (15.9; 34.4) (7.4; 34.0) (2.5; 15.8) (0.2; 13.8) (0.6; 2.6)
Synonymous positions 15.1 21.9 11.0 18.9 17.8 36.0 16.5 19.2 30.7 12.5 1.8 46.0 1.1 0.0 6.8 20.0 21.4 20.2 18.1 16.4 23.7 17.8 37.1 42.7 32.4 38.5 22.8 14.8 16.9 25.4 1.8 23.9 21.2 2.9 33.5 17.3 23.5 20.6 7.7 11.8 27.3 21.7 14.4 9.0 0.8 (12.8; 17.4) (15.3; 28.4) (6.9; 15.2) (12.3; 25.4) (15.6; 20.0) (28.2; 43.9) (11.8; 21.1) (14.5; 24) (20.7; 40.6) (6.0; 19.0) (0.8; 2.9) (31.4; 60.5) (0.0; 2.3) (0.0; 0.0) (3.6; 9.9) (12.5; 27.5) (16.2; 26.6) (13.1; 27.3) (12.6; 23.6) (12.7; 20.0) (14.0; 33.3) (15.2; 20.4) (31.2; 43.1) (33.0; 52.4) (25.5; 39.3) (27.7; 49.4) (14.2; 31.3) (10.4; 19.2) (9.7; 24.1) (11.7; 39.1) (0.5; 3.0) (16.8; 30.9) (14.0; 28.4) (1.5; 4.2) (22.9; 44.1) (12.6; 22.0) (16.9; 30.2) (15.4; 25.8) (4.7; 10.7) (5.0; 18.7) (21.2; 33.5) (10.8; 32.7) (6.3; 22.5) (1.8; 16.2) (0.2; 1.4)
Macrohaplogroup N mtDNAs
Almost half of the mtDNA types harbored by the Tubalar fall into different lineages of west Eurasian haplogroups. The most common among the Tubalar mtDNA samples are the haplogroup U mtDNAs, confined to subhaplogroups U2e, U4a, U4b, and U5a1. Of these, U4 as a whole attains its highest frequency east of the Ural Mountains, ranging from 15.4% in the Tubalar to 29.0% in the Ket (Tables 1 and 2). The uneven distribution of the two subhaplogroups, with U4a predominantly found in the Ket (21.1%) and U4b in the Tubalar (14.7%), could be attributed to their different population histories augmented by subsequent genetic drift. To discern the ancestral status of the U4a and U4b types in the region, we generated 10 new sequences, of which four are U4a and six are U4b (Table 3). Because of the difference in age between U4a (14.0/7.6 kya) and U4b (18.0/20.2 kya), it seems unlikely that the high frequency and extensive diversity within the Tubalar U4b sequences confined to western Siberia is a result of recent gene flow from eastern Europeans. Rather, the remarkable concentration of the U4b mtDNAs preserved within the northern Altai-lower Ob-lower Yenisei triangle could be a part of the Upper Palaeolithic expansion from the Middle East or western Asia (Derbeneva et al. 2002b,c; Pimenoff et al. 2008; Naumova et al. 2009; this study). Of the remaining haplogroup U samples, the mtDNAs that belonged to U2e and U5a1 were also well represented in the Tubalar and encompassed 2 and 9 mtDNA samples, respectively. One U2e and one U5a1 mtDNA were subjected to complete sequencing. We found that the Tubalar U2e mtDNA sequence motif (524insAC-2626-5814-13419T-14587-16214-16258) was different from both its European and Indian U2e counterparts (Palanichamy et al., 2004; Achilli et al., 2005; Metsapalu et al., 2004). Incidentally, we also sequenced the U2e mtDNA from a patient of Russian origin with Leber's hereditary optic neuropathy. Neither the Tubalar nor the Russian haplotype assigned to U2e1 had been reported previously, and each differed from the other derivatives of U2e1 available through GenBank. U2e1 as a whole appears to have an age of 17.5/20.0 kya (Table 4). The recent identification of a complete U2 mtDNA sequence from a 30,000-year-old early modern human from Kostenki, southwest Russia (Krause et al., 2010a), supports an Upper Paleolithic ancestry of the U2e1 lineage observed in the West Siberian
Fig. 2. The phylogenetic tree of haplogroup N9 complete sequences revealed in Siberia. Mutation positions, relative to the revised Cambridge reference sequence (Andrews et al., 1999), are transitions unless the base change is specified. Deletions are indicated by a d preceding the deleted nucleotides. Insertions are indicated by a + preceding the inserted nucleotide. Reversal mutations are underlined. Point mutations at 16182 and 16183 are excluded because of their dependence on the presence of the C-T transition at 16189; the length variation in the poly-C stretch at nps 309-315 and the point mutation at 16519 were omitted because of their hypervariability. When two or more identical samples belong to the same group, their number is given in brackets.
Plain (Pimenoff et al., 2008; this study). Likewise, the U5a1 mtDNA haplotype harbored a previously unreported coding-region motif: 3027-3552-4924C-10858-14110-15218. Among the five Tubalar mtDNA samples that belonged to haplogroup H, four harbored the characteristic coding-region 13101C transversion and control-region 16288 transition, indicating that they should be attributed to H8 (Achilli et al., 2004). This lineage is very uncommon. Of the seven H8 mtDNA sequences available from GenBank, the Tubalar one differs from its European counterparts by three previously unreported transitions: 961, 7765, and 12490. The presence in the Tubalar of subhaplogroup H8, dating to 11.4/6.8 kya (Table 4), may indicate a Neolithic-phase expansion towards the Altai Mountains. In addition, complete genome sequencing was performed on two of the 10 Tubalar mtDNAs of haplogroup A4.
It revealed a unique A4* mtDNA haplotype, which shared no mutation with any other A4 sequence, with 16362 being the only exception. This finding introduces the possibility that the A4* sequence revealed in the Tubalar is tribe-specific and represents a part of the basic A4 dispersal that originated in Altai 28.8/21.9 kya, whereas only 14.4/15.1 kya did a particular derivative of it give rise to an American enclave of the A2 (A2a+A2b) haplogroup in modern Chukotka (Starikovskaya et al., 2005; Volodko et al., 2008; Qin et al., 2010; this study). Of special interest are the Tubalar B4b1a samples (6.2%) that have the 16086 variant, resulting in the 16136-16189-16217-16519 motif, previously described in a few ancient and modern mtDNA samples from the Egyin Gol Valley (south of Lake Baikal in northern Mongolia) (Keyser-Tracqui et al., 2003; 2006). Compared at the complete sequence level, the Tubalar B4b1a shared 6023, 6413, and 16136 with only a part of the eastern Asian
Fig. 3. The phylogeny of haplogroup D4b sequences. For additional information, see Figure 2 legend.
B4b1 full sequences, albeit with the same ancestral node 499-4820-13590 shared with Native American B2 (Starikovskaya et al., 2005; Hill et al., 2007; this study). The estimated age of the B4b1a cluster is 20.5/18.9 kya, consistent with its pattern of geographic distribution throughout the extreme south of Siberia and the adjacent part of Central/Eastern Asia. A total of 16 distinct haplotypes attributable to haplocluster N9 (N9a-N9b-Y) were identified in the Tubalar, Even, and Ulchi (Table 1). The updated genealogy of N9, enriched by 10 new sequences, of which the Tuvan and Udegey mtDNA samples are from our old collection, is shown in Figure 2. The ages of the principal offshoots of the N9 lineage are 18.7/16.5 kya for N9a, 16.4/19.2 kya for N9b, and 25.0/30.7 kya for Y, thus falling within the Ice Age, whereas subhaplogroup Y1a, confined to the Even, Koryak, Ulchi, and Nivkhi, is much younger, only 6.0/1.8 kya. Spatial patterns and coalescent dates of the N9a, N9b, and Y haplogroups suggest that the N9 root (5417) emerged in southwest Asia 38.7/36.0 kya, but that only much later, presumably during the LGM, its particular derivatives might have spread through the corridor provided by southern refugia that stretch from mountainous Altai to the Russian Far East on one hand and the deserted areas of the Central Asian steppe on the other (Finlanson and Cartion, 2007).
This conjecture is supported by the absence of autochthonous N in South/Southeast Asia, where only M has a vast geographical distribution (Hill et al., 2007; Chandrasekar et al., 2009). The fact that N9 is present in the Tubalar, Tuvan, Buryat, Udegey, and Ulchi (Starikovskaya et al., 2005), as well as in ancient (Upper Paleolithic) mtDNAs from Hokkaido (Adachi et al., 2009), implies that the relatively rare N9 mtDNAs exhibited by modern Japanese (Tanaka et al., 2004; Nohira et al., 2010) have a Siberian rather than a Southeast Asian source. The haplogroup X sequences we revealed in a single Tubalar and one Russian mtDNA sample seem to be attributable to the European X2e2 sub-branch. The Tubalar sequence differed at 13327 from the Druze and Georgian X2e2, defined by the 3948 and 12084 mutations (Reidla et al., 2003; Shlush et al., 2008). The age of the Siberian X2e2-13327 lineage, calculated on the basis of five similar mtDNA sequences, one Tubalar, one Teleut, two Altai-kizhi, and one Buryat (the latter four being attested by Derenko et al., 2007), is at most 1.5 kya (Table 4). The Siberian X2e2-13327 sub-cluster, separated from the Near Eastern X2e root by three mutational steps (3948-12084-13327), evidently represents a portion of relatively recent gene flow toward Altai-Sayan. So far, it is impossible to say whether the bearers of the founding sequence for X2a, distinguished by three coding region
mutations, 8913, 12397, and 14502 (Fagundes et al., 2008; Perego et al., 2010), reached North America without leaving a progenitor of X2a in Siberia/Beringia. Some haplotypes intermediate between the Near Eastern haplogroup X root and Native North American X2a are apparently missing. Finally, several Tubalar mtDNAs were found to fall into various sublineages of the F1, F2, and G2 haplogroups. On the basis of what is known of Tubalar ethnic history (Potapov, 1972), it is possible that the F1, F2, and G2 mtDNAs were acquired through gene flow from the adjacent Tuvan or Altai-kizhi residing to the south, where these lineages are more common and diverse (Starikovskaya et al., 2005; Derenko et al., 2007).
Macrohaplogroup M mtDNAs
Among the Tubalar, Even, and Ulchi mtDNAs attributable to distinct haplogroup C and D lineages, representatives of almost all known subhaplogroups are present (Table 1). Specifically, among the 70 (48.6%) Tubalar mtDNA samples that belong to traditional East Asian haplotypes, 10 harbored characteristic mutations indicating that they belong to haplogroup D4b1a2a1 (D3a2a in Volodko et al., 2008). Of these, two samples, each representing geographically separated territories, were completely sequenced. The sublineage of this haplogroup marked by 16093 unequivocally links a portion of the Tubalar maternal gene pool to that harbored by the Eskimos residing on both sides of the Bering Strait. The coalescence time of the D4b1a2a1-16093 cluster, based on 7 mtDNA genomes, dates to 12.2/16.9 kya, consistent with its pattern of geographic distribution, the Tubalar sequences included, whereas the estimated age of the 11383-14122C subcluster, calculated on the basis of only five entire sequences (two Chukchi, two Naukan Eskimo, and one Canadian Inuit), is significantly younger, 7.4/9.5 kya, implying a separate dispersal that ultimately contributed to the formation of the Coast Chukchi and Neo-Eskimo mtDNA gene pool within the early-mid Holocene. The widely distributed mtDNA lineages that stem from either the D4b1a2 (aged 17.0/22.8 kya) or the D4b1c (aged 13.4/25.4 kya) nodes encompass most sequences from the Siberian interior. In contrast, members of the D4b2 node of similar time-depth (16.3/23.9 kya) are virtually absent from autochthonous Siberians, as they are confined largely to central/eastern Asians, thus suggesting significant discontinuity in the northeast Asian gene pool (Fig. 3; Table 4). New haplogroup D4j sequences (two Tuvan, two Even, and two Ulchi) have also been subjected to complete sequencing. This haplogroup (D6 in Volodko et al., 2008) is conspicuous for its nodal mutation at 11696 and embraces different haplotypes from Siberians and from central and eastern Asians (Starikovskaya et al., 2005; Derenko et al., 2007; Volodko et al., 2008; Qin et al., 2010). An exceptionally large variety of D4j haplotypes and independent lineages has been recognized among the Tubalar, Mansi, Ket, Nganasan, Even, Evenk, Yukaghir, Ulchi, Tuvan, and Buryat mtDNA sequences. The estimated age of the D4j cluster is 16.0/17.3 kya (Table 4), consistent with its pattern of geographic distribution, suggesting the Manchuria/Mongolia region as the putative origin of the haplogroup D4j lineages. Aside from the four Even/Yukaghir D4e mtDNAs (D2 in Volodko et al., 2008), we have completely sequenced rare D4 mtDNA samples exhibited by the Tubalar, Even, or Ulchi, and compared them with a subset of similar mtDNA genomes from elsewhere. As a result, the D4o (D4 in Volodko et al., 2008) sequence variants, distinguished by the presence of 195-10646-16290, were found in the Tubalar, Nganasan, Even, Ulchi, Negidal, and Nivkhi, though at low frequency (Tables 1 and 2). Further comparisons showed that one Even and two Ulchi mtDNA samples share three mutations (8383, 9431, and 16245) with their East Asian counterparts, thus falling within the lineage D4c2. An unexpected and intriguing finding of this study was a relic D* mtDNA variant, initially noted at the HVS-I level in two Mansi samples from northwestern Siberia (Derbeneva et al., 2002b).
The entire sequencing of one of these Mansi mtDNA samples yielded a unique, previously unreported set of mutations: 3583, 9856, 15884, 16192, 16261, and 16316. Since the D* haplotype does not share this set of mutations with those available from GenBank, except for 3010, 8414, and 14668 characteristic for the D4 node, we have clustered the D* haplotype directly to basic D4 (Fig. 4). Because of the antiquity of nodal D4 (28.8/37.1 kya), it is reasonable to suggest that the Mansi D4* is the LGM-gap survivor among whatever haplotypes expanded from the Far East, where the haplogroup D4 diversity is particularly high (Derbeneva et al., 2002b; Starikovskaya et al., 2005; Volodko et al., 2008; Nohira et al., 2009; this study). This conjecture is consistent with archeological records of the Lower Amur, Sakhalin and Hokkaido region (Derevanko and Volkov, 1997; Kuzmin, 2008), suggesting a postglacial recolonization event that originated in maritime eastern Asia, with humans spreading north and west.
We have also revealed seven Tubalar, three Even, and one Ulchi mtDNA samples attributable to lineage D5 (Table 1). This lineage, first described by Derbeneva et al. (2002b), in contrast to other subsets of the haplogroup D mtDNAs, has back mutated at the 10394 DdeI+ and 10397 AluI+ sites, which characterize macrohaplogroup M at the RFLP level.
Finally, nine new haplogroup Z1a complete mtDNA sequences were generated. The geographic specificity and coalescence time computed from the root of the Z lineage (6752-9090-15784-16185-16260) suggest an origin of the Z1 founder in the vicinity of Lake Baikal 20.4/21.7 kya. In the millennia after diversification in situ, Z1 led directly to the extended Z1a node (Fig. 5). Most likely, distinct sub-lineages of Z1, harbored by the Tubalar, Tofalar, Nganasan, Yukaghir, Koryak, Even, and Ulchi, could be a major part of Neolithic dispersals from southeastern Siberia, whereas Z1a1a emerged in the present-day Ket, Volga-Urals Russians and Finns-Saami most recently, 1.6/0.8 kya. The confinement of single Volga-Urals Russians to the Finns-Saami Z1a1a (Ingman and Gyllensten, 2007) is not unexpected in view of recent historical events. Finnic-speaking hunters and gatherers from the Upper Volga basin, including those of the Volga-Kama region to the west of the Ural mountains, were partially assimilated by eastern Slavs 1,000 years ago, as the latter had spread rapidly across northern Russia (Kluchevsky, 2003). Based on the distribution of haplogroup Z1 and the number of sequence variants it encompasses, we postulate that this lineage originated in Manchuria, the putative homeland of the Altaic-speakers (Janhunen, 1996), and was then dispersed by their geographical expansion.
Fig. 5. The phylogeny of haplogroup Z1 complete sequences. Those with asterisk were gleaned from Ingman and Gyllensten (2007).
Fig. 6. Phylogenetic relationships (PCA) among 19 Siberian populations with the frequencies of the observed subhaplogroups.
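The diagnostic motifs quoted above can be read as simple set-membership rules. The following is only a minimal sketch of that idea, not the classification procedure used in the study: the motif positions are those given in the text, while the dictionary layout and the function name `matching_lineages` are illustrative assumptions. It compares positions only and ignores mutation states, back mutations, and the full phylogeny, all of which a real assignment would need.

```python
# Diagnostic positions as quoted in the text; the data layout is an assumption.
MOTIFS = {
    "D4": {3010, 8414, 14668},
    "D4j": {11696},
    "D4o": {195, 10646, 16290},
    "D4c2": {8383, 9431, 16245},
    "Z": {6752, 9090, 15784, 16185, 16260},
}


def matching_lineages(variant_positions):
    """Return the names of all motifs fully contained in the sample's variants."""
    found = set(variant_positions)
    return sorted(name for name, motif in MOTIFS.items() if motif <= found)


# Positions reported in the text for the Mansi D* sequence: the three D4-node
# positions plus the six previously unreported ones.
mansi_d_star = {3010, 8414, 14668, 3583, 9856, 15884, 16192, 16261, 16316}
print(matching_lineages(mansi_d_star))  # only the basal D4 motif matches
```

Under these assumptions the Mansi sequence matches the D4 motif and none of the derived motifs, which mirrors the reasoning for attaching it directly to basic D4.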
Population relationships
To begin with, it is worthwhile to recall the major geographic subdivisions of Siberia, which are (1) western Siberia, encompassing the Ob River basin, bounded by the Ural Mountains in the west and the Yenisei River in the east; (2) eastern Siberia, which covers essentially the Yenisei and Lena River basins, including the Taimyr Peninsula in the far north, and the Trans-Baikal, east of Lake Baikal; (3) northeastern Siberia, which stretches east of the Lena River up to Chukotka. The Russian Far East is a separate large region of northern Asia, including the lower Amur River basin, the Sea of Okhotsk region, and Sakhalin Island. The Kamchatka Peninsula and the adjacent part of the Bering Sea coast may also be associated with the Russian Far East (reviewed by Kuzmin, 2008).
The distribution in Table 2 shows that a substantial proportion of the mtDNA variation west of the Yenisei is of eastern Eurasian ancestry. Specifically, the Tubalar were found to harbor 48.2% mtDNAs belonging to C+D lineages, whereas the Mansi harbored 25.4% and the Ket 21.0%, thus testifying to a distinct maternal makeup of the western and eastern Siberian populations. The PC analysis delineates the resemblances and distinctions of samples representing 19 Siberian populations, based on sub-haplogroup frequencies, and these are well reflected on the plot (Fig. 6). As expected, the Eskimos are outliers, while the Even are placed near the Yukaghir because the core of the present-day Even mtDNA pool would represent an amalgamation of the remnants of Paleo-Siberian-speaking Yukaghir and northern Tungusic-speakers (Jochelson, 1910). Most likely, this is a consequence of the recent northward expansion of the bearers of lineage G1 from the Sea of Okhotsk-Kamchatka region. This haplogroup is predominantly found in the Koryak and Itelmen and also occurs at moderate frequency in the Negidal and Yukaghir (Schurr et al., 1999; Starikovskaya et al., 2005; Volodko et al., 2008; this study). For the same reason, the present-day Chukchi would occupy an intermediate position between populations of the Russian Far East and the Eskimos. The PCA plot supports the hypothesis that the Ulchi, Nivkhi, Negidal, and Udegey represent a detached population cluster isolated for sufficient time to be relatively distinct from the average genetic background of continental Siberia. This picture may reflect prolonged interactions and, therefore, genetic interchange among inhabitants of the Lower Amur, Sakhalin and Hokkaido since the early days of their common history (Adachi et al., 2009).
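For readers unfamiliar with the approach, a PC analysis of sub-haplogroup frequencies can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis behind Fig. 6: the text does not say which software was used or whether frequencies were standardized, so the sketch uses plain column-centering and a singular value decomposition. The function name `pca_scores` and the toy frequency matrix are made up for illustration and do not correspond to any population in Table 2.

```python
import numpy as np


def pca_scores(freqs, n_components=2):
    """Project populations (rows) onto the leading principal components.

    freqs: populations x sub-haplogroups matrix of frequencies.
    Returns (scores, explained), where explained holds the fraction of
    total variance carried by each returned component.
    """
    X = np.asarray(freqs, dtype=float)
    Xc = X - X.mean(axis=0)                  # center each sub-haplogroup column
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = S**2 / np.sum(S**2)
    return scores, explained[:n_components]


# Hypothetical frequencies: 4 populations x 4 sub-haplogroups, rows sum to 1.
toy = np.array([
    [0.50, 0.30, 0.20, 0.00],
    [0.45, 0.35, 0.15, 0.05],
    [0.10, 0.20, 0.30, 0.40],
    [0.05, 0.25, 0.30, 0.40],
])
scores, explained = pca_scores(toy)
print(scores)      # one (PC1, PC2) coordinate pair per population
print(explained)   # share of variance on PC1 and PC2
```

In such a plot, populations with similar frequency profiles fall close together, and a population with an unusual profile appears as an outlier, which is the sense in which the Eskimos and the Lower Amur groups are described above.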
Derenko et al., 2007; this study) may be explained by extinction because of genetic drift or purifying selection during long-term dispersals from the Near East-Southwest Asia, where the U7 and N2 types show the most widespread distribution and diversity (Richards et al., 2000; Metspalu et al., 2004; Quintana-Murci et al., 2004; Palanichamy et al., 2004; Derenko et al., 2007).
occurs in the Aleuts of the Commander Islands and Sireniki Eskimos of the Chukchi Peninsula (Derbeneva et al., 2002a; Starikovskaya et al., 2005; Volodko et al., 2008), and the Paleo-Eskimo Saqqaq mtDNA genome from archeological remains in Greenland (Gilbert et al., 2008). Importantly, a few non-Aleut D2a1 mtDNA samples from the village of Nikolskoye on Bering Island, most likely of Tlingit (Na-Dene) ancestry, lacked this mutation (Derbeneva et al., 2002a). Thus, the Paleo-Eskimos represented by Saqqaq and Sireniki on one hand, and Neo-Eskimos represented by Naukan on the other hand, have distinct relationships to Siberians/Asians, as reflected in different ages for the D2a1 (5.1/2.9 kya) and D4b1a2a1-16093 (12.2/16.9 kya), consistent with different patterns of their geographical distribution (Volodko et al., 2008; this study). The coalescence dates and spatial distribution of the D2a, D4b1a2, and D4b1c/D3 haploclusters across Siberia-Beringia (Starikovskaya et al., 2005; Derenko et al., 2007; Volodko et al., 2008; Gilbert et al., 2008; this study) suggest that their deep-rooted nodes, D4e and D4b1, emerged in the refugia of the southern extent of Siberia prior to or during the LGM, placing part of the recurrent episodic dispersals toward Alaska at the terminal Pleistocene to early Holocene.
CONCLUSION
Here, we extended the survey of mitochondrial SNPs and entire sequences across Siberia to go deeper into the population-evolutionary history of northeastern Eurasia. Relatively limited variation of east Eurasian lineages (C4a, C4b, C5, D3, D4, D5, and Z1a) observed west of the Yenisei appears to be associated with admixture events that distinguish more recently formed Eurasian populations from older ones. It resulted in a large variety of M and N derivatives with substantial geographical separation of western and eastern Eurasian lineages. Our results suggest that anatomically modern humans that originally colonized the Altai-Sayan region derived from macrohaplogroup N and came from Southwest Asia not later than 38 kya. The derivatives of macrohaplogroup M, which largely emerged or diversified within the Russian Far East, came along with subsequent migrations to the West Siberia area millennia later. Further insights into the prehistory of native Siberians and their relationship to Native Americans would benefit from generating whole genome sequences, with the final goal of discerning traces of potential gene flow from archaic groups from Altai, the "Denisovans," to the ancestors of modern Siberians.
ACKNOWLEDGMENTS
We are indebted to the native people of Siberia, the Tubalar, Even and Ulchi, in particular, for their participation in this project. The authors have greatly benefited from conversations with Anatoly Derevianko and Andrei Tabarev (Institute of Archaeology and Ethnography, SBRAS, Novosibirsk). The help of Andre M. Sukernik in editing the final draft of the manuscript is gratefully acknowledged.
LITERATURE CITED
Achilli A, Rengo C, Battaglia V, Pala M, Olivieri A, Fornarino S, Magri C, Scozzari R, Babudri N, Santachiara-Benerecetti AS, Bandelt H-J, Semino O, Torroni A. 2005. Saami and Berbers:
Palanichamy MG, Sun C, Agarwal S, Bandelt H-J, Kong Q-P, Khan F, Wang C-Y, Chaudhuri TK, Palla V, Zhang Y-P. 2004. Phylogeny of mitochondrial DNA macrohaplogroup N in India, based on complete sequencing: implications for the peopling of South Asia. Am J Hum Genet 75:966–978.
Perego UA, Angerhofer N, Pala M, Olivieri A, Lancioni H, Kashani BH, Carossa V, Ekins JE, Gomez-Carballa A, Huber G, Zimmermann B, Corach D, Babudri N, Panara F, Myres NM, Parson W, Semino O, Salas A, Woodward SR, Achilli A, Torroni A. 2010. The initial peopling of the Americas: a growing number of founding mitochondrial genomes from Beringia. Genome Res 20:1174–1179.
Pereira L, Soares P, Radivojac P, Li B, Samuels DC. 2011. Comparing phylogeny and the predicted pathogenicity of protein variations reveals equal purifying selection across the global human mtDNA diversity. Am J Hum Genet 88:433–439.
Pimenoff VN, Comas D, Palo JU, Vershubsky G, Kozlov A, Sajantila A. 2008. Northwest Siberian Khanty and Mansi in the junction of West and East Eurasian gene pools as revealed by uniparental markers. Eur J Hum Genet 16:1254–1264.
Pope K, Terrell J. 2007. Environmental setting of human migration in the circum-Pacific region. J Biogeogr 35:1–21.
Potapov L. 1972. Tubalar from Gorniy Altai. In: Ethnic history of Asian people. Moscow (Russian).
Qin Z, Yang Y, Kang L, Yan S, Cho K, Cai X, Lu Y, Zheng H, Zhu D, Fei D, Li S, Jin L, Li H. 2010. A mitochondrial revelation of early human migrations to the Tibetan Plateau before and after the last glacial maximum. Am J Phys Anthropol 143:555–569.
Quintana-Murci L, Chaix R, Wells RS, Behar DM, Sayar H, Scozzari R, Rengo C, Al-Zahery N, Semino O, Santachiara-Benerecetti AS, Coppa A, Ayub Q, Mohyuddin A, Tyler-Smith C, Mehdi SQ, Torroni A, McElreavey K. 2004. Where west meets east: the complex mtDNA landscape of the southwest and central Asian corridor. Am J Hum Genet 74:827–845.
Radloff W. 1883. Aus Sibirien. Leipzig.
Reidla M, Kivisild T, Metspalu E, Kaldma K, Tambets K, Tolk H, Parik J, Loogvali E, Derenko M, Malyarchuk B, Bermisheva M, Zhadanov S, Pennarun E, Gubina M, Golubenko M, Damba L, Fedorova S, Gusar V, Grechanina E, Mikerezi I, Moisan J-P, Chavetre A, Khusnutdinova E, Osipova L, Stepanov V, Voevoda M, Achilli A, Rengo C, Rickards O, Franco de Stefano G, Papiha S, Beckman L, Janicijevic B, Rudan P, Anagnou N, Michalodimitrakis E, Koziel S, Usanga E, Geberhiwot T, Herrnstadt C, Howell N, Torroni A, Villems R. 2003. Origin and diffusion of mtDNA haplogroup X. Am J Hum Genet 73:1178–1190.
Richards M, Macaulay V, Hickey E, Vega E, Sykes B, Guida V, Rengo C, Sellitto D, Cruciani F, Kivisild T, Villems R, Thomas M, Rychkov R, Rychkov O, Rychkov Y, Golge M, Dimitrov D, Hill E, Bradley D, Romano V, Calì F, Vona G, Demaine A, Papiha S, Triantaphyllidis C, Stefanescu G, Hatina J, Belledi M, Di Rienzo A, Novelletto A, Oppenheim A, Nørby S, Al-Zaheri N, Santachiara-Benerecetti S, Scozzari R, Torroni A, Bandelt HJ. 2000. Tracing European founder lineages in the Near Eastern mtDNA pool. Am J Hum Genet 67:1251–1276.
Saillard J, Forster P, Lynnerup N, Bandelt HJ, Nørby S. 2000. mtDNA variation among Greenland Eskimos: the edge of the Beringian expansion. Am J Hum Genet 67:718–726.
Schurr T, Sukernik R, Starikovskaya Y, Wallace D. 1999. Mitochondrial DNA variation in Koryaks and Itelmen: population replacement in the Okhotsk Sea-Bering Sea region during the Neolithic. Am J Phys Anthropol 108:1–39.
Shlush LI, Behar DM, Yudkovsky G, Templeton A, Hadid Y, Basis F, Hammer M, Itzkovitz S, Skorecki K. 2008. The Druze: a population genetic refugium of the Near East. PLoS ONE 3:e2105.
Soares P, Achilli A, Semino O, Davies W, Macaulay V, Bandelt H-J, Torroni A, Richards MB. 2010. The archaeogenetics of Europe. Curr Biol 20:174–183.
Soares P, Ermini L, Thomson N, Mormina M, Rito T, Rohl A, Salas A, Oppenheimer S, Macaulay V, Richards MB. 2009. Correcting for purifying selection: an improved human mitochondrial molecular clock. Am J Hum Genet 84:740–759.
Starikovskaya E, Sukernik R, Derbeneva O, Volodko N, Ruiz-Pesini E, Torroni A, Brown M, Lott M, Hosseini S, Huoponen K, Wallace D. 2005. Mitochondrial DNA diversity in indigenous populations of the southern extent of Siberia, and the origins of Native American haplogroups. Ann Hum Genet 69:67–89.
Sukernik RI, Volodko NV, Mazunin IO, Eltsov NP, Starikovskaya YB. 2010. Genetic history of Old Russian Settlers in Polar North region of eastern Siberia. Genetika (Russian) 46:1571–1579.
Tamm E, Kivisild T, Reidla M, Metspalu M, Smith DG, Mulligan CJ, Bravi CM, Rickards O, Martinez-Labarga C, Khusnutdinova EK, Fedorova SA, Golubenko MV, Stepanov VA, Gubina MA, Zhadanov SI, Ossipova LP, Damba L, Voevoda MI, Dipierri JE, Villems R, Malhi RS. 2007. Beringian standstill and spread of Native American founders. PLoS ONE 2:e829.