
AMERICAN JOURNAL OF PHYSICAL ANTHROPOLOGY 118:63–76 (2002)

Genetic Relationship of Chinese Ethnic Populations Revealed by mtDNA Sequence Diversity
Yong-Gang Yao,¹ Long Nie,¹ Henry Harpending,² Yun-Xin Fu,³ Zhi-Gang Yuan,⁴ and Ya-Ping Zhang¹*
¹Laboratory of Molecular Evolution and Genome Diversity, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, People’s Republic of China.
²Department of Anthropology, University of Utah, Salt Lake City, Utah 84112.
³Human Genetics Center, University of Texas-Houston, Houston, Texas 77030.
⁴Guangxi Medical University, Nanning, Guangxi 530021, People’s Republic of China.

KEY WORDS mtDNA; Chinese ethnic populations; origin; genetic differentiation; language

ABSTRACT The origin and demographic history of the ethnic populations of China have not been clearly resolved. In this study, we examined the hypervariable segment I sequences (HVSI) of the mitochondrial DNA control region in 372 individuals from nine Chinese populations and one northern Thai population. A relatively high percentage of individuals was found to share sequences with those from other populations of the same ethnogenesis. In general, the populations of southern or Pai-Yuei tribal origin showed higher haplotype diversity and nucleotide diversity than the populations of northern or Di-Qiang tribal origin. Mismatch distributions from these populations showed concordant features. All except the northern groups Nu, Lisu, Tibetan, and Mongolian showed typical signatures of ancient population expansions in the mismatch distributions and neutrality tests. Episodes of extreme size reduction in the past are one likely explanation for the absence of evidence of expansion in the northern populations. Small sample sizes, as well as samples from isolated subpopulations, contributed to the bumpy mismatch distributions observed. Phylogenetic analysis and haplotype sharing among populations suggest that current mtDNA variation in these ethnic populations could reveal their ethnohistory to some extent, but in general, linguistic and geographic classifications of the populations did not agree well with classification by mtDNA variation. Am J Phys Anthropol 118:63–76, 2002. © 2002 Wiley-Liss, Inc.

Grant sponsor: Natural Science Foundation of Yunnan Province; Grant sponsor: National Natural Science Foundation of China; Grant sponsor: Chinese Academy of Sciences.

*Correspondence to: Ya-Ping Zhang, Laboratory of Molecular Evolution and Genome Diversity, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan 650223, People’s Republic of China. E-mail: zhangyp@public.km.yn.cn

Received 22 February 2000; accepted 26 November 2001.
Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/ajpa.10052

© 2002 WILEY-LISS, INC.

There are 56 officially recognized ethnic populations in China. Among them, the Han constitute the vast majority and are dispersed nearly all over the country. The other ethnic populations are found mainly in peripheral regions (Du and Yip, 1993). How these ethnic populations came into existence, and how and when they spread in China, are of great interest, particularly to Chinese anthropologists. According to their current distributions, languages, culture, habits, and historical documents, most of the minority ethnic populations in southwest China can be traced back to two main ancient tribes: Di-Qiang and Pai-Yuei. The Di-Qiang tribe moved from the current Gansu, Ningxia, and Qinghai Provinces in northwest China to southwest China in two waves, one around 4,000–5,000 years ago and the other around 2,000–2,500 years ago (Du and Yip, 1993; You, 1994). During their migration, the Di-Qiang people intermarried with other groups and differentiated into many ethnic groups, some of which settled in highlands and mountainous regions. The ancient Pai-Yuei tribe was widely distributed along the southeast coast of China up to Yunnan Province and the northern part of Southeast Asia 2,000–3,000 years ago (Du and Yip, 1993; You, 1994). Some Pai-Yuei people later migrated to north Thailand and contributed to the ancestral gene pool of the Thai (Du and Yip, 1993; Ma, 1994; You, 1994). Do the current genetic structures of these populations carry the signatures of their earlier ethnogenesis? Is there a correlation between genetic and linguistic differentiation? Given that these ethnic populations mainly practiced patrilocal endogamy and had different cultural traditions, it is of interest to see to what extent their cultural traditions may be reflected in the current mtDNA pools of these populations.

Previous analyses of clustering of frequency distributions of immunoglobulin Gm haplotypes (Zhao et al., 1991), HLA alleles (Chen et al., 1993), and 38 allele frequencies (Du et al., 1998) in Chinese ethnic populations have identified a clear difference between northern and southern Chinese groups. Studies of archaeological assemblages revealed that a distinction between southern and northern populations might have existed since the Neolithic period (Wu et al., 1989). Recently, data from Y-chromosome biallelic markers as well as nuclear microsatellites suggested a southern origin of the northern groups (Chu et al., 1998; Su et al., 1999; reviewed in Jin and Su, 2000). However, by combining data from mtDNA RFLPs, short tandem repeat loci, and published data on the Y chromosome as well as on human-carried JC virus, an ordinarily benign urinary tract virus that is primarily transmitted from parent to child, Ding et al. (2000) emphasized that the regional difference in genetic markers between southern and northern populations might be more properly explained by simple isolation by distance. Even though ethnic populations of northern Di-Qiang origin and of southern Pai-Yuei origin have been included in previous studies, none of these studies discussed the relationship between documented ethnohistory and the current genetic structure of these populations.

In the present study, we analyzed mtDNA control region hypervariable segment I (HVSI) sequences in nine Chinese ethnic populations and one northern Thai population, together with data from four previously reported Chinese populations (Horai et al., 1996; Betty et al., 1996; Yao et al., 2000a). Among them, Bai, Sali (a branch of the Yi ethnic group), Nu, Lisu, and Tibetan trace their origins to the ancient Di-Qiang tribal group in northwest China, while Dai and Zhuang are descendants of the ancient Pai-Yuei tribe. There are two main language families in our sample: Uygur, Kazak, Tu, and Mongolian belong to the Altaic language family, while Bai, Sali, Lisu, Nu, Tibetan, Dai, Zhuang, and Han people speak Sino-Tibetan languages (Du and Yip, 1993; Ma, 1994). Our focus here is not on the distinction between southern and northern Chinese groups; rather, we are interested in understanding 1) the possible relationship between population ethnohistory and genetic architecture, and 2) the correlation between language, culture, and genetic differentiation.

MATERIALS AND METHODS

Sampling

Nine Chinese ethnic populations were sequenced in the present study. Bai (N = 31), Dai (N = 38), Lisu (N = 37), Sali (N = 31), and Nu (N = 30) were collected in Yunnan Province. Tu (N = 35) and Mongolian (N = 15) were from Qinghai Province. An indigenous ethnic population of Guangxi Province, Zhuang (N = 83), was included. Except for the 32 Yunnan Tibetans collected in an isolated village in the hilly region of DeQin County, Yunnan Province, all other samples, including 8 Tibetans from Qinghai Province, were from various villages in concentrated communities of each ethnic population. The blood donors of Zhuang were students of Guangxi Medical University. Thirty-two Thais from north Thailand were also sequenced in the current study. The locations and language families of the studied ethnic populations are shown in Table 1 and Figure 1. The geographic origin, nationality, and maternal pedigree (unrelated through at least three generations) of each individual were ascertained before sampling.

DNA extraction, amplification, and sequencing

Genomic DNA was extracted from whole blood by standard phenol/chloroform methods. The HVSI sequence was amplified and sequenced as described elsewhere (Yao et al., 2000a,b). For some individuals, the internal primers H16401 (5′-TGATTTCACGGAGGATGGTG-3′) (Vigilant et al., 1991) and L16209 (5′-CCCCATGCTTACAAGCAAGT-3′) (Mountain et al., 1995) were used in sequencing.

Data analysis

The sequences were edited and aligned using Dnastar software (DNASTAR, Inc.), and sequence haplotypes were identified. Other data used in the current study included 20 Han individuals from Hong Kong (Cantonese) (Betty et al., 1996), 52 Han individuals from Taiwan (Horai et al., 1996; the other 14 individuals in that study were discarded because their sequences were shorter than ours), 45 Uygurs, and 30 Kazaks from Xinjiang Province (Yao et al., 2000a). Because some of the published mtDNA data included here for comparison covered only a 360-bp segment (nucleotide positions 16024–16383), we restricted our analyses to that 360-bp segment whenever the published data were used.

Three methods were used to compare the genetic makeup of the ethnic populations. Firstly, we identified the haplotypes shared among ethnic populations and then computed the matching probability (m) between population samples according to

$$ m = \sum_i X_i Y_i $$

where $X_i$ and $Y_i$ are the sample frequencies of the shared haplotype $i$ in populations X and Y, respectively. Secondly, haplotype diversity (h) and nucleotide diversity (π) were used. Haplotype diversity was estimated using the formula

$$ h = \frac{n}{n-1}\left(1 - \sum_i p_i^2\right) $$

where $p_i$ is the sample frequency of the $i$-th haplotype and $n$ is the number of individuals in the sample (Nei, 1987). Nucleotide diversity was estimated by

$$ \pi = \frac{1}{L}\,\frac{n}{n-1}\sum_j \left(1 - \sum_i \chi_{ij}^2\right) $$
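The three statistics above reduce to simple counting over aligned sequences. The following is a minimal Python sketch, not the software used in the study (sequences were handled in Dnastar and Arlequin). It assumes equal-length, gap-free sequences, treats identical strings as one haplotype, and uses hypothetical function names and toy data; $\chi_{ij}$ is the frequency of nucleotide $i$ at site $j$, $L$ the sequence length, and $n$ the sample size. Because non-shared haplotypes contribute zero to $m$, summing only over haplotypes present in both samples gives the same value as summing over all haplotypes.

```python
from collections import Counter
from typing import Dict, List


def haplotype_frequencies(seqs: List[str]) -> Dict[str, float]:
    """Sample frequency of each distinct haplotype (identical aligned sequence)."""
    n = len(seqs)
    return {hap: count / n for hap, count in Counter(seqs).items()}


def matching_probability(freq_x: Dict[str, float], freq_y: Dict[str, float]) -> float:
    """m = sum_i X_i * Y_i, taken over haplotypes present in both samples."""
    shared = freq_x.keys() & freq_y.keys()
    return sum(freq_x[h] * freq_y[h] for h in shared)


def haplotype_diversity(seqs: List[str]) -> float:
    """h = n/(n-1) * (1 - sum_i p_i^2) (Nei, 1987)."""
    n = len(seqs)
    sum_p2 = sum(p * p for p in haplotype_frequencies(seqs).values())
    return n / (n - 1) * (1 - sum_p2)


def nucleotide_diversity(seqs: List[str]) -> float:
    """pi = (1/L) * n/(n-1) * sum_j (1 - sum_i chi_ij^2) over aligned sites j."""
    n, length = len(seqs), len(seqs[0])
    total = 0.0
    for j in range(length):
        column = Counter(s[j] for s in seqs)  # nucleotide counts at site j
        total += 1 - sum((c / n) ** 2 for c in column.values())
    return total / length * n / (n - 1)


if __name__ == "__main__":
    pop_x = ["AAAA", "AAAT", "AATT"]          # hypothetical toy sample
    pop_y = ["AAAA", "AAAA", "TTTT", "AATT"]  # hypothetical toy sample
    print(haplotype_diversity(pop_x))   # close to 1.0: every sequence is distinct
    print(nucleotide_diversity(pop_x))  # close to 1/3: 4/3 mean differences over 4 sites
    print(matching_probability(haplotype_frequencies(pop_x), haplotype_frequencies(pop_y)))
```

For gap-free data the π estimator as written equals the mean number of pairwise differences divided by L, which is a convenient cross-check on an implementation.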
In the nucleotide diversity formula, $\chi_{ij}$ is the frequency of the $i$-th nucleotide at site $j$, $L$ is the sequence length, and $n$ is the sample size (Nei, 1987). Thirdly, a neighbor-joining tree (Saitou and Nei, 1987) of the populations was constructed from net genetic distances ($d_A$), defined as

$$ d_A = d_{XY} - \frac{d_X + d_Y}{2} $$

where $d_{XY}$ is the mean pairwise difference between individuals from populations X and Y, and $d_X$ ($d_Y$) is the mean pairwise difference between individuals within population X (or Y) (Nei, 1987). Since a tree representation of the distance matrix might be misread as a succession of population splits, we also performed principal component analysis (PCA) of the populations based on the distance matrix.

TABLE 1. Sample information of populations analyzed in present study

Population | Language¹ | Census size¹ | Location | Sample size | No. of haplotypes | No. of unique haplotypes²
--- | --- | --- | --- | --- | --- | ---
1. Lisu | Yi branch, Tibeto-Burman group, Sino-Tibetan family | 574,856 | Gongshan, Yunnan | 37 | 22 | 15 (68.2%)
2. Nu | Yi branch, Tibeto-Burman group, Sino-Tibetan family | 27,123 | Gongshan, Yunnan | 30 | 9 | 4 (44.4%)
3. Sali | Yi branch, Tibeto-Burman group, Sino-Tibetan family | 6,572,173 | Nuxi, Yunnan | 31 | 26 | 17 (47.2%)
4. Tibetan | Tibetan branch, Tibeto-Burman group, Sino-Tibetan family | 4,593,330 | Deqin, Yunnan; Qinghai | 40 | 19 | 15 (78.9%)
5. Bai | Yi branch, Tibeto-Burman group, Sino-Tibetan family | 1,594,827 | Dali, Yunnan | 31 | 29 | 21 (72.4%)
6. Dai | Zhuang-Dai branch, Zhuang-Dong group, Sino-Tibetan family | 1,025,128 | Jinghong, Yunnan | 38 | 36 | 24 (66.7%)
7. Zhuang | Zhuang-Dai branch, Zhuang-Dong group, Sino-Tibetan family | 15,489,630 | Guangxi | 83 | 66 | 50 (75.8%)
8. Cantonese³ | Chinese, Sino-Tibetan family | | Hong Kong | 20 | 20 | 14 (70.0%)
9. Taiwanese Han³ | Chinese, Sino-Tibetan family | | Taiwan | 52 | 46 | 34 (88.5%)
10. Tu | Mongolian group, Altaic family | 191,624 | Huzu, Qinghai | 35 | 29 | 25 (86.2%)
11. Mongolian | Mongolian group, Altaic family | 4,806,849 | Qinghai | 15 | 10 | 9 (90.0%)
12. Kazak³ | Turkic group, Altaic family | 1,111,718 | Kashen, Xinjiang | 30 | 27 | 21 (77.8%)
13. Uygur³ | Turkic group, Altaic family | 7,214,431 | Yili and Kashen, Xinjiang | 45 | 41 | 32 (78.0%)
14. Thai | | | North Thailand | 32 | 29 | 22 (75.9%)
Total | | | | 519 | 409 | 303

¹ The language and total population size of each ethnic population (1990 Census) are from Du and Yip (1993) and Ma (1994).
² Number of haplotypes that are not shared between or among the 14 populations. Percentages of these haplotypes in each population are in parentheses.
³ Data from Betty et al. (1996), Horai et al. (1996), and Yao et al. (2000a), respectively.

Fig. 1. Locations of samples in current study. Numbers correspond to population names in Table 1.

In order to examine whether there are genetic differences among geographic groups and among language groups, we grouped the ethnic populations according to their geographic locations and languages, respectively, and performed analyses of molecular variance (AMOVA) (Excoffier et al., 1992) using the Arlequin package (Schneider et al., 2000).

The demographic history of each population was examined by two different approaches. First, the D test of Tajima (1989) and the Fs test of Fu (1997) were used to test whether neutrality holds (i.e., whether the population under study evolves with a constant effective population size, all mutations being selectively neutral). A population that has experienced population
expansion may result in a rejection of the null hypothesis. Second, mismatch distribution analyses were used to evaluate 1) whether the mismatch distribution shows the signature of population expansion found in most samples from human populations (Excoffier and Schneider, 1999), and 2) the timing of demographic expansion measured in units of mutational time. Typically, a population with a constant size in the past has a multimodal mismatch distribution, while a population that has undergone expansion usually shows a unimodal or Poisson-like distribution (Rogers and Harpending, 1992; Rogers, 1995). These computations were also performed using the Arlequin package (Schneider et al., 2000). The divergence rate of Ward et al. (1991; 33% per site per million years) was used to convert mutational time ($\tau$) into real time according to the equation

$$ \tau = 2\mu t $$

where $\mu$ is the mutation rate per sequence per year (half the per-site divergence rate multiplied by the sequence length) and $t$ is the time in years.

RESULTS

Sequence diversity

Hypervariable segment I sequences of the mtDNA control region (nucleotide positions 16001–16497) were sequenced in 340 individuals from 9 Chinese ethnic populations and 32 individuals from a northern Thai population. One hundred thirty-five polymorphic sites (excluding insertions) were identified relative to the reference sequence (Anderson et al., 1981; see Appendix). When the published data of Chinese populations were considered, the Han from Hong Kong (Cantonese; Betty et al., 1996) showed the highest haplotype diversity (h = 1.000 ± 0.016), followed by Bai, Dai, Zhuang, Taiwanese Han, Uygur, and Kazak (h > 0.99), while Nu had the lowest, with a value of 0.858 ± 0.038 (Table 2). The haplotype diversity of the Thai population was similar to those of Dai and Zhuang. In general, the ethnic populations of ancient Di-Qiang origin showed lower haplotype diversity than those of southern Pai-Yuei origin.

The two ethnic populations from Xinjiang Province (Uygur and Kazak; Yao et al., 2000a) showed similar nucleotide diversity (0.017–0.018). However, nucleotide diversities varied among the ethnic populations from Yunnan Province, ranging from 0.013 ± 0.007 (Tibetan) to 0.020 ± 0.011 (Dai). The two populations from Qinghai Province also differed in their nucleotide diversities (Tu, 0.019 ± 0.010; Mongolian, 0.016 ± 0.009), although the difference was not statistically significant (P > 0.05). With the exception of the Lisu population, which had relatively high nucleotide diversity (0.020 ± 0.011), higher nucleotide diversity seemed to be present only in populations of Pai-Yuei origin, such as Dai and Zhuang (Table 2).

TABLE 2. Genetic diversities and population demographic parameters in the populations

Population | No. | Haplotype diversity | Nucleotide diversity | Fs¹ | P² | D³ | Tau | Expansion time (YBP)⁴
--- | --- | --- | --- | --- | --- | --- | --- | ---
Lisu | 37 | 0.957 ± 0.017 | 0.020 ± 0.011 | −4.848 | 0.043 | −0.919 | 6.152 | 51,800
Nu | 30 | 0.858 ± 0.038 | 0.017 ± 0.009 | 2.295 | 0.816 | −0.024 | 3.927 | 33,100
Sali | 31 | 0.989 ± 0.011 | 0.019 ± 0.010 | −16.567 | 0.000 | −1.390 | 7.223 | 60,800
Tibetan | 40 | 0.933 ± 0.020 | 0.013 ± 0.007 | −5.834 | 0.021 | −1.565 | 4.058 | 34,200
Bai | 31 | 0.996 ± 0.010 | 0.018 ± 0.010 | −24.460 | 0.000 | −1.599 | 7.188 | 60,500
Dai | 38 | 0.996 ± 0.007 | 0.020 ± 0.011 | −24.971 | 0.000 | −1.389 | 7.847 | 66,100
Zhuang | 83 | 0.992 ± 0.004 | 0.020 ± 0.011 | −24.868 | 0.000 | −1.482 | 8.116 | 68,300
Cantonese | 20 | 1.000 ± 0.016 | 0.019 ± 0.010 | −15.576 | 0.000 | −1.464 | 7.438 | 62,600
Taiwanese Han | 52 | 0.993 ± 0.006 | 0.019 ± 0.010 | −25.030 | 0.000 | −1.600 | 7.515 | 63,300
Tu | 35 | 0.987 ± 0.011 | 0.019 ± 0.010 | −19.557 | 0.000 | −1.417 | 6.002 | 50,500
Mongolian | 15 | 0.943 ± 0.045 | 0.016 ± 0.009 | −1.459 | 0.258 | −0.561 | 6.839 | 57,600
Kazak | 30 | 0.993 ± 0.011 | 0.018 ± 0.010 | −20.781 | 0.000 | −1.553 | 7.063 | 59,500
Uygur | 45 | 0.995 ± 0.006 | 0.017 ± 0.010 | −25.170 | 0.000 | −1.895 | 6.184 | 52,100
Thai | 32 | 0.994 ± 0.010 | 0.020 ± 0.011 | −22.002 | 0.000 | −1.627 | 7.833 | 65,900
Chinese⁵ | 519 | 0.996 ± 0.001 | 0.020 ± 0.001 | −24.239 | 0.000 | −1.953 | 7.471 | 62,900

¹ Fu’s Fs test.
² P value of Fu’s Fs statistic.
³ Tajima’s D test.
⁴ Expansion time of the population was estimated from the tau value, assuming a divergence rate of 33% per site per million years (Ward et al., 1991). Only 360-bp HVSI sequences (16024–16383) were included.
⁵ Total number of individuals in the 13 Chinese ethnic populations and the Thai population.

Haplotype sharing between or among the ethnic populations

A total of 237 haplotypes was identified in the 340 Chinese and 32 Thais who were sequenced in this study; among them, 23 haplotypes were shared by two or more populations (see Appendix). The percentages of population-specific haplotypes were relatively low in Nu (44.4%) and Sali (47.2%), while in the other samples the percentages were higher, ranging from 66% to 90% (Table 1). As shown by the number of sequences shared among the populations listed in the Appendix and by the matching probabilities in Table 3, two main features could be discerned: 1) Populations of the same ethnic origin generally shared a relatively large number of sequences with each other and presented high matching probabilities. The highest matching probabilities were found between Nu and Lisu and between Nu and Sali. 2) Populations of Pai-Yuei origin shared relatively few sequences with the populations of Di-Qiang origin. The matching probabilities even
reached zero between some populations. However, in the Bai population, which is of northern Di-Qiang origin but has intermixed extensively with Han and other local populations (Du and Yip, 1993; You, 1994), relatively high matching probabilities were found with the populations of Pai-Yuei origin. The Han populations (Hong Kong and Taiwanese; Betty et al., 1996; Horai et al., 1996) presented higher matching probabilities with the populations of Pai-Yuei origin than with those of Di-Qiang origin and with the other populations currently living in northwest China.

TABLE 3. Matching probability between population samples and the net genetic distances between populations¹

 | Cantonese | Dai | Thai | Zhuang | Taiwanese Han | Bai | Tu | Sali | Lisu | Nu | Tibetan | Mongolian | Uygur | Kazak
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
Cantonese | · | 0.2632 | 0.4688 | 0.3614 | 0.4808 | 0.0 | 0.2857 | 0.3226 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0
Dai | 0.0086 | · | 0.4934 | 0.7609 | 0.4555 | 0.7640 | 0.2256 | 0.1698 | 0.0711 | 0.0 | 0.0 | 0.0 | 0.1170 | 0.0877
Thai | −0.0067 | 0.0011 | · | 0.6777 | 0.6486 | 0.2016 | 0.3571 | 0.4032 | 0.0 | 0.0 | 0.0 | 0.0 | 0.2083 | 0.0
Zhuang | −0.0142 | −0.0040 | −0.0025 | · | 0.5792 | 0.5052 | 0.3098 | 0.3109 | 0.0 | 0.0 | 0.0 | 0.0803 | 0.0535 | 0.0
Taiwanese Han | 0.0134 | 0.0915 | 0.0468 | 0.0593 | · | 0.1861 | 0.4945 | 0.4963 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0427 | 0.0641
Bai | 0.0051 | 0.0393 | 0.0433 | 0.0379 | 0.0353 | · | 0.3687 | 0.0922 | 0.0 | 0.0 | 0.3226 | 0.0 | 0.2151 | 0.2151
Tu | 0.0229 | 0.0743 | 0.0819 | 0.0731 | 0.0539 | 0.0145 | · | 1.1982 | 0.6950 | 1.3333 | 0.4286 | 0.0 | 0.4444 | 0.3810
Sali | 0.0483 | 0.0777 | 0.1008 | 0.0835 | 0.0692 | 0.0048 | 0.0124 | · | 1.0462 | 2.0430 | 0.5645 | 0.0 | 0.2151 | 0.3226
Lisu | 0.0769 | 0.1025 | 0.1273 | 0.1300 | 0.0832 | 0.0518 | 0.0314 | 0.0326 | · | 3.7838 | 0.0 | 0.0 | 0.0 | 0.1802
Nu | 0.1641 | 0.2410 | 0.2546 | 0.2457 | 0.1602 | 0.1133 | 0.1006 | 0.0970 | 0.0477 | · | 0.0 | 0.0 | 0.0 | 0.5556
Tibetan | 0.1652 | 0.2808 | 0.2737 | 0.2440 | 0.1296 | 0.1219 | 0.0760 | 0.0677 | 0.1254 | 0.1255 | · | 0.0 | 0.2222 | 0.1667
Mongolian | 0.0908 | 0.1666 | 0.1773 | 0.1488 | 0.1016 | 0.0610 | 0.0704 | 0.0423 | 0.1057 | 0.1249 | 0.0903 | · | 0.0 | 0.0
Uygur | −0.0014 | 0.0692 | 0.0655 | 0.0564 | 0.0365 | −0.0089 | 0.0248 | 0.0188 | 0.0742 | 0.1335 | 0.1125 | 0.0472 | · | 0.2963
Kazak | 0.0258 | 0.1248 | 0.1081 | 0.0919 | 0.0854 | 0.0508 | 0.0781 | 0.0884 | 0.1201 | 0.1613 | 0.1959 | 0.1177 | 0.0278 | ·

¹ All values have been multiplied by 100. Below the diagonal, net genetic distances (dA) between populations; above the diagonal, matching probability (m) between population samples.

Mismatch analyses

Among the 14 populations considered here, only Nu, Tibetan, Lisu, and Mongolian showed multimodal mismatch distributions, characteristic of populations in equilibrium; the remaining populations all had unimodal distributions, characteristic of populations that have undergone large-scale expansion (Fig. 2) (Rogers and Harpending, 1992; Rogers, 1995). The Fs test (Fu, 1997) and D test (Tajima, 1989) agreed well with the mismatch analysis, with nonsignificant test results for Nu (P > 0.05) and Mongolian (P > 0.05). For Tibetan and Lisu, the Fs tests were marginally significant (Lisu, −4.848, P = 0.043; Tibetan, −5.834, P = 0.021), while the test for the remaining 10 populations was highly significant (P < 0.05; Table 2). Note that the mismatch analysis of the pooled Chinese sample presented a smooth unimodal distribution (Fig. 2). Thus, to understand the details of Chinese migration history, it is important to identify the demographic events of the different ethnic populations.

In order to see whether small sample size could mimic a multimodal mismatch distribution, we randomly chose 15 individuals from the Zhuang population, which presented a unimodal mismatch distribution in the full set of 83 individuals, and estimated their mismatch distributions. The resulting mismatch distributions were found to be ragged (Fig. 2). The smaller the sample size, the more ragged the mismatch distribution (data not shown). The Fs tests in these subsamples were all statistically nonsignificant (P > 0.05).

The tau value (τ), which reflects the location of the mismatch distribution crest, provides a rough estimate of the time when rapid population expansion started (Rogers and Harpending, 1992; Rogers, 1995). The tau values of Zhuang, Dai, and Thai were all larger than 7.8, while those of Nu and Tibetan were around 4.0 (3.927 and 4.058, respectively; Table 2). These latter values should be regarded with caution, since Nu and Tibetan did not show clear evidence of any expansion at all. Sali, Bai, and Kazak had tau values similar to those of the Han populations. The tau values correspond to estimated expansion times of roughly 66,000 years before present (YBP) or more for Dai, Thai, and Zhuang; about 60,000 YBP for Bai, Sali, and Han from Hong Kong and Taiwan; and about
33,000 YBP for Nu and Tibetan, assuming a divergence rate of 33% per site per million years (Ward et al., 1991) (Table 2). Again, the recent tau estimates of Nu and Tibetan should be regarded with skepticism.

Fig. 2. Mismatch distributions of HVSI sequences within (a) Tibetan, (b) Lisu, (c) Nu, (d) Zhuang, (e) Mongolian, (f) 15 randomly selected Zhuang samples, (g) pooled samples of Tibetan, Nu, Lisu, and Mongolian, and (h) total of 519 samples. Numbers of nucleotide differences between all pairs of sequences are indicated along the x-axis, and the frequency of pairs is indicated on the y-axis.

Genetic structure of the ethnic groups

The genetic structure of the ethnic populations (Thai included) was investigated by AMOVA (Excoffier et al., 1992). Considering haplotypes, 95.26% of the genetic variation was found within populations, whereas 4.74% of the variation (P < 0.05, 1,000 iterations) was among populations when all 14 populations were grouped together. When the ethnic populations from Xinjiang and Qinghai Provinces were grouped together as a northwest geographic group (45 Uygurs, 30 Kazaks, 35 Tus, 15 Mongolians, and 8 Tibetans), and populations from Yunnan and Guangxi Provinces (31 Bais, 37 Lisus, 30 Nus, 32 Tibetans, 31 Salis, 38 Dais, and 83 Zhuangs) were grouped as a south geographic group, the genetic variation among groups and among populations within groups was −0.39% (P = 0.94) and 3.24% (P < 0.05), respectively. The genetic variation within the geographic groups was far larger than the between-group variation. The same results were seen when the populations (Thai excluded) were grouped into two major groups according to their languages (Sino-Tibetan and Altaic; Table 1). The proportion of genetic variation attributed to differences among language groups amounted to −0.17% (P = 0.52), while 4.86% (P < 0.05) and 95.31% (P < 0.05) of the variation were observed among populations within language groups and within populations, respectively. We also grouped the populations according to their historical origin (Di-Qiang group: Sali, Tibetan, Nu, Lisu, and Bai; Pai-Yuei group: Dai, Zhuang, and Thai); the resulting AMOVA test showed a significant difference between these two groups. About 5.17% of the genetic variation was observed between the Di-Qiang and Pai-Yuei groups (P = 0.02).
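The AMOVA percentages above come from partitioning pairwise sequence differences into among- and within-population components. A minimal one-level sketch in Python follows, assuming the number of differing sites as the squared distance between two haplotypes and a simple permutation test for significance. It omits the group level of the hierarchical analyses reported here, which were run in Arlequin, and its function names and toy sequences are hypothetical. The returned Φ_ST multiplied by 100 corresponds to a "percentage of variation among populations".

```python
import random
from itertools import combinations
from typing import List, Sequence, Tuple


def differences(a: str, b: str) -> int:
    """Number of differing sites between two aligned sequences (used as squared distance)."""
    return sum(x != y for x, y in zip(a, b))


def ssd(seqs: Sequence[str]) -> float:
    """Sum of squared deviations: (1/n) * sum of distances over unordered pairs."""
    return sum(differences(a, b) for a, b in combinations(seqs, 2)) / len(seqs)


def phi_st(populations: List[List[str]]) -> float:
    """Share of molecular variance among populations (one-level AMOVA, Excoffier et al., 1992)."""
    pooled = [s for pop in populations for s in pop]
    n_total, n_pops = len(pooled), len(populations)
    ssd_within = sum(ssd(pop) for pop in populations)
    ssd_among = ssd(pooled) - ssd_within
    msd_among = ssd_among / (n_pops - 1)
    msd_within = ssd_within / (n_total - n_pops)
    # Average sample-size coefficient n_c, correcting for unequal population sizes.
    n_c = (n_total - sum(len(pop) ** 2 for pop in populations) / n_total) / (n_pops - 1)
    sigma2_among = (msd_among - msd_within) / n_c  # may be negative, as in the text
    return sigma2_among / (sigma2_among + msd_within)


def permutation_test(populations: List[List[str]], n_perm: int = 1000,
                     seed: int = 1) -> Tuple[float, float]:
    """Observed Phi_ST and the share of random reassignments reaching at least that value."""
    observed = phi_st(populations)
    sizes = [len(pop) for pop in populations]
    pooled = [s for pop in populations for s in pop]
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign individuals to populations, keeping sizes fixed
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if phi_st(shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    north = ["AAAA", "AAAA", "AAAT"]  # hypothetical toy sample
    south = ["TTTT", "TTTT", "TTTA"]  # hypothetical toy sample
    phi, p = permutation_test([north, south], n_perm=200)
    print(f"Phi_ST = {phi:.4f} ({100 * phi:.2f}% among populations), P = {p:.3f}")
```

A negative Φ_ST, like the −0.39% and −0.17% among-group components above, arises whenever the among-population mean square falls below the within-population mean square; it is conventionally read as no detectable differentiation.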
Phylogenetic analysis and principal component analysis of populations

An unrooted NJ tree of the 14 populations (Thai included) was constructed using the net genetic distances (dA) shown in Table 3 (see also Fig. 3a). Populations of the same language family or from the same geographic region (except for the two Xinjiang populations) did not group together, which was consistent with the results of the AMOVA analyses. The Dai, Zhuang, and Thai showed close affinity; the genetic distances among Dai, Zhuang, and Thai even showed negative estimated values. Tu and Sali occupied intermediate node positions in the tree, as did Taiwanese Han and Bai. The two Xinjiang populations (Uygur and Kazak) clustered together; however, the genetic distance between Uygur and Kazak was not the smallest: the genetic distances between Uygur and Han from Hong Kong and between Uygur and Bai had negative estimates (Table 3). The Nu, Tibetan, and Lisu, with multimodal mismatch distributions, also showed long branches separating them from the other populations.

Figure 3b displays distances among the 14 populations represented as the first two principal components. The first two PCs incorporate about 32% of the total squared distances among populations, while the remaining dimensions mostly portray unique characteristics of single populations. The cluster pattern of populations in the PC map, with high-diversity southern-origin populations on the left and low-diversity northern-origin populations on the right, was in good agreement with the results of the NJ tree. The genetic distances among the northerners are greater, probably reflecting their history of episodes of small population size and intense drift, as also demonstrated by the mismatch analyses.

Fig. 3. a: Unrooted NJ tree of the 14 populations based on the net genetic distances (dA) given in Table 3. b: Principal component (PC) map of the 14 populations based on the distance matrix. The first principal component accounts for 17% of the original variation, and the second for 15%. The PC map axes bear no scales because the relationship shown among populations is purely topological.

DISCUSSION

The vicissitudes of time, war, seizure of lands, and plague throughout history have produced numerous large changes in the populations of China (Ge et al., 1997). Movement and admixture have thus obscured much of the population history of Chinese groups. According to historical documents, Bai, Sali, Nu,
Tibetan, and Lisu have their origins in the ancient Di-Qiang group in northwest China, and some of them (Lisu, Nu, and Sali) separated from each other about 1,000–1,400 years ago (Du and Yip, 1993; Ma, 1994; You, 1994). Thus, it is not surprising that Sali, Nu, and Lisu share similar linguistic backgrounds, geographic distributions, ethnic names, customs, and habits. The Bai have a different demographic history. Around the 3rd and 4th centuries and again in the 14th century, thousands of Han moved to the region where the Bai lived and were gradually integrated into the Bai population. The origin of the Tu is still under debate, although the prevalent opinion is that their ancestors migrated from eastern Liaoning Province to southeastern Qinghai and southern Gansu Provinces around the early 4th century. Subsequently there was some gene exchange with Mongolian, Tibetan, and Han people. Another view is that the Tu are descendants of Mongolian soldiers who intermarried with indigenous nomadic herdswomen during the Yuan Dynasty (1271–1368 AD). Both of these historical accounts suggest that the Tu had extensive genetic admixture in their history (Du and Yip, 1993; Ma, 1994; You, 1994; Ge et al., 1997). Our results are essentially in agreement with these historical documents. First, a large number of individuals from populations of the same ethnogenesis shared sequences with each other. The populations with documented gene flow with other populations, like Bai and Tu, showed signatures of admixture on the basis of sequence sharing and matching probability, although this conclusion lacks strong statistical support. Second, the genetic distances among the populations of Di-Qiang origin, such as Lisu, Sali, and Nu, were relatively small. The estimated distance was even negative between the two Pai-Yuei populations, Zhuang and Dai. However, larger distances were observed between populations of Di-Qiang origin and populations of Pai-Yuei origin (Table 3). Third, a statistically significant difference was found between the Di-Qiang and Pai-Yuei groups in the AMOVA test. Finally, in the population tree and the PC map (Fig. 3), these populations presented a cluster pattern consistent with their ethnohistory or with historical documents about genetic admixture. Thus, mtDNA variation in these populations does reflect ethnohistory to some extent. Genetic traces of ancient shared ethnicity are preserved among populations of the same ethnogenesis, even in the face of numerous subsequent movements and substantial gene flow.

Among the 14 populations, Nu, Tibetan, Lisu, and Mongolian showed no conspicuous population expansion in the past. These four populations also exhibited reduced genetic diversity compared with the others in our sample. Both observations may suggest possible episodes of severe reduction in the size of these populations (Excoffier and Schneider, 1999). The relatively large genetic distances between Nu and other populations, and between Tibetan and other populations, could also result from the impact of genetic drift in these samples (Relethford, 1996; Excoffier and Schneider, 1999). Given the large census sizes of the current Tibetans, Mongolians, and Lisus (Table 1), and the fact that sampling of these populations might be restricted to small subsections of the ethnic populations (i.e., villages or regional populations), as was actually the case for the 32 Tibetan samples from Yunnan Province, the ragged mismatch distributions observed here might be restricted to the samples analyzed rather than apply to the entire ethnic groups labeled as Lisu, Tibetan, or Mongolian. Moreover, mismatch distributions can be easily affected by the sample sizes considered. As shown in Figure 2, small sample sizes could mimic bumpy mismatch distributions, while pooled samples from populations showing ragged mismatch distributions could present roughly unimodal mismatch distributions. Thus, the multimodal mismatch distribution in the 15 Mongolian samples could also be attributed to small sample size. Mismatch distribution analyses and neutrality tests should be interpreted with caution when dealing with samples from restricted regions or with small sample sizes. Nevertheless, the different mtDNA pools of these regional samples from various ethnic populations could be reflected by the different mismatch distributions observed.

The expansion times of the populations that show evidence of past expansion were estimated to be 58,000–65,000 YBP. This result reflects the ancient imprint of the initial Paleolithic expansion and agrees with the previous estimate, based on Y-chromosome biallelic markers, that the settlement of modern humans in East Asia occurred about 60,000 YBP (Su et al., 1999), earlier than the later recorded population migrations (Du and Yip, 1993; Ma, 1994; You, 1994; Ge et al., 1997). The estimated expansion times of the Zhuang and Dai populations were earlier than 63,000 YBP and earlier than those of the other populations considered. This is consistent with the southern origin hypothesis (cf. Jin and Su, 2000) and with the earliest archaeological evidence of modern humans in China, i.e., the ancient Liujiang people who lived in Guangxi Province, dated to 67 Kya (U-series dating; Wu et al., 1989).

Our previous work on the frequency pattern of the 9-bp deletion in the mtDNA COII/tRNALys intergenic region, in which the 9-bp deletion was present at higher frequencies in populations of southern Pai-Yuei origin than in populations of northern Di-Qiang origin (Yao et al., 2000b, 2001), agrees well with the genetic pattern observed in the present study. As discussed above, populations of northern Di-Qiang origin seem to have undergone
regional populations in the past. The previous signal recent genetic drift during their spread to southwest
of Pleistocene expansion might have been erased in China. In addition, ongoing migration from north to
these populations due to episodes of small sample south in the past 2,000 years (Ge et al., 1997) and
GENETIC RELATIONSHIP OF CHINESE POPULATIONS 71
before might account for some of the higher genetic Barbujani G, Sokal RR. 1990. Zones of sharp genetic change in
diversity observed in the southern populations. Europe are also linguistic boundaries. Proc Natl Acad Sci USA
87:1816 –1819.
The languages of our sample fall into two families, Betty DJ, Chin-Atkins AN, Croft L, Sraml M, Easteal S. 1996.
Sino-Tibetan and Altaic. We might expect that an- Multiple independent origins of the COII/tRNALys intergenic
cient population divergences would have led to well- 9-bp mtDNA deletion in Aboriginal Australians. Am J Hum
defined genetic clusters that were consistent with Genet 58:428 – 433.
linguistic boundaries. This has been found in major Cavalli-Sforza LL, Minch E, Mountain J. 1992. Coevolution of
genes and languages revisited. Proc Natl Acad Sci USA 89:
ethnic groups (Barbujani and Sokal, 1990; Cavalli- 5620 –5624.
Sforza et al., 1992). In our present study, however, Chen R, Ye G, Geng Z, Wang Z, Kong F, Tian D, Bao P, Liu R, Liu
the correlation between female genetic lineages and J, Song F, Fan L, Zhang G, Guo S, Xu L, Xu X, Cheng D, Zhao
linguistic differentiation of the populations was not X. 1993. Revelations of the origin of Chinese nation from clus-
evident, as shown by AMOVA analyses and the clus- tering analysis and frequency distribution of HLA polymor-
phism in major minority nationalities in mainland China. Acta
tering pattern in the phylogenetic tree as well as the Genet Sin 205:389 –398 [in Chinese].
PC map. One reason for the discordance may be due Chu JY, Huang W, Kuang SQ, Wang JM, Xu JJ, Chu ZT, Yang
to the likely asymmetry of maternal and paternal ZQ, Lin KQ, Li P, Wu M, Geng ZC, Tan CC, Du RF, Jin L. 1998.
gene flow among the populations, as reported by Genetic relationship of populations in China. Proc Natl Acad
Seielstad et al. (1998). Exogamous marriages hap- Sci USA 95:11763–11768.
Ding Y-C, Wooding S, Harpending H, Chi H-C, Li H-P, Fu Y-X,
pen occasionally in some populations nowadays. The Pang J-F, Yao Y-G, Xiang YJG, Moyzis R, Zhang Y-P. 2000.
tendency for women to move to their husbands’ lo- Population structure and history in East Asia. Proc Natl Acad
cality (patrilocality) could have obscured the corre- Sci USA 97:14003–14006.
lation between language and female genetic lin- Du R, Yip VF. 1993. Ethnic groups in China. Beijing: Science
eages. This process has been observed in Indian Press.
Du R, Xiao CJ, Cavalli-Sforza LL. 1998. Genetic distances be-
castes, in which there is limited upward female gene tween Chinese groups calculated on gene frequencies of 38 loci.
flow among the social ranks of stratified Hindu Sci China [C] 28:83– 89.
castes (Bamshad et al., 1998). Moreover, when we Excoffier L, Schneider S. 1999. Why hunter-gather populations do
considered large-scale migrations in recorded his- not show sign of Pleistocene demographic expansions. Proc
tory (Ge et al., 1997), which can be responsible for Natl Acad Sci USA 96:10597–10602.
Excoffier L, Smouse PE, Quattro JM. 1992. Analysis of molecular
more than “occasional” female exogamy, the absence variance inferred from metric distances among DNA haplo-
of concordance between language classification of types: application to human mitochondrial DNA restriction
populations and mtDNA variation could also be ex- data. Genetics 131:479 – 491.
plained, for the discordance may arise in a relatively Fu Y-X. 1997. Statistical tests of neutrality of mutations against
short amount of time under such destabilizing forces population growth, hitchhiking and background selection. Ge-
netics 147:915–925.
as warfare and plague. Our finding here is interest- Ge JX, Wu SD, Chao SJ. 1997. Zhongguo yimin Shi [the mi-
ing because linguistic and cultural differences gration history of China]. Fuzhou: Fujian People’s Press [in
among Chinese ethnic populations seem to have per- Chinese].
sisted for a lengthy time, despite possibly significant Horai S, Murayama K, Hayasaka K, Matsubayashi S, Hattori Y,
female gene flow among them. Fucharoen G, Harihara S, Park KS, Omoto K, Pan IH. 1996.
mtDNA polymorphism in East Asian populations, with special
ELECTRONIC-DATABASE INFORMATION reference to the peopling of Japan. Am J Hum Genet 59:579 –
590.
Accession numbers and URLs for the sequences in Jin L, Su B. 2000. Natives or immigrants: modern human origin
this article are as follows: GenBank, http://www. in East Asia. Nat Rev Genet 1:126 –132.
ncbi.nlm.nih.gov/web/Genbank (accession numbers: Ma Y. 1994. China’s minority nationalities. Beijing: Foreign Lan-
guages Press.
AF 392063–AF 392434). Mountain JL, Hebert JM, Bhattacharyya S, Underhill PA, Otto-
lenghi C, Gadgil M, Cavalli-Sforza LL. 1995. Demographic his-
ACKNOWLEDGMENTS tory of India and mtDNA-sequence diversity. Am J Hum Genet
We thank Hai-Peng Li, Dr. Francesc Calafell, Pro- 56:979 –992.
Nei M. 1987. Molecular evolutionary genetics. New York: Colum-
fessor Hans-Jürgen Bandelt, Dr. Yuan-Chun Ding, bia University Press.
and the three anonymous reviewers for their critical Relethford JH. 1996. Genetic drift can obscure population his-
comments on the manuscript and interesting discus- tory: problem and solution. Hum Biol 681:29 – 44.
sions on the subject. We also thank Professor Ai- Rogers AR. 1995. Genetic evidence for a Pleistocene population
Hua Liu, Pai-Li Geng, and Dr. Wen Wang for their expansion. Evolution 494:608 – 615.
Rogers AR, Harpending H. 1992. Population growth makes waves
invaluable help in collecting the samples. in the distribution of pairwise genetic differences. Mol Biol Evol
LITERATURE CITED 9:552–569.
Saitou N, Nei M. 1987. The neighbour-joining method: a new
Anderson S, Bankier AT, Barrell BG, de Bruijn MH, Coulson AR, method for reconstructing phylogenetic trees. Mol Biol Evol
Drouin J, Eperon IC, Nierlich DP, Roe BA, Sanger F, Schreier 4:406 – 425.
PH, Smith AJ, Staden R, Young IG. 1981. Sequence and orga- Schneider S, Roessli D, Excoffier L. 2000. ARLEQUIN, version
nization of the human mitochondrial genome. Nature 290:457– 2.0: a software for population genetic data analysis. Geneva:
465. University of Geneva.
Bamshad MJ, Watkins WS, Dixon ME, Jorde LB, Rao BB, Naidu Seielstad MT, Minch E, Cavalli-Sforza LL. 1998. Genetic evi-
JM, Prasad BV, Rasanayagam A, Hammer MF. 1998. Female dence for a higher female migration rate in humans. Nat Genet
gene flow stratifies Hindu castes. Nature 395:651– 652. 20:219 –220.
Su B, Xiao C, Deka R, Seielstad MT, Kangwanpong D, Xiao J, Lu D, Underhill P, Cavalli-Sforza L, Chakraborty R, Jin L. 1999. Y-chromosome evidence for a northward migration of modern humans into eastern Asia during the last Ice Age. Am J Hum Genet 65(6):1718–1724.
Tajima F. 1989. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123:585–595.
Vigilant L, Stoneking M, Harpending H, Hawkes K, Wilson AC. 1991. African populations and the evolution of human mitochondrial DNA. Science 253:1503–1507.
Ward RH, Frazier BL, Dew-Jager K, Pääbo S. 1991. Extensive mitochondrial diversity within a single Amerindian tribe. Proc Natl Acad Sci USA 88:8720–8724.
Wu R, Wu X, Zhang S. 1989. Early humankind in China. Beijing: Science Press [in Chinese].
Yao Y-G, Lü X-M, Luo H-R, Li W-H, Zhang Y-P. 2000a. Gene admixture in the silk road of China: evidence from mtDNA and melanocortin 1 receptor polymorphism. Genes Genet Syst 75:173–178.
Yao Y-G, Watkins WS, Zhang Y-P. 2000b. Evolutionary history of the mtDNA 9-bp deletion in Chinese populations and its relevance to the Peopling of East and Southeast Asia. Hum Genet 107:504–512.
Yao Y-G, Yuan Z-G, Zhou Z-D, Geng P-L, Li Q-W, Zhang Y-P. 2001. Frequency of the mtDNA 9-bp deletion among Chinese ethnic groups. Prog Nat Sci 11:358–364.
You Z. 1994. History of Yunnan nationalities. Kunming: Yunnan University Press [in Chinese].
Zhao TM, Zhang GL, Zhu YM, Zheng SQ, Gu WJ, Chen Q, Zhang X, Liu DY. 1991. Study on immunoglobulin allotypes in the Chinese: a hypothesis of the origin of the Chinese nation. Acta Genet Sin 18(2):97–108 [in Chinese].
APPENDIX. Variable sites of mtDNA hypervariable segment I (HVSI) sequences in 340 individuals from 9 Chinese ethnic populations and 32 individuals from a northern Thai population, relative to nucleotide positions 16001–16497 of the reference sequence (Anderson et al., 1981). Only the last three digits are given; e.g., site 051 means 16051. Two additional insertions are not included: a C in the C homopolymer at 16258–16263 in one individual from the Zhuang population (Zh68), and a G in the G homopolymer at 16470–16474 in one individual from the Sali population (sali21). The numbers of individuals sharing a haplotype are listed at right. Zhuang, Tibetan, and Mongolian are abbreviated as Zhua, Tibe, and Mong, respectively.
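The discussion above relies on mismatch distributions and on expansion times derived from the mismatch parameter τ. Purely as an illustration, the Python sketch below assumes haplotypes coded the way this Appendix describes them: variable sites relative to the reference sequence, given as last three digits, with a count of individuals per haplotype. It tallies the mismatch distribution over all pairs of individuals. It then estimates τ and θ0 with Rogers's (1995) method of moments, which may differ from the fitting procedure actually used in this study, and solves τ = 2ut (Rogers and Harpending, 1992) for t. The function names, the toy haplotypes, and the mutation-rate and sequence-length parameters are hypothetical. The difference count assumes that no site carries two different non-reference bases, and it ignores insertions, as the Appendix does.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def parse_sites(labels):
    """Turn Appendix-style labels ("051") into full positions (16051).

    Assumption: every label lies in the 16001-16497 range covered here.
    """
    return frozenset(16000 + int(label) for label in labels)


def mismatch_distribution(haplotypes):
    """Tally pairwise differences over all pairs of individuals.

    haplotypes: list of (variable_sites, n_individuals) tuples.
    Differences between two haplotypes = size of the symmetric difference
    of their variable-site sets (assumes no triallelic sites, no indels).
    """
    dist = Counter()
    for _, n in haplotypes:
        # n identical sequences form n*(n-1)/2 zero-difference pairs
        dist[0] += n * (n - 1) // 2
    for (sites_a, n_a), (sites_b, n_b) in combinations(haplotypes, 2):
        # every individual of one haplotype pairs with every one of the other
        dist[len(sites_a ^ sites_b)] += n_a * n_b
    return dist


def moment_estimates(dist):
    """Method-of-moments (tau, theta0) under a sudden-expansion model.

    With mean m and variance v of the mismatch distribution:
    theta0 = sqrt(v - m) and tau = m - theta0 (Rogers, 1995).
    """
    n_pairs = sum(dist.values())
    m = sum(k * c for k, c in dist.items()) / n_pairs
    v = sum((k - m) ** 2 * c for k, c in dist.items()) / n_pairs
    theta0 = sqrt(max(v - m, 0.0))  # v < m has no real root; clamp to 0
    return m - theta0, theta0


def expansion_time(tau, mu_per_site, n_sites):
    """Solve tau = 2ut for t, taking u = mu_per_site * n_sites.

    t is in the time unit of mu_per_site (years if the rate is per year).
    """
    return tau / (2.0 * mu_per_site * n_sites)


if __name__ == "__main__":
    # Invented toy haplotypes, not data from the Appendix
    toy = [
        (parse_sites(["223", "362"]), 3),
        (parse_sites(["223"]), 2),
        (parse_sites(["189", "217"]), 1),
    ]
    dist = mismatch_distribution(toy)
    print(sorted(dist.items()))  # [(0, 4), (1, 6), (3, 2), (4, 3)]
    tau, theta0 = moment_estimates(dist)
    print(round(tau, 3), round(theta0, 3))  # 0.8 0.8
```

Converting a τ estimate to years with `expansion_time` requires a per-site mutation rate and the number of sites analysed. Both must be supplied by the user, and no values from this study are assumed here.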
