doi: 10.1038/nature05616

SUPPLEMENTARY INFORMATION

I. Research subjects

The whole-genome scan was performed with a two-stage design using the following French
study population samples (Table S1).

Stage 1: Inclusion criteria for cases were (i) T2DM according to 1997 American Diabetes
Association (ADA) criteria¹; (ii) family history of diabetes in first-degree relatives; (iii) BMI
< 30 kg/m². Diabetic subjects were recruited at the UMR8090 CNRS unit in Lille (N=271)
and at the Endocrinology-Diabetology Department of the Corbeil-Essonnes Hospital during
medical examination (N=423). Families with at least two T2DM sibs have been recruited
since 1990 by the UMR8090 unit, and details of the recruitment have been described
elsewhere². All individuals with onset < 25 years and a family history compatible with
dominant inheritance were screened for all known MODY genes, and many of the
remaining individuals with onset < 40 years were screened for various monogenic forms.

Inclusion criteria for controls were (i) age at exam ≥ 45 yr; (ii) normal fasting glucose
according to 1997 ADA criteria; (iii) BMI < 27 kg/m². The control subjects were participants
in the Data from an Epidemiological Study on the Insulin Resistance syndrome (DESIR)
program. Subjects were men and women aged 30 to 64 years who participated in a 9-year
follow-up study that aims to clarify the development of the insulin resistance syndrome.
Participants were recruited from volunteers insured by the French social security system,
which offers periodic health examinations free of charge. They came from 10 health
examination centers in the western central part of France.

A total of 5,153 subjects had data available at baseline (men/women ratio: 49.6/50.4%; age:
47.2 ± 10.0 yr; body mass index (BMI): 24.7 ± 3.8 kg/m²). Among them, 4,593 subjects were
normoglycemic (fasting plasma glucose < 6.1 mmol/l), and 3,528 (77%) were followed for
incident impaired fasting glucose (IFG) and diabetes during a 9-year period. Further, 3,807
men and women were non-obese (BMI < 27 kg/m²) at baseline, and 2,952 (78%) could be
followed for overweight and class I, II and III obesity during the 9-year period. At the four
three-yearly examinations during the 9 years of follow-up, 2,005 subjects were both non-obese
and had normal fasting glucose³.

Stage 2: Inclusion criteria for cases were (i) T2DM according to 1997 ADA criteria; (ii) BMI
< 35 kg/m². Diabetic subjects were recruited at the UMR8090 CNRS unit in Lille (N=493), at
the Endocrinology-Diabetology Department of the Corbeil-Essonnes Hospital (N=1,339) and
at the Endocrinology-Diabetology Department of the Poitiers University Hospital (N=790).
Inclusion criteria for controls were (i) age at exam ≥ 40 yr; (ii) normal fasting glucose
according to 1997 ADA criteria; (iii) BMI < 35 kg/m². They were ascertained as part of the
D.E.S.I.R. (N=1,676) and Fleurbaix-Laventie Ville Santé⁴ (N=226) cohorts, or at
the UMR8090 CNRS unit in Lille (N=998). The Fleurbaix-Laventie Ville Santé Study is a
longitudinal epidemiological study on the determinants of weight gain. It included 294
families in 1999 (431 children and adolescents, 206 fathers and 244 mothers). All subjects
were of European ancestry.

Informed consent was obtained from all study participants, and the French ethics committee
approved the study protocol.

www.nature.com/nature

II. Genotyping Methods

In this report, we describe the results of a Stage 1 whole-genome scan for T2DM as well as
confirmation studies for the most significantly associated loci (Figure S1).

The Stage 1 whole-genome scan was performed using the Illumina Infinium Human1 and
Hap300 BeadArrays. Genomic DNA was extracted from peripheral blood cells using
PUREGENE D50K DNA isolation kits (Gentra Systems, Minneapolis, MN) or DNeasy Blood &
Tissue Kits (Qiagen, Hilden, Germany). Approximately 750 ng of DNA was used to genotype
each patient sample according to the manufacturer's protocol (Illumina, San Diego, CA).
Briefly, the DNA samples were whole-genome amplified, fragmented and hybridized
overnight to allele-specific (Human1) or locus-specific (Hap300) probes on the BeadArray.
Non-specifically hybridized fragments were removed by washing, and the remaining
specifically hybridized DNA was fluorescently labelled by a single base extension reaction
and detected using a BeadArray scanner. The scanned images were processed using
BeadStudio 2.0 and the sample genotypes were called using the manufacturer's default cluster
settings (Figure S2). Assay accuracy and reproducibility were measured using DNA from
CEPH Utah samples genotyped as part of the HapMap project⁵ (~4% of genotyped samples
were technical replicates). Association testing was performed for SNPs filtered according to
call rate (>95% required), deviation from Hardy-Weinberg equilibrium (p>0.001 in controls
required), and minor allele frequency (MAF>0.01 required).
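The three SNP filters above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, the Hardy-Weinberg test is assumed to be a 1-df Pearson chi-square test (the text does not say which test was used), and the MAF is assumed to be computed over cases and controls combined.

```python
import math

def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """P-value of a 1-df chi-square test of Hardy-Weinberg proportions."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)          # frequency of the first allele
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0                            # monomorphic SNP: nothing to test
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square, 1 df

def passes_qc(case_counts, control_counts, n_attempted,
              min_call_rate=0.95, min_hwe_p=0.001, min_maf=0.01):
    """Apply the call-rate, HWE (controls only) and MAF filters to one SNP.

    case_counts and control_counts are genotype counts (aa, aA, AA);
    n_attempted is the number of samples on which genotyping was attempted.
    """
    called = sum(case_counts) + sum(control_counts)
    if called == 0:
        return False
    call_rate = called / n_attempted
    total = [c + s for c, s in zip(case_counts, control_counts)]
    freq = (total[1] + 2 * total[2]) / (2 * called)
    maf = min(freq, 1.0 - freq)               # assumption: cases and controls pooled
    return (call_rate > min_call_rate
            and hwe_chi2_pvalue(*control_counts) > min_hwe_p
            and maf > min_maf)
```

For example, `passes_qc((197, 348, 149), (335, 254, 65), 1348)` applies the filters to the rs7903146 genotype counts listed in Table S5.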

Genotypes for the rapid confirmation study were obtained using the Sequenom iPLEX assay
(Sequenom, Cambridge, MA). Locus-specific PCR primers and allele-specific detection
primers were designed using the MassARRAY Assay Design 3.0 software (Sequenom). The
sample DNAs were amplified in a 25- to 31-plex PCR reaction and labelled using a locus-specific
single base extension reaction. The resulting products were desalted and transferred to a
384-element SpectroCHIP array. Allele detection was performed using Matrix-Assisted Laser
Desorption/Ionization Time-of-Flight Mass Spectrometry (MALDI-TOF MS). The mass
spectrograms were analyzed by the MassARRAY TYPER software (Sequenom).

III. Identification and Correction of Population Stratification

To correct for possible population stratification at the intercontinental level, case and control
genotypes were analyzed jointly using STRUCTURE⁶. To increase our power to
discriminate individuals from different geographical origins, we performed our stratification
analysis using 328 SNPs spaced at least 5 Mb apart and highly differentiated among individuals
from different continents (F_ST > 0.2 based on the Perlegen dataset⁷). We also included
genotypes obtained from unrelated individuals representing four populations studied by the
HapMap project (60 CEPH from Utah (CEU), 60 Yoruba (YRI), 45 Han Chinese (CHB) and
44 Japanese (JPT)). These additional samples allowed a better estimation of allele frequencies
in the non-European populations and were included to better separate individuals according to
their continental origin. Three independent runs of 30,000 burn-in steps followed by 100,000
Markov Chain Monte Carlo steps were performed using a model based on three populations
and parameters that allow for admixture and that force the allele frequencies in the different
populations to be correlated prior to the computation. The three runs converged to a single
solution with very similar coefficients of ancestry for each individual (the result of one run is
illustrated in Figure S3). When the members in the data set were split into three populations,
the assignment of the HapMap individuals corresponded to their geographic origin, with each
individual belonging to one population with a coefficient of ancestry larger than 0.90. While
most cases and controls fell within the range of the CEPH individuals from the HapMap, we
identified 43 individuals which lay outside the CEPH cluster. These individuals were
excluded from association testing.

To identify spurious associations resulting from more subtle stratification of the case and
control populations, we performed principal component analysis (PCA) using 20,323 autosomal
SNPs⁸. We selected SNPs that were separated by at least 100 kb (to control for
possible variation due to linkage disequilibrium between loci); that were perfectly genotyped
(100% call rate over all samples); that had MAF ≥ 0.1 in cases and controls combined; and that
were not associated with T2DM (p>0.01). For associations that were detected using the additive
model, PCA was performed by coding the individual genotypes at each locus according to the
number of common alleles (0, 1, 2). We extended the PCA to the dominant and
recessive models by using different scoring methods ((0, 1, 1) and (0, 0, 1), respectively). To
ensure that each SNP contributed equally to the PCA results, the recoded genotypes
were standardized by subtracting the mean allele frequency and dividing by the standard
deviation at that locus. PCA was performed using the correlation matrix of all samples. Since
the PCA suggested that the first principal component alone accounts for most of the
ancestry differences between cases and controls (including up to 4 additional principal
components did not affect the results), we used it as a covariate in a logistic regression of
case-control status against genotype (using the glm function in R).
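The genotype recoding, per-SNP standardization and extraction of the first principal component can be sketched as follows. This is an illustrative sketch only: the authors worked in R, whereas this uses plain Python with a power iteration in place of a full eigendecomposition, and the function names are hypothetical. The logistic regression step (glm in R) is not reproduced here.

```python
import math
import random

# Scores for genotypes carrying 0, 1 or 2 copies of the common allele.
CODINGS = {"additive": (0, 1, 2), "dominant": (0, 1, 1), "recessive": (0, 0, 1)}

def standardize_snp(genotypes, model="additive"):
    """Recode allele counts for a model, then centre and scale to unit variance."""
    code = CODINGS[model]
    x = [code[g] for g in genotypes]
    mean = sum(x) / len(x)
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / len(x))
    if sd == 0:
        return [0.0] * len(x)                # uninformative SNP
    return [(v - mean) / sd for v in x]

def first_principal_component(snp_columns, n_iter=200, seed=0):
    """Leading eigenvector of the sample-by-sample matrix Z Z^T (power iteration).

    snp_columns is a list of standardized SNP vectors, one value per sample.
    The returned unit vector holds one PC1 loading per sample.
    """
    n_samples = len(snp_columns[0])
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in range(n_samples)]
    for _ in range(n_iter):
        w = [0.0] * n_samples
        for col in snp_columns:              # w = Z (Z^T v), without forming Z Z^T
            dot = sum(c * vi for c, vi in zip(col, v))
            for k in range(n_samples):
                w[k] += col[k] * dot
        norm = math.sqrt(sum(a * a for a in w))
        v = [a / norm for a in w]
    return v
```

With thousands of samples and SNPs a linear-algebra library would be the practical choice; the loop form is kept here only to make each step explicit.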

IV. Association Testing

The genotype data was analyzed for association with disease for the first stage data and the
validation data separately. The Stage 1 data consists of two sets, from the Human1 (100k) and
the Hap300 (300k) arrays.

A 2x3 genotype count table was formed for each SNP (E1). The allele for which the
frequency was higher in the cases than in the controls was denoted A, and the other allele a.

            aa    aA    AA    Sum
Cases       r0    r1    r2    R
Controls    s0    s1    s2    S        (E1)
Count       n0    n1    n2    N

For each SNP i, the association between marker and disease was measured using Armitage's
trend test. For the autosomal SNPs, test statistics⁹ for additive, dominant and recessive models
were calculated according to equations E2-E4:

$$X^2_{A,i} = \frac{N\,[N(r_1 + 2r_2) - R(n_1 + 2n_2)]^2}{R(N - R)\,[N(n_1 + 4n_2) - (n_1 + 2n_2)^2]} \qquad \text{(E2)}$$

$$X^2_{D,i} = \frac{N\,[N(r_1 + r_2) - R(n_1 + n_2)]^2}{R(N - R)\,[N(n_1 + n_2) - (n_1 + n_2)^2]} \qquad \text{(E3)}$$

$$X^2_{R,i} = \frac{N\,(N r_2 - R n_2)^2}{R(N - R)\,(N n_2 - n_2^2)} \qquad \text{(E4)}$$
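Equations E2-E4 translate directly into code. The sketch below is an illustration with a hypothetical function name; it takes the case and control genotype counts of table E1 and returns the three unadjusted statistics, returning 0 when a denominator vanishes (for example, when a genotype class is empty).

```python
def trend_statistics(cases, controls):
    """Return (X2_additive, X2_dominant, X2_recessive) following E2-E4.

    cases = (r0, r1, r2) and controls = (s0, s1, s2), indexed by the number
    of copies of allele A, as in table E1.
    """
    r0, r1, r2 = cases
    s0, s1, s2 = controls
    n1, n2 = r1 + s1, r2 + s2
    R = r0 + r1 + r2
    N = R + s0 + s1 + s2

    def ratio(num, den):
        return num / den if den > 0 else 0.0

    x2_a = ratio(N * (N * (r1 + 2 * r2) - R * (n1 + 2 * n2)) ** 2,
                 R * (N - R) * (N * (n1 + 4 * n2) - (n1 + 2 * n2) ** 2))
    x2_d = ratio(N * (N * (r1 + r2) - R * (n1 + n2)) ** 2,
                 R * (N - R) * (N * (n1 + n2) - (n1 + n2) ** 2))
    x2_r = ratio(N * (N * r2 - R * n2) ** 2,
                 R * (N - R) * (N * n2 - n2 ** 2))
    return x2_a, x2_d, x2_r
```

Applied to the rs7903146 counts in Table S5, `trend_statistics((197, 348, 149), (335, 254, 65))` gives an additive statistic of roughly 80 before any correction for variance inflation.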

For each genetic model, the corresponding test statistic should theoretically follow a
χ² distribution, but because of (for example) population stratification and genotyping errors,
this is not the case. We have corrected for this using genomic control¹⁰. The test statistic is
modelled as a χ² distribution scaled by a variance inflation factor λ. We estimate this factor
for each genetic model by computing the ratio of the average measured test statistic to the
average expected test statistic, both taken over the lowest 90% of the distribution¹¹:

$$\lambda = \frac{\sum_{i=1}^{M} X^2_{\cdot,i}}{\sum_{i=1}^{M} F^{-1}\!\left(\frac{i}{N+1}\right)} \qquad \text{(E5)}$$

where the statistics X² (for a given model) have been ranked in increasing order over the N tested
SNPs, F is the χ² distribution function for one degree of freedom, and M is the number of SNPs
considered for the estimate (90% of N). Following the calculation of the variance inflation factor,
we re-estimate the test statistics for each SNP for the three genetic models by dividing by the
scaling factor found.
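The estimate in E5 can be sketched with the Python standard library alone. The function names are hypothetical; the only non-obvious step is that the quantile of a 1-df χ² distribution is obtained as the square of a standard normal quantile.

```python
from statistics import NormalDist

def chi2_1df_quantile(p):
    """Inverse CDF of the chi-square distribution with one degree of freedom."""
    z = NormalDist().inv_cdf((1.0 + p) / 2.0)
    return z * z

def inflation_factor(test_stats, fraction=0.9):
    """Variance inflation factor over the lowest `fraction` of ranked statistics (E5)."""
    ranked = sorted(test_stats)
    n = len(ranked)
    m = int(fraction * n)                     # M in E5
    observed = sum(ranked[:m])
    expected = sum(chi2_1df_quantile(i / (n + 1)) for i in range(1, m + 1))
    return observed / expected

def adjust(test_stats, lam):
    """Divide every statistic by the estimated inflation factor."""
    return [x / lam for x in test_stats]
```

Restricting the sums to the lowest 90% keeps the estimate from being driven by the strongest, possibly genuine, associations.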

The effect of this adjustment can be seen in Figures S4 and S5, where observed p-values
before and after adjustment for the variance inflation factor are plotted as a function of
expected ones, assuming an underlying uniform distribution of p-values.

A max statistic was formed across these to select the strongest obtainable association for any
of the three models:

$$X^2_{\max,i} = \max\{X^2_{A,i},\; X^2_{D,i},\; X^2_{R,i}\} \qquad \text{(E6)}$$

P-values were calculated for the observed test statistic against the null distribution for the
genetic model giving the strongest association. In addition, since the distribution of the max
statistic itself under the null hypothesis is not known, we established p-values for it by
permutation testing. To obtain these, N_perm permutations of the disease state vector were performed
for each SNP and each genetic model; the three test statistics were calculated and adjusted
with the λ factor; and a max statistic was formed. The empirical p-value was estimated as the
frequency of observing a max statistic from the genome-wide (all SNPs, all permutations)
table derived from permuted data exceeding the corresponding test statistic from the observed
data. A genome-wide p-value was estimated for each permutation round by finding the
maximal test statistic across all SNPs. The genome-wide p-value was reported as the
frequency with which the genome-wide maxima for the permuted data exceeded the
maximum test statistic in the experimental data.
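A simplified, single-SNP version of this permutation scheme is sketched below. It is an illustration only: the study pooled permuted statistics across all SNPs and also derived genome-wide p-values, whereas this sketch permutes case-control labels at one SNP and compares the permuted max statistics with the observed one. Function names, the default number of permutations and the seed are assumptions.

```python
import random

def trend_statistics(cases, controls):
    """Additive, dominant and recessive trend statistics (E2-E4)."""
    r1, r2 = cases[1], cases[2]
    n1, n2 = r1 + controls[1], r2 + controls[2]
    R = sum(cases)
    N = R + sum(controls)
    def ratio(num, den):
        return num / den if den > 0 else 0.0
    return (
        ratio(N * (N * (r1 + 2 * r2) - R * (n1 + 2 * n2)) ** 2,
              R * (N - R) * (N * (n1 + 4 * n2) - (n1 + 2 * n2) ** 2)),
        ratio(N * (N * (r1 + r2) - R * (n1 + n2)) ** 2,
              R * (N - R) * (N * (n1 + n2) - (n1 + n2) ** 2)),
        ratio(N * (N * r2 - R * n2) ** 2,
              R * (N - R) * (N * n2 - n2 ** 2)),
    )

def max_statistic(cases, controls, lambdas=(1.0, 1.0, 1.0)):
    """Max over the three models of the inflation-adjusted statistics (E6)."""
    return max(x / lam for x, lam in zip(trend_statistics(cases, controls), lambdas))

def permutation_pvalue(cases, controls, n_perm=1000, lambdas=(1.0, 1.0, 1.0), seed=1):
    """Fraction of label permutations whose max statistic reaches the observed one."""
    rng = random.Random(seed)
    totals = [c + s for c, s in zip(cases, controls)]
    genotypes = [g for g in range(3) for _ in range(totals[g])]
    n_cases = sum(cases)
    observed = max_statistic(cases, controls, lambdas)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(genotypes)                # random reassignment of disease status
        perm_cases = [0, 0, 0]
        for g in genotypes[:n_cases]:
            perm_cases[g] += 1
        perm_controls = [totals[g] - perm_cases[g] for g in range(3)]
        if max_statistic(perm_cases, perm_controls, lambdas) >= observed:
            exceed += 1
    return exceed / n_perm
```

Because swapping the allele labels exchanges the dominant and recessive statistics and leaves the additive one unchanged, the max statistic does not depend on which allele is called A in a permuted data set.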

Association testing for Stage 1 genotypes: SNPs that were genotyped in Stage 1 were
filtered using call rate (>95% required), deviation from Hardy-Weinberg equilibrium
(p>0.001 in controls required), and minor allele frequency (MAF>0.01 required). After
filtering, the data set included 100,764 SNPs from the Human1 data (Table S2) and 309,163
SNPs from the Hap300 data (Table S3). Because of some overlap, a total of 392,935
unique SNPs were tested. Results from the two arrays were analyzed separately because
different individuals failed in each, and because the variance inflation factors differed (see
below). Good concordance was observed for the 18,068 SNPs common to both chips. In total,
1,275 samples were assayed on both arrays, giving 18,068 × 1,275 = 23,036,700 genotype
comparisons. Of the 22,772,253 comparisons that passed quality control, 22,629,676 (99.37%)
were assigned identically.

Variance inflation factors were estimated as described above. For the Human1 array we
calculated factors of 1.1523 (additive model), 1.0925 (dominant) and 1.1085 (recessive). For
the Hap300 array, the corresponding factors were 1.1223, 1.0558 and 1.1136. P-P plots for the
adjusted and unadjusted max statistic show that the adjusted probabilities better match the
expected ones (Figures S4 and S5). Using these plots, we set the cutoff for inclusion in the
fast-tracked Stage 2 at p = 1 × 10⁻⁴ for the SNPs in the Human1 array data, and at p = 5 × 10⁻⁵
for the Hap300 data. This resulted in 28 and 43 associations from the Human1 and Hap300
arrays, respectively (Tables S4 and S5). Additionally, weaker associations were observed
for SNPs in previously reported T2DM susceptibility loci (Table S6).

Association testing in the second population: Sequenom iPLEX assays were used to obtain
genotypes for 59 of the 66 unique SNPs that passed the selected cutoff thresholds. Of the
eight SNPs located in TCF7L2, only the most strongly associated was tested. Two SNPs,
rs1256526 and rs7712842, could not be genotyped using the iPLEX assay. rs1256526 was
successfully genotyped using fluorescence polarization assays, while rs7712842 also failed on
this platform. Genotypes for one SNP, rs932206, were not in Hardy-Weinberg equilibrium for
samples in the control population (p_HWE = 6.5 × 10⁻⁵), and this SNP was excluded from
further analysis. Genotyping data were analyzed as for the Stage 1 data (Table S7). Bonferroni
correction over the remaining 57 SNPs tested gives a significance threshold of p = 8.8 × 10⁻⁴,
which was passed by 8 SNPs representing 5 unique loci.
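The SNP tally and the Bonferroni threshold can be checked with a few lines of arithmetic. The variable names are illustrative, and the family-wise error rate of 0.05 is an assumption inferred from the reported threshold rather than a value stated in the text.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold for a family-wise error rate alpha."""
    return alpha / n_tests

unique_snps = 66          # unique SNPs passing the Stage 1 cutoffs
untested_tcf7l2 = 7       # only 1 of the 8 TCF7L2 SNPs was carried forward
failed_genotyping = 1     # rs7712842 failed on both platforms
failed_hwe = 1            # rs932206 deviated from HWE in controls

n_tested = unique_snps - untested_tcf7l2 - failed_genotyping - failed_hwe
threshold = bonferroni_threshold(0.05, n_tested)   # assumption: alpha = 0.05
print(n_tested, f"{threshold:.1e}")                # prints: 57 8.8e-04
```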

In Stage 2, the additive model gave the highest χ² value in all five significantly associated loci
(Table 1). This is consistent with the findings in Stage 1, except for rs13266634 (SLC30A8),
for which the recessive model was better, and the 3 SNPs in EXT2, for which the dominant model
was clearly better. This is, perhaps, not unexpected, as the true effect of genotype on risk for
such a complex phenotype as T2DM is unlikely to fit any of these models. Discrepancies,
especially small ones, can occur because of random variation in genotype sampling. If
real, such discrepancies likely reflect true differences in the thresholds and maxima that
modify the independent contribution of each allele between the Stage 1 and Stage 2 cohorts as a
result of differences in genetic makeup and environmental exposures.

For a general two-stage case-control study, the consensus is now that a joint analysis of
first- and second-stage data is more efficient than replication-based analysis¹². However, when a
small fraction of SNPs is carried forward to a second stage and the first stage is performed on
< 30% of the entire case-control population, combined analysis does not increase study power¹².
Since Stage 1 in our design was performed on 20% of the total case-control sample and the current
study fast-tracked only a small number of SNPs to Stage 2, we opted not to do a joint analysis.
Instead, we consider SNPs passing the Bonferroni correction for 57 tests (p < 8.8 × 10⁻⁴) on
permutation p-values from Stage 2 as showing significant association.

Calculation of population attributable risk: To detect interactions among our
confirmed associations (Table 2), we fitted a logistic regression of disease status against age,
sex and BMI using 4,971 perfectly genotyped samples from the validation population. Since
all the associations were detected using the additive model, genotypes were coded according
to the number of risk alleles. This analysis showed that a single SNP at each locus explains
the full effect of that locus, and that the 5 loci contribute almost independently to the risk of
T2DM. We estimated the population attributable risk (PAR) for each confirmed association
(Table 2) assuming a T2DM prevalence of 7% in the general French population¹³. To
estimate the global PAR of these loci for T2DM, we considered an additive joint effects
model and a multiplicative joint effects model¹⁴. For each possible combination of loci (6
different sets of SNPs), we computed the PAR under both models. Because the PAR
estimates varied slightly between sets of SNPs, we report the mean estimated PAR for
each model. Based on the additive and multiplicative models, we calculated mean PARs of 58%
and 70%, respectively.

Identification of gene transcripts linked to significantly associated SNPs: To link
significantly associated SNPs to genes, the haplotype block structure was characterized for
each confirmed SNP and for SNPs lying within a region including the neighbouring genes and at
least 20 kb of flanking sequence. If the confirmed SNP was intergenic, SNPs from the
intergenic regions plus nearby flanking genes were included in the haplotype analysis. The
SNP and gene annotations were based on dbSNP125, EntrezGene, and human genome build
35 compiled for the UCSC Genome Browser¹⁵. Pseudogenes and hypothetical genes were
excluded from the analysis. For these SNPs, Stage 1 genotype data were obtained for the 1,275
samples that were studied with both the Illumina Human1 and Hap300 BeadArrays.
Pairwise D′ and r² linkage disequilibrium (LD) measures between all SNPs with MAF ≥ 0.01
were estimated with Haploview¹⁶ (version 3.32). LD measures and phased haplotypes were
visualized as heatmaps with R packages (http://www.r-project.org).
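For reference, the two pairwise LD measures have simple closed forms once haplotype frequencies are known. The sketch below uses the standard textbook definitions and a hypothetical function name; it is not Haploview's code, and it takes the haplotype frequency as given, whereas in practice it has to be estimated from unphased genotypes.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic loci.

    p_ab is the frequency of the haplotype carrying allele A at the first
    locus and allele B at the second; p_a and p_b are the allele frequencies.
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2
```

D′ reaches 1 whenever at least one of the four haplotypes is absent, while r² reaches 1 only when the two SNPs are perfect proxies for each other.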

V. Calculation of Study Power

We considered a biometric model with quantitative liability trait L and threshold T above
which an individual is declared affected by the disease. The trait L is assumed to follow a
mixture of 3 normal distributions (one per genotype) to account for extreme selection of cases
and controls as well as for familial correlation not attributable to the tested mutation. This last
assumption is crucial in that a single gene responsible for the disease can bias power analyses.

Our biometric model involves two sets of parameters. The first set expresses the disease risk
in terms of the ratio of the genotype risk of carrying at least one at-risk allele to that of
carrying no at-risk alleles (genotype relative risk: GRR). The second set of parameters,
peculiar to the QTL liability threshold model, specifies a different normal distribution for
each genotype with mean μ_g and residual variance σ²_R (independent of the disease locus)
composed of polygenic (σ²_p) and environmental (σ²_e) components. Both sets of parameters
can reflect the genetic model of inheritance (additive, recessive or dominant) and are
evaluated for a fixed at-risk allele frequency and population prevalence.

For each inheritance mode, we used iterative methods to find a correspondence between
GRRs and genetic heritability (genetic variance) by testing the ratio of penetrances at different
disease thresholds (i.e. by altering the value of the parameter T). Next, we computed genotype
frequencies for individuals lying below the 20th and 93rd percentiles of the liability trait (to
model the controls for stage 1 and stage 2 respectively) and above the 97th and 93rd percentiles
(to model the cases for stage 1 and stage 2). We determined the genotype frequencies for
probands of concordant affected sib-pairs through simulation (10,000 replications). For each
simulated sib, the liability value equalled the sum of genotypic, polygenic and environmental
means (g + p + e). When a sib-pair was found to be concordant affected, it was used to
estimate genotype frequency.

The estimated genotype frequencies of cases and controls were used to generate two samples,
S1 (cases and controls for stage 1) and S2 (cases and controls for stage 2), using the
multinomial random generator rmultinom (R statistical analysis package). Based on our study
design, we tested association using Armitage's trend test on each simulated replicate: one test
(T1) was performed on sample S1 only and a second test (T2) was performed on the pooled
samples (S1 and S2). The power was computed as the fraction of replicates in which T1 was
significant at a p-value of 0.05 and T2 was significant at a p-value of 1.25 × 10⁻⁷. This flexible
model allowed power estimation under conditions as close as possible to our full two-stage design.
Power analysis (Table S8) shows that we have a good probability of finding a variant with an
effect (GRR) as low as 1.3 for allele frequencies higher than 0.1.
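The simulation loop described above can be sketched as follows. This is a simplified illustration, not the authors' R code: genotype frequencies are passed in directly instead of being derived from the liability threshold model, only the additive trend test is used, and the function names, replicate count and seed are assumptions.

```python
import math
import random

def additive_trend(cases, controls):
    """Additive trend statistic (E2) from genotype counts for 0, 1, 2 risk alleles."""
    r1, r2 = cases[1], cases[2]
    n1, n2 = r1 + controls[1], r2 + controls[2]
    R = sum(cases)
    N = R + sum(controls)
    den = R * (N - R) * (N * (n1 + 4 * n2) - (n1 + 2 * n2) ** 2)
    if den == 0:
        return 0.0
    return N * (N * (r1 + 2 * r2) - R * (n1 + 2 * n2)) ** 2 / den

def chi2_pvalue(x):
    """Upper-tail p-value of a chi-square statistic with one degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

def draw_counts(rng, n, freqs):
    """Multinomial draw of genotype counts (stand-in for R's rmultinom)."""
    counts = [0, 0, 0]
    for g in rng.choices((0, 1, 2), weights=freqs, k=n):
        counts[g] += 1
    return counts

def estimate_power(stage1, stage2, n_rep=200, alpha1=0.05, alpha2=1.25e-7, seed=0):
    """Fraction of replicates with T1 (stage 1) and T2 (pooled) both significant.

    Each stage is a tuple (case_freqs, control_freqs, n_cases, n_controls).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_rep):
        c1 = draw_counts(rng, stage1[2], stage1[0])
        k1 = draw_counts(rng, stage1[3], stage1[1])
        c2 = draw_counts(rng, stage2[2], stage2[0])
        k2 = draw_counts(rng, stage2[3], stage2[1])
        pooled_cases = [a + b for a, b in zip(c1, c2)]
        pooled_controls = [a + b for a, b in zip(k1, k2)]
        p1 = chi2_pvalue(additive_trend(c1, k1))
        p2 = chi2_pvalue(additive_trend(pooled_cases, pooled_controls))
        if p1 < alpha1 and p2 < alpha2:
            hits += 1
    return hits / n_rep
```

A call such as `estimate_power((f_case1, f_ctrl1, 694, 654), (f_case2, f_ctrl2, 2617, 2894))` uses the sample sizes of Table S1; the frequency vectors `f_case1` and so on are placeholders for the model-derived genotype frequencies.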

REFERENCES:

1. Report of the Expert Committee on the Diagnosis and Classification of Diabetes
Mellitus. Diabetes Care 20, 1183-97 (1997).
2. Vionnet, N. et al. Genetics of NIDDM in France: studies with 19 candidate genes in
affected sib pairs. Diabetes 46, 1062-8 (1997).
3. Balkau, B. An epidemiologic survey from a network of French Health Examination
Centres, (D.E.S.I.R.): epidemiologic data on the insulin resistance syndrome. Rev
Epidemiol Sante Publique 44, 373-5 (1996).
4. Lafay, L. et al. Determinants and nature of dietary underreporting in a free-living
population: the Fleurbaix Laventie Ville Sante (FLVS) Study. Int J Obes Relat Metab
Disord 21, 567-73 (1997).
5. A haplotype map of the human genome. Nature 437, 1299-320 (2005).
6. Pritchard, J. K., Stephens, M. & Donnelly, P. Inference of population structure using
multilocus genotype data. Genetics 155, 945-59 (2000).
7. Hinds, D. A. et al. Whole-genome patterns of common DNA variation in three human
populations. Science 307, 1072-9 (2005).
8. Menozzi, P., Piazza, A. & Cavalli-Sforza, L. Synthetic maps of human gene
frequencies in Europeans. Science 201, 786-92 (1978).
9. Sasieni, P. D. From genotypes to genes: doubling the sample size. Biometrics 53,
1253-61 (1997).
10. Devlin, B. & Roeder, K. Genomic control for association studies. Biometrics 55, 997-
1004 (1999).
11. Clayton, D. G. et al. Population structure, differential bias and genomic control in a
large-scale, case-control association study. Nat Genet 37, 1243-6 (2005).
12. Skol, A. D., Scott, L. J., Abecasis, G. R. & Boehnke, M. Joint analysis is more
efficient than replication-based analysis for two-stage genome-wide association
studies. Nat Genet 38, 209-13 (2006).
13. Gourdy, P. et al. Prevalence of type 2 diabetes and impaired fasting glucose in the
middle-aged population of three French regions - The MONICA study 1995-97.
Diabetes Metab 27, 347-58 (2001).
14. Yang, Q., Khoury, M. J., Friedman, J., Little, J. & Flanders, W. D. How many genes
underlie the occurrence of common complex diseases in the population? Int J
Epidemiol 34, 1129-37 (2005).
15. Karolchik, D. et al. The UCSC Genome Browser Database. Nucleic Acids Res 31, 51-
4 (2003).
16. Barrett, J. C., Fry, B., Maller, J. & Daly, M. J. Haploview: analysis and visualization
of LD and haplotype maps. Bioinformatics 21, 263-5 (2005).

SUPPLEMENTARY TABLES

TABLE S1: Description of DNA sample sets.


Numbers slightly differ between the two arrays because different samples failed in each. A total of 661 cases and 614 controls were successful in
both Stage 1 platforms.

Genotyping Stage   Disease Status   Number   Sex Ratio (M / F)   Age at Diagnosis (yr)   Age at exam (yr)   BMI (kg/m²)
1 (Human1)         T2DM             686      416 / 270           44.9 ± 8.4              59.9 ± 10.3        25.8 ± 2.8
1 (Human1)         Controls         669      263 / 406           n. a.                   53.4 ± 5.6         23.2 ± 1.8
1 (Hap300)         T2DM             694      422 / 272           45.0 ± 8.4              60.2 ± 10.4        25.8 ± 2.8
1 (Hap300)         Controls         654      269 / 385           n. a.                   53.4 ± 5.7         23.2 ± 1.8
2                  T2DM             2617     1628 / 989          50.4 ± 11.0             62.2 ± 11.0        28.9 ± 3.6
2                  Controls         2894     1240 / 1654         n. a.                   56.4 ± 10.2        25.3 ± 3.5

TABLE S2. Human1 BeadArray summary statistics. The Human1 array contains probes for 109,365 SNPs. SNPs were excluded from
association testing if they failed any of the following criteria: (1) call rate>95% over all samples, (2) Hardy-Weinberg equilibrium (HWE)
p>0.001 for control samples and (3) minor allele frequency (MAF) >0.01 for cases or controls.

Chromosome  Total SNPs  Failed call rate  Failed HWE  Failed MAF  Analyzed SNPs
1 9819 369 165 378 9055
2 8702 304 119 265 8125
3 7207 269 118 257 6661
4 6000 259 62 167 5574
5 6329 238 69 166 5911
6 6579 289 84 156 6108
7 5581 220 75 171 5179
8 4891 209 85 157 4509
9 4480 149 63 167 4161
10 5240 211 88 215 4800
11 5928 224 100 211 5477
12 5465 220 103 184 5044
13 3093 141 38 97 2850
14 3420 141 68 140 3131
15 3307 112 46 131 3054
16 3388 91 52 160 3131
17 4079 144 81 184 3746
18 2570 112 58 108 2345
19 3520 128 74 113 3271
20 3007 113 52 104 2784
21 1381 68 14 31 1279
22 1886 75 29 69 1734
X 3430 160 0 497 2788
Y 13 4 0 2 8
XY 50 6 8 0 39

Total 109365 4256 1651 4130 100764

TABLE S3. Hap300 BeadArray summary statistics. The Hap300 array contains probes for 317,503 SNPs. SNPs were excluded from
association testing if they failed any of the following criteria: (1) call rate>95% over all samples, (2) Hardy-Weinberg equilibrium (HWE)
p>0.001 for control samples and (3) minor allele frequency (MAF) >0.01 for cases or controls.

Chromosome  Total SNPs  Failed call rate  Failed HWE  Failed MAF  Analyzed SNPs
1 23275 472 157 6 22734
2 25351 545 161 25 24716
3 21580 484 154 6 21006
4 19113 448 132 5 18596
5 19272 419 121 26 18783
6 20811 447 157 13 20268
7 16675 409 104 10 16207
8 18274 373 109 18 17825
9 15835 339 92 8 15461
10 15592 349 133 16 15180
11 14660 377 99 18 14225
12 15032 307 112 30 14650
13 11526 255 73 2 11242
14 9829 197 44 2 9616
15 8900 201 72 24 8659
16 9006 196 80 11 8765
17 8343 217 72 3 8100
18 10495 227 67 9 10224
19 5927 219 78 1 5689
20 7836 163 46 0 7654
21 5493 111 35 6 5357
22 5505 124 46 3 5360
X 9171 325 0 2 8844
Y 0 0 0 0 0
XY 2 0 0 0 2

Total 317503 7204 2144 244 309163

TABLE S4. T2DM Associated SNPs identified using Human1 BeadArrays. Twenty-eight Stage 1 SNPs passed the selection cutoff
(lambda-corrected pMAX < 1 × 10⁻⁴) for the Human1 array. Gene names refer to the gene in which the SNP is located or the closest gene. pMAX refers to
the smallest p-value obtained from any of the three genetic models, whereas the pMAX obtained from permutations was estimated by
generating the null distribution of the MAX statistic. Both p-values have been corrected for variance inflation. r0, r1 and r2 are genotype counts
for the cases, where r2 is the count for homozygotes carrying the risk allele. s0, s1 and s2 are the corresponding counts for the controls.

SNP Chr Position r0 r1 r2 s0 s1 s2 pMAX (corrected) pMAX (permutation) Closest gene

rs7900150 10 114783813 129 326 229 198 325 143 5.1×10⁻⁸ 2.1×10⁻⁸ TCF7L2
rs7100927 10 114786038 129 328 229 198 326 143 5.2×10⁻⁸ 2.1×10⁻⁸ TCF7L2
rs1193179 1 7503868 340 288 58 423 202 44 1.2×10⁻⁶ 6.3×10⁻⁷ CAMTA1
rs932206 2 136659004 134 285 267 158 333 178 4.6×10⁻⁶ 2.8×10⁻⁶ CXCR4
rs1978717 19 57189062 300 308 75 364 260 36 7.5×10⁻⁶ 4.9×10⁻⁶ ZNF615
rs11084127 19 57187192 300 311 75 363 266 36 1.1×10⁻⁵ 7.4×10⁻⁶ ZNF615
rs1111875 10 94452862 77 298 310 122 316 231 1.2×10⁻⁵ 8.6×10⁻⁶ HHEX
rs11084128 19 57187458 299 302 76 363 264 36 1.3×10⁻⁵ 8.8×10⁻⁶ ZNF615
rs282705 4 59343615 24 239 423 60 264 345 1.3×10⁻⁵ 9.0×10⁻⁶ LOC644419
rs1836002 19 57190334 300 311 75 364 268 37 1.5×10⁻⁵ 1.1×10⁻⁵ ZNF615
rs3740878 11 44214378 25 273 386 65 249 353 1.8×10⁻⁵ 1.3×10⁻⁵ EXT2
rs11037909 11 44212190 25 274 387 65 251 353 1.8×10⁻⁵ 1.3×10⁻⁵ EXT2
rs8101509 19 57100638 303 297 80 344 285 33 2.2×10⁻⁵ 1.6×10⁻⁵ ZNF649
rs2499953 11 4967481 646 39 1 660 9 0 2.3×10⁻⁵ 1.7×10⁻⁵ MMP26
rs6670163 1 233862625 34 204 448 45 266 358 2.7×10⁻⁵ 2.0×10⁻⁵ RYR2
rs945384 9 136892579 614 69 3 640 28 1 3.6×10⁻⁵ 2.9×10⁻⁵ FAM69B
rs1113132 11 44209979 25 271 390 63 251 355 3.7×10⁻⁵ 2.9×10⁻⁵ EXT2
rs2278419 19 57163625 319 294 69 368 270 27 4.1×10⁻⁵ 3.3×10⁻⁵ ZNF350
rs7651936 3 163505661 156 326 204 186 351 131 4.1×10⁻⁵ 3.3×10⁻⁵ LOC131149
rs10211998 22 35569377 26 189 466 37 249 380 4.1×10⁻⁵ 3.3×10⁻⁵ FLJ90680
rs5756371 22 35569520 28 194 462 41 251 376 5.1×10⁻⁵ 4.2×10⁻⁵ FLJ90680
rs13064991 3 45809815 15 177 494 27 233 409 5.5×10⁻⁵ 4.6×10⁻⁵ SLC6A20
rs1256517 14 64805437 471 184 17 527 116 15 5.5×10⁻⁵ 4.6×10⁻⁵ LOC646279
rs6541240 1 227397313 83 253 350 101 303 265 6.2×10⁻⁵ 5.2×10⁻⁵ TTC13
rs6413504 19 11102915 126 340 215 185 313 163 8.2×10⁻⁵ 7.1×10⁻⁵ LDLR
rs2050831 9 77117950 36 184 466 29 258 381 8.4×10⁻⁵ 7.3×10⁻⁵ VPS13A
rs873492 22 44818409 270 321 95 337 256 76 9.6×10⁻⁵ 8.5×10⁻⁵ FLJ27365
rs11078674 17 7251197 297 304 83 354 267 46 9.8×10⁻⁵ 8.7×10⁻⁵ NLGN2

TABLE S5. T2DM Associated SNPs identified using Hap300 BeadArrays. Forty-three Stage 1 SNPs passed the selection cutoff
(lambda-corrected pMAX < 5 × 10⁻⁵) for the Hap300 array. Gene names refer to the gene in which the SNP is located, or the closest gene. pMAX refers to
the smallest p-value obtained from any of the three genetic models, whereas the pMAX obtained from permutations was estimated by
generating the null distribution of the MAX statistic. Both p-values have been corrected for variance inflation. r0, r1 and r2 are genotype counts
for the cases, where r2 is the count for homozygotes carrying the risk allele. s0, s1 and s2 are the corresponding counts for the controls.

SNP Chr Position r0 r1 r2 s0 s1 s2 pMAX (corrected) pMAX (permutation) Closest gene

rs7903146 10 114748339 197 348 149 335 254 65 3.2 x 10-17 <3.3 x 10-10 TCF7L2
rs12255372 10 114798892 221 342 131 332 267 55 1.4 x 10-13 <3.3 x 10-10 TCF7L2
rs10885409 10 114798062 121 324 248 191 325 138 1.8 x 10-10 <3.3 x 10-10 TCF7L2
rs7904519 10 114763917 118 329 247 189 325 140 2.7 x 10-10 <3.3 x 10-10 TCF7L2
rs932206 2 136659004 135 282 272 156 328 170 6.3 x 10-7 3.9 x 10-7 CXCR4
rs35666 12 91036838 494 188 12 539 108 7 2.1 x 10-6 1.5 x 10-6 BTG1
rs7950175 11 126033245 30 216 448 45 274 335 2.6 x 10-6 1.9 x 10-6 KIRREL3
rs4918789 10 114811797 158 338 191 220 309 125 3.3 x 10-6 2.4 x 10-6 TCF7L2
rs7923837 10 94471897 66 300 328 116 296 242 3.4 x 10-6 2.5 x 10-6 HHEX
rs1037386 3 1453453 60 342 292 113 278 263 4.0 x 10-6 3.0 x 10-6 CNTN6
rs1193179 1 7503868 350 283 59 414 201 39 4.6 x 10-6 3.4 x 10-6 CAMTA1
rs1256526 14 64809658 202 358 132 270 283 101 6.1 x 10-6 4.7 x 10-6 LOC646279
rs6894954 5 144294556 313 275 104 308 301 45 6.4 x 10-6 5.0 x 10-6 KCTD16
rs290483 10 114905204 72 271 339 111 284 244 6.8 x 10-6 5.3 x 10-6 TCF7L2
rs7712842 5 144247299 315 275 104 310 299 45 7.0 x 10-6 5.4 x 10-6 KCTD16
rs2317948 1 55146464 48 327 311 97 293 264 7.1 x 10-6 5.5 x 10-6 TMEM61
rs859101 1 95036805 205 359 130 271 278 105 8.8 x 10-6 7.0 x 10-6 SLC44A3
rs2327112 6 8944645 146 325 223 157 360 136 8.8 x 10-6 7.0 x 10-6 LOC389365
rs1111875 10 94452862 77 302 315 119 308 227 9.1 x 10-6 7.3 x 10-6 HHEX
rs2589001 16 53776363 196 334 163 249 305 99 9.5 x 10-6 7.7 x 10-6 LOC654106
rs9290240 3 165780254 310 313 69 374 238 42 9.8 x 10-6 7.9 x 10-6 SI


TABLE S5. T2DM Associated SNPs identified using Hap300 BeadArrays (continued).

SNP Chr Position r0 r1 r2 s0 s1 s2 pMAX (corrected) pMAX (permutation) Closest gene

rs282705 4 59343615 23 244 427 58 258 338 1.2 x 10-5 9.5 x 10-6 LOC644419
rs2866016 4 99861413 37 270 387 70 289 295 1.2 x 10-5 1.0 x 10-5 TSPAN5
rs7949067 11 44248060 138 356 188 201 295 154 1.3 x 10-5 1.1 x 10-5 ALX4
rs1978717 19 57189062 304 315 75 357 259 37 1.4 x 10-5 1.2 x 10-5 ZNF615
rs7480010 11 42203294 301 327 66 363 246 45 1.5 x 10-5 1.2 x 10-5 LOC387761
rs2288887 19 57187615 304 314 75 357 259 37 1.5 x 10-5 1.3 x 10-5 ZNF615
rs729287 11 44236666 26 275 393 64 244 346 1.6 x 10-5 1.3 x 10-5 ALX4
rs12629276 3 16403485 18 195 478 34 242 377 1.6 x 10-5 1.4 x 10-5 RAFTLIN
rs1005316 17 66501964 13 224 457 44 211 399 1.6 x 10-5 1.4 x 10-5 LOC124685
rs1888533 21 45825267 136 377 181 196 294 164 1.7 x 10-5 1.5 x 10-5 LOC728117
rs375694 21 42907029 415 238 41 465 166 22 1.9 x 10-5 1.6 x 10-5 SLC37A1
rs1293143 20 52351866 112 315 267 131 347 176 1.9 x 10-5 1.6 x 10-5 PFDN4
rs13266634 8 118253964 54 229 411 53 293 307 2.1 x 10-5 1.8 x 10-5 SLC30A8
rs1293144 20 52350615 134 323 236 157 346 151 2.5 x 10-5 2.3 x 10-5 PFDN4
rs11249433 1 120892655 88 352 253 141 305 208 2.5 x 10-5 2.3 x 10-5 LOC653464
rs2876711 13 76314505 99 322 272 121 351 182 2.7 x 10-5 2.4 x 10-5 KCTD12
rs231461 17 39388569 529 153 12 559 89 6 2.8 x 10-5 2.5 x 10-5 PYY
rs6823091 4 153427388 444 217 33 489 149 16 3.0 x 10-5 2.7 x 10-5 FBXW7
rs10823406 10 70982029 36 201 457 46 254 354 3.1 x 10-5 2.9 x 10-5 NEUROG3
rs10483096 22 16926334 24 201 449 36 250 348 3.8 x 10-5 3.6 x 10-5 PEX26
rs10503677 8 20248324 220 354 115 281 277 95 4.3 x 10-5 4.0 x 10-5 LZTS1
rs11249431 1 120898245 205 368 109 263 288 90 4.9 x 10-5 4.7 x 10-5 LOC653464
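
The Table S5 caption defines pMAX as the smallest p-value over three genetic models and describes a permutation-based null distribution for the MAX statistic. A minimal Python sketch of that idea follows. It assumes each model is scored with a Cochran-Armitage trend test (additive, dominant and recessive weights), which this section does not confirm. The function names are illustrative, and no variance-inflation correction is applied, so the output is not expected to reproduce the tabulated values.

```python
import math
import random

# Assumed genotype scores for the three genetic models
# (genotypes carrying 0, 1 or 2 copies of the risk allele).
MODELS = {
    "additive": (0, 1, 2),
    "dominant": (0, 1, 1),
    "recessive": (0, 0, 1),
}


def trend_chi2(cases, controls, weights):
    """Cochran-Armitage trend chi-square (1 df) for three genotype counts."""
    n = [r + s for r, s in zip(cases, controls)]
    R, S = sum(cases), sum(controls)
    N = R + S
    wr = sum(w * r for w, r in zip(weights, cases))
    wn = sum(w * k for w, k in zip(weights, n))
    w2n = sum(w * w * k for w, k in zip(weights, n))
    denom = R * S * (N * w2n - wn * wn)
    if denom == 0:  # no score variation: the test is undefined, report 0
        return 0.0
    return N * (N * wr - R * wn) ** 2 / denom


def max_stat(cases, controls):
    """MAX statistic: largest trend chi-square over the three models."""
    return max(trend_chi2(cases, controls, w) for w in MODELS.values())


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))


def permutation_p(cases, controls, n_perm=1000, seed=0):
    """Permutation p-value for MAX, shuffling case/control labels."""
    rng = random.Random(seed)
    observed = max_stat(cases, controls)
    totals = [cases[g] + controls[g] for g in range(3)]
    genotypes = [g for g in range(3) for _ in range(totals[g])]
    n_cases = sum(cases)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(genotypes)
        perm_cases = [0, 0, 0]
        for g in genotypes[:n_cases]:
            perm_cases[g] += 1
        perm_controls = [totals[g] - perm_cases[g] for g in range(3)]
        if max_stat(perm_cases, perm_controls) >= observed:
            hits += 1
    # Add-one estimator keeps the p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Stage 1 counts for rs7903146 (r0, r1, r2) and (s0, s1, s2) from Table S5.
    r, s = (197, 348, 149), (335, 254, 65)
    print(chi2_sf_1df(max_stat(r, s)), permutation_p(r, s, n_perm=200))
```

Because all three tests have one degree of freedom, the smallest p-value corresponds to the largest chi-square, so `chi2_sf_1df(max_stat(...))` gives the uncorrected minimum p-value. With a finite number of permutations the permutation p-value cannot fall below 1 / (n_perm + 1), which is one plausible reason for the "<" entries in the permutation column.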


TABLE S6. Association results for known T2DM susceptibility loci.

Gene / SNP Most strongly associated SNP OR (het) OR (hom) pMAX (corrected) pMAX (permutation)

HNF4a / rs1884614 rs2425637 1.33 1.67 0.0029 0.0040
CAPN10 / rs2975760 rs7571442 0.95 1.67 0.027 0.044
ENPP1 / rs1044498 rs7769712 7.75 7.50 0.031 0.050
KCNJ11 / rs5219 rs2051772 1.34 1.39 0.047 0.074
ACDC rs6444175 1.23 1.27 0.061 0.10
PPARG / rs1801282 rs17793693 1.22 3.93 0.066 0.11
GCK / rs1799884 rs2268576 0.89 1.15 0.10 0.17


TABLE S7: Validation studies for the best Stage 1 T2DM-associated SNPs. Gene names refer to the gene in which the SNP is located, or the
closest gene. pMAX is the smallest p-value obtained from any of the three genetic models, whereas the permutation pMAX was
estimated by generating the null distribution of the MAX statistic. r0, r1 and r2 are the genotype counts for the cases, where r2 is
the count for homozygotes carrying the risk allele; s0, s1 and s2 are the corresponding counts for the controls.

SNP Chr Position r0 r1 r2 s0 s1 s2 OR (het) OR (hom) pMAX pMAX (permutation) Closest gene

rs7903146 10 114748339 876 1215 408 1417 1194 238 1.65 2.77 1.5 x 10-34 < 1.0 x 10-7 TCF7L2
rs13266634 8 118253964 177 945 1440 265 1200 1413 1.18 1.53 6.1 x 10-8 5.0 x 10-7 SLC30A8
rs1111875 10 94452862 334 1172 1065 459 1355 1015 1.19 1.44 3.0 x 10-6 7.4 x 10-6 HHEX
rs7923837 10 94471897 278 1090 1089 412 1326 1114 1.22 1.45 7.5 x 10-6 2.2 x 10-5 HHEX
rs7480010 11 42203294 1170 1122 316 1414 1185 273 1.14 1.40 1.1 x 10-4 2.9 x 10-4 LOC387761
rs3740878 11 44214378 141 954 1480 207 1114 1484 1.26 1.46 1.2 x 10-4 2.8 x 10-4 EXT2
rs11037909 11 44212190 140 953 1480 207 1106 1491 1.27 1.47 1.8 x 10-4 4.5 x 10-4 EXT2
rs1113132 11 44209979 139 941 1492 191 1128 1511 1.15 1.36 3.3 x 10-4 8.1 x 10-4 EXT2
rs729287 11 44236666 140 939 1479 192 1124 1512 1.15 1.34 6.7 x 10-4 1.6 x 10-3 ALX4
rs1005316 17 66501964 89 669 1708 89 913 1856 0.73 0.92 8.3 x 10-4 2.0 x 10-3 LOC124685
rs2876711 13 76314505 389 1191 989 484 1404 987 1.06 1.25 1.4 x 10-3 3.5 x 10-3 KCTD12
rs1256526 14 64809658 865 1265 482 1045 1355 455 1.13 1.28 1.5 x 10-3 1.5 x 10-3 LOC646279
rs10823406 10 70982029 131 852 1578 186 1020 1668 1.19 1.34 2.6 x 10-3 6.2 x 10-3 NEUROG3
rs6413504 19 11102915 585 1259 709 712 1470 697 1.04 1.24 2.8 x 10-3 6.7 x 10-3 LDLR
rs7949067 11 44248060 605 1332 673 761 1431 675 1.17 1.25 3.2 x 10-3 7.7 x 10-3 ALX4
rs1256517 14 64805437 1843 635 66 2158 670 44 1.11 1.76 5.1 x 10-3 1.1 x 10-2 LOC646279
rs2499953 11 4967481 2437 128 2 2781 105 0 1.39 n.d. 7.0 x 10-3 8.3 x 10-3 MMP26
rs1193179 1 7503868 1364 1012 196 1573 1047 178 1.11 1.27 9.1 x 10-3 2.0 x 10-2 CAMTA1
rs11078674 17 7251197 1188 1122 298 1344 1245 277 1.02 1.22 3.4 x 10-2 7.2 x 10-2 NLGN2
rs2327112 6 8944645 529 1320 757 639 1445 778 1.10 1.18 4.0 x 10-2 8.5 x 10-2 LOC389365
rs11249431 1 120898245 975 1233 394 1090 1393 387 0.99 1.14 8.0 x 10-2 1.6 x 10-1 LOC653464
rs2866016 4 99861413 188 1041 1330 247 1155 1469 1.18 1.19 8.9 x 10-2 1.8 x 10-1 TSPAN5
rs2317948 1 55146464 267 1115 1183 296 1312 1255 0.94 1.05 9.1 x 10-2 1.8 x 10-1 TMEM61
rs12629276 3 16403485 94 786 1729 127 891 1854 1.19 1.26 9.6 x 10-2 1.9 x 10-1 RAFTLIN
rs2589001 16 53776363 794 1261 484 956 1381 527 1.10 1.11 9.8 x 10-2 1.9 x 10-1 LOC654106
rs8101509 19 57100638 1282 957 239 1442 1158 237 0.93 1.13 1.0 x 10-1 2.0 x 10-1 ZNF649
rs10483096 22 16926334 110 809 1649 126 963 1782 0.96 1.06 1.0 x 10-1 2.0 x 10-1 PEX26
rs282705 4 59343615 146 963 1453 191 1087 1593 1.16 1.19 1.5 x 10-1 2.8 x 10-1 LOC644419


TABLE S7: Validation studies for best stage 1 T2DM-associated SNPs (continued).

SNP Chr Position r0 r1 r2 s0 s1 s2 OR (het) OR (hom) pMAX (corrected) pMAX (permutation) Closest gene

rs9290240 3 165780254 197 1060 1303 252 1142 1485 1.19 1.12 1.6 x 10-1 3.0 x 10-1 SI
rs1888533 21 45825267 593 1309 704 696 1420 754 1.08 1.10 1.9 x 10-1 3.5 x 10-1 LOC728117
rs1293143 20 52351866 440 1245 917 487 1410 963 0.98 1.05 2.2 x 10-1 4.0 x 10-1 PFDN4
rs10211998 22 35569377 1628 857 125 1831 907 128 1.06 1.10 2.4 x 10-1 4.3 x 10-1 FLJ90680
rs945384 9 136892579 2329 243 7 2545 242 13 1.10 0.59 2.5 x 10-1 3.9 x 10-1 FAM69B
rs2050831 9 77117950 1610 822 133 1817 929 130 1.00 1.15 2.5 x 10-1 4.5 x 10-1 VPS13A
rs6823091 4 153427388 1654 811 104 1885 874 115 1.06 1.03 3.5 x 10-1 5.8 x 10-1 FBXW7
rs5756371 22 35569520 1620 860 132 1809 929 132 1.03 1.12 3.5 x 10-1 5.8 x 10-1 FLJ90680
rs873492 22 44818409 336 1183 1090 345 1347 1171 0.90 0.96 3.5 x 10-1 5.8 x 10-1 FLJ27365
rs6670163 1 233862625 1530 918 129 1695 978 130 1.04 1.10 3.6 x 10-1 5.9 x 10-1 RYR2
rs375694 21 42907029 1668 770 105 1900 865 105 1.01 1.14 3.7 x 10-1 6.0 x 10-1 SLC37A1
rs11249433 1 120892655 452 1311 845 519 1438 908 1.05 1.07 4.3 x 10-1 6.7 x 10-1 LOC653464
rs859101 1 95036805 448 1201 958 490 1354 1026 0.97 1.02 4.4 x 10-1 6.9 x 10-1 SLC44A3
rs1037386 3 1453453 1056 1180 323 1214 1302 358 1.04 1.04 4.7 x 10-1 7.2 x 10-1 CNTN6
rs11084127 19 57187192 1287 1077 236 1443 1170 250 1.03 1.06 4.8 x 10-1 7.3 x 10-1 ZNF615
rs1978717 19 57189062 1278 1060 231 1450 1176 246 1.02 1.07 5.1 x 10-1 7.6 x 10-1 ZNF615
rs2288887 19 57187615 1278 1058 233 1450 1175 249 1.02 1.06 5.3 x 10-1 7.7 x 10-1 ZNF615
rs11084128 19 57187458 1294 1079 236 1445 1173 251 1.03 1.05 5.4 x 10-1 7.9 x 10-1 ZNF615
rs7950175 11 126033245 1497 923 147 1693 1025 155 1.02 1.07 5.6 x 10-1 8.0 x 10-1 KIRREL3
rs6894954 5 144294556 1079 1158 296 1224 1322 321 0.99 1.05 5.7 x 10-1 8.2 x 10-1 KCTD16
rs1836002 19 57190334 1278 1067 233 1430 1166 249 1.02 1.05 5.8 x 10-1 8.2 x 10-1 ZNF615
rs7651936 3 163505661 605 1297 703 676 1437 760 1.01 1.03 6.6 x 10-1 8.8 x 10-1 LOC131149
rs13064991 3 45809815 1789 740 82 1984 801 88 1.02 1.03 6.7 x 10-1 9.0 x 10-1 SLC6A20
rs1293144 20 52350615 784 1278 540 855 1426 583 0.98 1.01 7.2 x 10-1 9.2 x 10-1 PFDN4
rs231461 17 39388569 2123 463 24 2344 505 24 1.01 1.10 7.4 x 10-1 9.2 x 10-1 PYY
rs6541240 1 227397313 1102 1166 336 1217 1290 363 1.00 1.02 7.8 x 10-1 9.5 x 10-1 TTC13
rs2278419 19 57163625 1355 1052 204 1501 1149 223 1.01 1.01 8.0 x 10-1 9.6 x 10-1 ZNF350
rs35666 12 91036838 42 542 1970 46 620 2210 0.96 0.98 8.0 x 10-1 9.7 x 10-1 BTG1
rs10503677 8 20248324 421 1224 958 463 1353 1051 0.99 1.00 9.1 x 10-1 9.9 x 10-1 LZTS1
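
The OR (het) and OR (hom) columns of Table S7 appear consistent with crude genotype odds ratios taken relative to the r0/s0 genotype. This is an inference from the tabulated counts rather than a stated method. The short Python sketch below recomputes them under that assumption; the function name is illustrative.

```python
def genotype_odds_ratios(cases, controls):
    """Crude odds ratios for heterozygotes and risk-allele homozygotes,
    each relative to the genotype counted in r0 / s0.

    Returns None where the ratio is undefined because a denominator
    cell is empty.
    """
    r0, r1, r2 = cases
    s0, s1, s2 = controls
    or_het = (r1 * s0) / (r0 * s1) if r0 * s1 else None
    or_hom = (r2 * s0) / (r0 * s2) if r0 * s2 else None
    return or_het, or_hom


if __name__ == "__main__":
    # rs7903146 (TCF7L2) counts from Table S7.
    het, hom = genotype_odds_ratios((876, 1215, 408), (1417, 1194, 238))
    print(round(het, 2), round(hom, 2))  # 1.65 2.77, matching the table
```

For rs2499953 the control count s2 is 0, so the homozygote ratio is undefined and the function returns `None`, in line with the "n.d." entry in the table.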


TABLE S8. Estimation of Study Power. Power (in %) is shown for r2 = 1 and, in parentheses, for r2 = 0.8, for different values of the genotype
relative risk (GRR) and the at-risk allele frequency (AF). Prevalence of T2DM in the French population was estimated at 7% and prevalence of
early-onset disease was estimated at 3%.

GRR AF = 0.1 AF = 0.2 AF = 0.3
ADD REC DOM ADD REC DOM ADD REC DOM
1.2 30 (10) <1 (<1) 18 (5) 88 (64) <1 (<1) 40 (25) 97 (88) <1 (<1) 30 (17)
1.3 97 (84) <1 (<1) 83 (58) >99 (>99) 3 (<1) >99 (90) >99 (>99) 32 (12) >99 (93)
1.5 >99 (>99) <1 (<1) >99 (97) >99 (>99) 20 (10) >99 (>99) >99 (>99) 97 (86) >99 (>99)
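
A power estimate of the kind shown in Table S8 starts from the genotype frequencies expected in cases and controls for a given GRR, AF and prevalence. The sketch below covers only that first step. It assumes Hardy-Weinberg proportions in the population and the following relative risks per genotype: ADD = (1, GRR, 2·GRR − 1), DOM = (1, GRR, GRR), REC = (1, 1, GRR). These model definitions, the function name, and the omission of sample sizes and significance thresholds are all assumptions, so the sketch does not reproduce the tabulated power values.

```python
def genotype_freqs(grr, af, prevalence, model):
    """Expected genotype frequencies (0, 1, 2 risk alleles) in cases
    and controls under Hardy-Weinberg equilibrium."""
    # Assumed relative risks per genotype for each model.
    rel = {
        "ADD": (1.0, grr, 2.0 * grr - 1.0),
        "DOM": (1.0, grr, grr),
        "REC": (1.0, 1.0, grr),
    }[model]
    q = af
    pop = ((1.0 - q) ** 2, 2.0 * q * (1.0 - q), q ** 2)
    # Baseline penetrance chosen so that overall prevalence is matched.
    mean_rel = sum(p * r for p, r in zip(pop, rel))
    f0 = prevalence / mean_rel
    pen = [f0 * r for r in rel]
    cases = [p * f / prevalence for p, f in zip(pop, pen)]
    controls = [p * (1.0 - f) / (1.0 - prevalence) for p, f in zip(pop, pen)]
    return cases, controls


if __name__ == "__main__":
    # GRR = 1.5, AF = 0.2, prevalence 7% as in the Table S8 caption.
    cases, controls = genotype_freqs(1.5, 0.2, 0.07, "ADD")
    print([round(x, 4) for x in cases], [round(x, 4) for x in controls])
```

Feeding these frequencies, together with the study sample sizes and the chosen significance threshold, into a test of association would give the power itself.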


SUPPLEMENTARY FIGURES

FIGURE S1: A two-stage genome-wide scan for T2DM susceptibility loci. T2DM
susceptibility loci were identified by a genome-wide scan using 392,935 genetic markers.
This report presents the results of the Stage 1 genome scan as well as rapid confirmation
studies for fifty-nine markers with the most significant T2DM association (shaded boxes).


FIGURE S2: Representative genotype clusters obtained using Illumina BeadArrays.
Intensity data from the Human1 and Hap300 bead arrays were used to establish sample
genotypes based on clusters obtained by Illumina BeadStudio 2.0. Normalized angular cluster
plots are shown for 93 DNA samples genotyped at loci confirmed in the validation study.
Darkly shaded areas indicate signal intensities corresponding to successful homozygous (AA:
red, BB: blue) and heterozygous (purple) genotype calls.

[Figure S2 panels: rs7903146, rs13266634, rs1111875, rs7480010 and rs729287. Each panel plots Norm R (y-axis) against
Norm Theta (x-axis, 0 to 1). Counts printed beneath the three clusters in each panel: rs7903146: 18, 44, 31; rs13266634: 3, 28, 62;
rs1111875: 12, 34, 47; rs7480010: 32, 50, 10; rs729287: 2, 34, 57.]


FIGURE S3. Detection of population stratification. Intercontinental population
stratification in our case (green dots) and control (red dots) samples was identified using
STRUCTURE. The analysis was performed using a dataset containing genotypes for 328
SNPs assayed in 1619 individuals (736 cases, 674 controls and 209 individuals from the
HapMap). Ancestry of most case and control subjects was similar to that of CEU individuals
from the HapMap (blue dots, top corner); however, 43 individuals lay outside the CEU cluster
and were excluded from the association study. No differences between the assignment of
cases and controls could be detected after removal of these outliers.

[Figure S3 plot labels: CEU, CHB/JPT and YRI (HapMap reference populations); Case and Control (study samples).]


FIGURE S4. Probability-probability plot for the MAX statistic calculated for Human1
BeadArray SNPs. P-values for the unadjusted MAX statistics (green) deviate from the
expected uniform distribution (magenta). This is partially corrected by adjusting for the
variance inflation factor (blue). The point where the adjusted curve markedly deviates from
the uniform distribution, p = 1 x 10-4, establishes the threshold for rapid confirmation studies.


FIGURE S5. Probability-probability plot for the MAX statistic calculated for Hap300
BeadArray SNPs. P-values for the unadjusted MAX statistics (green) deviate from the
expected uniform distribution (magenta). This is partially corrected by adjusting for the
variance inflation factor (blue). The point where the adjusted curve markedly deviates from
the uniform distribution, p = 5 x 10-5, establishes the threshold for rapid confirmation studies.
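
Figures S4 and S5 compare observed p-values with the uniform distribution before and after adjusting for a variance inflation factor. The Python sketch below illustrates the general idea with the classical genomic-control estimate for 1-df chi-square statistics (median of the observed statistics divided by the median of the chi-square distribution with 1 df). How the authors applied the correction to the MAX statistic is not described in this section, so this is an assumed stand-in, and all function names are illustrative.

```python
import statistics

# Median of the chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364231195724


def inflation_factor(chi2_stats):
    """Genomic-control lambda; values below 1 are clamped to 1 so that
    statistics are never inflated by the correction."""
    return max(1.0, statistics.median(chi2_stats) / CHI2_1DF_MEDIAN)


def adjust(chi2_stats, lam):
    """Divide each statistic by lambda to correct for variance inflation."""
    return [x / lam for x in chi2_stats]


def pp_points(p_values):
    """Pairs (expected, observed) for a probability-probability plot:
    the i-th smallest p-value is compared with i / (n + 1), its
    expectation under the uniform distribution."""
    n = len(p_values)
    return [((i + 1) / (n + 1), p) for i, p in enumerate(sorted(p_values))]
```

Plotting the output of `pp_points` before and after `adjust` would give curves analogous to the green and blue traces described in the captions, with the diagonal playing the role of the expected uniform distribution.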
