This chapter is closely based on a tutorial given by Stephan Schiffels (currently Sanger Institute) at
the Australian Centre for Ancient DNA in November 2014. The text follows his script very closely,
with his permission.
7.1 Questions
1. What can we say about the time to the most recent common ancestor between you and the
Queen of England?
2. How different or similar is the DNA sequence of you and the Queen of England?
7.2 Ancestors
In a simple model of human populations, the number of ancestors that an individual has doubles when
going back one generation:
$$A(g) = 2^g.$$
In a family tree in a finite population, it will occasionally happen that two (or more) different ancestors
in the same generation share an ancestor in the previous generation. This is called a coalescence event:
108 Bioinformatics I, WS’14/15, D. Huson (this part by S. Schiffels) December 15, 2014
Model of growth: Here is a more realistic model of the growth of the number of ancestors as a function
of the number of generations g:
$$A(g) = \begin{cases} 2^g & \text{if } 2^g \ll N_{\text{eff}}, \\ \to N_{\text{eff}} & \text{else,} \end{cases}$$
where $N_{\text{eff}}$ is the so-called effective population size. It
• reflects the long-term size of a population (in humans, across several hundred thousand years),
and
• reflects the effective number of people from whom you “randomly” choose your mates, assuming a
so-called panmictic population, that is, a well-mixed, randomly mating population.
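The growth model above can be sketched in a few lines of Python. This is only an illustration: the function names are made up here, and the hard cap via `min` is a simplifying assumption standing in for the gradual approach to $N_{\text{eff}}$ that the model describes.

```python
def ancestors(g: int, n_eff: int) -> int:
    """Number of distinct ancestors g generations back.

    Assumption: the count doubles each generation until it hits n_eff,
    then stays there (a hard cap instead of a smooth saturation).
    """
    return min(2 ** g, n_eff)


def saturation_generation(n_eff: int) -> int:
    """Smallest g at which 2**g reaches or exceeds n_eff."""
    g = 0
    while 2 ** g < n_eff:
        g += 1
    return g


if __name__ == "__main__":
    n_eff = 15_000  # value used later in this chapter
    for g in (5, 10, 15, 20):
        print(g, ancestors(g, n_eff))
    print("saturates after", saturation_generation(n_eff), "generations")
```

With $N_{\text{eff}} = 15\,000$ the doubling phase is over after only 14 generations, which is why the finite population size matters so quickly.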
Question 1: Time to recent common ancestor: What can we say about the time to the most
recent common ancestor between you and the Queen of England?
The approximate probability of sharing an ancestor with someone else g generations ago is:
$$\frac{(2^g)^2}{N_{\text{eff}}}.$$
How strongly does this answer depend on the assumed effective population size? While the number of
ancestors that lived g generations ago depends strongly on this, the number of generations that one
must go back to find a last common ancestor does not:
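A minimal Python sketch of this weak dependence, using the approximate probability $(2^g)^2 / N_{\text{eff}}$ given above. The function names and the three population sizes are illustrative choices, not values from the tutorial, and "finding a common ancestor" is taken here to mean that the approximate probability reaches 1.

```python
def sharing_probability(g: int, n_eff: int) -> float:
    """Approximate probability (2**g)**2 / n_eff, capped at 1."""
    return min(1.0, (2 ** g) ** 2 / n_eff)


def generations_to_shared_ancestor(n_eff: int) -> int:
    """Smallest g for which (2**g)**2 reaches or exceeds n_eff."""
    g = 0
    while (2 ** g) ** 2 < n_eff:
        g += 1
    return g


# Hypothetical population sizes spanning four orders of magnitude.
for n_eff in (10_000, 1_000_000, 100_000_000):
    print(n_eff, generations_to_shared_ancestor(n_eff))
# prints 7, 10 and 14 generations respectively
```

Because the required number of generations grows only like $\tfrac{1}{2}\log_2 N_{\text{eff}}$, a 10,000-fold increase in population size merely doubles it.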
We now model the genealogy of two or more genes (or, more precisely, alleles of a gene) backward
until their most recent common ancestor is found:
This looks complicated. A central idea is to ignore all genes that are not passed down to the current-
day set of interest. This is also called looking backward in time.
To study the genealogy of a set of genes, we start at the present, and move backward in time, generation
by generation, modeling individual coalescence events:
Definition 7.7.1 (Basic coalescence theory) Basic assumptions of the coalescence theory of a pair of
samples¹:
Estimator for population size: This gives us an estimator for population size:
$$\Theta = 4N\mu,$$
where Θ is the fraction of heterozygous positions in the genome and N is the effective population size.
This simple formula encapsulates a deep relationship between a purely genomic property (the
heterozygosity) and a population-level quantity (the effective population size).
¹ J.F.C. Kingman, On the Genealogy of Large Populations, J. of Applied Probability, 19:27-43 (1982)
² http://en.wikipedia.org/wiki/Cumulative_distribution_function
Question 2: Sequence similarity: How similar is the DNA sequence of the Queen of England and
of you?
Consider a single chromosome and compare the Queen’s copy with your copy.
Using N = 15 000 and µ = 1.25 × 10^−8, we get:
$$\Theta = 4N\mu = 4 \cdot 15\,000 \cdot 1.25 \times 10^{-8} = 7.5 \times 10^{-4}.$$
Hence:
The Queen’s and your chromosome differ at about 1 in 1333 sites. (Note that $\frac{1}{1333} \approx 0.00075$.)
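The arithmetic for Question 2 can be checked with a short Python sketch of the relation Θ = 4Nµ, read in both directions. The function names are illustrative, and µ is taken to be the mutation rate per site per generation.

```python
def expected_heterozygosity(n_eff: float, mu: float) -> float:
    """Theta = 4 * N * mu: expected fraction of differing sites."""
    return 4 * n_eff * mu


def estimate_n_eff(theta: float, mu: float) -> float:
    """Invert Theta = 4 * N * mu to estimate the effective population size."""
    return theta / (4 * mu)


if __name__ == "__main__":
    mu = 1.25e-8   # mutation rate used in the text
    n_eff = 15_000  # effective population size used in the text
    theta = expected_heterozygosity(n_eff, mu)
    print(f"theta = {theta:.2e}")                  # about 7.5e-04
    print(f"one difference per {1 / theta:.0f} sites")
    print(f"recovered N = {estimate_n_eff(theta, mu):.0f}")
```

The second function is the estimator in practice: measure the heterozygosity of a genome, assume a mutation rate, and solve for N.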
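Recall that the probability of two samples not coalescing in time t is:
$$P_2(t) = \left(1 - \frac{1}{2N}\right)^t \approx e^{-\frac{t}{2N}}.$$
For i samples, any of the $\binom{i}{2}$ pairs can coalesce, so the probability that no coalescence occurs
in time t is:
$$P_i(t) = \left(1 - \binom{i}{2}\frac{1}{2N}\right)^t = \left(1 - \frac{i(i-1)}{2}\,\frac{1}{2N}\right)^t \approx e^{-\frac{i(i-1)}{4N}t}.$$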
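To get a feel for how good the exponential approximation is, the following Python sketch compares the exact per-generation product with the exponential form. Function names and the example values are illustrative assumptions; time t is measured in generations.

```python
import math


def p_no_coalescence_exact(i: int, n: int, t: int) -> float:
    """(1 - C(i,2) / (2N))**t: no pair among i samples coalesces in t generations."""
    pairs = i * (i - 1) // 2
    return (1 - pairs / (2 * n)) ** t


def p_no_coalescence_approx(i: int, n: int, t: int) -> float:
    """Exponential approximation exp(-i(i-1) t / (4N))."""
    return math.exp(-i * (i - 1) * t / (4 * n))


if __name__ == "__main__":
    n = 15_000  # effective population size used in the text
    for i in (2, 5, 10):
        for t in (100, 1_000, 10_000):
            exact = p_no_coalescence_exact(i, n, t)
            approx = p_no_coalescence_approx(i, n, t)
            print(i, t, round(exact, 6), round(approx, 6))
```

The two values agree closely as long as $\binom{i}{2}$ is small compared with 2N, which is the regime in which the approximation is meant to be used.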
Mean waiting time for coalescence events: So, the waiting times $T_i$ are exponentially distributed
with mean³:
$$\langle T_i \rangle = \frac{4N}{i(i-1)}.$$
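As a small worked example, the mean waiting times can be tabulated and summed in Python. The sum over i = 2, …, n is the expected time until all n samples have found their common ancestor; it telescopes to 4N(1 − 1/n). The function names and the sample size of 10 are illustrative assumptions.

```python
def mean_waiting_time(i: int, n_eff: int) -> float:
    """Mean time (in generations) during which i lineages remain: 4N / (i(i-1))."""
    return 4 * n_eff / (i * (i - 1))


def mean_time_to_common_ancestor(sample_size: int, n_eff: int) -> float:
    """Sum of the mean waiting times for i = sample_size, ..., 2."""
    return sum(mean_waiting_time(i, n_eff) for i in range(2, sample_size + 1))


if __name__ == "__main__":
    n_eff = 15_000
    for i in range(10, 1, -1):
        print(i, round(mean_waiting_time(i, n_eff)))
    total = mean_time_to_common_ancestor(10, n_eff)
    print("total:", round(total), "closed form:", round(4 * n_eff * (1 - 1 / 10)))
```

Note that the last step, from two lineages to one, takes 2N generations on average, which is more than half of the total.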
Definition 7.11.1 (Tajima’s estimator) Tajima’s estimator is the mean proportion of pairwise
differences between any two sequences:
$$\Theta_\pi = \frac{\text{nr of pairwise differences}}{L}.$$
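Definition 7.11.2 (Watterson estimator) The Watterson estimator is the number of segregating
sites, normalized by the sequence length L and a factor that depends on the number n of sequences:
$$\Theta_W = \frac{\text{nr of segregating sites}}{L \sum_{i=1}^{n-1} 1/i}.$$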
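Both estimators can be computed directly from a set of aligned sequences. The sketch below is a minimal illustration: the four toy sequences are invented, the function names are hypothetical, and "nr of pairwise differences" is interpreted as the mean number of differing sites over all pairs of sequences.

```python
from itertools import combinations


def mean_pairwise_differences(seqs):
    """Mean number of differing sites, averaged over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / len(pairs)


def segregating_sites(seqs):
    """Number of alignment columns that show more than one allele."""
    return sum(len(set(column)) > 1 for column in zip(*seqs))


def theta_pi(seqs):
    """Tajima's estimator: mean pairwise differences per site."""
    return mean_pairwise_differences(seqs) / len(seqs[0])


def theta_w(seqs):
    """Watterson estimator: segregating sites / (L * sum_{i=1}^{n-1} 1/i)."""
    n = len(seqs)
    harmonic = sum(1 / i for i in range(1, n))
    return segregating_sites(seqs) / (len(seqs[0]) * harmonic)


if __name__ == "__main__":
    # Invented toy alignment: n = 4 sequences of length L = 10.
    toy = ["AAAAAAAAAA", "AAAAAAAAAT", "AAAAAAAATT", "CAAAAAAATT"]
    print(segregating_sites(toy), theta_pi(toy), theta_w(toy))
```

Under a constant population size both functions estimate the same quantity Θ, so their values are expected to be close; here they are roughly 0.167 and 0.164.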
Question 3: Demographic history: How did our ancestral population size change through time?
Three possible simple answers: The population has been
(a) constant
(b) declining
(c) expanding
Interestingly, we can distinguish between these three possible scenarios by comparing only existing
genomes:
$$D = \frac{\Theta_\pi - \Theta_W}{\sqrt{\mathrm{Var}(\Theta_\pi - \Theta_W)}}$$
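This statistic is commonly known as Tajima's D. The text does not spell out the variance term, so the sketch below assumes the standard variance approximation from Tajima (1989), expressed in terms of the number of sequences n and the number of segregating sites S. The function name and the example inputs are illustrative. The statistic is computed on the scale of counts rather than per site; the sequence length L cancels between numerator and denominator.

```python
import math


def tajimas_d(n: int, mean_pairwise_diffs: float, num_segregating: int) -> float:
    """Tajima's D from sample size n, mean pairwise differences and S.

    Assumption: the variance uses the constants of Tajima (1989),
    which are not given in the text itself.
    """
    s = num_segregating
    if s == 0:
        raise ValueError("D is undefined without segregating sites")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * s + e2 * s * (s - 1)
    return (mean_pairwise_diffs - s / a1) / math.sqrt(variance)


if __name__ == "__main__":
    # Invented example: 4 sequences, 3 segregating sites.
    print(tajimas_d(4, 10 / 6, 3))  # variants at mixed frequencies
    print(tajimas_d(4, 1.5, 3))     # all three variants are singletons
```

In the usual reading, D near zero is compatible with a constant population size, a clearly negative D (an excess of rare variants) points towards expansion, and a clearly positive D points towards decline. In the second example all variants are singletons and D comes out negative.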
7.14 Summary