
Bioinformatics I, WS’14/15, D. Huson, December 15, 2014

7 Introduction to Population Genetics

This chapter is closely based on the script of a tutorial given by Stephan Schiffels (currently Sanger
Institute) at the Australian Centre for Ancient DNA in November 2014, and is used with his
permission.

7.1 Questions

The aim of today’s lecture is to answer three questions:

1. What can we say about the time to the most recent common ancestor between you and the
Queen of England?

2. How different or similar are your DNA sequence and that of the Queen of England?

3. How did our ancestral population size change through time?

7.2 Ancestors

In a simple model of human populations, the number of ancestors that an individual has doubles when
going back one generation:

(figure by Stephan Schiffels)

Number of ancestors as a function of number of generations g:

\[ A(g) = 2^g. \]

Problem: After ≈ 32 generations this exceeds the number of living humans.
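
The point at which the doubling model breaks down can be checked with a few lines of Python. This is only a sketch: the world population of about 7.2 billion is an assumed figure that does not come from the text, and the function name is hypothetical. With that assumption the model overshoots after 33 generations, in line with the ≈ 32 quoted above.

```python
def generations_until_exceeding(population: float) -> int:
    """Return the smallest g for which A(g) = 2**g exceeds `population`."""
    g = 0
    while 2 ** g <= population:
        g += 1
    return g


# Assumption (not from the text): roughly 7.2 billion living humans.
WORLD_POPULATION = 7.2e9

g = generations_until_exceeding(WORLD_POPULATION)
print(f"2^{g} = {2 ** g} exceeds {WORLD_POPULATION:.1e}")
```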

7.3 Coalescence events

In a family tree in a finite population, it will occasionally happen that two (or more) different ancestors
in the same generation share an ancestor in the previous generation. This is called a coalescence event:

(figure by Stephan Schiffels)

Model of growth: Here is a more realistic model of the growth of the number of ancestors as a
function of the number of generations g:
\[ A(g) = \begin{cases} 2^g & \text{if } 2^g \ll N_{\text{eff}}, \\ \to N_{\text{eff}} & \text{else,} \end{cases} \]
where $N_{\text{eff}}$ is the so-called effective population size.

7.4 Effective population size

The effective population size

• reflects the long-term size of a population (in humans, across several hundred thousand years)
• reflects the effective number of people from whom you “randomly” choose your mates, as in a
so-called panmictic population: a well-mixed, randomly mating population.

Some concrete numbers:

• North-Europeans: $N_{\text{eff}} \approx 15\,000$
• West-Africans: $N_{\text{eff}} \approx 20\,000$
• Native Americans: $N_{\text{eff}} \approx 12\,000$

7.5 Answer to question 1

Question 1: Time to recent common ancestor: What can we say about the time to the most
recent common ancestor between you and the Queen of England?
The approximate probability of sharing an ancestor with someone else g generations ago is:
\[ \frac{(2^g)^2}{N_{\text{eff}}}. \]

This is approximately 1 in only 7 generations:

\[ \frac{(2^g)^2}{N_{\text{eff}}} = \frac{(2^7)^2}{15\,000} = \frac{128^2}{15\,000} = \frac{16\,384}{15\,000} \approx 1. \]

How strongly does this answer depend on the assumed effective population size? While the number of
ancestors that lived g generations ago depends strongly on this, the number of generations that one
must go back to find a last common ancestor does not:

(figure by Stephan Schiffels)
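
The weak dependence on the effective population size can be checked numerically. The sketch below finds, for several values of $N_{\text{eff}}$, the first generation g at which the approximation $(2^g)^2 / N_{\text{eff}}$ reaches 1. The function name is hypothetical, and the value 1 000 000 is an illustrative extreme that does not appear in the text.

```python
def generations_to_common_ancestor(n_eff: float) -> int:
    """Smallest g for which the approximation (2**g)**2 / n_eff reaches 1."""
    g = 0
    while (2 ** g) ** 2 / n_eff < 1:
        g += 1
    return g


# The first three values are the estimates quoted in Section 7.4;
# the last one is a deliberately large, made-up value for comparison.
for n_eff in (12_000, 15_000, 20_000, 1_000_000):
    print(n_eff, generations_to_common_ancestor(n_eff))
```

Because the numerator grows as $4^g$, even an almost hundredfold increase in $N_{\text{eff}}$ adds only a few generations.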

7.6 Ancestry of two or more genes

We now model the genealogy of two or more genes (or, more precisely, alleles of a gene) backward
until their most recent common ancestor is found:

(figure by Stephan Schiffels)

This looks complicated. A central idea is to ignore all genes that are not passed down to the current-
day set of interest. This is also called looking backward in time.
To study the genealogy of a set of genes, we start at the present, and move backward in time, generation
by generation, modeling individual coalescence events:

(figure by Stephan Schiffels)



7.7 Coalescence theory with a pair of samples

Definition 7.7.1 (Basic coalescence theory) Basic assumptions of coalescence theory for a pair of
samples [1]:

• The population has size N, with 2N gene copies.
• The probability P(t) that two genes do not share an ancestor within t generations is given by
\[ P(t) = \left(1 - \frac{1}{2N}\right)^t. \]
• For very large populations, $N \to \infty$, this is well approximated by $P(t) \approx e^{-t/(2N)}$.
• So the waiting time to a coalescence event between two lineages is exponentially distributed with
mean $T_2 = \langle t_{\text{coal}} \rangle = 2N$.

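Recall [2]: If the cumulative distribution function of an exponential distribution is
\[ F(x; \lambda) = \begin{cases} 1 - e^{-\lambda x} & \text{for } x \ge 0, \\ 0 & \text{else,} \end{cases} \]
then the mean is $1/\lambda$. Here $P(t) = 1 - F(t; \lambda)$ with $\lambda = 1/(2N)$, which gives the mean 2N.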
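
The mean waiting time of 2N generations can be illustrated with a small simulation. This is a minimal sketch, not part of the original tutorial: it draws the discrete (geometric) waiting time directly from the per-generation coalescence probability 1/(2N) by inverse-transform sampling, and the function names, seed and replicate count are arbitrary choices.

```python
import math
import random


def sample_coalescence_time(pop_size: int, rng: random.Random) -> int:
    """Draw the generation in which two lineages coalesce.

    In each generation the two lineages pick the same parental gene copy
    with probability 1/(2N), so P(T > t) = (1 - 1/(2N))**t.
    """
    p = 1.0 / (2 * pop_size)
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
    # Inverse-transform sampling of a geometric waiting time.
    return max(1, math.ceil(math.log(u) / math.log(1.0 - p)))


def mean_coalescence_time(pop_size: int, replicates: int, seed: int = 1) -> float:
    """Average the simulated waiting time over many independent pairs."""
    rng = random.Random(seed)
    total = sum(sample_coalescence_time(pop_size, rng) for _ in range(replicates))
    return total / replicates


N = 15_000
print("simulated mean:", mean_coalescence_time(N, 50_000))
print("expected mean: ", 2 * N)
```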

7.8 Genetic diversity

To model genetic diversity, we add mutations to our simple model.

• Mutations occur with probability µ per generation per site.
• The mean $t_{\text{MRCA}}$ (time to the most recent common ancestor) of two genes is 2N generations
ago.
• The two lineages leading back to that ancestor together span 2 × 2N = 4N generations, so the
number of mutations per site that we expect between two genes is 4Nµ.

(figure by Stephan Schiffels)

• The site heterozygosity is given by Θ = 4N µ.

Estimator for population size: This gives us an estimator for population size:
\[ \Theta = 4N\mu, \]
where Θ is the fraction of heterozygous positions in the genome and N is the effective population
size. For a known mutation rate µ, the population size can be estimated as N = Θ/(4µ).
This simple formula encapsulates a deep relationship between a purely genomic property (the
heterozygosity) and a population-level quantity (the effective population size).
[1] J.F.C. Kingman, On the Genealogy of Large Populations, J. of Applied Probability, 19:27-43 (1982)
[2] http://en.wikipedia.org/wiki/Cumulative_distribution_function

7.9 Answer to question 2

Question 2: Sequence similarity: How similar is the DNA sequence of the Queen of England and
of you?
Consider a single chromosome and compare the Queen’s copy with your copy.
Using N = 15 000 and $\mu = 1.25 \times 10^{-8}$, we get:

\[ \Theta = 4N\mu = 4 \times 15\,000 \times 1.25 \times 10^{-8} = 0.00075. \]

Hence:
The Queen’s and your chromosome differ at about 1 in 1333 sites. (Note that 1/1333 ≈ 0.00075.)
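
The arithmetic, and the reverse step of estimating N from an observed heterozygosity, can be reproduced as follows. This is a small sketch with hypothetical function names; the inputs are the values used above.

```python
def heterozygosity(pop_size: float, mu: float) -> float:
    """Expected site heterozygosity, Theta = 4 * N * mu."""
    return 4 * pop_size * mu


def estimate_population_size(theta: float, mu: float) -> float:
    """Invert Theta = 4 * N * mu to estimate N for a known mutation rate."""
    return theta / (4 * mu)


MU = 1.25e-8  # mutations per site per generation, as used in the text

theta = heterozygosity(15_000, MU)
print("Theta:", theta)
print("one difference per", round(1 / theta), "sites")
print("N recovered from Theta:", estimate_population_size(theta, MU))
```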

7.10 Mutations on a coalescence tree

Recall that the probability of two samples not coalescing in time t is:

\[ P_2(t) = \left(1 - \frac{1}{2N}\right)^t \approx e^{-t/(2N)}. \]

The probability of i samples not coalescing in time t is:

\[ P_i(t) = \left(1 - \binom{i}{2} \frac{1}{2N}\right)^t = \left(1 - \frac{i(i-1)}{2} \cdot \frac{1}{2N}\right)^t \approx e^{-\frac{i(i-1)}{4N} t}. \]

Mean waiting time for coalescence events: So, the waiting times $T_i$ are exponentially distributed
with mean [3]:
\[ \langle T_i \rangle = \frac{4N}{i(i-1)}. \]

(figure by Stephan Schiffels)

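Given n samples and waiting times $T_i$, each of the i lineages present during $T_i$ contributes a
branch of length $T_i$, so the expected total branch length is:

\[ \langle T \rangle = \sum_{i=2}^{n} i \langle T_i \rangle = 4N \sum_{i=2}^{n} \frac{1}{i-1} = 4N \sum_{i=1}^{n-1} \frac{1}{i}. \]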

Hence, the expected number of mutations anywhere on the tree is:

\[ \langle S \rangle = \mu \langle T \rangle = \mu 4N \sum_{i=1}^{n-1} \frac{1}{i} = \Theta \sum_{i=1}^{n-1} \frac{1}{i}. \]
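
The chain of expectations in this section can be evaluated directly. The sketch below uses hypothetical function names and takes n = 10 samples as an arbitrary illustrative choice; N and µ are the values from Section 7.9. It sums the branch lengths term by term and compares the result with the closed form.

```python
def mean_waiting_time(i: int, pop_size: float) -> float:
    """<T_i> = 4N / (i (i - 1)): mean time during which i lineages exist."""
    return 4 * pop_size / (i * (i - 1))


def harmonic_sum(n: int) -> float:
    """sum_{i=1}^{n-1} 1/i, the factor that appears in <T> and <S>."""
    return sum(1 / i for i in range(1, n))


def expected_total_branch_length(n: int, pop_size: float) -> float:
    """<T> = sum_{i=2}^{n} i <T_i>, summed term by term."""
    return sum(i * mean_waiting_time(i, pop_size) for i in range(2, n + 1))


def expected_mutations(n: int, pop_size: float, mu: float) -> float:
    """<S> = mu <T>: expected number of mutations per site on the tree."""
    return mu * expected_total_branch_length(n, pop_size)


N, MU, n = 15_000, 1.25e-8, 10  # n = 10 is an arbitrary example size
print("term-by-term <T>:", expected_total_branch_length(n, N))
print("closed form  <T>:", 4 * N * harmonic_sum(n))
print("<S> per site:    ", expected_mutations(n, N, MU))
```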
[3] Kingman, 1982

7.11 Two famous estimators of genetic diversity

How can we estimate the quantity Θ = 4Nµ (heterozygosity) from genome data?


Consider n sequences of length L.

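Definition 7.11.1 (Tajima’s estimator) Tajima’s estimator is the mean number of pairwise
differences between any two sequences, per site:
\[ \Theta_\pi = \frac{\text{mean nr of pairwise differences}}{L}. \]

Definition 7.11.2 (Watterson estimator) The Watterson estimator is the number of segregating
sites, normalized by the sequence length and by the harmonic sum that appears in the expected
number of mutations on the tree:
\[ \Theta_W = \frac{\text{nr of segregating sites}}{L \sum_{i=1}^{n-1} 1/i}. \]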

(figures by Stephan Schiffels)
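
Both estimators can be computed from a set of aligned sequences in a few lines. The following is a minimal sketch with hypothetical function names and a made-up toy alignment; it assumes equal-length sequences without gaps or missing data.

```python
from itertools import combinations


def tajima_estimator(seqs: list[str]) -> float:
    """Mean number of pairwise differences, divided by sequence length L."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    total_diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return total_diffs / len(pairs) / length


def watterson_estimator(seqs: list[str]) -> float:
    """Number of segregating sites divided by L * sum_{i=1}^{n-1} 1/i."""
    n, length = len(seqs), len(seqs[0])
    # A site is segregating if more than one allele occurs in its column.
    segregating = sum(len(set(column)) > 1 for column in zip(*seqs))
    harmonic = sum(1 / i for i in range(1, n))
    return segregating / (length * harmonic)


# Made-up toy alignment: n = 4 sequences of length L = 10.
toy = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC", "ACGTACGTAC"]
print("Theta_pi:", tajima_estimator(toy))
print("Theta_W: ", watterson_estimator(toy))
```

In this toy alignment both variable sites are singletons, so the pairwise estimate comes out slightly below the Watterson estimate.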

7.12 Answer to question 3

Question 3: demographic history: How did our ancestral population size change through time?
Three possible simple answers: The population has been

(a) constant

(b) declining

(c) expanding

(figure by Stephan Schiffels)

Interestingly, we can distinguish between these three possible scenarios by comparing only existing
genomes:

7.13 Determining demographic history

We compare Tajima’s estimator and Watterson’s estimator to get a useful measure:

Definition 7.13.1 (Tajima’s D) Define

\[ D = \frac{\Theta_\pi - \Theta_W}{\sqrt{\operatorname{Var}(\Theta_\pi - \Theta_W)}}. \]

If the population size is constant, then we should have D ≈ 0.

If the population size has been increasing, then more mutations will have occurred on leaf edges, thus
affecting fewer pairs, causing D to be negative.
If the population size has been decreasing, then more mutations will have occurred on inner edges,
thus affecting more pairs, causing D to be positive.
So, Tajima’s D tells us something about the history of a population.
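
The text does not say how the variance in the denominator is obtained. The sketch below assumes the standard variance estimate from Tajima's 1989 paper, which is not part of this chapter, and works with per-sequence quantities (the mean number of pairwise differences and the number of segregating sites S) rather than per-site values; dividing both by L would leave D unchanged. The function name and its parameters are hypothetical.

```python
import math


def tajimas_d(n: int, num_segregating: int, mean_pairwise_diffs: float) -> float:
    """Tajima's D from sample size n, segregating sites S and mean pairwise differences.

    Assumption: the variance of the numerator is estimated with the
    constants from Tajima (1989); the chapter itself does not give them.
    """
    if n < 2 or num_segregating < 1:
        raise ValueError("need at least two sequences and one segregating site")
    s = num_segregating
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Numerator: pairwise estimate minus Watterson estimate (per sequence).
    numerator = mean_pairwise_diffs - s / a1
    return numerator / math.sqrt(e1 * s + e2 * s * (s - 1))


# Illustrative input: 10 sequences, 16 segregating sites, and a mean of
# about 3.89 pairwise differences (made-up numbers for demonstration).
print(tajimas_d(10, 16, 3.888889))
```

A negative result, as for this input, points to an excess of rare variants on leaf edges, which is the pattern described above for a growing population.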

7.14 Summary

The simple coalescence model allows us to:

• Estimate the time to the last common ancestor of individuals of a population.

• Estimate how similar the DNA of different individuals of a population is.

• Make statements about the “shape” of the recent history of a population.
