Bioinformatics I, WS’14/15, D. Huson, December 15, 2014
7 Introduction to Population Genetics
This chapter is closely based on a tutorial given by Stephan Schiffels (currently Sanger Institute) at
the Australian Centre for Ancient DNA in November 2014. This text is based very closely on his
script, with his permission.
7.1 Questions
The aim of today’s lecture is to answer three questions:
1. What can we say about the time to the most recent common ancestor between you and the
Queen of England?
2. How different or similar is the DNA sequence of you and the Queen of England?
3. How did our ancestral population size change through time?
7.2 Ancestors
In a simple model of human populations, the number of ancestors that an individual has doubles when
going back one generation:
(figure by Stephan Schiffels)
Number of ancestors as a function of number of generations g:
\[ A(g) = 2^g . \]
Problem: After ≈ 32 generations this exceeds the number of living humans.
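To get a feel for how quickly pure doubling runs away, the following Python sketch evaluates A(g) = 2^g and finds the first generation at which it exceeds the number of living humans. The population figure of about 7.2 billion is an assumption for illustration (it is not given in the lecture), and the function names are hypothetical.

```python
def ancestors(g: int) -> int:
    """Number of ancestors g generations back under pure doubling: A(g) = 2^g."""
    return 2 ** g


# Assumed figure, not from the lecture: roughly 7.2 billion living humans.
WORLD_POPULATION = 7_200_000_000


def first_generation_exceeding(population: int) -> int:
    """Smallest g for which 2^g is strictly larger than the given population."""
    g = 0
    while ancestors(g) <= population:
        g += 1
    return g


if __name__ == "__main__":
    for g in (10, 20, 32, 33):
        print(g, ancestors(g))
    print("first g above assumed world population:",
          first_generation_exceeding(WORLD_POPULATION))
```

With the assumed figure the crossover is at generation 33, since 2^32 is about 4.3 billion and 2^33 about 8.6 billion, which is in line with the ≈ 32 generations quoted above.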
7.3 Coalescence events
In a family tree in a finite population, it will occasionally happen that two (or more) different ancestors
in the same generation share an ancestor in the previous generation. This is called a coalescence event:
(figure by Stephan Schiffels)
Model of growth: Here is a more realistic model of the growth of the number of ancestors as a function of
the number of generations g:
\[ A(g) = \begin{cases} 2^g & \text{if } 2^g < N_{\text{eff}}, \\ \to N_{\text{eff}} & \text{else,} \end{cases} \]
where $N_{\text{eff}}$ is the so-called effective population size.
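The saturating model can be sketched in Python as follows. The hard cap min(2^g, N_eff) is a simplifying assumption: the lecture only says that A(g) tends to N_eff once doubling would exceed it. The function name is hypothetical, and the value 15 000 is the North-European figure listed in Section 7.4.

```python
def ancestors_saturating(g: int, n_eff: int) -> int:
    """Ancestors g generations back: 2^g while below N_eff, then capped at N_eff.

    The abrupt cap is an assumption made for this sketch; the real curve
    bends over smoothly as coalescence events become frequent.
    """
    return min(2 ** g, n_eff)


if __name__ == "__main__":
    n_eff = 15_000
    for g in range(10, 17):
        print(g, ancestors_saturating(g, n_eff))
```

For N_eff = 15 000 the cap is reached at g = 14, because 2^13 = 8192 is still below N_eff while 2^14 = 16 384 is above it.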
7.4 Effective population size
The effective population size
• reflects the long-term size of a population (in humans, across several hundred thousand years)
• reflects the effective number of people from whom you “randomly” choose your mates, as in a
so-called panmictic population: a well-mixed, randomly mating population.
Some concrete numbers:
• North-Europeans: $N_{\text{eff}} \approx 15\,000$
• West-Africans: $N_{\text{eff}} \approx 20\,000$
• Native Americans: $N_{\text{eff}} \approx 12\,000$
7.5 Answer to question 1
Question 1: Time to recent common ancestor: What can we say about the time to the most
recent common ancestor between you and the Queen of England?
The approximate probability of sharing an ancestor with someone else g generations ago is:
\[ \frac{(2^g)^2}{N_{\text{eff}}} . \]
This is approximately 1 in only 7 generations:
\[ \frac{(2^g)^2}{N_{\text{eff}}} = \frac{(2^7)^2}{15\,000} = \frac{128^2}{15\,000} = \frac{16\,384}{15\,000} \approx 1 . \]
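How strongly does this answer depend on the assumed effective population size? While the number of
ancestors that lived g generations ago depends strongly on this, the number of generations that one
must go back to find a last common ancestor does not, because the number of ancestors doubles with
every generation and g therefore grows only logarithmically with the effective population size:
(figure by Stephan Schiffels)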
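The weak dependence on the effective population size can be checked numerically. The sketch below evaluates the ratio (2^g)^2 / N_eff and solves (2^g)^2 = N_eff for g, which gives g = log2(N_eff) / 2. The function names are hypothetical; the three population sizes are the ones listed in Section 7.4.

```python
import math


def sharing_ratio(g: int, n_eff: float) -> float:
    """Approximate probability (2^g)^2 / N_eff of sharing an ancestor g generations ago.

    Only meaningful as a probability while the value stays below 1.
    """
    return (2 ** g) ** 2 / n_eff


def generations_to_common_ancestor(n_eff: float) -> float:
    """Solve (2^g)^2 = N_eff for g, i.e. g = log2(N_eff) / 2."""
    return math.log2(n_eff) / 2


if __name__ == "__main__":
    populations = [
        ("North-Europeans", 15_000),
        ("West-Africans", 20_000),
        ("Native Americans", 12_000),
    ]
    for name, n_eff in populations:
        g = generations_to_common_ancestor(n_eff)
        print(f"{name}: N_eff = {n_eff}, g = {g:.2f}")
```

All three values of g lie close to 7. Even a tenfold larger effective population size adds only log2(10) / 2, that is about 1.7 generations.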
7.6 Ancestry of two or more genes
We now model the genealogy of two or more genes (or, more precisely, alleles of a gene) backward
until their most recent common ancestor is found:
(figure by Stephan Schiffels)
This looks complicated. A central idea is to ignore all genes that are not passed down to the
current-day set of interest. This is also called looking backward in time.
To study the genealogy of a set of genes, we start at the present, and move backward in time, generation
by generation, modeling individual coalescence events:
(figure by Stephan Schiffels)

7.7 Coalescence theory with a pair of samples
Definition 7.7.1 (Basic coalescence theory) Basic assumptions of coalescence theory of a pair of
samples [1]:
• The population has size N, with 2N gene copies.
• The probability P(t) of two genes not having the same ancestor in t generations is given by
\[ P(t) = \left(1 - \frac{1}{2N}\right)^t . \]
• In the limit of very large populations, $N \to \infty$, we have $P(t) \approx e^{-t/(2N)}$.
• So the waiting time to a coalescence event between two lineages is exponentially distributed with
mean $T_2 = \langle t_{\text{coal}} \rangle = 2N$.
 
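Recall [2]: If the cumulative distribution function of an exponential distribution is
\[ F(x; \lambda) = \begin{cases} 1 - e^{-\lambda x} & \text{for } x \geq 0, \\ 0 & \text{else,} \end{cases} \]
then the mean is $1/\lambda$. Here the probability of not yet having coalesced is
$P(t) = 1 - F(t; \lambda)$ with $\lambda = 1/(2N)$, which gives the mean waiting time 2N.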
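As a sanity check on the pairwise coalescence model, the Python sketch below compares the exact per-generation probability (1 - 1/(2N))^t with its exponential approximation and estimates the mean waiting time by simulation. The function names, the sample count and the seed are assumptions made for this illustration; the waiting time is drawn from a geometric distribution with success probability 1/(2N), which is the discrete-generation version of the exponential law.

```python
import math
import random


def p_no_coalescence_exact(t: int, n: int) -> float:
    """Probability that two lineages have not coalesced after t generations."""
    return (1 - 1 / (2 * n)) ** t


def p_no_coalescence_approx(t: float, n: int) -> float:
    """Large-N approximation exp(-t / (2N)) of the same probability."""
    return math.exp(-t / (2 * n))


def sample_coalescence_time(n: int, rng: random.Random) -> int:
    """Draw the number of generations until two lineages coalesce.

    Inverse-transform sampling of a geometric variable with p = 1/(2N).
    """
    p = 1 / (2 * n)
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
    return max(1, math.ceil(math.log(u) / math.log(1 - p)))


def mean_coalescence_time(n: int, samples: int = 20_000, seed: int = 1) -> float:
    """Monte Carlo estimate of the mean pairwise coalescence time."""
    rng = random.Random(seed)
    return sum(sample_coalescence_time(n, rng) for _ in range(samples)) / samples


if __name__ == "__main__":
    n = 15_000
    for t in (1_000, 10_000, 30_000):
        print(t, p_no_coalescence_exact(t, n), p_no_coalescence_approx(t, n))
    print("simulated mean:", mean_coalescence_time(n), "expected:", 2 * n)
```

The simulated mean should land close to 2N, and the exact and approximate probabilities agree to several decimal places for population sizes in the thousands.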
7.8 Genetic diversity
To model genetic diversity, we add mutations to our simple model.
• Mutations occur with probability µ per generation per site.
• The mean tMRCA (time to the most recent common ancestor) between two genes is 2N generations
ago.
• The two lineages leading back to that ancestor together span 2 × 2N = 4N generations, so the
number of mutations per site that we expect between two genes is 4Nµ.
• The site heterozygosity is given by Θ = 4Nµ.
Estimator for population size: This gives us an estimator for population size:
\[ \Theta = 4 N \mu , \]
where $\Theta$ is the fraction of heterozygous positions in the genome and $N$ is the effective
population size. Given a measured $\Theta$ and a known mutation rate $\mu$, this can be solved for
$N = \Theta / (4\mu)$.
This simple formula encapsulates a deep relationship between a purely genomic property (the
heterozygosity) and a population-level quantity (the effective population size).
[1] J.F.C. Kingman, On the Genealogy of Large Populations, J. of Applied Probability, 19:27-43 (1982)
[2] http://en.wikipedia.org/wiki/Cumulative_distribution_function
7.9 Answer to question 2
Question 2: Sequence similarity: How similar is the DNA sequence of the Queen of England and
of you?
Consider a single chromosome and compare the Queen’s copy with your copy.
Using $N = 15\,000$ and $\mu = 1.25 \times 10^{-8}$, we get:
\[ \Theta = 4 N \mu = 4 \times 15\,000 \times 1.25 \times 10^{-8} = 0.00075 . \]
Hence:
The Queen’s and your chromosome differ at about 1 in 1333 sites. (Note that $1/1333 \approx 0.00075$.)
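The arithmetic for question 2, and the reverse use of the same formula as an estimator for population size, can be reproduced with a few lines of Python. The function names are hypothetical; the values of N and µ are the ones used in the text.

```python
def heterozygosity(n: float, mu: float) -> float:
    """Site heterozygosity Theta = 4 N mu."""
    return 4 * n * mu


def effective_population_size(theta: float, mu: float) -> float:
    """Invert Theta = 4 N mu to estimate N from an observed heterozygosity."""
    return theta / (4 * mu)


if __name__ == "__main__":
    mu = 1.25e-8      # mutation probability per generation per site
    n = 15_000        # effective population size
    theta = heterozygosity(n, mu)
    print("Theta:", theta)
    print("one difference every", round(1 / theta), "sites")
    print("N recovered from Theta:", effective_population_size(theta, mu))
```

Running the estimator on a heterozygosity measured from real genomes would require a trusted value of µ; the result scales inversely with whatever mutation rate is assumed.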
7.10 Mutations on a coalescence tree
Recall that the probability of two samples not coalescing in time t is:
\[ P_2(t) = \left(1 - \frac{1}{2N}\right)^t \approx e^{-t/(2N)} . \]
The probability of i samples not coalescing in time t is:
\[ P_i(t) = \left(1 - \binom{i}{2} \frac{1}{2N}\right)^t = \left(1 - \frac{i(i-1)}{2} \frac{1}{2N}\right)^t \approx e^{-\frac{i(i-1)}{4N} t} . \]
Mean waiting time for coalescence events: So, the waiting times $T_i$ are exponentially distributed
with mean [3]:
\[ \langle T_i \rangle = \frac{4N}{i(i-1)} . \]
Given n samples, and times $T_i$, the total branch length is:
\[ \langle T \rangle = \sum_{i=2}^{n} i \langle T_i \rangle = 4N \sum_{i=2}^{n} \frac{1}{i-1} = 4N \sum_{i=1}^{n-1} \frac{1}{i} . \]
Hence, the expected number of mutations anywhere on the tree is:
\[ \langle S \rangle = \mu \langle T \rangle = \mu \, 4N \sum_{i=1}^{n-1} \frac{1}{i} = \Theta \sum_{i=1}^{n-1} \frac{1}{i} . \]
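The two sums above are easy to verify numerically. The following Python sketch computes the expected total branch length directly from the mean waiting times 4N / (i(i - 1)) and compares it with the closed form 4N times the harmonic sum. The function names are hypothetical, and the expected number of mutations is per site, as in the text.

```python
def harmonic(m: int) -> float:
    """Harmonic sum 1 + 1/2 + ... + 1/m."""
    return sum(1 / i for i in range(1, m + 1))


def expected_waiting_time(i: int, n_pop: float) -> float:
    """Mean time during which the tree has i lineages: 4N / (i (i - 1))."""
    return 4 * n_pop / (i * (i - 1))


def expected_total_branch_length(n_samples: int, n_pop: float) -> float:
    """Sum over i = 2..n of i lineages times their mean waiting time."""
    return sum(i * expected_waiting_time(i, n_pop) for i in range(2, n_samples + 1))


def expected_segregating_sites(n_samples: int, n_pop: float, mu: float) -> float:
    """Expected number of mutations per site anywhere on the tree."""
    return mu * expected_total_branch_length(n_samples, n_pop)


if __name__ == "__main__":
    n_pop, mu = 15_000, 1.25e-8
    for n_samples in (2, 10, 100):
        direct = expected_total_branch_length(n_samples, n_pop)
        closed = 4 * n_pop * harmonic(n_samples - 1)
        print(n_samples, direct, closed,
              expected_segregating_sites(n_samples, n_pop, mu))
```

For n = 2 the result reduces to the pairwise case, with total branch length 4N and expected mutations Θ = 4Nµ per site. Because the harmonic sum grows only logarithmically, adding more samples increases the expected number of mutations slowly.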
[3] Kingman, 1982
7.11 Two famous estimators of genetic diversity
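How can we estimate the quantity $\Theta = 4N\mu$ (heterozygosity) from genome data?
Consider n sequences of length L.
Definition 7.11.1 (Tajima’s estimator) Tajima’s estimator is the mean proportion of pairwise
differences between any two sequences:
\[ \Theta_\pi = \frac{\text{mean nr of pairwise differences}}{L} . \]
Definition 7.11.2 (Watterson estimator) The Watterson estimator is the number of segregating
sites, normalized by the sequence length and by the harmonic sum that appears in the expected
number of mutations on a coalescence tree:
\[ \Theta_W = \frac{\text{nr of segregating sites}}{L \sum_{i=1}^{n-1} 1/i} . \]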
(figures by Stephan Schiffels)
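Both estimators can be computed directly from a set of aligned sequences. The Python sketch below is a minimal illustration, assuming equal-length sequences without gaps or missing data; the function names and the toy alignment are made up for this example. The pairwise differences are averaged over all pairs, as the wording of Definition 7.11.1 states.

```python
from itertools import combinations
from typing import Sequence


def pairwise_differences(a: str, b: str) -> int:
    """Number of positions at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def segregating_sites(seqs: Sequence[str]) -> int:
    """Number of alignment columns that contain more than one allele."""
    return sum(len(set(column)) > 1 for column in zip(*seqs))


def tajima_estimator(seqs: Sequence[str]) -> float:
    """Theta_pi: mean number of pairwise differences, divided by the length L."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    mean_diff = sum(pairwise_differences(a, b) for a, b in pairs) / len(pairs)
    return mean_diff / length


def watterson_estimator(seqs: Sequence[str]) -> float:
    """Theta_W: segregating sites divided by L times the harmonic sum up to n - 1."""
    n = len(seqs)
    length = len(seqs[0])
    a1 = sum(1 / i for i in range(1, n))
    return segregating_sites(seqs) / (length * a1)


if __name__ == "__main__":
    # Toy alignment, invented for illustration only.
    toy = ["AAAAAAAAAA", "AAAATAAAAA", "AAAATAAGAA", "CAAAAAAAAA"]
    print("segregating sites:", segregating_sites(toy))
    print("Theta_pi:", tajima_estimator(toy))
    print("Theta_W:", watterson_estimator(toy))
```

On real data both quantities estimate the same Θ under a constant population size, so a systematic gap between them is informative in itself.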
7.12 Answer to question 3
Question 3: Demographic history: How did our ancestral population size change through time?
Three possible simple answers: The population has been
(a) constant
(b) declining
(c) expanding
Interestingly, we can distinguish between these three possible scenarios by only comparing existing
genomes:
(figure by Stephan Schiffels)
7.13 Determining demographic history
We compare Tajima’s estimator and Watterson’s estimator to get a useful measure:
Definition 7.13.1 (Tajima’s D) Define
\[ D = \frac{\Theta_\pi - \Theta_W}{\sqrt{\operatorname{Var}(\Theta_\pi - \Theta_W)}} . \]
If the population size is constant, then we should have D ≈ 0.

If the population size has been increasing, then more mutations will have occurred on leaf edges, thus
affecting fewer pairs, causing D to be negative.
If the population size has been decreasing, then more mutations will have occurred on inner edges,
thus affecting more pairs, causing D to be positive.
So, Tajima’s D tells us something about the history of a population.
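The lecture does not spell out the variance term in the definition of D. The Python sketch below fills it in with the standard variance estimate from Tajima (1989), which is an addition here and not part of these notes. It works with counts over the whole alignment (mean pairwise differences and number of segregating sites) instead of per-site values, since the common factor 1/L cancels in the ratio. The function name and the toy alignments are invented for illustration, and the code assumes aligned sequences without missing data.

```python
import math
from itertools import combinations
from typing import Sequence


def tajimas_d(seqs: Sequence[str]) -> float:
    """Tajima's D for aligned sequences, using Tajima's (1989) variance estimate."""
    n = len(seqs)
    # S: number of segregating sites (columns with more than one allele).
    s = sum(len(set(column)) > 1 for column in zip(*seqs))
    if s == 0:
        return float("nan")  # D is undefined without any variation

    # pi: mean number of pairwise differences over all pairs of sequences.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)

    # Constants of the variance estimate (assumed from Tajima 1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    variance = e1 * s + e2 * s * (s - 1)
    return (pi - s / a1) / math.sqrt(variance)


if __name__ == "__main__":
    # Every mutation sits on a leaf edge (singletons only).
    leaf_mutations = ["TAAA", "ATAA", "AATA", "AAAT"]
    # Every mutation sits on an inner edge and splits the sample in half.
    inner_mutations = ["TTTT", "TTTT", "AAAA", "AAAA"]
    print("leaf-edge mutations: D =", tajimas_d(leaf_mutations))
    print("inner-edge mutations: D =", tajimas_d(inner_mutations))
```

The first toy alignment yields a negative D and the second a positive D, matching the expanding and declining scenarios described above. With only four short sequences these values carry no statistical weight; they only illustrate the sign.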
7.14 Summary
The simple coalescence model allows us to:
• Estimate the time to the last common ancestor of individuals of a population.
• Estimate how similar the DNA of different individuals of a population is.
• Make statements about the “shape” of the recent history of a population.
