
Increased taxon sampling with short DNA sequences - does it support or collapse deep monophyly?

John James Wilson
Department of Integrative Biology, University of Guelph
jwilso04@uoguelph.ca

Introduction

Methods

Results

Discussion

Lepidoptera phylogeny
more genes? more taxa? better models?

(Regier et al. 2008)


lots of data?
more genes? more taxa? better models?

- 24 nuclear genes - exemplar taxa - resolve deep nodes

- COI only - every species - global ID system

- Lists priority gene regions - COI and EF-1a


more taxa?
Hillis et al. (1998, 2003)
- limited amount of time and money for data assembly
- phylogenetic estimates improve with more taxa even if # of characters remains unchanged

X2


objective (i) test this hypothesis using macrolepidoptera


ancient rapid radiation?


- difficult to reconstruct
- taxon sampling may not help (Whitfield & Kjer 2006)

(Rokas & Carroll 2006)

objective (ii) explore effect of tree-shape with simulated sequences




empirical sequences
Bombycidae, Drepanidae, Geometridae, Hedylidae, Hesperiidae, Lasiocampidae, Megalopygidae, Mimallonidae, Noctuidae, Notodontidae, Nymphalidae, Papilionidae, Pieridae, Saturniidae, Sphingidae, Uraniidae

5 sampling levels (species per family): 100, 200, 300, 400, 500
Cytochrome c oxidase I (650 bp)
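
The sampling levels can be read as a cap on the number of species drawn from each family, with smaller families included whole. The sketch below illustrates that reading. It assumes simple random draws without replacement, which the slides do not state. The species identifiers are placeholders, and the family sizes are the #spp. values of the largest dataset in the results table. The function name subsample_per_family is hypothetical.

```python
import random

def subsample_per_family(species_by_family, level, seed=0):
    """Keep at most `level` species per family; smaller families are kept whole."""
    rng = random.Random(seed)
    sampled = {}
    for family, species in species_by_family.items():
        if len(species) <= level:
            sampled[family] = list(species)
        else:
            # Assumption: random draw without replacement.
            sampled[family] = rng.sample(species, level)
    return sampled

def total_species(sampled):
    """Total number of species across all families in a subsample."""
    return sum(len(species) for species in sampled.values())

# Family sizes taken from the #spp. row of the largest dataset in the results.
sizes = {
    "MI": 13, "LA": 69, "BO": 29, "SA": 166, "SP": 457, "HE": 17,
    "HP": 500, "PA": 209, "PI": 211, "LY": 404, "NY": 500, "DR": 5,
    "UR": 6, "GE": 500, "NO": 372, "NC": 500,
}
# Placeholder species identifiers, e.g. "MI_0", "MI_1", ...
species_by_family = {
    fam: [f"{fam}_{i}" for i in range(n)] for fam, n in sizes.items()
}

for level in (100, 200, 300, 400, 500):
    subset = subsample_per_family(species_by_family, level)
    print(level, total_species(subset))
```

Under this reading, the totals at the 100 and 500 levels come out at 1139 and 3958, matching the "All" counts in the results table.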


simulated sequences
- simulated along NJ tree of empirical sequences
- constrained families as clades, and deeper phylogeny of Pogue (2009)
X1 X2 X4

2 sampling levels:
- 1000 species (~80 species per family)
- 150 species (~10 species per family)


phylogenetic analysis
- maximum parsimony analysis using TNT
- tree and character scores measured in PAUP


evaluating phylogenies
concordance groups - 16 families
1. Proportion of monophyletic taxa = # of monophyletic taxa / # of taxa
2. Taxon consistency index = m/s
   m = minimum # of clades a taxon can exhibit on any cladogram
   s = minimum # of clades a taxon exhibits on actual cladogram
3. Taxon retention index = (g-s)/(g-m)
   g = greatest # of clades a taxon can exhibit on any cladogram
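
As a rough illustration of the three measures above, the sketch below computes them on a small rooted tree written as nested tuples. Several choices are assumptions and not from the slides: s is obtained by counting the maximal clades made up only of a taxon's members, m is taken as 1, and g as the number of members. The toy tree and all function names are hypothetical.

```python
def leaves(tree):
    """Return the leaf labels under a nested-tuple tree."""
    if isinstance(tree, str):
        return [tree]
    out = []
    for child in tree:
        out.extend(leaves(child))
    return out

def count_pure_clades(tree, members):
    """Count maximal clades whose leaves all belong to `members` (used as s)."""
    if all(tip in members for tip in leaves(tree)):
        return 1
    if isinstance(tree, str):
        return 0  # a single leaf that is not a member
    return sum(count_pure_clades(child, members) for child in tree)

def taxon_consistency_index(m, s):
    """Taxon consistency index = m / s."""
    return m / s

def taxon_retention_index(g, s, m):
    """Taxon retention index = (g - s) / (g - m); undefined when g == m."""
    return (g - s) / (g - m)

def proportion_monophyletic(s_by_taxon):
    """Share of taxa recovered as a single clade (s == 1)."""
    return sum(1 for s in s_by_taxon.values() if s == 1) / len(s_by_taxon)

# Toy tree: family B forms one clade, family A is split into two.
tree = ((("A1", "A2"), ("B1", ("B2", "B3"))), "A3")
families = {"A": {"A1", "A2", "A3"}, "B": {"B1", "B2", "B3"}}

s_by_taxon = {name: count_pure_clades(tree, m) for name, m in families.items()}
print(s_by_taxon)
print(proportion_monophyletic(s_by_taxon))
for name, members in families.items():
    s = s_by_taxon[name]
    g = len(members)  # assumption: every member could sit in its own clade
    print(name, taxon_consistency_index(1, s), taxon_retention_index(g, s, 1))
```

In this toy case a monophyletic family scores 1 on both indices, and each extra fragment lowers them.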


increasing taxon number in empirical datasets


[Figure: proportion of monophyletic taxa, taxon consistency index and taxon retention index (y-axis, 0 to 1) against # species per family (x-axis, 100 to 500)]


increasing taxon number in empirical datasets


Family  #spp.  ave. p-dist.    #spp.  ave. p-dist.
MI         13  0.109              13  0.109
LA         69  0.123              69  0.123
BO         29  0.121              29  0.121
SA        100  0.131, 0.133      166  0.132
SP        100  0.106, 0.106      457  0.106
HE         17  0.098              17  0.098
HP        100  0.122, 0.120      500  0.122, 0.122
PA        100  0.119, 0.119      209  0.119
PI        100  0.144, 0.142      211  0.142
LY        100  0.101, 0.094      404  0.098
NY        100  0.132, 0.129      500  0.134, 0.133
DR          5  0.098               5  0.098
UR          6  0.109               6  0.109
GE        100  0.112, 0.114      500  0.115, 0.115
NO        100  0.122, 0.121      372  0.12
NC        100  0.109, 0.111      500  0.109, 0.109
All      1139  0.138, 0.138     3958  0.137, 0.137

Homoplasy measures
#spp.        CI            RI
1139, 1139   0.024, 0.024  0.498, 0.501
3958, 3958   0.010, 0.010  0.500, 0.580
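
For readers unfamiliar with the "ave. p-dist." values, the sketch below shows one common way to compute an average pairwise p-distance from aligned sequences: the proportion of compared sites that differ, averaged over all pairs. It is not known whether the slides' values were computed this way. Skipping sites with gaps or ambiguity codes is an assumption, and the function names and toy sequences are hypothetical.

```python
from itertools import combinations

def p_distance(seq_a, seq_b):
    """Proportion of compared sites at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Assumption: sites with gaps or ambiguity codes are skipped.
        if a in "ACGT" and b in "ACGT":
            compared += 1
            if a != b:
                differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared

def average_p_distance(seqs):
    """Mean p-distance over all unordered pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)

# Toy aligned sequences (placeholders, not real COI data).
toy = ["AAAA", "AAAT", "AATT"]
print(p_distance("ACGTACGT", "ACGTACGA"))
print(average_p_distance(toy))
```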


increasing tree-like shape in simulated datasets


[Figure: proportion of monophyletic taxa, taxon consistency index and taxon retention index (y-axis, 0.00 to 1.00) against branch length (x-axis: X1, X2, X4, X8, X16), for 100 species per family and 10 species per family; annotation: empirical values]


results summary
(i) increasing taxon number in empirical datasets
- doesn't increase # monophyletic families
- doesn't break long branches
- adds homoplasy to the dataset
(ii) increasing tree-like shape in simulated datasets
- does increase # monophyletic families
- but not substantially
- increased taxon sampling doesn't help


improving phylogenies

more genes? may not help
- Regier et al.'s 5 genes generally recovered families
- other studies show varying success (wg, Ef-1a, period)
- few exemplars increase the a priori probability of species appearing together

more taxa? may not help
- sequence information to increase taxon sampling for other genes is non-existent
- additional taxon sampling must improve the stability of the classification?

better models? may not help
- unfeasible in terms of computer time
- global parsimony still represents the boldest test of monophyly (Goloboff et al. 2009)
- do we really know how molecules evolve?


The recovery of short internodes is likely to vary even with small perturbations of gene choice, taxon sampling and analytical assumptions


Increased taxon sampling with short molecular sequences - does it support or collapse deep monophyly?

Acknowledgments
Study design
Paul Hebert, Bob Hanner, João Lima

Área de Conservación Guanacaste


Dan Janzen, Winnie Hallwachs & ACG Parataxonomists

Laboratory Database & Analysis


Megan Milton, Brianne Hebert, Riadul Mannan & Sujeevan Ratnasingham

Funding & Support


Listed on the Canadian Centre for DNA Barcoding website: www.dnabarcoding.ca

Thank you!


Effect of gene length and gene choice


[Figure: sum of bootstrap support values (>50) for monophyletic subfamilies (y-axis, 0 to 450) against # of taxa (x-axis: 25, 50, 75) in Nymphalidae; panels: Maximum Parsimony and Maximum Likelihood; labels: 1 gene, 2 genes, 3 genes; COI, wingless, EF-1a]