
Functional Genomics of Fern Gametophytes:
Transcriptome Sequencing in Pteridium aquilinum
Joshua Der, Michael Barker, Norman Wickett,
Claude dePamphilis and Paul Wolf
Acknowledgments
Coauthors:
• Michael Barker (U. British Columbia) - project design and
transcriptome assembly
• Norman Wickett and Claude dePamphilis (Penn State U.) -
transcriptome annotation and interpretation of results
• Paul Wolf (Utah State U.) - project design, funding,
interpretation of results, and general support
Utah State University:
• Aaron Duffy - tissue culture & bioinformatics help
• Mike Pfrender - RNA lab space & equipment
• VP for Research & Center for Integrated BioSystems -
research funds
• Dept. of Biology, Center for Integrated BioSystems, & Ecology
Center - travel funds
Indiana University:
• Keithanne Mockaitis - cDNA library preparation & 454
sequencing
University of British Columbia:
• Katrina Dlugosch - sequence cleaning script
Penn State University:
• Eric Wafula - general scripting help
Fern Evolution
[Figure: land plant phylogeny showing Bryophytes, Lycophytes, Ferns, and Seed Plants]
Sister to seed plants
Ancient lineage (Devonian)
~11,000 extant species
High diversity in morphology, geography, and ecology
Evolved and maintain independent gametophyte and sporophyte generations
[Figure: fern life cycle, showing the zygote (2n), meiosis, haploid spores (n), egg (n), sperm (n), and syngamy]
Fern Genetics
Recessive alleles are not masked in haploid gametophytes
Gametic phase segregation and recombination can be directly observed
Controlled crosses can be performed to produce doubled haploid sporophytes (i.e., complete homozygotes)
Apogamy and apospory can be induced, unlinking ploidy and life stage (Klekowski 1971)
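Why gametophyte crosses can yield complete homozygotes follows from the textbook expectation for selfing at a single neutral locus. In ferns, fusing an egg and a sperm from the same haploid gametophyte (intragametophytic selfing) unites genetically identical gametes, whereas selfing between gametophytes of one sporophyte (intergametophytic selfing, equivalent to selfing in seed plants) only halves expected heterozygosity each generation. The sketch below assumes no selection, mutation, or migration; the function name is hypothetical and the model is an illustration, not an analysis from this project.

```python
from fractions import Fraction


def expected_heterozygosity(h0, generations, intragametophytic=False):
    """Expected heterozygosity at one neutral locus after repeated selfing.

    Illustrative model only: no selection, mutation, or migration.
    - Intergametophytic selfing halves expected heterozygosity each generation.
    - Intragametophytic selfing unites identical gametes from one haploid
      gametophyte, so heterozygosity is zero after a single generation.
    """
    h0 = Fraction(h0)
    if generations < 0:
        raise ValueError("generations must be non-negative")
    if generations == 0:
        return h0
    if intragametophytic:
        return Fraction(0)
    return h0 * Fraction(1, 2) ** generations
```

For example, `expected_heterozygosity(1, 3)` gives 1/8 after three rounds of ordinary selfing, while `expected_heterozygosity(1, 1, intragametophytic=True)` already gives 0.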
Challenges in Fern Genetics
Limited agronomic importance
Large genome sizes (avg. 10 Gb)
High chromosome numbers (avg. n = 57)
Extensive history of hybridization and polyploidy
Photo credit: Mike Windham
Fern Genomics
Genomic resource development in ferns has lagged far behind that in flowering plants
(but wait for Mike's talk next)
No fern genome sequencing projects have been funded
New high-throughput sequencing technologies have started to bring the power of genomics to non-model organisms
www.454.com
Bracken Fern: Pteridium aquilinum
Worldwide distribution
Toxic to livestock and weedy in pastures, so it has been extensively studied
Highly adaptable and phenotypically plastic
Established culture techniques
Model system for understanding the fern life cycle, gametophyte development, and sex determination
Phylogeny is well characterized
Paleopolyploid with diploid gene expression
Genome size: 1C = 9.8 Gb
Illustration: Lindman. 1917-1926. Bilder ur Nordens Flora-508
The Fern Gametophyte Transcriptome
How has the fern life cycle influenced genome evolution?
What genes are active in the gametophyte generation?
What is the functional profile of these genes?
Do gametophyte-specific genes experience purifying selection?
Do reproductive proteins have a signature of positive
selection or is their rate of molecular evolution elevated?
What is the function of "flowering" gene homologues in fern
gametophytes?
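The selection questions above are commonly addressed with ω = dN/dS, the ratio of nonsynonymous to synonymous substitutions per site: ω < 1 suggests purifying selection, ω ≈ 1 neutral evolution, and ω > 1 positive selection. As a hedged illustration only (not the method used in this project), the sketch below estimates ω for one pair of aligned, in-frame coding sequences with a simplified Nei-Gojobori counting method and a Jukes-Cantor correction. It assumes the standard genetic code, skips codons containing gaps, ambiguous bases, or stops, and counts changes to stop codons as nonsynonymous sites; all function names are hypothetical.

```python
import itertools
import math

# Standard genetic code; codons enumerated in TCAG order ("*" marks a stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def synonymous_sites(codon):
    """Synonymous sites of a codon: the silent fraction of single-base changes."""
    sites = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    sites += 1 / 3
    return sites


def codon_differences(c1, c2):
    """Mean synonymous and nonsynonymous differences over all stop-free
    mutational pathways from c1 to c2."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    if not positions:
        return 0.0, 0.0
    syn_total = nonsyn_total = 0.0
    valid_paths = 0
    for order in itertools.permutations(positions):
        current, syn, nonsyn = c1, 0, 0
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[nxt] == "*":
                break  # pathway passes through a stop codon; discard it
            if CODE[nxt] == CODE[current]:
                syn += 1
            else:
                nonsyn += 1
            current = nxt
        else:  # runs only when the pathway was not discarded
            syn_total += syn
            nonsyn_total += nonsyn
            valid_paths += 1
    if valid_paths == 0:
        raise ValueError(f"no stop-free pathway between {c1} and {c2}")
    return syn_total / valid_paths, nonsyn_total / valid_paths


def jukes_cantor(p):
    """Jukes-Cantor correction of a proportion of differing sites."""
    if p <= 0:
        return 0.0
    if p >= 0.75:
        return math.inf  # saturated; distance cannot be estimated
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(seq1, seq2):
    """Return (dN, dS, omega) for two aligned, in-frame coding sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    syn_sites = nonsyn_sites = syn_diffs = nonsyn_diffs = 0.0
    for i in range(0, len(seq1) - 2, 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        # Skip gapped or ambiguous codons and stop codons.
        if c1 not in CODE or c2 not in CODE or "*" in (CODE[c1], CODE[c2]):
            continue
        s = (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        syn_sites += s
        nonsyn_sites += 3 - s
        sd, nd = codon_differences(c1, c2)
        syn_diffs += sd
        nonsyn_diffs += nd
    if syn_sites == 0 or nonsyn_sites == 0:
        raise ValueError("no comparable codons")
    d_n = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    d_s = jukes_cantor(syn_diffs / syn_sites)
    if d_s == 0:
        omega = math.nan if d_n == 0 else math.inf
    else:
        omega = d_n / d_s
    return d_n, d_s, omega
```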
Sequence Pre-processing: Cleaned ESTs
RNA from whole gametophytes: male, female, and bisexual
cDNA library normalized and enriched for full-length mRNA
Reads were quality- and length-filtered, and adapter and polyA/T sequences were trimmed
Cleaned reads: 681,722
Mean length: 372.60 bp
Total bases: 254 Mb
[Figure: histogram of cleaned reads; x-axis: cleaned read length (bp), maximum = 624; y-axis: number of sequences]
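The cleaning steps above can be sketched as follows. This is not the sequence cleaning script acknowledged earlier: the adapter sequence, polyA/T run length, and length and quality thresholds are placeholders, because the slide does not report the actual values, and the exact-match adapter search is a simplification (dedicated trimmers tolerate mismatches).

```python
import re

# Hypothetical settings: the slide does not report the actual cutoffs or adapter.
ADAPTER = "AAGCAGTGGTATCAACGCAGAGT"  # placeholder adapter sequence
MIN_LENGTH = 50                       # minimum cleaned read length (bp)
MIN_MEAN_QUALITY = 20                 # minimum mean Phred quality
POLY_A = re.compile(r"A{12,}$")       # 3' polyA tail of 12 or more bases
POLY_T = re.compile(r"^T{12,}")       # 5' polyT head of 12 or more bases


def clean_read(seq, quals, adapter=ADAPTER):
    """Trim one read and apply filters; return the cleaned sequence or None."""
    seq = seq.upper()
    # Adapter trimming: cut at the first exact occurrence of the adapter.
    idx = seq.find(adapter)
    if idx != -1:
        seq, quals = seq[:idx], quals[:idx]
    # PolyA/T trimming at the read ends.
    tail = POLY_A.search(seq)
    if tail:
        seq, quals = seq[:tail.start()], quals[:tail.start()]
    head = POLY_T.match(seq)
    if head:
        seq, quals = seq[head.end():], quals[head.end():]
    # Length filter first, so the quality mean is never taken over zero bases.
    if len(seq) < MIN_LENGTH or sum(quals) / len(quals) < MIN_MEAN_QUALITY:
        return None
    return seq


def read_stats(reads):
    """Summary statistics for a collection of cleaned reads."""
    lengths = [len(r) for r in reads]
    total = sum(lengths)
    return {
        "reads": len(lengths),
        "mean_length": total / len(lengths) if lengths else 0.0,
        "max_length": max(lengths, default=0),
        "total_bases": total,
    }
```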


EST Assembly: Unigenes
Two-step strategy for EST assembly to reduce redundancy in the unigene set:
1. ESTs were first assembled in MIRA
2. The assembly was passed to CAP3 to join additional contigs

Assembly                 MIRA (1º)    CAP3 (2º)
# singletons             638          183
# 1º contigs             50,020       32,801
# 2º contigs             0            5,905
# unigenes               50,658       38,889
mean unigene length      637.7 bp     685.8 bp
largest unigene length   4,489 bp     4,897 bp
total consensus          32.30 Mb     26.67 Mb

[Figure: histogram of transcriptome unigenes (CAP3); total unigenes = 38,889; mean length = 685.76 bp; total bases = 26.67 Mb; x-axis: unigene length (bp), largest transcript = 4,897 bp; y-axis: number of sequences]
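Summary figures like those in the table (unigene count, mean and largest unigene length, total consensus) can be computed directly from the assembled unigenes. A minimal sketch, assuming the unigenes are stored in a FASTA file; the function names are hypothetical and this is not the pipeline used for the slide.

```python
def read_fasta(lines):
    """Yield (identifier, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            fields = line[1:].split()
            name, chunks = (fields[0] if fields else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def assembly_stats(records):
    """Count, mean length, largest length, and total bases of unigenes."""
    lengths = [len(seq) for _, seq in records]
    if not lengths:
        raise ValueError("empty assembly")
    total = sum(lengths)
    return {
        "unigenes": len(lengths),
        "mean_length_bp": total / len(lengths),
        "largest_bp": max(lengths),
        "total_consensus_bp": total,
    }
```

For example, with a hypothetical file name: `with open("unigenes.fasta") as fh: print(assembly_stats(read_fasta(fh)))`.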
Transcriptome Coverage
To assess the depth and breadth of transcriptome coverage, we compared our assembly with the predictions from a simulation model using ESTcalc (Wall et al., 2009. BMC Genomics 10:347)

Parameters                                ESTcalc        Actual (CAP3)
Technology                                454 GS FLX     454 GS FLX (Titanium)
Library type                              normalized     normalized
Reads/plate                               681,722        681,722
Read length                               372.6 bp       372.6 bp
Output
Total sequence amount                     254 Mb         254.0076 Mb
Total assembled sequence                  26.2 Mb        26.67 Mb
Percent transcriptome (A)                 87 %           ?
Percent of genes tagged (B)               100 %          ?
Unigene count (C)                         32,044         38,889
Mean unigene length (D)                   819 bp         685.8 bp
Singleton yield (E)                       19 %           0.47 %
Percent of genes with 90% coverage        69.8 %         ?
Percent of genes with 100% coverage (F)   23.7 %         ?
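The "Actual" column can be cross-checked against the other slides with simple arithmetic: total sequence divided by read count recovers the mean read length, the CAP3 unigene count is the sum of singletons and contigs, and total consensus divided by unigene count recovers the mean unigene length. The sketch below also expresses several ESTcalc predictions as relative deviations; the variable names are illustrative, and 26.67 Mb is the slide's rounded value.

```python
# Figures copied from the slides; the names are illustrative.
READS = 681_722
TOTAL_READ_BASES = 254_007_600          # 254.0076 Mb of cleaned sequence
CAP3_UNIGENES = 183 + 32_801 + 5_905    # singletons + primary + secondary contigs
CAP3_CONSENSUS_BASES = 26_670_000       # 26.67 Mb, as rounded on the slide


def relative_deviation(predicted, observed):
    """Signed deviation of an observed value from a model prediction."""
    return (observed - predicted) / predicted


mean_read_length = TOTAL_READ_BASES / READS                  # about 372.6 bp
mean_unigene_length = CAP3_CONSENSUS_BASES / CAP3_UNIGENES   # about 685.8 bp

comparisons = {
    "unigene count": relative_deviation(32_044, CAP3_UNIGENES),
    "mean unigene length": relative_deviation(819, mean_unigene_length),
    "total assembled sequence": relative_deviation(26_200_000, CAP3_CONSENSUS_BASES),
}
for name, dev in comparisons.items():
    print(f"{name}: {dev:+.1%}")
```

With these inputs the assembly has roughly 21% more unigenes than predicted, the unigenes are about 16% shorter on average, and the total assembled sequence is within 2% of the prediction.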
Transcriptome Annotation
Two complementary strategies for functional annotation:
1. BLAST unigenes against the NCBI nr protein database
   GO annotation using Blast2GO
   Broad functional perspective with a rich, objective GO annotation
2. BLAST against the inferred proteomes of 10 complete plant genomes
   Pseudo-annotated based on MCL cluster membership in PlantTribes2.0
   Tribe and OrthoGroup assignment, GO-slim function, and Arabidopsis gene ID & description
   Plant gene family classification with detailed information from well-curated reference genomes
Transcriptome Annotation: nr BLASTx
21,097 of 38,889 unigenes with a positive hit (e-value cutoff 1e-10)
[Figure: pie chart of BLAST results; positive BLAST hit 54%, no BLAST hit 46%]
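Counting unigenes with a hit under the e-value cutoff amounts to one pass over BLAST's tabular output. A sketch, assuming BLAST+ was run with the default `-outfmt 6` columns; it is an illustration rather than the workflow used for the slide, and treating the cutoff as inclusive is an assumption.

```python
import csv

EVALUE_CUTOFF = 1e-10  # cutoff reported on the slide


def queries_with_hits(blast_lines, cutoff=EVALUE_CUTOFF):
    """IDs of queries with at least one hit at or below `cutoff`.

    Assumes default BLAST+ tabular output (-outfmt 6): 12 tab-separated
    columns, with qseqid in column 1 and evalue in column 11.
    """
    hits = set()
    for row in csv.reader(blast_lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        if float(row[10]) <= cutoff:
            hits.add(row[0])
    return hits


def percent_with_hits(n_hits, n_total):
    """Percentage of unigenes with a positive hit."""
    return 100.0 * n_hits / n_total
```

`percent_with_hits(21_097, 38_889)` gives about 54.2%, matching the 54% on the slide.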
Top-hit species distribution
[Figure: bar chart of top-hit species; x-axis: number of BLAST hits (0 to 5,000); species shown, in chart order: Physcomitrella patens, Vitis vinifera, Picea sitchensis, Ricinus communis, Populus trichocarpa, Arabidopsis thaliana, Oryza sativa, Sorghum bicolor, Glycine max, Zea mays, Gossypium hirsutum, Medicago truncatula, unknown, Adiantum capillus-veneris, Ceratopteris richardii, Nicotiana tabacum, Marchantia polymorpha, Solanum tuberosum, Chlamydomonas reinhardtii, Alsophila spinulosa, Ginkgo biloba, Micromonas sp., Pteris vittata, Elaeis guineensis, Pinus taeda, Solanum lycopersicum, Micromonas pusilla, Triticum aestivum, Gossypium barbadense, others]
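A top-hit species distribution keeps only each unigene's best hit and tallies its source organism. A sketch, assuming BLAST+ tabular output with the custom columns `-outfmt "6 qseqid evalue stitle"` and that nr titles end with the organism in square brackets; Blast2GO's own species chart may be computed differently, and the function name is hypothetical.

```python
import re
from collections import Counter

# nr titles usually end with the source organism in square brackets,
# e.g. "hypothetical protein [Physcomitrella patens]".
SPECIES_RE = re.compile(r"\[([^\[\]]+)\]\s*$")


def top_hit_species(rows, cutoff=1e-10):
    """Count the species of each query's best hit.

    `rows` are (qseqid, evalue, stitle) tuples; hits above `cutoff` are ignored.
    """
    best = {}
    for qseqid, evalue, stitle in rows:
        evalue = float(evalue)
        if evalue > cutoff:
            continue
        if qseqid not in best or evalue < best[qseqid][0]:
            best[qseqid] = (evalue, stitle)
    counts = Counter()
    for _, stitle in best.values():
        match = SPECIES_RE.search(stitle)
        counts[match.group(1) if match else "unknown"] += 1
    return counts
```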
Transcriptome Annotation: Blast2GO
Cellular Component - GO level 5
Localization of genes is predominantly in the nucleus, mitochondria, and plastids
[Figure: pie chart of cellular_component terms at level 5: plastid (3,613), mitochondrion (1,967), nucleus (1,325), nuclear lumen (555), cytosol (448), nucleoplasm (376), cytoskeleton (317), vacuole (274), endoplasmic reticulum (238), nucleolus (212), Golgi apparatus (197), microbody (119), endosome (10)]
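The GO "level" charts count each gene under the terms found at a fixed depth of the GO graph. The sketch below takes the root as level 1, defines a term's level by its shortest path from the root, and counts a gene once under every level-N ancestor of its annotations. Blast2GO's exact level definition may differ, and the toy term names in the test are hypothetical, so treat this as an illustration only. Because a gene can be counted under several terms, the category counts need not sum to the number of annotated unigenes.

```python
from collections import Counter, defaultdict, deque


def term_levels(parents, root):
    """Level of each term (root = 1), by shortest path from the root.

    `parents` maps each term to the set of its parent terms.
    """
    children = defaultdict(set)
    for term, term_parents in parents.items():
        for parent in term_parents:
            children[parent].add(term)
    levels = {root: 1}
    queue = deque([root])
    while queue:
        term = queue.popleft()
        for child in children[term]:
            if child not in levels:
                levels[child] = levels[term] + 1
                queue.append(child)
    return levels


def ancestors(term, parents):
    """All ancestors of `term`, including the term itself."""
    seen, stack = {term}, [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def counts_at_level(annotations, parents, root, level):
    """Number of genes counted under each term at the requested level."""
    levels = term_levels(parents, root)
    counts = Counter()
    for terms in annotations.values():
        level_terms = set()
        for term in terms:
            level_terms |= {a for a in ancestors(term, parents) if levels.get(a) == level}
        counts.update(level_terms)  # each gene counts once per level term
    return counts
```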
Transcriptome Annotation: Blast2GO
Biological Process - GO level 2
The two main biological processes involve metabolism and cellular machinery (metabolic process and cellular process)
[Figure: pie chart of biological_process terms at level 2: metabolic process (7,641), cellular process (7,432), localization (1,713), establishment of localization (1,713), response to stimulus (908), biological regulation (853), regulation of biological process (716), developmental process (194), multicellular organismal process (166), reproduction (73), growth (41), reproductive process (29), multi-organism process (15)]
Transcriptome Annotation: Blast2GO
Molecular Function - GO level 2
The two main molecular functions are binding (DNA, RNA, and protein) and catalytic activity (hydrolase and transferase activity)
[Figure: pie chart of molecular_function terms at level 2: binding (8,120), catalytic activity (7,915), transporter activity (908), structural molecule activity (542), transcription regulator activity (409), molecular transducer activity (357), enzyme regulator activity (106), translation regulator activity (1)]
Transcriptome Annotation: PlantTribes2.0
25,172 of 38,889 unigenes with a positive hit (e-value cutoff 1e-5)
Unigenes classified into 7,126 Tribes and 9,548 OrthoGroups
[Figure: pie chart of BLAST results; positive BLAST hit 65%, no BLAST hit 35%]
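Pseudo-annotation by cluster membership can be sketched as transferring the tribe and orthogroup of each unigene's best-hit reference protein. This assumes lookup tables from protein ID to tribe and orthogroup built from the PlantTribes cluster files; the table layout and function names are hypothetical, and the actual PlantTribes2.0 classification procedure may differ.

```python
def best_hits(rows, cutoff=1e-5):
    """Best-hit protein per unigene from (qseqid, sseqid, evalue) rows."""
    best = {}
    for qseqid, sseqid, evalue in rows:
        evalue = float(evalue)
        if evalue <= cutoff and (qseqid not in best or evalue < best[qseqid][1]):
            best[qseqid] = (sseqid, evalue)
    return {query: protein for query, (protein, _) in best.items()}


def assign_to_clusters(hits, protein_to_tribe, protein_to_orthogroup):
    """Map each unigene to the (tribe, orthogroup) of its best-hit protein."""
    assignments = {}
    for unigene, protein in hits.items():
        tribe = protein_to_tribe.get(protein)
        if tribe is None:
            continue  # best hit is not in any clustered reference protein
        assignments[unigene] = (tribe, protein_to_orthogroup.get(protein))
    return assignments


def cluster_summary(assignments):
    """Counts of classified unigenes, distinct tribes, and distinct orthogroups."""
    tribes = {tribe for tribe, _ in assignments.values()}
    orthogroups = {og for _, og in assignments.values() if og is not None}
    return len(assignments), len(tribes), len(orthogroups)
```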
Transcriptome Annotation:
PlantTribes2.0
Some interesting results:
Single unigene similar to LEAFY
one copy found in seed plants, two in Physcomitrella and Selaginella
Single unigene similar to SEPALLATA3
a gene family absent from gymnosperms, thought to have originated
with flowers and required by B and C floral organ identity genes to
function
Single unigene similar to PISTILLATA and two unigenes similar to
CAULIFLOWER
not known in gymnosperms, Physcomitrella, or Selaginella

WARNING: these annotations are based on BLAST, which may return distant homologues. A detailed phylogenetic examination is needed!
Future Work
Sequence the sporophyte transcriptome
Transcriptome profiling in various life stages/tissues (RNA-seq)
Examine gene family evolution in land plants
RNA editing in the chloroplast genome
Population genomics (with mined SSR and SNP loci)
Linkage mapping
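Mining SSR loci from the unigenes, one of the resources listed above, can start with a regular-expression scan for perfect tandem repeats. A sketch, assuming minimum repeat counts that are placeholders rather than values from this project; compound and imperfect repeats are not detected, and SNP mining is not covered.

```python
import re

# Hypothetical minimum repeat counts for perfect SSRs, keyed by motif length.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if `motif` is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    k = len(motif)
    return not any(k % d == 0 and motif[:d] * (k // d) == motif for d in range(1, k))


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, repeat_count) for perfect SSRs, sorted by start."""
    seq = seq.upper()
    found = []
    for k, n in min_repeats.items():
        # A k-base motif followed by at least n - 1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):  # e.g. skip 'AA' runs, already found as 'A'
                found.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(found)
```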
Thank You!

Collecting bracken in the Rocky Mountains with my field assistant
