DOI: 10.1007/s00239-003-2564-9
1 Institute of Botany, Academia Sinica, 128 Sec. 2, Academy Road, Taipei 115, Taiwan
2 Department of Ecology and Evolution, University of Chicago, Chicago, IL 60637, USA
Abstract. We estimated the dates of the monocot–dicot split and the origin of core eudicots using a large chloroplast (cp) genomic dataset. Sixty-one protein-coding genes common to the 12 completely sequenced cp genomes of land plants were concatenated and analyzed. Three reliable split events were used as calibration points and for cross references. Both the method based on the assumption of a constant rate and the Li–Tanimura unequal-rate method were used to estimate divergence times. The phylogenetic analyses indicated that nonsynonymous substitution rates of cp genomes are unequal among tracheophyte lineages. For this reason, the constant-rate method gave overestimates of the monocot–dicot divergence and the age of core eudicots, especially when fast-evolving monocots were included in the analysis. In contrast, the Li–Tanimura method gave estimates consistent with the known evolutionary sequence of seed plant lineages and with known fossil records. Combining estimates calibrated by two known fossil nodes and the Li–Tanimura method, we propose that monocots branched off from dicots 140–150 Myr ago (late Jurassic–early Cretaceous), at least 50 Myr younger than previous estimates based on the molecular clock hypothesis, and that the core eudicots diverged 100–115 Myr ago (Albian–Aptian of the Cretaceous). These estimates indicate that both the monocot–dicot divergence and the core eudicot's age are older than their respective fossil records.

Key words: Chloroplast genome — Divergence of monocot and dicot — Angiosperm phylogeny — Age of core eudicots — Molecular clock — Unequal rate

Correspondence to: Shu-Miaw Chaw; email: smchaw@sinica.edu.tw

Introduction

Fossil evidence suggests that flowering plants (angiosperms) first appeared 140 million years (Myr) ago in the early Cretaceous (Willis and McElwain 2002). They soon diversified and expanded globally in the mid-Cretaceous (90–100 Myr ago) (Nicholas et al. 1983). Although the angiosperm phylogeny has now been largely established (Mathews and Donoghue 1999; PS Soltis et al. 1999; Qiu et al. 1999; Parkinson et al. 1999; DE Soltis et al. 2000; Chase et al. 2000), the question of why the oldest unequivocal fossil for angiosperms is nearly 300 and 170 Myr later than the first vascular plants (ca. 440 Myr ago [Taylor and Taylor 1993]) and their extant sister group, the gymnosperms (late Carboniferous, ca. 310 Myr ago [Doyle 1998]), respectively, remains an "abominable mystery" (Darwin et al. 1903). A number of hypotheses have been proposed to explain the late arrival of angiosperms in the fossil record. These include (1) the escape of fossilization in the initial stage of angiosperm evolution (Thomas and Spicer 1987), (2) bias in the fossil record (i.e., angiosperms
Table 1 column headings: Reference | Gene used | Reference point (time(a) of divergence; Myr) | Estimated time(a) (Myr)
evolved much earlier but went undetected), and (3) the suggestion that the evolution of angiosperms was triggered by a particular set of environmental conditions, and/or biotic interactions (such as co-evolution with faunal groups) (Willis and McElwain 2002).

Is the origin of angiosperms actually much older than the known fossil record? Since Ramshaw et al.'s (1972) first application of molecular data to address this question, three decades have passed. In the interim, molecular phylogenetic studies and critical fossils of derived angiosperms from older geological deposits (Magallón et al. 1999; Wikström et al. 2001) have opened up an opportunity to readdress the age and evolution of angiosperms. Although all previous estimates of the monocot–dicot divergence (Table 1) predate angiosperms' fossil records, they are highly variable, ranging from 140–190 Myr (Goremykin et al. 1997; Sanderson 1997; Wikström et al. 2001; Sanderson and Doyle 2001) to 200 Myr (Ramshaw 1972; Wolfe et al. 1989; Laroche et al. 1995; Yang et al. 1999) or even 300–320 Myr (Martin et al. 1989, 1993; Brandl et al. 1992).

Traditionally, the angiosperms were subdivided into two classes, Liliopsida (the monocots) and Magnoliopsida (the dicots) (Cronquist 1988). However, this subdivision was first refuted by rbcL and 18S rRNA gene phylogenies (Chase et al. 1993; Chaw et al. 1997) and later by analyses of multiple genes from the three plant genomes (Mathews and Donoghue 1999; Parkinson et al. 1999; Qiu et al. 1999; PS Soltis et al. 1999; DE Soltis et al. 2000; Chaw et al. 2000). These phylogenetic analyses have led to the conclusion that the dicots were split into the basal dicots (or the magnoliids) and the eudicots and that the monocot lineage was derived from one of the basal magnoliids (Fig. 1A). Parallel to the molecular data has been the accumulation of pollen fossils of eudicots, which began in the late Barremian (of Cretaceous, ca. 120 Myr ago) and spread globally in the Albian (ca. 110 Myr ago) (Doyle 1992; Hughes 1994). In addition, many new megafossils of basal eudicots have appeared, such as Tetracentraceae from the Barremian (110–118 Myr ago) (Magallón et al. 1999), as well as core eudicots, such as a possible Rhamnaceae/Rosaceae (rosids) from the early Cenomanian (94–97 Myr ago [Basinger and Dilcher 1984]). It has also been suggested that the date of diversification of core eudicots was underestimated. Wikström et al. (2001) have examined this issue (Table 1) with nuclear 18S rDNA and two cp (rbcL and atpB) genes. We now provide additional evidence by analyzing whole chloroplast (cp) genomic DNA sequences.

Cp DNA sequences are useful for studying the plant phylogeny at deep levels of evolution because of their lower rates of silent nucleotide substitution
Fig. 1. Rooted phylogenetic tree for the 12 sampled species. A Phylogeny of angiosperms based on Qiu et al. (1999) and P. S. Soltis et al.'s (1999) phylogenetic trees. Solid lines lead to taxa sampled in this study. B Rooted NJ tree using the Pamilo–Bianchi–Li distances based on the Ka values concatenated from 61 cp protein-coding genes. The branches leading to nodes C2 and A are not drawn to scale. Lengths are indicated. The calibration points (nodes C1, C2, C3) were used to estimate the divergence between monocots and dicots (node A) and the origin of core eudicots (node B). Gene loss (open bar), loss but with known transfer to nucleus (hatched bar), retention (gray bar), and likely gain with no similarity to prokaryotic genes (filled bar) are plotted on the branches leading to each lineage. The upper numbers at each node denote the bootstrap percentages (where applicable, values of the interior branch test indicated after the slash). Total gene number in the cp genome is given after each species, in parentheses. Branch lengths and the scale bar are Ka values per 100 sites.
(Palmer 1985a, b; Wolfe et al. 1989; Clegg et al. 1994). Moreover, concatenating sequences from many genes may overcome the problem of multiple substitutions that cause the loss of phylogenetic information between cp lineages (Lockhart et al. 1999) and can reduce "sampling errors due to substitutional noise and the finite number of characters within a gene" (Sanderson and Doyle 2001). In this study we analyzed 39,507 sites of cp DNA genomic sequences from 61 protein-coding genes common to the 12 complete cp genomes of land plants (Table 2). Our dataset is larger than those used in previous studies, including that of Goremykin et al. (1997; see also Table 1), who analyzed 40 proteins of cp genomes from fewer taxa (five land plants, including only one dicot and two monocots).

Molecular dating often assumes rate constancy, but this is frequently violated (PS Soltis et al. 2002 and references therein). For example, substitution rates of cp genes vary greatly among and within tracheophyte (or vascular plant) lineages (Bousquet et al. 1992; Gaut et al. 1992, 1993; Clegg et al. 1994; Sanderson and Doyle 2001; PS Soltis et al. 2002), between protein-coding loci (Muse and Gaut 1997; Matsuoka et al. 2002), and between nonsynonymous and synonymous sites (Gaut et al. 1997; Matsuoka et al. 2002). Sanderson and Doyle (2001) believed that much of the conflict in estimating divergence times was due to rate variation across lineages. In order to mitigate this problem we used mean branch lengths of the sampled monocots and dicots.

The focus of this study is to estimate the dates of the monocot–dicot split and the origin of core eudicots using a large cp genomic dataset. The date of the monocot–dicot divergence can be calculated by extrapolation from the reliable dates of other speciation events by means of phylogeny based on DNA sequence distances (Wolfe et al. 1989). Three divergence events with well-supported fossil dates were used as calibration points and cross references. Both the method based on the assumption of a constant rate and Li and Tanimura's (1987) unequal-rate method (hereafter the Li–Tanimura method) were used to estimate divergence times, and the estimates were compared with known fossil dates. Although several other methods without the rate constancy assumption, such as the nonparametric rate smoothing method (NPRS), have been proposed (Sanderson 1997 and references cited therein), we chose the Li–Tanimura method for its simplicity. The method uses lineages in which the molecular clock holds better than the others to estimate the divergence time at a particular node. We also discuss possible reasons for discrepancies among estimates of divergence dates obtained in this study and previous studies.
Table 2. Scientific names, classification, and NCBI accessions of species in the dataset
Bryophyte
  Marchantiaceae  Marchantia polymorpha  NC_001319 (Aug 2002)/Ohyama et al. (1986)
Pteridophyte
  Psilotaceae  Psilotum nudum  AP004638 (Nov 2002)/Wakasugi et al. (2000)
Gymnosperm
  Pinaceae  Pinus thunbergii  NC_001631 (Sep 2002)/Wakasugi et al. (1994)
Angiosperms
  Monocots
    Poaceae
      Andropogoneae  Zea mays  NC_001666 (Sep 2002)/Maier et al. (1995)
      Oryzeae  Oryza sativa  NC_001320 (Sep 2002)/Hiratsuka et al. (1989)
      Triticeae  Triticum aestivum  NC_002762 (Sep 2002)/Ikeo and Ogihara (2000)
  Dicots
    Eudicots
      Caryophyllidae
        Chenopodiaceae  Spinacia oleracea  NC_002202 (Aug 2002)/Schmitz-Linneweber et al. (2001)
      Asteridae
        Solanaceae  Nicotiana tabacum  NC_001879 (Sep 2002)/Shinozaki et al. (1986)
      Rosidae
        Brassicaceae  Arabidopsis thaliana  NC_000932 (Sep 2002)/Sato et al. (1999)
        Onagraceae  Oenothera elata subsp. hookeri  NC_002693 (Sep 2002)/Hupfer et al. (2000)
        Fabaceae
          Papilionoideae
            Loteae  Lotus japonicus  NC_002694 (Sep 2002)/Kato et al. (2000)
            Trifolieae  Medicago truncatula  AC093544(c) (Nov 2001)/Lin et al. (2001)
a Ranks of species follow the NCBI's Taxonomy Guide.
b Data modified from http://megasun.bch.umontreal.ca/ogmp/projects/other/cp_list.html (vers. 20 Dec 2002).
c No gene annotation in this accession.
Data and Methods

Database Search for Cp Genome Sequences

Individual genes of the 12 published cp genome sequences (Table 2) were downloaded from GenBank, National Center for Biotechnology Information (NCBI). Nomenclatures of the cp protein-coding genes compiled by Hallick and Bairoch (1994), Stoebe et al. (1998), Martin et al. (2002), and Swiss-Prot Protein Knowledgebase (2003) were used as guides. When synonyms were encountered, their sequence homologies with the typified names were carefully verified. Two homology criteria were considered: (1) the alignable length between two proteins is larger than 80% of the longer sequence, and (2) the sequence identity in the aligned region is at least 40% if L > 150, or at least 0.06 + 4.8 L^(−0.32[1 + exp(−L/1000)]) otherwise, L being the length of the aligned region (Rost 1999; Gu et al. 2002). Note that we raise the identity to 40% instead of 30% because the taxa we sampled are comparatively recent and cp genes are highly conserved (Wolfe et al. 1989).

Since the sequence of Medicago was not annotated, its protein-coding genes were annotated using the Nucleotide query–Protein database (BLASTX) algorithm at NCBI with each known gene from Lotus as query. If a particular gene was missing from Lotus, that gene from the rest of the 10 taxa was used instead. Open reading frames annotated by us were also verified using the BLAST 2 sequences algorithm and the Nucleotide query–Translated db algorithm in NCBI against the corresponding gene and the whole genome of Arabidopsis, respectively. A query sequence with more than 40% identity to the specific known genes was then considered as a putative homologous gene. A remnant of the accD gene in the rice was reported previously (Hiratsuka et al. 1989) but could not be detected by Katayama and Ogihara (1996) or Ogihara et al. (2002) using Southern hybridization. We were not able to locate it from the rice cp genome either. We used the reviews of cp genes by Millen et al. (2001) and Martin et al. (2002) as guides to confirm our BLAST searches, especially for those genes lost or with unknown functions.

After careful comparison and annotation, a total of 98 protein-coding genes was found in the cp genomes of the 12 sampled species (Table 2). The lengths as well as the presence or absence of those genes in each taxon are presented in Appendix 1. An open reading frame homologous to a known gene was given the same name to facilitate comparison and alignment. For some unannotated genes filtered by using BLASTX search, their positions in the corresponding genomes were indicated. We excluded pseudogenes and genes duplicated in the inverted repeat regions. The cp encoded RNA genes were previously shown to be problematic in early cp phylogeny (Martin et al. 1998; Lockhart et al. 1999) and in the present study as well (data not shown). Therefore, RNA genes were excluded from analysis.

Alignment of All Cp Genes and Phylogenetic Analyses

Amino acid sequences of each gene from the 12 taxa were first aligned one by one using GeneDoc (Nicholas and Nicholas 1997) with minor adjustments. The alignment was then used as a guide for aligning the corresponding nucleotide sequences. Unknown sites, start and stop codons, and regions difficult to align were removed from each gene alignment. All aligned individual gene sequences were then assembled using the Text Editor in MEGA 2.1 (Kumar et al. 2001). Gaps were completely deleted from the assembled alignment concatenated from the 61 cp protein-coding genes common to the 12 sampled taxa (see also Results). The working data file (in MEGA format) is shown in Appendix 2, available in the Supplementary Material Section at the JME Web site.
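The two homology criteria given under Database Search for Cp Genome Sequences can be sketched in a few lines of Python. This is only an illustration, not the authors' procedure: the function names and the example numbers are hypothetical, identities are expressed as fractions, and the exponent constant 0.32 is taken from Rost's (1999) length-dependent identity curve, which is an assumption of this sketch.

```python
import math


def identity_threshold(aligned_len: int) -> float:
    """Minimum fractional identity required for an aligned region of given length.

    Assumed form: 0.40 for L > 150, otherwise the length-dependent curve
    0.06 + 4.8 * L ** (-0.32 * (1 + exp(-L / 1000))).
    """
    if aligned_len > 150:
        return 0.40
    exponent = -0.32 * (1.0 + math.exp(-aligned_len / 1000.0))
    return 0.06 + 4.8 * aligned_len ** exponent


def is_putative_homolog(len_a: int, len_b: int, aligned_len: int, identity: float) -> bool:
    """Apply both criteria: coverage of the longer protein, then identity."""
    # Criterion 1: alignable length exceeds 80% of the longer sequence.
    covers_enough = aligned_len > 0.80 * max(len_a, len_b)
    # Criterion 2: identity in the aligned region meets the length-dependent cutoff.
    identical_enough = identity >= identity_threshold(aligned_len)
    return covers_enough and identical_enough


if __name__ == "__main__":
    # Hypothetical protein pairs: (length A, length B, aligned length, identity).
    print(is_putative_homolog(300, 280, 260, 0.55))  # long, well-covered, 55% identical
    print(is_putative_homolog(300, 280, 200, 0.90))  # fails the 80% coverage criterion
    print(round(identity_threshold(100), 3))         # cutoff for a 100-residue alignment
```

Under these assumptions the cutoff for short alignments is stricter in relative terms than the flat value might suggest: a 100-residue alignment needs roughly 35% identity, while anything longer than 150 residues is held to the raised 40% level.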
Nucleotide sequence divergence between a pair of taxa (or groups) was calculated in terms of the numbers of substitutions per synonymous site (Ks) or per nonsynonymous site (Ka), using the Pamilo–Bianchi–Li method implemented in MEGA 2.1. Divergence value between two groups is presented as average distance ± standard error, obtained from the option Compute Between Groups Means in MEGA 2.1. Average distance between two groups is the arithmetic mean of all pairwise distances between taxa in the intergroup comparisons. To date the divergence between the monocot and the dicot lineages, Saitou and Nei's (1987) neighbor-joining (NJ) method and the Ka values (not Ks, because substitutions at the third codon positions are saturated across sampled land plant lineages; see Results) were used to reconstruct the phylogenetic trees, rooted at the top of the Pinus lineage. Because the six sampled dicots (Table 2) represent the two large clades (the rosids and the asterids) of core eudicots and one of the remaining four small core eudicot clades, they can be used to infer the age or diversification date of core eudicots. The NJ trees reconstructed by Ka values and Ks values were rooted at the monocot lineage (see Results). Relative support for each node was evaluated using the bootstrap test and the interior branch test implemented in MEGA 2.1 with 2000 replicates. The latter test is constructed based on the interior branch length and its standard error. If this value is higher than 95% for a given branch, then the inferred length for that branch is considered significantly higher than 0 (Kumar et al. 2001). To compare the evolutionary rates of sampled fern, pine, monocot, and dicot lineages, Tajima's relative rate test (1993) implemented in MEGA 2.1 was applied. Because the method does not distinguish between Ka and Ks, the first and second codon positions of the combined 61 cp protein-coding genes were used instead.

To date the divergence between the monocots and the dicots, three split events (see Figs. 1 and 4) with reliable fossil dates were used as reference nodes: (C1) the Psilotum (fern)–seed plant split (400–420 Myr old [Pryer et al. 2001]); (C2) the Pinus (conifer)–angiosperm split (280–310 Myr old); and (C3) the maize–wheat split (50–60 Myr old). Since uncertainties about the age of the reference node were a probable reason behind the discrepancies among previous estimates of angiosperm origin (Bremer 2000; Sanderson and Doyle 2001), we have carefully examined the dates of our calibration nodes.

Fossil Dates

Psilotum has been repeatedly suggested as a member of ferns by molecular data (e.g., Nickrent et al. 2000; Pryer et al. 2001), but the architecture of its sperm cell suggests that Psilotum is an early divergent fern (Renzaglia et al. 2001) with relatively remote affinities to Ophioglossaceae (a basal fern family) and Equisetaceae (sphenopsids). Kenrick and Crane (1997) considered that the basal dichotomy of Euphyllophytina occurred in the early–mid Devonian (ca. 400–420 Myr ago) and resulted in two clades: one containing the extinct Psilophyton and the other ferns, horsetails, and seed plants. We took this splitting date as the lower bound for the divergence between Psilotum and seed plants.

Pinus is a genus of Pinaceae, which contains over 230 species and is the largest and most basal family of conifers (Hart 1987; Price et al. 1993; Chaw et al. 1997). Delevoryas and Hope (1973) and Miller (1977, 1988) proposed that the Triassic (206–248 Myr ago) period may represent a time when modern conifers were evolving. Cladistic and stratigraphic analyses of living seed plants (Doyle and Donoghue 1987; Crane 1988; Doyle 1998) suggested that diversification of modern seed plants occurred from the lower Pennsylvanian to the upper Triassic (215–310 Myr ago). The earliest fossil evidence of trees bearing the typical conifer's bisaccate pollen that germinates distally dates from the late Carboniferous to early Permian (ca. 250–290 Myr ago) and conifer relatives are known from ca. 310 Myr ago (Rai et al. 2003). Gymnosperms and angiosperms are the two major taxa of seed plants, distinct since the end of the Carboniferous, 300 Mya (Bow et al. 2003). From the above considerations, we took 280–310 Myr as an upper bound for the split between the conifer and the angiosperm lineages.

Fossil leaves of rice (belonging to the grass family Poaceae) have been described from the upper Eocene, about 40 Myr ago (Stebbins 1981), and the earliest unequivocal evidence of grass fossils (including spikelets and inflorescence with pollen) were found in Paleocene–Eocene deposits, about 50–60 Myr ago (Crepet and Feldman 1991). Initial radiation of the grass family was suggested to be 65 Myr ago (Stebbins 1987; Thomasson 1987). Bremer (2000) regarded the 50–70 Myr ago estimate of a maize–wheat divergence used by Wolfe et al. (1989) as rather uncertain. Moreover, phylogenetic analyses of the cp rpl16 intron sequences (Zhang 2000), eight character sets (GPWG 2001), cp genome structure (Ogihara et al. 2002), and cp genomic comparison (Matsuoka et al. 2002) indicated that in the grass family, Oryzoideae (rice) and Pooideae (wheat) diverged after the subfamily Panicoideae (maize), which was preceded by four other subfamilies (Zhang 2000). Therefore, we took 50–60 Myr as a reasonable estimate of the maize–wheat split.

Results

The concatenated lengths of all known cp functional protein-coding genes (Appendix 1) in the 12 sampled species (Table 2) range from 58,095 bp in the Triticum to 71,509 bp in the Marchantia; the average is 63,661 ± 4,764 bp. Sixty-one cp protein-coding genes, which encode two envelope membrane proteins (cemA, ycf9), 1 maturase (matK), 1 protease (clpP), 34 photosynthetic light reactions (atpA, atpB, atpE, atpF, atpH, atpI, petA, petB, petD, petG, petL, petN, psaA, psaB, psaC, psaI, psaJ, psbA, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbJ, psbK, psbL, psbM, psbN, psbT), 18 ribosomal proteins (rpl2, rpl14, rpl16, rpl20, rpl32, rpl33, rpl36, rps2, rps3, rps4, rps7, rps8, rps11, rps12, rps14, rps15, rps18, rps19), 4 RNA polymerases (rpoA, rpoB, rpoC1, rpoC2), and 1 cytochrome c biogenesis protein (ccsA), are in common to all 12 taxa. After elimination of unknown sites, regions difficult to align, start and stop codons, and all gaps, 39,507 sites were used for comparison and tree reconstruction.

As shown in Table 3 (the first row), the 12 cp genomic sequences are AT-rich. This bias is particularly strong at the third codon positions, primarily because of the high T nucleotide contents. These data are consistent with the high AT content found earlier in the plastid genome (Whitfeld and Bottomley 1983). Across the 11 tracheophytes nucleotide base compositions are homogeneous at the first and second co-
Table 3. Nucleotide base composition (%) of the concatenated 61 cp protein-coding genes in Marchantia and 11 sampled tracheophytes
Codon position(a) | A | C | G | T | p(b)
All | 33.2/29.2 ± 0.4 (1.4%) | 14.4/18.4 ± 0.5 (2.7%) | 17.8/21.5 ± 0.4 (1.9%) | 34.6/31.0 ± 0.5 (1.6%) | 0.000
1st | 30.3/28.2 ± 0.4 (1.4%) | 16.9/19.6 ± 0.3 (1.5%) | 29.1/30.2 ± 0.3 (1.0%) | 23.7/22.0 ± 0.3 (1.4%) | 0.793
2nd | 29.1/27.2 ± 0.3 (1.1%) | 20.8/21.7 ± 0.2 (0.9%) | 17.4/19.1 ± 0.3 (1.6%) | 32.6/32.0 ± 0.3 (0.9%) | 0.981
3rd | 40.1/32.3 ± 0.8 (2.5%) | 5.3/13.8 ± 1.0 (7.2%) | 7.0/15.0 ± 0.8 (5.3%) | 47.5/39.0 ± 1.1 (2.8%) | 0.000
a The start and stop codons were not included in analysis. Data on Marchantia are before the slash and the average of 11 sampled tracheophytes is after the slash and presented as mean ± SE (coefficient of variation).
b Probability (p) was based on χ² tests for homogeneity across the 11 sampled tracheophytes using PAUP 4.0b1 (Swofford 1998).
don positions (χ² test, p = 0.793 and 0.981) but not so at the third codon positions (p < 0.001). The G content is particularly high at the first codon positions in all taxa and Marchantia much prefers the use of synonymous codons ending with A or T.

The mean Ka/Ks ratio for all species pairs is 0.19. The mean Ka/Ks ratio difference between the monocot (0.156) and the dicot (0.158) lineages is small. These data are suggestive of stringent selective constraints on amino acid substitutions and correlate well with the observed higher GC contents at the first and second positions (Table 3).

The Inferred Phylogenetic Trees

Figure 1A was simplified from the topology of the maximum parsimony (MP) trees reconstructed with multigenes by Qiu et al. (1999) and by P. S. Soltis et al. (1999). Figure 1B is a NJ tree reconstructed with Ka values using Marchantia as the outgroup. The topology of this tree strongly indicates that, to the exclusion of the fern (Psilotum) lineage, the seed plants form a monophyletic clade, within which the conifer (Pinus) lineage and the angiosperms comprise two separate subgroups. The sampled angiosperms are subdivided into two well-supported lineages, the monocots and the eudicots. Both bootstrap and interior branch tests for the above-mentioned major tracheophyte lineages are 100%, and the latter test yielded a higher percentage support for the rice + wheat clade. The phylogenetic relationships of the monocot lineage and the six core eudicots generally agree well with those in recent multigene trees (Fig. 1A [Qiu et al. 1999; PS Soltis et al. 1999; DE Soltis et al. 2000]) except that in our NJ tree the Caryophyllales (represented by Spinacia) and asterid (represented by Nicotiana) are well resolved as sister clades. This relationship was previously revealed in the trees made by Wolfe et al. (1989) and Goremykin et al. (2003).

The NJ tree reconstructed from the Ks values appears to be unreliable because it placed Arabidopsis as basal to the remaining dicots (data not shown), contrary to the most recent multigene phylogenies of angiosperms (Mathews and Donoghue 1999; Qiu et al. 1999; PS Soltis et al. 1999; DE Soltis et al. 2000; Chase et al. 2000). These data caused us to question if the third codon position, where most synonymous substitution occurs, is saturated with substitutions. To assess levels of sequence saturation with the concatenated cp genes, pairwise uncorrected numbers of transitions and transversions (uncorrected P) were plotted against corrected (Kimura's two-parameter) sequence distance (Fig. 2). Sixty-six paired points [12 × (12 − 1)/2] are presented in Fig. 2. The curves of both uncorrected transitions and transversions
Table 4. Estimates of the monocot–dicot divergence and the age of core eudicots based on the constant rate method
Outgroup | Calibration event (fossil dates; Myr) | K(a), calibration event | Rate(b) (× 10⁻⁹) | K(a), dated split | Time (Myr)
Monocot–dicot divergence
Marchantia | C1: Fern–seed plant divergence (400–420) | Ka: 18.03 ± 0.22 | 0.215–0.225 | Ka: 9.30 ± 0.24 | 206 ± 5–217 ± 6
Psilotum | C2: Conifer–angiosperm divergence (280–310) | Ka: 14.40 ± 0.29 | 0.232–0.257 | Ka: 9.28 ± 0.34 | 180 ± 7–200 ± 7
Pinus | C3: Maize–wheat divergence (50–60) | Ka: 1.97 ± 0.23 | 0.164–0.197 | Ka: 9.35 ± 0.29 | 237 ± 5–285 ± 9
Origin of core eudicots
Pinus | C3: Maize–wheat divergence (50–60) | Ka: 1.97 ± 0.23 | 0.164–0.197 | Ka: 6.08 ± 0.26 | 154 ± 7–185 ± 8
Monocots | C3: Maize–wheat divergence (50–60) | Ks: 12.10 ± 0.11 | 1.008–1.210 | Ks: 36.06 ± 0.51 | 149 ± 2–181 ± 3
a K denotes the number of substitutions per 100 synonymous (Ks) or nonsynonymous (Ka) sites between pair of taxa or groups.
b Rate (r) is defined as the number of substitutions per site per year, r = K/(2T) (Li and Graur 1991).
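The arithmetic behind Table 4 can be retraced with a short Python sketch. It applies the rate definition of footnote b, r = K/(2T), to each calibration event and then inverts it, T = K/(2r), for the monocot–dicot distance. The function names are hypothetical, the K values are the point estimates from Table 4 with their standard errors ignored, and the sketch illustrates only the constant-rate calculation, not the Li–Tanimura method.

```python
def calibrate_rate(k_per_100_sites: float, fossil_age_myr: float) -> float:
    """r = K / (2T), in substitutions per site per year."""
    return (k_per_100_sites / 100.0) / (2.0 * fossil_age_myr * 1e6)


def divergence_time_myr(k_per_100_sites: float, rate: float) -> float:
    """T = K / (2r), returned in Myr."""
    return (k_per_100_sites / 100.0) / (2.0 * rate) / 1e6


def date_split(k_calib: float, fossil_range_myr: tuple, k_target: float) -> tuple:
    """Date a split from both ends of the fossil age range of a calibration event."""
    youngest, oldest = fossil_range_myr
    # A younger calibration date implies a faster rate, hence a younger estimate.
    t_young = divergence_time_myr(k_target, calibrate_rate(k_calib, youngest))
    t_old = divergence_time_myr(k_target, calibrate_rate(k_calib, oldest))
    return t_young, t_old


# Ka at the calibration event, fossil range (Myr), Ka between monocots and dicots.
CALIBRATIONS = {
    "C1 fern-seed plant": (18.03, (400, 420), 9.30),
    "C2 conifer-angiosperm": (14.40, (280, 310), 9.28),
    "C3 maize-wheat": (1.97, (50, 60), 9.35),
}

if __name__ == "__main__":
    for name, (k_calib, fossil_range, k_target) in CALIBRATIONS.items():
        t_young, t_old = date_split(k_calib, fossil_range, k_target)
        print(f"{name}: {t_young:.0f}-{t_old:.0f} Myr")
    # C1 fern-seed plant: 206-217 Myr
    # C2 conifer-angiosperm: 180-200 Myr
    # C3 maize-wheat: 237-285 Myr
```

Because the same K enters both steps, each estimate reduces to the fossil age scaled by the ratio of the two distances, so any rate difference between the calibration lineages and the monocot–dicot lineages passes straight into the date.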
ever, we found that the grass lineage has also recruited nine novel genes; one of them, ycf68, is shared with the pine lineage, and the remaining eight, ycf69–76, are unique. Functions of these genes are not known yet and they have no detectable homology to prokaryotic genes (Martin et al. 2002). Except for spinach, all sampled eudicots have lost the translational initiation factor 1 (infA). According to an extensive survey of more than 300 diverse angiosperms by Millen et al. (2001), the infA gene of the cp genome has repeatedly become defunct in about 24 separate angiosperm lineages, including almost all rosid species.

Nucleotide Substitution Rates

Before applying molecular calibration, we assessed the assumption of rate constancy. Fig. 1B shows that the branches from the calibration point C1 leading to the Psilotum (fern) lineage and the Pinus lineage are not equal in length. The NJ trees in Figs. 3B, C, and D, using Marchantia, Pinus, and Psilotum as the outgroup, respectively, also indicate that the Ka rates in the monocot and the dicot lineages are unequal. The monocot lineage has evolved faster than the dicots, by 39.6, 37.3, and 32.3%, respectively, for the three outgroups. In Fig. 1B the branches from node A leading to Arabidopsis, Spinacia, and Nicotiana are strikingly shorter than those leading to the other three dicots and the monocots. Tajima's relative rate test using rice, Marchantia, Psilotum, and Pinus as outgroups, respectively, confirmed this observation (all p's < 0.001). However, exclusion of the above three slower dicot lineages (gray lines in Figs. 3B–D) led to even higher estimates of divergence dates (data not shown). We therefore used the entire dataset.

By applying the equation, r = K/(2T), where K is the distance and T is the divergence time between the two taxa compared, nonsynonymous rates were calibrated and are shown in Table 4. Based on the three divergence events, C1, C2, and C3, and the dataset with all six dicots, the Ka rates are 0.215–0.225 × 10⁻⁹, 0.232–0.257 × 10⁻⁹, and 0.164–0.197 × 10⁻⁹ substitutions per nonsynonymous site per year, respectively. Clearly, these three calibrated Ka rates are unequal, differing from one another by 8% [(0.232 − 0.215)/0.215] to nearly 42% [(0.232 − 0.164)/0.164], and the conifer–angiosperm's Ka rate is the highest.

Dates of the Monocot–Dicot Divergence and the Origin of Core Eudicots

Molecular Clock or Rate Constancy Method. The date of the monocot–dicot divergence was estimated by applying the equation T = K/(2r). As indicated in Table 4, based on the entire dataset and the calibration points C1, C2, and C3, three time estimates for the monocot–dicot divergence, 206 ± 5–217 ± 6, 180 ± 7–200 ± 7, and 237 ± 5–285 ± 9 Myr, were obtained. These estimates suggest that the monocot–dicot divergence took place 220 ± 40 Myr ago.

Using either the Ka or the Ks rates of the maize–wheat divergence and the mean Ka values (see node B in Fig. 1B and Fig. 4) of all six eudicots or the Ks values between the rosid clade and the asterid + Caryophyllales clade (see node B in Fig. 3A), the divergence for core eudicots was estimated to be 154 ± 7–185 ± 8 and 149 ± 2–181 ± 3 Myr ago (Table 4), respectively. These two estimates are close to each other, and their average is 170 Myr ago.

Li–Tanimura Method. Figure 4 was simplified from the phylogenetic tree Fig. 1B with all branch lengths indicated. The branch length of core eudicots was calculated as the mean length of the branch leading from their emergence point (node B) to the six core eudicots. We then used the Li–Tanimura method, which uses lineages in which the molecular clock holds better than the others, to estimate the divergence time at points A and B. For example, we know that the branching date for Pinus (node C2) is 280–310 Myr ago and want to estimate the branching dates between the monocot and the dicot lineages. The distances from node C2 to Pinus, monocots, and
C1 | —(a) | 380–421
C2 | 295–309 | —
A | 144–151 | 137–152
B | 110–115 | 104–115
a Not applicable.
nomes is quickly increasing, this trend may be retested soon.

Significant rate variation in the cp genomes of the tracheophyte lineages is also consistent with the finding of P. S. Soltis et al. (2002), who studied one nuclear and three plastid genes using MP analyses. In summary, the molecular clock hypothesis does not hold for the Ka rates among the cp genomes of tracheophyte plants.

Reference Fossil Dates and the Phylogenetic Tree Obtained

In the Data and Methods section we have carefully cross-examined the three fossil dates by adopting updated phylogenies and documented fossil records. Bremer (2000, pp 4709, 4710) suggested that, in phylogenetic dating, rate calibration rather than unequal substitution rates is the major source of error and is behind the discrepancies in earlier estimates of monocot and flowering plant evolution. Indeed, in Table 4 the three calibrated rates based on the molecular clock are discrete, and the obtained dates for monocot–dicot divergence do not agree with one another. To evaluate whether the fossil dates and the cp Ka rates corresponded well with each other under the two dating methods, we also used the divergence rates and the Ka distances (Table 4) from the fern–seed plant and conifer–seed plant splits to date each other's divergences. The rate constancy method led to an estimate of 350–390 Myr ago for the former event and 320–335 Myr ago for the latter. These two estimates differ widely from the fossil records. In contrast, the divergence times (Table 5) of these two events estimated from the Li–Tanimura method are highly compatible with the paleobotanical data.

Sanderson and Doyle (2001) proposed that (1) biases in the data or the statistical estimation method used, (2) variation in rate across sites which “causes sequence divergences to be estimated incorrectly,” and (3) incorrect phylogenies are the underlying sources of error in molecular dating. P. S. Soltis et al. (2002) added that “inadequate sampling of taxa...can compound the problem.” The same concern could be raised about the results we present here. However, the effect of these problems is likely to have been considerably reduced by the sampling of 12 evolutionarily successive land plants (Table 2; including all three living subclasses of eudicots), the use of 61 genes (>39,000 bp long) with different functions from the complete cp genomes, and the highly reliable NJ tree (Fig. 1B), which is consistent with the NJ tree of Goremykin et al. (1997), inferred from concatenating 14,295 amino acids of cp genomes.

Comparison of Estimates from the Molecular Clock and Li–Tanimura Methods

Tables 4 and 5 show that the dates of the monocot–dicot split and the origin of core eudicots estimated by the rate constancy and Li–Tanimura methods differ greatly, with estimates from the former method predating the latter by 50 Myr. Estimates calibrated from nodes C1 and C2 using the molecular clock method vary more than those obtained from the Li–Tanimura method.

In the rate constancy method we used the arithmetic mean of all pairwise Ka distances between monocots and dicots to estimate the divergence date (Table 4). As a result, the obtained dates for the monocot–dicot split (220 Myr) and for the origin of the core eudicots (170 Myr) appear to be severely overestimated because the three high-rate grasses were included in the distance calculation. This was also the case in most previous estimates (Wolfe et al. 1989; Martin et al. 1989, 1993; Brandl et al. 1992; Laroche et al. 1995; Yang et al. 1999), which not only used the molecular clock hypothesis but also included one (maize) or several fast-evolving grass species (or annual Liliales [such as Ramshaw et al. 1972]).

Together with the preceding age estimates for the monocot–dicot split and the origin of core eudicots, we concluded that the Li–Tanimura method can substantially reduce the effect of rate variation among lineages and provide an estimate more in line with known fossil data.

Comparison of Our Estimates with Previous Estimates

Since our estimates based on the rate constancy method seem unreliable, we shall compare only estimates obtained from the Li–Tanimura method with those from other methods. Goremykin et al. (1997) used a framework very similar to the Li–Tanimura method (1987) and claimed that their approach is “independent of the rate fluctuation on the grass (high rate) and Marchantia (low rate) branches.” They estimated the divergence time between the Zea–Oryza lineage and the Nicotiana lineage to be 160 Myr ago, which predates ours by about 10–20 Myr (Table 5). Based on our cp genome data (Figs. 1B and 3A), the Nicotiana lineage has the slowest Ka and Ks rates among the sampled dicots but was used as half of the denominator by Goremykin et al. (1997) in estimating the monocot–dicot divergence time. Since our estimates of the dates for the monocot–dicot lineage and the origin of core eudicots were based on the mean branch length of six dicots, our data should be more reliable than data based on single species.

In order to reduce the effects of unequal rates, Bremer (2000) used the mean branch lengths from a
group of terminal taxa to their common node (which has a known fossil age) for calculating the change rate (distance/age by Bremer's definition). Using the rbcL gene, the MP tree of monocots, and the eight reference nodes with known fossil dates, Bremer (2000) estimated the split between Acorus, presumably the basalmost extant monocot (APG 1998; Chase et al. 2000), and the remaining monocots at 134 Myr ago. According to the integrated and widely used phylogenetic tree for the orders of flowering plants (APG 1998), the separation of the monocot lineage from the other magnoliids predated the branching-off of eudicots. Therefore, Bremer's estimate is compatible with ours for the monocot–dicot split (140–150 Myr ago) and the core eudicot divergence (100–115 Myr ago).

Using a single gene (rbcL) and the NPRS method, two genes (plus 18S rRNA) and maximum likelihood analyses, and the calibration date of Marchantia (450 Myr ago), Sanderson (1997) and Sanderson and Doyle (2001) estimated that crown angiosperms originated 160 and 140–190 Myr ago, respectively. Combining a three-gene dataset (rbcL, atpB, and 18S rRNA), the NPRS method, and the split between Fagales and Cucurbitales (84 Myr ago), Wikström et al. (2001) proposed that the extant angiosperms originated 158–179 Myr ago and the eudicots 131–147 Myr ago. These estimates are in good agreement with ours, as the dicots we sampled are all eudicots. Recent multigene analyses of angiosperm evolution have revealed that the monocot–dicot divergence was preceded by five living basal dicot lineages, the Amborellaceae, the Nymphaeales, and a group including Illiciaceae, Trimeniaceae, and Austrobaileyaceae (i.e., the so-called ANITA group) (Qiu et al. 1999; PS Soltis et al. 1999; DE Soltis et al. 2000; see Fig. 1A), and an extinct basal angiosperm lineage, the Archaefructaceae (Sun et al. 2002). Therefore, previous estimates for angiosperms' origin based on the monocot–dicot split have underestimated the age of angiosperms themselves. The above authors' estimates are consistent with ours if we postulate that approximately 20 (= 160 − 140) to 40 (= 190 − 150) Myr separates the angiosperm origin and the split between the ancestors of the monocot and eudicot lineages.

Our estimated date for the origin of core eudicot lineages is 100–115 Myr ago (Table 5). This is earlier than the many documented fossil-based estimates for core eudicots, such as a possible Rhamnaceae/Rosaceae (both are rosids, represented here by our samples: Lotus, Medicago, Arabidopsis, and Oenothera) from the early Cenomanian (94–97 Myr [Basinger and Dilcher 1984]), 89 Myr for the Capparales (represented by our sample: Arabidopsis), 84 Myr for Myrtales (Magallón et al. 1999) (represented by our sample: Oenothera), and 83 Myr for the Caryophyllales clade (represented by our sample: Spinacia) (Magallón et al. 1999). In addition, our estimate for the age of core eudicots is reasonably younger than the fossil age of a basal eudicot, Tetracentraceae, from the Barremian (110–118 Myr ago [Magallón et al. 1999]). Collectively, our cp genomic data indicate that the core eudicots' age is also older than known fossil records indicate.

Conclusions

We observed significant Ka rate variation in cp genome data among major tracheophyte lineages. Therefore, the rate constancy method is not appropriate for dating the divergence between monocots and dicots or the age of eudicots, especially if fast-evolving monocots are included. Using cp genome data, we demonstrated that the Li–Tanimura method gives estimates that better reflect the known evolutionary sequence of tracheophyte lineages and correspond well with the fossil records of the calibration points we used.

Combining our estimates calibrated by two known fossil nodes and the Li–Tanimura method, we propose that the monocot lineage branched off from dicots 140–150 Myr ago, in the late Jurassic to early Cretaceous, and that the core eudicots radiated 100–115 Myr ago, between the Albian and the Aptian of the Cretaceous. These estimates are in accordance with those of Sanderson (1997) and P. S. Soltis et al. (2002), who analyzed one to three genes and used MP and ML branch lengths with the NPRS method. In summary, methods that accommodate unequal rates give smaller estimates than the rate constancy method and appear to agree well not only with one another, but also with the recently documented fossil evidence.

Our results confirm all previous conclusions that molecular data indicate a pre-Cretaceous origin for angiosperms, but our estimates for the monocot–dicot divergence postdate previous estimates based on the molecular clock hypothesis by at least 50 Myr (= 200 − 150 Myr).

Acknowledgments. We thank Robert Friedman for critical comments on an early version of the manuscript and Yoshihiro Matsuoka and Shu-Shin Wu for help with the gene group assignment for the three grasses and other taxa. We also thank the two reviewers for their critical and valuable comments and suggestions. This work was supported in part by National Science Council Grant NSC912311B001103, and Academia Sinica Grant IB91 to S.M.C., and NIH Grant GM30998 to W.H.L.

Appendix
Table A1. Names and sizes of protein-coding genes in the chloroplast genomes of the 12 species (Table 2) analyzed in this study
Taxon
Genea Marchantia Psilotum Pinus Triticum Oryza Zea Lotus Medicago Arabidopsis Spinacia Nicotiana Oenothera
b c
accD 951 933 966 — — — 1,506 1,512 1,467 1,569 1,539 1,317
(ORF316)
atpA 1,524 1,527 1,485 1,515 1,524 1,524 1,533 1,536 1,524 1,296 1,524 1,518
atpB 1,479 1,479 1,479 1,497 1,497 1,497 1,497 1,497 1,497 1,497 1,497 1,497
atpE 408 402 414 414 414 414 402 402 399 405 402 402
atpF 555 555 555 552 543 552 555 552 555 555 555 555
atpH 246 246 246 246 246 246 246 246 246 246 246 246
atpI 747 747 747 744 744 744 744 738 750 744 744 744
ccsA 963 933 963 969 966 966 972 972 987 972 942 960
(ORF320) (ORF320) (ycf5) (ORF321) (ORF321) (ycf5) (ycf5) (ycf5) (ycf5) (ycf5) (ycf5)
cemA 1,305 1,350 786 693 693 693 690 690 690 702 690 690
(ORF434) (ycf10) (ORF261) (ORF230) (ycf10) (ORF229) (ycf10)
chlB 1,542 — 1,533 — — — — — — — — —
(ORF513)
chlL 870 — 876 — — — — — — — — —
(frxC)
chlN 1,398 — 1,404 — — — — — — — — —
(108667–110064)
chlP 612 597 591 651 651 651 591 588 591 591 591 750
(ORF203) (ORF216)
cysA 1,113 — — — — — — — — — — —
(mbpX)
cysT 867 — — — — — — — — — — —
(ORF288)
infA 237 243 237 342 324 324 — — — 177 Wd —
matK 1,113 1,512 1,548 1,629 1,629 1,635 1,527 1,521 1,581 1,518 1,530 1,539
(ORF370i) (ORF542)
ndhA 1,107 1,116 — 1,089 1,089 1,089 1,092 1,077 1,083 1,098 1,092 1,092
(ndh1)
ndhB 1,504 1,491 Ye 1,533 1,533 1,533 1,164 1,164 1,164 1,533 1,533 1,533
(ndh2)
ndhC 363 363 Y 363 363 363 363 363 363 363 363 363
(ndh3)
ndhD 1,500 1,545 Y 1,503 1,503 1,503 1,494 1,494 1,503 1,383 1,503 1,503
(ndh4)
ndhE 303 321 Y 306 306 306 306 306 306 306 306 306
(ndh4L)
ndhF 2,079 2,223 — 2,220 2,205 2,217 2,244 2,235 2,241 2,229 2,223 2,211
(ndh5)
ndhG 576 567 — 531 531 531 531 531 531 531 531 531
(ORF191)
ndhH 1,179 1,182 Y 1,182 1,182 1,182 1,182 1,182 1,182 1,182 1,182 1,182
(ORF392) (ORF393)
ndhI 552 498 Y 543 537 543 486 486 519 513 504 498
(frxB) (ORF178)
ndhJ 510 477 — 480 480 480 477 477 477 477 477 477
(ORF169) (ORF480)
ndhK 732 624 Y 738 741 747 693 684 678 846 744 744
(psbG) (psbG) (psbG) (psbG)
petA 963 966 960 963 963 963 963 963 963 963 963 957
petB 648 648 648 648 648 705 648 648 648 648 636 648
petD 483 483 543 483 483 483 483 483 483 483 483 483
petE 114 114 114 114 114 114 114 114 114 114 114 114
(ORF37) (petG) (petG) (petG) (petG) (petG) (petG) (petG) (petG) (petG)
petL 96 96 189 96 96 96 96 96 96 96 96 96
(ORF31) (ORF62b) (ORF31) (ORF31) (ycf7) (ORF31)
petN 90 90 90 90 90 90 90 90 90 90 90 90
(5168–5257) (ORF29) (ycf6) (ORF29) (ORF29) (ycf6) (ycf6) (ycf6) (ycf6) (ycf6) (ycf6)
psaA 2,253 2,253 2,262 2,253 2,253 2,253 2,253 2,277 2,253 2,253 2,253 2,256
psaB 2,205 2,205 2,205 2,205 2,205 2,208 2,205 2,205 2,205 2,205 2,205 2,205
psaC 246 246 246 246 246 246 246 246 246 246 246 246
(frxA)
psaI 111 111 159 111 111 111 105 105 114 102 111 105
(ORF36b) (ORF36)
psaJ 129 129 135 129 135 129 135 135 135 135 135 132
(ORF42b) (ORF44)
psaM 99 99 93 — — — — — — — — —
(ORF32)
psbA 1,062 1,062 1,062 1,062 1,062 1,062 1,062 1,062 1,062 1,062 1,062 1,062
psbB 1,527 1,527 1,527 1,527 1,527 1,527 1,527 1,527 1,527 1,527 1,527 1,527
psbC 1,422 1,386e 1,422 1,422 1,422 1,422 1,422 1,422 1,422 1,422 1,386 1,422
(1422)
psbD 1,062 1,062 1,062 1,062 1,062 1,062 1,062 1,062 1,062 1,062 1,062 1,062
psbE 252 252 252 252 252 252 252 252 252 252 252 252
psbF 120 120 120 120 120 120 120 120 120 120 120 120
psbH 225 225 228 222 222 222 222 219 222 222 222 222
(ORF74)
psbI 111 111 158 111 111 111 111 111 111 111 111 111
(ORF36a) (8398–8508)
psbJ 123 123 123 123 123 123 123 123 123 123 123 123
(ORF40) (ORF40)
psbK 168 177 180 186 186 186 186 186 180 180 186 180
(ORF55) (ORF98)
psbL 117 117 117 117 117 117 117 117 117 117 117 117
(ORF38)
psbM 105 105 114 105 105 105 105 105 105 105 105 105
(ORF34)
psbN 132 132 132 132 132 132 132 132 132 132 132 132
(ORF43)
psbT 108 99 108 117 108 102 102 108 102 102 105 108
(ORF35) (ORF35) (ORF33)
rbcL 1,428 1,428 1,428 1,434 1,434 1,431 1,428 1,428 1,440 1,428 1,434 1,428
rpl2 612 834 831 822 822 822 825 792 825 819 825 825
(ORF203)
rpl14 369 369 369 372 372 372 369 369 369 366 372 369
rpl16 432 423 405 411 411 411 408 408 408 408 405 408
rpl20 351 345 360 360 360 360 366 360 354 387 387 393
rpl21 351 390 — — — — — — — — — —
rpl22 360 351 429 447 450 447 — — 483 600 468 414
rpl23 276 273 276 282 282 282 282 282 282 W 282 282
(ORF42)
rpl32 210 183 213 192 192 180 153 180 159 174 168 156
(ORF69) (ORF63)
rpl33 198 198 207 201 201 201 201 201 201 201 201 201
rpl36 114 114 114 114 114 114 114 114 114 114 114 114
(secX)
rpoA 1,023 1,023 1,008 1,020 1,014 1,020 1,002 1,002 990 1,008 1,014 1,104
rpoB 3,198 3,201 3,228 3,321 3,228 3,228 3,213 3,213 3,219 3,213 3,213 3,219
rpoC1 2,055 2,025 2,091 2,052 2,049 2,052 2,049 2,061 2,043 2,034 2,067 2,040
rpoC2 4,161 4,227 3,675 4,440 4,542 4,584 3,999 4,145 4,031 4,086 4,179 4,161
rps2 708 720 705 711 711 711 711 711 711 711 711 711
rps3 654 663 654 720 720 675 657 636 657 657 657 657
rps4 609 600 597 606 606 606 606 606 606 606 606 612
rps7 468 468 468 471 471 471 468 474 468 468 468 468
rps8 399 399 399 411 411 411 405 405 405 405 405 417
rps11 393 393 393 432 432 432 417 417 417 417 426 435
rps12 372 372 372 369 375 375 372 372 372 372 372 372
rps14 303 303 300 312 312 312 303 303 303 303 303 303
rps15 267 261 267 273 273 237 273 276 267 273 264 264
rps16 — — — 258 189 258 243 — 240 267 258 267
rps18 228 228 303 513 492 513 315 297 306 306 306 306
rps19 279 279 279 282 282 282 279 279 279 279 279 279
ycf1f 3,207 5,112 5,271 — — — Y — 1,032 5,502 1,053 7,305
(ORF1068) (ORF1756) (ORF350)
ycf2g 6,411 6,942 6,165 — — — 6,897 5,658 6,885 6,396 6,843 6,843
(ORF2136) (ORF2054)
ycf3 513e 516 510 513 510 513 381 381 381 498 507 477
(ORF167) (ORF169) (ORF170) (ORF170)
(378)
ycf4 555 555 555 558 558 558 603 576 555 555 555 558
(ORF184) (ORF184) (ORF185) (ORF185)
ycf9 189 189 189 189 189 189 189 189 189 189 189 189
(ORF62) (ORF62) (ORF62) (ORF62)
ycf12 102 102 102 — — — — — — — — —
(ORF33) (ORF33)
ycf15 — — — — — 300 204 — 234 192 264 Y
(ORF99) (140818–141021) (ORF77) (90848–91039)
ycf66 408 — — — — — — — — — — —
(ORF135)
ycf68 — — 228 435 402 405 — — — — — —
(ORF75a) (92991–93425) (ORF133) (ORF133)
ycf69 — — — 177 216 177 — — — — 396 —
(124696–124872) (ORF72) (ORF58) (ORF131)
ycf70 — — — 129 270 210 — — — — — —
(14538–14666) (ORF91) (ORF69)
ycf71 — — — 153 249 225 — — — — — —
(80773–80925) (ORF82) (ORF75)
ycf72 — — — 414 414 414 — — — — — —
(81048–81461) (ORF137) (ORF137)
ycf73 — — — 750 750 522 — — — — — —
(83758–84507) (ORF249) (ORF173)
ycf74 — — — 150 330 150 — — — — — —
(94467–94616) (ORF109) (ORF49)
ycf75 — — — — 192 192 — — — — — —
(ORF63) (ORF63)
ycf76 — — — 255 258 258 — — — — — —
(124382–124636) (ORF85) (ORF85)
Total No. genes 87 81 73 84 85 86 77 74 79 79 80 78
Total length 71,509 68,355 60,470 58,095 58,677 58,581 61,908 60,296 63,543 67,839 64,551 70,110
Total No. genes 98 Average length, 63,661 ± 4,764
a Gene names follow those of Martin et al. (2002) and Swiss-Prot Protein Knowledgebase (2003) and each NCBI accession of a given taxon (refer to Table 2).
b Gene length (bp) is given after each gene name under each species. Within parentheses are the position ranges (where no annotation was available but a putatively respective gene homologue was detected using the BLASTX in NCBI), or original gene names, or ORF names in a given taxon, respectively.
c Absence of the gene in a given chloroplast genome.
d Pseudogene.
e The gene length we used was different from the NCBI annotation of a given species due to an earlier stop or longer reading frame detected.
f Martin et al. (2002) considered that this gene is not related to prokaryotic genes and designated it ycf78.
g A FtsH-like protein gene designated ycf77 by Martin et al. (2002).
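
The summary figure in the last row of Table A1 (average length, 63,661 ± 4,764) can be checked against the per-taxon totals with a short script. The sketch below is illustrative only: the variable names are ours, and we assume the ± value is a sample standard deviation (n − 1 denominator), which the table does not state.

```python
from statistics import mean, stdev

# "Total length" row of Table A1: summed protein-coding gene length (bp) per taxon.
total_length_bp = {
    "Marchantia": 71509, "Psilotum": 68355, "Pinus": 60470,
    "Triticum": 58095, "Oryza": 58677, "Zea": 58581,
    "Lotus": 61908, "Medicago": 60296, "Arabidopsis": 63543,
    "Spinacia": 67839, "Nicotiana": 64551, "Oenothera": 70110,
}

lengths = list(total_length_bp.values())
avg = mean(lengths)
# Assumption: the table's "±" is the sample standard deviation (n - 1).
sd = stdev(lengths)

print(f"Average total length: {avg:,.0f} ± {sd:,.0f} bp over {len(lengths)} taxa")
```

Under that assumption the script can be compared directly with the value reported in the table; a population standard deviation (`statistics.pstdev`) would give a somewhat smaller spread.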
Lin S, Wu H, Jia H, Zhang P, Dixon R, May G, Gonzales R, Roe BA (2000) Medicago truncatula variety Jema Long A-17 chloroplast, complete sequence (unpublished)
Lockhart PJ, Howe CJ, Barbrook AC, Larkum AWD, Penny D (1999) Spectral analysis, systematic bias, and the evolution of chloroplasts. Mol Biol Evol 16:573–576
Magallón S, Sanderson MJ (2001) Absolute diversification rates in angiosperm clades. Int J Org Evol 55:1762–1780
Magallón S, Crane PR, Herendeen PS (1999) Phylogenetic pattern, diversity, and diversification of eudicots. Ann Mo Bot Gard 86:297–372
Maier RM, Neckermann K, Igloi GL, Kossel H (1995) Complete sequence of the maize chloroplast genome: Gene content, hotspots of divergence and fine tuning of genetic information by transcript editing. J Mol Biol 251:614–628
Martin W, Gierl A, Saedler H (1989) Molecular evidence for pre-Cretaceous angiosperm origin. Nature 339:46–48
Martin W, Lagrange T, Li YF, Bisanz-Seyer C, Mache R (1990) Hypothesis for the evolutionary origin of the chloroplast ribosomal protein L21 of spinach. Curr Genet 18:553–556
Martin W, Lydiate D, Brinkmann H, Forkmann G, Saedler H, Cerff R (1993) Molecular phylogenies in angiosperm evolution. Mol Biol Evol 10:140–162
Martin W, Stoebe B, Goremykin V, Hansmann S, Hasegawa M, Kowallik KV (1998) Gene transfer to the nucleus and the evolution of chloroplasts. Nature 393:162–165
Martin W, et al. (2002) Evolutionary analysis of Arabidopsis, cyanobacterial, and chloroplast genomes reveals plastid phylogeny and thousands of cyanobacterial genes in the nucleus. Proc Natl Acad Sci USA 99:12246–12251
Mathews S, Donoghue MJ (1999) The root of angiosperm phylogeny inferred from duplicate phytochrome genes. Science 286:947–950
Matsuoka Y, Yamazaki Y, Ogihara Y, Tsunewaki K (2002) Whole chloroplast genome comparison of rice, maize, and wheat: implications for chloroplast gene diversification and phylogeny of cereals. Mol Biol Evol 19:2084–2091
Millen RS, Olmstead RG, Adams KL, Palmer JD, Lao NT, Heggie L, Kavanagh TA, Hibberd JM, Gray JC, Morden CW, Calie PJ, Jermiin LS, Wolfe KH (2001) Many parallel losses of infA from chloroplast DNA during angiosperm evolution with multiple independent transfers to the nucleus. Plant Cell 13:645–658
Miller Jr CN (1977) Mesozoic conifers. Bot Rev 43:217–280
Miller Jr CN (1988) The origin of modern conifer families. In: Beck CB (ed) Origin and evolution of gymnosperms. Columbia University Press, New York, pp 448–486
Muse SV, Gaut BS (1997) Interlocus comparisons of the nucleotide substitution process in the chloroplast genome. Genetics 146:393–399
Nicholas KB, Nicholas HB Jr (1997) GeneDoc: Analysis and visualization of genetic variation. http://www.cris.com/Ketchup/genedoc.shtml
Nicholas KJ, Tiffney BH, Knoll AH (1983) Patterns in vascular land plant diversification. Nature 303:614–616
Nickrent DL, Parkinson CL, Palmer JD, Duff RJ (2000) Multigene phylogeny of land plants with special reference to bryophytes and the earliest land plants. Mol Biol Evol 17:1885–1895
Ogihara Y, Isono K, Kojima T, Endo A, Hanaoka M, Shiina T, Terachi T, Utsugi S, Murata M, Mori N, Takumi S, Ikeo K, Gojobori T, Murai R, Murai K, Matsuoka Y, Ohnishi Y, Tajiri H, Tsunewaki K (2002) Structural features of a wheat plastome as revealed by complete sequencing of chloroplast DNA. Mol Gen Genomics 266:740–746
Ohyama K, Fukuzawa H, Kohchi T, Shirai H, Sano T, Sano S, Umesono K, Shiki Y, Takeuchi M, Chang Z, Aota S, Inokuchi H, Ozeki H (1986) Chloroplast gene organization deduced from complete sequence of liverwort Marchantia polymorpha chloroplast DNA. Nature 322:572–574
Palmer JD (1985a) Comparative organization of chloroplast genomes. Annu Rev Genet 19:325–354
Palmer JD (1985b) Evolution of chloroplast and mitochondrial DNA in plants and algae. In: MacIntyre RJ (ed) Molecular evolutionary genetics. Plenum Press, New York, pp 131–240
Parkinson CL, Adams KL, Palmer JD (1999) Multigene analyses identify the three earliest lineages of extant flowering plants. Curr Biol 9:1485–1488
Price RA, Thomas J, Strauss SH, Gadek PA, Quinn CJ, Palmer JD (1993) Familial relationships of the conifers from rbcL sequence data. Am J Bot 80:172
Pryer KM, Schneider H, Smith AR, Cranfill R, Wolf PG, Hunt JS, Sipes SD (2001) Horsetails and ferns are a monophyletic group and the closest living relatives to seed plants. Nature 409:618–622
Qiu YL, Lee J, Bernasconi-Quadroni F, Soltis DE, Soltis PS, Zanis M, Chen Z, Savolainen V, Chase MW (1999) The earliest angiosperms: Evidence from mitochondrial, plastid and nuclear genomes. Nature 402:404–407
Rai HS, O'Brien HE, Reeves PA, Olmstead RG, Graham SW (2003) Inference of higher-order relationships in the cycads from a large chloroplast data set. Mol Phylogenet Evol 29:350–359
Ramshaw JAM, Richardson DL, Meatyard BT, Brown RH, Richardson M, Thompson EW, Boulter D (1972) The time of origin of the flowering plants determined by using amino acid sequence data of cytochrome C. New Phytol 71:773–779
Renzaglia KS, Johnson TH, Gates HD, Whittier DP (2001) Architecture of the sperm cell of Psilotum. Am J Bot 88:1151–1163
Rost B (1999) Twilight zone of protein sequence alignments. Protein Eng 12:85–94
Saito N, Nei M (1987) The neighbor-joining method: A new method for reconstructing phylogenetic trees. Mol Biol Evol 4:406–425
Sanderson MJ (1997) A nonparametric approach to estimating divergence times in the absence of rate constancy. Mol Biol Evol 14:1218–1231
Sanderson MJ, Doyle JA (2001) Sources of error and confidence intervals in estimating the age of angiosperms from rbcL and 18S rDNA data. Am J Bot 88:1499–1516
Sato S, Nakamura Y, Kaneko T, Asamizu E, Tabata S (1999) Complete structure of the chloroplast genome of Arabidopsis thaliana. DNA Res 6:283–290
Schmitz-Linneweber C, Maier RM, Alcaraz JP, Cottet A, Herrmann RG, Mache R (2001) The plastid chromosome of spinach (Spinacia oleracea): Complete nucleotide sequence and gene organization. Plant Mol Biol 45:307–315
Shinozaki K, et al. (1986) The complete nucleotide sequence of tobacco chloroplast genome: Its gene organization and expression. EMBO J 5:2043–2049
Soltis DE, et al. (2000) Angiosperm phylogeny inferred from 18S rDNA, rbcL, and atpB sequences. Bot J Linn Soc 133:381–461
Soltis PS, Soltis DE, Chase MW (1999) Angiosperm phylogeny inferred from multiple genes: A research tool for comparative biology. Nature 402:402–404
Soltis PS, Soltis DE, Savolainen V, Crane PR, Barraclough TG (2002) Rate heterogeneity among lineages of tracheophytes: Integration of molecular and fossil data and evidence for molecular living fossils. Proc Natl Acad Sci USA 99:4430–4435
Stebbins GL (1981) Coevolution of grasses and herbivores. Ann Mo Bot Gard 68:75–76
Stebbins GL (1987) Grass systematics and evolution: Past, present and future. In: Soderstrom TR, Hilu KW, Campbell CS, Barkworth ME (eds) Grass systematics and evolution. Smithsonian Institution Press, Washington, DC, pp 359–367
Stewart WN, Rothwell GW (1993) Paleobotany and the evolution of plants, 2nd ed. Cambridge University Press, Cambridge
Stoebe B, Martin W, Kowallik KV (1998) Distribution and nomenclature of protein-coding genes in 12 chloroplast genomes. Plant Mol Biol Rep 16:243–255
Sun G, Ji Q, Dilcher DL, Zheng S, Nixon KC, Wang X (2002) Archaefructaceae, a new basal angiosperm family. Science 296:899–904
Swiss-Prot Protein Knowledgebase (2003) List of chloroplast and cyanelle encoded proteins. http://bioinformatics.weizmann.ac.il/databases/swiss-prot/release/plastid.txt, released 28 Feb
Swofford DL (1998) PAUP 4.0 b1: Phylogenetic analysis using parsimony (and other methods). Sinauer Associates, Sunderland, MA
Tajima F (1993) Unbiased estimate of evolutionary distance between nucleotide sequences. Mol Biol Evol 10:677–688
Takezaki N, Rzhetsky A, Nei M (1995) Phylogenetic test of the molecular clock and linearized trees. Mol Biol Evol 12:823–833
Taylor TN, Taylor EL (1993) The biology and evolution of fossil plants, 1st ed. Prentice Hall, Englewood Cliffs, NJ
Thomas BA, Spicer RA (1987) The evolution and paleobiology of land plants. Croom Helm, London
Thomasson JR (1987) Fossil grasses. In: Soderstrom TR, Hilu KW, Campbell CS, Barkworth ME (eds) Grass systematics and evolution. Smithsonian Institution Press, Washington, DC, pp 1820–1986
Wakasugi T, Tsudzuki J, Ito S, Nakashima K, Tsudzuki T, Sugiura M (1994) Loss of all ndh genes as determined by sequencing the entire chloroplast genome of the black pine Pinus thunbergii. Proc Natl Acad Sci USA 91:9794–9798
Wakasugi T, Nishikawa A, Yamada K, Sugiura M (2002) Complete nucleotide sequence of the chloroplast genome from a fern, Psilotum nudum (unpublished; available from NCBI, accession No. AP004638)
Whitfeld PR, Bottomley W (1983) Organization and structure of chloroplast genes. Annu Rev Plant Physiol 34:279–310
Wikström N, Savolainen V, Chase M (2001) Evolution of the angiosperms: Calibrating the family tree. Proc R Soc Lond B 268:2211–2220
Willis KJ, McElwain JC (2002) The evolution of plants. Oxford University Press, New York
Wolfe KH, Li WH, Sharp PM (1987) Rates of nucleotide substitution vary greatly among plant mitochondrial, chloroplast, and nuclear DNAs. Proc Natl Acad Sci USA 84:9054–9058
Wolfe KH, Gouy MY, Yang W, Sharp PM, Li WH (1989) Date of the monocot–dicot divergence estimated from chloroplast DNA sequence data. Proc Natl Acad Sci USA 86:6201–6205
Yang YW, Lai KN, Tai PY, Li WH (1999) Rates of nucleotide substitution in angiosperm mitochondrial DNA sequences and dates of divergence between Brassica and the other angiosperm lineages. J Mol Evol 48:597–560
Zhang W (2000) Phylogeny of the grass family (Poaceae) from rpl16 intron sequence data. Mol Phylogenet Evol 15:135–146