Vous êtes sur la page 1sur 1


to ­chromosome F. No mapped scaffold could be implicated as cor- for Protein Sequences (MIPs) repeat database19 and the Institute
responding to the weak 25S signal on chromosome D. for Genomic Research (TIGR) repeat database 20 combined. No
­statistically ­significant difference was found in the amount of data
Synteny infers ancestral relationships masked in the raw sequence reads compared to the assembly, indicating
Genome-wide analyses provide insight into the nature and dynamics that the assembly provides a comprehensive transposable element dis-
of macro-syntenic relationships among rosaceous taxa. Comparison of covery resource. Moreover, previously identified transposable elements21
the map positions of 389 rosaceous conserved ortholog set (RosCOS) and nine intact long terminal repeat (LTR) retrotransposons of the Copia
markers previously bin mapped in Prunus10 to their positions on the superfamily were identified by the structure-based and homology-based
seven pseudochromosomes of F. vesca H4x4 revealed macro-syntenic searches of 30 sequenced F. vesca spp. americana fosmids22.
relationships between the two genomes (Fig. 2). Markers were deemed LTR retrotransposons occupy ~16% of the F. vesca nuclear genome,
orthologous between the two genomes when five or more RosCOS whereas CACTA elements and miniature inverted-repeat transposable
occurred within ‘syntenic blocks’ shared between the two genomes. elements (MITEs), the most numerous DNA transposable elements, rep-
This analysis revealed remarkable genome conservation between the resent 2.8% and 2.4%, respectively, of nuclear DNA. The most numerous
two taxa, with complete synteny between Prunus linkage group (PG) 2 LTR retrotransposon family has fewer than 2,100 copies. Average-size
and Fragaria chromosome (FC) 7, PG8 and a section of FC2, and PG5 angiosperm genomes have families with copy numbers greater than
with a section of FC5. Most markers mapped to PG3 were located 10,000, so the lack of highly abundant LTR retrotransposons is likely to
on FC6, with the remainder being on FC1. Markers on PG4 located be the reason F. vesca has a relatively small-size genome. High sequence
to FC3 and FC2, whereas those on PG7 mapped onto FC6 and FC1. identity between some elements suggests recent transposition activity.
New chromosomal translocations between PG1-FC5, PG3-FC1 and
PG6-FC6 were identified, adding improved resolution to a previous Transcriptome sequence
© 2011 Nature America, Inc. All rights reserved.

study7. Our data support these same broad structural relationships, Multiple complex complementary DNA (cDNA) pools provided
providing extensive evidence for the reconstruction of an ancestral F. vesca transcriptome sequence resources for gene model prediction
genome for Rosaceae with a haploid chromosome number (x) of nine, and validation23 and highlighted organ specificity of fruit and root
consistent with the base haploid chromosome number of the largest transcripts. Analysis of the relative representation of genes in the
group within the modern Rosaceae, the Spiraeoideae11. fruit- and root-specific cDNA libraries (Online Methods) identified
transcripts of overrepresented genes (>twofold, false discovery rate
Absence of large duplications in the F. vesca genome < 0.01) in the fruit (1,753 genes) and root (2,151 genes). A global per-
A comparison of the F. vesca genome against itself using MUMmer12 spective of the expression patterns of these genes in roots and fruit is
version 3.22 (Online Methods) showed that long matches form eight provided (Fig. 3). Genes overrepresented in fruit showed ­enrichment
distinct families of approximate repeats, with the largest family having for several categories of biological processes and molecular functions
15 occurrences (Supplementary Fig. 3). This family shows homo­ ­associated with fruit development and were dominated by genes related
logy to a rice retrotransposon protein (GenBank: ABA95102.1)13 and to ­carbohydrate metabolic activity (glycosyltransferases, pectin este-
­contains the longest (14,721 bp) of the 126 matches that are ≥10,000 bp. rases and polygalacturonases). Flower, fruit and embryo development,
All other families have three occurrences or less. F. vesca is the only maturation-associated amino acid metabolism, secondary metabolic
plant genome sequenced to date with no evidence of large-scale, and lipid metabolic genes were also overrepresented, consistent with
within-genome duplication (Supplementary Fig. 3). All members other reports24. In contrast, abundant transcripts in roots represented
of the rosid clade share an ancient triplication, first documented in transcription factor, kinase and signal transduction categories. We
grape14 and found in all other rosid genomes, including apple15. In observed overrepresentation of biotic and abiotic stress-related genes
strawberry, chromosome rearrangement (Fig. 2) and genome size
reduction (perhaps accompanied by preferential loss of duplicated
genes) may obscure the signature of the ancient triplication.

Repetitive sequences and transposable elements


In all plants studied, transposable elements are major components
of genomes, both in the percentage of the nuclear genome they 6

represent and the degree to which they drive gene and/or genome 3
evolution. Extensive homology and structure-based searches of the
F. vesca genome, performed as previously described for the much PG5

larger maize genome16, identified 576 different transposable ele-
ment exemplars17, including more than 6,000 fully intact transpos- FC4
able elements (Supplementary Table 4). These elements mask about PG4
22% of the assembly, compared to ~1.3% of the assembly masked by
transposable elements in Repbase18, the Munich Information Center
Figure 2  A schematic representation of the positions of 389 RosCOS markers
on the seven pseudochromosomes (FC1-7) of F. vesca in relation to their bin
map positions on the eight linkage groups (PG1-8) of the Prunus T × E reference 2
map10. The diagram was plotted using Circos; map positions from the Prunus PG

reference map were converted to approximate physical positions for comparison

by multiplying the marker positions in cM by 400,000. Markers were spaced at


100,000 nucleotide intervals within each T × E mapping bin (see URLs).

Nature Genetics  VOLUME 43 | NUMBER 2 | FEBRUARY 2011 111