
NATURE|Vol 459|18 June 2009

FEATURE
Unlocking the secrets of the genome
Despite the successes of genomics, little is known about how genetic information produces complex
organisms. A look at the crucial functional elements of fly and worm genomes could change that.

Susan E. Celniker, Laura A. L. Dillon, Mark B. Gerstein, Kristin C. Gunsalus, Steven Henikoff, Gary H. Karpen, Manolis Kellis, Eric C. Lai, Jason D. Lieb, David M. MacAlpine, Gos Micklem, Fabio Piano, Michael Snyder, Lincoln Stein, Kevin P. White and Robert H. Waterston, for the modENCODE Consortium

The primary objective of the Human Genome Project was to produce high-quality sequences not just for the human genome but also for those of the chief model organisms: Escherichia coli, yeast (Saccharomyces cerevisiae), worm (Caenorhabditis elegans), fly (Drosophila melanogaster) and mouse (Mus musculus). Free access to the resultant data has prompted much biological research, including development of a map of common human genetic variants (the International HapMap Project)1, expression profiling of healthy and diseased cells2 and in-depth studies of many individual genes. These genome sequences have enabled researchers to carry out genetic and functional genomic studies not previously possible, revealing new biological insights with broad relevance across the animal kingdom3,4.

Nevertheless, our understanding of how the information encoded in a genome can produce a complex multicellular organism remains far from complete. To interpret the genome accurately requires a complete list of functionally important elements and a description of their dynamic activities over time and across different cell types. As well as genes for proteins and non-coding RNAs, functionally important elements include regulatory sequences that direct essential functions such as gene expression, DNA replication and chromosome inheritance.

Although geneticists have been quick to decode the functional elements in the yeast S. cerevisiae, with its small compact genome and powerful experimental tools5–6, our understanding of the more complex genomes of human, mouse, fly and worm is still rudimentary. Intrinsic signals that define the boundaries of protein-coding genes can only be partly recognized by current algorithms, and signals for other functional elements are even harder to find and interpret. Experimental approaches, notably the sequencing of complementary DNA and expressed sequence tags, have been invaluable, but unfortunately these data sets remain incomplete7. Non-coding RNA genes present an even greater challenge8–10, and many remain to be discovered, particularly those that have not been strongly conserved during evolution. Flies and worms have roughly the same number of known transcription factors as humans11, but comprehensive molecular studies of gene regulatory networks have yet to be tackled in any of these species.

In an attempt to remedy this situation, the National Human Genome Research Institute (NHGRI) launched the ENCODE (Encyclopedia of DNA Elements) project in 2003, with the goal of defining the functional elements in the human genome. The pilot phase of the project focused on 1% of the human genome and a parallel effort to foster technology development12. The initial ENCODE analysis revealed new findings but also made clear just how complex the biology is and how our grasp of it is far from complete13. On the basis of this experience, the NHGRI launched two complementary programmes in 2007: an expansion of the human ENCODE project to the whole genome (www.genome.gov/ENCODE) and the model organism ENCODE (modENCODE) project to generate a comprehensive annotation of the functional elements in the C. elegans and D. melanogaster genomes (www.modencode.org; www.genome.gov/modENCODE).

These two model organisms, with their ease of husbandry and genetic manipulation, are pillars of modern biological research, and a systematic catalogue of their functional genomic elements promises to pave the way to a more complete understanding of the human genome. Studies of these animals have provided key insights into many basic metazoan processes, including developmental patterning, cellular signalling, DNA replication and inheritance, programmed cell death and RNA interference (RNAi). The genomes are small enough to be investigated comprehensively with current technologies and findings can be validated in vivo. The research communities that study these two organisms will rapidly make use of the modENCODE results, deploying powerful experimental approaches that are often not possible or practical in mammals, including genetic, genomic, transgenic, biochemical and RNAi assays. modENCODE, with its potential for biological validation, will add value to the human ENCODE effort by illuminating the relationship between molecular and biological events.

The modENCODE project (Table 1) complements other systematic investigations into these highly studied organisms. In both organisms, RNAi collections have been developed and used to uncover novel gene functions14–18. Mutants are being recovered through insertional mutagenesis19 and targeted deletions (http://celeganskoconsortium.omrf.org; www.shigen.nig.ac.jp/c.elegans), with the eventual goal of one for every known gene. Genome sequences of related species are now also available for both fly20,21 and worm22, and multiple independent wild isolates are being characterized (T. MacKay, personal communication, www.dpgp.org23; R.H.W.). First-generation catalogues have been assembled of gene expression patterns during development and in different tissues24–34.

TABLE 1 modENCODE CONSORTIUM
Elements | Worm | Fly | Primary experimental data
Transcripts (mRNAs, non-coding RNAs, transcription start sites, untranslated regions, miRNAs) | Robert Waterston (University of Washington), Fabio Piano (New York University) | Susan Celniker (LBNL), Eric Lai (Sloan-Kettering Institute) | Tiling arrays, RNASeq, RT-PCR/RACE, mass spectrometry, 3ʹ untranslated region clone library, UAS-miRNA flies, knockdowns of RNA-binding proteins
Transcription-factor-binding sites | Michael Snyder (Yale University) | Kevin White (University of Chicago) | ChIP-chip, ChIP-seq, transcription-factor-tagged strains, anti-transcription factor antibodies
Chromatin marks | Jason Lieb (University of North Carolina), Steven Henikoff (University of Washington) | Gary Karpen (LBNL), Steven Henikoff | ChIP-chip and ChIP-seq of chromosome-associated proteins and nucleosomes
DNA replication | – | David MacAlpine (Duke University Medical Center) | ChIP-chip and ChIP-seq of essential initiator proteins, origin mapping and DNA copy number in differentiated tissues

[Figure 1: schematic in three panels, 'Epigenetics and transcription regulation', 'Replication' and 'Transcription and splicing', showing the element classes studied and the identification workflow (isolate chromatin and generate antibodies, or extract short and long RNA; then microarray or sequence).]
Figure 1 | DNA element functions and identification process.

Research and analysis
The modENCODE project will operate as an open consortium and participants can join on the understanding that they will abide by the set criteria (www.genome.gov/26524644). An important aim of the project is to respond to the needs of the broader Drosophila and C. elegans scientific communities, and several avenues will be open for suggestions on which experiments to prioritize. For example, researchers can visit www.modencode.org/Vote.shtml now to help prioritize transcription factors for studies using chromatin immunoprecipitation followed by DNA microarray or DNA sequencing (ChIP-chip and ChIP-seq), and can also indicate whether they have useful antibodies. We will seek community input on other issues as the opportunities arise.

The core of the modENCODE project consists of ten groups who use high-throughput methods to identify functional elements (see Table 1). A Data Coordinating Center (DCC) will collect, integrate and display the data. Together, the groups expect to identify the principal classes of functional element for D. melanogaster and C. elegans. They will work closely together to complete the precise annotation of protein-coding genes, identify small RNAs and non-coding RNA transcripts, map transcription start sites, identify promoter motif elements, elucidate functional elements within 3ʹ untranslated regions, and identify alternatively spliced transcripts as well as the signals required for splicing. Genomic sites bound by sequence-specific transcription factors will also be comprehensively identified. Charting the chromatin 'landscapes' will include the characterization of key histone modifications and variants, nucleosome phasing, RNA polymerase II isoforms and proteins involved in dosage compensation, centromere function, replication, homologue pairing, recombination and associations of chromosomes with the nuclear envelope.

Integrative analysis of these data across the different types of functional element will be used to reveal fundamental principles of fly and worm genome biology and to begin to uncover the emergent properties of these complex genomes. Some topics the modENCODE groups, along with interested members of the wider community, intend to explore are outlined below, but these are only a beginning. Our intention is to create a resource that will provide the foundation for ongoing analysis by scientists for years to come.

Our two model organisms share many similarities with other metazoans, including humans. They also differ from other organisms in some striking ways, particularly in details of the establishment and maintenance of cellular identity, centromere biology and heterochromatin function. To help understand how the similarities and differences in worm and fly biology are reflected in their genome sequences and how they are specified by genome function at the molecular level, we will carry out comparative analyses of transcription, splicing, cis-regulatory and post-transcriptional elements and chromatin function. We will subsequently investigate how our findings apply to the control of gene expression in the human genome.

We also plan to use genome-wide data on pre- and post-transcriptional functional elements to expand our understanding of gene-regulatory networks. We will study how these two layers of control complement or reinforce each other during development. For example, the availability of full-length transcripts and promoter structures for microRNA (miRNA) genes will enable us to develop models of regulatory circuits that integrate the upstream regulation of miRNA genes with that of other regulatory factors (such as transcription factors) and the effects of miRNAs on their downstream targets. We will search global patterns identified in the regulatory programs for emerging principles of gene regulation within and across species; as part of this endeavour, we will evaluate evidence for the modular structure of regulatory networks.

Because several developmental stages and diverse tissues will be sampled in both animals, we will be able to investigate the global and dynamic activities of functional elements across the entire genome in multiple cell types and stages of differentiation. We aim to define the characteristics and rules that distinguish regulatory programs in different cell types and developmental stages at the DNA, chromatin, and post-transcriptional levels. This will enable us to identify the types of element that function together in various spatio-temporal environments and find new types of functional element, perhaps including those used in restricted developmental contexts.

An important objective is to generate specific biological hypotheses that can be refined and tested experimentally by the broader scientific community. For example, these analyses might identify transcribed regions with novel regulatory roles, structural regions that function in the establishment of chromatin structure or three-dimensional conformation, enhancers far away from the gene they control, and alternative promoter regions. In addition, we will use comparative analyses of the sequenced genomes from different species to clarify the extent of conservation and the functional constraints associated with potential new classes of element and to characterize their evolutionary signatures21.

Another objective of the modENCODE project is the creation of reference data sets of maximum utility. We have agreed that, whenever possible, a common set of reagents will be used to facilitate comparison of data sets generated by different groups. For example, the fly and worm groups using ChIP-chip and related methods to map the genome-wide distributions of histone modifications will use a common set of validated antibodies. In addition, we will use common fly and worm strains, and in the case of Drosophila, the common cell lines Kc167, S2-DRSC, CME W1 Cl.8+ and ML-DmBG3-c2.

The fly and worm genomes are about a thirtieth of the size of their mammalian counterparts, making current methods for high-throughput genomic analysis cost-effective. We will use high-density tiling DNA microarrays to interrogate the genome on a single microarray (C. elegans, 26 base pair (bp) median spacing; D. melanogaster, 38 bp median spacing) at a resolution sufficient for ChIP-chip experiments. Denser arrays (D. melanogaster, 7 bp median spacing), which promise higher resolution, will be used in a move to high-throughput sequencing platforms such as the Illumina Genome Analyzer to generate sufficient sequence coverage for transcript mapping and miRNA and ChIP experiments.

The biological significance of the genomic features identified will be tested in experiments designed to evaluate the accuracy and functionality of subsets of the structural and regulatory annotations. For example, we will carry out ChIP experiments on extracts from whole animals or cells that lack selected regulators (using mutants or RNAi). The tissue-specific DNA-binding patterns of selected regulators will be validated in transgenic animals. Figure 1 summarizes the DNA elements to be interrogated and the methods to be used.

TABLE 2 GLOBAL ANALYSIS GOALS
Elements and processes | Specific examples
Transcribed regions | Define cell- and tissue-specific transcriptional landscapes. Annotate transcription start sites, exons, untranslated region structures, small regulatory RNAs and short single-exon open reading frames
Gene regulation, transcriptional regulation | Identify transcription-factor binding sites in various cell and tissue types. Correlate chromatin structure marks and transcriptional activities for protein-coding and non-protein-coding genes
Post-transcriptional regulation | Identify tissue-specific binding sites for miRNAs and other small RNAs, RNA secondary structures and alternative splicing regulatory motifs
Chromatin structure and function | Identify sites of association between DNA and chromosomal proteins involved in centromere specification, meiotic recombination, dosage compensation, nuclear envelope and matrix interactions and chromosome condensation. Identify sites of incorporation of histone variants and specifically modified histones. Correlate transcription maps for meta-analysis of developmental chromatin dynamics
DNA replication | Identify cell- and tissue-specific origins of replication. Correlate with cell- and tissue-specific transcription and chromatin marks

Data management and accessibility
Data generated by the modENCODE Consortium, including those from validation experiments, will be collected, quality checked, integrated and distributed through the modENCODE DCC (www.modencode.org). The DCC will collate detailed metadata for each submitted data set to ensure broad and long-term usability. Where appropriate, the data will also be submitted to public databases, for example, GenBank (www.ncbi.nlm.nih.gov) and the Gene Expression Omnibus (www.ncbi.nlm.nih.gov/geo) or ArrayExpress (www.ebi.ac.uk/microarray-as/aer/entry) and the University of California, Santa Cruz Genome Bioinformatics Site (http://genome.ucsc.edu). The DCC will also work closely with WormBase (www.wormbase.org) and FlyBase (www.flybase.org) to facilitate integration of the modENCODE data with selected data from these databases and with other information about these organisms. All data will be available for bulk download through an FTP site and through a number of Generic Model Organism Database tools (www.gmod.org): BioMart (www.biomart.org) will provide powerful data-mining capabilities, and InterMine (www.intermine.org) will provide a flexible interface for complex querying of the data, a library of canned queries, and powerful list-based tools and operations (http://intermine.modencode.org). As for the ENCODE pilot project data (www.genome.gov/10005107), new data can be examined alongside existing data using interactive genome browsers35 for both the fly (www.modencode.org/cgi-bin/gbrowse/fly) and the worm (www.modencode.org/cgi-bin/gbrowse/worm).

The Drosophila and C. elegans communities have thrived because of their open culture. In keeping with this tradition and with those of the genome sequencing projects, HapMap and the ENCODE pilot project, modENCODE is a 'community resource project' subject to the NHGRI's data-sharing policy. The success of this policy is based on mutual and independent responsibilities for the production and use of the resource. We will release data rapidly (Table 2), before publication, once they have been established to be reproducible (verification; see www.modencode.org/'Publication Policy link' for the criteria), even if the data have not been sampled to determine if there is biological meaning (validation). In turn, users are asked to recognize the source of the data and to respect the legitimate interest of the resource producers to publish an initial report of their work (see www.genome.gov/modencode for more details). Finally, the funding agencies recognize the need to support the analysis and dissemination of the data.

In addition, a variety of physical resources (for example, DNA constructs and transgenic strains) will be produced that are likely to be of use to the broader community and to which that community will have unrestricted access. We expect to cooperate with data users in the worm and fly communities to set the gold standard for data release and openness.

Conclusion
The Human Genome Project benefited enormously from the technology developed and the experience acquired in sequencing the significantly smaller genomes of model organisms, particularly C. elegans and D. melanogaster. The modENCODE project is dedicated to the next phase of decoding the information stored in these genomes: the comprehensive identification of sequence-based functional elements. Having laid the foundation for the discovery of many of the genetic programs underlying metazoan development and behaviour, Drosophila and Caenorhabditis will serve as ideal model systems to identify DNA-based functional elements on a genome-wide basis. In the future, these data will provide a powerful platform for characterizing the functional networks that direct multicellular biology, thereby linking genomic data with the biological programs of higher organisms, including humans. ■

1. Sabeti, P. C. et al. Nature 449, 913–918 (2007).
2. Neve, R. M. et al. Cancer Cell 10, 515–527 (2006).
3. Chintapalli, V. R., Wang, J. & Dow, J. A. Nature Genet. 39, 715–720 (2007).
4. Nichols, C. D. Pharmacol. Ther. 112, 677–700 (2006).
5. Ross-Macdonald, P. et al. Nature 402, 413–418 (1999).
6. Boone, C., Bussey, H. & Andrews, B. J. Nature Rev. Genet. 8, 437–449 (2007).
7. Celniker, S. E. & Rubin, G. M. Annu. Rev. Genomics Hum. Genet. 4, 89–117 (2003).
8. Tupy, J. L. et al. Proc. Natl Acad. Sci. USA 102, 5495–5500 (2005).
9. Ruby, J. G. et al. Cell 127, 1193–1207 (2006).
10. Ruby, J. G. et al. Genome Res. 17, 1850–1864 (2007).
11. Reece-Hoyes, J. S. et al. Genome Biol. 6, R110 (2005).
12. The ENCODE Project Consortium Science 306, 636–640 (2004).
13. Birney, E. et al. Nature 447, 799–816 (2007).
14. Boutros, M. et al. Science 303, 832–835 (2004).
15. Kamath, R. S. et al. Nature 421, 231–237 (2003).
16. Rual, J. F. et al. Genome Res. 14, 2162–2168 (2004).
17. Sonnichsen, B. et al. Nature 434, 462–469 (2005).
18. Dietzl, G. et al. Nature 448, 151–156 (2007).
19. Bellen, H. J. et al. Genetics 167, 761–781 (2004).
20. Clark, A. G. et al. Nature 450, 203–218 (2007).
21. Stark, A. et al. Nature 450, 219–232 (2007).
22. Stein, L. D. et al. PLoS Biol. 1, E45 (2003).
23. Hillier, L. W. et al. Nature Methods 5, 183–188 (2008).
24. Tomancak, P. et al. Genome Biol. 3, research0088.1–0088.14 (2002).
25. Arbeitman, M. N. et al. Science 297, 2270–2275 (2002).
26. Stuart, J. M., Segal, E., Koller, D. & Kim, S. K. Science 302, 249–255 (2003).
27. Li, T. R. & White, K. P. Dev. Cell 5, 59–72 (2003).
28. Stolc, V. et al. Science 306, 655–660 (2004).
29. Manak, J. R. et al. Nature Genet. 38, 1151–1158 (2006).
30. Tomancak, P. et al. Genome Biol. 8, R145 (2007).
31. Jiang, M. et al. Proc. Natl Acad. Sci. USA 98, 218–223 (2001).
32. Reinke, V., Gil, I. S., Ward, S. & Kazmer, K. Development 131, 311–323 (2004).
33. Baugh, L. R., Hill, A. A., Slonim, D. K., Brown, E. L. & Hunter, C. P. Development 130, 889–900 (2003).
34. Kim, S. K. et al. Science 293, 2087–2092 (2001).
35. Stein, L. D. et al. Genome Res. 12, 1599–1610 (2002).

Supplementary Information A full list of names and addresses of current consortium participants is linked to the online version of this feature at http://tinyurl.com/modENCODE

Acknowledgements We thank Brenda Andrews and Tim Hughes for discussions on the status of yeast functional genomics.

Author Information Correspondence should be addressed to S.E.C. (celniker@fruitfly.org).

Authors
Susan E. Celniker1, Laura A. L. Dillon2, Mark B. Gerstein3,4, Kristin C. Gunsalus5, Steven Henikoff6, Gary H. Karpen7, Manolis Kellis8,9, Eric C. Lai10,
Jason D. Lieb11, David M. MacAlpine12, Gos Micklem13, Fabio Piano5, Michael Snyder14, Lincoln Stein15, Kevin P. White16,17, Robert H. Waterston18
1Department of Genome Biology, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA. 2Division of Extramural Research, National Human Genome
Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA. 3Program in Computational Biology and Bioinformatics, 4Department of Computer
Science and Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA. 5Center for Genomics and Systems Biology,
New York University, New York, New York 10003, USA. 6Basic Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington 98109, USA. 7Department
of Genome and Computational Biology, Lawrence Berkeley National Laboratory, Department of Molecular and Cell Biology, University of California, Berkeley, California
94720, USA. 8Broad Institute, Massachusetts Institute of Technology and Harvard University, Cambridge, Massachusetts 02140, USA. 9Computer Science and Artificial
Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA. 10Sloan-Kettering Institute, New York, New York 10065, USA.
11Department of Biology and Carolina Center for Genome Sciences, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA. 12Department of
Pharmacology and Cancer Biology, Duke University Medical Center, Durham, North Carolina 27710, USA. 13Department of Genetics, University of Cambridge, CB2 3EH,
UK, and Cambridge Systems Biology Centre, Tennis Court Road, Cambridge CB2 1QR, UK. 14Department of Molecular, Cellular and Developmental Biology, Yale University,
New Haven, Connecticut 06824, USA. 15Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11542 USA. 16Institute for Genomics & Systems Biology, University of
Chicago, Chicago, Illinois 60637, USA. 17Institute for Genomics & Systems Biology, Argonne National Laboratory, Argonne, Illinois 60439, USA. 18Department of Genome
Sciences and University of Washington School of Medicine, Seattle, Washington 98195, USA.

© 2009 Macmillan Publishers Limited. All rights reserved