Molecular Ecology (2006) 15, 1713–1731

doi: 10.1111/j.1365-294X.2006.02882.x

INVITED REVIEW

Blackwell Publishing Ltd

Microbial ecology in the age of genomics and metagenomics: concepts, tools, and recent advances
JIANPING XU
Department of Biology, McMaster University, 1280 Main Street West, Hamilton, Ontario L8S 4K1, Canada, and Institute of Tropical
Medicine, Hainan Medical College, Haikou, Hainan, China

Abstract
Microbial ecology examines the diversity and activity of micro-organisms in Earth's
biosphere. In the last 20 years, the application of genomics tools has revolutionized
microbial ecological studies and drastically expanded our view of the previously underappreciated microbial world. This review first introduces the basic concepts in microbial
ecology and the main genomics methods that have been used to examine natural microbial
populations and communities. In the ensuing three sections, the applications of
genomics in microbial ecological research are highlighted. The first describes the
widespread application of multilocus sequence typing and representational difference
analysis in studying genetic variation within microbial species. Such investigations have
shown that migration, horizontal gene transfer and recombination are common in
natural microbial populations and that microbial strains can be highly variable in genome
size and gene content. The second section highlights and summarizes the use of four
specific genomics methods (phylogenetic analysis of ribosomal RNA, DNA–DNA re-association kinetics, metagenomics, and micro-arrays) in analysing the diversity and
potential activity of microbial populations and communities from a variety of terrestrial
and aquatic environments. Such analyses have identified many unexpected phylogenetic
lineages in viruses, bacteria, archaea, and microbial eukaryotes. Functional analyses of
environmental DNA also revealed highly prevalent, but previously unknown, metabolic
processes in natural microbial communities. In the third section, the ecological implications
of sequenced microbial genomes are briefly discussed. Comparative analyses of prokaryotic
genomic sequences suggest the importance of ecology in determining microbial genome
size and gene content. The significant variability in genome size and gene content among
strains and species of prokaryotes indicates the highly fluid nature of prokaryotic genomes,
a result consistent with those from multilocus sequence typing and representational difference
analyses. The integration of various levels of ecological analyses, coupled with the application
and further development of high throughput technologies, is accelerating the pace of
discovery in microbial ecology.
Keywords: Cryptococcus, gene genealogy, microbial diversity, microbial sex, systems microbiology
Received 22 September 2005; revision accepted 14 December 2005

Correspondence: Jianping Xu, Fax: 1-905-522-6066; E-mail: jpxu@mcmaster.ca
© 2006 Blackwell Publishing Ltd

Introduction
Micro-organisms have been integral to the history and
function of life on Earth. They have played central roles
in Earth's climatic, geological, geochemical, and biological
evolution. However, until very recently, the general importance of micro-organisms has been appreciated by only a
few specialists. Indeed, micro-organisms are still most
often considered from an anthropocentric perspective, with
attention focused on the relatively few species that cause
human diseases and the potential of micro-organisms to
provide useful products and services. The recent advances
in genomics are offering fresh perspectives on this
previously underappreciated microbial world.

The microbial world contains a highly heterogeneous
group of organisms sharing only one common characteristic,
their small sizes. These organisms make up two (out of
three) entire Domains of life on Earth, the prokaryotic
Bacteria and Archaea (Woese 1987). Within the third
Domain, Eukarya, the majority of the phylogenetic diversity is contained within eukaryotic micro-organisms such
as protozoa, algae, and fungi. Prokaryotic life emerged
about 3.8 billion years ago, about 2 billion years before
eukaryotic life arose. Currently, microbial life forms are
found in virtually every imaginable ecological niche on
Earth, from the tropics to the Arctic and Antarctica, from
underground mines and oil fields to the stratosphere and
the top of great mountains, from deserts to the Dead Sea,
from above-ground hot springs to underwater hydrothermal vents.
Microbial ecology examines the diversity of micro-organisms and how micro-organisms interact with each
other and with their environment to generate and to
maintain such diversity. Consequently, microbial ecologists have traditionally focused on two areas of study:
(i) microbial diversity, including the isolation, identification
and quantification of micro-organisms in various habitats;
and (ii) microbial activity, that is, what micro-organisms are
doing in their habitats and how their activities contribute
to the observed microbial diversity and biogeochemical
cycling.
Microbial diversity in the environment can be measured
by various indices such as phylogenetic diversity, species
diversity, genotype diversity, and gene diversity (Box 1).
Above the species level, microbial diversity is commonly
quantified based on evolutionary distances among observed
taxonomic groups from a specific environment (e.g. the
phylogenetic diversity based on a common chronometer
such as the 16S ribosomal RNA subunit). Below the species
level, microbial diversity is typically described using
population genetic parameters such as gene diversity and
genotype diversity. Gene diversity and genotype diversity
refer respectively to the probability that two randomly
drawn genes and genotypes in a population will be different.
At the species level, microbial diversity is measured as
species diversity. There are various measures of species
diversity. One commonly used measure is the probability that two randomly drawn individuals in an environment will belong to different species. This measure takes into
account both the number of species (species richness) and
the frequency of each species (species abundance) in the
environment. Conceptually, this measure of species diversity is similar to those used for gene diversity and genotype
diversity.

Box 1 Measures of microbial diversity in natural environments
Nucleotide diversity
Gene diversity
Genotype diversity
Species diversity
Phylogenetic diversity
Evolutionary diversity
Ecological niche diversity
Functional diversity
Morphological diversity
Structural diversity
Metabolic diversity
Metabolite diversity
Protein diversity

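The probability-based measures described above (gene, genotype and species diversity) share one form: one minus the sum of squared frequencies. The following is a minimal sketch of that calculation, not a method taken from the review. The function name `diversity`, the optional small-sample correction, and the toy sample are all illustrative assumptions.

```python
from collections import Counter


def diversity(labels, unbiased=False):
    """Probability that two randomly drawn items carry different labels.

    `labels` holds one entry per sampled gene copy, genotype, or individual
    (e.g. its allele, genotype, or species name). With unbiased=True the
    common n/(n-1) small-sample correction is applied (an assumption here,
    not something specified in the text).
    """
    n = len(labels)
    if n < 2:
        raise ValueError("need at least two sampled items")
    counts = Counter(labels)
    # Sum of squared relative frequencies = chance of drawing the same label twice.
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    h = 1.0 - sum_sq
    if unbiased:
        h *= n / (n - 1)
    return h


# Hypothetical sample: 10 individuals from three species.
sample = ["sp_A"] * 5 + ["sp_B"] * 3 + ["sp_C"] * 2
richness = len(set(sample))          # number of species observed
species_diversity = diversity(sample)
```

The same function can be applied to allele labels at a locus (gene diversity) or to multilocus genotype labels (genotype diversity), which is why the three measures are conceptually similar.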
Species is the fundamental unit of biological classification and is critical for describing, understanding and
comparing biological diversities at different levels among
ecological niches. However, what constitutes a species
remains controversial. For sexual organisms with the
meiotic life cycle (such as the majority of plants, animals
and sexual microbial eukaryotes), although over 20 species
concepts exist in the literature (Mayden 1997), the most
widely used is the biological species concept. In this concept,
a species consists of individuals capable of interbreeding
with each other to produce fertile progeny but incapable of doing so with members of other species. However,
this definition is not applicable to asexual organisms lacking a regular meiotic life cycle. Such organisms include a
large proportion of eukaryotic micro-organisms as well as
all prokaryotes. Because most prokaryotes lack diagnostic
morphological characteristics, have no meiotic sexual
life cycle, but can exchange genetic material with each
other in unusual ways, the biological species concept is
not applicable to them. Instead, the most widely
accepted current species concept for prokaryotes is an operational
one, rooted in the degree of DNA–DNA re-association. In
this definition, two strains belong to the same species when
their purified genomic DNA shows at least 70% hybridization. This level of hybridization is equivalent to 94%
average nucleotide identity at the whole genome scale
(Konstantinidis & Tiedje 2005). It should be noted that this
prokaryotic species concept does not translate well to that
in plants and animals. For example, using this criterion, all
primates (e.g. chimpanzees, orangutans,
gorillas, gibbons and humans) would belong to the
same species (Sibley et al. 1990). Because of these and
other reasons, species concepts for both prokaryotes and
eukaryotes are still evolving (e.g. Cohan 2004; Konstantinidis
& Tiedje 2005) (Box 2).

Box 2 The current species concepts for prokaryotes (bacteria and archaea) and eukaryotes (plants, animals and eukaryotic microbes such as fungi, protozoa and algae) are not comparable.

The spatial and temporal distributions of microbial
diversity are the subjects of microbial population genetics
and biogeography. The patterns of distribution are often
discussed in the context of environmental factors such as
temperature, pH, salinity, pressure, the availabilities of
water and nutrients, and the sources of energy and carbon.
These ecological factors influence microbial activities and
play very important roles in determining the spatial and
temporal dynamics of micro-organisms in natural environments. Consequently, microbial ecologists often group
micro-organisms into specific metabolic categories. For
example, depending on the energy source, micro-organisms
are called either phototrophs (obtaining energy from light)
or chemotrophs (obtaining energy from chemicals). Among
chemotrophs, if the energy sources are from inorganic
molecules (such as H2S, H2, NH3, and Fe2+), they are called
chemolithotrophs. In contrast, if their energy sources are
from organic compounds, they are called chemoorganotrophs. Similarly, depending on the carbon source, micro-organisms can be either autotrophs (obtaining carbon from
inorganic sources such as CO2 and HCO3−) or heterotrophs
(obtaining carbon from organic compounds). Some micro-organisms, either in a free-living state or in association
with other organisms, can use atmospheric nitrogen as their
nitrogen source. Indeed, the diversity of microbial metabolism extends far beyond the typical animal and plant
metabolic capabilities. Even more striking are the extreme
environmental conditions in which many micro-organisms
are found thriving. These conditions include extremes of
pressure, pH, oxygen and metal concentration, salinity, radiation, desiccation, and temperature
(Rothschild & Mancinelli 2001). For example, the nitrate-reducing chemolithoautotroph Pyrolobus fumarii can grow
at temperatures of up to 113 °C (Blöchl et al. 1997).
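The metabolic categories above amount to a two-way lookup: one axis for the energy source and one for the carbon source. A minimal sketch of that lookup follows. The function name and the string codes for the sources are hypothetical choices made for illustration, not terminology from the review.

```python
# Energy source -> category, as defined in the text.
ENERGY_CLASS = {
    "light": "phototroph",
    "inorganic": "chemolithotroph",    # e.g. H2S, H2, NH3, Fe2+
    "organic": "chemoorganotroph",
}

# Carbon source -> category, as defined in the text.
CARBON_CLASS = {
    "inorganic": "autotroph",          # e.g. CO2, HCO3-
    "organic": "heterotroph",
}


def classify_metabolism(energy_source, carbon_source):
    """Return (energy category, carbon category) for an organism."""
    try:
        return ENERGY_CLASS[energy_source], CARBON_CLASS[carbon_source]
    except KeyError as err:
        raise ValueError(f"unknown source: {err.args[0]!r}") from None


# A chemolithoautotroph: inorganic energy source, inorganic carbon source.
example = classify_metabolism("inorganic", "inorganic")
```

Keeping the two axes separate mirrors the text: any energy category can in principle be combined with either carbon category.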
Micro-organisms in the environment are commonly
organized into several hierarchical levels,
from simple to complex: individuals, populations, guilds
(metabolically related populations), communities (sets of
interacting guilds), and ecosystems. A microbial ecosystem consists of both the microbial community and its interacting biotic (macro-organisms such as plants and animals)
and abiotic environmental factors (pH, temperature, inorganic and organic nutrients, etc.). While we commonly
think of micro-organisms as decomposers of organic wastes
and pathogens of plants, animals and humans, micro-organisms can also form mutualistic associations with each
other as well as be fierce predators of other micro-organisms.
For example, the minute bacterium Bdellovibrio (0.3 µm in
diameter) can quickly destroy an Escherichia coli cell many
times its own size (1–2 µm) (Nunez et al. 2003).
Until very recently, most of what we know about microbial diversity and microbial activity was derived from
cultured microbes and ex situ laboratory experimental
investigations. While such studies are essential, recent
investigations using high resolution microelectronic, microscopic, and genomic tools have shown that much of what
we thought we knew about our natural microbial world
was in fact highly biased.
In the following sections, I will first provide a brief introduction to the main genomic methods that have been used
to examine natural microbial populations and communities. This is then followed by three topics dealing with the
impact of genomics on microbial ecology. The first topic is
on the widespread application of DNA-based genomics
technologies in microbial population studies in two specific areas: (i) the use of multilocus sequence typing to
address a variety of ecological questions; and (ii) the use of
representational difference analysis (RDA) to investigate
genome size and gene content differences among bacterial
strains. The second topic is on how genomic methods have
been used to reveal unexpected microbial diversities in
natural populations and communities. I pay special attention to how phylogenetic typing and metagenomics are
transforming our views of the diversity and activity of
micro-organisms in their natural habitats. The third topic
summarizes how large-scale genome sequencing projects
have provided unprecedented insights into the potential
functions and activities of various groups of micro-organisms. I will conclude with a discussion on some of
the long-standing unresolved questions and future perspectives. It should be pointed out that the field of microbial ecological genomics is progressing rapidly with
thousands of publications accumulated in the last several
years alone. Therefore, an exhaustive review is not possible. Instead, I have used selected examples to illustrate
the impact of genomics on our current understanding of
microbial ecology and its potential implications for future
research.

Genomics tools
The word 'genomics' has become a trendy term widely
used by the scientific community and the general public.
Originally, the term was used to describe a specific discipline in genetics that deals with mapping, sequencing and
analysing genomes. A genome refers to the complete set of
genes and chromosomes in an organism. While many people
use 'genomics' in this narrow sense, an increasing number
of people have expanded its use to include functional analysis of entire genomes as well. These functional analytical
aspects include those on whole genome RNA transcripts
(called transcriptomics), proteins (proteomics), and metabolites (metabolomics). In addition, various combinations
of -omics terms have recently become highly fashionable.
For example, the discipline that uses genomics methods to
analyse natural ecological communities has been called
metagenomics, ecological genomics, community genomics,
and environmental genomics. In this section, the main
genomics tools and methods are briefly described with a
focus on those dealing with DNA (Box 3).

Box 3 Genomic methods in microbial ecology research


DNA sequencing
Polymerase chain reaction
DNA cloning systems (plasmid, lambda-phage,
cosmid, bacterial artificial chromosome or BAC,
yeast artificial chromosome or YAC)
DNA re-association
Fluorescent in situ hybridization (FISH)
Micro-array technology

DNA sequencing
The most significant technical advance in genomics is the
development of efficient, high throughput DNA-sequencing
techniques and instruments. While the basic principle for
DNA sequencing was established in the mid-1970s, it was
not until the mid-1990s that efficient automated DNA
sequencers and fluorescent dyes to tag the dideoxyribonucleotides (with one colour for each of the four types of
nucleotides) were developed. At present, high throughput
DNA sequencing facilities are found in most academic
institutions and many molecular biology laboratories.
Furthermore, faster and cheaper sequencing methods and
equipment are continually being developed. For example, the
recently developed pyrosequencing protocol used a novel
fibre-optic slide of individual wells. This method could
sequence 25 million bases in one 4-hour run with an
accuracy of 99.96% (Margulies et al. 2005).
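To put the pyrosequencing figures in perspective, the short calculation below derives the hourly throughput and the expected number of erroneous bases per run. It treats the quoted accuracy as a uniform per-base accuracy, which is a simplifying assumption. The `runs_needed` helper, the 5 Mb genome and the 10-fold coverage are hypothetical illustrations, not values from the review.

```python
import math

BASES_PER_RUN = 25_000_000   # from the text
RUN_HOURS = 4                # from the text
ACCURACY = 0.9996            # from the text, read here as per-base accuracy

bases_per_hour = BASES_PER_RUN / RUN_HOURS
expected_errors = BASES_PER_RUN * (1 - ACCURACY)


def runs_needed(genome_size_bp, coverage):
    """Runs required to reach a given average coverage (ignores read overlap and losses)."""
    return math.ceil(genome_size_bp * coverage / BASES_PER_RUN)


print(f"{bases_per_hour:,.0f} bases per hour")
print(f"about {round(expected_errors):,} erroneous bases per run")
print(f"{runs_needed(5_000_000, 10)} runs for a 5 Mb genome at 10x coverage")
```

Under these assumptions a single run yields 6.25 million bases per hour with roughly 10 000 miscalled bases, so redundancy across reads matters even at high per-base accuracy.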

Polymerase chain reaction
The second tool is the polymerase chain reaction (PCR),
which allows the analysis of minute amounts of DNA from
laboratory and environmental sources. In combination with
appropriate DNA extraction protocols, PCR allows highly
selective amplification of target DNA. Indeed, the PCR
technique is permeating almost every aspect of biological
research, including many other DNA-based genomics
techniques. As will be shown below, in combination with
various gel electrophoresis techniques such as the denaturing gradient gel electrophoresis (DGGE), amplification
and analysis of the small subunit ribosomal RNA gene
from environmental samples have significantly enhanced
our understanding and appreciation of natural microbial
diversities.

Box 3 (continued)
Representational difference analysis (RDA)
2-D gel electrophoresis
Denaturing gradient gel electrophoresis (DGGE)
Gas chromatography
Mass spectrometry
Bioinformatics

DNA cloning systems
The third highly useful genomics tool for microbial ecological studies is the availability of efficient in vivo cloning
systems (including cloning vectors and hosts). These systems
allow the separation and amplification of individual DNA
sequences from often unknown but heterogeneous gene
pools. A large variety of such systems is now available to
accommodate different types and sizes of DNA fragments.
For example, depending on the size of fragments for cloning,
the vectors may be based on plasmids (optimal range of
DNA fragments 0.5–2 kb; upper limit, 10 kb), bacteriophages (7–10 kb; 20 kb), cosmids or fosmids (35–40 kb;
45 kb), bacterial artificial chromosomes (BAC, 80–120 kb;
200 kb), and yeast artificial chromosomes (YAC, 200–800
kb; 1.5 Mb). Vectors with large insert capacities are ideal
for studying genome organizations of unculturable micro-organisms in the environment. For example, the blooming
field of metagenomics has benefited significantly from the
cosmid, BAC and YAC cloning systems.
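The size guidelines above can be read as a small lookup table. The sketch below transcribes the optimal ranges and upper limits from the paragraph (in kb) and reports which vector types suit a given insert size. The function name and the two status labels are illustrative assumptions, and real vector choice depends on more than insert size.

```python
# (vector, optimal minimum kb, optimal maximum kb, upper limit kb), from the text.
VECTORS = [
    ("plasmid", 0.5, 2, 10),
    ("bacteriophage", 7, 10, 20),
    ("cosmid/fosmid", 35, 40, 45),
    ("BAC", 80, 120, 200),
    ("YAC", 200, 800, 1500),
]


def candidate_vectors(insert_kb):
    """Map each suitable vector to 'optimal' or 'within upper limit'."""
    result = {}
    for name, opt_min, opt_max, limit in VECTORS:
        if opt_min <= insert_kb <= opt_max:
            result[name] = "optimal"
        elif opt_max < insert_kb <= limit:
            # Larger than the optimal range but still below the stated ceiling.
            result[name] = "within upper limit"
    return result


print(candidate_vectors(9))     # a 9 kb fragment
print(candidate_vectors(100))   # a 100 kb fragment
```

Fragments below a vector's optimal range are deliberately not reported as candidates, since the text gives no lower limit other than the optimal minimum.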

Hybridization techniques
Several other traditional DNA analytical techniques have
also been widely used in microbial ecological studies. These
include DNA re-association kinetic analysis and fluorescent
in situ hybridization (FISH). Using fluorescently tagged
specific probes, FISH allows the direct observation and
estimation of micro-organisms from specific species, genera,
families or phyla in a given environmental sample. In contrast, the analyses of DNA re-association kinetics can be
used to provide estimates on the diversity of microbial
genomes in environmental DNA samples.
More recently, the high throughput micro-array technology has been applied to analyse the distributions of
genes and species in natural microbial consortia (Zhou
2003). DNA micro-arrays are glass surfaces to which arrays
of specific DNA fragments of various lengths have been
attached at discrete locations. These fragments serve as
probes for hybridization. Under conditions suitable for
hybridization, the DNA spots on the chip are exposed to a
solution containing a complex sample of fluorescently labelled
DNA. These arrays may contain probes of lengths from
25 to several hundred or even over a thousand base pairs.
While most micro-arrays are derived from single genomes,
arrays containing specific genes from multiple genomes
can also be very useful for studying the distributions and
activities of groups of micro-organisms in nature (Zhou
2003; Lehner et al. 2005).

Representational difference analysis
Because large-scale DNA sequencing is still an expensive
enterprise, for most species, only one or two strains will be
completely sequenced. To study variation among strains
in species with sequenced representatives, a technique
called representational difference analysis (RDA) has
been developed (Lisitsyn et al. 1993). This method combines several molecular techniques such as DNA–DNA re-association, selective PCR, cloning, and DNA sequencing.
This technique is especially powerful for genome size and
gene content comparisons among strains in prokaryotic
species. This is because strains in many prokaryotes vary
widely in their genome sizes and the differences often
contribute to their metabolic and ecological differences
(e.g. Bergthorsson & Ochman 1995, 1998; Table 1; see
also the section 'Unexpected microbial diversity from
environmental sources as revealed by genomics tools' below).

Tools for the analyses of the transcriptome, proteome, and metabolome
Aside from advances in techniques for analysing DNA,
technical breakthroughs for analysing messenger RNA
(mRNA), proteins, metabolites as well as interactions among
these cellular constituents have also become common.
For example, the high throughput micro-array technology
has greatly increased the efficiency of genome-wide gene
expression studies, allowing the analysis of potential
genome–environment interactions of microbial communities in both laboratory and natural settings. Similarly, 2-D
gel electrophoresis, mass spectrometry, and gas chromatography are providing unprecedented access to the
constituents of microbial community proteins and small
metabolites.

Table 1 Genome size and ecological niche comparisons among 250 sequenced prokaryotic genomes (habitat classification and data are based on NCBI information as of August 2005)

Habitat           No. of genomes   Genome size (Mb), mean (± SD)   Genome size (Mb), range
Terrestrial       11               4.92 (± 1.13)                   3.28–7.25
Multiple          65               4.29 (± 1.87)                   1.40–9.12
Aquatic           26               3.14 (± 1.60)                   1.31–7.15
Host-associated   122              2.57 (± 1.64)                   0.49–9.11
Specialized       23               2.29 (± 0.92)                   0.71–5.37
Unknown           3                3.47 (± 2.37)                   0.80–5.31

Bioinformatics
Of all the methods mentioned above, none would have
been successful in microbial ecological research without
bioinformatics tools. Broadly defined, bioinformatics refers
to the use of computers to seek patterns in the observed
biological data and to propose mechanisms for such patterns.
As will be seen below, bioinformatics can not only
help us directly address experimental research objectives but
can also integrate information from various sources and seek
patterns not achievable through experimentation alone.

Genomics tools in ecological genetics studies of cultured microbial populations
This section highlights the impact of DNA-based molecular
techniques on our understanding of microbial diversity
below the species level. I will provide examples in two
specific areas. The first is on how multilocus sequence typing
(MLST) has improved our understanding of microbial
diversity and population structure, with a special focus on
the inferences of the relative roles of clonality and recombination in generating genotype diversity in microbial
populations. The second topic is on how the use of RDA
can help us reveal the tremendous diversity in genome
content among microbial strains.

Multilocus sequence typing
The development of highly affordable, reliable, and efficient
DNA sequencing technology has accelerated many areas
of scientific research. One prominent example is the multilocus sequence typing (MLST) of microbial populations.
As the name suggests, MLST refers to the use of DNA
sequences from multiple regions in the genome for discriminating strains in populations. Though the term was
coined only in 1998 for typing human bacterial pathogens
(Maiden et al. 1998), its use in microbial ecological and
evolutionary analyses dates back more than two decades.
It has various other synonyms such as multiple gene
genealogical analysis (MGGA) or comparative genealogical
analysis (CGA) (e.g. Xu et al. 2000; Xu 2005).
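As an illustration of how sequences from multiple loci discriminate strains, the sketch below numbers the distinct alleles at each locus and then numbers each distinct combination of alleles as a sequence type. This follows the general logic of MLST but is not the procedure of any particular MLST database. The function name, locus names and sequences are hypothetical, and every strain is assumed to have been sequenced at every locus.

```python
def assign_sequence_types(strains):
    """Map each strain to (allelic profile, sequence type number).

    `strains` is a dict: strain name -> {locus name -> DNA sequence}.
    Alleles and sequence types are numbered in order of first appearance.
    """
    loci = sorted({locus for seqs in strains.values() for locus in seqs})
    allele_ids = {locus: {} for locus in loci}   # per locus: sequence -> allele number
    st_ids = {}                                  # allelic profile -> sequence type number
    result = {}
    for strain, seqs in strains.items():
        profile = []
        for locus in loci:
            table = allele_ids[locus]
            # Reuse the allele number if this sequence was seen before, else assign the next one.
            profile.append(table.setdefault(seqs[locus], len(table) + 1))
        profile = tuple(profile)
        result[strain] = (profile, st_ids.setdefault(profile, len(st_ids) + 1))
    return result


# Hypothetical data: three strains typed at two loci.
strains = {
    "s1": {"locusA": "ACGT", "locusB": "GGCC"},
    "s2": {"locusA": "ACGT", "locusB": "GGCC"},
    "s3": {"locusA": "ACGA", "locusB": "GGCC"},
}
typed = assign_sequence_types(strains)
```

Here `s1` and `s2` share one sequence type, while `s3` differs at `locusA` and receives a new one. A single locus (`locusB` alone) would not have separated the three strains.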
There are several advantages of analysing multiple loci
over the analysis of data based on a single locus: (i) it can
generate more information, thus generally more robust
conclusions; (ii) it samples multiple regions of the genome
and thus results are more representative of the whole
genome; and (iii) in many prokaryotes, horizontal gene
transfer is very common and if the selected single gene
happened to have been horizontally transferred, information derived from this gene will not be representative of
other parts of the genome (Xu 2005).
Compared to other types of strain-typing methods
(e.g. multilocus enzyme electrophoresis or MLEE, random
amplified polymorphic DNA or RAPD, amplified fragment
length polymorphisms or AFLP, restriction fragment length
polymorphisms or RFLP, PCR-RFLP and PCR fingerprinting)
that have been applied to analyse microbial populations,
DNA sequence-based typing has many advantages. First,
nucleotides in a DNA sequence are unambiguous. Such
certainty is essential for many analyses. Second, nucleotides
in a given DNA fragment typically share an extended evolutionary history. Such sharing cannot be assumed for
genetic markers in different parts of the genome, such as those
obtained with other methods. Third, DNA sequences can
be easily stored in and retrieved from public databases
such as GenBank. Existence of such public databases makes
data-sharing among investigators possible. Fourth, many
analytical tools for DNA sequences are available. Indeed,
many methods have been developed to infer a variety of
processes governing the changes in populations and
species (Xu 2005).
MLST has been used to study the ecological genetics of
many microbial populations. It provides fine-scale measures
of gene diversity and genotype diversity among microbial
populations. These patterns of diversity have been used to
infer a variety of ecological and evolutionary processes such
as gene flow, cryptic speciation, hybridization, and the
relative importance of clonality and recombination among
analysed populations (Box 4). In human pathogenic
bacteria where much of the initial MLST work was carried
out, MLST allows the identification of medically important
strains and clones. There are several recent topical reviews
for readers interested in MLST of human bacterial pathogens (e.g. Urwin & Maiden 2003; Feil & Enright 2004). In
contrast, other environmentally more relevant groups of
micro-organisms are less researched or discussed.
Using specific examples, the following two subsections
illustrate how MLST has been used to address microbial
ecological questions. The first subsection provides a brief
description on how MLST has been used to address
evolutionary divergence, dispersion, hybridization, and
the origin of a population in a soil basidiomycete fungus,
Cryptococcus neoformans. The second subsection highlights
recent evidence for recombination in natural populations
of viruses, bacteria, protozoa, algae and fungi.

Box 4 MLST is a powerful method to address a variety of ecological issues in microbial populations
Clonality
Recombination
Speciation/historical divergence
Gene flow/dispersion/migration
Hybridization
Niche specialization
Host shifts
Adaptive evolution

MLST in C. neoformans.
C. neoformans (= Filobasidiella neoformans) is a soil fungus
that can cause significant infections in humans and other
mammals throughout the world. This species has been
traditionally classified into five serotypes A, B, C, D, and
AD. To understand the evolutionary relationships among
strains, geographic populations, and serotypes and to address
ecological genetic questions, a series of gene genealogy-based studies were conducted. The first analysed 34 strains
from various locations around the world, including 14
serotype A strains, 7 serotype D strains, 3 serotype B strains,
5 serotype C strains, 3 serotype AD strains and 2 strains
whose serotypes could not be determined (Xu et al. 2000).
Fragments of four genes were analysed for each strain, three
from different chromosomes of the nuclear genome and one
from the mitochondrial genome. Phylogenetic analysis of each
of the four genes indicated considerable divergence among
serotypes A, D, B, and C, suggesting that individual serotypes
A, D, B, and C are good phylogenetic species (Fig. 1).
However, there was little geographic pattern of genetic
variation. No correlation between geographic distance and
DNA sequence divergence among strains was observed
either within a serotype or the whole analysed population.
The results are consistent with recent dispersals of C.
neoformans throughout the world (Xu et al. 2000; Xu 2002).
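The text does not say which statistical test was used to relate geographic distance to sequence divergence. One common choice for comparing two pairwise distance matrices is a Mantel-style permutation test, sketched below purely as an assumption. The function names, the toy matrices and the number of permutations are illustrative.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def upper_triangle(matrix, order):
    """Pairwise values above the diagonal, with strains relabelled by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel(geo, gen, permutations=999, seed=0):
    """Return (observed r, one-sided permutation p-value) for two distance matrices."""
    identity = list(range(len(geo)))
    x = upper_triangle(geo, identity)
    r_obs = pearson(x, upper_triangle(gen, identity))
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)            # shuffle strain labels of one matrix only
        if pearson(x, upper_triangle(gen, order)) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)


# Toy example: four strains on a line, divergence proportional to distance.
geo = [[abs(i - j) for j in range(4)] for i in range(4)]
gen = [[0.01 * abs(i - j) for j in range(4)] for i in range(4)]
r, p = mantel(geo, gen)
```

In this toy case the correlation is perfect by construction. A result like the one reported in the text would instead show a correlation near zero and a large p-value.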
Strains of serotype AD were quite different from those of
serotypes A, B, C, and D. While most strains of
serotypes A, B, C, and D examined so far were haploid,
strains of serotype AD are diploid or aneuploid. Furthermore, direct sequencing of PCR products from serotype
AD strains often failed to obtain clear chromatograms and
DNA sequences. Such results suggested sequence heterogeneity within individual strains. To investigate their origin
and relationships to strains of other serotypes, alleles of
two different genes from strains of serotype AD were
individually cloned, sequenced and compared to strains of
serotypes A, B, C, and D (Xu et al. 2002; Xu & Mitchell
2003). Sequence comparisons revealed that most strains
contained two different alleles with one allele highly similar to the serotype A group and the other to the serotype D
group. Further phylogenetic analyses identified that these
serotype AD strains were recent hybrids between strains
of serotypes A and D, and that there have been multiple
hybridization events in C. neoformans (Fig. 2; Xu et al. 2002;
Xu & Mitchell 2003). A recent study applied the same
MLST method to identify the origin of a Cryptococcus population responsible for an unusual outbreak in animal and
human populations on Vancouver Island, British Columbia,
Canada (Kidd et al. 2005). The analyses suggested that the
Vancouver Island population contained at least two evolutionarily divergent elements shared by strains from many
other geographic areas, consistent with cryptic speciation
and recent migration observed earlier for Cryptococcus (Xu
et al. 2000; Kidd et al. 2005).

Fig. 1 One most parsimonious tree for 34 isolates of Cryptococcus neoformans from each of the four gene regions sequenced. CI, consistency
index; RI, retention index. Numbers above each branch are bootstrap values > 50% and based on 500 replicates. For URA5 and LAC trees,
branches with > 50% of bootstrap values were also strict consensus branches. Strain designation indicates serotype, isolate name, and
geographic origin (CA, California; NYC, New York City; NC, North Carolina, all from the USA). With the exception of five strains (see text),
all major phylogenetic groups correspond to traditional classifications. Of the two serologically untypable strains, one (M0024) clustered
consistently with the serotype D group and the other (M0053) clustered consistently with the serotype A group. Two of the three strains of
serotype AD, CN110.97 and CN196.88, clustered consistently with the serotype A group, while the other (KW5) lacked a consistent affinity
with any of the serotypes. Scale bar represents one nucleotide substitution. (Xu et al. 2000). Reproduced by permission.

Clonality and recombination in microbial populations.
All microbes can reproduce asexually and generate clones
and clonal lineages. As expected, in natural populations of
all microbial species examined (including viruses, bacteria,
protozoa, algae and fungi), signatures of clones and clonal
lineages are commonly found. These population genetic
signatures include (i) limited or no genetic variation
among individuals, (ii) over-representation of certain
genotypes, and (iii) significant associations among alleles
located on the same or different genomic regions (Xu 2004).
While clonal reproduction is expected and commonly
observed in natural microbial populations, the importance
of recombination has been rather obscure (e.g. Lenski 1993;
Maynard Smith et al. 1993; Feil & Enright 2004). Unlike plants
and animals where sexual reproduction (hence recombination)
can often be observed directly in nature, recombination in
natural populations of micro-organisms has to be inferred
using gene and genotype frequencies. The key notion of
this inference is that in purely clonal populations, alleles
from genes in different parts of the genome should give
identical evolutionary patterns among individuals in the
population and that these alleles should be in significant
linkage disequilibria. In contrast, recombination would break
up these associations and generate linkage equilibrium.
Using MLST, congruent genealogies for genes distributed
in diverse genomic locations would be consistent with clonality, whereas incongruent genealogies would suggest recombination.
Over the past two decades, numerous studies have confirmed that genetic recombination is ubiquitous in natural
populations of viruses, bacteria, protozoa, algae, and fungi.
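The inference described above rests on testing whether alleles at different loci are associated. As a minimal sketch, assuming haploid isolates typed at two biallelic loci, the standard pairwise linkage disequilibrium coefficient D and its normalized form r² can be computed as follows. The function name and the two toy data sets are hypothetical and are not taken from any study cited here.

```python
def linkage_disequilibrium(haplotypes):
    """Return (D, r_squared) for two biallelic loci.

    `haplotypes` is a list of (allele_at_locus_1, allele_at_locus_2)
    pairs, one pair per haploid isolate. The alleles carried by the
    first isolate are used as the reference alleles.
    """
    n = len(haplotypes)
    a_ref, b_ref = haplotypes[0]
    p_a = sum(1 for a, _ in haplotypes if a == a_ref) / n
    p_b = sum(1 for _, b in haplotypes if b == b_ref) / n
    p_ab = sum(1 for a, b in haplotypes if a == a_ref and b == b_ref) / n
    d = p_ab - p_a * p_b  # D = p(AB) - p(A) * p(B)
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d * d / denom if denom > 0 else 0.0
    return d, r_squared


# Hypothetical clonal sample: only two multilocus genotypes are present.
clonal = [("A", "B")] * 10 + [("a", "b")] * 10

# Hypothetical recombining sample: all four combinations equally common.
recombining = ([("A", "B")] * 5 + [("A", "b")] * 5 +
               [("a", "B")] * 5 + [("a", "b")] * 5)

print(linkage_disequilibrium(clonal))       # complete association
print(linkage_disequilibrium(recombining))  # no association
```

In the clonal toy sample the two loci are completely associated (r² = 1), whereas in the recombining sample D and r² are both zero, which is the pattern expected at linkage equilibrium. Real analyses use many loci and a significance test; this sketch shows only the underlying quantity.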

Fig. 2 One of the 10 most parsimonious
trees for the 28 LAC sequences from 14 strains
of serotype AD in Cryptococcus neoformans.
For comparison, five representative sequences
from serotype A (E1, CN-A, MMRL750, J10
and ZG280) and five from serotype D (B10,
CN-D, J9, MMRL751 and MMRL757) were
included in this figure. These 10 sequences
were shown in Fig. 1 and represented the
genetic diversity of serotypes A and D
strains. Numbers above branches are
bootstrap values > 50% and based on 1000
replicates. Designations for strains of serotypes A and D included the isolate name,
geographic origin (CA, California; NYC,
New York City, both in the USA), and
serotype. For the 28 serotype AD sequences,
strain designations are followed by 1 or
2 to indicate the two alleles within each
strain. Midpoint rooting is used for this
phylogeny but the tree topology is identical
to that when serotype B or C sequences were
used as outgroups. Scale bar represents
one nucleotide substitution (Xu et al. 2002).
Reproduced by permission.

Box 5 Genomic studies suggest all microbial populations have a clonal component. However, signatures
of recombination are pervasive in natural populations of viruses, bacteria, fungi, algae and protozoa.
Despite significant efforts, no ancient asexual microbes
have been convincingly demonstrated.

Indeed, despite extensive investigations, and while the
frequencies of recombination have been difficult to quantify, no ancient asexual microbial populations or species
have been found (Box 5). Below are a few recent examples
of genetic recombination identified in natural populations
of representative groups of micro-organisms using MLST
or whole genome sequences.
Recombination in viral populations. One of the best-known
examples of viral sexuality is probably that of the influenza
A virus, the causal agent of the common human flu. This
virus has a genome with eight segments of single-stranded
RNA. When co-infection of different viral strains occurs, a
large number of recombinant influenza A viruses can be
produced. These recombinants generate antigenic shifts
and have been credited for some of the deadliest flu
epidemics in recent human history (Capua & Alexander
2002). Recombination has also been observed in many
bacteriophages (Hendrix 2003), plant viruses (Keese &
Gibbs 1993), and animal and human viruses. Examples of
human viruses exhibiting recombination include, but are
not limited to, the dengue virus (Tolou et al. 2001), the
human immunodeficiency virus (Yamaguchi et al. 2003),
and the hepatitis B virus (Miyakawa & Mizokami 2003).
Recombination in prokaryotes. MLST has revealed abundant
evidence for recombination in natural populations of
prokaryotes. Some of the well-known examples include
the common human pathogens Escherichia coli, Neisseria
meningitidis, Streptococcus pneumoniae, Haemophilus influenzae,
and Staphylococcus aureus (e.g. Feil & Spratt 2001). Different
degrees of recombination were detected in populations
of these species, with E. coli and H. influenzae showing
relatively low rates of recombination while N. meningitidis,
Str. pneumoniae, and Sta. aureus showed high rates (Feil &
Spratt 2001). In bacteria that are not human pathogens, such as the
nitrogen-fixing bacterium Sinorhizobium meliloti, evidence
for recombination is also pervasive (Sun S. and Xu J.,
unpublished). Recent comparative analysis of whole prokaryotic genomes identified that bacterial recombination
often extends beyond the traditional species boundary.
This phenomenon is commonly referred to as horizontal
gene transfer or lateral gene transfer and includes genetic
exchange between species from different genera, families,
and occasionally across kingdoms and/or domains. Indeed,
signatures of horizontal gene transfer are ubiquitous
among the sequenced prokaryotic genomes (e.g. Koonin
2003).
Recombination in eukaryotic microbes. Similar to observations
in natural viral and bacterial populations, molecular
investigations have identified that almost all eukaryotic
microbial populations show signatures of recombination
in nature. Examples include those from the algal species
Bostrychia moritziana (West & Zuccarello 1999); pathogenic
protozoan species such as Trypanosoma cruzi (the causal
agent of Chagas disease; Bogliolo et al. 1996) and
the malaria parasites Plasmodium falciparum (Conway et al.
1999) and Plasmodium vivax (Putaporntip et al. 2002); and fungal
species such as C. neoformans mentioned above (Xu &
Mitchell 2003). Interestingly, many of the fungal species
previously thought to reproduce only asexually (the
Deuteromycota or Fungi Imperfecti) have been found to
contain signatures of recombination in natural populations
(Xu 2005). Among examined fungi, the degrees of sexuality
differ greatly, from panmictic to largely clonal (James 2005;
Pujol et al. 2005; Xu et al. 2005). At present, plant and human
pathogens dominate the examined species in the literature.
However, limited evidence from other groups of fungi
suggests a similar pattern: abundant evidence for clonality
and limited but unambiguous evidence for recombination
(James 2005; Pujol et al. 2005; Xu et al. 2005).

RDA in analysing prokaryotic gene content differences among strains
Variations in genome sizes among strains within and
between species are common in bacteria. For example, the
genomes of natural isolates of the common bacterium
Escherichia coli can vary by more than 1 Mb (Parkhill &
Thomson 2004). Among the serotypes of another common
bacterium, Salmonella enterica (var. enteritidis; var. paratyphi;
var. typhi, and var. typhimurium), chromosome sizes can
differ by 300 kb (Parkhill & Thomson 2004). Among the
sequenced prokaryotic species, the genome sizes vary by
over 18-fold, from the obligate archaeal parasite Nanoarchaeum equitans that has a genome size of about 490 kb
(Waters et al. 2003) to the soil bacterium Streptomyces
avermitilis that has a genome size of over 9000 kb (Omura
et al. 2001). The genomic differences among sequenced
bacterial species will be discussed in a later section. In this
section, the focus is on analysing the naturally occurring
differences among bacterial strains within species. To date,
the focus has been on human pathogenic bacteria,
including N. meningitidis (Bart et al. 2000), Neisseria gonorrhoeae
(Tinsley & Nassif 1996), Vibrio cholerae (Calia et al. 1998),
Bordetella spp. (23), and E. coli (Allen et al. 2001). Using RDA
and downstream functional characterization, many strain-specific
genes in the above-mentioned species were found
to play important roles in ecological adaptations such as
host specificity, nutrient utilization, stress tolerance, pathogenicity, and antibiotic resistance. Below, I describe a recent
example of using RDA to analyse genome size and gene
content variation among strains of a nitrogen-fixing soil
bacterium Sinorhizobium meliloti (Guo et al. 2005).
The sequenced Si. meliloti strain Rm1021 has a tripartite
genome structure with one chromosome (3.65 Mb) and two
megaplasmids pSymA (1.35 Mb) and pSymB (1.68 Mb).
Using the RDA method (Fig. 3), a large number of novel
DNA sequences not present in the sequenced laboratory
model strain Rm1021 of Si. meliloti were identified. In this
study, we used strain Rm1021 as the driver and the type
strain of Si. meliloti ATCC9930, which has a genome size
370 kb bigger than strain Rm1021, as the tester. Among
the 85 novel DNA fragments examined, 55 showed no
obvious homologues anywhere in the public databases. Of
the remaining 30 sequences, 24 contained homologues to the
Rm1021 genome as well as unique segments not found in
the Rm1021 genome; 3 contained sequences homologous
to those published for another Si. meliloti strain but absent
in Rm1021; 2 contained sequences homologous to other
symbiotic nitrogen-fixing bacteria, Rhizobium etli and
Bradyrhizobium japonicum and 1 contained a sequence with
an 87% sequence identity to the 6-aminohexanoate-dimer
hydrolase gene on the plasmid of Pseudomonas sp. NK87.
Interestingly, this protein was found to be capable of degrading
nylon oligomers (Yomo et al. 1992; Kanagawa et al. 1993).
Nylon oligomers are among the compounds not present in
natural environments until synthesized and released by
humans very recently. The distribution of 12 of the above
85 novel sequences among a collection of 59 natural Si. meliloti
strains was further analysed using PCR. The proportion of strains carrying a given fragment
varied widely among the 12 novel DNA fragments, from
1.7% to 72.9% (Guo et al. 2005). Our recent experiments
show that micro-arrays fabricated based on the genome
sequence of model strains can also be used very effectively
to examine the distributions of genes among strains (Fig. 4;
Guo & Xu, unpublished; Box 6). The exact ecological roles
of some of these sequences are being examined.
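The tallies reported above can be checked with a few lines of arithmetic. The sketch below uses only the counts and percentages given in the text; the conversion of the two reported percentages into whole numbers of strains (out of 59) is an inference made here for illustration and is not stated in Guo et al. (2005). The category labels and function names are hypothetical.

```python
# Classification of the 85 novel RDA fragments, as reported in the text.
fragment_classes = {
    "no obvious database homologue": 55,
    "partly homologous to Rm1021": 24,
    "homologous to another Si. meliloti strain": 3,
    "homologous to other nitrogen-fixing bacteria": 2,
    "homologous to a Pseudomonas plasmid gene": 1,
}
total_fragments = sum(fragment_classes.values())


def strains_from_percent(percent, n_strains=59):
    """Nearest whole number of strains implied by a reported percentage."""
    return round(percent / 100 * n_strains)


def percent_of_strains(k, n_strains=59):
    """Percentage of strains (one decimal place) carrying a fragment."""
    return round(100 * k / n_strains, 1)


print(total_fragments)
for pct in (1.7, 72.9):
    k = strains_from_percent(pct)
    print(pct, "->", k, "of 59 strains ->", percent_of_strains(k), "%")
```

Under this reading, the rarest of the 12 fragments was detected in a single strain and the most common in 43 of the 59 strains, which shows how unevenly such accessory sequences can be distributed within a species.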

Unexpected microbial diversity from environmental sources as revealed by genomics tools
In this section, I will focus on how modern genomics tools
are helping us to reveal microbial diversity in natural
microbial communities. Until very recently, microbial
diversity in the environment was estimated using culture-dependent approaches. However, for two reasons, the
culture-dependent methods cannot accurately describe
naturally occurring microbial communities. First, our
current culturing methods target only those we know how

Fig. 3 Overview of the representational
difference analysis of genomic differences
between strains of Sinorhizobium meliloti
(modified from Guo et al. in press). Tester
(T): ATCC9930. Driver (D): Rm1021. Filled
black boxes: DNA adaptors. Unfilled boxes:
tester DNA. Shaded boxes: driver DNA.

Fig. 4 Application of micro-array in the analysis of genomic differences between strains of Sinorhizobium meliloti. In this figure, red
represents hybridization signal from one strain; green represents hybridization signal from a different strain; and yellow represents that
both strains have the probe sequence. In each of the four subarrays, there are three vertically divided repeats. As can be seen from the arrays,
the repeatability of micro-array screening for gene content differences among strains is high.

Box 6 Representational difference analysis (RDA) and
micro-array technology are powerful methods for
discovering whole genome differences among natural
prokaryotic strains.

to culture. For most unknown micro-organisms, we simply
don't know how to grow them. Second, even among
culturable micro-organisms, the diversity observed on
standard microbiological media may not be representative
of that in nature. This is because, while thousands of
media and growth conditions have been developed over
the years to culture various micro-organisms, very few
researchers have the facilities or manpower to test
all of these conditions on natural microbial samples. The
application of culture-independent genomics tools in the
last two decades is allowing more accurate estimations.
Below, I provide a summary to show how four specific
methods (phylogenetic analysis of the ribosomal RNA
(rRNA) genes, DNA–DNA re-association kinetics, metagenomics, and micro-arrays) have been used to reveal microbial
diversity in natural environments.

Phylogenetic analyses of environmental ribosomal RNA


The use of culture-independent methods to estimate
microbial diversity in the environment started in the
mid-1980s (Pace et al. 1985). The initial scheme involved
isolating total DNA directly from the environment, cloning
the DNA using vectors such as bacteriophage-lambda, and
screening for clones that hybridized to the rRNA probes,
and sequencing the positive clones. Many types of rRNA
sequences not present among cultured microbes from
the same samples were identified. The incorporation of
gene-specific PCR before the cloning step in the late 1980s
significantly streamlined the procedure and allowed more
direct estimation. The very first application of PCR in
phylogenetic analysis of mixed microbial communities in
ocean waters led to the discovery of ubiquitous and
abundant groups of new micro-organisms (Giovannoni
et al. 1990). In addition, this study identified significant
genetic microheterogeneity among closely related phylogenetic types. Since the beginning of the 1990s, there has
been widespread application of PCR-based analyses of 16S
rRNA to examine mixed microbial communities in diverse
environments.
Phylogenetic comparisons of rRNA genes from environmental sources have led to the discovery of many novel
microbial taxonomic groups. Indeed, many new major groups
of micro-organisms have been found only through culture-independent surveys. The following sections highlight
recent progress for the major microbial groups (Box 7).
Bacteria. In 1987, based on rRNA sequence data, Woese
identified 12 major divisions (phyla) in the Domain Bacteria.
The analysed bacteria represent almost all major cultured
groups of Bacteria accumulated during the previous
century of microbiological research. In just over a decade,
culture-independent surveys identified that there are at
least 40 well-resolved major bacterial divisions. That is,
there are about 30 major bacterial divisions with no or very
few cultured representatives in our collection (Hugenholtz
et al. 1998; Konstantinidis & Tiedje 2004). These discoveries
are now guiding a coordinated effort by the microbiology
community to culture representatives from many of the

Box 7 Genomic analyses of natural microbial communities are revealing extremely rich and highly
variable DNA sequences from forest soils, pastures, and
aquatic habitats, in both pristine and contaminated environments. Bioinformatic analyses of such
sequences suggest the existence of many uncultured
taxonomic groups of viruses, bacteria, archaea, fungi
and protozoa.

unknown major divisions of Bacteria in order to study
their genetic, physiological and ecological properties.
Archaea. The culture-independent methods have also
revealed major new types of Archaea. At present, there are
about 300 cultured and named archaeal species, primarily
belonging to phylum Euryarchaeota, with a few examples
from phylum Crenarchaeota, one from Nanoarchaeota
and none from Korarchaeota. Schleper et al. (2005) compiled
over 8000 deposited archaeal rRNA gene sequences from
various natural environments. Phylogenetic analyses suggested that Domain Archaea contains at least 50 distinct
phylogenetic groups with 33 from the current Euryarchaeota, 13 from Crenarchaeota, 1 each from Korarchaeota,
Nanoarchaeota, and the ancient archaeal group (AAG). The
divergence among these phylogenetic groups is similar to
that among many bacterial phyla. Among these 50 phylogenetic groups, only 13 have cultured representatives.
In addition, before the application of culture-independent
methods, Archaea were thought to be present only in
extreme habitats. Recent investigations have identified
that Archaea are also widespread in diverse nonextreme
habitats such as gardens and forests, and water and sediments
of marine environments and freshwater lakes, as well as in extreme habitats
such as hot springs, saline lakes and deep-ocean thermal
vents (Black Smokers). For example, in the marine environment at depths of 100–5000 m, the average Archaea density is
about 1 × 10⁵/mL, accounting for about 20% of all microbial
cells in the ocean (Karner et al. 2001). In 2002, a tiny archaeon
appropriately called Nanoarchaeum was reported. This archaeon was found to live in an obligate association with another
archaeon in the genus Ignicoccus. Phylogenetic analysis
indicated that Nanoarchaeum has diverged significantly from
all known archaeal rRNA sequences (Huber et al. 2002).
However, it should be pointed out that a recent phylogenetic
analysis using ribosomal protein gene sequences from many
archaeal species suggested significant uncertainty in the placement of Nanoarchaeum in the tree of life (Brochier et al. 2005).
Eukaryotic microbes from anoxic environments. The most
deeply divergent of known eukaryotic lineages are found
in anaerobic or micro-aerobic environments. Ecologically
and evolutionarily, this group of organisms is also the
least known among eukaryotes. Anoxic environments have
existed throughout the history of Earth. Therefore, such
environments may harbour unknown diversity of eukaryotic
microbes. Indeed, Dawson & Pace (2002) identified a very
high eukaryotic diversity from both marine and freshwater sediments. Their analysis identified seven major
phylogenetic lineages distinctly different from all known
eukaryotic kingdoms such as fungi, plants and animals.
Fungi. Approximately 80 000 fungal species have been
identified and named, and these species are grouped into
five main phyla: Chytridiomycota, Zygomycota, Glomeromycota, Basidiomycota, and Ascomycota (Moncalvo 2005).
Several recent studies of environmental DNA identified
major groups of unexpected fungal diversity in a variety of
environments. For example, in the analysis of fungal DNA
from the roots of the grass Arrhenatherum elatius, Vandenkoornhuyse et al. (2002) found 49 unique phylotypes from
a random library of 200 18S rRNA clones. Surprisingly,
only 7 of the 49 were found closely related to known sequences (> 99% identity). They found five distinct lineages
significantly different from all known fungal sequences (in
a pool of over 1200 at their time of analysis). In another
study by Schadt et al. (2003), culture-independent methods
were used to assess the seasonal dynamics of fungal
diversity in tundra soil in Colorado. Results revealed three
major groups of fungi significantly different from existing
classes and phyla. Their results also demonstrated that
fungi account for the majority of the biomass under snow
in the analysed environment (Schadt et al. 2003). Results
from these and other fungal community studies suggest
that there are likely over 1.5 million species of fungi in
Earth's biosphere, about 20 times the number of currently
named fungal species.
Viruses. Viruses are extremely abundant in natural environments. They contribute significantly to both prokaryote
and eukaryote population dynamics. Recent culture-independent studies have shown that both DNA-based and
RNA-based viruses are common in terrestrial as well as
freshwater and marine environments (Edwards & Rohwer
2005). For example, in an analysis of picorna-like viruses (a
group of positive-sense single-stranded RNA viruses that
are major pathogens to plants and animals), Culley et al.
(2003) identified high, unexpected diversity in the sea.
Indeed, all of the picorna-like sequences from marine
samples were different from known picorna-like viruses
in the databases. Of specific note is a virus isolated in this
study that is a lytic pathogen to a toxic-bloom-forming alga
Heterosigma akashiwo. This result suggests that picorna-like
viruses may be important contributors in the regulation of
marine phytoplankton population dynamics.

DNA–DNA re-association kinetics


DNA–DNA re-association kinetics have long been used to
determine the overall genomic relationships between
organisms. The current operational definition of a bacterial
species, based on 70% DNA–DNA hybridization, is rooted in these
kinetics. During DNA–DNA re-association, complementary
single-stranded DNA molecules re-anneal to each other to form double
strands, and the rate of re-annealing is positively correlated
to the degree of similarity. Torsvik et al. (1998) extended this
principle to analyse the complexity of environmental DNA
samples. The basic idea is that more complex environmental
DNA will take longer to re-anneal. The rate of re-association
can be compared with that of samples of known complexity, such as the
Escherichia coli genome, to derive the total genomic complexity of environmental DNA. They found that estimates
of environmental genome complexity derived from DNA–DNA
re-association kinetics were about 100 times higher
than those derived from laboratory culture estimates (Torsvik
et al. 2002). This result is similar to the comparison between
phylogenetic methods based on fluorescent in situ hybridization (FISH) using signature prokaryotic sequences and
culture-dependent method (Torsvik et al. 2002). Their
analyses identified that terrestrial environments generally
contain higher genome complexity than aquatic sediments.
Among the three terrestrial niches compared, while the
number of prokaryotic cells per cubic centimetre of soil is
similar among them (about 10 billion), the pristine pasture
and forest soils contain over 10 times the genome complexity (equivalent to 3500–8800 E. coli genomes) of
the agricultural field soils (equivalent to 140–350 E. coli
genomes) (Torsvik et al. 2002). Recently, improved analytical
methods showed that in fact, more than 1 million distinct
genomes might exist in the above-mentioned pristine soil,
exceeding previous estimates by two orders of magnitude
(Gans et al. 2005). Furthermore, it was estimated that metal
pollution could reduce the genomic diversity of pristine
environments by more than 99.9%, revealing the highly
toxic effect of metal contamination, especially for rare
microbial taxa (Gans et al. 2005).
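The principle behind these estimates can be illustrated with the textbook idealization of re-association as a second-order reaction, in which the fraction of DNA remaining single-stranded is C/C0 = 1/(1 + k·C0t) and the half-re-association value C0t½ = 1/k scales with sequence complexity under fixed conditions. The sketch below is only that idealization and does not reproduce the fitting procedures of Torsvik et al. or Gans et al.; the rate constants are hypothetical values chosen for illustration.

```python
def fraction_single_stranded(c0t, k):
    """Ideal second-order re-association: C/C0 = 1 / (1 + k * C0t)."""
    return 1.0 / (1.0 + k * c0t)


def cot_half(k):
    """C0t value at which half of the DNA has re-annealed."""
    return 1.0 / k


def genome_equivalents(cot_half_sample, cot_half_reference):
    """Complexity of a sample expressed in reference-genome equivalents,
    assuming C0t(1/2) is proportional to complexity."""
    return cot_half_sample / cot_half_reference


# Hypothetical rate constants (arbitrary units), not measured values.
k_reference = 0.25          # single reference genome, e.g. E. coli
k_soil = k_reference / 5000  # a soil DNA sample that re-anneals 5000x slower

ref_half = cot_half(k_reference)
soil_half = cot_half(k_soil)
print(fraction_single_stranded(ref_half, k_reference))  # half re-annealed
print(genome_equivalents(soil_half, ref_half))          # about 5000
```

With these made-up constants the soil sample corresponds to roughly 5000 reference-genome equivalents, which falls within the range quoted above for pristine soils. The later re-analysis by Gans et al. (2005) shows that the result depends strongly on the assumed species-abundance distribution, which this simple ratio ignores.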

Metagenomics
Metagenomics refers to the study of the collective genomes
in an environmental community. Such a community may
be a soil or a marine water sample that contains substantially more genetic information than is available in the
cultured subset. Studies of metagenomes typically involve
cloning fragments of DNA isolated directly from microbes
in natural environments, followed by sequencing and
functional analysis of the cloned fragments. While most of
the techniques for metagenomics have existed for quite
some time and are used routinely in molecular biology
research, their application in analysing unknown environmental DNA samples has opened a floodgate of exciting
research findings.
The phylogenetic analysis of environmental microbial
diversity was an early form of metagenomics. Over the
years, several significant trends for metagenomic studies
have emerged. First, the cloned DNA fragments have been
getting larger and larger in attempts to clone long stretches
of DNA from the same genome to allow the study of the
structure and function of potentially whole unknown/
uncultured genomes in the environments. Such an objective has propelled the development of new DNA isolation
methods as well as improved cloning systems. At present,

Box 8 Metagenomic studies have identified many
novel microbial genes coding for metabolic pathways
such as energy acquisition, carbon and nitrogen
metabolisms in natural environments that were previously considered to lack such metabolisms.

the bacterial artificial chromosome vector system is the
most commonly used for metagenomic studies. Second,
the study sites have expanded tremendously. At present,
metagenomic libraries and DNA sequence information
exist for microbial communities from many of the world's
ecological niches. Third, the number of sequences generated
in individual studies has been increasing. For example, a
recent study of a marine environment obtained over 1.6 billion base pairs of DNA
sequence, of which about 1.045 billion were nonredundant
(Venter et al. 2004) (Box 8). Below I
will briefly review and discuss recent metagenomic
studies of microbial communities from the ocean, soil, and
an acid mine drainage.
Metagenomic analysis of marine microbial communities.
Marine microbial communities were among the first to be investigated using culture-independent genomics approaches
(Giovannoni et al. 1990). Marine microbial communities
are complex and contain heterogeneous micro-organisms
including viruses, bacteria, archaea, and eukaryotic micro-organisms. Because of the size differences among these
groups of organisms, typical studies use filters to first
select the target size category of microbes. Phylogenetic
analyses have identified numerous novel DNA sequences
and phylogenetic groups in all groups of organisms
surveyed. In combination with other genomics tools, these
studies have led to other important discoveries. Two
specific studies are highlighted below.
In a classical metagenomic study of genome fragments
from a BAC library of marine picoplankton, Beja et al. (2000)
identified a new class of genes of the rhodopsin family,
named proteorhodopsin, from an uncultivated gammaproteobacterium SAR86. At that time, this rhodopsin
family was known to exist only in extremely halophilic
(salt-loving) archaea and had never before been observed
in cultured bacteria. Unlike the archaeal rhodopsin that
does not express properly in model laboratory strains, the
proteorhodopsin gene from SAR86 expressed readily in the
laboratory model bacterium E. coli and it functioned as a
light-driven proton pump. Later studies identified that this
new type of light-driven energy generation process is in
fact widespread in the ocean and that there are optimized
absorption spectra of bacterial rhodopsins at different
depths of ocean water (Beja et al. 2001). In addition to this
form of light energy harvesting, the widespread importance
of oxygenic phototrophy in the ocean has been confirmed
by metagenomic studies, and another form of phototrophy,
anoxygenic phototrophy, which was previously regarded as
playing only a minor role in ocean water productivity has
also been found to be very common in ocean surface
waters (Beja et al. 2002).
One of the most extensive microbial metagenomic
studies in the ocean was the shotgun sequencing of micro-organisms with sizes ranging from 0.1 to 3.0 µm in the Sargasso
Sea in the Atlantic Ocean near Bermuda (Venter et al. 2004).
Their study generated almost 2 million sequence reads,
yielding over 1.6 billion base pairs of raw DNA sequence.
Based on sequence relatedness and unique rRNA gene
counts, the analysis suggested that these DNA fragments
were derived from at least 1800 genomic species including
148 previously unknown bacterial phylogenetic types.
Their analysis also identified spatial variation in species
richness and relative abundance among the four sampled
sites (Venter et al. 2004). Computational analysis of the
data identified over 1.2 million potential unique protein
coding genes. This number is astonishing considering that
at that time, only about 140 000 protein data entries were
available in the curated SwissProt protein database. Among
the 1.2 million potential protein-coding genes, at least 782
new rhodopsin-like photoreceptors were identified, confirming the importance of this type of phototrophy in the open
sea. Of the specific group of micro-organisms identified,
one stood out. This organism, most likely a member of the
genus Burkholderia, had 21-fold coverage and comprised
38.5% of the sequence data from one of the four samples.
Burkholderia is typically found in terrestrial environmental
samples and the identification of a species in this genus in
the sea at such a high frequency led the authors to suggest
that terrestrial environments or coastal animals might play an
important role in marine microbial community structure.
However, based on several lines of evidence, DeLong (2005)
recently suggested that the high abundance of Burkholderia-like
sequences in one sample might be due to contamination
of the original water sample in the Venter et al. (2004) study.
Such a revelation suggests that extreme caution should be
taken when conducting microbial metagenomic analysis.
Nevertheless, the reconstruction of complete genomes based
on shotgun sequencing of environmental microbial community DNA demonstrated the power of this approach for
future microbial ecology research.
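The scale of the Sargasso Sea data set is easier to appreciate with some simple arithmetic. The sketch below uses the rounded figures quoted above (about 2 million reads and 1.6 billion base pairs; 1.2 million predicted genes against about 140 000 SwissProt entries) and the usual definition of mean fold coverage as bases sequenced divided by genome size. The genome size and read count in the final example are hypothetical, chosen only to show how a 21-fold coverage could arise; they are not values from Venter et al. (2004).

```python
def mean_read_length(total_bp, n_reads):
    """Average read length in base pairs."""
    return total_bp / n_reads


def fold_coverage(n_reads, read_length, genome_size):
    """Mean sequencing depth: bases sampled from a genome / genome size."""
    return n_reads * read_length / genome_size


# Rounded figures from the text.
avg_len = mean_read_length(1_600_000_000, 2_000_000)

# Predicted protein-coding genes relative to SwissProt entries at the time.
gene_ratio = 1_200_000 / 140_000

# Hypothetical illustration: 210 000 reads of 800 bp drawn from a single
# 8 Mb genome (both numbers are assumptions, not reported values).
depth = fold_coverage(n_reads=210_000, read_length=800, genome_size=8_000_000)

print(avg_len, round(gene_ratio, 1), depth)
```

On these rounded inputs the reads average about 800 bp, and the single study predicted roughly 8.6 times as many protein-coding genes as the curated database then held.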
Metagenomic analysis of soil microbial communities. Though
microheterogeneity in aquatic environments has been
found, its complexity pales in comparison with that of soil environments.
Typical soil comprises mineral particles of different sizes,
shapes, and attached organic compounds such as humus.
The structural and chemical composition of soil determines its physical–chemical properties such as water-holding capacity, surface-to-volume ratio (hence oxygen
availability) within the soil, pH and the availability of
various nutrients. In addition, unlike aquatic habitats, soil
surfaces may undergo dramatic daily or seasonal cyclic
changes in their physical–chemical properties. Such spatial
and temporal environmental microheterogeneity poses
significant challenges for microbial ecologists. However,
recent investigations, especially those based on culture-independent approaches, are revealing the amazing diversity of micro-organisms in the soil.
Many studies of soil microbial diversity have been
carried out. Based on a variety of culture-independent
methods, current estimates indicate that a single gram of
soil may contain over 10 billion microbial cells representing several thousand to over a million distinct genomic
species (e.g. Torsvik et al. 2002; Gans et al. 2005). This number
is remarkable given that the total number of known
prokaryotes listed in the website of the National Center
for Biotechnology Information is about 17 000 (including
uncultured prokaryotes). Comparisons of culture-dependent
and culture-independent methods revealed that in most soil
environmental samples, only 0.1–1% of microbial species
are cultured by standard microbiological methods. Therefore, a tremendous amount of microbial genetic, physiological and metabolic diversity in the soil remains to be
discovered and explored. Significant efforts are underway
to clone and analyse the soil metagenome diversity. Daniel
(2005) summarized the studies of soil metagenomic libraries
constructed to date. These libraries include soil samples
from a variety of ecological niches, including meadows,
crop fields, and forests.
Functional analyses of the soil metagenome are typically
conducted by one of two approaches. The first is based on
nucleotide sequences using either PCR or target-specific
probes to screen the soil metagenome library. This approach
has been used successfully to clone genes with highly
conserved domains, e.g. the gluconic acid reductase, an
essential enzyme during glucose metabolism (Eschenfeldt
et al. 2001). The second approach is based on functional
screening for metabolic activity of metagenomic clones.
Several novel genes coding for proteases, lipases, amylases,
agarases, alcohol oxidoreductases, antibiotics, and antibiotic resistance have been found through this screening
(Voget et al. 2003). Some of these products hold great
commercial potential and are actively pursued by biotechnology companies.
Metagenomic analysis of a microbial community from an acid
mine drainage. Acid mine drainages are seminatural
environments rich in extremophiles. These drainages are
created as a result of mining and the exposure of predominantly ferrous iron in pyrite (FeS2) to the oxygen-rich
atmosphere. Iron is one of the most abundant elements in
Earth's crust and exists naturally in two oxidation states,
ferrous (Fe2+) and ferric (Fe3+). In nature, these two forms
cycle as a result of reduction and oxidation by micro-organisms and by abiotic geochemical processes. The
reduction of Fe3+ to Fe2+ occurs in anoxic environments (e.g.
bogs and waterlogged soil) by bacteria such as Shewanella
putrefaciens, with organic compounds in these environments
acting as the electron donor. In contrast, the oxidation
occurs in oxygenated environments with O2 as the electron
acceptor. Though the released energy is small during
oxidation, several groups of chemolithotrophic organisms
(e.g. Acidithiobacillus ferrooxidans and Leptospirillum
ferrooxidans) can actively participate in the reaction and
thrive in such environments by oxidizing a large amount
of ferrous iron. Because pyrite (FeS2) is one of the most
common forms of iron in nature, the oxidation of pyrite
will release large amounts of sulphate (SO₄²⁻) and sulphuric
acid, allowing the development of acid conditions in the
surrounding environment with pH values as low as 0.
Mixing of acidic mine water with natural waters in rivers
and lakes causes major environmental problems.
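The acid-generating chemistry described above is commonly summarized by the overall reactions below. These are standard textbook stoichiometries, not equations given in the studies reviewed here, and they are offered only as a sketch of why pyrite oxidation acidifies the surrounding water.

```latex
\begin{align}
\mathrm{FeS_2} + \tfrac{7}{2}\,\mathrm{O_2} + \mathrm{H_2O}
  &\rightarrow \mathrm{Fe^{2+}} + 2\,\mathrm{SO_4^{2-}} + 2\,\mathrm{H^+} \\
4\,\mathrm{Fe^{2+}} + \mathrm{O_2} + 4\,\mathrm{H^+}
  &\rightarrow 4\,\mathrm{Fe^{3+}} + 2\,\mathrm{H_2O} \\
\mathrm{FeS_2} + 14\,\mathrm{Fe^{3+}} + 8\,\mathrm{H_2O}
  &\rightarrow 15\,\mathrm{Fe^{2+}} + 2\,\mathrm{SO_4^{2-}} + 16\,\mathrm{H^+}
\end{align}
```

The second reaction is the ferrous iron oxidation carried out by the chemolithotrophs mentioned above. The ferric iron it regenerates can attack further pyrite in the third reaction, which releases 16 protons for each pyrite unit oxidized.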
The metagenomic analyses of a single biofilm sample
from an acid mine drainage from the Richmond Mine at Iron
Mountain, California, have provided important insights
into the microbial community structure (Tyson et al. 2004).
From the 78 Mb of sequence obtained from this sample, the
genomes of the dominant species were reconstructed. These
included the dominant bacterium Leptospirillum group II
(10X coverage) and the dominant archaeon, Ferroplasma
acidarmanus (also 10X coverage). Ferroplasma is a group of
cell wall-less prokaryotes. These two species were also
found to be dominant in this community by other analytical
methods. In addition to the above two genomes, other
reconstructed partial genomes were also identified,
including that of a group III Leptospirillum (3X coverage),
and an unknown species in the genus Sulfobacillus (0.5X
coverage) that is closely related to the cultured Sulfobacillus
thermosulfidooxidans.
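The coverage figures quoted above (10X, 3X, 0.5X) express average sequencing depth. As a rough illustration only, the Python sketch below computes depth as assigned bases divided by genome length. It then estimates the fraction of a genome expected to be sampled at least once, under the idealized Lander-Waterman assumption of randomly and uniformly placed reads. That assumption is introduced here for illustration and is not a method described by Tyson et al. (2004); the genome length and base counts are hypothetical.

```python
import math


def fold_coverage(assigned_bases: float, genome_size: float) -> float:
    """Average depth: sequenced bases assigned to a genome / genome length."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return assigned_bases / genome_size


def expected_fraction_covered(coverage: float) -> float:
    """Expected fraction of positions hit at least once.

    Idealized Lander-Waterman model (assumption): reads land uniformly at
    random, so a position is missed with probability exp(-coverage).
    """
    if coverage < 0:
        raise ValueError("coverage must be non-negative")
    return 1.0 - math.exp(-coverage)


if __name__ == "__main__":
    hypothetical_genome_bp = 2_000_000  # illustrative value, not from the text
    for assigned_bp in (20_000_000, 6_000_000, 1_000_000):
        depth = fold_coverage(assigned_bp, hypothetical_genome_bp)
        covered = expected_fraction_covered(depth)
        print(f"{depth:.1f}X depth -> about {covered:.1%} of positions sampled")
```

Under this simple model, a genome at 0.5X depth would have well under half of its positions sampled, which is consistent with the Sulfobacillus genome being described as partial.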
Bioinformatics analyses of the metagenome sequence data
identified several interesting results. First, the Leptospirillum
group III strain was found to contain genes homologous to
those for biological nitrogen fixation. This knowledge subsequently led to the design of a selective isolation strategy
that allowed the isolation of this organism (Allen & Banfield
2005). Second, genes involved in essential pathways (such
as nitrogen and carbon dioxide fixation and iron metabolism) in the above chemolithoautotrophs were revealed.
Third, the genomic sequence data identified genetic polymorphisms for many genes and suggested evidence for
genetic recombination in the Ferroplasma acidarmanus
population of this community. The metagenome sequence
information established a solid foundation for fine-scale
comparisons of microbial communities. In addition, a
recent proteomic analysis of this community identified an
abundant novel protein, a cytochrome, as an essential component of iron oxidation and acid mine drainage formation
(Ram et al. 2005). These results have the potential to guide
the remediation of sites contaminated by acid mine
drainages.

Micro-arrays
Micro-array technology is a powerful, high-throughput
experimental system that allows the simultaneous analysis
of thousands to hundreds of thousands of genes.
Originally developed for monitoring whole-genome gene expression, micro-arrays have also been used
for other purposes such as the genome-wide mutational
screening for single nucleotide polymorphisms and the
distributions of species and strains in natural microbial
communities. Recently, several types of micro-arrays have
been developed and evaluated for bacterial detection
and microbial community analysis. These arrays include
(i) phylogenetic oligonucleotide arrays that contain
signature sequences from rRNA of specific groups of
organisms; (ii) community genome arrays that contain
highly specific signature gene sequences from known
cultured microbial species; and (iii) functional gene arrays
that contain conserved domains of genes involved in
specific metabolic pathways such as the biogeochemical
cycling of carbon, nitrogen, sulphate, phosphate and metals
(Zhou 2003). The number of genes and the sizes of arrayed
DNA fragments in the functional gene arrays can vary
according to analytical purposes.
Preliminary evaluations suggest that micro-arrays have
great potential for the detection, identification and characterization of micro-organisms in natural habitats (Wu et al.
2004). For example, Loy et al. (2002) constructed a micro-array with 132 16S rRNA-targeted oligonucleotide probes
(18 nucleotides long) representing all recognized groups
of sulphate-reducing prokaryotes and showed that this
micro-array could be used to distinguish most of the
reference strains. Using this array, they determined the
diversity of sulphate-reducing prokaryotes in periodontal
tooth pockets and a hypersaline cyanobacterial mat. Results
from the micro-array study were similar to those from
cloning and sequencing of environmental 16S rRNA. These
analyses have recently been extended to other groups of
organisms such as the Rhodocyclales in the beta-proteobacteria
(Loy et al. 2005) and Enterococcus species (Lehner et al.
2005).
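The logic of a probe-based array can be sketched in a few lines: a short oligonucleotide probe is compared with a target sequence, and the number of mismatches at the best-matching position indicates whether hybridization is likely to be specific. The Python example below is a deliberately simplified illustration. The 18-nucleotide probe and both targets are made-up sequences, both strands are assumed to be written in the same orientation, and real hybridization depends on thermodynamics that a mismatch count does not capture.

```python
def min_mismatches(probe: str, target: str) -> int:
    """Smallest number of mismatches between the probe and any
    equal-length window of the target (ungapped comparison)."""
    probe, target = probe.upper(), target.upper()
    if not probe or len(probe) > len(target):
        raise ValueError("probe must be non-empty and no longer than target")
    best = len(probe)
    for start in range(len(target) - len(probe) + 1):
        window = target[start:start + len(probe)]
        mismatches = sum(a != b for a, b in zip(probe, window))
        best = min(best, mismatches)
    return best


# Hypothetical 18-nt probe and two hypothetical targets.
PROBE = "ACGTTGCAAGCTTGACCA"
PERFECT_TARGET = "GG" + PROBE + "TT"
# A close relative differing from the probe site at a single position.
RELATED_SITE = PROBE[:16] + "G" + PROBE[17:]
RELATED_TARGET = "GG" + RELATED_SITE + "TT"

if __name__ == "__main__":
    for name, target in (("perfect", PERFECT_TARGET),
                         ("related", RELATED_TARGET)):
        print(name, min_mismatches(PROBE, target))
```

A target that differs from the probe at only one position may still hybridize to some degree, which is one way to picture the cross-hybridization between closely related species that complicates the interpretation of array data.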
Despite these successes, significant challenges remain
with regard to specificity, sensitivity, and quantification of
microbes in natural habitats. This is mainly because microbial communities contain highly heterogeneous groups of
organisms with undefined/unknown genomic relationships. The highly skewed distribution of microbial species,
the potential of cross-hybridization between closely related
species, the genetic variation among strains within species,
and the differential efficiencies of isolating DNA from
different species can all bias results and influence the
interpretations of the data. Further evaluations are needed
to understand the specific experimental conditions appropriate for the analyses of various environmental samples
using the different types of micro-arrays.

Inferences of microbial diversity and activity from completed microbial genome sequences
Micro-organisms were the first organisms, and remain the most numerous,
to have their genomes completely sequenced. While most of the original
objectives for microbial genome sequencing were guided
by their practical applications such as understanding
disease progression mechanisms of human pathogens and
the potential generation of useful products and services
from these microbes, the microbial genome sequencing
efforts have helped reveal much about their ecological
roles in their natural environments as well as the potential
genomic diversities within and between species. Currently,
the sequenced microbial genomes are highly biased towards
pathogens of plants, animals and humans. There are many
detailed comparisons and reviews on these microbial
genomes (e.g. Fraser et al. 2004). In the following paragraphs, I briefly summarize several important features
with regard to the relationship among microbial genome
size, gene content and their ecology (Box 9).

Box 9 The published 250 prokaryotic genomes as of September 2005 suggest several general features of these
genomes relevant to microbial ecology:
1. Prokaryotic genomes are highly variable in genome size and gene content among strains, both within and
between species.
2. Microbial species with narrow ecological niches generally have smaller genomes than those with broader
ecological niches.
3. A large fraction (20–40%) of identified open reading frames in sequenced microbial genomes code for proteins
with unknown functions. Most of these genes are likely regulated by ecological-niche specific factors.

First, microbial genome sequence comparisons have
revealed that prokaryotic genomes are highly variable in
both genome size and gene content (Table 1). Among the
completely sequenced and annotated 250 unique prokaryotic genomes (four strains were sequenced twice for a total
of 254 completed genomes as of August 2005), the genome
sizes vary over 18-fold, from the smallest, that of the archaeon
Nanoarchaeum equitans (0.49 Mb, Waters et al. 2003), to the
largest, that of Streptomyces avermitilis (9.12 Mb, Omura et al. 2001).
The genome sizes vary not only among species but also
among strains within individual species. An example is the
common Escherichia coli where whole genome sequences of
four strains are now available: the model laboratory strain
K12, the enterohemorrhagic O157:H7 RIMD and O157:H7
EDL933, and the uropathogenic CFT073 (Parkhill &
Thomson 2004). While all three pathogenic strains have
genomes essentially colinear with each other and with the
nonpathogenic K12, both the genome size and gene content vary considerably among the four strains. For example, the two pathogenic O157:H7 strains have genomes
over 5.5 Mb, almost 1 Mb bigger than that of strain K12
(4.6 Mb) and about 300 kb bigger than that of strain CFT073
(5.2 Mb). Furthermore, about 25% of the genes in the pathogenic O157:H7 strains were not found in strain K12.
When all four strains are considered, only about 3000
genes were shared among the 4288, 5349, 5361
and 5379 predicted protein-coding genes of
strains K12, O157:H7 RIMD, O157:H7 EDL933 and
CFT073, respectively. Most of these extra genes have
unusual sequence characteristics and were likely obtained
through horizontal gene transfer events from external
sources and by the action of mobile genetic elements. Some
of these genes play important roles in their ecological
adaptation, including adhesion to specific host cell types.
Comparisons between strains in other human pathogenic
bacteria (e.g. Streptococcus pneumoniae and Burkholderia
cepacia) as well as the nonpathogenic plant symbiont Sinorhizobium
meliloti revealed similarly highly variable genome sizes and
gene contents (Fraser et al. 2004; Guo et al. 2005; Sun S.,
unpublished). At present, population-level studies of
genome size and gene content variations are still largely
limited to human pathogens.
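The figures quoted in this paragraph can be checked with simple arithmetic. The Python sketch below uses only the genome sizes and gene counts given in the text to compute the fold range in genome size and the share of each Escherichia coli strain's predicted genes represented by the roughly 3000 shared genes. The final function, with made-up gene identifiers, illustrates how a shared gene set can be defined as a set intersection; it is a toy example and not the comparison method used in the cited studies.

```python
# Genome sizes (Mb) as quoted in the text.
SMALLEST_MB = 0.49  # Nanoarchaeum equitans
LARGEST_MB = 9.12   # Streptomyces avermitilis
fold_range = LARGEST_MB / SMALLEST_MB

# Predicted protein-coding genes per E. coli strain, in the order given in the text.
predicted_genes = {
    "K12": 4288,
    "O157:H7 RIMD": 5349,
    "O157:H7 EDL933": 5361,
    "CFT073": 5379,
}
SHARED_GENES = 3000  # approximate number shared by all four strains

shared_fraction = {
    strain: SHARED_GENES / count for strain, count in predicted_genes.items()
}


def core_genes(gene_sets: dict) -> set:
    """Genes present in every strain (intersection of all gene sets)."""
    sets = [set(genes) for genes in gene_sets.values()]
    return set.intersection(*sets) if sets else set()


# Toy example with hypothetical gene identifiers.
toy_strains = {
    "strain_1": {"geneA", "geneB", "geneC"},
    "strain_2": {"geneA", "geneB", "geneD"},
    "strain_3": {"geneA", "geneB"},
}

if __name__ == "__main__":
    print(f"Genome size range: {fold_range:.1f}-fold")
    for strain, fraction in shared_fraction.items():
        print(f"{strain}: shared genes are about {fraction:.0%} of predicted genes")
    print("Toy core genome:", sorted(core_genes(toy_strains)))
```

With the numbers in the text, the shared set corresponds to roughly 70% of the predicted genes of strain K12 and about 56% of those of each pathogenic strain.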
Second, species with narrow ecological niches (e.g. obligate human pathogens) on average have smaller genomes
than those capable of living in diverse ecological conditions
(Table 1). For example, the obligate intracellular pathogen
Mycoplasma genitalium has a genome size of 580 kb (encoding 484 genes) and the aphid endosymbiont Buchnera aphidicola
has a genome size of 650 kb (504 genes). These genomes
lack many of the genes essential for metabolic functions in
many free-living organisms. The deletion and degeneration
of such genes likely occurred because their functions are nonessential in obligate parasites, whose hosts can provide
such resources to the cells. Indeed, in several obligate intracellular parasites such as Rickettsia prowazekii and Rickettsia
conorii, there is evidence that their genomes are in the
process of deteriorating and shrinking (Andersson 2004).
Though the 250 sequenced prokaryotic genomes may not
be representative of the community genomes in various
natural environments, there appears to be a correlation between
genome size and habitat. Among the six groups of
prokaryotes classified based on habitats, those from terrestrial environments have, on average, the largest genomes,
followed by prokaryotes that live in multiple habitats, in
aquatic environments, and in specialized environments
(Table 1). Some of the largest bacterial genomes are found
in those with complex lifestyles such as the social bacterium
Myxococcus xanthus (> 10 Mb), the facultative nitrogen-fixing plant symbiont Bradyrhizobium japonicum (> 9 Mb),
and the antibiotic-producing, free-living soil bacteria
Streptomyces (> 9 Mb).
Third, in almost all microbial genomes sequenced, a significant percentage (20–40%) of the putative open-reading
frames show no obvious homology to any known proteins
or to any sequences in the database, including those from
other micro-organisms and macro-organisms. While one
reason for this high percentage of unknown open-reading
frames is our limited knowledge about the microbial world (e.g. the limited number of genomes that have been sequenced
and limited knowledge about the functional properties of
these sequenced genomes even in standard laboratory
conditions), the ubiquitous distribution of such unknown
sequences suggests their potential importance in natural
environments. Indeed, a transcriptome analysis of the
radiation-resistant Deinococcus radiodurans revealed that
about 48% of the poorly characterized or uncharacterized
genes were highly expressed in at least one experimental
condition (Liu et al. 2003). Systematic investigations into
the potential roles of this group of genes are now underway in the nitrogen-fixing bacterium Sinorhizobium meliloti using
high-throughput gene knockouts, systematic screening of
these mutants under hundreds of growth conditions, and
genome-wide transcriptome and metabolome analyses
(Finan T. et al., personal communication).

Conclusions and perspectives


With the development and application of genomics tools,
microbial ecology is undergoing a renaissance. Genomics
tools have allowed us unprecedented access to natural
microbial diversity and their potential activities. However,
genomics tools have also exposed how little we know
about the vast diversity of micro-organisms colonizing and
transforming our planet Earth. Indeed, many fundamental
questions remain to be addressed. For example, how many
microbial species are there on Earth? How many unknown
metabolic pathways are there in the microbial world?
What is the relationship between microbial diversity and
microbial activity in natural environments? Do laboratory
analyses of microbial activity reflect those in natural
environments? And, how best to use microbial ecological
data gained through genomic analysis in practical applications such as mining, environmental remediation, the control
of infectious diseases, the modulation of the global climate,
and the production of biotechnology goods and services?
To address these questions, an interdisciplinary systems
approach is needed. This approach requires the integration
of the analyses at various levels of ecological organization,
from subcellular and cellular levels to those of individuals,
populations, communities and ecosystems. The approach
also requires the development and complementary analysis of biological variations at the genome, transcriptome,
proteome and metabolome levels. Indeed, the American
Society for Microbiology has issued a call to create systems
microbiology and systems microbial ecology to coordinate
such efforts and to make them a priority area for future development (Buckley 2005). There is no doubt that such coordinated efforts will lead to many exciting new discoveries.

Acknowledgements
I thank Dr Hong Guo for preparing Figs 3 and 4 and Dr Turlough
M. Finan for comments on the manuscript. During the preparation
of this review, research in my lab was supported by the Natural
Sciences and Engineering Research Council (NSERC) of Canada,
the Ontario Premier's Research Excellence Award, and Genome
Canada.

References
Allen EE, Banfield JF (2005) Community genomics in microbial
ecology and evolution. Nature Reviews. Microbiology, 3, 489–498.
Allen NL, Hilton AC, Betts R, Penn CW (2001) Use of representational difference analysis to identify Escherichia coli O157-specific DNA sequences. FEMS Microbiology Letters, 197, 195–201.
Andersson SGE (2004) Obligate intracellular pathogens. In:
Microbial Genomes (eds Fraser CM, Read TD, Nelson KE),
pp. 291–308. Humana Press, Totowa, New Jersey.
Bart A, Dankert J, van der Ende A (2000) Representational difference
analysis of Neisseria meningitidis identifies sequences that are
specific for the hyper-virulent lineage III clone. FEMS Microbiology Letters, 188, 111–114.
Beja O, Aravind L, Koonin EV, Suzuki MT et al. (2000) Bacterial
rhodopsin: evidence for a new type of phototrophy in the sea.
Science, 289, 1902–1906.
Beja O, Spudich EN, Spudich JL, Leclerc M, DeLong EF (2001)
Proteorhodopsin phototrophy in the ocean. Nature, 411, 786–789.
Beja O, Suzuki MT, Heidelberg JF et al. (2002) Unsuspected diversity among marine aerobic anoxygenic phototrophs. Nature,
415, 630–633.
Bergthorsson U, Ochman H (1995) Heterogeneity of genome sizes
among natural isolates of Escherichia coli. Journal of Bacteriology,
177, 5784–5789.
Bergthorsson U, Ochman H (1998) Distribution of chromosome
length variation in natural isolates of Escherichia coli. Molecular
Biology and Evolution, 15, 6–16.
Blochl E, Rachel R, Burggraf S, Hafenbradl D, Jannasch HW,
Stetter KO (1997) Pyrolobus fumarii, gen. and sp. nov., represents
a novel group of archaea, extending the upper temperature limit
for life to 113 degrees C. Extremophiles, 1, 14–21.
Bogliolo AR, Lauria-Pires L, Gibson WC (1996) Polymorphism in
Trypanosoma cruzi: evidence of genetic recombination. Acta
Tropica, 61, 31–40.
Brochier C, Gribaldo S, Zivanovic Y, Confalonieri F, Forterre P
(2005) Nanoarchaea: representatives of a novel archaeal phylum
or a fast-evolving euryarchaeal lineage related to Thermococcales? Genome Biology, 6, R42.
Buckley MR (2005) Systems Microbiology: Beyond Microbial
Genomics. ASM Report. ASM Press, Washington, D.C.
Calia KE, Waldor MK, Calderwood SB (1998) Use of representational difference analysis to identify genomic differences between
pathogenic strains of Vibrio cholerae. Infection and Immunity, 66,
849–852.
Capua I, Alexander DJ (2002) Avian influenza and human health.
Acta Tropica, 83, 1–6.
Cohan FM (2004) Concepts of bacterial biodiversity for the age of
genomics. In: Microbial Genomes (eds Fraser CM, Read TD,
Nelson KE), pp. 175–194. Humana Press, Totowa, New Jersey.
Conway DJ, Roper C, Oduola AM et al. (1999) High recombination
rate in natural populations of Plasmodium falciparum. Proceedings
of the National Academy of Sciences, USA, 96, 4506–4511.
Culley AI, Lang AS, Suttle CA (2003) High diversity of unknown
picorna-like viruses in the sea. Nature, 424, 1054–1057.
Daniel R (2005) The metagenomics of soil. Nature Reviews.
Microbiology, 3, 470–478.
Dawson SC, Pace NR (2002) Novel kingdom-level eukaryotic
diversity in anoxic environments. Proceedings of the National
Academy of Sciences, USA, 99, 8324–8329.
DeLong EF (2005) Microbial community genomics in the ocean.
Nature Reviews. Microbiology, 3, 459–469.
Edwards RA, Rohwer F (2005) Viral metagenomics. Nature
Reviews. Microbiology, 3, 504–510.
Eschenfeldt WH, Stols L, Rosenbaum H et al. (2001) DNA from
uncultured organisms as a source of 2,5-diketo-D-gluconic acid
reductases. Applied and Environmental Microbiology, 67, 4206–4214.
Feil EJ, Enright MC (2004) Analyses of clonality and the evolution
of bacterial pathogens. Current Opinion in Microbiology, 7, 308–313.
Feil EJ, Spratt BG (2001) Recombination and the population structure of bacterial pathogens. Annual Review of Microbiology, 55,
561–590.
Fraser CM, Read TD, Nelson KE (2004) Microbial Genomes.
Humana Press, Totowa, New Jersey.
Gans J, Wolinsky M, Dunbar J (2005) Computational improvements reveal great bacterial diversity and high metal toxicity in
soil. Science, 309, 1387–1390.
Giovannoni SJ, Britschgi TB, Moyer CL, Field KG (1990) Genetic
diversity in Sargasso Sea bacterioplankton. Nature, 345, 60–63.
Guo H, Sun S, Finan TM, Xu J (2005) Novel DNA sequences from
natural strains of the nitrogen-fixing symbiotic bacterium
Sinorhizobium meliloti. Applied and Environmental Microbiology,
71, 7130–7138.
Hendrix RW (2003) Bacteriophage genomics. Current Opinion in
Microbiology, 6, 506–511.
Huber H, Hohn MJ, Rachel R, Fuchs T, Wimmer VC, Stetter KO
(2002) A new phylum of Archaea represented by a nanosized
hyperthermophilic symbiont. Nature, 417, 63–67.

Hugenholtz P, Goebel BM, Pace NR (1998) Impact of culture-independent studies on the emerging phylogenetic view of
bacterial diversity. Journal of Bacteriology, 180, 4765–4774.
James TY (2005) The population genetics of phycomycetes. In:
Evolutionary Genetics of Fungi (ed. Xu J), pp. 117–148. Horizon
Biosciences, Norfolk, UK.
Kanagawa K, Oishi M, Negoro S, Urabe I, Okada H (1993)
Characterization of the 6-aminohexanoate-dimer hydrolase from
Pseudomonas sp. Nk87. Journal of General Microbiology, 139, 787–795.
Karner MB, DeLong EF, Karl DM (2001) Archaeal dominance in
the mesopelagic zone of the Pacific Ocean. Nature, 409, 507–510.
Keese P, Gibbs A (1993) Plant viruses: master explorers of evolutionary space. Current Opinion in Genetics and Development, 3,
873–877.
Kidd SE, Guo H, Bartlett K, Xu J, Kronstad JW (2005) Comparative
gene genealogies indicate that two clonal lineages of Cryptococcus
gattii in British Columbia resemble strains from other geographical areas. Eukaryotic Cell, 4, 1629–1638.
Konstantinidis K, Tiedje JM (2004) Microbial diversity and genomics. In: Microbial Functional Genomics (eds Zhou J, Thompson DK,
Xu Y, Tiedje JM), pp. 21–40. John Wiley & Sons, New Jersey.
Konstantinidis KT, Tiedje JM (2005) Genomic insights that
advance the species definition for prokaryotes. Proceedings of the
National Academy of Sciences, USA, 102, 2567–2572.
Koonin EV (2003) Horizontal gene transfer: the path to maturity.
Molecular Microbiology, 50, 725–727.
Lehner A, Loy A, Behr T et al. (2005) Oligonucleotide microarray
for identification of Enterococcus species. FEMS Microbiology Letters,
246, 133–142.
Lenski RE (1993) Assessing the genetic structure of microbial
populations. Proceedings of the National Academy of Sciences, USA,
90, 4334–4336.
Lisitsyn N, Lisitsyn N, Wigler M (1993) Cloning the differences
between two complex genomes. Science, 259, 946–951.
Liu Y, Zhou J, Omelchenko MV, Beliaev AS, Venkateswaran A,
Stair J, Wu L, Thompson DK, Xu D, Rogozin IB, Gaidamakova EK,
Zhai M, Makarova KS, Koonin EV, Daly MJ (2003) Transcriptome dynamics of Deinococcus radiodurans recovering from
ionizing radiation. Proceedings of the National Academy of Sciences,
USA, 100, 4191–4196.
Loy A, Lehner A, Lee N et al. (2002) Oligonucleotide microarray
for 16S rRNA gene-based detection of all recognized lineages
of sulfate-reducing prokaryotes in the environment. Applied and
Environmental Microbiology, 68, 5064–5081.
Loy A, Schulz C, Lucker S et al. (2005) 16S rRNA gene-based
oligonucleotide microarray for environmental monitoring of
the betaproteobacterial order Rhodocyclales. Applied and
Environmental Microbiology, 71, 1373–1386.
Maiden MC, Bygraves JA, Feil E et al. (1998) Multilocus sequence
typing: a portable approach to the identification of clones within
populations of pathogenic microorganisms. Proceedings of the
National Academy of Sciences, USA, 95, 3140–3145.
Margulies M, Egholm M, Altman WE et al. (2005) Genome
sequencing in microfabricated high-density picolitre reactors.
Nature, 437, 376–380.
Mayden RL (1997) A hierarchy of species concepts: the denouement in the saga of the species problem. In: Species: The Unit of
Biodiversity (eds Claridge MF, Dawah HA, Wilson MR), pp. 381–424. Chapman & Hall, London.
Maynard Smith J, Smith NH, O'Rourke M, Spratt BG (1993) How
clonal are bacteria? Proceedings of the National Academy of Sciences,
USA, 90, 4384–4388.
Miyakawa Y, Mizokami M (2003) Classifying hepatitis B virus
genotypes. Intervirology, 46, 329–338.
Moncalvo JM (2005) Molecular systematics: major fungal phylogenetic groups and fungal species concepts. In: Evolutionary
Genetics of Fungi (ed. Xu J), pp. 1–34. Horizon BioScience,
Norfolk, UK.
Nunez ME, Martin MO, Duong LK, Ly E, Spain EM (2003) Investigations into the life cycle of the bacterial predator Bdellovibrio
bacteriovorus 109J at an interface by atomic force microscopy.
Biophysical Journal, 84, 3379–3388.
Omura S, Ikeda H, Ishikawa J et al. (2001) Genome sequence of an
industrial microorganism Streptomyces avermitilis: deducing the
ability of producing secondary metabolites. Proceedings of the
National Academy of Sciences, USA, 98, 12215–12220.
Pace NR, Stahl DA, Olsen GJ, Lane DJ (1985) Analyzing natural
microbial populations by rRNA sequences. ASM News, 51, 4–12.
Parkhill J, Thomson NR (2004) The Genomes of Pathogenic
Enterobacteria. In: Microbial Genomes (eds Fraser CM, Read TD,
Nelson KE), pp. 269–290. Humana Press, Totowa, New Jersey.
Pujol C, Dodgson A, Soll DR (2005) Population genetics of ascomycetes pathogenic to humans and animals. In: Evolutionary
Genetics of Fungi (ed. Xu J), pp. 149–188. Horizon Biosciences,
Norfolk, UK.
Putaporntip C, Jongwutiwes S, Sakihama N et al. (2002) Mosaic
organization and heterogeneity in frequency of allelic recombination of the Plasmodium vivax merozoite surface protein-1 locus.
Proceedings of the National Academy of Sciences, USA, 99, 16348–16353.
Ram RJ, Verberkmoes NC, Thelen MP et al. (2005) Community
proteomics of a natural microbial biofilm. Science, 308, 1915–1920.
Rothschild LJ, Mancinelli RL (2001) Life in extreme environments.
Nature, 409, 1092–1101.
Schadt CW, Martin AP, Lipson DA, Schmidt SK (2003) Seasonal
dynamics of previously unknown fungal lineages in tundra
soils. Science, 301, 1359–1361.
Schleper C, Jurgens G, Jonuscheit M (2005) Genomic studies of
uncultured archaea. Nature Reviews. Microbiology, 3, 479–488.
Sibley CG, Comstock JA, Ahlquist JE (1990) DNA hybridization
evidence of hominoid phylogeny: a reanalysis of the data.
Journal of Molecular Evolution, 30, 202–236.
Tinsley CR, Nassif X (1996) Analysis of the genetic differences
between Neisseria meningitidis and Neisseria gonorrhoeae: two closely
related bacteria expressing two different pathogenicities.
Proceedings of the National Academy of Sciences, USA, 93, 11109–11114.
Tolou HJ, Couissinier-Paris P, Durand JP et al. (2001) Evidence for
recombination in natural populations of dengue virus type 1
based on the analysis of complete genome sequences. Journal of
General Virology, 82, 1283–1290.
Torsvik V, Daae FL, Sandaa RA, Ovreas L (1998) Novel techniques
for analysing microbial diversity in natural and perturbed environments. Journal of Biotechnology, 64, 53–62.
Torsvik V, Ovreas L, Thingstad TF (2002) Prokaryotic diversity:
magnitude, dynamics, and controlling factors. Science, 296,
1064–1066.
Tyson GW, Chapman J, Hugenholtz P et al. (2004) Community
structure and metabolism through reconstruction of microbial
genomes from the environment. Nature, 428, 37–43.
Urwin R, Maiden MCJ (2003) Multi-locus sequence typing: a
tool for global epidemiology. Trends in Microbiology, 11, 479–487.
Vandenkoornhuyse P, Baldauf SL, Leyval C, Straczek J, Young JP
(2002) Extensive fungal diversity in plant roots. Science, 295,
2051.
Venter JC, Remington K, Heidelberg JF et al. (2004) Environmental
genome shotgun sequencing of the Sargasso Sea. Science, 304,
66–74.
Voget S, Leggewie C, Uesbeck A, Raasch C, Jaeger KE, Streit WR
(2003) Prospecting for novel biocatalysts in a soil metagenome.
Applied and Environmental Microbiology, 69, 6235–6242.
Waters E, Hohn MJ, Ahel I et al. (2003) The genome of Nanoarchaeum
equitans: insights into early archaeal evolution and derived
parasitism. Proceedings of the National Academy of Sciences, USA,
100, 12984–12988.
West JA, Zuccarello GC (1999) Biogeography of sexual and
asexual populations in Bostrychia moritziana (Rhodomelaceae,
Rhodophyta). Phycological Research, 47, 115–123.
Woese CR (1987) Bacterial evolution. Microbiological Reviews, 51,
221–271.
Wu L, Thompson DK, Liu X et al. (2004) Development and
evaluation of microarray-based whole-genome hybridization
for detection of microorganisms within the context of environmental applications. Environmental Science and Technology, 38,
6775–6782.
Xu J (2002) Mitochondrial DNA polymorphisms in the human
pathogenic fungus Cryptococcus neoformans. Current Genetics, 41,
43–47.
Xu J (2004) The prevalence and evolution of sex in microorganisms.
Genome, 47, 775–780.
Xu J (2005) Evolutionary Genetics of Fungi. Horizon Biosciences,
Norfolk, UK.
Xu J, Cheng M, Tan Q, Pan Y (2005) Molecular population genetics
of basidiomycete fungi. In: Evolutionary Genetics of Fungi (ed. Xu
J), pp. 221–252. Horizon Biosciences, Norfolk, UK.
Xu J, Mitchell TG (2003) Comparative gene genealogical analyses
of strains of serotype AD identify recombination in populations
of serotypes A and D in the human pathogenic yeast Cryptococcus
neoformans. Microbiology, 149, 2147–2154.
Xu J, Vilgalys R, Mitchell TG (2000) Multiple gene genealogies
reveal recent dispersion and hybridization in the human
pathogenic fungus Cryptococcus neoformans. Molecular Ecology, 9,
1471–1481.
Xu J, Luo G, Vilgalys RJ, Brandt ME, Mitchell TG (2002) Multiple
origins of hybrid strains of Cryptococcus neoformans with serotype AD. Microbiology, 148, 203–212.
Yamaguchi J, Bodelle P, Kaptue L et al. (2003) Near full-length
genomes of 15 HIV type 1 group O isolates. AIDS Research and
Human Retroviruses, 19, 979–988.
Yomo T, Urabe I, Okada H (1992) No stop codons in the antisense
strands of the genes for nylon oligomer degradation. Proceedings
of the National Academy of Sciences, USA, 89, 3780–3784.
Zhou J (2003) Microarrays for bacterial detection and microbial
community analysis. Current Opinion in Microbiology, 6, 288–294.

J-P Xu's general research interests are in microbial ecology and
evolutionary genetics. The current research in his laboratory
attempts to understand the origins and maintenance of genetic
variation in microorganisms. His research group examines both
natural microbial populations from the environment and clinics
and experimental populations evolved in the laboratory. Specifically,
by using microbiological, molecular and computational tools, their
research seeks to determine the rate and effect of spontaneous
mutations on microbial life history traits; the rate and route of spread
of microbes in natural environments and human populations; the
origins of novel strains and species, and the origin and evolution
of sex.
