
Available online at www.sciencedirect.com

Identifications of pathogens - a bioinformatic point of view


Richard Christen
Over the past 15 years, microbiology has undergone a
momentous shift toward molecular methods. New sequences
appear daily in the public databases and new computer tools
and web servers are published on a regular basis. Major
advances in molecular identifications of pathogens have been
made because new biotechnology methods have appeared
that often require a thorough in silico analysis of sequences.
However, significant difficulties partly remain in developing
efficient methods because the public databases contain many
poorly annotated or partial sequences (often of environmental
origin) and also because there are few dedicated web servers
and curated databases.
Addresses
University of Nice Sophia-Antipolis and CNRS UMR 6543, Institute of
Developmental Biology and Cancer, Parc Valrose, Centre de Biochimie,
F 06108 Nice, France
Corresponding author: Christen, Richard (christen@unice.fr)

Current Opinion in Biotechnology 2008, 19:266-273


This review comes from a themed issue on
Environmental Biotechnology
Edited by Carla Pruzzo and Pietro Canepari
Available online 29th May 2008
0958-1669/$ - see front matter
© 2008 Elsevier Ltd. All rights reserved.

DOI 10.1016/j.copbio.2008.04.003

Introduction
In microbiology, nucleic-acid-based diagnostics are gradually
replacing culture-based methods [1,2,3,4]. Procedures that rely on
PCR of a single gene or multilocus sequence typing [5-8] as well as
arrays [9,10,11-13] require the design of oligomers for amplification
and hybridization. Mass sequencing [14,15,16,17] or 16S rRNA mass
cataloging [18] produces sequences that have to be matched to a
database of known sequences. Finally, entirely new methods are
appearing [19]. The consortium of DDBJ, EMBL and GenBank exchanges
data on a daily basis (URL: http://www.insdc.org) and contains almost
every known sequence. Blast [20] is used to retrieve similar
sequences; ACNUC [21], SRS [22] or Entrez [23] retrieve sequences
according to keywords. There are many free utilities to align
sequences and to compute and display phylogenetic trees (URL:
http://www.bioinformatics.org/). Finally, design of primers and
probes can be done using many tools (URL:
http://bioinfo.unice.fr/softwares/oligo_softwares.html).
Retrieval of every necessary sequence can, however, be difficult,
while the design of primers and probes is tedious and may give
lower quality results if the multiple criteria for design are not
properly handled. New sequences are now flowing in, seemingly faster
than programs can deal with. It is, for example, no longer easy to
Blast the 16S rRNA gene sequence of a new isolate to find out which
well-known bacteria it is related to, because most newly submitted
rRNA sequences now originate from uncultivated clones. Housekeeping
genes and pathogenicity gene sequences have been submitted in large
numbers, but full sequences are not easily retrieved by Blast
(because many are quite divergent) or by keywords, because their
annotations are often poor or not standard. Also, and in contrast to
the community devoted to analyses of complete genomes, there are few
centralized services or web servers that gather data, clean them and
post them on the web with good query and analysis tools. Finally,
bioinformaticians continuously publish new tools, but there are very
few studies that compare them and actually analyze how good these new
tools are (see for example BAliBASE for evaluating new alignment
programs [24,25]).
Detailed analyses will be restricted to waterborne bacteria, for
which we will review available sequences and possible solutions for
in silico analyses of diagnostic methods before the real experiments.

Choice of a target gene

Target genes for bacterial identification can be the 16S rRNA gene,
a housekeeping gene or a pathogenicity gene. Some species are always
pathogenic, and targeting 16S rRNA gene sequences is often the
solution because many sequences have been published, PCR primers and
hybridization probes have usually been described and tested, and
dedicated software and web sites are available [26]. Cases of lateral
transfer [27-29] or of too similar 16S rRNA gene sequences (reviewed
in reference [30]) have also highlighted the need to use other or
more rapidly evolving genes [31]. Some of these genes have, however,
been completely sequenced in very few different strains or species,
making it doubtful that truly universal or specific oligomers have
really been designed. Also, the general absence of very conserved
domains makes the design of primers and probes difficult. Finally,
there is always the chance that yet unknown variant sequences exist
that will escape molecular detection because of mutations. The last
case applies to clones that become pathogenic only after acquisition
of pathogenicity genes [32,33,34] or when pathogenicity depends upon
the genetic content, that is, by differential regulation of some
genes or integration of genes (or domains) that belong to the species
or genus gene pool but are not always present in a particular clone
[35,36]. In such cases, targeting pathogenicity genes is the best
choice, with difficulties similar to housekeeping genes. For other
approaches such as multilocus sequence typing (MLST) and analyses of
variable number of tandem repeats (VNTR) see references [37-40] for
examples.

For Eukarya (often protists), the approach is very similar, but there
are often many fewer sequences available from different strains or
species. On the contrary, one may expect less divergence (due to
smaller population sizes and slower division rates) to be present in
a population.
Finally, viruses are a very different situation, since there is no
homologous housekeeping gene shared among viruses, and mutation rates
are expected to be much higher.

Retrieval of sequence data for the major waterborne pathogens
A list of pathogens likely to be found in aquatic environments was
built (primarily based on WHO list

Table 1
For each taxon, the numbers of entries (different submissions), of
protein-coding sequences (CDS) and of genome projects were analyzed

Taxon | Entries | nbr CDS | Genomes

Adenoviridae (Atadenovirus, Aviadenovirus, Mastadenovirus, Siadenovirus) | 3644 | 5356 | 44
Atadenovirus (various adenoviruses) | 69 | 198 | 5
Astroviridae (Avastrovirus, Mamastrovirus) | 1072 | 1100 | 6
Caliciviridae (Lagovirus, Nebraska-like virus, Norovirus, Sapovirus, Vesivirus) | 6371 | 7410 | 16
Hepeviridae (Hepatitis E virus) | 2767 | 2611 | 1
Mamastrovirus (Astrovirus of various hosts) | 809 | 834 | 3
Picornaviridae (Enterovirus, Hepatovirus, ...) | 21296 | 17975 | 40
Reoviridae (Aquareovirus, ..., Rotavirus) | 8232 | 8015 | 352
Enterovirus | 13057 | 10994 | 16
Hepatovirus | 3605 | 3014 | 2
Human enterovirus A | 3010 | 2723 | 1
Human enterovirus B | 6578 | 5425 | 1
Human enterovirus C | 762 | 639 | 1
Human enterovirus D | 161 | 125 | 1
Human astrovirus | 792 | 813 | 1
Rotavirus | 5488 | 5254 | 33
Sapovirus | 553 | 608 | 4

Burkholderia pseudomallei | 568 | 27058 | 24
Campylobacter coli | 346 | 367 | 1
Campylobacter jejuni | 2153 | 11676 | 11
Escherichia coli | 39004 | 75822 | 35
Legionella pneumophila | 2451 | 15145 | 4
Legionella | 3386 | 15545 | 4
Pseudomonas aeruginosa | 35034 | 22340 | 7
Salmonella typhi | 191 | 474 | 0
Salmonella | 7953 | 41953 | 24
Shigella | 4149 | 31292 | 8
Vibrio cholerae | 2532 | 10833 | 17
Vibrio parahaemolyticus | 882 | 5775 | 3
Vibrio vulnificus | 821 | 10549 | 4
Vibrio | 10358 | 40967 | 34
Yersinia enterocolitica | 445 | 5093 | 1

Acanthamoeba | 33450 | 180 | 1
Cryptosporidium parvum | 11148 | 1262 | 2
Cryptosporidium | 39678 | 1796 | 4
Cyclospora cayetanensis | 164 | 2 | 0
Dracunculus medinensis | 2 | 0 | 0
Entamoeba histolytica | 101006 | 706 | 0
Entamoeba | 205767 | 843 | 6
Giardia intestinalis | 24507 | 1364 | 0
Naegleria fowleri | 67 | 38 | 0

Viruses were also queried according to a higher taxonomic rank
because the names used to describe them can be quite different in
different entries. A table in additional materials also provides the
list of most sequenced genes for a number of waterborne pathogens.
Note: complete lists or synthetic information on genome projects can
be manually obtained from URL: http://www.ncbi.nlm.nih.gov/Genomes/
or URL: http://www.genomesonline.org/. Genome numbers include
projects from finished to in progress.


URL: http://www.who.int/water_sanitation_health/gdwqrevision/watpathogens.pdf)
and used to query GenBank release 163 (updates up to February 5,
2008) with ACNUC [21]. The questions asked were: what are the
respective numbers of entries and protein-coding genes available for
each organism (an entry is a separate submission identified by its
accession number; it may contain the sequences of many genes), and
how many complete genomes are available? These data (Table 1)
demonstrate contrasting situations for the different pathogens; some
organisms have been widely sequenced, in terms of both entries and
complete genomes. Other pathogens, especially among Eukarya, have
attracted little interest, such as pathogens of the Apicomplexa
phylum, the nematode Dracunculus, and Naegleria fowleri. Finally, for
Bacteria and Eukarya, querying by species name is rather easy, but
for viruses, the name of the host is often included in the name of
the organism, making queries more difficult. In order to retrieve all
of the sequences for a given virus, it is often necessary to query
using a higher order keyword (Table 1).
GenBank contained about 1 188 211 bacterial entries (pathogenic or
not), the 16S rRNA gene alone contributing 647 899 sequences. It is
noteworthy that many of these sequences were short to very short
(50-500 nt, 2530331 entries); only 186 310 entries had a length of
1 200 nt or more. Only 32 900 of these long sequences belonged to
cultured strains, annotated as about 8 000 different species.
Considering all bacterial entries, the nifH gene (involved in
nitrogen fixation) was the most sequenced (9 421 entries), followed
by gyrB (encoding a type II topoisomerase, 6 845 sequences) and rpoB
(coding for the β-subunit of RNA polymerase, 6 231 sequences). For
waterborne bacterial pathogens, surprisingly, the mdh gene (which
encodes an enzyme that catalyzes the interconversion of malate and
oxaloacetate) was the most sequenced, followed by three housekeeping
genes: gyrB, rpoB, and recA. Genes gyrB (sequences available for 337
genera and 1 483 species, most sequenced genera: Pseudomonas and
Vibrio), rpoB (sequences available for 238 genera and 1 565 species,
most sequenced genus: Mycobacterium) and recA (sequences available
for 232 genera and 999 species, most sequenced genus: Vibrio) have
been widely used as taxonomic markers.

Domains sequenced
The level of sequence divergence as well as the length of available
sequences drive the phylogenetic resolution. It is not possible to
easily provide such an evaluation at the bacterial level because
sequences for the different genes are available for a wide but
different distribution of taxa. We therefore restricted the analysis
to the Vibrio sequences of gyrB, recA, and rpoB. To simplify the
analyses and results, protein sequences were downloaded and aligned.
The length distributions showed that most submitted sequences do not
cover the entire length (Table 2), as many sequences result from PCR
amplifications using universal primers, which makes evaluation of
published primers difficult. Recent analyses showed that a good
phylogenetic resolution may be obtained using a different gene for
different taxa or, better, a multilocus identification [40-42].
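The tabulation of sequenced domains summarized in Table 2 can be
sketched in a few lines. This is only an illustration, not the program
used for this review: it assumes aligned protein sequences held in a
Python dict with '-' as the gap character, and it rounds boundaries to
multiples of 10 alignment positions. The function names, the bin_size
parameter and the toy alignment are hypothetical.

```python
from collections import Counter


def boundaries(aligned_seq, bin_size=10):
    """Return (lb, rb) for one aligned sequence, or None if it is all gaps."""
    cols = [i for i, c in enumerate(aligned_seq) if c not in "-."]
    if not cols:
        return None
    first, end = cols[0], cols[-1] + 1       # end is exclusive
    lb = (first // bin_size) * bin_size      # round the start down
    rb = -(-end // bin_size) * bin_size      # round the end up
    return lb, rb


def domain_table(alignment, bin_size=10):
    """Rows of (length, lb, rb, number), sorted by decreasing length."""
    counts = Counter()
    for seq in alignment.values():
        b = boundaries(seq, bin_size)
        if b is not None:
            counts[b] += 1
    rows = [(rb - lb, lb, rb, n) for (lb, rb), n in counts.items()]
    return sorted(rows, key=lambda r: (-r[0], r[1], r[2]))


# Toy alignment of 40 columns (hypothetical data, not real sequences).
toy = {
    "full": "A" * 40,
    "partial_1": "-" * 12 + "A" * 18 + "-" * 10,
    "partial_2": "-" * 15 + "A" * 14 + "-" * 11,
}
for row in domain_table(toy):
    print(*row, sep="\t")
```

Binning the boundaries groups sequences that were amplified with the
same primer pair even when their exact ends differ by a few residues;
the bin width is a free choice and should be adapted to the gene.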

Primers for gene amplification, a case study


Even using bioinformatics analyses, it would have been
very tedious if not impossible to evaluate published
primers for every gene and to present the results here.
As a case study we used the mip (macrophage infectivity
Table 2
Domains sequenced for rpoB, gyrB, and recA gene sequences (Vibrios)

Length | lb | rb | Number

rpoB
1400 | 0 | 1400 | 2
1370 | 30 | 1400 | 3
400 | 700 | 1100 | 15
400 | 400 | 800 | 1
300 | 700 | 1000 | 2
300 | 400 | 700 | 65
200 | 1100 | 1300 | 1
100 | 1100 | 1200 | 9
100 | 600 | 700 | 1

gyrB
800 | 0 | 800 | 5
410 | 90 | 500 | 47
400 | 100 | 500 | 4
380 | 20 | 400 | 2
370 | 30 | 400 | 1
340 | 60 | 400 | 16
310 | 90 | 400 | 1
300 | 100 | 400 | 69
200 | 200 | 400 | 212
200 | 100 | 300 | 162
100 | 200 | 300 | 1

recA
400 | 0 | 400 | 1
360 | 40 | 400 | 1
350 | 50 | 400 | 7
340 | 60 | 400 | 6
330 | 70 | 400 | 8
240 | 60 | 300 | 1
230 | 70 | 300 | 94
220 | 80 | 300 | 141
210 | 90 | 300 | 47
200 | 100 | 300 | 300
130 | 70 | 200 | 63
120 | 80 | 200 | 4
100 | 100 | 200 | 22

This table is sorted by decreasing sequence lengths (column 1); lb
and rb correspond to the left and right boundaries of sequences in
alignments (a sequence starts aligning at or after the lb position
and ends before rb); number is the number of sequences in each
category (rpoB: 99 sequences, 26 species; gyrB: 520 sequences, 59
species; recA: 695 sequences, 58 species). The most sequenced domains
are those with the highest counts in the last column.


Figure 1

Heatmap analysis of oligomers used in references [43,44] to identify the presence of the mip gene. Tms were calculated using the nearest
neighbor algorithm and were then transformed into colors (corresponding Tm/color shown in Figure). Each column of the heatmap (on the right)
corresponds to an oligomer as indicated in the box Primers identifiers. A gray square is for a Tm below 40 °C, a white square for a sequence


potentiator) gene, which in Legionella encodes a surface protein
required for optimal infection of macrophages. Querying the
literature returned 44 publications that used mip as a target for
identification, and for the purpose of this review, we analyzed only
two recent studies [43,44]. We retrieved a total of 278 mip sequences
in Legionella species, only 146 of which were distinct (not contained
in a longer sequence). We evaluated how each oligomer would bind to
each variant of the mip gene sequences (Table 3). It is particularly
striking that primer Mip-R1 shows a mismatch in first position for
most sequences; a simple Blast search confirmed this problem. For the
other oligomers, this analysis demonstrates that a number of variant
sequences will probably not be well recognized.
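The kind of check described above, locating each oligomer on every
variant sequence and recording where it mismatches, can be illustrated
with a short script. This is a minimal sketch under stated
assumptions, not the procedure used for Table 3: it considers only
ungapped matches, treats a reverse primer by searching the reverse
complement of each sequence, and uses invented toy sequences; all
function names are hypothetical.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def best_site(primer, template):
    """Ungapped best match of primer on template.

    Returns (start, mismatch_positions), positions being 1-based and
    counted from the primer 5' end, or None if the template is shorter
    than the primer.
    """
    best = None
    for start in range(len(template) - len(primer) + 1):
        window = template[start:start + len(primer)]
        mism = tuple(i + 1 for i, (p, t) in enumerate(zip(primer, window))
                     if p != t)
        if best is None or len(mism) < len(best[1]):
            best = (start, mism)
    return best


def mismatch_profile(primer, variants, reverse=False):
    """Count variants per tuple of mismatch positions (None: too short)."""
    primer = primer.upper()
    profile = Counter()
    for seq in variants:
        template = reverse_complement(seq) if reverse else seq.upper()
        site = best_site(primer, template)
        profile[site[1] if site else None] += 1
    return profile


# Invented toy data: one perfect site, two variants with a mismatch in
# first position, and one sequence too short to contain the primer.
toy_variants = ["TTTGATTACAGGCCC", "TTTCATTACAGGCCC",
                "TTTCATTACAGGCCA", "GATTA"]
print(mismatch_profile("GATTACAGG", toy_variants))
```

A profile dominated by the key (1,) would reproduce the situation
reported for Mip-R1, where most sequences differ from the primer at its
first position.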

Table 3
Evaluation of primers and probes recently used for the identification of the mip genes in Legionella

We also analyzed whether the mip gene was present in Legionella
species other than L. pneumophila and coupled Tm calculation with a
phylogenetic tree (Figure 1). This analysis demonstrates that some
oligomers indeed specifically target the mip gene in L. pneumophila
and not in other species of Legionella. The fact that the mip gene is
also present in other species of Legionella is not clearly stated in
these publications (but see reference [45]), and since lateral gene
transfers are rather common in bacteria, it is not clear whether
present primers indeed amplify mip genes in every L. pneumophila
strain (see Figure 1).

(Table 3 Legend) For each oligomer: column (1) Tm in °C estimated for
each mip sequence variant; column (2) the variant sequence; column
(3) the number of such sequences (about 270 mip sequences available,
only excerpts shown). F: forward primer, R: reverse primer.

Bioinformatic tools
Aside from the multipurpose tools available at NCBI, EBI or
elsewhere, a number of web servers or programs may help analyses:
• GreenGenes. The greengenes web application provides access to a
  16S rRNA gene sequence alignment for browsing, blasting, probing,
  and downloading: URL: http://greengenes.lbl.gov.
• PubMLST. This site hosts publicly accessible MLST databases and
  software: URL: http://pubmlst.org, see also reference [46].
• Legionella mip gene Sequence Database. This database allows the
  comparison of new mip gene DNA sequences with reference sequences
  from all described species of Legionella: URL:
  http://www.hpa.org.uk/cfi/bioinformatics/ewgli/legionellamips.htm.
• leBIBI. Blast on databases of SSU-rDNA, gyrB, recA, sodA, rpoB,
  tmRNA, tuf and groel2-hsp65 gene sequences and tools for bacterial
  identification: URL:
  http://umr5558-sud-str1.univ-lyon1.fr/lebibi/lebibi.cgi.
• ICB. Identification and classification of bacteria database using
  gyrB: URL: http://seasquirt.mbio.co.jp/icb/.
• GPMS. Pathogenic bacteria strain genotyping, essentially for
  epidemiological purposes, based on polymorphic tandem repeat
  typing: URL: http://minisatellites.u-psud.fr.
• VNTR. Molecular typing of bacteria using variable number tandem
  repeats: URL: http://vntr.csie.ntu.edu.tw.
• OHM. A tool that produces heatmaps representing in a visual manner
  the Tm of primers on a set of sequences (can be combined with
  TreeDyn [47]): URL: http://bioinfo.unice.fr/ohm.

(Figure 1 Legend Continued) too short to contain the oligomer. Upper Figure (A) excerpt of L. pneumophila clade (possible cases of lateral
transfer in red). Lower Figure (B) excerpt of non-L. pneumophila clade. Primer #3 shows the highest predicted Tm, but will fail on some
sequences; primer #1 also shows quite a wide heterogeneity of predicted Tms. The full figure is available as supplementary material.
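The legend of Figure 1 states that Tms were calculated with the
nearest neighbor algorithm but gives no parameters. As a rough
illustration only, the sketch below applies the nearest-neighbor model
to a perfectly matched oligomer. Everything specific in it is an
assumption: the thermodynamic values are the unified DNA/DNA
parameters of SantaLucia (1998), the default concentrations (0.5 µM
oligomer, 50 mM Na+) are arbitrary, and mismatches and
self-complementary oligomers are not handled. It is not the
calculation performed by the OHM server.

```python
import math

# Assumed parameters: (delta H in kcal/mol, delta S in cal/(mol K)).
NN = {
    "AA": (-7.9, -22.2), "TT": (-7.9, -22.2),
    "AT": (-7.2, -20.4), "TA": (-7.2, -21.3),
    "CA": (-8.5, -22.7), "TG": (-8.5, -22.7),
    "GT": (-8.4, -22.4), "AC": (-8.4, -22.4),
    "CT": (-7.8, -21.0), "AG": (-7.8, -21.0),
    "GA": (-8.2, -22.2), "TC": (-8.2, -22.2),
    "CG": (-10.6, -27.2), "GC": (-9.8, -24.4),
    "GG": (-8.0, -19.9), "CC": (-8.0, -19.9),
}
R = 1.987  # gas constant, cal/(mol K)


def nn_tm(oligo, oligo_conc=0.5e-6, na_conc=0.05):
    """Predicted Tm (in °C) of a perfectly matched DNA oligomer."""
    seq = oligo.upper()
    if len(seq) < 2 or set(seq) - set("ACGT"):
        raise ValueError("need an unambiguous DNA sequence of length >= 2")
    dh = ds = 0.0
    for end in (seq[0], seq[-1]):            # duplex initiation terms
        if end in "GC":
            dh, ds = dh + 0.1, ds - 2.8
        else:
            dh, ds = dh + 2.3, ds + 4.1
    for i in range(len(seq) - 1):            # sum over stacked pairs
        h, s = NN[seq[i:i + 2]]
        dh, ds = dh + h, ds + s
    ds += 0.368 * (len(seq) - 1) * math.log(na_conc)   # salt correction
    tm_kelvin = dh * 1000.0 / (ds + R * math.log(oligo_conc / 4.0))
    return tm_kelvin - 273.15


# Hypothetical 20-mer; a Tm below 40 °C is drawn in gray in Figure 1.
tm = nn_tm("AGCGTACGTTAGCCTAGGCA")
print(round(tm, 1), "gray" if tm < 40 else "colored")
```

Because a mismatch changes the stacking terms on both sides of the
affected base, a realistic evaluation of primers against variant
sequences needs mismatch-specific parameters, which this sketch
deliberately omits.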

• A Blast server, to Blast 16S rRNA sequences on cultured bacteria
  only: URL: http://bioinfo.unice.fr/blast.
• DDBJ. A Blast server to Blast on 16S rRNA gene sequences only
  (fast): URL: http://blast.ddbj.nig.ac.jp/top-e.html.
• The list of prokaryotic names with standing in nomenclature (now
  including 16S rRNA accession numbers): URL:
  http://www.bacterio.cict.fr/.
• Norovirus Molecular Epidemiology Database. The norovirus database
  contains a collection of over 1000 sequences of norovirus strains
  and associated epidemiological data: URL:
  http://www.hpa.org.uk/cfi/bioinformatics/norwalk/norovirus.htm.

Conclusions
If none of the above servers can be used (this is not an exhaustive
list), sequence retrieval, alignments, phylogenies, and design of
primers can be quite time consuming and tedious for scientists who
cannot write computer programs. Sequence retrieval using keywords is
often more efficient than a Blast search. SRS (Advanced Search form)
or, even better, ACNUC or specific tools [48] should be preferred to
Entrez because they are more powerful for sequence retrieval.
Combining keywords for the gene or gene products with species name or
taxon ID and a filter on sequence length (very short sequences are
useless) is often very efficient. Since annotations are not standard,
building a list of gene products is often necessary (see additional
materials). If there are many sequences, it is possible to cluster
these sequences at a given similarity level (using blastclust or
Cd-hit [49]) and align one representative sequence per cluster. A
visual inspection of alignments reveals sequences that do not align
well; they are often the result of a wrong annotation or have to be
reverse-complemented. The remaining sequences can then be added to
this good alignment (using the Clustal profile option, for example).
For protein-coding genes, a program such as transAlign [50] may be a
good choice. When retrieving primers from publications, older papers
are often useless because primers were designed using very few
sequences (primers can be analyzed using the web server cited above,
to produce figures similar to Figure 1).
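Two of the clean-up steps mentioned above, discarding very short
sequences and re-orienting entries that were submitted on the opposite
strand, are easy to automate before alignment. The sketch below is one
possible way to do it, given as an assumption rather than as the
method used here: orientation is guessed by counting k-mers shared
with a trusted reference sequence, and the names, the k and min_len
defaults and the toy sequences are all hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def kmers(seq, k):
    """Set of all overlapping words of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def orient(seq, reference, k=8):
    """Return seq on the strand that shares more k-mers with reference."""
    seq = seq.upper()
    ref_kmers = kmers(reference.upper(), k)
    rc = reverse_complement(seq)
    forward = len(kmers(seq, k) & ref_kmers)
    reverse = len(kmers(rc, k) & ref_kmers)
    return rc if reverse > forward else seq


def clean(records, reference, min_len=200, k=8):
    """Drop sequences shorter than min_len and orient the others."""
    return {name: orient(seq, reference, k)
            for name, seq in records.items() if len(seq) >= min_len}


# Toy data (invented): small k and min_len only because they are short.
reference = "ATGGCGTACGTTAGCCTAGGCATCGA"
records = {
    "ok": reference[3:20],
    "flipped": reverse_complement(reference[3:20]),
    "short": "ACGT",
}
cleaned = clean(records, reference, min_len=10, k=4)
print(cleaned)
```

Sequences that share few k-mers with the reference in either
orientation are better left for visual inspection, since they may be
wrongly annotated rather than simply inverted.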
Finally, there is a large difference between amplification using DNA
extracted from a pure culture and DNA extracted from an environmental
sample. A primer (P) binds to its target DNA (T) according to the
classical equation [P][T]/[PT] = Km. The presence of one or two
differences between the P sequence and the T sequence may strongly
influence the value of Km. With DNA extracted from a pure culture,
[T] may be sufficiently high so that [PT] is large enough for the PCR
to succeed. With environmental DNA, and in the presence of
mismatch(es), the primer may bind to many other domains (at low
affinity but in many places) so that [PT] is not large enough to
allow a successful amplification. This is why, for environmental
studies, any published primers should always be carefully checked by
comparison to newly published sequences.
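The equilibrium [P][T]/[PT] = Km can be solved for [PT] when only the
total amounts of primer and target are known, which makes the effect
of a low target concentration or of a larger Km easy to see. The
following sketch solves the resulting quadratic; the concentrations
and Km values in the example are invented for illustration, and
competition from non-target sites is not modeled.

```python
import math


def complex_conc(p_total, t_total, km):
    """[PT] at equilibrium, from (P0 - x)(T0 - x) / x = Km.

    The smaller root of x**2 - (P0 + T0 + Km) x + P0 T0 = 0 is returned,
    written in a form that avoids subtracting nearly equal numbers.
    """
    b = p_total + t_total + km
    return 2.0 * p_total * t_total / (b + math.sqrt(b * b - 4.0 * p_total * t_total))


# Invented values (mol/L): same primer, abundant versus rare target,
# and a perfect match versus a mismatched site with a 100-fold larger Km.
primer = 5e-7
for label, target in (("pure culture", 1e-9), ("environmental", 1e-13)):
    for km in (1e-9, 1e-7):
        pt = complex_conc(primer, target, km)
        print(label, "Km =", km, "[PT] =", f"{pt:.3e}")
```

In this simple model [PT] falls both when the target is rare and when
Km rises, so a mismatch that is tolerated with DNA from a pure culture
can become limiting with environmental DNA.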

Acknowledgements
This work was supported by funds from the European Commission for the
HEALTHY WATER project (FOOD-CT-2006-036306) and a CNRS PICS
to R Christen. The authors are solely responsible for the content of this
publication. It does not represent the opinion of the European Commission.
The European Commission is not responsible for any use that might be
made of data appearing therein.

Appendix A. Supplementary data


Supplementary data associated with this article can be found, in the
online version, at doi:10.1016/j.copbio.2008.04.003.

Conflict of interest
None.

References and recommended reading
Papers of particular interest, published within the annual period of
review, have been highlighted as:
• of special interest
•• of outstanding interest

1. Barken KB, Haagensen JA, Tolker-Nielsen T: Advances in nucleic
   acid-based diagnostics of bacterial infections. Clin Chim Acta
   2007, 384:1-11.
   This review describes a range of different nucleic-acid-based
   diagnostic methods and provides examples of the use of these
   methods for detection of common bacterial infections, with a focus
   on automated procedures.

2. Abubakar I, Irvine L, Aldus CF, Wyatt GM, Fordham R, Schelenz S,
   Shepstone L, Howe A, Peck M, Hunter PR: A systematic review of the
   clinical, public health and cost-effectiveness of rapid diagnostic
   tests for the detection and identification of bacterial intestinal
   pathogens in faeces and food. Health Technol Assess 2007,
   11:1-216.
   This is a 230-page review provided by the Health Technology
   Assessment (HTA) program, now part of the National Institute for
   Health Research (NIHR). Studies evaluating the diagnostic accuracy
   of rapid tests, including cost assessments, were retrieved using
   electronic databases and by handsearching reference lists and key
   journals. Every study is critically evaluated.

3. Tenover FC: Rapid detection and identification of bacterial
   pathogens using novel molecular technologies: infection control
   and beyond. Clin Infect Dis 2007, 44:418-423.
   A short (far from exhaustive) review comparing effectiveness of
   PNA-FISH, real-time PCR and pyrosequencing and discussing the use
   of FDA-cleared versus non-FDA-cleared assays (antibiotic
   resistance).

4. Shneyer VS: On the species-specificity of DNA: fifty years later.
   Biochemistry (Mosc) 2007, 72:1377-1384.
   A short historical review of the molecular methods used to
   identify prokaryotes and eukaryotes.

5. Angenent LT, Kelley ST, St Amand A, Pace NR, Hernandez MT:
   Molecular identification of potential pathogens in water and air
   of a hospital therapy pool. Proc Natl Acad Sci USA 2005,
   102:4860-4865.

6. Best EL, Fox AJ, Frost JA, Bolton FJ: Real-time single-nucleotide
   polymorphism profiling using Taqman technology for rapid
   recognition of Campylobacter jejuni clonal complexes. J Med
   Microbiol 2005, 54:919-925.

7. Lehmann LE, Hunfeld KP, Emrich T, Haberhausen G, Wissing H,
   Hoeft A, Stuber F: A multiplex real-time PCR assay for rapid
   detection and differentiation of 25 bacterial and fungal pathogens
   from whole blood samples. Med Microbiol Immunol 2007.

8. Ciammaruconi A, Grassi S, De Santis R, Faggioni G, Pittiglio V,
   D'Amelio R, Carattoli A, Cassone A, Vergnaud G, Lista F: Fieldable
   genotyping of Bacillus anthracis and Yersinia pestis based on
   25-loci multi locus VNTR analysis. BMC Microbiol 2008, 8:21 doi:
   10.1186/1471-2180-8-21.

9. Wang XW, Zhang L, Jin LQ, Jin M, Shen ZQ, An S, Chao FH, Li JW:
   Development and application of an oligonucleotide microarray for
   the detection of food-borne bacterial pathogens. Appl Microbiol
   Biotechnol 2007, 76:225-233.

10. DeSantis TZ, Brodie EL, Moberg JP, Zubieta IX, Piceno YM,
    Andersen GL: High-density universal 16S rRNA microarray analysis
    reveals broader diversity than typical clone library when
    sampling the environment. Microb Ecol 2007, 53:371-383.
    Identification of pathogens in environmental samples often uses
    parallel, multispecies detection systems, in order to detect any
    pathogens. In this analysis a DNA array with 297 851 probes was
    compared with 16S cloning and sequencing to evaluate the
    biodiversity, with the conclusion that the array was more
    efficient. However, pyrosequencing technologies are likely to
    replace both of the approaches compared in this work.

11. Wiesinger-Mayr H, Vierlinger K, Pichler R, Kriegner A, Hirschl AM,
    Presterl E, Bodrossy L, Noehammer C: Identification of human
    pathogens isolated from blood using microarray hybridisation and
    signal pattern recognition. BMC Microbiol 2007, 7:78 doi:
    10.1186/1471-2180-7-78.

12. Hansen RR, Sikes HD, Bowman CN: Visual detection of labeled
    oligonucleotides using visible-light-polymerization-based
    amplification. Biomacromolecules 2008, 9:355-362.

13. Lin YC, Sheng WH, Chang SC, Wang JT, Chen YC, Wu RJ, Hsia KC,
    Li SY: Application of a microsphere-based array for rapid
    identification of Acinetobacter spp. with distinct antimicrobial
    susceptibilities. J Clin Microbiol 2008, 46:612-617.

14. Yang ZJ, Tu MZ, Liu J, Wang XL, Jin HZ: Comparison of
    amplicon-sequencing, pyrosequencing and real-time PCR for
    detection of YMDD mutants in patients with chronic hepatitis B.
    World J Gastroenterol 2006, 12:7192-7196.

15. Kobayashi N, Bauer TW, Tuohy MJ, Lieberman IH, Krebs V,
    Togawa D, Fujishiro T, Procop GW: The comparison of
    pyrosequencing molecular Gram stain, culture, and conventional
    Gram stain for diagnosing orthopaedic infections. J Orthop Res
    2006, 24:1641-1649.
    Sequencing more efficient than staining to differentiate
    Gram-positive from Gram-negative bacteria. Who would have bet on
    it in 2005?

16. Luna RA, Fasciano LR, Jones SC, Boyanton BL Jr, Ton TT,
    Versalovic J: DNA pyrosequencing-based bacterial pathogen
    identification in a pediatric hospital setting. J Clin Microbiol
    2007, 45:2985-2992.

17. Dowd SE, Sun Y, Secor PR, Rhoads DD, Wolcott BM, James GA,
    Wolcott RD: Survey of bacterial diversity in chronic wounds using
    Pyrosequencing, DGGE, and full ribosome shotgun sequencing. BMC
    Microbiol 2008, 8:43 doi: 10.1186/1471-2180-8-43.

18. Jackson GW, McNichols RJ, Fox GE, Willson RC: Bacterial
    genotyping by 16S rRNA mass cataloging. BMC Bioinformatics 2006,
    7:321 doi: 10.1186/1471-2105-7-321.

19. Grun J, Manka CK, Nikitin S, Zabetakis D, Comanescu G, Gillis D,
    Bowles J: Identification of bacteria from two-dimensional
    resonant-Raman spectra. Anal Chem 2007, 79:5489-5493.

20. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W,
    Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of
    protein database search programs. Nucleic Acids Res 1997,
    25:3389-3402.

21. Gouy M, Delmotte S: Remote access to ACNUC nucleotide and protein
    sequence databases at PBIL. Biochimie 2008, 90:555-562.

22. Etzold T, Ulyanov A, Argos P: SRS: information retrieval system
    for molecular biology data banks. Methods Enzymol 1996,
    266:114-128.

23. Schuler GD, Epstein JA, Ohkawa H, Kans JA: Entrez: molecular
    biology database and retrieval system. Methods Enzymol 1996,
    266:141-162.

24. Thompson JD, Plewniak F, Poch O: BAliBASE: a benchmark alignment
    database for the evaluation of multiple alignment programs.
    Bioinformatics 1999, 15:87-88.

25. Conery JS: Aligning sequences by minimum description length.
    EURASIP J Bioinform Syst Biol 2007:72936.

26. Kumar Y, Westram R, Kipfer P, Meier H, Ludwig W: Evaluation of
    sequence alignments and oligonucleotide probes with respect to
    three-dimensional structure of ribosomal RNA using ARB software
    package. BMC Bioinformatics 2006, 7:240 doi:
    10.1186/1471-2105-7-240.

27. van Berkum P, Terefework Z, Paulin L, Suomalainen S, Lindstrom K,
    Eardly BD: Discordant phylogenies within the rrn loci of
    Rhizobia. J Bacteriol 2003, 185:2988-2998.

28. Schouls LM, Schot CS, Jacobs JA: Horizontal transfer of segments
    of the 16S rRNA genes between species of the Streptococcus
    anginosus group. J Bacteriol 2003, 185:7241-7246.

29. Dewhirst FE, Shen Z, Scimeca MS, Stokes LN, Boumenna T, Chen T,
    Paster BJ, Fox JG: Discordant 16S and 23S rRNA gene phylogenies
    for the genus Helicobacter: implications for phylogenetic
    inference and systematics. J Bacteriol 2005, 187:6106-6118.

30. Janda JM, Abbott SL: 16S rRNA gene sequencing for bacterial
    identification in the diagnostic laboratory: pluses, perils, and
    pitfalls. J Clin Microbiol 2007, 45:2761-2764.

31. Santos SR, Ochman H: Identification and phylogenetic sorting of
    bacterial lineages with universally conserved genes and proteins.
    Environ Microbiol 2004, 6:754-759.

32. Smith DL, Wareing BM, Fogg PC, Riley LM, Spencer M, Cox MJ,
    Saunders JR, McCarthy AJ, Allison HE: Multilocus characterization
    scheme for shiga toxin-encoding bacteriophages. Appl Environ
    Microbiol 2007, 73:8032-8040.

33. Ogura Y, Ooka T, Asadulghani, Terajima J, Nougayrede JP,
    Kurokawa K, Tashiro K, Tobe T, Nakayama K, Kuhara S et al.:
    Extensive genomic diversity and selective conservation of
    virulence-determinants in enterohemorrhagic Escherichia coli
    strains of O157 and non-O157 serotypes. Genome Biol 2007, 8:R138
    doi: 10.1186/gb-2007-8-7-r138.
    A systematic whole genome comparison between O157 and non-O157
    EHEC strains using microarray and whole genome PCR scanning
    analyses. An example of modern analyses and comparisons of whole
    genomes to understand phenotypes and their evolutions in time.

34. Zhang Y, Laing C, Steele M, Ziebell K, Johnson R, Benson AK,
    Taboada E, Gannon VP: Genome evolution in major Escherichia coli
    O157:H7 lineages. BMC Genomics 2007, 8:121 doi:
    10.1186/1471-2164-8-121.
    Same as reference [33], but using 6167 50-mer oligonucleotides
    whole-genome-based microarrays for E. coli.

35. Hsiao A, Liu Z, Joelsson A, Zhu J: Vibrio cholerae virulence
    regulator-coordinated evasion of host immunity. Proc Natl Acad
    Sci USA 2006, 103:14542-14547.

36. Pang B, Yan M, Cui Z, Ye X, Diao B, Ren Y, Gao S, Zhang L, Kan B:
    Genetic diversity of toxigenic and nontoxigenic Vibrio cholerae
    serogroups O1 and O139 revealed by array-based comparative
    genomic hybridization. J Bacteriol 2007, 189:4837-4849.

37. Fox AJ, Taha MK, Vogel U: Standardized nonculture techniques
    recommended for European reference laboratories. FEMS Microbiol
    Rev 2007, 31:84-88.

38. Turner KM, Feil EJ: The secret life of the multilocus sequence
    type. Int J Antimicrob Agents 2007, 29:129-135.

39. Chang CH, Chang YC, Underwood A, Chiou CS, Kao CY: VNTRDB: a
    bacterial variable number tandem repeat locus database. Nucleic
    Acids Res 2007, 35:D416-421.

40. Martens M, Dawyndt P, Coopman R, Gillis M, De Vos P, Willems A:
    Advantages of multilocus sequence analysis for taxonomic studies:
    a case study using 10 housekeeping genes in the genus Ensifer
    (including former Sinorhizobium). Int J Syst Evol Microbiol 2008,
    58:200-214.

41. Stackebrandt E, Brambilla E, Richert K: Gene sequence phylogenies
    of the family Microbacteriaceae. Curr Microbiol 2007, 55:42-46.

42. Guo Y, Zheng W, Rong X, Huang Y: A multilocus phylogeny of the
    Streptomyces griseus 16S rRNA gene clade: use of multilocus
    sequence analysis for streptomycete systematics. Int J Syst Evol
    Microbiol 2008, 58:149-159.

43. Diederen BM, de Jong CM, Marmouk F, Kluytmans JA, Peeters MF,
    Van der Zee A: Evaluation of real-time PCR for the early
    detection of Legionella pneumophila DNA in serum samples. J Med
    Microbiol 2007, 56:94-101.

44. Vervaeren H, Temmerman R, Devos L, Boon N, Verstraete W:
    Introduction of a boost of Legionella pneumophila into a
    stagnant-water model by heat treatment. FEMS Microbiol Ecol 2006,
    58:583-592.

45. Ratcliff RM, Lanser JA, Manning PA, Heuzenroeder MW:
    Sequence-based classification scheme for the genus Legionella
    targeting the mip gene. J Clin Microbiol 1998, 36:1560-1567.

46. Jolley KA, Chan MS, Maiden MC: mlstdbNet - distributed
    multi-locus sequence typing (MLST) databases. BMC Bioinformatics
    2004, 5:86 doi: 10.1186/1471-2105-5-86.

47. Chevenet F, Brun C, Banuls A-L, Jacq B, Christen R: TreeDyn:
    towards dynamic graphics and annotations for analyses of trees.
    BMC Bioinformatics 2006, 7:439-448 doi: 10.1186/1471-2105-7-439.

48. Croce O, Lamarre M, Christen R: Querying the public databases for
    sequences using complex keywords contained in the feature lines.
    BMC Bioinformatics 2006, 7:45 doi: 10.1186/1471-2105-7-45.

49. Li W, Godzik A: Cd-hit: a fast program for clustering and
    comparing large sets of protein or nucleotide sequences.
    Bioinformatics 2006, 22:1658-1659.

50. Bininda-Emonds OR: transAlign: using amino acids to facilitate
    the multiple alignment of protein-coding DNA sequences. BMC
    Bioinformatics 2005, 6:156 doi: 10.1186/1471-2105-6-156.
