Available online at www.sciencedirect.com
Identifications of pathogens: a bioinformatic point of view

Richard Christen
Over the past 15 years, microbiology has undergone a momentous shift toward molecular methods. New sequences appear daily in the public databases, and new computer tools and web servers are published on a regular basis. Major advances in the molecular identification of pathogens have been made because new biotechnology methods have appeared, and these methods often require a thorough in silico analysis of sequences. However, developing efficient methods is still partly hampered, because the public databases contain many poorly annotated or partial sequences (often of environmental origin) and because there are few dedicated web servers and curated databases.
Addresses
University of Nice Sophia-Antipolis and CNRS UMR 6543, Institute of
Developmental Biology and Cancer, Parc Valrose, Centre de Biochimie,
F 06108 Nice, France
Corresponding author: Christen, Richard (christen@unice.fr)
Current Opinion in Biotechnology 2008, 19:266-273

This review comes from a themed issue on
Environmental Biotechnology
Edited by Carla Pruzzo and Pietro Canepari
Available online 29th May 2008
0958-1669/$ - see front matter
© 2008 Elsevier Ltd. All rights reserved.
DOI 10.1016/j.copbio.2008.04.003

Introduction
In microbiology, nucleic-acid-based diagnostics are gradually replacing culture-based methods [1,2,3,4]. Procedures that rely on PCR of a single gene or on multilocus sequence typing [5-8], as well as arrays [9,10,11-13], require the design of oligomers for amplification and hybridization. Mass sequencing [14,15,16,17] or 16S rRNA mass cataloging [18] produces sequences that have to be matched to a database of known sequences. Finally, entirely new methods are appearing [19]. The consortium of DDBJ, EMBL and GenBank exchanges data on a daily basis (URL: http://www.insdc.org) and holds almost every known sequence. Blast [20] is used to retrieve similar sequences, while ACNUC [21], SRS [22] or Entrez [23] retrieve sequences according to keywords. There are many free utilities to align sequences and to compute and display phylogenetic trees (URL: http://www.bioinformatics.org/). Primers and probes can be designed with many tools (URL: http://bioinfo.unice.fr/softwares/oligo_softwares.html).

Retrieving every necessary sequence can, however, be difficult. Designing primers and probes is tedious and may give lower quality results if the multiple design criteria are not properly handled. New sequences now seem to flow in faster than programs can deal with them. For example, it is no longer easy to Blast the 16S rRNA gene sequence of a new isolate to find out which well-known bacteria it is related to, because most newly submitted rRNA sequences now originate from uncultivated clones. Housekeeping and pathogenicity gene sequences have been submitted in large numbers. However, full sequences are not easily retrieved by Blast (because many are quite divergent) or by keywords (because their annotations are often poor or non-standard). In contrast to the community devoted to the analysis of complete genomes, there are also few centralized services or web servers that gather data, clean them and post them on the web with good query and analysis tools. Finally, bioinformaticians continuously publish new tools, but very few studies compare them and actually assess how good these new tools are (see for example BAliBASE for evaluating new alignment programs [24,25]).

Detailed analyses will be restricted to waterborne bacteria. For these, we review the available sequences and possible solutions for the in silico analysis of diagnostic methods before the actual experiments.

Choice of a target gene
Target genes for bacterial identification can be the 16S rRNA gene, a housekeeping gene or a pathogenicity gene. Some species are always pathogenic. For these, targeting 16S rRNA gene sequences is often the solution: many sequences have been published, PCR primers and hybridization probes have usually been described and tested, and dedicated software and web sites are available [26]. However, cases of lateral transfer [27-29] or of overly similar 16S rRNA gene sequences (reviewed in reference [30]) have highlighted the need to use other, more rapidly evolving genes [31].

Some of these genes have been completely sequenced in very few strains or species. This makes it doubtful that truly universal or specific oligomers have really been designed. The general absence of highly conserved domains also makes primer and probe design difficult. Finally, there is always the chance that yet unknown variant sequences exist that will escape molecular detection because of mutations. This last case applies to clones that become pathogenic only after acquiring pathogenicity genes [32,33,34]. It also applies when pathogenicity depends upon the genetic content, that is, on differential regulation of some genes or on integration of genes (or domains) that belong to the species or genus gene pool but are not always present in a particular clone [35,36]. In such cases, targeting pathogenicity genes is the best choice, with difficulties similar to those of housekeeping genes. For examples of other approaches, such as multilocus sequence typing (MLST) and analysis of variable number of tandem repeats (VNTR), see references [37-40].
For Eukarya (often protists), the approach is very similar, but there are often many fewer sequences available from different strains or species. On the other hand, one may expect less divergence within a population (due to smaller population sizes and slower division rates).

Finally, viruses are a very different situation. There is no homologous housekeeping gene shared among viruses, and mutation rates are expected to be much higher.

Retrieval of sequence data for the major waterborne pathogens
A list of pathogens likely to be found in aquatic environments was built, primarily based on the WHO list (URL: http://www.who.int/water_sanitation_health/gdwqrevision/watpathogens.pdf). This list was used to query GenBank release 163 (updates up to February 5, 2008) with ACNUC [21]. Two questions were asked. First, what are the respective numbers of entries and protein coding genes available for each organism? (An entry is a separate submission identified by its accession number; it may contain the sequences of many genes.) Second, how many complete genomes are available?

These data (Table 1) reveal contrasting situations for the different pathogens. Some organisms have been widely sequenced, both in terms of entries and of complete genomes. Other pathogens, especially among Eukarya, have attracted little interest; this is the case for pathogens of the Apicomplexa phylum, the nematode Dracunculus, and Naegleria fowleri. For Bacteria and Eukarya, querying by species name is rather easy. For viruses, however, the name of the host is often included in the name of the organism, which makes queries more difficult. To retrieve all of the sequences for a given virus, it is often necessary to query with a higher-order keyword (Table 1).

Table 1
For each taxon, the number of entries (number of different submissions), of protein coding sequences (CDS) and of genome projects was analyzed.
Taxon | Entries | Number of CDS | Genomes
Adenoviridae (Atadenovirus, Aviadenovirus, Mastadenovirus, Siadenovirus) | 3644 | 5356 | 44
Atadenovirus (various adenoviruses) | 69 | 198 | 5
Astroviridae (Avastrovirus, Mamastrovirus) | 1072 | 1100 | 6
Caliciviridae (Lagovirus, Nebraska-like virus, Norovirus, Sapovirus, Vesivirus) | 6371 | 7410 | 16
Hepeviridae (Hepatitis E virus) | 2767 | 2611 | 1
Mamastrovirus (Astrovirus of various hosts) | 809 | 834 | 3
Picornaviridae (Enterovirus, Hepatovirus, . . .) | 21296 | 17975 | 40
Reoviridae (Aquareovirus, . . ., Rotavirus) | 8232 | 8015 | 352
Enterovirus | 13057 | 10994 | 16
Hepatovirus | 3605 | 3014 | 2
Human enterovirus A | 3010 | 2723 | 1
Human enterovirus B | 6578 | 5425 | 1
Human enterovirus C | 762 | 639 | 1
Human enterovirus D | 161 | 125 | 1
Human astrovirus | 792 | 813 | 1
Rotavirus | 5488 | 5254 | 33
Sapovirus | 553 | 608 | 4

Burkholderia pseudomallei | 568 | 27058 | 24
Campylobacter coli | 346 | 367 | 1
Campylobacter jejuni | 2153 | 11676 | 11
Escherichia coli | 39004 | 75822 | 35
Legionella pneumophila | 2451 | 15145 | 4
Legionella | 3386 | 15545 | 4
Pseudomonas aeruginosa | 35034 | 22340 | 7
Salmonella typhi | 191 | 474 | 0
Salmonella | 7953 | 41953 | 24
Shigella | 4149 | 31292 | 8
Vibrio cholerae | 2532 | 10833 | 17
Vibrio parahaemolyticus | 882 | 5775 | 3
Vibrio vulnificus | 821 | 10549 | 4
Vibrio | 10358 | 40967 | 34
Yersinia enterocolitica | 445 | 5093 | 1

Acanthamoeba | 33450 | 180 | 1
Cryptosporidium parvum | 11148 | 1262 | 2
Cryptosporidium | 39678 | 1796 | 4
Cyclospora cayetanensis | 164 | 2 | 0
Dracunculus medinensis | 2 | 0 | 0
Entamoeba histolytica | 101006 | 706 | 0
Entamoeba | 205767 | 843 | 6
Giardia intestinalis | 24507 | 1364 | 0
Naegleria fowleri | 67 | 38 | 0
Viruses were also queried at a higher taxonomic rank, because the names used to describe them can differ considerably between entries. A table in the additional materials also lists the most sequenced genes for a number of waterborne pathogens. Note: complete lists or synthetic information on genome projects can be obtained manually from URL: http://www.ncbi.nlm.nih.gov/Genomes/ or URL: http://www.genomesonline.org/. Genome numbers include projects ranging from finished to in progress.
GenBank contained about 1 188 211 bacterial entries (pathogenic or not), of which the 16S rRNA gene alone contributed 647 899 sequences. Notably, many of these sequences were short to very short (50-500 nt, 2530331 entries), and only 186 310 entries had a length of 1 200 nt or more. Only 32 900 of these long sequences belonged to cultured strains, which were annotated as about 8 000 different species.

Considering all bacterial entries, the nifH gene (involved in nitrogen fixation) was the most sequenced (9 421 entries). It was followed by gyrB (encoding the B subunit of DNA gyrase, a type II topoisomerase; 6 845 sequences) and rpoB (coding for the β-subunit of RNA polymerase; 6 231 sequences). Surprisingly, for waterborne bacterial pathogens the mdh gene was the most sequenced. This gene encodes malate dehydrogenase, an enzyme that catalyzes the interconversion of malate and oxaloacetate. It was followed by three housekeeping genes: gyrB, rpoB, and recA. These three genes have been widely used as taxonomic markers:
gyrB: sequences available for 337 genera and 1 483 species; most sequenced genera: Pseudomonas and Vibrio.
rpoB: sequences available for 238 genera and 1 565 species; most sequenced genus: Mycobacterium.
recA: sequences available for 232 genera and 999 species; most sequenced genus: Vibrio.
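The counts above depend on recognizing the same gene under differently worded annotations. A minimal Python sketch of that normalization step follows. It assumes each entry has already been reduced to a (gene qualifier, product qualifier, length) tuple. The synonym table, the function names and the 300 nt length cut-off are hypothetical illustrations, not the lists or thresholds used for this review.

```python
from collections import Counter

# Hypothetical, illustrative synonym table. A real list has to be built from the
# /gene and /product annotations actually present in the retrieved entries.
SYNONYMS = {
    "gyrB": ["gyrb", "dna gyrase subunit b", "dna gyrase b subunit"],
    "rpoB": ["rpob", "rna polymerase beta subunit",
             "dna-directed rna polymerase subunit beta"],
    "recA": ["reca", "reca protein", "recombinase a"],
    "mdh": ["mdh", "malate dehydrogenase"],
}

# Reverse index: normalized synonym -> canonical gene name.
LOOKUP = {syn: gene for gene, syns in SYNONYMS.items() for syn in syns}


def normalize(text):
    """Lower-case and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def canonical_gene(gene_qualifier, product_qualifier):
    """Map a gene or product annotation to a canonical gene name, or None."""
    for text in (gene_qualifier, product_qualifier):
        if text and normalize(text) in LOOKUP:
            return LOOKUP[normalize(text)]
    return None


def count_genes(records, min_length=300):
    """Count sequences per canonical gene, skipping those shorter than min_length nt.

    records: iterable of (gene_qualifier, product_qualifier, length) tuples.
    """
    counts = Counter()
    for gene, product, length in records:
        if length < min_length:
            continue  # very short sequences are of little use
        name = canonical_gene(gene, product)
        if name is not None:
            counts[name] += 1
    return counts


if __name__ == "__main__":
    demo = [
        ("gyrB", None, 1200),
        (None, "DNA gyrase  subunit B", 800),
        (None, "malate dehydrogenase", 450),
        ("recA", None, 120),  # filtered out: too short
        (None, "hypothetical protein", 900),
    ]
    print(count_genes(demo).most_common())
```

Matching on whole, normalized annotation strings keeps false positives low. The price is that every new wording has to be added to the list by hand.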
Domains sequenced
The level of sequence divergence, as well as the length of the available sequences, drives phylogenetic resolution. Such an evaluation cannot easily be made across all bacteria, because sequences for the different genes are available for a wide but different distribution of taxa. We therefore restricted the analysis to the Vibrio sequences for gyrB, recA, and rpoB. To simplify the analyses and results, protein sequences were downloaded and aligned. The length distributions showed that most submitted sequences do not cover the entire length of the gene (Table 2). Many sequences result from PCR amplifications using universal primers, which makes the evaluation of published primers difficult. Recent analyses showed that good phylogenetic resolution may be obtained by using a different gene for different taxa or, better, a multilocus identification [40-42].
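The lb/rb categories of Table 2 can be derived mechanically from an alignment. The Python sketch below shows one way to do it. It assumes the aligned sequences are gap-padded strings of equal length, and it rounds boundaries outward to a hypothetical bin width `step`. This is not the procedure used to build Table 2, whose exact binning is not described.

```python
import math
from collections import Counter


def boundaries(aligned_seq, step=10, gap_chars="-."):
    """Return (lb, rb) for one aligned sequence, rounded outward to `step` columns.

    lb is the first non-gap column rounded down; rb is the end (exclusive) of the
    last non-gap column rounded up, so the sequence starts at or after lb and
    ends by rb. Returns None for an all-gap row.
    """
    cols = [i for i, c in enumerate(aligned_seq) if c not in gap_chars]
    if not cols:
        return None
    first, end = cols[0], cols[-1] + 1
    lb = (first // step) * step
    rb = math.ceil(end / step) * step
    return lb, rb


def domain_table(alignment, step=10):
    """Tabulate (length, lb, rb, number) rows, sorted by decreasing length."""
    counts = Counter(b for b in (boundaries(s, step) for s in alignment) if b)
    rows = [(rb - lb, lb, rb, n) for (lb, rb), n in counts.items()]
    return sorted(rows, key=lambda r: (-r[0], r[1]))


if __name__ == "__main__":
    demo = ["ACGT----", "--GTAC--", "--GTAC--", "--------"]
    for row in domain_table(demo, step=2):
        print(row)
```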
Table 2
Domains sequenced for rpoB, gyrB, and recA gene sequences (Vibrios)

rpoB
Length | lb | rb | Number
1400 | 0 | 1400 | 2
1370 | 30 | 1400 | 3
400 | 700 | 1100 | 15
400 | 400 | 800 | 1
300 | 700 | 1000 | 2
300 | 400 | 700 | 65
200 | 1100 | 1300 | 1
100 | 1100 | 1200 | 9
100 | 600 | 700 | 1

gyrB
Length | lb | rb | Number
800 | 0 | 800 | 5
410 | 90 | 500 | 47
400 | 100 | 500 | 4
380 | 20 | 400 | 2
370 | 30 | 400 | 1
340 | 60 | 400 | 16
310 | 90 | 400 | 1
300 | 100 | 400 | 69
200 | 200 | 400 | 212
200 | 100 | 300 | 162
100 | 200 | 300 | 1

recA
Length | lb | rb | Number
400 | 0 | 400 | 1
360 | 40 | 400 | 1
350 | 50 | 400 | 7
340 | 60 | 400 | 6
330 | 70 | 400 | 8
240 | 60 | 300 | 1
230 | 70 | 300 | 94
220 | 80 | 300 | 141
210 | 90 | 300 | 47
200 | 100 | 300 | 300
130 | 70 | 200 | 63
120 | 80 | 200 | 4
100 | 100 | 200 | 22

This table is sorted by decreasing sequence length (column 1). lb and rb are the left and right boundaries of sequences in the alignments: a sequence starts aligning at or after position lb and ends before position rb. Number is the number of sequences in each category (rpoB: 99 sequences, 26 species; gyrB: 520 sequences, 59 species; recA: 695 sequences, 58 species). In bold are the most sequenced domains.

Primers for gene amplification, a case study
Even with bioinformatics analyses, it would have been very tedious, if not impossible, to evaluate published primers for every gene and to present the results here. As a case study, we used the mip (macrophage infectivity potentiator) gene. In Legionella, this gene encodes a surface protein required for optimal infection of macrophages. Querying the literature returned 44 publications that used mip as a target for identification; for the purpose of this review, we analyzed only two recent studies [43,44]. We retrieved a total of 278 mip sequences from Legionella species, only 146 of which were distinct (not contained in a longer sequence). We then evaluated how each oligomer would bind to each variant of the mip gene sequences (Table 3). Strikingly, primer Mip-R1 shows a mismatch in the first position for most sequences; a simple Blast confirmed this problem. For the other oligomers, this analysis shows that a number of variant sequences will probably not be well recognized.

Table 3
Evaluation of primers and probes recently used for the identification of the mip genes in Legionella
For each oligomer: column (1) Tm in °C estimated for each mip sequence variant; column (2) the variant sequence; column (3) the number of such sequences (about 270 mip sequences available, only excerpts shown). F: forward primer, R: reverse primer.

We also analyzed whether the mip gene was present in Legionella species other than L. pneumophila, and coupled the Tm calculation with a phylogenetic tree (Figure 1). This analysis shows that some oligomers indeed specifically target the mip gene of L. pneumophila and not that of other Legionella species. However, these publications do not clearly state that the mip gene is also present in other species of Legionella (but see reference [45]). Since lateral gene transfers are rather common in bacteria, it is also not clear whether the present primers indeed amplify mip genes in every L. pneumophila strain (see Figure 1).

Figure 1
Heatmap analysis of oligomers used in references [43,44] to identify the presence of the mip gene. Tms were calculated using the nearest neighbor algorithm and were then transformed into colors (the corresponding Tm/color scale is shown in the Figure). Each column of the heatmap (on the right) corresponds to an oligomer, as indicated in the Primers identifiers box. A gray square indicates a Tm below 40 °C; a white square indicates a sequence too short to contain the oligomer. Upper Figure (A): excerpt of the L. pneumophila clade (possible cases of lateral transfer in red). Lower Figure (B): excerpt of the non-L. pneumophila clade. Primer #3 shows the highest predicted Tm but will fail on some sequences; primer #1 also shows quite a wide heterogeneity of predicted Tms. The full figure is available as supplementary material.

Bioinformatic tools
Aside from the multipurpose tools available at NCBI, EBI or elsewhere, a number of web servers or programs may help analyses:
GreenGenes. The greengenes web application provides access to a 16S rRNA gene sequence alignment for browsing, blasting, probing, and downloading: URL: http://greengenes.lbl.gov.
PubMLST. This site hosts publicly accessible MLST databases and software: URL: http://pubmlst.org, see also reference [46].
Legionella mip gene Sequence Database. This database allows the comparison of new mip gene DNA sequences with reference sequences from all described species of Legionella: URL: http://www.hpa.org.uk/cfi/bioinformatics/ewgli/legionellamips.htm.
leBIBI. Blast on databases of SSU-rDNA, gyrB, recA, sodA, rpoB, tmRNA, tuf and groel2-hsp65 gene sequences, and tools for bacterial identification: URL: http://umr5558-sud-str1.univ-lyon1.fr/lebibi/lebibi.cgi.
ICB. Identification and classification of bacteria database using gyrB: URL: http://seasquirt.mbio.co.jp/icb/.
GPMS. Pathogenic bacteria strain genotyping, essentially for epidemiological purposes, based on polymorphic tandem repeat typing: URL: http://minisatellites.u-psud.fr.
VNTR. Molecular typing of bacteria using variable number tandem repeats: URL: http://vntr.csie.ntu.edu.tw.
OHM. A tool that produces heatmaps showing the Tm of primers on a set of sequences in a visual manner (can be combined with TreeDyn [47]): URL: http://bioinfo.unice.fr/ohm.
A Blast server to Blast 16S rRNA sequences against cultured bacteria only: URL: http://bioinfo.unice.fr/blast.
DDBJ. A Blast server to blast against 16S rRNA gene sequences only (fast): URL: http://blast.ddbj.nig.ac.jp/top-e.html.
The list of prokaryotic names with standing in nomenclature (now including 16S rRNA accession numbers): URL: http://www.bacterio.cict.fr/.
Norovirus Molecular Epidemiology Database. The norovirus database contains a collection of over 1000 sequences of norovirus strains and associated epidemiological data: URL: http://www.hpa.org.uk/cfi/bioinformatics/norwalk/norovirus.htm.
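The heatmap of Figure 1 rests on Tm values computed with a nearest-neighbor algorithm, but the parameter set is not given here. The Python sketch below makes several assumptions:
- the SantaLucia (1998) unified DNA/DNA parameters;
- a single Na+ entropy correction;
- equal strand concentrations;
- a perfectly matched, non-self-complementary duplex.

Scoring mismatched variants, as the figure does, would additionally require mismatch parameters, which are omitted. The default concentrations and the `heatmap_cell` color classes are hypothetical; only the 40 °C gray threshold and the white "too short" case come from the figure legend.

```python
import math

# SantaLucia (1998) unified nearest-neighbor parameters in 1 M NaCl:
# dinucleotide (5'->3' on the oligo strand) -> (dH kcal/mol, dS cal/(K*mol)).
NN = {
    "AA": (-7.9, -22.2), "TT": (-7.9, -22.2),
    "AT": (-7.2, -20.4), "TA": (-7.2, -21.3),
    "CA": (-8.5, -22.7), "TG": (-8.5, -22.7),
    "GT": (-8.4, -22.4), "AC": (-8.4, -22.4),
    "CT": (-7.8, -21.0), "AG": (-7.8, -21.0),
    "GA": (-8.2, -22.2), "TC": (-8.2, -22.2),
    "CG": (-10.6, -27.2), "GC": (-9.8, -24.4),
    "GG": (-8.0, -19.9), "CC": (-8.0, -19.9),
}
# Initiation terms, applied once per terminal base pair.
INIT = {"G": (0.1, -2.8), "C": (0.1, -2.8), "A": (2.3, 4.1), "T": (2.3, 4.1)}
R = 1.987  # gas constant, cal/(K*mol)


def nn_tm(oligo, strand_conc=2.5e-7, na_conc=0.05):
    """Tm (deg C) of `oligo` paired with its perfect complement.

    Assumes a non-self-complementary duplex with equal strand concentrations
    (total strand concentration strand_conc, in M) and a salt correction for
    [Na+] = na_conc (M). Mismatches are not handled.
    """
    seq = oligo.upper()
    if len(seq) < 2 or set(seq) - set("ACGT"):
        raise ValueError("expected an unambiguous DNA oligo of length >= 2")
    dh = ds = 0.0
    for i in range(len(seq) - 1):
        h, s = NN[seq[i:i + 2]]
        dh += h
        ds += s
    for base in (seq[0], seq[-1]):
        h, s = INIT[base]
        dh += h
        ds += s
    ds += 0.368 * (len(seq) - 1) * math.log(na_conc)  # entropy salt correction
    return 1000.0 * dh / (ds + R * math.log(strand_conc / 4)) - 273.15


def heatmap_cell(tm, threshold=40.0):
    """Hypothetical color class: white if no Tm, gray below threshold."""
    if tm is None:  # target sequence too short to contain the oligomer
        return "white"
    if tm < threshold:
        return "gray"
    return "colored"


if __name__ == "__main__":
    for oligo in ("A" * 20, "G" * 20):
        tm = nn_tm(oligo)
        print(oligo, round(tm, 1), heatmap_cell(tm))
```

Libraries such as Biopython ship tested nearest-neighbor implementations with several parameter tables and salt corrections. They are preferable to a hand-rolled version in practice.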
Conclusions
If none of the above servers can be used (this is not an exhaustive list), sequence retrieval, alignments, phylogenies, and primer design can be quite time-consuming and tedious for scientists who cannot write computer programs.

Sequence retrieval using keywords is often more efficient than a Blast search. SRS (Advanced Search form), or even better ACNUC or specific tools [48], should be preferred to Entrez, because they are more powerful for sequence retrieval. Combining keywords for the gene or gene products with the species name or taxon ID, plus a filter on sequence length (very short sequences are useless), is often very efficient. Since annotations are not standard, building a list of gene products is often necessary (see additional materials).

If there are many sequences, they can be clustered at a given similarity level (using blastclust or Cd-hit [49]), and one representative sequence per cluster can then be aligned. A visual inspection of the alignment reveals sequences that do not align well. These are often the result of a wrong annotation, or have to be reverse-complemented. The remaining sequences can then be added to this good alignment (using the Clustal profile option, for example). For protein coding genes, a program such as transAlign [50] may be a good choice.
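The idea behind transAlign is to align at the amino acid level and then carry the alignment back to the DNA. That idea can be sketched in a few lines of Python. This is not transAlign's code. It assumes that each CDS is in frame, that it matches its translated row exactly (an optional trailing stop codon is dropped), and that "-" is the gap character. `thread_codons` is a hypothetical name.

```python
def thread_codons(aligned_protein, cds, gap="-"):
    """Insert codon-sized gaps into a CDS following one row of a protein alignment.

    aligned_protein: the translated sequence, including alignment gaps.
    cds: the ungapped coding sequence, in frame; one extra trailing codon
    (assumed to be the stop codon) is tolerated and dropped.
    """
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    n_residues = sum(1 for aa in aligned_protein if aa != gap)
    if len(codons) == n_residues + 1:
        codons = codons[:-1]  # drop the terminal (stop) codon
    if len(codons) != n_residues:
        raise ValueError("CDS length does not match the protein sequence")
    out, k = [], 0
    for aa in aligned_protein:
        if aa == gap:
            out.append(gap * 3)  # one protein gap = one codon-sized DNA gap
        else:
            out.append(codons[k])
            k += 1
    return "".join(out)


if __name__ == "__main__":
    print(thread_codons("M-KV", "ATGAAAGTT"))
```

Applied to every row of a protein alignment, this yields a DNA alignment whose columns follow codon boundaries. Gaps therefore never shift the reading frame.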
When retrieving primers from publications, bear in mind that primers from older papers are often useless, because they were designed from very few sequences. Primers can be analyzed with the web server cited above to produce figures similar to Figure 1.
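Checking a published primer against newly retrieved variants, as was done for the mip oligomers in Table 3, starts with locating mismatches. A minimal Python sketch follows. It assumes each variant's binding site has already been extracted (for instance from an alignment) as a top-strand string of the primer's length. The three-base 3' window and all function names are hypothetical choices for illustration.

```python
from collections import Counter

# IUPAC codes a primer base may use, mapped to the target bases each accepts.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an A/C/G/T sequence (other letters are kept as-is)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def mismatch_positions(primer, site, reverse=False):
    """1-based positions, from the primer's 5' end, where primer and site differ.

    site: top-strand target region of the primer's length. For a reverse
    primer, the site is reverse-complemented before comparison. An ambiguous
    base in the target (e.g. N) is counted as a mismatch.
    """
    primer = primer.upper()
    target = revcomp(site) if reverse else site.upper()
    if len(target) != len(primer):
        raise ValueError("site and primer lengths differ")
    return [i + 1 for i, (p, t) in enumerate(zip(primer, target))
            if t not in IUPAC[p]]


def mismatch_summary(primer, sites, reverse=False, three_prime_window=3):
    """Mismatch counts per primer position, and number of sites hit near the 3' end."""
    per_position = Counter()
    three_prime_hits = 0
    for site in sites:
        pos = mismatch_positions(primer, site, reverse)
        per_position.update(pos)
        if any(len(primer) - p < three_prime_window for p in pos):
            three_prime_hits += 1
    return per_position, three_prime_hits


if __name__ == "__main__":
    print(mismatch_summary("ACGTAC", ["ACGTAC", "TCGTAC", "TCGTAA"]))
```

A position that mismatches in most variants, as reported for Mip-R1, shows up directly as a large count in `per_position`. Mismatches near the 3' end are tallied separately because, in PCR, they are generally the most detrimental to extension.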
Finally, there is a large difference between amplification using DNA extracted from a pure culture and DNA extracted from an environmental sample. A primer (P) binds to its target DNA (T) according to the classical equilibrium equation [P][T]/[PT] = Km. One or two differences between the P sequence and the T sequence may strongly influence the value of Km. With DNA extracted from a pure culture, [T] may be high enough for [PT] to be sufficient for the PCR to succeed. With environmental DNA, in the presence of mismatch(es), the primer may bind to many other domains (at low affinity but in many places), so that [PT] is not large enough to allow successful amplification. This is why, for environmental studies, any published primers should always be carefully checked against newly published sequences.
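Solving the equilibrium [P][T]/[PT] = Km for [PT] makes the argument above quantitative. The Python sketch below treats a single class of binding sites. It ignores competing sites, temperature cycling and amplification kinetics. The concentrations in the usage lines are hypothetical values, chosen only to contrast a high and a low target concentration and a perfectly matched versus a mismatched Km.

```python
import math


def bound_complex(p_total, t_total, km):
    """Equilibrium [PT] for P + T <-> PT with [P][T]/[PT] = km (same units throughout).

    Solves (p_total - x)(t_total - x) = km * x for the physically meaningful
    (smaller) root, written in a form that avoids cancellation when km is large.
    """
    if min(p_total, t_total, km) < 0:
        raise ValueError("concentrations and km must be non-negative")
    if p_total == 0 or t_total == 0:
        return 0.0
    b = p_total + t_total + km
    disc = b * b - 4.0 * p_total * t_total  # always >= 0
    return 2.0 * p_total * t_total / (b + math.sqrt(disc))


if __name__ == "__main__":
    primer = 5e-7  # hypothetical 0.5 uM primer
    for label, target in (("high [T]", 1e-12), ("low [T]", 1e-16)):
        for km in (1e-9, 1e-6):  # hypothetical matched vs. mismatched Km
            pt = bound_complex(primer, target, km)
            print(f"{label:8s} Km={km:.0e} M  [PT]={pt:.2e} M  bound={pt / target:.1%}")
```

With the primer in large excess, the fraction of target bound is roughly [P]/([P] + Km). A mismatch that raises Km therefore lowers [PT] proportionally, and a low [T] lowers it further in absolute terms.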
Acknowledgements
This work was supported by funds from the European Commission for the
HEALTHY WATER project (FOOD-CT-2006-036306) and a CNRS PICS
to R Christen. The authors are solely responsible for the content of this
publication. It does not represent the opinion of the European Commission.
The European Commission is not responsible for any use that might be
made of data appearing therein.
Appendix A. Supplementary data
Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.copbio.2008.04.003.
Conflict of interest
None.
References and recommended reading
Papers of particular interest, published within the annual period of review, have been highlighted as:
of special interest
of outstanding interest

1. Barken KB, Haagensen JA, Tolker-Nielsen T: Advances in nucleic acid-based diagnostics of bacterial infections. Clin Chim Acta 2007, 384:1-11.
This review describes a range of nucleic-acid-based diagnostic methods and provides examples of their use for the detection of common bacterial infections, with a focus on automated procedures.
2. Abubakar I, Irvine L, Aldus CF, Wyatt GM, Fordham R, Schelenz S, Shepstone L, Howe A, Peck M, Hunter PR: A systematic review of the clinical, public health and cost-effectiveness of rapid diagnostic tests for the detection and identification of bacterial intestinal pathogens in faeces and food. Health Technol Assess 2007, 11:1-216.
This 230-page review was provided by the Health Technology Assessment (HTA) program, now part of the National Institute for Health Research (NIHR). It is based on studies evaluating the diagnostic accuracy of rapid tests, retrieved from electronic databases and by handsearching reference lists and key journals, and it includes cost assessments. Every study is critically evaluated.
3. Tenover FC: Rapid detection and identification of bacterial pathogens using novel molecular technologies: infection control and beyond. Clin Infect Dis 2007, 44:418-423.
A short (far from exhaustive) review comparing the effectiveness of PNA-FISH, real-time PCR and pyrosequencing, and discussing the use of FDA-cleared versus non-FDA-cleared assays (antibiotic resistance).
4. Shneyer VS: On the species-specificity of DNA: fifty years later. Biochemistry (Mosc) 2007, 72:1377-1384.
A short historical review of the molecular methods used to identify prokaryotes and eukaryotes.
5. Angenent LT, Kelley ST, St Amand A, Pace NR, Hernandez MT: Molecular identification of potential pathogens in water and air of a hospital therapy pool. Proc Natl Acad Sci USA 2005, 102:4860-4865.
6. Best EL, Fox AJ, Frost JA, Bolton FJ: Real-time single-nucleotide polymorphism profiling using Taqman technology for rapid recognition of Campylobacter jejuni clonal complexes. J Med Microbiol 2005, 54:919-925.
7. Lehmann LE, Hunfeld KP, Emrich T, Haberhausen G, Wissing H, Hoeft A, Stuber F: A multiplex real-time PCR assay for rapid detection and differentiation of 25 bacterial and fungal pathogens from whole blood samples. Med Microbiol Immunol 2007.
8. Ciammaruconi A, Grassi S, De Santis R, Faggioni G, Pittiglio V, D'Amelio R, Carattoli A, Cassone A, Vergnaud G, Lista F: Fieldable genotyping of Bacillus anthracis and Yersinia pestis based on 25-loci multi locus VNTR analysis. BMC Microbiol 2008, 8:21 doi: 10.1186/1471-2180-8-21.
9. Wang XW, Zhang L, Jin LQ, Jin M, Shen ZQ, An S, Chao FH, Li JW: Development and application of an oligonucleotide microarray for the detection of food-borne bacterial pathogens. Appl Microbiol Biotechnol 2007, 76:225-233.
10. DeSantis TZ, Brodie EL, Moberg JP, Zubieta IX, Piceno YM, Andersen GL: High-density universal 16S rRNA microarray analysis reveals broader diversity than typical clone library when sampling the environment. Microb Ecol 2007, 53:371-383.
Identification of pathogens in environmental samples often uses parallel, multispecies detection systems in order to detect any pathogen. In this analysis, a DNA array with 297 851 probes was compared with 16S cloning and sequencing to evaluate biodiversity; the conclusion was that the array was more efficient. However, pyrosequencing technologies are likely to replace both of the approaches compared in this work.
11. Wiesinger-Mayr H, Vierlinger K, Pichler R, Kriegner A, Hirschl AM, Presterl E, Bodrossy L, Noehammer C: Identification of human pathogens isolated from blood using microarray hybridisation and signal pattern recognition. BMC Microbiol 2007, 7:78 doi: 10.1186/1471-2180-7-78.
12. Hansen RR, Sikes HD, Bowman CN: Visual detection of labeled oligonucleotides using visible-light-polymerization-based amplification. Biomacromolecules 2008, 9:355-362.
13. Lin YC, Sheng WH, Chang SC, Wang JT, Chen YC, Wu RJ, Hsia KC, Li SY: Application of a microsphere-based array for rapid identification of Acinetobacter spp. with distinct antimicrobial susceptibilities. J Clin Microbiol 2008, 46:612-617.
14. Yang ZJ, Tu MZ, Liu J, Wang XL, Jin HZ: Comparison of amplicon-sequencing, pyrosequencing and real-time PCR for detection of YMDD mutants in patients with chronic hepatitis B. World J Gastroenterol 2006, 12:7192-7196.
15. Kobayashi N, Bauer TW, Tuohy MJ, Lieberman IH, Krebs V, Togawa D, Fujishiro T, Procop GW: The comparison of pyrosequencing molecular Gram stain, culture, and conventional Gram stain for diagnosing orthopaedic infections. J Orthop Res 2006, 24:1641-1649.
Sequencing proved more efficient than staining to differentiate Gram-positive from Gram-negative bacteria. Who would have bet on it in 2005?
16. Luna RA, Fasciano LR, Jones SC, Boyanton BL Jr, Ton TT, Versalovic J: DNA pyrosequencing-based bacterial pathogen identification in a pediatric hospital setting. J Clin Microbiol 2007, 45:2985-2992.
17. Dowd SE, Sun Y, Secor PR, Rhoads DD, Wolcott BM, James GA, Wolcott RD: Survey of bacterial diversity in chronic wounds using Pyrosequencing, DGGE, and full ribosome shotgun sequencing. BMC Microbiol 2008, 8:43 doi: 10.1186/1471-2180-8-43.
18. Jackson GW, McNichols RJ, Fox GE, Willson RC: Bacterial genotyping by 16S rRNA mass cataloging. BMC Bioinformatics 2006, 7:321 doi: 10.1186/1471-2105-7-321.
19. Grun J, Manka CK, Nikitin S, Zabetakis D, Comanescu G, Gillis D, Bowles J: Identification of bacteria from two-dimensional resonant-Raman spectra. Anal Chem 2007, 79:5489-5493.
20. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25:3389-3402.
21. Gouy M, Delmotte S: Remote access to ACNUC nucleotide and protein sequence databases at PBIL. Biochimie 2008, 90:555-562.
22. Etzold T, Ulyanov A, Argos P: SRS: information retrieval system for molecular biology data banks. Methods Enzymol 1996, 266:114-128.
23. Schuler GD, Epstein JA, Ohkawa H, Kans JA: Entrez: molecular biology database and retrieval system. Methods Enzymol 1996, 266:141-162.
24. Thompson JD, Plewniak F, Poch O: BAliBASE: a benchmark alignment database for the evaluation of multiple alignment programs. Bioinformatics 1999, 15:87-88.
25. Conery JS: Aligning sequences by minimum description length. EURASIP J Bioinform Syst Biol 2007:72936.
26. Kumar Y, Westram R, Kipfer P, Meier H, Ludwig W: Evaluation of sequence alignments and oligonucleotide probes with respect to three-dimensional structure of ribosomal RNA using ARB software package. BMC Bioinformatics 2006, 7:240 doi: 10.1186/1471-2105-7-240.
27. van Berkum P, Terefework Z, Paulin L, Suomalainen S, Lindstrom K, Eardly BD: Discordant phylogenies within the rrn loci of Rhizobia. J Bacteriol 2003, 185:2988-2998.
28. Schouls LM, Schot CS, Jacobs JA: Horizontal transfer of segments of the 16S rRNA genes between species of the Streptococcus anginosus group. J Bacteriol 2003, 185:7241-7246.
29. Dewhirst FE, Shen Z, Scimeca MS, Stokes LN, Boumenna T, Chen T, Paster BJ, Fox JG: Discordant 16S and 23S rRNA gene phylogenies for the genus Helicobacter: implications for phylogenetic inference and systematics. J Bacteriol 2005, 187:6106-6118.
30. Janda JM, Abbott SL: 16S rRNA gene sequencing for bacterial identification in the diagnostic laboratory: pluses, perils, and pitfalls. J Clin Microbiol 2007, 45:2761-2764.
31. Santos SR, Ochman H: Identification and phylogenetic sorting of bacterial lineages with universally conserved genes and proteins. Environ Microbiol 2004, 6:754-759.
32. Smith DL, Wareing BM, Fogg PC, Riley LM, Spencer M, Cox MJ, Saunders JR, McCarthy AJ, Allison HE: Multilocus characterization scheme for shiga toxin-encoding bacteriophages. Appl Environ Microbiol 2007, 73:8032-8040.
33. Ogura Y, Ooka T, Asadulghani, Terajima J, Nougayrede JP, Kurokawa K, Tashiro K, Tobe T, Nakayama K, Kuhara S et al.: Extensive genomic diversity and selective conservation of virulence-determinants in enterohemorrhagic Escherichia coli strains of O157 and non-O157 serotypes. Genome Biol 2007, 8:R138 doi: 10.1186/gb-2007-8-7-r138.
A systematic whole-genome comparison between O157 and non-O157 EHEC strains using microarray and whole-genome PCR scanning analyses. An example of modern analyses and comparisons of whole genomes to understand phenotypes and their evolution over time.
34. Zhang Y, Laing C, Steele M, Ziebell K, Johnson R, Benson AK, Taboada E, Gannon VP: Genome evolution in major Escherichia coli O157:H7 lineages. BMC Genomics 2007, 8:121 doi: 10.1186/1471-2164-8-121.
Same as reference [33], but using whole-genome-based microarrays of 6167 50-mer oligonucleotides for E. coli.
35. Hsiao A, Liu Z, Joelsson A, Zhu J: Vibrio cholerae virulence regulator-coordinated evasion of host immunity. Proc Natl Acad Sci USA 2006, 103:14542-14547.
36. Pang B, Yan M, Cui Z, Ye X, Diao B, Ren Y, Gao S, Zhang L, Kan B: Genetic diversity of toxigenic and nontoxigenic Vibrio cholerae serogroups O1 and O139 revealed by array-based comparative genomic hybridization. J Bacteriol 2007, 189:4837-4849.
37. Fox AJ, Taha MK, Vogel U: Standardized nonculture techniques recommended for European reference laboratories. FEMS Microbiol Rev 2007, 31:84-88.
38. Turner KM, Feil EJ: The secret life of the multilocus sequence type. Int J Antimicrob Agents 2007, 29:129-135.
39. Chang CH, Chang YC, Underwood A, Chiou CS, Kao CY: VNTRDB: a bacterial variable number tandem repeat locus database. Nucleic Acids Res 2007, 35:D416-421.
40. Martens M, Dawyndt P, Coopman R, Gillis M, De Vos P, Willems A: Advantages of multilocus sequence analysis for taxonomic studies: a case study using 10 housekeeping genes in the genus Ensifer (including former Sinorhizobium). Int J Syst Evol Microbiol 2008, 58:200-214.
41. Stackebrandt E, Brambilla E, Richert K: Gene sequence phylogenies of the family microbacteriaceae. Curr Microbiol 2007, 55:42-46.
42. Guo Y, Zheng W, Rong X, Huang Y: A multilocus phylogeny of the Streptomyces griseus 16S rRNA gene clade: use of multilocus sequence analysis for streptomycete systematics. Int J Syst Evol Microbiol 2008, 58:149-159.
43. Diederen BM, de Jong CM, Marmouk F, Kluytmans JA, Peeters MF, Van der Zee A: Evaluation of real-time PCR for the early detection of Legionella pneumophila DNA in serum samples. J Med Microbiol 2007, 56:94-101.
44. Vervaeren H, Temmerman R, Devos L, Boon N, Verstraete W: Introduction of a boost of Legionella pneumophila into a stagnant-water model by heat treatment. FEMS Microbiol Ecol 2006, 58:583-592.
45. Ratcliff RM, Lanser JA, Manning PA, Heuzenroeder MW: Sequence-based classification scheme for the genus Legionella targeting the mip gene. J Clin Microbiol 1998, 36:1560-1567.
46. Jolley KA, Chan MS, Maiden MC: mlstdbNet - distributed multilocus sequence typing (MLST) databases. BMC Bioinformatics 2004, 5:86 doi: 10.1186/1471-2105-5-86.
47. Chevenet F, Brun C, Banuls A-L, Jacq B, Christen R: TreeDyn: towards dynamic graphics and annotations for analyses of trees. BMC Bioinformatics 2006, 7:439-448 doi: 10.1186/1471-2105-7-439.
48. Croce O, Lamarre M, Christen R: Querying the public databases for sequences using complex keywords contained in the feature lines. BMC Bioinformatics 2006, 7:45 doi: 10.1186/1471-2105-7-45.
49. Li W, Godzik A: Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 2006, 22:1658-1659.
50. Bininda-Emonds OR: transAlign: using amino acids to facilitate the multiple alignment of protein-coding DNA sequences. BMC Bioinformatics 2005, 6:156 doi: 10.1186/1471-2105-6-156.
