
JOURNAL OF BACTERIOLOGY, Oct. 2009, p. 6067-6074 Vol. 191, No. 19
0021-9193/09/$08.00+0 doi:10.1128/JB.00762-09
Copyright © 2009, American Society for Microbiology. All Rights Reserved.

Comparative Sequence Analysis of Mycobacterium leprae and the New Leprosy-Causing Mycobacterium lepromatosis
Xiang Y. Han,1 Kurt C. Sizer,1 Erika J. Thompson,2 Juma Kabanja,3 Jun Li,3 Peter Hu,3
Laura Gomez-Valero,4 and Francisco J. Silva5,6*
Department of Laboratory Medicine,1 DNA Analysis Core Facility,2 and School of Health Sciences,3 The University of
Texas M. D. Anderson Cancer Center, Houston, Texas; Institut Pasteur, Unité de Biologie des Bactéries Intracellulaires,
Paris, France4; Institut Cavanilles de Biodiversitat i Biologia Evolutiva, Universitat de València, Valencia,
Spain5; and CIBER en Epidemiología y Salud Pública (CIBERESP), Madrid, Spain6
Received 12 June 2009/Accepted 20 July 2009

Mycobacterium lepromatosis is a newly discovered leprosy-causing organism. Preliminary phylogenetic analysis
of its 16S rRNA gene and a few other gene segments revealed significant divergence from Mycobacterium
leprae, a well-known cause of leprosy, that justifies the status of M. lepromatosis as a new species. In this study
we analyzed the sequences of 20 genes and pseudogenes (22,814 nucleotides). Overall, the level of matching of
these sequences with M. leprae sequences was 90.9%, which substantiated the species-level difference; the levels
of matching for the 16S rRNA genes and 14 protein-encoding genes were 98.0% and 93.1%, respectively, but the
level of matching for five pseudogenes was only 79.1%. Five conserved protein-encoding genes were selected to
construct phylogenetic trees and to calculate the numbers of synonymous substitutions (dS values) and
nonsynonymous substitutions (dN values) in the two species. Robust phylogenetic trees constructed using
concatenated alignment of these genes placed M. lepromatosis and M. leprae in a tight cluster with long terminal
branches, implying that the divergence occurred long ago. The dS and dN values were also much higher than
those for other closest pairs of mycobacteria. The dS values were 14 to 28% of the dS values for M. leprae and
Mycobacterium tuberculosis, a more divergent pair of species. These results thus indicate that M. lepromatosis
and M. leprae diverged approximately 10 million years ago. The M. lepromatosis pseudogenes analyzed that were also
pseudogenes in M. leprae showed nearly neutral evolution, and their relative ages were similar to those of M.
leprae pseudogenes, suggesting that they were pseudogenes before divergence. Taken together, the results
described above indicate that M. lepromatosis and M. leprae diverged from a common ancestor after the massive
gene inactivation event described previously for M. leprae.

* Corresponding author. Mailing address: Institut Cavanilles de Biodiversitat i Biologia Evolutiva, Universitat de Valencia, Apartat 22085, 46071 Valencia, Spain. Phone: 34 963543650. Fax: 34 963543670. E-mail: francisco.silva@uv.es.
Published ahead of print on 24 July 2009.

Leprosy, one of the oldest human diseases, remains a significant public health problem in many developing countries (8). Mycobacterium leprae was the only known cause of leprosy until recently, when a new mycobacterium, Mycobacterium lepromatosis, was found to be the cause of diffuse lepromatous leprosy (DLL), a unique form of leprosy endemic in Mexico and the Caribbean (17). The discovery of this new species may provide an explanation for the clinical and geographic variability of leprosy.

The initial phylogenetic analysis of M. lepromatosis was carried out using the sequences of the 16S rRNA gene and segments of groEL, rpoB, and other genes (total, 4.99 kb) (17). This study revealed significant sequence differences between M. lepromatosis and all known Mycobacterium species and placed M. lepromatosis closest to M. leprae. However, the sequence variation justified assigning a new species for the new organism instead of classifying it as a variant of M. leprae. All M. leprae strains collected worldwide have been found to be clonal and to differ by only single-nucleotide polymorphism or variable numbers of tandem repeats (24). Also, the genomes of two M. leprae strains, strain TN from India (GenBank accession numbers AL583917 to AL583926) (4) and strain Br4923 from Brazil (GenBank accession number FM211192) (N. Honore et al., unpublished data) share 99.98% identity.

Like M. leprae, M. lepromatosis has not been cultivated on artificial media. In addition, our previous study also showed other similarities between these organisms, such as degeneration of mmaA3 into a pseudogene, the presence of unique AT-rich inserted sequences in the 16S rRNA gene, identical six-base tandem repeats in rpoT, similar GC contents, and great evolutionary distance from other mycobacteria (17).

The M. leprae genome (3.3 Mb) is much smaller than the Mycobacterium tuberculosis genome (4.4 Mb) (3, 4). More intriguingly, the M. leprae genome has undergone reductive evolution; 40% of the genes are inactivated (4), and 50% of the genes of the last common ancestor of M. leprae and M. tuberculosis have been lost (13). On the other hand, the M. leprae genome has been far more stable than the M. tuberculosis genome, and the worldwide clonality of the M. leprae strains paralleled the global spread of M. leprae strains that occurred via human activity and migration during the last 100,000 years (24). Recently, by comparing the genomes of M. leprae and M. tuberculosis and by analyzing the ages of the M. leprae pseudogenes, Gomez-Valero et al. (13) estimated that a massive gene inactivation event took place in the M. leprae genome in the last 20 million years.

The discovery of M. lepromatosis and its differences from M. leprae make it relevant for further study for diagnosis, treatment, and prevention of DLL. Likewise, the many similarities between these two organisms prompted questions about their evolutionary histories and about how M. lepromatosis became endemic mainly in Mexico, while M. leprae occurs worldwide. In this study, we extended and refined our previous phylogenetic study by determining and analyzing the sequences of 20 genes and pseudogenes of M. lepromatosis. Our findings solidified the phylogeny of this new organism and provided new insights into the history of pseudogenes.

MATERIALS AND METHODS

DNA isolation and sequencing. DNA was extracted from the autopsy liver of a patient who died of DLL (17). Briefly, approximately 200 mg of freshly frozen infected liver tissue was homogenized and processed with N-acetyl-L-cysteine and NaOH to enrich the mycobacterium, decontaminate the preparation, and disrupt the host cells, the usual procedure for mycobacterial culture. After essentially pure mycobacterial cells were obtained, a commercial DNA extraction solution (PrepMan Ultra; Applied Biosystems, CA) was added, and the mixture was boiled to disrupt the bacterial cells. The supernatant contained the released DNA and was used for PCRs.

Many sets of PCR primers were designed to target various genes and pseudogenes of M. lepromatosis. Without knowledge of the specific sequences of this organism, the primers were designed using the conserved sequences of the corresponding genes of M. leprae and M. tuberculosis after alignment. For a long gene, multiple sets of forward and reverse primers were designed, with each set amplifying 500 to 600 bp, with overlaps. The PCR amplicons were sequenced directly using an ABI sequencer. All sequences were matched with GenBank data and inspected visually to ensure accuracy.

The full-length rpoB gene of Mycobacterium haemophilum ATCC 29548T was also sequenced for phylogenetic analysis.

Retrieval of other sequences. Gene and pseudogene sequences of M. lepromatosis were used to extract putative orthologous genes from completely sequenced mycobacterial genomes. The extraction was performed using the Microbial Genome Database for Comparative Analysis (30). The genomes used included those of Mycobacterium abscessus ATCC 19977T, Mycobacterium avium strain 104, M. avium subsp. paratuberculosis, Mycobacterium bovis BCG Pasteur 1173P2, Mycobacterium gilvum, M. leprae, Mycobacterium marinum M, Mycobacterium smegmatis, Mycobacterium sp. strain JLS, Mycobacterium sp. strain KMS, Mycobacterium sp. strain MCS, M. tuberculosis CDC1551, Mycobacterium ulcerans, and Mycobacterium vanbaalenii. Because there were six genomes of M. tuberculosis complex organisms in the database and all of them have essentially identical sequences, only two organisms, M. bovis BCG Pasteur 1173P2 and M. tuberculosis CDC1551, were used.

Sequence alignment. The MAFFT program (18) was used to align nucleotide sequences of genes and pseudogenes. The ClustalW program implemented in MEGA 3.1 (19) was used to align the amino acid sequences of protein-encoding genes for the corresponding nucleotide alignments. Each alignment was visually inspected to detect short misalignments or poorly aligned regions. Doubtful regions, usually at both ends, were trimmed.

To calculate the numbers of synonymous (dS) and nonsynonymous (dN) nucleotide substitutions per site in five M. lepromatosis pseudogenes (citA, gnd2, icd1, lld1, and mmaA3), alignment with corresponding M. leprae pseudogenes was performed using the following parameters: (i) the alignment length corresponded to the available length of each M. lepromatosis pseudogene; (ii) to maintain continuous reading frames, like those in protein-encoding genes, the nucleotides at triplet codon positions covering any indel in either organism were not counted; and (iii) nucleotide positions at which there were stop codons in either organism were removed. The final lengths of the aligned regions, with 15% reduction of the original alignments, were 918, 852, 552, 483, and 540 nucleotides for citA, gnd2, icd1, lld1, and mmaA3, respectively.

Phylogenetics and relative rate tests. To determine the relationship of M. lepromatosis to other mycobacteria, phylogenetic trees were constructed using a concatenated alignment of five conserved genes. These genes, rpoB, ligA, groEL, gnd1, and bfrA, were chosen from 14 protein-encoding genes based on the following criteria: functional importance or conservation, the presence of orthologous genes in the 17 completely sequenced mycobacterial genomes, full length or nearly full length, and alignment accuracy. In particular, rpoB and groEL have frequently been used in phylogenetic studies (6). The alignments were also visually inspected to trim more variably aligned ends (on average, 9% of the alignments). The final lengths of the rpoB, ligA, groEL, gnd1, and bfrA genes were 3,429, 2,013, 1,575, 1,125, and 405 nucleotides, respectively, and there was total concatenation of 8,547 nucleotide sites.

A maximum likelihood phylogenetic analysis was performed with the concatenated alignment using the program PHYML (16) with 100 bootstrap replicates and the HKY model. An additional neighbor-joining phylogeny was also obtained with the program MEGA 3.1 (19) with the Tamura-Nei model and 1,000 bootstrap replicates. To include M. haemophilum (without genome data) in the analysis, additional phylogenies with the same conditions were constructed using the full-length rpoB gene of this organism. Tajima's relative rate tests (28) were implemented in MEGA (19). Both M. leprae and M. lepromatosis branches were compared with M. haemophilum using M. tuberculosis CDC1551 as the outgroup. The analyses included the first and second codon positions (usually associated with nonsynonymous substitutions) and the third codon positions (usually resulting in synonymous substitutions due to codon wobbling).

Estimation of dN and dS values. The dS and dN values, along with standard errors, were estimated for gene-gene, gene-pseudogene, or pseudogene-pseudogene pairs using yn00 from PAML (32). When a pseudogene was involved, dS and dN values were used only for comparison, and no protein-encoding function of the pseudogene was indicated.

To estimate the relative age of an M. lepromatosis pseudogene, the formula for the p parameter developed for M. leprae (13) was used:

$$p = \frac{dN_{ilps} - f(dN_{it})}{dS_{ilps} - f(dN_{it})}$$

where $dN_{ilps}$ is the dN value for the ancestor (i) of M. tuberculosis and M. leprae and the present pseudogene (lps) of M. leprae (or M. lepromatosis), $dN_{it}$ is the dN value for the same ancestor (i) and the gene of M. tuberculosis, and $f(dN_{it})$ is a function that makes a small correction for the expected number of substitutions in the lineage as if the pseudogene had not formed and is estimated from the complete set of pairs of orthologous genes for M. tuberculosis and M. leprae. This formula is based on the fact that the dN or dS value for a gene-pseudogene pair (for example, an M. tuberculosis gene and an M. leprae pseudogene) is the sum for the period of evolution as a gene (for the complete lineage of M. tuberculosis and part of the lineage of M. leprae) and the period of evolution as a pseudogene (which depends on the moment of inactivation in the M. leprae lineage). It also takes into consideration the fact that the rates of evolution for nonsynonymous changes are gene specific and that to determine these rates in M. leprae (or M. lepromatosis), the $dN_{ilps}$ and $dN_{it}$ for each gene had to be compared. Because of the lack of a data set for M. lepromatosis, the function estimated for M. leprae was used. For the substitution values, M. avium subsp. paratuberculosis was used as an outgroup.

Nucleotide sequence accession numbers. The M. lepromatosis nucleotide sequences have been deposited in the GenBank database under the accession numbers shown in Table 1. The GenBank accession number for the nucleotide sequence of the full-length rpoB gene of M. haemophilum ATCC 29548T is GQ245966.

RESULTS

M. lepromatosis sequences and matches with M. leprae. The 20 genes and pseudogenes (22,814 nucleotides) from M. lepromatosis and their matches with M. leprae genes and pseudogenes are shown in Table 1. These genes and pseudogenes were chosen from the eight initial sequences (17) with extension of the 16S rRNA, rpoB, groEL, and rpoT genes and 16 other metabolic genes or pseudogenes in the M. leprae and M. tuberculosis genomes as previously described (4). Of the 16 new genes or pseudogenes, 12 were successfully amplified and sequenced, but 4 pseudogenes (ligB, ummA, bfrB, and pfkB) were not amplified and sequenced despite many attempts and redesign of the primers, implying that there is too much variation or these pseudogenes are not present in M. lepromatosis. The GC content of the nucleotide sequences was 58.6%, similar to the 58.8% for the corresponding M. leprae sequences or the M. leprae genome (57.8%) (4).

The average levels of nucleotide identity between M. lepromatosis and M. leprae were 93.1% and 79.1% for the coding genes and pseudogenes, respectively, and the overall level of

TABLE 1. Sequences for 22,814 bp of M. lepromatosis and matches with M. leprae

                                           GenBank    Length  Nucleotide matches                 Amino acid matches
Gene or pseudogene (description)           acc. no.     (bp)  Aligned  Matches     %   Gaps (%)  Aligned  Matches     %
All 14 protein-encoding genes                         17,318   17,250   16,061  93.1  47 (0.27)    5,527    5,260  95.2
rpoT (RNA polymerase sigma factor)         EU203595      881      881      775  88.0   47 (5.3)      252      199  79.0
rpsO (ribosomal protein O)                 EU203592      415      398      359  90.2       0(a)       89       82  92.1
ligA (DNA ligase)                          EU839554    2,138    2,130    1,970  92.5          0      685      636  92.8
ribF (riboflavin kinase)                   EU203592      352      343      319  93.0          0      114      108  94.7
bfrA (bacterioferritin)                    EU839559      421      419      387  92.4          0      136      129  94.9
icd2 (isocitrate dehydrogenase)            EU839560    2,255    2,240    2,095  93.5          0      736      699  95.0
lld2 (L-lactate dehydrogenase 2)           EU839561    1,292    1,290    1,194  92.6          0      400      380  95.0
mmaA4 (mycolic acid synthase 4)            EU203591      280      280      265  94.6          0       84       80  95.2
gnd1 (6-phosphogluconate dehydrogenase)    EU839562    1,545    1,543    1,441  93.4          0      481      461  95.8
pfkA (6-phosphofructokinase)               EU839563      845      844      789  93.5          0      281      272  96.8
groEL (heat shock protein 65)              EU203593    1,615    1,614    1,488  92.2          0      536      520  97.0
umaA2 (mycolic acid synthase 2)            EU839564      417      413      390  94.4          0      137      133  97.1
gltA2 (citrate synthase 1)                 EU839565    1,266    1,265    1,197  94.6          0      418      406  97.1
rpoB (RNA polymerase beta subunit)         EU203594    3,596    3,590    3,392  94.5          0    1,178    1,155  98.0
All five pseudogenes                                   3,958    4,077    3,224  79.1  207 (5.1)
gnd2 (6-phosphogluconate dehydrogenase 2)  EU839556    1,147    1,220      911  74.7   82 (6.7)
lld1 (L-lactate dehydrogenase 1)           EU839555      564      581      448  77.1   40 (6.9)
icd1 (isocitrate dehydrogenase 1)          EU839557   615(b)      190      149  78.4   13 (6.8)
                                                                  433      355  82.0   12 (2.8)
citA (citrate synthase A)                  EU839558      978      975      802  82.3   21 (2.1)
mmaA3 (mycolic acid synthase 3)            EU203591      654      678      559  82.4   39 (5.7)
16S rRNA gene                              EU203590    1,538    1,542    1,511  98.0  10 (0.65)
Total                                                 22,814   22,869   20,796  90.9  264 (1.2)

(a) There is a 3-bp gap before the start codon.
(b) There is a 128-bp deletion between the better matched regions.
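
The percentages in Table 1 can be recomputed from the aligned and matched counts. The short Python sketch below is only an illustrative check, not part of the authors' analysis; the dictionary and function names are invented for this example, and the counts are copied from the three summary rows of the table.

```python
# Counts copied from the summary rows of Table 1:
# (nucleotides aligned, nucleotides matched). Names are illustrative only.
TABLE1_SUMMARY = {
    "protein-encoding genes": (17250, 16061),
    "pseudogenes": (4077, 3224),
    "16S rRNA gene": (1542, 1511),
}


def percent_identity(aligned: int, matches: int) -> float:
    """Percentage of aligned positions that match."""
    return 100.0 * matches / aligned


identities = {
    name: percent_identity(aligned, matches)
    for name, (aligned, matches) in TABLE1_SUMMARY.items()
}

# Pool all three categories to obtain the overall level of matching.
total_aligned = sum(aligned for aligned, _ in TABLE1_SUMMARY.values())
total_matches = sum(matches for _, matches in TABLE1_SUMMARY.values())
overall = percent_identity(total_aligned, total_matches)

for name, value in identities.items():
    print(f"{name}: {value:.1f}%")
print(f"overall: {overall:.1f}% of {total_aligned:,} aligned positions")
```

The pooled figure reproduces the Total row because the aligned length (22,869) includes gap positions, which is why it exceeds the 22,814 bp of M. lepromatosis sequence.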

nucleotide identity was 90.9% (Table 1). This level is well below the 95% genome-level cutoff used to distinguish species (15). In addition, the level of nucleotide identity for the rpoB gene was 94.5%, which is also less than the 98% cutoff proposed recently (1). Therefore, M. lepromatosis meets the definition of a species.

The levels of identity for the protein-encoding genes were 88.0 to 94.6% at the nucleotide level and 79.0 to 98.0% at the amino acid level (overall level of identity, 95.2%). Thirteen of the 14 genes studied (16,369 bp) were aligned without gaps; the exception was rpoT, which had 47 gap sites (5.3% of 881 bp). The rpoT gene of M. lepromatosis contained a unique 74-bp tandem repeat, (CGAGCCACCAATACAGCATCT)3CGAGCCACCAA, which corresponded to amino acids (RATNTAS)3RAT. A 42-bp portion of this sequence, (AATACAGCATCTCGAGCCACC)2, which corresponded to (NTASRAT)2, was not present in the M. leprae rpoT gene. Slightly downstream of this sequence was a 3-bp insertion in M. lepromatosis. Upstream, a 2-bp insert in M. lepromatosis shifted the reading frame and likely led to a different start codon. Thus, rpoT is the most variable of the 14 genes.

The levels of identity for the pseudogenes were much lower, ranging from 74.7 to 82.4%. Due to indels, the sequence of each of the five pseudogenes had many gaps in the alignment. Altogether, there was 207 bp of gaps (5.1% of 4,077 bp), accounting for 24.3% of the 853 mismatches. In addition, the M. lepromatosis icd1 gene had a 128-bp deletion in the middle. Therefore, in these pseudogenes, the lack of constraints of reading frames allowed much faster accumulation of indels.

Finally, the level of identity for the 16S rRNA gene was 98.0%, which was the highest level of identity, but there were 10 gap sites (0.65% of 1,542 bp). In all mycobacteria, 16S rRNA genes are highly conserved, and there are relatively few indels or indels are not present in closely related mycobacteria, such as M. marinum and M. ulcerans, M. avium and Mycobacterium intracellulare, and other pairs of species (data not shown). Thus, indel mutations are also a feature of 16S rRNA gene evolution in the two leprosy organisms. Other 16S rRNA gene features were analyzed previously (17).

Phylogenetic position of M. lepromatosis. As shown in Fig. 1A, phylogenetic trees constructed by using the maximum likelihood and neighbor-joining methods produced the same topology with bootstrap values of 100 at all nodes except one. M. leprae and M. lepromatosis were the most closely related species, yet in both trees the terminal branches leading to these species were much longer than those leading to other tight groups, such as the M. tuberculosis complex and the M. avium group. These long branches could not be explained by faster evolution or high substitution rates. On average, a gene evolving in the lineage leading to M. leprae had a slightly higher dS value than a gene in the M. tuberculosis lineage (from the last common ancestor) (0.82 and 0.72, respectively) (10), while the average dN values were 0.065 and 0.045, respectively (13). Because the dS values are similar and their contribution to

FIG. 1. Maximum likelihood phylogenies of selected mycobacteria based on (A) concatenated alignment of the rpoB, ligA, groEL, gnd1, and
bfrA genes, (B) alignment of the rpoB gene, and (C) alignment of the 16S rRNA gene. Bootstrap values are indicated at the nodes. At some nodes
there are two values, a value obtained by the neighbor-joining method (left value) and a value obtained by the maximum likelihood method (right
value). The bars indicate the numbers of nucleotide substitutions per site. The trees were rooted to separate the slow-growing mycobacterial clade
(A and B) and the M. avium branch (C).

branch lengths is much more important than that of dN values, we believe that a much earlier divergence time is the best explanation.

Because M. haemophilum was recently reported to be a close relative of M. leprae (22) and M. lepromatosis (17), phylogenetic trees based on rpoB (3,429 bp) were also reconstructed (Fig. 1B). The same topology was obtained despite some lower bootstrap values. It was observed that M. haemophilum was much closer to the leprosy organisms than to other mycobacteria. Yet the branch lengths indicated that there was faster evolution of the leprosy mycobacteria than of M. haemophilum. Using Tajima's relative rate tests for this gene, the branches leading to M. leprae and M. lepromatosis were found to be significantly longer than the branch leading to M. haemophilum for the third codon position in both leprosy organisms (P < 0.001 for both) and for the first and second codon positions in M. lepromatosis (P = 0.042). The main reason for this difference was the decrease in the GC content, mainly at the third codon positions (Table 2).

A phylogeny was also obtained using the 16S rRNA gene sequences, which resulted in a different topology (Fig. 1C); the

TABLE 2. GC contents of rpoB gene

                   GC content (%)
Species            Codon positions 1 and 2(a)   Codon position 3(a)   Total
M. lepromatosis    51.4                         77.9                  60.2
M. leprae          51.9                         79.1                  61.0
M. haemophilum     52.7                         88.0                  64.4
M. tuberculosis    53.0                         86.9                  64.3

(a) The first- and/or second-codon substitutions are frequently nonsynonymous, whereas the third-codon substitutions are usually synonymous due to codon wobbling.
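
The three columns of Table 2 are linked by simple arithmetic: if every codon contributes two sites at positions 1 and 2 and one site at position 3, the total GC content is the weighted mean (2 x GC12 + GC3) / 3. The Python sketch below checks the table against that relation; the weighting is an assumption made for this illustration rather than a step described in the paper, and the names are invented.

```python
# Values copied from Table 2: (GC% at codon positions 1 and 2,
# GC% at codon position 3, reported total GC%).
TABLE2 = {
    "M. lepromatosis": (51.4, 77.9, 60.2),
    "M. leprae": (51.9, 79.1, 61.0),
    "M. haemophilum": (52.7, 88.0, 64.4),
    "M. tuberculosis": (53.0, 86.9, 64.3),
}


def total_gc(gc_pos12: float, gc_pos3: float) -> float:
    """Weighted mean GC%, assuming two sites at positions 1-2 per site at position 3."""
    return (2.0 * gc_pos12 + gc_pos3) / 3.0


# Absolute difference between the recomputed and the reported total, per species.
deviations = {
    species: abs(total_gc(gc12, gc3) - reported)
    for species, (gc12, gc3, reported) in TABLE2.items()
}

for species, (gc12, gc3, reported) in TABLE2.items():
    print(f"{species}: recomputed {total_gc(gc12, gc3):.2f}%, reported {reported}%")
```

The recomputed totals agree with the reported ones to within rounding, and the comparison makes the point of the table visible: the two leprosy organisms differ from M. haemophilum and M. tuberculosis by roughly 8 to 10 percentage points at the third codon position but by less than 2 points at positions 1 and 2.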

FIG. 2. Numbers of nucleotide substitutions per site for five pairs of Mycobacterium species. (Left panel) dS values. (Right panel) dN values.
The genes analyzed were rpoB, ligA, groEL, gnd1, and bfrA (bars from left to right). Abbreviations: map, M. avium subsp. paratuberculosis; mav,
M. avium 104; mbo, M. bovis AF2122/97; mle, M. leprae; mlp, M. lepromatosis; mmi, M. marinum M; mtc, M. tuberculosis CDC1551; mul, M.
ulcerans. The dN/dS ratios for the rpoB, ligA, groEL, gnd1, and bfrA genes for the M. lepromatosis-M. leprae pair were 0.022, 0.155, 0.031, 0.075,
and 0.089, respectively; the average dN/dS ratio was 0.074.
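
Two figures tied to Fig. 2 can be reproduced with a few lines of arithmetic. The Python sketch below averages the dN/dS ratios listed in the caption and then scales the 66-million-year M. tuberculosis-M. leprae calibration quoted in the Results by the reported dS fractions (14 to 28%, average 20%). The linear scaling assumes that synonymous substitutions accumulate in proportion to time; that is a simplification adopted for this example, and the function and variable names are invented.

```python
from statistics import mean

# dN/dS ratios quoted in the Fig. 2 caption for the M. lepromatosis-M. leprae pair.
DN_DS = {"rpoB": 0.022, "ligA": 0.155, "groEL": 0.031, "gnd1": 0.075, "bfrA": 0.089}

# Calibration quoted in the Results: M. tuberculosis and M. leprae
# diverged 66 million years ago.
CALIBRATION_MY = 66.0


def scaled_divergence_my(ds_fraction: float, calibration_my: float = CALIBRATION_MY) -> float:
    """Divergence time in million years, assuming dS grows linearly with time."""
    return ds_fraction * calibration_my


average_ratio = mean(DN_DS.values())

# dS of the leprosy pair as a fraction of dS for M. tuberculosis-M. leprae.
estimates = {
    label: scaled_divergence_my(fraction)
    for label, fraction in [("lowest gene", 0.14), ("average", 0.20), ("highest gene", 0.28)]
}

print(f"average dN/dS: {average_ratio:.3f}")
for label, value in estimates.items():
    print(f"{label}: about {value:.1f} million years")
```

Under this naive scaling the five genes bracket a range of roughly 9 to 18 million years, which is the same order of magnitude as the rounded figure of 10 million years given in the paper.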

branch between the leprosy organisms was much longer, and the M. marinum-M. ulcerans clade was closer to M. tuberculosis than to the clade formed by M. haemophilum and the leprosy organisms. This tree was less realistic than the tree obtained using the concatenated protein-encoding genes (Fig. 1A). Phylogenomic studies have shown that in a gene phylogeny, when one lineage evolves much faster than the other lineages, an incorrect topology may be obtained, in which the fast-evolving branch has a tendency to move toward the base of the tree. Thus, the true topology may be obtained only with a small subset of genes that evolve slowly (2). The 16S rRNA genes of the leprosy organisms evolved much faster than those of other mycobacteria, as shown by more indels, as noted above and previously (17).

Evolutionary distance between M. lepromatosis and M. leprae. To estimate the evolutionary distance between M. lepromatosis and M. leprae, dS and dN values were calculated using the five conserved genes described above. These values were compared with the dS and dN values for other pairs of most closely related species or subspecies, including M. avium subspecies, M. marinum and M. ulcerans, and M. tuberculosis and M. bovis. The more divergent pair M. tuberculosis and M. leprae was also included as a gauge because these most medically important mycobacteria diverged 66 million years ago (13). The results are shown in Fig. 2.

Since they are nearly unaffected by natural selection, the dS values are similar for many genes and thus more suitable for estimating relative divergence times. The dS values for M. tuberculosis and M. bovis were zero, consistent with the clade-level divergence of these species and a genome level of identity of 99.95% (7). Similarly, subspecies-level differences for the M. marinum-M. ulcerans and M. avium subspecies pairs were reflected by low dS values. Conversely, for M. lepromatosis and M. leprae the dS values were far higher and for the five genes ranged from 14 to 28% (average, 20%) of the values for the much more divergent organisms M. tuberculosis and M. leprae. Similar results were obtained for the dN values, although in this case natural selection affected each gene or lineage differently, which resulted in some variability. For example, the highest dN value for ligA meant faster evolution of this gene in the M. lepromatosis and M. leprae lineages. Together, these results suggest that the divergence of the two leprosy-causing organisms was not recent and that the length of their divergence was around 20% of the age of divergence of M. tuberculosis and M. leprae or 10 million years.

Age of M. lepromatosis pseudogenes. For all five pseudogenes (citA, gnd2, icd1, lld1, and mmaA3) in both species there were corresponding functional genes in M. tuberculosis and other mycobacteria. So when were the genes inactivated, in the last common ancestor or after divergence? This question was addressed from a few angles.

First, the dN/dS ratio for both species was estimated for each pseudogene. For an ancestral pseudogene, continued evolution after divergence in both lineages would be neutral, resulting in similar dN and dS values with a ratio of 1 or close to 1. Otherwise, selective evolution as a gene in both lineages after divergence with late inactivation would result in a much smaller dN/dS ratio. The longer the selective evolution, the smaller the dN/dS ratio would be. For example, Gomez-Valero et al. (13) showed that in the M. leprae lineage after divergence from M. tuberculosis, the average dN/dS ratio for genes was 0.079.

As shown in Table 3, the dN/dS ratios for M. lepromatosis and M. leprae were close to 1 (range, 0.680 to 0.789) for the citA, gnd2, lld1, and mmaA3 pseudogenes, indicating that there was likely neutral evolution and thus the sequences were pseudogenes prior to divergence. It is noteworthy that removal of stop codons, which was required for calculation, resulted in underestimation of the dN values, leading to smaller ratios. Mutations from codons to stop codons, which are common in pseudogenes, are essentially dN events. Yet similar dN and dS values were also indicated by the overlapping confidence intervals (mean ± 1.96 standard errors) for these values. The only exception was icd1, for which the dN value was significantly smaller than the dS value (a ratio much less than 1). The pseudogene dN/dS ratios contrasted with the dN/dS ratios for the five coding genes analyzed, for which the average dN/dS ratio was 0.074 (Fig. 2). The dN/dS ratios were also estimated for M. lepromatosis and M. tuberculosis, and as expected, they

TABLE 3. Synonymous and nonsynonymous substitutions in pseudogenes(a)

             dN/dS ratio                            M. lepromatosis-M. leprae
Pseudogene   M. lepromatosis-   M. lepromatosis-    dN                dS                P value
             M. leprae          M. tuberculosis
citA         0.6804             0.1669              0.1639 ± 0.0175   0.2409 ± 0.0372   NS
gnd2         0.7148             0.1947              0.1950 ± 0.0200   0.2728 ± 0.0428   NS
mmaA3        0.7690             0.1023              0.1415 ± 0.0198   0.1840 ± 0.0490   NS
lld1         0.7890             0.1458              0.1858 ± 0.0264   0.2355 ± 0.0496   NS
icd1         0.3234             0.0699              0.1072 ± 0.0169   0.3315 ± 0.0694   <0.05

(a) The confidence intervals for dN and dS (mean ± 1.96 standard errors) overlap (NS, not statistically significant) or do not overlap (P < 0.05).
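
The significance call in Table 3 rests on whether the 95% confidence intervals of dN and dS overlap. The Python sketch below applies that rule to the tabulated values, reading the second number in each dN and dS cell as the standard error (an interpretation based on the table footnote and on the Methods, which state that standard errors were estimated). The data structure and function names are invented for this illustration.

```python
# Values copied from Table 3 for the M. lepromatosis-M. leprae comparison:
# (dN, SE of dN, dS, SE of dS, reported dN/dS ratio).
TABLE3 = {
    "citA": (0.1639, 0.0175, 0.2409, 0.0372, 0.6804),
    "gnd2": (0.1950, 0.0200, 0.2728, 0.0428, 0.7148),
    "mmaA3": (0.1415, 0.0198, 0.1840, 0.0490, 0.7690),
    "lld1": (0.1858, 0.0264, 0.2355, 0.0496, 0.7890),
    "icd1": (0.1072, 0.0169, 0.3315, 0.0694, 0.3234),
}


def confidence_interval(value: float, se: float, z: float = 1.96) -> tuple[float, float]:
    """Interval of value plus or minus z standard errors."""
    return value - z * se, value + z * se


def intervals_overlap(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True when two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


results = {}
for gene, (dn, dn_se, ds, ds_se, _reported) in TABLE3.items():
    overlap = intervals_overlap(confidence_interval(dn, dn_se), confidence_interval(ds, ds_se))
    results[gene] = {"ratio": dn / ds, "overlap": overlap}
    print(f"{gene}: dN/dS = {dn / ds:.4f}, intervals {'overlap' if overlap else 'do not overlap'}")
```

Only icd1 yields non-overlapping intervals, matching the single P < 0.05 entry, and the recomputed ratios reproduce the first column of the table.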

were small (Table 3), reflecting the evolution of the sequences entirely as genes in the M. tuberculosis lineage.

Second, the relative ages of M. lepromatosis pseudogenes were estimated by using the formula used for the M. leprae pseudogenes (13). This formula required the presence of corresponding orthologous genes in M. tuberculosis and M. avium. Thus, it could not be used for mmaA3 because this pseudogene is not present in M. avium. The relative ages of the remaining four pseudogenes (Table 4) were found to be similar to the relative ages in M. leprae and also to the average age (0.13) of the 611 M. leprae pseudogenes analyzed previously (13). Thus, the similar ages of pseudogenes in the two species may indicate that they were pseudogenes in the ancestor.

Finally, indels that were present in both species and thus likely of ancestral origin were examined using sequence alignments. Several common indels were detected, except in citA, and they included a 2-nucleotide insertion in gnd2, a 5-nucleotide deletion in icd1, a 10-nucleotide deletion in lld1, and a 5-nucleotide insertion and an 8- or 9-nucleotide deletion in mmaA3. These indels were not in triplet nucleotides; as a result, they could alter the codon reading frames and possibly create stop codons. Thus, these findings in essence show the ancestral presence of these pseudogenes, rather than convergent evolution. Many unique indels in each species represented continued evolution of these pseudogenes after divergence.

Together, these analyses provided ample evidence showing that (i) all or most of the pseudogenes analyzed were inactivated prior to the divergence of M. lepromatosis and M. leprae and (ii) the massive gene inactivation described for M. leprae (4, 13) before the discovery of M. lepromatosis (17) in fact took place in the last common ancestor of these two species.

DISCUSSION

The discovery of the new leprosy-causing organism M. lepromatosis and its close relationship to M. leprae provided an opportunity to discern the evolutionary features of these organisms. Using in-depth analysis of 20 genes and pseudogenes from M. lepromatosis, this study fulfilled some of our goals. First, the species-level divergence between these two organisms was confirmed to be on a larger genetic scale despite the current lack of culture and phenotypic features. Second, divergence approximately 10 million years ago is proposed based on the topology of robust phylogenetic trees and the levels of nucleotide substitutions in the protein-encoding genes. This time is orders of magnitude (88 times) greater than the relatively well-established divergence time for M. tuberculosis and M. bovis (113,000 years ago), which share 99.95% genome identity (7) and had no resolvable difference in our analyses (Fig. 1 and 2). And third, the age of the pseudogenes of M. lepromatosis, which is nearly identical to the age of M. leprae pseudogenes, their nearly neutral evolution, and the presence of codon-altering common indels suggest these pseudogenes were pseudogenes in the last common ancestor of the two species.

Bacterial taxonomy has changed along with technological advances, from morphological classification a century ago to phenotype-based chemotaxonomy to a polyphasic approach involving assessment of genetic variation, such as genomic DNA hybridization (70% cutoff to differentiate species) and divergence of the 16S rRNA gene (in general, 97% cutoff) in recent decades (21). Numerous bacterial genome sequences are now readily accessible, making genetic comparisons accurate and comprehensive for the purpose of species identification (15). Yet the phenotype of an organism, which includes complex traits involving many genetic elements, is still essential. For instance, historically as well as practically, several species with essentially identical genome sequences are retained, such as M. tuberculosis for humans and M. bovis for cows. In the case of M. lepromatosis, status as a novel species can be justified because of its remarkable genetic variation, as well as uniqueness of the disease that it causes (DLL). However, to satisfy the bacteriological code for a full description of this new species, phenotypic characterization after cultivation and/or animal passage is needed, and efforts to do this are being made.

Practically, the sequences enable us to separate the two leprosy organisms to better define the disease. For instance, we have devised a reliable PCR assay based on the species-specific 16S rRNA gene sequences to detect the presence of each species in a skin biopsy (X. Y. Han, K. C. Sizer, F. Aung, H. H. Tan, and B. Werner, unpublished data; X. Y. Han, K. C. Sizer, S. Velarde-Felix, and F. Vargas-Ocampo, unpublished data). Using this assay in a follow-up study of M. lepromatosis infections in Mexico (X. Y. Han, K. C. Sizer, S. Velarde-Felix, and

TABLE 4. Relative pseudogene age(a)

Pseudogene   M. lepromatosis   M. leprae
icd1         0.105             0.079
citA         0.106             0.089
lld1         0.111             0.080
gnd2         0.175             0.168
Avg          0.124             0.104

(a) The relative age ranges from 1 (the age equals the time of divergence of M. lepromatosis-M. leprae from M. tuberculosis) to 0 (pseudogene inactivation at the present time).

F. Vargas-Ocampo, unpublished data), we found that far more leprosy cases were caused by
M. lepromatosis than by M. leprae, suggesting that M. lepromatosis is the dominant
leprosy-causing species in Mexico. We also noted that, in addition to lepromatous leprosy,
M. lepromatosis also specifically caused DLL in 30% of the infections. DLL is endemic in
Mexico but rare elsewhere. These findings, as well as the lack of DLL reports or description
in Europe, more specifically in Spain, suggest that DLL is native to Mexico and was not
brought over by the Spaniards over the past 500 years. Native Mongoloid Mexicans initially
migrated from Asia 12,000 years ago through the Alaska Bering land bridge and settled in the
Pacific coastal areas of Mexico. Some of them migrated further south and settled in the
Caribbean, other central American countries, and South America. Our finding of
M. lepromatosis in Brazil and Singapore Chinese (X. Y. Han, K. C. Sizer, F. Aung, H. H. Tan,
and B. Werner, unpublished data) and the presence of DLL in the Caribbean (8) are also
consistent with the possibility that there was Mongoloid spread of this organism. Therefore,
the studies of M. lepromatosis have provided new insights into the puzzling clinical and
geographic features of leprosy.

While the overall features of M. lepromatosis and M. leprae are very similar in terms of
culture difficulty, GC content, and the functional status of genes and pseudogenes,
M. lepromatosis likely has some unique phenotypic traits that have yet to be discovered that
may be responsible for some distinct clinical features of M. lepromatosis infections, such as
DLL. DLL is chronic, and the organism is strictly intracellular (17, 20, 26). Vargas-Ocampo
(31) recently studied the histopathology of 199 patients with DLL in Mexico and concluded
that DLL is essentially a vascular disease caused by mycobacterial invasion, with early
vasculitis to late-stage vascular occlusion, which is responsible for the clinical
manifestations of the disease that include skin purpura, ulceration, and necrosis.

Humans are the only known natural host of M. leprae and have probably been infected for
100,000 years according to genomic evidence based on many M. leprae strains obtained
worldwide (24). Consistent with the strictly parasitic lifestyle of this organism, the
M. leprae genome is the smallest genome of all of the mycobacterial genomes sequenced to
date. This genome has also undergone reductive evolution with massive inactivation of genes
(4) estimated to have occurred during the last 20 million years (13). The present study
further establishes that this genome decay took place in the last common ancestor of
M. leprae and M. lepromatosis.

What led to the genome decay in the ancestor 20 million years ago? We speculate that there is
some relationship to adaptation from a free-living lifestyle, like that of almost all other
mycobacteria (100 species) except M. tuberculosis and M. bovis, to an increasingly parasitic
lifestyle in a host (an early ancestor of hominids?). During the last decade, sequencing and
comparative analysis of many bacterial genomes have shown that changes in lifestyle from free
living to host dependence or restriction to a specific host may lead to relaxation of natural
selection and make many genes nonessential, initiating the process of genome downsizing. In
the early stages, many genes (up to thousands of genes), usually less selectively constrained
genes, are inactivated (4, 5, 13, 29). This inactivation occurs through missense or nonsense
nucleotide substitutions, indels, or mobilization of insertion elements (9). There is then a
tendency to lose the nonfunctional DNA (12, 23, 27). Analysis of some bacterial endosymbionts
of insects, such as the aphid endosymbiont Buchnera aphidicola and the ant endosymbiont
Blochmannia floridanus, has permitted workers to establish a stepwise scenario for genome
reduction involving very slow and gradual degradation with scattered punctual large deletion
events (11, 14). This model has been confirmed by complete sequencing of several
B. aphidicola genomes (25). In this regard, future genome sequencing of M. lepromatosis and
comparative genomic analyses with M. leprae may shed light on the genome decay in these highly
specialized pathogens.

Why did the ancestor evolve into two species 10 million years ago if it had become parasitic
in a primate host? The answer to this question is even more elusive. There are a few possible
explanations, including subtle niche differences in the host, eventually leading to separate
evolution paths; coevolution with different host populations of the same species that might
be affected variably by the availability of nutrients, susceptibility, geographic restriction,
and other factors; and a shift from one host species to another. The niche possibility can be
tested in future clinical studies by examining tissue infected by each organism. Both
organisms are known to be intracellular in macrophages; however, remarkable endothelial
invasion by M. lepromatosis in DLL has been documented (17, 26, 31). In our clinical studies,
we also found many cases of dual M. lepromatosis-M. leprae infection (X. Y. Han, K. C. Sizer,
F. Aung, H. H. Tan, and B. Werner, unpublished data; X. Y. Han, K. C. Sizer, S. Velarde-Felix,
and F. Vargas-Ocampo, unpublished data). Further studies of the coexistence of these two
organisms may provide insight as well.

ACKNOWLEDGMENTS

This work was supported in part by a University Cancer Foundation grant from the University
of Texas M. D. Anderson Cancer Center (MDACC) (to X.Y.H.), by the School of Health Sciences of
MDACC, and by National Institutes of Health grant CA16672 for the MDACC DNA Core Facility.

We thank the DNA Core Facility staff for DNA sequencing and John Spencer for helpful
discussions. X.Y.H. initiated and orchestrated this study; X.Y.H., K.C.S., E.J.T., J.K., J.L.,
and P.H. determined the sequences; F.J.S., L.G.V., and X.Y.H. performed data analysis; and
F.J.S. and X.Y.H. wrote the paper.

REFERENCES

1. Adekambi, T., T. M. Shinnick, D. Raoult, and M. Drancourt. 2008. Complete rpoB gene
    sequencing as a suitable supplement to DNA-DNA hybridization for bacterial species and
    genus delineation. Int. J. Syst. Evol. Microbiol. 58:1807-1814.
2. Canback, B., I. Tamas, and S. G. Andersson. 2004. A phylogenomic study of endosymbiotic
    bacteria. Mol. Biol. Evol. 21:1110-1122.
3. Cole, S. T., R. Brosch, J. Parkhill, T. Garnier, C. Churcher, D. Harris, S. V. Gordon,
    K. Eiglmeier, S. Gas, C. E. Barry, F. Tekaia, K. Badcock, D. Basham, D. Brown,
    T. Chillingworth, R. Connor, R. Davies, K. Devlin, T. Feltwell, S. Gentles, N. Hamlin,
    S. Holroyd, T. Hornby, K. Jagels, A. Krogh, J. McLean, S. Moule, L. Murphy, K. Oliver,
    J. Osborne, M. A. Quail, M. A. Rajandream, J. Rogers, S. Rutter, K. Seeger, J. Skelton,
    R. Squares, S. Squares, J. E. Sulston, K. Taylor, S. Whitehead, and B. G. Barrell. 1998.
    Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence.
    Nature 393:537-544.
4. Cole, S. T., K. Eiglmeier, J. Parkhill, K. D. James, N. R. Thomson, P. R. Wheeler,
    N. Honore, T. Garnier, C. Churcher, D. Harris, K. Mungall, D. Basham, D. Brown,
    T. Chillingworth, R. Connor, R. M. Davies, K. Devlin, S. Duthoy, T. Feltwell, A. Fraser,
    N. Hamlin, S. Holroyd, T. Hornsby, K. Jagels, C. Lacroix, J. Maclean, S. Moule, L. Murphy,
    K. Oliver, M. A. Quail, M. A. Rajandream, K. M. Rutherford, S. Rutter, K. Seeger, S. Simon,
    M. Simmonds, J. Skelton, R. Squares, S. Squares, K. Stevens, K. Taylor, S. Whitehead,
    J. R. Woodward, and B. G. Barrell. 2001. Massive gene decay in the leprosy bacillus.
    Nature 409:1007-1011.
5. Delmotte, F., C. Rispe, J. Schaber, F. J. Silva, and A. Moya. 2006. Tempo and mode of early
    gene loss in endosymbiotic bacteria from insects. BMC Evol. Biol. 6:56.
6. Devulder, G., M. P. de Montclos, and J. P. Flandrois. 2005. A multigene approach to
    phylogenetic analysis using the genus Mycobacterium as a model. Int. J. Syst. Evol.
    Microbiol. 55:293-302.
7. Garnier, T., K. Eiglmeier, J. C. Camus, N. Medina, H. Mansoor, M. Pryor, S. Duthoy,
    S. Grondin, C. Lacroix, C. Monsempe, S. Simon, B. Harris, R. Atkin, J. Doggett, R. Mayes,
    L. Keating, P. R. Wheeler, J. Parkhill, B. G. Barrell, S. T. Cole, S. V. Gordon, and
    R. G. Hewinson. 2003. The complete genome sequence of Mycobacterium bovis. Proc. Natl.
    Acad. Sci. USA 100:7877-7882.
8. Gelber, R. H. 2005. Leprosy (Hansen's disease), p. 966-972. In D. L. Kasper, E. Braunwald,
    A. S. Fauci, S. L. Hauser, D. L. Longo, and J. L. Jameson (ed.), Harrison's principles of
    internal medicine. McGraw-Hill, Inc., New York, NY.
9. Gil, R., E. Belda, M. J. Gosalbes, L. Delaye, A. Vallier, C. Vincent-Monegat, A. Heddi,
    F. J. Silva, A. Moya, and A. Latorre. 2008. Massive presence of insertion sequences in the
    genome of SOPE, the primary endosymbiont of the rice weevil Sitophilus oryzae. Int.
    Microbiol. 11:41-48.
10. Gomez-Valero, L. 2006. Evolución reductiva del tamaño del genoma en bacterias
    intracelulares. Ph.D. thesis. Universitat de Valencia, Valencia, Spain.
11. Gomez-Valero, L., A. Latorre, R. Gil, J. Gadau, H. Feldhaar, and F. J. Silva. 2008.
    Patterns and rates of nucleotide substitution, insertion and deletion in the endosymbiont
    of ants Blochmannia floridanus. Mol. Ecol. 17:4382-4392.
12. Gomez-Valero, L., A. Latorre, and F. J. Silva. 2004. The evolutionary fate of
    nonfunctional DNA in the bacterial endosymbiont Buchnera aphidicola. Mol. Biol. Evol.
    21:2172-2181.
13. Gomez-Valero, L., E. P. C. Rocha, A. Latorre, and F. J. Silva. 2007. Reconstructing the
    ancestor of Mycobacterium leprae: the dynamics of gene loss and genome reduction. Genome
    Res. 17:1178-1185.
14. Gomez-Valero, L., F. J. Silva, J. C. Simon, and A. Latorre. 2007. Genome reduction of the
    aphid endosymbiont Buchnera aphidicola in a recent evolutionary time scale. Gene
    389:87-95.
15. Goris, J., K. T. Konstantinidis, J. A. Klappenbach, T. Coenye, P. Vandamme, and
    J. M. Tiedje. 2007. DNA-DNA hybridization values and their relationship to whole-genome
    sequence similarities. Int. J. Syst. Evol. Microbiol. 57:81-91.
16. Guindon, S., and O. Gascuel. 2003. A simple, fast, and accurate algorithm to estimate
    large phylogenies by maximum likelihood. Syst. Biol. 52:696-704.
17. Han, X. Y., Y. H. Seo, K. C. Sizer, T. Schoberle, G. S. May, J. S. Spencer, W. Li, and
    R. G. Nair. 2008. A new Mycobacterium species causing diffuse lepromatous leprosy. Am. J.
    Clin. Pathol. 130:856-864.
18. Katoh, K., K. Misawa, K. Kuma, and T. Miyata. 2002. MAFFT: a novel method for rapid
    multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res.
    30:3059-3066.
19. Kumar, S., K. Tamura, and M. Nei. 2004. MEGA3: integrated software for molecular
    evolutionary genetics analysis and sequence alignment. Briefs Bioinform. 5:150-163.
20. Latapi, F., and A. Chevez-Zamora. 1948. The spotted leprosy of Lucio: an introduction to
    its clinical and histological study. Int. J. Lepr. 16:421-437.
21. Ludwig, W., and H.-P. Klenk. 2001. Overview: a phylogenetic backbone and taxonomic
    framework for prokaryotic systematics, p. 49-65. In D. R. Boone, R. W. Castenholz, and
    G. M. Garrity (ed.), Bergey's manual of systematic bacteriology, 2nd ed., vol. I. The
    archaea and the deeply branching and phototrophic bacteria. Springer-Verlag, New York, NY.
22. Mignard, S., and J. P. Flandrois. 2008. A seven-gene, multilocus, genus-wide approach to
    the phylogeny of mycobacteria using supertrees. Int. J. Syst. Evol. Microbiol.
    58:1432-1441.
23. Mira, A., H. Ochman, and N. A. Moran. 2001. Deletional bias and the evolution of bacterial
    genomes. Trends Genet. 17:589-596.
24. Monot, M., N. Honore, T. Garnier, R. Araoz, J. Y. Coppee, C. Lacroix, S. Sow,
    J. S. Spencer, R. W. Truman, D. L. Williams, R. Gelber, M. Virmond, B. Flageul, S. N. Cho,
    B. Ji, A. Paniz-Mondolfi, J. Convit, S. Young, P. E. Fine, V. Rasolofo, P. J. Brennan, and
    S. T. Cole. 2005. On the origin of leprosy. Science 308:1040-1042.
25. Moran, N. A., H. J. McLaughlin, and R. Sorek. 2009. The dynamics and time scale of ongoing
    genomic erosion in symbiotic bacteria. Science 323:379-382.
26. Rea, T. H., and R. S. Jerskey. 2005. Clinical and histologic variations among thirty
    patients with Lucio's phenomenon and pure and primitive diffuse lepromatosis (Latapi's
    lepromatosis). Int. J. Lepr. Other Mycobact. Dis. 73:169-188.
27. Silva, F. J., A. Latorre, and A. Moya. 2001. Genome size reduction through multiple events
    of gene disintegration in Buchnera APS. Trends Genet. 17:615-618.
28. Tajima, F. 1993. Simple methods for testing the molecular evolutionary clock hypothesis.
    Genetics 135:599-607.
29. Toh, H., B. L. Weiss, S. A. H. Perkin, A. Yamashita, K. Oshima, M. Hattori, and S. Aksoy.
    2006. Massive genome erosion and functional adaptations provide insights into the
    symbiotic lifestyle of Sodalis glossinidius in the tsetse host. Genome Res. 16:149-156.
30. Uchiyama, I. 2007. MBGD: a platform for microbial comparative genomics based on the
    automated construction of orthologous groups. Nucleic Acids Res. 35:D343-D346.
31. Vargas-Ocampo, F. 2007. Diffuse leprosy of Lucio and Latapi: a histologic study. Lepr.
    Rev. 78:248-260.
32. Yang, Z. H., and R. Nielsen. 2000. Estimating synonymous and nonsynonymous substitution
    rates under realistic evolutionary models. Mol. Biol. Evol. 17:32-43.
