In-Silico Prediction of Regulatory Element Mycobacterium Tuberculosis

International Journal of Biotechnology and Biochemistry
ISSN 0973-2691 Volume 5 Number 1 (2009) pp. 7–13

© Research India Publications
http://www.ripublication.com/ijbb.htm
In-silico Prediction of the Regulatory Element Patterns of Human Pathogen
Mycobacterium tuberculosis
Pramod Katara, Mugdha Agarwal, Ganga Jeena, Supriya Karkra, Ishu Sharma and
Vinay Sharma
Department of Bioscience and Biotechnology, Banasthali University,
P.O. Banasthali Vidyapith, India-304022
E-mail: pmkatara@yahoo.co.in
Abstract
Computational prediction of the nucleotide binding specificity of transcription
factors remains a fundamental and largely unsolved problem. Determining
binding positions is a prerequisite for research in gene regulation, and many
computational technologies now make it feasible to identify potential targets
of transcription factors. In disease-causing bacteria, regulatory elements can
be used for bio-control. Cluster analysis of gene expression data is often used
to infer regulatory modules or biological function by associating unknown
genes with well-known genes that have similar expression patterns. Using a
simple clustering algorithm on microarray datasets, we grouped genes that are
very similar at the expression level and then analyzed them with RSAT to
predict regulatory elements. In the present work we applied this approach to
the virulence genes of the human pathogen M. tuberculosis and predicted their
regulatory sequence patterns. Among the 75 clusters of virulence genes
analyzed, 36 clusters showed the presence of both oligo and spaced dyad
sequence patterns of regulatory elements.
Keywords: Transcription factor binding sites, Gene expression, Clustering,
Transcription start site and Regulatory elements
Introduction
Gene function and metabolite profiles vary over time because of changes in
gene expression. Genes do not remain active in the cell at all times; their
activation depends on the requirement for their products, which varies with
internal and external cellular conditions. To understand the cellular function
of any cell or organism, it is therefore very important to understand its gene
regulation system.
The identification of the repertoire of regulatory elements in a genome is one
of the major challenges in modern biology and is also important for the study
of gene expression and its regulatory machinery. The proteins responsible for
the initiation and rate of transcription are called transcription factors (TFs).
Transcription is initiated by the interaction of these TFs with cis-regulatory
elements in DNA, which act as binding sites and are known as transcription
factor binding sites (TFBSs); these sites play an essential role in
transcription initiation (Alberts et al., 2002).
Regulatory elements are short sequences of DNA (5–20 bp in length) that
determine the timing, location and level of gene expression (Dominic et al.,
2004). They are generally located upstream of a gene's transcription start
site and, through interaction with specific transcription factors, modulate
the expression patterns of the genes in a genome. In recent years, sequences
with known regulatory elements have become easily available owing to the
large-scale sequencing of many genomes. Meanwhile, technologies such as
microarrays and ChIP-on-chip (or GSLA, for Genome-Scale Location Analysis)
make it feasible to identify potential targets of transcription factors. Among
the available techniques, microarray technologies allow direct measurement of
the expression level of each gene in a cell (DeRisi et al., 1997), so that sets
of genes involved in a defined cellular process can be characterized. In
addition, many computational methods, such as MEME, AlignACE and Consensus
(Wasserman and Sandelin, 2004), and algorithms such as phylogenetic
footprinting (Blanchette and Tompa, 2002) have been developed in the past
decade to predict regulatory elements and regulatory motifs (Yueyi et al.,
2004; Yoon et al., 2005).
Humans have been affected by various pathogens for many years. Some of these
pathogens are very dangerous and some are less harmful, but all of them cause
some harm to their host. It is therefore very important to understand their
genes and to gain insight into the activity of these pathogens and their
control.
M. tuberculosis is a small obligate aerobe and the causative agent of the
chronic infectious disease tuberculosis, which has been responsible for the
deaths of countless millions of people; it was first described by Robert Koch
in 1882. The complete genome sequence of the laboratory strain M. tuberculosis
H37Rv has been determined and analyzed in order to improve our understanding
of the biology of this pathogen and to help the conception of new prophylactic
and therapeutic interventions. Its genome comprises 4.41 million base pairs
and contains approximately 4048 predicted genes (Cole et al., 1998). Although
antibiotics and other drugs have been developed, M. tuberculosis remains very
dangerous because of its complex life cycle and its acquired resistance to
these drugs (Camus et al., 2002).
Our present work focuses on predicting the regulatory element sequence
patterns of the virulence genes responsible for the pathogenicity of M.
tuberculosis by combining information on coexpressed genes from gene
expression data (Sand and van-Helden, 2007) with nucleotide sequences, using
clustering and comparative genomics concepts (Heyer et al., 1999; Lenhard et
al., 2003).
Methodology
In the present study, to mine the regulatory sequences of the virulence genes
of M. tuberculosis, virulence genes were collected from available databases
such as VFDB (Chen et al., 2005; Yoon et al., 2007) and by data mining of
research articles (Cole et al., 1998; Camus et al., 2002). The gene sequences
of interest were downloaded from these databases for further use. For M.
tuberculosis, the gene expression data from SMD (Sherlock et al., 2001) were
stored and scaled by taking the logarithm of the expression ratios before the
genes were clustered (Ball et al., 2005).
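The scaling step described above can be illustrated with a short example. This is a minimal sketch rather than the authors' pipeline: the text does not state the logarithm base or the data layout, so the base-2 logarithm, the two-channel intensities and the names `log_ratio`, `red` and `green` are all assumptions made for illustration.

```python
import math

def log_ratio(red, green):
    """Return log2(red / green), or None when a value is missing or non-positive."""
    if red is None or green is None or red <= 0 or green <= 0:
        return None  # the ratio is undefined for such spots
    return math.log2(red / green)

# Hypothetical channel intensities for three genes (illustrative values only).
raw = {
    "geneA": (800.0, 200.0),   # 4-fold higher in the first channel
    "geneB": (150.0, 150.0),   # unchanged
    "geneC": (50.0, 400.0),    # 8-fold lower in the first channel
}
scaled = {gene: log_ratio(r, g) for gene, (r, g) in raw.items()}
```

On a log scale, induction and repression of the same magnitude become symmetric around zero, which is why ratio data are usually transformed before distance-based clustering.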
The genes were clustered on the basis of their expression values (Schena,
1996; Yeung et al., 2004) using the Cluster software from the Eisen lab (Eisen
et al., 1998). Because we are interested in genes responsible for virulence,
after k-means clustering we selected only those clusters containing our genes
of interest (virulence genes), and all genes in such clusters were considered
probable virulence genes.
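For readers unfamiliar with k-means, the grouping step can be sketched as follows. This is a plain Lloyd-style k-means written for illustration only; it is not the Cluster software, and the Euclidean distance, the deterministic initialisation and the toy profiles are assumptions that the text does not specify.

```python
def kmeans(profiles, k, n_iter=50):
    """Plain k-means on {gene: [expression values]}; returns {gene: cluster index}."""
    genes = sorted(profiles)
    # Assumption: seed the centroids with the first k genes so runs are repeatable.
    centroids = [list(profiles[g]) for g in genes[:k]]
    assign = {}
    for _ in range(n_iter):
        new_assign = {}
        for g in genes:
            # Squared Euclidean distance from this gene to every centroid.
            dists = [sum((a - b) ** 2 for a, b in zip(profiles[g], c))
                     for c in centroids]
            new_assign[g] = dists.index(min(dists))
        if new_assign == assign:
            break  # assignments are stable, so the algorithm has converged
        assign = new_assign
        for j in range(k):
            members = [profiles[g] for g in genes if assign[g] == j]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign

# Toy log-ratio profiles (illustrative values only).
profiles = {
    "g1": [2.0, 2.1, 1.9], "g2": [1.9, 2.0, 2.2],
    "g3": [-1.0, -1.2, -0.9], "g4": [-1.1, -0.8, -1.0],
}
clusters = kmeans(profiles, k=2)
```

In this toy case the two induced genes end up in one cluster and the two repressed genes in the other. Practical tools usually repeat the run from several random starts, because k-means can settle in a poor local optimum.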
To analyze the regulatory sequence patterns of the coexpressed virulence
genes, we used the RSAT program. Taking the clusters obtained above as input,
we first retrieved the upstream regions of their genes and then performed
oligo-analysis and dyad-analysis on them (van-Helden et al., 2000;
Thomas-Chollier et al., 2008; Defrance et al., 2008). For M. tuberculosis,
this analysis is based on a comparative approach (Lenhard and Wasserman,
2002).
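The idea behind detecting over-represented oligonucleotides in a set of upstream sequences can be sketched in a few lines. This is an illustration under stated assumptions, not RSAT's implementation: it counts overlapping hexamers on a single strand and scores the most frequent one with a binomial tail probability against an equiprobable background, whereas a real analysis would use a background model calibrated on the genome. The sequences and function names are hypothetical.

```python
from collections import Counter
from math import comb

def count_oligos(seqs, k):
    """Count every overlapping k-mer (single strand) across the sequences."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts

def binomial_tail(n, x, p):
    """P(X >= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(x, n + 1))

# Toy upstream sequences (illustrative only).
upstream = ["TTGACAATTGACA", "GGTTGACACC", "ATTGACAT"]
k = 6
counts = count_oligos(upstream, k)
n_positions = sum(len(s) - k + 1 for s in upstream)
p_uniform = 0.25 ** k  # assumption: independent, equiprobable nucleotides
top_oligo, occurrences = counts.most_common(1)[0]
p_value = binomial_tail(n_positions, occurrences, p_uniform)
```

A small `p_value` means the oligomer occurs more often than the background predicts, which is the signal used to propose it as a candidate regulatory pattern.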
Results and Discussion

Analysis of coexpressed genes
To predict genes coexpressed with virulence genes, we subjected all genes of
M. tuberculosis to k-means clustering (K = 450) and obtained 450 clusters. Of
these, we considered only 402 clusters for further analysis and discarded the
remaining 48 clusters because they contained very few genes. We checked all
genes in these 402 clusters to find those identical to the experimentally
proven virulence genes found by data mining. As a result, we found 75 clusters
containing well-known virulence genes along with other genes, which we
considered probable virulence genes because of their similar expression
patterns. We then used these tightly coexpressed clusters for the analysis of
regulatory sequences (Heyer et al., 1999; Sand and van-Helden, 2007).
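The selection logic described above (discard very small clusters, then keep clusters that contain a known virulence gene) can be sketched as follows. The size cut-off `min_size`, the gene identifiers and the function name are hypothetical; the text does not state the threshold that was used.

```python
from collections import defaultdict

def select_virulence_clusters(assignment, known_virulence, min_size=2):
    """Keep clusters with at least min_size genes and at least one known virulence gene."""
    clusters = defaultdict(set)
    for gene, cluster_id in assignment.items():
        clusters[cluster_id].add(gene)
    selected = {}
    for cluster_id, members in clusters.items():
        if len(members) >= min_size and members & known_virulence:
            selected[cluster_id] = sorted(members)
    return selected

# Hypothetical cluster assignment and seed list (placeholder gene names).
assignment = {"vir1": 0, "x1": 0, "x2": 0, "x3": 1, "x4": 1, "vir2": 2}
known = {"vir1", "vir2"}
selected = select_virulence_clusters(assignment, known, min_size=2)
# Genes that share a selected cluster with a known virulence gene.
probable = sorted(g for members in selected.values() for g in members if g not in known)
```

Here cluster 1 is dropped because it holds no known virulence gene and cluster 2 because it is too small, leaving `x1` and `x2` as probable virulence genes by co-expression.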
Upstream region
To analyze the upstream regions of the virulence and probable virulence genes,
all coexpressed genes were submitted to the RSAT server. The 75 clusters,
comprising 1622 genes in total, were submitted to RSAT to retrieve the
regulatory upstream regions. As a result, we found upstream regions that
varied in size, distance from the transcription start site (TSS) and sequence
across all clusters.
[Bar chart "Upstream pattern". x-axis: Distance from TSS (bins: 0, 1 to 10,
11 to 20, 21 to 50, 51 to 100, 100 to 200, 201 to 500, 501 to 1000, above
1000); y-axis: No. of upstream.]
Figure 1: The distance distribution of upstream region for M. tuberculosis.
Of the genes in these 75 clusters, 298 yielded no upstream region, 195 had
upstream regions of 1 to 20 bp, 194 had upstream regions of 21 to 50 bp, 587
had upstream regions of 51 to 200 bp, and 304 had upstream regions at a
distance of 201 to 1000 bp from the TSS, the maximum distance being 1063 bp.
Most of the upstream regions were found in the range of 100 to 200 bp, which
is a common range for microbes (Wasserman and Sandelin, 2004); this range
facilitates the analysis of the proximal promoter region and of regions
farther from it.
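The binning used for this distribution can be reproduced with a short helper. This is a sketch under assumptions: the bin edges follow the axis labels of Figure 1, the sixth bin is taken as 101 to 200 bp so that the bins do not overlap, and the lengths in the example are made up for illustration.

```python
import bisect
from collections import Counter

# Inclusive upper bounds (bp) of the bins on the x-axis of Figure 1.
UPPER = [0, 10, 20, 50, 100, 200, 500, 1000]
LABELS = ["0", "1-10", "11-20", "21-50", "51-100",
          "101-200", "201-500", "501-1000", ">1000"]

def bin_label(length):
    """Return the label of the bin that contains an upstream length in bp."""
    return LABELS[bisect.bisect_left(UPPER, length)]

# Hypothetical upstream lengths (illustrative values only).
lengths = [0, 5, 18, 50, 75, 150, 150, 320, 1063]
histogram = Counter(bin_label(n) for n in lengths)
```

`bisect_left` returns the index of the first upper bound that is greater than or equal to the length, so each boundary value falls into the lower bin.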
Oligo-analysis
All 75 clusters were analyzed for oligomer occurrence. Of the 75 clusters, 36
gave a positive result: 23 clusters had a single oligo pattern, 11 clusters
had two oligo patterns, and the remaining 2 clusters had seven patterns each
(Figure 2). This indicates that most of the virulence genes are regulated by a
single type of regulatory pattern, but some are controlled by more than one
regulatory pattern.
[Bar chart "Distribution of oligo pattern". x-axis: Pattern per cluster (1 to
7); y-axis: No. of clusters.]
Figure 2: Bar diagram of the oligo analysis performed with RSAT.
Among the clusters giving a single pattern, three pairs of clusters shared the
same conserved motif within each pair. Among the clusters giving two patterns,
five clusters and one remaining cluster had distinct motifs, while one pair of
clusters and one group of three clusters shared the same oligomer patterns
within the pair and within the group, respectively. The remaining two clusters
shared the same seven patterns.
Dyad analysis
All 75 clusters of M. tuberculosis were subjected to dyad analysis, and 36
clusters gave dyad results (Figure 3).
[Bar chart "Distribution of dyad pattern". x-axis: Pattern per cluster (1 to
8); y-axis: No. of clusters.]
Figure 3: Bar diagram of the dyad analysis performed with RSAT.
Of the 36 clusters that showed dyad patterns, most had a single dyad pattern
(21 clusters) or two dyad patterns (9 clusters). This suggests that the
virulence genes of M. tuberculosis have highly conserved and common types of
dyad patterns. We also observed that, among the 36 clusters, 10 clusters gave
dyads with a spacer of 1; these clusters also gave results in the oligo
analysis and showed the best occurrence significance. In addition, 28 clusters
had spacers between 2 and 18, which represents the variation among dyad
pattern binding sites.
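A spaced dyad is a pair of short words separated by a spacer of fixed width. The counting step behind such an analysis can be sketched as below. This is an illustration, not RSAT's dyad-analysis code: it assumes trinucleotide halves, counts a single strand only, and omits the significance statistics; the sequences and function names are hypothetical.

```python
from collections import Counter

def count_dyads(seqs, min_spacer=0, max_spacer=18, half=3):
    """Count (left word, spacer width, right word) triples on a single strand."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for spacer in range(min_spacer, max_spacer + 1):
            span = 2 * half + spacer  # total width covered by the dyad
            for i in range(len(s) - span + 1):
                left = s[i:i + half]
                right = s[i + half + spacer:i + span]
                counts[(left, spacer, right)] += 1
    return counts

def dyad_str(left, spacer, right):
    """Format a dyad as, for example, TTGn{4}ACA."""
    return f"{left}n{{{spacer}}}{right}"

# Toy upstream sequences that both contain TTG, four arbitrary bases, then ACA.
seqs = ["CCTTGAAAAACAGG", "GTTGCGCGACAT"]
dyads = count_dyads(seqs, max_spacer=5)
shared = dyads[("TTG", 4, "ACA")]
```

Because the spacer content is ignored, the two sequences contribute to the same dyad even though the bases between the two conserved halves differ, which is the case that contiguous oligomer counting would miss.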
Conclusion
Our analysis reveals that many genes of M. tuberculosis show considerable
similarity in gene expression to well-documented virulence genes and can
therefore be considered probable virulence genes. We found that 1622 genes
shared more than 60% similarity in gene expression pattern with the virulence
genes. Analysis of the upstream regions of all these genes, cluster by
cluster, showed that most of the genes had upstream regions of 100 to 200
base pairs, which is a common range for microbes (Wasserman and Sandelin,
2004).
The regulatory motif analysis showed the presence of different significant
patterns: contiguous oligonucleotides and spaced dyads. Some clusters had
similar conserved patterns in both the oligo and dyad analyses, indicating
that these clusters may share the same regulatory elements. In the oligo
analysis, 23 clusters gave a single pattern each, of which 17 patterns were
distinct. Another 11 clusters gave two patterns each, with a total of 12
distinct patterns. The remaining two clusters, with 7 patterns each, shared
all of them.
In the dyad analysis, 72 patterns were obtained from 36 clusters. The dyads
had spacer values ranging from 0 to a maximum of 18. The largest numbers of
patterns were obtained with spacers of 1 and 18. Among these, 4 distinct
patterns had a spacer of 1, 3 distinct patterns had a spacer of 2, and 5
distinct patterns had a spacer of 18.
References
[1] Alberts B, Johnson A, Lewis J, Raff M, Roberts K and Walter P (2002).
Molecular Biology of the Cell, Fourth Edition. Garland Science, New York
[2] Ball C A, Awad I A B, Demeter J, Gollub J, Hebert J M, Hernandez-Boussard
T, Jin H, Matese J C, Nitzberg M, Wymore F, Zachariah Z K, Brown P O and
Sherlock G (2005). The Stanford Microarray Database accommodates additional
microarray platforms and data formats. Nucleic Acids Res. 33: D580–D582
[3] Camus J C, Pryor M J, Medigue C and Cole S T (2002). Re-annotation of the
genome sequence of Mycobacterium tuberculosis H37Rv. Microbiology. 148:
2967–2973
[4] Chen L, Yang J, Yu J, Yao Z, Sun L, Shen Y and Jin Q (2005). VFDB: a
reference database for bacterial virulence factors. Nucleic Acids Res.
33: D325-D328
[5] Cole S T, Brosch R, Parkhill J, Garnier T, Churcher C, Harris D, Gordon SV,
Eiglmeier K, Gas S, Barry C E 3rd, Tekaia F, Badcock K, Basham D, Brown
D, Chillingworth T, Connor R, Davies R, Devlin K, Feltwell T, Gentles S,
Hamlin N, Holroyd S, Hornsby T, Jagels K, Krogh A, McLean J, Moule S,
Murphy L, Oliver K, Osborne J, Quail MA, Rajandream MA, Rogers J, Rutter
S, Seeger K, Skelton J, Squares R, Squares S, Sulston JE, Taylor K,
Whitehead S and Barrell BG (1998). Deciphering the biology of
Mycobacterium tuberculosis from the complete genome sequence. Nature.
393: 537–544
[6] Defrance M, Janky R, Sand O and van-Helden J (2008). Using RSAT
oligo-analysis and dyad-analysis tools to discover regulatory signals in
nucleic sequences. Nat Protoc. 3:1589-1603
[7] DeRisi J L, Iyer V R, Brown P O (1997). Exploring the metabolic and genetic
control of gene expression on a genomic scale. Science. 278(5338):680-6
[8] Dominic J A, Isaac S, and Atul J (2004). Quantifying the relationship between
co-expression, co-regulation and gene function. BMC Bioinformatics. 5: 18
[9] Eisen M B, Spellman P T, Brown P O and Botstein D (1998). Cluster analysis
and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. U S A
95:14863–14868
[10] Heyer L J, Kruglyak S and Yooseph S (1999). Exploring expression data:
Identification and analysis of coexpressed genes. Genome Res. 9:1106–1115
[11] Lenhard B, Sandelin A, Mendoza L, Engstrom P, Jareborg N and Wasserman
WW (2003). Identification of conserved regulatory elements by comparative
genome analysis. J Biol. 2(2):13
[12] Lenhard B and Wasserman W W (2002). TFBS: Computational framework for
transcription factor binding site analysis. Bioinformatics. 18: 1135-1136
[13] Blanchette M and Tompa M (2002). Discovery of Regulatory Elements by a
Computational Method for Phylogenetic Footprinting. Genome Res. 12: 739-
748
[14] Thomas-Chollier M, Sand O, Turatsinze J V, Janky R, Defrance M, Vervisch
E, Brohee S and van-Helden J (2008). RSAT: regulatory sequence analysis
tools. Nucleic Acids Res. 36:W119-W127
[15] Sand O and van-Helden J (2007). Discovery of motifs in promoters of
co-regulated genes. Comparative genomics. Methods in Molecular Biology.
Humana Press. ISBN 978-1-58829-693-1
[16] Schena M (1996). Genome analysis with gene expression microarrays.
Bioessays. 18: 427-431
[17] Sherlock G, Hernandez-Boussard T, Kasarskis A, Binkley G, Matese J C,
Dwight S S, Kaloper M, Weng S, Jin H, Ball C A, Eisen M B, Spellman P T,
Brown P O, Botstein D and Cherry J M (2001). The Stanford Microarray
Database. Nucleic Acids Res. 29(1):152-155
[18] Wasserman W W and Sandelin A (2004). Applied bioinformatics for the
identification of regulatory elements. Nat Rev Genet. 5: 276-287
[19] Yeung K Y, Medvedovic M and Bumgarner R E (2004). From co-expression
to co-regulation: how many microarray experiments do we need? Genome
Biol. 5(7): R48
[20] Yoon S, Hur C, Kang H, Kim Y, Oh T, and Kim J (2005). A computational
approach for identifying Pathogenicity islands in prokaryotic genomes. BMC
Bioinformatics. 6: 184
[21] Yoon S, Park Y, Lee S, Choi D, Oh T, Hur C, and Kim J (2007). Towards
Pathogenomics: A Web-Based Resource for Pathogenicity Islands. Nucleic
Acids Res. 35: D395-D400
[22] Yueyi L, Liping W, Serafim B, Douglas L, Brutlag, Liu J S, and Shirley L X
(2004). A suite of web-based programs to search for transcriptional regulatory
motifs. Nucleic Acids Res. 32: W204–W207
[23] van-Helden J, Rios A F and Collado-Vides J (2000). Discovering regulatory
elements in non-coding sequences by analysis of spaced dyads. Nucleic Acids
Res. 28 (8):1808-18
