
MD MEHEDI HASAN

Machine learning: to predict promoter location, identify genes and determine protein-protein interactions (PPI)

Text mining: to build a rational model (concept map)

Sequence, structure and interaction data analysis: to compare networks and build models (comparative interactomics)

Bioinformatic and experimental analysis: to find candidate genes

The increased availability of sequence data for various eukaryotic organisms in recent years has created a need for better tools and techniques.

A learning model is first developed to train a Support Vector Machine (SVM).

In a novel approach, 4-mer motifs or 3-mer amino acid (aa) sequences were used in conjunction with an SVM to distinguish between:

promoter and non-promoter DNA sequences
interacting and non-interacting proteins
gene and non-gene sequences

Method

Extraction of positive dataset (i.e. promoter seq./gene seq./known PPI data with seq.)
Extraction of negative dataset (i.e. non-promoter seq./non-gene seq./non-interaction data with seq.)
Feature selection (i.e. 4-mer motifs)
Training with SVM (i.e. LIBSVM)
Testing/cross-validation accuracy (i.e. jackknife validation)
Comparison with existing methods
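
The feature step above can be sketched as follows. This is a minimal illustration, not the published method: it assumes every one of the 256 possible 4-mers is used as a feature and that features are normalized overlapping counts, whereas the slides only say that characteristic 4-mer motifs were used. The function names are hypothetical; the output line follows LIBSVM's sparse "label index:value" text format.

```python
from collections import Counter
from itertools import product

K = 4
ALPHABET = "ACGT"
# All 4**4 = 256 possible 4-mers in a fixed order, so every
# sequence is mapped onto the same feature layout.
KMERS = ["".join(p) for p in product(ALPHABET, repeat=K)]


def kmer_features(seq):
    """Return normalized overlapping 4-mer frequencies of a DNA sequence."""
    seq = seq.upper()
    windows = [seq[i:i + K] for i in range(len(seq) - K + 1)]
    # Skip windows that contain ambiguous bases such as N.
    valid = [w for w in windows if set(w) <= set(ALPHABET)]
    counts = Counter(valid)
    total = len(valid) or 1  # avoid division by zero for short input
    return [counts[kmer] / total for kmer in KMERS]


def to_libsvm_line(label, features):
    """Format one example as a sparse LIBSVM line with 1-based indices."""
    pairs = [f"{i}:{v:.6f}" for i, v in enumerate(features, start=1) if v != 0]
    return " ".join([f"{label:+d}"] + pairs)


if __name__ == "__main__":
    # +1 marks a (made-up) positive example, e.g. a promoter sequence.
    print(to_libsvm_line(+1, kmer_features("ACGTACGT")))
```

Lines produced this way for the positive (+1) and negative (-1) datasets could be written to one file and passed to a LIBSVM training tool.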

Results

For promoter prediction on plant, Drosophila, human, mouse and rat sequences, the classification model showed 7-fold cross-validation accuracies of 83.81%, 94.82%, 91.25%, 90.77% and 82.35%, respectively. For PPI, the best prediction of this method on H. pylori data reached a cross-validation accuracy of 82.87%.

For gene finding, the success rate of this algorithm was 94.55%.
These results suggest that the proposed approach could be a useful and efficient tool for molecular biologists.
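
How a k-fold cross-validation accuracy of this kind is computed can be sketched as below. This is an illustration under stated assumptions: the nearest-centroid classifier is a hypothetical stand-in so that the example runs without external libraries (it is not an SVM), and the data are made up. Setting k to the number of samples gives leave-one-out (jackknife) validation.

```python
def k_fold_indices(n_samples, k):
    """Split indices 0..n_samples-1 into k contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_val_accuracy(samples, labels, k, fit):
    """Percentage of held-out samples predicted correctly over k folds.

    fit(train_samples, train_labels) must return a function that maps
    one sample to a predicted label.
    """
    correct = 0
    for test_idx in k_fold_indices(len(samples), k):
        held_out = set(test_idx)
        train = [i for i in range(len(samples)) if i not in held_out]
        predict = fit([samples[i] for i in train], [labels[i] for i in train])
        correct += sum(predict(samples[i]) == labels[i] for i in test_idx)
    return 100.0 * correct / len(samples)


def fit_nearest_centroid(train_samples, train_labels):
    """Stand-in classifier (NOT an SVM): pick the class with the closest mean."""
    groups = {}
    for x, y in zip(train_samples, train_labels):
        groups.setdefault(y, []).append(x)
    centroids = {
        y: [sum(col) / len(xs) for col in zip(*xs)] for y, xs in groups.items()
    }

    def predict(x):
        return min(
            centroids,
            key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])),
        )

    return predict


if __name__ == "__main__":
    # Made-up one-dimensional feature vectors for two classes.
    X = [[0.0], [1.0], [0.1], [0.9], [0.2], [1.1], [0.05]]
    y = [0, 1, 0, 1, 0, 1, 0]
    print(cross_val_accuracy(X, y, k=len(X), fit=fit_nearest_centroid))
```

The folds here are contiguous, so real data should be shuffled first; otherwise a fold may contain only one class.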

Further information
Peer-reviewed Scientific Publication

Anwar, F., Baker, S.M., Jabid, T., Hasan, M.M., Shoyaib, M., Khan, H., and Walshe, R. (2008). Pol II promoter prediction using characteristic 4-mer motifs: a machine learning approach. BMC Bioinformatics 9, 414.

Other Publications

Firdaus, S.N., Hasan, M.M., Shoyaib, M., and Begum, Z. (2008). Prediction of Protein Secondary Structure from Amino Acid Sequence using Probability Concept and SVM. Dhaka University Journal of Biological Science 17(2), 159-163.

Jabid, T., Shoyaib, M., Baker, S.M., Anwar, F., and Hasan, M.M. (2008). Protein-Protein Interaction Detection from Primary Structure using Support Vector Machine. WORLDCOMP'08.

Hasan, M.M., Azad, AKM., Baker, S.M., Shoyaib, M., and Khan, H. (2008). Plant pol II promoter prediction: a machine learning approach. IADIS Conference08, Portugal.

Anwar, F., Baker, S.M., Shoyaib, M., Jabid, T., and Hasan, M.M. (2008). Identification of Gene using Machine Learning Technique. WORLDCOMP'08.

Tools/Web Service/Databases

SVM
C++/Perl
Softberry, Dragon Promoter Finder, Neural Network Promoter Prediction, Promoter 2.0 Prediction Server and Promoter Scan (comparison)
EPD (Eukaryotic Promoter Database)
PlantProm DB
UniGene
BIND, DIP, MIPS (PPI databases)
GenBank

The focus was to compare the initiation events of DNA replication between yeast and mammalian cells.

The sequential events of DNA replication initiation are licensing and firing.

DNA replication is a highly controlled process. Global regulation of replication initiation enforces the once-per-cell-cycle rule.

Cdks are key players in the regulation of replication.

Over 700 articles were analyzed to find the components of the initiation events of DNA replication in yeast and mammalian cells.

Various text mining tools were used to find not only the components but also their roles.

Method
Downloading articles related to DNA replication (i.e. PubMed/EndNote)
Selecting articles related to DNA replication initiation events (i.e. software assisted/manually curated)
Sorting out articles related to yeast and mammalian cells (i.e. software assisted/manually curated)
Selecting genes and proteins related to DNA replication initiation events (i.e. software assisted/manually curated)
Extraction of their functions (i.e. software assisted/manually curated)
Building a concept map to compare the initiation events of DNA replication between yeast and mammalian cells (i.e. manually curated)
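
The software-assisted part of the sorting step can be pictured with a small keyword tagger. This is only a sketch under assumptions: the keyword lists, article IDs and abstracts are hypothetical, and the text mining tools named in these slides work differently. Articles that match no keyword are set aside for manual curation.

```python
import re

# Hypothetical keyword lists; a real curation pass would need a
# much richer vocabulary.
ORGANISM_TERMS = {
    "yeast": ["yeast", "saccharomyces", "cerevisiae", "pombe"],
    "mammalian": ["mammalian", "human", "mouse", "hela"],
}


def tag_organisms(abstract):
    """Return the organism groups whose terms occur as whole words."""
    words = set(re.findall(r"[a-z0-9]+", abstract.lower()))
    return sorted(
        group for group, terms in ORGANISM_TERMS.items() if words & set(terms)
    )


def sort_articles(articles):
    """Group article IDs by organism tag; untagged ones go to manual review."""
    bins = {"yeast": [], "mammalian": [], "manual_review": []}
    for article_id, abstract in articles.items():
        tags = tag_organisms(abstract)
        for tag in tags:
            bins[tag].append(article_id)
        if not tags:
            bins["manual_review"].append(article_id)
    return bins


if __name__ == "__main__":
    # Made-up abstracts for illustration only.
    articles = {
        "a1": "ORC binds replication origins in Saccharomyces cerevisiae.",
        "a2": "Treslin is required for replication in human cells.",
        "a3": "Replication licensing in Xenopus egg extracts.",
    }
    print(sort_articles(articles))
```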

Results
Licensing
Comparative analysis of steps involved in licensing
ORC binding to DNA replication origins
Cdc6 and Cdt1 loading on ORC-chromatin
Chromatin loading and activation of MCM2-7

Firing
Comparative analysis of steps involved in firing
Chromatin loading of MCM10 and Cdc45
Chromatin loading of GINS
Comparative analysis of yeast Sld2 and Xenopus and human RecQ4
Comparative analysis of yeast Sld3 and Xenopus and human Treslin
Comparative analysis of yeast Dpb11, Xenopus Cut5 and human TopBP1

Cdk regulation
Role of Cdks in promoting G1/S-specific transcription
Role of Cdks in controlling the onset of DNA replication
Role of Cdks in promoting DNA replication initiation
Role of Cdks in inhibiting relicensing
Multisite protein phosphorylation in the control of DNA replication initiation

Our investigation highlighted a remarkable conservation of the general architecture of DNA replication initiation in both yeast and mammalian systems. Higher eukaryotes are equipped with more control points (proteins and their regulation) to fine-tune the initiation event. The role of Cdks in the initiation of DNA replication may have evolved substantially from yeast to mammals. Any anomaly in these events could lead to rereplication, genome instability and ultimately cancer.


Schematic map of Licensing (left) and Firing (right) events in budding yeast and in mammalian cells


Schematic map of the known Cyclin-Cdk mediated phosphorylation events regulating both positively and negatively the level and the activity of licensing and firing factors involved in the initiation of DNA replication in yeast and in mammalian cells.

Further information
Peer-reviewed Scientific Publication

Sacco, E., Hasan, M.M., Alberghina, L., and Vanoni, M. (2012). Comparative analysis of the molecular mechanisms controlling the initiation of chromosomal DNA replication in yeast and in mammalian cells. Biotechnol Adv 30, 73-98.

Tools/Web Service/Databases

PubMed
EndNote
iHOP
AliBaba
CellDesigner
rxncon
UniProtKB

Whi5- and Rb-dependent signaling and the regulation of events upstream of the initiation of DNA replication are very important.

Human Rb and yeast Whi5 are key proteins regulating events in the G1/S transition of the cell cycle.

Our analysis focuses on a comparison of the Rb and Whi5 proteins and their underlying networks.

A system-level comparison of the networks centered on Rb and Whi5 was used to probe the evolutionary conservation of the function of Whi5 and Rb and of their cognate regulatory circuits.

Method
Using interactome data to expand the pathway model of Whi5 protein

Analysis of sequence, structure and phosphorylation sites of Rb and Whi5 proteins

Analysis of the interactome of Rb and Whi5 proteins

Workflow (flowchart, from START):

Comparison between Whi5 and pRb proteins
  Sequence alignment of Whi5 and pRb proteins in 11 different species
  Structure-based alignment between human pocket proteins and Whi5
  Result of sequence-based alignment (MATLAB, ClustalW): satisfactory or not satisfactory
  Try structure-based alignment (PRALINE)
  Put in the results and explain

Pathway comparison
  Simple core pathway comparison
  Build simple circuit and compare
  Compare target genes (E2F and SBF & MBF)
  Compare post-translational modifications

Interactors comparison
  Collect all interactors from databases (BioGRID, MINT, IntAct, HPRD, DIP, CORUM, OPHID, MPact, MPPI and BIND)
  Interactors of Whi5
  Interactors of pRb
  Interactors of p107 and p130
  Complex interactors pathway using CellDesigner
  Build model and find conserved module
  Put in the results and explain
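
Collecting interactors from several databases and looking for a conserved part can be sketched as below. All protein names, database contents and the ortholog table in the example are made-up placeholders, and the function names are hypothetical; the sketch only shows set union with provenance and a simple ortholog-based overlap, not the procedure actually used.

```python
def merge_interactors(per_database):
    """Union interactors across databases, recording which databases list each."""
    support = {}
    for database, interactors in per_database.items():
        for protein in interactors:
            support.setdefault(protein, set()).add(database)
    return support


def conserved_pairs(yeast_interactors, human_interactors, orthologs):
    """Return (yeast, human) ortholog pairs present in both interactor sets."""
    return sorted(
        (y, h)
        for y, humans in orthologs.items()
        if y in yeast_interactors
        for h in humans
        if h in human_interactors
    )


if __name__ == "__main__":
    # Placeholder identifiers, not real interaction data.
    whi5_by_db = {"BioGRID": {"yA", "yB"}, "MINT": {"yB", "yC"}}
    whi5 = merge_interactors(whi5_by_db)
    rb = {"hA", "hC"}
    orthologs = {"yA": ["hA"], "yB": ["hB"], "yD": ["hC"]}
    print(whi5["yB"])                                # databases listing yB
    print(conserved_pairs(set(whi5), rb, orthologs))
```

Keeping the supporting databases per interactor makes it easy to filter later, for example to interactions reported by more than one source.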

Results (overview)

Compositional and sequence analysis of Whi5Sc. (A) Secondary structure prediction for Whi5Sc. A simplified output of Proteus 2.0 shows only secondary structure elements. (B) Composition profiling of Whi5Sc. (C) Charge-hydropathy plot of Whi5Sc (green diamond). (D) Cumulative plot of disorder prediction. (E) Analytical SDS-PAGE analysis of proteolysis kinetics on recombinant, IMAC-purified Whi5Sc. (F) Size-exclusion chromatogram of recombinant Whi5Sc.
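
For readers unfamiliar with the charge-hydropathy plot in panel (C), the two coordinates can be computed roughly as follows. The slides do not state how the plot was produced, so this is a sketch under assumptions: it uses the Kyte-Doolittle scale rescaled to 0..1, counts K/R as +1 and D/E as -1, and applies the commonly cited Uversky boundary R = 2.785 H - 1.151, which should be checked against the original reference before use.

```python
# Kyte-Doolittle hydropathy value for each of the 20 standard residues.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def charge_hydropathy(sequence):
    """Return (mean hydropathy scaled to 0..1, absolute mean net charge)."""
    seq = [aa for aa in sequence.upper() if aa in KYTE_DOOLITTLE]
    if not seq:
        raise ValueError("sequence contains no standard residues")
    # Rescale each value from the range -4.5..4.5 to 0..1 before averaging.
    mean_h = sum((KYTE_DOOLITTLE[aa] + 4.5) / 9.0 for aa in seq) / len(seq)
    # Net charge: +1 for K/R, -1 for D/E, 0 otherwise.
    net = sum((aa in "KR") - (aa in "DE") for aa in seq)
    return mean_h, abs(net) / len(seq)


def on_disordered_side(mean_h, mean_charge):
    """True if the point lies above the assumed boundary R = 2.785*H - 1.151."""
    return mean_charge > 2.785 * mean_h - 1.151


if __name__ == "__main__":
    h, r = charge_hydropathy("KKKKDEIL")  # made-up peptide
    print(h, r, on_disordered_side(h, r))
```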

Results (overview)
Conservation of structural disorder among Whi5 homologs in fungi. (A) The plots represent the prediction of structural disorder by VSL2B for Whi5 homologs. (B) Pattern of conserved motifs found by the MEME algorithm (search for 3 motifs) in Whi5 homologs from different yeast species.

Results (overview)

Phosphorylation hampers the in vitro interaction between peptides representing motifs 1 and 3 of Whi5Sc

Results (overview)
Functional classification of Whi5Sc and Rb interactors. (A) The panel shows proteins physically binding to Whi5Sc (inner circle, first-level interactors), genetic interactors physically binding to first-level interactors (second circle, second-level interactors), genetic interactors physically binding to second-level interactors (third circle, third-level interactors), and genetic interactors that do not interact with any second- or third-level interactors of Whi5 (outer circle). (B) The interaction network of Rb consists only of physical interactors, since all genetic interactors are also physical interactors. The functional classification of the interaction network was derived from database and literature searches and is color-coded according to function.

Results (overview)

Treemap of GO term enrichment of Whi5 interactors (A) and Rb interactors (B)
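
As background to the enrichment shown in the treemaps, GO term enrichment is often scored with a one-sided hypergeometric test. The sketch below is a generic illustration with a hypothetical function name and made-up numbers; the slides do not say which statistic the tools used here apply.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """One-sided hypergeometric p-value P(X >= hits).

    population: number of genes in the background
    annotated:  background genes carrying the GO term
    sample:     number of genes in the interactor list
    hits:       interactor genes carrying the GO term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    favorable = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(hits, upper + 1)
    )
    return favorable / total


if __name__ == "__main__":
    # Made-up counts: 10 background genes, 5 annotated, 5 drawn, all 5 hit.
    print(enrichment_p_value(10, 5, 5, 5))
```

In practice the p-values for many GO terms would also need a multiple-testing correction.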


Concept map of Whi5 function. The model has been designed to include all first-, second- and third-level Whi5Sc interactors. The map is divided into four major modules: Whi5 synthesis and subcellular localization, Whi5 processing, gene silencing, and expression of SBF-dependent genes.

Further information
Peer-reviewed Scientific Publication

Hasan, M.M., Brocca, S., Sacco, E., Spinelli, M., Papaleo, E., Lambrughi, M., Alberghina, L., and Vanoni, M. (2013). A comparative study of Whi5 and retinoblastoma proteins: from sequence and structure analysis to intracellular networks. Frontiers in Physiology (in press).

Tools/Web Service/Databases

Cell Illustrator
ClustalW2
Composition Profiler
Cytoscape 2.8
DisProt
FoldIndex
Gene GO Bean
Gene Ontology
GPS 2.1
iRefWeb
NetPhosYeast
Pfam
PONDR VL-XT, PONDR VL3-BA, PONDR-FIT, VSL2
PPSP
PubMed
EndNote
iHOP
AliBaba
CellDesigner
rxncon
UniProtKB
ANCHOR
BioGRID
BioModels
PRALINE
Proteus
ProtParam
Revigo
SGD
YPL+.db

Molecular analysis included determination of polymorphism in SSR markers from different chromosomes, as well as comparison of gross genomic structure using RAPD markers.

Bioinformatic analysis consisted of locating the position and distribution of the SSR markers (both polymorphic and non-polymorphic) on the 12 rice chromosomes, examining the spatial relationship of SSR motif variability to any genes in their vicinity, determining gene structure from the SSR marker sequences, studying gene expression, etc.
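
Locating SSRs (simple sequence repeats) in a sequence can be illustrated with a regular expression. This is a minimal sketch, assuming perfect tandem repeats, motifs of 1 to 6 bases and a hypothetical threshold of at least 4 repeat units; it is not the pipeline used in this work, and the test sequence is made up.

```python
import re


def find_ssrs(sequence, min_repeats=4, max_motif=6):
    """Find perfect tandem repeats of short motifs.

    Returns a list of (0-based start, motif, number of repeat units).
    """
    # Lazy motif match so the shortest repeating unit is reported,
    # followed by at least (min_repeats - 1) further copies of it.
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_motif, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits


if __name__ == "__main__":
    # Made-up sequence containing a (GA)5 repeat starting at position 2.
    print(find_ssrs("TTGAGAGAGAGACC"))
```

The reported start positions can then be compared with gene coordinates from an annotation to see whether an SSR falls within or near a coding region.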

Method
Schematic diagram of overall bioinformatic analysis

Graphical View of polymorphic SSR markers with relative positions to the coding region

Graphical View of expression pattern of coding region


Variation in the SSR sequence may alter the expression of the corresponding genes, such as those encoding serine/threonine-protein kinase, YT521-B-like family protein and sex determination protein tasselseed-2/NAD-dependent epimerase/dehydratase family protein.

Morphological, molecular and bioinformatic analysis showed that the phenotypic/morphological variations between Horidhan and BR11 arise from specific changes at the molecular level.

Tools/Web Service/Databases

PubMed/EndNote
Rice Genome Annotation Project
Gramene
NIAS Rice Database (Japan)
Affymetrix GeneChip Rice Genome
Rice Expression Database (RED)
Rice Proteome Database
Rice ESTs in GenBank
Rice cDNAs in GenBank
KOME
OryGenesDB
Rice Tos17 Insertion Mutant Database
DFCI Rice Gene Index

Rb and Whi5 model building using CellDesigner and rxncon; model (Rb) simulation using SimBiology 4.0 and Mathematica 9.0 (training in mathematical model building, Division of Theoretical Systems Biology, DKFZ, Germany)

Protein network analysis using Cytoscape 2.8.2 and structure analysis using PyMOL and Chimera 1.6.1

Analysis of sequence data (derived from the sequencing project of the Plant Biotechnology Lab)

Basic programming to analyze sequence data, multiple alignments and protein data

Thank You