
Quick Overview of Bioinformatics

Chuong Huynh
NIH/NLM/NCBI
New Delhi, India, September 28, 2004
huynh@ncbi.nlm.nih.gov

NCBI

What is bioinformatics? - Definition


My definition: bringing biological themes to computers

Peter Elkin, Primer on Medical Genomics, Part V: Bioinformatics
Bioinformatics is the discipline that develops and applies informatics to the field of molecular biology.

BISTIC Bioinformatics Definition
Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.

BISTIC Computational Biology Definition
The development and application of data-analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, behavioral, and social systems.


http://www.bisti.nih.gov/

Useful/Necessary Bioinformatics Skills


Strong background in some aspect of molecular biology!!!
Ability to communicate biological questions comprehensibly to computer scientists
Thorough comprehension of the problem in the bioinformatics field
Statistics (association studies, clustering, sampling)
Ability to filter, parse, and munge data and determine the relationships between the data sets
Mathematics (e.g. algorithm development)
Engineering (e.g. robotics)
Good knowledge of a few molecular biology software packages (molecular modeling / sequence analysis)
Command line computing environment (Linux/Unix knowledge)
Data administration (esp. relational database concepts) and computer programming skills/experience (C/C++, Sybase, Java, Oracle)
Scripting language knowledge (Perl and perhaps Python)
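As one illustration of the "filter, parse, and munge data" skill listed above, here is a minimal Python sketch that reads FASTA-formatted sequence records and computes their GC content. The function names `read_fasta` and `gc_content` and the toy records are assumptions made for this example; they do not come from the course material.

```python
from io import StringIO
from typing import Iterator, TextIO


def read_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in seq (0.0 for an empty sequence)."""
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


if __name__ == "__main__":
    # Toy input invented for this sketch, not real data.
    example = StringIO(">seq1 toy record\nATGC\nGGCC\n>seq2\nATAT\n")
    for name, seq in read_fasta(example):
        print(name, len(seq), gc_content(seq))
```

In practice an established library such as Biopython or BioPerl would usually be preferred over a hand-written parser.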


Bioinformatics Flow Chart (0)


1a. Sequencing
1b. Analysis of nucleic acid seq.
2. Analysis of protein seq.
3. Molecular structure prediction
4. Molecular interaction
5. Metabolic and regulatory networks
6. Gene & Protein expression data
7. Drug screening: ab initio drug design OR drug compound screening in database of molecules
8. Genetic variability

Bioinformatics Flow Chart (1)


1a. Sequencing
-Base calling
-Physical mapping
-Fragment assembly

1b. Analysis of nucleic acid seq.
-Gene finding: stretch of DNA coding for protein; analysis of noncoding region of genome
-Multiple seq alignment; evolutionary tree

2. Analysis of protein seq.
-Sequence relationship

3. Molecular structure prediction
-3D modeling: DNA, RNA, protein, lipid/carbohydrate

4. Molecular interaction
-Protein-protein interaction
-Protein-ligand interaction

5. Metabolic and regulatory networks

Bioinformatics Flow Chart (2)


6. Gene & Protein expression data
-EST
-DNA chip/microarray

7. Drug screening
Ab initio drug design OR drug compound screening in database of molecules
a) Lead compound binds tightly to binding site of target protein
b) Lead optimization: lead compound modified to be nontoxic, with few side effects, and deliverable to the target
Drug molecules are designed to be complementary to binding sites, with physicochemical and steric restrictions.

8. Genetic variability
-Now investigated at the genome scale
-SNP, SAGE

Genome Sequencing

Strategy: Libraries, Sequencing, Assembly, Closure, Annotation, Release

Clone by clone vs whole genome shotgun

Libraries: subcloning; generate small insert libraries
Assembly: process of turning raw single-pass reads into contiguous consensus sequence (Phred/Phrap)
Closure: process of ordering and merging consensus sequences into a single contiguous sequence
Annotation
-DNA features (repeats/similarities)
-Gene finding
-Peptide features
-Initial role assignment
-Others: regulatory regions
Release: data to the public, e.g. EMBL or GenBank

Most genomes will be sequenced and can be sequenced; few problems are unsolvable.
The problem lies in understanding what you have: gene prediction/gene finding.


Sequencing
Genomic DNA
Shearing/Sonication
Small DNA fragments (1.0-2.0 kb)
Subclone and Sequence: clone library in pUC18; DNA sequencing of random clones
Shotgun reads
Assembly
Contigs
Finishing (finishing reads): both strands covered; gaps filled

Complete sequence
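The assembly step in the pipeline above (shotgun reads merged into contigs) can be illustrated with a toy greedy overlap merge in Python. This is only a sketch of the idea, not how Phred/Phrap work. The function names are hypothetical, and the code assumes error-free reads that all come from the same strand, which real shotgun data does not guarantee.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 when the best overlap is shorter than min_len.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    # Three invented reads that tile a 14-base sequence.
    print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTTGA"]))
```

Reads that share no sufficiently long overlap remain as separate contigs, which corresponds to the gaps that the finishing step has to close.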

Annotation of eukaryotic genomes

Genomic DNA
transcription
Unprocessed RNA
RNA processing
Mature mRNA (5' cap, poly(A) tail)
translation
Nascent polypeptide
folding
Active enzyme
Function: Reactant A, Product B

Annotation approaches shown alongside this pathway: ab initio gene prediction, comparative gene prediction, functional identification

Annotation
Predict protein
Extract ORFs
Remove errors
Compare with database of known function proteins
Provide transitive annotations
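The "Extract ORFs" and "Predict protein" steps above can be sketched in Python as a six-frame scan for ATG-to-stop open reading frames. This is a minimal illustration under stated assumptions: it uses the standard genetic code, treats ATG as the only start codon, and ignores introns, so it does not stand in for eukaryotic gene prediction. The names `find_orfs`, `translate`, and `min_codons` are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str) -> str:
    """Translate DNA codon by codon; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def find_orfs(dna: str, min_codons: int = 3) -> list[tuple[str, int, str]]:
    """Return (strand, start, protein) for each ATG...stop ORF in six frames.

    start is the 0-based offset of the ATG on the scanned strand.
    ORFs without an in-frame stop codon are not reported.
    """
    dna = dna.upper()
    orfs = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            start = None
            for pos in range(frame, len(seq) - 2, 3):
                codon = seq[pos:pos + 3]
                if start is None and codon == "ATG":
                    start = pos
                elif start is not None and CODON_TABLE.get(codon) == "*":
                    protein = translate(seq[start:pos])
                    if len(protein) >= min_codons:
                        orfs.append((strand, start, protein))
                    start = None
    return orfs


if __name__ == "__main__":
    # Invented toy sequence containing one short ORF on the forward strand.
    print(find_orfs("CCATGGCCAAATAGGG"))
```

The predicted peptides would then be compared with a database of proteins of known function, for example with BLAST, which this sketch does not attempt.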


Positional Cloning


Positional Candidate Cloning


The new information is always partial


(Chart labels: Complete Eukaryotic Genomes, Ongoing Eukaryotic, Prokaryotic Ongoing, Published)
Even a complete genome is only partially understood


Why not use the genome sequence once it's ready?


Finding exons
-30% overprediction
-20% not found at all
Comparison systems rely on EST sequences, which themselves contain large error rates
Others are looking through partial data
Once the genome is done: when?

Expressed sequences are there in part and represent a very, very powerful key.

Interpreting data from many sources


Genomics and Tropical Diseases


How Can Genomics Contribute to the Control of Tropical Diseases?
Challenges and Opportunities
The Role of Bioinformatics
Strategic emphases for research: http://www.who.int/tdr/grants/strategic-emphases/default.htm
WHO/TDR Genomics and World Health Report 2002


Why Pathogen Genomics?


The power and cost-effectiveness of modern genome sequencing technology mean that complete genome sequences of 25 of the major bacterial and parasitic pathogens could be available within five years. For about 100 million dollars, we could buy the sequence of every virulence determinant, every protein antigen and every drug target.
B. Bloom (1995) A microbial minimalist. Nature 378:236


Genomics and Drug Development for Tropical Diseases: Challenges


Knowledge limitations
-A large proportion of pathogen genes have unknown function
-Heavy investment in genomics is made by the commercial sector, so the results are not widely available

Emphasis and priorities
-Genomes of non-pathogenic model organisms (S. cerevisiae, D. melanogaster, C. elegans, A. thaliana)
-Genomes of pathogens that affect individuals in developed countries
-Neglected diseases, neglected pathogens

Doing Successful Science in the new millennium


Huge increase in available biological information
Classic paradigm of molecular biology is now altering rapidly to genomics
Understanding of the new paradigms concerns more than just bench biology
Discovery requires large-scale systems and broad collaborations; global problems
Funding comes in large amounts at group level, no longer a single laboratory or institution effort
Accountable output


The Bigger Picture (Malaria)


Genomics Approach to Drug Development: Opportunities


Classical laboratory assays aim at targets in which mutation is lethal to the pathogen
Valuable targets can be missed
Sulphonamides: inhibition of the p-aminobenzoic acid pathway is not lethal for growth in the laboratory but severely attenuates the capacity to cause disease


Genomics Approach to Drug Development: Opportunities


New approaches for the identification of gene products specifically involved in the disease process may uncover further drug targets
-Signature tagged mutagenesis (STM)
-Transposon site hybridization (TraSH)


Pathogen genomics and data mining for the discovery of new drug targets

Fosmidomycin

September 1999: a basic science breakthrough (data mining through bioinformatics identifies new targets for chemotherapy of malaria)

1st semester 2001: results of Phase I clinical trials

Fosmidomycin example - lesson

A lesson to take home: 1 years from data mining and laboratory research to phase II, proof-of-principle clinical trials


Bioinformatics: Opportunities in Health Research and Development


New drug research and development
-Identification of novel drug/vaccine targets
-Structural predictions
-Tapping into biodiversity
-Reconstruction of metabolic pathways
-Systems biology

Identification of vaccine candidates through analysis of surface antigens and epitopes


Bioinformatics is an extremely important tool, with relevance to studying pathogenic organisms


Pathogens of interest to disease endemic countries (DECs) are already being sequenced (e.g. P. falciparum, T. cruzi, T. brucei, Leishmania sp.)

A Window of Opportunity for Disease Endemic Countries

Computational biology is people-intensive and less affected by infrastructure, economics, etc. than other areas of biological research
Critical mass issues are less critical: a world-wide community is within reach


Relatively Modest Hardware Needs and Technical Support


The Linux operating system permits use of the personal computer as a powerful workstation
Individual accounts for remote access and data processing can be opened at high-performance computer facilities and regional centers
-EMB network nodes, FIOCRUZ (Brazil), SANBI (South Africa), CECALCULA (Venezuela), ICGEB (Trieste and New Delhi)
Vast repository of public domain software for computational biology


Relatively Modest Hardware Needs and Technical Support


Powerful searches using public websites
-NCBI, EMB nodes, Sanger Center, Expasy/SwissProt, KEGG database

High-speed internet access is becoming more and more available in disease endemic countries through regional and international support, e.g.:
-Asia-Pacific Advanced Network Consortium (APAN): http://www.th.apan.net/
-MIMCom Malaria Research Resources: http://www.nlm.nih.gov/mimcom/about.html


International Training Course on Bioinformatics and Computational Biology Applied to Genome Studies (Train-the-trainers Workshop)
May 21-June 15, 2001, FIOCRUZ, Brazil

TDR Regional Training Centers & Regional Training Courses on Bioinformatics Applied to Tropical Diseases

Africa
-SANBI, Cape Town, South Africa. Course: Jan 20-Feb 02, 2002; Mar 19-Apr 4, 2003; Feb 2-15, 2004 (with NBN series)
-Univ of Ibadan, Ibadan, Nigeria. Course: May 26-Jun 07, 2003

South America
-USP, São Paulo, Brazil. Course: Feb 18-March 02, 2002; July 17-19, 2003; July 5-16, 2004

Southeast Asia
-ICGEB, New Delhi, India. Course: Apr 26-May 09, 2002; Sep 22-Oct 06, 2003; Sept 28-Oct 11, 2004
-Mahidol University, Bangkok, Thailand. Course: Jul 09-23, 2002; Sep 29-Oct 10, 2003; July 26-Aug 6, 2004


Training Course on Bioinformatics and Functional Genomics Applied to Insect Vectors of Human Diseases
At the Center for Bioinformatics and Applied Genomics (CBAG) and Center for Vector and Vector-Borne Diseases (CVVD), Faculty of Science, Mahidol University, Bangkok, Thailand
January 17-28, 2005

Training Course on Functional Genomics of Insect Vectors of Human Diseases
African Center for Training in Functional Genomics of Insect Vectors of Human Diseases (AFRO VECTGEN)
At the Malaria Research and Training Center (MRTC), Bamako, Mali
Dec 1-16, 2004


Beginning Bioinformatics Books


Baxevanis & Ouellette 2001. Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, 2nd Edition. John Wiley Publishing.
Gibas & Jambeck 2001. Developing Bioinformatics Computer Skills. O'Reilly.
Mount 2001. Bioinformatics: Sequence and Genome Analysis.
Claverie & Notredame 2003. Bioinformatics For Dummies.
Pevsner 2003. Bioinformatics and Functional Genomics.
Lesk 2002. Introduction to Bioinformatics.
Krane & Raymer 2003. Fundamental Concepts of Bioinformatics.
Tisdall 2002. Beginning Perl for Bioinformatics.
Gibson & Muse 2002. Primer of Genome Science.


Course Schedule

Take out your course schedule.

Comments and Suggestions


The Challenge

What is expected of you?

