
(Bioinformatics)

____________________
The intersection of information technology & biology

Ahmed A. Zayed

How much information does our body hold?

The smallest amount of information can be obtained from a YES / NO question with a 1 / 0 answer,
i.e. 1 bit of information.

Our genetic code is made of 4 nucleotides, each of which can be represented by 2 bits of information.
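
To make the 2-bits-per-nucleotide idea concrete, here is a minimal sketch in Python. The particular bit assignment (A=00, C=01, G=10, T=11) and the helper names `pack` and `unpack` are assumptions chosen for illustration; any one-to-one mapping of the four nucleotides onto the four 2-bit values would work equally well.

```python
# Hypothetical 2-bit code: any one-to-one assignment of the four values would do.
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BASES = "ACGT"  # index in this string equals the 2-bit value in CODE


def pack(seq: str) -> int:
    """Pack a DNA string into an integer, using 2 bits per nucleotide."""
    value = 0
    for base in seq:
        value = (value << 2) | CODE[base]
    return value


def unpack(value: int, length: int) -> str:
    """Recover a DNA string of known length from its packed integer."""
    bases = []
    for _ in range(length):
        bases.append(BASES[value & 0b11])  # read the lowest 2 bits
        value >>= 2
    return "".join(reversed(bases))


seq = "GATTACA"
packed = pack(seq)
n_bits = 2 * len(seq)
print(f"{seq} -> {packed:0{n_bits}b} ({n_bits} bits)")
print(unpack(packed, len(seq)))
```

A 7-nucleotide sequence therefore needs only 14 bits, compared with 56 bits when each letter is stored as an 8-bit character.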

Our entire genetic code can be stored on a single DVD!

40,000,000,000,000 cells in the body
× 1.5 GB
= 60 zettabytes (60 × 10^21 bytes)

All the information our civilization stores will reach only 40 ZB by 2020.
It can be stored in less than 100 g of DNA!
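
The arithmetic behind the 60 ZB figure can be checked with a few lines of Python. The cell count and the 1.5 GB per cell come from the slide; the figure of roughly 6 billion base pairs per cell (a diploid genome) is an assumption added here to show where 1.5 GB could come from at 2 bits per base pair.

```python
# Figures taken from the slide.
cells_in_body = 40_000_000_000_000   # 4 x 10^13 cells
bytes_per_cell = 1.5e9               # 1.5 GB of genetic information per cell

# Assumption (not on the slide): about 6 billion base pairs per cell,
# stored at 2 bits per base pair, 8 bits per byte.
base_pairs = 6e9
genome_bytes = base_pairs * 2 / 8

zettabyte = 1e21                     # 1 ZB = 10^21 bytes
total_bytes = cells_in_body * bytes_per_cell

print(f"genome per cell: {genome_bytes / 1e9:.1f} GB")
print(f"whole body:      {total_bytes / zettabyte:.0f} ZB")
```

Under these assumptions a single genome copy is also well below the 4.7 GB capacity of a single-layer DVD.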

Part 1
Introduction to (Bioinformatics)

Part 2
Molecules of life


Introduction to (Bioinformatics)

What is Bioinformatics?
It is about:
searching biological databases,
comparing sequences,
looking at protein structures,
and asking biological questions with a computer.

Bioinformatics is growing rapidly!

In short, Bioinformatics is the:

Storage & Retrieval
of large-scale, complex data

Analysis
converting sequences into genes

Modeling
protein structure predictions

using
papers, sequences, and structures (of DNA and proteins)

We, as END USERS, can perform biological experiments:

in vivo: within a living organism.

in vitro: (in glass) or in an artificial environment.

in silico: through silicon chips (bioinformatics).

Theory of molecular evolution
Linus Pauling

Phylogenies based on sequence comparison:
differences between homologous sequences serve as a molecular clock to estimate the time since the last common ancestor.
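
The molecular clock idea can be sketched in a few lines of Python. This is a simplified illustration, assuming a strict clock (a constant substitution rate), two sequences that are already aligned, and no correction for repeated substitutions at the same site. The sequences and the rate below are made up, and the function names are hypothetical.

```python
def count_differences(a: str, b: str) -> int:
    """Number of mismatched positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def divergence_time(a: str, b: str, rate_per_site_per_year: float) -> float:
    """Estimated years since the last common ancestor under a strict clock.

    Both lineages accumulate changes independently, so the distance between
    the two sequences grows at twice the per-lineage rate: t = d / (2 * rate).
    """
    d = count_differences(a, b) / len(a)  # proportion of differing sites
    return d / (2 * rate_per_site_per_year)


# Toy homologous sequences and a made-up rate, for illustration only.
seq_a = "ACGTACGTAC"
seq_b = "ACGTTCGTAA"
print(count_differences(seq_a, seq_b))
print(divergence_time(seq_a, seq_b, rate_per_site_per_year=1e-9))
```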

Atlas of Protein Sequence and Structure
Margaret Oakley Dayhoff

The first comprehensive, computerized and publicly available collection of protein sequences.

It became a model for many subsequent sequence databases, including GenBank.

Needleman-Wunsch algorithm
Global sequence alignment
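
A minimal sketch of global alignment by dynamic programming, in the spirit of Needleman-Wunsch, may help here. The scoring values (match +1, mismatch -1, gap -1) and the function name are assumptions chosen for illustration; real tools typically use substitution matrices and more elaborate gap penalties.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Return (score, aligned_a, aligned_b) for a global alignment of a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against nothing costs one gap per character.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def sub(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the table: each cell is the best of diagonal, up, and left moves.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right corner to the top-left corner.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


nw_score, nw_a, nw_b = needleman_wunsch("ACGT", "AGT")
print(nw_score)
print(nw_a)
print(nw_b)
```

Because the alignment is global, every character of both sequences appears in the output, with `-` marking gaps.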

DNA sequencing and Staden software
DNA sequencing and software to analyze it (Staden software)

Modern version of Staden software

Most bioinformatics software (tools) includes:
1- All basic sequence alignment programs.
2- Phylogenetic and classification methods.
3- Various display tools adapted to relatively small sequence objects
(such as protein sequences of, at most, a few thousand characters).

Smith-Waterman algorithm
Local sequence alignment
Smith and Waterman

Sequence alignment:
Global sequence alignment
Local sequence alignment
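
To contrast local with global alignment, here is a minimal sketch in the spirit of Smith-Waterman. The scoring values (match +2, mismatch -1, gap -1) and the function name are assumptions for illustration. The two differences from a global alignment are that cell scores are never allowed to drop below zero, and that the traceback starts at the highest-scoring cell and stops at the first zero.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1):
    """Return (score, aligned_a, aligned_b) for the best local alignment of a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    def sub(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the table, clamping every cell at zero so an alignment can restart anywhere.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + sub(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero cell is reached.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        if score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


sw_score, sw_a, sw_b = smith_waterman("TTTACGTTT", "GGACGTCC")
print(sw_score)
print(sw_a)
print(sw_b)
```

Only the best-matching region of each sequence is reported; the unrelated flanking characters are left out of the alignment.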

The concept of a sequence motif
A nucleotide or amino-acid sequence pattern that is widespread and has, or is conjectured to have, a biological significance.

A DNA sequence motif represented as a sequence logo, graphically representing the observed probabilities
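
The "observed probabilities" behind a sequence logo can be computed from a set of aligned motif instances. The sketch below assumes a small, made-up set of aligned DNA sites and hypothetical function names. It also computes the per-position information content (2 bits minus the Shannon entropy), which logos commonly use as the height of each letter stack; no small-sample correction is applied.

```python
import math
from collections import Counter


def position_probabilities(sites):
    """Per-position nucleotide probabilities for equal-length aligned DNA sites."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    matrix = []
    for column in zip(*sites):  # iterate over alignment columns
        counts = Counter(column)
        matrix.append({base: counts.get(base, 0) / len(sites) for base in "ACGT"})
    return matrix


def information_content(column):
    """2 bits minus the Shannon entropy of one column (uncorrected)."""
    entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
    return 2.0 - entropy


# Made-up aligned sites, for illustration only.
sites = ["TATAAT", "TATGAT", "TACAAT", "TATAAT"]
matrix = position_probabilities(sites)
for position, column in enumerate(matrix, start=1):
    print(position, column, round(information_content(column), 2))
```

A fully conserved position carries the full 2 bits, while a position where all four nucleotides are equally likely carries none.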

GenBank Release 3 made public
An open-access, annotated collection of all publicly available nucleotide sequences and their protein translations.

http://www.ncbi.nlm.nih.gov/genbank/

Phage lambda genome sequenced
It provided useful tools for molecular genetics, such as its use as a vector for cloning recombinant DNA.

Bacteriophage lambda

Sequence database searching algorithm
David J. Lipman

Allowed searching the fast-growing, huge databases.

FASTP/FASTN: fast sequence similarity searching
FASTA format for nucleotide sequence:
>gi|1045243| cytochrome b
ACTGATCATAGTACATGACATAGATATCAGATACATAGAC

FASTA format for amino-acid sequence:
>gi|5524211|gb|AAD44166.1| cytochrome b
LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGY
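
Reading FASTA text takes only a few lines of code: a line starting with `>` is a header, and the lines that follow it, up to the next header, are the sequence. The sketch below uses the two records shown above; the function name `parse_fasta` is a hypothetical helper, not part of any particular library.

```python
def parse_fasta(text: str):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []  # start a new record
        elif header is not None:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        yield header, "".join(chunks)


example = """\
>gi|1045243| cytochrome b
ACTGATCATAGTACATGACATAGATATCAGATACATAGAC
>gi|5524211|gb|AAD44166.1| cytochrome b
LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGY
"""

records = list(parse_fasta(example))
for header, sequence in records:
    print(header, len(sequence))
```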

National Center for Biotechnology Information (NCBI) created at NIH/NLM

http://www.ncbi.nlm.nih.gov/gquery
HUGE databases of:
Literature, Health, Genomes, Genes, Proteins, and Chemicals

Activity 1
Searching NCBI's Literature Databases

http://www.ncbi.nlm.nih.gov/gquery
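
The same kind of literature search can also be prepared from a script. The sketch below only builds a search URL and does not contact any server. It assumes NCBI's E-utilities `esearch` endpoint, which is not mentioned on the slide; the search term and the function name are made up for illustration.

```python
from urllib.parse import urlencode

# Assumption: NCBI's E-utilities "esearch" endpoint (not part of the slide).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_search_url(term: str, max_results: int = 5) -> str:
    """Build a URL that asks PubMed for the IDs of articles matching a term."""
    params = {
        "db": "pubmed",         # literature database
        "term": term,           # free-text query
        "retmax": max_results,  # number of IDs to return
        "retmode": "json",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


url = build_pubmed_search_url("sequence alignment")
print(url)
```

Opening the printed URL in a browser, or fetching it with `urllib.request.urlopen`, should return a list of matching PubMed IDs.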

EMBnet network for database distribution

Synchronizing the data between databases every night!

BLAST: fast sequence similarity searching
Basic Local Alignment Search Tool

http://blast.ncbi.nlm.nih.gov/Blast.cgi

EST: expressed sequence tag sequencing
An EST is a short sub-sequence of a cDNA sequence.
ESTs may be used to identify gene transcripts, and are instrumental in gene discovery.

Sanger Centre, Hinxton, UK
A charitably funded genomic research centre and a leader in the Human Genome Project.

EMBL European
Bioinformatics Institute
http://www.embl.de/
http://www.ebi.ac.uk/

First bacterial genomes completely sequenced

Yeast genome completely sequenced

Worm (multicellular) genome completely sequenced

Fly genome completely sequenced

Human genome project is complete

Thank You
Questions ?
