Ge Gao & Liping Wei
Center for Bioinformatics, Peking University
https://www.coursera.org/course/pkubioinfo
http://img.webmd.com/dtmcms/live/webmd/consumer_assets/site_images/media/medical/hw/n5551221.jpg
http://commons.wikimedia.org/wiki/File:Hospital_newborn_by_Bonnie_Gruenberg2.jpg
Copyright Peking University
Almost
mitochondrial DNA
epigenetics
environments/nurture
chance
TAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCT
CCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCAACCCTAACCCTAACCCTAACCCTAACCC
CCCTAACCCCTAACCCTAACCCTAACCCTAACCCTAACCTAACCCTAACCCTAACCCTAACCCTAAC
AACCCTAACCCTAACCCTAACCCCTAACCCTAACCCTAAACCCTAAACCCTAACCCTAACCCTAACC
ACCCTAACCCCAACCCCAACCCCAACCCCAACCCCAACCCCAACCCTAACCCCTAACCCTAACCCTA
CTACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCCTAACCCCTAACCCTAACCCTAACC
ACCCTAACCCTAACCCTAACCCCTAACCCTAACCCTAACCCTAACCCTCGCGGTACCCTCAGCCGGC
CCCGCCCGGGTCTGACCTGAGGAGAACTGTGCTCCGCCTTCAGAGTACCACCGAAATCTGTGCAGAG
AACGCAGCTCCGCCCTCGCGGTGCTCTCCGGGTCTGTGCTGAGGAGAACGCAACTCCGCCGGCGCAG
CAGAGAGGCGCGCCGCGCCGGCGCAGGCGCAGACACATGCTAGCGCGTCGGGGTGGAGGCGTGGCGC
CGCAGAGAGGCGCGCCGCGCCGGCGCAGGCGCAGAGACACATGCTACCGCGTCCAGGGGTGGAGGCG
CGCAGGCGCAGAGAGGCGCACCGCGCCGGCGCAGGCGCAGAGACACATGCTAGCGCGTCCAGGGGTG
GCGTGGCGCAGGCGCAGAGACGCAAGCCTACGGGCGGGGGTTGGGGGGGCGTGTGTTGCAGGAGCAA
CGCACGGCGCCGGGCTGGGGCGGGGGGAGGGTGGCGCCGTGCACGCGCAGAAACTCACGTCACGGTG
CGGCGCAGAGACGGGTAGAACCTCAGTAATCCGAAAAGCCGGGATCGACCGCCCCTTGCTTGCAGCC
CACTACAGGACCCGCTTGCTCACGGTGCTGTGCCAGGGCGCCCCCTGCTGGCGACTAGGGCAACTGC
GCTCTCTTGCTTAGAGTGGTGGCCAGCGCCCCCTGCTGGCGCCGGGGCACTGCAGGGCCCTCTTGCT
TGTATAGTGGTGGCACGCCGCCTGCTGGCAGCTAGGGACATTGCAGGGTCCTCTTGCTCAAGGTGTA
GCAGCACGCCCACCTGCTGGCAGCTGGGGACACTGCCGGGCCCTCTTGCTCCAACAGTACTGGCGGA
So simple, yet so mysterious...
The human genome has about 3 billion base pairs.
~2.9% of the bases encode genes.
~97.1% of the genome was previously called junk.
These regions contain the regulatory elements that encode instructions on which proteins to make, and when, where, and how much.
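To put the percentages in absolute terms, here is a minimal sketch. It assumes a genome size of about 3 billion base pairs, which is an illustrative round figure rather than a number taken from the slide, and the variable names are hypothetical.

```python
# Rough split of the genome into coding and non-coding bases.
GENOME_BP = 3_000_000_000   # assumption: ~3 billion base pairs
CODING_FRACTION = 0.029     # from the slide: ~2.9% of bases encode genes

coding_bp = GENOME_BP * CODING_FRACTION
noncoding_bp = GENOME_BP - coding_bp

print(f"coding bases:     {coding_bp:,.0f}")
print(f"non-coding bases: {noncoding_bp:,.0f}")
```

Under this assumption, about 87 million bases encode genes and roughly 2.9 billion do not.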
http://commons.wikimedia.org/wiki/File:Simplified_tree.png
How do you decode the instructions in this manual of life?
If you print 100 characters per line and 50 lines per page, you'll fill pages stacked 60 meters high.
If you read one base per second, nonstop, it will take you 100 years to finish reading!
10^15 base pairs in >165,000 organisms! It will take you 30 million years.
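The estimates above can be checked with a little arithmetic. The sketch below assumes a human genome of about 3 billion base pairs and a sheet thickness of about 0.1 mm. Neither value is stated on the slide, and the function names are illustrative.

```python
# Back-of-the-envelope check of the printing and reading estimates.
HUMAN_GENOME_BP = 3e9        # assumption: ~3 billion base pairs
SHEET_THICKNESS_M = 1e-4     # assumption: ~0.1 mm per printed page
ALL_SEQUENCES_BP = 1e15      # from the slide: 10^15 base pairs

CHARS_PER_PAGE = 100 * 50    # 100 characters per line, 50 lines per page
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def pages_needed(n_bases):
    """Number of printed pages needed to hold n_bases characters."""
    return n_bases / CHARS_PER_PAGE


def stack_height_m(n_bases):
    """Height in meters of the stack of pages holding n_bases characters."""
    return pages_needed(n_bases) * SHEET_THICKNESS_M


def years_to_read(n_bases, bases_per_second=1.0):
    """Years of nonstop reading at the given rate."""
    return n_bases / bases_per_second / SECONDS_PER_YEAR


print(f"pages:        {pages_needed(HUMAN_GENOME_BP):,.0f}")
print(f"stack height: {stack_height_m(HUMAN_GENOME_BP):.0f} m")
print(f"human genome: {years_to_read(HUMAN_GENOME_BP):.0f} years")
print(f"10^15 bp:     {years_to_read(ALL_SEQUENCES_BP) / 1e6:.1f} million years")
```

Under these assumptions the human genome fills 600,000 pages in a 60-meter stack and takes roughly 95 years to read, which the slide rounds to 100. Reading 10^15 base pairs takes about 32 million years.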
GenBank growth
$\log_2(\mathrm{bp}) = -1.2 \times 10^{3} + 0.59\,y$
$R^2 = 0.97$, p-value $< 2.2 \times 10^{-16}$
Doubling every 20 months
SRA growth
$\log_2(\mathrm{bp}) = -4.6 \times 10^{3} + 2.3\,y$
$R^2 = 0.97$, p-value $< 2.2 \times 10^{-16}$
Doubling every 5 months!
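The doubling times follow directly from the fitted slopes. In a fit of log2(bp) against the year y, the data volume doubles each time log2(bp) rises by 1, so the doubling time is 1/slope years. A minimal sketch, with a hypothetical function name:

```python
def doubling_time_months(slope_per_year):
    """Months for the data volume to double, given the slope of log2(bp) per year."""
    return 12.0 / slope_per_year


genbank_months = doubling_time_months(0.59)  # slope from the GenBank fit
sra_months = doubling_time_months(2.3)       # slope from the SRA fit

print(f"GenBank doubles every {genbank_months:.1f} months")
print(f"SRA doubles every {sra_months:.1f} months")
```

This gives roughly 20.3 months for GenBank and 5.2 months for SRA, in line with the rounded figures on the slide.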
True variant vs. error
Huge amount
Explosive growth
Low signal-to-noise ratio
Multiple types
ftp://ftp.ncbi.nih.gov/genbank/gbrel.txt
What is bioinformatics?
Computer/Computational Sciences
Life Sciences
Bioinformatics
Medicine
Medical/Clinical Informatics
Systems Biology
Synthetic Biology
Bioinformatics/Computational Biology
Computer Sciences
Physics
Mathematics
Statistics
Phenotype
RNA
Sequence alignment
Database similarity search
Motif finding
Gene finding
Computational & comparative genomics
Evolution
DNA
methylation
Differential expression
Co-expression
ncRNA
Molecular Networks
Proteins
Cells
Protein interaction networks
Transcriptional regulation networks
Metabolic and signaling networks
Network dynamics
Virtual cell simulations
Physiology/Disease
Population genetics
Human genetics
Discovery
Data Management
Data Computation
Data Mining
Databases
Ontologies
Meta-data
Algorithms
Software tools
Web servers
Biological discoveries
Modeling/Simulation
Predictive models
Systems simulation
Summary Questions
What is bioinformatics?
Can you name some of the biomedical questions studied in bioinformatics?
Can you name some of the technical questions studied in bioinformatics?
Why are you interested in bioinformatics?