BIO4320 Lecture Materials, Prepared by Dr. Hon-Ming Lam

DNA Sequencing and Sequence Analysis
Further Readings:
“Genome II” by T.A. Brown, Ch. 6;
“Gene Cloning and DNA Analysis” by T.A. Brown, Ch. 10;
“Bioinformatics: A Practical Guide to the Analysis of Genes
and Proteins” by A.D. Baxevanis and B.F. Francis Ouellette
(1998), Ch. 7, 11;
www.ncbi.nlm.nih.gov

Two Major Methods


• Chemical degradation (Maxam and Gilbert, 1977)
– double-stranded DNA
– no primer is needed (i.e. prior sequence information
is not necessary)
– involves toxic chemicals
– hard to automate
• Chain termination method (Sanger et al., 1977)
– single-stranded DNA as template
– based on enzymatic synthesis
– random chain termination by dideoxynucleotides
– relatively easy to automate

Chemical Degradation
Double-stranded DNA
End labeling
Denature into single-stranded DNA; chemical cleavage
Four base-specific cleavage reactions:
– Cleave at C
– Cleave at G
– Cleave at G and A
– Cleave at C and T

Chemical Degradation
Gel lanes: C, C&T, G, G&A
Assume 5’ end-labeling
Sequence read from the bottom of the gel (5’) to the top (3’):
5’-AGTCTGATGCGGCTC-3’
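The lane-reading logic of this slide can be sketched in Python. This is only an illustration of how bands in the four lanes (C, C&T, G, G&A) map to bases, not a model of the cleavage chemistry. The function names `lane_pattern` and `read_ladder` are made up for this example.

```python
# Lanes in which a band appears for each base (lanes as named on the slide).
LANES_FOR_BASE = {
    "C": {"C", "C&T"},
    "T": {"C&T"},
    "G": {"G", "G&A"},
    "A": {"G&A"},
}

def lane_pattern(seq):
    """For each base (5' to 3'), return the set of lanes that show a band."""
    return [LANES_FOR_BASE[base] for base in seq.upper()]

def read_ladder(pattern):
    """Infer the base at each ladder position from the lanes that show a band."""
    bases = []
    for lanes in pattern:
        if "C" in lanes:        # band in both C and C&T lanes
            bases.append("C")
        elif "C&T" in lanes:    # band in the C&T lane only
            bases.append("T")
        elif "G" in lanes:      # band in both G and G&A lanes
            bases.append("G")
        elif "G&A" in lanes:    # band in the G&A lane only
            bases.append("A")
        else:
            raise ValueError("no band at this position")
    return "".join(bases)

sequence = "AGTCTGATGCGGCTC"   # the sequence read from the slide's gel
print(read_ladder(lane_pattern(sequence)))
```

A band that appears only in the combined lane (C&T or G&A) identifies T or A. That is why four reactions are enough to distinguish four bases.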

Chain Termination
Double-stranded DNA
Denature or replicate (in filamentous phage) into single-stranded DNA
Add primer, dNTPs and dideoxynucleotides;
New chain randomly terminated at A, C, G, or T

Chain Termination
• Add primer, dNTPs and dideoxynucleotides;
• New chain randomly terminated at A, C, G, or T
[Figure: newly synthesized strands (5’ to 3’) ending in ddA in the A reaction and in ddC in the C reaction]

Chain Termination
Gel lanes: A, C, G, T
Sequence read from the gel, bottom (5’) to top (3’): 5’-TGCACTTCTAGGACA-3’
Template sequence (complementary strand): 3’-ACGTGAAGATCCTGT-5’
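The relationship between the template, the four ddNTP lanes and the sequence read from the gel can be sketched as follows. This is a minimal illustration with made-up function names. It assumes one band for every possible termination point and measures product lengths from the end of the primer.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def new_strand(template):
    """Strand synthesized on the template, written 5'->3' (reverse complement)."""
    return "".join(COMPLEMENT[base] for base in reversed(template.upper()))

def lanes(template):
    """Map each ddNTP lane to the lengths of its chain-terminated products."""
    result = {"A": [], "C": [], "G": [], "T": []}
    for length, base in enumerate(new_strand(template), start=1):
        result[base].append(length)
    return result

def read_gel(lane_lengths):
    """Read bands from shortest (bottom) to longest (top) to get the new strand."""
    bands = sorted(
        (length, base)
        for base, lengths in lane_lengths.items()
        for length in lengths
    )
    return "".join(base for _, base in bands)

template = "TGTCCTAGAAGTGCA"          # template from the slide, written 5'->3'
print(lanes(template))
print(read_gel(lanes(template)))      # new strand, 5'->3'
```

The sequence read from the gel is that of the newly synthesized strand. The template sequence is its reverse complement.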

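Thermal Cycle Sequencing
• Starting with ds DNA templates;
• PCR with just one primer in the presence of ddNTPs;
• The number of chain-terminated strands increases as more cycles are carried out (linearly, because only one primer is used)
[Figure: chain-terminated strands ending in ddA and ddC accumulating over successive cycles]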

Automated DNA Sequencing
• Use fluorescently-labeled ddNTPs in chain termination reactions;
• Detect by an imaging system
[Figure: strands terminated with fluorescently labeled ddC, ddG, ddT and ddA; example read: AGTGCCACGT]

Pyrosequencing
• Rapid, no separation of products, no ddNTPs
• Addition of a dNTP releases a pyrophosphate molecule, which reacts with sulfurylase to form a flash of chemiluminescence
• dNTPs added one by one
• Unused dNTPs degraded by nucleotidase
Example (strand drawn 5’ to 3’):
– Round 1: +A degraded; +C degraded; +G degraded; +T chemiluminescence; result: T
– Round 2: +A degraded; +C degraded; +G chemiluminescence; result: GT
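The cycle of adding dNTPs one by one can be sketched as a small simulation. This is a simplified illustration. The function name `pyrosequence` is made up, the dispensing order A, C, G, T is taken from the slide's example, and the signal for a run of identical bases is assumed to equal the number of bases incorporated.

```python
from itertools import cycle

def pyrosequence(new_strand, order="ACGT"):
    """Return (dNTP, bases incorporated) for each dispensation.

    A count of 0 means the dNTP was not incorporated and is degraded;
    a count above 0 means pyrophosphate was released (light signal).
    """
    new_strand = new_strand.upper()
    if set(new_strand) - set(order):
        raise ValueError("strand contains bases that are never dispensed")
    flows = []
    position = 0
    for base in cycle(order):
        if position >= len(new_strand):
            break
        incorporated = 0
        while position < len(new_strand) and new_strand[position] == base:
            incorporated += 1
            position += 1
        flows.append((base, incorporated))
    return flows

# Strand being synthesized in the slide's example: T is added first, then G.
for base, signal in pyrosequence("TG"):
    print(f"+{base}", "chemiluminescence" if signal else "degraded")
```

Reading only the dispensations that gave a signal, in order, recovers the order in which bases were added to the growing strand.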

DNA Sequencing by Gene Chips
• A gene chip carries every possible 10-mer oligonucleotide
• Hybridize with target DNA
• Align sequence of oligos that give positive signals
• For 10-mer: 1,048,576 spots can sequence 1 Kb
• For 8-mer: 65,536 spots can sequence 256 bp
Example: overlapping 10-mers that give positive signals, and the assembled sequence
ACTACCGATC
 CTACCGATCC
  TACCGATCCG
   ACCGATCCGA
ACTACCGATCCGA
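The spot counts and the alignment of positive oligos can be checked with a short sketch. The helper names `spots` and `assemble` are made up for this example, and `assemble` assumes that each oligo overlaps exactly one other oligo by k-1 bases, which real hybridization data may not satisfy.

```python
def spots(k):
    """Number of distinct k-mers, i.e. spots needed for a complete k-mer chip."""
    return 4 ** k

def assemble(oligos):
    """Chain k-mers that overlap by k-1 bases into one sequence."""
    k = len(oligos[0])
    by_prefix = {oligo[:k - 1]: oligo for oligo in oligos}
    suffixes = {oligo[1:] for oligo in oligos}
    # The first oligo is the one whose prefix is not the suffix of any other.
    starts = [oligo for oligo in oligos if oligo[:k - 1] not in suffixes]
    if len(starts) != 1:
        raise ValueError("cannot find a unique starting oligo")
    current = starts[0]
    sequence = current
    for _ in range(len(oligos) - 1):      # bounded loop, so repeats cannot hang it
        following = by_prefix.get(current[1:])
        if following is None:
            break
        sequence += following[-1]
        current = following
    return sequence

print(spots(10), spots(8))
positives = ["TACCGATCCG", "ACTACCGATC", "ACCGATCCGA", "CTACCGATCC"]
print(assemble(positives))
```

The readable lengths quoted on the slide (about 1 Kb and 256 bp) equal the square roots of the corresponding spot counts (1,024 and 256).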

Analyzing the Sequenced Genes


• Structure prediction
– Secondary structure of DNA and RNA
– Possible 3-D structure of proteins
• Identity of the encoded gene/gene product
– Prediction of general physical properties (e.g. M.W., pI; may be
important for proteomic analysis)
– Database (e.g. GenBank) search based on sequence homology
• Possible function of the encoded gene product
– Search for signature domains or functional motifs using consensus
patterns (based on statistics)
• Possible location of the encoded gene product
– Prediction of subcellular localization by consensus patterns
• Prediction of evolutionary relationship
– Multiple alignment, clustering, etc.
• Gene prediction from genomic sequences
– Prediction for coding regions and location of introns
– Prediction for promoter regions
• Prediction of regulatory sites
– Prediction of consensus cis-acting regulatory elements
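As an illustration of searching with a consensus pattern, the sketch below scans a protein sequence for the N-glycosylation consensus N-{P}-[ST]-{P} (PROSITE notation) using a regular expression. The pattern is chosen here only as a familiar example and is not taken from the lecture. The test sequence and the function name `find_motif` are made up.

```python
import re

# N-{P}-[ST]-{P}: Asn, any residue except Pro, Ser or Thr, any residue except Pro.
N_GLYCOSYLATION = re.compile(r"(?=N[^P][ST][^P])")   # lookahead finds overlapping hits

def find_motif(protein, pattern=N_GLYCOSYLATION):
    """Return the 1-based start positions of every match of the pattern."""
    return [match.start() + 1 for match in pattern.finditer(protein.upper())]

hypothetical_protein = "MKNGTLANPSQNVSA"   # invented sequence for illustration
print(find_motif(hypothetical_protein))
```

A match to a short consensus pattern only suggests a possible site. Short patterns also occur by chance, which is why such predictions are statistical.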

Homologues
• Paralogues: related genes arising from gene
duplication in the same genome; may
diverge to play different roles.
• Orthologues: homologues in different
species, derived from a common ancestral gene by speciation.

Align and Compare Sequences


IAMABIOSTUDENTTAKINGMARINEANIMALPLANT

IAMABIOSTUDENTTAKINGMOLGENANIMALPLANT

IAMAMBTSTUDENTTAKINGMOLGENMETHODDIVER

IAMAMBTSTUDENTTAKINGMOLBIOMETHODDIVER

Align and Compare Sequences

Gp1:
IAMABIOSTUDENTTAKINGMARINEANIMALPLANT
IAMABIOSTUDENTTAKINGMOLGENANIMALPLANT

Gp2:
IAMAMBTSTUDENTTAKINGMOLGENMETHODDIVER
IAMAMBTSTUDENTTAKINGMOLBIOMETHODDIVER

Consensus
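The grouping of the four example strings can be reproduced by counting identical positions. The sketch below assumes the strings are already aligned without gaps, which is true for this example but not for real sequences. The function names are made up.

```python
from itertools import combinations

sequences = {
    "seq1": "IAMABIOSTUDENTTAKINGMARINEANIMALPLANT",
    "seq2": "IAMABIOSTUDENTTAKINGMOLGENANIMALPLANT",
    "seq3": "IAMAMBTSTUDENTTAKINGMOLGENMETHODDIVER",
    "seq4": "IAMAMBTSTUDENTTAKINGMOLBIOMETHODDIVER",
}

def identity(a, b):
    """Number of positions at which two equal-length strings agree."""
    return sum(x == y for x, y in zip(a, b))

def conserved(aligned):
    """Keep a letter where every sequence agrees; show '-' elsewhere."""
    return "".join(
        column[0] if len(set(column)) == 1 else "-"
        for column in zip(*aligned)
    )

for (name_a, a), (name_b, b) in combinations(sequences.items(), 2):
    print(name_a, name_b, identity(a, b))

print(conserved(list(sequences.values())))
```

The highest pairwise identities fall within each group (seq1 with seq2, seq3 with seq4). The conserved string shows the region shared by all four.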

BLAST Search
• www.ncbi.nlm.nih.gov/
• Basic Local Alignment Search Tool
• Uses a heuristic algorithm that seeks local
(instead of global) alignments; able to detect
relationships among sequences that share
similarity only in isolated regions

Sample DNA Sequence
1 aagcgaacgt tacagagcta tacaagaaat tatatggaga ttctgttttc tatttccatt
61 cttgctcttc tcttttccgg agtagtagct gctccaagcg acgatgatgt tttcgaagag
121 gttagggttg gattggtggt tgacttgagt tctattcaag gcaagattct ggaaacttct
181 tttaacttag cgctttcaga tttctatggc atcaacaatg gataccgaac cagagtctct
241 gttttggtca gagactccca aggagacccg atcattgctc ttgccgccgc tactgatctt
301 ctcaaaaatg caaaagcgga agccattgtt ggtgcacaat cattacaaga ggcaaagctt
361 ttggcgacga ttagcgaaaa agctaaagtt ccggtcatat ctactttctt gccaaacacg
421 ttatctttga agaaatacga taactttatt caatggacgc atgatactac atcagaggct
481 aagggaatta caagtctcat acaagatttc agttgtaaat cggttgtggt tatatacgag
541 gatgctgatg attggagtga gagtttgcaa atattggttg agaattttca agataaagga
601 atctatatcg ctcgttctgc ttcttttgca gtctcatcat caggagaaaa tcatatgatg
661 aatcagctaa ggaagcttaa ggtctcaaga gcatcggttt ttgtggtgca tatgtccgag
721 attcttgttt ctcgtctctt ccaatgtgta gagaagttag gtttgatgga agaagcgttc
781 gcttggatcc tcactgcaag aaccatgaac tacttggaac attttgcaat aactaggtcg
841 atgcaagggg tcattggttt caaatcttac atccctgtat ctgaagaagt taagaatttt
901 acttcaagat tgaggaaacg tatgggagat gatacagaaa cagagcattc tagtgtaatc
961 atcggtttac gcgcacacga tatcgcttgt attctagcaa atgcagtaga gaagttcagt
1021 gtaagtggta aagttgaagc atcttcgaat gtatcagctg atcttctgga tacaattaga
1081 catagtagat tcaagggttt gagtggtgac atccaaatct ctgacaacaa atttatctca
1141 gagacatttg aaatcgtgaa tattggaaga gaaaaacaga gaaggatagg attatggagt
1201 ggtggtagtt ttagccaaag aagacagatt gtttggcctg gcaggtctcg taagatccca
1261 agacaccgtg ttttggcaga gaaaggtgaa aagaaggtgc ttagggtctt agttaccgca
1321 ggaaacaagg tcccgcatct agtgtcggtg cgtcctgatc ctgaaacagg tgttaatact
1381 gtctctggat tctgcgtaga ggttttcaag acttgcattg ctccttttaa ctacgagctt
1441 gaattcatac cttaccgtgg aaacaatgac aatcttgctt atctactttc tactcagaga
1501 gacaagtatg atgcagcagt tggtgatatc accatcactt ccaacagatc tttgtatgtt
1561 gattttactt tgccgtacac tgacattggt attggaatcc tgacagtaaa aaagaaaagc
1621 caagggatgt ggactttctt tgatcctttt gaaaaatcct tgtggctagc gagtggagct
1681 ttcttcgtct tgaccgggat tgttgtttgg ttggttgaac ggcccgttaa tccggagttt
1741 caaggctctt ggggacaaca acttagtatg atgctctggt ttggattctc taccattgtg
1801 tttgctcaca gggagaagct acagaaaatg tcatcaagat tcttagtcat agtttgggtt
1861 tttgtggtgt taatattgac ttcaagttac agcgcaaact tgacatcaac caagaccatt
1921 tctcgcatgc aattaaatca tcagatggtt ttcgggggat ctacgacgtc aatgactgcg
1981 aagctcggat ccattaatgc agttgaggcc tatgcacaac ttttgcgaga tggaactctt
2041 aatcatgtca tcaatgaaat accttatctc agtatcctta tcggaaatta tccgaatgat
2101 ttcgtaatga cagatagagt gactaatacc aatggctttg gctttatgtt ccagaaaggt
2161 tcggatttgg ttcctaaagt atcgcgagaa atcgcgaagc taagatcatt gggaatgttg
2221 aaagacatgg agaaaaaatg gtttcaaaag ctggattcac taaatgtaca ttccaacact
2281 gaggaagttg cctctaccaa cgacgatgat gaggcatcta agcgattcac cttccgtgag
2341 ttgcgcggtt tgttcatcat tgcgggagct gctcatgttc tcgtactagc cctacatctc
2401 tttcatacgc gtcaagaggt atcacgacta tgcaccaaac ttcaaagctt ctataagtaa
2461 aaagtgatcc atcattcata agctctacta tagcaattga tggaggactc ataagtaaca
2521 acaaagtaca cttcgaaaca aatgtcacat gtaatacttg gttttttttc ccgtttaaat
2581 tcacatgtaa taatttaact cacgtaaata ctaaagtgat tcacccaaaa aaaaaaaaaa
2641 a

Sample Protein Sequence


MEILFSISILALLFSGVVAAPSDDDVFEEVRVGLVVDLSSIQGK
ILETSFNLALSDFYGINNGYRTRVSVLVRDSQGDPIIALAAATDLLKNAKAEAIVGAQ
SLQEAKLLATISEKAKVPVISTFLPNTLSLKKYDNFIQWTHDTTSEAKGITSLIQDFS
CKSVVVIYEDADDWSESLQILVENFQDKGIYIARSASFAVSSSGENHMMNQLRKLKVS
RASVFVVHMSEILVSRLFQCVEKLGLMEEAFAWILTARTMNYLEHFAITRSMQGVIGF
KSYIPVSEEVKNFTSRLRKRMGDDTETEHSSVIIGLRAHDIACILANAVEKFSVSGKV
EASSNVSADLLDTIRHSRFKGLSGDIQISDNKFISETFEIVNIGREKQRRIGLWSGGS
FSQRRQIVWPGRSRKIPRHRVLAEKGEKKVLRVLVTAGNKVPHLVSVRPDPETGVNTV
SGFCVEVFKTCIAPFNYELEFIPYRGNNDNLAYLLSTQRDKYDAAVGDITITSNRSLY
VDFTLPYTDIGIGILTVKKKSQGMWTFFDPFEKSLWLASGAFFVLTGIVVWLVERPVN
PEFQGSWGQQLSMMLWFGFSTIVFAHREKLQKMSSRFLVIVWVFVVLILTSSYSANLT
STKTISRMQLNHQMVFGGSTTSMTAKLGSINAVEAYAQLLRDGTLNHVINEIPYLSIL
IGNYPNDFVMTDRVTNTNGFGFMFQKGSDLVPKVSREIAKLRSLGMLKDMEKKWFQKL
DSLNVHSNTEEVASTNDDDEASKRFTFRELRGLFIIAGAAHVLVLALHLFHTRQEVSR
LCTKLQSFYK
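The link between the sample DNA and the sample protein can be illustrated by translating the first line of the DNA sequence with the standard genetic code. This is a minimal sketch. The function name `translate` is made up, and it assumes that translation starts at the first ATG and stops at the first in-frame stop codon, which is not a general gene-prediction method.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(dna):
    """Translate from the first ATG to the first in-frame stop codon."""
    dna = "".join(dna.upper().split())      # drop spaces and line breaks
    start = dna.find("ATG")
    if start < 0:
        return ""
    protein = []
    for i in range(start, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":               # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)

first_line = "aagcgaacgt tacagagcta tacaagaaat tatatggaga ttctgttttc tatttccatt"
print(translate(first_line))
```

The output matches the first residues of the sample protein sequence (MEILFSISI...).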

Go to the NCBI website;
enter the BLAST program

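• blastn (nucleotide query vs. nucleotide database): good for finding high-scoring matches; not suited to comparing distantly related sequences
• blastp (protein query vs. protein database): uses a substitution matrix to find distant relationships; can use SEG to filter low-complexity regions
• blastx (translated nucleotide query vs. protein database): use for new DNA sequences and analysis of ESTs
• tblastn (protein query vs. translated nucleotide database): search for coding regions that are not defined in the database
• tblastx (translated nucleotide query vs. translated nucleotide database): use for analysis of ESTs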
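The choice among the five programs depends on the type of query and the type of database. The lookup below is only a memory aid with made-up names. It does not call BLAST.

```python
# program: (what is compared on the query side, what is compared on the database side)
PROGRAMS = {
    "blastn": ("nucleotide", "nucleotide"),
    "blastp": ("protein", "protein"),
    "blastx": ("translated nucleotide", "protein"),
    "tblastn": ("protein", "translated nucleotide"),
    "tblastx": ("translated nucleotide", "translated nucleotide"),
}

def candidates(query_is_protein, database_is_protein):
    """List the programs that accept this kind of query and database."""
    chosen = []
    for program, (query_side, database_side) in PROGRAMS.items():
        if (query_side == "protein") != query_is_protein:
            continue
        if (database_side == "protein") != database_is_protein:
            continue
        chosen.append(program)
    return chosen

print(candidates(query_is_protein=False, database_is_protein=True))    # DNA query, protein database
print(candidates(query_is_protein=False, database_is_protein=False))   # DNA query, DNA database
```

For a DNA query against a DNA database, both blastn (direct comparison) and tblastx (comparison of the translations) apply.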

Paste your sequence and choose one database

Nucleotide Database

Protein Database

Bit Score
The value S' is derived from the raw alignment
score S in which the statistical properties of the
scoring system used have been taken into
account. Because bit scores have been
normalized with respect to the scoring system,
they can be used to compare alignment scores
from different searches.
E Value
Expectation value. The number of different
alignments with scores equivalent to or better
than S that are expected to occur in a database
search by chance. The lower the E value, the
more significant the score.
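The two definitions can be tied together with the Karlin-Altschul formulas: S' = (λS − ln K) / ln 2 and E = m · n · 2^(−S'), where m is the query length and n the database length. The sketch below applies them. The values of λ and K depend on the scoring system, and the numbers used here are illustrative assumptions only, as are the query and database lengths.

```python
import math

def bit_score(raw_score, lam, k):
    """Normalize a raw alignment score S into a bit score S'."""
    return (lam * raw_score - math.log(k)) / math.log(2)

def e_value(bits, query_length, database_length):
    """Expected number of chance alignments scoring at least this well."""
    return query_length * database_length * 2 ** (-bits)

# Illustrative (assumed) statistical parameters and search-space sizes.
LAMBDA, K = 0.267, 0.041
QUERY_LENGTH, DATABASE_LENGTH = 300, 1_000_000

for raw in (50, 100, 200):
    bits = bit_score(raw, LAMBDA, K)
    print(raw, round(bits, 1), e_value(bits, QUERY_LENGTH, DATABASE_LENGTH))
```

Because λ and K are absorbed into S', the E value depends only on the bit score and the size of the search space. This is why bit scores from different searches can be compared.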

Perform blastp search using predicted protein sequence

CDD Search
Compares protein sequences to the Conserved
Domain Database. The CDD is a database
containing a collection of functional and/or
structural domains derived from two popular
collections, SMART and Pfam, plus contributions
from colleagues at NCBI.
Matrix
A key element in evaluating the quality of a pairwise
sequence alignment is the "substitution matrix",
which assigns a score for aligning any possible pair
of residues.
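How a substitution matrix scores an alignment can be shown with a toy example. The small matrix below covers only three residues and its values are illustrative. A real search would use a full 20 x 20 matrix such as BLOSUM62. The function names are made up, and gaps are not handled.

```python
# Toy substitution matrix (illustrative values): identical pairs score highest,
# the chemically similar pair L/I scores positive, dissimilar pairs score negative.
TOY_MATRIX = {
    ("A", "A"): 4, ("L", "L"): 4, ("I", "I"): 4,
    ("L", "I"): 2,
    ("A", "L"): -1, ("A", "I"): -1,
}

def pair_score(x, y, matrix=TOY_MATRIX):
    """Look up a residue pair in either order (the matrix is symmetric)."""
    if (x, y) in matrix:
        return matrix[(x, y)]
    return matrix[(y, x)]

def alignment_score(a, b, matrix=TOY_MATRIX):
    """Sum the pair scores over an ungapped alignment of equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(pair_score(x, y, matrix) for x, y in zip(a, b))

print(alignment_score("ALI", "ALI"))
print(alignment_score("ALI", "AIL"))
print(alignment_score("AAA", "LIL"))
```

Conservative substitutions (L for I) lower the score less than dissimilar ones. This is how a matrix lets a protein search detect distant relationships.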

PSI-BLAST
Position specific iterative BLAST refers to a
feature of BLAST 2.0 in which a profile (or position
specific scoring matrix, PSSM) is constructed
(automatically) from a multiple alignment of the
highest scoring hits in an initial BLAST search.
The PSSM is generated by calculating position-
specific scores for each position in the alignment.
Highly conserved positions receive high scores
and weakly conserved positions receive scores
near zero. The profile is used to perform a second
(etc.) BLAST search and the results of each
"iteration" used to refine the profile. This iterative
searching strategy results in increased sensitivity.

PSSM
Position-specific scoring matrix. Based
on a Profile (A table that lists the
frequencies of each amino acid in each
position of protein sequence. Frequencies
are calculated from multiple alignments of
sequences containing a domain of
interest). The PSSM gives the log-odds
score for finding a particular matching
amino acid in a target sequence.
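Building a PSSM from a multiple alignment can be sketched as follows. This is a simplified illustration, not the procedure PSI-BLAST itself uses. The alignment is invented, the background frequencies are assumed to be uniform (1/20 per amino acid), and a simple pseudocount avoids taking the logarithm of zero.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(alignment, pseudocount=1.0):
    """Return one {amino acid: log-odds score} table per alignment column."""
    background = 1.0 / len(AMINO_ACIDS)        # assumed uniform background
    n_sequences = len(alignment)
    pssm = []
    for column in zip(*alignment):
        counts = Counter(column)
        scores = {}
        for amino_acid in AMINO_ACIDS:
            frequency = (counts[amino_acid] + pseudocount * background) / (
                n_sequences + pseudocount
            )
            scores[amino_acid] = math.log2(frequency / background)
        pssm.append(scores)
    return pssm

def score(pssm, segment):
    """Score a target segment of the same length as the profile."""
    return sum(column[residue] for column, residue in zip(pssm, segment))

hypothetical_alignment = ["MKV", "MRV", "MKL", "MKV"]   # invented example
pssm = build_pssm(hypothetical_alignment)
print(round(pssm[0]["M"], 2), round(pssm[1]["K"], 2), round(pssm[0]["W"], 2))
print(score(pssm, "MKV") > score(pssm, "WWW"))
```

The fully conserved first column gives M the highest score. Residues never seen in a column receive negative scores.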