Digital Analysis of DNA: Synopsis

Chapter 9 171
Chapter 9 Digital Analysis of DNA

Synopsis:
This chapter introduces you to many of the recombinant DNA techniques that have provided a
powerful new approach for studying the mechanisms of inheritance and functions of specific genes.
Restriction enzymes, cloning DNA, making libraries, identifying clones of interest, DNA sequencing
and PCR amplification are now just a part of the toolkit that all biologists (not just geneticists) use.
These techniques will be referred to over and over throughout this textbook (and probably in your other
biology courses as well) so it is worthwhile to get a solid understanding of these techniques from this
chapter.
As you read about the various techniques and apply them to solve problems, try to keep in mind
which techniques are done in solutions in test tubes (restriction enzyme digests, ligating fragments
together, PCR, DNA sequencing, making cDNA) and which techniques involve analyzing or
manipulating DNA in cells (transformations, screening libraries, preparing large amounts of cloned
DNA, total genomic DNA or cellular RNA). This should help your understanding of the techniques and
their uses. Hybridization of nucleic acids is central to many techniques but is often challenging to
understand. The basis of hybridization is complementarity of bases in forming double stranded nucleic
acids. A probe DNA or RNA molecule is used to locate a specific sequence (on a nitrocellulose or
membrane based blot after electrophoresis in a gel, as a clone inside a cell, or in a chromosome squash)
based on hybridization. A probe contains a recognizable radioactive or fluorescent tag that makes it
possible to identify the place where the probe found a complementary sequence.
Significant Elements:
After reading the chapter and thinking about the concepts, you should be able to:
Describe the essential steps in cloning.
Describe the basic components and uses of different types of cloning vectors.
Make a map of restriction enzyme sites.
Read and interpret DNA sequencing gels (Feature Figure 9.13) and automated DNA sequencing
results (Figure 9.14).
Design PCR primers.
Determine which technique(s) you must use to achieve a desired goal. There is often more than one
way to reach a goal. However, there is usually one most efficient, preferred way to solve a
problem.
The technique used determines what is being examined and limits the interpretation of the data. For
instance, probing a genomic library will give you a clone that is homologous to the probe, but this
clone probably won't be transcribed and translated in E. coli. Probing a cDNA library will give you
a clone which can be translated and transcribed in E. coli.
Problem Solving Tips:
Essential Steps in Cloning:
Cloning is basically a straightforward process that has lots of options and variations that can be used
depending on what is desired. Basic components are insert DNA and vector. There are relatively few
sources for the insert DNAs. However, there are many, many types of vectors that have been developed
for various purposes.
Types of insert DNA
cDNAs contain only the regions of genes that are present in processed (spliced) transcripts
synthesized in the cell from which they were isolated (Figure 9.8).
genomic DNAs are digested fragments of the genomic DNA of an organism, and so contain all of
the DNA (genes and non-coding regions) from the cells.
Basic vector criteria
vectors must have an origin of replication so they can be replicated in the host organism, usually E.
coli.
vectors must have a selectable marker(s) so you can determine that they are present in the host
organism; the selectable marker is often an antibiotic resistance.
vectors also often have multiple cloning sites with known restriction sites and ways to detect the
presence of an insert DNA after cloning. One example of an insert detection system is the β-
galactosidase / X-gal detection system. Insertion of a fragment into the middle of the lacZ gene
inactivates the gene. Cells carrying an insert within the lacZ gene are unable to cleave a lactose-
like substrate (X-gal) and are phenotypically Lac-. They are recognized as white colonies while
colonies that received intact copies of the vector (no insert interrupting the lacZ gene) can cleave
the substrate, turning the cells blue.
Types of vectors/purpose of cloning (Table 9.2)
plasmid vectors accept small pieces of insert DNA (10 kb or less). Plasmid vectors may be used to
amplify large amounts of specific DNA sequences. Specialized plasmid vectors called expression
vectors allow transcription and translation of cloned genes; must be used with cDNA inserts
(Genetics and Society, Recombinant DNA Technology and Pest-resistant Crops, Figure A). Use
your knowledge of the requirements for transcription and translation when considering if genes
cloned into expression vectors will be expressed in the host cell.
BAC vectors (bacterial artificial chromosomes) accept very large inserts of 300 kb.
Cloning
after restriction enzyme digestion, mix insert and vector DNAs and ligate together. Sticky ends that
have complementary overhanging single-stranded bases can be ligated. It may be helpful to draw out
the 5' and 3' ends generated (including the individual bases of the recognition site) when a double
stranded DNA is cut by a restriction enzyme (Figure 9.2).
transform the ligation mix into the host cells, usually E. coli.
select for presence of vector (may also be able to isolate those vectors that you know have an
insert).
grow up a large amount of the clone(s).
Identifying the desired clone
often you must identify a particular desired clone from a large variety of different inserts; this
usually involves probing, or hybridization with a labeled DNA.
Other Techniques
gel electrophoresis separates DNA fragments according to their size (Feature Figure 9.4).
blotting is the process of transferring the material in the gel to a nitrocellulose filter or a nylon
membrane and covalently binding the material from the gel to the filter or membrane. A Southern
blot has DNA on the membrane (a genomic Southern has genomic DNA), a Northern blot has
mRNA on the membrane and a Western blot has protein on the membrane (Feature Figure 9.11).
Restriction mapping is part science and part art, like putting together a jigsaw puzzle. Use a pencil
and an eraser. Be patient. The first step is usually ascertaining if you began with a linear or a
circular piece of DNA. This can usually be inferred from context - a plasmid clone is circular, for
instance. Begin the map by examining a single digestion lane on the gel and determining the total
size of the DNA (the sum of all the fragments) and the number of restriction sites for that enzyme
(2 fragments when you digested a circular piece of DNA means there were 2 restriction sites; 2
fragments when you digested a linear piece of DNA means there was only 1 restriction site). Next,
look at the double digestion lane. Determine which bands from the single digestion are left
undigested in the double enzyme digestion. The fragments from the single enzyme digestion that
disappear in the double digestion must have a restriction site for the second enzyme within them.
Figure out which smaller fragments they have been broken into, then begin mixing and matching
various combinations of bands until you find one that gives you an order that will give the correct
pattern of bands when you digest the DNA with the second restriction enzyme alone (see problems
9-5 and 9-6). Make sure the final sites you put on a map are consistent with results from all
digests.
DNA sequencing provides the ultimate description of a cloned fragment of DNA. Make sure you
can explain the Sanger sequencing method (dideoxy sequencing) to a friend (Feature Figure 9.13).
PCR rapidly purifies and amplifies a single DNA fragment from a complex mixture (Feature
Figure 9.12). In order to do PCR you must know something about the DNA sequence of 2 short
stretches of the DNA to be amplified. The DNA fragment to be amplified is defined by a pair of
oligonucleotide primers that are each complementary to one of the strands of the DNA template.
These primers are extended at their 3' ends. The size of the final product of the PCR reaction is
determined by the distance between the 5' ends of the primer pair.
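The fragment-counting rule from the restriction mapping tip (n sites give n fragments on a circle but n + 1 on a linear molecule) can be checked with a short sketch. The function name, the 9 kb length and the site positions are hypothetical values chosen only for illustration; they are not taken from the chapter.

```python
def fragment_sizes(total_length, sites, circular):
    """Return the fragment lengths produced by cutting at every site.

    total_length and sites share one unit (kb here); sites are positions
    measured from an arbitrary origin and must lie strictly inside the molecule.
    """
    cuts = sorted(sites)
    if not cuts:
        return [total_length]          # uncut molecule
    if circular:
        # n sites on a circle give n fragments; the last one wraps past the origin.
        wrapped = cuts[1:] + [cuts[0] + total_length]
        return [b - a for a, b in zip(cuts, wrapped)]
    # n sites on a linear molecule give n + 1 fragments.
    bounds = [0] + cuts + [total_length]
    return [b - a for a, b in zip(bounds, bounds[1:])]


# Hypothetical 9 kb molecule with two sites 3 kb apart.
print(fragment_sizes(9, [2, 5], circular=True))   # two fragments
print(fragment_sizes(9, [2, 5], circular=False))  # three fragments
```

Whatever the topology, the fragment sizes from one digest should add up to the total length, which is a quick consistency check on any map you draw.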
Solutions to Problems:
Vocabulary
9-1. a. 10; b. 1; c. 9; d. 7; e. 6; f. 2; g. 8; h. 3; i. 5; j. 4
Section 9.1 Sequence-Specific DNA Fragmentation
9-2.
a. Sau3A recognition sites are 4 bases long and are expected to occur randomly every 4^4 or 256
bases. The human genome contains about 3 x 10^9 bases, so one would expect 3 x 10^9/256 = 1.2 x 10^7,
or ~12,000,000 fragments.
b. BamHI recognition sites are 6 bases long and would be expected every 4^6 or 4096 bases.
3 x 10^9/4,100 = 7.3 x 10^5, so ~700,000 fragments are expected.
c. The SfiI recognition site is 8 specific bases. The N indicates that any of the four bases is possible
at that site and therefore does not enter into the calculations. Recognition sites would be expected
every 4^8 or 65,536 bases; 3 x 10^9/65,500 = 4.6 x 10^4, so ~46,000 fragments are expected.
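The three estimates in 9-2 follow one formula, so they can be reproduced with a few lines. This is a minimal sketch that assumes, as the solution does, a random genome with equal base frequencies; the function and constant names are made up for illustration.

```python
GENOME_SIZE = 3e9  # approximate bases in a haploid human genome, as used in 9-2


def expected_fragments(specific_bases, genome_size=GENOME_SIZE):
    """Return (average spacing between sites, expected number of fragments).

    specific_bases counts only positions that must be one particular base;
    an N position matches anything and is left out.
    """
    spacing = 4 ** specific_bases
    return spacing, genome_size / spacing


for enzyme, n in [("Sau3A", 4), ("BamHI", 6), ("SfiI", 8)]:
    spacing, fragments = expected_fragments(n)
    print(f"{enzyme}: one site every {spacing} bases, about {fragments:.1e} fragments")
```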
9-3. See Feature Figure 9.4 and the section in the chapter 'Gel electrophoresis distinguishes DNA
fragments according to size'. The rate at which a piece of DNA moves through a gel is dependent on the
strength of the electric field, the gel composition, the charge density and the physical size of the
molecule. When electrophoresing DNA the only variable is the size of the molecule - all the rest of the
variables are the same for each molecule. Longer DNA molecules take up more volume and
therefore bump into the gel matrix, slowing down the molecule's movement. Shorter molecules can
easily slip through many pore sizes in the gel matrix.
9-4. When you digest a circular DNA, one fragment indicates that the DNA has 1 restriction site for the
enzyme. Thus, BamHI and EcoRI each cut the plasmid once. The double digest gives information about
the relative positions of these two sites. The 2 restriction sites are at two different positions on the
plasmid. The EcoRI site is 3 kb away from the BamHI site and it is 6 kb around the rest of the circle
back to the EcoRI site.
9-5.
a. Remember the Problem Solving Tips at the beginning of this chapter! If there is one restriction site
then digesting a circular molecule results in one fragment, while digesting a linear molecule
generates two fragments. Digestion of a circular molecule will always result in one fewer
restriction fragments than the digest of a linear molecule. Sample A is therefore the circular form
of the bacteriophage DNA.
b. The length of the linear molecule is determined by adding the lengths of the fragments from one
digest. 5.0 + 3.0 + 2.0 kb = 10.0 kb. (This size is not realistic - λ DNA is, in fact, about 50 kb in
length.)
c. The circular form is the same length - 10.0 kb.
d. Comparison of the circular and linear maps gives you information on which fragments contain the
ends of the linear molecule. The 5.0 kb EcoRI fragment is present in the circular but not the linear
digest so the 4.0 and 1.0 kb fragments must be joined in the circular map while they are at either
end of the linear molecule. Begin drawing a picture of the molecule for yourself at this point. The
same logic applies to the 2.7 kb BamHI fragment - it is present in the circular but not the linear
digest so the 2.2 kb and 0.5 kb pieces must be at the ends of the linear molecule. If the 0.5 kb
BamHI fragment was at the end where the EcoRI 1.0 kb fragment is, the 1.0 kb EcoRI fragment
would have been cut by BamHI in the double digest. However, the 1.0 kb fragment is still in the
double digest, so the 0.5 kb fragment must be within the 4.0 kb EcoRI fragment. The remaining
EcoRI site is placed based on the double digests. The 2.0 kb EcoRI fragment is not cut by BamHI
but the 3.0 kb fragment is, so place the site within the 3.0 kb fragment. Now double check that all the
BamHI+EcoRI fragment sizes are as seen in the different double digests.
9-6. Plasmids are circular pieces of DNA, thus the EcoRI and SalI digests indicate that there is one site
for each of these enzymes. HindIII, in contrast, cuts the molecule at three sites. Draw a circle showing
the three HindIII sites. In the SalI+HindIII digest the 4.0 kb HindIII fragment is cut into 2.5 and 1.5 kb
fragments. The SalI site is therefore 1.5 kb from one end or the other in the 4.0 kb HindIII fragment.
Similarly the EcoRI+HindIII double digest splits the 1.0 kb HindIII fragment into 0.6 and 0.4 kb
fragments, but the orientation of the EcoRI site within the 1.0 kb HindIII fragment is ambiguous. Try
placing the EcoRI site in the two different positions in the 1.0 kb HindIII fragment. In each case see how
this fits with the EcoRI+SalI digestion results. The orientation that works places the 0.4 kb HindIII-EcoRI
fragment adjacent to the 2.5 kb SalI-HindIII fragment.
Section 9.2 Cloning Fragments of DNA
9-7. Selectable markers in vectors provide a means of determining which cells in the transformation
mix take up the vector. These markers are often drug resistance genes so a drug can be added to the
media and only those cells that have received and maintained the vector will grow.
9-8. The study of genes often involves studying mutations in the genes and the phenotypes (or diseases)
associated with these mutations. If you are interested in studying mutations and diseases then you want
to focus on the protein-coding part of the genes. Eukaryotic genes are often very large. However, the
majority of this DNA consists of intronic sequences which do not end up in the mRNA. For example,
the human dystrophin gene is 2,500 kb (2.5 Mb, see Figure 8-5). The gene has more than
80 introns which are spliced out to give an mRNA that is 14 kb long. Therefore 2,486 kb of the
dystrophin gene is introns! Thus, most of the DNA in eukaryotic genomic libraries does not code for
proteins. It can be difficult to figure out which sequences of the genomic DNA are actually part of the
mRNA so it can be difficult to figure out which gene sequences are important to the protein and which
are unimportant. cDNA libraries, which are made from the mRNAs, allow you to ignore all of these
intronic sequences. All eukaryotic mRNAs have polyA tails at their 3' end and this is used to make
cDNAs. The process begins by isolating mRNAs from an organism or a tissue in an organism and then
using polyT primer with reverse transcriptase (Figure 9.8).
In prokaryotes most of the DNA in the genome codes for mRNA - there is very little non-
transcribed DNA. Prokaryotes also lack introns, so without processing the transcript is the same thing
as the mRNA. In general the 5' and 3' UTRs are small, so most of the mRNA consists of coding
sequences. It would also be difficult to make cDNA libraries in prokaryotes because there is no
polyA tail nor any other common sequence between all mRNAs.
9-9. First, work through the digestion and ligation of the DNA fragments and the vector. The vector is
cut with BamHI, leaving the following ends:
5' ...G        GATCC...
3' ...CCTAG        G...
The insert DNA is cut with MboI, leaving the following sticky ends:
5' ...        GATC...
3' ...CTAG        ...
The ligation of an MboI fragment to a BamHI sticky end will only occasionally create a sequence that
can be digested by BamHI. It depends on the exact base sequence at the ends of the MboI fragment.
The 'X' in the sequence below indicates this ambiguity. In all cases the following sequence will be
found. The sticky-end sequences from the inserted MboI fragment are shown in lowercase.
5' ...GgatcX........XGATCC...
3' ...CCTAGX........XctagG...
a. 100% of the junctions can be digested with MboI.
b. A junction that can be digested with BamHI must have a C at the 3' end of the MboI recognition
sequence. This would occur 1/4 or 25% of the time.
c. None of the junctions will be cleavable by XorII.
d. The first five bases fit the recognition site for EcoRII. The final position must be a pyrimidine (C
or T). There is a 1/2 chance that the junction will contain an EcoRII site.
e. For the restriction site to be a BamHI site in the human genome it must have had a G at the 5' end.
This G was in the vector sequence in the clones created. The chance that the 5' end was NOT a
G = 3/4.
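The fractions in parts a and b of 9-9 can be confirmed by listing the four possible bases at the ambiguous position. This is only a sketch: it assumes each base is equally likely at X and looks at the top strand of the left-hand junction (vector G, then the insert's GATC, then X). The helper names are hypothetical.

```python
BASES = "ACGT"


def left_junction(x):
    """Top strand at the vector/insert junction: vector G + insert GATC + unknown base x."""
    return "G" + "GATC" + x


def fraction_cut(site):
    """Fraction of the four equally likely junctions that contain the given site."""
    return sum(site in left_junction(x) for x in BASES) / len(BASES)


print(fraction_cut("GATC"))    # MboI site: present whatever X is
print(fraction_cut("GGATCC"))  # BamHI site: regenerated only when X is C
```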
9-10.
a. The genomic library is based on the most inclusive and complex starting material, so it would
consist of the greatest number of different clones.
b. All of these libraries would overlap each other to some extent. The genomic library contains all
the DNA sequences, while the other libraries are made up of subsets of the genomic sequences. All
cells express a common subset of genes (housekeeping genes). These genes would result in some
overlap of clones, although the cDNA libraries will each contain some unique sequences. Although
introns often have repeated DNA, the transcribed and translated portions of sequences are usually
unique, so the library of unique genomic sequences will overlap with the cDNA libraries as well.
c. The genomic libraries are created from the total chromosomal DNA inside the cell. The repetitive
sequences in the genomic DNA would have to be removed to create the unique DNA library. The
cDNA libraries are created from the mRNA present in the cells and therefore represent the
expressed genes in these cells. Since genomic DNA libraries are created from all of the DNA in the
cell, genomic DNA libraries from either the liver or brain should be identical. However, cDNA
libraries from the liver and the brain should have some clones that are identical between them, but
they should also have clones that are entirely unique to each one, as well as clones that are derived
from the same genes but represent splice variants.
9-11.
a. You need 4-5 genome equivalents to reach a 95% confidence level that you will find a particular
unique DNA sequence.
b. The number of clones needed depends on the total size of the genome of your research organism
and the average insert size in the vector. BAC inserts can be 500 kb while plasmid vectors normally
have inserts smaller than 15 kb. Divide the number of base pairs in the genome by the average
insert size then multiply by five to get the number of clones in five genome equivalents.
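The rule in 9-11b is a one-line calculation. The sketch below applies it to a 3 x 10^9 bp genome (the human figure used in 9-2) with the two insert sizes quoted above; the function name is hypothetical, and rounding up to a whole clone is an added assumption.

```python
import math


def clones_needed(genome_bp, insert_bp, equivalents=5):
    """Clones in the requested number of genome equivalents, rounded up."""
    return math.ceil(equivalents * genome_bp / insert_bp)


HUMAN_GENOME_BP = 3_000_000_000

print(clones_needed(HUMAN_GENOME_BP, 500_000))  # BAC library, 500 kb inserts
print(clones_needed(HUMAN_GENOME_BP, 15_000))   # plasmid library, 15 kb inserts
```

The comparison shows why large-insert vectors are attractive for big genomes: the same coverage takes far fewer clones.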
9-12.
a. An intact copy of the whole gene would be on a fragment larger than 140 kbp and would therefore
have to be cloned into a BAC vector.
b The entire coding sequence of 9=C7 $bp could be cloned into a :A5 !lasmi# )e"t%r 'K1>$bp(
'=91A> $b inserts( as a cDNA copy of the gene
c. Exons are usually small enough to clone into a plasmid vector (<15 kbp inserts).
9-13. When the vector (pWR590) is digested with EcoRI you get one 2.4 kb fragment. When the
vector is digested with MboI there are 3 fragments - 0.3, 0.5 and 1.6 kb. The somatostatin insert
was cloned into the vector at the EcoRI site. There is also an EcoRI site very near one end of the insert
DNA. Therefore, after digestion of the recombinant plasmid with EcoRI, a small EcoRI insert
fragment of 49 bp and the vector fragment of 2.4 kb will be generated. Next, consider the MboI
restriction pattern. The insert fragment contains an MboI site 5 bp from one end. The insert fragment
could ligate into the vector in either of 2 possible orientations. In one orientation the MboI site in the
insert is nearest the 700 bp MboI vector fragment, so digestion with MboI produces 705, 300, 500
and 944 (formed from the 900 bp vector fragment + the rest of the insert) bp fragments. In the
other orientation, the MboI digest produces 905, 500, 300 and 744 bp fragments.
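The two MboI patterns in 9-13 come from adding each part of the insert to the vector arm it abuts. The sketch below redoes that arithmetic under the assumptions implied by the solution: the EcoRI cloning site splits the 1.6 kb MboI vector fragment into 700 bp and 900 bp arms, and the insert is 49 bp with its MboI site 5 bp from one end. The function and variable names are made up.

```python
INSERT_BP = 49
ARM_700, ARM_900 = 700, 900       # the 1.6 kb MboI vector fragment, split at the EcoRI site
OTHER_VECTOR_FRAGMENTS = [300, 500]


def mbo_fragments(site_to_700_arm):
    """MboI fragment sizes (bp) of the recombinant plasmid.

    site_to_700_arm is the distance from the insert's MboI site to the
    insert end that is joined to the 700 bp vector arm.
    """
    left = ARM_700 + site_to_700_arm
    right = ARM_900 + (INSERT_BP - site_to_700_arm)
    return sorted([left, right] + OTHER_VECTOR_FRAGMENTS)


print(mbo_fragments(5))               # MboI site nearest the 700 bp arm
print(mbo_fragments(INSERT_BP - 5))   # insert in the opposite orientation
```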
9-14. Draw the recombinant plasmid to help you determine the fragment sizes before sketching the gel.
9-15.
a. The goal of a ligation is to generate clones which have attached one piece of frog DNA to one
vector molecule. A ligation mixture consists of linear double stranded vector DNA with
complementary EcoRI sticky ends (Figure 9.2b and Figure 9.6) at both ends and linear double
stranded frog DNA with complementary EcoRI sticky ends at both ends. Ligase simply attaches a
3'OH (hydroxyl) group to a 5'P (phosphate). There are three different products that will occur in a
ligation mix. (i) The desired ligation is vector/frog (intermolecular ligation). (ii) Ligase will also
join vector/vector (intramolecular ligation which yields reconstituted vector molecules with no
inserts) and (iii) frog/frog (intramolecular ligation, giving chains of insert DNA with no vector). In
order to encourage the desired result you add more vector than insert - the vector DNA is easier to
come by. This decreases the likelihood of chains of the insert DNA and increases the probability
that any vector molecule that is ligated to an insert is only ligated to one insert molecule. However,
adding more vector increases the likelihood of reconstituted vector with NO inserts. To decrease the
amount of reconstituted vector you treat the linear, digested vector with alkaline phosphatase.
Alkaline phosphatase removes the 5'-phosphate groups on the linear DNA molecule - see *
below. Remember that this represents the digested vector, so the DNA strands are contiguous
except for the boxed area. This continuity is represented by the dashes at the ends of the lines.
The boxed area represents the sticky ends created by EcoRI.
top strand:     ---3'OH      5'P*---
bottom strand:  ---5'P*      3'OH---
After the treatment with alkaline phosphatase, ligase cannot join a hydroxyl group to the de-
phosphorylated 5' ends. Therefore the 2 ends of the vector cannot be ligated to each other and
this treated molecule will remain linear. If insert DNA is added then the ligase will join the 3'OH on
the vector with the 5'P on the insert. In effect this will ligate the left end of the top strand of the
vector shown above to the insert. The left end of the bottom strand cannot be ligated to the insert,
leaving a nick in the bottom strand at this point. On the right end the bottom strand ligates to the
insert and the top strand at the right end cannot ligate, leaving another nick. The ligation mix is
then transformed into Escherichia coli. These nicks in the phosphate backbone of the cloned DNA
are repaired after the ligated DNA enters the cells.
Plasmid vectors are constructed so that they contain the lacZ gene with a restriction site right
in the middle of the gene. If the vector reanneals to itself without inclusion of an insert, the lacZ
gene will remain uninterrupted; if an insert has been cloned into the vector the lacZ gene will be
interrupted. The ligation mix is transformed into E. coli cells such that about one cell out of 1,000
cells takes up a plasmid. The transformed cells are plated on media containing ampicillin. Only the
cells with a plasmid will grow, thus removing the intramolecular ligation products that consist of
inserts. The media also contains X-Gal. This is a substrate for the β-galactosidase protein that is
coded for by the lacZ gene. The β-galactosidase enzyme cleaves X-Gal and produces a molecule
that turns the cell blue. Those cells that took up an intact, re-circularized vector with no insert will
produce β-galactosidase and form blue colonies. The bacterial cells that took up a vector + insert
(clone) will not be able to produce functional β-galactosidase and will form white colonies.
In the ligation with the vector that was not treated with phosphatase, the vector reanneals to
itself at a high frequency, leading to 99/100 blue colonies. The dephosphorylated vector formed
99/100 white colonies, showing that almost all of the vectors had an insert.
b. Yes, the suggestion was a good one. The dephosphorylation of the vector increased the number
of clones (vector + insert) 100 fold.
c. The choice of whether to dephosphorylate the vector versus the insert DNA is based on an
understanding of the mechanics of the bacterial transformation that is carried out after the ligation.
If the vector is dephosphorylated it cannot self-ligate. The insert can self-ligate. The self-ligated
inserts do not have any vector DNA, so they do not have a bacterial origin of replication (ORI) nor
do they have a gene encoding antibiotic resistance. Therefore, these recircularized DNAs will not
allow the transformed bacteria to grow on the selective media. If the insert were
dephosphorylated, it will not self-ligate, but the vector WILL self-ligate. The vector has the
antibiotic resistance gene and ORI, so the "empty" vector will be propagated in E. coli,
generating a high level of "background."
Section 9.3 Hybridization
9-16.
a. (1) 3.1, 6.9 kb; (2) 4.3, 4.0, 1.7 kb; (3) 1.5, 0.6, 1.0, 6.9 kb; (4) 4.3, 2.1, 1.9, 1.7 kb; (5) 3.1, 1.2,
4.0, 1.7 kb
b. The 6.9 kb fragment in the EcoRI+HindIII digest; the 2.1 and 1.9 kb fragments in the
BamHI+PstI digest, and the 4.0 kb fragment in the EcoRI+BamHI digest will hybridize with the 4.0
kb probe.
9-17.
a. The fragment sizes are too large to be resolved appropriately on a polyacrylamide gel,
necessitating electrophoresis on an agarose gel.
b. Digestion of human genomic DNA with these enzymes will result in hundreds of thousands of
fragments. The sizes of these fragments will range from tens of thousands of base pairs to only a
few base pairs in length. Agarose gel electrophoresis is not able to resolve fragments that
differ from each other by a few base pairs and so the digested DNA will appear as a smear.
c. The probe that is used does not hybridize to all of the restriction fragments that are generated by
the different digests.
d. (Drawing not available.)
e. No, an orientation cannot be established from the information given.
[Figure: restriction map with sites labeled E, H and K and intervals labeled 4.0, 1.0, 0.5, 1.5 and 1.0]
9-18. Probes need to be at least 15 nucleotides to effectively anneal to DNA. In this experiment short
probes are desirable, because the longer the probe the greater the degeneracy. Thus, this type of
experiment is usually done with probes between about 15 and 18 nucleotides long. The design of
degenerate probes is based on reverse translation, and there are a few considerations to keep in mind:
(i) if you know the amino acid sequence of the protein in one species then you can make some guesses
about the amino acid sequence of the corresponding protein in the second species. You hope that the
amino acid sequence of a particular, small region of the protein will be identical in the two species.
Since there are 20 different amino acids even one amino acid difference would make it hard to design a
probe. If you knew the sequence of the protein from several bacterial species you could choose a
very highly conserved region on which to base a probe. If the amino acids are identical in several
different species then they might be identical in Beneckea nigripulchritudo. (ii) If you don't know
anything about the amino acid sequence of the protein in other species of bacteria then you would find
a region of 5 or 6 contiguous amino acids with low degeneracy - that is, amino acids that are encoded
by the lowest possible number of codons. The best choices are Met and Trp which are each encoded by
only a single codon. Unfortunately, it is highly unlikely that a region of 5 or 6 amino acids would be
composed solely of Met and Trp. The next best choices are Phe, Tyr, Cys, His, Gln, Asn, Lys, Asp, or
Glu, which are each coded for by 2 codons. The worst choices would be Leu, Arg, and Ser (6 codons).
If you had a 5 amino acid region composed only of these three amino acids, then the number of
different molecules in the degenerate probe would be 6^5 = 7776.
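The degeneracy of a reverse-translated probe is the product of the codon counts of its amino acids, which is easy to tabulate. The sketch below includes only the amino acids named in the solution above; the example peptides are hypothetical and do not come from the problem.

```python
from math import prod

# Number of codons for each amino acid mentioned in 9-18.
CODON_COUNTS = {
    "Met": 1, "Trp": 1,
    "Phe": 2, "Tyr": 2, "Cys": 2, "His": 2, "Gln": 2,
    "Asn": 2, "Lys": 2, "Asp": 2, "Glu": 2,
    "Leu": 6, "Arg": 6, "Ser": 6,
}


def degeneracy(peptide):
    """Number of different oligonucleotides needed to cover every possible coding sequence."""
    return prod(CODON_COUNTS[aa] for aa in peptide)


worst_case = ["Leu", "Arg", "Ser", "Leu", "Arg"]   # hypothetical 5-residue stretch
better = ["Met", "Trp", "Phe", "Tyr", "Cys"]       # hypothetical low-degeneracy stretch

print(degeneracy(worst_case))
print(degeneracy(better))
```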
9-19.
c, j, f (although f could be performed before c and j). These steps must be performed before the rest.
The order for the rest of the steps is d, a, k, l, g, b, e, h.
Section 9.4 PCR
9-20.
a. The human genome sequence shows the sequence of the normal allele of PKU. You wish to know
whether the PKU syndrome in this patient is caused by a mutation in the phenylalanine
hydroxylase gene. You suspect that there might be such a mutation in this particular exon, so you
will sequence the PCR product. If there is a mutation in this 1 kb exon, you want to know exactly
what it is, how it affects the enzyme, and perhaps something about the history of this mutation in
human populations. For example, if you compare the sequence in many patients and track where
the patients are from, you might get an idea of where this mutation arose in time and geographical
space. If you do not find a mutation in this 1 kb exon that changes the amino acid sequence of the
enzyme, there might still be a mutation in a different exon.
b. One haploid human genome contains 3 x 10^9 bp. Therefore (3 x 10^9 bp/haploid genome) x
(6.6 x 10^2 g/mole) x (mole/6.02 x 10^23 bp) = 3.3 x 10^-12 g/haploid genome. In other words, one
haploid genome weighs 3.3 x 10^-12 g or 3.3 picograms. Each haploid genome will contain only one
phenylalanine hydroxylase gene to be used as the template for the PCR reaction. You start the PCR
reaction with 1 ng (1 x 10^-9 g) of human DNA. Therefore (1 x 10^-9 g DNA) x (1 haploid
genome/3.3 x 10^-12 g) x (1 template molecule/1 haploid genome) = 0.3 x 10^3 template molecules
= 300 template molecules in 1 ng of DNA.
c. You begin the PCR with 300 template molecules. If the PCR runs for 25 cycles then this number of
molecules doubles exponentially 25 times. Therefore you will end up with 300 molecules x 2^25 =
10^10 or about 10 billion molecules. This result explains the power of PCR: you started with only
300 template molecules and end up with 10 billion copies of the region you are amplifying. In
practice the yields are not quite as high because not all potential template molecules get amplified
each cycle. However, the amplification is still substantial. The PCR product is 1 kb long, so (10^10
molecules of PCR product) x (10^3 bp/molecule of PCR product) x (mole/6.02 x 10^23 bp) x
(6.6 x 10^2 g/mole) = 1.1 x 10^-8 g = 11.0 ng. You started with 1 ng of the whole genome and ended
up with 11.0 ng of a 1 kb section of the genome after the PCR!
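The unit conversions in 9-20 b and c can be rechecked numerically. This sketch uses the same constants as the solution (6.6 x 10^2 g per mole of base pairs and 6.02 x 10^23 per mole) and assumes perfect doubling in every cycle, which the text notes is optimistic; the variable names are hypothetical.

```python
AVOGADRO = 6.02e23        # molecules per mole
GRAMS_PER_MOLE_BP = 660   # average mass of one mole of base pairs, in grams


def mass_in_grams(base_pairs):
    """Mass of a given total number of base pairs of double stranded DNA."""
    return base_pairs * GRAMS_PER_MOLE_BP / AVOGADRO


genome_mass = mass_in_grams(3e9)             # one haploid genome
templates = 1e-9 / genome_mass               # genome copies in 1 ng of DNA
copies = 300 * 2 ** 25                       # 300 templates after 25 perfect cycles
product_ng = mass_in_grams(copies * 1_000) * 1e9   # 1 kb product, converted to ng

print(f"haploid genome: {genome_mass:.2e} g")
print(f"templates in 1 ng: {templates:.0f}")
print(f"copies after 25 cycles: {copies:.2e}")
print(f"mass of product: {product_ng:.1f} ng")
```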
9-21. Primers have to be 5' to 3' and have the 3' end toward the center so DNA polymerase can extend
into the sequence being amplified. Only set b satisfies these criteria.
9-22.
a. Both of the primers in set b in problem 9-21 are 18 nucleotides long. If (i) human DNA is
assumed to be a random sequence of equal proportions of A, G, C, and T (this is not entirely
accurate, but it is close enough for this discussion), and (ii) no mismatches are allowed between the
primer and the genomic template (again, this is not entirely accurate as seen in parts b and c below,
but again, it is close enough) then the chance that one of the two primers will anneal to a
random region of DNA that is not the targeted CFTR exon would be (1/4)^18, or about 1
chance in 7 x 10^10. In other words, an 18 base sequence will be present once in every 70 billion
nucleotides. Since the human genome is 3 billion nucleotides long it is extremely unlikely that even
one of the primers will anneal anywhere else than the desired target. The probability is much lower
that both of the primers will anneal to other stretches of DNA that happen to be close enough
together to allow the formation of a PCR product. This latter number is hard to calculate exactly
because of the variation in the possible distance between the primers.
b. (i) The lower limit on the size of the primers is governed by two main factors. First, the PCR
amplification must be specific, so the primers should be long enough to guarantee this specificity.
As in part a, the probability of a 16 base sequence in random DNA is (1/4)^16, or 1
chance out of 4 x 10^9. Therefore, two 16 base primers allow a comfortable margin for
specificity. More importantly, the primers must anneal to the genomic DNA to be amplified. As
discussed in Chapter 9, hydrogen bonding between 15 or 16 nucleotides of contiguous base pairs is
required to allow DNA to remain double stranded. (ii) If the primers are too long, several potential
problems arise. First, the longer the primers the more expensive they are to synthesize. Second,
the longer the primers the more likely they are to anneal with each other, or for a single primer
to anneal to itself and form a hairpin loop, and the less likely the primers are to anneal with the
template. Third, and most importantly, if the primer is too long it can hybridize with DNA with
which it is not perfectly matched. Internal mismatches are tolerated and hybridization can occur
as long as there are enough surrounding base paired nucleotides, especially at the 3' end of the
primer. Thus, longer primers might anneal to other regions of the genome than the region you
actually want to amplify.
c. You would be more likely to obtain a PCR product if the mismatch were at the 5'-end. The 3'-end
of a primer is its business end; that is where DNA polymerase adds additional nucleotides to the
chain. Mismatches at the 3'-end would prevent DNA polymerase from adding any new
nucleotides to the chain. (You might remember that some DNA polymerases have a 3'-to-5'
exonuclease that could potentially remove the mismatch, now allowing further polymerization. This
is true of E. coli DNA polymerase, but many of the DNA polymerases used in PCR come from
thermophilic bacteria and these DNA polymerases do not have this exonuclease activity.) A
mismatch at the 5'-end of the primer does not matter as long as there is enough base-pairing
between the primer and genomic template to allow annealing.
9-23.
a. The EcoRI and the SalI restriction sites are both found in the pMore vector sequence shown in the
problem. The EcoRI site is nearer the 5' end and the SalI site is nearer the 3' end of the pMore
sequence shown. This region of pMore is at the C-terminal end of the maltose binding protein
(MBP). Therefore your cloning will insert the CFTR DNA sequence into the DNA sequence that
codes for the C-terminal end of the MBP protein. In other words, the N-terminus of the fusion
protein contains most of the MBP protein sequence. The MBP sequence ends at the 7th amino
acid from the C-terminus of MBP, where the EcoRI site cuts the MBP DNA. The next part of the
fusion protein contains the CFTR protein encoded by the PCR product. Note that the PCR
amplifies the last protein coding exon of the CFTR gene. Therefore the C-terminal end of the
fusion protein will contain the C-terminal end of CFTR. Remember that the N-to-C orientation
of the CFTR protein must be the same as that of the fusion protein as a whole. Further details of
the fusion protein will be discussed in part c below.
b. When you use two different restriction enzymes, the CFTR gene can only be inserted into the
vector with the desired orientation, yielding the fusion protein you described in part a. Thus the
N-to-C orientation of the CFTR protein will be the same as the MBP protein. If the vector was
only cut with EcoRI and the PCR product had EcoRI sites at both ends, then the PCR product
could be inserted into the vector in two equally likely orientations, only one of which is the one you
desire. A second advantage is that cutting with two enzymes minimizes unwanted products of the
ligation in which ends of the same molecule come together (see problem 9-17 a and b).
c. There are many things to take into consideration here. First, you can use the set b PCR primers
you designed in your answer to problem 9-24 in order to amplify the entire CFTR exon. Second,
the CFTR exon does not have sites for EcoRI and SalI, so you need to add nucleotides to the 5'-
ends of the two primers that will contain appropriate sites for the two restriction enzymes. These
sites cannot be exactly at the 5'-ends of the PCR primers; you must also add 5 more nucleotides
beyond the restriction sites to enable the restriction enzymes to bind to their recognition sequences
and digest the DNA. The sequence of these 5 nucleotides is not important. Third, the two parts of
the fusion protein must end up being in frame. Because the PCR product encodes the C terminus of
the fusion protein, there are fewer constraints on the identity of the additional nucleotides added to
the second (backwards) primer. The answer below is just one of many possible solutions. The
sequence of the critical part of the pMore vector is reproduced here. The dots at the left and right
ends of this sequence represent the continuity of the DNA; this was a circular plasmid before the
digestion.
5'...AGGATTTCAGAATTCGGATCCTCTAGAGTCGACCTGTAGGGCAA...3'
3'...TCCTAAAGTCTTAAGCCTAGGAGATCTCAGCTGGACATCCCGTT...5'
The vector is digested with EcoRI and SalI to generate these sticky ends:
   ArgIleSerGluPh
5' AGGATTTCAG                  TCGACCTGTAGGGCAA 3'
3' TCCTAAAGTCTTAA                  GGACATCCCGTT 5'
The PCR product using the set b primers (problem 9-24) is shown below. Remember that this PCR
product contains the last protein coding exon of the CFTR gene. The left hand primer only has one
open reading frame with the amino acid sequence shown below. The right hand primer contains the
DNA sequence coding for the last four amino acids at the C-terminal end of the CFTR protein, as
shown in the problem. The stop codon (STP) is underlined. Therefore the amino acids are
     LeuArgSerGluPheSerGlu...TrpAlaIleMet
5' GGCTAAGATCTGAATTTTCCGAGTTGGGCAATAATGTAGCGC 3'
3' CCGATTCTAGACTTAAAAGGCTCAACCCGTTATTACATCGCG 5'
Now you need to add an EcoRI site to the 5' end of the left primer and a SalI site to the 5' end of
the right primer; the restriction sites are underlined below. These sites cannot be directly at the
ends of the DNA sequence, so you need 5 random nucleotides added to each of the primers.
Furthermore, you must maintain the continuity of the ORF (open reading frame) between the MBP
and the CFTR proteins after the vector and insert are digested and ligated. Therefore two more
nucleotides (note the two G:C pairs, italicized) were added to the left primer between the restriction
site and the beginning of the CFTR ORF. Also, the region between the vector and the insert cannot
have any in-frame stop codons. The PCR product using these primers is:
                 LeuArgSerGluPheSerGlu...TrpAlaIleMet
5' CCCCCGAATTCGGGCTAAGATCTGAATTTTCCGAGTTGGGCAATAATGTAGCGCGTCGACCCCCC 3'
3' GGGGGCTTAAGCCCGATTCTAGACTTAAAAGGCTCAACCCGTTATTACATCGCGCAGCTGGGGGG 5'
Upon digestion of the PCR product with EcoRI and SalI, you will get:
           LeuArgSerGluPheSerGlu...TrpAlaIleMet
5' AATTCGGGCTAAGATCTGAATTTTCCGAGTTGGGCAATAATGTAGCGCG     3'
3'     GCCCGATTCTAGACTTAAAAGGCTCAACCCGTTATTACATCGCGCAGCT 5'
Now you can ligate the vector and the PCR product, yielding:
   ArgIleSerGluPheGlyLeuArgSerGluPheSerGlu... TrpAlaIleMetSTP
5' AGGATTTCAGAATTCGGGCTAAGATCTGAATTTTCCGAG...TTGGGCAATAATGTAGCGCGTCGACCTGTAGGGCAA 3'
3' TCCTAAAGTCTTAAGCCCGATTCTAGACTTAAAAGGCTC...AACCCGTTATTACATCGCGCAGCTGGACATCCCGTT 5'
The Gly (italicized) is the result of the adjustment to the PCR primer to ensure that the N-terminal
part of the CFTR region was in frame with MBP. So in summary, the two PCR primers needed are:
5' CCCCCGAATTCGGGCTAAGATCTGAATTTTC 3' and
3' ACCCGTTATTACATCGCGCAGCTGGGGGG 5'
Again, there are many possible answers that have minor variations, but you must still go through
all of these steps to make sure your PCR primers will work properly.
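The frame check described in part c can be sketched in Python. This is a simplified model, not a cloning tool: it follows the top strand only, cuts each site after its first base (G^AATTC for EcoRI, G^TCGAC for SalI), and uses the sequences exactly as printed in this answer. Because the printed PCR product omits the middle of the exon, only the vector/insert junction is translated. The helper names are illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a coding-strand DNA sequence from its first base ('*' = stop)."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def cut_after_first_base(top_strand: str, site: str):
    """Split a top strand at the first occurrence of `site`, after its first base."""
    i = top_strand.index(site)
    return top_strand[:i + 1], top_strand[i + 1:]


ECORI, SALI = "GAATTC", "GTCGAC"
VECTOR = "AGGATTTCAGAATTCGGATCCTCTAGAGTCGACCTGTAGGGCAA"
# PCR product as printed above; the interior of the exon is not shown in the answer.
PCR_PRODUCT = ("CCCCCGAATTCGGGCTAAGATCTGAATTTTCCGAG"
               "TTGGGCAATAATGTAGCGCGTCGACCCCCC")

vector_left, _ = cut_after_first_base(VECTOR, ECORI)
_, vector_right = cut_after_first_base(VECTOR, SALI)
_, after_ecori = cut_after_first_base(PCR_PRODUCT, ECORI)
insert, _ = cut_after_first_base(after_ecori, SALI)

ligated = vector_left + insert + vector_right
junction = translate(ligated[:39])  # MBP codons read through into the CFTR insert
print(junction)
```

The one-letter string printed for the junction corresponds to Arg-Ile-Ser-Glu-Phe-Gly-Leu-Arg-Ser-Glu-Phe-Ser-Glu, so no stop codon appears between the MBP and CFTR reading frames.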
d. The fusion protein contains almost all of MBP, so it should also bind to the amylose resin. The
cloning described in part b removes only the last 7 amino acids from MBP. Make extracts of
bacterial cells expressing the fusion protein and add these extracts to amylose resin. The fusion
protein should stick to the resin while all the other bacterial proteins in the extract should not. You
can wash the other bacterial proteins away, leaving the fusion protein bound to the resin. To get
the fusion protein off the resin you can add the sugar maltose. Maltose and amylose will
compete for binding sites on the fusion protein. If maltose is in excess, it will "disconnect" the
fusion protein from the resin, leaving a solution with purified fusion protein.
Section 9.5 DNA Sequence Analysis
9-24. In well studied organisms such as C. elegans, D. melanogaster, yeast and mice the entire
DNA sequence of the genomes is now available. All you need to do in order to study any region in
these genomes is to design PCR primers based on the genomic sequence that will amplify the region
of interest. If necessary you can then determine the DNA sequence of the amplified region using
automated methods. You might do this, for example, if you wanted to know if an individual's gene
carried a mutation. These techniques require much less effort on the part of the investigator. Thus
having the genome sequence of an organism increases the importance of PCR.
Restriction mapping is becoming a rarity even when studying unusual organisms; if you have
cloned a gene from your organism you can sequence the DNA. Once you know the DNA sequence you
can automatically find the location of the sites for all known restriction enzymes.
However, you still need to use restriction enzymes to construct libraries and specific recombinant
DNA molecules. Restriction digestions remain the basis for many important applications of DNA
cloning and also for understanding in the next chapter how scientists were actually able to determine
the DNA sequences of entire genomes.
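As a small illustration of finding restriction sites automatically from a known sequence, the Python sketch below scans the pMore region quoted in problem 9-23. The EcoRI and SalI sites are the ones discussed there; the BamHI and XbaI entries are an addition here, using their standard recognition sequences. The function name is illustrative, and real tools consult a full enzyme database rather than a four-entry table.

```python
# Recognition sequences written 5'->3'. All four are palindromic, so scanning
# the top strand alone finds every site.
RECOGNITION_SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "XbaI": "TCTAGA",
    "SalI": "GTCGAC",
}


def find_sites(sequence: str, sites=RECOGNITION_SITES) -> dict:
    """Return, for each enzyme, the 0-based start positions of its sites."""
    sequence = sequence.upper()
    hits = {}
    for enzyme, site in sites.items():
        positions = []
        start = sequence.find(site)
        while start != -1:
            positions.append(start)
            start = sequence.find(site, start + 1)  # allow overlapping sites
        hits[enzyme] = positions
    return hits


PMORE_REGION = "AGGATTTCAGAATTCGGATCCTCTAGAGTCGACCTGTAGGGCAA"
site_map = find_sites(PMORE_REGION)
for enzyme, positions in site_map.items():
    print(enzyme, positions)
```

The positions are offsets from the 5' end of the quoted region, so the EcoRI site comes out nearest the 5' end and the SalI site nearest the 3' end, as stated in problem 9-23 a.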
9-25. Notice how many of these processes require the use of DNA polymerase, underlining why it is so
important to learn how this enzyme works.
a. Enzyme-based: DNA ligase
b. Enzyme-based: restriction enzymes
c. Non-enzymatic: hybridization relies on complementary base pairing
d. Enzyme-based: DNA polymerase
e. Enzyme-based: reverse transcriptase for the first strand of cDNA and DNA polymerase for
the complementary strand
f. Enzyme-based: DNA polymerases from thermophilic bacteria. E. coli DNA polymerase would
not be very effective for PCR because at each cycle, heat is applied to denature the DNA, and this heat
would inactivate the E. coli enzyme. This is not true of DNA polymerases from bacteria that live in
high temperature conditions.
9-26.
a. The newly synthesized strand is read from the gel beginning with the smallest band, which
corresponds to the 5' end of this strand. This newly synthesized strand is complementary to the
template strand. Reading the sequence from the gel:
newly synthesized strand: 5' TAGCTAGGCTAGCCCTTTATCG 3'
template strand:          3' ATCGATCCGATCGGGAAATAGC 5'
b. The sequencing template is the mRNA-like strand, so the sequence of the mRNA is:
5' CGAUAAAGGGCUAGCCUAGCUA 3'.
c. Any mRNA has 3 possible reading frames, which begin at the 5' end with the first nucleotide, the
second nucleotide and the third nucleotide. There are stop codons in each frame (there are no
open reading frames or ORFs), so it is unlikely that this is an exon sequence of a coding region.
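The reasoning in parts a to c can be reproduced with a short Python sketch: take the strand read from the gel, derive the template (mRNA-like) strand as its reverse complement, convert T to U, and list the stop codons in each of the three reading frames. The function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"UAA", "UAG", "UGA"}


def reverse_complement(dna: str) -> str:
    """Return the complementary strand, written 5'->3'."""
    return dna.translate(COMPLEMENT)[::-1]


def stop_positions(mrna: str) -> dict:
    """Map each reading frame (0, 1, 2) to the 0-based starts of its stop codons."""
    return {
        frame: [i for i in range(frame, len(mrna) - 2, 3)
                if mrna[i:i + 3] in STOP_CODONS]
        for frame in range(3)
    }


gel_read = "TAGCTAGGCTAGCCCTTTATCG"       # newly synthesized strand, 5'->3'
template = reverse_complement(gel_read)   # template strand, written 5'->3'
mrna = template.replace("T", "U")         # the template is the mRNA-like strand

print(mrna)
print(stop_positions(mrna))
```

Every frame returns at least one stop position, which is the basis for concluding in part c that this stretch is unlikely to be coding sequence.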
9-27.
a. Synthesis occurs in the 5' to 3' direction, so the smallest fragment would contain the 5' T added to
the primer and the next sized product would incorporate the C.
b. First write out the sequence of both strands and scan each strand for stop codons. The newly
synthesized strand has stop codons in all three frames (underlined) and therefore would not be
the coding (exon) sequence. On the DNA sequencing template strand the reading frame that
starts with the first nucleotide does not contain a stop codon and therefore is the ORF in this
RNA-like strand.
Synthesized strand:      5' TCTAGCCTGAACTAATGC 3'
DNA sequencing template: 3' AGATCGGACTTGATTACG 5'
c. The peptide sequence, which begins with the amino terminal end corresponding to the 5' end of the
mRNA-like DNA sequence (the DNA sequencing template), is N Ala-Leu-Val-Gln-Ala-Arg.
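The frame scan and translation in parts b and c can be sketched in Python. The sketch translates DNA codons directly from the RNA-like strand using the standard genetic code, with one-letter amino acid symbols and `*` for a stop; the helper name is illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str, frame: int = 0) -> str:
    """Translate a 5'->3' DNA strand starting at offset `frame` (0, 1 or 2)."""
    return "".join(CODON_TABLE[dna[i:i + 3]]
                   for i in range(frame, len(dna) - 2, 3))


synthesized = "TCTAGCCTGAACTAATGC"            # 5'->3'
template_3_to_5 = "AGATCGGACTTGATTACG"        # as written in the answer, 3'->5'
template = template_3_to_5[::-1]              # rewritten 5'->3'

for frame in range(3):
    print("synthesized frame", frame + 1, translate(synthesized, frame))
print("template frame 1", translate(template))
```

Each frame of the synthesized strand contains a `*`, while the first frame of the template strand reads A-L-V-Q-A-R, that is Ala-Leu-Val-Gln-Ala-Arg.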
9-28.
a. In Figure 9-4a, you can see that the fragments of DNA get successively larger by adding
nucleotides onto the 3'-end. DNA polymerase synthesizes growing strands in the 5'-to-3' direction.
The trace shows a portion of a synthesized single stranded DNA. The green peak at the left end of
the trace means that there is a fragment of DNA of a specific length (see part c) that was
terminated when a dideoxy-A (ddA) was incorporated into the DNA strand being synthesized. This
terminal ddA, which is linked to a green fluorescent label, therefore becomes the 3' end of this
molecule.
b. 5'...ACCTATTTTACAGGAATT...3'
c. "Residue Position" indicates a peak at a specific location in the scan. Most probably, nucleotide
position 1 corresponds to the first nucleotide at the 5'-end of the newly synthesized fragments. You
should note that all of the fragments will start at their 5'-end with the same short oligonucleotide
primer, since DNA polymerase requires a primer. Thus, nucleotide position 1 is also the 5'-end of
the primer used to generate the nested array of fragments. Therefore the size of the single-
stranded DNA fragment is represented by the residue position.
d. There are two different peaks showing up at the same position. One is a T, the other is a G. The
double peak at position 370 is most likely caused by the fact that the original DNA actually
had two different DNA sequences. This pattern would be seen if the person whose DNA was
amplified was actually a heterozygote, with one chromosome carrying a T-A base pair at this
location while the homologue had a G-C base pair. This is in fact the way that PCR
amplification and DNA sequencing can be used together to look for heterozygosity anywhere in the
genome. Of course, this result could also be due to an error either in DNA sequencing or in PCR
amplification.
Section 9.6 Bioinformatics: Information Technology and Genomes
9-29.
a. It indicates that there are regions of the chromosome where genes are clustered.
b. The largest gene desert is from approximately 57,000,000 to 62,000,000.
c. The centromere corresponds to the largest gene desert.
d. The CFTR gene is on the long arm of the chromosome.
e. The CFTR gene is transcribed in the direction of the green arrow, which is pointing away from the
centromere.
f. There are approximately 24 exons in the CFTR gene. It is an approximation, as the exons are
predicted by computer analysis and not by a comparison to actual protein sequence.
9-30.
The simplest method to try to determine potential proteins in this organism is to compare the sequences
to those of organisms that have also had their genomes sequenced. Those sequences that are most highly
conserved would be expected to be open reading frames from genes. To determine alternative splicing
in various tissues, the cDNA sequences from those tissues can be compared to each other and to the
genomic sequences.
