
PNAS PLUS

Design and characterization of a nanopore-coupled polymerase for single-molecule DNA sequencing by synthesis on an electrode array
P. Benjamin Strangesa,1, Mirk Pallaa,b,1, Sergey Kalachikovc, Jeff Nivalaa, Michael Dorwartd, Andrew Transd,
Shiv Kumarc, Mintu Porelc, Minchen Chienc, Chuanjuan Taoc, Irina Morozovac, Zengmin Lic, Shundi Shic, Aman Aberrae,
Cleoma Arnoldd, Alexander Yangd, Anne Aguirred, Eric T. Haradad, Daniel Korenblumd, James Pollardd, Ashwini Bhatd,
Dmitriy Gremyachinskiyd, Arek Bibillod, Roger Chend, Randy Davisd, James J. Russoc, Carl W. Fullerc,d, Stefan Roeverd,
Jingyue Juc,f, and George M. Churcha,b,d,2
aDepartment of Genetics, Harvard Medical School, Boston, MA 02115; bWyss Institute for Biologically Inspired Engineering at Harvard University, Boston,
MA 02115; cCenter for Genome Technology and Biomolecular Engineering, Department of Chemical Engineering, Columbia University, New York, NY
10027; dGenia Technologies, Santa Clara, CA 95050; eDepartment of Biomedical Engineering, Arizona State University, Tempe, AZ 85281; and fColumbia
University College of Physicians and Surgeons, New York, NY 10032

Scalable, high-throughput DNA sequencing is a prerequisite for
precision medicine and biomedical research. Recently, we presented
a nanopore-based sequencing-by-synthesis (Nanopore-SBS) approach, which used a set of nucleotides with polymer tags that allow
discrimination of the nucleotides in a biological nanopore. Here, we
designed and covalently coupled a DNA polymerase to an α-hemolysin
(αHL) heptamer using the SpyCatcher/SpyTag conjugation approach. These porin-polymerase conjugates were inserted into
lipid bilayers on a complementary metal oxide semiconductor
(CMOS)-based electrode array for high-throughput electrical recording of DNA synthesis. The designed nanopore construct successfully
detected the capture of tagged nucleotides complementary to a DNA
base on a provided template. We measured over 200 tagged-nucleotide
signals for each of the four bases and developed a classification
method to uniquely distinguish them from each other and background signals. The probability of falsely identifying a background
event as a true capture event was less than 1.2%. In the presence
of all four tagged nucleotides, we observed sequential additions in
real time during polymerase-catalyzed DNA synthesis. Single-polymerase
coupling to a nanopore, in combination with the Nanopore-SBS
approach, can provide the foundation for a low-cost, single-molecule, electronic DNA-sequencing platform.

nanopore sequencing | protein design | polymer-tagged nucleotides | single-molecule detection | integrated electrode array

Since the first demonstration of single-molecule characterization by a biological nanopore two decades ago (11), interest
has grown in using nanopores as sensors for DNA base discrimination. One approach is strand sequencing, in which each
base is identified as it moves through an ion-conducting channel,
ideally producing a characteristic current blockade event for
each base. Progress in nanopore sequencing has been hampered
by two physical limitations. First, single-base translocation can be
too rapid for detection (1-3 μs per base), and second, structural
similarities between bases make them difficult to identify unambiguously (12). Some attempts to address these issues have
used enzymes as molecular motors to control single-stranded
DNA (ssDNA) translocation speeds but still rely on identifying
multiple bases simultaneously (13-15). Other approaches used
Significance
DNA sequencing has been dramatically expanding its scope in
basic life science research and clinical medicine. Recently, a set of
polymer-tagged nucleotides were shown to be viable substrates
for replication and electronically detectable in a nanopore. Here,
we describe the design and characterization of a DNA polymerase-nanopore protein construct on an integrated chip. This
system incorporates all four tagged nucleotides and distinguishes single-tagged-nucleotide addition in real time. Coupling protein catalysis and nanopore-based detection to an
electrode array could provide the foundation of a highly scalable, single-molecule, electronic DNA-sequencing platform.

DNA sequencing is a fundamental technology in the biological
and medical sciences (1). Advances in sequencing technology
have enabled the growth of interest in individualized medicine with
the hope of better treating human disease. The cost of genome sequencing has dropped by five orders of magnitude over the last
decade but still remains out of reach as a conventional clinical tool
(2, 3). Thus, the development of new, high-throughput, accurate,
low-cost DNA-sequencing technologies is a high priority. Ensemble
sequencing-by-synthesis (SBS) platforms dominate the current
landscape. During SBS, a DNA polymerase binds and incorporates a
nucleotide analog complementary to the template strand. Depending on the instrumentation, this nucleotide is identified either by its
associated label or the appearance of a chemical by-product upon
incorporation (4). These platforms take advantage of a high-fidelity
polymerase reaction but require amplification and have limited read
lengths (5). Recently, single-molecule strategies have been shown to
have great potential to achieve long read lengths, which is critical for
highly scalable and reliable genomic analysis (6-9). Pacific Biosciences' SMRT SBS approach has been used for this purpose but has
lower throughput and higher cost compared with current second-generation technology (10).

www.pnas.org/cgi/doi/10.1073/pnas.1608271113

Author contributions: P.B.S., M. Palla, S. Kalachikov, J.N., S. Kumar, I.M., A. Bibillo, R.C., R.D.,
J.J.R., C.W.F., S.R., J.J., and G.M.C. designed research; P.B.S., M. Palla, S. Kalachikov, J.N., M.D.,
A.T., S. Kumar, M. Porel, M.C., C.T., I.M., Z.L., S.S., A. Aberra, C.A., A.Y., A. Aguirre, E.T.H., and
C.W.F. performed research; P.B.S., M. Palla, S. Kalachikov, M.D., S. Kumar, D.K., J.P., A. Bhat,
D.G., A. Bibillo, and R.C. contributed new reagents/analytic tools; P.B.S., M. Palla, and D.K.
analyzed data; and P.B.S. and M. Palla wrote the paper.
Reviewers: J.H.G., University of Washington; A.M., Boston University; and M.W.,
Northeastern University.
Conflict of interest statement: The Nanopore SBS technology has been exclusively licensed
by Genia. In accordance with the policy of Columbia University, the coinventors (S. Kumar,
M.C., C.T., Z.L., S. Kalachikov, J.J.R., and J.J.) are entitled to royalties through this license.
G.M.C. is a member of the Scientific Advisory Board of Genia, other potential conflicts are
described here: arep.med.harvard.edu/gmc/tech.html.
This article is a PNAS Direct Submission.
Freely available online through the PNAS open access option.
1P.B.S. and M. Palla contributed equally to this work.
2To whom correspondence should be addressed. Email: gchurch@genetics.med.harvard.edu.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1608271113/-/DCSupplemental.

PNAS Early Edition | 1 of 8

APPLIED BIOLOGICAL
SCIENCES

Edited by Stephen T. Warren, Emory University, Atlanta, Georgia, and approved September 28, 2016 (received for review August 12, 2016)

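[Fig. 1 schematic: primer/template DNA (5′ ends labeled) bound to a polymerase that is linked to a nanopore in a lipid bilayer separating the cis and trans sides; tagged dNTPs carry tag 1 (A), tag 2 (T), tag 3 (C), or tag 4 (G). Bottom: current (pA) versus time (s) trace with blockade levels in the order tag 2, tag 3, tag 1, tag 4.]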
Fig. 1. Principle of single-molecule DNA sequencing by a nanopore using
tagged nucleotides. Each of the four nucleotides carries a different polymer
tag (green square, A; red oval, T; blue triangle, C; black square, G). During
SBS, the complementary nucleotide (T shown here) forms a tight complex
with primer/template DNA and the nanopore-coupled polymerase. As the
tagged nucleotides are incorporated into the growing DNA template, their
tags, attached via the 5′-phosphate, are captured in the pore lumen, which
results in a unique current blockade signature (Bottom). At the end of the
polymerase catalytic reaction, the tag is released, ending the current
blockade, which returns to open-channel reading at this time. For the purpose of illustration, four distinct tag signatures are shown in the order of
their sequential capture. A large array of such nanopores could lead to
highly parallel, high-throughput DNA sequencing.

exonuclease to cleave a single nucleoside-5′-monophosphate that
then passes through the pore (16), or modified the pore opening
with a cyclodextrin molecule to slow translocation and increase
resolution for individual base detection (17, 18). All of these
techniques rely on detecting similarly sized natural bases, which
produce relatively similar current blockade signatures. Additionally, no strategies for covalently linking a single enzyme to a
multimeric nanopore have been published.
Recently, we reported a method for SBS with nanopore detection (19, 20). This approach has two distinct features: the use
of nucleotides with specific tags to enhance base discrimination
and a ternary DNA polymerase complex to hold the tagged nucleotides long enough for tag recognition by the nanopore. As shown
in Fig. 1, a single DNA polymerase is coupled to a membrane-embedded nanopore by a short linker. Next, template and four
uniquely tagged nucleotides are added to initiate DNA synthesis.
During formation of the ternary complex, a polymerase binds to a
complementary tagged nucleotide; the tag specific for that nucleotide is then captured in the pore. Each tag is designed to have a
different size, mass, or charge, so that they generate characteristic
current blockade signatures, uniquely identifying the added base.
This system requires a single polymerase coupled to each nanopore

to ensure any signal represents sequencing information from only
one DNA template at a time. Kumar et al. (19) demonstrated that
nucleotides tagged with four different polyethylene glycol (PEG)
molecules at the terminal phosphate were good substrates for
polymerase and that the tags could generate distinct signals as they
translocate through the nanopore. These modifications enlarge the
discrimination of the bases by the nanopore relative to the use of
the natural nucleotides. We recently expanded upon this work by
replacing the four PEG polymers with oligonucleotide-based tags
and showed that a DNA polymerase coupled to the nanopore
could sequentially add these tagged nucleotides to a growing DNA
strand to perform Nanopore-SBS (20). Although this work showcased the promise of this technology, it did not describe in detail
how to build a protein construct capable of Nanopore-SBS and
did not obtain enough data to develop a statistical framework to
uniquely distinguish the tagged nucleotides from each other.
Here, we describe the design and characterization of a protein
construct capable of carrying out Nanopore-SBS (Fig. 1). A porin
attached to a single DNA polymerase molecule is inserted into a
lipid bilayer formed on an electrode array. The polymerase
synthesizes a new DNA strand using four uniquely tagged nucleotides. The DNA polymerase is positioned in such a way that
when the ternary complex is formed with the tagged nucleotide,
the tag is captured by the nanopore and identified by the resulting
current blockade signature. We first describe the construction and
purification of an α-hemolysin (αHL) heptamer covalently attached
to a single φ29 DNA polymerase using the SpyTag/SpyCatcher
conjugation approach (21), followed by binding of this conjugate
with template DNA and its insertion into a lipid bilayer array.
We confirm that this complex is stable and retains adequate pore
and polymerase activities. We verify that the tagged nucleotides
developed by Fuller et al. (20) can be bound by the polymerase
and accurately discriminated by the nanopore. We develop an
experimental approach and computational methods to uniquely
and specifically distinguish true tagged-nucleotide captures from
background and from other tagged nucleotides. We address ways
that tagged-nucleotide captures may be misidentified and demonstrate approaches to correct for these. We further show this
protein construct can capture tagged nucleotides during template-directed DNA synthesis in the presence of Mn2+, demonstrating
its utility for Nanopore-SBS.
Results
Experimental Platform. To measure current through a nanopore,
we used a complementary metal oxide semiconductor (CMOS)
chip containing 264 individually addressable electrodes, which
was developed by Genia Technologies. In this first-generation
prototype, measurements are taken every 1 ms, which necessitated the creation of new tagged nucleotides as described by Fuller
et al. (20). To complete the development of Nanopore-SBS, we
designed a porin-polymerase conjugate that could function on the
Genia chip (Fig. 1). This protein assembly needs to ensure attachment of only one polymerase per pore, placement of polymerase on the cis side of the pore, and preservation of polymerase
and pore function. We investigated several approaches to meet
these requirements.
Construction of a Porin-Polymerase Conjugate. We adopted the
SpyCatcher/SpyTag (21) protein conjugation system to couple a
single polymerase with one αHL heptamer. Previous work had
demonstrated that heteroheptameric αHL pores could be isolated
by tagging some subunits with charged residues (22). We then devised a way to purify a 1:6 heptameric pore, where one subunit
contains a C-terminal 6-histidine tag (6-His-tag) and the other six
contain neutral Strep-tags (23). An αHL pore coupled to a single
polymerase molecule could then be made from three proteins: αHL
with a C-terminal Strep-tag, αHL with a C-terminal SpyTag peptide
followed by a 6-His-tag, and φ29 with a C-terminal SpyCatcher
Stranges et al.

Confirmation of Nanopore Function. We then confirmed that our
porin-polymerase construct was viable for single-molecule
polymer-tag detection. First, the 1:6 αHL pore was inserted into
the membrane of the Genia nanopore chip, followed by applying a
100-mV potential across the channel. The current through the pore
was 30 pA (Fig. 3A) in a buffer containing 20 mM Hepes, pH 7.5,
and 300 mM NaCl, representing a single pore insertion, thus
confirming the 1:6 pore construction yielded viable, active pores.
Next, the porin-polymerase conjugate was inserted into the lipid
bilayer. The only change was a small increase in the fluctuation of
the open channel current [root-mean-square fluctuation (RMSF) =
0.71 ± 0.24 pA; SI Appendix, Methods 3] compared with the pore
alone (RMSF = 0.48 ± 0.07 pA), indicating that conjugation of the
polymerase does not inhibit pore activity (Fig. 3B). To observe a
detectable signal from the tagged nucleotides, the 1:6 αHL pore
was inserted into the membrane, followed by addition of all four
tagged nucleotides (SI Appendix, Fig. S5). There were noticeable


Fig. 2. Assembly of the porin-polymerase construct. (A) Protein constructs used to form the porin-polymerase conjugate include unmodified αHL with a
Strep-tag, αHL with a C-terminal SpyTag peptide and 6-His-tag, and φ29 with a C-terminal SpyCatcher domain. (B) Assembly steps. αHL-SpyTag-6-His and
unmodified αHL are oligomerized with lipid, and the 1:6 SpyTag:unmodified assembled porin is purified. Addition of φ29-SpyCatcher to the 1:6 pore yields
one polymerase per αHL pore. (C) A molecular model generated with Rosetta using the determined structures for φ29 polymerase (PDB ID code 2PYJ), αHL
(PDB ID code 7AHL), and SpyCatcher/SpyTag (PDB ID code 2X5P). Colors of the proteins match the cartoon representations in A and B. The expected tag exit
site on the polymerase and the opening to the nanopore can be in close proximity with distances as short as 46 in some models. (D) The stoichiometry in
solution of the porin subunits was confirmed by SDS/PAGE without boiling. To confirm the assembly, excess φ29-SpyCatcher was added to 1:6 pore. The
combination yields only pores with one polymerase attached.


Polymerase function in bulk phase was determined by rolling circle amplification (SI Appendix, Methods 2 and Fig. S4).

(Fig. 2A). The whole porin-polymerase conjugate can be assembled
stepwise, by first forming and purifying the 1:6 (SpyTag:unmodified)
αHL pore, followed by addition of φ29-SpyCatcher (Fig. 2B). Amino
acid linker lengths between αHL-SpyTag and φ29-SpyCatcher were
chosen based on assembling the structures of these proteins (24, 25)
into a porin-polymerase conjugate in silico followed by macromolecular modeling of the linkers using Rosetta (26) (SI Appendix, Methods
1 and Figs. S1 and S2). The models demonstrated that these linkers
could allow the expected tag exit site of φ29 to be positioned above
the pore (Fig. 2C). The two αHL subunits were mixed at a ratio of
one part αHL-SpyTag-6-His-tag to six parts unmodified αHL and
oligomerized by adding lipid. The αHL porins containing only one
unit of SpyTag+6-His-tag were purified by ion exchange chromatography, which allowed αHL porins containing zero, one, two, or
more units of 6-His-tag to be readily distinguished (SI Appendix, Fig.
S3) (23). A single φ29 DNA polymerase with a C-terminal SpyCatcher
was attached by incubating it with the 1:6 αHL assembly overnight
(Fig. 2 A and B). Stoichiometry of the porin-polymerase conjugate
was analyzed by SDS/PAGE gels stained for total protein (Fig. 2D).
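To illustrate why a purification step is needed after mixing the subunits at 1:6, the sketch below estimates how often each heptamer composition would arise. It assumes that subunits assemble randomly and independently, so the number of SpyTag-bearing subunits per heptamer follows a binomial distribution; this assumption and the function name `heptamer_distribution` are ours and are not taken from the experiments described here.

```python
from math import comb


def heptamer_distribution(tagged_fraction: float, subunits: int = 7) -> list[float]:
    """P(k tagged subunits) for k = 0..subunits, assuming random, independent assembly."""
    p = tagged_fraction
    return [
        comb(subunits, k) * p**k * (1 - p) ** (subunits - k)
        for k in range(subunits + 1)
    ]


# One part tagged subunit to six parts unmodified subunit gives a tagged fraction of 1/7.
dist = heptamer_distribution(1 / 7)
for k, prob in enumerate(dist):
    print(f"{k} tagged subunit(s): {prob:.3f}")
```

Under this idealized model, fewer than half of the heptamers carry exactly one tagged subunit, which is consistent with the need to separate 1:6 pores from the 0:7 and multiply tagged species by chromatography.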


Fig. 3. Representative current versus time traces for the various stages of the pore assembly. (A) When neither tagged nucleotide nor polymerase is present,
only stable open-channel current is observable. (B) Attachment of polymerase does not change the mean open-channel current. The current root-mean-square fluctuation (RMSF) increase in B may be an indication of the polymerase coupled to the pore. (C) When no polymerase is attached to the pore and
tagged nucleotide is introduced, transient events are observed. (D) When polymerase-template is attached to the pore and the complementary base dG6P-dT30 is added, there are prolonged capture events as well as transient events as observed in C.

drops in current, indicating that tagged nucleotide causes some
transient blockage of the pore (Fig. 3C). Finally, we assembled the
porin-polymerase conjugate, followed by addition of a self-priming
DNA hairpin with a C nucleotide in the first position on the strand to
be replicated (SI Appendix, Fig. S6). This complex was inserted into
the membrane, and then the complementary tagged G nucleotide
(SI Appendix, Fig. S5: dG6P-T30) was added in a buffer containing
noncatalytic Ca2+ ions to allow capture of the tagged nucleotide but
prevent base incorporation. The current versus time trace for the fully
assembled nanopore-polymerase-template complex shows longer
blockade events than the 1:6 αHL pore with tagged nucleotides, and
produces a stable minimum current signature for the added dG6P-T30 (Fig. 3D), as well as blockade events similar to those seen in
Fig. 3C. This evidence suggests that the designed pore-polymerase is
a viable construct to allow single-molecule detection of captured
tagged nucleotides. It also demonstrates the detection of single-molecule binding to an enzyme covalently bound to a nanopore.
Detection of Ternary Complex Captures. After confirming that the
assembled nanopore-polymerase-template complex functioned
properly, we sought to determine its efficacy for detecting all
four tagged nucleotides (SI Appendix, Fig. S5). Four DNA hairpin
oligonucleotides, with different bases at the first query position
(SI Appendix, Fig. S6), were used as templates. Porin-polymerase
conjugates loaded with these templates were then inserted into a
lipid membrane, followed by addition of the complementary tagged
nucleotide in a buffer containing noncatalytic divalent metal (Ca2+)
ions. Whenever the current was deflected below 70% of the open
channel level, the mean current of that deflection, and the duration

of the deflection (dwell time) were recorded as current blockade
events. The total number of recorded events (n) was as follows: n =
716 for dG6P-T30, n = 812 for dA6P-FL, n = 727 for dC6P-dSp3,
and n = 717 for dT6P-dSp30. When a polymerase-template was
conjugated to the porin, tagged nucleotides were captured for longer
times and at distinct current levels that were not observed when
polymerase was absent (Fig. 4 and SI Appendix, Methods 4 and
Fig. S7). The dwell times of the tagged-nucleotide background events
were <10 ms, whereas with template and polymerase present dwell
times of ternary complex captures ranged from 10 ms to 5 s
(SI Appendix, Figs. S8 and S9). All mean currents were outside of one
SD of the next closest tag except for those between tagged A
(dA6P-FL) and G (dG6P-T30) nucleotides (SI Appendix, Table S1
and Figs. S9A and S10). These two tags could be distinguished by a
characteristic two-current level capture of dA6P-FL (Fig. 4 and
SI Appendix, Methods 5 and Fig. S11). These results demonstrate
that each of the four tagged-nucleotide signals is template specific
and can be clustered into distinct current blockade groups relative
to the open-channel current reading of the αHL pore (Fig. 4). We
collected over 200 ternary complex capture events, which led us to
develop computational approaches to accurately distinguish one
tagged-nucleotide capture event from another.
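The event-recording rule described above (a deflection below 70% of the open-channel level, summarized by its mean current and dwell time) can be sketched as follows. This is a minimal illustration and not the analysis code used in this work; the names `Blockade` and `detect_blockades` are hypothetical, the trace is synthetic, and the 1-ms sample interval is taken from the chip description.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Blockade:
    start_ms: float  # time at which the current first dropped below threshold
    dwell_ms: float  # duration of the deflection
    mean_pa: float   # mean current during the deflection


def detect_blockades(current_pa, open_channel_pa, sample_ms=1.0, threshold_fraction=0.70):
    """Return every run of consecutive samples below threshold_fraction * open_channel_pa."""
    threshold = threshold_fraction * open_channel_pa
    events, start = [], None
    # A sentinel sample at the open-channel level closes an event that reaches the end of the trace.
    for i, value in enumerate(list(current_pa) + [open_channel_pa]):
        if value < threshold and start is None:
            start = i
        elif value >= threshold and start is not None:
            segment = current_pa[start:i]
            events.append(Blockade(start * sample_ms, len(segment) * sample_ms, mean(segment)))
            start = None
    return events


# Synthetic trace: 30 pA open channel with one 20-ms and one 3-ms deflection.
trace = [30.0] * 5 + [12.0] * 20 + [30.0] * 5 + [15.0] * 3 + [30.0] * 5
events = detect_blockades(trace, open_channel_pa=30.0)
for event in events:
    print(event)
```

In this synthetic trace, only the 20-ms event would pass the >10-ms dwell-time criterion used to separate ternary complex captures from background.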
Discrimination Among Tagged-Nucleotide Ternary Captures. We
quantified the accuracy of base calls among the four distinct
tagged-nucleotide ternary complex captures (TCCs) probing the
complementary tagged nucleotides only. First, we determined
that the key signal feature to distinguish the events associated
with each of the four tagged-nucleotide captures was the median

Fig. 4. Tagged-nucleotide discrimination on a semiconductor chip array. All measurements were taken on a pore-polymerase-template complex under
noncatalytic conditions where the first base on the template is complementary to the added tagged nucleotide. (A) Current versus dwell time (duration of
each current blockade) plots for captures of all tagged nucleotides. Capture events cluster into distinct current and dwell time regions for each tagged
nucleotide. (B) Representative single-pore traces of tagged-nucleotide capture shown in A. Current blockade levels for each are marked in red. The blockades
demonstrate unique, single-molecule events corresponding to the four distinct tag captures.

residual current (SI Appendix, Fig. S10). TCCs were differentiated from background captures by requiring their dwell time to
be greater than 10 ms. Then, we used a classification algorithm
derived from the characteristic dwell time and residual current
intervals for each set of ternary capture experiments to estimate
the accuracy with which one could call a given TCC event. We

found that there was a 78.8-99.2% chance of making an accurate
call for each tagged-nucleotide capture by computing a confusion
matrix (Table 1). We also determined that the transient captures
of tagged nucleotides could be readily distinguished from polymerase-mediated ternary captures (SI Appendix, Tables S2 and
S3). In addition, when all four nucleotides were added to a

Table 1. Confusion matrix for discriminating between ternary complex captures using a capture event classification algorithm

                        Actual nucleotide
Predicted nucleotide    G        A        C        T
G                       96.77    14.38     0.78     0.00
A                        2.15    78.77     0.00     0.00
C                        1.08     2.05    99.22     1.61
T                        0.00     4.80     0.00    98.39

Each cell represents the percent probability of classifying a particular ternary complex capture (top row labels) as any of the four variants (left column labels). The diagonal represents the correct classification. Ternary complex captures were classified by using a custom clustering algorithm based on mean dwell time and residual current level of observed events (Methods).
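As a consistency check on Table 1, the sketch below recomputes the per-nucleotide call accuracy from the tabulated percentages. It assumes that each group of four values belongs to one actual nucleotide, in the order G, A, C, T; this reading is supported by each group summing to 100% but is our interpretation of the table layout.

```python
LABELS = ["G", "A", "C", "T"]

# Percent of captures of each actual nucleotide (key) assigned to each
# predicted label (list order G, A, C, T), as read from Table 1.
TABLE_1 = {
    "G": [96.77, 2.15, 1.08, 0.00],
    "A": [14.38, 78.77, 2.05, 4.80],
    "C": [0.78, 0.00, 99.22, 0.00],
    "T": [0.00, 0.00, 1.61, 98.39],
}

# The correct-call rate is the entry where predicted and actual labels agree.
correct = {actual: TABLE_1[actual][LABELS.index(actual)] for actual in LABELS}
worst = min(correct, key=correct.get)
best = max(correct, key=correct.get)

for actual in LABELS:
    print(f"actual {actual}: {correct[actual]:.2f}% called correctly")
print(f"range: {correct[worst]:.1f}% ({worst}) to {correct[best]:.1f}% ({best})")
```

The lowest accuracy belongs to the tagged A nucleotide, whose captures are most often miscalled as G, in line with the overlapping mean currents of dA6P-FL and dG6P-T30 noted above.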

template where the G nucleotide was at the first position, the
tagged C nucleotide, dC6P-dSp3, was captured the majority
(69%) of the time (SI Appendix, Fig. S12 and Table S4). Longer,
more distinguishable captures of the complementary tagged nucleotide versus mismatched ones are supported by the observation
that φ29's Michaelis constant is 10 times lower for the correct
nucleotide than for the incorrect ones (27). This result could prove
important for future polymerase-engineering steps.
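The two-step logic described in this section, a dwell-time cutoff followed by assignment to a residual-current interval, can be sketched as below. The 10-ms cutoff comes from the text; the current windows in `HYPOTHETICAL_WINDOWS` are placeholder values chosen only for illustration and are not the measured levels of the four tags. The sketch also ignores the two-level signature used to separate dA6P-FL from dG6P-T30.

```python
# Placeholder residual-current windows (fraction of open-channel current).
# These numbers are hypothetical and do NOT reproduce the measured tag levels.
HYPOTHETICAL_WINDOWS = {
    "T": (0.10, 0.20),
    "C": (0.20, 0.35),
    "A": (0.35, 0.50),
    "G": (0.50, 0.65),
}


def classify_event(dwell_ms, residual_fraction, windows=HYPOTHETICAL_WINDOWS, min_dwell_ms=10.0):
    """Label one blockade event as background, a base call, or unclassified."""
    # Events no longer than the cutoff are treated as transient background captures.
    if dwell_ms <= min_dwell_ms:
        return "background"
    # Otherwise assign the base whose half-open window [low, high) contains the residual current.
    for base, (low, high) in windows.items():
        if low <= residual_fraction < high:
            return base
    return "unclassified"


print(classify_event(dwell_ms=4.0, residual_fraction=0.30))
print(classify_event(dwell_ms=250.0, residual_fraction=0.30))
print(classify_event(dwell_ms=250.0, residual_fraction=0.90))
```

With real data, the windows would be derived from the dwell-time and residual-current distributions of each set of ternary capture experiments rather than fixed by hand.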
Detection of Sequential Additions of Nucleotides. With a functioning protein construct and ability to detect single-nucleotide
captures, we then tested whether sequential nucleotide additions
could be detected. Tagged nucleotides dG6P-T30 and dC6P-dSp3, along with natural dATP and dTTP, in catalytic Mn2+ ion-containing buffer were added to a nanopore-polymerase-template assembly with an A nucleotide as the first query base (SI
Appendix, Fig. S6). There was clear capture of tags corresponding to the G and C nucleotides (Fig. 5A and SI Appendix, Figs.
S13 and S14), and they were detected at the same frequency as
predicted from the GC content of the template (SI Appendix,
Table S5). The dwell times for these tagged nucleotides were
shorter than in the noncatalytic condition, with average dwell
times of 0.1 s (SI Appendix, Fig. S13C) compared with 1.5 s in
Ca2+-containing buffer. The transient tagged-nucleotide capture
profile was unaffected by the divalent metal (SI Appendix, Fig.
S15), and the polymerase-mediated captures were still distinguishable from background.
Given this result, we used the same template to see whether all
tagged nucleotides could be detected under catalytic conditions.
Equimolar quantities of the four tagged nucleotides were added
in the presence of Mn2+ to perform Nanopore-SBS. Out of 70
single pores obtained, 25 captured two or more tags, whereas six
of those showed detectable captures of all four tagged nucleotides. The pore with the most transitions between tag capture
levels is shown in Fig. 5B. The other five are displayed in SI
Appendix, Fig. S16. All four characteristic current levels for the
tags and transitions between them can be readily distinguished.
The ability to observe all four tagged nucleotides without the
presence of noncatalytic divalent cations, which slow tag release, demonstrates greater potential for sequencing speed than
previously shown (20). Homopolymer sequences in the template,
and repeated, high-frequency tag capture events of the same
nucleotide in the raw sequencing reads were considered a single
base for sequence alignment. We recognized 12 clear sequence
transitions in a 20-s period. Out of the 12 base transitions observed in the data, 85% match the template strand, showing
that this method can produce results that closely align to the
template sequence. Improved methods that use the time between
tag capture events could allow discrimination between high-frequency captures of the same tag and captures due to new complementary tagged-nucleotide binding (SI Appendix, Methods 6
and Fig. S17), which may further enhance the observed sequencing accuracy. These methods could allow more confident
sequencing of homopolymer regions in a template.
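The homopolymer handling and alignment scoring described above can be illustrated with a short sketch. It collapses runs of identical bases to a single base and counts matching columns in a gapped alignment, using the template complement and the 13-column alignment printed with Fig. 5. The function names are ours, and the alignment itself is taken as given rather than computed.

```python
from itertools import groupby


def collapse_homopolymers(sequence: str) -> str:
    """Replace every run of identical bases with a single base."""
    return "".join(base for base, _run in groupby(sequence))


def alignment_identity(template: str, call: str) -> tuple[int, int]:
    """Return (matching columns, total columns) for two equal-length gapped strings."""
    if len(template) != len(call):
        raise ValueError("aligned strings must have equal length")
    matches = sum(1 for t, c in zip(template, call) if t == c and t != "-")
    return matches, len(template)


collapsed = collapse_homopolymers("TATGATGATCCCAGTAGTAGTCCCGCGCTCGAG")
matches, columns = alignment_identity("TGATGATCAGTAG", "AGATGATC-GTAG")
print(collapsed)
print(f"{matches}/{columns} columns match ({100 * matches / columns:.0f}%)")
```

Counting this way, the aligned segment contains one mismatch and one gap, which is in line with the approximately 85% agreement reported for this read.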
Discussion
Our results demonstrate that the binding and incorporation of
tagged nucleotides by DNA polymerase can be detected on a
nanopore array to perform Nanopore-SBS. By constructing a
protein conjugate with one polymerase per porin, we ensure the
observed activity comes from only one polymerase. The nanopore-attached polymerase retains its ability to bind and incorporate the complementary nucleotide for detection of real-time DNA synthesis. We improved upon our previous work by
demonstrating the polymerase can capture tagged nucleotides
over a long enough time period to be detected in the pore
without the need for noncatalytic divalent cations, which slow the
overall DNA synthesis rate. This represents a comprehensive
characterization study of a single enzyme conjugated to a
protein nanopore.
Previous uses of polymerases to guide DNA through a nanopore (14, 15) did not couple the polymerase directly to the
protein pore, instead relying on voltage to initiate the entry of

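[Fig. 5 traces: (A) two-base capture trace (scale bar, 10 s) with template complement TATGATGATCCCAGTAGTAGTCCCGCGCTCGAG. (B) Four-base sequencing trace (scale bar, 1 s) with the local alignment:
Template complement > TGATGATCAGTAG
                      :||||||| ||||
Base call           > AGATGATC-GTAG]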

Fig. 5. Representative examples of real-time detection of numerous successive tagged-nucleotide incorporations into a self-priming DNA hairpin
template catalyzed by nanopore-bound polymerase on the Genia chip.
(A) Two base captures of tagged C and G nucleotides with standard A and T
nucleotides. Part of the template sequence is shown in red (SI Appendix, Fig.
S6). The only captures observed in the trace match the expected levels for
dG6P-T30 and dC6P-dSp3. (B) Four-base sequencing. Events with dwell time
>10 ms were categorized by manually assigning current blockade events to
their respective tag capture boxes (Methods). Homopolymer regions in the
template and raw sequencing reads were considered a single base for local
sequence alignment. A 12-bp section of such an alignment is shown in red.


Methods
Protein Expression and Purification. The φ29 DNA polymerase-SpyCatcher
construct with an N-terminal Strep-tag was expressed in BL21 DE3 Star cells by
growing them in Magic Media (Invitrogen) at 37 °C until OD 0.6, followed by
overnight growth at 25 °C. Cells were resuspended and lysed by sonication in
Polymerase Buffer (PolBuff): 50 mM Tris, pH 7.5, 150 mM NaCl, 0.1 mM EDTA,
0.05% (vol/vol) Tween 20, and 5 mM 2-mercaptoethanol. Benzonase nuclease
was added after cell lysis to remove excess bound DNA. The protein was purified using Streptactin columns per the manufacturer's instructions (IBA). Purified protein was eluted with PolBuff with added desthiobiotin. Both αHL-Strep-tag and αHL-SpyTag-6-His were expressed in BL21 DE3 Star pLys-S cells grown
in Magic Media for 8 h at 37 °C. Each was lysed by sonication in 50 mM Tris,
pH 8.0, 200 mM NaCl. Strep-tagged αHL was purified on Streptactin columns
and eluted in the same buffer with desthiobiotin. His-tagged αHL was purified
with a cobalt column and eluted with 300 mM imidazole.
1:6 Porin Assembly Formation and Isolation. To form a 1:6 SpyTag:unmodified
αHL pore, purified αHL proteins were mixed in a ratio of 1:6 SpyTag construct:unmodified. The lipid 1,2-diphytanoyl-sn-glycero-3-phosphocholine
(DPhPC) was added to a final concentration of 5 mg/mL, followed by incubation at 40 °C for 30 min. Lipid vesicles were subsequently popped by
adding n-octyl-β-D-glucoside (OG) to 5% (vol/vol). Fully formed oligomers
were separated from vesicles and monomers by size exclusion chromatography (SEC) in 20 mM Hepes, pH 7.5, 75 mM KCl, and 30 mM OG. Oligomeric
protein obtained from the SEC was then run on a MonoS column in 20 mM
MES buffer, pH 5.0, 0.1% Tween 20, and eluted with a linear gradient of 0 M
to 2 M NaCl. The desired 1:6 assembly eluted after the 0:7 porin because the
1:6 assembly contains a 6-His-tag. The 1:6 composition was confirmed by
adding SpyCatcher protein and observing a size shift of the conjugate on an
SDS polyacrylamide gel indicative of only one SpyCatcher molecule per
assembled pore.
Polymerase and Template Attachment. Purified φ29 and the desired template
were bound to the pore by incubating two molar equivalents of polymerase
and four equivalents of DNA template per 1:6 pore overnight at 4 °C. The
full tertiary complex was isolated by SEC in 20 mM Hepes, pH 7.5, 150 mM
KCl, 0.01% Tween 20, and 5 mM tris(2-carboxyethyl)phosphine. Isolated
fractions were characterized by SDS/PAGE to confirm the presence of φ29
and αHL conjugate. Formed complexes were tested for polymerase function
by rolling circle amplification.
Lipid Bilayer Formation. Synthetic lipid 1,2-di-O-phytanyl-sn-glycero-3-phosphocholine (Avanti Polar Lipids) was diluted in tridecane (Sigma-Aldrich) to a
final concentration of 15 mg/mL. A single lipid bilayer was formed on a
silanized CMOS chip surface containing an array of 264 Ag/AgCl electrodes.
The automated lipid spreading protocol used an iterative buffer and air
bubble flow to mechanically thin the membrane. During this step, voltage
was applied across the lipid bilayer to detect its capacitance, which directly
correlates to structural integrity of the membrane. An empirically determined
capacitance threshold value of 5 fF/µm² was used to classify the properly formed,
single lipid bilayer to conclude the thinning protocols.
Pore Insertion. The automated pore insertion method consisted of two voltage
protocols: (i) initially constant DC voltage was applied at 160 mV for 1 min,
immediately followed by (ii) a linearly increasing voltage ramp from 50 to
600 mV with a 1 mV/s incremental step. The smoothly increasing voltage
gradient amplified the electrical driving force guiding the nanopores into the
lipid bilayer. If a cell became active, that is, had a measured current between
10 and 50 pA, we considered this event a pore insertion, due to the measured
increase in conductance across the bilayer. Immediately after this event, this
cell was turned off to prevent additional pore insertions. In this way, the
probability of multiple pore insertions above the same electrode array element was minimized.
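The two-stage insertion protocol can be sketched as a voltage waveform plus a per-cell stop condition. The voltages, hold time, ramp rate and the 10 to 50 pA window come from the text; the 1-s step size, the function names and the `read_current_pa` callback (a stand-in for the chip readout) are hypothetical.

```python
def insertion_waveform(dt_s=1.0):
    """Return (times_s, voltages_mV) for the two-stage insertion protocol."""
    hold_mv, hold_s = 160.0, 60.0             # (i) constant DC voltage for 1 min
    ramp_start_mv, ramp_end_mv = 50.0, 600.0  # (ii) linear ramp limits
    ramp_rate_mv_per_s = 1.0                  # ramp slope

    n_hold = round(hold_s / dt_s)
    n_ramp = round((ramp_end_mv - ramp_start_mv) / (ramp_rate_mv_per_s * dt_s)) + 1
    voltages = [hold_mv] * n_hold
    voltages += [ramp_start_mv + ramp_rate_mv_per_s * dt_s * k for k in range(n_ramp)]
    times = [k * dt_s for k in range(len(voltages))]
    return times, voltages


def is_pore_insertion(current_pa, low_pa=10.0, high_pa=50.0):
    """A cell is treated as active when its current lies between 10 and 50 pA."""
    return low_pa <= current_pa <= high_pa


def run_insertion(read_current_pa, dt_s=1.0):
    """Step through the waveform and stop at the first detected insertion.

    `read_current_pa(voltage_mv)` is a hypothetical callback that returns the
    measured current for one cell. Returning early models switching the cell
    off so that no further pores insert. Returns the insertion time in seconds,
    or None if no insertion was seen.
    """
    times, voltages = insertion_waveform(dt_s)
    for t, v in zip(times, voltages):
        if is_pore_insertion(read_current_pa(v)):
            return t
    return None
```

Stopping per cell rather than per chip is the design point here: each electrode is switched off independently, so one early insertion does not cut the ramp short for the rest of the array.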
Nanopore Experiments. All TCC experiments were performed in a buffer
containing 300 mM NaCl, 3 mM CaCl2 (providing the noncatalytic divalent
cations to probe nucleotide binding/unbinding events), and 20 mM Hepes,
pH 7.5. For sequencing experiments, this buffer was modified by replacing
CaCl2 with 0.1 mM MnCl2 as a catalytic cation source during the polymerase
extension reaction to initiate and sustain sequential nucleotide additions
along the template DNA. Purified porin–polymerase–template conjugates
were diluted in buffer to a final concentration of 2 nM. After pumping a
5-µL aliquot to the cis compartment, single pores were embedded in the
planar lipid bilayer that separates two compartments (denoted cis and
trans), each containing 3 µL of buffer solution. Experiments were conducted at 27 °C with 5 µM tagged nucleotides added to the cis well.
Data Acquisition. The ionic current through the nanopore was measured
between individually addressable Ag/AgCl electrodes coupled to a silicon
substrate integrated electrical circuit. This consisted of an integrating patch-clamp amplifier (Genia Technologies), which provided a constant 100-mV potential across the lipid bilayer in voltage-clamp mode. Data were recorded at a
1-kHz bandwidth in an asynchronous configuration at each cell using circuit-based analog-to-digital conversion and noise filtering (Genia Technologies),
which allows independent sequence reads at each pore complex. During the
various experimental steps, a precision syringe pump (Tecan) was used in an
automated fashion to deliver reagents into the microfluidic chamber of the
CMOS chip at a flow rate of 1 µL/s. Software control was implemented in
Python, which interfaced with the pump via an RS-232 communication
protocol.
Event Detection and Data Analysis. Ionic current blockade events were identified
using a custom event detection algorithm implemented in MATLAB (2014b;
MathWorks). Briefly, an event was identified by selecting segments that


ssDNA into the pore. In contrast, our Nanopore-SBS approach


allows clear, template-dependent single-molecule binding observations for a DNA polymerase replicating a target strand of
DNA. The use of tags with distinct electrostatic properties enhances the difference between bases and provides a way to
perform accurate single-molecule SBS using nanopore detection.
Improvement in the Nanopore-SBS platform allowed the
generation of hundreds of tagged-nucleotide captures, an order
of magnitude more data than our previous publication, necessitating the implementation of a series of methods to capture,
analyze, and interpret this large amount of data. We developed
an experimental approach and computational algorithms to
uniquely and specifically distinguish true tagged-nucleotide captures
from background and from other tagged nucleotides. We also
addressed ways that tagged-nucleotide captures can be misidentified
and demonstrate approaches to correct for these.
Ongoing work centers on overcoming several challenges such
as homopolymer sequencing, improving the yield of functional
pores, increasing pore lifetime, and demonstrating chip reusability (SI
Appendix, Methods 6–8 and Fig. S18). To increase accuracy, we will
continue to improve tag design to achieve better discrimination.
Future efforts include optimizing the linker length (28) and composition (29, 30) between αHL and SpyTag, as well as between φ29 and
SpyCatcher. A better linker should allow more reliable capture
and ensure detection of all incorporated tagged nucleotides.
The method of isolating a multimeric nanopore with one
unique subunit, followed by covalently attaching a single polymerase, could be extended to other methods of single-molecule
detection via a nanopore. Single-molecule enzyme activity, or
protein–protein interactions, could be observed by coupling the
desired molecular event to the alteration of current through the
pore. This technology could serve as the basis for the design of a
host of high-throughput molecular sensors. It is likely that other
applications of using molecular motors, such as a polymerase
(13, 14), helicase (31), or unfoldase (32), to observe DNA or
protein in a nanopore could benefit from this work.
The nanopore measurements described here were obtained on
a first-generation CMOS-based electrode array chip developed
by Genia Technologies, which can potentially scale to billions of
sensors (33). Our progress in protein engineering for Nanopore-SBS is currently being carried forward to inform development of
the next-generation device and protein constructs. Future work will
focus on the development of new polymerases that have more
desirable kinetics, new porin–polymerase conjugation strategies,
and new tags that produce more distinguishable current blockade
signatures. These improvements are being implemented on Genia's
state-of-the-art, massively parallelized nanopore arrays, which can
serve as a high-throughput single-molecule sequencing system.

deflected from open-channel current (I_O = 30 pA at 100 mV in 300 mM
NaCl, 3 mM CaCl2, and 20 mM Hepes, pH 7.5) below a cutoff value of
70% of I_O (21 pA) to a stable current level (I_B) with a minimum dwell time
of >10 ms. For each nanopore experiment, event searches were performed to obtain the average residual current level (with respect to
open channel) for each capture event (I_RES). Statistical analysis was performed to determine the mean, median, and SD of each capture event by
fitting a Gaussian to a histogram of I_B values. The residual current blockade
was defined as follows: I_RES% = I_RES/I_O, whereas the duration of the event in the
deflected segment corresponds to the dwell time. Mean dwell time and
residual current of each event in an experimental set were cumulatively
quantified using scatter plots and box-and-whisker plots. On each box plot,
the central red mark represents the median, whereas the bottom and top blue
edges of the box are the first and third quartiles, respectively.
The whiskers extend to the lowest and highest values within 1.5× the interquartile
range of the first and third quartiles. Alternatively, average dwell

Classification of Capture Events. As a conservative classification method, we
have identified the TCC events as all events clustered inside the tag capture box
defined by a mean dwell time interval of 10^−2 to 10^+1 s and a normalized current
blockade (or residual current) region bounded by the first and third quartile
values (lower/upper bounds) of the normalized current blockade boxplots for a particular tagged nucleotide (Fig. 4). The lower
bound of the dwell time interval (10 ms) corresponds to the background cutoff
(SI Appendix, Fig. S9), whereas the upper bound was selected to filter out
clogged pores from the TCC event set. Mean and median residual currents and
SDs are determined after Gaussian fitting of the TCC event histograms.
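The tag capture box described above amounts to a rectangular filter in the (dwell time, residual current) plane. The following is a minimal sketch of that filter, assuming inclusive bounds and the "inclusive" quartile method of Python's `statistics.quantiles`; the original analysis was done in MATLAB, whose quartile convention may differ slightly, and the function names are hypothetical.

```python
from statistics import quantiles


def tag_capture_box(residual_currents, dwell_bounds_s=(1e-2, 1e1)):
    """Build the capture box for one tagged nucleotide.

    Residual-current bounds are the first and third quartiles of the
    normalized blockade values; dwell bounds default to 10 ms and 10 s.
    """
    q1, _median, q3 = quantiles(residual_currents, n=4, method="inclusive")
    return {"dwell_s": dwell_bounds_s, "residual": (q1, q3)}


def is_tcc_event(dwell_s, residual_current, box):
    """True when an event falls inside the box on both axes."""
    low_t, high_t = box["dwell_s"]
    low_i, high_i = box["residual"]
    return low_t <= dwell_s <= high_t and low_i <= residual_current <= high_i


# Toy residual-current values, for illustration only
box = tag_capture_box([0.20, 0.21, 0.22, 0.23, 0.24])
print(box["residual"])
print(is_tcc_event(0.5, 0.22, box))    # inside the box
print(is_tcc_event(0.005, 0.22, box))  # shorter than the 10 ms background cutoff
```

Bounding the current axis by quartiles keeps only the central half of each tag's distribution, which is what makes the classification conservative: some true captures are discarded in exchange for fewer misassigned events.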

1. Shendure J, Lieberman Aiden E (2012) The expanding scope of DNA sequencing. Nat Biotechnol 30(11):1084–1094.
2. Lander ES (2011) Initial impact of the sequencing of the human genome. Nature 470(7333):187–197.
3. Soon WW, Hariharan M, Snyder MP (2013) High-throughput sequencing for biology and medicine. Mol Syst Biol 9(1):640.
4. Chen CY (2014) DNA polymerases drive DNA sequencing-by-synthesis technologies: Both past and present. Front Microbiol 5(JUN):305.
5. Fuller CW, et al. (2009) The challenges of sequencing by synthesis. Nat Biotechnol 27(11):1013–1023.
6. Perkins TT, Quake SR, Smith DE, Chu S (1994) Relaxation of a single DNA molecule observed by optical microscopy. Science 264(5160):822–826.
7. Smith SB, Cui Y, Bustamante C (1996) Overstretching B-DNA: The elastic response of individual double-stranded and single-stranded DNA molecules. Science 271(5250):795–799.
8. Rief M, Clausen-Schaumann H, Gaub HE (1999) Sequence-dependent mechanics of single DNA molecules. Nat Struct Biol 6(4):346–349.
9. Harris TD, et al. (2008) Single-molecule DNA sequencing of a viral genome. Science 320(5872):106–109.
10. Eid J, et al. (2009) Real-time DNA sequencing from single polymerase molecules. Science 323(5910):133–138.
11. Kasianowicz JJ, Brandin E, Branton D, Deamer DW (1996) Characterization of individual polynucleotide molecules using a membrane channel. Proc Natl Acad Sci USA 93(24):13770–13773.
12. Feng Y, Zhang Y, Ying C, Wang D, Du C (2015) Nanopore-based fourth-generation DNA sequencing technology. Genomics Proteomics Bioinformatics 13(1):4–16.
13. Cherf GM, et al. (2012) Automated forward and reverse ratcheting of DNA in a nanopore at 5-Å precision. Nat Biotechnol 30(4):344–348.
14. Manrao EA, et al. (2012) Reading DNA at single-nucleotide resolution with a mutant MspA nanopore and phi29 DNA polymerase. Nat Biotechnol 30(4):349–353.
15. Laszlo AH, et al. (2014) Decoding long nanopore sequencing reads of natural DNA. Nat Biotechnol 32(8):829–833.
16. Clarke J, et al. (2009) Continuous base identification for single-molecule nanopore DNA sequencing. Nat Nanotechnol 4(4):265–270.
17. Astier Y, Braha O, Bayley H (2006) Toward single molecule DNA sequencing: Direct identification of ribonucleoside and deoxyribonucleoside 5′-monophosphates by using an engineered protein nanopore equipped with a molecular adapter. J Am Chem Soc 128(5):1705–1710.
18. Ayub M, Hardwick SW, Luisi BF, Bayley H (2013) Nanopore-based identification of individual nucleotides for direct RNA sequencing. Nano Lett 13(12):6144–6150.
19. Kumar S, et al. (2012) PEG-labeled nucleotides and nanopore detection for single molecule DNA sequencing by synthesis. Sci Rep 2:684.
20. Fuller CW, et al. (2016) Real-time single-molecule electronic DNA sequencing by synthesis using polymer-tagged nucleotides on a nanopore array. Proc Natl Acad Sci USA 113(19):5233–5238.
21. Zakeri B, et al. (2012) Peptide tag forming a rapid covalent bond to a protein, through engineering a bacterial adhesin. Proc Natl Acad Sci USA 109(12):E690–E697.
22. Howorka S, Cheley S, Bayley H (2001) Sequence-specific detection of individual DNA strands using engineered nanopores. Nat Biotechnol 19(7):636–639.
23. Davis R, Chen R, Bibillo A, Korenblum D, Dorwart M (2014) Nucleic acid sequencing using tags. US Patent Application 14/073,445.
24. Song L, et al. (1996) Structure of staphylococcal alpha-hemolysin, a heptameric transmembrane pore. Science 274(5294):1859–1866.
25. Berman AJ, et al. (2007) Structures of phi29 DNA polymerase complexed with substrate: The mechanism of translocation in B-family polymerases. EMBO J 26(14):3494–3505.
26. Leaver-Fay A, et al. (2011) ROSETTA3: An object-oriented software suite for the simulation and design of macromolecules. Methods Enzymol 487:545–574.
27. Santos E, Lázaro JM, Pérez-Arnaiz P, Salas M, de Vega M (2014) Role of the LEXE motif of protein-primed DNA polymerases in the interaction with the incoming nucleotide. J Biol Chem 289(5):2888–2898.
28. Robinson-Mosher A, Shinar T, Silver PA, Way J (2013) Dynamics simulations for engineering macromolecular interactions. Chaos 23(2):025110.
29. Reddy Chichili VP, Kumar V, Sivaraman J (2013) Linkers in the structural biology of protein-protein interactions. Protein Sci 22(2):153–167.
30. Klein JS, Jiang S, Galimidi RP, Keeffe JR, Bjorkman PJ (2014) Design and characterization of structured protein linkers with differing flexibilities. Protein Eng Des Sel 27(10):325–330.
31. Derrington IM, et al. (2015) Subangstrom single-molecule measurements of motor proteins using a nanopore. Nat Biotechnol 33(10):1073–1075.
32. Nivala J, Marks DB, Akeson M (2013) Unfoldase-mediated protein translocation through an α-hemolysin nanopore. Nat Biotechnol 31(3):247–250.
33. Merolla PA, et al. (2014) Artificial brains. A million spiking-neuron integrated circuit with a scalable communication network and interface. Science 345(6197):668–673.

8 of 8 | www.pnas.org/cgi/doi/10.1073/pnas.1608271113

time/residual current probability histograms were generated by plotting each bin normalized by the total number of observed events.
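The event detection procedure in Event Detection and Data Analysis can be sketched as a threshold scan over a current trace. This is not the authors' MATLAB implementation: the 70% cutoff, the 30 pA open-channel current and the >10 ms dwell filter come from the text, while the 1-kHz sampling rate, the function names and the toy trace are assumptions made for illustration.

```python
def detect_events(trace_pa, open_current_pa=30.0, sample_rate_hz=1000.0,
                  cutoff_fraction=0.70, min_dwell_s=0.010):
    """Find blockade events: runs of samples below cutoff_fraction * I_O.

    Returns one dict per event with its start index, dwell time (s), mean
    blocked current I_B (pA) and residual current I_B / I_O.
    """
    samples = list(trace_pa)
    cutoff_pa = cutoff_fraction * open_current_pa
    events, start = [], None
    # A trailing open-channel sample closes any event still open at the end.
    for i, current in enumerate(samples + [open_current_pa]):
        below = current < cutoff_pa
        if below and start is None:
            start = i
        elif not below and start is not None:
            segment = samples[start:i]
            dwell_s = len(segment) / sample_rate_hz
            if dwell_s > min_dwell_s:  # discard short background deflections
                blocked_pa = sum(segment) / len(segment)
                events.append({"start_index": start, "dwell_s": dwell_s,
                               "blocked_pa": blocked_pa,
                               "residual": blocked_pa / open_current_pa})
            start = None
    return events


def normalized_histogram(values, bin_edges):
    """Counts per [edge_i, edge_i+1) bin divided by the total number of events."""
    counts = [0] * (len(bin_edges) - 1)
    for value in values:
        for b in range(len(counts)):
            if bin_edges[b] <= value < bin_edges[b + 1]:
                counts[b] += 1
                break
    return [count / len(values) for count in counts]


# Toy trace: one 25-ms blockade at 9 pA and one 5-ms deflection at 6 pA
trace = [30.0] * 50 + [9.0] * 25 + [30.0] * 50 + [6.0] * 5 + [30.0] * 50
for event in detect_events(trace):
    print(event)
```

Only the 25-ms blockade survives the dwell filter in this toy trace; the 5-ms deflection is treated as background, mirroring the cutoff used in the paper.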

ACKNOWLEDGMENTS. This work was supported by NIH Grant R01 HG007415 and Genia Technologies, Inc.

SUPPORTING INFORMATION (SI APPENDIX)

Design and characterization of a nanopore-coupled polymerase for single-molecule DNA sequencing by synthesis on an electrode array
P. Benjamin Strangesa,1, Mirk Pallaa,b,1, Sergey Kalachikovc, Jeff Nivalaa, Michael Dorwartd,
Andrew Transd, Shiv Kumarc, Mintu Porelc, Minchen Chienc, Chuanjuan Taoc, Irina Morozovac,
Zengmin Lic, Shundi Shic, Aman Aberrae, Cleoma Arnoldd, Alexander Yangd, Anne Aguirred,
Eric T. Haradad, Daniel Korenblumd, James Pollardd, Ashwini Bhatd, Dmitriy Gremyachinskiyd,
Arek Bibillod, Roger Chend, Randy Davisd, James J. Russoc, Carl W. Fullerc,d, Stefan Roeverd,
Jingyue Juc,f, and George M. Churcha,b,d,2

a Department of Genetics, Harvard Medical School, Boston, MA 02115
b Wyss Institute for Biologically Inspired Engineering at Harvard University, Boston, MA 02115
c Center for Genome Technology and Biomolecular Engineering, Department of Chemical Engineering, Columbia University, New York, NY 10027
d Genia Technologies, Santa Clara, CA 95050
e Department of Biomedical Engineering, Arizona State University, Tempe, AZ 85281
f Columbia University College of Physicians and Surgeons, New York, NY 10032

1 These authors contributed equally to this work.
2 Address correspondence to: gchurch@genetics.med.harvard.edu

SI APPENDIX METHODS
1. Protein Modeling
2. Confirmation of Polymerase Function
3. Root-Mean-Square Fluctuation of Open Channel Current
4. Tagged Nucleotide Background Signal
5. Bimodal Signal of Tags
6. Repeated Tag Captures
7. Pore Lifetime Optimization
8. Chip Reusability
SI APPENDIX REFERENCES

SI APPENDIX METHODS

1. Protein Modeling
A model of the αHL-SpyTag-φ29-SpyCatcher conjugate was made by hand using PyMOL
(Schrödinger, LLC). Structures for αHL, φ29 DNA polymerase, and FBAB-B (which forms the
SpyTag/SpyCatcher complex) (PDB codes 7AHL, 2PYJ, and 2X5P, respectively) were arranged
in an expected tagged nucleotide capturing orientation, and Gly/Ser linkers were built to join the
αHL-SpyTag and φ29-SpyCatcher (Fig. S1). This final structure was then repacked/minimized in
Rosetta to remove clashes. The conformational freedom of the conjugate was explored using the
FloppyTail backbone sampling protocol in Rosetta (1). The backbone torsion angles of the
linkers between αHL and SpyTag, and between φ29 and SpyCatcher, were allowed freedom while the
backbone of the rest of the protein was held fixed. The results of backbone sampling are shown in
Fig. S2.
Sample submission code follows:
./FloppyTail.mpi.linuxgccrelease \
  -database /path/to/main/database \
  -s 7AHL_SpyTAG_phi29_SpyCatch_assemble_repacked.pdb \
  -nstruct 5000 \
  -in:file:movemap movemap_file \
  -packing:repack_only \
  -AnchoredDesign:refine_repack_cycles 30 \
  -AnchoredDesign:perturb_cycles 15000 \
  -AnchoredDesign:refine_cycles 3000

where movemap_file contains:

RESIDUE 2052 2060 BBCHI
RESIDUE 2638 2650 BBCHI

The MoveMap file lists the residue ranges of the two linkers. All other backbone angles are
held constant.

>φ29 DNA Polymerase-SpyCatcher


MASWSHPQFEKGAETHMPRKMYSCDFETTTKVEDCRVWAYGYMNIEDHSEYKIGNSLDEFMAWVLKVQAD
LYFHNLKFDGAFIINWLERNGFKWSADGLPNTYNTIISRMGQWYMIDICLGYKGKRKIHTVIYDSLKKLP
FPVKKIAKDFKLTVLKGDIDYHKERPVGYKITPEEYAYIKNDIQIIAEALLIQFKQGLDRMTAGSDSLKG
FKDIITTKKFKKVFPTLSLGLDKEVRYAYRGGFTWLNDRFKEKEIGEGMVFDVNSLYPAQMYSRLLPYGE
PIVFEGKYVWDEDYPLHIQHIRCEFELKEGYIPTIQIKRSRFYKGNEYLKSSGGEIADLWLSNVDLELMK
EHYDLYNVEYISGLKFKATTGLFKDFIDKWTYIKTTSEGAIKQLAKLMLNSLYGKFASNPDVTGKVPYLK
ENGALGFRLGEEETKDPVYTPMGVFITAWARYTTITAAQACYDRIIYCDTDSIHLTGTEIPDVIKDIVDP
KKLGYWAHESTFKRAKYLRQKTYIQDIYMKEVDGKLVEGSPDDYTDIKFSVKCAGMTDKIKKEVTFENFK
VGFSRKMKPKPVQVPGGVVLVDDTFTIKGSGDYDIPTTENLYFQGAMVDTLSGLSSEQGQSGDMTIEEDS
ATHIKFSKRDEDGKELAGATMELRDSSGKTISTWISDGQVKDFYLYPGKYTFVETAAPDGYEVATAITFT
VNEQGQVTVNGKATKGDAHI
Color assignment: StrepTag, φ29, linker, SpyCatcher

>αHL-SpyTag-His
MADSDINIKTGTTDIGSNTTVKTGDLVTYDKENGMHKKVFYSFIDDKNHNKKLLVIRTKGTIAGQYRVYS
EEGANKSGLAWPSAFKVQLQLPDNEVAQISDYYPRNSIDTKEYMSTLTYGFNGNVTGDDTGKIGGLIGAN
VSIGHTLKYVQPDFKTILESPTDKKVGWKVIFNNMVNQNWGPYDRDSWNPVYGNQLFMKTRNGSMKAAEN
FLDPNKASSLLSSGFSPDFATVITMDRKASKQQTNIDVIYERVRDDYQLHWTSTNWKGTNTKDKWTDRSS
ERYKIDWEKEEMTNGGSSGGSSGGAHIVMVDAYKPTKKGHHHHHH
Color assignment: αHL, linker, SpyTag, 6xHis

Fig. S1. Sequences of φ29 DNA polymerase and αHL constructs with colored annotations for the various protein sequence regions.

Fig. S2. Linker sampling of the nanopore-polymerase complex in Rosetta. Backbone torsion angles were sampled
on the linkers highlighted using Rosetta's FloppyTail application. Various final models are shown in different
colors. The majority of the models place the expected tag site >100 Å away. A small fraction of models place the tag
exit site within 50 Å, demonstrating that these linker lengths make it possible for a polymerase-bound tagged
nucleotide to be captured by the pore.

Fig. S3. Separation of hemolysin heptamers with different SpyTag-6-His-tag to unmodified subunit
stoichiometries on MonoS column and the isolation of 1:6 (SpyTag:unmodified) αHL complex. Monomeric
unmodified αHL and SpyTag-6-His αHL were mixed at a ratio of 6:1, respectively, and oligomerized by addition of
lipid. After oligomerization, the lipid vesicles were broken with detergent and the oligomers were separated on a
MonoS column (GE Healthcare) with a linear NaCl gradient from 0 M to 2 M. The ratio of the SpyTag-6-His αHL
to unmodified αHL is indicated near the corresponding peak on the chromatographic profile. The fraction
representing 1:6 pore assembly was further characterized by polyacrylamide gel electrophoresis under denaturing
conditions, assayed for ability to bind only one SpyCatcher fusion polymerase, and used to assemble the sequencing
complexes in further experiments. FT = flow-through.

2. Confirmation of Polymerase Function

Polymerase function was determined with a rolling circle amplification (RCA) assay. In brief,
circular M13mp18 ssDNA (New England Biolabs) was used as a template. Wild-type φ29 was
purchased from NEB as a positive control. RCA was performed at room temperature for 3 hours
in 25 µL reactions. Each reaction contained 1× φ29 DNA polymerase reaction buffer (NEB), 1
nM M13mp18 ssDNA, 10 nM primer (CGC CAG GGT TTT CCC AGT CAC GAC), 0.3 mM
dNTP, and 1 µL of protein sample. Products were analyzed by agarose gel electrophoresis (Fig.
S4).

Fig. S4. Rolling circle amplification (RCA) of purified αHL-φ29 conjugates. (a) SDS/PAGE gel of purified αHL-SpyTag-His and φ29-SpyCatcher-StrepTag for evaluation by RCA. Additional HisTrap pulldown by magnetic beads
(Qiagen) was performed to ensure no free φ29-SpyCatcher was present. (b) Agarose gel of RCA using purified
proteins from a. Samples are numbered as in a. Negative control contained no polymerase. Positive control
contained wild-type φ29 (NEB). Amplified products can be seen in all samples where φ29-SpyCatcher is present,
indicating that polymerase activity is maintained when fused to SpyCatcher and when conjugated to αHL-SpyTag.

3. Root-Mean-Square Fluctuation of Open Channel Current

The root-mean-square fluctuation (RMSF) of the open channel current (I) was calculated for
each nanopore trace (n > 10) in a particular experimental set according to the equation:

$$\mathrm{RMSF}(I) = \sqrt{\frac{1}{T}\sum_{t=1}^{T}\left(I_t - \bar{I}\right)^2}$$

where $T$ is the total number of time steps during the nanopore measurement, $I_t$ is the measured
current level at time step $t$, and $\bar{I}$ is the time-averaged current level of the same pore. Then the
mean and standard deviation of the RMSF current were computed for that experiment. This algorithm
was implemented in MATLAB (2014b, MathWorks, Natick, MA).
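For readers who want to reproduce the RMSF calculation outside MATLAB, a minimal Python sketch follows. The function names and the toy traces are hypothetical. The source does not say whether the per-experiment standard deviation is the sample or population form, so the sample form used here is an assumption.

```python
import math
from statistics import mean, stdev


def rmsf(currents_pa):
    """Root-mean-square fluctuation of one open-channel current trace (pA)."""
    n_steps = len(currents_pa)
    time_average = sum(currents_pa) / n_steps
    return math.sqrt(sum((i_t - time_average) ** 2 for i_t in currents_pa) / n_steps)


def rmsf_summary(traces_pa):
    """Mean and sample standard deviation of RMSF across traces in one experiment."""
    values = [rmsf(trace) for trace in traces_pa]
    return mean(values), stdev(values)


# Toy traces for illustration only
traces = [[29.0, 31.0, 29.0, 31.0], [28.0, 32.0, 28.0, 32.0], [30.0, 30.0, 30.0, 30.0]]
print([rmsf(trace) for trace in traces])
print(rmsf_summary(traces))
```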

Fig. S5. Structures of the four polymer tagged nucleotides. Nucleotides used in this study are 5′-hexaphosphates
(red) connected to a common linker (green) and an oligonucleotide tag (black) consisting of Cy3 and either an
unmodified oligonucleotide chain (dT30) or oligonucleotides with a variety of modifications including runs of abasic
nucleotides (dSp8 and dSp30) or two thymidines substituted with fluorescein (FL). Notation used throughout: dG6P-T30, dA6P-FL, dC6P-dSp3, dT6P-dSp30.

JAMN
5' - TTT TT(G CGC TCG AGA TCT CCG TAA GGA GAT CTC GAG CGC) GGG ACT ACT ACT
GGG ATC ATC ATN (GCC ACC TCA GCT GCA CGT AAG TGC AGC TGA GGT GGC) - 3'
Reverse complement:
5'-NATGATGATCCCAGTAGTAGTCCCGCGCTCGAGATCTCCTTACGGAGATCTCGAGCGCAAAAA-3'
Reverse complement with homopolymers reduced to a single base:
5'-NATGATGATCAGTAGTAGTCGCGCTCGAGATCTCTACGAGATCTCGAGCGCA-3'

Fig. S6. General sequence and predicted secondary structure of the DNA hairpins used as templates. The first query
base (highlighted in yellow) probes complementary base N for tag discrimination. The N is replaced with the four
bases (A, T, C, G) in the experimental oligonucleotides. Structure predicted by UNAFold (2). The reverse
complement (bases that would be added by the nanopore-bound polymerase) is shown below the JAMN sequence.
In addition, the complementary strand with homopolymer runs reduced to a single base is shown.
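The two derived sequences in Fig. S6 can be regenerated from the JAMN template with a few lines of Python. The sketch below is illustrative: the helper names are hypothetical, N is treated as its own complement, and only the template region from the 5' end through the query base N is used (the downstream priming hairpin is excluded), which is what reproduces the strings shown in the figure.

```python
from itertools import groupby

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C", "N": "N"}


def reverse_complement(sequence):
    """Return the reverse complement, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(sequence))


def collapse_homopolymers(sequence):
    """Reduce every run of identical bases to a single base."""
    return "".join(base for base, _run in groupby(sequence))


# JAMN template from the 5' end through the query base N
template = (
    "TTTTT"
    "GCGCTCGAGATCTCCGTAAGGAGATCTCGAGCGC"  # 5' hairpin
    "GGGACTACTACTGGGATCATCATN"            # region copied first by the polymerase
)

synthesized = reverse_complement(template)
print(synthesized)
print(collapse_homopolymers(synthesized))
```

The collapsed form is the useful comparison target when repeated captures of the same base cannot be counted reliably, which is why the figure lists it alongside the full reverse complement.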


4. Tagged Nucleotide Background Signal

With the established 1:6 pore assembly, we evaluated the background signal due to tagged
nucleotides in the absence of template DNA. After incorporating the porin-polymerase conjugate
into the membrane the tagged nucleotides were added under non-catalytic (Ca2+ ion containing)
buffer conditions with a constant 100 mV potential. Addition of each tagged nucleotide caused
current fluctuations more often than in the channel without tagged nucleotides. The deflections in
the open channel current were evaluated for the amount of time spent below 70% of open
channel current (dwell time) and the mean current over the same time. While each tagged
nucleotide produced a wide distribution of deflection current, all dwell times were below ~10
ms, consistent with rapid translocations through the nanopore, and readily distinguishable from
longer polymerase/template/nucleotide ternary captures (Figs. S7-9).
To further test the viability of our approach for real-time sequencing, we evaluated the
tagged nucleotide background in Mn2+ containing buffer with the established 1:6 pore assembly.
After inserting the pore into the membrane the tagged nucleotides were added and a constant
potential of 100 mV was applied across the membrane. Addition of each tagged nucleotide
resulted in more frequent current deflections than in the channel without tagged nucleotides
similar to what was observed for the Ca2+ buffer experimental counterpart. As before, all tagged
nucleotides produced similar dwell times below ~10 ms (Fig. S15). By using frequency analysis,
we found that there was only a 0.00-1.20% probability of falsely identifying a background event
as a true ternary complex capture event, which reflects a very low false positive rate due to
background even under catalytic conditions (Table S3) required for sequencing.

Fig. S7. Residual current versus dwell time scatter plots of current blockade events for ternary complex captures
(red) and tagged nucleotide background (blue). Ternary complex capture experiments were run at 100 mV with Ca2+
as the divalent cation to enable the polymerase to bind a nucleotide. Each tagged nucleotide was added to a
nanopore-polymerase-template complex where the first base in the template was the complementary base to the
nucleotide. In a separate experiment, background events were measured by adding the individual tagged nucleotides
to the chip, after the 1:6 αHL pore insertion. A constant 100 mV potential was applied and the background signals of
tagged nucleotides were observed. Buffer contained 300 mM NaCl, 3 mM CaCl2, and 20 mM HEPES pH 7.5.
Individual tagged nucleotides were added at 3 µM.

Fig. S8. Dwell time distributions of the four tagged nucleotides within the tag capture current band. Current band
levels were selected based on mean capture level distributions (Fig. S9). Both capture and background distributions
are normalized by the total number of counts. Dwell times for captures show a shift to longer times than the
background current blockade events. Most background captures are shorter than 10 ms while ternary complex
captures last longer than 10 ms.

Fig. S9. Comparison of residual current and dwell time characteristics for the four tagged nucleotides. Box-and-whisker plots were used to discriminate the four tagged nucleotides (dG6P-T30 = G, dA6P-FL = A, dC6P-dSp3 = C,
dT6P-dSp30 = T) during ternary complex capture (TCC; a and c) and tagged nucleotide background (TNB; b and
d) experiments. The central red mark represents the median, while the bottom and top blue edges of the box are the
1st and 3rd quartiles, respectively. The whiskers extend to the lowest and highest values within 1.5× IQR
of the 1st and 3rd quartiles. Events are collected over at least 3 experiments for both TCC and TNB
experiments. Number of events (n) for captures: nG = 376, nA = 229, nC = 512, nT = 243 and for background: nG =
7688, nA = 6844, nC = 4068, nT = 3069.

Table S1. Residual current statistics of ternary complex capture events with complete protein
construct for each of the four tagged nucleotides (dG6P-T30 = G, dA6P-FL = A, dC6P-dSp3 =
C, dT6P-dSp30 = T) under non-catalytic (Ca2+ ion containing) buffer conditions (a).

Residual current within the tag capture box (b), one value per tagged nucleotide:
Median: 0.2108, 0.2269, 0.3132, 0.4768
Mean: 0.2108, 0.2254, 0.3141, 0.4735
SD: 0.0119, 0.0086, 0.0102, 0.0202

(a) Buffer containing 300 mM NaCl, 3 mM CaCl2 and 20 mM HEPES pH 7.5 was used to create a buffer condition that
inhibits polymerase catalysis, preventing sequential nucleotide incorporation on the DNA hairpin template. (b) Tag
capture box was defined as described in Methods (see Classification of Capture Events). Median, mean and
standard deviation of dwell and residual current were calculated from dwell and residual current data presented in
Fig. S9.

Fig. S10. Residual current histogram for all current blockade events with dwell time greater than 10 ms during
ternary complex capture of individual tagged nucleotides. All experiments were performed as described in Fig. S7.
Buffer contained 300 mM NaCl, 3 mM CaCl2, and 20 mM HEPES pH 7.5. Individual tagged nucleotides were
added at 3 µM.

5. Bimodal Signal of Tags

We have observed that one of the four tagged nucleotides (dA6P-FL) has a characteristic two-level current blockade signature. This may be due to collisions of the two bulky FL groups inside
the nanopore lumen during the end of a tag capture event (Fig. S11). Future tag design efforts
may capitalize on this unused signal feature (along with the current blockade level) of tags to
achieve better discrimination among the tagged nucleotides.

Fig. S11. Characteristic two-level current blockade signature of captured dA6P-FL. When the A-tag containing
two bulky FL groups enters the pore, the open channel current reading drops to a stable current blockade level of
~9 pA (state i) due to the partial ion current blockade. The pore lumen is not wide enough to smoothly translocate
this tag, so it folds up into a knot-like structure. This knot significantly blocks the current flow at this point, so the
stable current blockade level drops even closer to baseline level (state ii). The voltage-driven back-pressure builds
up in the pore lumen, exerting enough force on the folded A-tag to push it into the trans side of the lipid bilayer. At
this point, the signal returns to open-current reading, indicating pore clearing and completion of the A-tag
translocation (state iii). Color assignment: maroon, DNA template; blue, primer; orange, SpyCatcher-SpyTag
linker; red, A-tag; black, FL group; grey, φ29 polymerase; dark green, αHL; light green, lipid bilayer.

Table S2. Percent probability of identifying the tagged nucleotide background events as ternary
complex captures for each of the four tagged nucleotides (dG6P-T30 = G, dA6P-FL = A, dC6P-dSp3 = C, dT6P-dSp30 = T) under non-catalytic (Ca2+ ion containing) buffer conditions (a).

| Current Blockade Signal (Ca2+) | G | A | C | T |
|---|---|---|---|---|
| Total Events (b) | 3884 | 1711 | 4068 | 3069 |
| Events in Tag Capture Box (c) | 15 | 64 | 51 | 29 |
| % of G Capture Events | 0.08 | 1.17 | 0.17 | 0.39 |
| % of A Capture Events | 0.08 | 0.88 | 0.39 | 0.36 |
| % of C Capture Events | 0.10 | 1.23 | 0.42 | 0.13 |
| % of T Capture Events | 0.13 | 0.47 | 0.27 | 0.07 |

(a) Buffer containing 300 mM NaCl, 3 mM CaCl2 and 20 mM HEPES pH 7.5 was used to create non-catalytic buffer
conditions. (b) Capture events were measured at 100 mV applied potential and identified as described in Methods (see
Event Detection and Data Analysis). (c) Tag capture box was defined as described in Methods (see Classification of
Capture Events). There is only a 0.07-1.23% probability of inaccurately identifying a background event as a true
ternary complex capture event, which reflects a very low false positive rate due to background.

Table S3. Percent probability of identifying the tagged nucleotide background events as ternary
complex captures for each of the four tagged nucleotides (dG6P-T30 = G, dA6P-FL = A, dC6P-dSp3 = C, dT6P-dSp30 = T) under catalytic (Mn2+ ion containing) buffer conditions (a).

| Current Blockade Signal (Mn2+) | G | A | C | T |
|---|---|---|---|---|
| Total Events (b) | 751 | 917 | 3230 | 963 |
| Events in Tag Capture Box (c) | 10 | 24 | 26 | 10 |
| % of G Capture Events | 0.00 | 0.33 | 0.15 | 0.10 |
| % of A Capture Events | 0.27 | 0.87 | 0.28 | 0.21 |
| % of C Capture Events | 0.67 | 1.20 | 0.12 | 0.42 |
| % of T Capture Events | 0.40 | 0.22 | 0.25 | 0.31 |

(a) Buffer containing 300 mM NaCl, 0.1 mM MnCl2 and 20 mM HEPES pH 7.5 was used to create catalytic buffer
conditions. (b) Capture events were measured at 100 mV applied potential and identified as described in Methods (see
Event Detection and Data Analysis). (c) Tag capture box was defined as described in Methods (see Classification of
Capture Events).
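The percentages in Table S3 can be cross-checked against the raw counts: for each background nucleotide, the four per-box percentages should add up to the share of all its events that landed in any tag capture box. The sketch below performs that check with the values transcribed from Table S3. Assigning the four value columns to G, A, C and T in that order is an assumption based on the order used in the caption, and the variable names are hypothetical.

```python
def percent_in_box(events_in_box, total_events):
    """Share of background events that fall inside any tag capture box (%)."""
    return 100.0 * events_in_box / total_events


# Values transcribed from Table S3 (Mn2+ buffer); column order G, A, C, T assumed
total_events = {"G": 751, "A": 917, "C": 3230, "T": 963}
events_in_box = {"G": 10, "A": 24, "C": 26, "T": 10}
per_box_percent = {  # outer key: capture box; inner key: background nucleotide
    "G": {"G": 0.00, "A": 0.33, "C": 0.15, "T": 0.10},
    "A": {"G": 0.27, "A": 0.87, "C": 0.28, "T": 0.21},
    "C": {"G": 0.67, "A": 1.20, "C": 0.12, "T": 0.42},
    "T": {"G": 0.40, "A": 0.22, "C": 0.25, "T": 0.31},
}

overall = {nt: percent_in_box(events_in_box[nt], total_events[nt])
           for nt in total_events}
summed = {nt: sum(per_box_percent[box][nt] for box in per_box_percent)
          for nt in total_events}

for nt in total_events:
    print(f"{nt}: overall {overall[nt]:.2f}%, sum of per-box values {summed[nt]:.2f}%")
```

The two figures agree to within rounding for every column, which supports reading each column of the table as one background tagged nucleotide and each percentage row as one capture box.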

Fig. S12. Ternary complex capture of one tagged nucleotide in the presence of all tagged nucleotides. A nanopore-polymerase-JAMG template complex (Fig. S6) was inserted into lipid bilayers. All four tagged nucleotides were
added at equimolar concentrations (3 µM) and a constant 100 mV potential was applied. Expected tag capture levels
are highlighted on the right side of the plot with complementary C level in bold. Buffer contained 300 mM NaCl, 3
mM CaCl2, and 20 mM HEPES pH 7.5.

Table S4. Percent probability of identifying the correct ternary complex capture event of the
complementary tagged nucleotide in the presence of all tagged nucleotides (dG6P-T30 = G,
dA6P-FL = A, dC6P-dSp3 = C, dT6P-dSp30 = T) under non-catalytic (Ca2+ ion containing)
buffer conditions.^a

Current Blockade Signal                 JAMG^b | C
Total Events^c                          1087
Events > Dwell Time Cutoff^d (EDC)      256
Events in Tag Capture Box^e (TCB)       144
% of G Captures in TCB                  20.1390
% of A Captures in TCB                  6.9444
% of C Captures in TCB                  68.7500
% of T Captures in TCB                  4.1667

^a Buffer containing 300 mM NaCl, 3 mM CaCl2 and 20 mM HEPES pH 7.5 was used to create buffer conditions that
inhibit polymerase catalysis and thereby prevent sequential nucleotide addition to the DNA template. ^b DNA hairpin with
a G as the first position on the strand to be replicated (Fig. S6). ^c Capture events were measured at 100 mV applied
potential and identified as described in Methods (see Event Detection and Data Analysis). ^d Background dwell time
cutoff of 10^-2 s was used as determined by tagged nucleotide background analysis (Fig. S9d). ^e Tag capture box was
defined as described in Methods (see Classification of Capture Events).
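
The two-stage filtering behind the EDC and TCB rows (a dwell time cutoff followed by a tag capture box) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the analysis code used for the table: the `Event` fields, the `classify` and `summarize` helpers, and especially the numeric bounds in `TAG_BOXES` are hypothetical placeholders, and each box is treated here as a window on residual current only. The actual boxes are defined in Methods.

```python
from dataclasses import dataclass

DWELL_CUTOFF_S = 1e-2  # background dwell time cutoff quoted in the table footnote


@dataclass
class Event:
    dwell_s: float            # duration of the current blockade, in seconds
    residual_fraction: float  # blockade current divided by open channel current


# Hypothetical tag capture boxes: (min, max) residual current fraction per tag.
# These numbers are placeholders for illustration, not the boxes from Methods.
TAG_BOXES = {
    "G": (0.10, 0.20),
    "A": (0.20, 0.30),
    "C": (0.30, 0.40),
    "T": (0.40, 0.50),
}


def classify(event):
    """Return the tag label for an event, or None if it is treated as background."""
    if event.dwell_s <= DWELL_CUTOFF_S:
        return None  # too short: fails the dwell time cutoff
    for tag, (low, high) in TAG_BOXES.items():
        if low <= event.residual_fraction < high:
            return tag
    return None  # long enough, but outside every tag capture box


def summarize(events):
    """Percentage of each tag among the events that fall in a tag capture box."""
    called = [label for label in map(classify, events) if label is not None]
    if not called:
        return {}
    return {tag: 100.0 * called.count(tag) / len(called) for tag in TAG_BOXES}
```

In Table S4, 256 of the 1087 detected events (about 23.6%) pass the dwell time cutoff, and 144 of those 256 (56.25%) fall in a tag capture box; the percentage rows are then expressed relative to those 144 events.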

Fig. S13. Two base capture under catalytic conditions. A nanopore-polymerase-template complex was loaded into
the bilayer followed by addition of 3 µM of dC6P-dSp3, dG6P-T30, dTTP, and dATP in 300 mM NaCl, 0.1 mM
MnCl2, and 20 mM HEPES pH 7.5. A constant 100 mV potential was applied. (a) Current versus dwell time scatter
plot for two tag captures. Expected current levels for dC6P-dSp3 and dG6P-T30 are shown. (b) Current versus time
trace for one pore in this experiment. Expected tag capture levels for dC6P-dSp3 and dG6P-T30 are shown. (c)
Dwell time histogram for all current blockade events for the tagged G/C nucleotide capture. (d) Residual current
histogram for all observed blockade events below 0.70 times the open channel current. Expected G and C levels are
indicated above the respective peaks.

Fig. S14. Example current versus time traces for captures of dG6P-T30 and dC6P-dSp3 during strand elongation.
Equimolar dATP, dTTP, dG6P-T30 and dC6P-dSp3 in Mn2+ containing buffer were added to the pore-polymerase-template
complex. Expected current blockade levels for captured dG6P-T30 and dC6P-dSp3 are highlighted on the
traces. Captures of the tagged G and C nucleotides can be observed in all traces. Each trace represents signal from a
different pore.

Table S5. Frequency count of ternary complex capture events with full protein construct during
sequential nucleotide additions using two tagged nucleotides (dG6P-T30 = G, dC6P-dSp3 = C)
along with natural dATP and dTTP nucleotides^a under catalytic (Mn2+ ion containing) buffer
conditions.^b

Translocation Signal                    Frequency (cnt)
Total Events^c                          1159
Events > Dwell Time Cutoff^d (EDC)      824
Events in Tag Capture Box^e (TCB)       749
% of G Captures in TCB                  55.27
% of C Captures in TCB                  44.73

^a Natural nucleotides also present (dATP, dTTP) did not produce capture events. ^b Buffer containing 300 mM NaCl,
0.1 mM MnCl2 and 20 mM HEPES pH 7.5 was used to create buffer conditions to promote polymerase catalysis for
sequential nucleotide incorporation on the DNA hairpin template. ^c Capture events were measured at 100 mV applied
potential and identified as described in Methods (see Event Detection and Data Analysis). ^d Background dwell time
cutoff of 10^-2 s was used as determined by tagged nucleotide background analysis (Fig. S15). ^e Tag capture box was
defined as described in Methods (see Classification of Capture Events).

Fig. S15. Comparison of residual current versus dwell time scatter plots for tagged nucleotide background events in
catalytic (magenta) and non-catalytic (green) conditions. After the 1:6 αHL pore insertion, individual tagged
nucleotides were added to the chip. A constant 100 mV potential was applied and the background signal of tagged
nucleotides was observed. Buffer contained 300 mM NaCl, 20 mM HEPES pH 7.5 and either 0.1 mM MnCl2 (catalytic)
or 3 mM CaCl2 (non-catalytic). Individual tags were added at 3 µM. The divalent cation had little effect
on the signal from free tagged nucleotides.

Fig. S16. Examples of addition of all four tagged nucleotides. All tagged nucleotides in Mn2+ containing buffer
were added to the pore-polymerase-template complex and the resulting current blockades were observed.
Deflections in current were observed at all four expected levels based on previously observed individual tag
captures. All panels represent different pores, except for the two at the bottom, as highlighted. Current
blockade events were manually categorized as tag captures based on previously characterized current blockade level
and dwell time.

6. Repeated Tag Captures


We found that under non-catalytic conditions, the same nucleotide was repeatedly captured in the
pore, giving repeated high-frequency current fluctuation events. When only catalytic metal is
used, the same repeated high-frequency tag captures are observed. During the sequential
incorporation of nucleotides by polymerase catalysis, one may encounter two distinct patterns of
current blockade events. The first is N→N, where the event current level is the same as the one
that preceded it; the second is N→M, where the current level differs from the previous one.
N→N transitions could represent repeated capture of the same tag, tagged nucleotide switching
without incorporation, or tagged nucleotide switching post-catalysis. N→M transitions should
only represent tagged nucleotide switching due to incorporation of the previous nucleotide. We
sought a way to discriminate high-frequency captures of the same ternary
complex-bound tagged nucleotide from binding of a new, complementary tagged nucleotide. We
assumed that the time scale of nucleotide switching is longer than that of repeated tag
capture events of the same polymerase-bound tagged nucleotide. Here, we define wait time as
the time between distinct transition events (N→N or N→M, respectively).

As an initial test to discriminate between tagged nucleotide switching events and repeated tag
captures of the same nucleotide, we conducted a frequency analysis of ternary complex capture
event transitions using two tagged nucleotides (dG6P-T30 and dC6P-dSp3) along with natural
dATP and dTTP nucleotides under catalytic (Mn2+) sequencing conditions. Capture events were
measured at 100 mV applied potential and identified as described in Methods (see Event
Detection and Data Analysis). Our statistical analysis of wait time distributions showed that
the wait times for N→N transitions were clearly shorter than those for N→M transitions, with
average wait times of ~0.07 s compared to ~1.41 s, respectively (Fig. S17). These results could
prove important for future base-calling algorithms to correctly identify the signal of
incorporated nucleotides.
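
The transition bookkeeping described above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code behind Fig. S17: the event tuple layout, the function name `wait_times_by_transition`, and the example timings are all hypothetical, and the wait time is assumed here to run from the end of one capture event to the start of the next.

```python
from collections import defaultdict
from statistics import mean


def wait_times_by_transition(events):
    """Group wait times by transition type.

    events: list of (start_s, end_s, base) tuples sorted by start time.
    Assumption: wait time = start of an event minus end of the preceding event.
    """
    waits = defaultdict(list)
    for (_, prev_end, prev_base), (start, _, base) in zip(events, events[1:]):
        kind = "N->N" if base == prev_base else "N->M"
        waits[kind].append(start - prev_end)
    return dict(waits)


# Synthetic example: three quick recaptures of G, then a switch to C.
example = [
    (0.00, 0.05, "G"),
    (0.10, 0.15, "G"),
    (0.20, 0.30, "G"),
    (1.70, 1.80, "C"),
]
waits = wait_times_by_transition(example)
mean_wait = {kind: mean(values) for kind, values in waits.items()}
```

A base caller built on this idea could treat a run of closely spaced same-level events as one incorporation and start a new call only after a longer gap, although the threshold separating the two regimes would have to be chosen from measured distributions.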

Fig. S17. Comparison of wait time characteristics of ternary complex capture events during sequential addition of
nucleotides. All experiments were performed as described in Fig. S13. Box-and-whisker plots were used to
discriminate the four possible transitions (N→N = {G→G, C→C}, N→M = {G→C, C→G}) during template
progression. The central red mark represents the median, while the bottom and top blue edges of the box are the 1st
and 3rd quartiles, respectively. The whiskers extend to the lowest and highest values within 1.5 × IQR of
the 1st and 3rd quartiles. Events were collected from at least 3 independent experiments (Table S5).
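
For readers who want to reproduce the box-and-whisker summary from their own wait time lists, a minimal Python sketch follows. The function name `box_stats` and the sample values are ours, and the quartile interpolation ("inclusive" in Python's `statistics.quantiles`) is an assumption; the plotting software used for Fig. S17 may compute quartiles slightly differently.

```python
from statistics import quantiles


def box_stats(values):
    """Median, quartiles and 1.5 x IQR whisker ends for a list of numbers."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers end at the most extreme data points still inside the fences.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


# Synthetic values chosen only to show one point beyond the upper whisker.
stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 100])
```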

7. Pore Lifetime Optimization


In order to achieve long read lengths with Nanopore-SBS, we tested pore lifetime with the fully
assembled nanopore-polymerase-template complex and all four tagged nucleotides under
sequencing conditions (Methods) for ~1 hr using a cyclic ON/OFF DC data acquisition method.
Briefly, (1) a constant +100 mV was applied for 200 s to capture the tags of nucleotides being
successively incorporated into the DNA template, followed by (2) a trapezoidal voltage function
(stage 1: linearly increasing voltage ramp from -50 mV to +100 mV with a rate of +75 mV/s,
stage 2: constant +100 mV DC for 3 s, stage 3: linearly decreasing voltage ramp from +100 mV
to -50 mV with a rate of -75 mV/s) repeated three times to eject any clogs from the pore lumen,
and (3) a final step of 0 mV for 200 s to allow the electrolyte solution to equilibrate between
the cis and trans sides of the membrane. One voltage cycle was defined as steps (1)-(3). Nine voltage
cycles were implemented in succession. As a representative example, we show that a single pore
can survive for 6 full cycles, or ~40 min, although it experiences occasional clogs that clear
either spontaneously or when the voltage is interrupted or reversed (Fig. S18). During cycles 7-9, there
was no pore activity, indicating the end of the pore's lifetime or membrane rupture. This ON/OFF
voltage protocol can therefore maintain pore lifetimes of tens of minutes during sequencing experiments.
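
The voltage cycle described above can be written down as a list of linear segments, which also makes it easy to check the quoted durations. The Python sketch below is illustrative rather than the acquisition software itself: the function names are ours, and it assumes that the steps follow each other with no dead time and that the jumps between steps (for example from +100 mV to -50 mV) are instantaneous.

```python
RAMP_RATE_MV_PER_S = 75.0


def trapezoid_segments():
    """One trapezoid as (name, start_mV, end_mV, duration_s) segments."""
    ramp_s = (100.0 - (-50.0)) / RAMP_RATE_MV_PER_S  # 150 mV at 75 mV/s = 2 s
    return [
        ("ramp_up", -50.0, 100.0, ramp_s),
        ("hold", 100.0, 100.0, 3.0),
        ("ramp_down", 100.0, -50.0, ramp_s),
    ]


def voltage_cycle():
    """Steps (1)-(3): capture, three trapezoids, then rest at 0 mV."""
    segments = [("capture", 100.0, 100.0, 200.0)]
    for _ in range(3):
        segments.extend(trapezoid_segments())
    segments.append(("rest", 0.0, 0.0, 200.0))
    return segments


def duration_s(segments):
    return sum(seg[3] for seg in segments)


def voltage_at(segments, t):
    """Applied voltage (mV) at time t (s) from the start of the cycle."""
    elapsed = 0.0
    for _, v_start, v_end, dur in segments:
        if t < elapsed + dur:
            frac = (t - elapsed) / dur
            return v_start + frac * (v_end - v_start)
        elapsed += dur
    raise ValueError("t is beyond the end of the cycle")


cycle = voltage_cycle()
cycle_s = duration_s(cycle)
```

Under these assumptions one cycle lasts 421 s, so six cycles take about 42 min and nine cycles about 63 min, consistent with the ~40 min and ~1 hr figures in the text.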

Fig. S18. Pore stability during nanopore experiments. (a) During the first 200 s of data acquisition, tag capture
events are present throughout, indicating a functional pore complex. (b) After the first voltage cycle, the pore seems
to clog, which is indicated by a low ion current signal (~1 pA). (c) After the second voltage cycle, the pore clog was
eliminated by the voltage fluctuation, possibly ejecting a jammed tag because of the polarity change. (d) After the
third voltage cycle, the pore seems to be clogged again similarly to (b), but after about 125 s the jammed entity
leaves the pore, indicated by the return of the stable open channel current and the frequent capture events. (e) After
the fourth voltage cycle, the pore shows capture events for the first 80 s, but after that it is clogged again. (f) Finally,
after the fifth voltage cycle, the pore is functional again for the full 200 s, as indicated by multiple tag capture events.
Note that capture events are present even after 40 min, indicating that this ON/OFF voltage protocol can maintain
pore life during sequencing data acquisition.

8. Chip Reusability
We tested the reusability of our complementary metal-oxide-semiconductor (CMOS)-based electrode arrays
for nanopore measurements by regenerating the chip using an automated cleaning protocol.
Briefly, the chip was cleaned after a full experiment composed of (1) lipid bilayer formation, (2) pore complex
insertion, (3) tagged nucleotide addition and finally (4) data acquisition while applying 100 mV
constant voltage for 5 min. An automated syringe pump (Tecan, Männedorf, Switzerland) was
used to deliver reagents into the microfluidic chamber of the CMOS chip. First, 50 µL of
hexane (Sigma-Aldrich, St. Louis, MO) was pumped into the chamber to remove all lipid bilayer
residues from the electrode surface. This was followed by a 50 µL wash with pure ethanol
(Crystalgen, Commack, NY). Finally, the chip chamber was washed with 100 µL of buffer solution
(300 mM NaCl, 3 mM CaCl2 and 20 mM HEPES pH 7.5). This three-stage washing cycle was
repeated five times. All injection flow rates were 100 µL/s. Software control was implemented
in Python, which interfaced with the pump via the RS-232 communication protocol.
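
The wash schedule itself is simple enough to express as data plus a loop. The Python sketch below is a hedged illustration of how such a schedule might be organized, not the control software used in this work: `run_wash`, `total_volumes` and the `dispense` callback are hypothetical names, and the callback stands in for whatever serial commands the pump actually requires, which are not described here.

```python
# Reagent and volume (in microliters) for each stage of one washing cycle.
WASH_STEPS = [("hexane", 50.0), ("ethanol", 50.0), ("buffer", 100.0)]
N_CYCLES = 5
FLOW_RATE_UL_PER_S = 100.0


def run_wash(dispense, n_cycles=N_CYCLES):
    """Run the three-stage wash n_cycles times.

    dispense is a hypothetical callable (reagent, volume_ul, flow_ul_per_s)
    that would wrap the real pump communication.
    """
    log = []
    for cycle in range(1, n_cycles + 1):
        for reagent, volume_ul in WASH_STEPS:
            dispense(reagent, volume_ul, FLOW_RATE_UL_PER_S)
            log.append((cycle, reagent, volume_ul))
    return log


def total_volumes(log):
    """Total volume delivered per reagent, in microliters."""
    totals = {}
    for _, reagent, volume_ul in log:
        totals[reagent] = totals.get(reagent, 0.0) + volume_ul
    return totals


def dry_run(reagent, volume_ul, flow_ul_per_s):
    """Stand-in for the pump: only reports what would be delivered."""
    print(f"{reagent}: {volume_ul} uL at {flow_ul_per_s} uL/s")


log = run_wash(dry_run)
print(total_volumes(log))
```

Keeping the schedule separate from the pump interface means the same loop can be tested without hardware. With the values in the text, a full regeneration delivers 250 µL of hexane, 250 µL of ethanol and 500 µL of buffer.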

After these cleaning steps, the success of regeneration was evaluated by demonstrating that new
lipid membranes can be formed over the electrode arrays and by confirming viable
nanopore-polymerase construct insertions into this membrane. Next, by applying a trapezoidal waveform
(stage 1: linearly increasing voltage ramp from -50 mV to +100 mV with a rate of +75 mV/s,
stage 2: constant +100 mV DC for 3 s, stage 3: linearly decreasing voltage ramp from +100 mV
to -50 mV with a rate of -75 mV/s), we also confirmed successful pore complex insertion into the
membrane by observing stable open channel current (~30 pA) at the majority of the electrodes,
indicating single pore insertions. We performed three successive regeneration cycles for the
same CMOS chip, obtaining 105 single pores in the first cycle, 79 in the second and 21 in the
third cycle. Our results indicate that the number of single pores inserted in the membrane
decreases with each regeneration cycle, suggesting that improved regeneration methods may be
beneficial.
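
To put the decline in numbers, the pore counts can be expressed relative to the first cycle and relative to the preceding cycle. The short Python calculation below uses only the three counts reported above; the variable names are ours.

```python
# Single pores obtained in regeneration cycles 1, 2 and 3.
single_pores = [105, 79, 21]

# Yield of each cycle as a percentage of the first cycle.
relative_to_first = [round(100.0 * n / single_pores[0], 1) for n in single_pores]

# Yield of each later cycle as a percentage of the cycle before it.
relative_to_previous = [
    round(100.0 * later / earlier, 1)
    for earlier, later in zip(single_pores, single_pores[1:])
]

print(relative_to_first)
print(relative_to_previous)
```

The second cycle retains roughly three quarters of the initial single pore count, while the third retains about one fifth of it.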

