
2011 IEEE International Conference on Automation Science and Engineering


Trieste, Italy - August 24-27, 2011

FrBB.1

A System for Automatic HPV Typing via PCR-RFLP Gel Electrophoresis
Christos F. Maramis, Student Member, IEEE, Anastasios N. Delopoulos, Member, IEEE,
Alexandros F. Lambropoulos, and Sokratis P. Katafigiotis
Abstract: The identification of the types of the human papillomavirus (HPV) that have infected a female patient provides
valuable information regarding her risk of developing
cervical cancer. A widely used method for performing the above
task (namely HPV typing) is PCR-RFLP gel electrophoresis.
However, the conventional HPV typing protocol is error-prone
and resource-ineffective due to lack of interaction between the
phases involved in it. In order to treat these shortcomings, we
introduce a novel HPV typing system that can be built upon
widely available laboratory equipment. The proposed workflow
of the system automates the task of HPV typing via PCR-RFLP gel electrophoresis. The proof-of-concept of the proposed
methodology is evaluated via an experiment that emulates the
operation of the introduced system on a set of real HPV data.

I. INTRODUCTION
According to recent epidemiological studies, the human
papillomavirus (HPV) is considered to be the causal factor of
cervical cancer [1], [2]. For this reason, the detection of HPV
in the cervical cells of a female patient provides an indication
regarding her probability of developing the aforementioned
cancer type. The significance of HPV detection is underscored
by the high frequency of cervical cancer, which is one of the
leading cancers affecting women worldwide [3].
However, HPV does not appear in only one form: currently, over 40 HPV types (i.e., variants of the virus characterized by different genotypes) that infect the anogenital tract
have been discovered. Moreover, virologists have classified
these types into four discrete categories with respect to their
associated risk for the development of cervical cancer [4].
Due to the existing diversity among the type-specific risks,
the identification of the exact HPV type(s) that have infected
a female patient provides her medical practitioner with
valuable prognostic information.
The procedure of identifying the infecting HPV types
based on their genotypic differences is called HPV genotyping or, more simply, HPV typing, and it is currently
performed by a variety of molecular biology methods: reverse hybridization assays [5], DNA microarrays [6], and DNA
sequencing [7], to name a few. Among them, PCR-RFLP gel electrophoresis [8], [9] is the method of choice
for the majority of molecular laboratories worldwide, due
to its simplicity, cost-effectiveness, and moderate equipment
requirements. The essence of this method is the digestion of
the viral DNA into fragments of known lengths, which are
next used to identify the HPV types.

C. Maramis and A. Delopoulos are with the Information Processing Laboratory, Multimedia Understanding Group, Department of Electrical and
Computer Engineering, Aristotle University of Thessaloniki, Thessaloniki,
54124 GREECE (e-mail: chmaramis@mug.ee.auth.gr, adelo@eng.auth.gr).
A. Lambropoulos and S. Katafigiotis are with the Laboratory of Molecular
Biology, 1st Department of Obstetrics and Gynecology, General Regional
Hospital Papageorgiou, Aristotle University of Thessaloniki, Thessaloniki,
54124 GREECE (e-mail: lambrop@auth.gr, sokikat@otenet.gr).

978-1-4577-1732-1/11/$26.00 ©2011 IEEE

Although almost all HPV typing methods have been
automated with the help of purpose-built
devices (e.g., see [10], [11]), this has not been the case
for PCR-RFLP gel electrophoresis. Instead, HPV typing via
the latter method is still performed manually in two phases
with well-defined boundaries between them. The first phase
involves the in vitro processing of a cervical tissue sample
that has been collected from a subject, and it is followed by
an in silico phase, where the outcome of the first phase is
analyzed with the help of appropriate software in order to
reach a typing decision. However, as we will demonstrate
here, an efficient HPV typing system can be built should
interaction be introduced between the two phases. Moreover,
the proposed HPV typing system is cost-effective, since it
employs general-purpose equipment that is already available
in the majority of molecular biology laboratories worldwide.
This contrasts with the already automated HPV typing methods, which require specialized hardware and/or consumables
(e.g., DNA microarrays and microarray scanners).
The rest of the paper is structured as follows: In the next
section we explore the state-of-the-art in the field of HPV
typing via PCR-RFLP gel electrophoresis. Then, Section III
introduces the proposed HPV typing system, presenting the
system components and the associated HPV typing workflow.
In Section IV two algorithms that are required at specific
steps of the proposed methodology are presented in more
detail. The proof-of-concept of our approach is evaluated in
Section V. Finally, the conclusions of this work are drawn
in Section VI.
II. BACKGROUND
In this section, we start by describing the conventional
protocol for HPV typing via PCR-RFLP gel electrophoresis,
which we call the single image protocol. Then, we present a
recent related work, which constitutes an important component of the proposed system. Finally, we list the shortcomings
of the single image approach that remain unresolved after the
adoption of the aforementioned related work, and are tackled
by the proposed system.
A. The Single Image HPV Typing Protocol
The in vitro phase starts with the collection of a cervical
tissue sample and the extraction of the contained DNA.

Next, the polymerase chain reaction (PCR) [12, Ch. 24]
amplifies a highly conserved region of the HPV L1 gene sequence. After that, a predefined restriction enzyme digests
the amplified viral DNA at sites that are characterized by a
specific nucleotide sequence. This is the restriction fragment
length polymorphism (RFLP) analysis [12, Ch. 50], the
cornerstone of the discussed method; it produces for each
HPV genotype a set of DNA fragments whose lengths in base
pairs (bp) are known a priori. The fragment length pattern
(FLP) that results from the digestion of each virus genotype
serves as its signature throughout the typing process.
The next step is the gel electrophoresis [12, Ch. 5]. First,
the digested PCR product is stained with a fluorescent dye
(ethidium bromide) and an appropriate solution of the stained
DNA is injected into an individual well at the front end of
a gel matrix. Then, in the presence of an electric field, the
negatively-charged DNA fragments are forced to move with
different mobilities (i.e., drift velocities) against the electric
field and toward the anode. During the electrophoresis, the
larger molecules remain closer to the well, while the more
agile smaller molecules cover a much larger distance. This
way, one lane starting from each well is formed; each lane
contains concentrations of DNA of the same length shaped
as bands in the direction perpendicular to the electric field.
On each gel, one or more wells are reserved to include DNA
ladders, i.e., DNA of known lengths.
After the completion of electrophoresis, the gel matrix is
excited by UV light, causing the ethidium bromide molecules
to fluoresce. This way, the viral DNA on the gel becomes
visible and can be captured by a common digital camera.
The acquisition of a digitized image of the gel matrix (see
Fig. 1 for an example) completes the in vitro phase.
In the in silico phase, an expert biologist analyzes the
single acquired image with the help of appropriate
software. The essence of this analysis is the fact that the
intensity of the image at some position can be related to the
viral DNA concentration (viral load) at the corresponding
position on the gel matrix. First, the fragment lengths that
correspond to the observed bands on a lane of interest are
estimated by a software application. This is achieved with
the help of a virtual marker that associates positions along
the electrophoresis axis with fragment lengths; this marker is
constructed via interpolation from the bands of the image's
ladder(s). Finally, the biologist manually compares the set
of estimated fragment lengths in the investigated lane with
the FLPs of all HPV types in order to judge which type or
combination of types has produced the observed pattern of
fragment lengths (HPV typing decision).
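
The virtual marker described above can be illustrated with a minimal sketch. It assumes piecewise-linear interpolation of the logarithm of the fragment length against position, which is only one plausible choice; the text does not specify the interpolation scheme. The function name `build_virtual_marker` and the pixel positions are hypothetical, while the ladder lengths (200, 160, 120, 80 bp) are the ones labeled in Fig. 1.

```python
import math


def build_virtual_marker(ladder_positions, ladder_lengths_bp):
    """Return a function mapping a position on the electrophoresis axis to a length in bp.

    Assumption: log(length) is interpolated linearly between consecutive ladder bands.
    """
    pairs = sorted(zip(ladder_positions, ladder_lengths_bp))
    xs = [position for position, _ in pairs]
    logs = [math.log(length) for _, length in pairs]

    def length_at(x):
        # Clamp to the ladder range instead of extrapolating beyond it.
        if x <= xs[0]:
            return math.exp(logs[0])
        if x >= xs[-1]:
            return math.exp(logs[-1])
        for j in range(1, len(xs)):
            if x <= xs[j]:
                t = (x - xs[j - 1]) / (xs[j] - xs[j - 1])
                return math.exp(logs[j - 1] + t * (logs[j] - logs[j - 1]))

    return length_at


# Hypothetical ladder band positions (pixels); larger fragments stay closer to the well.
marker = build_virtual_marker([100, 160, 230, 320], [200, 160, 120, 80])
estimated_length = marker(130)  # a band halfway between the 200 bp and 160 bp ladder bands
```

Interpolating in the logarithmic domain is a design choice made here because fragment mobility is commonly treated as roughly linear in the logarithm of the length; a different interpolant could be substituted without changing the interface.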
B. Recent Progress
Although there are many software applications that automate the first step of the in silico phase, i.e., the estimation
of the fragment lengths that correspond to the observed
lane bands [13]-[17], corresponding efforts for the actual HPV
typing decision process (i.e., the phase's last step) have been
missing. Recently, a methodology for automating the above
process was introduced in [18]; this methodology has further
evolved and has been extensively evaluated in [19]. Since
this constitutes an important stepping stone for the system
proposed herein, the rest of this section is occupied by a
brief description of the aforementioned methodology.

Fig. 1. Typical image of a gel matrix after one-dimensional electrophoresis.
Four lanes that correspond to cervical samples and one DNA ladder are
depicted. Samples of lanes, bands and ladder are enclosed in rectangles.
[Figure labels: LANE, LADDER, BAND; ladder marks at 200 bp, 160 bp, 120 bp, and 80 bp.]

First, the background intensity of the gel electrophoresis
image is estimated and the result is subtracted from the
observed image intensity. Then, a one-dimensional curve
that aggregates the intensity information of an investigated
lane across the direction perpendicular to the electrophoresis axis is
extracted. This is the intensity profile of the lane and consists
of several (let us assume K) bell-shaped bands.
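The methodology of [18] introduces an appropriate observation model, $m(\cdot)$, to describe the above intensity profile as
the superposition of K appropriately-shaped peak functions:

\[ m(x; \mathbf{A}, \boldsymbol{\alpha}, \boldsymbol{\beta}, \mathbf{x}) = \sum_{i=1}^{K} A_i \exp\left( -\frac{1}{\beta_i} \left| \frac{x - x_i}{\alpha_i} \right|^{\beta_i} \right), \tag{1} \]

where $x$ denotes the position in the electrophoresis axis, $\mathbf{A} = [A_1, A_2, \ldots, A_K]$ and $\boldsymbol{\alpha}$, $\boldsymbol{\beta}$, $\mathbf{x}$ are defined accordingly. The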
optimization procedure for fitting the observation model to
the intensity profile is described in [18], [19]. A method for
selecting the value of K is described also in [18]; however,
here we will employ an alternative method for dealing with
this issue (see Section IV-B).
An appropriate function to model the relation between the
lengths (l) of the DNA fragments and their positions along
the electrophoresis axis (x as above) is also introduced:
\[ x = d(l; \boldsymbol{\theta}) = \theta_1 + \theta_2 \log_2\left( \theta_3 + \theta_4 l + l^2 \right). \tag{2} \]

The length-position relation is calibrated by exploiting the
information that is provided by the bands in the DNA
ladder(s) of the examined image; the optimization procedure
required for the calibration is detailed in [18], [19].
The suitability of the formulas in (1) and (2) for modeling
the intensity profile and the length-position relation respectively has been evaluated extensively by the experiments that
have been conducted on real HPV data in [19]. With the
help of these models, the methodology of [18] is able to
estimate the fragment lengths ($l_i$) and concentrations ($c_i$) that
correspond to the bands of the investigated lane as follows:

\[ l_i = d^{(-1)}(x_i; \boldsymbol{\theta}), \tag{3} \]

\[ c_i = \frac{2 \alpha_i \beta_i^{1/\beta_i} \Gamma(1/\beta_i)}{l_i \beta_i} A_i, \tag{4} \]

for each $i = 1, \ldots, K$, where $\Gamma(\cdot)$ is the complete gamma
function. This concludes the first step of the methodology,
namely fragment information extraction.
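
As a sanity check on (4), the following sketch computes the area under a single peak function of (1) in closed form and compares it with a numerical integral; dividing that area by the fragment length gives the concentration estimate. The function names are hypothetical, and the symbol names alpha (scale) and beta (shape) follow the notation used for (1) and (4) here, which is an assumption about the original notation.

```python
import math


def band_area(A, alpha, beta):
    """Closed-form area under A * exp(-(1/beta) * |(x - x0)/alpha|**beta) over the real line."""
    return A * 2.0 * alpha * beta ** (1.0 / beta) * math.gamma(1.0 / beta) / beta


def band_concentration(A, alpha, beta, length_bp):
    """Eq. (4): band area divided by the fragment length."""
    return band_area(A, alpha, beta) / length_bp


def numeric_area(A, alpha, beta, half_width=60.0, step=0.01):
    """Midpoint-rule integral of one peak centred at 0, used only as a cross-check."""
    n = round(2.0 * half_width / step)
    total = 0.0
    for j in range(n):
        x = -half_width + (j + 0.5) * step
        total += A * math.exp(-abs(x / alpha) ** beta / beta)
    return total * step


# For beta = 2 the peak is a Gaussian with standard deviation alpha,
# so the area reduces to A * alpha * sqrt(2 * pi).
gaussian_area = band_area(1.0, 5.0, 2.0)
```

The cross-check is useful because the factor multiplying $A_i$ in (4) is exactly this area divided by $l_i$, so a mismatch between the two functions would indicate a transcription error in the formula.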
The aim of the second step, namely the virus typing algorithm,
is to decide which combination(s) of HPV types provide
the best quantitative explanation for the extracted fragment
information. First, the compatibility of each HPV type with
the estimated fragment lengths $\mathbf{l} = [l_1, \ldots, l_K]$ is checked
individually, so as to eliminate the incompatible types. Then,
each combination of the remaining HPV types is tested for
its ability to generate the estimated fragment concentrations
$\mathbf{c} = [c_1, \ldots, c_K]$ via an optimization procedure (see [18]).
The final typing decision is based on both the optimization
results and the prior probabilities of the HPV type combinations. An alternative approach to testing the combinations
of compatible HPV types is presented in [19].
C. Current Shortcomings
The HPV typing methodology of [18] manages to tackle
several problems that are associated with PCR-RFLP gel
electrophoresis. However, even with its help, the conventional single image protocol, which was described in Section II-A, suffers from certain shortcomings. The main
cause of these shortcomings is the lack of interaction and
synchronization between the in vitro and the in silico phase:
the sole input to the second phase is a single image that has
already been taken in the first phase. As we will demonstrate
in the subsequent sections, the proposed automatic HPV
typing methodology treats these shortcomings by shifting
away from the single image approach.
The main shortcomings to be tackled are the following:
a) Insufficient background intensity information: The
efficient removal of the background intensity from the
examined image is critical if the fragment concentration
information is to be employed [18]-[20]. However, the information that is provided by the single image approach to the
second typing phase does not suffice for accurate background
subtraction, since the observed intensity of the acquired post-electrophoresis image is the product of interference between
the background intensity and the intensity that is produced
by the viral DNA.
b) Inaccurate fragment concentration estimation: The
accurate estimation of fragment concentration is very important for the methodology of [18], and it has been the reason
for introducing the observation model of (1). However, the
desired accuracy can be achieved only if the bands of a lane
lie adequately far from each other. In the opposite case, i.e.,
when there is extensive overlapping between two bands, the
situations that are depicted in Fig. 2 can occur.
Fig. 2. Two distinct cases of extensive band overlapping. For each case,
the underlying overlapping bands are depicted with non-continuous lines,
while the resulting intensity profile is depicted with a continuous line.

If the observed intensity profile on the left part of Fig. 2
is denoted by $i(x)$, then a parameter vector $[A_0, \alpha_0, \beta_0, x_0]$
can be found such that

\[ i(x) \approx A_0 \exp\left( -\frac{1}{\beta_0} \left| \frac{x - x_0}{\alpha_0} \right|^{\beta_0} \right), \]

for each $x$ along the electrophoresis axis. This means that
the superposition of the two bands will be confused by the
observation model with a single band.
Another type of problem is caused by the band overlapping
on the right part of Fig. 2. This time, the observed intensity
profile i(x) is not confused with a single band, but there
exists more than one combination of parameter vectors
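$[A_1, \alpha_1, \beta_1, x_1]$ and $[A_2, \alpha_2, \beta_2, x_2]$ such that

\[ i(x) \approx A_1 \exp\left( -\frac{1}{\beta_1} \left| \frac{x - x_1}{\alpha_1} \right|^{\beta_1} \right) + A_2 \exp\left( -\frac{1}{\beta_2} \left| \frac{x - x_2}{\alpha_2} \right|^{\beta_2} \right), \]

for each $x$ along the electrophoresis axis. In other words,
the observation model cannot decide how to distribute the
observed concentration to the underlying bands.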
c) Resource-ineffectiveness: The duration of the gel
electrophoresis has to be decided carefully. Short electrophoresis duration will probably lead to intense band
overlapping (see Section II-A), complicating and sometimes
falsifying the typing decision. On the other hand, unnecessarily long electrophoresis durations contribute to the ineffective
use of the laboratory resources and decrease the throughput
of the HPV typing method. However, in order to circumvent possible band overlapping problems, the single image
protocol usually exaggerates the electrophoresis duration.
III. THE PROPOSED HPV TYPING SYSTEM
The HPV typing system that we introduce in this work
is able to automatically make HPV typing decisions while
the gel electrophoresis is still in progress. This is achieved
owing to (i) the introduction of interaction between the in
vitro and the in silico typing phase and (ii) the adoption of
the methodology presented in [18].
A. System Components
Fig. 3. The components of the proposed HPV typing system and the flow
of information between them. [Diagram: the Computer System exchanges image acquisition requests/data with the camera of the Image Acquisition System (camera, fluorescence chamber, gel, UV light source) and sends start/stop electrophoresis commands to the Electrophoresis Device.]

The proposed system can be built from common, general-purpose devices that belong to the standard equipment of
most molecular biology laboratories worldwide, and it requires only minor adjustments/modifications to integrate the
employed devices into a single system. As can be observed
in Fig. 3, the proposed system consists of three components:
Electrophoresis Device. This is the device where the
electrophoresis of the gel matrix takes place (see
Section II-A). It is configurable with respect to
the main electrophoresis parameters and provides
the computer system with an interface to initiate/terminate the electrophoresis.
Image Acquisition System. This component includes a
fluorescence chamber, a UV light source set on
the chamber ground, and a digital camera attached
to the chamber ceiling. The electrophoresis device
(including the gel matrix) is placed inside the
fluorescence chamber, which ensures controlled illumination conditions, and the UV light excites the
fluorescent dye contained in the gel to make the viral DNA visible. The digital camera is configurable
with respect to imaging parameters (e.g., exposure
time, focus, aperture); it acquires an image of the
visible gel upon request by the computer system
and sends the image back to the computer system.
Computer System. The computer system bears the software that (i) orchestrates the entire HPV typing
process (e.g., by requesting gel images from the
digital camera and by retrieving them when they are
acquired), and (ii) makes the HPV typing decisions.
The integration of the employed components is facilitated
significantly by the fact that most modern electrophoresis
devices are by design UV transparent. Hence, all that is
required is to fix the electrophoresis device on top of the UV
light source inside the fluorescence chamber, and to ensure
a power supply (external or internal) for the electrophoresis.
B. The Proposed HPV Typing Procedure
The proposed HPV typing procedure is outlined by the
flowchart of Fig. 4. It is assumed that the following parameters regarding the examined gel matrix are known: (i) the
number of HPV-related lanes (N ), and (ii) the number (M )
and relative positions of the DNA ladders. Moreover, a time
step t0 has been defined.
Fig. 4. The flowchart of the proposed HPV typing procedure. [Flowchart: step 1: capture gel image; step 2: locate lanes; step 3: start electrophoresis; step 4.1: capture gel image at time t = k t0; step 4.2: subtract background; then, for each lane n = 1, ..., N: step 4.3.n: extract intensity profile; step 4.4.n: not typed & typing suitable? (if no, the lane is skipped in this iteration); step 4.5.n: estimate fragment lengths & concentrations; step 4.6.n: apply typing algorithm; all lanes typed? (if no, return to step 4.1); step 5: stop electrophoresis.]

Once the viral DNA has been prepared according to the
description of Section II-A and injected into the wells of the
gel matrix, the gel is placed on the electrophoresis device
inside the fluorescence chamber and an initial image of it
($I_0$) is acquired (step 1). The use of this image is twofold.
First, it is used to predict the future boundaries of the lanes
and ladders (step 2), as will be described in Section IV-A. Second, since the viral DNA is restricted to only a very
small portion of the image (i.e., the well areas), it provides
an accurate model for the background intensity component
of the subsequent images; this is true owing to the controlled
illumination conditions in the chamber.
After the above preliminary processing actions, the electrophoresis is initiated (step 3). Then, an iterative procedure
commences (step 4). Briefly, this is described as follows:
Until a typing decision has been reached for all the HPV-related lanes, capture instances of the gel matrix at each time
$t = k t_0$, with the same image acquisition conditions as in $I_0$,
and attempt to type each lane individually. This is the main
loop of the proposed procedure and, as is clear from its
description, it involves parallelism with respect to the typing
of the lanes. We elaborate on this iterative procedure in the
following paragraphs by focusing on the nth lane.
When the gel image $I_k$ at time $k t_0$ has been acquired
(step 4.1), its background intensity component is removed by
subtracting $I_0$ (step 4.2). Hence, the background-corrected
image $\bar{I}_k$ is given by $\bar{I}_k = I_k - I_0$, where we assume pixelwise intensity subtraction. Then, the intensity profile $i_n(\cdot)$ of
the nth lane is extracted (step 4.3.n) as follows:

\[ i_n(x) = \underset{y}{\mathrm{median}} \left[ \bar{I}_k(x, y) \mid y_n^l \le y \le y_n^r \right], \quad x = 1, \ldots, S_X, \tag{5} \]

where (i) $S_X$ is the size of the image in the electrophoresis
axis X, and (ii) $y_n^l$, $y_n^r$ are the left and right boundaries of
the lane along the axis Y, which is perpendicular to the electrophoresis axis (see
Section IV-A).
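
Steps 4.2 and 4.3.n can be sketched as follows. This is an illustration under stated assumptions rather than the prototype's code: images are plain nested lists indexed as `image[x][y]` with zero-based indices, and the function names are hypothetical.

```python
from statistics import median


def subtract_background(image_k, image_0):
    """Step 4.2: pixelwise subtraction of the initial image I_0 from the current image I_k."""
    return [[pk - p0 for pk, p0 in zip(row_k, row_0)]
            for row_k, row_0 in zip(image_k, image_0)]


def lane_profile(corrected, y_left, y_right):
    """Step 4.3.n, Eq. (5): for every position x, the median over y_left <= y <= y_right."""
    return [median(row[y_left:y_right + 1]) for row in corrected]


# Tiny made-up example: 3 positions along X, 4 pixels along Y, one lane spanning y = 1..3.
i0 = [[10, 10, 10, 10], [10, 12, 10, 10], [10, 10, 10, 10]]
ik = [[10, 11, 10, 10], [10, 50, 52, 51], [10, 30, 31, 29]]
profile = lane_profile(subtract_background(ik, i0), 1, 3)
```

The median across the lane width makes the profile less sensitive to isolated bright or dark pixels than a mean would be.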
The objective of the next step (step 4.4.n) is to decide
whether the extracted profile $i_n(x)$ is suitable to be analyzed
according to the methodology of [18] in order to make
a typing decision for the lane. For this purpose, a novel
algorithm is employed to decide whether the profile provides
sufficient information to apply the aforementioned typing
methodology. If $i_n(x)$ is found to be suitable for typing, HPV
typing is attempted according to [18] using exclusively the
information contained in the intensity profile of the current
image. In the opposite case or if the lane has already been
typed, $i_n(x)$ is discarded. The typing suitability algorithm is
introduced in Section IV-B.
The estimation of the lane's fragment lengths and concentrations (step 4.5.n) coincides with the fragment information
extraction procedure of [18]. This procedure has been briefly
described in Section II-B. Assuming a lane that includes K
bands, the introduced models of (1) and (2) are employed
to estimate via (3) and (4) the fragment lengths $\mathbf{l}^n = [l_1^n, l_2^n, \ldots, l_K^n]$ and concentrations $\mathbf{c}^n = [c_1^n, c_2^n, \ldots, c_K^n]$ that
correspond to these bands.
The estimated fragment properties are propagated to the
next step (step 4.6.n). This step makes the HPV typing
decision for the lane and amounts to applying the virus
typing algorithm of [18]. A brief description of this procedure has also been given in Section II-B. The outcome of
this step is a set of HPV type combinations that provide
the best explanation for the estimated fragment lengths and
concentrations, when the probability of each type has
also been taken into account.
The execution of steps 4.x.n for all the N HPV-related
lanes of the gel matrix completes one iteration of step 4.
This procedure is repeated until all lanes are typed. Then,
the electrophoresis is terminated (step 5).
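
The control flow of steps 1 to 5 can be summarized by the sketch below. Everything about the interfaces is an assumption made for illustration: `Camera`, `ElectrophoresisDevice`, and the callables passed to `run_typing` are hypothetical stand-ins for the components of Fig. 3, and `max_iterations` is a safety bound that the procedure above does not mention.

```python
import time
from typing import List, Protocol

Image = List[List[float]]  # image[x][y], x along the electrophoresis axis


class Camera(Protocol):  # hypothetical interface
    def capture(self) -> Image: ...


class ElectrophoresisDevice(Protocol):  # hypothetical interface
    def start(self) -> None: ...
    def stop(self) -> None: ...


def run_typing(camera, device, locate_lanes, extract_profile, is_suitable,
               type_lane, t0_seconds, max_iterations, sleep=time.sleep):
    i0 = camera.capture()                      # step 1: initial image I_0
    lanes = locate_lanes(i0)                   # step 2: predicted lane boundaries
    results = [None] * len(lanes)
    device.start()                             # step 3
    try:
        for _ in range(max_iterations):        # step 4 (bounded here for safety)
            sleep(t0_seconds)
            ik = camera.capture()              # step 4.1
            corrected = [[a - b for a, b in zip(rk, r0)]
                         for rk, r0 in zip(ik, i0)]          # step 4.2
            for n, bounds in enumerate(lanes):
                if results[n] is not None:     # lane already typed
                    continue
                profile = extract_profile(corrected, bounds)  # step 4.3.n
                if is_suitable(n, profile):                   # step 4.4.n
                    results[n] = type_lane(profile)           # steps 4.5.n-4.6.n
            if all(r is not None for r in results):
                break
    finally:
        device.stop()                          # step 5
    return results


class FakeCamera:
    """Stub returning 1x1 images whose single pixel counts the captures."""
    def __init__(self):
        self.count = 0

    def capture(self):
        self.count += 1
        return [[float(self.count)]]


class FakeDevice:
    """Stub that records the commands it receives."""
    def __init__(self):
        self.log = []

    def start(self):
        self.log.append("start")

    def stop(self):
        self.log.append("stop")


demo_device = FakeDevice()
demo_results = run_typing(
    FakeCamera(), demo_device,
    locate_lanes=lambda image: [(0, 0), (0, 0)],
    extract_profile=lambda image, bounds: [row[0] for row in image],
    is_suitable=lambda n, profile: profile[0] >= (1.0 if n == 0 else 2.0),
    type_lane=lambda profile: "typed@%d" % int(profile[0]),
    t0_seconds=0, max_iterations=5, sleep=lambda seconds: None)
```

In the stubbed run, the first lane becomes suitable one iteration before the second, which mirrors the point that lanes are typed independently and the electrophoresis stops only once all of them have a decision.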

Fig. 5. Illustration of the application of the lane detection algorithm. On the
left, a sample initial image $I_0(\cdot)$ is depicted. On the right, the binary image
that results from thresholding at $3 \cdot \mathrm{median}_{x,y}[I_0(x, y)]$ is shown. The
predicted lane boundaries have been superimposed on the right image.

IV. DETAILED ALGORITHM DESCRIPTIONS


A. Lane detection algorithm
Step 2 detects the lane boundaries, or more accurately
predicts the future boundaries of the lanes. This task is
performed on the basis of the initial gel image I0 and, owing
to the predetermined lane shape, reduces to the estimation of
a pair of positions on axis Y (perpendicular axis) that bound
each lane (left and right boundary).
Lane detection is greatly facilitated by the predictable
appearance of I0 : The image is formed by bright blobs
of DNA that are concentrated in the well areas, while the
background, although variable, is characterized by much
lower intensities (see the left image in Fig. 5). For this reason,
the pixels that correspond to DNA are easily identified by
thresholding the image at a certain multiple of its median
intensity (i.e., at $th_1 \cdot \mathrm{median}_{x,y}[I_0(x, y)]$).
Then, the connected white regions in the resulting binary
image (right image in Fig. 5) are detected. Since, as
discussed in Section III-B, the number of lanes (N) and
ladders (M) on the gel are known, we keep the M + N
most extended white regions, thereby eliminating false
noise-related regions. The leftmost and rightmost pixels of
the remaining regions are found and their Y-coordinates
constitute the algorithm's estimate of the lane boundaries.
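
A minimal sketch of the lane detection steps follows. It makes two assumptions that the description leaves open: regions are 4-connected, and "most extended" is interpreted as the largest pixel count. The function name `detect_lanes` and the `image0[x][y]` layout are hypothetical.

```python
from collections import deque
from statistics import median


def detect_lanes(image0, n_regions, th1=5.0):
    """Return (y_left, y_right) for the n_regions largest bright regions, sorted by y_left."""
    n_x, n_y = len(image0), len(image0[0])
    threshold = th1 * median(p for row in image0 for p in row)
    seen = [[False] * n_y for _ in range(n_x)]
    regions = []  # (pixel_count, y_min, y_max) per connected region
    for x in range(n_x):
        for y in range(n_y):
            if seen[x][y] or image0[x][y] <= threshold:
                continue
            # Breadth-first flood fill over 4-connected bright pixels.
            queue = deque([(x, y)])
            seen[x][y] = True
            count, y_min, y_max = 0, y, y
            while queue:
                cx, cy = queue.popleft()
                count += 1
                y_min, y_max = min(y_min, cy), max(y_max, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < n_x and 0 <= ny < n_y and not seen[nx][ny]
                            and image0[nx][ny] > threshold):
                        seen[nx][ny] = True
                        queue.append((nx, ny))
            regions.append((count, y_min, y_max))
    largest = sorted(regions, reverse=True)[:n_regions]
    return sorted((y_min, y_max) for _, y_min, y_max in largest)


# Made-up initial image: two bright wells in the first row and one isolated noise pixel.
demo_image = [[1.0] * 10 for _ in range(4)]
for y in (1, 2, 6, 7, 8):
    demo_image[0][y] = 100.0
demo_image[3][4] = 50.0
lane_bounds = detect_lanes(demo_image, n_regions=2)
```

Keeping only the M + N largest regions is what discards the isolated noise pixel in the example, even though it exceeds the threshold.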
B. Typing suitability algorithm
The main task of the typing suitability algorithm (step
4.4.n) is to decide whether the concentration information of
the lane bands can be inferred accurately enough from a
given intensity profile. Since the accurate estimation of the
band concentrations is critical for the employed HPV typing
methodology of [18], the algorithm's decision will determine
whether the proposed system will proceed with HPV typing
(steps 4.5.n-4.6.n) based on the current profile or it will
wait some time for a more suitable profile to be extracted. In
the case that the current profile is considered to be suitable
for typing, the algorithm also estimates the number K and
approximate positions xi of the lane bands, which serve as
input to the employed typing methodology.
If we assume that the background intensity of the profile
has been removed efficiently in step 4.2, then the typing
suitability decision can be extracted by examining the local
extrema of the intensity profile and its first derivative. Let us
now take a look at the upper part of Fig. 6, which illustrates
the evolution of the intensity profile $i(x)$ resulting from two
overlapping bands, as electrophoresis proceeds. Initially the
overlapping bands appear perfectly as one; then, they start
to separate but they still produce a single peak; finally, they
are separated as much as needed to form two peaks in the
profile. The first derivative curves, $i'(x)$, of the investigated
intensity profiles are depicted in the lower part of Fig. 6.

Fig. 6. Evolution of the overlapping between two bands during electrophoresis. In the upper part, the intensity profiles (continuous lines)
resulting from the underlying overlapping bands (dashed lines) are depicted
at three time instances. Below the profiles, the corresponding first derivative
curves are drawn. The local maxima of the intensity profiles and the local
extrema of the first derivatives are noted (stars).

Among the described band overlapping instances, only
the third type is exploitable, i.e., suitable for typing. As
discussed in Section II-C, the other two types hinder the
accurate estimation of the underlying concentrations, and,
for this reason, they should be detected by the algorithm.
Regarding the detection of the second type of overlapping,
we can exploit the following observation: Although in the
other two overlapping types each pair of consecutive local
extrema (maximum and minimum) of the first derivative
surrounds a local maximum of the profile, an orphan pair
of extrema is observed in the first derivative of the second
overlapping type. If a one-to-one correspondence
between the pairs of local extrema of the first derivative and the
local maxima of the intensity profile can be established, then
the potential typing suitability criterion is satisfied and the
second type of overlapping is ruled out.
However, both the first and the third type of overlapping
satisfy the above criterion. In order to circumvent this
situation, we do not proceed immediately to HPV typing
once we diagnose satisfaction of the criterion. Instead, we
give any existing bands of the first type the required time to
transition to the second type. If no such transitions are observed
for a specific amount of time, i.e., if a specific number
of consecutive criterion satisfactions is diagnosed, then we
proceed to steps 4.5.n-4.6.n. When this is the case, the
local maxima of the last intensity profile are also employed
for estimating the number and approximate positions of the
lane bands.
The potential typing suitability criterion is expressed formally by Algorithm 1. The investigated intensity profile is
denoted by $i(x)$ for $x = 1, \ldots, S_X$. The operators $D_x[\cdot]$ and
$\mathrm{median}_x[\cdot]$ calculate the first derivative and the median of a
digital signal respectively. The boolean function $\mathrm{lmax}(s(\cdot), i)$
($\mathrm{lmin}(s(\cdot), i)$) is true if the ith sample of signal $s(\cdot)$ is a
local maximum (minimum). Function append(L, j) appends
j to the end of list L, whose cardinality is denoted by $\|L\|$.
The lists $L_1$, $L_2$, and $L_3$ hold at algorithm termination the
local maxima of the intensity profile, the local maxima of
the first derivative and the local minima of the first derivative
respectively. The symbol $\neg$ denotes the negation of a logical
proposition. Finally, the thresholding operation $i(x) > th_2 \cdot m$
is employed to avoid the consideration of small noise-related
local maxima of the intensity profile.


Algorithm 1 Evaluate potential typing suitability criterion
  $L_1, L_2, L_3 = \emptyset$
  $i'(x) = D_x[i(x)]$
  $m = \mathrm{median}_x[i(x)]$
  for $x = 1$ to $S_X$ do
    if $i(x) > th_2 \cdot m$ then
      if $\mathrm{lmax}(i(\cdot), x)$ then
        append($L_1$, $x$)
      end if
      if $\mathrm{lmax}(i'(\cdot), x)$ then
        append($L_2$, $x$)
      end if
      if $\mathrm{lmin}(i'(\cdot), x)$ then
        append($L_3$, $x$)
      end if
    end if
  end for
  if $\neg(\|L_1\| = \|L_2\| = \|L_3\|)$ then
    return false
  end if
  for $i = 1$ to $\|L_1\|$ do
    if $\neg(L_2(i) \le L_1(i) \le L_3(i))$ then
      return false
    end if
  end for
  return true
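
The criterion of Algorithm 1 can be sketched in Python as shown below. Several details are assumptions, since the algorithm does not fix them: the first derivative is approximated with central differences, local extrema are required to be strict, and indices are zero-based. The helper `synthetic_profile` builds made-up Gaussian bands on a constant baseline purely for illustration.

```python
import math
from statistics import median


def first_derivative(signal):
    """Central differences in the interior, one-sided differences at the two ends."""
    n = len(signal)
    d = [0.0] * n
    for x in range(1, n - 1):
        d[x] = (signal[x + 1] - signal[x - 1]) / 2.0
    d[0] = signal[1] - signal[0]
    d[-1] = signal[-1] - signal[-2]
    return d


def is_local_max(s, x):
    return 0 < x < len(s) - 1 and s[x - 1] < s[x] > s[x + 1]


def is_local_min(s, x):
    return 0 < x < len(s) - 1 and s[x - 1] > s[x] < s[x + 1]


def potential_typing_suitability(profile, th2=5.0):
    """Return True if every profile peak is bracketed by one derivative max/min pair."""
    deriv = first_derivative(profile)
    m = median(profile)
    peaks, d_maxima, d_minima = [], [], []          # the lists L1, L2, L3
    for x in range(len(profile)):
        if profile[x] > th2 * m:                    # ignore small noise-related extrema
            if is_local_max(profile, x):
                peaks.append(x)
            if is_local_max(deriv, x):
                d_maxima.append(x)
            if is_local_min(deriv, x):
                d_minima.append(x)
    if not (len(peaks) == len(d_maxima) == len(d_minima)):
        return False
    return all(rise <= peak <= fall
               for peak, rise, fall in zip(peaks, d_maxima, d_minima))


def synthetic_profile(bands, size=200, baseline=1.0, sigma=5.0):
    """Made-up profile: Gaussian bands given as (amplitude, position) on a flat baseline."""
    return [baseline + sum(a * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))
                           for a, mu in bands)
            for x in range(size)]


separated = potential_typing_suitability(synthetic_profile([(100.0, 60), (100.0, 140)]))
shoulder = potential_typing_suitability(synthetic_profile([(100.0, 90), (40.0, 102)]))
merged = potential_typing_suitability(synthetic_profile([(100.0, 100), (40.0, 100)]))
```

The three synthetic cases correspond to the three overlapping types of Fig. 6: the well-separated and the fully merged bands both satisfy the criterion, while the shoulder case produces an orphan pair of derivative extrema and fails it. This is why the criterion alone is called "potential" and is combined with the requirement of several consecutive satisfactions.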
V. PROOF OF CONCEPT
In order to prove the feasibility of our typing methodology,
we emulated the operation of the proposed system to perform
HPV typing on a set of cervical samples collected from 4
female patients. These had already been typed according
to the conventional approach by expert biologists in the
Molecular Biology Laboratory, Papageorgiou Hospital, Thessaloniki (Greece). The samples were processed as described
in Section II-A. Regarding the RFLP analysis, two digestion
configurations were employed: (i) digestion by the restriction
enzyme HpyCH4V [9], and (ii) concurrent triple digestion
by the enzymes PstI, HaeIII and RsaI [8]. The resulting 8
DNA samples (4 patients × 2 digestions) along with 2 DNA
ladders were injected into the wells of a gel matrix.
The gel matrix was placed at a specific position inside
the fluorescence chamber of the image acquisition system
(AlphaImager® HP was employed) and the initial image
was acquired by the integrated camera. The exposure time
parameter was adjusted so as to avoid saturation of the image
intensity. The acquired image was stored in a computer
system via the camera's FireWire interface. The employed
imaging parameters (aperture, zoom, focus, exposure time)
were recorded.
After that, the gel matrix was fixed on top of the electrophoresis device and was electrophoresed at 160 V. Every
10 min. the following procedure was performed: The electrophoresis was stopped and the gel matrix was placed at
the predefined position inside the fluorescence chamber to be
photographed according to the recorded imaging parameters.
The image was stored to the computer system and the gel
matrix was returned to the electrophoresis device to resume
the electrophoresis. This procedure was repeated 8 times,
resulting in an overall electrophoresis duration of 80 min.

Fig. 7. The gel matrix images that have been acquired for the system
emulation after 30, 60, and 80 min. of electrophoresis (top to bottom).
Image processing techniques have been applied to improve visualization.

TABLE I
GEL MATRIX SETUP AND ASSOCIATED HPV TYPING RESULTS.

Lane   Class      Expert Diagnosis          Typed   Diagnosis Rank
1st    P1/D1      HPV53                     yes     1/4
2nd    P1/D2      HPV53                     yes     1/1
3rd    P2/D1      HPV66a                    yes     1/1
4th    P2/D2      HPV66a                    yes     1/2
5th    Ladder 1   -                         -       -
6th    P3/D1      HPV16                     yes     1/1
7th    P3/D2      HPV16                     yes     1/2
8th    P4/D1      HPV6 & HPV53 & HPV59      yes     2/18
9th    P4/D2      HPV6 & HPV53 & HPV59      no      -
10th   Ladder 2   -                         -       -

The 9 resulting images (see Fig. 7) were employed to type
the DNA samples in an entirely automatic manner according
to the methodology described in Section III-B with a single
exception. Due to the inability of the system emulation
to ensure the desired accuracy in placing the gel matrix
inside the fluorescence chamber, an alternative method for
background subtraction was employed, namely the rolling
disk approach [21]. The HPV types that were taken into
account by the typing algorithm are those considered in [9].
The prior probabilities of these types were estimated with
the help of the type-specific HPV infection frequencies that
were retrieved from the cervical cancer repository of the
ASSIST project [22]. These probabilities were employed
for ranking the solutions of the HPV typing algorithm as
described in [18].
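As an illustration of the step above, the following Python sketch turns type-specific infection counts into prior probabilities and uses them to order candidate typing solutions. It is only a sketch under stated assumptions: the counts, the candidate solutions, the function names, and the scoring rule (the product of the priors of the types in a solution, i.e., an independence assumption) are hypothetical. The actual ranking procedure is the one described in [18] and is not reproduced here.

```python
from math import prod


def estimate_priors(infection_counts):
    """Normalize type-specific infection counts into prior probabilities."""
    total = sum(infection_counts.values())
    if total <= 0:
        raise ValueError("infection counts must sum to a positive value")
    return {hpv_type: count / total
            for hpv_type, count in infection_counts.items()}


def rank_solutions(solutions, priors):
    """Sort candidate solutions (tuples of HPV types) by decreasing score.

    ASSUMPTION: the score of a solution is the product of the priors of
    its types; this is an illustrative rule, not the one defined in [18].
    """
    return sorted(solutions,
                  key=lambda types: prod(priors[t] for t in types),
                  reverse=True)


# Hypothetical counts, NOT taken from the ASSIST repository.
counts = {"HPV16": 50, "HPV53": 30, "HPV6": 15, "HPV59": 5}
priors = estimate_priors(counts)

# Hypothetical candidate solutions for one sample.
candidates = [("HPV59",), ("HPV16",), ("HPV6", "HPV53")]
ranked = rank_solutions(candidates, priors)
```

With such a ranking, the "Diagnosis Rank" entry z/w of Table I would correspond to the position z of the expert diagnosis in a ranked list of w solutions.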
A prototype implementation of the proposed methodology
was developed in MATLAB®. The value 5 was employed
for thresholds th1 and th2 (see Sections IV-A and IV-B),
while a disk radius equal to 3% of the lane's height was
considered for the background subtraction. The compatibility
and length coincidence thresholds that are associated with the
methodology of [18] (steps 4.5.n-4.6.n) were set to 20 and
7, respectively. Moreover, two consecutive satisfactions of
the potential typing suitability criterion were required before
attempting an HPV typing decision for a certain sample.
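To make the background subtraction setting more concrete, the Python sketch below estimates the background of a one-dimensional lane intensity profile with a disk whose radius is 3% of the lane height. It is a generic grey-scale opening with a disk-shaped structuring function, which is one common way to realize a rolling-disk estimate; it is not claimed to reproduce the exact algorithm of [21] or the MATLAB prototype. The function name, the synthetic profile, and the assumption that one intensity unit corresponds to one sample step are all hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def rolling_disk_background(profile, radius):
    """Upper envelope of all disks of the given radius that fit under the profile.

    ASSUMPTION: the intensity axis is scaled so that one intensity unit
    equals one sample step along the lane.
    """
    f = np.asarray(profile, dtype=float)
    r = int(round(radius))
    if r < 1:
        raise ValueError("radius must be at least one sample")
    offsets = np.arange(-r, r + 1)
    disk = np.sqrt(float(r * r) - offsets.astype(float) ** 2)  # upper half-disk
    width = 2 * r + 1

    # Erosion: highest admissible disk centre at each position.
    padded = np.pad(f, r, mode="constant", constant_values=np.inf)
    centres = np.min(sliding_window_view(padded, width) - disk, axis=1)

    # Dilation: upper envelope of the disks placed at those centres.
    padded_c = np.pad(centres, r, mode="constant", constant_values=-np.inf)
    return np.max(sliding_window_view(padded_c, width) + disk, axis=1)


# Synthetic lane profile (hypothetical): flat background plus one band.
lane_height = 200
radius = 0.03 * lane_height          # 3% of the lane height, as in the text
x = np.arange(lane_height)
profile = 10.0 + 50.0 * np.exp(-((x - 100) ** 2) / (2 * 2.0 ** 2))

background = rolling_disk_background(profile, radius)
corrected = profile - background     # background-subtracted profile
```

By construction the estimated background never exceeds the profile, so the corrected profile is non-negative; a smaller radius follows the profile more closely and removes more of each band.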
In Fig. 7, three of the employed gel images are depicted.
The gel matrix setup and the typing results for each sample
are given in Table I. In the 2nd column, Px denotes the xth
patient and Dy the yth digestion configuration. In the 5th
column, z/w denotes that the expert diagnosis (provided in
the 3rd column) has been ranked zth among the w solutions
of the HPV typing algorithm. Finally, the 4th column informs
us whether the sample has been typed or not at the end of
the experiment.
The values of the potential typing suitability criterion for
each sample at each examined time instance are presented
in Table II. In this table, 1 denotes that the criterion is satisfied,
while 0 denotes that it is not. Once a typing decision
has been reached, the criterion stops being evaluated. The
extracted intensity profiles of two samples at the time when
their typing decisions were made and the associated first
derivative curves are depicted in Fig. 8.
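The decision rule stated above (two consecutive satisfactions of the criterion) can be sketched in a few lines of Python. The criterion values are transcribed from Table II; the function and variable names are hypothetical and the code is only an illustration, not the prototype implementation.

```python
def decision_time(criterion_values, times, required_consecutive=2):
    """Return the first time instance at which the criterion has been
    satisfied `required_consecutive` times in a row, or None if never."""
    run = 0
    for time, value in zip(times, criterion_values):
        run = run + 1 if value == 1 else 0
        if run >= required_consecutive:
            return time
    return None


times = [10, 20, 30, 40, 50, 60, 70, 80]  # minutes of electrophoresis

# Criterion values per lane, transcribed from Table II.
table_ii = {
    "1st": [0, 0, 0, 0, 1, 1],
    "2nd": [0, 0, 0, 0, 1, 1],
    "3rd": [0, 0, 0, 0, 0, 1, 1],
    "4th": [0, 0, 1, 0, 1, 1],
    "6th": [0, 0, 0, 0, 1, 1],
    "7th": [0, 0, 0, 0, 0, 1, 1],
    "8th": [0, 0, 0, 0, 1, 0, 1, 1],
    "9th": [0, 0, 0, 0, 0, 1, 0, 0],
}

decisions = {lane: decision_time(values, times)
             for lane, values in table_ii.items()}
```

For these values the rule yields a decision at 60 min. for the first lane and at 70 min. for the third lane, in agreement with Fig. 8, while the 9th lane never satisfies the criterion twice in a row and therefore remains untyped.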
TABLE II
POTENTIAL TYPING SUITABILITY CRITERION FOR EACH SAMPLE
(COLUMNS) AT EACH TIME INSTANCE (ROWS).

time\sample   1st   2nd   3rd   4th   6th   7th   8th   9th
10 min.       0     0     0     0     0     0     0     0
20 min.       0     0     0     0     0     0     0     0
30 min.       0     0     0     1     0     0     0     0
40 min.       0     0     0     0     0     0     0     0
50 min.       1     1     0     1     1     0     1     0
60 min.       1     1     1     1     1     1     0     1
70 min.       -     -     1     -     -     1     1     0
80 min.       -     -     -     -     -     -     1     0

Fig. 8. The intensity profile (middle) and corresponding first derivative
curves (bottom) of the first (left) and the third (right) lane of the gel matrix
after 60 and 70 min. of electrophoresis respectively. It is at these time
instances that the HPV typing decisions regarding the associated samples
are made. The thresholding value that is involved in the potential typing
suitability criterion is indicated by a horizontal line in the intensity profile
graphs.

VI. CONCLUSION
The lack of interaction between the phases of the single
image protocol for HPV typing via PCR-RFLP gel
electrophoresis has been the source of several shortcomings:
erroneous typing decisions due to extensive band overlapping
and unnecessary loss of laboratory resources are probably the
most serious among them. The need to tackle these
shortcomings has been the motivation for proposing the system
that has been described here. Owing to the introduction of
interaction between the typing phases and the appropriate
novel algorithms, the proposed system manages to automate
entirely the task of HPV typing; this denotes significant
progress for the discussed molecular biology method.
The proof-of-concept of the proposed approach has been
evaluated on a small set of real HPV data. The results
from the emulation of the system operation have been very
encouraging. Indeed, the system has automatically reached
correct HPV typing decisions for all but one examined
sample. It is worth mentioning that the untyped sample (9th lane)
is the product of triple infection, and, for this reason, the
pattern of the contained DNA fragment lengths is unusually
complex. This complexity has most probably prevented the
potential typing suitability criterion from being satisfied two
times in a row within the employed electrophoresis run. In
support of this explanation, the other triply-infected sample
(8th lane) was the last one to be typed, at the last
examined time instance (please refer to Table II).
The feasibility of the proposed approach requires further
validation through emulation experiments on larger sets of
real HPV data and/or simulations. Then, we should proceed
with the prototype implementation of the proposed system,
possibly also as a compact integrated device.
REFERENCES
[1] J. Walboomers, M. Jacobs, M. Manos, F. Bosch, J. Kummer, K. Shah,
P. Snijders, J. Peto, C. Meijer, and N. Munoz, "Human papillomavirus
is a necessary cause of invasive cervical cancer worldwide," Journal
of Pathology, vol. 189, no. 1, pp. 12–19, 1999.
[2] F. Bosch, A. Lorincz, N. Munoz, C. Meijer, and K. Shah, "The causal
relation between human papillomavirus and cervical cancer," Journal
of Clinical Pathology, vol. 55, no. 4, pp. 244–265, 2002.
[3] S. Landis, T. Murray, S. Bolden, and P. Wingo, "Cancer statistics,
1999," CA: A Cancer Journal for Clinicians, vol. 49, no. 1, p. 8, 1999.
[4] N. Munoz, F. Bosch, S. de Sanjose, R. Herrero, X. Castellsague,
K. Shah, P. Snijders, C. Meijer et al., "Epidemiologic classification
of human papillomavirus types associated with cervical cancer," New
England Journal of Medicine, vol. 348, no. 6, pp. 518–527, 2003.
[5] B. Kleter, L. Van Doorn, L. Schrauwen, A. Molijn, S. Sastrowijoto,
J. Ter Schegget, J. Lindeman, B. Ter Harmsel, M. Burger,
and W. Quint, "Development and clinical evaluation of a highly
sensitive PCR-reverse hybridization line probe assay for detection and
identification of anogenital human papillomavirus," Journal of Clinical
Microbiology, vol. 37, no. 8, p. 2508, 1999.
[6] T. Oh, C. Kim, S. Woo, T. Kim, D. Jeong, M. Kim, S. Lee, H. Cho, and
S. An, "Development and clinical evaluation of a highly sensitive DNA
microarray for detection and genotyping of human papillomaviruses,"
Journal of Clinical Microbiology, vol. 42, no. 7, p. 3272, 2004.
[7] B. Gharizadeh, M. Kalantari, C. Garcia, B. Johansson, and P. Nyren,
"Typing of human papillomavirus by pyrosequencing," Laboratory
Investigation, vol. 81, no. 5, pp. 673–679, 2001.
[8] O. Lungu, T. Wright et al., "Typing of human papillomaviruses by
polymerase chain reaction amplification with L1 consensus primers
and RFLP analysis," Molecular and Cellular Probes, vol. 6, no. 2, pp.
145–152, 1992.
[9] E. Santiago, L. Camacho, M. Junquera, and F. Vazquez, "Full HPV
typing by a single restriction enzyme," Journal of Clinical Virology,
vol. 37, no. 1, pp. 38–46, 2006.
[10] J. Lee, M. Kim, S. Song, J. Hong, K. Min, J. Kim, E. Song, J. Lee,
J. Lee, and S. Hur, "Comparison of Human Papillomavirus Detection
and Typing by Hybrid Capture 2, Linear Array, DNA Chip, and
Cycle Sequencing in Cervical Swab Samples," International Journal
of Gynecological Cancer, vol. 19, no. 2, p. 266, 2009.
[11] A. Ermel, B. Qadadri, A. Morishita, I. Miyagawa, G. Yamazaki,
B. Weaver, W. Tu, Y. Tong, M. Randolph, H. Cramer et al., "Human
papillomavirus detection and typing in thin prep cervical cytologic
specimens comparing the Digene Hybrid Capture II Assay, the Roche
Linear Array HPV Genotyping Assay, and the Kurabo GeneSquare
Microarray Assay," Journal of Virological Methods, 2010.
[12] D. Tagu and C. Moussard, Techniques for molecular biology. Science
Pub Inc, 2006.
[13] "TotalLab::Phoretix," http://www.totallab.com/products/totallabquant,
June 2011.
[14] "GelCompar II - Fingerprint and Gel Analysis Software,"
http://www.applied-maths.com/gelcompar/gelcompar.htm, June 2011.
[15] "Gel-Pro Analyzer - Gel Analysis and electrophoresis analysis
software," http://www.mediacy.com/index.aspx?page=GelPro, June 2011.
[16] I. Bajla, I. Hollander, S. Fluch, K. Burg, and M. Kollar, "An
alternative method for electrophoretic gel image analysis in the GelMaster
software," Computer Methods and Programs in Biomedicine, vol. 77,
no. 3, pp. 209–231, 2005.
[17] S. Shadle, D. Allen, H. Guo, W. Pogozelski, J. Bashkin, and T. Tullius,
"Quantitative analysis of electrophoresis data: novel curve fitting
methodology and its application to the determination of a
protein-DNA binding constant," Nucleic Acids Research, vol. 25, no. 4, pp.
850–860, 1997.
[18] C. Maramis, A. Delopoulos, and A. Lambropoulos, "Analysis of
PCR-RFLP Gel Electrophoresis Images for Accurate and Automated HPV
Typing," in 10th International Conference on Information Technology
and Applications in Biomedicine, ITAB 2010, Corfu, Greece, 2010,
pp. 1–6.
[19] C. Maramis, A. Delopoulos, and A. Lambropoulos, "A Computerized
Methodology for Improved Virus Typing by PCR-RFLP Gel
Electrophoresis," IEEE Transactions on Biomedical Engineering, in press.
[20] C. Maramis and A. Delopoulos, "Efficient Quantitative Information
Extraction from PCR-RFLP Gel Electrophoresis Images," in 20th
International Conference on Pattern Recognition, ICPR 2010, Istanbul,
Turkey, 2010, pp. 2564–2567.
[21] I. Mikhailyuk and A. Razzhivin, "Background subtraction in
experimental data arrays illustrated by the example of Raman spectra and
fluorescent gel electrophoresis patterns," Instruments and Experimental
Techniques, vol. 46, no. 6, pp. 765–769, 2003.
[22] "The Cervical Cancer Repository of the ASSIST Project,"
http://kastor.ee.auth.gr:8888/assist, June 2011.
