
CS253 Report 3 The Edit Distance Problem

Aaron Wilhelm

October 26, 2011

Background and Motivation


Dynamic programming is a programming technique in which a problem is divided into multiple sub-problems; the sub-problems are then solved and their results combined to solve the original problem. This is very similar to the divide-and-conquer methodology, with the exception that dynamic programming saves the results of intermediate steps. This optimizes for time, since many of the sub-problems would otherwise be calculated multiple times. Many of these problems, when not optimized in this manner, have a run time complexity of $O(2^{n+m})$, while the optimized version of the algorithm has a run time complexity of $O(mn)$. There is a trade-off, though: the space complexity of the faster algorithm is $O(mn)$ and the space complexity of the slower algorithm is $O(m+n)$. Since we generally wish to have results quickly and memory is relatively cheap, the faster algorithm is generally the preferred choice.

One problem that calls for dynamic programming techniques is the edit distance problem. The edit distance problem is concerned with finding the minimum set of changes that can be made to one string to get a desired resultant string. Different implementations of the algorithms used to solve this problem use different transformations for string modification. The transformations used are generally a subset of the following transformations:

Copy: simply copies a character from the output string to the input string
Replace: sets a character in the input string to a value in the output string
Delete: deletes a character in the input string
Insert: inserts a character from the output string
Twiddle: swaps the next two characters
Kill: removes the rest of the characters in the text
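To make the effect of each transformation concrete, the following is a minimal sketch that replays a list of transformations against an input string. It covers only copy, delete, insert and twiddle; the names `Op` and `apply_ops` are hypothetical and chosen for illustration, and the sketch is not the code from the appendix.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Four of the transformations listed above (replace and kill are omitted).
enum class Op { Copy, Delete, Insert, Twiddle };

// Hypothetical helper: replays `ops` against `input`, drawing inserted
// characters from `target`, and returns the string that results.
std::string apply_ops(const std::vector<Op>& ops,
                      const std::string& input,
                      const std::string& target) {
    std::string result;
    std::size_t i = 0;  // index of the next unread character of input
    for (Op op : ops) {
        switch (op) {
            case Op::Copy:     // keep the current input character
                result += input.at(i);
                i += 1;
                break;
            case Op::Delete:   // drop the current input character
                i += 1;
                break;
            case Op::Insert:   // take the next needed character from target
                result += target.at(result.size());
                break;
            case Op::Twiddle:  // emit the next two input characters swapped
                result += input.at(i + 1);
                result += input.at(i);
                i += 2;
                break;
        }
    }
    return result;
}
```

For example, three twiddles in a row turn "AaBbCc" into "aAbBcC", while delete, copy, insert turns "ab" into "ba".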

Understanding the edit distance problem is useful in several applications. Spell checkers must search for the closest word to a misspelled word. DNA sequencing needs to find similarities in DNA structure to determine which sequences are responsible for which attributes. Diff algorithms are used to find and track differences in source code and other files to manage what was done by whom.

Procedures
In order to get the algorithms to work, appropriate weights and transformations had to be picked. In choosing the transformations, I tried to pick the most basic ones while making sure that I was guaranteed the ability to transform any string into any other string. Since kill is simply a repeated delete until the string is gone, I decided not to use it. Since replace can easily be substituted by a delete and an insert, I didn't implement replace either. Copy, delete, insert and twiddle are what I decided to use. Twiddle was used because I wished to possibly use these algorithms with DNA sequencing or to aid in tracking changes in text documents, where it is common for letters or lines to be switched.

For the weights, the original values relative to each other were picked by reasoning about what each transformation does. For instance, the copy transformation should be the only transformation used when the beginning and ending strings are the same; therefore the copy operator should be lowest in weight. Copying should also have a lower weight than inserting or deleting, since copying changes the string the least. Deleting and inserting do almost the same amount in modifying the string, so their weights should be close to the same. Twiddle should have a fairly low weight, since nothing is created or destroyed, only flipped around. From there I modified the values by hand until I got transformations that matched what I believed to be intuitive answers (these are in the unit testing portion of the code).
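As a small illustration of how the relative weights steer the choice of transformations, the sketch below sums the weight of a transformation list. The weight values are the ones reported in the Final Weights section; the names `Op`, `weight_of` and `total_weight` are hypothetical and used only for this example.

```cpp
#include <cassert>
#include <vector>

enum class Op { Copy, Delete, Insert, Twiddle };

// Values taken from the Final Weights section of this report.
constexpr unsigned COPY_WEIGHT = 1;
constexpr unsigned DELETE_WEIGHT = 3;
constexpr unsigned INSERT_WEIGHT = 3;
constexpr unsigned TWIDDLE_WEIGHT = 2;

// Weight of a single transformation.
unsigned weight_of(Op op) {
    switch (op) {
        case Op::Copy:    return COPY_WEIGHT;
        case Op::Delete:  return DELETE_WEIGHT;
        case Op::Insert:  return INSERT_WEIGHT;
        case Op::Twiddle: return TWIDDLE_WEIGHT;
    }
    return 0;  // unreachable for valid enumerators
}

// Total weight of a transformation list.
unsigned total_weight(const std::vector<Op>& ops) {
    unsigned sum = 0;
    for (Op op : ops) {
        sum += weight_of(op);
    }
    return sum;
}
```

With these weights, turning "ab" into "ba" costs 2 with a single twiddle, but 3 + 1 + 3 = 7 with delete, copy, insert, so the twiddle is preferred whenever it applies.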

How The Recursive Algorithm Works


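This algorithm works by splitting the problem into smaller sub-problems. To find the minimum weight of turning input[i..input.size] into output[j..output.size], it recursively finds the minimum weight needed to turn input[i+1..input.size] into output[j..output.size] (after a delete), input[i+1..input.size] into output[j+1..output.size] (after a copy), input[i..input.size] into output[j+1..output.size] (after an insert) and input[i+2..input.size] into output[j+2..output.size] (after a twiddle). It then adds the weight of the corresponding transformation to each result and keeps the smallest total.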

Algorithm 1 RecursiveMinWeight(input, i, output, j)
Pre: When originally called i and j equal 1
Post: Return minimum weight needed to change input[i..input.size] to output[j..output.size]
 1: smallest = ∞
 2: if i ≤ input.size OR j ≤ output.size then
 3:     if j ≤ output.size then
 4:         weight = INSERT_WEIGHT + RecursiveMinWeight(input, i, output, j + 1)
 5:         smallest = min(smallest, weight)
 6:     end if
 7:     if i ≤ input.size AND j ≤ output.size AND input[i] == output[j] then
 8:         weight = COPY_WEIGHT + RecursiveMinWeight(input, i + 1, output, j + 1)
 9:         smallest = min(smallest, weight)
10:     end if
11:     if i ≤ input.size then
12:         weight = DELETE_WEIGHT + RecursiveMinWeight(input, i + 1, output, j)
13:         smallest = min(smallest, weight)
14:     end if
15:     if i + 1 ≤ input.size AND j + 1 ≤ output.size AND input[i] == output[j + 1] AND output[j] == input[i + 1] then
16:         weight = TWIDDLE_WEIGHT + RecursiveMinWeight(input, i + 2, output, j + 2)
17:         smallest = min(smallest, weight)
18:     end if
19: else
20:     smallest = 0
21: end if
22: return smallest
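The recursive algorithm above can be sketched in C++ as follows. This is a minimal illustration that uses 0-based indices and `std::string` inputs rather than the class in the appendix; the function name `recursive_min_weight` is hypothetical, and the weights are the ones from the Final Weights section.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <string>

// Values taken from the Final Weights section of this report.
constexpr unsigned COPY_WEIGHT = 1;
constexpr unsigned DELETE_WEIGHT = 3;
constexpr unsigned INSERT_WEIGHT = 3;
constexpr unsigned TWIDDLE_WEIGHT = 2;

// Minimum weight needed to turn in[i..] into out[j..] (0-based indices).
unsigned recursive_min_weight(const std::string& in, std::size_t i,
                              const std::string& out, std::size_t j) {
    // Both strings exhausted: nothing left to transform.
    if (i >= in.size() && j >= out.size()) {
        return 0;
    }
    unsigned smallest = std::numeric_limits<unsigned>::max();
    if (j < out.size()) {  // insert out[j]
        smallest = std::min(smallest,
            INSERT_WEIGHT + recursive_min_weight(in, i, out, j + 1));
    }
    if (i < in.size() && j < out.size() && in[i] == out[j]) {  // copy
        smallest = std::min(smallest,
            COPY_WEIGHT + recursive_min_weight(in, i + 1, out, j + 1));
    }
    if (i < in.size()) {  // delete in[i]
        smallest = std::min(smallest,
            DELETE_WEIGHT + recursive_min_weight(in, i + 1, out, j));
    }
    if (i + 1 < in.size() && j + 1 < out.size() &&
        in[i] == out[j + 1] && out[j] == in[i + 1]) {  // twiddle
        smallest = std::min(smallest,
            TWIDDLE_WEIGHT + recursive_min_weight(in, i + 2, out, j + 2));
    }
    return smallest;
}
```

Calling `recursive_min_weight("AaBbCc", 0, "aAbBcC", 0)` explores every branch and settles on three twiddles. Because an insert or a delete is always available until both strings are exhausted, every recursive call returns a finite weight.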

Complexity Analysis
The recurrence for this algorithm is

$$T(n, m) = T(n-1, m) + T(n, m-1) + T(n-1, m-1) + T(n-2, m-2) \tag{1}$$

which grows at an exponential rate, leading to a run time complexity of

$$O(2^{n+m}) \tag{2}$$

The memory usage, on the other hand, is constant per recursive call, and the recursion has a maximum depth of $n + m$, leading to a memory complexity of

$$O(n + m) \tag{3}$$
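To get a feel for how quickly recurrence (1) grows, the sketch below counts the calls the recursive algorithm would make under the assumption that all four branches apply at every step (for instance when every character is the same). The function name `count_calls` is hypothetical, and the extra 1 accounts for the call itself, which recurrence (1) leaves out.

```cpp
#include <cassert>
#include <cstdint>

// Number of calls made for remaining lengths n (input) and m (output),
// assuming every transformation is applicable whenever enough characters
// remain.
std::uint64_t count_calls(unsigned n, unsigned m) {
    std::uint64_t calls = 1;  // this call
    if (m > 0) {
        calls += count_calls(n, m - 1);      // insert branch
    }
    if (n > 0 && m > 0) {
        calls += count_calls(n - 1, m - 1);  // copy branch
    }
    if (n > 0) {
        calls += count_calls(n - 1, m);      // delete branch
    }
    if (n > 1 && m > 1) {
        calls += count_calls(n - 2, m - 2);  // twiddle branch
    }
    return calls;
}
```

Even for tiny inputs the count climbs fast: `count_calls(1, 1)` is 6 and `count_calls(2, 2)` is already 32.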

How The Iterative Algorithm Works


This algorithm works in a manner similar to the above algorithm, except that instead of calling itself to get the values of the sub-problems it looks them up in a table. The function calc_weight, when called for (i, j), requires the sub-problems it depends on, i.e. changing input[x..input.size] to output[y..output.size] for the larger indices x and y, to already be solved. This can be achieved by simply using calc_weight on the largest index values before using it on the smaller ones.

Algorithm 2 IterativeMinWeight(input, output)
Pre: None
Post: This will return the minimum weight of transforms to turn input into output and it will save the list of best transforms
1: Let weighttable[1..input.size + 1][1..output.size + 1] be a new array
2: Let transformtable[1..input.size + 1][1..output.size + 1] be a new array
3: for i = input.size + 1 to 1 do
4:     for j = output.size + 1 to 1 do
5:         calc_weight(transformtable, weighttable, input, output, i, j)
6:     end for
7: end for
8: FindMinPath(transformtable, weighttable)
9: return weighttable[1][1]

The function calc_weight gets the minimum weight needed to change input[i..input.size] to output[j..output.size].

Algorithm 3 calc_weight(transformtable, weighttable, input, output, i, j)
Pre: The weight and transform tables must be filled correctly for values of x and y such that x > i and y > j
Post: This will save the optimal solution in weighttable[i][j]
 1: smallest = ∞
 2: if i ≤ input.size OR j ≤ output.size then
 3:     if j ≤ output.size then
 4:         weight = INSERT_WEIGHT + weighttable[i][j + 1]
 5:         if weight < smallest then
 6:             trans = INSERT
 7:             smallest = weight
 8:         end if
 9:     end if
10:     if i ≤ input.size AND j ≤ output.size AND input[i] == output[j] then
11:         weight = COPY_WEIGHT + weighttable[i + 1][j + 1]
12:         if weight < smallest then
13:             trans = COPY
14:             smallest = weight
15:         end if
16:     end if
17:     if i ≤ input.size then
18:         weight = DELETE_WEIGHT + weighttable[i + 1][j]
19:         if weight < smallest then
20:             trans = DELETE
21:             smallest = weight
22:         end if
23:     end if
24:     if i + 1 ≤ input.size AND j + 1 ≤ output.size AND input[i] == output[j + 1] AND output[j] == input[i + 1] then
25:         weight = TWIDDLE_WEIGHT + weighttable[i + 2][j + 2]
26:         if weight < smallest then
27:             trans = TWIDDLE
28:             smallest = weight
29:         end if
30:     end if
31: else
32:     smallest = 0
33: end if
34: weighttable[i][j] = smallest
35: transformtable[i][j] = trans

Algorithm 4 FindMinPath(transformtable, weighttable)
Pre: The 2-D arrays transformtable and weighttable are properly filled, with entry (1,1) of the tables holding the minimum transform weight
Post: The returned list will contain the smallest-weight list of transforms
 1: Let minList be an empty list
 2: i = 1
 3: j = 1
 4: while i ≤ start.size OR j ≤ end.size do
 5:     t = transformtable[i][j]
 6:     minList.push_back(t)
 7:     if t == COPY then
 8:         i = i + 1
 9:         j = j + 1
10:     else if t == DELETE then
11:         i = i + 1
12:     else if t == INSERT then
13:         j = j + 1
14:     else if t == TWIDDLE then
15:         i = i + 2
16:         j = j + 2
17:     end if
18: end while
19: return minList
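The table fill and the path recovery described by the iterative algorithms can be sketched together in C++ as follows. This is a minimal illustration with 0-based indices and `std::string` inputs, not the class from the appendix; the names `Op`, `Cell`, `Result` and `iterative_min_weight` are hypothetical, and the weights are the ones from the Final Weights section.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

enum class Op { Copy, Delete, Insert, Twiddle };

// Values taken from the Final Weights section of this report.
constexpr unsigned COPY_WEIGHT = 1;
constexpr unsigned DELETE_WEIGHT = 3;
constexpr unsigned INSERT_WEIGHT = 3;
constexpr unsigned TWIDDLE_WEIGHT = 2;

struct Cell { unsigned weight; Op op; };                // one table entry
struct Result { unsigned weight; std::vector<Op> ops; };

Result iterative_min_weight(const std::string& in, const std::string& out) {
    const std::size_t n = in.size();
    const std::size_t m = out.size();
    // table[i][j] describes the best way to turn in[i..] into out[j..];
    // cells start value-initialized, so table[n][m].weight is 0.
    std::vector<std::vector<Cell>> table(n + 1, std::vector<Cell>(m + 1));

    // Fill from the largest indices down so every lookup is already solved.
    for (std::size_t i = n + 1; i-- > 0;) {
        for (std::size_t j = m + 1; j-- > 0;) {
            if (i == n && j == m) continue;  // base case: weight 0
            Cell best{std::numeric_limits<unsigned>::max(), Op::Copy};
            auto consider = [&best](unsigned w, Op op) {
                if (w < best.weight) best = Cell{w, op};
            };
            if (j < m)
                consider(INSERT_WEIGHT + table[i][j + 1].weight, Op::Insert);
            if (i < n && j < m && in[i] == out[j])
                consider(COPY_WEIGHT + table[i + 1][j + 1].weight, Op::Copy);
            if (i < n)
                consider(DELETE_WEIGHT + table[i + 1][j].weight, Op::Delete);
            if (i + 1 < n && j + 1 < m &&
                in[i] == out[j + 1] && out[j] == in[i + 1])
                consider(TWIDDLE_WEIGHT + table[i + 2][j + 2].weight,
                         Op::Twiddle);
            table[i][j] = best;
        }
    }

    // Walk from (0, 0) following the stored transformations.
    Result r{table[0][0].weight, {}};
    for (std::size_t i = 0, j = 0; i < n || j < m;) {
        const Op op = table[i][j].op;
        r.ops.push_back(op);
        switch (op) {
            case Op::Copy:    ++i; ++j; break;
            case Op::Delete:  ++i; break;
            case Op::Insert:  ++j; break;
            case Op::Twiddle: i += 2; j += 2; break;
        }
    }
    return r;
}
```

Each cell is computed once from at most four already-filled neighbours, which is where the product of the two string lengths in the running time comes from. Ties are broken in the order insert, copy, delete, twiddle, matching the strict comparisons in Algorithm 3.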

Complexity Analysis
The function FindMinPath runs in time at most proportional to the number of elements in the weight table, which means FindMinPath runs in $O(nm)$ time. Since calc_weight runs in constant time and is called a number of times proportional to $nm$, the running time of the iterative algorithm is

$$O(nm) \tag{4}$$

The memory usage is constant everywhere except for the variables weighttable and transformtable, which each have a size of $(m+1)(n+1)$. So the memory complexity of the iterative algorithm is

$$O(nm) \tag{5}$$

Testing Plan and Results


First, the testing made sure the algorithms worked for very small inputs, including ones where one or both of the strings are of size zero. Then small tests were performed with easy-to-check answers, such as when the two strings are the same, or when the input string is the output string with every odd element swapped with its previous element. Then more complicated cases were worked out by hand and checked by coding the answers into the unit test code.

The running time of the recursive algorithm does not change much in complexity with the contents of the input: it will still be $O(2^{n+m})$. However, with strings where none of the characters are the same, lines 8 and 16 of Algorithm 1 never get executed, which decreases the running time. The worst case happens when all the elements are the same, causing lines 8 and 16 to be run every time. With the iterative algorithm, the only thing affecting the running time is how many transformations are in the optimal transformation list. The more transformations there are in the list, the further the routine that gets the optimal transformation list has to travel through the large two-dimensional table. This means the best running time will occur when the optimal transformation list for the two strings contains nothing but twiddles. The slowest time will then occur when the two strings have no characters in common.

For all of the benchmarks, even though the running time depends on both string sizes, the two strings are of equal size. This is to simplify things and because there is little to learn from benchmarking strings of unequal sizes.

Results
Recursive Version Empirically Derived Run Times:

[Figure: Recursive Edit Distance Results. Time (sec.) versus input size, for input sizes 8 to 15. Series: Recursive Plot, Recursive Approximation.]

$$T(n) = 0.00162938 \cdot 2^{n} \tag{6}$$

$$n_0 = 13 \tag{7}$$

Iterative Version Empirically Derived Run Times:

[Figure: Iterative Edit Distance Results. Time (sec.) versus input size, for input sizes 0 to 20000. Series: Iterative Plot, Iterative Approximation.]

$$T(n) = 5.90954 \times 10^{-8} \cdot n^{2} \tag{8}$$

$$n_0 = 9000 \tag{9}$$
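As a quick sanity check of the two fitted models, they can be evaluated at the largest input sizes shown on the plot axes. The sizes 15 and 20000 are chosen here only for illustration.

```latex
% Recursive fit, equation (6), at n = 15
T(15) = 0.00162938 \cdot 2^{15} = 0.00162938 \cdot 32768 \approx 53.4 \text{ s}

% Iterative fit, equation (8), at n = 20000
T(20000) = 5.90954 \times 10^{-8} \cdot 20000^{2}
         = 5.90954 \times 10^{-8} \cdot 4 \times 10^{8} \approx 23.6 \text{ s}
```

Both values fall within the time ranges of the corresponding plots, and they show the gap between the two versions: the recursive fit needs close to a minute for strings of 15 elements, while the iterative fit handles strings of 20000 elements in under half a minute.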

Final Weights
COPY_WEIGHT = 1
DELETE_WEIGHT = 3
INSERT_WEIGHT = 3
TWIDDLE_WEIGHT = 2

Problems
One problem was that there was no clear-cut best set of weights, or even best sequences, for the more complicated input strings. This was dealt with by setting up many different test cases and hand-tweaking the weights until sensible solutions were found. Another problem is that the recursive version was so slow that using it to compare transformation sequences with the iterative version quickly became unbearable. A further problem is that the exact value of $n_0$, such that the actual time taken for input sizes $n \ge n_0$ behaves like the asymptotic complexity, is difficult to find due to the timer resolution being only one second.
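One possible way around the one-second timer resolution, which was not used for the measurements in this report, is the C++11 `<chrono>` library. The sketch below times an arbitrary callable with `std::chrono::steady_clock`; the helper name `time_seconds` is hypothetical.

```cpp
#include <cassert>
#include <chrono>

// Runs `work` once and returns the elapsed wall-clock time in seconds
// as a floating-point value, so sub-second runs are not rounded to zero.
template <class F>
double time_seconds(F&& work) {
    const auto begin = std::chrono::steady_clock::now();
    work();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(end - begin).count();
}
```

A benchmark loop could wrap each call to the algorithm in a lambda and pass it to `time_seconds`, writing the returned value to the data file in place of the difference of two `time(NULL)` readings.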

Conclusion
From the data it has been shown that using dynamic programming drastically cuts down the running time complexity, but at the cost of increasing the memory complexity. Both algorithms exhibited their expected run time complexities. The recursive algorithm's time approached its asymptotic complexity very quickly, at about $n_0 = 13$, while the iterative version did so at about $n_0 = 9000$. With the experiments I performed I arrived at

Appendix A
Source Code
main.cpp
////////////////////////////////////////////////////////////////////////////////
/// \file   main.cpp
/// \author Aaron Wilhelm
/// \brief  Testing and benchmarking of the Edit Distance algorithms
////////////////////////////////////////////////////////////////////////////////

#include <iostream>
#include <vector>
#include <cassert>
#include <string>
#include <fstream>
#include <cstdlib>
#include <time.h>
#include "edit_distance.h"

#define ITER_START_SIZE 1000
#define ITER_STEP_SIZE 1000
#define ITER_END_SIZE 20300
#define RECV_START_SIZE 8
#define RECV_STEP_SIZE 1
#define RECV_END_SIZE 15

using namespace std;

void str2vector( const string & str, std::vector<char> & vct )
{
  vct.clear();
  for( unsigned int i = 0; i < str.size(); ++i )
  {
    vct.push_back( str[i] );
  }
}

void str2vector( const char * str, std::vector<char> & vct )
{
  vct.clear();
  for( unsigned int i = 0; str[i] != '\0'; ++i )
  {
    vct.push_back( str[i] );
  }
}

void parse_inputs( int argc, char * argv[], bool * r, bool * i, bool * u )
{
  string str;
  for( int j = 0; j < argc; ++j )
  {
    str = argv[j];
    if( str == "h" )
    {
      cout << "Well your screwed" << endl;
      exit(0);
    }
    else if( str == "bi" )
    {
      (*i) = true;
    }
    else if( str == "br" )
    {
      (*r) = true;
    }
    else if( str == "u" )
    {
      (*u) = true;
    }
    else
    {
      cout << "Not a valid flag try using h" << endl;
      exit(1);
    }
  }
}

int main( int argc, char * argv[] )
{
  Edit_Dist<char> diff;
  Edit_Dist<char> iterate, recurse;
  bool bench_recv = false;  // benchmark recursive algo
  bool bench_it = false;    // benchmark iterative algo
  bool unit_test = false;
  vector<char> start, finish;
  ofstream rec_file, it_file;
  time_t start_time, end_time;

  srand( time(0) );
  --argc;
  argv++;
  parse_inputs( argc, argv, &bench_recv, &bench_it, &unit_test );

  if( bench_recv )
  {
    cout << "Starting Recursive Benchmark" << endl;
    rec_file.open( "recurse.dat" );
    start.clear();
    finish.clear();
    for( unsigned int i = 1; i <= RECV_END_SIZE; ++i )
    {
      start.push_back( rand() );
      finish.push_back( rand() );
      if( i >= RECV_START_SIZE && (i - RECV_START_SIZE) % RECV_STEP_SIZE == 0 )
      {
        start_time = time(NULL);
        recurse.find_min_weight( start, finish );
        end_time = time(NULL);
        rec_file << i << "\t" << (end_time - start_time) << endl;
      }
    }
    rec_file.close();
    cout << "Ending Recursive Benchmark" << endl;
  }

  if( bench_it )
  {
    cout << "Starting Iterative Benchmark" << endl;
    it_file.open( "iterate.dat" );
    start.clear();
    finish.clear();
    for( unsigned int i = 1; i <= ITER_END_SIZE; i++ )
    {
      start.push_back( rand() );
      finish.push_back( rand() );
      if( i >= ITER_START_SIZE && (i - ITER_START_SIZE) % ITER_STEP_SIZE == 0 )
      {
        start_time = time(NULL);
        iterate.it_min_weight( start, finish );
        end_time = time(NULL);
        it_file << i << "\t" << (end_time - start_time) << endl;
      }
    }
    it_file.close();
    cout << "Ending Iterative Benchmark" << endl;
  }

  // some simple tests
  if( unit_test )
  {
    cout << "Starting Unit Test" << endl;

    /* EMPTY VECTOR */
    start.clear();
    finish.clear();
    iterate.it_min_weight( start, finish );
    recurse.find_min_weight( start, finish );
    assert( iterate.size() == 0 );
    assert( finish.size() == 0 );

    /* OBVIOUS COPY TEST */
    start.clear();
    finish.clear();
    for( int i = 0; i < 10; ++i )
    {
      start.push_back( 'A' );
      finish.push_back( 'A' );
    }
    iterate.it_min_weight( start, finish );
    recurse.find_min_weight( start, finish );
    assert( iterate.size() == start.size() );
    for( int i = 0; i < 10; ++i )
    {
      assert( iterate[i] == recurse[i] );
      assert( iterate[i] == COPY );
    }

    /* OBVIOUS INSERT TEST */
    start.clear();
    finish.clear();
    for( int i = 0; i < 10; ++i )
    {
      finish.push_back( 'B' );
    }
    iterate.it_min_weight( start, finish );
    recurse.find_min_weight( start, finish );
    assert( iterate.size() == recurse.size() );
    assert( iterate.size() == finish.size() );
    for( unsigned int i = 0; i < iterate.size(); ++i )
    {
      assert( iterate[i] == recurse[i] );
      assert( iterate[i] == INSERT );
    }

    /* OBVIOUS DELETE TEST */
    start.clear();
    finish.clear();
    for( int i = 0; i < 10; ++i )
    {
      start.push_back( 'B' );
    }
    iterate.it_min_weight( start, finish );
    recurse.find_min_weight( start, finish );
    assert( iterate.size() == recurse.size() );
    for( unsigned int i = 0; i < iterate.size(); ++i )
    {
      assert( iterate[i] == recurse[i] );
      assert( iterate[i] == DELETE );
    }

    /* OBVIOUS TWIDDLE TEST */
    start.clear();
    finish.clear();
    str2vector( "AaBbCc", start );
    str2vector( "aAbBcC", finish );
    iterate.it_min_weight( start, finish );
    recurse.find_min_weight( start, finish );
    assert( iterate.size() == recurse.size() );
    for( unsigned int i = 0; i < iterate.size(); ++i )
    {
      assert( iterate[i] == recurse[i] );
      assert( iterate[i] == TWIDDLE );
    }

    //////////////////////////////////
    start.clear();
    finish.clear();
    str2vector( "This text will be modified", start );
    str2vector( "Text modifcation done", finish );
    iterate.it_min_weight( start, finish );
    assert( iterate[0] == COPY );
    for( unsigned int i = 1; i <= 5; i++ )
    {
      assert( iterate[i] == DELETE );
    }
    for( unsigned int i = 6; i <= 9; i++ )
    {
      assert( iterate[i] == COPY );
    }
    for( unsigned int i = 10; i <= 17; i++ )
    {
      assert( iterate[i] == DELETE );
    }
    for( unsigned int i = 18; i <= 22; i++ )
    {
      assert( iterate[i] == COPY );
    }
    for( unsigned int i = 23; i <= 25; i++ )
    {
      assert( iterate[i] == INSERT );
    }
    assert( iterate[26] == COPY );
    for( unsigned int i = 27; i <= 32; i++ )
    {
      assert( iterate[i] == INSERT );
    }
    assert( iterate[33] == COPY );
    assert( iterate[34] == DELETE );

    cout << "Ending Unit Test" << endl;
  }

  return 0;
}

edit_distance.h

#ifndef EDIT_DISTANCE_H
#define EDIT_DISTANCE_H
////////////////////////////////////////////////////////////////////////////////
/// @file   edit_distance.h
/// @author Aaron Wilhelm
/// @brief  Declarations for Class to handle the Edit distance problem
////////////////////////////////////////////////////////////////////////////////

#include <list>
#include <vector>
#include "edit_distance_trans.h"

struct _ed_table_cell
{
  unsigned int weight;
  Trans_types t_type;
};

////////////////////////////////////////////////////////////////////////////////
/// @class Edit_Dist
/// @brief Declarations for Class to handle the Edit distance problem
////////////////////////////////////////////////////////////////////////////////
template<class generic>
class Edit_Dist
{
  public:
    Edit_Dist();
    Trans_types operator[]( unsigned int );
    unsigned int find_min_weight
    (
      std::vector<generic> & start,
      std::vector<generic> & end
    );
    void clear();
    unsigned int get_min_weight();
    void get_transformations( std::vector<std::string> & );
    static bool apply_transformations
    (
      const Transform_list &,
      const std::vector<generic> & in,
      std::vector<generic> * out
    );
    unsigned int it_min_weight
    (
      const std::vector<generic> & start,
      const std::vector<generic> & end
    );
    unsigned int size();

  private:
    Transform_list min_trans_list;
    unsigned int _find_min_weight
    (
      std::vector<generic> & input,
      unsigned int pos_i,
      std::vector<generic> & output,
      unsigned int pos_o
    );
    void cell_weight
    (
      _ed_table_cell ** t,
      const std::vector<generic> & input,
      const std::vector<generic> & output,
      unsigned int x,
      unsigned int y
    );
    Transform_list curr_trans_list;
};

#include "edit_distance.tpp"
#endif /* EDIT_DISTANCE_H */

edit_distance.tpp
////////////////////////////////////////////////////////////////////////////////
/// @file   edit_distance.tpp
/// @author Aaron Wilhelm
/// @brief  This implements the Edit Distance problem both iteratively and
///         recursively
////////////////////////////////////////////////////////////////////////////////

#include <string>
#include <limits.h>

////////////////////////////////////////////////////////////////////////////////
/// @fn    Edit_Dist
/// @brief Default constructor for the class doesn't really do anything
/// @pre   None
/// @post  None
////////////////////////////////////////////////////////////////////////////////
template<class generic>
Edit_Dist<generic>::Edit_Dist()
{
}

////////////////////////////////////////////////////////////////////////////////
/// @fn     operator[]
/// @brief  Returns the index th optimal transform
/// @pre    none
/// @param  index Which transform you want
/// @return The index th optimal transform
////////////////////////////////////////////////////////////////////////////////
template<class generic>
Trans_types Edit_Dist<generic>::operator[]( unsigned int index )
{
  return min_trans_list.list[index];
}

////////////////////////////////////////////////////////////////////////////////
/// @fn     find_min_weight
/// @brief  This finds the minimum sum of weights it takes to transform the
///         start vector into the end vector
/// @pre    None
/// @post   This calculates the minimum weight to transform start into end
/// @param  vector<generic> start
/// @param  vector<generic> end
/// @return The minimum sum of weights that it takes to transform the start
///         to end vector
////////////////////////////////////////////////////////////////////////////////
template<class generic>
unsigned int Edit_Dist<generic>::find_min_weight
(
  std::vector<generic> & start,
  std::vector<generic> & end
)
{
  clear();
  return _find_min_weight( start, 0, end, 0 );
}

////////////////////////////////////////////////////////////////////////////////
/// @fn     get_min_weight
/// @brief  After the min weight has been calculated this can be used to get
///         the weight of the most efficient transformation list
/// @pre    One of the find_min_weight functions must be called (even with
///         empty vectors) to get valuable information, else you'll just get
///         zero
/// @return The weight of the transformation list
////////////////////////////////////////////////////////////////////////////////
template<class generic>
unsigned int Edit_Dist<generic>::get_min_weight()
{
  return min_trans_list.weight();
}
////////////////////////////////////////////////////////////////////////////////
/// @fn     _find_min_weight
/// @brief  This is the recursive implementation of the edit distance problem
///         and it works by splitting the problem up to only be transforming
///         input[pos_i..size()] to output[pos_o..size()]
/// @pre    In order for this to yield a correct answer the previous
///         transformations must not have been invalid (e.g. no copy move
///         when input[i] != output[j])
/// @post   The minimum weight to transform input into output is calculated
///         and the minimum transformation list is saved in min_trans_list
/// @param  vector<generic> input the vector that is started with (the input
///         string doesn't actually get modified)
/// @param  unsigned int pos_i The position into the input vector, this splits
///         the problem into a subproblem of find the min weight of
///         input[pos_i..end]
/// @param  vector<generic> output the vector that the input is turned into
/// @param  unsigned int pos_o The position into the output vector, this
///         splits the problem into a subproblem of find the min weight of
///         output[pos_i..size()]
/// @return The minimum weight of transforming input[pos_i..size()] to
///         output[pos_o..size()]
/// proof:  Since at each step the algorithm picks the optimal decision and it
///         can be easily shown that each decision made does not conflict
///         with the global optimal solution. It makes sure at each step that
///         the most optimal choice is made because each time the algorithm
///         attempts something possible the resulting value is the smallest
///         value
////////////////////////////////////////////////////////////////////////////////
template<class generic>
unsigned int Edit_Dist<generic>::_find_min_weight
(
  std::vector<generic> & input,
  unsigned int pos_i,
  std::vector<generic> & output,
  unsigned int pos_o
)
{
  unsigned int curr, min = UINT_MAX;

  /* check to see if at end */
  if( pos_i < input.size() || pos_o < output.size() )
  {
    /* Insert */
    if( pos_o < output.size() )
    {
      curr_trans_list.push_back( INSERT );
      curr = INSERT_WEIGHT + _find_min_weight( input, pos_i, output, pos_o+1 );
      curr_trans_list.pop_back();
      if( curr < min )
      {
        min = curr;
      }
    }

    /* Copy */
    if( pos_i < input.size() &&
        pos_o < output.size() &&
        input[pos_i] == output[pos_o] )
    {
      curr_trans_list.push_back( COPY );
      curr = COPY_WEIGHT + _find_min_weight( input, pos_i+1, output, pos_o+1 );
      curr_trans_list.pop_back();
      if( curr < min )
      {
        min = curr;
      }
    }

    /* Delete */
    if( pos_i < input.size() )
    {
      curr_trans_list.push_back( DELETE );
      curr = DELETE_WEIGHT + _find_min_weight( input, pos_i+1, output, pos_o );
      curr_trans_list.pop_back();
      if( curr < min )
      {
        min = curr;
      }
    }

    /* Twiddle */
    if( pos_i + 1 < input.size() &&
        pos_o + 1 < output.size() &&
        input[pos_i] == output[pos_o+1] &&
        output[pos_o] == input[pos_i+1] )
    {
      curr_trans_list.push_back( TWIDDLE );
      curr = TWIDDLE_WEIGHT + _find_min_weight( input, pos_i+2, output, pos_o+2 );
      curr_trans_list.pop_back();
      if( curr < min )
      {
        min = curr;
      }
    }
  }
  /* at end and need to compare to see if better min weight list */
  else
  {
    if( min_trans_list.size() == 0 )
    {
      min_trans_list = curr_trans_list;
    }
    else if( min_trans_list.weight() > curr_trans_list.weight() )
    {
      min_trans_list = curr_trans_list;
    }
    min = 0;
  }
  return min;
}

////////////////////////////////////////////////////////////////////////////////
/// @fn     it_min_weight
/// @brief  This is the iterative implementation of the edit distance problem
///         and it works by splitting the problem up to only be transforming
///         input[pos_i..size()] to output[pos_o..size()]
/// @pre    None
/// @post   The minimum weight to transform input into output is calculated
///         and the minimum transformation list is saved in min_trans_list
/// @param  vector<generic> the input vector that is started with (the input
///         string doesn't actually get modified)
/// @param  vector<generic> the output vector that the input vector is turned
///         into
/// @return The minimum weight of transforming input[pos_i..size()] to
///         output[pos_o..size()]
/// proof:  Since at each step the algorithm picks the optimal decision and it
///         can be easily shown that each decision made does not conflict
///         with the global optimal solution. It makes sure at each step that
///         the most optimal choice is made because each time the algorithm
///         attempts something possible the resulting value is the smallest
///         value
////////////////////////////////////////////////////////////////////////////////
template<class generic>
unsigned int Edit_Dist<generic>::it_min_weight
(
  const std::vector<generic> & start,
  const std::vector<generic> & end
)
{
  _ed_table_cell ** table = NULL;
  unsigned int ret;

  // Create table
  table = new _ed_table_cell*[start.size()+1];
  for( unsigned int i = 0; i < start.size()+1; i++ )
  {
    table[i] = new _ed_table_cell[end.size()+1];
  }

  // Fill table
  // invariant test here would check that table is empty
  for( unsigned int i = start.size(); ; --i )
  {
    for( unsigned int j = end.size(); ; --j )
    {
      // invariant here would check that the table is being filled properly
      // Could check that table[x][y] is a local minimum for x > i and y > j
      cell_weight( table, start, end, i, j );
      if( j == 0 )
      {
        break;
      }
    }
    if( i == 0 )
    {
      break;
    }
  }
  // invariant and !loop condition is true here
  // check that the post condition that table[0][0] has the min weight

  // Find min transformation list
  min_trans_list.list.reserve(
    ( (start.size() < end.size()) ? end.size() : start.size() )
  );
  min_trans_list.clear();

  // invariant could be used to show that at each value of i and j the value
  // in the table is truly the optimal subproblem
  for( unsigned int i = 0, j = 0; i < start.size() || j < end.size(); )
  {
    // Could check that table[x][y] is a local minimum for x > i and y > j
    Trans_types a;
    a = table[i][j].t_type;
    min_trans_list.push_back( a );
    switch( a )
    {
      case COPY:
        ++i;
        ++j;
        break;
      case DELETE:
        ++i;
        break;
      case INSERT:
        ++j;
        break;
      case TWIDDLE:
        i += 2;
        j += 2;
        break;
    };
  }
  // invariance and the !loop condition could be used to check that correct
  // sequence was chosen

  ret = table[0][0].weight;
  for( unsigned int i = 0; i < start.size()+1; i++ )
  {
    delete [] table[i];
  }
  delete [] table;
  return ret;
}
////////////////////////////////////////////////////////////////////////////////
/// @fn cell_weight
/// @brief Calculate the minimum possible weight for the subproblem
/// @pre the table must be filled in the positive pos_i and positive pos_o
///      direction
/// @post The minimum possible weight for the subproblem will be saved in
///       t[pos_i][pos_o]
/// @param t table of minimum weights and corresponding transformations
/// @param input the vector<generic> that is started with (the input doesn't
///        actually get modified)
/// @param output the vector<generic> that the input vector is turned into
/// @param pos_i unsigned int. The position into the input vector; this splits
///        the problem into a subproblem of finding the min weight of
///        input[pos_i..end]
/// @param pos_o unsigned int. The position into the output vector; this
///        splits the problem into a subproblem of finding the min weight of
///        output[pos_o..end]
////////////////////////////////////////////////////////////////////////////////

template<class generic>
void Edit_Dist<generic>::cell_weight(
  _ed_table_cell ** t,
  const std::vector<generic> & input,
  const std::vector<generic> & output,
  unsigned int pos_i,
  unsigned int pos_o
)
{
  unsigned int curr, min = UINT_MAX;
  Trans_types curr_best_trans = COPY;

  // check to see if at end
  if (pos_i < input.size() || pos_o < output.size())
  {
    // Insert
    if (pos_o < output.size())
    {
      curr = INSERT_WEIGHT + t[pos_i][pos_o+1].weight;
      if (curr < min)
      {
        curr_best_trans = INSERT;
        min = curr;
      }
    }

    // Copy
    if (
      pos_i < input.size() && pos_o < output.size() &&
      input[pos_i] == output[pos_o]
    )
    {
      curr = COPY_WEIGHT + t[pos_i+1][pos_o+1].weight;
      if (curr < min)
      {
        curr_best_trans = COPY;
        min = curr;
      }
    }

    // Delete
    if (pos_i < input.size())
    {
      curr = DELETE_WEIGHT + t[pos_i+1][pos_o].weight;
      if (curr < min)
      {
        curr_best_trans = DELETE;
        min = curr;
      }
    }

    // Twiddle
    if (
      pos_i + 1 < input.size() && pos_o + 1 < output.size() &&
      input[pos_i] == output[pos_o+1] && output[pos_o] == input[pos_i+1]
    )
    {
      curr = TWIDDLE_WEIGHT + t[pos_i+2][pos_o+2].weight;
      if (curr < min)
      {
        curr_best_trans = TWIDDLE;
        min = curr;
      }
    }
  }
  else
  {
    // Base case: both vectors are exhausted, so nothing is left to transform.
    // Without this, UINT_MAX would be stored and later additions would wrap.
    min = 0;
  }

  // Write finding to table
  t[pos_i][pos_o].weight = min;
  t[pos_i][pos_o].t_type = curr_best_trans;
  // std::cerr << min << std::endl;
}


////////////////////////////////////////////////////////////////////////////////
/// @fn clear
/// @brief clear internal data
/// @pre none
/// @post clears internal data
////////////////////////////////////////////////////////////////////////////////
template<class generic>
void Edit_Dist<generic>::clear()
{
  min_trans_list.clear();
  curr_trans_list.clear();
}

////////////////////////////////////////////////////////////////////////////////
/// @fn size
/// @brief get the size of the transformation list
/// @pre None
/// @post get the size of the transformation list
/// @return size of the transformation list
////////////////////////////////////////////////////////////////////////////////
template<class generic>
unsigned int Edit_Dist<generic>::size()
{
  return min_trans_list.size();
}
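
The table fill performed by it_min_weight and cell_weight can be hard to follow in a long listing, so here is a minimal standalone sketch of the same recurrence. It is not the report's Edit_Dist class: the function name min_edit_weight and the use of std::string and nested std::vector in place of the raw _ed_table_cell table are assumptions made for illustration. The weights are the ones defined in edit_distance_trans.h.

```cpp
#include <algorithm>
#include <climits>
#include <cstddef>
#include <string>
#include <vector>

// Weights as defined in edit_distance_trans.h.
const unsigned int kCopyWeight    = 1;
const unsigned int kDeleteWeight  = 3;
const unsigned int kInsertWeight  = 3;
const unsigned int kTwiddleWeight = 2;

// Hypothetical helper (not part of the report's code): returns the minimum
// weight needed to turn `in` into `out` using copy, delete, insert, twiddle.
unsigned int min_edit_weight(const std::string & in, const std::string & out)
{
  const std::size_t n = in.size();
  const std::size_t m = out.size();

  // t[i][j] = minimum weight to transform in[i..n) into out[j..m).
  // The extra row and column hold the "nothing left" cases.
  std::vector<std::vector<unsigned int> > t(
    n + 1, std::vector<unsigned int>(m + 1, 0));

  // Fill from the bottom-right corner towards t[0][0], so every cell that
  // a step refers to (larger i and/or larger j) is already computed.
  for (std::size_t i = n + 1; i-- > 0; )
  {
    for (std::size_t j = m + 1; j-- > 0; )
    {
      if (i == n && j == m)
      {
        continue; // base case: both strings exhausted, weight stays 0
      }

      unsigned int best = UINT_MAX;
      if (j < m)
      {
        best = std::min(best, kInsertWeight + t[i][j + 1]);
      }
      if (i < n)
      {
        best = std::min(best, kDeleteWeight + t[i + 1][j]);
      }
      if (i < n && j < m && in[i] == out[j])
      {
        best = std::min(best, kCopyWeight + t[i + 1][j + 1]);
      }
      if (i + 1 < n && j + 1 < m && in[i] == out[j + 1] && in[i + 1] == out[j])
      {
        best = std::min(best, kTwiddleWeight + t[i + 2][j + 2]);
      }
      t[i][j] = best;
    }
  }
  return t[0][0];
}
```

With these weights, min_edit_weight("ab", "ba") evaluates to 2 (a single twiddle) rather than the 7 that delete, copy, insert would cost. Using std::vector here also removes the need for the manual delete [] cleanup, at the price of not storing the chosen transformation in each cell.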


edit_distance_trans.h
////////////////////////////////////////////////////////////////////////////////
/// @file edit_distance_trans.h
/// @author Aaron Wilhelm
/// @brief This manages the transform list and weight sums for the list of
///        transforms
////////////////////////////////////////////////////////////////////////////////

#ifndef EDIT_DISTANCE_TRANS_H
#define EDIT_DISTANCE_TRANS_H

#include <list>
#include <vector>

// Weights of different transforms
#define COPY_WEIGHT    1
#define DELETE_WEIGHT  3
#define INSERT_WEIGHT  3
#define TWIDDLE_WEIGHT 2

// Different types of transformations
enum Trans_types
{
  COPY = 0,
  DELETE,
  INSERT,
  TWIDDLE
};

////////////////////////////////////////////////////////////////////////////////
/// @fn get_weight
/// @brief get the weight of a transformation
/// @param a the type of transformation that you want the weight of
/// @pre none
/// @post get the weight of a transformation
/// @return the weight of a transformation is returned unless a is invalid, in
///         which case zero is returned
////////////////////////////////////////////////////////////////////////////////
inline unsigned int get_weight(Trans_types a)
{
  switch (a)
  {
    case COPY:    return COPY_WEIGHT;
    case DELETE:  return DELETE_WEIGHT;
    case INSERT:  return INSERT_WEIGHT;
    case TWIDDLE: return TWIDDLE_WEIGHT;
  };
  return 0; // Error
}

////////////////////////////////////////////////////////////////////////////////
/// @class Transform_list
/// @brief Class to manage the list of transformations
////////////////////////////////////////////////////////////////////////////////
class Transform_list
{
  public:
    Transform_list();
    Transform_list & operator=(Transform_list & A);
    void push_back(Trans_types a);
    void pop_back();
    unsigned int weight();
    unsigned int size();
    void clear();

    std::vector<Trans_types> list;
    unsigned int total_weight;
};

#endif // EDIT_DISTANCE_TRANS_H


edit_distance_trans.cpp

#include "edit_distance_trans.h"

////////////////////////////////////////////////////////////////////////////////
/// @fn Transform_list
/// @brief default constructor
/// @pre none
/// @post none
////////////////////////////////////////////////////////////////////////////////
Transform_list::Transform_list()
{
  total_weight = 0;
}


////////////////////////////////////////////////////////////////////////////////
/// @fn operator=
/// @brief copy a list of transformations
/// @param A the list you want to copy from
/// @pre none
/// @post This object will contain the same data as A
/// @return this object
////////////////////////////////////////////////////////////////////////////////
Transform_list & Transform_list::operator=(Transform_list & A)
{
  total_weight = A.total_weight;
  list = A.list;
  return *this;
}
////////////////////////////////////////////////////////////////////////////////
/// @fn push_back
/// @brief add a transformation to the end of the list
/// @param a the transformation you want added to the list
/// @pre none
/// @post transformation added to the end of the list
////////////////////////////////////////////////////////////////////////////////
void Transform_list::push_back(Trans_types a)
{
  list.push_back(a);
  total_weight += get_weight(a);
}

////////////////////////////////////////////////////////////////////////////////
/// @fn pop_back
/// @brief remove last transformation from list
/// @pre must not be empty
/// @post last transformation removed from list
////////////////////////////////////////////////////////////////////////////////
void Transform_list::pop_back()
{
  Trans_types end;
  end = list.back();
  list.pop_back();
  // Subtract the removed transformation's weight from the running total
  total_weight -= get_weight(end);
}

////////////////////////////////////////////////////////////////////////////////
/// @fn size
/// @brief get size of the list
/// @pre none
/// @post size of list gotten
/// @return the size of the list
////////////////////////////////////////////////////////////////////////////////
unsigned int Transform_list::size()
{
  return list.size();
}
////////////////////////////////////////////////////////////////////////////////
/// @fn weight
/// @brief get weight of list
/// @pre none
/// @post weight of list received
/// @return the weight of the list
////////////////////////////////////////////////////////////////////////////////
unsigned int Transform_list::weight()
{
  return total_weight;
}
////////////////////////////////////////////////////////////////////////////////
/// @fn clear
/// @brief clear the list
/// @pre none
/// @post list cleared
////////////////////////////////////////////////////////////////////////////////
void Transform_list::clear()
{
  total_weight = 0;
  list.clear();
}
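
Transform_list keeps a running total instead of summing the weights on every call to weight(), which only works if push_back and pop_back adjust the total symmetrically. The sketch below illustrates that bookkeeping in isolation. Weighted_list and recomputed_weight are hypothetical names introduced for this example, and the weights are written as literals rather than the header's macros so that the block is self-contained.

```cpp
#include <cstddef>
#include <vector>

enum Trans_types { COPY = 0, DELETE, INSERT, TWIDDLE };

// Same weights as the macros in edit_distance_trans.h, written as literals.
inline unsigned int get_weight(Trans_types a)
{
  switch (a)
  {
    case COPY:    return 1;
    case DELETE:  return 3;
    case INSERT:  return 3;
    case TWIDDLE: return 2;
  }
  return 0; // invalid transformation
}

// Hypothetical stand-in for Transform_list, reduced to the members needed
// to show how the running total is maintained.
struct Weighted_list
{
  std::vector<Trans_types> list;
  unsigned int total_weight;

  Weighted_list() : total_weight(0) {}

  void push_back(Trans_types a)
  {
    list.push_back(a);
    total_weight += get_weight(a); // add the new element's weight
  }

  // Precondition: the list must not be empty.
  void pop_back()
  {
    total_weight -= get_weight(list.back()); // subtract, do not overwrite
    list.pop_back();
  }
};

// Recomputes the sum from scratch; it should always equal total_weight.
inline unsigned int recomputed_weight(const Weighted_list & l)
{
  unsigned int sum = 0;
  for (std::size_t i = 0; i < l.list.size(); ++i)
  {
    sum += get_weight(l.list[i]);
  }
  return sum;
}
```

Comparing total_weight against recomputed_weight after a mix of pushes and pops is a cheap way to check that the cached total has not drifted, for instance if pop_back assigned the removed weight instead of subtracting it.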
