
Jimmy Mu

Victor Shih
CS100 DNA README

Evaluation:

1. A.
The algorithm in SimpleStrand.cutAndSplice is O(N), where N is the splice length, based on
the data gathered from DNABenchMark. We ran the benchmark several times and graphed splice
length vs. time. At lower splice lengths the time does not appear to be correlated with
length, but at larger splice lengths the time taken depends heavily on splice length. A
likely reason is that at lower splice lengths the cutAndSplice method finishes so quickly
that it accounts for little of the runtime, and the measured time is instead dominated by
other parts of the code. At large splice lengths, the program spends much more of its time
in cutAndSplice. Based on the data from several runs of DNABenchMark using ecoli.dat, we
arrived at a linear relationship for splice length vs. time with the trend line equation
Time = 4*10^-7(Length) + 0.009. The R-squared value for this fit is 0.9966. The graph of
this follows.
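
To illustrate why a string-based cutAndSplice would scale with splice length, here is a
minimal sketch. It is not the SimpleStrand code from the assignment; the class name
CutAndSpliceSketch and the method signature are assumptions made for illustration. Each
break copies the whole splicee into the output buffer, so the work per break grows
linearly with the splicee's length.

```java
class CutAndSpliceSketch {
    // Hypothetical stand-in for a string-based cutAndSplice (not the course code):
    // every occurrence of enzyme in strand is replaced by splicee.
    static String cutAndSplice(String strand, String enzyme, String splicee) {
        if (enzyme.isEmpty()) {
            throw new IllegalArgumentException("enzyme must not be empty");
        }
        StringBuilder out = new StringBuilder();
        int start = 0;
        int pos;
        while ((pos = strand.indexOf(enzyme, start)) >= 0) {
            out.append(strand, start, pos); // copy the text before this break
            out.append(splicee);            // copies splicee.length() characters
            start = pos + enzyme.length();  // continue after the enzyme
        }
        out.append(strand, start, strand.length()); // copy the tail
        return out.toString();
    }

    public static void main(String[] args) {
        // prints "aaXYttXYgg"
        System.out.println(cutAndSplice("aagaattcttgaattcgg", "gaattc", "XY"));
    }
}
```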

B.
At the -Xmx512M heap-size setting, my computer could process splice strings up to 65,536
characters in length using the ecoli.dat file and DNABenchMark.java. I reached this
conclusion by configuring the DNABenchMark program to run with -Xmx512M and recording the
output. The final splice string of 65,536 characters took 0.585 seconds to run. At
-Xmx1024M, the final splice length doubled to 131,072 and the run time also roughly
doubled, to 0.969 seconds. At -Xmx2048M, the final splice length doubled again to 262,144,
and the run time roughly doubled again, to 2.199 seconds.
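
One way to confirm that a heap setting actually took effect is to print the JVM's maximum
heap at startup. The sketch below uses the standard Runtime.maxMemory() call; the class
name HeapCheck is hypothetical and is not part of the assignment code.

```java
class HeapCheck {
    // Maximum heap the JVM will try to use, in megabytes.
    // The value may be somewhat below the -Xmx setting, depending on the JVM.
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMegabytes() + " MB");
    }
}
```

Running it as `java -Xmx512M HeapCheck` and again with `-Xmx1024M` should show the
reported limit changing with the flag.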
2. The code for our LinkSplice program is included in the submission.
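
For readers without the submission at hand, the general linked-list idea can be sketched
as follows. This is not the submitted code: the class LinkedStrandSketch, its Node class,
and its method names are all assumptions made for illustration. The point it shows is
that each break appends a node holding a reference to the same splicee string, so the
splice itself costs constant time per break regardless of the splicee's length. Finding
the breaks still requires scanning the original strand.

```java
class LinkedStrandSketch {
    // One segment of the strand; nodes are chained in order.
    private static final class Node {
        final String info;
        Node next;
        Node(String info) { this.info = info; }
    }

    private Node first;
    private Node last;
    private long size;      // total number of characters
    private int nodeCount;  // number of segments

    // Constant time: stores a reference to segment, no characters are copied.
    void append(String segment) {
        Node n = new Node(segment);
        if (first == null) { first = n; } else { last.next = n; }
        last = n;
        size += segment.length();
        nodeCount++;
    }

    // Hypothetical linked-list cutAndSplice: one shared splicee node per break.
    static LinkedStrandSketch cutAndSplice(String strand, String enzyme, String splicee) {
        if (enzyme.isEmpty()) {
            throw new IllegalArgumentException("enzyme must not be empty");
        }
        LinkedStrandSketch result = new LinkedStrandSketch();
        int start = 0;
        int pos;
        while ((pos = strand.indexOf(enzyme, start)) >= 0) {
            if (pos > start) { result.append(strand.substring(start, pos)); }
            result.append(splicee); // same String object reused at every break
            start = pos + enzyme.length();
        }
        if (start < strand.length()) { result.append(strand.substring(start)); }
        return result;
    }

    long size() { return size; }
    int nodeCount() { return nodeCount; }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder();
        for (Node n = first; n != null; n = n.next) { sb.append(n.info); }
        return sb.toString();
    }

    public static void main(String[] args) {
        LinkedStrandSketch s = cutAndSplice("aagaattcttgaattcgg", "gaattc", "XY");
        System.out.println(s + " in " + s.nodeCount() + " nodes");
    }
}
```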

3. The implementation of LinkSplice allows the program to run in O(B) time, where B is the
number of breaks. To test this, I used ecoli.dat as a base file and created
doublecoli.dat, quadcoli.dat, and octacoli.dat. Each of these .dat files was created by
pasting two copies of the previous file's contents, so ecoli had 646 breaks, doublecoli
had 1292 breaks, quadcoli had 2584 breaks, and octacoli had 5168 breaks. Running each of
these through DNABenchMark gave:
ecoli : 0.040 seconds
doublecoli: 0.083 seconds
quadcoli: 0.162 seconds
octacoli: 0.324 seconds
Even without a graph, it is evident that the running time is linear in B: every time the
number of breaks doubles, the time taken roughly doubles as well. To be sure, when the
data are plotted in Excel, the trend line equation is Time = 6*10^-5(Breaks) + 0.00007
with an R-squared value of 0.9999. The graph is shown below.
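
The trend line can also be checked without a spreadsheet. The sketch below runs an
ordinary least-squares fit over the four (breaks, seconds) pairs listed above; the class
name BreaksFit and its fit method are hypothetical. Because the listed timings are
rounded to three decimals, the fitted coefficients need not match the Excel trend line
exactly, though the slope should come out close to 6*10^-5.

```java
class BreaksFit {
    // Ordinary least-squares line through (x[i], y[i]); returns {slope, intercept}.
    static double[] fit(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double sxx = 0.0;
        double sxy = 0.0;
        for (int i = 0; i < n; i++) {
            sxx += (x[i] - meanX) * (x[i] - meanX);
            sxy += (x[i] - meanX) * (y[i] - meanY);
        }
        double slope = sxy / sxx;
        return new double[] { slope, meanY - slope * meanX };
    }

    public static void main(String[] args) {
        // Break counts and timings as reported in this README.
        double[] breaks = { 646, 1292, 2584, 5168 };
        double[] seconds = { 0.040, 0.083, 0.162, 0.324 };
        double[] line = fit(breaks, seconds);
        System.out.printf("Time = %.3e * Breaks + %.5f%n", line[0], line[1]);
    }
}
```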

Reflection:

This lab was very informative and helped me understand linked lists much better,
specifically the node approach that we used in the programming. It also furthered my
understanding of how to debug a program using J-list as well as how to configure heap sizes
through Eclipse. My partner was very helpful and contributed a large amount of work to this
project; together we were able to finish this lab relatively quickly.

Time: ~7 hours
Help:

TA Sander
