Contributions
Developed methods for:
  Identifying new genes
  Constructing evolutionary trees
  Comparing phylogenetic solution space
Biological Sequences
DNA (gene), RNA, protein
Example DNA sequence: cgttaacaaagc...
Prof. Rushen Chahal
Example protein sequence: MAEKPKLH...
Main Tasks
Evolutionary Trees
[Figure: evolutionary tree drawn along a time axis]
Putting it together
[Figure labels: new sequences, database, related sequences, relationships, evolution, "?"]
Goal: Discover previously unknown genes.
Strategy: Design PCR primers for a large set of known gene family members, so that unknown genes will (hopefully) be amplified as well.
Gene Family
herpesEC crnvHH2 cmvHH3 humfMLF humIL8 ratANG ratG10d bovLOR1 chkGPCR RBS11 humSSR1 gpPAF dogRDC1 ratODOR musdelto musP2u humC5a chkP2y ratBK2 humTHR ratRTA humMRG ratLH bovOP humMAS humEDG1 ratCGPCR ratNPYY1 ratPOT ratNK1 humACTH flyNK humMSH flyNPY musEP3 musGIR humTXA2 ratCCKA dogAd1 ratNTR musEP2 musTRH humD2 musGnRH dogCCKB humA2a musGRP ratV1a hamA1a bovETA hamB2 ratD1 hum5HT1a bovH1 humM1 humRSC
Primers
Primer Group
herpesEC
crnvHH2 humRSC cmvHH3 humfMLF humIL8 ratG10d ratANG bovLOR1 chkGPCR RBS11 humSSR1 gpPAF dogRDC1 musdelto musP2u humC5a chkP2y ratBK2
ratODOR
?
ratLH bovOP humEDG1 ratCGPCR ratPOT humACTH humMSH musEP3 humTXA2 musEP2
?
ratCCKA dogAd1 humD2 humA2a hamA1a ratD1 hamB2 hum5HT1a bovH1 humM1 ratNPYY1 ratNK1 flyNK flyNPY musGIR ratNTR musTRH musGnRH musGRP ratV1a bovETA
dogCCKB
Approaches
Exact algorithms:
  exhaustive (brute-force)
  branch-and-bound
Provably-good heuristics:
  solution quality: within a factor of log(# sequences) of OPT
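A logarithmic guarantee of this kind is characteristic of the greedy set-cover heuristic, so the sketch below illustrates that technique; whether it matches the exact heuristic behind these slides is an assumption. The function name `greedy_primer_cover`, the exact-substring notion of a primer "covering" a sequence, and the toy sequences are all hypothetical simplifications.

```python
def greedy_primer_cover(sequences, primer_len):
    """Greedy set cover: repeatedly pick the k-mer that covers the most
    still-uncovered sequences. Returns (chosen primers, uncovered indices)."""
    # Assumption: a primer covers a sequence if it occurs in it exactly.
    covers = {}
    for idx, seq in enumerate(sequences):
        for i in range(len(seq) - primer_len + 1):
            covers.setdefault(seq[i:i + primer_len], set()).add(idx)

    uncovered = set(range(len(sequences)))
    chosen = []
    while uncovered and covers:
        # sorted() makes tie-breaking deterministic (lexicographic order).
        best = max(sorted(covers), key=lambda p: len(covers[p] & uncovered))
        gained = covers[best] & uncovered
        if not gained:
            break  # the rest are shorter than primer_len and cannot be covered
        chosen.append(best)
        uncovered -= gained
    return chosen, uncovered


# Toy usage with made-up sequences (not real gene family members).
toy = ["acgtac", "ttacgg", "ggacgt", "cccccc"]
primers, missed = greedy_primer_cover(toy, primer_len=3)
print(primers, missed)
```

Real primer design also has to respect mismatches, melting temperature, and primer pairing, none of which this sketch models.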
Sample Output
herpesEC
crnvHH2 humRSC cmvHH3 humfMLF humIL8 ratG10d ratANG bovLOR1 chkGPCR RBS11 humSSR1 gpPAF dogRDC1 musdelto musP2u humC5a chkP2y ratBK2
ratODOR
ratLH
bovOP
ratCCKA
dogCCKB
dogAd1 humD2 humA2a hamA1a ratD1 hamB2 hum5HT1a bovH1 humM1
ratNPYY1 ratNK1 flyNK flyNPY musGIR ratNTR musTRH musGnRH musGRP ratV1a bovETA
Evolution
[Figure label: tree cost]
Previous Approaches
Fitch-Margoliash [1967]
Neighbor-Joining [1987]
Quartet-Puzzling [1997]
Split-Decomposition [1995]
PAUP [1998], PHYLIP [1993]
However...
Topologically distant solutions may exist
[Figure: example trees over leaves 1-6 with nearly equal costs 0.2561, 0.2560, and 0.2562]
NeighborNeighbor-Joining Method
[Figure: neighbor-joining steps illustrated on a five-leaf example (leaves 1-5)]
[Figure: two plots against least-squares cost, one of topological distance and one of number of solutions; legend: exhaustive, and Q, D = 50, 0; 45, 5; 25, 25; 5, 45; 0, 50]
[Figure: solution cost (axis ticks 10^-8 to 10^-4) for K=20 and K=100, in two panels labeled LS and ME]
Generate candidates: O(K N^2)

K       20     50     100    200    500
N=8     0.08   0.2    0.5    1.1    3.1
N=16    0.8    2.1    4.4    8.8    24.2
N=32    9.8    25.1   52.1   103.7  262.7

Select candidates: O(K N^2 (lg K + lg N))
Summary
Evolution
Refereed Publications
Pearson, W. R., Robins, G., and Zhang, T., Generalized Neighbor-Joining: More Reliable Phylogenetic Tree Reconstruction, to appear in Journal of Molecular Biology and Evolution.
Pearson, W. R., Robins, G., Wrege, D. E., and Zhang, T., On the Primer Selection Problem for Polymerase Chain Reaction Experiments, Discrete and Applied Mathematics, Vol. 71, 1996, pp. 231-246.
Pearson, W. R., Robins, G., Wrege, D. E., and Zhang, T., A New Approach to Primer Selection in Polymerase Chain Reaction Experiments, Proc. International Conference on Intelligent Systems for Molecular Biology, Cambridge, England, July, 1995, pp. 285-291.
Griffith, J., Robins, G., Salowe, J. S., and Zhang, T., Closing the Gap: Near-Optimal Steiner Trees in Polynomial Time, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 13, No. 11, November 1994, pp. 1351-1365.
Barrera, T., Griffith, J., McKee, S. A., Robins, G., and Zhang, T., Toward a Steiner Engine: Enhanced Serial and Parallel Implementations of the Iterated 1-Steiner MRST Algorithm, Proc. Great Lakes Symposium on VLSI, Kalamazoo, MI, March 1993, pp. 90-94.
Barrera, T., Griffith, J., Robins, G., and Zhang, T., Narrowing the Gap: Near-Optimal Steiner Trees in Polynomial Time, Proc. IEEE International ASIC Conference, Rochester, September 1993, pp. 87-90.
Generalization
Generate Partial Solutions: n-i & i
Evaluate Partial Solutions: Parsimony, Least-Squares
Select Partial Solutions: Prefer distant trees
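Preferring distant trees requires some measure of topological distance. The slides do not say which metric is used, so the sketch below swaps in the Robinson-Foulds (split) distance as one common choice. The nested-tuple tree encoding and the names `splits` and `rf_distance` are hypothetical.

```python
def splits(tree):
    """Return the non-trivial leaf bipartitions induced by the internal
    edges of a tree given as nested tuples with string leaves."""
    found = set()

    def leaves_below(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        below = frozenset().union(*(leaves_below(child) for child in node))
        found.add(below)
        return below

    all_leaves = leaves_below(tree)
    ref = min(all_leaves)  # reference leaf used to pick one side of each split
    result = set()
    for side in found:
        if ref in side:
            side = all_leaves - side  # canonical side: the one without ref
        if 1 < len(side) < len(all_leaves) - 1:
            result.add(side)  # keep only splits with two or more leaves per side
    return result


def rf_distance(t1, t2):
    """Number of splits present in exactly one of the two trees."""
    return len(splits(t1) ^ splits(t2))


# Two five-leaf trees that disagree on where leaf C attaches.
t1 = (("A", "B"), "C", ("D", "E"))
t2 = (("A", "C"), "B", ("D", "E"))
print(rf_distance(t1, t2))
```

A selection step could use such a distance to keep candidates that are both low in cost and far apart from each other, though how the two goals are balanced is not specified here.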
Future Work
Generalize GNJ:
  other optimality criteria
  other solution space sampling
  alternative topological distance metrics
Generalized Neighbor-Joining
Input: a set of leaves S, the distance matrix over S
Output: a set of possible phylogenetic trees for S
1. T = {t}, where t is the star-tree over S
2. Repeat
     T* ← all next-step trees derived from T
     T ← select up to K trees from T*
   Until (all trees in T are fully resolved)
3. Output T
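The loop above can be sketched as a beam search over partial trees. This is a minimal illustration under stated assumptions, not the authors' implementation: candidates are ranked by the standard neighbor-joining Q-criterion pooled across all partial solutions, branch lengths are not estimated, and no preference for distant trees is applied. The names `pair`, `nj_candidates`, `join`, and `generalized_nj` are hypothetical.

```python
from itertools import combinations


def pair(a, b):
    """Order-independent key for the distance between two nodes."""
    return frozenset((a, b))


def nj_candidates(nodes, dist):
    """Yield (q, a, b) for every pair of active nodes; lower q is preferred."""
    n = len(nodes)
    row = {a: sum(dist[pair(a, c)] for c in nodes if c != a) for a in nodes}
    for a, b in combinations(nodes, 2):
        yield (n - 2) * dist[pair(a, b)] - row[a] - row[b], a, b


def join(nodes, dist, a, b):
    """Merge a and b into a new internal node; return reduced (nodes, dist)."""
    new = tuple(sorted((a, b), key=repr))  # nested tuples record the topology
    rest = [c for c in nodes if c not in (a, b)]
    new_dist = {pair(x, y): dist[pair(x, y)] for x, y in combinations(rest, 2)}
    for c in rest:
        new_dist[pair(new, c)] = 0.5 * (
            dist[pair(a, c)] + dist[pair(b, c)] - dist[pair(a, b)])
    return rest + [new], new_dist


def generalized_nj(leaves, dist, k):
    """Keep up to k partial solutions per step (k=1 is plain greedy NJ)."""
    beam = [(list(leaves), dict(dist))]
    while len(beam[0][0]) > 3:  # three remaining nodes: fully resolved
        scored = [(q, nodes, d, a, b)
                  for nodes, d in beam
                  for q, a, b in nj_candidates(nodes, d)]
        scored.sort(key=lambda s: s[0])
        beam, seen = [], set()
        for _, nodes, d, a, b in scored:
            new_nodes, new_d = join(nodes, d, a, b)
            key = frozenset(new_nodes)
            if key not in seen:  # skip duplicate partial solutions
                seen.add(key)
                beam.append((new_nodes, new_d))
            if len(beam) == k:
                break
    return [tuple(sorted(nodes, key=repr)) for nodes, _ in beam]


# Toy additive distances for the unrooted tree with cherries (A,B) and (C,D).
leaves = ["A", "B", "C", "D"]
d = {pair("A", "B"): 3, pair("A", "C"): 5, pair("A", "D"): 6,
     pair("B", "C"): 6, pair("B", "D"): 7, pair("C", "D"): 3}
for tree in generalized_nj(leaves, d, k=2):
    print(tree)
```

Each returned tuple lists the three nodes joined at the center of an unrooted tree. Replacing the Q-based ranking with a parsimony or least-squares score would be the natural place to plug in other evaluation criteria.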