Domain Interaction Prediction Methods: ACM Journal Name, Vol. V, No. N, Month 20YY, Pages 10??

Domain interaction prediction methods
Manoj Gajjarapu
Protein-protein interactions are requisite to most cellular mechanisms, ranging from DNA replication, transcription, splicing, the assembly of cytoskeletal elements, the formation of multi-protein complexes, and translation to secretion, metabolism, and cell cycle control. Interpreting protein-protein interactions is crucial for investigating intracellular signalling pathways, simulating complex protein structures, and gaining insight into many biochemical processes. Recent progress in high-throughput experiments has led to the identification of many protein interactions. Analysing these interactions requires the prediction of domain interactions: the predicted domains help reveal the structural aspects of protein interactions and thereby ultimately lead to their functional identification. A database of the predicted domains should be maintained to organise all the datasets. This survey reviews selected protein-protein interaction identification methods and domain-domain interaction prediction methods, together with research on the resulting domain datasets.
Contents
1 INTRODUCTION
  1.1 Introduction
2 OVERVIEW OF RESEARCH
  2.1 Methods To Predict Domain Interactions
    2.1.1 Inferring domain-domain interactions from protein-protein interactions
    2.1.2 An integrated approach to the prediction of domain-domain interactions
    2.1.3 Predicting domain-domain interactions using a parsimony approach
    2.1.4 Predicting protein-protein interactions from protein domains using a set cover approach
    2.1.5 Interrogating domain-domain interactions with parsimony based approaches
    2.1.6 Computational approaches to predict protein-protein and domain-domain interactions
    2.1.7 Summary
  2.2 Analysis and Improvement of Existing Methods
    2.2.1 Improving domain-based protein interaction prediction using biologically-significant negative dataset
    2.2.2 Analysis on multi-domain cooperation for predicting protein-protein interactions
    2.2.3 Summary
  2.3 Datasets Developed
    2.3.1 Comparative analysis and unification of domain-domain interaction networks
    2.3.2 Domine: a comprehensive collection of known and predicted domain-domain interactions
    2.3.3 Summary
3 CONCLUDING COMMENTS
4 ANNOTATIONS
  4.1 Björkholm et al. 2009
  4.2 Deng et al. 2002
  4.3 Guimarães et al. 2006
  4.4 Guimarães et al. 2008
  4.5 Huang et al. 2007
  4.6 Jothi et al. 2008
  4.7 Lee et al. 2006
  4.8 Li et al. 2006
  4.9 Wang et al. 2007
  4.10 Yellaboina et al. 2011
1. INTRODUCTION
1.1 Introduction
Protein interactions are of biological importance because they direct several cellular processes, such as metabolic pathways and immunological identification. Protein-protein interactions are also involved in the regulation and control of different cellular processes. Almost all modifications of proteins involve transient protein-protein interactions, such as those in signal transduction, which last only for a short period of time. Domains are portions of proteins that are often stable and perform a specific function; proteins are therefore assumed to interact through their interacting domains. This survey reviews existing techniques for predicting domain interactions and shows the importance of domain interaction prediction for identifying protein interactions and for the further development of a domain database. Deng et al. [2002] inferred domain interactions from protein interactions using novel approaches; several years later, Jothi et al. [2008] presented computational approaches to identify domain-domain interactions. High-throughput experiments yield large amounts of data on predicted proteins and domains, so these datasets need to be organized and stored systematically in a database. To serve this purpose, Björkholm et al. [2009] developed a unification of domain interaction networks and Yellaboina et al. [2011] developed datasets of domains. The research papers for the survey were found using Google Scholar, ACM and IEEE. To provide an overview, the 10 journal papers judged most crucial were selected from Google Scholar and annotated.
The paper of Deng et al. [2002] appears to be the first to define and implement a maximum likelihood estimation method to infer domain interactions. Lee et al. [2006] present high-confidence domain interactions obtained by calculating probabilities. Guimarães et al. [2006] present a novel approach based on parsimony, which Guimarães et al. [2008] later generalized. Building on the predicted domains, Björkholm et al. [2009] and Yellaboina et al. [2011] developed datasets. The six papers on novel approaches are grouped into one section according to their domain prediction methods; that section portrays various approaches to predicting domain and protein interactions. The papers on analysis and improvement of domain prediction methods form a second section, and the papers on developed datasets form a third, which discusses databases and datasets of domains comprehensively.
2. OVERVIEW OF RESEARCH
This section presents the annotated papers and is divided into three subsections. The first deals with novel approaches to predicting domain interactions. The second covers the analysis and improvement of the existing methods, and the third describes the datasets developed from predicted domains.
2.1 Methods To Predict Domain Interactions
This section defines novel approaches for predicting domain interactions and discusses the experiments conducted by the different authors.

2.1.1 Inferring domain-domain interactions from protein-protein interactions. With the advancement of genomic technology and genome-wide analysis, more and more organisms are being studied extensively for gene expression on a global scale, and evaluating them requires functional evaluation. Because domains are regarded as the functional units of proteins, predicting domain interactions helps in predicting protein-protein interactions, which in turn supports drug design and the search for cures to diseases. Expression profiling has been used to analyze gene functions since the completion of the genome sequence of the budding yeast Saccharomyces cerevisiae. Many researchers have undertaken the functional analysis of the yeast genome, which comprises ~6280 proteins, roughly one-third of which have no known function. This allowed Deng et al. [2002] to study large-scale conserved patterns of interactions between protein domains and to analyze protein function further.
Deng et al. [2002] applied a maximum likelihood estimation (MLE) method to infer interactions between pairs of domains and measured the prediction accuracy at the protein level. They implemented an expectation-maximization (EM) algorithm to compute the maximum likelihood estimates, obtaining probabilities of domain interactions that are consistent with the observed protein-protein interactions. The authors took into account the multiplicity of observations in the datasets, and they also used the Association method as a basis for comparison.
Deng et al. [2002] report two sets of results, one for the Association method and one for the MLE method. For the Association method they claim 55.5% specificity and 55.0% sensitivity at a threshold of 0.65 on the combined datasets; for the MLE method they report an fn value greater than or equal to 0.64. They claim that their probabilistic model and MLE method are robust enough to incorporate various kinds of protein datasets, and that the prediction rate of their method is ~100 times better than random assignment when predicting protein interactions in MIPS.

2.1.2 An integrated approach to the prediction of domain-domain interactions. Lee et al. [2006] focus on integrating multiple data sources from multiple species to predict high-confidence domain interactions by calculating the probability of domain interaction in each species.
Lee et al. [2006] carried out the experiments in steps: first they collected multiple datasets, then they investigated information on protein fusion and domain functions, and finally they applied a Bayesian approach to integrate the data sources. They state that they found domain interactions conserved across multiple species. They also claim to have developed a new measure to score domain-domain interactions directly, instead of relying on indirect validation such as re-inferring protein interactions.

2.1.3 Predicting domain-domain interactions using a parsimony approach. Guimarães et al. [2006] refer to the work of Deng et al. [2002]. Many domain-domain interaction prediction methods tie the goal of predicting domain interactions to that of predicting protein-protein interactions. Deng et al. [2002] defined a probabilistic maximum likelihood model for domain-domain interactions; their expectation-maximization algorithm computes domain interaction probabilities that maximize the likelihood of observing a given protein-protein interaction network.
Guimarães et al. [2006] proposed a novel approach that predicts domain-domain interactions from a protein-protein interaction network through a parsimony-driven explanation of the network. They formulated this Parsimonious Explanation (PE) method as a linear programming optimization problem in which each potential domain-domain contact is a variable taking a value between 0 and 1. This allowed the authors to handle false positives in the protein interaction data in a novel way.
Guimarães et al. [2006] applied the PE method to a protein-protein interaction dataset comprising 26,032 interactions among 11,403 proteins from various organisms. The protein domains were annotated using Pfam hidden Markov model profiles. The authors claim to have obtained high-scoring putative interactions and predicted interaction partners for the Ras and SNARE families of domains, with a PPV of 75.3% and a sensitivity of 76.9%. They claim that their method outperformed previous approaches by a considerable margin and that the results indicate that the parsimony principle provides a correct approach for detecting domain-domain contacts.

2.1.4 Predicting protein-protein interactions from protein domains using a set cover approach. Huang et al. [2007] refer to the work of Deng et al. [2002], Lee et al. [2006] and Li et al. [2006]. Deng et al. [2002] proposed a model for predicting domain interactions based on maximum likelihood estimation, whereas Lee et al. [2006] extended this model so that domain interactions can be predicted with high confidence. Li et al. [2006] proposed a domain-based classification method to predict protein-protein interactions using probabilities of putative interacting domain pairs derived from both experimentally determined interacting protein pairs and carefully chosen non-interacting protein pairs. Existing high-throughput experimental techniques assay protein-protein interactions, yet they do not provide any direct information on the interactions among domains.
Huang et al. [2007] introduced a message passing algorithm by which domain interactions can be studied in more detail. The algorithm takes as input an interaction map among a set of proteins and outputs a list of interaction probabilities between the proteins. It was applied to a yeast dataset with cross-validation.
Huang et al. [2007] claim that their algorithm performed better in cross-validation than the existing algorithms: with the parameter values that minimized the Bethe free energy, the average accuracy over 10 folds was 82%, with sensitivity and specificity of 79% and 85% respectively. They also claim that the algorithm can be applied to infer domain interactions from large-scale protein-protein interaction data.

2.1.5 Interrogating domain-domain interactions with parsimony based approaches. Guimarães et al. [2008] refer to the work of Shoemaker and Panchenko [2007] and Lee et al. [2006]. Lee et al. [2006] improved on previous work with the Integrated Bayesian method, which estimates the likelihood of a domain interaction from protein interaction networks of different organisms and from the amount of biological evidence relating the two domains, such as co-occurrence of the domains in the same protein and the existence of common GO terms at the functional level.
Building on the earlier Parsimonious Explanation method, the authors introduced the Generalized Parsimonious Explanation (GPE), which adjusts the granularity of the domain definition to the granularity of the input dataset and permits domain interactions to have different costs.
Guimarães et al. [2008] first implemented GPE, which seeks the smallest set of domain interactions that can explain all protein interactions in the network. They then analyzed the role of co-occurring domains in mediating protein interactions, analyzed the top-ranked predictions, and compared the results with those of earlier methods.
Guimarães et al. [2008] state that the objective function employed in GPE allows different costs to be assigned to different types of interaction, and that they used this feature to study the effect of assigning a lower cost to co-occurring domain pairs. With this lower cost, only about 23% of the predicted domain interactions were between co-occurring domains. The authors claim that GPE provides a new means to predict and study domain-domain interactions, and that the domain interactions mediating interactions in the network exhibit a significant deviation in their properties.
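The parsimony idea described above, choosing few domain interactions that together explain every protein interaction, can be illustrated with a minimal sketch. This is not the authors' method: PE and GPE are formulated as linear programs with costs, whereas the sketch below swaps in a greedy set-cover heuristic with unit costs. The function name, the toy proteins and the domain labels are all hypothetical.

```python
def greedy_parsimonious_explanation(protein_domains, interactions):
    """Greedily pick domain pairs until every protein interaction is explained.

    Assumption: a greedy set-cover heuristic with unit costs stands in for
    the linear program used by PE/GPE. An interaction counts as explained
    once at least one of its candidate domain pairs has been chosen.
    """
    # Candidate domain pairs for each protein interaction.
    candidates = {}
    for a, b in interactions:
        candidates[(a, b)] = {
            frozenset((m, n))
            for m in protein_domains[a]
            for n in protein_domains[b]
        }
    # Interactions with no candidate pair can never be explained; skip them.
    unexplained = {edge for edge, pairs in candidates.items() if pairs}
    chosen = []
    while unexplained:
        # Map every domain pair to the unexplained interactions it covers.
        coverage = {}
        for edge in unexplained:
            for pair in candidates[edge]:
                coverage.setdefault(pair, set()).add(edge)
        # Take the pair covering the most interactions; ties break by name.
        best = min(coverage, key=lambda p: (-len(coverage[p]), tuple(sorted(p))))
        chosen.append(best)
        unexplained -= coverage[best]
    return chosen


# Hypothetical toy network: protein -> set of domain labels.
toy_domains = {"P1": {"A", "X"}, "P2": {"B"}, "P3": {"A"}, "P4": {"B", "Y"}}
toy_interactions = [("P1", "P2"), ("P3", "P4"), ("P3", "P2")]
explanation = greedy_parsimonious_explanation(toy_domains, toy_interactions)
print([tuple(sorted(pair)) for pair in explanation])  # prints [('A', 'B')]
```

In the toy network a single domain pair (A, B) explains all three protein interactions, so the alternatives (X, B) and (A, Y) are never selected. A cost-aware variant, closer in spirit to GPE, would divide each pair's coverage by an assigned cost before choosing.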
2.1.6 Computational approaches to predict protein-protein and domain-domain interactions. The idea is to create a tool, GAIA, that combines the primary sequences, domain annotations and structural annotations of proteins. The authors also introduce a new algorithm to predict domain interactions in a given pair of query proteins.
Jothi et al. [2008] tested GAIA against gold-standard positive and negative protein-protein interaction datasets, setting the n-gram length to 4 and the threshold of domain-domain interaction hits to 8.3. They claim that on the gold-standard dataset the algorithm achieved a true positive rate of about 82% and a false positive rate of 21%. They also claim to have identified a list of 4-gram pairs that are significantly over-represented in the DDI dataset and may mediate protein-protein interactions.
Jothi et al. [2008] claim that their results show the localization of interaction hotspots and provide testable hypotheses for experimental validation. Complemented with other prediction methods, this study will help to elucidate the interactomes of cells.
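As a rough illustration of what counting 4-gram pairs involves, the sketch below extracts all length-4 substrings from two sequences and tallies how often each pair of 4-grams co-occurs across a list of sequence pairs. It is only a sketch under stated assumptions: it is not the GAIA algorithm, the function names and the toy sequences are hypothetical, and no significance test for over-representation is included.

```python
from collections import Counter


def ngrams(sequence, n=4):
    """Return the set of all length-n substrings of a sequence."""
    return {sequence[i:i + n] for i in range(len(sequence) - n + 1)}


def count_ngram_pairs(sequence_pairs, n=4):
    """Count, over all sequence pairs, how often two n-grams co-occur.

    Assumption: a pair of n-grams co-occurs when one n-gram appears in the
    first sequence and the other in the second. Each pair is stored in
    sorted order so that (g1, g2) and (g2, g1) share one counter.
    """
    counts = Counter()
    for first, second in sequence_pairs:
        for g1 in ngrams(first, n):
            for g2 in ngrams(second, n):
                counts[tuple(sorted((g1, g2)))] += 1
    return counts


# Hypothetical toy sequences standing in for interacting domain sequences.
toy_pairs = [("ACDEF", "KLMNP"), ("ACDEQ", "KLMNR")]
pair_counts = count_ngram_pairs(toy_pairs)
print(pair_counts[("ACDE", "KLMN")])  # prints 2
```

Pairs with high counts in an interacting set, relative to a background set, would be the candidates for over-representation; how that comparison is made is left open here.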
2.1.7 Summary

| Year | Authors | Title | Papers referred to | Major contribution |
|------|---------|-------|--------------------|--------------------|
| 2002 | Deng et al. | Inferring domain-domain interactions from protein-protein interactions | None | Developed probabilistic MLE method |
| 2006 | Lee et al. | An integrated approach to the prediction of domain-domain interactions | None | Developed measures to score domain-domain interactions |
| 2006 | Guimarães et al. | Predicting domain-domain interactions using a parsimony approach | Deng et al. [2002] | Developed parsimony approach and claimed that it outperformed previous methods |
| 2007 | Huang et al. | Predicting protein-protein interactions from protein domains using a set cover approach | Deng et al. [2002], Lee et al. [2006], Li et al. [2006] | Developed a message passing algorithm to infer protein interactions |
| 2008 | Guimarães et al. | Interrogating domain-domain interactions with parsimony based approaches | Lee et al. [2006], Shoemaker and Panchenko [2007] | Analysed the role of co-occurring domains using the parsimonious explanation method |
| 2008 | Jothi et al. | Computational approaches to predict protein-protein and domain-domain interactions | None | Developed a tool that shows localization of hotspots |

Table I. Papers reviewed in Section 2.1.
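The probabilistic model behind the MLE entry for Deng et al. [2002] in Table I can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: two proteins are assumed to interact if at least one of their domain pairs interacts, observations are assumed to carry a false positive rate fp and a false negative rate fn, and an EM loop re-estimates the domain interaction probabilities. The function names, the default parameter values and the toy data are hypothetical.

```python
from itertools import combinations


def interaction_probability(domain_pairs, lam):
    """P(two proteins interact) = 1 - product of (1 - lam) over domain pairs."""
    p_none = 1.0
    for pair in domain_pairs:
        p_none *= 1.0 - lam[pair]
    return 1.0 - p_none


def em_domain_interactions(protein_domains, observed, fp=0.01, fn=0.5,
                           init=0.1, iterations=50):
    """Estimate domain-pair interaction probabilities with a simple EM loop.

    protein_domains: protein name -> set of domain labels.
    observed: set of frozenset({protein_a, protein_b}) seen to interact.
    fp, fn, init and iterations are illustrative defaults (assumptions).
    """
    # Domain pairs contained in every unordered pair of distinct proteins.
    pair_domains = {}
    for a, b in combinations(sorted(protein_domains), 2):
        pair_domains[frozenset((a, b))] = {
            frozenset((m, n))
            for m in protein_domains[a]
            for n in protein_domains[b]
        }
    lam = {dp: init for dps in pair_domains.values() for dp in dps}
    for _ in range(iterations):
        totals = dict.fromkeys(lam, 0.0)
        counts = dict.fromkeys(lam, 0)
        for proteins, dps in pair_domains.items():
            seen = proteins in observed
            p_int = interaction_probability(dps, lam)
            # Probability of observing an interaction, given noisy data.
            p_obs = p_int * (1.0 - fn) + (1.0 - p_int) * fp
            likelihood = p_obs if seen else 1.0 - p_obs
            weight = (1.0 - fn) if seen else fn
            for dp in dps:
                # E-step: posterior that this domain pair interacts here.
                totals[dp] += lam[dp] * weight / likelihood
                counts[dp] += 1
        # M-step: average the posteriors over all protein pairs.
        lam = {dp: totals[dp] / counts[dp] for dp in lam}
    return lam


# Hypothetical toy data: every protein pair carrying domains A and B interacts.
toy_proteins = {"P1": {"A"}, "P2": {"B"}, "P3": {"C"}, "P4": {"A"}, "P5": {"B"}}
toy_observed = {frozenset(p) for p in
                [("P1", "P2"), ("P1", "P5"), ("P4", "P2"), ("P4", "P5")]}
estimates = em_domain_interactions(toy_proteins, toy_observed)
print(round(estimates[frozenset(("A", "B"))], 3))
```

On the toy data the estimate for the pair (A, B) moves toward 1, while pairs that never appear in an observed interaction, such as (A, C), shrink toward 0.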
2.2 Analysis and Improvement of Existing Methods
This section provides a detailed analysis of the existing methods and describes their further improvement.

2.2.1 Improving domain-based protein interaction prediction using biologically-significant negative dataset. Li et al. [2006] refer to the work of Deng et al. [2002]. A few domain-based interaction detection techniques have recently been proposed. Deng et al. [2002] described a maximum likelihood estimation technique to infer domain-domain interactions, which was then used to predict protein interactions.
Li et al. [2006] proposed a domain-based classification method to predict protein-protein interactions using probabilities of putative interacting domain pairs derived from both experimentally determined interacting protein pairs and carefully chosen non-interacting protein pairs. The idea is to use biologically significant negative data to predict the domain interactions.
Li et al. [2006] claim to have developed methods and algorithms to preprocess biological annotations, together with an algorithm to generate the negative set. For the yeast dataset, they infer domain-domain interactions from both the positive set and the negative set, and report, at the lowest, a specificity of 56.00 and a sensitivity of 84.36 on different datasets. They claim that their experimental results on multiple species show that the probabilistic approach is effective and outperforms other similar domain-based techniques for protein interaction prediction.

2.2.2 Analysis on multi-domain cooperation for predicting protein-protein interactions. Wang et al. [2007] refer to the work of Guimarães et al. [2006] and Lee et al. [2006]. A number of computational algorithms have been developed to infer protein-protein interactions, such as methods based on gene fusion, phylogenetic profiles, protein structure and domain information. In particular, methods that infer protein-protein interactions from domain information, such as the association method, the probabilistic method and SVM-based methods, have attracted much attention owing to their clear biological implication and simplicity.
Wang et al. [2007] focus on identifying cooperative domains for protein interactions by extending two-domain interactions to multi-domain interactions. They developed a linear programming method with multi-domain pairs and an association probabilistic method with multi-domain pairs (APMM).
Wang et al. [2007] claim that they found cooperative domains effectively, with higher accuracy in predicting protein interactions than existing methods. They claim that, from a computational viewpoint, their paper gives a general framework for predicting protein interactions more accurately by considering the information of both multi-domains and multiple organisms, and that the framework can also be applied to identify cooperative domains, to reconstruct large complexes and to annotate the functions of domains.
2.2.3 Summary

| Year | Authors | Title | Papers referred to | Major contribution |
|------|---------|-------|--------------------|--------------------|
| 2006 | Li et al. | Improving domain-based protein interaction prediction using biologically-significant negative dataset | Deng et al. [2002] | Developed algorithm for pre-processing biological annotations |
| 2007 | Wang et al. | Analysis on multi-domain cooperation for predicting protein-protein interactions | Guimarães et al. [2006], Lee et al. [2006] | Developed a framework to predict protein interactions by considering multi-domains and multiple organisms |

Table II. Papers reviewed in Section 2.2.
2.3 Datasets Developed
After prediction approaches have been developed and applied, the data obtained should be collected in an organized way for the techniques that use them. This section deals with datasets built from the predictions discussed in the earlier sections.

2.3.1 Comparative analysis and unification of domain-domain interaction networks. Björkholm et al. [2009] refer to the work of Deng et al. [2002], Guimarães et al. [2006] and Lee et al. [2006]. They state that Deng et al. [2002] proposed a model for predicting domain interactions based on maximum likelihood estimation, whereas Lee et al. [2006] extended this model so that domain interactions can be predicted with high confidence. They also state that Guimarães et al. [2006] proposed their Parsimonious Explanation (GPE) approach, which adjusts the granularity of the domain definition to the granularity of the input dataset and permits domain interactions to have different costs.
The authors analyzed the coverage and quality of the existing resources as well as the extent of their overlap. They conducted experiments to merge individual domain interaction networks into a comprehensive and reliable database. Björkholm et al. [2009] claim to have introduced a new approach to comparing domain-domain interaction networks, which they used to compare nine predicted domain and protein interaction networks and to generate a database of unified domain interactions.
Björkholm et al. [2009] were able to generate a unified dataset that is scored according to benchmarked reliability. One of their results is that the kinase domain interacted with 480 other domains. They claim that each interaction in the generated dataset is scored according to the benchmarked reliability, and that the performance of the resulting network is improved compared to the underlying source networks and to other composite resources such as DOMINE.

2.3.2 Domine: a comprehensive collection of known and predicted domain-domain interactions. Yellaboina et al. [2011] refer to the work of Deng et al. [2002], Guimarães et al. [2006] and Lee et al. [2006]. Deng et al. [2002] proposed a model for predicting domain interactions based on maximum likelihood estimation, whereas Lee et al. [2006] extended this model so that domain interactions can be predicted with high confidence. Guimarães et al. [2006] proposed their Parsimonious Explanation (GPE) approach, which adjusts the granularity of the domain definition to the granularity of the input dataset and permits domain interactions to have different costs.
Yellaboina et al. [2011] introduced the Jaccard index to compare one prediction method with another and to measure how well their sets of predictions agree, which provided them with high-, medium- and low-confidence values. The authors claim to have introduced a database of predicted interacting domains in which all the datasets are brought under a single roof.
Yellaboina et al. [2011] categorized the domain interactions by confidence and collected about 2989 high-confidence, 2537 low-confidence and 16,094 medium-confidence values respectively. They claim that their collection, which they named DOMINE, gives researchers the opportunity to access interacting-domain data in one single place.
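The Jaccard index mentioned above is the size of the intersection of two sets divided by the size of their union. The sketch below applies it to two sets of predicted domain pairs. It is a minimal illustration: the function name and the domain labels are hypothetical, and returning 0.0 for two empty sets is a convention chosen here, not something taken from the paper.

```python
def jaccard_index(predictions_a, predictions_b):
    """Jaccard index |A intersect B| / |A union B| of two prediction sets."""
    a, b = set(predictions_a), set(predictions_b)
    union = a | b
    if not union:
        # Assumption: two empty prediction sets are given an index of 0.0.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical domain pairs predicted by two methods (labels are made up).
method_a = {frozenset(("D1", "D2")), frozenset(("D1", "D3")), frozenset(("D2", "D4"))}
method_b = {frozenset(("D1", "D2")), frozenset(("D2", "D4")), frozenset(("D3", "D5"))}
overlap = jaccard_index(method_a, method_b)
print(overlap)  # prints 0.5
```

Storing each domain pair as a frozenset makes the comparison independent of the order in which the two domains are written.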
2.3.3 Summary

| Year | Authors | Title | Papers referred to | Major contribution |
|------|---------|-------|--------------------|--------------------|
| 2009 | Björkholm et al. | Comparative analysis and unification of domain-domain interaction networks | Deng et al. [2002], Guimarães et al. [2006], Lee et al. [2006] | Developed a unified dataset scored with benchmark reliability |
| 2011 | Yellaboina et al. | Domine: a comprehensive collection of known and predicted domain-domain interactions | Deng et al. [2002], Guimarães et al. [2006], Lee et al. [2006] | Collected predicted datasets of domains with scoring functions |

Table III. Papers reviewed in Section 2.3.
3. CONCLUDING COMMENTS
One of the remarkable medical applications of protein interaction prediction is the identification of the causes of and cures for diseases. Future research will continue to pursue the comprehensive prediction of domain-domain interactions, which assists researchers in predicting protein interactions. Studying the structural aspects of protein interactions will help in predicting and discovering the functions of those proteins. Improvements to methods such as the probabilistic scoring of putative interacting domain pairs based on negative data by Li et al. [2006] can be used to predict domain interactions accurately, as can the analysis of multi-domain cooperation by Wang et al. [2007]. There is a need for more sophisticated tools that combine primary sequences, domain annotations and structural annotations with high sensitivity (Jothi et al. [2008]). Once predicted domain datasets have been collected, there is also a call for improving the domain database based on the confidence values obtained from the prediction (Yellaboina et al. [2011]).
4. ANNOTATIONS
4.1 Björkholm et al. 2009
Citation: BJÖRKHOLM, P. and SONNHAMMER, E. 2009. Comparative analysis and unification of domain-domain interaction networks. Bioinformatics 25, 22, 3020-3025.
The problem which the authors addressed: With the advancement of genomic technology and genome-wide analysis, more and more organisms are being studied extensively for gene expression on a global scale. To evaluate them, functional evolution is needed. Predicting protein-protein interactions requires domain interaction prediction, because domains are regarded as the functional units of proteins. This helps researchers find cures for diseases through drug design.
Previous work by others referred to by the authors: The authors refer to the work of Deng et al. [2002], Guimarães et al. [2006] and Lee et al. [2006].
Shortcomings of previous work: The authors state that Deng et al. [2002] proposed a model for predicting domain interactions based on maximum likelihood estimation, whereas Lee et al. [2006] extended this model so that researchers can predict domain interactions with high confidence values. The authors state that Guimarães et al. [2006] proposed their Parsimonious Explanation (GPE) approach, which adjusts the granularity of the domain definition to the granularity of the input dataset and permits domain interactions to have different costs.
Experiments and analysis conducted: The authors analysed the coverage and quality of the existing resources as well as the extent of their overlap. They conducted experiments to merge individual domain interaction networks into a comprehensive and reliable database. The authors claim to have introduced a new approach to comparing domain-domain interaction networks, which they used to compare nine predicted domain and protein interaction networks and to generate a database of unified domain interactions.
Results that the authors claim to have achieved: The authors were able to generate a unified dataset that is scored according to benchmarked reliability. They also found that the kinase domain interacted with 480 other domains.
Claims made by the authors: The authors claim that each interaction in the dataset they generated is scored according to benchmarked reliability, and that the resulting network performs better than the underlying source networks and other composite resources such as DOMINE.
4.2 Deng et al. 2002
Citation: DENG, M., MEHTA, S., SUN, F., and CHEN, T. 2002. Inferring domain-domain interactions from protein-protein interactions. Genome Research 12, 10, 1540-1548.
The problem which the authors addressed: With the advancement of genomic technology and genome-wide analysis, more and more organisms are being studied extensively for gene expression on a global scale. To evaluate them, functional evolution is needed. Predicting protein-protein interactions requires domain interaction prediction, because domains are regarded as the functional units of proteins. This helps researchers find cures for diseases through drug design.
Previous work by others referred to by the authors: The authors do not refer to any previous work exactly on this topic.
Shortcomings of previous work: Expression profiling has been used to analyze gene functions since the completion of the genome sequence of Saccharomyces cerevisiae, a budding yeast. Many researchers have undertaken the task of functionally analyzing the yeast genome, which comprises ~6280 proteins, of which roughly one-third do not have known functions. This allowed the authors to study large-scale conserved patterns of interactions between protein domains and to analyze protein functions further.
The new idea: The authors applied the maximum likelihood estimation (MLE) method to infer interactions between pairs of domains and measured the accuracy of prediction at the protein level. They implemented an EM algorithm that iteratively derives the domain interaction probabilities for the MLE method.
Experiments and analysis conducted: The authors applied the MLE method and estimated the probabilities of interacting domains that are consistent with the observed protein-protein interactions. They took into account the multiplicity of observations in the datasets. They also used the association method as a baseline approach.
Results that the authors claim to have achieved: The authors claim to have achieved two sets of results, one for the association method and one for the MLE method. With the association method they claim to have achieved 55.5% specificity and 55.0% sensitivity by setting the threshold at 0.65 on the combined datasets, while with the MLE method they obtained an fn value greater than or equal to 0.64.
Claims made by the authors: The authors claim that their probabilistic model and MLE method are robust, in that they can incorporate various kinds of protein datasets. They claim that the prediction rate of their method is ~100 times better than that of random assignment when predicting protein interactions in MIPS.
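The EM-based estimation described above can be sketched as follows. This is an illustrative reading of the model, not the authors' code: it assumes that two proteins interact if at least one of their domain pairs interacts, that observations are subject to a false positive rate `fp` and a false negative rate `fn`, and that each EM step replaces a domain pair's probability by its average posterior over the protein pairs containing it. The function names, the default parameter values and the toy data are all hypothetical.

```python
def domain_pairs(doms_a, doms_b):
    """All unordered domain pairs formed by one domain from each protein."""
    return {tuple(sorted((m, n))) for m in doms_a for n in doms_b}


def em_domain_interactions(protein_domains, observed, fp=0.01, fn=0.5,
                           n_iter=50, init=0.1):
    """Estimate domain-pair interaction probabilities by EM (illustrative sketch)."""
    proteins = sorted(protein_domains)
    observed = {tuple(sorted(p)) for p in observed}
    protein_pairs = [(a, b) for i, a in enumerate(proteins) for b in proteins[i + 1:]]
    pairs_of = {pp: domain_pairs(protein_domains[pp[0]], protein_domains[pp[1]])
                for pp in protein_pairs}
    lam = {dp: init for dps in pairs_of.values() for dp in dps}
    for _ in range(n_iter):
        expected = dict.fromkeys(lam, 0.0)
        count = dict.fromkeys(lam, 0)
        for pp, dps in pairs_of.items():
            seen = pp in observed
            # Probability that the two proteins truly interact under the current estimates.
            p_none = 1.0
            for dp in dps:
                p_none *= 1.0 - lam[dp]
            p_interact = 1.0 - p_none
            # Probability of what was actually observed, allowing for experimental errors.
            p_obs = p_interact * (1.0 - fn) + (1.0 - p_interact) * fp
            p_outcome = p_obs if seen else 1.0 - p_obs
            likelihood = (1.0 - fn) if seen else fn
            for dp in dps:
                # E-step: posterior that this domain pair interacts, given the observation.
                expected[dp] += lam[dp] * likelihood / p_outcome
                count[dp] += 1
        # M-step: average the posteriors over all protein pairs containing the domain pair.
        lam = {dp: expected[dp] / count[dp] for dp in lam}
    return lam


# Toy data (made up): three single-domain proteins, one observed interaction.
toy_domains = {"A": {"d1"}, "B": {"d2"}, "C": {"d3"}}
toy_lambda = em_domain_interactions(toy_domains, {("A", "B")})
```

On the toy data the domain pair behind the observed interaction, (d1, d2), converges towards probability 1, while the pairs that only occur in unobserved protein pairs decay towards 0.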
4.3 Guimarães et al. 2006
Citation: GUIMARÃES, K., JOTHI, R., ZOTENKO, E., and PRZYTYCKA, T. 2006. Predicting domain-domain interactions using a parsimony approach. Genome Biology 7, 11, R104.
The problem which the authors addressed: With the advancement of genomic technology and genome-wide analysis, more and more organisms are being studied extensively for gene expression on a global scale. To evaluate them, functional evolution is needed. Predicting protein-protein interactions requires domain interaction prediction, because domains are regarded as the functional units of proteins. This helps researchers find cures for diseases through drug design.
Previous work by others referred to by the authors: The authors refer to the work of Deng et al. [2002].
Shortcomings of previous work: Many domain-domain interaction prediction methods tie the goal of predicting domain interactions to that of predicting protein-protein interactions. Deng et al. [2002] defined a probabilistic model based on maximum likelihood estimation to predict domain-domain interactions. Their expectation maximization algorithm computes domain interaction probabilities that maximize the expectation of observing a given protein-protein interaction network.
The new idea: The authors proposed a novel approach to predicting domain-domain interactions from a protein-protein interaction network, based on a parsimony-driven explanation of the network in which the domain interactions are inferred using linear programming optimization.
Experiments and analysis conducted: The authors applied the Parsimonious Explanation (PE) method and formulated it as a linear programming optimization problem, where each potential domain-domain contact is a variable that can take a value ranging from 0 to 1. This allowed the authors to handle false positives in the protein interactions in a novel way.
Results that the authors claim to have achieved: The authors applied the PE method to a protein-protein interaction dataset comprising 26,032 interactions among 11,403 proteins from several organisms. The protein domains were annotated using Pfam hidden Markov model profiles. The authors claim to have obtained high-scoring putative interactions and predicted interaction partners for the Ras and SNARE families of domains. They report a PPV and sensitivity of 75.3% and 76.9%, respectively.
Claims made by the authors: The authors claim that their method outperformed previous approaches by a considerable margin and that the results indicate that the parsimony principle provides a correct approach for detecting domain-domain contacts.
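The linear program described in this annotation can be written down compactly. The formulation below is a sketch of the basic parsimony idea only: the symbols $x_{mn}$ (the variable for the potential contact between domains $m$ and $n$) and $D(P)$ (the set of domains of protein $P$) are notation introduced here, and the authors' treatment of false positives is not shown.

```latex
\begin{aligned}
\text{minimize}\quad   & \sum_{(m,n)} x_{mn} \\
\text{subject to}\quad & \sum_{m \in D(P_i),\; n \in D(P_j)} x_{mn} \;\ge\; 1
                         \quad \text{for every observed interaction } (P_i, P_j), \\
                       & 0 \le x_{mn} \le 1
                         \quad \text{for every potential domain-domain contact } (m,n).
\end{aligned}
```

Each constraint requires that every observed protein interaction is explained by at least one domain-domain contact, and the objective keeps the total weight of the contacts used as small as possible.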
4.4 Guimarães et al. 2008
Citation: GUIMARÃES, K. and PRZYTYCKA, T. 2008. Interrogating domain-domain interactions with parsimony based approaches. BMC Bioinformatics 9, 1, 171.
The problem which the authors addressed: With the advancement of genomic technology and genome-wide analysis, more and more organisms are being studied extensively for gene expression on a global scale. To evaluate them, functional evolution is needed. Predicting protein-protein interactions requires domain interaction prediction, because domains are regarded as the functional units of proteins. This helps researchers find cures for diseases through drug design.
Previous work by others referred to by the authors: The authors refer to the work of Shoemaker and Panchenko [2007] and Lee et al. [2006].
Shortcomings of previous work: Lee et al. [2006] improved on previous work by creating the Integrated Bayesian method. This method estimates the likelihood of a domain interaction based on protein interaction networks from different organisms and on the amount of biological evidence relating the two domains, such as the co-occurrence of the domains in the same protein and the existence of common GO terms at the functional level.
The new idea: Building on the earlier Parsimonious Explanation method for predicting domain-domain interactions, the authors introduced the Generalized Parsimonious Explanation (GPE), which adjusts the granularity of the domain definition to the granularity of the input dataset and permits domain interactions to have different costs.
Experiments and analysis conducted: The authors first implemented the generalized parsimonious explanation method, which seeks the smallest set of domain interactions that can explain all protein interactions in the network. They then analyzed the role of co-occurring domains in mediating protein interactions, analysed the top-ranked predictions, and compared the results with those of older methods.
Results that the authors claim to have achieved: The authors state that the objective function employed in GPE allows different costs to be assigned to different types of interaction. They claim to have used this feature to study the effect of assigning a lower cost to domain pairs. With this lower cost, only about 23% of the predicted domain interactions were between co-occurring domains.
Claims made by the authors: The authors claim that the Generalized Parsimonious Explanation approach provides a new means to predict and study domain-domain interactions, and that the mediating domains exhibit a significant deviation in the properties of the domain interactions that mediate interactions in the network.
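The idea of seeking a small set of domain interactions that explains every protein interaction can be illustrated with a greedy set-cover heuristic. Note that this is a swapped-in technique for illustration: the annotated papers formulate the problem as a linear program, whereas the sketch below uses the standard greedy approximation and ignores interaction costs. The function name and the toy data are hypothetical.

```python
def greedy_domain_cover(protein_domains, interactions):
    """Greedily choose domain pairs until every protein interaction is explained."""
    # Map each candidate domain pair to the protein interactions it could explain.
    explains = {}
    for a, b in interactions:
        for m in protein_domains[a]:
            for n in protein_domains[b]:
                explains.setdefault(tuple(sorted((m, n))), set()).add((a, b))
    uncovered = {pp for covered in explains.values() for pp in covered}
    chosen = []
    while uncovered:
        # Pick the pair explaining the most still-unexplained interactions;
        # iterating in sorted order makes tie-breaking deterministic.
        best = max(sorted(explains), key=lambda dp: len(explains[dp] & uncovered))
        chosen.append(best)
        uncovered -= explains[best]
    return chosen


# Toy network (made up): domain pair (x, y) alone explains both interactions.
toy_domains = {"P1": {"x"}, "P2": {"y"}, "P3": {"x", "z"}, "P4": {"y"}}
toy_interactions = [("P1", "P2"), ("P3", "P4")]
cover = greedy_domain_cover(toy_domains, toy_interactions)
```

In the toy network the pair (y, z) could also explain the second interaction, but the parsimonious explanation needs only (x, y).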
4.5 Huang et al. 2007
Citation: HUANG, C., MORCOS, F., KANAAN, S., WUCHTY, S., CHEN, D., and IZAGUIRRE, J. 2007. Predicting protein-protein interactions from protein domains using a set cover approach. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB) 4, 1, 78-87.
The problem which the authors addressed: With the advancement of genomic technology and genome-wide analysis, more and more organisms are being studied extensively for gene expression on a global scale. To evaluate them, functional evolution is needed. Predicting protein-protein interactions requires domain interaction prediction, because domains are regarded as the functional units of proteins. This helps researchers find cures for diseases through drug design.
Previous work by others referred to by the authors: The authors refer to the work of Deng et al. [2002], Lee et al. [2006] and Li et al. [2006].
Shortcomings of previous work: Deng et al. [2002] proposed a model for predicting domain interactions based on maximum likelihood estimation, whereas Lee et al. [2006] extended this model so that researchers can predict domain interactions with high confidence values. Li et al. [2006] proposed a domain-based classification method to predict protein-protein interactions using probabilities of putative interacting domain pairs derived from both experimentally determined interacting protein pairs and carefully chosen non-interacting protein pairs. The existing high-throughput experimental techniques assay protein-protein interactions, yet they do not provide any direct information on the interactions among domains.
The new idea: The authors introduced message passing algorithms by which domain interactions can be studied in a more detailed way.
Experiments and analysis conducted: The authors introduced a powerful new algorithm for the prediction problem. The algorithm is based on message passing: its input is an interaction map among a set of proteins and its output is a list of interaction probabilities for each pair of proteins. The algorithm was applied to a yeast dataset using cross-validation.
Results that the authors claim to have achieved: The authors claim that, under cross-validation, their algorithm performed better than the existing algorithms. The average accuracy over 10 folds, for the parameter values that minimized the Bethe free energy, is 82%, and the corresponding sensitivity and specificity are 79% and 85%, respectively.
Claims made by the authors: The authors claim that their algorithm can be applied to large datasets to infer domain interactions from large-scale protein-protein interaction data.
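Accuracy, sensitivity, specificity and PPV are quoted throughout these annotations, so a short reminder of how they follow from confusion-matrix counts may help. The sketch below uses the standard definitions; the function name is hypothetical and the counts are invented so that they reproduce the percentages quoted above for a balanced test set. They are not the counts from the paper.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard evaluation measures from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # fraction of true interactions recovered
        "specificity": tn / (tn + fp),  # fraction of non-interactions rejected
        "ppv": tp / (tp + fp),          # fraction of predictions that are correct
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Invented counts: 100 interacting and 100 non-interacting protein pairs.
metrics = classification_metrics(tp=79, fp=15, tn=85, fn=21)
```

With equal numbers of positive and negative pairs, accuracy is the mean of sensitivity and specificity, which is consistent with the reported 82%, 79% and 85%.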

4.6 Jothi et al. 2008
Citation: JOTHI, R. and PRZYTYCKA, T. 2008. Computational approaches to predict protein-protein and domain-domain interactions. Bioinformatics Algorithms, 465-491.
The problem which the authors addressed: With the advancement of genomic technology and genome-wide analysis, more and more organisms are being studied extensively for gene expression on a global scale. To evaluate them, functional evolution is needed. Predicting protein-protein interactions requires domain interaction prediction, because domains are regarded as the functional units of proteins. This helps researchers find cures for diseases through drug design.
Previous work by others referred to by the authors: The authors do not refer to any previous work exactly on this topic.
Shortcomings of previous work: No shortcomings of previous work are noted, as no previous work is referred to.
The new idea: The idea is to create a tool that combines the primary sequences, domain annotations and structural annotations of the proteins. The authors also introduce a new algorithm to predict domain interactions for a given pair of query proteins.
Experiments and analysis conducted: The authors tested GAIA against gold standard positive and negative protein-protein interaction datasets, setting the n-gram length to 4 and the threshold for domain-domain interaction hits to 8.3.
Results that the authors claim to have achieved: The authors claim that, when their algorithm was tested against the gold standard dataset, it achieved a true positive rate of about 82% and a false positive rate of 21%. They also claim to have identified a list of 4-gram pairs that are significantly over-represented in the DDI dataset, many of which mediate protein-protein interactions.
Claims made by the authors: The authors claim that their results show the localization of interaction hotspots and provide testable hypotheses for experimental validation. Complemented with other prediction methods, this study will help to elucidate the interactomes of cells.
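The n-grams used above are overlapping substrings of fixed length taken from a protein sequence; with the length set to 4 they are 4-grams. The sketch below shows only this extraction step, under the assumption that sequences are plain one-letter amino acid strings. It is not the GAIA scoring procedure, and the function name and example sequence are made up.

```python
from collections import Counter


def ngram_counts(sequence, n=4):
    """Count the overlapping n-grams (length-n substrings) of a protein sequence."""
    sequence = sequence.upper()
    # A sequence of length L has L - n + 1 overlapping n-grams (none if L < n).
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


# Made-up ten-residue sequence: yields seven distinct 4-grams.
counts = ngram_counts("MKTAYIAKQR")
```

Counting 4-grams in this way for both proteins of a pair gives the raw material from which over-represented 4-gram pairs could then be identified.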
4.7 Lee et al. 2006
Citation: LEE, H., DENG, M., SUN, F., and CHEN, T. 2006. An integrated approach to the prediction of domain-domain interactions. BMC Bioinformatics 7, 1, 269.
The problem which the authors addressed: With the advancement of genomic technology and genome-wide analysis, more and more organisms are being studied extensively for gene expression on a global scale. To evaluate them, functional evolution is needed. Predicting protein-protein interactions requires domain interaction prediction, because domains are regarded as the functional units of proteins. This helps researchers find cures for diseases through drug design.
Previous work by others referred to by the authors: The authors do not refer to any previous work exactly on this topic.
Shortcomings of previous work: The development of high-throughput technologies such as yeast two-hybrid assays has produced large-scale protein interaction datasets for several species, and significant efforts have been made to analyze them. Further studies on conserved protein complexes and functional modules can also be found.
The new idea: The authors focus on integrating multiple data sources from multiple species to predict high-confidence domain interactions by calculating the probability of domain interaction for each species.
Experiments and analysis conducted: The authors divided the experiments into steps. First they collected multiple datasets, then they investigated information on protein fusion and domain functions, and finally they applied a Bayesian approach to integrate the data sources.
Results that the authors claim to have achieved: The authors claim to have found domain interactions that are conserved across multiple species and to have found the domains in different species.
Claims made by the authors: The authors claim that they have developed a new measure to score domain-domain interactions directly, instead of using indirect means such as validating re-inferred protein interactions.
4.8 Li et al. 2006
Citation: LI, X., TAN, S., and NG, S. 2006. Improving domain-based protein interaction prediction using biologically-significant negative dataset. International Journal of Data Mining and Bioinformatics 1, 2, 138-149.
The problem which the authors addressed: With the advancement of genomic technology and genome-wide analysis, more and more organisms are being studied extensively for gene expression on a global scale. To evaluate them, functional evolution is needed. Predicting protein-protein interactions requires domain interaction prediction, because domains are regarded as the functional units of proteins. This helps researchers find cures for diseases through drug design.
Previous work by others referred to by the authors: The authors refer to the work of Deng et al. [2002].
Shortcomings of previous work: A few domain-based interaction detection techniques have recently been proposed. Deng et al. [2002] described a maximum likelihood estimation technique to infer domain-domain interactions, which was then used to predict protein interactions.
The new idea: The authors proposed a domain-based classification method to predict protein-protein interactions using probabilities of putative interacting domain pairs derived from both experimentally determined interacting protein pairs and carefully chosen non-interacting protein pairs. The idea is to use biologically significant negative data to predict the domain interactions.
Experiments and analysis conducted: The authors claim to have developed methods and algorithms to pre-process biological annotations and to generate the negative set, including a dedicated algorithm for generating the negative set.
Results that the authors claim to have achieved: The authors claim that, for the yeast dataset, they inferred domain-domain interactions from both the positive set and the negative set, and achieved a lowest specificity of 56.00 and a lowest sensitivity of 84.36 on different datasets.
Claims made by the authors: The authors claim that their experimental results on multiple species show that the probabilistic approach is effective and outperforms other similar domain-based techniques for protein interaction prediction.
4.9 Wang et al. 2007
Citation: WANG, R., WANG, Y., WU, L., ZHANG, X., and CHEN, L. 2007. Analysis on multi-domain cooperation for predicting protein-protein interactions. BMC Bioinformatics 8, 1, 391.
The problem which the authors addressed: With the advancement of genomic technology and genome-wide analysis, more and more organisms are being studied extensively for gene expression on a global scale. To evaluate them, functional evolution is needed. Predicting protein-protein interactions requires domain interaction prediction, because domains are regarded as the functional units of proteins. This helps researchers find cures for diseases through drug design.
Previous work by others referred to by the authors: The authors refer to the work of Guimarães et al. [2006] and Lee et al. [2006].
Shortcomings of previous work: A number of computational algorithms have been developed to infer protein-protein interactions, such as methods based on gene fusion, phylogenetic profiles, protein structure and domain information. In particular, methods that infer protein-protein interactions from domain information, such as the association method, the probabilistic method and the SVM-based method, have attracted much attention due to their clear biological implications and simplicity.
The new idea: The authors focus on identifying cooperative domains for protein interactions by extending two-domain interactions to multi-domain interactions.
Experiments and analysis conducted: The authors developed a linear programming method with multi-domain pairs and an association probabilistic method with multi-domain pairs (APMM).
Results that the authors claim to have achieved: The authors claim that they found cooperative domains effectively and predicted protein interactions with higher accuracy than existing methods.
Claims made by the authors: The authors claim that, from a computational viewpoint, the paper gives a general framework for predicting protein interactions more accurately by considering information from both multiple domains and multiple organisms, which can also be applied to identify cooperative domains, to reconstruct large complexes and to annotate the functions of domains.
4.10 Yellaboina et al. 2011
Citation: YELLABOINA, S., TASNEEM, A., ZAYKIN, D., RAGHAVACHARI, B., and JOTHI, R. 2011. Domine: a comprehensive collection of known and predicted domain-domain interactions. Nucleic Acids Research 39, suppl 1, D730.
The problem which the authors addressed: With the advancement of genomic technology and genome-wide analysis, more and more organisms are being studied extensively for gene expression on a global scale. To evaluate them, functional evolution is needed. Predicting protein-protein interactions requires domain interaction prediction, because domains are regarded as the functional units of proteins. This helps researchers find cures for diseases through drug design.
Previous work by others referred to by the authors: The authors refer to the work of Deng et al. [2002], Guimarães et al. [2006] and Lee et al. [2006].
Shortcomings of previous work: Deng et al. [2002] proposed a model for predicting domain interactions based on maximum likelihood estimation, whereas Lee et al. [2006] extended this model so that researchers can predict domain interactions with high confidence values. Guimarães et al. [2006] proposed their Parsimonious Explanation (GPE) approach, which adjusts the granularity of the domain definition to the granularity of the input dataset and permits domain interactions to have different costs.
Experiments and analysis conducted: The authors used the Jaccard index to compare the methods with one another and to measure how well their sets of predictions agree; based on the results, the predictions were assigned high, medium and low confidence values. The authors state that they introduced a database of predicted interacting domains in which all the datasets are brought together under a single roof.
The new idea: The idea is to create a domain interaction database and to rate its entries by their confidence scores.
Results that the authors claim to have achieved: The authors categorized the domain interactions by confidence and collected about 2989 high confidence, 2537 low confidence and 16,094 medium confidence values, respectively.
Claims made by the authors: The authors claim that their collection, named DOMINE, gives researchers the opportunity to obtain interacting domain data in one single place.
REFERENCES
Björkholm, P. and Sonnhammer, E. 2009. Comparative analysis and unification of domain-domain interaction networks. Bioinformatics 25, 22, 3020-3025.
Chen, L., Wang, R., and Zhang, X. 2009. Biomolecular networks: methods and applications in systems biology. Vol. 10. John Wiley & Sons Inc.
Deng, M., Mehta, S., Sun, F., and Chen, T. 2002. Inferring domain-domain interactions from protein-protein interactions. Genome Research 12, 10, 1540-1548.
Guimarães, K., Jothi, R., Zotenko, E., and Przytycka, T. 2006. Predicting domain-domain interactions using a parsimony approach. Genome Biology 7, 11, R104.
Guimarães, K. and Przytycka, T. 2008. Interrogating domain-domain interactions with parsimony based approaches. BMC Bioinformatics 9, 1, 171.
Huang, C., Morcos, F., Kanaan, S., Wuchty, S., Chen, D., and Izaguirre, J. 2007. Predicting protein-protein interactions from protein domains using a set cover approach. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB) 4, 1, 78-87.
Iqbal, M., Freitas, A., Johnson, C., and Vergassola, M. 2008. Message-passing algorithms for the prediction of protein domain interactions from protein-protein interaction data. Bioinformatics 24, 18, 2064-2070.
Jothi, R. and Przytycka, T. 2008. Computational approaches to predict protein-protein and domain-domain interactions. Bioinformatics Algorithms, 465-491.
Kelvin, Z. and Francis, O. 2009. Gaia: a gram-based interaction analysis tool - an approach for identifying interacting domains in yeast. BMC Bioinformatics 10.
Lee, H., Deng, M., Sun, F., and Chen, T. 2006. An integrated approach to the prediction of domain-domain interactions. BMC Bioinformatics 7, 1, 269.
Li, X., Tan, S., and Ng, S. 2006. Improving domain-based protein interaction prediction using biologically-significant negative dataset. International Journal of Data Mining and Bioinformatics 1, 2, 138-149.
Qi, Y. and Noble, W. 2011. Protein interaction networks: Protein domain interaction and protein function prediction. Handbook of Statistical Bioinformatics, 427-459.
Raghavachari, B., Tasneem, A., Przytycka, T., and Jothi, R. 2008. Domine: a database of protein domain interactions. Nucleic Acids Research 36, suppl 1, D656-D661.
Schlicker, A., Huthmacher, C., Ramírez, F., Lengauer, T., and Albrecht, M. 2007. Functional evaluation of domain-domain interactions and human protein interaction networks. Bioinformatics 23, 7, 859.
Schuster-Böckler, B. and Bateman, A. 2007. Reuse of structural domain-domain interactions in protein networks. BMC Bioinformatics 8, 1, 259.
Shoemaker, B. and Panchenko, A. 2007. Deciphering protein-protein interactions. Part II. Computational methods to predict protein and domain interaction partners. PLoS Computational Biology 3, 4, e43.
Singhal, M. and Resat, H. 2007. A domain-based approach to predict protein-protein interactions. BMC Bioinformatics 8, 1, 199.
Wang, R., Wang, Y., Wu, L., Zhang, X., and Chen, L. 2007. Analysis on multi-domain cooperation for predicting protein-protein interactions. BMC Bioinformatics 8, 1, 391.
Yellaboina, S., Tasneem, A., Zaykin, D., Raghavachari, B., and Jothi, R. 2011. Domine: a comprehensive collection of known and predicted domain-domain interactions. Nucleic Acids Research 39, suppl 1, D730.
Zhao, X. and Chen, L. 2008. Domain-domain interaction identification with a feature selection approach. Pattern Recognition in Bioinformatics, 178-186.
