Indexing Sparse Graphs For Similarity Search

International Journal of Emerging Trends & Technology in Computer Science (IJETTCS)
Web Site: www.ijettcs.org Email: editor@ijettcs.org, editorijettcs@gmail.com Volume 2, Issue 2, March April 2013 ISSN 2278-6856

Aditya Ojha1, Sagar Patil2, Arun Rajeevan3, Sourav Das4
1,2,3,4 K.K.Wagh College of Engineering & Research, Nashik, Maharashtra 422003, India
Abstract:
Schemaless data, such as chemical compounds, can be modeled efficiently using a graph structure. This project focuses on indexing such graphs for similarity search and includes an efficient indexing mechanism. It achieves this by decomposing graphs into Adjacent Tree patterns. Using these Adjacent Tree patterns and a lower-bound estimate of their edit distance, we can filter the data set to obtain a candidate set of graphs for further similarity search. The project uses a graph data set consisting of hydrocarbons for this purpose. In this way, the project helps to implement a better-optimized similarity search.
Keywords: Graph indexing, Similarity search, Adjacent tree
Figure 1: Actual representation
1. INTRODUCTION
Whether two graphs are similar can, in theory, be judged by the size of their maximum common subgraph, but this is impractical because the subgraph isomorphism test has been proved to be an NP-complete problem. Searching a large set of graphs sequentially incurs a huge computational cost. Because of this low efficiency, a filter-and-verification method is usually employed to speed up graph similarity matching over a graph set: an index on the graph set is used to filter it and reduce the number of candidates. Indexing graphs using the k-adjacent tree structure is an efficient method of indexing. However, neglecting the C-H bonds, we store it as follows:
Figure 2: Our Representation
Also, for the decomposition of any given graph, the value of k has been set to 1 for obtaining the adjacent sets. For the above compound, the following two indexes are available in the 1-ATS:
2. RELATED WORK
Cheng et al. have proposed a nested inverted index called FG-index, which avoids candidate verification by exploiting frequent subgraphs and edges as indexing features. However, the method performs poorly on infrequent queries, because infrequent subgraphs are not incorporated into the FG-index. Wang et al. have proposed a technique for decomposing the graph into k-adjacent trees. However, the lemma stated there is ambiguous.
Figure 3(a)
3. PROPOSED WORK
The focus is on removing the redundancies in the k-adjacent tree method when indexing sparse graphs of hydrocarbons. The large number of C-H bonds in a hydrocarbon is not considered. For example, if the compound is C3H8, i.e., propane, its structure is:
Figure 3(b)
Figure 3: 1-ATS of our representation
The frequency count for the indexes generated above is as follows:

Table 1: Frequency count

Index Structure    Frequency Count
CC1                2
CC1C1              1

The graphs are evaluated for similarity on the basis of the following lemma:

Edit Distance: The Graph Edit Distance between two graphs G1 and G2 is the minimum number of GEOs needed to transform G1 into a graph isomorphic to G2. This definition gives us a measure that quantifies the difference between two graphs. A GEO can be one of the following six operations:

1. Delete an edge from the graph.
2. Insert an edge between two disconnected vertices.
3. Delete an isolated vertex from the graph.
4. Insert an isolated vertex into the graph.
5. Change the label of a vertex.
6. Change the label of an edge.

Consider the following two graphs:

Query Graph:

Table 3: Frequency Count of 1-ATS in sample graph

Index Structure    Frequency Count
CC1O2              1
CC1C1              1
CC1N1              1

Here |=2, |V(Q)| = 7. Hence, for a graph edit distance of 3, both graphs would be considered similar.

The following figure represents a basic block diagram for the proposed system.

Figure 6: Block Diagram
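The decomposition into 1-ATs and the frequency count in Table 1 can be illustrated with a minimal Python sketch. The function name `one_at_counts`, the graph encoding (a dictionary of atom symbols plus a list of bonds), and the `roots` parameter are hypothetical choices made for this illustration and do not come from the paper. The key format, a root symbol followed by its sorted neighbor-symbol and bond-order pairs, is inferred from index strings such as CC1 and CC1C1O2, and indexing only carbon-rooted trees is an assumption based on the tables listing only such entries.

```python
from collections import Counter


def one_at_counts(labels, edges, roots=("C",)):
    """Count 1-AT index keys of a molecular graph, ignoring hydrogen atoms.

    labels: dict mapping vertex id -> atom symbol
    edges:  iterable of (u, v, bond_order) tuples
    roots:  atom symbols allowed as the root of an indexed 1-AT (assumption)
    """
    # Keep only non-hydrogen vertices, which drops every C-H bond as well.
    leaves = {v: [] for v, symbol in labels.items() if symbol != "H"}
    for u, v, order in edges:
        if u in leaves and v in leaves:
            leaves[u].append(f"{labels[v]}{order}")
            leaves[v].append(f"{labels[u]}{order}")

    counts = Counter()
    for v, adjacent in leaves.items():
        if labels[v] in roots:
            # Key = root symbol followed by its sorted (neighbor, bond) pairs.
            counts[labels[v] + "".join(sorted(adjacent))] += 1
    return counts


# Propane (C3H8): a chain of three carbons with eight hydrogens attached.
propane_labels = {0: "C", 1: "C", 2: "C"}
propane_edges = [(0, 1, 1), (1, 2, 1)]
next_id = 3
for carbon, hydrogens in ((0, 3), (1, 2), (2, 3)):
    for _ in range(hydrogens):
        propane_labels[next_id] = "H"
        propane_edges.append((carbon, next_id, 1))
        next_id += 1

propane_index = one_at_counts(propane_labels, propane_edges)
print(dict(propane_index))  # {'CC1': 2, 'CC1C1': 1}
```

Under these assumptions, the two end carbons of propane each yield CC1 and the middle carbon yields CC1C1, which agrees with the counts in Table 1.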
4. CONCLUSION
By decomposing the graphs into small pieces (1-ATs) and pairing up these pieces, we evaluate the global similarity between them. As a compromise between frequent-subgraph-based indexing methods and graph-decomposition-based indexing methods, we use a redundant subtree structure, the 1-AT pattern, for index construction. A 1-AT records more structural information about each vertex than a normal graph-decomposition-based indexing method, while maintaining the simple structure of a tree. By calculating the number of common 1-ATs of two graphs, we can estimate the graph edit distance between them. This gives us a method for indexing and candidate filtering in a graph set for similarity matching.
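As a rough illustration of the filtering step, the sketch below counts the 1-ATs shared by the query graph (Table 2) and the sample dataset graph (Table 3) and applies a threshold test. The function names and the `affected_per_op` parameter are hypothetical. In particular, the assumption that one edit operation changes at most two 1-ATs is made here only because it reproduces the worked example with |V(Q)| = 7 and an edit distance of 3; the exact bound should be taken from the lemma of Wang et al.

```python
from collections import Counter


def common_one_ats(query_ats, graph_ats):
    """Size of the multiset intersection of two 1-AT frequency tables."""
    return sum((query_ats & graph_ats).values())


def is_candidate(query_ats, graph_ats, query_vertices, tau, affected_per_op=2):
    """Keep a graph if it shares enough 1-ATs with the query.

    affected_per_op is an assumed upper bound on how many 1-ATs a single
    edit operation can change; it is not taken from the paper.
    """
    required = query_vertices - affected_per_op * tau
    return common_one_ats(query_ats, graph_ats) >= required


# Frequency tables copied from Table 2 (query) and Table 3 (sample graph).
query = Counter({"CC1": 1, "CC1C1": 2, "CC1C1O2": 1, "CC1N1": 1})
sample = Counter({"CC1O2": 1, "CC1C1": 1, "CC1N1": 1})

print(common_one_ats(query, sample))       # 2
print(is_candidate(query, sample, 7, 3))   # True
print(is_candidate(query, sample, 7, 2))   # False
```

Only graphs that pass this inexpensive test would need to go on to the costly verification step; with a threshold of 3 the sample graph is kept as a candidate, while with a threshold of 2 it would be pruned under the assumed bound.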
Figure 4: A sample Query Graph

Table 2: Frequency count of 1-ATS in Query Graph

Index Structure    Frequency Count
CC1                1
CC1C1              2
CC1C1O2            1
CC1N1              1

Graph in dataset:
ACKNOWLEDGEMENT
We would like to express our sincere gratitude and appreciation to our project guide Prof. Rutuja Jadhav for the patience, guidance, help and for being our greatest source of information during this project. We also thank Prof. Kamlapur S., for providing time and helpful comments for our work.
References
[1] T.H. Cormen, NP Completeness, Introduction to Algorithms, W. Yu, ed., second ed., vol. 7, pp. 620-630. China Machine Press, 2007.
Figure 5: A sample Graph from the dataset

[2] G. Wang, B. Wang, X. Yang, and G. Yu, Efficiently Indexing Large Sparse Graphs for Similarity Search, IEEE Trans. Knowledge and Data Eng., vol. 24, no. 3, Mar. 2012.
[3] R. Lafore, Data Structures and Algorithms in Java, second ed.
[4] G. Chartrand, Introductory Graph Theory.
[5] B. Eckel, Thinking in Java, fourth ed.
[6] M. Kuramochi and G. Karypis, Frequent Subgraph Discovery, Proc. 2001 IEEE Int'l Conf. Data Mining, pp. 313-320, 2001.
[7] S. Sarawagi and A. Kirpal, Efficient Set Joins on Similarity Predicates, Proc. ACM SIGMOD, pp. 743-754, 2004.
[8] D. Justice and A. Hero, A Binary Linear Programming Formulation of the Graph Edit Distance, IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 28, no. 8, pp. 1200-1214, Aug. 2006.
[9] O. Johansson, Graph Decomposition Using Node Labels, doctoral dissertation, Royal Inst. of Technology, 2001.
[10] Y. Tian and J.M. Patel, Tale: A Tool for Approximate Large Graph Matching, Proc. 24th Int'l Conf. Data Eng., pp. 963-972, 2008.
[11] H. Jiang, H. Wang, P.S. Yu, and S. Zhou, Gstring: A Novel Approach for Efficient Search in Graph Databases, Proc. 23rd Int'l Conf. Data Eng., pp. 566-575, 2007.
[12] L. Zou, L. Chen, J.X. Yu, and Y. Lu, A Novel Spectral Coding in a Large Graph Database, Proc. 11th Int'l Conf. Extending Database Technology, pp. 181-192, 2008.
[13] D.W. Williams, J. Huan, and W. Wang, Graph Database Indexing Using Structured Graph Decomposition, Proc. 23rd Int'l Conf. Data Eng., pp. 976-985, 2007.
[19] D. Shasha, J.T.-L. Wang, and R. Giugno, Algorithmics and Applications of Tree and Graph Searching, Proc. 21st ACM SIGACT-SIGMOD-SIGART Symp. Principles of Database Systems, pp. 39-52, 2002.

AUTHOR
Aditya Ojha is a U.G. student at KKWIEER, University of Pune. He is also an IBM Student Ambassador for TGMC. He is a member of CSI. His areas of interest are parallel and distributed systems, database theory, and networking.

Sagar Patil is a U.G. student at KKWIEER, University of Pune. He is a member of CSI. His areas of interest are graph theory, advanced operating systems, and analysis of algorithms.

Sourav Das is a U.G. student at KKWIEER, University of Pune. He is a member of CSI. His areas of interest are embedded software, P2P networks, and data quality.

Arun Rajeevan is a U.G. student at KKWIEER, University of Pune. His areas of interest are neural networks, databases, and digital signal processing.