Web Site: www.ijettcs.org Email: editor@ijettcs.org, editorijettcs@gmail.com Volume 2, Issue 2, March April 2013 ISSN 2278-6856
Abstract:
Schemaless data, such as chemical compounds, can be modeled efficiently as graphs. This project focuses on indexing such graphs for similarity search. It does so by decomposing each graph into Adjacent Tree patterns. Using these Adjacent Tree patterns and a lower-bound estimate of the graph edit distance derived from them, we filter the graph set to obtain a candidate set of graphs for further similarity search. The project uses a graph data set of hydrocarbons for this purpose. In this way, the project reduces the number of graphs on which the expensive similarity test must be run, giving a faster similarity search.
1. INTRODUCTION
Whether two graphs are similar to each other can, in theory, be judged by the size of their maximum common subgraph, but this is impractical, since the subgraph isomorphism test has been proved to be NP-complete. Sequentially searching a large set of graphs therefore incurs a huge computational cost. Because of this low efficiency, a filter-and-verification method is usually employed to speed up graph similarity matching over a graph set: an index on the graph set is used to filter out graphs that cannot match, which reduces the number of candidates that must be verified. Indexing graphs using the k-adjacent tree structure is an efficient way to build such an index.
2. RELATED WORK
Cheng et al. have proposed a nested inverted index called FG-Index that avoids candidate verification by using frequent subgraphs and edges as indexing features. However, the method performs poorly on infrequent queries, because infrequent subgraphs are not incorporated into the FG-Index. Wang et al. have proposed a technique for decomposing a graph into k-adjacent trees; however, the lemma they state is ambiguous.
3. PROPOSED WORK
Our focus is on removing the redundancies in the k-adjacent tree method when indexing sparse graphs of hydrocarbons. The large number of C-H bonds in a hydrocarbon is not considered. For example, the compound C3H8, i.e., propane, is a chain of three carbon atoms carrying eight hydrogen atoms (CH3-CH2-CH3). Neglecting the C-H bonds, we store it as follows:
Figure 2: Our Representation
For the decomposition of any given graph, the value of k has been set to 1 for obtaining the adjacent trees. For the above compound, the following two indexes are available in the 1-ATS:
Figure 3(a) Figure 3(b)
Figure 3: 1-ATS of our representation
The frequency count of each of the indexes generated above is then recorded.
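The decomposition and frequency count for propane can be sketched in Python. This is a minimal sketch under stated assumptions, not the project's implementation: the string notation (root label followed by each neighbour's label and bond order, with neighbours sorted, as in CC1C1O2) is inferred from Table 2 rather than stated in the text, and the function names one_adjacent_trees and frequency_count, as well as the atoms/bonds input format, are hypothetical.

```python
from collections import Counter

# Assumed input format (hypothetical): `atoms` maps a vertex id to its element
# label and `bonds` lists (u, v, bond_order) for heavy-atom bonds only, i.e. the
# C-H bonds are already neglected.

def one_adjacent_trees(atoms, bonds, roots=None):
    """Return the 1-AT string rooted at each vertex in `roots` (default: all vertices).

    Assumed notation, inferred from Table 2: root label followed by every
    neighbour's label and bond order, e.g. "CC1C1O2".
    """
    neighbours = {v: [] for v in atoms}
    for u, v, order in bonds:
        neighbours[u].append((atoms[v], order))
        neighbours[v].append((atoms[u], order))
    if roots is None:
        roots = list(atoms)
    trees = []
    for v in roots:
        # Sorting the children makes identical 1-ATs map to the same string.
        children = sorted(neighbours[v])
        trees.append(atoms[v] + "".join(f"{label}{order}" for label, order in children))
    return trees

def frequency_count(atoms, bonds, roots=None):
    """Frequency count of each distinct 1-AT (the index entries of one graph)."""
    return Counter(one_adjacent_trees(atoms, bonds, roots))

# Propane (C3H8) with C-H bonds neglected: a chain of three carbons.
propane_atoms = {0: "C", 1: "C", 2: "C"}
propane_bonds = [(0, 1, 1), (1, 2, 1)]
print(frequency_count(propane_atoms, propane_bonds))  # Counter({'CC1': 2, 'CC1C1': 1})
```

The optional roots argument allows indexing only, say, carbon-rooted 1-ATs; the text does not say which vertices the original system uses as roots, so this is left as a parameter.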
Figure 4 shows a sample query graph; the frequency count of each 1-AT in its decomposition is given in Table 2.
Figure 4: A sample Query Graph
Table 2: Frequency count of 1-ATS in Query Graph
Index Structure    Frequency Count
CC1                1
CC1C1              2
CC1C1O2            1
CC1N1              1
A sample graph from the dataset is shown in Figure 5.
4. CONCLUSION
By decomposing the graphs into small pieces (1-ATs) and pairing up these pieces, we evaluate the global similarity between them. To seek a compromise between frequent-subgraph-based indexing methods and graph-decomposition-based indexing methods, we use a redundant subtree structure, the 1-AT pattern, for index construction. A 1-AT records more structural information about each vertex than a plain graph-decomposition-based indexing method, while retaining the simple structure of a tree. By counting the 1-ATs that two graphs have in common, we can estimate a lower bound on the graph edit distance between them. This gives us a method for indexing and candidate filtering in a graph set for similarity matching.
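The filtering idea in the conclusion can be sketched as follows: an inverted index maps each 1-AT string to the graphs containing it, the number of common 1-ATs is the size of the multiset intersection of two frequency counts, and a graph is kept as a candidate only if that number reaches a lower bound. Because the text does not state the exact lemma, the bound used here is an assumption: we suppose each edit operation alters at most gamma 1-ATs (relabelling a vertex v, for instance, can change the 1-AT rooted at v and those rooted at its neighbours, while inserting or deleting an edge changes the two trees rooted at its endpoints), so a graph within edit distance tau of the query shares at least max(|T(q)|, |T(g)|) - tau * gamma 1-ATs with it. All function names, the parameters tau and gamma, and the verify callback (a stand-in for an exact edit-distance test, not shown) are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical sketch: each graph is summarised by a Counter of its 1-AT strings.

def build_inverted_index(dataset):
    """Map each 1-AT string to {graph id: frequency count} (an inverted index)."""
    index = defaultdict(dict)
    for gid, trees in dataset.items():
        for tree, count in trees.items():
            index[tree][gid] = count
    return index

def common_counts(index, query):
    """Number of 1-ATs (multiset intersection) each indexed graph shares with the query."""
    common = Counter()
    for tree, q_count in query.items():
        for gid, g_count in index.get(tree, {}).items():
            common[gid] += min(q_count, g_count)
    return common

def filter_candidates(dataset, index, query, tau, gamma):
    """Filter step. Assumes one edit operation alters at most `gamma` 1-ATs, so a
    graph within edit distance `tau` of the query must share at least
    max(|T(q)|, |T(g)|) - tau * gamma 1-ATs with it."""
    common = common_counts(index, query)
    q_size = sum(query.values())
    return [gid for gid, trees in dataset.items()
            if common[gid] >= max(q_size, sum(trees.values())) - tau * gamma]

def search(dataset, query, tau, gamma, verify):
    """Filter-and-verification: the costly `verify` test runs only on candidates."""
    index = build_inverted_index(dataset)
    return [gid for gid in filter_candidates(dataset, index, query, tau, gamma)
            if verify(gid)]

# Usage: the query counts are those of Table 2; the propane counts are assumed.
query = Counter({"CC1": 1, "CC1C1": 2, "CC1C1O2": 1, "CC1N1": 1})
dataset = {"propane": Counter({"CC1": 2, "CC1C1": 1})}
print(search(dataset, query, tau=1, gamma=3, verify=lambda gid: True))  # ['propane']
```

Here the query and propane share 2 1-ATs and the bound is max(5, 3) - 1 * 3 = 2, so propane survives the filter; with gamma = 2 the bound rises to 3 and it is pruned without verification. In practice the index would be built once and reused across queries; it is rebuilt inside search only to keep the sketch short.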
ACKNOWLEDGEMENT
We would like to express our sincere gratitude and appreciation to our project guide Prof. Rutuja Jadhav for the patience, guidance, help and for being our greatest source of information during this project. We also thank Prof. Kamlapur S., for providing time and helpful comments for our work.
Figure 5: A sample Graph from the dataset
Arun Rajeevan is a U.G. student at KKWIEER, University of Pune. His areas of interest are neural networks, databases and digital signal processing.