4, 2011
ABSTRACT
Clustering semantically related terms is crucial for many applications, such as document categorization and word sense disambiguation. However, automatically identifying semantically similar terms is challenging. We present a novel approach for automatically determining the degree of relatedness between terms to facilitate their subsequent clustering. Using the analogy of ensemble classifiers in machine learning, we combine multiple techniques, such as contextual similarity and semantic relatedness, to boost the accuracy of our computations.

I. LITERATURE SURVEY
Hundreds of bugs involving neglected conditions have been found in code for major operating systems such as Linux and OpenBSD. Other research suggests that neglected conditions may be even more important than is generally indicated in the literature. In one study, the semantic-graph differencing tool Dex was applied to samples of patches to the Apache HTTP server and the GCC C compiler. It found that 38 percent of the Apache patches and 44 percent of the GCC patches involved inserting conditional selection statements, and that 31 percent of the Apache patches and 32 percent of the GCC patches involved altering existing if conditions.

Many neglected conditions can be prevented by requirements elicitation and analysis techniques that are intended to ensure the completeness of a requirements specification, such as viewpoint analysis. However, many other neglected conditions are not traceable to shortcomings of requirements engineering, because they involve design or implementation issues that do not correspond directly to requirements. A familiar example is failing to check that a pointer or object reference is non-NULL before it is dereferenced.

In related work, candidate rules and possible rule violations are identified automatically. As with Engler et al.'s approach, candidate rules are identified by their frequency of occurrence and must be confirmed manually by developers.
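The non-NULL check mentioned above can be illustrated with a small Python analogue, where `None` plays the role of a NULL pointer. This is only a sketch: the function names and the `users` dictionary are hypothetical and do not come from any of the code bases discussed here.

```python
from typing import Optional


def lookup_user(users: dict, name: str) -> Optional[dict]:
    """Hypothetical helper: return the record for `name`, or None if absent."""
    return users.get(name)


def email_unchecked(users: dict, name: str) -> str:
    # Neglected condition: the result may be None, and subscripting
    # None raises TypeError at run time.
    return lookup_user(users, name)["email"]


def email_checked(users: dict, name: str) -> Optional[str]:
    record = lookup_user(users, name)
    if record is None:  # the condition that the unchecked version neglects
        return None
    return record["email"]
```

Both functions behave identically when the user exists. They differ only on the path where the lookup fails, which is why defects of this kind are easy to miss in testing.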
Our approach does consider semantically relevant constraints between elements of potential rules, in the form of enhanced program dependences. This paper extends earlier work [5] by employing enhanced program dependence graphs (EPDGs), presenting a new heuristic maximal frequent subgraph mining algorithm, and evaluating our approach on four open source projects not considered in [5].

PRELIMINARY STUDY
A program dependence graph (PDG) is a labeled directed graph that models dependences between the statements of a program or procedure. Two types of dependences are represented. A statement s1 is data dependent on a statement s2 if there is a variable x and a control flow path s2 P s1 from s2 to s1 such that x is defined at s2, used at s1, and not redefined along the subpath P. A statement s1 is control dependent on s2 if s2 is a branch predicate that directly controls whether or not s1 is executed. Because program dependence graphs capture the essential ordering constraints between program elements,
programming rules relating elements that need not be adjacent to one another in a program, and need not appear in the same textual order wherever the rule occurs, can be represented as subgraphs or, as we shall see, minors of program dependence graphs.

Neglected conditions are an important but difficult-to-find class of software defects. This paper presents a novel approach for revealing neglected conditions that integrates static program analysis and advanced data mining techniques to discover implicit conditional rules in a code base and to discover rule violations that indicate neglected conditions. The approach requires the user to indicate minimal constraints on the context of the rules to be sought.

Our approach builds upon the idea that vital clues about neglected conditions are often distributed throughout a project code base (or even multiple code bases). It is intended to discover a wide variety of programming rules and violations of them without requiring developers to supply specific rule templates or checkers. Instead, developers indicate minimal constraints on the kind of rule violations they wish to find (e.g., any neglected condition or any neglected condition pertaining ...).

In our work, we have employed the system dependence graph (SDG) generated by the CodeSurfer static analysis tool. The SDG extends the program dependence graph representation for monolithic programs to incorporate collections of procedures. Each procedure is represented by a PDG, and the PDGs are augmented with special edges linking callers and callees. SDG edges can be classified into two overlapping sets of categories: 1) data dependence and control dependence edges and 2) interprocedural and intraprocedural edges.

We use the code shown in Fig.
1, from the OpenSSL project, to illustrate informally how a programming rule associated with the function UI_process is mined and how a violation of the rule is detected by our approach.

The first step of our approach is to create a dependence sphere with limited radius r for each call site node of UI_process, with the call site node as its center. The objective of this step is to extract the essential elements of rule instances. Initially, the dependence spheres contain only control and data dependences, but they are enhanced by adding shared data dependence edges (SDDEs). SDDEs allow some semantic relationships to be modeled more precisely and provide some benefits of interprocedural analysis without incurring its cost. Each sphere is then reduced by removing those nodes whose occurrences in the set of spheres are infrequent.

In our approach, a programming rule corresponds to a frequent graph minor of an SDG. To mine a frequent minor, instead of a frequent subgraph, our HMFSM algorithm is applied to NTCs of the reduced spheres (with 80 percent support) to find a maximal frequent subgraph. A discovered frequent graph minor is a candidate rule.

SDDEs are directed edges added to the PDGs between pairs of program elements that use the same variable definition and are connected by a control flow path. The resulting graphs are called enhanced PDGs (EPDGs).
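The sphere construction and reduction steps described above can be sketched in Python as follows. This is a simplified illustration under several assumptions, not the authors' implementation. The toy graph, node names, and labels are hypothetical. Edges are followed in both directions when growing a sphere. SDDEs are omitted. A plain label-frequency check stands in for the HMFSM mining step, and the support threshold used here is chosen for the toy data rather than taken from the paper.

```python
from collections import Counter, deque


def dependence_sphere(edges, center, radius):
    """Nodes within `radius` dependence edges of `center`.

    Assumption: edges are traversed in either direction.
    """
    neighbors = {}
    for src, dst, _kind in edges:
        neighbors.setdefault(src, set()).add(dst)
        neighbors.setdefault(dst, set()).add(src)
    seen = {center}
    frontier = deque([(center, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == radius:
            continue  # do not expand beyond the sphere's radius
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return seen


def reduce_spheres(spheres, labels, min_support):
    """Remove nodes whose label appears in too few spheres."""
    counts = Counter()
    for sphere in spheres:
        counts.update({labels[n] for n in sphere})  # count once per sphere
    threshold = min_support * len(spheres)
    frequent = {lab for lab, c in counts.items() if c >= threshold}
    reduced = [{n for n in s if labels[n] in frequent} for s in spheres]
    return reduced, frequent


# Hypothetical toy graph: three call sites (a, b, c) of one function.
labels = {
    "a_def": "define handle", "a_chk": "check handle",
    "a_call": "call process", "a_log": "log",
    "b_def": "define handle", "b_chk": "check handle",
    "b_call": "call process",
    "c_def": "define handle", "c_call": "call process",
}
edges = [
    ("a_def", "a_chk", "data"), ("a_chk", "a_call", "control"),
    ("a_def", "a_call", "data"), ("a_call", "a_log", "data"),
    ("b_def", "b_chk", "data"), ("b_chk", "b_call", "control"),
    ("b_def", "b_call", "data"),
    ("c_def", "c_call", "data"),
]
centers = ["a_call", "b_call", "c_call"]

spheres = [dependence_sphere(edges, c, radius=1) for c in centers]
reduced, frequent = reduce_spheres(spheres, labels, min_support=0.6)

# Stand-in for violation detection: call sites missing a frequent label.
suspects = [c for c, s in zip(centers, reduced)
            if frequent - {labels[n] for n in s}]
print(suspects)
```

In this toy data, the "log" node is dropped as infrequent, and call site `c_call` is flagged because its sphere lacks the "check handle" element that the other two call sites share. The approach described in the text matches graph structure rather than bare labels, so this check is only a rough approximation of the idea.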
October Issue