Master Thesis
Dipl.-Ing. Oleksandr Panchenko
Matr.Nr. 724084
Mentors: Dr.-Ing. Bernhard Gröne, Hasso-Plattner-Institute for IT Systems Engineering; Dr. Albert Becker, SAP AG, Systems Applications Products in Data Processing
Abstract
The handover of software from development to the support department is accompanied by many tests and checks, which prove the maturity and the readiness for go-to-market. However, these quality gates cannot assess the complexity of the entire product or predict the maintenance effort. This work aims at researching metric-based quality indicators in order to assess the most important maintainability aspects of standard software. Static source code analysis is selected as the method for mining information about complexity. The research of this thesis is restricted to the ABAP and Java environments. The quality model used is derived from the Goal Question Metric approach and extends it for the purposes of this thesis. After a literature review, the quality model was expanded with standard metrics and some newly devised metrics. The selected metrics were validated theoretically against numerical properties using Zuse's software measurement framework, and practically against their ability to predict maintainability using experiments. After experiments with several SAP projects, some metrics were recognized as reliable indicators of maintainability. Other metrics can be used to find non-maintainable code and provide additional metric-based audits. For semi-automated analysis, a few tools were suggested and an XSLT converter was developed in order to process the measurement data and prepare reports. This thesis should prepare the basis for further implementation and usage of the metrics.
Zusammenfassung

Before software is handed over from development to maintenance, numerous tests and examinations are carried out to verify whether the product is mature enough to go to market. Although these quality controls are very extensive, the overall software complexity and the effort of future maintenance have so far hardly been considered. This thesis therefore aims to investigate metric-based quality indicators that assess the most important maintainability aspects of standard software. Static source code analysis was chosen as the method of complexity analysis. The investigation is restricted to the ABAP and Java environments. The quality model is derived from the Goal Question Metric approach and adapted to the requirements of this thesis. After an extensive literature review, the quality model was extended with existing and newly developed metrics. The numerical properties of the selected metrics were validated theoretically with the help of Zuse's measurement framework. To assess the predictive power of the metrics, practical studies were carried out. Experiments with selected SAP projects confirmed some metrics as reliable maintainability indicators. Other metrics can be used to find non-maintainable code and to provide additional metric-based audits. For a semi-automated procedure, several tools were selected and an XSLT transformation was additionally developed to aggregate measurement data and prepare reports. This thesis is intended to serve as a basis both for further research and for future implementations.
Abbreviations
A - Abstractness
ABAP - Advanced Business Application Programming (Language)
AMC - Average Method Complexity
AST - Abstract Syntax Tree
Ca - Afferent Coupling
CBO - Coupling Between Objects
CDEm - Class Definition Entropy (modified)
Ce - Efferent Coupling
COBISOME - Complexity Based Independent Software Metrics
CLON - Clonicity
CQM - Code Quality Management
CR - Comments Rate
CYC - Cyclic Dependencies
D - Distance from Main Sequence
D2IMS - Development to IMS
DCD - Degree of Cohesion (direct)
DCI - Degree of Cohesion (indirect)
DD - Defect Density
DIT - Depth of Inheritance Tree
DOCU - Documentation Rate
FP - Function Points
FPM - Function Point Method
GQM - Goal Question Metric
GVAR - Number of Global Variables
H - Entropy
I - Information
IF - Inheritance Factor
IMS - Installed Base Maintenance & Support
In - Instability
ISO - International Standards Organization
KPI - Key Performance Indicators
LC - Lack of Comments
LCC - Loose Class Cohesion
LCOM - Lack of Cohesion of Methods
LOC - Lines Of Code
LOCm - Average LOC in Methods
m - Structure Entropy
MCC - McCabe Cyclomatic Complexity
MEDEA - Metric Definition Approach
MI - Maintainability Index
MTTM - Mean Time To Maintain
NAC - Number of Ancestor Classes
NDC - Number of Descendent Classes
NOC - Number Of Children in inheritance tree
NOD - Number Of Developers
NOM - Number of Methods
NOO - Number Of Objects
NOS - Number of Statements
OO-D - OO-Degree
PIL - Product Innovation Lifecycle
RFC - Response For a Class
SAP - Systems, Applications and Products in Data Processing
SMI - Software Maturity Index
TCC - Tight Class Cohesion
U - Reuse Factor
UML - Unified Modeling Language
XML - eXtensible Markup Language
XSLT - eXtensible Stylesheet Language Transformation
V - Halstead Volume
WMC - Weighted Methods per Class
ZCC - ZIP-Coefficient of Compression
Table of Contents

1. Introduction  11
2. Research problem description  13
   Different methods for source analysis  13
   Metrics vs. audits  13
   Classification of the metrics  15
   Types of maintenance  15
   Goal of the work  16
3. Related works and projects  17
   Maintainability index (MI)  17
   Function point method (FPM)  17
   Key performance indicators (KPI)  18
   Maintainability assessment  18
   Abstract syntax tree (AST)  19
   Complexity based independent software metrics (COBISOME)  19
   Kaizen  19
   ISO/IEC 9126 quality model  19
4. Quality model goals and questions  20
   Goal Question Metric approach  20
   Quality model  21
   Size-dependent and quality-dependent metrics  24
5. Software quality metrics overview  25
   Model: Lexical model  25
      Metric: LOC - Lines Of Code  25
      Metric: CR - Comments Rate, LC - Lack of Comments  26
      Metric: CLON - Clonicity  26
      Short introduction into information theory and Metric: CDEm - Class Definition Entropy (modified)  27
   Model: Flow-graph  35
      Metric: MCC - McCabe Cyclomatic Complexity  35
   Model: Inheritance hierarchy  36
      Metric: NAC - Number of Ancestor Classes  37
      Metric: NDC - Number of Descendant Classes  37
      Geometry of Inheritance Tree  37
      Metric: IF - Inheritance Factor  40
   Model: Structure tree  40
      Metric: CBO - Coupling Between Objects  40
      Metric: RFC - Response For a Class  42
      Metric: m - Structure entropy  43
      Metric: LCOM - Lack of Cohesion Of Methods  45
      Metric: D - Distance from main sequence  50
      Metric: CYC - Cyclic dependencies  51
      Metric: NOM - Number Of Methods and WMC - Weighted Methods per Class  53
   Model: Structure chart  53
      Metric: FAN-IN and FAN-OUT  54
      Metric: GVAR - Number of Global Variables  54
   Other models  55
      Metric: DOCU - Documentation Rate  55
      Metric: OO-D - OO-Degree  55
      Metric: SMI - Software Maturity Index  55
      Metric: NOD - Number Of Developers  56
   Correlation between metrics  56
   Metrics selected for further investigation  57
   Size-dependent metrics and additional metrics  59
6. Theoretical validation of the selected metrics  60
   Problem of misinterpretation of metrics  60
   Types of scale  61
   Types of metrics  62
   Conversion of the metrics  64
   Other desirable properties of the metrics  67
   Visualization  67
7. Tools  70
   ABAP-tools  70
      Transaction SE28  70
      Z_ASSESSMENT  70
      CheckMan, CodeInspector  70
      AUDITOR  71
   Java-tools  71
      Borland Together Developer 2006 for Eclipse  71
      Code Quality Management (CQM)  72
      CloneAnalyzer  72
      Tools for dependencies analysis  72
      JLin  73
      Free tools: Metrics and JMetrics  73
   Framework for GQM-approach  74
8. Results  75
   Overview of the code examples to be analyzed  75
   Experiments  76
   Admissible values for the metrics  84
   Interpretation of the results  85
   Measurement procedure  85
9. Conclusion  88
10. Outlook  90
References  92
Appendix  97
1. Introduction
The Product Innovation Lifecycle (PIL) of SAP is divided into five phases with a set of milestones. A brief overview of the PIL is given in figure 1.1. Consider the milestone (a so-called Quality-Gate) Development to IMS (D2IMS) in more detail. The Installed Base Maintenance and Support department (IMS) receives the next release through this Quality-Gate with the start of the Main Stream phase and will support it for the rest of its lifecycle. The Quality-Gate is a formal decision to hand over the release responsibility to IMS and is based on a readiness check, which proves the quality of the release by a set of checks. However, this check aims to establish the overall quality and absence of errors; it is not intended to determine how easy the release is to maintain. For correct planning of maintenance resources, IMS needs additional information about those attributes of the software that impact maintainability. This information will not influence the decision of the Quality-Gate, but it will help IMS developers in planning resources and analyzing the release. Such information can also support code reviews and allows earlier feedback to development. This thesis aims at filling this gap by providing a set of indicators that describe the release from the viewpoint of its maintainability, together with instructions on how they should be interpreted. The second goal is a set of indicators that can locate badly maintainable code. Detailed descriptions of the PIL concept can be found at SAPNet Quicklink /pil or in [SAP04].
Figure 1.1: Goals of Quality-Gate Development to IMS. [SAP05 p.43]

Maintainability here means how easily and rapidly software can be maintained. High maintainability implies smooth, well structured software that can be changed with little effort. Other definitions of maintainability, such as the likelihood of errors, are out of the scope of this work. At the time of the Quality-Gate D2IMS, the product has already been completely developed and tested; thus the complete source code is accessible. However, the product is only about to go to market, and no data about customer messages or errors is available yet. Consequently, only internal static properties of the software can be analyzed at this point in time. One way of approaching this problem is to investigate the dependency between the maintainability of the software and its design, with the goal of finding design properties that can be used as maintainability indicators. Since standard software is usually very large and no human analysis is possible, such findings must be produced by an automated device and must be objective. Thus only objective measures can be used. The subject of this thesis is the complexity of the software, which often leads to badly maintainable code. Metrics provide a mathematical fashion for purposefully describing certain properties of an object. After comprehending the basis of maintainability and finding the design peculiarities that impact it, this thesis proposes a way to describe these design properties using metrics. Consequently, several selected metrics should be able to indicate the most important aspects of maintainability and the overall quality of the software. Moreover, it is commonly accepted that bad code or a lack of design is much easier to discover than good code. Therefore it should not be a big challenge to find code that could cause problems for maintenance.
All in all, the solution of this task allows a deep understanding of the essence of maintainability and its factors, estimating the quality of the product from the viewpoint of maintainability, appropriate planning of maintenance resources, and earlier feedback to development. A more detailed problem description and the goals of this thesis are presented in chapter 2. This thesis is composed in the following way: chapter 3 gives an overview of related work. The quality model, which is used to determine the essence of maintainability, is discussed in chapter 4. Chapter 5 provides short descriptions of the candidate metrics for extending the quality model. Chapter 6 supplements the metric descriptions with theoretical validation. Tools that can be used for software measurement are discussed in chapter 7. Experiments and results are discussed in chapter 8. Conclusions are given in chapter 9, and a short outlook in chapter 10 finishes this thesis.
homomorphic mapping is presented in figure 2.1. More on the theoretical framework of software measurement can be found in [ZUSE98, in particular pp. 103-130]. An example of a metric is LOC (Lines Of Code); this metric preserves the relation Analyzability, since smaller programs are in general easier to understand than larger programs. Another metric, NOC (Number Of Children in the inheritance tree), preserves the relation Changeability, since a class with many sub-classes is more difficult to change than a class with only few or no sub-classes.
Figure 2.1: Metric - a mapping between empirical and numerical objects which preserves all relations

According to Zuse's measurement framework [ZUSE98], specifying a metric includes the following steps:
- Identify attributes for real world entities
- Identify empirical relations for such attributes
- Identify numerical relations corresponding to each empirical relation
- Define a mapping from real world entities to numbers
- Check that numerical relations preserve and are preserved by the empirical relations

In contrast to metrics, audits are just verifications of adherence to certain rules or development standards. Usually an audit is a simple count of violations of these rules or patterns. SAP uses a wide range of audit-based tools for ABAP (CHECKMAN, Code Inspector, Advanced Syntax Check, SLIN) and for Java (JLin, Borland Together). Audits help developers find and fix code that potentially contains errors, and they increase quality awareness in development. However, audits are bad predictors of maintainability: even though an application conforms to the development standards, it can still be poorly maintainable. Moreover, audits give concrete recommendations to developers but are not able to characterize the quality of the product in general. The second reason for rejecting audits is the absence of complexity analysis, the main part of maintainability analysis. In the remainder of this work, only metrics are considered. Approaches for finding appropriate metrics are discussed in chapter 4. Research on the numerical properties of metrics is discussed in chapter 6.
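As a toy illustration of a metric as a mapping from code to numbers, consider the LOC example: the following sketch computes LOC and a simple comments rate for a source string. The class and method names are invented for this example and are not part of any SAP tool; a real measurement would have to define the comment syntax per language (ABAP, for instance, uses `*` in the first column and `"` for inline comments).

```java
// Illustrative sketch only: two simple lexical metrics, computed naively.
// Class and method names are invented for this example.
public class SimpleMetrics {

    // LOC: count of non-blank source lines.
    public static int loc(String source) {
        int count = 0;
        for (String line : source.split("\n")) {
            if (!line.trim().isEmpty()) count++;
        }
        return count;
    }

    // CR (comments rate): fraction of non-blank lines that are comments.
    // Here a "comment" is any line starting with "//" or "*".
    public static double commentsRate(String source) {
        int comments = 0;
        for (String line : source.split("\n")) {
            String t = line.trim();
            if (t.startsWith("//") || t.startsWith("*")) comments++;
        }
        int total = loc(source);
        return total == 0 ? 0.0 : (double) comments / total;
    }

    public static void main(String[] args) {
        String sample = "// add two numbers\n"
                      + "int add(int a, int b) {\n"
                      + "    return a + b;\n"
                      + "}\n";
        System.out.println("LOC = " + loc(sample));
        System.out.println("CR  = " + commentsRate(sample));
    }
}
```

Even this trivial mapping is objective and repeatable, which is exactly the property that distinguishes metrics from human rating.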
Based on the metric definition, the following usage scenarios are thinkable:
- Compare certain attributes of two or several software systems
- Formally describe certain attributes of the software
- Prediction: if a strong correlation between metrics is found, the value of one metric can be predicted based on the values of another. For example, if a relation between some complexity metric (a product metric) and the fault probability (a process metric) is found, one can predict the probability of a fault in a certain module based on its complexity
- Keep track of the evolution of the software: comparing different versions of the same software allows drawing conclusions about the evolution and trend of the product
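The prediction scenario rests on a correlation measure between paired metric values. A minimal sketch of Pearson's correlation coefficient follows; the module data in `main` is invented purely for illustration.

```java
// Sketch: Pearson correlation between two paired samples, e.g. a complexity
// metric per module and the fault count per module. The data is invented.
public class MetricCorrelation {

    public static double pearson(double[] x, double[] y) {
        int n = x.length;
        double sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
        for (int i = 0; i < n; i++) {
            sx += x[i]; sy += y[i];
            sxx += x[i] * x[i];
            syy += y[i] * y[i];
            sxy += x[i] * y[i];
        }
        // covariance and variances, each scaled by n (the factor cancels)
        double cov = sxy - sx * sy / n;
        double vx  = sxx - sx * sx / n;
        double vy  = syy - sy * sy / n;
        return cov / Math.sqrt(vx * vy);
    }

    public static void main(String[] args) {
        double[] mcc    = {2, 5, 7, 11, 14};  // complexity per module (invented)
        double[] faults = {0, 1, 1, 3, 4};    // faults per module (invented)
        System.out.println("r = " + pearson(mcc, faults));
    }
}
```

A value of r near +1 or -1 would justify using the one metric as a predictor of the other; values near 0 would not.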
Types of Maintenance
There are three main types of maintenance (based on [RUTH]):
- Corrective: making it right (also called repairs)
  o To correct residual faults: specification, design, implementation, documentation, or any other type of fault
  o Time consuming, because each repair must go through the full development lifecycle
  o On average, ~20% of the overall maintenance time (however, at IMS it reaches 60%, and with the beginning of the Extended Maintenance even up to 100%)
- Adaptive: making it different (functional changes)
  o Responses to changes in the environment in which the product operates
  o Changed hardware
  o Changed connecting software, e.g. a new database system
  o Changed data, e.g. new phone dialing codes
  o On average, ~20% of maintenance time (at IMS 30%, decreasing over time to 10%)
- Perfective: making it better
  o Changing software to improve it, usually requested by the client
  o Adding functionality
  o Improving efficiency, for example performance (also called polishing)
  o Improving maintainability (also called preventative maintenance)
  o On average, ~60% of the maintenance time (at IMS only 10-20%)

For IMS, the most important and time consuming type is corrective maintenance. However, this thesis doesn't distinguish between the particular types of maintenance, because in general the process is the same for all of them. Nevertheless, the results of this analysis can especially be used for planning preventative maintenance.
points predicts the development or maintenance effort for the whole application. Assuming that a developer can implement a certain number of FP per day on average, a manager can predict the number of developers and the time needed. FPM is perfectly applicable in early project phases and allows predicting the development and maintenance effort when source code is not yet available. It also suits the strongly data-oriented concept of SAP applications. Nevertheless, in the case of this work the source code is already available, and it could be difficult to calculate backwards the number of FPs that were implemented. This would be especially difficult for a product that has been bought from outside, for which no project or design documentation is available. To make matters worse, FPs are subjective and do not suit the requested objective model. Also, these measures were designed for cost estimation (before source code is available) rather than for measurement. Thus it is best to collect information from the source code directly, not using FPM as an additional layer of abstraction. For readers who are interested in FPM, the following sources are recommended: [AHN03], [ABRA04b].
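The FPM effort prediction described above is simple arithmetic; a hedged sketch follows. All figures are invented for illustration, and real FP productivity rates vary widely by team and technology.

```java
// Sketch of FPM-style effort prediction. All numbers are invented.
public class FpmEstimate {

    // Days needed to implement totalFp function points with a given team,
    // assuming a constant productivity of fpPerDevPerDay per developer.
    public static double daysNeeded(double totalFp, double fpPerDevPerDay, int developers) {
        return totalFp / (fpPerDevPerDay * developers);
    }

    public static void main(String[] args) {
        // e.g. 600 FP at 2 FP per developer per day with a team of 3
        System.out.println("days = " + daysNeeded(600, 2.0, 3));
    }
}
```

The sketch also shows why FPM fails here: it runs forwards from an FP count to effort, whereas at the D2IMS gate one would have to recover the FP count backwards from finished code.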
Maintainability Assessment
This project aims at assessing the maintainability of SAP products shortly before the handover to IMS; its goal is thus nearly the same as that of the current thesis. However, the assessment chosen in this project is audit-based. Several aspects of maintainability are inspected and a list of questions is prepared. An expert has to analyze the product manually, answer the questions, and fill out a special form. After that, the final conclusion about the maintainability can be reported automatically. The main drawbacks of the suggested method are the manual character of the assessment and the single resulting value, which is difficult to interpret. In this project, some primitive metrics such as lines of code and comments rate are also suggested, and a tool supporting these metrics is provided.
Kaizen
The objective of the project Kaizen is to analyze selected SAP code in order to understand it better and look for ways to continually improve it. Three possible objectives of the code improvement are:
- Improve readability and general maintainability
- Reduce cost of service enabling
- Enable future enhancements in functionality (when well understood)

Kaizen will focus on objectives #1 and #2 as applicable to most SAP code. One of the first steps of the project is the analysis of maintainability metrics.
Optionally, a Tools level can be included in the model in order to show the tool assignment for the metrics. An abstract example of a GQM model is illustrated in figure 4.1. A more detailed description of GQM and a step-by-step procedure for using it are given in [SOLI99]. GQM is useful because it facilitates identifying not only the precise measures required, but also the reasons why the data are being collected [PARK96, p. 53]. It is possible to rank the impact of the metrics on the questions using weight coefficients, in order to make clear which metric is more important. However, the model used in this thesis does not aim to describe which weights the metrics have. The author believes that the best way is to give the analyst full freedom in this decision; the analyst can decide, depending on the situation, which indicator is more important.
Figure 4.1: The Goal Question Metric approach (abstract example)

The measurement process using the GQM approach includes four main steps:
- Definition of the top goal and the goal hierarchy
- Definition of the list of questions, which explain the goals
- Selection of the appropriate metric set; theoretical and empirical analysis of each metric; selection of the measurement tools
- Collecting measurement data and interpreting the results

The first three steps are intended for the definition of the GQM quality model; the last step is the actual measurement and interpretation and can be repeated many times.
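The outcome of the first three steps is essentially a small tree of goals, questions, and metrics. A hypothetical sketch of how one slice of such a model could be held in code follows; the metric abbreviations are from this thesis, but the concrete questions are invented examples, not the thesis's actual model.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: one GQM goal with questions, each answered by a set of metrics.
// The questions below are invented examples, not the thesis's real model.
public class GqmModel {

    public static Map<String, List<String>> buildModel() {
        Map<String, List<String>> questionsToMetrics = new LinkedHashMap<>();
        questionsToMetrics.put("How complex are the algorithms?", List.of("MCC", "V"));
        questionsToMetrics.put("How well is the code documented?", List.of("CR", "DOCU"));
        questionsToMetrics.put("How redundant is the code?", List.of("CLON"));
        return questionsToMetrics;
    }

    public static void main(String[] args) {
        String goal = "Assess maintainability of standard software from IMS's viewpoint";
        System.out.println("Goal: " + goal);
        buildModel().forEach((question, metrics) ->
                System.out.println("  Q: " + question + " -> metrics " + metrics));
    }
}
```

Keeping the model as explicit data like this makes the repeatable fourth step (measure, then interpret per question) straightforward to automate.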
Quality Model
According to the GQM goal specification, the major goal for the maintainability purpose is: to assess (purpose) the maintainability (quality issue) of standard software (object) from IMS's viewpoint (viewpoint) in order to manage it and find possible ways to improve it (purpose) in the ABAP and Java environment (context). The question for the major goal could be "How easy is the location and fixing of an error in the software?", but this question is very vague and can only be answered with process metrics like MTTM. As mentioned before, measuring such process metrics is only possible during maintenance and thus is inappropriate for the purposes of this work. Let's call such goals external goals, because the degree of goal achievement also depends on some external motive. The degree of achievement of internal goals depends only on internal properties of the software and hence can be described relatively early in the lifecycle. The major goal is highly complex, and it is difficult to create appropriate questions for it; thus a hierarchy of goals should be used, consisting of a top goal, goals, and sub-goals. Moreover, only internal goals should be placed at the bottom of the hierarchy, so that questions are addressed only to internal goals. The goal hierarchy is depicted in figure 4.2, where blue boxes represent external goals. Such a decomposition allows a sensible selection of questions and the necessary granularity. The full model is presented in appendix B.
Figure 4.2: Mapping of external and internal goals

The quality model used here is based on several validated and acknowledged quality models: the ISO 9126 standard quality model, the McCall quality model, the software quality characteristics tree from Boehm, and Fenton's decomposition of maintainability. The corresponding parts of these models can be found in appendix A or in [KHAS04]. Several sub-goals and metrics were also taken from [MISR03]. After examination of these quality models, theoretical consideration, and research of the literature in this field, the following areas (goals) were recognized as important for the maintainability of software:
- Maturity
- Clonicity
- Analyzability
- Changeability
- Testability

The goals Maturity and Clonicity are described together with the corresponding metrics in chapter 5 (see p. 55 and p. 26, respectively). Next, the aspects Analyzability, Changeability, and Testability are discussed. Analyzability is probably the most important factor of maintainability; nearly all metrics used in the model also appear in the Analyzability area. At first, the author also wanted to include the goal Localizing in the model, which would characterize how easy it is to localize (find) a fault in the software. It was later found that most metrics for this goal are already included in Analyzability, so Localizing was removed from the model. The following sub-goals should be fulfilled in order to create easily comprehensible software:
- Algorithm Complexity: keeping the internal (algorithmic) complexity low
- Selfdescriptiveness: providing appropriate internal documentation (naming conventions and comments)
- Modularity: keeping the modules small and encapsulating functionality appropriately into modules (cohesiveness)
- Structuredness: proper organization of the modules in the entire structure
- Consistency: keeping the development process simple and well organized. There is a lot of research trying to determine whether a well organized development process leads to good product quality, but no evident relation has been found. Nevertheless, a maintainer is sometimes confused if he sees that a module was changed many times by different developers. Consistency in this context means a clear distribution of tasks between developers.

For changeability (the ease of making changes in the software) it is important to have a proper software design, which allows maintenance without side effects. The quality model includes the goals Structuredness, Modularity, and Packaging in this area, whereas structuredness has several different aspects:
- Coupling describes the connectivity between classes
- Cohesiveness describes the functional unity of a class
- Inheritance describes the properties of inheritance trees

Testability means the ease of testing and of maintaining test cases.
Bruntink in [BRUN04] investigates testability from the perspective of unit testing and distinguishes between two categories of source code factors: factors that influence the number of test cases required to test the system (let's call the goal for these factors Value), and factors that influence the effort required to develop each individual test case (let's call the goal for these factors Simplicity). Noteworthy, Value refers to the number of necessary test cases, not the number of actually available test cases. Consequently, for high maintainability it is important to keep the Value small. Nevertheless, most efforts in the field of test coverage concentrate on the low procedure level, for example the percentage of tested statements within a class. The quality model includes several metrics for testability validated in [BRUN04]. In the SAP system an important part of the complexity lies in the parameters for customization; however, experts argue that most of the customization complexity is already included in the source code, where the parameters are read and processed. The impact of individual metrics on maintainability is discussed in greater detail in chapter 5.
easily estimated by LOC. For example, figure 5.14 depicts the correlation between LOC and WMC (Weighted Methods per Class).
identified as such. It may also be the case that they are not removable without a major refactoring of the code. This may, firstly, result in dead code, which is never executed; secondly, such code increases the cognitive load of future maintainers.
- Larger sequences repeated multiple times within a single function make the code unreadable, hiding what is actually different in the mass of code. Code is then also likely to be on different levels of detail, slowing down the process of understanding.
- If all copies are to be enhanced collectively at one point, the necessary enhancements may require varying measures in the cases where copies have evolved differently. As an extreme case, one can imagine that a fix introduced in the original code actually breaks the copy.

Exact and parameterized clones are distinguished. Finding exact clones is easier and language independent. Parameterized clones are more difficult to find but considerably more helpful, because clones are often changed slightly already while being copied. In [RYSS] various techniques for clone finding are classified. These techniques can be roughly divided into three categories:
- string-based: the program is divided into a number of strings (typically lines), and these strings are compared against each other to find sequences of duplicated strings
- token-based: a lexer tool divides the program into a stream of tokens and then searches for series of similar tokens
- parse-tree based: after building a complete parse tree, one performs pattern matching on the tree to search for similar subtrees

The parse-tree based technique was also considered during the ASP project. The choice of technique should be made according to the goal of the measurement. Finding all possible clones for subsequent audits would preferably use the token-based or parse-tree based technique.
In the context of this thesis it is more interesting to know only the approximate number of clones, and thus the simple and quick string-based technique can be used. Another important property of the string-based technique is language independence, since both the ABAP and Java environments are considered. As the most important indicator, the metric CLON (Clonicity) is suggested: the ratio of the LOC in all detected clones to the total LOC. This metric should give an idea about the usage of copy-paste in the development process and consequently about the redundancy of the final product.
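A minimal sketch of the string-based technique and the resulting CLON ratio follows. The window size and line normalization are chosen arbitrarily for this example; a production detector would at least strip comments and require considerably longer matches.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of a string-based clone detector: any sequence of WINDOW normalized
// lines that occurs more than once counts as cloned. CLON = cloned LOC / LOC.
public class CloneDetector {

    static final int WINDOW = 3; // minimal clone length in lines (arbitrary)

    public static double clonicity(String source) {
        // normalize: trim whitespace, drop blank lines
        List<String> lines = new ArrayList<>();
        for (String raw : source.split("\n")) {
            String t = raw.trim();
            if (!t.isEmpty()) lines.add(t);
        }
        int n = lines.size();
        if (n == 0) return 0.0;

        // index every WINDOW-line sequence by its starting positions
        Map<String, List<Integer>> starts = new HashMap<>();
        for (int i = 0; i + WINDOW <= n; i++) {
            String key = String.join("\u0001", lines.subList(i, i + WINDOW));
            starts.computeIfAbsent(key, k -> new ArrayList<>()).add(i);
        }

        // mark every line covered by a sequence that occurs at least twice
        boolean[] cloned = new boolean[n];
        for (List<Integer> positions : starts.values()) {
            if (positions.size() > 1) {
                for (int s : positions) {
                    for (int j = s; j < s + WINDOW; j++) cloned[j] = true;
                }
            }
        }
        int clonedLoc = 0;
        for (boolean c : cloned) if (c) clonedLoc++;
        return (double) clonedLoc / n;
    }
}
```

Because only trimmed lines are compared, the approach is language independent in exactly the sense required above: the same detector runs unchanged over ABAP and Java sources.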
Short Introduction into Information Theory and Metric: CDEm Class Definition Entropy (modified)
Methods for describing complexity

There are many methods that allow describing the complexity of a system. Only a few of them are listed below (partially taken from [CART03]):
- Human observation and (subjective) rating. The weakness of such an evaluation is its subjective manner and the required human involvement.
- Number of parts or distinct elements. Size and complexity are truly two different aspects of software; despite this fact, various size metrics have traditionally been used to help indicate complexity. However, many such metrics are size-dependent and don't allow comparing systems of different size. It is also not always clear what should be counted as a distinct part.
- Number of parameters controlling the system. The same comments as for the number of parts apply here.
- Minimal description in some model/language, which presents some kind of abstraction. Obviously, a system that has a smaller minimal description is simpler than a system with a larger minimal description. In this method a model (a description) includes only relevant information; thus the redundant information, which increases size without increasing complexity, is avoided.
- Information content (how is information defined/measured?)
- Minimal generator/constructor (what machines/methods can be used?)
- Minimum energy/time to construct. Several experts argue that a system that needs more time to be designed (implemented) is more complex.

Obviously, the study of complex systems is going to demand that the analyst use some kind of statistical method. Next, after a short introduction into information theory, entropy-based metrics supporting some of the above mentioned methods are discussed.

Information

Remark: all of the following is considered in terms of probability. Consider the process of reading a random text, where it is supposed that the alphabet is initially known to the reader. The reading of each next symbol can be seen as an event. The probability of this event depends on the symbol and its place within the text. Examine a measure related to how surprising or unexpected an observation or event is, and let's call this measure information.
Thus the information obtained from each new symbol is, in this context, the amount of new knowledge which the reader gets from this symbol. It is obvious that information is inversely related to the probability of the event: if the probability of occurrence of i is small, the reader would be quite surprised if the outcome actually was i. Conversely, if the probability of a certain symbol is high (for example, the probability of occurrence of the symbol i after t in the word "information" tends to 1), the reader will not get much information from this symbol. Let us describe the information measure more formally. For that purpose Shannon proposed four axioms:
Information is a non-negative quantity: I(p) >= 0
If two independent events occur (whose joint probability is the product of their individual probabilities), then the information the reader gets from observing the events is the sum of the two informations: I(p1*p2) = I(p1) + I(p2)
I(p) is a continuous and monotonic function of the probability (slight changes in probability should result in slight changes in information)
If an event has probability 1, the reader gets no information from the occurrence of the event: I(1) = 0
Deriving from these axioms, one can get the definition of information in terms of probability: I(p) = -log2(p). A more detailed description of this derivation can be found in [CART03] or [FELD02]. The base 2 reflects the binary character of the events; in this case the unit of information is the bit. However, other bases are also possible.
Entropy
Each symbol in the text brings a different amount of information. Of interest is the average amount of information within the text. For this purpose the term entropy is introduced. After a simple transformation the following expression for entropy can be derived:
H(P) = -Sum_i pi * log2(pi)
Noteworthy, H(P) is not a function of X; it is a function of the probability distribution P of the random variable X. Entropy has the following important property: 0 <= H(P) <= log2(n). H(P) = 0 when exactly one of the probabilities is one and all the rest are zero (only one symbol is possible). H(P) = log2(n) only when all of the events have the same probability 1/n. Thus H is maximized by a uniform distribution: when everything is equally likely to occur, one cannot get more uncertain than that. Since the maximal possible entropy is known, the normalized entropy can be introduced:
Hnorm(P) = H(P) / log2(n)
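The definitions above can be illustrated by a short Java sketch (a minimal illustration, not one of the thesis tools): the entropy of a frequency distribution and its normalized variant.

```java
/** Minimal sketch: Shannon entropy of a frequency distribution, in bits. */
public class EntropyDemo {

    /** H(P) = -sum p_i * log2(p_i). */
    static double entropy(int[] frequencies) {
        long total = 0;
        for (int f : frequencies) total += f;
        double h = 0.0;
        for (int f : frequencies) {
            if (f == 0) continue;                 // 0 * log(0) is taken as 0
            double p = (double) f / total;
            h -= p * (Math.log(p) / Math.log(2)); // log2(p) = ln(p) / ln(2)
        }
        return h;
    }

    /** Normalized entropy: H(P) / log2(n), always in [0, 1]. */
    static double normalizedEntropy(int[] frequencies) {
        return entropy(frequencies) / (Math.log(frequencies.length) / Math.log(2));
    }

    public static void main(String[] args) {
        // Uniform distribution over 4 symbols: H = log2(4) = 2 bits (the maximum).
        System.out.println(entropy(new int[]{1, 1, 1, 1}));
        // Degenerate distribution (only one symbol possible): H = 0.
        System.out.println(entropy(new int[]{4, 0, 0, 0}));
        System.out.println(normalizedEntropy(new int[]{1, 1, 1, 1}));
    }
}
```

The sketch reproduces both boundary cases named in the text: the uniform distribution reaches the maximum log2(n), and the degenerate distribution has zero entropy.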
Normalization is important, since entropy is project-size dependent. Remarkably, entropy depends logarithmically on size: doubling the size increments the maximal entropy by one bit. Next, some possible interpretations of entropy are listed:
The entropy of a probability distribution is just the expected value of the information of the distribution
Entropy is also related to how difficult it is to guess the value of a random variable X [FELD02, pp. 5-7]. One can show that H(X) <= average number of yes-no questions to determine X <= H(X) + 1
Entropy indicates the best possible compression for the distribution: the average number of bits needed to store the value of the random variable X. Noteworthy, entropy provides only the theoretical bound; some practical algorithm must be used for the actual coding (for example Huffman codes)
Next, some applications of entropy to software measurement are discussed.
Average Information Content Classification
In [ETZK02, p. 295] the work of Harrison is mentioned. Harrison and other scientists proposed to extend Halstead's counting of operators and to measure the distribution of different operators and operands within a program. This should allow assessing the analyzability of one single chunk of code. However, such a method is not very useful, since most of the complexity is contained within user-defined strings. Remarkably, the syntactical rules of programming languages decrease entropy. For example, it is not possible to have two operands without an operator in between, and the compiler takes care of this. Hence the probabilities of the occurrence of the operands in a syntactically correct programming text depend on the syntactical rules. Consequently, the entropy of a syntactically correct program will never reach the maximum, and normalizing with respect to the syntactical rules becomes much more difficult.
Metric: CDEm - Class Definition Entropy (modified)
This metric reduces the alphabet of the text to the user-defined strings used in a class, because these contain the greatest part of the complexity. Examples of user-defined strings are:
Names of classes, attributes and methods
Package and class names within an import section
Types of public attributes and types of return values of methods
Types of parameters
Method calls, etc.
By such a restriction one gets another level of granularity. Let us illustrate this metric by an example. Consider a maintainer looking through the source code. How surprised would the maintainer be on seeing a reference to another class? Suppose that:
Maintainers work easily if they are confronted with the same object again and again
Maintainers work with difficulty if they often have to analyze new, unknown objects
Consider the two programs presented in figure 5.1. Assume that the maintainer has to fix two faults, in the modules B and C. Both modules use functionality provided by the modules A and E. In the first program the module A plays the role of an interface, and the maintainer can work easily because he has to keep in mind only one pattern of collaboration. In the second program the modules B and C have references to different modules; such a model is more multifarious and more difficult to comprehend. Figure 5.2 shows different patterns for the frequency of occurrence of module names in an abstract program. The frequently used modules play the role of an interface for their containing package. The entropy of the frequency distribution is an indicator for the evidence of interfaces. As expected, P1 has a smaller entropy value than P2. Noteworthy, other metrics will show that P2 is much easier to comprehend: less coupling between modules means less complexity.
Consequently, a high entropy of the distribution of the user-defined strings indicates a text that is difficult to comprehend. Different variants of this metric have been proposed. A very simple and intuitive variant is an analysis of the import section only, calculating the distribution of occurrences of class names in the import sections. The classes which occur most often in the import sections are also often used from outside of the package where they are defined, and thus form the interface of this package. Clear (small) package interfaces are an indicator of good design. This metric is called CDEm - Class Definition Entropy (modified). Readers interested in other implementations of this kind of metric are referred to [ETZK99], [ETZK02] and [Yi04]. Incidentally, some entropy-based metrics also use semantic analysis to improve their significance.
Figure 5.1: The uniform (left, H(P1) = 2.37) and the multifarious (right, H(P2) = 2.5) pattern for communication between modules
Figure 5.2: The evidence of classes which play the role of interface for the packages (left panel: the system with pronounced interfaces; right panel: the system with ulterior interfaces; horizontal axis: name of module, A...Z)
For the calculation of CDEm two programs were developed:
The class Entropy.java prepares the list of all classes in the project and then scans the source code in order to find references to classes from this list. Next, the list is filled with frequency data and the entropy is calculated.
The class EntropyImport.java does not have a list of classes to be found; this tool scans the source files and calculates the entropy of the import clauses. In this case the list of user-defined strings (import clauses) is prepared dynamically.
It is argued that both tools measure the same aspect, since the results of both tools are correlated. Since the entropy based on the analysis of the import section is easier to compute, it will be used for the further research. As a very initial indicator of entropy, the coefficient of compression (for example of a ZIP archive) can be taken. The author believes that ZIP practically implements an algorithm whose ZIP coefficient of compression (ZCC) tends to the best possible compression defined by means of entropy. However, ZIP works with symbols, while CDEm works with tokens. ZCC = size of the ZIP archive / size of the project before compression. Thus a high ZCC indicates a complex project, while a low ZCC indicates a simple project with high redundancy. Similarly, a high CDEm indicates a complex design and a low CDEm a simple design. The next simple experiment tries to find out whether these two metrics are correlated, i.e. whether there is a correlation between CDEm and the ZIP coefficient. Figure 5.3 shows the dependency between the ZIP compression coefficient and the import-based CDEm. The input for this experiment were the examples of code described in chapter 8.
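The ZIP coefficient of compression can be approximated directly with the Deflater class of the Java standard library. The following is a minimal sketch of the idea, not one of the tools used in the experiments; the sample strings are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

/** Sketch: approximate the ZIP coefficient of compression (ZCC) of a text. */
public class ZccDemo {

    /** ZCC = compressed size / original size; low values mean high redundancy. */
    static double zcc(String source) {
        byte[] input = source.getBytes(StandardCharsets.UTF_8);
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            out.write(buffer, 0, deflater.deflate(buffer));
        }
        deflater.end();
        return (double) out.size() / input.length;
    }

    public static void main(String[] args) {
        String redundant = "aaaa".repeat(1000);   // highly redundant text
        String varied = new java.util.Random(42)  // near-random printable text
                .ints(4000, 33, 127)
                .collect(StringBuilder::new, StringBuilder::appendCodePoint, StringBuilder::append)
                .toString();
        // Redundant text compresses far better (lower ZCC) than random text.
        System.out.println(zcc(redundant) < zcc(varied)); // prints true
    }
}
```

Deflater implements the DEFLATE algorithm used inside ZIP archives, so the ratio behaves like the ZCC described above, up to the archive headers.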
[Scatter plot: ZCC (0.17 to 0.32) on the horizontal axis against import-based CDEm in % (82 to 89) on the vertical axis; labeled points include (0.18; 82.4), (0.21; 82.5) and (0.32; 84.2).]
Figure 5.3: ZCC and CDEm show no evident dependence
During this experiment 4 pairs of projects were analyzed, where each pair presents two versions of the same project: an old and a new one. Each newer project is supposed to have better values than the older one. In figure 5.3, arrows connecting two measurement points indicate the trend of values within one project. Since the directions of the arrows differ considerably, one can speak of an absence of any connection between these metrics. An overview of all considered projects is also shown in table 5.1, where the trend of the metrics (improvement or degradation) is shown using arrows. According to expert opinion, all newer versions should show an improvement; however, the metrics often show opposite results. Nevertheless, ZCC measures not the pure entropy of the code but also the entropy of the comments, most of which are generated or consist of the same predicates. Consequently, ZCC shows lower values than the entropy actually is. A more accurate experiment should exclude the comments before compression. Besides, a high clone rate (CLON) can cause low ZIP-coefficient values as well.
The analysis of the examples also shows that many developers use * in import sections. Such inaccurate definitions lead to inexact CDEm calculation. Hence the contradiction between ZCC and CDEm is most probably caused by improper computation of the metrics. The author argues that more accurate experiments should be made in order to ascertain the ability of these metrics to predict maintainability. Noteworthy, some peculiar properties of the software design can influence CDEm. For example, the project Ant (www.apache.org) has a very low value for CDEm, because almost every class uses the following classes: BuildException, Project, Task and some others. Such a distribution of the user-defined strings leads to an underestimation of the entropy.
Complexity of the development process
An interesting approach is suggested by Hassan in [HASS03]. He argues that if developers have to modify many files intensively at the same time, this increases the cognitive load and can also cause problems for managers. Such a strategy can also lead to bad product quality. As a measurement for the chaos of software development, an entropy-based process metric was suggested. The input data for the measurement is the history of code development. Time is divided into periods, and for each period the frequency of changes of each source file is calculated (see illustration in figure 5.4). By the main property of entropy, it is maximized by a uniform distribution. Thus a high entropy of the distribution of source code changes indicates a situation when many files are changed very actively. A low entropy shows a normal development process, when only a few files are edited actively and the rest are kept untouched or changed insignificantly. The evolution of entropy during the development process is illustrated in figure 5.5. Hence a high entropy can warn the project manager about insufficient organization of the development process.
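Hassan's process metric can be sketched as follows (the file names and change counts are hypothetical; the entropy of the per-file change distribution within one period is computed):

```java
import java.util.Map;

/** Sketch of Hassan's entropy-based process metric for one time period. */
public class ChangeEntropyDemo {

    /** Entropy (in bits) of the distribution of changes over files in one period. */
    static double changeEntropy(Map<String, Integer> changesPerFile) {
        int total = changesPerFile.values().stream().mapToInt(Integer::intValue).sum();
        double h = 0.0;
        for (int c : changesPerFile.values()) {
            if (c == 0) continue;
            double p = (double) c / total;
            h -= p * Math.log(p) / Math.log(2);
        }
        return h;
    }

    public static void main(String[] args) {
        // Chaotic period: every file changed equally often -> maximal entropy.
        double chaotic = changeEntropy(
                Map.of("A.java", 5, "B.java", 5, "C.java", 5, "D.java", 5));
        // Focused period: almost all changes in one file -> low entropy.
        double focused = changeEntropy(
                Map.of("A.java", 17, "B.java", 1, "C.java", 1, "D.java", 1));
        // High entropy warns of a chaotic development process.
        System.out.println(chaotic > focused); // prints true
    }
}
```

In a real setting the change counts per period would be extracted from the version-control history rather than given literally.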
The next entropy-based metric is discussed in the sub-chapter "Metric: m - Structure Entropy" after the introduction of an appropriate model. As a short conclusion on the usage of information theory in software measurement one can say: it is a powerful non-counting method for describing the semantic properties of software, but before it can be used, more experiments with exact and perceptive tools should be made.
Model: Flow-graph
The flow-graph model represents the intra-modular control complexity in the form of a graph. The flow-graph consists of edges and nodes, where nodes represent operators and edges represent the possible control steps between the operators. A flow-chart is a kind of flow-graph where decision nodes are marked with a different symbol. Figure 5.6 provides an example of both notations. A region is an area within a graph that is completely bounded by nodes and edges.
Figure 5.6: Example of a flow-graph and the corresponding flow-chart [ALTU06, p. 15]
One possible way to calculate MCC is: MCC = E - N + 2, where E is the number of edges and N the number of nodes. It has also been shown that for a program with binary decisions only (all nodes have out-degree <= 2), MCC = P + 1, where P is the number of predicate (decision) nodes (operators: if, case, while, for, do, etc.).
Usage of MCC in the object-oriented environment
This intra-modular metric can be used both in a procedural and in an object-oriented context. However, the usage of this popular metric in the object-oriented context has some peculiarities. Usual object-oriented programs show understated values of MCC, because up to 90% of methods can have MCC = 1. In [SER05] it is hypothesized that part of the complexity is hidden behind object-oriented mechanisms such as inheritance, polymorphism or overloading. These mechanisms are in fact hidden decision nodes. A good illustration of this phenomenon applied to overloading is the following example:
Listing 5.1: Illustration of the hidden decision node in case of overloading
class A {
    void method1(int arg) { }
    void method1(String arg) { }
}
A a;
a.method1(b);  // which method1 is invoked depends on the type of b
The decision node hidden in the last statement could be represented in a procedural way by checking the type of the argument and calling the corresponding method. Additional decision nodes for polymorphism and inheritance can be represented similarly. The hypothesis is: the fewer OO mechanisms are used, the more complex the methods should be. The experiment described in [SER05] tried to find an inverse correlation between an inheritance factor (Depth of Inheritance Tree) and MCC, but did not show significant results. Nevertheless, polymorphism or overloading could be better factors to correlate with; additional experiments are needed. Since MCC can be calculated only for a single chunk of code, in the OO environment one further metric is introduced in order to aggregate the values and present a metric for the entire class; see the metric WMC (Weighted Methods per Class) for more details.
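The relation MCC = P + 1 suggests a rough textual approximation: counting decision keywords in a method body. This is only a sketch; a real tool would work on the parse tree, and the keyword set below is a simplifying assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Rough sketch: estimate McCabe's cyclomatic complexity as P + 1. */
public class MccDemo {

    // Decision points: each contributes one predicate node to the flow-graph.
    private static final Pattern DECISION =
            Pattern.compile("\\b(if|while|for|case|catch)\\b|&&|\\|\\|");

    static int mcc(String methodBody) {
        Matcher m = DECISION.matcher(methodBody);
        int predicates = 0;
        while (m.find()) predicates++;
        return predicates + 1;   // MCC = P + 1 for binary decisions
    }

    public static void main(String[] args) {
        // 2 ifs + 1 for + 1 short-circuit && -> P = 4, MCC = 5
        String body = "if (a > 0) { for (int i = 0; i < n; i++) { if (b && c) x++; } }";
        System.out.println(mcc(body)); // prints 5
    }
}
```

A straight-line body without decisions yields MCC = 1, which matches the observation above that most object-oriented methods score the minimum.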
the interface and the class which implements this interface. The Extended Inheritance Hierarchy is a directed acyclic graph with no loops. The Advanced Inheritance Hierarchy supplements the Extended Inheritance Hierarchy by adding the attributes and methods of each class. Because interfaces are widely used in ABAP and Java and have a great impact on the analyzability and changeability, the Simple Inheritance Hierarchy was rejected. On the other side, the very detailed level of granularity provided by the Advanced Inheritance Hierarchy is not very useful in the context of maintainability. Consequently, the Extended Inheritance Hierarchy is chosen as the most appropriate basis for the maintainability metrics. For this model the following metrics were proposed: Chidamber and Kemerer proposed the Depth of Inheritance Tree (DIT) metric, which is the length of the longest path from a class to the root of the inheritance hierarchy [CHID93, pp. 14-18], and the Number of Children (NOC) metric, which is the number of classes that directly inherit from a given class [CHID93, pp. 18-20]. Later, Li suggested two substitution metrics: the Number of Ancestor Classes (NAC) metric, measuring how many classes may potentially affect the design of a class because of inheritance, and the Number of Descendent Classes (NDC) metric, measuring how many descendent classes a class may affect because of inheritance. These two metrics are good candidates for the quality model in the areas Analyzability and Changeability respectively and will be discussed in more detail.
distribution. The width is the ratio of super-classes to the total number of classes. An indicator of width is the U metric (Reuse factor), where U = super-classes / classes = (CLS - LEAFS) / CLS. A super-class is a class that is not a leaf class. U measures reuse via inheritance. A high U indicates a deep class hierarchy with high reuse. The reuse ratio varies in the range 0 <= U < 1. The weight distribution describes the tendency of where the main functionality is implemented. However, there is no appropriate metric for the weight distribution. The best way to indicate the weight distribution is a histogram, where the vertical axis represents DIT and the horizontal axis represents the number of classes, the number of methods or the sum of WMC. Figure 5.8 depicts an example of a top-heavy hierarchy; as functionality indicator the metric WMC is selected. Different designs of inheritance hierarchies are presented in figure 5.7. The next experiment tries to estimate the best geometry of an inheritance hierarchy from the viewpoint of maintainability using the metrics NAC and NDC.
Figure 5.7: Types of Inheritance Hierarchies
First of all, the values of the metrics are calculated for each class and then aggregated using the arithmetic mean. Let us try to estimate the analyzability and changeability of each type of hierarchy based on the average values. The comments can also be seen in figure 5.7.
Figure 5.8: Weight distribution
Top-heavy hierarchies may not take advantage of the reuse potential. Ultimately, the design is discussed here from the viewpoint of maintainability. Top-heavy means that the classes with the main functionality are placed near the root; hence such a hierarchy should be easy to understand, because the classes have a small number of ancestors. However, if classes have a large number of descendents, they are difficult to change. A bottom-heavy hierarchy is easy to change, because many classes have no children. Narrow bottom-heavy designs are difficult to understand because of many unnecessary levels of abstraction. Nevertheless, this consideration has several problems:
Though the metrics NAC and NDC seem to be comprehensive, the mean values of these metrics are interchangeable and yield the same numerical values. Ave-NAC can be calculated as the number of descendent-ancestor relations divided by the number of classes. Ave-NDC can be calculated as the number of ancestor-descendent relations divided by the number of classes. Because each descendent-ancestor relation is the reversed ancestor-descendent relation, Ave-NAC = Ave-NDC. The numbers in figure 5.7 confirm this. Noteworthy, the metrics DIT and NOC have the same property when applied to a simple inheritance hierarchy: Ave-DIT = Ave-NOC. Therefore the aggregated values of these metrics are redundant
In some cases the metrics NAC and NDC cannot distinguish between different types of hierarchies; in the given example a top-heavy narrow hierarchy has approximately equal values as a bottom-heavy wide hierarchy with the same number of classes. To distinguish different designs an additional metric is needed
In general it is not possible to assess the maintainability based on the geometrical properties of the hierarchy, because it is more important to know how the inheritance is used
Some experts suggest that the inheritance hierarchy should be optimized, for example using balanced trees.
However, the theory of balanced trees is intended for search or change operations; such a tree will always be wide and bottom-heavy, because 50% of the nodes are leaves. Thus such an optimization is misleading for the goals of maintainability.
Many experts agree that inheritance is a very important attribute of software which also has an impact on maintainability. However, there are different points of view: some experts recommend using a deep hierarchy, others prefer a wide one. Nevertheless, the author does not see any possibility to assess the entire inheritance hierarchy from the maintainability point of view using these metrics. Consequently, the suggestion is to use the metrics NAC and NDC for finding classes which could be difficult to maintain because of erroneous usage of inheritance. A simple example of such an audit is a report which includes all classes with more than 3 super-classes or more than 10 sub-classes.
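The suggested audit can be sketched as follows (the class names and the traversal are illustrative assumptions; NAC and NDC are computed over the transitive inheritance relation):

```java
import java.util.*;

/** Sketch: metric-based audit flagging classes with NAC > 3 or NDC > 10. */
public class InheritanceAudit {

    // Inheritance relation: class/interface -> its direct super-types.
    final Map<String, List<String>> parents = new HashMap<>();

    void addClass(String name, String... superTypes) {
        parents.put(name, Arrays.asList(superTypes));
    }

    /** NAC: number of direct and indirect ancestor classes. */
    int nac(String name) {
        Set<String> seen = new HashSet<>();
        collect(name, seen);
        return seen.size();
    }

    private void collect(String name, Set<String> seen) {
        for (String p : parents.getOrDefault(name, List.of())) {
            if (seen.add(p)) collect(p, seen);
        }
    }

    /** NDC: number of direct and indirect descendent classes. */
    int ndc(String name) {
        int count = 0;
        for (String c : parents.keySet()) {
            Set<String> ancestors = new HashSet<>();
            collect(c, ancestors);
            if (ancestors.contains(name)) count++;
        }
        return count;
    }

    /** The audit report: classes with more than 3 ancestors or more than 10 descendents. */
    List<String> audit() {
        List<String> flagged = new ArrayList<>();
        for (String c : parents.keySet()) {
            if (nac(c) > 3 || ndc(c) > 10) flagged.add(c);
        }
        return flagged;
    }

    public static void main(String[] args) {
        InheritanceAudit audit = new InheritanceAudit();
        audit.addClass("A");
        audit.addClass("B", "A");
        audit.addClass("C", "B");
        audit.addClass("D", "C");
        audit.addClass("E", "D");           // E has 4 ancestors -> NAC > 3
        System.out.println(audit.audit()); // prints [E]
    }
}
```

In practice the inheritance relation would be extracted from the extended inheritance hierarchy of the analyzed project rather than entered by hand.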
Some change elsewhere requires both A and B to be changed
Obviously, the first case is much easier to find. In general, objects can be coupled in many different ways. The next list presents several important types of coupling, resulting from theoretical considerations of the author and partially taken from [ZUSE98]:
By content coupling one module directly references the code of another module. This type of coupling is very strong, because almost any change in the referred module will affect the referring module. In Java such a type of coupling is impossible; in ABAP it is implemented through the INCLUDE directive
By common coupling two modules share a global data structure. In ABAP it is most commonly used via the DATA DICTIONARY. Such coupling is not very dangerous, because data structures are changed very seldom
By external coupling two modules share a global variable. This coupling deserves attention, because excessive usage of global variables can lead to maintenance problems. To deal with external coupling, a metric GVAR (number of global variables) is suggested. However, this metric is duplicated by the metrics FAN-IN and FAN-OUT and thus rejected from further investigation (see p. 54)
Data coupling is the most commonly used and unavoidable. In his work Yourdon stated that any program can be written using only data coupling [ZUSE98, p. 524]. Two modules are data coupled if one calls the other. In the object-oriented environment there are even more possibilities to use data coupling:
o Class A has a method with a local variable of type B
o Class A has a method with return type B
o Class A has a method with an argument of type B
o Class A has an attribute of type B
There are several metrics for data coupling: FAN-IN and FAN-OUT for the procedural and RFC (Response For a Class) and CBO for the object-oriented environment. For the metrics FAN-IN and FAN-OUT see section Structure Chart (p. 54)
Inheritance coupling appears in an inheritance hierarchy between classes or interfaces.
The metrics for this type of coupling have been discussed in the previous section
Structural coupling appears between all units which are combined together in a container. For example, all methods within a class are structurally coupled into the class; all classes within a package are coupled into the package. In order to quantify such coupling, the term cohesion is introduced in one of the next sections (p. 44)
Logical coupling is an unusual coupling, because the modules are not coupled physically, but changing one will cause a change of the other. Since there is no representation of such coupling in the source code, logical coupling is very difficult to find. Readers interested in this type of coupling can find a reference to research on logical coupling at the end of chapter 10 (p. 91)
Indirect coupling. If class A has direct references to A1, A2, ..., An, then class A has indirect references to those classes directly and indirectly referenced by A1, A2, ..., An. In this thesis (except for inheritance) only direct coupling is considered
Content, common, logical and indirect coupling are not considered in this thesis; structural coupling in the form of cohesion of methods is discussed in one of the next sections. The metrics for inheritance coupling have been discussed in the previous section. Coupling Between Objects (CBO) is the most important metric for data coupling in the object-oriented paradigm. CBO for a class is a count of the number of other classes to which it is coupled [CHID93, p. 20]. However, it would be more precise to call this metric Coupling Between Classes, because at the time of the measurement no objects have been created yet. In order to improve modularity and promote encapsulation, inter-class couples should be kept to a minimum. The larger the number of couples, the higher the sensitivity to changes in other parts of the design, and therefore the more difficult the maintenance [CHID93, p. 20]. Also, a class with many relations to other classes is difficult to test. Hence CBO impacts the Changeability and the Testability. Nevertheless, CBO can indicate the Analyzability as well, but the RFC metric indicates it more precisely.
Such calls hamper the understanding of the program repeatedly and should be counted as separate method calls. Noteworthy, in ABAP this is not possible.
If a large number of methods can be invoked in response to a message, the testing and debugging of the class becomes more complicated, since it requires a greater level of understanding on the part of the tester [CHID93, p. 22]. From the definition of RFC it is clear that it consists of two parts: the number of methods within a class and the number of calls of other methods. Hence RFC correlates with NOM and FAN-OUT, as has been shown in [BRUN04, p. 9]. RFC is an OO metric and corresponds to FAN-OUT in the procedural context.
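The two parts of RFC can be sketched as follows (the method names are hypothetical; the response set is the class's own methods united with the distinct methods they invoke):

```java
import java.util.*;

/** Sketch: Response For a Class from method lists and call sites. */
public class RfcDemo {

    /**
     * RFC = size of the response set: the class's own methods plus all
     * distinct methods directly invoked from their bodies.
     */
    static int rfc(Set<String> ownMethods, Map<String, Set<String>> callsPerMethod) {
        Set<String> responseSet = new HashSet<>(ownMethods);
        callsPerMethod.values().forEach(responseSet::addAll);
        return responseSet.size();
    }

    public static void main(String[] args) {
        Set<String> own = Set.of("open", "close");
        Map<String, Set<String>> calls = Map.of(
                "open", Set.of("File.read", "Log.info"),
                "close", Set.of("Log.info"));   // duplicate call counted once
        // 2 own methods + 2 distinct invoked methods = 4
        System.out.println(rfc(own, calls));    // prints 4
    }
}
```

The use of a set makes explicit that repeated calls of the same method enlarge the response set only once.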
agreement with the assumptions, a system with many short calls and only a few long ones has a good design. Let us find an optimal method for describing the character of the calls. Initially, each dotted edge can be described by a pair of numbers: start leaf and end leaf. Therefore each leaf needs log2(F) bits, where F equals the number of leaves. For the description of each call, 2*log2(F) bits are needed. However, it is possible to reduce the number of bits by indicating a relative path to the end leaf. Hence short calls need a shorter description and long calls a longer one. If one describes all calls of the system in such a way and calculates the average number of bits needed per call, one can draw conclusions about the design of the system. A higher number of bits needed for the description of an average relation indicates a poor design.
Figure 5.9: Example of a structure tree (System I with long calls, System II with better encapsulation)
Nevertheless, the analyst does not need to actually encode all these calls. Information theory says that one can easily estimate the average number of needed bits based on the entropy. As the probability basis for the entropy one can use the frequencies of the call lengths. Moreover, long calls can additionally be penalized by coefficients. For the entropy background see the section Introduction into Information Theory. A more detailed description of this metric can be found in [SNID01, pp. 7-9]. Here just one simple example is given in order to illustrate the ability of this metric. Consider the two small systems depicted in figure 5.9. Both systems have an equal number of classes, methods (F = 5) and calls (E = 4). However, most of the calls in the first system are long; this disadvantage was fixed in the second system by better encapsulation: the method c provides an interface to the attributes d and e in its class. Thus it is supposed that the second system is more maintainable because of its simpler and more structured design. According to the formulas given in [SNID01, pp. 7-9], the structure entropies of the given systems are:
m(I) = -(3/5*log2(3/5) + 1/5*log2(1/5) + 0 + 1/5*log2(1/5)) + 4/5*(1/4*log2(5*8/20) + 3/4*log2(5*12/20)) ≈ 2.52
m(II) = -(2/5*log2(2/5) + 2/5*log2(2/5) + 1/5*log2(1/5)) + 4/5*(3/4*log2(5*8/20) + 1/4*log2(5*12/20)) ≈ 2.44
Hence the second system needs fewer bits for its description and has fewer long calls. Consequently, the metric m (Structural Entropy) can indicate the tendency of a system to have short or long calls.
Figure 5.10: A non-cohesive class can be divided into parts
However, coupling and cohesion are also interesting because they have been applied to procedural programming languages as well as OO languages [DARC05, p. 28]. In the case of the procedural paradigm, the procedures and functions of a module should implement a single logical function. Consider an example with a function pool in ABAP. It is an analogue of a class: it has internal global data (attributes) and functions (methods). Noteworthy, on a call of one function from the pool, the entire function group is loaded into memory. "Consequently, if you create new function modules, you should consider how they will be organized into function groups. In one function group you should combine only function modules which use common components of this function group, so that the loading into memory is not useless" (translation from [KELL01, p. 256]). Hence low cohesion can also indicate potential performance problems. For the maintenance, low cohesion means that the maintainer has to understand additional code not related to the main part, which may be badly structured. This fact has an impact on the analyzability. Additionally, a low-cohesive component which implements several different functionalities will be more affected by the maintenance, because changing one logical part of the component can destroy other parts. Components with low cohesion are modified more often, since they implement multiple functions. Such components are also more difficult to modify, because a modification of one functionality may affect other functionalities. Thus low cohesion implies lower maintainability. In contrast, components with high cohesion are modified less often and are also easier to modify. Thus high cohesion implies higher maintainability [NAND99]. This fact has an impact on the changeability.
High cohesion indicates a good class subdivision. The cohesion degree of a component is high if it implements a single logical function. Objects with high cohesiveness cannot be split apart. Lack of cohesion or low cohesion increases complexity, thereby increasing the effort to comprehend unnecessary parts of a component. Classes with low cohesion could probably be subdivided into two or more subclasses with increased cohesion. It is widely recognized that highly cohesive components tend to have high maintainability and reusability. The cohesion of a component allows the measurement of its structural quality. There are at least two different ways of measuring cohesion:
1. Calculate for each attribute in a class what percentage of the methods use that attribute. Average the percentages, then subtract from 100%. Lower percentages mean greater cohesion of data and methods in the class.
2. Methods are more similar if they operate on the same attributes. Count the number of disjoint sets produced from the intersection of the sets of attributes used by the methods [ROSE, p. 4].
In [BADR03] the most used metrics for cohesion are shortly described; see the brief definitions in table 5.2. Metrics for cohesion are not applicable to classes and interfaces with:
no attributes
one or no methods
only attributes with get and set methods for them (data-container classes)
abstract classes
numerous attributes describing internal states, together with an equally large number of methods for individually manipulating these attributes
multiple methods that share no variables but perform related functionality (such a situation can appear because of the usage of several patterns)
Classes where the calculation of cohesion is not possible are accepted as cohesive.
To overcome these limitations, the following various implementations of the LCOM metric are possible:
Regarding inherited attributes and/or methods in the calculation or not
Regarding the constructor in the calculation or not
Regarding only public methods or all methods in the calculation
Regarding get and set methods or not
These implementations are independent of which definition is used. According to the recommendations from [ETZK97], [LAKS99] and [KABA] and theoretical considerations, the following options were selected:
Inherited attributes and methods are excluded from the calculation
Constructors are excluded from the calculation
Get and set methods are excluded from the calculation
Methods with all types of visibility are included in the calculation
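The first of the two ways of measuring cohesion described above (the attribute-usage percentages) can be sketched directly; the class data in the example is hypothetical.

```java
import java.util.*;

/** Sketch: lack of cohesion as 100% minus the average attribute usage. */
public class CohesionDemo {

    /**
     * For each attribute, the percentage of methods that use it is computed;
     * the percentages are averaged and subtracted from 100%. Lower results
     * mean greater cohesion of data and methods in the class.
     */
    static double lackOfCohesion(int methodCount, Map<String, Set<String>> usersPerAttribute) {
        double sum = 0.0;
        for (Set<String> users : usersPerAttribute.values()) {
            sum += 100.0 * users.size() / methodCount;
        }
        return 100.0 - sum / usersPerAttribute.size();
    }

    public static void main(String[] args) {
        // Cohesive class: both attributes are used by both of the two methods.
        double cohesive = lackOfCohesion(2, Map.of(
                "balance", Set.of("deposit", "withdraw"),
                "owner", Set.of("deposit", "withdraw")));
        // Non-cohesive class: each method touches only "its own" attribute.
        double split = lackOfCohesion(2, Map.of(
                "balance", Set.of("deposit"),
                "log", Set.of("print")));
        System.out.println(cohesive < split); // prints true (0.0 vs. 50.0)
    }
}
```

In accordance with the options selected above, constructors and get/set methods would be excluded from the input data before such a calculation.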
46
It is also possible to find and remove all data-container classes from research of cohesion. It can be easily made by an additional metric NOM (Number Of Methods). In case of the data-container class NOM=WMC. Table 5.2: The major existing cohesion metrics [BADR, p. 2] Metric Description LCOM1 The number of pairs of methods in a class using no attribute in common. LCOM2 Let P be the pairs of methods without shared instance variables, and Q be the pairs of methods with shared instance variables. Then LCOM2 = |P| |Q|, if |P| > |Q|. If this difference is negative, LCOM2 is set to zero. LCOM3 The Li and Henry definition of LCOM. Consider an undirected graph G, where the vertices are the methods of a class, and there is an edge between two vertices if the corresponding methods share at least one instance variable. Then LCOM3 = |connected components of G| LCOM4 Like LCOM3, where graph G additionally has an edge between vertices representing methods Mi and Mj, if Mi invokes Mj or vice versa. Co Connectivity. Let V be the vertices of graph G from LCOM4, and E its
LCOM5
edges. Then Consider a set of methods {Mi} (i = 1, , m) accessing a set of instance variables {Aj} (j = 1, , a). Let (Aj) be the number of methods that
reference Aj. Then Coh Cohesiveness is a variation on LCOM5. Tight Class Cohesion. Consider a class with N public methods. Let NP be the maximum number of public method pairs: NP = [N*(N 1)]/2. Let NDC be the number of direct connections between public methods. Then TCC is defined as the relative number of directly connected public methods. Then, TCC = NDC / NP. Loose Class Cohesion. Let NIC be the number of direct or indirect connections between public methods. Then LCC is defined as the relative number of directly or indirectly connected public methods. LCC=NIC/NP. Degree of Cohesion (direct) is like TCC, but taking into account Methods Invocation Criterion as well. DCD gives the percentage of methods pairs, which are directly related. Degree of Cohesion (indirect) is like LCC, but taking into account Methods Invocation Criterion as well 47
TCC
LCC
DCD
DCI
In [LAKS99] and [ETZK97] various implementations of the cohesion metrics (LCOM2 and LCOM3) are compared on C++ code example classes. Best results show the following metrics: LCOM3, which did not include inherited variables, and that did include the constructor function in the calculations [ETZK97], [LAKS99] LCOM3 with consideration of inheritance and constructor [LAKS99] The metrics LCOM5 and Coh are not robust and are rejected from the further investigation. The next simple example presented in table 5.3 shows this. Table 5.3: Example for LCOM5 and Coh A1 A2 A3 A4 A5 M1 M2 M3 M4 M5 2 2 2 + + + + + + + + 2 + 2 +
Obviously, the class is relative cohesive all pairs of method have one common variable, but the metrics show the opposite. LCOM5 = ((1/a) (Aj) m) / (1 m) = (10 / 5 5) / (1 - 5) = 0,75 Coh = (Aj) / (m*a) = 10 / 5*5 = 0,4 In [BADR03] experts argue that methods can be connected in many ways: Attributes Usage Criterion two methods are connected, if they use at least one attribute in common. Methods Invocation Criterion two methods are connected, if one calls other Only three metrics (LCOM4, DCD, and DCI) consider both types of connections, all other metrics consider only attribute connection. The metrics have different empirical meaning: Number of pairs of methods (LCOM1, LCOM2) Number of connected components (LCOM3, LCOM4, Co) Relative number of connections (TCC, LCC, DCD, DCI) Most logically and interesting for the goals of this thesis is the number of connected components, this could be interpreted as number of parts, in which the class could be split. Noteworthy, that the values of normalized metrics (TCC, LCC, DCD, DCI, Co) are difficult to aggregate for representing of the result for entire system because averaging of the percentages leads to value with bad numerical and empirical properties. For more precise results the weighted mean value should be used. In case of size-dependent metrics (LCOM1, LCOM2, LCOM3, LCOM4) simply average value can be used. 48
Hence LCOM4 is the most appropriate metric. Basically it is the well-handled metric LCOM3 extended by the methods invocation criterion. A non-cohesive class means that its components tend to support different tasks. According to common wisdom, this kind of class has more interactions with the rest of the system than classes encapsulating one single functionality. Thus, the coupling of this class with the rest of the system will be higher than the average coupling of the classes of the system. This relationship between cohesion and coupling means that a non-cohesive class should have a high coupling value [KABA, p.2]. However in [KABA, p.6] by means of an experiment is shown that in general, there is no relationship between these (LCC, LCOM) cohesion metrics and coupling metrics (CBO, RFC). Also one cannot say that less cohesive classes are more coupled to other classes. In [DARC05] Darcy believes that metrics for coupling and cohesion should be used only together and expects, that for more highly coupled programs, higher levels of cohesion increase comprehension performance. He motivated his conception by the following thought experiment (figure 5.11).
Figure 5.11: Interaction of coupling and cohesion (according to [DARC05, p. 17]) If a programmer needs to comprehend program unit 1, then the programmer must also have some understanding of the program units to which program unit 1 is coupled. In the simplest case, program unit 1 would not be coupled to any of the other program units. In that case, the programmer need only comprehend a single chunk (given that program unit 1 is highly cohesive). In the second case, if program unit 1 is coupled to program unit 2, then just 1 more chunk needs to be comprehended (given that program unit 2 also shows high cohesion). If program unit 1 is also coupled to program unit 3, then it can be expected that Short-Term Memory (STM) may fill up much more quickly because program unit 3 shows low cohesion and thus represents several chunks. But, the primary driver of what needs to be comprehended is the extent to which program unit 1 is coupled to other units. If coupling is evident, it is only then that the extent of cohesion becomes a comprehension issue. Next, Darcy confirmed his hypotheses with an experiment with the maintenance of a test application. However, the very artificial sort of the experiment prevents reader from untried implementation of this hypothesis without more experiments.
49
LCOM Essential:
It is the degree of relatedness of methods within a class. Cohesion can be used in the procedural and in object-oriented model as well. Has impact on the analyzability and the changeability Cohesion may be concerned together with coupling LOCM4 seems to be most appropriate metric from the theoretical point of view, additional experiments are needed.
50
with database or offering tools usually have high afferent coupling and low efferent coupling, therefore are highly stable and difficult to change. Thus it is useful to have more abstract classes here in order to be able to extend these and in such way maintain the packages. Packages for user interface depend from many other packages thus they have low afferent coupling and high efferent coupling and are mostly instable. Hence designers dont need to have many abstract classes here, because these packages could be easily changed. This statement should be empirically proved. In figure 5.12 the analysis of project Mobile Client 7.1 (detailed described in chapter 8) is presented. As it can be seen, the packages are evenly distributed on the whole square, and it is impossible to conclude whether the entire system has good or bad design. The same situation can be seen in all other analyzed projects. Hence D-metric is bad indicator for the maintainability of the entire project. But one can notice that single packages from areas A and C possible may by difficult to maintain. Thus D-metric is supposed to be used for the metric-based audits. However experiments and discussion with the designers show, that audits based on D-metric can find only evident errors of design (for example not used abstract classes). Consequently, D-metric is rejected from the quality model.
Figure 5.12: Demonstration of the analysis of Martin on project Mobile Client 7.1
in turn depend on that package. Cyclic dependencies are difficult to maintain and indicate potential code to apply refactoring changes, since cyclically dependent packages are not only harder to comprehend/compile individually, but they cannot be packaged, versioned, and distributed independently. Thus, they violate the idea that a package is the unit of release. Unfortunately, this metric is project size dependent and it is impossible to compare two projects based on this metric. Consequently, the audits based on this metric can be useful to catch cyclic package dependencies before they make it into a software baseline.
Metric: NOM - Number of Methods and WMC - Weighted Methods per Class
Consider a class with n methods. Let c1...cn be the complexity of the methods. Then:
If all method complexities are considered to be unity (equal to 1), then WMC = NOM = n, the number of methods. However in most cases complexity of the methods is estimated by MCC. The metric WMC was introduced by Chidamber and Kemerer's [CHID93, p. 12] and criticized by Churcher, Shepperd and Etzkorn. In particular, Etzkorn has suggested new metric for complexity of the methods [ETZK99] Average Method Complexity.
He argued that WMC has overstated values for classes with many simple methods. For example, a class with 10 attributes has 10 get-methods, 10 set-methods and the constructor, thus WMC = 21, what is very high value for such a primitive class. AMC, on the opposite, will have understated values for classes with a few really complex methods (MCC > 100) and many simple methods. Thus, AMC is not intended primarily as a replacement for the WMC metric, but rather as an additional way to examine particular classes for complexity [ETZK99, p. 12]. In this thesis it is more preferable to use WMC instead of AMC, because WMC is class size dependent, but independent from the project size. Additionally, it has very clear meaning: number of all decision statements in the class plus number of methods. Consequently, WMC is a good metric for estimating of overall algorithm complexity of the class. For data-container classes NOM = WMC because such classes have only get- and setmethods, which have MCC = 1. Thus, NOM can be used as additional metric for finding data-container classes. It is important for rejecting the data-container classes from the cohesion research.
Structure Chart
This model describes the communication between modules in the procedural environment and suits for illustration of processes in non-OO ABAP programs. Example of structure chart is depicted in figure 5.13. Boxes present modules (function 53
modules, programs, includes, etc.), circles present global variables and arcs present calls, whereas parameters of the call can be also depicted. Direction of the arrows distinguishes between importing and exporting parameters.
Other Models
Here some simple metrics, which dont suit to any previous introduced models, are discussed
55
56
WMC
200
150
100
50
2
0 0
400
800
1200
1600
LOC
Figure 5.14: Correlation between LOC and WMC The next possible relation between metrics is not so obvious. Each product has the minimal inherent complexity, which depends only on the problem statement. If complexity of one perspective is reduced, complexity of other perspective will increase. For example reducing the high intra-modular complexity by increasing the total number of classes will lead to increasing the inter-modular complexity. An example of such relation is depicted in figure 5.15. With the numbers 1 and 2, two releases with the same functionality are marked off. Here can be seen that decreasing the average MCC leads to increasing the total number of classes in the second release.
the quality model, because they have no great impact on the maintainability and are difficult to calculate. For the question Are the naming conventions followed? has not been found any appropriate simple metric and this question was removed from the quality model. All in all the proving of the naming conventions is not a trivial task. The question How complex is the DB-schema? is not urgent for SAP architecture and was removed. The question about the complexity of the data types was removed because of lack of metrics. However the quality model is still redundant. Different metrics get the same qualitative statement. Therefore, the metrics, which cannot get new qualitative information or are difficult to calculate, should be discarded. These metrics may be put on a waiting list for the implementing in the future. For the selection of the most important metrics, three additional criteria for each metric were provided: the importance, estimation or judgment in the literature and ease of implementation. 
In the following the list of all rejected metrics are enumerated with an indication of the reason for rejection: A (Abstractness) is used also in D, the aggregated value doesn't provide appreciable qualitative meaning CN (Control Nesting) is included in and correlates with MCC CR (Comments Rate) is replaced by LC (Lack of Comments) CYC (Cyclic Dependencies) is size-dependent; optional can be used for additional audits D (Distance from Main Sequence) its aggregated value doesn't provide appreciable qualitative meaning DIT (Depth of Inheritance Tree) is extended and replaced by NDC DOCU (Documentation Rate) is difficult to analyze, is a part of the Maintainability Assessment D-INFO is size-dependent and included in FAN-IN and FAN-OUT GVAR (Number of Global Variables) is included in FAN-IN and FAN-OUT NOC (Number of Children) is extended by NAC NOF (Number of Fields) - correlates with LOC NOM (Number of Methods) - correlates with LOC NOS (Number of Statements) - correlates with LOC and MCC U (Reuse Factor) is not very important for the maintenance After the truncation of the quality model the following metrics were supposed as maintainability indicators and thus selected for further research: CBO - Coupling between objects CDEm - Class Definition Entropy (Modified) CLON - Clonicity LC Lack of Comments FAN-IN (substitutes CBO in non OO ABAP environment) 58
FAN-OUT (substitutes RFC in non OO ABAP environment) LCOM - Lack of Cohesion of Methods LOC - Lines Of Code LOCm Average LOC in methods m Structure Entropy MCC - McCabe Cyclomatic Complexity (substitutes WMC in non OO ABAP environment) NOD - Number of Developers RFC - Response For a Class SMI - Software Maturity Index WMC - Weighted Methods per Class The metrics NAC (Number of Ancestor Classes) and NDC (Number of Descendent Classes) are suggested to support metric-based audits. The selected metrics are expected to describe the maintainability of the software and should cover the following aspects: Incoming and outgoing connections between programming objects Quantity of the internal code documentation Cohesion of programming objects Degree of conformance to the principle high cohesion - low coupling Modularity Algorithmic complexity Number of developers Usage of the inheritance Maturity Clonicity
59
M5 M2 M4 M3 M3
Figure 6.1: The maintenance as the process of exchanging components. Obviously the new component M3 is somehow better than the substituted component M3 and one of the metrics should show it. In this example it would be the Defects Density (DD). However an improvement of the single part (or even each part) of the system not always leads to the improvement of the entire system. Often this is not a problem of the improvement, but of the description of it. Thus the right metrics for estimation and right operation at metrics should be used. The next table (6.1) is taken from [ZUSE98, p. 47] and shows two versions of one system with five modules. Each module in newer version is better than the correspondent module in old version it has smaller DD. However, overall DD for the system becomes worse. This can happen because DD is a percentage measure and thus depends on the size of module. The overall DD depends also on distribution of size between the modules and the analyst must interpret such metrics very carefully. It is helpful to follow the number of steps to ensure the reliability of the proposed metrics. Some approaches were found to check the numerical properties of metrics in 60
M1
order to find admissible transformations and prepare hints for analyst how to handle metrics. One of them is axiomatic approach, proposed by Weyuker [WEYU88], which provides a framework based on a set of nine axioms. Table 6.1: Trend of DD for two versions of the system [ZUSE98, p. 47] Version 1 Version 2 # of module # of errors LOC DD # of errors LOC DD 1 3 55 0.0545 12 777 0.0154 2 6 110 0.0545 5 110 0.0454 3 3 110 0.0272 2 110 0.0181 4 70 10000 0.0070 6 1000 0.0060 5 4 110 0.0363 3 110 0.0272 SUM 86 10385 0.0082 28 2107 0.0132
Zuses framework for the software measurement provides also a set of axioms, the so called extensive structure. Depending on the fulfillment of these, one can conclude about the type of scale of the metric and hence admissible transformation, which can be applied to the metric. This framework was used in this thesis for the examination of the selected metrics as it is more competent, common accepted and simple.
Types of Scale
Admissible transformations and hence types of scale are probably the most important properties of metrics, because other properties follow from types of scale. In the early 1940s Stevens introduced a hierarchy of measurement scales and classified statistical procedures according to the scales, for which they were permissible. A brief description and criticism can be found in [VELL]. Here some basics will be introduced. All types of scale are summarized in table 6.2. The first very primitive scale is nominal. The values of metrics on this scale have no qualitative meaning and characterize just belonging to one or another class. The only possible operation is equality one can define whether two values belong to the same class. Examples of nominal scale would be labels or classifications such as: f(P) = 1, if Program is written in ABAP f(P) = 2, if Program is written in Java The ordinal scale introduces the qualitative relation between values, thus they are able to be compared. The experts notes are on ordinal scale and one can use the empirical operation more maintainable. Values on the interval scale have equal distance between values. The Ratio scale allows comparing ratios between the values. The Absolute scale is a special case of a ratio scale and presents the actual count. The used in this thesis metrics are placed on the ordinal and ratio scale. One can see that higher scales (ratio) provide more possibilities for interpretation (wide range of empirical and statistical operation), but are more sensible to admissible transformation. Using a not appropriate transformation will lead to decreasing of type of scale for the result value or even to wrong conclusion. 61
The main idea of analyzing the types of scale is to help choosing the appropriate model and correctly analyzing the results by using of the appropriate operations. Table 6.2: Types of Scales (partially taken from [PARK96, p.9])
Scale Type Admissible Transform. Any oneto-one transformat ion Nominal Basic Empirical Operations determination of equality Statistical Operations Examples
y2 > y1 iff x2 > x1 (strictly monotone increasing transformat ion) y = ax + b, a>0 (positive linear transformat ion) y = ax, a > 0 (similarity transformat ion) y = x (identity)
Ordinal
The above plus Rank order statistics (Spearman and Kendall Tau correlation coefficient, Median), Maximum, Minimum The above plus Comparisons of arithmetic means, the Pearson correlation coefficient The above plus Comparison of percentage calculations, Variance the above
Interval
the above, plus determination of the equality of intervals or differences the above, plus determination of the equality of ratios the above, plus determination of equality with values obtained from other scales of the same type
labels or classifications such as: f(P) = 1, if Program is written in ABAP f(P) = 2, if Program is written in Java; activities (analyzing, designing, coding, testing); problem types; numbering of football players rankings or orderings such as severity and priority assignments f(P) = 1, if Program is easy to read f(P) = 2, if Program is not hard to read; NAC; NDC; CR; LC; OO-D; SMI; D; NOD; DOCU; IF; m; CDEm; GVAR; CLON; DD The absolute time when an event occurred; calendar date; temperature in degrees Fahrenheit or Celsius; intelligence scores (standard scores) time intervals; cost, effort (staffhours), length, weight, & height; temperature in degrees Kelvin; LOC; LCOM; CBO; RFC; WMC; FAN-IN; FAN-OUT; NOM Counting; probability
Types of Metrics
Measures can be divided into different groups regarding the kind of receiving information for the metric. See [ZUSE98, p.p. 242 246] for the full list. Below only used in this thesis types of measure are presented: Counting simple calculating of objects or their artifacts. The following operations can be applied: Range, Sum (for additive metrics), Average (for additive metrics, carefully), Weighted Mean, Median, Standard Deviation, Graphic, Aggregation (very careful). 62
Absolute
Ratio
Examples are: LOC, WMC, RFC, CBO, LCOM4, FAN-IN, FAN-OUT, NOM, MCC, NOD. Density one metric value is divided by another independent metric value. Examples are: GVAR, DD. The following operations can be applied: Range (very careful), Weighted Mean, Median, Standard Deviation, Graphic, Aggregation (very careful) Percentage a metric expressed as ratio of one part of empirical objects or their artifacts with respect to their total number. Examples are: CR, OO-D, SMI, DOCU. The following operations can be applied: Range (very careful), Weighted Mean, Median, Standard Deviation, Graphic, Aggregation (very careful). In particular this means that the percentage metrics must not be used for arithmetic mean. For example if one module has CR(P1) = 50% and other module CR(P2) = 10%, one must not average these to 30%. Dependent on size of modules the real CR(P1 + P2) could be between 10 and 50%. Especially for the CR the weighted mean will be smaller than the arithmetic mean, because smaller classes usually have larger CR and weighted mean weights small classes with smaller coefficients. Distribution for LOC and CR by the example of date from the project ObjMgr (new) is shown in figure 6.2.
Distribution of LOC and CR
LOC
2000
1500
1000
500
CR 300
Figure 6.2: Distribution between LOC and CR, smaller classes usually have larger CR. Minimum, Maximum minimal or maximal value of a population (metric set of each empirical object). Only Range operation can be used. Hybrid metric is a metric, which consists of the union of other metrics using the addition or multiplication. Examples are: LC, m, CDEm, D, MI. Hybrid metric inherits lowest numerical properties of its components. Hence, usually such metrics have relatively poor numerical properties and only few operations can be applied. Concatenation operation for inheritance hierarchies is indefinably, because new nodes can be added at any place in hierarchy. Thus all metrics, based on this model have only ordinal scale. This is detailed discussed in [ZUSE98 p.p. 273 - 335]. 63
Conversion of Metrics
The conversion of metrics is a numerical operation with one or more metrics in order to get metrics with new numerical or qualitative properties. The first type of conversion is the aggregation. Some metrics like SMI, CLON or OO-D are calculated for the project as whole and dont need to be aggregated, but many others metrics describe one single class or even method and need to be aggregated in some way to one single value indicates the entire group of empirical objects (package, inheritance hierarchy or whole system). Depends on the type of scale, different methods are possible. The first method is the range, in this method only maximal (minimal) value is taken. However one extreme value is a bad indicator for the entire system and only can be used together with other methods. Range can be used for metrics on ordinal or higher scale. The second method is the averaging using arithmetic mean: values of all modules are summed and divided by number of modules. Keep on mind that this method can be applied only to metrics on interval and higher scales. This is probably the most simple and popular way of the averaging. Nevertheless it will change empirical statement of the resulting metric and its properties. Consider some features of the arithmetic mean applied to the inter-modular metrics. As example the metrics FAN-IN and FAN-OUT are taken. The metrics FAN-IN and FAN-OUT are good indicators of the analyzability and changeability of a single module. But how the values of single modules can be combined to one common indicator for the entire system? Consider an example system on figure 6.3 to illustrate this.
A LOC:200 FAN-IN:0 FAN-OUT:2 C LOC:100 FAN-IN:2 FAN-OUT:2 E LOC:50 FAN-IN:2 FAN-OUT:2 G LOC:200 FAN-IN:1 FAN-OUT:3 F LOC:50 FAN-IN:3 FAN-OUT:3 B LOC: 100 FAN-IN:0 FAN-OUT:2 D LOC:150 FAN-IN:1 FAN-OUT:2
Figure 6.3: Example system for weighted mean Average values of FAN-IN and FAN-OUT are: Ave-FAN-IN = (0+0+2+1+2+3+1+2+2+4)/10 = 1,7 Ave-FAN-OUT = (2+2+2+2+2+3+3+0+1+0)/10 = 1,7 64
It is not a singular coincidence. For any closed system average values are equal because each relation is directional and is calculated twice: once in FAN-IN and once in FANOUT by the constant number of modules. Thus for the arithmetic mean in closed systems simple formula Ave-FAN-IN = AveFAN-OUT = Number of Relations / Number of Objects can be used. The problem is that all modules have equal weight, thus all empirical objects have equal impact on the average value. In real world large and complex objects have more impact on attribute of the whole system, but averaging will equalize in rights complex and simple objects. It is quite reasonable to calculate weighted mean value, which characterizes quality attribute of the entire system more precisely. Different systems for the weighting can be used. Suppose that possibility of changing the larger module is more than the smaller one, because it has more LOC, which could be changed. Consider again the example on figure 6.3, notice that the darkness of rectangles represents the size of the module. Lets calculate weighted by size (in LOC) mean values of FAN-IN and FAN-OUT. Mean-FAN-IN = 0*0,2 + 0*0,1 + 2*0,1 + 1*0,15 + 2*0,05 + 3*0,05 + 1*0,2 + 2*0,05 + 2*0,05 + 4*0,05 = 1,2 Mean-FAN-OUT = 2*0,2 + 2*0,1 + 2*0,1 + 2*0,15 + 2*0,05 + 3*0,05 + 3*0,2 + 0*0,05 + 1*0,05 + 0*0,05 = 2 The results show, that on average by the analysis of the system the developer should analyze 2 modules and by changing should keep stable 1,2 modules. That means the system is relative stable, but difficult to analyze. Hence the weighted mean allows not only more precise calculating of aggregated values, but also distinguishing the systems with tendency to predominance of one or other direction of relations. Noteworthy predominance in this case means the probability of having to analyze relations in this direction. Nevertheless, in generally a large module uses more other modules than smaller one. 
That means weighted mean for FAN-OUT tends to be more than weighted mean for FAN-IN. Figure 6.4 shows that usually large classes (larger LOC) have more connections with other classes (larger RFC). This observation is based on 8 SAP Java projects. Hence weighted FAN-OUT supposed to be larger than weighted FAN-IN. Since WMC is good indicator for class complexity and should be correlated with the fault probability, this metric also can be used for weighting. In this case the weighted mean for the FAN-OUT can be interpreted as the average number of modules the developer has to analyze by localizing of a fault. The second type of conversion is the normalizing. Metrics have different ranges of values and for comparison or presentation it is useful to have all metrics having the same range (usually in interval [0; 1]). This could be achieved by the normalizing. Example for this conversion is the normalizing of entropy: because maximal entropy is known, the normalizing is easy. Anyway since the entropy metric is on ordinal scale, the normalizing will not worsen its numerical properties.
65
400
RFC
350
300
250
200
150
100
50
LOC
1400
Figure 6.4: Correlation between LOC and RFC The third type of conversion is the composition of set of metrics to one hybrid metric. Very popular is polynomial composition, however other types are also possible. Example is the Maintainability Index. The fourth type of conversion is the percentage grouping. All modules are grouped into three groups based on the metrics values (normal values, high but still acceptable, inadmissible) after that the percentage of modules in each group are presented using pie-diagram as it is shown in figure 6.5. Such diagrams also can give an idea about distribution of values within the system and good complements the aggregated value.
Percentage grouping for LOC Mobile Client - 7.0
red: 62 36%
yellow : 32 8%
green:240 62%
Figure 6.5: Example of percentage grouping. Comparison of two versions of Mobile Client
66
The percentage grouping can also be done at more detailed level of granularity, namely in LOC. For that it is needed to recalculate all values in LOC and present the distribution of LOC, like it is shown in figure 8.6 (see p. 82). Such detailed description allows not only the analyzing the module distribution into normal, high and inadmissible areas, but also analyzing how many LOC do the inadmissible modules have. The aggregation and normalizing are also used to convert size-dependent metrics to quality-dependent metrics. Usually, average values of size-dependent metric are no more size-dependent. Nevertheless any conversion should be used very carefully, because such transformation can change the numerical properties and qualitative meaning of the metric.
Visualisation
The very popular method for the presentation of metric results is the Kiviat-Diagram. For the maintainabilitys dimensions such diagram is presented in figure 6.6. These diagrams are very pictorial and present information simple and intuitive. However this intuition can be misunderstood. Usually, all dimensions are on ordinal scale and any ratios between numbers by comparison of two systems are meaningless. But the numbers are graphically presented on each dimension using the rational intervals, what can lead to the situation, when the analyst will try to interpret the ratio between intervals on Kiviat-Diagram as the ratio between metrics on ordinal scale. This is the misunderstanding. The same holds for other column-diagrams as well.
67
The next drawback is hiding of the information: behind each dimension several metrics are hidden and after aggregation it is not clear, which metric caused the deviation. It is also not clear, which weights for each metric should be taken.
Figure 6.6: The Kiviat-diagram for the maintainability dimensions Consequently, the best possibility for presenting of the multidimensional information remains a table, where all used metrics with aggregated values are listed. Additionally some color marker accent indicators with high or not permissible values. By comparing multiple releases of one system it is possible to indicate trends of values with arrows. The example of such presentation is given in table 6.3. Table 6.3: Example of output table (Mobile Client 7.0 vs. 7.1)
The next possibility is the usage of a Business Warehousing system. Here different reports can be prepared and the history can be saved. However at this point of time it is not possible and can be planned for remote future. In this work the simple tables will be used. 68
It is also interesting to inspect how the values of a metric are distributed among modules. A way to take a deeper look into the essence of a metric is a distribution graphic; one is depicted in figure 6.7. It can be seen that most methods have few LOC and only a few methods are very large. Thus, averaging LOC can lead to underestimation.
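The underestimation caused by averaging a skewed LOC distribution can be demonstrated with a small sketch; the LOC values below are invented for illustration.

```java
import java.util.Arrays;

// Sketch illustrating why averaging LOC over a skewed distribution can
// be misleading: a few very large methods pull the mean far above the
// median. The values are invented.
public class LocSkew {

    public static double mean(int[] v) {
        long sum = 0;
        for (int x : v) sum += x;
        return (double) sum / v.length;
    }

    public static double median(int[] v) {
        int[] s = v.clone();
        Arrays.sort(s);
        int n = s.length;
        return n % 2 == 1 ? s[n / 2] : (s[n / 2 - 1] + s[n / 2]) / 2.0;
    }

    public static void main(String[] args) {
        // Most methods are small, one is huge.
        int[] loc = {5, 6, 7, 8, 9, 10, 12, 15, 20, 400};
        System.out.println("mean   = " + mean(loc));   // dominated by the 400-LOC method
        System.out.println("median = " + median(loc)); // reflects the typical method
    }
}
```

Here the mean (49,2) suggests methods an order of magnitude larger than the typical method (median 9,5), which is why the thesis prefers distributions and medians over plain averages.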
Figure 6.7: Distribution of LOC in methods (small part of the project Mobile Client 7.1)

The class blueprint graphically presents information in form of a chart in which simple metrics and trivial class diagrams are combined. Elements of a class are represented as boxes; their shape, size and color reflect semantic information. The two dimensions of a box are given by two selected metrics. An example is given in figure 6.8. Three-dimensional diagrams are also possible.
Figure 6.8: Example of visualization of a class diagram

Other class graphs for software metrics visualization are discussed in detail in [LANZ99]. For his work Lanza used a tool called CodeCrawler, a language-independent reverse engineering tool which combines metrics and software visualization. See also http://www.iam.unibe.ch/~scg/Research/CodeCrawler/ .
7. Tools
In this chapter several tools for software measurement and analysis are discussed. The first section presents tools for ABAP, the second section introduces Java tools, and the third section discusses tools for automating the GQM approach and for integrating several tools in order to automate experiments.
ABAP-tools
Transaction SE28
The transaction SE28, which uses the package SCME, can be used to calculate metrics and visualize them in form of a hierarchy. The program SAPRCODD can be used to calculate a set of metrics for a single program or for a set of programs; in the latter case the mask * should be used for the parameter object name. SE28 is not a standard tool that can be found in every standard installation of the basis; it is available, for example, in the BC0 and B20 systems. For parsing the source code the ABAP command SCAN ABAP-SOURCE INTO TOKENS is used. However, only a few metrics are implemented: LOC, MCC, comparative complexity (the number of ORs and ANDs within IF statements), DIF (Halstead difficulty), number of comments, etc. Unfortunately, it is impossible to show more than 4 metrics at the same time. Noteworthy, this transaction only presents the results; the actual calculation is done by the job EU_PUT. Additionally, some standard ABAP tools can be used for the measurement. The transaction DECO and its components can be used for calculating the FAN-IN and FAN-OUT metrics. See the tables BUMF_DECO and TADIR, which contain the entire structure of the system, the package SEWA, and the functions MRMY_ENVIR_CHECK, RS_EU_CROSSREF and REPOSITORY_ENVINRONMENT_CHECK. These functions return a list of where-used and used objects. Before they can be used, a special job should be started in order to fill the tables. Questions can be referred to Martin Runte or Andreas Borchardt.
Z_ASSESSMENT
This small report calculates the metrics needed for the Maintainability Assessment project: total number of forms, forms with > 150 lines, percentage with > 150 lines, and the ratio of comments to code. It is a very small but useful report. Please contact Alpesh Patel for further questions.
CheckMan, CodeInspector
These tools increase quality awareness in development by checking code against a great many rules (audits). In contrast to CheckMan, CodeInspector also provides the possibility of counting some software attributes and hence allows the implementation of metrics. However, until now no metric has been implemented in this way. ABAP Test Cockpit is the successor of CodeInspector and CheckMan. It is also suited to counting metrics, but its productive start is only planned. For more details about CheckMan see [SAP03], for more details about CodeInspector see [SAP03b].
AUDITOR
This third-party tool (www.caseconsult.de) is intended for audits and does not suit the calculation of sums. Each ABAP program is processed separately and the result is stored in an HTML file. Thus, if one wants information about the whole system, or about some of the inter-modular metrics, these HTML files have to be processed additionally in order to collect this information. The second problem is that AUDITOR can only read flat files, so the entire system has to be exported before the measurement. Additional audits can be implemented in form of C++ libraries. All of the listed problems make this tool awkward for the current task. At this point in time the transaction SE28, the report Z_ASSESSMENT and some standard tools can be used. Nevertheless, the ABAP environment lacks tools for the following metrics: CDEm, m, SMI, LCOM4, RFC, CBO, NAC and NDC. Other metrics can be measured directly or with a small work-around. For the metric CLON the tool CloneAnalyzer can be used.
Java-tools
Some tools for Java need the binary code or compilable source code, because they have to parse the classes first before the metrics can be calculated. Some other tools do not need compilable code; this allows calculating metrics for code that contains syntax errors or references to missing classes, and makes it possible to use such a tool for analyzing only a part of an application. Nevertheless, in this case the results can be slightly garbled and the analyst should draw conclusions very cautiously.
Empty lines and full-line comments are rejected. Noteworthy, if several statements are written in one line, they are counted separately; this option was chosen for the experiments. The metric for CR is called TCR and counts the ratio of documentation and implementation comments to the Total-LOC. Comments inside a file that contains the class, but outside the class (before the class body), are not counted. The metric for WMC is called WMPC1, the metric for DIT is called DOIH, and for NOM (number of methods) the metric NOO (number of operations, except constructors) is used. The tool provides several metrics for cohesion. LCOM is calculated using attribute-method coupling only; the coupling between methods and inherited attributes is not considered. Thus no proper metric for cohesion was found. Borland Together can process classes with missing imports, however this can lead to erroneous results: for example, DIT will be equal to zero if the parent class is missing. The same can be said about many other inter-modular metrics. Thus, after the calculation, selected results should be verified manually.
CloneAnalyzer
CloneAnalyzer is a free Eclipse plug-in for software quality analysis. It allows finding, displaying and inspecting clones, i.e. fragments of duplicated source code resulting from a lack of proper reuse. Noteworthy, this tool finds only exact clones, which means it is language-independent and can be used both for the Java and for the ABAP environment. Nevertheless, CloneAnalyzer works with flat files only, so the corresponding part of the ABAP system has to be exported and the file filter has to be set in the options. In the options it is also possible to set the minimal size of the clones to be found; in the context of this thesis, clones with a length of 15 LOC and larger were searched for. The found clones can be saved in CSV format, specifying the number of clones in each clone set and the length of a clone. This data allows calculating the total LOC in the clones and thus the Clonicity. For more details see http://cloneanalyzer.sourceforge.net/ .
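Given clone-set data of the kind CloneAnalyzer exports (number of clones per set and clone length in LOC), the Clonicity could be computed roughly as follows. The clone-set figures and total LOC are invented, and the assumption that cloned LOC equals clones times length is a simplification of the actual export.

```java
// Sketch of computing the CLON metric (Clonicity) from clone-set data.
// Assumption: each clone set contributes (number of clones * clone
// length) LOC; real export formats may need different parsing.
public class Clonicity {

    // cloneSets[i][0] = number of clones in set i,
    // cloneSets[i][1] = length of the clone in LOC.
    public static double clon(int[][] cloneSets, long totalLoc) {
        long clonedLoc = 0;
        for (int[] set : cloneSets) {
            clonedLoc += (long) set[0] * set[1];
        }
        return 100.0 * clonedLoc / totalLoc;
    }

    public static void main(String[] args) {
        // Example: two clone sets in a system of 10,000 LOC.
        int[][] sets = { {3, 50}, {2, 100} }; // 3*50 + 2*100 = 350 cloned LOC
        System.out.printf("Clonicity = %.1f%%%n", clon(sets, 10_000));
    }
}
```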
See http://www.clarkware.com/software/JDepend.html and http://andrei.gmxhome.de/jdepend4eclipse/ . OptimalAdvisor 4.0 is a commercial tool which graphically presents results for each package; if a package has sub-packages, their classes are taken into account as well. Despite the good graphical representation, no possibility of saving the information is provided. For more details see http://javacentral.compuware.com/products/optimaladvisor/ . Code Analysis Plug-In (CAP) is a free plug-in for Eclipse. It provides a handy interface for browsing classes and showing dependencies. Unfortunately, it is not possible to save the results. This plug-in does not take the standard Java libraries into account. CodePro is a commercial plug-in for Eclipse which, in parallel with dependency analysis, also provides a few other metrics and further functionality. CodePro metrics take dependencies to the standard Java libraries into account. For more information see http://www.instantiations.com/codepro/ . The problem with many tools is the aggregation: all tools do it in different ways, and the methods used for the aggregation are not even documented. For example, CodePro just takes the value of the top package (for example com or org). Since this package usually has no responsibility, the D metric will be equal to the abstractness, which is quite confusing. The best way to get the average value for D (or any other metric) is to calculate it ourselves using XSLT.
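A minimal sketch of such a self-computed aggregation for the D metric (distance from the main sequence, D = |A + I - 1|), done here in plain Java rather than XSLT for brevity; the per-package abstractness and instability values are invented.

```java
// Sketch: averaging the D metric (distance from the main sequence)
// over packages ourselves instead of trusting a tool's undocumented
// aggregation. Package data is invented for illustration.
public class MainSequence {

    // D = |A + I - 1|, where A is abstractness and I is instability.
    public static double distance(double abstractness, double instability) {
        return Math.abs(abstractness + instability - 1.0);
    }

    // Unweighted average of D over all packages.
    public static double averageD(double[][] packages) {
        double sum = 0;
        for (double[] p : packages) {
            sum += distance(p[0], p[1]);
        }
        return sum / packages.length;
    }

    public static void main(String[] args) {
        // {abstractness A, instability I} per package
        double[][] pkgs = {
            {0.0, 1.0},  // concrete and unstable: on the main sequence, D = 0
            {1.0, 0.0},  // abstract and stable:   on the main sequence, D = 0
            {0.0, 0.0},  // concrete and stable:   "zone of pain",       D = 1
        };
        System.out.println("Ave-D = " + averageD(pkgs));
    }
}
```

Whether the average should additionally be weighted by package size is a design decision the tools leave undocumented; an unweighted mean is used here only for simplicity.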
JLin
The SAP-internal tool JLin performs static tests. Possible applications include:
- identification of potential error sources
- enforcement of code conventions
- evaluation of metrics and statistics
- enforcement of architectural patterns
- monitoring
In order to fulfill these requirements, JLin can be used in the following environments:
- as an Eclipse plug-in
- within the SAP make process (which currently uses ant, in the near future the Component Build Server) via an API
Nevertheless, only a few metrics are implemented. For more details see [JLIN05].
DIT, NOM. JMetric also provides a few analysis methods in the form of drill-down and cross-section tables, charts, and raw metrics text. For more details see http://www.it.swin.edu.au/projects/jmetric/ . Despite the wide range of tools, several metrics for Java still have to be implemented: CDEm, m, SMI, LCOM4, NAC, NDC and NOD.
Figure 7.1: Architecture of the metric report generator

In the experiments, the trial version of Borland Together Developer 2006 for Eclipse, the trial version of CodePro 4.2 and CloneAnalyzer 0.0.2 were used. This selection is explained by the wide number of provided metrics and the proper output format. However, on condition that the analyzed project contains no compiler errors, free tools like Metrics 1.3.6 or JMetric can be used in order to save costs.
8. Results
For the empirical validation of the selected metrics, several experiments were made. In this chapter the description of the experiments, their results and the drawn conclusions are discussed.
Thorsten Himmel
ABAP
Package CRM_DNO
Package ME
For assessing the actual maintainability of a selected project, either process metrics or an expert's opinion is needed. The process metrics could easily be extracted from the system for customer message management. Examples are MTTM or the Backlog Management Index (number of problems closed during the month / number of problem arrivals during the month). Further examples of available process metrics can be found in [SAP05c]. However, for new projects no information about the maintenance is available yet, and for other projects it was a bureaucratic problem to get the data. Hence the alternative option, using an expert's opinion, had to be used. A very popular method for the evaluation of software metrics is finding the correlation between the expert's estimation and the automatically calculated metrics; as the statistical method, a correlation analysis or similar methods can be used. However, this only works if enough estimated sources are available. Furthermore, it is desirable to have all
estimations made by one expert in order to ensure the uniformity of the evaluation. In the context of the current experiment this is impossible. Hence, for the initial evaluation a simplified procedure is suggested. Several experiments will be made; in each experiment only two releases of one software component will be compared. The older release is supposed to have an improper design, which is improved in the later release. It is assumed that the releases have nearly the same functionality and thus the same minimal complexity to be achieved. Comparing the metric values for the two releases should give an idea of whether the selected metrics are robust enough to indicate the maintainability improvement. This methodology is simple but powerful. One of the experts remarked that he could not characterize any of the provided examples as well or badly maintainable on its own, but the ranking among the examples is obvious. Because of the lack of tools for ABAP, only Java projects will participate in the experiments. However, a short description of the selected ABAP projects is presented here, which can be used for further research. The first example comes from Richard Himmelsbach. The package CRM_DNO presents a classical usage scenario of ABAP and OO-ABAP. The example can be found in the system TSL 001. This package contains several reports for the DNO monitor and also includes the used classes and functions. The advantages of the design are: the clear structure and readability, the function encapsulation, modularity, exception handling, comments and naming conventions, and customizability through the used parameters. The second example was suggested by Joachim Seidel and includes several objects from the package ME.
The following objects are worthy of notice because they have been continuously changed by many notes in different releases; the history of changes in release 4.6C is the longest one:
- the function module ME_READ_LAST_GR in the function group EINR
- the function module ME_CONFIRMATION_MAINTAIN_AVIS and the include LEINBF0Q in the function group EINB
- the includes LMEDRUCKF17, LMEDRUCKF06 and the function module ME_READ_PO_FOR_PRINTING in the function group MEDRUCK
- the program SAPMM06E and the include MM06EF0B_BUCHEN
- the function group MEPO
Such a high number of faults denotes a bad design of the earlier releases; however, no opinion about the maintainability of the new version of ME is available.
Experiments
An overview of all analyzed projects can be found in table 8.2. The arrows in the cells of the newer versions indicate improvement ( ) or degradation ( ). It can be seen that most pairs of metric values (old vs. new version) show improvement. Because of the lack of tools, the fulfillment of the following goals could not be proved: Consistency, Maturity and Packaging.
Project: ObjMgr
Figure 8.1: Evolution of the project ObjMgr

The measurement data for the project ObjMgr is presented in figure 8.1. Red columns (the left column in each pair) present the old version, yellow the new one. The two metrics on the right side are additional and represent the size of the system in Total-LOC and Total-NOC. The new version has twice as many classes, but the total amount of code in LOC rose insignificantly. This caused the reduction of the average number of LOC per class. At the same time, the inter-modular metrics also improved. The old version had a redundancy of complexity, as shown by the metrics WMC and CLON. In the new version the significant reduction of the number of clones leads to a decrease of the intra-modular complexity. The
metric WMC has greatly improved, from 50,1 in the old version to 19,4 in the new version. Such a large difference could be explained by the distribution of the complexity over twice as many classes. However, the total amount of complexity was also reduced: in the old version the sum of WMC over all classes is about 2700, in the newer version about 2150. It is assumed that about 10% of the complexity was removed by reducing the clonicity, and that the residual difference is caused by the proper design. Insignificant degradation is shown only by the metrics LC and CDEm. ObjMgr (new) is a newly developed version of ObjMgr (old). During the development the design of the application was greatly simplified. This led to a handier usage of the API and therefore to higher maintainability.
Project: SLDClient
The same can be said about the evolution of the SLDClient, although here the new version provides additional functionality. The new version may seem more complex because of the new features; however, from the viewpoint of maintainability the old version was very poor, because changes of one part often caused faults in another part. Redundancy was also reduced in the newer version. Noteworthy, the extensive usage of patterns, in particular Visitor, leads to an increase of LOC in the classes of the newer version.
Figure: Evolution of the project SLDClient
Project: JLin/ATX
This project is special because all metrics without exception show an improvement of the maintainability and thus completely match the expert's opinion.
In particular, one can see that the newer version has less Total-LOC, even though it provides more functionality. Moreover, Total-NOO was slightly increased and the clonicity was reduced; all this leads to a decrease of the average LOC and WMC in classes and methods. Noteworthy, despite the increase of the total number of classes, the inter-modular metrics (RFC, CBO) also show improvement. See figure 8.3 for the overview.
Figure 8.3: Evolution of the project JLin/ATX
Algorithm Complexity
The algorithm complexity is relatively high, but still acceptable in both releases. The average WMC has been slightly reduced in 7.1: Ave-WMC(7.0) = 16,754; Ave-WMC(7.1) = 16,592.
Figure: Metric overview for the project Mobile Client (7.0 vs. 7.1)
Selfdescriptiveness
In general, 7.1 is better commented than 7.0: LC(7.0) = 35, which means 65 comment lines for each 100 LOC; LC(7.1) = 31, which means 69 comment lines for each 100 LOC. However, the manual examination shows that many comments are automatically generated (JavaDoc), not very meaningful, or are just code that has been commented out. Noteworthy, 7.1 has more interfaces, abstract classes and data-container classes, which have a very high comments/code ratio. In this case such small changes of LC are difficult to interpret; this metric deserves attention only in the case of inadmissible values.
Modularity
The most important metrics for modularity are presented in table 8.3. 7.1 is twice as large as 7.0 in terms of size on disk, number of classes and LOC. 7.1 has on average smaller classes (153 lines of code), and the typical class of 7.1 is smaller than the typical class of 7.0 (see the medians in table 8.3). This means that in 7.1 only a few classes are large, while 7.0 has more large classes. See appendix D with the lists of complex classes and complex methods for more details. 7.1 has slightly smaller methods as well: only 22 methods are larger than 80 LOC, while in 7.0, 40 methods have more than 80 LOC. Based on these facts the modularization of 7.1 is better.
Table 8.3: Modularity analysis

                                                7.0        7.1
Size on disk                                    772 159    1 725 178
NOC (Number Of Classes)                         139        326
NOC (Number of all Classes, incl. internal)     171        390
LOC (Lines Of Code)                             27 000     59 664
Median-LOC                                      92         83,5
Ave-LOC                                         157,89     152,98
Ave-LOCm (of methods)                           6,33       6,22
Structuredness
For coupling, two metrics were selected: RFC and CBO. 7.1 has a smaller RFC but a bigger CBO. For 7.0 this means that on average classes are coupled to fewer other classes, but use these relations more actively. For a qualitative statement about the coupling, the expert's input is required.
Ave-RFC(7.0) = 24,292; Ave-RFC(7.1) = 23,667
Ave-CBO(7.0) = 4,737; Ave-CBO(7.1) = 5,15
For assessing the cohesiveness, the metric LCOM4 is proposed. However, Borland Together is not able to calculate cohesion in the supposed form, so four other available metrics were analyzed; their average values are presented in table 8.4. The arrows in the cells of the newer version indicate improvement ( ) or degradation ( ).

Table 8.4: Comparison of different cohesion metrics for all classes

Component          LCOM1   LCOM2   LCOM3
Mobile Client 7.0  31,94   39,74   34,33
Mobile Client 7.1  51,35   38,91   30,36
Thus, based on 3 of 4 metrics, 7.1 is appreciably more cohesive. A more detailed analysis showed that many classes which play the role of data storage are not cohesive because of their get and set methods. Nevertheless, such classes do not have to be cohesive in the sense of intersecting attribute usage. Noteworthy, 7.1 has more data containers than 7.0: 148 of 390 classes against only 52 of 171. The cohesion in 7.1 is relatively high, which is an indicator of good design. However, in order to be able to give suggestions for improvement, a new implementation of the cohesion metric (LCOM4) is needed. For assessing the inheritance usage, the metrics NAC (Number of Ascendant Classes) and NDC (Number of Descendant Classes) are suggested. However, Together does not calculate these metrics, so the metrics DIT (Depth in Inheritance Tree) and NOC (Number Of direct Children) are used instead.
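To make the cohesion discussion concrete, here is a small sketch of an LCOM-style count. Note that LCOM definitions differ between tools, and this variant (number of method pairs sharing no instance attribute) is only one of them; the attribute-usage sets are invented.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of an LCOM-style cohesion count: the number of method pairs
// that share no instance attribute. Definitions vary between tools;
// this is one common variant, shown on invented attribute-usage data.
public class Lcom {

    public static int lcom1(List<Set<String>> attrsUsedByMethod) {
        int disjointPairs = 0;
        for (int i = 0; i < attrsUsedByMethod.size(); i++) {
            for (int j = i + 1; j < attrsUsedByMethod.size(); j++) {
                Set<String> common = new HashSet<>(attrsUsedByMethod.get(i));
                common.retainAll(attrsUsedByMethod.get(j));
                if (common.isEmpty()) {
                    disjointPairs++;
                }
            }
        }
        return disjointPairs;
    }

    public static void main(String[] args) {
        // Three methods: m1 and m2 share attribute "a"; m3 touches only "c".
        List<Set<String>> usage = List.of(
            Set.of("a", "b"),
            Set.of("a"),
            Set.of("c")
        );
        System.out.println("LCOM1 = " + lcom1(usage)); // pairs (m1,m3) and (m2,m3)
    }
}
```

A pure data container whose getters and setters each touch a single distinct attribute scores a high value under this definition even though it is not badly designed, which is exactly the effect observed above for the Mobile Client data classes.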
Since these metrics are intended for a single class, any aggregation would lead to uninterpretable results; thus these metrics are used in form of audits. Percentage grouping is used in order to aggregate the values for the entire system. Any other method of aggregation needs additional human input.
Figure 8.5: Classes with DIT > 3
Figure 8.6: Percentage of LOC in classes with more than 3 parents

Figure 8.5 shows the percentage of classes with more than 3 parents; the list of such classes for 7.1 can be found in appendix D. However, the check at a more detailed level of granularity, namely in LOC, shows that the number of LOC in classes with DIT > 3 has slightly increased, from 8% to 9%, as shown in figure 8.6. 7.0 has a higher IF and thus fewer stand-alone classes: IF(7.0) = 0,77; IF(7.1) = 0,68. The list of the complex stand-alone classes, which probably should be broken into small hierarchies, can be found in appendix D.
Clonicity
7.0 has 4 clone sets with a total of 1636 LOC; thus the Clonicity is 6,1%. 7.1 has 8 clone sets with a total of 1850 LOC; thus the Clonicity is 3,1%. Both components have quite a small clonicity, and the difference is insignificant. The list of clones for 7.1 can be found in appendix D.
Entropy
An archive compression rate can be used as a very primitive indicator of entropy, the average amount of information within the text. Based on the compression rate, both components have approximately equal entropy of the source code. The import-based CDEm shows that 7.1 has a slightly higher entropy of package name usage and is thus expected to require more cognitive load from the maintainer. Nevertheless, 7.0 has only 21 packages while 7.1 has 44 packages, so 7.1 has far more possibilities for composing the import section. This fact, together with the improper calculation of the CDEm, demands additional research.

Table 8.5: Comparison of compression coefficients

                                     7.0       7.1
Size on disk                         772 159   1 725 178
Size of ZIP-archive                  216 556   501 998
Coefficient of compression (ZIP)     0,280     0,291
Normalized CDEm (import-based)       0,85      0,89
CDEm (import-based)                  4,48      6,63
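The compression-rate indicator from table 8.5 can be sketched with the standard Deflater: the better a text compresses, the more redundant (lower-entropy) it is. This is a crude proxy, suitable only for coarse comparison, and the sample strings are invented.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;

// Sketch of a compression-based entropy indicator: the ratio of
// compressed size to raw size. Highly redundant source compresses
// well and therefore yields a low ratio.
public class CompressionEntropy {

    public static double compressionRatio(String text) {
        byte[] raw = text.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DeflaterOutputStream deflater = new DeflaterOutputStream(out)) {
            deflater.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // cannot occur with in-memory streams
        }
        return (double) out.size() / raw.length;
    }

    public static void main(String[] args) {
        String repetitive = "int x = 0;\n".repeat(200); // highly redundant text
        String varied = "public class A { /* ... */ }"; // short, little redundancy
        System.out.printf("repetitive code: %.3f%n", compressionRatio(repetitive));
        System.out.printf("varied code:     %.3f%n", compressionRatio(varied));
    }
}
```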
Value
7.1 has slightly smaller average values for the metrics LOC and WMC, which affect the number of test cases. Thus it is expected that 7.1 should on average need a slightly smaller number of test cases per class.
Simplicity
7.1 has slightly smaller average values for the metrics LOC and RFC, which affect the simplicity of the test cases. Thus it is expected that 7.1 should on average have slightly simpler test cases.
Summary
Based on the metrics investigation, Mobile Client 7.1 has less complexity than Mobile Client 7.0. In most investigated aspects 7.1 is more maintainable than 7.0. This conclusion is based on the analysis of the set of selected metrics. Only CDEm (entropy metric) and CBO (Coupling Between Objects) have shown a degradation in the newer release; here additional research is needed. Under the condition that 7.1 is twice as large as 7.0 and also provides new functionality, such a small degradation is insignificant and expected.
Measurement Procedure
For the reader who is interested in repeating this experiment with his own data, the following instructions might be helpful. For the measurement it is recommended to use Borland Together Developer 2006 for Eclipse, because this tool provides most of the needed metrics and allows saving the results in XML format. It is possible to use the trial version in order to save costs. Visit http://www.borland.com/us/products/together/ in order to download and install the tool. After the installation, a new entry Quality Assurance should appear in the context menu for a project in your Eclipse platform. Go to the Java perspective and choose Quality Assurance->Metrics from the context menu of your project. Before the calculation of the metrics can start, some options should be set. Choose Option and select the following metrics from the list: LOC, NOO, LCOM1, LCOM2, LCOM3, TCC, RFC, WMPC1 (in the current work this metric is called WMC), CBO, DOIH (DIT), NOCC (NOC), TCR (CR). Additional settings for each metric can be made; however, it is recommended to leave all settings at their default values, because Together does not calculate average values properly. Confirm your choice with OK and start the calculation. After that, the measurement data will appear in hierarchical form in a new Eclipse view called Metric. For the analysis it is useful to present the metric data in tabular form as shown in table 6.3. This can be done automatically using an XSLT transformation. Before the automatic filling, export the measurement data from Together to a file in XML format. Put your XML file into the directory with the XSLT files and start the XSLT transformation using the following sequence of two commands:

java XslTransformator            // Java class for transformation
  myproject.xml                  // your saved XML file
  MMtogether2xml_average.xsl     // XSLT adapter for Together
  MetricXml.xml                  // temporary file

java XslTransformator            // Java class for transformation
  MMGQM.xml                      // GQM quality model in XML format
  MMGQM2table.xsl                // XSLT for the output table
  MetricTable.html               // output file

Make sure that your CLASSPATH includes a link to a JAXP-compliant XSLT processor, for example Xerces. The output table will be saved in the file you selected as the third parameter of the second transformation (in the given example, MetricTable.html). After this procedure several metrics are still missing in the output table. To insert these values, some other tools are suggested. One of them is CloneAnalyzer, discussed in chapter 7. Download and install this Eclipse plug-in; a new menu element called CloneAnalyzer should appear. Select CloneAnalyzer -> Build and a new Eclipse view CloneTreeViewer should come into sight, in which all clones are recorded with an indication of the size of the clone and the source file in which it was found. Noteworthy, CloneAnalyzer searches in all open projects, so please close unnecessary projects. Unfortunately, this tool does not provide the metric CLON, so the number of LOC in all clones has to be calculated manually and divided by the Total-LOC in order to get the aggregated value. After that the metric CLON can be included into the output table. For the metric CDEm a special Java class was developed. Use the following command to start it:

java com.sap.olek.EntropyImport
  "C:\Program Files\workspace\objmgr"   // path to the directory with the source code
  import_objmgr_old.txt                 // output file

After the execution of this command two files are generated: the output file and a statistics file. At the end of the output file, find the value for Norm.
Entropy and put it into the output table as the CDEm value. It is also possible to automatically prepare the metric-based audits using the command:

java XslTransformator               // Java class for transformation
  myproject.xml                     // your saved XML file
  MMtogether2xml_list_reports.xsl   // XSLT for audits
  MetricTable.html                  // output file
The output file presents an HTML report with the classes which violate one of the following audits (an example of this report for Mobile Client 7.1 is given in appendix D):
- the list of complex methods (LOC > 80)
- the list of classes with DIT > 3
- the list of classes with NOC > 10
- the list of complex stand-alone classes (WMC > 50)
- the list of large classes (LOC > 500)
The files used for the transformations can be found on the CD accompanying this master thesis. It is also possible to use free tools in order to save costs. The most appropriate candidate is the tool Metrics (see the corresponding section in chapter Tools). Nevertheless, XSLT adapters for new tools would have to be implemented.
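For readers who do not have the XslTransformator class from the CD, a minimal command-line XSLT runner with the same argument order (input XML, stylesheet, output file) can be sketched on top of the standard JAXP API; the class name XslTransformatorSketch is, of course, hypothetical.

```java
import java.io.File;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical stand-in for the XslTransformator used in the
// measurement procedure: applies an XSLT stylesheet to an XML file
// using the JAXP transformation API shipped with the JDK.
public class XslTransformatorSketch {

    public static void transform(File xml, File xslt, File output) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(xslt));
            t.transform(new StreamSource(xml), new StreamResult(output));
        } catch (TransformerException e) {
            throw new RuntimeException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        if (args.length != 3) {
            System.err.println("usage: java XslTransformatorSketch <in.xml> <style.xsl> <out>");
            System.exit(1);
        }
        transform(new File(args[0]), new File(args[1]), new File(args[2]));
    }
}
```

With a JAXP-compliant processor such as Xerces/Xalan on the CLASSPATH, the two-step pipeline above can be reproduced by invoking this class twice with the corresponding stylesheets.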
9. Conclusion
The experiments have shown that most of the selected and validated metrics can be used as reliable maintainability indicators. Nevertheless, many metrics provided by the available tools are implemented differently than initially supposed. Such a deviation is acceptable for the initial examination; however, for further usage the metric implementations should be corrected. After the analysis of the ability to assess the maintainability, the following groups of metrics have been distinguished:
4. Metrics that did not participate in the experiments, but are supposed to be good indicators of the maintainability
The metrics m (Entropy), LCOM (Lack of Cohesion Of Methods), NOD (Number Of Developers), FAN-IN, FAN-OUT and SMI (Software Maturity Index) did not participate in the experiments because of a lack of tools or data. These metrics can only restrictedly be admitted into the quality model.
The result of this thesis is a deeper knowledge about the maintainability, which is essentially formalized in form of the quality model. Based on this model it is possible to understand the substance of maintainability and also to measure its most important indicators. Based on the theoretical considerations and the experiments presented above, the following conclusions can be made:
- It is possible to describe the different maintainability-related aspects of the software using metric-based indicators. Several metrics chosen in this study appear to be useful in predicting the maintainability.
- Since only limited aggregation is possible and the output of this research is a list of maintainability indicators, only a semi-automated process is possible. The metrics can provide only a description of the system; the final decision should be made by a human.
- Because thousands of metrics exist, it should not be a problem to find the appropriate metrics among them. Nevertheless, during this thesis two new metrics were suggested.
- Metrics have different levels of granularity: some of them describe a single class or method, others can characterize the entire system. Since the final indicators have to describe the system, all metric values should be aggregated.
- Most problems occur when aggregating the data in order to characterize the entire system. This is caused by data garbling, information hiding or inadmissible operations, since the metrics are good indicators for a single module. Therefore the aggregation should be done very carefully.
- Since poorly designed code can be found much more easily than well-designed code, metrics can be used in form of audits. Metric-based audits are a good support for code reviews.
- In this work, admissible values for each practically investigated metric were determined. On the other hand, these values depend on the used programming paradigm and language.
- Metrics are able to show the trend (improvement or degradation) within the list of releases of one component. Metrics are of limited use for comparing different components.
Nevertheless, the metrics are just one possibility to describe certain properties of the software. The interpretation of whether such a description means well or badly maintainable code depends on the design and the goals. The final conclusion about the maintainability of software can only be made by a human.
10. Outlook
The current thesis gives an idea of the capabilities of metrics from the viewpoint of maintainability assessment. In parallel with the theoretical introduction to software measurement, a practical example of their usage was produced. Hence the results of this work can already be used in practice. Nevertheless, several open issues should be settled before successful adoption. The most important issue is finding an appropriate measurement tool. In chapter 7 several tools are discussed, but none of them provides all required metrics. Several metrics for Java would have to be implemented additionally, and the ABAP environment shows an even greater deficit in tool support. Furthermore, several metrics were not validated because of a lack of data or tools. The author believes that all selected metrics are reliable, but additional experiments should be conducted to confirm the results. The next important step is to research how the usage of design patterns affects metric values. For example, applying the Visitor pattern can increase the LOC of the visitor class, and this increase would not be undesirable. The exact impact of different patterns and its consequences for the metrics have to be researched. In [GARZ02] some metrics for patterns are discussed. In [KHOS04] an assessment of 23 patterns from the viewpoint of simplicity, modularity, understandability, etc. is provided. Khosravi argues that patterns should be used very carefully; for example, the Proxy pattern makes debugging much harder and increases the number of classes. The second part of this outlook mentions several interesting approaches that allow a further expansion of metric-based quality mechanisms. One problem of integrating measurement tools and automating the measurement procedure is the handling of a heterogeneous and changing environment, because various tools can be used during the lifecycle.
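To illustrate how one of the missing metrics could be implemented on top of a parser, the following sketch approximates McCabe's cyclomatic complexity. Python's standard ast module merely stands in for a Java or ABAP parser, and the counting rule (decision points plus one) is a simplification of the full definition.

```python
# Hedged sketch of implementing a metric on top of a parser. The ast
# module stands in for a Java/ABAP parser; the rule "decision points + 1"
# only approximates McCabe's cyclomatic complexity.
import ast

BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.BoolOp)

def cyclomatic_complexity(source: str) -> int:
    """Count decision points in the parsed source and add one."""
    tree = ast.parse(source)
    return 1 + sum(isinstance(node, BRANCH_NODES) for node in ast.walk(tree))

SAMPLE = """
def classify(x):
    if x < 0:
        return "negative"
    elif x == 0:
        return "zero"
    for _ in range(x):
        pass
    return "positive"
"""
print(cyclomatic_complexity(SAMPLE))  # 4 (two if branches + one loop + 1)
```

A dedicated query language such as the one discussed later in this chapter would express the same traversal declaratively instead of walking the syntax tree by hand.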
In [AUER02] a simple metric data exchange format and a data exchange protocol for communicating metric data are proposed. This approach aims at filling the gap between frameworks and tools by offering detailed instructions on how to implement metric data collection while remaining an open and simple standard that allows easy integration of existing tools and their data-handling processes. A flexible, easy, and fast implementation of new metrics is also important. In [MARI] a Static Analysis Interrogative Language is introduced: a language dedicated to the aforementioned type of static source code analysis, which allows various metrics to be implemented in a homogeneous manner. After the source code has been parsed, simple but powerful queries can be written to obtain information about certain properties of the code and to calculate the metrics. In this thesis the quality model aims only at the collection and presentation of metric data to an expert, who makes decisions about the maintainability of the product based on the measurement data and personal experience. However, the processing of the metric data can also be fully automated, for example by using fuzzy logic or neural networks. In [THWI] Thwin uses neural networks to demonstrate the ability of object-oriented metrics to predict the number of software defects and the maintenance effort. The next approach is not metric-based, but nonetheless very interesting and useful. In [GALL02] Gall uses the CVS history for detecting non-obvious logical relations between classes: classes that are often changed together by a single change request may have a logical relation that is not necessarily reflected in physical relations. After a study of the change history, a list of the classes that were often changed together by one change request is generated, and hints about such relations can be given to the maintainer.
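The co-change analysis of [GALL02] can be sketched in a few lines. The commit sets below are invented for illustration, and the threshold of two shared commits is an arbitrary choice, not a value from the cited study.

```python
# Sketch of the logical-coupling idea from [GALL02]: classes that are
# repeatedly committed together may be logically related. The commit
# data and the threshold are invented for illustration.
from collections import Counter
from itertools import combinations

commits = [
    {"Order.java", "OrderDAO.java", "Invoice.java"},
    {"Order.java", "OrderDAO.java"},
    {"Customer.java"},
    {"Order.java", "OrderDAO.java", "Mailer.java"},
]

# Count how often each pair of files appears in the same commit:
co_changes = Counter()
for files in commits:
    for pair in combinations(sorted(files), 2):
        co_changes[pair] += 1

# Pairs changed together in at least two commits are coupling candidates:
candidates = [pair for pair, n in co_changes.items() if n >= 2]
print(candidates)  # [('Order.java', 'OrderDAO.java')]
```

Such hints complement the static metrics of this thesis, since they reveal relations that no static coupling metric (CBO, FAN-IN, FAN-OUT) can see.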
References
[ABRA04] Alain Abran, Miguel Lopez, Naji Habra, An Analysis of the McCabe Cyclomatic Complexity Number, in: 14th International Workshop on Software Measurement (IWSM), IWSM-Metrikon 2004, Königs Wusterhausen/Magdeburg, Germany, Shaker-Verlag, 2004, pp. 391-405.
[ABRA04b] Alain Abran, Olga Ormandjieva, Manar Abu Talib, Information Theory-based Functional Complexity Measures and Functional Size with COSMIC-FFP, 2004.
[AHN03] Yunsik Ahn, Jungseok Suh, Seungryeol Kim, Hyunsoo Kim, The Software Maintenance Project Effort Estimation Model Based on Function Points, J. Softw. Maint. Evol.: Res. Pract. 2003; 15:71-85.
[ALTU06] Yusuf Altunel, Component-Based Software Engineering, Chapter 9: Component-Based SW Testing, Lecture Notes, 26.01.2006.
[AUER02] Martin Auer, Measuring the Whole Software Process: A Simple Metric Data Exchange Format and Protocol, 2002.
[BADR03] Linda Badri, Mourad Badri, A New Class Cohesion Criterion: An Empirical Study on Several Systems, 7th ECOOP Workshop on Quantitative Approaches in Object-Oriented Software Engineering (QAOOSE'2003), July 22nd, 2003.
[BASI94] Victor R. Basili, Gianluigi Caldiera, H. Dieter Rombach, The Goal Question Metric Approach, 1994.
[BASI95] Victor R. Basili, Lionel Briand, Walcélio L. Melo, A Validation of Object-Oriented Design Metrics as Quality Indicators, Technical Report, Univ. of Maryland, Dep. of Computer Science, College Park, MD, 20742 USA, April 1995.
[BIEM94] James M. Bieman, Linda M. Ott, Measuring Functional Cohesion, IEEE Transactions on Software Engineering, Vol. 20, No. 8, August 1994, pp. 644-657.
[BRUN04] Magiel Bruntink, Arie van Deursen, Predicting Class Testability Using Object-Oriented Metrics, 2004.
[CART03] Tom Carter, An Introduction to Information Theory and Entropy, Complex Systems Summer School, June 2003.
[CHID93] Shyam R. Chidamber, Chris F. Kemerer, A Metrics Suite for Object-Oriented Design, M.I.T. Sloan School of Management, revised December 1993.
[DARC05] David P. Darcy, Chris F. Kemerer, Sandra A. Slaughter, The Structural Complexity of Software: Testing the Interaction of Coupling and Cohesion, January 22, 2005.
[DOSP03] Jana Dospisil, Measuring Code Complexity in Projects Designed with AspectJ, Informing Science InSITE - Where Parallels Intersect, June 2003.
[DUMK96] Reiner R. Dumke, Erik Foltin, Metrics-based Evaluation of Object-Oriented Software Development Methods, 1996.
[ETZK97] Letha Etzkorn, Carl Davis, Wei Li, A Statistical Comparison of Various Definitions of the LCOM Metric, Technical Report TR-UAH-CS-1997-02, Computer Science Dept., Univ. Alabama in Huntsville, 1997.
[ETZK99] Letha Etzkorn, Jagdish Bansiya, Carl Davis, Design and Code Complexity Metrics for OO Classes, Journal of Object Oriented Programming 1999; 12(1):35-40.
[ETZK02] Letha H. Etzkorn, Sampson Gholston, William E. Hughes, A Semantic Entropy Metric, J. Softw. Maint. Evol.: Res. Pract. 2002; 14:293-310.
[FELD02] David Feldman, A Brief Introduction to: Information Theory, Excess Entropy and Computational Mechanics, April 1998 (revised October 2002).
[GALL02] Harald Gall, Mehdi Jazayeri, Jacek Krajewski, CVS Release History Data for Detecting Logical Couplings, Technical University of Vienna, Distributed Systems Group, Proceedings of the Sixth International Workshop on Principles of Software Evolution (IWPSE'03).
[GARZ02] Javier Garzás, Mario Piattini, Analyzability and Changeability in Design Patterns, SugarloafPLoP 2002 Conference.
[HASS03] Ahmed E. Hassan, Richard C. Holt, The Chaos of Software Development, 2003.
[JLIN05] SAP-internal documentation. See in SAPnet: http://bis.wdf.sap.corp:1080/twiki/bin/view/Techdev/JavaTestTools -> JLin.
[KABA] Hind Kabaili, Rudolf K. Keller, François Lustman, Guy Saint-Denis, Class Cohesion Revisited: An Empirical Study on Industrial Systems.
[KAJK] Mira Kajko-Mattsson, Software Evolution and Maintenance.
[KELL01] Horst Keller, Sascha Krüger, ABAP Objects: Einführung in die SAP-Programmierung, SAP PRESS, 2001.
[KHOS04] Khashayar Khosravi, Yann-Gaël Guéhéneuc, A Quality Model for Design Patterns, Summer 2004.
[LAKS99] Anuradha Lakshminarayana, Timothy S. Newman, Principal Component Analysis of Lack of Cohesion in Methods (LCOM) Metrics, Technical Report TR-UAH-CS-1999-01, Computer Science Dept., Univ. Alabama in Huntsville, 1999.
[LANZ99] Michele Lanza, Combining Metrics and Graphs for Object Oriented Reverse Engineering, 1999.
[LAVA00] Luigi Lavazza, Providing Automated Support for the GQM Measurement Process, IEEE Software, May/June 2000, pp. 56-62.
[MARI] Cristina Marinescu, Radu Marinescu, Tudor Gîrba, A Dedicated Language for Object-Oriented Design Analyses.
[MART95] Robert Martin, OO Design Quality Metrics: An Analysis of Dependencies, August 14, 1994 (revised June 20, 1995).
[MISR03] Subhas C. Misra, Virendrakumar C. Bhavsar, Measures of Software System Difficulty, SQP Vol. 5, No. 4, 2003, ASQ.
[MUST05] K. Mustafa, R. A. Khan, Quality Metric Development Framework (qMDF), Journal of Computer Science 1 (3): 437-444, 2005.
[NAND99] Jagadeesh Nandigam, Arun Lakhotia, Claude G. Cech, Experimental Evaluation of Agreement among Programmers in Applying the Rules of Cohesion, Journal of Software Maintenance: Research and Practice 11, 35-53 (1999).
[PARK96] Robert E. Park, Wolfhart B. Goethert, William A. Florac, Goal-Driven Software Measurement: A Guidebook, Software Engineering Institute, August 1996.
[PIAT] Mario Piattini, Antonio Martínez, Measuring for Database Programs Maintainability.
[REIS] Ralf Reißing, Towards a Model for Object-Oriented Design Measurement.
[RIEG05] Matthias Rieger, Effective Clone Detection Without Language Barriers, Inauguraldissertation, Philosophisch-naturwissenschaftliche Fakultät der Universität Bern, 10.06.2005.
[ROSE] Linda H. Rosenberg, Lawrence E. Hyatt, Software Quality Metrics for Object-Oriented Environments, NASA.
[RUTH] Ian Ruthven, Maintenance.
[RYSS] Filip Van Rysselberghe, Serge Demeyer, Evaluating Clone Detection Techniques.
[SAP03] Cüneyt Çam, W. Hagen Thümmel, Philip J. Zhang, Essentials of CheckMan, SAP AG, 2003.
[SAP03b] Randolf Eilenberger, Andreas Simon Schmitt, Evaluating the Quality of Your ABAP Programs and Other Repository Objects with the Code Inspector, SAP Professional Journal, 2003.
[SAP04] Product Innovation Lifecycle: From Ideas to Customer Value, Whitepaper Version 1.1, July 2004, Mat. Nr. 500 70 026.
[SAP05] Eckart Spitzberg, Process Description: Quality Gates, Version 4.1, 31.03.2005.
[SAP05b] Pieter Bloemendaal, SAP Code Quality Management Newsflash, June 15, 2005.
[SAP05c] Thomas Haertlein, Ulrich Weber, Neelakantan Padmanabhan, Horst Pax, Project Quality Indicators, 23 May 2005.
[SAP05d] Pieter Bloemendaal, Code Quality Management (CQM), SAP SI AG, 2005.
[SERO05] Gregory Seront, Miguel Lopez, Valerie Paulus, Naji Habra, On the Relationship between Cyclomatic Complexity and the Degree of Object Orientation, 2005.
[SHEL02] Frederick T. Sheldon, Kshamta Jerath, Hong Chung, Metrics for Maintainability of Class Inheritance Hierarchies, J. Softw. Maint. Evol.: Res. Pract. 2002; 14:147-160.
[SNID01] Greg Snider, Measuring the Entropy of Large Software Systems, HP Laboratories Palo Alto, HPL-2001-221, September 10th, 2001.
[SOLI99] Rini van Solingen, Egon Berghout, The Goal/Question/Metric Method: A Practical Guide for Quality Improvement of Software Development, McGraw-Hill, London, 1999.
[THWI] Mie Mie Thet Thwin, Tong-Seng Quah, Application of Neural Networks for Software Quality Prediction Using Object-Oriented Metrics.
[VELL] Paul Velleman, Leland Wilkinson, Nominal, Ordinal, Interval, and Ratio Typologies are Misleading.
[WELK97] Kurt D. Welker, Paul W. Oman, Gerald G. Atkinson, Development and Application of an Automated Source Code Maintainability Index, Software Maintenance: Research and Practice, Vol. 9, 127-159 (1997).
[WEYU88] E. J. Weyuker, Evaluating Software Complexity Measures, IEEE Transactions on Software Engineering, Vol. 14, No. 9, pp. 1357-1365, 1988.
[WOLL03] Björn Wolle, Analyse von ABAP- und Java-Anwendungen im Hinblick auf Software-Wartung, CC GmbH, Wiesbaden, published in: MetriKon 2003 - Software-Messung in der Praxis.
[Yi04] Tong Yi, Fangjun Wu, Empirical Analysis of Entropy Distance Metric for UML Class Diagrams, ACM SIGSOFT Software Engineering Notes, Vol. 29, Issue 5, September 2004.
[ZUSE98] Horst Zuse, A Framework of Software Measurement, Walter de Gruyter, Berlin, 1998, 755 pages, ISBN 3-11-015587-7.
Closing Declaration
I hereby declare that I have written this master thesis independently, without unauthorized third-party assistance, and without using any aids other than those specified. Thoughts taken directly or indirectly from external sources are marked as such. Potsdam, 23 February 2006
[Appendix figure: GQM quality-model diagram, linearized during text extraction. The diagram maps goals (internal properties such as Structuredness (ST), Modularity (MO), Consistency (CO), Selfdescriptiveness (SD), Clonicity (CL), and Maturity (MA)) via questions (e.g. "Is coupling low enough?", "What is the number of modules changed per change cause?", "How many developers have touched an object?", "How often does the developer face a new, unknown object?") to metrics (LOC, RFC, MCC, WMC, LCOM, NAC, NOO, NOD, FAN-OUT, CLON, H, m). The legend distinguishes intra-modular metrics, inter-modular metrics, additional metrics, and rejected metrics.]
</Question>
</Question>
<Question text="How often does the developer face a new, unknown object?">
  <Metric name="" shortname="CDEm" />
</Question>
</Goal>
</Question>
</Goal>
<Goal name="Changeability">
  <Question text="How easy is it to change the SW?">
    <Goal name="Structuredness">
      <Question text="Should the developer change other objects, or check whether these have to be corrected?">
        <Question text="What is the number of modules changed per change cause?">
        </Question>
        <Question text="Is coupling low enough?">
          <Question text="How many relations are there between objects?">
            <Question text="How many global variables are there?"></Question>
            <Metric name="" shortname="NDC" />
            <Metric name="" shortname="CBO" />
            <Metric name="" shortname="FAN-IN" />
          </Question>
        </Question>
        <Question text="Should the developer analyze irrelevant code?">
          <Metric name="" shortname="LCOM" />
        </Question>
        <Question text="How high is the degree of conformance of the system to the principles of maximal cohesion and minimal coupling?">
          <Metric name="" shortname="m" />
        </Question>
      </Question>
    </Goal>
    <Goal name="Modularity">
      <Question text="Should the developer change large chunks of code at a time?">
        <Question text="Is the code sufficiently divided into parts?">
          <Metric name="" shortname="LOC" />
        </Question>
      </Question>
    </Goal>
    <Goal name="Packaging">
      <Question text="Are reusable elements isolated from non-reusable elements?">
        <Metric name="" shortname="I" />
      </Question>
      <Question text="How fully are abstract classes used?">
        <Metric name="" shortname="A" />
      </Question>
      <Metric name="" shortname="D" />
    </Goal>
  </Question>
</Goal>
<Goal name="Testability">
  <Question text="How easy is it to test the SW?">
    <Goal name="Value">
      <Question text="How many test cases should be changed/proved?">
        <Metric name="" shortname="LOC" />
        <Metric name="" shortname="WMC" />
        <Metric name="" shortname="MCC" />
      </Question>
    </Goal>
    <Goal name="Simplicity">
      <Question text="How easy are test cases to maintain?">
        <Metric name="" shortname="FAN-OUT" />
        <Metric name="" shortname="LOC" />
        <Metric name="" shortname="RFC" />
      </Question>
    </Goal>
  </Question>
</Goal>
<Goal name="Clonicity">
  <Question text="Does the system have clones?">
    <Metric name="" shortname="CLON" />
  </Question>
</Goal>
<Goal name="Maturity">
  <Question text="Should the SW be compared with the original release because of changes?">
    <Question text="How significantly was the system changed?">
      <Question text="How many new, changed, deleted objects does the system have?">
        <Metric name="" shortname="SMI" />
      </Question>
    </Question>
  </Question>
</Goal>
</Question>
</Goal>
<Additional>
  <Metric name="" shortname="TotalLOC" />
  <Metric name="" shortname="TotalNOC" />
</Additional>
</Model>
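Processing such quality-model data, which the thesis delegates to an XSLT converter, can equally be sketched in a general-purpose language. The fragment below is a small, well-formed excerpt modeled on the appendix listing above (element and attribute names follow that listing), traversed with Python's ElementTree purely as an illustrative stand-in:

```python
# Illustrative stand-in for the thesis' XSLT converter: collect the
# metrics attached (at any depth) to one goal of the quality model.
import xml.etree.ElementTree as ET

MODEL = """
<Goal name="Testability">
  <Question text="How easy is it to test the SW?">
    <Goal name="Value">
      <Question text="How many test cases should be changed/proved?">
        <Metric name="" shortname="LOC" />
        <Metric name="" shortname="WMC" />
        <Metric name="" shortname="MCC" />
      </Question>
    </Goal>
  </Question>
</Goal>
"""

root = ET.fromstring(MODEL)
# iter() walks the whole subtree, so nested sub-goals are included:
metrics = [m.get("shortname") for m in root.iter("Metric")]
print(metrics)  # ['LOC', 'WMC', 'MCC']
```

A report generator would repeat this traversal per goal and join the collected metric short names with the measured values before rendering.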
Number of classes with NOC > 10: 16
Total number of classes: 390