Embrace Your Issues: Compassing the Software Engineering Landscape Using Bug Reports
Markus Borg
Dept. of Computer Science, Lund University
Lund, Sweden
markus.borg@cs.lth.se

ABSTRACT
Software developers in large projects work in complex information landscapes, and staying on top of all relevant software artifacts is challenging. As software systems often evolve for years, a high number of issue reports is typically managed during the lifetime of a system. Efficient management of incoming issues requires successful navigation of the information landscape. In our work, we address two important work tasks involved in issue management: Issue Assignment (IA) and Change Impact Analysis (CIA). IA is the early task of allocating an issue report to a development team. CIA deals with identifying how source code changes affect the software system, a fundamental activity in safety-critical development. Our solution approach is to support navigation, both among development teams and software artifacts, based on information available in historical issue reports. We present how we apply techniques from machine learning and information retrieval to develop recommendation systems. Finally, we report intermediate results from two controlled experiments and an industrial case study.

Categories and Subject Descriptors
D.2.5 [Software Engineering]: Testing and Debugging – Debugging aids, tracing; D.2.9 [Software Engineering]: Management – Programming teams

Keywords
issue management, recommendation systems, machine learning, information retrieval

1. INTRODUCTION
The information landscape in large software engineering projects is complex and ever-changing. Development teams collaborate using information repositories such as requirements databases, test management systems, source code repositories, and general document management systems, in total storing thousands of artifacts [9, 25, 17].
Often the interoperability between the repositories is poor, i.e., the repositories turn into isolated information silos with little transparency. An intrinsic strength of software is that it is easy to modify, but source code changes impact other software artifacts as well. Thus, both source code and other stored information are continuously created and updated in large projects [14], and staying on top of the information landscape can constitute a significant challenge for both developers and managers [26].

Information overload, i.e., projects characterized by a state where individuals do not have the time or capacity to process all available information [16], threatens productivity in large software engineering projects. Quick and concise access to information is crucial in knowledge-intensive work such as software development and evolution. If the project environment does not provide sufficient support for navigation and retrieval, substantial effort is wasted on locating relevant information [23].

In some projects, issue repositories constitute examples of information silos that contain challenging amounts of software engineering data. The continuous inflow of issue reports, especially in public issue repositories [2], makes activities such as duplicate management, prioritization, and work allocation tedious and error-prone [12, 7, 22].
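To illustrate the kind of automated support involved, a minimal duplicate-candidate detector can be built from plain TF-IDF cosine similarity over issue report text. The following is a toy sketch using scikit-learn; the example reports and the similarity threshold are our assumptions, not data or settings from the studied projects:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy issue reports; in practice these come from the issue repository.
reports = [
    "Application crashes on startup when the config file is missing",
    "Startup crash: application fails when config file missing",
    "Add dark mode toggle to the settings dialog",
]

# Vectorize the text and compute pairwise cosine similarities.
vectors = TfidfVectorizer(stop_words="english").fit_transform(reports)
sims = cosine_similarity(vectors)

# Flag pairs above an (assumed) threshold as duplicate candidates.
THRESHOLD = 0.3
for i in range(len(reports)):
    for j in range(i + 1, len(reports)):
        if sims[i, j] > THRESHOLD:
            print(f"possible duplicate: report {i} ~ report {j} ({sims[i, j]:.2f})")
```

Production tools add stemming, field weighting, and incremental indexing on top of this basic idea, but the core signal is the same textual similarity.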
While some of these activities are supported in state-of-practice issue repositories (e.g., Bugzilla and HP Quality Center offer automated duplicate detection), most processing of incoming issue reports is still manual.

Previous research argues that the issue repository is a key collaborative hub in large software engineering projects. Cubranic et al. developed Hipikat, a recommendation system to help newcomers in open source communities navigate the existing information, and used the issue repository to build a project memory [15]. Anvik and Murphy presented automated decision support for several activities involved in issue triaging, all based on information stored in issue repositories [4].

We also view the issue repository as a key collaborative hub, but increase the granularity further by considering each individual issue report as an important juncture in the information landscape. We have previously shown that issue reports can connect software artifacts that are stored in separate databases [10], i.e., issue reports are a way to break information silos. In software engineering contexts where the change management process is rigid, every corrective change originating from fixing an issue report must be documented. As such, the trace links from issue reports to artifacts in various repositories turn into trails in the information landscape, created by past engineers as part of their everyday software development.

In line with previous work, we apply Machine Learning (ML) to detect patterns in the inflow of issue reports. As ML in general performs better the more data that are available [5], we aim to harness the daunting inflow of issue reports to assist navigation in the software engineering landscape. Early results indeed show that our solutions improve as the number of issue reports grows, e.g., in issue assignment [21] and prioritization [24].
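The ensemble-based issue assignment we evaluate (Section 2.1) combines several classifiers via stacked generalization. The idea can be sketched with scikit-learn's StackingClassifier on toy issue reports; the base classifiers, features, and data below are our illustration, not the exact setup of [20, 21]:

```python
from sklearn.ensemble import StackingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Toy issue reports and the teams that resolved them (illustrative only).
reports = [
    "null pointer exception in billing module",
    "invoice total rounded incorrectly",
    "UI freezes when opening settings",
    "settings dialog renders with wrong font",
    "crash in billing batch job",
    "misaligned buttons on the settings page",
]
teams = ["billing", "billing", "ui", "ui", "billing", "ui"]

# Diverse base classifiers; a meta-learner combines their out-of-fold
# predictions (stacked generalization).
ensemble = StackingClassifier(
    estimators=[
        ("nb", MultinomialNB()),
        ("tree", DecisionTreeClassifier(random_state=0)),
    ],
    final_estimator=LogisticRegression(),
    cv=3,
)
model = make_pipeline(TfidfVectorizer(), ensemble)
model.fit(reports, teams)

# Assign an incoming issue report to a team.
print(model.predict(["billing job throws exception"]))
```

The meta-learner learns when to trust which base classifier, which is why diversity among the base classifiers tends to pay off.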
The rest of this paper presents how we support two work tasks in issue management: Issue Assignment (IA) and Change Impact Analysis (CIA).

2. PROBLEM AND SOLUTION APPROACH
Our work particularly addresses IA and CIA, as our industrial partners stressed the importance of these two activities.

2.1 Automated IA using Ensemble Learning
The initial decision in issue triaging is typically to decide which development team should investigate a certain issue report, an activity referred to as IA. Several studies report that manual bug assignment is labor-intensive and error-prone [6, 19, 8], resulting in bug tossing (i.e., reassigning issue reports to other developers) and delayed issue corrections.

We support IA using supervised ML. In line with previous work, we train a classifier on the textual content in issue repositories [3, 19, 1]. However, while previous work has applied individual classifiers, we combine several classifiers in an ensemble learner using Stacked Generalization (SG). SG is a versatile ensemble learner able to combine classifiers of different types, and it was for example heavily used in the winning contribution to the well-known Netflix Prize [27].

Previous research on IA has primarily targeted Open Source Software (OSS) projects. OSS development is indeed interesting to study, but it is often treated as the default software engineering context. In fact, it differs in several aspects such as development processes, team organization, and developer incentives. As presented in Section 3, we instead evaluate our approach in proprietary issue management contexts.

2.2 A Recommendation System for CIA
When the issue triager has concluded that a corrective fix is required for an issue report, a CIA is required. Especially in safety-critical development contexts, CIA is a formal analysis that must be completed before any changes to the source code are allowed [18]. Several researchers have studied CIA on the source code level [13].
However, in many proprietary development contexts, it is critical to also determine impact on other types of artifacts, e.g., requirements, design specifications, and test case descriptions.

Our approach to supporting CIA is to develop a Recommendation System, i.e., a software application that provides information items estimated to be valuable for a software engineering task in a given context [26]. Our solution, ImpRec, relies on mining a knowledge base, a network of artifacts and trace links, from previous CIA reports in an issue repository (similar to the project memory in Hipikat [15]). Then we use Apache Lucene (lucene.apache.org), a state-of-the-art OSS search engine library, to index the natural language content of all issue reports. Also, we calculate the relative importance of the artifacts in the knowledge base using network centrality measures.

ImpRec first identifies a set of potentially impacted artifacts, and then orders them using a ranking function (cf. Figure 1). When a CIA for a new issue report is needed, the tool uses traditional Information Retrieval (IR) through Lucene to locate textually similar issue reports in the knowledge base. Starting from the similar issue reports, the starting points, ImpRec conducts breadth-first searches to identify additional candidates. Finally, all impact candidates are ranked based on textual similarities, distances in the knowledge base, and network centralities.

Figure 1: Overview of impact recommendations in ImpRec. 1) Starting points in the knowledge base are identified using IR. 2) Breadth-first searches follow trace links to locate previous impact. 3) The impact candidates are ranked using network measures and textual similarity.

3. EVALUATION METHOD
Our idea is to evaluate automated support for issue management in two proprietary organizations. The first company, Comp Telecom, develops telecom infrastructure and employs thousands of software engineers.
Allocating resources is a major challenge, and bug tossing is an explicit problem. The second company, Comp Auto, develops safety-critical solutions for the process automation industry. The development process follows safety standards mandating rigorous process requirements on traceability. CIA is an important work task for the developers, but it requires much effort.

We evaluate automated IA in a controlled experiment using five proprietary datasets. Four datasets originate from different projects at Comp Telecom, and one dataset originates from Comp Auto. In total, more than 50,000 issue reports are used in the experiment. We train classifiers on different subsets of historical issue reports, and test our approach on more recent issue reports.

Initial evaluations of ImpRec were also conducted in a controlled in-vitro setting. We extracted 27,000 issue reports from Comp Auto, constituting 12 years of development of an industrial automation system. We mined a knowledge base from the previous CIA reports, in total about 5,000 reports, and compared the ImpRec output to the true impact previously reported. As ImpRec has several parameters that need to be configured, and as automated solutions in software engineering typically are highly dependent on the dataset [11], we aim to publish guidelines on how to systematically tune tools such as ImpRec as an outcome of the in-vitro evaluation.

ImpRec is also evaluated in an in-vivo setting. We are currently conducting an industrial case study at Comp Auto in Sweden (Case Sweden). We have selected one development team that performs frequent CIAs as part of their work. Four team members with different development roles installed ImpRec on their local machines in March 2014. The team members are encouraged to use ImpRec when they perform CIAs, and to report their experiences from using the tool. As the developers use ImpRec, all user actions are collected.
Before the tool was deployed, we conducted seven interviews to better understand the current CIA process. Moreover, in August 2014 we will deploy ImpRec in a team at the Comp Auto development site in Bangalore, India to enable a replication of the case study in another part of the organization (Case India). While the two cases under study will be different, the involved developers work on the same automation system and use the same issue repository. Thus, the required reconfiguration of ImpRec will be minimal.

4. INITIAL RESULTS
Our controlled experiment on automated IA shows that combining classifiers under SG consistently outperforms individual classifiers with regard to prediction accuracy [20]. Furthermore, our results confirm that it is worthwhile to strive for a diverse set of individual classifiers in the ensemble, consistent with recommendations in the general ML research field. Our approach reaches prediction accuracies of 50% to 90% for the five different datasets, in line with the prediction accuracy of the current manual processes. The learning curves for all five datasets display similar behaviour; IA for all datasets benefits from more training data, but reaches a different maximum prediction accuracy. However, as all learning curves start to flatten out at about the same size of the training set, we present an empirically based rule of thumb: when training a classifier for automated IA, aim for at least 2,000 issue reports in the training set.

We first evaluated ImpRec in a controlled setting, and explored how to configure the involved parameters. Our initial results suggest that using multiple starting points (>15) in the knowledge base, and considering impacted artifacts several trace links away (>5), increases the prediction accuracy. However, considering a high number of impact candidates requires a good ranking function.
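To make the ranking step concrete, one can combine the textual similarity of a starting point, the graph distance covered by the breadth-first search, and the network centrality of each candidate into a single weighted score. The following toy sketch uses networkx; the graph, similarity values, and weights are invented for illustration and are not ImpRec's actual parameters:

```python
import networkx as nx

# Toy knowledge base: issue reports linked to the artifacts they impacted.
kb = nx.Graph()
kb.add_edges_from([
    ("issue-1", "req-12"), ("issue-1", "test-7"),
    ("issue-2", "req-12"), ("req-12", "design-3"),
])

# Textual similarity of each starting-point issue to the new report
# (assumed here; in practice obtained from an IR query).
starting_points = {"issue-1": 0.8, "issue-2": 0.4}

centrality = nx.degree_centrality(kb)

# Collect impact candidates by breadth-first search from each starting
# point, scoring them by similarity, graph distance, and centrality.
W_SIM, W_DIST, W_CENT = 0.5, 0.3, 0.2  # hypothetical weights
scores = {}
for start, sim in starting_points.items():
    reach = dict(nx.single_source_shortest_path_length(kb, start, cutoff=2))
    for node, dist in reach.items():
        if node in starting_points:
            continue  # rank artifacts, not the issue reports themselves
        score = W_SIM * sim + W_DIST / (1 + dist) + W_CENT * centrality[node]
        scores[node] = max(scores.get(node, 0.0), score)

ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Candidates reachable from several well-matching starting points, or sitting at central junctures of the knowledge base, naturally rise to the top of such a list.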
In our ranking function, the most important features are the network centrality, and the textual similarity between the newly submitted issue report and the historical issue reports used as starting points. Using feasible feature weighting, ImpRec predicts more than 30% of the previously reported impact among the first 5 recommendations, and more than 40% among the first 10 recommendations. Some recommendations beyond the first 10 tend to be correct, but we typically do not value candidates far down the list, as we suspect that developers are unlikely to browse long lists of potentially impacted artifacts.

A preliminary analysis of the interviews conducted in Case Sweden indicates that developers see the CIA work task as a rather negative necessity, sometimes conducted just to comply with the safety process. The interviewees confirm that it is a manual, time-consuming activity, and highlight two fundamental challenges: 1) recognizing the value of formal CIA, and 2) feeling confident in the answers provided in a CIA. Furthermore, one developer claims that the most difficult artifacts to assess for potential impact are the individual requirements rather than the source code. The interviews thus suggest that our work addresses important challenges. First, using ImpRec could motivate developers to conduct high-quality CIA, as the information will be reused to kick-start CIAs in the future [10]. Second, ImpRec can be used to verify a manual CIA as a way to boost confidence. Finally, ImpRec supports recommending potentially impacted requirements, as well as other non-code software artifacts.

Since ImpRec was deployed in Case Sweden, the collected log files reveal that the four developers have used ImpRec to search for impacted artifacts 23 times. The case study has so far been running for 11 weeks, i.e., 44 man-weeks, showing that the developers on average conducted about one CIA every second week (assuming they followed our instructions).
While this number is lower than what we hoped for, it is in line with the expectations expressed by the developers during the interviews. During this first part of the case study, ImpRec has recommended information considered valuable to the developers in 37.5% of the use cases. As the data collected from the tool are rich, containing both ranking output and time stamps, we plan to perform a deeper analysis when user data has been collected also from Case India.

5. CONCLUSION
Software developers in large projects must navigate complex information spaces as part of their everyday work. One activity that forces developers to locate relevant information is issue management, i.e., resolving incoming issue reports. While the volume of issue reports in a project can be daunting, we suggest harnessing the intrinsic navigational value of historical issue reports.

IA is an early manual step in issue management. We train an ensemble learner on previous issues, and use the identified patterns to automate team assignment. Our results are in line with the current manual process, i.e., our automated solution selects the correct team for about 50-80% of the issue reports. Thus, the accuracy of automated team assignment is not higher, but it is much faster, as the system can suggest team assignments for newly submitted issue reports in an instant.

CIA requires a high degree of information seeking, both in the source code and the project documentation. Analyzing impact prior to changing source code is mandated by development processes in safety-critical domains, but it is a tedious and error-prone activity. We present a recommendation system that supports CIA for changes caused by correcting reported issues. The system, ImpRec, builds a knowledge base by extracting previously reported impact originating from the corrective fixes of old issue reports.
An industrial case study is ongoing, but initial results from an in-vitro evaluation indicate that more than 40% of the true impact is identified among the first 10 recommendations.

Future work includes completing the industrial case study on CIA at an industrial partner in Sweden and conducting post-mortem interviews. Before that, however, we plan to initiate a replication of the case study in India, to enable additional feedback from another set of developers. Finally, we hope to explore how issue reports can be used to provide decision support for other navigational challenges in the software engineering context, such as fault localization and analysis of feedback from end-users.

Acknowledgments
This work was funded by the Industrial Excellence Center EASE – Embedded Applications Software Engineering (http://ease.cs.lth.se).

6. REFERENCES
[1] M. Alenezi, K. Magel, and S. Banitaan. Efficient bug triaging using text mining. Journal of Software, 8(9), 2013.
[2] J. Anvik, L. Hiew, and G. Murphy. Coping with an open bug repository. In Proc. of the 2005 OOPSLA Workshop on Eclipse Technology eXchange, pages 35–39, 2005.
[3] J. Anvik, L. Hiew, and G. C. Murphy. Who should fix this bug? In Proc. of the 28th International Conference on Software Engineering, pages 361–370, 2006.
[4] J. Anvik and G. Murphy. Reducing the effort of bug report triage: Recommenders for development-oriented decisions. ACM Trans. Softw. Eng. Methodol., 20(3), 2011.
[5] M. Banko and E. Brill. Scaling to very very large corpora for natural language disambiguation. In Proc. of the 39th Annual Meeting on Association for Computational Linguistics, pages 26–33, 2001.
[6] O. Baysal, M. Godfrey, and R. Cohen. A bug you like: A framework for automated assignment of bugs. In Proc. of the 17th International Conference on Program Comprehension, pages 297–298, 2009.
[7] N. Bettenburg, R. Premraj, T. Zimmermann, and K. Sunghun. Duplicate bug reports considered harmful... really? In Proc.
of the International Conference on Software Maintenance, pages 337–345, 2008.
[8] P. Bhattacharya, I. Neamtiu, and C. R. Shelton. Automated, highly-accurate, bug assignment using machine learning and tossing graphs. Journal of Systems and Software, 85(10):2275–2292, 2012.
[9] E. Bjarnason, P. Runeson, M. Borg, M. Unterkalmsteiner, E. Engstrom, B. Regnell, G. Sabaliauskaite, A. Loconsole, T. Gorschek, and R. Feldt. Challenges and practices in aligning requirements with verification and validation: A case study of six companies. Empirical Software Engineering, (To appear).
[10] M. Borg, O. Gotel, and K. Wnuk. Enabling traceability reuse for impact analyses: A feasibility study in a safety context. In Proc. of the 7th International Workshop on Traceability in Emerging Forms of Software Engineering, pages 72–78, 2013.
[11] M. Borg and P. Runeson. IR in software traceability: From a bird's eye view. In Proc. of the International Symposium on Empirical Software Engineering and Measurement, pages 243–246, 2013.
[12] M. Borg and P. Runeson. Changes, evolution and bugs - Recommendation systems for issue management. In M. Robillard, W. Maalej, R. Walker, and T. Zimmermann, editors, Recommendation Systems in Software Engineering, pages 477–509. Springer, 2014.
[13] G. Canfora and L. Cerulo. Impact analysis by mining software and change request repositories. In Proc. of the 11th International Symposium on Software Metrics, pages 9–29, 2005.
[14] J. Cleland-Huang, C. K. Chang, and M. Christensen. Event-based traceability for managing evolutionary change. Transactions on Software Engineering, 29(9):796–810, 2003.
[15] D. Cubranic, G. Murphy, J. Singer, and K. Booth. Hipikat: A project memory for software development. Transactions on Software Engineering, 31(6):446–465, 2005.
[16] M. Eppler and J. Mengis. The concept of information overload: A review of literature from organization science, accounting, marketing, MIS, and related disciplines.
The Information Society, 20(5):325–344, 2004.
[17] R. Feldt. Do system test cases grow old? In Proc. of the 7th International Conference on Software Testing, (To appear), 2014.
[18] International Electrotechnical Commission. IEC 61508 ed 1.0, Electrical/electronic/programmable electronic safety-related systems, 2010.
[19] G. Jeong, S. Kim, and T. Zimmermann. Improving bug triage with bug tossing graphs. In Proc. of the 7th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, pages 111–120, 2009.
[20] L. Jonsson, M. Borg, D. Broman, K. Sandahl, S. Eldh, and P. Runeson. Automated bug assignment: Ensemble-based machine learning in large scale industrial contexts. Empirical Software Engineering, (Submitted), 2014.
[21] L. Jonsson, D. Broman, K. Sandahl, and S. Eldh. Towards automated anomaly report assignment in large complex systems using stacked generalization. In Proc. of the 5th International Conference on Software Testing, Verification and Validation, pages 437–446, 2012.
[22] S. Just, R. Premraj, and T. Zimmermann. Towards the next generation of bug tracking systems. In Proc. of the 2008 Symposium on Visual Languages and Human-Centric Computing, pages 82–85, 2008.
[23] P. Karr-Wisniewski and Y. Lu. When more is too much: Operationalizing technology overload and exploring its impact on knowledge worker productivity. Computers in Human Behavior, 26(5):1061–1072, 2010.
[24] L. Olofsson and P. Gullin. Development of a Decision Support System for Defect Reports. Master thesis, Dept. of Computer Science, Lund University, 2014.
[25] B. Regnell, R. Berntsson-Svensson, and K. Wnuk. Can we beat the complexity of very large-scale requirements engineering? In Proc. of REFSQ, pages 123–128, 2008.
[26] M. Robillard, R. Walker, and T. Zimmermann. Recommendation systems for software engineering. IEEE Software, 27(4):80–86, 2010.
[27] J. Sill, G. Takacs, L. Mackey, and D. Lin.
Feature-weighted linear stacking. CoRR, 2009.