This document discusses mining a software developer's local interaction history to help coordinate team activities and changes made to project artifacts. It presents an approach and prototype implementation to capture a developer's local edits, track changes to code structure over time, and analyze browsing patterns. This local interaction data could support team awareness, identify refactoring patterns, coordinate file undos, and provide insights for project management.
Copyright: Attribution Non-Commercial (BY-NC)
Mining a Software Developer’s Local Interaction History
Kevin A. Schneider, Carl Gutwin, Reagan Penner and David Paquette
Department of Computer Science, University of Saskatchewan 57 Campus Drive, Saskatoon, SK S7N 5A9 Canada {kas, gutwin, rpenner}@cs.usask.ca, dnp972@mail.usask.ca
Abstract

Although shared software repositories are commonly used during software development, a software developer typically browses and edits a local snapshot of the software under development. Developers periodically check their changes into the software repository; however, their interaction with the local copy is not recorded. Local interaction histories are a valuable source of information and should be considered when mining software repositories. In this paper we discuss the benefits of analyzing local interaction histories and present a technique and prototype implementation for their capture and analysis. As well, we discuss the implications of local interaction histories and the infrastructure of software repositories.

1. Introduction

We are interested in mining local interaction histories of a software development team to help coordinate their activities and to coordinate the change and use of project artifacts.

A software developer's interaction with a software repository includes editing source code but also involves actions to browse or locate source code. We are interested in recording and analysing this interaction, which we refer to as the developer's local interaction history. Our principal motivation is to use this information to support awareness in team-based software development.

Developers normally change a local copy of the software under development. Periodically, the developer will synchronize their changes with the shared software repository. Although a portion of the developers' interaction with the local software artifacts may be recorded for the purpose of undoing changes and for recovering from previously saved versions, the interaction is not recorded in the shared repository and is incomplete when considering awareness support.

In our approach, as a developer changes software artifacts the different versions are recorded in a shared 'shadow' repository and analysed with respect to the structure of the software. Hierarchical containment of language entities (the structure of the software) is modeled separately so that we can track changes across the language entities. For example, we can track changes to a method across classes and packages. We use this strategy to monitor API (application programming interface) change and usage.

Mining local interaction histories has a number of potential applications, including:

• Coordinating team member activities. Monitoring changes to an API and monitoring API usage may be useful in supporting team awareness during software development. (This is the focus of this paper and of our current prototype implementation.)

• Identifying refactoring patterns. Analysing local interaction histories may be useful for identifying novel refactoring patterns and coordinating refactorings that affect other team members.

• Coordinating multiple file undos. Tracking changes with respect to the structure of a software system may provide software development guidance when undoing a set of changes.

• Identifying browsing patterns. Local interaction history includes the developer's searching, browsing and file access activities. Analysing this browsing interaction may be useful in helping a developer locate technical expertise or exemplars.

• Project management. Recording the changes a developer makes to software with respect to communication logs or project plans may prove fruitful for organizing and managing a software project.

The next section discusses background and related work, focusing on coordination and communication issues in software development. Subsequent sections describe our approach and prototype. The implications of mining local interaction histories and the infrastructure of software repositories are discussed, along with our future research directions, in the paper's conclusion.

2. Background and Related Work

Collaborative software development presents difficult coordination and communication problems, particularly when teams are geographically distributed [6, 8, 10, 12, 13]. Even though projects can be organized to make individual developers partly independent of one another, dependencies cannot be totally removed [10]. As a result, there are often situations where team members duplicate work, overwrite changes, make incorrect assumptions about another person's intentions, or write code that adversely affects another part of the project.

These problems often occur because of a lack of awareness about what is happening in other parts of the project. Unfortunately, current development tools and environments do not make it easy to maintain awareness of others' activities [1]. Awareness is a design concept that holds promise for significantly improving the usability of collaborative software development tools.

2.1. Collaboration in Software Development

Collaboration support has always been a part of distributed development: teams have long used version control, email, chat groups, reviews, and internal documentation to coordinate activities and to give and gather information. These solutions, however, generally either represent the project at a very coarse granularity (e.g. CVS [3]), require considerable time and effort (e.g. reading documentation), or depend on people's current availability (e.g. IRC).

Researchers in software engineering and CSCW have found a number of problems that still occur in group projects and distributed software development. They found that it is difficult to: determine when two people are making changes to the same artifacts [10]; communicate with others across timezones and work schedules [6]; find partners for closer collaboration or assistance on particular issues [12]; determine who has expertise or knowledge about the different parts of the project [13]; and benefit from the opportunistic and unplanned contact that occurs when developers are colocated [8]. As Herbsleb and Grinter [8] state, lack of awareness, "the inability to share at the same environment and to see what is happening at the other site" (p. 67), is one of the major factors in these problems.

2.2. Group Awareness

In any group work situation, awareness of others provides information that is critical for smooth and effective collaboration. This is group awareness: the understanding of who is working with you, what they are doing, and how your own actions interact with theirs [7]. Group awareness is useful for many of the activities of collaboration: coordinating actions, managing coupling, discussing tasks, anticipating others' actions, and finding help.

In a software project, knowledge of others' activities, both past and present, has obvious value for project management, but developers also use the information for many other purposes that assist the overall cohesion and effectiveness of the team. For example, knowing the specific files and objects that another person has been working on can give a good indication of their higher-level tasks and intentions; knowing who has worked most often or most recently on a particular file indicates who to talk to before starting further changes; and knowing who is currently active can provide opportunities for real-time assistance and collaboration.

On software projects, awareness information is currently difficult to obtain from development environments: although some of the facts exist (e.g. in CVS logs), there are currently no low-effort means for gathering them. A few research systems do show awareness information (particularly TUKAN [12] and Palantír [11]), but little support exists in more widespread environments.

3. ProjectWatcher

ProjectWatcher is a prototype system that gathers information about project artifacts and developers' actions with those artifacts, and that visualizes this awareness information in the Eclipse [5] development environment (Figure 1). ProjectWatcher consists of two main parts: the mining component and the visualization plugins.

Figure 1. ProjectWatcher in Eclipse. Visualizations are at lower left and upper right.

The mining component analyzes the source code of a project to produce facts for use by the ProjectWatcher visualization plugin. The mining component gathers information on the structure of the project and also on the current and historical activity of the project team members.

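The two kinds of information the mining component gathers (project structure and team activity) can be pictured as simple fact records. The following is only a minimal sketch under stated assumptions: the record types `Contains` and `Edit`, the `package.Class#method` naming scheme, and the helper functions are hypothetical illustrations, not ProjectWatcher's actual factbase format.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Contains:
    """Structural fact (hypothetical): parent entity directly contains child."""
    parent: str
    child: str


@dataclass(frozen=True)
class Edit:
    """Activity fact (hypothetical): a developer changed an entity at a time."""
    developer: str
    entity: str
    when: datetime


def qualified_name(package, cls, method=None):
    """Build a unique entity name; the naming scheme is an assumption."""
    name = f"{package}.{cls}"
    return f"{name}#{method}" if method else name


def members(parent, structure):
    """Return the entities directly contained in `parent`, sorted by name."""
    return sorted(f.child for f in structure if f.parent == parent)


def editors_of(entity, edits):
    """Return (developer, when) pairs for one entity, oldest edit first."""
    hits = [(e.developer, e.when) for e in edits if e.entity == entity]
    return sorted(hits, key=lambda pair: pair[1])
```

Keeping containment facts apart from edit facts means a question such as "who has edited this method, and when?" can be answered from the activity facts alone, while the structural facts stay free to change as entities move between classes and packages.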
To be able to gather developer activity information, a shadow CVS repository of the project is maintained (Figure 2). User edits are auto-committed to the shadow repository as developers edit source code files. Although Eclipse provides a local history of changes, we require that the changes be available to the other developers in the software development team, and publishing them in the shadow repository gives us that facility. As well, we are able to record actions along with changes to software artifacts, and we are able to commit changes at different time intervals.

Figure 2. Capturing User Edits. A shadow software repository is used to record the activities of a software developer.

The user edits mining component analyzes the shadow CVS repository to obtain facts about who has been editing the class methods and when. A version of a file is created each time it is auto-committed to the shadow repository. The mining component analyses the differences between versions to track API usage and API change.

The mining component is implemented in two stages and may be run either on the shadow software repository or on the shared software repository (Figure 3). Stage one uniquely names all entities in the project while extracting the entity and relationship facts. This process is accomplished with a TXL program using syntactic pattern matching [2, 4]. At this point, the method call facts are not uniquely identified, since we do not have sufficient information to identify which package or class the method being called belongs to. This resolution is accomplished by stage two, the method call resolver.

Figure 3. Mining User Edits. In a two-stage process, scope, package, class and method facts are extracted and combined with Java API facts. The facts are used by the visualization component to convey API use and API change information.

The method call resolver extracts facts from the project source code and integrates them with the facts extracted in stage one. Next, the method call facts are analyzed to determine which package and class the called method belongs to. This process involves resolving the types of variables and the return types of methods that are passed as arguments to method calls. The types of all the arguments are identified, and then scope, package, class, and method facts are analyzed to determine which package and class the method belongs to. To resolve calls to the Java library, the full Java API is first processed by the ProjectWatcher mining component (this is only done once for all projects). Not all calls can be resolved; however, for our purposes the accuracy of the method call resolver is adequate.

The complete factbase contains uniquely identified facts indicating all packages, classes, methods, variables, and relationships for a Java project, together with all user edits. These facts are used by the visualization plugin to show activity and proximity information. The time and space required for fact extraction and factbase storage depend on the size of the code. For example, ProjectWatcher has been tailored for Java, and mining the Java Development Kit 1.4.1 results in 202 package facts, 5,530 class facts, 47,962 method facts, and 106,926 call facts.

4. Awareness Visualization

4.1. Activity Awareness

ProjectWatcher visualizes team members' past and current activities on project artifacts. The visualization uses the ideas of interaction history [9] and overviews: the interaction history is a record of the actions that a person undertakes with a project artifact (gathered unobtrusively by the mining component as people carry out their normal tasks); the overview representation is a compact display of all the project artifacts that can be overlaid with visual information about the interaction history. Although some tools such as CVS front-ends do have limited visualization (e.g. by colour on the project tree), our goal here is to collect much more information about interaction, and to provide much richer visualizations that will allow team members to gather more detailed awareness information.

ProjectWatcher plugins use the extracted factbase to create a visual model of what each developer is doing in that project space. In the overview plugin (Figure 4), project artifacts are shown in a simple stacked fashion that displays packages, files, classes, and methods. Artifacts are always stacked by creation date, so that their location in the overview can be learned by the user over time. On this basic (but space-saving) representation, we overlay awareness information. First, each developer is assigned a unique colour, and this colour can be added to the blocks in the overview based on a set of filters. Common filters include who has modified artifacts most recently, or who has modified them most often. Second, we show a summary of the activity history for each artifact with a small bar graph drawn inside the object's rectangle; bars represent the amount of change to the class since its creation. Finally, more information about an artifact can be obtained by holding the cursor over a rectangle: for example, the name of the class and a more detailed bar graph, along with details about the state of the class compared to the CVS repository.

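The two common filters named for the overview (most recent editor and most frequent editor) can be sketched as small aggregations over edit facts. This is an illustrative sketch only: the `(developer, artifact, timestamp)` tuple layout, the function names, and the colour table are assumptions made for the example, not the plugin's actual data model.

```python
from collections import Counter


def most_recent_editor(edits):
    """Map each artifact to the developer who edited it last.

    `edits` is an iterable of (developer, artifact, timestamp) tuples;
    this tuple layout is an assumption made for the sketch.
    """
    latest = {}
    for developer, artifact, ts in edits:
        if artifact not in latest or ts > latest[artifact][0]:
            latest[artifact] = (ts, developer)
    return {artifact: dev for artifact, (_, dev) in latest.items()}


def most_frequent_editor(edits):
    """Map each artifact to the developer with the most recorded edits."""
    counts = {}
    for developer, artifact, _ in edits:
        counts.setdefault(artifact, Counter())[developer] += 1
    return {artifact: c.most_common(1)[0][0] for artifact, c in counts.items()}


def colour_blocks(owner_by_artifact, colours):
    """Turn a filter result into the colour shown for each overview block."""
    return {artifact: colours[dev] for artifact, dev in owner_by_artifact.items()}
```

Each filter returns a plain artifact-to-developer map, so switching filters in the overview amounts to recomputing that map and re-applying the same developer colour table.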
4.2. Proximity Awareness
Following on from a basic understanding of others' activities is the question of proximity, that is, "who is working near to me?" in terms of the structures and dependencies of the software system under development. The notion of distance to another person has not been studied extensively, although it has been explored previously in Schümmer's TUKAN [12]. We have developed a visualization tool (Figure 5) that makes it easier to see proximity-based groups. Once actions are mapped to the dependency structure, the graph is presented in visual form with people's locations and proximities made explicit.

Figure 5. ProjectWatcher graph view showing packages, classes, methods, and call dependencies. When the user holds the cursor over a class, the dependencies for individual methods appear. Graph nodes are coloured (by developer colour) according to recency of edit.
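One way to make "who is working near to me?" concrete is to count dependency edges between the entities that two developers are editing. The paper does not define a distance metric, so the sketch below is an assumption throughout: it treats call dependencies as undirected, uses breadth-first search for the shortest path, and the function names and the `locations` mapping are hypothetical.

```python
from collections import deque


def build_graph(dependencies):
    """Build an undirected adjacency map from (caller, callee) pairs."""
    graph = {}
    for a, b in dependencies:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def distance(graph, source, target):
    """Fewest dependency edges between two entities, or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def proximity(graph, locations, me):
    """List (developer, distance) for every reachable colleague, nearest first.

    `locations` maps each developer to the entity they are editing.
    """
    result = []
    for dev, entity in locations.items():
        if dev == me:
            continue
        d = distance(graph, locations[me], entity)
        if d is not None:
            result.append((dev, d))
    return sorted(result, key=lambda pair: (pair[1], pair[0]))
```

Developers whose current entities have no dependency path to yours are simply omitted from the result, which corresponds to them not being "near" in the dependency structure at all.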
Figure 4. Project overview plugin showing packages (grey bars) and classes within each package (coloured blocks). Colour indicates who edited the class most recently. Black marks inside class blocks chart edits since project start.

5. Conclusion

We have presented a system for mining local interaction histories to help address some of the awareness problems experienced in distributed software development projects. The system observes a software developer's activities in a software development environment and records those actions in relation to the artifact-based dependencies extracted from source code. Visualization plugins represent this information for developers to see and interact with. Although our prototypes have limitations (particularly in terms of project size), they can provide developers with much-needed information about who is working on the project, what they are doing, and how closely linked two developers are.

Our experience suggests a number of directions for mining software repository research, including:

• Content. Research on awareness often monitors a software development team's interaction with a shared software repository. Unfortunately, the granularity of check-in and check-out is usually too coarse to adequately monitor change. This suggests that the content of shared software repositories should also include local interaction histories.

• Rapid incremental processing. For our purposes it is important that the computation of source facts and their resolution be relatively efficient to support interactive visualizations.

• Robustness. Our analysis may process source that is currently being edited, and so the source may not be well-formed. We require that fact extraction and resolution support analysis under ongoing change.

Our future plans with the system involve both improvements and new directions. With the current system, we plan to continue refining our representations and filters to determine how the information can be best presented to developers. We currently visualize source code that is in the process of being edited, and therefore the source code may be inconsistent, incomplete and frequently updated. We are investigating techniques for improving the robustness and performance of the mining component and for visualizing partial information given these circumstances.

Longer range plans involve extensions to the basic ideas of project artifacts and interaction histories. We plan to extend our artifact collection to include entities other than those in source code. Many other project artifacts exist, including communication logs, bug reports and task lists. We hope to establish additional facts to model these artifacts and to use the new artifacts and their relationships in the awareness visualizations.

We can also extend our use of the interaction histories to other areas. For example, recording developers' interaction history and extracting method call facts from the source code provides us with basic API usage information. We can present this information in a future plugin to provide awareness of technology expertise. A developer wishing to know how to use a particular Java API feature may be presented with a list of developers who have used the feature frequently or recently. Alternatively, the visualization plugin may present this information overlaid on the project's dependency structure.

Acknowledgment

The authors would like to thank IBM Corporation for supporting this research.

References

[1] M. C. Chu-Carroll and S. Sprenkle. Coven: brewing better collaboration through software configuration management. In Proceedings of the 8th ACM SIGSOFT international symposium on Foundations of software engineering, pages 88–97. ACM Press, 2000.
[2] J. R. Cordy, T. R. Dean, A. Malton, and K. A. Schneider. Source transformation in software engineering using the TXL transformation system. Journal of Information and Software Technology, 44(13):827–837, October 2002.
[3] CVS. Concurrent Versions System. Available online at http://www.cvshome.org/.
[4] T. R. Dean, J. R. Cordy, K. A. Schneider, and A. Malton. Using design recovery techniques to transform legacy systems. In ICSM, pages 622–631, 2001.
[5] Eclipse. Available online at http://www.eclipse.org/.
[6] R. E. Grinter, J. D. Herbsleb, and D. E. Perry. The geography of coordination: dealing with distance in R&D work. In Proceedings of the international ACM SIGGROUP conference on Supporting group work, pages 306–315, 1999.
[7] C. Gutwin and S. Greenberg. A descriptive framework of workspace awareness for real-time groupware. Computer Supported Cooperative Work, 11(3):411–446, 2002.
[8] J. D. Herbsleb and R. E. Grinter. Architectures, coordination, and distance: Conway's law and beyond. IEEE Software, pages 63–70, 1999.
[9] W. C. Hill, J. D. Hollan, D. Wroblewski, and T. McCandless. Edit wear and read wear. In Proceedings of CHI'92, pages 3–9. ACM Press, 1992.
[10] R. E. Kraut and L. A. Streeter. Coordination in software development. Communications of the ACM, 38(3):69–81, 1995.
[11] A. Sarma, Z. Noroozi, and A. van der Hoek. Palantír: raising awareness among configuration management workspaces. In Proceedings of ICSE 2003, pages 444–454, 2003.
[12] T. Schümmer. Lost and found in software space. In Proceedings of the 34th HICSS, 2001.
[13] B. Zimmermann and A. M. Selvin. A framework for assessing group memory approaches for software design projects. In Proceedings of the conference on Designing interactive systems, pages 417–426. ACM Press, 1997.