
Mining a Software Developer’s Local Interaction History

Kevin A. Schneider, Carl Gutwin, Reagan Penner and David Paquette


Department of Computer Science, University of Saskatchewan
57 Campus Drive, Saskatoon, SK S7N 5A9 Canada
{kas, gutwin, rpenner}@cs.usask.ca, dnp972@mail.usask.ca

Abstract

Although shared software repositories are commonly used during software development, it is typical that a software developer browses and edits a local snapshot of the software under development. Developers periodically check their changes into the software repository; however, their interaction with the local copy is not recorded. Local interaction histories are a valuable source of information and should be considered when mining software repositories.
In this paper we discuss the benefits of analyzing local interaction histories and present a technique and prototype implementation for their capture and analysis. As well, we discuss the implications of local interaction histories and the infrastructure of software repositories.

1. Introduction

We are interested in mining the local interaction histories of a software development team to help coordinate their activities and to coordinate the change and use of project artifacts.
A software developer’s interaction with a software repository includes editing source code but also involves actions to browse or locate source code. We are interested in recording and analysing this interaction, which we refer to as the developer’s local interaction history. Our principal motivation is to use this information to support awareness in team-based software development.
Developers normally change a local copy of the software under development. Periodically, the developer will synchronize their changes with the shared software repository. Although a portion of the developers’ interaction with the local software artifacts may be recorded for the purpose of undoing changes and for recovering from previously saved versions, the interaction is not recorded in the shared repository and is incomplete when considering awareness support.
In our approach, as a developer changes software artifacts, the different versions are recorded in a shared ‘shadow’ repository and analysed with respect to the structure of the software. Hierarchical containment of language entities (the structure of the software) is modeled separately so that we can track changes across the language entities. For example, we can track changes to a method across classes and packages. We use this strategy to monitor API (application programming interface) change and usage.
Mining local interaction histories has a number of potential applications, including:
• Coordinating team member activities. Monitoring changes to an API and monitoring API usage may be useful in supporting team awareness during software development. (This is the focus of this paper and of our current prototype implementation.)
• Identifying refactoring patterns. Analysing local interaction histories may be useful for identifying novel refactoring patterns and coordinating refactorings that affect other team members.
• Coordinating multiple file undos. Tracking changes with respect to the structure of a software system may provide software development guidance when undoing a set of changes.
• Identifying browsing patterns. Local interaction history includes the developer’s searching, browsing and file access activities. Analysing this browsing interaction may be useful in helping a developer locate technical expertise or exemplars.
• Project management. Recording the changes a developer makes to software with respect to communication logs or project plans may prove to be fruitful for organizing and managing a software project.
The next section discusses background and related work, focusing on coordination and communication issues in software development. Subsequent sections describe our approach and prototype. The implications of mining local interaction histories and the infrastructure of software repositories are discussed, together with our future research directions, in the paper’s conclusion.
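One way to picture modeling containment separately from the entities themselves is the minimal Python sketch below. It assumes that each language entity has a stable identifier and that containment is kept in a separate parent map; the class `StructureModel` and its methods are hypothetical illustrations, not part of ProjectWatcher. Because edits are recorded against the entity rather than against its location, an entity’s edit history survives a move and can be rolled up to whichever class or package currently contains it.

```python
from collections import Counter


class StructureModel:
    """Hypothetical sketch: edits are keyed by entity; containment is a separate map."""

    def __init__(self):
        self.parent = {}        # entity id -> enclosing entity id (None for a top-level package)
        self.edits = Counter()  # entity id -> number of recorded edits

    def add(self, entity, parent=None):
        self.parent[entity] = parent

    def move(self, entity, new_parent):
        # Only the containment relation changes; the entity keeps its edit history.
        self.parent[entity] = new_parent

    def record_edit(self, entity):
        self.edits[entity] += 1

    def ancestors(self, entity):
        """Yield the enclosing entities of `entity`, innermost first."""
        node = self.parent[entity]
        while node is not None:
            yield node
            node = self.parent[node]

    def changes_within(self, container):
        """Total edits to `container` and to everything it currently contains."""
        return sum(count for entity, count in self.edits.items()
                   if entity == container or container in self.ancestors(entity))
```

For example, two edits to a method `refresh` inside class `View` in package `pkg.ui` are counted by `changes_within("pkg.ui")`; after `move("refresh", "Model")` the same history, plus any new edits, is counted under `Model`’s package instead.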
2. Background and Related Work

Collaborative software development presents difficult coordination and communication problems, particularly when teams are geographically distributed [6, 8, 10, 12, 13]. Even though projects can be organized to make individual developers partly independent of one another, dependencies cannot be totally removed [10]. As a result, there are often situations where team members duplicate work, overwrite changes, make incorrect assumptions about another person’s intentions, or write code that adversely affects another part of the project.
These problems often occur because of a lack of awareness about what is happening in other parts of the project. Unfortunately, current development tools and environments do not make it easy to maintain awareness of others’ activities [1]. Awareness is a design concept that holds promise for significantly improving the usability of collaborative software development tools.

2.1. Collaboration in Software Development

Collaboration support has always been a part of distributed development – teams have long used version control, email, chat groups, reviews, and internal documentation to coordinate activities and give and gather information – but these solutions generally either represent the project at a very coarse granularity (e.g. CVS [3]), require considerable time and effort (e.g. reading documentation), or depend on people’s current availability (e.g. IRC).
Researchers in software engineering and CSCW have found a number of problems that still occur in group projects and distributed software development. They found that it is difficult to: determine when two people are making changes to the same artifacts [10]; communicate with others across time zones and work schedules [6]; find partners for closer collaboration or assistance on particular issues [12]; determine who has expertise or knowledge about the different parts of the project [13]; and benefit from the opportunistic and unplanned contact that occurs when developers are colocated [8]. As Herbsleb and Grinter [8] state, lack of awareness – “the inability to share at the same environment and to see what is happening at the other site” (p. 67) – is one of the major factors in these problems.

2.2. Group Awareness

In any group work situation, awareness of others provides information that is critical for smooth and effective collaboration. This is group awareness: the understanding of who is working with you, what they are doing, and how your own actions interact with theirs [7]. Group awareness is useful for many of the activities of collaboration – for coordinating actions, managing coupling, discussing tasks, anticipating others’ actions, and finding help.
In a software project, knowledge of others’ activities, both past and present, has obvious value for project management, but developers also use the information for many other purposes that assist the overall cohesion and effectiveness of the team. For example, knowing the specific files and objects that another person has been working on can give a good indication of their higher-level tasks and intentions; knowing who has worked most often or most recently on a particular file indicates who to talk to before starting further changes; and knowing who is currently active can provide opportunities for real-time assistance and collaboration.
On software projects, awareness information is currently difficult to obtain from development environments: although some of the facts exist (e.g. from CVS logs), there are currently no low-effort means for gathering them. A few research systems do show awareness information (particularly TUKAN [12] and Palantír [11]), but little support exists in more widespread environments.

3. Project Watcher

ProjectWatcher is a prototype system that gathers information about project artifacts and developers’ actions with those artifacts, and that visualizes this awareness information in the Eclipse [5] development environment (Figure 1).
ProjectWatcher consists of two main parts – the mining component and the visualization plugins.

Figure 1. ProjectWatcher in the Eclipse IDE; visualizations are at lower left and upper right.
The mining component analyzes the source code of a project to produce facts for use by the ProjectWatcher visualization plugin. The mining component gathers information on the structure of the project and also on the current and historical activity of the project team members (Figure 2).
To be able to gather developer activity information, a shadow CVS repository of the project is maintained (Figure 2). User edits are auto-committed to the shadow repository as developers edit source code files. Although Eclipse provides a local history of changes, we require that the changes be available to other developers in the software development team, and publishing them in the shadow repository gives us that facility. As well, we are able to record actions along with changes to software artifacts, and we are able to commit changes at different time intervals.
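The auto-commit behaviour can be sketched in a few lines of Python. This is a hypothetical in-memory stand-in for the shadow CVS repository, not ProjectWatcher’s implementation: each auto-commit stores a whole-file version together with the developer, the action that triggered it, and a caller-supplied timestamp. The `min_interval` parameter is an assumption showing one way to commit changes at different time intervals, and the action labels are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class Version:
    number: int
    developer: str
    action: str     # e.g. "edit" or "save" (hypothetical action labels)
    time: float     # seconds; supplied by the caller
    content: str    # full file content at this version


@dataclass
class ShadowRepository:
    """Hypothetical in-memory stand-in for a shadow CVS repository."""
    min_interval: float = 0.0                    # assumed commit-throttling parameter
    history: dict = field(default_factory=dict)  # path -> list of Version

    def auto_commit(self, developer, path, content, action, time):
        """Record a new version of `path`; return it, or None if skipped."""
        versions = self.history.setdefault(path, [])
        if versions:
            last = versions[-1]
            if last.content == content:
                return None  # nothing changed since the last version
            if time - last.time < self.min_interval:
                return None  # too soon after the last version of this file
        version = Version(len(versions) + 1, developer, action, time, content)
        versions.append(version)
        return version
```

Because each version holds the full file content, a change skipped for arriving too soon is still captured by the next successful auto-commit of that file, if one occurs; a shorter or longer `min_interval` trades history granularity against repository size.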
Figure 2. Capturing User Edits. A shadow software repository is used to record the activities of a software developer.

The mining component analyzes the shadow CVS repository to obtain facts about who has been editing the class methods and when. A version of a file is created each time it is auto-committed to the shadow repository. The mining component analyses the differences between versions to track API usage and API change. The relationships that are currently extracted include method calls, imports, implements, and inheritance; the user edit facts that are extracted include edits and API usage.
The mining component is implemented in two stages and may be run either on the developer’s shadow software repository or on the shared software repository (Figure 3). Stage one uniquely names all entities in the project while extracting the entity and relationship facts. This process is accomplished with a TXL program using syntactic pattern matching [2, 4]. At this point, the method call facts are not uniquely identified, since we do not have sufficient information to identify which package or class the called method belongs to. This resolution is accomplished by stage two, the method call resolver.

Figure 3. Mining User Edits. In a two-stage process, package, class and method scope facts are extracted from the project source code and combined with Java API facts.

The method call resolver extracts facts from the project source code and integrates them with the facts extracted in stage one. Next, the method call facts are analyzed to determine which package and class the called method belongs to. This process involves resolving the types of variables and the return types of methods that are passed as arguments to method calls. The types of all the arguments are identified, and then scope, package, class, and method facts are analyzed to determine which package and class the method belongs to. To resolve calls to the Java library, the full Java API is first processed by the ProjectWatcher mining component (this is only done once for all projects). Not all calls can be resolved; however, for our purposes the accuracy of the method call resolver is adequate.
The complete factbase contains uniquely identified facts indicating all packages, classes, methods, variables, and relationships for a Java project and all user edits. These facts are used by the visualization plugin to show activity and proximity information. The time and space required for fact extraction and factbase storage depend on the size of the code. For example, ProjectWatcher has been tailored for Java, and mining the Java Development Kit 1.4.1 results in 202 package facts, 5,530 class facts, 47,962 method facts, and 106,926 call facts.

4. Awareness Visualization

4.1. Activity Awareness

ProjectWatcher visualizes team members’ past and current activities on project artifacts. The visualization uses the ideas of interaction history [9] and overviews: the interaction history is a record of the actions that a person undertakes with a project artifact (gathered unobtrusively by the mining component as people carry out their normal tasks); the overview representation is a compact display of all the project artifacts that can be overlaid with visual information about the interaction history. Although some tools such as CVS front-ends do have limited visualization (e.g. by colour on the project tree), our goal here is to collect much more information about interaction, and provide much richer visualizations that will allow team members to gather more detailed awareness information.
ProjectWatcher plugins use the extracted factbase to create a visual model of what each developer is doing in that project space. In the overview plugin (Figure 4), project artifacts are shown in a simple stacked fashion that displays packages, files, classes, and methods. Artifacts are always stacked by creation date, so that users can learn their location in the overview over time. On this basic (but space-saving) representation, we overlay awareness information. First, each developer is assigned a unique colour, and this colour can be added to the blocks in the overview based on a set of filters. Common filters include who has modified artifacts most recently, or who has modified them most often. Second, we show a summary of the activity history for each artifact with a small bar graph drawn inside the object’s rectangle; bars represent the amount of change to the class since its creation. Finally, more information about an artifact can be obtained by holding the cursor over a rectangle: for example, the name of the class and a more detailed bar graph, along with details about the state of the class compared to the CVS repository.

Figure 4. Project overview plugin showing packages (grey bars) and classes within each package (coloured blocks). Colour indicates who edited the class most recently. Black marks inside class blocks chart edits since project start.

4.2. Proximity Awareness

Following on from a basic understanding of others’ activities is the question of proximity – that is, “who is working near to me?” in terms of the structures and dependencies of the software system under development.
The notion of distance to another person in this dependency space has not been studied extensively, although it has been explored previously in Schümmer’s TUKAN [12]. We have developed a visualization tool (Figure 5) that makes it easier to see proximity-based groups. The visualization is based on dependency graphs derived from the extracted factbase and from the fine-grained recording of interaction history. Once actions are mapped to the dependency structure, the graph is presented in visual form with people’s locations and proximities made explicit.

Figure 5. ProjectWatcher graph view showing packages, classes, methods, and call dependencies. When the user holds the cursor over a class, the dependencies for individual methods appear. Graph nodes are coloured (by developer colour) according to recency of edit.
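No particular distance measure is prescribed above, so the following Python sketch only illustrates one plausible reading of “who is working near to me?”: each developer is placed at the artifacts they have edited, and the distance between two developers is the smallest number of dependency hops between their artifacts, found with a multi-source breadth-first search. The hop-count metric, the `max_hops` cut-off, treating dependencies as undirected, and all function names are assumptions for illustration.

```python
from collections import deque


def undirected(edges):
    """Build an undirected adjacency map from (artifact, artifact) dependency pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def hop_distances(graph, sources):
    """Multi-source BFS: fewest hops from any source artifact to each reachable artifact."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def nearby_developers(graph, edits, me, max_hops=2):
    """Return (distance, developer) pairs, closest first, for developers within max_hops."""
    dist = hop_distances(graph, edits[me])
    ranking = []
    for developer, artifacts in edits.items():
        if developer == me:
            continue
        reachable = [dist[a] for a in artifacts if a in dist]
        if reachable and min(reachable) <= max_hops:
            ranking.append((min(reachable), developer))
    return sorted(ranking)
```

For example, with the chain `ui.View – model.Order – model.Customer – db.Store – db.Pool`, a developer who edited `ui.View` would see a colleague editing `model.Customer` at distance 2, while someone working only on `db.Pool` (4 hops away) falls outside the default cut-off.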


5. Conclusion

We have presented a system for mining local interaction histories to help address some of the awareness problems experienced in distributed software development projects. The system observes a software developer’s activities in a software development environment and records those actions in relation to the artifact-based dependencies extracted from source code. Visualization plugins represent this information for developers to see and interact with. Although our prototypes have limitations (particularly in terms of project size), they can provide developers with much-needed information about who is working on the project, what they are doing, and how closely linked two developers are.
Our experience suggests a number of directions for mining software repository research, including:
• Content. Research on awareness often monitors a software development team’s interaction with a shared software repository. Unfortunately, the granularity of check-in and check-out is usually too coarse to adequately monitor change. This suggests that the content of shared software repositories should also include local interaction histories.
• Rapid incremental processing. For our purposes it is important that the computation of source facts and their resolution be relatively efficient to support interactive visualizations.
• Robustness. Our analysis may process source that is currently being edited, and so the source may not be well-formed. We require that fact extraction and resolution support analysis under ongoing change.
Our future plans with the system involve both improvements and new directions. With the current system, we plan to continue refining our representations and filters to determine how the information can be best presented to developers. We currently visualize source code that is in the process of being edited, and therefore the source code may be inconsistent, incomplete and frequently updated. We are investigating techniques for improving the robustness and performance of the mining component and for visualizing partial information under these circumstances.
Longer range plans involve extensions to the basic ideas of project artifacts and interaction histories. We plan to extend our artifact collection to include entities other than those in source code. Many other project artifacts exist, including communication logs, bug reports and task lists. We hope to establish additional facts to model these artifacts and to use the new artifacts and their relationships in the awareness visualizations.
We can also extend our use of the interaction histories to other areas. For example, recording developers’ interaction history and extracting method call facts from the source code provides us with basic API usage information. We can present this information in a future plugin to provide awareness of technology expertise. A developer wishing to know how to use a particular Java API feature may be presented with a list of developers who have used the feature frequently or recently. Alternatively, the visualization plugin may present this information overlaid on the project’s dependency structure.

Acknowledgment

The authors would like to thank IBM Corporation for supporting this research.

References

[1] M. C. Chu-Carroll and S. Sprenkle. Coven: brewing better collaboration through software configuration management. In Proceedings of the 8th ACM SIGSOFT International Symposium on Foundations of Software Engineering, pages 88–97. ACM Press, 2000.
[2] J. R. Cordy, T. R. Dean, A. Malton, and K. A. Schneider. Source transformation in software engineering using the TXL transformation system. Journal of Information and Software Technology, 44(13):827–837, October 2002.
[3] CVS. Concurrent Versions System. Available online at http://www.cvshome.org/.
[4] T. R. Dean, J. R. Cordy, K. A. Schneider, and A. Malton. Using design recovery techniques to transform legacy systems. In ICSM, pages 622–631, 2001.
[5] Eclipse. Available online at http://www.eclipse.org/.
[6] R. E. Grinter, J. D. Herbsleb, and D. E. Perry. The geography of coordination: dealing with distance in R&D work. In Proceedings of the International ACM SIGGROUP Conference on Supporting Group Work, pages 306–315, 1999.
[7] C. Gutwin and S. Greenberg. A descriptive framework of workspace awareness for real-time groupware. Computer Supported Cooperative Work, 11(3):411–446, 2002.
[8] J. D. Herbsleb and R. E. Grinter. Architectures, coordination, and distance: Conway’s law and beyond. IEEE Software, pages 63–70, 1999.
[9] W. C. Hill, J. D. Hollan, D. Wroblewski, and T. McCandless. Edit wear and read wear. In Proceedings of CHI’92, pages 3–9. ACM Press, 1992.
[10] R. E. Kraut and L. A. Streeter. Coordination in software development. Communications of the ACM, 38(3):69–81, 1995.
[11] A. Sarma, Z. Noroozi, and A. van der Hoek. Palantír: raising awareness among configuration management workspaces. In Proceedings of ICSE 2003, pages 444–454, 2003.
[12] T. Schümmer. Lost and found in software space. In Proceedings of the 34th HICSS, 2001.
[13] B. Zimmermann and A. M. Selvin. A framework for assessing group memory approaches for software design projects. In Proceedings of the Conference on Designing Interactive Systems, pages 417–426. ACM Press, 1997.
