Introduction
Related Work
There is much research in the area of process mining [1]. People from different
research domains, such as software process engineering, software configuration
management, workflow management, and data mining, are interested in deriving
behavioural models from the audit trails of standard software.
The first application of process mining to the workflow domain was presented by
Agrawal et al. in 1998 [10]. The approach of Herbst and Karagiannis [11] uses
machine learning techniques for the acquisition and adaptation of workflow
models. The seminal work in the area of process mining was presented by
van der Aalst et al. [12,13]. In this work, the causality relations between
activities in logs are introduced and the α-mining algorithm for discovering
workflow models is defined. Research in the area of software process mining
started in the mid-1990s with new approaches to the grammar inference problem
proposed by Cook and Wolf [14]. Other work from the software domain is in the
area of mining software repositories [15]. Our approach [4] aims at combining
software process mining with mining from software repositories; it derives a
software process from the logs of software configuration management systems.
Another research area discussed in this paper is Petri net synthesis and the
theory of regions. The seminal paper in this area was written by Ehrenfeucht
and Rozenberg [5]. It answered a long-standing open question in Petri net
theory: how to obtain a Petri net model from a transition system. Further
research in this area came up with synthesis algorithms for elementary net
systems [7] and even proved some promising complexity results for bounded
place/transition systems [6].
First ideas of combining process mining and process synthesis were already
mentioned in the process mining domain [13,16]. In this paper, we take the next
step: we present an algorithm that enables us to use the Petri net synthesis
tool Petrify [9] for process mining.
In this section, we present the overall approach; it combines our mining
algorithms with Petri net synthesis algorithms in order to discover process
models from versioning logs of document management systems.
The overall scheme of this approach is presented in Fig. 1. It starts with a
versioning log as an input; by means of our activity mining algorithm, we derive
a set of activities from the log. Using the set of activities, we generate a
transition system. From the transition system, we derive a Petri net with the
help of the synthesis algorithm. In this paper, we briefly discuss our activity
mining algorithm; however, the focus of this paper is on the transition system
generation, the use of the synthesis algorithm and the process models that can
be obtained by it.
3.1
In this section, we deal with the versioning logs and present the transition
system generation algorithm.
Initial Input and Activity Mining. Here, we briefly discuss our activity mining
algorithm and the structure of the input it needs. This input consists of the
versioning logs of different document management systems, such as Software
Configuration Management (SCM) systems, Product Data Management (PDM) systems
and other configuration and version management systems.
An example of a versioning log is shown in Table 1. The log contains data on
the documents and the timestamps of their commits to the system, along with
data on users and log comments. The versioning log consists of execution logs
(in our example, they are separated by double lines), the structure of which
can be derived using additional information not discussed in this paper. These
execution logs contain information about the instances of the process. Our
small example was inspired by the software change process [17]; for this
process, there are different executions, in which different documents are
committed in a different order, starting with the "design" and finishing with
the "review". We group execution logs into clusters. A cluster is a set of
execution logs that contain identical sets of documents. For example, the first
two execution logs make up a cluster, because they both contain the "design",
"code", "testPlan" and "review" documents; the third execution log forms
another cluster.
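The clustering step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation; the function name `cluster_execution_logs` and the commit orders of the three example logs are assumptions made for illustration, chosen to agree with the document sets named in the text.

```python
from collections import defaultdict


def cluster_execution_logs(execution_logs):
    """Group execution logs that contain identical sets of documents.

    Returns a list of clusters; each cluster is a list of log indices.
    """
    clusters = defaultdict(list)
    for index, log in enumerate(execution_logs):
        # The commit order is ignored here: only the set of documents matters.
        clusters[frozenset(log)].append(index)
    return list(clusters.values())


# Hypothetical commit orders for the three execution logs of the example.
execution_logs = [
    ["design", "code", "testPlan", "review"],
    ["design", "testPlan", "code", "review"],
    ["design", "verificationResults", "code", "review"],
]

print(cluster_execution_logs(execution_logs))  # [[0, 1], [2]]
```

The first two logs fall into one cluster and the third into another, as in the example of the text.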
Table 1. Versioning log (execution logs are separated by double lines)

Date      Time   Author  Comment
01.01.05  14:30  de      status: initial
01.01.05  15:00  dev     status: generated
05.01.05  10:00  qa      status: initial
07.01.05  11:00  se      status: pending
========================================
01.02.05  11:00  de      status: initial
15.02.05  17:00  qa      status: initial
20.02.05  09:00  dev     status: generated
28.02.05  18:45  se      status: pending
========================================
01.02.05  11:00  de      status: initial
15.02.05  17:00  se      status: initial
20.02.05  09:00  dev     status: generated
28.02.05  18:45  se      status: pending

From the information about the execution logs and their clusters, the documents
and the order of their commits to the system, we derive a set of activities
with the help of the activity mining algorithm (for details, see [4]). The
resulting set is shown in Table 2. Since we have only information about the
documents, we adopt a document-oriented view on the activities: they are
defined by their input and output documents. The output documents are derived
from the logs straightforwardly; the challenge of activity mining is deriving
the inputs, because this information is not represented explicitly. The input
contains all the documents that precede the output document in all the
execution logs. For each activity, we have also shown the clusters from which
it was derived; i.e., "1" means the cluster with the first two execution logs,
and "2" is the cluster with the third one. For example, activity 1 has s0 as
input, design as output and can be derived from clusters 1 or 2.
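The rule for deriving inputs ("all the documents that precede the output document in all the execution logs") can be sketched for a single cluster as follows. This is only an illustration of that one sentence, not the activity mining algorithm of [4]; the function name `mine_inputs`, the artificial start document `s0` being prepended here, and the commit orders are assumptions.

```python
def mine_inputs(cluster_logs, start="s0"):
    """For one cluster, map each output document to its input documents.

    The input of a document is the set of documents that precede it
    in every execution log of the cluster.
    """
    inputs = {}
    for log in cluster_logs:
        seen = {start}
        for document in log:
            preceding = frozenset(seen)
            # Keep only the predecessors common to all logs seen so far.
            inputs[document] = (
                inputs[document] & preceding if document in inputs else preceding
            )
            seen.add(document)
    return inputs


# Hypothetical commit orders for the two execution logs of cluster 1.
cluster_1 = [
    ["design", "code", "testPlan", "review"],
    ["design", "testPlan", "code", "review"],
]

for output, input_documents in mine_inputs(cluster_1).items():
    print(sorted(input_documents), "->", output)
```

Under these assumed orders, both "code" and "testPlan" obtain the input {s0, design}, which is what allows them to occur in either order in the resulting model.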
In general, let us assume that there are $n$ clusters and that each cluster is
given a unique identifier from the set $C = \{1, \ldots, n\}$. For every subset
$cl \subseteq C$, there is a set $D_{cl}$, which contains the intersection of
the sets of documents that belong to each execution log of this $cl$:
$D_{cl} = \bigcap_{e \in cl} D_e$. So, each activity is a tuple $(I, O, cl)$,
where $cl$ is the set of clusters from which this activity was derived; $I$ and
$O$ are the sets of input and output documents, respectively. Formally, a set
of activities is defined in the following way:

\[ A \subseteq \{(I, O, cl) \mid I \subseteq D_{cl},\ O \subseteq D_{cl},\ cl \subseteq C\} \tag{1} \]

For each tuple, we define a "." notation, which gives the concrete field value
by its name. E.g., for activity $a_1 = (\{s0\}, \{design\}, \{1, 2\})$, we have
$a_1.I = \{s0\}$, $a_1.O = \{design\}$ and $a_1.cl = \{1, 2\}$.
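As an illustration of the tuple notation and of the set D_cl, a small Python sketch is given below. The names `Activity` and `common_documents` are hypothetical, and the sketch assumes that the document set of each cluster also contains the artificial documents s0 and e0.

```python
from typing import FrozenSet, NamedTuple


class Activity(NamedTuple):
    """An activity (I, O, cl) with field access in the "." notation."""
    I: FrozenSet[str]   # input documents
    O: FrozenSet[str]   # output documents
    cl: FrozenSet[int]  # clusters from which the activity was derived


def common_documents(cl, documents_of):
    """D_cl: the intersection of the document sets D_e for all e in cl.

    cl must be non-empty; documents_of maps a cluster id to its documents.
    """
    return frozenset.intersection(*(frozenset(documents_of[e]) for e in cl))


# Assumed document sets of the two clusters of the example.
documents_of = {
    1: {"s0", "design", "code", "testPlan", "review", "e0"},
    2: {"s0", "design", "code", "verificationResults", "review", "e0"},
}

a1 = Activity(frozenset({"s0"}), frozenset({"design"}), frozenset({1, 2}))
d_cl = common_documents(a1.cl, documents_of)

print(sorted(d_cl))                # ['code', 'design', 'e0', 'review', 's0']
print(a1.I <= d_cl, a1.O <= d_cl)  # True True
```

The last line checks the two subset conditions of definition (1) for the activity a1.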
Table 2. Set of activities

Input                                          Output               Clusters
                                               s0                   1, 2
s0                                             design               1, 2
s0, design                                     code                 1
s0, design, verificationResults                code                 2
s0, design                                     testPlan             1
s0, design                                     verificationResults  2
s0, design, code, testPlan                     review               1
s0, design, code, verificationResults          review               2
s0, design, code, testPlan, review             e0                   1
s0, design, code, verificationResults, review  e0                   2
[Fig. 2: transition system. The states are named after the sets of committed
documents, from s_s0 to s_code_design_review_s0_testPlan_e0 and
s_code_design_review_s0_verificationResults_e0; the transitions are labelled
design, code, testPlan, verificationResults, review and e0; one set of states
is marked as a region.]
In this section, we describe the last step of our mining and synthesis
approach: the synthesis of a Petri net from a mined transition system. We use
the tool Petrify [9] for it.
Petrify, given a finite transition system, synthesizes a Petri net with a
reachability graph that is bisimilar to the transition system. The synthesis
algorithm is based on the theory of regions and was described in the work of
Cortadella et al. [19]. Petrify uses labelled Petri nets and, thus, supports
synthesis from arbitrary transition systems. It supports different methods for
minimizing the Petri nets and for improving the efficiency of the synthesis
algorithm. Here, we do not go into the details of the synthesis algorithm, but
give the essential idea and motivate its relevance for the process mining area.
[Fig. 3: synthesized Petri net with the transitions design, code, testPlan,
verificationResults, review and e0.]
A region is a set of states to which all transitions with the same label have
the same relation: either they all enter this set, or they all exit this set,
or none of them crosses its border. For example, in the transition system in
Fig. 2, the set of states
{s_code_design_s0_testPlan_review,
 s_code_design_s0_verificationResults_review}
is a region, because all transitions with the label "review" enter this set and
all transitions with the label "e0" exit it. Petrify discovers a complete set
of minimal regions for the given transition system and then removes the
redundant ones. A region corresponds to a place in the synthesized Petri net;
so, Petrify tries to minimize the number of places and to make the Petri net
understandable. For example, the synthesized Petri net is shown in Fig. 3. The
place between the Petri net transitions "review" and "e0" corresponds to the
set of states shown above. In the transition system, different transitions
correspond to the same event. An event in the transition system corresponds to
a Petri net transition. For example, for the event "review" there is a
transition with the identical name. There is an arc between a transition and a
place in the Petri net if the corresponding transition in the transition system
enters or exits the corresponding region.
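The region condition can be checked mechanically. The following Python sketch tests whether a given set of states is a region; it illustrates the definition only and is not how Petrify computes minimal regions. The function name `is_region` is hypothetical, and the transition list is our reading of the transition system of Fig. 2, so it should be treated as an assumption.

```python
def is_region(states, transitions):
    """Check that all transitions with the same label relate to `states`
    in the same way: all enter, all exit, or none crosses the border."""
    relation_of_label = {}
    for source, label, target in transitions:
        if source not in states and target in states:
            relation = "enter"
        elif source in states and target not in states:
            relation = "exit"
        else:
            relation = "no-cross"
        # The first transition of a label fixes the relation for that label.
        if relation_of_label.setdefault(label, relation) != relation:
            return False
    return True


# Transition system as read from Fig. 2 (assumed): (source, label, target).
TRANSITIONS = [
    ("s_s0", "design", "s_s0_design"),
    ("s_s0_design", "code", "s_design_s0_code"),
    ("s_s0_design", "testPlan", "s_design_s0_testPlan"),
    ("s_s0_design", "verificationResults", "s_design_s0_verificationResults"),
    ("s_design_s0_code", "testPlan", "s_testPlan_design_s0_code"),
    ("s_design_s0_testPlan", "code", "s_testPlan_design_s0_code"),
    ("s_design_s0_verificationResults", "code",
     "s_design_s0_verificationResults_code"),
    ("s_testPlan_design_s0_code", "review",
     "s_code_design_s0_testPlan_review"),
    ("s_design_s0_verificationResults_code", "review",
     "s_code_design_s0_verificationResults_review"),
    ("s_code_design_s0_testPlan_review", "e0",
     "s_code_design_review_s0_testPlan_e0"),
    ("s_code_design_s0_verificationResults_review", "e0",
     "s_code_design_review_s0_verificationResults_e0"),
]

after_review = {
    "s_code_design_s0_testPlan_review",
    "s_code_design_s0_verificationResults_review",
}

print(is_region(after_review, TRANSITIONS))          # True
print(is_region({"s_design_s0_code"}, TRANSITIONS))  # False
```

The second set is not a region because one "code" transition enters it while the other "code" transitions do not cross its border.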
In the context of process mining, the generated Petri net represents the
control aspect of the process and models concurrency and alternatives, which
were initially hidden in the logs. The transitions represent the activities.
Since we have a document-oriented view on the activities, the execution of
every activity results in committing a document to the document management
system. For now, activities are named after the committed documents; for
example, activity "code" results in committing the document "code" to the
system.
Since Petrify supports label splitting, it allows us to synthesize Petri nets
under different optimization criteria and belonging to different classes, such
as pure, free-choice, etc. In practice, for the complex Petri nets of big
projects, we can generate pure or free-choice versions, which are easier for
managers and process engineers to understand and can, therefore, serve
communication purposes in the company. For example, for the Petri net shown in
Fig. 3, we can generate a pure analogue, see Fig. 4.
[Fig. 4: pure version of the Petri net of Fig. 3; visible transition labels:
verificationResults, code, review, e0 and testPlan.]
3.3
Along with applying our algorithms to the area of process mining from
versioning logs, we have also dealt with activity logs, which are the standard
input for most of the classical mining approaches [13,14]. These logs are
usually obtained from workflow management systems or from standard software
that is used for executing the processes in the company. For activity logs, we
have deliberately chosen an example that is very similar to the one given for
versioning logs in the previous part of this section; this was done to motivate
the generality of the mining and synthesis approach and to improve the
readability of the paper. Actually, the algorithms for dealing with versioning
logs and for dealing with activity logs are completely different, and one
cannot be replaced by the other.
An example of the activity log (event log, as it is often called in the
literature) is shown in Table 3. It consists of process executions, which
represent process instances (cases); in our example, we have three instances of
the process. Every instance contains a set of activities and an order of their
execution. For example, in the first instance, activities are executed in the
following order: "doDesign", "writeCode", "planTests" and then "doReview". We
add activity "s0" to the beginning of every log and activity "e0" to the end of
every log to make the process start and the process end explicit.
From the activity log, without any preprocessing steps, we can generate a
transition system. In this case, a state is again a set of activities. An event
is an activity enabled in a state. An activity is enabled in a state when there
is a process execution in which the activity is executed after the set of
activities of the state. For example, the system is in a state
s1 = {s0, doDesign} when the activities "s0" and "doDesign" have been executed.
Since in Execution 1 the activity "writeCode" is executed after the activities
of the state s1, the event "writeCode" can occur in this state. When the
activity is executed, the system comes to a state
s2 = {s0, doDesign, writeCode}; so, there is a transition between the states s1
and s2. The resulting transition system is shown in Fig. 5.
The Petrify synthesis algorithm generates a Petri net from it, see Fig. 6.
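The generation of a transition system from an activity log, as described above, can be sketched in Python. The paper's implementation is in Prolog; this sketch is only an illustration under assumptions: the function name `generate_transition_system` is hypothetical, the three executions are our reading of the paths in Fig. 5, and, as in Fig. 5, the state {s0} is taken as the initial state.

```python
def generate_transition_system(executions):
    """Build the set of transitions (state, activity, successor state).

    A state is the set of activities executed so far; an activity is
    enabled in a state if some execution performs it right after them.
    """
    transitions = set()
    for execution in executions:
        state = frozenset(execution[:1])  # initial state {s0}
        for activity in execution[1:]:
            successor = state | {activity}
            transitions.add((state, activity, successor))
            state = successor
    return transitions


# Assumed process executions, with s0 and e0 already added.
EXECUTIONS = [
    ["s0", "doDesign", "writeCode", "planTests", "doReview", "e0"],
    ["s0", "doDesign", "planTests", "writeCode", "doReview", "e0"],
    ["s0", "doDesign", "verify", "writeCode", "doReview", "e0"],
]

ts = generate_transition_system(EXECUTIONS)
states = {s for s, _, _ in ts} | {t for _, _, t in ts}
print(len(states), len(ts))  # 11 11

s1 = frozenset({"s0", "doDesign"})
s2 = frozenset({"s0", "doDesign", "writeCode"})
print((s1, "writeCode", s2) in ts)  # True
```

Because states are sets, Executions 1 and 2 meet again in the state {s0, doDesign, writeCode, planTests}; this is what exposes the concurrency of "writeCode" and "planTests" to the synthesis step.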
[Fig. 5: transition system generated from the activity log. The states are
named after the sets of executed activities, from s_s0 to
s_s0_doDesign_writeCode_planTests_doReview_e0 and
s_s0_doDesign_verify_writeCode_doReview_e0; the transitions are labelled
doDesign, planTests, writeCode, verify, doReview and e0.]
[Fig. 6: Petri net synthesized from the transition system of Fig. 5, with the
transitions doDesign, planTests, verify, writeCode, doReview and e0.]
In this section, we show the first steps and directions for the evaluation of
the presented algorithms. To make a small ad-hoc comparison with the existing
process mining approaches, we have used ProM and the α-algorithm [13] for
generating a Petri net from the log presented in Table 3. As a result, we
obtained the Petri net shown in Fig. 7. The algorithms provide different
results: for our small activity log, the synthesized Petri net has no deadlocks
and models all the process executions from the activity log, whereas the model
obtained with ProM reaches a deadlock after executing the activities "doDesign"
and "planTests" and, thus, does not model Execution 2.

[Fig. 7: Petri net generated with ProM and the α-algorithm; visible transition
labels: doDesign, verify, writeCode, doReview and planTests.]

This shows that our algorithm gives a better result for at least one example.
But there are other benefits. First, we are capable of dealing with different
sources of information: versioning logs and activity logs. Second, our approach
is flexible and extensible: adapting the initial algorithms (which work with
versioning logs) to activity logs only required 1) removing the clustering and
activity mining parts, which are specific to and necessary for versioning logs;
2) slightly changing the transition system generation part². In general, the
Petri net synthesis approach assumes a complete transition system with all
possible transitions, which is not always realistic; but, for the versioning
logs, the activity mining algorithm has to cope with the defects of the input
data, and the transition system generation algorithm remains the same.
Our algorithms were implemented in Prolog, which gives the solution a certain
flexibility and makes it easier to experiment with and extend it. We have made
several experiments with the algorithms. For these experiments, the logs were
generated artificially, but they are based on our experience with real
examples. The execution times of all the algorithms (mining, transition system
generation and synthesis) are shown in Table 4. The execution time depends on
the number of executions (execution logs) and the average number of documents
in an execution. The columns in the table correspond to the experiments; the
time needed for constructing a Petri net from 10 logs with 10 documents in each
log is less than 10 seconds, which is a rather promising result, since this
example has the size of a realistic log.
In this section, we have presented the first steps towards combining the mining
and the synthesis approaches for discovering process models from both
versioning logs and activity logs. Though the approach is not fully worked out
and evaluated yet, we can already see its benefits even for the given simple
examples.
² Now, the ProM community has done their own implementation of some regions
algorithms, which is available as a Region miner plugin for ProM.

In this paper, we have presented mining and synthesis algorithms, which derive
a Petri net model of a business process from a versioning log of a document
management system. This way, we have opened a new application area for mining
without activity logs. We have also shown an extension of our approach, which
can deal with activity logs of workflow management systems. The approach uses
the well-developed and practically applicable theory of Petri net synthesis for
solving a vital problem of process mining. In order to do this, we have
developed a transition system generation algorithm, which is the main focus of
the paper.
The algorithms presented in this paper can deal with concurrency and
alternatives in the process models. For now, we are not dealing with
iterations. Detecting iterations in the versioning logs is a very important
domain-specific and company-specific problem. We will deal with this problem in
our future research, even though it appears rather seldom if the conventions of
using the document management system are introduced and fulfilled in the
company. Another relevant domain-specific problem is identifying the activities
and naming them meaningfully. Both issues belong to the activity mining part.
In the future, we will improve the activity mining algorithm and, possibly, use
interaction with the user for solving these problems. However, activity mining
is not the focus of this paper; as soon as it is improved, the transition
system generation algorithm has only to be slightly changed for introducing
iterations and activity identifiers to the transition systems.
Much work has to be done in applying the mining and synthesis algorithms to
different document management systems in different application areas and in
evaluating them in practice, both in the area of business process management
and in software process engineering. Since our approach is also relevant to the
area of mining activity logs, in the future, we should also compare it to the
existing approaches in this area. This paper aims at making the first step from
the well-developed theory of Petri net synthesis to the practically relevant
research domain of process mining.
References
1. van der Aalst, W., van Dongen, B.F., Herbst, J., Maruster, L., Schimm, G.,
Weijters, A.J.M.M.: Workflow mining: A survey of issues and approaches. Data &
Knowledge Engineering 47 (2003) 237–267
2. Kindler, E., Rubin, V., Schäfer, W.: Incremental Workflow mining based on
Document Versioning Information. In Li, M., Boehm, B., Osterweil, L.J., eds.:
Proc. of the Software Process Workshop 2005, Beijing, China. Volume 3840 of
LNCS., Springer (2005) 287–301
3. Humphrey, W.S.: Managing the software process. Addison-Wesley Longman
Publishing Co., Inc., Boston, MA, USA (1989)
4. Kindler, E., Rubin, V., Schäfer, W.: Activity mining for discovering
software process models. In Biel, B., Book, M., Gruhn, V., eds.: Proc. of the
Software Engineering 2006 Conference, Leipzig, Germany. Volume P-79 of LNI.,
Gesellschaft für Informatik (2006) 175–180
5. Ehrenfeucht, A., Rozenberg, G.: Partial (Set) 2-Structures. Part I: Basic
Notions and the Representation Problem. Acta Informatica 27 (1989) 315–342
6. Badouel, E., Bernardinello, L., Darondeau, P.: Polynomial algorithms for the
synthesis of bounded nets. In: TAPSOFT. (1995) 364–378
7. Desel, J., Reisig, W.: The synthesis problem of Petri nets. Acta Inf. 33
(1996) 297–315
8. Badouel, E., Darondeau, P.: Theory of regions. In: Lectures on Petri Nets I:
Basic Models, Advances in Petri Nets, the volumes are based on the Advanced
Course on Petri Nets, London, UK, Springer-Verlag (1998) 529–586
9. Cortadella, J., Kishinevsky, M., Kondratyev, A., Lavagno, L., Yakovlev, A.:
Petrify: a tool for manipulating concurrent specifications and synthesis of
asynchronous controllers. IEICE Transactions on Information and Systems E80-D
(1997) 315–325
10. Agrawal, R., Gunopulos, D., Leymann, F.: Mining Process Models from
Workflow Logs. In: Proceedings of the 6th International Conference on Extending
Database Technology, Springer-Verlag (1998) 469–483
11. Herbst, J., Karagiannis, D.: An Inductive approach to the Acquisition and
Adaptation of Workflow Models. citeseer.ist.psu.edu/herbst99inductive.html
(1999)
12. Weijters, A., van der Aalst, W.: Workflow Mining: Discovering Workflow
Models from Event-Based Data. In Dousson, C., Höppner, F., Quiniou, R., eds.:
Proceedings of the ECAI Workshop on Knowledge Discovery and Spatial Data.
(2002) 78–84
13. van der Aalst, W., Weijters, T., Maruster, L.: Workflow mining: Discovering
process models from event logs. IEEE Transactions on Knowledge and Data
Engineering 16 (2004) 1128–1142
14. Cook, J.E., Wolf, A.L.: Discovering Models of Software Processes from
Event-Based Data. ACM Trans. Softw. Eng. Methodol. 7 (1998) 215–249
15. MSR 2005 International Workshop on Mining Software Repositories. In:
ICSE '05: Proceedings of the 27th international conference on Software
engineering, New York, NY, USA, ACM Press (2005)
16. Herbst, J.: Ein induktiver Ansatz zur Akquisition und Adaption von
Workflow-Modellen. PhD thesis, Universität Ulm (2001)
17. Kellner, M.I., Feiler, P.H., Finkelstein, A., Katayama, T., Osterweil, L.,
Penedo, M., Rombach, H.: ISPW-6 Software Process Example. In: Proceedings of
the First International Conference on the Software Process, Redondo Beach, CA,
USA, IEEE Computer Society Press (1991) 176–186
18. Wielemaker, J.: An overview of the SWI-Prolog programming environment. In
Mesnard, F., Serebrenik, A., eds.: Proceedings of the 13th International
Workshop on Logic Programming Environments, Heverlee, Belgium, Katholieke
Universiteit Leuven (2003) 1–16. CW 371.
19. Cortadella, J., Kishinevsky, M., Lavagno, L., Yakovlev, A.: Deriving Petri
nets from finite transition systems. IEEE Transactions on Computers 47 (1998)
859–882