
Using workflow data to explore the structure of an organizational routine

Brian Pentland1, Thorvald Haerem2 and Derek Hillison1


1 Michigan State University, 2 Norwegian School of Management

Prepared for the 3rd International Conference on Organizational Routines: Empirical research and conceptual foundations 25-26 May 2007, Strasbourg, France

Abstract
Empirical studies of organizational routines are often limited because performances are distributed in time and space, and are therefore difficult to observe. Workflow management systems provide an opportunity to collect data about a large number of complete performances at relatively low cost. In this paper, we analyze data from a workflow management system for processing invoices to demonstrate how workflow event logs can be used to analyze organizational routines. The sample includes all invoices processed over an eleven-month period (N = 2072). We use the data to compute a variety of measures that can be used to identify and compare the structures of routines.

Acknowledgements We thank Compello Software, Visma Services, Bo Hjort Christensen, and the Norwegian Research Council for their support. We also appreciate the help of Alf Slettemoen in collecting and preparing the data for the analyses.


Introduction
Empirical study of organizational routines poses many difficulties for the researcher. Routines are typically distributed in time, space, and throughout an organization's structure. Short of stapling yourself to the paperwork, it is difficult to observe even a single performance of an organizational routine from beginning to end. In addition, the natural variability in performances makes it difficult to identify a representative pattern that encompasses the range of possible performances. Time, money and patience often limit us to observations of a few performances, or parts of performances, or interviews with a subset of participants.

Workflow systems provide an unprecedented opportunity to gather data about the patterns of action generated by routines (van der Aalst et al., 2003). With the proliferation of computer network technology, more and more organizations have adopted workflow systems to support their routines (Basu and Kumar, 2002). These systems typically involve a mixture of human and automatic processing; they are like the glue that holds together other, more common applications for accounting, email, and so on (Becker, zur Muehlen and Gille, 2002). Typical workflow systems generate event logs that include time-stamped records of each event or action that occurs in the system, making it possible to collect large numbers of performances at very low cost. Computer scientists have made tremendous progress in analyzing event logs for a range of purposes, such as the recovery of formal process models (van der Aalst et al., 2003; van der Aalst and Weijters, 2004). Organizational scholars, however, have not paid as much attention to this increasingly ubiquitous source of data about organizational processes and routines.

In this paper, we demonstrate how workflow event logs can be used to analyze organizational routines. We begin by discussing the challenges involved in analyzing routines as systems that generate patterns of events that can change over time. We provide a survey of techniques for analyzing and representing the structure of organizational routines, and illustrate these techniques using over 2000 performances of an invoice approval routine, as captured in a workflow management system. While workflow event logs have several important limitations, it seems clear that they provide a valuable source of data about organizational routines.

Theory
To have an empirical science of organizational routines, we need to be able to identify particular routines and compare them to other routines. These analytical moves, identification and comparison, are fundamental to any empirical science. In disciplines where individuals or organizations are the unit of analysis, such as psychology or economics, identification and comparison are difficult. The additional complexity of studying patterns of behavior carried out by groups of individuals in the context of an organization compounds this difficulty.

Identification and comparison are particularly challenging if we conceptualize routines as generative systems (Feldman and Pentland, 2003; Pentland and Feldman, 2005). We are able to observe surface structure (the performative aspects of the routine), but we would like to say something about the underlying structures which generate the performances. The grammatical metaphor of surface vs. deep structure has limitations when applied to social systems (Pentland, 1995), but the basic analogy is useful when trying to understand human perceptions of complex tasks and systems (Chi, Feltovich and Glaser, 1981; Haerem and Rau, 2007): we are trying to understand the features of a system that can generate a potentially infinite variety of performances, from a finite sample of performances.


When we say that organizational routines are generative systems, we mean that there is some underlying mechanism that generates the interdependent patterns of action that we recognize as an organizational routine. Figure 1 illustrates this perspective.

Figure 1: Routine as generative structure

[Diagram: a routine generates an ostensive pattern (ABCD) and a variety of performances (ABCD, ACD, CD, etc.).]

Figure 1 assumes that there is one routine generating the performances we are seeing. But in field research, the situation is more like Figure 2: we see a collection of performances, and we would like to know something about the underlying routine. Is there one routine, or many? Which performances represent standard practice, and which are exceptions? More broadly, how can we generalize about a routine from observing a sample of its performances?
Figure 2: Generalization of structure from specific performances

[Diagram: a set of observed performances (ABCD, ACD, CD, etc.) indicates an underlying routine(?).]


We can see this problem in two ways: as measurement or as induction. As a measurement problem, we are using the performances to compute various properties of the routine, such as the average cycle time or the number of steps. Because each performance has a beginning and an end, we can compute the cycle time. The variability of the performances is easily dealt with by familiar procedures such as averaging, or other measures of central tendency. We can use these properties to compare routines (which routine is faster, better, cheaper?). While this familiar procedure works well for properties, our current theory of organizational routines suggests that they are not just things with variable properties. Rather, they are complex generative systems that produce recognizable, repetitive patterns of interdependent actions carried out by multiple actors. If we are interested in the pattern of actions, the actual performances of the routine, then familiar statistical techniques like averaging are not applicable, because the data are not scalar.

Generalizing about the patterns involves induction: generalizing about a class from observing instances of that class. For example, one might observe several black crows, and on the basis of these observations conclude that all crows are black. If a routine generated identical performances, the induction problem would be just that easy. Unfortunately, we know that organizational routines tend to generate a great variety of patterns. In a typical situation, we are attempting to generalize from a small set of observations to a large class that is filled with variety. Furthermore, the granularity of observations can affect our impression of how much variety is present: from a distance, every performance looks the same; up close, they are all different.

One practical approach to this problem is simply to ignore the variety and describe typical performances. In process mapping, for example, we interview participants to create a flow chart (or some other kind of diagram) that summarizes their ostensive understanding of how the routine typically works. This method avoids the problem of induction because it bypasses the primary evidence, the actual performances of the routine. While flowcharts and other process maps can be useful, this seems a rather weak basis for an empirical science of routines, especially when we know that flow charts are often a poor representation of actual activities (Suchman, 1997).

Properties and patterns over time

Part of what makes organizational routines different is their intrinsically event-based nature. To study routines up close, and to understand them, we must account for the role of time. Within a single performance, time is embodied in the pattern of events. For some routines, like the invoice routine, the events occur over a few hours or days. For other routines, like hiring or technology roadmapping (Howard-Grenville, 2005), events may take weeks or months. Even in a very fast routine, like a sports team driving for the goal, or landing an airplane, timing is critical. Nevertheless, when we write about routines, these patterns are generally conceptualized as synchronic, occurring at a single moment in time. When we consider changes in the pattern over time, of course, we need a longitudinal (more macro-scale) conception of time (Barley, 1990). Table 1 shows the different ways that time can factor into empirical studies of routines.
Table 1: Approaches to identification and comparison

                Cross-sectional          Longitudinal
Properties      Synchronic comparison    Diachronic comparison
Patterns        Identification           Evolution


Our familiar statistical tools are well suited for working with properties, such as cycle time. Cross-sectional data takes a snapshot of a routine during a particular time window, which is sufficient for a synchronic description or comparison (Barley, 1990). With longitudinal data, we can make diachronic comparisons and track how the attributes of the system change over time.

When working with patterns, the empirical operations are less familiar. Typical social science methods work on scalar (or columnar) data, such as questionnaire items, that can be fed into correlation-based analyses. Network models provide a prominent example of a more sophisticated approach, and we illustrate a network-based approach later in this paper. Other techniques, such as classification and clustering algorithms, generally depend on computing a distance between patterns. In other words, we need to characterize the space of possible patterns and locate each observed pattern within that space.

The ability to classify patterns directly would be an important aid in an empirical science of organizational routines. For example, we theorize that routines are composed of sub-routines, and that these sub-routines can be re-combined in various ways to create new routines. But how can we identify particular subroutines, given the enormous variability in a typical set of performances? The answer is not obvious, but the ability to identify and compare the structure of routines and map their change over time would be fundamental to empirical tests of our most influential theories about stability, change, and evolution in organizations. Nelson and Winter (1982) hypothesize that organizations change by mutation and by recombining routines in various ways. Genetic models and metaphors continue to be an influential approach to theorizing about routines (Becker, 2004), but empirical evidence from real organizations has been difficult to obtain, and methods for analyzing changes in patterns over time have been lacking. The use of workflow data may afford the possibility of testing evolutionary concepts. In the following sections, we offer some modest examples of how workflow data might be used to explore the structure of an invoice processing routine.

Methodology
For this paper, we collected workflow data from an invoice-management system (Compello Software) for one company over an eleven-month period. The data include 2072 performances of the invoice processing routine, comprising 62,756 individual actions drawn from a lexicon of 31 action types. Figure 3 shows a flow chart for the invoice process. Invoices can enter the system on paper (which is most common) or via an electronic portal. If the invoice conforms to a known form type, it is scanned, registered in the system, and distributed for approval. Some invoices can be immediately sent to the financial system for payment; others require multiple approvals. Once these are complete, the invoice is paid.


[Flow chart elements: Electronic invoice; Scanning; OCR (with OCR form definition); Form?; Registration/Distribution; Authorization; Status: Fully posted; Financial system.]

Figure 3: Flow chart of the invoice processing routine

Table 2 shows one performance of the invoice processing routine as it appears in the workflow event log. The first column in the table indicates that the routine consists of two main phases: entry and approval. The first portion of the routine, referred to as Entry, is almost entirely automated; when Person is -1, a computer is taking the action automatically, according to a rule. In the approval phase, some actions are automatic, but others are manual: people are back in the picture. In the analysis that follows, we will sometimes refer to the routine as a whole, but at times it will be useful and interesting to compare the automated, initial entry section of the routine (Entry) to the more manual approval phase (Approval). After the last approval, the invoice data is exported to a separate accounts payable system that is outside the scope of the workflow system. This is an example of a limitation of workflow event logs: their scope is limited by the design and architecture of the underlying systems.
Table 2: Example event log from the workflow system [1]
PHASE      ACTION                    TYPE OF ACTION   DATE        PERSON   DATA CAPTURE
Entry      Enter document type       Data entry       05-Apr-06   -1       OCR/XML
Entry      Enter invoicno.           Data entry       05-Apr-06   -1       OCR/XML
Entry      Enter invoicedate         Data entry       05-Apr-06   -1       OCR/XML
Entry      Enter duedate             Data entry       05-Apr-06   -1       OCR/XML
Entry      Enter period              Data entry       05-Apr-06   -1       OCR/XML
Entry      Enter vendor account      Data entry       05-Apr-06   -1       OCR/XML
Entry      Enter amount              Data entry       05-Apr-06   -1       OCR/XML
Entry      Enter currency            Data entry       05-Apr-06   -1       System
Entry      Enter text                Data entry       05-Apr-06   6        Manual
Entry      Approve                   Approve          05-Apr-06   -1       System
Entry      Enter amount              Data entry       05-Apr-06   6        Manual
Approval   Approve                   Approve          05-Apr-06   19       Manual
Approval   Enter period              Data entry       05-Apr-06   -1       OCR/XML
Approval   Distribute to approver    Workflow         05-Apr-06   6        Manual
Approval   Distribute to approver    Workflow         05-Apr-06   6        Manual
Approval   Enter currency            Data entry       05-Apr-06   -1       System
Approval   Enter account             Data entry       05-Apr-06   19       Manual
Approval   Enter Tax-code            Data entry       05-Apr-06   19       System/manual
Approval   Notify                    Workflow         05-Apr-06   -1       System
Approval   Enter value dim. 1 (DP)   Data entry       05-Apr-06   19       Manual
Approval   Approve                   Approve          06-Apr-06   22       Manual
Approval   Enter text                Data entry       18-Apr-06   44       Manual

[1] Some columns have been removed or values changed for confidentiality.

In some respects, workflow data is extremely convenient because it is computer generated: events are unitized, categorized and time-stamped. This largely eliminates the arduous and time-consuming coding that would be required in a typical observational or archival field study. In other respects, depending on how the workflow event log is stored, it may require some rather sophisticated computer procedures (e.g., queries written in SQL) to retrieve it. Many software programs' databases differ from installation to installation, and such differences introduce more complexity in making the data comparable across organizations. To minimize such complications, we chose a process and software which allowed variation within a standardized framework. Once the data have been retrieved, they are more or less ready for analysis.

Describing the data: Lexical variety

The most basic property of a routine is the list of events it generates. Events are fundamental because we need them to describe and recognize the patterns. By analogy to human language, we refer to the number of different events generated by a routine as its lexical variety or lexical size. Table 3 provides some measures of the lexical variety of each phase in the routine. These measures alone might suggest that entry and approval are basically the same: they have almost the same number of actions in their lexicon (15 vs. 17), a similar average length, and a similar number of distinct lexical items in each performance.
Table 3: Lexical variety measures

                          Entry       Approval
Average length            11.31532    12.2525
Lexicon size              15          17
Average lexicon items     10.15916    9.634741
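As an illustration, the sketch below shows how measures like those in Table 3 might be computed from a flat export of the event log. It is not the procedure used in this study (the paper describes using SQL and Visual Basic for its own extraction and analysis); the file name and the column names invoice_id, phase, timestamp and action are assumptions about how such an export might be laid out. The later sketches in this paper assume that sequences of this form (one list of action labels per performance) are available.

    import csv
    from collections import defaultdict

    def load_sequences(path, phase=None):
        """Group an exported event log into one action sequence per invoice.

        Assumes a CSV with columns invoice_id, phase, timestamp, action and
        ISO-formatted timestamps (so lexical sort equals chronological sort).
        """
        rows = []
        with open(path, newline="") as f:
            for r in csv.DictReader(f):
                if phase is None or r["phase"] == phase:
                    rows.append((r["invoice_id"], r["timestamp"], r["action"]))
        by_invoice = defaultdict(list)
        for invoice_id, _, action in sorted(rows):
            by_invoice[invoice_id].append(action)
        return list(by_invoice.values())

    def lexical_variety(sequences):
        # Average length, lexicon size and average distinct actions per performance.
        lexicon = {a for seq in sequences for a in seq}
        return {
            "average length": sum(len(s) for s in sequences) / len(sequences),
            "lexicon size": len(lexicon),
            "average lexicon items": sum(len(set(s)) for s in sequences) / len(sequences),
        }

    # Example (hypothetical file name):
    # entry = load_sequences("invoice_event_log.csv", phase="Entry")
    # print(lexical_variety(entry))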

While the number of events in each phase is similar, the events themselves are not. Figure 4 shows which events are found in each phase of the routine. Some events are shared between phases, while others are unique to a specific phase. Thus, one can easily discriminate between the phases based on the presence (or absence) of particular events; in other words, these two phases can be easily identified or recognized. In cases where there is more overlap, more elaborate classification (or pattern recognition) techniques can be used to discriminate between phases (e.g., Markov models and n-grams are two widely used techniques for classifying sequence data); a simple bigram-based sketch follows the figure below.
Figure 4: Frequency of action types in the entry and approval phases

[Bar chart: frequency (0-2500) of each action type in the lexicon, shown separately for the entry phase and the approval phase.]
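To make the classification idea concrete, the following sketch discriminates between phases using bigram profiles, one simple member of the n-gram family mentioned above. It is an illustration only, not part of the analysis reported here, and it assumes action sequences of the form produced by the load_sequences sketch earlier.

    import math
    from collections import Counter

    def bigrams(seq):
        # Adjacent pairs of actions within one performance.
        return list(zip(seq, seq[1:]))

    def bigram_profile(sequences):
        # Relative frequency of each action pair within a set of performances.
        counts = Counter(b for seq in sequences for b in bigrams(seq))
        total = sum(counts.values())
        return {b: c / total for b, c in counts.items()}

    def classify(seq, profiles, floor=1e-6):
        # Assign a performance to the phase whose bigram profile gives it the
        # highest (floored) log-likelihood -- a crude stand-in for the Markov
        # and n-gram classifiers mentioned in the text.
        scores = {label: sum(math.log(profile.get(b, floor)) for b in bigrams(seq))
                  for label, profile in profiles.items()}
        return max(scores, key=scores.get)

    # profiles = {"Entry": bigram_profile(entry), "Approval": bigram_profile(approval)}
    # classify(["Enter amount", "Approve", "Distribute to approver"], profiles)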

Listing the sequences

Lexical variety provides a foundation, but we are ultimately more interested in the performances, which are temporally ordered patterns of events. To describe how the performances of a routine vary, one could simply list all the sequences one observes. If the routine is simple enough, with fixed performances, this might be a good way to describe what the routine does.


Table 4 shows the ten most frequent sequences for each phase of the routine. These sequences were produced by assigning an integer code to each event type in the lexicon (taken from the second column of Table 2).
Table 4: Ten most frequent sequences for each phase

Entry phase (119 unique sequences, n = 1998)
    #    % of total   Action sequence
  904      45.25%     9, 12, 11, 10, 14, 21, 7, 8, 16, 2, 7
  419      20.97%     9, 12, 11, 10, 14, 21, 7, 8, 2, 7, 16
  147       7.36%     9, 12, 11, 10, 14, 21, 7, 8, 16, 2, 7, 13
  121       6.06%     9, 12, 11, 10, 14, 21, 7, 8, 2, 16, 7
   37       1.85%     9, 12, 11, 10, 21, 7, 8, 16, 2, 7, 14
   36       1.80%     9, 12, 11, 10, 14, 8, 21, 7, 16, 2, 7
   30       1.50%     9, 12, 11, 10, 14, 21, 7, 8, 2, 7, 16, 13
   25       1.25%     9, 12, 11, 14, 21, 7, 8, 16, 2, 7, 10
   24       1.20%     9, 12, 11, 10, 14, 21, 7, 8, 2, 7, 13, 16
   17       0.85%     9, 12, 11, 10, 14, 21, 7, 8, 16, 2, 7, 7

Approval phase (902 unique sequences, n = 2297)
    #    % of total   Action sequence
  138       6.01%     2, 14, 5, 8, 16, 6, 15, 20
   92       4.01%     14, 5, 5, 8, 25, 19, 2, 16, 6, 15, 17, 2
   51       2.22%     2, 14, 5, 5, 8, 25, 19, 2, 16, 6, 15, 17
   44       1.92%     2, 14, 5, 5, 8, 25, 19, 2, 6, 15, 17, 16
   44       1.92%     2, 14, 5, 5, 8, 25, 2, 16, 6, 15, 17
   37       1.61%     2, 2, 14, 5, 5, 8, 16, 6, 15, 25, 17, 19
   33       1.44%     2, 14, 5, 5, 8, 25, 2, 6, 15, 17, 16
   32       1.39%     2, 2, 14, 5, 5, 8, 6, 15, 25, 17, 19, 16
   27       1.18%     2, 14, 5, 8, 16, 6, 15, 17
   25       1.09%     2, 14, 5, 5, 8, 16, 6, 15, 25, 17, 19, 2
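A list like Table 4 can be produced directly by counting identical performances. The sketch below is one way to do it, again assuming the list-of-sequences form used in the earlier sketches; the variable names are illustrative.

    from collections import Counter

    def most_frequent_sequences(sequences, top=10):
        # Each complete performance is counted as a unit; tuples are hashable.
        counts = Counter(tuple(seq) for seq in sequences)
        total = sum(counts.values())
        return [(n, n / total, seq) for seq, n in counts.most_common(top)]

    # Integer codes like those in Table 4 can be assigned from the lexicon:
    # code = {action: i for i, action in enumerate(sorted({a for s in entry for a in s}), 1)}
    # for n, share, seq in most_frequent_sequences(entry):
    #     print(n, f"{share:.2%}", [code[a] for a in seq])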

In the invoice routine, raw enumeration is apparently not a useful way to summarize the routine: there is too much variety. There are 119 distinct sequences in the entry phase and 902 distinct sequences in the approval phase. For the entry phase, the ten most common sequences account for 88% of the observed sequences; for the approval phase, the ten most common account for just under 23%.

One implication of Table 4 is that, even for automated workflows such as the entry phase, there can be quite a lot of variability in the performances. For the less automated approval phase, variability is even higher. For this routine, collecting a small sample of performances is not likely to result in a good description. Even for the entry phase, which is almost entirely automated, there are over 100 distinct sequences. In the context of the invoice processing routine, these variations may or may not be particularly meaningful. In a more complex routine, there may be more than two phases (or subroutines), and they may be combined and recombined in many different ways. Clearly, we need tools to identify and compare such variations and combinations.

Structure of routines
To assess the structure of the routine, and possible changes over time, we need to analyze the performances. We would like to be able to compute properties of the routine that are sensitive to the patterns, so that we can perform cross-sectional and longitudinal comparisons. Better yet, we would like to be able to compare the patterns directly, to support the identification of routines and the testing of evolutionary models.

Sequential Variety

To create a meaningful comparison that takes into account the pattern of events, we need methods that use information about the sequences. Sequential variety (Pentland, 2003) uses all of the data in a set of observed sequences (not just pairs of events) to create a single, scalar measure of the variability in the sample. It provides an index of the variability in the performances that can be used for either cross-sectional or longitudinal comparisons. Sequential variety is closely correlated with the complexity of the observed sequences, as measured by the Lempel-Ziv complexity. [2]

[2] Algorithmic complexity is a well-established approach to detecting and measuring spatio-temporal patterns in sequence data (Kaspar and Schuster, 1987). The Kolmogorov-Chaitin complexity (often called the Kolmogorov complexity) refers to the length of the shortest self-terminating program required to produce a given sequence. While the Kolmogorov-Chaitin measure has great theoretical value, it is not readily computable (Kolmogorov, 1965). To address this problem, Lempel and Ziv (1976) devised an algorithm that provides an index of the Kolmogorov complexity. The Lempel-Ziv measure has similar properties to the Kolmogorov-Chaitin measure, and thus provides a practical measure of algorithmic complexity on empirical data (Kaspar and Schuster, 1987). For example, Butts (2001) uses the Lempel-Ziv measure to explore the effects of complexity in social networks.
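For readers who wish to experiment with this measure, the sketch below follows the exhaustive-parsing formulation of the Lempel-Ziv complexity given by Kaspar and Schuster (1987): it counts the number of distinct phrases needed to build up a sequence. It is offered only as an illustration of the idea, not as the exact implementation used in the studies cited.

    def lz_complexity(seq):
        # Lempel-Ziv (1976) complexity: the number of phrases in the exhaustive
        # production history of a sequence (Kaspar and Schuster, 1987).
        # Works on any indexable sequence, e.g. a list of action labels.
        n = len(seq)
        if n <= 1:
            return n
        c, l, i, k, k_max = 1, 1, 0, 1, 1
        while True:
            if seq[i + k - 1] == seq[l + k - 1]:
                k += 1
                if l + k > n:          # reached the end while copying: one last phrase
                    c += 1
                    break
            else:
                k_max = max(k_max, k)
                i += 1
                if i == l:             # no earlier copy extends further: new phrase
                    c += 1
                    l += k_max
                    if l + 1 > n:
                        break
                    i, k, k_max = 0, 1, 1
                else:
                    k = 1
        return c

    # lz_complexity(list("aabb")) == 3   # parsed as a | ab | b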


Sequential variety is computed by comparing each sequence in a sample to every other sequence (Pentland, 2003). The distance between sequences is computed using optimal string matching (Sankoff and Kruskal, 1983; Abbott, 1995). This distance (also called a Levenshtein distance) can also be used for subsequent clustering and pattern classification. Sequential variety is simply the average distance between sequences in a sample. Like the Lempel-Ziv complexity, this method uses entire sequences (not just pairs of actions). Figure 5 shows the sequential variety of the two phases of the invoice process over the eleven-month period in our data set.
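The following sketch computes a simple version of this measure: unit-cost Levenshtein distances between all pairs of performances in a sample, averaged. The published measure is based on optimal string matching and may differ in details such as substitution costs or normalization, so this should be read as an approximation for exploration, not a re-implementation of Pentland (2003).

    def levenshtein(a, b):
        # Classic dynamic program: insertions, deletions and substitutions cost 1.
        prev = list(range(len(b) + 1))
        for i, x in enumerate(a, start=1):
            cur = [i]
            for j, y in enumerate(b, start=1):
                cur.append(min(prev[j] + 1,              # delete x
                               cur[j - 1] + 1,           # insert y
                               prev[j - 1] + (x != y)))  # substitute
            prev = cur
        return prev[-1]

    def sequential_variety(sequences):
        # Average pairwise distance between all performances in the sample.
        n = len(sequences)
        pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
        if not pairs:
            return 0.0
        return sum(levenshtein(sequences[i], sequences[j]) for i, j in pairs) / len(pairs)

    # Month-by-month comparison, as in Figure 5 (monthly_samples is a hypothetical
    # dict mapping month -> list of sequences):
    # variety_by_month = {m: sequential_variety(seqs) for m, seqs in monthly_samples.items()}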
[Line chart: sequential variety (0-8) by month (1-11) for the initial (entry) phase and the approval phase.]

Figure 5: The entry phase is less variable than the approval phase

Figure 5 shows the sequential variety of each phase in the invoice routine. The numbers in Figure 5 are computed month by month. This is, of course, an artificial way of partitioning the data, but sequential variety depends on comparing sequences within a sample, such as a month's worth of invoices. Figure 5 demonstrates that the entry phase is consistently less variable than the approval phase. This makes sense, because the entry phase is more highly automated and (as shown earlier) the sequences are more similar.

Network Graphs and Measures

The simplest way to summarize a large number of patterns is to break each sequence of actions into pairs of actions. In this way, workflow data can be processed and displayed as a valued, directed network graph, as shown in Figure 6. Each node in the graph represents a type of action, while the arcs represent the sequential relationships between actions. Formally, these graphs are first-order Markov models, except that the nodes are events rather than states. Pentland (1999) calls this kind of graph an action network because it indicates which actions follow which other actions. The action network perspective allows us to use methodologies that have previously been used for the study of social networks to ask new questions about the nature of the routine in the organization.
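As a sketch of this construction, the code below counts how often each action directly follows another and stores the counts as edge weights in a directed graph. The networkx library is assumed to be available (any graph library would do), and the density and centrality calls illustrate the kinds of measures discussed further below.

    from collections import Counter
    import networkx as nx  # assumed available; any graph library would do

    def action_network(sequences):
        # Count direct action-to-action transitions and build a valued,
        # directed graph: nodes are action types, edge weights are frequencies.
        edge_counts = Counter((a, b) for seq in sequences for a, b in zip(seq, seq[1:]))
        g = nx.DiGraph()
        for (a, b), w in edge_counts.items():
            g.add_edge(a, b, weight=w)
        return g

    # g = action_network(entry)
    # nx.density(g)                 # how interconnected the phase is overall
    # nx.in_degree_centrality(g)    # which actions are frequent "targets"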

Figure 6: Action network for the entire routine (all 31 actions)

If we graph the relationships between actions in each phase, we can visually compare how these graphs are similar or different. Figure 7 shows the network of actions found in the initial entry phase, while Figure 8 shows the approval phase. The thickness of the arcs indicates how often a particular sequence occurs; thicker lines indicate more frequent connections between those actions. At first glance, it is easy to see differences between them: the approval phase is much more interconnected and varied than the entry phase. This is consistent with the comparison of sequential variety in Figure 5; more numerous (denser) connections should be correlated with higher sequential variety.

Figure 7: Action network for the entry phase


[Network diagram: action codes 2, 7, 9, 10, 11, 12, 13, 14, 16, 21, 24, 25 and 29 linked by weighted arcs.]


Figure 8: Action network for the approval phase

[Network diagram: action codes 2, 3, 5, 6, 8, 14, 15, 16, 17, 18, 19, 20, 23, 25, 28, 29 and 31 linked by weighted arcs.]

Once the routine has been represented as a network, a large number of analytical tools and techniques become available (Wasserman and Faust, 1994). In addition to the overall density, we can compute the centrality of particular events, block models and structural equivalence. On a practical level, we can detect potential bottlenecks and the most common paths.

Dynamic networks

Because we have sequences sampled over a long period of time, we can show how the pattern of action within the routine changes over time using a software application called SoNIA (Moody, McFarland and Bender-deMoll, 2005). SoNIA creates dynamic network movies that allow visualization of changes in a network over time. On paper, we cannot display the movies, but Figure 9 shows three time slices from the entry phase, based on consecutive samples of 25 invoices each. From these network graphs, the human eye can discern some common structures and gain further insight into the patterns of action over time. When the full set of invoices is animated, these patterns are even more apparent.


[Three network snapshots of the entry-phase action network, labeled Time 1, Time 2 and Time 3, each based on a consecutive sample of 25 invoices.]

Figure 9: Network of action changes over time (entry phase)

Dynamic network graphs give an exciting moving picture that allows the human brain to use its abstraction and induction capabilities to discover shapes and regularities in the movement of nodes and arcs. Time is vital to our understanding of the dynamics and flexibility found in an organizational routine, and moving network graphs give us access to it.
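A non-animated proxy for what SoNIA shows can be computed directly from the event log: slice the stream of performances into consecutive batches (here 25 invoices, matching the time slices above) and track a network measure such as density across the batches. The sketch below is self-contained and, like the earlier ones, assumes the list-of-sequences form of the data.

    def transition_density(sequences):
        # Density of the directed action network for one batch of performances:
        # distinct observed transitions divided by the number possible.
        pairs = {(a, b) for seq in sequences for a, b in zip(seq, seq[1:])}
        nodes = {a for seq in sequences for a in seq}
        n = len(nodes)
        return len(pairs) / (n * (n - 1)) if n > 1 else 0.0

    def sliced_densities(sequences, window=25):
        # Consecutive batches of `window` invoices, as in the three time slices above.
        return [transition_density(sequences[i:i + window])
                for i in range(0, len(sequences), window)
                if len(sequences[i:i + window]) > 1]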

Discussion
The ability to identify and compare organizational routines based on large samples of performances creates some exciting new possibilities that we have only begun to explore here. In the final section of the paper, we offer some thoughts on the limits and possibilities of this approach as we currently understand it.

Data, data everywhere...

The most exciting aspect of workflow data is the large and growing number of different organizations and processes for which it may be available. Workflow systems are built into enterprise software, such as SAP and PeopleSoft. In fact, basic workflow services are built into every copy of Microsoft Windows (Microsoft SharePoint is a workflow manager). So it is safe to say that workflow data is ubiquitous, but there may be difficulties preventing its universal use in research.


The first challenge, of course, is getting the data. In many operational systems, logging is turned off because it would rapidly consume too much disk space; recording workflow data would then require that the organization agree to activate this functionality. Depending on how the event data are stored, the log may contain data that are considered confidential or sensitive. If so, it may be difficult to get an organization to invest the resources required to extract sequences of events (or other abstract representations) that do not compromise the confidentiality of the organization or its members. Organizations are understandably reluctant to share this kind of data, and protracted negotiations and non-disclosure agreements may be required. Thus, while the marginal cost of a thousand extra performances may be very low, the cost of getting any data at all may still be quite high. Unless the event log is stored in a single file or database table, considerable technical skill may be required to extract the data.

The second challenge is ensuring that the data are comparable between organizations. Workflow software can be configured differently by each organization using it, and the utility of workflow event logs depends to a great extent on what kinds of events have been recorded. The event log records events from a particular point of view, and only to the extent that the programmers (and the system configuration) allow. Different events may be monitored in the event log, and different processing rules may be used to automate different steps in the process. These differences are interesting and potentially researchable in their own right, but they could diminish or destroy the comparability of the data in the event logs.

The third challenge in working with event log data is the novelty of the methods. We have found ourselves making rather extensive use of SQL and Visual Basic to create customized data analysis procedures. Some analytical tools exist (e.g., for string matching), and others are being developed. One of the most promising developments is the availability of an XML-based standard for representing workflow data (van Dongen and van der Aalst, 2005) and an open-source software platform for analyzing these data structures (van Dongen et al., 2005). This platform, called ProM, supports the use of plug-ins for analysis. While the availability of such tools lowers the barrier to entry somewhat, analyzing workflow data requires skills that are not part of a typical social science PhD program.

Identifying structure is important

While workflow data presents some challenges, the potential payoff is significant. For example, it can provide a better understanding of the underlying structure of routines. For a simple case like invoice processing, detailed analysis of the workflow might be perceived as overkill. But even in our simple example, some interesting issues arose. For example, is invoice processing one routine or two? How many routines are generating the patterns we observe, and how similar are they? Based on the lexicon, the entry and approval phases seem to be very similar: they have almost the same size lexicon, and nearly the same number of steps per performance. However, the actions in each phase are different and there is little overlap between them. Furthermore, the phases are very different in sequential variety, and in the actual patterns of action. Based on this analysis, invoice processing might be better characterized as two routines, and additional subroutines may be discovered with further analysis.

Combinatorics of organizational evolution

This minor difference in description is important for a variety of reasons. First, organizations are hypothesized to adapt and evolve by combining and recombining routines. Routines are like genetic material, or perhaps building blocks, whose combinations can create meaningful differences in the overall organization.


But what is the level of granularity at which these combinations can occur? Is it high-level processes, such as order-to-cash? Is it the nominal process we see in the process map (e.g., invoice processing)? Or is it an even lower level, such as the entry and approval phases identified here? By identifying distinct routines, we begin to set some boundaries on the combinatorics of organizational adaptation at the level of routines.

While workflow event logs can help identify meaningful sub-routines, they may not be very reliable as a tool for observing novel combinations and re-combinations. This is because the event log is generated by a technical artifact (a software system) that could be swapped into (or out of) the picture at any time. It would only be possible to observe re-combinations at a lower level of granularity and a shorter time-scale than the life-span of the system itself.

Sequential variety may indicate evolutionary potential

Such smaller-scale variations could be conceptualized as evolution within a routine, as well. Feldman and Pentland (2003) suggest the possibility of variation and selective retention within routines, as exception handling and improvisation create variations that may be recognized and retained as part of the routine. Variability in routines is important because it creates opportunities for change, as new ways of doing things are tried out (accidentally or intentionally) and incorporated into the organization's repertoire of action. To observe this evolutionary process, we need detailed data on the patterns of action in each performance, over time. Workflow data offers just such an opportunity.

We expect that routines with higher sequential variety would have a greater possibility of displaying evidence of variation and selective retention. For example, the approval phase has much more variability than the entry phase, as shown in Figure 5. If the organization is able to select desirable patterns and retain them, it should be able to achieve improved outcomes over time.


For organizations that wish to direct their evolution intelligently, the identification of patterns, causes and outcomes is necessary to selectively retain the best patterns for execution in specific circumstances.

Antecedents and consequences

Workflow patterns can also be linked to outcome or performance variables. Outcome measures, such as cycle time, are often available from workflow systems, using the same data collection techniques described earlier. By using such measures, we could answer questions about the consequences of sequential variety and of particular workflow patterns. Potentially, one could investigate the criteria that seem to drive the selection of routines (cost, quality, cycle time, etc.).

It is also possible to study the antecedents of the structure of routines. Do different environments and organizations tend to produce the same patterns, or are there systematic differences? Do different organizations, given similar environments, produce similar patterns? Are there characteristics of the persons or teams responsible for the workflow system that may predict variation in patterns of action? In other words, are routines shaped more by the external environment or by internal features of the organization? While answers to these questions seem a long way off at the moment, the ability to analyze workflow data lets us contemplate them in ways that were never possible before.
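As one example of linking patterns to outcomes, cycle time can be recovered from the same event log. The sketch below measures the hours between the first and last logged event of each performance; the timestamp format and the (invoice_id, timestamp) pairs are assumptions about how the export might look.

    from datetime import datetime

    def cycle_times(event_rows, fmt="%Y-%m-%dT%H:%M:%S"):
        # Cycle time per invoice: time between its first and last logged event.
        # `event_rows` is an iterable of (invoice_id, timestamp_string) pairs.
        first, last = {}, {}
        for invoice_id, ts in event_rows:
            t = datetime.strptime(ts, fmt)
            first[invoice_id] = min(first.get(invoice_id, t), t)
            last[invoice_id] = max(last.get(invoice_id, t), t)
        return {i: (last[i] - first[i]).total_seconds() / 3600.0 for i in first}  # hours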


References
Abbott, A. (1995). Sequence Analysis: New Methods for Old Ideas. Annual Review of Sociology, 21, 93-113.

Barley, S. R. (1990). Images of Imaging: Notes on Doing Longitudinal Field Work. Organization Science, 1(3), 220-247.

Basu, A., & Kumar, A. (2002). Research Commentary: Workflow Management Issues in e-Business. Information Systems Research, 13(1), 1-14.

Becker, J., zur Muehlen, M., & Gille, M. (2002). Workflow Application Architectures: Classification and Characteristics of Workflow-based Information Systems. Workflow Handbook, 39-50.

Becker, M. C. (2004). Organizational Routines: A Review of the Literature. Industrial and Corporate Change, 13(4), 643-678.

Butts, C. T. (2001). The Complexity of Social Networks: Theoretical and Empirical Findings. Social Networks, 23, 31-71.

Chi, M. T. H., Feltovich, P. J., & Glaser, R. (1981). Categorization and Representation of Physics Problems by Experts and Novices. Cognitive Science, 5, 121-152.

Feldman, M. S., & Pentland, B. T. (2003). Reconceptualizing Organizational Routines as a Source of Flexibility and Change. Administrative Science Quarterly, 48(1), 94-118.

Haerem, T., & Rau, D. (In Press). The Influence of Degree of Expertise and Objective Task Complexity on Perceived Task Complexity and Performance. Journal of Applied Psychology.

Howard-Grenville, J. A. (2005). The Persistence of Flexible Organizational Routines: The Role of Agency and Organizational Context. Organization Science, 16(6), 618.

Kaspar, F., & Schuster, H. G. (1987). Easily Calculable Measure for the Complexity of Spatiotemporal Patterns. Physical Review A, 36(2), 842-848.

Lempel, A., & Ziv, J. (1976). On the Complexity of Finite Sequences. IEEE Transactions on Information Theory, 22(1), 75-81.

Moody, J., McFarland, D., & Bender-deMoll, S. (2005). Dynamic Network Visualization. American Journal of Sociology, 110(4), 1206-1241.

Nelson, R. R., & Winter, S. G. (1982). An Evolutionary Theory of Economic Change. Cambridge, MA: Harvard University Press.


Pentland, B. T. (1995). Grammatical Models of Organizational Processes. Organization Science, 6(5), 541-556.

Pentland, B. T. (1999). Organizations as Networks of Action. In J. A. C. Baum & W. McKelvey (Eds.), Variations in Organization Science: Essays in Honor of Donald T. Campbell. Thousand Oaks, CA: Sage.

Pentland, B. T., & Feldman, M. S. (2005). Organizational Routines as a Unit of Analysis. Industrial and Corporate Change, 14(5), 793-815.

Sankoff, D., & Kruskal, J. B. (1983). Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison. Reading, MA: Addison-Wesley.

van der Aalst, W. (2003). Business Process Management: Past, Present and Future.

van der Aalst, W., & Weijters, A. J. M. M. (2004). Process Mining: A Research Agenda. Computers in Industry, 53(3), 231-244.

van Dongen, B. F., Medeiros, D. J., Verbeek, H. M. W., Weijters, A., & van der Aalst, W. M. P. (2005). The ProM Framework: A New Era in Process Mining Tool Support. Paper presented at the 26th International Conference on Applications and Theory of Petri Nets (ICATPN 2005), Heidelberg.

van Dongen, B. F., & van der Aalst, W. M. P. (2005). A Meta Model for Process Mining Data. Paper presented at the CAiSE'05 Workshops.

Wasserman, S., & Faust, K. (1994). Social Network Analysis: Methods and Applications. Cambridge: Cambridge University Press.
