
Automatic Data Collection in Logistics Costing: Analysing the Causes and Effects of Variation

Mikko Varila, Marko Seppänen & Petri Suomala
Tampere University of Technology, Cost Management Center, Institute of Industrial Management
P.O. Box 541, FI-33101 Tampere, Finland
mikko.varila@tut.fi

The need for cost efficiency makes it necessary to monitor costs in detail in the logistics environment. This challenges the traditional methods and assumptions of cost accounting. Due to the wide variety of products with different characteristics and needs, duration drivers may be more accurate than transaction drivers in assigning the costs of logistics activities. An automatic data collection system supports cost accounting with accurate time data and low measuring costs. The main objective of this paper was to study how large amounts of automatically collected data should be analysed in order to understand the effects of product-related variation on activity duration. If time drives costs in logistics, the key question for accounting should be what drives time. The study was based on a development project of an accounting system of an electronics wholesaler. Data were collected and analysed from one case activity, and the effects of key variables on activity duration were examined. A forecasting model based on multiple regression analysis was built to estimate the duration beforehand for a product batch with certain characteristics. The information collected provides excellent possibilities for a profound analysis of activities. By splitting an activity into subtasks and identifying the related key variables, it is possible to trace the causes and effects of variation in the time for performing the activity. The differences in the way products consume an activity cause significant variation in activity duration at both the batch and unit level. No single driver was found that could explain the variation. Multiple variables were needed to reach a sufficient level of accuracy. It was demonstrated that it is possible to increase the accuracy of accounting by collecting more data and adding more variables to the cost estimation.


1.1 Increasing the visibility of logistics costs

One of the key competitive advantages of logistics is the cost efficiency of processes. In a recent logistics survey, as many as 71 percent of the respondents ranked cost control/cost reduction as their top concern (Cooke 2002). The firms seem to have an understanding of how to accomplish this goal, because the need to utilise and optimise information technology was cited by 40 percent of the respondents. Logistics activities are still among the largest cost drivers, but at the same time, the importance of logistics as a competitive advantage has been recognised. Proper cost control requires that logistics processes be monitored accurately enough. Reaching this level of accuracy largely depends on the ability of the firm's cost accounting system to trace costs to cost objects (Pohlen and La Londe 1994). Earlier accounting methods are based on assumptions of a stable and predictable market, long product life-cycles, large production runs, and a large portion of direct variable costs in total product costs. This is more rarely the case in today's logistics environment. Wholesalers and retailers, especially, need an instrument that is capable of linking logistics process information to financial information (van Damme and van der Zon 1999). Thus, logistics requires more visibility. Increased visibility provides an understanding to assess the dependencies between price and volume, to identify potential targets of cost reduction, to assess new technology investments and to pay attention to the management of all assets. According to Graham (2003), enterprise solutions that simply manage transactions are not sufficient. What is needed now is a way to collect information on a real-time basis in order to provide data that is always correct. Advanced identification methods, e.g. bar codes and RFID, have been important landmarks, making it possible to record even the smallest phases of the processes. At the same time, they pose a challenge to cost accounting. When the data are available and the costs of acquiring data decrease, more accurate cost accounting becomes possible. In addition, automatic data collection produces plentiful information for analysing activities. This means that increased visibility helps in assigning costs to activities and in improving the cost efficiency of activities.

1.2 Objectives and research method

There are many sources of variation that affect the cost of performing activities. Examples of these are machinery, labour, environment, methods and products. This study concentrates on the variation that is product-related. The research problem is how different products and the product-related variations in the methods affect the consumption of an activity. The study is based on a development project of an accounting system of an electronics wholesaler. The data produced by the accounting system are used to find new perspectives on the behaviour of logistics activities and especially on assigning activity costs to products. The main objective of the paper is to study how large amounts of automatically collected data should be analysed in order to understand the cost behaviour of an activity. Breaking an activity down into subtasks and identifying the related key variables are considered. A special focus of the paper is on developing a forecasting model which could estimate the activity cost driver amount for a product batch with certain characteristics as accurately as possible. In this context, models based on single and multiple variables are compared. Finally, the effects of the results on the accuracy of accounting and their applicability in practice are considered.


2 Theoretical background

2.1 Assignments of logistics activities

Activity-based costing (ABC) is utilised to achieve more accurate assignments of costs to different products. In traditional costing, overhead costs were allocated more arbitrarily: for instance, in proportion to direct work hours or in relation to the percentage of direct costs. In activity-based costing, many drivers are typically utilised to reach as fair an assignment as possible. This is especially important in circumstances where products do not consume resources equally. Three things should be considered in selecting suitable drivers: 1) effect on behaviour, 2) reliability of measurement and 3) costs of measurement (Geiger 1999). At its best, a good driver motivates employees to reduce costs; at worst, it directs them towards undesired behaviour. When the number of drivers increases, in most cases more accurate results are achieved. On the other hand, at the same time the costs of acquiring driver information grow, especially if the information is acquired and entered manually. Many companies underestimate the laboriousness of acquiring the information required for an ABC system. Activity cost drivers are the key innovation of ABC systems. The disadvantage is that defining activity cost drivers is in many cases the most expensive and hardest part of the whole ABC project (Kaplan and Atkinson 1998, p. 110; Lahikainen and Paranko 2001). According to Gunasekaran et al. (1999), the level of desired accuracy should be based on a company's strategic objectives. Cooper (1989) warns that, in environments where different products consume very unequal amounts of activities, the selection of activity cost drivers should be done very carefully. An activity cost driver which treats all products evenly may dramatically undermine the accuracy of accounting. Driver information may be utilised to reduce costs by guiding interest to the right targets. However, Johnson et al. (1991) point out that looking at driver information only may lead decision-makers onto the wrong tracks. It must always be remembered that the existence of an activity should be questioned: often there are several ways to achieve the desired final results. Kaplan and Atkinson (1998) have classified activity drivers into three classes: transaction, duration and intensity drivers.
An activity cost driver is a quantitative measure of the output of an activity. The selection of an activity driver reflects a subjective trade-off between accuracy and the cost of measurement. The aim is to select an appropriate driver which reflects the real consumption of the resources as well as possible. The activities of warehousing incur a remarkable part of the costs of a logistics process. On a general level, these activities are: 1) receiving, 2) put-away, 3) storage, 4) order picking, 5) packing, marking and staging and 6) shipping (Roth and Sims 1991). Depending on the accounting context, it is possible to use a finer or coarser classification. In logistics activities, a transaction driver (e.g. the number of products or rows handled) is typically used (see e.g. Fernie et al. 2001). Transaction drivers are the least expensive type of cost driver, but they can be the least accurate because they assume that the same quantity of resources is required every time an activity is performed (Kaplan and Atkinson 1998). In real life, this is very seldom the case. Using weight indexes is one way to simplify the cost assignment phase (Kaplan and Atkinson 1998). In this approach, an individual activity is divided into different levels and weighted by weight factors that indicate the time required by each of the levels (Lahikainen and Paranko 2001). In the logistics environment, when the number of different items and of alternative ways to handle different products grows, even weight indexes oversimplify the situation. Also, updating weight indexes for tens of thousands of items would be overly laborious. Themido et al. (2000) suggest using simple statistical techniques in order to correlate output with alternative activity drivers. This can be seen as a surrogate driver (cf. Raffish and Turney 1991). Themido et al. actually claim that it is not unusual that the models are based on single or even multiple regressions. According to Babad et al. (1993), a cost driver is an event, associated with an activity, that results in the consumption of a firm's resources. If the resource consumption is directly proportional to time, it might be more convenient to use duration drivers instead of transaction ones. The problem has been that the measurement of durations has been too laborious or even impossible. Therefore, when seeking to optimise the accuracy and costs of measurement, a coarse driver has typically been selected.
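The weight-index approach described above can be sketched as follows. The activity levels, weight factors and cost figures below are invented for illustration only; the idea is simply that items at a "heavier" level carry a proportionally larger share of the activity's cost.

```python
# Hedged sketch of a weight-index cost assignment: an activity is split into
# levels, each weighted by a factor reflecting the time it requires, and the
# activity's cost is shared in proportion to the weighted volume handled.
# All levels, factors and costs are hypothetical.

weights = {"easy": 1.0, "normal": 2.5, "hard": 6.0}  # relative time factors

def assign_costs(activity_cost, items):
    """items: list of (item_id, level). Returns cost per item."""
    total_weight = sum(weights[level] for _, level in items)
    return {item_id: activity_cost * weights[level] / total_weight
            for item_id, level in items}

costs = assign_costs(100.0, [("A", "easy"), ("B", "hard"), ("C", "normal")])
print(costs)  # "hard" items carry a proportionally larger share
```

Note that the factors must be re-estimated whenever handling methods change, which is exactly the maintenance burden the text points out for large assortments.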


2.2 Towards more accurate accounting

The laboriousness of data collection and analysis may diminish the usefulness of cost management. In this chapter we present two different methods which may ease operational cost management: automatic data collection (ADC) and work measurement. ADC is a common name for methods which help to collect information from processes with minimum manual effort. Today's information systems ease the collection of information, and with the help of bar codes and RFID technology the products in the material flow can even be individually recognised (Smith and Offodile 2002). ADC also enables large-scale information collection without human errors. Improved data accuracy, more rapid availability, better managerial decisions, improved job performance and an improved response rate to changes in production schedules have been considered the benefits of ADC (Christoph et al. 1991). These benefits are reflected positively in productivity. Rossetti and Clark (2003) offered an example of how to utilise automatically collected time data. To aid scheduling and capacity planning, time stamps were collected from machine centre arrival and departure events. They created a regression model which calculated the operating time for each product type. With the help of ADC, real-time process information can be linked to cost information, and this results in automatic and very accurate cost accounting. However, there is often a desire to know logistics costs in advance. For the needs of pricing, for instance, this real-time information is not sufficient: the variation of the real process must be smoothed. One possibility for smoothing is work measurement. Work measurement is used in many industries to eliminate inefficiency, to reduce operational costs and to increase productivity. The aim is to find the most efficient way to complete a given task. It also offers a possibility to find objective grounds for determining the cost of a single work stage.
In addition, the information gained could help in cost assignment. Traditionally, work measurement is separated into two main techniques: time studies and engineered approaches. In both techniques, tasks are split into very small parts, which are examined in detail. In time studies, the duration of a task or its part and the task outputs are measured continuously or sporadically. Engineered approaches apply either the times of similar stages or general predetermined motion-time systems. An example of an expert-defined predetermined motion-time standard is MOST (Maynard Operation Sequence Technique) (Michaels 1989). The drawbacks of many work measurement techniques are that they can be time-consuming and costly (Failing et al. 1988). Especially in environments in which the number of items is large and different items consume resources differently, work measurement may be laborious. Furthermore, the standards may quickly become obsolete because of the ever-changing product mix (Gray 1992). A high turnover of the product mix is typical of a logistics environment. An alternative to a proper time-study approach is to recognise the key variables of the different stages and examine the total time of the task by means of these key variables. Multivariate techniques (e.g. multiple regression analysis) can be used to examine how the variables affect the total time (Gray 1992). Rather than breaking tasks into smaller elements, multivariate techniques utilise variables to create a predictive model of the total time. This implies that the work needed for measurement is reduced and, similarly, the costs of measurement decrease. In addition, both the total time and the variables can easily be attained by means of ADC, which makes the data collection even simpler.


3 Producing the data

3.1 Developing an automatic time-based costing system

The roots of the study are in the development project of an accounting system in the case company, an electronics wholesaler. Characteristic of this industry is that the gross margin is extremely low, which makes cost effectiveness one of the cornerstones of the business. Due to the demand for cost effectiveness, the logistics process must be smooth and mistakes cannot be tolerated. The number of different items is in the tens of thousands and the product life-cycles are very short, varying typically from a few months to one year. Because of the large and continually changing product assortment, it is not possible to control the costs intuitively. In order to increase visibility, more data must be collected and analysed with new methods. The project was started on this basis, aiming at building an accounting system that monitors costs on the level of individual product ID codes. This was made possible by the company's advanced information system, which is able to track the material flow accurately. Data collection is possible in connection with many actions: scanning a bar code, pressing a button or clicking a mouse, and starting or ending a transfer in the automation system. An essential piece of information which can be attained is the time of the event. This time data was utilised by the accounting system in assigning activities to products. The logistics process was divided into thirty activities. In the activity definition, it was important that each activity was bounded by transaction points that can gather information. This may be challenging, because the transaction points are not necessarily the natural starting and ending points of the activity. Basically, transaction points can be added to the process by, for example, scanning codes or pressing a button more often, but the cost of data collection limits the accuracy to a reasonable level.
Each activity's total cost per time period was calculated by using activity-based costing, whose basic idea is depicted in Figure 1. Because the idea of the system was to utilise time as a cost driver, a practical capacity in seconds was defined for each activity. For example, in an activity where labour is the bottleneck resource, the activity's capacity was set to the available labour seconds in a time period. On the basis of these pieces of information, a cost per second was calculated for each activity.
[Figure 1: Costs (accounts) → Resources → Activities → Cost objects (ID codes)]

Figure 1. The basic idea of Activity-Based Costing.
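The time-based rate calculation described above can be sketched as follows: an activity's period cost is divided by its practical capacity in seconds, giving a cost-per-second rate that events then consume by their duration. The figures below are hypothetical, not taken from the case company.

```python
# Sketch of the time-based ABC rate calculation (all figures hypothetical):
# cost per second = activity's period cost / practical capacity in seconds,
# and an event's cost = its duration * that rate.

def cost_per_second(period_cost, practical_capacity_seconds):
    """Rate used to assign an activity's cost to events by their duration."""
    return period_cost / practical_capacity_seconds

def event_cost(duration_seconds, rate):
    """Cost assigned to one event, e.g. one picking event."""
    return duration_seconds * rate

# Example: an activity costing 12,000 EUR per period with 600,000 available
# labour seconds gives a rate of 0.02 EUR/s.
rate = cost_per_second(12_000, 600_000)
print(round(event_cost(45, rate), 2))  # a 45-second event costs about 0.9 EUR
```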

From the information system's point of view, an activity means everything that is done between two transaction points. Each time an item passes a transaction point, an event row is recorded in the information system. An event row includes, among other information, the starting and ending time of an event. Each consecutive pair of transaction points represents an activity that has a cost per second. An event is also linked with information on the product ID codes and the number of items being handled. A product ID code gives access to other product information such as weight, volume and product group.

3.2 Identifying the key variables of the case activity

Picking was chosen as the case activity for further research. It is performed at five picking stations in two shifts. Picking starts after products are automatically transferred to the picking stations by a conveyor in plastic boxes. The number of items indicated by the information system is picked, receipted into the system, padded if needed and packed into customer boxes. Picking was chosen because it includes a relatively large amount of manual work and is for that reason a potential source of variation in an otherwise quite automated environment. Furthermore, it is one of the most expensive activities, which makes it a natural focus of interest in the case company. At first glance, picking seems quite a simple activity: products are moved from one box to another. However, it can be divided into several subtasks (see Figure 2), where case-specific variation takes place. The products can be receipted in three ways. In the two simple ways, all the products are receipted by scanning one common EAN code or by pressing a button and feeding the amount into the system. In the hard way, a unique serial number must be scanned from the side of each item. Different products also have different packing needs: some must be carefully padded and wrapped, while others do not need such actions at all. Moving products can be more or less difficult, intuitively depending on the weight, volume and shape of the items. Finally, an employee may get a note on his/her screen for additional handling. This may mean, for example, that the products must be supplied with additional parts or handled with extra features.

Receipting: How are the products receipted? How many receipts are made?

Padding: Do the products need padding? How many units?

Moving to boxes: How difficult are the products to handle? How many units?

Additional handling: Does a handling instruction exist? How many units?

Figure 2. The picking activity divided into subtasks and the questions that determine the duration of subtasks.

Monitoring the causes and effects of variation in an activity requires examining it on the level of subtasks. Following the duration of each subtask would be technically demanding and costly and would complicate the actual work unacceptably. Approximately the same information can be attained by identifying the key variables related to the subtasks and using them to explain the total duration of the whole activity. Every subtask consists of how or whether the task is done and the amount being treated. The receipting subtask consists of the way the products are receipted and the number of receipts (1). Additional handling consists of whether it exists and the number of units. The difficulty of moving items into customer boxes cannot be measured accurately. It is approximated by using the total weight, total volume and product group. The product group also approximates the need for padding, because the groups are quite homogeneous when it comes to fragility. Every variation in an activity is recorded in the data as a different combination of variables, depending on the characteristics of the products that are handled. Overall, the following variables were included in the data, each representing a part of the information content of a subtask:

Number of units (How many units?)
Total weight (How difficult are the products to handle?)
Total volume (How difficult are the products to handle?)
Additional handling? (Does a handling instruction exist?)
Receipt method (How are the products receipted?)
Number of receipts (How many receipts are made?)
Product group (How difficult are the products to handle? Do the products need padding?)

(1) If the receipting method is a serial number, the number of receipts is the number of units. In the other two cases, the number of receipts is the number of different customer orders.

Data consisting of the above-mentioned variables were collected from the picking activity. The number of events totalled 1449, where one event means picking items from one plastic box into one or several customer boxes. Since one plastic box can contain only one type of product, each item handled in an event is similar and is also handled similarly. An example of data containing the variables is depicted in Table 1. Naturally, each event was also linked with its duration, which was calculated as the difference between the starting and ending times. The starting transaction of a picking event occurs when the plastic box arrives at the picking station, and the ending transaction occurs when the plastic box leaves. It is assumed that the time between the transactions consists mainly of effective work.
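As a sketch of how an event's duration is derived from the recorded transaction timestamps, assuming illustrative field names and ID codes (the real event rows carry more information):

```python
# Sketch: deriving event durations from automatically collected transaction
# timestamps. Each event row carries a start and an end time; the activity
# duration is simply their difference. Field names and values are illustrative.
from datetime import datetime

events = [
    {"id_code": "A123", "start": "2004-03-01 09:15:02", "end": "2004-03-01 09:15:16"},
    {"id_code": "B456", "start": "2004-03-01 09:16:00", "end": "2004-03-01 09:16:33"},
]

FMT = "%Y-%m-%d %H:%M:%S"
for e in events:
    e["duration_s"] = (datetime.strptime(e["end"], FMT)
                       - datetime.strptime(e["start"], FMT)).total_seconds()

print([e["duration_s"] for e in events])  # [14.0, 33.0]
```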
Table 1. An example of the collected data.
Time [s] | # of receipts | # of units | Total weight [g] | Total volume [cm^3] | Additional handling | Receipt method | Product group
14   | 1 | 1 | 150  | 385  | 0 | 2 | Product group 6
33   | 1 | 4 | 560  | 1539 | 0 | 2 | Product group 6
144  | 1 | 1 | 470  | 2239 | 0 | 1 | Product group 4
213  | 3 | 3 | 3000 | 1944 | 1 | 1 | Product group 1
318  | 1 | 1 | 500  | 1425 | 0 | 1 | Product group 2
12   | 1 | 1 | 1300 | 6615 | 0 | 1 | Product group 3
11   | 1 | 1 | 30   | 175  | 0 | 2 | Product group 9
186  | 1 | 1 | 2000 | 6426 | 0 | 1 | Product group 3
168  | 1 | 1 | 635  | 2728 | 1 | 0 | Product group 3

The distribution of event durations is far from normal, being skewed towards short events (see Figure 3). Some events can be seen where the duration has been long despite the proportionally small number of units handled. The standard deviation is high in proportion to the average duration, which indicates significant variation. Clarifying the causes and effects of the variation has long been a matter of interest in the case company.
[Histogram of event durations: number of events vs. time [s]]
Number of events: 1449
Average time: 44.89 s
Maximum time: 645 s
Minimum time: 6 s
Standard deviation: 57.78 s

Figure 3. The frequency of event duration and the basic information of the distribution.


4 Analysing the data

4.1 The effects of the variables on activity duration

The total time spent on a working task is typically of the form a + bx, where a is the setup time, b is the time per unit and x is the number of units. On the other hand, if an activity consists of multiple subtasks, the total time is the sum of multiple setup times and times per unit. Since the connection between time and the number of units is approximately linear, the effect of the different variables can be examined with the help of linear regression. A basic single-variable regression is used to determine the effect of each independent variable on the dependent variable, which in this case is time. Examining all the event times as a function of the number of units (see Figure 4) reveals that the variability is very high. The regression line shows that the number of units increases the duration, but the actual degree seems to be quite coincidental. Also, the correlation between time and the number of units remains poor (R² ≈ 11%). Furthermore, what is eye-catching is the large number of events containing only a few units but still resulting in quite a long duration. Despite the intuitive assumption that the picking time depends on the number of units being handled, it cannot explain much of the variance.
[Scatter plot: Time [s] vs. Number of units, with regression line; R² = 0.110]
Figure 4. The picking time as a function of number of units. The R2 value is low, which means that the number of units explains only approximately 11 percent of the variance.
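The single-variable analysis can be reproduced in miniature with ordinary least squares. Here the nine example events of Table 1 stand in for the full 1449-event data set, so the exact R² differs from the paper's; the point is the mechanics of fitting time on the number of units and reading off the explanatory power.

```python
# Minimal single-variable OLS of picking time on number of units, using the
# nine example events of Table 1 (not the full 1449-event data set).
import numpy as np

units = np.array([1, 4, 1, 3, 1, 1, 1, 1, 1], dtype=float)
time_s = np.array([14, 33, 144, 213, 318, 12, 11, 186, 168], dtype=float)

X = np.column_stack([np.ones_like(units), units])  # model: time = a + b*units
beta, *_ = np.linalg.lstsq(X, time_s, rcond=None)  # least-squares [a, b]
pred = X @ beta
r2 = 1 - np.sum((time_s - pred) ** 2) / np.sum((time_s - time_s.mean()) ** 2)
print(round(r2, 3))  # very low R^2: units alone explain little of the variance
```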

An independent variable can have a so-called main effect on the dependent variable in itself, but also interaction effects with other independent variables. An interaction effect means that the impact of one independent variable on the dependent variable is influenced by the level of another independent variable. Interactions must be included in an analysis, because in many situations they may have the biggest impact on the result (Aiken and West 1991, pp. 1-8). In this case, a typical interaction occurs between the variables related to certain subtasks and the number of units. The subtask-related variables are categorical in nature. They describe whether (yes, no) or how (method 1, method 2, etc.) a certain subtask is carried out. Categorical variables and their special characteristics in regression analysis have been studied by, for example, Hobson (1969), who argued that their effects on continuous variables should be taken into account. An interaction between a categorical and a continuous variable can be described by plotting a regression line for each category. If the lines are parallel, the categorical variable has only a main effect on the dependent variable. If the slopes of the lines differ, an interaction effect is present (Aiken and West 1991, pp. 116-138). The first categorical variable, receipt method, explains how the products are receipted into the information system. Pressing a button or scanning one EAN bar code are non-recurring tasks, and the number of units does not have a significant effect on the duration of the receipting task. Scanning a serial number from each item, however, must be repeated for each item. It can be seen in Figure 5 that the effect of the number of units on duration is strongly dependent on the receipting method. The second variable examined relates to additional handling. The larger the number of units, the greater the effect of additional handling on the duration of the activity. This means that additional handling has an interaction effect with the number of units on picking time.
[Two scatter plots of Time [s] vs. Number of units: on the left grouped by receipt method (button, EAN, SN), on the right by additional handling (yes, no)]
Figure 5. The effects of receipt method (on the left) and additional handling (on the right) on picking time as a function of number of units.
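One way to check for such an interaction numerically, rather than visually, is to fit a separate regression line of time on units within each category and compare the slopes: clearly different slopes indicate an interaction. The events below are invented for illustration, mimicking the pattern in Figure 5.

```python
# Sketch of detecting an interaction effect by comparing per-category slopes.
# The (units, time) pairs below are hypothetical, chosen to mimic the pattern
# described in the text: button receipting is a one-off task, while scanning
# a serial number must be repeated for each item.
import numpy as np

def slope(x, y):
    """Slope of the least-squares line of y on x."""
    return np.polyfit(np.asarray(x, float), np.asarray(y, float), 1)[0]

button_events = ([1, 5, 10, 20], [10, 12, 11, 13])   # near-flat: one press
serial_events = ([1, 5, 10, 20], [12, 40, 75, 150])  # grows with each scan

b_slope = slope(*button_events)
s_slope = slope(*serial_events)
print(b_slope, s_slope)  # a much steeper slope for serial-number receipting
```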

Probably the most interesting findings concerned the differences between product groups. A product group approximates the need for padding and the difficulty of moving and generally handling the items. Figure 6 reveals that there are differences between product groups in setup times, but especially in times per unit. With respect to the activity cost driver hierarchy, these results reveal that an activity can be a unit-level or a batch-level one depending on what kind of product is being handled. Compared to the other categorical variables under examination, the differences between product groups are the largest. This implies that the product group-related subtasks, such as padding, have a significant role in determining the activity's duration.

[Scatter plot: Time [s] vs. Number of units, with one regression line for each of Product groups 1-10]
Figure 6. The effects of the product groups on the picking time as a function of number of units.

Based on the aforementioned figures, it is reasonable to assume that the variables are essential. Furthermore, it can be seen that there are interactions between the number of units and the other variables, which makes this activity more complicated. The variation is caused by both different types of products and different types of methods. An essential observation is that there is no single variable that could sufficiently explain the variability in activity duration. Instead, multiple variables are needed.

4.2 Predicting the activity duration

The automatic costing system produces a large amount of data for analysing the behaviour of an activity afterwards. However, it is often desirable to know the duration of an activity for a certain amount of a certain product beforehand, for example for planning purposes or for assessing the costs. This can be problematic if the product is new and there is no experience of its behaviour. In that case, the time must be estimated. There are several methods that aim at estimating the cycle time of an activity (Chung and Huang 2002): simulation, statistical analysis methods, analytical methods and hybrid methods. Simulation is often used for modelling a complicated system and producing a large amount of data. Statistical analysis methods, regression analysis for example, utilise historical data to determine the relationship between time and related parameters. Analytical methods are often based on queuing theory and existing deviations. Hybrid methods combine some or all of these methods. The examined activity is too complicated for an analytical solution, and the amount of data is sufficient, which makes simulation unnecessary. Regression analysis is a good starting point for constructing an estimation model in the current situation. Regression analysis, like any other method, is a simplification of real life and holds some assumptions that must be borne in mind. Firstly, the residuals, the differences between the actual and forecast values of the dependent variable, should be normally distributed with zero mean and constant variance. Secondly, the residuals, as well as the independent variables, must not be correlated with each other (Wang 1993). Additionally, in linear regression the basic assumption is that the relationship between the dependent variable and the independent variables is linear. Even though the actual collection of data is quite accurate, an event may be interrupted for many reasons.
In the case of the picking activity, possible sources of interruption are, for example, malfunctions of the automation or information systems and short breaks. Because the case activity's duration is typically short, interruptions may cause severe distortion in the data. These diverging observations, often called outliers, may seriously weaken the explanatory power of a model based on historical data. Detecting and filtering outliers from regression data has been a widely researched topic (see e.g. Srivastava and von Rosen 1998). One way to prevent outliers from corrupting the data is to analyse the residuals, either visually or on the basis of indexes, and eliminate the data points where the results are unusual. Another solution is to use robust regression methods, which are designed to tolerate diverging observations. In this case, outliers were eliminated by making a preliminary model and eliminating the events where the forecast time differed from the actual time by more than three standard deviations. The final model was based on data from which the possible outliers were filtered.

In multiple linear regression, where n observations are made and the number of variables is p-1, the problem is of the form y = Xβ + ε, where y (n×1) is the dependent variable vector, X (n×p) is the independent variable matrix, β (p×1) is the coefficient vector depicting the effect of the variables, and ε is a normally distributed random vector depicting the error associated with the observations. The basic idea behind regression analysis is to find the estimate of the coefficient vector that minimises the forecasting error of the model. With the help of the coefficient vector, new values of y can be estimated. The number of original variables collected from the case activity was seven. Because some of the variables are categorical and because of the interactions, new variables had to be added. The most frequently used procedure for representing categorical variables in regression is dummy variable coding (Aiken and West 1991, pp. 116-127). For a categorical variable containing k options, k-1 dummy variables are created. A dummy representing category A is given the value 1 if A exists and 0 otherwise. In order to avoid singularity of the data matrix, one category must stand as a reference for the others and not have a dummy variable at all. Previously, interactions between the categorical variables and the number of units were observed. That is why each interaction has to get a separate term (Aiken and West 1991, pp. 123-126). The variables used in the model are represented in Table 2.
Table 2. The variables used in the model and their possible values.
Variable                                 Possible values
[# of units]                             N
[Total weight]                           R+
[Total volume]                           R+
[Additional handling]                    0 = no, 1 = yes
[Additional handling]*[# of units]       N
[Receipt method SN]                      0 = no, 1 = yes
[Receipt method EAN]                     0 = no, 1 = yes
[# of receipts]                          N
[Receipt method SN]*[# of receipts]      N
[Receipt method EAN]*[# of receipts]     N
[Product group 2]                        0 = no, 1 = yes
[Product group 10]                       0 = no, 1 = yes
[Product group 2]*[# of units]           N
[Product group 10]*[# of units]          N
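The dummy coding and interaction terms described above can be illustrated in code. The sketch below is illustrative only: the column names and the small event table are invented for the example, and pandas' `get_dummies` with `drop_first=True` stands in for the k-1 coding with a reference category.

```python
import pandas as pd

# Hypothetical raw picking events; the column names and values are
# invented for illustration only.
events = pd.DataFrame({
    "duration_s":     [26, 210, 45, 38],
    "units":          [20, 20, 5, 8],
    "total_weight":   [1000, 10000, 400, 900],
    "total_volume":   [1000, 10000, 300, 700],
    "extra_handling": [0, 1, 0, 0],
    "receipt":        ["EAN", "SN", "EAN", "none"],
    "product_group":  [7, 10, 2, 7],
})

# k categories -> k-1 dummies: drop_first leaves one reference category,
# which keeps the design matrix from becoming singular.
X = pd.get_dummies(events[["receipt", "product_group"]].astype(str),
                   drop_first=True, dtype=float)
for col in ["units", "total_weight", "total_volume", "extra_handling"]:
    X[col] = events[col].astype(float)

# Each categorical effect that interacts with batch size gets its own
# separate interaction term (dummy * number of units).
for col in [c for c in X.columns if c.startswith("product_group_")]:
    X[col + "_x_units"] = X[col] * X["units"]
X["extra_handling_x_units"] = X["extra_handling"] * X["units"]
```

The resulting data frame can be fed directly to any least-squares routine as the independent variable matrix X.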


The actual regression analysis was made with SPSS for Windows software. The summary results for the model based on all variables are presented in the first row of Table 3 and the model coefficients in Appendix A. The explanatory power (R Square) of the model is 50 %, which cannot be considered very high. However, according to Wang (1993), a good model is one that is based on a sound theory. In this case the result was expected, because the study concentrated only on the variation caused by different product characteristics and the working methods depending on them. In other words, this is the variance that can with good reason be assigned to products as costs that vary from one product to another. The variance caused by other factors, such as labour and environment, is also worth examining for process improvement purposes. According to preliminary estimations, the differences in labour effectiveness can be up to twofold, which makes the remaining 50 % of the variation explicable.


Table 3. The model summary: all variables vs. the number of units only.

Model                  R      R Square  Adjusted R Square  Std. Error of the Estimate
All variables          0.707  0.500     0.490              29.841
Number of units only   0.332  0.110     0.110              54.521

A better point of comparison is a model that uses only the number of units as the basis for estimation. This would be the basis of forecasts if a transaction-based driver were used. Another regression model, based only on the number of units, was built; its results can be seen in the second row of Table 3. The number of units explains only 11 % of the variation. Both models are statistically significant according to the F-test, but the model based on all variables has a considerably better explanatory power and a lower standard error. On this basis, it is justifiable to criticise the use of a single variable in estimation.
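The comparison between the two models, including the three-sigma outlier filtering described earlier, can be sketched on synthetic stand-in data (the real event data are not reproduced here; all numbers below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the picking events: duration grows with the
# number of units, and much faster for one product group.
n = 500
units = rng.integers(1, 40, n).astype(float)
group10 = rng.integers(0, 2, n).astype(float)
y = 30 + 1.0 * units + 8.0 * units * group10 + rng.normal(0, 10, n)

def fit_ols(X, y):
    """Ordinary least squares with intercept; returns beta, residuals, R^2."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return beta, resid, 1 - resid.var() / y.var()

# Preliminary fit, then drop events whose residual exceeds three
# standard deviations -- the outlier filtering described in the text.
X_all = np.column_stack([units, group10, units * group10])
_, resid, _ = fit_ols(X_all, y)
keep = np.abs(resid) < 3 * resid.std()

_, _, r2_all = fit_ols(X_all[keep], y[keep])                    # "all variables"
_, _, r2_units = fit_ols(units[keep].reshape(-1, 1), y[keep])   # "units only"
print(r2_all > r2_units)  # the richer model explains more of the variation
```

As in Table 3, the units-only model leaves most of the variation unexplained whenever other variables (here a group interaction) drive the duration.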
Table 4. An example of estimations of the picking time. The estimation based on the number of units underestimates difficult products.

Model                  Product type       Variables                                           Estimated picking time [s]
All variables          Easy product       [# of units] = 20, [Total weight] = 1000,
                                          [Total volume] = 1000, [Additional handling] = 0,
                                          [Receipt method EAN] = 1, [# of receipts] = 1,
                                          [Product group 7] = 1                               26
All variables          Difficult product  [# of units] = 20, [Total weight] = 10000,
                                          [Total volume] = 10000, [Additional handling] = 1,
                                          [Receipt method SN] = 1, [# of receipts] = 20,
                                          [Product group 10] = 1
Number of units only   Any product        [# of units] = 20
An exemplary calculation in Table 4 shows the implications of the results when estimating picking times on the basis of either all available variables or the number of units only. The estimation was made for a batch of twenty units to be handled. When all the variables are considered, picking times can differ almost tenfold depending on the characteristics of the product. If only the number of units is used as the basis of estimation, the result is the same for every product regardless of its characteristics. The latter estimate is typically too low, underestimating the laboriousness of difficult products.
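The "easy product" estimate of Table 4 can be reproduced approximately from the coefficients published in Appendix A (Table 5); the small gap to the reported 26 s comes from the rounding of the appendix coefficients. Terms that are zero for this batch (additional handling, SN receipting, the [Total volume] coefficient) are left out of the sketch:

```python
# Picking-time estimate for the "easy product" batch of Table 4, using
# the rounded coefficients of Appendix A (Table 5).
coef = {
    "constant":               26.417,
    "units":                   0.114,
    "total_weight":            0.002,
    "receipt_EAN":           -10.069,
    "n_receipts":              5.558,
    "receipt_EAN_x_receipts":  9.381,
    "group7":                 -7.500,
    "group7_x_units":         -0.137,
}

t = (coef["constant"]
     + coef["units"] * 20                  # [# of units] = 20
     + coef["total_weight"] * 1000         # [Total weight] = 1000
     + coef["receipt_EAN"] * 1             # [Receipt method EAN] = 1
     + coef["n_receipts"] * 1              # [# of receipts] = 1
     + coef["receipt_EAN_x_receipts"] * 1  # interaction term
     + coef["group7"] * 1                  # [Product group 7] = 1
     + coef["group7_x_units"] * 20)        # interaction term
print(round(t, 1))  # ~25.3 s, close to the 26 s reported in Table 4
```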


The main objective of the paper was to study how large amounts of automatically collected data should be analysed in order to understand the cost behaviour of an activity. Both the analysis of an activity's behaviour afterwards and the forecasting of its duration beforehand were considered. In the wholesale environment, where the gross margin is extremely low, every penny is worth accounting for. Due to the wide variety of products with different characteristics and needs, transaction-based drivers may not be accurate enough for assigning the costs of logistics activities. For most resources in logistics, time drives costs, and time should therefore also be used as a driver in activity assignment. The time usage of a large and constantly renewing product assortment is impossible to monitor manually. Automatic data collection provides a useful tool for building a time-based accounting system.


Especially in a logistics environment, where the process is straightforward and identification is recurrent, data are easily available for cost accounting purposes. However, extra care is necessary in defining activities in order to attain the desired information related to them. RFID technology provides interesting possibilities for accounting in the future. It remains to be seen whether it will be possible to follow the products' paths even more easily and accurately, and also to record cost information to RF tags in real time.

If time drives costs in logistics, the key question for accounting should be what drives time. The information collected provides excellent possibilities for a profound analysis of activities. By splitting an activity into subtasks and identifying the related key variables, it is possible to trace the causes and effects of variation in activity duration. Data containing activity durations and the key variables were collected from a case activity, picking products into customer boxes. The data were analysed and some interesting observations were made. Some products consume the subtasks differently and require different working methods. This causes significant variation in picking times on both the batch and unit level. For some product groups the picking time seemed to be almost independent of the number of units handled, while for others a growing number of units increased picking times dramatically.

A forecasting model was built to estimate the picking time beforehand for a product batch with certain characteristics. Multiple linear regression was used because of its suitability for situations where plenty of historical data are available. The explanatory power of the model remained quite weak, which was expected, since only the product-related variation was studied. The main purpose, however, was to compare the estimation based on all variables with the estimation based on the number of units only.
The explanatory power of the latter model was very poor, and the standard error of the estimate was very high when compared to the average duration of the activity. With multiple variables, a considerably better explanatory power and a smaller standard error were reached. The differences in the estimates can be very large if the number of units is high and the products have different characteristics. Based on these results, it can be argued that using only one variable in forecasting the activity driver quantity may lead to poor results. Despite this, a single variable, such as the number of units, is commonly used in this kind of environment. It was clearly demonstrated that it is possible to increase the accuracy of accounting by collecting more data and adding more variables to the cost estimation.

The question still remains whether this result is meaningful. One issue to be considered is how large the cost differences in picking are when compared to other costs, such as the purchase prices of the items. In the case company, the purchase prices vary from cheap bulk to high-end specialty products, and it is not at all obvious that an expensive product and high picking costs go hand in hand. This study considered only one activity, but a typical product in the case company consumes around ten activities. The cumulative effect of rather small variations could result in dramatic differences in total costs. An effort should be made to examine this variation in total product costs and compare it with the purchase prices of the items.

Completely automatic accounting is not commonplace yet. Even though collecting the data would be automatic, analysing and interpreting it is challenging and time-consuming. In a dynamic environment, such as electronics wholesale, an accounting system needs constant updating: a new analysis should be made each time the product assortment or the way an activity is performed changes. Despite the challenges, an accounting system based on automatic data collection offers the potential for more accurate accounting and for tracing the underlying causes and effects of variability.


References

Aiken, L. S. and West, S. G. (1991). Multiple regression: Testing and interpreting interactions. Thousand Oaks, Sage Publications.
Babad, Y. M. and Balachandran, B. V. (1993). "Cost driver optimization in activity-based costing." Accounting Review 68(3): 563-575.
Christoph, O. B., Stevens, S. P. and Christoph, R. T. (1991). "Automatic Data Collection Systems: Observed Benefits and Problems." International Journal of Operations & Production Management 12(5): 57-68.
Chung, S.-H. and Huang, H.-W. (2002). "Cycle time estimation for wafer fab with engineering lots." IIE Transactions 34(2): 105-118.
Cooke, J. (2002). "Inventory velocity accelerates." Logistics Management 42: 33-38.
Cooper, R. (1989). "The Rise of Activity-Based Costing - Part Three: How Many Cost Drivers Do You Need, and How Do You Select Them?" Journal of Cost Management 3(Winter): 34-46.
Failing, R. G., Janzen, J. L. and Blevins, L. D. (1988). "Work measurement techniques." Journal of Accountancy 165(4): 104-108.
Fernie, J., Freathy, P. and Tan, E. (2001). "Logistics Costing Techniques and their Application to a Singaporean Wholesaler." International Journal of Logistics: Research and Applications 4(1): 117-131.
Geiger, D. R. (1999). "Practical issues in cost driver selection for managerial costing systems." The Government Accountants Journal 48(3): 32-39.
Graham, D. D. (2003). "Warehouse of the Future." Frontline Solutions 4: 20-26.
Gray, C. F. (1992). "An integrated methodology for dynamic labor productivity standards, performance control and system audit in warehouse operations." Production and Inventory Management Journal 33(3): 63-66.
Gunasekaran, A., Marri, H. B. and Yusuf, Y. Y. (1999). "Application of activity-based costing: some case experiences." Managerial Auditing Journal 14(6): 286-293.
Hobson, T. F. J. (1969). "Regression analysis with categorical regressor variables." The Statistician 19(2): 153-161.
Johnson, H. T., Vance, T. P. and Player, R. S. (1991). "Pitfalls in Using ABC Cost-Driver Information to Manage Operating Costs." Corporate Controller (Jan/Feb): 26-32.
Kaplan, R. S. and Atkinson, A. A. (1998). Advanced management accounting. Upper Saddle River, Prentice Hall.
Lahikainen, T. and Paranko, J. (2001). "Easy Method for Assigning Activities to Products - an Application of ABC." 5th International Seminar on Manufacturing Accounting Research, Pisa, Italy, EIASM.
Michaels, E. A. (1989). "Work Measurement." Small Business Reports 14(3): 55-63.
Pohlen, T. L. and La Londe, B. J. (1994). "Implementing activity-based costing (ABC) in logistics." Journal of Business Logistics 15(2): 1-23.
Raffish, N. and Turney, P. B. B. (1991). "Glossary of Activity-Based Management." Journal of Cost Management 5(3): 53-64.
Rossetti, M. D. and Clark, G. M. (2003). "Estimating operation times from machine center arrival and departure events." Computers & Industrial Engineering 44: 493-514.
Roth, H. P. and Sims, L. T. (1991). "Costing for Warehousing and Distribution." Management Accounting (August): 42-45.
Smith, A. D. and Offodile, F. (2002). "Information management of automatic data capture: an overview of technological developments." Information Management & Computer Security 10(2/3): 109-118.
Srivastava, M. S. and von Rosen, D. (1998). "Outliers in multivariate regression models." Journal of Multivariate Analysis 65: 195-208.
Themido, I., Arantes, A., Fernandes, C. and Guedes, A. P. (2000). "Logistics costs case study - an ABC approach." Journal of the Operational Research Society 51: 1148-1157.
van Damme, D. A. and van der Zon, F. L. A. (1999). "Activity Based Costing and Decision Support." The International Journal of Logistics Management 10(1): 71-82.
Wang, G. C. S. (1993). "What you should know about regression based forecasting." The Journal of Business Forecasting Methods & Systems 12(4): 15-21.


Appendix A

The variables and the coefficients of the model based on all variables are presented in Table 5. The model coefficients did not seem to be very informative, and some of them were not statistically significant. In particular, the product group variables seemed to partly explain some other variables, such as the need for additional handling or the receipt method. Because of this, the coefficients are biased and their values do not reflect the actual effects of the variables. Examining the effect of a single variable requires a separate analysis for each variable, as was done in Chapter 4.1. For forecasting purposes, however, using all the variables is justifiable. The coefficients of the model based on the number of units only can be found in Table 6.
Table 5. The variables and their coefficients used in the regression model based on all variables (B = unstandardized coefficient, Beta = standardized coefficient).

Variable name                           B        Std. Error  Beta     t        Sig.
(Constant)                              26.417   13.522               1.954    0.051
[# of units]                            0.114    0.466       0.024    0.246    0.806
[Total weight]                          0.002    0.001       0.082    1.943    0.052
[Total volume]                          0.000    0.000      -0.007   -0.297    0.767
[Additional handling]                   38.522   8.253       0.241    4.667    0.000
[Additional handling]*[# of units]      -9.908   2.779      -0.227   -3.566    0.000
[Receipt method SN]                     3.718    11.602      0.043    0.320    0.749
[Receipt method EAN]                    -10.069  12.028     -0.119   -0.837    0.403
[# of receipts]                         5.558    10.639      0.239    0.522    0.601
[Receipt method SN]*[# of receipts]     -1.085   10.324     -0.042   -0.105    0.916
[Receipt method EAN]*[# of receipts]    9.381    10.642      0.312    0.881    0.378
[Product group 2]                       -8.661   6.003      -0.076   -1.443    0.149
[Product group 3]                       -5.059   7.015      -0.037   -0.721    0.471
[Product group 4]                       -15.514  6.801      -0.107   -2.281    0.023
[Product group 5]                       -14.786  5.888      -0.151   -2.511    0.012
[Product group 6]                       -43.424  12.075     -0.225   -3.596    0.000
[Product group 7]                       -7.500   7.849      -0.060   -0.956    0.339
[Product group 8]                       -4.221   8.162      -0.033   -0.517    0.605
[Product group 9]                       21.727   9.021       0.114    2.408    0.016
[Product group 10]                      14.745   10.713      0.055    1.376    0.169
[Product group 2]*[# of units]          1.021    0.501       0.100    2.036    0.042
[Product group 3]*[# of units]          0.533    0.829       0.029    0.643    0.521
[Product group 4]*[# of units]          0.381    1.839       0.006    0.207    0.836
[Product group 5]*[# of units]          0.975    0.468       0.183    2.082    0.037
[Product group 6]*[# of units]          10.514   3.369       0.211    3.121    0.002
[Product group 7]*[# of units]          -0.137   1.787      -0.003   -0.077    0.939
[Product group 8]*[# of units]          3.504    2.527       0.055    1.386    0.166
[Product group 9]*[# of units]          6.634    2.992       0.075    2.218    0.027
[Product group 10]*[# of units]         12.713   3.113       0.141    4.084    0.000

Table 6. The variables and their coefficients used in the regression model based on the number of units only (B = unstandardized coefficient, Beta = standardized coefficient).

Variable name   B        Std. Error  Beta    t        Sig.
(Constant)      35.178   1.605               21.912   0.000
[# of units]    2.148    0.160       0.332   13.389   0.000
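The coefficient bias discussed in the appendix stems from correlated predictors. A variance inflation factor (VIF) check is one standard diagnostic for this; the sketch below uses synthetic data in which a product-group dummy largely determines the receipt method, mimicking the dependence noted in the text. All names and numbers are illustrative.

```python
import numpy as np

def vif(X):
    """Variance inflation factors: VIF_j = 1 / (1 - R_j^2), where R_j^2
    comes from regressing column j on the remaining columns (with intercept)."""
    X = np.asarray(X, dtype=float)
    factors = []
    for j in range(X.shape[1]):
        y = X[:, j]
        A = np.column_stack([np.ones(len(y)), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r2 = 1 - resid.var() / y.var()
        factors.append(1 / (1 - r2) if r2 < 1 else float("inf"))
    return factors

# Toy data: a product-group dummy that almost always determines the
# receipt method, so the two columns carry overlapping information.
rng = np.random.default_rng(1)
group = rng.integers(0, 2, 200).astype(float)
receipt = np.where(rng.random(200) < 0.95, group, 1.0 - group)
units = rng.integers(1, 30, 200).astype(float)
v = vif(np.column_stack([group, receipt, units]))
# group and receipt inflate each other's variance; units stays near 1
```

Large VIF values flag exactly the situation described above: the individual coefficients become unreliable even though the model as a whole remains usable for forecasting.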