
CONTRIBUTED RESEARCH ARTICLE

Using R in Continuous Assurance: Restricted Vector Autoregressive Model (RVAR) of Continuity Equations
by Erik van Kempen
Abstract Continuous assurance is a methodology for providing assurance on financial data on a near real-time basis. One of the fundamental elements of continuous assurance is continuous data auditing, in which the integrity of the data provided by the client is tested. Continuity equations can be used to evidence assertions regarding data integrity. To do so, the data is tested by predicting subsequent values with a fitting model. This paper covers the Restricted Vector Autoregressive Model (RVAR) as a method for continuous data auditing. The vars package is used to implement the model and prediction methods in R.

Introduction
Continuous assurance has been a subject of interest for auditors and financial professionals for the last three decades. However, this field of research only took off after Vasarhelyi et al. (2004) published a widely accepted conceptual framework for continuous assurance. In the following years further studies were performed in this field, but most of them focused on refining the theoretical framework and developing new and innovative analysis methods. Fully functional implementations were not yet in scope. This paper focuses on implementing continuity equations, one of the most powerful tools from the continuous assurance domain, in R. Most auditors and financial professionals are not trained to develop algorithms, code or applications. Therefore, the final implementation in R needs to be fairly easy for these target groups to understand.

Continuous assurance
"Continuous auditing [or continuous assurance] is a methodology that enables independent auditors to provide written assurance on a subject matter using a series of auditors' reports issued simultaneously with, or a short period of time after, the occurrence of events underlying the subject matter." (Canadian Institute of Chartered Accountants (CICA), 1999)

To provide assurance on a near real-time basis, auditors have to rely heavily on automated testing. Vasarhelyi et al. (2004, 2010) have defined three elements of continuous assurance and continuous monitoring:
- Continuous Control Monitoring (CCM).
- Continuous Data Auditing (CDA).
- Continuous Risk Monitoring and Assessment (CRMA).

CCM can be compared to interim testing of procedures in the conventional audit framework. CDA can be compared to final testing, which focuses more on data than on procedures. Combined, these two elements can be used to provide sufficient assurance. CRMA can be used as an additional part of the control framework, but it is not essential for providing assurance.

CDA verifies the integrity of the data flowing through the information system. The data provided by the client is the basis for all testing procedures, so data assurance forms an essential part of continuous assurance. Continuity equations are a tool from the CDA sub-domain that can be used to evidence assertions about data integrity.

Continuity equations
Continuity equations have been a fundamental part of classical physics since the 18th century. These equations describe the transport of a quantity while ensuring that this quantity (such as mass or energy) is conserved. Similar relations can be defined for the transport of quantities within a system in the financial domain. Continuity equations can describe how reported quantities move between the steps of the key business processes, for example ordered kilograms or invoiced units. The term continuity equations was coined in 1991, when Vasarhelyi and Halper (1991) modeled the flow of billing data at AT&T. Vasarhelyi and Halper proposed continuity equations more than 20 years ago. Even so, little research has been performed on their application in practice or on implementing a decent continuity equations model. Research on the Vector Autoregressive (VAR) model in particular has been rare in the last two decades. Only Dzeng (1994) and Kogan et al. (2010) have considered the VAR model in their papers.

In most businesses the flow of goods is the most important basis for revenue recognition. As such, it can be used to provide evidence for the completeness, timeliness and accuracy of the reported revenue. If the continuity equations hold for a specific business process, one can assert that there are no leakages from the transaction flow, i.e. the integrity of the flow of goods can be asserted. Therefore, continuity equations provide a method to evidence the integrity of the basis for revenue recognition, which makes them a valuable tool in continuous assurance.

Continuity equations are based on historical data of quantities in the separate steps of business processes. For example, the sales cycle can be modeled as three separate steps:
1. Receiving the order from the customer.
2. Shipping goods to the customer.
3. Invoicing for the ordered and shipped goods.

The quantity ordered today will show up in the invoicing step a certain number of days later. The daily flow of goods between these steps can therefore be described by a quantity Q and a lag between the steps. In this paper we focus on the sales cycle consisting of these three process steps. The continuity equations for the sales cycle can be represented as Equation 1. The terms in this model are:
- $\text{ordered}_t$, $\text{shipped}_t$ and $\text{invoiced}_t$ are the quantities ordered, shipped and invoiced at time $t$.
- The $\alpha$ terms are $N \times 1$ transition vectors for a multivariate linear model.
- The $M$ terms are $N \times 1$ vectors containing daily aggregates of quantities Q for the given dimension.
- $N$ is the number of time periods covered in the model.
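$$
\begin{aligned}
\text{ordered}_t  &= \alpha_{oo} M(\text{ordered}) + \alpha_{so} M(\text{shipped}) + \alpha_{io} M(\text{invoiced}) \\
\text{shipped}_t  &= \alpha_{os} M(\text{ordered}) + \alpha_{ss} M(\text{shipped}) + \alpha_{is} M(\text{invoiced}) \\
\text{invoiced}_t &= \alpha_{oi} M(\text{ordered}) + \alpha_{si} M(\text{shipped}) + \alpha_{ii} M(\text{invoiced})
\end{aligned}
\tag{1}
$$

Each of these sub-equations models a predictor for the reported quantities in a specific step of the business process. As defined above, the quantities in one process step are related to the quantities in the other steps by a time delay (lag). For example, suppose orders are always shipped in exactly one day and invoicing is performed simultaneously with shipping. The resulting predictors can then be defined as Equation 2.

$$
\begin{aligned}
\text{ordered}_t  &= \beta_1 \, \text{shipped}_{t+1} + \beta_2 \, \text{invoiced}_{t+1} \\
\text{shipped}_t  &= \beta_1 \, \text{ordered}_{t-1} + \beta_2 \, \text{invoiced}_t \\
\text{invoiced}_t &= \beta_1 \, \text{ordered}_{t-1} + \beta_2 \, \text{shipped}_t
\end{aligned}
\tag{2}
$$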

In practice, most business processes cannot be modeled adequately in such a simple way, because lags vary and the process steps depend on each other.
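The idealized one-day-lag case can still serve as a numerical sanity check. The sketch below uses simulated data only; the series, the sample size and the noise level are assumptions, not the wholesaler's data. It uses base R's lm with no intercept to regress $\text{shipped}_t$ on $\text{ordered}_{t-1}$ and $\text{invoiced}_t$ on $\text{shipped}_t$. These are single-regressor versions of the dominant terms in Equation 2, and the estimated coefficients should come out close to 1.

```r
# Sketch: simulate the idealized sales cycle (hypothetical data).
# Every order ships exactly one day later and is invoiced on the day it
# ships; small normal noise stands in for recording differences.
set.seed(42)
n <- 300
ordered  <- rpois(n, lambda = 60000)
shipped  <- c(NA, ordered[-n]) + rnorm(n, sd = 100)   # shipped_t ~ ordered_{t-1}
invoiced <- shipped + rnorm(n, sd = 100)               # invoiced_t ~ shipped_t

# Align the series so that each row is one day t = 2..n
sales <- data.frame(
  ordered      = ordered[-1],   # ordered_t
  ordered_lag1 = ordered[-n],   # ordered_{t-1}
  shipped      = shipped[-1],   # shipped_t
  invoiced     = invoiced[-1]   # invoiced_t
)

# Regressions without intercept, as in Equation 2
fit_shipped  <- lm(shipped ~ ordered_lag1 - 1, data = sales)
fit_invoiced <- lm(invoiced ~ shipped - 1, data = sales)

coef(fit_shipped)    # close to 1
coef(fit_invoiced)   # close to 1
```

With real ERP data, varying lags would spread this single coefficient over several lagged terms. This is the situation the VAR approach addresses.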

Basic Vector Autoregressive model


In the basic Vector Autoregressive (VAR) model, each quantity is regressed on lagged values of all quantities in the system. Only the maximum expected lag has to be provided. The algorithm then fits a model for every lag order up to this maximum and selects the best-fitting one. The vars package makes this selection with an information criterion (the Akaike information criterion by default) rather than with R², because R² can only increase as more lags are added. The exact lags therefore do not have to be known before modeling. This matters because lags are not always easy to determine in advance. For example, lags in the purchasing cycle depend heavily on the policies and processes of third parties. The VAR model is therefore a powerful tool for modeling continuity equations when lags cannot easily be predefined.
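The lag-selection step can be sketched in base R as shown below. The code fits a VAR without constant or trend for every lag order up to lag_max, using a common sample for all orders, and keeps the order with the lowest Akaike information criterion. select_var_lag is a hypothetical helper written for illustration. It follows the same idea as vars::VARselect, but is not guaranteed to reproduce that function's numbers exactly.

```r
# Hypothetical helper: choose a VAR lag order by AIC, using base R only.
# y: numeric matrix with one row per day and one column per process step.
select_var_lag <- function(y, lag_max) {
  y <- as.matrix(y)
  k <- ncol(y)
  # embed() places y_t in the first k columns, y_{t-1} in the next k, etc.;
  # embedding lag_max + 1 rows gives every candidate order the same sample
  z   <- embed(y, lag_max + 1)
  y_t <- z[, 1:k, drop = FALSE]
  n   <- nrow(y_t)
  aic <- numeric(lag_max)
  for (p in seq_len(lag_max)) {
    x     <- z[, (k + 1):(k * (p + 1)), drop = FALSE]  # lags 1..p of all series
    fit   <- lm.fit(x, y_t)                            # no constant term
    sigma <- crossprod(fit$residuals) / n              # residual covariance
    aic[p] <- log(det(sigma)) + 2 * p * k^2 / n        # fit + complexity penalty
  }
  list(p = which.min(aic), aic = aic)
}

# Usage on simulated data whose true lag is 2 (illustrative only)
set.seed(1)
n <- 500
y <- matrix(0, n, 2, dimnames = list(NULL, c("SO", "GS")))
for (t in 3:n) y[t, ] <- 0.5 * y[t - 2, ] + rnorm(2)
sel <- select_var_lag(y, lag_max = 5)
sel$p
```

The penalty term grows with the number of estimated coefficients ($p k^2$). This is what stops the criterion from always choosing the largest lag, as R² would.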

Restricted Vector Autoregressive model


Kogan et al. (2010) showed that the VAR model performs very well. More importantly, they showed that the Subset VAR or Restricted VAR (RVAR) model performed even better. It reached a MAPE (mean absolute percentage error) of 0.3374 on the test set. With this result it outperformed several other models, namely SEM, GARCH and LRM. Only the BVAR model performed better when taking only the MAPE into account. However, the BVAR model also showed a larger standard deviation of the absolute percentage error. The RVAR model was thus found to be one of the best models for continuity equations.

Roughly speaking, the Restricted Vector Autoregressive model refines each predictor of the VAR model by removing insignificant variables. For example, if the mean lag between ordering and shipping is less than a month, the shipment $\text{shipped}_{t+365}$ a year after ordering is obviously not significant and is excluded from the model. The method iterates the modeling process per equation and removes all variables whose |t|-statistic falls below a predefined threshold, as explained in figure 1.
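The MAPE used in this comparison is the average of |actual - predicted| / |actual| over the evaluation period. A minimal base-R version is sketched below. mape is a hypothetical helper name, and the numbers in the usage line are made up.

```r
# Mean absolute percentage error, returned as a fraction (0.1 = 10%).
# Periods whose actual value is zero are dropped, because the percentage
# error is undefined there.
mape <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted))
  ok <- actual != 0
  mean(abs(actual[ok] - predicted[ok]) / abs(actual[ok]))
}

mape(actual = c(100, 200, 0), predicted = c(110, 180, 50))   # 0.1
```

Dropping zero actuals is a design choice of this sketch. It matters for daily data in which days without activity are recorded as zero.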

[Figure 1 (flowchart): Start → initial model estimation (inputs: data, threshold) → exclude parameters with t-statistic below threshold → re-estimate model → do all t-statistics satisfy the threshold? No: repeat exclusion; Yes: final model]

Figure 1: RVAR modeling process. The initial VAR model is restricted by excluding parameters with a t-statistic below a predefined threshold. The model is re-estimated, followed by the next exclusion iteration, until all parameters satisfy the t-statistic requirement.
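The loop in figure 1 can be sketched for a single equation in base R as shown below. restrict_equation is a hypothetical helper that reads figure 1 literally. In each round it drops every regressor whose |t| is below the threshold and re-estimates, until all remaining regressors pass. In this paper the actual work is done by the restrict function of the vars package (method = "ser"), whose elimination order may differ in detail. The sketch assumes the regressor matrix has syntactic R column names.

```r
# Hypothetical helper: iterative t-statistic restriction of one equation.
# y: response vector; X: matrix of lagged regressors with column names.
restrict_equation <- function(y, X, thresh = 2) {
  keep <- colnames(X)
  while (length(keep) > 0) {
    d    <- data.frame(response = y, X[, keep, drop = FALSE])
    fit  <- lm(response ~ . - 1, data = d)          # no constant term
    cf   <- coef(summary(fit))
    tval <- setNames(abs(cf[, "t value"]), rownames(cf))
    if (all(tval >= thresh)) return(fit)            # every regressor passes
    keep <- names(tval)[tval >= thresh]             # exclude all weak ones
  }
  NULL                                              # nothing significant left
}

# Usage on simulated data: only x1 truly drives y (illustrative only)
set.seed(7)
n <- 200
X <- matrix(rnorm(n * 5), n, 5, dimnames = list(NULL, paste0("x", 1:5)))
y <- 0.8 * X[, "x1"] + rnorm(n)
fit <- restrict_equation(y, X, thresh = 2)
names(coef(fit))   # "x1", plus any noise regressor that passes by chance
```

Applying such a function to each column of a lagged design matrix would give one restricted equation per process step. The restrict function does this for the whole VAR at once.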

Implementation
The RVAR modeling is implemented in four stages: data collection, pre-processing, modeling and prediction. The code is centered around the vars package, which was developed and published by Bernhard Pfaff and Matthieu Stigler and is available via CRAN (Pfaff, 2008b; Pfaff and Im Taunus, 2007; Pfaff, 2008a). The package includes several functions for estimating VAR models, testing them and presenting the results.

Data collection
The proposed base model for the sales cycle is based on three different quantities: the ordered quantity, the quantity of goods shipped and the quantity invoiced. Most ERP systems can provide these three variables on a daily basis. This implementation uses data from a wholesaler in technical supplies, which runs an off-the-shelf installation of Microsoft Dynamics AX 2009. The data was extracted from separate reports for each process step. The columns of these reports were then merged by date, as presented in figure 2.
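The join in figure 2 was performed in SQL. If the three reports are exported separately instead, the same join can be done in R. The sketch below assumes one small data frame per report, with hypothetical dates and quantities. It uses merge with all = TRUE, the counterpart of a SQL full outer join on the date.

```r
# Hypothetical daily reports, one per process step
orders    <- data.frame(Date = as.Date(c("2007-01-02", "2007-01-03", "2007-01-04")),
                        SO   = c(1200, 800, 950))
shipments <- data.frame(Date = as.Date(c("2007-01-03", "2007-01-04")),
                        GS   = c(1150, 790))
invoices  <- data.frame(Date = as.Date(c("2007-01-03", "2007-01-04", "2007-01-05")),
                        IS   = c(1100, 800, 900))

# Full outer join on Date; a date missing from one report gets NA there
sales <- Reduce(function(x, y) merge(x, y, by = "Date", all = TRUE),
                list(orders, shipments, invoices))
sales
```

Dates that appear in none of the reports (e.g. weekends) are still absent after this join. The pre-processing stage adds them.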

[Figure 2 (data model): tables SalesOrders (PK Date, Quantity), Shipments (PK Date, Quantity) and Invoices (PK Date, Quantity) combined into SalesData (PK, FK1, FK2, FK3 Date; SO; GS; IS)]

Figure 2: Data model consisting of daily aggregates for three different stages in the sales cycle: ordered quantity (SO), quantity of goods shipped to the customer (GS) and quantity invoiced (IS). These are combined by date via a SQL join clause. The date serves as the primary and foreign key of the data sources involved.

The resulting data is exported as a CSV file, which the modeling tool in R then imports. The CSV file consists of four data fields: the date and the quantities ordered, shipped and invoiced. It is imported as follows:


> data.raw <- read.csv(file = "Data/Sales-Quantities.csv", sep = ";", header = TRUE,
+                      colClasses = c("Date", "numeric", "numeric", "numeric"))
> summary(data.raw)
      Date                  SO               GS               IS
 Min.   :2007-01-02   Min.   :     0   Min.   :     0   Min.   :  -100
 1st Qu.:2007-03-22   1st Qu.: 38384   1st Qu.: 42098   1st Qu.: 39736
 Median :2007-06-14   Median : 63227   Median : 63738   Median : 60765
 Mean   :2007-06-14   Mean   : 67769   Mean   : 62624   Mean   : 60695
 3rd Qu.:2007-09-06   3rd Qu.: 85723   3rd Qu.: 83428   3rd Qu.: 79757
 Max.   :2007-11-30   Max.   :547694   Max.   :285074   Max.   :299235

Pre-processing
The data generated by the ERP system is probably not in the format required by the modeling functions. It therefore has to be pre-processed before it can serve as input for the modeling stage. After the raw data has been imported, missing dates have to be handled. If there is no data for a specific day, e.g. a weekend day, that date is left out of the reports from the ERP system. The missing dates are added to the data set with quantities of zero, which gives a complete daily time series. This complete data set can then be converted to a multiple time series object (mts), which is used in the modeling stage.

> data.empty <- data.frame(Date = seq.Date(from = min(data.raw$Date),
+                                          to = max(data.raw$Date), by = "1 day"))
> data.merged <- merge(data.empty, data.raw, by = "Date", all.x = TRUE, all.y = FALSE)
> data.merged[is.na(data.merged)] <- 0
> data.tseries <- ts(data = data.merged[, 2:4])

Modeling
Once the vars package is loaded in R, the modeling takes only a short piece of code and a few functions from the package. First, a full VAR model is estimated using all variables up to the maximum lag. Because lag.max is given, VAR() selects the lag order automatically within this range. Trend and constant terms should be excluded from the VAR model, hence type = "none".

> library("vars")
> model.var <- VAR(data.tseries, p = 1, lag.max = 30, type = "none")

In the final step of the modeling stage, the restrict function of the vars package automates the exclusion of insignificant variables. The result is an RVAR model that retains only the regressors with an absolute t-statistic of at least 2.

> model.var.restricted <- restrict(model.var, thresh = 2, method = "ser")
> summary(model.var.restricted)
...
Estimation results for equation SO:
===================================
SO = IS.l1 + IS.l5 + SO.l7 + IS.l7 + GS.l9 + SO.l16 + GS.l19 + GS.l20 + SO.l21

       Estimate Std. Error t value Pr(>|t|)
IS.l1   0.22214    0.05723   3.881 0.000127 ***
IS.l5  -0.18138    0.05998  -3.024 0.002705 **
SO.l7   0.18734    0.05663   3.308 0.001052 **
IS.l7   0.29996    0.07139   4.202 3.49e-05 ***
GS.l9  -0.19140    0.06479  -2.954 0.003381 **
SO.l16  0.11165    0.05287   2.112 0.035512 *
GS.l19  0.16278    0.06126   2.657 0.008301 **
GS.l20  0.27508    0.05881   4.677 4.38e-06 ***
SO.l21  0.15094    0.05026   3.003 0.002896 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
...


Prediction
Finally, the predict function uses the RVAR model to generate predictions for subsequent time periods. These predictions can be compared with the actual quantities reported in those periods to check for conformance. Deviations between actual and predicted values are flagged as exceptions if they exceed a predefined threshold. In this specific implementation, the restricted model predicts 10 subsequent time periods with a confidence level of 1% (ci = 0.01). The confidence parameter ci is deliberately misused here to obtain a narrow range between the upper and lower bounds. These bounds then serve as the thresholds for flagging dates as exceptions. The vars package also provides functions such as print and fanchart to present the predicted values, as shown in figure 3.

> model.predictions <- predict(model.var.restricted, n.ahead = 10, ci = 0.01)
> print(model.predictions)
$SO
          fcst     lower     upper       CI
[1,]  7583.843  6982.703  8184.983 601.1400
[2,]  4133.422  3523.785  4743.058 609.6366
[3,] 58130.976 57521.340 58740.613 609.6366
...
$GS
          fcst     lower     upper       CI
[1,]  9136.999  8706.614  9567.384 430.3852
[2,]  3682.716  3252.331  4113.101 430.3852
[3,] 44230.601 43800.215 44660.986 430.3852
...
$IS
          fcst     lower     upper       CI
[1,]  8660.376  8203.789  9116.964 456.5877
[2,]  2995.246  2538.658  3451.833 456.5877
[3,] 49854.651 49398.063 50311.239 456.5877
...
> fanchart(model.predictions)
[Figure 3: three fan chart panels titled "Fanchart for variable SO", "Fanchart for variable GS" and "Fanchart for variable IS", with the observation index on the horizontal axis]

Figure 3: Plot of the three predicted (or forecasted) variables, i.e. SO, GS and IS, using the fanchart function.
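The exception check described above can be sketched as follows. flag_exceptions is a hypothetical helper. It takes a forecast matrix laid out like the per-variable matrices printed by predict (columns fcst, lower and upper) and a vector of actual quantities. It then flags every period whose actual value falls outside the band. The forecast rows below are the first three SO forecasts from the output above, and the actual values are made up.

```r
# Hypothetical helper: flag periods whose actual value lies outside the
# [lower, upper] band of a forecast matrix.
flag_exceptions <- function(fcst, actual) {
  stopifnot(nrow(fcst) == length(actual))
  data.frame(
    period    = seq_along(actual),
    actual    = actual,
    lower     = fcst[, "lower"],
    upper     = fcst[, "upper"],
    exception = actual < fcst[, "lower"] | actual > fcst[, "upper"]
  )
}

# First three SO forecasts from the printed output; actuals are hypothetical
so_fcst <- cbind(
  fcst  = c(7583.843, 4133.422, 58130.976),
  lower = c(6982.703, 3523.785, 57521.340),
  upper = c(8184.983, 4743.058, 58740.613)
)
so_actual <- c(7700, 5200, 58000)
flags <- flag_exceptions(so_fcst, so_actual)
flags   # only period 2 is flagged: 5200 exceeds the upper bound 4743.058
```

The predict output keeps its per-variable matrices in model.predictions$fcst. The same helper could therefore be applied as flag_exceptions(model.predictions$fcst$SO, actual_so), where actual_so is a hypothetical vector of the 10 realized daily quantities.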


Conclusion
The resulting script shows that the Restricted Vector Autoregressive model for continuity equations can be implemented in R fairly easily and in an understandable way. Thanks to the vars package by Pfaff, both the VAR model and the RVAR model take just a few lines of code. Once the data has been processed to fit the data model expected by the package's modeling functions, only three functions have to be called to generate predictions with the RVAR model.

Recommendations
As Roy Sidebotham, Professor of Accountancy at Victoria University of Wellington, put it: "Accountants are cautious men, and their caution is expressed in the concept of conservatism." In the field of continuous assurance this has certainly proven true over the years. Continuous assurance has been available for a very long time, yet only a handful of auditors have embraced the advantages these tools can bring to an organization. A more intuitive look-and-feel might make auditors more inclined to try and use these tools. Furthermore, this paper focused on a single model of continuity equations. Other models might fit the data of specific types of companies better. Additional model implementations, e.g. Bayesian VAR, GARCH or SEM, should be studied to provide a wider range of models.

Bibliography
Canadian Institute of Chartered Accountants (CICA). Continuous auditing, 1999.

S. Dzeng. A comparison of analytical procedures expectation models using both aggregate and disaggregate data. Auditing: A Journal of Practice & Theory, 13(Fall):1–24, 1994.

A. Kogan, M. G. Alles, M. A. Vasarhelyi, and J. Wu. Analytical procedures for continuous data level auditing: Continuity equations. 2010.

B. Pfaff. vars: VAR modelling. R package version, pages 1–3, 2008a.

B. Pfaff. VAR, SVAR and SVEC models: Implementation within R package vars. Journal of Statistical Software, 27(4):1–32, 2008b.

B. Pfaff and K. Im Taunus. Using the vars package. 2007.

M. A. Vasarhelyi and F. B. Halper. The continuous audit of online systems. Auditing: A Journal of Practice & Theory, 10(1):110–125, 1991.

M. A. Vasarhelyi, M. G. Alles, and A. Kogan. Principles of analytic monitoring for continuous assurance. Journal of Emerging Technologies in Accounting, 1(1):1–21, 2004.

M. A. Vasarhelyi, M. Alles, and K. T. Williams. Continuous assurance for the now economy. Institute of Chartered Accountants in Australia, Sydney, Australia, 2010.

Erik van Kempen
Fontys University of Applied Sciences
Fontys Hogeschool Financieel Management (Fontys School of Financial Management)
Rachelsmolen 1
5612MA Eindhoven
The Netherlands
erikvankempen@ieee.org
