
2011 Ninth International Conference on ICT and Knowledge Engineering

Towards a Data Warehouse Testing Framework


Neveen ElGamal
Information Systems Department, Faculty of Computers and Information, Cairo University, Cairo, Egypt. n.elgamal@fci-cu.edu.eg

Ali El Bastawissy
Information Systems Department, Faculty of Computers and Information, Cairo University, Cairo, Egypt. alibasta@fci-cu.edu.eg

Galal Galal-Edeen
Information Systems Department, Faculty of Computers and Information, Cairo University, Cairo, Egypt. galal@acm.org

Abstract --- Data warehouse (DW) testing is a critical stage of DW development because decisions are made based on the information the DW produces, so testing the quality of that information supports the trustworthiness of the DW system. A number of approaches have been proposed to describe how the testing process should take place in the DW environment. In this paper we briefly present these testing approaches, then use a proposed matrix that structures the DW testing routines to evaluate and compare them. An analysis of the comparison matrix highlights the weaknesses of the available DW testing approaches. Finally, we point out the requirements for achieving a homogeneous DW testing framework and conclude our work.

Keywords: Data Warehouse Testing, Data Warehouse Quality

I. INTRODUCTION

During the development of DWs, a considerable amount of data is integrated, structured, cleansed, and grouped in a single framework, which is the DW. A number of changes take place on the data, which could lead to data manipulation and corruption. Data warehouse projects fail for many reasons, all of which can be traced to a single cause: non-quality [1]. There should be a way of guaranteeing that the data in the sources is the same data that reaches the DW, and that the data quality is improved, not lost. In the data warehousing process, data passes through several stages, each one causing a different kind of change to the data, before finally reaching the user in the form of a chart or a report. Comparing the DW system outputs with the data in the data sources is not the best way to test whether the DW system is working properly. This type of test is an informative test that will take place at a certain point in the testing process, but the most important part of the testing process should take place during DW development. Every stage and every component the data passes through should be tested to guarantee its accuracy and the preservation, or even improvement, of data quality.

DW Assessment, Evaluation, Testing, and Quality are most of the time used as synonyms that refer to how good the DW is. Linguistically, Assessment and Evaluation are synonyms; however, in the DW field, Assessment is the process of measuring the worthiness of an entity through a group of tests, while Evaluation is the process of analyzing, reflecting upon, and summarizing assessment information and making judgments or decisions based upon the information gathered [2]. DW quality is different from the other terms, as it refers to the combined outcome of the three processes.

It is widely agreed that the DW is totally different from other systems such as software or transactional systems. Consequently, the testing techniques used for these other systems are inadequate for DW testing. Here are some of the differences:

- The DW always answers ad-hoc queries, which makes it impossible to test fully prior to system delivery. On the other hand, all functions in the software engineering realm are predefined.
- DW testing is data centric, while software testing is code centric.
- The DW always deals with huge data volumes.
- The testing process in other systems ends with the development life cycle, while in DWs it continues after system delivery. Software projects are self-contained, but a data warehouse project continues because the decision-making process requires ongoing changes [3].
- Most of the available testing scenarios are driven by some user inputs, while in the DW most of the tests are system-triggered scenarios.
- The volume of test data in a DW is considerably large compared to any other testing process.
- In other systems test cases can reach hundreds, but the valid combinations of these test cases will never be unlimited. In the DW, by contrast, test cases are unlimited because the core objective of the DW is to allow all possible views of data [4].
- DW testing consists of different types of tests depending on when the test takes place; for example, the initial data load test is different from the incremental data load test.


As shown in figure 1, the DW system consists of a number of inter-related components:


- Data Sources (DS)
- Operational Data Store (ODS) / Data Staging Area (DSA)
- Data Warehouse (DW)
- Data Marts (DM)
- User Interface (UI) applications, e.g., OLAP reports, decision support tools, and analysis tools

Figure 1. DW System Architecture

Each component needs to be tested independently to verify its efficiency. The connections between the DW components are groups of transformations that take place on data. These transformation processes should be tested as well, to ensure data quality preservation. The results of the DW system (e.g., charts, reports, and the outputs of decision support and analysis tools) should be compared with the original data in the DSs. Finally, from the operational point of view, the DW system should be tested for performance, reliability, robustness, recovery, etc.

The remainder of this paper is organized as follows: section II briefly surveys the existing DW testing approaches. Section III introduces the matrices that are used later, in section IV, to compare and evaluate the existing DW testing approaches. Section V analyzes the comparison matrix to highlight the drawbacks and weaknesses that exist in the area of DW testing. Section VI uses the analysis of section V to state the needs for a DW testing framework. Finally, we conclude our work in section VII.

II. EXISTING DW TESTING APPROACHES

A number of trials have been made to address the DW testing process. Some were made by companies offering consultancy services for DW testing, like [5-8]. Others, to fill the gap of not finding a generic DW testing technique, proposed one as a research attempt, like [3, 9-15]. A different trend was taken by authors who presented automated tools for the DW testing process, like [16, 17], and, from a different perspective, some authors presented a DW testing methodology, like [7, 10, 18]. In this paper we are only concerned with the attempts considering how to test the DW; in other words, we are concerned with the types of tests that must be considered while testing the DW. The rest of this section introduces these attempts in chronological order; a comparison between them is presented in the following sections.

1. In [9], the author introduced a DW testing and validation technique. He broke the testing and validation process into four well-defined, high-level processes: 1. Integration testing, 2. System testing, 3. Data validation, 4. Acceptance testing.

2. In [12], the authors presented a DW testing approach that they named a DW validation strategy. This attempt was proposed during a DW testing project. They concentrated on validating the data that is loaded into the DW and checking its credibility. This took place via two main approaches:
   - Approach I: follows the data from the source to the target warehouse.
   - Approach II: follows the source through the Extraction Transformation Loading (ETL) process and then into the target warehouse.
   Using these two approaches, they divided the process of testing into consecutive levels (a minimal automation sketch of the count and validation levels appears at the end of this section):
   - Constraint testing
   - Source-to-target counts
   - Source-to-target data validation
   - Error processing
   - Defect tracking

3. In [7], the Wipro Technologies company presents in a white paper their data warehouse testing strategy. They used the standard software testing process, which includes unit testing, integration testing, system and acceptance testing, and performance testing, and chose to customize the contents of these tests to make them adequate for DW testing. In addition, they presented an abstract life cycle for testing the DW application.

4. In [11], the author divided data warehouse testing into:
   - Requirements testing
   - Unit testing
   - Integration testing
   - Acceptance testing

What was unique about this approach is that it tested the data granularity at its lowest level and also verified the user requirements against the resulting data.

5. In [10], the author introduced an abstract DW testing methodology as follows:
   - Use of traceability to enable full test coverage of business requirements
   - In-depth review of test cases
   - Manipulation of test data to ensure full test coverage
   - Provision of appropriate tools to speed the process of test execution and evaluation
   - Regression testing
   He also stated that the DW testing types (routines) are: 1. Unit testing, 2. Integration testing, 3. Technical shakedown testing, 4. System testing, 5. Operation readiness testing, 6. User acceptance testing.

6. In [15], the author concentrated on testing the ETL applications, since most of the work is done through them. He stated the testing goals that are required to be met after building the DW:
   - Data completeness
   - Data transformation
   - Data quality
   - Performance and scalability
   - Integration testing
   - User-acceptance testing
   - Regression testing

7. In [13], the author stated that during the process of building the DW with its ETL tools and applications, six types of testing need to be conducted:
   - ETL testing
   - Functional testing
   - Performance testing
   - Security testing
   - User acceptance testing
   - End-to-end testing

The author did not give considerable attention to the DW testing methodology but concentrated on how these tests are conducted in the DW environment. Moreover, he stated how these tests can be conducted using the Microsoft SQL Server tools.

8. In [14], the authors suggested a proposal for basic DW testing activities (routines) as the final part of a DW testing methodology whose other parts were published in [18-20]. The testing activities can be split into four logical units regarding: multidimensional database testing, data pump (ETL) testing, metadata testing, and OLAP testing. The authors then highlighted how these activities split into smaller, more distinctive activities to be performed during the DW testing process.

9. In [3], the authors introduced data warehouse testing activities (routines) framed within the DW development methodology introduced in [21]. They stated that the components that need to be tested are the conceptual schema, logical schema, ETL procedures, database, and front-end. To test these components, they listed the test types that best fit the characteristics of DW systems:
   - Functional test
   - Usability test
   - Performance test
   - Stress test
   - Recovery test
   - Security test
   - Regression test
   A comprehensive explanation of how the DW components are tested by the above testing routines is then given, showing which type(s) of test is suitable for which component, as shown in table I.

TABLE I: DW COMPONENTS VS TESTING TYPES [3]

10. In [6], the author presented DW testing types with respect to the DW development stages and illustrated the DW testing focus points, categorized into two main high-level aspects:

   Underlying data:
   1. Data coverage
   2. Data complying with the transformation logic in accordance with the business rules
   DW components:
   1. Performance and scalability
   2. Component orchestration testing (integration test)
   3. Regression testing
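The count and validation levels of approach 2 [12] lend themselves to simple automation. Below is a minimal Python sketch of source-to-target reconciliation; it is our illustration rather than part of any surveyed approach, and the connection objects, table names, and column lists are hypothetical placeholders.

```python
import sqlite3  # any DB-API 2.0 connection works; sqlite3 is only a stand-in


def reconcile(source_conn, target_conn, src_table, tgt_table, numeric_cols):
    """Compare row counts and per-column sums between a source table and
    its warehouse target. Returns a list of discrepancies (empty = pass)."""
    issues = []

    def scalar(conn, sql):
        return conn.execute(sql).fetchone()[0]

    # Source-to-target counts: every extracted row must arrive exactly once.
    src_count = scalar(source_conn, f"SELECT COUNT(*) FROM {src_table}")
    tgt_count = scalar(target_conn, f"SELECT COUNT(*) FROM {tgt_table}")
    if src_count != tgt_count:
        issues.append(f"row counts differ: source={src_count}, target={tgt_count}")

    # Source-to-target data validation (coarse form): column totals must
    # match for columns that are loaded without transformation.
    for col in numeric_cols:
        src_sum = scalar(source_conn, f"SELECT COALESCE(SUM({col}), 0) FROM {src_table}")
        tgt_sum = scalar(target_conn, f"SELECT COALESCE(SUM({col}), 0) FROM {tgt_table}")
        if src_sum != tgt_sum:
            issues.append(f"{col} totals differ: source={src_sum}, target={tgt_sum}")

    return issues


# Hypothetical usage; database, table, and column names are placeholders.
# src = sqlite3.connect("source.db"); tgt = sqlite3.connect("warehouse.db")
# print(reconcile(src, tgt, "sales", "fact_sales", ["amount", "quantity"]))
```

A real harness would extend this with per-key row comparisons, error processing, and defect tracking, which are the remaining levels of [12].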

III. PROPOSED DW TESTING MATRICES

The DW testing process consists of a number of testing routines. These routines can be categorized by what, where, and when these tests take place:

- WHERE: the component of the DW that the test targets. This divides the DW architecture shown in figure 1 into the following layers:
  o Data Sources to Operational Data Store: testing routines targeting the data sources, wrappers, extractors, transformations, and the data staging area itself.
  o Data Staging Area to DW: testing routines targeting the loading process and the DW itself.
  o DW to Data Marts: testing routines targeting the transformations that take place on the data used by the data marts, and the data marts themselves.
  o Data Marts to User Interface: testing routines targeting the transformation of data to the interface applications, and the interface applications themselves.
- WHAT: what these routines test in the targeted DW component.
  o Schema: focuses on testing DW design issues.
  o Data: concerned with all data-related tests, like data quality, data transformation, data selection, data presentation, etc.
  o Operational: tests the data warehouse as an integrated product to confirm its reliability, robustness, regression behavior, etc., together with tests concerned with the process of putting the DW into operation.
- WHEN: when the test takes place.
  o Before system delivery: a one-time test that takes place before the system is delivered to the user, or when any change takes place on the design of the system.
  o After system delivery: a recurring test that takes place several times during system operation.

The what, where, and when testing categories result in a three-dimensional matrix. As shown in table II, the rows represent the where dimension and the columns represent the what dimension; the when dimension is represented by color in the following section, when this matrix is used to compare the existing DW testing approaches and show to what extent each approach covers the aspects of the DW testing process.

TABLE II: DW TESTING MATRICES

                          Schema   Data   Operation
Back-end     DS -> ODS
             ODS -> DW
Front-end    DW -> DM
             DM -> UI
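For illustration only, the three-way classification can be encoded directly; the sketch below is ours, not part of any surveyed approach, and the routine names in the sample catalogue are hypothetical.

```python
from dataclasses import dataclass
from typing import Literal

Where = Literal["DS->ODS", "ODS->DW", "DW->DM", "DM->UI"]
What = Literal["Schema", "Data", "Operational"]
When = Literal["before delivery", "after delivery"]


@dataclass(frozen=True)
class TestRoutine:
    name: str
    where: Where  # which DW layer the test targets
    what: What    # what is tested in that layer
    when: When    # one-time (before) vs. recurring (after delivery)


# Illustrative entries only; a full catalogue would fill every matrix cell.
catalogue = [
    TestRoutine("source-to-target counts", "DS->ODS", "Data", "before delivery"),
    TestRoutine("DM additivity guard", "DW->DM", "Data", "after delivery"),
    TestRoutine("report response time", "DM->UI", "Operational", "after delivery"),
]

# The matrix view of table II is then a simple group-by over (where, what).
by_cell = {}
for t in catalogue:
    by_cell.setdefault((t.where, t.what), []).append(t)
```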

IV. APPROACHES COMPARISON AND EVALUATION

After studying how each proposed DW testing approach addressed DW testing, and according to the DW testing matrices defined in the previous section, a comparison matrix is presented in table III showing the test routines that each approach covers. The DW testing approaches are represented on the columns, and the what and where dimensions classify the test routines on the rows. The intersection of a row and a column indicates the coverage of the test routine in that approach, with distinct markers for full and partial coverage. Finally, the when dimension, which indicates whether a test takes place before or after system delivery, is represented by color: the tests which take place after system delivery are highlighted, while the tests that take place during system development, or when the system is subject to change, are left without color highlighting. We were able to compare only 10 approaches, as not enough data was available for the rest. In our study we focused on the approaches showing what to test and how to test it, not on the attempts presenting how to automate the testing process.


TABLE III: DW APPROACHES COMPARISON

As is obvious in table III, none of the proposed approaches addresses the entire DW testing matrices. This is simply because each approach addressed the DW testing process from its own point of view, without leaning on any standard or general framework. Some of the attempts considered only parts of the DW framework shown in figure 1. Other attempts used their own framework for the DW environment according to the case they were addressing; for example, [3] used a DW architecture that includes neither an ODS nor a DW layer: the data is loaded from the data sources directly into the data marts, which makes the data marts layer act as both the DW and the data marts interchangeably. Other approaches, like [7, 9, 13], did not include the ODS layer. From another perspective, there are some test routines that are not addressed by any approach, like the ODS conceptual model tests and data quality factors such as accuracy, completeness, precision, and continuity. Some major components of the DW were not tested by any of the proposed approaches, namely the DM schema and the additivity of measures in the DMs.

V. COMPARISON MATRIX ANALYSIS

By studying the existing DW testing approaches carefully, it is evident that the DW environment lacks the following:

1. The existence of a generic, well-defined DW testing approach that could be used in any project.

2. None of the existing approaches covers all the tests needed to guarantee the efficiency of the DW after delivery.

3. The approaches proposed in [14, 18, 19] were the only ones focusing on both the DW testing routines and the life cycle of the testing process. The life cycle was presented as follows:
   o Test plan
   o Test cases
   o Test data
   o Termination criteria
   o Test results
   Nevertheless, they presented the two parts independently, without showing how the testing routines can fit into a complete DW testing life cycle. To fully cover the testing process of the DW, a framework needs to fill this gap.

4. None of the existing approaches was targeted at unconventional DW types like spatial DWs, active DWs, temporal DWs, DW 2.0, etc. Some of the contents of the testing routines will differ from one DW to another according to its type. A specialization of these test routines needs to be defined in order to make the DW testing approach applicable to all DW types.

5. Some of the above test routines could be automated, but none of the proposed approaches showed how these routines could be automated or given automated assistance. Due to the huge amount of data in the DW and the considerable number of tests that the DW passes through during development and after delivery, automated support for some of the test routines is a must to accelerate the testing process.

6. The existing DW testing approaches missed testing some of the DW components. These omissions affect the quality, efficiency, and effectiveness of the DW severely. The missing tests are:
   a. ODS conceptual schema: The ODS has a conceptual model that carries data from a number of heterogeneous data sources that differ in structure and implementation. A lot of integration takes place in the ODS, which may lead to severe data loss or data corruption if the conceptual model the data is loaded into happens to be incorrect.
   b. DM schema design: Data marts are miniature DWs that need to be designed and validated to ensure data quality preservation. Lack of proper DM design could lead to data loss, inadequate dimension hierarchies, incorrect data aggregation, and violation of the additivity of facts with respect to dimensions. An improper DM schema could mislead the decision makers with incorrect data displays.
   c. Additivity guards: Facts are always preferred to be fully additive, and it is almost prohibited for them to be non-additive, but real life is not always perfect. Facts are sometimes semi-additive, which means that the fact defined in the DM is additive on some but not all of the dimensions. For example, inventory is non-additive on the time dimension but additive on the location and supplier dimensions. Ignoring the additivity of measures along dimensions may cause the generation of misleading data, so it is mandatory to guard the additivity of measures in the data marts (a mechanical guard is sketched after this list).
   d. Data quality factors: as presented in [22], these are completeness, accuracy, consistency, precision, granularity, continuity, currency, duration, retention, precedence, and balancing. Defects of data quality will eventually lead to failure in providing accurate business information. Each of these quality factors has a great influence on the overall quality of the DW.
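The inventory example in item 6c can be made mechanical. The following pandas sketch is our illustration rather than a published technique: it refuses to sum a semi-additive measure over a dimension it is not additive on. The fact-table layout and the non-additive dimension registry are assumptions for illustration.

```python
import pandas as pd

# Hypothetical registry: inventory_level is semi-additive (not over time).
NON_ADDITIVE_DIMS = {"inventory_level": ["date"]}


def guarded_sum(fact: pd.DataFrame, measure: str, group_by: list) -> pd.DataFrame:
    """Sum a measure over the given dimensions, refusing any aggregation
    that would collapse a dimension the measure is not additive on."""
    collapsed = set(fact.columns) - set(group_by) - {measure}
    violated = collapsed & set(NON_ADDITIVE_DIMS.get(measure, []))
    if violated:
        raise ValueError(
            f"{measure} is not additive over {sorted(violated)}; "
            "aggregate with an average or period-end snapshot instead of SUM")
    return fact.groupby(group_by, as_index=False)[measure].sum()


# Toy fact table in the spirit of the inventory example above.
fact = pd.DataFrame({
    "date": ["2011-01-01", "2011-01-02"] * 2,
    "location": ["Cairo", "Cairo", "Giza", "Giza"],
    "supplier": ["S1", "S1", "S2", "S2"],
    "inventory_level": [10, 12, 7, 9],
})

guarded_sum(fact, "inventory_level", ["date", "location"])  # fine: time kept
# guarded_sum(fact, "inventory_level", ["location"])  # raises: sums over time
```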

VI. REQUIREMENTS FOR A DW TESTING FRAMEWORK

Having pointed out the weaknesses that exist in the DW testing environment, we now state the requirements that this environment needs. The DW testing environment requires a DW testing framework that:

1. Is generic enough to be used in several DW testing projects.
2. Provides tests for all the DW components and transformations.
3. Comprehensively defines all test routines to minimize ambiguity.
4. Presents the testing routines within a DW testing life cycle that includes the following:
   a. Test plan
   b. Test cases
   c. Test data
   d. Termination criteria
   e. Test results
5. Supports testing unconventional DW types by providing a specialization of the testing routines that is adequate for each type of DW.
6. Presents how the test routines can be automated, or can be given automated support where full automation is not applicable for a specific routine. It should also include suggestions for using existing automated test tools, to minimize the work needed to get automated support in the DW testing process.

VII. CONCLUSION

Some trials have been carried out to address DW testing; most of them were oriented to a specific problem, and none of them was generic enough to be used in other data warehousing projects. It appears that all the experts want to tell us how to build these things (DWs) without ever addressing the issue of validating their accuracy once they are loaded [9]. Having a generic DW testing framework that addresses all the aspects of the DW testing process will ensure, or even improve, the quality of the DW, in addition to gaining the end users' trust in the results they get from the tested DW.

VIII. REFERENCES

[1] L. P. English, Improving Data Warehouse and Business Information Quality (Methods for Reducing Costs and Increasing Profits). New York: John Wiley and Sons, Inc., 1999.
[2] C. L. Scanlan, "Assessment, Evaluation, Testing and Grading," in www.umdnj.edu, 2003.
[3] M. Golfarelli and S. Rizzi, "A Comprehensive Approach to Data Warehouse Testing," in ACM 12th International Workshop on Data Warehousing and OLAP (DOLAP '09), Hong Kong, China, 2009.
[4] Executive-MiH, "Data Warehouse Testing is Different," 2010.
[5] CTG, "CTG Data Warehouse Testing," in www.ctg.com, 2002.
[6] M. P. Mathen, "Data Warehouse Testing," in www.infosys.com, 2010.
[7] A. Munshi, "Testing a Data Warehouse Application," in www.wipro.com, 2003.
[8] SSNSolutions, "SSN Solutions," in www.ssnsol.com, 2006.
[9] C. Bateman, "Where are the Articles on Data Warehouse Testing and Validation Strategy?," in www.information-management.com, 2002.
[10] S. Bhat, "Data Warehouse Testing - Practical," in www.stickyminds.com, 2007.
[11] K. Brahmkshatriya, "Data Warehouse Testing," in www.stickyminds.com, 2007.
[12] R. Cooper and S. Arbuckle, "How to Thoroughly Test a Data Warehouse," in Software Testing Analysis and Review (STAREAST), Orlando, Florida, 2002.
[13] V. Rainardi, "Testing your Data Warehouse," in Building a Data Warehouse: With Examples in SQL Server. Apress, 2008.
[14] P. Tanuška, O. Moravčík, P. Važan, and F. Miksa, "The Proposal of Data Warehouse Testing Activities," in 20th Central European Conference on Information and Intelligent Systems, Varaždin, Croatia, 2009, pp. 7-11.
[15] J. Theobald, "Strategies for Testing Data Warehouse Applications," in www.information-management.com, 2007.
[16] Inergy, "Automated ETL Testing in Data Warehouse Environment," in www.inergy.nl, 2007.
[17] R. K. Sharma, "Test Automation: In Data Warehouse Projects," in www.automatedtestinginstitute.com, 2007.
[18] P. Tanuška, W. Verschelde, and M. Kopček, "The Proposal of Data Warehouse Testing Scenario," in European Conference on the Use of Modern Information and Communication Technologies (ECUMICT), Gent, Belgium, 2008.
[19] P. Tanuška, O. Moravčík, P. Važan, and F. Miksa, "The Proposal of the Essential Strategies of Data Warehouse Testing," in 19th Central European Conference on Information and Intelligent Systems (CECIIS), 2008, pp. 63-67.
[20] P. Tanuška, P. Schreiber, and J. Zeman, "The Realization of Data Warehouse Testing Scenario," in Infokit-3, Part II, Stavropol, Russia, 2008.
[21] M. Golfarelli and S. Rizzi, Data Warehouse Design: Modern Principles and Methodologies. McGraw-Hill, 2009.
[22] D. Larson, "TDWI Data Cleansing: Delivering High-Quality Warehouse Data," The Data Warehousing Institute, 2008.

