
B.I. Testing: torture the data

By:

Nikhil Bajaj
(Bachelor of Engineering in Information Technology)
( B.I. tester in iGATE Patni )

Version 1.0, August 2011

INDEX

Sr. No.  Topic

1   Challenges in the BI Testing field
2   Expectations from the IT Industry

BI/DW
3   What is BI?
4   Business Intelligence and Data Warehouse
5   What is a Data Warehouse?
6   Generally how does the data flow in a Data Warehouse?
7   What is a Data Mart?
8   What is ETL?

BI/DW Testing
9   What is the need to test a Data Warehouse?
10  Data Warehouse Testing and Database Testing
11  Type of testing done in a Data Warehouse project
12  Who all are involved in testing a data warehouse?
13  What are the phases undergone by the QA team?
14  How does the QA team prepare test cases?
15  Query Format
16  Example
17  What are the tools that a QA team may use?

Mail your queries to bajajnikhil111@yahoo.co.in Page 1



Challenges in the BI Testing field:


There are many challenges to the development of the specialized skills
required for BI testing:

1. Unwillingness on the part of DW developers


Any IT professional planning to build a career in this exciting field aims
to be an expert ETL developer, OLAP specialist, dimensional data
modeler or DW architect; DW tester doesn't even make the list of
desirable roles.
This is due to the false perception that only such roles carry premium
rates in the job market and only such roles get to face the technical
challenges associated with a BI project.
This has left the BI project team with very few takers for the challenging
and critical role of tester.

2. Lack of awareness
As a general practice, testers plan their careers around the tools
involved in test execution (e.g., WinRunner, SilkTest) and test
management (e.g., Quality Center), and make very little effort to develop
skills in the underlying technology.
But a good understanding of ETL/OLAP tools and technologies is an
essential skill for BI testing and, so far, testers have not developed a
keen interest in this skill.

3. Absence of tools
The BI marketplace is flooded with many tools and vendors, each
attempting to replace the other in the three layers of BI: database, ETL
and OLAP.
But there are no popular ETL/OLAP testing tools in the market that offer
features for automated testing or functional testing.


4. Lack of standard approach/methodology


While standard methodologies exist for testing as a whole, there seems
to be no industry-wide view on the suggested approach or methodology
for BI testing.
An ideal methodology should include a test strategy, a test plan and test
cases that cover thorough testing of the various phases of data
movement.
Creating test cases and test data that provide adequate coverage of
each of the phases is critical for ensuring comprehensive quality
assurance (QA) of the DW.


Expectations from the IT Industry


Listed below are some initiatives that can provide a much-needed boost
to the BI testing field:

1. Promote awareness within the DW community that BI testing is a
challenging proposition requiring highly valued skills, thereby
encouraging ETL and BI developers to assume these roles.
Moreover, leading IT players with extensive experience in the DW/BI
area should promote well-defined career options and career
progression plans to the ETL/OLAP developers and conventional
testers.

2. Invest in research to formalize methodologies covering the entire
spectrum of DW/BI testing in full detail.

3. Invest in building assets, tools and job aids to strengthen this
function and provide productivity gains.

4. Develop training courses and course content to cross-train ETL/OLAP
developers in testing nuances and testers in DW and ETL/OLAP tools
and technology concepts.

5. Build strong testing teams with complementary skills.

The topics covered in this document were chosen with the above
challenges in mind. Keeping these challenges and the expectations placed
on us in view, let us first look at what Business Intelligence is and why
the term is so often used together with Data Warehouse. We then cover
what a Data Warehouse, a Data Mart and ETL are, how database testing
differs from data warehouse testing, and finally the details of BI
testing.


To test an object, we first need to understand what that object is. So
let us start by understanding Business Intelligence so that we can learn
how to test it.

BI/DW:

What is BI?
BI is short for Business Intelligence: bringing the right information at
the right time to the right people in the right format.

Definition:
It is a broad category of applications and technologies for gathering,
storing, analyzing, and providing access to data to help enterprise users
make better business decisions.

Explanation (What is BI about?):


The five key stages of Business Intelligence:
1. Data Sourcing
2. Data Analysis
3. Situation Awareness
4. Risk Assessment
5. Decision Support

1. Data sourcing
Business Intelligence is about extracting information from multiple
sources of data.
The data might be: text documents - e.g. memos or reports or email
messages, photographs and images, sounds, formatted tables, web
pages and URL lists.
The key to data sourcing is to obtain the information in electronic form.
So typical sources of data might include: scanners, digital cameras,
database queries, web searches, computer file access, etc.


2. Data analysis
Business Intelligence is about synthesizing useful knowledge from
collections of data.
It is about estimating current trends, integrating and summarizing
disparate information, validating models of understanding, and
predicting missing information or future trends.
This process of data analysis is also called data mining or knowledge
discovery.

3. Situation awareness
Business Intelligence is about filtering out irrelevant information, and
setting the remaining information in the context of the business and its
environment.
The user needs the key items of information relevant to his or her
needs, and summaries that are syntheses of all the relevant data
(market forces, government policy etc.).
Situation awareness is the grasp of the context in which to understand
and make decisions.

4. Risk assessment
Business Intelligence is about discovering what plausible actions might
be taken, or decisions made, at different times.
It is about helping you weigh up the current and future risk, cost or
benefit of taking one action over another, or making one decision versus
another.
It is about inferring and summarizing your best options or choices.

5. Decision support
Business Intelligence is about using information wisely.
It aims to provide you warning of important events, such as takeovers,
market changes, and poor staff performance, so that you can take
preventative steps.
It seeks to help you analyze and make better business decisions, to
improve sales or customer satisfaction or staff morale.
It presents the information you need, when you need it.


Business Intelligence and Data Warehouse


Business intelligence is a term commonly associated with data
warehousing.
In fact, many of the tool vendors position their products as business
intelligence software rather than data warehousing software.
This is because BI applications often use data gathered from a data
warehouse or a data mart.
However, not all data warehouses are used for business intelligence, nor do
all business intelligence applications require a data warehouse.

In this document, we treat DW testing and BI testing as the same thing.

Difference:
Business intelligence usually refers to the information that is available for
the enterprise to make decisions on.
A data warehousing (or data mart) system is the backend, or the
infrastructural component for achieving business intelligence.


What is a Data Warehouse?


A data warehouse (abbreviated DW) is a collection of data designed to
support management decision making. Data warehouses contain a wide
variety of data that present a coherent picture of business conditions at
a single point in time.

A data warehouse is a place where data is stored for archival, analysis
and security purposes. Usually a data warehouse is either a single
computer or many computers (servers) tied together to create one giant
computer system.

Definition:
A data warehouse is a subject-oriented, integrated, time-variant and non-
volatile collection of data in support of management's decision making
process.

Explanation:
The important characteristics of a Data Warehouse:
1. Subject Oriented
2. Integrated
3. Time-variant
4. Non-volatile

1. Subject Oriented
It contains data that gives information about a particular subject instead
of about a company's ongoing operations.

2. Integrated
It contains data that is gathered into the data warehouse from a variety
of sources and merged into a coherent whole.

3. Time-variant
All data in the data warehouse is identified with a particular time period.


4. Non-volatile
Data is stable in a data warehouse. More data is added but data is never
removed. This enables management to gain a consistent picture of the
business.


Generally how does the data flow in a Data Warehouse?

[Figure: data flow diagram. Labels recoverable from the original: Source
(flat file, .txt file), ETL, Staging (1. Staging History, 2. Staging
Error), Data Warehouse, Data Mart.]


What is a Data Mart?


A data mart is a subset of the data warehouse.
It is a repository of data that holds information on a specific business
area, for example, Sales.
Data marts have the same definition as a data warehouse but have a
limited audience and/or limited data content.

So now the question is: how do we move or copy the data from the everyday
transactional database to the data warehouse?
This is where ETL comes into play.


What is ETL?
ETL is short for Extract, Transform, Load: three database functions that
are combined into one tool to pull data out of one database and place it
into another database.

Definition:
ETL is a process used to collect data from various sources, transform the
data depending on business rules/needs and load the data into a
destination database.

Explanation:
The ETL process has 3 main steps, which are Extract, Transform and Load.

1. Extract
The first step in the ETL process is extracting the data from various
sources.
Each of the source systems may store its data in a completely different
format from the rest.
The sources are usually flat files or RDBMS, but almost any data storage
can be used as a source for an ETL process.

2. Transform
Once the data has been extracted and converted into the expected
format, the next step in the ETL process is transforming the data
according to a set of business rules.
The data transformation may include various operations including but
not limited to filtering, sorting, aggregating, joining data, cleaning data,
generating calculated data based on existing values, validating data, etc.

3. Load
The final ETL step involves loading the transformed data into the
destination target, which might be a database or data warehouse.
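
The three steps above can be sketched in a few lines. This is only a
minimal illustration, not a real ETL tool: it uses Python's standard
sqlite3 module, an in-memory CSV string standing in for a flat-file
source, and a hypothetical destination table named target. The business
rules (keep rows with percentage >= 60, upper-case the name) are
assumptions borrowed from the student example later in this document.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file source, held in memory for the sketch.
SOURCE_CSV = """roll_no,name,percentage
1,asha,72.5
2,ravi,58.0
3,meena,64.0
"""


def extract(text):
    """Extract: read raw rows from the flat-file source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: apply the assumed business rules (filter, clean, convert)."""
    out = []
    for row in rows:
        percentage = float(row["percentage"])
        if percentage >= 60:  # filtering rule (assumption)
            out.append((int(row["roll_no"]), row["name"].upper(), percentage))
    return out


def load(conn, rows):
    """Load: write the transformed rows into the destination table."""
    conn.execute(
        "CREATE TABLE target (roll_no INTEGER PRIMARY KEY, name TEXT, percent REAL)"
    )
    conn.executemany("INSERT INTO target VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
loaded = conn.execute(
    "SELECT roll_no, name, percent FROM target ORDER BY roll_no"
).fetchall()
print(loaded)
```

Keeping the three steps as separate functions mirrors what a tester has
to verify: what was read, which rules were applied, and what finally
landed in the destination.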


BI/DW Testing:

The main difference between normal testing and testing a data warehouse
is that the test case documents for a data warehouse are built mainly
around SQL queries.

What is the need to test a Data Warehouse?

1. Data selection from multiple source systems, and the analysis that
follows, pose a great challenge.

2. The volume and complexity of the data.

3. Inconsistent and redundant data in a data warehouse.

4. Loss of data during the ETL process.

5. Non-availability of a comprehensive test bed.

6. The data is critical for the business.


Data Warehouse Testing and Database Testing

All data warehouses are databases, but not all databases are data
warehouses.

A data warehouse is a database that is designed to facilitate querying
and analysis. Often designed as OLAP (On-Line Analytical Processing)
systems, these databases contain read-only data that can be queried and
analyzed far more efficiently than the data in regular OLTP application
databases.

Testing a database and testing a data warehouse are more or less the same
except for the following points:

1. The ETL processes together form a DW, so ETL testing is the main
component of DW testing.

2. Since a data warehouse is mainly used for reporting, it becomes
necessary to test its reporting functionality.

3. Data warehouses store a very large amount of data compared to
ordinary databases, so testing the performance of a DW is also
recommended. For an ordinary database, performance is usually less
of a concern.

4. Data warehouses have to store historic data, and this feature has
to be checked in DW testing. Ordinary databases rarely keep historic
data.

This document mainly focuses on the ETL testing part.


Type of testing done in a Data Warehouse project:


The types and number of tests performed on a data warehouse vary from
project to project.
Some of the common ones are:

1. Requirement testing:
Requirement testing is conducted before any other level of testing.
It verifies whether or not all the requirements provided in the
specification are fulfilled.

2. ETL testing:
In the ETL testing stage, we make sure that appropriate changes in the
source system are captured properly and propagated correctly into the
data warehouse.

3. Smoke Testing:
A smoke test is a collection of written tests that are performed on a
system before it is accepted for further testing.
This is also known as a build verification test.

4. Functional Testing:
In the functional testing stage, we make sure all the business
requirements are fulfilled.

5. Unit Testing:
Developers perform tests on their deliverables during and after their
development process.
The unit test is performed on individual components and is based on the
developer's knowledge of what should be developed.

6. Integration Testing:
Here we validate, against the business and functional requirements, that
data processed under the correct business rules produces the correct
number of rows transferred, and we verify the data load volumes.


7. Regression Testing:
Validate that the system continues to function correctly after being
changed.
It is performed after a reported defect has been fixed by the developer.

8. End-to-end testing:
In the end-to-end testing stage, we let the system run for a few days to
simulate production situations.

9. System Testing:
System testing is performed to prove that the system meets the
functional specifications from an end-to-end perspective.
We as a testing team verify that the data in the source system
databases and the data in the target are consistent throughout the
process.
The QA environment should be a replica of production before the system
test is run.

10. User Acceptance Testing:
The objective of user acceptance testing is to certify that a release
meets user expectations and is ready for production.


Who all are involved in testing a data warehouse?

1. Business Analysts gather and document requirements.

2. QA Testers develop and execute test plans and test scripts.

3. Infrastructure people set up test environments.

4. Developers perform unit tests of their deliverables.

5. DBAs test for performance and stress.

6. Business Users perform functional tests, including User Acceptance
Tests (UAT).

QA, short for Quality Assurance, is any systematic process of checking
whether a product or service being developed meets specified
requirements. Many companies have a separate department devoted to
quality assurance, known as the QA team.


What are the phases undergone by the QA team?


Following testing best practices, QA teams go through these phases in
data warehouse testing:

1. Business understanding
a. High Level Test Approach
b. Test Estimation
c. Review Business Specification
d. Attend Business Specification and Technical Specification
Walkthroughs

2. Test plan creation, review and walkthrough

3. Test case creation, review and walkthrough

4. Test Bed & Environment setup

5. Receiving test data file from the developers

6. Test predictions creation, review (Setting up the expected results)

7. Test case execution and regression testing if required.
a. Comparing the predictions with the actual results by testing
the business rules in the test environment.
b. Displaying the comparison result in a separate worksheet.

8. Deployment
a. Validating the business rule in the production environment.
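
Step 7 above (comparing predictions with actual results and recording
the comparison in a worksheet) can be sketched as follows. This is a
hedged illustration only: the function names compare and to_worksheet,
the sample rows and the CSV layout are all assumptions, and a CSV string
stands in for the "separate worksheet".

```python
import csv
import io


def compare(expected, actual):
    """Return rows that were predicted but missing, and rows not predicted.

    Sets are used, so repeated identical rows are not distinguished here;
    duplicates need their own check.
    """
    expected_set, actual_set = set(expected), set(actual)
    return {
        "missing": sorted(expected_set - actual_set),
        "unexpected": sorted(actual_set - expected_set),
    }


def to_worksheet(report):
    """Render the comparison as CSV text (a stand-in for a worksheet)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["status", "row"])
    for status, rows in report.items():
        for row in rows:
            writer.writerow([status, row])
    return buf.getvalue()


# Hypothetical predictions (expected results) and actual target rows.
predicted = [(1, "ASHA", 72.5), (3, "MEENA", 64.0)]
actual = [(1, "ASHA", 72.5), (3, "Meena", 64.0)]

report = compare(predicted, actual)
worksheet = to_worksheet(report)
print(worksheet)
```

An empty report (no missing and no unexpected rows) corresponds to a
passed test case.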


How does the QA team prepare test cases?

This topic is very important for a test engineer who is responsible for
writing the test cases.

There are certain types of checks that can be done on the data under
review:

1. Attribute check
2. Current Row check
3. Duplicate check
4. Original Key check
5. Reconciliation check
6. Relationship check

1. Attribute check
Attribute check means verifying that the data is moving correctly from
source table to target table.

2. Current Row check
Current Row check means verifying that the current indicator is “Y” (an
indicator for latest record) for all the latest rows (with latest time
stamp).

3. Duplicate check
Duplicate check means checking that there are no duplicate values for
columns that are required to be unique.

4. Original Key check
Original Key check means checking whether the NOT NULL columns have
some value in them.

5. Reconciliation check
Reconciliation check means verifying that the number of rows in target
and the number of rows coming from source are the same.


6. Relationship check
Relationship check means checking that every key value referenced in the
child table is present in the parent table.


Query Format

Given below are the formats for writing SQL queries to perform all types
of checks.

1. Attribute check

Select count(1)
From ( Select source table attributes
       From source table
       Where list of conditions
       Except
       Select corresponding target table attributes
       From target table
       Where list of conditions
     ) alias (alternate name)

Expected output: Count=0

In the above query, we first retrieve from the source table all the
attribute values that are mapped to the target, and then remove from
this result every row that is also present in the target table.
So the result count should be zero, meaning that every mapped source row
is present in the target table with the same attribute values, and the
test case can be passed.

2. Current Row check

Select count(1)
From ( Select records
       From table_1
       Where list of conditions (records with current time stamp but
             having indicator N)
     ) alias


Assumption: Indicator for current record: Y
            Indicator for old record: N

Expected output: Count=0

In the above query, we retrieve those records that have the current
timestamp but whose indicator is still ‘N’.
So if the result count is zero, there are no records that are current
but carry the indicator of an old record, and the test case can be
passed.
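
A runnable sketch of the current row check is given below. It uses
Python's sqlite3 module and a hypothetical table customer_dim with
columns cust_id, city, load_ts and current_ind; none of these names come
from a real project. "Current time stamp" is interpreted here as the
latest load_ts per cust_id, which is an assumption about how the target
keeps history.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_dim (cust_id INTEGER, city TEXT, load_ts TEXT, current_ind TEXT);
INSERT INTO customer_dim VALUES
  (1, 'Pune',   '2011-01-01', 'N'),
  (1, 'Mumbai', '2011-06-01', 'Y'),
  (2, 'Delhi',  '2011-03-01', 'Y');
""")

# Count rows that carry the latest timestamp for their key
# but are not flagged as current.
CURRENT_ROW_CHECK = """
SELECT COUNT(1)
FROM ( SELECT d.cust_id
       FROM customer_dim d
       WHERE d.load_ts = (SELECT MAX(x.load_ts)
                          FROM customer_dim x
                          WHERE x.cust_id = d.cust_id)
         AND COALESCE(d.current_ind, 'N') <> 'Y'
     ) a
"""


def current_row_violations(connection):
    return connection.execute(CURRENT_ROW_CHECK).fetchone()[0]


clean_count = current_row_violations(conn)

# Simulate a defect: the latest row for customer 2 is flagged as old.
conn.execute("UPDATE customer_dim SET current_ind = 'N' WHERE cust_id = 2")
defect_count = current_row_violations(conn)
print(clean_count, defect_count)
```

The COALESCE treats a missing indicator as "not current", so a NULL flag
on the latest row is also reported.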

3. Duplicate check

Select count(1)
From ( Select attribute_list_1
       From table_1
       Where list of conditions
       Group by attribute_list_1
       Having count(1)>1
     ) alias

Expected output: Count=0

In the above query, we are retrieving the attributes which are supposed
to be unique and then grouping them in the same order in which they
were retrieved.
This will group all the records which have these attributes duplicated
and so the count will be greater than 1 for such records.
When we take the count of such duplicate records and we get zero
output, then this shows that there are no duplicate values for unique
columns and the test case can be passed.
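
The duplicate check template can be wrapped in a small helper so that
the same query works for single and composite keys. The sketch below
uses Python's sqlite3 module; the helper name count_duplicate_groups and
the table enrolment are hypothetical. Table and column names are
interpolated into the SQL text, so they are assumed to come from a
trusted source such as the mapping document.

```python
import sqlite3


def count_duplicate_groups(connection, table, key_columns):
    """Count key combinations that occur more than once in `table`.

    Identifiers are placed directly into the SQL text, so they must come
    from a trusted source (for example the mapping document), never from
    free user input.
    """
    cols = ", ".join(key_columns)
    sql = (
        "SELECT COUNT(1) FROM ("
        f" SELECT {cols} FROM {table}"
        f" GROUP BY {cols} HAVING COUNT(1) > 1"
        ") a"
    )
    return connection.execute(sql).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE enrolment (student_id INTEGER, course_id INTEGER, grade TEXT);
INSERT INTO enrolment VALUES (1, 10, 'A'), (1, 20, 'B'), (2, 10, 'C');
""")

key = ["student_id", "course_id"]
clean_count = count_duplicate_groups(conn, "enrolment", key)

# Simulate a defect: the (1, 10) combination is loaded a second time.
conn.execute("INSERT INTO enrolment VALUES (1, 10, 'B')")
defect_count = count_duplicate_groups(conn, "enrolment", key)
print(clean_count, defect_count)
```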


4. Original Key check

Select count(1)
From table
Where list of conditions
And (any of the NOT NULL columns is NULL)

Expected output: Count=0

In the above query, we are retrieving all the records which have any of
the NOT NULL columns as NULL and then taking count of it.
If the count is zero, this means there are no such records and the test
case can be passed.
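
When several columns are declared NOT NULL, the condition "any of the
NOT NULL columns is NULL" is simply an OR of IS NULL tests. The sketch
below builds that predicate from a column list using Python's sqlite3
module. The helper name count_null_keys and the table employee_dim are
hypothetical, and the column names are assumed to come from a trusted
mapping document because they are interpolated into the SQL text.

```python
import sqlite3


def count_null_keys(connection, table, not_null_columns):
    """Count rows where at least one supposedly NOT NULL column is NULL."""
    predicate = " OR ".join(f"{col} IS NULL" for col in not_null_columns)
    sql = f"SELECT COUNT(1) FROM {table} WHERE {predicate}"
    return connection.execute(sql).fetchone()[0]


conn = sqlite3.connect(":memory:")
# No NOT NULL constraints are declared, so the sketch can simulate bad loads.
conn.executescript("""
CREATE TABLE employee_dim (emp_id INTEGER, dept_id INTEGER, name TEXT);
INSERT INTO employee_dim VALUES (1, 10, 'ASHA'), (2, 20, 'MEENA');
""")

required = ["emp_id", "dept_id"]
clean_count = count_null_keys(conn, "employee_dim", required)

# Simulate defects: one row without emp_id, one row without dept_id.
conn.execute("INSERT INTO employee_dim VALUES (NULL, 10, 'RAVI')")
conn.execute("INSERT INTO employee_dim VALUES (4, NULL, 'KIRAN')")
defect_count = count_null_keys(conn, "employee_dim", required)
print(clean_count, defect_count)
```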

5. Reconciliation check

Select count(*)
From source table
Where list of conditions

Select count(*)
From target table
Where list of conditions

Expected output: Source count = Target count

In the above check, there are two queries, one fetching the total number
of records in the source table and the other fetching the total number
of records in the target table.
If both counts are the same, there is an equal number of records in
source and target, and the test case can be passed.


6. Relationship check

Select count(child_id)
From ( Select P.parent_attribute_to_be_checked parent_id,
              C.child_attribute_to_be_checked child_id
       From ( Select distinct attributes from child table ) C
       Left outer join
            ( Select distinct attributes from parent table ) P
       On join conditions
     ) alias
Where parent_id IS NULL

Expected output: Count=0

In the above query, we retrieve all the records in the child (target)
table that have no parent in the parent (source) table and then take
their count.
If the count is zero, there are no such records and the test case can be
passed.
Checking a lookup condition is the most common example of this type of
check.


Example

The example below makes the above queries easier to understand.

Consider a source table STUDENTS and a target table FRST_CLAS_STUDS.
We have to test whether the transformations between these two tables
given in the mapping document are working properly or not.
The table below shows the mapping between the two tables.

Source table  Source column  Target table     Target column   Transformation
------------  -------------  ---------------  --------------  ----------------------
STUDENTS      SR. NO         -                -               -
STUDENTS      NAME           FRST_CLAS_STUDS  NAME            Capitalize each letter
STUDENTS      ROLL_NO        FRST_CLAS_STUDS  ROLL_NO (P.K.)  It should be present
                                                              in the source table
STUDENTS      PERCENTAGE     FRST_CLAS_STUDS  PERCENT         Direct mapping
STUDENTS      CLASS          -                -               -
STUDENTS      ADDRESS        -                -               -
STUDENTS      DOB            -                -               -

Attribute check: In target table, 3 columns are mapped from source table
which have their own individual transformations.
We have to test each attribute that is present in the target table keeping
aside the other attributes in source table which are not mapped.

Select count(*) from
(Select upper(S.NAME), S.ROLL_NO, S.PERCENTAGE
 From STUDENTS S
 Where S.PERCENTAGE >= 60
 Except
 Select F.NAME, F.ROLL_NO, F.PERCENT
 From FRST_CLAS_STUDS F) A;


Expected output: Count=0
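
The attribute check above can be tried out with Python's sqlite3 module,
as in the hedged sketch below. Only the mapped columns of STUDENTS are
created, and the sample rows are invented for illustration. The query
itself is the one from this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STUDENTS (NAME TEXT, ROLL_NO INTEGER, PERCENTAGE REAL);
CREATE TABLE FRST_CLAS_STUDS (NAME TEXT, ROLL_NO INTEGER PRIMARY KEY, PERCENT REAL);
INSERT INTO STUDENTS VALUES ('asha', 1, 72.5), ('ravi', 2, 58.0), ('meena', 3, 64.0);
INSERT INTO FRST_CLAS_STUDS VALUES ('ASHA', 1, 72.5), ('MEENA', 3, 64.0);
""")

# Transformed source rows minus target rows; zero means every expected
# row arrived with the expected attribute values.
ATTRIBUTE_CHECK = """
SELECT COUNT(*) FROM
( SELECT UPPER(S.NAME), S.ROLL_NO, S.PERCENTAGE
  FROM STUDENTS S
  WHERE S.PERCENTAGE >= 60
  EXCEPT
  SELECT F.NAME, F.ROLL_NO, F.PERCENT
  FROM FRST_CLAS_STUDS F ) A
"""


def attribute_mismatches(connection):
    return connection.execute(ATTRIBUTE_CHECK).fetchone()[0]


clean_count = attribute_mismatches(conn)

# Simulate a defect: the name transformation was not applied to one row.
conn.execute("UPDATE FRST_CLAS_STUDS SET NAME = 'Meena' WHERE ROLL_NO = 3")
defect_count = attribute_mismatches(conn)
print(clean_count, defect_count)
```

Note that EXCEPT works in one direction only: it reports expected rows
that are missing from the target, but not extra rows that exist only in
the target. Those are caught by the reconciliation and relationship
checks, or by running the same query with the two halves swapped.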

Duplicate check: In target table, attribute ROLL_NO is the primary key.
So it has to be unique.
We have to test whether this attribute is unique or not.

Select count(*) from
(Select ROLL_NO from FRST_CLAS_STUDS
 Group by ROLL_NO
 Having count(*)>1) A;

Expected output: Count=0

Original key check: In target table attribute ROLL_NO is the primary key.
So it has to be NOT NULL.
We have to test whether this attribute has values for all the records or not.

Select count(*) from FRST_CLAS_STUDS
Where ROLL_NO is NULL;

Expected output: Count=0

Reconciliation check: We have to test whether the correct number of
records has been moved from source to target.

Select count(*) from STUDENTS
Where PERCENTAGE >= 60;

Select count(*) from FRST_CLAS_STUDS;

Expected output: count from 1st query = count from 2nd query
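
A hedged, runnable version of this reconciliation check using Python's
sqlite3 module is shown below. The sample rows are invented, only the
mapped columns are created, and the helper name reconcile is
hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STUDENTS (NAME TEXT, ROLL_NO INTEGER, PERCENTAGE REAL);
CREATE TABLE FRST_CLAS_STUDS (NAME TEXT, ROLL_NO INTEGER PRIMARY KEY, PERCENT REAL);
INSERT INTO STUDENTS VALUES ('asha', 1, 72.5), ('ravi', 2, 58.0), ('meena', 3, 64.0);
INSERT INTO FRST_CLAS_STUDS VALUES ('ASHA', 1, 72.5), ('MEENA', 3, 64.0);
""")

SOURCE_COUNT = "SELECT COUNT(*) FROM STUDENTS WHERE PERCENTAGE >= 60"
TARGET_COUNT = "SELECT COUNT(*) FROM FRST_CLAS_STUDS"


def reconcile(connection):
    """Return (source_count, target_count) for the two queries."""
    source = connection.execute(SOURCE_COUNT).fetchone()[0]
    target = connection.execute(TARGET_COUNT).fetchone()[0]
    return source, target


before = reconcile(conn)

# Simulate data loss during the load: one target row disappears.
conn.execute("DELETE FROM FRST_CLAS_STUDS WHERE ROLL_NO = 3")
after_loss = reconcile(conn)
print(before, after_loss)
```

Equal counts alone do not prove that the right rows arrived: one lost
row and one duplicated row cancel each other out. That is why this check
is used together with the attribute and duplicate checks.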


Relationship check: In target, the attribute ROLL_NO is derived from
attribute ROLL_NO in source.
So we have to check whether all roll numbers in target are present in
source or not.

Select count(F.ROLL_NO) from
(Select distinct ROLL_NO from FRST_CLAS_STUDS) F
Left outer join
(Select distinct ROLL_NO from STUDENTS) S
On F.ROLL_NO = S.ROLL_NO
Where S.ROLL_NO is NULL;

Expected output: Count=0
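
A runnable variant of this relationship check, with the join condition
and subquery aliases written out, is sketched below using Python's
sqlite3 module. The sample rows (including the orphan row with roll
number 99) are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STUDENTS (NAME TEXT, ROLL_NO INTEGER, PERCENTAGE REAL);
CREATE TABLE FRST_CLAS_STUDS (NAME TEXT, ROLL_NO INTEGER PRIMARY KEY, PERCENT REAL);
INSERT INTO STUDENTS VALUES ('asha', 1, 72.5), ('ravi', 2, 58.0), ('meena', 3, 64.0);
INSERT INTO FRST_CLAS_STUDS VALUES ('ASHA', 1, 72.5), ('MEENA', 3, 64.0);
""")

# Target roll numbers with no matching source roll number:
# the left join leaves S.ROLL_NO NULL for such orphans.
RELATIONSHIP_CHECK = """
SELECT COUNT(F.ROLL_NO)
FROM (SELECT DISTINCT ROLL_NO FROM FRST_CLAS_STUDS) F
LEFT OUTER JOIN (SELECT DISTINCT ROLL_NO FROM STUDENTS) S
  ON F.ROLL_NO = S.ROLL_NO
WHERE S.ROLL_NO IS NULL
"""


def orphan_count(connection):
    return connection.execute(RELATIONSHIP_CHECK).fetchone()[0]


clean_count = orphan_count(conn)

# Simulate a defect: a target row whose roll number never existed in source.
conn.execute("INSERT INTO FRST_CLAS_STUDS VALUES ('GHOST', 99, 80.0)")
defect_count = orphan_count(conn)
print(clean_count, defect_count)
```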


What are the tools that a QA team may use?

1. Data access tools (e.g., TOAD, WinSQL) are used to analyze the
content of tables and the results of loads.

2. ETL tools (e.g., Informatica, DataStage).

3. Test management tools (e.g., TestDirector, Quality Center) that
maintain and track the requirements, test cases, defects and
traceability matrix.

All the best for your future as a data warehouse or database tester!
