By:
Nikhil Bajaj
(Bachelor of Engineering in Information Technology)
(B.I. Tester at iGATE Patni)
B.I. Testing … torture the data
INDEX
2. Lack of awareness
As a general practice, testers plan their careers in such a way that they
specialize in and equip themselves with technical skills for the tools
involved in test execution (e.g., WinRunner, SilkTest) and test
management (e.g., Quality Center), with very little endeavor to develop
skills in the underlying technology.
But a good understanding of ETL/OLAP tools and technologies is an
essential skill for BI testing and, so far, testers have not developed a
keen interest in this skill.
3. Absence of tools
The BI marketplace is flooded with many tools and vendors, each
attempting to replace the other in the three layers of BI: database, ETL
and OLAP.
But there are no popular ETL/OLAP testing tools on the market that offer
features for automated testing or functional testing.
The topics covered in this document are organized around the challenges
described above. Keeping these challenges and the expectations from us in
mind, let us first look at what Business Intelligence is, then why it is so
often used together with the term Data Warehouse, then what a Data
Warehouse, a Data Mart and ETL are, then how database testing differs from
data warehouse testing, and finally go into the details of what BI testing is.
BI/DW:
What is BI?
BI is an abbreviation of the two words Business Intelligence: bringing the
right information, at the right time, to the right people, in the right format.
Definition:
It is a broad category of applications and technologies for gathering,
storing, analyzing, and providing access to data to help enterprise users
make better business decisions.
1. Data sourcing
Business Intelligence is about extracting information from multiple
sources of data.
The data might be: text documents (e.g., memos, reports or email
messages), photographs and images, sounds, formatted tables, web
pages and URL lists.
The key to data sourcing is to obtain the information in electronic form.
So typical sources of data might include: scanners, digital cameras,
database queries, web searches, computer file access, etc.
2. Data analysis
Business Intelligence is about synthesizing useful knowledge from
collections of data.
It is about estimating current trends, integrating and summarizing
disparate information, validating models of understanding, and
predicting missing information or future trends.
This process of data analysis is also called data mining or knowledge
discovery.
3. Situation awareness
Business Intelligence is about filtering out irrelevant information, and
setting the remaining information in the context of the business and its
environment.
The user needs the key items of information relevant to his or her
needs, and summaries that are syntheses of all the relevant data
(market forces, government policy etc.).
Situation awareness is the grasp of the context in which to understand
and make decisions.
4. Risk assessment
Business Intelligence is about discovering what plausible actions might
be taken, or decisions made, at different times.
It is about helping you weigh up the current and future risk, cost or
benefit of taking one action over another, or making one decision versus
another.
It is about inferring and summarizing your best options or choices.
5. Decision support
Business Intelligence is about using information wisely.
It aims to provide you warning of important events, such as takeovers,
market changes, and poor staff performance, so that you can take
preventative steps.
It seeks to help you analyze and make better business decisions, to
improve sales or customer satisfaction or staff morale.
It presents the information you need, when you need it.
Difference:
Business intelligence usually refers to the information that is available for
the enterprise to make decisions on.
A data warehousing (or data mart) system is the backend, or the
infrastructural component for achieving business intelligence.
Definition:
A data warehouse is a subject-oriented, integrated, time-variant and non-
volatile collection of data in support of management's decision making
process.
Explanation:
The important characteristics of a Data Warehouse:
1. Subject Oriented
2. Integrated
3. Time-variant
4. Non-volatile
1. Subject Oriented
It contains data that gives information about a particular subject instead
of about a company's ongoing operations.
2. Integrated
It contains data that is gathered into the data warehouse from a variety
of sources and merged into a coherent whole.
3. Time-variant
All data in the data warehouse is identified with a particular time period.
4. Non-volatile
Data is stable in a data warehouse. More data is added but data is never
removed. This enables management to gain a consistent picture of the
business.
[Diagram: Source (.txt file) → ETL → Staging (flat file: 1. Staging History, 2. Staging Error) → Data Warehouse → Data Mart]
So now the question is how do we move or copy the data from everyday
transactional database to data warehouse?
Here is where ETL comes to play.
What is ETL?
ETL is short for Extract, Transform, Load: three database functions that
are combined into one tool to pull data out of one database and place it
into another.
Definition:
ETL is a process used to collect data from various sources, transform the
data depending on business rules/needs and load the data into a
destination database.
Explanation:
The ETL process has 3 main steps, which are Extract, Transform and Load.
1. Extract
The first step in the ETL process is extracting the data from various
sources.
Each of the source systems may store its data in a completely different
format from the rest.
The sources are usually flat files or RDBMS, but almost any data storage
can be used as a source for an ETL process.
2. Transform
Once the data has been extracted and converted into the expected
format, it is time for the next step in the ETL process, which is
transforming the data according to a set of business rules.
The data transformation may include various operations including but
not limited to filtering, sorting, aggregating, joining data, cleaning data,
generating calculated data based on existing values, validating data, etc.
3. Load
The final ETL step involves loading the transformed data into the
destination target, which might be a database or data warehouse.
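Put together, the three steps can be sketched as a minimal ETL job in Python. Everything here is an invented example: the CSV source, the employee table, and the 10% bonus rule are assumptions for illustration, not part of any real system.

```python
import csv
import io
import sqlite3

# Extract: read rows from a delimited source (here an in-memory CSV,
# standing in for a flat file or source database).
source = io.StringIO("emp_id,name,salary\n1,Asha,50000\n2,Ravi,\n3,Meena,72000\n")
rows = list(csv.DictReader(source))

# Transform: apply business rules -- filter out rows with a missing salary
# and derive a calculated column (annual bonus at an assumed 10%).
clean = [
    {"emp_id": int(r["emp_id"]), "name": r["name"],
     "salary": float(r["salary"]), "bonus": float(r["salary"]) * 0.10}
    for r in rows if r["salary"]
]

# Load: insert the transformed rows into the destination table.
dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE employee (emp_id INTEGER, name TEXT, salary REAL, bonus REAL)")
dw.executemany("INSERT INTO employee VALUES (:emp_id, :name, :salary, :bonus)", clean)

print(dw.execute("SELECT COUNT(*) FROM employee").fetchone()[0])  # prints 2
```

Note that the row with no salary is dropped during Transform, so only two of the three source rows reach the warehouse.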
BI/DW Testing:
All data warehouses are databases, but not all databases are data
warehouses.
Testing a database and testing a data warehouse are more or less the same,
except for the following points:
1. The ETL processes together form a DW, so ETL testing is the main
component of DW testing.
4. Data warehouses have to store historic data, and this feature has
to be checked in DW testing, whereas in databases historic data is
seen very rarely.
1. Requirement testing:
Requirement testing is conducted before any other level of testing.
It verifies whether or not all the requirements provided in the
specification are fulfilled.
2. ETL testing:
In the ETL testing stage, we make sure that appropriate changes in the
source system are captured properly and propagated correctly into the
data warehouse.
3. Smoke Testing:
A smoke test is a collection of written tests that are performed on a
system before it is accepted for further testing.
This is also known as a build verification test.
4. Functional Testing:
In the functional testing stage, we make sure all the business
requirements are fulfilled.
5. Unit Testing:
Developers perform tests on their deliverables during and after their
development process.
The unit test is performed on individual components and is based on the
developer's knowledge of what should be developed.
6. Integration Testing:
Here we validate the business and functional requirements: data processed
according to the correct business rules should produce the correct
number of rows transferred, and we verify the data load volumes.
7. Regression Testing:
Validate that the system continues to function correctly after being
changed.
It is performed after a reported defect has been fixed by the developer.
8. End-to-end testing:
In the end-to-end testing stage, we let the system run for a few days to
simulate production situations.
9. System Testing:
System Testing is performed to prove that the system meets the
functional specifications from an end to end perspective.
We as a testing team will verify that the data in the source system
databases and the data in the target are consistent throughout the
process.
Here the QA environment should be a replica of Production prior to running
the system test.
1. Business understanding
a. High Level Test Approach
b. Test Estimation
c. Review Business Specification
d. Attend Business Specification and Technical Specification walkthroughs
8. Deployment
a. Validating the business rule in the production environment.
This topic is very important for a test engineer who is responsible for
writing the test cases.
There are certain types of checks that can be done on the data under
review:
1. Attribute check
2. Current Row check
3. Duplicate check
4. Original Key check
5. Reconciliation check
6. Relationship check
1. Attribute check
Attribute check means verifying that the data is moving correctly from
source table to target table.
3. Duplicate check
Duplicate check means checking that there are no duplicate values for
columns that are required to be unique.
5. Reconciliation check
Reconciliation check means verifying that the number of rows in target
and the number of rows coming from source are the same.
6. Relationship check
Relationship check means checking that every primary key value in child
table is present in parent table.
Query Format
Given below are the formats for writing SQL queries to perform all types
of checks.
1. Attribute check
SELECT COUNT(1)
FROM (SELECT <source table attributes>
      FROM <source table>
      WHERE <list of conditions>
      EXCEPT
      SELECT <corresponding target table attributes>
      FROM <target table>
      WHERE <list of conditions>) alias
In the above query, we first retrieve all the attributes from the source
table that are mapped to the target, and then remove from this list all
the attributes that are present in the target table.
So the result count should be zero, meaning that all the attributes that
are present in source table are present in target table and the test case
can be passed.
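As a concrete, runnable illustration of this check, here is a sketch using SQLite; the table and column names (src_employee, tgt_employee, emp_id, name) are invented for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE src_employee (emp_id INTEGER, name TEXT);
CREATE TABLE tgt_employee (emp_id INTEGER, name TEXT);
INSERT INTO src_employee VALUES (1, 'Asha'), (2, 'Ravi');
INSERT INTO tgt_employee VALUES (1, 'Asha'), (2, 'Ravi');
""")

# Attribute check: every mapped source row must appear unchanged in the
# target, so the EXCEPT of source over target should be empty.
(count,) = con.execute("""
    SELECT COUNT(1)
    FROM (SELECT emp_id, name FROM src_employee
          EXCEPT
          SELECT emp_id, name FROM tgt_employee) diff
""").fetchone()
print(count)  # 0 -> test case passes
```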
2. Current Row check
SELECT COUNT(1)
FROM (SELECT <records>
      FROM <table>
      WHERE <conditions: records with current timestamp but indicator 'N'>) alias
In the above query, we are retrieving those records which have current
timestamp but still their indicator is ‘N’.
So if the result count is zero, it means that there are no records that
are current but still flagged as old, and the test case can be
passed.
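A runnable sketch of the same idea in SQLite follows; the tgt_employee table, the load_ts column and the 'Y'/'N' current-indicator convention are assumptions made for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE tgt_employee (emp_id INTEGER, load_ts TEXT, current_ind TEXT);
-- Assumed convention: the latest timestamp carries current_ind = 'Y',
-- older history rows carry 'N'.
INSERT INTO tgt_employee VALUES
    (1, '2024-01-01', 'N'),
    (1, '2024-06-01', 'Y');
""")

# Current row check: no record with the latest timestamp may still be
# flagged 'N'. A zero count passes the test case.
(count,) = con.execute("""
    SELECT COUNT(1)
    FROM (SELECT emp_id FROM tgt_employee
          WHERE load_ts = (SELECT MAX(load_ts) FROM tgt_employee)
            AND current_ind = 'N') bad_rows
""").fetchone()
print(count)  # 0 -> no current record is marked old
```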
3. Duplicate check
SELECT COUNT(1)
FROM (SELECT <attribute list>
      FROM <table>
      WHERE <list of conditions>
      GROUP BY <attribute list>
      HAVING COUNT(1) > 1) alias
In the above query, we are retrieving the attributes which are supposed
to be unique and then grouping them in the same order in which they
were retrieved.
This will group all the records which have these attributes duplicated
and so the count will be greater than 1 for such records.
When we take the count of such duplicate records and we get zero
output, then this shows that there are no duplicate values for unique
columns and the test case can be passed.
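The duplicate check can be tried out in SQLite like this; here the tgt_employee table is invented, and a duplicate emp_id is loaded deliberately to show a failing result.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE tgt_employee (emp_id INTEGER, name TEXT);
INSERT INTO tgt_employee VALUES (1, 'Asha'), (2, 'Ravi'), (2, 'Ravi');
""")

# Duplicate check: emp_id is supposed to be unique, so grouping on it and
# keeping groups with COUNT > 1 exposes the duplicated keys.
(count,) = con.execute("""
    SELECT COUNT(1)
    FROM (SELECT emp_id FROM tgt_employee
          GROUP BY emp_id
          HAVING COUNT(1) > 1) dups
""").fetchone()
print(count)  # 1 -> one duplicated key, test case fails
```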
4. Original Key check
SELECT COUNT(1)
FROM <table>
WHERE <list of conditions>
  AND (<any of the NOT NULL columns is NULL>)
In the above query, we are retrieving all the records which have any of
the NOT NULL columns as NULL and then taking count of it.
If the count is zero, this means there are no such records and the test
case can be passed.
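A runnable version of this NOT NULL check in SQLite might look like the following; the tgt_student table and roll_no key are invented, with a NULL key loaded on purpose so the check catches something.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE tgt_student (roll_no INTEGER, name TEXT);
INSERT INTO tgt_student VALUES (1, 'Asha'), (NULL, 'Ravi');
""")

# Original key check: roll_no is the (assumed) primary key and must never
# be NULL; a zero count would pass the test case.
(count,) = con.execute(
    "SELECT COUNT(1) FROM tgt_student WHERE roll_no IS NULL"
).fetchone()
print(count)  # 1 -> a NULL key slipped through, test case fails
```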
5. Reconciliation check
SELECT COUNT(*)
FROM <source table>
WHERE <list of conditions>

SELECT COUNT(*)
FROM <target table>
WHERE <list of conditions>
In the above check, there are two queries, one fetching the count of
total number of records in source table and the other fetching the count
of total number of records in target table.
If both the counts are same, this means that there are equal number of
records in source and target and the test case can be passed.
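In SQLite, the two counts can be compared as below; the src_employee and tgt_employee tables are invented for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE src_employee (emp_id INTEGER);
CREATE TABLE tgt_employee (emp_id INTEGER);
INSERT INTO src_employee VALUES (1), (2), (3);
INSERT INTO tgt_employee VALUES (1), (2), (3);
""")

# Reconciliation check: the row counts of source and target must match.
(src_count,) = con.execute("SELECT COUNT(*) FROM src_employee").fetchone()
(tgt_count,) = con.execute("SELECT COUNT(*) FROM tgt_employee").fetchone()
print(src_count == tgt_count)  # True -> test case passes
```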
6. Relationship check
SELECT COUNT(child_id)
FROM (SELECT p.<parent attribute to be checked> parent_id,
             c.<child attribute to be checked> child_id
      FROM (SELECT DISTINCT <attributes> FROM <child table>) c
      LEFT OUTER JOIN
           (SELECT DISTINCT <attributes> FROM <parent table>) p
      ON <join conditions>) alias
WHERE parent_id IS NULL
In the above query, we are retrieving all the records in the target table
which have no parent in the source table and then taking their count.
If the count is zero, this means that there are no such records and the
test case can be passed.
Checking lookup condition is the most common example for this type of
check.
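A runnable sketch of the relationship check in SQLite follows; the parent_dept and child_emp tables are invented, and an orphaned dept_id is loaded deliberately so the check finds it.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE parent_dept (dept_id INTEGER);
CREATE TABLE child_emp (emp_id INTEGER, dept_id INTEGER);
INSERT INTO parent_dept VALUES (10), (20);
INSERT INTO child_emp VALUES (1, 10), (2, 20), (3, 30);  -- 30 is an orphan
""")

# Relationship check: a LEFT OUTER JOIN from child to parent leaves the
# parent key NULL for any orphaned child key.
(count,) = con.execute("""
    SELECT COUNT(c.dept_id)
    FROM (SELECT DISTINCT dept_id FROM child_emp) c
    LEFT OUTER JOIN (SELECT DISTINCT dept_id FROM parent_dept) p
      ON c.dept_id = p.dept_id
    WHERE p.dept_id IS NULL
""").fetchone()
print(count)  # 1 -> one orphaned child key, test case fails
```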
Example
Attribute check: In the target table, 3 columns are mapped from the source
table, each with its own individual transformation.
We have to test each attribute that is present in the target table, leaving
aside the attributes in the source table which are not mapped.
Original key check: In the target table, the attribute ROLL_NO is the
primary key, so it has to be NOT NULL.
We have to test whether this attribute has values for all the records.
Expected output: count from 1st query = count from 2nd query
1. Data access tools (e.g., TOAD, WinSQL) are used to analyze content
of tables and to analyze results of loads.
All the best for your future as a data warehouse or database tester!