
Data Quality Management: Methods and Tools

A Seminar Report

submitted in partial fulfillment of the requirements


for the award of the degree of

Bachelor of Technology

in

COMPUTER ENGINEERING

by

Mr. Tage Nobin

Under the guidance of

Prof. L.D. Netak

DEPARTMENT OF COMPUTER ENGINEERING


DR. BABASAHEB AMBEDKAR TECHNOLOGICAL UNIVERSITY
Lonere-402 103, Tal. Mangaon, Dist. Raigad (MS)
INDIA

April, 2010
Certificate

The seminar report entitled Data Quality Management: Methods and Tools submitted by Mr. Tage Nobin (20070653) is approved for the partial fulfillment of the requirement for the award of the degree of Bachelor of Technology in Computer Engineering.

Prof. L.D. Netak (Guide)
Dept. of Computer Engineering

Prof. L.D. Netak (Head)
Dept. of Computer Engineering

External Examiner(s)

1. (Name: )

2. (Name: )

Place: Dr. Babasaheb Ambedkar Technological University, Lonere.

Date: 15/05/2010
Acknowledgements

This seminar report is the result of the intense effort of many people, whom I need to thank for making it a reality. I thus express my deep regards to all those who have offered their assistance and suggestions.

I am grateful to my seminar guide, Prof. L.D. Netak, for making this work possible.
No word of thanks is enough for his mentorship, guidance, support and patience. I must
acknowledge the freedom he gave me in pursuing topics that I found interesting. His
resourcefulness, influence and keen scientific intuition were also vital to the progress of
this work and for these I am deeply thankful.

Finally, I would like to thank all whose direct and indirect support helped me in com-
pleting the seminar in time.

Mr. Tage Nobin (20070653)


Abstract

Defective data is one of the most serious problems in the world of data. Business success is becoming ever more dependent on the accuracy and integrity of mission-critical data resources. As data volume increases, the question of internal consistency within the data becomes paramount, regardless of its fitness for use for any external purpose. Different methods and tools are used to maintain data quality, depending on the conditions and the situation.

This report describes the major data quality problems, requirements and common strategies for managing data quality in data-centric systems. It also explains the importance of data quality management, with a particular spotlight on the management issues of data quality and on the various methods and tools that can be used and implemented comprehensively in the management of data quality.

Contents

1 Introduction

2 What is Data Quality Management

3 Data Quality Definition (Rules and Targets)
3.1 Importance of Data Quality
3.2 Data Quality Attributes

4 Data Quality Management Challenges

5 Design Quality Improvement Process
5.1 Data Quality Management Objective

6 Implement Quality Improvement Process (Methods and Tools)
6.1 Methods
6.1.1 Data Profiling
6.1.2 Data Cleansing
6.1.3 Data Augmentation
6.1.4 Data Integration
6.2 Tools
6.2.1 Data Auditing Tools
6.2.2 Data Cleansing Tools
6.2.3 Data Migration Tools

7 Basic Tools of Data Quality

8 Monitor Data Quality
8.1 Monitoring System

Bibliography

List of Figures

3.1 Data Quality Attributes

4.1 Data Flow

5.1 Radial Cycle of Data Quality Process

6.1 Data Quality Methods

8.1 Monitor System

Chapter 1

Introduction

SIX HUNDRED BILLION DOLLARS ANNUALLY! Yes, that is what poor data quality costs American businesses, according to the Data Warehousing Institute. What about the whole world? Ensuring a high level of data quality is one of the most expensive and time-consuming tasks in data warehousing projects. Data quality management is the field in which all kinds of data, raw or processed, are managed. It is one of the major fields yet to be taken up in a serious way.

The main topics covered are as follows:

1. What is Data Quality Management?

This is a key first step, as understanding the up-front level of data quality will form the foundation of the domain rules and processes that will be put in place. Without an up-front assessment, the ability to implement a data quality strategy effectively will be negatively impacted. From an ongoing perspective, data quality management allows an organization to see how the procedures put in place have improved the quality of the data.

2. Data Quality Definition (Rules and Targets)

Once the initial data quality assessment is complete, the second part of the process involves defining what exactly we mean by data quality. From an ongoing perspective, this phase involves describing the attributes of quality data, which makes the whole process much easier, and performing trend analyses on the data and on the rules in place to ensure that the data rules are adhered to and the target stays in focus.

3. Data Quality Management Challenges


Deploying a data quality management program is not easy; there are significant challenges that must be overcome. Here we discuss the various problems and challenges that may stand as obstacles in the way of quality data. Whether of a serious or a mild nature, these challenges need to be overcome in order to develop quality data.

4. Design Quality Improvement Processes


This phase involves designing the data quality management process architecture. With various approaches available, we need to choose the way of managing the process that will yield data of the finest quality.

5. Implement Quality Improvement Processes (Methods and Tools)


Once the design has been standardized, the next phase of the enhancement process involves the actual implementation of the various methods and tools designed. Since data quality improvement is an iterative process, the rules that manage and regulate it will come in handy.

6. Monitor Data Quality


The ability to monitor the data quality processes is critical, as it provides the organization with a quick snapshot of the health of the data within the organization. Through analysis of the data quality scorecard results, we have the information needed to confidently make additional modifications to the data quality strategies in place if needed. Conversely, the scorecards and trend analysis results can also confirm that data quality is being effectively addressed within the organization.

Chapter 2

What is Data Quality Management

First of all, we will deal with what exactly 'Data', 'Quality' and 'Management' are. As the Oxford dictionary defines them:

• Data - "a collection of facts from which conclusions may be drawn"

• Quality - "the standard of how good something is, measured against other similar things". In the context of data, quality may be defined as "fitness for use".

• Management - "the action of managing something".

In a broad spectrum, data quality management entails the establishment and deployment of roles, responsibilities, policies, and procedures concerning the acquisition, maintenance, dissemination, and disposition of data. In practice, the precise definition depends upon the organization. For successful data quality management, the solution must include techniques, processes, methods and tools. The data quality management lifecycle must be clearly defined using continuous as well as iterative frameworks.

Data quality must be designed into systems using proven engineering principles. Data quality is too often left to chance or given only superficial attention in the design of information systems. While good engineering principles are sometimes applied to software development, data quality is usually left up to the end user. Applying engineering principles to data quality involves understanding the factors that affect the creation and maintenance of quality data. It is helpful to look at data as the output of a data manufacturing process.

Chapter 3

Data Quality Definition (Rules and


Targets)

3.1 Importance of Data Quality


First off, data is an essential component of most of today's business processes. In customer-facing functions, it is the foundation for managing customer relationships. Without good data quality, it is difficult to get accurate report metrics, and it also wastes users' time and effort. Bad data also incurs costs. One of the biggest risks of bad data quality is that it ultimately inhibits adoption, as users get frustrated and lose trust in the data. We need to recognize that it isn't quality that is expensive; it's the cost of "unquality". Examples of the cost of "unquality" include the cost of sending duplicate promotional materials because customers are duplicated in the database, the opportunity cost of not sending materials to the right customers because the data used to segment customers is flawed, the opportunity cost of not shipping products to a customer because of inaccurate information about inventory levels, and the time spent finding and reconciling data needed to make effective decisions.

Data quality is evidenced by valid and reliable data; therefore, planning a clear concept of the need for and definition of data quality in the early stages is well worth the investment of time and resources. Data in a database has no actual value (or even quality); it only has potential value. Data has realized value only when someone uses it to do something useful. As mentioned earlier, there is no fixed global definition of high-quality data. Data quality is not restricted to a particular concept; instead, it differs as the domain and application change. Whatever the domain, there are some particular parameters which must be satisfied in order to make the data real quality data. Contrary to popular belief, quality does not necessarily mean zero defects. Quality is conformance to valid requirements.

3.2 Data Quality Attributes


To be fit for use, data products must possess all three attributes of quality:

1) Utility - refers to the usefulness of the information for its intended users.

2) Objectivity - refers to whether information is accurate, reliable, and unbiased, and is presented in an accurate, clear, and unbiased manner.

3) Integrity - refers to the security or protection of information from unauthorized access or revision.

All the above three attributes may be further defined in terms of seven dimensions of data quality:

Relevance - refers to the degree to which our data products provide information that meets the customer's needs.

Accuracy - refers to the difference between an estimate of a parameter and its true value.

Timeliness - refers to the length of time between the reference periods of the information and when we deliver the data product to customers.

Accessibility - refers to the ease with which customers can identify, obtain, and use the information in data products.

Interpretability - refers to the availability of documentation to aid customers in understanding and using our data products. This documentation typically includes the underlying concepts; definitions; the methods used to collect, process, and analyze the data; and the limitations imposed by the methods used.

Transparency - refers to providing documentation about the assumptions, methods, and limitations of a data product to allow qualified third parties to reproduce the information, unless prevented by confidentiality or other legal constraints.

Completeness - refers to the degree to which values are present in the attributes that require them.
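
As a small illustration of how such a dimension can be quantified, the sketch below computes a completeness score over a handful of hypothetical records; the record layout, the list of required fields and the scoring rule are assumptions made purely for illustration, not definitions taken from this report.

# Minimal sketch: completeness as the share of required attribute values
# that are actually present (records and field names are hypothetical).
records = [
    {"customer_id": "C001", "name": "A. Sharma", "email": "a@example.com"},
    {"customer_id": "C002", "name": "", "email": None},
    {"customer_id": "C003", "name": "R. Gupta", "email": "r@example.com"},
]
required_fields = ["customer_id", "name", "email"]

def completeness(records, fields):
    """Return the fraction of required values that are non-empty."""
    total = len(records) * len(fields)
    present = sum(1 for rec in records for f in fields
                  if rec.get(f) not in (None, ""))
    return present / total if total else 1.0

print(f"Completeness: {completeness(records, required_fields):.0%}")  # 78% here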

Figure 3.1: Data Quality Attributes

Chapter 4

Data Quality Management


Challenges

Data is impacted by numerous processes, most of which affect its quality to a certain degree. It is imperative that the issue of data quality be addressed if the data warehouse is to prove beneficial to an organization. The information in the data warehouse can be used by an organization to generate greater understanding of its customers, processes, and the organization itself. The potential to increase the usefulness of data by combining it with other data sources is great. But if the underlying data is not accurate, any relationships found in the data warehouse will be misleading. As Wyatt Earp said, "Fast is fine, but accuracy is everything."

Resolving data quality problems is often the biggest effort in a data mining study; 50-80 percent of the time in data mining projects is spent on data quality. Just because the data is in the computer does not mean it is right.

Figure 4.1: Data Flow

Data/information is not static; it flows between the data collection and usage processes.

The main problem here is that issues can and do arise at all of these stages, which creates the need for continuous end-to-end monitoring. This indeed becomes a herculean task.

There are various factors that influence data quality:

1. Data Control

2. Data Age

3. Data Types

4. Device Availability

5. Data Structure

6. Read/Write Management

7. Communication Timing

These factors matter a great deal to the development of quality data. An error in even 1 percent of the data may impact findings and results.
There are numerous problems which arise in connection with data quality. Some of them are small in nature and some of them big. Whatever their nature, ignoring any such matter may prove to be a costly affair.
Some of them are listed below:

- Much of the raw data is of poor quality, because of incorrect data gathering and data operations. This leads to inaccurate assessment of the data.

- The above-mentioned fact results in data being costly to diagnose and assess.

- As a consequence, the data becomes costly to repair.

- Many of the costs involved are hidden and hard to quantify. This makes the assessment a tough task.

- Data is inconsistent between different systems. Since data flows between different systems, any obstacle in the smooth transition may lead to a total data failure.

- Most of the attributes of quality data are extremely difficult, sometimes impossible, to measure.

- These attributes are vague in nature; the conventional definitions provide no guidance towards practical improvement of the data.

- The priority given to metadata is undermined, although setting standards for metadata is very important.

- Data quality management requires cross-functional cooperation.

- It is perceived to be extremely manpower-intensive.

- There are various other systematic errors which can be attributed to lack of resources and skills.

Chapter 5

Design Quality Improvement


Process

The quality of any data or statistics disseminated by an agency depends on two aspects: the quality of the statistics received, and the quality of the internal processes for the collection, processing, analysis and dissemination of data and metadata.

5.1 Data Quality Management Objective


Typical objectives of a data quality management program include:

• Eliminate redundant data

• Validate data at input

• Eliminate false null values

• Ensure data values fall within defined domains

• Resolve conflicts in data

• Ensure proper definition and use of data values

• Establish and apply standards.

Figure 5.1: Radial Cycle of Data Quality Process

Herein we develop a basic strategy by combining all the above steps: a preliminary problem definition, followed by analysis, improvement and monitoring steps for each problem.
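
To make objectives such as validating data at input, eliminating false null values and keeping values within defined domains more concrete, here is a minimal sketch of the kind of record-level checks they imply; the field names, domain sets and placeholder values are hypothetical assumptions, not rules taken from this report.

# Hedged sketch of three typical objectives: validate at input, eliminate
# false null values, and ensure values fall within defined domains.
# Field names, domains and placeholder values are hypothetical.
FALSE_NULLS = {"N/A", "NULL", "UNKNOWN", "-", ""}
DOMAINS = {"gender": {"M", "F"}, "country": {"IN", "US", "UK"}}

def validate_record(record):
    """Return a list of data quality issues found in one record."""
    issues = []
    for field, value in record.items():
        if isinstance(value, str) and value.strip().upper() in FALSE_NULLS:
            issues.append(f"{field}: false null value {value!r}")
        elif field in DOMAINS and value not in DOMAINS[field]:
            issues.append(f"{field}: value {value!r} outside defined domain")
    return issues

record = {"gender": "X", "country": "IN", "city": "N/A"}
for issue in validate_record(record):
    print(issue)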

Chapter 6

Implement Quality Improvement


Process (Methods and Tools)

Designing the quality management process is clearly not the end of the data quality effort. Just identifying issues does nothing to improve things. The issues need to drive changes that will improve the quality of the data for its eventual users. We see process improvement fundamentally as a way of solving problems. If there is no apparent or latent problem, process improvement is not needed. If there is a problem, however intangible, one or more processes need to be improved to deal with it. Once you sense a problem, good problem-solving technique involves alternating between the levels of thought and experience.

6.1 Methods
1. Profiling

2. Cleansing

3. Data Integration/Consolidation

4. Data Augmentation

Figure 6.1: Data Quality Methods

6.1.1 Data Profiling
Data profiling can be defined as the use of analytical techniques on data for the purpose of developing a thorough knowledge of its content, structure and quality. It is a process of developing information about data instead of information from data.

The purpose of these statistics may be to:

1. Find out whether existing data can easily be used for other purposes.

2. Improve the ability to search the data by tagging it with keywords, descriptions or
assigning it to a category.

3. Give metrics on data quality, including whether the data conforms to particular
standards or patterns.

4. Assess the risk involved in integrating data for new applications.

5. Assess whether metadata accurately describes the actual values in the source database.

6. Develop a master data management process for data governance for improving data
quality.
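
A minimal profiling sketch is given below. It only derives a few column-level statistics (row count, null share, distinct values and value range) from a hypothetical list of records; real profiling tools compute far richer statistics, and the sample data and column names are assumptions made for illustration.

# Minimal data profiling sketch: develop information *about* the data
# (counts, null share, distinct values, ranges) rather than from it.
from collections import defaultdict

records = [
    {"id": 1, "age": 34, "city": "Pune"},
    {"id": 2, "age": None, "city": "Mumbai"},
    {"id": 3, "age": 29, "city": "Pune"},
    {"id": 4, "age": 151, "city": None},
]

def profile(records):
    columns = defaultdict(list)
    for rec in records:
        for col, val in rec.items():
            columns[col].append(val)
    report = {}
    for col, values in columns.items():
        non_null = [v for v in values if v is not None]
        report[col] = {
            "rows": len(values),
            "null_share": 1 - len(non_null) / len(values),
            "distinct": len(set(non_null)),
            "min": min(non_null) if non_null else None,
            "max": max(non_null) if non_null else None,
        }
    return report

for col, stats in profile(records).items():
    print(col, stats)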

6.1.2 Data Cleansing


Data cleansing or data scrubbing is the act of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database. Used mainly in databases, the term refers to identifying incomplete, incorrect, inaccurate or irrelevant parts of the data and then replacing, modifying or deleting this dirty data.
After cleansing, a data set will be consistent with other similar data sets in the system. The inconsistencies detected or removed may have been originally caused by different data dictionary definitions of similar entities in different stores, may have been caused by user entry errors, or may have been corrupted in transmission or storage. Data cleansing differs from data validation in that validation almost invariably means data is rejected from the system at entry and is performed at entry time, rather than on batches of data.
The stages of the data cleansing process are as follows (a small workflow sketch is given after the list):

• Data Auditing: The data is audited with the use of statistical methods to detect
anomalies and contradictions. This eventually gives an indication of the character-
istics of the anomalies and their locations.

• Workflow specification: The detection and removal of anomalies is performed by a sequence of operations on the data known as the workflow. It is specified after the process of auditing the data and is crucial in achieving the end product of high-quality data. In order to achieve a proper workflow, the causes of the anomalies and errors in the data have to be closely considered. If, for instance, we find that an anomaly is a result of typing errors at the data input stage, the layout of the keyboard can help in suggesting possible solutions.

• Workflow execution: In this stage, the workflow is executed after its specification is
complete and its correctness is verified. The implementation of the workflow should
be efficient even on large sets of data which inevitably poses a trade-off because the
execution of a data cleansing operation can be computationally expensive.

• Post-Processing and Controlling: After executing the cleansing workflow, the re-
sults are inspected to verify correctness. Data that could not be corrected during
execution of the workflow are manually corrected if possible. The result is a new
cycle in the data cleansing process where the data is audited again to allow the
specification of an additional workflow to further cleanse the data by automatic
processing.
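
The sketch referred to above strings these four stages together for one hypothetical anomaly, ages outside a plausible range; the audit rule, the chosen correction and the field names are illustrative assumptions only.

# Hedged sketch of the audit -> workflow -> execute -> post-process cycle
# for a single, hypothetical anomaly: ages outside a plausible range.
records = [{"name": "A", "age": 34}, {"name": "B", "age": -5},
           {"name": "C", "age": 230}, {"name": "D", "age": 41}]

def audit(records):
    """Data auditing: detect anomalies and report their locations."""
    return [i for i, r in enumerate(records) if not 0 <= r["age"] <= 120]

def cleanse(records, anomalies):
    """Workflow execution: here, the chosen operation is to null out
    implausible ages so they can be re-collected later."""
    for i in anomalies:
        records[i]["age"] = None
    return records

anomalies = audit(records)                 # data auditing
cleaned = cleanse(records, anomalies)      # workflow specification + execution
remaining = audit([r for r in cleaned if r["age"] is not None])  # post-processing check
print(cleaned, "unresolved:", remaining)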

6.1.3 Data Augmentation


The term data augmentation refers to methods for constructing iterative algorithms via the introduction of unobserved data or latent variables. In general, however, constructing data augmentation schemes that result in both simple and fast algorithms is a matter of art, in that successful strategies vary greatly with the observed-data models being considered.
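
As one small illustration of the latent-variable idea, the sketch below runs a few EM-style iterations for a two-component Gaussian mixture, treating the unobserved component labels as the augmented data; the observations, the starting values, the fixed spread and the use of NumPy are all assumptions made for this example.

# Hedged sketch: EM for a two-component Gaussian mixture, where the
# unobserved component labels play the role of the augmented (latent) data.
import numpy as np

x = np.array([1.0, 1.2, 0.8, 5.1, 4.9, 5.3])   # hypothetical observations
mu = np.array([0.0, 4.0])                        # initial component means
sigma, weights = 1.0, np.array([0.5, 0.5])       # fixed spread, mixing weights

def normal_pdf(x, mean, sd):
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

for _ in range(20):
    # E-step: posterior responsibility of each component for each point
    dens = np.vstack([weights[k] * normal_pdf(x, mu[k], sigma) for k in range(2)])
    resp = dens / dens.sum(axis=0)
    # M-step: re-estimate means and mixing weights from the responsibilities
    mu = (resp * x).sum(axis=1) / resp.sum(axis=1)
    weights = resp.mean(axis=1)

print("estimated means:", mu, "weights:", weights)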

6.1.4 Data Integration


Data integration involves combining data residing in different sources and providing users with a unified view of these data. This process becomes significant in a variety of situations, both commercial (when two similar companies need to merge their databases) and scientific (combining research results from different repositories, for example). Data integration appears with increasing frequency as the volume of data and the need to share existing data explode.
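
A toy sketch of this unified-view idea follows; the two source schemas and the field mappings are hypothetical, and a real integration effort would additionally have to handle schema matching, conflicting values and entity resolution.

# Minimal data integration sketch: map two hypothetical source schemas
# onto one target schema and present a unified view keyed by customer id.
crm_source = [{"cust_id": "C1", "full_name": "A. Sharma", "phone": "111"}]
billing_source = [{"customer": "C1", "name": "A Sharma", "balance": 250.0}]

def to_target(record, mapping):
    """Translate one source record into the shared target schema."""
    return {target: record.get(source) for source, target in mapping.items()}

unified = {}
for rec in crm_source:
    row = to_target(rec, {"cust_id": "id", "full_name": "name", "phone": "phone"})
    unified.setdefault(row["id"], {}).update(row)
for rec in billing_source:
    row = to_target(rec, {"customer": "id", "name": "name", "balance": "balance"})
    unified.setdefault(row["id"], {}).update(row)

print(unified)   # one consolidated record per customer across both sources

In this sketch later sources simply overwrite earlier ones; how to resolve such conflicts is itself a design decision in a real integration project.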

6.2 Tools
It is commonly accepted that data quality tools can be grouped according to the part of a data quality process they cover. Data profiling and analysis assist in detecting data problems. Data transformation, data cleaning, duplicate elimination and data enhancement aim to solve the discovered or previously known data quality problems. Data quality tools generally fall into one of three categories:

1. Auditing

2. Cleansing

3. Migration

6.2.1 Data Auditing Tools


Data auditing tools enhance the accuracy and correctness of the data at the source. These tools generally compare the data in the source database to a set of business rules. When using a source external to the organization, business rules can be determined by using data mining techniques to uncover patterns in the data. Business rules that are internal to the organization should be entered in the early stages of evaluating data sources. Lexical analysis may be used to discover the business sense of words within the data. The data that does not adhere to the business rules could then be modified as necessary.
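
The sketch below shows a very reduced version of this idea: it infers the dominant lexical pattern of a column's values and flags records that do not conform; the column contents and the pattern notation are hypothetical, and real tools apply far more sophisticated mining.

# Hedged sketch of rule discovery by pattern analysis: infer the dominant
# lexical pattern of a column (here, hypothetical product codes) and flag
# values that do not conform to it.
from collections import Counter

def pattern(value):
    """Map each character to a class: A=letter, 9=digit, else kept as-is."""
    return "".join("A" if c.isalpha() else "9" if c.isdigit() else c
                   for c in value)

codes = ["AB-1234", "CD-9871", "EF-5521", "12-ABCD", "GH-3302"]
counts = Counter(pattern(c) for c in codes)
dominant = counts.most_common(1)[0][0]

violations = [c for c in codes if pattern(c) != dominant]
print("dominant pattern:", dominant)            # e.g. 'AA-9999'
print("values violating the inferred rule:", violations)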

Data Analysis:

Activities that enclose the statistical evaluation, the logical study of data values and the
application of data mining algorithms in order to define data patterns and rules to ensure
that data does not violate the application domain constraints. The set of commercial
and research tools that provide data analysis techniques are as follows:

Commercial- dfPower, ETLQ, Migration Architect, Trillium, WizWhy

Research- Potter's Wheel, Ken State University Tool

Data Profiling:

The process of analyzing data sources with respect to the data quality domain, to identify and prioritize data quality problems. Data profiling reports on the completeness of datasets and data records, organizes data problems by importance, outputs the distribution of data quality problems in a dataset, and lists missing values in existing records. The identification of data quality problems before starting a data cleaning project is crucial to ensure the delivery of accurate information. The following set of commercial and research tools implement data profiling techniques:

Commercial- dfPower, ETLQ, Migration Architect, Trillium, WizWhy

Research- Ken State University Tool

6.2.2 Data Cleansing Tools


Data cleansing tools are used in the intermediate staging area. The tools in this category have been around for a number of years. A data cleansing tool cleans names, addresses and other data that can be compared to an independent source. These tools are responsible for parsing, standardizing, and verifying data against known lists. Data cleansing tools contain features which perform the following functions (a small parsing and standardization sketch follows the list):

• Data parsing (elementizing) - breaks a record into atomic units that can be used
in subsequent steps. Parsing includes placing elements of a record into the correct
fields.

• Data standardization - converts the data elements to forms that are standard
throughout the data warehouse.

• Data correction and verification - matches data against known lists.

• Record matching- determines whether two records represent data on the same sub-
ject.

• Data transformation- ensures consistent mapping between source systems and data
warehouse.

• House-holding - combining individual records that have the same address.

• Documenting - documents the results of the data cleansing steps in the metadata.
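
The small sketch referred to above illustrates just the parsing and standardization features; the input address format, the field split and the abbreviation table are hypothetical simplifications of what commercial tools actually do.

# Hedged sketch of two cleansing-tool features: parsing an address string
# into atomic fields, then standardizing abbreviations. The input format
# and the abbreviation table are hypothetical.
STANDARD_FORMS = {"rd": "Road", "rd.": "Road", "st": "Street", "st.": "Street"}

def parse_address(raw):
    """Parsing (elementizing): split 'number street, city' into fields."""
    street_part, city = [p.strip() for p in raw.split(",", 1)]
    number, street = street_part.split(" ", 1)
    return {"number": number, "street": street, "city": city}

def standardize(fields):
    """Standardization: convert street-type abbreviations to full forms."""
    words = [STANDARD_FORMS.get(w.lower(), w) for w in fields["street"].split()]
    return {**fields, "street": " ".join(words)}

record = standardize(parse_address("12 M G Rd., Pune"))
print(record)   # {'number': '12', 'street': 'M G Road', 'city': 'Pune'}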

Data cleaning:

The act of detecting, removing and/or correcting dirty data. Data cleaning aims not only at cleaning up the data but also at bringing consistency to different sets of data that have been merged from separate databases. Sophisticated software applications are available to clean data using specific functions, rules and look-up tables. In the past, this task was done manually and was therefore subject to human error. The following set of commercial and research tools implement data cleaning techniques:

Commercial- DataBlade, dfPower, ETLQ, ETI*DataCleanser, Firstlogic, NaDIS, QuickAddress Batch, Sagent, Trillium, WizRule

Research- Ajax, Arktos, FraQL

Duplicate elimination:

The process that identifies duplicate records (referring to the same real entity) and merges
them into a single record. Duplicate elimination processes are costly and very time
consuming.
They usually require the following steps:

(i) to standardize format discrepancies;

(ii) to translate abbreviations or numeric codes;

(iii) to perform exact and approximate matching rules and

(iv) to consolidate duplicate records.
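
A greatly simplified sketch of steps (i) to (iv) is given below; the records, the similarity threshold, the use of Python's difflib for approximate matching and the merge policy are assumptions chosen only to show the flow.

# Hedged sketch of duplicate elimination: standardize, approximately match,
# then consolidate. Records, threshold and merge policy are hypothetical.
from difflib import SequenceMatcher

records = [{"name": "Jon Smith", "city": "Pune"},
           {"name": "John Smith", "city": "pune"},
           {"name": "A. Verma", "city": "Delhi"}]

def standardize(r):                      # step (i): remove format discrepancies
    return {k: v.strip().lower() for k, v in r.items()}

def similar(a, b, threshold=0.85):       # step (iii): approximate matching
    return SequenceMatcher(None, a["name"], b["name"]).ratio() >= threshold

merged = []
for rec in map(standardize, records):
    for existing in merged:
        if similar(existing, rec) and existing["city"] == rec["city"]:
            existing.update({k: v for k, v in rec.items() if v})  # step (iv): consolidate
            break
    else:
        merged.append(rec)

print(merged)   # duplicates referring to the same real entity are consolidated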

The set of commercial and research tools that provide duplicate elimination techniques
is presented below:

Commercial- Centrus Merge/Purge, ChoiceMaker, DataBlade, DeDupe, dfPower, DoubleTake, ETLQ, ETI*DataCleanser, Firstlogic, Identity Search Server, MatchIT, Merge/Purge Plus

Research- Ajax, Flamingo Project, FraQL

Data enrichment (also known as data enhancement):

The process of using additional information from internal or external data sources to improve the quality of input data that was incomplete, unspecific or outdated. Postal address enrichment, geocoding and demographic data additions are typical data enrichment procedures. The set of commercial and research data enrichment tools is listed below:

Commercial- DataStage, dfPower, ETLQ, Firstlogic, NaDIS, QuickAddress Batch, Sagent, Trillium

Research- Ajax

6.2.3 Data Migration Tools


The third type of tool, the data migration tool, is used in extracting data from a source
database, and migrating the data into an intermediate storage area. The migration tools
also transfer data from the staging area into the data warehouse. The data migration
tool is responsible for converting the data from one platform to another. A migration
tool will map the data from the source to the data warehouse. There can be a great
deal of overlap in these tools and many of the same features are found in tools of each
category.
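
The sketch below illustrates such a flow, extracting rows from a hypothetical source structure, mapping them to a warehouse schema in a staging step and loading the result; the source layout, the field mapping and the in-memory "warehouse" are stand-ins for illustration, not any particular tool's interface.

# Hedged sketch of a migration flow: extract from a source structure, map
# source fields to the warehouse schema in a staging step, then load.
source_system = [{"CUST_NO": "C1", "CUST_NM": "A. Sharma", "BAL": "250.00"}]
field_map = {"CUST_NO": "customer_id", "CUST_NM": "customer_name", "BAL": "balance"}
warehouse = []

def extract(source):
    return list(source)                       # pull rows from the source system

def transform(rows, mapping):
    staged = []
    for row in rows:
        out = {mapping[k]: v for k, v in row.items() if k in mapping}
        out["balance"] = float(out["balance"])  # platform/type conversion
        staged.append(out)
    return staged                             # intermediate staging area

def load(staged, target):
    target.extend(staged)                     # move staged rows into the warehouse

load(transform(extract(source_system), field_map), warehouse)
print(warehouse)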

Data transformation:

The set of operations (schema/data translation and integration, filtering and aggregation)
that source data must undergo to appropriately fit a target schema. Data transforma-
tions require metadata, such as data schemas, instance-level data characteristics, and
data mappings.
The set of commercial and research tools that can be classified as data transformation
tools is the following:

Commercial- Data Integrator, DataFusion, DataStage, dfPower, ETLQ, Hummingbird ETL, Firstlogic, Informatica ETL, SQL Server, Trillium

Research- Ajax, Arktos, Clio, FraQL, Potter's Wheel, TranScm

Chapter 7

Basic Tools of Data Quality

1. Fishbone Diagram

Fishbone diagrams are diagrams that show the causes of a certain event. Common
uses of the Fishbone diagram are product design and quality defect prevention, to
identify potential factors causing an overall effect. Each cause or reason for imper-
fection is a source of variation. Causes are usually grouped into major categories
to identify these sources of variation.

2. Flow Chart

A flowchart identifies the sequence of activities or the flow of materials and in-
formation in a process. There is no precise format, and the diagram can be drawn
simply with boxes, lines, and arrows. Flowcharts help the people involved in the
process understand it much better and more objectively by providing a picture of
the steps needed to accomplish a task.

3. Histogram and bar chart

Histograms provide clues about the characteristics of the parent population from which a sample is taken. Patterns that would be difficult to see in an ordinary table of numbers become apparent.
A bar chart is a series of bars representing frequencies, e.g. the number of yes/no responses.

• Displays large amounts of data that are difficult to interpret in tabular form.

• Shows centering, variation, and shape.

• Illustrates the underlying distribution of the data.

• Provides useful information for predicting future performance.

4. Scatter diagram

It is a plot of two variables showing whether they are related.

• Supplies the data to confirm a hypothesis that two variables are related.

• Provides both a visual and statistical means to test the strength of a rela-
tionship.

• Provides a good follow-up to cause and effect diagrams.

5. Run Chart

Run charts show the performance and the variation of a process or some quality or
productivity indicator over time in a graphical fashion that is easy to understand
and interpret. They also identify process changes and trends over time and show
the effects of corrective actions.

• Monitors performance of one or more processes over time to detect trends,
shifts, or cycles.

• Allows a team to compare performance before and after implementation of a solution to measure its impact.

• Focuses attention on truly vital changes in the process.

6. Control Chart

Control charts, also known as Shewhart charts or process-behaviour charts, are tools used in statistical process control to determine whether or not a manufacturing or business process is in a state of statistical control.

7. Process chart

It is an organized way of recording all the activities performed by a person, by a machine, at a workstation, with a customer, or on materials. Codes can be applied, for example for operations, transport, inspection, delay and storage, against numbered steps, time, distance and step descriptions.
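
As one worked example among these tools, the sketch below computes simple three-sigma control limits from an assumed stable baseline period and flags later points that fall outside them; the measurements and the choice of limits are illustrative assumptions, not prescriptions from this report.

# Hedged sketch of a basic control chart calculation: centre line and
# 3-sigma control limits from a baseline, flagging later points outside them.
from statistics import mean, pstdev

baseline = [12, 14, 11, 13, 12, 15, 13, 12, 14, 13]   # assumed stable period
new_points = [14, 12, 27, 13]                          # subsequent observations

centre = mean(baseline)
sigma = pstdev(baseline)
ucl, lcl = centre + 3 * sigma, centre - 3 * sigma

flags = [(i, v) for i, v in enumerate(new_points) if not lcl <= v <= ucl]
print(f"centre={centre:.1f}  UCL={ucl:.1f}  LCL={lcl:.1f}")
print("out-of-control points (index, value):", flags)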

Chapter 8

Monitor Data Quality

Monitoring data quality is an important sub-aspect of the data quality life cycle. It is based on the specified goals and rules, and therefore on the current quality level obtained after the initial analysis carried out on the basis of data profiling and the initial cleansing of your data. Monitoring is not an end in itself but serves more or less as a sensor for data quality weaknesses before they make themselves felt in the destination system. The monitoring function orients itself towards the defined data quality initiatives and general instructions, as well as any changes which may be required.

8.1 Monitoring System


We can develop a simple three-step monitoring system.
This is a model for monitoring the data quality process. Ultimately, data quality monitoring is based on a well-understood set of metrics which provide important knowledge about the value of the data in use. First of all, these metrics need to be put in the right order. Data quality must be tracked, managed, and monitored if it is to improve business efficiency and transparency. Therefore, being able to measure and monitor data quality throughout the lifecycle and compare the results over time is an essential ingredient in the proactive management of ongoing data quality improvement and data governance.
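
A minimal sketch of such metric-based monitoring appears below; the two metrics, their thresholds and the alerting behaviour are assumptions chosen only to show the shape of a monitoring step, not this report's prescribed metric set.

# Hedged sketch of metric-based data quality monitoring: compute a few
# metrics for the current load, compare them against thresholds, and
# report which ones have drifted out of tolerance.
def compute_metrics(records, required_fields):
    total = len(records)
    missing = sum(1 for r in records for f in required_fields if not r.get(f))
    duplicates = total - len({r.get("id") for r in records})
    return {
        "completeness": 1 - missing / (total * len(required_fields)),
        "duplicate_rate": duplicates / total,
    }

THRESHOLDS = {"completeness": ("min", 0.95), "duplicate_rate": ("max", 0.02)}

def monitor(metrics):
    alerts = []
    for name, value in metrics.items():
        kind, limit = THRESHOLDS[name]
        if (kind == "min" and value < limit) or (kind == "max" and value > limit):
            alerts.append(f"{name}={value:.2%} breached {kind} threshold {limit:.0%}")
    return alerts

records = [{"id": 1, "email": "a@x.com"}, {"id": 1, "email": ""}, {"id": 3, "email": "c@x.com"}]
print(monitor(compute_metrics(records, ["id", "email"])))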

Figure 8.1: Monitor System

Conclusion
This report on data quality management describes the basic tasks of management in the field of techniques, tools and improvement of data quality. Organizations seeking relief from data problems often turn to technology for help. This is not the most effective solution. Data quality is a behavioral problem, not a technology problem. To solve the data quality problem, organizations need to change user behavior. A comprehensive program based on prevention, detection, correction and accountability is required. Deploying a data quality management program is not an easy task, but the rewards are enormous. Deploying a disciplined approach to managing data as an important asset will better position an organization to improve the productivity of its information workers and to better serve its customers.

Strong frameworks and processes are imperative for controlling data quality and for managing data, the most important asset of an organization. Additional validation procedures such as exception analysis and data reconciliation ensure high success rates in migration-related initiatives. The challenges associated with data quality control initiatives can be effectively handled by implementing the recommended framework and process to control data quality. Maintaining data quality is no longer an option, particularly in today's competitive and regulatory climate. With this in place, the six-phase program can be effectively pursued for the management of data quality.

Bibliography

[1] Thomas Korner, “Handbook on Data Quality Assessment Methods and Tools”, 2005, 3rd Edition.

[2] Yang W. Lee, “Total Data Quality Management: The Case of IRI”, 2001.

[3] Suzanne M. Embury, “Data Quality Control”, 1999.

[4] Theodore Johnson, “Data Quality and Data Cleaning: An Overview”, 2006.

[5] http://www.sap.com/management/data_quality_management/index.epx

[6] http://en.wikipedia.org/wiki/data_quality.html

[7] http://www.tricare.mil/imp/GuidelinesOnDataQualityManagement.html

