
03 | Managing Data Quality

Graeme Malcolm | Data Technology Specialist, Content Master


Pete Harris | Learning Product Planner, Microsoft

Module Overview
Data Quality Services
Data Cleansing
Data Matching

The Data Quality Problem
Business decisions rely on trusted data
Data quality issues in data sources can lead to inaccurate
reporting and analysis
Invalid data values (e.g. Californa)
Inconsistencies (e.g. California and CA)
Duplicate business entities (e.g. Jim Corbin, James Corbin, J Corbin)
SQL Server DQS is a knowledge-based solution for:
Data Cleansing
Data Matching

Knowledge Bases and Domains
Knowledge base:
Repository of knowledge about data
Domain validation rules
Matching policies
Domains:
Specific to a data field (or a composite field)
Contain values and validation rules for members
Valid (e.g. California and CA for a State domain)
Invalid (e.g. 90261 for a State domain)
Error (e.g. Californa for a State domain)
Define rules to correct values to leading values
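
The domain concepts above can be sketched in a few lines of Python. This is only an illustration and not the DQS API: the Domain class, its field names, and the classify method are hypothetical, and treating CA as a synonym whose leading value is California is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple


@dataclass
class Domain:
    """Toy stand-in for a knowledge base domain (hypothetical, not the DQS API)."""
    name: str
    # Every known valid value (leading values and synonyms) mapped to its leading value.
    valid: Dict[str, str] = field(default_factory=dict)
    # Known error values mapped to the leading value that corrects them.
    errors: Dict[str, str] = field(default_factory=dict)
    # Values known not to belong to this domain at all.
    invalid: Set[str] = field(default_factory=set)

    def classify(self, value: str) -> Tuple[str, Optional[str]]:
        """Return (category, leading value or None) for a source value."""
        if value in self.valid:
            return "valid", self.valid[value]
        if value in self.errors:
            return "error", self.errors[value]
        if value in self.invalid:
            return "invalid", None
        return "unknown", None


# The State domain, populated with the values used on the slide.
state = Domain(
    name="State",
    valid={"California": "California", "CA": "California"},
    errors={"Californa": "California"},
    invalid={"90261"},
)

for raw in ["CA", "Californa", "90261"]:
    print(raw, state.classify(raw))
```

Keeping error values separate from invalid values mirrors the distinction on the slide: an error has a known correction to a leading value, while an invalid value does not belong to the domain and has nothing to be corrected to.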

Demo: Creating a Knowledge Base
In this demonstration, you will see how to:
Create a Knowledge Base
Perform Knowledge Discovery
Perform Domain Management

Data Cleansing Projects
1. Select a knowledge base
2. Map data columns to domains
3. Review suggestions and corrections
4. Export results
4. Export results
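
The four steps above can be walked through with a small Python sketch. It does not call DQS; the knowledge_base dictionary, the column name state_or_province, the status labels, and the customer rows (apart from Jim Corbin) are all made up for illustration.

```python
import csv
import io

# Step 1: select a knowledge base (hypothetical: domain -> {known value: leading value}).
knowledge_base = {
    "State": {"California": "California", "CA": "California", "Californa": "California"},
}

# Step 2: map source data columns to domains.
column_to_domain = {"state_or_province": "State"}


def cleanse(rows):
    """Add an output value and a status for every mapped column in every row."""
    results = []
    for row in rows:
        out = dict(row)
        for column, domain in column_to_domain.items():
            source = row[column]
            leading = knowledge_base[domain].get(source)
            if leading is None:
                status, output = "New", source        # unknown value, needs review
            elif leading == source:
                status, output = "Correct", source    # already the leading value
            else:
                status, output = "Corrected", leading  # replaced by the leading value
            out[column + "_output"] = output
            out[column + "_status"] = status
        results.append(out)
    return results


def export_csv(results):
    """Step 4: export the cleansed rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(results[0].keys()))
    writer.writeheader()
    writer.writerows(results)
    return buffer.getvalue()


source_rows = [
    {"customer": "Jim Corbin", "state_or_province": "CA"},
    {"customer": "Ann Lee", "state_or_province": "California"},
    {"customer": "Bo Chan", "state_or_province": "Oregn"},
]
cleansed = cleanse(source_rows)

# Step 3: review everything that was changed or not recognised.
to_review = [r for r in cleansed if r["state_or_province_status"] != "Correct"]
print(len(to_review), "rows to review")
print(export_csv(cleansed))
```

Writing the output value and status next to the untouched source column keeps the original data available for the review step.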

Demo: Cleansing Data
In this demonstration, you will see how to:
Create a Data Cleansing Project
View Cleansed Data

Cleansing Data in SSIS
SSIS includes a DQS Cleansing Transformation

Data Matching
Define matching rules for business entities in a matching policy
Rules match entities based on domains:
Similarity: Similar or exact match
Weight: Percentage to apply if match succeeds
Prerequisite: Mandatory domain match for rule to succeed
If the combined weight of all matches meets or exceeds the rule's minimum matching score, the entities are duplicates
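
The rule described above can be sketched as follows. This is a simplified reading of the slide and not the DQS matching algorithm: the DomainRule class, the 0.7 similarity threshold, the use of difflib for the "similar" test, the weights, the minimum score of 80, and the city and state values are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class DomainRule:
    """One domain's part in a matching rule (hypothetical structure)."""
    domain: str
    exact: bool = False         # Similarity: exact if True, otherwise similar
    weight: float = 0.0         # Percentage added to the score if the domain matches
    prerequisite: bool = False  # Must match exactly or the whole rule fails


def domain_matches(rule, a, b, threshold=0.7):
    """Compare one domain of two records, exactly or by string similarity."""
    x, y = a[rule.domain].lower(), b[rule.domain].lower()
    if rule.exact or rule.prerequisite:
        return x == y
    return SequenceMatcher(None, x, y).ratio() >= threshold


def is_duplicate(rules, a, b, min_score):
    """Sum the weights of matching domains and compare with the minimum score."""
    score = 0.0
    for rule in rules:
        matched = domain_matches(rule, a, b)
        if rule.prerequisite and not matched:
            return False
        if matched:
            score += rule.weight
    return score >= min_score


rules = [
    DomainRule("state", prerequisite=True),
    DomainRule("name", weight=70),
    DomainRule("city", exact=True, weight=30),
]

a = {"name": "Jim Corbin", "city": "Los Angeles", "state": "California"}
b = {"name": "J Corbin", "city": "Los Angeles", "state": "California"}
c = {"name": "James Corbin", "city": "Los Angeles", "state": "Nevada"}
d = {"name": "Jim Corbin", "city": "San Diego", "state": "California"}

print(is_duplicate(rules, a, b, min_score=80))
print(is_duplicate(rules, a, c, min_score=80))
print(is_duplicate(rules, a, d, min_score=80))
```

In this sketch, records a and b share the prerequisite state and match on both name and city, so their combined weight reaches the minimum score. Record c fails the prerequisite, and record d matches only on name, which leaves its combined weight below the minimum.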

Demo: Matching Data
In this demonstration, you will see how to:
Create a Matching Policy
Create a Data Matching Project
View Data Matching Results

Module Summary
Create a Knowledge Base for your data
Use data cleansing to ensure consistent, correct domain values
Use data matching to identify duplicate data entities
© 2013 Microsoft Corporation. All rights reserved. Microsoft, Windows, Office, Azure, System Center, Dynamics and other product names are or may be registered trademarks and/or trademarks in the
U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft
must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after
the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
