Graeme Malcolm | Data Technology Specialist, Content Master
Pete Harris | Learning Product Planner, Microsoft

Data Quality Services
Data Cleansing
Data Matching
Module Overview

The Data Quality Problem
- Business decisions rely on trusted data
- Data quality issues in data sources can lead to inaccurate reporting and analysis:
  - Invalid data values (e.g. Californa)
  - Inconsistencies (e.g. California and CA)
  - Duplicate business entities (e.g. Jim Corbin, James Corbin, J Corbin)
- SQL Server Data Quality Services (DQS) is a knowledge-based solution for:
  - Data Cleansing
  - Data Matching
Knowledge Bases and Domains
- Knowledge base: a repository of knowledge about your data, including:
  - Domain validation rules
  - Matching policies
- Domains:
  - Are specific to a data field (or a composite field)
  - Contain member values and validation rules; values are classified as:
    - Valid (e.g. California and CA for a State domain)
    - Invalid (e.g. 90261 for a State domain)
    - Error (e.g. Californa for a State domain)
  - Define rules that correct values to their leading values
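To make the three value types and the idea of correcting to a leading value concrete, here is a minimal Python sketch of a single domain. It is a hypothetical in-memory model that assumes exact string lookups; the `Domain` class, its `valid`/`errors`/`invalid` fields, and the status strings are illustrative names, not part of the DQS object model, and real DQS domains offer more than this sketch captures. Treating CA as a valid value whose leading value is California is also an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple


@dataclass
class Domain:
    """Simplified, hypothetical model of a DQS domain (not the DQS API)."""
    name: str
    # Valid values; each maps to its leading value (a leading value maps to itself).
    valid: Dict[str, str] = field(default_factory=dict)
    # Known errors, each mapped to the value it should be corrected to.
    errors: Dict[str, str] = field(default_factory=dict)
    # Values known not to belong to this domain at all.
    invalid: Set[str] = field(default_factory=set)

    def cleanse(self, value: str) -> Tuple[str, Optional[str]]:
        """Return (status, output value) for a single input value."""
        if value in self.valid:
            # Valid values are output as their leading value.
            return "valid", self.valid[value]
        if value in self.errors:
            target = self.errors[value]
            # Follow the correction through to a leading value if one is defined.
            return "corrected", self.valid.get(target, target)
        if value in self.invalid:
            return "invalid", None
        return "unknown", value  # not yet in the knowledge base


# Hypothetical State domain built from the examples on the slide.
state = Domain(
    name="State",
    valid={"California": "California", "CA": "California"},
    errors={"Californa": "California"},
    invalid={"90261"},
)
for v in ["California", "CA", "Californa", "90261", "Nevada"]:
    print(v, state.cleanse(v))
```

Both valid spellings come out as the leading value California, the known misspelling is corrected to it, and 90261 is rejected; a value the domain has never seen (Nevada here) is passed through unchanged so that it can be reviewed.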
Demo: Creating a Knowledge Base
In this demonstration, you will see how to:
- Create a Knowledge Base
- Perform Knowledge Discovery
- Perform Domain Management
Data Cleansing Projects
1. Select a knowledge base
2. Map data columns to domains
3. Review suggestions and corrections
4. Export results
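The four project steps can be sketched in Python as a small pipeline. This assumes a toy knowledge base held as a dictionary (`KNOWLEDGE_BASE`, mapping each known value, including a known misspelling, to its leading value) and a hypothetical column mapping. `cleanse_rows`, `export_csv`, and the `correct`/`corrected`/`new` statuses are illustrative stand-ins for what the DQS client does interactively; they are not DQS APIs, and the sample rows are made up.

```python
import csv
import io

# Toy knowledge base (hypothetical): domain name -> {known value: leading value}.
# The known misspelling is listed too, so it is corrected to the leading value.
KNOWLEDGE_BASE = {
    "State": {
        "California": "California",
        "CA": "California",
        "Californa": "California",
    },
}

# Step 2: map source columns to domains (column names are hypothetical).
COLUMN_MAP = {"StateName": "State"}


def cleanse_rows(rows, kb, column_map):
    """Cleanse mapped columns; return (cleansed rows, items to review)."""
    cleansed, review = [], []
    for index, row in enumerate(rows):
        new_row = dict(row)  # copy so the source rows are left untouched
        for column, domain_name in column_map.items():
            value = row[column]
            domain = kb[domain_name]
            if value not in domain:
                status = "new"  # unknown value: left unchanged for a reviewer
            elif domain[value] == value:
                status = "correct"
            else:
                status = "corrected"
                new_row[column] = domain[value]
            # Step 3: queue anything that is not already correct for review.
            if status != "correct":
                review.append((index, column, value, new_row[column], status))
        cleansed.append(new_row)
    return cleansed, review


def export_csv(rows):
    """Step 4: export the cleansed rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Step 1 corresponds to choosing KNOWLEDGE_BASE; the sample rows are made up.
source = [
    {"CustomerID": 1, "StateName": "CA"},
    {"CustomerID": 2, "StateName": "Californa"},
    {"CustomerID": 3, "StateName": "California"},
    {"CustomerID": 4, "StateName": "Nevada"},
]
cleansed, to_review = cleanse_rows(source, KNOWLEDGE_BASE, COLUMN_MAP)
for item in to_review:
    print(item)
print(export_csv(cleansed))
```

Keeping the review list separate from the cleansed output mirrors the idea that a person approves or rejects changes before anything is exported.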
Demo: Cleansing Data
In this demonstration, you will see how to:
- Create a Data Cleansing Project
- View Cleansed Data
Cleansing Data in SSIS
- SSIS includes a DQS Cleansing Transformation
Data Matching
- Define matching rules for business entities in a matching policy
- Rules match entities based on domains:
  - Similarity: similar or exact match
  - Weight: percentage to apply if the domain match succeeds
  - Prerequisite: the domain must match for the rule to succeed
- If the combined weight of all matching domains meets or exceeds the rule's minimum matching score, the entities are duplicates
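A rough Python sketch of how such a rule could be scored, following the slide's description: each domain either matches or does not, a matching domain adds its weight, a failed prerequisite fails the whole rule, and the total is compared with the minimum matching score. The test for "similar" domains uses the standard library's `difflib.SequenceMatcher` with a hypothetical 0.8 threshold; that is a swapped-in measure, not the algorithm DQS uses internally. `RuleDomain`, `match_score`, the rule, the weights, and the sample records are all invented for illustration.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.8  # hypothetical cut-off for a "similar" domain to count as matched


@dataclass
class RuleDomain:
    domain: str               # record field compared by this part of the rule
    similarity: str           # "exact" or "similar"
    weight: int               # percentage added to the score when this domain matches
    prerequisite: bool = False


def domain_matches(a, b, similarity):
    """Exact equality, or a case-insensitive similarity ratio above the threshold."""
    if similarity == "exact":
        return a == b
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= SIMILARITY_THRESHOLD


def match_score(r1, r2, rule):
    """Sum the weights of matching domains; a failed prerequisite scores 0."""
    score = 0
    for d in rule:
        matched = domain_matches(r1[d.domain], r2[d.domain], d.similarity)
        if d.prerequisite and not matched:
            return 0  # the rule cannot succeed without its prerequisite domain
        if matched:
            score += d.weight
    return score


def is_duplicate(r1, r2, rule, min_score):
    return match_score(r1, r2, rule) >= min_score


# Hypothetical matching rule and records.
RULE = [
    RuleDomain("Name", "similar", 50),
    RuleDomain("City", "exact", 20),
    RuleDomain("PostalCode", "exact", 30, prerequisite=True),
]
MIN_MATCHING_SCORE = 80

a = {"Name": "Jim Corbin", "City": "Seattle", "PostalCode": "98101"}
b = {"Name": "James Corbin", "City": "Seattle", "PostalCode": "98101"}
c = {"Name": "J Corbin", "City": "Seattle", "PostalCode": "98052"}
print(match_score(a, b, RULE), is_duplicate(a, b, RULE, MIN_MATCHING_SCORE))
print(match_score(a, c, RULE), is_duplicate(a, c, RULE, MIN_MATCHING_SCORE))
```

With this rule, the first pair matches on all three domains, scores 100, and is treated as a duplicate; the second pair fails the postal-code prerequisite and scores 0, even though its names and cities match.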
Demo: Matching Data
In this demonstration, you will see how to:
- Create a Matching Policy
- Create a Data Matching Project
- View Data Matching Results
Module Summary
- Create a Knowledge Base for your data
- Use data cleansing to ensure consistent, correct domain values
- Use data matching to identify duplicate data entities

© 2013 Microsoft Corporation. All rights reserved. Microsoft, Windows, Office, Azure, System Center, Dynamics and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.