
Informatica Data Quality Technical Design Document
CHRISTUS Health

Table of contents
DOCUMENT CONTROL
INTRODUCTION
OUT OF SCOPE
ENVIRONMENT DETAILS
HIGH LEVEL DATA FLOW DIAGRAM
DATA QUALITY MONITORING
PROFILING
SCORECARDS
REFERENCE MANAGEMENT
DATA QUALITY RULES
WEB SERVICE

Document Control
Version History
Version   Date         Author        Comments
1.0       05/15/2016   Bharat Sain   Initial Draft

Reviewer
The following are the reviewers for this document.

Name                 Review Date   Notes
May-Law, Michelle
Williams, Shirly

Introduction
This document supports the data quality requirement at CHRISTUS Health: to measure and
improve the quality of source data on an ongoing basis for all three source systems
(Meditech, Cerner, and Athena), and to ensure that data-dependent business processes and
applications deliver the expected results.
The Developer tool and the Analyst tool are used to design and run processes that achieve
the following objectives:

Profile data. Profiling reveals the content and structure of each source system.

Create scorecards to review data quality. A scorecard is a graphical representation of
the quality measurements in a profile.

Standardize data values wherever applicable. Standardization removes the errors and
inconsistencies found while running a profile.

Parse records. Parse data records to improve record structure and derive additional
information from your data. For example, parsing can split a single field of freeform
data, such as a phone number or SSN column value, into fields that contain different
information types.

Validate postal addresses. Address validation evaluates and enhances the accuracy
and deliverability of your postal address data. Address validation corrects errors in
addresses and completes partial addresses by comparing address records against
reference data from national postal carriers.

Create data quality rules. Informatica provides many pre-built rules that you can run
or edit to suit your project objectives. You can create rules in the Developer tool.

Collaborate with Informatica users. The rules and reference data tables you add to
the Model repository are available to users in the Developer tool and the Analyst tool.
Users can collaborate on projects, and different users can take ownership of objects
at different stages of a project.

Export objects into PowerCenter. You can export objects into PowerCenter to reuse
the metadata for physical data integration or to create web services.
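
As a rough illustration of the "Parse records" objective above, the Python sketch below
splits a freeform phone number into separate fields. It is not the Informatica Parser
transformation. The function name, the output field names, and the restriction to
10-digit US numbers are all assumptions made for the example.

```python
import re
from typing import Optional


def parse_us_phone(raw: str) -> Optional[dict]:
    """Split a freeform US phone number into area code, exchange, and line number.

    Assumption: valid numbers have 10 digits, optionally prefixed with country code 1.
    Returns None when the value cannot be parsed.
    """
    digits = re.sub(r"\D", "", raw)  # keep digits only
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the leading country code
    if len(digits) != 10:
        return None
    return {"area_code": digits[:3], "exchange": digits[3:6], "line": digits[6:]}
```

Values that fail to parse return None, so a caller can route them to whatever exception
handling the surrounding process uses.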

Out of Scope

No Exception transformation or mapping requirement using the IDQ tool:

The Exception transformation was created as part of the POC to write bad (exception)
records to a bad record table. It also routes good and rejected records to the
appropriate data object. The mapping runs as part of a workflow to populate exception
tasks in the Analyst tool through a Human task.
Exceptions are instead handled in the related ETL process, which uses DQ_SCORE_IND values
to determine the DQ score and write to the target CEDW object.
DQ Mapping

DQ Workflow

Log in to the Analyst tool to see the tasks waiting for you.

Corrected and Accepted data.

Table with updated value:

No Match requirement using the IDQ tool. Matching identifies duplicate records in your
data using a variety of matching techniques.

No Consolidate requirement using the IDQ tool. Consolidation automatically or manually
consolidates matched records to create a master record, using the Consolidation
transformation.

Environment Details
            DEV                  TEST                  PROD
MRS         IDQ_MRS_CHR_DEV961   IDQ_MRS_CHR_TEST961   IDQ_MRS_CHR_PROD961
Domain      DOM_CHR_DEV961       DOM_CHR_TEST961       DOM_CHR_PROD961
Host Name   ICASM544             icasm545              icasm543
Port        6005                 6005                  6005

High Level Data Flow Diagram

Data Quality Monitoring

Six Dimensions of Data Quality

Profiling
A profile is a repository object that finds and analyzes all data irregularities across data
sources in the enterprise and hidden data problems that put data projects at risk. Running a
profile on any data source in the enterprise gives you a good understanding of the strengths
and weaknesses of its data and metadata.
Create and run a profile to find the content, quality, and structure of data sources of an
application, schema, or enterprise. The data source content includes value frequencies and
datatypes. The data source structure includes keys and functional dependencies.
You can use the Analyst tool and Developer tool to analyze the source data and metadata.
Analysts and developers can use these tools to collaborate, identify data quality issues, and
analyze data relationships. Based on your job role, you can use the capabilities of either the
Analyst tool or Developer tool. The degree of profiling that you can perform differs based on
which tool you use.
You can perform the following tasks in the Developer tool and Analyst tool:
Perform column profiling. The process includes discovering the number of unique
values, null values, and data patterns in a column.
Add rules to column profiles.
Curate the inferred datatypes in the profile results.
Use scorecards to monitor data quality.
Generate a mapping from a profile.
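
To make the column profiling task above concrete, here is a minimal Python sketch that
computes the same kinds of measures (unique values, null values, and data patterns) for
one column. It only illustrates the idea and is not how the Informatica profiling engine
works. The function names and the choice to treat empty strings as nulls are assumptions.

```python
import re
from collections import Counter


def pattern_of(value: str) -> str:
    """Reduce a value to a pattern: digits become 9, ASCII letters become X."""
    return re.sub(r"[A-Za-z]", "X", re.sub(r"\d", "9", value))


def profile_column(values):
    """Return row, null, and unique counts plus pattern frequencies for one column."""
    # Assumption: None and the empty string both count as null.
    non_null = [v for v in values if v is not None and v != ""]
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "unique": len(set(non_null)),
        "patterns": Counter(pattern_of(v) for v in non_null),
    }
```

For example, profile_column(["M", "F", "T", None, "M", ""]) reports 6 rows, 2 nulls, and
3 unique values, all sharing the single pattern X.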

MEDITECH

CERNER

ATHENA

Scorecards
A scorecard is the graphical representation of the valid values for a column or output of a
rule in profile results. Use scorecards to measure data quality progress. You can create a
scorecard from a profile and monitor the progress of data quality over time.
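
The idea behind a scorecard score can be sketched as the share of rows that hold a valid
value. The Python below is an illustration only. The function name is hypothetical, and
the choice to count every row (including nulls) in the denominator is an assumption, not
a statement of how Informatica calculates scores.

```python
def score_column(values, valid_values):
    """Return the percentage of rows whose value is in valid_values."""
    if not values:
        return 0.0  # assumption: an empty column scores zero
    valid = sum(1 for v in values if v in valid_values)
    return 100.0 * valid / len(values)
```

Running the same calculation after each load and keeping the results is one simple way
to track the score over time.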

Reference Management
Create a reference table in the Design workspace of the Informatica Data Analyst (IDA)
tool. In version 10, the reference table is created as an unmanaged table with the
editable option.
Note: The managed reference table option did not work at CHRISTUS Health. With managed
tables, IDA reference management generates its own reference table names in the event of
any deletion, which was an issue for any ETL process that uses the object as a source for
extracting reference data.
The database tables are created first, and the reference tables are then created from
those database tables by creating a metadata object in the Model repository. The DDLs
are listed below.
DDL scripts:

PROD   PROD_ddl_referenceTables.sql
TEST   TEST_ddl_referenceTables.sql
DEV    DEV_ddl_referenceTables.sql

Once a reference table is created, you can edit its data and view an audit trail of the
changes that users made to it. Use the Audit Trail view on the reference table to view
the audit trail events.
Below is the list of all reference tables.

Data Quality Rules


A rule is business logic that defines conditions applied to data when you run a profile. Use
rules to further validate the data in a profile and to measure data quality progress.
You can add a rule when you create a profile. You can reuse rules created in either the
Analyst tool or the Developer tool in both tools. Add rules to a profile by selecting a
reusable rule or by creating an expression rule. An expression rule uses both expression
functions and
columns to define rule logic. After you create an expression rule, you can make the rule
reusable.
Create expression rules in the Analyst tool. In the Developer tool, you can create a mapplet
and validate the mapplet as a rule. You can run rules from both the Analyst tool and
Developer tool.
The DQ rules listed below were created for use within profiles and the ETL process. The
master list is also provided as a spreadsheet.

Master_List_of_DQ_Rules.xlsx

DQ_SCORE_IND

DQ Score   Description
0          Good records.
-26        Source has a value but the target does not have a corresponding DWID in the
           lookup table (e.g., Gender is T but the lookup table only has M and F).
-25        Source is missing a value (e.g., Gender is missing altogether).
-99        Source is missing a value, and the record is rejected and sent to the
           exception route (e.g., both ARGO_EID and URN are null in the MT message).
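
The scoring logic described above could be sketched in Python as follows. This is not
the actual ETL implementation. It assumes the four scores pair with the four
descriptions in the order listed (0, -26, -25, -99). The function name, the constant
names, and the reject_if_missing flag are hypothetical.

```python
from typing import Optional, Set

# Assumed pairing of DQ_SCORE_IND values with their descriptions.
GOOD = 0               # good record
NO_LOOKUP_MATCH = -26  # source has a value, but the lookup table has no matching DWID
MISSING_VALUE = -25    # source is missing a value
REJECTED = -99         # source is missing a mandatory value; record goes to exceptions


def dq_score_ind(source_value: Optional[str], lookup_keys: Set[str],
                 reject_if_missing: bool = False) -> int:
    """Return the DQ_SCORE_IND for one source value checked against a lookup table."""
    if source_value is None or source_value.strip() == "":
        # reject_if_missing models mandatory fields such as ARGO_EID and URN.
        return REJECTED if reject_if_missing else MISSING_VALUE
    if source_value not in lookup_keys:
        return NO_LOOKUP_MATCH
    return GOOD
```

With a lookup table holding M and F, a Gender of T scores -26 and a missing Gender
scores -25.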

Web Service
Informatica Data Services provides data integration functionality through a web service. You
can create a web service in the Developer tool. A web service client can connect to a web
service to access, transform, or deliver data. An external application or a Web Service
Consumer transformation can connect to a web service as a web service client.
A web service integrates applications using open standards, such as SOAP, WSDL, and XML.
SOAP is the communications protocol for web services. The web service client request and
the web service response are SOAP messages. A WSDL is an XML schema that describes the
protocols, formats, and signatures of the web service operations.
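
To show what a SOAP request message looks like, the Python sketch below builds a generic
SOAP 1.1 envelope with the standard library. The operation name, target namespace, and
field names passed in are placeholders. The real ones are defined by each service's WSDL
and are not given in this document.

```python
import xml.etree.ElementTree as ET

SOAP_ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace


def build_soap_request(operation: str, target_ns: str, fields: dict) -> bytes:
    """Build a SOAP 1.1 envelope whose Body holds one operation element.

    operation, target_ns, and fields are hypothetical inputs; take the real
    values from the WSDL of the web service being called.
    """
    ET.register_namespace("soapenv", SOAP_ENV_NS)
    envelope = ET.Element(f"{{{SOAP_ENV_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV_NS}}}Body")
    op = ET.SubElement(body, f"{{{target_ns}}}{operation}")
    for name, value in fields.items():
        # Each input field becomes a child element of the operation element.
        ET.SubElement(op, f"{{{target_ns}}}{name}").text = value
    return ET.tostring(envelope, encoding="utf-8")
```

A client would send the resulting bytes in an HTTP POST to the service endpoint and
parse the SOAP response in the same way.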

The WSDL links below are for PROD. For other environments, replace icasm543 with the
host name of the target environment once the web service application is deployed there.
WSDLs

Address Doctor              http://icasm543:8095/DataIntegrationService/WebService/dqws_AddressDoctor/
AllergySeverity             http://icasm543:8095/DataIntegrationService/WebService/dqws_AllergySeverity/
AllergyType                 http://icasm543:8095/DataIntegrationService/WebService/dqws_AllergyType/
ARGO EID & URN Null Check   http://icasm543:8095/DataIntegrationService/WebService/dqws_ArgoEID_URN/
Date                        http://icasm543:8095/DataIntegrationService/WebService/dqws_Date/
Email                       http://icasm543:8095/DataIntegrationService/WebService/dqws_Email/
Ethnicity                   http://icasm543:8095/DataIntegrationService/WebService/dqws_Ethnicity/
Ethnicity Sub Group         http://icasm543:8095/DataIntegrationService/WebService/dqws_EthnicitySubGroup/
Gender                      http://icasm543:8095/DataIntegrationService/WebService/dqws_Gender/
Language                    http://icasm543:8095/DataIntegrationService/WebService/dqws_Language/
Marital Status              http://icasm543:8095/DataIntegrationService/WebService/dqws_MaritalStatus/
Name                        http://icasm543:8095/DataIntegrationService/WebService/dqws_Name/
Phone                       http://icasm543:8095/DataIntegrationService/WebService/dqws_Phone/
Prefix                      http://icasm543:8095/DataIntegrationService/WebService/dqws_Prefix/
Race                        http://icasm543:8095/DataIntegrationService/WebService/dqws_Race/
Religion                    http://icasm543:8095/DataIntegrationService/WebService/dqws_Religion/
SSN                         http://icasm543:8095/DataIntegrationService/WebService/dqws_SSN/
String with Number check    http://icasm543:8095/DataIntegrationService/WebService/dqws_StringNumberCheck/
Suffix                      http://icasm543:8095/DataIntegrationService/WebService/dqws_Suffix/
Title                       http://icasm543:8095/DataIntegrationService/WebService/dqws_Title/
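
The host substitution described for the WSDL links can be sketched as a small Python
helper. It assumes the web services in DEV and TEST run on the hosts listed under
Environment Details and on the same port (8095) as in PROD. Neither point is confirmed
in this document, and the function and constant names are hypothetical.

```python
# Hosts taken from the Environment Details section (lowercased; host names are
# case-insensitive). Assumption: the web services run on these same hosts.
HOSTS = {"DEV": "icasm544", "TEST": "icasm545", "PROD": "icasm543"}
WS_PORT = 8095  # port used in the PROD links; assumed identical in DEV and TEST


def wsdl_url(env: str, service: str) -> str:
    """Return the web service link for a service name such as 'dqws_Gender'."""
    host = HOSTS[env.upper()]  # raises KeyError for an unknown environment
    return f"http://{host}:{WS_PORT}/DataIntegrationService/WebService/{service}/"
```

For example, wsdl_url("PROD", "dqws_Gender") reproduces the Gender link listed above.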
