
Data Integration Strategy

Mark Mitchell
Senior Product Specialist - EMEA

The development, release and timing of any Informatica product described herein remains at the sole
discretion of Informatica. This information should not be relied upon in making a purchasing decision.

Data Integration Product Strategy

• Project Specific Solutions
  • Targeted at Broader DI Use Cases
  • Includes Some Industry Verticals
• Role Specific Tools
  • Encompassing Full DI Lifecycle
  • Integrated Workflow
• Common Services & Frameworks
  • Single Integration Engine for ETL, EII
  • Comprehensive Orchestration Support
  • Ajax and Eclipse UIs
  • Model-based Repository

Informatica confidential. For discussion only. Do not distribute. 2


Data Integration Product Strategy
Project Specific Solutions

Business Imperatives
• Improve Decisions & Regulatory Compliance
• Modernize Business & Reduce IT Costs
• Merge & Acquire
• Increase Business Profitability
• Outsource Non-core Functions

IT Initiatives
• Business Intelligence
• Legacy Retirement
• Application Consolidation
• Customer, Supplier, Product Hubs
• BPO, SaaS

Data Integration Projects
• Data Warehouse
• Data Migration
• Data Consolidation
• Master Data Management
• Data Synchronization
• Data Quality



Data Integration Product Strategy
Role Specific Tools

Need | Description | User | % Lifecycle | Product
Discover & Define | Define business terms, domain values and high-level semantics. Profile source data systems. Create integration specs. Define business rules. | Business Analyst | High | Analyst Focused Product
Architect & Design | Build (or import) model of key business entities. Document relationships between data objects and source data. Model service interfaces. | Data Specialist/Architect | Medium | Architect Focused Product
Develop & Process | Build ETL mappings/workflows + views for end pointing | Developer/Programmer | Low | Informatica 8.x - 7.x
Admin & Operate | Deploy and manage changes in large-scale environments. High availability, grid, pushdown. Unified administration, security. | Administrator | Low | Informatica 8.x



Data Integration Product Strategy
Common Services & Frameworks = Major New Capabilities

• Project Specific Solutions (Ajax): Replication/Bulk Sync, Migration, Cross-Enterprise, Other INFA BU Products
• Role Specific Tools (Ajax, Eclipse): Analyst, Architect, Developer, Administrator
• Shared Services (Informatica Services Framework): Transformation, Cleansing, Profiling, Catalog, Rules Mgmt., Orchestration, WSH (Web Services Hub), Reporting
• Single Processing Engine: Common Integration Engine (Batch, Real-time, On Demand, Caching)
• Foundation Services (Repository, Grid, HA, Security, Admin., Logging, Licensing, etc.)
• Connectivity Services



Major Release Timeline

• Q3 2007: PC/PWX 8.5, Mission-Critical Deployments (Admin, Operator); Migration Solution, Data Masking Solution, Cross-Enterprise Solution
• 2H 2007: Replication/Bulk Sync Solution
• 2H 2007: “XE” Platform, Governance, Orchestration (Analyst)
• 1H 2008: Modeling, Federation (Architect)
• 2H 2008: Da Vinci Release
• 1H 2009: Galileo Release



Informatica 8.5 Release
Release Summary

CHARTER
• Delivering enterprise-grade data integration
• Supporting Integration Competency Centers (ICCs)
• Focus: Administrators and Operators
• Delivery timeframe: GA released 12 Oct 2007

KEY CAPABILITIES

• Simplified web services deployment
• Dynamic workflow concurrency
• Improved real-time and synchronization support
• Unified administration, enterprise-grade security
• More productive metadata discovery and analysis
• 200+ incremental enhancements to existing capabilities



Integrated Platform for Discovery & Definition

• Glossary Management
• Profile, Cleanse & Integrate
• Specify Rules
• Collaborate
• Active Scorecard

Users: Analyst, Developer



PowerCenter Data Masking

Data Masking Feature Summary

• Protects sensitive information by transforming it into de-identified, realistic-looking data while retaining original data properties
• Data remains relevant and meaningful
  • Preserves the shape and form of individual fields
  • Preserves intra-record relationships
  • Preserves join / foreign key relationships

Original | Masked
John Smith | Glen Carter
654-65-8945 | 654-45-2643
4739-1146-8075-5716 | 4739-1102-3517-8842
100 Cardinal way, Redwood City | 342 54th Street, New York



Business Drivers & Requirements

• Manage Risk
  • Minimize the risk of a data security breach
• Regulatory Environment
  • Improve compliance with data privacy laws & regulations
• Globalization
  • Reduce costs through outsourcing & offshoring

Per-Incident Cost of a Data Breach¹
• $197 per record ($239 in Fin Serv)
• $6.3 million average
• $225k to $35m range

Organizations Admitting Non-Compliance²
• Sarbanes-Oxley: 28%
• Gramm-Leach-Bliley Act: 14%
• California database breach notification act: 15%
• HIPAA: 40%
• EU Data Privacy Directive: 45%

¹2007 Annual Study: Cost of a Data Breach, Ponemon Institute
²Global State of Information Security 2006, CIO & PricewaterhouseCoopers



The challenge in data privacy is sharing data while protecting personal information



Protecting Sensitive Data
Restrict Access, Mask Private Data

• Development and Testing
• Training
• Support
• Data Analysis
• Outsourcing and Offshoring

Original | Masked
John Smith | Glen Carter
654-65-8945 | 654-45-2643
4739-1146-8075-5716 | 4739-1102-3517-8842
100 Cardinal way, Redwood City | 342 54th Street, New York



Business Use Cases

A financial services organization needs to set up an offshore development center to lower IT costs

Challenges
• The offshore environment needs production-like data for reliable development and testing of applications
• 200+ applications contain sensitive data in a variety of databases (e.g. Oracle, DB2) and files (e.g. VSAM), with many interdependent tables
• Access to all sensitive data must be restricted to users who have a “need to know”
• Sensitive fields include Name, Address, SSN, Credit Card Number, Account Number, etc.

Solution
The PowerCenter Data Masking Option can preserve referential integrity and intelligently mask SSNs, credit card numbers, etc., while providing realistic data to development and test environments



Business Use Cases

A health care provider needs to outsource the analysis of health-related data to a third-party marketing research firm

Challenges
• All sensitive health-related information must be masked to comply with privacy laws such as HIPAA
• Sensitive fields include Name, Address, Age, Date of Birth, etc.
• Masked data must remain as close as possible to the original data so that the analysis stays valid
  • For example, the date of birth must be masked but still yield the same age

Solution
The PowerCenter Data Masking Option provides features such as blurring, mask formats, and name and address substitution to de-identify sensitive data while maintaining the original data characteristics



The Informatica Product Platform
Automating Entire Data Integration Lifecycle

Audit, Monitor, Report: Ensure data consistency, perform impact analysis and continuously monitor quality; monitoring & reporting of adherence to security policies

• Access (PowerExchange): Unstructured or structured, in batch or real-time, industry formats. Source live data from any system in batch or real-time
• Discover (Data Explorer): Search and profile any data from any source. Discover and profile sensitive data from any system
• Cleanse (Data Quality): Validate, correct and standardize all data types
• Integrate (PowerCenter + Data Masking option): Transform and reconcile all data types. Define data masking rules and apply transformations
• Deliver: Exchange data at the right time, in the right format, across any platform. Deliver de-identified data to other environments

Develop & Manage: Develop and collaborate with common repository and shared metadata



Masking Production Data for Test Environment

Production Environment → PowerCenter + Data Masking Option → Test Environment

Both environments include:
• Mainframe and Mid-Range
• Packaged Applications
• Relational and Flat Files
• Standards and Messaging
• Remote Data

Data Masking Option is licensed per PowerCenter repository



Use data masking transformations in PowerCenter mappings



PowerCenter Data Masking Option
Key Features

• Multiple techniques and algorithms
  • Random Masking
  • Blurring
  • Key Masking for preserving referential integrity
  • Substitution
• Specialized, built-in rules and content
  • Name and Address content
  • Special fields like SSN, Credit Card, Phone Number, etc.
  • Pre-packaged sample mappings
• Component of data integration platform
  • Universal data access
  • Rich transformation capabilities
  • Auditing and Reporting



Random Masking

• Replace a sensitive field with a randomly generated value, subject to various constraints
  • Range – Minimum and Maximum boundaries
  • Blurring – Fixed or Percent variance to the original value
  • Mask Format – Format specification for retaining the data structure

Character | Description
A | Alphabetical characters
D | Digits
N | Alphanumeric characters
X | Any character
+ | No character masking
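A minimal Python sketch of how the mask format characters could be applied position by position. This is not Informatica's implementation; the function name `mask_format`, the use of uppercase letters as replacement characters, and the rule that the value and format must have the same length are all assumptions made for illustration.

```python
import random
import string

# Replacement pools for each mask format character (assumption: uppercase
# letters only, as when result string replacement characters are restricted).
ALPHA = string.ascii_uppercase
DIGITS = string.digits
POOLS = {
    "A": ALPHA,                               # alphabetical characters
    "D": DIGITS,                              # digits
    "N": ALPHA + DIGITS,                      # alphanumeric characters
    "X": ALPHA + DIGITS + string.punctuation,  # any character
}


def mask_format(value: str, fmt: str, rng: random.Random) -> str:
    """Mask value position by position according to fmt.

    A, D, N and X replace the character with a random one from the matching
    pool; '+' keeps the original character (no character masking).
    """
    if len(value) != len(fmt):
        raise ValueError("value and mask format must have the same length")
    masked = []
    for ch, code in zip(value, fmt):
        if code == "+":
            masked.append(ch)
        elif code in POOLS:
            masked.append(rng.choice(POOLS[code]))
        else:
            raise ValueError(f"unknown mask format character: {code!r}")
    return "".join(masked)


rng = random.Random(42)
print(mask_format("AS-09615", "AA+DDDDD", rng))
```

With the format `AA+DDDDD`, an ACCTID such as `AS-09615` keeps its hyphen and receives two random letters and five random digits, which is the structure described for ACCTID in the example that follows.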



Random Masking - Example

Production Database

Customer
CUSTID | FULLNAME | CREATEDDATE
117 | Andrew Davies | 4/16/1996
638 | Elizabeth Murphy | 1/14/1998
890 | Richard Block | 4/6/2000

Customer Accounts
ACCTID | CUSTID | BALANCE | STARTDATE
AS-09615 | 117 | 5197 | 11/12/2004
SJ-04108 | 117 | 8047 | 3/2/2007
FX-56312 | 638 | 162 | 7/27/2005

Test Database

Customer
CUSTID | FULLNAME | CREATEDDATE
448 | Kan Crone | 3/2/1976
259 | Ludie Dowden | 9/5/1982
913 | Jarad Bayne | 11/19/2004

Customer Accounts
ACCTID | CUSTID | BALANCE | STARTDATE
RW-07778 | 448 | 5268 | 11/12/2004
VB-55856 | 448 | 7555 | 3/2/2007
SX-00685 | 259 | 170 | 7/27/2005

• ACCTID is masked using Mask Format to preserve its structure: two alphabetic characters, followed by a hyphen, followed by five numeric characters
• CREATEDDATE is masked using Range masking to generate a random date between 01/01/1950 and 01/01/2010
• BALANCE is blurred by plus or minus 10% to preserve the distribution of balances across all accounts



Random Masking - Example
BALANCE – number datatype

Blurring – Mask BALANCE with a value that is within ±10% of the original value
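Blurring can be sketched as drawing a value from a band around the original. The helper below is hypothetical; the uniform distribution and the `percent`/`fixed` parameter names are assumptions, not the product's documented behavior.

```python
import random
from typing import Optional


def blur(value: float, rng: random.Random,
         percent: Optional[float] = None,
         fixed: Optional[float] = None) -> float:
    """Return a random value within a percent or fixed variance of value.

    Pass exactly one of percent (10 means +/-10%) or fixed (an absolute
    amount in the column's own units).
    """
    if (percent is None) == (fixed is None):
        raise ValueError("pass exactly one of percent or fixed")
    delta = abs(value) * percent / 100 if percent is not None else abs(fixed)
    # Uniform draw inside [value - delta, value + delta] (assumed distribution).
    return rng.uniform(value - delta, value + delta)


rng = random.Random(7)
for balance in (5197, 8047, 162):
    print(balance, "->", blur(balance, rng, percent=10))
```

Because each masked balance stays close to its original, aggregate statistics such as the distribution of balances stay roughly intact, which is the stated reason for blurring BALANCE.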



Random Masking - Example
CREATEDDATE – date datatype

Range – Generate a random date between 01/01/1950 and 01/01/2010
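Range masking for a date column can be sketched as picking a random day offset inside the bounds. Treating both bounds as inclusive and using a uniform distribution are assumptions of this sketch; `random_date` is a hypothetical helper.

```python
import random
from datetime import date, timedelta


def random_date(rng: random.Random, start: date, end: date) -> date:
    """Return a uniformly chosen date in [start, end], both bounds inclusive."""
    span = (end - start).days
    if span < 0:
        raise ValueError("start must not be after end")
    return start + timedelta(days=rng.randint(0, span))


LOW, HIGH = date(1950, 1, 1), date(2010, 1, 1)
rng = random.Random(2007)
for created in (date(1996, 4, 16), date(1998, 1, 14), date(2000, 4, 6)):
    # The original value is not used: range masking draws a fresh date.
    print(created, "->", random_date(rng, LOW, HIGH))
```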



Random Masking - Example
ACCTID – string datatype

Mask Format – Mask ACCTID while preserving its structure: two alphabetic characters, the third character retained unchanged, then five numeric characters

Result string replacement characters restrict the characters used as replacements; for example, use only uppercase alphabetic characters



Key Masking

• Generate repeatable values to preserve referential integrity
• Seed-based algorithm returns the same data each time the source value and seed value are the same
• Configure the same seed value for masking the primary key and foreign key columns
• Change the seed value to produce a different set of repeatable data
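The seed-based behavior described above can be illustrated with a keyed hash: the seed is used as the HMAC key, so the same source value and seed always produce the same output, and a new seed produces a different but equally repeatable mapping. This is a substitute technique chosen for illustration, not Informatica's algorithm; `key_mask`, the SHA-256 choice, the seed value and the 100 to 999 output range are assumptions. A hash reduced to a small range is not one-to-one, so two different keys can collide; masking real primary keys would need a collision-free mapping such as format-preserving encryption.

```python
import hashlib
import hmac


def key_mask(value: str, seed: int, low: int = 100, high: int = 999) -> int:
    """Map value to an integer in [low, high], repeatably for a given seed.

    The seed acts as the HMAC key: same value + same seed -> same result;
    a different seed gives a different, but again repeatable, mapping.
    Not collision-free: distinct values can map to the same result.
    """
    digest = hmac.new(str(seed).encode(), value.encode(), hashlib.sha256).digest()
    return low + int.from_bytes(digest[:8], "big") % (high - low + 1)


SEED = 190  # hypothetical seed, reused for both PK and FK columns
customers = ["117", "638", "890"]                       # Customer.CUSTID (PK)
accounts = [("AS-09615", "117"), ("SJ-04108", "117"),   # Customer Accounts
            ("FX-56312", "638")]                        # (ACCTID, CUSTID FK)

masked_pk = {c: key_mask(c, SEED) for c in customers}
masked_fk = [(acct, key_mask(cust, SEED)) for acct, cust in accounts]

# Same seed on both sides, so every masked FK matches its masked PK.
assert all(fk == masked_pk[cust] for (_, fk), (_, cust) in zip(masked_fk, accounts))
print(masked_pk, masked_fk)
```

Running the masking for the test environment with one seed and for the development environment with another gives each environment its own stable mapping, mirroring the "448 for test, 772 for development" idea in the example.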



Key Masking - Example

Production Database

Customer
CUSTID | FULLNAME | CREATEDDATE
117 | Andrew Davies | 4/16/1996
638 | Elizabeth Murphy | 1/14/1998
890 | Richard Block | 4/6/2000

Customer Accounts
ACCTID | CUSTID | BALANCE | STARTDATE
AS-09615 | 117 | 5197 | 11/12/2004
SJ-04108 | 117 | 8047 | 3/2/2007
FX-56312 | 638 | 162 | 7/27/2005

Test Database

Customer
CUSTID | FULLNAME | CREATEDDATE
448 | Kan Crone | 3/2/1976
259 | Ludie Dowden | 9/5/1982
913 | Jarad Bayne | 11/19/2004

Customer Accounts
ACCTID | CUSTID | BALANCE | STARTDATE
RW-07778 | 448 | 5268 | 11/12/2004
VB-55856 | 448 | 7555 | 3/2/2007
SX-00685 | 259 | 170 | 7/27/2005

• The Customer and Customer Accounts tables have to be masked consistently to preserve referential integrity
• Maintain repeatability: for example, mask “117” to “448” on every run
• Change the repeatable values for different runs: for example, mask “117” to “448” for the test environment but to “772” for the development environment



Key Masking - Example

Seed – The same seed value is used when masking the primary key and foreign key fields to preserve referential integrity



Special built-in masking rules

• Built-in rules for commonly known sensitive fields
  • Credit Card Number
    • Generate a random but valid credit card number using the Luhn algorithm
    • Preserve the Issuer Identifier (Visa, Discover, etc.), i.e. the first 6 digits of the credit card number
  • Social Security Number
    • Generate a random Social Security Number that has not yet been issued
    • Uses the high group file provided by the Social Security Administration
    • Download the latest high group file to stay up to date
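The credit card rule can be sketched as: keep the first six digits, randomize the remaining digits except the last, then compute the last digit with the Luhn algorithm so the number passes a checksum test. The helper names are hypothetical, and keeping separators such as `-` in place is an assumption of this sketch. The SSN rule is not sketched because it depends on the high group file, whose format is not described here.

```python
import random


def luhn_check_digit(partial: str) -> str:
    """Check digit that makes partial + digit pass the Luhn test."""
    total = 0
    # Right to left; the rightmost digit of partial sits next to the check
    # digit, so it (and every second digit after it) is doubled.
    for i, ch in enumerate(reversed(partial)):
        d = int(ch)
        if i % 2 == 0:
            d = d * 2 - 9 if d * 2 > 9 else d * 2
        total += d
    return str((10 - total % 10) % 10)


def luhn_valid(number: str) -> bool:
    """True if the full digit string passes the Luhn checksum."""
    return luhn_check_digit(number[:-1]) == number[-1]


def mask_credit_card(card: str, rng: random.Random) -> str:
    """Keep the first 6 digits (issuer identifier), randomize the middle,
    recompute the Luhn check digit, and keep separators such as '-'."""
    digits = [c for c in card if c.isdigit()]
    body = digits[:6] + [str(rng.randint(0, 9)) for _ in digits[6:-1]]
    new_digits = iter(body + [luhn_check_digit("".join(body))])
    return "".join(next(new_digits) if c.isdigit() else c for c in card)


rng = random.Random(11)
masked = mask_credit_card("4739-1146-8075-5716", rng)
print(masked, luhn_valid(masked.replace("-", "")))
```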



Special built-in masking rules

• Phone Number
  • Generate a random phone number but preserve the incoming phone format
• Email Address
  • Generate a random email address with the correct format (@, ., etc.)
• URL
  • Generate a random URL value with the correct format
• IP
  • Generate a random IP address within the same network range
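Two of these rules lend themselves to short sketches: format-preserving phone masking (replace each digit, keep everything else) and IP masking within a network. Defining the "network range" by a prefix length such as /24 is an assumption, as are the helper names and the sample address; the sketch handles IPv4 only.

```python
import ipaddress
import random


def mask_phone(phone: str, rng: random.Random) -> str:
    """Replace each digit with a random digit; keep parentheses, spaces,
    hyphens, etc., so the incoming format is preserved."""
    return "".join(str(rng.randint(0, 9)) if c.isdigit() else c for c in phone)


def mask_ip(ip: str, prefix_len: int, rng: random.Random) -> str:
    """Random IPv4 host address in the same network as ip.

    prefix_len defines the network range (assumption, e.g. 24 for a /24).
    """
    net = ipaddress.IPv4Network(f"{ip}/{prefix_len}", strict=False)
    if net.num_addresses < 4:
        raise ValueError("network too small for this sketch")
    # Skip the network and broadcast addresses at either end of the range.
    offset = rng.randint(1, net.num_addresses - 2)
    return str(net.network_address + offset)


rng = random.Random(5)
print(mask_phone("(206) 923-3477", rng))
print(mask_phone("6682848046", rng))
print(mask_ip("192.168.10.37", 24, rng))
```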



Special built-in masking rules - Example

Customer (Production)
PHONE | EMAIL | SSN | CREDITCARD
(206) 923-3477 | bmurphy@illuminetss7.com | 275-85-8158 | 4552-7473-4192-6624
6682848046 | deborahashea@lmco.com | 271-85-8451 | 4465-8580-5809-1951

Customer (Masked)
PHONE | EMAIL | SSN | CREDITCARD
(988) 676-4900 | ir6NKRi@JuBlAlgI07WR.AEb | 275-53-0840 | 4552-7464-3620-2545
8056642448 | 78dgrJMg9gU1@laoQ.fGf | 271-43-3410 | 4465-8564-7382-9054

• Mask Phone while retaining the same format
• Mask Email while retaining the correct email format
• Generate a correctly formatted SSN that has not yet been issued
• Generate a valid credit card number while preserving the issuer identifier number



Special built-in masking rules - Example



Substitution – Name and Address

• Generate random but realistic-looking values for names and addresses
• Packaged substitution datasets
  • First Names (Male and Female)
  • Last Names
  • Address
• A PowerCenter Lookup transformation performs a random lookup against the provided datasets
• Pre-packaged sample mappings demonstrate the substitution mechanism



Substitution - Example

Customer (Production)
FULLNAME | STREET | CITY | STATE
John Smith | 100 Cardinal way | Redwood City | CA
Andrew Davies | 5400 Carillon Pt | Kirkland | WA

Customer (Masked)
FULLNAME | STREET | CITY | STATE
Glen Harrison | 6 Meadows Pkwy | Olympia | WA
Kan Crone | 9001 Stockdale Hwy | Bakersfield | CA

• Randomly substitute values from the included content
• Name Masking: for example, mask John Smith to Glen Harrison
• Address Masking: for example, mask 100 Cardinal way to 6 Meadows Pkwy



Substitution – Mapping

• First Name Lookup – Use firstnames.dic file
• Data Masking Transformation – Generate random numbers for lookup
• Surname Lookup – Use surnames.dic file
• Address Lookup – Use Address.dic file
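The substitution mapping (generate a random number, then look it up in a dictionary file) can be sketched in a few lines. The in-memory lists stand in for `firstnames.dic`, `surnames.dic` and `Address.dic`, whose contents and layout are not shown here; the sample values are taken from the examples in this section, and `pick` and `substitute` are hypothetical helpers.

```python
import random

# Hypothetical stand-ins for firstnames.dic, surnames.dic and Address.dic;
# the values are taken from this section's examples, not from the real files.
FIRST_NAMES = ["Glen", "Kan", "Ludie", "Jarad"]
SURNAMES = ["Carter", "Harrison", "Crone", "Dowden", "Bayne"]
ADDRESSES = [
    ("6 Meadows Pkwy", "Olympia", "WA"),
    ("9001 Stockdale Hwy", "Bakersfield", "CA"),
]


def pick(table: list, rng: random.Random):
    """Generate a random number and use it as the lookup key (row index)."""
    return table[rng.randrange(len(table))]


def substitute(record: dict, rng: random.Random) -> dict:
    """Return a copy of record with name and address fields substituted."""
    street, city, state = pick(ADDRESSES, rng)
    return {
        **record,
        "FULLNAME": f"{pick(FIRST_NAMES, rng)} {pick(SURNAMES, rng)}",
        "STREET": street,
        "CITY": city,
        "STATE": state,
    }


rng = random.Random(33)
row = {"FULLNAME": "John Smith", "STREET": "100 Cardinal way",
       "CITY": "Redwood City", "STATE": "CA"}
print(substitute(row, rng))
```

Because the lookup index is purely random, the same person can receive different substitutes on different runs; if substituted names must stay consistent across tables, the index would have to be derived repeatably from the source value, as with key masking.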



Orchestration

Informatica BPM Functionality
Data Service Orchestration and Human Workflow

• Informatica Orchestration Designer
• Informatica Orchestration Server
• Informatica Human Workflow



Orchestration Designer

• Eclipse based
• Visual and Source editors for BPMN, XForms, WSDL, XSD, etc.
• Drag-and-drop interface eliminates coding (and errors!)
• Import and Export of standard artifacts (WSDL, XSD, etc.)
• Single-click Deploy



Orchestration Server

• BPEL engine
• Executes BPEL code generated by Orchestration Designer or by third-party tools
• Interaction with external participants is based exclusively on Web Services technology (WSDL)
• Supports long-running processes
• Newer versions of processes can be deployed without terminating existing versions



Human Workflow

• Designed as XForms (Web 2.0) using Orchestration Designer
• Deployed to Orchestration Server along with the generated BPEL code
• Rendered by Orchestration Server and delivered to the browser



BPMN - Highlights

• A standardized means of illustrating a business process
• Useful for documenting business processes
• Useful in IT for documenting technical processes



BPMN Diagram - Sample



Process Monitoring

“Where are we at on the ABC project / deal / claim / account?”

• Zoom and Timeline Control
• Shows Event Information
• Shows Process Path



Uses for Orchestration

• Sequencing
  • Start a process after the completion of another process or after a specific time has been reached
• Synchronization of master data
  • Synchronize master data between multiple independent data sources
• Conditional Logic
  • Take differentiated action depending on the outcome of another process activity
  • Different handlers for System and Business exceptions
• Human Workflow
  • Complex decisions requiring human intervention
• Looping
  • Iteratively execute a process activity based on standard looping criteria (for, while, repeat-until)



Thank You
