
Technical Architecture of the Data Warehouse
Data Warehouse Evangelist Program
CHED FDP on Business Analytics
Business Administration Track

Learning Outcomes
By the end of the lesson, participants should be able to:
Be familiar with the fundamentals of the Data Warehouse Architecture and its importance
Be familiar with the fundamentals of Backroom Technical Architecture and its importance
Be familiar with the fundamentals of Frontroom Technical Architecture and its importance
Appreciate the importance of metadata
Understand the importance of security in a data warehouse environment

Overview
Why do we need blueprints before building a house?
Architectural blueprints for a house help the architect and customer

Blueprints
Help the architect and the customer:
Communicate about desired results
Communicate about the nature of the construction effort
Determine the resources required
Determine dependencies
Determine timing
Determine cost

The Analogy
The blueprints of a house are an analogy for the importance of the Information System Architecture
Sad reality:
It is common for people to dive into a data warehousing project without any clear idea of what they are building (!)
The project starts with an RDBMS running on a leftover server

The Value of Architecture

Communication
Planning
Flexibility and Maintenance
Learning
Productivity and Reuse

The Value of Architecture


Communication
It helps communicate upward, giving management a way to understand the magnitude and complexity of the project.
It functions as a communication tool within the team and with other IS groups, giving participants a sense of where they fit in the process and what they need to accomplish.

The Value of Architecture


Planning
Provides a cross-check for the project plan.
Uncovers technical requirements and dependencies that do not come out as part of the planning process for a specific project.
For example:
Suddenly realizing, while you are attempting to load the software, that the data access tool you picked requires a separate NT server

The Value of Architecture


Flexibility and Maintenance
The architecture describes the
warehouse contents and processes
The architecture is used in those
processes to create, navigate, and
maintain the warehouse.

The Value of Architecture


Learning
The architecture plays an important role
as documentation for the system.
It can help new members of the team
get up to speed more quickly on the
components, contents, and connections.
The alternative is to turn new people loose to build their own mental maps through trial, error, and folklore.

The Value of Architecture


Productivity and Reuse
The architecture takes advantage of tools and
metadata as the primary enablers of
productivity and reuse.
Productivity is improved because the
architecture helps us choose tools to
automate parts of the warehouse process,
rather than build layers and layers of custom
code by hand.
It becomes easier for a developer to reuse
existing processes than to build from scratch.

The Data Warehouse Architecture
Rows are levels of detail; columns are Data (What), Backroom (Technical: How), Frontroom (Technical: How), and Infrastructure (Where).

Business Requirements
Data (What): What information do we need to make better business decisions?
Backroom (Technical: How): How will we get at the data, transform it, and make it available to the users?
Frontroom (Technical: How): What are the major business issues we face? How do we measure the issues? How do we analyze the data?
Infrastructure (Where): What HW and system level capabilities do we need to be successful?

Architecture Models and Documents
Data (What): The dimensional model: What are the facts and …
Backroom (Technical: How): What are the specific capabilities we will need to get the data into a usable form in …
Frontroom (Technical: How): What will the users need to get the information out into a …
Infrastructure (Where): Where is the data coming from? Where is it going to?

Detailed Models and Specs
Data (What): The logical and physical models: individual elements?
Backroom (Technical: How): What standards and products provide needed capabilities? How will we hook them together? Development standards? Naming?
Frontroom (Technical: How): What are the specifics for the report templates? Who needs them? How often?
Infrastructure (Where): How do we interact with these capabilities? What are the system utilities? APIs? Calls?

Implementation
Data (What): Creating the databases, indexes, backup, etc.
Backroom (Technical: How): Write the extracts and loads; automate the process.
Frontroom (Technical: How): Implement the reporting and analysis environment. Train users.
Infrastructure (Where): Install and test infrastructure components.

Technical Architecture Overview
Services
Functions needed to accomplish the required tasks of the warehouse
For example: copying a table from one place to another (a basic data movement service)

Data Stores
Temporary or permanent landing places
for data
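The "copy a table" example of a basic data movement service can be sketched in a few lines. This is a minimal illustration, assuming both the source and the target are SQLite databases reached through Python's standard sqlite3 module and that the target table already exists with the same columns; the helper name copy_table and the product table are hypothetical.

```python
import sqlite3


def copy_table(src: sqlite3.Connection, dst: sqlite3.Connection, table: str) -> int:
    """Copy every row of `table` from src into an identically shaped table in dst.

    Hypothetical helper for illustration. The table name is interpolated into SQL,
    so it must come from trusted metadata, never from user input.
    """
    cur = src.execute(f"SELECT * FROM {table}")
    rows = cur.fetchall()
    if not rows:
        return 0
    # One "?" placeholder per column reported by the SELECT.
    placeholders = ", ".join("?" for _ in cur.description)
    with dst:  # commits on success, rolls back on error
        dst.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    return len(rows)


if __name__ == "__main__":
    source = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE product (id INTEGER, name TEXT)")
    source.executemany("INSERT INTO product VALUES (?, ?)", [(1, "widget"), (2, "gadget")])
    target = sqlite3.connect(":memory:")
    target.execute("CREATE TABLE product (id INTEGER, name TEXT)")
    print(copy_table(source, target, "product"))  # prints 2
```

The service itself is just a function; a data store (here, the two databases) is where the data lands before and after it runs.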

Key Technical Architecture Features


Metadata Driven
Flexible Services Layers
Caveats

Key Technical Architecture Features


Metadata Driven
Metadata provides flexibility by buffering the various components of the system from each other.
For example, when a source migrates to a new operational system, substituting in the new data source is relatively easy, because the metadata catalog holds the mapping information from source to target.
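The buffering role of the metadata catalog can be illustrated with a source-to-target mapping kept as data rather than hard-coded in the transformation logic. This is a sketch only: the two mapping dictionaries, their column names, and the to_target function are hypothetical stand-ins for entries in a real metadata catalog.

```python
# Hypothetical catalog entries: target column -> column name in the source system.
LEGACY_SOURCE_MAPPING = {"customer_id": "CUST_NO", "customer_name": "CUST_NM"}
NEW_SOURCE_MAPPING = {"customer_id": "CustomerNumber", "customer_name": "FullName"}


def to_target(source_row: dict, mapping: dict) -> dict:
    """Build a target-shaped row by following the catalog's source-to-target mapping."""
    return {target: source_row[source] for target, source in mapping.items()}


# The same transformation code serves both systems; only the catalog entry changes.
old = to_target({"CUST_NO": 42, "CUST_NM": "Acme"}, LEGACY_SOURCE_MAPPING)
new = to_target({"CustomerNumber": 42, "FullName": "Acme"}, NEW_SOURCE_MAPPING)
print(old == new)  # True
```

When the source migrates, a new mapping entry is registered in the catalog and the downstream code stays untouched.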

Key Technical Architecture Features


Flexible Services Layers
The data staging services and data query services are application layers that also provide a level of indirection, which adds to the flexibility of the architecture.
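A query service layer adds indirection by letting clients ask for a logical name while the layer decides which physical table answers it. The sketch below assumes SQLite via Python's sqlite3 module; the QueryService class, the logical name sales, and the physical table names are all hypothetical.

```python
import sqlite3


class QueryService:
    """Hypothetical query-service layer: resolves logical names to physical tables."""

    def __init__(self, conn: sqlite3.Connection, physical_tables: dict[str, str]):
        self.conn = conn
        self.physical_tables = physical_tables  # logical name -> physical table

    def fetch_all(self, logical_name: str) -> list[tuple]:
        # Clients never see the physical name, so it can change without breaking them.
        table = self.physical_tables[logical_name]
        return self.conn.execute(f"SELECT * FROM {table} ORDER BY 1").fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_fact_v2 (day TEXT, amount REAL)")
conn.execute("INSERT INTO sales_fact_v2 VALUES ('2024-01-01', 10.0)")
service = QueryService(conn, {"sales": "sales_fact_v2"})
print(service.fetch_all("sales"))  # [('2024-01-01', 10.0)]
```

Repointing "sales" to a rebuilt table is a one-line change inside the layer; every client keeps asking for "sales".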

Key Technical Architecture Features


Caveats
Not all tools and services on the market
currently support the concept of an open
metadata catalog or query services
indirection.
A phased approach is recommended in
implementing the architecture.
Set up the basic structure first, and add
components as the warehouse grows and
resources permit.

Technical Architecture: Architecture for the Backroom

Overview: Backroom Technical Architecture
Where the staging process takes place
Engine room of the data warehouse
Common term for the backroom processes: Data Acquisition

Data Staging Process: Early to Present-Day Implementation
Began as a manually coded development effort
Early implementation: manually directing data into the warehouse with minimal or no automation
Now automated to a degree
Still complex and temperamental

Backroom Data Stores
Temporary or permanent landing places for data along the way
What gets physically stored has implications for the data warehouse infrastructure
The primary data stores are in the backroom
The actual data stores you will need depend on:
The business requirements
The complexity of the extract and transformation process

Source Systems
The transaction systems are the
obvious sources of interesting
business information
Access to the core operational
systems of the business

Source Systems: Core Operations

Order entry
Production
Shipping
Customer Service
Accounting Systems

Source Systems: External to the Business
Demographic customer information
Target customer lists
Customer business segments
Competitive sales data

The Staging Process


Iterative
The source for a load process may be another data warehouse
The source for a load process may be the target data warehouse itself

On Some Specific Source Systems
Client/Server ERP Systems
Reporting Instance
Operational Data Store

Client/Server ERP Systems
Made of several modules that cover major functional areas of the business
Human resources
Manufacturing
Etc.
Involves a major penalty: there are often thousands of tables in an ERP source system
Modern ERP systems can be data warehouse ready

The Reporting Instance


Creation of a separate copy of the operational database to serve as the reporting environment for the operational systems.
This reporting instance will just be another part of the data warehouse.

The Operational Data Store


Two interpretations:
First: the ODS serves as a point of integration for operational systems. It is a truly operational, real-time source for balances, histories, and other detailed lookups.
Second: the ODS supplies current, detailed data for decision support.
The analysis must be done on the most granular and detailed data possible.
We use the first interpretation.

The Data Staging Area


The construction site for the
warehouse.
This is where much of the data
transformation takes place
This is where much of the added
value of the data warehouse is
created.

Data Staging Storage Types


Flat files
Relational tables
Proprietary structures used by data
staging tools
Choice depends on the data
quantities and timeframes involved
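To make the first two storage types concrete, the sketch below stages the same extracted rows both as a flat file and as a relational table, using only Python's standard csv, io, and sqlite3 modules. The row layout and the stg_orders table name are hypothetical, and a proprietary staging-tool format is not shown.

```python
import csv
import io
import sqlite3

# Hypothetical extracted rows: (order_id, amount).
rows = [(1, 19.99), (2, 5.25)]

# Option 1: flat file (an in-memory buffer stands in for a file on disk).
flat_file = io.StringIO()
writer = csv.writer(flat_file)
writer.writerow(["order_id", "amount"])
writer.writerows(rows)

# Option 2: relational staging table.
staging_db = sqlite3.connect(":memory:")
staging_db.execute("CREATE TABLE stg_orders (order_id INTEGER, amount REAL)")
staging_db.executemany("INSERT INTO stg_orders VALUES (?, ?)", rows)

print(flat_file.getvalue().splitlines())  # ['order_id,amount', '1,19.99', '2,5.25']
print(staging_db.execute("SELECT COUNT(*) FROM stg_orders").fetchone()[0])  # 2
```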

Data Staging Data Models


The data models can be designed for
performance and ease of development.
Likely to match a combination of source
structures on the incoming side and
dimensional warehouse structures on the
finished goods side, with the
transformation process managing the
conversion.
Third normal form or E/R models often
appear in the data staging area

Presentation Servers
The target platforms where the data is stored for direct
querying by end users, reporting systems and other
applications.
The Data Warehouse Bus allows parallel development of business process data marts, while their use of conformed dimensions ensures that these data marts can be integrated.
Ideally, detail and aggregate data are loaded into data marts segmented by business process.
The conformed dimensions used in these data marts will
allow query management software to combine data
across data marts for fully integrated enterprise analysis
and reporting.
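The way conformed dimensions let query software combine data across data marts (often called drilling across) can be sketched as: aggregate each business-process fact table to the grain of the shared dimension, then join the results on the conformed key. The product dimension and the sales and shipment fact rows below are hypothetical, and plain Python stands in for what query management software would do in SQL.

```python
from collections import defaultdict

# Hypothetical conformed Product dimension shared by both data marts.
PRODUCT_DIM = {1: "Widget", 2: "Gadget"}

# Hypothetical fact rows from two business-process data marts: (product_key, amount).
sales_facts = [(1, 100.0), (2, 40.0), (1, 60.0)]
shipment_facts = [(1, 150.0), (2, 35.0)]


def total_by_product(facts):
    """Aggregate one fact table to the grain of the conformed dimension."""
    totals = defaultdict(float)
    for product_key, amount in facts:
        totals[product_key] += amount
    return totals


def drill_across(facts_a, facts_b):
    """Combine two marts on the shared product key, labelled via the conformed dimension."""
    a, b = total_by_product(facts_a), total_by_product(facts_b)
    return {PRODUCT_DIM[k]: (a.get(k, 0.0), b.get(k, 0.0)) for k in sorted(set(a) | set(b))}


print(drill_across(sales_facts, shipment_facts))
# {'Widget': (160.0, 150.0), 'Gadget': (40.0, 35.0)}
```

The combination only works because both marts use the same product keys; with non-conformed dimensions there would be no reliable key to join on.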

Data Marts
Atomic Data Marts
Hold data at the lowest level of detail
necessary to meet most of the high-value
business requirements.

Aggregate Business Process Data Marts


Data related to each core business process
Bring together relevant sets of data from the atomic data mart and present it in a dimensional form that is meaningful to the business users.
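Deriving an aggregate data mart from atomic data amounts to summarizing detail rows to a coarser grain. A minimal sketch, assuming hypothetical atomic sales rows at daily, per-store grain that are rolled up to monthly, per-product totals:

```python
from collections import defaultdict
from datetime import date

# Hypothetical atomic rows: (sale_date, product, store, amount).
atomic_sales = [
    (date(2024, 1, 5), "Widget", "North", 10.0),
    (date(2024, 1, 20), "Widget", "South", 15.0),
    (date(2024, 2, 3), "Widget", "North", 7.5),
    (date(2024, 1, 9), "Gadget", "North", 4.0),
]


def rollup_monthly(rows):
    """Summarize atomic rows to (year, month, product) grain, dropping the store detail."""
    totals = defaultdict(float)
    for sale_date, product, _store, amount in rows:
        totals[(sale_date.year, sale_date.month, product)] += amount
    return dict(totals)


print(rollup_monthly(atomic_sales))
# {(2024, 1, 'Widget'): 25.0, (2024, 2, 'Widget'): 7.5, (2024, 1, 'Gadget'): 4.0}
```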

Metadata Catalogue
An integral part of the overall
architecture
Represents the set of information
that describes the warehouse and
plays an active role in its creation,
use, and maintenance

Services Involved
Data Staging Services
Extract Services
Data Transformation Services
Data Loading Services
Data Staging Job Control Services

Data Staging Services


Tools and techniques employed in the data
staging process.
A service is an elemental function or task.
It could be as simple as creating a table in a database.
An application is a software product that
provides one or more services.
An application may be code created by the
warehouse team, an in-house utility, a vendor
utility, or a full-scale vendor product designed
for data warehousing.

Extract Services
Pulling the data from the source
systems
Probably the largest single effort in
the data warehouse project,
especially if the source systems are
decades-old, mainframe-based,
mystery-house-style systems.
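Extract logic varies widely by source. One common approach, not described on the slide, is an incremental extract that pulls only rows changed since the last run, using a high-water mark on a last-modified column. The sketch assumes the source exposes such a column; the orders table, its columns, and the extract_changed_rows helper are hypothetical, and SQLite stands in for the source system.

```python
import sqlite3


def extract_changed_rows(source: sqlite3.Connection, last_run: str) -> tuple[list[tuple], str]:
    """Return rows modified after `last_run` plus the new high-water mark.

    Timestamps are ISO-8601 strings, so string comparison matches time order.
    """
    rows = source.execute(
        "SELECT order_id, amount, modified_at FROM orders "
        "WHERE modified_at > ? ORDER BY modified_at",
        (last_run,),
    ).fetchall()
    new_mark = rows[-1][2] if rows else last_run
    return rows, new_mark


src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (order_id INTEGER, amount REAL, modified_at TEXT)")
src.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10.0, "2024-03-01T08:00:00"), (2, 20.0, "2024-03-02T09:30:00")],
)
changed, mark = extract_changed_rows(src, "2024-03-01T12:00:00")
print(changed, mark)  # [(2, 20.0, '2024-03-02T09:30:00')] 2024-03-02T09:30:00
```

The returned mark would be stored (for example, in the metadata catalog) and passed to the next run.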

Data Transformation
Services
Once the data is extracted from the
source system, a range of unnatural
acts are performed on it to convert it
into something presentable to the
users and valuable to the business.
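A few typical transformation steps (trimming stray whitespace, decoding cryptic source codes into business-friendly labels, and converting a source date format) can be sketched as one function. The field names, the status code table, and the DD/MM/YYYY source format are hypothetical assumptions chosen for illustration.

```python
from datetime import datetime

# Hypothetical decode table for a one-letter status code in the source system.
STATUS_LABELS = {"A": "Active", "I": "Inactive"}


def transform(source_row: dict) -> dict:
    """Turn a raw source record into a cleaned, user-friendly record."""
    return {
        "customer_name": source_row["name"].strip().title(),
        # Unknown codes become an explicit label instead of an empty value.
        "status": STATUS_LABELS.get(source_row["status"], "Unknown"),
        # Source dates assumed to arrive as DD/MM/YYYY strings.
        "signup_date": datetime.strptime(source_row["signup"], "%d/%m/%Y").date().isoformat(),
    }


print(transform({"name": "  acme corp ", "status": "A", "signup": "07/03/2021"}))
# {'customer_name': 'Acme Corp', 'status': 'Active', 'signup_date': '2021-03-07'}
```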

Data Loading Services


Moving the transformed data into the
targets
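Loading can be as simple as inserting the transformed rows into the target table inside one transaction, so a failed load leaves the target unchanged rather than half-loaded. A minimal sketch with Python's sqlite3 module; the customer_dim table and the load_rows helper are hypothetical.

```python
import sqlite3


def load_rows(target: sqlite3.Connection, rows: list[tuple]) -> None:
    """Insert all rows in a single transaction: all rows land, or none do."""
    with target:  # commits if the block succeeds, rolls back if it raises
        target.executemany(
            "INSERT INTO customer_dim (customer_id, customer_name) VALUES (?, ?)", rows
        )


warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE customer_dim (customer_id INTEGER PRIMARY KEY, customer_name TEXT)")
load_rows(warehouse, [(1, "Acme Corp"), (2, "Globex")])
try:
    # The duplicate key 1 makes this batch fail, so row 3 is rolled back too.
    load_rows(warehouse, [(3, "Initech"), (1, "Duplicate")])
except sqlite3.IntegrityError:
    pass
print(warehouse.execute("SELECT COUNT(*) FROM customer_dim").fetchone()[0])  # 2
```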

Data Staging Job Control Services
The entire data staging job stream should be managed, to the extent possible, through a single, metadata-driven job control environment.
The job control process also captures metadata regarding the progress and statistics of the daily job itself.

Data Staging Job Control Services

Job Definition
Job Scheduling
Monitoring
Logging
Exception/Error Handling
Notifications
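The job control services listed above can be pictured as a small driver that runs job steps in a defined order, logs progress, handles exceptions, and records run statistics as process metadata. This is a hypothetical sketch using Python's standard logging and time modules; the step names and the run_job function are invented for illustration, and scheduling and notifications are only indicated in comments.

```python
import logging
import time
from typing import Callable

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("staging_job")


def run_job(steps: list[tuple[str, Callable[[], int]]]) -> list[dict]:
    """Run job steps in order; the ordered list is the job definition.

    Each step returns a row count, and per-step statistics are returned as
    process metadata. A scheduler (cron, for example) would call run_job, and a
    real system would send a notification where the failure is logged.
    """
    stats = []
    for name, step in steps:
        start = time.perf_counter()
        try:
            rows = step()
        except Exception:
            log.exception("step %s failed; stopping job", name)  # error handling + logging
            stats.append({"step": name, "status": "failed", "rows": 0})
            break  # later steps depend on earlier ones, so stop here
        elapsed = time.perf_counter() - start
        log.info("step %s processed %d rows in %.3fs", name, rows, elapsed)  # monitoring
        stats.append({"step": name, "status": "ok", "rows": rows, "seconds": elapsed})
    return stats


def failing_transform() -> int:
    raise ValueError("bad source record")


job = [("extract", lambda: 120), ("transform", failing_transform), ("load", lambda: 120)]
for entry in run_job(job):
    print(entry["step"], entry["status"])  # "extract ok", then "transform failed"; load never runs
```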

Technical Architecture: The Frontroom
Intro to Metadata

The Frontroom
The front room is the public face of the warehouse.
It's what the business users see and work with day-to-day.
For most folks, the user interface is the data warehouse.

Front Room Data Stores


Access Tool Data Stores
Standard Reporting Data Stores

Access Tool Data Stores


As data moves into the front room and
closer to the user, it becomes more diffused.
Users can generate hundreds of ad hoc
queries and reports in a day.
These are typically centered on a specific
question, investigation of an anomaly, or
tracking the impact of a program or event.
Most individual queries yield result sets with fewer than 10,000 rows; a large percentage have fewer than 1,000 rows.

Access Tool Data Stores


These result sets are stored in the data access tool,
at least temporarily. Much of the time, the results
are actually transferred into a spreadsheet and
analyzed further.
Some data access tools work with their own
intermediate application server.
In some cases, this server provides an additional
data store to cache the results of user queries and
standard reports.
This cache provides much faster response time
when it receives a request for a previously retrieved
result set.
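The caching behavior described above can be sketched as a store keyed by query text: the first request runs the query, and a repeat of the same request is answered from the cache. This is a hypothetical illustration in Python; the ResultCache class and the run_query callable are assumptions, and the sketch ignores cache invalidation, which a real application server must handle when the warehouse is reloaded.

```python
from typing import Callable


class ResultCache:
    """Hypothetical result-set cache keyed by the exact query text."""

    def __init__(self, run_query: Callable[[str], list]):
        self.run_query = run_query
        self.results: dict[str, list] = {}
        self.hits = 0
        self.misses = 0

    def get(self, query: str) -> list:
        if query in self.results:
            self.hits += 1  # previously retrieved: no trip to the warehouse
            return self.results[query]
        self.misses += 1
        result = self.run_query(query)
        self.results[query] = result
        return result


cache = ResultCache(lambda q: [("Widget", 160.0)])  # stand-in for a real warehouse query
cache.get("SELECT product, SUM(amount) FROM sales GROUP BY product")
cache.get("SELECT product, SUM(amount) FROM sales GROUP BY product")
print(cache.hits, cache.misses)  # 1 1
```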

Standard Reporting Data Stores
Old mainframe reporting systems are being left undone or are being done poorly.
As a result, client/server-based standard reporting
environments are beginning to pop up in the
marketplace.
These applications usually take advantage of the data
warehouse as a primary data source.
They may use multiple data stores, including a separate
reporting database that draws from the warehouse and
the operational systems.
They may also have a report library or cache of some
sort that holds a preexecuted set of reports to provide
lightning-fast response time.

Downstream Systems
As the data warehouse becomes the
authoritative data source for analysis
and reporting, other systems are
drawn to it as the data source of
choice.
The basic purpose of these systems is
still reporting, but they tend to fall
closer to the operational edge of the
spectrum.

Services Included
Access and Security Services
Activity Monitoring Services
Query Management Services
Standard Reporting Services
Future Access Services
Desktop Services

About Metadata
The back room metadata is process
related, and it guides the extraction,
cleaning, and loading processes.
The front room metadata is more
descriptive, and it helps query tools
and report writers function smoothly.
Process and descriptive metadata
overlap, but it is useful to think about
them separately.

About Metadata
Source System Metadata
Backroom and Frontroom Metadata

Security
Data -> Information -> Knowledge -> Implementation -> Results
Hardware Security
Software Security
Security involving human resources
Role-based access control
Guidelines, Policies
Accountability
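Role-based access control grants permissions to roles rather than to individual users, and users acquire permissions through the roles they hold. A minimal sketch in Python; the role names, permission strings, and user assignments are hypothetical.

```python
# Hypothetical role definitions: role -> permitted actions on warehouse objects.
ROLE_PERMISSIONS = {
    "analyst": {"read:sales_mart", "read:customer_dim"},
    "etl_operator": {"write:staging", "read:staging"},
    "finance_manager": {"read:sales_mart", "read:finance_mart"},
}

# Hypothetical user-to-role assignments.
USER_ROLES = {
    "maria": {"analyst"},
    "jose": {"etl_operator", "analyst"},
}


def is_allowed(user: str, permission: str) -> bool:
    """A user may act only if at least one of their roles grants the permission."""
    return any(
        permission in ROLE_PERMISSIONS.get(role, set())
        for role in USER_ROLES.get(user, set())
    )


print(is_allowed("maria", "read:sales_mart"))    # True
print(is_allowed("maria", "write:staging"))      # False
print(is_allowed("unknown", "read:sales_mart"))  # False
```

Changing what every analyst may see means editing one role entry rather than each user's individual grants.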