Vous êtes sur la page 1sur 23

Data Warehouse Architecture

Lecture 1
Lecture Objectives
 Define Data Warehouse Architecture
 Define Data Warehouse and Data Mart
 Discuss: The Great Debate
 Present a Data Warehouse Architectural
Framework
Information Systems Architecture
 Information Systems Architecture is the process of
making the key choices that are essential to the
development of an information system. Architecture
includes:
 Guiding Principles:
 Approaches/philosophies
 “Logical” representations of a system
 Hardware/Operating System
 Computing model: client/server vs traditional vs Web-based
 Tools and technologies
 It is key, when making these choices that they are:
 Requirements driven
 Take into consideration operational, technical and financial
feasibility
 Made within an architectural framework
Architecture Drivers
 There are a lot of Drivers of Architecture

Corporate
Politics

Business System
Plan Qualities

Architecture

Current Emerging
Systems Technologies

End User
Requirements
Architecture to Design to
Implementation
Looking at the Work of Developing a System as a set of Choices,
Architecture can be Described as Highest Level Choices

Business Strategy
Architecture
Removal of Choices

Design

Line of Code / Process Step Implementation


Employee / Computer
How is Architecture Different from
Design?
 Its not – Architecture can be considered ‘high-
level’ design
 Architecture includes those aspects of the design
that are essential to the information system
 Architecture Example:
 Users must be able to self-serve (guiding principle)
 “We will use a “hub and spoke” design where data will
be placed in a central data warehouse, then be
propagated to one or more data marts. (approach)
 We will normalize data in the central warehouse and
use a dimensional design in the data marts (approach)
 We will use Oracle 8i as our DBMS (technical
architecture)
Architecture vs Design
 Not Architecture:
 The Order subject area will be composed of the
following tables: order_fact, customer_dim,
product_dim and time_dim
 The customer_dim table will have the following
attributes…….
The Value of Architecture
 Communication:
 To business sponsors, and business users
 Between members of the project team
 Planning:
 Cross Check for Project Plan
 Ensure that all important components of the data
warehouse are accounted for
 Flexibility and Growth
 Thinking about overall architecture will reduce risk
associated with the ‘success’ of the data warehouse
 Learning
 Productivity and Reuse
What’s different about DW
Architecture?
 Transaction processing systems – growth is
(relatively) predictable
 Example:
 A company uses SAP for order processing
 They are opening a new retail store
 They predict (based on experience) 2000
transactions per week
 To process this volume, we need 3 workstations to
capture the transactions
 Peak time each day is 11-2 when 50% of
transactions occur
What’s Different About Data Warehouse
Architecture?
 Success drives
explosive growth
Data  More users
Warehouse
 More (complex) queries
 More data
 Performance is
unpredictable
Growth

SAP R/3

Siebel
 Unpredictable queries
 Unpredictable use
patterns

Time
The Great Data Warehouse
Architecture Debate

 Bill Inmon: “The


enterprise data If you build it,
They will come
warehouse”

 Ralph Kimball: “data


marts”

 The compromise: “Hub


and Spoke” or
“Federated” models
What is a Data Mart?
 A data mart is a collection of subject areas
organized for decision support based on the
specific needs of a given user group.
 Each mart may widely different from others (as
we will see)
 Typically, data marts are built on the
dimensional data model:
 Facts – things that the organization wants to
measure: revenue, orders, shipments, purchases,
etc.
 Dimensions – the means by which the organization
wants to analyze the measures (facts) – by
customer, by time, by product – BY ANY
COMBINATION!!
What is a Data Mart?
 There are two kinds of data marts--dependent and
independent.
 A dependent data mart is one whose source is a data
warehouse.
 An independent data mart is one whose source is the
legacy applications environment. All dependent data
marts are fed by the same source--the data warehouse.
Each independent data mart is fed uniquely and
separately by the legacy applications environment.
 Dependent data marts are architecturally and
structurally sound.
 Independent data marts have a number of significant
issues
Data Warehouse Definition #3
 Provides the infrastructure to feed data marts,
which supply data to users.
 Responsible for:
 Acquiring data
 Cleaning data
 Transforming data
 Managing granular data
 Managing ALL data required for a specific subject
 Can be store data in a number of ways:
 Flat files
 RDBMS
 Other technologies (compression)
Data Warehouse Definition #3
 Ensures that:
 Impact on source systems is minimized
 Consistent data definitions across data marts
 Conformed dimensions!!
 Appropriate history is maintained
Architectural Framework
 Three Views
 Data (content of warehouse)
 Technical (functions to be performed by the
warehouse)
 Infrastructure (Hardware, Communications)
 Four levels of detail for each view
 Business Requirements
 Architecture Models
 Detailed Models
 Implementation
Data Architecture
 Business Requirements
 What information do we need to make better business decisions?
 What data assets are available?
 Architecture Models and Documents - The Dimensional
Model
 What are the major entities (the facts and dimensions) that make up
this information?
 How do they relate to each other?
 How should these entities be structured?
 Detailed Models and Specs - The Logical and Physical
Models
 What are the individual elements, their definitions, domains, and
rules for derivation?
 What are the sources and how do they map to the targets?
 Implementation
 Create and document the databases, indexes, backup procedures,
etc.
Technical Architecture – Back Room
 Business Requirements and Audit
 How will we get at the data, transform it, and make it available to
our users?
 How is this done today?
 Architecture Models and Documents
 What are the specific capabilities needed to get the data into a
usable form in the desired locations at the appropriate times?
 What are the major data stores and where should they be located?
 Detailed Models and Specs
 What standards and products provide the needed capabilities?
 How will we hook them together?
 What are our development standards for code management.,
naming, etc.?
 Implementation
 Write the extracts and loads.
 Automate the process.
 Document the process.
Technical Architecture – Front Room
 Business Requirements and Audit
 What are the major business issues we face?
 How will we measure these issues?
 How do we want to analyze the data?
 Architecture Models and Documents
 What will users need to get the information out in a usable form?
 What major classes of analysis and reporting do we need to provide?
 What are the priorities?
 Detailed Models and Specs
 What tools do we select to provide capabilities
 What are the specifics for the report templates?
 Who needs them?
 How often?
 How do we distribute them?
Technical Architecture – Front Room
 Implementation
 Implement the reporting and analysis environment.
 Build the initial report set.
 Train the users.
 Document.
Infrastructure
 Business Requirements and Audit
 What hardware and network capabilities do we need to be
successful?
 What do we currently have in place?
 Architecture Models and Documents
 Where is the data coming from and going to?
 Do we have enough calculation and storage capacity?
 What are the specific capabilities we are counting on?
 Do they exist?
 Who is responsible for them?
 Detailed Models and Specs
 How do we interact with these capabilities?
 What are the system utilities, calls, APIs, etc.?
 Implementation
 Install and test new infrastructure components.
 Connect the sources to the targets to the desktop.
 Document.
Technical Data Warehouse
Framework
H igh Level W arehouse Technical Architecture

The Back Room The Front Room


Metadata
Source
Catalog
Sys tems
Standard
Data Query Reporting Tools

Staging Presentation Serv ers Services


Services - W arehous e Brows ing
Desktop Data
Access Tools
- Extract - Acc es s and Sec urity
Dimensi onal Da ta Mar ts wi th
- T ransformation
only aggre gate d data - Query Management
- Load Application Models
- Standard Reporting (e.g. data mining)
- J ob Control
T he Data Conformed - Activity Monitor
Data Staging
Dimensions &
W arehous e Downstream /
Area Conformed
operational
BUS Facts
systems

Dimensiona l Data Marts


includi ng a tomic data

Key
Data Service
Element Element
Technical Data Warehouse
Framework
Metadata
Metadata Management

“Back Room”

Data Staging Services “Front Room”


Acquisition Engineering Storage Delivery Access
Optl. Client/
Datastore Server

Extracted Re-engd.
Data Data
Data Warehouse Data
Web
Marts
Server

Workflow and System Management

Vous aimerez peut-être aussi