Copyright 2005, Oracle. All rights reserved.

Data Warehouse Design


Objectives
After completing this lesson, you should be able to do
the following:
Differentiate OLTP and data warehousing design
techniques
Describe effective data warehouse design
Identify data warehousing schemas
Explain implementation models
List data warehousing objects
Characteristics of a Data Warehouse
A data warehouse is a database designed for
querying, reporting, and analysis.
A data warehouse contains historical data derived
from transaction data.
Data warehouses separate analysis workload from
transaction workload.
A data warehouse is primarily
an analytical tool.
Comparing OLTP and Data Warehouses
Property                      OLTP                  Data Warehouse
Joins                         Many                  Some
Data accessed by queries      Comparatively lower   Large amount
Duplicated data               Normalized DBMS       Denormalized DBMS
Derived data and aggregates   Rare                  Common
Data Warehouse Architectures
[Figure: Basic Data Warehouse. Data sources (operational systems, flat files) feed a warehouse holding raw data, metadata, and materialized views, which users access for analysis, reporting, and data mining.]
Data Warehouse Architectures
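[Figure: Data Warehouse with Staging Area. Data sources (operational systems, flat files) pass through a staging area into a warehouse holding raw data, metadata, and materialized views, which users access for analysis, reporting, and data mining.]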
Data Warehouse Architectures
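[Figure: Data Warehouse with Staging Area. Same layout as the previous figure (operational systems and flat files, staging area, warehouse with raw data, metadata, and materialized views, users doing analysis, reporting, and data mining), with three additional boxes labeled Sales, Purchasing, and Inventory.]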
Data Warehouse Design
Key data warehouse design considerations:
Identify the specific data content.
Recognize the critical relationships within and
between groups of data.
Define the system environment
supporting your data warehouse.
Identify the required data
transformations.
Calculate the frequency at which
the data must be refreshed.
Logical Design
A logical design is conceptual and
abstract.
Entity-relationship (ER) modeling
is useful in identifying logical
information requirements.
An entity represents a chunk of data.
The properties of entities are known as attributes.
The links between entities are known as relationships.
Dimensional modeling is a specialized
type of ER modeling useful in data warehouse
design.
Oracle Warehouse Builder
Oracle Database provides tools to implement the
ETL process.
Oracle Warehouse Builder is a tool to help in this
process.
Oracle Warehouse Builder generates the following
types of code:
SQL data definition language (DDL) scripts
PL/SQL programs
SQL*Loader control files
XML Process Definition Language (XPDL)
ABAP code (used to extract data from SAP
systems)
Data Warehousing Schemas
Objects can be arranged in data warehousing
schema models in a variety of ways:
Star schema
Snowflake schema
Third normal form (3NF) schema
Hybrid schemas
The source data model and user
requirements should steer the data
warehouse schema.
Implementation of the logical model may require
changes to enable you to adapt it to your physical
system.
Schema Characteristics
Star schema
Characterized by one or more large fact tables and
a number of much smaller dimension tables
Each dimension table joined to the fact table using
a primary key to foreign key join
Snowflake schema
Dimension data grouped into multiple tables
instead of one large table
Increased number of dimension tables, requiring
more foreign key joins
Third normal form (3NF) schema
A classical relational-database model that
minimizes data redundancy through normalization
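
A star schema can be sketched as follows. This is a minimal illustration that uses Python's built-in sqlite3 module instead of Oracle Database. The table and column names follow the SALES, CUSTOMERS, and PRODUCTS example used elsewhere in this lesson; prod_name and all sample rows are assumptions made for the illustration.

```python
import sqlite3


def build_star_schema():
    """Create a tiny star schema in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        -- Dimension tables: small, one row per customer or product.
        CREATE TABLE customers (
            cust_id        INTEGER PRIMARY KEY,
            cust_last_name TEXT NOT NULL,
            cust_city      TEXT NOT NULL
        );
        CREATE TABLE products (
            prod_id   INTEGER PRIMARY KEY,
            prod_name TEXT NOT NULL  -- hypothetical column
        );
        -- Fact table: each foreign key joins to a dimension primary key.
        CREATE TABLE sales (
            cust_id      INTEGER NOT NULL REFERENCES customers (cust_id),
            prod_id      INTEGER NOT NULL REFERENCES products (prod_id),
            sales_amount REAL NOT NULL
        );
    """)
    # Invented sample data.
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                     [(1, "Lee", "Boston"), (2, "Diaz", "Austin")])
    conn.executemany("INSERT INTO products VALUES (?, ?)",
                     [(10, "Laptop"), (20, "Monitor")])
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                     [(1, 10, 900.0), (1, 20, 200.0), (2, 10, 950.0)])
    conn.commit()
    return conn


def sales_by_city(conn):
    """Join the fact table to one dimension and aggregate by a dimension attribute."""
    return conn.execute("""
        SELECT c.cust_city, SUM(s.sales_amount)
        FROM sales s
        JOIN customers c ON c.cust_id = s.cust_id
        GROUP BY c.cust_city
        ORDER BY c.cust_city
    """).fetchall()
```

A snowflake variant would move cust_city into its own table, which adds one more foreign key join to the same query.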
Data Warehousing Objects
Fact tables
Fact tables are the large tables that store business
measurements.
Dimension tables
A dimension is a structure composed of one or
more hierarchies that categorizes data.
A unique identifier distinguishes each record in a dimension table.
Relationships
Relationships guarantee
integrity of business
information.
Fact Tables
A fact table must be defined for each star schema.
Fact tables are the large tables that store business
measurements.
A fact table contains either detail-level or
aggregated facts.
A fact table usually contains facts with the same
level of aggregation.
The primary key of the fact table is
usually a composite key made up
of all its foreign keys.
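
The composite-key point can be illustrated with a short sketch, again using Python's sqlite3 module rather than Oracle Database. The time_id column and the sample rows are assumptions, and the foreign key references to the dimension tables are omitted to keep the example short.

```python
import sqlite3


def make_fact_table():
    """Create a fact table whose primary key combines its dimension keys."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE sales (
            cust_id      INTEGER NOT NULL,
            prod_id      INTEGER NOT NULL,
            time_id      TEXT NOT NULL,     -- hypothetical key into a TIMES dimension
            sales_amount REAL NOT NULL,
            PRIMARY KEY (cust_id, prod_id, time_id)
        )
    """)
    return conn


def try_insert(conn, row):
    """Return True if the row was inserted, False if it violates the composite key."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO sales VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False
```

Two rows may share a customer and a product as long as the date differs; a second row with the same combination of all three keys is rejected.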
Dimensions and Hierarchies
A dimension is a structure
composed of one or more
hierarchies that categorizes data.
Dimensional attributes help to
describe the dimensional value.
Dimension data is collected at the
lowest level of detail and aggregated
into higher level totals.
Hierarchies are structures that use
ordered levels to organize data.
In a hierarchy, each level is
connected to the levels above and
below it.
CUSTOMERS dimension hierarchy (by level, lowest to highest):
CUSTOMER, CITY, STATE, COUNTRY, SUBREGION, REGION
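
Aggregating detail data into higher-level totals can be sketched in a few lines of Python. The example covers only four levels of the CUSTOMERS hierarchy, and the rows and amounts are invented for illustration.

```python
from collections import defaultdict

# Levels ordered from most detailed to most aggregated (a subset of the hierarchy).
LEVELS = ["customer", "city", "state", "country"]

# Hypothetical detail rows: one per customer, tagged with each ancestor level.
ROWS = [
    {"customer": "Lee", "city": "Boston", "state": "MA", "country": "US", "amount": 100},
    {"customer": "Kim", "city": "Boston", "state": "MA", "country": "US", "amount": 50},
    {"customer": "Diaz", "city": "Austin", "state": "TX", "country": "US", "amount": 70},
]


def roll_up(rows, level):
    """Sum the detail amounts into totals at the requested hierarchy level."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level}")
    totals = defaultdict(int)
    for row in rows:
        totals[row[level]] += row["amount"]
    return dict(totals)
```

Calling roll_up(ROWS, "city") and then roll_up(ROWS, "country") shows the same detail rows summarized at two different levels.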
Dimensions and Hierarchies
[Figure: star schema. Fact table: SALES (cust_id, prod_id). Dimension tables: TIMES, CHANNELS, PROMOTIONS, CUSTOMERS (#cust_id, cust_last_name, cust_city, cust_state_province), and PRODUCTS (#prod_id). Callouts mark a hierarchy, a unique identifier, and a relationship.]
Physical Design
[Figure: mapping from logical to physical design.]
Logical: entities, relationships, attributes, unique identifiers
Physical (tablespaces): tables, columns, indexes, integrity constraints (primary key, foreign key, not null), materialized views, dimensions
Data Warehouse Physical Structures
Tables and partitioned tables
Partitioned tables enable you to split
large data volumes into smaller,
more manageable pieces.
Expect performance benefits from:
Partition pruning
Intelligent parallel processing
Compressed tables offer scaleup opportunities for
read-only operations.
Table compression saves disk space.

Data Warehouse Physical Structures
Views:
Are tailored presentations of data contained in one
or more tables or views
Do not require any space in the database
Materialized views:
Are query results that have been stored in advance
(Like indexes) are used transparently and improve
performance
Integrity constraints:
Are used in data warehouses for query rewrite
Dimensions:
Are containers of logical relationships and do not
require any space in the database
Managing Large Volumes of Data
Work smarter in your data warehouse:
Partitioning
Bitmap indexes/Star transformation
Data compression
Query rewrite
Work harder in your data warehouse:
Parallelism for all operations
DBA tasks, such as loading, index creation, table
creation, data modification, backup and recovery
End-user operations, such as queries
Unbounded scalability: Real Application Clusters
I/O Performance in Data Warehouses
I/O is typically the primary determinant of data
warehouse performance.
Data warehouse storage configurations should be
chosen by I/O bandwidth, not storage capacity.
Every component of the I/O
subsystem should provide
enough bandwidth:
Disks
I/O channels
I/O adapters
In data warehouses, maximizing
sequential I/O throughput is critical.
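
The reasoning behind sizing by bandwidth can be sketched as a small calculation: the slowest component in the I/O path caps the throughput of the whole path. The component figures below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def bottleneck(components):
    """Return (name, MB/s) of the slowest component in an I/O path."""
    name = min(components, key=components.get)
    return name, components[name]


def scan_seconds(size_mb, components):
    """Estimate the time to scan size_mb sequentially at the bottleneck rate."""
    _, mb_per_second = bottleneck(components)
    return size_mb / mb_per_second


# Hypothetical aggregate throughput per component, in MB/s.
io_path = {"disks": 1600, "io_channels": 800, "io_adapters": 400}
```

With these assumed numbers, adding disks would not shorten a scan, because the adapters limit the path to 400 MB/s.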
Performance of Sequential I/Os
In data warehouses, drive arrays generally see
random large I/Os (1 MB) spread across the
devices.
This is known as multiuser sequential workload.
The host operating system, device drivers, or
storage array may fracture large I/Os into smaller
I/Os.
It is common in default Linux configurations to
fracture large I/Os into smaller ones (up to 32 KB).
This level of I/O fracturing can have a disastrous
effect on the total throughput.
The implementation of query rewrite has a positive
effect on minimizing I/O requests.
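
The cost of I/O fracturing is easy to quantify. The sketch below only counts requests; it does not model device latency, and the function name is an illustrative choice.

```python
import math


def fractured_requests(io_size_kb, max_request_kb):
    """Number of device requests needed to service one logical I/O."""
    return math.ceil(io_size_kb / max_request_kb)
```

A 1 MB (1024 KB) I/O that is capped at 32 KB per request becomes fractured_requests(1024, 32) separate requests, compared with a single request when the I/O is passed through whole.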
Minimizing I/O Requests
Partition pruning
Only the relevant partitions are accessed.
The optimizer knows or finds the relevant partitions.
Static pruning uses values known in advance.
Dynamic pruning uses internal recursive SQL to find the relevant partitions.
Pruning can provide order-of-magnitude performance gains.
SELECT SUM(sales_amount)
FROM sales
WHERE sales_date
BETWEEN TO_DATE('01-MAR-2005', 'DD-MON-YYYY')
AND TO_DATE('31-MAY-2005', 'DD-MON-YYYY');
SALES partitions: 2005-JAN, 2005-FEB, 2005-MAR, 2005-APR, 2005-MAY, 2005-JUN
(the query above needs only 2005-MAR, 2005-APR, and 2005-MAY)
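
Static pruning can be sketched as a range-overlap test against partition boundaries. This is a simplified illustration in Python, not the optimizer's actual implementation; the monthly boundaries are assumed from the partition names.

```python
from datetime import date

# Monthly partitions of the SALES table: name -> (first day, last day).
PARTITIONS = {
    "2005-JAN": (date(2005, 1, 1), date(2005, 1, 31)),
    "2005-FEB": (date(2005, 2, 1), date(2005, 2, 28)),
    "2005-MAR": (date(2005, 3, 1), date(2005, 3, 31)),
    "2005-APR": (date(2005, 4, 1), date(2005, 4, 30)),
    "2005-MAY": (date(2005, 5, 1), date(2005, 5, 31)),
    "2005-JUN": (date(2005, 6, 1), date(2005, 6, 30)),
}


def prune(partitions, low, high):
    """Keep only the partitions whose date range overlaps [low, high]."""
    return [name for name, (start, end) in partitions.items()
            if start <= high and end >= low]
```

For the March-to-May predicate, prune(PARTITIONS, date(2005, 3, 1), date(2005, 5, 31)) keeps three of the six partitions, so half the table is never read.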
Minimizing I/O Requests
Bitmap indexes are usually 3 to 20 times
smaller than B-tree indexes.
They are ideal for set-based operations.
Star transformation uses bitmap indexes to
identify base table records of interest.
Full table access is replaced with bitmap
index access.
Bitmap indexes minimize I/O.
Bitmap indexes
<Blue, <rowid>, 1000100100010010100>
<Green, <rowid>, 0001010000100100000>
<Red, <rowid>, 0100000011000001001>
<Orange, <rowid>, 0010001000001000010>
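
Why bitmaps suit set-based operations can be shown with a small sketch. It stores each bitmap as a Python integer and is only a conceptual model; Oracle's on-disk format, compression, and rowid handling are not represented. The colors and sizes are invented sample data.

```python
def build_bitmap_index(values):
    """Map each distinct value to an integer whose bit i is set when row i has that value."""
    index = {}
    for row_number, value in enumerate(values):
        index[value] = index.get(value, 0) | (1 << row_number)
    return index


def matching_rows(bitmap):
    """Decode a bitmap back into an ascending list of row numbers."""
    rows = []
    row_number = 0
    while bitmap:
        if bitmap & 1:
            rows.append(row_number)
        bitmap >>= 1
        row_number += 1
    return rows


# Hypothetical column values for seven rows.
colors = ["Blue", "Red", "Orange", "Green", "Blue", "Green", "Red"]
sizes = ["S", "M", "S", "L", "M", "S", "S"]
color_index = build_bitmap_index(colors)
size_index = build_bitmap_index(sizes)

# WHERE color IN ('Blue', 'Green') AND size = 'S' becomes two bitwise operations.
combined = (color_index["Blue"] | color_index["Green"]) & size_index["S"]
```

Only the rows identified by matching_rows(combined) need to be fetched from the table, which is the idea behind replacing full table access with bitmap index access.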
Minimizing I/O Requests

Query rewrite:
Reduces I/O requests by employing materialized
views with precomputed aggregates and joins
Transforms a SQL statement expressed in terms
of tables or views into a statement accessing
materialized views based on the detail tables
The transformation is transparent.
Is implemented using materialized views that can
be added or dropped like indexes without
invalidating your SQL statements
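
The effect of query rewrite can be approximated by hand. SQLite has neither materialized views nor query rewrite, so the sketch below stores the aggregate in an ordinary table (sales_by_month_mv, a hypothetical name) and writes the rewritten query manually; in Oracle Database the optimizer performs this substitution transparently. The sample rows are invented.

```python
import sqlite3


def setup():
    """Create a detail table and a precomputed summary standing in for a materialized view."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (sales_month TEXT, sales_amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", [
        ("2005-03", 100), ("2005-03", 250), ("2005-04", 75),
        ("2005-04", 125), ("2005-05", 300),
    ])
    # The aggregate is computed once and stored.
    conn.execute("""
        CREATE TABLE sales_by_month_mv AS
        SELECT sales_month, SUM(sales_amount) AS total
        FROM sales
        GROUP BY sales_month
    """)
    conn.commit()
    return conn


# The statement as the user writes it, against the detail table.
DETAIL_QUERY = """
    SELECT sales_month, SUM(sales_amount)
    FROM sales GROUP BY sales_month ORDER BY sales_month
"""

# The equivalent statement against the precomputed summary.
REWRITTEN_QUERY = """
    SELECT sales_month, total
    FROM sales_by_month_mv ORDER BY sales_month
"""
```

Both statements return the same rows, but the second reads three summary rows instead of aggregating five detail rows; on a real fact table the ratio is far larger.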
I/O Scalability
Parallel execution:
Reduces response time for data-intensive operations on large databases
Benefits systems with the following characteristics:
Multiprocessors, clusters, or massively parallel systems
Sufficient I/O bandwidth
Sufficient memory to support memory-intensive processes such as sorts, hashing, and I/O buffers
[Figure: a coordinator dispatches work to query servers. Scanners read the data on disk and feed sorters (aggregators), labeled Sort Q1 through Sort Q4.]
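
The coordinator-and-scanner pattern can be sketched with Python's standard thread pool. This shows only the dispatch-and-combine structure: Python threads do not provide true CPU parallelism, the sorter stage is reduced to a simple sum, and the function names and default degree are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def scan(chunk, predicate):
    """Scanner: filter one chunk of rows and return a partial sum."""
    return sum(amount for amount in chunk if predicate(amount))


def parallel_sum(rows, predicate, degree=4):
    """Coordinator: split the rows, dispatch one scanner per chunk, combine the partial results."""
    chunk_size = max(1, -(-len(rows) // degree))  # ceiling division
    chunks = [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]
    with ThreadPoolExecutor(max_workers=degree) as pool:
        partials = pool.map(scan, chunks, [predicate] * len(chunks))
        return sum(partials)
```

The result is the same as a serial scan; the benefit on a real system comes from each scanner reading its share of the data concurrently, which is why sufficient I/O bandwidth is a precondition.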
I/O Scalability
Automatic Storage Management (ASM)
Configuring storage for a DB depends on many
variables:
Which data to put on which disk
Logical unit number (LUN) configurations
DB types and workloads: data warehouse, OLTP, DSS
Trade-offs between available options
ASM provides solutions to storage issues
encountered in data warehouses.
I/O Scalability

Automatic Storage Management: Overview
Portable and high-performance
cluster file system
Manages Oracle database files
Data spread across disks
to balance load
Integrated mirroring across
disks
Solves many storage
management challenges
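[Figure: software stack of application, database, file system, volume manager, and operating system, with ASM shown in place of the file system and volume manager layers.]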
I/O Scalability
ASM benefits
Stripes files rather than
logical volumes
Online disk reconfiguration
and dynamic rebalancing
Provides redundancy on a
file basis
Automatic database file
management
EM-based graphical
management interface
Hot spots and manual I/O
tuning eliminated
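
Striping at the file level can be pictured with a simple round-robin placement. This is a deliberately simplified sketch and not ASM's actual allocation algorithm; the function name and the extent numbering are assumptions.

```python
def stripe(file_extents, disk_count):
    """Assign a file's extents to disks round-robin; returns one list of extents per disk."""
    disks = [[] for _ in range(disk_count)]
    for position, extent in enumerate(file_extents):
        disks[position % disk_count].append(extent)
    return disks
```

Because every file is spread across all disks, no disk holds more than one extent above any other, which is the property that removes hot spots without manual placement.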
I/O Scalability
Real Application Clusters
Real Application Clusters (RAC) provides linear
scalability and availability for data warehouses.
RAC provides redundancy so that if a node goes
down, the other nodes will continue to execute.
RAC nodes can share all work equally or perform
dedicated tasks such as ETL or query processing.
Typical Data Warehouse Cluster
[Figure: four nodes, each with four 2 GHz CPUs; 1 Gigabit Ethernet interconnects; two 16-port switches; sixteen storage arrays, each with 10 to 20 disks.]
Parallel Execution with RAC
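Parallel execution servers have node affinity with the execution coordinator: they run on the coordinator's node where possible and expand to other nodes if needed.
[Figure: Node 1 through Node 4 attached to shared disks, with an execution coordinator and its parallel execution servers running on the nodes.]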
Summary
In this lesson, you should have learned how to:
Differentiate OLTP and data warehousing design
techniques
Describe effective data warehouse design
Identify data warehousing schemas
Explain implementation models
List data warehousing objects
