Data Warehouse Databases
Objectives
► At the end of this lesson, you will know:
 Characteristics of warehouse databases
 Types of warehouse databases
 The strengths and limitations of each type
 Examples of warehouse databases
 Recommendations on databases for data
warehouses
Databases for Data Warehouse -
Characteristics
► Designed for analytical, decision support system (DSS) tasks
► Typically historical, non-volatile data
► Subject-oriented, integrated, detail and
summary data. Data marts often include
summarized data
► Small number of users, complex queries
► Large tables, frequent multi-table joins.
Requirement for high, sustained data flow from
large fact tables
► Updates are additive with periodic data refresh
Critical Features
► Scalability and Portability
 Scalable to multiple concurrent users and terabytes
of data
 Support SMP, MPP, NUMA and Clusters
► Query Performance
 High performance for transactions and queries
 Extensions to SQL to enhance performance
 DSS specific indexing and join methods
► Parallelism
 Should perform operations in parallel utilizing
multiple processors on the machine
 Parallel everything - query, index, load
► Data Management
 Support data partitioning methods such as
range and hash
 High throughput for bulk loads and selects
► Administration
 Easy-to-use GUI administration tool
 Availability of cost-based optimization
Warehouse Databases - Types
► Relational
 The central warehouse is usually relational because of
the potentially large size of the data warehouse
► Multidimensional
 Faster response to analytical queries and OLAP
computations, but with size limitations
► Hybrid Architecture
 Uses relational component to support large databases
and multidimensional component for fast response to
analytical queries
Relational Database for Warehouse -
Strengths
► Scalable to multi-terabytes, portable
among many platforms
► High speed query processing using
SMP, MPP, clustered multiprocessors
► Detailed and aggregated data stored
in same database
► Enhanced for data warehouse - data
replication, parallelization, query
optimization, bit-mapped indexes,
cost-based optimizer, SQL extensions for OLAP
► Supports open system standards like SQL,
ODBC
► Supported by large number of third party
vendors
Relational Database for Warehouse -
Limitations
► Originally optimized for OLTP
► Slower than multidimensional databases (MDDBs) for OLAP
calculations and ad hoc analysis of data in multiple
dimensions
► Limitations of standard SQL without OLAP extensions - cannot directly express
 what ifs
 rankings
 cross dimensional calculations
 variances
 multi-level hierarchical rollups
► Sophisticated techniques required to
overcome SQL limitations
► Star and Snowflake schemas require
complex administration
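As an illustration of the kind of technique this refers to, the sketch below writes a multi-level rollup and a variance in plain SQL against a tiny star schema, using one GROUP BY per hierarchy level glued together with UNION ALL. The table names (`dim_product`, `fact_sales`), the columns, the sample rows, and the use of SQLite are all assumptions made for the example; they do not come from the slides.

```python
import sqlite3

def build_star_schema(conn):
    """Create a hypothetical one-dimension star schema with sample rows."""
    conn.executescript("""
        CREATE TABLE dim_product (
            product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
        CREATE TABLE fact_sales (
            product_id INTEGER REFERENCES dim_product(product_id),
            actual REAL, budget REAL);
    """)
    conn.executemany(
        "INSERT INTO dim_product VALUES (?, ?, ?)",
        [(1, "widget", "hardware"), (2, "gadget", "hardware"),
         (3, "manual", "books")])
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?)",
        [(1, 100.0, 90.0), (1, 50.0, 60.0), (2, 30.0, 25.0), (3, 20.0, 20.0)])

# One SELECT per level of the product hierarchy; variance = actual - budget.
ROLLUP_QUERY = """
    SELECT 'product' AS level, p.name AS member,
           SUM(f.actual) AS actual, SUM(f.actual) - SUM(f.budget) AS variance
    FROM fact_sales f JOIN dim_product p ON p.product_id = f.product_id
    GROUP BY p.name
    UNION ALL
    SELECT 'category', p.category,
           SUM(f.actual), SUM(f.actual) - SUM(f.budget)
    FROM fact_sales f JOIN dim_product p ON p.product_id = f.product_id
    GROUP BY p.category
    UNION ALL
    SELECT 'total', 'all', SUM(f.actual), SUM(f.actual) - SUM(f.budget)
    FROM fact_sales f
"""

def rollup(conn):
    """Return (level, member, actual, variance) rows for every level."""
    return conn.execute(ROLLUP_QUERY).fetchall()

if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_star_schema(connection)
    for row in rollup(connection):
        print(row)
```

Each extra hierarchy level or dimension adds another SELECT branch, which hints at why such queries become hard to write and maintain by hand.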
RDBMS for Warehouse -
Selection
► Scalability to support VLDBs and large
number of concurrent users performing
complex analyses
► Support for advanced parallel processing
and partitioning techniques
► Integration with ETL tools
► Integration with MDDBs
► Integration with data access and analysis
tools
► Integration with local and central metadata
► Star schema and multidimensional
extensions to SQL to support OLAP
calculations, variances etc.
► Portability, security, data integrity,
backup/restore
Very Large Databases - VLDBs
► A data warehouse is typically 5 to 50 times the size
of an OLTP database because of
 Historical Content
 Summarization and Aggregations
 Special requirements of MDDBs, data
movement and cleansing tools, etc.
► Average Data Warehouse grows 6-7 times in
18 months.
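To put the growth figure in perspective, the following sketch converts "6-7 times in 18 months" into an equivalent compound monthly rate. It assumes the figure means the total size is multiplied by 6 to 7 over the period and that growth is smooth; the function names are hypothetical.

```python
def monthly_growth_factor(total_factor: float, months: int) -> float:
    """Constant monthly multiplier that compounds to total_factor."""
    return total_factor ** (1.0 / months)

def projected_size(initial_size: float, total_factor: float,
                   period_months: int, elapsed_months: int) -> float:
    """Size after elapsed_months, assuming smooth compound growth."""
    monthly = monthly_growth_factor(total_factor, period_months)
    return initial_size * monthly ** elapsed_months

if __name__ == "__main__":
    for factor in (6, 7):
        rate = monthly_growth_factor(factor, 18) - 1
        print(f"{factor}x in 18 months is about {rate:.1%} per month")
```

Under these assumptions the rate works out to a little over 10 percent per month, which is why capacity planning for a warehouse has to look well beyond the initial load.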
Partitioning and Parallelism
Key approaches for VLDBs
► Parallelism
 DBMS should carry out backups, loads, etc. in parallel
 Queries with UNION, GROUP BY, etc. can be processed in
parallel
► Partitioning
 Break up tables into chunks for backups and loads
 Crashes normally affect only one partition
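A minimal sketch of how partitioning and parallelism combine for a grouped query: each partition is aggregated independently, then the partial results are merged. Threads stand in here for the DBMS's parallel workers, and the function names and the (key, amount) row layout are assumptions for illustration only. In CPython, threads will not actually speed up this CPU-bound loop; the point is the split-aggregate-merge shape.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def partial_sum_by_key(rows):
    """Aggregate one partition: sum the amounts per grouping key."""
    totals = Counter()
    for key, amount in rows:
        totals[key] += amount
    return totals

def parallel_group_sum(partitions, max_workers=4):
    """Aggregate every partition concurrently, then merge the partials."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(partial_sum_by_key, partitions))
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds values per key
    return dict(merged)

if __name__ == "__main__":
    sales_partitions = [
        [("east", 10), ("west", 5)],   # partition 0
        [("east", 1), ("north", 7)],   # partition 1
    ]
    print(parallel_group_sum(sales_partitions))
```

This works because SUM is decomposable: the sum of per-partition sums equals the sum over the whole table. Aggregates such as AVG need extra bookkeeping (a sum and a count per partition).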
Types of Partitioning
► Range Partitioning
 Based on a range of values
► Round Robin Partitioning
 Rows assigned to partitions in turn, in the order of inserts
► Hash Partitioning
 Data split using a hash key
► Expression Partitioning
 Data split using a WHERE-clause-like
expression
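The four schemes can be sketched as functions that map a row (or key) to a partition number. This is an illustrative sketch, not any particular DBMS's implementation: the function names, the CRC32 hash, and the "overflow" partition for rows matching no expression are all assumptions.

```python
import bisect
import zlib

def range_partition(value, boundaries):
    """Partition i holds boundaries[i-1] <= value < boundaries[i].

    boundaries must be sorted ascending; values at or above the last
    boundary fall into the final partition.
    """
    return bisect.bisect_right(boundaries, value)

def round_robin_partition(insert_sequence, num_partitions):
    """Assign the n-th inserted row to partition n mod num_partitions."""
    return insert_sequence % num_partitions

def hash_partition(key, num_partitions):
    """Assign by a deterministic hash of the partitioning key."""
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions

def expression_partition(row, predicates):
    """Assign to the first predicate the row satisfies.

    Rows matching no predicate go to an extra overflow partition.
    """
    for index, predicate in enumerate(predicates):
        if predicate(row):
            return index
    return len(predicates)

if __name__ == "__main__":
    year_boundaries = [2000, 2001]  # partitions: <2000, 2000, >=2001
    print(range_partition(1999, year_boundaries))
    print([round_robin_partition(n, 3) for n in range(5)])
    print(hash_partition("customer-42", 4))
    print(expression_partition({"region": "EU"},
                               [lambda r: r["region"] == "EU"]))
```

Range and expression partitioning let the optimizer skip partitions that cannot match a query predicate, while round robin and hash mainly aim at spreading rows evenly.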
Query Performance
► Data Warehouse DBMS should support
advanced indexing and join techniques
 Bitmapped Indexes
 Star Joins
 Hash Joins
► Data Warehouse DBMS should also derive
maximum advantage from parallelism
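To show why bitmapped indexes suit warehouse queries, the sketch below builds one bitmap per distinct column value and answers a two-condition filter with a single bitwise AND. Python integers serve as the bit vectors; the function names and the sample rows are hypothetical, and real products add compression and other refinements not shown here.

```python
from collections import defaultdict

def build_bitmap_index(rows, column):
    """Map each distinct value to an int whose bit i is set when
    row i holds that value."""
    bitmaps = defaultdict(int)
    for row_id, row in enumerate(rows):
        bitmaps[row[column]] |= 1 << row_id
    return dict(bitmaps)

def matching_row_ids(bitmap):
    """Decode a bitmap back into a sorted list of row ids."""
    row_ids = []
    row_id = 0
    while bitmap:
        if bitmap & 1:
            row_ids.append(row_id)
        bitmap >>= 1
        row_id += 1
    return row_ids

if __name__ == "__main__":
    fact_rows = [
        {"region": "east", "product": "widget"},
        {"region": "west", "product": "widget"},
        {"region": "east", "product": "gadget"},
        {"region": "east", "product": "widget"},
    ]
    region_index = build_bitmap_index(fact_rows, "region")
    product_index = build_bitmap_index(fact_rows, "product")
    # WHERE region = 'east' AND product = 'widget'
    hits = region_index["east"] & product_index["widget"]
    print(matching_row_ids(hits))
```

Bitmaps work best on low-cardinality columns such as region or product category, which is typical of the dimension attributes used to filter fact tables.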
Examples of RDBMS for
Warehouse
► Oracle 8.x
► IBM DB2 UDB
► Informix Dynamic Server w/ Extended
Parallel Option, Universal Data Option
► Informix Red Brick Warehouse
► Sybase Adaptive Server, IQ
► NCR Teradata
Multidimensional Databases
[Figure: multidimensional database architecture diagram. Labeled components: Source Databases; Extract, Transform, Load; RDBMS; Warehouse Admin Tool; Metadata Repository; Local Metadata; Data Modeling Tool; Data Access Tools]
Components of MDDB
► Source data - may be accessed from an
RDBMS or directly from source databases
► Multidimensional database server - contains
base data, pre-calculated results stored in
multidimensional array, indexing structure to
access the data
► Metadata - local data definitions and
semantics in business terms
► Multidimensional viewer - end-user data
access tool
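The idea of base data plus pre-calculated results held in a multidimensional structure can be sketched as follows. Every fact is added to each cell it contributes to, including cells where a dimension is rolled up to "all members", so later lookups need no computation. The dictionary-based cube, the `ALL` marker, and the function name are assumptions for illustration; actual MDDB servers use proprietary array and index structures.

```python
from collections import defaultdict
from itertools import product

ALL = "*"  # hypothetical marker meaning "all members of this dimension"

def build_cube(facts, dimensions, measure):
    """Pre-compute the sum of `measure` for every combination of
    dimension members, with each dimension optionally rolled up to ALL."""
    cube = defaultdict(float)
    for fact in facts:
        # For each dimension, the fact belongs to its own member and to ALL.
        choices = [(fact[dim], ALL) for dim in dimensions]
        for cell in product(*choices):
            cube[cell] += fact[measure]
    return dict(cube)

if __name__ == "__main__":
    sales = [
        {"region": "east", "product": "widget", "amount": 10.0},
        {"region": "east", "product": "gadget", "amount": 5.0},
        {"region": "west", "product": "widget", "amount": 7.0},
    ]
    cube = build_cube(sales, ["region", "product"], "amount")
    print(cube[("east", ALL)])   # total for the east region
    print(cube[(ALL, ALL)])      # grand total
```

Each fact touches 2^n cells for n dimensions, so pre-computation time and storage grow quickly as dimensions and members are added.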
MDDB for Warehouse -
Strengths
► High performance, sophisticated
multidimensional calculations
 Aggregations
 Matrix calculations
 Proprietary extended features - row-level
calculations
► Optimized for OLAP. Appropriate for DSS:
 Support for complex, cross-dimensional calculations
 OLAP-aware functions
 Drill-down for iterative queries, trend analysis, and what-if analysis
 Rapid pivoting from dimension to dimension
► Simple database administration
► Efficient data storage: less space than RDBMS
MDDB for Warehouse - Limitations
► Requires proprietary database solution
► Limitations to size of raw data. Size limit is due to time
required to pre-compute large multidimensional arrays.
► Many MDBs cannot load data incrementally
► Some MDBs require pre-computation of all data
► Static, pre-computed dimensions and calculations
► Performance degrades as database size increases
► Lack of standard MDB model or access method
Examples of MDDBs
► Essbase - Arbor Software
► SAS Multidimensional Database Server
► Commander Decision - Comshare
► Acumate ES - Kenan Systems
► Pilot Decision Support Suite - Cognizant
► FOCUS/Fusion - Information Builders
Hybrid Architecture
► Combination of RDBMS and MDB controlled by
OLAP Server
 RDBMS used for detailed data stored in
large databases
 MDB used for fast, read/write OLAP analysis
and calculations
 OLAP Server routes queries to either RDBMS
or MDB
 Result set from RDBMS may be processed
on the fly in the OLAP Server
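The routing role of the OLAP Server can be sketched as a small decision function. The slides only say that queries are routed to either the RDBMS or the MDB; the specific rule used here (detail-level requests and dimensions missing from the cube go to the RDBMS, everything else to the MDB) and all class and field names are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Query:
    """Hypothetical description of an incoming analytical query."""
    dimensions: frozenset          # dimensions the query slices by
    needs_detail: bool = False     # True when row-level data is requested

class OlapRouter:
    """Decides which back end should answer a query."""

    def __init__(self, cube_dimensions):
        # Dimensions for which the MDB holds pre-computed results.
        self.cube_dimensions = frozenset(cube_dimensions)

    def route(self, query: Query) -> str:
        if query.needs_detail:
            return "RDBMS"   # detailed data lives in the relational store
        if query.dimensions <= self.cube_dimensions:
            return "MDB"     # fully covered by the pre-computed cube
        return "RDBMS"       # fall back for dimensions outside the cube

if __name__ == "__main__":
    router = OlapRouter({"region", "product", "month"})
    print(router.route(Query(frozenset({"region", "month"}))))
    print(router.route(Query(frozenset({"customer"}))))
    print(router.route(Query(frozenset({"region"}), needs_detail=True)))
```

A real server would also consider data freshness, cube size, and cost estimates; the sketch only shows where such a decision sits in the architecture.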
Hybrid OLAP Architecture
[Figure: hybrid OLAP architecture diagram. Labeled components: Source Databases; Extract, Transform, Load; RDBMS; MDB; OLAP Server; Warehouse Admin Tool; Metadata Repository; Data Modeling Tool; Data Access Tools]
Examples of Hybrid Products
► Microsoft OLAP Server (Plato)
► Oracle Express with ROLAP Option
► Holos from Seagate Software
► GentiaDB from Gentia Software
► Whitelight OLAP from Sybase (reseller)
Recommendations
Databases For Data Warehouses
► Use RDBMS, MDB, and hybrid databases to meet
specialized requirements of groups of end users
► Use RDBMS and ROLAP tools to provide
multidimensional views of large target databases
► Use RDBMS features like parallelization and bit-mapped
indexes for acceptable performance
► Use MDB for high-performance analysis of moderate-size
databases
► Use hybrid approach for applications that require access
to detail data and fast OLAP computations
Questions