Copyright 2005, Oracle. All rights reserved.

Objectives

After completing this lesson, you should be able to do the following:
- Differentiate OLTP and data warehousing design techniques
- Describe effective data warehouse design
- Identify data warehousing schemas
- Explain implementation models
- List data warehousing objects

Characteristics of a Data Warehouse

- A data warehouse is a database designed for querying, reporting, and analysis.
- A data warehouse contains historical data derived from transaction data.
- Data warehouses separate analysis workload from transaction workload.
- A data warehouse is primarily an analytical tool.

Comparing OLTP and Data Warehouses

Property                      OLTP                  Data Warehouse
Data accessed by queries      Comparatively lower   Large amount
Joins                         Many                  Some
Duplicated data               Normalized DBMS       Denormalized DBMS
Derived data and aggregates   Rare                  Common

Data Warehouse Architectures

Basic Data Warehouse
[Figure: operational systems and flat files feed a warehouse holding metadata, raw data, and materialized views, which supports analysis, reporting, and data mining.]

Data Warehouse with Staging Area
[Figure: the same components, with a staging area between the operational systems and flat files and the warehouse.]
[Figure: the staging-area architecture again, with Sales, Purchasing, and Inventory shown as separate components.]

Data Warehouse Design

Key data warehouse design considerations:
- Identify the specific data content.
- Recognize the critical relationships within and between groups of data.
- Define the system environment supporting your data warehouse.
- Identify the required data transformations.
- Calculate the frequency at which the data must be refreshed.

Logical Design

- A logical design is conceptual and abstract.
- Entity-relationship (ER) modeling is useful in identifying logical information requirements.
  - An entity represents a chunk of data.
  - The properties of entities are known as attributes.
  - The links between entities are known as relationships.
- Dimensional modeling is a specialized type of ER modeling useful in data warehouse design.

Oracle Warehouse Builder

- Oracle Database provides tools to implement the extraction, transformation, and loading (ETL) process.
- Oracle Warehouse Builder is a tool to help in this process.
- Oracle Warehouse Builder generates the following types of code:
  - SQL data definition language (DDL) scripts
  - PL/SQL programs
  - SQL*Loader control files
  - XML Process Definition Language (XPDL)
  - ABAP code (used to extract data from SAP systems)

Data Warehousing Schemas

- Objects can be arranged in data warehousing schema models in a variety of ways:
  - Star schema
  - Snowflake schema
  - Third normal form (3NF) schema
  - Hybrid schemas
- The source data model and user requirements should steer the data warehouse schema.
- Implementation of the logical model may require changes to enable you to adapt it to your physical system.

Schema Characteristics

- Star schema
  - Characterized by one or more large fact tables and a number of much smaller dimension tables
  - Each dimension table joined to the fact table using a primary key to foreign key join
- Snowflake schema
  - Dimension data grouped into multiple tables instead of one large table
  - Increased number of dimension tables, requiring more foreign key joins
- Third normal form (3NF) schema
  - A classical relational-database model that minimizes data redundancy through normalization
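
The star schema characteristics above can be sketched in a few lines of Python. This is only an illustration: it uses the standard-library sqlite3 module as a stand-in for a warehouse database (SQLite is not Oracle Database), and the table names, column names, and sample rows are hypothetical. It builds one fact table joined to two dimension tables through primary key to foreign key relationships, then totals a fact by a dimension attribute.

```python
import sqlite3


def build_star_schema(conn):
    """Create one fact table (sales) and two dimension tables with sample rows."""
    conn.executescript(
        """
        CREATE TABLE customers (          -- dimension table
            cust_id        INTEGER PRIMARY KEY,
            cust_last_name TEXT,
            cust_city      TEXT
        );
        CREATE TABLE products (           -- dimension table
            prod_id   INTEGER PRIMARY KEY,
            prod_name TEXT
        );
        CREATE TABLE sales (              -- fact table
            cust_id      INTEGER NOT NULL REFERENCES customers (cust_id),
            prod_id      INTEGER NOT NULL REFERENCES products (prod_id),
            sales_amount REAL NOT NULL,
            PRIMARY KEY (cust_id, prod_id)  -- composite key of the foreign keys
        );
        """
    )
    # Hypothetical sample data, chosen only to make the totals easy to check.
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                     [(1, "Smith", "Boston"), (2, "Jones", "Denver")])
    conn.executemany("INSERT INTO products VALUES (?, ?)",
                     [(10, "Widget"), (20, "Gadget")])
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                     [(1, 10, 100.0), (1, 20, 50.0), (2, 10, 70.0)])
    conn.commit()


def sales_by_city(conn):
    """Join the fact table to the customers dimension and total sales per city."""
    rows = conn.execute(
        """
        SELECT c.cust_city, SUM(s.sales_amount)
        FROM sales s
        JOIN customers c ON c.cust_id = s.cust_id
        GROUP BY c.cust_city
        ORDER BY c.cust_city
        """
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_star_schema(connection)
    print(sales_by_city(connection))
```

A snowflake variant of the same sketch would move cust_city into its own table referenced from customers, which adds one more foreign key join to the query.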

Data Warehousing Objects

- Fact tables
  - Fact tables are the large tables that store business measurements.
- Dimension tables
  - A dimension is a structure composed of one or more hierarchies that categorizes data.
  - A unique identifier is specified for each distinct record in a dimension table.
- Relationships
  - Relationships guarantee integrity of business information.

Fact Tables

- A fact table must be defined for each star schema.
- Fact tables are the large tables that store business measurements.
- A fact table contains either detail-level or aggregated facts.
- A fact table usually contains facts with the same level of aggregation.
- The primary key of the fact table is usually a composite key made up of all its foreign keys.

Dimensions and Hierarchies

- A dimension is a structure composed of one or more hierarchies that categorizes data.
- Dimensional attributes help to describe the dimensional value.
- Dimension data is collected at the lowest level of detail and aggregated into higher-level totals.
- Hierarchies are structures that use ordered levels to organize data.
- In a hierarchy, each level is connected to the levels above and below it.

CUSTOMERS dimension hierarchy (by level, highest to lowest): REGION, SUBREGION, COUNTRY, STATE, CITY, CUSTOMER

[Figure: star schema. The fact table SALES (cust_id, prod_id) is joined to the dimension tables CUSTOMERS (#cust_id, cust_last_name, cust_city, cust_state_province), PRODUCTS (#prod_id), TIMES, CHANNELS, and PROMOTIONS. Callouts mark a hierarchy, a unique identifier, and a relationship.]

Physical Design

Logical:
- Entities
- Relationships
- Attributes
- Unique identifiers

Physical (tablespaces):
- Tables
- Columns
- Integrity constraints
  - Primary key
  - Foreign key
  - Not null
- Indexes
- Materialized views
- Dimensions
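
The roll-up described under Dimensions and Hierarchies, where data collected at the lowest level is aggregated into higher-level totals, can be sketched as follows. The customer identifiers, locations, and amounts are hypothetical sample data, and only four of the six CUSTOMERS levels are modeled to keep the example short.

```python
from collections import defaultdict

# Hierarchy levels, lowest first (a subset of the CUSTOMERS hierarchy).
LEVELS = ("customer", "city", "state", "country")

# Hypothetical dimension rows: each customer carries its parent levels.
CUSTOMERS = {
    "C1": {"city": "Boston", "state": "MA", "country": "US"},
    "C2": {"city": "Springfield", "state": "MA", "country": "US"},
    "C3": {"city": "Denver", "state": "CO", "country": "US"},
}

# Hypothetical detail-level facts: (customer id, sales amount).
SALES = [("C1", 100.0), ("C2", 50.0), ("C3", 70.0), ("C1", 30.0)]


def roll_up(sales, customers, level):
    """Aggregate detail-level sales into totals at the requested level."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level}")
    totals = defaultdict(float)
    for cust_id, amount in sales:
        # The lowest level is the customer itself; higher levels are looked
        # up through the dimension row.
        key = cust_id if level == "customer" else customers[cust_id][level]
        totals[key] += amount
    return dict(totals)


if __name__ == "__main__":
    for level in LEVELS:
        print(level, roll_up(SALES, CUSTOMERS, level))
```

Because every level is connected to the one above it, a total at any level equals the sum of the totals of its children one level down.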

Data Warehouse Physical Structures

Tables and partitioned tables:
- Partitioned tables enable you to split large data volumes into smaller, more manageable pieces.
- Expect performance benefits from:
  - Partition pruning
  - Intelligent parallel processing
- Compressed tables offer scaleup opportunities for read-only operations.
- Table compression saves disk space.
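
To make the partition pruning benefit concrete, here is a toy model of a table partitioned by month. It is a sketch only: the class name MonthlyPartitionedTable and the sample rows are hypothetical, and pruning is modeled as a simple filter on partition keys, not as what a database optimizer actually does. The point is that a date-range query touches only the partitions whose key range overlaps the predicate.

```python
from datetime import date


class MonthlyPartitionedTable:
    """Toy range-partitioned table with one partition per (year, month)."""

    def __init__(self):
        # Maps (year, month) to a list of (sales_date, sales_amount) rows.
        self.partitions = {}

    def insert(self, sales_date, sales_amount):
        key = (sales_date.year, sales_date.month)
        self.partitions.setdefault(key, []).append((sales_date, sales_amount))

    def sum_between(self, start, end):
        """Return (total, scanned partition keys) for start <= date <= end."""
        low = (start.year, start.month)
        high = (end.year, end.month)
        # Pruning step: keep only partitions that can contain matching rows.
        scanned = [key for key in sorted(self.partitions) if low <= key <= high]
        total = sum(
            amount
            for key in scanned
            for row_date, amount in self.partitions[key]
            if start <= row_date <= end
        )
        return total, scanned


def build_sample_table():
    """Load one hypothetical row per month for January through June 2005."""
    table = MonthlyPartitionedTable()
    for month in range(1, 7):
        table.insert(date(2005, month, 15), month * 10)
    return table


if __name__ == "__main__":
    sales = build_sample_table()
    print(sales.sum_between(date(2005, 3, 1), date(2005, 5, 31)))
```

With six monthly partitions loaded, the March-to-May query reads three of them; the other three are never scanned.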

Views:
- Are tailored presentations of data contained in one or more tables or views
- Do not require any space in the database

Materialized views:
- Are query results that have been stored in advance
- Are used transparently (like indexes) and improve performance

Integrity constraints:
- Are used in data warehouses for query rewrite

Dimensions:
- Are containers of logical relationships and do not require any space in the database

Managing Large Volumes of Data

Work smarter in your data warehouse:
- Partitioning
- Bitmap indexes/Star transformation
- Data compression
- Query rewrite

Work harder in your data warehouse:
- Parallelism for all operations
  - DBA tasks, such as loading, index creation, table creation, data modification, backup and recovery
  - End-user operations, such as queries

Unbounded scalability:
- Real Application Clusters

I/O Performance in Data Warehouses

- I/O is typically the primary determinant of data warehouse performance.
- Data warehouse storage configurations should be chosen by I/O bandwidth, not storage capacity.
- Every component of the I/O subsystem should provide enough bandwidth:
  - Disks
  - I/O channels
  - I/O adapters
- In data warehouses, maximizing sequential I/O throughput is critical.

Performance of Sequential I/Os

- In data warehouses, drive arrays generally see random large I/Os (1 MB) spread across the devices. This is known as multiuser sequential workload.
- The host operating system, device drivers, or storage array may fracture large I/Os into smaller I/Os.
- It is common in default Linux configurations to fracture large I/Os into smaller ones (up to 32 KB).
- This level of I/O fracturing can have a disastrous effect on the total throughput.
- The implementation of query rewrite has a positive effect on minimizing I/O requests.
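
The two I/O points above lend themselves to quick arithmetic. The sketch below assumes binary units (1 MB = 1024 KB) and uses made-up bandwidth figures for the components; the function names are hypothetical. It counts how many requests one large I/O becomes after fracturing and finds the component that caps throughput along the I/O path.

```python
KB = 1024
MB = 1024 * KB


def fragment_count(io_size_bytes, max_fragment_bytes):
    """Number of smaller I/Os produced when one large I/O is fractured."""
    if io_size_bytes <= 0 or max_fragment_bytes <= 0:
        raise ValueError("sizes must be positive")
    # Ceiling division: a partial trailing fragment still costs one request.
    return -(-io_size_bytes // max_fragment_bytes)


def bottleneck(bandwidth_mb_per_s):
    """Return (name, bandwidth) of the slowest component in the I/O path."""
    name = min(bandwidth_mb_per_s, key=bandwidth_mb_per_s.get)
    return name, bandwidth_mb_per_s[name]


if __name__ == "__main__":
    # One 1 MB I/O fractured into pieces of at most 32 KB.
    print(fragment_count(1 * MB, 32 * KB))
    # Hypothetical per-component bandwidths in MB/s.
    print(bottleneck({"disks": 800, "I/O channels": 400, "I/O adapters": 600}))
```

Under these assumptions a single 1 MB request turns into 32 separate requests, and the end-to-end rate cannot exceed that of the slowest component regardless of how fast the others are.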

Minimizing I/O Requests

Partition pruning:
- Only the relevant partitions are accessed.
- The optimizer knows or finds the relevant partitions.
- Static pruning uses values known in advance.
- Dynamic pruning uses internal recursive SQL to find the relevant partitions.
- Pruning can provide order-of-magnitude performance gains.

    SELECT SUM(sales_amount)
    FROM sales
    WHERE sales_date BETWEEN TO_DATE('01-MAR-2005', 'DD-MON-YYYY')
                         AND TO_DATE('31-MAY-2005', 'DD-MON-YYYY');

SALES partitions: 2005-JAN, 2005-FEB, 2005-MAR, 2005-APR, 2005-MAY, 2005-JUN (the query above needs only 2005-MAR through 2005-MAY).

Bitmap indexes:
- Bitmap indexes are usually 3 to 20 times smaller than B-tree indexes.
- They are ideal for set-based operations.
- Star transformation uses bitmap indexes to identify base table records of interest.
- Full table access is replaced with bitmap index access.
- Bitmap indexes minimize I/O.

    <Blue, <rowid>, 1000100100010010100>
    <Green, <rowid>, 0001010000100100000>
    <Red, <rowid>, 0100000011000001001>
    <Orange, <rowid>, 0010001000001000010>
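
The bitmap index entries listed earlier map each distinct value to a string of bits, one bit per row. A minimal sketch of that idea uses Python integers as bitmaps. The column values below are hypothetical (the size column in particular is invented for the example), and this models only the set-based logic, not a database's on-disk format.

```python
def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set when row i holds it."""
    index = {}
    for row_id, value in enumerate(values):
        index[value] = index.get(value, 0) | (1 << row_id)
    return index


def matching_rows(bitmap):
    """Expand a bitmap into the sorted list of row ids whose bit is set."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Hypothetical column data for an eight-row table.
COLORS = ["Blue", "Red", "Orange", "Green", "Blue", "Green", "Orange", "Blue"]
SIZES = ["S", "L", "S", "L", "L", "S", "S", "S"]

color_index = build_bitmap_index(COLORS)
size_index = build_bitmap_index(SIZES)

# WHERE color IN ('Blue', 'Green') AND size = 'S'
# OR combines the two color bitmaps; AND intersects with the size bitmap.
result = (color_index["Blue"] | color_index["Green"]) & size_index["S"]

if __name__ == "__main__":
    print(matching_rows(result))
```

The predicate is answered with two bitwise operations on the indexes; only the rows whose bits survive need to be fetched from the base table.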

Query rewrite:
- Reduces I/O requests by employing materialized views with precomputed aggregates and joins
- Transforms a SQL statement expressed in terms of tables or views into a statement accessing materialized views based on the detail tables
- Is transparent to the end user or application
- Is implemented using materialized views that can be added or dropped like indexes without invalidating your SQL statements

I/O Scalability

Parallel execution:
- Reduces response time for data-intensive operations on large databases
- Benefits systems with the following characteristics:
  - Multiprocessors, clusters, or massively parallel systems
  - Sufficient I/O bandwidth
  - Sufficient memory to support memory-intensive processes such as sorts, hashing, and I/O buffers

[Figure: parallel execution. A coordinator dispatches work to query servers: four scanners read the data on disk and four sorters (aggregators), labeled Sort Q1 through Sort Q4, process the results.]

Automatic Storage Management (ASM):
- Configuring storage for a DB depends on many variables:
  - Which data to put on which disk
  - Logical unit number (LUN) configurations
  - DB types and workloads: data warehouse, OLTP, DSS
  - Trade-offs between available options
- ASM provides solutions to storage issues encountered in data warehouses.

Automatic Storage Management: Overview
- Portable and high-performance cluster file system
- Manages Oracle database files
- Data spread across disks to balance load
- Integrated mirroring across disks
- Solves many storage management challenges

[Figure: storage stack with layers labeled Application, Database, File system, Volume manager, Operating system, and ASM.]

ASM benefits:
- Stripes files rather than logical volumes
- Online disk reconfiguration and dynamic rebalancing
- Provides redundancy on a file basis
- Automatic database file management
- Enterprise Manager (EM)-based graphical management interface
- Hot spots and manual I/O tuning eliminated

Real Application Clusters:
- Real Application Clusters (RAC) provides linear scalability and availability for data warehouses.
- RAC provides redundancy so that if a node goes down, the other nodes will continue to execute.
- RAC nodes can share all work equally or perform dedicated tasks such as ETL or query processing.

Typical Data Warehouse Cluster

[Figure: four nodes, each with four 2 GHz CPUs, linked by 1 Gigabit Ethernet interconnects and connected through two 16-port switches to sixteen storage arrays, each with 10-20 disks.]

Parallel Execution with RAC

- Execution slaves have node affinity with the execution coordinator, but will expand to other nodes if needed.

[Figure: Nodes 1 through 4 attached to shared disks, with an execution coordinator and parallel execution servers.]

Summary

In this lesson, you should have learned how to:
- Differentiate OLTP and data warehousing design techniques
- Describe effective data warehouse design
- Identify data warehousing schemas
- Explain implementation models
- List data warehousing objects