Scaling To Infinity

Partitioning Data Warehouses on Oracle Database
Session UGF-3587
Sunday 28-Sep 2014
Tim Gorman - www.Delphix.com
Oracle Open World 2014

Agenda

Stating the problem
Brief overview of star schemas
Brief overview of star transformations
The virtuous cycle
The death spiral
EXCHANGE PARTITION load technique in detail
More EXCHANGE PARTITION ideas

A short story

Background: DBA support for about 70 data warehouse databases at a large telecommunications company
o Emails about running out of space in the TEMP tablespace, scraped out of the alert.log file in a QA/TEST database
  Offshore DBAs already reacting by killing sessions
o QA/TEST database already has 56G temp, and PROD has almost 300G temp
o Reviewing AWR reports reveals a parallel aggregated query against a view comprising a star schema between a fact table and 33 dimension tables
  Further analysis reveals that none of the dimension-key columns on the fact table is supported by a bitmap index
  Thus, the star join consists of a partition-pruned FULL table scan on the fact, followed by 33 HASH joins in parallel to the dimensions

A short story

When asked about the removed bitmap indexes
o An Informatica ETL developer commented
  "I don't know why all those indexes were removed"
It was because the bitmap indexes made ETL data loading into the fact table utterly impossible
o Project manager confirmed this, adding
  "Once the indexes were dropped, everything worked great"
o DW architect added
  "This setup has been running in PROD for 10 months!"
o In other words, everything is OK! Stop causing trouble!
o When asked how the end-users felt about performance
  "Not good at all... bring on Netezza, Exadata, etc."

Why Star Schemas?


BI analysts just want a big spreadsheet
o Lots and lots of attribute and measure columns
  Attributes characterize the data
  Measures are usually additive and numeric
Dimensional data model is really just that spreadsheet
o Normalized to recursive depth of one
  More sophisticated models might normalize further ("snowflake")
  Normalized entities are dimension tables
  o Columns are primary key and attribute columns
  Original spreadsheet is the fact table
  o Columns are foreign keys to dimensions, plus measures
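
As a rough illustration of the shape described above, a minimal star schema could be declared as follows. This is only a sketch: CUSTOMER_DIM, ORDER_FACT, and all of their columns are hypothetical names chosen for the example, not objects from the presentation.

```sql
-- Hypothetical dimension table: a primary key plus descriptive attributes.
CREATE TABLE customer_dim (
    customer_key   NUMBER         PRIMARY KEY,   -- dimension primary key
    customer_name  VARCHAR2(100),                -- attribute
    region         VARCHAR2(30)                  -- attribute
);

-- Hypothetical fact table: foreign keys to dimensions plus numeric measures.
CREATE TABLE order_fact (
    order_date     DATE           NOT NULL,
    customer_key   NUMBER         NOT NULL
                   REFERENCES customer_dim (customer_key),  -- dimension key
    quantity       NUMBER,                       -- additive measure
    amount         NUMBER(12,2)                  -- additive measure
);
```

A real warehouse would have one such dimension table per normalized entity, with the fact table carrying one key column for each.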

Why Star Schemas?


[Figure: transactional/operational entity-relational modeling (Customers, Suppliers, Orders, Products, Order Lines) compared with dimensional modeling (Order Facts with Customers Dim, Suppliers Dim, Products Dim, and Time Dim)]

Why Star Transformations?

With the other join methods (NL, SM, HA), for comparison with star transformation:
Filter result set in one of the dimension tables
Join from that dimension table to the fact along a low-cardinality dimension key
Join back from fact to other dimensions using dimension PK
o Filtering rows retrieved along the way
[Figure: Dim Table1, Dim Table2, Dim Table3, Dim Table4, and the Fact table]

Why Star Transformations?

Star transformation:
Filter on query set in each dimension
Merge result set from all dimensions
Join to the fact from merged result set, using BITMAP MERGE
[Figure: Dim Table1, Dim Table2, Dim Table3, Dim Table4]

Why Star Transformations?

Point: Single-column bitmap indexes on dimension-key columns are required for star transformation
Counter-point: Bitmap indexes become impossible to load/maintain when data volume increases past dozens of GB
Catch-22? Does this mean that Oracle cannot handle large data warehouses?
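
To make the "point" concrete, the sketch below shows single-column bitmap indexes on the dimension-key columns of a fact table, plus the session setting that lets the optimizer consider star transformation. SALES_FACT, its columns, its partitions, and the index names are all hypothetical.

```sql
-- Hypothetical fact table, range-partitioned by date (all names are assumptions).
CREATE TABLE sales_fact (
    sale_date     DATE    NOT NULL,
    customer_key  NUMBER  NOT NULL,
    product_key   NUMBER  NOT NULL,
    amount        NUMBER(12,2)
)
PARTITION BY RANGE (sale_date) (
    PARTITION p201402 VALUES LESS THAN (DATE '2014-03-01'),
    PARTITION p201403 VALUES LESS THAN (DATE '2014-04-01')
);

-- One single-column bitmap index per dimension-key column.
-- Bitmap indexes on a partitioned table must be LOCAL.
CREATE BITMAP INDEX sales_fact_cust_bix ON sales_fact (customer_key) LOCAL;
CREATE BITMAP INDEX sales_fact_prod_bix ON sales_fact (product_key) LOCAL;

-- Let the optimizer consider star transformation in this session.
ALTER SESSION SET star_transformation_enabled = TRUE;
```

Because the indexes are LOCAL, each partition carries its own bitmap index segments, which is what later makes it possible to build them on a separate table and swap them in.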

The Virtuous Cycle

Non-volatile time-variant data implies
o Data warehouses are INSERT only
Insert-only data warehouses imply
o Tables and indexes range-partitioned by a DATE column
Tables range-partitioned by DATE enable
o Data loading using EXCHANGE PARTITION load technique
o Incremental statistics gathering and summarization
Data loading using EXCHANGE PARTITION enables
o Direct-path (a.k.a. append) inserts
o Partitions organized into time-variant tablespaces
o Data purging using DROP/TRUNCATE PARTITION instead of DELETE
o Bitmap indexes and bitmap-join indexes
o Elimination of ETL load window and 24x7 availability for queries

The Virtuous Cycle

Direct-path (a.k.a. append) inserts enable
o Load more data, faster, more efficiently
o Optional NOLOGGING on inserts
o Basic table compression (9i and above)
o Elimination of contention in the Oracle Buffer Cache during data loading
Optional NOLOGGING inserts enable
o Option to generate less redo during data loads
Basic table compression enables
o Less space consumed for tables and indexes
o Fewer I/O operations during queries
Partitions organized into time-variant tablespaces enable
o READ ONLY tablespaces for older, less-volatile data
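
The options listed above might be combined roughly as follows. This is a sketch only: LOAD_DEMO, STAGE_DEMO, and tablespace TS_2014_01 are hypothetical names, and STAGE_DEMO is assumed to exist already with matching columns.

```sql
-- Hypothetical target table using basic compression and NOLOGGING.
CREATE TABLE load_demo (
    txn_date  DATE,
    acctkey   NUMBER,
    amount    NUMBER(12,2)
)
COMPRESS      -- basic table compression, applied during direct-path loads
NOLOGGING;    -- lets direct-path inserts generate less redo

-- The APPEND hint requests a direct-path insert, which formats new blocks
-- above the high-water mark instead of working through the buffer cache.
INSERT /*+ append */ INTO load_demo
SELECT txn_date, acctkey, amount
FROM   stage_demo;

-- The loading session cannot query the table again until it commits.
COMMIT;

-- Once the data in a time-variant tablespace stops changing,
-- the tablespace can be made read-only.
ALTER TABLESPACE ts_2014_01 READ ONLY;
```

Data loaded with NOLOGGING cannot be recovered from redo, so a backup of the affected datafiles after the load is the usual precaution.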

The Virtuous Cycle

READ ONLY tablespaces for older, less-volatile data enable
o Tiered storage
o Backup efficiencies
Data purging using DROP/TRUNCATE PARTITION enables
o Faster, more efficient data purging than using DELETE statements
Bitmap indexes enable
o Star transformations
Star transformations enable
o Optimal query-execution plan for dimensional data models
o Bitmap-join indexes
Bitmap-join indexes enable
o Further optimization of star transformations
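
For the purging point above, partition-level purging could look like the sketch below. FACT_DEMO and the partition names are hypothetical; the table is assumed to be range-partitioned by date with one partition per day.

```sql
-- Remove an aged-out partition entirely, instead of deleting its rows.
ALTER TABLE fact_demo DROP PARTITION p20130101 UPDATE GLOBAL INDEXES;

-- Or keep the partition definition but discard all of its rows.
ALTER TABLE fact_demo TRUNCATE PARTITION p20130102 UPDATE GLOBAL INDEXES;
```

The UPDATE GLOBAL INDEXES clause matters only if the table has global indexes; with purely local indexes it can be left off.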

The Death Spiral

ETL using conventional-path INSERT, UPDATE, and DELETE operations
Conventional-path operations are trouble with:
o High-volume data loads
o Contention in Shared Pool, Buffer Cache, global structures
  Mixing of queries and loads simultaneously on table and indexes
  Periodic rebuilds/reorgs of tables if deletions occur
  Full redo and undo generation for all inserts, updates, and deletes
o Bitmap indexes and bitmap-join indexes
  Modifying bitmap indexes is slow
  Prone to locking issues in concurrency situations
ETL will dominate the workload in the database
o Queries will consist mainly of "dumps" or extracts to downstream systems
o Query performance will be abysmal and worsening

The Death Spiral

Without partitioning and insert-only (insert mostly?)
o Query performance worsens as tables/indexes grow larger
o ETL performance worsens as
  Loads must be performed into "live" tables
  o Users must be locked out during load cycle
  o In-flight queries must be killed during load cycle
  o Bitmap indexes must be dropped/rebuilt during load cycle
  o Entire tables must be re-analyzed during load cycle
o Entire database must be backed up frequently
  No chance of setting tablespaces to READ ONLY
  Data cannot be "right-sized" to storage options according to IOPS
Everything just gets harder and harder to do
o ...and that stupid Oracle software is to blame
o Bring on Exadata or Netezza or <expensive-flavor-of-the-month>

Exchange Partition

The technique of bulk-loading new data into a temporary "swap" table, which is then published using the EXCHANGE PARTITION operation, should be the default load technique for all large tables in a data warehouse
o fact tables
o slowly-changing dimensions
Assumptions for this upcoming example:
o Composite partitioned fact table named TXN
  Range partitioned on DATE column TXN_DATE
  Hash sub-partitioned on NUMBER column ACCTKEY
  Data to be loaded into partition P20140225 on TXN
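
A table matching these assumptions might be declared as in the sketch below. Only the names TXN, TXN_DATE, ACCTKEY, and P20140225 come from the slides; the AMOUNT column, the subpartition count, and the other partition names are assumptions made for illustration.

```sql
-- Composite (range-hash) partitioned fact table, one range partition per day.
CREATE TABLE txn (
    txn_date  DATE    NOT NULL,
    acctkey   NUMBER  NOT NULL,
    amount    NUMBER(12,2)          -- hypothetical measure column
)
PARTITION BY RANGE (txn_date)
SUBPARTITION BY HASH (acctkey) SUBPARTITIONS 4   -- count is an assumption
(
    PARTITION p20140222 VALUES LESS THAN (DATE '2014-02-23'),
    PARTITION p20140223 VALUES LESS THAN (DATE '2014-02-24'),
    PARTITION p20140224 VALUES LESS THAN (DATE '2014-02-25'),
    PARTITION p20140225 VALUES LESS THAN (DATE '2014-02-26')
);
```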

[Figure, step 1: hash-partitioned table TXN_SWAP is created beside composite-partitioned table TXN (daily partitions 22-Feb-2014 through 25-Feb-2014) using CREATE TABLE TXN_SWAP AS SELECT * FROM TXN PARTITION (P20140225)]
[Figure, step 2: Load into hash-partitioned table TXN_SWAP]
[Figure, step 3: CREATE INDEX on hash-partitioned table TXN_SWAP]
[Figure, step 4: EXCHANGE PARTITION between TXN_SWAP and the 25-Feb-2014 partition of TXN]
[Figure, step 5: Gather partition statistics for table, columns, indexes]

Exchange Partition
1. Create temporary table TXN_SWAP as a hash-partitioned table
2. Perform parallel, direct-path load of new data into TXN_SWAP
   Perform any other DML needed to prepare data in TXN_SWAP for publishing into the TXN table
3. Create indexes on TXN_SWAP corresponding to the local indexes on TXN
4. Exchange partition to publish new data to TXN
   alter table TXN
     exchange partition P20140225 with table TXN_SWAP
     including indexes update global indexes;
5. Gather CBO statistics on table TXN partition P20140225
   DBMS_STATS preference INCREMENTAL will gather partition-level statistics as well as updating global-level statistics
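
Put together, the five steps might look like the following sketch. It assumes that TXN is range-partitioned on TXN_DATE and hash-subpartitioned on ACCTKEY into 4 subpartitions, that TXN has one local bitmap index on ACCTKEY, and that new rows arrive in a staging table STAGE_TXN_FACT with the same columns. The subpartition count and the index name are assumptions, not details from the slides.

```sql
-- 1. Swap table, hash-partitioned to match the subpartitions of TXN,
--    starting with whatever rows the target partition already holds.
CREATE TABLE txn_swap
PARTITION BY HASH (acctkey) PARTITIONS 4
AS SELECT * FROM txn PARTITION (p20140225);

-- 2. Parallel, direct-path load of the new data.
ALTER SESSION ENABLE PARALLEL DML;
INSERT /*+ append parallel(n,4) */ INTO txn_swap n
SELECT /*+ parallel(x,4) */ * FROM stage_txn_fact x;
COMMIT;

-- 3. Indexes corresponding to the local indexes on TXN.
CREATE BITMAP INDEX txn_swap_acctkey_bix ON txn_swap (acctkey) LOCAL;

-- 4. Publish: exchange TXN_SWAP with partition P20140225.
ALTER TABLE txn
  EXCHANGE PARTITION p20140225 WITH TABLE txn_swap
  INCLUDING INDEXES UPDATE GLOBAL INDEXES;

-- 5. Gather statistics for the loaded partition. With the INCREMENTAL
--    preference set, global statistics are maintained as well; the exact
--    behaviour depends on the database version and other DBMS_STATS settings.
BEGIN
  DBMS_STATS.SET_TABLE_PREFS(USER, 'TXN', 'INCREMENTAL', 'TRUE');
  DBMS_STATS.GATHER_TABLE_STATS(ownname  => USER,
                                tabname  => 'TXN',
                                partname => 'P20140225');
END;
/
```

Queries against TXN keep running against the old partition contents until step 4, which is a data-dictionary change rather than a movement of rows.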

Exchange Partition

It is a good idea to encapsulate this logic inside PL/SQL packaged or stored procedures:
SQL> execute exchpart.prepare('TXN', '_SWAP', '25-FEB-2014');
SQL> alter session enable parallel dml;
SQL> insert /*+ append parallel(n,4) */
     into txn_swap n
     select /*+ full(x) parallel(x,4) */ *
     from stage_txn_fact x;
SQL> commit;
SQL> execute exchpart.finish('TXN', '_SWAP');
EXCHPART package (exchpart.sql) posted at http://www.EvDBT.com/scripts

The dribble effect


In real life, data loading is often much messier than shown here
o For example, for a daily load frequency, data to be loaded during the 25-Feb load cycle might consist of:
  950,000 rows for the 25-Feb partition
  4,500 rows for the 24-Feb partition
  400 rows for the 23-Feb partition
  700 rows for the 22-Feb partition
  200 rows for the 21-Feb partition
  100 rows for the 20-Feb partition
  and 3 rows for the 07-Jan partition

The dribble effect


How can this be handled?
o One suggestion:
  1. Use EXCHPART package to load the data for the 25-Feb and 24-Feb partitions
  2. Load the data to the remainder of the partitions by just inserting (conventional-path) directly into the partitioned table
o Must determine a threshold when to use EXCHPART, and when to simply insert rows
  Data volume is the metric
  Threshold value of N rows varies due to many factors
  o Number of bitmap indexes on partitioned table?
  o Using compression during load or not?
  o Degree of parallelism?
  o Are there any global indexes?

The dribble effect


Example: Use EXCHANGE PARTITION when rows-to-be-loaded > 1000, else just use conventional INSERT
for d in (select trunc(txn_dt) dt, count(*) cnt from EXT_STAGE_TXN
          group by trunc(txn_dt)) loop
  if d.cnt > 1000 then
    -- large volume: load a swap table, then publish it with EXCHANGE PARTITION
    exchpart.prepare('TXN', '_SWAP'||to_char(d.dt,'YYYYMMDD'), d.dt);
    -- swap-table name shown literally, as for the 24-Feb iteration
    insert /*+ append parallel(n,16) */ into TXN_20140224 n
    select /*+ parallel(x,16) */ * from EXT_STAGE x
    where x.txn_dt >= d.dt and x.txn_dt < d.dt + 1;
    exchpart.finish('TXN', 'TXN_'||to_char(d.dt,'YYYYMMDD'));
    exchpart.drop_indexes('TXN_'||to_char(d.dt,'YYYYMMDD'));
  else
    -- small volume: conventional-path insert directly into the partitioned table
    insert into TXN
    select * from ext_stage
    where txn_dt >= d.dt and txn_dt < d.dt + 1;
  end if;
end loop;

Slowly-changing dimensions
Loading time-variant fact and dimension tables is not the only load activity in most data warehouses
o Often, some tables contain current or point-in-time data
  Example: type-1 dimension derived from type-2 dimension
With each load cycle loading new data into a new partition of the type-2 dimension, the type-1 dimension needs to be updated
o Instead of performing transactional MERGE (i.e. "Update or Insert") logic directly on the table
  Rebuild the table into a temporary table, then "swap" it in using EXCHANGE PARTITION

Merge logic to update SCDs
The MERGE statement to update the type-1 dimension (CURR_ACCT_DIM) from the just-loaded partition of the type-2 dimension (ACCT_DIM) could look like
merge into curr_acct_dim c
using (select * from acct_dim
       where eff_dt >= '25-FEB-2014'
       and   eff_dt <  '26-FEB-2014') a
on (c.acctkey = a.acctkey)
when matched then update set ...
when not matched then insert ...;
Simple to write, but sloooowwwwwwwww


ExchPart instead of MERGE
[Figure: CURR_ACCT_DIM (type-1 dimension) with data current as of 24-Feb; ACCT_DIM (type-2 dimension) with daily partitions 22-Feb-2014 through 25-Feb-2014, including the just-loaded 25-Feb data; ACCT_DIM_SWAP, truncated as we begin]

ExchPart instead of MERGE
INSERT /*+ append parallel(t,8) */ INTO TMP_CURR_ACCOUNT_DIM t
SELECT /*+ parallel(x,8) */ (list of columns)
FROM (SELECT /*+ parallel(y,8) */ (list of columns),
             ROW_NUMBER() over (PARTITION BY acctkey
                                ORDER BY effdt desc) rn
      FROM (SELECT /*+ parallel(z1,8) */ (list of columns)
            FROM CURR_ACCOUNT_DIM z1
            UNION ALL
            SELECT /*+ parallel(z2,8) */ (list of columns)
            FROM ACCOUNT_DIM partition (P20140225) z2) y) x
WHERE x.RN = 1;
1. Inner-most query pulls changed data from type-2, merged with existing data from type-1
2. Middle query ranks within ACCTKEY values, sorted by EFFDT DESC
3. Outer-most query selects only latest row for each ACCTKEY and passes to INSERT

ExchPart instead of MERGE
[Figure: all rows of CURR_ACCT_DIM (type-1 dimension) and all rows of the 25-Feb-2014 partition of ACCT_DIM (type-2 dimension) are combined with UNION ALL, filtered, and loaded into ACCT_DIM_SWAP]
[Figure: CREATE INDEX on ACCT_DIM_SWAP]
[Figure: EXCHANGE PARTITION between ACCT_DIM_SWAP and CURR_ACCT_DIM]
[Figure: Gather partition statistics for table, columns, indexes]
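
The publishing steps for the type-1 dimension could be sketched as below. For EXCHANGE PARTITION to replace the whole table, this sketch assumes that CURR_ACCT_DIM is declared as a partitioned table with a single catch-all partition. The column list, the partition name PMAX, and the index names are all assumptions; the slides do not show this DDL.

```sql
-- Type-1 dimension with exactly one partition, so that its entire
-- contents can be replaced by a single EXCHANGE PARTITION.
CREATE TABLE curr_acct_dim (
    acctkey    NUMBER         NOT NULL,
    effdt      DATE           NOT NULL,
    acct_name  VARCHAR2(100)              -- hypothetical attribute
)
PARTITION BY RANGE (acctkey)
( PARTITION pmax VALUES LESS THAN (MAXVALUE) );

CREATE UNIQUE INDEX curr_acct_dim_uk ON curr_acct_dim (acctkey) LOCAL;

-- Non-partitioned swap table with the same column layout.
CREATE TABLE acct_dim_swap (
    acctkey    NUMBER         NOT NULL,
    effdt      DATE           NOT NULL,
    acct_name  VARCHAR2(100)
);

-- (Load ACCT_DIM_SWAP here with the latest row per ACCTKEY, using the
--  direct-path INSERT ... SELECT with ROW_NUMBER() described earlier.)

-- Index the swap table to match the local index on CURR_ACCT_DIM.
CREATE UNIQUE INDEX acct_dim_swap_uk ON acct_dim_swap (acctkey);

-- Publish the rebuilt dimension in one dictionary operation.
ALTER TABLE curr_acct_dim
  EXCHANGE PARTITION pmax WITH TABLE acct_dim_swap
  INCLUDING INDEXES;

-- Refresh optimizer statistics on the published table.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'CURR_ACCT_DIM');
END;
/
```

After the exchange, ACCT_DIM_SWAP holds the previous contents of the dimension, which is why the cycle in the slides begins by truncating it.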

Summary
1. Data warehouses use star schemas
2. Star schemas are best queried using star transformations
3. Star transformation requires bitmap indexes
   o Bitmap indexes or bitmap-join indexes
4. Large bitmap indexes become infeasible without using the EXCHANGE PARTITION load technique
5. INSERT is always faster than UPDATE, DELETE, or MERGE
Data loading using EXCHANGE PARTITION is the key to unlock infinite scalability

Q&A

Session: UGF-3587
Email: Tim.Gorman@Delphix.com
Twitter: @TimothyJGorman
Blog: EvDBT.com
o Papers: EvDBT.com/papers
o Scripts: EvDBT.com/scripts
o Videos: EvDBT.com/videos
including this presentation!