Vous êtes sur la page 1sur 6

Certification

Database
Data Modeling
Data Processing
Analysis

Login
Community
Discuss

Incremental Loading for Fact Tables


Written by DWBIConcepts Team
Last Updated: 18 June 2014
In the previous articles, we have discussed the general concepts of incremental data
loading as well as how to perform incremental data loading for dimension tables. In this article
we will discuss the methods and issues of loading data incrementally in Fact tables of a data
warehouse.

METHOD OF LOADING
Generally speaking, incremental loading for Fact tables is relatively easier as, unlike
dimension tables, here you do not need to perform any look-up on your target table to find
out if the source record already exists in the target or not. All you need to do is to select
incremental records from source (as shown below for the case of "sales" table) and load them
as it is to target (you may need to perform lookup to dimensional tables to assign respective
surrogate keys - but that's a different story).
Like before we will assume we have a "sales" table in the source
Sales Table

ID CustomerID ProductDescription Qty Revenue Sales Date


1 1 White sheet (A4) 100 4.00 22-Mar-2012
2 1 James Clip (Box) 1 2.50 22-Mar-2012
3 2 Whiteboard Marker 1 2.00 22-Mar-2012
4 3 Letter Envelop 200 75.00 23-Mar-2012
5 1 Paper Clip 12 4.00 23-Mar-2012

Given this table, a typical extraction query will be like this:


SELECT t.*
FROM Sales t
WHERE t.sales_date > (select nvl(
max(b.loaded_until),
to_date('01-01-1900', 'MM-DD-YYYY')
)
from batch b
where b.status = 'Success');

where "batch" is a separate table maintained at target system having minimal structure and
data like below

Batch_ID Loaded_Until Status


1 22-Mar-2012 Success
2 23-Mar-2012 Success

However, things may get pretty complicated if your fact is a special type of fact called
"snapshot fact". Let's understand them below.

Loading Incident Fact


Incident fact is the normal fact that we encounter mostly (and that we have seen above in
our sales table example). Records in these types of facts are only loaded if there are
transactions coming from the source. For example, if at all there is one sale that happens in
the source system, then only a new sales record will come. They are dependent on some real
"incident" to happen in the source hence the name incident fact.

Loading Snapshot Fact


As opposed to incident fact, snapshot facts are loaded even if there is no real business incident
in the source. Let me show you what I mean by using the above example of customer and
sales tables in OLTP. Let's say I want to build a fact that would show me total revenue of sales
from each customer for each day. In effect, I want to see the below data in my fact table.
Sales fact table (This is what I want to see in my target fact table)

Date Customer Revenue


22-Mar-2012 John 6.50
22-Mar-2012 Ryan 2.00
23-Mar-2012 John 10.50
23-Mar-2012 Ryan 2.00
23-Mar-2012 Bakers' 75.00

As you see, even if no sales was made to Ryan on 23-Mar, we still show him here with the
old data. Similarly for John, even if goods totaling to $4.00 was sold to him on 23-Mar, his
record shows the cumulative total of $10.50.
Now obviously the next logical question is how to load this fact using incremental loading?
Because incremental loading only brings in incremental data - that is on 23rd March, we will
only have Bakers' and John's records and that too with that day's sales figures. We won't
have Ryan record in the incremental set.

Why not a full load

You can obviously opt-in for full load mechanism as that would solve this problem but that
would take the toll on your loading performance.

Then what's the solution?

One way to resolve this issue is: creating 2 incremental channels of loading for the fact. 1
channel will bring in incremental data from source and the other channel will bring in
incremental data from the target fact itself. Let's see how does it happen below. We will take
the example for loading 23-Mar data.

Channel 1: Incremental data from source

Customer Revenue
John 4.00
Bakers' 75.00

Channel 2: Incremental data from target fact table (last day's record)

Customer Revenue
John 6.50
Ryan 2.00

Next we can perform a FULL OUTER JOIN between the above two sets to come to below result

John 10.50
Ryan 2.00
Bakers' 75.00
Prev

Next

Are you able to solve this?

Which one is not an SQL Join?

Half Outer Join

Full Outer Join

Left Outer Join

Inner Join
Submit

Popular
Top 20 SQL Interview Questions with Answers
Best Informatica Interview Questions & Answers
Top 50 Data Warehousing/Analytics Interview Questions and Answers
Top 50 DWBI Interview Questions with Answers - Part 2
Performance Considerations for Dimensional Modeling
Top 30 BusinessObjects interview questions (BO) with Answers

Also Read
OLTP and OLAP
What is Data Warehousing?
What is a data warehouse - A 101 guide to modern data warehousing
A road-map on Testing in Data Warehouse
Top 5 Challenges of Data Warehousing

Have a question on this subject?


Ask questions to our expert community members and clear your doubts. Asking question or
engaging in technical discussion is both easy and rewarding.
Ask a Question, we'll Answer

Are you on Twitter?


Start following us. This way we will always keep you updated with what's happening in Data
Analytics community. We won't spam you. Promise.

How to find out Expected Time of Completion for an Oracle Query


Too often we become impatient when Oracle Query executed by us does not seem to
return any result. But Oracle (10g onwards) gives us an option to check how long a
query will run, that is, to find out expected time of completion for a query.

Install Hive in Client Node of Hadoop Cluster

In the previous article, we have shown how to setup a client node. Once this is done,
now let's put Hadoop to use for some big data analytics purpose. One way to do that is
by using Hive which let's us run SQL queries against the big data. A...

The 101 Guide to Dimensional Data Modeling

In this multi part tutorial we will learn the basics of dimensional modeling and we will
see how to use this modeling technique in real life scenario. At the end of this tutorial
you will become a confident dimensional data modeler.

Understanding Oracle QUERY PLAN - A 10 minutes guide

Confused about how to understand Oracle Query Execution Plan? This 10 minutes step
by step primer is the first of a two part article that will teach you exactly the things you
must know about Query Plan.

Exception Handling While Reading Multiple XML Files in Data Services

This article will demonstrate loading multiple XML files using SAP Data Services
including Exception Handling.

About Us
Data Warehousing and Business Intelligence Organization - Advancing Business
Intelligence

DWBI.org is a professional institution created and endorsed by veteran BI and Data Analytics
professionals for the advancement of data-driven intelligence
Join Us | Submit an article | Contact Us
Copyright

Except where otherwise noted, contents of DWBI.ORGby Intellip LLP is licensed under a
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Privacy Policy | Terms of Use
Get in touch
Security

Vous aimerez peut-être aussi