It is common practice in an enterprise data management system for the data warehouse to
contain static data, while the operational data store contains dynamic data that is frequently
updated during the course of business operations.
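To make the static/dynamic distinction concrete, here is a minimal Python sketch under
simplifying assumptions: an in-memory dictionary stands in for the operational data store and a
list stands in for the data warehouse, and the names `ods_customers`, `dw_customer_history`,
`ods_update` and `dw_load` are hypothetical. The ODS keeps only the current value of each record
and overwrites it, while the warehouse only receives appended, dated copies that are never changed
afterwards.

```python
from datetime import date

# Hypothetical operational data store: one current row per customer,
# overwritten in place whenever the business updates it (dynamic data).
ods_customers: dict[int, dict] = {}

# Hypothetical data warehouse table: rows are appended at load time and
# never modified afterwards (static data), so earlier states stay queryable.
dw_customer_history: list[dict] = []


def ods_update(customer_id: int, city: str) -> None:
    """Overwrite the customer's current record in the ODS."""
    ods_customers[customer_id] = {"customer_id": customer_id, "city": city}


def dw_load(snapshot_date: date) -> None:
    """Append a dated copy of every current ODS row to the warehouse."""
    for row in ods_customers.values():
        dw_customer_history.append({**row, "snapshot_date": snapshot_date})


ods_update(1, "Lyon")
dw_load(date(2024, 1, 31))
ods_update(1, "Paris")  # the ODS now only knows the new city
dw_load(date(2024, 2, 29))

print(ods_customers[1]["city"])                   # Paris
print([r["city"] for r in dw_customer_history])   # ['Lyon', 'Paris']
```

Real systems would of course use database tables rather than Python structures; the sketch only
shows the overwrite-versus-append pattern.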
To illustrate this further, it is important to know that an enterprise data management system
environment may include many servers and database systems that constitute various data
stores, and these servers may run on different platforms and use database management systems
from different vendors.
Each data store gathers data based on the department it serves or on another special function
it is designed to perform. Throughout business operations, these servers send their data to the
operational data store, which acts as the unifying area where disparate data from the various
data stores is extracted and transformed into a unified structure based on the enterprise data
architecture.
The process of unifying disparate data is referred to as ETL, which stands for extract, transform
and load. The extract and transform steps are mostly done in the operational data store before the
transformed data is "loaded" into the data warehouse. Because the data warehouse only handles
the loading part in this picture, many people get the impression that the data warehouse is a
mere static repository that does little except accept data for storage.
In fact, the concept of a data warehouse is borrowed from real-life warehouses, where goods are
stored before the need for them arises. Likewise with data: the operational data store goes to the
data warehouse to retrieve the data and processes it in the operational data store area. Hence
the term "operational", because it refers to the data currently being operated on or manipulated.
But modern data warehouses are no longer as static as they seem. Today's data warehouses are
managed by software tools that allow the data warehouse itself to track data and perform all
sorts of analysis related to the movement of data from the warehouse to other data stores and
back.
Many data warehouses employ a technology known as Online Analytical Processing (OLAP),
which helps answer multidimensional analytical queries. Most areas of business, including
sales and marketing reporting, management reporting, business process management (BPM),
budgeting and forecasting, and financial reporting, use OLAP to retrieve information from the
data warehouse so that the company can spot trends and patterns as a basis for corporate
decisions.
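A multidimensional query of the kind OLAP answers can be illustrated with a small roll-up. The
sketch below is plain Python, not an OLAP engine: the sales rows, the dimensions (region, product,
quarter) and the helper names `roll_up` and `slice_cube` are all hypothetical. It sums a measure
over chosen dimensions and restricts the data to one value of a dimension, two basic OLAP
operations.

```python
from collections import defaultdict

# Hypothetical sales fact rows: three dimensions (region, product, quarter)
# and one measure (amount).
sales = [
    {"region": "North", "product": "A", "quarter": "Q1", "amount": 100},
    {"region": "North", "product": "B", "quarter": "Q1", "amount": 50},
    {"region": "South", "product": "A", "quarter": "Q1", "amount": 70},
    {"region": "North", "product": "A", "quarter": "Q2", "amount": 120},
    {"region": "South", "product": "B", "quarter": "Q2", "amount": 90},
]


def roll_up(rows, dimensions):
    """Sum the 'amount' measure for every combination of the given dimensions."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row["amount"]
    return dict(totals)


def slice_cube(rows, **fixed):
    """Keep only rows whose dimension values match the fixed ones (an OLAP 'slice')."""
    return [r for r in rows if all(r[d] == v for d, v in fixed.items())]


# Sales by region and quarter: a two-dimensional view of the cube.
print(roll_up(sales, ["region", "quarter"]))
# {('North', 'Q1'): 150, ('South', 'Q1'): 70, ('North', 'Q2'): 120, ('South', 'Q2'): 90}

# Product totals for the North region only.
print(roll_up(slice_cube(sales, region="North"), ["product"]))
# {('A',): 220, ('B',): 50}
```

Rolling up over an empty list of dimensions gives the grand total, which is how such a view is
typically aggregated from the most detailed level to the least detailed one.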
There are many companies that specifically offer data warehousing software solutions with
sophisticated, proprietary, intuitive functions. Many of these vendors even offer integrated
solutions that combine data warehousing functions with complex features such as data
transformation, management, analytics and delivery components.
Having an intuitive data warehouse greatly improves the overall performance of the enterprise
data management system, because the data warehouse can take over some of the load that would
otherwise fall on the operational data stores, which already handle very labor-intensive processes
from ongoing business operations.
The fundamental difference between operational systems and data warehousing systems is that
operational systems are designed to support transaction processing whereas data warehousing
systems are designed to support online analytical processing (or OLAP, for short).
Because of this fundamental difference, the data usage patterns associated with operational
systems are significantly different from those associated with data warehousing systems. As a
result, data warehousing systems are designed and optimized using methodologies that differ
drastically from those of operational systems.
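The difference in usage patterns can be sketched with Python's built-in `sqlite3` module. This
is only an illustration under assumptions: a single in-memory SQLite table (`orders`, a
hypothetical name with hypothetical columns) stands in for both kinds of system, whereas in
practice the operational system and the warehouse are separate and optimized differently. The
first statement is typical of transaction processing (a short transaction that changes one row
by its key); the second is typical of analytical processing (a read that scans many rows and
aggregates them).

```python
import sqlite3

# In-memory database standing in for both systems (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, status TEXT, amount REAL)"
)
with conn:  # commits the inserts on success
    conn.executemany(
        "INSERT INTO orders (customer, status, amount) VALUES (?, ?, ?)",
        [("alice", "open", 20.0), ("bob", "open", 35.0), ("alice", "shipped", 15.0)],
    )

# Transaction-processing pattern: touch one row by its key inside a short transaction.
with conn:
    conn.execute("UPDATE orders SET status = ? WHERE id = ?", ("shipped", 2))

# Analytical pattern: read many rows and aggregate them.
summary = conn.execute(
    "SELECT customer, COUNT(*), SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
print(summary)  # [('alice', 2, 35.0), ('bob', 1, 35.0)]
```

Operational systems are tuned so that many small writes like the `UPDATE` stay fast and
consistent, while warehousing systems are tuned so that large read-and-aggregate queries like the
`SELECT ... GROUP BY` run efficiently.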
The table below summarizes many of the differences between operational systems and data
warehousing systems.
A comparison of operational systems and data warehousing systems
Disadvantages
There are also disadvantages to using a data warehouse. Some of them are:
• Data warehouses are not the optimal environment for unstructured data.
• Because data must be extracted, transformed and loaded into the warehouse, there is
an element of latency in data warehouse data.
• Over their life, data warehouses can have high costs.
• Data warehouses can get outdated relatively quickly. There is a cost of delivering
suboptimal information to the organization.
• There is often a fine line between data warehouses and operational systems. Duplicate,
expensive functionality may be developed. Or, functionality may be developed in the
data warehouse that, in retrospect, should have been developed in the operational
systems.
ETL Concepts
ETL stands for extraction, transformation, and loading. It refers to the methods involved in
accessing and manipulating source data and loading it into a target database.
The first step in the ETL process is mapping the data between the source systems and the target
database (data warehouse or data mart). The second step is cleansing the source data in a staging
area. The third step is transforming the cleansed source data and then loading it into the target
system.
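The three steps above can be sketched end to end in Python. This is a minimal illustration, not a
production ETL tool: the source field names, the `FIELD_MAP` mapping, the cleansing rules
(trimming whitespace, upper-casing country codes, dropping rows without an amount, removing
duplicates) and the `fact_sales` table are all hypothetical, and an in-memory SQLite database
stands in for the data warehouse.

```python
import sqlite3

# Step 1: mapping. Hypothetical source field names -> target warehouse columns.
FIELD_MAP = {"cust_nm": "customer_name", "amt": "amount", "ctry": "country"}

# Hypothetical raw extract from one source system.
source_rows = [
    {"cust_nm": "  Alice ", "amt": "20.50", "ctry": "fr"},
    {"cust_nm": "Bob", "amt": "", "ctry": "US"},          # missing amount
    {"cust_nm": "Alice", "amt": "20.50", "ctry": "FR"},   # duplicate once cleansed
    {"cust_nm": "Carol", "amt": "7", "ctry": "de"},
]


def map_fields(row):
    """Rename source fields to the target column names, dropping unmapped fields."""
    return {FIELD_MAP[k]: v for k, v in row.items() if k in FIELD_MAP}


def cleanse(staged):
    """Step 2: trim text, normalise country codes, drop incomplete rows and duplicates."""
    seen, clean = set(), []
    for row in staged:
        if not row["amount"].strip():
            continue  # hypothetical rule: rows without an amount are rejected
        cleaned = {
            "customer_name": row["customer_name"].strip(),
            "amount": row["amount"].strip(),
            "country": row["country"].strip().upper(),
        }
        key = tuple(cleaned.values())
        if key not in seen:
            seen.add(key)
            clean.append(cleaned)
    return clean


def transform(row):
    """Step 3a: convert values to the types of the target schema."""
    return (row["customer_name"], float(row["amount"]), row["country"])


def load(conn, rows):
    """Step 3b: load the transformed rows into the (hypothetical) target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_sales (customer_name TEXT, amount REAL, country TEXT)"
    )
    with conn:  # commit the whole batch, or nothing if an error occurs
        conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", rows)


staging = [map_fields(r) for r in source_rows]  # mapped copy held in a staging area
target = sqlite3.connect(":memory:")
load(target, [transform(r) for r in cleanse(staging)])
print(target.execute("SELECT * FROM fact_sales ORDER BY customer_name").fetchall())
# [('Alice', 20.5, 'FR'), ('Carol', 7.0, 'DE')]
```

Keeping the cleansing in a separate staging step means rejected or duplicate rows never reach the
target table, and the load itself can be committed as a single batch.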
• eBay has a 6.5 petabyte database running on Greenplum and a 2.5 petabyte
enterprise data warehouse running on Teradata
• Facebook has a 2.5 petabyte data warehouse running on Hadoop/Hive
• Walmart has a 2.5 petabyte warehouse, Bank of America has 1.5 petabytes and Dell has 1
petabyte, all running on Teradata
• Yahoo, Fox Interactive Media and TEOCO (which runs outsourced data warehouses for top US
telcos) are all in the hundreds of terabytes range