
What is a Data Warehouse?

A data warehouse is a repository of a business organization's historical data. It is a large part of
an enterprise data management system, which consists of several servers running on different
kinds of platforms and database management systems.

It is generally accepted that, in an enterprise data management system, the data warehouse
contains static data while the operational data store contains dynamic data that is frequently
updated during the course of business operations.

To illustrate this further, it is important to know that an enterprise data management system
environment may contain plenty of servers and database systems that constitute various data
stores, and these servers may run on varying platforms and database management systems
that come from different vendors.

Each data store gathers data based on the department it serves or on another special
function that it is designed to perform. During business operations, these servers
send their data to the operational data store, which acts as the unifying area where disparate
data from the various data stores are extracted and transformed into a unified structure based on
the enterprise data architecture.

The process of unifying disparate data is referred to as ETL, which stands for extract, transform
and load. The extract and transform steps are mostly done in the operational data store before the
transformed data is "loaded" into the data warehouse. With this picture, in which the data
warehouse only gets the loading part, many people get the impression that the data warehouse
is a mere static repository that does little except accept data for storage.
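
The flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two departmental stores, their field names, and the unified record layout are all hypothetical, and a plain list stands in for the warehouse.

```python
from datetime import datetime

# Hypothetical rows from two departmental data stores with different layouts.
billing_store = [{"cust": "C1", "amt_cents": 12550, "dt": "2024-01-15"}]
sales_store = [{"customer_id": "C2", "amount": "80.00", "date": "15/01/2024"}]


def from_billing(row):
    """Transform a billing row into the (assumed) unified structure."""
    return {
        "customer_id": row["cust"],
        "amount": row["amt_cents"] / 100,  # cents -> currency units
        "date": datetime.strptime(row["dt"], "%Y-%m-%d").date(),
    }


def from_sales(row):
    """Transform a sales row into the same unified structure."""
    return {
        "customer_id": row["customer_id"],
        "amount": float(row["amount"]),
        "date": datetime.strptime(row["date"], "%d/%m/%Y").date(),
    }


def run_etl(warehouse):
    # Extract and transform: the role the text assigns to the operational data store.
    unified = [from_billing(r) for r in billing_store]
    unified += [from_sales(r) for r in sales_store]
    # Load: the warehouse only receives rows that are already unified.
    warehouse.extend(unified)
    return warehouse


warehouse = run_etl([])
```

After the run, both rows share the same keys and date type, even though the source stores disagreed on names, units, and date formats.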

In fact, the concept of the data warehouse is taken from an analogy with real-life
warehouses, where goods are put before the need arises to get them. And so with data: the
operational data store goes to the data warehouse to get the data and processes it in the
operational data store area. Hence the term operational, because it refers to the data currently
being operated on or manipulated.

But modern data warehouses are no longer as static as they seem. Data warehouses
today are managed by software tools with functionality that allows
the data warehouse itself to track data and perform all sorts of analysis related to the movement
of data from the warehouse to the other data stores and back.

Many data warehouses employ a technology known as Online Analytical Processing (OLAP),
which helps provide answers to various multidimensional analytical queries. Most areas of
business, including business reporting for sales, marketing, management reporting, business
process management (BPM), budgeting and forecasting, and financial reporting, use OLAP for
retrieving information from the data warehouse so that the company can spot trends and
patterns as a basis for corporate decisions.
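
To make "multidimensional analytical query" concrete, here is a minimal sketch of a roll-up in plain Python. The fact rows, the dimension names (`region`, `product`, `year`), and the `roll_up` helper are assumptions made for illustration; a real OLAP engine works on far larger data and precomputes much of this.

```python
from collections import defaultdict

# Hypothetical fact rows: dimension values plus one numeric measure.
sales_facts = [
    {"region": "East", "product": "Widget", "year": 2023, "revenue": 100.0},
    {"region": "East", "product": "Gadget", "year": 2023, "revenue": 50.0},
    {"region": "West", "product": "Widget", "year": 2023, "revenue": 70.0},
    {"region": "East", "product": "Widget", "year": 2024, "revenue": 120.0},
]


def roll_up(facts, dimensions, measure="revenue"):
    """Sum a measure, grouped by whichever dimensions the analyst picks."""
    totals = defaultdict(float)
    for row in facts:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)


by_region = roll_up(sales_facts, ["region"])
by_region_year = roll_up(sales_facts, ["region", "year"])
```

The same facts answer different questions depending on the dimensions chosen, which is the idea behind viewing the data along several dimensions.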

There are many companies specifically offering data warehousing software solutions, which
come with sophisticated proprietary functions. Many of these vendors even offer
integrated solutions that combine data warehousing functions with such complex features as data
transformation, management, analytics and delivery components.

Having such a capable data warehouse greatly increases the overall performance of the enterprise data
management system, because the data warehouse can share some of the load that would otherwise
fall on the operational data stores, which tackle very labor-intensive processes
from the ongoing business operations.

Components of Data Warehousing

Operational systems vs. data warehousing

The fundamental difference between operational systems and data warehousing systems is that
operational systems are designed to support transaction processing whereas data warehousing
systems are designed to support online analytical processing (or OLAP, for short).

Based on this fundamental difference, data usage patterns associated with operational systems
are significantly different from usage patterns associated with data warehousing systems. As a
result, data warehousing systems are designed and optimized using methodologies that
differ drastically from those of operational systems.
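
The contrast in usage patterns can be sketched with Python's built-in `sqlite3` module. The table names, columns, and values below are hypothetical, and one in-memory database stands in for what would normally be two separate systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Transaction processing: one small row is updated in place.
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL)")
conn.execute("INSERT INTO account VALUES (1, 100.0)")
conn.execute("UPDATE account SET balance = balance - 30.0 WHERE id = 1")

# Analytical processing: history is accumulated, then read in bulk.
conn.execute(
    "CREATE TABLE payment_history (account_id INTEGER, year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO payment_history VALUES (?, ?, ?)",
    [(1, 2023, 30.0), (1, 2023, 20.0), (1, 2024, 45.0)],
)

# The operational query touches a single current row.
current_balance = conn.execute(
    "SELECT balance FROM account WHERE id = 1"
).fetchone()[0]

# The analytical query scans and aggregates many historical rows.
yearly_totals = conn.execute(
    "SELECT year, SUM(amount) FROM payment_history GROUP BY year ORDER BY year"
).fetchall()
```

The first query is cheap and frequent; the second reads every historical row, which is why the two workloads tend to be optimized differently.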

The table below summarizes many of the differences between operational systems and data
warehousing systems.

A comparison of operational systems and data warehousing systems

1. Operational systems are generally designed to support high-volume transaction processing with minimal back-end reporting.
   Data warehousing systems are generally designed to support high-volume analytical processing (i.e. OLAP) and subsequent, often elaborate report generation.

2. Operational systems are generally process-oriented or process-driven, meaning that they are focused on specific business processes or tasks. Example tasks include billing, registration, etc.
   Data warehousing systems are generally subject-oriented, organized around business areas that the organization needs information about. Such subject areas are usually populated with data from one or more operational systems. As an example, revenue may be a subject area of a data warehouse that incorporates data from operational systems that contain student tuition data, alumni gift data, financial aid data, etc.

3. Operational systems are generally concerned with current data.
   Data warehousing systems are generally concerned with historical data.

4. Data within operational systems are generally updated regularly according to need.
   Data within a data warehouse is generally non-volatile, meaning that new data may be added regularly, but once loaded, the data is rarely changed, thus preserving an ever-growing history of information. In short, data within a data warehouse is generally read-only.

5. Operational systems are generally optimized to perform fast inserts and updates of relatively small volumes of data.
   Data warehousing systems are generally optimized to perform fast retrievals of relatively large volumes of data.

6. Operational systems are generally application-specific, resulting in a multitude of partially or non-integrated systems and redundant data (e.g. billing data is not integrated with payroll data).
   Data warehousing systems are generally integrated at a layer above the application layer, avoiding data redundancy problems.

7. Operational systems generally require a non-trivial level of computing skills amongst the end-user community.
   Data warehousing systems generally appeal to an end-user community with a wide range of computing skills, from novice to expert users.

Benefits
Some of the benefits that a data warehouse provides are as follows:
• A data warehouse provides a common data model for all data of interest regardless of
the data's source. This makes it easier to report and analyze information than it would be
if multiple data models were used to retrieve information such as sales invoices, order
receipts, general ledger charges, etc.
• Prior to loading data into the data warehouse, inconsistencies are identified and
resolved. This greatly simplifies reporting and analysis.
• Information in the data warehouse is under the control of data warehouse users so that,
even if the source system data are purged over time, the information in the warehouse
can be stored safely for extended periods of time.
• Because they are separate from operational systems, data warehouses provide retrieval
of data without slowing down operational systems.
• Data warehouses can work in conjunction with and, hence, enhance the value of
operational business applications, notably customer relationship management (CRM)
systems.
• Data warehouses facilitate decision support system applications such as trend reports
(e.g., the items with the most sales in a particular area within the last two years),
exception reports, and reports that show actual performance versus goals.
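
As an illustration of the trend report mentioned in the last bullet (the items with the most sales in a particular area over a recent period), here is a small sketch. The sales rows, field names, and the `top_items` helper are hypothetical; the cutoff date is passed in explicitly rather than computed.

```python
from collections import Counter
from datetime import date

# Hypothetical historical sales rows as they might sit in a warehouse.
sales = [
    {"item": "Widget", "area": "North", "sold_on": date(2024, 3, 1), "units": 5},
    {"item": "Gadget", "area": "North", "sold_on": date(2023, 11, 20), "units": 9},
    {"item": "Widget", "area": "North", "sold_on": date(2023, 6, 10), "units": 7},
    {"item": "Widget", "area": "South", "sold_on": date(2024, 1, 5), "units": 40},
    {"item": "Gadget", "area": "North", "sold_on": date(2021, 2, 2), "units": 100},
]


def top_items(rows, area, since, limit=3):
    """Rank items by units sold in one area on or after the cutoff date."""
    units = Counter()
    for row in rows:
        if row["area"] == area and row["sold_on"] >= since:
            units[row["item"]] += row["units"]
    return units.most_common(limit)


north_trend = top_items(sales, "North", since=date(2022, 6, 30))
```

Rows from other areas and rows older than the cutoff are ignored, so the old bulk Gadget sale does not distort the recent ranking.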

Disadvantages
There are also disadvantages to using a data warehouse. Some of them are:
• Data warehouses are not the optimal environment for unstructured data.
• Because data must be extracted, transformed and loaded into the warehouse, there is
an element of latency in data warehouse data.
• Over their life, data warehouses can have high costs.
• Data warehouses can get outdated relatively quickly. There is a cost of delivering
suboptimal information to the organization.
• There is often a fine line between data warehouses and operational systems. Duplicate,
expensive functionality may be developed. Or, functionality may be developed in the
data warehouse that, in retrospect, should have been developed in the operational
systems.

ETL Concepts

ETL stands for extraction, transformation, and loading. It refers to the methods involved in accessing and
manipulating source data and loading it into a target database.
The first step in the ETL process is mapping the data between the source systems and the target database
(data warehouse or data mart). The second step is cleansing the source data in a staging area.
The third step is transforming the cleansed source data and then loading it into the target system.
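
The three steps can be sketched as follows, using Python's built-in `sqlite3` module as a stand-in target database. The column mapping, the source rows, the cleansing rules, and the `sales` table are all assumptions chosen for illustration; real cleansing rules depend on the source systems.

```python
import sqlite3

# Step 1 (mapping): hypothetical source column -> target column.
COLUMN_MAP = {"CustName": "customer_name", "Amt": "amount"}

source_rows = [
    {"CustName": "  Alice ", "Amt": "10.50"},
    {"CustName": "Bob", "Amt": ""},            # incomplete: no amount
    {"CustName": "  Alice ", "Amt": "10.50"},  # exact duplicate
]


def cleanse(rows):
    """Step 2 (staging): trim text, drop incomplete rows and exact duplicates."""
    seen, clean = set(), []
    for row in rows:
        trimmed = {k: v.strip() for k, v in row.items()}
        key = tuple(sorted(trimmed.items()))
        if all(trimmed.values()) and key not in seen:
            seen.add(key)
            clean.append(trimmed)
    return clean


def transform_and_load(rows, conn):
    """Step 3: rename columns per the mapping, convert types, load the target."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer_name TEXT, amount REAL)"
    )
    for row in rows:
        target = {COLUMN_MAP[k]: v for k, v in row.items()}
        conn.execute(
            "INSERT INTO sales VALUES (?, ?)",
            (target["customer_name"], float(target["amount"])),
        )


conn = sqlite3.connect(":memory:")
transform_and_load(cleanse(source_rows), conn)
loaded = conn.execute("SELECT customer_name, amount FROM sales").fetchall()
```

Keeping the mapping as data rather than code makes it easy to review the source-to-target correspondence before any rows are moved.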

Areas where Data Warehousing can be applied


• Credit card churn analysis
• Insurance fraud analysis
• Call record analysis
• Logistics management
• Agriculture

Here are some of the better-known very large data warehouses:

• eBay has a 6.5 petabyte database running on Greenplum and a 2.5 petabyte
enterprise data warehouse running on Teradata
• Facebook has a 2.5 petabyte data warehouse running on Hadoop/Hive
• Walmart has a 2.5 petabyte warehouse, Bank of America has 1.5 petabytes, and Dell has 1
petabyte, all running on Teradata
• Yahoo, Fox Interactive Media, and TEOCO (which runs outsourced data warehouses for top US telcos) are
all in the hundreds of terabytes range
