
CHAPTER 2 DATA WAREHOUSE: THE BUILDING BLOCKS

EX 1:

Match the columns

1-H, 2-G, 3-J, 4-F, 5-B, 6-I, 7-C, 8-A, 9-E, 10-D

REVIEW QUESTIONS
1. Name at least six characteristics or features of a data warehouse.
Five main features of a data warehouse are described below.
1) Subject-Oriented Data:
- In the data warehouse, data is stored by business subjects, not by
operational applications.
- E.g., in the operational systems, data is organized separately by
application: order processing, consumer loans, customer billing. In the
data warehouse it is organized by subject: sales, products, customers.
2) Integrated Data:
- Source data comes from several operational systems and sits in different
databases, files, and data segments, so it is difficult to get specific
information out of all of it.
- To overcome this problem, the data is integrated, which makes the
information easier to obtain.
- Data inconsistencies are removed; data from diverse operational
applications is integrated.
- Some of the items that need standardization: naming conventions, codes,
data attributes, measurements.
3) Time-Variant Data:
- A data warehouse, because of the very nature of its purpose, has to contain
historical data, not just current values. Data is stored as snapshots over past
and current periods. Every data structure in the data warehouse contains the
time element. You will find historical snapshots of the operational data in
the data warehouse. This aspect of the data warehouse is quite significant for
both the design and the implementation phases.

- The time-variant nature of the data in a data warehouse:
  - Allows for analysis of the past
  - Relates information to the present
  - Enables forecasts for the future
4) Nonvolatile Data:
-Nonvolatile means that, once entered into the warehouse, data should not
change. This is logical because the purpose of a warehouse is to enable you
to analyze what has occurred.
-The data in a data warehouse is not as volatile as the data in an operational
database is. The data in a data warehouse is primarily for query and analysis.
5) Data Granularity:
-Data granularity refers to the level of detail.
- Depending on the requirements, multiple levels of detail may be present.
Many data warehouses have at least dual levels of granularity.
-Depending on the query, you can then go to the particular level of detail and
satisfy the query.
2. Why is data integration required in a data warehouse, more so there than in an operational
application?
Integration is one of the most important aspects of a data warehouse. An operational
application works only with its own data, whereas the data warehouse draws data from
many application-oriented operational sources. When data passes from those sources to
the data warehouse, inconsistencies and redundancies must be resolved so that the
warehouse provides an integrated and reconciled view of the organization's data. One
approach described in the literature is based on a conceptual representation of the
data warehouse application domain and follows the so-called local-as-view paradigm:
both source and data warehouse relations are defined as views over the conceptual
model.
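The kind of standardization that integration calls for (naming conventions, codes, measurements) can be sketched in a few lines of Python. Everything here is hypothetical and chosen only for illustration: the two source systems, the field names, the gender code set, and the balance units are assumptions, not something taken from the textbook.

```python
# Hypothetical customer rows from two operational systems. Field names,
# gender codes, and the unit of the balance all differ between them.
billing_rows = [{"cust_nm": "Ann Lee", "sex": "F", "balance_cents": 12550}]
loan_rows = [{"customer_name": "Bob Ray", "gender": "male", "balance_dollars": 80.0}]

# One agreed code set for the warehouse (an assumption for this sketch).
GENDER_CODES = {"f": "F", "female": "F", "m": "M", "male": "M"}


def from_billing(row):
    """Map a billing row onto the standard warehouse layout."""
    return {
        "customer_name": row["cust_nm"],
        "gender": GENDER_CODES[row["sex"].lower()],
        "balance_dollars": row["balance_cents"] / 100,
    }


def from_loans(row):
    """Map a loan row onto the same standard layout."""
    return {
        "customer_name": row["customer_name"],
        "gender": GENDER_CODES[row["gender"].lower()],
        "balance_dollars": float(row["balance_dollars"]),
    }


# After mapping, rows from both systems share names, codes, and units.
integrated = [from_billing(r) for r in billing_rows] + [from_loans(r) for r in loan_rows]
for row in integrated:
    print(row)
```

Each source gets its own small mapping function, so adding a third operational system would only mean adding one more mapping onto the same standard layout.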
3. Every data structure in the data warehouse contains the time element. Why?
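Because of its time-variant nature. The data warehouse holds historical snapshots, not
just current values, so every data structure must record the period its data refers to.
The time-variant nature of the data in a data warehouse:
- Allows for analysis of the past
- Relates information to the present
- Enables forecasts for the future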
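As a rough illustration of why the time element matters, the Python sketch below stores month-end snapshots that each carry a date, then looks up a past value, the current value, and a very naive projection. The account, the balances, and both helper functions are hypothetical; the straight-line projection is only a stand-in for real forecasting.

```python
from datetime import date

# Hypothetical month-end snapshots of one account. Every record carries
# its own time element (snapshot_date).
snapshots = [
    {"snapshot_date": date(2023, 1, 31), "account": "A-1", "balance": 100.0},
    {"snapshot_date": date(2023, 2, 28), "account": "A-1", "balance": 140.0},
    {"snapshot_date": date(2023, 3, 31), "account": "A-1", "balance": 180.0},
]


def balance_as_of(rows, account, as_of):
    """Return the latest snapshot balance on or before as_of, or None."""
    candidates = [
        r for r in rows
        if r["account"] == account and r["snapshot_date"] <= as_of
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r["snapshot_date"])["balance"]


def naive_forecast(rows, account):
    """Project the next snapshot by adding the average change per snapshot."""
    ordered = sorted(
        (r for r in rows if r["account"] == account),
        key=lambda r: r["snapshot_date"],
    )
    if len(ordered) < 2:
        return None
    first, last = ordered[0]["balance"], ordered[-1]["balance"]
    return last + (last - first) / (len(ordered) - 1)


past = balance_as_of(snapshots, "A-1", date(2023, 2, 15))     # analysis of the past
current = balance_as_of(snapshots, "A-1", date(2023, 3, 31))  # relation to the present
projected = naive_forecast(snapshots, "A-1")                  # forecast for the future
print(past, current, projected)
```

Without the snapshot date on every record, none of the three lookups would be possible, which is the point of the answer above.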
4. Explain data granularity and how it is applicable to the data warehouse.
Granularity refers to the level of detail of the data stored in the fact tables of a data warehouse.
High granularity refers to data that is at or near the transaction level. Data that is at the
transaction level is usually referred to as atomic level data. Low granularity refers to data that
is summarized or aggregated, usually from the atomic level data. Summarized data can be
lightly summarized as in daily or weekly summaries or highly summarized data such as
yearly averages and totals.
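The levels of granularity described above can be sketched in Python by rolling the same atomic transactions up to a daily and then a yearly summary. The transaction data and the summarize helper are hypothetical and serve only as an illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical atomic-level data: one (sale_date, amount) pair per transaction.
transactions = [
    (date(2023, 1, 5), 20.0),
    (date(2023, 1, 5), 30.0),
    (date(2023, 1, 6), 50.0),
    (date(2024, 2, 1), 10.0),
]


def summarize(rows, key_func):
    """Total the amounts per key; the key decides the level of summarization."""
    totals = defaultdict(float)
    for sale_date, amount in rows:
        totals[key_func(sale_date)] += amount
    return dict(totals)


daily = summarize(transactions, lambda d: d)        # lightly summarized
yearly = summarize(transactions, lambda d: d.year)  # highly summarized
print(daily)
print(yearly)
```

A warehouse with dual levels of granularity would keep both the atomic rows and summaries like these, so that a query can be answered from whichever level fits it.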
5. How are the top-down and bottom-up approaches for building a data warehouse
different? Discuss the merits and disadvantages of each approach.
Top-Down vs. Bottom-Up
If you use a top-down approach, you have to analyze global business needs, plan
how to develop a data warehouse, design it, and implement it as a whole. This
procedure is promising: it can achieve excellent results because it is based on a global
picture of the goal to achieve, and in principle it ensures a consistent, well-integrated
data warehouse. However, a long history of failures with top-down approaches
teaches that:
- high cost estimates and long implementation times discourage company
managers from embarking on this kind of project;
- analyzing and bringing together all relevant sources is a very difficult task, partly
because they are unlikely to all be available and stable at the same time;
- it is extremely difficult to forecast the specific needs of every department involved
in a project, which can bring the analysis process to a standstill;
- since no prototype is delivered in the short term, users cannot verify
that the project is useful, so they lose trust and interest in it.
In a bottom-up approach, the data warehouse is built incrementally and
several data marts are created iteratively. Each data mart is based on a set of facts that
are linked to a specific company department and that are of interest to a user
subgroup (for example, data marts for inventories, marketing, and so on). If this
approach is coupled with quick prototyping, the time and cost needed for
implementation can be reduced enough that company managers quickly see how
useful the project is, and so the project keeps their interest.
The bottom-up approach is more cautious than the top-down one and is
almost universally accepted. It is not risk-free, however, because
it gives only a partial picture of the whole field of application. The
first data mart, used as a prototype, needs particular attention: it should play a very
strategic role in the company and serve as a
reference point for the whole data warehouse, so that later data marts can
be added to it easily. Moreover, it is highly advisable that the
selected data mart exploit consistent data that is already available.
6. What are the various data sources for the data warehouse?
Source data coming into the data warehouse may be grouped into four broad
categories, as discussed here.
- Production data: This category of data comes from the various operational
systems of the enterprise. Based on the information requirements in the data
warehouse, you choose segments of data from the different operational
systems.
- Internal data: In every organization, users keep their “private” spreadsheets,
documents, customer profiles, and sometimes even departmental databases.
This is the internal data, parts of which could be useful in a data warehouse.
Internal data adds complexity to the process of transforming and
integrating the data before it can be stored in the data warehouse.
- Archived data: In every operational system, you periodically take the old data
and store it in archived files.
- External data: Most executives depend on data from external sources for a
high percentage of the information they use.
7. Why do you need a separate data staging component?
In a data warehouse you pull in data from many source operational
systems. Remember that data in a data warehouse is subject-oriented and cuts across operational
applications. A separate staging area, therefore, is a necessity for preparing data
for the data warehouse.

8. Under data transformation, list five different functions you can think of.
First, you clean the data extracted from each source. Cleaning may just be correction of misspellings,
or may include resolution of conflicts between state codes and zip codes in the
source data, or may deal with providing default values for missing data elements, or elimination
of duplicates when you bring in the same data from multiple source systems.
Second, standardization of data elements forms a large part of data transformation. You standardize
the data types and field lengths for the same data elements retrieved from the various
sources. Semantic standardization is another major task. You resolve synonyms and
homonyms. When two or more terms from different source systems mean the same thing,
you resolve the synonyms. When a single term means many different things in different
source systems, you resolve the homonym.
Third, data transformation involves many forms of combining pieces of data from the different
sources. You combine data from a single source record or related data elements from
many source records. Fourth, data transformation also involves purging source
data that is not useful and separating out source records into new combinations. Fifth, sorting
and merging of data takes place on a large scale in the data staging area.
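Several of these functions (correcting misspellings, supplying defaults, resolving synonyms, eliminating duplicates, sorting and merging) can be sketched together in Python. The source extracts, the lookup tables, and the assumption that both sources share the same id key are all hypothetical and are made only to keep the example small.

```python
# Hypothetical extracts from two source systems. This sketch assumes that
# "id" identifies the same customer in both sources.
source_a = [
    {"id": 1, "state": "Calfornia", "segment": "client"},
    {"id": 2, "state": "Texas", "segment": None},
]
source_b = [
    {"id": 2, "state": "Texas", "segment": None},
    {"id": 3, "state": "Ohio", "segment": "customer"},
]

SPELLING_FIXES = {"Calfornia": "California"}  # cleaning: misspellings
SYNONYMS = {"client": "customer"}             # semantic standardization
DEFAULT_SEGMENT = "unknown"                   # default for missing values


def clean(row):
    """Correct spelling, fill the default, and resolve synonyms for one row."""
    state = SPELLING_FIXES.get(row["state"], row["state"])
    segment = row["segment"] if row["segment"] is not None else DEFAULT_SEGMENT
    segment = SYNONYMS.get(segment, segment)
    return {"id": row["id"], "state": state, "segment": segment}


def transform(*sources):
    """Merge all sources, keep the first copy of each id, and sort by id."""
    merged = {}
    for rows in sources:
        for row in rows:
            merged.setdefault(row["id"], clean(row))
    return [merged[key] for key in sorted(merged)]


staged = transform(source_a, source_b)
for row in staged:
    print(row)
```

Keeping the first copy of a duplicate is only one possible rule; a real staging area would need an agreed policy for which source wins when copies disagree.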

9. Name any six different methods for information delivery.

- Ad hoc reports: predefined reports primarily meant for novice and casual users.
- Complex queries
- Multidimensional (MD) analysis
- Statistical analysis: together with complex queries and MD analysis, this caters to
the needs of business analysts and power users.
- Executive Information Systems (EIS) feed: information fed into Executive Information
Systems (EIS) is meant for senior executives and high-level managers.
- Data mining: some data warehouses also provide data to data-mining applications.
Data-mining applications are knowledge discovery systems.

EXERCISES
1. Match the columns:
1. nonvolatile data          A. roadmap for users
2. dual data granularity     B. subject-oriented
3. dependent data mart       C. knowledge discovery
4. disparate data            D. private spreadsheets
5. decision support          E. application flavor
6. data staging              F. because of multiple sources
7. data mining               G. details and summary
8. metadata                  H. read-only
9. operational systems       I. workbench for data integration
10. internal data            J. data from main data warehouse

2. A data warehouse is subject-oriented. What would be the major critical business
subjects for the following companies?
a. an international manufacturing company
Manufacturing
b. a local community bank
Finance
c. a domestic hotel chain
Business Management