
SE-6C

Data Warehousing

ASSIGNMENT#2

HAMZA IMTIAZ MALIK


(FA13-BSE-039)

1. What is Hypercube in OLAP System?

Answer:
An OLAP cube is a multidimensional array of data. OLAP is an
acronym for online analytical processing, a computer-based
technique for analysing data to look for insights. The term cube
here refers to a multi-dimensional dataset, which is sometimes
called a hypercube if the number of dimensions is greater than 3.
A cube can be considered a multi-dimensional generalization of
a two- or three-dimensional spreadsheet. For example, a
company might wish to summarize financial data by product,
by time-period, and by city to compare actual and budget
expenses. Product, time, city and scenario (actual and budget)
are the data's dimensions.
Cube is shorthand for a multidimensional dataset, given that data
can have an arbitrary number of dimensions. The term
hypercube is sometimes used, especially for data with more
than three dimensions.
Slicer is a term for a dimension which is held constant for all
cells so that multidimensional information can be shown in the
two-dimensional physical space of a spreadsheet or pivot table.
Each cell of the cube holds a number that represents some
measure of the business, such as sales, profits, expenses,
budget and forecast.
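
The cube and slicer ideas above can be sketched in a few lines of Python. This is only an illustration: the product names, cities, figures and the helper slice_to_table are made up for the example and are not taken from any OLAP product. Each cell is keyed by one member from each of the four dimensions, and the scenario dimension is used as the slicer.

```python
from collections import defaultdict

# Hypothetical cube: one cell per (product, period, city, scenario) -> expenses.
DIMENSIONS = ("product", "period", "city", "scenario")
cube = {
    ("Widget", "Q1", "Lahore", "actual"): 120,
    ("Widget", "Q1", "Lahore", "budget"): 100,
    ("Widget", "Q2", "Lahore", "actual"): 90,
    ("Gadget", "Q1", "Karachi", "actual"): 200,
    ("Gadget", "Q1", "Lahore", "actual"): 50,
    ("Gadget", "Q2", "Karachi", "budget"): 180,
}


def slice_to_table(cube, slicers, row_dim, col_dim):
    """Hold the slicer dimensions constant and sum every other
    dimension away, leaving a two-dimensional (row, column) table."""
    row_i = DIMENSIONS.index(row_dim)
    col_i = DIMENSIONS.index(col_dim)
    table = defaultdict(int)
    for key, value in cube.items():
        # Keep only the cells that match every slicer value.
        if all(key[DIMENSIONS.index(d)] == v for d, v in slicers.items()):
            table[(key[row_i], key[col_i])] += value
    return dict(table)


# Slice on scenario = "actual", then show product by period (city is summed).
actual_by_product_period = slice_to_table(
    cube, {"scenario": "actual"}, "product", "period"
)
```

Here the four-dimensional data is reduced to a product-by-period table that would fit in a spreadsheet, which is what a slicer is used for.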
OLAP data is typically stored in a star schema or snowflake
schema in a relational data warehouse or in a special-purpose
data management system. Measures are derived from the
records in the fact table and dimensions are derived from the
dimension tables.
2. What is MULTI-DIMENSIONAL Analysis?
Answer:
Multi-Dimensional Analysis is an Informational Analysis on data
which takes into account many different relationships, each of
which represents a dimension. For example, a retail analyst
may want to understand the relationships among sales by
region, by quarter, by demographic distribution (income,
education level, gender), by product. Multi-dimensional analysis
will yield results for these complex relationships.
Multi-Dimensional Analysis is generally used in statistics,
econometrics and other related fields, and the results of this
kind of analysis can be further applied to areas such as
business enterprise. Multidimensional analysis is a process
which groups data into two basic categories: the data dimension
category and the measurement category. To illustrate this, let
us take the case of a football game.
A data set which consists of the number of wins for one football
team every year for many years could be categorized as a
single-dimensional or longitudinal data set. Another data set
which consists of the number of wins for many different football
teams within one year is a single-dimensional or cross-sectional
data set. A single data set that consists of the number
of wins for various football teams across many years is a
two-dimensional data set.
Two-dimensional data sets are also called panel data in other
disciplines. Logically, any data set with two or more dimensions
could be considered multidimensional data, but the
term multidimensional data tends to be applied only to data sets
with three or more dimensions.
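
The football example can be written out as a small Python sketch. The team names and win counts are invented for illustration only. The same numbers are viewed first as a longitudinal data set, then as a cross-sectional data set, and finally as two-dimensional panel data.

```python
# Hypothetical win counts, stored as wins[team][year].
wins = {
    "Team A": {2005: 10, 2006: 12, 2007: 9},
    "Team B": {2005: 7, 2006: 11, 2007: 14},
}

# Longitudinal (one dimension: year): one team followed across many years.
longitudinal = wins["Team A"]

# Cross-sectional (one dimension: team): many teams within a single year.
cross_section = {team: by_year[2006] for team, by_year in wins.items()}

# Panel (two dimensions: team and year): every team in every year.
panel = {
    (team, year): count
    for team, by_year in wins.items()
    for year, count in by_year.items()
}
```

Adding a third key to the panel, for example the competition played in, would give a data set that is multidimensional in the stricter sense used above.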
For instance, some data sets used for forecasting provide
forecasts for various target periods, made by multiple
forecasters at multiple horizons. Together, these three
dimensions can provide more information than can be gleaned
from two-dimensional panel data sets.
In a multidimensional data set, the term dimension refers to a
structural attribute of a data cube. A dimension is composed of
related, hierarchical members. For instance, the "Time" dimension
may have members like years, quarters, months, weeks,
days, hours and so on. In the same manner, the "Geography"
dimension may have members like regions, countries, cities
and so on.
A dimension member is an element of any given dimension, just
as in the example above, where years, quarters,
months and weeks are members of the "Time" dimension.
A dimension hierarchy is a way to organize
dimension members into parent and child relationships. In the
"Time" dimension example, the month is the child of
the quarter, which in turn is the child of the year.
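
A dimension hierarchy can be sketched as a simple child-to-parent mapping. The member labels and sales figures below are assumptions made up for this example, and roll_up is a hypothetical helper, not a function of any OLAP tool. It sums each child into its parent, moving one level up the "Time" hierarchy per call.

```python
from collections import defaultdict

# Child -> parent links for a hypothetical Time hierarchy
# (month -> quarter -> year).
PARENT = {
    "2007-01": "2007-Q1",
    "2007-02": "2007-Q1",
    "2007-03": "2007-Q1",
    "2007-04": "2007-Q2",
    "2007-Q1": "2007",
    "2007-Q2": "2007",
}


def roll_up(values, parent=PARENT):
    """Aggregate one level up the hierarchy by summing children
    into their parent member."""
    totals = defaultdict(int)
    for member, value in values.items():
        totals[parent[member]] += value
    return dict(totals)


monthly_sales = {"2007-01": 10, "2007-02": 20, "2007-03": 30, "2007-04": 5}
quarterly_sales = roll_up(monthly_sales)   # months summed into quarters
yearly_sales = roll_up(quarterly_sales)    # quarters summed into the year
```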
A dimension title refers to the name used to make the
dimension known. In the above examples, "Time" and
"Geography" are dimension titles.

The dimension title member is the name of the member, as in
the case of month or city. The dimension value member is an
instance of a dimension member. For example, 2007 is a
value of the dimension member Year.
A data point refers to the intersection of multiple dimensions
while a data value resides at the data point.
Multidimensional analysis is very important in a business
enterprise because it is the basis for some of the decisions
that give the organization a better edge
over its competitors. Today's business environment is
constantly evolving and business trends change very fast, so it
is always a good idea to analyze enterprise-related data.
Many software tools have been developed to make
multidimensional analysis processes a lot easier and faster. A
multidimensional analysis is often part of the larger business
intelligence system that works collaboratively with the data
warehouse system.

3. Briefly discuss the Design Approaches & Architecture DWH.

Answer:

Design Approaches:
i. Bottom-Up Design:
In the bottom-up design approach, the data marts are created
first to provide reporting capability. A data mart addresses a
single business area such as sales or finance. These data
marts are then integrated to build a complete data warehouse.
The integration of data marts is implemented using the data
warehouse bus architecture. In the bus architecture, a
dimension is shared between facts in two or more data marts.
These dimensions are called conformed dimensions. The
conformed dimensions are integrated from the data marts and then
the data warehouse is built.
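
The role of a conformed dimension can be illustrated with a small Python sketch. The two marts, the product keys and the amounts are hypothetical, and drill_across is only a name chosen for this example. Because both marts use the same product keys, their totals can be lined up side by side.

```python
# Conformed dimension shared by both data marts (hypothetical keys and names).
product_dim = {1: "Widget", 2: "Gadget"}

# Each data mart keeps its own facts but references the same product keys.
sales_mart = [(1, 500), (2, 300), (1, 200)]   # (product_key, revenue)
finance_mart = [(1, 350), (2, 260)]           # (product_key, cost)


def total_by_key(facts):
    """Sum the fact amounts per dimension key."""
    totals = {}
    for key, amount in facts:
        totals[key] = totals.get(key, 0) + amount
    return totals


def drill_across(dim, *marts):
    """Line up the totals of several marts on the shared dimension."""
    per_mart = [total_by_key(mart) for mart in marts]
    return {
        name: tuple(totals.get(key, 0) for totals in per_mart)
        for key, name in dim.items()
    }


# One row per product: (revenue from sales mart, cost from finance mart).
report = drill_across(product_dim, sales_mart, finance_mart)
```

If each mart had its own private product codes, this combination would first need a mapping between the codes, which is the problem conformed dimensions avoid.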
Advantages of bottom-up design are:
This model contains consistent data marts and these data
marts can be delivered quickly.
As the data marts are created first, reports can be generated
quickly.
The data warehouse can be extended easily to accommodate
new business units. This only requires creating new data marts
and then integrating them with the other data marts.
Disadvantages of bottom-up design are:
The positions of the data warehouse and the data marts are
reversed in the bottom-up approach design.
ii. Top-Down Design:
In the top-down design approach, the data warehouse is built
first. The data marts are then created from the data warehouse.
Advantages of top-down design are:
Provides consistent dimensional views of data across data
marts, as all data marts are loaded from the data warehouse.
This approach is robust against business changes. Creating a
new data mart from the data warehouse is very easy.
Disadvantages of top-down design are:
This methodology is inflexible to changing departmental needs
during the implementation phase.
It represents a very large project and the cost of implementing
the project is significant.
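
As a rough sketch of the top-down idea, the data marts below are derived from a single warehouse rather than built separately. The rows, the "area" field and the helper build_data_mart are assumptions for illustration only.

```python
# Hypothetical integrated warehouse: one store covering all business areas.
warehouse = [
    {"area": "sales", "region": "North", "amount": 100},
    {"area": "sales", "region": "South", "amount": 150},
    {"area": "finance", "region": "North", "amount": 80},
]


def build_data_mart(warehouse_rows, area):
    """Derive a data mart as the subset of warehouse rows for one
    business area. Rows are copied so the mart can be changed
    without touching the warehouse."""
    return [dict(row) for row in warehouse_rows if row["area"] == area]


sales_mart = build_data_mart(warehouse, "sales")
finance_mart = build_data_mart(warehouse, "finance")
```

Since every mart is loaded from the same rows, the region values are the same in all of them, which is the consistency advantage listed above.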

Three-Tier Data Warehouse Architecture


Generally, a data warehouse adopts a three-tier architecture.
Following are the three tiers of the data warehouse
architecture.
Bottom Tier - The bottom tier of the architecture is the
data warehouse database server. It is a relational
database system. We use the back end tools and utilities
to feed data into the bottom tier. These back end tools and
utilities perform the Extract, Clean, Load, and Refresh
functions.
Middle Tier - In the middle tier, we have the OLAP Server
that can be implemented in either of the following ways.
o By Relational OLAP (ROLAP), which is an extended
relational database management system. The ROLAP
maps the operations on multidimensional data to
standard relational operations.
o By Multidimensional OLAP (MOLAP) model, which
directly implements the multidimensional data and
operations.
Top Tier - This tier is the front-end client layer. This layer
holds the query tools, reporting tools, analysis tools
and data mining tools.
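
The way a ROLAP server maps a multidimensional operation to standard relational operations can be sketched with Python's built-in sqlite3 module. The table names, column names and data are hypothetical, and a real ROLAP server is far more elaborate. A roll-up request over a list of dimensions is simply translated into a join on the star schema with a GROUP BY.

```python
import sqlite3

# A tiny star schema held in memory: two dimension tables and one fact table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, quarter TEXT);
    CREATE TABLE fact_sales (product_id INTEGER, time_id INTEGER, amount INTEGER);
    INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
    INSERT INTO dim_time VALUES (1, 'Q1'), (2, 'Q2');
    INSERT INTO fact_sales VALUES (1, 1, 100), (1, 2, 150), (2, 1, 70), (1, 1, 30);
""")

# Fixed mapping from dimension names to columns, so that no user
# text is ever pasted into the SQL statement.
DIMENSION_COLUMNS = {"product": "p.name", "quarter": "t.quarter"}


def roll_up(conn, dimensions):
    """Translate 'total sales by these dimensions' into a relational
    join plus GROUP BY, and return the result rows."""
    cols = ", ".join(DIMENSION_COLUMNS[d] for d in dimensions)
    sql = (
        f"SELECT {cols}, SUM(f.amount) FROM fact_sales f "
        "JOIN dim_product p ON p.product_id = f.product_id "
        "JOIN dim_time t ON t.time_id = f.time_id "
        f"GROUP BY {cols} ORDER BY {cols}"
    )
    return conn.execute(sql).fetchall()


by_product = roll_up(conn, ["product"])
by_product_quarter = roll_up(conn, ["product", "quarter"])
```

A MOLAP server would instead keep the aggregated cells in its own multidimensional structure and would not need to generate SQL for each request.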

The following diagram depicts the three-tier architecture of
data warehouse:

4. Write Dr. Codd's guidelines for the OLAP system.


Answer:
1. Multidimensional conceptual view
User-analysts would view an enterprise as
being multidimensional in nature. For example, profits
could be viewed by region, product, time period, or
scenario (such as actual, budget, or forecast).
Multidimensional data models enable more straightforward and
intuitive manipulation of data by users, including slicing
and dicing.
2. Transparency
When OLAP forms part of the user's customary
spreadsheet or graphics package, this should be
transparent to the user. OLAP should be part of an open
systems architecture which can be embedded in any place
desired by the user without adversely affecting the
functionality of the host tool. The user should not be
exposed to the source of the data supplied to the OLAP
tool, which may be homogeneous or heterogeneous.

3. Accessibility
The OLAP tool should be capable of applying its own
logical structure to access heterogeneous sources of data
and perform any conversions necessary to present a
coherent view to the user. The tool (and not the user)
should be concerned with where the physical data comes
from.
4. Consistent reporting performance
Performance of the OLAP tool should not suffer
significantly as the number of dimensions is increased.
5. Client/server architecture
The server component of OLAP tools should be sufficiently
intelligent that the various clients can be attached with
minimum effort. The server should be capable of mapping
and consolidating data between disparate databases.
6. Generic Dimensionality
Every data dimension should be equivalent in its structure
and operational capabilities.
7. Dynamic sparse matrix handling
The OLAP server's physical structure should have optimal
sparse matrix handling.
8. Multi-user support
OLAP tools must provide concurrent retrieval and update
access, integrity and security.

9. Unrestricted cross-dimensional operations
Computational facilities must allow calculation and data
manipulation across any number of data dimensions, and
must not restrict any relationship between data cells.
10. Intuitive data manipulation
Data manipulation inherent in the consolidation path, such
as drilling down or zooming out, should be accomplished
via direct action on the analytical model's cells, and not
require use of a menu or multiple trips across the user
interface.
11. Flexible reporting
Reporting facilities should present information in any way
the user wants to view it.
12. Unlimited dimensions and aggregation levels
The number of data dimensions supported should, to all
intents and purposes, be unlimited. Each
generic dimension should enable an essentially unlimited
number of user-defined aggregation levels within any
given consolidation path.
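
Guideline 7 (dynamic sparse matrix handling) can be illustrated with a minimal Python sketch. The class SparseCube, its method names and the dimension sizes are hypothetical and only show the general idea: store the non-empty cells and treat every other cell as empty, instead of allocating one slot for every possible combination of dimension members.

```python
class SparseCube:
    """Stores only the non-empty cells; a missing cell reads as 0."""

    def __init__(self, dimension_sizes):
        self.dimension_sizes = dimension_sizes
        self.cells = {}  # coordinates tuple -> value

    def put(self, coords, value):
        # Writing 0 removes the cell so that empty cells never use space.
        if value == 0:
            self.cells.pop(coords, None)
        else:
            self.cells[coords] = value

    def get(self, coords):
        return self.cells.get(coords, 0)

    def density(self):
        """Fraction of all possible cells that actually hold a value."""
        total = 1
        for size in self.dimension_sizes:
            total *= size
        return len(self.cells) / total


# 100 products x 50 cities x 12 months = 60,000 possible cells,
# but only two of them hold data.
cube = SparseCube((100, 50, 12))
cube.put((3, 7, 0), 42)
cube.put((99, 49, 11), 5)
```

A dense array of the same shape would reserve all 60,000 slots even though almost all of them are empty.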
