Data Warehousing and Data Mining Solutions for T1

1. What is an ODS? Explain the architecture of an ODS and write the difference between
an ODS and a data warehouse. 2+2+6M
ODS: operational data store
ODSs were an early workaround to the “reporting problem”. To create an ODS you:
– Build a separate, simplified version of an OLTP system
– Periodically copy data into it from the live OLTP system
– Hook it to operational reporting tools
An ODS can be an integration point or a real-time “reporting database” for
an operational system. It is not enough for full enterprise-level, cross-database
analytical processing.
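The three ODS steps above (simplified copy, periodic refresh, reporting) can be sketched as follows. This is a minimal illustration only: the table names `oltp_orders` and `ods_orders`, their columns, and the function `refresh_ods` are assumptions made for the example, and an in-memory SQLite database stands in for both systems.

```python
import sqlite3

def refresh_ods(conn):
    """Rebuild the simplified reporting table from the live OLTP table."""
    conn.execute("DELETE FROM ods_orders")
    conn.execute(
        "INSERT INTO ods_orders (order_id, customer, total) "
        "SELECT order_id, customer, quantity * unit_price FROM oltp_orders"
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
# Hypothetical live OLTP table with more detail than reporting needs.
conn.execute(
    "CREATE TABLE oltp_orders (order_id INTEGER PRIMARY KEY, customer TEXT, "
    "quantity INTEGER, unit_price REAL, internal_flags TEXT)"
)
# Hypothetical simplified ODS table used only for reporting.
conn.execute(
    "CREATE TABLE ods_orders (order_id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
)
conn.executemany(
    "INSERT INTO oltp_orders VALUES (?, ?, ?, ?, ?)",
    [(1, "Asha", 2, 10.0, "x"), (2, "Ben", 1, 25.0, "y")],
)

refresh_ods(conn)  # in practice this would run on a schedule
ods_rows = conn.execute(
    "SELECT order_id, customer, total FROM ods_orders ORDER BY order_id"
).fetchall()
```

Reporting tools would then query `ods_orders` instead of the live table, so reports do not compete with transactions.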
Difference between ODS and data warehouse
– Operational systems: major task of a traditional relational DBMS
– Day-to-day operations: purchasing, inventory, banking, manufacturing, payroll,
registration, accounting, etc.
– The time horizon for the data warehouse is significantly longer than that of
operational systems
– Operational database: current value data
– Data warehouse data: provide information from a historical perspective (e.g., past
5-10 years)
– Every key structure in the data warehouse contains an element of time,
explicitly or implicitly
– But the key of operational data may or may not contain a “time element”
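The contrast between current-value data and time-keyed historical data can be sketched as below. The account identifier, balances, and the function `record_balance` are hypothetical; the point is only that the operational key has no time element while the warehouse key does.

```python
from datetime import date

# Operational store: one current value per account (key has no time element).
operational = {}
# Warehouse: the key includes a time element, so history accumulates.
warehouse = {}

def record_balance(account, balance, as_of):
    operational[account] = balance          # overwrite: current value only
    warehouse[(account, as_of)] = balance   # append: one entry per account per date

record_balance("A-1", 100, date(2023, 1, 31))
record_balance("A-1", 250, date(2023, 2, 28))

# Historical view for one account, ordered by date.
history = sorted((d, b) for (acct, d), b in warehouse.items() if acct == "A-1")
```

After the two calls, `operational` holds only the latest balance, while `warehouse` still holds both dated values.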

2. What is a Data Warehouse? Draw a neat diagram of the Data Warehouse Architecture.
Explain Data Warehouse characteristics. 2+2+6
Data warehousing is a process, not a product, for assembling and managing data from various sources
for the purpose of gaining a single detailed view of part or all of a business. The single view is the data
warehouse (DW), which provides the enterprise’s information environment that is separate from OLTP.
DW is important since information is a powerful asset for every enterprise. A DW integrates
information from several sources into a global schema and is stored separately from the operational
data. It does not represent a snapshot of the operational database. 2M
DW Characteristics: 6M
• Subject Oriented:
A DW is organized around major subjects, such as student, degree, country.
Focusing on the modeling and analysis of data for decision makers, not on daily operations.
A DW provides a simple and concise view around particular subject issues by excluding data that
are not useful in the decision support process.
• Integrated:
A DW may be constructed by integrating information from multiple data sources e.g. multiple
OLTP databases.
Data cleaning and data integration techniques are applied to ensure consistency in naming
conventions, encoding structures, attribute measures, etc. among different data sources.
• Time variant:
A DW usually has a long time horizon, significantly longer than that of operational systems.
o Operational database: current value data.
o DW data: provide information from a historical perspective (e.g. past 5-10 years)
Every key structure in the DW contains an element of time, explicitly or implicitly.
Operational data may or may not contain a time element.
• Non Volatile:
A physically separate store of data transformed from the operational environment.
No update of data
Does not require transaction processing, recovery, and concurrency control mechanisms
Requires only two operations in data accessing: initial loading of data and access of data.
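The “Integrated” characteristic above (consistent naming conventions, encoding structures, and attribute measures) can be sketched as follows. Both sources, their field names, and the target layout are assumptions invented for this example.

```python
# Two hypothetical OLTP sources that describe customers differently.
source_a = [{"cust_name": "Asha", "sex": "F", "height_cm": 160}]
source_b = [{"CustomerName": "Ben", "gender": "male", "height_in": 70}]

# One agreed encoding for the warehouse.
GENDER_CODES = {"f": "F", "female": "F", "m": "M", "male": "M"}

def from_source_a(row):
    """Map source A's names and codes onto the warehouse layout."""
    return {
        "name": row["cust_name"],
        "gender": GENDER_CODES[row["sex"].lower()],
        "height_cm": float(row["height_cm"]),
    }

def from_source_b(row):
    """Map source B's names, codes, and units onto the warehouse layout."""
    return {
        "name": row["CustomerName"],
        "gender": GENDER_CODES[row["gender"].lower()],
        "height_cm": round(row["height_in"] * 2.54, 1),  # inches -> centimetres
    }

# Initial load: every row now uses the same names, codes, and units.
integrated = [from_source_a(r) for r in source_a] + [from_source_b(r) for r in source_b]
```

Once loaded, the rows are only read, which matches the non-volatile characteristic: initial loading followed by access.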

3. Implementations of Data Mining 10M


Implementation Steps
a. Requirements analysis and capacity planning
b. Hardware integration
c. Modelling:
d. Physical modelling
e. Sources:
f. ETL:
g. Populate the data warehouse
h. User applications
i. Roll-out the warehouse and applications

• Build incrementally
• Need a champion
• Senior management support
• Ensure quality
• Corporate strategy
• Business plan:
• Training:
• Adaptability:
• Joint management

4. Two Data Warehouse design Schema Types are 10M


• Star Schema
• Snowflake schema

Star schema: A fact table in the middle connected to a set of dimension tables: a single fact table
and, for each dimension, one dimension table. Does not capture hierarchies directly.
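A star schema can be sketched with one fact table and one table per dimension. The table and column names below (`fact_enrolment`, `dim_degree`, `dim_country`) and the data are hypothetical; SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One dimension table per dimension; hierarchy levels (school, continent)
-- are plain columns, so the hierarchy is not modelled directly.
CREATE TABLE dim_degree  (degree_id INTEGER PRIMARY KEY, degree_name TEXT, school TEXT);
CREATE TABLE dim_country (country_id INTEGER PRIMARY KEY, country_name TEXT, continent TEXT);
-- A single fact table in the middle, referencing every dimension.
CREATE TABLE fact_enrolment (
    degree_id  INTEGER REFERENCES dim_degree(degree_id),
    country_id INTEGER REFERENCES dim_country(country_id),
    year       INTEGER,
    students   INTEGER
);
INSERT INTO dim_degree VALUES (1, 'BIT', 'IT'), (2, 'MIT', 'IT');
INSERT INTO dim_country VALUES (1, 'Australia', 'Oceania'), (2, 'India', 'Asia');
INSERT INTO fact_enrolment VALUES (1, 1, 2023, 40), (1, 2, 2023, 25), (2, 2, 2023, 10);
""")

# A typical star query: join the fact table to a dimension and aggregate.
students_by_degree = conn.execute("""
    SELECT d.degree_name, SUM(f.students)
    FROM fact_enrolment AS f
    JOIN dim_degree AS d ON d.degree_id = f.degree_id
    GROUP BY d.degree_name
    ORDER BY d.degree_name
""").fetchall()
```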

Snowflake schema: A refinement of star schema where some dimensional hierarchy is
normalized into a set of smaller dimension tables, forming a shape similar to a snowflake.
Represents the dimensional hierarchy directly by normalizing tables.
Easy to maintain and saves storage.
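A snowflake schema can be sketched by normalizing one dimension's hierarchy into its own table. Here the country dimension is split into country and continent; all table names and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The country -> continent hierarchy is normalized into two tables,
-- so each continent name is stored once.
CREATE TABLE dim_continent (continent_id INTEGER PRIMARY KEY, continent_name TEXT);
CREATE TABLE dim_country (
    country_id   INTEGER PRIMARY KEY,
    country_name TEXT,
    continent_id INTEGER REFERENCES dim_continent(continent_id)
);
CREATE TABLE fact_enrolment (
    country_id INTEGER REFERENCES dim_country(country_id),
    students   INTEGER
);
INSERT INTO dim_continent VALUES (1, 'Asia'), (2, 'Oceania');
INSERT INTO dim_country VALUES (1, 'India', 1), (2, 'China', 1), (3, 'Australia', 2);
INSERT INTO fact_enrolment VALUES (1, 25), (2, 30), (3, 40);
""")

# Aggregating by continent now needs one extra join along the hierarchy.
students_by_continent = conn.execute("""
    SELECT k.continent_name, SUM(f.students)
    FROM fact_enrolment AS f
    JOIN dim_country   AS c ON c.country_id = f.country_id
    JOIN dim_continent AS k ON k.continent_id = c.continent_id
    GROUP BY k.continent_name
    ORDER BY k.continent_name
""").fetchall()
```

The trade-off shown here is less repeated data in the dimension tables in exchange for more joins per query.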

5. OLTP (on-line transaction processing) 5M+5M


Major task of traditional relational DBMS
Day-to-day operations: purchasing, inventory, banking, manufacturing, payroll, registration,
accounting, etc.
OLAP (on-line analytical processing)
Major task of data warehouse system
Data analysis and decision making
Distinct features (OLTP vs. OLAP):
User and system orientation: customer vs. market
Data contents: current, detailed vs. historical, consolidated
Database design: ER + application vs. star + subject
View: current, local vs. evolutionary, integrated
Access patterns: update vs. read-only but complex queries

A number of operations may be applied to data cubes. The common ones are:
• roll-up (increasing the level of abstraction)
• drill-down (increasing detail)
• slice and dice (selection and projection)
• pivot (re-orienting the view)

• Roll-up (less detail) - when we wish further abstraction (i.e. less detail). This operation performs
further aggregation on the data, for example, from single degree programs to Schools, single
countries to Continents or from three dimensions to two dimensions.
• Drill-down (increasing detail) - reverse of roll-up, when we wish to partition more finely or want
to focus on some particular values of certain dimensions. Drill-down adds more detail to the
data; it may involve adding another dimension.
• Slice and dice (selection and projection) - the slice operation performs a selection on one
dimension of the cube (e.g. degree = “MIT”). The dice operation performs a selection on two or
more dimensions (e.g. degree = “BIT” and country = “Australia” or “India”)
• Pivot (re-orienting the view) - an alternate presentation of the data e.g. rotating the axes in a 3-
D cube.
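The cube operations above can be sketched on a small in-memory cube. The cube contents, the dimension names in `DIMS`, and the helper functions are all assumptions made for illustration; a real OLAP server would provide these operations itself.

```python
from collections import defaultdict

# Hypothetical cube cells: (degree, country, year) -> number of students.
cube = {
    ("BIT", "Australia", 2022): 30,
    ("BIT", "India", 2022): 20,
    ("MIT", "Australia", 2022): 10,
    ("BIT", "Australia", 2023): 40,
    ("MIT", "India", 2023): 15,
}
DIMS = ("degree", "country", "year")

def roll_up(cells, keep):
    """Aggregate away every dimension not listed in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    out = defaultdict(int)
    for key, value in cells.items():
        out[tuple(key[i] for i in idx)] += value
    return dict(out)

def dice(cells, **allowed):
    """Keep cells whose value on each named dimension is in the allowed set."""
    return {
        key: value
        for key, value in cells.items()
        if all(key[DIMS.index(d)] in vals for d, vals in allowed.items())
    }

def slice_(cells, dim, value):
    """Selection on a single dimension."""
    return dice(cells, **{dim: {value}})

def pivot(cells, order):
    """Re-orient the view by reordering the dimensions of each key."""
    idx = [DIMS.index(d) for d in order]
    return {tuple(key[i] for i in idx): value for key, value in cells.items()}

by_degree_country = roll_up(cube, ["degree", "country"])  # 3-D -> 2-D
by_degree = roll_up(cube, ["degree"])                     # further roll-up
mit_slice = slice_(cube, "degree", "MIT")
bit_dice = dice(cube, degree={"BIT"}, country={"Australia", "India"})
year_first = pivot(cube, ["year", "country", "degree"])
```

Drill-down is the reverse direction: going from `by_degree` back to `by_degree_country`, or to the full `cube`, restores the detail that roll-up aggregated away.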
6. Explain the different tasks of Data Mining with suitable examples. 10M
• Prediction Tasks
– Use some variables to predict unknown or future values of other variables
• Description Tasks
– Find human-interpretable patterns that describe the data.
Common data mining tasks
– Classification [Predictive]
– Given a collection of records (training set )
– Each record contains a set of attributes, one of the attributes is the class.
– Find a model for class attribute as a function of the values of other attributes.
– Goal: previously unseen records should be assigned a class as accurately as possible.
– A test set is used to determine the accuracy of the model. Usually, the given data set is
divided into training and test sets, with training set used to build the model and test set
used to validate it.
– Direct Marketing
– Goal: Reduce cost of mailing by targeting a set of consumers likely to buy a new cell-
phone product.
– Approach:
– Use the data for a similar product introduced before.
– We know which customers decided to buy and which decided otherwise. This
{buy, don’t buy} decision forms the class attribute.
– Collect various demographic, lifestyle, and company-interaction related
information about all such customers.
– Type of business, where they stay, how much they earn, etc.
– Use this information as input attributes to learn a classifier model.
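The train/test procedure for classification can be sketched as below. The notes do not name a specific classifier, so a nearest-neighbour rule is used here purely as an assumption; the attributes (income in thousands, age) and all records are made up.

```python
import math

# Hypothetical past customers: (income in thousands, age) -> class attribute.
training_set = [
    ((80, 30), "buy"), ((90, 35), "buy"), ((85, 40), "buy"),
    ((30, 25), "don't buy"), ((35, 50), "don't buy"), ((40, 45), "don't buy"),
]
# Held-out records used only to measure accuracy.
test_set = [((88, 33), "buy"), ((32, 40), "don't buy")]

def classify(record, training):
    """Return the class of the training record closest to `record`."""
    _, label = min(training, key=lambda item: math.dist(item[0], record))
    return label

def accuracy(test, training):
    """Fraction of test records whose predicted class matches the true class."""
    correct = sum(1 for attrs, label in test if classify(attrs, training) == label)
    return correct / len(test)

test_accuracy = accuracy(test_set, training_set)
```

The model is built from `training_set` only; `test_set` plays the role of previously unseen records.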

– Clustering [Descriptive]
– Given a set of data points, each having a set of attributes, and a similarity measure among
them, find clusters such that
– Data points in one cluster are more similar to one another.
– Data points in separate clusters are less similar to one another.
• Ex Customer segmentation e.g. for targeted marketing
– Group/cluster existing customers based on time series of payment history such that
similar customers are in the same cluster.
– Identify micro-markets and develop policies for each.
• Ex Collaborative filtering:
– Group based on common items purchased
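Clustering with a similarity measure can be sketched with a small k-means loop. k-means is one possible method and is an assumption here, as are the two numeric features per customer, the starting centroids, and the Euclidean distance used as the similarity measure.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then re-average."""
    centroids = list(centroids)
    labels = []
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point.
        labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids

# Hypothetical customers described by two numeric features each.
customers = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.0), (8.0, 9.5)]
labels, centroids = kmeans(customers, centroids=[customers[0], customers[-1]])
```

Customers that share a label form one segment, and a separate policy can then be developed per segment.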

– Association Rule Discovery [Descriptive]
– Given a set of records, each of which contains some number of items from a given collection:
– Produce dependency rules which will predict the occurrence of an item based on
occurrences of other items.
• Ex Supermarket shelf management.
– Goal: To identify items that are bought together by sufficiently many customers.
– Approach: Process the point-of-sale data collected with barcode scanners to find
dependencies among items.
– A classic rule:
• If a customer buys diapers and milk, then they are very likely to buy beer.
• So, don’t be surprised if you find six-packs stacked next to diapers!
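How strongly a rule such as {diaper, milk} → {beer} holds is usually measured with support and confidence. The sketch below computes both on made-up point-of-sale baskets; the transactions are hypothetical and only the one rule is evaluated, with no full rule-mining algorithm.

```python
# Hypothetical point-of-sale baskets, one set of items per transaction.
transactions = [
    {"diaper", "milk", "beer"},
    {"diaper", "milk", "beer", "bread"},
    {"diaper", "milk"},
    {"milk", "bread"},
    {"beer", "bread"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    hits = sum(1 for basket in baskets if itemset <= basket)
    return hits / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Of the baskets containing the antecedent, the fraction that also contain the consequent."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)

rule_support = support({"diaper", "milk", "beer"}, transactions)
rule_confidence = confidence({"diaper", "milk"}, {"beer"}, transactions)
```

“Bought together by sufficiently many customers” corresponds to a minimum support threshold, and “very likely to buy” corresponds to high confidence.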
– Deviation Detection [Predictive]
– Detect significant deviations from normal behavior
– Applications: e.g. credit card fraud detection, network intrusion detection
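Detecting a significant deviation from normal behavior can be sketched with a simple robust score: the distance from the median, measured in units of the median absolute deviation (MAD). The scoring rule, the threshold of 5, and the transaction amounts are assumptions for illustration, not a fraud-detection method taken from these notes.

```python
from statistics import median

def find_deviations(values, threshold=5.0):
    """Return values that lie more than `threshold` MADs from the median."""
    centre = median(values)
    mad = median([abs(v - centre) for v in values])
    if mad == 0:  # all values (nearly) identical: nothing to compare against
        return []
    return [v for v in values if abs(v - centre) / mad > threshold]

# Hypothetical card transaction amounts; one is far from normal behavior.
amounts = [20, 25, 22, 19, 24, 21, 23, 500]
suspicious = find_deviations(amounts)
```

The median is used instead of the mean because a single extreme value shifts the mean and standard deviation enough to hide itself.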
