1. What is an ODS? Explain the architecture of an ODS and write the difference between
an ODS and a data warehouse. 2+2+6M
ODS: operational data store
ODSs were an early workaround to the “reporting problem”. To create an ODS you:
– Build a separate, simplified version of an OLTP system
– Periodically copy data into it from the live OLTP system
– Hook it to operational reporting tools
An ODS can be an integration point or a real-time “reporting database” for
an operational system. It’s not enough for full enterprise-level, cross-database
analytical processing.
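The three steps for creating an ODS can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the table names (orders, orders_report), the columns and the refresh_ods helper are hypothetical, and two in-memory databases stand in for the live OLTP system and the ODS.

```python
import sqlite3

def refresh_ods(oltp: sqlite3.Connection, ods: sqlite3.Connection) -> int:
    """Copy the current contents of the OLTP orders table into the ODS."""
    # Step 1: a separate, simplified version of the OLTP table.
    ods.execute(
        "CREATE TABLE IF NOT EXISTS orders_report ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    # Step 2: periodic copy from the live system (only the reporting columns).
    rows = oltp.execute(
        "SELECT order_id, customer, amount FROM orders"
    ).fetchall()
    # Upsert so repeated refreshes keep only the current value of each row.
    ods.executemany(
        "INSERT OR REPLACE INTO orders_report VALUES (?, ?, ?)", rows
    )
    ods.commit()
    return len(rows)

# Hypothetical live OLTP system with a richer schema than the ODS needs.
oltp = sqlite3.connect(":memory:")
oltp.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, "
    "amount REAL, internal_status TEXT)"
)
oltp.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, "alice", 120.0, "PICKED"), (2, "bob", 75.5, "NEW")],
)
oltp.commit()

ods = sqlite3.connect(":memory:")
copied = refresh_ods(oltp, ods)

# Step 3: an operational report runs against the ODS, not the live system.
report = ods.execute(
    "SELECT customer, amount FROM orders_report ORDER BY order_id"
).fetchall()
```

In practice refresh_ods would be called on a schedule; because it overwrites rows by key, the ODS holds current values only and keeps no history.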
Difference between ODS and data warehouse
– Operational (OLTP) processing is the major task of a traditional relational DBMS
– Day-to-day operations: purchasing, inventory, banking, manufacturing, payroll,
registration, accounting, etc.
– The time horizon for the data warehouse is significantly longer than that of
operational systems
– Operational database: current value data
– Data warehouse data: provide information from a historical perspective (e.g., past
5-10 years)
– Every key structure in the data warehouse contains an element of time, explicitly
or implicitly
– But the key of operational data may or may not contain a “time element”
2. What is a Data Warehouse? Draw a neat diagram of the Data Warehouse Architecture.
Explain Data Warehouse characteristics. 2+2+6
Data warehousing is a process, not a product, for assembling and managing data from various sources
for the purpose of gaining a single detailed view of part or all of a business. The single view is the data
warehouse (DW), which provides the enterprise’s information environment that is separate from OLTP.
DW is important since information is a powerful asset for every enterprise. A DW integrates
information from several sources into a global schema and is stored separately from the operational
data. It does not represent a snapshot of the operational database. 2M
DW Characteristics: 6M
• Subject Oriented:
A DW is organized around major subjects, such as student, degree, country.
Focusing on the modeling and analysis of data for decision makers, not on daily operations.
A DW provides a simple and concise view around particular subject issues by excluding data that
are not useful in the decision support process.
• Integrated:
Data Warehousing and Data Mining Solutions for T1
A DW may be constructed by integrating information from multiple data sources e.g. multiple
OLTP databases.
Data cleaning and data integration techniques are applied to ensure consistency in naming
conventions, encoding structures, attribute measures, etc. among different data sources.
• Time variant:
A DW usually has a long time horizon, significantly longer than that of operational systems.
o Operational database: current value data.
o DW data: provide information from a historical perspective (e.g. past 5-10 years)
Every key structure in the DW contains an element of time, explicitly or implicitly
Operational data may or may not contain a time element.
• Non Volatile:
A physically separate store of data transformed from the operational environment.
No update of data
Does not require transaction processing, recovery, and concurrency control mechanisms
Requires only two operations in data accessing: initial loading of data and access of data.
• Build incrementally
• Need a champion
• Senior management support
• Ensure quality
• Corporate strategy
• Business plan:
• Training:
• Adaptability:
• Joint management
Star schema: A fact table in the middle connected to a set of dimension tables. There is a single
fact table and, for each dimension, one dimension table. It does not capture hierarchies directly.
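A star schema of this shape can be sketched with Python's built-in sqlite3 module. The tables below (fact_enrolment, dim_degree, dim_country, dim_year) and their rows are hypothetical, chosen to match the student/degree/country subjects used elsewhere in these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One dimension table per dimension.
CREATE TABLE dim_degree  (degree_id  INTEGER PRIMARY KEY, degree_name  TEXT);
CREATE TABLE dim_country (country_id INTEGER PRIMARY KEY, country_name TEXT);
CREATE TABLE dim_year    (year_id    INTEGER PRIMARY KEY, year INTEGER);

-- A single fact table in the middle, keyed by the dimension ids.
CREATE TABLE fact_enrolment (
    degree_id  INTEGER REFERENCES dim_degree(degree_id),
    country_id INTEGER REFERENCES dim_country(country_id),
    year_id    INTEGER REFERENCES dim_year(year_id),
    students   INTEGER
);

INSERT INTO dim_degree  VALUES (1, 'BIT'), (2, 'MIT');
INSERT INTO dim_country VALUES (1, 'Australia'), (2, 'India');
INSERT INTO dim_year    VALUES (1, 2001), (2, 2002);
INSERT INTO fact_enrolment VALUES
    (1, 1, 1, 30), (1, 2, 1, 20), (2, 1, 1, 10),
    (1, 1, 2, 35), (2, 2, 2, 15);
""")

def students_by_degree(connection):
    """Join the fact table to one dimension and aggregate the measure."""
    query = """
        SELECT d.degree_name, SUM(f.students)
        FROM fact_enrolment AS f
        JOIN dim_degree AS d ON d.degree_id = f.degree_id
        GROUP BY d.degree_name
        ORDER BY d.degree_name
    """
    return connection.execute(query).fetchall()

totals = students_by_degree(conn)
```

Each dimension table here is flat: a hierarchy such as country to continent would have to be stored as extra columns in dim_country, which is what "does not capture hierarchies directly" refers to.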
A number of operations may be applied to data cubes. The common ones are:
• roll-up (increasing the level of abstraction)
• drill-down (increasing detail)
• slice and dice (selection and projection)
• pivot (re-orienting the view)
• Roll-up (less detail) - when we wish further abstraction (i.e. less detail). This operation performs
further aggregation on the data, for example, from single degree programs to Schools, single
countries to Continents or from three dimensions to two dimensions.
• Drill-down (increasing detail) - reverse of roll up, when we wish to partition more finely or want
to focus on some particular values of certain dimensions. Drill-down adds more detail to the
data, it may involve adding another dimension.
• Slice and dice (selection and projection) - the slice operation performs a selection on one
dimension of the cube (e.g. degree = “MIT”). The dice operation performs a selection on two or
more dimensions (e.g. degree = “BIT” and country = “Australia” or “India”)
• Pivot (re-orienting the view) - an alternate presentation of the data, e.g. rotating the axes in a
3-D cube.
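The cube operations above can be illustrated on a tiny in-memory cube. This is a minimal sketch in plain Python: the cube contents and the helper names (roll_up, slice_, dice, pivot) are hypothetical and not taken from any OLAP product.

```python
from collections import defaultdict

# Hypothetical cube cells: (degree, country, year) -> number of students.
cube = {
    ("BIT", "Australia", 2001): 30,
    ("BIT", "India", 2001): 20,
    ("MIT", "Australia", 2001): 10,
    ("BIT", "Australia", 2002): 35,
    ("MIT", "India", 2002): 15,
}
DIMS = ("degree", "country", "year")

def roll_up(cells, keep):
    """Aggregate away every dimension not listed in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    out = defaultdict(int)
    for key, value in cells.items():
        out[tuple(key[i] for i in idx)] += value
    return dict(out)

def dice(cells, **allowed):
    """Select cells whose value on each named dimension is in the allowed set."""
    return {
        key: value
        for key, value in cells.items()
        if all(key[DIMS.index(d)] in vals for d, vals in allowed.items())
    }

def slice_(cells, dim, value):
    """Slice: a selection on a single dimension."""
    return dice(cells, **{dim: {value}})

def pivot(table2d):
    """Swap the two axes of a two-dimensional table."""
    return {(b, a): v for (a, b), v in table2d.items()}

# Roll-up from three dimensions to two (year is aggregated away).
by_degree_country = roll_up(cube, ("degree", "country"))
# Slice: degree = "MIT".
mit_only = slice_(cube, "degree", "MIT")
# Dice: degree = "BIT" and country = "Australia" or "India".
bit_in_two_countries = dice(cube, degree={"BIT"}, country={"Australia", "India"})
# Pivot: present the same table as country x degree.
by_country_degree = pivot(by_degree_country)
```

Drill-down is the reverse of the roll-up shown: going from by_degree_country back to the full cube reintroduces the year dimension.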
6. Explain the different tasks of Data Mining with suitable examples. 10M
• Prediction Tasks
– Use some variables to predict unknown or future values of other variables
• Description Tasks
– Find human-interpretable patterns that describe the data.
Common data mining tasks
– Classification [Predictive]
– Given a collection of records (training set )
– Each record contains a set of attributes, one of the attributes is the class.
– Find a model for class attribute as a function of the values of other attributes.
– Goal: previously unseen records should be assigned a class as accurately as possible.
– A test set is used to determine the accuracy of the model. Usually, the given data set is
divided into training and test sets, with training set used to build the model and test set
used to validate it.
– Direct Marketing
– Goal: Reduce cost of mailing by targeting a set of consumers likely to buy a new cell-
phone product.
– Approach:
– Use the data for a similar product introduced before.
– We know which customers decided to buy and which decided otherwise. This
{buy, don’t buy} decision forms the class attribute.
– Collect various demographic, lifestyle, and company-interaction related
information about all such customers.
– Type of business, where they stay, how much they earn, etc.
– Use this information as input attributes to learn a classifier model.
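The train/test procedure for the direct-marketing example can be sketched as follows. The notes do not name a particular classifier, so this sketch assumes a 1-nearest-neighbour rule purely for illustration; the customer records (income in thousands, age) and the helper names are hypothetical.

```python
import math

# Hypothetical records from an earlier campaign:
# (income in thousands, age) -> class attribute {buy, dont_buy}.
records = [
    ((25, 22), "dont_buy"), ((30, 25), "dont_buy"), ((28, 30), "dont_buy"),
    ((32, 41), "dont_buy"), ((70, 35), "buy"), ((85, 40), "buy"),
    ((90, 29), "buy"), ((75, 50), "buy"), ((27, 45), "dont_buy"),
    ((80, 33), "buy"),
]

# Fixed split for reproducibility: first 7 records train, last 3 test.
train, test = records[:7], records[7:]

def predict(attributes, training_set):
    """Return the class of the closest training record (1-nearest neighbour)."""
    nearest = min(training_set, key=lambda rec: math.dist(rec[0], attributes))
    return nearest[1]

def accuracy(test_set, training_set):
    """Fraction of test records whose predicted class matches the true class."""
    correct = sum(predict(x, training_set) == label for x, label in test_set)
    return correct / len(test_set)

test_accuracy = accuracy(test, train)

# A previously unseen customer is assigned a class by the learned model.
new_customer = predict((78, 38), train)
```

The training set builds the model and the held-out test set estimates its accuracy; customers predicted as "buy" would then be the ones targeted by the mailing.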
– Clustering [Descriptive]
– Given a set of data points, each having a set of attributes, and a similarity measure among
them, find clusters such that
– Data points in one cluster are more similar to one another.
– Data points in separate clusters are less similar to one another.
• Example: customer segmentation, e.g. for targeted marketing
– Group/cluster existing customers based on time series of payment history such that
similar customers are in the same cluster.
– Identify micro-markets and develop policies for each.
• Collaborative filtering:
– Group based on common items purchased.
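The clustering task can be sketched with a small k-means loop. The notes do not prescribe an algorithm or similarity measure, so k-means with Euclidean distance is an assumption here, and each customer's payment history is reduced to two hypothetical summary features (average days late, average monthly payment).

```python
import math

def k_means(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute."""
    centroids = list(centroids)
    assignment = []
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for every point.
        assignment = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return assignment, centroids

# Hypothetical customers: (average days late, average monthly payment).
customers = [(1, 100), (2, 110), (0, 95), (30, 20), (28, 25), (35, 15)]

# Initial centroids are picked by hand to keep the run deterministic.
labels, centers = k_means(customers, centroids=[customers[0], customers[3]])
```

Customers with the same label form one segment, and each segment could then get its own policy, as in the micro-market idea above.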