SYMBIOSIS CENTRE FOR DISTANCE LEARNING (SCDL)

Subject: Data Warehousing and Data Mining

Sample Questions:
Section I: Subjective Questions

1. From the perspective of data warehouse architecture, there are different data
warehouse models. Explain them.
2. There are issues while pre-processing the data as well as for comparing and
evaluating classification and prediction methods. Discuss.
3. Discuss the clustering and outlier analysis methods of data mining.

4. A design methodology consists of phases, each containing a number of steps, which
guide the designer in the techniques appropriate at each stage of the project. Discuss
its various phases.
5. The Apriori algorithm is called a level-wise algorithm. Explain.

6. OLAP tools enable users to analyze multidimensional data interactively from
multiple perspectives. Explain.
7. Different data warehousing systems have different structures. Discuss the various
layers of data warehouse systems.
8. Explain in detail any 3 benefits of dimensional modeling.

Section II: Objective Questions

Multiple Choice Single Response

1. In this system, the balance is the current outstanding balance in the customer’s
account.
1] Accounts receivable
2] Accounts payable
3] Accounts balance
4] Accounts

2. Data used in operational systems is moved from these into a data warehouse
staging area, then into a data warehouse and finally into a set of conformed
data marts.
1] databases
2] records
3] files
4] fields

3. These are intermediate servers that sit between a relational back end server (where
the data in the warehouse is stored) and client front end tools.
1] ROLAP Servers
2] MOLAP Servers
3] OLAP Servers
4] Specialized SQL Servers

4. For an association rule, support indicates:
1] Fraction of transactions that contain an item set
2] Fraction of item set that contain transactions
3] All transactions that contain an item set
4] All item set that contain transactions

5. In this learning, the class label of each training record is predefined.
1] supervised
2] unsupervised
3] labelled
4] unlabelled

6. This learning applies to a dynamic dataset where the class label of the training
data is unknown.
1] Unsupervised
2] supervised
3] labelled
4] unlabelled

7. These databases contain complex texts, graphics, images, video fragments, maps,
voice, music, and other forms of audio/video information.
1] Multimedia
2] Spatial
3] Content
4] Integrated

8. It is an approach wherein products are recommended to customers based on the
opinion of other customers.
1] Collaborative filtering
2] Specialised filtering
3] Product filtering
4] Customer filtering

9. This module is used to analyse and interact with data mining modules to search for
an interesting pattern. It filters data to discover an interesting pattern.
1] Pattern Evaluation Module
2] Search Evaluation Module
3] Filter Evaluation Module
4] Data Discover Module

10. This layer integrates the disparate data sets by transforming the data from the staging
layer, often storing the transformed data in an operational data store (ODS) database.
1] Integration
2] Staging
3] Operational
4] Subject

Multiple Choice Multiple Response

11. Data mining is considered an interdisciplinary field. It includes a set of various
disciplines, such as statistics, database systems, machine learning, visualisation and
information science.
1] statistics
2] database systems
3] machine learning
4] normalization

12. Data mining tasks are generally divided into two categories. These are:
1] Predictive Task
2] Descriptive Task
3] Outlier Task
4] Correlation Task

13. Steps in the construction of an FP tree are:
1] First, the entire database is scanned to find the frequent items and their support counts.
2] Arrange all frequent items in descending order of support count. This
is the list of frequent items, L.
3] Create the root of the tree as null.
4] Create the leaves of the tree as null.
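
For reference, the first FP tree construction steps named in the options above (scan the database for support counts, order the frequent items, create a null root) can be sketched in Python. This is a minimal illustration only: the transactions, the minimum support count of 2, and the names `frequent_items_sorted` and `FPNode` are all assumptions made for the example and are not part of the course material.

```python
from collections import Counter


def frequent_items_sorted(transactions, min_support_count):
    """Scan the database once, count each item's support, and return the
    frequent items in descending order of support count."""
    # Count every item at most once per transaction.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = [(item, c) for item, c in counts.items() if c >= min_support_count]
    # Descending support count; ties are broken alphabetically (an assumption).
    frequent.sort(key=lambda pair: (-pair[1], pair[0]))
    return frequent


class FPNode:
    """A node of the FP tree; the root carries no item (null)."""

    def __init__(self, item=None, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


# Hypothetical transaction database.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["milk", "eggs"],
    ["bread", "butter"],
]

L = frequent_items_sorted(transactions, min_support_count=2)
root = FPNode()  # the root of the tree is created as null
print(L)
```

The remaining steps (inserting each transaction's frequent items into the tree in the order given by L) are not shown here.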

14. Applications of data mining in retail industries are:
1] Identifying the buying patterns of customers
2] Finding associations among customers’ demographic characteristics
3] Identifying the databases of customers
4] Finding non-associations among customers’ records

15. The output of this process is a global logical data model consisting of the following:
1] Entity- Relationship diagram
2] Relational schema
3] Supporting documentation
4] Technical documentation

16. Every association rule has these:
1] Support
2] Confidence
3] Support Count
4] Confidence Count

17. A learning algorithm for decision trees must address the following issues:
1] How to split training record
2] Stopping criteria for splitting attributes
3] Gathering splitting criteria
4] Stopping criteria for general attributes

18. Different types of data can serve as a data source, such as:
1] Operations
2] Web server logs
3] Internal market research data
4] Client data

Fill in the Blanks

19. In ____________, data sits prior to being scrubbed and transformed into a data
warehouse / data mart.
1] Staging Area
2] Data Extraction Layer
3] ETL Layer
4] Data Storage Layer

20. An ____________ constraint ensures data transaction according to the conditions of
the constraints.
1] enable
2] data
3] condition
4] actual

21. ____________ states that every subset of a frequent item set is also frequent.
1] Apriori property
2] Apriori Set
3] Apriori Algorithm
4] Apriori Item
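
The statement in question 21 can be checked on a small example. The following Python sketch assumes a made-up transaction list and a minimum support count of 2; the helper names `support_count` and `nonempty_proper_subsets` are hypothetical and used only for this illustration.

```python
from itertools import combinations


def support_count(itemset, transactions):
    """Number of transactions that contain every item of the item set."""
    return sum(1 for t in transactions if itemset <= t)


def nonempty_proper_subsets(itemset):
    """Yield every non-empty proper subset of the item set."""
    items = sorted(itemset)
    for size in range(1, len(items)):
        for combo in combinations(items, size):
            yield frozenset(combo)


# Hypothetical transaction database.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter", "milk"}),
    frozenset({"milk", "eggs"}),
    frozenset({"bread", "butter"}),
]
min_count = 2  # assumed minimum support count

itemset = frozenset({"bread", "butter"})
is_frequent = support_count(itemset, transactions) >= min_count
subsets_frequent = all(
    support_count(s, transactions) >= min_count
    for s in nonempty_proper_subsets(itemset)
)
print(is_frequent, subsets_frequent)
```

A subset is contained in every transaction that contains the full item set, so its support count can never be lower. This is what lets a level-wise search discard any candidate that has an infrequent subset.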

22. The data stored in the warehouse is uploaded from the ____________ systems.
1] operational
2] financial
3] transactional
4] separate

23. Performance of a query is a primary consideration of ____________ designers.
1] data warehouse
2] data mining
3] data marts
4] database

24. In ____________, data sits prior to being scrubbed and transformed into a data
warehouse / data mart.
1] Staging Area
2] Data Extraction Layer
3] ETL Layer
4] Data Storage Layer

25. A ____________ methodology consists of phases, each containing a number of steps,
which guide the designer in the techniques appropriate at each stage of the project.
1] design
2] analysis
3] requirement
4] phase

26. ____________ helps to identify items that are connected to each other, but it does
not help to find the nature of the connection.
1] Association mining
2] Rule mining
3] Data mining
4] Generalisation mining

27. In ____________, the database contains incomplete data (missing data for some
records) or noisy data, which misleads the data mining process.
1] Noisy data handling
2] Missing data handling
3] Misleading data handling
4] Noisy record handling

28. ____________ is an unsupervised learning technique.
1] Clustering
2] Classification
3] Prediction
4] Categorisation

State True or False

29. The relational model discovers the strong entities in terms of business process
execution, whereas the dimensional model discovers the associative entities that
represent the effect of the business process.
30. A data warehouse usually requires integrating the data from several heterogeneous
sources.
31. Each cell within a multidimensional structure contains aggregated data related to
elements along each of its dimensions.
32. A data mining system may not operate on all operating systems.

33. One of the leading causes of poor query performance is poor I/O design.

34. Query performance is the main parameter for data warehouse analysis.

35. An item set is a group of one or more items.

36. Straight-line regression analysis is a simple method of regression.



37. Concept hierarchies can be used to derive relationships between spatial and
non-spatial attributes.
38. The best split measures are based on the degree of impurity of the child nodes.
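
As an illustration of an impurity-based split measure of the kind statement 38 refers to, the Python sketch below computes the Gini index of child nodes and their weighted average. Gini is only one of several impurity measures, and the class counts and function names are assumptions made for this example.

```python
def gini(class_counts):
    """Gini impurity of one node, given the record count of each class."""
    total = sum(class_counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in class_counts)


def weighted_gini(children):
    """Impurity of a split: the size-weighted average of the children's Gini."""
    total = sum(sum(child) for child in children)
    return sum(sum(child) / total * gini(child) for child in children)


# Hypothetical parent node with 5 records of each class, and one candidate
# split into two child nodes.
parent = [5, 5]
children = [[4, 0], [1, 5]]

print(gini(parent))             # impurity before the split
print(weighted_gini(children))  # impurity after the split
```

Under this measure, a candidate split with lower weighted child impurity is preferred.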

Match the Following

39. Column A
1] Data Storage Layer
2] Data Logic Layer
3] Data Presentation Layer
4] Metadata Layer

Column B
1] where the transformed and cleansed data sit.
2] business rules are stored.
3] the information that reaches the users.
4] information about the data stored in the data warehouse system is stored.
5] Data is pulled from the data source into the data warehouse system.
6] data sits prior to being scrubbed and transformed into a data warehouse / data mart.

40. Column A
1] Association Rules
2] Support
3] Support count
4] Confidence

Column B
1] Implication of the form A => B
2] Fraction of transactions that contain an itemset
3] Frequency of occurrence of an itemset
4] Based on conditional probability
5] Based on general probability
6] Implication of the form A <=> B
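
The measures named in question 40 can be computed for a sample rule. The Python sketch below assumes a small made-up transaction list and the rule {bread} => {milk}; the function names are hypothetical and chosen only for this illustration.

```python
def support_count(itemset, transactions):
    """Number of transactions that contain the whole item set."""
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """Fraction of all transactions that contain the item set."""
    return support_count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Fraction of the transactions containing the antecedent that also
    contain the consequent."""
    both = support_count(antecedent | consequent, transactions)
    return both / support_count(antecedent, transactions)


# Hypothetical transaction database.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter", "milk"}),
    frozenset({"milk", "eggs"}),
    frozenset({"bread", "butter"}),
]

a = frozenset({"bread"})
b = frozenset({"milk"})

print(support_count(a | b, transactions))  # support count of {bread, milk}
print(support(a | b, transactions))        # support of the rule
print(confidence(a, b, transactions))      # confidence of the rule
```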

41. Column A
1] Choose the business process
2] Declare the grain
3] Identify the dimensions
4] Identify the facts

Column B
1] ensure the usability of the dimensional model and the use of the data warehouse.
2] what you are going to build your dimensions and fact table from.
3] define the dimensions of the model.
4] identify the numeric facts that will populate each fact table row.
5] identify the character facts that will populate each fact table row.
6] ensure the usability of the relational model and the use of the data warehouse.

42. Column A
1] Web Usage Mining
2] Web Server Data
3] Application Server Data
4] Application Level Data

Column B
1] technique to find interesting patterns from web data
2] collects user logs and includes IP address, page reference and access time.
3] track various business events and log them
4] tracks individual trends
5] tracks collective trends
6] finds clusters from web data
