SYMBIOSIS CENTRE FOR DISTANCE LEARNING (SCDL)
Subject: Data Warehousing and Data Mining
Sample Questions:
Section I: Subjective Questions
1. From the perspective of data warehouse architecture, there are different data
warehouse models. Explain them.
2. There are issues while pre-processing the data as well as for comparing and
evaluating classification and prediction methods. Discuss.
3. Discuss clustering and outlier detection as data mining techniques.
4. A design methodology consists of phases, each containing a number of steps, which
guide the designer in the techniques appropriate at each stage of the project. Discuss
its various phases.
5. The Apriori algorithm is called a level-wise algorithm. Explain.
6. OLAP tools enable users to analyze multidimensional data interactively from
multiple perspectives. Explain.
7. Different data warehousing systems have different structures. Discuss the various
layers of data warehouse systems.
8. Explain in detail any 3 benefits of dimensional modeling.
Section II: Objective Questions
Multiple Choice Single Response
1. In this system, the balance is the current outstanding balance in the customer’s
account.
1] Accounts receivable
2] Accounts payable
3] Accounts balance
4] Accounts
2. Data is moved from these, which are used in operational systems, into a data
warehouse staging area, then into a data warehouse, and finally into a set of conformed
data marts.
1] databases
2] records
3] files
4] fields
3. These are intermediate servers that sit between a relational back end server (where
the data in the warehouse is stored) and client front end tools.
1] ROLAP Servers
2] MOLAP Servers
3] OLAP Servers
4] Specialized SQL Servers
4. Every association rule has a support value. Support indicates:
1] Fraction of transactions that contain an item set
2] Fraction of item set that contain transactions
3] All transactions that contain an item set
4] All item set that contain transactions
5. In this type of learning, the class label of each training record is predefined.
1] supervised
2] unsupervised
3] labelled
4] unlabelled
6. This type of learning applies to dynamic datasets where the class label of the
training data is unknown.
1] Unsupervised
2] supervised
3] labelled
4] unlabelled
7. These databases contain complex texts, graphics, images, video fragments, maps,
voice, music, and other forms of audio/video information.
1] Multimedia
2] Spatial
3] Content
4] Integrated
8. It is an approach wherein products are recommended to customers based on the
opinion of other customers.
1] Collaborative filtering
2] Specialised filtering
3] Product filtering
4] Customer filtering
9. This module is used to analyse and interact with data mining modules to search for
an interesting pattern. It filters data to discover an interesting pattern.
1] Pattern Evaluation Module
2] Search Evaluation Module
3] Filter Evaluation Module
4] Data Discover Module
10. This layer integrates the disparate data sets by transforming the data from the staging
layer, often storing the transformed data in an operational data store (ODS) database.
1] Integration
2] Staging
3] Operational
4] Subject
Multiple Choice Multiple Response
11. Data mining is considered an interdisciplinary field. It includes a set of various
disciplines, such as:
1] statistics
2] database systems
3] machine learning
4] normalization
12. Data mining tasks are generally divided into two categories. These are:
1] Predictive Task
2] Descriptive Task
3] Outlier Task
4] Correlation Task
13. Steps in the construction of an FP tree are:
1] First, the entire database is scanned to find the frequent items and their support counts.
2] Arrange all frequent items in descending order of support count. This ordered
list of items is L.
3] Create the root of the tree as null.
4] Create the leaves of the tree as null.
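For revision, the FP tree construction steps named in question 13 can be sketched in Python. This is a minimal illustration under stated assumptions, not a reference implementation: the names `FPNode`, `frequent_items_in_order` and `build_fp_tree`, the sample transactions, and the minimum support count of 2 are all hypothetical. The final loop, which inserts each transaction into the tree, goes beyond the steps listed in the question and follows the usual FP-growth description.

```python
from collections import Counter


def frequent_items_in_order(transactions, min_support_count):
    """Scan the database once, keep frequent items, sort by descending support count."""
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {item: c for item, c in counts.items() if c >= min_support_count}
    # Ties are broken alphabetically so the ordering is deterministic.
    return sorted(frequent.items(), key=lambda kv: (-kv[1], kv[0]))


class FPNode:
    """One node of the tree; item=None marks the null root."""

    def __init__(self, item=None, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support_count):
    order = frequent_items_in_order(transactions, min_support_count)
    rank = {item: i for i, (item, _) in enumerate(order)}
    root = FPNode()  # the root of the tree is null
    for t in transactions:
        # Drop infrequent items and reorder the rest according to the list L.
        items = sorted((i for i in set(t) if i in rank), key=rank.get)
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1
            node = child
    return root, order


# Hypothetical sample data.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter"],
    ["bread", "milk", "jam"],
]
root, order = build_fp_tree(transactions, min_support_count=2)
print(order)  # [('bread', 3), ('milk', 3), ('butter', 2)]
```

In this sample, "jam" occurs only once, so it is dropped before the tree is built, and the three transactions that contain "bread" and "milk" share a single path from the root.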
14. Applications of data mining in the retail industry are:
1] Identifying the buying patterns of customers
2] Finding associations among customers’ demographic characteristics
3] Identifying the databases of customers
4] Finding non-associations among customers’ records
15. The output of this process is a global logical data model consisting of the following:
1] Entity-Relationship diagram
2] Relational schema
3] Supporting documentation
4] Technical documentation
16. Every association rule has these:
1] Support
2] Confidence
3] Support Count
4] Confidence Count
17. A learning algorithm for decision trees must address the following issues:
1] How to split the training records
2] Stopping criteria for splitting attributes
3] Gathering splitting criteria
4] Stopping criteria for general attributes
18. Different types of data can serve as a data source, such as:
1] Operations
2] Web server logs
3] Internal market research data
4] Client data
Fill in the Blanks
19. In ____________, data sits prior to being scrubbed and transformed into a data
warehouse / data mart.
1] Staging Area
2] Data Extraction Layer
3] ETL Layer
4] Data Storage Layer
20. An ____________ constraint ensures that data transactions comply with the conditions of
the constraint.
1] enable
2] data
3] condition
4] actual
21. ____________ states that every subset of a frequent item set is also frequent.
1] Apriori property
2] Apriori Set
3] Apriori Algorithm
4] Apriori Item
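For revision, the level-wise search that relies on the property in question 21 can be sketched as follows. This is a minimal Python illustration under stated assumptions, not a definitive implementation: the function name `apriori`, the sample transactions, and the minimum support count of 2 are hypothetical. Level k produces the frequent item sets of size k, and a candidate of size k+1 is pruned as soon as one of its size-k subsets is not frequent.

```python
from itertools import combinations


def apriori(transactions, min_support_count):
    """Return {frequent item set: support count}, found one level (size) at a time."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidate):
        return sum(1 for t in transactions if candidate <= t)

    items = {i for t in transactions for i in t}
    # Level 1: frequent single items.
    level = {frozenset([i]) for i in items
             if count(frozenset([i])) >= min_support_count}
    frequent = {}
    k = 1
    while level:
        for itemset in level:
            frequent[itemset] = count(itemset)
        # Join step: combine frequent k-item sets into (k+1)-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Prune step: every k-item subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k))}
        level = {c for c in candidates if count(c) >= min_support_count}
        k += 1
    return frequent


# Hypothetical sample data.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter"],
    ["bread", "milk", "jam"],
]
frequent = apriori(transactions, min_support_count=2)
for itemset, n in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

With this sample the search stops after level 2, because only one frequent pair remains and no candidate of size 3 can be formed from it.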
22. The data stored in the warehouse is uploaded from the ____________ systems.
1] operational
2] financial
3] transactional
4] separate
23. Performance of a query is a primary consideration of ____________ designers.
1] data warehouse
2] data mining
3] data marts
4] database
24. In ____________, data sits prior to being scrubbed and transformed into a data
warehouse / data mart.
1] Staging Area
2] Data Extraction Layer
3] ETL Layer
25. A ____________ methodology consists of phases, each containing a number of steps,
which guide the designer in the techniques appropriate at each stage of the project.
1] design
2] analysis
3] requirement
4] phase
26. ____________ helps to identify items that are connected to each other, but it does
not help to find the nature of the connection.
1] Association mining
2] Rule mining
3] Data mining
4] Generalisation mining
27. In ____________, the data in the database contains incomplete data called missing
data for some records or noisy data, which misleads the data mining process.
1] Noisy data handling
2] Missing data handling
3] Misleading data handling
4] Noisy record handling
28. ____________ is an unsupervised learning technique.
1] Clustering
2] Classification
3] Prediction
4] Categorisation
State True or False
29. The relational model discovers the strong entities in terms of business process
execution, whereas the dimensional model discovers the associative entities that represent
the effect of the business process.
30. A data warehouse usually requires integrating data from several heterogeneous
sources.
31. Each cell within a multidimensional structure contains aggregated data related to
elements along each of its dimensions.
32. A data mining system may not operate on all operating systems.
33. One of the leading causes of poor query performance is poor I/O design.
34. Query performance is the main parameter for data warehouse analysis.
35. An item set is a group of one or more items.
36. Straight-line regression analysis is a simple method of regression.
37. Concept hierarchies can be used to derive relationships between spatial and
non-spatial attributes.
38. The best split measures are based on the degree of impurity of the child nodes.
Match the Following
39. Column A:
1] (entry missing in source)
2] Data Logic Layer
3] Data Presentation Layer
4] Metadata Layer
Column B:
1] where the transformed and cleansed data sit.
2] business rules are stored.
3] the information that reaches the users.
4] information about the data stored in the data warehouse system is stored.
5] Data is pulled from the data source into the data warehouse system.
6] data sits prior to being scrubbed and transformed into a data warehouse / data mart.
40. Column A:
1] Association Rules
2] Support
3] Support count
4] Confidence
Column B:
1] Implication of the form A => B
2] Fraction of transactions that contain an itemset
3] Frequency of occurrence of an itemset
4] Based on conditional probability
5] Based on general probability
6] Implication of the form A <=> B
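For revision, the measures listed in question 40 can be computed directly from a transaction list. The Python sketch below is illustrative only: the function names and the sample transactions are hypothetical, and it assumes the usual definitions (support count as the number of transactions containing the itemset, support as that count divided by the number of transactions, and confidence of A => B as the support count of A and B together divided by the support count of A).

```python
def support_count(transactions, itemset):
    """Frequency of occurrence: number of transactions that contain the itemset."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= set(t))


def support(transactions, itemset):
    """Fraction of transactions that contain the itemset."""
    return support_count(transactions, itemset) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) for the rule antecedent => consequent.

    Assumes the antecedent occurs in at least one transaction.
    """
    both = support_count(transactions, set(antecedent) | set(consequent))
    return both / support_count(transactions, antecedent)


# Hypothetical sample data.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter"],
    ["bread", "milk", "jam"],
]
print(support_count(transactions, {"bread", "milk"}))  # 3
print(support(transactions, {"bread", "milk"}))        # 0.75
print(confidence(transactions, {"bread"}, {"milk"}))   # 1.0
```

Here every transaction that contains "bread" also contains "milk", so the rule bread => milk has confidence 1.0 even though its support is only 0.75.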
41. Column A:
1] Choose the business process
2] Declare the grain
3] Identify the dimensions
4] Identify the facts
Column B:
1] ensure the usability of the dimensional model and the use of the data warehouse.
2] what you are going to build your dimensions and fact table from.
3] define the dimensions of the model.
4] identify the numeric facts that will populate each fact table row.
5] identify the character facts that will populate each fact table row.
6] ensure the usability of the relational model and the use of the data warehouse.
42. Column A:
1] Web Usage Mining
2] Web Server Data
3] Application Server Data
4] Application Level Data
Column B:
1] technique to find interesting patterns from web data
2] collects user logs and includes IP address, page reference and access time.
3] track various business events and log them
4] tracks individual trends
5] tracks collective trends
6] finds clusters from web data
