Q3
Q2

a)
b)
a)
c)
Q1 a)

d)
b)

72259

so effective?
Note: 1. Question 1 is compulsory.


Explain with appropriate example.



Let status be the class label attribute.


Time: 03 Hours

2. Answer any three out of remaining five questions.


department, status, age, and salary given in that row.


3. Assume any suitable data wherever required and justify the same.

i. Draw a snowflake schema diagram for the data warehouse.

occurrence. Describe various methods for handling this problem.
What are spatial data structures? Outline their importance in GIS.


the count of each generalized data tuple (i.e., of each row entry)?

ii. Use your algorithm to construct a decision tree from the given data.
Paper / Subject Code: 88903 / Data Warehouse and Mining

to list the average grade of CS courses for each DB-University student?

What is Metadata? Why do we need metadata when search engines like Google seem

With respect to web mining, is it possible to detect visual objects using meta-objects?


i. How would you modify the basic decision tree algorithm to take into consideration
replication (synchronous or asynchronous) is better suited for data warehousing? Why?
lowest conceptual level (e.g., for a given student, course, semester, and instructor

For a given row entry, count represents the number of data tuples having the values for
In real-world data, tuples with missing values for some attributes are a common

combination), the avg-grade measure stores the actual course grade of the student. At

OLAP operations (e.g., roll-up from semester to year) should you perform in order

What is the relationship between data warehousing and data replication? Which form of
higher conceptual levels, avg-grade stores the average grade for the given combination.

ii. Starting with the base cuboid [student, course, semester, instructor], what specific
student, course, semester, and instructor, and two measures count and avg-grade. At the

been generalized. For example, “31 ... 35” for age represents the age range of 31 to 35.

[5]
[5]

[5]
[5]

The following table consists of training data from an employee database. The data have [10]
[10]
Suppose that a data warehouse for DB-University consists of the four dimensions [10]
Marks: 80


Q6
Q5
Q4
Q3

a)
a)
a)

b)
b)
b)
b)

Y
X

ii.
i.

4
4

support:

4
8

T8
T7
T6
T5
T4
T3
T2
T1

8

15

4

24
D
does (i) have over (ii)?


B, C
D, B
Nominal attributes

Transaction ID Items

12
24

A, D, E
B, A, D
described by the following:

or simple statistical analysis?



C, A, B, E
E, A, D, B

Briefly outline with example,


5), B3 (6, 4), C1 (1, 2), C2 (4, 9).

D, A, C, E, B

Asymmetric binary attributes
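For reference, a minimal Python sketch of the usual textbook dissimilarity measures for these two attribute types. It assumes the simple matching ratio d = (p - m) / p for nominal attributes and d = (r + s) / (q + r + s) for asymmetric binary attributes; the function names and sample objects are hypothetical and are not part of the question paper.

```python
def nominal_dissimilarity(a, b):
    """d = (p - m) / p, where p is the number of attributes and
    m is the number of attributes on which the two objects match."""
    if len(a) != len(b):
        raise ValueError("objects must have the same number of attributes")
    p = len(a)
    m = sum(1 for x, y in zip(a, b) if x == y)
    return (p - m) / p


def asymmetric_binary_dissimilarity(a, b):
    """d = (r + s) / (q + r + s); negative matches (0, 0) are ignored.

    q: both objects are 1, r: first is 1 and second is 0,
    s: first is 0 and second is 1.
    """
    if len(a) != len(b):
        raise ValueError("objects must have the same number of attributes")
    q = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    r = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)
    s = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)
    if q + r + s == 0:
        return 0.0  # no positive values at all: treat the objects as identical
    return (r + s) / (q + r + s)


# Hypothetical sample objects, used only to exercise the two functions.
nominal_d = nominal_dissimilarity(["red", "small", "round"],
                                  ["red", "large", "round"])
binary_d = asymmetric_binary_dissimilarity([1, 0, 1, 0, 0, 0],
                                           [1, 1, 0, 0, 0, 0])
print(nominal_d, binary_d)
```

Ignoring the (0, 0) matches is what makes the second measure asymmetric: two objects that merely share absent values are not treated as similar.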



two other dimensions of data quality.



***

Use complete linkage algorithm to find the clusters from the following dataset.
The three cluster centers after the first round of execution (ii) The final three clusters.
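An answer to this k-means question can be checked with a short Python sketch. It assumes the eight points and the initial centers A1, B1 and C1 stated in the question, Euclidean distance, and that no cluster ever becomes empty; the function names are hypothetical.

```python
from math import dist

# Points as given in the question; the names map to (x, y) locations.
points = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4),
    "B1": (5, 8), "B2": (7, 5), "B3": (6, 4),
    "C1": (1, 2), "C2": (4, 9),
}
initial_centers = [points["A1"], points["B1"], points["C1"]]


def assign(centers):
    """Assign every point to its nearest center (Euclidean distance)."""
    clusters = [[] for _ in centers]
    for name, p in points.items():
        nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
        clusters[nearest].append(name)
    return clusters


def recompute(clusters):
    """Return the mean point of each cluster (assumes no empty cluster)."""
    return [
        (sum(points[n][0] for n in c) / len(c),
         sum(points[n][1] for n in c) / len(c))
        for c in clusters
    ]


def kmeans(centers):
    """Iterate until the centers stop changing; keep every round."""
    history = []
    while True:
        clusters = assign(centers)
        new_centers = recompute(clusters)
        history.append((clusters, new_centers))
        if new_centers == centers:
            return history
        centers = new_centers


history = kmeans(initial_centers)
first_clusters, first_centers = history[0]
final_clusters, final_centers = history[-1]
print("after round 1:", first_centers)
print("final clusters:", final_clusters)
```

The first entry of `history` answers part (i) and the last entry answers part (ii).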

However, multiple occurrences of an item in the same shopping basket, such as four

one mine frequent itemsets efficiently considering multiple occurrences of items?


(i) converting the decision tree to rules and then pruning the resulting rules, or (ii)
pruning the decision tree and then converting the pruned tree to rules. What advantage

Data quality can be assessed in terms of several issues, including accuracy,


how to compute the dissimilarity between objects

cakes and three jugs of milk, can be important in transactional data analysis. How can
Frequent pattern mining algorithms consider only distinct items in a transaction.
Why is tree pruning useful in decision tree induction? What is a drawback of using a

completeness, and consistency. For each of the above three issues, discuss how data

mining functionalities does this business need (e.g., think of the kinds of patterns that
Differentiate between single linkage, average linkage and complete linkage algorithms.
Generate Frequent Pattern Tree for the following transaction with 30% minimum
separate set of tuples to evaluate pruning? Given a decision tree, you have the option of

as the center of each cluster, respectively. Use the k-means algorithm to show only (i)

could be mined)? Can such patterns be generated alternatively by data query processing
quality assessment can depend on the intended use of the data, giving examples. Propose
into three clusters, where the points are: A1 (2, 10), A2 (2, 5), A3 (8, 4), B1 (5, 8), B2 (7,

The distance function is Euclidean distance. Suppose initially we assign A1, B1, and C1


Present an example where data mining is crucial to the success of a business. What data [10]
[10]
[10]
[10]
[10]
Suppose that the data mining task is to cluster points (with (x, y) representing location) [10]
[10]


Q3
Q1

80309

1
9

3
8
3

11
21

16
13

(x)
Time: 03 Hours


experience
Discovery.

application?

using

doctor in 2010?
Note: 1. Question 1 is compulsory


83
20
90
59
43
64
72
36
57
30
(y)
charges a patient for a visit.

52, 52, 56, 60, 63, 70, 70, 110.
patient, count, charge).
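One possible shape for the SQL query requested in part (iii), run through Python's built-in `sqlite3` module so that it can be tried out. The table follows the schema fee (day, month, year, doctor, hospital, patient, count, charge) from the question; the sample rows and the doctor names are hypothetical and exist only to exercise the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE fee (
        day INTEGER, month INTEGER, year INTEGER,
        doctor TEXT, hospital TEXT, patient TEXT,
        "count" INTEGER, charge REAL
    )
    """
)

# Hypothetical sample rows (not from the question paper).
rows = [
    (1, 1, 2010, "Adams", "H1", "p1", 1, 100.0),
    (2, 1, 2010, "Adams", "H1", "p2", 1, 50.0),
    (3, 2, 2010, "Baker", "H2", "p3", 1, 80.0),
    (4, 3, 2009, "Baker", "H2", "p1", 1, 999.0),  # outside 2010, ignored
]
conn.executemany("INSERT INTO fee VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)

# Total fee collected by each doctor in 2010.
query = """
    SELECT doctor, SUM(charge) AS total_fee
    FROM fee
    WHERE year = 2010
    GROUP BY doctor
    ORDER BY doctor
"""
totals = conn.execute(query).fetchall()
print(totals)
conn.close()
```

Restricting to `year = 2010` and grouping by `doctor` mirrors the OLAP steps of part (ii): roll up time from day to year, slice on 2010, and aggregate away the patient dimension.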


(iii) Show a boxplot of the data.


linear regression.

Years of experience Salary in $100


2. Answer any three out of remaining five questions.



d) Elucidate Market Basket Analysis with an example.

c) A dimension table is wide, the fact table is deep. Explain

3. Assume any suitable data wherever required and justify the same.

(i) Draw a star schema diagram for the above data warehouse.


(i) What are the mean, median, mode and midrange of the data?
Paper / Subject Code: 88903 / Data Warehouse and Mining


(ii) Find the first quartile (Q1) and the third quartile (Q3) of the data.
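Answers to parts (i) and (ii) can be checked with a minimal Python sketch. It assumes the twelve salary values listed in the question (30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110) and, for the quartiles, the median-of-halves convention; other quartile conventions give slightly different values. The helper name `quartiles` is hypothetical.

```python
from statistics import mean, median, multimode

# Salary values (in thousands of dollars) as listed in the question.
salaries = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]


def quartiles(sorted_values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumption: median-of-halves convention; interpolation-based
    conventions would give somewhat different results.
    """
    n = len(sorted_values)
    lower = sorted_values[: n // 2]
    upper = sorted_values[(n + 1) // 2:]
    return median(lower), median(upper)


data = sorted(salaries)
data_mean = mean(data)
data_median = median(data)
data_modes = multimode(data)              # every most-frequent value
data_midrange = (data[0] + data[-1]) / 2  # average of min and max
q1, q3 = quartiles(data)

print(data_mean, data_median, data_modes, data_midrange, q1, q3)
```

The resulting Q1, median and Q3, together with the minimum and maximum, form the five-number summary on which a boxplot is based.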

values for salary (in thousands of dollars), shown in increasing order: 30, 36, 47, 50,
relational database with the schema fee (day, month, year, doctor, hospital,
(iii) To obtain the same list, write an SQL query assuming the data are stored in a
(ii) Starting with the base cuboid [day, doctor, patient], what specific OLAP
operations should be performed in order to list the total fee collected by each
patient, and the two measures count and charge, where charge is the fee that a doctor
b) Describe the steps involved in Data Mining when viewed as a process of knowledge
a) Why is data integration required in a data warehouse, more so than in an operational


a) Suppose that the data for analysis includes the attribute salary. We have the following [10]
b) Develop a model to predict the salary of college graduates with 10 years of work [10]
Q2 a) Suppose that a data warehouse consists of the three dimensions time, doctor and [10]

[5]
[5]
[5]
[5]
Marks: 80

(})


P1

P5
P3
P4
P2

50
40
30
20
10

suitable examples.
8.5
A
2

9
8
TID

technique in detail.

given set of points.



3
2

5
4

1

2,5
Items


1,3,5
2,3,5
1,3,4

1,2,3,5
How is dimensional modeling different?

***
a separate set of tuples to evaluate pruning?

b) Consider the transaction database given below,


b) What is spatial data? Explain CLARANS Extension.


find all frequent itemsets and strong association rules.
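A compact Python sketch of the Apriori procedure for this question. It assumes the five item lists given in the transaction table (1,3,4; 2,3,5; 1,2,3,5; 2,5; 1,3,5), a minimum support count of 2 and a minimum confidence of 60%. The pairing of item lists with TIDs is not needed for the result, so it is left out. The function names are hypothetical.

```python
from itertools import combinations

# Item lists from the transaction table (TID pairing omitted on purpose).
transactions = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}, {1, 3, 5}]
MIN_SUPPORT_COUNT = 2
MIN_CONFIDENCE = 0.6


def support_count(itemset):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)


def apriori():
    """Level-wise search: count, filter, join, prune, repeat."""
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items]
    frequent = {}
    k = 1
    while level:
        counted = {c: support_count(c) for c in level}
        survivors = {c: n for c, n in counted.items()
                     if n >= MIN_SUPPORT_COUNT}
        frequent.update(survivors)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets of size k.
        candidates = {a | b for a in survivors for b in survivors
                      if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself be frequent.
        level = [c for c in candidates
                 if all(frozenset(s) in survivors
                        for s in combinations(c, k - 1))]
    return frequent


def strong_rules(frequent):
    """Rules lhs -> rhs with confidence = sup(itemset) / sup(lhs)."""
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                confidence = count / frequent[lhs]
                if confidence >= MIN_CONFIDENCE:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules


frequent = apriori()
rules = strong_rules(frequent)
for itemset, count in sorted(frequent.items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The prune step is what distinguishes Apriori from brute-force counting: a candidate such as {1, 2, 3} is discarded without scanning the data because its subset {1, 2} is not frequent.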


Use Apriori Algorithm with min-support count 2 and min-confidence

Q5 a) Show the dendrogram created by the complete link clustering algorithm for the

Q4 a) Why is tree pruning useful in decision tree induction? What is a drawback of using


to improve on the effectiveness of search engines and crawlers. Explain Page Rank
60% to


b) What is Web Structure Mining? List the approaches used to structure the web pages [10]
Q6 a) Demonstrate Multidimensional and Multilevel Association Rule Mining with [10]
[10]
[10]
[10]
[10]

b) Why is entity-relationship modeling technique not suitable for the data warehouse? [10]
