Paper / Subject Code: 88903 / Data Warehouse and Mining
72259
Marks: 80

Note: 1. Question 1 is compulsory.
2. Answer any three out of remaining five questions.
3. Assume any suitable data wherever required and justify the same.

Q1 a) What is Metadata? Why do we need metadata when search engines like Google seem so effective? [5]
b) With respect to web mining, is it possible to detect visual objects using meta-objects? [5]
c) What are spatial data structures? Outline their importance in GIS. [5]
d) What is the relationship between data warehousing and data replication? Which form of replication (synchronous or asynchronous) is better suited for data warehousing? Why? [5]

Q2 a) Suppose that a data warehouse for DB-University consists of the four dimensions student, course, semester, and instructor, and two measures count and avg-grade. At the lowest conceptual level (e.g., for a given student, course, semester, and instructor combination), the avg-grade measure stores the actual course grade of the student. At higher conceptual levels, avg-grade stores the average grade for the given combination. [10]
ii. Starting with the base cuboid [student, course, semester, instructor], what specific OLAP operations (e.g., roll-up from semester to year) should you perform in order …
b) In real-world data, tuples with missing values for some attributes are a common occurrence. Describe various methods for handling this problem. [10]

Q3 a) The following table consists of training data from an employee database. The data have been generalized. For example, "31...35" for age represents the age range of 31 to 35. For a given row entry, count represents the number of data tuples having the values for … [10]
i. How would you modify the basic decision tree algorithm to take into consideration the count of each generalized data tuple (i.e., of each row entry)?
ii. Use your algorithm to construct a decision tree from the given data.
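The question on tuples with missing values can be illustrated with a minimal Python sketch of four common strategies: ignore the tuple, fill in a global constant, fill in the attribute mean, and fill in the mean of tuples from the same class. The record layout, the attribute names `income` and `cls`, and the sample values are hypothetical; more elaborate options (e.g., predicting the most probable value with regression or a decision tree) are not shown.

```python
from statistics import mean

# Hypothetical records: None marks a missing value.
ROWS = [
    {"income": 10, "cls": "a"},
    {"income": None, "cls": "a"},
    {"income": 30, "cls": "b"},
    {"income": 20, "cls": "a"},
]


def ignore_tuple(rows, attr):
    """Drop every tuple whose value for `attr` is missing."""
    return [r for r in rows if r[attr] is not None]


def fill_constant(rows, attr, constant="Unknown"):
    """Replace missing values with a global constant label."""
    return [{**r, attr: constant if r[attr] is None else r[attr]} for r in rows]


def fill_mean(rows, attr):
    """Replace missing values with the mean of the known values."""
    m = mean(r[attr] for r in rows if r[attr] is not None)
    return [{**r, attr: m if r[attr] is None else r[attr]} for r in rows]


def fill_class_mean(rows, attr, class_attr):
    """Replace missing values with the mean of known values in the same class.

    Assumes every class has at least one known value for `attr`.
    """
    by_class = {}
    for r in rows:
        if r[attr] is not None:
            by_class.setdefault(r[class_attr], []).append(r[attr])
    means = {c: mean(vals) for c, vals in by_class.items()}
    return [
        {**r, attr: means[r[class_attr]] if r[attr] is None else r[attr]}
        for r in rows
    ]


print(fill_mean(ROWS, "income")[1])
print(fill_class_mean(ROWS, "income", "cls")[1])
```

Each function returns new records and leaves the input untouched, so the strategies can be compared side by side on the same data.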
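For the employee decision-tree question, one common modification is to treat each generalized row as `count` identical tuples, so every class frequency used in entropy and information gain is weighted by `count`. A minimal Python sketch of that weighted information gain follows; the employee table is not recoverable here, so the attribute names and rows are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(class_counts):
    """Entropy (in bits) of a class distribution given as {label: count}."""
    total = sum(class_counts.values())
    return -sum(
        (c / total) * math.log2(c / total) for c in class_counts.values() if c
    )


def info_gain(rows, attribute, target="status", weight="count"):
    """Information gain of `attribute`, where each row stands for `weight` tuples."""
    overall = Counter()
    partitions = defaultdict(Counter)
    for row in rows:
        overall[row[target]] += row[weight]
        partitions[row[attribute]][row[target]] += row[weight]
    total = sum(overall.values())
    expected = sum(
        sum(part.values()) / total * entropy(part) for part in partitions.values()
    )
    return entropy(overall) - expected


# Hypothetical generalized rows (not the exam's table).
ROWS = [
    {"department": "sales", "status": "senior", "count": 3},
    {"department": "sales", "status": "junior", "count": 1},
    {"department": "systems", "status": "junior", "count": 4},
]
print(round(info_gain(ROWS, "department"), 3))
```

Tree construction would then proceed as in ordinary ID3: pick the attribute with the highest weighted gain, split, and recurse, with leaf majorities also computed from summed counts.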
b) Suppose that the data mining task is to cluster points (with (x, y) representing location) into three clusters, where the points are: A1 (2, 10), A2 (2, 5), A3 (8, 4), B1 (5, 8), B2 (7, … The distance function is Euclidean distance. Suppose initially we assign A1, B1, and C1 as the center of each cluster, respectively. Use the k-means algorithm to show only (i) the three cluster centers after the first round of execution and (ii) the final three clusters. [10]

Q4 a) Why is tree pruning useful in decision tree induction? What is a drawback of using a separate set of tuples to evaluate pruning? Given a decision tree, you have the option of … What advantage does (i) have over (ii)? [10]
b) Frequent pattern mining algorithms consider only distinct items in a transaction. However, multiple occurrences of an item in the same shopping basket, such as four cakes and three jugs of milk, can be important in transactional data analysis. How can … [10]

Q5 a) Present an example where data mining is crucial to the success of a business. What data mining functionalities does this business need (e.g., think of the kinds of patterns that could be mined)? Can such patterns be generated alternatively by data query processing … [10]
b) … completeness, and consistency. For each of the above three issues, discuss how data quality assessment can depend on the intended use of the data, giving examples. Propose … [10]

Q6 a) Differentiate between single linkage, average linkage and complete linkage algorithms. Use the complete linkage algorithm to find the clusters from the following dataset. [10]

X     Y
4     4
8     4
15    8
24    4
24    12

b) Generate the Frequent Pattern Tree for the following transactions with 30% minimum support: [10]

Transaction ID    Items
T1                E, A, D, B
T2                D, A, C, E, B
T3                C, A, B, E
T4                B, A, D
T5                D
T6                D, B
T7                A, D, E
T8                B, C

***
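The k-means question can be checked with a small Python sketch. The point list in the question is cut off after A1, A2, A3 and B1 ("B2 (7, …"), so the coordinates used below for B2, B3, C1 and C2 are assumptions for illustration only; the initial centers A1, B1 and C1 and the Euclidean distance follow the question.

```python
import math

# A1, A2, A3, B1 come from the question; B2, B3, C1, C2 are ASSUMED values
# because the source text is truncated after "B2 (7,".
POINTS = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "B1": (5, 8),
    "B2": (7, 5), "B3": (6, 4), "C1": (1, 2), "C2": (4, 9),
}


def assign(points, centers):
    """Put each point in the cluster of its nearest center (Euclidean distance)."""
    clusters = [[] for _ in centers]
    for name, p in points.items():
        nearest = min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
        clusters[nearest].append(name)
    return clusters


def centroid(points, names, fallback):
    """Mean of the named points; keep the old center if the cluster is empty."""
    if not names:
        return fallback
    xs = [points[n][0] for n in names]
    ys = [points[n][1] for n in names]
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def k_means(points, initial_names, max_rounds=100):
    """Return the final clusters and the list of centers after each round."""
    centers = [tuple(map(float, points[n])) for n in initial_names]
    history = []
    for _ in range(max_rounds):
        clusters = assign(points, centers)
        new_centers = [centroid(points, c, old) for c, old in zip(clusters, centers)]
        history.append(new_centers)
        if new_centers == centers:  # centers stopped moving: converged
            break
        centers = new_centers
    return clusters, history


clusters, history = k_means(POINTS, ["A1", "B1", "C1"])
print("Centers after round 1:", history[0])
print("Final clusters:", clusters)
```

`history[0]` answers part (i) and the returned `clusters` answer part (ii), under the assumed coordinates.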
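For the complete linkage question, a naive agglomerative sketch in Python is shown below: at each step it merges the two clusters whose farthest pair of members is closest. It assumes the (X, Y) rows (4, 4), (8, 4), (15, 8), (24, 4), (24, 12) and Euclidean distance; the labels P1 to P5 are hypothetical. Using `min` instead of `max` in `cluster_distance` would give single linkage, and using the mean of the pairwise distances would give average linkage.

```python
import math
from itertools import combinations

# Hypothetical labels P1..P5 for the (X, Y) rows of the dataset.
POINTS = {"P1": (4, 4), "P2": (8, 4), "P3": (15, 8), "P4": (24, 4), "P5": (24, 12)}


def cluster_distance(a, b, points):
    """Complete linkage: the farthest pair of members defines the distance."""
    return max(math.dist(points[p], points[q]) for p in a for q in b)


def complete_linkage(points):
    """Merge the two closest clusters until one remains; return the merge history."""
    clusters = [frozenset([name]) for name in points]
    merges = []
    while len(clusters) > 1:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: cluster_distance(pair[0], pair[1], points),
        )
        distance = cluster_distance(a, b, points)
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a | b)
        merges.append((sorted(a | b), distance))
    return merges


for members, distance in complete_linkage(POINTS):
    print(f"merge {members} at distance {distance:.3f}")
```

The merge history is what a dendrogram plots: each entry gives the members of the new cluster and the height at which it forms.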
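The Frequent Pattern Tree question can be cross-checked with a minimal FP-tree construction in Python. The sketch counts item supports over the eight transactions, keeps items meeting 30% support (30% of 8 rounded up, i.e. 3 transactions), sorts each transaction by descending support with ties broken alphabetically (an assumption; textbooks differ on tie-breaking), and inserts it into a prefix tree. It omits the header table and node links that FP-growth needs for the mining step.

```python
from collections import Counter

# The eight transactions of the question.
TRANSACTIONS = [
    ["E", "A", "D", "B"],
    ["D", "A", "C", "E", "B"],
    ["C", "A", "B", "E"],
    ["B", "A", "D"],
    ["D"],
    ["D", "B"],
    ["A", "D", "E"],
    ["B", "C"],
]


class FPNode:
    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support_pct):
    """Build an FP-tree; returns (root, {frequent item: support count})."""
    # Smallest count that reaches the percentage threshold (integer ceiling).
    min_count = -(-min_support_pct * len(transactions) // 100)
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_count}
    root = FPNode(None, None)
    for t in transactions:
        # Keep frequent items only, ordered by descending support, then by name.
        ordered = sorted((i for i in set(t) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            if item not in node.children:
                node.children[item] = FPNode(item, node)
            node = node.children[item]
            node.count += 1
    return root, frequent


def show(node, depth=0):
    """Print the tree as indented item:count lines."""
    for child in node.children.values():
        print("  " * depth + f"{child.item}:{child.count}")
        show(child, depth + 1)


root, frequent = build_fp_tree(TRANSACTIONS, 30)
print(frequent)
show(root)
```

Because shared prefixes are merged, the shape of the tree does not depend on the order in which the transactions are inserted, only on the item ordering.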
Paper / Subject Code: 88903 / Data Warehouse and Mining
80309
Time: 03 Hours
Marks: 80

Note: 1. Question 1 is compulsory.
2. Answer any three out of remaining five questions.
3. Assume any suitable data wherever required and justify the same.

Q1 a) Why is data integration required in a data warehouse, more so than in an operational application? [5]
b) Describe the steps involved in Data Mining when viewed as a process of Knowledge Discovery. [5]
c) A dimension table is wide, the fact table is deep. Explain. [5]

Q2 a) Suppose that a data warehouse consists of the three dimensions time, doctor and patient, and the two measures count and charge, where charge is the fee that a doctor charges a patient for a visit. [10]
(i) Draw a star schema diagram for the above data warehouse.
(ii) Starting with the base cuboid [day, doctor, patient], what specific OLAP operations should be performed in order to list the total fee collected by each doctor in 2010?
(iii) To obtain the same list, write an SQL query assuming the data are stored in a relational database with the schema fee (day, month, year, doctor, hospital, patient, count, charge).

Q3 a) Suppose that the data for analysis includes the attribute salary. We have the following values for salary (in thousands of dollars), shown in increasing order: 30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110. [10]
(i) What are the mean, median, mode and midrange of the data?
(ii) Find the first quartile (Q1) and the third quartile (Q3) of the data.
b) Develop a model to predict the salary of college graduates with 10 years of work experience using … [10]

(x)    (y)
3      30
8      57
9      64
13     72
3      36
6      43
11     59
21     90
1      20
16     83
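For the SQL part of the doctor/patient question, one possible query is sketched below and run against an in-memory SQLite table through Python's standard `sqlite3` module. It assumes `charge` already holds the fee amount for each row, so the total is `SUM(charge)`; the sample rows are hypothetical. One plausible matching OLAP sequence is to roll up `day` to `year`, slice on `year = 2010`, and roll up `patient` to all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE fee (day INTEGER, month INTEGER, year INTEGER, doctor TEXT, '
    'hospital TEXT, patient TEXT, "count" INTEGER, charge REAL)'
)
# Hypothetical sample rows: (day, month, year, doctor, hospital, patient, count, charge)
conn.executemany(
    "INSERT INTO fee VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (1, 1, 2010, "D1", "H1", "P1", 1, 100.0),
        (2, 3, 2010, "D1", "H1", "P2", 1, 150.0),
        (5, 6, 2010, "D2", "H2", "P1", 1, 200.0),
        (7, 8, 2009, "D1", "H1", "P3", 1, 300.0),
    ],
)

# Total fee collected by each doctor in 2010.
QUERY = """
SELECT doctor, SUM(charge) AS total_fee
FROM fee
WHERE year = 2010
GROUP BY doctor
ORDER BY doctor
"""
result = conn.execute(QUERY).fetchall()
print(result)
```

The 2009 row is excluded by the `WHERE` clause, which plays the role of the slice on year.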
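For the salary statistics question, the values can be checked with Python's `statistics` module. Quartile conventions differ: the sketch uses the nearest-rank definition (the value at 1-based position ceil(p * n) in the sorted list), which gives Q1 = 47 and Q3 = 63 for these 12 values, whereas `statistics.quantiles` with its default method interpolates and returns 47.75 and 68.25.

```python
import math
from statistics import mean, median, multimode

# Salary values (in thousands of dollars) from the question, already sorted.
SALARY = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]


def nearest_rank(sorted_values, p):
    """Nearest-rank percentile: the value at 1-based position ceil(p * n)."""
    rank = math.ceil(p * len(sorted_values))
    return sorted_values[rank - 1]


summary = {
    "mean": mean(SALARY),
    "median": median(SALARY),
    "mode": multimode(SALARY),  # more than one value means the data are multimodal
    "midrange": (min(SALARY) + max(SALARY)) / 2,
    "Q1": nearest_rank(SALARY, 0.25),
    "Q3": nearest_rank(SALARY, 0.75),
}
print(summary)
```

`multimode` returns both 52 and 70 here, so the data are bimodal rather than having a single mode.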
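For the salary prediction question, a straight-line model y = w0 + w1 * x fitted by least squares is one reasonable choice. The sketch below reads x as years of work experience and y as salary, and the (x, y) pairs are assumed, since the extracted source does not reliably preserve how the two columns line up.

```python
# Assumed (x, y) pairs: x = years of work experience, y = salary.
XS = [3, 8, 9, 13, 3, 6, 11, 21, 1, 16]
YS = [30, 57, 64, 72, 36, 43, 59, 90, 20, 83]


def least_squares(xs, ys):
    """Return (w0, w1) minimising the squared error of y = w0 + w1 * x."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    w1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    w0 = y_bar - w1 * x_bar
    return w0, w1


w0, w1 = least_squares(XS, YS)
print(f"y = {w0:.2f} + {w1:.2f}x; prediction for x = 10: {w0 + w1 * 10:.1f}")
```

With these pairs the slope comes out at about 3.54 and the prediction for 10 years of experience at about 58.6.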
Q4 a) Why is tree pruning useful in decision tree induction? What is a drawback of using a separate set of tuples to evaluate pruning? [10]
b) Consider the transaction database given below, … 60% to … [10]

TID: 1, 2, 3, 4, 5
Items: {2, 5}, {1, 3, 5}, {2, 3, 5}, {1, 3, 4}, …

Q5 a) … [10]
[Figure: data points P1 to P5 on axes marked 10 to 50]
b) What is Web Structure Mining? List the approaches used to structure the web pages to improve on the effectiveness of search engines and crawlers. Explain the PageRank technique in detail. [10]

Q6 a) Demonstrate Multidimensional and Multilevel Association Rule Mining with suitable examples. [10]
b) Why is the entity-relationship modeling technique not suitable for the data warehouse? [10]

***
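For the PageRank part of the Web Structure Mining question, a minimal power-iteration sketch in Python is shown below. It uses the normalized form PR(p) = (1 - d)/N + d * sum of PR(q)/L(q) over the pages q that link to p, where L(q) is the number of outgoing links of q, with damping factor d = 0.85 and dangling pages spreading their rank evenly over all pages. The link graph is hypothetical.

```python
def pagerank(links, d=0.85, tol=1e-12, max_iter=200):
    """Power iteration for PageRank; `links` maps a page to the pages it links to."""
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        new = {p: (1 - d) / n for p in pages}
        for p in pages:
            outs = links.get(p, [])
            if outs:
                share = d * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for q in pages:
                    new[q] += d * rank[p] / n
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:
            break
    return rank


# Hypothetical three-page web graph.
LINKS = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
for page, score in sorted(pagerank(LINKS).items(), key=lambda kv: -kv[1]):
    print(page, round(score, 4))
```

Page C ranks highest because it receives links from both A and B, while B, linked only by half of A's outgoing weight, ranks lowest.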