
Assignment No B-09

Aim
Implement k-means for clustering data of children belonging to different age groups.

Pre-requisite
1. Data Mining concepts.
2. K-means clustering algorithm.
3. Programming language basics.

Objective
1. To understand the idea behind the K-means clustering algorithm.
2. To implement a program for the K-means clustering algorithm.

Problem Statement
Implement k-means to cluster data of children belonging to different age groups, so that
each group can perform some specific activities.
Formulate the feature vector from the following parameters:
height, weight, age, IQ.
Formulate the data for 40 children and form 3 clusters.
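
As a rough illustration of the feature vector described above, the sketch below stacks one record per child into a matrix with the columns height, weight, age, IQ. The three records are copied from the Cluster 3 listing in the Output section; the helper name make_feature_matrix and the dictionary keys are hypothetical and do not appear in the assignment program.

```python
import numpy as np

# Column order of the feature vector, as given in the problem statement.
FEATURES = ("height", "weight", "age", "iq")


def make_feature_matrix(records):
    """Stack per-child records into an (n_children, 4) float array."""
    return np.array([[rec[name] for name in FEATURES] for rec in records],
                    dtype=float)


# Three children taken from the Output section (Cluster 3 listing).
children = [
    {"height": 144, "weight": 39, "age": 13, "iq": 80},
    {"height": 148, "weight": 42, "age": 14, "iq": 82},
    {"height": 155, "weight": 45, "age": 15, "iq": 85},
]

X = make_feature_matrix(children)
print(X.shape)  # rows = children, columns = features
```

The full assignment would use 40 such rows, one per child.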

Hardware / Software Used
1. Python.
2. SciPy library.
3. NumPy library.
4. Matplotlib library.

Mathematical Model
M = { s, e, X, Y, DD, NDD, f_me, Mem_shared, Success, Failure, CPUCoreCount }

1. s: Start State - Read the dataset.
2. e: End State - Clustering is done using the K-means approach.
3. X: Input - Dataset having parameters height, weight, age, IQ.
4. Y: Output - Clusters of children data having parameters height, weight, age, IQ.
5. DD: Deterministic Data - Dataset.
6. NDD: Non-Deterministic Data - Number of iterations needed to find the clusters.
7. f_me: x, y = kmeans2(data, k, iter), where
   x = centroids of the k clusters calculated.
   y = calculated cluster number for each feature vector.
   data = input data from the dataset.
   k = number of clusters required.
   iter = number of iterations to be performed for clustering.
8. Mem_shared: Shared memory is used.
9. Success: Clustering is done.
10. Failure: Dataset is not found or clusters are not properly formed.
11. CPUCoreCount: 1.
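
To make the roles of x and y in the model concrete, here is a small sketch of a kmeans2 call on toy data. The six 2-D points and the starting centroids are illustrative assumptions, not the assignment dataset, and minit="matrix" is used here only so the result is reproducible; the assignment program relies on the default random initialisation instead.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

# Toy 2-D data (illustrative only): two tight, well-separated groups.
data = np.array([[1.0, 1.0], [1.2, 0.8], [0.8, 1.2],
                 [5.0, 5.0], [5.2, 4.8], [4.8, 5.2]])

# Hypothetical starting centroids; with minit="matrix" the second argument
# is read as the initial centroid array instead of the cluster count k.
initial_centroids = np.array([[0.0, 0.0], [6.0, 6.0]])

x, y = kmeans2(data, initial_centroids, iter=10, minit="matrix")

print(x)  # centroids, one row per cluster: shape (k, n_features)
print(y)  # cluster index for each input row: shape (n_samples,)
```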

Theory
Clustering
A cluster is a subset of data whose members are grouped together according to some distance
measure. Clustering is the process of organizing data into groups whose members are similar
with respect to some parameters, i.e., each cluster is a collection of data points that are
similar to one another and dissimilar to the points belonging to other clusters. There are many
methods for computing clusters; an important one is K-means. Since clustering partitions the
data into a number of subsets, the data in each subset share common traits, often according to
some defined distance measure.
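
The text leaves the distance measure open. As one possible choice, the sketch below assumes Euclidean distance (the measure implied by the sum-of-squares criterion used by K-means) and compares three feature vectors copied from the Cluster 3 listing in the Output section. The function name euclidean is hypothetical.

```python
import numpy as np


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sqrt(np.sum((a - b) ** 2)))


# Feature vectors in the order height, weight, age, IQ.
child_a = [144, 39, 13, 80]
child_b = [148, 42, 14, 82]
child_c = [155, 45, 15, 85]

# A smaller distance means the two children are more similar.
print(euclidean(child_a, child_b))
print(euclidean(child_a, child_c))
```

Because height and IQ span much larger numeric ranges than age, unscaled distances are dominated by those columns, which is one reason to rescale the features before clustering.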

K-Mean Clustering
K-means clustering is a method of vector quantization used in data mining. It partitions
n observations into k clusters, in which each observation belongs to the cluster with the
nearest mean, and that mean serves as the prototype of the cluster. This results in a
partitioning of the data space into Voronoi cells.
Finding the optimal K-means clustering is NP-hard in general. Efficient heuristic algorithms
are commonly used instead; they converge quickly, but only to a local optimum, which is not
guaranteed to be the best possible clustering. K-means is simple to apply to large data sets.
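
Because the heuristic can settle on different solutions from different starting points, a common remedy is to run it several times and keep the run with the lowest sum of squared distances. The sketch below illustrates that idea under stated assumptions: the helper names sse and best_of_restarts are hypothetical, the blob data are synthetic, and the assignment program itself performs a single run.

```python
import numpy as np
from scipy.cluster.vq import kmeans2


def sse(data, centroids, labels):
    """Sum of squared distances from each point to its assigned centroid."""
    return float(np.sum((data - centroids[labels]) ** 2))


def best_of_restarts(data, k, restarts=5, seed=0):
    """Run kmeans2 from several random starts; keep the lowest-SSE result."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(restarts):
        # Start from k distinct data points chosen at random.
        init = data[rng.choice(len(data), size=k, replace=False)]
        centroids, labels = kmeans2(data, init, iter=20, minit="matrix")
        score = sse(data, centroids, labels)
        if best is None or score < best[0]:
            best = (score, centroids, labels)
    return best


# Synthetic 2-D data: three blobs around assumed centres.
blob_rng = np.random.default_rng(0)
blobs = np.vstack([blob_rng.normal(loc=c, scale=0.3, size=(10, 2))
                   for c in ((0, 0), (5, 5), (0, 5))])

score, centroids, labels = best_of_restarts(blobs, k=3)
print(score)
```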
The k-means algorithm clusters n objects into k partitions based on their attributes,
where k < n.
It assumes that the object attributes form a vector space.
It partitions N data points into K disjoint subsets S_j so as to minimize the
sum-of-squares criterion

$$J = \sum_{j=1}^{K} \sum_{n \in S_j} \lVert x_n - \mu_j \rVert^2$$

where x_n is a vector representing the nth data point and \mu_j is the geometric centroid
of the data points in S_j.
Simply put, k-means clustering groups objects into K groups based on their
attributes/features, where K is a positive integer.
The grouping is done by minimizing the sum of squared distances between each data point
and the corresponding cluster centroid.
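
The minimization described above is usually carried out by alternating two steps: assign every point to its nearest centroid, then move each centroid to the mean of its assigned points. The following is a minimal NumPy sketch of that loop, not the SciPy routine used by the assignment program; the function name lloyd_kmeans, the toy 1-D data, and the starting centroids are all assumptions made for illustration.

```python
import numpy as np


def lloyd_kmeans(data, init_centroids, n_iter=10):
    """Alternate nearest-centroid assignment and centroid update."""
    data = np.asarray(data, dtype=float)
    centroids = np.array(init_centroids, dtype=float)
    for _ in range(n_iter):
        # Squared distance from every point to every centroid: shape (n, k).
        d2 = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(len(centroids)):
            members = data[labels == j]
            if len(members):  # leave an empty cluster's centroid unchanged
                centroids[j] = members.mean(axis=0)
    # Final assignment and the sum-of-squares criterion for the result.
    d2 = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    criterion = float(d2[np.arange(len(data)), labels].sum())
    return centroids, labels, criterion


# Toy 1-D data with two obvious groups (illustrative only).
points = [[1], [2], [3], [10], [11], [12]]
centroids, labels, criterion = lloyd_kmeans(points, init_centroids=[[0], [5]])
print(centroids.ravel(), labels, criterion)
```

On this toy input the point 3 is first assigned to the second centroid and then moves to the first once the centroids are updated, which shows why the two steps are repeated.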

Procedure
Execution of Program: python prgm_name.py

Conclusion
Thus, we implemented K-means to cluster data of children belonging to different age
groups.

Program
=================================================
GROUP B
Assignment No : B9
Title : Implement k-means for clustering data of children belonging to different age groups
to perform some specific activities. Formulate the Feature vector for following parameters:
height, weight, age, IQ. Formulate the data for 40 children to form 3 clusters.
Roll No :
Batch : B
Class : BE ( Computer )
=================================================

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.vq import kmeans2, whiten

# Read the dataset: one child per line, stored as height,weight,age,IQ
data = []
raw_file = open("B9_kmeans_rawdata.csv", "r")
lines = raw_file.readlines()
raw_file.close()
for x in lines:
    abc = []
    temp = x.split(",")
    temp[3] = temp[3].replace("\n", "")
    abc.append(int(temp[0]))
    abc.append(int(temp[1]))
    abc.append(int(temp[2]))
    abc.append(int(temp[3]))
    data.append(abc)

# Scale every feature to unit variance, then form 3 clusters
x, y = kmeans2(whiten(data), 3, iter=1000)

# Count the samples assigned to each cluster
noofsamplesincluster1 = 0
noofsamplesincluster2 = 0
noofsamplesincluster3 = 0
for n in range(len(y)):
    if y[n] == 0:
        noofsamplesincluster1 += 1
    if y[n] == 1:
        noofsamplesincluster2 += 1
    if y[n] == 2:
        noofsamplesincluster3 += 1
print("Cluster 1 =", noofsamplesincluster1, "Cluster 2 =", noofsamplesincluster2,
      "Cluster 3 =", noofsamplesincluster3)

# List the members of each cluster
print("Cluster 1")
for n in range(len(y)):
    if y[n] == 0:
        print("Height:", data[n][0], "Weight:", data[n][1], "Age:", data[n][2], "IQ:", data[n][3])
print("Cluster 2")
for n in range(len(y)):
    if y[n] == 1:
        print("Height:", data[n][0], "Weight:", data[n][1], "Age:", data[n][2], "IQ:", data[n][3])
print("Cluster 3")
for n in range(len(y)):
    if y[n] == 2:
        print("Height:", data[n][0], "Weight:", data[n][1], "Age:", data[n][2], "IQ:", data[n][3])

# Scatter plot of height against weight, coloured by cluster label
height = []
weight = []
for p in data:
    height.append(p[0])
    weight.append(p[1])
plt.scatter(height, weight, c=y)
plt.show()

Output

administrator@administrator-OptiPlex-390:~$ cd Desktop/
administrator@administrator-OptiPlex-390:~/Desktop$ cd CL2-Final-All-Prog/
administrator@administrator-OptiPlex-390:~/Desktop/CL2-Final-All-Prog$ python B9_kmeans.
Cluster 1 = 13 Cluster 2 = 14 Cluster 3 = 13
Cluster 1
Height: 130 Weight: 30 Age: 10 IQ: 70
Height: 140 Weight: 36 Age: 12 IQ: 76
Height: 131 Weight: 31 Age: 10 IQ: 71
Height: 133 Weight: 33 Age: 11 IQ: 73
Height: 141 Weight: 37 Age: 12 IQ: 76
Height: 134 Weight: 34 Age: 11 IQ: 74
Height: 142 Weight: 38 Age: 12 IQ: 77
Height: 135 Weight: 35 Age: 11 IQ: 75
Height: 132 Weight: 32 Age: 10 IQ: 72
Height: 139 Weight: 35 Age: 12 IQ: 75
Height: 132 Weight: 32 Age: 11 IQ: 72
Height: 129 Weight: 29 Age: 10 IQ: 69
Height: 138 Weight: 34 Age: 12 IQ: 74
Cluster 2
Height: 113 Weight: 19 Age: 6 IQ: 61
Height: 116 Weight: 21 Age: 7 IQ: 62
Height: 114 Weight: 20 Age: 6 IQ: 59
Height: 121 Weight: 24 Age: 8 IQ: 65
Height: 111 Weight: 17 Age: 6 IQ: 61
Height: 117 Weight: 22 Age: 7 IQ: 63
Height: 122 Weight: 25 Age: 8 IQ: 66
Height: 115 Weight: 20 Age: 7 IQ: 61
Height: 112 Weight: 18 Age: 6 IQ: 60
Height: 120 Weight: 23 Age: 8 IQ: 64
Height: 124 Weight: 27 Age: 9 IQ: 68
Height: 123 Weight: 26 Age: 8 IQ: 67
Height: 118 Weight: 23 Age: 7 IQ: 64
Height: 125 Weight: 28 Age: 9 IQ: 69
Cluster 3
Height: 144 Weight: 39 Age: 13 IQ: 80
Height: 148 Weight: 42 Age: 14 IQ: 82
Height: 155 Weight: 45 Age: 15 IQ: 85
Height: 145 Weight: 40 Age: 13 IQ: 81
Height: 149 Weight: 43 Age: 14 IQ: 83
Height: 146 Weight: 41 Age: 13 IQ: 82
Height: 156 Weight: 46 Age: 15 IQ: 86
Height: 150 Weight: 44 Age: 14 IQ: 84
Height: 157 Weight: 47 Age: 15 IQ: 87
Height: 154 Weight: 44 Age: 15 IQ: 84
Height: 153 Weight: 43 Age: 15 IQ: 83
Height: 143 Weight: 38 Age: 13 IQ: 79
Height: 147 Weight: 41 Age: 14 IQ: 81
administrator@administrator-OptiPlex-390:~/Desktop/CL2-Final-All-Prog$

Plagiarism Score
