
CLASSIFICATION

IN DATA MINING
SUSHIL KULKARNI

Classification
What is classification?
Model Construction
ID3
Information Theory
Naïve Bayesian Classifier

CLASSIFICATION
PROBLEM

CLASSIFICATION PROBLEM
Given a database D = {t1, t2, …, tn} and a set
of classes C = {C1, …, Cm}, the Classification
Problem is to define a mapping f : D → C
where each ti is assigned to one class.

The task is to create classes and classify the
data into them, with the help of a given set of
data called the training set.

CLASSIFICATION EXAMPLES

o Teachers classify students’ grades as A, B, C, D, or F.
o Identify mushrooms as poisonous or edible.
o Identify individuals with credit risks.

Why Classification? A motivating
application
Credit approval
o A bank wants to classify its customers
based on whether they are expected to
pay back their approved loans
o The history of past customers is used to
train the classifier
o The classifier provides rules, which
identify potentially reliable future
customers
Why Classification? A motivating
application
Credit approval
o Classification rule:
If age = “31...40” and income = high then
credit_rating = excellent
o Future customers
Suhas : age = 35, income = high ⇒ excellent
credit rating
Heena : age = 20, income = medium ⇒ fair
credit rating

Classification — A Two-Step
Process
Model construction: describe a set of
predetermined classes (here Excellent and Fair)
using the training set.

o The model is represented using classification
rules.

Model usage: the constructed model is then used
to classify future or unseen data (see the next
slides).

Supervised Learning

Supervised learning (classification)


o Supervision: The training data (observations,
measurements, etc.) are accompanied by labels
indicating the class of the observations
o New data is classified based on the training set

Classification Process (1):
Model Construction
[Figure: the training data is fed to a classification algorithm, which produces the classifier (model)]

Training data:
NAME    RANK            YEARS  TEACH
Henna   Assistant Prof  3      no
Leena   Assistant Prof  7      yes
Meena   Professor       2      yes
Dinesh  Associate Prof  7      yes
Dinu    Assistant Prof  6      no
Amar    Associate Prof  3      no

Learned model:
IF rank = ‘professor’ OR years > 6 THEN teach = ‘yes’
Classification Process (2): Use
the Model in Prediction

[Figure: the classifier is applied to the testing data and then to unseen data]

Testing data:
NAME    RANK            YEARS  TEACH
Swati   Assistant Prof  2      no
Malika  Associate Prof  7      no
Tina    Professor       5      yes
June    Assistant Prof  7      yes

Unseen data: (Dina, Professor, 4) → the model predicts teach = ‘yes’.
Model Construction: Example
Sr. Gender Age BP Drug
1 M 20 Normal A
2 F 73 Normal B
3 M 37 High A
4 M 33 Low B
5 F 48 High A
6 M 29 Normal A
7 F 52 Normal B
8 M 42 Low B
9 M 61 Normal B
10 F 30 Normal A
11 F 26 Low B
12 M 54 High A
Model Construction: Example
Directed Tree

Blood Pressure?
  High   → Drug A
  Low    → Drug B
  Normal → Age?
             ≤ 40 → Drug A
             > 40 → Drug B
Model Construction: Example

Tree summarizes the following:


o If BP=High prescribe Drug A
o If BP=Low prescribe Drug B
o If BP=Normal and age ≤40 prescribe Drug A else prescribe
Drug B

Two classes ‘Drug A’ and ‘Drug B’ are created.

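The same tree can be read directly as nested if/else logic. A minimal sketch in Python (the function name prescribe and the string labels are illustrative, not part of the original slides):

```python
def prescribe(bp, age):
    """Return the drug suggested by the decision tree above."""
    if bp == "High":
        return "Drug A"
    if bp == "Low":
        return "Drug B"
    # bp == "Normal": split on age
    return "Drug A" if age <= 40 else "Drug B"

# Examples taken from the training table
print(prescribe("High", 48))    # Drug A (row 5: F, 48, High)
print(prescribe("Normal", 61))  # Drug B (row 9: M, 61, Normal)
```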
Model Construction: Example
The tree is constructed from the training data and
has no training error.

o Every rule above is 100% correct on the
training data.

o On practical field data, however, it is unlikely
that we obtain rules with 100% accuracy and
high support.
Model Construction: Example
Accuracy and Support:
o Accuracy is 100% for each of the rules below.

o If BP = High, prescribe Drug A (Support = 3/12)
o If BP = Low, prescribe Drug B (Support = 3/12)
o If BP = Normal and age ≤ 40, prescribe Drug A, else
prescribe Drug B (Support = 3/12 for each branch)
Error and Support
Let t = total number of data points, r = number of data
points in a node, max = number of data points of the
majority class in the node, and min = number of data
points of the minority class in the node. Then
o Accuracy = max / r
o Error = min / r
o Support = max / t

Accuracy and Error are calculated within a node, while
Support is calculated with respect to the total number
of data points in the given set.
Rules with different
accuracy & support
180 data points are split on X:

Node P (X < 60): 115 A, 5 B
  Accuracy = 115/120, Error = 5/120, Support = 115/180

Node Q (X > 60): 58 A, 2 B
  Accuracy = 58/60, Error = 2/60, Support = 58/180
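A small helper function, shown here only as an illustrative sketch (the names node_stats and class_counts are mine), reproduces the figures for nodes P and Q:

```python
def node_stats(class_counts, total_points):
    """Accuracy, error and support of a node, given its per-class counts."""
    r = sum(class_counts)                    # data points in the node
    mx, mn = max(class_counts), min(class_counts)
    return mx / r, mn / r, mx / total_points

# Node P: 115 A and 5 B; Node Q: 58 A and 2 B; 180 points in total
for name, counts in (("P", [115, 5]), ("Q", [58, 2])):
    acc, err, sup = node_stats(counts, 180)
    print(f"Node {name}: accuracy={acc:.3f} error={err:.3f} support={sup:.3f}")
# Node P: accuracy=0.958 error=0.042 support=0.639
# Node Q: accuracy=0.967 error=0.033 support=0.322
```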
Criteria to grow the tree

If the target attribute is categorical, the tree is
called a classification tree.
[E.g. the drug prescribed]

If the target attribute is continuous, the tree is
called a regression tree.
[E.g. income]
CLASSIFICATION
TREES FOR
CATEGORICAL
ATTRIBUTES
DECISION TREE INDUCTION [ID3]
Decision tree generation consists of two phases
o Tree construction
At start, all the training examples are at the root
Partition examples recursively based on
selected attributes
o Tree pruning
• Identify and remove branches that reflect noise
or outliers

Use of decision tree: Classifying an unknown sample


o Test the attribute values of the sample against the
decision tree
Training Dataset
This follows an example from Quinlan’s ID3.
No  age     income  student  credit_rating  buys_computer
1   <=30    high    no       fair           no
2   <=30    high    no       excellent      no
3   31…40   high    no       fair           yes
4   >40     medium  no       fair           yes
5   >40     low     yes      fair           yes
6   >40     low     yes      excellent      no
7   31…40   low     yes      excellent      yes
8   <=30    medium  no       fair           no
9   <=30    low     yes      fair           yes
10  >40     medium  yes      fair           yes
11  <=30    medium  yes      excellent      yes
12  31…40   medium  no       excellent      yes
13  31…40   high    yes      fair           yes
14  >40     medium  no       excellent      no
Output: ID 3 for “buys_computer”
‘no’ and ‘yes’ are the two classes created.

age?
  <=30   → student?
             no  → no
             yes → yes
  31…40  → yes
  >40    → credit_rating?
             excellent → no
             fair      → yes
ANOTHER EXAMPLE:
MARKS
o If x >= 90 then grade = A.
o If 80 <= x < 90 then grade = B.
o If 70 <= x < 80 then grade = C.
o If 60 <= x < 70 then grade = D.
o If x < 60 then grade = F.

x?
  >= 90 → A
  < 90  → x?
            >= 80 → B
            < 80  → x?
                      >= 70 → C
                      < 70  → x?
                                >= 60 → D
                                < 60  → F
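The grade tree collapses to straight-line code; a hypothetical helper that simply mirrors the rules above:

```python
def grade(x):
    """Map a mark x (0-100) to a letter grade, following the tree above."""
    if x >= 90:
        return "A"
    if x >= 80:
        return "B"
    if x >= 70:
        return "C"
    if x >= 60:
        return "D"
    return "F"

print(grade(85))  # B
print(grade(55))  # F
```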
ALGORITHM FOR ID 3
Basic algorithm (a greedy algorithm)
Tree is constructed in a top-down recursive divide-
and-conquer manner
At start, all the training examples are at the root
Attributes are categorical
Samples are partitioned recursively based on
selected attributes
Test attributes are selected on the basis of a
heuristic or statistical measure (e.g., information
gain)

ALGORITHM FOR ID 3

Conditions for stopping partitioning


All samples for a given node belong to the
same class
There are no remaining attributes for
further partitioning – majority voting is
employed for classifying the leaf
There are no samples left

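The algorithm and its stopping conditions can be sketched as a short recursive procedure. This is only an illustrative outline (the function and parameter names are my own), assuming categorical attributes and some attribute-selection heuristic such as the information gain introduced later:

```python
from collections import Counter

def id3(samples, label, attributes, choose_best_attribute):
    """Grow a decision tree; `samples` is a list of dicts, `label` the class key."""
    labels = [s[label] for s in samples]
    # Stop 1: all samples at this node belong to the same class
    if len(set(labels)) == 1:
        return labels[0]
    # Stop 2: no attributes left -> majority voting at the leaf
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    best = choose_best_attribute(samples, label, attributes)
    node = {best: {}}
    # Partition on the observed values of the chosen attribute
    # (so "no samples left" for a branch cannot arise in this sketch)
    for value in {s[best] for s in samples}:
        subset = [s for s in samples if s[best] == value]
        node[best][value] = id3(subset, label,
                                [a for a in attributes if a != best],
                                choose_best_attribute)
    return node
```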
ID 3 : ADVANTAGES

Easy to understand.

Easy to generate rules

ID 3 :
DISADVANTAGES

May suffer from overfitting.

Does not easily handle continuous data.

Can be quite large – pruning is necessary.
INFORMATION
THEORY

INFORMATION THEORY
When all the marbles in the bowl are
mixed up, little information is given.

When the marbles in the bowl are


distributed in different classes , more
information is given.

ENTROPY

Entropy gives an idea of how good a split on an
attribute is when growing the tree.
The classes in our example are ‘yes’ and ‘no’.
BUILDING THE
TREE

Information Gain ID3
Select the attribute with the highest information
gain
Assume there are two classes, P and N
Let the set S contain p elements of class P and n
elements of class N
The amount of information needed to decide whether an
arbitrary object in S belongs to P or N is defined as

I(p, n) = − (p/(p+n)) log2(p/(p+n)) − (n/(p+n)) log2(n/(p+n))
Information Gain in Decision
Tree Induction
Assume that using attribute A, a set S will be
partitioned into sets {S1, S2 , …, Sv}
If Si contains pi elements of P and ni elements of N,
the entropy, or the expected information needed to
classify objects in all sub trees Si is
E(A) = Σ_{i=1..ν} ((pi + ni)/(p + n)) · I(pi, ni)

The encoding information that would be gained by
branching on A is

Gain(A) = I(p, n) − E(A)
Training Dataset
This follows an example from Quinlan’s ID3.
No  age     income  student  credit_rating  buys_computer
1   <=30    high    no       fair           no
2   <=30    high    no       excellent      no
3   31…40   high    no       fair           yes
4   >40     medium  no       fair           yes
5   >40     low     yes      fair           yes
6   >40     low     yes      excellent      no
7   31…40   low     yes      excellent      yes
8   <=30    medium  no       fair           no
9   <=30    low     yes      fair           yes
10  >40     medium  yes      fair           yes
11  <=30    medium  yes      excellent      yes
12  31…40   medium  no       excellent      yes
13  31…40   high    yes      fair           yes
14  >40     medium  no       excellent      no
Attribute Selection by
Information Gain Computation
Class P: buys_computer = “yes”
Class N: buys_computer = “no”

I(p, n) = I(9, 5) = 0.940

Compute the entropy for age:

age     pi  ni  I(pi, ni)
<=30    2   3   0.971
31…40   4   0   0
>40     3   2   0.971

E(age) = (5/14)·I(2,3) + (4/14)·I(4,0) + (5/14)·I(3,2) = 0.69

Hence Gain(age) = I(p, n) − E(age) = 0.250

Similarly
Gain(income) = 0.029
Gain(student) = 0.151
Gain(credit_rating) = 0.048

Age gives the maximum gain, so it is selected as the
splitting attribute.
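These numbers can be reproduced with a few lines of Python; an illustrative sketch whose helper names (info, gain) are my own:

```python
from math import log2

def info(p, n):
    """I(p, n): expected information for a set with p positives and n negatives."""
    total = p + n
    return sum(-c / total * log2(c / total) for c in (p, n) if c)

def gain(partitions, p, n):
    """Gain(A) = I(p, n) - E(A) for a split producing (p_i, n_i) partitions."""
    e = sum((pi + ni) / (p + n) * info(pi, ni) for pi, ni in partitions)
    return info(p, n) - e

print(round(info(9, 5), 3))                            # 0.94
print(round(gain([(2, 3), (4, 0), (3, 2)], 9, 5), 3))  # 0.247, i.e. Gain(age) ~ 0.25
```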
Splitting the samples using
age
age <= 30:
income  student  credit_rating  buys_computer
high    no       fair           no
high    no       excellent      no
medium  no       fair           no
low     yes      fair           yes
medium  yes      excellent      yes

age 31…40 (all labelled yes):
income  student  credit_rating  buys_computer
high    no       fair           yes
low     yes      excellent      yes
medium  no       excellent      yes
high    yes      fair           yes

age > 40:
income  student  credit_rating  buys_computer
medium  no       fair           yes
low     yes      fair           yes
low     yes      excellent      no
medium  yes      fair           yes
medium  no       excellent      no
Output: ID 3 for “buys_computer”

age?
  <=30   → student?
             no  → no
             yes → yes
  31…40  → yes
  >40    → credit_rating?
             excellent → no
             fair      → yes
CART

CART [ CLASSIFICATION AND
REGRESSION TREE]
The algorithm is similar to ID3 but uses the Gini
index, an impurity measure, to select variables.

If the target variable is nominal and has more
than two categories, the option of merging the
target categories into two super-categories may
be considered. This process is called twoing.
Gini Index (IBM Intelligent Miner)
If a data set T contains examples from n
classes, gini index, gini(T) is defined as

gini(T) = 1 − Σ_{j=1..n} pj²

where pj is the relative frequency of class j in T.
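A direct transcription of the formula, as an illustrative sketch:

```python
def gini(class_counts):
    """gini(T) = 1 - sum_j p_j**2, where p_j is the relative frequency of class j."""
    total = sum(class_counts)
    return 1 - sum((c / total) ** 2 for c in class_counts)

print(round(gini([9, 5]), 3))  # 0.459 for the buys_computer data (9 yes, 5 no)
print(round(gini([7, 7]), 3))  # 0.5: maximum impurity for two equally frequent classes
```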
Extracting Classification Rules
from Trees
Represent the knowledge in the form of IF-
THEN rules

One rule is created for each path from the root


to a leaf

Each attribute-value pair along a path forms a


conjunction

Extracting Classification Rules
from Trees
The leaf node holds the class prediction
Rules are easy for humans to understand
Example
IF age = “<=30” AND student = “no” THEN
buys_computer = “no”
IF age = “<=30” AND student = “yes” THEN
buys_computer = “yes”
IF age = “31…40” THEN
buys_computer = “yes”
IF age = “>40” AND credit_rating = “excellent”
THEN buys_computer = “no”
IF age = “>40” AND credit_rating = “fair” THEN
buys_computer = “yes”
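One way to extract the rules is to walk every root-to-leaf path and join the attribute-value tests along the path with AND. A minimal sketch, assuming the nested-dict tree representation used in the earlier ID3 outline:

```python
def extract_rules(tree, path=()):
    """Yield one (conditions, class) rule per root-to-leaf path."""
    if not isinstance(tree, dict):          # a leaf holds the class prediction
        yield path, tree
        return
    (attribute, branches), = tree.items()   # internal node: one attribute, many branches
    for value, subtree in branches.items():
        yield from extract_rules(subtree, path + ((attribute, value),))

buys_tree = {"age": {"<=30": {"student": {"no": "no", "yes": "yes"}},
                     "31…40": "yes",
                     ">40": {"credit_rating": {"excellent": "no", "fair": "yes"}}}}

for conditions, label in extract_rules(buys_tree):
    test = " AND ".join(f'{a} = "{v}"' for a, v in conditions)
    print(f'IF {test} THEN buys_computer = "{label}"')
```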
BAYESIAN
CLASSIFICATION

Classification and Regression
What is classification? What is regression?
Issues regarding classification and regression
Classification by decision tree induction
Bayesian classification
Other classification methods
Regression
What is Bayesian
Classification?
Bayesian classifiers are statistical
classifiers

For each new sample they provide a


probability that the sample belongs to a
class (for all classes)

What is Bayesian
Classification?
Example:

o sample John (age=27, income=high,


student=no, credit_rating=fair)

o P(buys_computer = yes | John) = 20%

o P(buys_computer = no | John) = 80%
Naive Bayesian Classifier
Example: play tennis?
Outlook Temperature Humidity Windy Class
sunny hot high false N
sunny hot high true N
overcast hot high false P
rain mild high false P
rain cool normal false P
rain cool normal true N
overcast cool normal true P
sunny mild high false N
sunny cool normal false P
rain mild normal false P
sunny mild normal true P
overcast mild high true P
overcast hot normal false P
rain mild high true N

Naive Bayesian Classifier
Example

The 9 days with class P:
Outlook   Temperature  Humidity  Windy  Class
overcast  hot          high      false  P
rain      mild         high      false  P
rain      cool         normal    false  P
overcast  cool         normal    true   P
sunny     cool         normal    false  P
rain      mild         normal    false  P
sunny     mild         normal    true   P
overcast  mild         high      true   P
overcast  hot          normal    false  P

The 5 days with class N:
Outlook   Temperature  Humidity  Windy  Class
sunny     hot          high      false  N
sunny     hot          high      true   N
rain      cool         normal    true   N
sunny     mild         high      false  N
rain      mild         high      true   N
Naive Bayesian Classifier
Example
Given the training set, we compute the
probabilities:
Outlook   P    N        Humidity  P    N
sunny     2/9  3/5      high      3/9  4/5
overcast  4/9  0        normal    6/9  1/5
rain      3/9  2/5

Temperature  P    N     Windy   P    N
hot          2/9  2/5   true    3/9  3/5
mild         4/9  2/5   false   6/9  2/5
cool         3/9  1/5

We also have the prior probabilities
P(P) = 9/14 and P(N) = 5/14.
Naive Bayesian Classifier
We write P(A) for the probability of an event A and
P(A | B) for the probability of A conditional on
another event B.
If H is the hypothesis (the class) and E is the evidence
(the combination of attribute values), then

p(H | E) = p(E | H) · p(H) / p(E)

Example: let H be ‘yes’ and let E be the combination of
attribute values for a new day: outlook = sunny,
temperature = cool, humidity = high, windy = true. Call
these four pieces E1, E2, E3 and E4; assuming they are
independent,

p(H | E) = p(E1 | H) · p(E2 | H) · p(E3 | H) · p(E4 | H) · p(H) / p(E)
Naive Bayesian Classifier
The denominator p(E) can be dropped and recovered in a
final normalisation step, since the probabilities over all
hypotheses must sum to 1. Thus

p(H | E) ∝ p(E1 | H) · p(E2 | H) · p(E3 | H) · p(E4 | H) · p(H)
Naive Bayesian Classifier
Example
To classify a new day E:
outlook = sunny, temperature = cool,
humidity = high, windy = false

Prob(P|E) ∝ Prob(P) * Prob(sunny|P) * Prob(cool|P)
            * Prob(high|P) * Prob(false|P)
          = 9/14 * 2/9 * 3/9 * 3/9 * 6/9 = 0.01

Prob(N|E) ∝ Prob(N) * Prob(sunny|N) * Prob(cool|N)
            * Prob(high|N) * Prob(false|N)
          = 5/14 * 3/5 * 1/5 * 4/5 * 2/5 = 0.013
Naive Bayesian Classifier
Example
Probability of ‘Playing’ = 0.01 / (0.01 + 0.013) ≈ 43%

Probability of ‘Not Playing’ = 0.013 / (0.01 + 0.013) ≈ 57%

Therefore E takes the class label N.
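The whole calculation can be checked with a short script; an illustrative sketch whose probability tables are copied from the slides above (exact fractions give 43.6% / 56.4%, which the slides round to 43% / 57%):

```python
# Conditional probability tables P(value | class) read off the earlier slide
p_tables = {"P": {"sunny": 2/9, "cool": 3/9, "high": 3/9, "false": 6/9, "prior": 9/14},
            "N": {"sunny": 3/5, "cool": 1/5, "high": 4/5, "false": 2/5, "prior": 5/14}}

def score(cls, evidence):
    """Unnormalised p(class) * product of p(E_i | class)."""
    s = p_tables[cls]["prior"]
    for e in evidence:
        s *= p_tables[cls][e]
    return s

evidence = ["sunny", "cool", "high", "false"]
scores = {cls: score(cls, evidence) for cls in ("P", "N")}
total = sum(scores.values())
for cls, s in scores.items():
    print(cls, round(s, 4), f"{s / total:.1%}")
# P 0.0106 43.6%
# N 0.0137 56.4%
```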
Naive Bayesian Classifier
Example
Second example: X = <rain, hot, high, false>

P(X|p) · P(p) = P(rain|p) · P(hot|p) · P(high|p) · P(false|p) · P(p)
              = 3/9 · 2/9 · 3/9 · 6/9 · 9/14 = 0.010582

P(X|n) · P(n) = P(rain|n) · P(hot|n) · P(high|n) · P(false|n) · P(n)
              = 2/5 · 2/5 · 4/5 · 2/5 · 5/14 = 0.018286

Sample X is classified in class N (don’t play).
Naive Bayesian Classifier
Example
Probability of ‘Playing’ = 0.010582 / (0.010582 + 0.018286) ≈ 37%

Probability of ‘Not Playing’ = 0.018286 / (0.010582 + 0.018286) ≈ 63%

Therefore X takes the class label N.
REGRESSION

What Is Regression?
Regression is similar to classification:
o First, construct a model
o Second, use the model to predict unknown values
Main regression methods:
• Linear and multiple regression
• Non-linear regression
Regression is different from classification:
o Classification predicts a categorical class label
o Regression models continuous-valued functions
Predictive Modeling in
Databases
Predictive modeling: Predict data values or
construct generalized linear models based
on the database data.
One can only predict value ranges or
category distributions
Determine the major factors which influence
the regression
o Data relevance analysis: uncertainty
measurement, entropy analysis, expert
judgement, etc.
Regression Analysis and Log-
Linear Models in Regression
Linear regression: Y = α + β X

The two parameters α and β specify the line and are
to be estimated from the data at hand, using the
least-squares criterion on the known values
(x1, y1), (x2, y2), ..., (xs, ys):

β = Σ_{i=1..s} (xi − x̄)(yi − ȳ) / Σ_{i=1..s} (xi − x̄)²

α = ȳ − β·x̄
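The two estimates can be computed directly from these formulas. An illustrative sketch with made-up sample data (the function and variable names are mine):

```python
def fit_line(xs, ys):
    """Least-squares estimates of beta (slope) and alpha (intercept) for Y = alpha + beta*X."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    beta = num / den
    alpha = y_bar - beta * x_bar
    return alpha, beta

# Hypothetical data: years of experience vs. salary (in thousands)
xs = [3, 8, 9, 13, 3, 6, 11, 21, 1, 16]
ys = [30, 57, 64, 72, 36, 43, 59, 90, 20, 83]
alpha, beta = fit_line(xs, ys)
print(round(alpha, 2), round(beta, 2))   # 23.21 3.54, i.e. Y ≈ 23.21 + 3.54·X
```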
Regression Analysis and Log-
Linear Models in Regression
Multiple regression: Y = b0 + b1·X1 + b2·X2.
Many nonlinear functions can be transformed into
the above.
E.g. Y = b0 + b1·X + b2·X² + b3·X³, with X1 = X, X2 = X², X3 = X³.

Log-linear models:
The multi-way table of joint probabilities is
approximated by a product of lower-order tables.
Probability: p(a, b, c, d) = α_ab · β_ac · χ_ad · δ_bcd
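Such a cubic can be fitted as an ordinary multiple regression once X, X² and X³ are treated as three separate predictors. An illustrative sketch using numpy's least-squares solver on synthetic data (numpy is assumed to be available):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = 2 + 0.5 * x - 0.3 * x**2 + 0.1 * x**3      # synthetic data following a known cubic

# Design matrix with columns 1, X1=X, X2=X^2, X3=X^3: Y = b0 + b1*X1 + b2*X2 + b3*X3
A = np.column_stack([np.ones_like(x), x, x**2, x**3])
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.round(coeffs, 3))   # ~ [2.0, 0.5, -0.3, 0.1]
```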
T H A N K S !

SUSHIL KULKARNI
