
Module 12

Machine Learning

Lesson 36
Rule Induction and Decision Tree - II

Splitting Functions
What attribute is the best one to split the data on? Let us recall some definitions from information theory. The entropy, a measure of the uncertainty associated with a random variable X, is defined as

H(X) = -Σi pi log2 pi

where the logarithm is taken in base 2. This is the average amount of information, or entropy, of a finite complete probability scheme. We will use an entropy-based splitting function. Consider the previous example:

[Figure: the sample S and the candidate split points t1, t2 and t3]
Size divides the sample in two:
S1 = {6P, 0NP}    S2 = {3P, 5NP}
H(S1) = 0
H(S2) = -(3/8)log2(3/8) - (5/8)log2(5/8)
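To check these numbers, here is a minimal Python sketch of the entropy computation (the helper name entropy and the way class counts are passed in are illustrative choices, not part of the lesson):

from math import log2

def entropy(counts):
    # H = -sum(p_i * log2(p_i)) over the class proportions p_i
    total = sum(counts)
    return sum(-(c / total) * log2(c / total) for c in counts if c > 0)

# Subsets produced by the size split: S1 = {6P, 0NP}, S2 = {3P, 5NP}
print(entropy([6, 0]))   # H(S1) = 0.0
print(entropy([3, 5]))   # H(S2) ≈ 0.954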


[Figure: the sample partitioned into S1, S2 and S3 by the splits t1, t2 and t3]
Humidity divides the sample in three:
S1 = {2P, 2NP}    S2 = {5P, 0NP}    S3 = {2P, 3NP}
H(S1) = 1
H(S2) = 0
H(S3) = -(2/5)log2(2/5) - (3/5)log2(3/5)

Let us define information gain as follows. The information gain IG over attribute A is

IG(A) = H(S) - Σv (|Sv|/|S|) H(Sv)

where H(S) is the entropy of all examples and H(Sv) is the entropy of one subsample after partitioning S based on all possible values of attribute A. Consider the previous example:

[Figure: the sample S split into S1 and S2 by the size attribute (split points t1, t2, t3)]

We have:
H(S1) = 0
H(S2) = -(3/8)log2(3/8) - (5/8)log2(5/8)
H(S) = -(9/14)log2(9/14) - (5/14)log2(5/14)
|S1|/|S| = 6/14    |S2|/|S| = 8/14

The principle for decision tree construction may be stated as follows: order the splits (attribute and value of the attribute) in decreasing order of information gain.
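Reusing the illustrative entropy helper sketched earlier, the information gain of the size split can be computed directly from these quantities (the function name information_gain is again an illustrative choice):

def information_gain(parent_counts, child_counts):
    # IG(A) = H(S) - sum over v of (|Sv|/|S|) * H(Sv)
    total = sum(parent_counts)
    remainder = sum((sum(c) / total) * entropy(c) for c in child_counts)
    return entropy(parent_counts) - remainder

# S = {9P, 5NP}; size splits it into S1 = {6P, 0NP} and S2 = {3P, 5NP}
print(information_gain([9, 5], [[6, 0], [3, 5]]))   # ≈ 0.395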

12.3.4 Decision Tree Pruning


Practical issues while building a decision tree can be enumerated as follows:
1) How deep should the tree be?
2) How do we handle continuous attributes?
3) What is a good splitting function?
4) What happens when attribute values are missing?
5) How do we improve the computational efficiency?

The depth of the tree is related to its generalization capability. If the depth is not carefully chosen, the tree may overfit. A tree overfits the data if we let it grow deep enough that it begins to capture aberrations in the data that harm its predictive power on unseen examples:

[Figure: a few examples, possibly just noise, force the tree to grow larger (extra splits t2, t3) in order to capture them]

There are two main solutions to overfitting in a decision tree:

1) Stop growing the tree early, before it begins to overfit the data.
   + In practice this solution is hard to implement because it is not clear what a good stopping point is.
2) Grow the tree until the algorithm stops, even if the overfitting problem shows up, and then prune the tree as a post-processing step.
   + This method has found great popularity in the machine learning community.

A common decision tree pruning algorithm is described below (a sketch in Python follows at the end of this lesson):
1. Consider all internal nodes in the tree.
2. For each node, check whether removing it (along with the subtree below it) and assigning the most common class to it does not harm accuracy on the validation set.
3. Pick the node n* that yields the best performance and prune its subtree.
4. Go back to (2) until no more improvements are possible.

Decision trees are appropriate for problems where:
- Attributes are both numeric and nominal.
- The target function takes on a discrete number of values.
- A DNF representation is effective in representing the target concept.
- The data may have errors.
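To make the pruning loop above concrete, here is a minimal Python sketch of it. The tree representation (a Node with children, a majority_class and a predict method) and the accuracy helper are assumptions made for illustration; the lesson itself does not fix a data structure.

class Node:
    # A decision-tree node; a leaf has no children and predicts its majority class.
    def __init__(self, attribute=None, children=None, majority_class=None):
        self.attribute = attribute        # attribute tested at this node (None for a leaf)
        self.children = children or {}    # maps an attribute value to a child Node
        self.majority_class = majority_class

    def predict(self, example):
        if not self.children:
            return self.majority_class
        child = self.children.get(example.get(self.attribute))
        return child.predict(example) if child else self.majority_class

def accuracy(tree, validation):
    # Fraction of (example, label) pairs in the validation set predicted correctly.
    return sum(tree.predict(x) == y for x, y in validation) / len(validation)

def internal_nodes(node):
    # Yield every non-leaf node in the tree.
    if node.children:
        yield node
        for child in node.children.values():
            yield from internal_nodes(child)

def reduced_error_prune(tree, validation):
    # Repeatedly prune the internal node whose removal helps (or at least does not
    # hurt) validation accuracy the most, until no such node remains (steps 1-4 above).
    while True:
        best_node, best_acc = None, accuracy(tree, validation)
        for node in internal_nodes(tree):
            saved = node.children
            node.children = {}            # temporarily turn the node into a leaf
            acc = accuracy(tree, validation)
            node.children = saved         # restore the subtree
            if acc >= best_acc:
                best_node, best_acc = node, acc
        if best_node is None:
            return tree
        best_node.children = {}           # prune the subtree below n* permanently

Note that each node keeps its most common (majority) class, so turning it into a leaf amounts to assigning that class to it, exactly as in step 2 of the algorithm.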

