
Prepared by Kristian Guillaumier

Dept. of Intelligent Computer Systems

University of Malta

2011

Most material in these slides adapted from:

[1] Machine Learning: Tom Mitchell (get this book).

[2] Introduction to Machine Learning: Ethem Alpaydin.

[3] Introduction to Expert Systems: Peter Jackson.

[4] An Introduction to Fuzzy Logic and Fuzzy Sets: Buckley, Eslami.

[5] Pattern Recognition and Machine Learning:

Christopher Bishop.

[6] Grammatical Inference: Colin de la Higuera.

[7] Artificial Intelligence: Negnevitsky.

Miscellaneous web references.

Kristian Guillaumier, 2011 2

CONCEPT LEARNING, FIND-S, CANDIDATE ELIMINATION
(main source: Mitchell [1])


Note on Induction

Induction: if a large number of items seen so far all possess some property, then we conclude that ALL items possess that property.

"So far, the sun has always risen."
"Are all swans white?"

We never know whether our induction is true (we have not proven it).

In machine learning:

Input: a number of training examples for some function.
Output: a hypothesis that approximates the function.

Concept Learning

Learning: inducing general functions from specific training examples (+ve and/or -ve).

Concept learning: induce the definition of a general category (e.g. cat) from a sample of +ve and -ve training data.

Search a space of potential hypotheses (the hypothesis space) for a hypothesis that best fits the training data provided.

Each concept can be viewed as the description of some subset (there is a general-to-specific ordering) defined over a larger set. E.g.:

Cat ⊆ Feline ⊆ Animal ⊆ Object

Equivalently, a concept is a boolean function over the larger set, e.g. the IsCat() function over the set of animals.

Concept learning = learning this function from training data.

Example

Learn the concept: good days when I like to swim.

We want to learn the function:

IsGoodDay(input) → true/false.

Our hypothesis is represented as a conjunction of constraints on attributes. The attributes and their values are:

Sky → Sunny/Rainy/Cloudy.
AirTemp → Warm/Cold.
Humidity → High/Normal.
Wind → Strong/Weak.
Water → Warm/Cool.
Forecast → Same/Change.

Two other possible constraints (values) are allowed for an attribute:

?: I don't care, any value is acceptable.
∅: no value is acceptable.

Example

Our hypothesis, then, is a vector of constraints on these attributes:

<Sky, AirTemp, Humidity, Wind, Water, Forecast>

An example of a hypothesis (accept only warm days with normal humidity) is:

<?, Warm, Normal, ?, ?, ?>

The most general hypothesis is:

<?, ?, ?, ?, ?, ?>

The most specific hypothesis is:

<∅, ∅, ∅, ∅, ∅, ∅>

Example

Training Examples:

Sky   | AirTemp | Humidity | Wind   | Water | Forecast | IsGoodDay
Sunny | Warm    | Normal   | Strong | Warm  | Same     | Yes
Sunny | Warm    | High     | Strong | Warm  | Same     | Yes
Rainy | Cold    | High     | Strong | Warm  | Change   | No
Sunny | Warm    | High     | Strong | Cool  | Change   | Yes

Notation

The set of all items over which the concept is defined is called the set of instances, X. E.g.:

The set of all days represented by the attributes Sky, AirTemp, Humidity, …
The set of all animals, etc.

An instance in X is denoted by x (x ∈ X).

The concept to be learnt (e.g. cats over animals, good days to swim over all days) is called the target concept, denoted by c (note that c is the target function).

c is a Boolean-valued function defined over the instances X, i.e. the function c takes an instance x ∈ X; in our example:

c(x) = 1 if IsGoodDay is Yes.
c(x) = 0 if IsGoodDay is No.

Notation

When learning the target concept c, the learner is given a training set that consists of:

A number of instances x from X.
For each instance, the value of the target concept c(x).

Instances where c(x) = 1 are called +ve training examples (members of the target concept).
Instances where c(x) = 0 are called -ve training examples (non-members).

A training example (an instance and its target concept value) is usually denoted by:

<x, c(x)>

Notation

Given the set of training examples, we want to learn (hypothesize) the target concept c.

H is the set of all hypotheses that we are considering (all the possible combinations of <Sky, AirTemp, Humidity, Wind, Water, Forecast>).

Each hypothesis in H is denoted by h and is usually a boolean-valued function h: X → {0, 1}.

The goal of the learner is to find an h such that:

h(x) = c(x) for all x ∈ X.

Concept Learning Task (Mitchell pg. 22)


The Inductive Learning Hypothesis

Our main goal is to find a hypothesis h that is identical to c for all x ∈ X (for every possible instance).

The only information we have on c is the training data. What about unseen instances (where we don't have training data)?

At best, we can guarantee that our learner will learn a hypothesis that fits the training data exactly (not good enough).

We therefore make an assumption, the inductive learning hypothesis:

Any hypothesis that approximates the target function well over a sufficiently large set of training examples will also approximate the target function well for other unobserved/unseen examples.

Size and Ordering of the Search Space

We must search the hypothesis space for the best hypothesis (one that matches c).

The size of the hypothesis space is determined by the hypothesis representation. Recall:

Sky → Sunny/Rainy/Cloudy (3 options).
AirTemp → Warm/Cold (2 options).
Humidity → High/Normal (2 options).
Wind → Strong/Weak (2 options).
Water → Warm/Cool (2 options).
Forecast → Same/Change (2 options).

This means that we have 3·2·2·2·2·2 = 96 distinct instances.

With the addition of the symbols ? and ∅, we have 5·4·4·4·4·4 = 5120 syntactically distinct hypotheses.

However, note that any hypothesis containing one or more ∅s by definition classifies every instance as -ve, so all such hypotheses represent the same (empty) concept. The number of semantically distinct hypotheses is therefore:

1 + 4·3·3·3·3·3 = 973

(The 1 counts all the definitely-negative hypotheses, since they contain ∅; each attribute then contributes its distinct values + 1 for the ?.)
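The counting above can be checked mechanically. A minimal sketch in plain Python, mirroring the attribute domains defined earlier:

```python
# Attribute domains: Sky has 3 values, the other five attributes have 2 each.
domain_sizes = [3, 2, 2, 2, 2, 2]

# Distinct instances: the product of the domain sizes.
instances = 1
for n in domain_sizes:
    instances *= n

# Syntactically distinct hypotheses: each attribute allows its values
# plus '?' and the null constraint, i.e. n + 2 choices per attribute.
syntactic = 1
for n in domain_sizes:
    syntactic *= n + 2

# Semantically distinct hypotheses: every hypothesis containing a null
# constraint denotes the same empty concept (counted once); otherwise each
# attribute allows its values plus '?', i.e. n + 1 choices.
semantic = 1
for n in domain_sizes:
    semantic *= n + 1
semantic += 1

print(instances, syntactic, semantic)  # 96 5120 973
```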

Size and Ordering of the Search Space

Consider the following hypotheses:

h1: <Sunny, ?, ?, Strong, ?, ?>
h2: <Sunny, ?, ?, ?, ?, ?>

Consider the sets of instances that are classified as +ve by h1 and by h2.

Clearly, since h2 has fewer constraints, it will classify more instances as +ve than h1. In particular, anything that is classified as +ve by h1 will also be classified as +ve by h2.

We say that h2 is more general than h1.

This allows us to organize the search space (order the set) according to this relationship between the hypotheses in it.

This ordering is very important because there are concept learning algorithms that make use of it.

Size and Ordering of the Search Space

For any instance x ∈ X and hypothesis h ∈ H, x satisfies h iff h(x) = 1.

The "is more general than or equal to" relationship is denoted by ≥g.

Given two hypotheses hj and hk, hj ≥g hk iff any instance that satisfies hk also satisfies hj:

hj ≥g hk  iff  ∀x ∈ X, hk(x) = 1 → hj(x) = 1

Size and Ordering of the Search Space

Just as we have defined ≥g, it is useful to define "strictly more general than" (>g), "more specific than or equal to" (≤g), etc.

Size and Ordering of the Search Space


[Diagram: the set of all instances X, with each hypothesis corresponding to a subset of X. h2 contains h1 and h3 because it is more general; h1 and h3 are not more general or more specific than each other.]

Size and Ordering of the Search Space

Notes:

≥g is a partial order over the hypothesis space H (it is reflexive, antisymmetric, and transitive).

Reflexivity, or Reflexive Relation
A reflexive relation is a binary relation R over a set S where every element in S is related to itself. That is, ∀x ∈ S, xRx holds.
For example, the relation ≤ over Z+ is reflexive because ∀x ∈ Z+, x ≤ x.

Transitivity, or Transitive Relation
A transitive relation is a binary relation R over a set S where ∀a, b, c ∈ S: aRb ∧ bRc → aRc.
For example, the relation ≤ over Z+ is transitive because ∀a, b, c ∈ Z+, if a ≤ b and b ≤ c then a ≤ c.

Antisymmetry, or Antisymmetric Relation
An antisymmetric relation is a binary relation R over a set S where ∀a, b ∈ S, aRb ∧ bRa → a = b.
An equivalent way of stating this is that ∀a, b ∈ S, aRb ∧ a ≠ b → ¬(bRa).
For example, the relation ≤ over Z+ is antisymmetric because ∀a, b ∈ Z+, if a ≤ b and b ≤ a then a = b.
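For the conjunctive hypotheses used here (with no ∅ constraints), the ≥g test has a simple syntactic characterisation: hj ≥g hk iff, attribute by attribute, hj's constraint is either '?' or identical to hk's. A small sketch using h1 and h2 from the earlier slide:

```python
def more_general_or_equal(hj, hk):
    """hj >=_g hk for conjunctive hypotheses without null constraints:
    every constraint of hj is '?' or equals the corresponding one in hk."""
    return all(a == '?' or a == b for a, b in zip(hj, hk))

h1 = ('Sunny', '?', '?', 'Strong', '?', '?')
h2 = ('Sunny', '?', '?', '?', '?', '?')

print(more_general_or_equal(h2, h1))  # True: h2 >=_g h1
print(more_general_or_equal(h1, h2))  # False: h1 constrains Wind, h2 does not
```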

The Find-S Algorithm

Initialise h to the most specific hypothesis in H
For each +ve training instance x
    For each attribute constraint a_i in h
        If the constraint a_i is satisfied by x Then
            Do nothing
        Else
            Replace a_i in h by the next more general
            constraint that is satisfied by x
Return h

The Find-S Algorithm

Init h to the most specific hypothesis:

h = <∅, ∅, ∅, ∅, ∅, ∅>

Start with the first +ve training example:

x = <Sunny, Warm, Normal, Strong, Warm, Same>

Consider the first attribute, Sky. Our hypothesis says ∅ (most specific) but the training example says Sunny (more general), so the constraint in the hypothesis is not satisfied by x. Replace it with the next more general constraint after ∅ that is satisfied by x, which is Sunny.

Repeat for all the attributes (AirTemp, Humidity, …) until we get:

h = <Sunny, Warm, Normal, Strong, Warm, Same>

After having covered the 1st training example, h is more general than what we started with. However, it is still very specific: it will classify every possible instance as -ve except the one +ve training example it has seen.

Continue with the next +ve training example.

The Find-S Algorithm

The next example is:

x = <Sunny, Warm, High, Strong, Warm, Same>

Recall that so far:

h = <Sunny, Warm, Normal, Strong, Warm, Same>

Loop through all the attributes: the 1st is satisfied, do nothing; the 2nd is satisfied, do nothing; the 3rd is not satisfied, pick the next more general constraint (?); etc.

x = <Sunny, Warm, High, Strong, Warm, Same>
h = <Sunny, Warm, Normal, Strong, Warm, Same>

new h:

h = <Sunny, Warm, ?, Strong, Warm, Same>

Completing the algorithm (the 4th example is handled the same way), we get the hypothesis:

h = <Sunny, Warm, ?, Strong, ?, ?>
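The whole run can be sketched as a short program. A minimal version for this conjunctive representation (the training set is the one from the table above; '0' stands in for the ∅ constraint):

```python
NULL = '0'  # stands for the "no value acceptable" constraint

def find_s(examples):
    """Find-S for conjunctive hypotheses: start maximally specific and
    minimally generalise on each positive example; ignore negatives."""
    h = None
    for x, positive in examples:
        if not positive:
            continue  # Find-S skips negative examples
        if h is None:
            h = tuple(x)  # minimal generalisation of <0,0,0,0,0,0>
        else:
            h = tuple(hi if hi == xi else '?' for hi, xi in zip(h, x))
    return h if h is not None else (NULL,) * 6

train = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'), True),
    (('Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same'), True),
    (('Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change'), False),
    (('Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change'), True),
]
print(find_s(train))  # ('Sunny', 'Warm', '?', 'Strong', '?', '?')
```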

The Find-S Algorithm - Observation

Remember that the algorithm skipped the 3rd training example because it was -ve.

However, we observe that the hypothesis we had generated so far was already consistent with this -ve training example.

After considering the 2nd training example, our hypothesis was:

h = <Sunny, Warm, ?, Strong, Warm, Same>

The 3rd training example (that was skipped) was:

x = <Rainy, Cold, High, Strong, Warm, Change>

Note that h is already consistent with this training example (it classifies x as -ve).

The Find-S Algorithm - Observation

As long as the hypothesis space H contains a hypothesis that describes the target concept c, and there are no errors in the training data, the current hypothesis can never be inconsistent with a -ve training example.

To see why:

h is the most specific hypothesis in H that is consistent with the currently observed training examples.
Since we assume the target concept c is in H and is (obviously) consistent with the +ve training examples, c must be ≥g h.
But c will never cover a -ve example, so neither will h (by the definition of ≥g).
In other words, if a more general hypothesis does not misclassify a -ve example, a more specific one cannot misclassify it either.

Consider Animal ≥g Cat:

If cat, then animal.
If animal, then maybe cat.
If not cat, then maybe not animal.
If not animal, then not cat (if a -ve example is correctly classified by the general hypothesis, then it is correctly classified by the specific one).

The Find-S Algorithm - Issues Raised

Find-S is guaranteed to find the most specific h ∈ H that is consistent with the +ve and -ve training examples, assuming that the training examples are correct (no noise).

Issues:

We don't know whether we have found the only consistent hypothesis. There might be other hypotheses that are consistent.
Find-S finds the most specific hypothesis consistent with the training data. Why not the most general consistent hypothesis? Why not something in between?
How do we know if the training data is consistent? In real-life cases, training data may contain noise or errors.
There might be more than one maximally specific hypothesis: which one do we pick?

The Candidate Elimination (CE) Algorithm

Note that although the hypothesis output by Find-S is consistent with the training data, it is only one of the possibly many hypotheses that are consistent.

CE will output (a description of) all the hypotheses consistent with the training data.

Interestingly, it does so without enumerating the whole space.

CE finds all describable hypotheses that are consistent with the observed training examples.

Defn: h is consistent with a set of training examples D iff h(x) = c(x) for each example <x, c(x)> in D:

Consistent(h, D) ≡ ∀<x, c(x)> ∈ D, h(x) = c(x)

The Candidate Elimination (CE) Algorithm

The subset of all the hypotheses that are consistent with the training data (what CE finds) is called the version space with respect to the hypothesis space H and the training data D; it contains all the possible, consistent versions of the target concept.

Defn: the version space, denoted VS_{H,D}, with respect to the hypothesis space H and the training data D, is the subset of hypotheses from H consistent with the training data in D:

VS_{H,D} ≡ {h ∈ H | Consistent(h, D)}

The List-Then-Eliminate (LTE) Algorithm

A possible representation of a version space is a listing of all the elements (hypotheses) in it.

List-Then-Eliminate:

VersionSpace = a list of every hypothesis in H
For each training example <x, c(x)>:
    remove from VersionSpace any hypothesis h
    where h(x) ≠ c(x)
Return VersionSpace

The List-Then-Eliminate (LTE) Algorithm

LTE can be applied whenever the hypothesis space is finite (not always the case).

It has the advantage of simplicity, and it is guaranteed to output all the hypotheses consistent with the training data.

However, enumerating all the hypotheses in H is unrealistic for all but the most trivial cases.

We need a more compact representation.
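For this toy problem, though, the space is small enough to enumerate, which lets us verify the claims below. A sketch of List-Then-Eliminate by brute force (the all-∅ hypothesis is omitted since it can never be consistent with a positive example):

```python
from itertools import product

DOMAINS = (('Sunny', 'Cloudy', 'Rainy'), ('Warm', 'Cold'), ('Normal', 'High'),
           ('Strong', 'Weak'), ('Warm', 'Cool'), ('Same', 'Change'))

def covers(h, x):
    """h classifies instance x as +ve."""
    return all(c == '?' or c == v for c, v in zip(h, x))

train = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'), True),
    (('Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same'), True),
    (('Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change'), False),
    (('Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change'), True),
]

# List-Then-Eliminate: enumerate every hypothesis, keep the consistent ones.
version_space = [h for h in product(*[d + ('?',) for d in DOMAINS])
                 if all(covers(h, x) == y for x, y in train)]
print(len(version_space))  # 6
```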

Compact Representation of a Version Space

Recall that in our previous example, Find-S found the hypothesis:

h = <Sunny, Warm, ?, Strong, ?, ?>

This is only one of 6 possible hypotheses that are consistent with the training examples.

We can illustrate the 6 possible hypotheses in the next diagram.

Compact Representation of a Version Space

[Diagram: the 6 hypotheses of the version space.]

Compact Representation of a Version Space

[Diagram: the version space with the most specific hypothesis at one end and the most general at the other; each arrow represents the ≥g relation.]

Compact Representation of a Version Space

Given only the 2 sets S and G, we can generate all the hypotheses in between. Try it!

Compact Representation of a Version Space

Intuitively, we see that by having these general and specific boundaries we can generate the whole version space (check the sketch proof in Mitchell).

A few definitions.

Defn: the general boundary G (remember that G is a set) with respect to a hypothesis space H and training data D is the set of maximally general members of H consistent with D:

G ≡ {g ∈ H | Consistent(g, D) ∧ ¬(∃g′ ∈ H)[(g′ >g g) ∧ Consistent(g′, D)]}

Defn: the specific boundary S with respect to a hypothesis space H and training data D is the set of minimally general (i.e. maximally specific) members of H consistent with D:

S ≡ {s ∈ H | Consistent(s, D) ∧ ¬(∃s′ ∈ H)[(s >g s′) ∧ Consistent(s′, D)]}

(Back to) The Candidate Elimination (CE) Algorithm

CE computes the version space containing all the hypotheses in H that are consistent with the observed training data D.

First we initialise G (remember it is a set) to contain the most general hypothesis possible:

G0 = {<?, ?, ?, ?, ?, ?>}

Then we initialise S to contain the most specific hypothesis possible:

S0 = {<∅, ∅, ∅, ∅, ∅, ∅>}

The Candidate Elimination (CE) Algorithm

So far, the two boundaries delimit the whole hypothesis space (every h in H is between G0 and S0).

As each training example is considered, the boundary sets S and G are generalised and specialised respectively, to eliminate from the version space any hypothesis in H that is inconsistent.

At the end, we'll end up with the correct boundary sets.


The Candidate Elimination (CE) Algorithm

After init:

S0 = {<∅, ∅, ∅, ∅, ∅, ∅>}
G0 = {<?, ?, ?, ?, ?, ?>}

Consider the first training example:

Sky   | AirTemp | Humidity | Wind   | Water | Forecast | IsGoodDay
Sunny | Warm    | Normal   | Strong | Warm  | Same     | Yes

It is positive, so:

The Candidate Elimination (CE) Algorithm

Part 1: all hypotheses in G are consistent with the training example, so we don't remove anything.

Part 2: there is only one s in S (<∅, ∅, ∅, ∅, ∅, ∅>), and it is inconsistent:

Remove <∅, ∅, ∅, ∅, ∅, ∅> from S, leaving S = {}.
Add to S all minimal generalisations h of s: i.e. we add <Sunny, Warm, Normal, Strong, Warm, Same> to S.
Remove from S any hypothesis that is more general than any other hypothesis in S: there is only 1 hypothesis in S, so we do nothing.

So far we got:

S1 = {<Sunny, Warm, Normal, Strong, Warm, Same>}
G1 = {<?, ?, ?, ?, ?, ?>}

The Candidate Elimination (CE) Algorithm

Read the second training example. It is positive as well:

Sky   | AirTemp | Humidity | Wind   | Water | Forecast | IsGoodDay
Sunny | Warm    | High     | Strong | Warm  | Same     | Yes

Part 1: all hypotheses in G are consistent with the training example, so we don't remove anything.

Part 2: there is only one s in S (<Sunny, Warm, Normal, Strong, Warm, Same>), and it is inconsistent:

Remove it from S, leaving S = {}.
Add to S all minimal generalisations h of s: i.e. we add <Sunny, Warm, ?, Strong, Warm, Same> to S.
Remove from S any hypothesis that is more general than any other hypothesis in S: there is only 1 hypothesis in S, so we do nothing.

So far we got:

S2 = {<Sunny, Warm, ?, Strong, Warm, Same>}
G2 = {<?, ?, ?, ?, ?, ?>}

The Candidate Elimination (CE) Algorithm

We notice that the role of the +ve training examples is to make the S boundary more general, and the role of the -ve training examples is to make the G boundary more specific.

Consider the 3rd training example, which is -ve:

Sky   | AirTemp | Humidity | Wind   | Water | Forecast | IsGoodDay
Rainy | Cold    | High     | Strong | Warm  | Change   | No

Step 1: S contains <Sunny, Warm, ?, Strong, Warm, Same>, which is consistent because it labels the training example as No, so we do nothing.

Step 2:

The hypothesis in G that is not consistent with d is <?, ?, ?, ?, ?, ?>, because it labels the example as Yes. Remove it, leaving G = {}.
Add to G all minimal specialisations of g.

Continued…

The Candidate Elimination (CE) Algorithm

The g we removed is <?, ?, ?, ?, ?, ?>. All the minimal specialisations of it (remember we want to label the training example <Rainy, Cold, High, Strong, Warm, Change> as No) would be:

Sky: <Sunny, ?, ?, ?, ?, ?>, <Cloudy, ?, ?, ?, ?, ?>
AirTemp: <?, Warm, ?, ?, ?, ?>
Humidity: <?, ?, Normal, ?, ?, ?>
Wind: <?, ?, ?, Weak, ?, ?>
Water: <?, ?, ?, ?, Cool, ?>
Forecast: <?, ?, ?, ?, ?, Same>

However, not all these minimal specialisations go into the new G.

Only <Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?> and <?, ?, ?, ?, ?, Same> go into the new G.

<Cloudy, ?, ?, ?, ?, ?>, <?, ?, Normal, ?, ?, ?>, <?, ?, ?, Weak, ?, ?> and <?, ?, ?, ?, Cool, ?> are not part of the new G.

Why? Because they are inconsistent with the previously encountered training examples (so far we have seen training items 1 and 2):

<Cloudy, ?, ?, ?, ?, ?> is inconsistent with training items 1 and 2.
<?, ?, Normal, ?, ?, ?> is inconsistent with training item 2.
<?, ?, ?, Weak, ?, ?> is inconsistent with training items 1 and 2.
<?, ?, ?, ?, Cool, ?> is inconsistent with training items 1 and 2.

The Candidate Elimination (CE) Algorithm

Sky   | AirTemp | Humidity | Wind   | Water | Forecast | IsGoodDay
Sunny | Warm    | Normal   | Strong | Warm  | Same     | Yes
Sunny | Warm    | High     | Strong | Warm  | Same     | Yes
Rainy | Cold    | High     | Strong | Warm  | Change   | No
Sunny | Warm    | High     | Strong | Cool  | Change   | Yes

The Candidate Elimination (CE) Algorithm

So far we got:

S3 = {<Sunny, Warm, ?, Strong, Warm, Same>}
G3 = {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Same>}

The Candidate Elimination (CE) Algorithm

After processing the 4th training item, we get:

S4 = {<Sunny, Warm, ?, Strong, ?, ?>}
G4 = {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}

The Candidate Elimination (CE) Algorithm

The entire version space derived from the boundaries is:

{<Sunny, Warm, ?, Strong, ?, ?>,
 <Sunny, ?, ?, Strong, ?, ?>, <Sunny, Warm, ?, ?, ?, ?>, <?, Warm, ?, Strong, ?, ?>,
 <Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}
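The boundary updates traced above can be sketched as a runnable program. This is a compact reconstruction for the conjunctive representation used here, not Mitchell's exact pseudocode ('0' stands in for the ∅ constraint):

```python
DOMAINS = (('Sunny', 'Cloudy', 'Rainy'), ('Warm', 'Cold'), ('Normal', 'High'),
           ('Strong', 'Weak'), ('Warm', 'Cool'), ('Same', 'Change'))
NULL = '0'  # stands for the "no value acceptable" constraint

def covers(h, x):
    return all(c == '?' or c == v for c, v in zip(h, x))

def more_general_eq(hj, hk):
    # hj >=_g hk; a NULL constraint in hk is below everything.
    return all(a == '?' or a == b or b == NULL for a, b in zip(hj, hk))

def candidate_elimination(examples):
    S = {(NULL,) * len(DOMAINS)}
    G = {('?',) * len(DOMAINS)}
    for x, positive in examples:
        if positive:
            G = {g for g in G if covers(g, x)}      # drop inconsistent g
            new_S = set()
            for s in S:
                if covers(s, x):
                    new_S.add(s)
                    continue
                # minimal generalisation of s that covers x
                gen = tuple(xi if si == NULL else (si if si == xi else '?')
                            for si, xi in zip(s, x))
                if any(more_general_eq(g, gen) for g in G):
                    new_S.add(gen)
            # keep only the maximally specific members
            S = {s for s in new_S
                 if not any(t != s and more_general_eq(s, t) for t in new_S)}
        else:
            S = {s for s in S if not covers(s, x)}  # drop inconsistent s
            new_G = set()
            for g in G:
                if not covers(g, x):
                    new_G.add(g)
                    continue
                # minimal specialisations of g that exclude x
                for i, dom in enumerate(DOMAINS):
                    if g[i] == '?':
                        for v in dom:
                            if v != x[i]:
                                spec = g[:i] + (v,) + g[i + 1:]
                                if any(more_general_eq(spec, s) for s in S):
                                    new_G.add(spec)
            # keep only the maximally general members
            G = {g for g in new_G
                 if not any(t != g and more_general_eq(t, g) for t in new_G)}
    return S, G

train = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'), True),
    (('Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same'), True),
    (('Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change'), False),
    (('Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change'), True),
]
S, G = candidate_elimination(train)
# S == {('Sunny', 'Warm', '?', 'Strong', '?', '?')}
# G == {('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')}
```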

Converging

CE will converge towards the target concept if:

There are no errors in the training data.
The target concept is in H.

The target concept is exactly learnt when S and G converge to a single, identical hypothesis.

If the training data contains errors, e.g. a +ve example incorrectly labeled as -ve:

The algorithm will remove the correct target concept from the version space.
Eventually, given enough training data, we will detect the inconsistency because the S and G boundaries will converge to an empty version space (i.e. there is no hypothesis in H that is consistent with all the training examples).

Requesting Training Examples

So far, our algorithm was given a fixed set of labeled training data.

Suppose instead that our algorithm can come up with an instance and ask (query) an external oracle to label it.

What instance should the algorithm present to the oracle?

Requesting Training Examples

Consider the version space we got from the 4 fixed training examples we had.

What training example would we like to have in order to refine it further?

We should come up with an instance that will be classified as +ve by some hypotheses and as -ve by others, so as to reduce the size of the version space.

Requesting Training Examples

Suppose we request the training example:

<Sunny, Warm, Normal, Weak, Warm, Same>

3 hypotheses would classify it as +ve and 3 would classify it as -ve.

Requesting Training Examples

So if we ask the oracle to classify:

<Sunny, Warm, Normal, Weak, Warm, Same>

we'd either generalise the S boundary or specialise the G boundary, shrinking the size of the version space (making it converge).

In general, the optimal instance we'd like the oracle to classify (the best training example to have next) is the one that would halve the size of the version space.

If we have this option, we can converge to the target concept in about log2 |VS| queries.

Partially Learned Concepts

Partially learned = we didn't converge to the target concept (S and G are not the same).

Our previous example is a partially learned concept.

Partially Learned Concepts

It is possible to classify unseen examples with a degree of certainty.

Suppose we want to classify the instance (not in the training data)

<Sunny, Warm, Normal, Strong, Cool, Change>

using our partially learned concept.

Notice that every hypothesis in the version space classifies this unseen instance as +ve. So all the hypotheses classify it as +ve with the same confidence as if only the target concept had remained (i.e. as if we had converged).

Partially Learned Concepts

Sky   | AirTemp | Humidity | Wind   | Water | Forecast | Classification
Sunny | Warm    | Normal   | Strong | Cool  | Change   | All hypotheses in the version space classify it as +ve.
Rainy | Cold    | Normal   | Weak   | Warm  | Same     | All hypotheses in the version space classify it as -ve.
Sunny | Warm    | Normal   | Weak   | Warm  | Same     | 3 +ve, 3 -ve (50/50); need more training examples. Note: this is an optimal query to request from an oracle.
Sunny | Cold    | Normal   | Strong | Warm  | Same     | 2 +ve, 4 -ve; possibly take a majority vote and output a confidence level.
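Mechanically, the vote counts in the table come from testing each query instance against the six hypotheses of the version space. A sketch (the version space is hard-coded from the result derived earlier):

```python
# The six hypotheses of the version space: S, the middle layer, then G.
version_space = [
    ('Sunny', 'Warm', '?', 'Strong', '?', '?'),   # S
    ('Sunny', '?', '?', 'Strong', '?', '?'),
    ('Sunny', 'Warm', '?', '?', '?', '?'),
    ('?', 'Warm', '?', 'Strong', '?', '?'),
    ('Sunny', '?', '?', '?', '?', '?'),           # G
    ('?', 'Warm', '?', '?', '?', '?'),            # G
]

def covers(h, x):
    return all(c == '?' or c == v for c, v in zip(h, x))

queries = [
    ('Sunny', 'Warm', 'Normal', 'Strong', 'Cool', 'Change'),
    ('Rainy', 'Cold', 'Normal', 'Weak', 'Warm', 'Same'),
    ('Sunny', 'Warm', 'Normal', 'Weak', 'Warm', 'Same'),
    ('Sunny', 'Cold', 'Normal', 'Strong', 'Warm', 'Same'),
]
for x in queries:
    pos = sum(covers(h, x) for h in version_space)
    print(pos, 6 - pos)  # 6 0, then 0 6, then 3 3, then 2 4
```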

Inductive Bias

Recall that our system, so far, works under the assumption that the target concept exists in our hypothesis space.

Also recall that our hypothesis space allows only for conjunctions (ANDs) of attribute values.

There is no way to allow for a disjunction of values: we cannot say Sky=Cloudy OR Sky=Sunny.

Consider what would happen if, in fact, I like swimming when it is cloudy or sunny. I'd get something like…

Inductive Bias

CE will converge to an empty version space: the target concept is not in the hypothesis space.

Sky    | AirTemp | Humidity | Wind   | Water | Forecast | IsGoodDay
Sunny  | Warm    | Normal   | Strong | Cool  | Change   | Yes
Cloudy | Warm    | Normal   | Strong | Cool  | Change   | Yes
Rainy  | Warm    | Normal   | Strong | Cool  | Change   | No

To see why: the most specific hypothesis that classifies the first two examples as +ve is:

<?, Warm, Normal, Strong, Cool, Change>

Although it is maximally specific for the first 2 examples, it is already too general: it classifies the 3rd example as +ve too.

The problem is that we biased our learner to consider only hypotheses that are conjunctions.
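Brute-force enumeration (as in List-Then-Eliminate) confirms that no conjunctive hypothesis is consistent with these three examples. A sketch:

```python
from itertools import product

DOMAINS = (('Sunny', 'Cloudy', 'Rainy'), ('Warm', 'Cold'), ('Normal', 'High'),
           ('Strong', 'Weak'), ('Warm', 'Cool'), ('Same', 'Change'))

def covers(h, x):
    return all(c == '?' or c == v for c, v in zip(h, x))

# The three examples above: the target is "Sky is Sunny OR Cloudy".
train = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Cool', 'Change'), True),
    (('Cloudy', 'Warm', 'Normal', 'Strong', 'Cool', 'Change'), True),
    (('Rainy', 'Warm', 'Normal', 'Strong', 'Cool', 'Change'), False),
]

# No conjunctive hypothesis is consistent with all three examples.
consistent = [h for h in product(*[d + ('?',) for d in DOMAINS])
              if all(covers(h, x) == y for x, y in train)]
print(len(consistent))  # 0
```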

Unbiased Learning

Let's see what happens if, to make sure that the target concept definitely exists in the hypothesis space, we define the hypothesis space to contain every possible concept.

This means it must be possible to represent every possible subset of X.

In our previous example (containing 6 attributes), the size of the instance space is 96.

How many possible concepts can be defined over this set of instances? The power set!

Recall that, in general, the size of a power set is 2^|X|.

So there are 2^96 (ouch) possible concepts that can be learnt from our instance space.

We had seen that by introducing ? and ∅, we allowed for only 973 possible concepts, which is vastly smaller than 2^96 (we had a very strong bias).

Unbiased Learning

Let's define a new hypothesis space H′ that can represent every subset of instances, i.e. H′ = P(X).

To do this, we allow H′ to contain any combination of disjunctions, conjunctions and negations. E.g. the target concept Sky=Sunny OR Sky=Cloudy would be:

<Sunny, ?, ?, ?, ?, ?> ∨ <Cloudy, ?, ?, ?, ?, ?>

So we can use CE knowing that our target concept will definitely exist in the hypothesis space. But…

We create a new problem: our learner will learn how to classify exactly the instances presented as training examples and will not generalise beyond them!

Unbiased Learning

To see why, suppose I have 5 training examples d1, d2, d3, d4, d5, where d1, d2, d3 are +ve examples and d4, d5 are -ve examples.

The S boundary will become the disjunction of the +ve examples (since that is the most specific possible hypothesis that covers them):

S = {(d1 ∨ d2 ∨ d3)}

The G boundary will become the negation (ruling out) of the -ve training examples:

G = {¬(d4 ∨ d5)}

So the only unambiguously classifiable instances are those that were provided as training examples.

Unbiased Learning

What would happen if we use the partially learned concept and take a vote?

Instances that were originally in the training data will be classified unambiguously (obviously).

Any other instance not in the training data will be classified as +ve by half of the hypotheses in the version space and as -ve by the other half:

Note that H is the power set of X.
Let x be some unobserved instance (not in the training data), and let h be some hypothesis in the version space that covers x.
Then there is also a corresponding hypothesis h′ in the version space that is identical to h except that it does not cover x; h and h′ agree on all the training data, so both are in the version space, but they disagree on x.
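This half-and-half behaviour is easy to demonstrate on a deliberately tiny instance space (two boolean attributes, so |X| = 4 and the unbiased H has 2^4 = 16 concepts); a sketch:

```python
from itertools import chain, combinations, product

X = list(product([0, 1], repeat=2))  # instance space: 4 instances

# Unbiased hypothesis space: every subset of X (the power set, 16 concepts).
H = [frozenset(s) for s in
     chain.from_iterable(combinations(X, r) for r in range(len(X) + 1))]

# Training data: one positive and one negative example.
train = {(0, 0): True, (1, 1): False}

# Version space: subsets that contain the positive and exclude the negative.
version_space = [h for h in H
                 if all((x in h) == label for x, label in train.items())]
print(len(version_space))  # 4

# Every unseen instance is classified +ve by exactly half of the version space.
for x in X:
    if x not in train:
        pos = sum(x in h for h in version_space)
        print(x, pos, len(version_space) - pos)  # 2 +ve vs 2 -ve each
```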

More on Bias

Straight from Mitchell:

"A learner that makes no a priori assumptions regarding the identity of the target concept has no rational basis for classifying any unseen instances."

(In fact, CE worked because we biased it with the assumption that the target concept can be represented by a conjunction of attribute values.)

More on Bias

Consider:

L = a learning algorithm.
D_c = {<x, c(x)>} = a set of training data for some target concept c.
x_i = some instance we wish to classify.
L(x_i, D_c) = the classification (+ve/-ve) that L assigns to x_i after learning from the training data D_c.

The inductive inference step is:

(D_c ∧ x_i) ≻ L(x_i, D_c)

where a ≻ b denotes that b is inductively inferred from a.

So the inductive inference step reads: given the training data D_c and the instance x_i as inputs to L, we can inductively infer the classification of the instance.

More on Bias

Sky   | AirTemp | Humidity | Wind   | Water | Forecast | IsGoodDay
Sunny | Warm    | Normal   | Strong | Warm  | Same     | Yes
Sunny | Warm    | High     | Strong | Warm  | Same     | Yes
Rainy | Cold    | High     | Strong | Warm  | Change   | No
Sunny | Warm    | High     | Strong | Cool  | Change   | Yes

(D_c ∧ x_i) ≻ L(x_i, D_c)

More on Bias

Because L is an inductive learning algorithm, in general we cannot prove that the result L(x_i, D_c) is correct, i.e. the classification of the example does not necessarily follow deductively from the training data (cannot be proven from it).

However, we can add a number of assumptions to our system so that the classification would follow deductively.

The inductive bias of L is defined as this set of assumptions.

More on Bias

Let B = these assumptions (e.g. "the hypothesis space is made up only of conjunctions of attribute values").

Then the inductive bias of L is B, giving:

(B ∧ D_c ∧ x_i) ⊢ L(x_i, D_c)

where the notation a ⊢ b denotes that b follows deductively from a (b is provable from a).

Defn. of Inductive Bias

Consider a concept learning algorithm L for the set of instances X.

Let c be an arbitrary concept over X, and let D_c = {<x, c(x)>} be an arbitrary set of training examples of c.

Let L(x_i, D_c) denote the classification assigned to the instance x_i by L after training on the data D_c.

The Inductive Bias of L is any minimal set of assertions B such that for any target concept c and corresponding training examples D_c:

(∀x_i ∈ X)[(B ∧ D_c ∧ x_i) ⊢ L(x_i, D_c)]

The Inductive Bias of CE

Let us specify what L(x_i, D_c) means for CE (i.e., how classification works).

Given training data D_c, CE will compute the version space VS_{H,Dc}.

It will then classify a new instance x_i by taking a vote amongst the hypotheses in this version space.

A classification (+ve or -ve) is output only if all the hypotheses in the version space unanimously agree; otherwise no classification is output ("I can't tell from the training data").

The inductive bias of CE is that the target concept c is contained in the hypothesis space, i.e. c ∈ H.

Why?
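The voting scheme above can be sketched as follows (illustrative only: the attribute set is reduced from the earlier table, and the version space is found by brute-force enumeration of all conjunctive hypotheses rather than via CE's S and G boundary sets):

```python
from itertools import product

# Reduced EnjoySport-style attribute domains (illustrative subset).
domains = {
    "Sky": ["Sunny", "Rainy"],
    "AirTemp": ["Warm", "Cold"],
    "Water": ["Warm", "Cool"],
}
attrs = list(domains)

# Conjunctive hypotheses: each attribute is a specific value or "?" (anything).
H = [dict(zip(attrs, vals))
     for vals in product(*[domains[a] + ["?"] for a in attrs])]

def matches(h, x):
    """True if hypothesis h classifies instance x as positive."""
    return all(h[a] in ("?", x[a]) for a in attrs)

# Training data D_c (reduced from the table on the earlier slide).
D = [
    ({"Sky": "Sunny", "AirTemp": "Warm", "Water": "Warm"}, True),
    ({"Sky": "Rainy", "AirTemp": "Cold", "Water": "Warm"}, False),
    ({"Sky": "Sunny", "AirTemp": "Warm", "Water": "Cool"}, True),
]

# Version space: hypotheses consistent with every training example.
VS = [h for h in H if all(matches(h, x) == label for x, label in D)]

def classify(x):
    """Unanimous vote: return True/False only if all of VS agrees, else None."""
    votes = {matches(h, x) for h in VS}
    return votes.pop() if len(votes) == 1 else None  # None = "can't tell"

print(classify({"Sky": "Sunny", "AirTemp": "Warm", "Water": "Warm"}))  # True
print(classify({"Sky": "Rainy", "AirTemp": "Warm", "Water": "Cool"}))  # None
```

Here VS contains three hypotheses: <Sunny, Warm, ?>, <Sunny, ?, ?> and <?, Warm, ?>. They agree on the training instances but disagree on (Rainy, Warm, Cool), so the learner outputs no classification for it.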


The Inductive Bias of CE

1: Notice that if we assume that c ∈ H, then it follows deductively (we can prove) that c ∈ VS_{H,Dc}.

2: Recall that we defined the classification L(x_i, D_c) to be a unanimous vote amongst all hypotheses in VS_{H,Dc}. Thus, if L outputs the classification L(x_i, D_c), then so does every hypothesis h ∈ VS_{H,Dc}, including c itself.

Therefore c(x_i) = L(x_i, D_c).

