Overview of Supervised Learning


DD3364

March 9, 2012

Problem 1: Regression

[Figure: a scatter of (x, y) training points; given a new x, predict its y value.]

Problem 2: Classification

Is it a bike or a face?

Some Terminology

In Machine Learning we have outputs which are predicted from measured inputs.

In supervised learning the aim is to predict the output(s) given an input, using lots of labelled training examples

{(input1, output1), (input2, output2), . . . , (inputn, outputn)}

Variable types

Outputs can be
- discrete (categorical, qualitative),
- continuous (quantitative) or
- ordered categorical (order is important)

Predicting a continuous output is referred to as regression; predicting a discrete output is referred to as classification.

Denote an input variable by X. If X is a vector, its components are denoted by Xj.

Quantitative (continuous) outputs are denoted by Y.

Qualitative (discrete) outputs are denoted by G.

Observed values are written in lower case: xi is the ith observed value of X. If X is a vector then xi is also a vector.

Matrices are represented by bold uppercase letters.


More Notation

The prediction of the output for a given value of input vector X is denoted by Ŷ.

It is presumed that we have labelled training data for regression problems

T = {(x1, y1), . . . , (xn, yn)}

with each xi ∈ Rᵖ and yi ∈ R

and for classification problems

T = {(x1, g1), . . . , (xn, gn)}

with each xi ∈ Rᵖ and gi ∈ {1, . . . , G}


Two simple approaches: the Linear Model and Nearest Neighbours

Linear Model

Have an input vector X = (X1, . . . , Xp)ᵗ.

A linear model predicts the output Y as

Ŷ = β̂0 + Σ_{j=1}^p Xj β̂j

Let X = (1, X1, . . . , Xp)ᵗ and β̂ = (β̂0, . . . , β̂p)ᵗ; then

Ŷ = Xᵗβ̂


How is a linear model fit to a set of training data? The most popular approach is a Least Squares approach: β is chosen to minimize

RSS(β) = Σ_{i=1}^n (yi − xiᵗβ)²

A minimizer always exists, though it may not be unique. In matrix notation

RSS(β) = (y − Xβ)ᵗ(y − Xβ)

where X ∈ R^{n×p} is a matrix with each row being an input vector and y = (y1, . . . , yn)ᵗ


The solution to ∂RSS(β)/∂β = 0 is given by

β̂ = (XᵗX)⁻¹Xᵗy

if XᵗX is non-singular. This is easy to show by differentiation of RSS(β).

This model has p + 1 parameters.
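The closed-form solution above is easy to check numerically. A minimal sketch (illustrative only; the data here are made up) that solves the normal equations with numpy:

```python
import numpy as np

# Made-up training data from y = 2 + 3x plus a little noise
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=50)
y = 2 + 3 * x + 0.01 * rng.normal(size=50)

# Augment each input with a leading 1 so that beta_0 is the intercept
X = np.column_stack([np.ones_like(x), x])

# Normal equations: beta_hat = (X^t X)^{-1} X^t y (assumes X^t X non-singular)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # close to [2, 3]
```

Using np.linalg.solve rather than explicitly inverting XᵗX is the numerically preferred way to evaluate this formula.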

Assume one has training data {(xi, yi)}_{i=1}^n with each yi ∈ {0, 1} encoding the class. The fitted linear model then gives the classifier

Ĝ(x) = 0 if xᵗβ̂ ≤ .5
Ĝ(x) = 1 if xᵗβ̂ > .5


[Figure: the linear classifier applied to the training examples (two classes generated from 10 Gaussian mixtures each): too rigid when the classes cannot be separated by a line. A second panel shows the k = 1 nearest-neighbour fit.]


The k-nearest neighbour fit for Y is

Ŷ(x) = (1/k) Σ_{xi ∈ Nk(x)} yi

where Nk(x) is the neighbourhood of x defined by the k closest points xi in the training data. Closeness is defined by some metric; for this lecture assume it is the Euclidean distance.

k-nearest neighbours in words: find the k training inputs closest to x and average their responses.
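The averaging rule just described fits in a few lines. A sketch (function name and data are made up for illustration):

```python
import numpy as np

def knn_regress(x0, X, y, k):
    """k-nn regression: average the responses y_i of the k training
    points x_i closest to the query x0 in Euclidean distance."""
    dists = np.linalg.norm(X - x0, axis=1)
    neighbours = np.argsort(dists)[:k]
    return y[neighbours].mean()

# Tiny 1-d example
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 1.0, 2.0, 3.0])
print(knn_regress(np.array([1.1]), X, y, k=2))  # averages y at x = 1 and x = 2
```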

Training data: {(xi, gi)} with each gi ∈ {0, 1}

The k-nearest neighbour estimate for G is

Ĝ(x) = 0 if (1/k) Σ_{xi ∈ Nk(x)} gi ≤ .5
Ĝ(x) = 1 otherwise

where Nk(x) is defined by the k closest points xi in the training data.

k-nearest neighbours in words: assign the label of x as the majority class amongst the neighbours.
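The majority-vote rule above can be sketched directly (names and data invented for illustration):

```python
import numpy as np

def knn_classify(x0, X, g, k):
    """k-nn classification for labels in {0, 1}: predict 1 exactly when
    the mean label among the k nearest neighbours exceeds 0.5."""
    dists = np.linalg.norm(X - x0, axis=1)
    neighbours = np.argsort(dists)[:k]
    return int(g[neighbours].mean() > 0.5)

X = np.array([[0.0], [0.2], [1.0], [1.2]])
g = np.array([0, 0, 1, 1])
print(knn_classify(np.array([1.1]), X, g, k=3))  # majority of the 3 nearest is class 1
```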

[Figure: k-nn decision boundaries on the training data for k = 15 and for k = 1.]

With k = 1 every training example is correctly classified. But how well will it perform on new examples drawn from the same distribution?

There are two parameters that control the behaviour of k-nn: k, and n, the number of training samples.

The effective number of parameters of k-nn is n/k. Intuitively: say the neighbourhoods were non-overlapping; we would then have n/k neighbourhoods and would need to fit one parameter (a mean) to each neighbourhood.


The linear decision boundary is
- smooth,
- stable to fit,
- but assumes a linear decision boundary is suitable.

The k-nn decision boundary
- can adapt to any shape of the data,
- is unstable to fit (for small k),
- is not smooth, i.e. wiggly (for small k).


[Figure: training and test misclassification errors, plotted against log(n/k), for the k-nn classifier and for the linear classifier, on data simulated from the mixture pdfs for the two classes.]

How do we measure how well f(X) predicts Y? Statisticians would compute the Expected Prediction Error

EPE(f) = E[(Y − f(X))²] = ∫∫ (y − f(x))² p(x, y) dx dy

By conditioning on X we can write

EPE(f) = E_X E_{Y|X}[ (Y − f(X))² | X ]


At a point x we can minimize EPE to get the best prediction c of y:

f(x) = argmin_c E_{Y|X}[ (Y − c)² | X = x ]

The solution is

f(x) = E[Y | X = x]

This is known as the regression function.


Only one problem with this: one rarely knows the pdf p(Y|X).

Example:

Training data {(xi, yi)}_{i=1}^n where xi ∈ X ⊂ Rᵖ and yi ∈ R. Nearest-neighbour methods estimate the regression function directly by local averaging. Let X = [−1, 1]².

[Figure: three panels in the (x1, x2) plane showing a query point x and its nearest neighbours among the training data.]

Therefore intuition says: lots of training data ⇒ the accuracy of ŷ increases.

More formally, as n increases,

ŷ = (1/k) Σ_{xi ∈ Nk(x)} yi → E[y | x]


The Curse of Dimensionality (Bellman, 1961)

In high dimensions the k-nearest neighbour averaging approach, and the intuition above, break down. For large p:

- nearest neighbours are not so close!
- the k-nn of x are closer to the boundary of X.
- we need a prohibitive number of training samples to densely sample X ⊂ Rᵖ.


Scenario: Estimate a regression function, f : X → R, using a k-nn regressor, where X = [0, 1]ᵖ (the unit hyper-cube).

Question: Let k = r n, where r ∈ [0, 1], and let x = 0. What is the expected edge length of the hyper-cube containing the k nearest neighbours of x?

Solution: The volume of a hyper-cube of side a is aᵖ. We are looking for a such that aᵖ equals a fraction r of the unit hyper-cube's volume. Therefore

aᵖ = r ⇒ a = r^{1/p}

To recap: the expected edge length of the hyper-cube containing a fraction r of the training data is

ep(r) = r^{1/p}

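The formula ep(r) = r^{1/p} is one line of code, which makes the numbers on the next slide easy to reproduce (a sketch; function name is made up):

```python
def edge_length(r, p):
    """Edge length of the sub-cube capturing a fraction r of data
    uniform on the unit hyper-cube in R^p: e_p(r) = r**(1/p)."""
    return r ** (1.0 / p)

# The p = 10 values quoted on the next slide
e_small = edge_length(0.01, 10)  # about 0.63
e_large = edge_length(0.10, 10)  # about 0.79 (the slide rounds to .80)
print(e_small, e_large)
```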

[Figure: ep(r) against r for p = 1, 2, 3, 10.]

Let p = 10; then ep(.01) = .63 and ep(.1) = .80. Therefore in this case the 1% and 10% nearest-neighbour estimates are not local estimates.

Scenario: Estimate a regression function, f : X → R, using a k-nearest neighbour regressor, where X is the unit hyper-sphere (ball) in Rᵖ centred at the origin.

Question: Let k = 1 and x = 0. What is the median distance of the nearest neighbour to x?

Solution: This median distance is given by the expression

d(p, n) = (1 − (1/2)^{1/n})^{1/p}
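The expression above evaluates directly (a sketch; the function name is made up):

```python
def median_nn_distance(p, n):
    """Median distance from the origin to the nearest of n points
    uniform in the unit ball in R^p: (1 - (1/2)**(1/n))**(1/p)."""
    return (1 - 0.5 ** (1.0 / n)) ** (1.0 / p)

print(median_nn_distance(10, 500))  # about 0.52: over halfway to the boundary
print(median_nn_distance(1, 500))   # tiny: in 1-d the nearest point is very close
```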

[Figure: d(p, n) for n = 500, plotted against p from 2 to 10; the median distance climbs past 0.5, i.e. more than halfway to the boundary of X.]

Consequence: For large p most of the training data points are closer to the boundary of X than to x.

This is bad because, to make a prediction at x, you must then use training samples near the boundary, and there prediction amounts to extrapolation rather than interpolation between neighbouring samples.

Explanation: Say n1 = 100 samples represents a dense sampling for a single-input problem. Then n10 = 100^10 samples are needed for the same sampling density with 10 inputs. In high dimensions it is therefore infeasible to densely sample the input space.


Simulated Example

The Set-up: Inputs are sampled from [−1, 1]ᵖ and the output is the deterministic function

Y = f(X) = e^{−8‖X‖²}

[Figure: f(x) = e^{−8x²} in one dimension; the 1-nn estimate ŷ0 of y0 = f(x0) at x0 = 0 is the response at the nearest training point x(1).]

[Figure: for p = 1, n = 20, histograms over repeated simulations of the nearest-neighbour location x(1) and of the average estimate of y0. Note: the true value is y0 = 1.]

p = 2

[Figure: Y = f(X) = e^{−8‖X‖²} over the (x1, x2) plane; the 1-nn estimate at x0 uses the nearest training point x(1).]

[Figure: histograms over repeated simulations of the 1-nn estimate of y0. Note: the true value is y0 = 1.]

As p increases

[Figure: average distance to the nearest neighbour, and average value of ŷ0, plotted against p from 2 to 10.]

The average distance to the nearest neighbour increases rapidly with p; thus the average estimate of ŷ0 also rapidly degrades.
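The degradation just described can be reproduced with a short simulation (a sketch under the stated set-up; the sample size n = 20 matches the slides, while the repetition count is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)

def one_nn_estimate_at_origin(p, n=20):
    """Sample n training points uniform on [-1, 1]^p with
    y = exp(-8 ||x||^2), and return the 1-nn estimate of y0 at x0 = 0
    (the true value is f(0) = 1)."""
    X = rng.uniform(-1, 1, size=(n, p))
    y = np.exp(-8 * np.sum(X ** 2, axis=1))
    return y[np.argmin(np.linalg.norm(X, axis=1))]

avg = {p: np.mean([one_nn_estimate_at_origin(p) for _ in range(500)])
       for p in (1, 2, 10)}
print(avg)  # the average estimate falls towards 0 as p grows
```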

Bias-Variance Decomposition

For the simulation experiment we have a completely deterministic relationship:

Y = f(X) = e^{−8‖X‖²}

The mean squared error of ŷ0 as an estimate of f(x0), over training sets T, decomposes as

MSE(x0) = E_T[(ŷ0 − f(x0))²]
        = E_T[(ŷ0 − E_T[ŷ0])²] + (E_T[ŷ0] − f(x0))²
        = Var_T(ŷ0) + Bias²(ŷ0)
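The decomposition is an algebraic identity, so it can be checked in simulation (a sketch; p = 10 and n = 20 follow the experiment's set-up, the trial count is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
p, n, true_y0 = 10, 20, 1.0   # f(x0) = exp(0) = 1 at x0 = 0

# Each trial draws a fresh training set T and records the 1-nn estimate of y0
estimates = []
for _ in range(2000):
    X = rng.uniform(-1, 1, size=(n, p))
    y = np.exp(-8 * np.sum(X ** 2, axis=1))
    estimates.append(y[np.argmin(np.linalg.norm(X, axis=1))])
est = np.array(estimates)

mse = np.mean((est - true_y0) ** 2)
var = np.var(est)                    # Var_T(y0_hat)
bias2 = (est.mean() - true_y0) ** 2  # Bias^2(y0_hat)
print(mse, var + bias2)              # the two agree: MSE = Var + Bias^2
```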

[Figure: MSE, Bias², and Variance of the 1-nn estimate ŷ0, plotted against p from 2 to 10.]

Why? As p increases the nearest neighbour is never close to x0 = 0, hence the estimate ŷ0 tends to 0 and its bias grows: here the bias dominates the MSE. The next example is a setting where variance dominates the MSE.

The Set-up

The relationship between the inputs and output is defined by

Y = f(X) = (1/2)(X1 + 1)³

[Figure: f(x) = (1/2)(x + 1)³; the 1-nn estimate ŷ0 at x0 = 0 is the response at the nearest training point x(1).]

[Figure: MSE, Bias², and Variance of the 1-nn estimate of y0, plotted against p from 2 to 10.]

Why? As the deterministic function only involves one dimension, the bias doesn't explode as p increases!

Case 1

Y = .5(X1 + 1)³ + ε,  ε ∼ N(0, 1)

[Figure: noisy samples of .5(x1 + 1)³ + ε against x1.]

Case 2

Y = X1 + ε,  ε ∼ N(0, 1)

[Figure: noisy samples of x1 + ε against x1.]

[Figure: EPE against p, from 2 to 10, for three combinations: f(x) linear with a 1-nn predictor, f(x) cubic with a 1-nn predictor, and f(x) cubic with a linear predictor.]

- the linear predictor has a biased estimate of the cubic function
- the linear predictor fits well, even in the presence of noise and high dimension, when f is linear
- the linear model beats the curse of dimensionality

Words of Caution

In the previous example the linear predictor out-performed the 1-nn regression function, as

bias of linear predictor < variance of the 1-nn predictor

But one could easily manufacture an example where

bias of linear predictor ≫ variance of the 1-nn predictor

There is a whole host of models in between the rigid linear model and the extremely flexible 1-nn method. Many are specifically designed to avoid the exponential growth in complexity in high dimensions.

Statistical Models, Supervised Learning and Function Approximation

Goal: We know there is a function f(x) relating inputs to outputs:

Y ≈ f(X)

We want to find an estimate f̂(x) of f(x) from labelled training data. In this case we need to incorporate special structure that can

- reduce the bias and variance of the estimates,
- help combat the curse of dimensionality.

[Figure: noisy (x, y) training data scattered around a smooth curve f(x).]

The output is modelled as a deterministic relationship plus a random variable ε independent of the input X:

Y = f(X) + ε

where the random variable ε has E[ε] = 0 and is independent of X. Then

f(x) = E[Y | X = x]

and any departures from the deterministic relationship are mopped up by ε.

For a binary output, let

p(x) = p(G = 1 | X = x)

[Figure: p(x) as a function of x, taking values between 0 and 1.]

Therefore E[G | X = x] = p(x) and Var(G | X = x) = p(x)(1 − p(x)).

Have training data

T = {(x1, y1), . . . , (xn, yn)}

where each xi ∈ Rᵖ and yi ∈ R.

[Figure: scatter plot of the training data (x, y).]

In the book, Supervised Learning is viewed as a problem in function approximation.

Common approach:

Decide on a parametric basis expansion of f,

fθ(x) = Σ_{m=1}^M hm(x) θm

and use least squares to estimate θ by minimizing

RSS(θ) = Σ_{i=1}^n (yi − fθ(xi))²

[Figure 2.10 in the book: least squares fitting of a function of two inputs; the parameters are chosen to minimize the sum-of-squared vertical errors.]

We can find θ by optimizing other criteria. Another option is Maximum Likelihood Estimation. For the additive model, Y = fθ(X) + ε with ε ∼ N(0, σ²), we have

P(Y | X, θ) = N(fθ(X), σ²)

The log-likelihood of the training data is

L(θ) = Σ_{i=1}^n log P(Y = yi | X = xi, θ)
     = Σ_{i=1}^n log N(yi; fθ(xi), σ²)

so maximizing the likelihood is equivalent to least squares in this model.
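A concrete sketch of this pipeline (the basis functions and data are invented for illustration): fit fθ(x) = Σ θm hm(x) with a polynomial basis by least squares, which under the Gaussian additive-error model is also the maximum likelihood fit.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=200)
# Data generated from the lecture's cubic, with Gaussian noise
y = 0.5 * (x + 1) ** 3 + 0.05 * rng.normal(size=200)

# Basis expansion h_m(x) = x^(m-1), m = 1..4 (a made-up choice of basis)
H = np.column_stack([x ** m for m in range(4)])
theta, *_ = np.linalg.lstsq(H, y, rcond=None)
print(theta)  # close to [0.5, 1.5, 1.5, 0.5], the expansion of 0.5(x+1)^3
```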

Consider the Residual Sum of Squares for a function f,

RSS(f) = Σ_{i=1}^n (yi − f(xi))²

There are infinitely many functions f̂ with RSS(f̂) = 0: any function passing through all the training points is a solution.

Don't consider an arbitrary function f̂; instead restrict ourselves to f̂ ∈ F and minimize RSS over f̂ ∈ F. The initial ambiguity in choosing f̂ has just been transferred to the choice of the constraint F.


One option is a parametric representation of f:

- Linear model: f(x) = θ1ᵗx + θ0
- Quadratic model: f(x) = xᵗΘ2 x + θ1ᵗx + θ0, with Θ2 a matrix

Another is to require that f must have some regular behaviour in small neighbourhoods of the input space. But then:

- What size should the neighbourhood be?
- What form should f have in the neighbourhood?

Large neighbourhood ⇒ strong constraint; small neighbourhood ⇒ weak constraint.

The techniques used to restrict the regression or classification function fall into a few broad classes.

Note: It is assumed we have training examples {(xi, yi)}_{i=1}^n, and we present the energy functions, or functionals, which are minimized to obtain the fit.

Roughness penalty: ensure f predicts the training values while penalizing roughness, with a penalty parameter λ:

PRSS(f, λ) = Σ_{i=1}^n (yi − f(xi))² + λ J(f),  e.g.  J(f) = ∫ [f″(x)]² dx

For wiggly f's this functional will have a large value, while for linear f's it is zero.

Regularization methods express our belief that the f we're looking for exhibits this kind of smooth behaviour.

Kernel methods: estimate the regression or classification function in a local neighbourhood. We need to specify

- the nature of the local neighbourhood,
- the class of functions used in the local fit.

We can define a local regression estimate of f(x0), from the training data, via

RSS(fθ, x0) = Σ_{i=1}^n Kλ(x0, xi) (yi − fθ(xi))²

where the kernel function Kλ(x0, xi) assigns weights to xi depending on its closeness to x0, e.g. the Gaussian kernel

Kλ(x0, x) = (1/λ) exp(−‖x0 − x‖² / (2λ))
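A simple instance of this idea is the kernel-weighted local average (the local constant fit): minimizing the weighted RSS over constants c gives the weighted mean of the yi. A sketch with made-up data (the 1/λ normalizer of the kernel cancels in the average, so it is omitted):

```python
import numpy as np

def gaussian_kernel(x0, X, lam):
    """Proportional to exp(-||x0 - x||^2 / (2 lam)); the 1/lam
    normalizer cancels in the weighted average below."""
    return np.exp(-np.sum((X - x0) ** 2, axis=-1) / (2 * lam))

def local_average(x0, X, y, lam):
    """Minimizer over constants c of sum_i K_lam(x0, x_i)(y_i - c)^2."""
    w = gaussian_kernel(x0, X, lam)
    return np.sum(w * y) / np.sum(w)

X = np.linspace(-1, 1, 21)[:, None]   # 21 equally spaced inputs
y = X[:, 0] ** 2                      # noiseless y = x^2 for clarity
print(local_average(np.array([0.5]), X, y, lam=0.01))  # close to 0.25
```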

Basis functions: f is modelled as a linear expansion of basis functions

fθ(x) = Σ_{m=1}^M θm hm(x)

where "linear" refers to the action of the parameters.

Radial basis functions:

fθ(x) = Σ_{m=1}^M Kλm(μm, x) θm

where Kλm(μm, x) is a symmetric kernel centred at location μm; the Gaussian kernel is a popular kernel to use. If the μm's and λm's are pre-defined ⇒ estimating θ is a linear problem; otherwise it is a harder non-linear problem.

The single-layer feed-forward neural network model:

fθ(x) = Σ_{m=1}^M βm σ(αmᵗx + bm)

where

θ = (β1, . . . , βM, α1, . . . , αM, b1, . . . , bM)ᵗ

and σ(z) = 1/(1 + e⁻ᶻ) is the activation function. The directions αm and bias terms bm have to be determined from the data.
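A forward pass of this model is a few lines; the weights below are made up purely to show the computation (fitting them, e.g. by gradient descent, is the hard part):

```python
import numpy as np

def sigma(z):
    """Logistic activation sigma(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

def nn_predict(x, alphas, biases, betas):
    """f_theta(x) = sum_m beta_m * sigma(alpha_m^t x + b_m)."""
    return sum(b * sigma(a @ x + c) for a, c, b in zip(alphas, biases, betas))

# A tiny hand-set network with M = 2 units and 2-d inputs
alphas = [np.array([1.0, -1.0]), np.array([0.5, 0.5])]
biases = [0.0, -0.25]
betas = [2.0, -1.0]
print(nn_predict(np.array([0.0, 0.0]), alphas, biases, betas))
```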

Dictionary methods: adaptively chosen basis function methods, aka dictionary methods, where one has available a possibly infinite set, or dictionary, of candidate basis functions, and models are built up using some search mechanism.

Model Selection and the Bias-Variance Trade-off

Many models have a parameter which controls their complexity. We have seen examples of this:

- k: number of nearest neighbours (nearest neighbour classifier)
- λ: width of the kernel (radial basis functions)
- M: number of basis functions (dictionary methods)
- λ: weight of the penalty term (spline fitting)

How do the complexity parameters of a model affect its predictive behaviour?

[Figure: the target f(x) and a high-complexity fit f̂(x) to noisy samples (noise parameter .1). A second panel shows E[f̂(x)]; at each x one standard deviation of the estimate is shown. Note its magnitude.]

[Figure: the target f(x) and the k = 15 fit f̂(x) to the same noisy samples. A second panel shows E[f̂15(x)] with one standard deviation shown at each x.]

Compare the peak of f(x) and E[f̂15(x)]! Note the variance of the estimate is much smaller than when k = 1.

[Figure: E[f̂(x)] for the high-complexity fit (k = 1) next to E[f̂(x)] for the lower-complexity fit (k = 15).]

What not to do:

We want to choose the model complexity which minimizes the test error. Training error is one estimate of the test error, so could we simply choose the model complexity that produces the smallest training error? No. Why?? Because training error decreases as model complexity increases.
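This is easy to see in a small experiment (a sketch with synthetic 1-d data; the function and noise level are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

def knn_fit(Xtr, ytr, Xq, k):
    """k-nn regression predictions at the query points Xq (1-d inputs)."""
    return np.array([ytr[np.argsort(np.abs(Xtr - x0))[:k]].mean() for x0 in Xq])

Xtr = rng.uniform(-1, 1, 100); ytr = np.sin(3 * Xtr) + 0.3 * rng.normal(size=100)
Xte = rng.uniform(-1, 1, 200); yte = np.sin(3 * Xte) + 0.3 * rng.normal(size=200)

train_err, test_err = {}, {}
for k in (1, 10, 50):
    train_err[k] = np.mean((knn_fit(Xtr, ytr, Xtr, k) - ytr) ** 2)
    test_err[k] = np.mean((knn_fit(Xtr, ytr, Xte, k) - yte) ** 2)
print(train_err)  # training error grows with k, i.e. falls with complexity
print(test_err)   # test error is smallest at an intermediate k
```

At k = 1 the training error is exactly zero (each point is its own nearest neighbour), yet the test error is far from the best of the three.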

Overfitting

[Figure 2.11 in the book: test and training error as a function of model complexity, from high bias/low variance at low complexity to low bias/high variance at high complexity. Training error decreases steadily; test error turns back up.]

With high model complexity we have a high variance predictor: anything can happen. This scenario is termed overfitting; in such cases the predictor loses the ability to generalize.

From the book: the variance term is simply the variance of an average here, and decreases as the inverse of k, so as k varies there is a bias-variance trade-off. More generally, as the model complexity of our procedure is increased, the variance tends to increase and the squared bias tends to decrease; the opposite behaviour occurs as the model complexity is decreased. For k-nearest neighbours, the model complexity is controlled by k.

Underfitting

[Figure 2.11 again: test and training error as a function of model complexity.]

Therefore the predictor has poor generalization. Later on in the course we will discuss how to overcome these problems.

