Why and how to use random forest variable importance measures
(and how you shouldn't)

Carolin Strobl (LMU München) and Achim Zeileis (WU Wien)
carolin.strobl@stat.uni-muenchen.de
useR! 2008, Dortmund

Introduction
Construction
R functions
Variable importance
Tests for variable importance
Conditional importance
Summary
References

Introduction

Random forests

- have become increasingly popular in, e.g., genetics and the neurosciences [imagine a long list of references here]
- can deal with "small n, large p" problems, high-order interactions, and correlated predictor variables
- are used not only for prediction, but also to assess variable importance

(Small) random forest

Figure: a collection of individual classification trees from one forest. Each tree splits on the variables Start, Age, and Number (p < 0.001 at every split), with terminal nodes showing the node size n and the class proportions y.
Construction of a random forest

- draw ntree bootstrap samples from the original sample
- fit a classification tree to each bootstrap sample ⇒ ntree trees

creates a diverse set of trees, because

- trees are unstable w.r.t. changes in the learning data ⇒ ntree different-looking trees (bagging)
- randomly preselect mtry splitting variables in each split ⇒ ntree even more different-looking trees (random forest)
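The construction steps above can be sketched in a few lines. This is a minimal toy illustration in Python (the deck's actual tools are the R packages randomForest and party); it uses one-split trees in place of full classification trees, and all function names are hypothetical:

```python
import numpy as np

def gini(y):
    """Gini impurity of a class-label vector."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def fit_stump(X, y, mtry, rng):
    """Fit a one-split tree; the split variable is chosen among
    mtry randomly preselected candidates (the random-forest step)."""
    n, p = X.shape
    candidates = rng.choice(p, size=mtry, replace=False)
    best = None
    for j in candidates:
        for cut in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= cut
            impurity = (left.sum() * gini(y[left])
                        + (~left).sum() * gini(y[~left])) / n
            if best is None or impurity < best[0]:
                best = (impurity, j, cut)
    _, j, cut = best
    left = X[:, j] <= cut
    majority = lambda v: np.bincount(v).argmax()
    return j, cut, majority(y[left]), majority(y[~left])

def random_forest(X, y, ntree=30, mtry=2, seed=0):
    """Draw ntree bootstrap samples and fit one tree to each (bagging)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    forest = []
    for _ in range(ntree):
        idx = rng.integers(0, n, size=n)   # bootstrap sample
        forest.append(fit_stump(X[idx], y[idx], mtry, rng))
    return forest

def predict(forest, X):
    """Aggregate the trees' predictions by majority vote."""
    votes = np.array([np.where(X[:, j] <= cut, yl, yr)
                      for j, cut, yl, yr in forest])
    return (votes.mean(axis=0) > 0.5).astype(int)
```

The two sources of diversity from the slide are visible directly: the bootstrap index `idx` (bagging) and the `mtry`-sized candidate set inside `fit_stump` (random forest).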

Random forests in R

randomForest (pkg: randomForest)
- reference implementation based on CART trees (Breiman, 2001; Liaw and Wiener, 2008)
- − for variables of different types: biased in favor of continuous variables and variables with many categories (Strobl, Boulesteix, Zeileis, and Hothorn, 2007)

cforest (pkg: party)
- based on unbiased conditional inference trees (Hothorn, Hornik, and Zeileis, 2006)
- + for variables of different types: unbiased when subsampling, instead of bootstrap sampling, is used (Strobl, Boulesteix, Zeileis, and Hothorn, 2007)

Measuring variable importance

Gini importance: mean Gini gain produced by Xj over all trees

obj <- randomForest(..., importance=TRUE)
obj$importance, column: MeanDecreaseGini
importance(obj, type=2)

− for variables of different types: biased in favor of continuous variables and variables with many categories
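As a sketch of what "Gini gain" means here, assuming the standard impurity definition (Python, hypothetical helper names). The Gini importance of Xj is then the average of such gains over all splits on Xj in all trees:

```python
import numpy as np

def gini(y):
    """Gini impurity 1 - sum_k p_k^2 of a class-label vector."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def gini_gain(y, left_mask):
    """Impurity reduction achieved by one binary split of a node
    (assumes both children are non-empty)."""
    n = len(y)
    n_left = left_mask.sum()
    children = (n_left * gini(y[left_mask])
                + (n - n_left) * gini(y[~left_mask])) / n
    return gini(y) - children
```

A split that separates the classes perfectly yields the maximal gain; a split that leaves both children as mixed as the parent yields a gain of zero.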

Measuring variable importance

permutation importance: mean decrease in classification accuracy after permuting Xj over all trees

obj <- randomForest(..., importance=TRUE)
obj$importance, column: MeanDecreaseAccuracy
importance(obj, type=1)

obj <- cforest(...)
varimp(obj)

+ for variables of different types: unbiased only when subsampling is used as in cforest(..., controls = cforest_unbiased())

The permutation importance

within each tree t:

VI^{(t)}(x_j) = \sum_{i \in \bar{B}^{(t)}} I(y_i = \hat{y}_i^{(t)}) / |\bar{B}^{(t)}| - \sum_{i \in \bar{B}^{(t)}} I(y_i = \hat{y}_{i,j}^{(t)}) / |\bar{B}^{(t)}|

where
\bar{B}^{(t)} = out-of-bag sample of tree t
\hat{y}_i^{(t)} = f^{(t)}(x_i) = predicted class before permuting
\hat{y}_{i,j}^{(t)} = f^{(t)}(x_{i,j}) = predicted class after permuting X_j
x_{i,j} = (x_{i,1}, \ldots, x_{i,j-1}, x_{\pi_j(i),j}, x_{i,j+1}, \ldots, x_{i,p})

Note: VI^{(t)}(x_j) = 0 by definition, if X_j is not in tree t
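A minimal sketch of the per-tree computation above (Python, hypothetical names): the tree's prediction function is evaluated on its out-of-bag sample before and after permuting column j, and the accuracy difference is returned:

```python
import numpy as np

def permutation_importance_tree(predict_fn, X_oob, y_oob, j, rng=None):
    """VI^(t)(x_j): out-of-bag accuracy of one tree before minus after
    randomly permuting the values of variable j."""
    if rng is None:
        rng = np.random.default_rng(0)
    acc_before = np.mean(predict_fn(X_oob) == y_oob)
    X_perm = X_oob.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])  # break the X_j association
    acc_after = np.mean(predict_fn(X_perm) == y_oob)
    return acc_before - acc_after
```

If the tree never uses variable j, its predictions are unchanged by the permutation and the importance is exactly zero, matching the note on the slide.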

The permutation importance

over all trees:

1. raw importance

VI(x_j) = \sum_{t=1}^{ntree} VI^{(t)}(x_j) / ntree

obj <- randomForest(..., importance=TRUE)
importance(obj, type=1, scale=FALSE)

The permutation importance

over all trees:

2. scaled importance (z-score)

VI(x_j) / (\hat{\sigma} / \sqrt{ntree}) = z_j

obj <- randomForest(..., importance=TRUE)
importance(obj, type=1, scale=TRUE) (default)
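A sketch of the scaling step (Python, hypothetical names). It also makes the problem with this z-score concrete: because the denominator shrinks with sqrt(ntree), replicating the very same per-tree importances 25 times inflates the z-score by roughly a factor of 5, although nothing about the variable has changed:

```python
import numpy as np

def scaled_importance(vi_per_tree):
    """z_j = VI(x_j) / (sigma_hat / sqrt(ntree)), computed from the
    per-tree permutation importances."""
    vi = np.asarray(vi_per_tree, dtype=float)
    return vi.mean() / (vi.std(ddof=1) / np.sqrt(len(vi)))

rng = np.random.default_rng(42)
per_tree = rng.normal(0.05, 0.05, size=100)   # simulated VI^(t) values
z_100 = scaled_importance(per_tree)
z_2500 = scaled_importance(np.tile(per_tree, 25))  # same trees, 25x ntree
```

This is exactly the ntree dependence reported in the Findings below: the user-chosen number of trees, not the data, drives the size of the z-score.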

Tests for variable importance

for variable selection purposes:

- Breiman and Cutler (2008): simple significance test based on normality of the z-score; randomForest, scale=TRUE + (1 − α)-quantile of N(0,1)
- Diaz-Uriarte and Alvarez de Andres (2006): backward elimination (throw out the least important variables until the out-of-bag prediction accuracy drops); varSelRF (pkg: varSelRF), dep. on randomForest
- Diaz-Uriarte (2007) and Rodenburg et al. (2008): plots and significance test (randomly permute the response values to mimic the overall null hypothesis that none of the predictor variables is relevant ⇒ baseline)

Tests for variable importance

problems of these approaches:

- (at least) Breiman and Cutler (2008): strange statistical properties (Strobl and Zeileis, 2008)
- all: preference for correlated predictor variables (see also Nicodemus and Shugart, 2007; Archer and Kimes, 2008)

Breiman and Cutler's test

under the null hypothesis of zero importance:

z_j ~ N(0, 1) (asymptotically)

if z_j exceeds the (1 − α)-quantile of N(0,1) ⇒ reject the null hypothesis of zero importance for variable X_j
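A sketch of this test (Python, hypothetical names), shown only to make the construction concrete before its problems are discussed; the normal quantile comes from the stdlib NormalDist:

```python
from statistics import NormalDist
import numpy as np

def breiman_cutler_test(vi_per_tree, alpha=0.05):
    """Simple z-score test: reject 'zero importance' when z_j exceeds
    the (1 - alpha)-quantile of N(0, 1)."""
    vi = np.asarray(vi_per_tree, dtype=float)
    z = vi.mean() / (vi.std(ddof=1) / np.sqrt(len(vi)))
    return z, z > NormalDist().inv_cdf(1 - alpha)
```

Note that `len(vi)` is ntree, so the rejection decision inherits the ntree dependence of the z-score criticized on the previous slides.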

Raw importance

Figure: mean raw permutation importance plotted against the relevance of the variable, in panels for sample sizes 100, 200, 500 and ntree = 100, 200, 500.

z-score and power

Figure: z-score and power plotted against the relevance of the variable, in panels for sample sizes 100, 200, 500 and ntree = 100, 200, 500.

Findings

z-score and power:
- increase with ntree
- decrease with sample size

⇒ rather use the raw, unscaled permutation importance!
importance(obj, type=1, scale=FALSE)
varimp(obj)

What null hypothesis were we testing in the first place?

obs    Y      X_j (permuted)      Z
1      y_1    x_{\pi_j(1),j}      z_1
⋮      ⋮      ⋮                   ⋮
i      y_i    x_{\pi_j(i),j}      z_i
⋮      ⋮      ⋮                   ⋮
n      y_n    x_{\pi_j(n),j}      z_n

H_0: X_j ⊥ Y, Z   or   X_j ⊥ Y ∧ X_j ⊥ Z

P(Y, X_j, Z) \overset{H_0}{=} P(Y, Z) \cdot P(X_j)

- the current null hypothesis reflects independence of X_j from both Y and the remaining predictor variables Z
- a high variable importance can result from a violation of either one!

Suggestion: conditional permutation scheme

obs    Y       X_j (permuted within Z)    Z
1      y_1     x_{\pi_{j|Z=a}(1),j}       z_1 = a
3      y_3     x_{\pi_{j|Z=a}(3),j}       z_3 = a
27     y_27    x_{\pi_{j|Z=a}(27),j}      z_27 = a
6      y_6     x_{\pi_{j|Z=b}(6),j}       z_6 = b
14     y_14    x_{\pi_{j|Z=b}(14),j}      z_14 = b
33     y_33    x_{\pi_{j|Z=b}(33),j}      z_33 = b
⋮      ⋮       ⋮                          ⋮

H_0: X_j ⊥ Y | Z

P(Y, X_j | Z) \overset{H_0}{=} P(Y | Z) \cdot P(X_j | Z)   or   P(Y | X_j, Z) \overset{H_0}{=} P(Y | Z)
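The conditional scheme can be sketched for a discrete conditioning variable (Python, hypothetical names): X_j is permuted only within the groups defined by Z, which preserves the association between X_j and Z while breaking the association between X_j and Y given Z:

```python
import numpy as np

def conditional_permutation(xj, z_groups, rng=None):
    """Permute x_j only within the groups defined by the conditioning
    variable(s) Z (here: a discrete grouping vector)."""
    if rng is None:
        rng = np.random.default_rng(0)
    xj_perm = np.asarray(xj, dtype=float).copy()
    for g in np.unique(z_groups):
        idx = np.where(z_groups == g)[0]
        xj_perm[idx] = rng.permutation(xj_perm[idx])
    return xj_perm
```

In the actual proposal the grouping is not a single discrete Z but the partition of the feature space learned by the tree, as described on the next slide.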

Technically

- use any partition of the feature space for conditioning
- here: use the binary partition already learned by the tree (use cutpoints as bisectors of the feature space)
- condition on all correlated variables, or select some

Strobl et al. (2008)

available in cforest from version 0.9-994:
varimp(obj, conditional = TRUE)

Simulation study

- dgp: y_i = \beta_1 x_{i,1} + \cdots + \beta_{12} x_{i,12} + \varepsilon_i, with \varepsilon_i i.i.d. N(0, 0.5)
- X_1, \ldots, X_{12} ~ N(0, \Sigma): unit variances, pairwise correlation 0.9 within the block X_1, \ldots, X_4, all other correlations 0

X_j       X_1  X_2  X_3  X_4  X_5  X_6  X_7  X_8  ...  X_12
\beta_j    5    5    2    0   −5   −5   −2    0   ...    0
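A sketch of this data-generating process (Python; the original study was run in R). The coefficient vector and the reading of N(0, 0.5) as variance 0.5 follow Strobl et al. (2008):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 12

# 12 jointly normal predictors: X1..X4 pairwise correlated at 0.9,
# the remaining predictors independent standard normal.
Sigma = np.eye(p)
Sigma[:4, :4] = 0.9
np.fill_diagonal(Sigma, 1.0)
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)

# Linear dgp: only X1, X2, X3 and X5, X6, X7 are truly relevant;
# X4 is irrelevant but correlated with the relevant X1..X3.
beta = np.array([5, 5, 2, 0, -5, -5, -2, 0, 0, 0, 0, 0], dtype=float)
y = X @ beta + rng.normal(0.0, np.sqrt(0.5), size=n)
```

The design deliberately includes X4: it carries no signal of its own, so any importance it receives under the unconditional scheme is inherited purely from its correlation with X1 through X3.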

Results

Figure: unconditional permutation importance of the twelve predictor variables, in panels for mtry = 1, 3, and 8.

Peptide-binding data

Figure: conditional vs. unconditional permutation importance for the peptide-binding predictors (including h2y8, flex8, and pol3).

Summary

- if your predictor variables are of different types: use cforest (pkg: party) with the default option controls = cforest_unbiased(), with the permutation importance varimp(obj)
- otherwise: feel free to use cforest (pkg: party) with the permutation importance varimp(obj), or randomForest (pkg: randomForest) with the permutation importance importance(obj, type=1) or the Gini importance importance(obj, type=2); but don't fall for the z-score! (i.e. set scale=FALSE)
- if your predictor variables are highly correlated: use the conditional importance in cforest (pkg: party): varimp(obj, conditional = TRUE)

References

Archer, K. J. and R. V. Kimes (2008). Empirical characterization of random forest variable importance measures. Computational Statistics & Data Analysis 52(4), 2249–2260.

Breiman, L. (2001). Random forests. Machine Learning 45(1), 5–32.

Breiman, L. and A. Cutler (2008). Random Forests Classification Manual. Website accessed in 1/2008; http://www.math.usu.edu/adele/forests.

Breiman, L., A. Cutler, A. Liaw, and M. Wiener (2006). Breiman and Cutler's Random Forests for Classification and Regression. R package version 4.5-16.

Diaz-Uriarte, R. (2007). GeneSrF and varSelRF: A web-based tool and R package for gene selection and classification using random forest. BMC Bioinformatics 8:328.

Hothorn, T., K. Hornik, and A. Zeileis (2006). Unbiased recursive partitioning: A conditional inference framework. Journal of Computational and Graphical Statistics 15(3), 651–674.

Strobl, C., A.-L. Boulesteix, A. Zeileis, and T. Hothorn (2007). Bias in random forest variable importance measures: Illustrations, sources and a solution. BMC Bioinformatics 8:25.

Strobl, C. and A. Zeileis (2008). Danger: High power! Exploring the statistical properties of a test for random forest variable importance. In Proceedings of the 18th International Conference on Computational Statistics, Porto, Portugal.

Strobl, C., A.-L. Boulesteix, T. Kneib, T. Augustin, and A. Zeileis (2008). Conditional variable importance for random forests. BMC Bioinformatics 9:307.
