
Hypothesis Testing in R

One-sample t-tests
Let's generate an independent random sample of size 100 from
the N(mean=10, variance=9) distribution:
set.seed(10)
sample1.100<- rnorm(100, mean=10, sd=3)
sample1.100
[1] 10.056239 9.447242 5.886008 8.202497 10.883635 11.169383 6.375771
[8] 8.908972 5.119982 9.230565 13.305339 12.267345 9.285299 12.962334
[15] 12.224170 10.268042 7.135168 9.414549 12.776564 11.448936 8.211068
[22] 3.444139 7.975402 3.642816 6.204406 8.879015 7.937334 7.383524
[29] 9.694717 9.238658 4.438779 9.766162 12.905699 10.554778 5.860169
[36] 5.693457 11.086262 4.722740 9.026368 8.045311 13.259654 7.712365
[43] 7.514012 12.503422 7.097044 9.913554 10.697575 9.096374 7.967156
[50] 11.965683 8.798087 8.996330 14.103862 16.413301 11.517458 12.359027
[57] 7.293364 11.598691 8.062317 10.872962 6.287217 8.631471 7.509032
[64] 11.020347 13.199129 13.648378 12.207072 8.556374 11.688234 6.261041
[71] 11.142767 5.708718 6.854663 9.344489 5.530191 13.518119 5.560519
[78] 8.708837 6.845084 14.567759 11.778484 9.332015 12.138683 12.149803
[85] 11.320726 10.476492 11.979292 16.661559 6.448165 9.778132 8.750936
[92] 9.425553 10.208634 13.466045 11.784872 5.741065 5.179968 12.678778
[99] 10.444504 13.681085

Note that you have to feed the standard deviation (sd=3), not the
variance (9), into the call above!
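As a quick sanity check of the sd-versus-variance point, here is a minimal sketch. The seed and the large sample size are my own choices for illustration, not part of the notes.

```r
# Illustrative check: rnorm() is parameterised by sd, not by variance.
set.seed(1)                                # arbitrary seed for this sketch
x <- rnorm(1e5, mean = 10, sd = sqrt(9))   # variance 9 means sd = 3
sample_var <- var(x)                       # should be close to 9
sample_sd  <- sd(x)                        # should be close to 3
print(c(variance = sample_var, sd = sample_sd))
```

Passing sd=9 by mistake would instead give a sample variance near 81.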

Suppose that I want to test whether or not the true mean is zero.
Of course, we know that the true mean is 10, so we expect to
reject the null hypothesis (that the mean is equal to zero).
We choose a two-sided alternative: that the mean is different
from zero.
t.test(sample1.100, alternative="two.sided")
One Sample t-test
data: sample1.100
t = 33.9637, df = 99, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
9.030068 10.150638
sample estimates:
mean of x
9.590353
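To see where the numbers in this output come from, the sketch below recomputes the test statistic and the two-sided p-value by hand and compares them with t.test(). The variable names (x, mu0, t_manual, p_manual) are mine, and the sample is regenerated so that the block runs on its own.

```r
# One-sample t statistic by hand: t = (xbar - mu0) / (s / sqrt(n)).
set.seed(10)
x   <- rnorm(100, mean = 10, sd = 3)
mu0 <- 0                                   # value of the mean under the null
n   <- length(x)
t_manual <- (mean(x) - mu0) / (sd(x) / sqrt(n))
p_manual <- 2 * pt(-abs(t_manual), df = n - 1)   # two-sided tail area

res <- t.test(x, mu = mu0, alternative = "two.sided")
print(c(t_manual = t_manual, t_builtin = unname(res$statistic)))
print(c(p_manual = p_manual, p_builtin = res$p.value))
```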

Now, what if we want a different, one-sided alternative, such as that the mean is greater than zero?
t.test(sample1.100, alternative="greater")
One Sample t-test
data: sample1.100
t = 33.9637, df = 99, p-value < 2.2e-16
alternative hypothesis: true mean is greater than 0
95 percent confidence interval:
9.121507 Inf
sample estimates:
mean of x
9.590353

How about the alternative that the mean is less than zero?
t.test(sample1.100, alternative="less")
One Sample t-test
data: sample1.100
t = 33.9637, df = 99, p-value = 1
alternative hypothesis: true mean is less than 0
95 percent confidence interval:
-Inf 10.0592
sample estimates:
mean of x
9.590353

Now, I want to test whether or not the mean of the first
population is equal to 9. Of course, we know that the true
mean is 10, so we expect to reject the null hypothesis (that the mean
is equal to 9).
We choose a two-sided alternative: that the mean is different
from 9.
t.test(sample1.100, mu=9, alternative="two.sided")
One Sample t-test
data: sample1.100
t = 2.0907, df = 99, p-value = 0.03912
alternative hypothesis: true mean is not equal to 9
95 percent confidence interval:
9.030068 10.150638
sample estimates:
mean of x
9.590353

Now, what if we want a different, one-sided alternative, such as that the mean is greater than 9?
t.test(sample1.100, mu=9, alternative="greater")
One Sample t-test
data: sample1.100
t = 2.0907, df = 99, p-value = 0.01956
alternative hypothesis: true mean is greater than 9
95 percent confidence interval:
9.121507 Inf
sample estimates:
mean of x
9.590353

How about the alternative that the mean is less than 9?
t.test(sample1.100, mu=9, alternative="less")
One Sample t-test
data: sample1.100
t = 2.0907, df = 99, p-value = 0.9804
alternative hypothesis: true mean is less than 9
95 percent confidence interval:
-Inf 10.0592
sample estimates:
mean of x
9.590353
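The three p-values for mu=9 (0.03912, 0.01956 and 0.9804) are tied together: the two one-sided p-values are complementary tail areas, and the two-sided p-value is twice the smaller one. The sketch below checks this relationship. The variable names are mine, and the sample is regenerated so that the block is self-contained.

```r
# Relationship between the two-sided and the two one-sided p-values.
set.seed(10)
x <- rnorm(100, mean = 10, sd = 3)

p_two     <- t.test(x, mu = 9, alternative = "two.sided")$p.value
p_greater <- t.test(x, mu = 9, alternative = "greater")$p.value
p_less    <- t.test(x, mu = 9, alternative = "less")$p.value

print(c(two.sided = p_two, greater = p_greater, less = p_less))
print(p_greater + p_less)            # the two tails add up to 1
print(2 * min(p_greater, p_less))    # matches the two-sided p-value
```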

Now, I want to test whether or not the mean of the first
population is equal to 10. We know that the true mean is
10, so we expect to fail to reject the null hypothesis (that the mean
is equal to 10).
We choose a two-sided alternative: that the mean is different
from 10.
t.test(sample1.100, mu=10, alternative="two.sided")
One Sample t-test
data: sample1.100
t = -1.4507, df = 99, p-value = 0.15
alternative hypothesis: true mean is not equal to 10
95 percent confidence interval:
9.030068 10.150638
sample estimates:
mean of x
9.590353

Now, what if we want a different, one-sided alternative, such as that the mean is greater than 10?
t.test(sample1.100, mu=10, alternative="greater")
One Sample t-test
data: sample1.100
t = -1.4507, df = 99, p-value = 0.925
alternative hypothesis: true mean is greater than 10
95 percent confidence interval:
9.121507 Inf
sample estimates:
mean of x
9.590353

How about the alternative that the mean is less than 10?
t.test(sample1.100, mu=10, alternative="less")
One Sample t-test
data: sample1.100
t = -1.4507, df = 99, p-value = 0.075
alternative hypothesis: true mean is less than 10
95 percent confidence interval:
-Inf 10.0592
sample estimates:
mean of x
9.590353
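The two-sided tests above all report the same 95% confidence interval, and that interval already tells us the outcome of each test: a two-sided test at the 5% level rejects mu0 exactly when mu0 lies outside it. Here is a small sketch of that duality. The helper functions and the list of candidate values are hypothetical names of mine, not from the notes.

```r
# Duality between the two-sided t-test and the confidence interval.
set.seed(10)
x  <- rnorm(100, mean = 10, sd = 3)
ci <- t.test(x, conf.level = 0.95)$conf.int

# Hypothetical helpers, defined only for this illustration.
rejects_at_5pct <- function(mu0) t.test(x, mu = mu0)$p.value < 0.05
outside_ci      <- function(mu0) mu0 < ci[1] || mu0 > ci[2]

candidates <- c(0, 9, 9.5, 10, 11)
for (mu0 in candidates) {
  cat(sprintf("mu0 = %4.1f  reject: %-5s  outside CI: %s\n",
              mu0, rejects_at_5pct(mu0), outside_ci(mu0)))
}
```

In the output shown earlier, 9 lies just below the interval (p-value 0.03912), while 10 lies inside it (p-value 0.15).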

Two-sample t-tests

Suppose that we now independently generate a second sample of
size 100 from the N(mean=5, variance=9) distribution.
sample2.100<- rnorm(100, mean=5, sd=3)
sample2.100
[1] 2.71458698 6.25812622 1.88016991 7.13472190 3.10036096 6.68952399 6.98296006 0.02584743 8.08450393 8.38386084 1.15953619 8.38660468
[13] 3.60759642 4.05271937 7.77287944 5.23143417 8.11977082 7.22565862 8.76663457 7.85275690 3.55590318 5.60864533 4.90478077 1.41325910
[25] 6.87104371 2.25558655 5.74627402 1.81213162 3.90805326 1.37901544 9.28763834 6.90030767 -0.99044685 2.95450348 3.61983356 2.05079242
[37] 6.48599514 7.17745250 7.00189620 7.86435931 0.02599654 1.38444382 -0.88975747 9.41225693 6.11741702 8.19763800 6.59194961 5.30595034
[49] 9.01334740 5.26170431 3.82668738 4.25039755 8.46531424 2.40581828 2.39996497 -1.96305109 6.82649051 8.45001814 1.40120698 0.25999774
[61] 6.95949858 3.35177454 6.56316358 2.90179080 3.68327206 2.96804211 7.87742358 0.59548001 5.55129168 0.69455845 1.58780031 3.75606402
[73] 5.43180286 8.18607299 3.28761829 8.83154413 5.68486796 4.07356081 7.87948739 6.64646712 6.27653928 6.93050011 0.91908157 4.40448168
[85] 6.85790803 11.20462882 4.08414574 5.84373684 7.07395201 5.13908431 5.33908808 7.98599562 2.95654592 1.16882826 0.59390675 4.05957780
[97] -0.11097848 0.94845603 1.69371897 1.70137096

I would now like to compare the means of the two populations
that I have drawn my samples from. As a reminder, the first
population has mean 10, while the second one has mean 5. They
both have variance equal to 9. We will assume that the
variances are known to be equal, but that their common value is unknown!
t.test(sample1.100, sample2.100, var.equal=TRUE)
Two Sample t-test
data: sample1.100 and sample2.100
t = 12.0268, df = 198, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
4.075852 5.674630
sample estimates:
mean of x mean of y
9.590353 4.715112

By specifying var.equal=TRUE, the pooled variance formula is
used in the computations (see Lecture Notes 12!).
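For reference, here is a minimal sketch of the pooled-variance computation, compared against t.test(). The names x, y, sp2 and se are mine. Both samples are regenerated so that the block runs on its own, so I make no claim that the numbers match the output above exactly.

```r
# Pooled two-sample t statistic by hand.
set.seed(10)
x <- rnorm(100, mean = 10, sd = 3)
y <- rnorm(100, mean = 5,  sd = 3)
n1 <- length(x)
n2 <- length(y)

# Pooled variance: weighted average of the two sample variances.
sp2 <- ((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2)
se  <- sqrt(sp2 * (1 / n1 + 1 / n2))       # standard error of xbar - ybar
t_manual <- (mean(x) - mean(y)) / se       # null hypothesis: difference = 0

res <- t.test(x, y, var.equal = TRUE)
print(c(t_manual = t_manual, t_builtin = unname(res$statistic)))
print(res$parameter)                       # degrees of freedom: n1 + n2 - 2
```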
Note that we have tested whether or not the true difference of
the two means is equal to zero! What if we want to test if it is
equal to 3?
t.test(sample1.100, sample2.100, mu=3, var.equal=TRUE)
Two Sample t-test
data: sample1.100 and sample2.100
t = 4.626, df = 198, p-value = 6.724e-06
alternative hypothesis: true difference in means is not equal to 3
95 percent confidence interval:
4.075852 5.674630
sample estimates:
mean of x mean of y
9.590353 4.715112

What if we want to test if the true mean difference is greater
than 3?
t.test(sample1.100, sample2.100, mu=3, var.equal=TRUE,
alternative="greater")
Two Sample t-test
data: sample1.100 and sample2.100
t = 4.626, df = 198, p-value = 3.362e-06
alternative hypothesis: true difference in means is greater than 3
95 percent confidence interval:
4.205339 Inf
sample estimates:
mean of x mean of y
9.590353 4.715112

What if we want to test if the true mean difference is greater
than 7?
t.test(sample1.100, sample2.100, mu=7, var.equal=TRUE,
alternative="greater")
Two Sample t-test
data: sample1.100 and sample2.100
t = -5.2416, df = 198, p-value = 1
alternative hypothesis: true difference in means is greater than 7
95 percent confidence interval:
4.205339 Inf
sample estimates:
mean of x mean of y
9.590353 4.715112

What if we want to test if the true mean difference is less than 7?


t.test(sample1.100, sample2.100, mu=7, var.equal=TRUE,
alternative="less")
Two Sample t-test
data: sample1.100 and sample2.100
t = -5.2416, df = 198, p-value = 2.032e-07
alternative hypothesis: true difference in means is less than 7
95 percent confidence interval:
-Inf 5.545143
sample estimates:
mean of x mean of y
9.590353 4.715112

If we do not assume that the variances are equal,
then the number of degrees of freedom of the test statistic is
approximated!
t.test(sample1.100, sample2.100, var.equal=FALSE)
Welch Two Sample t-test
data: sample1.100 and sample2.100
t = 12.0268, df = 197.827, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
4.075848 5.674634
sample estimates:
mean of x mean of y
9.590353 4.715112

The resulting t-test is called Welch's t-test!
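The approximate degrees of freedom come from the Welch-Satterthwaite formula. The sketch below computes them by hand and compares the result with t.test(). The names v1, v2, df_welch and t_welch are mine, and the samples are regenerated so that the block is self-contained.

```r
# Welch's t statistic and approximate degrees of freedom by hand.
set.seed(10)
x <- rnorm(100, mean = 10, sd = 3)
y <- rnorm(100, mean = 5,  sd = 3)
n1 <- length(x)
n2 <- length(y)

v1 <- var(x) / n1                          # squared standard error of xbar
v2 <- var(y) / n2                          # squared standard error of ybar
t_welch  <- (mean(x) - mean(y)) / sqrt(v1 + v2)
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

res <- t.test(x, y, var.equal = FALSE)
print(c(t_welch = t_welch, t_builtin = unname(res$statistic)))
print(c(df_welch = df_welch, df_builtin = unname(res$parameter)))
```

Because both samples have the same size here, the Welch statistic equals the pooled one (t = 12.0268 in both outputs above). Only the degrees of freedom differ, 197.827 versus 198.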

Confidence Intervals for the Difference of the true Means


To construct a 90% confidence interval for the difference of the
true means, we use
t.test(sample1.100, sample2.100, var.equal=TRUE,
conf.level=0.90)
Two Sample t-test
data: sample1.100 and sample2.100
t = 12.0268, df = 198, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
90 percent confidence interval:
4.205339 5.545143
sample estimates:
mean of x mean of y
9.590353 4.715112
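The interval itself is the estimated difference plus or minus a t quantile times the pooled standard error. Here is a minimal sketch that rebuilds it by hand. The names alpha, tcrit and ci_manual are mine, and the samples are regenerated so that the block runs on its own.

```r
# 90% confidence interval for the difference of means, by hand.
set.seed(10)
x <- rnorm(100, mean = 10, sd = 3)
y <- rnorm(100, mean = 5,  sd = 3)
n1 <- length(x)
n2 <- length(y)

sp2 <- ((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2)
se  <- sqrt(sp2 * (1 / n1 + 1 / n2))       # pooled standard error

alpha <- 0.10                              # 1 - confidence level
tcrit <- qt(1 - alpha / 2, df = n1 + n2 - 2)
ci_manual <- (mean(x) - mean(y)) + c(-1, 1) * tcrit * se

res <- t.test(x, y, var.equal = TRUE, conf.level = 0.90)
print(ci_manual)
print(as.numeric(res$conf.int))
```

Note that the endpoints in the output above, 4.205339 and 5.545143, are the same numbers as the one-sided 95% bounds reported earlier, since both use the 0.95 quantile of the t distribution with 198 degrees of freedom.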

Paired t-tests
We now want to create a paired structure, so we will do it in a
simple way! To the first sample sample1.100, we will add
some noise N(mean=0, variance=0.1) and some signal
equal to 0.2. Call the resulting sample sample3.100.
Since the noise is independent of sample1.100, the means and the
variances add up, so this new sample comes from a population
N(mean=10.2, variance=9.1).
sample3.100 <-0.2+rnorm(100, mean=0,sd=sqrt(0.1))+
sample1.100


sample3.100
[1] 10.640618 9.751875 6.525652 8.678262 10.741851 11.526175 6.908642 8.705892 5.258739 9.021023 13.550205 12.866334 9.348847 12.585942 12.535655
[16] 10.041624 7.558959 9.484614 12.835512 11.319693 8.306958 3.554703 8.312148 3.745543 6.386496 9.310973 8.168106 8.099257 10.071998 9.859104
[31] 4.550607 9.565586 13.026911 10.760466 6.179410 6.145177 11.020416 4.225308 8.869645 7.821152 13.967236 8.147773 7.986627 12.828381 7.458042
[46] 10.074800 10.926853 9.183202 8.053423 12.490946 9.338947 9.490985 13.841918 16.326775 11.502313 12.895230 7.274404 11.440206 7.916169 10.752654
[61] 6.616958 8.982546 6.972259 11.225487 13.709144 14.103367 12.444893 7.985849 11.898044 6.354868 11.259353 5.796188 6.902509 9.614924 6.498596
[76] 14.191290 5.533718 8.761141 7.254719 15.495139 12.082052 9.552211 11.978322 12.723081 11.533814 10.292720 12.202439 16.780186 6.732536 10.416969
[91] 9.011993 9.812860 10.146173 13.790188 12.106578 6.273434 5.745464 12.551660 10.564034 14.283860

Let us compare the means based on sample1.100 and
sample3.100. Recall that this is a paired data structure!
t.test(sample3.100, sample1.100, paired=TRUE)
Paired t-test
data: sample3.100 and sample1.100
t = 6.7964, df = 99, p-value = 8.146e-10
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
0.1480472 0.2701371
sample estimates:
mean of the differences
0.2090921
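A paired t-test is the same thing as a one-sample t-test on the within-pair differences, which is why the output has df = 99 rather than 198. The sketch below checks this equivalence. The names x, z, paired_res and diff_res are mine, and the noise is regenerated here, so the numbers need not match the output above.

```r
# Paired t-test versus one-sample t-test on the differences.
set.seed(10)
x <- rnorm(100, mean = 10, sd = 3)
z <- 0.2 + rnorm(100, mean = 0, sd = sqrt(0.1)) + x   # signal 0.2 plus noise

paired_res <- t.test(z, x, paired = TRUE)
diff_res   <- t.test(z - x, mu = 0)

print(c(paired = unname(paired_res$statistic),
        differences = unname(diff_res$statistic)))
print(c(paired = paired_res$p.value, differences = diff_res$p.value))
```

Treating the two samples as independent instead would compare the 0.2 shift against the full spread of the data (variance around 9) rather than against the spread of the differences (variance around 0.1), and the shift would be much harder to detect.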
