Hypothesis Testing in R

One-sample t-tests
Let's generate an independent random sample of size 100 from
the N(mean=10, variance=9) distribution.
set.seed(10)
sample1.100<- rnorm(100, mean=10, sd=3)
sample1.100
[1] 10.056239 9.447242 5.886008 8.202497 10.883635 11.169383 6.375771
[8] 8.908972 5.119982 9.230565 13.305339 12.267345 9.285299 12.962334
[15] 12.224170 10.268042 7.135168 9.414549 12.776564 11.448936 8.211068
[22] 3.444139 7.975402 3.642816 6.204406 8.879015 7.937334 7.383524
[29] 9.694717 9.238658 4.438779 9.766162 12.905699 10.554778 5.860169
[36] 5.693457 11.086262 4.722740 9.026368 8.045311 13.259654 7.712365
[43] 7.514012 12.503422 7.097044 9.913554 10.697575 9.096374 7.967156
[50] 11.965683 8.798087 8.996330 14.103862 16.413301 11.517458 12.359027
[57] 7.293364 11.598691 8.062317 10.872962 6.287217 8.631471 7.509032
[64] 11.020347 13.199129 13.648378 12.207072 8.556374 11.688234 6.261041
[71] 11.142767 5.708718 6.854663 9.344489 5.530191 13.518119 5.560519
[78] 8.708837 6.845084 14.567759 11.778484 9.332015 12.138683 12.149803
[85] 11.320726 10.476492 11.979292 16.661559 6.448165 9.778132 8.750936
[92] 9.425553 10.208634 13.466045 11.784872 5.741065 5.179968 12.678778
[99] 10.444504 13.681085
Note that rnorm takes the standard deviation (sd), not the
variance! For a variance of 9 we therefore pass sd=3.
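As a small sketch of the point above, the variance can be stated explicitly and converted with sqrt() before it is handed to rnorm. The names sigma2 and x are illustrative and do not appear in the original notes.

```r
set.seed(10)                                    # same seed as in the notes
sigma2 <- 9                                     # the variance we want
x <- rnorm(100, mean = 10, sd = sqrt(sigma2))   # rnorm expects the standard deviation
var(x)                                          # sample variance, should be near 9
sd(x)                                           # sample standard deviation, should be near 3
```

Passing sd=9 by mistake would produce a sample with variance near 81, so checking var(x) is a cheap sanity test.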
Suppose that I want to test whether the true mean is zero. Of
course, we know that the true mean is 10, so we expect to
reject the null hypothesis (that the mean is equal to zero).
We choose a two-sided alternative: that the mean is different
from zero!
t.test(sample1.100, alternative="two.sided")
One Sample t-test
data: sample1.100
t = 33.9637, df = 99, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
9.030068 10.150638
sample estimates:
mean of x
9.590353
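To see where the numbers in this output come from, the following sketch recomputes the test statistic and the two-sided p-value by hand and compares them with t.test. It assumes the sample is regenerated with the same seed; the names x, mu0, t_stat and p_two are illustrative.

```r
set.seed(10)
x <- rnorm(100, mean = 10, sd = 3)            # regenerate the sample

n   <- length(x)
mu0 <- 0                                      # mean under the null hypothesis

# One-sample t statistic: (sample mean - mu0) / (s / sqrt(n))
t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_two <- 2 * pt(-abs(t_stat), df = n - 1)

res <- t.test(x, mu = mu0, alternative = "two.sided")
c(manual = t_stat, t.test = unname(res$statistic))
c(manual = p_two,  t.test = res$p.value)
```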
Now, what if we want to choose a different (one-sided) alternative, such as the mean being greater than zero?
t.test(sample1.100, alternative="greater")
One Sample t-test
data: sample1.100
t = 33.9637, df = 99, p-value < 2.2e-16
alternative hypothesis: true mean is greater than 0
95 percent confidence interval:
9.121507 Inf
sample estimates:
mean of x
9.590353
How about if we want to test that the mean is less than zero?
t.test(sample1.100, alternative="less")
One Sample t-test
data: sample1.100
t = 33.9637, df = 99, p-value = 1
alternative hypothesis: true mean is less than 0
95 percent confidence interval:
-Inf 10.0592
sample estimates:
mean of x
9.590353
Now, I want to test whether the mean of the first
population is equal to 9. Of course, we know that the true
mean is 10, so we expect to reject the null hypothesis (that the
mean is equal to 9).
We choose a two-sided alternative: that the mean is different
from 9!
t.test(sample1.100, mu=9, alternative="two.sided")
One Sample t-test
data: sample1.100
t = 2.0907, df = 99, p-value = 0.03912
alternative hypothesis: true mean is not equal to 9
95 percent confidence interval:
9.030068 10.150638
sample estimates:
mean of x
9.590353
Now, what if we want to choose a different (one-sided) alternative, such as the mean being greater than 9?
t.test(sample1.100, mu=9, alternative="greater")
One Sample t-test
data: sample1.100
t = 2.0907, df = 99, p-value = 0.01956
alternative hypothesis: true mean is greater than 9
95 percent confidence interval:
9.121507 Inf
sample estimates:
mean of x
9.590353
How about if we want to test that the mean is less than 9?
t.test(sample1.100, mu=9, alternative="less")
One Sample t-test
data: sample1.100
t = 2.0907, df = 99, p-value = 0.9804
alternative hypothesis: true mean is less than 9
95 percent confidence interval:
-Inf 10.0592
sample estimates:
mean of x
9.590353
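The three p-values for mu=9 are tied together: the two one-sided p-values add up to 1, and the two-sided p-value is twice the smaller of them. The sketch below checks this, assuming the sample is regenerated with the same seed; the names p_two, p_greater and p_less are illustrative.

```r
set.seed(10)
x <- rnorm(100, mean = 10, sd = 3)            # regenerate the sample

p_two     <- t.test(x, mu = 9, alternative = "two.sided")$p.value
p_greater <- t.test(x, mu = 9, alternative = "greater")$p.value
p_less    <- t.test(x, mu = 9, alternative = "less")$p.value

p_greater + p_less           # the one-sided p-values sum to 1
2 * min(p_greater, p_less)   # matches the two-sided p-value
p_two
```

This matches the rounded values reported above: 0.01956 + 0.9804 is 1 up to rounding, and 2 * 0.01956 = 0.03912.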
Now, I want to test whether the mean of the first
population is equal to 10. We know that the true mean is
10, so we expect to fail to reject the null hypothesis (that the
mean is equal to 10).
We choose a two-sided alternative: that the mean is different
from 10!
t.test(sample1.100, mu=10, alternative="two.sided")
One Sample t-test

data: sample1.100
t = -1.4507, df = 99, p-value = 0.15
alternative hypothesis: true mean is not equal to 10
95 percent confidence interval:
9.030068 10.150638
sample estimates:
mean of x
9.590353
Now, what if we want to choose a different (one-sided) alternative, such as the mean being greater than 10?
t.test(sample1.100, mu=10, alternative="greater")
One Sample t-test
data: sample1.100
t = -1.4507, df = 99, p-value = 0.925
alternative hypothesis: true mean is greater than 10
95 percent confidence interval:
9.121507 Inf
sample estimates:
mean of x
9.590353
How about if we want to test that the mean is less than 10?
t.test(sample1.100, mu=10, alternative="less")
One Sample t-test
data: sample1.100
t = -1.4507, df = 99, p-value = 0.075
alternative hypothesis: true mean is less than 10
95 percent confidence interval:
-Inf 10.0592
sample estimates:
mean of x
9.590353
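The two-sided results above can also be read off the confidence interval: the null value mu0 is rejected at the 5% level exactly when it lies outside the 95% confidence interval, and that interval does not depend on mu. The following sketch tabulates this for the three null values used so far, assuming the sample is regenerated with the same seed; the names mu0s, p_vals and inside are illustrative.

```r
set.seed(10)
x <- rnorm(100, mean = 10, sd = 3)            # regenerate the sample

ci   <- t.test(x)$conf.int                    # two-sided 95% CI for the mean
mu0s <- c(0, 9, 10)                           # null values tested in the notes

# Two-sided p-value for each null value
p_vals <- sapply(mu0s, function(m) t.test(x, mu = m)$p.value)

# Is the null value inside the confidence interval?
inside <- mu0s >= ci[1] & mu0s <= ci[2]

data.frame(mu0 = mu0s, p.value = p_vals,
           inside.ci = inside, reject.at.5pct = p_vals < 0.05)
```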
Two-sample t-tests
Suppose that we now independently generate a second sample of
size 100 from the N(mean=5, variance=9) distribution.
sample2.100<- rnorm(100, mean=5, sd=3)
sample2.100
[1] 2.71458698 6.25812622 1.88016991 7.13472190 3.10036096
6.68952399 6.98296006 0.02584743 8.08450393 8.38386084
1.15953619 8.38660468
[13] 3.60759642 4.05271937 7.77287944 5.23143417
8.11977082 7.22565862 8.76663457 7.85275690 3.55590318
5.60864533 4.90478077 1.41325910
[25] 6.87104371 2.25558655 5.74627402 1.81213162
3.90805326 1.37901544 9.28763834 6.90030767 -0.99044685
2.95450348 3.61983356 2.05079242
[37] 6.48599514 7.17745250 7.00189620 7.86435931 0.02599654 1.38444382 -0.88975747 9.41225693 6.11741702
8.19763800 6.59194961 5.30595034
[49] 9.01334740 5.26170431 3.82668738 4.25039755
8.46531424 2.40581828 2.39996497 -1.96305109 6.82649051
8.45001814 1.40120698 0.25999774
[61] 6.95949858 3.35177454 6.56316358 2.90179080
3.68327206 2.96804211 7.87742358 0.59548001 5.55129168
0.69455845 1.58780031 3.75606402
[73] 5.43180286 8.18607299 3.28761829 8.83154413
5.68486796 4.07356081 7.87948739 6.64646712 6.27653928
6.93050011 0.91908157 4.40448168
[85] 6.85790803 11.20462882 4.08414574 5.84373684
7.07395201 5.13908431 5.33908808 7.98599562 2.95654592
1.16882826 0.59390675 4.05957780
[97] -0.11097848 0.94845603 1.69371897 1.70137096
I would now like to compare the means of the two populations
that I have drawn my samples from. As a reminder, the first
population has mean 10, while the second one has mean 5. They
both have variance equal to 9. We will assume that we know
that the two variances are equal, but that their common value is
unknown!
t.test(sample1.100, sample2.100, var.equal=TRUE)
Two Sample t-test
data: sample1.100 and sample2.100
t = 12.0268, df = 198, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
4.075852 5.674630
sample estimates:
mean of x mean of y
9.590353 4.715112
By specifying var.equal=TRUE, the pooled variance formula is
used in the computations (see Lecture Notes 12!)
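As a sketch of what the pooled computation amounts to, the statistic can be rebuilt by hand using the standard pooled-variance formula (the notes refer to Lecture Notes 12 for their own statement of it). Both samples are regenerated here in the same order as in the notes; the names x, y, sp2 and se are illustrative.

```r
set.seed(10)
x <- rnorm(100, mean = 10, sd = 3)            # first sample
y <- rnorm(100, mean = 5,  sd = 3)            # second sample, drawn right after

n1 <- length(x)
n2 <- length(y)
df <- n1 + n2 - 2                             # degrees of freedom of the pooled test

# Pooled variance: weighted average of the two sample variances
sp2 <- ((n1 - 1) * var(x) + (n2 - 1) * var(y)) / df

# Standard error of the difference in sample means
se <- sqrt(sp2 * (1 / n1 + 1 / n2))

t_stat <- (mean(x) - mean(y)) / se

res <- t.test(x, y, var.equal = TRUE)
c(manual = t_stat, t.test = unname(res$statistic))
c(manual = df,     t.test = unname(res$parameter))
```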
Note that we have tested whether or not the true difference of
the two means is equal to zero! What if we want to test if it is
equal to 3?
t.test(sample1.100, sample2.100, mu=3, var.equal=TRUE)
Two Sample t-test
t = 4.626, df = 198, p-value = 6.724e-06
alternative hypothesis: true difference in means is not equal to 3
95 percent confidence interval:
4.075852 5.674630
sample estimates:
mean of x mean of y
9.590353 4.715112
What if we want to test if the true mean difference is greater
than 3?
t.test(sample1.100, sample2.100, mu=3, var.equal=TRUE,
alternative="greater")
Two Sample t-test
t = 4.626, df = 198, p-value = 3.362e-06
alternative hypothesis: true difference in means is greater than 3
95 percent confidence interval:
4.205339 Inf
sample estimates:
mean of x mean of y
9.590353 4.715112
What if we want to test if the true mean difference is greater
than 7?
t.test(sample1.100, sample2.100, mu=7, var.equal=TRUE,
alternative="greater")
Two Sample t-test
t = -5.2416, df = 198, p-value = 1
alternative hypothesis: true difference in means is greater than 7
95 percent confidence interval:
4.205339 Inf
sample estimates:
mean of x mean of y
9.590353 4.715112
What if we want to test if the true mean difference is less than 7?
t.test(sample1.100, sample2.100, mu=7, var.equal=TRUE,
alternative="less")
Two Sample t-test

t = -5.2416, df = 198, p-value = 2.032e-07
alternative hypothesis: true difference in means is less than 7
95 percent confidence interval:
-Inf 5.545143
sample estimates:
mean of x mean of y
9.590353 4.715112
If we do not assume that the variances are equal, then the
number of degrees of freedom of the test statistic is
approximated (and need not be an integer)!
t.test(sample1.100, sample2.100, var.equal=FALSE)
Welch Two Sample t-test
t = 12.0268, df = 197.827, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
4.075848 5.674634
sample estimates:
mean of x mean of y
9.590353 4.715112
The resulting t-test is called Welch's t-test!
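The approximate degrees of freedom can be reproduced with the Welch-Satterthwaite formula, which is the approximation R's t.test uses when var.equal=FALSE. The sketch below regenerates both samples in the same order as in the notes; the names v1, v2 and df_welch are illustrative.

```r
set.seed(10)
x <- rnorm(100, mean = 10, sd = 3)            # first sample
y <- rnorm(100, mean = 5,  sd = 3)            # second sample

n1 <- length(x)
n2 <- length(y)
v1 <- var(x) / n1                             # estimated variance of mean(x)
v2 <- var(y) / n2                             # estimated variance of mean(y)

# Unpooled standard error and t statistic
t_stat <- (mean(x) - mean(y)) / sqrt(v1 + v2)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

res <- t.test(x, y, var.equal = FALSE)
c(manual = t_stat,   t.test = unname(res$statistic))
c(manual = df_welch, t.test = unname(res$parameter))
```

With equal sample sizes the pooled and unpooled standard errors coincide, which is why the t statistic is the same as in the pooled test and only the degrees of freedom change slightly.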
Confidence Intervals for the Difference of the true Means

To construct a 90% confidence interval for the difference of the
true means, we use
t.test(sample1.100, sample2.100, var.equal=TRUE,
conf.level=0.90)
Two Sample t-test
t = 12.0268, df = 198, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
90 percent confidence interval:
4.205339 5.545143
sample estimates:
mean of x mean of y
9.590353 4.715112
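The interval reported by t.test can be rebuilt as (difference in sample means) plus or minus (critical value) times (pooled standard error). The sketch below does this for the 90% level, regenerating both samples in the same order as in the notes; the names conf_level, t_crit and manual_ci are illustrative.

```r
set.seed(10)
x <- rnorm(100, mean = 10, sd = 3)            # first sample
y <- rnorm(100, mean = 5,  sd = 3)            # second sample

n1 <- length(x)
n2 <- length(y)
df <- n1 + n2 - 2

# Pooled variance and standard error of the difference in means
sp2 <- ((n1 - 1) * var(x) + (n2 - 1) * var(y)) / df
se  <- sqrt(sp2 * (1 / n1 + 1 / n2))

conf_level <- 0.90
t_crit <- qt(1 - (1 - conf_level) / 2, df = df)   # upper 5% point of t with df degrees of freedom

manual_ci <- (mean(x) - mean(y)) + c(-1, 1) * t_crit * se

res <- t.test(x, y, var.equal = TRUE, conf.level = conf_level)
manual_ci
as.numeric(res$conf.int)
```

Because a two-sided 90% interval uses the same critical value as a one-sided 95% bound, its endpoints 4.205339 and 5.545143 agree with the one-sided bounds shown in the earlier outputs.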
Paired t-tests
We now want to create a paired structure, and we will do it in a
simple way! To the first sample, sample1.100, we will add
some noise from N(mean=0, variance=0.1) and some signal
equal to 0.2. Call the resulting sample sample3.100.
Since the noise is independent of sample1.100, the variances
add (9 + 0.1), so this latest sample comes from a population
N(mean=10.2, variance=9.1).
sample3.100 <- 0.2 + rnorm(100, mean=0, sd=sqrt(0.1)) + sample1.100
sample3.100
[1] 10.640618 9.751875 6.525652 8.678262 10.741851
11.526175 6.908642 8.705892 5.258739 9.021023 13.550205
12.866334 9.348847 12.585942 12.535655
[16] 10.041624 7.558959 9.484614 12.835512 11.319693
8.306958 3.554703 8.312148 3.745543 6.386496 9.310973
8.168106 8.099257 10.071998 9.859104
[31] 4.550607 9.565586 13.026911 10.760466 6.179410
6.145177 11.020416 4.225308 8.869645 7.821152 13.967236
8.147773 7.986627 12.828381 7.458042
[46] 10.074800 10.926853 9.183202 8.053423 12.490946
9.338947 9.490985 13.841918 16.326775 11.502313 12.895230
7.274404 11.440206 7.916169 10.752654
[61] 6.616958 8.982546 6.972259 11.225487 13.709144
14.103367 12.444893 7.985849 11.898044 6.354868 11.259353
5.796188 6.902509 9.614924 6.498596
[76] 14.191290 5.533718 8.761141 7.254719 15.495139
12.082052 9.552211 11.978322 12.723081 11.533814 10.292720
12.202439 16.780186 6.732536 10.416969
[91] 9.011993 9.812860 10.146173 13.790188 12.106578
6.273434 5.745464 12.551660 10.564034 14.283860
Let us compare means based on sample1.100 and
sample3.100. Recall that this is a paired data structure!
t.test(sample3.100, sample1.100, paired=TRUE)
Paired t-test
t = 6.7964, df = 99, p-value = 8.146e-10
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
0.1480472 0.2701371
sample estimates:
mean of the differences
0.2090921
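A paired t-test is the same as a one-sample t-test applied to the within-pair differences, which is why the degrees of freedom are 99 rather than 198. The sketch below checks this on freshly generated data built the same way as above; because the random draws are not made in the same sequence as in the notes, the numbers will differ from the output shown. The names x, z, paired and one_sample are illustrative.

```r
set.seed(10)
x <- rnorm(100, mean = 10, sd = 3)                    # original sample
z <- 0.2 + rnorm(100, mean = 0, sd = sqrt(0.1)) + x   # shifted, noisy copy of x

paired     <- t.test(z, x, paired = TRUE)             # paired t-test
one_sample <- t.test(z - x)                           # one-sample t-test on the differences

c(paired = unname(paired$statistic), one.sample = unname(one_sample$statistic))
c(paired = paired$p.value,           one.sample = one_sample$p.value)

# Ignoring the pairing treats the samples as independent and loses power here,
# since the shared variation in x is no longer cancelled out
t.test(z, x, var.equal = TRUE)$p.value
```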