
Liao HW 3

2-11 We have data on 82 luminosities with four different treatments.


geneexpression=read.delim("http://sites.williams.edu/rdeveaux/files/2014/09/GeneExpression.txt")

The first treatment that will be discussed here is the highdose treatment. Here's the
summary of the highdose treatment:
with(subset(geneexpression,Treatment=="Highdose"),summary(Luminosity))
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    12.9    23.5    46.4    54.0    72.7   135.0

with(subset(geneexpression,Treatment=="Highdose"),boxplot(Luminosity))

The spread of the highdose treatment is very small compared to the other treatments (notice
the numbers on the y-axis) and there are no outliers.
The second treatment is Control 1.
with(subset(geneexpression,Treatment=="Control 1"),boxplot(Luminosity))

with(subset(geneexpression,Treatment=="Control 1"),summary(Luminosity))
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    15.8   102.0   255.0   395.0   492.0  1480.0

The spread of the Control 1 treatment is large compared to the highdose treatment but
relatively small compared to the Control 3 treatment. There are 3 outliers.
The third treatment is Control 2.
with(subset(geneexpression,Treatment=="Control 2"),boxplot(Luminosity))

with(subset(geneexpression,Treatment=="Control 2"),summary(Luminosity))
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    20.2   107.0   318.0   439.0   663.0  2030.0

The spread of the Control 2 treatment is a little bit larger than that of Control 1 but still
relatively small when compared to that of Control 3. There is 1 outlier.
The last treatment is Control 3.
with(subset(geneexpression,Treatment=="Control 3"),boxplot(Luminosity))

with(subset(geneexpression,Treatment=="Control 3"),summary(Luminosity))
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    67.1   186.0   388.0   874.0  1490.0  3010.0

The spread of the Control 3 treatment is the largest of all, but it has no outliers.
Below is the same set of data, but summarized and graphed with log(Luminosity) instead
of Luminosity.
with(subset(geneexpression,Treatment=="Highdose"),boxplot(log(Luminosity)))

with(subset(geneexpression,Treatment=="Highdose"),summary(log(Luminosity)))
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    2.56    3.16    3.84    3.72    4.29    4.90

with(subset(geneexpression,Treatment=="Control 1"),boxplot(log(Luminosity)))

with(subset(geneexpression,Treatment=="Control 1"),summary(log(Luminosity)))
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    2.76    4.63    5.54    5.42    6.20    7.30

with(subset(geneexpression,Treatment=="Control 2"),boxplot(log(Luminosity)))

with(subset(geneexpression,Treatment=="Control 2"),summary(log(Luminosity)))
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    3.01    4.67    5.75    5.47    6.50    7.62

with(subset(geneexpression,Treatment=="Control 3"),boxplot(log(Luminosity)))

with(subset(geneexpression,Treatment=="Control 3"),summary(log(Luminosity)))
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    4.21    5.23    5.96    6.17    7.29    8.01

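It is easier to describe the data using log(Luminosity) instead of just Luminosity because the
log compresses the large values: all four treatments then fall on a similar scale (roughly 2.5 to 8),
so the numbers are easier to comprehend and compare.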
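The per-treatment calls above could also be collapsed into a single grouped summary. The following is only a sketch: the helper name summarize_log_by_group and the toy data frame are hypothetical (the toy values are not the homework data), and it assumes the columns are named Treatment and Luminosity as in geneexpression.

```r
# Summarize log(Luminosity) separately for each treatment level.
# (Hypothetical helper; assumes columns named Treatment and Luminosity.)
summarize_log_by_group <- function(df) {
  tapply(log(df$Luminosity), df$Treatment, summary)
}

# Hypothetical toy values for illustration only, NOT the homework data.
toy <- data.frame(
  Treatment  = rep(c("Control 1", "Highdose"), each = 4),
  Luminosity = c(100, 200, 400, 800, 10, 20, 40, 80)
)

log_summaries <- summarize_log_by_group(toy)
print(log_summaries)

# With the real data, side-by-side boxplots on the log scale would be:
# boxplot(log(Luminosity) ~ Treatment, data = geneexpression)
```

Putting all groups on one set of axes makes the spreads directly comparable, which the separate boxplots (each with its own y-axis) do not.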
2-12
wine=read.delim("http://sites.williams.edu/rdeveaux/files/2014/09/wine-sugars.txt")
with(wine,summary(sugar))
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    1.40    2.10    2.45    2.75    2.62   13.80

layout(mat = matrix(c(1,2),2,1, byrow=TRUE), height = c(1,3))


par(mar=c(3.1, 3.1, 1.1, 2.1))
with(wine, boxplot(sugar, horizontal=TRUE, outline=TRUE, frame=F))
with(wine, hist(sugar))

with(wine, summary(sugar))
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    1.40    2.10    2.45    2.75    2.62   13.80

a) The distribution of sugar is slightly skewed to the left, with a couple of outliers above
3. These are separated from the main distribution and are not considered when describing
the left-skewed distribution.

with(subset(wine, sugar>4), table(sugar))


## sugar
##  4.2  4.3 13.8
##    1    1    1
3/40
## [1] 0.075

b) 7.5% of the wines have a residual sugar content of 4 g or more.
c) Yes. The 3 wines with a sugar content of 4 g or more (4.2, 4.3, and 13.8) are outliers
because they lie more than 1.5 IQR above the third quartile: 2.62 + 1.5 × (2.62 − 2.10) = 3.40.
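The 1.5 IQR rule can be checked by hand from the quartiles printed by summary(sugar). This is a small sketch that uses only the rounded values shown in the output above, so the fences are approximate; the variable names are illustrative.

```r
# Quartiles as printed (rounded) by summary(sugar) in this write-up.
q1 <- 2.10
q3 <- 2.62

iqr <- q3 - q1                  # interquartile range
upper_fence <- q3 + 1.5 * iqr   # points above this are flagged as outliers
lower_fence <- q1 - 1.5 * iqr   # points below this are flagged as outliers

# The three values reported by table(sugar) for sugar > 4.
high_values <- c(4.2, 4.3, 13.8)

print(c(lower = lower_fence, upper = upper_fence))
print(high_values > upper_fence)
```

The fences are measured from the quartiles, not from the standard deviation, which is why the rule is resistant to the very outliers it is meant to detect.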

layout(mat = matrix(c(1,2),2,1, byrow=TRUE), height = c(1,3))


par(mar=c(3.1, 3.1, 1.1, 2.1))
with(subset(wine,sugar<3.7), boxplot(sugar, horizontal=TRUE, outline=TRUE, frame=F))
with(subset(wine,sugar<3.7), hist(sugar))

d) With the outliers removed, the distribution of sugar is still slightly skewed to the left.
The spread is smaller without the outliers, and the mean and median are lower too.
e) The mean is bigger than the median in part a because part a includes the outliers,
which pull the mean upward but barely move the median.
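The effect described in part e can be illustrated with a tiny made-up example. The base values below are hypothetical and are not taken from the wine data; only the extreme value 13.8 matches the maximum reported above.

```r
# Hypothetical small sample without any extreme value.
base_values <- c(2.0, 2.2, 2.4, 2.5, 2.6)

# Same sample with one extreme value appended.
with_outlier <- c(base_values, 13.8)

# The mean shifts a lot; the median barely moves.
print(c(mean = mean(base_values),  median = median(base_values)))
print(c(mean = mean(with_outlier), median = median(with_outlier)))
```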
2-13
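library(mosaic)  # provides do() and resample()
# Resample the wines with replacement 1000 times and record each resample's mean sugar
means=do(1000)*mean(sugar,data=resample(wine))
hist(means$result)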

b) The shape of the means is skewed to the right.


c) The distribution is more unimodal and symmetric than the one in 12a because this one is
not affected by outliers.
d) If the data set does not contain outliers, the distribution of the means may not
necessarily be more unimodal and symmetric.
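For readers without the mosaic package, the same resampling idea can be sketched in base R. This is an assumed equivalent, not the code used for the answers above: boot_means is a hypothetical helper, the seed is arbitrary, and toy_sugar is a made-up stand-in for wine$sugar (40 values with one large outlier).

```r
# Draw `reps` bootstrap resamples of x (same size, with replacement)
# and return the mean of each resample.
boot_means <- function(x, reps = 1000) {
  replicate(reps, mean(sample(x, size = length(x), replace = TRUE)))
}

set.seed(1)  # arbitrary seed so the run is repeatable

# Hypothetical stand-in for wine$sugar, NOT the real data.
toy_sugar <- c(seq(1.4, 3.3, length.out = 39), 13.8)

toy_means <- boot_means(toy_sugar)
print(length(toy_means))
print(range(toy_means))

# hist(toy_means)  # uncomment to draw the histogram of bootstrap means
```

Each resample may include the outlier zero, one, or several times, which is what can produce a right-skewed or lumpy histogram of means.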
2-14
quantile(means$result,c(0.025,.5,.975))
## 2.5% 50% 97.5%
## 2.319 2.726 3.473

a) 95% of the time the mean of a sample of 40 is between 2.32 and 3.47.
b) Because 2.54 lies within the interval obtained with the bootstrapping method.
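The interval check in part b can be written out explicitly. In this sketch, percentile_interval and contains are hypothetical helper names; the reported endpoints are the 2.5% and 97.5% quantiles printed above, and 2.54 is the value discussed in the answer.

```r
# Percentile interval: the middle `level` share of a vector of bootstrap means.
percentile_interval <- function(x, level = 0.95) {
  alpha <- (1 - level) / 2
  quantile(x, c(alpha, 1 - alpha))
}

# TRUE if `value` lies between the two endpoints of `interval`.
contains <- function(interval, value) {
  value >= interval[[1]] && value <= interval[[2]]
}

# Endpoints as printed by quantile(means$result, ...) in this write-up.
reported <- c(2.319, 3.473)
print(contains(reported, 2.54))

# With the bootstrap results in hand, the same check would be:
# contains(percentile_interval(means$result), 2.54)
```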
