1) Read Chapter 1 (all), Chapter 2 (only sections 2.1, 2.2 and 2.3), and Chapter 3 (only
3.1, 3.2, and 3.3).
Answer: Completed
Classify the following attributes as binary, discrete, or continuous. Also classify them as
qualitative (nominal or ordinal) or quantitative (interval or ratio). Some cases may have
more than one interpretation, so briefly indicate your reasoning if you think there may be
some ambiguity.
(h) ISBN numbers for books. (Look up the format on the Web.)
Answer: Discrete, Qualitative, Nominal
Reason: It cannot be ordinal because ISBN numbers are sold to publishers in batches, so
comparing the ISBN numbers of books from two different publishers does not necessarily
tell us which book came on the market first.
(i) Ability to pass light in terms of the following values: opaque, translucent, transparent.
Answer: Discrete, Qualitative, Ordinal
(m) Coat check number. (When you attend an event, you can often give
your coat to someone who, in turn, gives you a number that you can
use to claim your coat when you leave.)
Answer: Discrete, Qualitative, Nominal
Reason: It is nominal if a distinct token is picked from a pile of tokens and attached to
the coat while its counterpart is given to the person. It could be ordinal if coat
numbers increased monotonically in the order the coats were checked in.
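The nominal versus ordinal distinction above can be sketched in R with factors. This is only an illustration: the variable names (light, coat) and the sample values are made up for the example and are not taken from the homework data.

```r
# Ordinal attribute: the levels have a meaningful order
# (opaque < translucent < transparent).
light <- factor(c("translucent", "opaque", "transparent", "opaque"),
                levels = c("opaque", "translucent", "transparent"),
                ordered = TRUE)
light_is_ordered <- is.ordered(light)

# Order comparisons are defined for ordered factors.
opaque_below_transparent <- light[2] < light[3]

# Nominal attribute: coat check numbers only identify a coat.
# Storing them as an unordered factor keeps them from being treated as quantities.
coat <- factor(c(17, 4, 23))
coat_is_ordered <- is.ordered(coat)
n_coat_levels <- nlevels(coat)
```

If the coat numbers were handed out in increasing order, the same values could instead be stored with ordered = TRUE, which matches the second interpretation given in the answer.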
> setwd("G:/Shared/Stats202/Homework1");
> getwd();
[1] "G:/Shared/Stats202/Homework1"
> data <- read.csv("myfirstdata.csv",header=FALSE);
> data[1:5,]
V1 V2
1 0 0
2 0 3
3 0 1
4 1 2
5 0 0
> is.factor(data[,1])
[1] FALSE
> is.factor(data[,2])
[1] FALSE
b) Use the command plot() in R to make a plot for each column by entering plot(data[,1])
and plot(data[,2]). Explain exactly what is being plotted in each of the two cases. Include
these two plots in your homework.
Answer:
> plot(data[,1])
Below is the plot with the index on the x-axis and column 1 of the data on the y-axis.
[Plot: data[, 1] (y-axis, 0 to 25) against Index (x-axis)]
> plot(data[,2])
[Plot: data[, 2] (y-axis, 0 to 20) against Index (x-axis)]
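To illustrate what plot() does when it is given a single numeric vector, here is a small sketch. The vector v is invented for the example (it is not the homework data). xy.coords() is the base R helper that plot() uses to work out the coordinates.

```r
# Hypothetical values standing in for one column of the data.
v <- c(0, 3, 1, 2, 0, 5)

# With one numeric vector, the x coordinates are the indices 1..n
# and the y coordinates are the values themselves.
coords <- xy.coords(v)

pdf(NULL)   # draw to a null device so no file or window is needed
plot(v)     # same picture as plot(seq_along(v), v)
dev.off()
```

So in both plots each point is (row number, value in that row), which is why the x-axis is labelled Index.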
c) Use the R functions mean(), max(), var() and quantile(,.25) to compute the mean,
maximum, variance and 1st quartile respectively of the data in the first column. Show
your R code and the resulting values.
Answer:
> mean(data[,1])
[1] 1.593
> max(data[,1])
[1] 27
> var(data[,1]);
[1] 4.526614
> quantile(data[,1], probs = seq(0, 1, 0.25), na.rm = FALSE, names = TRUE, type = 7)
  0%  25%  50%  75% 100%
   0    0    1    2   27
The 1st quartile (25%) is 0.
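As a check on what these functions compute, the sketch below uses a small made-up vector x (not the homework data). It relies on two documented defaults: var() divides by n - 1, and quantile() uses the type 7 interpolation rule. The helper q7 is a hypothetical name written for this illustration.

```r
x <- c(0, 0, 1, 2, 27)   # hypothetical sample
n <- length(x)

# Sample variance by hand: squared deviations divided by n - 1.
var_by_hand <- sum((x - mean(x))^2) / (n - 1)

# Type 7 quantile by hand: interpolate between neighbouring order statistics.
q7 <- function(x, p) {
  s <- sort(x)
  h <- (length(s) - 1) * p + 1     # fractional position in the sorted data
  lo <- floor(h)
  hi <- min(lo + 1, length(s))
  s[lo] + (h - lo) * (s[hi] - s[lo])
}

# The single quartile the question asks for.
first_quartile <- quantile(x, 0.25)
```

Calling quantile(x, 0.25) returns only the 25% value, which is the form the question mentions; passing probs = seq(0, 1, 0.25) returns all five quartile points at once.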
4) Chapter 3 textbook problem #2 on page 142: Identify at least two advantages and two
disadvantages of using color to visually represent information.
Answer:
Advantages:
(2) Color can also represent order in numeric values. Typically, the range of the
displayed variable is mapped to a color map (similar to the wavelength or temperature
of visible light). “Warm” colors near red represent larger numeric values, whereas
“cool” colors near blue represent smaller ones. For example, in radiotherapy treatment
planning, a color-based contour map is shown to highlight low-dose and high-dose
regions within the patient anatomy.
Disadvantages:
(1) If color is not used judiciously, it may clutter a display or even convey wrong
information.
(2) In some instances, for example 2D X-ray projection images and 3D computed
tomography images, grayscale values convey information such as anatomical boundaries
better, because they resemble the X-ray film images on which most physicians are
trained.
(3) Color-blind people may miss information if it is displayed only via color.
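The idea of mapping a numeric range onto a cool-to-warm color scale can be sketched in R as follows. The values in dose and the choice of 5 color steps are assumptions made for illustration only; colorRampPalette() and cut() are base R functions.

```r
# Hypothetical numeric values to display (for example, dose levels).
dose <- c(2, 15, 30, 48, 60)

# Build a ramp from a "cool" color (small values) to a "warm" color (large values).
n_steps <- 5
ramp <- colorRampPalette(c("blue", "red"))(n_steps)

# Assign each value to one of n_steps equal-width bins over its range,
# then look up the color for that bin.
bin <- cut(dose, breaks = n_steps, labels = FALSE, include.lowest = TRUE)
dose_color <- ramp[bin]
```

Pairing such a scale with a legend or printed values gives readers a second cue, which helps when color alone might be missed.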
http://sites.google.com/site/stats202/homework-1/CA_house_prices.csv
and a sample of 10,000 Ohio house prices at
http://sites.google.com/site/stats202/homework-1/OH_house_prices.csv
Download both data sets to your computer. Note that the house prices are in thousands of
dollars.
Answer: Completed
a) Use R to produce a frequency histogram for only the California house prices. Use
intervals of width $500,000 beginning at 0 and ending at $3.5 million. Include the R
commands and the plot. Put your name in the title of the plot.
[Plot: Histogram of CA House Prices: Supratik Bose. x-axis: Price in $; y-axis: Frequency (0 to 500), with the count printed above each bar]
b) Use R to produce a plot showing relative frequency polygons for both the California
prices and the Ohio prices on the same graph. Include a legend. Use the midpoints of the
intervals from the previous exercise. (The first point should be at $250,000 and the last at
$3.25 million). Include the R commands and the plot. Put your name in the title of the
plot.
Answer:
# Part (a): frequency histogram of the California house prices
caHousePrice <- read.csv("CA_house_prices.csv", header = FALSE)
caHousePrice[1:5, ]
# Prices are stored in thousands of dollars; convert to dollars
caHousePrice <- caHousePrice * 1000
caHousePrice[1:5, ]
hist(caHousePrice[, 1], breaks = 500000 * 0:7, labels = TRUE,
     xlab = "Price in $", main = "Histogram of CA House Prices: Supratik Bose")
# Part (b): relative frequency polygons for California and Ohio
caHousePrice <- read.csv("CA_house_prices.csv", header = FALSE)
caHousePrice[1:5, ]
caHousePrice <- caHousePrice * 1000   # thousands of dollars -> dollars
caHousePrice[1:5, ]
ohHousePrice <- read.csv("OH_house_prices.csv", header = FALSE)
ohHousePrice[1:5, ]
ohHousePrice <- ohHousePrice * 1000
ohHousePrice[1:5, ]
# Bin both data sets with the same breaks, without drawing the histograms
ohHist <- hist(ohHousePrice[, 1], breaks = 500000 * 0:7, plot = FALSE)
# Convert counts to relative frequencies
ohHist$counts <- ohHist$counts / sum(ohHist$counts)
ohColor <- "red"
caHist <- hist(caHousePrice[, 1], breaks = 500000 * 0:7, plot = FALSE)
caHist$counts <- caHist$counts / sum(caHist$counts)
caColor <- "blue"
# Plot relative frequency at the interval midpoints ($250,000 to $3.25 million)
plot(ohHist$mids, ohHist$counts, col = ohColor, pch = 21,
     xlab = "Price in $", ylab = "relative frequency",
     main = "Relative Frequency Polygon of CA and OH House Prices: Supratik Bose")
lines(ohHist$mids, ohHist$counts, col = ohColor, lty = 1)
points(caHist$mids, caHist$counts, col = caColor, pch = 22)
lines(caHist$mids, caHist$counts, col = caColor, lty = 2)
legend("topright", c("OH", "CA"), col = c(ohColor, caColor), pch = 21:22, lty = 1:2)
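The binning behind the histogram and the frequency polygons can be checked with a small sketch. The prices in price are invented for the example; the breaks are the same $500,000 intervals used above.

```r
# Hypothetical prices in dollars (not the homework data).
price <- c(120000, 480000, 500000, 750000, 1200000, 3400000)

breaks <- 500000 * 0:7    # 0, 0.5 million, ..., 3.5 million
h <- hist(price, breaks = breaks, plot = FALSE)

# Midpoints run from $250,000 to $3.25 million, as the question expects.
mids <- h$mids

# Relative frequency is count divided by total count, so the values sum to 1.
rel_freq <- h$counts / sum(h$counts)

# Intervals are right-closed by default, so a price of exactly 500000
# is counted in the first bin.
first_bin_count <- h$counts[1]
```

Note that hist() stops with an error if any value lies outside the supplied breaks, so the 0 to $3.5 million range has to cover every price in both files.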
c) Use R to plot the ECDF of the California houses and Ohio houses on the same graph.
Include a legend. Include the R commands and the plot. Put your name in the title of the
plot.
Answer:
plot(ecdf(ohHousePrice[, 1]), verticals = TRUE, do.points = FALSE,
     col.hor = ohColor, col.vert = ohColor, lwd = 2, xlab = "Price in $",
     ylab = "ECDF", main = "ECDF of CA and OH House Prices: Supratik Bose")
lines(ecdf(caHousePrice[, 1]), verticals = TRUE, do.points = FALSE,
      col.hor = caColor, col.vert = caColor, lwd = 4)
legend("bottomright", c("OH", "CA"), col = c(ohColor, caColor), lwd = c(2, 4))
[Plot: ECDF of CA and OH House Prices: Supratik Bose. x-axis: Price in $; y-axis: ECDF (0.0 to 1.0); legend: OH, CA]
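For reference, the value an ECDF takes at a point is the fraction of observations less than or equal to that point. The sketch below checks this on a made-up vector price (not the homework data).

```r
# Hypothetical prices in thousands of dollars.
price <- c(150, 220, 220, 400, 900)

# ecdf() returns a step function that can be evaluated at any point.
Fn <- ecdf(price)

# Evaluate the ECDF at 220 and compare with the direct definition.
at_220 <- Fn(220)
by_definition <- mean(price <= 220)
```

Three of the five values are at most 220, so both expressions give 0.6. On the plot, a curve that rises faster at low prices indicates a data set with a larger share of cheaper houses.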