
STATISTICS 30001 - CLASSES 15/21

TUTORING SESSION 3

Exercise 1
The following table contains information about the horsepower and fuel consumption of a
sample of 11 cars:

HP (X)    Consumption in l/100 km (Y)
150       6.5
160       6.4
190       9.1
180       8.9
130       6.1
110       5.9
130       6.3
150       6.5
190       8.1
170       7.0
170       6.3

1. Draw a boxplot for the variable consumption.


2. Which of the two variables shows the higher variability?
3. BONUS: Are the two variables highly correlated? Justify your answer with a graph
and an appropriate index.

SOL
1. Use min = 5.9, Q1 = 6.3, median = 6.5, Q3 = 8.1 and max = 9.1 to draw the boxplot.
2. $s^2_{HP} = 681.8182$, $s^2_{cons} = 1.3089$, $CV_{HP} = 0.166$ and $CV_{cons} = 0.1632$. HP is therefore
slightly more variable than consumption.
3. Draw the scatterplot of the observed pairs (HP, cons). The sample correlation coefficient is
r = 0.8043, which indicates a strong positive linear association.
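
The figures in this solution can be checked with a short script. The sketch below is only one way to do it: it assumes the quartiles are taken at positions (n + 1)p in the sorted data (Python's "exclusive" method) and that variances use the divisor n - 1. The helper names `cv` and `pearson_r` are illustrative, not part of the exercise.

```python
import statistics as st

hp = [150, 160, 190, 180, 130, 110, 130, 150, 190, 170, 170]
cons = [6.5, 6.4, 9.1, 8.9, 6.1, 5.9, 6.3, 6.5, 8.1, 7.0, 6.3]

# Five-number summary for the boxplot. method="exclusive" places the
# quartiles at positions (n + 1) * p in the sorted data (an assumption
# about the convention used in the solution).
q1, median, q3 = st.quantiles(cons, n=4, method="exclusive")
five_numbers = (min(cons), q1, median, q3, max(cons))


def cv(values):
    """Coefficient of variation: sample standard deviation over the mean."""
    return st.stdev(values) / st.mean(values)


def pearson_r(x, y):
    """Sample correlation coefficient computed from sums of deviations."""
    mx, my = st.mean(x), st.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


var_hp, var_cons = st.variance(hp), st.variance(cons)  # divisor n - 1
r = pearson_r(hp, cons)

print("five-number summary:", five_numbers)
print("variances:", round(var_hp, 4), round(var_cons, 4))
print("CVs:", round(cv(hp), 4), round(cv(cons), 4))
print("r:", round(r, 4))
```

The coefficient of variation is used for the comparison because HP and consumption are measured in different units, so their variances are not directly comparable.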
Exercise 2

A market research study examined the brand preferences of male and female customers
who bought a laptop in the last year. Data were collected on a sample of 180 customers across
different European countries and are given in the following two-way table:

Brand \ Gender   Male   Female
Apple            35     40
LG               30     15
Sony             25     13
Panasonic        15     7

1. What percentage of men in the sample bought Sony? What percentage of the sample
is female and bought Apple?
2. Find the mode of Brand among women, and the mode of Brand in the whole sample.
3. Using an appropriate graph, discuss whether the variables Brand and Gender are dependent.

SOL
1. The percentage of men who bought Sony is 25/105 = 23.8%, while 40/180 = 22.22% of the sample
is female and bought Apple.
2. The mode of Brand among women is Apple, as it is for the whole sample.
3. Comparing the distribution of Brand within each gender shows that the two variables are
dependent. For example, the preferences of men are spread fairly evenly across brands (Apple
33.3%, LG 28.6%, Sony 23.8%, Panasonic 14.3%), whereas the preferences of women lean strongly
towards Apple (53.3%).
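
The percentages and modes above follow directly from the two-way table. As a minimal sketch (the dictionary layout and variable names are illustrative choices, not part of the exercise), the same quantities can be computed as follows:

```python
# Observed counts from the two-way table: brand -> gender -> customers.
counts = {
    "Apple": {"Male": 35, "Female": 40},
    "LG": {"Male": 30, "Female": 15},
    "Sony": {"Male": 25, "Female": 13},
    "Panasonic": {"Male": 15, "Female": 7},
}
brands = list(counts)
genders = ["Male", "Female"]

# Marginal totals by gender and the overall sample size.
gender_totals = {g: sum(counts[b][g] for b in brands) for g in genders}
n = sum(gender_totals.values())

# Conditional percentage (Sony among men) versus joint percentage
# (female and Apple, relative to the whole sample).
pct_sony_given_male = 100 * counts["Sony"]["Male"] / gender_totals["Male"]
pct_female_and_apple = 100 * counts["Apple"]["Female"] / n

# Modes: the brand with the largest count among women and overall.
mode_women = max(brands, key=lambda b: counts[b]["Female"])
mode_sample = max(brands, key=lambda b: sum(counts[b].values()))

# Conditional distribution of Brand given Gender; these proportions are
# what a side-by-side or stacked bar chart would display.
brand_given_gender = {
    g: {b: counts[b][g] / gender_totals[g] for b in brands} for g in genders
}

print(round(pct_sony_given_male, 1), round(pct_female_and_apple, 2))
print(mode_women, mode_sample)
for g in genders:
    print(g, {b: round(p, 3) for b, p in brand_given_gender[g].items()})
```

If Brand and Gender were independent, the two conditional distributions printed at the end would be (approximately) identical.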
Exercise 3

Twelve French families were asked questions about their TV subscription in 2011 and data
are reported in the following table:

Number of family members   Type of subscription   Monthly expense for subscription
2-4                        P                      55
1                          B                      29.5
2-4                        B                      45
1                          P                      65
>4                         P                      70
1                          B                      29
1                          M                      38
>4                         P                      42
1                          B                      32.5
>4                         B                      40
2-4                        M                      62
2-4                        P                      65

1. Compute the mean of the expenses for families with a basic subscription, and the mean
for families with premium subscription.
2. Discuss the association between number of family members and type of subscription
through an appropriate graph.

SOL
1. mean(basic) = (29.5 + 45 + 29 + 32.5 + 40)/5 = 35.2
mean(premium) = (55 + 65 + 70 + 42 + 65)/5 = 59.4

2. To discuss the association, build the stacked (component) bar chart using the following
conditional frequencies:

Family members \ Subscription   B              M             P
1                               3/5 = 0.6      1/5 = 0.2     1/5 = 0.2
2-4                             1/4 = 0.25     1/4 = 0.25    2/4 = 0.5
>4                              1/3 = 0.3333   0             2/3 = 0.6667

Since the conditional frequencies Fr(subscription | family members) differ across family sizes,
we can conclude that the two variables are associated.
(Chart legend: dark grey = B, medium grey = M, light grey = P)
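
The group means and the conditional frequencies can be reproduced from the raw records. The sketch below assumes, as the wording of question 1 suggests, that B stands for the basic and P for the premium subscription; the variable names are illustrative.

```python
from collections import Counter, defaultdict
from statistics import mean

# (family size, subscription type, monthly expense), one tuple per family.
families = [
    ("2-4", "P", 55.0), ("1", "B", 29.5), ("2-4", "B", 45.0),
    ("1", "P", 65.0), (">4", "P", 70.0), ("1", "B", 29.0),
    ("1", "M", 38.0), (">4", "P", 42.0), ("1", "B", 32.5),
    (">4", "B", 40.0), ("2-4", "M", 62.0), ("2-4", "P", 65.0),
]

# Group the expenses by subscription type and average each group.
expenses = defaultdict(list)
for _, subscription, expense in families:
    expenses[subscription].append(expense)
mean_basic = mean(expenses["B"])
mean_premium = mean(expenses["P"])

# Conditional relative frequencies of subscription given family size.
size_totals = Counter(size for size, _, _ in families)
pair_counts = Counter((size, sub) for size, sub, _ in families)
conditional = {
    size: {sub: pair_counts[(size, sub)] / size_totals[size] for sub in "BMP"}
    for size in size_totals
}

print(round(mean_basic, 1), round(mean_premium, 1))
for size in ("1", "2-4", ">4"):
    print(size, {sub: round(p, 4) for sub, p in conditional[size].items()})
```

Each row of `conditional` sums to 1, so the rows can be drawn directly as the bars of a 100% stacked bar chart.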

Exercise 4

In a survey of 200 married couples, information on the number of children (X) and the yearly
income of the couple in thousands of Euros (Y) was collected. The resulting data are
summarized in the following two-way table:

Y/X 0 1 2
[0,30) 10 50 60
[30,60) 4 20 36
[60,90) 6 8 6

1. Determine the frequency distribution of the variable “Number of children” and provide
an appropriate graphical representation.
2. Compute the mean of the variable “Number of children” within each of the subpopulations
defined by the different values of the variable “Yearly income”.
3. Calculate the mean and variance for the two variables X and Y. Compare the variability
of the two variables by using an appropriate index.
4. BONUS: In the same survey, the variable “Yearly expense for goods” (Z), in thousands of
Euros, was also collected. If you are told that
$\sum_{i=1}^{200} (z_i - \bar{z})(y_i - \bar{y}) = 24300$
and $\bar{z} = 14$, calculate the covariance between “Yearly expense for goods” and “Yearly
income”.
SOL

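1. The frequency distribution of X is given by the column totals of the table:

X                    0      1      2      Total
Frequency            20     78     102    200
Relative frequency   0.10   0.39   0.51   1

A bar chart of these frequencies is an appropriate representation for a discrete variable.

2. The conditional means of X within each income class are:
mean(X | Y in [0,30)) = (0*10 + 1*50 + 2*60)/120 = 1.42
mean(X | Y in [30,60)) = (0*4 + 1*20 + 2*36)/60 = 1.53
mean(X | Y in [60,90)) = (0*6 + 1*8 + 2*6)/20 = 1

3. Using the class midpoints 15, 45 and 75 for Y, mean(X) = 282/200 = 1.41 and
mean(Y) = 6000/200 = 30. With divisor n - 1 = 199, as in part 4, the variances are
88.38/199 = 0.444 for X and 81000/199 = 407.04 for Y, so CV(X) = 0.47 and CV(Y) = 0.67.

The variable Y fluctuates more than the variable X.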


4. $\mathrm{Cov}(Y, Z) = \frac{24300}{199} \approx 122.11$.
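
The quantities asked for in this exercise can be computed from the two-way table as sketched below. Two assumptions are made and should be adjusted if the course uses other conventions: each income class is represented by its midpoint (15, 45, 75), and variances and the covariance use the divisor n - 1, as in part 4 of the solution. The function name `grouped_stats` is illustrative.

```python
x_values = [0, 1, 2]           # number of children
y_midpoints = [15, 45, 75]     # assumed midpoints of [0,30), [30,60), [60,90)

# Rows are income classes, columns are numbers of children.
table = [
    [10, 50, 60],
    [4, 20, 36],
    [6, 8, 6],
]

n = sum(sum(row) for row in table)

# Marginal frequency distributions of X (column totals) and Y (row totals).
x_freq = [sum(row[j] for row in table) for j in range(len(x_values))]
y_freq = [sum(row) for row in table]

# Mean number of children within each income class.
cond_means = [
    sum(x * f for x, f in zip(x_values, row)) / sum(row) for row in table
]


def grouped_stats(values, freqs, ddof=1):
    """Mean, variance and coefficient of variation from a frequency table."""
    total = sum(freqs)
    m = sum(v * f for v, f in zip(values, freqs)) / total
    var = sum(f * (v - m) ** 2 for v, f in zip(values, freqs)) / (total - ddof)
    return m, var, var ** 0.5 / m


mean_x, var_x, cv_x = grouped_stats(x_values, x_freq)
mean_y, var_y, cv_y = grouped_stats(y_midpoints, y_freq)

# Part 4: cross-product sum as used in the solution, divided by n - 1.
cov_yz = 24300 / (n - 1)

print(x_freq, [round(m, 2) for m in cond_means])
print(round(mean_x, 2), round(var_x, 3), round(cv_x, 2))
print(round(mean_y, 2), round(var_y, 2), round(cv_y, 2))
print(round(cov_yz, 2))
```

Passing `ddof=0` to `grouped_stats` gives the population versions of the variances; the comparison of the two coefficients of variation leads to the same conclusion under either convention.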
Exercise 5
Coffee shop customers were randomly surveyed and asked to select a category that described
the cost of their recent purchase. The results were as follows:

Find the sample mean and standard deviation of these costs.

SOL
The frequencies $f_i$ are the number of customers in each cost category, and $m_i$ denotes the
midpoint of category $i$. The computations for the mean and the standard deviation are set out in
the following table.

The sample mean is estimated by $\bar{x} = \frac{\sum_{i=1}^{k} f_i m_i}{n} = \frac{112}{20} = 5.6$.
Since we are working with sample data, the sample variance is
$s^2 = \frac{\sum_{i=1}^{k} f_i (m_i - \bar{x})^2}{n-1} = \frac{120.8}{19} \approx 6.3579$.
Thus $s = \sqrt{6.3579} \approx 2.52$.
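
The grouped-data formulas can be illustrated in code. Because the survey table is not reproduced in this sheet, the class midpoints and frequencies below are illustrative assumptions, chosen only so that their totals agree with the values 20, 112 and 120.8 quoted in the solution; they should be replaced by the actual table when it is available.

```python
# Hypothetical grouped data (NOT taken from the sheet): five cost classes
# of width 2, represented by their midpoints, with frequencies chosen so
# that n = 20, sum(f * m) = 112 and sum(f * (m - mean)^2) = 120.8.
midpoints = [1, 3, 5, 7, 9]
freqs = [2, 3, 6, 5, 4]

n = sum(freqs)

# Grouped-data mean: frequency-weighted average of the class midpoints.
sum_fm = sum(f * m for f, m in zip(freqs, midpoints))
mean = sum_fm / n

# Sample variance: weighted squared deviations divided by n - 1.
sum_sq = sum(f * (m - mean) ** 2 for f, m in zip(freqs, midpoints))
variance = sum_sq / (n - 1)
std_dev = variance ** 0.5

print(n, sum_fm, round(mean, 2))
print(round(sum_sq, 1), round(variance, 4), round(std_dev, 2))
```

Because every observation in a class is replaced by the class midpoint, the resulting mean and standard deviation are approximations of the values that the ungrouped costs would give.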
