
MATHEMATICS

2nd Semester

____________________________________________________________________________

IUT de Saint-Etienne – Département TC –J.F.Ferraris – Math – S2 – Stat2Var – TExCorr – Rev2020

Exercise 1. (Tutorial for lesson page 5)

Is people’s behaviour in relation to tobacco related to their gender, at a 10% significance level?

Here are the results of a survey made on a sample of 51 men and 66 women:

G : variable "gender" ; B : variable "behaviour in relation to tobacco"

Gm : men        Bn : never smoked
Gw : women      Bs : smoke
                Bss : stopped smoking

Observed frequencies:

         Gm     Gw     total
Bn       12     23     35
Bs       31     26     57
Bss      8      17     25
total    51     66     117

Theoretical frequencies according to H0:

         Gm     Gw     total
Bn       15.26  19.74  35
Bs       24.85  32.15  57
Bss      10.90  14.10  25
total    51     66     117

Detailed Chi-squares and total:

         Gm       Gw
Bn       0.69507  0.53710
Bs       1.52417  1.17777
Bss      0.77038  0.59529
total             5.300

1) Place the subtotals and the general total in the first table, and in the second one, identically.

2) Fill the second table (6 central theoretical values) following proportional calculations.

3) Table #3: calculate the six Chi-square, then add them to get the value χ²calc.

4) Test writing:

Null hypothesis: H0 : Gender and tobacco behaviour are independent

Observed χ²

Value of the variable χ² between the observed and the theoretical samples: χ²calc = 5.3

Rejection area

Significance level: α = 10 %

Number of dof: (r-1)(k-1) = (3 – 1)(2 - 1) = 2

Value of the variable χ² limit until rejection : χ²lim = 4.61

Comparison and decision:

As χ²calc > χ²lim , H0 can be rejected, at a 10% significance level.

In other words, we can say, with less than a 10% risk of being wrong, that men and women behave differently with tobacco. However, we could not reject our null hypothesis at a 5% significance level: χ²lim is 5.99 in such conditions, and is not reached by χ²calc. Claiming dependence is therefore done with more than a 5% risk of being wrong.
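The steps of this test can also be checked with a short script. The following is only a sketch in Python: the function names (expected_counts, chi_square) are ours, not part of the lesson, and the last line relies on the fact that, for exactly 2 dof, P(χ² > x) = exp(-x/2), a shortcut that does not hold for other numbers of dof.

```python
import math

def expected_counts(observed):
    """Theoretical counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

def chi_square(observed):
    """Sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_counts(observed)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))

# Rows: Bn, Bs, Bss; columns: Gm, Gw (survey of Exercise 1)
tobacco = [[12, 23], [31, 26], [8, 17]]
chi2_calc = chi_square(tobacco)
dof = (len(tobacco) - 1) * (len(tobacco[0]) - 1)

# Assumption valid for 2 dof only: the chi-square survival function is exp(-x / 2)
p_value = math.exp(-chi2_calc / 2)
print(round(chi2_calc, 3), dof, round(p_value, 4))
```

The p-value found this way lies between 5% and 10%, in agreement with the comparison of χ²calc with 4.61 and 5.99.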

Exercise 2.

Two candidates A and B compete for a presidential election. In a little town, there are 500 voters. 100 are

retired people, 50 are unemployed and 350 are employees. There, the vote results are:

voters \ candidates    A      B      blank/abstention
unemployed             24     16     10
employees              122    148    80
retired                36     27     37

1) Decide, with a 1% significance level, whether people’s opinion depends on their social group or not.

* H0: "The type of vote is independent of the social group"

* Let’s perform the necessary calculations in order to get χ²calc:


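Observations:

              A      B      blank/abst.   total
unemployed    24     16     10            50
employees     122    148    80            350
retired       36     27     37            100
total         182    191    127           500

In theory (independence):

              A      B      blank/abst.   total
unemployed    18.2   19.1   12.7          50
employees     127.4  133.7  88.9          350
retired       36.4   38.2   25.4          100
total         182    191    127           500

Chi-square:

              A      B      blank/abst.
unemployed    1.848  0.503  0.574
employees     0.229  1.529  0.891
retired       0.004  3.284  5.298

Chi²calc = 14.16

* With (3 – 1)(3 – 1) = 4 dof and α = 1%, the χ² table gives Chi²lim = 13.28.

* Decision: as Chi²calc > Chi²lim, we can reject H0 (so: claim that people’s opinion depends on their social group) with a 1% chance of being wrong.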

Let’s redo the analysis, excluding blank votes and abstentions:

Observations:

              A      B      total
unemployed    24     16     40
employees     122    148    270
retired       36     27     63
total         182    191    373

In theory (independence):

              A      B      total
unemployed    19.52  20.48  40
employees     131.7  138.3  270
retired       30.74  32.26  63
total         182    191    373

Chi-square:

              A      B
unemployed    1.03   0.981
employees     0.72   0.687
retired       0.9    0.858

Chi²calc = 5.175

With 2 dof, Chi²lim is 4.61 at the 10% level and 5.99 at the 5% level. We can therefore assert that people’s opinion depends on their social group with a 10% chance of being wrong, but we could not assert it if we wanted to take only a 5% chance of being wrong.

Exercise 3.

The table shows attendance in two stores A and B: how many people made at least one purchase. These clients are sorted by age group (10 to 15 years old, and so on).

age \ store   A     B
10 - 15       46    24
15 - 20       29    35
20 - 40       14    17
> 40          12    18

1) Say, with a 5% significance level, whether the chosen store depends on the age of a client.

Observations (obs):

            A       B       total
10 to 15    46      24      70
15 to 20    29      35      64
20 to 40    14      17      31
40 +        12      18      30
total       101     94      195

Theoretical values (th):

            A       B       total
10 to 15    36.26   33.74   70
15 to 20    33.15   30.85   64
20 to 40    16.06   14.94   31
40 +        15.54   14.46   30
total       101     94      195

Chi-squares (χ²):

            A       B       total
10 to 15    2.6185  2.8135  5.4320
15 to 20    0.5192  0.5579  1.0771
20 to 40    0.2634  0.2830  0.5464
40 +        0.8058  0.8658  1.6716
total       4.2069  4.5202  8.727

* with 3 dof and a 5% level, the table gives χ²lim = 7.815.

* Thus, this limit value has been exceeded. With a 5 % significance level, we can reject the hypothesis that

the choice of the store and the age group are independent.

The age group "10 to 15 years old" contributes most to the total χ². People over 15 years old show much the same purchasing behaviour. On the contrary, the first age group shows a very different frequency distribution (first table), compared to other customers.

3) Give the meaning of the “5% significance level” on your first answer.

We assert the dependence between age and chosen store with a 5% chance of being wrong.

4) According to your Chi² table, can you be more accurate about the chance taken in this statement (your first

answer)?

To reach a 2% level, χ²calc would have to exceed 9.837, but our value (8.727) does not. So, the χ² table (form) doesn’t allow us to say more than "the risk is between 2% and 5%".


Exercise 4.

In a survey, 100 people were asked about their age and their attendance at theatres (cinema). We name X the

variable "age" and Y the variable "number of annual cinema shows". The survey result is the following table of

quotes (fr.: citations) :

Y \ X       [15 ; 25[   [25 ; 50[   ≥ 50
none        4           6           13
1 to 11     10          16          15
12 to 23    13          8           4
≥ 24        6           3           2

1) By a χ² independence test, with a 2% significance level, decide whether there’s a link or not between the

age and the level of attendance at the cinema.

Y \ X       [15 ; 25[              [25 ; 50[              50 and more            total
            obs   th     χ²        obs   th     χ²        obs   th     χ²        obs   th    χ²
none        4     7.59   1.698     6     7.59   0.333     13    7.82   3.431     23    23    5.462
1 to 11     10    13.53  0.921     16    13.53  0.451     15    13.94  0.081     41    41    1.453
12 to 23    13    8.25   2.735     8     8.25   0.008     4     8.5    2.382     25    25    5.125
≥ 24        6     3.63   1.547     3     3.63   0.109     2     3.74   0.81      11    11    2.466
total       33    33     6.901     33    33     0.901     34    34     6.704     100   100   14.51

With 6 dof and α = 2%, the χ² table gives Chi²lim = 15.03.

Our Chi²calc (14.51) doesn’t exceed it. So, at a 2% significance level, we can’t reject the idea that age and

level of attendance at the cinema are independent.

2) Using your form table, discuss the level of confidence you can assign to the assertion : “they are

dependent”.

Our Chi²calc (14.51) is located between the Chi²lim of levels 5% (12.59) and 2% (15.03). Thus, we can assume dependence with more than 95% confidence, but with less than 98% confidence.

3) Identify the most important partial Chi-2s and give the meaning of these high values.

The biggest partial Chi² has been obtained with the "50 years old and more" who never go to the cinema: the observed frequency (13) is much higher than the expected one (7.82).

The partial Chi² of the "50 years old and more" whose attendance is "between 12 and 23 times a year" is big too: the observed frequency (4) is much lower than the theoretical one (8.5).

The partial Chi² of the "15 to 25 years old" whose attendance is "between 12 and 23 times a year" is big too: the observed frequency (13) is much higher than the theoretical one (8.25).
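Ranking the partial Chi² by hand is tedious; a short script can do it. This is a minimal sketch in Python, with variable names of our own choosing, that recomputes each cell's contribution from the observed table and sorts them.

```python
# Observed counts of Exercise 4: rows = attendance classes, columns = age classes
observed = [[4, 6, 13], [10, 16, 15], [13, 8, 4], [6, 3, 2]]
row_labels = ["none", "1 to 11", "12 to 23", ">= 24"]
col_labels = ["[15 ; 25[", "[25 ; 50[", ">= 50"]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

contributions = []
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        exp = row_totals[i] * col_totals[j] / total   # theoretical count under H0
        contributions.append(((obs - exp) ** 2 / exp, row_labels[i], col_labels[j]))

contributions.sort(reverse=True)                      # biggest partial Chi-square first
chi2_calc = sum(value for value, _, _ in contributions)

for value, attendance, age in contributions[:3]:
    print(f"{attendance:>9} / {age:<10} partial chi2 = {value:.3f}")
print(f"chi2_calc = {chi2_calc:.2f}")
```

The three cells listed first are the ones discussed in the answer to question 3.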

Exercise 5.

Using the data series introduced in exercise 11, decide, by means of a Chi-square test, whether both variables are independent or not.

X \ Y    [0 ; 15[               [15 ; 25[              [25 ; 40[              total
         obs   th     χ²        obs   th     χ²        obs   th     χ²        obs   th    χ²
1        23    60.06  22.87     92    84.63  0.642     80    50.31  17.52     195   195   41.03
2        77    59.75  4.979     84    84.2   0.0005    33    50.05  5.809     194   194   10.79
3        42    27.72  7.356     35    39.06  0.422     13    23.22  4.498     90    90    12.28
4        12    6.468  4.731     6     9.114  1.064     3     5.418  1.079     21    21    6.875
total    154   154    39.93     217   217    2.128     129   129    28.91     500   500   70.97

With 6 dof and α = 1%, the χ² table gives Chi²lim = 16.8.

Our Chi²calc (70.97) is much bigger. There is more than a 99% chance of dependence between both variables.


Exercise 6. (Tutorial for lesson page 6)

Let’s have a close look at a company’s turnover evolution through time.

Year N N+1 N+2 N+3

tri1 tri2 tri3 tri4 tri1 tri2 tri3 tri4 tri1 tri2 tri3 tri4 tri1 tri2 tri3 tri4

(M€) 28 45 49 36 30 44 48 40 28 46 52 37 31 42 54 39

Though there are big seasonal variations, due to its particular activity, is it possible to find a global trend over several years?


(do it as a group job: divide the set of calculations with your neighbours and share your results)

1-4 2-5 3-6 … 13-16

X 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5 10.5 11.5 12.5 13.5 14.5

Y 39.5 40 39.75 39.5 40.5 40 40.5 41.5 40.75 41.5 40.5 41 41.5

calculations:

The values of X (on the graph) correspond to the number of trimesters since the beginning: 1st trimester of year N → x = 1 ; 2nd trimester of year N → x = 2 ; and so on. We deduce that the values of X to be entered in the table are 2.5, 3.5, 4.5, and so on: 1st value = mean of 1,2,3,4 = 2.5 ; 2nd value = mean of 2,3,4,5 = 3.5 ; and so on until the 13th value, which is the mean of 13,14,15,16, that is 14.5.

The values of Y calculated in the table above are the average turnovers of the four trimesters considered: 1st value of Y = mean of 28,45,49,36 = 39.5 ; 2nd value of Y = mean of 45,49,36,30 = 40 ; and so on.
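For those who prefer to check the whole table at once, here is a minimal sketch in Python of these moving averages over four trimesters. The variable names (turnover, window, xs, ys) are ours; the data are those of the table.

```python
# Quarterly turnover (M€) of Exercise 6, ranks x = 1..16
turnover = [28, 45, 49, 36, 30, 44, 48, 40, 28, 46, 52, 37, 31, 42, 54, 39]
window = 4  # one full year of trimesters, so that seasonal effects average out

xs, ys = [], []
for start in range(len(turnover) - window + 1):
    ranks = range(start + 1, start + window + 1)   # 1,2,3,4 for the first window
    values = turnover[start:start + window]
    xs.append(sum(ranks) / window)                 # centre of the window
    ys.append(sum(values) / window)                # mean turnover over the window

for x, y in zip(xs, ys):
    print(x, y)
```

Each printed pair is one point of the smoothed series; 16 values and a window of 4 give 13 points.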

Exercise 7.

Let’s take back one of the examples introduced page 3 (lessons doc): effect of the amount of fertilizer on the harvested production.

plot #    fertilizer X (kg.ha-1)    harvest Y (q.ha-1)
1         150                       46
2         80                        37
3         120                       46
4         220                       51
5         100                       43

Half-clouds have to be defined: since there are 5 pairs of results, let’s choose a cut in 3 points on the left and 2

points on the right (the contrary would have been allowed too), separating them by the X values (always):


1st half-cloud: (80, 37), (100, 43), (120, 46); mean point: G1(100, 42)

2nd half-cloud: (150, 46), (220, 51); mean point: G2(185, 48.5)

2) Determine the expression of the Mayer’s line (G1G2).

slope: a = (48.5 − 42)/(185 − 100) = 6.5/85 ≈ 0.07647

y = 0.07647 x + b can be written with the coordinates of G1 (for instance): 42 = 0.07647×100 + b, which gives us b = 34.35.

Expression of the Mayer’s line: y’ = 0.07647 x + 34.35

3) On a graph, plot the initial table and draw this line.
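The construction of the Mayer's line can be summed up in a few lines of code. This is a sketch in Python under our own conventions: the function mayer_line and its parameter n_left (size of the left half-cloud) are hypothetical names, not taken from the lesson.

```python
def mayer_line(points, n_left):
    """Line through the mean points G1 and G2 of the two half-clouds.

    The points are sorted by x; the first n_left of them form the left half-cloud.
    """
    pts = sorted(points)
    halves = (pts[:n_left], pts[n_left:])
    (x1, y1), (x2, y2) = [
        (sum(x for x, _ in half) / len(half), sum(y for _, y in half) / len(half))
        for half in halves
    ]
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1        # the line passes through G1 (and G2)
    return slope, intercept

# (fertilizer, harvest) pairs of the five plots
fertilizer_data = [(150, 46), (80, 37), (120, 46), (220, 51), (100, 43)]
a, b = mayer_line(fertilizer_data, n_left=3)
print(round(a, 5), round(b, 2))
```

Calling mayer_line(fertilizer_data, n_left=2) gives the other allowed cut (2 points on the left, 3 on the right), which leads to a slightly different line.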

Exercise 8.

Determine the expression of the Mayer’s line, taking back the case given in exercise 6.

The 16 values are split into 8 for N and N+1, and 8 for N+2 and N+3.

xG1 = (1 + 2 + ... + 8)/8 = 4.5 ; xG2 = (9 + 10 + ... + 16)/8 = 12.5

yG1 = (28 + 45 + ... + 40)/8 = 40 ; yG2 = (28 + 46 + ... + 39)/8 = 41.125

slope: a = (yG2 − yG1)/(xG2 − xG1) = 1.125/8 = 0.140625

With G1: 40 = 0.140625×4.5 + b, which gives us b = 39.367.

Expression of the Mayer’s line: y’ = 0.140625 x + 39.367

Exercise 9.

Calculate or display on your calculator: the means and standard deviations; the covariance.

1) Taking the data of exercise 7 (fertilizer/harvest)

x̄ = 134 kg.ha-1 and ȳ = 44.6 q.ha-1 ; σ(X) = 48.826 kg.ha-1 and σ(Y) = 4.5869 q.ha-1 (Stat mode).

Cov(X, Y) = (Σ xi yi)/n − x̄ ȳ = 30900/5 − 134 × 44.6 = 203.6

n 5

2) Taking the data of exercise 4 (age/# of cinema shows) – choose 60 as average age for the class 50 and more;

choose 36 as average number of shows for the class 24 and more.

x̄ = 39.375 years and ȳ = 10.795 shows ; σ(X) = 16.422 years and σ(Y) = 10.833 shows (Stat mode).

Cov(X, Y) = (Σ xi yi)/n − x̄ ȳ = 36890/100 − 39.375 × 10.795 = −56.15
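The covariance formula used in this exercise (mean of the products minus product of the means) is easy to reproduce. A minimal sketch in Python on the fertilizer/harvest data follows; the function name covariance is ours, and statistics.pstdev is used because it divides by n, as the calculator's σ does.

```python
import statistics

def covariance(xs, ys):
    """Population covariance: mean of the products minus product of the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y

fertilizer = [150, 80, 120, 220, 100]   # X, kg/ha
harvest = [46, 37, 46, 51, 43]          # Y, q/ha

cov_xy = covariance(fertilizer, harvest)
sigma_x = statistics.pstdev(fertilizer)  # population standard deviation (divides by n)
sigma_y = statistics.pstdev(harvest)
print(round(cov_xy, 1), round(sigma_x, 3), round(sigma_y, 4))
```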


Exercise 10. (Tutorial for lesson page 9)

Let’s consider the following time series: a company’s annual expenses in advertising.

X : year 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018

Y : expense (k€) 41 60 55 66 87 61 90 95 82 120 125 118

The corresponding scatter plot is represented, the years being numbered from x = 1 for 2007.

Determine the expression of the Y on X fitting line, following the least square method; then, draw it.

(D) : y’ = 7.0629 x + 37.42
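The coefficients given by the calculator can be recovered from the usual formulas a = Cov(X, Y)/V(X) and b = ȳ − a x̄. Here is a sketch in Python, with a helper name (least_squares_line) of our own, applied to the advertising expenses with x = 1 for 2007.

```python
def least_squares_line(xs, ys):
    """Slope and intercept of the Y on X least squares line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov_xy = sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y
    var_x = sum(x * x for x in xs) / n - mean_x ** 2
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x

ranks = list(range(1, 13))   # 2007 -> 1, ..., 2018 -> 12
expenses = [41, 60, 55, 66, 87, 61, 90, 95, 82, 120, 125, 118]   # k€

a, b = least_squares_line(ranks, expenses)
estimate_2020 = a * 14 + b   # 2020 corresponds to x = 14
print(round(a, 4), round(b, 2), round(estimate_2020, 1))
```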

Exercise 11.

500 people, having passed their driving license exam, are sorted in the table below.

They are distributed with respect to the number X of times they took the exam before passing it and to the

number Y of hours of driving lessons before their first attempt.

X \ Y    [0 ; 15[    [15 ; 25[    [25 ; 40[
1        23          92           80
2        77          84           33
3        42          35           13
4        12          6            3

1) Define a margin frequency. Then, give an example from the table.

A margin frequency is the total number of individuals associated with a value of one of the variables.

e.g.: 195 (margin frequency) people passed their exam on their first attempt (value: X = 1).

2) Describe, shortly, the way to enter the data set in your calculator.

We usually enter the frequencies in List3, so 12 values here; List1 and List2 will be used for entering the corresponding X and Y values.

3) Calculate the covariance of the pair (X, Y) and give a concrete comment about this value.

Cov(X, Y) = 16815/500 − 1.874 × 19.375 = −2.679, negative. Globally, the more hours of driving lessons one takes, the fewer attempts one needs to pass the exam.

4) Among those who took between 15 and 25 hours of driving lessons, what is the rate of those who passed

their exam on the third attempt? 35/217 = 16.13 %

5) Among those who passed their exam on the third attempt, what is the rate of those who took between 15

and 25 hours of driving lessons? 35/90 = 38.89 %
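The calculations of questions 3 to 5 can be reproduced from the contingency table. The sketch below (Python, names of our own) assumes, as the value ȳ = 19.375 suggests, that each class of Y is represented by its centre: 7.5, 20 and 32.5 hours.

```python
x_values = [1, 2, 3, 4]              # number of attempts
y_centres = [7.5, 20, 32.5]          # assumed class centres for [0;15[, [15;25[, [25;40[
counts = [[23, 92, 80],
          [77, 84, 33],
          [42, 35, 13],
          [12, 6, 3]]

n = sum(sum(row) for row in counts)
row_totals = [sum(row) for row in counts]            # margin frequencies of X
col_totals = [sum(col) for col in zip(*counts)]      # margin frequencies of Y

mean_x = sum(x * t for x, t in zip(x_values, row_totals)) / n
mean_y = sum(y * t for y, t in zip(y_centres, col_totals)) / n
sum_nxy = sum(freq * x * y
              for x, row in zip(x_values, counts)
              for y, freq in zip(y_centres, row))
cov_xy = sum_nxy / n - mean_x * mean_y

# Conditional rates of questions 4 and 5
third_attempt_among_15_25 = counts[2][1] / col_totals[1]
hours_15_25_among_third = counts[2][1] / row_totals[2]
print(round(cov_xy, 3), round(third_attempt_among_15_25, 4), round(hours_15_25_among_third, 4))
```

Both rates share the same numerator (35) but not the same reference group, hence the two different answers.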


Exercise 12.

A sales agent wishes to analyse his (or her) activity and efficiency. For each appointment with a prospect, the length of the product presentation (X, in minutes) and the sold quantity (Y) have been noted. The twelve values inside the table show the number of appointments that correspond to each pair (X, Y).

1) Give the meaning of the frequency "8" found inside the table.

During each of 8 appointments with prospects, the sales agent made a 10 to 20 min-long presentation and

then sold 2 units.

2) Calculate, manually, the average time spent per appointment.

Margin frequencies of the three values of X: 7, 19 and 21. The corresponding lengths are 5, 15 and 25 (in

minutes). Total number of appointments: 47.

The average time is then (5×7 + 19×15 + 21×25)/47 = 17.98 minutes per appointment (about 18 minutes).

3) Give the covariance of the pair (X, Y).

Cov(X, Y) = 1595/47 − 17.9787 × 1.80851 = 1.422

Exercise 13.

The following table indicates the sales price (€) of an equipment and the number of sold items, for 4 years.

year rank 1 2 3 4

sales price (€) X 300 210 270 375

# of sold items Y 198 240 222 160

1) Build the scatter plot with an orthogonal frame. The axes intersection must be the point (210, 160);

scales: 1 cm for €15 on the abscissas axis, 1 cm for 10 items on the ordinates axis.

G(288.75 ; 205)

3) a. Determine the expression of the Y on X fitting line, following the least square method.

The coefficients will be expressed with 6 significant figures.

y’ = -0.498274 x + 348.876

b. Draw this regression line on the graph.

4) Which year saw the highest turnover? For which amount?

The turnover is X×Y. Its four values are: 59400, 50400, 59940 and 60000. The highest was in year # 4, for €60,000.


going further:

5) Now, we assume that, each year, the number of sold items y and the sales price x are related this way:

y = – 0.498 x + 349. We denote S(x) the turnover achieved by selling y items, €x each.

a. Express S(x) with respect to x.

S(x) = xy = -0.498 x² + 349 x

b. Find the variations of the function S defined in [210 ; 375].

S’(x) = -0.996 x + 349 > 0 iff x < 350.4. S is increasing in [210 ; 350.4] and decreasing in [350.4 ; 375].

c. Deduce the sales price we would have to set for a fifth year if we want a maximum turnover. How many

items will be sold (round to one unit)? For what turnover?

We have to set the sales price at €350.4. number of sold items: y = – 0.498×350.4 + 349 = 174.5.

Considering 174 items, x = €350.4/unit and turnover = €60969.6;

considering 175 items, x = €350.4/unit and turnover = €61320.
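The variations of S can be double-checked numerically. This is a minimal sketch in Python, assuming the rounded model y = −0.498 x + 349 of question 5; the function names are ours.

```python
def sold_items(price):
    """Modelled number of sold items for a given sales price (rounded model of question 5)."""
    return -0.498 * price + 349

def turnover(price):
    """S(x) = x * y, in euros."""
    return price * sold_items(price)

# S'(x) = -0.996 x + 349 vanishes at x = 349 / 0.996
best_price = 349 / 0.996
print(round(best_price, 1), round(sold_items(best_price), 1))

# S increases before best_price and decreases after it
print(turnover(210) < turnover(300) < turnover(best_price) > turnover(375))
```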

Exercise 14.

A survey compares people's expense in high-tech equipment with their income. Each column of the table T below represents, in a given French region, the average monthly income of people (X) and the average monthly expense (Y) in high-tech equipment.

region           A      B      C      D      E      F
income X (€)     1550   1620   1770   1850   1930   2000
expense Y (€)    57     61     66     73     76     82

1) Calculate the covariance and then the linear correlation coefficient of the pair (X, Y).

Give an interpretation of both parameters.

Cov(X, Y) = 749720/6 − 1786.66667 × 69.1666667 ≈ 1375.55556, positive, showing a global upward trend of the expense, as the income increases.

r ≈ 1375.55556/(160.2775 × 8.66827) ≈ 0.9901, very close to 1, hence an excellent linear correlation between X and Y.

2) a. Give, by means of your calculator, the expression of the Y on X regression line.

y’ = 0.05355 x – 26.50

b. Obtain the expression of the Mayer's line of the series, from the table T.

Let's part the table into two groups: {A, B, C} and {D, E, F} (indeed, the values of X have already been

sorted in an ascending order). The coordinates of both mean points are:

G1(1646.6667 ; 61.333333) and G2(1926.66667 ; 77)

The Mayer's fitting line, (G1G2), has a typical expression y’ = ax + b, where

a = (yG2 − yG1)/(xG2 − xG1) ≈ 0.05595 and b = yG1 − a × xG1 ≈ −30.80 ; (DM) : y′ = 0.05595 x − 30.80

c. Both lines slightly differ. Find the income for which they both give the same expense. What makes this

common point special, inside the point cloud?

Let's act as if we didn't already know this common point.

We can seek it by equating both expressions: 0.05595 x – 30.80 = 0.05355 x – 26.50.

That gives: 0.0024 x = 4.3 and then x = 1791.67. We can deduce the value of y: 69.44.

Both lines give an estimated average expense of € 69.44, for an average income of € 1791.67.

This common point is in fact the mean point G of the cloud: the actual average value of X in the table is 1786.67, and the average value of Y is 69.17 (the little differences are due to the rounded coefficients used four lines above).

This particularity is general, as explained in the lessons of this chapter: a least square fitting line, as well as a Mayer's fitting line, meets Mayer's criterion, which is equivalent to "the line passes through G"!
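The fact that both lines pass through G can be verified without any rounding. Here is a sketch in Python (variable names are ours) that computes r, the least squares line and the Mayer's line from the table T, then evaluates both lines at x̄. It assumes two half-clouds of equal size, as in the answer above.

```python
import statistics

incomes = [1550, 1620, 1770, 1850, 1930, 2000]   # X, already sorted
expenses = [57, 61, 66, 73, 76, 82]              # Y

n = len(incomes)
mean_x, mean_y = sum(incomes) / n, sum(expenses) / n
cov_xy = sum(x * y for x, y in zip(incomes, expenses)) / n - mean_x * mean_y
std_x, std_y = statistics.pstdev(incomes), statistics.pstdev(expenses)
r = cov_xy / (std_x * std_y)

# Least squares line
ls_slope = cov_xy / std_x ** 2
ls_intercept = mean_y - ls_slope * mean_x

# Mayer's line through the mean points of {A, B, C} and {D, E, F}
half = n // 2
g1 = (sum(incomes[:half]) / half, sum(expenses[:half]) / half)
g2 = (sum(incomes[half:]) / half, sum(expenses[half:]) / half)
mayer_slope = (g2[1] - g1[1]) / (g2[0] - g1[0])
mayer_intercept = g1[1] - mayer_slope * g1[0]

# Both lines, evaluated at the mean income, give back the mean expense
print(round(r, 4))
print(ls_slope * mean_x + ls_intercept, mayer_slope * mean_x + mayer_intercept, mean_y)
```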


Exercise 15. (Tutorial for lesson page 12)

Data about the fuel consumption of a motorcycle have been

collected (consumption: Y, in L/100km; speed: X, in km/h):

X 10 20 30 40 50 60 70 80 90

Y 15.2 11.6 9.3 7.8 7 6.6 6.9 8 9.6

The scatter plot, on the right, clearly shows us that a linear

regression would be inappropriate to describe the evolution of the

consumption with respect to the speed. Thus, we will propose a

variable change.

1) Let’s define the variable T by: T = (X – 60)².

Complete the following table:

T 2500 1600 900 400 100 0 100 400 900

Y 15.2 11.6 9.3 7.8 7 6.6 6.9 8 9.6

2) Perform a linear regression of Y on T.

Cov(T, Y) = 81280/9 – 766.66667×9.111111 = 2045.926 ; r = 2045.926/780.3133/2.62782 = 0.997759

r is very close to 1, a linear fitting is appropriate, between T and Y.

Least square regression line: y’ = 0.00336 t + 6.535

3) Thus, deduce the expression of the regression curve, for the initial scatter plot.

Regression curve of the pair (X, Y) : y’ = 0.00336 (x – 60)² + 6.535
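The whole procedure (variable change, linear fit on T, return to X) fits in a few lines. This is a minimal sketch in Python; linear_fit and consumption_model are names of our own, and the fit uses the same formulas as the calculator (covariance divided by variance).

```python
import math

def linear_fit(ts, ys):
    """Least squares line of ys on ts, plus the linear correlation coefficient."""
    n = len(ts)
    mean_t, mean_y = sum(ts) / n, sum(ys) / n
    cov = sum(t * y for t, y in zip(ts, ys)) / n - mean_t * mean_y
    var_t = sum(t * t for t in ts) / n - mean_t ** 2
    var_y = sum(y * y for y in ys) / n - mean_y ** 2
    slope = cov / var_t
    return slope, mean_y - slope * mean_t, cov / math.sqrt(var_t * var_y)

speeds = [10, 20, 30, 40, 50, 60, 70, 80, 90]                  # X, km/h
consumption = [15.2, 11.6, 9.3, 7.8, 7, 6.6, 6.9, 8, 9.6]      # Y, L/100km

t_values = [(x - 60) ** 2 for x in speeds]                     # variable change T = (X - 60)^2
a, b, r = linear_fit(t_values, consumption)

def consumption_model(speed):
    """Regression curve of Y on X obtained by substituting t = (x - 60)^2."""
    return a * (speed - 60) ** 2 + b

print(round(a, 5), round(b, 3), round(r, 4), round(consumption_model(100), 2))
```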

Exercise 16.

A company took note of its profits Y with respect to X, the produced and sold quantity:

X (tons) 2 3 5 7 11

Y (k€) 38 55 72 69 24

T -16 -9 -1 -1 -25

1) Thanks to your calculator, give the linear correlation coefficient between X and Y. Comment.

Cov(X, Y) = 1348/5 – 5.6×51.6 = -19.36 ; r = -19.36/3.2/18.315 = -0.3303

This is far from -1: the linear correlation is very bad between X and Y.

2) Let’s set the variable T = -(X - 6)².

a. Complete the table.

b. Calculate Cov(T, Y) and then the linear correlation coefficient between both variables.

Cov(T, Y) = -1844/5 - (-10.4)×51.6 = 167.84 ; r = 167.84/9.2/18.315 = 0.9961

c. Is a linear fitting of Y on T appropriate?

r is very close to 1, a linear fitting is appropriate, between T and Y.

d. Determine the expression of the Y on T fitting line, following the least square method.

y’ = 1.983 t + 72.22

e. Deduce an expression of the regression of Y on X.

y’ = -1.983(x - 6)² + 72.22


Exercise 17. quadratic fitting

A market study was conducted on a new type of product. The table below gives, for several proposed sales

price, the number of people willing to pay that price.

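unit price X (€)      2    3    4    5    6    7
number of people Y    66   47   34   25   18   14

Calculation table (T = X(X - 20) ; Y’ = number of people given by the model ; CA = XY and CA’ = XY’ are the turnovers):

X    Y    T     Y’      CA    CA’
2    66   -36   62.97   132   125.9
3    47   -51   48.88   141   146.6
4    34   -64   36.66   136   146.7
5    25   -75   26.33   125   131.7
6    18   -84   17.88   108   107.3
7    14   -91   11.3    98    79.13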

1) Calculate the covariance of the variables X and Y, then comment its sign.

Cov(X, Y) = 740/6 − 4.5 × 34 = −29.67, negative: Y values tend to increase as X decreases.

2) We set T = X(X - 20)

a. Calculate the linear correlation coefficient between both variables T and Y.

Cov(T, Y) = −11610/6 − (−66.8333) × 34 = 337.33 ; r = 337.33/(18.95096 × 17.93507) = 0.992487

b. Comment its value.

This coefficient (0.992487) is an excellent one.

c. Determine the expression of the Y on T fitting line, following the least square method.

y’ = 0.9393 t + 96.78

d. Deduce an expanded expression of the regression of Y with respect to X.

y’ = 0.9393 (x² - 20x) + 96.78 = 0.9393 x² - 18.79 x + 96.78

3) Here we examine the expected turnover (unit selling price × number of sales), if the numbers of citations

obtained in the survey are considered to be the numbers of units sold.

a. Calculate the turnovers that can be extracted from the initial table.

See the calculation table above (turnover = CA = XY)

b. Calculate, for the same values of X, the turnovers CA' that can be got thanks to the formula obtained in

question 2)d.

See the calculation table above (turnover = CA’ = XY’)

c. What unit selling price should we fix, so that the best turnover would be reached?

According to the model, it seems that CA’ would be maximum when X is between €3 and €4.

Let’s be a little more accurate:

X      3      3.1    3.2    3.3    3.4    3.5    3.6    3.7    3.8    3.9    4
CA'    146.6  147.4  148.1  148.5  148.8  148.8  148.7  148.4  148    147.4  146.6

We will recommend a selling price of about € 3.5 for an optimized turnover.
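The table of CA' values can be generated automatically. Here is a sketch in Python, using the rounded coefficients 0.9393 and 96.78 of question 2)c; the function names and the 10-cent step are our own choices.

```python
def fitted_sales(price):
    """Modelled number of buyers: y' = 0.9393 t + 96.78 with t = x(x - 20)."""
    return 0.9393 * price * (price - 20) + 96.78

def modelled_turnover(price):
    """CA' = x * y'."""
    return price * fitted_sales(price)

# Candidate prices from €3.0 to €4.0 by steps of 10 cents
prices = [round(3 + step / 10, 1) for step in range(11)]
best_price = max(prices, key=modelled_turnover)

for price in prices:
    print(price, round(modelled_turnover(price), 1))
print("best price on this grid:", best_price)
```

A finer grid (or the derivative of CA') would refine the optimum, but the curve is very flat around its maximum, so the recommendation hardly changes.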

A perfumery, on analysing its turnover, connects the sales quantities (Y) to various perfume brands and

models prices (X). The results are gathered in the following table:

X, bottle’s price (€) 15 25 30 40 45 60 75 90

Y, # of sold bottles 202 117 107 82 78 60 55 48

Answer the questions beginning with "calculate" by using your calculator’s results.


calculator’s results:

Cov(X, Y) = 28000/8 − 47.5 × 93.625 = −947.2, negative. Y is globally a decreasing function of X.

b. Calculate the linear correlation coefficient of X and Y; comment its value.

rXY = −947.2/(24.109 × 46.843) = −0.8387, not very close to −1. The linear correlation between X and Y is not excellent (the point cloud may be noisy or following a curve).

2) In order to have a more precise idea of how X and Y are related, we set the variable change: T = 850/X

a. After having calculated the list of values of T, in a third list (calculator), justify that the linear correlation

is excellent between T and Y.

The values of T have been shown above. The calculations, relative to the pair (T, Y), lead to r = 0.9971, very close to 1. Their linear correlation is excellent.

b. Give the expression of the Y on T regression line, according to the least square method.

y’ = 3.215 t + 15.62

c. What is the least square criterion?

The sum of the squared residuals must be minimum (which makes the fitting line unique).

d. Deduce from question 2)b a modelled expression of Y with respect to X.

y′ = at + b = 850a/x + b = 2733/x + 15.62

e. According to this model, how many bottles priced at €150 would the perfumery expect to sell?

If x = 150, the estimate of y is: 2733/150 + 15.62 ≈ 33.84 ≈ 34 : it can expect to sell 34 bottles.
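The same variable-change technique can be sketched in code for this hyperbolic model. The Python snippet below uses names of our own (fit_line, expected_sales) and the change T = 850/X set in question 2.

```python
def fit_line(ts, ys):
    """Slope and intercept of the least squares line of ys on ts."""
    n = len(ts)
    mean_t, mean_y = sum(ts) / n, sum(ys) / n
    cov = sum(t * y for t, y in zip(ts, ys)) / n - mean_t * mean_y
    var_t = sum(t * t for t in ts) / n - mean_t ** 2
    slope = cov / var_t
    return slope, mean_y - slope * mean_t

prices = [15, 25, 30, 40, 45, 60, 75, 90]        # X, bottle's price (€)
sold = [202, 117, 107, 82, 78, 60, 55, 48]       # Y, number of sold bottles

t_values = [850 / x for x in prices]             # variable change T = 850 / X
a, b = fit_line(t_values, sold)

def expected_sales(price):
    """Modelled number of sold bottles: y' = a * 850 / x + b."""
    return a * 850 / price + b

print(round(a, 3), round(b, 2), round(expected_sales(150), 2))
```

Note that €150 lies well outside the observed price range (15 to 90), so this estimate is an extrapolation of the model.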

Calculate the point estimates, in the given situations.

1) Taking back exercise 10, give an estimate of the expense in 2020.

y’ = 7.0629 x + 37.42 ; x0 = 14 ; hence y’0 = k€ 136.3

2) Taking back exercise 7, give an estimate of the quantity of fertilizer that would offer a harvest of 60 q/ha.

y’ = 0.07647 x + 34.35 ; y’0 = 60 q/ha ; hence x’0 = 335.4 kg/ha

3) Taking back exercise 15, give an estimate of the fuel consumption when the speed is 100 km/h.

y’ = 0.00336 (x – 60)² + 6.535 ; x0 = 100 ; hence y’0 = 11.91 L/100km

Let’s take back exercise 10. We want to estimate the expense, for the year 2020, by a 95% confidence interval.


1) a. Get the values of Y’, from the values of X and the expression of the fitting

line;

b. Get the values of Z, by dividing Y by Y’;

c. Then, give the mean and standard deviation of Z.

z̄ = 1.000971 ; σZ = 0.125286

2) Give the point estimate of the expense in 2020.

see exercise 18-1: y’0 = k€ 136.3

3) Give the coefficient u corresponding to the confidence level.

u = 1.96

4) Then, give the confidence interval.

[136.3(1.000971 – 1.96×0.125286) ; 136.3(1.000971 + 1.96×0.125286)] = [103.0 ; 169.9]

With the data of exercise 7, estimate by a 99% confidence interval the harvest obtained with 300 kg/ha of fertilizer.

1) a. Get the values of Y’, from the values of X and the expression of the fitting line;

b. Get the values of Z, by dividing Y by Y’;

c. Then, give the mean and standard deviation of Z.

z̄ = 0.9991106 ; σZ = 0.0472554

2) Give a point estimate of the harvest.

y’ = 0.07647 x + 34.35 ; x0 = 300 kg/ha ; hence y’0 = 57.29 q/ha

3) Give the coefficient u corresponding to the confidence level.

u = 2.58

4) Then, give the confidence interval.

[57.29(0.9991 – 2.58×0.047255) ; 57.29(0.9991 + 2.58×0.047255)] = [50.25 ; 64.22]

On each person in a sample, a survey noted the age class (X) and the visual acuity (Y, 1/10 = 0.1):

Y \ X    [5 ; 35[    [35 ; 45[    [45 ; 55[    [55 ; 65[
0.3      1           5            10           20
0.6      8           12           25           18
0.9      55          30           14           6

Estimate the visual acuity of an 80-year-old person, by a 99% confidence interval.

z̄ = 0.999266 ; σZ = 0.298378

Point estimate:

y’ = -0.008422 x + 1.038 ; x0 = 80 ; hence y’0 = 0.3642

Coefficient u: u = 2.58

Confidence interval:

[0.3642(0.999266 – 2.58×0.298378) ;

0.3642(0.999266 + 2.58×0.298378)]

= [0.08358 ; 0.6444]


Exercise 23.

In a country, two variables are compared: the consumer force index and the turnover of its car industry:

consumer force (index) X 3.26 3.85 3.44 3.08 3.6

car industry turnover (G€) Y 9.3 9.56 9.36 9.24 9.47

1) Give the expression of the Y on X Mayer’s line.

Two ways to cut this data set (3 points then 2, or 2 points then 3) as X increases.

case 1: G1(3.26 ; 9.3) and G2(3.725 ; 9.515) y = 0.4624 x + 7.793

case 2: G1(3.17 ; 9.27) and G2(3.63 ; 9.463) y = 0.4203 x + 7.938

2) By means of a point estimate, give a value of the consumer force that would correspond to a G€ 10 car industry turnover.

case #1: y = 10 iff x ≈ 4.77

case #2: y = 10 iff x ≈ 4.91

3) Is a strong correlation between two variables a sign of a cause and effect relationship between them?

Not necessarily. This numerical relationship may just be a coincidence.

Monthly revenues of a commercial website are listed below, from January to December 2018:

in k€ : 3 5 4 8 10 9 13 12 17 18 18 21

1) In a few words, describe the least square method.

This method consists in finding the line that minimizes the sum of the squared residuals (the vertical gaps between the points and the line).

2) Thanks to the global trend of the evolution of the monthly revenue, give the 95% confidence interval of the

predictable revenue in December 2019. (number the months from 1 for January 2018)

month, X      1     2      3      4      5      6      7      8      9      10     11     12
revenue, Y    3     5      4      8      10     9      13     12     17     18     18     21
Y’            2.5   4.136  5.773  7.409  9.045  10.68  12.32  13.95  15.59  17.23  18.86  20.5
Z             1.2   1.209  0.693  1.08   1.106  0.843  1.055  0.86   1.09   1.045  0.954  1.024

Expression of the Y on X regression line: y’ = 1.636 x + 0.8636

Point estimate of the revenue in December 2019 (x = 24): y’0 = k€ 40.14

Variable Z : z̄ = 1.0132222 and σZ = 0.14538387

Coefficient u for a 95 % confidence level: u = 1.96

Confidence interval: [29.23 ; 52.10]

3) Give the probability that, in December 2019, the revenue would be less than k€ 29.23.

There is a 95% chance that this revenue lies inside this interval. Moreover, the concept of confidence interval involves a symmetric probability distribution (the normal law); thus, there is a 2.5% chance that the revenue would be below the interval, and a 2.5% chance that it would be above it. Answer: 2.5%.

4) Build the scatter plot (scale: 2 cm for one month), draw the regression line and finally represent the

confidence interval.
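The method used in question 2 (ratios Z = Y/Y', then point estimate × (z̄ ± u σZ)) can be sketched as follows in Python. The variable names are ours; statistics.pstdev is used on the assumption that σZ is the population standard deviation (division by n), which is what matches the values above.

```python
import statistics

months = list(range(1, 13))                               # January 2018 -> 1
revenue = [3, 5, 4, 8, 10, 9, 13, 12, 17, 18, 18, 21]     # k€

# Least squares line of Y on X
n = len(months)
mean_x, mean_y = sum(months) / n, sum(revenue) / n
cov_xy = sum(x * y for x, y in zip(months, revenue)) / n - mean_x * mean_y
var_x = sum(x * x for x in months) / n - mean_x ** 2
slope = cov_xy / var_x
intercept = mean_y - slope * mean_x

# Ratios Z = Y / Y' between observed and fitted values
fitted = [slope * x + intercept for x in months]
ratios = [y / f for y, f in zip(revenue, fitted)]
z_mean = statistics.mean(ratios)
z_std = statistics.pstdev(ratios)

u = 1.96                                                  # 95% confidence level
point_estimate = slope * 24 + intercept                   # December 2019 -> x = 24
low = point_estimate * (z_mean - u * z_std)
high = point_estimate * (z_mean + u * z_std)
print(round(point_estimate, 2), round(low, 2), round(high, 2))
```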


[Graph: revenue Y (k€) against month X]

The given table includes eight of the major cities of a country. The variable X gives, in thousands, the number of city residents; the variable Y gives, in thousands, the number of students in this city.

city   X     Y
A      850   58
B      623   37
C      587   38
D      360   20
E      312   16
F      275   15
G      262   12
H      244   12

1) Build the scatter plot from this data series. Answer: see below.

2) Give the coordinates of the mean point of the cloud. Answer: G(439.1 ; 26).

3) a. Using Mayer’s method, determine manually the expression of the Y on X regression line.

G1(273.3 ; 13.75) and G2(605 ; 38.25); slope: a = 0.07385

With G1: b = y – ax = -6.430; expression: y’ = 0.07385 x - 6.43

b. Draw this line. Does G belong to it? Answer: G always belongs to it.

c. Give "Mayer’s principle". Answer: the sum of the residuals must be zero.
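As a rough check of Mayer’s method, the sketch below sorts the cloud by X, splits it into two groups, and joins their mean points. The helper name `mayer_line` and the rule used for an odd number of points are assumptions made for illustration only.

```python
from statistics import fmean

def mayer_line(points):
    """Return (g1, g2, a, b) for the line through the mean points of two halves."""
    pts = sorted(points)                  # order the cloud by increasing x
    half = len(pts) // 2                  # assumption: an extra point goes to the upper group
    groups = (pts[:half], pts[half:])
    g1, g2 = [(fmean([x for x, _ in g]), fmean([y for _, y in g])) for g in groups]
    a = (g2[1] - g1[1]) / (g2[0] - g1[0]) # slope through G1 and G2
    b = g1[1] - a * g1[0]                 # intercept, using G1
    return g1, g2, a, b

cities = [(850, 58), (623, 37), (587, 38), (360, 20),
          (312, 16), (275, 15), (262, 12), (244, 12)]
g1, g2, a, b = mayer_line(cities)
residual_sum = sum(y - (a * x + b) for x, y in cities)
print(g1, g2, round(a, 5), round(b, 3), round(residual_sum, 9))
```

With two groups of equal size, the line passes through the mean point G, which is why the residuals sum to zero up to rounding.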

[Figure: scatter plot of Y, number of students (thousands), against X, number of residents (thousands)]

4) We will use here another fitting line, whose expression is: y' = 0.07 x – 6.

a. With this line, give the 95% confidence interval of the predictable number of students in a town that has

two million inhabitants.


X    850    623    587    360    312    275    262    244
Y    58     37     38     20     16     15     12     12
Y’   53.5   37.61  35.09  19.2   15.84  13.25  12.34  11.08
Z    1.084  0.984  1.083  1.042  1.01   1.132  0.972  1.083

Expression of the Y on X regression line: y’ = 0.07 x - 6

Point estimate of the number of students (x = 2000): y’0 = 134 (thousands)

Variable Z = Y / Y’: z̄ = 1.04877 and σZ = 0.052588

Coefficient u associated with a 95 % confidence level: u = 1.96

Confidence interval: y’0 × (z̄ ± u σZ) = [126.7 ; 154.3]

b. What can we say about the chances that the number of students would exceed 155,000 in such a town?

Since 155 lies just above the upper bound of the 95% interval, the chance is slightly less than 2.5 %.
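To put a rough number on "slightly less than 2.5 %", one possible sketch treats Z as normally distributed with the mean and standard deviation quoted above, so that the prediction y’0 × Z is normal as well. This normal model is an assumption of the sketch, in line with the normal law behind the coefficient u.

```python
from statistics import NormalDist

y0 = 134.0                            # point estimate for x = 2000 (thousands of students)
z_bar, sigma_z = 1.04877, 0.052588    # summary values of Z quoted in the text
threshold = 155.0                     # thousands of students

# Coefficient u for a two-sided 95 % level under the standard normal law.
u = NormalDist().inv_cdf(0.975)

# Assumption: Z is normal, hence the prediction y0 * Z is normal too.
prediction = NormalDist(mu=y0 * z_bar, sigma=y0 * sigma_z)
p_exceed = 1.0 - prediction.cdf(threshold)
print(f"u = {u:.2f}, P(students > {threshold:.0f} thousand) = {p_exceed:.4f}")
```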

The service life of some identical office equipment has been studied. In the following table, ti represents the duration of use, expressed in thousands of hours, and R(ti) the rate of equipment still in use at time ti (e.g. after 1,000 hours, ti = 1, 90 % of the equipment is still in use: R(ti) = 0.90).

ti 1 2 3 4 5 6 7 8 9

R(ti) 0.9 0.66 0.53 0.4 0.32 0.25 0.19 0.14 0.1

1) We set yi = ln[R(ti)] where ln is the natural logarithm. Fill the following table, then build the scatter plot,

using the points Mi (ti, yi), into an orthogonal frame.

ti 1 2 3 4 5 6 7 8 9

yi -0.105 -0.416 -0.635 -0.916 -1.139 -1.386 -1.661 -1.966 -2.303

2) Calculate the linear correlation coefficient between T and Y.

These points are almost collinear; a linear fitting appears to be relevant.

3) Using the least squares method, determine an expression of the Y on T regression line.

Deduce from this expression that there are two positive real numbers k and λ such that: R(t) = k e^(-λt).

y’ = -0.26604 t + 0.1605. Since y = ln R(t), R(t) = e^y = e^(-0.26604 t + 0.1605) = e^(0.1605) × e^(-0.26604 t) = 1.174 e^(-0.26604 t).
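The fit can be reproduced with a minimal Python sketch: take logarithms of the rates, run an ordinary least-squares regression of Y on T, then read k and λ off the intercept and the slope. Variable names are illustrative.

```python
import math
from statistics import fmean

t = list(range(1, 10))
rates = [0.9, 0.66, 0.53, 0.4, 0.32, 0.25, 0.19, 0.14, 0.1]
y = [math.log(r) for r in rates]          # y_i = ln R(t_i)

t_bar, y_bar = fmean(t), fmean(y)
slope = (sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
         / sum((ti - t_bar) ** 2 for ti in t))
intercept = y_bar - slope * t_bar

# y' = slope * t + intercept  implies  R(t) = k * exp(-lam * t)
k = math.exp(intercept)
lam = -slope
print(f"y' = {slope:.5f} t + {intercept:.4f}; k = {k:.3f}, lambda = {lam:.5f}")
```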

4) In this question, we'll take k = 1.174 and λ = 0.266.

a. Determine the predictable rate of equipment still in use after 10,000 hours.

After 10,000 hours, t = 10; hence R(10) = 1.174 e^(-2.66) = 0.0821, i.e. 8.2 % rounded.

b. After how long are there exactly 50 % of equipment still in use?

R(t) = 0.5 implies 1.174 e^(-0.266 t) = 0.5 iff e^(-0.266 t) = 0.5/1.174 iff -0.266 t = ln(0.5/1.174)

iff t = ln(0.5/1.174) / (-0.266) = 3.209. Answer: after 3,209 hours.
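Both parts of question 4 can be checked numerically. The sketch below uses the rounded values k = 1.174 and λ = 0.266 from the statement; the function names are hypothetical.

```python
import math

K, LAM = 1.174, 0.266          # rounded values given in question 4

def rate(t):
    """Predicted rate of equipment still in use after t thousand hours."""
    return K * math.exp(-LAM * t)

def time_to_reach(r):
    """Solve K * exp(-LAM * t) = r for t (in thousands of hours)."""
    return math.log(r / K) / (-LAM)

t_half = time_to_reach(0.5)
print(f"R(10) = {rate(10):.4f}; R(t) = 0.5 for t = {t_half:.3f} thousand hours")
```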


5) Give a 99% confidence interval of the rate of equipment still in use after 10,000 hours of service.

T 1 2 3 4 5 6 7 8 9

Y -0.105 -0.416 -0.635 -0.916 -1.139 -1.386 -1.661 -1.966 -2.303

Y ’ -0.106 -0.372 -0.638 -0.904 -1.170 -1.436 -1.702 -1.968 -2.234

Z 0.998 1.118 0.996 1.014 0.974 0.966 0.976 0.999 1.031

Point estimate of y = ln R (t = 10): y’0 = -2.5

Variable Z = Y / Y’: z̄ = 1.007964 and σZ = 0.043476

Coefficient u associated with a 99 % confidence level: u = 2.58

Confidence interval on y: [-2.8003 ; -2.2395], and the interval on R = e^y is: [0.0608 ; 0.1065].
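A short sketch of the last step, assuming the interval on y is y’0 × (z̄ ± u σZ) as the figures suggest. Because y’0 is negative, the two products come out in reverse order and must be sorted before taking exponentials.

```python
import math

y0 = -2.5                              # point estimate of ln R at t = 10
z_bar, sigma_z = 1.007964, 0.043476    # summary values of Z quoted in the text
u = 2.58                               # coefficient for a 99 % confidence level

# y0 < 0 flips the order of the bounds, hence the sort.
y_low, y_high = sorted(y0 * (z_bar + sign * u * sigma_z) for sign in (-1, 1))

# exp is increasing, so the bounds on R keep the same order as the bounds on y.
r_low, r_high = math.exp(y_low), math.exp(y_high)
print(f"y in [{y_low:.4f} ; {y_high:.4f}], R in [{r_low:.4f} ; {r_high:.4f}]")
```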

Exercise 27.

100 children have been classified by age X (in years) and size Y (in cm):

                      Y
              [95 ; 105[   [105 ; 125[   [125 ; 135[
X  [3 ; 5[        15           10             0
   [5 ; 7[         8           32             5
   [7 ; 9[         2           13            15

2) Give the means and standard deviations of X and Y, calculate their covariance.

x̄ = 6.1 years, V(X) = 3940/100 − 6.1² = 2.19, σ(X) = 1.480 years;

ȳ = 114.25 cm, V(Y) = 1315375/100 − 114.25² = 100.6875, σ(Y) = 10.03 cm.

Cov(X, Y) = 70540/100 − 6.1 × 114.25 = 8.475.

3) Calculate their linear correlation coefficient. Comment on this value.

r = 8.475 / (1.480 × 10.03) = 0.5709, a very weak linear correlation (the cloud may be noisy and curved).
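These summary statistics can be recomputed from the contingency table with a short sketch, assuming each class is represented by its midpoint (4, 6, 8 years and 100, 115, 130 cm).

```python
import math

ages = [4, 6, 8]                 # class midpoints of X (years)
sizes = [100, 115, 130]          # class midpoints of Y (cm)
counts = [[15, 10, 0],           # rows: age classes, columns: size classes
          [8, 32, 5],
          [2, 13, 15]]

n = sum(map(sum, counts))
x_bar = sum(x * sum(row) for x, row in zip(ages, counts)) / n
y_bar = sum(y * sum(col) for y, col in zip(sizes, zip(*counts))) / n
var_x = sum(x * x * sum(row) for x, row in zip(ages, counts)) / n - x_bar ** 2
var_y = sum(y * y * sum(col) for y, col in zip(sizes, zip(*counts))) / n - y_bar ** 2
cov = sum(x * y * c for x, row in zip(ages, counts)
          for y, c in zip(sizes, row)) / n - x_bar * y_bar
r = cov / math.sqrt(var_x * var_y)
print(n, x_bar, y_bar, round(var_x, 4), round(var_y, 4), round(cov, 4), round(r, 4))
```

The unrounded r may differ slightly in the fourth decimal from the hand calculation, which divides by rounded standard deviations.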

4) Nevertheless, does the table allow us to see some trend?

We see that the size class with the greatest number of individuals changes from one age group to the next. But these largest frequencies do not represent an overwhelming majority within their age group, which reflects a high variability of sizes among children of the same age. Modelling the growth of a child by a straight line, or even by a well-defined curve, is therefore difficult.

5) Assuming that the relationship between age and size is linear until the age of 12, give the 95% confidence

interval of the size of a 12 year-old child.

X 4 6 8 4 6 8 4 6 8

Y 100 100 100 115 115 115 130 130 130

n 15 8 2 10 32 13 0 5 15

Y’ 106.12 113.86 121.6 106.12 113.86 121.6 106.12 113.86 121.6

Z 0.94233 0.87827 0.82237 1.08368 1.01001 0.94572 1.22503 1.14175 1.06908

Expression of the Y on X regression line: y’ = 3.87 x + 90.64

Point estimate of the size of a 12-year-old child (x = 12): y’0 = 137.08 cm

Variable Z = Y / Y’: z̄ = 1.013138 and σZ = 0.121881

Coefficient u corresponding to a 95 % confidence level: u = 1.96

Confidence interval on y: [106.1 ; 171.6].
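The regression line and the interval can be rebuilt from the summary statistics of question 2. This sketch takes z̄ and σZ as quoted in the text rather than recomputing them, and assumes the interval is y’0 × (z̄ ± u σZ).

```python
x_bar, y_bar = 6.1, 114.25       # means from question 2
var_x, cov = 2.19, 8.475         # V(X) and Cov(X, Y) from question 2

a = cov / var_x                  # slope of the Y on X regression line
b = y_bar - a * x_bar            # intercept
y0 = a * 12 + b                  # point estimate for a 12-year-old child

z_bar, sigma_z, u = 1.013138, 0.121881, 1.96   # values quoted in the text
low = y0 * (z_bar - u * sigma_z)
high = y0 * (z_bar + u * sigma_z)
print(f"y' = {a:.2f} x + {b:.2f}; y'0 = {y0:.2f} cm; interval = [{low:.1f} ; {high:.1f}]")
```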

