A hypothesis which states that there is no difference between the assumed and actual value of
the parameter is called the null hypothesis, and the hypothesis that differs from the null
hypothesis is the alternative hypothesis. The null hypothesis is denoted by $H_0$ and the alternative
hypothesis by $H_1$.
Example: An auditor wishes to test the assumption that the mean value of all accounts
receivable in a given firm is $260.00 by taking a sample of n=36 and computing the sample
mean. He wishes to reject the assumed value of $260.00 only if it is clearly contradicted by
the sample mean, and thus the hypothesized value should be given the benefit of doubt in
the testing procedure. The null and alternative hypotheses of the procedure are
$$H_0: \mu = \$260.00 \quad \text{and} \quad H_1: \mu \neq \$260.00$$
Error     Meaning               Risk Symbol
Type I    Reject a true H0      α
Type II   Accept a false H0     β
[Figures: rejection and non-rejection regions, with critical values, for the tests of a mean of 12 oz (h0, ha) and a mean of 40 ounces (Ho, Ha)]
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N - n}{N - 1}} \qquad s_{\bar{x}} = \frac{s}{\sqrt{n}}$$
-- The p-value is the probability of getting a test statistic at least as extreme as the
observed test statistic (computed from the data), computed under the assumption that the
null hypothesis is true.
-- The p-value is the smallest value of $\alpha$ for which the null hypothesis can be rejected.
Example: If the p-value of a test is .038, the null hypothesis cannot be rejected at
$\alpha = 0.01$, because .038 is the smallest value of $\alpha$ for which the null hypothesis can
be rejected. However, the null hypothesis can be rejected for $\alpha = 0.05$.
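The decision rule in the example above can be sketched in a few lines of Python. This is only an illustration; the function name `reject_null` is hypothetical and not part of the original notes.

```python
def reject_null(p_value: float, alpha: float) -> bool:
    """Return True when H0 is rejected, i.e. when the p-value does not exceed alpha."""
    return p_value <= alpha

p_value = 0.038
decision_at_01 = reject_null(p_value, 0.01)  # H0 is not rejected at alpha = 0.01
decision_at_05 = reject_null(p_value, 0.05)  # H0 is rejected at alpha = 0.05
print(decision_at_01, decision_at_05)
```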
Example 9.17
Suppose that in past years the average price per square foot for warehouses in
the United States has been $32.28. A national real estate investor wants to determine
whether the figure has changed now. The investor hires a researcher who randomly
samples 19 warehouses that are for sale across the United States and finds that the
mean price per square foot is $31.67, with a standard deviation of $1.29. If the
researcher uses a 5% level of significance, what statistical conclusion can be
reached? What are the hypotheses?
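One way to work Example 9.17 is sketched below, assuming a two-tailed t test of H0: mean = 32.28 against Ha: mean not equal to 32.28 (the investor asks whether the figure has "changed"). The critical value is an assumption read from a standard t table for 18 degrees of freedom, not a value given in these notes.

```python
import math

mu0 = 32.28    # hypothesized mean price per square foot
x_bar = 31.67  # sample mean
s = 1.29       # sample standard deviation
n = 19         # sample size

# t statistic for a single mean with unknown population standard deviation
t_stat = (x_bar - mu0) / (s / math.sqrt(n))

# Assumption: two-tailed critical value for alpha = 0.05, df = n - 1 = 18 (t table)
t_crit = 2.101
reject_h0 = abs(t_stat) > t_crit
print(round(t_stat, 2), reject_h0)
```

Under these assumptions the statistic does not fall in the rejection region, so the data do not show a change in the average price.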
Ex. 9.16 Suppose a study reports that the average price for a gallon of
self-serve regular unleaded gasoline is $1.16. You believe that the
figure is higher in your area of the country. You decide to test this
claim for your part of the United States by randomly calling gasoline
stations. Your random survey of 25 stations produces the following
prices.
$1.27   $1.29   $1.16   $1.20   $1.37
 1.20    1.23    1.19    1.20    1.24
 1.16    1.07    1.27    1.09    1.35
 1.15    1.23    1.14    1.05    1.35
 1.21    1.14    1.14    1.07    1.10
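The sample statistics for Ex. 9.16 can be computed as in the sketch below. It assumes a one-tailed test of H0: mean = 1.16 against Ha: mean > 1.16 and stops at the t statistic, because the exercise as reproduced here does not state a significance level.

```python
import math
import statistics

prices = [
    1.27, 1.29, 1.16, 1.20, 1.37,
    1.20, 1.23, 1.19, 1.20, 1.24,
    1.16, 1.07, 1.27, 1.09, 1.35,
    1.15, 1.23, 1.14, 1.05, 1.35,
    1.21, 1.14, 1.14, 1.07, 1.10,
]
mu0 = 1.16                        # price reported by the study
n = len(prices)
x_bar = statistics.mean(prices)   # sample mean
s = statistics.stdev(prices)      # sample standard deviation (n - 1 divisor)

# t statistic with df = n - 1 = 24
t_stat = (x_bar - mu0) / (s / math.sqrt(n))
print(round(x_bar, 4), round(s, 4), round(t_stat, 2))
```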
$$\bar{x}_c = \mu - z_c \frac{s}{\sqrt{n}} = 12 - (1.645)\frac{0.10}{\sqrt{60}} = 11.979$$
If $\bar{x} < 11.979$, reject $H_0$.
If $\bar{x} \ge 11.979$, accept $H_0$.
The figure in the next slide gives the distribution of values when the null
hypothesis is true (top) and when the alternative mean of 11.99 ounces is
true (bottom).
How often will the business researcher fail to reject the top
distribution as true when, in reality, the bottom distribution is
true?
$$H_0: \mu = 12 \text{ oz} \qquad H_a: \mu < 12 \text{ oz}$$
Suppose a sample of 60 cans of beverage yields a sample mean of
11.985 ounces with a standard deviation of 0.10 ounces. Let $\alpha = 0.05$ and use a
one-tailed test.
$$Z = \frac{11.985 - 12.00}{0.10/\sqrt{60}} = -1.16, \qquad Z_{0.05} = -1.645$$
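When $H_0$ is not rejected but the true mean is $\mu_1 = 11.99$:
$$Z_1 = \frac{\bar{x}_c - \mu_1}{s/\sqrt{n}} = \frac{11.979 - 11.99}{0.10/\sqrt{60}} = -0.85$$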
Alternative mean   Probability of Type II error   Power
11.999             .94                            .06
11.995             .89                            .11
11.990             .80                            .20
11.980             .53                            .47
11.970             .24                            .76
11.960             .07                            .93
11.950             .01                            .99
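The Type II error and power values for the beverage example can be reproduced approximately with the sketch below. It uses Python's standard-library `NormalDist` and the unrounded critical value, so results can differ from the slide values in the second decimal place; the helper name `type_ii_error` is hypothetical.

```python
import math
from statistics import NormalDist

mu0, s, n, alpha = 12.0, 0.10, 60, 0.05
std_err = s / math.sqrt(n)
z = NormalDist()  # standard normal distribution

# Critical sample mean for the lower-tailed test (about 11.979)
x_crit = mu0 + z.inv_cdf(alpha) * std_err

def type_ii_error(mu_alt: float) -> float:
    """P(fail to reject H0) when the true mean is mu_alt."""
    return 1.0 - z.cdf((x_crit - mu_alt) / std_err)

for mu_alt in [11.999, 11.995, 11.990, 11.980, 11.970, 11.960, 11.950]:
    beta = type_ii_error(mu_alt)
    print(f"{mu_alt:.3f}  beta={beta:.2f}  power={1 - beta:.2f}")
```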
57 56
52
44
58 53
44
44
48 51
56
48
63 53
51
50
1.62
1.63
1.70
1.66
1.63
1.65
1.71
1.64
1.69
1.57
1.64
1.59
1.66
1.63
1.65
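For large samples ($n_1 \ge 30$ and $n_2 \ge 30$):
$$\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2 \qquad \sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$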
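Confidence Intervals
Confidence intervals for the difference in two population means:
$$(\bar{x}_1 - \bar{x}_2) - z\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \le \mu_1 - \mu_2 \le (\bar{x}_1 - \bar{x}_2) + z\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
When the population variances are unknown and the samples are large, the sample variances are used:
$$(\bar{x}_1 - \bar{x}_2) - z\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \le \mu_1 - \mu_2 \le (\bar{x}_1 - \bar{x}_2) + z\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$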
Exercise 10.6
The Bureau of Labor Statistics shows that the average insurance
cost to a company per employee per hour is $1.84 for managers
and $1.99 for professional specialty workers. Suppose these
figures were obtained from 35 managers and 41 professional
specialty workers and their respective standard deviations are
$0.38 and $0.51. Calculate a 98% confidence interval to estimate the
difference in the mean hourly company expenditures for insurance
for these two groups. What is the value of the point estimate? Test
to determine whether there is a significant difference in the hourly
rates employers pay for insurance between managers and
professional specialty workers. Use a 2% level of significance.
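Exercise 10.6 can be worked with the large-sample z formulas, as in this sketch. It assumes the sample standard deviations stand in for the population values and takes the critical z from Python's standard-library `NormalDist`.

```python
import math
from statistics import NormalDist

x1, s1, n1 = 1.84, 0.38, 35  # managers
x2, s2, n2 = 1.99, 0.51, 41  # professional specialty workers

point_estimate = x1 - x2
std_err = math.sqrt(s1**2 / n1 + s2**2 / n2)

# 98% confidence means 1% in each tail
z_crit = NormalDist().inv_cdf(1 - 0.02 / 2)
lower = point_estimate - z_crit * std_err
upper = point_estimate + z_crit * std_err

# Two-tailed test of H0: mu1 - mu2 = 0 at the 2% level
z_stat = point_estimate / std_err
reject_h0 = abs(z_stat) > z_crit
print(round(lower, 3), round(upper, 3), round(z_stat, 2), reject_h0)
```

The interval contains zero, which agrees with the test not rejecting the hypothesis of equal means at the 2% level.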
Exercise 10.7:
A company's auditor believes the per diem cost in Nashville,
Tennessee, rose significantly between 1992 and 1999. To test
this belief, the auditor samples 51 business trips from the
company's records for 1992; the sample average was $190 per
day, with a sample standard deviation of $18.50. The auditor
selects a second random sample of 47 business trips from the
company's records for 1999; the sample average was $198 per
day, with a standard deviation of $15.60. If he uses a risk of
committing a Type I error of 0.01, does the auditor find that
the per diem average expense in Nashville has gone up
significantly?
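The t formula to test the difference in means assuming $\sigma_1^2 = \sigma_2^2$ is
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2 (n_1 - 1) + s_2^2 (n_2 - 1)}{n_1 + n_2 - 2}}\;\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}, \qquad df = n_1 + n_2 - 2$$
When the population variances are not assumed equal, the t formula is
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}, \qquad df = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$$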
Exercise 10.17 (page 360). Based on an indication that mean daily car
rental rates may be higher in Boston than in Dallas, a survey of eight
car rental companies in Boston is taken and the sample mean car
rental rate is $47, with a standard deviation of $3. Further, suppose a
survey of nine car rental companies in Dallas results in a sample
mean of $44 and a standard deviation of $3. Use alpha = 0.01 to test
whether the average daily car rental rates in Boston are
significantly higher than those in Dallas. Assume car rental rates are
normally distributed and the population variances are equal.
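A sketch of Exercise 10.17 using the pooled-variance t formula follows. The critical value is an assumption taken from a standard t table (one tail, alpha = 0.01, 15 degrees of freedom); it is not given in these notes.

```python
import math

x1, s1, n1 = 47.0, 3.0, 8  # Boston
x2, s2, n2 = 44.0, 3.0, 9  # Dallas

df = n1 + n2 - 2
# Pooled variance under the equal-variance assumption
pooled_var = (s1**2 * (n1 - 1) + s2**2 * (n2 - 1)) / df
t_stat = (x1 - x2) / (math.sqrt(pooled_var) * math.sqrt(1 / n1 + 1 / n2))

# Assumption: one-tailed critical value for alpha = 0.01, df = 15 (t table)
t_crit = 2.602
reject_h0 = t_stat > t_crit
print(df, round(t_stat, 3), reject_h0)
```

With these inputs the statistic stays below the critical value, so the sample does not show Boston rates to be significantly higher at alpha = 0.01.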
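$$t = \frac{\bar{d} - D}{s_d / \sqrt{n}}, \qquad df = n - 1$$
Where
n = number of pairs
d = sample difference in pairs
D = mean population difference
$s_d$ = standard deviation of sample differences
$\bar{d}$ = mean sample difference
Formulas for $\bar{d}$ and $s_d$:
$$\bar{d} = \frac{\sum d}{n} \qquad s_d = \sqrt{\frac{\sum (d - \bar{d})^2}{n - 1}} = \sqrt{\frac{\sum d^2 - \dfrac{(\sum d)^2}{n}}{n - 1}}$$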
Before   After
255      197
230      225
290      215
242      215
300      240
250      235
215      190

Employee   Before   After
           230      240
           225      200
10         219      203
11         236      223
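For large samples ($n_1 \hat{p}_1 > 5$, $n_1 \hat{q}_1 > 5$, $n_2 \hat{p}_2 > 5$, $n_2 \hat{q}_2 > 5$), the difference in sample
proportions $\hat{p}_1 - \hat{p}_2$ is normally distributed with
$$\mu_{\hat{p}_1 - \hat{p}_2} = P_1 - P_2 \qquad \sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{P_1 Q_1}{n_1} + \frac{P_2 Q_2}{n_2}}$$
$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (P_1 - P_2)}{\sqrt{\dfrac{P_1 Q_1}{n_1} + \dfrac{P_2 Q_2}{n_2}}}$$
where
$n_1$ = size of sample 1
$n_2$ = size of sample 2
$\hat{p}_1$, $\hat{p}_2$ = proportions from samples 1 and 2
$P_1$, $P_2$ = proportions from populations 1 and 2
$Q_1 = 1 - P_1$
$Q_2 = 1 - P_2$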
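$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (P_1 - P_2)}{\sqrt{\bar{P}\,\bar{Q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
$$\bar{P} = \frac{X_1 + X_2}{n_1 + n_2} = \frac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2} \qquad \bar{Q} = 1 - \bar{P}$$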
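$$(\hat{p}_1 - \hat{p}_2) - z\sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}} \le P_1 - P_2 \le (\hat{p}_1 - \hat{p}_2) + z\sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}}$$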
10.41 Suppose the data shown here are the result of a survey to
investigate gasoline prices. Ten service stations were selected
randomly in each of two cities, and the figures represent the
prices of a gallon of unleaded regular gasoline on a given day. Use the
F test to determine whether there is a significant difference in the
variances of the prices of unleaded regular gasoline between the two
cities. Let $\alpha$ = 1.0. Assume gasoline prices are normally distributed.
City 1
City2
1.18
1.07
1.13
1.08
1.05
1.19
1.15
1.14
1.13
1.17
1.21
1.12
1.14
1.13
1.03
1.14
1.14
1.13
1.09
1.11
Houston
Chicago
132
126
118
56
138
94
85
69
131
161
113
67
127
133
81
54
99
119
94
137
126
88
93
134
χ² Goodness-of-Fit Test
Formula to be used to compute the chi-square goodness-of-fit test:
$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}, \qquad df = k - 1 - c$$
where: $f_o$ = frequency of observed values
$f_e$ = frequency of expected values
k = number of categories
c = number of parameters estimated from the sample data
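A random variable X has a chi-square distribution with parameter v if its density is
$$f(x) = \begin{cases} \dfrac{1}{2^{v/2}\,\Gamma(v/2)}\, x^{\frac{v}{2} - 1} e^{-\frac{x}{2}}, & x > 0 \\ 0, & x \le 0 \end{cases}$$
The parameter v is the number of degrees of freedom of X.
The distribution has mean v and variance 2v.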
Age of purchaser   Frequency
10-under 20        16
20-under 30        44
30-under 40        61
40-under 50        56
50-under 60        35
60-under 70        19
Number of calls
(per 5-minute interval)   Frequency
0                         18
1                         28
2                         47
3                         21
4                         16
5                         11
6 or more                 9
56
19
17
14
18
19
21
18
18
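Contingency Table
                           Type of Financial Investment
                           E       F       G (Treasury bills)
Geographic   A                             O13                  nA
Region       B                                                  nB
             C                                                  nC
             D (West)                                           nD
                           nE      nF      nG                   N

If region A and investment F are independent, then
$$P(A \cap F) = P(A) \cdot P(F) = \frac{n_A}{N} \cdot \frac{n_F}{N}$$
$$e_{AF} = N \cdot P(A \cap F) = N \left(\frac{n_A}{N}\right)\left(\frac{n_F}{N}\right) = \frac{n_A \, n_F}{N}$$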
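For each cell of the contingency table (for example $e_{12}$, row A and column F), the expected frequency is computed from the row and column totals:
$$e_{ij} = \frac{n_i \cdot n_j}{N}$$
Where i = the row
j = the column
$n_i$ = the total of row i
$n_j$ = the total of column j
N = the total of all frequencies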
Calculated/observed value of $\chi^2$:
$$\chi^2 = \sum \sum \frac{(f_o - f_e)^2}{f_e}$$
Where
df = (r - 1)(c - 1)
r = number of rows
c = number of columns
                        Type of Gasoline
Income                  Regular   Premium   Extra Premium   Total
Less than $30,000       85        16        6               107
$30,000 to $49,999      102       27        13              142
$50,000 to $99,999      36        22        15              73
At least $100,000       15        23        25              63
Total                   238       88        59              385
Expected Frequency:
$$e_{ij} = \frac{n_i \, n_j}{N}$$
$$e_{11} = \frac{(107)(238)}{385} = 66.15 \qquad e_{12} = \frac{(107)(88)}{385} = 24.46 \qquad e_{13} = \frac{(107)(59)}{385} = 16.40$$
Observed frequencies, with expected frequencies in parentheses:
                        Type of Gasoline
Income                  Regular        Premium       Extra Premium   Total
Less than $30,000       (66.15) 85     (24.46) 16    (16.40) 6       107
$30,000 to $49,999      (87.78) 102    (32.46) 27    (21.76) 13      142
$50,000 to $99,999      (45.13) 36     (16.69) 22    (11.19) 15      73
At least $100,000       (38.95) 15     (14.40) 23    (9.65) 25       63
Total                   238            88            59              385
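χ² Calculation
$$\chi^2 = \sum \sum \frac{(f_o - f_e)^2}{f_e}$$
$$= \frac{(85 - 66.15)^2}{66.15} + \frac{(16 - 24.46)^2}{24.46} + \frac{(6 - 16.40)^2}{16.40} + \frac{(102 - 87.78)^2}{87.78} + \frac{(27 - 32.46)^2}{32.46} + \frac{(13 - 21.76)^2}{21.76}$$
$$+ \frac{(36 - 45.13)^2}{45.13} + \frac{(22 - 16.69)^2}{16.69} + \frac{(15 - 11.19)^2}{11.19} + \frac{(15 - 38.95)^2}{38.95} + \frac{(23 - 14.40)^2}{14.40} + \frac{(25 - 9.65)^2}{9.65}$$
$$= 70.78$$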
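Conclusion:
df = (4 - 1)(3 - 1) = 6. With $\alpha = 0.01$, the critical value is $\chi^2_{0.01,\,6} = 16.812$; values below
16.812 fall in the non-rejection region. Since 70.78 > 16.812, the null hypothesis that type of
gasoline is independent of income is rejected.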
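The expected frequencies and the chi-square statistic for the gasoline example can be checked with the sketch below. It works from unrounded expected frequencies, so the statistic comes out slightly below the 70.78 obtained from values rounded to two decimals; the critical value 16.812 is the one quoted in the conclusion.

```python
# Observed frequencies: rows are income levels, columns are
# Regular, Premium, Extra Premium
observed = [
    [85, 16, 6],
    [102, 27, 13],
    [36, 22, 15],
    [15, 23, 25],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
N = sum(row_totals)

# e_ij = (row total i)(column total j) / N
expected = [[ri * cj / N for cj in col_totals] for ri in row_totals]

chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

chi_crit = 16.812  # critical value for alpha = 0.01, df = 6
reject_h0 = chi_square > chi_crit
print(round(chi_square, 2), df, reject_h0)
```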
Exercise 12.13. Use the following contingency table and the chi-square test of
independence to determine whether social class is independent of number of
children in a family.
                        Social Class
Number of Children      Lower   Middle   Upper   Total
0                       7       18       6       31
1                       9       38       23      70
2 or 3                  34      97       58      189
More than 3             47      31       30      108
Total                   97      184      117     398
Expected frequencies:
$e_{11}$ = 7.56     $e_{12}$ = 14.33    $e_{13}$ = 9.11
$e_{21}$ = 17.06    $e_{22}$ = 32.36    $e_{23}$ = 20.58
$e_{31}$ = 46.06    $e_{32}$ = 87.38    $e_{33}$ = 55.56
$e_{41}$ = 26.32    $e_{42}$ = 49.93    $e_{43}$ = 31.75
Observed frequencies, with expected frequencies in parentheses:
                        Social Class
Number of Children      Lower         Middle        Upper         Total
0                       (7.56) 7      (14.33) 18    (9.11) 6      31
1                       (17.06) 9     (32.36) 38    (20.58) 23    70
2 or 3                  (46.06) 34    (87.38) 97    (55.56) 58    189
More than 3             (26.32) 47    (49.93) 31    (31.75) 30    108
Total                   97            184           117           398
Analysis of Variance
Experimental Design
A plan and a structure to test hypotheses in which the
researcher controls or manipulates one or more variables.
Independent Variable
Treatment variable is one that the experimenter controls
or modifies in the experiment.
Classification variable is a characteristic of the
experimental subjects that was present prior to the
experiment, and is not a result of the experimenter's
manipulations or control.
Dependent Variable
The response to the different levels of the independent variables.
In the valve experiment the dependent variable is the size of the
opening of the valve.
Analysis of Variance : experimental designs are analyzed
statistically by a group of techniques known as Analysis of
Variance.
Example: Suppose the measurements for the opening of 24 valves
randomly selected from an assembly line are given below.
6.26   6.19   6.33   6.26   6.50   6.19
6.44   6.22   6.54   6.23   6.29   6.40
6.23   6.29   6.58   6.27   6.38   6.58
6.31   6.34   6.21   6.19   6.36   6.56
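$$H_0: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$$
$$F = \frac{MSC}{MSE}$$
If $F > F_c$, reject $H_0$.
If $F \le F_c$, do not reject $H_0$.
$$SSC = \sum_{j=1}^{C} n_j (\bar{x}_j - \bar{x})^2 \qquad df_C = C - 1$$
$$SSE = \sum_{j=1}^{C} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2 \qquad df_E = N - C$$
$$SST = \sum_{j=1}^{C} \sum_{i=1}^{n_j} (x_{ij} - \bar{x})^2 \qquad df_T = N - 1$$
$$MSC = \frac{SSC}{df_C} \qquad MSE = \frac{SSE}{df_E} \qquad F = \frac{MSC}{MSE}$$
where $x_{ij}$ is observation i in treatment level j, $\bar{x}_j$ is the mean of treatment level j, $\bar{x}$ is the grand mean,
$n_j$ is the number of observations in treatment level j, C is the number of treatment levels, and N is the total number of observations.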
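The one-way ANOVA sums of squares above can be sketched as follows. The three small groups are made-up illustration values, not data from these notes, and the function name `one_way_anova` is hypothetical.

```python
def one_way_anova(groups):
    """Return SSC, SSE, SST and F for a list of treatment groups."""
    all_values = [x for g in groups for x in g]
    N = len(all_values)   # total number of observations
    C = len(groups)       # number of treatment levels
    grand_mean = sum(all_values) / N
    means = [sum(g) / len(g) for g in groups]

    ssc = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    sst = sum((x - grand_mean) ** 2 for x in all_values)

    msc = ssc / (C - 1)   # df_C = C - 1
    mse = sse / (N - C)   # df_E = N - C
    return {"SSC": ssc, "SSE": sse, "SST": sst, "F": msc / mse}

# Hypothetical data: three treatment levels, three observations each
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
result = one_way_anova(groups)
print(result)
```

The computed F is then compared with the table value $F_c$ for (C - 1, N - C) degrees of freedom.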
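If $s_1^2$ and $s_2^2$ are the variances of independent random samples of sizes $n_1$ and $n_2$ from normal
populations with the same variance $\sigma^2$, then the statistic $F = \dfrac{s_1^2}{s_2^2}$ follows the F distribution with
$v_1 = n_1 - 1$ and $v_2 = n_2 - 1$ degrees of freedom.
$$\text{Mean} = \frac{v_2}{v_2 - 2} \quad \text{for } v_2 > 2$$
$$\text{Variance} = \frac{2 v_2^2 (v_1 + v_2 - 2)}{v_1 (v_2 - 2)^2 (v_2 - 4)} \quad \text{for } v_2 > 4$$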
Machine 2
Machine 3
Machine 4
4.05
3.99
3.97
4.00
4.01
1.02
3.98
4.02
4.02
4.01
3.97
3.97
4.04
3.99
3.95
4.01
4.00
4.00
4.00
High Level
Mid Level
Low Level
____________________________________________________
7
10
10
8
_____________________________________________________
[Figure: layout of a two-way design showing the row treatments and the cells]
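Row Effects: Ho: Row means are all equal.
Ha: At least one row mean is different from the others.
Column Effects: Ho: Column means are all equal.
Ha: At least one column mean is different from the others.
Interaction Effects: Ho: The interaction effects are zero.
Ha: An interaction effect is present.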
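$$SSR = nC \sum_{i=1}^{R} (\bar{X}_i - \bar{X})^2 \qquad df_R = R - 1$$
$$SSC = nR \sum_{j=1}^{C} (\bar{X}_j - \bar{X})^2 \qquad df_C = C - 1$$
$$SSI = n \sum_{i=1}^{R} \sum_{j=1}^{C} (\bar{X}_{ij} - \bar{X}_i - \bar{X}_j + \bar{X})^2 \qquad df_I = (R - 1)(C - 1)$$
$$SSE = \sum_{i=1}^{R} \sum_{j=1}^{C} \sum_{k=1}^{n} (X_{ijk} - \bar{X}_{ij})^2 \qquad df_E = RC(n - 1)$$
$$SST = \sum_{i=1}^{R} \sum_{j=1}^{C} \sum_{k=1}^{n} (X_{ijk} - \bar{X})^2 \qquad df_T = N - 1$$
$$MSR = \frac{SSR}{R - 1} \qquad MSC = \frac{SSC}{C - 1} \qquad MSI = \frac{SSI}{(R - 1)(C - 1)} \qquad MSE = \frac{SSE}{RC(n - 1)}$$
$$F_R = \frac{MSR}{MSE} \qquad F_C = \frac{MSC}{MSE} \qquad F_I = \frac{MSI}{MSE}$$
where:
n = number of observations per cell
C = number of column treatments
R = number of row treatments
i = row treatment level
j = column treatment level
k = cell member
$X_{ijk}$ = individual observation
$\bar{X}_{ij}$ = cell mean
$\bar{X}_i$ = row mean
$\bar{X}_j$ = column mean
$\bar{X}$ = grand mean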
3 or more
__________________________________________________
4-5
Age of
6-7
Child
(years)
8-9
10-12
____________________________________________________
If the two variables deviate in the same direction, correlation is said to
be positive.
I. Scatter diagram method.
II. Karl Pearson's coefficient of correlation.
III. Spearman's rank correlation coefficient.
If all the points lie on a straight line rising from the lower left-hand corner to the
upper right-hand corner, correlation is said to be perfectly positive.
If the points lie on a straight line falling from the upper left-hand corner to the
lower right-hand corner, correlation is said to be perfectly negative.
If the plotted points fall in a narrow band, there is a high degree of
correlation between the variables.
If the points are widely scattered in the diagram, it indicates a very low
degree of relationship.
Capitalemployed1
23456789
1112
(CroresinRs.)
Profit(LakhsinRs.)35479810111214
(a) Make a scatter diagram.
(b) Do you think that there is any correlation between profits and capital
employed? Is it positive? Is it high or low?
Karl Pearson Coefficient of Correlation.
The correlation coefficient between two random variables
X and Y, usually denoted by r(X, Y) or $\rho$, is a numerical
measure of the linear relationship between them and is defined
as
$$r(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \, \sum (Y - \bar{Y})^2}}$$
$$-1 \le r(X, Y) \le 1$$
Monthly Income   Net Savings
780              84
360              51
980              91
250              60
750              68
820              62
900              86
620              58
650              53
390              47
Here X = monthly income, Y = net savings, x = X - 650 and y = Y - 66 (deviations from the means).
No      X      Y     x      y     x²       y²     xy
1       780    84    130    18    16900    324    2340
2       360    51    -290   -15   84100    225    4350
3       980    91    330    25    108900   625    8250
4       250    60    -400   -6    160000   36     2400
5       750    68    100    2     10000    4      200
6       820    62    170    -4    28900    16     -680
7       900    86    250    20    62500    400    5000
8       620    58    -30    -8    900      64     240
9       650    53    0      -13   0        169    0
10      390    47    -260   -19   67600    361    4940
Total   6500   660                539800   2224   27040
$$r = \frac{\sum xy}{\sqrt{\sum x^2 \, \sum y^2}} = \frac{27040}{\sqrt{(539800)(2224)}} = 0.78 \text{ (approx)}$$
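The correlation computed above can be verified with a short script. This is a plain-Python sketch of the deviation formula applied to the income and savings data; the variable names are illustrative.

```python
import math

income = [780, 360, 980, 250, 750, 820, 900, 620, 650, 390]
savings = [84, 51, 91, 60, 68, 62, 86, 58, 53, 47]

n = len(income)
mean_x = sum(income) / n    # 650
mean_y = sum(savings) / n   # 66

# Sums of products and squares of deviations from the means
sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(income, savings))
sum_x2 = sum((x - mean_x) ** 2 for x in income)
sum_y2 = sum((y - mean_y) ** 2 for y in savings)

r = sum_xy / math.sqrt(sum_x2 * sum_y2)
print(sum_xy, sum_x2, sum_y2, round(r, 2))
```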
$$R = 1 - \frac{6 \sum D^2}{N (N^2 - 1)}$$
Where R = rank coefficient of correlation
D = difference of ranks between paired items in the two series
N = number of data points
Detergent
Geeta
Rita
10
10
In order to find out how far the preferences for different kinds
of detergents go together, we will calculate the rank
correlation coefficient.
Detergent
Rank
Geeta
10
10
N=10
by
RankbyRita
6 D2
66
R1 1 3
1
0.960
990
N N
Thus the preferences of these two ladies agree
very closely as far as their opinion on detergents is
concerned.
$$R = 1 - \frac{6\left[\sum D^2 + \dfrac{1}{12}(m_1^3 - m_1) + \dfrac{1}{12}(m_2^3 - m_2) + \dots\right]}{N^3 - N}$$
Where $m_1$ is the number of items whose rank is the same in
group 1, and so on.
Applicant   Marks in Accountancy   Marks in Statistics
1           15                     40
2           20                     30
3           28                     50
4           12                     30
5           40                     20
6           60                     10
7           20                     30
8           80                     60
Applicant   Marks in Accountancy   Rank assigned   Marks in Statistics   Rank assigned   D²
1           15                     2               40                    6               16.00
2           20                     3.5             30                    4               0.25
3           28                     5               50                    7               4.00
4           12                     1               30                    4               9.00
5           40                     6               20                    2               16.00
6           60                     7               10                    1               36.00
7           20                     3.5             30                    4               0.25
8           80                     8               60                    8               0.00
N = 8                                                                                    ΣD² = 81.5
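With $m_1 = 2$ and $m_2 = 3$:
$$R = 1 - \frac{6\left[\sum D^2 + \dfrac{1}{12}(m_1^3 - m_1) + \dfrac{1}{12}(m_2^3 - m_2)\right]}{N^3 - N} = 1 - \frac{6\left[81.5 + \dfrac{1}{12}(2^3 - 2) + \dfrac{1}{12}(3^3 - 3)\right]}{8^3 - 8}$$
$$\text{Thus } R = 1 - \frac{6 \times 84}{504} = 0$$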
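The tied-rank calculation for the eight applicants can be reproduced with the sketch below. The helper names `average_ranks` and `tie_sizes` are hypothetical; ranks are assigned in ascending order, with tied marks receiving the mean of the ranks they occupy.

```python
def average_ranks(values):
    """Rank values in ascending order; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def tie_sizes(values):
    """Sizes m of each group of tied values (groups of one are ignored)."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return [m for m in counts.values() if m > 1]

accountancy = [15, 20, 28, 12, 40, 60, 20, 80]
statistics_marks = [40, 30, 50, 30, 20, 10, 30, 60]

rank_a = average_ranks(accountancy)
rank_s = average_ranks(statistics_marks)
sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_s))

# Correction term (m^3 - m) / 12 for every group of tied ranks
correction = sum((m**3 - m) / 12 for m in tie_sizes(accountancy) + tie_sizes(statistics_marks))

N = len(accountancy)
R = 1 - 6 * (sum_d2 + correction) / (N**3 - N)
print(sum_d2, correction, R)
```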
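Regression Analysis.
Bivariate (two variables) linear regression: the most elementary regression model.
X is the independent variable (the predictor or explanatory variable) and Y is the dependent variable.
Deterministic Regression Model
$$Y = \beta_0 + \beta_1 X$$
$\beta_0$ and $\beta_1$ are population parameters.
$\beta_0$ and $\beta_1$ are estimated by the sample statistics $b_0$ and $b_1$.
Equation of the Sample Regression Line
$$\hat{Y} = b_0 + b_1 X$$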
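$$b_1 = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2}$$
$$b_0 = \bar{Y} - b_1 \bar{X} = \frac{\sum Y}{n} - b_1 \frac{\sum X}{n}$$
$$SS_{xy} = \sum (X - \bar{X})(Y - \bar{Y}) = \sum XY - \frac{\sum X \sum Y}{n}$$
$$SS_{xx} = \sum (X - \bar{X})^2 = \sum X^2 - \frac{(\sum X)^2}{n}$$
$$b_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{\sum XY - \dfrac{\sum X \sum Y}{n}}{\sum X^2 - \dfrac{(\sum X)^2}{n}}$$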
Advertisement   Sales
12.5            148
3.7             55
21.6            338
60.0            994
37.6            541
6.1             89
16.8            126
41.2            379
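The slope and intercept formulas can be applied to the advertisement and sales data as in this sketch. It assumes advertisement is the predictor X and sales is the response Y, which the table itself does not state.

```python
advertising = [12.5, 3.7, 21.6, 60.0, 37.6, 6.1, 16.8, 41.2]  # assumed X
sales = [148, 55, 338, 994, 541, 89, 126, 379]                # assumed Y

n = len(advertising)
sum_x = sum(advertising)
sum_y = sum(sales)
sum_xy = sum(x * y for x, y in zip(advertising, sales))
sum_x2 = sum(x * x for x in advertising)

# SSxy and SSxx from the computational formulas
ss_xy = sum_xy - sum_x * sum_y / n
ss_xx = sum_x2 - sum_x**2 / n

b1 = ss_xy / ss_xx                    # slope
b0 = sum_y / n - b1 * sum_x / n       # intercept
print(round(b0, 2), round(b1, 2))
```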
Standard Error of Estimate
Residuals: Each difference between the actual y values and
the predicted y values, $y - \hat{y}$, is the error of the regression line at a
given point, and is referred to as a residual.
Standard Error of Estimate: To examine the error of the model
we need to calculate the standard error of estimate, which
provides a single measurement of the regression error.
SSE = sum of the squares of the error:
$$SSE = \sum (y - \hat{y})^2$$
Standard error of estimate:
$$S_e = \sqrt{\frac{SSE}{n - 2}}$$
Shortcut formula:
$$SSE = \sum Y^2 - b_0 \sum Y - b_1 \sum XY$$
Company                 Annual sales ($ billions)   Annual volume (millions of shares)
Merck                   10.5                        728.6
Philip Morris           48.1                        497.9
IBM                     64.8                        439.1
Eastman Kodak           20.1                        377.9
Bristol-Myers Squibb    11.4                        375.5
General Motors          123.8                       363.8
Ford Motors             89.0                        276.3
Coefficient of Determination.
The error sum of squares SSE can be interpreted as a measure of
how much variation in y is left unexplained by the model, that is,
how much cannot be attributed to a linear relationship.
A quantitative measure of the total amount of variation in the observed
y values is given by the total sum of squares, SST.
The ratio SSE/SST is the proportion of the total variation that
cannot be explained by the sample linear regression model, and
1 - SSE/SST (a number between 0 and 1) is the proportion of the
observed y variation explained by the model.
The Coefficient of Determination:
$$r^2 = 1 - \frac{SSE}{SST}$$
The higher the value of $r^2$, the more successful the simple
linear regression model is in explaining y variation.
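The standard error of estimate and the coefficient of determination can be computed together, as in the sketch below. It reuses the advertisement and sales figures from earlier in these notes, again assuming advertisement is X and sales is Y, and also checks the shortcut formula for SSE against the direct sum of squared residuals.

```python
import math

advertising = [12.5, 3.7, 21.6, 60.0, 37.6, 6.1, 16.8, 41.2]  # assumed X
sales = [148, 55, 338, 994, 541, 89, 126, 379]                # assumed Y
n = len(advertising)

x_bar = sum(advertising) / n
y_bar = sum(sales) / n
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(advertising, sales))
ss_xx = sum((x - x_bar) ** 2 for x in advertising)
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar

# SSE from the residuals y - y_hat, and SST about the mean of y
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(advertising, sales))
sst = sum((y - y_bar) ** 2 for y in sales)

# Shortcut: SSE = sum(Y^2) - b0 * sum(Y) - b1 * sum(XY)
sse_shortcut = (
    sum(y * y for y in sales)
    - b0 * sum(sales)
    - b1 * sum(x * y for x, y in zip(advertising, sales))
)

s_e = math.sqrt(sse / (n - 2))   # standard error of estimate
r_squared = 1 - sse / sst        # coefficient of determination
print(round(s_e, 2), round(r_squared, 3))
```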
          Median Household Income ($1000)
116.8     37.415
91.5      36.770
68.5      35.501
61.6      35.047
65.9      34.700
90.6      34.942
100.0     35.887
104.6     36.306
125.4     37.005
Prediction Interval to Estimate Y for a given value of X
$$\hat{Y} \pm t_{\alpha/2,\, n-2}\; S_e \sqrt{1 + \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{SS_{xx}}}$$
13.39. A specialist in hospital administration stated that the number
of FTEs (full-time employees) in a hospital can be estimated by
counting the number of beds in the hospital (a common measure of
hospital size). A health care business researcher decided to
develop a regression model in an attempt to predict the number of
FTEs of a hospital by the number of beds. She surveyed 12
hospitals and obtained the following data. The data are presented
in sequence, according to the number of beds.
Number of beds   FTEs   Number of beds   FTEs
23               69     50               138
29               95     54               178
29               102    64               156
35               119    66               184
42               126    76               176
46               125    78               225
Construct a 90% interval for a single value of y. Use x = 100.
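A sketch of the requested 90% prediction interval follows, using the prediction interval formula with the hospital data. The t value is an assumption read from a standard t table (alpha/2 = 0.05, df = n - 2 = 10); note that x = 100 lies outside the range of beds in the sample.

```python
import math

beds = [23, 29, 29, 35, 42, 46, 50, 54, 64, 66, 76, 78]
ftes = [69, 95, 102, 119, 126, 125, 138, 178, 156, 184, 176, 225]
n = len(beds)

x_bar = sum(beds) / n
y_bar = sum(ftes) / n
ss_xx = sum((x - x_bar) ** 2 for x in beds)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(beds, ftes))
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar

# Standard error of estimate from the residuals
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(beds, ftes))
s_e = math.sqrt(sse / (n - 2))

x0 = 100
y_hat = b0 + b1 * x0

# Assumption: t table value for alpha/2 = 0.05 with df = 10
t_crit = 1.812
margin = t_crit * s_e * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / ss_xx)
lower, upper = y_hat - margin, y_hat + margin
print(round(y_hat, 2), round(lower, 2), round(upper, 2))
```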