QT II (Hy I) & (Hy II)

Hypothesis Testing
Steps of Hypothesis Testing

Step 1. Establish the null and alternative hypotheses.
Step 2. Determine the appropriate statistical test.
Step 3. Set the value of alpha, the type I error rate.
Step 4. Establish the decision rule.
Step 5. Gather sample data.
Step 6. Analyze the data.
Step 7. Reach a statistical conclusion.
Step 8. Make a business decision.
Null and the Alternative Hypothesis.

The null hypothesis asserts that there is no true difference between the sample statistic and the
population parameter under consideration, and that any difference found is accidental, arising out
of sampling fluctuation.
A hypothesis which states that there is no difference between the assumed and actual value of
the parameter is called the null hypothesis, and the hypothesis that differs from the null
hypothesis is the alternative hypothesis. The null hypothesis is denoted by $H_0$ and the alternative
hypothesis by $H_1$.
Example: An auditor wishes to test the assumption that the mean value of all accounts
receivable in a given firm is $260.00 by taking a sample of n=36 and computing the sample
mean. He wishes to reject the assumed value of $260.00 only if it is clearly contradicted by
the sample mean, and thus the hypothesized value should be given the benefit of doubt in
the testing procedure. The null and the alternative hypothesis of the procedure are
$H_0: \mu = \$260.00$ and $H_1: \mu \neq \$260.00$
Type I and Type II error

Statistical decisions generally entail some risk of error. We define two
types of error.

Error     Meaning                                          Risk symbol
Type I    Rejecting the null hypothesis when it is true    $\alpha$
Type II   Accepting the null hypothesis when it is false   $\beta$

                     Null hypothesis is either
Decision is either   True            False
to Reject            Type I error    Correct
to Accept            Correct         Type II error
Null and Alternative Hypotheses : Example

A soft drink company is filling 12 oz. cans with cola.
The company hopes that the cans are averaging 12 ounces.
$H_0: \mu = 12$ oz
$H_a: \mu \neq 12$ oz
Rejection and non rejection region

Statistical outcomes that result in the rejection of the null hypothesis lie in what is
termed the rejection region. Statistical outcomes that fail to result in the rejection
of the null hypothesis lie in what is termed the non-rejection region.
[Figure: sampling distribution centered at $\mu = 40$ ounces, with a rejection region beyond the critical value in each tail and the non-rejection region between the two critical values.]
One tail and two tail test
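$H_0: \mu = 40$, $H_a: \mu < 40$
[Figure: one-tailed test, distribution centered at $\mu = 40$ ounces, rejection region in the lower tail beyond the critical value.]
$H_0: \mu = 40$, $H_a: \mu > 40$
[Figure: one-tailed test, distribution centered at $\mu = 40$ ounces, rejection region in the upper tail beyond the critical value.]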
Two tail test.
$H_0: \mu = 12$, $H_a: \mu \neq 12$
[Figure: two-tailed test, distribution centered at $\mu = 12$ oz, with a rejection region beyond the critical value in each tail.]
Testing of Hypothesis about a population mean using the z-Statistic

Formulas for testing hypothesis
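1. The formula below can be used to test a hypothesis about a single population mean if
the sample size is large ($n \geq 30$) for any population, and for small samples ($n < 30$) if $x$ is
known to be normally distributed:
$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
2. When the population standard deviation is unknown:
$z = \frac{\bar{x} - \mu}{s / \sqrt{n}}$
3. Testing the mean with a finite population:
$z = \frac{\bar{x} - \mu}{\frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}}$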
P-Value Method to Test Hypothesis

-- No preset value of $\alpha$ is given in the p-value method.
-- The probability of getting a test statistic at least as extreme as the observed test
statistic (computed from the data) is computed under the assumption that the null
hypothesis is true.
-- The p-value is the smallest value of $\alpha$ for which the null hypothesis can be rejected.
Procedure for two tailed test

Split $\alpha$ in half to determine the critical value of the test statistic.
Compute P, the probability of getting a test statistic at least as extreme as the observed value.
If $P > \alpha / 2$, do not reject $H_0$.
Example: If the p-value of a test is .038, the null hypothesis cannot be rejected at
$\alpha = 0.01$ because .038 is the smallest value of $\alpha$ for which the null hypothesis can
be rejected. However, the null hypothesis can be rejected for $\alpha = 0.05$.
Exercise 9.5 (Page 306)

According to the U.S. Bureau of Labor Statistics, the average weekly earnings of
a production worker in 1997 were $424.20. Suppose a labor researcher wants to
test to determine whether this figure is still accurate today. The researcher
randomly selects 54 production workers from across the United States and
obtains a representative earning statement for one week from each. The resulting
sample average is $432.69, with a standard deviation of $33.90. Use this data
and hypothesis testing techniques along with a 5% level of significance to
determine whether the mean weekly earnings of a production worker have
changed. Also solve the problem by using the p-value method.
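The arithmetic for Exercise 9.5 can be sketched in a few lines of Python. This is only an illustration under the large-sample z formula with $s$ in place of $\sigma$; the helper name `z_test_mean` is made up for this sketch, and the two-tailed p-value is taken as twice the upper-tail area.

```python
from math import sqrt
from statistics import NormalDist

def z_test_mean(xbar, mu0, s, n):
    """Return (z, two-tailed p-value) for a large-sample test of one mean."""
    z = (xbar - mu0) / (s / sqrt(n))
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_tailed

# Figures from Exercise 9.5: H0: mu = 424.20, Ha: mu != 424.20
z, p = z_test_mean(xbar=432.69, mu0=424.20, s=33.90, n=54)

alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value, about 1.96
reject = abs(z) > z_crit

print(f"z = {z:.2f}, critical = +/-{z_crit:.2f}, p-value = {p:.4f}, reject H0: {reject}")
```

Both routes lead to the same decision here: the observed z lies inside the critical values, and the p-value exceeds $\alpha$, so the null hypothesis is not rejected.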
Testing Hypothesis about a Population Mean Using the t-statistic

The formula for testing such hypotheses is
$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$
$df = n - 1$
Example 9.17
Suppose that in past years the average price per square foot for warehouses in
the United States has been $32.28. A national real estate investor wants to determine
whether the figure has changed now. The investor hires a researcher who randomly
samples 19 warehouses that are for sale across the United States and finds that the
mean price per square foot is $31.67, with a standard deviation of $1.29. If the
researcher uses a 5% level of significance, what statistical conclusion can be
reached? What are the hypotheses?
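A short Python sketch of the computation for Example 9.17 follows. It assumes a two-tailed test ($H_0: \mu = 32.28$, $H_a: \mu \neq 32.28$) and takes the critical value 2.101 from a standard t table for 18 degrees of freedom, since the Python standard library has no t distribution.

```python
from math import sqrt

# Figures from Example 9.17
xbar, mu0, s, n = 31.67, 32.28, 1.29, 19

t = (xbar - mu0) / (s / sqrt(n))
df = n - 1

# Table value t_{0.025, 18} for a two-tailed test at alpha = 0.05 (assumed from a t table)
t_crit = 2.101
reject = abs(t) > t_crit

print(f"t = {t:.2f}, df = {df}, critical = +/-{t_crit}, reject H0: {reject}")
```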
Ex. 9.16 Suppose a study reports that the average price for a gallon of
self-serve regular unleaded gasoline is $1.16. You believe that the
figure is higher in your area of the country. You decide to test this
claim for your part of the United States by randomly calling gasoline
stations. Your random survey of 25 stations produces the following
prices.
$1.27   $1.29   $1.16   $1.20   $1.37
 1.20    1.23    1.19    1.20    1.24
 1.16    1.07    1.27    1.09    1.35
 1.15    1.23    1.14    1.05    1.35
 1.21    1.14    1.14    1.07    1.10
Assume gasoline prices for a region are normally distributed. Do the
data you obtained provide enough evidence to reject the claim? Use a
1% level of significance.
Testing Hypothesis about a Proportion

$z = \frac{\hat{p} - p}{\sqrt{\frac{p \cdot q}{n}}}$
where $\hat{p}$ = sample proportion,
$p$ = population proportion, $q = 1 - p$
Example 9.22. The Independent Insurance Agents of America conducted a survey
of insurance consumers and discovered that 48% of them always reread their
insurance policies, 29% sometimes do, 16% rarely do, and 7% never do. Suppose a
large insurance company invests considerable time and money in rewriting policies
so that they will be more attractive and easy to read and understand. After using the
new policies for a year, company managers want to determine whether rewriting
the policies significantly changed the proportion of policy holders who always
reread their insurance policy. They contact 380 of the company's insurance
consumers who purchased a policy in the past year and ask them whether they
always reread their insurance policies. One hundred and sixty-four respond that
they do. Use a 1% level of significance to test the hypothesis.
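The proportion test for Example 9.22 can be sketched as below, assuming a two-tailed test of $H_0: p = 0.48$ against $H_a: p \neq 0.48$. The variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

# Figures from Example 9.22
p0 = 0.48          # hypothesized population proportion
n = 380            # sample size
x = 164            # respondents who always reread their policies

p_hat = x / n
q0 = 1 - p0
z = (p_hat - p0) / sqrt(p0 * q0 / n)

alpha = 0.01
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value, about 2.576
reject = abs(z) > z_crit

print(f"p_hat = {p_hat:.4f}, z = {z:.2f}, critical = +/-{z_crit:.3f}, reject H0: {reject}")
```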
Solving for Type II Errors (Determining the Probability of Committing a Type II Error)

-- A type II error can be committed only when the researcher fails to reject the
null hypothesis and the null hypothesis is false.
-- The probability of a type II error, $\beta$, varies with the possible values of the alternative parameter.
Procedure for computing the type-II error

Calculate the critical value for the sample mean:
$\bar{x}_c = \mu - z_c \frac{s}{\sqrt{n}} = 12 - (1.645) \frac{0.10}{\sqrt{60}} = 11.979$
If $\bar{x} < 11.979$, reject $H_0$.
If $\bar{x} \geq 11.979$, accept $H_0$.
The figure on the next slide gives the distribution of values when the null
hypothesis is true (top) and when the alternative mean of 11.99 ounces is
true (bottom).
How often will the business researcher fail to reject the top
distribution as true when, in reality, the bottom distribution is
true?
A soft drink company is filling 12 oz. cans with cola.

The company hopes that the cans are averaging 12 ounces.
$H_0: \mu = 12$ oz
$H_a: \mu < 12$ oz
Suppose a sample of 60 cans of beverage yields a sample mean of
11.985 ounces with a standard deviation of 0.10 ounces. Use $\alpha = 0.05$ and a
one-tailed test.
$z = \frac{11.985 - 12.00}{0.10 / \sqrt{60}} = -1.16$, and the critical value is $-z_{0.05} = -1.645$.
Decision: Since $-1.16 > -1.645$, do not reject the null hypothesis.

Possible outcomes: either a correct decision or a type II error.
If $\mu$ actually equals 11.99 ounces, what is the probability of
failing to reject $\mu = 12$ ounces when 11.979 ounces is the critical
value?
$z_1 = \frac{\bar{x}_c - \mu_1}{s / \sqrt{n}} = \frac{11.979 - 11.99}{0.10 / \sqrt{60}} = -0.85$
The value of z yields an area of .3023. The probability of
committing a type II error is the area to the right of $\bar{x}_c = 11.979$ in
the lower distribution, or 0.3023 + 0.5 = 0.8023. This is the probability of
failing to reject the null hypothesis $\mu = 12$ when $\mu = 11.99$ is correct.
Hence there is an 80.23% chance of committing a type II error if
the alternative mean is 11.99 ounces.
$\beta$ Values and Power Values for the Soft-Drink Example

Power: Power is the probability of rejecting the null hypothesis
when it is false; it represents the correct decision of selecting
the alternative hypothesis when it is true. Power is equal to $1 - \beta$.
Values of $\beta$ and power for various values of the alternative mean:

Alternative mean   $\beta$   Power
11.999             .94       .06
11.995             .89       .11
11.990             .80       .20
11.980             .53       .47
11.970             .24       .76
11.960             .07       .93
11.950             .01       .99
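The $\beta$ and power values for the soft-drink example can be reproduced with a short Python sketch. It assumes the lower-tailed test described above ($\mu_0 = 12$, $s = 0.10$, $n = 60$, $z_c = 1.645$) and rounds the critical sample mean to three decimals, as the slides do; the function name `beta_lower_tail` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def beta_lower_tail(mu0, mu1, s, n, z_c):
    """Probability of failing to reject H0 in a lower-tailed test when the true mean is mu1."""
    se = s / sqrt(n)
    x_crit = round(mu0 - z_c * se, 3)   # critical sample mean, rounded as in the slides
    # Fail to reject when the sample mean falls at or above the critical value.
    return 1 - NormalDist(mu=mu1, sigma=se).cdf(x_crit)

for mu1 in (11.999, 11.995, 11.990, 11.980, 11.970, 11.960, 11.950):
    beta = beta_lower_tail(12.0, mu1, 0.10, 60, 1.645)
    print(f"{mu1:.3f}  beta = {beta:.2f}  power = {1 - beta:.2f}")
```

Because the sketch uses the normal distribution directly instead of a z table rounded to two decimals, the third decimal place can differ slightly from hand calculations.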
Operating characteristic (OC) curve: a plot of the $\beta$ values against the
various values of the alternative hypothesis.
Power curve: a plot of the power values against the
various values of the alternative hypothesis.

The New York Stock Exchange recently reported that the average age
of a female shareholder is 44 years. A broker in Chicago wants to
know whether this figure is accurate for the female shareholders in
Chicago. The broker secures a master list of shareholders in Chicago
and takes a random sample of 58 women. Suppose the average age of
the shareholders in the sample is 45.1 years, with a standard deviation
of 8.7 years. Test to determine whether the broker's sample data differ
significantly enough from the 44-year figure released by the New York
Stock Exchange to declare that the Chicago female shareholders are
different in age from female shareholders in general. Use alpha = 0.05. If
no significant difference is noted, what is the broker's probability of
committing a type II error if the average age of a female shareholder is
actually 45 years? 46 years? 47 years? 48 years? Construct an OC
curve for the data. Construct a power curve for the data.
A small business has 37 employees. Because of the uncertain demand
for its product, the company usually pays overtime in any given
week. The company assumed that about 50 total hours of overtime
per week is required and that the variance on this figure is about
25. Company officials want to know whether the variance of
overtime hours has changed. Given here is a sample of 16 weeks
of overtime data in hours per week. Assume hours of overtime are
normally distributed. Use these data to test the null hypothesis
that the variance of overtime data is 25. Let alpha = 0.10.
57   56   52   44
58   53   44   44
48   51   56   48
63   53   51   50
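A possible way to work the overtime exercise in Python is sketched below. It assumes the usual chi-square statistic for a single variance, $\chi^2 = (n - 1) s^2 / \sigma^2$, which these slides do not show explicitly, and it takes the two critical values for 15 degrees of freedom from a standard chi-square table.

```python
from statistics import variance

# Overtime hours per week for the 16 sampled weeks
hours = [57, 56, 52, 44, 58, 53, 44, 44, 48, 51, 56, 48, 63, 53, 51, 50]

sigma2_0 = 25                      # hypothesized population variance
n = len(hours)
s2 = variance(hours)               # sample variance (divides by n - 1)
chi2 = (n - 1) * s2 / sigma2_0     # assumed test statistic for one variance

# Table values for df = 15 and a two-tailed test at alpha = 0.10 (assumed from a chi-square table)
chi2_lower, chi2_upper = 7.26094, 24.9958
reject = chi2 < chi2_lower or chi2 > chi2_upper

print(f"s^2 = {s2:.2f}, chi-square = {chi2:.2f}, reject H0: {reject}")
```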
9.30 A manufacturing company produces bearings. One line of
bearings is specified to be 1.64 centimeters (cm) in diameter. A major
customer requires that the variance of the bearings be no more than .001
cm². The producer is required to test the bearings before they are
shipped, and so the diameters of 16 bearings are measured with a
precise instrument, resulting in the following values. Assume bearing
diameters are normally distributed. Use the data and $\alpha = .01$ to test
the data to determine whether the population of these bearings is to be
rejected because of too high a variance.
1.69   1.62   1.63   1.70
1.66   1.63   1.65   1.71
1.64   1.69   1.57   1.64
1.59   1.66   1.63   1.65
Statistical Inferences about Two Populations

In some research designs, the sampling plan calls for selecting
two independent samples.
The object might be to determine whether the two samples
come from the same population or, if they come from
different populations, to determine the amount of difference
in the populations.
Examples: whether the effectiveness of two brands of
toothpaste differs, or whether two brands of tires wear
differently.
A business analyst might want to compare the expenditures
on shoes made in 1992 with those from 2002 in an effort to
determine whether any change occurred over time.
Hypothesis Testing and Confidence Intervals About the Difference in Two Means Using the Z Statistic

The Central Limit Theorem states that the difference in
two sample means, $\bar{x}_1 - \bar{x}_2$, is approximately normally distributed for large
sample sizes (both $n_1$ and $n_2 \geq 30$) regardless of the
shape of the populations.
It can be shown that
$\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2$
$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
Z formula for the difference in two sample means (independent samples, $n_1$ and $n_2 \geq 30$):
$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$
where $\mu_1$ = the mean of population 1
$\mu_2$ = the mean of population 2
$n_1$ = the size of sample 1
$n_2$ = the size of sample 2
If the populations are normally distributed on the measurement being studied
and if the population variances are known, the above formula can be used for small
sample sizes.
Formula when the population variances are unknown and the samples are large:
$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$
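Confidence Intervals
Confidence interval for the difference in two population means:
$(\bar{x}_1 - \bar{x}_2) - z \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \leq \mu_1 - \mu_2 \leq (\bar{x}_1 - \bar{x}_2) + z \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
When the population standard deviations are unknown and the samples are large,
the formula is
$(\bar{x}_1 - \bar{x}_2) - z \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \leq \mu_1 - \mu_2 \leq (\bar{x}_1 - \bar{x}_2) + z \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$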
Exercise 10.6
The Bureau of Labor Statistics shows that the average insurance
cost to a company per employee per hour is $1.84 for managers
and $1.99 for professional specialty workers. Suppose these
figures were obtained from 35 managers and 41 professional
specialty workers and their respective standard deviations are
$0.38 and $0.51. Calculate a 98% confidence interval to estimate the
difference in the mean hourly company expenditures for insurance
for these two groups. What is the value of the point estimate? Test
to determine whether there is a significant difference in the hourly
rates employers pay for insurance between managers and
professional specialty workers. Use a 2% level of significance.
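The interval and test asked for in Exercise 10.6 can be sketched in Python as follows. This is a minimal illustration of the large-sample formulas with sample standard deviations; managers are treated as sample 1 and professional specialty workers as sample 2, which is a labeling choice made here.

```python
from math import sqrt
from statistics import NormalDist

# Figures from Exercise 10.6 (sample 1 = managers, sample 2 = professional specialty workers)
x1, s1, n1 = 1.84, 0.38, 35
x2, s2, n2 = 1.99, 0.51, 41

point_estimate = x1 - x2
se = sqrt(s1**2 / n1 + s2**2 / n2)

# 98% confidence interval for mu1 - mu2
z_98 = NormalDist().inv_cdf(0.99)          # about 2.33
lower = point_estimate - z_98 * se
upper = point_estimate + z_98 * se

# Two-tailed test of H0: mu1 - mu2 = 0 at alpha = 0.02
z = point_estimate / se
reject = abs(z) > z_98

print(f"point estimate = {point_estimate:.2f}")
print(f"98% CI: {lower:.3f} to {upper:.3f}")
print(f"z = {z:.2f}, reject H0: {reject}")
```

The interval contains zero, which agrees with the test decision at the matching 2% level.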
Exercise 10.7 :
A company's auditor believes the per diem cost in Nashville,
Tennessee, rose significantly between 1992 and 1999. To test
this belief, the auditor samples 51 business trips from the
company's records for 1992; the sample average was $190 per
day, with a sample standard deviation of $18.50. The auditor
selects a second random sample of 47 business trips from the
company's records for 1999; the sample average was $198 per
day, with a standard deviation of $15.60. If he uses a risk of
committing a Type I error of 0.01, does the auditor find that
the per diem average expense in Nashville has gone up
significantly?
Hypothesis Testing and Confidence Intervals About the Difference in Two Means:
Small Independent Samples and Population Variances Unknown
Each of the two populations is normally distributed.
The two samples are independent.
At least one of the samples is small, $n < 30$.
The values of the population variances are unknown.
The variances of the two populations are equal, $\sigma_1^2 = \sigma_2^2$.
The t formula to test the difference in means assuming $\sigma_1^2 = \sigma_2^2$ is
$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2 (n_1 - 1) + s_2^2 (n_2 - 1)}{n_1 + n_2 - 2}} \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$
$df = n_1 + n_2 - 2$
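When the population variances are not assumed to be equal:

$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$
$df = \frac{\left( \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} \right)^2}{\frac{\left( s_1^2 / n_1 \right)^2}{n_1 - 1} + \frac{\left( s_2^2 / n_2 \right)^2}{n_2 - 1}}$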
Exercise 10.17 (page 360). Based on an indication that mean daily car
rental rates may be higher in Boston than in Dallas, a survey of eight
car rental companies in Boston is taken and the sample mean car
rental rate is $47, with a standard deviation of $3. Further, suppose a
survey of nine car rental companies in Dallas results in a sample
mean of $44 and a standard deviation of $3. Use alpha = 0.01 to test
to determine whether the average daily car rental rates in Boston are
significantly higher than those in Dallas. Assume car rental rates are
normally distributed and the population variances are equal.
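The pooled-variance t test for Exercise 10.17 can be sketched in Python as below. It assumes a one-tailed test ($H_a: \mu_1 > \mu_2$, Boston as sample 1) and takes the critical value 2.602 from a standard t table for 15 degrees of freedom, since the standard library has no t distribution.

```python
from math import sqrt

# Figures from Exercise 10.17 (sample 1 = Boston, sample 2 = Dallas)
x1, s1, n1 = 47, 3, 8
x2, s2, n2 = 44, 3, 9

df = n1 + n2 - 2
pooled_var = (s1**2 * (n1 - 1) + s2**2 * (n2 - 1)) / df
se = sqrt(pooled_var) * sqrt(1 / n1 + 1 / n2)

t = (x1 - x2) / se     # hypothesized difference mu1 - mu2 is 0

# Table value t_{0.01, 15} for a one-tailed test (assumed from a t table)
t_crit = 2.602
reject = t > t_crit

print(f"pooled variance = {pooled_var:.2f}, t = {t:.2f}, df = {df}, reject H0: {reject}")
```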
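Statistical Inferences for Two Related or Dependent Samples
Formula to test hypotheses for dependent populations:
$t = \frac{\bar{d} - D}{s_d / \sqrt{n}}$
$df = n - 1$
where
$n$ = number of pairs
$d$ = sample difference in pairs
$D$ = mean population difference
$s_d$ = standard deviation of sample differences
$\bar{d}$ = mean sample difference
Formulas for $\bar{d}$ and $s_d$:
$\bar{d} = \frac{\sum d}{n}$
$s_d = \sqrt{\frac{\sum (d - \bar{d})^2}{n - 1}} = \sqrt{\frac{\sum d^2 - \frac{(\sum d)^2}{n}}{n - 1}}$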

Eleven employees were put under the care of the company nurse because of
high cholesterol readings. The nurse lectured them on the dangers of this
condition and put them on the new diet. Shown are the cholesterol readings
of the 11 employees both before the new diet and one month after use of the
diet began. Construct a 98% confidence interval to estimate the population
mean difference of cholesterol readings for people who are involved in this
program. Assume differences in cholesterol readings are normally distributed
in the population.
Employee   Before   After
1          255      197
2          230      225
3          290      215
4          242      215
5          300      240
6          250      235
7          215      190
8          230      240
9          225      200
10         219      203
11         236      223
Use an alpha of 0.02 to test to determine whether there is a
significant difference in the cholesterol readings.
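A Python sketch of the paired-sample calculations for the cholesterol data follows. It assumes the readings pair up in the order listed (before, after) for employees 1 through 11, uses differences defined as before minus after, and takes the critical value 2.764 from a standard t table for 10 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

before = [255, 230, 290, 242, 300, 250, 215, 230, 225, 219, 236]
after = [197, 225, 215, 215, 240, 235, 190, 240, 200, 203, 223]

d = [b - a for b, a in zip(before, after)]   # differences, before minus after
n = len(d)
d_bar = mean(d)
s_d = stdev(d)                               # sample standard deviation (n - 1)
se = s_d / sqrt(n)

# Table value t_{0.01, 10}: used for the 98% interval and the two-tailed test at alpha = 0.02
t_crit = 2.764

lower = d_bar - t_crit * se
upper = d_bar + t_crit * se

t = (d_bar - 0) / se                         # H0: D = 0
reject = abs(t) > t_crit

print(f"d_bar = {d_bar:.2f}, s_d = {s_d:.2f}")
print(f"98% CI for D: {lower:.2f} to {upper:.2f}")
print(f"t = {t:.2f}, reject H0: {reject}")
```

The interval excludes zero, which is consistent with rejecting the hypothesis of no difference at the matching 2% level.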
Statistical Inferences about two population proportions, P1-P2

Applications
1. Comparing market share of a product for two different markets.
2. Comparing the proportions of defective products from one period to another.
3. Studying the difference in the proportion of female customers in two different regions.
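For large samples, that is, when
1. $n_1 \hat{p}_1 > 5$
2. $n_1 \hat{q}_1 > 5$
3. $n_2 \hat{p}_2 > 5$
4. $n_2 \hat{q}_2 > 5$
the difference in sample proportions, $\hat{p}_1 - \hat{p}_2$, is normally distributed with
$\mu_{\hat{p}_1 - \hat{p}_2} = P_1 - P_2$
$\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{P_1 Q_1}{n_1} + \frac{P_2 Q_2}{n_2}}$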
Z Formula for the Difference in Two Population Proportions
$z = \frac{(\hat{p}_1 - \hat{p}_2) - (P_1 - P_2)}{\sqrt{\frac{P_1 Q_1}{n_1} + \frac{P_2 Q_2}{n_2}}}$
$\hat{p}_1$ = proportion from sample 1
$\hat{p}_2$ = proportion from sample 2
$n_1$ = size of sample 1
$n_2$ = size of sample 2
$P_1$ = proportion from population 1
$P_2$ = proportion from population 2
$Q_1 = 1 - P_1$
$Q_2 = 1 - P_2$
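Z Formula to Test the Difference in Population Proportions
$z = \frac{(\hat{p}_1 - \hat{p}_2) - (P_1 - P_2)}{\sqrt{\bar{P} \cdot \bar{Q} \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}}$
$\bar{P} = \frac{X_1 + X_2}{n_1 + n_2} = \frac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2}$
$\bar{Q} = 1 - \bar{P}$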
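Confidence Interval to Estimate $P_1 - P_2$

$(\hat{p}_1 - \hat{p}_2) - z \sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}} \leq P_1 - P_2 \leq (\hat{p}_1 - \hat{p}_2) + z \sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}}$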
Exercise 10.35. Companies that recently developed new products were
asked to rate which activities are most difficult to accomplish with new
products. Options included such activities as assessing market potential,
market testing, finalizing the design, developing a business plan, and
the like. A researcher wants to conduct a similar study to compare the
results between two industries: the computer hardware industry and
the banking industry. He takes a random sample of 56 computer firms
and 89 banks. The researcher asks whether market testing is the most
difficult activity to accomplish in developing a new product. Some 48%
of the sampled computer companies and 56% of the sampled banks
respond that it is the most difficult activity. Use a level of significance of
0.20 to test whether there is a significant difference in the responses to
the question from these two industries.
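The test in Exercise 10.35 can be sketched in Python using the pooled-proportion z formula. Computer firms are treated as sample 1 and banks as sample 2, which is a labeling choice made for this illustration.

```python
from math import sqrt
from statistics import NormalDist

# Figures from Exercise 10.35 (sample 1 = computer firms, sample 2 = banks)
p1, n1 = 0.48, 56
p2, n2 = 0.56, 89

# Pooled proportion under H0: P1 - P2 = 0
p_bar = (n1 * p1 + n2 * p2) / (n1 + n2)
q_bar = 1 - p_bar

z = (p1 - p2) / sqrt(p_bar * q_bar * (1 / n1 + 1 / n2))

alpha = 0.20
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical value, about 1.28
reject = abs(z) > z_crit

print(f"pooled p = {p_bar:.4f}, z = {z:.2f}, critical = +/-{z_crit:.2f}, reject H0: {reject}")
```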
Exercise 10.36. A large production facility uses two machines to
produce a key part for its main product. Inspectors have expressed
concern about the quality of the finished product. Quality control
investigation has revealed that the key part made by the two machines
is defective at times. The inspectors randomly sampled 35 units of the
key part from each machine. Of those produced by machine A, five
were defective. Seven of the 35 sampled parts from machine B were
defective. The production manager is interested in estimating the
difference in proportions of the populations of parts that are defective
between machine A and machine B. From the sample information,
compute a 98% confidence interval for this difference.
10.41 Suppose the data shown here are the result of a survey to
investigate gasoline prices. Ten service stations were selected
randomly in each of the two cities and the figures represent the
prices of a gallon of unleaded regular gasoline on a given day. Use the
F test to determine whether there is a significant difference in the
variances of the prices of unleaded regular gasoline between the two
cities. Let $\alpha$ = 1.0. Assume gasoline prices are normally distributed.
City 1
City2
1.18
1.07
1.13
1.08
1.05
1.19
1.15
1.14
1.13
1.17
1.21
1.12
1.14
1.13
1.03
1.14
1.14
1.13
1.09
1.11
Exercise 10.42. How long are resale houses on the market?
One survey by the Houston Association of Realtors reported
that in Houston, resale houses are on the market an average of
112 days. Of course, the length of time varies by market.
Suppose random samples of 13 houses in Houston and 11
houses in Chicago that are for resale are traced. The data
shown here represent the number of days each house was on
the market before being sold. Use the given data and a 1%
level of significance to determine whether the population
variances for the number of days until resale are different in
Houston than in Chicago. Assume the numbers of days
resale houses are on the market are normally distributed.
Houston
Chicago
132
126
118
56
138
94
85
69
131
161
113
67
127
133
81
54
99
119
94
137
126
88
93
134
$\chi^2$ Goodness-of-Fit Test

The $\chi^2$ goodness-of-fit test compares the expected (theoretical) frequencies
of categories from a population distribution to the observed (actual) frequencies
from a distribution to determine whether there is a difference between what was
expected and what was observed.
This test can be used to determine whether the observed arrivals at teller
windows at a bank are Poisson distributed, as might be expected. Also, in
the paper industry, manufacturers can use the chi-square goodness-of-fit
test to determine whether the demand for paper follows a uniform
distribution throughout the year.
$\chi^2$ Goodness-of-Fit Test
Formula used to compute the chi-square goodness-of-fit test:
$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$
$df = k - 1 - c$
where: $f_o$ = frequency of observed values
$f_e$ = frequency of expected values
$k$ = number of categories
$c$ = number of parameters estimated from the sample data
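The Chi-square Distribution

Let $\nu$ be a positive integer. Then a random variable X is said to have a chi-squared distribution with parameter $\nu$ if the pdf of X is
$f(x; \nu) = \begin{cases} \frac{1}{2^{\nu/2} \, \Gamma(\nu/2)} \, x^{(\nu/2) - 1} e^{-x/2}, & x \geq 0 \\ 0, & x < 0 \end{cases}$
The parameter $\nu$ is called the number of degrees of freedom of X.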
Properties of the Chi-square distribution

(1) The distribution is a continuous probability distribution.
(2) The exact shape of the distribution depends upon the number of degrees
of freedom $\nu$. For different values of $\nu$, we shall have different
shapes of the distribution. In general, when $\nu$ is small, the shape of the
curve is skewed to the right, and as $\nu$ gets larger, the distribution
becomes more and more symmetrical and can be approximated by the
normal distribution.
(3) The mean of the chi-square distribution is $\nu$ and the variance is $2\nu$.
(4) The sum of independent chi-square variates is also a chi-square variate.
(5) A chi-square variate is the sum of the squares of k independent
standard normal random variables and therefore can never be less than zero.
Exercise 12.7. The general manager of a major league baseball team
believes the ages of purchasers of game tickets are normally distributed.
The following data represent the distribution of ages for a sample of
observed purchasers of major league baseball game tickets. Use the chi-square
goodness-of-fit test to determine whether this distribution is
significantly different from the normal distribution. Assume $\alpha = 0.05$.

Age of purchaser   Frequency
10-under 20        16
20-under 30        44
30-under 40        61
40-under 50        56
50-under 60        35
60-under 70        19
Exercise 12.8. The Springfield Emergency Medical Service keeps
records of emergency telephone calls. A study of 150 five-minute time
intervals resulted in the distribution of number of calls as follows. For
example, during 18 of the five-minute intervals, no calls occurred. Use the
chi-square goodness-of-fit test and
$\alpha = 0.01$ to determine whether this distribution is Poisson.

Number of calls
(per 5-minute interval)   Frequency
0                         18
1                         28
2                         47
3                         21
4                         16
5                         11
6 or more                  9
Exercise 12.2 Use the following data and $\alpha = 0.01$ to
determine whether the observed frequencies represent a
uniform distribution.

Category   $f_o$
1          19
2          17
3          14
4          18
5          19
6          21
7          18
8          18
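The goodness-of-fit calculation for Exercise 12.2 can be sketched in Python as below. Under a uniform distribution every category has the same expected frequency, and no parameters are estimated from the data, so $c = 0$. The critical value 18.4753 for 7 degrees of freedom is taken from a standard chi-square table.

```python
# Observed frequencies for the eight categories in Exercise 12.2
observed = [19, 17, 14, 18, 19, 21, 18, 18]

k = len(observed)
expected = sum(observed) / k          # uniform: same expected frequency in every category

chi2 = sum((fo - expected) ** 2 / expected for fo in observed)
df = k - 1 - 0                        # c = 0, no parameters estimated from the sample

# Table value chi-square_{0.01, 7} (assumed from a chi-square table)
chi2_crit = 18.4753
reject = chi2 > chi2_crit

print(f"expected = {expected}, chi-square = {chi2:.2f}, df = {df}, reject H0: {reject}")
```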
Chi-square test of independence
The goodness-of-fit test cannot be used to analyze two variables
simultaneously.
The chi-square test of independence can be used to analyze
the frequencies of two variables with multiple categories to
determine whether the two variables are independent.
Example: A market researcher might want to determine whether
the type of soft drink preferred by a consumer is independent
of the consumer's age. Financial investors might want to
determine whether the type of preferred stock investment is
independent of the region where the investor resides.
Suppose a business researcher is interested in determining whether
geographic region is independent of type of financial investment. On
a questionnaire, the following two questions might be used to
measure geographic region and type of financial investment.
In which region of the country do you reside?
A. Northeast
B. Midwest
C. South
D. West
Which type of financial investment are you most likely to make today?
E. Stocks
F. Bonds
G. Treasury bills
Contingency Table

                    Type of financial investment
                    E        F        G
Geographic   A                        O13      nA
region       B                                 nB
             C                                 nC
             D                                 nD
                    nE       nF       nG       N
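If A and F are independent,

$P(A \cap F) = P(A) \cdot P(F)$
$P(A) = \frac{n_A}{N}$, $P(F) = \frac{n_F}{N}$
$P(A \cap F) = \frac{n_A}{N} \cdot \frac{n_F}{N}$
$e_{AF} = N \cdot P(A \cap F) = \frac{n_A \cdot n_F}{N}$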
Contingency Table

                    Type of financial investment
                    E        F        G
Geographic   A               e12               nA
region       B                                 nB
             C                                 nC
             D                                 nD
                    nE       nF       nG       N
$\chi^2$ Test of Independence: Formulas

Expected frequency:
$e_{ij} = \frac{n_i \cdot n_j}{N}$
where $i$ = the row
$j$ = the column
$n_i$ = the total of row i
$n_j$ = the total of column j
$N$ = the total of all frequencies
Calculated (observed) value of $\chi^2$:
$\chi^2 = \sum \sum \frac{(f_o - f_e)^2}{f_e}$
where
$df = (r - 1)(c - 1)$
$r$ = number of rows
$c$ = number of columns
Exercise: Suppose a business researcher wants to determine whether type of gasoline
preferred is independent of a person's income. She takes a random survey of gasoline
purchasers, asking them one question about gasoline preference and a second
question about income. The respondent is to check whether he or she prefers (1)
regular gasoline, (2) premium gasoline, or (3) extra premium gasoline. The
respondent is also to check his or her income bracket as being (1) less than $30,000, (2)
$30,000 to $49,999, (3) $50,000 to $99,999, or (4) more than $100,000. Using
alpha = 0.01, use the chi-square test of independence to determine whether type of
gasoline preferred is independent of income level.
Solution: Step 1. The hypotheses are
$H_0$: Type of gasoline is independent of income.
$H_a$: Type of gasoline is not independent of income.
Step 2. The appropriate statistical test is
$\chi^2 = \sum \sum \frac{(f_o - f_e)^2}{f_e}$
Step 3. Alpha is 0.01.
Step 4. Here there are four rows (r = 4) and three columns (c = 3). The degrees of
freedom are (4 - 1)(3 - 1) = 6.
$\chi^2_{.01, 6} = 16.812$
If $\chi^2_{Cal} > 16.812$, reject $H_0$.
If $\chi^2_{Cal} \leq 16.812$, do not reject $H_0$.
Step 5. Observed frequencies

                       Type of gasoline
Income                 Regular   Premium   Extra premium   Total
Less than $30,000      85        16        6               107
$30,000 to $49,999     102       27        13              142
$50,000 to $99,999     36        22        15              73
At least $100,000      15        23        25              63
Total                  238       88        59              385
Expected frequencies:
$e_{ij} = \frac{n_i \cdot n_j}{N}$
$e_{11} = \frac{107 \cdot 238}{385} = 66.15$
$e_{12} = \frac{107 \cdot 88}{385} = 24.46$
$e_{13} = \frac{107 \cdot 59}{385} = 16.40$
Observed frequencies, with expected frequencies in parentheses:

                       Type of gasoline
Income                 Regular        Premium       Extra premium   Total
Less than $30,000      85 (66.15)     16 (24.46)    6 (16.40)       107
$30,000 to $49,999     102 (87.78)    27 (32.46)    13 (21.76)      142
$50,000 to $99,999     36 (45.13)     22 (16.69)    15 (11.19)      73
At least $100,000      15 (38.95)     23 (14.40)    25 (9.65)       63
Total                  238            88            59              385
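$\chi^2$ Calculation
$\chi^2 = \sum \sum \frac{(f_o - f_e)^2}{f_e}$
$= \frac{(85 - 66.15)^2}{66.15} + \frac{(16 - 24.46)^2}{24.46} + \frac{(6 - 16.40)^2}{16.40}$
$+ \frac{(102 - 87.78)^2}{87.78} + \frac{(27 - 32.46)^2}{32.46} + \frac{(13 - 21.76)^2}{21.76}$
$+ \frac{(36 - 45.13)^2}{45.13} + \frac{(22 - 16.69)^2}{16.69} + \frac{(15 - 11.19)^2}{11.19}$
$+ \frac{(15 - 38.95)^2}{38.95} + \frac{(23 - 14.40)^2}{14.40} + \frac{(25 - 9.65)^2}{9.65}$
$= 70.78$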
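Conclusion:
[Figure: chi-square distribution with df = 6, showing the non-rejection region below the critical value 16.812 and the rejection region of area $\alpha = 0.01$ above it.]
$\chi^2_{Cal} = 70.78 > 16.812$, reject $H_0$.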
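The gasoline example can be checked with a short Python sketch that builds the expected frequencies from the row and column totals and sums the chi-square terms. The function name `chi_square_independence` is made up for this illustration, and the critical value 16.812 is the table value quoted in the solution. Because the sketch does not round the expected frequencies, its result differs slightly from the hand-computed 70.78.

```python
def chi_square_independence(table):
    """Return (chi-square, df, expected) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    # Expected frequency for each cell: (row total * column total) / N
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    chi2 = sum(
        (fo - fe) ** 2 / fe
        for obs_row, exp_row in zip(table, expected)
        for fo, fe in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, expected

# Rows: income brackets; columns: regular, premium, extra premium
observed = [
    [85, 16, 6],
    [102, 27, 13],
    [36, 22, 15],
    [15, 23, 25],
]

chi2, df, expected = chi_square_independence(observed)
chi2_crit = 16.812                     # chi-square_{.01, 6} from the solution
reject = chi2 > chi2_crit

print(f"e11 = {expected[0][0]:.2f}, chi-square = {chi2:.2f}, df = {df}, reject H0: {reject}")
```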
Exercise 12.13. Use the following contingency table and the chi-square test of
independence to determine whether social class is independent of number of
children in a family.

                        Social class
Number of children      Lower   Middle   Upper   Total
0                       7       18       6       31
1                       9       38       23      70
2 or 3                  34      97       58      189
More than 3             47      31       30      108
Total                   97      184      117     398
$H_0$: Social class is independent of number of children.
$H_a$: Social class is not independent of number of children.
$e_{11}$ = 7.56     $e_{31}$ = 46.06
$e_{12}$ = 14.33    $e_{32}$ = 87.38
$e_{13}$ = 9.11     $e_{33}$ = 55.56
$e_{21}$ = 17.06    $e_{41}$ = 26.32
$e_{22}$ = 32.36    $e_{42}$ = 49.93
$e_{23}$ = 20.58    $e_{43}$ = 31.75
Observed frequencies, with expected frequencies in parentheses:

                        Social class
Number of children      Lower        Middle       Upper        Total
0                       7 (7.56)     18 (14.33)   6 (9.11)     31
1                       9 (17.06)    38 (32.36)   23 (20.58)   70
2 or 3                  34 (46.06)   97 (87.38)   58 (55.56)   189
More than 3             47 (26.32)   31 (49.93)   30 (31.75)   108
Total                   97           184          117          398
Analysis of Variance
Experimental Design
A plan and a structure to test hypotheses in which the
researcher controls or manipulates one or more variables.
Independent Variable
A treatment variable is one that the experimenter controls
or modifies in the experiment.
A classification variable is a characteristic of the
experimental subjects that was present prior to the
experiment and is not a result of the experimenter's
manipulations or control.
Example: Suppose a manufacturing organization produces a
valve that is specified to have an opening of 6.37 centimeters.
Quality controllers within the company might decide to test to
determine how the openings for produced valves vary among
four different machines on three different shifts.
Here the independent variables are the type of machine and the work
shift. These are also classification variables, as they existed
prior to the study.
Levels or classifications are the subcategories of the
independent variable used by the researcher in the
experimental design.
In the valve experiment, four levels or classifications of
machines within the independent variable machine type are
used: Machine 1, Machine 2, Machine 3, and Machine 4.
Dependent Variable
The response to the different levels of the independent variables.
In the valve experiment, the dependent variable is the size of the
opening of the valve.
Analysis of Variance: experimental designs are analyzed
statistically by a group of techniques known as analysis of
variance.
Example: Suppose the measurements for the opening of 24 valves
randomly selected from an assembly line are given below.
6.26   6.19   6.33   6.26   6.50   6.19
6.44   6.22   6.54   6.23   6.29   6.40
6.23   6.29   6.58   6.27   6.38   6.58
6.31   6.34   6.21   6.19   6.36   6.56
Analysis of Variance: Assumptions

Observations are drawn from normally distributed populations.
Observations represent random samples from the populations.
Variances of the populations are equal.
One-way ANOVA: Computational Procedure
If k samples are being analyzed, the following hypotheses are
being tested in a one-way ANOVA.
$H_0: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$
$H_a$: At least one of the means is different from the others.
$F = \frac{MSC}{MSE}$
If $F > F_c$, reject $H_0$.
If $F \leq F_c$, do not reject $H_0$.
One-Way ANOVA: Sums of Squares Definitions

Total sum of squares = between sum of squares + error sum of squares
SST = SSC + SSE
$\sum_{j=1}^{C} \sum_{i=1}^{n_j} (x_{ij} - \bar{x})^2 = \sum_{j=1}^{C} n_j (\bar{x}_j - \bar{x})^2 + \sum_{j=1}^{C} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2$
where: $i$ = a particular member of a treatment level
$j$ = a treatment level
$C$ = number of treatment levels
$n_j$ = number of observations in a given treatment level
$\bar{x}$ = grand mean
$\bar{x}_j$ = mean of a treatment group or level
$x_{ij}$ = individual value
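One-Way ANOVA: Computational Formulas

$SSC = \sum_{j=1}^{C} n_j (\bar{x}_j - \bar{x})^2$, $df_C = C - 1$
$SSE = \sum_{j=1}^{C} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2$, $df_E = N - C$
$SST = \sum_{j=1}^{C} \sum_{i=1}^{n_j} (x_{ij} - \bar{x})^2$, $df_T = N - 1$
$MSC = \frac{SSC}{df_C}$
$MSE = \frac{SSE}{df_E}$
$F = \frac{MSC}{MSE}$
where: $i$ = a particular member of a treatment level
$j$ = a treatment level
$C$ = number of treatment levels
$n_j$ = number of observations in a given treatment level
$\bar{x}$ = grand mean
$\bar{x}_j$ = column mean
$x_{ij}$ = individual value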
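The computational formulas can be illustrated with a minimal Python sketch. The data below are hypothetical (three treatment levels with three observations each) and are not taken from any exercise in these slides; the function name `one_way_anova` is likewise made up for the example.

```python
def one_way_anova(groups):
    """Return (SSC, SSE, SST, F) for a list of treatment-level samples."""
    all_values = [x for g in groups for x in g]
    N, C = len(all_values), len(groups)
    grand_mean = sum(all_values) / N
    means = [sum(g) / len(g) for g in groups]

    # Between (treatment) sum of squares
    ssc = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Error (within) sum of squares
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    # Total sum of squares
    sst = sum((x - grand_mean) ** 2 for x in all_values)

    msc = ssc / (C - 1)
    mse = sse / (N - C)
    return ssc, sse, sst, msc / mse

# Hypothetical data: three treatment levels, three observations each
groups = [[6, 7, 8], [8, 9, 10], [10, 11, 12]]
ssc, sse, sst, f = one_way_anova(groups)

print(f"SSC = {ssc}, SSE = {sse}, SST = {sst}, F = {f}")
```

The sketch also makes the identity SST = SSC + SSE easy to verify numerically; the observed F would then be compared with a table value $F_c$ for $C - 1$ and $N - C$ degrees of freedom.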
The F-distribution is named in honour of R.A. Fisher, who first
studied it in 1924. This distribution is usually defined in terms of
the ratio of the variances of two normally distributed populations.
The quantity $\frac{s_1^2 / \sigma_1^2}{s_2^2 / \sigma_2^2}$ is distributed as an F distribution with $\nu_1 = n_1 - 1$ and
$\nu_2 = n_2 - 1$ degrees of freedom.
If $\sigma_1^2 = \sigma_2^2$, then the statistic $F = \frac{s_1^2}{s_2^2}$ follows the F distribution with
$\nu_1$ and $\nu_2$ degrees of freedom.
Properties of F-distribution
1) The F-distribution is positively skewed and its skewness
decreases with increase in $\nu_1$ and $\nu_2$.
2) The value of F must always be positive or zero, since variances
are squares and can never assume negative values. Its value will
always lie between 0 and infinity.
3) The shape of the F distribution depends upon the number of
degrees of freedom.
4) The mean and variance of the F-distribution are
Mean $= \frac{\nu_2}{\nu_2 - 2}$ for $\nu_2 > 2$
Variance $= \frac{2 \nu_2^2 (\nu_1 + \nu_2 - 2)}{\nu_1 (\nu_2 - 2)^2 (\nu_2 - 4)}$ for $\nu_2 > 4$
In an ANOVA situation, the F value is a ratio of the treatment
variance to the error variance.
Exercise 11.11. A milk company has four machines that fill gallon jugs with milk. The quality control manager is interested in determining whether the average fill for these machines is the same. The following data represent random samples of fill measures (in quarts) for 19 jugs of milk filled by the different machines. Use a one-way ANOVA to test the hypothesis.
Machine 1
Machine 2
Machine 3
Machine 4
4.05
3.99
3.97
4.00
4.01
1.02
3.98
4.02
4.02
4.01
3.97
3.97
4.04
3.99
3.95
4.01
4.00
4.00
4.00
Exercise 11.13. A management consulting company presents a three-day seminar on project management to various clients. The seminar is basically the same each time it is given. However, sometimes it is presented to high-level managers, sometimes to mid-level managers, and sometimes to low-level managers. The seminar facilitators believe evaluations of the seminar may vary with the audience. Suppose the following data are some randomly selected evaluation scores from different levels of managers who attended the seminar. The ratings are on a scale from 1 to 10, with 10 being the highest. Use a one-way ANOVA to determine whether there is a significant difference in the evaluations according to manager level. Assume alpha = 0.05. Discuss the business implications of your findings.
High Level
Mid Level
Low Level
____________________________________________________
7
10
10
8
_____________________________________________________
A Factorial Design (Two Way ANOVA)

Some experiments are designed so that two or more treatments
(independent variable) are explored simultaneously. Such
experimental designs are referred to as factorial designs.
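Factorial Design with Two Treatments
[Diagram: a grid with the levels of the column treatment across the top and the levels of the row treatment down the side; each intersection of a row level and a column level is a cell.]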
Here there are two independent variables (two treatments) and there is an intersection of each level of each treatment. These intersections are referred to as cells.
Example: The natural gas industry can design an experiment to study usage rates and how they are affected by temperature and precipitation. Theorizing that the outside temperature and type of precipitation make a difference in natural gas usage, industry researchers can gather usage measurements for a given community over a variety of temperature and precipitation conditions. At the same time they can make an effort to determine whether certain types of precipitation, combined with certain temperature levels, affect usage rates differently than other combinations of temperature and precipitation (interaction effect).
Statistically testing the Factorial Design

Analysis of variance is used to analyze data gathered from
factorial designs. For factorial design with two factors
(Independent variables) a two way ANOVA is used to test
hypothesis statistically.
Two-Way ANOVA: Hypotheses: The following hypotheses are tested in a two-way ANOVA.

Row Effects:
Ho: Row means are all equal.
Ha: At least one row mean is different from the others.
Column Effects:
Ho: Column means are all equal.
Ha: At least one column mean is different from the others.
Interaction Effects:
Ho: The interaction effects are zero.
Ha: There is an interaction effect.
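Formulas for computing two-way ANOVA

$$SSR = nC\sum_{i=1}^{R}(\bar{X}_i - \bar{X})^2 \qquad df_R = R - 1$$
$$SSC = nR\sum_{j=1}^{C}(\bar{X}_j - \bar{X})^2 \qquad df_C = C - 1$$
$$SSI = n\sum_{i=1}^{R}\sum_{j=1}^{C}(\bar{X}_{ij} - \bar{X}_i - \bar{X}_j + \bar{X})^2 \qquad df_I = (R - 1)(C - 1)$$
$$SSE = \sum_{i=1}^{R}\sum_{j=1}^{C}\sum_{k=1}^{n}(X_{ijk} - \bar{X}_{ij})^2 \qquad df_E = RC(n - 1)$$
$$SST = \sum_{i=1}^{R}\sum_{j=1}^{C}\sum_{k=1}^{n}(X_{ijk} - \bar{X})^2 \qquad df_T = N - 1$$
$$MSR = \frac{SSR}{R - 1} \qquad MSC = \frac{SSC}{C - 1} \qquad MSI = \frac{SSI}{(R - 1)(C - 1)} \qquad MSE = \frac{SSE}{RC(n - 1)}$$
$$F_R = \frac{MSR}{MSE} \qquad F_C = \frac{MSC}{MSE} \qquad F_I = \frac{MSI}{MSE}$$
where:
n = number of observations per cell
C = number of column treatments
R = number of row treatments
N = total number of observations
i = row treatment level
j = column treatment level
k = cell member
X_{ijk} = individual observation
\bar{X}_{ij} = cell mean
\bar{X}_i = row mean
\bar{X}_j = column mean
\bar{X} = grand mean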
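
A minimal Python sketch of these two-way formulas follows, assuming a balanced design (the same n in every cell). The function name `two_way_anova`, the nested-list layout `cells[i][j]`, and the data are hypothetical choices made for illustration.

```python
def two_way_anova(cells):
    """cells[i][j] is the list of n observations for row level i, column level j."""
    r = len(cells)            # R, number of row treatments
    c = len(cells[0])         # C, number of column treatments
    n = len(cells[0][0])      # n, observations per cell (balanced design assumed)

    grand = sum(x for row in cells for cell in row for x in cell) / (r * c * n)
    row_means = [sum(x for cell in row for x in cell) / (c * n) for row in cells]
    col_means = [sum(x for row in cells for x in row[j]) / (r * n) for j in range(c)]
    cell_means = [[sum(cell) / n for cell in row] for row in cells]

    ssr = n * c * sum((m - grand) ** 2 for m in row_means)
    ssc = n * r * sum((m - grand) ** 2 for m in col_means)
    ssi = n * sum((cell_means[i][j] - row_means[i] - col_means[j] + grand) ** 2
                  for i in range(r) for j in range(c))
    sse = sum((x - cell_means[i][j]) ** 2
              for i in range(r) for j in range(c) for x in cells[i][j])
    sst = sum((x - grand) ** 2 for row in cells for cell in row for x in cell)

    mse = sse / (r * c * (n - 1))
    return {"SSR": ssr, "SSC": ssc, "SSI": ssi, "SSE": sse, "SST": sst,
            "F_R": (ssr / (r - 1)) / mse,
            "F_C": (ssc / (c - 1)) / mse,
            "F_I": (ssi / ((r - 1) * (c - 1))) / mse}


# Hypothetical 2 x 2 design with two observations per cell.
anova = two_way_anova([[[1, 3], [5, 7]],
                       [[2, 4], [10, 12]]])
print(anova["F_R"], anova["F_C"], anova["F_I"])
```

Each F value is compared with the table value for its own numerator degrees of freedom and df_E denominator degrees of freedom.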
Exercise 11.42. Children are generally believed to have considerable influence over their parents in the purchase of certain items, particularly food and beverage items. To study this notion further, a study is conducted in which parents are asked to report how many food and beverage items purchased by the family per week are purchased because of the influence of their children. Because the age of the child may have an effect on the study, parents are asked to focus on one particular child in the family for the week, and to report the age of the child. Four age categories are selected for the children: 4-5 years, 6-7 years, 8-9 years, and 10-12 years. Also, because the number of children in the family might make a difference, three different sizes of families are chosen for the study: families with one child, families with two children, and families with three or more children. Suppose the following data represent the reported number of child-influenced buying incidents per week. Use the data to compute a two-way ANOVA. Let alpha = 0.05.
                          Number of Children in Family
Age of Child (years)      1          2          3 or more
____________________________________________________
4-5
6-7
8-9
10-12
____________________________________________________
Correlation and Regression

Correlation is a measure of the degree of relatedness of variables.
In a bivariate distribution we may be interested in finding out whether there is any correlation between the two variables under study. If a change in one variable affects the change in the other variable, the variables are said to be correlated.
If the two variables deviate in the same direction, correlation is said to be positive.
Examples: Height and weight of a group of persons. Income and expenditure.
If the variables constantly deviate in opposite directions, the correlation is said to be negative.
Examples: Price and demand of a commodity. Volume and pressure of a perfect gas.
Perfect correlation: Correlation is said to be perfect if the deviation in one variable is followed by a corresponding proportional deviation in the other.
Methods of ascertaining whether two variables are correlated or not.
I. Scatter diagram method.
II. Karl Pearson's coefficient of correlation.
III. Spearman's rank correlation coefficient.
Scatter diagram method.

Plot the points on the graph.
If all the points lie on a straight line rising from the lower left-hand corner to the upper right-hand corner, correlation is said to be perfectly positive.
If all the points lie on a straight line falling from the upper left-hand corner to the lower right-hand corner, correlation is said to be perfectly negative.
If the plotted points fall in a narrow band, there is a high degree of correlation between the variables.
If the points are widely scattered in the diagram, it indicates a very low degree of relationship.
Given the following pairs of values
Capital employed (Crores in Rs.): 1 23456789 1112
Profit (Lakhs in Rs.): 35479810111214
(a) Make a scatter diagram.
(b) Do you think that there is any correlation between profits and capital employed? Is it positive? Is it high or low?
Karl Pearson Coefficient of Correlation.
As a measure of the intensity or degree of linear relationship between two variables, Karl Pearson, a British biometrician, developed a formula called the correlation coefficient.
The correlation coefficient between two random variables X and Y, usually denoted by r(X, Y), is a numerical measure of the linear relationship between them and is defined as
$$r(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sigma_x \sigma_y} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}}$$
$$-1 \le r(X, Y) \le 1$$
Exercise. Given below are the monthly incomes and net savings of a sample of 10 supervisory staff belonging to a firm. Calculate the correlation coefficient.

Employee Number    Monthly Income    Net Savings
 1                 780               84
 2                 360               51
 3                 980               91
 4                 250               60
 5                 750               68
 6                 820               62
 7                 900               86
 8                 620               58
 9                 650               53
10                 390               47
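Here $\bar{X} = 6500/10 = 650$ and $\bar{Y} = 660/10 = 66$, with $x = X - \bar{X}$ and $y = Y - \bar{Y}$.

No      X      Y      x      y      x^2       y^2     xy
 1      780    84     130    18     16900     324     2340
 2      360    51    -290   -15     84100     225     4350
 3      980    91     330    25     108900    625     8250
 4      250    60    -400    -6     160000    36      2400
 5      750    68     100     2     10000     4       200
 6      820    62     170    -4     28900     16      -680
 7      900    86     250    20     62500     400     5000
 8      620    58     -30    -8     900       64      240
 9      650    53       0   -13     0         169     0
10      390    47    -260   -19     67600     361     4940
Total   6500   660                  539800    2224    27040

$$r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} = \frac{27040}{\sqrt{(539800)(2224)}} = 0.78 \text{ (approx.)}$$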
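
The hand calculation can be cross-checked with a short Python sketch that applies the deviation form of the formula to the income and savings figures from the exercise. The function name `pearson_r` is an illustrative choice.

```python
import math


def pearson_r(xs, ys):
    """Karl Pearson's r computed from deviations about the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sum_x2 = sum((x - x_bar) ** 2 for x in xs)
    sum_y2 = sum((y - y_bar) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_x2 * sum_y2)


# Data from the exercise: monthly income and net savings of 10 staff.
income = [780, 360, 980, 250, 750, 820, 900, 620, 650, 390]
savings = [84, 51, 91, 60, 68, 62, 86, 58, 53, 47]

r = pearson_r(income, savings)
print(round(r, 2))   # 0.78
```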
Rank Correlation Coefficient.

This method of finding out the covariability, or the lack of it, between two variables was developed by the British psychologist Charles Edward Spearman in 1904.
The measure is especially useful when the quantitative measure of certain factors (such as in the evaluation of leadership ability or the judgment of female beauty) cannot be fixed, but the individuals in a group can be arranged in order, thereby obtaining for each individual a number indicating his or her rank in the group.
Spearman's Rank Correlation Coefficient
$$R = 1 - \frac{6\sum D^2}{N(N^2 - 1)}$$
Where R = rank coefficient of correlation
D = difference of ranks between paired items in the two series
N = number of data points
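
The formula translates directly into code. The sketch below assumes the ranks are already assigned and contain no ties; the function name `spearman_rank` and the two rank lists are hypothetical.

```python
def spearman_rank(ranks_a, ranks_b):
    """Spearman's R = 1 - 6*sum(D^2) / (N*(N^2 - 1)) for untied ranks."""
    n = len(ranks_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical ranks given by two judges to five items.
judge_1 = [1, 2, 3, 4, 5]
judge_2 = [2, 1, 4, 3, 5]
print(round(spearman_rank(judge_1, judge_2), 2))   # 0.8
```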
Exercise. Two housewives, Geeta and Rita, asked to express their preferences for different kinds of detergents, gave the following replies:
Detergent    Geeta    Rita
To what extent do the preferences of these two ladies go together?

In order to find out how far the preferences for different kinds of detergents go together, we will calculate the rank correlation coefficient.
Detergent    Rank by Geeta    Rank by Rita
N = 10
$$R = 1 - \frac{6\sum D^2}{N^3 - N} = 1 - \frac{6\sum D^2}{990} = 0.960$$
Thus the preferences of these two ladies agree very closely as far as their opinion on detergents is concerned.
Equal ranks or tie in ranks.

In some cases it may be necessary to assign equal ranks to two or more individuals or entries.
A correction is then added to the formula:
$$R = 1 - \frac{6\left[\sum D^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2) + \cdots\right]}{N^3 - N}$$
Where $m_1$ is the number of items whose rank is the same in group 1, and so on.
Exercise. An examination of eight applicants for a clerical post was taken by a firm. From the marks obtained by the applicants in the Accountancy and Statistics papers, compute the rank correlation coefficient.

Applicant    Marks in Accountancy    Marks in Statistics
1            15                      40
2            20                      30
3            28                      50
4            12                      30
5            40                      20
6            60                      10
7            20                      30
8            80                      60
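Applicant    Marks in Accountancy    Rank assigned    Marks in Statistics    Rank assigned    D^2
1            15                      2                40                     6                16.00
2            20                      3.5              30                     4                0.25
3            28                      5                50                     7                4.00
4            12                      1                30                     4                9.00
5            40                      6                20                     2                16.00
6            60                      7                10                     1                36.00
7            20                      3.5              30                     4                0.25
8            80                      8                60                     8                0.00
N = 8                                                                                         Total 81.50

$$R = 1 - \frac{6\left[\sum D^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2) + \cdots\right]}{N^3 - N}$$
Here $m_1 = 2$ (two applicants scored 20 in Accountancy) and $m_2 = 3$ (three applicants scored 30 in Statistics).
$$R = 1 - \frac{6\left[81.5 + \frac{1}{12}(2^3 - 2) + \frac{1}{12}(3^3 - 3)\right]}{8^3 - 8} = 1 - \frac{6 \times 84}{504} = 0$$
Thus there is no correlation between the marks obtained in the two subjects.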
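
The tie-corrected calculation can be reproduced with the sketch below, which ranks the marks from the exercise in ascending order, gives tied values the average of the ranks they occupy, and adds (m^3 - m)/12 for each group of ties. The helper names `average_ranks`, `tie_sizes` and `spearman_with_ties` are illustrative assumptions.

```python
def average_ranks(values):
    """Rank values in ascending order; tied values share the average rank."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1      # lowest rank occupied by this value
        count = ordered.count(v)          # how many items share the value
        ranks.append(first + (count - 1) / 2)
    return ranks


def tie_sizes(values):
    """Sizes m of each group of tied values (groups of size 1 are ignored)."""
    return [values.count(v) for v in set(values) if values.count(v) > 1]


def spearman_with_ties(xs, ys):
    n = len(xs)
    sum_d2 = sum((a - b) ** 2
                 for a, b in zip(average_ranks(xs), average_ranks(ys)))
    correction = sum((m ** 3 - m) / 12 for m in tie_sizes(xs) + tie_sizes(ys))
    return 1 - 6 * (sum_d2 + correction) / (n ** 3 - n)


# Marks from the exercise.
accountancy = [15, 20, 28, 12, 40, 60, 20, 80]
statistics_marks = [40, 30, 50, 30, 20, 10, 30, 60]
rank_r = spearman_with_ties(accountancy, statistics_marks)
print(rank_r)
```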
Merits and limitations of rank method.

1. This method is simpler to understand and easier to apply than Karl Pearson's method. The answer obtained by this method equals Karl Pearson's coefficient computed on the ranks, provided no value is repeated, that is, all the items are different.
2. Where the data are of a qualitative nature, like honesty, efficiency, intelligence etc., this method can be used with great advantage.
3. This is the only method available when we are given the ranks and not the actual data.
4. Even where actual data are given, the rank method can be applied for ascertaining a rough degree of correlation.
Limitations: calculations become tedious when the number of data points exceeds 30.
Regression Analysis.
Regression analysis is the process of constructing a mathematical model or function that can be used to predict or determine one variable by another variable.
Bivariate (two variables) linear regression - the most elementary regression model
Dependent variable - the variable to be predicted, usually called Y
Independent variable - the predictor or explanatory variable, usually called X
Deterministic Regression Model
$$Y = \beta_0 + \beta_1 X$$
$\beta_0$ and $\beta_1$ are population parameters.
$\beta_0$ and $\beta_1$ are estimated by the sample statistics $b_0$ and $b_1$.
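Equation of the Sample Regression Line
$$\hat{Y} = b_0 + b_1 X$$
where: $b_0$ = the sample intercept
$b_1$ = the sample slope
$\hat{Y}$ = the predicted value of Y
$$b_1 = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2} = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sum X^2 - \frac{(\sum X)^2}{n}}$$
$$b_0 = \bar{Y} - b_1\bar{X} = \frac{\sum Y}{n} - b_1\frac{\sum X}{n}$$
Equivalently, in terms of sums of squares:
$$SS_{xy} = \sum (X - \bar{X})(Y - \bar{Y}) = \sum XY - \frac{(\sum X)(\sum Y)}{n}$$
$$SS_{xx} = \sum (X - \bar{X})^2 = \sum X^2 - \frac{(\sum X)^2}{n}$$
$$b_1 = \frac{SS_{xy}}{SS_{xx}}$$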
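
A minimal sketch of the slope and intercept formulas, using the computational (sum) forms of SSxy and SSxx. The function name `fit_line` and the four data points are hypothetical and chosen so the fitted line is exact.

```python
def fit_line(xs, ys):
    """Return (b0, b1) for the sample regression line Y-hat = b0 + b1*X."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    ss_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    b1 = ss_xy / ss_xx
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1


# Hypothetical points lying exactly on Y = 1 + 2X.
b0, b1 = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(b0, b1)   # 1.0 2.0
```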
Exercise 13.3: A corporation owns several companies. The strategic planner for the corporation believes dollars spent on advertising can to some extent be a predictor of total sales dollars. As an aid to long-term planning, she gathers the following sales and advertising information from several of the companies for 2002 ($ millions).

Advertising    Sales
12.5           148
 3.7            55
21.6           338
60.0           994
37.6           541
 6.1            89
16.8           126
41.2           379

Develop the equation of the simple regression line to predict sales from advertising expenditures using these data.
Standard Error of Estimate
Residuals: Each difference between the actual y value and the predicted y value, $y - \hat{y}$, is the error of the regression line at a given point, and is referred to as a residual.
Standard Error of Estimate: To examine the error of the model we need to calculate the standard error of estimate, which provides a single measurement of the regression error.
$$SSE = \text{sum of the squares of the error} = \sum (y - \hat{y})^2$$
$$\text{Standard error of estimate} = S_e = \sqrt{\frac{SSE}{n - 2}}$$
$$\text{Shortcut formula: } SSE = \sum y^2 - b_0\sum y - b_1\sum xy$$
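
The sketch below computes SSE both from the residuals and from the shortcut formula, then the standard error of estimate, so the two routes can be compared. The function name `standard_error` and the five data points are hypothetical.

```python
import math


def standard_error(xs, ys):
    """Return (SSE from residuals, SSE from the shortcut formula, S_e)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    b1 = (sum_xy - sum_x * sum_y / n) / (sum(x * x for x in xs) - sum_x ** 2 / n)
    b0 = sum_y / n - b1 * sum_x / n

    # SSE as the sum of squared residuals (y - y_hat)^2.
    sse_direct = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    # Shortcut: SSE = sum(y^2) - b0*sum(y) - b1*sum(xy).
    sse_shortcut = sum(y * y for y in ys) - b0 * sum_y - b1 * sum_xy
    return sse_direct, sse_shortcut, math.sqrt(sse_direct / (n - 2))


# Hypothetical data.
sse_direct, sse_shortcut, s_e = standard_error([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(sse_direct, 4), round(sse_shortcut, 4), round(s_e, 4))   # 2.4 2.4 0.8944
```

The shortcut form is convenient by hand but subtracts large sums, so on real data it can lose precision compared with summing the squared residuals.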
Exercise 13.25: Determine the equation of the regression line to predict annual sales of a company from the yearly stock market volume of shares sold in a recent year. Compute the standard error of the estimate for this model. Does volume of shares sold appear to be a good predictor of a company's sales? Why or why not?

Company                  Annual Sales ($ billions)    Annual Volume (millions of shares)
Merck                    10.5                         728.6
Philip Morris            48.1                         497.9
IBM                      64.8                         439.1
Eastman Kodak            20.1                         377.9
Bristol-Myers Squibb     11.4                         375.5
General Motors           123.8                        363.8
Ford Motors              89.0                         276.3
Coefficient of Determination.
The error sum of squares SSE can be interpreted as a measure of how much variation in y is left unexplained by the model, that is, how much cannot be attributed to a linear relationship.
A quantitative measure of the total amount of variation in observed y values is given by the total sum of squares, SST.
The ratio SSE/SST is the proportion of the total variation that cannot be explained by the sample linear regression model, and 1 - SSE/SST (a number between 0 and 1) is the proportion of the observed y variation explained by the model.
The Coefficient of Determination:
$$r^2 = 1 - \frac{SSE}{SST}$$
The higher the value of $r^2$, the more successful is the simple linear regression model in explaining y variation.
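
As an illustration of the ratio, the following sketch fits the least-squares line and returns 1 - SSE/SST. The function name `r_squared` and the data are hypothetical.

```python
def r_squared(xs, ys):
    """Coefficient of determination r^2 = 1 - SSE/SST for the fitted line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    b0 = y_bar - b1 * x_bar

    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))  # unexplained
    sst = sum((y - y_bar) ** 2 for y in ys)                      # total
    return 1 - sse / sst


# Hypothetical data: 60% of the variation in y is explained by the line.
print(round(r_squared([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 4))   # 0.6
```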
Exercise 13.31. The Conference Board produces a Consumer Confidence Index (CCI) that reflects people's feelings about general business conditions, employment opportunities, and their own income prospects. Some researchers may feel that consumer confidence is a function of the median household income. Shown here are the CCIs for 9 years and the median household incomes for the same 9 years published by the U.S. Census Bureau. Determine the equation of the regression line to predict the CCI from the median household income. Compute the standard error of the estimate for this model. Compute the value of r^2. Explain the meaning.

CCI      Median Household Income ($1000)
116.8    37.415
 91.5    36.770
 68.5    35.501
 61.6    35.047
 65.9    34.700
 90.6    34.942
100.0    35.887
104.6    36.306
125.4    37.005
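Prediction Interval to Estimate Y for a Given Value of X
$$\hat{Y} \pm t_{\alpha/2,\, n-2}\; S_e \sqrt{1 + \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{SS_{xx}}}$$
Where: $X_0$ = a particular value of X
$$SS_{xx} = \sum X^2 - \frac{(\sum X)^2}{n}$$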
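
A sketch of the interval calculation is given below. To stay free of external libraries it takes the critical t value as an argument, which the reader is assumed to look up in a t table for alpha/2 and n - 2 degrees of freedom. The function name `prediction_interval` and the data are hypothetical.

```python
import math


def prediction_interval(xs, ys, x0, t_crit):
    """Interval for a single value of Y at X = x0.
    t_crit is the table value t(alpha/2, n - 2), supplied by the caller."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ss_xx
    b0 = y_bar - b1 * x_bar

    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s_e = math.sqrt(sse / (n - 2))          # standard error of estimate

    y_hat = b0 + b1 * x0
    half_width = t_crit * s_e * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / ss_xx)
    return y_hat - half_width, y_hat + half_width


# Hypothetical data with n = 5, so df = 3; 3.182 is the table value t(0.025, 3).
lower, upper = prediction_interval([1, 2, 3, 4, 5], [2, 4, 5, 4, 5],
                                   x0=3, t_crit=3.182)
print(round(lower, 3), round(upper, 3))
```

The interval is narrowest at X0 equal to the mean of X and widens as X0 moves away from it, because of the (X0 - X-bar)^2 term.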
Exercise 13.39. A specialist in hospital administration stated that the number of FTEs (full-time employees) in a hospital can be estimated by counting the number of beds in the hospital (a common measure of hospital size). A health care business researcher decided to develop a regression model in an attempt to predict the number of FTEs of a hospital by the number of beds. She surveyed 12 hospitals and obtained the following data. The data are presented in sequence, according to the number of beds.

Number of beds    FTEs    Number of beds    FTEs
23                 69     50                138
29                 95     54                178
29                102     64                156
35                119     66                184
42                126     76                176
46                125     78                225

Construct a 90% prediction interval for a single value of y. Use x = 100.
Exercise 13.40: Construct a 98% prediction interval for a single value of y for problem 13.3. Use x = 20.
