Kiriakos Vlahos
ALBA 2016
Session overview
Descriptive Statistics
The Normal Distribution
Sampling
Confidence Intervals
Hypothesis Testing
What is statistics good for?
Descriptive Statistics
Collect
Organize
Summarize
Display
Analyze
Inferential Statistics
Predict and forecast values of population parameters
Test hypotheses (draw conclusions) about values of population parameters
Make decisions
Samples and Populations
A population consists of the set of all
measurements in which we are interested.
A sample is a subset of the measurements
selected from the population.
A census is a complete enumeration of every
item in a population.
Descriptive Statistics
Measures of location
Percentiles and quartiles
Mean, median and mode (measures of central tendency)
Measures of variability
Range
Interquartile range
Variance
Standard deviation
Other summary measures
Skewness
Measures of central tendency: The arithmetic mean
Mean
Commonly referred to as average
The sum of the observed values divided by the number of observations.
(in Excel: =AVERAGE(range) )
Population Mean: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$
Sample Mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Other measures of central
tendency
Median
Middle value when sorted in terms of magnitude
(in Excel: =MEDIAN(range) )
50% of the values lie below the median
Mode
Most frequently occurring value (in Excel:
=MODE(range) )
Measures of central tendency - Example
Length of Stay (LOS) of 11 hospital patients in days:
5, 5, 6, 6, 6, 8, 9, 9, 9, 9, 38 (Sum 110)
Mean = 110/11 = 10
Median = 8
Mode = 9 (occurs four times; 6 occurs three times)
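The three measures can be checked with a short sketch using Python's standard `statistics` module on the length-of-stay data listed in this example. The variable names are illustrative only. Note how the single 38-day stay pulls the mean above the median.

```python
import statistics

# Length of stay (days) for the 11 patients in the example.
los_days = [5, 5, 6, 6, 6, 8, 9, 9, 9, 9, 38]

mean_los = statistics.mean(los_days)         # sum of values / number of values
median_los = statistics.median(los_days)     # middle value of the sorted data
modes_los = statistics.multimode(los_days)   # all values tied for most frequent

print("mean:", mean_los)
print("median:", median_los)
print("mode(s):", modes_los)
```

`multimode` returns a list, so a data set with several equally frequent values would report all of them.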
On the meaning of
average
Things look slightly better for
the government if you take
the median rather than the
average income. The median
is the mid-point between the
highest and the lowest
incomes in a given group and
considered by statisticians to
be a more reliable figure.
Daily Telegraph
Income distributions
Measures of Variability or
Dispersion
Range
Difference between maximum and minimum
values
Interquartile Range (IQR)
Difference between third and first quartile (Q3 -
Q1)
Variance
Average of the squared deviations from the mean (for a sample, the sum is divided by n - 1)
Standard Deviation
Square root of the variance
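As a rough illustration, the four measures can be computed for the length-of-stay data from the earlier example with Python's standard library. This is a sketch under one stated assumption: quartiles are taken with the default "exclusive" method of `statistics.quantiles`, and other conventions (including some spreadsheet functions) can give slightly different quartiles.

```python
import statistics

# Length of stay (days) from the central tendency example.
los_days = [5, 5, 6, 6, 6, 8, 9, 9, 9, 9, 38]

data_range = max(los_days) - min(los_days)

# Quartile cut points; the default method is "exclusive" (an assumption here).
q1, q2, q3 = statistics.quantiles(los_days, n=4)
iqr = q3 - q1

sample_variance = statistics.variance(los_days)  # divides by n - 1
sample_sd = statistics.stdev(los_days)           # square root of the variance

print("range:", data_range)
print("IQR:", iqr)
print("variance:", sample_variance)
print("standard deviation:", round(sample_sd, 3))
```

The range and the standard deviation are dominated by the 38-day outlier, while the interquartile range is not affected by it.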
Quartiles and the Interquartile
Range
Quartiles are the percentage points that
break down the ordered data set into
quarters.
The first quartile is the point below which lie
1/4 of the data.
The second quartile is the point below which
lie 1/2 of the data. This is also called the
median.
The third quartile is the point below which lie
3/4 of the data.
Variance and Standard Deviation
Population: $\sigma = \sqrt{\sigma^2}$
Sample: $s = \sqrt{s^2}$
Calculation of Sample Variance
$\bar{x} = 15.85$
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{378.55}{20 - 1} = 19.923$
$s = \sqrt{s^2} = \sqrt{19.923} = 4.46$
Empirical Rule
Comparing measures of dispersion
Range
Advantages: Easily understood, familiar
Disadvantages: Based on two observations; grossly distorted by outliers; descriptive measure only
Inter-quartile range
Advantages: Easily understood
Disadvantages: Not well known; descriptive measure only
Variance
Advantages: Mathematically tractable
Disadvantages: Wrong units; no intuitive appeal
Standard deviation
Advantages: Mathematically tractable; well known because of its use in various theories; standard measure of risk in finance
Disadvantages: Too involved for descriptive purposes
Skewness
Symmetric
Skewness
Skewed to left
Skewness
Skewed to right
Skewness
Measure of asymmetry of a frequency distribution
Negatively skewed or skewed to left
Symmetric or unskewed
Positively skewed or skewed to right
Coefficient of skewness
$Sk = \frac{3(\bar{x} - \text{median})}{s}$
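A minimal sketch of the coefficient, assuming the formula on this slide is Pearson's coefficient, 3 × (mean − median) / standard deviation. The function name is hypothetical. It is applied to the length-of-stay data from the earlier example, where the long right tail should give a positive value. Spreadsheet skewness functions typically use a different, moment-based formula, so their results will not match this number.

```python
import statistics

def pearson_skewness(values):
    """Pearson's coefficient: 3 * (mean - median) / sample standard deviation."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    return 3 * (mean - median) / statistics.stdev(values)

los_days = [5, 5, 6, 6, 6, 8, 9, 9, 9, 9, 38]
skew = pearson_skewness(los_days)
print("skewness:", round(skew, 3))  # positive, so skewed to the right
```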
Discrete and Continuous Random Variables
A discrete random variable:
counts occurrences
has a countable number of possible values
has discrete jumps between successive values
has measurable probability associated with individual values
probability is height
For example: Binomial, n=3, p=.5
x P(x)
0 0.125
1 0.375
2 0.375
3 0.125
Total 1.000
A continuous random variable:
measures (e.g.: height, weight, speed, value, duration, length)
has an infinite number of possible values
moves continuously from value to value
has no measurable probability associated with individual values
probability is area
For example: Minutes to Complete Task. In this case, the shaded area represents the probability that the task takes between 2 and 3 minutes.
The Normal Distribution
Example: Assume that we have a forecast of the /$ exchange rate in three months' time (call this X, a random variable whose value we cannot know in advance)
Forecasted value: 1 = $1.65 with SD (of the forecast error) = 5c
IQ distribution
Source : mindhacks.com
The Normal Probability Distribution
The normal probability density function:
$f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}$ for $-\infty < x < \infty$
[Figure: Normal Distribution: μ = 0, σ = 1; f(x) against x]
Normal Probabilities (Empirical Rule)
The probability that a normal random variable will be within 1 standard deviation of its mean is approximately 0.68.
The probability that a normal random variable will be within 2 standard deviations of its mean is approximately 0.95.
[Figure: Standard Normal Distribution, f(z) against z]
Application of the empirical rule
Exchange rate forecast (in cents): X ~ N(165, 5²)
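To make the application concrete, the sketch below uses `statistics.NormalDist` to compute the one and two standard deviation bands for the forecast X ~ N(165, 5²) and the exact normal probability of each band. The helper name `band` is illustrative only.

```python
from statistics import NormalDist

# Exchange rate forecast in cents: mean 165, standard deviation 5.
forecast = NormalDist(mu=165, sigma=5)

def band(dist, k):
    """Return (lower, upper, probability) for mean +/- k standard deviations."""
    lower = dist.mean - k * dist.stdev
    upper = dist.mean + k * dist.stdev
    return lower, upper, dist.cdf(upper) - dist.cdf(lower)

band_1 = band(forecast, 1)
band_2 = band(forecast, 2)

for lower, upper, prob in (band_1, band_2):
    print(f"{lower:.0f}c to {upper:.0f}c: probability {prob:.4f}")
```

So the rate should fall between $1.60 and $1.70 with probability of roughly 0.68, and between $1.55 and $1.75 with probability of roughly 0.95.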
The Standard Normal Distribution
The standard normal random variable, Z, is the normal random variable with mean μ = 0 and standard deviation σ = 1: Z ~ N(0, 1²).
[Figure: Standard Normal Distribution, f(z) against Z, centered at μ = 0 with σ = 1]
Normal Probability Distributions
All of these are normal probability density functions, though each has a different mean and variance.
[Figure: four normal density curves, f(w), f(x), f(y) and f(z)]
Consider:
P(39 ≤ W ≤ 41)
P(25 ≤ X ≤ 35)
P(47 ≤ Y ≤ 53)
P(-1 ≤ Z ≤ 1)
The probability in each case is an area under a normal probability density function.
The Transformation of Normal Random Variables
The transformation of X to Z:
$Z = \frac{X - \mu_x}{\sigma_x}$
Transformation:
(1) Subtraction: (X - μ_x)
(2) Division by σ_x
[Figure: Normal Distribution: μ = 50, σ = 10, f(x) against X, mapped onto the Standard Normal Distribution, f(z) against Z]
Using the Normal Transformation
The monthly starting salaries, X, of recent MBA graduates follow the normal distribution with a mean of $4,000 and a standard deviation of $400. What is the z-value for a salary of $4,400?
$z = \frac{X - \mu}{\sigma} = \frac{\$4,400 - \$4,000}{\$400} = 1.00$
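The point of the transformation is that a probability about X can be read off the standard normal distribution once X is converted to z. The sketch below checks this for the salary example: the probability of a salary above $4,400 is the same whether computed directly on N(4000, 400²) or on the z-value. Variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 4000, 400      # mean and standard deviation of starting salaries
salary = 4400

z = (salary - mu) / sigma  # the normal transformation

# Same upper-tail probability, computed two ways.
p_direct = 1 - NormalDist(mu, sigma).cdf(salary)
p_via_z = 1 - NormalDist().cdf(z)

print("z =", z)
print("P(X > 4400) =", round(p_direct, 4))
```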
Finding probabilities of the Standard Normal
Distribution:P(Z > 1.56)
[Figure: standard normal density, f(z) against Z]
Finding Probabilities of the Standard Normal
Distribution: P(1< Z < 2)
Area between 1 and 2
P(1 < Z < 2) = 0.1587 - 0.0228 = 0.1359
[Figure: standard normal density, f(z) against Z]
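Both table lookups can be reproduced numerically. The sketch below assumes, as in the table used in these slides, that tabulated values are upper-tail areas P(Z > z); the helper name `upper_tail` is hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def upper_tail(z):
    """Area under the standard normal curve to the right of z."""
    return 1 - Z.cdf(z)

p_gt_156 = upper_tail(1.56)
p_between = upper_tail(1) - upper_tail(2)  # area between 1 and 2

print("P(Z > 1.56) =", round(p_gt_156, 4))
print("P(1 < Z < 2) =", round(p_between, 4))
```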
Finding Values of the Standard Normal Random Variable: P(Z < z) = 0.90
To find z such that P(Z < z) = .90:
1. Find a probability as close as possible to P(Z > z) = 1 - 0.9 = 0.10 in the table of standard normal probabilities.
2. Then determine the value of z from the corresponding row and column.
z    0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
0.0  0.5000 0.4960 0.4920 0.4880 0.4840 0.4801 0.4761 0.4721 0.4681 0.4641
0.1  0.4602 0.4562 0.4522 0.4483 0.4443 0.4404 0.4364 0.4325 0.4286 0.4247
0.2  0.4207 0.4168 0.4129 0.4090 0.4052 0.4013 0.3974 0.3936 0.3897 0.3859
0.3  0.3821 0.3783 0.3745 0.3707 0.3669 0.3632 0.3594 0.3557 0.3520 0.3483
0.4  0.3446 0.3409 0.3372 0.3336 0.3300 0.3264 0.3228 0.3192 0.3156 0.3121
0.5  0.3085 0.3050 0.3015 0.2981 0.2946 0.2912 0.2877 0.2843 0.2810 0.2776
0.6  0.2743 0.2709 0.2676 0.2643 0.2611 0.2578 0.2546 0.2514 0.2483 0.2451
0.7  0.2420 0.2389 0.2358 0.2327 0.2296 0.2266 0.2236 0.2206 0.2177 0.2148
0.8  0.2119 0.2090 0.2061 0.2033 0.2005 0.1977 0.1949 0.1922 0.1894 0.1867
0.9  0.1841 0.1814 0.1788 0.1762 0.1736 0.1711 0.1685 0.1660 0.1635 0.1611
1.0  0.1587 0.1562 0.1539 0.1515 0.1492 0.1469 0.1446 0.1423 0.1401 0.1379
1.1  0.1357 0.1335 0.1314 0.1292 0.1271 0.1251 0.1230 0.1210 0.1190 0.1170
1.2  0.1151 0.1131 0.1112 0.1093 0.1075 0.1056 0.1038 0.1020 0.1003 0.0985
1.3  0.0968 0.0951 0.0934 0.0918 0.0901 0.0885 0.0869 0.0853 0.0838 0.0823
The closest entry to 0.10 is 0.1003 (row 1.2, column 0.08), so z = 1.28.
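The same value can be obtained without a table from the inverse of the cumulative distribution function. A minimal sketch using the standard library:

```python
from statistics import NormalDist

# Find z such that P(Z < z) = 0.90.
z_90 = NormalDist().inv_cdf(0.90)

print("z =", round(z_90, 4))
```

The result agrees with the table lookup to two decimal places; the table can only resolve z to the nearest 0.01.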
Statistics is a Science of Inference
Statistical Inference:
Predict and forecast values of population parameters...
Test hypotheses about values of population parameters...
Make decisions...
On basis of sample statistics derived from limited and incomplete sample information
Examples of sampling
A bank wants to find
what customers think about redesigned shops
how many employees would be prepared to work on Saturdays for an extra fee
what percentage of the buying public are aware of the existence of a new banking product
how worried customers are about the security of online banking
In all of these cases dealing with the whole population is either impractical or too expensive, and sampling is the only option.
US Election 1948
Unbiased and biased sampling
Unbiased sample: Unbiased, representative sample drawn at random from the entire population.
Biased sample: Biased, unrepresentative sample drawn from people who have cars and/or telephones.
[Diagram: a Population of Democrats and Republicans; the unbiased sample is drawn from the whole population, the biased sample only from people who have phones and/or cars]
Inferential statistics
[Diagram: Sampling (with a chosen sample size) takes a Sample from the Population (parameters μ, σ, p). Summarising the Data gives the Statistics (x̄, s, p̂). Inference runs from the statistics back to the population through a Confidence Interval or through Hypothesis Testing (H0, Ha: Reject?).]
Say not "I have found the truth," but rather, "I have found a truth."
--Kahlil Gibran, The Prophet
Sampling Distributions
The sampling distribution of a statistic is the probability distribution of all possible values the statistic may assume, when computed from random samples of the same size, drawn from a specified population.
The sampling distribution of X̄ is the probability distribution of all possible values the random variable X̄ may assume when a sample of size n is taken from a specified population.
Sampling
Population: μ = 32.0, σ = 3.3
Take a sample of size 5, n = 5
[Figure: values between 27 and 40 drawn from the population; different samples of size 5 give different sample means. Variation!]
Variation among sample means is measured by their standard deviation
Standard deviation of sample means is called standard error
Sampling Distributions (Continued)
Uniform population of integers from 1 to 8:
μ = 4.5
σ² = 5.25
σ = 2.2913
[Figure: Uniform Distribution (1,8), P(X) against X]
Sampling Distributions (Continued)
There are 8*8 = 64 different but equally-likely samples of size 2 that can be drawn from a uniform population of the integers from 1 to 8.
Each of these samples has a sample mean. For example, the mean of the sample (1,4) is 2.5, and the mean of the sample (8,4) is 6.
Samples of Size 2 from Uniform (1,8) (left) and Sample Means from Uniform (1,8), n = 2 (right)
1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8
1 1,1 1,2 1,3 1,4 1,5 1,6 1,7 1,8 1 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5
2 2,1 2,2 2,3 2,4 2,5 2,6 2,7 2,8 2 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0
3 3,1 3,2 3,3 3,4 3,5 3,6 3,7 3,8 3 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5
4 4,1 4,2 4,3 4,4 4,5 4,6 4,7 4,8 4 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0
5 5,1 5,2 5,3 5,4 5,5 5,6 5,7 5,8 5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5
6 6,1 6,2 6,3 6,4 6,5 6,6 6,7 6,8 6 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0
7 7,1 7,2 7,3 7,4 7,5 7,6 7,7 7,8 7 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5
8 8,1 8,2 8,3 8,4 8,5 8,6 8,7 8,8 8 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0
Sampling Distributions (Continued)
The probability distribution of the sample mean is called the sampling distribution of the sample mean.
$\mu_{\bar{X}} = 4.5$
$\sigma^2_{\bar{X}} = 2.625$
$\sigma_{\bar{X}} = 1.6202$
[Figure: sampling distribution of the sample mean, P(X̄) against X̄ from 1.0 to 8.0]
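These figures can be verified by brute force. The sketch below enumerates all 64 equally likely samples of size 2 (drawn with replacement) from the integers 1 to 8 and computes the mean and variance of the resulting sample means. Variable names are illustrative.

```python
from itertools import product
from math import sqrt
from statistics import fmean, pvariance

population = list(range(1, 9))        # uniform population: 1, 2, ..., 8
pop_mean = fmean(population)
pop_var = pvariance(population)       # population variance (divides by N)

# Every ordered sample of size 2, and the mean of each one.
sample_means = [(a + b) / 2 for a, b in product(population, repeat=2)]

mean_of_means = fmean(sample_means)
var_of_means = pvariance(sample_means)

print("population:", pop_mean, pop_var, round(sqrt(pop_var), 4))
print("sample means:", mean_of_means, var_of_means, round(sqrt(var_of_means), 4))
```

The sample means are centered on the population mean, and their variance is the population variance divided by the sample size (5.25 / 2).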
Properties of the Sampling Distribution of the Sample Mean
[Figure: Uniform Distribution (1,8), P(X) against X, shown with the sampling distribution of the sample mean]
... variance.
Population Parameters and the Sampling Distribution of the Sample Mean
The expected value of the sample mean is equal to the population mean:
$E(\bar{X}) = \mu_{\bar{X}} = \mu_X$
The variance of the sample mean is equal to the population variance divided by the sample size:
$V(\bar{X}) = \sigma^2_{\bar{X}} = \frac{\sigma^2_X}{n}$
The standard deviation of the sample mean, known as the standard error of the mean, is equal to the population standard deviation divided by the square root of the sample size:
$SD(\bar{X}) = \sigma_{\bar{X}} = \frac{\sigma_X}{\sqrt{n}}$
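A small sketch of the standard error formula, showing how slowly precision improves with sample size: quadrupling n only halves the standard error. The function name is hypothetical, and the inputs reuse the uniform (1,8) population from the preceding slides.

```python
import math

def standard_error(sigma, n):
    """Standard deviation of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma_uniform = math.sqrt(5.25)  # population standard deviation, about 2.2913

se_by_n = {n: standard_error(sigma_uniform, n) for n in (2, 8, 32)}
for n, se in se_by_n.items():
    print(f"n = {n:2d}: standard error = {se:.4f}")
```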
Sampling from a Normal Population
When sampling from a normal population with mean μ and standard deviation σ, the sample mean, X̄, has a normal sampling distribution:
$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$
This means that, as the sample size increases, the sampling distribution of the sample mean remains centered on the population mean, but becomes more compactly distributed around that population mean.
[Figure: Sampling Distribution of the Sample Mean, f(X̄), for n = 4 and n = 16]
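The Central Limit Theorem
When sampling from a population with mean μ and finite standard deviation σ, the sampling distribution of the sample mean tends to a normal distribution with mean μ and standard deviation σ/√n as the sample size becomes large (n >= 30).
[Figure: sampling distributions of the sample mean, P(X̄) against X̄, starting at n = 5 and approaching the normal curve f(X̄)]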
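The theorem can be illustrated with a small simulation. The sketch below is an illustration only, under assumptions not in the slides: the population is exponential with mean 1 and standard deviation 1 (strongly right-skewed), 5,000 samples of size 30 are drawn, and the seed is fixed so the run is repeatable.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed (an arbitrary choice) for repeatability

def sample_mean(n):
    """Mean of n draws from an exponential population with mean 1 and sd 1."""
    return statistics.fmean(rng.expovariate(1.0) for _ in range(n))

n = 30
means = [sample_mean(n) for _ in range(5000)]

centre = statistics.fmean(means)   # should be close to the population mean, 1
spread = statistics.stdev(means)   # should be close to sigma / sqrt(n)
theory = 1 / n ** 0.5

print(f"mean of sample means: {centre:.3f}")
print(f"sd of sample means:   {spread:.3f} (theory: {theory:.3f})")
```

Even though individual observations are far from normal, the sample means cluster around 1 with a spread close to σ/√n; a histogram of `means` would look roughly bell-shaped.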
The Central Limit Theorem Applies to Sampling Distributions from Any Population
[Figure: Normal, Uniform, Skewed and General populations, each shown with the sampling distribution of X̄ for n = 2 and n = 30]
Some words by W. J. Youden
The normal law of error stands out in the experience of mankind as one of the broadest generalisations of natural philosophy. It serves as the guiding instrument in researches in the physical and social sciences and in medicine, agriculture, and engineering. It is an indispensable tool for the analysis and the interpretation of the basic data obtained by observation and experiment.
The Central Limit Theorem -Example
Types of Estimators
Point Estimate
A single-valued estimate.
A single element chosen from a sampling distribution.
Conveys little information about the actual value of the
population parameter or about the accuracy of the
estimate.
Confidence Interval or Interval Estimate
An interval or range of values believed to include the
unknown population parameter.
Associated with the interval is a measure of the
confidence we have that the interval does indeed
contain the parameter of interest.
Confidence Interval or
Interval Estimate
A confidence interval or interval estimate is a range or interval of numbers
believed to include an unknown population parameter. Associated with the
interval is a measure of the confidence we have that the interval does indeed
contain the parameter of interest.
Confidence Interval for μ when σ is Known
If the population distribution is normal, the sampling distribution of the mean is normal.
If the sample is sufficiently large, regardless of the shape of the population distribution, the sampling distribution is approximately normal (Central Limit Theorem).
In either case:
$P\left(\mu - 1.96\frac{\sigma}{\sqrt{n}} \le \bar{x} \le \mu + 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95$
or
$P\left(\bar{x} - 1.96\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95$
[Figure: Standard Normal Distribution: 95% Interval, f(z) against z]
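Confidence Interval for μ when σ is Known (Continued)
Before sampling, there is a 0.95 probability that the interval
$\mu \pm 1.96\frac{\sigma}{\sqrt{n}}$
will include the sample mean (and 5% that it will not).
The sample mean lies within 1.96σ/√n of μ exactly when μ lies within 1.96σ/√n of the sample mean.
That is, $\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}}$ is a 95% confidence interval for μ.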
Critical Values of z and Levels of Confidence
(1 - α)   α/2     z_{α/2}
0.98      0.010   2.326
0.95      0.025   1.960
[Figure: standard normal density, f(z), with area α/2 in each tail]
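Critical values like these come from the inverse of the standard normal cumulative distribution. A minimal sketch follows; the function name and the `tails` parameter are hypothetical conveniences. With `tails=2` the tail area α is split between both tails, and with `tails=1` it is placed in a single tail.

```python
from statistics import NormalDist

def z_critical(confidence, tails=2):
    """Critical z for a confidence level, with alpha split over `tails` tails."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / tails)

for conf in (0.98, 0.95):
    print(f"{conf:.2f}: two-tailed z = {z_critical(conf):.3f}")

print(f"0.95: one-tailed z = {z_critical(0.95, tails=1):.3f}")
```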
The Level of Confidence and the
Width of the Confidence Interval
When sampling from the same population, using a fixed sample size, the
higher the confidence level, the wider the confidence interval.
[Figure: two Standard Normal Distribution plots, f(z) against Z, comparing intervals at two confidence levels]
The Sample Size and the Width of
the Confidence Interval
When sampling from the same population, using a fixed confidence level, the
larger the sample size, n, the narrower the confidence interval.
[Figure: two sampling distributions, f(x) against x, comparing intervals for two sample sizes]
Example
Sampling Error: σ Unknown
We want to estimate the mean of the population, μ
The standard deviation of the population, σ, is unknown
We estimate that true average with the sample mean, X̄
n is the sample size
The sampling error is estimated with:
$SE = \frac{s}{\sqrt{n}}$
Where s is the standard deviation of the sample
Overview: Confidence intervals for sample mean
Small Sample, Normal Population Distribution:
Population Standard Deviation (σ) Known: $\bar{x} \pm z\frac{\sigma}{\sqrt{n}}$
Population Standard Deviation (σ) Unknown: $\bar{x} \pm t\frac{s}{\sqrt{n}}$
Confidence Interval in
Excel
Data > Data Analysis > Descriptive Statistics:
Female Salary
Mean 62576.92
Standard Error 1438.86
Median 62650
Mode 60900
Standard Deviation 7336.774
Sample Variance 53828246
Kurtosis -0.09267
Skewness -0.33406
Range 30800
Minimum 45600
Maximum 76400
Sum 1627000
Count 26
Confidence Level(95.0%) 2963.385
Confidence Level(95.0%) = $t\frac{s}{\sqrt{n}}$
Confidence Interval = Mean ± Confidence Level
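The Excel figures can be reproduced from the summary statistics alone. The sketch below assumes a critical t value of 2.0595 (25 degrees of freedom, 0.025 in the upper tail), taken from a standard t table because Python's standard library has no t distribution. It is a check on the arithmetic, not a reconstruction of Excel's internals.

```python
import math

# Summary statistics for Female Salary from the Excel output.
n = 26
mean = 62576.92
sd = 7336.774

t_crit = 2.0595  # assumed: t table value for 25 degrees of freedom, 95% two-sided

standard_error = sd / math.sqrt(n)
margin = t_crit * standard_error   # Excel's "Confidence Level(95.0%)"
ci = (mean - margin, mean + margin)

print(f"standard error: {standard_error:.2f}")
print(f"margin:         {margin:.2f}")
print(f"95% CI:         {ci[0]:.0f} to {ci[1]:.0f}")
```

The standard error and margin agree with the "Standard Error" and "Confidence Level(95.0%)" rows of the output up to rounding of the t value.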
Hypothesis Testing
The court analogy
The defendant is presumed to be innocent
until proven guilty beyond reasonable doubt.
What do we mean by reasonable doubt?
Cot death case
O.J. Simpson
Al Capone
Decision-Making
In statistics you can't prove anything. You try to show that the alternative is highly unlikely.
What is a hypothesis?
A hypothesis is a statement or assertion about the state of nature (about the true value of an unknown population parameter):
The accused is innocent
μ = 100
Every hypothesis implies its contradiction or alternative:
The accused is guilty
μ ≠ 100
A hypothesis is either true or false, and you may fail to reject it or you may reject it on the basis of information:
Trial testimony and evidence
Sample data
Statistical Hypothesis Testing
The Null Hypothesis, H0
[Diagram: Sampling takes a Sample from the Population (parameters μ, σ, p). Summarising the Data gives the Statistics (x̄, s, p̂). Inference runs back to the population through a Confidence Interval or through Hypothesis Testing (H0, Ha: Reject?).]
The process of hypothesis testing
Step 1: State null and alternate hypotheses
The Concepts of
Hypothesis Testing
A test statistic is a sample statistic computed from sample data. The value
of the test statistic is used in determining whether or not we may reject the
null hypothesis.
The decision rule of a statistical hypothesis test is a rule that specifies the
conditions under which the null hypothesis may be rejected.
Consider H0: μ = 100. We may have a decision rule that says: "Reject H0 if the sample mean is less than 95 or more than 105."
In a courtroom we may say: "The accused is innocent until proven guilty beyond a reasonable doubt."
Decision Making
A decision may be correct in two ways:
Fail to reject a true H0
Reject a false H0
A decision may be incorrect in two ways:
Type I Error: Reject a true H0
The Probability of a Type I error is denoted by α (level of significance).
Type II Error: Fail to reject a false H0
The Probability of a Type II error is denoted by β (the power of the test is 1 - β).
Type I and Type II Errors
Decision Making (example)
Choosing the significance
level
If it is of paramount importance to avoid a
Type I error then we should choose a low
significance level e.g. 1%
If a Type II error is also costly then we need to
strike a balance and choose a higher
significance level
The most commonly used value is 5%
Testing Population Means
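Overview: tests of sample mean
Large Sample:
Population Standard Deviation (σ) Known: z, with standard error σ/√n
Population Standard Deviation (σ) Unknown: z, with standard error s/√n
Small Sample, Normal Population Distribution:
Population Standard Deviation (σ) Known: z, with standard error σ/√n
Population Standard Deviation (σ) Unknown: t, with standard error s/√n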
The rejection region
The rejection region is the range of values that will lead us to reject
the null hypothesis if the test statistic should fall within this region.
The rejection region is designed so that, before the sampling takes place, our test statistic will have a probability α of falling within the rejection region if the null hypothesis is true.
The non-rejection (acceptance) region consists of all values not included in the rejection region.
Example
A company that delivers packages within a large metropolitan area claims that it takes an average of 28 minutes for a package to be delivered from your door to the destination. Suppose that you want to carry out a hypothesis test of this claim.
Set the null and alternative hypotheses:
H0: μ = 28
H1: μ ≠ 28
Collect sample data:
n = 100
x̄ = 31.5
s = 5
Construct a 95% confidence interval for the average delivery times of all packages:
$\bar{x} \pm z_{.025}\frac{s}{\sqrt{n}} = 31.5 \pm 1.96\frac{5}{\sqrt{100}} = 31.5 \pm 0.98 = [30.52, 32.48]$
We can be 95% sure that the average time for all packages is between 30.52 and 32.48 minutes. Since the asserted value, 28 minutes, is not in this 95% confidence interval, we may reasonably reject the null hypothesis at the 5% significance level.
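The delivery example can be followed step by step in a few lines. This is a sketch of the confidence-interval approach used on this slide, with illustrative variable names: build the 95% interval around the sample mean, then reject H0 if the claimed mean falls outside it.

```python
import math
from statistics import NormalDist

# Sample data and the claimed mean from the delivery example.
n, x_bar, s = 100, 31.5, 5
mu_0 = 28

z = NormalDist().inv_cdf(0.975)   # two-tailed critical value for 95%, about 1.96
margin = z * s / math.sqrt(n)
ci = (x_bar - margin, x_bar + margin)

# Two-tailed test at the 5% level: reject if mu_0 lies outside the interval.
reject_h0 = not (ci[0] <= mu_0 <= ci[1])

print(f"95% CI: {ci[0]:.2f} to {ci[1]:.2f}")
print("reject H0:", reject_h0)
```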
Picturing Hypothesis Testing
[Figure: the population mean under H0 and the 95% confidence interval around the observed sample mean]
It seems reasonable to reject the null hypothesis, H0: μ = 28, since the hypothesized value lies outside the 95% confidence interval. If we're 95% sure that the population mean is between 30.52 and 32.48 minutes, it's very unlikely that the population mean is actually 28 minutes.
Note that the population mean may be 28 (the null hypothesis might be true), but then the observed sample mean, 31.5, would be a very unlikely occurrence. There's still the small chance (α = .05) that we might reject the true null hypothesis. α represents the level of significance of the test.
Relationship to confidence
intervals
If the observed sample mean falls within the non-rejection (acceptance) region, then you fail to reject the null hypothesis. Construct a 95% non-rejection region around the hypothesized population mean, and compare it with the 95% confidence interval around the observed sample mean.
The non-rejection region and the confidence interval are the same width, but centered on different points. In this instance, the non-rejection region does not include the observed sample mean, and the confidence interval does not include the hypothesized population mean.
The Decision Rule
[Figure: The Hypothesized Sampling Distribution of the Mean, with area .95 in the non-rejection region and .025 in each tail]
(1 - α)   α/2     z_{α/2}
0.98      0.010   2.326
0.95      0.025   1.960
Example
An insurance company believes that, over the last few years, the mean liability insurance per board seat in companies defined as "small companies" has been $2000. Using α = 0.01, test this hypothesis using Growth Resources, Inc. survey data (sample size 100 and sample mean $2700).
H0: μ = 2000
H1: μ ≠ 2000
n = 100
x̄ = 2700
s = 947
$z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{2700 - 2000}{947/\sqrt{100}} = 7.39$
Example : Continued
1-Tailed and 2-Tailed Tests
If action is to be taken if a parameter is either greater than or less than some value a, then the alternative hypothesis is that the parameter is not equal to a, and the test is a two-tailed test.
H0: μ = 50
H1: μ ≠ 50
The tails of a statistical test are determined by the need for an action. If action is to be taken if a parameter is greater than some value a, then the alternative hypothesis is that the parameter is greater than a, and the test is a right-tailed test.
H0: μ ≤ 50
H1: μ > 50
If action is to be taken if a parameter is less than some value a, then the alternative hypothesis is that the parameter is less than a, and the test is a left-tailed test.
H0: μ ≥ 50
H1: μ < 50
Rejection region for
different types of tests
Critical Values of z and Levels of Confidence (one-tailed tests)
Conf. level (1 - α)   α      z_α
0.99                  0.01   2.326
0.95                  0.05   1.645
0.90                  0.10   1.282
0.80                  0.20   0.842
[Figure: Standard Normal Distribution, f(z) against Z, with area α in the tail beyond z_α]
Example
An automatic bottling machine fills cola into two liter (2000 cc) bottles. A consumer advocate wants to test the null hypothesis that the average amount filled by the machine into a bottle is at least 2000 cc. A random sample of 40 bottles coming out of the machine was selected and the exact contents of the selected bottles were recorded. The sample mean was 1999.6 cc. The population standard deviation is known from past experience to be 1.30 cc.
Test the null hypothesis at the 5% significance level.
H0: μ ≥ 2000
H1: μ < 2000
n = 40
x̄ = 1999.6
σ = 1.3
For α = 0.05, the critical value of z is -1.645
The test statistic is:
$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{1999.6 - 2000}{1.3/\sqrt{40}} = -1.95$
Do not reject H0 if: [z ≥ -1.645]
Reject H0 if: [z < -1.645]
Since z = -1.95 < -1.645, reject H0.
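The bottling test can be traced in code. The sketch below is a left-tailed z test with known population standard deviation, using the numbers from this example; variable names are illustrative.

```python
import math
from statistics import NormalDist

mu_0 = 2000        # hypothesized mean under H0 (cc)
n = 40             # sample size
x_bar = 1999.6     # sample mean (cc)
sigma = 1.3        # known population standard deviation (cc)
alpha = 0.05

z = (x_bar - mu_0) / (sigma / math.sqrt(n))   # test statistic
z_crit = NormalDist().inv_cdf(alpha)          # lower-tail critical value

reject_h0 = z < z_crit                        # left-tailed decision rule

print(f"z = {z:.2f}, critical value = {z_crit:.3f}")
print("reject H0:", reject_h0)
```

A shortfall of only 0.4 cc is enough to reject H0 here because the standard error, 1.3/√40, is about 0.21 cc.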
The p-Value
The p-value is the smallest level of significance, α, at which the null hypothesis may be rejected using the obtained value of the test statistic.
Reporting the p-value allows the reader to choose her own level of significance.
The p-Value and
Hypothesis Testing
The further away in the tail of the distribution the test statistic falls, the smaller is the p-
value and, hence, the more convinced we are that the null hypothesis is false and should be
rejected.
In a right-tailed test, the p-value is the area to the right of the test statistic if the test statistic
is positive.
In a left-tailed test, the p-value is the area to the left of the test statistic if the test statistic is
negative.
In a two-tailed test, the p-value is twice the area to the right of a positive test statistic or to
the left of a negative test statistic.
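The three rules above can be sketched as one small function for a z statistic. The function name and the `tail` labels are hypothetical. As an illustration it is applied to z = -1.95, the rounded statistic from the bottling example.

```python
from statistics import NormalDist

def p_value(z, tail):
    """p-value of a z statistic for a 'right', 'left' or 'two' tailed test."""
    cdf = NormalDist().cdf
    if tail == "right":
        return 1 - cdf(z)             # area to the right of z
    if tail == "left":
        return cdf(z)                 # area to the left of z
    if tail == "two":
        return 2 * (1 - cdf(abs(z)))  # twice the area beyond |z|
    raise ValueError("tail must be 'right', 'left' or 'two'")

p_left = p_value(-1.95, "left")
p_two = p_value(-1.95, "two")

print(f"left-tailed p-value: {p_left:.4f}")
print(f"two-tailed p-value:  {p_two:.4f}")
```

The left-tailed p-value is below 0.05, matching the rejection at the 5% level, while the same statistic in a two-tailed test would not be significant at 5%.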
The p-Value: Rules of
Thumb
When the p-value is smaller than 0.01, the result is called very significant.
When the p-value is between 0.01 and 0.05, the result is called significant.
When the p-value is between 0.05 and 0.10, the result is considered by some as
marginally significant (and by most as not significant).
When the p-value is greater than 0.10, the result is considered not significant.
Testing for Differences
Applications for testing differences between samples
Difference in average running cost for different makes of vehicle
Difference in average salary between different groups of employees
Difference in profits between regions, managers, etc
Approach:
measure the difference in the average (cost, salary, etc.) of each group
calculate the sampling error as a standard error for this statistic
quantify the sampling error by estimating a confidence interval for the difference
alternatively, perform a specific hypothesis test using Excel
Comparisons of Two Population Means: Test Statistic
Large-sample test statistic for the difference between two population means:
$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)_0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$
The term (μ1 - μ2)0 is the difference between μ1 and μ2 under the null hypothesis. It is equal to zero in situations I and II, and it is equal to the prespecified value D in situation III. The term in the denominator is the standard deviation of the difference between the two sample means (it relies on the assumption that the two samples are independent). This test does not require the two population variances to be equal.
Comparison of Salaries
Female Salary (26 values):
57,000 61,300 62,000 70,100 45,600 71,200 64,700 53,800 60,900 62,700 76,400 57,900 68,200
65,800 60,300 62,600 67,000 62,700 54,700 71,400 50,400 71,800 64,100 70,400 53,100 60,900
Male Salary (24 values):
79,400 67,400 66,500 72,600 63,600 74,500 76,400 67,900 61,600 75,500 64,500 73,400
76,100 72,200 69,600 53,100 65,500 78,400 77,600 82,000 59,800 80,800 74,800 71,000
[Figure: Female and Male Salary, histogram of Frequency against Salary]
Discrimination?
Excel: Tools > Data Analysis > t-Test: Two-Sample Assuming Unequal Variances
Discrimination?
Output from Excel:
t-Test: Two-Sample Assuming Unequal Variances
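Since the Excel output itself is not reproduced here, the sketch below shows how an unequal-variances (Welch) t statistic could be computed for the salary data. It rests on an assumption: the two salary columns are interleaved in the source, and the split used here is the one consistent with the Excel summary for Female Salary (26 values, sum 1,627,000, median 62,650). The function name is hypothetical, and the p-value is left out because the standard library has no t distribution.

```python
import math
import statistics

# Assumed column assignment (see the note above).
female = [57000, 61300, 62000, 70100, 45600, 71200, 64700, 53800, 60900,
          62700, 76400, 57900, 68200, 65800, 60300, 62600, 67000, 62700,
          54700, 71400, 50400, 71800, 64100, 70400, 53100, 60900]
male = [79400, 67400, 66500, 72600, 63600, 74500, 76400, 67900, 61600,
        75500, 64500, 73400, 76100, 72200, 69600, 53100, 65500, 78400,
        77600, 82000, 59800, 80800, 74800, 71000]

def welch_t(a, b):
    """t statistic and approximate degrees of freedom, unequal variances."""
    va = statistics.variance(a) / len(a)   # squared standard error, group a
    vb = statistics.variance(b) / len(b)   # squared standard error, group b
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

t_stat, df = welch_t(female, male)
print(f"mean difference (female - male): "
      f"{statistics.fmean(female) - statistics.fmean(male):.0f}")
print(f"t = {t_stat:.2f}, degrees of freedom = {df:.1f}")
```

A large negative t statistic indicates that the female mean is below the male mean by more than sampling error alone would plausibly explain; whether that reflects discrimination depends on factors the data here do not capture.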