STAT3010: Lecture 2

Lecture 2:
REVIEW OF DISTRIBUTIONS CONTD
DISCRETE DISTRIBUTIONS:
THE BINOMIAL DISTRIBUTION
Definition
A population or process distribution for a discrete variable x
is specified by a mass function p(x) satisfying

$$p(x) \ge 0 \quad \text{and} \quad \sum_{x} p(x) = 1,$$

where the summation is over all possible x values. Other
interesting proportions can be obtained by adding various p(x)
values. In particular, if a and b are integers with a ≤ b, then
the proportion of x values between a and b (inclusive) is

$$p(a) + p(a+1) + \cdots + p(b).$$

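The mass function of x is given by the formula:

$$p(x) = \binom{n}{x}\,\pi^{x}(1-\pi)^{n-x}, \qquad x = 0, 1, \ldots, n,$$

where n is the number of trials and \pi is the probability of
success on each trial.

Table B.1, found in Appendix B (pages 609-618), can be used.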


Mean of the binomial distribution?
Variance of the binomial distribution?
Example: A variety of seed has a 40% chance of germinating. A
total of 5 seeds are planted. Find the probability of at most 2
seeds germinating.
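
The seed example can be checked numerically. The sketch below assumes the standard binomial mass function p(x) = C(n, x) π^x (1 − π)^(n − x) with n = 5 and π = 0.40; the names `binomial_pmf`, `pmf` and `p_at_most_2` are illustrative only and do not come from the lecture.

```python
from math import comb

def binomial_pmf(x, n, pi):
    """P(X = x) for n independent trials with success probability pi."""
    return comb(n, x) * pi**x * (1 - pi)**(n - x)

n, pi = 5, 0.40  # 5 seeds, each with a 40% chance of germinating

# Mass function over all possible values x = 0, 1, ..., n.
pmf = [binomial_pmf(x, n, pi) for x in range(n + 1)]

# "At most 2" means x = 0, 1 or 2, so add p(0) + p(1) + p(2).
p_at_most_2 = sum(pmf[:3])

# Mean and variance computed directly from the mass function.
mean = sum(x * p for x, p in enumerate(pmf))
variance = sum((x - mean) ** 2 * p for x, p in enumerate(pmf))

print(f"P(X <= 2) = {p_at_most_2:.5f}")
print(f"mean = {mean:.2f}, variance = {variance:.2f}")
```

The probability comes out at about 0.683. The mean and variance computed from the mass function agree with the usual closed forms nπ = 2 and nπ(1 − π) = 1.2.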

REVIEW OF HYPOTHESIS TESTS


Null and alternative hypotheses:
A null hypothesis is a hypothesis set up to be nullified or refuted
in order to support an alternative hypothesis. When used, the
null hypothesis is presumed true until statistical evidence, in the
form of a hypothesis test, indicates otherwise. Usually the
alternative hypothesis is the possibility that an observed effect is
genuine. We denote the null hypothesis as H0 and the
alternative hypothesis as Ha or H1. The final conclusion once the
test has been carried out is always given in terms of the null
hypothesis. We either "reject H0 in favour of H1" or "do not
reject H0"; we never conclude "reject H1" or even "accept H1".
Null Hypothesis ( H0 )
If the original claim includes one of =, ≤, ≥, it is the null
hypothesis. If the original claim does not include one of
=, ≤, ≥, then the null hypothesis is the complement of the
original claim. The null hypothesis always includes the
equal sign.

Alternative Hypothesis ( H1 or Ha )
Statement which is true if the null hypothesis is false. The
type of test (left, right, or two-tail) is based on the
alternative hypothesis, which will include one of ≠, <, >.
In the past you have seen several different types of hypothesis
tests, such as hypothesis tests concerning a single mean with
both large and small sample sizes, and hypothesis tests concerning
two means with either paired data, independent data or
pooled data.

Single Mean
General breakdown of hypothesis testing:
In order to perform a test of significance, it is necessary to learn
how to set up your hypotheses, compare your calculated and
critical values, and draw appropriate conclusions. To complete
this task, you will follow the six-step procedure outlined below.
1. State one of the following:
$H_0: \mu = \mu_0$ versus $H_a: \mu \ne \mu_0$

$H_0: \mu \le \mu_0$ versus $H_a: \mu > \mu_0$

$H_0: \mu \ge \mu_0$ versus $H_a: \mu < \mu_0$

2. Determine your significance level α (if not given, use α = 0.05).


3. Compute the test statistic (Z if sample size is large or t if
sample size is small).
4. Find the p-value (using table B.2 for Z) or find the t critical
value (using table B.3 for t).
5. Make a decision based on your p-value and the
significance level α for Z, or based on your t observed and the
t critical for t.
Recall: If P-value ≤ α, we reject the null hypothesis (H0)
If P-value > α, we do not reject the null hypothesis
Or (for a two-tailed test)
If t observed ≤ -t critical or t observed ≥ t critical, we reject H0
If -t critical < t observed < t critical, we do not reject H0
6. State the appropriate conclusions.
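
The decision rules in step 5 can be written as two small helper functions. This is only a sketch: the function names are hypothetical, the inputs in the two calls are made-up numbers for illustration, and the t rule shown is the two-tailed version.

```python
def decide_by_p_value(p_value, alpha=0.05):
    """Z-test rule: reject H0 when the p-value is at most alpha."""
    return "reject H0" if p_value <= alpha else "do not reject H0"

def decide_by_critical_value(t_observed, t_critical):
    """Two-tailed t rule: reject H0 when t observed is at or beyond
    -t critical or +t critical."""
    return "reject H0" if abs(t_observed) >= t_critical else "do not reject H0"

# Hypothetical inputs, not taken from any example in these notes.
print(decide_by_p_value(0.012))                # p-value below 0.05
print(decide_by_critical_value(-1.40, 2.262))  # |t| below the critical value
```

For a one-tailed test, only the critical value on the side named by the alternative hypothesis would be used.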
Single Mean (large n)


Example: The average time a patient spends in a waiting room is
20 minutes. A dentist has recently started a new
private practice and wants to estimate how long patients are
waiting, on average, in the waiting room before their
appointments. He randomly selects 50 patients for observation
and records how many minutes each waits. The mean waiting
time is 18 minutes with a standard deviation of 3.6 minutes.
Based on the data, is there statistical evidence that the mean
waiting time is significantly different from 20 minutes?
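
A minimal sketch of the calculation for this example, assuming a two-sided test of a mean of 20 minutes at α = 0.05 and the large-sample statistic z = (x̄ − μ0) / (s / √n). The p-value is computed from the standard normal distribution with Python's `statistics.NormalDist` rather than read from Table B.2.

```python
from math import sqrt
from statistics import NormalDist

mu_0 = 20.0    # hypothesised mean waiting time (minutes)
n = 50         # sample size
x_bar = 18.0   # sample mean (minutes)
s = 3.6        # sample standard deviation (minutes)
alpha = 0.05   # assumed significance level (the default from step 2)

# Large-sample test statistic.
z = (x_bar - mu_0) / (s / sqrt(n))

# Two-sided p-value: probability of a value at least as extreme as |z|.
p_value = 2 * NormalDist().cdf(-abs(z))

decision = "reject H0" if p_value <= alpha else "do not reject H0"
print(f"z = {z:.3f}, p-value = {p_value:.6f}, decision: {decision}")
```

Here z is about −3.93 and the p-value is far below 0.05, so the data give evidence that the mean waiting time differs from 20 minutes.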

Single Mean (small n)


Example: Suppose that the mean cholesterol level for males
aged 50 is 241. An investigator wishes to examine whether
cholesterol levels are significantly reduced by modifying diet
only slightly. A random sample of 12 patients agrees to
participate in the study and follow the modified diet for 3
months. After 3 months, their cholesterol levels are measured
and summary statistics are produced on the n = 12 subjects.
The mean cholesterol level in the sample is 235 with a standard
deviation of 12.5. Based on the data, is there statistical
evidence that the modified diet reduces cholesterol?
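
A minimal sketch of the small-sample calculation, assuming a lower-tailed test (Ha: μ < 241) at α = 0.05 with t = (x̄ − μ0) / (s / √n) on n − 1 = 11 degrees of freedom. The critical value 1.796 is taken from a standard t table and is hard-coded here as an assumption, since the Python standard library has no t distribution.

```python
from math import sqrt

mu_0 = 241.0   # hypothesised mean cholesterol level
n = 12         # sample size
x_bar = 235.0  # sample mean
s = 12.5       # sample standard deviation

df = n - 1
t_critical = 1.796  # one-tailed, alpha = 0.05, df = 11 (standard t table)

t_observed = (x_bar - mu_0) / (s / sqrt(n))

# Lower-tailed test: reject only when t observed is at or below -t critical.
decision = "reject H0" if t_observed <= -t_critical else "do not reject H0"
print(f"t = {t_observed:.3f}, df = {df}, decision: {decision}")
```

With t of about −1.66, which is not below −1.796, the null hypothesis is not rejected at the 5% level under these assumptions.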

Example: In a managed care organization, the mean starting salary
for males in entry-level, nonclinical positions is $29,500. We want to
determine whether the mean starting salary for females in similar
entry-level, nonclinical positions is different from $29,500. In order to
make this assessment, we randomly select 10 females whose job
titles and responsibilities fit our criteria, and we record their starting
salaries (in $1000s). The data are:
32 27 31 27 26 26 30 22 25 36
Is the mean starting salary for females significantly different from
$29,500? Use SAS.

SAS CODE:
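options LS=80 PS=60 nodate;

/* Read the salaries (in $1000s) and centre them at the null value 29.5 */
data salary;
  input x;
  y = x - 29.5;  /* testing mean(y) = 0 is the same as mean(x) = 29.5 */
  cards;
32
27
31
27
26
26
30
22
25
36
;

/* One-sample t test of H0: mean(y) = 0 */
proc ttest data=salary;
  var y;
run;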

SAS OUTPUT:

The TTEST Procedure

Statistics
              Lower CL          Upper CL  Lower CL           Upper CL
Variable   N      Mean    Mean      Mean   Std Dev  Std Dev   Std Dev
y         10    -4.197    -1.3     1.597    2.7855   4.0497    7.3932

Statistics
Variable   Std Err   Minimum   Maximum
y           1.2806      -7.5       6.5

T-Tests
Variable   DF   t Value   Pr > |t|
y           9     -1.02     0.3366
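
The main figures in the SAS output can be reproduced by hand. The sketch below is in Python rather than SAS and is only a cross-check: it recomputes the mean, standard deviation, standard error and t value of y = x − 29.5, then compares |t| with the two-tailed critical value 2.262 (α = 0.05, 9 degrees of freedom), which is hard-coded from a standard t table as an assumption.

```python
from math import sqrt
from statistics import mean, stdev

salaries = [32, 27, 31, 27, 26, 26, 30, 22, 25, 36]  # in $1000s
mu_0 = 29.5                                          # null value, in $1000s

# Same centring as in the SAS data step: y = x - 29.5.
y = [x - mu_0 for x in salaries]
n = len(y)

y_bar = mean(y)          # sample mean of y
s = stdev(y)             # sample standard deviation (n - 1 divisor)
std_err = s / sqrt(n)    # standard error of the mean
t_observed = y_bar / std_err

t_critical = 2.262  # two-tailed, alpha = 0.05, df = 9 (standard t table)
decision = "reject H0" if abs(t_observed) >= t_critical else "do not reject H0"

print(f"mean = {y_bar:.1f}, sd = {s:.4f}, se = {std_err:.4f}")
print(f"t = {t_observed:.2f}, df = {n - 1}, decision: {decision}")
```

The values agree with the listing (mean −1.3, standard deviation 4.0497, standard error 1.2806, t = −1.02), and not rejecting H0 is consistent with the reported Pr > |t| of 0.3366.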

Difference Between 2 Means


The P-values for a test of H0 against
(i) $H_a: \mu_1 > \mu_2$ is $P(T \ge t)$ (upper-tail test)
(ii) $H_a: \mu_1 < \mu_2$ is $P(T \le t)$ (lower-tail test)
(iii) $H_a: \mu_1 \ne \mu_2$ is $2P(T \ge |t|)$ (two-tail test)

Two Independent Populations (Pooled)


Example: A new curriculum has been implemented across
medical schools that is designed to improve medical students'
analytic skills. An evaluation committee is concerned that the
new curriculum may be differentially effective among male
and female students. To evaluate the new curriculum, random
samples of male and female students who completed the new
curriculum are selected and given a test to assess their analytic
skills. The following data, which denote the numbers of analytic
problems correctly solved by the students, are observed:
Statistic             Males    Females
Sample size           15       12
Mean                  15.8     12.4
Standard deviation    4.2      3.6

Use the data to test if there is a significant difference in the
mean numbers of analytic problems correctly solved by male
and female medical students by assuming equal variability in
the populations.

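$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

where

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$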
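
A minimal sketch of the pooled calculation for the curriculum example. It assumes a two-tailed test of H0: μ1 = μ2 at α = 0.05, the pooled variance s_p² = [(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2), and the statistic t = (x̄1 − x̄2) / (s_p √(1/n1 + 1/n2)). The critical value 2.060 for 25 degrees of freedom is hard-coded from a standard t table as an assumption.

```python
from math import sqrt

n1, x_bar1, s1 = 15, 15.8, 4.2  # males
n2, x_bar2, s2 = 12, 12.4, 3.6  # females

# Pooled variance: sample variances weighted by their degrees of freedom.
df = n1 + n2 - 2
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
sp = sqrt(sp2)

# Under H0 the hypothesised difference mu1 - mu2 is 0.
t_observed = ((x_bar1 - x_bar2) - 0) / (sp * sqrt(1 / n1 + 1 / n2))

t_critical = 2.060  # two-tailed, alpha = 0.05, df = 25 (standard t table)
decision = "reject H0" if abs(t_observed) >= t_critical else "do not reject H0"

print(f"sp^2 = {sp2:.4f}, t = {t_observed:.3f}, df = {df}, decision: {decision}")
```

Under these assumptions the pooled variance is about 15.58 and t is about 2.22, which exceeds 2.060, so the difference in mean scores is significant at the 5% level.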