
Biostatistics: Testing of Hypothesis


Be faithful in small things because it is in them
that your strength lies.

Mother Teresa
Introduction
Statistics plays a vital role in making the right decisions in research.
Statistical procedures enable researchers to summarize, organize, evaluate, interpret and communicate numeric information.
Focus of this presentation
Discuss commonly used terms:
Null hypothesis
Type I and Type II errors
Power, etc.
Present a few commonly used statistical tests
Columns: Age (years), Sex, BMI, Systolic BP (mm Hg), Diastolic BP (mm Hg), Hypertension
40 2 21.6 110 70 2
60 1 33.78 150 90 1
47 2 25.22 120 74 2
50 1 28 120 88 2
47 1 24.39 140 80 1
43 2 25.63 110 70 2
48 2 29.33 120 70 2
55 1 25.83 160 80 1
40 2 29.3 130 80 2
40 2 25.39 110 70 2
67 1 25.71 140 100 1
74 2 22.96 140 90 1
50 2 26.52 154 110 1
46 1 19.53 128 90 1
58 1 31.11 130 80 2
45 2 30.61 140 100 1
45 2 32.6 120 80 2
40 1 30.61 150 80 1
65 2 24.11 150 100 1
55 1 27.06 120 70 2
45 2 24.3 120 94 1
48 1 27.41 110 70 2
67 2 25.56 150 80 1
46 2 16.5 110 70 2
52 1 23.31 122 82 1
Sex: 1 = Male, 2 = Female
Hypertension: 1 = Hypertensive, 2 = Normotensive
Age-wise distribution of the 25 persons
Age (years)   Tally              Frequency   %
40-49         |||| |||| ||||     14          56
50-59         |||| |             6           24
60-69         ||||               4           16
70-80         |                  1           4
[Figure: bar diagram of the age-wise distribution of the 25 persons; x-axis: Age Group]
Sex-wise distribution of the 25 persons
Gender    Tally              Frequency   %
Male      |||| |||| |        11          44
Female    |||| |||| ||||     14          56
[Figure: bar and pie diagrams of the sex-wise distribution (Male 44%, Female 56%)]
Hypertension status by Gender
                        Male            Female
Hypertension status     No.    %        No.    %
Hypertensive            7      63.6     6      42.9
Normotensive            4      36.4     8      57.1
[Figures: multiple bar diagrams (numbers and percentages) and a component bar diagram of hypertension status (hypertensive/normotensive) by sex]
Descriptive analysis
In the illustrative data,
$$ \text{Mean age} = \frac{40 + 60 + \dots + 46 + 52}{25} = 50.92 \text{ years} $$
Median age is 48 years.
Mode age is 40 years.
Standard deviation = 9.48 years
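The descriptive figures above can be reproduced with Python's standard `statistics` module. This is a minimal sketch. It assumes the ages of the 25 persons are the 25 distinct rows of the data table, which are the rows that match the 25-person frequency tables. Note that `statistics.stdev` divides by n - 1, and that is the divisor that gives 9.48 here.

```python
import statistics

# Ages (years) of the 25 persons in the illustrative data table
ages = [40, 60, 47, 50, 47, 43, 48, 55, 40, 40, 67, 74, 50,
        46, 58, 45, 45, 40, 65, 55, 45, 48, 67, 46, 52]

mean_age = statistics.mean(ages)      # arithmetic mean
median_age = statistics.median(ages)  # middle (13th) value of the sorted ages
mode_age = statistics.mode(ages)      # most frequent age
sd_age = statistics.stdev(ages)       # sample SD, divisor n - 1

print(f"mean = {mean_age:.2f}, median = {median_age}, "
      f"mode = {mode_age}, SD = {sd_age:.2f}")
```

With population SD (`statistics.pstdev`, divisor n) the value would be smaller, about 9.29 years, so it is worth stating which divisor a reported SD uses.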
Scatter diagram
[Figure: scatter diagram of diastolic blood pressure (y-axis) against systolic blood pressure (x-axis)]
Correlation coefficient between systolic and diastolic blood pressure:
$$ r = \frac{107.32}{\sqrt{245.73 \times 121.98}} = \frac{107.32}{173.13} = 0.619 \approx 0.62 $$
What is a Hypothesis?
A statement about the study objective.
A hypothesis is an assumption about the population parameter.
A parameter is a characteristic of the population, like its mean or variance.
The parameter must be identified before analysis.
Example: "I assume the mean age of the participants is 25 years."
Testing of Hypothesis (or) Statistical Tests
The testing of hypothesis consists of 5 major steps:
Formulating the null hypothesis
Formulating the alternative hypothesis
Fixing the level of significance
Applying the critical ratio / formula / statistical test
Making the inference
Step 1: Null Hypothesis
An unbiased statement about the study objective.
The null hypothesis indicates a neutral position in the given study or experiment, i.e. it states that there is no difference, no effect or no association.
Step 2: Alternative Hypothesis
A statement against the null hypothesis.
It can take two forms: without any specific direction, or with a particular direction.
An alternative hypothesis without any specific direction is called a two-sided alternative hypothesis.
An alternative hypothesis with a specific direction is called a one-sided alternative hypothesis.
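The direction of the alternative hypothesis changes the table value used in the examples. The sketch below recovers the Z table values from the standard normal distribution with Python's `statistics.NormalDist` (standard library, Python 3.8+):

- two-sided test: 1.96
- one-sided test: 1.645, quoted as 1.64 in the examples

It is a minimal illustration, and the helper name `critical_z` is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal distribution: mean 0, SD 1

def critical_z(alpha: float, two_sided: bool = True) -> float:
    """Z table value for significance level alpha (hypothetical helper)."""
    # A two-sided test splits alpha equally between the two tails;
    # a one-sided test puts all of alpha into a single tail.
    tail_area = alpha / 2 if two_sided else alpha
    return std_normal.inv_cdf(1 - tail_area)

print(round(critical_z(0.05), 2))                   # two-sided, alpha = 0.05
print(round(critical_z(0.05, two_sided=False), 3))  # one-sided, alpha = 0.05
print(round(critical_z(0.01), 2))                   # two-sided, alpha = 0.01
```

This prints 1.96, 1.645 and 2.58.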
Step 3: Level of Significance

To understand the level of significance, first recall the concepts of population and sample.
Population and Sample
Population: use parameters to summarize features.
Sample: use statistics to summarize features.
Inference is made on the population from the sample.
Errors
In any study, the researcher makes a decision based on the sample result.
The researcher does not know the true situation in the population.
Since the decision is based on the sample, there is a possibility of committing errors.
This is explained as follows:

Decision made by the researcher          Actual situation in the population (unknown):
(test statistic calculated on the        the null hypothesis is
known sample)                            True                       False
Null hypothesis accepted                 1. Correct decision        3. Wrong (Type II error)
Null hypothesis rejected                 2. Wrong (Type I error)    4. Correct decision
Type I error: rejecting the null hypothesis when it is, in fact, true. Its probability is usually denoted by the Greek letter α (alpha).
Type II error: accepting the null hypothesis when it is, in fact, false. Its probability is denoted by β (beta).
α and β have an inverse relationship: for a fixed sample size, reducing the probability of one error makes the other one go up.
Factors Affecting Type II Error, β
True value of population parameter: β increases when the difference between the hypothesized parameter and the true value decreases.
Significance level α: β increases when α decreases.
Population standard deviation σ: β increases when σ increases.
Sample size n: β increases when n decreases.
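These four factors can be checked numerically. This sketch assumes a one-sided (upper-tail) Z test with a known population SD. For that test, β = Φ(z(1 - α) - δ√n/σ), where:

- Φ is the standard normal CDF.
- δ is the difference between the true mean and the hypothesized mean.

The helper `type2_error` and the baseline values are hypothetical. Power is 1 - β.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()

def type2_error(delta: float, sigma: float, n: int, alpha: float) -> float:
    """Beta for a one-sided, upper-tail Z test with known sigma (assumed setting).

    delta: true mean minus hypothesized mean (positive)
    sigma: population standard deviation
    n:     sample size
    alpha: significance level
    """
    z_crit = std_normal.inv_cdf(1 - alpha)
    # H0 is (wrongly) accepted when the test statistic stays below z_crit;
    # under the true mean the statistic is centred at delta * sqrt(n) / sigma.
    return std_normal.cdf(z_crit - delta * sqrt(n) / sigma)

base = dict(delta=0.5, sigma=1.0, n=25, alpha=0.05)
print("baseline beta:", round(type2_error(**base), 3))
for change in (dict(delta=0.2), dict(alpha=0.01), dict(sigma=2.0), dict(n=10)):
    print(change, "beta:", round(type2_error(**{**base, **change}), 3))
```

Each of the four changes (smaller difference, smaller α, larger σ, smaller n) gives a larger β than the baseline.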
Common significance levels
The two most frequently used significance levels are 0.05 and 0.01.
With a 0.05 significance level, we accept the risk that, if the null hypothesis is true and 100 samples are drawn from the population, it would be wrongly rejected in about 5 of them.
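To see what "about 5 out of 100" means, the sketch below simulates many samples from a population where the null hypothesis is true. It counts how often a two-sided Z test at the 0.05 level rejects the null hypothesis. The population (mean 0, known SD 1), the sample size of 30, the number of trials and the seed are all assumed values chosen for illustration.

```python
import random
from math import sqrt
from statistics import fmean

random.seed(2024)                 # fixed seed so the run is reproducible
n, trials, z_table = 30, 20_000, 1.96
mu0, sigma = 0.0, 1.0             # H0 is TRUE: samples really come from mean mu0

rejections = 0
for _ in range(trials):
    sample = [random.gauss(mu0, sigma) for _ in range(n)]
    z = (fmean(sample) - mu0) / (sigma / sqrt(n))
    if abs(z) > z_table:          # two-sided test at the 5% level
        rejections += 1

type1_rate = rejections / trials
print(f"True H0 rejected in {type1_rate:.1%} of samples")
```

The printed rate should be close to 5%, which is α itself.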
The statistical tests are classified under the following six situations:

I. Comparison of sample mean with population mean.
II. Comparison of two sample means.
III. Comparison of more than two sample means simultaneously.
IV. Comparison of sample percentage with population percentage.
V. Comparison of two sample percentages.
VI. Finding any association or relationship between two or more variables.
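Common statistical tests / procedures
Situation I
Comparison of sample mean and population mean.
If sample size > 30:
$$ Z = \frac{\text{Sample mean} - \text{Population mean}}{\text{SE of mean}} = \frac{\bar{x} - \mu}{SD/\sqrt{n}} $$
which follows the normal distribution.
If sample size < 30:
$$ t = \frac{\text{Sample mean} - \text{Population mean}}{\text{SE of mean}} = \frac{\bar{x} - \mu}{SD/\sqrt{n}} $$
which follows the t distribution with n - 1 degrees of freedom.
Here,
$$ SD = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} $$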
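A minimal sketch of the Situation I statistic is below. The arithmetic is the same for Z and t; only the reference distribution differs. The function name and the three-value sample are hypothetical and chosen so the answer is easy to check by hand (x̄ = 7, SD = 2, so t = √3).

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_statistic(sample, pop_mean):
    """(Sample mean - Population mean) / (SD / sqrt(n)) and its degrees of freedom.

    Read the result as Z when n > 30, or as t with n - 1 degrees of
    freedom when n < 30.
    """
    n = len(sample)
    sd = stdev(sample)            # divisor n - 1, as in the SD formula above
    return (mean(sample) - pop_mean) / (sd / sqrt(n)), n - 1

stat, df = one_sample_statistic([5, 7, 9], pop_mean=5)
print(f"t = {stat:.3f} with {df} degrees of freedom")
```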
Situation II
Comparison between two sample means
If the two samples are independent and the sample size is > 30:
$$ Z = \frac{\text{Sample mean}_1 - \text{Sample mean}_2}{\sqrt{SD_1^2/n_1 + SD_2^2/n_2}} $$
which follows the normal distribution, where n1 and n2 are the sizes of samples 1 and 2.
If the two samples are independent and the sample size is < 30:
$$ t = \frac{\text{Sample mean}_1 - \text{Sample mean}_2}{S\sqrt{1/n_1 + 1/n_2}} $$
which follows the t distribution with n1 + n2 - 2 degrees of freedom, where
$$ S = \sqrt{\frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}} $$
S1 and S2 are the SDs of samples 1 and 2, and n1 and n2 are the sizes of samples 1 and 2. (This is frequently called the independent Student's t test.)
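The independent t test with the pooled SD S can be sketched as follows. The two small samples are hypothetical. Both have SD 1, so S = 1 and t = -3/√(2/3) ≈ -3.674 with 4 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

def independent_t(sample1, sample2):
    """Independent (Student's) t test using the pooled SD S."""
    n1, n2 = len(sample1), len(sample2)
    s1, s2 = stdev(sample1), stdev(sample2)
    # Pooled SD: S = sqrt(((n1 - 1) S1^2 + (n2 - 1) S2^2) / (n1 + n2 - 2))
    s = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    t = (mean(sample1) - mean(sample2)) / (s * sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

t, df = independent_t([1, 2, 3], [4, 5, 6])
print(f"t = {t:.3f} with {df} degrees of freedom")
```

Pooling assumes the two population variances are equal. When they are not, this version of the test can mislead.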
If the two samples are related (correlated) and the sample size is < 30:
$$ t = \frac{\bar{d}}{S_d/\sqrt{n}}, \qquad S_d = \sqrt{\frac{\sum (d_i - \bar{d})^2}{n - 1}} $$
which follows the t distribution with n - 1 degrees of freedom, where:
- d̄ is the mean of the differences between the two sets of sample observations,
- S_d is the SD of these differences,
- S_d/√n is the standard error of the mean difference,
- n is the sample size.
Note: Correlated samples refer to situations such as pre- and post-tests, before and after treatment, matched samples, etc.
This t test is usually called the paired t test.
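The paired t test works on the differences only, as this minimal sketch shows. The before and after scores for three patients are hypothetical. The differences are 2, 1 and 3, so d̄ = 2, S_d = 1 and t = 2√3 ≈ 3.464 with 2 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before, after):
    """Paired t test on the differences d = before - after."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    d = [b - a for b, a in zip(before, after)]
    n = len(d)
    s_d = stdev(d)                # SD of the differences, divisor n - 1
    return mean(d) / (s_d / sqrt(n)), n - 1

# Hypothetical pain scores of 3 patients before and after treatment
t, df = paired_t(before=[10, 12, 14], after=[8, 11, 11])
print(f"t = {t:.3f} with {df} degrees of freedom")
```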
Situation IV
Comparison between sample percentage and population percentage
(i) If sample size > 30:
$$ Z = \frac{p - P}{SE(P)} = \frac{p - P}{\sqrt{PQ/N}} $$
which follows the normal distribution, where p is the sample %, P is the population %, Q = 100 - P and N is the sample size.
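Situation V
Comparison between two sample percentages
If sample size > 30:
$$ Z = \frac{P_1 - P_2}{\sqrt{PQ\,(1/n_1 + 1/n_2)}} $$
which follows the normal distribution. Here P1 and P2 are the two sample percentages, n1 and n2 are the sample sizes,
$$ P = \frac{n_1 P_1 + n_2 P_2}{n_1 + n_2} $$
and Q = 100 - P.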
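A minimal sketch of the Situation V statistic, working directly in percentages as the formula does, is below. The figures (60% of 100 patients against 40% of another 100) are hypothetical. They give P = Q = 50 and Z = 20/√50 ≈ 2.83.

```python
from math import sqrt

def two_percentage_z(p1, n1, p2, n2):
    """Z for comparing two sample percentages (p1, p2 given in %)."""
    p = (n1 * p1 + n2 * p2) / (n1 + n2)   # pooled percentage P
    q = 100 - p                            # Q = 100 - P
    return (p1 - p2) / sqrt(p * q * (1 / n1 + 1 / n2))

# Hypothetical: 60% of 100 patients versus 40% of another 100 patients
z = two_percentage_z(60, 100, 40, 100)
print(round(z, 2))
```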
Situation VI
Relationship or association between two variables
(i) If the two variables are quantitative, the Pearson correlation coefficient r is used to find the relationship.
To test the calculated correlation coefficient, the following formula is used:
$$ t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}} $$
which follows the t distribution with n - 2 degrees of freedom.
(ii) If the two variables are qualitative and the sample size is > 30, the chi-square test is used to find the association:
$$ \chi^2 = \sum_{i=1}^{m} \sum_{j=1}^{n} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} $$
which follows the chi-square distribution, where O_ij is the (i, j)th observed value and E_ij is the (i, j)th expected value.
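Part (i) can be sketched in two steps: compute Pearson's r from the deviations about the means, then convert it to t with n - 2 degrees of freedom. The paired measurements x and y are hypothetical. They give r = 6/√60 ≈ 0.775 and t = √4.5 ≈ 2.12 with 3 degrees of freedom.

```python
from math import sqrt
from statistics import fmean

def pearson_r(x, y):
    """Pearson correlation coefficient r between two quantitative variables."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def correlation_t(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r**2), n - 2

x = [1, 2, 3, 4, 5]    # hypothetical paired measurements
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
t, df = correlation_t(r, len(x))
print(f"r = {r:.3f}, t = {t:.3f} with {df} degrees of freedom")
```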
Example 1
The prevalence rate of neck pain among computer professionals is 20%. A BPT student wants to verify whether this is true. For this, the student selects 400 software professionals with a minimum experience of 3 years. He conducts a survey and finds that 93 of them have neck pain. With this evidence, what conclusion will the student reach?
Solution

Here, the situation is a comparison between a sample percentage and a population percentage, and the sample size is > 30.
Step 1: Null hypothesis
Sample % = population %, i.e. the sample has come from a population with 20% neck pain among computer professionals.
The prevalence rate of neck pain among the sampled computer professionals is equal to the prevalence rate among all computer professionals.
Step 2: Alternative Hypothesis
Since no specific direction is mentioned in the problem, the researcher can use a two-sided alternative hypothesis.
Sample % is not equal to population %.
The prevalence rate of neck pain among the sampled computer professionals is different from that among all computer professionals.
Step 3: Level of Significance
Since no specific level of significance is mentioned in the problem, the researcher can select a 5% level of significance, i.e. α = 0.05. The corresponding (two-sided) table value is 1.96.
Step 4: Critical ratio
$$ Z = \frac{p - P}{SE(P)} = \frac{p - P}{\sqrt{PQ/N}} $$
Here the population % is P = 20%, so Q = 80%.
Sample %: p = 93/400 × 100 ≈ 23.3%
$$ SE(P) = \sqrt{\frac{20 \times 80}{400}} = 2 $$
$$ Z = \frac{23.3 - 20}{2} = \frac{3.3}{2} = 1.65 $$
Normal Curve
Characteristics of the normal curve:
Bell shaped and symmetric.
Mean, median and mode are all equal.
Mean ± 1 SD contains approximately 68.3% of the observations.
Mean ± 2 SD contains approximately 95.4% of the observations.
Mean ± 3 SD contains approximately 99.7% of the observations.
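The three coverage figures come straight from the standard normal distribution. This minimal sketch computes them with `statistics.NormalDist` from Python's standard library. The share within mean ± k SD is Φ(k) - Φ(-k).

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1
# Share of observations within mean ± k SD is Phi(k) - Phi(-k)
coverage = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}
for k, share in coverage.items():
    print(f"Mean ± {k} SD: {share:.2%}")
```

This prints about 68.27%, 95.45% and 99.73%.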
Step 5: Interpretation

Since the critical ratio 1.65 < table value 1.96, accept the null hypothesis.
Conclusion
The researcher may conclude that there is no evidence that the prevalence rate of neck pain among computer professionals differs from 20%.
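Example 1 can be checked with a few lines of Python. This sketch uses the 93 cases from Step 4. At full precision p = 23.25% and Z = 1.625. The slide rounds p to 23.3% first, which gives 1.65. Either way Z stays below the two-sided table value of 1.96.

```python
from math import sqrt

P, N = 20.0, 400          # population % and sample size (Example 1)
Q = 100 - P
cases = 93                # professionals with neck pain in the sample (Step 4)
p = cases / N * 100       # sample %

se = sqrt(P * Q / N)      # SE(P) = sqrt(PQ/N)
z = (p - P) / se
reject_h0 = abs(z) > 1.96 # two-sided test at the 5% level
print(f"p = {p:.2f}%, SE = {se:.2f}, Z = {z:.3f}, reject H0: {reject_h0}")
```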
Example 2

A physiotherapist is interested in knowing whether moist heat + isometric exercise + tapping is effective in reducing pain in osteoarthritis of the knee. For this, 80 osteoarthritis patients were selected.

- Treatment A: 30 of them were treated with moist heat + isometric exercise. Their average pain was found to be 3.90 with an SD of 1.37.
- Treatment B: the remaining 50 were treated with moist heat + isometric exercise + tapping. Their average pain level was found to be 1.70 with an SD of 0.67.

Based on these results, can the physiotherapist conclude that treatment B is more effective than treatment A at the 5% level of significance? (Pain is assessed on a numerical pain scale: the lower the score, the less the pain.)
Solution

To do the statistical test, first identify the situation and then follow the five steps. This design is clearly a comparative study between two sample means, and the sample size is > 30.
Step 1

Null hypothesis
Treatment A = Treatment B
i.e. there is no significant difference between moist heat + isometric exercise and moist heat + isometric exercise + tapping with respect to the average pain level after treatment.
Step 2
Alternative Hypothesis
Treatment B > Treatment A
Treatment B is more effective than treatment A,
i.e. the mean pain level after treatment B is less than the mean pain level after treatment A.
Step 3

Level of Significance
It is given in the problem, i.e. α = 0.05. Since the alternative hypothesis is one-sided, the table value is 1.64.
Step 4

Critical ratio / Formula
$$ Z = \frac{\text{Mean}_1 - \text{Mean}_2}{\sqrt{SD_1^2/n_1 + SD_2^2/n_2}} = \frac{3.9 - 1.7}{\sqrt{(1.37)^2/30 + (0.67)^2/50}} = \frac{2.2}{0.267} \approx 8.2 $$
Step 5: Interpretation

Since the critical ratio 8.2 > table value 1.64, reject the null hypothesis and accept the alternative hypothesis. With 95% confidence, treatment B is more effective than treatment A.
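The Example 2 arithmetic can be verified with this minimal sketch, which uses only the summary statistics given in the problem. At full precision the SE is about 0.2675 and Z is about 8.23, far above the one-sided table value of 1.64.

```python
from math import sqrt

mean_a, sd_a, n_a = 3.90, 1.37, 30   # Treatment A
mean_b, sd_b, n_b = 1.70, 0.67, 50   # Treatment B

se_diff = sqrt(sd_a**2 / n_a + sd_b**2 / n_b)   # SE of the difference in means
z = (mean_a - mean_b) / se_diff      # positive when B leaves less pain
reject_h0 = z > 1.64                 # one-sided table value from Step 3
print(f"SE = {se_diff:.4f}, Z = {z:.2f}, reject H0: {reject_h0}")
```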
Example 3

Objective of the Study
Find out whether any association exists between obesity and hypertension.
Data
140 obese persons were selected, and 85 of them have hypertension.
110 non-obese persons were selected, and among them 42 are suffering from hypertension.
Question
Can we conclude with 99% confidence that there is an association between obesity and hypertension?
Solution

These data can be analysed in two ways:
Comparison between two sample percentages, and
Finding the association using the chi-square test.
Here, the second way is used to analyse the data.
Null hypothesis
There is no association between obesity
and hypertension.
Alternative hypothesis
An association exists between obesity and
hypertension.
Level of Significance

Since the question asks for 99% confidence, α = 0.01. The corresponding chi-square table value is 6.63 for d.f. = 1 (refer to the table in the appendix).
Critical ratio
$$ \chi^2 = \sum \frac{(O_{ij} - E_{ij})^2}{E_{ij}} $$
where the Os are called observed values and the Es are called expected values.

               Hypertension   Normal   Total
Obese          85             55       140
Non-obese      42             68       110
Total          127            123      250
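The given data are known as observed values.
Expected values are calculated by
$$ E = \frac{R_T \times C_T}{G_T} $$
where R_T is the row total, C_T is the column total and G_T is the grand total.
$$ E_{11} = \frac{140 \times 127}{250} = 71.12, \qquad E_{12} = \frac{140 \times 123}{250} = 68.88 $$
$$ E_{21} = \frac{110 \times 127}{250} = 55.88, \qquad E_{22} = \frac{110 \times 123}{250} = 54.12 $$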
Row   Column   Observed value (O)   Expected value (E)   O - E     (O - E)²   (O - E)²/E
1     1        85                   71.12                13.88     192.65     2.71
1     2        55                   68.88                -13.88    192.65     2.80
2     1        42                   55.88                -13.88    192.65     3.45
2     2        68                   54.12                13.88     192.65     3.56
Total                                                                         12.52
χ² = 12.52, d.f. = (R - 1) × (C - 1) = (2 - 1) × (2 - 1) = 1 × 1 = 1
Interpretation

Since the calculated value 12.52 > table value 6.63, reject the null hypothesis and accept the alternative hypothesis.
Conclusion
With 99% confidence, there is an association between obesity and hypertension.
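This minimal sketch recomputes the expected values and χ² for Example 3 from the observed 2 × 2 table. Without intermediate rounding χ² is about 12.51. The slide's 12.52 comes from summing the rounded terms.

The sketch also analyses the same table as two sample percentages (Situation V), the first of the two ways mentioned in the Solution. For a 2 × 2 table without continuity correction, Z² equals χ², so both routes reach the same decision.

```python
from math import sqrt

# Observed 2 x 2 table from Example 3:
# rows = obese / non-obese, columns = hypertensive / normotensive
observed = [[85, 55],
            [42, 68]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand_total   # E = (R_T x C_T) / G_T
        chi_square += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)
reject_h0 = chi_square > 6.63          # table value for alpha = 0.01, d.f. = 1

# Same table as two sample percentages (Situation V)
p1 = observed[0][0] / row_totals[0] * 100   # % hypertensive among obese
p2 = observed[1][0] / row_totals[1] * 100   # % hypertensive among non-obese
p = col_totals[0] / grand_total * 100       # pooled P = (n1 P1 + n2 P2) / (n1 + n2)
z = (p1 - p2) / sqrt(p * (100 - p) * (1 / row_totals[0] + 1 / row_totals[1]))
print(f"chi-square = {chi_square:.2f}, d.f. = {df}, Z^2 = {z**2:.2f}")
```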
You Need to Know
How to turn a question into hypotheses
Failing to reject the null hypothesis DOES NOT
mean that the null is true
Every test has assumptions
A statistician can check all the assumptions
If the data do not meet the assumptions, there are non-parametric versions of the tests (see text)

Avoid Common Mistakes:
Hypothesis Testing
If you have paired data, use a paired test
If you don't, you can lose power
If you do NOT have paired data, do NOT use a paired test
You can draw the wrong inference

Common Mistakes:
Hypothesis Testing
These tests assume that the observations are independent
Taking multiple samples per subject?
Different statistical analyses MUST be used
Distribution of the observations
Examine a histogram of the observations
Highly skewed data analysed with a t test can give incorrect results
Common Mistakes:
Hypothesis Testing
Assuming equal variances when the variances are not equal
Not showing the test for equal variances
The variance test itself is not a very good test
ALWAYS graph your data first to assess symmetry and variance
