
INTRODUCTION TO

BIOSTATISTICS

Dr. S. Shaffi Ahamed
Asst. Professor
Dept. of Family and Comm. Medicine
KKUH
This session covers:

• Origin and development of Biostatistics
• Definition of Statistics and Biostatistics
• Reasons to know about Biostatistics
• Types of data
• Graphical representation of data
• Frequency distribution of data
“Statistics is the science which deals
with collection, classification and
tabulation of numerical facts as the
basis for explanation, description
and comparison of phenomena.”

------ Lovitt
Origin and development of
statistics in Medical Research
• In 1929, a large paper on the application of
statistics was published in Physiology
Journal by Dunn.
• In 1937, 15 articles on statistical methods
by Austin Bradford Hill were published in
book form.
• In 1948, an RCT of streptomycin for
pulmonary TB was published, in which
Bradford Hill had a key influence.
• From 1952 to 1982, the use of statistics in
medicine grew 8-fold.
[Portraits: C.R. Rao, Douglas Altman, Ronald Fisher, Karl Pearson, Gauss]
“BIOSTATISTICS”
• (1) Statistics arising out of biological
sciences, particularly from the fields of
medicine and public health.
• (2) The methods used in dealing with
statistics in the fields of medicine, biology
and public health for planning and
conducting investigations in these branches
and analyzing the data which arise from them.
Reasons to know about
biostatistics:
• Medicine is becoming increasingly
quantitative.
• The planning, conduct and interpretation
of much of medical research are
becoming increasingly reliant on
statistical methodology.
• Statistics pervades the medical literature.
Example: Evaluation of Penicillin (treatment
A) vs Penicillin & Chloramphenicol
(treatment B) for treating bacterial
pneumonia in children < 2 yrs.
• What sample size is needed to demonstrate a significant
difference between the two groups?
• Is treatment A better than treatment B, or vice versa?
• If so, how much better?
• What is the normal variation in clinical measurement (mild,
moderate & severe)?
• How reliable and valid is the measurement (clinical &
radiological)?
• What is the magnitude and effect of laboratory and technical
error?
• How does one interpret abnormal values?
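The sample size question can be sketched with the usual normal-approximation formula for comparing two proportions. The cure rates used here (70% for treatment A, 85% for treatment B), the 5% two-sided significance level and the 80% power are hypothetical values chosen only for illustration; none of them comes from the study mentioned in the example.

```python
import math
from statistics import NormalDist


def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Approximate patients needed PER GROUP to detect p1 vs p2.

    Normal-approximation formula:
    n = (z_(1-alpha/2) + z_power)^2 * [p1(1-p1) + p2(1-p2)] / (p1 - p2)^2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_power = NormalDist().inv_cdf(power)
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_power) ** 2 * variance_sum / (p1 - p2) ** 2
    return math.ceil(n)  # round up to whole patients


# Hypothetical cure rates: treatment A = 70%, treatment B = 85%
n_per_group = sample_size_two_proportions(0.70, 0.85)
print("Patients needed per group:", n_per_group)
```

A smaller expected difference between the treatments raises the required sample size sharply, which is one reason a biostatistician is best consulted at the planning stage.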
CLINICAL MEDICINE

• Documentation of medical history of
diseases.
• Planning and conduct of clinical studies.
• Evaluating the merits of different
procedures.
• Providing methods for definition of
“normal” and “abnormal”.
PREVENTIVE MEDICINE

• To provide the magnitude of any health
problem in the community.
• To find out the basic factors underlying
ill-health.
• To evaluate the health programs which
were introduced in the community
(success/failure).
• To introduce and promote health
legislation.
WHAT DOES STATISTICS
COVER?
Planning
Design
Execution (Data collection)
Data Processing
Data analysis
Presentation
Interpretation
Publication
HOW A “BIOSTATISTICIAN”
CAN HELP?
• Design of study
• Sample size & power calculations
• Selection of sample and controls
• Designing a questionnaire
• Data management
• Choice of descriptive statistics & graphs
• Application of univariate and multivariate
statistical analysis techniques
INVESTIGATION

Data Collection

Data Presentation
- Tabulation
- Diagrams
- Graphs
Descriptive Statistics
- Measures of Location
- Measures of Dispersion
- Measures of Skewness & Kurtosis
Inferential Statistics
- Estimation: Point estimate, Interval estimate
- Hypothesis Testing: Univariate analysis, Multivariate analysis
TYPES OF DATA

• QUALITATIVE DATA
• DISCRETE QUANTITATIVE
• CONTINUOUS QUANTITATIVE
QUALITATIVE

Nominal
Example: Sex ( M, F)
Exam result (P, F)
Blood Group (A,B, O or AB)
Color of Eyes (blue, green,
brown, black)
Ordinal
Example:
Response to treatment
(poor, fair, good)
Severity of disease
(mild, moderate, severe)
Income status (low, middle,
high)
QUANTITATIVE (DISCRETE)

Example: The no. of family members
The no. of heart beats
The no. of admissions in a day

QUANTITATIVE (CONTINUOUS)

Example: Height, Weight, Age, BP, Serum
Cholesterol and BMI
Discrete data -- Gaps between possible values
(example: Number of Children)

Continuous data -- Theoretically,
no gaps between possible values
(example: Hb)
CONTINUOUS DATA grouped as DISCRETE DATA

Wt. (in kg): under wt., normal & over wt.
Ht. (in cm): short, medium & tall
Table 1 Distribution of blunt injured patients
according to hospital length of stay

Hospital length of stay    Number    Percent
1 – 3 days                   5891       43.3
4 – 7 days                   3489       25.6
2 weeks                      2449       18.0
3 weeks                       813        6.0
1 month                       417        3.1
More than 1 month             545        4.0
Total                       13604      100.0

Mean = 7.85 SE = 0.10
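The percent column of a frequency table is each count divided by the sum of the counts. A minimal sketch using the six counts listed in Table 1 follows; those counts add up to 13,604, and the tabulated percentages correspond to that total. The variable names are illustrative.

```python
# Counts from Table 1 (blunt injured patients by hospital length of stay)
counts = {
    "1 - 3 days": 5891,
    "4 - 7 days": 3489,
    "2 weeks": 2449,
    "3 weeks": 813,
    "1 month": 417,
    "More than 1 month": 545,
}

total = sum(counts.values())

# Relative frequency of each class, as a percentage rounded to one decimal
percents = {label: round(100 * n / total, 1) for label, n in counts.items()}

for label, n in counts.items():
    print(f"{label:<20}{n:>7}{percents[label]:>8.1f}")
print(f"{'Total':<20}{total:>7}{sum(percents.values()):>8.1f}")
```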
Scale of measurement
Qualitative variable:
A categorical variable

Nominal (classificatory) scale
- gender, marital status, race

Ordinal (ranking) scale
- severity scale, good/better/best
Scale of measurement
Quantitative variable:
A numerical variable: discrete; continuous

Interval scale:
Data are placed in meaningful intervals and order. The units of
measurement are arbitrary.
- Temperature: equal intervals are equal (37 °C − 36 °C = 38 °C − 37 °C),
  but there is no implication of ratio (30 °C is not twice as hot as 15 °C)

Ratio scale:
Data are presented in frequency distribution in
logical order. A meaningful ratio exists.
- Age, weight, height, pulse rate
- A pulse rate of 120 is twice as fast as 60
- A person with a weight of 80 kg is twice as heavy
as one with a weight of 40 kg.
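The difference between the two scales can be checked with a little arithmetic. The sketch below converts Celsius to kelvin (an absolute temperature scale, brought in here only for illustration and not part of the slides) to show why a ratio of Celsius readings has no physical meaning, while ratios of pulse rate or weight do.

```python
def celsius_to_kelvin(celsius):
    """Convert Celsius to kelvin, a scale with a true zero point."""
    return celsius + 273.15


# Interval property: equal differences mean the same thing anywhere on the scale
diff_low = 37 - 36
diff_high = 38 - 37
print("Differences equal:", diff_low == diff_high)

# No ratio property: 30/15 = 2 on the Celsius scale, but on an absolute
# scale the two temperatures differ by only about 5%
naive_ratio = 30 / 15
kelvin_ratio = celsius_to_kelvin(30) / celsius_to_kelvin(15)
print(f"Celsius ratio: {naive_ratio:.2f}, kelvin ratio: {kelvin_ratio:.3f}")

# Ratio scale: zero means "none", so ratios are meaningful
pulse_ratio = 120 / 60   # twice as fast
weight_ratio = 80 / 40   # twice as heavy
print(pulse_ratio, weight_ratio)
```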
Scales of Measure

• Nominal - qualitative classification of
equal value: gender, race, color, city
• Ordinal - qualitative classification
which can be rank ordered:
socioeconomic status of families
• Interval - numerical or quantitative
data which can be rank ordered and sizes
compared: temperature
• Ratio - quantitative interval data along
with a meaningful ratio: time, age.
Frequency Distributions

• data distribution: pattern of variability
• the center of a distribution
• the ranges
• the shapes
• simple frequency distributions
• grouped frequency distributions
• midpoint
Tabulate the hemoglobin values of 30 adult
male patients listed below

Patient No   Hb (g/dl)   Patient No   Hb (g/dl)   Patient No   Hb (g/dl)
 1           12.0        11           11.2        21           14.9
 2           11.9        12           13.6        22           12.2
 3           11.5        13           10.8        23           12.2
 4           14.2        14           12.3        24           11.4
 5           12.3        15           12.3        25           10.7
 6           13.0        16           15.7        26           12.5
 7           10.5        17           12.6        27           11.8
 8           12.8        18            9.1        28           15.1
 9           13.2        19           12.9        29           13.4
10           11.2        20           14.6        30           13.1
Steps for making a
table
Step 1 Find Minimum (9.1) & Maximum (15.7)

Step 2 Calculate difference 15.7 – 9.1 = 6.6

Step 3 Decide the number and width of
the classes (7 classes): 9.0 – 9.9, 10.0 – 10.9, ...

Step 4 Prepare dummy table:
Hb (g/dl), Tally marks, No. of patients
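The four steps can be sketched in a few lines of code using the 30 Hb values listed earlier. The class width of 1 g/dl follows the classes given in Step 3; the variable names are illustrative.

```python
import math
from collections import Counter

# Hb values (g/dl) of the 30 adult male patients, in patient-number order
hb = [12.0, 11.9, 11.5, 14.2, 12.3, 13.0, 10.5, 12.8, 13.2, 11.2,
      11.2, 13.6, 10.8, 12.3, 12.3, 15.7, 12.6, 9.1, 12.9, 14.6,
      14.9, 12.2, 12.2, 11.4, 10.7, 12.5, 11.8, 15.1, 13.4, 13.1]

# Step 1: minimum and maximum
lowest, highest = min(hb), max(hb)

# Step 2: difference (range), rounded to the precision of the data
value_range = round(highest - lowest, 1)

# Step 3: classes of width 1 g/dl, from 9.0 - 9.9 up to 15.0 - 15.9
class_starts = range(math.floor(lowest), math.floor(highest) + 1)

# Step 4: tally each value into its class and count
tally = Counter(math.floor(value) for value in hb)
frequency = [(f"{k}.0 - {k}.9", tally[k]) for k in class_starts]

print(f"Min = {lowest}, Max = {highest}, Range = {value_range}")
for label, n in frequency:
    print(f"{label:<13}{'|' * n:<12}{n:>3}")
print(f"{'Total':<25}{sum(n for _, n in frequency):>3}")
```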
DUMMY TABLE: columns Hb (g/dl), Tally marks, No. of patients, with the tally and count cells left blank.

TALLY MARKS TABLE

Hb (g/dl)      Tally marks     No. of patients
9.0 – 9.9      |                1
10.0 – 10.9    |||              3
11.0 – 11.9    ||||| |          6
12.0 – 12.9    ||||| |||||     10
13.0 – 13.9    |||||            5
14.0 – 14.9    |||              3
15.0 – 15.9    ||               2
Total          -               30
Table Frequency distribution of 30 adult male
patients by Hb

Hb (g/dl)      No. of patients
9.0 – 9.9       1
10.0 – 10.9     3
11.0 – 11.9     6
12.0 – 12.9    10
13.0 – 13.9     5
14.0 – 14.9     3
15.0 – 15.9     2
Total          30
Table Frequency distribution of adult patients by
Hb and gender:

Hb (g/dl)      Male   Female   Total
<9.0             0       2       2
9.0 – 9.9        1       3       4
10.0 – 10.9      3       5       8
11.0 – 11.9      6       8      14
12.0 – 12.9     10       6      16
13.0 – 13.9      5       4       9
14.0 – 14.9      3       2       5
15.0 – 15.9      2       0       2

Total           30      30      60
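The margins of a two-way table such as this one are simply row and column sums. A brief sketch using the male and female counts from the table follows; the column percentages at the end are an added illustration rather than something shown on the slide.

```python
# Hb classes and counts taken from the two-way table above
classes = ["<9.0", "9.0 - 9.9", "10.0 - 10.9", "11.0 - 11.9",
           "12.0 - 12.9", "13.0 - 13.9", "14.0 - 14.9", "15.0 - 15.9"]
male = [0, 1, 3, 6, 10, 5, 3, 2]
female = [2, 3, 5, 8, 6, 4, 2, 0]

# Row totals (per Hb class) and column totals (per gender)
row_totals = [m + f for m, f in zip(male, female)]
col_totals = (sum(male), sum(female))
grand_total = sum(row_totals)

# Column percentages allow the two genders to be compared class by class
male_pct = [round(100 * m / col_totals[0], 1) for m in male]
female_pct = [round(100 * f / col_totals[1], 1) for f in female]

for label, m, f, t in zip(classes, male, female, row_totals):
    print(f"{label:<13}{m:>5}{f:>8}{t:>7}")
print(f"{'Total':<13}{col_totals[0]:>5}{col_totals[1]:>8}{grand_total:>7}")
```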
Elements of a Table
An ideal table should have: Number, Title, Column headings, Foot-notes

Number - Table number for identification in a report

Title, place - Describes the body of the table, variables,
time period (what, how classified, where and when)

Column heading - Variable name, No., Percentages (%), etc.

Foot-note(s) - To describe some column/row headings,
special cells, source, etc.
Table II. Distribution of 120 (Madras) Corporation divisions
according to annual death rate based on registered deaths in
1975 and 1976

Death rate (/1000 per annum)   No. of divisions
7.0 - 7.9                        4 (3.3)
8.0 - 8.9                       13 (10.8)
9.0 - 9.9                       20 (16.7)
10.0 - 10.9                     27 (22.5)
11.0 - 11.9                     18 (15.0)
12.0 - 12.9                     11 (9.2)
13.0 - 13.9                     11 (9.2)
14.0 - 14.9                      6 (5.0)
15.0 - 15.9                      2 (1.7)
16.0 - 16.9                      4 (3.3)
17.0 - 18.9                      3 (2.5)
19.0 +                           1 (0.8)
Total                          120 (100.0)

Figures in parentheses indicate percentages


DIAGRAMS/GRAPHS

Discrete data
--- Bar charts (one or two groups)

Continuous data
--- Histogram
--- Frequency polygon (curve)
--- Stem-and-leaf plot
--- Box-and-whisker plot
Example data

68 63 42 27 30 36 28 32
79 27 22 28 24 25 44 65
43 25 74 51 36 42 28 31
28 25 45 12 57 51 12 32
49 38 42 27 31 50 38 21
16 24 64 47 23 22 43 27
49 28 23 19 11 52 46 31
30 43 49 12
Histogram

[Figure 1 Histogram of ages of 60 subjects; x-axis: Age, y-axis: Frequency]


Polygon

[Figure: frequency polygon; x-axis: Age, y-axis: Frequency]
Stem and leaf plot
Stem-and-leaf of Age N = 60
Leaf Unit = 1.0

6 1 122269
19 2 1223344555777788888
(11) 3 00111226688
13 4 2223334567999
5 5 01127
4 6 3458
2 7 49
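A stem-and-leaf plot like the one above can be built by splitting each age into its tens digit (stem) and units digit (leaf). The sketch below uses the 60 ages from the example data; the first printed column is the number of leaves on each stem.

```python
from collections import defaultdict

# Ages of the 60 subjects (example data)
ages = [68, 63, 42, 27, 30, 36, 28, 32,
        79, 27, 22, 28, 24, 25, 44, 65,
        43, 25, 74, 51, 36, 42, 28, 31,
        28, 25, 45, 12, 57, 51, 12, 32,
        49, 38, 42, 27, 31, 50, 38, 21,
        16, 24, 64, 47, 23, 22, 43, 27,
        49, 28, 23, 19, 11, 52, 46, 31,
        30, 43, 49, 12]

# Sort first so the leaves on every stem come out in ascending order
stems = defaultdict(list)
for age in sorted(ages):
    stem, leaf = divmod(age, 10)  # leaf unit = 1
    stems[stem].append(str(leaf))

plot = {stem: "".join(leaves) for stem, leaves in stems.items()}

print(f"Stem-and-leaf of Age  N = {len(ages)}")
for stem, leaves in plot.items():
    print(f"{len(leaves):>4} {stem} {leaves}")
```

Unlike a histogram, the plot keeps every original value readable, so the data can be recovered from the display.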
Box plot

[Figure: box plot; y-axis: Age, scale 10 to 80]
Descriptive statistics report:
Boxplot
- minimum score
- maximum score
- lower quartile
- upper quartile
- median
- mean

- the skew of the distribution:
  positive skew: mean > median & high-score whisker is longer
  negative skew: mean < median & low-score whisker is longer
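The quantities a box plot reports can be computed directly from the 60 ages in the example data. This is a minimal sketch: quartile conventions differ between textbooks and software, and the one used here is simply the default of Python's `statistics.quantiles`, which is an assumption rather than the method behind the slide's figure. Only the mean-versus-median part of the skew rule is checked.

```python
from statistics import mean, median, quantiles

# Ages of the 60 subjects (example data)
ages = [68, 63, 42, 27, 30, 36, 28, 32,
        79, 27, 22, 28, 24, 25, 44, 65,
        43, 25, 74, 51, 36, 42, 28, 31,
        28, 25, 45, 12, 57, 51, 12, 32,
        49, 38, 42, 27, 31, 50, 38, 21,
        16, 24, 64, 47, 23, 22, 43, 27,
        49, 28, 23, 19, 11, 52, 46, 31,
        30, 43, 49, 12]

# Quartiles with the library default ("exclusive") convention
q1, _, q3 = quantiles(ages, n=4)

summary = {
    "minimum": min(ages),
    "lower quartile": q1,
    "median": median(ages),
    "upper quartile": q3,
    "maximum": max(ages),
    "mean": mean(ages),
}

# Direction of skew from the mean-versus-median comparison
if summary["mean"] > summary["median"]:
    skew = "positive"
elif summary["mean"] < summary["median"]:
    skew = "negative"
else:
    skew = "none"

for name, value in summary.items():
    print(f"{name:<15}{value:>7.2f}")
print("skew:", skew)
```

For these ages the mean exceeds the median, which the rule above reads as a positive skew.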
Pie Chart
• Circular diagram: total = 100%
• Divided into segments, each
representing a category
• Decide adjacent categories
• The size of each slice of the pie is
proportional to the amount for its category

[Figure: pie chart, "The prevalence of different degrees of hypertension in the population"; categories: Mild, Moderate, Severe; slices: 70%, 20%, 10%]
Bar Graphs
• The height of each bar indicates
frequency
• Frequency on the Y axis
and categories of the variable
on the X axis
• The bars should be of equal
width and should not touch the
other bars

[Figure: bar graph, "The distribution of risk factors among cases with cardiovascular diseases"; x-axis: Risk factor (Smo, Alc, Chol, DM, HTN, No Exer, F-H); y-axis: Number, 0 to 25]
HIV cases enrolment in
USA by gender
Bar chart

[Figure: bar chart; x-axis: Year (1986 to 1992); y-axis: Enrollment (hundred); series: Men, Women]
HIV cases enrollment
in USA by gender
Stacked bar chart

[Figure: stacked bar chart; x-axis: Year (1986 to 1992); y-axis: Enrollment (Thousands); series: Women, Men]
Graphic Presentation of
Data
the frequency polygon
(quantitative data)

the histogram
(quantitative data)

the bar graph
(qualitative data)
General rules for designing
graphs
• A graph should have a self-explanatory
legend
• A graph should help the reader to understand
the data
• Axes should be labeled and units of measurement
indicated
• Scales are important. Start with zero (otherwise
use a // break)
• Avoid graphs with a three-dimensional
impression; they may be misleading (the reader
visualizes them less easily)
Any Questions?
