
Chapter I Descriptive Statistics

Objectives
Define variable and data
Describe types of data and measurement scales
Define and calculate ratio, rate and proportion
Define and calculate measures of central tendency and measures of spread
Organize and display data
Extract useful information

Variable: any aspect of an individual or object that is measured (e.g., BP) or recorded (e.g., age, sex) and can take different values. There may be one variable in a study or many. E.g., a study of treatment outcome of TB.

E.g. Nominal
Marital status: 1. Single 2. Married 3. Widowed 4. Divorced

The numbers have NO meaning. They are labels only.

E.g. Ordinal
Pain level: 1. None 2. Mild 3. Moderate 4. Severe
The numbers have LIMITED meaning: 4 > 3 > 2 > 1 is all we know, apart from their utility as labels.

E.g. Interval
- Temperature in °C on 4 consecutive days
  Days:       A   B   C   D
  Temp. °C:  18  20  22  23
  For these data, day A (18 °C) is not only cooler than day D (23 °C), it is exactly 5 °C cooler.
- It has no true zero point. 0 °C is arbitrarily chosen and does not reflect the absence of temperature.

E.g. Ratio
- Height, age, weight, BP, etc. Someone who weighs 80 kg is two times as heavy as someone who weighs 40 kg.

. Note on meaningfulness of ratio-

Degree of precision in measuring (from lowest to highest): Nominal, Ordinal, Interval, Ratio

Summary of Data
Variable
  Qualitative or categorical
    Nominal (not ordered), e.g. ethnic group
    Ordinal (ordered), e.g. response to treatment
  Quantitative measurement
    Discrete (count data), e.g. number of admissions
    Continuous (real-valued), e.g. height

Categorizing Data
Can facilitate data analysis
Must choose:
Number of categories
Category cut points

Some options for cut points:
Percentiles, natural breaks, established criteria
Example: WHO body mass index classification
Underweight: < 18.50 kg/m²
Normal: 18.50 to 24.99 kg/m²
Overweight: ≥ 25.00 kg/m²
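As a small illustration of applying established cut points, the sketch below assigns a BMI value to one of the three WHO classes quoted above. The function name `bmi_category` and the sample values are illustrative assumptions, not part of the original slides.

```python
from bisect import bisect_right

# Cut points (kg/m^2) taken from the WHO classification quoted above.
CUT_POINTS = [18.50, 25.00]
LABELS = ["Underweight", "Normal", "Overweight"]


def bmi_category(bmi: float) -> str:
    """Return the label of the interval that contains bmi.

    bisect_right counts how many cut points are <= bmi, so a value
    exactly on a cut point falls into the higher category.
    """
    return LABELS[bisect_right(CUT_POINTS, bmi)]


# Hypothetical BMI values, chosen to sit on and around the cut points.
for value in (17.2, 18.5, 24.99, 25.0):
    print(value, bmi_category(value))
```

The same pattern works for percentile-based cut points: only the `CUT_POINTS` list changes.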

Categorizing Variables-Exercise

1. Year of birth
2. Marital status of women
3. Identification number of study participant
4. Class rank
5. Length of infants at ANC clinic

Categorizing Variables-Exercise

1. Year of birth: Quantitative/Discrete
2. Marital status: Categorical/Nominal
3. Identification number: Categorical/Nominal
4. Class rank: Categorical/Ordinal
5. Length: Quantitative/Continuous

Discrete or Continuous?
Identify whether the following data is discrete or continuous:

1. Distance from primary health center to reference lab
2. Number of times a child under 5 has experienced fever in the last month
3. Number of fatal accidents on a road over the past year
4. Weight gained or lost by a 9-month-old in the past 3 months

Discrete or Continuous?
Identify whether the following data is discrete or continuous:

1. Distance from primary health center to reference lab: Continuous
2. Number of times a child under 5 has experienced fever in the last month: Discrete
3. Number of fatal accidents on a road over the past year: Discrete
4. Weight gained or lost by a 9-month-old in the past 3 months: Continuous

Describing categorical data


A prerequisite for any research is the ability to quantify the occurrence of disease.
How many people are affected by a certain disease? (Count)
What is the rate at which the disease is occurring through time? (Rate)
How does the disease burden vary by location, by sex, by age, or by various modes of exposure? (Ratio, Proportion)

Counts
The most basic measure of disease frequency is a simple count of affected individuals.
Example:
350,000 cases of polio
350,000 cases of polio in 1988
350,000 cases of polio in 1988 in 125 countries

How is count data used?


1988: Polio, > 350,000 cases, 125 countries
2002: Polio, 1,918 cases, 7 countries

Example of Counts
Number of Cases of Hemorrhagic Fever by Age and Sex, Zaire, 1976

Age (years)    Male   Female   Total
<1               10       14      24
1 - 14           18       25      43
15 - 29          33       60      93
30 - 49          57       52     109
50+              23       26      49
Total           141      177     318

Ratio Proportion Rate


What, or who, is in the denominator?

Ratio
The quotient of 2 numbers
Numerator NOT INCLUDED in the denominator
No relationship necessary between numerator and denominator
May be expressed as a/b or a:b

What is the sex ratio?

Sex ratio = males : females = (# males / # females) x 100
= (2 / 5) x 100 = 0.4 x 100 = 40

When is a ratio used?


Sex ratio: male to female
Number of health facilities per population
Number of participants in the course per facilitator
Number of inhabitants per latrine
Odds ratio
Relative risk
Prevalence ratio
Maternal mortality ratio

Ratio Example 1
A university has 4000 male students and 2000 female students. The ratio of male to female students is: 4000/2000 = 2/1 or 2:1 For every 2 male students there is one female student

Ratio Example 2
A foodborne epidemic occurred in an elementary school canteen. The attack rate in the first grade was 24% while the attack rate in the second grade was 16%. Compare these two attack rates.
24/16 = 3/2 or 3:2
The attack rate in the first grade was 1.5 times the attack rate in the second grade.

Ratio Example 3
A city of 4 million people has 400 clinics. Calculate the ratio of clinics per person.

Ratio = 400 / 4,000,000 = 0.0001 clinics / person

Multiply by 10^4:
Ratio = 0.0001 x 10^4 = 1 clinic / 10,000 persons
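The ratio examples above can be checked with a few lines of code. This is a minimal sketch; the helper name `ratio` and its `multiplier` parameter are illustrative assumptions.

```python
def ratio(numerator: float, denominator: float, multiplier: float = 1) -> float:
    """Return numerator / denominator, scaled by a convenient multiplier.

    The numerator is not required to be part of the denominator.
    Multiplying before dividing keeps the clinic example exact.
    """
    return (numerator * multiplier) / denominator


# Ratio Example 1: male to female students.
print(ratio(4000, 2000))             # males per female

# Ratio Example 3: clinics per 10,000 persons.
print(ratio(400, 4_000_000, 10**4))  # clinics per 10,000 persons
```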

Proportion
The quotient of 2 numbers
Numerator is a sub-group of the population in the denominator
Numerator is always INCLUDED in the denominator
Proportion ranges between 0 and 1
Percentage = proportion x 100

What is the proportion of cases?

Proportion = 2 cases / 4 total = 0.5
Percentage = 0.5 x 100 = 50%

When is a proportion used?


Proportion of samples positive for P. falciparum
1000 samples, 236 positive
Proportion of positive samples = 236/1000 = 0.236
Percentage of positive samples = 0.236 x 100 = 23.6%

Proportion of malaria deaths
123 malaria cases, 7 deaths
Proportion of malaria deaths = 7/123 = 0.057
Percentage of malaria deaths = 0.057 x 100 = 5.7%

Proportion Example 1
A university has 4000 male students and 2000 female students. Calculate the proportion of male and female students. Male: 4000/6000 x 100% = 66.7% Female: 2000/6000 x 100% = 33.3%

Proportion Example 2
40 children are currently ill with the measles, 80 children all together have had the measles 40 / 80 = .50 (proportion) 40 / 80 = .50 * 100 = 50% (percentage)
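The proportion calculations above follow one pattern, sketched here. The function name `proportion` is an illustrative assumption; the guard simply enforces the rule that the numerator must be included in the denominator.

```python
def proportion(part: int, whole: int) -> float:
    """Return part / whole, where part must be a sub-group of whole."""
    if whole <= 0 or not 0 <= part <= whole:
        raise ValueError("numerator must be included in the denominator")
    return part / whole


# Figures taken from the examples above.
p_positive = proportion(236, 1000)   # P. falciparum positive samples
p_deaths = proportion(7, 123)        # malaria deaths among malaria cases
p_male = proportion(4000, 4000 + 2000)

for name, p in [("positive", p_positive), ("deaths", p_deaths), ("male", p_male)]:
    print(f"{name}: proportion = {p:.3f}, percentage = {100 * p:.1f}%")
```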

Rate
The quotient of 2 numbers
Measures the probability of occurrence of an event over time
Numerator: number of EVENTS
Denominator: POPULATION at risk for the event in the numerator, observed for a given TIME

What is the rate of death?

Observed in one year: 2 deaths in a population of 100

Rate = 2 / 100 per year = 2 deaths per 100 population per year

When is a rate used?


Morbidity rates
Attack rates Prevalence rates Incidence rates

Mortality rates Natality rates

Rate Example 1
Mortality rate of tetanus in France in 1995
Tetanus deaths: 17
Population in 1995: 58 million
Time period: 1 year
Mortality rate = 0.029 per 100,000 population per year
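The tetanus figure can be reproduced as follows. This is a sketch only; the function name `rate_per` and its parameters are illustrative assumptions.

```python
def rate_per(events: int, population: int, per: int = 100_000, years: float = 1.0) -> float:
    """Events per `per` population per year of observation."""
    return events / (population * years) * per


# Figures from the example above: 17 deaths, 58 million people, 1 year.
tetanus_rate = rate_per(17, 58_000_000)
print(f"{tetanus_rate:.3f} per 100,000 population per year")
```

Changing `per` to 1,000 or 10,000 gives the same rate expressed in another power of 10.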

Rate may be expressed in any power of 10


100, 1,000, 10,000, 100,000

Rate must include an aspect of time


Per year, per month, per day

Rate Example 2
Maternal Mortality for Various Continents (1995)

Continent                    Rate
Africa                       273000
Asia                         217000
Europe                         2000
Latin America/Caribbean       22000
South America                 15000
North America                   490
Australia/New Zealand            25

Summary
What is the Measure of Frequency?
Is numerator included in denominator?
  No: Ratio
  Yes: Is time included in denominator?
    Yes: Rate
    No: Proportion

Describing Quantitative Variables


Measures of Central Location
Mean, Median, Mode

Measures of Spread
Range, IQR, Variance, Standard deviation

Measure of Central Location


Central Location / Position / Tendency
A single value that represents (is a good summary of) an entire distribution of data

Also known as: Measure of central tendency Measure of central position

Common measures Arithmetic mean Median Mode

Central Location
[Figure: histogram of number of people by age group (0-9 to 90-99), annotated with central location and spread]

Raw data set: Ages of students in a class (years)
27 30 28 31 28 36 29 37 29 34
30 30 27 30 28 31 32 30 29 29

Order the data set from the lowest value to the highest value. Add observation numbers.

Obs:  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
Age: 27 27 28 28 28 29 29 29 29 30 30 30 30 30 31 31 32 34 36 37

Mode
Definition: Mode is the value that occurs most frequently
Method for identification
1. Arrange data into a frequency distribution or histogram, showing the values of the variable and the frequency with which each value occurs
2. Identify the value that occurs most often


Mode

Age     Frequency
27       2
28       3
29       4
30       5   (Mode)
31       2
32       1
33       0
34       1
35       0
36       1
37       1
Total   20


Mode
The most frequent value of the variable

Mode = 30
[Figure: bar chart of frequency by age (years), ages 27 to 37; the tallest bar is at age 30]

Example
Finding Mode from Length of Stay Data
0, 2, 3, 4, 5, 5, 6, 7, 8, 9, 9, 9, 10, 10, 10, 10, 10, 11, 12, 12, 12, 13, 14, 16, 18, 18, 19, 22, 27, 49

Mode = 10
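The two-step method (tabulate the frequencies, then pick the most frequent value) can be sketched with the standard library. `Counter` builds the frequency distribution and `statistics.multimode` returns every value tied for the highest frequency, which also covers the bimodal case; the variable names are illustrative.

```python
from collections import Counter
from statistics import multimode

# Length of stay data (nights) from the example above.
stay = [0, 2, 3, 4, 5, 5, 6, 7, 8, 9, 9, 9, 10, 10, 10, 10, 10, 11, 12, 12,
        12, 13, 14, 16, 18, 18, 19, 22, 27, 49]

# Step 1: frequency distribution of the values.
frequency = Counter(stay)

# Step 2: the value(s) that occur most often.
modes = multimode(stay)

print(frequency.most_common(3))
print("Mode(s):", modes)
```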

Finding Mode from Histogram


[Figure: histogram of number of patients by nights of stay, x-axis 0 to 50]

Mode Properties / Uses


Easiest measure to understand, explain, identify
Always equals an original value
Insensitive to extreme values (outliers)
Good descriptive measure, but poor statistical properties
May be more than one mode
May be no mode
Does not use all the data

Outliers
[Figure: histogram of number of patients by nights of stay, with the x-axis extended to 150 nights to show an outlier]

[Figure: Unimodal Distribution, a population histogram with a single peak]

[Figure: Bimodal Distribution, a population histogram with two peaks]

Median
Definition: Median is the middle value; also, the value that splits the distribution into two equal parts
50% of observations are below the median
50% of observations are above the median
Method for identification
1. Arrange observations in order
2. Find middle position as (n + 1) / 2
3. Identify the value at the middle

Median: Odd Number of Values

Obs:  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19
Age: 27 27 28 28 28 29 29 29 29 30 30 30 30 30 31 31 32 34 36

N = 19
Median observation = (N + 1) / 2 = (19 + 1) / 2 = 20 / 2 = 10

Median age = 30 years

Median: Even Number of Values

Obs:  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
Age: 27 27 28 28 28 29 29 29 29 30 30 30 30 30 31 31 32 34 36 37

N = 20
Median observation = (N + 1) / 2 = (20 + 1) / 2 = 21 / 2 = 10.5

Median age = average value between 10th and 11th observation = (30 + 30) / 2 = 30 years

Examples

Find Median of Length of Stay Data:

0, 2, 3, 4, 5, 5, 6, 7, 8, 9, 9, 9, 10, 10, 10, 10, 10, 11, 12, 12, 12, 13, 14, 16, 18, 18, 19, 22, 27, 49

Median at 50% = 10
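The (n + 1) / 2 rule handles both the odd and the even case: a whole-number position picks one observation, a position ending in .5 averages its two neighbours. The sketch below implements the rule directly and cross-checks it against `statistics.median`; the function name `median_by_position` is an illustrative assumption.

```python
import math
from statistics import median


def median_by_position(values):
    """Median via the middle position (n + 1) / 2, using 1-based positions."""
    ordered = sorted(values)
    position = (len(ordered) + 1) / 2
    lower = ordered[math.floor(position) - 1]
    upper = ordered[math.ceil(position) - 1]
    return (lower + upper) / 2


ages = [27, 27, 28, 28, 28, 29, 29, 29, 29, 30,
        30, 30, 30, 30, 31, 31, 32, 34, 36, 37]
stay = [0, 2, 3, 4, 5, 5, 6, 7, 8, 9, 9, 9, 10, 10, 10, 10, 10, 11, 12, 12,
        12, 13, 14, 16, 18, 18, 19, 22, 27, 49]

print(median_by_position(ages[:19]))  # odd number of values (N = 19)
print(median_by_position(ages))       # even number of values (N = 20)
print(median_by_position(stay), median(stay))
```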

Median Properties / Uses


Does not use all the data available
Insensitive to extreme values (outliers)
Good descriptive measure but poor statistical properties
Measure of choice for skewed data
Equals an original value if n is odd

Quartiles
Definition: Quartiles are the values that split the distribution into four equal parts
25% of observations are below the first quartile (Q1)
25% of observations are between Q1 and Q2 (median)
25% of observations are between Q2 (median) and Q3
25% of observations are above Q3

Quartiles

Obs:  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
Age: 27 27 28 28 28 29 29 29 29 30 30 30 30 30 31 31 32 34 36 37

N = 20
Q1 observation = round((N + 1) / 4) = round((20 + 1) / 4) = round(21 / 4) = round(5.25) = 5th obs
Q2 observation = 10.5 (median)
Q3 observation = round(3(N + 1) / 4) = round(3(20 + 1) / 4) = round(63 / 4) = round(15.75) = 16th obs

Q1 age = 28
Q2 age = 30
Q3 age = 31
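The rounding rule used for Q1 and Q3 can be written out as below. This is a sketch of the slide's rule only (round the position, then read off that observation); statistical packages usually interpolate between neighbouring observations instead, so their answers can differ slightly. The function name `quartiles_by_rounding` is an illustrative assumption.

```python
import math


def quartiles_by_rounding(values):
    """Return (Q1, Q3) by rounding the positions (n+1)/4 and 3(n+1)/4."""
    ordered = sorted(values)
    n = len(ordered)
    # floor(x + 0.5) rounds halves upward, avoiding Python's round-half-to-even.
    q1_pos = math.floor((n + 1) / 4 + 0.5)
    q3_pos = math.floor(3 * (n + 1) / 4 + 0.5)
    return ordered[q1_pos - 1], ordered[q3_pos - 1]


ages = [27, 27, 28, 28, 28, 29, 29, 29, 29, 30,
        30, 30, 30, 30, 31, 31, 32, 34, 36, 37]

q1, q3 = quartiles_by_rounding(ages)
print("Q1 =", q1, "Q3 =", q3)
```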

Percentiles
Value of the variable that splits the distribution in 100 equal parts
35 % of observations are below the 35th percentile 65 % of observations are above 35th percentile

Percentiles

Values (Age)   Freq   Percent (Freq/Total)   Cumulative Percent
27               2          10%                    10%
28               3          15%                    25%   (25th Percentile)
29               4          20%                    45%
30               5          25%                    70%
31               2          10%                    80%
32               1           5%                    85%
34               1           5%                    90%   (90th Percentile)
36               1           5%                    95%
37               1           5%                   100%
Total           20         100%

Arithmetic Mean
Arithmetic mean = average value

Method for identification


1. Sum up all of the values 2. Divide the sum by the number of observations (n)

Arithmetic Mean

Obs:  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
Age: 27 27 28 28 28 29 29 29 29 30 30 30 30 30 31 31 32 34 36 37

N = 20
Σxi = 605

$$\bar{x} = \frac{\sum x_i}{N} = \frac{605}{20} = 30.25$$

Example
Finding the Mean Length of Stay Data
0, 2, 3, 4, 5, 5, 6, 7, 8, 9, 9, 9, 10, 10, 10, 10, 10, 11, 12, 12, 12, 13, 14, 16, 18, 18, 19, 22, 27, 49
Sum = 360
n = 30
Mean = 360 / 30 = ?
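The two steps (sum the values, divide by n) are sketched below, together with a quick look at how one extreme value moves the mean. Replacing the longest stay of 49 nights by 149 is an assumption made here for illustration, based on the upper value shown in the later range figure.

```python
def arithmetic_mean(values):
    """Sum of the values divided by the number of observations."""
    return sum(values) / len(values)


stay = [0, 2, 3, 4, 5, 5, 6, 7, 8, 9, 9, 9, 10, 10, 10, 10, 10, 11, 12, 12,
        12, 13, 14, 16, 18, 18, 19, 22, 27, 49]

mean_stay = arithmetic_mean(stay)

# Assumed outlier scenario: the longest stay becomes 149 nights.
stay_with_outlier = stay[:-1] + [149]
mean_with_outlier = arithmetic_mean(stay_with_outlier)

print(sum(stay), len(stay), mean_stay)
print(round(mean_with_outlier, 1))
```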

Arithmetic Mean Properties / Uses


Probably best known measure of central location
Uses all of the data
Affected by extreme values (outliers)
Best for normally distributed data
Not usually equal to one of the original values
Good statistical properties

Sensitive to Outliers
[Figure: histogram of number of patients by nights of stay, x-axis 0 to 50; Mean = 12.0]

[Figure: histogram of number of patients by nights of stay with an outlier, x-axis 0 to 150; Mean = 15.3]

When to use the arithmetic mean?


Centered distribution
Approximately symmetrical Few extreme values (outliers)

OK!

Summary
Measure of Central Location: single measure that represents an entire distribution
Mode: most common value
Median: central value
Arithmetic mean: average value
Mean uses all data, so sensitive to outliers
Mean has best statistical properties
Mean preferred for normally distributed data
Median preferred for skewed data
Geometric mean for dilutional titer

Other Measures of Central Location


Midrange
= (Minimum + maximum values) / 2
Quick and dirty

Geometric Mean
Can use if the logs of the data are normally distributed (e.g., lab titers)
= nth root of (Obs1 x Obs2 x Obs3 x ... x Obsn)
= antilog (sum of log xi / n)
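Both measures are short to compute. The sketch below uses the antilog-of-mean-log form of the geometric mean; the titer values are hypothetical and chosen only so the result is easy to verify by hand.

```python
import math


def midrange(values):
    """(Minimum + maximum) / 2."""
    return (min(values) + max(values)) / 2


def geometric_mean(values):
    """antilog(sum(log x_i) / n); all values must be positive."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean needs positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical reciprocal titers from a two-fold dilution series.
titers = [10, 20, 40, 80, 160]

print(midrange(titers))
print(geometric_mean(titers))
```

For these hypothetical titers the geometric mean (about 40) sits at the middle dilution, while the midrange (85) is pulled toward the largest value.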

Measures of Spread
Definition: Measures that quantify the variation or dispersion of a set of data from its central location
Also known as:
Measure of dispersion
Measure of variation
Common measures:
Range
Interquartile range
Variance / standard deviation
Standard error
95% confidence interval

Same center
but different dispersions

Range
Definition: difference between largest and smallest values

Example: Finding the Range of Length of Stay Data


0, 2, 3, 4, 5, 5, 6, 7, 8, 9, 9, 9, 10, 10, 10, 10, 10, 11, 12, 12, 12, 13, 14, 16, 18, 18, 19, 22, 27, 49

Range Sensitive to Outliers?


[Figure: histogram of number of patients by nights of stay, x-axis 0 to 50; Range = 0 to 49]

[Figure: histogram of number of patients by nights of stay with an outlier, x-axis 0 to 150; Range = 0 to 149]

Interquartile Range
Definition: the central 50% of a distribution
Properties / Uses
Used with median
Used to show the most typical 50% of the values

Example
IQR Length of Stay Data

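0, 2, 3, 4, 5, 5, 6, 7, 8, 9, 9, 9, 10, 10, 10, 10, 10, 11, 12, 12, 12, 13, 14, 16, 18, 18, 19, 22, 27, 49

Q1 = 25th percentile: position (30 + 1) / 4 = 7.75, taken as the 7th observation; Q1 = 6
Median = 50th percentile: position (30 + 1) / 2 = 15.5; Median = 10
Q3 = 75th percentile: position 3 (30 + 1) / 4 = 23.25, taken as the 23rd observation; Q3 = 14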
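When the quartile position is not a whole number, one common alternative to picking a single observation is to interpolate between the two neighbouring observations. The sketch below uses `statistics.quantiles`, whose default "exclusive" method places the quartiles at positions p(n + 1) and interpolates linearly. It is shown as a cross-check, not as the method used on the slide.

```python
from statistics import quantiles

stay = [0, 2, 3, 4, 5, 5, 6, 7, 8, 9, 9, 9, 10, 10, 10, 10, 10, 11, 12, 12,
        12, 13, 14, 16, 18, 18, 19, 22, 27, 49]

# n=4 asks for the three cut points that split the data into quarters.
q1, q2, q3 = quantiles(stay, n=4, method="exclusive")
iqr = q3 - q1

print("Q1 =", q1, "Median =", q2, "Q3 =", q3)
print("IQR =", iqr)
```

With interpolation this gives Q1 = 6.75, Q3 = 14.5 and IQR = 7.75, compared with 14 - 6 = 8 when whole observations are used.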

IQR Length of Stay Data


[Figure: histogram of number of patients by nights of stay (0 to 50) with Q1 and Q3 marked; IQR = 7.75]

Variance and Standard Deviation


Definition: measures of variation that quantify how closely clustered the observed values are around the mean
Variance = average of squared deviations from the mean (dividing by n - 1)
= Sum (x - mean)^2 / (n - 1)

Standard deviation = square root of variance
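The definitions translate directly into code. The sketch below computes the sample variance (divisor n - 1) and the standard deviation by hand for the length of stay data, then compares them with the standard library; the function name `sample_variance` is an illustrative assumption.

```python
import math
from statistics import stdev, variance


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


stay = [0, 2, 3, 4, 5, 5, 6, 7, 8, 9, 9, 9, 10, 10, 10, 10, 10, 11, 12, 12,
        12, 13, 14, 16, 18, 18, 19, 22, 27, 49]

var_stay = sample_variance(stay)
sd_stay = math.sqrt(var_stay)

print(round(var_stay, 2), round(sd_stay, 2))
print(round(variance(stay), 2), round(stdev(stay), 2))  # library cross-check
```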


Equations for Variance and Standard Deviation


x̄ : Mean
xi : Data value
n : No. of observations
s² : Variance
s : Standard deviation

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$

Standard Deviation Properties / Uses


Standard deviation is usually calculated only when data are more or less normally distributed (bell-shaped curve)
For normally distributed data,
68.3% of the data fall within plus/minus 1 SD of the mean
95.5% of the data fall within plus/minus 2 SD of the mean
95.0% of the data fall within plus/minus 1.96 SD of the mean
99.7% of the data fall within plus/minus 3 SD of the mean
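These coverage figures can be checked against the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; the function name `coverage` is an illustrative assumption.

```python
from statistics import NormalDist


def coverage(k: float) -> float:
    """Fraction of a normal distribution within plus/minus k SD of the mean."""
    standard_normal = NormalDist(mu=0.0, sigma=1.0)
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 1.96, 2, 3):
    print(f"within +/- {k} SD: {100 * coverage(k):.2f}%")
```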

Normal Distribution
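[Figure: normal curve centered on the mean, with standard deviations marked; the central region holds 68% and 95% of the area, with 2.5% in each tail]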

Comparison of Mode, Median and Mean


Symmetrical:
Mode = Median = Mean

Skewed right:
Mode < Median < Mean

Skewed left:
Mean < Median < Mode

Match the Measures of Central Location & Spread

Mode Median

Standard deviation Range

Arithmetic mean

Interquartile range


Name the Appropriate Measures of Central Location and Spread


Distribution                    Central Location    Spread
Single peak, symmetrical
Skewed or data with outliers

Name the Appropriate Measures of Central Location and Spread


Distribution                    Central Location    Spread
Single peak, symmetrical        Mean*               Standard deviation
Skewed or data with outliers    Median              Range or interquartile range

* Median and mode will be similar

Properties of Measures of Central Location & Spread


Arithmetic mean: best for normally distributed data
Median: best for skewed data
Mode: simple, descriptive, not always useful
Standard deviation: use with mean
Range / interquartile range: use with median

[Figure: distribution of population by age, annotated with the mode, median, minimum, 1st quartile, 3rd quartile, maximum, interquartile interval and range]

Displaying categorical variables


Table of frequency distributions
Frequency Relative frequency Cumulative frequencies

Charts
Bar charts Pie charts

Frequency distributions
A simple and effective way of summarizing categorical data is to construct a frequency distribution table.
First column: levels of the variable.
Second column: count of observations at each level.
E.g. the table below shows the frequency distribution of birth weight for 9975 newborns between 1976-1996.

BWT         Freq.   Rel.Freq(%)   Cum. Freq   Cum.rel.freq.(%)
Very low       43       0.4            43          0.4
Low           793       8.0           836          8.4
Normal       8870      88.9          9706         97.3
Big           268       2.7          9974        100
Total        9974     100

Relative Frequency
It is useful to compute the proportion, or percentage, of observations in each level.
The distribution of proportions is called the relative frequency distribution of the variable.
Given the total number of observations, the relative frequency distribution is easily derived from the frequency distribution.
Conversion in the opposite direction is also possible, but it is often inaccurate because of rounding.
The third column of the table below shows the relative frequency distribution of birth weight for 9975 newborns between 1976-1996.

Table 1. Frequency Distribution of birth weight of newborns between 1976-1996 at TAH.


BWT         Freq.   Rel.Freq(%)   Cum. Freq   Cum.rel.freq.(%)
Very low       43       0.4            43          0.4
Low           793       8.0           836          8.4
Normal       8870      88.9          9706         97.3
Big           268       2.7          9974        100
Total        9974     100

Cumulative frequency
The cumulative frequency of a category is the number of observations in the category plus the observations in all categories smaller than it.

BWT         Freq.   Rel.Freq(%)   Cum.Freq   Cum.rel.freq.(%)
Very low       43       0.4           43          0.4
Low           793       8.0          836          8.4
Normal       8870      88.9         9706         97.3
Big           268       2.7         9974        100
Total        9974     100
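Relative and cumulative columns can all be derived from the raw counts. The sketch below rebuilds them for the birth weight categories; the function name `frequency_table` and the tuple layout of each row are illustrative assumptions.

```python
from itertools import accumulate

# Counts per birth weight category, in their natural order.
counts = {"Very low": 43, "Low": 793, "Normal": 8870, "Big": 268}


def frequency_table(counts):
    """Rows of (label, freq, rel. freq %, cum. freq, cum. rel. freq %)."""
    total = sum(counts.values())
    running = accumulate(counts.values())
    rows = []
    for (label, freq), cum in zip(counts.items(), running):
        rows.append((label, freq, round(100 * freq / total, 1),
                     cum, round(100 * cum / total, 1)))
    return rows


table = frequency_table(counts)
for row in table:
    print(row)
```

Because the cumulative columns depend on category order, this only makes sense for ordinal categories or class intervals.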

Table 2. Frequencies of serum cholesterol levels for 1067 US males of ages 25-34 1976-1980
-----------------------------------------------------------------------------
Cholesterol level
(mg/100ml)          Freq   Relative freq   Cum freq   Cum. rel. freq
-----------------------------------------------------------------------------
80-119                13        1.2             13          1.2
120-159              150       14.1            163         15.3
160-199              442       41.4            605         56.7
200-239              299       28.0            904         84.7
240-279              115       10.8           1019         95.5
280-319               34        3.2           1053         98.7
320-359                9        0.8           1062         99.5
360-399                5        0.5           1067        100
-----------------------------------------------------------------------------
Total               1067      100

Charts
The frequency distribution of a categorical variable is often presented graphically as a bar chart or pie chart.
Bar charts: display the frequency distribution for nominal or ordinal data.
Horizontal axis: labels of the variable
Vertical bar: frequency or relative frequency
The bars should be of equal width and should be separated from one another so as not to imply continuity.

Bar charts showing frequency distribution of the variable BWT described in Table

[Figure: two bar charts of BWT categories (Very low, Low, Normal, Big); vertical axes: Freq. and Rel. Freq.]

Bar charts for comparison


In order to compare the distribution of a variable for two or more groups, bars are often drawn alongside each other for the groups being compared in a single bar chart.

[Figure: grouped bar chart of percent (0 to 100) by BWT category for antenatal follow-up Yes and No; bar labels: Low 9 and 7.9, Normal 88.9 and 89, Big 2.1 and 3.1]

Bar chart indicating categories of birth weight of 9975 newborns grouped by antenatal follow-up of the mothers

Pie chart
Pie chart: displays the frequency distribution for nominal or ordinal data.
In a pie chart, the categories into which the observations fall are represented as sectors of a circle. Each sector represents either the frequency or the relative frequency of observations within the category, and the angle of each sector is proportional to that frequency or relative frequency.

Fig 3(a) Pie chart indicating frequency of categories of birth weight: Very low 43, Low 793, Normal 8870, Big 268

Fig 3(b) Pie chart indicating relative frequency of categories of birth weight: Very low 0.4, Low 8, Normal 88.9, Big 2.7

Displaying numerical variables


Graphs Histograms Frequency polygons Cumulative frequency polygons Box Plots

Histograms
Histograms are frequency distributions with continuous class intervals that have been turned into graphs.
Given a set of numerical data, we can obtain an impression of the shape of its distribution by constructing a histogram.
Horizontal axis: labels of the variable
Vertical bar: frequency or relative frequency
Except for the two boundaries, class intervals are usually chosen to be of equal width. If this is not the case, the histogram could give a misleading impression of the shape of the data.

Example: Consider the following table and the histogram showing the distribution of the age of women at the time of marriage.

Age group    No. of women
15-19            11
20-24            36
25-29            28
30-34            13
35-39             7
40-44             3
45-49             2

[Figure: histogram "Age of women at the time of marriage"; x-axis: age group with class boundaries 14.5-19.5 to 44.5-49.5; y-axis: No. of women]

A histogram displaying frequency distribution of birth weight of newborns at Tikur Anbessa Hospital
[Figure: histogram; x-axis: Birth weight, y-axis: Frequency; Std. Dev = 502.34, Mean = 3126, N = 9975]

Frequency polygons
Instead of drawing a bar for each class interval, sometimes a single point is drawn at the midpoint of each class interval and consecutive points are joined by straight lines.

A graph drawn in this way is called a frequency polygon (line graph).


Frequency polygons are superior to histograms for comparing two or more sets of data.

Frequency polygon of birth weight of 9975 newborns at Tikur Anbessa Hospital for males and females
[Figure: frequency polygons for Males and Females (legend: SEX); x-axis: Birth Weight, 500 to 5000; y-axis: %, 0 to 50]

Cumulative frequency polygons


Horizontal axis: labels of the variable
Vertical axis: cumulative relative frequency
The points are then connected by straight lines. Like frequency polygons, cumulative frequency polygons may be used to compare sets of data.

Cumulative frequency polygons can also be used to obtain percentiles of a set of data.
Roughly, the 50th percentile is the value at which the cumulative relative frequency reaches 50%.


Table 3. Frequencies of serum cholesterol levels for 1227 US males of ages 55-64 1976-1980
-----------------------------------------------------------------------------
Cholesterol level
(mg/100ml)          Freq   Relative freq   Cum freq   Cum. rel. freq
-----------------------------------------------------------------------------
80-119                 5        0.4              5          0.4
120-159               48        3.9             53          4.3
160-199              265       21.6            318         25.9
200-239              458       37.3            776         63.2
240-279              281       22.9           1057         86.1
280-319              128       10.4           1185         96.5
320-359               35        2.9           1220         99.4
360-399                7        0.5           1227        100
-----------------------------------------------------------------------------
Total               1227      100

Frequency polygon and cumulative frequency polygons of serum cholesterol levels for 2294 males aged 25-34 and 55-64 years, 1976-1980

[Figure: frequency polygon; x-axis: Serum cholesterol levels (mg/100ml), 80-119 to 360-399; y-axis: Relative frequency (%), 0 to 45; lines: Ages 25-34, Ages 55-64]

[Figure: cumulative frequency polygon; x-axis: Serum cholesterol levels (mg/100ml), 80-119 to 360-399; y-axis: Cumulative relative frequency (%), 0 to 100; lines: Ages 25-34, Ages 55-64]

Box Plots
A visual picture called a box plot can be used to convey a fair amount of information about the distribution of a set of data.
The box shows the distance between the first and the third quartiles.
The median is marked as a line within the box.
The end lines show the minimum and maximum values respectively.
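The five values a box plot draws can be computed before any plotting. The sketch below returns them for the ages data set used earlier in this chapter; the function name `five_number_summary` is an illustrative assumption, and the quartiles come from `statistics.quantiles`, which interpolates between observations, so Q1 can differ slightly from a value read off by rounding the position.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a box plot."""
    ordered = sorted(values)
    q1, q2, q3 = quantiles(ordered, n=4)  # default "exclusive" method
    return ordered[0], q1, q2, q3, ordered[-1]


ages = [27, 30, 28, 31, 28, 36, 29, 37, 29, 34,
        30, 30, 27, 30, 28, 31, 32, 30, 29, 29]

summary = five_number_summary(ages)
print(summary)
```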

Illustration of Box-plot

[Figure: box plot; axis: Numbers, 18 to 36]

A box-plot indicating birth weight of 5092 newborns by gestational age at Tikur Anbessa Hospital studied

[Figure: box plots for Pre, Term and Post gestational age groups (axis: Gest. age); other axis: Birth weight (grams), 500 to 5000]

Summary

Although some information is lost when data are summarized using tables and graphs, a great deal is gained.

Tables
Tables are effective ways of summarizing categorical data.
Tables are more informative when they are not overly complex.
Tables and the columns within them should always be clearly labeled, and units of measurement should be specified.

Diagrams
Diagrams have greater attraction than mere figures. They give delight to the eye, add a spark of interest and catch the attention as much as the figures dispel it.
They help in deriving the required information in less time and without any mental strain.
They have greater memorizing value than mere figures, because the impression left by a diagram is of a lasting nature.
They facilitate comparison.
