
Types of data

Gender (male/female)
Age in years: continuous. Time is on a continuous scale.
Weight to the nearest kg: continuous
Weekly expenditure: continuous. Again a continuous scale, but you might argue that it is discrete.
Number of siblings: discrete
Party (Labour, Cons, Lib Dem, other): categorical
Term accommodation (Halls, Home, Private rented, other): categorical
Assignment grade (A, B, C, D, E): ranked, because we put grade A as better than grade B, which is better than C, etc.

Tabulation of data
                  Males   Females   Total
Halls               75      100      175
Home                30       38       68
Private rented      21       30
Other
Total              130      150      280

Pie chart angle formula


$\text{angle} = \dfrac{\text{category frequency}}{\text{total frequency}} \times 360$

Pie chart angle calculations


Halls: 75/130 × 360 = 208
Home: 30/130 × 360 = 83
Private: 21/130 × 360 = 58
Other: 4/130 × 360 = 11
The angles must add up to 360.
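The angle calculation can be checked with a short script. This is a minimal sketch in Python, using the four frequencies quoted in the calculations above; the names `frequencies` and `pie_angles` are illustrative only.

```python
# Frequencies quoted in the pie chart calculations (total 130).
frequencies = {"Halls": 75, "Home": 30, "Private": 21, "Other": 4}


def pie_angles(freqs):
    """Return each category's angle in degrees: frequency / total * 360."""
    total = sum(freqs.values())
    return {name: f / total * 360 for name, f in freqs.items()}


angles = pie_angles(frequencies)
# Round to whole degrees, as on the slide.
rounded = {name: round(angle) for name, angle in angles.items()}

for name, angle in rounded.items():
    print(f"{name}: {angle} degrees")
print("Sum of rounded angles:", sum(rounded.values()))
```

Rounding each angle separately does not always give a total of exactly 360, so the final check is worth keeping for other data sets.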

Pie Chart

Presentation of information

A simple bar chart has non-touching bars, with the height of each bar proportional to the frequency. In Excel, a bar chart with vertical bars is called a Column chart, and one with horizontal bars is called a Bar chart.
Multiple bar chart: each bar is split into several bars to show another variable. In Excel this is a clustered column chart (vertical bars) or a clustered bar chart (horizontal bars).
Component bar chart: components are stacked together to show another variable. In Excel this is a stacked column chart (vertical bars) or a stacked bar chart (horizontal bars).
Percentage component bar chart: conveys the information as percentages rather than actual frequencies, and so highlights differences in the proportions of one variable.

Frequency table
Time      Frequency   Lower boundary   Upper boundary   Class size   Class midpoint
0-<5          0             0                5               5            2.5
5-<10         3             5               10               5            7.5
10-<15        7            10               15               5           12.5
15-<20       10            15               20               5           17.5
20-<25       12            20               25               5           22.5
25-<30        8            25               30               5           27.5
30-<35        6            30               35               5           32.5
35-<40        2            35               40               5           37.5
40-<45        2            40               45               5           42.5

Open ended classes


Under 5 becomes 0 to under 5
40 and over becomes 40 to under 45

Histogram of delivery times

Frequency polygon of delivery times

Unequal class sizes


Delivery time   Number of deliveries   Frequency density   Adjusted height (standard class width 5)
0-<10                   3                    0.3                  1.5
10-<15                  7                    1.4                  7
15-<20                 10                    2                    10
20-<25                 12                    2.4                  12
25-<30                  8                    1.6                  8
30-<45                 10                    0.7                  3.3
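The frequency density and adjusted height columns can be reproduced as follows. This is a sketch only: it assumes the class frequencies come from merging rows of the earlier delivery-time frequency table (for example 0-<10 combines 0-<5 and 5-<10), and the function name is illustrative.

```python
# Classes as (lower boundary, upper boundary, number of deliveries).
# Assumption: frequencies are the merged rows of the delivery-time table.
classes = [
    (0, 10, 3),
    (10, 15, 7),
    (15, 20, 10),
    (20, 25, 12),
    (25, 30, 8),
    (30, 45, 10),
]
STANDARD_WIDTH = 5


def histogram_heights(rows, standard_width):
    """Return (lower, upper, frequency density, adjusted height) per class."""
    result = []
    for lower, upper, freq in rows:
        density = freq / (upper - lower)      # frequency per unit of time
        adjusted = density * standard_width   # height for a width-5 class
        result.append((lower, upper, density, adjusted))
    return result


heights = histogram_heights(classes, STANDARD_WIDTH)
for lower, upper, density, adjusted in heights:
    print(f"{lower}-<{upper}: density {density:.1f}, adjusted height {adjusted:.1f}")
```

Either column can be used as the bar height; the two histograms have the same shape because the adjusted height is just the density multiplied by 5.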

Blank grids

Histogram standard class size 5

Histogram using Frequency density

Guidelines for grouping data


So, in the example,
Largest observation = 41
Smallest = 5
Require, say, 8 classes: class width = $\dfrac{41 - 5}{8} = 4.5 \approx 5$

Cumulative frequency table


Delivery time   Frequency   Cumulative frequency
0-<5                0       Less than 5       0
5-<10               3       Less than 10      3
10-<15              7       Less than 15     10
15-<20             10       Less than 20     20
20-<25             12       Less than 25     32
25-<30              8       Less than 30     40
30-<35              6       Less than 35     46
35-<40              2       Less than 40     48
40-<45              2       Less than 45     50
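The cumulative column is a running total of the frequencies. A minimal sketch using Python's standard `itertools.accumulate`, with the class frequencies taken from the delivery-time frequency table:

```python
from itertools import accumulate

# Upper class boundaries and class frequencies from the delivery-time table.
upper_bounds = [5, 10, 15, 20, 25, 30, 35, 40, 45]
frequencies = [0, 3, 7, 10, 12, 8, 6, 2, 2]

# Running total: number of deliveries taking less than each upper boundary.
cumulative = list(accumulate(frequencies))

for bound, total in zip(upper_bounds, cumulative):
    print(f"Less than {bound}: {total}")
```

The last cumulative value should equal the total number of deliveries, which is a quick check on the table.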

Cumulative frequency graph

Finding measures from the cumulative frequency graph

Measures for this example

Median: look at a cumulative frequency of 25 on the graph: 22 days
Upper quartile: cumulative frequency of 37.5 on the graph: 28 days
Lower quartile: cumulative frequency of 12.5 on the graph: 16 days
Inter-quartile range: UQ - LQ = 28 - 16 = 12 days
20th percentile: look at a cumulative frequency of 10 on the graph: 15 days
Look at 30 days on the horizontal axis to give 40 deliveries, so 50 - 40 = 10 deliveries take more than 30 days
90% of deliveries take less than 34 days
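Reading values off the graph amounts to linear interpolation between the plotted points. The sketch below assumes the graph joins the points (upper boundary, cumulative frequency) with straight lines; the helper names are illustrative, and the results are slightly more precise than values read by eye.

```python
# Plotted points: delivery time (days) against cumulative frequency.
days = [0, 5, 10, 15, 20, 25, 30, 35, 40, 45]
cum_freq = [0, 0, 3, 10, 20, 32, 40, 46, 48, 50]


def interpolate(x, xs, ys):
    """Piecewise-linear lookup of y at x; xs must be non-decreasing."""
    points = list(zip(xs, ys))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # Skip flat segments in xs to avoid dividing by zero.
        if x1 > x0 and x0 <= x <= x1:
            return y0 + (x - x0) * (y1 - y0) / (x1 - x0)
    raise ValueError("x is outside the plotted range")


def days_at(cumulative_frequency):
    """Read across from the vertical axis, then down to the time axis."""
    return interpolate(cumulative_frequency, cum_freq, days)


def deliveries_within(time_in_days):
    """Read up from the time axis, then across to the vertical axis."""
    return interpolate(time_in_days, days, cum_freq)


median = days_at(25)
lower_quartile = days_at(12.5)
upper_quartile = days_at(37.5)
print(f"Median {median:.2f}, LQ {lower_quartile:.2f}, UQ {upper_quartile:.2f}")
print("Deliveries over 30 days:", 50 - deliveries_within(30))
print(f"90th percentile: {days_at(45):.2f} days")
```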

Measures of location
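Data: 28 28 35 35 35 36 39 44 44 50

The mode: the most frequently occurring item = 35

The median: the middle number. With ten values it is the average of the fifth and sixth: $\dfrac{35 + 36}{2} = 35.5$

The mean: $\bar{x} = \dfrac{\sum x}{n} = \dfrac{374}{10} = 37.4$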

Using frequency formula


X        F      FX
28       2      56
35       3      105
36       1      36
39       1      39
44       2      88
50       1      50
Total    10     374

$\bar{x} = \dfrac{\sum fx}{\sum f} = \dfrac{374}{10} = 37.4$
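Both routes to the mean can be checked in Python. This sketch uses the standard `statistics` module for the raw data and a small illustrative helper, `mean_from_frequencies`, for the frequency formula; the frequencies are assumed to be the counts of each value in the ten observations.

```python
from statistics import mean, median, mode

# The ten raw observations.
values = [28, 28, 35, 35, 35, 36, 39, 44, 44, 50]

# The same data as a frequency table: value -> frequency.
freq_table = {28: 2, 35: 3, 36: 1, 39: 1, 44: 2, 50: 1}


def mean_from_frequencies(table):
    """Mean as sum(f * x) / sum(f)."""
    total_fx = sum(x * f for x, f in table.items())
    total_f = sum(table.values())
    return total_fx / total_f


print("Mode:", mode(values))
print("Median:", median(values))
print("Mean:", mean(values))
print("Mean from frequency table:", mean_from_frequencies(freq_table))
```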

Measure from a grouped frequency table


Time      Frequency F   Midpoint X   FX       Cumulative frequency
0-<5           0           2.5         0             0
5-<10          3           7.5        22.5           3
10-<15         7          12.5        87.5          10
15-<20        10          17.5       175            20
20-<25        12          22.5       270            32
25-<30         8          27.5       220            40
30-<35         6          32.5       195            46
35-<40         2          37.5        75            48
40-<45         2          42.5        85            50
Total         50                    1130

Measures
Mean = $\dfrac{1130}{50} = 22.6$

Mode is estimated to be 22.5, the middle of the modal class

Median = $20 + \dfrac{5}{12} \times 5 \approx 22.08$
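The three grouped-data estimates can be computed together. This is a sketch under the usual assumptions for grouped data: values are spread evenly within each class, the mode is taken as the midpoint of the modal class, and the median is found by interpolating inside the class that contains the (n/2)th value. The function names are illustrative.

```python
# Classes as (lower boundary, upper boundary, frequency).
classes = [
    (0, 5, 0), (5, 10, 3), (10, 15, 7), (15, 20, 10), (20, 25, 12),
    (25, 30, 8), (30, 35, 6), (35, 40, 2), (40, 45, 2),
]
n = sum(freq for _, _, freq in classes)

# Mean: sum of frequency * midpoint, divided by the total frequency.
grouped_mean = sum(freq * (lower + upper) / 2 for lower, upper, freq in classes) / n

# Mode: midpoint of the class with the highest frequency.
modal_lower, modal_upper, _ = max(classes, key=lambda row: row[2])
grouped_mode = (modal_lower + modal_upper) / 2


def grouped_quantile(rows, fraction):
    """Interpolate within the class containing the (fraction * n)th value."""
    target = fraction * sum(freq for _, _, freq in rows)
    running = 0
    for lower, upper, freq in rows:
        if freq > 0 and running + freq >= target:
            return lower + (target - running) / freq * (upper - lower)
        running += freq
    raise ValueError("fraction must be between 0 and 1")


grouped_median = grouped_quantile(classes, 0.5)
print(f"Mean {grouped_mean}, mode {grouped_mode}, median {grouped_median:.2f}")
```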

Which measure is best


3 ... 10 10 10 25 40

Mean = 10.9 ≈ 11

Mode = 3, 10

Median = 7

Quartiles

Lower quartile = $15 + \dfrac{2.5}{10} \times 5 = 16.25$

Upper quartile = $25 + \dfrac{5.5}{8} \times 5 = 28.4375$

Measures of spread

Armstrong
3

Barrett
4 6

4 5 3 5 4 4 3 5

                           Armstrong          Barrett
Ordered                    2 3 3 4 4 4 6 6    3 3 4 4 4 4 5 5
Mean                       4 weeks            4 weeks
Mode                       4                  4
Median                     4                  4
Conclude                   Little difference
Range                      6 - 2 = 4          5 - 3 = 2
Inter-quartile range
Standard deviation
Coefficient of variation

Standard deviation
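$x$                    2    3    3    4    4    4    6    6
$x - \bar{x}$         -2   -1   -1    0    0    0    2    2
$(x - \bar{x})^2$      4    1    1    0    0    0    4    4

$\sum (x - \bar{x})^2 = 14$

Variance $= \dfrac{14}{8} = 1.75$

Standard deviation $= \sqrt{\dfrac{14}{8}} = 1.32$

Coefficient of variation $= \dfrac{1.32}{4} \times 100 = 33\%$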
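The same calculation can be run in Python. This sketch assumes the eight values are the ordered Armstrong delivery times and uses the population standard deviation (dividing by n = 8), which matches the division by 8 above; `statistics.pstdev` is the standard-library function for that.

```python
from statistics import mean, pstdev

# Ordered delivery times in weeks (assumed to be the Armstrong data).
armstrong = [2, 3, 3, 4, 4, 4, 6, 6]

average = mean(armstrong)

# Sum of squared deviations from the mean.
squared_deviations = sum((x - average) ** 2 for x in armstrong)

# Population standard deviation: sqrt(sum of squared deviations / n).
std_dev = pstdev(armstrong)

# Coefficient of variation: standard deviation as a percentage of the mean.
coefficient_of_variation = std_dev / average * 100

print(f"Mean {average}, sum of squares {squared_deviations}")
print(f"Standard deviation {std_dev:.2f}, CV {coefficient_of_variation:.0f}%")
```

Using `statistics.stdev` instead would divide by n - 1 and give a slightly larger value, so the choice of function matters when comparing with hand calculations.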

Coefficient of variation
$\text{coefficient of variation} = \dfrac{\text{standard deviation}}{\text{mean}} \times 100$

The higher the ratio, the greater the spread around the mean.

Lengths: mean = 55, standard deviation = 28.7, coefficient of variation = 52%
Weights: mean = 5.5, standard deviation = 2.87, coefficient of variation = 52%

Mean and Standard Deviation for Armstrong


X         F      FX       FX²
0.5       14      7         3.5
1.5       15     22.5      33.75
2.5       18     45       112.5
3.5       16     56       196
4.5       15     67.5     303.75
5.5       11     60.5     332.75
6.5       11     71.5     464.75
Totals   100    330      1447

Mean: $\bar{x} = \dfrac{330}{100} = 3.3$ weeks

Standard deviation $= \sqrt{\dfrac{1447}{100} - 3.3^2} = \sqrt{3.58} = 1.89$ weeks
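The table totals and the two results can be verified with a few lines of Python. A minimal sketch, using the midpoints and frequencies from the table and the formula standard deviation = sqrt(sum(f x²)/sum(f) - mean²); variable names are illustrative.

```python
from math import sqrt

# Class midpoints (weeks) and frequencies from the table.
midpoints = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5]
frequencies = [14, 15, 18, 16, 15, 11, 11]

n = sum(frequencies)
sum_fx = sum(f * x for x, f in zip(midpoints, frequencies))
sum_fx2 = sum(f * x ** 2 for x, f in zip(midpoints, frequencies))

grouped_mean = sum_fx / n
# Variance as the mean of the squares minus the square of the mean.
variance = sum_fx2 / n - grouped_mean ** 2
std_dev = sqrt(variance)

print(f"n = {n}, sum fx = {sum_fx}, sum fx^2 = {sum_fx2}")
print(f"Mean {grouped_mean:.1f} weeks, standard deviation {std_dev:.2f} weeks")
```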

Probability examples
Examples
Throw a coin. The probability of a head = 0.5
There are three counters in a bag, one red, one green and one blue. One counter is pulled out.
The probability that the counter is red = $\dfrac{1}{3}$
The counter is then replaced and a second is pulled out.
List all the outcomes: RR, RG, RB, GR, GG, GB, BR, BG, BB
The probability that both the first and the second were red = $\dfrac{1}{9}$
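Listing the outcomes by machine gives the same answer. This sketch assumes each counter is equally likely on every draw, and uses `itertools.product` to enumerate two draws with replacement.

```python
from fractions import Fraction
from itertools import product

colours = ["R", "G", "B"]

# All ordered pairs of draws with replacement: RR, RG, RB, GR, ...
outcomes = ["".join(pair) for pair in product(colours, repeat=2)]

# Equally likely outcomes: probability = favourable / total.
p_red_first = Fraction(colours.count("R"), len(colours))
p_both_red = Fraction(outcomes.count("RR"), len(outcomes))

print(outcomes)
print("P(red) =", p_red_first)
print("P(both red) =", p_both_red)
```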

Example
Over the last month (November) a machine has broken down on three days. What is the probability that the machine breaks down on a given day?

$\dfrac{3}{30} = 0.1$

Example
In a sample of adults these probabilities were found:
P(male) = 0.5, P(married) = 0.6, P(full-time job) = 0.9
A person is selected at random. What is the probability that the person is
i) married and male: 0.6 × 0.5 = 0.3
ii) male and in a full-time job: 0.5 × 0.9 = 0.45
iii) female: 1 - 0.5 = 0.5
Multiplying the probabilities in (i) and (ii) assumes the characteristics are independent.

Example
P(male) = 0.7, P(aged 40 to 59) = 0.4, P(aged 60 to 69) = 0.15, P(aged 70 or more) = 0.1
P(female) = 1 - 0.7 = 0.3
P(aged 40 to 59 or aged 60 to 69) = 0.4 + 0.15 = 0.55
P(female and aged 40 to 59) = 0.3 × 0.4 = 0.12
P(male or aged 40 to 59): cannot say. The two events are not mutually exclusive, so the answer is NOT 0.7 + 0.4 = 1.1

example
The probability that firm A makes a profit has been
assessed to be 0.6. The probability that the firm breaks
even is 0.3.
What is the probability that the firm makes a loss?

1- 0.6 - 0.3 = 0.1

Example
Firm   Profit   Break even   Loss
A      0.6      0.3          0.1
B      0.7      0.1          0.2

Example

Both firms make a profit: 0.6 × 0.7 = 0.42
Firm A does not make a profit: 1 - 0.6 = 0.4
Firm A makes a profit or breaks even: 0.6 + 0.3 = 0.9
Firm B does not make a profit: 1 - 0.7 = 0.3
Neither firm makes a profit: 0.4 × 0.3 = 0.12
At least one firm makes a profit: 1 - 0.12 = 0.88
Only one firm makes a profit: 1 - 0.42 - 0.12 = 0.46
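These answers follow from the multiplication and complement rules, which a short script makes explicit. The sketch assumes the two firms' results are independent (the multiplication above relies on this) and that firm B's profit probability is 0.7; the dictionary layout is illustrative.

```python
# Outcome probabilities for each firm (assumed independent of each other).
firm_a = {"profit": 0.6, "break_even": 0.3, "loss": 0.1}
firm_b = {"profit": 0.7, "break_even": 0.1, "loss": 0.2}

# Multiplication rule for independent events.
both_profit = firm_a["profit"] * firm_b["profit"]

# Complement rule: P(not profit) = 1 - P(profit).
a_no_profit = 1 - firm_a["profit"]
b_no_profit = 1 - firm_b["profit"]
neither_profit = a_no_profit * b_no_profit

# Addition rule for mutually exclusive outcomes of one firm.
a_profit_or_break_even = firm_a["profit"] + firm_a["break_even"]

at_least_one_profit = 1 - neither_profit
exactly_one_profit = 1 - both_profit - neither_profit

print(f"Both: {both_profit:.2f}, neither: {neither_profit:.2f}")
print(f"At least one: {at_least_one_profit:.2f}, only one: {exactly_one_profit:.2f}")
```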

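Throw a die
Expected score is

$1 \times \dfrac{1}{6} + 2 \times \dfrac{1}{6} + 3 \times \dfrac{1}{6} + 4 \times \dfrac{1}{6} + 5 \times \dfrac{1}{6} + 6 \times \dfrac{1}{6} = \dfrac{21}{6} = 3.5$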

Expectation

Minutes late    0      3      5      10     15
Probability     0.4    0.30   0.15   0.1    0.05

Expected number of minutes late

$0 \times 0.4 + 3 \times 0.3 + 5 \times 0.15 + 10 \times 0.1 + 15 \times 0.05 = 3.4$ minutes
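Both expectations are the sum of each value multiplied by its probability. A minimal sketch; `expected_value` is an illustrative helper, and the lateness values 0, 3, 5, 10 and 15 minutes are taken from the expectation sum above.

```python
from fractions import Fraction


def expected_value(distribution):
    """Sum of value * probability over (value, probability) pairs."""
    return sum(value * prob for value, prob in distribution)


# A fair die: each score from 1 to 6 has probability 1/6.
die = [(score, Fraction(1, 6)) for score in range(1, 7)]

# Minutes late with their probabilities.
lateness = [(0, 0.4), (3, 0.3), (5, 0.15), (10, 0.1), (15, 0.05)]

expected_score = expected_value(die)
expected_minutes_late = expected_value(lateness)

print("Expected die score:", float(expected_score))
print(f"Expected minutes late: {expected_minutes_late:.1f}")
```

The probabilities in any such table should add up to 1; checking that before computing the expectation catches most data entry slips.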

Spread of values

with μ = 3000 hrs and σ = 200 hrs

approximately 68% of the bulbs will last between 2800 hours and 3200 hours,
approximately 95% of the bulbs will last between 2600 hours and 3400 hours,
approximately 99.7% of the bulbs will last between 2400 hours and 3600 hours.

Using normal tables


i) P(Z < 1.3) = 1 - 0.0968 = 0.9032
ii) P(Z > 1.3) = 0.0968 (read directly from the table)
iii) P(Z < -1.3) = 0.0968 (read directly from the table)
iv) P(1.3 < Z < 2.4) = 1 - 0.0082 - 0.9032 = 0.0886
v) P(-1.3 < Z < 2.4) = 1 - 0.0082 - 0.0968 = 0.895
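The table look-ups can be cross-checked with Python's standard `statistics.NormalDist`, whose `cdf` method gives the area to the left of a value. This is a sketch for checking answers, not a replacement for learning to read the tables.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
Z = NormalDist()

p_i = Z.cdf(1.3)                  # P(Z < 1.3)
p_ii = 1 - Z.cdf(1.3)             # P(Z > 1.3), area in the upper tail
p_iii = Z.cdf(-1.3)               # P(Z < -1.3), equal to (ii) by symmetry
p_iv = Z.cdf(2.4) - Z.cdf(1.3)    # P(1.3 < Z < 2.4)
p_v = Z.cdf(2.4) - Z.cdf(-1.3)    # P(-1.3 < Z < 2.4)

for label, p in [("i", p_i), ("ii", p_ii), ("iii", p_iii), ("iv", p_iv), ("v", p_v)]:
    print(f"{label}: {p:.4f}")
```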

Find K
Find K such that P(Z > K) = 0.15
This also means P(Z < -K) = 0.15
From tables, the nearest probability to 0.15 is 0.1492, when the Z value is -1.04
P(Z < -1.04) = 0.15 approximately
Hence K = 1.04

Solution to example
μ = 2000 hours and σ = 250 hours
(a) less than 1750 hours

$z = \dfrac{1750 - 2000}{250} = -1$

From tables, area to the left of -1 = 0.1587

(b) more than 2350 hours

$z = \dfrac{2350 - 2000}{250} = 1.4$

From tables, area to the right of 1.4 = area to the left of -1.4 = 0.0808

Example (c)

between 1800 hours and 2400 hours?

$z_1 = \dfrac{1800 - 2000}{250} = -0.8$

$z_2 = \dfrac{2400 - 2000}{250} = 1.6$

Area to the left of -0.8 = 0.2119
Area to the left of 1.6 = 1 - 0.0548 = 0.9452
Area in between = 0.9452 - 0.2119 = 0.7333

Example (d)
4% fail
P(Z < z) = 0.04
From tables, the nearest probability to 0.04 is for Z = -1.75

$\dfrac{k - 2000}{250} = -1.75$

Solve to get k = 1562.5 hours

Example (e)

best 6%
P(Z > z) = 0.06 also means that P(Z < -z) = 0.06
From tables, the nearest probability to 0.06 is for Z = -1.55 (or use -1.56), so z = 1.55

$\dfrac{k - 2000}{250} = 1.55$

Solve to get k = 2387.5, or 2388 hours to the nearest whole number
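Parts (a) to (e) can be checked without tables using `statistics.NormalDist` with the stated mean of 2000 hours and standard deviation of 250 hours. This is a sketch for verification; `inv_cdf` returns the value with a given area to its left, so its answers differ slightly from those based on the nearest table entry.

```python
from statistics import NormalDist

# Lifetime in hours, assumed normal with mean 2000 and standard deviation 250.
lifetime = NormalDist(mu=2000, sigma=250)

p_a = lifetime.cdf(1750)                         # (a) less than 1750 hours
p_b = 1 - lifetime.cdf(2350)                     # (b) more than 2350 hours
p_c = lifetime.cdf(2400) - lifetime.cdf(1800)    # (c) between 1800 and 2400

# (d) lifetime below which 4% of items fall.
k_d = lifetime.inv_cdf(0.04)

# (e) lifetime exceeded by the best 6%, i.e. 94% fall below it.
k_e = lifetime.inv_cdf(0.94)

print(f"(a) {p_a:.4f}  (b) {p_b:.4f}  (c) {p_c:.4f}")
print(f"(d) {k_d:.1f} hours  (e) {k_e:.1f} hours")
```

The small gaps between these results and the hand-calculated 1562.5 and 2387.5 come from rounding z to two decimal places when using the tables.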
