Statistics 4040

REVISION NOTES FOR STATISTICS
Khodaboc us Aihjaaz
AVERAGE
Mean
There are three common types of average: the mean, the mode and the median. The range, described below, is not an average but a measure of how spread out the data is. The mean is what most people mean when they say 'average'. It is found by adding up all of the numbers and dividing by how many numbers there are. So the mean of 3, 5, 7, 3 and 5 is 23/5 = 4.6.

When you are given data which has been grouped, an estimate of the mean is Σfx / Σf, where f is the frequency and x is the midpoint of the group (Σ means 'the sum of').

Example: Work out an estimate for the mean height.

Height (cm)   Number of people (f)   Midpoint (x)   fx (f multiplied by x)
101-120       1                      110.5          110.5
121-130       3                      125.5          376.5
131-140       5                      135.5          677.5
141-150       7                      145.5          1018.5
151-160       4                      155.5          622
161-170       2                      165.5          331
171-190       1                      180.5          180.5

Σf = 23 and Σfx = 3316.5, so mean = 3316.5/23 = 144 cm (3 s.f.)

Mode
The mode is the number in a set of numbers which occurs the most. So the modal value of 5, 6, 3, 4, 5, 2, 5 and 3 is 5, because there are more 5s than any other number.

Range
The range is the largest number in a set minus the smallest number. So the range of 5, 7, 9 and 14 is (14 - 5) = 9.
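The grouped-mean estimate, the mode and the range from the examples above can be checked with a short Python sketch. The function and variable names are illustrative only, and each midpoint is assumed to be the average of the two class limits, as in the height table.

```python
from statistics import mode

# (lower limit, upper limit, frequency) for each height class in the table.
classes = [(101, 120, 1), (121, 130, 3), (131, 140, 5), (141, 150, 7),
           (151, 160, 4), (161, 170, 2), (171, 190, 1)]


def grouped_mean(classes):
    """Estimate the mean as (sum of f*x) / (sum of f), x being the midpoint."""
    sum_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
    sum_f = sum(f for _, _, f in classes)
    return sum_fx / sum_f


estimate = grouped_mean(classes)
print(round(estimate))  # 144

# Mode and range for the two small examples in the text.
modal_value = mode([5, 6, 3, 4, 5, 2, 5, 3])
value_range = max([5, 7, 9, 14]) - min([5, 7, 9, 14])
print(modal_value, value_range)  # 5 9
```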
The Median Value
The median of a group of numbers is the number in the middle, when the numbers are in order of magnitude. For example, if the set of numbers is 4, 1, 6, 2, 6, 7, 8, the median is 6:

1, 2, 4, 6, 6, 7, 8 (6 is the middle value when the numbers are in order)

If you have n numbers in a group, the median is the (n + 1)/2 th value. For example, there are 7 numbers in the example above, so replace n by 7 and the median is the (7 + 1)/2 th value = 4th value. The 4th value is 6.
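A minimal sketch of the (n + 1)/2 rule in Python follows. The function name is hypothetical, and the handling of an even number of values (averaging the two middle values) is an assumption, since the notes only work through an odd-sized set.

```python
def median_by_position(values):
    """Return the (n + 1)/2 th value of the sorted data (1-based position)."""
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) / 2
    if position.is_integer():
        # Odd n: the position is a whole number, so take that value directly.
        return ordered[int(position) - 1]
    # Even n (assumption, not covered in the notes): the position falls
    # halfway between two values, so average them.
    lower = ordered[int(position) - 1]
    upper = ordered[int(position)]
    return (lower + upper) / 2


print(median_by_position([4, 1, 6, 2, 6, 7, 8]))  # 6
```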
Cumulative Frequency
This is the running total of the frequencies. On a graph, it can be represented by a cumulative frequency polygon, where straight lines join up the points, or a cumulative frequency curve.

Example:

Frequency   Cumulative frequency
4           4
6           10   (4 + 6)
3           13   (4 + 6 + 3)
2           15   (4 + 6 + 3 + 2)
6           21   (4 + 6 + 3 + 2 + 6)
4           25   (4 + 6 + 3 + 2 + 6 + 4)
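The running total in this example can be reproduced in Python with the standard library's `itertools.accumulate`. This is only a quick check of the table, and the variable names are illustrative.

```python
from itertools import accumulate

frequencies = [4, 6, 3, 2, 6, 4]

# accumulate() yields the running total after each frequency is added.
cumulative = list(accumulate(frequencies))
print(cumulative)  # [4, 10, 13, 15, 21, 25]
```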
When dealing with a cumulative frequency curve, n is the total cumulative frequency (25 in the above example). Therefore the median would be the 13th value, since (25 + 1)/2 = 13. To find this on the cumulative frequency curve, find 13 on the y-axis (which should be labelled 'cumulative frequency'). The corresponding x value is an estimate of the median.

Quartiles
If we divide a cumulative frequency curve into quarters, the value at the lower quarter is referred to as the lower quartile, the value at the middle gives the median and the value at the upper quarter is the upper quartile.

A set of numbers may be as follows: 8, 14, 15, 16, 17, 18, 19, 50. The mean of these numbers is 19.625. However, the extremes in this set (8 and 50) distort this value. The interquartile range is a method of measuring the spread of the middle 50% of the values and is useful since it ignores the extreme values.

The lower quartile is the (n + 1)/4 th value and the upper quartile is the 3(n + 1)/4 th value, where n is the total cumulative frequency. The difference between the two quartiles is the interquartile range (IQR).

For example, when the total cumulative frequency is n = 157, the lower quartile is the 39.5th value and the upper quartile is the 118.5th value. Reading these positions off a sketched cumulative frequency curve for that data gives a lower quartile of about 17 and an upper quartile of about 37. Therefore the IQR is about 20. Bear in mind that this comes from a rough sketch; if you plot the values on graph paper you will get a more accurate value.
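The positions used to read the median and quartiles off a cumulative frequency curve follow directly from n. A small Python sketch is given below; the function name is hypothetical, and the quartile values 17 and 37 are simply the readings quoted in the text, not computed from data.

```python
def quartile_positions(n):
    """Return the positions of the lower quartile, median and upper quartile
    for a total cumulative frequency of n."""
    lower = (n + 1) / 4
    median = (n + 1) / 2
    upper = 3 * (n + 1) / 4
    return lower, median, upper


print(quartile_positions(157))  # (39.5, 79.0, 118.5)
print(quartile_positions(25))   # (6.5, 13.0, 19.5)

# The IQR is the upper quartile minus the lower quartile. These two values
# are the approximate curve readings quoted in the text.
lower_quartile, upper_quartile = 17, 37
iqr = upper_quartile - lower_quartile
print(iqr)  # 20
```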
Histograms
Histograms are similar to bar charts apart from the consideration of areas. In a bar chart, all of the bars are the same width and the only thing that matters is the height of the bar. In a histogram, the area is the important thing.

Example: Draw a histogram for the following information.

Height (feet)   Frequency (number of pupils)   Relative frequency
0-2             0                              0
2-4             1                              1
4-5             4                              8
5-6             8                              16
6-8             2                              2

(Ignore relative frequency for now.) It is difficult to draw a bar chart for this information, because the class divisions for the height are not the same. The height is grouped 0-2, 2-4 etc., but not all of the groups are the same size. For example, the 4-5 group is smaller than the 0-2 group.

When drawing a histogram, the y-axis is labelled 'relative frequency' or 'frequency density'. You must work out the relative frequency before you can draw a histogram. To do this, first you must choose a standard width for the groups. Some of the heights are grouped into 2s (0-2, 2-4, 6-8) and some into 1s (4-5, 5-6). Most are 2s, so we shall call the standard width 2. To make the areas match, we must double the frequencies which have a class division of 1 (since 1 is half of 2). Therefore the figures in the 4-5 and the 5-6 rows must be doubled. If any of the class divisions were 4 (for example if there was an 8-12 group), these figures would be halved. This is because a bar of width 4 is twice the standard width, so its area would be twice as large as it should be unless we halve the frequency.
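The doubling and halving described above amounts to scaling each frequency by (standard width / class width). A Python sketch under that reading is shown here; the function name and the 8-12 group's frequency of 6 are made up for illustration.

```python
STANDARD_WIDTH = 2  # the standard group width chosen in the text

# (lower bound, upper bound, frequency) for each height group in the table.
groups = [(0, 2, 0), (2, 4, 1), (4, 5, 4), (5, 6, 8), (6, 8, 2)]


def bar_height(lo, hi, freq, standard_width=STANDARD_WIDTH):
    """Scale the frequency so that the bar's area stays proportional to it."""
    return freq * standard_width / (hi - lo)


heights = [bar_height(lo, hi, f) for lo, hi, f in groups]
print(heights)  # [0.0, 1.0, 8.0, 16.0, 2.0]

# Hypothetical 8-12 group with frequency 6: width 4, so the figure is halved.
print(bar_height(8, 12, 6))  # 3.0
```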
PROBABILITY
Introduction
Probability is the likelihood or chance of an event occurring.

Probability = (the number of ways of achieving success) / (the total number of possible outcomes)

For example, the probability of flipping a coin and it being heads is 1/2, because there is 1 way of getting a head and the total number of possible outcomes is 2 (a head or a tail). We write P(heads) = 1/2.

The probability of something which is certain to happen is 1. The probability of something which is impossible is 0. The probability of something not happening is 1 minus the probability that it will happen.

Single Events
Example: There are 6 beads in a bag: 3 are red, 2 are yellow and 1 is blue. What is the probability of picking a yellow? The probability is the number of yellow beads in the bag divided by the total number of beads, i.e. 2/6 = 1/3.

Example: There is a bag full of coloured balls: red, blue, green and orange. Balls are picked out and replaced. John did this 1000 times and obtained the following results:

Number of blue balls picked out: 300
Number of red balls: 200
Number of green balls: 450
Number of orange balls: 50

a) What is the probability of picking a green ball?
b) If there are 100 balls in the bag, how many of them are likely to be green?

a) For every 1000 balls picked out, 450 are green. Therefore P(green) = 450/1000 = 0.45.
b) The experiment suggests that 450 out of 1000 balls are green. Therefore, out of 100 balls, about 45 are likely to be green (using ratios).

Multiple Events
When working out the probability of two things happening, a probability/possibility space can be drawn. For example, if you throw two dice, what is the probability that you will get: a) 8, b) 9, c) either 8 or 9?

a) In the possibility space, mark each square where the two dice total 8 (a 2 and a 6, a 3 and a 5, a 4 and a 4, a 5 and a 3, a 6 and a 2). There are 5 different ways. The possibility space shows us that when throwing 2 dice, there are 36 different possibilities (36 squares). With 5 of these possibilities, you will get 8.
Therefore P(8) = 5/36.
b) Marking the ways of getting 9 in the same way (a 3 and a 6, a 4 and a 5, a 5 and a 4, a 6 and a 3) gives four squares, therefore P(9) = 4/36 = 1/9.
c) You will get an 8 or 9 in any of the marked squares. There are 9 altogether, so P(8 or 9) = 9/36 = 1/4.

Another way of representing 2 or more events is on a probability tree.

Example: There are 3 balls in a bag: red, yellow and blue. One ball is picked out, and not replaced, and then another ball is picked out.

The first ball can be red, yellow or blue. The probability is 1/3 for each of these. If a red ball is picked out, there will be two balls left, a yellow and a blue. The probability the second ball will be yellow is 1/2 and the probability the second ball will be blue is 1/2. The same logic can be applied to the cases when a yellow or blue ball is picked out first.

In this example, the question states that the ball is not replaced. If it were, the probability of picking a red ball (etc.) the second time would be the same as the first (i.e. 1/3).

The AND and OR rules
In the above example, the probability of picking a red first is 1/3 and a yellow second is 1/2. The probability that a red AND then a yellow will be picked is 1/3 × 1/2 = 1/6 (this is shown at the end of the branch). The probability of picking a red OR a yellow first is 1/3 + 1/3 = 2/3. When the word 'and' is used we multiply. When 'or' is used, we add, provided the outcomes cannot both happen at once. On a probability tree, when moving from left to right we multiply and when moving down we add.

Example: What is the probability of getting a yellow and a red in any order? This is the same as: what is the probability of getting a yellow AND a red OR a red AND a yellow?

P(yellow and red) = 1/3 × 1/2 = 1/6
P(red and yellow) = 1/3 × 1/2 = 1/6
P(yellow and red or red and yellow) = 1/6 + 1/6 = 1/3
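Both the two-dice possibility space and the probability tree can be checked by listing the equally likely outcomes. The Python sketch below does this with exact fractions; the helper name `p_total` and the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import permutations, product

# Possibility space for two dice: all 36 ordered pairs of scores.
rolls = list(product(range(1, 7), repeat=2))


def p_total(*totals):
    """Probability that the two dice add up to any of the given totals."""
    hits = sum(1 for a, b in rolls if a + b in totals)
    return Fraction(hits, len(rolls))


print(p_total(8))     # 5/36
print(p_total(9))     # 1/9
print(p_total(8, 9))  # 1/4

# Probability tree: two balls drawn without replacement.
# AND rule: multiply along a branch. OR rule: add the branches.
p_red_then_yellow = Fraction(1, 3) * Fraction(1, 2)
p_yellow_then_red = Fraction(1, 3) * Fraction(1, 2)
p_either_order = p_red_then_yellow + p_yellow_then_red
print(p_either_order)  # 1/3

# Cross-check by listing the 6 equally likely ordered draws.
draws = list(permutations(["red", "yellow", "blue"], 2))
favourable = [d for d in draws if set(d) == {"red", "yellow"}]
p_by_counting = Fraction(len(favourable), len(draws))
```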
There are a number of ways of representing data diagrammatically. (See also Histograms.)

Scatter Graphs
These are used to compare two sets of data. A line of best fit is drawn, which should pass as close as possible to as many points as possible. It should have roughly the same number of points above and below it. The less scatter there is about the best-fit line, the stronger the relationship is between the two quantities. If the points are close to the best-fit line, we say that there is a strong correlation. If the points are loosely scattered, there is a weak correlation. There is no correlation if there is no trend in the results.

Bar Chart
A bar chart is a chart where the height of the bars represents the frequency. The data is 'discrete' (discontinuous, unlike histograms where the data is continuous). The bars should be separated by small gaps.

Pie Chart
A pie chart is a circle which is divided into a number of parts. For example, suppose a pie chart shows the TV viewing figures for the following TV programmes:

Eastenders, 15 million
Casualty, 10 million
Peak Practice, 5 million
The Bill, 8 million

The total number of viewers for the four programmes is 38 million. To work out the angle that 'Eastenders' will have in the pie chart, we divide 15 by 38 and multiply by 360 (degrees). This is 142 degrees to the nearest degree. So 142 degrees of the circle represents Eastenders. Similarly, 95 degrees of the circle is Casualty, 47 degrees is Peak Practice and the remaining 76 degrees is The Bill.
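The pie chart angles can be reproduced with a few lines of Python, as a quick check of the arithmetic. The dictionary name is illustrative. Rounding each angle to the nearest degree happens to total 360 here, but that is not guaranteed for other data.

```python
# Viewing figures in millions, taken from the example.
viewers = {"Eastenders": 15, "Casualty": 10, "Peak Practice": 5, "The Bill": 8}

total = sum(viewers.values())

# Each programme's share of the circle: (its viewers / total) * 360 degrees.
angles = {name: round(count / total * 360) for name, count in viewers.items()}
print(angles)
# {'Eastenders': 142, 'Casualty': 95, 'Peak Practice': 47, 'The Bill': 76}
```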
Sampling
When examining a particular population it is usually advisable to choose a small sample in such a way that everyone is represented. This is not easy and requires careful thought about sample size and composition. Often questionnaires are devised to identify the required information. These need to be foolproof, so questions need to cover all alternatives and give little scope for variation.

Example question: A bus company attempted to estimate the number of people who travel on local buses in a certain town. They telephoned 100 people in the town one evening and asked 'Have you travelled by bus in the last week?' Nineteen people said 'Yes'. The bus company concluded that 19% of the town's population travel on local buses. Give 3 criticisms of this method of estimation.

In answering this question, there is no single set of 3 correct answers. As long as what you say is plausible and sensible, you should get the marks. For example, you might say:

100 people in a large town is not a large enough proportion of the population to give a good sample.
People who travel on local buses once a fortnight may have said no to the question. They nevertheless travel on local buses.
On the evening that the sample was carried out, anybody travelling by bus would be out, and so could not have answered the telephone.
Standard Deviation
The formula for the standard deviation is:

$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$

Lower case sigma ($\sigma$) means 'standard deviation'. Capital sigma ($\sum$) means 'the sum of'. x bar ($\bar{x}$) means 'the mean'.

The standard deviation measures the spread of the data about the mean value. It is useful in comparing sets of data which may have the same mean but a different range. For example, the mean of the following two sets is the same: 15, 15, 15, 14, 16 and 2, 7, 14, 22, 30. However, the second is clearly more spread out. If a set has a low standard deviation, the values are not spread out too much.

Example: Find the standard deviation of 4, 9, 11, 12, 17, 5, 8, 12, 14.

First work out the mean: 10.222

Now, subtract the mean individually from each of the numbers in the question and square the result. This is the $(x - \bar{x})^2$ step, where x refers to the values in the question.

x     (x - mean) squared
4     38.7
9     1.49
11    0.60
12    3.16
17    45.9
5     27.3
8     4.94
12    3.16
14    14.3
Now add up these results (this is the 'sigma' in the formula): 139.55
Divide by n. n is the number of values, so in this case is 9: 15.51
And finally, square root this: 3.94

The standard deviation can usually be calculated much more easily with a calculator and this is usually acceptable in exams. With some calculators, you go into the standard deviation mode (often mode '.'). Then type in the first value, press 'data', type in the second value, press 'data'. Do this until you have typed in all the values, then press the standard deviation button (it will probably have a lower case sigma on it). Check your calculator's manual to see how to calculate it on yours.

NB: If you have a set of numbers (e.g. 1, 5, 2, 7, 3, 5 and 3) and each number is increased by the same amount (e.g. to 3, 7, 4, 9, 5, 7 and 5), the standard deviation will be the same and the mean will have increased by the amount each of the numbers was increased by (2 in this case).

When dealing with data such as the following:

x    f
4    9
5    14
6    22
7    11
8    17

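the formula for standard deviation becomes:

$$\sigma = \sqrt{\frac{\sum f(x - \bar{x})^2}{\sum f}}$$

where f is the frequency of each value x and $\bar{x} = \sum fx / \sum f$.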
Try working out the standard deviation of the above data. You should get an answer of 1.32.
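The two standard deviation calculations in this section can be checked with the Python sketch below. It divides by n (the population form used in the worked example) and assumes the frequency-table version weights each squared deviation by f and divides by the total frequency, which reproduces the stated answer of 1.32. The function names are illustrative.

```python
from math import sqrt


def std_dev(values):
    """Square root of the mean of the squared deviations from the mean."""
    n = len(values)
    mean = sum(values) / n
    return sqrt(sum((x - mean) ** 2 for x in values) / n)


def std_dev_from_table(table):
    """Same calculation for (x, f) pairs, weighting each x by its frequency
    (assumed form of the frequency-table formula)."""
    total_f = sum(f for _, f in table)
    mean = sum(f * x for x, f in table) / total_f
    return sqrt(sum(f * (x - mean) ** 2 for x, f in table) / total_f)


print(round(std_dev([4, 9, 11, 12, 17, 5, 8, 12, 14]), 2))  # 3.94

table = [(4, 9), (5, 14), (6, 22), (7, 11), (8, 17)]
print(round(std_dev_from_table(table), 2))  # 1.32

# Adding 2 to every value shifts the mean but leaves the spread unchanged.
before = std_dev([1, 5, 2, 7, 3, 5, 3])
after = std_dev([3, 7, 4, 9, 5, 7, 5])
```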
