
Statistics (Summary)

Representation of Data:
Pictographs:
One of the most arresting ways of illustrating statistics is to use a graph in the
form of pictures. This kind of graph is called a pictograph or pictogram.
Advantages:
Very attractive way of presenting the information.
Easy to read.
Disadvantages:
Time consuming.
Sometimes very difficult to draw.
Sometimes very difficult to read accurately when there are bits of pictures.

Bar Charts:
Simple Bar Chart:
A simple bar chart is used to represent data involving only one variable, classified
on a spatial, quantitative or temporal basis. In a simple bar chart we make bars of
equal width but variable length, i.e. the magnitude of a quantity is represented by
the height or length of the bars. The following steps are undertaken in drawing a
simple bar diagram.
Advantages:
Makes comparison easy.
Clear strong visual impact.
Easy to draw well.
Disadvantages:
Inability to show enough interactive relationships between all of the activities on
larger, more complex projects.

http://olevelalevelnotes.wordpress.com

Compound/Multiple Bar Chart:


A multiple bar diagram represents two or more sets of inter-related data
(it facilitates comparison between more than one phenomenon).
The technique of the simple bar chart is used to draw this diagram, but the difference is
that we use different shades, colors, or dots to distinguish between the different
phenomena. We draw a multiple bar chart when the total of the different phenomena
is meaningless.
Advantages:
Makes comparison between different items very easy.
Disadvantages:
Only a few items can be shown.
If more items are shown, the graph becomes very messy.
Time consuming.
Component/Composite/Sectional Bar Chart:
A sub-divided or component bar chart is used to represent data in which the total
magnitude is divided into different parts or components. A sub-divided bar chart may be
drawn on a percentage basis. To draw a sub-divided bar chart on a percentage basis, we
express each component as a percentage of its respective total. In drawing a
percentage bar chart, bars of length equal to 100 are drawn for each class in the first
step and sub-divided in proportion to the percentages of their components in the
second step. The diagram so obtained is called a percentage component bar chart or
percentage stacked bar chart. This type of chart is useful for comparing
components while holding the totals constant.
Advantages:
Shows the division of an item into its constituent parts.
Comparison between the parts is easy.
Different grouped quantities can be compared at the same level.
Disadvantages:
Calculations required.
Time consuming in contrast to Simple Bar Chart.
Only a few items can be included.
Percentage Component Bar Chart does not show total output.
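The percentage step described above can be sketched in a few lines of Python. This is only an illustration: the function name percentage_components and the sales figures are made up for the example and do not come from the notes.

```python
def percentage_components(components):
    """Express each component of one bar as a percentage of that bar's total."""
    total = sum(components.values())
    if total == 0:
        raise ValueError("the total of the components must be non-zero")
    return {name: 100 * value / total for name, value in components.items()}


# Hypothetical figures for one bar (one class), split into three components.
sales = {"A": 15, "B": 25, "C": 10}
print(percentage_components(sales))  # {'A': 30.0, 'B': 50.0, 'C': 20.0}
```

Each bar is then drawn with length 100 and divided at these percentages, which is why the chart no longer shows the total output.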


Line Graphs:
Line graphs show the data by means of drawing a line. This kind of graph is very
useful for showing upward and downward trends, and is the kind of graph most
used and misused in newspapers, magazines and advertisements.
Advantages:
Shows the trend.
Disadvantages:
Takes a lot of time to draw.
Sometimes becomes very difficult to read when there are too many values.
Can be uninformative and also misleading.

Pie Chart:
A pie chart can be used to compare the relation between the whole and its components.
A pie chart is a circular diagram in which the area of each sector of the circle
represents one component. The angles are marked in the circle by means of a protractor to
show the different components. The sectors are usually arranged anti-clockwise.
Advantages:
Attractive way of representing the data.
Comparison of the relation between the whole and its components becomes easy.
Disadvantages:
Requires calculations.
Sometimes becomes very complex.
Exact values are not given.
The answers in most cases must be approximations.
There has to be actual measurement of angles using protractors.
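The calculation a pie chart requires is to turn each component into a sector angle, in proportion to its share of the 360 degrees of the circle. A minimal Python sketch, with a hypothetical function name and made-up spending figures:

```python
def sector_angles(components):
    """Return the angle in degrees of each sector: 360 * component / total."""
    total = sum(components.values())
    if total == 0:
        raise ValueError("the total of the components must be non-zero")
    return {name: 360 * value / total for name, value in components.items()}


# Hypothetical monthly spending, used only to illustrate the calculation.
spending = {"Rent": 300, "Food": 200, "Travel": 100}
print(sector_angles(spending))  # {'Rent': 180.0, 'Food': 120.0, 'Travel': 60.0}
```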


Sampling:
Some important definitions:
Population: The group of people, items or units under investigation.
Parameter: A numerical characteristic of a population, such as its mean and
standard deviation is called a parameter.
Census: Obtained by collecting information about each member of a population.
Sample: Obtained by collecting information about only some members of a
population.
Statistic: A quantity calculated from a sample is a statistic.
Sampling Frame: The list of people from which the sample is taken. It should be
comprehensive, complete and up-to-date. Examples of sampling frame: Electoral
Register; Postcode Address File; telephone book
A probability sample is one in which each member of the population has a known,
non-zero chance of being selected; in the simplest case every member has an equal
chance.
In a non-probability sample, some people have a greater, but unknown, chance
than others of selection.

Simple Random Sample:


A simple random sample gives each member of the population an equal chance of
being chosen. It is not a haphazard sample, as some people think. One way of
achieving a simple random sample is to number each element in the sampling
frame (e.g. give everyone on the Electoral Register a number) and then use random
numbers to select the required sample.
Random numbers can be obtained using your calculator, a spreadsheet, printed
tables of random numbers, or by the more traditional methods of drawing slips of
paper from a hat, tossing coins or rolling dice.
The optimum sample is the one which maximizes precision per unit cost, and by
this criterion simple random sampling can often be bettered by other methods.


Advantages:
Cheap.
Simple.
Easily applied to a small population.
Ensures bias is not introduced.
Disadvantages:
Hard to achieve in practice.
Requires an accurate list of the whole population.
Expensive to conduct as those sampled may be scattered over a very large area.
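The numbering method described for simple random sampling can be sketched in Python. The function name, the frame of 100 numbered elements and the optional seed are assumptions made for this illustration; random.sample draws without replacement, so no element is chosen twice.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Pick n distinct elements from the sampling frame, all equally likely."""
    rng = random.Random(seed)  # a seed is optional and only makes the draw repeatable
    return rng.sample(frame, n)


# Hypothetical sampling frame: every element has been given a number from 1 to 100.
frame = list(range(1, 101))
print(simple_random_sample(frame, 10))
```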

Stratified Random Sample:


This is a sample made up of random samples from each section or stratum of a
population, where the size of each of the random samples is proportional to the size
of that section of the population.
Advantages:
Focuses on important subpopulations and ignores irrelevant ones.
Allows use of different sampling techniques for different subpopulations.
Improves the accuracy/efficiency of estimation.
Permits greater balancing of statistical power of tests of differences between strata
by sampling equal numbers from strata varying widely in size.
Disadvantages:
Requires selection of relevant stratification variables which can be difficult.
Is not useful when there are no homogeneous subgroups.
Can be expensive to implement.
Requires greater effort.
Strata must be defined carefully to obtain accurate results.
More complex to organize and analyze results.
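A rough Python sketch of proportional stratified sampling follows. The function name, the three strata and their sizes are hypothetical. Each stratum contributes a simple random sample whose size is its share of the population multiplied by the overall sample size; rounding means the total can differ slightly from the size requested.

```python
import random


def stratified_sample(strata, n, seed=None):
    """Draw a random sample from each stratum, sized in proportion to the stratum."""
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Proportional allocation; rounding may shift the overall total slightly.
        k = round(n * len(members) / population_size)
        sample[name] = rng.sample(members, k)
    return sample


# Hypothetical population of 100 people in three strata of sizes 60, 30 and 10.
strata = {
    "junior": [f"J{i}" for i in range(60)],
    "middle": [f"M{i}" for i in range(30)],
    "senior": [f"S{i}" for i in range(10)],
}
chosen = stratified_sample(strata, 10)
print({name: len(members) for name, members in chosen.items()})
# {'junior': 6, 'middle': 3, 'senior': 1}
```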

Quota Sample:
In quota sampling the selection of the sample is made by the interviewer, who has
been given quotas to fill from specified sub-groups of the population. For
example, an interviewer may be told to sample 50 females between the ages of 45
and 60.
There are similarities with stratified sampling, but in quota sampling the selection
of the sample is non-random. Anyone who has had the experience of trying to
interview people in the street knows how tempting it is to ask those who look most
helpful. A quota sample is therefore not the most representative of samples, but it is
extremely useful.
Advantages:
Quick and cheap to organize.
Ensures selection of adequate numbers of subjects with appropriate characteristics.


Disadvantages:
Not as representative of the population as a whole as other sampling methods.
Because the sample is non-random it is impossible to assess the possible sampling
error.
Not possible to prove that the sample is representative of the designated
population.

Systematic Sample:
This is random sampling with a system. From the sampling frame, a starting point
is chosen at random, and thereafter items are selected at regular intervals.
In a random sample every member of the population has an equal chance of being
chosen, which is clearly not the case here, but in practice a systematic sample is
almost always acceptable as being random.
Advantages:
Spreads the sample more evenly over the population.
Easier to conduct than a simple random sample.
Disadvantages:
The system may interact with some hidden pattern in the population.
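Systematic sampling can be sketched in Python as follows. The function name and the frame of 100 elements are assumptions for the example. The interval is taken as the frame length divided by the sample size (rounded down), and the random starting point is chosen within the first interval.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Choose a random start, then take every step-th element of the frame."""
    step = len(frame) // n  # regular interval between selected elements
    if step < 1:
        raise ValueError("the sample cannot be larger than the sampling frame")
    rng = random.Random(seed)
    start = rng.randrange(step)  # random starting point within the first interval
    return [frame[start + i * step] for i in range(n)]


# Hypothetical sampling frame of 100 numbered elements.
frame = list(range(1, 101))
print(systematic_sample(frame, 10))
```

If the frame itself has a pattern that repeats every 10 elements, this sample would pick the same position in the pattern each time, which is the hidden-pattern risk listed above.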


Variables:
A variable is a symbol that stands for a value that may vary; the term usually
occurs in opposition to a constant, which is a symbol for a non-varying value, i.e.
one that is completely fixed or fixed in the context of use. A variable varies or
changes from one group to another.
There are two types of variables:
Quantitative Variable
Qualitative Variable
Quantitative Variable:
A quantitative variable is naturally measured as a number for which meaningful
arithmetic operations make sense. In other words it is the one which can be given
the numerical value.
Qualitative Variable:
Also known as categorical variables, qualitative variables are variables with no
natural sense of ordering. They are therefore measured on a nominal scale. For
instance, hair color (Black, Brown, Gray, Red, Yellow) is a qualitative variable.
Qualitative variables can be coded to appear numeric but their numbers are
meaningless. Variables that are not qualitative are known as quantitative variables.
Discrete Variable:
Variables that can only take on a finite number of values are called "discrete
variables." All qualitative variables are discrete. Some quantitative variables are
discrete, such as performance rated as 1,2,3,4, or 5, or temperature rounded to the
nearest degree.
Continuous Variable:
Variables that can take on an infinite number of possible values are called
"continuous variables." A continuous variable is one which can take any value within a
certain range. The different values of a continuous variable are usually obtained by some
kind of measurement. A boy's height or weight varies continuously and may be
found by measuring.


Measures of Central Tendency:


Mean:
This is the average used in arithmetic. To find the mean of a set of scores, we
simply add up the scores and divide by the number of scores. The mean is affected
by extreme values.

For a list of N scores: $\bar{X} = \frac{\sum X}{N}$
For a frequency distribution: $\bar{X} = \frac{\sum fX}{\sum f}$
Median:
When a number of scores is arranged in numerical order, the median is the middle
score having the same number of scores above it as below. It is not affected by the
extreme values because it is in the middle of the arranged numbers.
Mode:
The mode is the score which occurs most frequently. It is also not affected by the
extreme values as it is the number that is occurring most frequently.
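The three averages can be checked with Python's standard statistics module. The scores and the frequency table below are made up for illustration, and frequency_mean is a hypothetical helper that adds up each score times its frequency and divides by the total frequency.

```python
from statistics import mean, median, mode


def frequency_mean(table):
    """Mean of a frequency table {score: frequency}: sum of f*X over sum of f."""
    total_frequency = sum(table.values())
    return sum(score * f for score, f in table.items()) / total_frequency


scores = [2, 3, 3, 5, 7]
print(mean(scores), median(scores), mode(scores))  # 4 3 3

# Adding one extreme value moves the mean a long way, the median only slightly,
# and leaves the mode unchanged.
with_extreme = scores + [40]
print(mean(with_extreme), median(with_extreme), mode(with_extreme))  # 10 4.0 3

# Hypothetical frequency table: score 1 occurs twice, 2 three times, 3 five times.
print(frequency_mean({1: 2, 2: 3, 3: 5}))
```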
Relationship between the Mean, Median and Mode:
A frequency distribution that shows a symmetrical curve peaking at the centre is said to
be a normal distribution, and it has the mean, median and mode coinciding at the
centre.
A distribution which is not symmetrical is said to be skewed. The mean is dragged
either left or right according to the nature of skew. The greater the difference
between the mean and the mode, the more skewed the distribution.
Geometric Mean:
The geometric mean of n numbers is the nth root of their product. The geometric
mean, however, cannot be calculated if any of the values of the variable are negative
or zero. Calculation of the geometric mean is one way of reducing the effect of
an outlier while still using every value of the variable. The geometric mean is never
greater than the arithmetic mean. The geometric mean is particularly useful when
dealing with a distribution where there is a constant rate of growth or decay.
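A short Python sketch of the definition, taking the nth root of the product. The function name and the figures are illustrative only; the standard library also offers statistics.geometric_mean, which could be used instead.

```python
from math import prod


def geometric_mean(values):
    """Return the nth root of the product of n positive numbers."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("the geometric mean needs positive values only")
    return prod(values) ** (1 / len(values))


# Hypothetical data containing one outlier (100).
data = [2, 4, 8, 100]
arithmetic = sum(data) / len(data)
print(geometric_mean(data), arithmetic)  # the geometric mean is the smaller of the two
```

The outlier pulls the arithmetic mean up to 28.5, while the geometric mean stays much closer to the bulk of the values.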


Comparison between Mean, Median and Mode:
Mean:
Advantages:
It can be calculated exactly.
It makes use of all the data.
It can be used in further statistical calculations.
Disadvantages:
It can be very misleading if there is an abnormally high or low value.
Median:
Advantages:
It is simple to understand.
It is unaffected by abnormally high or low values.
It is characteristic of the normal group and sometimes represents an actual member of
the group.
Disadvantages:
It cannot be used in further statistical calculations.
Its value can only be estimated in grouped distributions (C.F. Curves).
Mode:
Advantages:
It is simple to understand.
It is unaffected by abnormally high or low values.
It is the average useful to manufacturers of shoes, clothes, hats and so on.
Disadvantages:
In small groups, or in groups which have a rather odd pattern of distribution, it may
not be characteristic of the group.
It cannot be determined exactly in a distribution where the data is grouped.
It cannot be used in arithmetical calculations.


Measures of Dispersion:
Range:
One simple way of measuring the scatter is to consider the range of values. The
range is defined as the difference between the greatest and the least measures in the
distributions.
Inter-quartile Range:
The inter-quartile range (IQR) is the distance between the 75th percentile and the
25th percentile. The IQR is essentially the range of the middle 50% of the data.
Because it uses the middle 50%, the IQR is not affected by outliers or extreme
values. Half of this range (25 percent of the distribution) is called the
semi-inter-quartile range.
Inter-percentile Range:
Using percentiles we have another measure of the dispersion of a distribution
called the inter-percentile range. This is the range between the tenth percentile and
the ninetieth percentile. It also does not depend on the extreme values.
Mean Deviation:
This, as its name implies, is the mean of the deviations or differences of the scores from
the mean, median or mode, taken without regard to sign. It is usually most useful to
calculate the mean deviation of the scores from the mean, since the mean itself depends
on all the measures in the distribution.
Variance:
In calculating variance, the mean of the distribution is found, the deviations of the
scores from the mean are tabulated, these deviations are squared and the mean of
the squares of the deviations is calculated.
The (population) variance of a random variable is a non-negative number which
gives an idea of how widely spread the values of the random variable are likely to
be; the larger the variance, the more scattered the observations on average.
Stating the variance gives an impression of how closely concentrated round the
expected value the distribution is; it is a measure of the 'spread' of a distribution
about its average value.
Standard Deviation:
The square root of variance is called standard deviation. This is the most
satisfactory measure of dispersion, since it makes use of all of the scores in the
distribution and is also quite acceptable mathematically.
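The range, mean deviation, variance and standard deviation can be worked through in Python, following the steps described above. The function names and the eight scores are made up for the example, and the variance is the population variance (dividing by the number of scores).

```python
from math import sqrt


def value_range(scores):
    """Difference between the greatest and the least score."""
    return max(scores) - min(scores)


def mean_deviation(scores):
    """Mean of the absolute deviations of the scores from their mean."""
    m = sum(scores) / len(scores)
    return sum(abs(x - m) for x in scores) / len(scores)


def population_variance(scores):
    """Mean of the squared deviations of the scores from their mean."""
    m = sum(scores) / len(scores)
    return sum((x - m) ** 2 for x in scores) / len(scores)


def standard_deviation(scores):
    """Square root of the population variance."""
    return sqrt(population_variance(scores))


scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores with mean 5
print(value_range(scores))          # 7
print(mean_deviation(scores))       # 1.5
print(population_variance(scores))  # 4.0
print(standard_deviation(scores))   # 2.0
```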


Moving Averages:
Moving averages can be used to smooth out a graph when the irregularities are the
result of variations other than seasonal variations. A moving average reduces, and almost
removes, the variation from the time series and can also be used for future predictions.
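A simple (uncentred) moving average can be sketched in Python as follows. The function name, the window length of 4 and the figures are assumptions for illustration. Each output value is the mean of one window of consecutive observations, and the window then moves along by one observation.

```python
def moving_average(values, window):
    """Return the mean of each run of `window` consecutive values."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and the length of the series")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Hypothetical series with a repeating four-period pattern and a steady rise.
series = [10, 20, 30, 40, 14, 24, 34, 44]
print(moving_average(series, 4))  # [25.0, 26.0, 27.0, 28.0, 29.0]
```

The original values swing between 10 and 44, while the four-point moving averages rise steadily by 1 each step, which makes the underlying movement much easier to see.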
Time Series:
A time series is a sequence of data points, measured typically at successive times,
spaced at (often uniform) time intervals.
A time series is a sequence of observations which are ordered in time (or space). If
observations are made on some phenomenon throughout time, it is most sensible to
display the data in the order in which they arose, particularly since successive
observations will probably be dependent. Time series are best displayed in a scatter
plot. The series value X is plotted on the vertical axis and time t on the horizontal
axis. Time is called the independent variable (in this case however, something over
which you have little control).
Trend:
Trend is a long term movement in a time series. It is the underlying direction (an
upward or downward tendency) and rate of change in a time series, when
allowance has been made for the other components.
A simple way of detecting trend in seasonal data is to take averages over a certain
period. If these averages change with time we can say that there is evidence of a
trend in the series. There are also more formal tests to enable detection of trend in
time series.
It can be helpful to model trend using straight lines.
Seasonal Variation:
In weekly or monthly data, the seasonal component, often referred to as
seasonality, is the component of variation in a time series which is dependent on
the time of year. It describes any regular fluctuations with a period of less than one
year. For example, the costs of various types of fruits and vegetables,
unemployment figures and average daily rainfall, all show marked seasonal
variation.
We are interested in comparing the seasonal effects within the years, from year to
year; in removing seasonal effects so that the time series is easier to cope with; and
in adjusting a series for seasonal effects using various models.


Cyclic Variation:
In weekly or monthly data, the cyclical component describes any regular
fluctuations.
It is a non-seasonal component which varies in a recognizable cycle.


Probability:
Mutually Exclusive Events:
In some situations two events cannot occur at the same time. These events are
called mutually exclusive events. In this case:
$P(A \cap B) = 0$
$P(A \text{ or } B) = P(A) + P(B)$
Exhaustive Events:
If a situation has only a limited number of possible outcomes then these outcomes
are said to be exhaustive events. One or more events are said to be exhaustive if all
the possible elementary events under the experiment are covered by the event(s)
considered together. In other words, the events are said to be exhaustive when they
are such that at least one of the events compulsorily occurs.
Exhaustive events may be elementary or compound events. They may be equally
likely or not equally likely.
Independent Events:
When the probabilities of certain events occurring are quite
unconnected to one another, these events are said to be independent events.
$P(A \cap B) = P(A) \times P(B)$
$P(A \text{ and } B) = P(A) \times P(B)$
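These rules can be checked by counting equally likely outcomes. The Python sketch below uses a fair die and a pair of fair dice as hypothetical experiments; the probability helper is made up for the example and uses exact fractions so the equalities hold exactly.

```python
from fractions import Fraction
from itertools import product


def probability(event, sample_space):
    """P(event) when every outcome in the sample space is equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))


# One throw of a fair die: "even" and "odd" are mutually exclusive and exhaustive.
die = {1, 2, 3, 4, 5, 6}
even, odd = {2, 4, 6}, {1, 3, 5}
print(probability(even & odd, die))  # 0
print(probability(even | odd, die) == probability(even, die) + probability(odd, die))  # True

# Two throws: "first throw is even" and "second throw is a six" are independent.
two_dice = set(product(die, repeat=2))
first_even = {(a, b) for a, b in two_dice if a % 2 == 0}
second_six = {(a, b) for a, b in two_dice if b == 6}
print(probability(first_even & second_six, two_dice))  # 1/12
print(probability(first_even, two_dice) * probability(second_six, two_dice))  # 1/12
```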
