Data
Data can be defined as groups of information that represent the
qualitative or quantitative attributes of a variable or set of variables,
which is the same as saying that data can be any set of information
that describes a given entity. Data in statistics can be classified into
grouped data and ungrouped data.
Any data that you first gather is ungrouped data: data in its raw
form. An example of ungrouped data is any list of numbers that you
can think of.
Grouped Data
Grouped data is data that has been organized into groups known as
classes. Grouped data has been 'classified' and thus some level of
data analysis has taken place, which means that the data is no
longer raw.
A data class is a group of data which is related by some user-defined
property. For example, if you were collecting the ages of the people
you met as you walked down the street, you could group them into
classes as those in their teens, twenties, thirties, forties and so on.
Each of those groups is called a class.
Each of those classes is of a certain width and this is referred to as
the Class Interval or Class Size. This class interval is very
important when it comes to drawing Histograms and Frequency
diagrams. All the classes may have the same class size or they may
have different class sizes depending on how you group your data.
The class interval is always a whole number.
Below is an example of grouped data where the classes have the
same class interval.
Age (years)    Frequency
0 - 9          12
10 - 19        30
20 - 29        18
30 - 39        12
40 - 49
50 - 59
60 - 69
Below is an example of grouped data where the classes have
different class intervals.
Age (years)    Frequency    Class Interval
0 - 9          15           10
10 - 19        18           10
20 - 29        17           10
30 - 49        35           20
50 - 79        20           30
Example 1:
Group the following raw data into ten classes.
Solution:
The class interval is found by dividing the range of the data by the
desired number of classes:

Class Interval = (Highest Value − Lowest Value) ÷ Number of Classes

The class interval should always be a whole number, and yet in this
case we have a decimal number. The solution to this problem is to
round off to the nearest whole number.

In this example, 2.8 gets rounded up to 3. So now our class width
will be 3, meaning that we group the above data into groups of 3 as
in the table below.
Number     Frequency
1 - 3
4 - 6
7 - 9
10 - 12
13 - 15
16 - 18
19 - 21
22 - 24
25 - 27
28 - 30
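The grouping procedure above can be sketched in code. This is a minimal sketch, not the text's own method: the function name `group_data` and the sample list `raw` are hypothetical, since the original raw data list is not shown in the text.

```python
import math
from collections import Counter

def group_data(data, num_classes):
    """Group raw (ungrouped) data into classes of equal width.

    The class interval is the range of the data divided by the
    number of classes, rounded up to the next whole number."""
    low, high = min(data), max(data)
    width = math.ceil((high - low) / num_classes)   # e.g. 2.8 -> 3
    counts = Counter((value - low) // width for value in data)
    classes = []
    for i in range(num_classes):
        lower = low + i * width
        upper = lower + width - 1                   # inclusive upper class limit
        classes.append(((lower, upper), counts.get(i, 0)))
    return classes

# Hypothetical raw data; the original list is not shown in the text.
raw = [1, 4, 4, 7, 9, 12, 15, 18, 21, 24, 27, 29]
for (lo, hi), freq in group_data(raw, 10):
    print(f"{lo} - {hi}: {freq}")
```

With ten classes the computed width here is ceil(2.8) = 3, which reproduces classes 1 - 3 through 28 - 30 as in the table above.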
The relationship between the class boundaries and the class interval
is given as follows:

Class Interval = Upper Class Boundary − Lower Class Boundary
Class limits and class boundaries play separate roles when it comes
to representing statistical data diagrammatically as we shall see in a
moment.
Probability
Probability is the branch of mathematics that deals with the study of
chance: the study of experiments and their outcomes.
Probability Key Terms
Experiment
An experiment in probability is a test to see what will
happen when you do something. A simple example is
flipping a coin. When you flip a coin, you are performing
an experiment to see which side of the coin you'll end up
with.
Outcome
An outcome in probability refers to a single (one) result of
an experiment. In the example of an experiment above,
one outcome would be heads and the other would be tails.
Event
An event in probability is a set of one or more outcomes
of an experiment. Suppose you flip a coin multiple times;
an example of an event would be getting a certain number
of heads.
Sample Space
A sample space in probability is the set of all the
different possible outcomes of a given experiment. If you
flipped a coin once, the sample space S would be given
by:

S = {Heads, Tails}
Notation of Probability
The probability that a certain event will happen when an experiment
is performed can in layman's terms be described as the chance that
something will happen.
The probability of an event E is denoted by

P(E) = n / N

where n is the number of outcomes in the event and N is the total
number of outcomes in the sample space.

For the sample space of rolling a fair die, S = {1, 2, 3, 4, 5, 6}, if
the event is rolling a 2, there is only one 2 in the sample space,
thus n = 1 and N = 6.

Thus the probability of getting a 2 when you roll a die is given by

P(2) = 1/6
When an event has a probability of one, we say that the event must
happen, and when the probability is zero we say that the event is
impossible. The probabilities of all the events in a sample space add
up to one.
Events with the same probability have the same likelihood of
occurring. For example, when you flip a fair coin, you are just as
likely to get a head as a tail. This is because these two outcomes
have the same probability, i.e.

P(Heads) = P(Tails) = 1/2
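The counting rule P(E) = n / N can be sketched with exact fractions. This is a minimal illustration; the helper name `probability` is hypothetical and assumes every outcome is equally likely, as the text does.

```python
from fractions import Fraction

def probability(event, sample_space):
    """P(E) = n / N: favourable outcomes over total outcomes,
    assuming every outcome in the sample space is equally likely."""
    n = sum(1 for outcome in sample_space if outcome in event)
    return Fraction(n, len(sample_space))

die = [1, 2, 3, 4, 5, 6]
coin = ["Heads", "Tails"]
print(probability({2}, die))            # 1/6
print(probability({"Heads"}, coin))     # 1/2, the same as tails
```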
Concepts in Probability
The study of probability mostly deals with combining different
events and studying these events alongside each other. How these
different events relate to each other determines the methods and
rules to follow when we're studying their probabilities.
Events can be divided into two major categories: dependent and
independent events.
Independent and Dependent Events
Two events are independent if the outcome of one does not affect
the probability of the other; they are dependent if it does.

For example, suppose we draw a card from a standard deck of 52
cards. The probability of drawing an ace on the first pick is

P(ace) = 4/52 = 1/13

If we don't return this card into the deck, the probability of drawing
an ace on the second pick is given by

P(ace) = 3/51 = 1/17

As you can clearly see, the above two probabilities are different, so
we say that the two events are dependent. The likelihood of the
second event depends on what happens in the first event.
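The card-drawing comparison above can be checked with exact fractions. A minimal sketch; the variable names are hypothetical.

```python
from fractions import Fraction

# Drawing aces from a standard 52-card deck without replacement:
p_first = Fraction(4, 52)               # 4 aces among 52 cards
p_second = Fraction(3, 51)              # one ace and one card are gone

print(p_first)    # 1/13
print(p_second)   # 1/17
# The two probabilities differ, so the events are dependent.
```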
Conditional Probability
We have already defined dependent and independent events and
seen how probability of one event relates to the probability of the
other event.
Having those concepts in mind, we can now look at conditional
probability.
Conditional probability further defines the dependence of events by
looking at the probability of an event given that some other event
first occurs.
Conditional probability is denoted by the following:

P(B | A) = P(A ∩ B) / P(A)

which reads as 'the probability of B given that A has occurred'. The
different regions of the set S can be explained using the rules of
probability.
Rules of Probability
When dealing with more than one event, there are certain rules that
we must follow when studying probability of these events. These
rules depend greatly on whether the events we are looking at are
Independent or dependent on each other.
First acknowledge that for any two events A and B,

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

But remember from set theory, and from the way we defined our
sample space above, that

P(S) = 1

and that:

P(A′) = 1 − P(A)

Two events are mutually exclusive if they cannot occur together,
that is, if

P(A ∩ B) = 0
Rules of Probability for Mutually Exclusive Events
Multiplication Rule
From the definition of mutually exclusive events, we
should quickly conclude the following: since the two
events cannot occur together,

P(A ∩ B) = 0

Addition Rule
As we defined above, the addition rule applies to
mutually exclusive events as follows:

P(A ∪ B) = P(A) + P(B)

Subtraction Rule
From the addition rule above, we can conclude that
the subtraction rule for mutually exclusive events
takes the form:

P(B) = P(A ∪ B) − P(A)
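The three rules for mutually exclusive events can be verified on a small example. This is a sketch assuming a fair die; the events A and B and the helper `p` are hypothetical.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}       # one roll of a fair die
A = {1, 2}                              # event: roll a 1 or a 2
B = {5, 6}                              # event: roll a 5 or a 6

def p(event):
    return Fraction(len(event), len(sample_space))

assert A & B == set()                   # no shared outcomes: mutually exclusive
assert p(A & B) == 0                    # multiplication rule
assert p(A | B) == p(A) + p(B)          # addition rule
assert p(A | B) - p(A) == p(B)          # subtraction rule
print(p(A | B))                         # 2/3
```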
Random Variables
A random variable, usually denoted X, is a variable whose possible
values are the outcomes of an experiment. We write

P(X = x)

which means the probability that the random variable is equal
to some real number x.

Example:
Let X be a random variable defined as the number of heads
obtained when two coins are tossed. Find the probability that you
obtain two heads.

So now we've been told what X is and that x = 2, so we write the
above information as:

P(X = 2)

The sample space for tossing two coins is S = {HH, HT, TH, TT}.
Since we already have the sample space, we know that there is only
one outcome with two heads, so we find the probability as:

P(X = 2) = 1/4
From this example, you should be able to see that the random
variable X refers to any of the elements in a given sample space.
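The two-coin example can be checked by enumerating the sample space. A minimal sketch; the function name `p_heads` is hypothetical.

```python
from fractions import Fraction
from itertools import product

# Sample space for tossing two coins: HH, HT, TH, TT
space = list(product("HT", repeat=2))

def p_heads(x):
    """P(X = x), where X counts the heads in a two-coin toss."""
    favourable = [s for s in space if s.count("H") == x]
    return Fraction(len(favourable), len(space))

print(p_heads(2))   # only HH has two heads, so 1/4
```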
There are two types of random variables: discrete variables and
continuous random variables.
Discrete Random Variables
The word discrete means separate and individual. Thus discrete
random variables are those that take on separate, countable values,
such as whole numbers. They never include fractions or decimals.
A quick example is the number of heads obtained in any number of
coin flips: the outcomes will always be whole numbers, and you'll
never have half a head or a quarter of a tail. Such a random
variable is referred to as discrete. Discrete random variables give
rise to discrete probability distributions.
Continuous Random Variable
Continuous is the opposite of discrete. Continuous random variables
are those that take on any value including fractions and decimals.
Continuous random variables give rise to continuous probability
distributions.
Probability Distributions
A probability distribution is a mapping of all the possible values of a
random variable to their corresponding probabilities for a given
sample space.
For a discrete random variable, the probability distribution can be
written out as a table of the values

P(X = x)

for each x in the sample space. Adding these entries cumulatively
gives the cumulative distribution function F(x) = P(X ≤ x) of the
sample space.
Continuous Probability Distribution
Continuous random variables give rise to continuous probability
distributions. Continuous probability distributions can't be tabulated
since by definition the probability of any single real number is zero,
i.e.

P(X = a) = 0

for any real number a, and so on.

While a discrete probability distribution is characterized by its
probability function (also known as the probability mass function),
a continuous probability distribution is characterized by its
probability density function f(x), where

P(a ≤ X ≤ b) = ∫ f(x) dx  (integrated from x = a to x = b)

and

F(x) = P(X ≤ x) = ∫ f(t) dt  (integrated from −∞ to x)

From the above, we can see that to find the probability density
function f(x) when given the cumulative distribution function F(x),
we differentiate:

f(x) = d/dx F(x)

whereby a density defined only on the region a ≤ x ≤ b exists
within that region but takes on the value of zero anywhere else.
For example, given the following probability density function,
defined on a region with a lower limit of 1:

1. P(X ≤ 4)

Since we're finding the probability that the random variable is less
than or equal to 4, we integrate the density function from the given
lower limit (1) to the limit we're testing for (4).
We need not concern ourselves with the 0 part of the density
function as all it indicates is that the function only exists within the
given region and the probability of the random variable landing
anywhere outside of that region will always be zero.
2. P(X < 1)
P(X < 1) = 0, since the density function f(x) is zero outside of
the given boundary.
3. P(2 ≤ X ≤ 3)
Since the region we're given lies within the boundary for which x is
defined, we solve this problem as follows:
4. P(X > 1)
The above problem is asking us to find the probability that the
random variable lies at any point between 1 and positive infinity.
We can solve it as follows:
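Probabilities of this kind can also be approximated numerically. Since the density function in the text is not shown, this sketch uses a hypothetical density f(x) = x/4 on [1, 3], which integrates to 1 and so is a valid stand-in; the function names are assumptions.

```python
def pdf(x):
    """Hypothetical density: f(x) = x/4 for 1 <= x <= 3, 0 elsewhere."""
    return x / 4 if 1 <= x <= 3 else 0.0

def prob(a, b, steps=100_000):
    """Approximate P(a <= X <= b) with a midpoint-rule integral."""
    h = (b - a) / steps
    return sum(pdf(a + (i + 0.5) * h) for i in range(steps)) * h

print(round(prob(1, 3), 4))   # total probability: 1.0
print(round(prob(1, 2), 4))   # P(X <= 2) = (4 - 1)/8 = 0.375
print(round(prob(0, 1), 4))   # 0.0: the density vanishes below 1
```

As in point 2 above, integrating over a region where the density is zero gives a probability of zero.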
                 1    2    3    4    5    6    Row Totals
Heads            a    b    c    d    e    f
Tails            g    h    i    j    k    l
Column Totals
In the example we gave above, flipping a coin and tossing a die are
independent random variables, the outcome from one event does
not in any way affect the outcome in the other events. Assuming
that the coin and die were both fair, the probabilities given
by a through l can be obtained by multiplying the probabilities of the
different x and y combinations.
For example, P(X = 2, Y = Tails) is given by

P(X = 2, Y = Tails) = P(X = 2) × P(Y = Tails) = (1/6)(1/2) = 1/12
Since we claimed that the coin and the die are fair, the
probabilities a through l should be the same.
The marginal PDFs, given by the row and column totals, are the
probabilities you expect when you obtain each of the individual
outcomes.
For example:
                 1      2      3      4      5      6    Row Totals
Heads           1/12   1/12   1/12   1/12   1/12   1/12     1/2
Tails           1/12   1/12   1/12   1/12   1/12   1/12     1/2
Column Totals   1/6    1/6    1/6    1/6    1/6    1/6       1
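The joint table for these two independent variables can be built directly by multiplying marginals. A minimal sketch; the dictionary name `joint` is hypothetical.

```python
from fractions import Fraction
from itertools import product

coin = ["Heads", "Tails"]
die = [1, 2, 3, 4, 5, 6]

# Independence: each joint probability is the product of the marginals.
joint = {(side, roll): Fraction(1, 2) * Fraction(1, 6)
         for side, roll in product(coin, die)}

print(joint[("Tails", 2)])                        # 1/12
print(sum(joint.values()))                        # all 12 cells sum to 1
print(sum(joint[("Heads", r)] for r in die))      # marginal P(Heads) = 1/2
```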
If X and Y are Dependent:
Suppose 4 balls are drawn at random, without replacement, from a
box containing 3 black balls, 2 blue balls and 3 balls of a third
colour. Let X be the number of black balls drawn and Y the number
of blue balls drawn.

Solution:
The random variables X and Y are dependent since they are picked
from the same sample space: once any one ball is picked, the
probability of picking the others is affected. So we solve this
problem by using combinations.

We've been told that there are 4 possible outcomes of X, i.e.
{0, 1, 2, 3}, whereby you can pick none, one, two or three black
balls; and similarly for Y there are 3 possible outcomes, {0, 1, 2},
i.e. none, one or two blue balls.
The joint probability distribution is given by the table below:
The entries of the table are the numbers of ways we can pick each
combination of the required balls. We substitute the different values
of x (0, 1, 2, 3) and y (0, 1, 2) into

f(x, y) = C(3, x) · C(2, y) · C(3, 4 − x − y) / C(8, 4)

and solve, i.e.

f(x, y)         y = 0    y = 1    y = 2    Row Totals
x = 0             0       2/70     3/70      5/70
x = 1            3/70    18/70     9/70     30/70
x = 2            9/70    18/70     3/70     30/70
x = 3            3/70     2/70      0        5/70
Column Totals   15/70    40/70    15/70       1
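The table of combinations can be reproduced in code. This sketch assumes the setup implied by the denominator C(8, 4) = 70 (3 black, 2 blue and 3 other balls, with 4 drawn); the function name `f` mirrors the table's notation.

```python
from fractions import Fraction
from math import comb

# Assumed setup, inferred from the denominators C(8, 4) = 70:
# 4 balls drawn from a box holding 3 black, 2 blue and 3 other balls.
# X = black balls drawn, Y = blue balls drawn.
def f(x, y):
    rest = 4 - x - y                  # balls of the third colour drawn
    if rest < 0 or rest > 3:
        return Fraction(0)            # impossible combination
    return Fraction(comb(3, x) * comb(2, y) * comb(3, rest), comb(8, 4))

print(f(1, 1))                                            # 18/70, i.e. 9/35
print(sum(f(x, y) for x in range(4) for y in range(3)))   # 1
```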
Example:
A certain farm produces two kinds of eggs on any given day; organic
and non-organic. Let these two kinds of eggs be represented by the
random variables X and Y respectively. Given that the joint
probability density function of these variables is given by
Solution:
a) The marginal PDF of X is given by g(x), where

g(x) = ∫ f(x, y) dy  (integrated over all values of y)

c) P(X ≤ 1/2, Y ≤ 1/2)
P(a < X < b | Y = y)

where the above is the probability that X lies between a and b given
that Y = y.

For a set of continuous random variables, the above probability is
given as:

P(a < X < b | Y = y) = ∫ f(x | y) dx  (integrated from x = a to x = b)

where f(x | y) = f(x, y) / h(y), g(x) is the marginal PDF of X and
h(y) is the marginal PDF of Y.