Random variables are simply numerical results from any experiment or physical process which
produces random outcomes. Typically we denote random variables by capital letters like X or Y
and seek to understand where these random variables spend their time, in other words, we want to
make calculations such as P (a < X < b). But before studying random variables, a fair question is
$$P(X = k) = \binom{n}{k}\, p^k (1-p)^{n-k}, \qquad k = 0, 1, \ldots, n.$$
This latter function is called a probability function (or probability mass function or probability
density function). The argument of the function is often a small letter like x, but for the binomial
we often use i or k to emphasize that the possible values are integers.
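To make this concrete, here is a minimal Python sketch (not from the original notes; the values n = 10 and p = 0.3 are purely illustrative) that evaluates the binomial probability function and checks that the probabilities sum to one.

```python
# Evaluate P(X = k) = C(n, k) p^k (1-p)^(n-k) for a binomial(n, p) random
# variable and check that the probabilities over k = 0, ..., n sum to one.
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for a binomial(n, p) random variable."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.3   # illustrative choices, not from the text
probs = [binom_pmf(k, n, p) for k in range(n + 1)]

print(probs[3])    # P(X = 3)
print(sum(probs))  # should be 1.0 (up to rounding)
```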
The expected value of a random variable is the weighted average of the possible values,
$$E(X) = \sum_i x_i\, P(X = x_i),$$
where the weights are the probabilities of the values. We call E(X) the mean of X and often use
the notation μ or μ_X.
For the coin experiment above,
$$E(X) = (0)(1/4) + (1)(1/2) + (2)(1/4) = 1.$$
It is harder to find the expected value of a binomial random variable, but the result is quite simple:
$$\sum_{k=0}^{n} k\, P(X = k) = \sum_{k=0}^{n} k \binom{n}{k} p^k (1-p)^{n-k} = np.$$
(Actually the coin experiment random variable above is a binomial random variable with n = 2
and p = 1/2. Thus the expected value is np = (2)(1/2) = 1.)
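As a quick check, a minimal Python sketch (illustrative, not part of the original notes) computes E(X) as the weighted sum for the coin experiment, a binomial with n = 2 and p = 1/2, and compares it with np.

```python
# Compute E(X) = sum of k * P(X = k) directly and compare with np.
from math import comb

n, p = 2, 0.5   # the coin experiment: binomial with n = 2, p = 1/2
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

# E(X) as the weighted average of the possible values 0, 1, ..., n
mean = sum(k * pmf[k] for k in range(n + 1))
print(mean, n * p)   # 1.0 1.0 -- the direct sum agrees with np
```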
Another important expected value is E(X - μ)^2, which is called the variance of X. We often
use the notation Var(X) or σ^2 or σ_X^2:
$$\mathrm{Var}(X) = E(X - \mu)^2 = \sigma^2 = \sum_i (x_i - \mu)^2 P(X = x_i).$$
The variance is a weighted average of the squared distances from the mean μ. By multiplying out
the above expression and simplifying we are able to derive the computing formula
$$\mathrm{Var}(X) = E(X^2) - \mu^2 = \sum_i x_i^2\, P(X = x_i) - \mu^2.$$
The standard deviation of X is just the square root of the variance, sd(X) = √(σ^2) = σ.
For the simple coin experiment we have
$$\mathrm{Var}(X) = (0-1)^2(1/4) + (1-1)^2(1/2) + (2-1)^2(1/4) = 1/2,$$
and the standard deviation is σ = √(1/2) = .707. For the binomial random variable, we find
Var(X) = np(1-p). (Since the coin experiment X is a binomial random variable, we have
Var(X) = np(1-p) = (2)(1/2)(1-1/2) = 1/2, which agrees with the direct approach.)
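A similar Python sketch (again illustrative) verifies the variance both from the definition and from the computing formula, and compares the result with np(1 - p).

```python
# Variance of the coin experiment (binomial n = 2, p = 1/2) computed two ways.
from math import comb, sqrt

n, p = 2, 0.5
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
mu = sum(k * pmf[k] for k in range(n + 1))

# definition: weighted average of squared distances from the mean
var_def = sum((k - mu)**2 * pmf[k] for k in range(n + 1))
# computing formula: E(X^2) - mu^2
var_short = sum(k**2 * pmf[k] for k in range(n + 1)) - mu**2

print(var_def, var_short, n * p * (1 - p))  # all 0.5
print(sqrt(var_def))                        # sd(X) = 0.707...
```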
Note that the random variables mentioned so far have a finite set of possible values in S_X. It
is possible to have an infinite list of possible values, such as for the Poisson random variable
$$P(X = k) = \frac{e^{-\lambda}\,\lambda^k}{k!}, \qquad k = 0, 1, \ldots$$
This is not a problem since the probabilities go to zero quite fast as k gets large, fast
enough in fact so that the probabilities sum to one, Σ_{k=0}^∞ P(X = k) = 1, as they must for any
random variable. In fact, we can also find the infinite sums μ = Σ_{k=0}^∞ k P(X = k) = λ and
Σ_{k=0}^∞ k^2 P(X = k) = λ^2 + λ, so that Var(X) = λ^2 + λ - λ^2 = λ. (It may seem a bit strange to
have the mean of a random variable equal to its variance. But this is an important distinguishing
feature of the Poisson distribution. A simple way to check whether a data set appears to be from
a Poisson distribution is to compare the sample mean X̄ and sample variance s^2, which should be
approximately equal.)
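The suggested diagnostic can be sketched in Python (the parameter value λ = 3 and the sample size are illustrative assumptions, and numpy is assumed to be available): simulate Poisson data and compare the sample mean with the sample variance.

```python
# Simulate Poisson(lambda) data and compare the sample mean X-bar with the
# sample variance s^2; for Poisson data both estimate lambda.
import numpy as np

rng = np.random.default_rng(0)
x = rng.poisson(lam=3.0, size=10_000)

print(x.mean())        # sample mean, close to 3
print(x.var(ddof=1))   # sample variance s^2, also close to 3
```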
Continuous Random Variables
For a continuous random variable X, each individual point in the set of possible values S_X
actually has probability 0, but sub-intervals of S_X have positive probability described exactly by
the area under a curve of a function f(x) called the density function of X. This function must
be nonnegative, f(x) ≥ 0, and integrate to one, ∫_{-∞}^{∞} f(x)dx = 1, so that probabilities such as
P(a < X < b) = ∫_a^b f(x)dx obey the laws of probability. Thus the probability distribution of a
continuous random variable X is described completely by the function f and the subset S_X of
(-∞, ∞) where f(x) > 0.
The simplest examples of continuous random variables are the uniform(c, d) (assuming c < d)
with density
$$f(x) = \frac{1}{d-c}, \qquad x \in [c, d],$$
and the exponential(β) with density
$$f(x) = \frac{1}{\beta}\, e^{-x/\beta}, \qquad x \ge 0.$$
Note that we have been remiss in not defining the above densities outside S_X = [c, d] or S_X = [0, ∞),
respectively. Our convention will always be that outside the region where f is defined, f(x) is 0.
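As an informal check, one can integrate these densities numerically; the sketch below assumes Python with scipy available, and the values c = 2, d = 5, β = 3 are only illustrative.

```python
# The uniform(c, d) and exponential(beta) densities should integrate to one,
# and P(a < X < b) is the area under the density between a and b.
from scipy.integrate import quad
import math

c, d, beta = 2.0, 5.0, 3.0
uniform_pdf = lambda x: 1.0 / (d - c)             # density on [c, d]
expo_pdf = lambda x: math.exp(-x / beta) / beta   # density on [0, infinity)

print(quad(uniform_pdf, c, d)[0])        # 1.0
print(quad(expo_pdf, 0, math.inf)[0])    # 1.0
print(quad(uniform_pdf, 3, 4)[0])        # P(3 < X < 4) = 1/3 for the uniform(2, 5)
```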
The expected value E(X) for continuous random variables,
$$E(X) = \int_{-\infty}^{\infty} x f(x)\, dx,$$
is a weighted average similar to the expected value E(X) = Σ_i x_i P(X = x_i) for discrete random
variables. Similarly we use the notation μ for E(X) and define the variance by
$$\mathrm{Var}(X) = \sigma^2 = E(X - \mu)^2 = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\, dx = \int_{-\infty}^{\infty} x^2 f(x)\, dx - \mu^2.$$
For the uniform(c, d) random variable we have
$$E(X) = \int_c^d x\, \frac{1}{d-c}\, dx = \left[\frac{x^2}{2(d-c)}\right]_c^d = \frac{d^2 - c^2}{2(d-c)} = \frac{d+c}{2},$$
and
$$E(X^2) = \int_c^d x^2\, \frac{1}{d-c}\, dx = \left[\frac{x^3}{3(d-c)}\right]_c^d = \frac{d^3 - c^3}{3(d-c)} = \frac{d^2 + c^2 + cd}{3},$$
and
$$\mathrm{Var}(X) = \frac{d^2 + c^2 + cd}{3} - \left(\frac{d+c}{2}\right)^2 = \frac{(d-c)^2}{12}.$$
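These closed forms are easy to check numerically; the sketch below assumes Python with scipy and uses the illustrative endpoints c = 2, d = 5.

```python
# Check E(X) = (d + c)/2 and Var(X) = (d - c)^2 / 12 for the uniform(c, d)
# by numerical integration of x f(x) and x^2 f(x).
from scipy.integrate import quad

c, d = 2.0, 5.0
f = lambda x: 1.0 / (d - c)

mean = quad(lambda x: x * f(x), c, d)[0]
ex2 = quad(lambda x: x**2 * f(x), c, d)[0]

print(mean, (d + c) / 2)                # 3.5  3.5
print(ex2 - mean**2, (d - c)**2 / 12)   # 0.75 0.75
```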
For the exponential(β) random variable, substituting y = x/β gives
$$E(X) = \int_0^{\infty} x\, \frac{1}{\beta} e^{-x/\beta}\, dx = \beta \int_0^{\infty} y e^{-y}\, dy = \beta \left[-y e^{-y} - e^{-y}\right]_0^{\infty} = \beta\,[0 - 0 + 0 + 1] = \beta.$$
Similarly we find E(X^2) = 2β^2 and Var(X) = 2β^2 - β^2 = β^2.
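Again a quick numerical check is possible; the sketch below assumes scipy and an illustrative β = 2.

```python
# Check E(X) = beta, E(X^2) = 2 beta^2, and Var(X) = beta^2 for the
# exponential(beta) density by numerical integration.
from scipy.integrate import quad
import math

beta = 2.0
f = lambda x: math.exp(-x / beta) / beta

mean = quad(lambda x: x * f(x), 0, math.inf)[0]
ex2 = quad(lambda x: x**2 * f(x), 0, math.inf)[0]

print(mean, beta)               # 2.0 2.0
print(ex2, 2 * beta**2)         # 8.0 8.0
print(ex2 - mean**2, beta**2)   # 4.0 4.0
```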
Distribution Functions
For computing probabilities about X such as P(a < X < b) = ∫_a^b f(x)dx, it sometimes helps
to define a function
$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\, dt,$$
which is called the cumulative distribution function of X or simply the distribution function of
X. If F is known, then it can be simpler to calculate P(a < X < b) = F(b) - F(a) rather than
integrate f from a to b for each new a and b.
For example, the exponential(β) distribution function is
$$F(x) = \int_0^x \frac{1}{\beta}\, e^{-t/\beta}\, dt = \left[-e^{-t/\beta}\right]_0^x = 1 - e^{-x/\beta}, \qquad x \ge 0.$$
Since F(x) is 0 for x < 0, our convention is to ignore F over that interval. If lifetimes of light bulbs
have an exponential distribution and we want the probability that a light bulb with average life
of β = 800 hours will last longer than 1000 hours, we have P(X > 1000) = 1 - P(X ≤ 1000) =
1 - (1 - e^{-1000/800}) = e^{-1.25} = .29.
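A sketch of this light bulb calculation in Python (the cdf route needs only the math module; the direct integration assumes scipy):

```python
# P(X > 1000) for exponential lifetimes with mean beta = 800 hours, using
# both F(x) = 1 - exp(-x/beta) and direct integration of the density.
from scipy.integrate import quad
import math

beta = 800.0
F = lambda x: 1 - math.exp(-x / beta)

print(1 - F(1000))   # about 0.2865
print(quad(lambda x: math.exp(-x / beta) / beta, 1000, math.inf)[0])  # same value
```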
To get the density function f from the distribution function F(x), just take the derivative of
F(x) with respect to x. For example, d[1 - exp(-x/β)]/dx = (1/β) exp(-x/β).
The most important continuous distribution is the normal distribution, whose density function
is
$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x-\mu)^2/(2\sigma^2)}, \qquad -\infty < x < \infty.$$
The mean of this distribution is E(X) = μ, and the variance is Var(X) = σ^2. Thus the normal
distribution has natural parameters μ and σ^2 which turn out to be its mean and variance, as their
names suggest.
The normal distribution is very important because many types of data and measurements can
be well-modeled by a normal density. That is, many data sets have histograms which resemble
the bell-shaped curve of a normal density. The normal distribution also has special mathematical
properties which are beyond the scope of this course. In one respect, though, the normal distribution
is very hard to work with: the function e^{-x^2} has no closed-form antiderivative. Thus, to find
probabilities for normal random variables we need to use numerical integration.
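For instance, a normal probability can be found by numerically integrating the density; the sketch below assumes Python with scipy, uses the illustrative values μ = 2, σ = 1.5, and the interval (1, 3), and compares the result with scipy's built-in normal cdf.

```python
# P(1 < X < 3) for a normal random variable with mu = 2, sigma = 1.5,
# by numerical integration of the density and by the built-in cdf.
from scipy.integrate import quad
from scipy.stats import norm
import math

mu, sigma = 2.0, 1.5
f = lambda x: math.exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

print(quad(f, 1, 3)[0])                                  # numerical integration
print(norm.cdf(3, mu, sigma) - norm.cdf(1, mu, sigma))   # same value from the cdf
```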
But we may not always have a computer or calculator available to do the numerical integration.
Thus the classical approach is to tabulate the values of the distribution function F(z) for the
standard normal random variable (usually called Z), whose density is
$$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}, \qquad -\infty < z < \infty.$$
The standard normal Z has mean μ = 0 and variance σ^2 = 1. The key to using the standard
normal distribution table is that X = σZ + μ also has a normal distribution, with mean μ and
variance σ^2. Thus, to find probabilities for a general normal random variable X, we need to
convert the probability about X into a probability about Z, and then look up the probabilities for
Z in the table.
For example, suppose that we want P(5 < X < 8), where X has mean μ = 4 and standard
deviation σ = 3. Replacing X by σZ + μ = 3Z + 4, we have
$$P(5 < 3Z + 4 < 8) = P(1 < 3Z < 4) = P(1/3 < Z < 4/3) = F(4/3) - F(1/3).$$
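In Python, scipy's standard normal cdf plays the role of the table; a sketch of this calculation:

```python
# P(5 < X < 8) for X normal with mu = 4, sigma = 3, done by standardizing
# to Z = (X - mu)/sigma and using the standard normal cdf.
from scipy.stats import norm

mu, sigma = 4.0, 3.0
lo, hi = (5 - mu) / sigma, (8 - mu) / sigma   # 1/3 and 4/3

print(norm.cdf(hi) - norm.cdf(lo))   # F(4/3) - F(1/3), about 0.278
```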
Transformations
We have just seen how a linear transformation X = σZ + μ or its inverse Z = (X - μ)/σ helps
with probability calculations. In general we may know the density and/or distribution function of
a random variable X and want the density and/or distribution function of some transformation
of X, say Y = aX, or Y = aX + b, or some general one-to-one function Y = h(X) with inverse
h^{-1}(Y) = X.
The easiest method to remember is the distribution function method: for an increasing transformation h,
$$F_Y(y) = P(Y \le y) = P(h(X) \le y) = P(X \le h^{-1}(y)) = F_X(h^{-1}(y)).$$
For example, if Y = X^2 where X is exponential(β) (so that X ≥ 0 and h is increasing on S_X), then
$$F_Y(y) = P(Y \le y) = P(X^2 \le y) = P(X \le \sqrt{y}) = F_X(\sqrt{y}) = 1 - e^{-\sqrt{y}/\beta}, \qquad y \ge 0.$$
The distribution method is not as useful when you start with the density of X and want the
density of Y because you have to first find the distribution function of X to use in the above
sequence. Then at the end you must take the derivative of the distribution function of Y to get the
density function of Y . If you put all those steps together, the density function method is simply
$$f_Y(y) = f_X(h^{-1}(y)) \left|\frac{d}{dy}\, h^{-1}(y)\right|.$$
For the example above, h^{-1}(y) = √y, so we have
$$\frac{d}{dy}\, h^{-1}(y) = \frac{d(\sqrt{y})}{dy} = \frac{1}{2\sqrt{y}}$$
and
$$f_Y(y) = \frac{1}{\beta}\, e^{-\sqrt{y}/\beta} \cdot \frac{1}{2\sqrt{y}} = \frac{1}{2\beta\sqrt{y}}\, e^{-\sqrt{y}/\beta}, \qquad y \ge 0.$$
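As an informal consistency check (assuming Python with scipy; β = 2 and y = 5 are illustrative values), integrating the density-method answer f_Y from 0 to y should reproduce the distribution-method answer F_Y(y) = 1 - e^{-√y/β}.

```python
# Compare the integral of f_Y (density method) with F_Y (distribution
# function method) for Y = X^2, X exponential(beta).
from scipy.integrate import quad
import math

beta, y = 2.0, 5.0
f_Y = lambda t: math.exp(-math.sqrt(t) / beta) / (2 * beta * math.sqrt(t))

print(quad(f_Y, 0, y)[0])                   # numerical integral of f_Y, about 0.673
print(1 - math.exp(-math.sqrt(y) / beta))   # F_Y(y) from the cdf method, same value
```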