Probability and Statistics for

Engineers
Epoka University
CEN CE ECE
Probability and Statistics for Engineers
Dr. Julian Hoxha
a.y. 2016/2017

Prerequisite/Textbook

Multivariable calculus, discrete mathematics

Required textbook:
Papoulis, A., & Pillai, S. U. (2002). Probability,
Random Variables, and Stochastic Processes. 4th
Edition. Tata McGraw-Hill Education.


Grading Policy

Assignments: 10%
Each student must hand in one copy
2 assignments

One midterm exam: 30%


Final exam: 60%


Lectures
Objective
The goal of the course is to introduce
probabilistic modeling and its role in solving
engineering problems.
It provides a foundation in the theory and
applications of probability and stochastic
processes and an understanding of the
mathematical techniques relating to random
processes.
It forms the basis for understanding random
processes in the areas of signal processing,
detection, estimation, and communication.

Lectures
Approach: how to do well in this course

Attend ALL lectures and complete the assignments.

Examples will be given and solved in class: this gives you the opportunity to clarify things further!


Course Learning Outcomes

The student should become familiar with the concept of probability: axioms of probability, conditional probability, Bayes' theorem, and Bernoulli trials.
Become familiar with the concept of random variables, functions of random variables, and inverse problems.
Become familiar with statistical concepts and stochastic processes.
Understand Markov chains and Bayesian statistical inference.

Contents
History and overview (slides adapted from Prof. Hisashi Kobayashi's blog)

Meaning of Probability
The axioms of probability
Repeated trials
Concept of Random Variables (C.R.V)
Function of one random variable
Characterization of a random variable
Two random variables
Function of two random variables
Sequences of random variables
Introduction to statistics
Stochastic process
Markov chain
Bayesian statistical inference


Introduction
Why study probability, random process and statistical
analysis?
Motivations/Applications
- Communication, information, and control systems
- Signal processing (signals are often characterized as random processes; Markov process representations, etc.)
- Machine learning (probabilistic reasoning and the Bayesian statistical approach play an important role: hidden Markov models (HMM), Bayesian networks, artificial neural networks (ANN), etc.)
- Biostatistics, bioinformatics, and related fields
- Econometrics and mathematical finance
- Queueing and loss systems
- Other application domains

History and overview/ Classical probability theory

History and overview/ Modern probability theory

Meaning of probability
The theory of probability deals with averages of mass phenomena occurring sequentially or simultaneously: electron emission, telephone calls, radar detection, quality control, etc.
In repeated experiments, the averages may exhibit statistical regularity and may converge as more trials are made.
A mathematical model to study such random phenomena is the domain of probability and statistics.
Certain averages approach a constant value as the number of observations increases.
Using this approach we define probability in terms of frequency of occurrence, as a percentage of successes in a large number of observations.
Example: in the coin experiment, the percentage of heads approaches 0.5.

Meaning of probability
The purpose of the theory is to predict and describe such averages in terms of probabilities of events.
The probability of an event A (an event is a collection, or set, of outcomes) is a number P(A) assigned to this event.
If the experiment is performed n times (with n sufficiently large) and the event A occurs nA times, then with a high degree of certainty the relative frequency nA/n of the occurrence of A is close to P(A):

P(A) ≈ nA / n
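The relative-frequency interpretation can be illustrated with a short simulation (a sketch; the helper name relative_frequency, the seed, and the 100 000-toss count are illustrative choices, not from the slides):

```python
import random

def relative_frequency(event, experiment, n, seed=0):
    """Estimate P(A) by the relative frequency n_A / n over n repetitions."""
    rng = random.Random(seed)
    n_A = sum(1 for _ in range(n) if event(experiment(rng)))
    return n_A / n

# Coin experiment: the relative frequency of heads approaches 0.5.
coin = lambda rng: rng.choice(["H", "T"])
freq = relative_frequency(lambda outcome: outcome == "H", coin, 100_000)
print(freq)  # close to P(A) = 0.5
```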

Meaning of probability
In the applications of probability to real problems, we assume that probabilities satisfy certain axioms, and by deductive reasoning we determine from the probabilities P(Ai) of certain events Ai the probabilities P(Bj) of other events Bj.
Example: if you are rolling a fair die, the probability of the event "even" equals 3/6.

Rolling a die has outcomes S = {1, 2, 3, 4, 5, 6}.
We are unable to predict the outcome, but in the long run one can determine that each outcome will occur 1/6 of the time.
Each side is the same, so no side should occur more frequently than another in the long run. If the die is not balanced, this may not be true.

Meaning of probability

Random: we call a phenomenon random if individual outcomes are uncertain but there is nonetheless a regular distribution of outcomes in a large number of repetitions.

Probability: the probability of any outcome of a random phenomenon is the proportion of times the outcome would occur in a very long series of repetitions. That is, probability is a long-term relative frequency.

Probability theory: probability theory is the branch of mathematics that describes random behavior.

Axioms of probability: Set theory

Examples
1. Tossing a coin has outcomes S = {Head, Tail}
2. Rolling a die has outcomes S = {1, 2, 3, 4, 5, 6}

Axioms of probability: Set theory

An Event E
The event, E, is any subset of the sample space, S, i.e., any set of outcomes (not necessarily all outcomes) of the random phenomenon.


Venn
diagram

Axioms of probability: Set theory

The event E is said to have occurred if, after the outcome has been observed, the outcome lies in E.

Example: rolling a die with outcomes
S = {1, 2, 3, 4, 5, 6}
E = the event that an even number is rolled = {2, 4, 6}

Special events:
The null event (empty event), ∅ = { }, is the event that contains no outcomes (the event that never occurs).
The entire event (sample space), S, is the event that contains all outcomes (the event that always occurs).

Axioms of probability: Set theory

Set operations on events

Union (or sum):
Let A and B be two events; the union of A and B is the event denoted by A ∪ B or A + B. It is the set whose elements are all elements of A, of B, or of both.
The event A ∪ B occurs if the event A occurs or the event B occurs (or both).

Intersection (or product):
Let A and B be two events; the intersection of A and B is the event denoted by A ∩ B or AB. It is the set consisting of all elements that are common to the sets A and B.
The event A ∩ B occurs if the event A occurs and the event B occurs.

Axioms of probability: Set theory

The union operation is commutative (A ∪ B = B ∪ A) and associative ((A ∪ B) ∪ C = A ∪ (B ∪ C)).
The intersection operation is commutative (A ∩ B = B ∩ A), associative ((A ∩ B) ∩ C = A ∩ (B ∩ C)), and distributive over union ((A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C)).

Two events A and B are called mutually exclusive (or disjoint) if they have no outcomes in common:
A ∩ B = ∅
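The event-algebra identities above can be checked with Python sets (a sketch; the die-roll events A, B, C chosen here are illustrative):

```python
# Events as Python sets over the die sample space {1, ..., 6}.
A = {2, 4, 6}          # "even number rolled"
B = {1, 2, 3}          # "at most three"
C = {3, 4, 5}

assert A | B == B | A                      # union is commutative
assert (A | B) | C == A | (B | C)          # union is associative
assert A & B == B & A                      # intersection is commutative
assert (A & B) & C == A & (B & C)          # intersection is associative
assert (A | B) & C == (A & C) | (B & C)    # distributive law
assert {1, 3, 5} & A == set()              # "odd" and "even" are disjoint
print("all identities hold")
```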

Axioms of probability: Probabilistic model

A probabilistic model is a mathematical description of an uncertain situation.
The elements of a probabilistic model are: 1) the sample space, and 2) the probability law, which assigns to a set A of possible outcomes (also called an event) a nonnegative number P(A) (called the probability of A) that encodes our knowledge or belief about the collective likelihood of the elements of A.

Regardless of their number, different elements of the sample space should be distinct and mutually exclusive, so that when the experiment is carried out there is a unique outcome.
A probability measure is an assignment of real numbers to the events defined on the sample space.
The set of properties that the assignment must satisfy are called the axioms of probability.

Axioms of probability: Axiomatic method in mathematics

The oldest and most famous example of axioms is Euclid's axioms in geometry.
In his book entitled "Elements", Euclid (aka Euclid of Alexandria) deduced all propositions (or theorems) of what is now called Euclidean geometry from the following five axioms (sometimes called postulates).
Axiom 1: We can draw a straight line segment joining any two points.
Axiom 2: We can extend any straight line segment indefinitely in a straight line.
Axiom 3: We can draw a circle with any point as its center and any distance as its radius.
Axiom 4: All right angles are congruent (i.e., equal to each other).
Axiom 5: If two straight lines intersect a third straight line in such a way that the sum of the inner angles on one side is less than two right angles, then the two lines inevitably must intersect each other on that side if extended indefinitely (known as the parallel postulate).
Many mathematicians attempted to deduce Axiom 5 from Axioms 1-4, but failed.
The Hungarian mathematician János Bolyai (1802-1860) and the Russian mathematician Nikolai Lobachevsky (1792-1856) independently discovered a "non-Euclidean geometry", now known as hyperbolic geometry, based entirely on Axioms 1-4.

Axioms of probability: Probabilistic Axioms

A probability law P on the events of a sample space S must satisfy the following three axioms (referenced as the nonnegativity, normalization, and additivity axioms in the examples that follow):
1. (Nonnegativity) P(A) ≥ 0 for every event A.
2. (Normalization) P(S) = 1.
3. (Additivity) If A and B are mutually exclusive events, then P(A ∪ B) = P(A) + P(B); more generally, for a sequence of mutually exclusive events A1, A2, ..., P(A1 ∪ A2 ∪ ...) = P(A1) + P(A2) + ....
Axioms of probability: Example of coin tosses

Consider an experiment involving a single coin toss. There are two possible outcomes, heads (H) and tails (T). The sample space is S = {H, T}, and the events are
Event space = { {H, T}, {H}, {T}, ∅ }.
If the coin is fair, i.e., if we believe that heads and tails are equally likely, we should assign equal probabilities to the two possible outcomes and specify that
P({H}) = P({T}) = 0.5.
The additivity axiom implies that
P({H, T}) = P({H}) + P({T}) = 1,
which is consistent with the normalization axiom. Thus, the probability law is given by
P({H, T}) = 1, P({H}) = P({T}) = 0.5, P(∅) = 0,
and satisfies all three axioms.

In this example we have assigned a numerical value of probability to each event inside the event space; in this way we have obtained a probability law.
In more complicated problems we need a systematic approach: in particular, it is necessary to introduce the axioms (postulates) that the probability law must satisfy. This was done by A. N. Kolmogorov (1903-1987).
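The coin-toss probability law can be checked against the three axioms mechanically (a sketch; representing events as frozensets is an illustrative choice):

```python
from fractions import Fraction

# The fair-coin probability law, with events represented as frozensets.
S = frozenset({"H", "T"})
P = {frozenset(): Fraction(0),
     frozenset({"H"}): Fraction(1, 2),
     frozenset({"T"}): Fraction(1, 2),
     S: Fraction(1)}

# Axiom 1: nonnegativity
assert all(p >= 0 for p in P.values())
# Axiom 2: normalization
assert P[S] == 1
# Axiom 3: additivity for mutually exclusive events
for A in P:
    for B in P:
        if A & B == frozenset():           # disjoint pair
            assert P[A | B] == P[A] + P[B]
print("probability law satisfies all three axioms")
```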

Axioms of probability: Fields

Events are subsets of S to which we have assigned probabilities.
We shall not consider as events all subsets of S but only a class F of subsets. This is because, in certain cases, it is impossible to assign probabilities satisfying all the previous axioms to every subset when infinitely many outcomes are involved.
The field F is a nonempty class of sets such that:
1) if A ∈ F, then its complement A̅ ∈ F;
2) if A ∈ F and B ∈ F, then A ∪ B ∈ F.
These two properties give a minimum set of conditions for F to be a field; all the other properties follow.

Axioms of probability: σ-algebra (σ-field)

A_n ∈ F for n ≥ 1  ⟹  ⋃_{n=1}^∞ A_n ∈ F

A σ-algebra on a set S is a collection of subsets of S that includes the impossible event (the empty subset) and the certain event, is closed under complement, and is closed under union and intersection of countably many subsets.

Axioms of probability: Axiomatic Definition of an Experiment

Axioms of probability: Properties & Examples

Axioms of probability: Noncountable infinite elements

Axioms of probability: Conditional probability

Axioms of probability: Chain rule
Axioms of probability: Bayes Theorem


Example: We have three boxes:
Box 1 contains 2000 electronic components of which 5% (100 components) are
defective.
Box 2 contains 1000 electronic components of which 40% (400 components) are
defective.
Box 3 contains 1000 electronic components of which 10% (100 components) are
defective.
We select at random one of the boxes and remove at random a single component.
1. What is the probability that the selected component is defective?
2. If we know that the selected component is defective, what is the probability that it came from Box 2?
The sample space associated with this experiment has 4000 components, of which 600 are defective. First we observe that if there were only one box, the probability of drawing a defective component would be 600/4000 = 0.15.
However, the experiment is carried out in a different way: first we choose the box, and then the component inside it.
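The two questions can be worked out with the total-probability formula and Bayes' theorem (a sketch of the computation; the variable names are illustrative):

```python
from fractions import Fraction

# Each of the three boxes is chosen with probability 1/3.
p_box = Fraction(1, 3)
# Conditional defect probabilities P(D | Box_i) from the slide.
p_def = {1: Fraction(5, 100), 2: Fraction(40, 100), 3: Fraction(10, 100)}

# Total probability: P(D) = sum_i P(D | Box_i) P(Box_i)
p_D = sum(p_def[i] * p_box for i in p_def)
# Bayes: P(Box_2 | D) = P(D | Box_2) P(Box_2) / P(D)
p_box2_given_D = p_def[2] * p_box / p_D

print(p_D)              # 11/60, about 0.183 (not 0.15: the boxes are unequal)
print(p_box2_given_D)   # 8/11, about 0.727
```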

Axioms of probability: Independence

Axioms of probability: Conditional Independence
Repeated trials: Combined experiment

Given two experiments: the first experiment is rolling a die and the second is tossing a coin.
The probability of getting 2 on the die and heads on the coin is simply (1/6)(1/2), because the experiments have independent outcomes.
If the two experiments are viewed as a single one, the new sample space is the Cartesian product of the individual sample spaces: S = S1 × S2, where S1 = {1, 2, 3, 4, 5, 6} and S2 = {H, T}.
The new space consists of 12 ordered pairs. In this space, {2} is not an elementary event but a subset consisting of two elements: {(2, H), (2, T)}.
We must assign probabilities to all subsets of S. Because the events {2} and {H} are independent, their intersection is the event {(2, H)}, and we arrive at the same result as before: P({(2, H)}) = P1({2}) P2({H}) = (1/6)(1/2).
If the experiments are independent, we can find the probability law P on the product space simply from the probability laws P1 and P2 defined on each sample space.
If an experiment involves a sequence of independent but identical stages, we say that we have a sequence of independent trials. In the special case where there are only two possible results at each stage, we say that we have a sequence of independent Bernoulli trials.
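The product sample space and its induced probability law can be built directly (a minimal sketch of the die-and-coin construction):

```python
from itertools import product
from fractions import Fraction

# Product sample space for the die-and-coin experiment.
S1 = [1, 2, 3, 4, 5, 6]
S2 = ["H", "T"]
S = list(product(S1, S2))            # 12 ordered pairs (d, c)
assert len(S) == 12

# Independent probability law on the product space: P((d, c)) = P1(d) * P2(c)
P = {s: Fraction(1, 6) * Fraction(1, 2) for s in S}
assert P[(2, "H")] == Fraction(1, 12)
assert sum(P.values()) == 1          # normalization still holds
print("product law assigns 1/12 to each ordered pair")
```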

Repeated trials: Bernoulli trials

Example: in Blackjack we are dealt 2 cards. What is the probability that we have a 21?
The number of ways the two cards can be selected, in order, from a deck of 52 cards is N = 52 × 51 = 2652. The probability of any particular ordered pair of cards is simply 1/2652.
To have a 21 we need the first card to be a face card or a ten {10, J, Q, K} and the second card to be an ace, or an ace as the first card and a face card or a ten as the second. A 52-card deck has 4 aces and 16 ten-valued cards (tens, jacks, queens, and kings in hearts, diamonds, clubs, and spades).
The probability of any ace as the first card is 4/52, and of a ten-valued card as the second card is 16/51. Because the order of the cards matters here, Face + Ace is counted separately from Ace + Face. The number of ordered outcomes giving a 21 is therefore 2 × 4 × 16 = 128, and the probability is 128/2652 ≈ 4.83%.
A set of n distinct objects can be placed in several different orders, forming permutations.
Ordering n objects is equivalent to performing n operations in sequence; the total number of ways this can be done is
n(n − 1)(n − 2) ⋯ 3 · 2 · 1 = n!
The total number of permutations of n objects taken k at a time (k ≤ n, both positive integers) is given by:
n(n − 1) ⋯ (n − k + 1) = n! / (n − k)!
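The Blackjack count can be verified by brute force over all ordered two-card deals (a sketch; ace counted as 11, which is the only way two cards reach 21):

```python
from fractions import Fraction
from itertools import permutations

# Enumerate all ordered 2-card deals and count those summing to 21.
ranks = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
deck = [(r, s) for r in ranks for s in "HDCS"]     # rank, suit
value = lambda r: 11 if r == "A" else 10 if r in {"10", "J", "Q", "K"} else int(r)

deals = list(permutations(deck, 2))                # 52 * 51 = 2652 ordered pairs
hits = sum(1 for a, b in deals if value(a[0]) + value(b[0]) == 21)
print(hits, len(deals))                            # 128 of 2652
print(Fraction(hits, len(deals)))                  # 128/2652 = 32/663, about 4.83%
```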

Repeated trials: Bernoulli trials

Suppose now that the order is irrelevant; then the k objects taken from n objects form combinations.
Example: a) the 2-permutations of the letters A, B, C are AB, BA, BC, CB, AC, CA; b) whereas the combinations of 2 out of the 3 letters are AB, AC, BC.
To count the number of combinations, note that selecting a k-permutation is the same as first selecting a combination of k items and then ordering them.
Since there are k! ways of ordering the k selected items, the number of k-permutations equals the number of combinations times k!.
Hence, the number of possible combinations is given by:
C(n, k) = n! / (k! (n − k)!)
If a set has n elements, then the total number of its subsets consisting of k elements each equals n! / (k! (n − k)!).
Consider the coin experiment, where the probability of heads {H} is p and of tails {T} is q = 1 − p.
Suppose we toss the coin n times; we obtain a new space S consisting of 2^n elements (ζ1, ..., ζn), where each ζi ∈ {H, T}.
Assuming that the tosses are independent, we have P(ζ1, ..., ζn) = P(ζ1) ⋯ P(ζn), where P(ζi) = p if ζi = H and q if ζi = T.

Repeated trials: Bernoulli trials

If the elementary event consists of k heads and n − k tails (in a specific order), then:
P(ζ1, ..., ζn) = p^k q^(n−k)
Let us now consider the probability that heads shows k times in an n-toss sequence, p_n(k) = P(k heads in n tosses).
The event "k heads in n tosses" consists of C(n, k) elementary events, each containing k heads and n − k tails in a specific order.
Since we know the probability of each elementary event, the probability of "k heads in n tosses" is:

p_n(k) = C(n, k) p^k q^(n−k)

The numbers C(n, k) (read "n choose k") are known as binomial coefficients, while the probabilities p_n(k) are known as binomial probabilities.
Note that the binomial probabilities p_n(k) must add to 1, giving the binomial formula:

Σ_{k=0}^{n} C(n, k) p^k q^(n−k) = 1
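The binomial probabilities and the binomial formula can be checked exactly with rational arithmetic (a sketch; n = 10 and the fair coin p = 1/2 are illustrative choices):

```python
from fractions import Fraction
from math import comb

# p_n(k) = C(n, k) p^k q^(n-k), and the check that they sum to 1.
def binomial_pmf(n, p):
    q = 1 - p
    return [comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]

p = Fraction(1, 2)                      # fair coin
pmf = binomial_pmf(10, p)
assert sum(pmf) == 1                    # binomial formula
assert pmf[5] == Fraction(252, 1024)    # C(10, 5) / 2^10, the most likely count
print("binomial probabilities sum to 1")
```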

Repeated trials: Bernoulli's Theorem

Let A denote an event whose probability of occurrence in a single trial is p, and let q = 1 − p. If k denotes the number of occurrences of A in n independent trials, we have:

P( |k/n − p| > ε ) ≤ pq / (n ε²)

This theorem states that the frequency definition of the probability of an event and its axiomatic definition can be made compatible to any degree of accuracy, with probability approaching 1.
In other words, given two positive numbers ε and δ, the probability of the inequality |k/n − p| < ε will be greater than 1 − δ, provided n is large enough.
The theorem states that in a sufficiently long series of independent trials with constant probability, the relative frequency of an event will differ from that probability by less than any specified number (no matter how small), with a probability approaching 1, i.e., with near certainty.
For a given ε > 0, pq/(nε²) can be made arbitrarily small by letting n become large.
Thus, we can make the relative frequency k/n close to p in a single sufficiently long run of trials.
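The bound can be checked exactly for a concrete case by summing the binomial tail (a sketch; n = 200, p = 1/2, ε = 1/10 are illustrative choices):

```python
from fractions import Fraction
from math import comb

# Exact P(|k/n - p| > eps) for k ~ Binomial(n, p), compared to pq/(n eps^2).
def tail_prob(n, p, eps):
    q = 1 - p
    return sum(comb(n, k) * p**k * q**(n - k)
               for k in range(n + 1)
               if abs(Fraction(k, n) - p) > eps)

n, p, eps = 200, Fraction(1, 2), Fraction(1, 10)
bound = p * (1 - p) / (n * eps**2)      # pq / (n eps^2) = 1/8
assert tail_prob(n, p, eps) <= bound    # the exact tail is far below the bound
print("exact tail probability respects the Bernoulli bound")
```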

Concept of Random Variable (C.R.V): Random variable

Given an experiment specified by the sample space S, the field F of subsets of S called events, and the probabilities assigned to these events, we assign a number to each outcome of the experiment.
Thus, we have created a function X with domain the set S and range a set of numbers. This function is called a random variable.
Mathematically we write X : S → R_X, where R_X is the range of the function X.
If (S, F, P) is the probability space, then we want the random variable defined on S to preserve the information about the probabilities of the events.

C.R.V: Random variable

We start with the meaning of the following notation:
{X ≤ x}
which represents the subset A of S consisting of all outcomes ζ such that X(ζ) ≤ x:
A = {ζ : X(ζ) ≤ x}

In the figure, the event {X ≤ x} is the subset A = {ζ2, ζ3, ζ4} of S (in white), obtained from the elements whose image through X is less than or equal to x.
{X ≤ x} is not a set of numbers but a set of experimental outcomes.

C.R.V: Random variable (r.v.)

If A = {X ≤ x} is an event for each x, we can calculate the probability P(A).
If the previous condition is verified, then we can also calculate the probability of any set B = {ζ : X(ζ) ∈ I}, provided the numerical set I can be obtained by complement, union, and intersection of the events {X ≤ x}.
This is equivalent to saying that the numerical set I is obtained from complements, unions, and intersections of countably many left half-lines.
Formal definition of an r.v.: given a probability space (S, F, P), the r.v. X is defined on S with values in R ∪ {−∞, +∞} such that:
1) the set {X ≤ x} is an event for every x ∈ R;
2) P(X = +∞) = P(X = −∞) = 0.
The r.v. is thus a process of assigning a number X(ζ) to each outcome ζ.
To conclude, we observe that defining an r.v. on a probability space (S, F, P) is in practice equivalent to constructing a new probability space, where the sample space is now R, the events are the subsets of R obtained from complements, unions, and intersections of countably many left half-lines, and the probability law is induced from the original law P.

C.R.V: Cumulative distribution function (CDF)

The function that expresses the probability of the event {X ≤ x} as x takes different values in R is called the cumulative distribution function (CDF) of the r.v. X:

F(x) = P(X ≤ x), x ∈ R

We note that the domain of the CDF is all of R, whereas the range of X is R_X.
When we want to specify the r.v. X explicitly, we write the CDF as F_X(x).
Example: in the coin-toss experiment the probability of heads equals p and the probability of tails equals q = 1 − p. We define the r.v. X such that
X(T) = 0 and X(H) = 1
and the CDF is:
F(x) = 0 for x < 0; q for 0 ≤ x < 1; 1 for x ≥ 1.

Indeed we have:
For x < 0: F(x) = P(X ≤ x) = P(∅) = 0;
For 0 ≤ x < 1: F(x) = P(X ≤ x) = P(X = 0) = P({T}) = q;
For x ≥ 1: F(x) = P(X ≤ x) = P(X = 0) + P(X = 1) = P({T}) + P({H}) = q + p = 1.
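The staircase CDF above can be written as a piecewise function (a sketch; the default p = 1/2 is an illustrative choice):

```python
from fractions import Fraction

# CDF of the coin-toss r.v. with X(T) = 0, X(H) = 1 and P(heads) = p.
def coin_cdf(x, p=Fraction(1, 2)):
    q = 1 - p
    if x < 0:
        return Fraction(0)    # no outcome maps below 0
    elif x < 1:
        return q              # only X = 0 (tails) lies in (-inf, x]
    else:
        return Fraction(1)    # both outcomes lie in (-inf, x]

assert coin_cdf(-0.5) == 0
assert coin_cdf(0.3) == Fraction(1, 2)
assert coin_cdf(2) == 1
print("right-continuous staircase: 0 -> q -> 1")
```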

C.R.V: Cumulative distribution function (CDF)

Example (cont.): the CDF F(x) = F_X(x) of the r.v. of the previous example is a staircase function, shown in the figure.

Example: let S = {f1, f2, f3, f4, f5, f6} (the six faces of a die) with equiprobable outcomes.
Consider the r.v. X(f_i) = 10i. The CDF is then a staircase function with a jump of height 1/6 at each of the points 10, 20, ..., 60.

C.R.V: Cumulative distribution function (CDF)

Example (cont.): because the CDF is a staircase function, the r.v. is discrete.
Example: a telephone call occurs at random in the interval [0, T], and we denote by ζ the arrival time of the call. The sample space is S = [0, T], and the events are complements, unions, and intersections of countably many open intervals ]a, b[. The probability law is defined as:

P({a < ζ ≤ b}) = (b − a)/T, 0 < a ≤ b ≤ T

We define the r.v. X as X(ζ) = ζ, 0 ≤ ζ ≤ T. The CDF is calculated as follows:

If x < 0, then {X ≤ x} is the impossible event because X(ζ) ≥ 0 for every ζ. Hence F(x) = P(X ≤ x) = P(∅) = 0.
If 0 ≤ x < T, then F(x) = P(X ≤ x) = P(0 ≤ ζ ≤ x) = x/T.
If x ≥ T, then F(x) = P(X ≤ x) = P(0 ≤ ζ ≤ T) = T/T = 1.
In this case the CDF is a continuous function, so the r.v. is continuous.
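The uniform-arrival CDF F(x) = x/T can be checked against simulated arrival times (a Monte Carlo sketch; T = 10, the seed, and the sample size are illustrative choices):

```python
import random

# Simulated arrival times, uniform on [0, T].
T = 10.0
rng = random.Random(1)
samples = [rng.uniform(0, T) for _ in range(100_000)]

def empirical_cdf(x):
    """Fraction of simulated arrivals with value <= x."""
    return sum(1 for s in samples if s <= x) / len(samples)

for x in (2.5, 5.0, 7.5):
    assert abs(empirical_cdf(x) - x / T) < 0.02   # matches x/T within noise
print("empirical CDF matches x/T")
```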

C.R.V: Properties of the CDF

The previous examples have shown that the CDF of a random variable is a non-decreasing function with values in the interval [0, 1]. The CDF has the further properties demonstrated below.
The expressions F(x⁺) and F(x⁻) denote the limits from the right and from the left of the function F at the point x: F(x⁺) = lim_{ε→0} F(x + ε) and F(x⁻) = lim_{ε→0} F(x − ε), with ε > 0.
1. F(+∞) = 1, F(−∞) = 0.
Proof. F(+∞) = P(X ≤ +∞) = P(S) = 1 and F(−∞) = P(X ≤ −∞) = P(∅) = 0.
2. F is a non-decreasing function: if x1 < x2, then F(x1) ≤ F(x2).
Proof. If x1 < x2 we have {X ≤ x1} ⊆ {X ≤ x2}, hence P(X ≤ x1) ≤ P(X ≤ x2).
3. If F(x0) = 0, then F(x) = 0 for every x ≤ x0.
Proof. Follows from property 2. Also, if F(0) = 0 we have F(x) = 0 for x ≤ 0; in this case we have a positive r.v.
4. P(X > x) = 1 − F(x).
Proof. We just need to observe that {X ≤ x} ∪ {X > x} = S and the two events are mutually exclusive, so F(x) + P(X > x) = P(S) = 1.
5. The function F(x) is continuous from the right: F(x⁺) = F(x).

C.R.V: Properties of the CDF

Proof. We need to prove that lim_{ε→0} F(x + ε) = F(x) for ε > 0. We note that by property 2, F(x) is a monotonically non-decreasing (and bounded) function, so at every point the finite right-hand and left-hand limits exist (theorem on the existence of limits for monotonic functions). To compute the limit from the right it is therefore not restrictive to take ε = 1/n and let n → ∞ (that is, let ε tend to zero along a particular sequence of values). We then note that {X ≤ x + 1/n} = A_n, where A_n is a decreasing sequence of events such that ⋂_{n=1}^∞ A_n = {X ≤ x}. By the continuity property of probability:
F(x⁺) = lim_n F(x + 1/n) = lim_n P(A_n) = P(X ≤ x) = F(x).

6. P(x1 < X ≤ x2) = F(x2) − F(x1).
Proof. We have {X ≤ x1} ∪ {x1 < X ≤ x2} = {X ≤ x2}, where the two events on the left are mutually exclusive, so F(x1) + P(x1 < X ≤ x2) = F(x2).

7. P(X = x) = F(x) − F(x⁻).
Proof. Let us take B_n = {x − 1/n < X ≤ x}: this is a decreasing sequence of events such that ⋂_{n=1}^∞ B_n = {X = x}. From property 6, with x1 = x − 1/n and x2 = x, we have:
P(x − 1/n < X ≤ x) = F(x) − F(x − 1/n).
Passing to the limit and exploiting the continuity property of probability,

C.R.V: Properties of the CDF

we have lim_n P(B_n) = P(X = x); on the other hand, because F(x) is a monotonic and bounded function it admits a finite left-hand limit at the point x, giving:
P(X = x) = F(x) − lim_n F(x − 1/n) = F(x) − F(x⁻).

8. P(x1 ≤ X ≤ x2) = F(x2) − F(x1⁻).
Proof. We have:
{x1 ≤ X ≤ x2} = {x1 < X ≤ x2} ∪ {X = x1}
with the events on the right-hand side mutually exclusive. By properties 6 and 7:
P(x1 ≤ X ≤ x2) = P(x1 < X ≤ x2) + P(X = x1) = F(x2) − F(x1) + F(x1) − F(x1⁻) = F(x2) − F(x1⁻).
9. P(x1 ≤ X < x2) = F(x2⁻) − F(x1⁻).
10. P(x1 < X < x2) = F(x2⁻) − F(x1).

C.R.V: Continuous, Discrete and Mixed type

If the CDF F(x) is a piecewise-constant (staircase) function, except for a finite number of jump discontinuities, then the r.v. X is said to be discrete.
If x_i are the discontinuity points of F(x), then from property 7 of the CDF we have:
p_i = P(X = x_i) = F(x_i) − F(x_i⁻) > 0.
A discrete r.v. takes on the values x_i with probabilities p_i given by the jumps of the CDF at its discontinuity points.
The r.v. X is said to be of continuous type if its distribution function F(x) is continuous.
The continuity of F(x) implies that P(X = x) = F(x) − F(x⁻) = 0 for every x.
In other terms, a continuous r.v. takes on each value of its codomain with zero probability.
Finally, the r.v. is mixed if its CDF is discontinuous but not piecewise constant.

C.R.V: Continuous, Discrete and Mixed type

Example: a fair coin is tossed twice, and the r.v. X represents the number of heads. Find F(x).

The sample space is S = {HH, HT, TH, TT} and
X(HH) = 2, X(HT) = 1, X(TH) = 1, X(TT) = 0.
If x < 0, F(x) = P(∅) = 0.
If 0 ≤ x < 1, F(x) = P(X = 0) = P({TT}) = 1/4.
If 1 ≤ x < 2, F(x) = P({TT, HT, TH}) = 1/4 + 1/4 + 1/4 = 3/4.
If x ≥ 2, F(x) = P(S) = 1.
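The CDF of the number of heads can be built directly from the sample space (a sketch of the construction above):

```python
from fractions import Fraction
from itertools import product

# X = number of heads in two fair tosses, built from the sample space.
S = list(product("HT", repeat=2))                 # HH, HT, TH, TT
P = Fraction(1, 4)                                # equiprobable outcomes
X = lambda outcome: outcome.count("H")

def F(x):
    """F(x) = P(X <= x): sum the outcome probabilities with X(outcome) <= x."""
    return sum(P for outcome in S if X(outcome) <= x)

assert F(-1) == 0
assert F(0.5) == Fraction(1, 4)
assert F(1.5) == Fraction(3, 4)
assert F(2) == 1
print("staircase CDF with jumps 1/4, 1/2, 1/4 at x = 0, 1, 2")
```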

C.R.V: Probability density function (pdf)

The derivative of the CDF F(x) is called the probability density function (pdf) f(x) of the r.v. X. Thus:

f(x) = dF(x)/dx.

Since
dF(x)/dx = lim_{Δx→0} [F(x + Δx) − F(x)] / Δx,
it follows from the monotone non-decreasing nature of F(x) that f(x) ≥ 0 for every x.

If X is a continuous r.v., f(x) will be an ordinary (possibly continuous) function.
If X is a discrete r.v., then the pdf has the general form:
f(x) = Σ_i p_i δ(x − x_i)
where x_i are the jump-discontinuity points of F, δ(x) is the Dirac delta function (more on the Dirac function in the next slide), and p_i = P(X = x_i).
The amplitude of the discontinuity jump represents the probability that the r.v. takes the value x_i.
Taking the derivative of a staircase CDF therefore gives a pdf consisting only of Dirac pulses, centered at the discrete values.

C.R.V: Dirac Delta Function

In mathematics, the Dirac delta function δ(x) is a generalized function, or distribution, on the real number line that is zero everywhere except at zero, with an integral of one over the entire real line.
Consider g(x) as any function continuous at x = 0. The Dirac pulse δ(x) is defined by the sampling property:
∫ g(x) δ(x) dx = g(0) over any interval ]a, b[ containing 0, and 0 over any interval that excludes 0.
The Dirac pulse thus samples g(x) at x = 0. It is clear that ordinary functions do not possess this property.
A good approximation of the Dirac pulse δ(x) is a tall, narrow function with unit area. For example:
δ_ε(x) = 1/ε for |x| ≤ ε/2, and 0 otherwise, with ε ≪ 1.
If ]−ε/2, ε/2[ ⊂ ]a, b[ and g(x) is a slowly varying function, so that g(x) ≈ g(0) for |x| ≤ ε/2, we have:
∫_a^b g(x) δ_ε(x) dx ≈ g(0).

C.R.V: Dirac Delta Function

The previous approximate equality becomes exact if we pass to the limit for ε → 0:
lim_{ε→0} ∫ g(x) δ_ε(x) dx = g(0).
This allows us to treat the Dirac pulse δ(x) as the limit of a family of functions δ_ε(x) with the following properties:
For ε → 0, the functions become narrower and narrower.
For ε → 0, the functions become taller and taller.
The area of such functions is 1 regardless of ε:
∫ δ_ε(x) dx = 1.

Sampling (product): g(x) δ(x) = g(0) δ(x).
Translation: g(x) δ(x − x0) = g(x0) δ(x − x0).
Scaling: δ(ax) = δ(x)/|a|.
Derivative: δ(x) = dU(x)/dx, where U(x) is the unit step function:
U(x) = 1 for x ≥ 0, and 0 for x < 0.
Integration: ∫_{−∞}^x δ(t) dt = U(x).
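The narrow-rectangle approximation of the pulse can be checked numerically (a sketch; the midpoint-rule integrator, ε = 10⁻³, and the Gaussian test function are illustrative choices):

```python
from math import exp

# delta_eps: a unit-area rectangle of width eps, approximating the Dirac pulse.
def integrate(h, a, b, n=100_000):
    """Simple midpoint-rule integral of h over [a, b]."""
    dx = (b - a) / n
    return sum(h(a + (i + 0.5) * dx) for i in range(n)) * dx

def delta_eps(x, eps=1e-3):
    return 1 / eps if abs(x) <= eps / 2 else 0.0

g = lambda x: exp(-x * x)               # smooth test function with g(0) = 1
approx = integrate(lambda x: g(x) * delta_eps(x), -1.0, 1.0)
print(approx)                           # close to g(0) = 1, the sampled value
```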

C.R.V: Probability density function (pdf)

If the function F(x) has a discontinuity of the first kind (jump discontinuity) at the point x0, its derivative has a Dirac pulse at x0 with area equal to the value of the jump, F(x0⁺) − F(x0⁻).

From the definition of the pdf, we also obtain:
F(x) = ∫_{−∞}^x f(t) dt.
Since F(+∞) = 1, the previous integral yields:
∫_{−∞}^{+∞} f(x) dx = 1.
We also get:
P(x1 < X ≤ x2) = F(x2) − F(x1) = ∫_{x1}^{x2} f(x) dx.
Thus, the area under f(x) over the interval (x1, x2] represents the probability that the r.v. lies in that interval.

C.R.V: Probability density function (pdf)

Example: let us consider again the coin-toss experiment with probability of heads equal to p and probability of tails equal to q. The CDF is:
F(x) = 0 for x < 0; q for 0 ≤ x < 1; 1 for x ≥ 1.
Since this is a discrete r.v., the pdf will be a sum of Dirac pulses.
Applying the derivative property of the Dirac pulse we find:
f(x) = q δ(x) + p δ(x − 1).

C.R.V: Probability density function (pdf)

Example: let us consider again the example of a telephone call occurring at random in the interval [0, T], with the CDF F(x) shown in the figure below.
Because the r.v. is continuous, the CDF is continuous.
The pdf has no Dirac pulses, so the ordinary derivative is calculated:
f(x) = 1/T for 0 < x < T, and 0 otherwise.
We note that the derivative (and therefore the pdf) is not defined at the points x = 0 and x = T (corner points of the CDF curve). This, however, is not a problem because, as we shall see, the pdf is always used inside an integral, and values at isolated points play no role (as long as there is no Dirac pulse at those points).

C.R.V: Probability density function (pdf): Properties

1. f(x) ≥ 0.
Proof. The property follows from the fact that F(x) is a monotonically non-decreasing function, and therefore its derivative is non-negative.

2. F(x) = ∫_{−∞}^x f(t) dt.
Proof. We have f(t) = dF(t)/dt. By integrating both sides from −∞ to x we find ∫_{−∞}^x f(t) dt = F(x) − F(−∞) = F(x), since F(−∞) = 0.

3. ∫_{−∞}^{+∞} f(x) dx = 1.
Proof. Follows from 2, for x = +∞, since F(+∞) = 1.

4. P(x1 < X ≤ x2) = F(x2) − F(x1) = ∫_{x1}^{x2} f(x) dx.
Proof. Follows from property 6 of the CDF and property 2 of the pdf, applied to both sides.

5. If X is continuous with continuous pdf f(x): P(x ≤ X ≤ x + Δx) ≈ f(x) Δx for Δx ≪ 1.
Proof. From 4, and observing that for a continuous r.v. the probability does not change whether or not we include the endpoint x, we have:
P(x ≤ X ≤ x + Δx) = P(x < X ≤ x + Δx) = ∫_x^{x+Δx} f(t) dt.
By the continuity assumption on f(x), we can apply the mean value theorem for integrals:
P(x ≤ X ≤ x + Δx) = f(ξ) Δx, with ξ = x + θΔx, θ ∈ [0, 1].

C.R.V: Probability density function (pdf) – Properties

We note that the last property justifies the name "probability density function"; in fact, it follows that, if f_X(x) is continuous:

f_X(x) = lim_{Δx→0} P(x < X ≤ x + Δx) / Δx

So the value f_X(x) at the point x represents the probability that X takes values in the interval (x, x + Δx), divided by the interval width Δx: that is precisely a probability density.
We also observe that the probability P(X ∈ [x, x + Δx]) is proportional (if Δx ≪ 1) to f_X(x), and is maximum for intervals [x, x + Δx] where f_X(x) is locally maximum.
A pdf, or density, of a continuous r.v. is a function that describes the relative likelihood for this r.v. to take on a given value.
Defining a law of probability on a continuous probability space is equivalent to assigning a pdf of a r.v.


C.R.V: Probability distribution function (DF)


If X is a discrete r.v., it assumes only the values xₖ with probabilities pₖ, and its pdf has Dirac pulses.
In place of the CDF or pdf we will define a new distribution function (DF) that returns directly the probability of the r.v.:

p_X(xₖ) = P(X = xₖ)

The advantage of the DF is to have an ordinary function that does not contain Dirac pulses.
Example: Let's consider again the coin-toss experiment with probability of heads equal to p (with r.v. X = 1) and probability of tails equal to q (with r.v. X = 0).
The DF of X is:

p_X(1) = p,  p_X(0) = q

For a continuous r.v. it makes no sense to introduce a DF, because it would be identically zero: a continuous r.v. takes each value of x with zero probability.

C.R.V: Probability distribution function (DF) – Properties

1) p_X(xₖ) ≥ 0. Proof. p_X(xₖ) is a probability.

2) Σₖ p_X(xₖ) = 1. Proof. The events {X = xₖ} are mutually exclusive and exhaustive, so Σₖ p_X(xₖ) = P(∪ₖ {X = xₖ}) = 1.

3) F_X(x) = Σ_{xₖ ≤ x} p_X(xₖ). Proof. F_X(x) = P(X ≤ x) = P(∪_{xₖ ≤ x} {X = xₖ}) = Σ_{xₖ ≤ x} p_X(xₖ).

4) P(x₁ < X ≤ x₂) = Σ_{xₖ ∈ ]x₁, x₂]} p_X(xₖ).

C.R.V: Specific Random Variable


We have introduced random variables as functions defined on a sample space, X(ζ), which has a probability space structure.
We have defined r.v.'s starting from known experiments and the description of the probability space built on them.
In practice, r.v.'s are introduced having specific distribution or density functions, without any reference to a particular probability space.
Theorem (Existence theorem): Given a function F that satisfies the properties of a CDF, one can construct a probability space and a r.v. X with CDF F.
Based on this theorem, we can build the r.v. on a particular probability space, or alternatively directly introduce the r.v. through its distribution function, without explicitly specifying the experiment.
In the following slides we will introduce some of the r.v.'s most commonly used in practice.


C.R.V: Specific Random Variable – Discrete type

The simplest discrete r.v. is the Bernoulli r.v., which corresponds to any experiment with only 2 possible outcomes.
Bernoulli Distribution: A r.v. X is said to be Bernoulli distributed if it takes the value 1 with probability p and 0 with probability q = 1 − p (x ∈ {0,1}). Its DF will be:

p_X(1) = p,  p_X(0) = q

Note that the Bernoulli r.v. is a particular case (a single-trial experiment) of the binomial r.v.
Binomial Distribution: X is said to be a binomial r.v. with parameters n (total number of trials), k (number of successes), p (the probability of success in each trial) and q = 1 − p, if X takes the values k = 0, 1, 2, …, n with probability (or its DF):

p_X(k) = P(X = k) = C(n, k) pᵏ qⁿ⁻ᵏ,  k ∈ {0, 1, 2, …, n},  where C(n, k) = n!/(k!(n−k)!)

The behavior of the binomial DF as k varies, for n = 20 and p = 0.4, is shown on the next slide.
Note that the maximum is found at k = np = 8.
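The location of the maximum can be checked numerically; a minimal sketch (function and variable names are ours, not from the slides):

```python
from math import comb

def binom_pmf(n, k, p):
    # binomial DF: P(X = k) = C(n, k) * p^k * q^(n-k)
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 20, 0.4
pmf = [binom_pmf(n, k, p) for k in range(n + 1)]
print(pmf.index(max(pmf)))   # position of the maximum, expected at n*p
print(round(sum(pmf), 9))    # the DF sums to 1
```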

C.R.V: Specific Random Variable – Discrete type

Example: A company produces electronic components in batches of n = 1000 components. The probability of a defective component is equal to p = 10⁻¹, independently of the others. What is the probability that:
1. the number of defective components of a lot is equal to zero;
2. the number of defective components of a lot is less than or equal to 80;
3. the number of defective components of a lot is between 80 and 120?


C.R.V: Specific Random Variable – Discrete type

Example Solution:
1. If we interpret as a "success" the event that a component is defective, then we have a repeated-trials problem with n = 1000.
Therefore, the number K of defective components can be modeled as a binomial random variable.
The probability of no defective components is given by:

P(K = 0) = C(1000, 0) p⁰ q¹⁰⁰⁰ = 0.9¹⁰⁰⁰ ≈ 1.7·10⁻⁴⁶

2. The probability that the number of defective components is less than or equal to 80 is calculated by noting that {K ≤ 80} = ∪_{k=0}^{80} {K = k}. As the elementary events are mutually exclusive, the probability of the union is equal to the sum of the probabilities:

P(K ≤ 80) = Σ_{k=0}^{80} C(1000, k) pᵏ q¹⁰⁰⁰⁻ᵏ

3. The event that K is comprised between 80 and 120 can also be expressed as a union of mutually exclusive elementary events:

P(80 ≤ K ≤ 120) = Σ_{k=80}^{120} C(1000, k) pᵏ q¹⁰⁰⁰⁻ᵏ
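A numerical sketch of the three requested probabilities, computed in the log domain to avoid overflow at n = 1000 (the helper names are ours):

```python
from math import lgamma, log, exp

def binom_pmf(n, k, p):
    # log-domain evaluation of C(n, k) p^k q^(n-k), safe for n = 1000
    log_c = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_c + k * log(p) + (n - k) * log(1 - p))

n, p = 1000, 0.1
p_zero = binom_pmf(n, 0, p)                                 # question 1
p_le_80 = sum(binom_pmf(n, k, p) for k in range(81))        # question 2
p_80_120 = sum(binom_pmf(n, k, p) for k in range(80, 121))  # question 3
print(p_zero, round(p_le_80, 4), round(p_80_120, 4))
```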


C.R.V: Specific Random Variable – Discrete type

Example: A multiple-choice test has n = 20 questions with three possible answers for each question. A poorly prepared student responds randomly to all questions. What is the probability of a score greater than or equal to 12, which is the minimum threshold for admission?
Solution: Responding randomly to each question, the student identifies the correct answer (success) with probability p = 1/3 and fails with probability q = 2/3.
The number K of correct answers is a binomial random variable, and the desired probability, with considerations similar to those of the previous example, is given by:

P(K ≥ 12) = Σ_{k=12}^{20} C(20, k) (1/3)ᵏ (2/3)²⁰⁻ᵏ ≈ 0.013

It is a probability of less than 2%, so it is extremely unlikely that the student passes the test by randomly answering the questions.
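The same sum can be sketched directly with math.comb (the variable names are ours):

```python
from math import comb

n, p = 20, 1 / 3
# P(K >= 12) = sum of the binomial DF from k = 12 to 20
p_pass = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(12, n + 1))
print(round(p_pass, 4))   # a little over 1%
```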


C.R.V: Specific Random Variable – Discrete type

Poisson Distribution: X is said to be a Poisson r.v. with parameter λ > 0 if X takes the values k = 0, 1, 2, … with probability (or its DF):

p_X(k) = P(X = k) = e^(−λ) λᵏ / k!,  k = 0, 1, 2, …

Below we show an example with λ = 5.
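A small sketch of the Poisson DF with λ = 5 (names ours); for an integer parameter the mass function has a two-point plateau at k = λ − 1 and k = λ:

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    # Poisson DF: P(X = k) = e^(-lam) * lam^k / k!
    return exp(-lam) * lam**k / factorial(k)

lam = 5
pmf = [poisson_pmf(k, lam) for k in range(51)]
print(abs(pmf[lam - 1] - pmf[lam]) < 1e-12)  # plateau: p(4) = p(5)
print(round(sum(pmf), 6))                    # virtually all the mass is below k = 50
```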

C.R.V: Specific Random Variable – Continuous type

Uniform distribution: X is said to be uniformly distributed in the interval (a, b), with a < b, if its pdf is:

f_X(x) = 1/(b − a) for x ∈ [a, b],  f_X(x) = 0 otherwise

The CDF is simply calculated by integration, and is given by:

F_X(x) = 0 for x < a;  F_X(x) = (x − a)/(b − a) for a ≤ x < b;  F_X(x) = 1 for x ≥ b
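A minimal sketch of the uniform pdf/CDF pair (names ours), checking that the pdf integrates to 1 and that the CDF at the midpoint of (a, b) is 1/2:

```python
def uniform_pdf(x, a, b):
    return 1.0 / (b - a) if a <= x <= b else 0.0

def uniform_cdf(x, a, b):
    if x < a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

a, b = 2.0, 6.0
dx = 0.001
# crude Riemann sum of the pdf over [a, b): the total area should be 1
area = sum(uniform_pdf(a + i * dx, a, b) * dx for i in range(int(round((b - a) / dx))))
print(round(area, 3))
print(uniform_cdf((a + b) / 2, a, b))   # CDF at the midpoint
```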


C.R.V: Specific Random Variable – Continuous type

Normal (Gaussian) distribution: X is a Gaussian r.v. if its pdf is:

f_X(x) = (1/√(2πσ²)) e^(−(x−η)²/(2σ²))

with η ∈ ℝ and σ² > 0.
This is a bell-shaped curve, symmetric around the parameter η and with a width governed by the parameter σ: a high value of σ corresponds to a wide bell, while a small value corresponds to a narrow bell.
η is the mean or expectation of the distribution; σ is the standard deviation and σ² is the variance. (The figure shows the case η = 0, σ = 1.)


C.R.V: Specific Random Variable – Continuous type

Normal (Gaussian) distribution: The constant factor 1/√(2πσ²) is a normalization constant that maintains the area under f_X(x) equal to unity.
Its CDF is calculated as:

F_X(x) = P(X ≤ x) = ∫_{−∞}^{x} (1/√(2πσ²)) e^(−(u−η)²/(2σ²)) du

Suppose that x ≥ η; since the integral of f_X over all of ℝ is 1, and the Gaussian function is symmetric with respect to η, the integral from −∞ to η is 1/2:

F_X(x) = 1/2 + ∫_{η}^{x} (1/√(2πσ²)) e^(−(u−η)²/(2σ²)) du

We make the change of variable t = (u − η)/(√2 σ), so that du = √2 σ dt:

F_X(x) = 1/2 + (1/√π) ∫_{0}^{(x−η)/(√2 σ)} e^(−t²) dt

C.R.V: Specific Random Variable – Continuous type

Normal (Gaussian) distribution: The previous result, for x → +∞, gives the Gauss integral:

∫_{0}^{+∞} e^(−t²) dt = √π / 2

so in total F_X(+∞) = 1.
The previous integral cannot be calculated in closed form, but can be represented by the error function:

erf(x) = (2/√π) ∫_{0}^{x} e^(−t²) dt   (x ≥ 0)

We also introduce the complementary error function:

erfc(x) = 1 − erf(x) = (2/√π) ∫_{x}^{+∞} e^(−t²) dt

The CDF can then be calculated as:

F_X(x) = 1/2 + (1/2) erf((x − η)/(√2 σ)),   x ≥ η
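The CDF-via-erf relation maps directly onto the standard-library error function; a minimal sketch (mu and sigma stand for the mean η and standard deviation σ):

```python
from math import erf, sqrt

def normal_cdf(x, mu=0.0, sigma=1.0):
    # F(x) = 1/2 + 1/2 * erf((x - mu) / (sigma * sqrt(2)))
    return 0.5 + 0.5 * erf((x - mu) / (sigma * sqrt(2)))

print(normal_cdf(0.0))            # value at the mean: 0.5
print(round(normal_cdf(1.0), 4))  # one sigma above the mean, ~0.8413
```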

C.R.V: Specific Random Variable – Continuous type

Normal (Gaussian) distribution: Consider now the domain x < η.
Proceeding in the same way, and using the symmetry of the Gaussian, we find that the CDF is calculated as:

F_X(x) = 1/2 − (1/2) erf((η − x)/(√2 σ)),   x < η

(so that F_X(−∞) = 0 and F_X(+∞) = 1).

Exponential distribution: The r.v. X is said to be exponential (unilateral) if its pdf is:

f_X(x) = λ e^(−λx) for x > 0,  f_X(x) = 0 otherwise   (λ > 0)

If events are independent, such as telephone call arrivals or bus arrivals, then the waiting time between these events can be shown to be exponential.
The CDF function is:

F_X(x) = 1 − e^(−λx),   x ≥ 0
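A quick numeric check (λ = 2 is an arbitrary choice) that the stated CDF is indeed the integral of the stated pdf:

```python
from math import exp

lam = 2.0
pdf = lambda x: lam * exp(-lam * x) if x > 0 else 0.0
cdf = lambda x: 1 - exp(-lam * x) if x > 0 else 0.0

# compare the CDF at x = 1 with a Riemann sum of the pdf over (0, 1]
dx = 1e-4
riemann = sum(pdf(i * dx) * dx for i in range(1, 10001))
print(round(riemann, 3), round(cdf(1.0), 3))
```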


Functions of one random variable


Let X be a r.v. defined on the probability space S, and g(x) a function of the real variable x whose domain contains the codomain of the function X(ζ). The expression Y = g(X) is a new r.v. obtained by associating with each outcome ζ the value Y(ζ) = g[X(ζ)].

Thus a function of a r.v. is a composite function Y = g(X) = g[X(ζ)], with domain the set S of all possible outcomes.
The CDF F_Y(y) of Y is the probability of the event {Y ≤ y}, consisting of all outcomes ζ such that Y(ζ) = g[X(ζ)] ≤ y.


Functions of one random variable


For a specific y, the values of x such that g(x) ≤ y form a set on the x axis, denoted by R_y; we have that {Y ≤ y} = {X ∈ R_y}.
The transformation Y = g(X) must satisfy the following conditions:
1. For each y, the set R_y = {x such that g(x) ≤ y} of the solutions of the inequality g(x) ≤ y must consist of the union and intersection of a countable number of intervals (closed right-infinite half-lines), in order for {Y ≤ y} to be an event. A function g(x) with this property is called a Borel function.
2. The domain of the function g(x) must include the range of the r.v. X.
3. The events {g(X) = ±∞} must have zero probability.

We shall express the CDF F_Y(y) of the r.v. Y = g(X) in terms of the CDF F_X(x) of the r.v. X and the function g(x).
For this purpose we must determine the set R_y of the x axis such that g(x) ≤ y, and the probability that X belongs to this set.
It will be assumed that the function g(x) is continuous.


Functions of one random variable – Examples

Example: Consider the linear transformation Y = aX + b, shown in the figure below for a < 0 and a > 0.

For a > 0: F_Y(y) = P(aX + b ≤ y) = P(X ≤ (y − b)/a) = F_X((y − b)/a)
For a < 0: F_Y(y) = P(aX + b ≤ y) = P(X ≥ (y − b)/a) = 1 − F_X((y − b)/a)

Functions of one random variable – Examples

To obtain the pdf we need to differentiate the CDF; for a > 0 we have:

f_Y(y) = (d/dy) F_X((y − b)/a) = (1/a) f_X((y − b)/a)

For a < 0:

f_Y(y) = (d/dy) [1 − F_X((y − b)/a)] = −(1/a) f_X((y − b)/a)

In general we have that:

f_Y(y) = (1/|a|) f_X((y − b)/a)
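A consistency check of the general formula: if X is standard normal, then Y = aX + b is N(b, a²), so the transformed pdf must match the direct Gaussian density (a sketch; names are ours):

```python
from math import exp, pi, sqrt

def norm_pdf(x, mu=0.0, sigma=1.0):
    return exp(-(x - mu) ** 2 / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

a, b = -3.0, 5.0        # a negative gain, to exercise the |a| in the formula
f_X = norm_pdf          # X ~ N(0, 1)

for y in (-4.0, 0.0, 2.5, 7.0):
    via_theorem = f_X((y - b) / a) / abs(a)
    direct = norm_pdf(y, mu=b, sigma=abs(a))   # Y = aX + b ~ N(b, a^2)
    assert abs(via_theorem - direct) < 1e-12
print("transformation rule agrees with the direct density")
```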


Functions of one random variable – General solution

Fundamental theorem on the transformation of random variables: Let X be a r.v. with pdf f_X(x). If we consider the transformation Y = g(X), the pdf of Y is given by:

f_Y(y) = 0 if the equation y = g(x) has no real solutions;
f_Y(y) = Σᵢ f_X(xᵢ) / |g′(xᵢ)| otherwise,

where the xᵢ are the real roots of y = g(x) and g′(x) is the derivative of g(x).

Proof. The pdf of Y can be obtained from the following relationship (Δy > 0):

f_Y(y) Δy ≈ P(y < Y ≤ y + Δy)

We consider the values of x such that y = g(x) for some x. In the case of 3 solutions x₁, x₂, x₃, shown in the next figure, because g′(x₁) > 0, g′(x₂) < 0, g′(x₃) > 0 and Δy is infinitesimal, the three sets where X must belong are mutually exclusive:

P(y < Y ≤ y + Δy) = P(x₁ < X ≤ x₁ + Δx₁) + P(x₂ + Δx₂ < X ≤ x₂) + P(x₃ < X ≤ x₃ + Δx₃),  with Δxᵢ = Δy / g′(xᵢ).
Probability and Statistics for Engineers

Functions of one random variable – General solution

Proof (cont.): Since P(x < X ≤ x + Δx) ≈ f_X(x) Δx and |Δxᵢ| = Δy / |g′(xᵢ)|, dividing both sides by Δy gives the thesis:

f_Y(y) = f_X(x₁)/|g′(x₁)| + f_X(x₂)/|g′(x₂)| + f_X(x₃)/|g′(x₃)|

Functions of one random variable – Examples

Example: Consider again the previous example with transformation Y = aX + b. For any y and a ≠ 0, the equation y = g(x) = ax + b admits the single solution x₁ = (y − b)/a. We also have g′(x) = a, so finally the pdf found is the same as the previously obtained result:

f_Y(y) = (1/|a|) f_X((y − b)/a)

Example: Consider the transformation Y = X², as in the figure below. If y < 0 the equation y = g(x) = x² does not have solutions, so f_Y(y) = 0. If y > 0 we have 2 solutions, x₁ = √y and x₂ = −√y.
Thus, since g′(x) = 2x:

f_Y(y) = [f_X(√y) + f_X(−√y)] / (2√y),   y > 0
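Applying this formula to Y = X² with X standard normal should reproduce the known chi-square density with one degree of freedom, e^(−y/2)/√(2πy); a sketch under that assumption:

```python
from math import exp, pi, sqrt

def norm_pdf(x):
    return exp(-x * x / 2) / sqrt(2 * pi)

def f_Y(y):
    # fundamental theorem for g(x) = x^2: roots +sqrt(y), -sqrt(y); |g'| = 2*sqrt(y)
    if y <= 0:
        return 0.0
    r = sqrt(y)
    return (norm_pdf(r) + norm_pdf(-r)) / (2 * r)

for y in (0.5, 1.0, 4.0):
    chi2_1 = exp(-y / 2) / sqrt(2 * pi * y)   # chi-square(1) density
    assert abs(f_Y(y) - chi2_1) < 1e-12
print("f_Y matches the chi-square(1) density")
```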


Functions of one random variable – Examples

Example: Consider the transformation g(x) given in the figure below: for a gain a > 1 and x ∈ [−x₀, x₀], g(x) = ax, so the transformation is the characteristic of a device that amplifies. Otherwise the output is limited ("saturates") to the value −y₀ or +y₀, with y₀ = a x₀.

If y < −y₀: {Y ≤ y} is the impossible event, so F_Y(y) = P(Y ≤ y) = 0.
If y ≥ y₀: {Y ≤ y} is the certain event, so F_Y(y) = P(Y ≤ y) = 1.
If −y₀ ≤ y < y₀: F_Y(y) = P(aX ≤ y) = F_X(y/a).

Functions of one random variable – Examples

Example (cont.): The CDF F_Y(y) of the r.v. Y at the output of the saturated amplifier is shown below. We note that at y = −y₀ the CDF is discontinuous, because its left-hand limit is 0 while the right-hand limit is F_X(−x₀).
The CDF is also discontinuous at the point y = y₀, since the right-hand limit is 1 while the left-hand limit is F_X(x₀).
Therefore, when calculating the pdf, two Dirac pulses will appear, centered at y = y₀ and y = −y₀, with areas P(Y = y₀) = P(X ≥ x₀) = 1 − F_X(x₀) and P(Y = −y₀) = P(X ≤ −x₀) = F_X(−x₀), respectively.
To calculate the pdf we note that for |y| < y₀ the equation y = g(x) = ax has only one solution x₁ = y/a, so the first derivative gives f_Y(y) = (1/a) f_X(y/a).


Characterization of a random variable – Mean

Definition. The mean or expected value η = E(X) of a r.v. X with pdf f_X(x) is (if the integral exists and is finite):

E(X) = ∫_{−∞}^{+∞} x f_X(x) dx

Example: If X is a r.v. with uniform pdf in the interval (a, b), then f_X(x) = 1/(b − a).
Hence:

E(X) = ∫_a^b x/(b − a) dx = (a + b)/2

the mean of X coincides with the midpoint of the interval (a, b).

Example: Suppose X has an exponential distribution. To find its mean we need to solve the integral by parts:

E(X) = ∫_0^{+∞} x λ e^(−λx) dx = 1/λ

the mean of X coincides with the reciprocal of the parameter λ.
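The result E(X) = 1/λ can be checked with a crude Riemann sum (λ = 0.5 is an arbitrary choice; the tail beyond x = 60 is negligible):

```python
from math import exp

lam = 0.5
dx = 1e-4
# Riemann sum of x * lam * e^(-lam*x) over (0, 60]
mean = sum((i * dx) * lam * exp(-lam * i * dx) * dx for i in range(1, 600001))
print(round(mean, 3))   # close to 1/lam = 2
```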



Characterization of a random variable – Mean

If X is a discrete r.v., we have that f_X(x) = Σₖ pₖ δ(x − xₖ), where pₖ = P(X = xₖ).
The mean is obtained as:

E(X) = Σₖ xₖ pₖ

Example. Suppose X is a Bernoulli r.v.; its mean is:

E(X) = 1·p + 0·q = p

Example. Suppose X has a binomial distribution. Its mean is E(X) = np.

Proof.

E(X) = Σ_{k=1}^{n} k · n!/(k!(n−k)!) · pᵏ qⁿ⁻ᵏ = np Σ_{k=1}^{n} (n−1)!/((k−1)!(n−k)!) · p^(k−1) qⁿ⁻ᵏ

With the change of index m = k − 1:

E(X) = np Σ_{m=0}^{n−1} (n−1)!/(m!(n−1−m)!) · pᵐ q^(n−1−m) = np (p + q)^(n−1) = np.

Characterization of a random variable – Mean

Theorem. Let Y = g(X) be a transformation of the r.v. X with pdf f_X(x); we have:

E(Y) = ∫_{−∞}^{+∞} g(x) f_X(x) dx

In the case of a discrete r.v. with DF p_X(xₖ), we have:

E(Y) = Σₖ g(xₖ) p_X(xₖ)

Example: Let X be a r.v. with uniform distribution in the interval [0, 2π], and calculate the mean value of Y = cos(X).
Solution:

E(Y) = ∫_0^{2π} cos(x) · 1/(2π) dx = 0
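The zero mean of Y = cos(X) can be verified with a Riemann sum of g(x) f_X(x) (a sketch; names and step count are ours):

```python
from math import cos, pi

n = 200000
dx = 2 * pi / n
# E[cos(X)] = integral of cos(x) * (1 / (2*pi)) over [0, 2*pi]
mean = sum(cos(i * dx) * dx for i in range(n)) / (2 * pi)
print(abs(mean) < 1e-9)   # the positive and negative lobes of cos cancel
```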


Characterization of a random variable – Variance

Definition. We define as variance σ² = Var(X) of the r.v. X with mean η = E(X) the average square deviation of X around its mean:

σ² = E[(X − η)²] = ∫_{−∞}^{+∞} (x − η)² f_X(x) dx

The variance is a positive quantity, and its positive square root σ = √Var(X) is known as the standard deviation of X.
The standard deviation represents the root mean square deviation of the r.v. X around its mean η.
Because of the linearity property of the mean we can show that:

σ² = E(X²) − η²

Definition. E(X²) is known as the mean square value.

Characterization of a random variable – Root Mean Square

Definition. The root mean square (RMS or rms) value √E(X²) is defined as the square root of the mean square value.
Example: Let's consider a r.v. X with uniform distribution in the interval (−Δ/2, Δ/2) and zero mean, η = E(X) = 0; we have:

σ² = E(X²) = ∫_{−Δ/2}^{Δ/2} x²/Δ dx = Δ²/12

We observe how the variance increases with the amplitude Δ of the interval where the r.v. takes its values.
The variance σ² measures the concentration (or, equivalently, the dispersion) of X around its mean η.
We can say, equivalently, that the variance is a measure of the uncertainty associated with the values of the r.v. X.
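A numeric check of σ² = Δ²/12 by the midpoint rule (Δ = 4 is an arbitrary choice; names are ours):

```python
delta = 4.0          # interval (-delta/2, delta/2)
n = 100000
dx = delta / n
# E(X^2) by the midpoint rule; the mean is zero, so this is also the variance
var = sum(((-delta / 2 + (i + 0.5) * dx) ** 2) / delta * dx for i in range(n))
print(round(var, 4), delta**2 / 12)   # the two numbers should agree
```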


Characterization of a random variable – Example

Example. Let's consider a Gaussian r.v. and demonstrate that σ² is indeed the variance of X.
Solution. We begin the proof by noting that the integral of the Gaussian pdf should be 1:

∫_{−∞}^{+∞} e^(−(x−η)²/(2σ²)) dx = √(2π) σ

Because the last expression is valid for any σ > 0, we take the derivative with respect to σ:

∫_{−∞}^{+∞} ((x − η)²/σ³) e^(−(x−η)²/(2σ²)) dx = √(2π)

which, after multiplying both sides by σ³/(√(2π) σ) = σ²/√(2π), gives E[(X − η)²] = σ².

If X is a discrete r.v., we have that:

σ² = Σₖ (xₖ − η)² pₖ

Characterization of a random variable – Example

Example. Find the variance of a Bernoulli r.v.
Solution. E(X²) = 1²·p + 0²·q = p, so σ² = E(X²) − η² = p − p² = pq.

Variance Property: The variance is not a linear operator, but a quadratic one. If X is a r.v. with finite variance, whatever the real constants a and b are, we have that:

Var(aX + b) = a² Var(X)

Proof. Based on the definition we have:

Var(aX + b) = E[(aX + b − E(aX + b))²]

Using the linearity of the mean operator, with simple steps we can write:

Var(aX + b) = E[(aX − aE(X))²] = a² E[(X − E(X))²] = a² Var(X)

Characterization of a random variable – Moments

The mean, variance and mean square value belong to a more general class, called the moments of a r.v.
Definition of Moments. The nth moment of a r.v. X is: mₙ = E(Xⁿ).
Definition of Central moments. The nth central moment of X is: μₙ = E[(X − η)ⁿ].
Definition of Absolute/Generalized moments. The absolute moments are E(|X|ⁿ); the generalized moments about a point a are E[(X − a)ⁿ] and E(|X − a|ⁿ).

Characterization of a random variable


Relation between Moments and Central moments
To obtain the central moments as a function of the non-central ones, we can use the binomial expansion theorem and the linearity of the mean:

μₙ = E[(X − η)ⁿ] = Σ_{k=0}^{n} C(n, k) (−η)^(n−k) mₖ

Characterization of a random variable – Example

Example. Let X be a Gaussian r.v. Calculate the central and non-central moments of nth order.
Solution. Let's start with the calculation of the moments of a r.v. Z with zero mean and variance 1, Z ~ N(0, 1). In fact, we can express any generic Gaussian r.v. X ~ N(η, σ²) in terms of the standard normal r.v. Z as X = η + σZ.
We can then express the moments of X as a function of the moments of Z.
Because Z is a zero-mean r.v., its moments and central moments are the same: we must then calculate the generic nth moment E(Zⁿ), given by:

E(Zⁿ) = ∫_{−∞}^{+∞} zⁿ φ(z) dz,  where φ(z) = (1/√(2π)) e^(−z²/2)

Since zⁿ is odd for n odd and φ(z) is an even function, the moments for n odd are zero, because the integral of the product of an odd and an even function is zero.
We are so interested only in n even. Because the calculations are still difficult to do directly, we will use the Gauss integral:

∫_{−∞}^{+∞} e^(−αx²) dx = √(π/α),   α > 0

Characterization of a random variable – Example

Example (cont.): The last integral can be obtained from the normalization condition of the pdf of a r.v. ~N(0, σ²) with σ² = 1/(2α).
Differentiating k times with respect to α, we obtain:

∫_{−∞}^{+∞} x^(2k) e^(−αx²) dx = (1·3·5 ⋯ (2k−1) / 2ᵏ) √π α^(−(2k+1)/2)

Substituting α = 1/2, the last equation can be rewritten, by simple algebraic manipulations, as:

∫_{−∞}^{+∞} z^(2k) e^(−z²/2) dz = √(2π) (2k − 1)!!  →  E(Z^(2k)) = (2k − 1)!!

Characterization of a random variable – Example

Example (cont.): We have used the symbol !! (double factorial) to denote the product of only the odd values: (2k − 1)!! = 1·3·5 ⋯ (2k − 1).
The last relation is the nth moment E(Zⁿ) with n = 2k even. If Z ~ N(0, 1) we have that:

E(Zⁿ) = 0 for n odd;  E(Zⁿ) = (n − 1)!! for n even.

Before we evaluate the general case X ~ N(η, σ²), we consider the case of zero mean only, X ~ N(0, σ²), where the central and non-central moments are the same; also we have X = σZ, whereby E(Xⁿ) = σⁿ E(Zⁿ):

E(Xⁿ) = 0 for n odd;  E(Xⁿ) = σⁿ (n − 1)!! for n even.

Widely used is the moment of the 4th order: for X ~ N(0, σ²), E(X⁴) = 3σ⁴.
If η ≠ 0, the central moments of X are the same as the moments of X − η, which has zero mean, so they are still given by the previous relation:

μₙ = E[(X − η)ⁿ] = 0 for n odd;  μₙ = σⁿ (n − 1)!! for n even.
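A numeric check of the two statements E(X³) = 0 and E(X⁴) = 3σ⁴ for a zero-mean Gaussian (a sketch; the integration limits and step count are arbitrary choices):

```python
from math import exp, pi, sqrt

def norm_pdf(x, sigma):
    return exp(-x * x / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

def moment(n, sigma, half_width=10.0, steps=100000):
    # E(X^n) for X ~ N(0, sigma^2), midpoint rule on [-hw*sigma, hw*sigma]
    lo = -half_width * sigma
    dx = 2 * half_width * sigma / steps
    return sum((lo + (i + 0.5) * dx) ** n * norm_pdf(lo + (i + 0.5) * dx, sigma) * dx
               for i in range(steps))

sigma = 1.5
print(round(moment(3, sigma), 6))                 # odd moment: essentially 0
print(round(moment(4, sigma), 4), 3 * sigma**4)   # fourth moment vs 3*sigma^4
```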

Characterization of a random variable – Chebyshev Inequality

Example (cont.): If η ≠ 0, the non-central moments are obtained from the central moments (again via the binomial expansion).

Theorem: Markov Inequality. Let X be a positive r.v. (f_X(x) = 0 for x < 0) with finite mean E(X). For α > 0 we have:

P(X ≥ α) ≤ E(X)/α

Proof. E(X) = ∫_0^{+∞} x f_X(x) dx ≥ ∫_α^{+∞} x f_X(x) dx ≥ α ∫_α^{+∞} f_X(x) dx = α P(X ≥ α).

Theorem: Bienaymé Inequality. Let X be a r.v., a a real number and n ≥ 1; then for ε > 0 we have:

P(|X − a| ≥ ε) ≤ E(|X − a|ⁿ)/εⁿ

Proof. The proof is obtained from the Markov inequality by taking Y = |X − a|ⁿ and α = εⁿ, and also observing that the function g(x) = xⁿ is monotonically increasing for x ≥ 0.

Characterization of a random variable – Chebyshev Inequality

The probability that appears in the Bienaymé inequality is the probability that the r.v. does not belong to the range (a − ε, a + ε).
This probability is smaller the smaller the absolute moment E(|X − a|ⁿ) is with respect to εⁿ, for a fixed ε.
Therefore the absolute moment can be interpreted as a dispersion index of the r.v. around a.

Theorem: Chebyshev Inequality. Let X be a r.v. with finite mean η and variance σ². For ε > 0 we have that:

P(|X − η| ≥ ε) ≤ σ²/ε²

Proof. The proof is obtained from the Bienaymé inequality by taking a = η and n = 2.

The variance can be interpreted as the simplest index of dispersion of the values assumed by a r.v. around its mean. In other words, it is a broad measure of how much a r.v. is dispersed around its mean.
If we take ε = kσ we can rewrite the Chebyshev inequality as:

P(|X − η| ≥ kσ) ≤ 1/k²

Characterization of a random variable – Chebyshev Inequality

The previous relation can also be written as:

P(|X − η| < kσ) ≥ 1 − 1/k²

This last relation gives a lower bound for the probability that the r.v. takes values in the interval (η − kσ, η + kσ), as shown in the table below, for k = 1, 2, 3, 4, 5.

k          1      2      3      4      5
1 − 1/k²   0    0.750  0.889  0.938  0.960
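The bound values, and the fact that the bound really is a lower bound, can be sketched against a r.v. whose probabilities are known exactly, e.g. a uniform r.v. on (−1, 1) with η = 0 and σ = 1/√3 (our choice of example):

```python
from math import sqrt

sigma = 1 / sqrt(3)   # standard deviation of a uniform r.v. on (-1, 1)
for k in range(1, 6):
    bound = 1 - 1 / k**2               # Chebyshev lower bound
    exact = min(1.0, k * sigma)        # exact P(|X| < k*sigma) for this r.v.
    assert exact >= bound
    print(k, round(bound, 3), round(exact, 3))
```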


