In probability theory, a random variable, or stochastic variable, is a way of
assigning a value to each possible outcome, that is, to each element of a sample
space. These values might represent the possible outcomes of an experiment, or the
potential values of a quantity whose value is uncertain (e.g., as a result of
incomplete information or imprecise measurements). Intuitively, a random variable
can be thought of as a quantity whose value is not fixed, but which can take on
different values; normally, a probability distribution is used to describe the
probability of different values occurring. Realizations of a random variable are
called random variates.
Random variables are usually real-valued, but one can consider arbitrary types
such as boolean values, complex numbers, vectors, matrices, sequences, trees, sets,
shapes, manifolds, functions, and processes. The term random element is used to
encompass all such related concepts. A related concept is the stochastic process, a
set of indexed random variables (typically indexed by time or space).
Real-valued random variables (those whose range is the real numbers) are used in
the sciences to make predictions based on data obtained from scientific
experiments. In addition to scientific applications, random variables were
developed for the analysis of games of chance and stochastic events. In such
instances, the function that maps the outcome to a real number is often the identity
function or similarly trivial function, and not explicitly described. In many cases,
however, it is useful to consider random variables that are functions of other
random variables, and then the mapping function included in the definition of a
random variable becomes important. As an example, the square of a random
variable distributed according to a standard normal distribution is itself a random
variable, with a chi-square distribution. One way to think of this is to imagine
generating a large number of samples from a standard normal distribution,
squaring each one, and plotting a histogram of the values observed. With enough
samples, the graph of the histogram will approximate the density function of a chi-
square distribution with one degree of freedom.
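The squaring experiment just described can be sketched in a few lines; the sample size and seed below are arbitrary choices, and the sample mean and variance are compared against the chi-square(1) values (mean 1, variance 2) rather than plotting a histogram:

```python
import random

random.seed(0)
# Draw standard normal samples and square each one; the squares follow a
# chi-square distribution with one degree of freedom (mean 1, variance 2).
samples = [random.gauss(0.0, 1.0) ** 2 for _ in range(100_000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(mean, var)  # close to 1 and 2
```

With enough samples a histogram of `samples` would trace out the chi-square(1) density, as the text describes.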
Another example is the sample mean, which is the average of a number of samples.
When these samples are independent observations of the same random event they
can be called independent identically distributed random variables. Since each
sample is a random variable, the sample mean is a function of random variables
and hence a random variable itself, whose distribution can be computed and
properties determined.
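A quick sketch of the sample mean as a random variable, using uniform(0, 1) draws (sample sizes and seed are arbitrary): each trial produces one realization of the sample mean, and the collection of trials approximates its distribution, whose mean is 1/2 and whose variance is (1/12)/n.

```python
import random
import statistics

random.seed(1)
n = 30          # i.i.d. draws averaged per trial
trials = 5000   # number of sample means realized
# Each trial averages n independent uniform(0, 1) draws; the resulting
# sample mean is itself a random variable.
means = [sum(random.random() for _ in range(n)) / n for _ in range(trials)]
print(statistics.mean(means))      # close to 1/2
print(statistics.variance(means))  # close to (1/12)/30
```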
One of the reasons that real-valued random variables are so commonly considered
is that the expected value (a type of average) and variance (a measure of the
"spread", or extent to which the values are dispersed) of the variable can be
computed.
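As a small worked example, the expected value and variance of a fair six-sided die can be computed directly from the definitions E[X] = Σ x·p(x) and Var[X] = E[X²] − E[X]²; exact rational arithmetic keeps the results clean:

```python
from fractions import Fraction

values = range(1, 7)        # faces of a fair die
p = Fraction(1, 6)          # each face has probability 1/6
ev = sum(x * p for x in values)                 # E[X] = 7/2
var = sum(x * x * p for x in values) - ev ** 2  # Var[X] = 35/12
print(ev, var)  # prints: 7/2 35/12
```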
There are two types of random variables: discrete and continuous.[1] A discrete
random variable maps outcomes to values of a countable set (e.g., the integers),
with each value in the range having probability greater than or equal to zero. A
continuous random variable maps outcomes to values of an uncountable set (e.g.,
the real numbers). For a continuous random variable, the probability of any
specific value is zero, whereas the probability of some infinite set of values (such
as an interval of non-zero length) may be positive. A random variable can be
"mixed", with part of its probability spread out over an interval like a typical
continuous variable, and part of it concentrated on particular values like a discrete
variable. These classifications are equivalent to the categorization of probability
distributions.
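A mixed random variable of the kind just described can be sketched as follows; the even split between the atom at 0 and a uniform continuous part is an arbitrary choice for illustration:

```python
import random

random.seed(2)

def mixed():
    # With probability 1/2 return exactly 0 (a discrete atom); otherwise
    # draw from uniform(0, 1) (the continuous part).
    return 0.0 if random.random() < 0.5 else random.random()

draws = [mixed() for _ in range(100_000)]
atom = sum(d == 0.0 for d in draws) / len(draws)
in_interval = sum(0.2 < d < 0.4 for d in draws) / len(draws)
print(atom)         # close to 0.5: positive mass at a single point
print(in_interval)  # close to 0.5 * 0.2 = 0.1: mass spread over an interval
```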
The expected value of random vectors, random matrices, and similar aggregates of
fixed structure is defined as the aggregation of the expected value computed over
each individual element. The concept of "variance of a random vector" is normally
expressed through a covariance matrix. No generally agreed-upon definition of
expected value or variance exists for cases other than those just discussed.
There are two possible outcomes for a coin toss: heads or tails. The possible
outcomes for one fair coin toss can be described using the random variable
X(ω) = ω on the sample space Ω = {heads, tails}, and if the coin is equally likely
to land on either side then it has a probability mass function given by
f(heads) = f(tails) = 1/2.
We can also introduce a real-valued random variable V as follows: V(ω) = 1 if
ω = heads, and V(ω) = 0 if ω = tails.
A random variable can also be used to describe the process of rolling a die and the
possible outcomes. The most obvious representation is to take the set {1, 2, 3, 4, 5,
6} as the sample space, defining the random variable X equal to the number rolled.
In this case, X(ω) = ω, and P(X = i) = 1/6 for i = 1, 2, ..., 6.
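This die example can be checked empirically by simulation; the sample size and seed are arbitrary choices:

```python
import random
from collections import Counter

random.seed(3)
# Roll a fair die many times; each face should occur with empirical
# frequency close to 1/6.
rolls = [random.randint(1, 6) for _ in range(60_000)]
freq = Counter(rolls)
for face in range(1, 7):
    print(face, freq[face] / len(rolls))  # each close to 0.1667
```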
Formal definition

Let (Ω, F, P) be a probability space and (E, ℰ) a measurable space. An
(E, ℰ)-valued random variable is a function X: Ω → E that is measurable. When E
is a topological space, the most common choice for the σ-algebra ℰ is the Borel
σ-algebra B(E), which is the σ-algebra generated by the collection of all open
sets in E. In such a case the (E, ℰ)-valued random variable is called an E-valued
random variable. Moreover, when E is the real line, such a real-valued random
variable is called simply a random variable.
Real-valued random variables

In this case the observation space is the real numbers with a suitable measure.
Recall that (Ω, F, P) is the probability space. For a real observation space, the
function X: Ω → ℝ is a real-valued random variable if

{ω : X(ω) ≤ r} ∈ F for every r ∈ ℝ.

This condition suffices because the intervals (−∞, r] generate the Borel σ-algebra
on the real line.
Equivalence of random variables
There are several different senses in which random variables can be considered to
be equivalent. Two random variables can be equal, equal almost surely, or equal in
distribution.
In increasing order of strength, the precise definition of these notions of
equivalence is given below.
If the sample space is a subset of the real line, a possible definition is that
random variables X and Y are equal in distribution if they have the same
distribution functions:

P(X ≤ x) = P(Y ≤ x) for all x.
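As an illustrative sketch, two different random variables can share one distribution function: if X is uniform on (0, 1), then Y = 1 − X almost never equals X, yet the two empirical distribution functions agree (sample size and seed are arbitrary):

```python
import random

random.seed(4)
xs = [random.random() for _ in range(50_000)]  # X ~ uniform(0, 1)
ys = [1.0 - x for x in xs]                     # Y = 1 - X: same law, different variable

def ecdf(data, t):
    # Empirical distribution function: fraction of samples <= t.
    return sum(d <= t for d in data) / len(data)

for t in (0.25, 0.5, 0.75):
    print(t, ecdf(xs, t), ecdf(ys, t))  # the two columns nearly agree
```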
Two random variables having equal moment generating functions have the same
distribution. This provides, for example, a useful method of checking equality of
certain functions of i.i.d. random variables. However, the moment generating
function exists only for distributions with sufficiently light tails.
Two random variables X and Y are equal almost surely if, and only if, the
probability that they are different is zero:

P(X ≠ Y) = 0.

For all practical purposes, this notion of equivalence is as strong as pointwise
equality. It is associated with the distance

d(X, Y) = ess sup_ω |X(ω) − Y(ω)|,

where "ess sup" represents the essential supremum in the sense of measure theory.
Finally, the two random variables X and Y are equal if they are equal as functions
on their measurable space:

X(ω) = Y(ω) for all ω.

Convergence
Much of mathematical statistics consists in proving convergence results for certain
sequences of random variables; see for instance the law of large numbers and the
central limit theorem.
There are various senses in which a sequence (X_n) of random variables can
converge to a random variable X. These are explained in the article on convergence
of random variables.
Discrete probability distribution
[Figure: The probability mass function of a discrete probability distribution. The
probabilities of the singletons {1}, {3}, and {7} are respectively 0.2, 0.5, and
0.3. A set not containing any of these points has probability zero.]
[Figure: The cdf of a discrete probability distribution; of a continuous
probability distribution; and of a distribution which has both a continuous part
and a discrete part.]
In probability theory and statistics, a discrete probability distribution is a
probability distribution characterized by a probability mass function. Thus, the
distribution of a random variable X is discrete, and X is then called a discrete
random variable, if

∑_u P(X = u) = 1
as u runs through the set of all possible values of X. It follows that such a
random variable can assume only a finite or countably infinite number of values.
That is, the possible values might be listed, although the list might be infinite.
For example, count observations such as the numbers of birds in flocks comprise
only natural number values {0, 1, 2, ...}. By contrast, continuous observations
such as the weights of birds comprise real number values and would typically be
modeled by a continuous probability distribution such as the normal.
In the cases most frequently considered, this set of possible values is a
topologically discrete set in the sense that all its points are isolated points.
But there are discrete random variables for which this countable set is dense on
the real line (for example, a distribution over rational numbers).
Among the most well-known discrete probability distributions that are used for
statistical modeling are the Poisson distribution, the Bernoulli distribution, the
binomial distribution, the geometric distribution, and the negative binomial
distribution. In addition, the discrete uniform distribution is commonly used in
computer programs that make equal-probability random selections between a number
of choices.
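The equal-probability selection just mentioned can be sketched with the standard library's `random.choice`; the option list here is made up for illustration:

```python
import random
from collections import Counter

random.seed(5)
options = ["rock", "paper", "scissors"]  # any finite set of choices
# random.choice draws from the discrete uniform distribution on options.
picks = Counter(random.choice(options) for _ in range(30_000))
for o in options:
    print(o, picks[o] / 30_000)  # each close to 1/3
```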
Alternative description
Equivalently to the above, a discrete random variable can be defined as a random
variable whose cumulative distribution function (cdf) increases only by jump
discontinuities; that is, its cdf increases only where it "jumps" to a higher
value, and is constant between those jumps. The points where jumps occur are
precisely the values which the random variable may take. The number of such jumps
may be finite or countably infinite. The set of locations of such jumps need not
be topologically discrete; for example, the cdf might jump at each rational number.
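A sketch of such a step-function cdf, using the masses 0.2, 0.5, and 0.3 at the points 1, 3, and 7 from the earlier figure; exact rational arithmetic avoids floating-point noise:

```python
from fractions import Fraction

# pmf of the discrete distribution with P{1}=0.2, P{3}=0.5, P{7}=0.3.
pmf = {1: Fraction(1, 5), 3: Fraction(1, 2), 7: Fraction(3, 10)}

def cdf(t):
    # The cdf is constant between the support points and jumps by pmf[x]
    # exactly at each support point x.
    return sum((p for x, p in pmf.items() if x <= t), Fraction(0))

print(cdf(0), cdf(1), cdf(2), cdf(3), cdf(7))  # prints: 0 1/5 1/5 7/10 1
```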
Consequently, a discrete probability distribution is often represented as a
generalized probability density function involving Dirac delta functions, which
substantially unifies the treatment of continuous and discrete distributions. This
is especially useful when dealing with probability distributions involving both a
continuous and a discrete part.
Representation in terms of indicator functions
For a discrete random variable X, let u_0, u_1, ... be the values it can take with
non-zero probability. Denote

Ω_i = {ω : X(ω) = u_i}, i = 0, 1, 2, ...

These are disjoint sets, and by the formula above

P(∪_i Ω_i) = ∑_i P(Ω_i) = ∑_i P(X = u_i) = 1.

It follows that the probability that X takes any value except u_0, u_1, ... is
zero, and thus one can write X as

X(ω) = ∑_i u_i 1_{Ω_i}(ω)

except on a set of probability zero, where 1_A is the indicator function of A.
This may serve as an alternative definition of discrete random variables.
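The indicator-function representation can be sanity-checked on a small made-up example; the sample space and values below are arbitrary:

```python
# Writing a discrete random variable X as a sum of u_i * 1_{Omega_i}.
outcomes = ["a", "b", "c", "d", "e", "f"]            # the sample space Omega
X = {"a": 1, "b": 1, "c": 2, "d": 2, "e": 2, "f": 5}  # X as a map on Omega
support = sorted(set(X.values()))                     # the values u_i

def indicator(u, w):
    # 1_{Omega_i}(w): 1 when outcome w lands in Omega_i = {X = u_i}, else 0.
    return 1 if X[w] == u else 0

rebuilt = {w: sum(u * indicator(u, w) for u in support) for w in outcomes}
print(rebuilt == X)  # prints: True -- the indicator sum reproduces X
```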
Probability distribution
In probability theory and statistics, a probability distribution identifies either
the probability of each value of a random variable (when the variable is discrete),
or the probability of the value falling within a particular interval (when the
variable is continuous).[1] The probability distribution describes the range of
possible values that a random variable can attain and the probability that the
value of the random variable is within any (measurable) subset of that range.
When the random variable takes values in the set of real numbers, the probability
distribution is completely described by the cumulative distribution function,
whose value at each real number x is the probability that the random variable is
smaller than or equal to x.
The concept of the probability distribution and the random variables which they
describe underlies the mathematical discipline of probability theory, and the
science of statistics. There is spread or variability in almost any value that can be
measured in a population (e.g. height of people, durability of a metal, sales growth,
traffic flow, etc.); almost all measurements are made with some intrinsic error; in
physics many processes are described probabilistically, from the kinetic properties
of gases to the quantum mechanical description of fundamental particles. For these
and many other reasons, simple numbers are often inadequate for describing a
quantity, while probability distributions are often more appropriate.
For many familiar discrete distributions, the set of possible values is topologically
discrete in the sense that all its points are isolated points. But, there are discrete
distributions for which this countable set is dense on the real line.
[Table: common univariate probability distributions]
Note also that all of the univariate distributions below are singly peaked; that
is, it is assumed that the values cluster around a single point. In practice,
actually observed quantities may cluster around multiple values. Such quantities
can be modeled using a mixture distribution.