

Properties of Random Variables

This note discusses several properties of random variables, including expected value, variance, and covariance. It also examines these properties for sums and other functions of random variables, which will be important in many applications. When the outcomes of some random event or experiment are numerical, we call the random event a random variable. Examples include:

- At a construction project, the number of weeks until completion.
- At the same construction project, the final total cost of the job.
- The closing price of Ford Motor Co. on the New York Stock Exchange next December 31.
- The temperature, in degrees Fahrenheit, in Times Square at midnight Eastern Standard Time next December 31.

Since random variables are just a special type of uncertain experiment (in the broadest sense of that term), all the probability rules and techniques we have already examined are available to us. For example, we can use the sort of display shown in Figure 1. If we have two random variables, we can analyze their joint distribution, using tools such as the probability table in Figure 2. We can also compute marginal, joint, and conditional probabilities using probability trees as well. Random variables play a central role in analyzing risk and uncertainty, and will be used throughout the rest of this course. We first need to introduce some terminology used with random variables, and then to discuss some of their properties.

Fig. 1. The probability distribution of a random variable.

  outcome: number of weeks until completion     5      6      7      8
  probability                                  .25    .25    .30    .20
Fig. 2. A probability table giving the joint distribution of two random variables.

                          number of weeks until completion
  total cost of work      5      6      7      8
  $5500                  .15     0      0      0      .15
  $5750                  .10    .05     0      0      .15
  $6000                   0     .20     0      0      .20
  $6250                   0      0     .20     0      .20
  $6500                   0      0     .10     0      .10
  $6750                   0      0      0     .20     .20
                         .25    .25    .30    .20

Terminology of random variables

Consider a simple experiment with numerical outcomes: we roll two fair dice, and count the total number of spots shown. Since the outcome of this experiment is an integer between 2 and 12 inclusive, we can analyze this experiment with a random variable. It is conventional to denote random variables with a capital letter, such as X, Y, or Z. We will say things like: Let X be the outcome of rolling two fair dice. In doing so, we actually mean three things: 1) there is an experiment (i.e., rolling two fair dice), 2) the outcome space is numerical (here, {2, 3, 4, ..., 12}), and 3) the actual outcome will be denoted by the symbol X. We use symbols like X because we may not know the actual outcome at the time we need to analyze the problem.

Expected Value
One of the things we can do with random variables is compute summary measures of the uncertain outcomes they represent. The first and most important measure of a random variable is called its expected value, or sometimes its expectation. If X is a random variable, its expected value, written E(X), is defined as

  E(X) = Σ (value of outcome) × (prob. of outcome),

where the sum runs over all possible outcomes.

That is, the expected value of a random variable is just its probability-weighted average. So, for example, for the random variable whose distribution is given in Figure 1, the expected value is 5(.25) + 6(.25) + 7(.3) + 8(.2) = 6.45 weeks. In essence, expected value is a measure of the central tendency of the outcomes of a random experiment. Expected value is a particularly powerful concept, because the mathematical definition above corresponds (in most situations) closely to our psychological expectation of what should happen in an uncertain situation. The term expected value originates in games of chance, where it corresponds (exactly) to the long-run average of a gambler's winnings when the same game is played over and over again. For the example of a fair toss of two dice, with eleven possible outcomes {2, 3, ..., 12}, you can verify by enumerating the probabilities that the expected number of spots is 7. In more sophisticated uses, the expected value of a random variable is also referred to as the population mean. It takes a bit more context for this terminology to make sense, however, which we will provide in a subsequent lecture. Since the textbook discusses expected value at length in Chapter 5.4, we turn our attention here to other properties of random variables.
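To make the arithmetic concrete, the expected-value calculation for the Figure 1 distribution can be sketched in a few lines of Python (this sketch is not part of the original note; the outcomes and probabilities are taken from the figure):

```python
# Expected value of the completion-time distribution from Figure 1.
outcomes = [5, 6, 7, 8]            # weeks until completion
probs    = [0.25, 0.25, 0.30, 0.20]

# Probability-weighted average of the outcomes.
ev = sum(x * p for x, p in zip(outcomes, probs))
print(ev)  # 6.45 weeks
```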

Variance and standard deviation

In addition to summarizing the central tendency of a random variable, we will also be interested in summarizing its dispersion about the expected value. The two most important measures of this sort are the variance and the standard deviation, which are closely related. If X is a random variable, the variance of X, written var(X), is

  var(X) = Σ (value of outcome − expected value)² × (prob. of outcome),

where the sum again runs over all possible outcomes.

In words: take each possible outcome, subtract the expected value, square the difference, and then multiply by the probability of the outcome. The sum of these probability-weighted squared deviations is the variance. The standard deviation of X, written SD(X), is just the square root of the variance: SD(X) = √var(X).

The variance for the probability distribution in Figure 1 is

  (5 − 6.45)²(.25) + (6 − 6.45)²(.25) + (7 − 6.45)²(.3) + (8 − 6.45)²(.2) = 1.1475,

so the standard deviation is √1.1475 = 1.0712... weeks. The basic idea behind the variance formula is to look at how far each possible outcome is from the expected outcome, and take a probability-weighted average of these squared deviations. The square inside the sum is necessary to prevent symmetric positive and negative deviations from cancelling one another out (which would erroneously imply no dispersion!). As with expected value, we take a probability-weighted average of these squared deviations in the variance formula, which makes highly unlikely outcomes count much less in our measure of spread than more likely outcomes do.1

1 You may wonder why SD is computed using a square root of squared deviations, rather than simply summing the absolute values of all those deviations. The answer lies in how we measure distance in geometry. If you recall how to measure the distance between two points in a plane, it involves taking the square root of the sum of the squared differences between each of the coordinates of the points involved. Standard deviation is based on the same idea, measuring the expected deviation as the (probability-weighted) distance between the list of possible outcomes and their expected value.

Units become important in thinking about variance and standard deviation. A wrinkle with variance is that its units are the square of whatever the original random variable's units are. So, for example, if the random variable X represents the weeks to completion whose probability distribution is in Figure 1, the expected value of X is also in weeks, but the variance of X is in weeks squared. This bit of awkwardness is due to the square inside the variance formula. Thus, the values we compute for variance in different situations are difficult to interpret directly. The way around this is to get rid of the squared units by taking the square root of our answer, a solution so simple that it gets its own name: the standard deviation. Standard deviation is in the same units as the original problem, or in our case, weeks. So, why do we use two measures of dispersion? It turns out that variance and standard deviation are useful for different things. Variance is mainly used as an input into other calculations: many things, such as the dispersion of a sum of random variables, depend on the variance rather than the SD (more on this below). Standard deviation is much more readily interpretable, however, because it is in familiar units.

More about interpreting standard deviations

Intuitively, we interpret both variance and standard deviation as a measure of our confidence in the expected value as a predictor of an uncertain outcome. Random variables with a very small SD have outcomes that are highly likely to be close to the expected outcome; thus, the expected value is a good predictor of the actual outcome. In contrast, when a random variable has a relatively high SD, it becomes quite likely that the actual outcome will differ considerably from the expected value. Higher SD is interpreted as higher levels of risk in financial contexts. Although it is not immediately apparent, a useful interpretation of the SD is that it measures the typical deviation from the expected outcome. Typical is the key term here: we are thinking about how big a difference between the expected and actual outcomes we would see if the experiment was performed repeatedly. For example, suppose a particular betting strategy in the casino dice game of craps has an expected payoff of 92 cents and a SD of 36 cents for every dollar the gambler wagers (remember, casino games favor the house).
The SD is a "give or take" number: the gambler's winnings for each dollar wagered should be about 92 cents, give or take 36 cents. Thus, we interpret the SD as how much the gambler's payoff will typically differ from the expected payoff. If the gambler played a large number of games with this same strategy, the typical difference between the actual winnings each round and the expected winnings, ignoring signs, would be about 36 cents. That's the SD.

Using Excel

Excel (and most other numerical computer software) has a function VAR() and a function STDEV() for calculating variance and standard deviation. These functions work by taking a list of numbers (as a column or row reference) and returning an answer assuming all values are equally likely. This is important: if the random variable you are analyzing does not have equally likely outcomes, these functions will give you the wrong answer, period. This is a convention which software makers use because a frequent context for calculating variance and SD, known as sampling, entails equally likely outcomes. When the probabilities that go with each possible outcome are not all the same, however, you need to make a few columns in a spreadsheet to evaluate the variance formula step-by-step: list the outcomes, compute the deviations from the expected value, square the results, multiply by the probabilities, and add up the results. An example of this for the probability distribution in Figure 1 is shown in Figure 3.

Fig. 3. Computing variance.

  outcome   probability   outcome minus    square of          previous column
                          expected value   previous column    times probability
  5         0.25          -1.45            2.1025             0.525625
  6         0.25          -0.45            0.2025             0.050625
  7         0.30           0.55            0.3025             0.090750
  8         0.20           1.55            2.4025             0.480500

  sum of last column = Variance = 1.1475

By the way, Excel, and some other software, actually has two versions of each function for variance and SD. The other variants are VARP() and STDEVP(). The difference between the P versions and the no-P versions is whether they divide the sum by the number of outcomes, or by the number of outcomes minus one. Excel calls the P versions the population variance and population SD; however, these are not the variance analogs to the population mean discussed above (and are not how statisticians employ the term population more generally). C'est la vie.
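The step-by-step calculation that the spreadsheet in Figure 3 performs can also be sketched in Python (the outcomes and probabilities are those of Figure 1; since they are not equally likely, built-in equal-weight functions like Excel's VAR()/STDEV() would give the wrong answer here):

```python
# Step-by-step variance calculation for Figure 1, mirroring the
# spreadsheet columns in Figure 3.
import math

outcomes = [5, 6, 7, 8]
probs    = [0.25, 0.25, 0.30, 0.20]

ev  = sum(x * p for x, p in zip(outcomes, probs))            # 6.45
var = sum((x - ev) ** 2 * p for x, p in zip(outcomes, probs))
sd  = math.sqrt(var)
print(var, sd)  # 1.1475 and about 1.0712
```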

When we have a joint distribution for two different random variables, we can ask how they covary. This is a fancy term for: when the outcome of one variable is high, does the other tend to be high as well? For example, you should have a sense that the price of a share of stock in Ford Motor Co. and the price of a share of General Motors tend to positively covary, in that common factors (viz., demand for automobiles and the state of the economy as a whole) affect the profitability of both firms in a similar manner. The summary statistic we use for quantifying such

relationships is called, not surprisingly, covariance. Suppose that X and Y are two different random variables, such as the number of weeks and total cost in the probability table in Figure 2. The covariance of X and Y, which we write in shorthand as cov(X,Y), is defined as

  cov(X, Y) = Σ (value of X outcome − E(X)) × (value of Y outcome − E(Y)) × (joint prob. of X and Y),

where the sum runs over all joint outcomes of X and Y.
The formula for covariance is a generalization of the formula for the variance: if you split apart the squared term in the variance formula, and replace one of the deviations with Y instead of X, you get the covariance formula above. (Thus the covariance of any random variable with itself, or cov(X,X), is just var(X).) Calculating a covariance with this definition is best done in a spreadsheet using as many rows as there are joint outcomes to evaluate. You will get some practice using this formula in the Variance Associates note. Covariance measures how two random variables tend to move together, relative to their expected values. This is the sense in which we refer to X being high when Y tends to be high. Positive covariance simply means that both random variables tend to deviate from their expected values in the same direction. Negative covariance means they tend to deviate in opposite directions (viz., one high, the other low; again, relative to their expected values). In terms of probability tables, we can usually see covariance in how the probabilities lie in a joint distribution. If most of the probabilities lie along a more-or-less downward-sloping (top-left to bottom-right) diagonal of the table, then the random variables covary positively.2 If most of the probabilities lie along an upward-sloping diagonal, then the random variables covary negatively. If you look at Figure 2 now, it should be evident that most of the probability lies along the downward-sloping diagonal; you can verify (if you wish) that the covariance of these two random variables is 438.125. Costs tend to rise with the time to completion, as one might expect. A shortcoming of covariance, just as with variance, is its units. When both random variables are measured in the same units (e.g., dollars), then the covariance units will be the square of whatever units the random variables are in (e.g., dollars squared).
Like variance, this makes covariance difficult to interpret directly.3 Despite this inconvenience, covariance is quite important for understanding and quantifying things like diversification strategies and portfolio construction in practice. For this reason we will make much use of it, warts and all. To see how covariance comes into play (quantitatively) in thinking about different sources of uncertainty, read on.

2 That is, assuming your outcomes along the border of the table are in ascending order.
3 For inquiring minds that want to know, the solution to the units wrinkle with covariance is not to take a square root, but to use a different normalization known as the correlation. We will have (much) more to say on this subject in a later note.
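As a check on the 438.125 figure mentioned above, the covariance definition can be evaluated directly from the joint distribution in Figure 2 (a Python sketch, not part of the original note; each tuple below is one nonzero cell of the table):

```python
# Covariance of completion time (weeks) and total cost, computed from
# the joint distribution in Figure 2. Each entry is (weeks, cost, prob).
joint = [
    (5, 5500, 0.15), (5, 5750, 0.10), (6, 5750, 0.05),
    (6, 6000, 0.20), (7, 6250, 0.20), (7, 6500, 0.10),
    (8, 6750, 0.20),
]

ew = sum(w * p for w, c, p in joint)   # E(weeks) = 6.45
ec = sum(c * p for w, c, p in joint)   # E(cost)  = 6137.5

# Probability-weighted product of deviations from the two means.
cov = sum((w - ew) * (c - ec) * p for w, c, p in joint)
print(cov)  # 438.125
```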

Covariance and independence

When someone says that two random variables are independent, they mean that knowing the outcome of one random variable gives us no information about the other random variable's outcome (that is, beyond what we already have: the random variable's original (marginal) distribution). In terms of a joint probability table, independence of two random variables means that every pair of events must pass the independence test. For example, consider the random variables height (in meters) and weight (in kilograms) of a randomly selected MBA student. We would certainly not expect these two random variables to be independent of each other: people who are taller than average tend to weigh more. We would expect height and weight to positively covary in this experiment. There is a subtlety about the relationship between covariance and independence, however. This is that the relationship runs only one way: if X and Y are independent, then their covariance is zero. (This is a mathematical fact, which you can check by example with any probability table for independent random variables.) But the reverse is not true, unfortunately. Just because X and Y have zero covariance does not mean they are independent. This occurs because covariance is a (probability-weighted) average of how two random variables move together, while independence concerns covariation possibilities for every outcome of both. Zero covariation of every pair of outcomes implies zero covariance of the probability-weighted average, but not vice versa. We should also note that two random variables that negatively covary are of course not independent; knowing the value of one should cause you to update your assessment of the likely outcomes of the other, using the quotient rule for obtaining conditional probabilities.
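A quick illustration of the one-way relationship (a hypothetical example, not from the note): let X take the values -1, 0, and 1 with equal probability, and let Y = X². Knowing Y clearly tells you something about X, so the two are not independent, yet their covariance is exactly zero by symmetry:

```python
# Zero covariance does not imply independence: X is -1, 0, or 1 with
# equal probability, and Y = X**2 is completely determined by X.
xs = [-1, 0, 1]
p  = 1 / 3

ex  = sum(x * p for x in xs)        # E(X) = 0
ey  = sum(x**2 * p for x in xs)     # E(Y) = 2/3

# Positive and negative X-deviations cancel exactly, term by term.
cov = sum((x - ex) * (x**2 - ey) * p for x in xs)
print(cov)  # 0.0
```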

Sums of random variables

In many applications we will encounter, we will be interested in evaluating expected value, variance, and SD for sums of random variables. For example, the uncertainty in the return to an investment can be represented by a random variable, and the return on any portfolio of investments by a sum of such random variables. This framework is widely employed in portfolio management for understanding risk and the value of diversification. Similar considerations arise in understanding the effects of demand uncertainty among different customers or

groups of customers, as in the Variance Associates note. Sums of random variables also play a prominent role in the statistical analysis of sampling, including the precision of surveys and opinion polls, hypothesis testing, and regression analysis. All of this is on the agenda for upcoming lectures. We provide here two important properties relating to the sum of random variables. These can be derived directly from the definitions earlier in this note, but since we will use these properties often it will be useful to state them explicitly. Some intuition behind the mathematics is provided with each formula so that you understand the logic from which they follow.

Expected value of a sum

The expected value of the sum of two random variables, X and Y, satisfies: E(X + Y) = E(X) + E(Y). In words, this says that the expected value of the sum of two random variables is equal to the sum of the expected values. It holds because the expected value is a sum of (probability-weighted) outcomes, and we can compute sums of things in any order we prefer. This property generalizes straightforwardly to sums of more than two random variables. For any number of random variables, the expected value of a sum is the sum of the expectations. An example: suppose that you have an (admittedly small) stock portfolio consisting of one share of Microsquash and one share of Oogle. The expected value of the Microsquash share in one year is $120, and the expected value of the Oogle share in one year is $100. Then the expected value of the total portfolio in one year is
E(Value of Portfolio) = E(Value of Microsquash + Value of Oogle) = E(Value of Microsquash) + E(Value of Oogle) = $120 + $100 = $220.
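The property can also be verified numerically on the joint distribution of Figure 2 (a Python sketch, not part of the original note; X is weeks and Y is cost, so the sum mixes units and is computed purely to check the arithmetic):

```python
# Check E(X + Y) = E(X) + E(Y) on the Figure 2 joint distribution.
# Each entry is (weeks, cost, prob).
joint = [
    (5, 5500, 0.15), (5, 5750, 0.10), (6, 5750, 0.05),
    (6, 6000, 0.20), (7, 6250, 0.20), (7, 6500, 0.10),
    (8, 6750, 0.20),
]

e_sum = sum((x + y) * p for x, y, p in joint)   # direct E(X + Y)
e_x   = sum(x * p for x, y, p in joint)         # 6.45
e_y   = sum(y * p for x, y, p in joint)         # 6137.5
print(e_sum, e_x + e_y)  # both equal 6143.95
```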

Variance of a sum
The variance of the sum of two random variables, X and Y, is given by: Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y). This usually gives the uninitiated pause: where did that covariance term come from? And why does that 2 get in there? The answer is that this equation comes from expanding the square of a sum: (x + y)² = x² + y² + 2xy. If you replace x and y in the expansion by x − E[X] and y − E[Y], respectively, then insert sums on each side and multiply by the probabilities, you get the variance of a sum. In words, this says that the variance of the sum of two random variables is equal to the sum of the two variances plus two times the covariance of the random variables. In essence, a positive covariance spreads out the probability distribution of a sum, while a negative covariance reduces the spread of the sum. You can start to see why covariance plays an important role in risk diversification (more about which will come later). To continue our previous example, suppose that the variance of (the value of) the Microsquash share in one year is 900 (in dollars squared), and the variance of (the value of) the Oogle share in one year is 1,600. The covariance between the two is 400. What is the standard deviation of the value of your portfolio in one year?

  Var(Value of Portfolio) = Var(Value of Microsquash + Value of Oogle)
    = Var(Value of Microsquash) + Var(Value of Oogle)
      + 2 Cov(Value of Microsquash, Value of Oogle)
    = 900 + 1,600 + 2(400) = 3,300 (dollars squared).

The standard deviation of the portfolio value is therefore √3,300 = $57.45. An important point to note about this example concerns calculating the SD for a sum of two random variables. Since the SD of a random variable is the square root of the variance, to get the SD for a sum of two random variables you have to first work out the variance of the sum and then take the square root. There are no shortcuts here; you can't reverse the order of sums and square roots in mathematics. What if you had more than two different stocks in your portfolio and needed the variance for a sum of, say, three random variables (e.g., Var(X + Y + Z))? The trick is to let a new random variable equal the sum of two existing random variables, and apply the variance formula for the sum of two random variables twice.
That is,

  Var(X + Y + Z) = Var(X) + Var(Y + Z) + 2 Cov(X, Y + Z).

In doing so, it is useful to know that covariances are distributive, that is,

  Cov(X, Y + Z) = Cov(X, Y) + Cov(X, Z).

This follows from the fact that we are just re-computing the sum (over all of the relevant joint outcomes) in a different order. We will give you some practice with this and related calculations at various points over the next several weeks.
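The portfolio calculation above can be sketched numerically (a Python sketch using the variances and covariance from the running example; remember to take the square root only after summing):

```python
# Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y), applied to the
# Microsquash/Oogle portfolio (all inputs in dollars squared).
import math

var_x, var_y, cov_xy = 900.0, 1600.0, 400.0

var_sum = var_x + var_y + 2 * cov_xy   # variance first ...
sd_sum  = math.sqrt(var_sum)           # ... then the square root
print(var_sum, sd_sum)  # 3300.0 and about 57.45
```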

Constants and random variables

We will often need to concern ourselves with linear functions of random variables. For example, if X is the price of a share of Microsquash one year from now and Y is the price of a share of Oogle at the same point in time, we will need to evaluate the properties of holding, say, 60 shares of Microsquash and 40 shares of Oogle. This amounts to evaluating the expected value and SD of the function 60X + 40Y. Portfolios are often represented by such mathematical shorthand in their analysis. There are three facts we will use for analyzing constants and random variables: one for expected values, one for variances, and one for covariances. As before, these follow from the definitions of these properties discussed earlier in this note, but it is useful to state them explicitly since we will use them often. In each case below, a, b, c, and d are constants (i.e., non-random numbers).

  Fact 1: E(a + bX) = a + bE(X).
  Fact 2: Var(a + bX) = b² Var(X).
  Fact 3: Cov(a + bX, c + dY) = b d Cov(X, Y).

Fact 1 follows from inserting a + bX into the original expected value formula. Both of the constants just pull through the sum, leaving us with a + bE(X). Fact 2 is (slightly) more subtle: notice that when you pull a constant multiplier through a variance, it gets squared. This follows from inserting a + bX into the variance formula and expanding the square inside the sum. If you do the algebra correctly, a will cancel out, and you will see that b pulls out in front of the parentheses after it gets squared. Fact 3 is a generalization of Fact 2: inserting a + bX and c + dY into the covariance and multiplying out lets b and d pull out front, while a and c get cancelled out. At this point, you should be able to verify that the 60-share Microsquash, 40-share Oogle portfolio has an expected value of $11,200 and a standard deviation of $2,778.49.
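The verification suggested above can be sketched by combining these facts with the sum rules (a Python sketch; the inputs E(X) = 120, E(Y) = 100, Var(X) = 900, Var(Y) = 1,600, and Cov(X, Y) = 400 are from the running example):

```python
# Expected value and SD of the portfolio 60X + 40Y.
import math

ex, ey = 120.0, 100.0
var_x, var_y, cov_xy = 900.0, 1600.0, 400.0
a, b = 60, 40

ev  = a * ex + b * ey                                    # sum rule + Fact 1
var = a**2 * var_x + b**2 * var_y + 2 * a * b * cov_xy   # Facts 2 and 3
sd  = math.sqrt(var)
print(ev, sd)  # 11200.0 and about 2778.49
```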

Random variables as functions of outcomes

The foregoing analysis of constants and sums of random variables leads us down the path of thinking about functions of random outcomes. Often (if not usually) when we are modeling an uncertain situation, there will be many different pieces of uncertainty that come together to determine the outcome of primary interest. For example, the total cost of completing a construction project may depend upon the

time to completion, future changes in materials prices, the number of intervening rainy days, the number of worker accidents, and so on. To get to where we can think precisely about analyzing the risk in the total project's cost, we start by analyzing the probabilities associated with these individual components using whatever knowledge or data we have available (e.g., number of rainy days in the past, accident rates on similar projects, and so on). The total project cost is a function of all these components, and sometimes these functions can become quite complex. The facts regarding constants and sums of random variables discussed above are essentially shortcuts for working with simple linear functions of two (or more) random variables. They allow us to work out the EV and SD for a sum of several random variables at once without having to re-compute any new probability distributions. In more complicated cases, similar shortcuts will not be available. If the random outcome we are interested in is a complicated (viz., non-linear) function of multiple pieces of uncertainty, the only way to get the EV and SD of the random variable we are interested in is to first derive its entire probability distribution. An example will help cement these ideas in place. Consider the following story: At a particular construction project, a subcontractor is worried about his total costs. This subcontractor must excavate and then lay the foundation for the project. His costs are: $4,000 fixed costs, plus $250 per week for every week that he is engaged in excavation, plus $250 per week for every week that he is working on the site. The lengths of time that excavation and foundation will take depend on a number of factors about which he is currently uncertain, having to do with the properties of the site. The contractor assesses that excavation will take 1, 2, or 3 weeks, with probabilities .2, .5, and .3, respectively. If excavation takes 1 week, then foundation work will take 3, 4, or 5 (additional) weeks with probabilities .5, .25, and .25, respectively; if excavation takes 2 weeks, then foundation work will take 3, 4, or 5 weeks with probabilities .2, .4, and .4, respectively; and if excavation takes 3 weeks, then foundation work will take either 4 or 5 weeks, with probabilities 1/3 and 2/3, respectively. The length of time that the contractor must work at this site is the sum of the times of the excavation and the foundation work, except that he is contractually obligated to be at the project for no less than five weeks. He incurs the $250 per week (while working) charge for a minimum of five weeks, even if he finishes both tasks sooner than that. Suppose we want to find the EV and SD of the total project cost. To do so, we need to determine the complete probability distribution for the random variable


total project cost. The reason for this comes from the relation between the individual pieces of uncertainty. Specifically, if we let X represent the random variable excavation time (in weeks) and F stand for foundation time (also in weeks), then T, for total project completion time, is given by the formula: T = max{5, X + F} weeks. The explanation here is that since the subcontractor is contractually obligated to be on the job no less than five weeks, the total project completion time is the larger of 5 weeks or the actual time to finish excavation and the foundation. Reading the third sentence in the story, the total project cost, C, is then C = $4,000 + $250 × T + $250 × X. This second formula looks like a simple sum of random variables, with a few constants thrown in to work with. But the first formula for T doesn't look like anything familiar; there is no easy way to evaluate E(T) or SD(T) without working out its complete probability distribution. To get the EV and SD of the total project cost, then, we build the probability tree shown below. The probability data are given to us in a convenient fashion for a probability tree: first unconditional probabilities for the excavation time, and then, conditional on the possible excavation time, conditional probabilities of foundation times. Thus, we are able to fill in the tree and get the joint outcomes and probabilities shown in the first two columns after the tree. Next comes the total project completion time, which is the sum of the two times, except that in the first joint outcome we write 5 weeks, reflecting the subcontractor's obligation to be on the site for at least five weeks. And then we have total cost, computed by the formula for C above: $4,000 + $250 × total completion time + $250 × excavation time. From this we can compute the probability distribution of total cost.
The probability that the total cost is $5,500 is .15, the sum of the probabilities for the two outcomes making up this total cost; and so on for the remaining total cost outcomes. The entire marginal distribution of total cost outcomes is given by the bottom row of the probability table back in Figure 2. In fact, Figure 2 is computed to give the joint probability distribution of the two random variables T and C, the total project completion time and total cost respectively, as obtained from this probability tree. From the marginal distribution of total completion time, we can then compute the EV and SD as we did much earlier in this note: The EV is 6.45 weeks, and the SD is 1.0712 weeks.


Fig. 4. A probability tree analysis of the contractor's problem

  Excavation Time   Foundation Time   Joint Outcome   Prob.   Total Compl. Time   Total Cost
  1 week  (0.2)     3 weeks (0.50)        1,3         0.10        5 weeks           $5500
                    4 weeks (0.25)        1,4         0.05        5 weeks           $5500
                    5 weeks (0.25)        1,5         0.05        6 weeks           $5750
  2 weeks (0.5)     3 weeks (0.20)        2,3         0.10        5 weeks           $5750
                    4 weeks (0.40)        2,4         0.20        6 weeks           $6000
                    5 weeks (0.40)        2,5         0.20        7 weeks           $6250
  3 weeks (0.3)     3 weeks (0)           3,3         0.00        6 weeks           $6250
                    4 weeks (1/3)         3,4         0.10        7 weeks           $6500
                    5 weeks (2/3)         3,5         0.20        8 weeks           $6750
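The tree computation can also be sketched programmatically (a Python sketch, not part of the original note; the probabilities and formulas are exactly those of the story, with T = max{5, X + F} and C = $4,000 + $250T + $250X):

```python
# Working out the contractor's distributions from the probability tree.
import math
from collections import defaultdict

p_excavation = {1: 0.2, 2: 0.5, 3: 0.3}       # P(X)
p_foundation = {                              # P(F | X); 3 weeks has
    1: {3: 0.5, 4: 0.25, 5: 0.25},            # probability 0 when X = 3
    2: {3: 0.2, 4: 0.4, 5: 0.4},
    3: {4: 1 / 3, 5: 2 / 3},
}

cost_dist = defaultdict(float)
time_dist = defaultdict(float)
for x, px in p_excavation.items():
    for f, pf in p_foundation[x].items():
        t = max(5, x + f)                     # contractual 5-week minimum
        c = 4000 + 250 * t + 250 * x
        cost_dist[c] += px * pf
        time_dist[t] += px * pf

ev_t = sum(t * p for t, p in time_dist.items())
sd_t = math.sqrt(sum((t - ev_t) ** 2 * p for t, p in time_dist.items()))
print(dict(cost_dist))   # cost probabilities .15, .15, .2, .2, .1, .2
print(ev_t, sd_t)        # 6.45 weeks and about 1.0712 weeks
```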

The point of this example is that often we will be concerned with random outcomes that depend on a lot of other random variables. When we are working with simple sums of random variables, we can use the shortcuts discussed earlier in this note. But when the uncertainties are not so simply related to one another, we have to build probability trees (tables, whatever) that allow us to work out the complete probability distribution of the random outcome we are interested in. This can be a lot of work, to be certain. But analyzing risk and uncertainty in real life is often like that, and this type of problem arises frequently in such applications as financial portfolio analysis, making production capacity decisions, managing inventories, choosing product marketing strategies, and evaluating commercial insurance contracts. You will get an early feel for all of these ideas in the Variance Associates note for class this week, and a more advanced application later in the term.