
Chapter 3: Forecasting From Time Series Models

Part 1: White Noise and Moving Average Models


Stationarity

In this chapter, we study models for stationary time series. A time series is stationary if its underlying statistical structure does not evolve with time. A stationary series is unlikely to exhibit long-term trends. To see why, we need a better definition of trend. Trend is a tendency of the series to increase (or decrease), not necessarily for the realization that actually occurred, but instead for the "typical" realization. Consider the underlying (population) mean of the series, E[x_t]. This represents the average value we would get for the series at time t if we could turn back the hands of time and look at many realizations of the series. (We only have one realization, but the key aspect of statistical thinking is to admit that the data series we actually got is just a sample realization from the many series we might have gotten but didn't.) Since E[x_t] is determined by the underlying statistical structure of the series, stationarity implies that E[x_t] must not depend on time. But this means that the series has no underlying trend. Since stationary time series are unlikely to exhibit long-term trends, any apparent trend in the data should first be removed (e.g., by differencing) if the models discussed here are to be usefully applied.
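As an illustration of trend removal by differencing, here is a minimal Python sketch (the trend slope, sample size, and seed are arbitrary choices, not taken from the notes): a series with a linear trend is nonstationary, but its first difference fluctuates around a constant level.

```python
import numpy as np

# Hypothetical example: linear trend plus noise (slope 0.5 chosen arbitrarily).
rng = np.random.default_rng(0)
t = np.arange(200)
x = 0.5 * t + rng.normal(size=t.size)   # nonstationary: E[x_t] grows with t
dx = np.diff(x)                         # first difference: fluctuates around 0.5

print(x[:5].round(2))       # values drift upward with t
print(dx.mean().round(2))   # roughly 0.5, with no remaining trend
```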

Autocorrelation

A useful measure of linear association between two random variables X and Y (assumed here to have mean zero) is their correlation:

\[
\mathrm{Corr}(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}} = \frac{E[XY]}{\sqrt{E[X^2]\,E[Y^2]}}
\]

It is always true that −1 ≤ Corr(X, Y) ≤ 1. The larger the absolute value of Corr(X, Y), the stronger the linear association between X and Y. The concept of correlation plays a key role in time series analysis. We think of x_t (t = 1, 2, 3, ...) as a sequence of random variables, any pair of which may be correlated. The stationarity assumption implies that Corr(x_t, x_{t-j}) depends only on the time separation j, and not on the time location t. Note that Corr(x_t, x_t) = 1. We often refer to Corr(x_t, x_{t-j}) as an autocorrelation, since it is a correlation between the time series and its own past. As long as Corr(x_t, x_{t-j}) ≠ 0 for some j > 0, we can obtain a linear forecast of x_t from x_{t-1}, x_{t-2}, .... To give a simple example of this, suppose that Corr(x_t, x_{t-1}) = ρ_1, and that

E[x_t] = E[x_{t-1}] = 0. Since ρ_1 is a correlation, we must have −1 ≤ ρ_1 ≤ 1. The larger |ρ_1|, the stronger the linear association between x_t and x_{t-1}. The best linear predictor of x_t based on x_{t-1} is ρ_1 x_{t-1}. Thus, if ρ_1 > 0 (positive autocorrelation), the forecast for x_t increases as x_{t-1} increases. Examples include the daily NASDAQ index, daily temperatures, and monthly interest rates. If ρ_1 < 0 (negative autocorrelation), the forecast for x_t decreases as x_{t-1} increases. Examples include hours of sleep per night, the daily caloric intake of an individual, and output from a production process which is being continuously adjusted to achieve a desired target output. If ρ_1 = 0, then x_t and x_{t-1} are said to be uncorrelated, and the best linear forecast of x_t is zero (or, in general, E[x_t]). Examples include the daily returns on General Motors stock, and the difference between 3.5 and the number rolled in independent tosses of a fair die. Note that, if x_t and x_{t-1} are uncorrelated, knowledge of x_{t-1} does not help us to linearly forecast x_t.

White Noise

A stationary time series ε_t is said to be white noise if Corr(ε_t, ε_s) = 0 for all t ≠ s.

Thus, ε_t is a sequence of uncorrelated random variables with constant variance and constant mean. We will assume that this constant mean value is zero. Plots of white noise series exhibit a very erratic, jumpy, unpredictable behavior. Since the ε_t are uncorrelated, previous values do not help us to forecast future values. The results of successive spins of a roulette wheel provide an example of a white noise series. White noise series themselves are quite uninteresting from a forecasting standpoint (they are not linearly forecastable), but they form the building blocks for more general models. In economic time series, the white noise series is often thought of as representing innovations, or shocks. That is, ε_t represents those aspects of the time series of interest which could not have been predicted in advance.
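To make the "not linearly forecastable" point concrete, here is a short Python sketch (not from the notes; the helper name and seed are illustrative): it simulates a Gaussian white noise series and checks that its sample autocorrelations at the first few lags are near zero.

```python
import numpy as np

def sample_autocorr(x, lag):
    """Sample autocorrelation of x at the given lag (mean removed)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return np.dot(x[lag:], x[:len(x) - lag]) / np.dot(x, x)

rng = np.random.default_rng(1)
eps = rng.normal(size=5000)   # simulated white noise
print([round(sample_autocorr(eps, j), 3) for j in (1, 2, 3)])  # all close to 0
```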

Moving Averages

A simple moving average is a series x_t generated from a white noise series ε_t by the rule x_t = ε_t + θ ε_{t-1}. Note that, unless θ = 0, x_t will have a nontrivial correlation structure. In fact, we have

\[
\mathrm{Corr}(x_t, x_{t-1}) = \frac{\theta\,\mathrm{var}\,\varepsilon_t}{\mathrm{var}\,x_t} \tag{1}
\]

\[
\mathrm{Corr}(x_t, x_{t-j}) = 0 \qquad (j > 1) \tag{2}
\]

Equation (1) implies that if θ is positive, then adjacent terms of x_t will be positively correlated, so that an above average (i.e., positive) x_t will tend to be followed by a further above average value. For a white noise series, θ = 0, an above average value is equally likely to be followed by an above average or a below average value. If θ is negative, then an above average term is likely to be followed by a below average term. We see that if θ is positive (zero, negative), then x_t will be smoother than (as smooth as, rougher than) the white noise series ε_t. Equation (2) implies that there is no correlation between the present value of x_t and all previous values apart from the most recent. Thus, the simple moving average is said to have a short memory.

To prove equation (1), note that

\[
E[x_t x_{t-1}] = E[(\varepsilon_t + \theta \varepsilon_{t-1})(\varepsilon_{t-1} + \theta \varepsilon_{t-2})]
= E[\varepsilon_t \varepsilon_{t-1}] + \theta E[\varepsilon_{t-1}^2] + \theta^2 E[\varepsilon_{t-1} \varepsilon_{t-2}] + \theta E[\varepsilon_t \varepsilon_{t-2}]
= \theta E[\varepsilon_{t-1}^2],
\]

since E[ε_t ε_{t-j}] = 0 for all j > 0. Since ε_t is stationary with zero mean, so is x_t, and we have E[ε_{t-1}^2] = var ε_{t-1} = var ε_t, and var x_{t-1} = var x_t.


Thus,

\[
\mathrm{Corr}[x_t, x_{t-1}]
= \frac{E[x_t x_{t-1}]}{\sqrt{\mathrm{var}\,x_t\,\mathrm{var}\,x_{t-1}}}
= \frac{\theta E[\varepsilon_{t-1}^2]}{\sqrt{\mathrm{var}\,x_t\,\mathrm{var}\,x_{t-1}}}
= \frac{\theta\,\mathrm{var}\,\varepsilon_t}{\mathrm{var}\,x_t}.
\]
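A quick numerical check of equation (1), not part of the original notes: simulate a long MA(1) series for an arbitrary θ and compare the sample lag-1 autocorrelation with θ var ε_t / var x_t = θ/(1 + θ²).

```python
import numpy as np

theta = 0.6                         # arbitrary illustrative value of theta
rng = np.random.default_rng(2)
eps = rng.normal(size=200_000)
x = eps[1:] + theta * eps[:-1]      # x_t = eps_t + theta * eps_{t-1}

x0 = x - x.mean()
rho1_hat = np.dot(x0[1:], x0[:-1]) / np.dot(x0, x0)
print(rho1_hat, theta / (1 + theta**2))   # both approximately 0.441
```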

The proof of (2) is similar.

Suppose we want to forecast x_{n+1} from x_1, ..., x_n for the simple moving average x_t = ε_t + θ ε_{t-1}, with θ known. Since x_{n+1} = ε_{n+1} + θ ε_n, and since the best forecast of ε_{n+1} is zero, the optimal forecast of x_{n+1} is f_{n,1} = θ ε_n. To obtain a useful forecast, however, we must express ε_n in terms of x_1, ..., x_n. Since f_{n-1,1} = θ ε_{n-1} and since x_n = ε_n + θ ε_{n-1}, we have ε_n = x_n − θ ε_{n-1} = x_n − f_{n-1,1}.

If we (somewhat arbitrarily) set f_{0,1} = 0, then we can compute (approximations to) ε_1, ε_2, ..., ε_n by recursively applying the formula ε_t = x_t − f_{t-1,1}, t = 1, 2, ..., n. We can then evaluate the forecast f_{n,1} = θ ε_n, as well as f_{n-1,1}, ..., f_{1,1}. A measure of the forecastability of a series is

\[
R^2 = 1 - \frac{\text{Variance of Forecast Error}}{\text{Variance of } x_t}.
\]

This is analogous to the version of R² used in regression analysis, R² = 1 − SSResid/SSTotal. Just as in regression, the forecasting version of R² is guaranteed to be between 0 and 1. If R² is close to zero, then the variance of the forecast error is almost as large as the variance of the series x_t itself. Thus, the best linear forecast is not performing much better than the trivial forecast (zero), so the series is not very forecastable when R² is close to zero. On the other hand, if R² is close to 1, then the forecast error


from the best forecast is much less than the forecast error from the trivial forecast, so the series is very forecastable in this case. For the simple moving average, we have

\[
\mathrm{var}\,x_t = \mathrm{var}(\varepsilon_t + \theta \varepsilon_{t-1})
= \mathrm{var}\,\varepsilon_t + \theta^2 \mathrm{var}\,\varepsilon_{t-1} + 2\theta\,\mathrm{Cov}(\varepsilon_t, \varepsilon_{t-1})
= (1 + \theta^2)\,\mathrm{var}\,\varepsilon_t.
\]

The forecast error is e_{n,1} = x_{n+1} − f_{n,1} = [ε_{n+1} + θ ε_n] − θ ε_n = ε_{n+1}. Thus, R² reduces to

\[
R^2 = 1 - \frac{\mathrm{var}\,\varepsilon_t}{(1+\theta^2)\,\mathrm{var}\,\varepsilon_t} = 1 - \frac{1}{1+\theta^2} = \frac{\theta^2}{1+\theta^2}.
\]
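As a quick numerical illustration (the value θ = 0.5 is arbitrary, not from the notes):

\[
R^2 = \frac{\theta^2}{1+\theta^2} = \frac{(0.5)^2}{1+(0.5)^2} = \frac{0.25}{1.25} = 0.2,
\]

so the one-step forecast explains only 20% of the variance of x_t; larger values of |θ| make the series more forecastable.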

Forecasts for more than one step ahead are all zero, since for h > 1 we have x_{n+h} = ε_{n+h} + θ ε_{n+h-1}, both terms of which are "unforecastable". In this case, R² = 0.

The simple moving average is also called a first order moving average, denoted MA(1), because it contains just one parameter, θ. A generalization of the MA(1) model is the qth order moving average, MA(q), given by

\[
x_t = \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \cdots + \theta_q \varepsilon_{t-q}.
\]

It is easily shown that for the MA(q) series, Corr(x_t, x_{t-j}) = 0 for j > q, and hence the forecast f_{n,h} is zero for h > q. Since x_{n+1} = ε_{n+1} + θ_1 ε_n + θ_2 ε_{n-1} + ... + θ_q ε_{n-q+1}, the optimal one-step forecast is f_{n,1} = θ_1 ε_n + θ_2 ε_{n-1} + ... + θ_q ε_{n-q+1},


and the one-step forecast error is e_{n,1} = x_{n+1} − f_{n,1} = ε_{n+1}. To actually form the optimum forecast, start with f_{0,1} = 0 and then form ε_t = x_t − f_{t-1,1}, t = 1, 2, ..., n.
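For reference, here is a Python sketch of this recursion for a general MA(q) with known coefficients (the function name, the choice to set the unobserved early innovations to zero, and the buffer handling are illustrative choices consistent with f_{0,1} = 0, not prescribed by the notes). With q = 1 it reduces to the MA(1) forecast described earlier.

```python
import numpy as np

def maq_one_step_forecasts(x, thetas):
    """One-step forecasts f_{t,1} for an MA(q) with known coefficients
    thetas = [theta_1, ..., theta_q], using eps_t = x_t - f_{t-1,1} and
    f_{t,1} = theta_1*eps_t + ... + theta_q*eps_{t-q+1}; f_{0,1} = 0 and
    the unobserved earlier innovations are initialized to zero."""
    thetas = np.asarray(thetas, dtype=float)
    eps = np.zeros(len(thetas))      # most recent innovation estimate first
    f_prev = 0.0                     # f_{0,1}
    forecasts = []
    for xt in x:
        eps_t = xt - f_prev                          # eps_t = x_t - f_{t-1,1}
        eps = np.concatenate(([eps_t], eps[:-1]))    # shift buffer, newest first
        f_prev = float(np.dot(thetas, eps))          # f_{t,1}
        forecasts.append(f_prev)
    return np.array(forecasts)       # last entry is f_{n,1}
```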

In general, to form the forecast f_{n,j} for j ≤ q, write down the expression for x_{n+j}, replace every unknowable future value of ε_t by 0, and replace all the remaining values by the ε_t given above.

How might a moving average model arise in the real world? For example, suppose y_t is the daily change in price of some product, and ε_{t+1} is the effect on tomorrow's price of unexpected news. The full impact of this news may not be completely absorbed by the market; suppose this takes two days to happen. Then we would have y_{t+2} = ε_{t+2} + b ε_{t+1}, where ε_{t+2} is the news which cannot be predicted from time t+1, and b ε_{t+1} is the reassessment of the earlier piece of news. This would lead to an MA(1) model for y_t.

Another way that an MA(1) model can arise is through overdifferencing. Suppose, for example, that our original time series is white noise, x_t = ε_t. This is stationary, and should not be differenced. Differencing a series which is already stationary is called overdifferencing and should be avoided if possible, but it often occurs by accident. If we (over)difference our white noise series, we get y_t = ε_t − ε_{t-1}, which is an MA(1) series with negative autocorrelation. Although the original series was not linearly predictable, the differenced series will be!
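The overdifferencing effect is easy to see numerically. In this Python sketch (seed and sample size arbitrary), differencing a white noise series produces a sample lag-1 autocorrelation close to −0.5, the value θ/(1 + θ²) gives for θ = −1.

```python
import numpy as np

rng = np.random.default_rng(3)
eps = rng.normal(size=100_000)   # white noise: not linearly forecastable
y = np.diff(eps)                 # overdifferenced series: y_t = eps_t - eps_{t-1}

y0 = y - y.mean()
rho1_hat = np.dot(y0[1:], y0[:-1]) / np.dot(y0, y0)
print(rho1_hat)                  # close to -0.5, so y_t is linearly forecastable
```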
