
EXAMPLES

OF TIME SERIES
Data recorded sequentially over time arise in many
areas where records are kept.
Such data can be analyzed via time series models.
Let us consider some examples.
Example 1
Australian red wine sales (in kilolitres) from Jan 1980
to Oct 1991
There are 142 records

Example 2
All-star baseball games from 1933 to 1995
There are two possible values, namely 1 and -1
We observe that the National League has some long runs

Ex 1.1.2 (All-star baseball games, 1933-1995)

$x_t = 1$ if the National League won in year $t$
$x_t = -1$ if the American League won in year $t$

[Figure: plot of $x_t$ against year]

Example 3
Monthly accidental deaths from 1973 to 1979
There are peaks in July, with a seasonal pattern

Ex 1.1.3 (Accidental deaths, USA; DEATHS.TSM)

[Figure 1.3: Monthly accidental deaths (thousands), 1973 to 1979]

Features: slight trend
seasonal component (peak in July)

Example 4
Signal detection in noisy sinusoidal data
True signal in black
Estimated signal in red

Ex. 1.1.4 (Signal Detection; SIGNAL.TSM)

Model:
$X_t = \cos(t/10) + N_t$, $t = 1, 2, \ldots, 200$,
where $\{N_t\}$ is an IID sequence of N(0, 0.25) random variables.

[Figure 1.4: red = estimated signal, black = true signal]
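The signal-plus-noise model of Example 4 can be simulated in a few lines. The sketch below is only an illustration: the slides do not say how the estimated signal was obtained, so a simple centered moving average stands in for it, and the seed and the window half-width `q` are arbitrary assumptions.

```r
set.seed(1)                                   # arbitrary seed, for reproducibility only
n <- 200
t <- 1:n
signal <- cos(t / 10)                         # true signal
noise <- rnorm(n, mean = 0, sd = sqrt(0.25))  # N(0, 0.25): variance 0.25, sd 0.5
x <- signal + noise                           # observed series X_t

# Assumed smoother: centered moving average of width 2q + 1
q <- 5
w <- rep(1 / (2 * q + 1), 2 * q + 1)
est <- stats::filter(x, w, sides = 2)         # NA at the first and last q points

plot(t, x, type = "l", col = "grey", xlab = "t", ylab = "")
lines(t, signal, col = "black")               # true signal in black
lines(t, est, col = "red")                    # estimated signal in red
```

A wider window removes more noise but flattens the cosine more strongly, so `q` trades variance against bias.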

Example 5
US population from 1790 to 1990
Exhibits exponential growth
Small variation or little noise

Ex. 1.1.5 (Population of USA; USPOP.TSM)

[Figure 1.5: US population (millions) against year]


1.2 Objectives of Time Series Analysis
Modelling paradigm:
o set up a family of probability models to represent the data
o estimate the parameters of the model
o check the model for goodness of fit

Applications of models:
o provides a compact description of the data
o interpretation
o prediction

OBJECTIVES OF TIME SERIES

Modeling
o Identify a family of probability models to represent the data
o Estimate the model parameters
o Check the goodness of fit of the data to the model

Application of the selected model
o Provides a compact description of the data
o Used for interpretation
o Used for prediction and forecasting
o Used for statistical hypothesis testing

Pre-processing data
Before identifying a model to fit the time series data, remove
any detectable signals.
Possible signals are
Trend over time (series increases or decreases with time)
Seasonal or cyclical components

Also examine data for
Constant variability over time
Systematic features
Regime shifts

A first step when exploring time series data is to graphically
examine them in various time series plots for above features.

Next use time series techniques to proceed.

SOME SIMPLE TIME SERIES MODELS



o A Time Series is a Stochastic Process $\{X_t : t \in T\}$, that is, a
family of random variables $X_t$, for $t \in T$, all defined on the
same probability space.

o $T$ is the set of time points. $T$ could also be the set of space
points.

o $\{x_t : t \in T\}$ is a realization of $\{X_t : t \in T\}$.

o One approach to time series is to study the second-order
properties of the data. The study of the second-order
properties includes the notions of

mean $E(X_t)$ and

2nd moments $E(X_t X_{t+h})$, autocovariance, and
autocorrelations

1.3 Some Simple Time Series Models

DEFINITION 1.3.1. A time series model for the
observed data $\{x_t\}$ is a specification of the joint
distributions of a sequence of random variables $\{X_t\}$
of which $\{x_t\}$ is postulated to be a realization.

2nd Order Properties.
means: $E(X_t)$
2nd-order moments: $E(X_{t+h} X_t)$

Let us look at some simple examples of time series.

Zero mean models

IID Noise: $\{X_t\}$ is an i.i.d. sequence with mean 0 and
variance $\sigma^2$.

Binary Process: $\{X_t\}$ is a sequence of binary random
variables taking two possible values, e.g. 1 and -1, with
probability $p$ and $1-p$, respectively.
1.3.1 Zero-mean Models
Ex 1.3.1 (IID NOISE).
$\{X_t\} \sim \mathrm{IID}(0, \sigma^2)$
if $\{X_t\}$ is an IID sequence with mean 0 and variance $\sigma^2$.

Ex 1.3.2 (Binary Process).
$\{X_t\} \sim \mathrm{IID}$ with
$P[X_t = 1] = p$, $P[X_t = -1] = 1-p$,
where $p = .5$. (Model for All Star baseball games??)

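Both zero-mean models are easy to simulate. The sketch below assumes Gaussian noise for the IID(0, $\sigma^2$) case, which is only one admissible choice, and uses an arbitrary seed, sample size and $\sigma$. For the binary process, $E(X_t) = 2p - 1$ and $\mathrm{Var}(X_t) = 4p(1-p)$, so the mean is zero exactly when $p = .5$.

```r
set.seed(42)                                 # arbitrary seed
n <- 200                                     # assumed sample size
sigma <- 1                                   # assumed standard deviation

# IID noise: Gaussian is one possible distribution with mean 0, variance sigma^2
iid_noise <- rnorm(n, mean = 0, sd = sigma)

# Binary process: +1 with probability p, -1 with probability 1 - p
p <- 0.5
binary <- sample(c(1, -1), size = n, replace = TRUE, prob = c(p, 1 - p))

# Sample moments should be close to the theoretical ones
c(mean = mean(iid_noise), var = var(iid_noise))
c(mean = mean(binary), var = var(binary))    # theory: 2p - 1 and 4p(1 - p)

plot(binary, type = "h", xlab = "t", ylab = "x_t")
```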


Other models that we will study are

White Noise: $\{X_t\}$ is a sequence of uncorrelated
random variables, each with mean 0 and variance $\sigma^2$

Gaussian Process: $\{X_t\}$ is the stochastic process, called
Gaussian process (to be defined)

Brownian Motion: {Xt} is the stochastic process, called
Brownian motion (to be defined)

Poisson Process: {Xt} is the stochastic process, called
Poisson process (to be defined)







Stationary Process (or Weakly Stationary Model)
A stationary process $\{Y_t : t \in T\}$ is a stochastic process for which
both the mean $E(Y_t)$ and the 2nd moments $E(Y_t Y_{t+h})$, for each
lag $h$, do not depend on $t$.













Suppose that $\{Y_t : t \in T\}$ is a stationary process.

o Consider models with added trend and seasonal
components.
1.3.2 Models with trend and seasonality
Model with no seasonal component.
$X_t = m_t + Y_t$,
where $m_t$ is a slowly varying function called the
trend function.
Estimation via least squares, e.g.
$m_t = a_0 + a_1 t + a_2 t^2$,
where the coefficients are estimated by minimizing
$\sum_t (x_t - m_t)^2$

Added quadratic trend

Ex. 1.3.4 (Population of USA; USPOP.TSM)

Model: $X_t = a_0 + a_1 t + a_2 t^2 + Y_t$
$\hat{a}_0 = 6.96 \times 10^6$, $\hat{a}_1 = -2.16 \times 10^6$, $\hat{a}_2 = 6.51 \times 10^5$
Forecast for year 2000: $\hat{m}_{2000} = 274.35 \times 10^6$

[Figure 1.8: US population (millions) against year]

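A least-squares quadratic trend can be fitted with `lm`. This is a minimal sketch that uses R's built-in `uspop` series (decennial census counts in millions, 1790 to 1970) as a stand-in for USPOP.TSM, which runs to 1990, so the estimates will not match the coefficients quoted above. The time index $t = 1, 2, \ldots$ (one step per census) is an assumption of the sketch.

```r
pop <- as.numeric(uspop)          # built-in series, millions, 1790-1970
year <- as.numeric(time(uspop))   # 1790, 1800, ..., 1970
t <- seq_along(pop)               # assumed coding: t = 1 for 1790, t = 2 for 1800, ...

# Minimize sum (x_t - a0 - a1 t - a2 t^2)^2 over (a0, a1, a2)
fit <- lm(pop ~ t + I(t^2))
coef(fit)

m_hat <- fitted(fit)              # estimated trend
y_hat <- residuals(fit)           # estimated noise Y_t

# One-step extrapolation of the trend (t = 20 corresponds to 1980)
m_next <- predict(fit, newdata = data.frame(t = length(pop) + 1))

plot(year, pop, xlab = "year", ylab = "(Millions)")
lines(year, m_hat, col = "red")
```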

Added linear trend

Ex 1.3.5 (Lake Huron Levels (1875-1972); LAKE.TSM)
Model: $X_t = a_0 + a_1 t + Y_t$

[Figure 1.9: Lake Huron levels against year]

Residuals from Least Squares fit

[Figure 1.10: Residuals from the LS fit in the previous figure]

Note: residuals do not look IID

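The linear-trend fit and the check of its residuals can be sketched as follows, assuming R's built-in `LakeHuron` series (annual levels, 1875 to 1972) matches LAKE.TSM. For IID residuals, sample autocorrelations should mostly lie within about $\pm 1.96/\sqrt{n}$; a large lag-1 value is evidence against the IID assumption.

```r
level <- as.numeric(LakeHuron)        # built-in annual series, 1875-1972
year <- as.numeric(time(LakeHuron))
t <- seq_along(level)

fit <- lm(level ~ t)                  # least-squares line a0 + a1 t
y_hat <- residuals(fit)               # estimated Y_t

plot(year, level, type = "l", xlab = "year", ylab = "level")
lines(year, fitted(fit), col = "red")

plot(year, y_hat, type = "l", xlab = "year", ylab = "residual")
abline(h = 0, lty = 2)

# Lag-1 sample autocorrelation versus the approximate IID bound
r1 <- acf(y_hat, plot = FALSE)$acf[2]
bound <- 1.96 / sqrt(length(y_hat))
c(lag1 = r1, bound = bound)
```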

Harmonic Regression
Useful for data exhibiting a clear periodic component.
Model: $X_t = s_t + Y_t$,
$s_t = s_{t-d}$ (periodic component)
Convenient choice:
$s_t = a_0 + \sum_{j=1}^{k} (a_j \cos(\lambda_j t) + b_j \sin(\lambda_j t))$,
where $a_j$, $b_j$ are unknown parameters and
$\lambda_j$ are fixed frequencies, each a multiple of $2\pi/d$
(usually a Fourier frequency $2\pi k/n$ for some
$k = 1, \ldots, [n/2]$). For daily data, e.g., a frequency of $2\pi/365$.


Ex 1.1.6 (Accidental deaths, USA; DEATHS.TSM)

Model: $X_t = s_t + Y_t$,
$s_t = a_0 + \sum_{j=1}^{2} (a_j \cos(\lambda_j t) + b_j \sin(\lambda_j t))$,
$\lambda_1 = 2\pi/12$ (period 12), $\lambda_2 = 2\pi/6$ (period 6)

[Figure 1.11: Monthly accidental deaths (thousands), 1973 to 1979]

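Because the frequencies are fixed, the harmonic model is linear in $a_0, a_j, b_j$ and can be fitted by ordinary least squares. The sketch below assumes R's built-in `USAccDeaths` series (monthly, 1973 to 1978) is an acceptable stand-in for DEATHS.TSM and uses the two frequencies from the example.

```r
deaths <- as.numeric(USAccDeaths) / 1000   # built-in monthly series, in thousands
t <- seq_along(deaths)

lambda1 <- 2 * pi / 12                     # period 12
lambda2 <- 2 * pi / 6                      # period 6

# s_t = a0 + sum_j (a_j cos(lambda_j t) + b_j sin(lambda_j t)), j = 1, 2
fit <- lm(deaths ~ cos(lambda1 * t) + sin(lambda1 * t) +
                   cos(lambda2 * t) + sin(lambda2 * t))
s_hat <- fitted(fit)                       # estimated periodic component
y_hat <- residuals(fit)                    # estimated Y_t

plot(t, deaths, type = "l", xlab = "month", ylab = "(thousands)")
lines(t, s_hat, col = "red")
```

The fitted component repeats every 12 months by construction, so any trend in the data is left in the residuals.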

Stationary Process: A Wide Class of Time Series



The properties of time series that, after signal removal,
can be considered as stationary processes are commonly
studied via the structure of autocorrelations or
autocovariances.

The autocorrelation function and autocovariance function
serve as tools to understand the properties of this
important and wide class of time series.

What About Other Time Series?

Let us look at IBM Returns and Foreign Exchange Rates.

IBM Returns

[Figure: IBM returns. Panels: Series; Q-Q (Normal) Plot, R^2 = .965362]

There are some spikes or apparent extreme values


The tails are heavy









Foreign Exchange Rates

[Figure: Foreign exchange rates. Panels: Series; Rescaled Residuals; Q-Q (Normal) Plot Residuals, R^2 = .949334]


There are some spikes or apparent extreme values
The tails are heavy
The distribution is asymmetric

A Longer Series of IBM Returns

[Figure: A longer series of IBM returns. Two panels, each titled Series]


As these historical financial returns show, financial series
tend to exhibit
o heavy tails
o non-normality
o extreme values
o clustering of extreme values and stochastic volatility.

Such time series are non-linear.


For stock or asset returns, the volatility is not directly
observable.







Objectives of Time Series Analysis


Set up suitable models that fit or represent the data
Estimate parameters of the model
Check model for goodness of fit

A suitable model can be used for
describing the data in a compact form
interpretation
forecasting and prediction
hypothesis testing

Features of Sequential Data in Time or Financial Series


Data may be serially correlated
Relationship between consecutive observations
of time series can be linear or non-linear
Clustering of extreme observations
Heavy tails
Regime-shifts
Non-stationarity versus stationarity

A General Approach to Time Series Modeling


Plot and visually inspect the data
Inspect the series for
o a trend
o a seasonal component
o extreme observations
o sharp changes in behavior
o any other patterns
Remove trend and seasonal components to
obtain stationary residuals
Identify a suitable model to fit residuals
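The steps above can be sketched in R. This is an illustration only: it uses the built-in `USAccDeaths` series and the classical additive decomposition from `decompose`, which is one of several ways to remove trend and seasonal components and is not necessarily the method used later in the course.

```r
x <- USAccDeaths                   # built-in monthly series, 1973-1978

# Step 1: plot and visually inspect the data
plot(x, ylab = "deaths")

# Steps 2-3: estimate and remove trend and seasonal components
parts <- decompose(x)              # additive: x = trend + seasonal + random
plot(parts)

seasonal_effects <- parts$figure   # one estimated effect per calendar month
peak_month <- which.max(seasonal_effects)
peak_month                         # index of the month with the largest effect

# Step 4: inspect the residuals before choosing a model for them
res <- as.numeric(na.omit(parts$random))   # ends are NA because of the moving average
acf(res)
```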

Objectives of this Course



Systematic introduction to linear and non-linear time
series models, their properties and applications to the
modeling and prediction of financial time series and other
data collected sequentially in time

We will learn specific techniques for analyzing data


Objectives of this Course (contd)

We will acquire a thorough understanding of
mathematical and statistical bases for concepts


Gain experience in analyzing financial, economic, and
other time series

Topics of this Course


o Returns and their distributions and moments
o Properties of time series models
Stationarity
Estimation and elimination of trend and seasonal
components
Autocovariance and autocorrelation functions
Linear processes
Multivariate normal distribution

o Linear time series, stationary ARMA processes


Modeling and Forecasting

o ARIMA models for non-stationary series


o Regression models with ARMA errors

Topics of this Course (contd)


o Modeling volatility, GARCH models and family
o Non-linear models
o High-frequency data analysis
o Continuous-time diffusion models, Black-Scholes pricing
o Value at Risk (VaR), peak over the threshold,
expected shortfall, tail dependence

Topics of this Course (contd)


o Multivariate time series models and financial returns
o Multivariate volatility models, principal volatility
component analysis (as time permits)
o State space models and Kalman filters (as time permits)
o MCMC methods (as time permits)

Examples of Financial Time Series


Daily log returns of Amazon stock:



The VIX index

Quarterly earnings of Dell

Seasonal time series useful in
o earnings forecasts
o pricing weather related derivatives (e.g. energy)
o modeling intraday behavior of asset returns

Examples of Financial Time Series (contd)




US monthly interest rates (3m & 6m Treasury bills)
Relations between two series? Term structure of interest rates

Exchange rate between US Dollar vs. Euro or Japanese Yen vs. Swiss Franc

Size of insurance claims (e.g. values of fire insurance
claims)

Projected medical insurance claims in outcomes research

Computing in R:

Why R?
R is free
R is considered a standard software for statisticians
R is widely used in academia and has many
contributors to software library.

Computing: The primary package used in this class is R,
which can be downloaded (for free) from
http://www.r-project.org/.
At that website, you can look up the most recent version,
as versions change rapidly (R-3.1.1, R-3.1.3, R-3.2.1, R-3.2.2, etc.).
The following packages in R are useful for time series:
fBasics, fGarch, quantmod, fUtilities,
fUnitRoots, timeSeries, nnet, evir, urca, mAr.
You may want to install and load the R package
Rmetrics. It is designed for modeling a wide range of
financial time series.
This can be done in R using the following two commands:
> source("http://www.rmetrics.org/Rmetrics.R")
> install.Rmetrics()

R Demonstration
Let us use monthly IBM stock returns from 1967
to 2008


Tasks ahead:
Set the working directory
Load the library fBasics
Read data set
Compute descriptive summary statistics
Perform test for mean return being zero
Perform normality test using the Jarque-Bera method
Perform skewness and kurtosis tests

Set the working directory


Working directory:
"C:\Users\Irene\Desktop\Teach Columbia\2013 (Fall)\DATASETS\Test\"


becomes

"C:/Users/Irene/Desktop/Teach Columbia/2013 (Fall)/DATASETS/Test/"

Data Set (monthly IBM stock returns from 1967 to 2008):
m-ibm6708.txt

# Set working directory. Note the forward slashes instead of backslashes.

> setwd("C:/Users/rst/teaching/bs41202/sp2012")
> getwd()


















Load the library fBasics
Read data set

# Load the library fBasics.
> library(fBasics)

# Load data set m-ibm6708.txt (text file) with header on top
> da=read.table("m-ibm6708.txt",header=TRUE)

# Read a data set from the clipboard
> dc <- read.table("clipboard")

# Load data set Starbucks.csv (comma-separated file, as exported from Excel)
> db <- read.csv("Starbucks.csv",header=TRUE)
























Inspect the data set of IBM returns




# Show the first row of the data

> da[1,]
date ibm sprtn

1 19670331 0.048837 0.03941

# Get ibm simple returns, assign the values to the variable ibm

> ibm=da[,2]

# Transform the simple returns into log returns, call log return variable rt

> rt=log(ibm+1)
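The transformation used here is $r_t = \log(1 + R_t)$, where $R_t$ is the simple return. A small worked sketch shows why log returns are convenient: they add over time, whereas simple returns compound. Only the first value below comes from the data row shown above; the other two are made up for illustration.

```r
# Simple returns: the first is the ibm value in the first data row,
# the remaining two are hypothetical
R <- c(0.048837, -0.02, 0.015)

r <- log(1 + R)                      # log returns

# Multi-period log return is the sum of the one-period log returns
multi_log <- sum(r)
multi_simple <- prod(1 + R) - 1      # compounded simple return
c(multi_log, log(1 + multi_simple))  # the two entries agree

# Simple returns are recovered with exp(r) - 1
R_back <- exp(r) - 1
```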























Plot log returns of IBM stock

# Plot log returns with the line in purple
> plot(rt,type="l", col = "purple", lwd = 1, xlab = "time", ylab="IBM log returns")

# Plot log returns with the line in red
> plot(rt,type="l", col = "red", lwd = 1, xlab = "time", ylab="IBM log returns")

# Draw two horizontal lines at y=0.2 and y=-0.2
> abline(h=0.2, col="red", lwd=1)
> abline(h=-0.2, col="red", lwd=1)

# Add a title "Time Plot of Monthly log Returns of IBM Stock"
> title(main="Time Plot of Monthly log Returns of IBM Stock from 1967,3 to 2008.12")


















Let us create a pdf file of the plots

# Open the pdf file "IBMLogReturns.pdf"; each call to plot() starts a new page
> pdf("IBMLogReturns.pdf")

> plot(rt,type="l", col = "purple", lwd = 1, xlab = "time", ylab="IBM log returns")

> plot(rt,type="l", col = "red", lwd = 1, xlab = "time", ylab="IBM log returns")

# Third page: plot with title and horizontal lines at y=0.2 and y=-0.2
> plot(rt,type="l", col = "purple", lwd = 1, xlab = "time", ylab="IBM log returns")
> title(main="Time Plot of Monthly log Returns of IBM Stock from 1967,3 to 2008.12")
> abline(h=0.2, col="red", lwd=1)
> abline(h=-0.2, col="red", lwd=1)

# Close the file
> dev.off()








[Figures: three time plots of IBM log returns against time; the last is titled "Time Plot of Monthly log Returns of IBM Stock from 1967,3 to 2008.12"]

Compute descriptive summary statistics



# Compute the sample mean

> mean(rt)
[1] 0.006208082

# Compute the sample variance

> var(rt)
[1] 0.005258775

# Compute the sample skewness

> skewness(rt)
[1] -0.1353432

# Compute the sample excess kurtosis

> kurtosis(rt)
[1] 1.693092

# Compute all descriptive summary statistics at once

> basicStats(rt)
rt
nobs 502.000000
NAs 0.000000
Minimum -0.303683
Maximum 0.302915
1. Quartile -0.037641
3. Quartile 0.048443
Mean 0.006208
Median 0.005260
Sum 3.116457
SE Mean 0.003237
LCL Mean -0.000151
UCL Mean 0.012567
Variance 0.005259
Stdev 0.072517
Skewness -0.135343
Kurtosis 1.693092

Perform test for mean return being zero




# Let us perform the test that the mean log return is equal to zero
> t.test(rt)

One Sample t-test


data: rt
t = 1.9181, df = 501, p-value = 0.05567
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
-0.0001509194 0.0125670842
sample estimates:
mean of x
0.006208082

Perform normality test using the Jarque-Bera method

# Test for the normality assumption (the method name must be a quoted string)

> normalTest(rt,method="jb")
Title:
Jarque - Bera Normality Test
Test Results:
STATISTIC:
X-squared: 62.8363
P VALUE:
Asymptotic p Value: 2.265e-14

Perform skewness and kurtosis tests

# Perform skewness test
> s3=skewness(rt)
# Sample size (named n rather than T, which would mask R's shorthand for TRUE)
> n=length(rt)
> tst=s3/sqrt(6/n)
> tst
[1] -1.237977

# Compute two-sided p-value of the test statistic
> pv=2*pnorm(-abs(tst))
> pv
[1] 0.2157246

# Perform kurtosis test
> k4=kurtosis(rt)
> tst=k4/sqrt(24/n)
> tst
[1] 7.743311

Quit R

# quit R.
> q()
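The skewness and kurtosis statistics above are each approximately N(0, 1) under normality, and the sum of their squares gives a Jarque-Bera type statistic that is approximately chi-squared with 2 degrees of freedom. The sketch below uses base R only and simulated heavy-tailed data, since m-ibm6708.txt is not assumed to be available. The moment definitions (divisor $n$) are an assumption and may differ slightly from those used by fBasics, so the numbers need not match `normalTest`.

```r
set.seed(7)                               # arbitrary seed
x <- 0.05 * rt(500, df = 5)               # hypothetical heavy-tailed "returns", not the IBM data
n <- length(x)

# Standardize with the divisor-n standard deviation (an assumed convention)
z <- (x - mean(x)) / sqrt(mean((x - mean(x))^2))
s3 <- mean(z^3)                           # sample skewness
k4 <- mean(z^4) - 3                       # sample excess kurtosis

t_skew <- s3 / sqrt(6 / n)                # approx N(0, 1) under normality
t_kurt <- k4 / sqrt(24 / n)               # approx N(0, 1) under normality

p_skew <- 2 * pnorm(-abs(t_skew))         # two-sided p-values
p_kurt <- 2 * pnorm(-abs(t_kurt))

jb <- t_skew^2 + t_kurt^2                 # equals n * (s3^2 / 6 + k4^2 / 24)
p_jb <- pchisq(jb, df = 2, lower.tail = FALSE)

c(t_skew = t_skew, p_skew = p_skew, t_kurt = t_kurt, p_kurt = p_kurt,
  jb = jb, p_jb = p_jb)
```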







Data Mining: Looking for Hidden Patterns & Predicting Future Outcomes:
Do We Still Need Time Series Analysis?
