1.1 Examples of Time Series
Sequential data recorded over time arise in many areas where records are kept.
Such data can be analyzed with time series models.
Let us consider some examples.
Example 1
Australian red wine sales (in kilolitres) from Jan 1980 to Oct 1991
There are 142 records
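The record count follows from month arithmetic: 11 full years (1980 to 1990) plus the 10 months of 1991. A minimal base-R check, using only built-in date functions (no data file is assumed):

```r
# Monthly dates from Jan 1980 to Oct 1991, inclusive
months <- seq(as.Date("1980-01-01"), as.Date("1991-10-01"), by = "month")
n_months <- length(months)
print(n_months)  # 11 * 12 + 10 = 142
```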
Example 2
All-star baseball games from 1933 to 1995
There are two possible values, namely 1 and -1
We observe that the National League has some long runs
(Figure: $x_t$ plotted against year, 1933 to 1995)
Example 3
Monthly accidental deaths from 1973 to 1979
There are peaks in July, with a seasonal pattern
(Figure: monthly accidental deaths, in thousands, 1973 to 1979)
Example 4
(Figure: a plot involving $N_t$, $t = 1, 2, \ldots, 200$)
Example 5
US population from 1790 to 1990
Exhibits exponential growth
Small variation or little noise
(Figure 1.5: Ex. 1.1.5, Population of USA (USPOP.TSM), in millions, 1790 to 1990)
1.2 Objectives of Time Series Analysis
Modelling paradigm:
o Identify a family of probability models to represent the data
o Estimate the model parameters
o Check the goodness of fit of the model to the data
o Apply the selected (best-fitting) model
A fitted model
o provides a compact description of the data
o is used for interpretation
o is used for prediction and forecasting
o is used for statistical hypothesis testing
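The four steps of the modelling paradigm can be sketched in base R. This is a minimal illustration under stated assumptions, not the course's procedure: the data are a hypothetical simulated AR(1) series with coefficient 0.6, the AR(1) family is assumed rather than identified from real data, and the Ljung-Box test (`Box.test`) is one common choice of goodness-of-fit check.

```r
set.seed(1)
# Hypothetical data: a simulated AR(1) series stands in for observed data
x <- arima.sim(model = list(ar = 0.6), n = 300)

# 1. Identify: sample ACF and PACF help suggest a model family (here AR(1))
sample_acf <- acf(x, plot = FALSE)
sample_pacf <- pacf(x, plot = FALSE)

# 2. Estimate the model parameters (Gaussian maximum likelihood by default)
fit <- arima(x, order = c(1, 0, 0))

# 3. Check goodness of fit: residuals of an adequate model resemble white noise
lb <- Box.test(residuals(fit), lag = 10, type = "Ljung-Box", fitdf = 1)

# 4. Apply the selected model, e.g. forecast the next 5 values
fc <- predict(fit, n.ahead = 5)

print(coef(fit))
print(lb$p.value)
print(fc$pred)
```

A large Ljung-Box p-value gives no evidence against the fitted model; a small one would send us back to step 1.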
Pre-processing data
Before identifying a model to fit the time series data, remove
any detectable signals.
Possible signals are
Trend over time (series increases or decreases with time)
Seasonal or cyclical components
Also examine data for
Constant variability over time
Systematic features
Regime shifts
A first step in exploring time series data is to examine them graphically, in various time series plots, for the features above.
Then proceed with time series techniques.
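The pre-processing step can be sketched with the classical additive decomposition in `stats::decompose`, which estimates a moving-average trend and monthly seasonal effects and leaves a remainder. The series here is hypothetical (a linear trend plus a period-12 sine wave plus noise), and `decompose` is only one of several ways to remove trend and seasonality.

```r
set.seed(2)
# Hypothetical monthly series: linear trend + period-12 seasonal term + noise
tt <- 1:120
x <- ts(10 + 0.05 * tt + 2 * sin(2 * pi * tt / 12) + rnorm(120, sd = 0.5),
        frequency = 12, start = c(2000, 1))

# Classical additive decomposition: x = trend + seasonal + remainder
# (plot(dec) would draw the series and its three components)
dec <- decompose(x, type = "additive")
y <- dec$random  # remainder; NA at the ends where the moving average is undefined

# Rough check of constant variability: compare the remainder's two halves
v1 <- var(window(y, end = c(2004, 12)), na.rm = TRUE)
v2 <- var(window(y, start = c(2005, 1)), na.rm = TRUE)

print(round(dec$figure, 2))  # estimated seasonal effect for each month
print(c(v1, v2))
```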
IID Noise: $\{X_t\}$ is an i.i.d. sequence with mean 0 and variance $\sigma^2$.
Means: $E(X_t) = 0$
2nd-order moments: $E(X_{t+h} X_t) = \sigma^2$ if $h = 0$, and $0$ if $h \neq 0$
Binary Process: $\{X_t\}$ is a sequence of binary random variables taking two possible values, e.g. 1 and -1, with probabilities $p$ and $1 - p$, respectively.
1.3.1 Zero-mean Models
Ex 1.3.1 (IID Noise): $\{X_t\} \sim \mathrm{IID}(0, \sigma^2)$
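The moments of IID noise, and the mean $p - (1 - p) = 2p - 1$ of the binary process, can be checked by simulation. This sketch assumes Gaussian draws for the IID noise and independent draws for the binary process; both are illustrative choices, and $\sigma = 2$, $p = 0.7$ are arbitrary.

```r
set.seed(3)
n <- 10000
sigma <- 2
# IID(0, sigma^2) noise; Gaussian draws are one convenient choice
x <- rnorm(n, mean = 0, sd = sigma)

m  <- mean(x)               # estimates E(X_t) = 0
g0 <- mean(x * x)           # estimates E(X_t X_t) = sigma^2 = 4
g1 <- mean(x[-1] * x[-n])   # estimates E(X_{t+1} X_t) = 0

# Binary process: values 1 and -1 with probabilities p and 1 - p (drawn independently)
p <- 0.7
b <- sample(c(1, -1), size = n, replace = TRUE, prob = c(p, 1 - p))

print(c(m, g0, g1, mean(b)))  # mean(b) should be near 2p - 1 = 0.4
```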
Other models that we will study are:
White Noise: $\{X_t\}$ is a sequence of uncorrelated random variables, each with mean 0 and variance $\sigma^2$
Gaussian Process: $\{X_t\}$ is a stochastic process called a Gaussian process (to be defined)
Brownian Motion: $\{X_t\}$ is a stochastic process called Brownian motion (to be defined)
Poisson Process: $\{X_t\}$ is a stochastic process called a Poisson process (to be defined)
Stationary Process (or Weakly Stationary Model)
A stationary process $\{Y_t : t \in T\}$ is a stochastic process for which both the mean $E(Y_t)$ and the second moments $E(Y_t Y_{t+h})$ are independent of $t$ (for each lag $h$, they are constant functions of $t$).
Suppose that $\{Y_t : t \in T\}$ is a stationary process.
o Consider models with added trend and seasonal
components.
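To see what the stationarity conditions rule out, the sketch below estimates the $h = 0$ second moment $E(Y_t^2)$ (the variance, for a zero-mean process) at different times $t$ across many simulated paths. Both processes are illustrative assumptions not defined on this slide: a stationary AR(1) with coefficient 0.5, whose variance stays near $1/(1 - 0.25) = 4/3$, and a random walk $S_t = Z_1 + \cdots + Z_t$, whose variance $t$ depends on $t$.

```r
set.seed(4)
n_paths <- 2000
n_time <- 200
phi <- 0.5
z <- matrix(rnorm(n_paths * n_time), nrow = n_paths)  # iid N(0, 1) innovations

# Stationary AR(1): Y_t = phi * Y_{t-1} + Z_t, started in its stationary distribution
ar1 <- matrix(0, nrow = n_paths, ncol = n_time)
ar1[, 1] <- z[, 1] / sqrt(1 - phi^2)
for (k in 2:n_time) ar1[, k] <- phi * ar1[, k - 1] + z[, k]

# Random walk: S_t = Z_1 + ... + Z_t (one path per row)
rw <- t(apply(z, 1, cumsum))

# Ensemble variance at each time point, estimated across paths
v_ar1 <- apply(ar1, 2, var)  # roughly constant in t
v_rw  <- apply(rw, 2, var)   # grows roughly like t

print(round(c(v_ar1[c(10, 200)], v_rw[c(10, 200)]), 2))
```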
1.3.2 Models with trend and seasonality
Model with no seasonal component.
$X_t = m_t + Y_t$,
where $m_t$ is a slowly varying function called the trend function.
Estimation via least squares, e.g.
$m_t = a_0 + a_1 t + a_2 t^2$,
with $a_0, a_1, a_2$ chosen to minimize $\sum_t (x_t - m_t)^2$.
Forecast for year 2000: $\hat{m}_{2000} = 274.35 \times 10^6$
(Figure 1.8: US population, in millions, with added quadratic trend)
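The least-squares fit of the quadratic trend can be sketched with `lm`. The data below are hypothetical (21 equally spaced time points with a known quadratic trend plus noise), not the population series behind the 274.35 million forecast; the point is only to show how $a_0, a_1, a_2$ are estimated and how the fitted trend is extrapolated one step ahead.

```r
set.seed(5)
# Hypothetical data: quadratic trend 5 + 1.2 t + 0.4 t^2 plus noise
tt <- 1:21
x <- 5 + 1.2 * tt + 0.4 * tt^2 + rnorm(length(tt), sd = 2)

# Least squares: choose a0, a1, a2 to minimize sum((x_t - m_t)^2)
fit <- lm(x ~ tt + I(tt^2))
a <- coef(fit)

# Extrapolate the fitted trend one step ahead, as in the year-2000 forecast
m_next <- predict(fit, newdata = data.frame(tt = 22))

print(a)
print(m_next)
```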
(Figure 1.9: a series plotted against year, 1880 to 1970, together with plots of its residuals)
Harmonic Regression
Useful for data exhibiting a clear periodic component.
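Model: $X_t = s_t + Y_t$, where $s_t = s_{t-d}$ (periodic component with period $d$)
Convenient choice:
$s_t = a_0 + \sum_{j=1}^{k} \left( a_j \cos(\lambda_j t) + b_j \sin(\lambda_j t) \right)$,
where $a_0, a_1, \ldots, a_k, b_1, \ldots, b_k$ are unknown parameters and $\lambda_1, \ldots, \lambda_k$ are fixed frequencies, each an integer multiple of $2\pi/d$.
e.g. with $k = 2$:
$s_t = a_0 + \sum_{j=1}^{2} \left( a_j \cos(\lambda_j t) + b_j \sin(\lambda_j t) \right)$, with $\lambda_1 = 2\pi/12$ (period 12) and $\lambda_2 = 2\pi/6$ (period 6)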
(Figure: monthly accidental deaths, in thousands, 1973 to 1979)
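Because $s_t$ is linear in $a_0, a_j, b_j$ once the frequencies $\lambda_j$ are fixed, harmonic regression is ordinary least squares on cosine and sine regressors. A minimal sketch with hypothetical monthly data (six years, $d = 12$, $k = 2$), not the accidental deaths series:

```r
set.seed(6)
# Hypothetical monthly data with period-12 and period-6 components
tt <- 1:72
x <- 9 + 1.5 * cos(2 * pi * tt / 12) + 0.8 * sin(2 * pi * tt / 12) +
     0.5 * cos(2 * pi * tt / 6) + rnorm(72, sd = 0.3)

# Harmonic regression with k = 2: lambda_1 = 2*pi/12, lambda_2 = 2*pi/6
l1 <- 2 * pi / 12
l2 <- 2 * pi / 6
fit <- lm(x ~ cos(l1 * tt) + sin(l1 * tt) + cos(l2 * tt) + sin(l2 * tt))

# Fitted values estimate s_t; they repeat exactly every 12 months
s_hat <- fitted(fit)
print(round(coef(fit), 2))
```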
IBM Returns
(Figures: Series; Q-Q (Normal) Plot, R^2 = .965362; Rescaled Residuals; Q-Q (Normal) Plot Residuals, R^2 = .949334)
There are some spikes or apparent extreme values
The tails are heavy
The distribution is asymmetric
Features to account for:
o heavy tails
o non-normality
o extreme values
o clustering of extreme values and stochastic volatility
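Heavy tails show up as positive sample excess kurtosis. As a hedged illustration, the sketch compares simulated normal data with draws from a Student $t$ distribution with 5 degrees of freedom, an assumed stand-in for heavy-tailed returns whose theoretical excess kurtosis is 6. Here `rt` is base R's Student $t$ generator, unrelated to the log-return variable `rt` in the R demonstration; `excess_kurtosis` is a hypothetical helper.

```r
set.seed(7)
# Sample excess kurtosis with divisor-n moments (one common definition)
excess_kurtosis <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3
}

n <- 100000
k_norm <- excess_kurtosis(rnorm(n))       # near 0 for normal data
k_t5   <- excess_kurtosis(rt(n, df = 5))  # Student t(5): heavy tails, positive value
print(c(k_norm, k_t5))
```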
Objectives of this Course (contd)
We will
acquire a thorough understanding of the mathematical and statistical foundations of the concepts
gain experience in analyzing financial, economic, and other time series, e.g.
o earnings forecasts
o pricing weather-related derivatives (e.g. energy)
o modeling intraday behavior of asset returns
Computing in R:
Why R?
R is free
R is considered standard software among statisticians
R is widely used in academia and has many contributors to its package library.
R Demonstration
Let us use monthly IBM stock returns from 1967
to 2008
Tasks ahead:
Set the working directory
Load the library fBasics
Read data set
Compute descriptive summary statistics
Perform a test of whether the mean return is zero
Perform a normality test using the Jarque-Bera method
Perform skewness and kurtosis tests
On Windows, write the working-directory path with forward slashes, e.g.
"C:/Users/Irene/Desktop/Teach Columbia/2013 (Fall)/DATASETS/Test/"
Data Set (monthly IBM stock returns from 1967 to 2008):
m-ibm6708.txt
> setwd("C:/Users/rst/teaching/bs41202/sp2012")
> getwd()
> library(fBasics)
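> da=read.table("m-ibm6708.txt",header=TRUE)
# Alternatively, on Windows, read.table("clipboard",header=TRUE) reads a data set copied to the clipboard
# Display the first row of the data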
> da[1,]
date ibm sprtn
1 19670331 0.048837 0.03941
# Get ibm simple returns, assign the values to the variable ibm
> ibm=da[,2]
# Transform the simple returns into log returns, call log return variable rt
> rt=log(ibm+1)
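# Plot log returns with the caption (title) in purple, saved to a PDF file
> pdf("IBMLogReturns.pdf")
> plot(rt,type="l",xlab="time",ylab="rt",main="Time Plot of Monthly log Returns of IBM Stock from 1967.3 to 2008.12",col.main="purple")
> dev.off()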
(Figure: Time Plot of Monthly log Returns of IBM Stock from 1967.3 to 2008.12)
# Compute the sample mean
> mean(rt)
[1] 0.006208082
# Compute the sample variance
> var(rt)
[1] 0.005258775
# Compute the sample skewness
> skewness(rt)
[1] -0.1353432
# Compute the sample excess kurtosis
> kurtosis(rt)
[1] 1.693092
> basicStats(rt)
rt
nobs 502.000000
NAs 0.000000
Minimum -0.303683
Maximum 0.302915
1. Quartile -0.037641
3. Quartile 0.048443
Mean 0.006208
Median 0.005260
Sum 3.116457
SE Mean 0.003237
LCL Mean -0.000151
UCL Mean 0.012567
Variance 0.005259
Stdev 0.072517
Skewness -0.135343
Kurtosis 1.693092
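The task list includes a test of whether the mean return is zero, but the session shows only the basicStats confidence limits. A minimal sketch of the usual t ratio, plugging in the summary values printed above (with the raw data one would instead call `t.test(rt)`):

```r
# Summary values copied from the basicStats output above
n <- 502
xbar <- 0.006208
s <- 0.072517

se <- s / sqrt(n)                             # standard error of the mean
t_stat <- xbar / se                           # t ratio for H0: mean return = 0
p_value <- 2 * pt(-abs(t_stat), df = n - 1)   # two-sided p-value
print(c(se, t_stat, p_value))
```

Because 0 lies between LCL Mean and UCL Mean in the basicStats output, a two-sided test at the 5% level does not reject a zero mean, which agrees with a p-value slightly above 0.05.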
> normalTest(rt,method="jb")
Title:
Jarque - Bera Normality Test
Test Results:
STATISTIC:
X-squared: 62.8363
P VALUE:
Asymptotic p Value: 2.265e-14
> s3=skewness(rt)
> T=length(rt)
> tst=s3/sqrt(6/T)
> tst
[1] -1.237977
# Compute two-sided p-value of the test statistic
> pv=2*pnorm(-abs(tst))
> pv
[1] 0.2157246
# Perform kurtosis test
> k4=kurtosis(rt)
> tst=k4/sqrt(24/T)
> tst
[1] 7.743311
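The Jarque-Bera statistic combines the two tests above: with sample skewness $S$ and excess kurtosis $K$, $JB = T S^2/6 + T K^2/24 = (S/\sqrt{6/T})^2 + (K/\sqrt{24/T})^2$, approximately $\chi^2_2$ under normality. The sketch below (base R; `jb_stat` is a hypothetical helper, not the fBasics implementation) adds the squared test statistics from the session and also computes the two-sided p-value of the kurtosis test, which the session did not print.

```r
# Jarque-Bera statistic from divisor-n sample moments (a sketch)
jb_stat <- function(x) {
  n <- length(x)
  d <- x - mean(x)
  m2 <- mean(d^2)
  s <- mean(d^3) / m2^1.5      # sample skewness
  k <- mean(d^4) / m2^2 - 3    # sample excess kurtosis
  n * (s^2 / 6 + k^2 / 24)
}

# Sum of the squared skewness and kurtosis test statistics from the session
jb_from_tests <- (-1.237977)^2 + 7.743311^2
p_jb <- pchisq(jb_from_tests, df = 2, lower.tail = FALSE)

# Two-sided p-value of the kurtosis test, computed as for the skewness test
p_kurt <- 2 * pnorm(-abs(7.743311))

print(c(jb_from_tests, p_jb, p_kurt))
# Quick check on a tiny vector: skewness 0, excess kurtosis -1.3
print(jb_stat(1:5))
```

The sum is about 61.49, close to the reported X-squared of 62.8363; the small gap is consistent with `skewness` and `kurtosis` using a slightly different moment convention from the Jarque-Bera routine.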
# quit R.
> q()