PART 3 - Time Series Examples - 7sept2016

EXAMPLES OF TIME SERIES
Sequential data recorded over time arise in many different areas
where records are kept.
Such data can be analyzed with time series models.
Let us consider some examples.
Example 1
Australian red wine sales (in kilolitres), Jan 1980 to Oct 1991
There are 142 monthly records

Example 2
All-Star baseball games, 1933 to 1995
There are two possible values, namely 1 and -1
We observe that the National League has some long runs of wins
Ex 1.1.2 (All-star baseball games, 1933-1995)

$$x_t = \begin{cases} 1 & \text{if the National League won in year } t,\\ -1 & \text{if the American League won in year } t. \end{cases}$$

[Figure: time plot of x_t, 1933-1995]
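A run is a maximal stretch of identical consecutive values. As a minimal sketch, the run lengths of a ±1 series can be found with base R's `rle()`. The outcome vector below is a hypothetical toy sequence, not the actual 1933-1995 All-Star results.

```r
# Hypothetical toy outcomes: "N" = National League won, "A" = American League won
winners <- c("A", "A", "N", "N", "N", "N", "A", "N", "N", "N")

# Code the outcomes as x_t = 1 (National League) or -1 (American League)
x <- ifelse(winners == "N", 1, -1)

# rle() splits the sequence into runs of identical consecutive values
runs <- rle(x)
print(runs)

# Length of the longest run of National League wins
longest_nl <- max(runs$lengths[runs$values == 1])
print(longest_nl)
```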
Example 3
Monthly accidental deaths in the USA, 1973 to 1979
The series shows a seasonal pattern with peaks in July
Ex 1.1.3 (Accidental deaths, USA; DEATHS.TSM)
Figure 1.3: Monthly accidental deaths (thousands), 1973-1979
Features: slight trend; seasonal component (peak in July)
Example 4
Signal detection in noisy sinusoidal data

True signal in black
Estimated signal in red
Ex. 1.1.4 (Signal Detection; SIGNAL.TSM)

Model: $X_t = \cos(t/10) + N_t$, $t = 1, 2, \dots, 200$,
where $\{N_t\}$ is an IID sequence of $N(0, 0.25)$ random variables.

Figure 1.4: red = estimated signal, black = true signal
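The model above can be simulated in R. This sketch assumes that N(0, .25) denotes variance 0.25 (standard deviation 0.5) and, as an illustrative choice not taken from the slides, estimates the signal with a centred 11-term moving average; the red estimate in Figure 1.4 may have been produced by a different method.

```r
set.seed(1)  # for reproducibility

n <- 200
t <- 1:n
signal <- cos(t / 10)                  # true signal
noise <- rnorm(n, mean = 0, sd = 0.5)  # N(0, 0.25): variance 0.25, so sd = 0.5
x <- signal + noise                    # observed series X_t

# Estimate the signal with a centred (2q + 1)-term moving average, q = 5
# (an illustrative choice; the slides do not say which estimator was used)
q <- 5
est <- stats::filter(x, rep(1 / (2 * q + 1), 2 * q + 1), sides = 2)

# Mean squared errors against the true signal (first and last q values of est are NA)
mse_raw <- mean((x - signal)^2)
mse_est <- mean((est - signal)^2, na.rm = TRUE)
print(c(raw = mse_raw, smoothed = mse_est))

# Observations in grey, true signal in black, estimate in red
plot(t, x, col = "grey", xlab = "t", ylab = "X_t")
lines(t, signal, col = "black", lwd = 2)
lines(t, est, col = "red", lwd = 2)
```

Averaging 11 neighbouring values divides the noise variance by about 11, while the slowly varying cosine is barely distorted, so the smoothed series is much closer to the true signal than the raw data.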
Example 5
US population, 1790 to 1990
Exhibits roughly exponential growth
Small variation (little noise)

Ex. 1.1.5 (Population of USA; USPOP.TSM)
Figure 1.5: US population (millions), 1790-1990

1.2 Objectives of Time Series Analysis
Modelling paradigm:
set up a family of probability models to represent the data
estimate the parameters of the model
check the model for goodness of fit

Applications of models:
provide a compact description of the data
interpretation
prediction
OBJECTIVES OF TIME SERIES ANALYSIS

Modeling
Identify a family of probability models to represent the data
Estimate the model parameters
Check the goodness of fit of the model to the data

Applications of the selected (best-fitting) model
Provides a compact description of the data
Interpretation
Prediction and forecasting
Statistical hypothesis testing

Pre-processing data
Before identifying a model to fit the time series data, remove
any detectable signals.
Possible signals are
Trend over time (the series increases or decreases with time)
Seasonal or cyclical components

Also examine the data for
Constant or changing variability over time
Systematic features
Regime shifts

A first step in exploring time series data is to plot them and
examine the plots for the features above.

Next, apply formal time series techniques.
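Differencing is one standard way to remove a detectable trend and seasonal component. In this minimal sketch on simulated, hypothetical monthly data, a lag-12 difference removes a period-12 seasonal pattern and turns a linear trend into a constant, which a further lag-1 difference removes.

```r
set.seed(42)

n <- 120                                   # ten years of monthly data
t <- 1:n
trend <- 0.5 * t                           # linear trend
season <- 3 * sin(2 * pi * t / 12)         # period-12 seasonal component
x <- ts(trend + season + rnorm(n), frequency = 12)

# Lag-12 differencing cancels the period-12 seasonal component and
# turns the linear trend into a constant (12 * 0.5 = 6)
d12 <- diff(x, lag = 12)

# A further lag-1 difference removes that constant
d <- diff(d12, lag = 1)

print(c(mean_d12 = mean(d12), mean_d = mean(d)))
plot(d, main = "Differenced series")
```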

SOME SIMPLE TIME SERIES MODELS

o A time series is a stochastic process $\{X_t : t \in T\}$, that is, a
family of random variables $X_t$, $t \in T$, all defined on the
same probability space.

o $T$ is the set of time points. $T$ could also be a set of points in space.

o $\{x_t : t \in T\}$ is a realization of $\{X_t : t \in T\}$.

o One approach to time series is to study the second-order
properties of the data. The study of the second-order
properties includes the notions of

the mean $E(X_t)$, and

the second moments $E(X_t X_{t+h})$, the autocovariance, and the
autocorrelation

1.3 Some Simple Time Series Models

DEFINITION 1.3.1. A time series model for the
observed data $\{x_t\}$ is a specification of the joint
distributions of a sequence of random variables $\{X_t\}$
of which $\{x_t\}$ is postulated to be a realization.

Let us look at some simple examples of time series.

Zero-mean models and their second-order properties:

IID Noise: $\{X_t\}$ is an i.i.d. sequence with mean 0 and variance $\sigma^2$.
Mean: $E(X_t) = 0$.
Second-order moments: $E(X_{t+h} X_t) = \sigma^2$ if $h = 0$, and $0$ if $h \neq 0$.

Binary Process: $\{X_t\}$ is a sequence of binary random
variables taking two possible values, e.g. 1 and -1, with
probabilities p and 1-p, respectively (its mean $2p - 1$ is zero only when $p = 1/2$).
1.3.1 Zero-mean Models
Ex 1.3.1 (IID NOISE).
$\{X_t\} \sim \mathrm{IID}(0, \sigma^2)$
if $\{X_t\}$ is an IID sequence with mean 0 and variance $\sigma^2$.
Ex 1.3.2 (Binary Process).

$\{X_t\}$ ~ IID with
$P[X_t = 1] = p$, $P[X_t = -1] = 1 - p$,
where $p = 0.5$. (A possible model for the All-Star baseball games?)
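The two zero-mean models can be simulated to compare sample moments with the theory. This sketch uses Gaussian draws as one instance of IID(0, σ²) noise, with an assumed σ² = 4; any zero-mean distribution with that variance would serve.

```r
set.seed(123)
n <- 1000
sigma2 <- 4   # assumed noise variance

# IID(0, sigma^2) noise; Gaussian draws are used here as one example
z <- rnorm(n, mean = 0, sd = sqrt(sigma2))

# Binary process: P[X_t = 1] = p, P[X_t = -1] = 1 - p, with p = 0.5
p <- 0.5
b <- sample(c(1, -1), size = n, replace = TRUE, prob = c(p, 1 - p))

# Theory: noise has E(X_t) = 0, Var(X_t) = sigma^2;
# binary process has E(X_t) = 2p - 1 = 0, Var(X_t) = 4p(1 - p) = 1
sample_moments <- function(v) c(mean = mean(v), var = var(v))
print(rbind(noise = sample_moments(z), binary = sample_moments(b)))

# Lag-1 sample autocorrelations should be close to 0 for IID sequences
print(acf(z, lag.max = 1, plot = FALSE)$acf[2])
print(acf(b, lag.max = 1, plot = FALSE)$acf[2])
```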

Other models that we will study are

White Noise: $\{X_t\}$ is a sequence of uncorrelated random variables,
each with mean 0 and variance $\sigma^2$

Gaussian Process: $\{X_t\}$ is a stochastic process called a
Gaussian process (to be defined)

Brownian Motion: $\{X_t\}$ is a stochastic process called
Brownian motion (to be defined)

Poisson Process: $\{X_t\}$ is a stochastic process called a
Poisson process (to be defined)

Stationary Process (or Weakly Stationary Model)
A stationary process $\{Y_t : t \in T\}$ is a stochastic process whose
mean $E(Y_t)$ and second moments $E(Y_t Y_{t+h})$ do not depend on t
(for each h, they are constant as functions of t).

Suppose that $\{Y_t : t \in T\}$ is a stationary process.

o Consider models with added trend and seasonal
components.
1.3.2 Models with trend and seasonality
Model with no seasonal component:
$X_t = m_t + Y_t$,
where $m_t$ is a slowly varying function called the
trend function.
Estimation via least squares, e.g.
$m_t = a_0 + a_1 t + a_2 t^2$,
where the coefficients are estimated by minimizing
$\sum_t (x_t - m_t)^2$.

Model: Xt = a0 + a1 t + a2 t2 + Yt
a^ = 6.96x105 , a^ = -2.16 x 106, ^a = 6.51x 105
1
^
6
Forecast for year 2000: m
2000 = 274.35 x 10
120
80
60
40
20
(Millions)
160
200
240
Figure 1.8.
Estimation via least squares.

mt = a0 + a1 t + a2 t2
e.g.
where coefficients are estimated by minimizing
xt - mt )2
Added quadratic trend

11

Model: Xt = a0 + a1 t + a2 t2 + Yt
a^ = 6.96x105 , a^ = -2.16 x 106, ^a = 6.51x 105
1
^
6
Forecast for year 2000: m
2000 = 274.35 x 10
120
80
60
40
20
0
(Millions)
160
200
240
Figure 1.8.
1800
1820
1840
1860
1880
1900
1920
1940
1960
1980
12
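A quadratic trend can be fitted by least squares with `lm()`. This sketch uses R's built-in `uspop` series (census counts in millions, 1790-1970 only), so it will not reproduce the estimates quoted above, which are based on data through 1990; the time index t = 1, 2, ... (one step per decade) is an assumption.

```r
# uspop: US population (millions) at each census, 1790-1970 (datasets package)
pop <- as.numeric(uspop)
t <- seq_along(pop)   # t = 1 for 1790, t = 2 for 1800, ...

# Least squares fit of m_t = a0 + a1 t + a2 t^2
fit <- lm(pop ~ t + I(t^2))
print(coef(fit))

# Extrapolate the fitted trend one decade ahead (t = 20, i.e. 1980)
pred_1980 <- predict(fit, newdata = data.frame(t = length(pop) + 1))
print(pred_1980)
```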
Added linear trend

Ex 1.3.5 (Lake Huron Levels, 1875-1972; LAKE.TSM)
Model: $X_t = a_0 + a_1 t + Y_t$
Figure 1.9: Lake Huron levels, 1875-1972
Residuals from Least Squares fit
Figure 1.10: Residuals from the LS fit in the previous figure, 1875-1972
Note: the residuals do not look IID.
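R ships a version of this series as `LakeHuron` (annual levels in feet, 1875-1972), possibly on a different vertical scale from LAKE.TSM. This sketch fits the linear trend by least squares and inspects the sample autocorrelations of the residuals: for IID residuals roughly 95% of them should lie within ±1.96/√n, so large values at small lags support the observation that the residuals do not look IID.

```r
# LakeHuron: annual level of Lake Huron in feet, 1875-1972 (datasets package)
yr <- as.numeric(time(LakeHuron))
lev <- as.numeric(LakeHuron)

# Least squares fit of X_t = a0 + a1 t + Y_t
fit <- lm(lev ~ yr)
print(coef(fit))

# Sample autocorrelations of the residuals at lags 1..10
res <- residuals(fit)
r <- acf(res, lag.max = 10, plot = FALSE)$acf[-1]
print(round(r, 2))

# Approximate 95% bounds for the sample ACF of an IID sequence
print(1.96 / sqrt(length(res)))
```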
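Harmonic Regression
Useful for data exhibiting a clear periodic component.
Model: $X_t = s_t + Y_t$,
where $s_t = s_{t-d}$ (periodic component with period d).
Convenient choice:
$s_t = a_0 + \sum_{j=1}^{k} \left( a_j \cos(\lambda_j t) + b_j \sin(\lambda_j t) \right)$,
where $a_j, b_j$ are unknown parameters and
$\lambda_1, \dots, \lambda_k$ are fixed frequencies, each a multiple of $2\pi/d$
(usually a Fourier frequency $2\pi k/n$ for some
$k = 1, \dots, [n/2]$). For daily data with an annual cycle, the frequencies are multiples of $2\pi/365$.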
Ex 1.1.6 (Accidental deaths, USA; DEATHS.TSM)

Model: $X_t = s_t + Y_t$, with k = 2:
$s_t = a_0 + \sum_{j=1}^{2} \left( a_j \cos(\lambda_j t) + b_j \sin(\lambda_j t) \right)$,
$\lambda_1 = 2\pi/12$ (period 12), $\lambda_2 = 2\pi/6$ (period 6)
Figure 1.11: Monthly accidental deaths (thousands), 1973-1979
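Harmonic regression is an ordinary least squares fit with cosine and sine regressors. A sketch using R's built-in `USAccDeaths` series (monthly accidental deaths in the USA, 1973-1978) with the two frequencies above; since both periods divide 12, the fitted component repeats every 12 months.

```r
# USAccDeaths: monthly accidental deaths in the USA, 1973-1978 (datasets package)
x <- as.numeric(USAccDeaths)
t <- seq_along(x)

# Regressors for k = 2 harmonics: lambda_1 = 2*pi/12 and lambda_2 = 2*pi/6
lambda <- c(2 * pi / 12, 2 * pi / 6)
X <- cbind(cos(lambda[1] * t), sin(lambda[1] * t),
           cos(lambda[2] * t), sin(lambda[2] * t))
colnames(X) <- c("a1", "b1", "a2", "b2")

# Least squares estimates of a0 (intercept), a1, b1, a2, b2
fit <- lm(x ~ X)
print(coef(fit))

# Fitted periodic component s_t and residuals Y_t
s_hat <- fitted(fit)
y_hat <- residuals(fit)

plot(t, x, type = "l", xlab = "t", ylab = "deaths")
lines(t, s_hat, col = "red")
```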
Stationary Processes: A Wide Class of Time Series

Time series that, after signal removal, can be regarded as
stationary processes are commonly studied through the structure
of their autocorrelations or autocovariances.

The autocorrelation and autocovariance functions
serve as tools to understand the properties of this
important and wide class of time series.
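The sample autocovariance at lag h is $\hat\gamma(h) = n^{-1}\sum_{t=1}^{n-h}(x_{t+h}-\bar x)(x_t-\bar x)$ and the sample autocorrelation is $\hat\rho(h) = \hat\gamma(h)/\hat\gamma(0)$. This sketch computes them directly and checks them against R's `acf()`, which uses the same divisor n; the simulated AR(1) series is chosen only as an example of a stationary process.

```r
# Sample autocovariance: gamma_hat(h) = (1/n) * sum_{t=1}^{n-h} (x_{t+h} - xbar)(x_t - xbar)
sample_acvf <- function(x, max_lag) {
  n <- length(x)
  xbar <- mean(x)
  sapply(0:max_lag, function(h) {
    sum((x[(1 + h):n] - xbar) * (x[1:(n - h)] - xbar)) / n
  })
}

set.seed(7)
x <- arima.sim(model = list(ar = 0.6), n = 500)  # a stationary AR(1) example

g <- sample_acvf(as.numeric(x), max_lag = 5)
rho <- g / g[1]   # rho_hat(h) = gamma_hat(h) / gamma_hat(0)

# Compare with R's built-in estimates
g_r <- as.numeric(acf(x, lag.max = 5, type = "covariance", plot = FALSE)$acf)
rho_r <- as.numeric(acf(x, lag.max = 5, plot = FALSE)$acf)
print(all.equal(g, g_r))
print(all.equal(rho, rho_r))
```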

What About Other Time Series?

Let us look at IBM returns and foreign exchange rates.

IBM Returns

[Figure panels: Series (time plot of the returns); Q-Q (Normal) Plot, R^2 = .965362]
There are some spikes or apparent extreme values

The tails are heavy
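The R^2 reported with these Q-Q plots is presumably the squared correlation between the observations and the corresponding normal quantiles (an assumption about how the software computes it). This sketch computes that quantity for a simulated normal sample and a simulated heavy-tailed Student t sample with 3 degrees of freedom; heavy tails bend the ends of the plot and pull R^2 below 1.

```r
# Squared correlation between the data and the matching normal quantiles
qq_r2 <- function(x) {
  qq <- qqnorm(x, plot.it = FALSE)   # qq$x: theoretical quantiles, qq$y: data
  cor(qq$x, qq$y)^2
}

set.seed(2016)
normal_sample <- rnorm(2000)
heavy_sample <- stats::rt(2000, df = 3)   # Student t with 3 df: heavy tails

print(c(normal = qq_r2(normal_sample), heavy_tailed = qq_r2(heavy_sample)))

qqnorm(heavy_sample)
qqline(heavy_sample)
```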

Foreign Exchange Rates

[Figure panels: Series; Rescaled Residuals; Q-Q (Normal) Plot Residuals, R^2 = .949334]

There are some spikes or apparent extreme values
The tails are heavy
The distribution is asymmetric

A Longer Series of IBM Returns

[Figure: two time plot panels, each titled "Series"]
As these historical financial returns show, financial series
tend to exhibit
o heavy tails
o non-normality
o extreme values
o clustering of extreme values and stochastic volatility.
Such time series are non-linear.

For stock or asset returns, the volatility is not directly
observable.
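Volatility clustering can be illustrated by simulating a GARCH(1,1) process, one of the volatility models listed among the course topics. In this sketch the parameters omega = 0.1, alpha = 0.1, beta = 0.85 are arbitrary illustrative choices; the volatility `sigma2` is known only because it was simulated, whereas with real data only the returns are observed.

```r
set.seed(99)
n <- 10000
omega <- 0.1; alpha <- 0.1; beta <- 0.85   # illustrative values, alpha + beta < 1

r <- numeric(n)
sigma2 <- numeric(n)
sigma2[1] <- omega / (1 - alpha - beta)    # start at the unconditional variance
r[1] <- sqrt(sigma2[1]) * rnorm(1)
for (t in 2:n) {
  # Conditional variance depends on the previous squared return and variance
  sigma2[t] <- omega + alpha * r[t - 1]^2 + beta * sigma2[t - 1]
  r[t] <- sqrt(sigma2[t]) * rnorm(1)
}

# Returns are nearly uncorrelated, but squared returns are positively correlated
acf_r  <- acf(r,   lag.max = 5, plot = FALSE)$acf[-1]
acf_r2 <- acf(r^2, lag.max = 5, plot = FALSE)$acf[-1]
print(round(rbind(returns = acf_r, squared = acf_r2), 3))
```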

Objectives of Time Series Analysis

Set up suitable models that fit or represent the data
Estimate parameters of the model
Check model for goodness of fit

A suitable model can be used for
describing the data in a compact form
interpretation
forecasting and prediction
hypothesis testing
Features of Sequential Data in Time or Financial Series

Data may be serially correlated
The relationship between consecutive observations
of a time series can be linear or non-linear
Clustering of extreme observations
Heavy tails
Regime-shifts
Non-stationarity versus stationarity

A General Approach to Time Series Modeling

Plot and visually inspect the data
Inspect the series for
o a trend
o a seasonal component
o extreme observations
o sharp changes in behavior
o any other patterns
Remove trend and seasonal components to
obtain stationary residuals
Identify a suitable model to fit the residuals
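As one way to carry out these steps, R's `decompose()` estimates the trend with a centred moving average and the seasonal component by averaging the detrended values for each month, leaving a remainder that can be examined for stationarity. This is classical decomposition, a different technique from the least squares and harmonic regression fits earlier in these notes; the built-in `USAccDeaths` series is used only as an example.

```r
# USAccDeaths: monthly accidental deaths in the USA, 1973-1978 (datasets package)
dec <- decompose(USAccDeaths, type = "additive")

# Estimated seasonal effects for the 12 months (centred to sum to zero)
print(dec$figure)

# Remainder after removing trend and seasonal parts; the moving-average
# trend is undefined for the first and last 6 months, so drop those NAs
resid <- na.omit(dec$random)

plot(dec)
acf(resid, main = "Remainder after removing trend and seasonal components")
```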
Objectives of this Course

Systematic introduction to linear and non-linear time
series models, their properties and applications to the
modeling and prediction of financial time series and other
data collected sequentially in time

We will learn specific techniques for analyzing data

Objectives of this Course (contd)

We will acquire a thorough understanding of the
mathematical and statistical foundations of the concepts

We will gain experience in analyzing financial, economic, and
other time series
Topics of this Course

o Returns and their distributions and moments
o Properties of time series models
Stationarity
Estimation and elimination of trend and seasonal
components
Autocovariance and autocorrelation functions
Linear processes
Multivariate normal distribution
o Linear time series, stationary ARMA processes

Modeling and Forecasting
o ARIMA models for non-stationary series

o Regression models with ARMA errors
Topics of this Course (contd)

o Modeling volatility, GARCH models and family
o Non-linear models
o High-frequency data analysis
o Continuous-time diffusion models, Black-Scholes pricing
o Value at Risk (VaR), peaks over threshold,
expected shortfall, tail dependence
Topics of this Course (contd)

o Multivariate time series models and financial returns
o Multivariate volatility models, principal volatility
component analysis (as time permits)
o State space models and Kalman filters (as time permits)
o MCMC methods (as time permits)

Examples of Financial Time Series

Daily log returns of Amazon stock

The VIX index

Quarterly earnings of Dell

Seasonal time series are useful in

o earnings forecasts
o pricing weather-related derivatives (e.g. energy)
o modeling intraday behavior of asset returns
Examples of Financial Time Series (contd)

US monthly interest rates (3-month and 6-month Treasury bills)
Relations between two series? Term structure of interest rates

Exchange rates, e.g. US dollar vs. euro or Japanese yen
vs. Swiss franc

Size of insurance claims (e.g. values of fire insurance
claims)

Projected medical insurance claims in outcomes research

Computing in R:

Why R?
R is free
R is considered standard software among statisticians
R is widely used in academia and has many
contributors to its package library.

Computing: The primary software used in this class is R,
which can be downloaded (for free) from
http://www.r-project.org/.
At that website, you can look up the most recent version,
as versions change frequently (R-3.1.1, R-3.1.3, R-3.2.1, R-3.2.2, etc.).
The following packages in R are useful for time series:
fBasics, fGarch, quantmod, fUtilities,
fUnitRoots, timeSeries, nnet, evir, urca, mAr.
You may want to install and load Rmetrics, a collection of R packages
designed for modeling a wide range of financial time series.
This can be done in R using the following two commands:
> source("http://www.rmetrics.org/Rmetrics.R")
> install.Rmetrics()
R Demonstration
Let us use monthly IBM stock returns from 1967 to 2008

Tasks ahead:
Set the working directory
Load the library fBasics
Read the data set
Compute descriptive summary statistics
Perform a test for the mean return being zero
Perform a normality test using the Jarque-Bera method
Perform skewness and kurtosis tests
Set the working directory

On Windows, a working directory such as
"C:\Users\Irene\Desktop\Teach Columbia\2013 (Fall)\DATASETS\Test\"
must be written in R with forward slashes (or doubled backslashes):
"C:/Users/Irene/Desktop/Teach Columbia/2013 (Fall)/DATASETS/Test/"

Data set (monthly IBM stock returns from 1967 to 2008):
m-ibm6708.txt
# Set the working directory. Note the forward slashes instead of backslashes.
> setwd("C:/Users/rst/teaching/bs41202/sp2012")
> getwd()
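Backslashes in R strings are escape characters, which is why Windows paths need forward slashes or doubled backslashes. A small sketch of the conversion; the path is hypothetical.

```r
# A hypothetical Windows path; in an R string each backslash is written as "\\"
win_path <- "C:\\Users\\me\\Documents\\data"

# Replace every backslash with "/" ("\\\\" is the regex for one literal backslash)
r_path <- gsub("\\\\", "/", win_path)
print(r_path)

# file.path() joins components with "/" separators on every platform
print(file.path("C:", "Users", "me", "Documents", "data"))
```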

Load the library fBasics

Read data set
# Load the library fBasics.
> library(fBasics)

# Load data set m-ibm6708.txt (text file) with a header row on top
> da=read.table("m-ibm6708.txt",header=TRUE)
# Read a data set from the clipboard (Windows)
> dc <- read.table("clipboard")

# Load data set Starbucks.csv (comma-separated file, e.g. saved from Excel)
> db <- read.csv("Starbucks.csv",header=TRUE)

Inspect the data set of IBM returns

# Show the first row of the data
> da[1,]
      date      ibm   sprtn
1 19670331 0.048837 0.03941

# Extract the IBM simple returns (second column) and assign them to ibm
> ibm=da[,2]

# Transform the simple returns into log returns, r_t = log(1 + R_t); call the log return variable rt
> rt=log(ibm+1)
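Because $r_t = \log(1 + R_t)$, log returns add over time while simple returns compound multiplicatively. A quick check on hypothetical returns (not the IBM data); `log1p()` computes $\log(1+x)$ accurately for small x.

```r
# Hypothetical monthly simple returns R_t (not the IBM data)
R <- c(0.05, -0.02, 0.03, -0.04)

# Log returns r_t = log(1 + R_t)
r <- log1p(R)

# The multi-period log return is the sum of one-period log returns,
# while the multi-period gross simple return is the product of (1 + R_t)
print(all.equal(sum(r), log(prod(1 + R))))
print(all.equal(exp(cumsum(r)), cumprod(1 + R)))
```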

Plot log returns of the IBM returns

# Plot log returns in purple
> plot(rt, type="l", col="purple", lwd=1, xlab="time", ylab="IBM log returns")

# Plot log returns in red
> plot(rt, type="l", col="red", lwd=1, xlab="time", ylab="IBM log returns")

# Draw two horizontal lines at y=0.2 and y=-0.2
> abline(h=0.2, col="red", lwd=1)
> abline(h=-0.2, col="red", lwd=1)

# Add a title
> title(main="Time Plot of Monthly log Returns of IBM Stock from 1967.3 to 2008.12")

Let us create a pdf file of the plots

# Create the pdf file "IBMLogReturns.pdf" and draw the plot into it
> pdf("IBMLogReturns.pdf")
> plot(rt, type="l", col="red", lwd=1, xlab="time", ylab="IBM log returns")
> title(main="Time Plot of Monthly log Returns of IBM Stock from 1967.3 to 2008.12")
> abline(h=0.2, col="red", lwd=1)
> abline(h=-0.2, col="red", lwd=1)

# Close the file
> dev.off()

[Figure: Time Plot of Monthly log Returns of IBM Stock from 1967.3 to 2008.12 (x-axis: time, y-axis: IBM log returns)]
Compute descriptive summary statistics

# Compute the sample mean
> mean(rt)
[1] 0.006208082

# Compute the sample variance
> var(rt)
[1] 0.005258775

# Compute the sample skewness
> skewness(rt)
[1] -0.1353432

# Compute the sample excess kurtosis
> kurtosis(rt)
[1] 1.693092

# Compute all descriptive summary statistics at once
> basicStats(rt)
rt
nobs 502.000000
NAs 0.000000
Minimum -0.303683
Maximum 0.302915
1. Quartile -0.037641
3. Quartile 0.048443
Mean 0.006208
Median 0.005260
Sum 3.116457
SE Mean 0.003237
LCL Mean -0.000151
UCL Mean 0.012567
Variance 0.005259
Stdev 0.072517
Skewness -0.135343
Kurtosis 1.693092

Perform test for mean return being zero

# Let us perform the test that the mean log return is equal to zero
> t.test(rt)
One Sample t-test

data: rt
t = 1.9181, df = 501, p-value = 0.05567
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
-0.0001509194 0.0125670842
sample estimates:
mean of x
0.006208082
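The t statistic is the sample mean divided by its standard error: with the `basicStats()` values, 0.072517/√502 ≈ 0.003237 and 0.006208/0.003237 ≈ 1.918. Since the p-value 0.05567 is just above 0.05, a zero mean is not rejected at the 5% level. This sketch repeats the computation by hand on simulated stand-in data (not the IBM returns) and checks it against `t.test()`.

```r
set.seed(1)
# Hypothetical stand-in for the IBM log returns rt
x <- rnorm(502, mean = 0.006, sd = 0.07)

n <- length(x)
se_mean <- sd(x) / sqrt(n)                   # "SE Mean" in basicStats()
t_stat <- mean(x) / se_mean                  # t statistic for H0: mean = 0
p_value <- 2 * pt(-abs(t_stat), df = n - 1)  # two-sided p-value
# 95% confidence interval ("LCL Mean" and "UCL Mean" in basicStats())
ci <- mean(x) + c(-1, 1) * qt(0.975, df = n - 1) * se_mean

tt <- t.test(x)
print(c(t = t_stat, p = p_value))
print(all.equal(t_stat, unname(tt$statistic)))
print(all.equal(p_value, tt$p.value))
print(all.equal(ci, as.numeric(tt$conf.int)))
```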

Perform normality test using the Jarque-Bera method
# Test for the normality assumption
> normalTest(rt,method="jb")
Title:
Jarque - Bera Normality Test
Test Results:
STATISTIC:
X-squared: 62.8363
P VALUE:
Asymptotic p Value: 2.265e-14

Perform skewness and kurtosis tests

# Perform skewness test
> s3=skewness(rt)
> T=length(rt)
> tst=s3/sqrt(6/T)
> tst
[1] -1.237977

# Compute the two-sided p-value of the test statistic
> pv=2*pnorm(-abs(tst))
> pv
[1] 0.2157246

# Perform kurtosis test (kurtosis() returns the excess kurtosis)
> k4=kurtosis(rt)
> tst=k4/sqrt(24/T)
> tst
[1] 7.743311

Quit R

# quit R.
> q()
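The Jarque-Bera statistic combines the two tests: JB = z_skew² + z_kurt², approximately χ² with 2 degrees of freedom under normality. With the values above, (-1.237977)² + 7.743311² ≈ 61.49, close to but not equal to the 62.8363 reported by `normalTest()`, presumably because fBasics uses a slightly different skewness or kurtosis estimator. This sketch uses plain moment estimators (an assumption) and also returns the kurtosis p-value not computed above.

```r
# Moment-based skewness and excess kurtosis (one common convention;
# fBasics may use a slightly different estimator)
moment_stats <- function(x) {
  m <- mean(x)
  m2 <- mean((x - m)^2)
  c(skew = mean((x - m)^3) / m2^1.5,
    exkurt = mean((x - m)^4) / m2^2 - 3)
}

normality_tests <- function(x) {
  n <- length(x)
  s <- moment_stats(x)
  z_skew <- s[["skew"]] / sqrt(6 / n)      # approx N(0, 1) under normality
  z_kurt <- s[["exkurt"]] / sqrt(24 / n)   # approx N(0, 1) under normality
  jb <- z_skew^2 + z_kurt^2                # Jarque-Bera, approx chi^2 with 2 df
  c(z_skew = z_skew,
    p_skew = 2 * pnorm(-abs(z_skew)),
    z_kurt = z_kurt,
    p_kurt = 2 * pnorm(-abs(z_kurt)),
    jb = jb,
    p_jb = pchisq(jb, df = 2, lower.tail = FALSE))
}

set.seed(3)
# Hypothetical heavy-tailed data (Student t with 5 df), not the IBM returns
print(round(normality_tests(stats::rt(502, df = 5)), 4))
```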

Data Mining: Looking for Hidden Patterns & Predicting Future Outcomes:
Do We Still Need Time Series Analysis?
