
Econometric Modeling

Research Methods
Professor Lawrence W. Lan
Email: lawrencelan@mdu.edu.tw http://140.116.6.5/mdu/ Institute of Management

Outline
- Overview
- Single-equation Regression Models
- Simultaneous-equation Regression Models
- Time-Series Models

Overview
- Objectives
- Model building
- Types of models
- Criteria of a good model
- Data
- Desirable properties of estimators
- Methods of estimation
- Software packages and books

Objectives
- Empirical verification of theories in business, economics, management, and related disciplines is becoming increasingly quantitative.
- Econometrics, or economic measurement, is a social science in which the tools of economic theory and mathematical statistics are applied to the analysis of economic phenomena.
- The focus is on models that can be expressed in equation form, relating variables quantitatively.
- Data are used to estimate the parameters of the equations, and the theoretical relationships are tested statistically.
- The models are used for policy analysis and forecasting.

Model Building
Model building is both a science and an art, serving policy analysis and forecasting.
- Science: a set of quantitative tools used to construct and test mathematical representations of real-world problems.
- Art: the intuitive judgments made during the modeling process; there are no clear-cut rules for making these judgments.

Types of Models (1/4)


Time-series models
Examine the past behavior of a time series in order to infer something about its future behavior, without relying on knowledge of the causal relationships that affect the variable being forecast.
- Deterministic models (e.g., linear extrapolation) vs. stochastic models (e.g., ARIMA, SARIMA)

Types of Models (2/4)


Single-equation models
Causal relationships (based on underlying theory) in which the variable under study (Y) is explained by a single function (linear or nonlinear) of a number of variables (Xs).
- Y: explained or dependent variable
- Xs: explanatory or independent variables

Types of Models (3/4)


Simultaneous-equation models (or multiequation simulation models)
Causal relationships (based on underlying theory) in which the dependent variables (Ys) under study are related to each other, as well as to a number of explanatory variables (Xs), through a set of equations (linear or nonlinear).

Types of Models (4/4)


Combination of time-series and regression models
- Single-input vs. multiple-input transfer function models
- Linear vs. rational transfer functions
- Simultaneous-equation transfer functions
- Transfer functions with interventions or outliers

Criteria of a Good Model


- Parsimony
- Identifiability
- Goodness of fit
- Theoretical consistency
- Predictive power

Data
- Sample data: the set of observations from the measurement of variables, which may come from any number of sources and in a variety of forms.
- Time-series data: describe the movement of a variable over time.
- Cross-section data: describe the activities of individuals or groups at a given point in time.
- Pooled data: a combination of time-series and cross-section data; also known as panel, longitudinal, or micropanel data.

Desirable Properties of Estimators


- Unbiased: the mean or expected value of the estimator equals the true parameter value.
- Efficient (best): the estimator has smaller variance than any other estimator in its class.
- Minimum mean square error (MSE): trades off bias and variance; the MSE equals the variance of the estimator plus the square of its bias (see the sketch below).
- Consistent: the probability limit of the estimator approaches the true value as the sample size grows; a large-sample (asymptotic) property.
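A minimal simulation sketch (using NumPy; the estimators and parameter values are hypothetical, chosen only for illustration) of the bias-variance trade-off: the sample mean is unbiased, a shrunken version is biased but has lower variance, and for each the simulated MSE matches variance plus squared bias.

```python
import numpy as np

# Hypothetical sketch: compare two estimators of a population mean mu
# and verify that MSE = variance + bias^2 for each.
rng = np.random.default_rng(0)
mu, sigma, n, reps = 5.0, 2.0, 30, 20_000

unbiased = np.empty(reps)   # sample mean: unbiased
shrunken = np.empty(reps)   # 0.9 * sample mean: biased but lower variance
for r in range(reps):
    x = rng.normal(mu, sigma, n)
    unbiased[r] = x.mean()
    shrunken[r] = 0.9 * x.mean()

for name, est in [("sample mean", unbiased), ("shrunken mean", shrunken)]:
    bias = est.mean() - mu
    var = est.var()
    mse = np.mean((est - mu) ** 2)
    print(f"{name}: bias^2 + var = {bias**2 + var:.4f}, MSE = {mse:.4f}")
```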

Methods of Estimation
- Ordinary least squares (OLS)
- Maximum likelihood (ML)
- Weighted least squares (WLS)
- Generalized least squares (GLS)
- Instrumental variable (IV)
- Two-stage least squares (2SLS)
- Indirect least squares (ILS)
- Three-stage least squares (3SLS)

Software Packages and Books


- LIMDEP: single-equation and simultaneous-equation regression models
- SCA: time-series models
- Textbooks:
  (1) Damodar Gujarati, Essentials of Econometrics, 2nd ed., McGraw-Hill, 1999.
  (2) Robert S. Pindyck and Daniel L. Rubinfeld, Econometric Models and Economic Forecasts, 4th ed., McGraw-Hill, 1997.

Single-equation Regression Models


- Assumptions
- Best linear unbiased estimation (BLUE)
- Hypothesis testing
- Violations of assumptions A1-A5
- Forecasting

Assumptions
- A1: (i) The relationship between Y and X truly exists and is correctly specified. (ii) The Xs are nonstochastic variables whose values are fixed. (iii) The Xs are not linearly correlated.
- A2: The error term has zero expected value for all observations.
- A3: The error term has constant variance for all observations.
- A4: The error terms are statistically independent.
- A5: The error term is normally distributed.

Best Linear Unbiased Estimation


Gauss-Markov (GM) Theorem: Given assumptions 1, 2, 3, and 4, the ordinary least squares (OLS) estimators of the regression parameters are the best (most efficient) linear unbiased estimators (BLUE). The GM theorem applies only to linear estimators, i.e., estimators that can be written as a weighted average of the individual observations on Y (see the sketch below).
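A minimal sketch of OLS on simulated data, showing that the OLS estimator is linear in Y: the coefficients are a fixed weighted average of the observations on Y, which is exactly the class of estimators the GM theorem covers. The data-generating values below are assumptions made up for illustration.

```python
import numpy as np

# Minimal sketch of OLS on simulated data: beta_hat = (X'X)^{-1} X'Y,
# i.e. each coefficient is a linear combination (weighted average) of the Y's.
rng = np.random.default_rng(1)
n = 200
x = rng.uniform(0, 10, n)
y = 2.0 + 0.5 * x + rng.normal(0, 1, n)        # true intercept 2.0, slope 0.5

X = np.column_stack([np.ones(n), x])           # design matrix with intercept
W = np.linalg.inv(X.T @ X) @ X.T               # the fixed weight matrix
beta_hat = W @ y                               # linear in Y, as the GM theorem requires
print("intercept, slope:", beta_hat)
```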

Hypothesis Testing
- Normal, chi-square, t, and F distributions
- Goodness of fit
- Testing the regression coefficients (single equation)
- Testing the regression equation (joint equations)
- Testing for structural stability or transferability of regression models

A1(i) Violation -- Specification Error


- Omitting relevant variables → biased and inconsistent estimators
- Including irrelevant variables → unbiased but inefficient estimators
- Incorrect functional form (nonlinearities, structural changes) → biased and inconsistent estimators

A1(ii) Violation -- Xs Correlated with Error


- OLS leads to biased and inconsistent estimators
- Criteria for good instrumental (proxy) variables
- Instrumental-variables (IV) estimation → consistent, but no guarantee of unbiased or unique estimators
- Two-stage least squares (2SLS) estimation → optimal instrumental variable, unique consistent estimators

A1(iii) Violation -- Multicollinearity


- Perfect collinearity among any of the Xs → no solution will exist
- Near or imperfect multicollinearity → large standard errors of the OLS estimators (wide confidence intervals); high R² but few significant t values; wrong signs for regression coefficients; difficulty in explaining or assessing the individual contribution of each X to Y

Detection of Multicollinearity
- Test the significance of R_i², the R² from the auxiliary regression of X_i on the remaining explanatory variables:
  F = [R_i² / (k - 1)] / [(1 - R_i²) / (n - k)],
  where n = number of observations and k = number of explanatory variables including the intercept. Check whether the F value is significantly different from zero; if so (F value > F table), X_i is significantly collinear with the other Xs.
- Variance inflation factor: VIF_i = 1 / (1 - R_i²). VIF = 1 indicates no collinearity; VIF > 10 indicates a high degree of multicollinearity (see the sketch below).
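A sketch of these diagnostics on simulated data (the regressors and their collinearity are made up for illustration): each R_i² comes from an auxiliary regression of X_i on the remaining regressors, and the VIF follows directly from it.

```python
import numpy as np

# Sketch of multicollinearity diagnostics: regress each X_i on the remaining
# regressors, take R_i^2 from that auxiliary regression, and compute
# VIF_i = 1 / (1 - R_i^2).  Data below are simulated for illustration only.
rng = np.random.default_rng(2)
n = 100
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + 0.1 * rng.normal(size=n)      # nearly collinear with x1
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

def aux_r2(X, i):
    """R^2 from regressing column i of X on the other columns plus a constant."""
    y = X[:, i]
    Z = np.column_stack([np.ones(len(y)), np.delete(X, i, axis=1)])
    resid = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return 1 - resid.var() / y.var()

for i in range(X.shape[1]):
    r2 = aux_r2(X, i)
    print(f"X{i+1}: R_i^2 = {r2:.3f}, VIF = {1 / (1 - r2):.1f}")
```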

A2 Violation -- Measurement Error in Y


- OLS will result in a biased intercept; however, the estimated slope parameters are still unbiased and consistent.
- Correction for the dependent variable

A3 Violation -- Heteroscedasticity
- Happens mostly with cross-sectional data, sometimes with time-series data.
- OLS will lead to inefficient but still unbiased estimation.
- Can be corrected by the weighted least squares (WLS) method (see the sketch below).
- Detection: Goldfeld-Quandt test, Breusch-Pagan test, White test, Park-Glejser test, Bartlett test, Peak test, Spearman's rank correlation test, etc.
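A hedged sketch using statsmodels, assuming simulated cross-sectional data whose error variance grows with x: the Breusch-Pagan test from the list above flags the heteroscedasticity, and a WLS fit with weights proportional to the inverse error variance illustrates the correction.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

# Sketch of a heteroscedasticity check on simulated cross-sectional data:
# the error variance grows with x, which the Breusch-Pagan test should flag.
rng = np.random.default_rng(3)
n = 200
x = rng.uniform(1, 10, n)
y = 1.0 + 0.5 * x + rng.normal(0, 0.3 * x)     # error std proportional to x

X = sm.add_constant(x)
res = sm.OLS(y, X).fit()
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(res.resid, X)
print(f"Breusch-Pagan LM = {lm_stat:.2f}, p-value = {lm_pvalue:.4f}")

# A simple WLS correction: weight each observation by 1/variance (here 1/x^2).
wls_res = sm.WLS(y, X, weights=1.0 / x**2).fit()
print(wls_res.params)
```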

A4 Violation -- Autocorrelation
- Happens mostly with time-series data, sometimes with cross-sectional data.
- OLS will lead to inefficient but still unbiased estimation.
- Can be corrected by the generalized least squares (GLS) method.
- Detection: Durbin-Watson (DW) test, runs test. (With a lagged dependent variable, DW tends to be near 2 even when serial correlation is present; do not use the DW test, use Durbin's h test or a t test instead. See the sketch below.)
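A sketch of detection with the Durbin-Watson statistic from statsmodels, on simulated data with AR(1) errors (the regression and the value of rho are illustrative assumptions); a DW value well below 2 signals positive autocorrelation.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

# Sketch of an autocorrelation check: simulate a regression whose errors follow
# an AR(1) process, fit OLS, and compute the Durbin-Watson statistic
# (DW near 2 suggests no first-order serial correlation; DW << 2 suggests
# positive autocorrelation).
rng = np.random.default_rng(4)
n = 200
x = np.linspace(0, 10, n)
e = np.zeros(n)
for t in range(1, n):                          # AR(1) errors with rho = 0.7
    e[t] = 0.7 * e[t - 1] + rng.normal(0, 1)
y = 1.0 + 0.5 * x + e

res = sm.OLS(y, sm.add_constant(x)).fit()
print("Durbin-Watson:", durbin_watson(res.resid))
```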

A5 Violation -- Non-normality
- The chi-square, t, and F tests are not strictly valid; however, these tests remain valid asymptotically (for large samples).
- Detection: Shapiro-Wilk test, Anderson-Darling test, Jarque-Bera (JB) test (see the sketch below).
- JB = (n/6)[S² + (K - 3)²/4], where n = sample size, K = kurtosis, and S = skewness (for a normal distribution, K = 3 and S = 0). JB follows a chi-square distribution with 2 d.f.
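A sketch of the JB statistic computed directly from the formula above on simulated heavy-tailed residuals, and cross-checked against SciPy's built-in jarque_bera; the data are illustrative only.

```python
import numpy as np
from scipy import stats

# Sketch of the Jarque-Bera normality test, computed from the formula above
# and checked against scipy's built-in version.  Data are simulated residuals.
rng = np.random.default_rng(5)
resid = rng.standard_t(df=4, size=500)         # heavy-tailed, so non-normal

n = len(resid)
S = stats.skew(resid)
K = stats.kurtosis(resid, fisher=False)        # "raw" kurtosis; 3 under normality
jb = (n / 6) * (S**2 + (K - 3) ** 2 / 4)
p_value = 1 - stats.chi2.cdf(jb, df=2)
print(f"JB = {jb:.2f}, p-value = {p_value:.4f}")
print(stats.jarque_bera(resid))                # should closely match
```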

Forecasting
- Ex post vs. ex ante forecasts
- Unconditional forecasting
- Conditional forecasting
- Evaluation of ex post forecast errors (see the sketch below):
  - means: root-mean-square error, root-mean-square percent error, mean error, mean percent error, mean absolute error, mean absolute percent error, Theil's inequality coefficient
  - variances: Akaike information criterion (AIC), Schwarz information criterion (SIC)
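A sketch of some of these ex post measures in NumPy; the `actual` and `forecast` arrays are made-up numbers, and Theil's inequality coefficient is computed in the Pindyck-Rubinfeld form (RMSE scaled by the root-mean-square levels of the two series).

```python
import numpy as np

# Sketch of common ex post forecast evaluation measures, assuming `actual`
# holds realized values and `forecast` the model's predictions for the
# same periods (values below are illustrative only).
actual = np.array([102.0, 108.0, 115.0, 120.0, 131.0])
forecast = np.array([100.0, 110.0, 113.0, 124.0, 128.0])
err = forecast - actual

rmse = np.sqrt(np.mean(err**2))                         # root-mean-square error
mae = np.mean(np.abs(err))                              # mean absolute error
mape = np.mean(np.abs(err / actual)) * 100              # mean absolute percent error
theil_u = rmse / (np.sqrt(np.mean(forecast**2)) + np.sqrt(np.mean(actual**2)))
print(f"RMSE={rmse:.2f}  MAE={mae:.2f}  MAPE={mape:.2f}%  Theil U={theil_u:.3f}")
```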

Simultaneous-equations Regression Models


- Simultaneous-equation models
- Seemingly unrelated equation models
- Identification problem

Simultaneous-equations Models
- Endogenous variables appear on both sides of the equations
- Structural model vs. reduced-form model
- OLS will lead to biased and inconsistent estimation; the indirect least squares (ILS) method can be used to obtain consistent estimates
- The three-stage least squares (3SLS) method also yields consistent estimates; 3SLS often performs better than 2SLS in terms of estimation efficiency (see the sketch below)
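A minimal sketch of 2SLS on simulated data with one endogenous regressor and one instrument (all names and parameter values are hypothetical), compared with plain OLS, which is biased here.

```python
import numpy as np

# Minimal sketch of two-stage least squares (2SLS) on simulated data with one
# endogenous regressor y2 and one instrument z (names are illustrative).
rng = np.random.default_rng(6)
n = 500
z = rng.normal(size=n)                          # instrument: correlated with y2, not with u
u = rng.normal(size=n)
y2 = 1.0 + 0.8 * z + 0.5 * u + rng.normal(size=n)     # endogenous: depends on u
y1 = 2.0 + 1.5 * y2 + u                               # structural equation of interest

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Stage 1: regress the endogenous regressor on the instruments (plus constant).
Z = np.column_stack([np.ones(n), z])
y2_hat = Z @ ols(Z, y2)

# Stage 2: replace y2 by its fitted values in the structural equation.
X2 = np.column_stack([np.ones(n), y2_hat])
print("2SLS estimates (intercept, slope):", ols(X2, y1))
print("OLS estimates  (intercept, slope):", ols(np.column_stack([np.ones(n), y2]), y1))
```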

Seemingly Unrelated Equation Models


- Endogenous variables appear only on the left-hand side of the equations
- OLS usually results in unbiased but inefficient estimation
- The generalized least squares (GLS) method is used to improve efficiency (Zellner method)

Identification Problem
- Unidentified vs. identified (exactly identified or over-identified)
- Order condition
- Rank condition

Time-series Models
- Time-series data
- Univariate time series models
- Box-Jenkins modeling approach
- Transfer function models

Time-series Data
- Yt: a sequence of data observed at equally spaced time intervals
- Stationary vs. non-stationary time series (see the sketch below)
- Homogeneous vs. non-homogeneous time series
- Seasonal vs. non-seasonal time series
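The augmented Dickey-Fuller test is not listed on the slide, but it is one standard way to check stationarity; the sketch below (using statsmodels, on simulated series) contrasts a stationary AR(1) with a non-stationary random walk.

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

# Sketch of a stationarity check with the augmented Dickey-Fuller test on two
# simulated series: a stationary AR(1) and a non-stationary random walk.
rng = np.random.default_rng(7)
n = 300
shocks = rng.normal(size=n)

ar1 = np.zeros(n)                               # stationary: y_t = 0.5 y_{t-1} + e_t
walk = np.zeros(n)                              # non-stationary: y_t = y_{t-1} + e_t
for t in range(1, n):
    ar1[t] = 0.5 * ar1[t - 1] + shocks[t]
    walk[t] = walk[t - 1] + shocks[t]

for name, series in [("AR(1)", ar1), ("random walk", walk)]:
    stat, pvalue = adfuller(series)[:2]
    print(f"{name}: ADF = {stat:.2f}, p-value = {pvalue:.4f}")
```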

Univariate Time Series Models


- Types of models: white noise, autoregressive (AR), moving-average (MA), autoregressive-moving-average (ARMA), integrated autoregressive-moving-average (ARIMA), and seasonal ARIMA models
- Model identification: MA(q) → the sample autocorrelation function (ACF) cuts off; AR(p) → the sample partial autocorrelation function (PACF) cuts off; ARMA(p, q) → both the ACF and PACF die out (see the sketch below)
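A sketch of identification from the sample ACF and PACF using statsmodels, on a simulated AR(2) series (the AR coefficients are illustrative assumptions): the PACF should cut off after lag 2 while the ACF dies out gradually.

```python
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.tsa.stattools import acf, pacf

# Sketch of model identification from the sample ACF/PACF: for a simulated
# AR(2) process, the PACF should cut off after lag 2 while the ACF dies out.
rng = np.random.default_rng(8)
ar = np.array([1, -0.5, -0.25])                 # AR polynomial: 1 - 0.5L - 0.25L^2
ma = np.array([1])
y = ArmaProcess(ar, ma).generate_sample(nsample=500, distrvs=rng.standard_normal)

print("sample ACF :", np.round(acf(y, nlags=5), 2))
print("sample PACF:", np.round(pacf(y, nlags=5), 2))
```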

Box-Jenkins Modeling Approach


- Tentative model identification (p, q): extended sample autocorrelation function (EACF)
- Estimation: maximum likelihood estimation (conditional or exact)
- Diagnostic checking: t tests, R², Q tests, sample ACF of residuals, residual plots, outlier analysis (see the sketch below)
- Application: using minimum mean-squared-error forecasts
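A sketch of the estimation and diagnostic-checking steps with statsmodels on a simulated AR(1) series (the model order and coefficient are assumptions for illustration): fit by maximum likelihood, apply the Ljung-Box Q test to the residuals, and produce minimum-MSE forecasts.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.stats.diagnostic import acorr_ljungbox

# Sketch of the estimation and diagnostic-checking steps on a simulated
# AR(1) series: fit an ARIMA(1,0,0) by maximum likelihood, inspect the
# coefficients, and apply the Ljung-Box Q test to the residuals.
rng = np.random.default_rng(9)
n = 400
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.7 * y[t - 1] + rng.normal()

res = ARIMA(y, order=(1, 0, 0)).fit()
print(res.params)                               # constant and AR(1) coefficient
print(acorr_ljungbox(res.resid, lags=[10]))     # Q test: residuals ~ white noise?
print(res.forecast(steps=5))                    # minimum-MSE forecasts
```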

Transfer Function Models


- Single-input (X) vs. multiple-input (Xs) models
- Linear transfer function (LTF) vs. rational transfer function (RTF) models
- Model identification: variables to be used; b, s, r for each input variable (using the corner table method); ARMA model for the noise
- Model estimation: maximum likelihood estimation (conditional or exact)
- Diagnostic checking: cross-correlation function (CCF) (see the sketch below)
- Forecasting: simultaneous forecasting
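A sketch of CCF-based identification with statsmodels on simulated data where the output responds to the input with a two-period delay (the lag and coefficients are made up for illustration); the CCF should peak near lag 2, suggesting a delay parameter b = 2.

```python
import numpy as np
from statsmodels.tsa.stattools import ccf

# Sketch of using the cross-correlation function (CCF) between an input X
# and output Y to look for a transfer-function relationship; here Y responds
# to X with a delay of 2 periods, so the CCF should peak around lag 2.
rng = np.random.default_rng(10)
n = 500
x = rng.normal(size=n)
y = np.zeros(n)
y[2:] = 0.8 * x[:-2] + 0.2 * rng.normal(size=n - 2)

print(np.round(ccf(y, x)[:6], 2))               # cross correlations at lags 0..5
```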

Simultaneous Transfer Function (STF) Models


- Purposes: to facilitate forecasting and structural analysis of a system, and to improve forecast accuracy
- Yt and Xt can both be endogenous variables in the system
- Use the LTF method for model identification, FIML for estimation, cross-correlation matrices (CCM) for diagnostic checking, and simultaneous forecasting

Transfer Function Models with Interventions or Outliers


- Additive outlier (AO)
- Level shift (LS)
- Temporary change (TC)
- Innovational outlier (IO)
- Intervention models
