Using Excel

For Principles of Econometrics, Fourth Edition


GENEVIEVE BRIAND
Washington State University

R. CARTER HILL
Louisiana State University

JOHN WILEY & SONS, INC


New York | Chichester | Weinheim | Brisbane | Singapore | Toronto

Genevieve Briand dedicates this work to Tom Trulove

Carter Hill dedicates this work to Todd and Peter

This book was set by the authors.

To order books or for customer service call 1-800-CALL-WILEY (225-5945)

Copyright © 2010, 2011 John Wiley & Sons, Inc. All rights reserved. No part of this
publication may be reproduced, stored in a retrieval system or transmitted in any form or
by any means, electronic, mechanical, photocopying, recording, scanning or otherwise,
except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act,
without either the prior written permission of the Publisher, or authorization through
payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222
Rosewood Drive, Danvers, MA 01923, website www.copyright.com. Requests to the
Publisher for permission should be addressed to the Permissions Department, John Wiley
& Sons, Inc., 111 River Street, Hoboken, NJ 07030-5774, (201) 748-6011, fax (201) 748-6008,
website http://www.wiley.com/go/permissions.

ISBN-13 978-111-803210-7

Printed in the United States of America

10 9 8 7 6 5 4 3 2 1

Preface

This book is a supplement to Principles of Econometrics, 4th Edition by R. Carter Hill, William E.
Griffiths and Guay C. Lim (Wiley, 2011). This book is not a substitute for the textbook, nor is it a
stand-alone computer manual. It is a companion to the textbook, showing how to perform the
examples in the textbook using Excel 2007. This book will be useful to students taking
econometrics, as well as their instructors, and others who wish to use Excel for econometric
analysis.

In addition to this computer manual for Excel, there are similar manuals and support for the
software packages EViews, Gretl, Shazam, and Stata. Also, all the data for Principles of
Econometrics, 4e, in various formats, including Excel, are available at
http://www.wiley.com/college/hill. Individual data files, as well as errata for this manual and the
textbook, can also be found at http://principlesofeconometrics.com.

The chapters in this book parallel the chapters in Principles of Econometrics, 4e. Thus, if you
seek help for the examples in Chapter 11 of the textbook, check Chapter 11 in this book.
However, within a chapter, the section numbers in Principles of Econometrics, 4e, do not
necessarily correspond to the section numbers in this Excel manual.

This work is a revision of Using Excel 2007 for Principles of Econometrics, 3rd Edition by
Genevieve Briand and R. Carter Hill (Wiley, 2010). Genevieve Briand is the corresponding
author.

We welcome comments on this book and suggestions for improvement.*

Genevieve Briand
School of Economic Sciences
Washington State University
Pullman, WA 99164
gbriand@wsu.edu

R. Carter Hill
Economics Department
Louisiana State University
Baton Rouge, LA 70803
eohill@lsu.edu

* Microsoft product screen shot(s) reprinted with permission from Microsoft Corporation. Our use does not directly or indirectly imply
Microsoft sponsorship, affiliation, or endorsement.

BRIEF CONTENTS

1. Introduction to Excel

2. The Simple Linear Regression Model

3. Interval Estimation and Hypothesis Testing

4. Prediction, Goodness-of-Fit and Modeling Issues

5. The Multiple Linear Regression

6. Further Inference in the Multiple Regression Model

7. Using Indicator Variables

8. Heteroskedasticity

9. Regression with Time Series Data: Stationary Variables

10. Random Regressors and Moment-Based Estimation

11. Simultaneous Equations Models

12. Nonstationary Time-Series Data and Cointegration

13. Vector Error Correction and Vector Autoregressive Models

14. Time-Varying Volatility and ARCH Models

15. Panel Data Models

16. Qualitative and Limited Dependent Variable Models

A. Mathematical Tools

B. Review of Probability Concepts

C. Review of Statistical Inference

Index

CONTENTS

CHAPTER 1 Introduction to Excel
  1.1 Starting Excel
  1.2 Entering Data
  1.3 Using Excel for Calculations
    1.3.1 Arithmetic Operations
    1.3.2 Mathematical Functions
  1.4 Editing your Data
  1.5 Saving and Printing your Data
  1.6 Importing Data into Excel
    1.6.1 Resources for Economists on the Internet
    1.6.2 Data Files for Principles of Econometrics
      1.6.2a John Wiley & Sons Website
      1.6.2b Principles of Econometrics Website
    1.6.3 Importing ASCII Files

CHAPTER 2 The Simple Linear Regression Model
  2.1 Plotting the Food Expenditure Data
    2.1.1 Using Chart Tools
    2.1.2 Editing the Graph
      2.1.2a Editing the Vertical Axis
      2.1.2b Axis Titles
      2.1.2c Gridlines and Markers
      2.1.2d Moving the Chart
  2.2 Estimating a Simple Regression
    2.2.1 Using Least Squares Estimators' Formulas
    2.2.2 Using Excel Regression Analysis Routine
  2.3 Plotting a Simple Regression
    2.3.1 Using Two Points
    2.3.2 Using Excel Built-in Feature
    2.3.3 Using a Regression Option
    2.3.4 Editing the Chart
  2.4 Expected Values of b1 and b2
    2.4.1 Model Assumptions
    2.4.2 Random Number Generation
    2.4.3 The LINEST Function
    2.4.4 Repeated Sampling
  2.5 Variance and Covariance of b1 and b2
  2.6 Nonlinear Relationships
    2.6.1 A Quadratic Model
      2.6.1a Estimating the Model
      2.6.1b Scatter Plot of Data with Fitted Quadratic Relationship
    2.6.2 A Log-Linear Model
      2.6.2a Histograms of PRICE and ln(PRICE)
      2.6.2b Estimating the Model
      2.6.2c Scatter Plot of Data with Fitted Log-Linear Relationship
  2.7 Regression with Indicator Variables
    2.7.1 Histograms of House Prices
    2.7.2 Estimating the Model

CHAPTER 3 Interval Estimation and Hypothesis Testing
  3.1 Interval Estimation
    3.1.1 The t-Distribution
      3.1.1a The t-Distribution versus Normal Distribution
      3.1.1b t-Critical Values and Interval Estimates
      3.1.1c Percentile Values
      3.1.1d TINV Function
      3.1.1e Appendix E: Table 2 in POE
    3.1.2 Obtaining Interval Estimates
    3.1.3 An Illustration
      3.1.3a Using the Interval Estimator Formula
      3.1.3b Excel Regression Default Output
      3.1.3c Excel Regression Confidence Level Option
    3.1.4 The Repeated Sampling Context (Advanced Material)
      3.1.4a Model Assumptions
      3.1.4b Repeated Random Sampling
      3.1.4c The LINEST Function Revisited
      3.1.4d The Simulation Template
      3.1.4e The IF Function
      3.1.4f The OR Function
      3.1.4g The COUNTIF Function
  3.2 Hypothesis Tests
    3.2.1 One-Tail Tests with Alternative "Greater Than" (>)
    3.2.2 One-Tail Tests with Alternative "Less Than" (<)
    3.2.3 Two-Tail Tests with Alternative "Not Equal To" (≠)
  3.3 Examples of Hypothesis Tests
    3.3.1 Right-Tail Tests
      3.3.1a One-Tail Test of Significance
      3.3.1b One-Tail Test of an Economic Hypothesis
    3.3.2 Left-Tail Tests
    3.3.3 Two-Tail Tests
      3.3.3a Two-Tail Test of an Economic Hypothesis
      3.3.3b Two-Tail Test of Significance
  3.4 The p-Value
    3.4.1 The p-Value Rule
      3.4.1a Definition of p-value
      3.4.1b Justification for the p-Value Rule
    3.4.2 The TDIST Function
    3.4.3 Examples of Hypothesis Tests Revisited
      3.4.3a Right-Tail Test from Section 3.3.1b
      3.4.3b Left-Tail Test from Section 3.3.2
      3.4.3c Two-Tail Test from Section 3.3.3a
      3.4.3d Two-Tail Test from Section 3.3.3b

CHAPTER 4 Prediction, Goodness-of-Fit and Modeling Issues
  4.1 Least Squares Prediction
  4.2 Measuring Goodness-of-Fit
    4.2.1 Coefficient of Determination or R²
    4.2.2 Correlation Analysis and R²
    4.2.3 The Food Expenditure Example and the CORREL Function
  4.3 The Effects of Scaling the Data
    4.3.1 Changing the Scale of x
    4.3.2 Changing the Scale of y
    4.3.3 Changing the Scale of x and y
  4.4 A Linear-Log Food Expenditure Model
    4.4.1 Estimating the Model
    4.4.2 Scatter Plot of Data with Fitted Linear-Log Relationship
  4.5 Using Diagnostic Residual Plots
    4.5.1 Random Residual Pattern
    4.5.2 Heteroskedastic Residual Pattern
    4.5.3 Detecting Model Specification Errors
  4.6 Are the Regression Errors Normally Distributed?
    4.6.1 Histogram of the Residuals
    4.6.2 The Jarque-Bera Test for Normality using the CHIINV and CHIDIST Functions
    4.6.3 The Jarque-Bera Test for Normality for the Linear-Log Food Expenditure Model
  4.7 Polynomial Models: An Empirical Example
    4.7.1 Scatter Plot of Wheat Yield over Time
    4.7.2 The Linear Equation Model
      4.7.2a Estimating the Model
      4.7.2b Residuals Plot
    4.7.3 The Cubic Equation Model
      4.7.3a Estimating the Model
      4.7.3b Residuals Plot
  4.8 Log-Linear Models
    4.8.1 A Growth Model
    4.8.2 A Wage Equation
    4.8.3 Prediction
    4.8.4 A Generalized R² Measure
    4.8.5 Prediction Intervals
  4.9 A Log-Log Model: Poultry Demand Equation
    4.9.1 Estimating the Model
    4.9.2 A Generalized R² Measure
    4.9.3 Scatter Plot of Data with Fitted Log-Log Relationship

CHAPTER 5 The Multiple Linear Regression
  5.1 Least Squares Estimates Using the Hamburger Chain Data
  5.2 Interval Estimation
  5.3 Hypothesis Tests for a Single Coefficient
    5.3.1 Tests of Significance
    5.3.2 One-Tail Tests
      5.3.2a Left-Tail Test of Elastic Demand
      5.3.2b Right-Tail Test of Advertising Effectiveness
  5.4 Polynomial Equations: Extending the Model for Burger Barn Sales
  5.5 Interaction Variables
    5.5.1 Linear Models
    5.5.2 Log-Linear Models
  5.6 Measuring Goodness-of-Fit

CHAPTER 6 Further Inference in the Multiple Regression Model
  6.1 Testing the Effect of Advertising: the F-test
    6.1.1 The Logic of the Test
    6.1.2 The Unrestricted and Restricted Models
    6.1.3 Test Template
  6.2 Testing the Significance of the Model
    6.2.1 Null and Alternative Hypotheses
    6.2.2 Test Template
    6.2.3 Excel Regression Output
  6.3 The Relationship between t- and F-Tests
  6.4 Testing Some Economic Hypotheses
    6.4.1 The Optimal Level of Advertising
    6.4.2 The Optimal Level of Advertising and Price
  6.5 The Use of Nonsample Information
  6.6 Model Specification
    6.6.1 Omitted Variables
    6.6.2 Irrelevant Variables
    6.6.3 The RESET Test
  6.7 Poor Data, Collinearity and Insignificance
    6.7.1 Correlation Matrix
    6.7.2 The Car Mileage Model Example

CHAPTER 7 Using Indicator Variables
  7.1 Indicator Variables: The University Effect on House Prices Example
  7.2 Applying Indicator Variables
    7.2.1 Interactions Between Qualitative Factors
    7.2.2 Qualitative Factors with Several Categories
    7.2.3 Testing the Equivalence of Two Regressions
  7.3 Log-Linear Models: a Wage Equation Example
  7.4 The Linear Probability Model: A Marketing Example
  7.5 The Difference Estimator: The Project STAR Example
  7.6 The Differences-in-Differences Estimator: The Effect of Minimum Wage Change Example

CHAPTER 8 Heteroskedasticity
  8.1 The Nature of Heteroskedasticity
  8.2 Detecting Heteroskedasticity
    8.2.1 Residual Plots
    8.2.2 Lagrange Multiplier Tests
      8.2.2a Using the Lagrange Multiplier or Breusch-Pagan Test
      8.2.2b Using the White Test
    8.2.3 The Goldfeld-Quandt Test
      8.2.3a The Logic of the Test
      8.2.3b Test Template
      8.2.3c Wage Equation Example
      8.2.3d Food Expenditure Example
  8.3 Heteroskedasticity-Consistent Standard Errors or the White Standard Errors
  8.4 Generalized Least Squares: Known Form of Variance
    8.4.1 Variance Proportional to x: Food Expenditure Example
    8.4.2 Grouped Data: Wage Equation Example
      8.4.2a Separate Wage Equations for Metropolitan and Rural Areas
      8.4.2b GLS Wage Equation
  8.5 Generalized Least Squares: Unknown Form of Variance

CHAPTER 9 Regressions with Time Series Data: Stationary Variables
  9.1 Finite Distributed Lags
    9.1.1 US Economic Time Series
    9.1.2 An Example: The Okun's Law
  9.2 Serial Correlation
    9.2.1 Serial Correlation in Output Growth
      9.2.1a Scatter Diagram for Gt and Gt-1
      9.2.1b Correlogram for G
    9.2.2 Serially Correlated Errors
      9.2.2a Australian Economic Time Series
      9.2.2b A Phillips Curve
      9.2.2c Correlogram for Residuals
  9.3 Lagrange Multiplier Tests for Serially Correlated Errors
    9.3.1 t-Test Version
    9.3.2 T × R² Version
  9.4 Estimation with Serially Correlated Errors
    9.4.1 Generalized Least Squares Estimation of an AR(1) Error Model
      9.4.1a The Prais-Winsten Estimator
      9.4.1b The Cochrane-Orcutt Estimator
    9.4.2 Autoregressive Distributed Lag (ARDL) Model
  9.5 Forecasting
    9.5.1 Using an Autoregressive (AR) Model
    9.5.2 Using an Exponential Smoothing Model
  9.6 Multiplier Analysis

CHAPTER 10 Random Regressors and Moment-Based Estimation
  10.1 OLS Estimation of a Wage Equation
  10.2 Instrumental Variables Estimation of the Wage Equation
    10.2.1 With a Single Instrument
      10.2.1a First Stage Equation for EDUC
      10.2.1b Stage 2 Least Squares Estimates
    10.2.2 With a Surplus Instrument
      10.2.2a First Stage Equation for EDUC
      10.2.2b Stage 2 Least Squares Estimates
  10.3 Specification Tests for the Wage Equation
    10.3.1 The Hausman Test
    10.3.2 Testing Surplus Moment Conditions

CHAPTER 11 Simultaneous Equations Models
  11.1 Supply and Demand Model for Truffles
    11.1.1 The Reduced Form Equations
      11.1.1a Reduced Form Equation for Q
      11.1.1b Reduced Form Equation for P
    11.1.2 The Structural Equations or Stage 2 Least Squares Estimates
      11.1.2a 2SLS Estimates for Truffle Demand
      11.1.2b 2SLS Estimates for Truffle Supply
  11.2 Supply and Demand Model for the Fulton Fish Market
    11.2.1 The Reduced Form Equations
      11.2.1a Reduced Form Equation for lnQ
      11.2.1b Reduced Form Equation for lnP
    11.2.2 The Structural Equations or Stage 2 Least Squares Estimates
      11.2.2a 2SLS Estimates for Fulton Fish Demand

CHAPTER 12 Nonstationary Time-Series Data and Cointegration
  12.1 Stationary and Nonstationary Variables
    12.1.1 US Economic Time Series
    12.1.2 Simulated Data
  12.2 Spurious Regressions
  12.3 Unit Root Tests for Stationarity
  12.4 Cointegration

CHAPTER 13 Vector Error Correction and Vector Autoregressive Models
  13.1 Estimating a VEC Model
    13.1.1 Test for Cointegration
    13.1.2 The VEC Model
  13.2 Estimating a VAR Model
    13.2.1 Test for Cointegration
    13.2.2 The VAR Model
  13.3 Impulse Responses Functions
    13.3.1 The Univariate Case
    13.3.2 The Bivariate Case

CHAPTER 14 Time-Varying Volatility and ARCH Models
  14.1 Time-Varying Volatility
    14.1.1 Returns Data
    14.1.2 Simulated Data
  14.2 Testing and Forecasting
    14.2.1 Testing for ARCH Effects
      14.2.1a Time Series and Histogram
      14.2.1b Lagrange Multiplier Test
    14.2.2 Forecasting Volatility
  14.3 Extensions
    14.3.1 The GARCH Model
    14.3.2 The T-GARCH Model
    14.3.3 The GARCH-In-Mean Model

CHAPTER 15 Panel Data Models
  15.1 Pooled Least Squares Estimates of Wage Equation
  15.2 The Fixed Effects Model
    15.2.1 Estimates of Wage Equation for Small N
      15.2.1a The Least Squares Dummy Variable Estimator for Small N
      15.2.1b The Fixed Effects Estimator: Estimates of Wage Equation for N=10
    15.2.2 Fixed Effects Estimates of Wage Equation from Complete Panel
  15.3 The Random Effects Model
    15.3.1 Testing for Random Effects
    15.3.2 Random Effects Estimation of the Wage Equation
  15.4 Sets of Regression Equations
    15.4.1 Estimation: Equal Coefficients, Equal Error Variances
    15.4.2 Estimation: Different Coefficients, Equal Error Variances
    15.4.3 Estimation: Different Coefficients, Different Error Variances
    15.4.4 Seemingly Unrelated Regressions: Testing for Contemporaneous Correlation

CHAPTER 16 Qualitative and Limited Dependent Variable Models
  16.1 Least Squares Fitted Linear Probability Model
  16.2 Limited Dependent Variables
    16.2.1 Censored Data
    16.2.2 Simulated Data

APPENDIX A Mathematical Tools
  A.1 Mathematical Operations
    A.1.1 Exponents
    A.1.2 Scientific Notation
    A.1.3 Logarithm and the Number e
  A.2 Percentages

APPENDIX B Review of Probability Concepts
  B.1 Binomial Probabilities
    B.1.1 Computing Binomial Probabilities Directly
    B.1.2 Computing Binomial Probabilities Using BINOMDIST
  B.2 The Normal Distributions
    B.2.1 The STANDARDIZE Function
    B.2.2 The NORMSDIST Function
    B.2.3 The NORMSINV Function
    B.2.4 The NORMDIST Function
    B.2.5 The NORMINV Function
    B.2.6 A Template for Normal Distribution Probability Calculations
  B.3 Distributions Related to the Normal
    B.3.1 The Chi-Square Distribution
    B.3.2 The t-Distribution
    B.3.3 The F-Distribution

APPENDIX C Review of Statistical Inference
  C.1 Examining a Sample of Data
  C.2 Estimating Population Parameters
    C.2.1 Creating Random Samples
    C.2.2 Estimating a Population Mean
    C.2.3 Estimating a Population Variance
    C.2.4 Standard Error of the Sample Mean
  C.3 The Central Limit Theorem
  C.4 Interval Estimation
    C.4.1 Interval Estimation with σ² Unknown
    C.4.2 Interval Estimation with the Hip Data
  C.5 Hypothesis Tests About a Population Mean
    C.5.1 An Example
    C.5.2 The p-value
    C.5.3 A Template for Hypothesis Tests
  C.6 Other Useful Tests
    C.6.1 Simulating Data
    C.6.2 Testing a Population Variance
    C.6.3 Testing Two Population Means
    C.6.4 Testing Two Population Variances
  C.7 Testing Population Normality
    C.7.1 A Histogram
    C.7.2 The Jarque-Bera Test

Index

CHAPTER 1

Introduction to Excel

CHAPTER OUTLINE
1.1 Starting Excel
1.2 Entering Data
1.3 Using Excel for Calculations
  1.3.1 Arithmetic Operations
  1.3.2 Mathematical Functions
1.4 Editing your Data
1.5 Saving and Printing your Data
1.6 Importing Data into Excel
  1.6.1 Resources for Economists on the Internet
  1.6.2 Data Files for Principles of Econometrics
    1.6.2a John Wiley & Sons Website
    1.6.2b Principles of Econometrics Website
  1.6.3 Importing ASCII Files

1.1 STARTING EXCEL

Find the Excel shortcut on your desktop and double-click it (with the left mouse button) to start Excel.

Alternatively, left-click the Start menu at the bottom left corner of your computer screen.

Slide your mouse over All Programs, Microsoft Office, and finally Microsoft Office Excel
2007. Left-click this last item to start Excel. Better yet, if you would like to create a
shortcut, right-click it, slide your mouse over Send To, and then select (i.e., move your mouse
over and left-click) Desktop (create shortcut). An Excel 2007 shortcut is created on your
desktop. If you right-click your shortcut and select Rename, you can also type in a shorter
name like Excel.


Excel opens to a new file, titled Book1. You can find the name of the open file at the very top of
the Excel window, on the Title bar. An Excel file like Book1 contains several sheets. By default,
Excel opens to Sheet1 of Book1. You can figure out which sheet is open by looking at the Sheet
tabs found in the lower left corner of your Excel window.

[Figure: the Excel 2007 window, with callouts for the title bar, formula bar, Help button, Styles group, cell reference, and a group of commands]

There are lots of little bits that you will become more familiar with as we go along. The Active
cell is surrounded by a border and is in Column A and Row 1; its Cell reference is A1.

Below the title bar is a Tab list. The Home tab is the one Excel opens to. Under each tab you
will find groups of commands. Under the Home tab, the first one is the Clipboard group of
commands, named after the tasks it relates to. The wide bar including the tab list and the groups
of commands is referred to as the Ribbon. The content of the Active cell shows up in the
Formula bar (right now, there is nothing in it). Perhaps the most important item to locate is the
Help button in the upper right corner of the Excel window. Finally, you can use the
Scroll bars and the arrows around them to navigate up and down and left and right in your worksheet.
And you have a long way to go: each worksheet in Microsoft Excel 2007 contains 1,048,576
rows and 16,384 columns!

Note that your Ribbon might look slightly different from the one shown above. If your screen is
bigger, Excel automatically displays more of its available options. For example, in the Styles
group of commands, instead of the Cell Styles button, you might see a colorful gallery of cell
styles.

1.2 ENTERING DATA

We will use Excel to analyze data. To enter labels and data into an Excel worksheet, move the
cursor to a cell and type. First type X in cell A1. Press the Enter key on your keyboard to get to
cell A2, or navigate by moving the cursor with the mouse or by using the Arrow keys (to move right,
left, up or down). Fill in the rest as shown below:

     A
1    X
2    1
3    2
4    3
5    4
6    5

1.3 USING EXCEL FOR CALCULATIONS

What is Excel good for? Its primary usefulness is to carry out repeated calculations. We can add,
subtract, multiply and divide; and we can apply mathematical and statistical functions to the data
in our worksheet. To illustrate, we are going to compute the squares of the numbers we just
entered and then add them up. There are two main ways to perform calculations in Excel. One is
to write formulas using arithmetic operators; the other is to write formulas using mathematical
functions.

1.3.1 Arithmetic Operations

Select the Excel Help button in the upper right corner of your screen. In the Excel Help
dialog box that pops up, type arithmetic operators and select Search. In the list of results,
select Calculation operators and precedence.

Standard arithmetic operators are defined as shown below. To close the Excel Help dialog box,
select the X button in its upper right corner.

Arithmetic operator    Meaning           Example
+ (plus sign)          Addition          3+3
- (minus sign)         Subtraction       3-1
                       Negation          -1
* (asterisk)           Multiplication    3*3
/ (forward slash)      Division          3/3
% (percent sign)       Percent           20%
^ (caret)              Exponentiation    3^2
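For readers who also work in a programming language, the operators in the table map almost one-to-one onto Python. The sketch below is our own illustration (Python is not part of the Excel workflow in this book): it evaluates each example from the table and highlights one precedence rule that often surprises people. Excel applies negation before exponentiation, so =-3^2 returns 9, whereas Python evaluates -3 ** 2 as -9.

```python
# Each Excel example from the operator table, evaluated with Python's operators.
examples = {
    "3+3": 3 + 3,     # + addition
    "3-1": 3 - 1,     # - subtraction
    "-1": -1,         # - negation
    "3*3": 3 * 3,     # * multiplication
    "3/3": 3 / 3,     # / division (Python returns the float 1.0)
    "20%": 20 / 100,  # % percent: Excel treats 20% as the value 0.2
    "3^2": 3 ** 2,    # ^ exponentiation is written ** in Python
}
for formula, value in examples.items():
    print(f"={formula:<4} -> {value}")

# Precedence differs between the two: Excel applies negation before ^,
# so =-3^2 returns 9; Python applies ** first, so -3 ** 2 is -9.
excel_result = (-3) ** 2
python_result = -3 ** 2
print(excel_result, python_result)
```

When in doubt in either tool, add parentheses, e.g. =(-3)^2 or =-(3^2), so the intended order is explicit.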

Place your cursor in cell B1, and type X-squared. In cells B2 through B6 below (henceforth
referred to as B2:B6), we are going to compute the squares of the corresponding values from cells
A2:A6. Let us emphasize that the trick to using Excel efficiently is NOT to re-type values already
stored in the worksheet, but instead to use references to the cells where the values are stored. So, to
compute the square of 1, which is the value stored in cell A2, instead of using the formula =1*1,
you should use the formula =A2*A2 or =A2^2. Place your cursor in cell B2 and type the formula.

[Screenshot: cell B2 containing the formula =A2^2, also shown in the Formula bar]

Then press Enter. Note that: (1) a formula always starts with an equal sign; this is how Excel
recognizes it is a formula, and (2) formulas are not case sensitive, so you could also have typed
=a2^2 instead. Now, we want to copy this formula to cells B3:B6. To do that, place your cursor
back into cell B2, and move it to the lower-right corner of the cell, until the fat cross turns into a
skinny one (the fill handle).

Left-click, hold it, drag it down to the next four cells below, and release!

Excel has copied the formula you typed in cell B2 into the cells below. The way Excel
understands the instructions you gave in cell B2 is "square the value found at the address A2".
Now, it is important to understand how Excel interprets "address A2". To Excel, "address A2"
means "from where you are, go left by one cell", because this is where A2 is located relative to
B2. In other words, an address gives directions (left-right, up-down) and distances (number of
cells away), all in reference to the cell where the formula is entered. So, when we copied the
formula we entered in cell B2, which instructed Excel to collect the value stored one cell to
its left and then square it, those exact same instructions were given in cells B3:B6. If you
place your cursor back into B3 and look at the Formula bar, you can see that, in this cell, these
same instructions translate into "=A3^2".

[Screenshot: cell B3 selected, with the Formula bar showing =A3^2]
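The relative-reference rule described above can be mimicked in a few lines of code. The Python sketch below uses hypothetical helpers of our own (col_to_num, num_to_col and shift_reference are not Excel features): shift_reference takes a simple A1-style reference such as A2 and moves it by the number of rows and columns the formula is copied, so A2 copied one row down becomes A3. It handles only single-cell references, not ranges.

```python
import re


def col_to_num(letters: str) -> int:
    """Convert a column label such as 'A', 'AA' or 'XFD' to its 1-based index."""
    n = 0
    for ch in letters.upper():
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n


def num_to_col(n: int) -> str:
    """Convert a 1-based column index back to its letter label (bijective base 26)."""
    letters = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def shift_reference(ref: str, d_rows: int, d_cols: int) -> str:
    """Move a relative A1-style reference by the offset the formula is copied."""
    match = re.fullmatch(r"([A-Za-z]+)([0-9]+)", ref)
    if match is None:
        raise ValueError(f"not a simple A1-style reference: {ref!r}")
    col = col_to_num(match.group(1)) + d_cols
    row = int(match.group(2)) + d_rows
    if col < 1 or row < 1:
        raise ValueError("shifted reference falls off the worksheet")
    return num_to_col(col) + str(row)


# =A2^2 typed in B2 and copied down to B3:B6: the reference moves one row per cell.
for step in range(1, 5):
    print(f"B{2 + step}: =" + shift_reference("A2", d_rows=step, d_cols=0) + "^2")
```

The same column conversion confirms the worksheet size quoted in Section 1.1: num_to_col(16384) returns XFD, the label of the last of the 16,384 columns in an Excel 2007 worksheet.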

1.3.2 Mathematical Functions

There are a large number of mathematical functions. Again, the list of functions available in
Excel can be found by calling on our good friend, the Help button, and typing Mathematical
functions. If you try it, you will see that the list is long. We will not copy it here.

We have computed the squares of the numbers. Now we will add them up: the numbers
and the squares of the numbers, separately. For that, we will use the SUM function.

We first need to select, or highlight, all the numbers in our table. There are several ways to
highlight cells. For this small area the easiest way is to place your cursor in A2, hold down the
left mouse button and drag it across the area you wish to highlight, i.e., all the way to cell B6.
Here is how your worksheet should look:

     A    B
1    X    X-squared
2    1    1
3    2    4
4    3    9
5    4    16
6    5    25

Next, go to the Editing group of commands, which is found at the far right of the Home tab,
and select Σ AutoSum.

Excel sums the numbers in each column and places the sum in the first empty cell below each column (row 7).
The result is:

     A    B
1    X    X-squared
2    1    1
3    2    4
4    3    9
5    4    16
6    5    25
7    15   55

Notice that if you select the arrow found to the right of Σ AutoSum, you can find a list of
additional calculations that Excel can automatically perform for you.

Alternatively, you could have placed your cursor in cell A7, typed =SUM(A2:A6), and pressed
the Enter key (and then copied this formula to cell B7).
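As a cross-check on the worksheet, the short Python sketch below (our illustration, not part of the Excel instructions) rebuilds columns A and B and computes the two sums that =SUM(A2:A6) and =SUM(B2:B6) return in row 7.

```python
# Column A (cells A2:A6) holds the values typed in Section 1.2.
x = [1, 2, 3, 4, 5]

# Column B (cells B2:B6): the formula =A2^2 copied down, one square per row.
x_squared = [value ** 2 for value in x]

# Row 7: the equivalents of =SUM(A2:A6) and =SUM(B2:B6).
sum_x = sum(x)
sum_x_squared = sum(x_squared)

for row, (a, b) in enumerate(zip(x, x_squared), start=2):
    print(f"row {row}: {a:>2} {b:>3}")
print(sum_x, sum_x_squared)  # 15 55
```

The two totals match the values AutoSum placed in cells A7 and B7.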


Note that: (1) as soon as you type the first letter of your function, a list of all the other available
functions that start with the same letter pops up. This can be very useful: if you left-click on any
of them, Excel gives you its definition; if you double-click on any of them, it automatically
finishes typing the function name for you, and (2) once the function name and the opening
parenthesis are typed, Excel reminds you of what the needed Arguments are, i.e., what else you
need to specify in your function to use it properly.

Now, you could also have used the Insert Function button, which you can find on the left side of
the Formula bar.

Once your cursor is placed in A7, select the Insert Function button. An Insert Function dialog
box pops up. You can select the function you need (highlight it, and select OK), or search for a
function first (follow the instructions given in that window).

[Screenshot: the Insert Function dialog box, with a Search for a function box, an Or select a category list (Most Recently Used), and a Select a function list]

In the Function Arguments dialog box that pops up, you need to specify the cell references of
the values you want to add. If they are not already properly specified, you can type A2:A6 in the
Number1 box, or place your cursor in the box, delete whatever is in it, and then select
A2:A6. Select OK. Now that you have the formula in A7, copy it into B7.

[Screenshot: the Function Arguments dialog box for SUM, with Number1 set to A2:A6]

1.4 EDITING YOUR DATA

Before wrapping up, you want to polish the presentation of your data. This actually has less to do
with appearance than with organization and communication. You want to make sure that anyone
can easily make sense of your table: your instructor, for example, or yourself, for that
matter, when you come back to it after letting it sit for a while.

We are going to add labels and color/shading to our table. Hold your cursor over the column A header until it turns
into a downward arrow; left-click to select the whole column; and select Insert in the Cells group of
commands, found to the left of the Editing group of commands.

Excel adds a new column to the left of the one you selected. That is where we are going to write
our labels. In the new A1 cell, type Variables; in cell A2, type Values; in cell A7, type Sum.

[Screenshot: the table before and after inserting the new column A, with the labels Variables (A1), Values (A2) and Sum (A7)]

Select column A again, make it Bold (Font group of commands, to the right of the Clipboard group), and
align it Left (Alignment group of commands, to the right of the Font group).

Select cells B1 and C1, and make them Bold. Repeat with cells B7 and C7. Better, but not there
yet. Select row 7 and make it Italic (next to Bold). Select column B, hold down the left mouse button and drag
your mouse over the column C header to select column C too; select Center alignment (next to Align Left). Next,
select A2:A6; left-click the arrow next to Merge & Center (in the Alignment group of
commands), and select Merge Cells.

Immediately after, select Middle Align, which is found right above the Center alignment button.

Select A1:C7, left-click the arrow next to the Bottom Border button and select All Borders.

Select A7:C7 (A7:C7, not A1:C7 this time), left-click the arrow next to the Fill Color button,
and select a grey color to fill the cells with. Choose a different color for A1:C1.

Finally, place your cursor on the boundary between the column C and column D headers until it
turns into a left-right arrow. Hold it there and double-click so that the width of column C gets
resized to better accommodate the length of the label "X-squared". The result is:

     A           B     C
1    Variables   X     X-squared
2    Values      1     1
3                2     4
4                3     9
5                4     16
6                5     25
7    Sum         15    55

(Cells A2:A6 form a single merged cell containing Values.)

Next, move your cursor over the Sheet1 tab, right-click, select Rename and type in a descriptive
name for your worksheet, like Excel for POE 1.2-1.4 (short for Using Excel for Principles of
Econometrics, 4e, Sections 1.2 through 1.4). Press the Enter key on your keyboard or left-click
anywhere on your worksheet.

1.5 SAVING AND PRINTING YOUR DATA

All you need to do now is save your Excel file. Select the Save button in the upper left corner
of the Excel window.

A Save As dialog box pops up. Locate the folder you want to save your file in by using the
arrow-down located at the extreme right of the Save in window or browsing through the list of
folders displayed below it.

In the File name box, at the bottom of the Save As dialog box, the generic name Book1
should be highlighted. Type the descriptive name you would like to give to your Excel file, like POE
Chapter 1. Finally, select Save.

[Screenshot: the File name box before (Book1) and after (POE Chapter 1), with Save as type set to Excel Workbook]

If you need to create a new folder, use the Create New Folder button found to the right of the
Save in window.

A New Folder dialog box pops up, prompting you for the name you want to give to your new
folder, Excel for POE for example. Type it in the Name box and select OK. Finally, select
Save.

If you would like to print your table, select the Office Button, next to the Save button; go to
Print, and select one of the print options.


For more print options, you might want to check out the Page Layout tab, on the upper left of
your screen, as well as the Page Layout button on the bottom right of your screen.


To close your file, select the X button in the upper right corner of your screen.


In the next section, we show you how to import data into an Excel spreadsheet. Getting data for
economic research is much easier today than it was years ago. Before the Internet, hours would be
spent in libraries, looking for and copying data by hand. Now we have access to rich data sources
that are just a few clicks away.

First, we illustrate how convenient websites that make data available in Excel format can be.
Then we illustrate how to import ASCII, or text, files into Excel.

1.6 IMPORTING DATA INTO EXCEL

1.6.1 Resources for Economists on the Internet

Suppose you are interested in analyzing the GDP of the United States. The website Resources for
Economists contains a wide variety of data, and in particular the macro data we seek. Websites
are continually updated and improved. We guide you through an example, but be prepared for
differences from what we show here.

First, open up the website http://rfe.org/.


Select the Data link and then select U.S. Macro and Regional Data.

This opens a range of data subcategories. For the example discussed here, select the
Bureau of Economic Analysis (BEA).


Finally, select Gross Domestic Product (GDP).


The result illustrates our point: many government and other websites make data available in
Excel format. Select Current-dollar and "real" GDP.


You have the option of saving the resulting Excel file, gdplev.xls, to your computer or storage
device, or opening it right away, which is what we do next.


What opens is a workbook with headers explaining the variables it contains. We see that there is
an annual series and a quarterly series.
[Worksheet: Current-Dollar and "Real" Gross Domestic Product. An Annual block (starting in 1929)
and a Quarterly block (seasonally adjusted annual rates, starting in 1947q1) each report GDP in
billions of current dollars and GDP in billions of chained 2005 dollars.]

The opened file is read-only, so you must save it under another name to work with it, graph it,
run regressions, and so on.

1.6.2 Data Files for Principles of Econometrics

The book Principles of Econometrics, 4e, uses many examples with data. These data files have
been saved as workbooks and are available for you to download to your computer. There are
about 150 such files. The data files and other supplementary materials can be downloaded from
two web locations: the publisher website or the book website maintained by the authors.

1.6.2a John Wiley and Sons Website

Using your web browser, enter the address www.wiley.com/college/hill. Among the books by
authors named "Hill", find Principles of Econometrics, 4e.


Follow the link to Resources for Students, and then Student Companion Site. There, you will
find links to supplementary materials, including a link to Data Files that will allow you to
download all the data definition files and data files at once.

1.6.2b Principles of Econometrics Website

The address for the book website is www.principlesofeconometrics.com. There, you will find
links to the Data definitions files, Excel spreadsheets, as well as an Errata list. You can download
the data definition files and the Excel files all at once or select individual files. The data definition
files contain variable names, variable definitions, and summary statistics. The Excel spreadsheets
contain data only; those files were created using Excel 2003.

1.6.3 Importing ASCII Files


Sometimes the data you want to use are provided in ASCII, or text, format. To illustrate, go to
http://principlesofeconometrics.com. There you will find that one of the formats in which we
provide data is ASCII, or text, files. These are used because they contain no formatting and can
be imported by almost any software package.


Select ASCII files and then go to the food data.



Right-click on the file name. Select Save Target As. A Save As dialog box pops up. Locate the
folder you want to save your file in by using the arrow-down located at the extreme right of the
Save in window or browsing through the list of folders displayed below it. Finally, select Save.

Once the download of the file is complete, a Download complete window pops up. Choose
Close.


Start Excel. Select the Office Button in the upper left corner of the Excel window, then Open.

Navigate to the location of the data file. Make sure you have selected All Files in the Files of
type window. Select your food.dat file and then select Open.

This starts a Windows "Wizard" that will take you through three steps to import the data into
Excel. Our ASCII data files are neatly lined up in columns, with no commas or anything else
separating the columns. Select Fixed width, and then Next.
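The Fixed width choice works because each record sits on one line with its values aligned in columns. As a rough analogue outside Excel, here is a Python sketch that reads such text by splitting each line on runs of spaces. The five records are the first rows of food.dat (food expenditure, then income); the exact spacing used in raw_text is an assumption for illustration.

```python
# First five records of food.dat: food expenditure, then income.
# The column spacing below is assumed for illustration.
raw_text = """\
    115.22     3.69
    135.98     4.39
    119.34     4.75
    114.96     6.03
    187.05    12.47
"""

def parse_whitespace_columns(text):
    """Split each non-empty line on runs of whitespace and convert fields to float."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if fields:  # skip blank lines
            rows.append([float(field) for field in fields])
    return rows

rows = parse_whitespace_columns(raw_text)
food_exp = [row[0] for row in rows]
income = [row[1] for row in rows]
print(food_exp)  # [115.22, 135.98, 119.34, 114.96, 187.05]
print(income)    # [3.69, 4.39, 4.75, 6.03, 12.47]
```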


In the next step the data are previewed. By dragging the vertical break lines you could adjust the
column boundaries, but most of the time there is no need. For neatly arrayed data like ours, Excel
can determine where the columns begin and end. Select Next again.
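Excel does not document exactly how it proposes the break lines. One plausible heuristic, offered here only as a hypothetical sketch, marks every character position that is non-blank on at least one line and treats each run of such positions as a column. The function names find_column_spans and split_fixed_width are our own, and the sample lines reuse the assumed spacing of the first five food.dat records.

```python
def find_column_spans(lines):
    """Return (start, end) index pairs for runs of character positions
    that are non-blank on at least one line (a hypothetical heuristic)."""
    width = max(len(line) for line in lines)
    occupied = [
        any(p < len(line) and not line[p].isspace() for line in lines)
        for p in range(width)
    ]
    spans = []
    start = None
    for p, used in enumerate(occupied):
        if used and start is None:
            start = p                   # a new column begins here
        elif not used and start is not None:
            spans.append((start, p))    # the column ended just before p
            start = None
    if start is not None:
        spans.append((start, width))    # the last column runs to the line end
    return spans

def split_fixed_width(line, spans):
    """Slice one line into fields using the detected column spans."""
    return [line[start:end].strip() for start, end in spans]

lines = [
    "    115.22     3.69",
    "    135.98     4.39",
    "    119.34     4.75",
    "    114.96     6.03",
    "    187.05    12.47",
]
spans = find_column_spans(lines)
print(spans)                               # [(4, 10), (14, 19)]
print(split_fixed_width(lines[4], spans))  # ['187.05', '12.47']
```

Right-aligned numbers of different lengths (3.69 versus 12.47) still fall in one span, because the span covers every position used by any line in that column.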

In the third and final step Excel permits you to format each column, or in fact to skip a column. In
our case you can simply select Finish.
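In Step 3, the General column format turns numeric-looking entries into numbers and leaves other entries as text (Excel's General format also recognizes dates, which this sketch ignores). The hypothetical helpers below, convert_general and convert_row, mimic that rule and the option to skip a column; they are an illustration, not Excel's implementation.

```python
def convert_general(field):
    """Simplified stand-in for the General column format:
    numeric-looking text becomes a number, anything else stays text.
    Note: float() also accepts strings such as "nan" and "inf"."""
    try:
        return float(field)
    except ValueError:
        return field

def convert_row(fields, skip=()):
    """Convert each field, dropping the column indices listed in skip,
    similar to choosing "Do not import column (skip)" for those columns."""
    return [convert_general(f) for i, f in enumerate(fields) if i not in skip]

print(convert_row(["187.05", "12.47"]))            # [187.05, 12.47]
print(convert_row(["food_exp", "12.47"]))          # ['food_exp', 12.47]
print(convert_row(["187.05", "12.47"], skip={0}))  # [12.47]
```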


This step concludes the process, and the data are now in a worksheet named food.

      A         B
 1    115.22    3.69
 2    135.98    4.39
 3    119.34    4.75
 4    114.96    6.03
 5    187.05    12.47
(first rows of the worksheet named food)

Next, you need to save your food data in an Excel File format. To do that, select the Office
Button, Save As, and finally Excel Workbook.


A Save As dialog box pops up. Locate the folder you want to save your file in by using the
arrow-down located at the extreme right of the Save in window or browsing through the list of
folders displayed below it.


Excel has automatically supplied the File name, food.xlsx, and specified the file format in the
Save as type window, Excel Workbook (*.xlsx). All you need to do is select Save.


From this point you are ready to analyze the data.

This completes our introductory chapter. The rest of this manual is designed to supplement your
readings of Principles of Econometrics, 4e. We will walk you through the analysis of examples
found in the text, using Excel 2007, and we aim to replicate most of the plots of data and tables
of results found in your text.
CHAPTER 2

The Simple Linear Regression Model

CHAPTER OUTLINE
2.1 Plotting the Food Expenditure Data
2.1.1 Using Chart Tools
2.1.2 Editing the Graph
2.1.2a Editing the Vertical Axis
2.1.2b Axis Titles
2.1.2c Gridlines and Markers
2.1.2d Moving the Chart
2.2 Estimating a Simple Regression
2.2.1 Using Least Squares Estimators' Formulas
2.2.2 Using Excel Regression Analysis Routine
2.3 Plotting a Simple Regression
2.3.1 Using Two Points
2.3.2 Using Excel Built-in Feature
2.3.3 Using a Regression Option
2.3.4 Editing the Chart
2.4 Expected Values of b1 and b2
2.4.1 Model Assumptions
2.4.2 Random Number Generation
2.4.3 The LINEST Function
2.4.4 Repeated Sampling
2.5 Variance and Covariance of b1 and b2
2.6 Nonlinear Relationships
2.6.1 A Quadratic Model
2.6.1a Estimating the Model
2.6.1b Scatter Plot of Data with Fitted Quadratic Relationship
2.6.2 A Log-Linear Model
2.6.2a Histograms of PRICE and ln(PRICE)
2.6.2b Estimating the Model
2.6.2c Scatter Plot of Data with Fitted Log-Linear Relationship
2.7 Regression with Indicator Variables
2.7.1 Histograms of House Prices
2.7.2 Estimating the Model

In this chapter we estimate a simple linear regression model of weekly food expenditure. We also
illustrate the concept of unbiased estimation. In the first section, we start by plotting the food
expenditure data.

2.1 PLOTTING THE FOOD EXPENDITURE DATA

Open the Excel file food. Save it as POE Chapter 2.

Compare the values you have in your worksheet to the ones found in Table 2.1, p. 49 of
Principles of Econometrics, 4e. The second part of Table 2.1 shows summary statistics. You can
compute and check those by using the Excel mathematical functions introduced in Chapter 1, if
you would like.
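The summary statistics in Table 2.1 are computed over all 40 households. The Python sketch below applies the same kinds of calculations to only the first five rows of the food data, so its numbers will not match the table; it simply shows which calculation each Excel worksheet function performs. The helper name summarize is our own, and the exact set of statistics reported in Table 2.1 should be checked against the textbook.

```python
import statistics

# First five households only (food_exp in $, income in $100); an illustrative excerpt.
food_exp = [115.22, 135.98, 119.34, 114.96, 187.05]
income = [3.69, 4.39, 4.75, 6.03, 12.47]

def summarize(values):
    """Return common summary statistics; comments name the matching Excel function."""
    return {
        "mean": statistics.mean(values),      # =AVERAGE(range)
        "median": statistics.median(values),  # =MEDIAN(range)
        "max": max(values),                   # =MAX(range)
        "min": min(values),                   # =MIN(range)
        "std_dev": statistics.stdev(values),  # =STDEV(range), uses the n - 1 divisor
    }

for name, column in [("food_exp", food_exp), ("income", income)]:
    stats = summarize(column)
    print(name, {key: round(value, 4) for key, value in stats.items()})
```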

Select the Insert tab located next to the Home tab. Select A2:B41. In the Charts group of
commands, select Scatter, and then Scatter with only Markers.

The result is:

[Chart: default Scatter chart of columns A and B, legend Series1; horizontal axis from 0 to 700,
vertical axis from 0 to 40.]

Each point on this Scatter chart illustrates one household for which we have recorded a pair of
values: weekly food expenditure and weekly income. This is very important. We chose Scatter
chart because we wanted to keep track of those pairs of values. For example, the point
highlighted below illustrates the pair of values (187.05, 12.47) found in row 6 of your table.

[Chart: the same Scatter chart with one point selected; its label reads Series 1 Point
"187.050003" (187.050003, 12.47).]

When we select two columns of values to plot on a Scatter chart, Excel, by default, represents
values from the first column on the horizontal axis and values from the second column on the
vertical axis. So, in this case, the expenditure values are shown on the horizontal axis and the
income values on the vertical axis. Indeed, you can see that the scale of the horizontal axis
matches the range of the food expenditure values in column A, and the scale of the vertical axis
matches the range of the income values in column B.

We actually would like to show the food expenditure values on the vertical axis and the
income values on the horizontal axis, the opposite of what we have now. By convention, across
disciplines, the variable whose level we monitor (the dependent variable) is shown on the
vertical axis (Y-variable). Likewise by convention, the variable that we think might explain the
level of the dependent variable is shown on the horizontal axis (X-variable).

In our case, we think that the variation of levels of income across households might explain the
variation of levels of food expenditure across those same households. That is why we would like
to illustrate the food expenditure values on the vertical axis and the income values on the
horizontal axis.
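The change we want is simply a reordering of each pair. In this plain-Python illustration (using three of the food data rows), each worksheet pair (food_exp, income) becomes an (x, y) point of the form (income, food_exp), which is what the Select Data steps below accomplish in Excel.

```python
# Pairs as they appear in the worksheet: (food_exp in column A, income in column B)
worksheet_pairs = [(115.22, 3.69), (135.98, 4.39), (187.05, 12.47)]

# Excel's default for a two-column Scatter chart: first column -> x, second -> y
default_points = [(a, b) for a, b in worksheet_pairs]

# What we want instead: income on the horizontal (x) axis,
# food expenditure on the vertical (y) axis
desired_points = [(income, food) for food, income in worksheet_pairs]

print(default_points[-1])  # (187.05, 12.47)
print(desired_points[-1])  # (12.47, 187.05)
```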


2.1.1 Using Chart Tools

If you look up on your screen, to the right end of your tab list, you should notice that Chart Tools
are now displayed, adding the Design, Layout, and Format tabs to the list. The Design tab is
open. (If, at any time, the Chart Tools and its tabs seem to disappear, all you need to do is to put
your cursor anywhere in your Chart area, left-click, and they will be made available again.)


Go to the Data group of commands, to the left, and select the Select Data button.


A Select Data Source dialog box pops up. Select Edit.


In the Edit Series dialog box, highlight the text from the Series X values window. Press the
Delete key on your keyboard. Select B2:B41. Highlight and delete the text from the Series Y
values window. Select A2:A41. Select OK.


The Select Data Source dialog box reappears. Select OK again. You have just told Excel that the
income values are the X-values and the food expenditure values are the Y-values, not the other
way around.

The result is:


[Chart: Scatter chart with income (0 to 40) on the horizontal axis and food expenditure (0 to 700)
on the vertical axis, legend Series1.]

2.1.2 Editing the Graph

Now, we would like to do some editing. We do not need a Legend, since we have only one data
series. Our expenditure values do not go over 600, so we can restrict our vertical axis scale to
that. We definitely would like to label our axes. We might want to get rid of our Gridlines, and
change the Format of our data series. Finally, we would like to move our chart to a new
worksheet.

Select the Layout tab. On the Labels group of commands, select Legend and None to delete the
legend.


2.1.2a Editing the Vertical Axis

Select the Axes button on the Axes group of commands. Go to Primary Vertical Axis, and select
More Primary Vertical Axis Options.

A Format Axis dialog box pops up. Change the Maximum value shown on the axis from
Auto to Fixed, and specify 600.


Next select Alignment, and use the arrow-down in the Text direction window to select Rotate
all text 270°.


Place your cursor on the upper blue border of your Format Axis dialog box.

Left-click, hold it, and drag the box over so you can see your chart; release. Look at the vertical
axis of your chart.

The numbers are now displayed vertically instead of horizontally, but fewer of them are
displayed as well.


We want to change that back.

Select Axis Options again. Change Major unit from Auto to Fixed, and specify 100. Select
Close.
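With Maximum fixed at 600, Major unit fixed at 100, and Minimum left on Auto (0 in this chart), the vertical axis shows seven labels. The hypothetical helper below only illustrates that arithmetic; it is not Excel's own axis logic.

```python
def axis_tick_labels(minimum, maximum, major_unit):
    """List the tick-mark values for an axis with fixed bounds and major unit.
    Each tick is computed as minimum + k * major_unit rather than by adding
    major_unit repeatedly, which avoids accumulating floating-point error."""
    n_ticks = int(round((maximum - minimum) / major_unit)) + 1
    return [minimum + k * major_unit for k in range(n_ticks)]

print(axis_tick_labels(0, 600, 100))  # [0, 100, 200, 300, 400, 500, 600]
```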


2.1.2b Axis Titles

Back to the Labels group of commands; select Axis Titles, go to Primary Horizontal Axis
Title, and select Title Below Axis.


Select the generic Axis Title at the bottom of your chart and type in x = weekly income in $100.


Go back to Axis Titles, then to Primary Vertical Axis Title this time. Select Rotated Title.


Select the generic Axis Title on the left of your chart and press Delete, or put your cursor on top
of the Axis Title box, left-click, and press the Backspace key to delete the generic Axis Title.
Type in y = weekly food expenditure in $.


2.1.2c Gridlines and Markers

Back to the Axes group of commands now. Select Gridlines. Go to Primary Horizontal
Gridlines, and select None.


Change the Current Selection (group of commands to the far left) to Series 1 (use the arrow
down button to the right of the window to make that selection). Select Format Selection.


A Format Data Series dialog box pops up. Select Marker Options. Change the Marker Type
from Automatic to Built-in, and then choose the marker Type and Size.


Next, select Marker Fill. Change it from Automatic to Solid fill. Color options pop up. Change
the Color to black. Select Marker Line Color, and change it from Automatic to No line. Select
Close.

The result is a replica of Figure 2.6, p. 50, in Principles of Econometrics, 4e. (If some of your
dots look like little flowers, left-click anywhere on your screen first.)

[Figure 2.6: scatter plot of the food expenditure data; vertical axis y = weekly food expenditure
in $ (0 to 600), horizontal axis x = weekly income in $100 (0 to 40).]
2.1.2d Moving the Chart

Go back to the Design tab. (Remember: if you don't see your Chart Tools tabs, place your cursor
in your chart area and left-click.) Select the Move Chart button in the Location group of
commands, to the far right of your screen.


A Move Chart dialog box pops up. Select New sheet and give it a name like Figure 2.6. Select
OK.


Rename the Sheet1 worksheet Data (if needed, see Section 1.4 of this manual on how to do that).

We have plotted our data and edited our chart. Next, we want to estimate the regression line that
best fits the data, and add this line to the chart.

2.2 ESTIMATING A SIMPLE REGRESSION

In this section, we use two different methods to obtain the least squares estimates of the
intercept and slope parameters $\beta_1$ and $\beta_2$. Method 1 consists of plugging values into
the least squares estimator formulas for $b_1$ and $b_2$. Method 2 consists of using Excel's
built-in regression analysis routine.

2.2.1 Using Least Squares Estimators' Formulas

The least squares estimators are:

$$b_2 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \qquad (2.1)$$

$$b_1 = \bar{y} - b_2 \bar{x} \qquad (2.2)$$

These formulas are telling us two things: (1) which values we need, and (2) how we need to
combine them to compute b1 and b2.

(1) Which values do we need?

We need the $(x_i, y_i)$ pairs of values; they appear explicitly in equation (2.1). We also need
$\bar{x}$ and $\bar{y}$, which are the sample means, or simple arithmetic averages, of the $x_i$
values and $y_i$ values; those averages appear in both equation (2.1) and equation (2.2). Note
that the subscript $i$ in $x_i$ and $y_i$ keeps count of the x and y values. In other words, $i$
denotes the ith value or ith pair of values. Also, $\bar{x}$ and $\bar{y}$ are referred to as
"x-bar" and "y-bar".

(2) How do we combine those values?

The numerator is a sum of products; $\sum$ is the Greek capital letter "sigma", which denotes a
sum. The first term of each product is the deviation of an x value from its mean,
$(x_i - \bar{x})$. The second term of each product is the deviation of the corresponding y value
from its mean, $(y_i - \bar{y})$. The products are computed for each $(x_i, y_i)$ pair of values
before they are added together.

The denominator is the sum of the squared deviations from the mean, for the x values only. In
other words, each x value's deviation from its mean is first squared, and then all those squared
deviations are summed.

Equation (2.2): $b_1 = \bar{y} - b_2 \bar{x}$

This equation tells us to multiply $b_2$ by $\bar{x}$, and then subtract this product from
$\bar{y}$. Note that $b_2$ must be computed first, before $b_1$ can be computed.
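The two equations translate directly into code. The Python sketch below follows the same order as the text: compute the sample means, then b2 from equation (2.1), then b1 from equation (2.2). The function name least_squares is our own, and the tiny data set is made up so that the answer is known in advance (the points lie exactly on y = 1 + 2x); it is not the food expenditure data.

```python
def least_squares(x, y):
    """Return (b1, b2) computed with equations (2.1) and (2.2)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Equation (2.1): sum of cross-products of deviations
    # divided by the sum of squared x deviations
    numerator = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    denominator = sum((xi - x_bar) ** 2 for xi in x)
    if denominator == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    b2 = numerator / denominator
    # Equation (2.2): b1 needs b2, so it is computed second
    b1 = y_bar - b2 * x_bar
    return b1, b2

# Made-up points lying exactly on y = 1 + 2x, so b1 = 1 and b2 = 2
b1, b2 = least_squares([1, 2, 3, 4], [3, 5, 7, 9])
print(b1, b2)  # 1.0 2.0
```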

There is actually no magic to this. We use the food expenditure and income values we have
collected from our random sample of 40 households, and perform simple arithmetic operations to
compute the estimates of the intercept and slope coefficient of our regression line.

As for the computation of $b_1$ and $b_2$ themselves, there is only one trick: we need to make
sure we know which values are the x's and which ones are the y's. So, we start by adding labels
to our columns of data.

You should be in your Data worksheet. If not, you can go back to it by selecting its tab on the
bottom of your screen.

Select row 2 and insert a new row (see Section 1.4 of this manual if you need help with that). In
the new cell A2, type y, and in the new cell B2, type x. Right-align A1:B2.

      A           B
 1    food_exp    income
 2           y         x

Next, we need to lay out the frame of the table where we are going to store our intermediate and
final computations. Type x_bar= in cell D2, y_bar= in cell D3, b2 = in cell D6, and b1 = in cell
D7. In cells G2:J2, type x_deviation, y_deviation, (x_dev)(y_dev), and (x_deviation)2,
respectively. (Note that you can use your Tab key, instead of moving your cursor or using the
arrow keys, to move to the next cell to your right.)

      D         E    F    G              H              I                 J
 2    x_bar=              x_deviation    y_deviation    (x_dev)(y_dev)    (x_deviation)2
 3    y_bar=
 4
 5
 6    b2 =
 7    b1 =

Below x_deviation we are going to compute and store the deviations of the x values from their
mean. Below y_deviation, we are going to compute and store the deviations of the y values from
their mean. Below (x_dev)(y_dev), we are going to compute and store the products of the x
deviation and the y deviation for each pair of values. Finally, below (x_deviation)2 we are going
to compute and store the squared x deviations.
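The four worksheet columns G:J can be mirrored row by row in Python. This sketch assumes the sample means as displayed in cells E2 and E3 (19.60475 and 283.5735) and uses only the first three households, so it reproduces the worksheet values to about four decimal places rather than exactly.

```python
# Sample means as displayed in cells E2 and E3 (rounded values)
X_BAR = 19.60475
Y_BAR = 283.5735

# First three households in the Data worksheet: (food_exp y, income x)
rows = [(115.22, 3.69), (135.98, 4.39), (119.34, 4.75)]

table = []
for y, x in rows:
    x_dev = x - X_BAR        # column G: x_deviation
    y_dev = y - Y_BAR        # column H: y_deviation
    product = x_dev * y_dev  # column I: (x_dev)(y_dev)
    x_dev_sq = x_dev ** 2    # column J: (x_deviation)^2
    table.append((x_dev, y_dev, product, x_dev_sq))

for g, h, i, j in table:
    print(f"{g:12.4f} {h:12.4f} {i:12.4f} {j:12.4f}")
```

Summing column I and column J over all 40 rows gives the numerator and denominator of equation (2.1).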

To show the 2 of (x_deviation)2 as a square, place your cursor in J2, if it is not already there.
In the Formula bar, select the 2, and then select the arrow in the bottom-right corner of the Font
group of commands.

A Format Cells dialog box pops up. Select Superscript and then OK.

In cells D6 and D7, format the 2 and 1 of b2 and b1 as Subscripts instead. Bold all
the labels you just typed, and Align Right the ones in G2:J2. Finally, resize the width of
columns G:J to accommodate the width of their labels (see Section 1.4 of this manual if you need
help on that).

Now, your worksheet should look like this one:

     D        E   F   G             H             I                J
2    x_bar=           x_deviation   y_deviation   (x_dev)(y_dev)   (x_deviation)²
3    y_bar=
4
5
6    b₂=
7    b₁=

We have computed averages before. The formula you should have in cell E2 is
=AVERAGE(B3:B42), and the one in cell E3 is =AVERAGE(A3:A42). Compare the averages
you get to the sample means in Table 2.1 of Principles of Econometrics, 4e (p. 49); they should
be the same.
     D        E          F   G             H             I                J
2    x_bar=   19.60475       x_deviation   y_deviation   (x_dev)(y_dev)   (x_deviation)²
3    y_bar=   283.5735
4
5
6    b₂=
7    b₁=

Next, we want to compute the deviations. Think about what you are trying to compute. And then
type the needed formulas in G3:J3.

You should type =B3-E2 in cell G3, =A3-E3 in cell H3, =G3*H3 in cell I3, and =G3^2 in
cell J3. Here are the values you should get:

     D        E          F   G             H             I                J
2    x_bar=   19.60475       x_deviation   y_deviation   (x_dev)(y_dev)   (x_deviation)²
3    y_bar=   283.5735       -15.9147501   -168.353498   2679.303845      253.2792692
4
5
6    b₂=
7    b₁=

Now, in cells G3 and H3, we gave cell references E2 and E3, where the averages are stored. Note
that we will need to use those averages again, and get those averages from these same exact
locations, to compute the deviations of the next 39 observations.

So, what we actually need to do is to transform these Relative cell references (E2 and E3) into
Absolute cell references ($E$2 and $E$3). This will allow us to copy the formula from G3:H3
down below without losing track of the fact that the values for the averages are stored in cells E2
and E3.

A Relative cell reference is made into an Absolute cell reference by preceding both the row and
column references by a dollar sign. Place your cursor back in cell G3 (i.e., move your mouse over it
and left-click); in the Formula bar, place your cursor before the E and insert a dollar sign (press
the Shift key and the $ key at the same time); move your cursor before the 2 and insert another
dollar sign; place your cursor at the end of the formula and press Enter.

[Formula bar in cell G3: =B3-E2 before the edit, =B3-$E$2 after]

Go to cell H3, and add the needed dollar signs there too. Now, you can select G3:J3. Select
Copy in the Clipboard group of commands. Select G4:J42, and select Paste (next to Copy). You
have just copied the formulas to compute the needed deviations for the rest of the (xi, yi) pairs.
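What Paste does to the formulas can be modeled with a small, hypothetical helper, copy_down, that rewrites a formula the way Excel does when it is copied down a column: row numbers without a $ shift by the number of rows, while row numbers preceded by a $ stay fixed. It is a simplified sketch that only handles plain A1-style references (no sheet names, no column shifts, no function names that contain digits).

```python
import re

# A cell reference: optional $, column letters, optional $, row digits
REF = re.compile(r"(\$?)([A-Z]{1,3})(\$?)(\d+)")


def copy_down(formula: str, rows: int) -> str:
    """Return the formula as it would read after being copied `rows` rows down."""
    def shift(m):
        col_abs, col, row_abs, row = m.groups()
        # A $ before the row number anchors it; otherwise the row moves
        new_row = int(row) if row_abs else int(row) + rows
        return f"{col_abs}{col}{row_abs}{new_row}"
    return REF.sub(shift, formula)


print(copy_down("=B3-E2", 1))     # prints =B4-E3 (now points at the wrong cell)
print(copy_down("=B3-$E$2", 1))   # prints =B4-$E$2 (still uses x_bar in E2)
print(copy_down("=B3-$E$2", 39))  # prints =B42-$E$2 (the last observation)
```

Copying the relative version down one row would subtract the value in E3 (the mean of y) from an x value, which is exactly the mistake the dollar signs prevent.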

Your worksheet should look like this:

     D        E          F   G             H             I                J
2    x_bar=   19.60475       x_deviation   y_deviation   (x_dev)(y_dev)   (x_deviation)²
3    y_bar=   283.5735       -15.9147501   -168.353498   2679.303845      253.2792692
4                            -15.2147501   -147.593503   2245.598261      231.4886191
5                            -14.8547501   -164.233503   2439.647641      220.6636
6    b₂=                     -13.5747501   -168.6135     2288.886121      184.2738389
7    b₁=                     -7.13475005   -96.5235      688.6710199      50.90465828

We have everything we need to finalize the computation of b1 and b2.

Place your cursor in cell E6, and again think about what you need to compute b2. Recall that the
least squares estimators are:
$$b_2 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \qquad (2.1)$$

$$b_1 = \bar{y} - b_2\bar{x} \qquad (2.2)$$

If you refer back to equation (2.1), you can see that =SUM(I3:I42)/SUM(J3:J42) is the formula
you need in cell E6. The one you need in cell E7 is =E3-E6*E2, for equation (2.2).
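Equation (2.2) can also be checked by hand from the values displayed in cells E2, E3 and E6. Because those displayed values are rounded, this Python sketch reproduces the value in E7 (83.41601) only to about three decimal places.

```python
x_bar = 19.60475   # cell E2, as displayed
y_bar = 283.5735   # cell E3, as displayed
b2 = 10.20964      # cell E6, as displayed

# Equation (2.2), the same calculation as =E3-E6*E2 in cell E7
b1 = y_bar - b2 * x_bar
print(round(b1, 3))  # prints 83.416
```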

Your worksheet should look like this:

     A        B       C   D        E          F   G             H             I                J
2    y        x           x_bar=   19.60475       x_deviation   y_deviation   (x_dev)(y_dev)   (x_deviation)²
3    115.22   3.69        y_bar=   283.5735       -15.9147501   -168.353498   2679.303845      253.2792692
4    135.98   4.39                                -15.2147501   -147.593503   2245.598261      231.4886191
5    119.34   4.75                                -14.8547501   -164.233503   2439.647641      220.6636
6    114.96   6.03        b₂=      10.20964       -13.5747501   -168.6135     2288.886121      184.2738389
7    187.05   12.47       b₁=      83.41601       -7.13475005   -96.5235      688.6710199      50.90465828

In the table above we obtain the same exact least squares estimates as those reported on p. 53 of
Principles of Econometrics, 4e.

That was Method 1 of obtaining the least squares estimates of the intercept and slope parameters
β1 and β2. For Method 2, we are going to use the Excel built-in regression analysis routine.

2.2.2 Using Excel Regression Analysis Routine

Select the Data tab, in the middle of your tab list. On the Analysis group of commands to the far
right of the ribbon, select Data Analysis.

If the Data Analysis tool does not appear on the ribbon, you need to load it first.

Select the Office Button in the upper left corner of your screen, Excel Options at the bottom of
the Office Button tasks panel, Add-Ins in the Excel Options dialog box, Excel Add-ins in the
Manage box at the bottom of the Excel Options dialog box, and then Go.


In the Add-Ins dialog box, check the box in front of Analysis ToolPak. Select OK.


Now Data Analysis should be available on the Analysis group of commands. Select it.

A Data Analysis dialog box pops up. In it, select Regression (you might need to use the scroll up
and down bar to the right of the Analysis Tools window to find it), then select OK.


The Regression dialog box that pops up next is very similar to the Edit Series box we
encountered before (see Section 2.1.1). Place your cursor in the Input Y Range window, and
select A3:A42 to specify the y-values you are working with. Similarly, place your cursor in the
Input X Range window, and select B3:B42 to specify the x-values you are working with. Next,
place your cursor in the New Worksheet Ply window and type Regression; this is going to be
the name of the new worksheet where the Excel regression analysis results are stored.
Select OK.

The Summary Output that Excel just generated should be highlighted as shown below:

[Screenshot: the SUMMARY OUTPUT in the new Regression worksheet, with labels and values truncated by the default column widths]

Select the Home tab. In the Cells group of commands, select Format, and then AutoFit Column
Width; this is an alternative way to adjust the width of the selected columns to fit their contents.


Your worksheet should now look like this:

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.620485472
R Square             0.385002
Adjusted R Square    0.368818
Standard Error       89.51700429
Observations         40

ANOVA
              df    SS             MS             F             Significance F
Regression    1     190626.9788    190626.9788    23.78884107   1.94586E-05
Residual      38    304505.1742    8013.294058
Total         39    495132.153

               Coefficients   Standard Error   t Stat        P-value       Lower 95%     Upper 95%     Lower 95.0%   Upper 95.0%
Intercept      83.41600997    43.41016192      1.921578      0.062182379   -4.463268     171.2952877   -4.463268     171.2952877
X Variable 1   10.2096425     2.093263461      4.877380554   1.94586E-05   5.972052202   14.4472328    5.972052202   14.4472328

The least squares estimates are given under the Coefficients column in the last table of the
Summary Output. The estimate of the Intercept coefficient, b1, is listed first, followed by
the estimate of the slope coefficient (the X Variable 1 coefficient), b2. The Summary Output
contains many other items that we will learn about shortly. For now, notice that the number of
observations, or pairs of values, 40, is given in cell B8.

A convenient way to report the values for b1 and b2 is to write out the equation of the estimated
regression line:
$$\hat{y}_i = 83.42 + 10.21x_i \qquad (2.3)$$

Now that we have the equation of our straight line, we would like to graph it. This is what we are
doing in the next section.

2.3 PLOTTING A SIMPLE REGRESSION

There are different ways to draw a regression line. One way is to plot two points and draw the
line that passes through those two points; this is the method we are going to use first. Another
way is to plot many points, and then draw the line that passes through all those points; this is the
method that Excel uses in the built-in features we are going to look at next.

2.3.1 Using Two Points

When we draw a line by hand, on a piece of paper, using a pen and a ruler, we can use any two
points. We can extend our line between the points, as well as beyond the points, up and down, or
right and left. Excel does not use a ruler. Instead, it uses the coordinates of two points to draw a
line, and it draws the line only between them. So, to have Excel draw a line that spans over the
whole range of data we have, we need to choose those two points a little bit more strategically
than usual.

If you look back at your scatter chart (Figure 2.6 worksheet) or back in your table (Data
worksheet), you can see that our x values range from about 0 to 35 (from 3.69 to 33.4 exactly).
So, we choose our first point to have an x value equal to 0, and our second point an x value of
35.

The point with an x value of zero is our y intercept. It is the point where the line crosses the
vertical axis. Its coordinates are x = 0 and y = b1 or (0, 83.42). This is our first point.

For our second point, we let x = 35; plug this x value in equation (2.3), and compute its
corresponding or predicted y value. We obtain:

$$\hat{y} = 83.42 + 10.21(35) = 440.77 \qquad (2.4)$$

This is our second point, with coordinates (35, 440.77).

Go back to your Data worksheet (if you are not already there). In cell L1, type Points to graph
regression line. In columns L and M we are going to record the coordinates of the two points we
are using to draw our regression line. In cell L2, type y; in cell M2, type x. In cell M3, type 0; in
cell M4, type 35. In cell L3, we want to record the value of our y intercept, b1, which
we already have in cell E7. So, we are going to get it from there: in cell L3, type =E7, and press
Enter. In cell L4, we want the predicted y value from (2.4), so we type
=E7+E6*M4, and press Enter. Note that instead of typing all those cell references, you can
click on the cells of interest while you build the formula; this is
a very good way to avoid typing errors. So, you would type the equal sign, move your cursor to
E7 and left-click to select it, type the plus sign, move your cursor to cell E6 and left-click to
select it, type the asterisk, move your cursor to cell M4 and left-click to select it, and finally press
Enter. Once you have done all of that, your worksheet should look like this:

     L                                 M
1    Points to graph regression line
2    y                                 x
3    83.41601                          0
4    440.7535                          35

Note that the predicted y value we obtain in the worksheet for x = 35 (440.7535) is slightly
different from the one we computed in equation (2.4) (440.77) because equation (2.4) uses rounded coefficients.
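The rounding difference can be made explicit. This Python sketch compares the prediction at x = 35 computed with the coefficients as displayed in the worksheet (83.41601 and 10.20964) against equation (2.4), which uses coefficients rounded to two decimals.

```python
x = 35

# Coefficients as displayed in cells E7 and E6 of the Data worksheet
b1_sheet, b2_sheet = 83.41601, 10.20964
# Coefficients rounded to two decimals, as in equation (2.4)
b1_round, b2_round = 83.42, 10.21

y_hat_sheet = b1_sheet + b2_sheet * x   # approximates =E7+E6*M4
y_hat_round = b1_round + b2_round * x   # equation (2.4) by hand
print(f"{y_hat_sheet:.4f} vs {y_hat_round:.2f}")  # prints 440.7534 vs 440.77
```

Excel keeps even more digits in E6 and E7 than are displayed, which is why cell L4 shows 440.7535 rather than the 440.7534 printed here.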

Now, go back to your Figure 2.6 worksheet. The data we have plotted on the chart represent one
set or series of data. The two new pairs of values we want to add to this chart represent a second
set or series of data.

Select the Design tab, then the Select data button from the Data group of commands.


In the Legend Entries (Series) window of the Select data source dialog box, select the Add
button.

Place your cursor in the Series X values window of the Edit Series dialog box, and select
M3:M4 in the Data worksheet. Place your cursor in the Series Y values window (delete
whatever is in there), and select L3:L4 in the Data worksheet. Select OK.

[Edit Series dialog box: Series X values =Data!$M$3:$M$4; Series Y values =Data!$L$3:$L$4]

The Select data source dialog box reappears. A second data series, Series2, was created from the
selection you just specified. Select OK.

The two points from your new series are plotted on your chart (squares below):

[Chart: the food expenditure scatter plot, with the two points of Series2 added as squares; x-axis: x = weekly income in $100]

Now, we need to draw a line through those two points. Go to the Layout tab. Change the Current
Selection (group of commands to the far left) to Series 2 (use the down arrow to the right of
the box to make that selection). Select Format Selection.


A Format Data Series dialog box pops up. Select Line Color and change its selection from No
line to Solid line. Select Close.

The result is:

[Chart: the scatter plot with the regression line drawn between the two points; x-axis: x = weekly income in $100]

Note that while you need only two points to draw a straight line, you can use more than
two points. So we could have computed a predicted level of food expenditure for every level of
income in our original data set, and used the 40 (xi, ŷi) pairs of values as our data Series
2. This is actually what Excel does when it adds a Linear Trendline to a Scatter chart or a
Line Fit Plot as part of the Regression analysis routine.

We are going to delete the line and two points we just added to our graph and successively look at
these other two ways to plot our regression line.

2.3.2 Using Excel Built-in Feature

In the Design tab, go back to the Data group of commands, and select the Select Data button. In
the Select Data Source dialog box, select Series2 and Remove. Finally select OK.


To add a Linear Trend Line, select the Layout tab. Go to the Analysis group of commands,
select Trendline, and then Linear Trendline.


Your chart should look like this (see also Figure 2.8, p. 54 in Principles of Econometrics, 4e):

[Chart: the scatter plot with a linear trendline; x-axis: x = weekly income in $100]

2.3.3 Using a Regression Option

You can also have Excel add the line that best fits your data by choosing that option in the
Regression dialog box.

Go back to your Data worksheet (bottom left corner of your screen).



Select the Data tab, located in the middle of your tab list. Select Data Analysis in the Analysis
group of commands at the far right of the ribbon. Select Regression in the Data Analysis dialog
box, and then OK.

In the Regression dialog box, proceed as you did before, except this time, name your worksheet
Regression and Line, and check the box in front of Line Fit Plots. Select OK.


In addition to the Summary Output, you now have a Residual Output table and a chart in your
new worksheet. Only part of the Residual Output table is shown below, after AutoFitting the
column widths (see Section 2.2.2 for more details on that).

RESIDUAL OUTPUT (first five observations, rounded to four decimal places)

Observation   Predicted Y   Residuals
1             121.0896      -5.8696
2             128.2363      7.7437
3             131.9118      -12.5718
4             144.9802      -30.0202
5             210.7303      -23.6802

[Chart: X Variable 1 Line Fit Plot, showing Y and Predicted Y against X Variable 1]

The Predicted Y, or ŷi, values have been computed for all the original observed xi values,
in the same way we computed ŷ for x = 35 (see Section 2.3.1).

The least squares residuals are defined as

$$\hat{e}_i = y_i - \hat{y}_i = y_i - b_1 - b_2 x_i \qquad (2.5)$$

You can compare the Predicted Y and Residuals values reported in the Excel Residual Output
to the ones reported in Table 2.3 of Principles of Econometrics, 4e (p. 66). They should be the
same.
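The Predicted Y and Residuals columns follow directly from equation (2.5). This Python sketch recomputes them for the first five households, using the coefficients from the Summary Output and the data values shown earlier in the Data worksheet. Small differences in the last decimal places can arise because Excel stores the data and coefficients with a different precision than the decimals typed here.

```python
# Coefficients from the Summary Output (Coefficients column)
b1, b2 = 83.41600997, 10.2096425

# First five households from the Data worksheet: (income x, food_exp y)
data = [(3.69, 115.22), (4.39, 135.98), (4.75, 119.34),
        (6.03, 114.96), (12.47, 187.05)]

predicted = [b1 + b2 * x for x, _ in data]                         # Predicted Y
residuals = [y - y_hat for (_, y), y_hat in zip(data, predicted)]  # equation (2.5)

for obs, (y_hat, e_hat) in enumerate(zip(predicted, residuals), start=1):
    print(f"{obs}  {y_hat:10.4f}  {e_hat:10.4f}")
```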

2.3.4 Editing the Chart

Now, the chart needs a little bit of editing. For one thing, it looks like a Column chart rather
than a Scatter chart. The scales could be changed. Finally, the chart and axis titles are not
currently very helpful.

Place your cursor anywhere in the Chart area, and left-click, so that Chart Tools are made
available to you again. Select the Design tab. Go to the far left group of commands, Type, and
select Change Chart Type. In the Change Chart Type dialog box, select X Y (Scatter) chart,
and then Scatter with only Markers. Finally, select OK.


The result is:

[Chart: X Variable 1 Line Fit Plot as a scatter chart, with Y and Predicted Y markers; x-axis: X Variable 1]

Now that we have the correct chart type, we would like to draw a line through all the Predicted Y
points. Actually, since we are using those points to draw our regression line, what we want to
show is only the line. So, we will use the points to draw the line, and then get rid of those big
square points. This way our chart won't be as busy.

On your chart, select the Predicted Y points with your cursor. Your cursor should turn into a fat
cross as shown below:

[Chart: X Variable 1 Line Fit Plot with the Predicted Y series selected]

Right-click and select Format Data Series. A Format Data Series dialog box pops up. Select
Line Color and Solid line. Change the line color to something different from the Y points.
Select Marker Options, and change the Marker Type from Automatic to None. Select Close.

The result is:

[Chart: X Variable 1 Line Fit Plot, with Predicted Y drawn as a solid line without markers]

On your chart, select the Legend with your cursor, right-click and select Delete.


Change the Chart and Axis titles as you see fit. Below, we show you how you can change the
Chart title. You can follow a similar process to change the Axis titles.

Place your cursor in the title area and left-click.


Select the generic title.


Type in your new title.

You can select any of the titles and change the Font size by going back to the Home tab. Select
what you need on the Font group of commands.


You can reformat the y-axis (and/or the x-axis) by selecting it with your cursor, right-clicking and
selecting Format Axis.

If you proceed as you did before to edit your vertical axis (see Section 2.1.2a), you should obtain
the following:
[Chart: Figure 2.8 The fitted regression]

To resize the whole Chart area, put your cursor over its lower border until it turns into a double
cross arrow as shown below.



Left-click, and it should turn into a skinny cross.

Hold it, and drag it down until you are satisfied with the way your chart looks.

[Chart: Figure 2.8 The fitted regression, resized; x-axis: x = weekly income in $100]

You can delete the Gridlines by first selecting them, right-clicking and then selecting Delete.


You can also reformat the Y data series by selecting the points, right-clicking and selecting
Format Data Series. Then proceed as you did before to change your markers' options (see
Section 2.1.2c).

Your result might look like this (see also Figure 2.8, p. 54 in Principles of Econometrics, 4e):

[Chart: Figure 2.8 The fitted regression; x-axis: x = weekly income in $100]

In the next section we illustrate the concept of unbiased estimators.

2.4 EXPECTED VALUES OF b1 AND b2

To illustrate that, under the assumptions of the simple linear regression model, E(b1) = β1 and
E(b2) = β2, we first put ourselves in a situation where we know our population and regression
parameters (i.e., we know the truth). We then use the least squares regression technique to uncover
the truth (which we already know). This allows us to check the validity of the least squares
regression technique, and specifically to check the unbiasedness of the least squares
estimators.

2.4.1 Model Assumptions

First, let us restate the assumptions of the simple linear regression model (see p. 45 of Principles
of Econometrics, 4e):

• The mean value of y, for each value of x, is given by the linear regression function:

$$E(y \mid x) = \beta_1 + \beta_2 x \qquad (2.6)$$

• For each value of x, the values of y are distributed about their mean value, following
probability distributions that all have the same variance:

$$\operatorname{var}(y \mid x) = \sigma^2 \qquad (2.7)$$

• The sample values of y are all uncorrelated and have zero covariance, implying that there
is no linear association among them:

$$\operatorname{cov}(y_i, y_j) = 0 \quad \text{for } i \neq j \qquad (2.8)$$

• The variable x is not random and must take at least two different values.

• (optional) The values of y are normally distributed about their mean for each value of x:

$$y \sim N\left[(\beta_1 + \beta_2 x), \sigma^2\right] \qquad (2.9)$$

In the specific and simplified case we are considering in this section, half of our hypothetical
population of three-person households has a weekly income of $1000 (x = 10), and half of it has
a weekly income of $2000 (x = 20). Because we are almighty, we know the values of our
population parameters, and consequently the values of our regression parameters. Let
$\mu_{y|x=10} = 200$, $\mu_{y|x=20} = 300$, and
$\operatorname{var}(y|x=10) = \operatorname{var}(y|x=20) = \sigma^2 = 2500$. This implies
$\beta_1 = 100$ and $\beta_2 = 10$.
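The implied regression parameters follow from evaluating equation (2.6) at the two income levels and solving the resulting pair of equations:

$$\beta_1 + 10\beta_2 = \mu_{y|x=10} = 200, \qquad \beta_1 + 20\beta_2 = \mu_{y|x=20} = 300$$

$$\beta_2 = \frac{300 - 200}{20 - 10} = 10, \qquad \beta_1 = 200 - 10\beta_2 = 100$$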

The probability distribution functions of weekly food expenditure, y, given an income level
x = 10 and an income level x = 20, are assumed to be Normal. They look like this:

[Figure: the Normal probability density functions f(y|x=10) and f(y|x=20)]

The linear relationship between weekly food expenditure and weekly income looks like the
following:
[Figure: the population regression line, with E(y|x) = 200 at x = 10 and E(y|x) = 300 at x = 20]

Let us emphasize the difference between this section and Chapter 2 in Principles of
Econometrics, 4e. In this section, we do know the truth. In other words, we have information
on weekly food expenditure and weekly income for all the three-person households that
constitute our population. In Chapter 2 of Principles of Econometrics, 4e, as is the case in
real life, you do not have that population information. You must thus rely solely on your random
sample information to make inferences about your population.

Now, as an exercise, and as a way to illustrate the unbiasedness of the least squares estimators, we
are going to use the least squares regression technique to uncover the truth.

Insert a new worksheet in your workbook by selecting the Insert Worksheet tab at the bottom of
your screen (or press Shift+F11). Name it Simulation.


We are going to draw a random sample of 40 households from our population. Half of the sample
is drawn from the first type of households, with weekly income x = 10; and half of the sample is
drawn from the second type of households, with weekly income x = 20.

Let us keep a record of the level of weekly income for our 40 households in column A of our
Simulation worksheet: in cell A1, type x and Right-Align it; in cells A2:A21, enter the value
10; in cells A22:A41, enter the value 20.
     A
1    x
2    10
…    (10 in every cell of A2:A21)
21   10
22   20
…    (20 in every cell of A22:A41)
41   20

2.4.2 Random Number Generation

We use the Random Number Generation analysis tool to draw our random sample of
households. We keep a record of their weekly food expenditure in column B of our Simulation
worksheet: type y in B1, and Right-Align it.

     A    B
1    x    y

Select the Data tab, in the middle of your tab list. On the Analysis group of commands to the far
right of the ribbon, select Data Analysis.

The Data Analysis dialog box pops up. In it, select Random Number Generation (you might
need to use the scroll up and down bar to the right of the Analysis Tools window to find it), then
select OK.

A Random Number Generation dialog box pops up. Since we are drawing one random sample,
we specify 1 in the Number of Variables window. We first draw a random sample of 20 from
households with weekly income of x = 10, so we specify the Number of Random Numbers to
be 20. For simplicity we assumed that our population of households has weekly food expenditure
that is normally distributed, so this is the distribution we choose. Once you have selected Normal
in the Distribution window, you will be able to specify its Parameters: for x = 10, its Mean is
$\mu_{y|x=10} = 200$ and its Standard deviation is $\sqrt{\operatorname{var}(y|x=10)} = \sigma = 50$.
Select Output Range in the Output options section, and specify it to be B2:B21 in your
Simulation worksheet. Finally, select OK.

Repeat to draw a random sample of 20 from households with weekly income of x = 20. Change
the Mean to $\mu_{y|x=20} = 300$ and the Output Range to B22:B41.
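For readers who want to see the mechanics outside Excel, the two draws can be mimicked with Python's standard `random` module. This is a minimal sketch, not the Analysis ToolPak's own generator, so it will not reproduce Excel's numbers; the helper name `draw_normal_sample` and the seed values are illustrative assumptions (the seed plays the role of the dialog's Random Seed box).

```python
import random
from typing import List, Optional


def draw_normal_sample(mean: float, sd: float, n: int,
                       seed: Optional[int] = None) -> List[float]:
    """Draw n values from N(mean, sd^2), like one column of the Random Number
    Generation tool with Distribution set to Normal (hypothetical helper)."""
    rng = random.Random(seed)  # private generator; a fixed seed makes the draw repeatable
    return [rng.gauss(mean, sd) for _ in range(n)]


# x = 10: Mean 200, Standard deviation 50 (Output Range B2:B21)
y_x10 = draw_normal_sample(200, 50, 20, seed=1)
# x = 20: Mean 300, Standard deviation 50 (Output Range B22:B41)
y_x20 = draw_normal_sample(300, 50, 20, seed=2)
print(len(y_x10), len(y_x20))  # 20 20
```

As in Excel, changing or omitting the seed gives a different sample.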


Here is the random sample that we obtained. NOTE: you will obtain a different random sample,
due to the nature of random sampling.
[Table: the simulated sample in columns A (x) and B (y) of the Simulation worksheet, with x = 10 in rows 2-21 and x = 20 in rows 22-41]

2.4.3 The LINEST Function

Next, we use the LINEST function to obtain the least squares estimates of the intercept and
slope parameters from the random sample we just drew. LINEST is an alternative to the least
squares estimator formulas (see Section 2.2.1) and to the Excel Regression analysis routine (see
Section 2.2.2), and it gives us the intercept and slope estimates quickly. For this purpose, the
general syntax of the LINEST function is as follows:
= LINEST(y's, x's)

The first argument of the LINEST function specifies the y values, and the second argument
specifies the x values, on which the least squares estimates are based. In our case, we therefore
specify:
= LINEST(B2:B41,A2:A41)

LINEST returns its results as a table (an array) held in Excel memory. It reports the slope
coefficient estimate first and the intercept coefficient estimate second. So, if we were to look into
Excel memory, the estimates would be laid out as shown below:

        column 1   column 2
row 1   b2         b1
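A minimal Python sketch of what LINEST computes for a single x variable, assuming the textbook least squares formulas; `linest` is a hypothetical helper, not an Excel or library API. It returns the estimates in LINEST's order, slope first.

```python
from typing import List, Sequence


def linest(known_ys: Sequence[float], known_xs: Sequence[float]) -> List[float]:
    """Least squares fit of y = b1 + b2 * x for one x variable.

    Returns [b2, b1]: slope first, then intercept, the same order in which
    LINEST lays out its one-row result.
    """
    n = len(known_xs)
    if n != len(known_ys) or n < 2:
        raise ValueError("need two ranges of equal length with at least 2 points")
    x_bar = sum(known_xs) / n
    y_bar = sum(known_ys) / n
    # b2 = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(known_xs, known_ys))
    sxx = sum((x - x_bar) ** 2 for x in known_xs)
    b2 = sxy / sxx
    b1 = y_bar - b2 * x_bar  # the fitted line passes through (x_bar, y_bar)
    return [b2, b1]


print(linest([2, 4, 5, 4, 5], [1, 2, 3, 4, 5]))  # slope about 0.6, intercept about 2.2
```

If every x-value were identical, `sxx` would be zero and the slope undefined, so the sketch would raise a division error.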

We nest the LINEST function in the INDEX function to get the estimated coefficients one at a
time. The INDEX function returns values from within a table. For a table with only one row, the
general syntax of the INDEX function is as follows:

= INDEX(table of results, column_num)



The first argument of the INDEX function specifies which table to get the results from. In our
case, this is the table of results generated by the LINEST function above. So, we replace "table of
results" by "LINEST(B2:B41,A2:A41)". The second argument indicates from which column of
the table to retrieve the result of interest to us. So, if we want to retrieve the estimate of the
intercept coefficient, b1, from the table above, we would indicate that it can be found in column 2
by replacing "column_num" by "2".

We are going to report our estimated coefficients at the bottom of our table. In cell A43, type b1
=; in cell A44, type b2 =. Bold those labels. In cells B43 and B44, type the following formulas,
respectively:
A B
43 b1= =INDEX(LINEST(B2:B41,A2:A41),2)
44 b2= =INDEX(LINEST(B2:B41,A2:A41),1)
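INDEX counts columns from 1, while Python lists count from 0. The hypothetical helper below makes that shift explicit for a one-row table such as the LINEST result; the numbers are our sample's estimates, b2 = 11.47325 and b1 = 67.64114.

```python
from typing import Sequence


def index(row: Sequence[float], column_num: int) -> float:
    """Mimic INDEX(table, column_num) for a one-row table: column_num is 1-based."""
    if not 1 <= column_num <= len(row):
        # Excel returns the #REF! error for a column outside the table
        raise IndexError(f"column_num {column_num} is outside 1..{len(row)}")
    return row[column_num - 1]  # shift to Python's 0-based positions


linest_row = [11.47325, 67.64114]  # LINEST layout: [b2, b1]
b1 = index(linest_row, 2)  # like =INDEX(LINEST(...),2) in B43
b2 = index(linest_row, 1)  # like =INDEX(LINEST(...),1) in B44
print(b1, b2)  # 67.64114 11.47325
```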

Here are the estimates that we get:


A B
43 b1= 67.64114
44 b2= 11.47325

The estimates of the intercept and slope coefficients are based on one random sample. Our
random sample is different from yours, and each random sample yields different estimates, which
may or may not be close to the true parameter values. The property of unbiasedness is about the
average values of b1 and b2 if many samples of the same size are drawn from the same
population. In the next section, we are therefore going to repeat our sampling and least squares
estimation exercise.

2.4.4 Repeated Sampling

Note that in Chapter 2 of Principles of Econometrics, 4e, the repeated samples given to you were
randomly collected from a population with unknown parameters. In this section, we draw our
samples from a population with known parameters.

Go back to the Random Number Generation dialog box. We would like to draw 9 additional
random samples, so we specify 9 in the Number of Variables window. Again, we first draw
random samples of 20 from households with weekly income of x = 10, so we specify the
Number of Random Numbers to be 20. We also select Normal in the Distribution window,
and specify its Parameters. For x = 10, its Mean is $\mu_{y|x=10} = 200$ and its Standard Deviation
is $\sqrt{\operatorname{var}(y|x=10)} = \sigma = 50$. Specify the Output Range to be C2:K21. Finally, select OK.


Repeat to draw random samples of 20 from households with weekly income of x = 20. Change
the Mean to $\mu_{y|x=20} = 300$ and the Output Range to C22:K41.


Next, before we copy the formulas that produce our coefficient estimates, we need to change the
relative cell references A2:A41 into absolute cell references $A$2:$A$41, since we will be
using the same x-values for our next 9 rounds of least squares estimates.

A B
43 b1= =INDEX(LINEST(B2:B41,$A$2:$A$41),2)
44 b2= =INDEX(LINEST(B2:B41,$A$2:$A$41),1)

Copy the formulas from B43:B44 into C43:K44. In cells L43:L44 compute the AVERAGEs of
your estimates from your 10 samples. In cell L43, you should have =AVERAGE(B43:K43); in
cell L44, you should have =AVERAGE(B44:K44). The estimates and average values that we get
for our 10 samples are:

[Table: the b1 estimates (row 43) and b2 estimates (row 44) for the 10 samples in columns B through K, with their averages in column L: 89.14425 for b1 and 10.48296 for b2]

If we took the averages of the estimates from many samples, these averages would approach the
true parameter values $\beta_1$ and $\beta_2$. To show you that this is the case, we repeated the exercise.
Here are the average values of b1 and b2 that we got as we increased the number of samples
from 10, to 100, and finally to 1000:

Number of samples 10 100 1000 Parameter Values


Average value of b1 89.14425 98.44593 99.48067 100
Average value of b2 10.48296 10.08958 10.04135 10
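The repeated-sampling experiment can also be sketched in a few lines of Python. This illustration uses the section's own set-up ($\beta_1 = 100$, $\beta_2 = 10$, $\sigma = 50$, 20 households at x = 10 and 20 at x = 20); the function names and the seed are hypothetical, and the averages it prints will differ from the table above, just as your Excel samples do.

```python
import random
from typing import List, Sequence, Tuple


def ols(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Return (b1, b2) for the simple regression of ys on xs."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b2 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - b2 * x_bar, b2


def average_estimates(num_samples: int, beta1: float = 100.0, beta2: float = 10.0,
                      sigma: float = 50.0, seed: int = 2011) -> Tuple[float, float]:
    """Draw num_samples samples from y = beta1 + beta2*x + e, e ~ N(0, sigma^2),
    estimate b1 and b2 in each, and return their averages."""
    rng = random.Random(seed)
    xs: List[float] = [10.0] * 20 + [20.0] * 20  # same x-values in every sample ($A$2:$A$41)
    total_b1 = total_b2 = 0.0
    for _ in range(num_samples):
        ys = [rng.gauss(beta1 + beta2 * x, sigma) for x in xs]  # one new column of y
        b1, b2 = ols(xs, ys)
        total_b1 += b1
        total_b2 += b2
    return total_b1 / num_samples, total_b2 / num_samples


for m in (10, 100, 1000):
    avg_b1, avg_b2 = average_estimates(m)
    print(m, round(avg_b1, 3), round(avg_b2, 3))
```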

The next section of this chapter is very short. It points out how you can compute an estimate of
the variances and covariance of the least squares estimators b1 and b2 using Excel. It also outlines
other numbers you can recognize in the Excel summary output. Note that for this section we are
getting back to our food expenditure and income data of Sections 2.1-2.3, i.e. data from one
sample of 40 households that was drawn from a population with unknown parameters.

2.5 VARIANCES AND COVARIANCE OF b1 AND b2

You can compute estimates of the variances and covariance of the least squares estimators
b1 and b2 the same way you computed b1 and b2: consider their algebraic expressions (see
below or p. 65 of Principles of Econometrics, 4e), and perform the simple arithmetic operations
needed. You might want to do that as an exercise; you will be able to check your work by
comparing your estimates to the ones reported on pp. 66-67 of Principles of Econometrics, 4e.

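Estimates of the variances and covariance of the least squares estimators b1 and b2 are given by:

$$\widehat{\operatorname{var}}(b_1) = \hat{\sigma}^2 \left[ \frac{\sum x_i^2}{N \sum (x_i - \bar{x})^2} \right] \tag{2.10}$$

$$\widehat{\operatorname{var}}(b_2) = \frac{\hat{\sigma}^2}{\sum (x_i - \bar{x})^2} \tag{2.11}$$

$$\widehat{\operatorname{cov}}(b_1, b_2) = \hat{\sigma}^2 \left[ \frac{-\bar{x}}{\sum (x_i - \bar{x})^2} \right] \tag{2.12}$$

where N is the total number of pairs of values, and

$$\hat{\sigma}^2 = \frac{\sum \hat{e}_i^2}{N - K} \tag{2.13}$$

is an estimate of the error variance, where K is the number of regression parameters, K = 2,
and $\hat{e}_i = y_i - \hat{y}_i = y_i - b_1 - b_2 x_i$ are the least squares residuals.

The square roots of the estimated variances are the standard errors of b1 and b2. They are denoted
as se(b1) and se(b2):

$$\operatorname{se}(b_1) = \sqrt{\widehat{\operatorname{var}}(b_1)}, \qquad \operatorname{se}(b_2) = \sqrt{\widehat{\operatorname{var}}(b_2)} \tag{2.14}$$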

Excel's regression routine does not automatically report estimates of the variances and
covariance of the least squares estimators b1 and b2, but it does compute the standard errors of b1
and b2, as well as other intermediate results.
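Formulas (2.10)-(2.14) translate directly into code. Below is a minimal Python sketch (the function name `ls_variances` is hypothetical), checked on a made-up five-point data set whose results can be verified by hand ($\hat{\sigma}^2 = 0.8$, $\widehat{\operatorname{var}}(b_2) = 0.08$). Applied to the food expenditure data, its `se_b1` and `se_b2` should agree with the standard errors Excel reports.

```python
import math
from typing import Dict, Sequence


def ls_variances(xs: Sequence[float], ys: Sequence[float]) -> Dict[str, float]:
    """Estimated variances, covariance and standard errors of b1 and b2,
    following equations (2.10)-(2.14)."""
    n = len(xs)
    k = 2  # number of regression parameters
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b2 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b1 = y_bar - b2 * x_bar
    sse = sum((y - b1 - b2 * x) ** 2 for x, y in zip(xs, ys))  # sum of squared residuals
    sigma2_hat = sse / (n - k)                                  # (2.13)
    var_b1 = sigma2_hat * sum(x * x for x in xs) / (n * sxx)    # (2.10)
    var_b2 = sigma2_hat / sxx                                   # (2.11)
    cov_b1_b2 = sigma2_hat * (-x_bar / sxx)                     # (2.12)
    return {
        "sigma2_hat": sigma2_hat,
        "var_b1": var_b1,
        "var_b2": var_b2,
        "cov_b1_b2": cov_b1_b2,
        "se_b1": math.sqrt(var_b1),                             # (2.14)
        "se_b2": math.sqrt(var_b2),
    }


print(ls_variances([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))
```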
Specifically, the following estimates can be found in the Excel Summary Output you generated
earlier:

Sum of Squared Residuals (SS Residual), $\sum \hat{e}_i^2$, in C13

Mean Square Residual (MS Residual), $\hat{\sigma}^2$, in D13

$\hat{\sigma}$: Standard Error of the Regression in B7

Standard Errors of Intercept and X Variable 1, se(b1) and se(b2), in C17:C18

    A                  B              C                D             E             F              G
1   SUMMARY OUTPUT
3   Regression Statistics
4   Multiple R         0.620485472
5   R Square           0.385002221
6   Adjusted R Square  0.368818059
7   Standard Error     89.51700429
8   Observations       40
10  ANOVA
11                     df             SS               MS            F             Significance F
12  Regression         1              190626.9788      190626.9788   23.78884107   1.94585E-05
13  Residual           38             304505.1742      8013.294058
14  Total              39             495132.153
16                     Coefficients   Standard Error   t Stat        P-value       Lower 95%      Upper 95%
17  Intercept          83.41600997    43.41016192      1.921577951   0.062182379   -4.463267721   171.2952877
18  X Variable 1       10.2096425     2.093263461      4.877380554   1.94586E-05   5.972052202    14.4472328

Note that $\sum \hat{e}_i^2$, the Sum of Squared Residuals (SS Residual), is also referred to as the Sum of
Squared Errors, hence the abbreviation SSE used on p. 51 of Principles of Econometrics, 4e.

2.6 NONLINEAR RELATIONSHIPS

2.6.1 A Quadratic Model

2.6.1a Estimating the Model

Open the Excel file br. Excel opens the data set in Sheet 1 of a new Excel file. Since we would
like to save all our work from Chapter 2 in one file, create a new worksheet in your POE
Chapter 2 Excel file, name it br data, and copy into it the data set you just opened.


This data set contains data on 1080 houses sold in Baton Rouge, LA, during mid-2005. We use
it to estimate the following quadratic model for house prices:

$$PRICE = \alpha_1 + \alpha_2 SQFT^2 + e \tag{2.15}$$

In your br data worksheet, insert a column to the right of the sqft column B (see Section 1.4 for
more details on how to do that). In your new cells C1:C2, enter the following column label and
formula.
C
1 sqft2
2 =B2^2

Copy the content of cell C2 to cells C3:C1081. Here is how your table should look (only the
first five values are shown below):

A B C
1 price sqft sqft2
2 66500 741 549081
3 66000 741 549081
4 68500 790 624100
5 102000 2783 7745089
6 54000 1165 1357225

In the Regression dialog box, the Input Y Range should be A2:A1081, and the Input X Range
should be C2:C1081. Select New Worksheet Ply and name it Quadratic Model. Finally, select
OK.

The result is (matching the one reported on p. 70 in Principles of Econometrics, 4e):

1   SUMMARY OUTPUT
3   Regression Statistics
4   Multiple R         0.832075415
5   R Square           0.692349497
6   Adjusted R Square  0.692064107
7   Standard Error     68205.74032
8   Observations       1080
10  ANOVA
11                     df             F              Significance F
12  Regression         1              2425.976064    3.3748E-278
13  Residual           1078
14  Total              1079
16                     Coefficients   Standard Error   t Stat        Lower 95%      Upper 95%
17  Intercept          55776.56564    2890.441213      19.29690357   50105.0373     61448.09398
18  X Variable 1       0.015421301    0.000313095      49.25419844   0.014806954    0.016035648

2.6.1b Scatter of Data and Fitted Quadratic Relationship

Go back to your br data worksheet and select A2:B1081. Select the Insert tab located next to the
Home tab. In the Charts group of commands select Scatter, and then Scatter with only
Markers.


The result is:

[Figure: default scatter chart of the br data, with price on the horizontal axis, sqft on the vertical axis, and the legend "Series1"]

You can see that our house price values are on the horizontal axis and the square footage values
are on the vertical axis; we would like to swap them and edit our chart as we did in Section
2.1 with our plot of food expenditure data. The result is (see also Figure 2.14 on p. 70 in
Principles of Econometrics, 4e):

[Figure: scatter of sale price (vertical axis) against Total Square Feet (horizontal axis)]

Finally, we add the fitted quadratic relationship to our scatter plot. In cells N1:N2 and O1:O3 of
your br data worksheet, enter the following column labels, formula, and data.

   N                                                                    O
1  quadratic price-hat                                                  sqft
2  ='Quadratic Model'!$B$17+'Quadratic Model'!$B$18*'br data'!O2^2     0
3                                                                       400

Select cells O2:O3, move your cursor to the lower right corner of your selection until it turns into
a skinny cross; left-click, hold it and drag it down to cell O22: Excel recognizes the series and
automatically completes it for you. Next, copy the content of cell N2 to cells N3:N22. Here is
how your table should look (only the first five values are shown below, rounded to one decimal
place):

N O
1 quadratic price-hat sqft
2 55776.6 0
3 58244.0 400
4 65646.2 800
5 77983.2 1200
6 95255.1 1600
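The price-hat column is simply the estimated quadratic evaluated on a grid of house sizes. A minimal Python sketch of the same calculation, using the coefficients from the Quadratic Model output (intercept in B17, coefficient of SQFT² in B18); the function name is hypothetical.

```python
ALPHA1 = 55776.56564   # Intercept ('Quadratic Model'!B17)
ALPHA2 = 0.015421301   # coefficient of SQFT^2 ('Quadratic Model'!B18)


def quadratic_price_hat(sqft: float) -> float:
    """Fitted price from PRICE-hat = alpha1 + alpha2 * SQFT^2."""
    return ALPHA1 + ALPHA2 * sqft ** 2


sqft_grid = list(range(0, 8001, 400))  # the 21 values in O2:O22
price_hat = [quadratic_price_hat(s) for s in sqft_grid]
for s, p in zip(sqft_grid[:5], price_hat[:5]):
    print(s, round(p, 1))
```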

Go back to your scatter plot and right-click in the middle of your chart area. Select Select Data.
In the Legend Entries (Series) window of the Select Data Source dialog box, select the Add
button. In the Series name window, type Fitted Quadratic Relationship. Select O2:O22 for the
Series X values and select N2:N22 for the Series Y values. Finally, select OK. The Fitted
Quadratic Relationship series has been added to your graph.


Before you close the Select Data Source dialog box, select Series1 and Edit. Type the name
Actual in the Series name window. Select OK. In the Select Data Source window that
reappears, select OK again.

Make sure your chart is selected so that the Chart Tools are visible. In the Layout tab, go to the
Labels group of commands. Select the Legend button and choose either one of the Overlay
Legend options. Grab your legend with your cursor and move it to the upper left corner of your
chart area.


Finally, we want to reformat our Fitted Quadratic Relationship values series. Select the plotted
series in your chart area, right-click and select Format Data Series. A Format Data Series
dialog box pops up. Select Line Color and Solid line. Change the line color to something
different from the Actual series points. Select Marker Options, and change the Marker Type
from Automatic to None. Select Close.


The result is (see also Figure 2.14 on p. 70 in Principles of Econometrics, 4e):

[Figure: sale price against Total Square Feet, showing the Actual data points and the Fitted Quadratic Relationship line]

2.6.2 A Log-Linear Model

2.6.2a Histograms of PRICE and ln(PRICE)


In your br data worksheet, insert a column to the right of the sqft2 column C (see Section 1.4 for
more details on how to do that). In your new cells D1:D2, enter the following column label and
formula.

D
1 ln(price)
2 =ln(A2)

Copy the content of cell D2 to cells D3:D1081. Here is how your table should look (only the
first five values are shown below):

A B C D
1 price sqft sqft2 ln(price)
2 66500 741 549081 11.10496
3 66000 741 549081 11.09741
4 68500 790 624100 11.13459
5 102000 2783 7745089 11.53273
6 54000 1165 1357225 10.89674

Next, we specify BIN values. These values determine the range of PRICE and ln(PRICE)
values covered by each column of the histogram, and they have to be given in ascending order. A
PRICE or ln(PRICE) value is counted in a particular bin if it is less than or equal to that bin value
and greater than the previous bin value; values less than or equal to the lowest bin value are
counted in the first bin, and values above the highest bin value are counted in an extra bin
labelled More.
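The bin rule can be written as a short Python sketch (hypothetical function name; the sample prices are made up). `bisect.bisect_left` returns the position of the first bin value greater than or equal to v, which is exactly the bin the value belongs in; one extra slot collects values above the highest bin.

```python
from bisect import bisect_left
from typing import List, Sequence


def histogram_counts(values: Sequence[float], bins: Sequence[float]) -> List[int]:
    """Count values per bin; bins must be in ascending order.

    Position i counts values v with bins[i-1] < v <= bins[i] (everything
    <= bins[0] for i = 0); the extra last position counts values above the
    highest bin, like the "More" row.
    """
    counts = [0] * (len(bins) + 1)
    for v in values:
        # bisect_left finds the first bin value that is >= v
        counts[bisect_left(bins, v)] += 1
    return counts


prices = [0, 50000, 50001, 99999, 2000000]  # hypothetical values
print(histogram_counts(prices, [0, 50000, 100000]))  # [1, 1, 2, 1]
```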

In cells S1:T3 of your br data worksheet, enter the following column labels and data.

S T
1 price bin lnprice bin
2 0 9
3 50000 9.2

Select cells S2:S3, move your cursor to the lower right corner of your selection until it turns into
a skinny cross; left-click, hold it and drag it down to cell S34: Excel recognizes the series and
automatically completes it for you. Similarly, select cells T2:T3, move your cursor to the lower
right corner of your selection until it turns into a skinny cross; left-click, hold it and drag it down
to cell T29. Here is how your table should look (only the first five values are shown
below):
S T
1 price bin lnprice bin
2 0 9
3 50000 9.2
4 100000 9.4
5 150000 9.6
6 200000 9.8
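Dragging the fill handle over two selected numbers continues the arithmetic series they start. A minimal Python sketch of the same idea (the helper `fill_series` is hypothetical) produces the two bin columns:

```python
from typing import List


def fill_series(first: float, second: float, count: int) -> List[float]:
    """Extend a linear series from its first two entries, as dragging the fill
    handle over two selected cells does."""
    step = second - first
    # round() tidies binary floating-point noise such as 9.399999999999999
    return [round(first + i * step, 10) for i in range(count)]


price_bins = fill_series(0, 50000, 33)    # S2:S34 -> 0, 50000, ..., 1600000
lnprice_bins = fill_series(9, 9.2, 28)    # T2:T29 -> 9, 9.2, ..., 14.4
print(price_bins[:5], price_bins[-1])
print(lnprice_bins[:5], lnprice_bins[-1])
```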

Select the Data tab, in the middle of your tab list. In the Analysis group of commands, at the far
right of the ribbon, select Data Analysis.


The Data Analysis dialog box pops up. In it, select Histogram (you might need to use the scroll
up and down bar to the right of the Analysis Tools window to find it), then select OK.

A Histogram dialog box pops up. For the Input Range, specify A2:A1081; for the Bin Range,
specify S2:S34. The Input Range indicates the data Excel will look at to determine how
many values are counted in each bin of the Bin Range. Check the New Worksheet Ply option
and name it Price Histogram; check the box next to Chart Output. Finally, select OK.


Select the columns in your chart area, right-click and select Format Data Series. The Series
Options tab of the Format Data Series dialog box should be open. Move the Gap Width slider
to the far left, towards No Gap.

Go to the Border Color tab and select Solid line; choose a different Color if you would like.
Select Close.

After editing our chart as we did in Section 2.1 with our plot of food expenditure data, the result
is (see also Figure 2.16(a) on p. 72 in Principles of Econometrics, 4e):

[Figure: histogram of PRICE; horizontal axis "Sale Price, dollars", vertical axis frequency]

Note that the frequencies given in the graph above are absolute frequencies (counts), while the
frequencies given in Figure 2.16(a) of Principles of Econometrics, 4e, are relative frequencies.
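Converting absolute frequencies to relative ones only requires dividing each count by the total number of observations. A minimal sketch with made-up counts; the function name is hypothetical.

```python
from typing import List, Sequence


def relative_frequencies(counts: Sequence[int]) -> List[float]:
    """Turn absolute frequencies (counts per bin) into percentages of the total."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no observations to scale")
    return [100.0 * c / total for c in counts]


print(relative_frequencies([1, 1, 2, 0]))  # [25.0, 25.0, 50.0, 0.0]
```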

Go back to your br data worksheet. In the Histogram dialog box, specify D2:D1081 for the
Input Range and T2:T29 for the Bin Range. Check the New Worksheet Ply option and name it
lnPrice Histogram; check the box next to Chart Output. Finally, select OK.


The final result is (see also Figure 2.16(b) on p. 72 in Principles of Econometrics, 4e):

[Figure: histogram of ln(PRICE); horizontal axis "lnPrice", vertical axis frequency]

Again, note that the frequencies given in the graph above are absolute frequencies, while those
given in Figure 2.16(b) of Principles of Econometrics, 4e, are relative frequencies.

2.6.2b Estimating the Model

We estimate the following log-linear model for house prices:

$$\ln(PRICE) = \gamma_1 + \gamma_2 SQFT + e \tag{2.16}$$

In the Regression dialog box, the Input Y Range should be D2:D1081, and the Input X Range
should be B2:B1081. Select New Worksheet Ply and name it Log-Linear Model. Finally, select
OK.


The result is (matching the one reported on p. 72 in Principles of Econometrics, 4e):


1   SUMMARY OUTPUT
3   Regression Statistics
4   Multiple R         0.790413619
5   R Square           0.624753689
6   Adjusted R Square  0.624405594
7   Standard Error     0.321465013
8   Observations       1080
10  ANOVA
11                     df             SS               MS            F             Significance F
12  Regression         1              185.4720974      185.4720974   1794.779738   1.1066E-231
13  Residual           1078           111.4002553      0.103339754
14  Total              1079           296.8723527
16                     Coefficients   Standard Error   t Stat        Lower 95%      Upper 95%
17  Intercept          10.83859632    0.024607484      440.4593      10.79031232    10.88688031
18  X Variable 1       0.000411269    9.70779E-06      42.36484082   0.000392221    0.000430317

2.6.2c Scatter of Data and Fitted Log-Linear Relationship

In cells Q1:Q2 of your br data worksheet, enter the following column label and formula.

Q
1 log-linear price-hat
2 =EXP('Log-Linear Model'!$B$17+'Log-Linear Model'!$B$18*'br data'!P2)

Next, copy the content of cell Q2 to cells Q3:Q22. Here is how your table should look (only the
first five values are shown below, rounded to one decimal place):

Q
1 log-linear price-hat
2 50949.8
3 60060.3
4 70799.8
5 83459.7
6 98383.3
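As a cross-check, the Q-column formula can be evaluated outside Excel. This Python sketch uses the coefficients as read from the Log-Linear Model output; since those printed values are rounded, results may differ from Excel's in the last digits. The function name is hypothetical.

```python
import math

GAMMA1 = 10.83859632   # Intercept ('Log-Linear Model'!B17)
GAMMA2 = 0.000411269   # coefficient of SQFT ('Log-Linear Model'!B18)


def loglinear_price_hat(sqft: float) -> float:
    """Fitted price exp(gamma1 + gamma2 * SQFT), as in the formula in Q2."""
    return math.exp(GAMMA1 + GAMMA2 * sqft)


sqft_grid = list(range(0, 8001, 400))  # the 21 sqft values in P2:P22
for s in sqft_grid[:5]:
    print(s, round(loglinear_price_hat(s), 1))
```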

Select your scatter plot of actual data points and fitted quadratic relationship and make a copy of
it. Right-click in the middle of the copy of your chart. Select Select Data. In the Legend Entries
(Series) window of the Select Data Source dialog box, select the Fitted Quadratic
Relationship series, and then the Edit button. In the Series name window, replace the old name
with Fitted Log-Linear Relationship. Select P2:P22 for the Series X values and select Q2:Q22
for the Series Y values. Finally, select OK twice. In the copied chart, the Fitted Log-Linear
Relationship series now replaces the fitted quadratic one.

The result is (see also Figure 2.17 on p. 73 in Principles of Econometrics, 4e):

[Figure: sale price against Total Square Feet, showing the Actual data points and the fitted log-linear relationship]

2.7 REGRESSION WITH INDICATOR VARIABLES

2.7.1 Histograms of House Prices

Open the Excel file utown. Excel opens the data set in Sheet 1 of a new Excel file. Since we
would like to save all our work from Chapter 2 in one file, create a new worksheet in your POE
Chapter 2 Excel file, name it utown data, and in it, copy the data set you just opened.


This data file contains a sample of 1000 observations on house prices in two neighborhoods. One
neighborhood, called University Town, is near a major university. The other, a similar
neighborhood called Golden Oaks, is a few miles away from the university.

In cells H1:H3 of your utown data worksheet, enter the following column label and data.

H
1 bin
2 125
3 137.5

Select cells H2:H3, move your cursor to the lower right corner of your selection until it turns into
a skinny cross; left-click, hold it and drag it down to cell H20. Here is how your
table should look (only the first five values are shown below):

H
1 bin
2 125
3 137.5
4 150
5 162.5
6 175

In the Histogram dialog box, specify A2:A482 for the Input Range and H2:H20 for the Bin
Range. Check the New Worksheet Ply option and name it Golden Oaks Prices Histogram;
check the box next to Chart Output. Finally, select OK.


The final result is (see also Figure 2.18 on p. 74 in Principles of Econometrics, 4e):

[Figure: histogram of House Prices ($1,000) in Golden Oaks, bins from 125 to 350, vertical axis frequency]

Note that the frequencies given in the graph above are absolute frequencies, while the frequencies
given in Figure 2.18 of Principles of Econometrics, 4e, are relative frequencies.

Go back to your utown data worksheet. In the Histogram dialog box, specify A483:A1001 for
the Input Range and H2:H20 for the Bin Range. Check the New Worksheet Ply option and
name it U Town Prices Histogram; check the box next to Chart Output. Finally, select OK.

The final result is (see also Figure 2.18 on p. 74 in Principles of Econometrics, 4e):

[Figure: histogram of House Prices ($1,000) in University Town, bins from 125 to 350, vertical axis frequency]

2.7.2 Estimating the Model

We estimate the following regression model for house prices:

$$PRICE = \beta_1 + \beta_2 UTOWN + e \tag{2.17}$$

The indicator variable is

$$UTOWN = \begin{cases} 1 & \text{house is in University Town} \\ 0 & \text{house is in Golden Oaks} \end{cases} \tag{2.18}$$
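With a single 0/1 regressor, least squares reproduces the two group averages: b1 is the sample mean price in Golden Oaks (UTOWN = 0) and b1 + b2 is the sample mean in University Town (UTOWN = 1). The Python sketch below illustrates this on made-up prices; the helper names are hypothetical.

```python
from typing import Sequence, Tuple


def ols(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Return (b1, b2) for the simple regression of ys on xs."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b2 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - b2 * x_bar, b2


def group_mean(ys: Sequence[float], ds: Sequence[int], group: int) -> float:
    """Average of ys over the observations whose indicator equals group."""
    chosen = [y for y, d in zip(ys, ds) if d == group]
    return sum(chosen) / len(chosen)


# Hypothetical prices ($1,000) and UTOWN indicator values
price = [200.0, 210.0, 220.0, 260.0, 280.0]
utown = [0, 0, 0, 1, 1]
b1, b2 = ols(utown, price)
print(b1, group_mean(price, utown, 0))       # intercept vs. Golden Oaks mean
print(b1 + b2, group_mean(price, utown, 1))  # intercept + slope vs. University Town mean
```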

Go back to your utown data worksheet.

In the Regression dialog box, the Input Y Range should be A2:A1001, and the Input X Range
should be D2:D1001. Select New Worksheet Ply and name it Indicator Variable Model.
Finally select OK.
The result is (matching the one reported on p. 75 in Principles of Econometrics, 4e):

1   SUMMARY OUTPUT
3   Regression Statistics
4   Multiple R         0.728744479
5   R Square           0.531068515
6   Adjusted R Square  0.530598645
7   Standard Error     28.90745008
8   Observations       1000
10  ANOVA
11                     df             SS               MS            F             Significance F
12  Regression         1              944476.7536      944476.7536   1130.242684   2.6479E-166
13  Residual           998            833969.3888      835.6406701
14  Total              999            1778446.142
16                     Coefficients   Standard Error   t Stat        Lower 95%      Upper 95%
17  Intercept          215.7324947    1.318066258      163.6734812   213.1459956    218.3189939
18  X Variable 1       61.5091066     1.829589113      33.61908214   57.9188238     65.09938951

This ends Chapter 2 of this manual.

You might want to save your work before you close shop.
CHAPTER 3

Interval Estimation and Hypothesis Testing

CHAPTER OUTLINE
3.1 Interval Estimation
3.1.1 The t-Distribution
3.1.1a The t-Distribution versus Normal Distribution
3.1.1b t-Critical Values and Interval Estimates
3.1.1c Percentile Values
3.1.1d TINV Function
3.1.1e Appendix E: Table 2 in POE
3.1.2 Obtaining Interval Estimates
3.1.3 An Illustration
3.1.3a Using the Interval Estimator Formula
3.1.3b Excel Regression Default Output
3.1.3c Excel Regression Confidence Level Option
3.1.4 The Repeated Sampling Context (Advanced Material)
3.1.4a Model Assumptions
3.1.4b Repeated Random Sampling
3.1.4c The LINEST Function Revisited
3.1.4d The Simulation Template
3.1.4e The IF Function
3.1.4f The OR Function
3.1.4g The COUNTIF Function
3.2 Hypothesis Tests
3.2.1 One-Tail Tests with Alternative "Greater Than" (>)
3.2.2 One-Tail Tests with Alternative "Less Than" (<)
3.2.3 Two-Tail Tests with Alternative "Not Equal To" (≠)
3.3 Examples of Hypothesis Tests
3.3.1 Right-Tail Tests
3.3.1a One-Tail Test of Significance
3.3.1b One-Tail Test of an Economic Hypothesis
3.3.2 Left-Tail Tests
3.3.3 Two-Tail Tests
3.3.3a Two-Tail Test of an Economic Hypothesis
3.3.3b Two-Tail Test of Significance
3.4 The p-Value
3.4.1 The p-Value Rule
3.4.1a Definition of p-Value
3.4.1b Justification for the p-Value Rule
3.4.2 The TDIST Function
3.4.3 Examples of Hypothesis Tests Revisited
3.4.3a Right-Tail Test from Section 3.3.1b
3.4.3b Left-Tail Test from Section 3.3.2
3.4.3c Two-Tail Test from Section 3.3.3a
3.4.3d Two-Tail Test from Section 3.3.3b


In this chapter we will use the t-distribution to construct interval estimates and perform
hypothesis tests. We continue to work with the simple linear regression model of weekly food
expenditure.

3.1 INTERVAL ESTIMATION

Open the Excel file food. Save it as POE Chapter 3.

Rename Sheet 1 Data. Quickly re-estimate the regression parameters using the Excel Regression
analysis routine, as in Section 2.2.2. In the Regression dialog box, the Input Y Range should be
A2:A41, and the Input X Range should be B2:B41. Select New Worksheet Ply and name it
Regression; you do not need to check the box next to Line Fit Plots.

3.1.1 The t-Distribution

3.1.1a The t-Distribution versus Normal Distribution

The t-distribution is a bell-shaped curve centered on, and symmetric around, its mean, which is
zero. It looks like the standard normal distribution, except that it is more spread out, with a larger
variance and thicker tails. The exact shape of the t-distribution is controlled by a single parameter
called the degrees of freedom, often abbreviated as df. The notation t(m) is used to specify a
t-distribution with m degrees of freedom.
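The claim that t(m) is more spread out than N(0,1) can be checked numerically. The short Python sketch below is our own illustration, not part of the Excel workflow: it evaluates the standard formula for the t(m) density with the standard library's math module and compares it with the N(0,1) density. The function names t_pdf, normal_pdf and t_variance are hypothetical.

```python
import math


def t_pdf(x: float, m: int) -> float:
    """Density of the t-distribution with m degrees of freedom."""
    # log of the normalizing constant; lgamma avoids overflow for large m
    log_c = (math.lgamma((m + 1) / 2) - math.lgamma(m / 2)
             - 0.5 * math.log(m * math.pi))
    return math.exp(log_c - (m + 1) / 2 * math.log1p(x * x / m))


def normal_pdf(x: float) -> float:
    """Density of the standard normal distribution N(0,1)."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def t_variance(m: int) -> float:
    """Variance of t(m), which is finite only for m > 2."""
    if m <= 2:
        raise ValueError("the variance of t(m) is finite only for m > 2")
    return m / (m - 2)


if __name__ == "__main__":
    for x in (0, 1, 2, 3, 4):
        print(f"x = {x}:  t(3) = {t_pdf(x, 3):.4f}   N(0,1) = {normal_pdf(x):.4f}")
    print("variance of t(3) =", t_variance(3), "; variance of N(0,1) = 1")
```

Running it should show a t(3) density of about 0.368 at zero, below the normal's 0.399, but a higher density than the normal from x = 2 outward, which is what "thicker tails" means; t_variance(3) returns 3, compared with 1 for N(0,1).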

Below is a graph of the t-distribution with m = 3 degrees of freedom and the standard normal
distribution.

[Figure: probability density functions of t(3) and N(0,1), plotted from -6 to 6]
Interval Estimation and Hypothesis Testing 69

3.1.1b t-Critical Values and Interval Estimates

In order to construct interval estimates, we will need critical values of t-distributions with various
degrees of freedom. The abbreviation used for a critical value is tc. The values -tc and tc are the
endpoints of a closed interval around zero such that the probability of drawing a t-value in this
interval is (1 - α), and the probability is α that a value is either less than -tc or greater than tc.
Since the distribution is symmetric, the probability that a t-value is less than -tc is α/2, and
the probability that a t-value is greater than tc is α/2.

We are usually interested in the critical value tc such that the probability that a randomly drawn
t-value is within the closed interval [-tc, tc] is 0.95 or 0.99, which means that the probability of a
value outside the interval, in the tails of the distribution, is only 0.05 or 0.01.

Let α = 0.05. This leads to a closed interval [-tc, tc] such that the probability is
(1 - α) = (1 - 0.05) = 0.95 of randomly drawing a t-value in this interval.

3.1.1c Percentile Values

Since the probability is α/2 that a t-value is greater than tc, the probability of drawing a t-value
less than or equal to tc is (1 - α/2). The critical value tc is therefore the 100(1 - α/2)
percentile of the t-distribution, denoted t(1-α/2, m).

3.1.1d TINV Function

We will use the TINV function to compute t-critical values. First, we create a new worksheet and
a table where we will store our computations.

Insert a new worksheet by selecting the Insert Worksheet tab at the lower left corner of your
screen, next to the Data tab. Name it t-critical value.


Select cell A1. Select the Insert tab, located next to the Home tab. In the Text group of commands,
select Symbol. In the Symbol dialog box, the Symbols tab should be open. Select α (you might
need to use the scroll bar to move up and down the window to find this symbol). Finally, select
Insert.

Fill in the rest as shown below:


I A I s:
1 ll = 0•.05
,__ .1
2 m= --
3!.l
3
tc=

t-critical values are obtained in Excel by using the TINV function. The syntax of the TINV
function is as follows:
=TINV(α, m)

To find the t-critical value for α = 0.05 (the combined probability in the two tails) and m = 38,
given the way we organized our table above, we need to write the following formula in B3:

=TINV(B1, B2)


Here is the t-critical value that you should get:

    A       B
3   tc =    2.024394

Although we could have entered the α and m values directly into the TINV function,
=TINV(0.05, 38), we chose instead to refer to the cells where we have stored and displayed those
values. Displaying the values of the function's arguments makes our worksheet much easier to read
and understand. In addition, we can compute a new t-critical value simply by changing one or both
arguments' values.
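To see what TINV computes, here is a minimal Python sketch that reproduces it under our own assumptions: the t cumulative distribution function is obtained by integrating the t density with Simpson's rule, and the critical value is then found by bisection. The names t_pdf, t_cdf and tinv are hypothetical, and this is a numerical illustration, not how Excel implements the function.

```python
import math


def t_pdf(x: float, m: int) -> float:
    """Density of the t-distribution with m degrees of freedom."""
    log_c = (math.lgamma((m + 1) / 2) - math.lgamma(m / 2)
             - 0.5 * math.log(m * math.pi))
    return math.exp(log_c - (m + 1) / 2 * math.log1p(x * x / m))


def t_cdf(x: float, m: int, steps: int = 2000) -> float:
    """P(t(m) <= x): Simpson's rule on [0, |x|] plus symmetry about zero."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, m) + t_pdf(a, m)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, m)
    area = total * h / 3  # P(0 <= t(m) <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area


def tinv(alpha: float, m: int) -> float:
    """Two-tailed critical value tc with P(|t(m)| >= tc) = alpha, like TINV(alpha, m)."""
    target = 1 - alpha / 2  # tc is the 100(1 - alpha/2) percentile
    lo, hi = 0.0, 100.0
    for _ in range(60):  # bisection works because t_cdf increases with x
        mid = (lo + hi) / 2
        if t_cdf(mid, m) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    print(round(tinv(0.05, 38), 6))
    print(round(tinv(0.01, 38), 6))
```

The two printed values should agree with the Excel results for m = 38 (2.024394 and 2.711558) to about six decimals. Because TINV is two-tailed, a one-tail critical value is obtained by passing 2α; for example, tinv(0.10, 38) gives the 95th percentile, about 1.685954.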

In cell B1, change α from 0.05 to 0.01. Here is how your table should look:

    A       B
1   α =     0.01
2   m =     38
3   tc =    2.711558

For α = 0.01, holding m constant, the t-critical value is 2.711558.



3.1.1e Appendix E: Table 2 in POE

Alternatively, we could have obtained those t-critical values from Table 2 at the end of Principles
of Econometrics, 4e. Recall that the critical value tc is also the 100(1 - α/2)th percentile of the
t-distribution, denoted t(1-α/2, m). For α = 0.05 and m = 38, the critical value tc is the
100(1 - α/2) = 100(1 - 0.05/2) = 100(1 - 0.025) = 97.5, or 97.5th, percentile of the
t-distribution, denoted t(.975, 38). At the intersection of the column labeled "t(.975, df)" and the
row "38" degrees of freedom (df), tc = 2.024.

For α = 0.01, holding m constant, the critical value tc is the 100(1 - α/2) =
100(1 - 0.01/2) = 100(1 - 0.005) = 99.5, or 99.5th, percentile of the t-distribution, t(.995, 38).
Its value is found at the intersection of the column labeled "t(.995, df)" and the row "38" degrees
of freedom (df): tc = 2.712. These t-critical values differ slightly from the ones we obtained in
Excel because the values in Table 2 are rounded.

3.1.2 Obtaining Interval Estimates

The interval estimator of βk is defined as:

$$b_k \pm t_c\,\mathrm{se}(b_k) \qquad (3.1)$$

The interval bk ± tc se(bk) has probability (1 - α) of containing the true but unknown parameter
βk. When we apply it to a particular sample of data, we say that we have a 100(1 - α)% interval
estimate, or 100(1 - α)% confidence interval.

We are usually interested in constructing either a 95% or a 99% confidence interval, so the
corresponding α values that we would use to get our t-critical values are α = 0.05 and α = 0.01.

To obtain the interval estimates, we use equation (3.1), replacing the least squares estimators bk
and their standard errors se(bk) by their estimated values, and using the appropriate critical
t-value tc. The lower limit (LL) and the upper limit (UL) of the interval will be:

$$LL = b_k - t_c\,\mathrm{se}(b_k) \qquad (3.2)$$

$$UL = b_k + t_c\,\mathrm{se}(b_k) \qquad (3.3)$$

3.1.3 An Illustration

In this section, we will first illustrate how to obtain an interval estimate by plugging values into
the interval estimator's formula. Next, we will go back to the Excel regression analysis tool, look
at the output we have already generated, and use its built-in option to generate additional interval
estimates.

3.1.3a Using the Interval Estimator Formula

We create a template to compute the interval estimates for the least squares regression parameters
of the food expenditure model.

Insert a new worksheet by selecting the Insert Worksheet tab at the lower left corner of your
screen, next to the t-critical value tab. Name it Interval Estimate.


Create the following template to construct interval estimates:

    A                    B                        C
1   Data Input           Sample Size =            =Regression!B8
2                        Confidence Level =
3                        Estimated bk =           =Regression!B18
4                        Standard Error of bk =   =Regression!C18
5
6   Computed Values      α =                      =1-C2
7                        df or m =                =C1-2
8                        tc =                     =TINV(C6,C7)
9
10  Interval Estimate    Lower Limit =            =C3-C8*C4
11                       Upper Limit =            =C3+C8*C4

Note that we get the sample size, estimated coefficient and standard error from our Regression
worksheet. All you have to do in cells C1 and C3:C4 is, first, type the equal sign, then select the
needed value in the Regression worksheet with your cursor, and finally press Enter. We are
computing the interval estimate for β2, the slope parameter. Cell C2 is left blank for now.
Later, you will enter either 95 or 99 depending on whether you are constructing a 95% or a 99%
confidence interval, but you could also enter any other confidence level. In cell C6, the α level
is computed from the level of confidence entered in C2. In cell C7, the degrees of freedom are set
equal to N - 2, where N is the sample size, which we record in cell C1. Cell C8 is where the
critical t-value is computed, as shown in Section 3.1.1d. Cells C10:C11 are where the limits of the
interval estimate are computed, using equations (3.2) and (3.3).

Before we specify our level of confidence, we would like to reformat C2 so that the level of
confidence is displayed as a percentage. Right-click cell C2 and select Format Cells on the
shortcut menu that opens. In the Format Cells dialog box, select Percentage in the Category
window and set Decimal places to 0 (use the up and down arrows to the right of the Decimal
places box). Finally, select OK.

Reformat cell C6 the same way.



Here are the results you should get for a 95% confidence interval estimate for β2 (make sure you
type 95, and not 0.95, in C2):

    A                    B                        C
1   Data Input           Sample Size =            40
2                        Confidence Level =       95%
3                        Estimated bk =           10.20964
4                        Standard Error of bk =   2.093263
5
6   Computed Values      α =                      5%
7                        df or m =                38
8                        tc =                     2.024394
9
10  Interval Estimate    Lower Limit =            5.972052
11                       Upper Limit =            14.44723

The lower and upper limits of the interval estimate above should be the same as those reported on
p. 98 of Principles of Econometrics, 4e.
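The arithmetic of the Interval Estimate template can be sketched in Python as a cross-check. To keep the sketch short, it does not recompute TINV: it assumes the two critical values for m = 38 found in Section 3.1.1d (2.024394 and 2.711558), and the function name interval_template is hypothetical. The comments map each line to the template cell it mirrors.

```python
# t-critical values for m = 38, copied from the TINV results in Section 3.1.1d
TC_M38 = {0.05: 2.024394, 0.01: 2.711558}


def interval_template(n: int, conf_level: float, b_k: float, se_k: float) -> dict:
    """Mirror the Interval Estimate worksheet (cells C1:C11)."""
    alpha = round(1 - conf_level, 4)   # C6: =1-C2
    m = n - 2                          # C7: =C1-2
    if m != 38 or alpha not in TC_M38:
        raise ValueError("this sketch only stores tc for m = 38 and alpha in {0.05, 0.01}")
    tc = TC_M38[alpha]                 # C8: =TINV(C6,C7)
    lower = b_k - tc * se_k            # C10: =C3-C8*C4, equation (3.2)
    upper = b_k + tc * se_k            # C11: =C3+C8*C4, equation (3.3)
    return {"alpha": alpha, "m": m, "tc": tc, "LL": lower, "UL": upper}


if __name__ == "__main__":
    # slope estimate and standard error from Regression!B18 and Regression!C18
    for level in (0.95, 0.99):
        res = interval_template(40, level, 10.2096425, 2.093263461)
        print(f"{level:.0%}: [{res['LL']:.6f}, {res['UL']:.6f}]")
```

With the slope estimate and standard error from the Regression worksheet, the 95% line should reproduce the limits above (about 5.972 and 14.447), and the 99% line the limits that Excel reports when its Confidence Level option is set to 99.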

We plugged values into equation (3.1), using a template, to obtain interval estimates. Next, we
will go to our Regression worksheet and look at the interval estimates Excel has already
generated in the regression summary output.

3.1.3b Excel Regression Default Output

Go to your Regression worksheet and look at the last table of the summary output. Columns F
and G of that table present the lower and upper limits of the interval estimates for the intercept
and slope parameters, β1 and β2 (cells F17:G18 below). The Excel regression analysis routine
automatically generates 95% interval estimates.

In cell F18, you can find the lower limit of the interval estimate for β2, and in cell G18 its
upper limit. These values are identical to the ones you computed in your Interval Estimate
worksheet.

    A                   B              C                D              E              F               G              H               I
1   SUMMARY OUTPUT
2
3   Regression Statistics
4   Multiple R          0.620485472
5   R Square            0.385002221
6   Adjusted R Square   0.368818069
7   Standard Error      89.51700429
8   Observations        40
9
10  ANOVA
11                      df             SS               MS             F              Significance F
12  Regression          1              190626.9788      190626.9788    23.78884107    1.94586E-05
13  Residual            38             304505.1742      8013.294058
14  Total               39             495132.153
15
16                      Coefficients   Standard Error   t Stat         P-value        Lower 95%       Upper 95%      Lower 95.0%     Upper 95.0%
17  Intercept           83.41600997    43.41016192      1.921577951    0.062182379    -4.463267721    171.2952877    -4.463267721    171.2952877
18  X Variable 1        10.2096425     2.093263461      4.877380554    1.94586E-05    5.972052202     14.4472328     5.972052202     14.4472328

Excel actually reports the interval estimate for β2 twice: in cells F18:G18, and again in cells
H18:I18. The table is laid out this way so that, if you choose, Excel can also report interval
estimates at a confidence level other than 95%.

3.1.3c Excel Regression Confidence Level Option

Go back to your Data worksheet. From there, select the Data tab, the Data Analysis button in
the Analysis group of commands, and Regression in the Analysis Tools window. In the
Regression dialog box, check the box next to Confidence Level and type in 99. Select New
Worksheet Ply and name it Regression and 99% CI (for Confidence Interval). Select OK.


Alongside the 95% interval estimates, Excel now also reports 99% interval estimates for β1 and
β2 (cells H16:I18 below):

    A                   B              C                D              E              F               G              H               I
1   SUMMARY OUTPUT
2
3   Regression Statistics
4   Multiple R          0.620485472
5   R Square            0.385002221
6   Adjusted R Square   0.368818069
7   Standard Error      89.51700429
8   Observations        40
9
10  ANOVA
11                      df             SS               MS             F              Significance F
12  Regression          1              190626.9788      190626.9788    23.78884107    1.94586E-05
13  Residual            38             304505.1742      8013.294058
14  Total               39             495132.153
15
16                      Coefficients   Standard Error   t Stat         P-value        Lower 95%       Upper 95%      Lower 99.0%     Upper 99.0%
17  Intercept           83.41600997    43.41016192      1.921577951    0.062182379    -4.463267721    171.2952877    -34.29314438    201.1251643
18  X Variable 1        10.2096425     2.093263461      4.877380554    1.94586E-05    5.972052202     14.4472328     4.533638        15.885647

The interpretation of confidence intervals requires a great deal of care. Being 95% or 99%
confident about our interval estimates means that if we were to repeat, many more times, this
exercise of drawing a sample of size N = 40, estimating the least squares regression parameters,
and constructing interval estimates for those parameters, then in the long run 95% or 99% of all
the interval estimates constructed this way would contain the true parameter values. To illustrate
this concept we return to our simulation exercise of Section 2.4.4.

3.1.4 The Repeated Sampling Context (Advanced Material)

In Section 2.4.4 we drew many random samples of size N = 40, and, based on each, estimated
the corresponding least squares regression parameters. We can repeat this exercise and extend it
to compute, for each sample, not only least squares estimates, but interval estimates as well.

Note that in Section 3.1.4 of Principles of Econometrics, 4e, 10 samples were randomly drawn
from a population with unknown parameters, while in this section we will draw 100 samples from
a population with known parameters.

3.1.4a Model Assumptions

In the simulation exercise we consider in this section, half of our hypothetical population of
three-person households has a weekly income of $1000 (x = 10), and the other half has a weekly
income of $2000 (x = 20). Because we know the data generation process, we know the values of
the population parameters of the normal distribution, and consequently the values of our
regression parameters. Let μy|x=10 = 200, μy|x=20 = 300, and var(y|x = 10) = var(y|x = 20) =
σ² = 2500. This implies β1 = 100 and β2 = 10.
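The last sentence follows from E(y|x) = β1 + β2x evaluated at the two income levels; written out:

```latex
\begin{aligned}
\beta_2 &= \frac{\mu_{y|x=20} - \mu_{y|x=10}}{20 - 10} = \frac{300 - 200}{10} = 10,\\
\beta_1 &= \mu_{y|x=10} - 10\,\beta_2 = 200 - 10 \times 10 = 100,\\
\sigma  &= \sqrt{2500} = 50.
\end{aligned}
```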

3.1.4b Repeated Random Sampling

We will draw random samples of 40 households from our population. Half of each sample will be
drawn from the first type of households, with weekly income x = 10; and half of each sample
will be drawn from the second type of households, with weekly income x = 20.

First, insert a new worksheet in your workbook by selecting the Insert Worksheet tab at the
bottom of your screen, next to the Interval Estimate tab. Name it Simulation.

Let us record the level of weekly income for our 40 households in column A of our Simulation
worksheet: in cell A1, type x and right-align it; in cells A2:A21, enter the value 10; in cells
A22:A41, enter the value 20.


Next, use the Random Number Generation analysis tool to draw 100 random samples of
households.

Select the Data tab, in the middle of your tab list. In the Analysis group of commands, at the far
right of the ribbon, select Data Analysis.


The Data Analysis dialog box pops up. In it, select Random Number Generation (you might
need to use the scroll bar to the right of the Analysis Tools window to find it), then
select OK.


A Random Number Generation dialog box pops up. Since we are drawing 100 random
samples, we specify 100 in the Number of Variables window. We first draw random samples of
20 from households with weekly income of x = 10, so we specify the Number of Random
Numbers to be 20. For simplicity, we assume that our population of households is normally
distributed, so this is the distribution we choose. Once you have selected Normal in the
Distribution window, you will be able to specify its Parameters: for x = 10, its Mean is
μy|x=10 = 200 and its Standard Deviation is √var(y|x = 10) = σ = 50. Select Output
Range in the Output options section, and specify it to be B2:CW21. Finally, select OK.


Repeat these steps to draw random samples of 20 from households with weekly income of x = 20.
Change the Mean to μy|x=20 = 300 and the Output Range to B22:CW41.


3.1.4c The LINEST Function Revisited

This time we use the LINEST function to obtain the least squares estimates and their standard
errors. The LINEST function can compute the latter, if you ask it to return additional regression
statistics. For this purpose, the general syntax of the LINEST function is as follows:

= LINEST(y's, x's, , TRUE)

The first argument of the LINEST function specifies the y values; the second argument specifies
the x values; we leave the third argument empty by putting only a space between the second and
third commas (when it is omitted, LINEST estimates the intercept as usual); and the fourth
argument, TRUE, indicates that we would like LINEST to return additional regression statistics.

The LINEST function returns a table, held in Excel's memory, that contains the least squares
estimates and their standard errors. The following illustration shows the order in which they are
reported:

         column 1    column 2
row 1    b2          b1
row 2    se(b2)      se(b1)

We nest the LINEST function in the INDEX function to get the estimated coefficients, one at a
time. The INDEX function returns values from within a table. The INDEX function general
syntax is as follows:
= INDEX(table of results, row_num, column_num)

The first argument of the INDEX function specifies which table to get the results from. The
second argument and third argument indicate the intersection of a row and a column at which the
result of interest can be found.

The nested commands will thus be as follows:

b1: =INDEX(LINEST(y-values,x-values,,TRUE),1,2)
se(b1): =INDEX(LINEST(y-values,x-values,,TRUE),2,2)
b2: =INDEX(LINEST(y-values,x-values,,TRUE),1,1)
se(b2): =INDEX(LINEST(y-values,x-values,,TRUE),2,1)
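The layout of the LINEST table and the INDEX lookup can be mimicked in a short Python sketch. The functions linest and index below are hypothetical stand-ins: they return only the first two rows of what Excel's LINEST(y, x, , TRUE) reports for a single regressor, computed with the usual least squares formulas for b1, b2 and their standard errors, and the small data set in the example is made up.

```python
import math


def linest(y, x):
    """First two rows of LINEST(y, x, , TRUE) for one regressor:
    [[b2, b1], [se(b2), se(b1)]]."""
    n = len(y)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b2 = sxy / sxx                       # slope
    b1 = ybar - b2 * xbar                # intercept
    sse = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = sse / (n - 2)               # estimated error variance
    se_b2 = math.sqrt(sigma2 / sxx)
    se_b1 = math.sqrt(sigma2 * sum(xi * xi for xi in x) / (n * sxx))
    return [[b2, b1], [se_b2, se_b1]]


def index(table, row_num, column_num):
    """1-based lookup, like INDEX(table, row_num, column_num)."""
    return table[row_num - 1][column_num - 1]


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]   # made-up data for illustration only
    y = [1, 3, 2, 5, 4]
    table = linest(y, x)
    print("b1 =", index(table, 1, 2), " se(b1) =", index(table, 2, 2))
    print("b2 =", index(table, 1, 1), " se(b2) =", index(table, 2, 1))
```

For these made-up data, hand computation gives b2 = 0.8, b1 = 0.6, se(b2) = √0.12 ≈ 0.346 and se(b1) = √1.32 ≈ 1.149, which the sketch should print up to floating-point rounding. Excel's LINEST with TRUE returns three more rows of statistics, which this sketch leaves out.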

3.1.4d The Simulation Template

We will report our estimated coefficients and standard errors at the bottom of our table of random
samples. We will also compute our t-critical value and the limits of our interval estimates (Lower
Limit: LL and Upper Limit: UL). Finally, we would like to count how many of our 100 interval
estimates contain the true parameter values.

We will specify cells A42:B57 as shown below (we outlined some cells in different shades of
gray only to distinguish groups of similar or related cells which we comment on shortly):

    A            B
42  N =          40
43  α =          0.05
44  m =          =B42-2
45  tc =         =TINV(B43,B44)
46  b1 =         =INDEX(LINEST(B2:B41,$A$2:$A$41,,TRUE),1,2)
47  se(b1) =     =INDEX(LINEST(B2:B41,$A$2:$A$41,,TRUE),2,2)
48  LL =         =B46-$B$45*B47
49  UL =         =B46+$B$45*B47
50  β1 in CI     =IF(OR(100<B48,100>B49),"No","Yes")
51  Yes's =      =COUNTIF(B50:CW50,"Yes")
52  b2 =         =INDEX(LINEST(B2:B41,$A$2:$A$41,,TRUE),1,1)
53  se(b2) =     =INDEX(LINEST(B2:B41,$A$2:$A$41,,TRUE),2,1)
54  LL =         =B52-$B$45*B53
55  UL =         =B52+$B$45*B53
56  β2 in CI     =IF(OR(10<B54,10>B55),"No","Yes")
57  Yes's =      =COUNTIF(B56:CW56,"Yes")

In cells A42:B43, the N (sample size) and α values are specified so that m (degrees of freedom)
and tc (t-critical value) can be computed and reported in cells A44:B45. tc is computed as shown
in Section 3.1.1d.

Cells A46:B47 and A52:B53 are used to compute and report the coefficient and standard error
estimates, as explained in Section 3.1.4c. The cell references to the x values are absolute,
$A$2:$A$41, rather than relative, because we use the same x values for all 100 repetitions.

Cells A48:B49 and A54:B55 are used to compute and report the interval estimates, as explained in
Section 3.1.2. The value of tc is the same for all repetitions, so its cell reference is also absolute,
$B$45, in the formulas for the interval limits.

3.1.4e The IF Function

We make use of the IF and OR logical functions to indicate, for each interval estimate, whether
or not it contains the true parameter value. The general syntax for the IF function is as follows:

IF(logical_test,value_if_true,value_if_false)

Logical_test is any value or expression that can be evaluated to TRUE or FALSE. In this
exercise we want to determine whether or not the true parameter value, βk, is within the estimated
interval [LL, UL], where LL = bk - tc se(bk) and UL = bk + tc se(bk). The logical expression
we use is: βk < LL or βk > UL. If βk is outside [LL, UL], this expression is TRUE;
otherwise, it is FALSE.

Value_if_true is the value that is returned if logical_test is TRUE. For example, if this argument
is the text string "No" and the logical_test argument is TRUE, then the IF function displays the
text "No".

Value_if_false is the value that is returned if logical_test is FALSE. For example, if this
argument is the text string "Yes" and the logical_test argument is FALSE, then the IF function
displays the text "Yes".

3.1.4f The OR Function

We use the OR function to write our logical test. The general syntax of the OR function is as
follows:
OR(argument_1,argument_2)

If the first logical expression, argument_1, or the second logical expression, argument_2, is
TRUE (or both are), then the OR function returns TRUE. It returns FALSE only if both arguments
are FALSE.

The general syntax for the OR function, nested in the IF function, is:

IF(OR(argument_1,argument_2),value_if_true,value_if_false)

Applied to our exercise, the nested function looks like this (this is what we have in cells B50 and
B56, with βk replaced by 100 and 10, respectively):

IF(OR(βk < LL, βk > UL),"No","Yes")

If βk is outside [LL, UL], then the logical test βk < LL or βk > UL is TRUE, and "No" is
returned to indicate that βk is not in the estimated confidence interval. Otherwise, the logical
expression is FALSE, and "Yes" is returned to indicate that βk is in the estimated confidence
interval.

3.1.4g The COUNTIF Function

Finally, we use the COUNTIF function to count the number of times βk is found within the
estimated interval [LL, UL].

The COUNTIF function is a statistical function that counts the number of cells within a range
that meet a given criterion. Its general syntax is:

COUNTIF(cell_range,criteria)

Cell_range is one or more cells to count. Criteria is the number, expression, cell reference, or
text that defines which cells will be counted. Since we are interested in counting how many of the
interval estimates we construct actually contain the true parameter value, we count the "Yes"
values generated by our logical test (this is what we do in cells B51 and B57):
COUNTIF(cell_range,"Yes")

Once you have reviewed and understood the formulas and values in B42:B57, copy the content of
B46:B50 to C46:CW50 and the content of B52:B56 to C52:CW56.

Here is how our worksheet looks (only 10 of the 100 simulation results are shown below):

[Table: rows 42 to 57 of the Simulation worksheet for the first 10 of the 100 samples (columns B to K), with N = 40, α = 0.05, m = 38 and tc = 2.024394]

We find that 98 of our 100 confidence intervals contain the true parameter value, for both the
intercept and the slope. Note that you will draw different random samples, obtain different
interval estimates, and thus obtain a different number of intervals that contain the true parameter
values.

We then extended our repetitions to 1,000 samples and found that 959 of the 1,000 interval
estimates contained β1, and 962 of the 1,000 contained β2. Finally, we extended the repetitions to
10,000 samples and found that 95.08% of the interval estimates for both the intercept and the
slope contained the true parameter values, close to the nominal 95% level.
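The whole Simulation worksheet, from the Random Number Generation step to the COUNTIF counts, can be sketched in Python as a check on the repeated-sampling interpretation. This is an illustration under our own assumptions: it uses Python's random module instead of Excel's generator (so its counts will differ from ours), it hard-codes tc = 2.024394 from Section 3.1.1d, and the function names ols, in_ci and simulate are hypothetical.

```python
import math
import random

TRUE_B1, TRUE_B2, SIGMA = 100.0, 10.0, 50.0   # population values from Section 3.1.4a
TC = 2.024394                                  # TINV(0.05, 38) from Section 3.1.1d
X = [10.0] * 20 + [20.0] * 20                  # column A of the Simulation worksheet


def ols(y, x):
    """Least squares b1, se(b1), b2, se(b2), as the INDEX/LINEST formulas return."""
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b2 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b1 = ybar - b2 * xbar
    sigma2 = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    se_b1 = math.sqrt(sigma2 * sum(xi * xi for xi in x) / (n * sxx))
    se_b2 = math.sqrt(sigma2 / sxx)
    return b1, se_b1, b2, se_b2


def in_ci(beta, lower, upper):
    """Same logic as =IF(OR(beta<LL, beta>UL), "No", "Yes")."""
    return "No" if (beta < lower or beta > upper) else "Yes"


def simulate(reps=100, seed=2011):
    rng = random.Random(seed)
    flags_b1, flags_b2 = [], []
    for _ in range(reps):
        # one sample: y ~ N(100 + 10x, 50^2), i.e. mean 200 at x = 10 and 300 at x = 20
        y = [rng.gauss(TRUE_B1 + TRUE_B2 * xi, SIGMA) for xi in X]
        b1, se_b1, b2, se_b2 = ols(y, X)
        flags_b1.append(in_ci(TRUE_B1, b1 - TC * se_b1, b1 + TC * se_b1))
        flags_b2.append(in_ci(TRUE_B2, b2 - TC * se_b2, b2 + TC * se_b2))
    # equivalent of COUNTIF(range, "Yes") for each parameter
    return flags_b1.count("Yes"), flags_b2.count("Yes")


if __name__ == "__main__":
    for reps in (100, 1000):
        yes_b1, yes_b2 = simulate(reps)
        print(f"{reps} samples: {yes_b1 / reps:.2%} contain beta1, {yes_b2 / reps:.2%} contain beta2")
```

Because the errors are normal and tc is the exact t-critical value, the share of "Yes" results should settle near 95% as the number of samples grows; with only 100 samples, counts a few points above or below 95 are to be expected. Passing reps=10000 mirrors our largest run.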

In the next section of this chapter, we perform hypothesis tests. To work through examples of
hypothesis tests, we return to our simple linear regression model of weekly food
expenditure.

3.2 HYPOTHESIS TESTS

If the null hypothesis H0: βk = c is true, then the test statistic t = (bk - c)/se(bk) follows a
t-distribution with m = N - 2 degrees of freedom:

$$t = \frac{b_k - c}{\mathrm{se}(b_k)} \sim t_{(N-2)} \qquad (3.4)$$

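When we reject H0, we accept a logical alternative hypothesis H1. There are three possible
alternative hypotheses to H0:

$$H_1\colon \beta_k > c \qquad (3.5)$$

$$H_1\colon \beta_k < c \qquad (3.6)$$

$$H_1\colon \beta_k \neq c \qquad (3.7)$$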

3.2.1 One-Tail Tests with Alternative "Greater Than" (>)


If the alternative hypothesis (3.5) is true, then the value of the computed test statistic will tend to
be unusually large. We will reject H0 if the test statistic is in the right tail of the distribution.

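[Figure: t(m) density with the right-tail rejection region labeled "Reject H0: βk = c" and the rest of the distribution labeled "Do not reject H0: βk = c"]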

Note that in this case the probability is α that a randomly drawn t-value is equal to or greater than
tc, where tc is the lower limit of the right-tail rejection region shown in the graph
above.

3.2.2 One-Tail Tests with Alternative "Less Than" (<)

If the alternative hypothesis (3.6) is true, then the value of the computed test statistic will tend to
be unusually small. We will reject H0 if the test statistic is in the left tail of the distribution.

[Figure: t(m) density with the left-tail rejection region labeled "Reject H0: βk = c"]

Note that in this case the probability is α that a randomly drawn t-value is equal to or less than tc,
where tc is the upper limit of the left-tail rejection region shown in the graph above.

3.2.3 Two-Tail Tests with Alternative "Not Equal To" (≠)


If the alternative hypothesis (3.7) is true, then the value of the computed test statistic will tend to
be unusually small or unusually large. We will reject H0 if the test statistic is in either the left tail
or the right tail of the distribution.
[Figure: t(m) density with rejection regions "Reject H0: βk = c, accept H1: βk ≠ c" in both tails and "Do not reject H0: βk = c" in the middle]

Note that in this case the probability is α that a randomly drawn t-value will fall in the tails of the
distribution, either equal to or less than t(α/2, N-2) or equal to or greater than t(1-α/2, N-2). These
limits are shown in the graph above. (They correspond to the values -tc and tc first
defined in Section 3.1.1b.)
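The two-tail rule can be written out in a few lines of Python as a sketch. The function names t_statistic and two_tail_test are hypothetical, tc is passed in rather than computed (for α = 0.05 and m = 38 it is TINV(0.05, 38) = 2.024394 from Section 3.1.1d), and the estimates are those of the Regression worksheet.

```python
def t_statistic(b_k: float, se_k: float, c: float) -> float:
    """Test statistic (3.4): t = (b_k - c) / se(b_k)."""
    return (b_k - c) / se_k


def two_tail_test(t_stat: float, tc: float) -> str:
    """Reject H0: beta_k = c in favor of beta_k != c when t <= -tc or t >= tc."""
    return "Reject Ho" if (t_stat <= -tc or t_stat >= tc) else "Do Not Reject Ho"


if __name__ == "__main__":
    tc = 2.024394  # TINV(0.05, 38): alpha split equally between the two tails
    for name, b, se in (("intercept", 83.41600997, 43.41016192),
                        ("slope", 10.2096425, 2.093263461)):
        t = t_statistic(b, se, 0.0)
        print(f"{name}: t = {t:.6f} -> {two_tail_test(t, tc)}")
```

For the slope, t ≈ 4.877 lies beyond tc, so H0: β2 = 0 is rejected at the 5% level; for the intercept, t ≈ 1.922 does not, which agrees with its p-value of 0.062 in the Regression output being above 0.05.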

3.3 EXAMPLES OF HYPOTHESIS TESTS

We illustrate the mechanics of hypothesis testing using the food expenditure model. We give
examples of right-tail, left-tail, and two-tail tests. Note that when the null hypothesis of a test is
that the parameter is zero, the test is called a test of significance. We can have one-tail tests of
significance or two-tail tests of significance.

Recall our estimated regression model; below the estimated values for b1 and b2, we report their
estimated standard errors, se(b1) and se(b2):

$$\underset{(\mathrm{se})}{\hat{y}_i} = \underset{(43.41)}{83.42} + \underset{(2.09)}{10.21}\,x_i \qquad (3.8)$$

3.3.1 Right-Tail Tests

We create a template for right-tail tests.

Insert a new worksheet by selecting the Insert Worksheet tab at the bottom of your screen, next
to the Simulation tab. Name it Right-Tail Tests.


Create the following template to perform right-tail tests:

    A                  B                    C
1   Data Input         N =                  =Regression!B8
2                      bk =                 =Regression!B18
3                      se(bk) =             =Regression!C18
4                      Ho: βk =
5                      α =
6
7   Computed Values    df or m =            =C1-2
8                      tc =                 =TINV(C5*2,C7)
9
10  Right-Tail Test    t-statistic =        =(C2-C4)/C3
11                     Conclusion:          =IF(C10>=C8,"Reject Ho","Do Not Reject Ho")

We get the sample size N, the estimated coefficient b2 and its standard error se(b2) from our
Regression worksheet. All you have to do in each of cells C1:C3 is, first, type the equal sign, then
select the needed value in the Regression worksheet with your cursor, and finally press Enter.
We are performing hypothesis tests on the slope parameter, β2. Cells C4:C5 are left blank for
now. Later, you will specify the value c that you hypothesize β2 takes, as well as the level of
significance of your test (α). In cell C7, the degrees of freedom are set equal to N - 2, where N is
the sample size, which we record in cell C1.

Cell C8 is where the critical value for the right-tail rejection region is computed. Recall that in a
right-tail test the whole probability α of rejecting H0 lies in the right tail of the distribution, at or
above tc. The TINV function, on the other hand, is two-tailed: it gives a tc value such that
P(t(m) > tc) = α/2. So, to get the correct critical value for the right-tail rejection region, we
multiply the specified α value by 2 inside the TINV function (half of 2α is α, which is what we want).

Cell C10 is where the t-statistic is computed, by plugging the least squares estimate, the
hypothesized value c and the standard error into the equation for t in (3.4).

Finally, in cell C11, we use the IF function to determine whether or not our t-statistic falls in
the rejection region. If it does, we reject our null hypothesis; if it does not, we do not reject it (see
Section 3.1.4e for details on how the IF logical function works).
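Here is the same logic as a Python sketch of cells C10:C11. As an assumption of the sketch, the critical value is passed in rather than computed; the values used below are the TINV(2α, 38) results shown in Sections 3.3.1a and 3.3.1b, and the function name right_tail_template is hypothetical.

```python
def right_tail_template(b_k: float, se_k: float, c: float, tc: float):
    """Mirror cells C10:C11 of the Right-Tail Tests worksheet.

    tc is the one-tail critical value, i.e. TINV(2*alpha, m) in cell C8.
    """
    t_stat = (b_k - c) / se_k                                         # C10: =(C2-C4)/C3
    conclusion = "Reject Ho" if t_stat >= tc else "Do Not Reject Ho"  # C11
    return t_stat, conclusion


if __name__ == "__main__":
    b2, se_b2 = 10.2096425, 2.093263461   # Regression!B18 and Regression!C18
    # (c, alpha, tc) with tc = TINV(2*alpha, 38)
    for c, alpha, tc in ((0.0, 0.05, 1.685954), (5.5, 0.01, 2.428568)):
        t, decision = right_tail_template(b2, se_b2, c, tc)
        print(f"H0: beta2 = {c}, alpha = {alpha}: t = {t:.6f}, tc = {tc} -> {decision}")
```

The two runs should reproduce the conclusions of those two examples: reject H0: β2 = 0 at α = 0.05 (t ≈ 4.877), but do not reject H0: β2 ≤ 5.5 at α = 0.01 (t ≈ 2.250).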

3.3.1a One-Tail Test of Significance

Let α = 0.05; H0: β2 = 0 and H1: β2 > 0.

    A                  B                    C
1   Data Input         N =                  40
2                      bk =                 10.20964
3                      se(bk) =             2.093263
4                      Ho: βk =             0
5                      α =                  0.05
6
7   Computed Values    df or m =            38
8                      tc =                 1.685954
9
10  Right-Tail Test    t-statistic =        4.877381
11                     Conclusion:          Reject Ho

3.3.1b One-Tail Test of an Economic Hypothesis

Let α = 0.01; H0: β2 ≤ 5.5 and H1: β2 > 5.5.

Note that the procedure for testing the null hypothesis H0: β2 ≤ 5.5 against the alternative
hypothesis H1: β2 > 5.5 is exactly the same as for testing H0: β2 = 5.5 against the alternative
hypothesis H1: β2 > 5.5.

    A                  B                    C
1   Data Input         N =                  40
2                      bk =                 10.20964
3                      se(bk) =             2.093263
4                      Ho: βk =             5.5
5                      α =                  0.01
6
7   Computed Values    df or m =            38
8                      tc =                 2.428568
9
10  Right-Tail Test    t-statistic =        2.249904
11                     Conclusion:          Do Not Reject Ho

3.3.2 Left-Tail Tests

We create a template for left-tail tests.

Insert a new worksheet by selecting the Insert Worksheet tab at the bottom of your screen, next
to the Right-Tail Tests tab. Name it Left-Tail Tests.

Interval Estimation and Hypothesis Testing 85

The left-tail test template will be very similar to the right-tail test template. You can copy cells
A1:C11 from the Right-Tail Tests worksheet to cells A1:C11 in the Left-Tail Tests worksheet.

Alternatively, you can select the whole Right-Tail Tests worksheet by left-clicking on the upper-left
corner of the worksheet, where the row and column headings meet; your cursor should turn into a
fat cross. Select Copy. Left-click in cell A1 of the Left-Tail Tests worksheet, and select Paste.

You will need to make just a few modifications to create the following left-tail test template:

     A                  B                C
1    Data Input         N =              =Regression!B8
2                       bk =             =Regression!B18
3                       se(bk) =         =Regression!C18
4                       Ho: βk =
5                       α =
6
7    Computed Values    df or m =        =C1-2
8                       tc =             =-TINV(C5*2,C7)
9
10   Left-Tail Test     t-statistic =    =(C2-C4)/C3
11                      Conclusion:      =IF(C10<=C8,"Reject Ho","Do Not Reject Ho")

The rejection region for a left-tail test is the mirror image of the rejection region for a right-tail
test; it is in the left tail instead of the right tail of the distribution. The critical value for a left-tail
test is thus the negative of the critical value for a right-tail test: in cell C8, we precede the TINV
function by a minus sign to reflect that.

In a left-tail test, we reject our null hypothesis if our t-statistic is less than or equal to our critical
value, not greater than or equal to our critical value as is the case in a right-tail test; we adjust
the equation in C11 accordingly.

Finally, change the label in cell A10 to "Left-Tail Test".
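A short Python sketch of the left-tail rule, using our own hypothetical helper name and the rounded worksheet values; the right-tail critical values 1.685954 (α = 0.05) and 2.428568 (α = 0.01) are taken as given, and their negatives play the role of cell C8.

```python
def left_tail_conclusion(t: float, t_c: float) -> str:
    """Cell C11 of the left-tail template: =IF(C10<=C8,"Reject Ho","Do Not Reject Ho")."""
    return "Reject Ho" if t <= t_c else "Do Not Reject Ho"


b2, se_b2 = 10.20964, 2.093263
t = (b2 - 15) / se_b2  # cell C10 with H0 evaluated at beta2 = 15

# Cell C8 is -TINV(C5*2,C7): the negative of the right-tail critical value for the same alpha.
for alpha, right_tail_tc in ((0.05, 1.685954), (0.01, 2.428568)):
    t_c = -right_tail_tc
    print(alpha, round(t, 4), t_c, left_tail_conclusion(t, t_c))
```

The statistic (about -2.2885) lies below -1.685954 but above -2.428568, so H0 is rejected at the 5% level but not at the 1% level.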

Let α = 0.05; H0: β2 ≥ 15 and H1: β2 < 15.

Note that the hypothesis testing procedure for testing the null hypothesis H0: β2 ≥ 15
against the alternative hypothesis H1: β2 < 15 is exactly the same as testing H0: β2 = 15 against
the alternative hypothesis H1: β2 < 15.

     A                  B                C
1    Data Input         N =              40
2                       bk =             10.20964
3                       se(bk) =         2.093263
4                       Ho: βk =         15
5                       α =              0.05
6
7    Computed Values    df or m =        38
8                       tc =             -1.685954
9
10   Left-Tail Test     t-statistic =    -2.288464
11                      Conclusion:      Reject Ho

3.3.3 Two-Tail Tests

We create a template for two-tail tests.

Insert a new worksheet by selecting the Insert Worksheet tab at the bottom of your screen, next
to the Left-Tail Tests tab. Name it Two-Tail Tests.


The two-tail test template will also be very similar to the right-tail test template. You can copy
cells A1:C11 from the Right-Tail Tests worksheet to cells A1:C11 in the Two-Tail Tests
worksheet. Alternatively, you can select the whole Right-Tail Tests worksheet and copy it into the
Two-Tail Tests worksheet.

You will need to make just a few modifications to create the following two-tail test template:

     A                  B                C
1    Data Input         N =              =Regression!B8
2                       bk =             =Regression!B18
3                       se(bk) =         =Regression!C18
4                       Ho: βk =
5                       α =
6
7    Computed Values    df or m =        =C1-2
8                       tc =             =TINV(C5,C7)
9
10   Two-Tail Test      t-statistic =    =(C2-C4)/C3
11                      Conclusion:      =IF(OR(C10<=-C8,C10>=C8),"Reject Ho","Do Not Reject Ho")

The rejection region for a two-tail test is split in half between the left tail and the right tail of the
distribution: only α/2 of the probability is in each tail of the distribution. So, we no longer need to
multiply α by 2 in the TINV function: delete *2 in cell C8.

In a two-tail test, we reject our null hypothesis if our t-statistic is less than or equal to the left-tail
critical value, or greater than or equal to the right-tail critical value: we adjust the equation in C11 to
reflect that (see Section 3.1.4f for details on how the OR logical function works).

Finally, we change the label in cell A10 to "Two-Tail Test".
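The OR condition in cell C11 translates directly into Python. In this sketch (our own helper name, rounded worksheet inputs), tc = 2.024394 is taken as the value of =TINV(0.05,38) rather than recomputed.

```python
def two_tail_conclusion(t: float, t_c: float) -> str:
    """Cell C11: =IF(OR(C10<=-C8,C10>=C8),"Reject Ho","Do Not Reject Ho")."""
    return "Reject Ho" if (t <= -t_c or t >= t_c) else "Do Not Reject Ho"


b2, se_b2 = 10.20964, 2.093263
t_c = 2.024394  # cell C8, =TINV(C5,C7) with alpha = 0.05 and m = 38

for c in (7.5, 0.0):  # hypothesized values of beta2 used in the two-tail examples
    t = (b2 - c) / se_b2
    print(c, round(t, 4), two_tail_conclusion(t, t_c))
```

Because the condition is symmetric, a large negative statistic leads to rejection just as a large positive one does.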

3.3.3a Two-Tail Test of an Economic Hypothesis

Let α = 0.05; H0: β2 = 7.5 and H1: β2 ≠ 7.5.

     A                  B                C
1    Data Input         N =              40
2                       bk =             10.20964
3                       se(bk) =         2.093263
4                       Ho: βk =         7.5
5                       α =              0.05
6
7    Computed Values    df or m =        38
8                       tc =             2.024394
9
10   Two-Tail Test      t-statistic =    1.294458
11                      Conclusion:      Do Not Reject Ho

3.3.3b Two-Tail Test of Significance

Let α = 0.05; H0: β2 = 0 and H1: β2 ≠ 0.

     A                  B                C
1    Data Input         N =              40
2                       bk =             10.20964
3                       se(bk) =         2.093263
4                       Ho: βk =         0
5                       α =              0.05
6
7    Computed Values    df or m =        38
8                       tc =             2.024394
9
10   Two-Tail Test      t-statistic =    4.877381
11                      Conclusion:      Reject Ho

Note that the t-statistic in a two-tail test of significance is equal to the t-statistic in a one-tail test of
significance (compare the t-statistic value above to the one obtained in Section 3.3.1a). Also note
that this t-statistic value for tests of significance is reported in the regression summary output
generated by Excel.

Go back to your Regression worksheet. If you do not see your Regression tab, it is because it is
hidden. Use either one of the left arrows at the bottom-left corner of your screen so that the first
worksheets you were working with can be seen again.


Column D of the last table of the summary output presents the t-statistic values for tests of
significance of the intercept and slope parameters, β1 and β2 (cells D17:D18 below).

     A                   B               C               D               E               F               G               H               I
1    SUMMARY OUTPUT
2
3    Regression Statistics
4    Multiple R          0.620485472
5    R Square            0.385002221
6    Adjusted R Square   0.368818069
7    Standard Error      89.51700429
8    Observations        40
9
10   ANOVA
11                       df              SS              MS              F               Significance F
12   Regression          1               190626.9788     190626.9788     23.78884107     1.94586E-05
13   Residual            38              304505.1742     8013.294058
14   Total               39              495132.153
15
16                       Coefficients    Standard Error  t Stat          P-value         Lower 95%       Upper 95%       Lower 95.0%     Upper 95.0%
17   Intercept           83.41600997     43.41016192     1.921577951     0.062182379     -4.463267721    171.2952877     -4.463267721    171.2952877
18   X Variable 1        10.2096425      2.093263461     4.877380554     1.94586E-05     5.972052202     14.4472328      5.972052202     14.4472328

3.4 THE p-VALUE

When reporting the outcome of statistical hypothesis tests, it has become standard practice to
report the p-value (an abbreviation for probability value) of the test. If we have the p-value of a
test, we can determine the outcome of the test by comparing p to the chosen level of significance,
α. This is an alternative to comparing the test statistic value to the critical value(s) or limit(s) of
the rejection region for a test.

3.4.1 The p-Value Rule

In order to explain the p-value decision rule for hypothesis tests, we first give a definition of the
p-value.

3.4.1a Definition of p-Value

How the p-value is computed depends on the alternative hypothesis of our test. If H1: βk > c, p
is the probability that a t-value is equal to or greater than the computed test statistic value t,
that is, p = P(t(N-2) ≥ t).

If H1: βk < c, p is the probability that a t-value is equal to or less than the computed test
statistic value t, that is, p = P(t(N-2) ≤ t).

If H1: βk ≠ c, p is the probability that a t-value is equal to or less than -|t| or equal to or greater
than |t|, where t is the computed test statistic value; that is,
p = P(t(N-2) ≤ -|t|) + P(t(N-2) ≥ |t|), with p/2 in each tail.

3.4.1b Justification for the p-Value Rule

We can see that when the test statistic value t falls into the rejection region, its p-value
is less than or equal to the level of significance α.

For H1: βk > c, if t ≥ tc, t is in the rejection region and p ≤ α. The case illustrated below is
where t > tc and p < α; H0 is rejected.

[Figure: t-distribution with the right-tail rejection region (reject H0) beyond tc = t(1-α, N-2); the computed t lies inside it]

For H1: βk < c, if t ≤ tc, t is in the rejection region and p ≤ α. The case illustrated below is
where t < tc and p < α; H0 is rejected.

[Figure: t-distribution with the left-tail rejection region (reject H0) below tc = t(α, N-2); the computed t lies inside it]

For H1: βk ≠ c, if t ≤ -tc in the left tail of the distribution or t ≥ tc in the right tail of the
distribution, t is in the rejection region and p ≤ α.

The case illustrated below is where t > tc in the right tail of the distribution, and p < α; H0 is
rejected.

[Figure: t-distribution with two rejection regions (reject H0), each of area α/2, below -tc = t(α/2, N-2) and above tc = t(1-α/2, N-2); the computed t lies in the right-tail region]

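The case illustrated below is where t < -tc in the left tail of the distribution, and p < α; H0 is
rejected.

[Figure: t-distribution with two rejection regions (reject H0) below -tc = t(α/2, N-2) and above tc = t(1-α/2, N-2); the computed t lies in the left-tail region, with tail area p/2]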

We can thus compare the p-value of a test, p, to the chosen level of significance, α, and
determine the outcome of our hypothesis test: if p ≤ α, we reject H0 and accept H1; if p > α, we
do not reject H0. This is the p-value rule.
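As a quick consistency check, the Python sketch below applies both decision rules to the (t, tc, p, α) combinations that the worksheets of Section 3.4.3 report, and confirms they always reach the same conclusion. The function names are our own; the numbers are copied from those worksheets, not recomputed.

```python
# (t-statistic, critical value tc, p-value, alpha, type of test) as reported in
# the Right-, Left- and Two-Tail Tests worksheets of Section 3.4.3.
EXAMPLES = [
    (2.249904, 2.428568, 0.015163, 0.01, "right"),
    (2.249904, 1.685954, 0.015163, 0.05, "right"),
    (-2.288464, -2.428568, 0.013881, 0.01, "left"),
    (-2.288464, -1.685954, 0.013881, 0.05, "left"),
    (1.294458, 2.024394, 0.203318, 0.05, "two"),
    (4.877381, 2.024394, 1.95e-05, 0.05, "two"),
]


def critical_value_rule(t: float, t_c: float, kind: str) -> bool:
    """True when t falls in the rejection region (tc is negative for left-tail tests)."""
    if kind == "right":
        return t >= t_c
    if kind == "left":
        return t <= t_c
    return t <= -t_c or t >= t_c


def p_value_rule(p: float, alpha: float) -> bool:
    """True when p <= alpha, i.e. H0 is rejected."""
    return p <= alpha


for t, t_c, p, alpha, kind in EXAMPLES:
    print(kind, alpha, critical_value_rule(t, t_c, kind), p_value_rule(p, alpha))
```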

3.4.2 The TDIST Function

p-values are obtained in Excel by using the TDIST function. For hypothesis testing purposes, the
syntax of the TDIST function is as follows:

=TDIST(ABS(t),m,tails)

t is the value of the computed test statistic, ABS is a mathematical function that returns the
absolute value of t, m is the degrees of freedom, and tails specifies whether we are seeking the
p-value for a one-tail test or a two-tail test. Set tails to 1 for a one-tail test, and set tails to 2 for a
two-tail test.
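The following Python sketch imitates TDIST so the p-values can be checked outside Excel. It is an approximation of our own (helper names are hypothetical; the t tail area is computed with Simpson's rule), not Excel's implementation. Like Excel's TDIST, which returns #NUM! for a negative x, it rejects negative input, which is why the worksheet formulas wrap t in ABS().

```python
import math


def _upper_tail(x: float, m: int, n: int = 1000) -> float:
    """P(T >= x) for Student's t with m degrees of freedom and x >= 0 (Simpson's rule on [0, x])."""
    c = math.exp(math.lgamma((m + 1) / 2) - math.lgamma(m / 2) - 0.5 * math.log(m * math.pi))
    h = x / n
    total = 0.0
    for i in range(n + 1):
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * c * (1 + (i * h) ** 2 / m) ** (-(m + 1) / 2)
    return 0.5 - total * h / 3


def tdist(x: float, m: int, tails: int) -> float:
    """Sketch of TDIST(x, m, tails): P(T >= x) if tails == 1, P(|T| >= x) if tails == 2."""
    if x < 0:
        raise ValueError("x must be non-negative; wrap t in ABS() as the worksheet does")
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    return tails * _upper_tail(x, m)


def one_tail_p(t: float, m: int, right_tail: bool) -> float:
    """One-tail p-value; equals TDIST(ABS(t), m, 1) only when t lies on the H1 side of 0."""
    p = tdist(abs(t), m, 1)
    on_h1_side = t >= 0 if right_tail else t <= 0
    return p if on_h1_side else 1 - p


print(round(tdist(abs(2.249904), 38, 1), 6))   # right-tail statistic, H0: beta2 <= 5.5
print(round(tdist(abs(-2.288464), 38, 1), 6))  # left-tail statistic, H0: beta2 >= 15
print(round(tdist(abs(1.294458), 38, 2), 6))   # two-tail statistic, H0: beta2 = 7.5
```

The one_tail_p helper makes one caveat explicit: if the statistic falls on the wrong side of zero (for example a negative t in a right-tail test), the one-tail p-value is 1 - TDIST(ABS(t),m,1), which is above 0.5, so H0 is never rejected in that case.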

Go back to your Right-Tail Tests and Left-Tail Tests worksheets and add the following at the
bottom of each template:

     A                  B                C
12                      p-value =        =TDIST(ABS(C10),C7,1)
13                      Conclusion:      =IF(C12<=C5,"Reject Ho","Do Not Reject Ho")

Go back to your Two-Tail Tests worksheet and add the following at the bottom of its template:

     A                  B                C
12                      p-value =        =TDIST(ABS(C10),C7,2)
13                      Conclusion:      =IF(C12<=C5,"Reject Ho","Do Not Reject Ho")

3.4.3 Examples of Hypothesis Tests Revisited

3.4.3a Right-Tail Test of an Economic Hypothesis from Section 3.3.1b

Note that the hypothesis testing procedure for testing the null hypothesis H0: β2 ≤ 5.5
against the alternative hypothesis H1: β2 > 5.5 is exactly the same as testing H0: β2 = 5.5
against the alternative hypothesis H1: β2 > 5.5.

Let α = 0.01.

     A                  B                C
1    Data Input         N =              40
2                       bk =             10.20964
3                       se(bk) =         2.093263
4                       Ho: βk =         5.5
5                       α =              0.01
6
7    Computed Values    df or m =        38
8                       tc =             2.428568
9
10   Right-Tail Test    t-statistic =    2.249904
11                      Conclusion:      Do Not Reject Ho
12                      p-value =        0.015163
13                      Conclusion:      Do Not Reject Ho

Let α = 0.05.

     A                  B                C
1    Data Input         N =              40
2                       bk =             10.20964
3                       se(bk) =         2.093263
4                       Ho: βk =         5.5
5                       α =              0.05
6
7    Computed Values    df or m =        38
8                       tc =             1.685954
9
10   Right-Tail Test    t-statistic =    2.249904
11                      Conclusion:      Reject Ho
12                      p-value =        0.015163
13                      Conclusion:      Reject Ho

3.4.3b Left-Tail Test of an Economic Hypothesis from Section 3.3.2

Note that the hypothesis testing procedure for testing the null hypothesis H0: β2 ≥ 15 against
the alternative hypothesis H1: β2 < 15 is exactly the same as testing H0: β2 = 15 against the
alternative hypothesis H1: β2 < 15.

Let α = 0.01.

     A                  B                C
1    Data Input         N =              40
2                       bk =             10.20964
3                       se(bk) =         2.093263
4                       Ho: βk =         15
5                       α =              0.01
6
7    Computed Values    df or m =        38
8                       tc =             -2.428568
9
10   Left-Tail Test     t-statistic =    -2.288464
11                      Conclusion:      Do Not Reject Ho
12                      p-value =        0.013881
13                      Conclusion:      Do Not Reject Ho

Let α = 0.05.

     A                  B                C
1    Data Input         N =              40
2                       bk =             10.20964
3                       se(bk) =         2.093263
4                       Ho: βk =         15
5                       α =              0.05
6
7    Computed Values    df or m =        38
8                       tc =             -1.685954
9
10   Left-Tail Test     t-statistic =    -2.288464
11                      Conclusion:      Reject Ho
12                      p-value =        0.013881
13                      Conclusion:      Reject Ho

3.4.3c Two-Tail Test of an Economic Hypothesis from Section 3.3.3a

Let α = 0.05; H0: β2 = 7.5 and H1: β2 ≠ 7.5.

     A                  B                C
1    Data Input         N =              40
2                       bk =             10.20964
3                       se(bk) =         2.093263
4                       Ho: βk =         7.5
5                       α =              0.05
6
7    Computed Values    df or m =        38
8                       tc =             2.024394
9
10   Two-Tail Test      t-statistic =    1.294458
11                      Conclusion:      Do Not Reject Ho
12                      p-value =        0.203318
13                      Conclusion:      Do Not Reject Ho

3.4.3d Two-Tail Test of Significance from Section 3.3.3b

Let α = 0.05; H0: β2 = 0 and H1: β2 ≠ 0.

     A                  B                C
1    Data Input         N =              40
2                       bk =             10.20964
3                       se(bk) =         2.093263
4                       Ho: βk =         0
5                       α =              0.05
6
7    Computed Values    df or m =        38
8                       tc =             2.024394
9
10   Two-Tail Test      t-statistic =    4.877381
11                      Conclusion:      Reject Ho
12                      p-value =        1.95E-05
13                      Conclusion:      Reject Ho

Note that the p-value for this test is very small. "1.95E-05" is standard scientific notation, which
means "1.95 times 10 to the power -5":

$$\text{1.95E-05} = 1.95 \times 10^{-5} = 1.95 \times \frac{1}{10^{5}} = 1.95 \times \frac{1}{100{,}000} = 0.0000195$$
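The same conversion can be checked with Python's string formatting. This is only an illustration: the value 1.94586e-05 is the slope p-value shown in the regression summary output, and the two-decimal scientific format is our choice to reproduce the worksheet's display.

```python
import math

p = 1.94586e-05          # p-value of the slope's test of significance (summary output)
print(f"{p:.2E}")        # scientific notation with two decimals: 1.95E-05
print(f"{p:.7f}")        # the same value in fixed-point notation: 0.0000195
print(math.isclose(float("1.95E-05"), 1.95 * 10**-5))  # "E-05" means times 10 to the power -5
```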

Also note that this p-value for the two-tail test of significance is reported in the regression
summary output generated by Excel.

Go back to your Regression worksheet. If you do not see your Regression tab, it is because it is
hidden. Use either one of the left arrows at the bottom-left corner of your screen so that the first
worksheets you were working with can be seen again.

Column E of the last table of the summary output presents the p-values for the two-tail
test of significance for the intercept and slope parameters, β1 and β2 (cells E17:E18 below).

     A                   B               C               D               E               F               G               H               I
1    SUMMARY OUTPUT
2
3    Regression Statistics
4    Multiple R          0.620485472
5    R Square            0.385002221
6    Adjusted R Square   0.368818069
7    Standard Error      89.51700429
8    Observations        40
9
10   ANOVA
11                       df              SS              MS              F               Significance F
12   Regression          1               190626.9788     190626.9788     23.78884107     1.94586E-05
13   Residual            38              304505.1742     8013.294058
14   Total               39              495132.153
15
16                       Coefficients    Standard Error  t Stat          P-value         Lower 95%       Upper 95%       Lower 95.0%     Upper 95.0%
17   Intercept           83.41600997     43.41016192     1.921577951     0.062182379     -4.463267721    171.2952877     -4.463267721    171.2952877
18   X Variable 1        10.2096425      2.093263461     4.877380554     1.94586E-05     5.972052202     14.4472328      5.972052202     14.4472328
CHAPTER 4

Prediction, Goodness-of-Fit, and Modeling Issues

CHAPTER OUTLINE
4.1 Least Squares Prediction
4.2 Measuring Goodness-of-Fit
    4.2.1 Coefficient of Determination or R²
    4.2.2 Correlation Analysis and R²
    4.2.3 The Food Expenditure Example and the CORREL Function
4.3 The Effects of Scaling the Data
    4.3.1 Changing the Scale of x
    4.3.2 Changing the Scale of y
    4.3.3 Changing the Scale of x and y
4.4 A Linear-Log Food Expenditure Model
    4.4.1 Estimating the Model
    4.4.2 Scatter Plot of Data with Fitted Linear-Log Relationship
4.5 Using Diagnostic Residual Plots
    4.5.1 Random Residual Pattern
    4.5.2 Heteroskedastic Residual Pattern
    4.5.3 Detecting Model Specification Errors
4.6 Are the Regression Errors Normally Distributed?
    4.6.1 Histogram of the Residuals
    4.6.2 The Jarque-Bera Test for Normality using the CHIINV and CHIDIST Functions
    4.6.3 The Jarque-Bera Test for Normality for the Linear-Log Food Expenditure Model
4.7 Polynomial Models: An Empirical Example
    4.7.1 Scatter Plot of Wheat Yield over Time
    4.7.2 The Linear Equation Model
        4.7.2a Estimating the Model
        4.7.2b Residuals Plot
    4.7.3 The Cubic Equation Model
        4.7.3a Estimating the Model
        4.7.3b Residuals Plot
4.8 Log-Linear Models
    4.8.1 A Growth Model
    4.8.2 A Wage Equation
    4.8.3 Prediction
    4.8.4 A Generalized R² Measure
    4.8.5 Prediction Intervals
4.9 A Log-Log Model: Poultry Demand Equation
    4.9.1 Estimating the Model
    4.9.2 A Generalized R² Measure
    4.9.3 Scatter Plot of Data with Fitted Log-Log Relationship

In this chapter we continue to work with the simple linear regression model of weekly food
expenditure to make predictions, compute goodness-of-fit measures, and address modeling issues.
We also work with additional examples.

95
96 Chapter 4

4.1 LEAST SQUARES PREDICTION

A 100(1 - α)% prediction interval at value x0 of the explanatory variable is defined as:

$$\hat{y}_0 \pm t_c\,\mathrm{se}(f) \qquad (4.1)$$

where: $\hat{y}_0 = b_1 + b_2 x_0$ is the least squares predictor, (4.2)

tc is the 100(1 - α/2)th percentile from the t-distribution with N - 2 degrees of freedom,

and se(f) is the standard error of the forecast.

The standard error of the forecast is given by:

$$\mathrm{se}(f) = \sqrt{\widehat{\mathrm{var}}(f)} = \sqrt{\hat{\sigma}^2 + \frac{\hat{\sigma}^2}{N} + (x_0 - \bar{x})^2\,[\mathrm{se}(b_2)]^2} \qquad (4.3)$$

where: $\hat{\sigma}^2$ is the estimate of the error variance or mean square residual (MS residual),

N is the sample size,

$\bar{x}$ is the sample mean of the explanatory variable,

and se(b2) is the standard error estimate for b2.

The lower limit (LL) and upper limit (UL) of the prediction interval are:

$$LL = \hat{y}_0 - t_c\,\mathrm{se}(f) \qquad (4.4)$$

$$UL = \hat{y}_0 + t_c\,\mathrm{se}(f) \qquad (4.5)$$
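The arithmetic of (4.1) to (4.5) can be checked outside Excel. The Python sketch below plugs in the rounded values reported for the food expenditure model at x0 = 20 (b1, b2, se(b2), MS residual, x-bar, and tc = TINV(0.05,38)); the variable names are our own. Note that (4.3) uses the estimated variance of b2, that is, se(b2) squared.

```python
import math

# Rounded inputs for the food expenditure model (weekly income in $100 units).
n_obs = 40
b1, b2 = 83.41601, 10.20964
se_b2 = 2.093263
ms_residual = 8013.294      # sigma-hat squared
x_bar = 19.60475
t_c = 2.024394              # TINV(0.05, 38)
x0 = 20

y0_hat = b1 + b2 * x0                                                          # (4.2)
var_f = ms_residual + ms_residual / n_obs + (x0 - x_bar) ** 2 * se_b2 ** 2     # var(f) in (4.3)
se_f = math.sqrt(var_f)
lower, upper = y0_hat - t_c * se_f, y0_hat + t_c * se_f                        # (4.4), (4.5)
print(round(y0_hat, 4), round(se_f, 4), round(lower, 4), round(upper, 4))
```

With these inputs the interval comes out at roughly [104.13, 471.09] for a household with $2,000 of weekly income.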

Before we create a template to compute prediction intervals, we quickly re-estimate the food
expenditure model; note that this time we also want to generate the residual output, because we are
interested in the Predicted Y values it contains. Also, since we will use more than one data set and
run more than one regression in this chapter, we give our data and regression worksheets more
explicit names.

Open the Excel file food. Save it as POE Chapter 4.

Rename Sheet 1 food data. Re-estimate the regression parameters using the Excel Regression
analysis routine as in Section 2.2.2. In the Regression dialog box, the Input Y Range should be
A2:A41, and the Input X Range should be B2:B41. Select New Worksheet Ply and name it
Food Regression, and check the box next to Residuals.
Prediction, Goodness-of-Fit, and Modeling Issues 97


Next, insert a new worksheet by selecting the Insert Worksheet tab at the lower-left corner of
your screen. Name it Prediction Interval.

Create the following template to construct interval estimates. In the last column you will find the
numbers of the equations and the formatting options used, if any, in the template.

     A                    B                     C                                       D
1    Data Input           Sample Size =         ='Food Regression'!B8
2                         Confidence Level =                                            percentage, 0 decimal places
3                         x0 =
4                         b1 =                  ='Food Regression'!B17
5                         b2 =                  ='Food Regression'!B18
6                         se(b2) =              ='Food Regression'!C18
7                         MS residual =         ='Food Regression'!D13
8
9    Computed Values      α =                   =1-C2
10                        df or m =             =C1-2
11                        tc =                  =TINV(C9,C10)
12                        predicted y0 =        =C4+C5*C3                               (4.2)
13                        x-bar =               =AVERAGE('food data'!B2:B41)
14                        se(f) =               =SQRT(C7+C7/C1+((C3-C13)^2)*C6^2)       (4.3)
15
16   Prediction Interval  Lower Limit =         =C12-C11*C14                            (4.4)
17                        Upper Limit =         =C12+C11*C14                            (4.5)

Note that the last term in cell C14 multiplies (x0 - x-bar)^2 by C6^2, the square of se(b2), which is
the estimated variance of b2 that (4.3) calls for.

At x0 = 20, the 95% prediction interval for y0 is (see also p. 134 of Principles of
Econometrics, 4e):

     A                    B                     C
1    Data Input           Sample Size =         40
2                         Confidence Level =    95%
3                         x0 =                  20
4                         b1 =                  83.41601
5                         b2 =                  10.20964
6                         se(b2) =              2.093263
7                         MS residual =         8013.294
8
9    Computed Values      α =                   5%
10                        df or m =             38
11                        tc =                  2.024394
12                        predicted y0 =        287.6089
13                        x-bar =               19.60475
14                        se(f) =               90.63284
15
16   Prediction Interval  Lower Limit =         104.1323
17                        Upper Limit =         471.0854

4.2 MEASURING GOODNESS-OF-FIT

4.2.1 Coefficient of Determination or R²

The coefficient of determination, or R², is the proportion of variation in y explained by x within
the regression model:

$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST} \qquad (4.6)$$

where: SSR is the sum of squares due to the regression (SS Regression),

SST is the total sum of squares (SS Total),

and SSE is the sum of squared errors or sum of squared residuals (SS Residual).

4.2.2 Correlation Analysis and R²

R² can be computed as the square of the sample correlation coefficient between xi and yi values.
This result is valid only in simple regression models:

$$R^2 = r_{xy}^2 \qquad (4.7)$$

R² can also be computed as the square of the sample correlation coefficient between yi and
ŷi = b1 + b2xi. This result is valid not only in simple regression models but also in multiple
regression models that will be introduced in Chapter 5.

$$R^2 = r_{y\hat{y}}^2 \qquad (4.8)$$

4.2.3 The Food Expenditure Example and the CORREL Function

We create a template to compute goodness-of-fit measures based on our estimated food
expenditure model.

Insert a new worksheet by selecting the Insert Worksheet tab at the lower-left corner of your
screen. Name it Correlation Analysis and R2.

Create the following template (in the last column, you will find the numbers of the equations used
in the template):

     A                    B                  C                                                          D
1    Data Input           SS Residual =      ='Food Regression'!C13
2                         SS Total =         ='Food Regression'!C14
3
4    Computed Values      R² =               =1-C1/C2                                                   (4.6)
5                         rxy =              =CORREL('food data'!B2:B41,'food data'!A2:A41)
6                         r²xy =             =C5^2                                                      (4.7)
7                         ryy-hat =          =CORREL('food data'!A2:A41,'Food Regression'!B25:B64)
8                         r²yy-hat =         =C7^2                                                      (4.8)

The sample correlation coefficients in cells C5 and C7 are computed using the CORREL
statistical function. CORREL returns the correlation coefficient between two data sets. The
general syntax of this function is:

=CORREL(cell_range1, cell_range2)

In cell C5, we compute the correlation coefficient between x and y values, which we find in the
food data worksheet. In cell C7, we compute the correlation coefficient between y and ŷ values;
the latter are found in the Food Regression worksheet, under the column labeled "Predicted Y"
in the residual output.
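The Python sketch below mirrors what the template computes. The correl helper is our own stand-in for CORREL (a plain Pearson correlation). The first part uses the full-sample SS values from the ANOVA table. The second part is illustration only: it fits least squares to the first five food data observations, so its R² differs from the full-sample 0.385. It shows that (4.6), (4.7) and (4.8) coincide in a simple regression.

```python
import math
from statistics import fmean


def correl(a, b):
    """Pearson sample correlation of two equal-length sequences (what CORREL returns)."""
    ma, mb = fmean(a), fmean(b)
    s_ab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    s_aa = sum((u - ma) ** 2 for u in a)
    s_bb = sum((v - mb) ** 2 for v in b)
    return s_ab / math.sqrt(s_aa * s_bb)


# Full sample, equation (4.6): R^2 = 1 - SS Residual / SS Total (values from the ANOVA table).
r2_full = 1 - 304505.1742 / 495132.153
print(round(r2_full, 6), round(math.sqrt(r2_full), 6))  # R Square and Multiple R

# Illustration only: least squares on the first five food data observations.
x = [3.69, 4.39, 4.75, 6.03, 12.47]
y = [115.22, 135.98, 119.34, 114.96, 187.05]
x_bar, y_bar = fmean(x), fmean(y)
b2 = sum((u - x_bar) * (v - y_bar) for u, v in zip(x, y)) / sum((u - x_bar) ** 2 for u in x)
b1 = y_bar - b2 * x_bar
y_hat = [b1 + b2 * u for u in x]
r2 = 1 - sum((v - f) ** 2 for v, f in zip(y, y_hat)) / sum((v - y_bar) ** 2 for v in y)
print(r2, correl(x, y) ** 2, correl(y, y_hat) ** 2)  # (4.6), (4.7) and (4.8) coincide
```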

Here are the results you should get (see also p. 138 of Principles of Econometrics, 4e):

     A                    B                  C
1    Data Input           SS Residual =      304505.2
2                         SS Total =         495132.2
3
4    Computed Values      R² =               0.385002
5                         rxy =              0.620485
6                         r²xy =             0.385002
7                         ryy-hat =          0.620485
8                         r²yy-hat =         0.385002

Note that ryŷ and R² are actually reported in the summary output of your regression analysis:
cells B4:B5 below (ryŷ is labeled "Multiple R" and R² is called by its familiar name "R
Square").

     A                    B
1    SUMMARY OUTPUT
2
3    Regression Statistics
4    Multiple R           0.620485472
5    R Square             0.385002221
6    Adjusted R Square    0.368818069
7    Standard Error       89.51700429
8    Observations         40

4.3 THE EFFECTS OF SCALING THE DATA

In our food data worksheet, weekly food expenditure (y values) is recorded in dollars, while
weekly income (x values) is recorded in units of $100.

Recall our estimated regression model. Below the estimated values for b1 and b2, we report their
estimated standard errors, se(b1) and se(b2):

$$\underset{(\mathrm{se})}{\hat{y}_i} = \underset{(43.41)}{83.42} + \underset{(2.09)}{10.21}\,x_i \qquad (4.9)$$

Given the units of measurement of the data, the interpretation of the estimated slope coefficient is
as follows: as weekly income increases by 1 unit, i.e. $100, weekly food expenditure is expected
to increase by 10.21 units, i.e. $10.21. The interpretation of the estimated intercept coefficient is
as follows: weekly food expenditure for a household with zero income is estimated at $83.42.

4.3.1 Changing the Scale of x

Let x* = 100x. We change the scale of measurement of our x values so that weekly income is
now recorded in dollars.

Go back to your food data worksheet. In D1, enter the column label x*=100x. In cell D2, enter
the formula =100*B2; copy it to cells D3:D41. Here is how your table should look (only the first
five values are shown below):
     A           B          C     D
1    food_exp    income           x*=100x
2    115.22      3.69             369
3    135.98      4.39             439
4    119.34      4.75             475
5    114.96      6.03             603
6    187.05      12.47            1247

We want to re-estimate the food expenditure model using our original y values and our re-scaled
x* values.

In the Regression dialog box, the Input Y Range should be A2:A41, and the Input X Range
should be D2:D41. Select New Worksheet Ply and name it Food Regression 100x (you do not
need to select Residuals).


The results of your re-estimated regression model should be as reported below:

$$\underset{(\mathrm{se})}{\hat{y}_i} = \underset{(43.41)}{83.42} + \underset{(0.0209)}{0.1021}\,x^*_i \qquad (4.10)$$

Given the units of measurement of the data, the interpretation of the estimated slope coefficient is
as follows: as weekly income increases by 1 unit, i.e. $1, weekly food expenditure is expected to
increase by 0.1021 units, i.e. $0.1021 or 10.21 cents. Note that this is equivalent to saying that
as weekly income increases by $100, weekly food expenditure is expected to increase by $10.21;
rescaling the data does not affect the measurement of the underlying relationship.

4.3.2 Changing the Scale of y

Let y* = y/100. We change the scale of measurement of our y values so that weekly food
expenditure is now recorded in $100 units. We hold our x values at their original level of
measurement, which records weekly income in $100 units.

Go back to your food data worksheet. In E1, enter the column label y*=y/100. In cell E2, enter
the formula =A2/100; copy it to cells E3:E41. Here is how your table should look (only the first
five values are shown below):

     A           B          C     D          E
1    food_exp    income           x*=100x    y*=y/100
2    115.22      3.69             369        1.1522
3    135.98      4.39             439        1.3598
4    119.34      4.75             475        1.1934
5    114.96      6.03             603        1.1496
6    187.05      12.47            1247       1.8705

We want to re-estimate the food expenditure model using our original x values and our re-scaled
y* values.

In the Regression dialog box, the Input Y Range should be E2:E41, and the Input X Range
should be B2:B41. Select New Worksheet Ply and name it Food Regression divided by 100.


The results of your re-estimated regression model should be as reported below:

$$\underset{(\mathrm{se})}{\hat{y}^*_i} = \underset{(0.4341)}{0.8342} + \underset{(0.0209)}{0.1021}\,x_i \qquad (4.11)$$

Given the units of measurement of the data, the interpretation of the estimated slope coefficient is
as follows: as weekly income increases by 1 unit, i.e. $100, weekly food expenditure is expected
to increase by 0.1021 of a $100 unit, i.e. $10.21. The interpretation of the estimated intercept
coefficient is as follows: weekly food expenditure for a household with zero income is estimated
at 0.8342 of a $100 unit, i.e. $83.42. Again, note that rescaling the data does not affect the
measurement of the underlying relationship.

4.3.3 Changing the Scale of x and y


Let x* = 4x and y* = 4y. We change the scale of measurement of our original x values and y
values so that food expenditure and income refer to a period of 4 weeks instead of 1. For
simplicity we will refer to monthly food expenditure and income values. Food expenditure (y
values) is still recorded in dollars, while income (x values) is recorded in units of $100.

Go back to your food data worksheet. In F1, enter the column label x*=4x. In G1, enter the
column label y*=4y. In cell F2, enter the formula =4*B2. In cell G2, enter the formula =4*A2.
Copy the content of cells F2:G2 to cells F3:G41. Here is how your table should look (only the
first five values are shown below):

     A           B          C     D          E           F          G
1    food_exp    income           x*=100x    y*=y/100    x*=4x      y*=4y
2    115.22      3.69             369        1.1522      14.76      460.88
3    135.98      4.39             439        1.3598      17.56      543.92
4    119.34      4.75             475        1.1934      19         477.36
5    114.96      6.03             603        1.1496      24.12      459.84
6    187.05      12.47            1247       1.8705      49.88      748.2

We want to re-estimate the food expenditure model using our newly rescaled x* and y* values.

In the Regression dialog box, the Input Y Range should be G2:G41, and the Input X Range
should be F2:F41. Select New Worksheet Ply and name it Regression 4x and 4y.


The results of your re-estimated regression model should be as reported below:

$$\underset{(\mathrm{se})}{\hat{y}^*_i} = \underset{(173.64)}{333.66} + \underset{(2.09)}{10.21}\,x^*_i \qquad (4.12)$$

Given the units of measurement of the data, the interpretation of the estimated slope coefficient is
as follows: as monthly income increases by 1 unit, i.e. $100, monthly food expenditure is
expected to increase by 10.21 units, i.e. $10.21. The estimated monthly food expenditure for a
household with zero income is $333.66; this is 4 times the estimated weekly food expenditure for
a household with zero income (see Section 4.3). Again, rescaling the data did not affect the
measurement of the underlying relationship.
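The three rescaling results can be verified with a small least squares routine. The ols helper below is our own sketch, and for illustration it uses only the first five observations of the food data, so the coefficient values themselves differ from the full-sample ones; only the ratios between the rescaled and original coefficients matter. The last line checks the full-sample intercept: 4 × 83.41601 rounds to 333.66.

```python
from statistics import fmean


def ols(x, y):
    """Least squares intercept b1 and slope b2 for the simple regression of y on x."""
    x_bar, y_bar = fmean(x), fmean(y)
    b2 = sum((u - x_bar) * (v - y_bar) for u, v in zip(x, y)) / sum((u - x_bar) ** 2 for u in x)
    return y_bar - b2 * x_bar, b2


# Illustration only: first five observations (income in $100, food_exp in $).
income = [3.69, 4.39, 4.75, 6.03, 12.47]
food_exp = [115.22, 135.98, 119.34, 114.96, 187.05]

b1, b2 = ols(income, food_exp)
b1_x, b2_x = ols([100 * u for u in income], food_exp)                   # x* = 100x
b1_y, b2_y = ols(income, [v / 100 for v in food_exp])                   # y* = y/100
b1_xy, b2_xy = ols([4 * u for u in income], [4 * v for v in food_exp])  # x* = 4x, y* = 4y

print(b1_x / b1, b2_x / b2)    # intercept unchanged, slope divided by 100
print(b1_y / b1, b2_y / b2)    # both coefficients divided by 100
print(b1_xy / b1, b2_xy / b2)  # intercept multiplied by 4, slope unchanged
print(round(4 * 83.41601, 2))  # full-sample weekly intercept times 4
```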

4.4 A LINEAR-LOG FOOD EXPENDITURE MODEL

In your food data worksheet, insert a column to the right of the income column B (see Section
1.4 for more details on how to do that). In cells C1:C2, enter the following column label and
formula.

     C
1    ln(income)
2    =LN(B2)

Copy the content of cell C2 to cells C3:C41. Here is how your table should look (only the first
five values are shown below):
five values are shown below):
     A           B          C
1    food_exp    income     ln(income)
2    115.22      3.69       1.305626458
3    135.98      4.39       1.479329227
4    119.34      4.75       1.558144618
5    114.96      6.03       1.796747011
6    187.05      12.47      2.52332576

4.4.1 Estimating the Model

We estimate the following linear-log model for food expenditure:

$$FOOD\_EXP = \beta_1 + \beta_2 \ln(INCOME) + e \qquad (4.13)$$

In the Regression dialog box, the Input Y Range should be A2:A41, and the Input X Range
should be C2:C41. Select New Worksheet Ply, name it Log-Linear Food Model and do check
the box next to Residuals. Finally select OK.


The result is (matching the one reported on p. 144 of Principles of Econometrics, 4e):

     A                   B               C               D               E               F               G               H               I
1    SUMMARY OUTPUT
2
3    Regression Statistics
4    Multiple R          0.597084978
5    R Square            0.356510471
6    Adjusted R Square   0.339576536
7    Standard Error      91.56711026
8    Observations        40
9
10   ANOVA
11                       df              SS              MS              F               Significance F
12   Regression          1               176519.7971     176519.7971     21.05301996     4.75993E-05
13   Residual            38              318612.3559     8384.535682
14   Total               39              495132.153
15
16                       Coefficients    Standard Error  t Stat          P-value         Lower 95%       Upper 95%       Lower 95.0%     Upper 95.0%
17   Intercept           -97.18641517    84.23744235     -1.153719919    0.255620028     -267.7162004    73.34337005     -267.7162004    73.34337005
18   X Variable 1        132.1658424     28.80461184     4.588357        4.75993E-05     73.85395477     190.47773       73.85395477     190.47773

Note that your ANOVA and coefficient tables should be followed by a RESIDUAL OUTPUT table. This last
table contains a column of Predicted Y (fitted) values and a column of Residuals. We use
the fitted values in the next section.

     A                B               C
22   RESIDUAL OUTPUT
23
24   Observation      Predicted Y     Residuals
25   1                75.37281        39.84719
26   2                98.33038        37.64962
27   3                108.74708       10.59292
28   4                140.28217       -25.32217
29   5                236.31106       -49.26106
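The Predicted Y and Residuals columns can be reproduced from the two estimated coefficients. This Python sketch (variable names our own) evaluates (4.13) with the error set to zero for the first five observations, using the coefficients shown in the summary output above.

```python
import math

# Coefficients from the linear-log summary output (Intercept and X Variable 1).
b1, b2 = -97.18641517, 132.1658424

# First five observations of the food data worksheet.
food_exp = [115.22, 135.98, 119.34, 114.96, 187.05]
income = [3.69, 4.39, 4.75, 6.03, 12.47]

predicted = [b1 + b2 * math.log(x) for x in income]      # fitted values from (4.13), e set to 0
residuals = [y - f for y, f in zip(food_exp, predicted)]
for obs, (f, r) in enumerate(zip(predicted, residuals), start=1):
    print(obs, round(f, 5), round(r, 5))
```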

4.4.2 Scatter Plot of Data with Fitted Linear-Log Relationship

Go back to your food data worksheet and select A2:B41. Select the Insert tab located next to the
Home tab. In the Charts group of commands select Scatter, and then Scatter with only
Markers.

The result is:


106 Chapter 4

.... ..
4()

35

3()


.. .. ....
25
T
••••
2()
••• .. ...�

. .. ,.. . . .- •Seriesl 1:
15 ..... .
• ••
t()

5

r
()

0 100 200 30-0 4()0 500 60{) 70Q


: -- ---

You can see that our food expenditure values are on the horizontal axis and our income values are on the vertical axis. We would like to switch them and edit our chart as we did in Section 2.1. The result is (see also Figure 4.6 on p. 144 of Principles of Econometrics, 4e):

[Figure: scatter plot of weekly food expenditure (vertical axis) against weekly income in $100 (horizontal axis, 0 to 40).]

Finally, we add the fitted linear-log relationship to our scatter plot. Right-click in the middle of the chart area of your scatter plot and select Select Data. In the Legend Entries (Series) window of the Select Data Source dialog box, select the Add button. In the Series name window, type Fitted Linear-Log Relationship. For the Series X values, select B2:B41 from the food data worksheet. For the Series Y values, select B25:B64 (the Predicted Y values) from the Log-Linear Food Model worksheet. Finally, select OK. The Fitted Linear-Log Relationship series has now been added to your graph.


Before you close the Select Data Source dialog box, select Series1 and then Edit. Type the name Actual in the Series name window and select OK. In the Select Data Source window that reappears, select OK again.


Make sure your chart is selected so that the Chart Tools are visible. In the Layout tab, go to the Labels group of commands. Select the Legend button and choose either one of the Overlay Legend options. Grab your legend with your cursor and move it to the upper left corner of your chart area.

Finally, we want to reformat our Fitted Linear-Log Relationship values series. Select the
plotted series in your chart area, right-click and select Format Data Series. A Format Data
Series dialog box pops up. Select Line Color and Solid line. Change the line color to something
different from the Actual series points. Select Marker Options, and change the Marker Type
from Automatic to None. Select Close.


The result is (see also Figure 4.6 on p. 144 of Principles of Econometrics, 4e):
[Figure: the Actual food expenditure data and the Fitted Linear-Log Relationship line, plotted against weekly income in $100 (0 to 40).]

4.5 USING DIAGNOSTIC RESIDUAL PLOTS

4.5.1 Random Residual Pattern

Consider the following simple linear regression model:

$$y = 1 + x + e \qquad (4.14)$$

First, 300 pairs of $x_i$ and $e_i$ values are created with random number generators, in the same way we artificially generated variables in Sections 2.4 and 3.1.4. The variable x is simulated to be evenly, or uniformly, distributed between 0 and 10. The error term e is simulated to be uncorrelated, homoskedastic, and drawn from a standard normal distribution, $e \sim N(0,1)$. We generate these simulated observations next.
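The data-generating process for (4.14) can also be sketched outside Excel. This minimal Python sketch uses the standard `random` module in place of Excel's Random Number Generation tool, so its draws will not match yours. The function name and the seed are illustrative assumptions.

```python
import random

def simulate_and_fit(n=300, seed=None):
    """Simulate y = 1 + x + e with x ~ Uniform(0, 10), e ~ N(0, 1), then fit by least squares."""
    rng = random.Random(seed)
    x = [rng.uniform(0, 10) for _ in range(n)]      # plays the role of column A
    e = [rng.gauss(0, 1) for _ in range(n)]         # plays the role of column B
    y = [1 + xi + ei for xi, ei in zip(x, e)]       # column C: =1+A2+B2
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b2 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b1 = y_bar - b2 * x_bar
    residuals = [yi - (b1 + b2 * xi) for xi, yi in zip(x, y)]
    return x, residuals, b1, b2

x, residuals, b1, b2 = simulate_and_fit(seed=12345)
```

Because this model is correctly specified, a plot of `residuals` against `x` should show no systematic pattern.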

Insert a new worksheet by selecting the Insert Worksheet tab at the lower left corner of your screen. Name it Random Residual.


In cells A1:B1 of your Random Residual worksheet, enter the following column labels.

A B
x e

Select the Data tab, in the middle of your tab list. In the Analysis group of commands, at the far right of the ribbon, select Data Analysis.


The Data Analysis dialog box pops up. In it, select Random Number Generation (you might
need to use the scroll up and down bar to the right of the Analysis Tools window to find it), then
select OK.

A Random Number Generation dialog box pops up. The Number of Variables simulated is 1,
and the Number of Random Numbers generated is 300. The variable x is simulated to be
Uniformly distributed between 0 and 10. Select the Output Range in the Output options
section, and specify it to be A2:A301 in your Random Residual worksheet. Finally, select OK.


We repeat the procedure to draw a random sample of 300 error terms from a standard normal distribution (Distribution: Normal, with Mean = 0 and Standard deviation = 1). Select the Output Range in the Output options section, and specify it to be B2:B301 in your Random Residual worksheet. Finally, select OK.

In cells C1:C2 of your Random Residual worksheet, enter the following column label and formula.

    C
1   y
2   =1+A2+B2

Select cell C2 and copy it to cells C3:C301. Here is how our worksheet looks (only the first five values are shown below):

    A          B          C
1   x          e          y
2   4.405957   0.998193   6.40415
3   9.518723   1.011883   11.53061
4   3.821223   -0.0083    4.812922
5   2.649922   -0.43201   3.217908
6   3.976562   0.25586    5.232422

Note that you will have drawn a different random sample and thus obtained different sample values for y.

Next, we apply the least squares estimator to these simulated observations and compute the least
squares residuals.

In the Regression dialog box, the Input Y Range should be C2:C301, and the Input X Range should be A2:A301. Select New Worksheet Ply, name it Simulated Model 1 and do check the box next to Residual Plots. Finally, select OK.


In addition to the Summary Output you now have a Residual Output table and a Residual Plot
in your new worksheet.

[Screenshot: the RESIDUAL OUTPUT table (Observation, Predicted Y, Residuals) starting in row 22, next to Excel's default chart titled "X Variable 1 Residual Plot".]

After editing the chart as we did in Section 2.1 or Section 2.3.4, the result is (see also Figure 4.7 on p. 146 of Principles of Econometrics, 4e):

[Figure: "Simulated Linear Model Residuals", residuals plotted against x (0 to 10).]

4.5.2 Heteroskedastic Residual Pattern

Go back to your food data worksheet, select your scatter plot of food expenditure-income data
points and fitted linear-log relationship and make a copy of it. Right-click in the middle of the
copy of your chart. Select Select Data. In the Legend Entries (Series) window of the Select
Data Source dialog box, select the Fitted Linear-Log Relationship series, and then the Remove
button.

Next, select the Actual series, and then the Edit button. In the Edit Series window, delete the old Series name and re-specify the Series Y values to be C25:C64 (the Residuals) from the Log-Linear Food Model worksheet. Finally, select OK twice.


The result is (see also Figure 4.8 on p. 146 of Principles of Econometrics, 4e):
[Figure: "Linear-Log Model Residuals", residuals plotted against income in $100 (10 to 40).]

4.5.3 Detecting Model Specification Errors

Consider the following quadratic relationship:

$$y = 15 - 4x^2 + e \qquad (4.15)$$

First, 51 pairs of $x_i$ and $e_i$ values are created, in the same way we artificially generated variables in Sections 2.4, 3.1.4 and 4.5.1. This time x is not random. It takes evenly spaced values from 2.5 down to -2.5, in steps of 0.1. The error term e is simulated with a random number generator to be uncorrelated, homoskedastic, and drawn from a normal distribution with mean 0 and variance 4, $e \sim N(0,4)$. We generate these simulated observations next.

Insert a new worksheet by selecting the Insert Worksheet tab at the lower left corner of your screen. Name it Specification Error Residual.


In cells A1:C3 of your Specification Error Residual worksheet, enter the following column labels, values and formula.

    A   B                  C
1       x                  e
2   1   =2.5-((A2-1)/10)
3   2

Select cells A2:A3 and move your cursor to the lower right corner of your selection until it turns into a skinny cross. Then left-click, hold, and drag down to cell A52.


Copy cell B2 to cells B3:B52. Your table should look like the one below (only the first five values are shown).

    A   B     C
1       x     e
2   1   2.5
3   2   2.4
4   3   2.3
5   4   2.2
6   5   2.1

Select the Data tab, in the middle of your tab list. In the Analysis group of commands, at the far right of the ribbon, select Data Analysis.

The Data Analysis dialog box pops up. In it, select Random Number Generation (you might
need to use the scroll up and down bar to the right of the Analysis Tools window to find it), then
select OK.

We draw one random sample of 51 error terms from a normal distribution with Mean 0 and Standard Deviation 2. Select the Output Range in the Output options section, and specify it to be C2:C52 in your Specification Error Residual worksheet. Finally, select OK.


In cells D1:D2 of your Specification Error Residual worksheet, enter the following column label and formula.

    D
1   y
2   =15-4*(B2^2)+C2

Select cell D2 and copy it to cells D3:D52. Here is how our worksheet looks (only the first five values are shown below):

    A   B     C          D
1       x     e          y
2   1   2.5   2.723068   -7.27693
3   2   2.4   -0.50477   -8.54477
4   3   2.3   1.115236   -5.04476
5   4   2.2   2.916886   -1.44311
6   5   2.1   2.982706   0.342706

Note that you will have drawn a different random sample and thus obtained different sample values for y.

Next, we apply the least squares estimator to these simulated observations and compute the least
squares residuals.

In the Regression dialog box, the Input Y Range should be D2:D52, and the Input X Range should be B2:B52. Select New Worksheet Ply, name it Simulated Model 2 and do check the box next to Residual Plots. Finally, select OK.
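To see why the residual plot for this model has a pattern, here is a minimal Python sketch of the same experiment. The function name is hypothetical, and `random.gauss` stands in for Excel's normal generator. Setting `noise_sd=0` isolates the part of the residuals caused by the missing $x^2$ term. Because the x grid is symmetric around zero, the fitted slope is zero, and the noiseless residuals equal $-4(x_i^2 - \overline{x^2})$, an inverted U.

```python
import random

def misspecified_residuals(noise_sd=2.0, seed=None):
    """Fit a straight line to y = 15 - 4x^2 + e and return (x, residuals)."""
    x = [2.5 - (a - 1) / 10 for a in range(1, 52)]       # column B: =2.5-((A2-1)/10), A = 1..51
    rng = random.Random(seed)
    e = [rng.gauss(0, noise_sd) for _ in x]              # column C: N(0, 4) when noise_sd = 2
    y = [15 - 4 * xi ** 2 + ei for xi, ei in zip(x, e)]  # column D
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b2 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b1 = y_bar - b2 * x_bar
    return x, [yi - (b1 + b2 * xi) for xi, yi in zip(x, y)]

# With no noise, only the specification error is left in the residuals
x, res = misspecified_residuals(noise_sd=0.0)
```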


In addition to the Summary Output you now have a Residual Output table and a Residual Plot
in your new worksheet.
[Screenshot: the RESIDUAL OUTPUT table (Observation, Predicted Y, Residuals) starting in row 22, next to Excel's default chart titled "X Variable 1 Residual Plot".]

After editing the chart as we did in Section 2.1 or Section 2.3.4, the result is (see also Figure 4.9 on p. 147 of Principles of Econometrics, 4e):
[Figure: "Misspecified Model Residuals", residuals (-20 to 15) plotted against x.]

4.6 ARE THE REGRESSION ERRORS NORMALLY DISTRIBUTED?

Our analysis of normality of the regression errors will include a histogram of the residuals and the
Jarque-Bera test for normality.

4.6.1 Histogram of the Residuals

Go back to your Food Regression worksheet. If you do not see your Food Regression tab, it has probably scrolled out of view. Use either one of the left-arrows at the left corner of your screen so that the first worksheets you were working with can be seen again. (If the worksheet you need to go back to is a recently created one, use the right-arrows.)



Next to the columns of Residuals in the residual output section of the worksheet, we will create a
BIN column. In cell D24, type BIN. The bin values will determine the range of residual values
for each column of the histogram. The bin values have to be given in ascending order. Starting
with the lowest bin value, a residual value will be counted in a particular bin if it is equal to or
less than the bin value.
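The counting rule just described can be sketched in Python with the standard `bisect` module. This illustrates the rule and is not Excel's own code. The extra last count corresponds to the "More" row that Excel's Histogram output normally adds for values above the last bin. The function name is hypothetical.

```python
from bisect import bisect_left

def excel_histogram_counts(values, bins):
    """Count each value in the first bin whose upper bound is >= the value.
    The extra last count collects values above the largest bin (the "More" row)."""
    bins = sorted(bins)
    counts = [0] * (len(bins) + 1)
    for v in values:
        counts[bisect_left(bins, v)] += 1   # index of the first bin >= v
    return counts

bins = list(range(-225, 226, 25))            # -225, -200, ..., 225, as in D25:D43
```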

Fill in the bin values shown below. Note that all you need to do is enter the first two values, -225 and -200, and select cells D25:D26. Move your cursor to the lower right corner of your selection until it turns into a skinny cross, then left-click, hold, and drag down to cell D43. Excel recognizes the series and completes it for you automatically.
    D
24  BIN
25  -225
26  -200
27  -175
28  -150
29  -125
30  -100
31  -75
32  -50
33  -25
34  0
35  25
36  50
37  75
38  100
39  125
40  150
41  175
42  200
43  225

Select the Data tab, in the middle of your tab list. In the Analysis group of commands, at the far right of the ribbon, select Data Analysis.

The Data Analysis dialog box pops up. In it, select Histogram (you might need to use the scroll
up and down bar to the right of the Analysis Tools window to find it), then select OK.


A Histogram dialog box pops up. For the Input Range, specify C25:C64; for the Bin Range, specify D25:D43. The Input Range is the data set Excel uses to count how many values fall in each bin of the Bin Range. Check the New Worksheet Ply option and name it Residuals Histogram; check the box next to Chart Output. Finally, select OK.

Select the columns in your chart area, right-click and select Format Data Series. The Series
Options tab of the Format Data Series dialog box should be open. Select the Gap Width button
and move it to the far left, towards No Gap.

Go to the Border Color tab and select Solid line, choose a different Color if you would like.
Select Close.


Finally, delete the Legend, and increase the size of the Chart area (see Section 2.3.4 for more
details on that). The result should be very similar to Figure 4.10 on p. 148 of Principles of
Econometrics, 4e:
[Figure: "Histogram" of the food regression residuals, with bins from -225 to 225 on the horizontal axis.]

4.6.2 The Jarque-Bera Test for Normality using the CHIINV and CHIDIST Functions
If the regression errors are normally distributed, the Jarque-Bera statistic (JB) approximately follows a chi-squared distribution with m = 2 degrees of freedom:

$$JB = \frac{N}{6}\left(S^2 + \frac{(K-3)^2}{4}\right) \sim \chi^2_{(m=2)} \qquad (4.16)$$

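where $S = \tilde\mu_3/\tilde\sigma^3$ is a measure of skewness and $K = \tilde\mu_4/\tilde\sigma^4$ is a measure of kurtosis. Here, with $\hat e_i$ denoting the least squares residuals and $\bar{\hat e}$ their mean,

$$\tilde\sigma = \sqrt{\frac{\sum_i (\hat e_i - \bar{\hat e})^2}{N}} \qquad (4.17)$$

$$\tilde\mu_3 = \frac{\sum_i (\hat e_i - \bar{\hat e})^3}{N} \qquad (4.18)$$

$$\tilde\mu_4 = \frac{\sum_i (\hat e_i - \bar{\hat e})^4}{N} \qquad (4.19)$$

and N is the sample size.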
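Equations (4.16)-(4.19) translate directly into code. Here is a minimal Python sketch, assuming the residuals are available as a plain list. The function name is hypothetical.

```python
import math

def jarque_bera(residuals):
    """Return (S, K, JB) following equations (4.16)-(4.19)."""
    n = len(residuals)
    mean = sum(residuals) / n                          # 0 for least squares residuals with an intercept
    dev = [r - mean for r in residuals]
    sigma = math.sqrt(sum(d ** 2 for d in dev) / n)    # (4.17)
    mu3 = sum(d ** 3 for d in dev) / n                 # (4.18)
    mu4 = sum(d ** 4 for d in dev) / n                 # (4.19)
    s = mu3 / sigma ** 3                               # skewness
    k = mu4 / sigma ** 4                               # kurtosis
    jb = n / 6 * (s ** 2 + (k - 3) ** 2 / 4)           # (4.16)
    return s, k, jb
```

For example, the food regression values reported later in this section (N = 40, S = -0.097319, K = 2.9890333) give JB ≈ 0.0633 when plugged into the last formula.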

If the hypothesis of normally distributed errors is true, there is a 100α percent chance that the computed JB statistic is greater than or equal to the chi-square critical value $\chi^2_{(1-\alpha,m)}$. If the computed JB statistic is greater than or equal to $\chi^2_{(1-\alpha,m)}$, this is evidence that our hypothesis of normally distributed errors is false, and we reject it.
[Figure: the $\chi^2_{(m)}$ density, with the rejection region for $H_0$ to the right of the critical value $\chi^2_{(1-\alpha,m)}$.]

We will create a template for the Jarque-Bera test for normality. But before we do that, we need
to go back to our Food Regression worksheet to perform intermediate calculations.


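Before we compute the measure of skewness S and the measure of kurtosis K, note that the least squares residuals from a model with an intercept have mean zero, $\bar{\hat e} = 0$. The numerators of equations (4.17)-(4.19), $\sum(\hat e_i - \bar{\hat e})^2$, $\sum(\hat e_i - \bar{\hat e})^3$ and $\sum(\hat e_i - \bar{\hat e})^4$, therefore simplify to $\sum \hat e_i^2$, $\sum \hat e_i^3$ and $\sum \hat e_i^4$.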

To the right of the residual output section, create the following table:

    F              G              H
24  residuals²     residuals³     residuals⁴
25  =C25^2         =C25^3         =C25^4

Copy cells F25:H25 to cells F26:H64. Your worksheet should now look like the one below (only partly shown):

[Screenshot: columns F:H of the Food Regression worksheet, headed Residuals², Residuals³ and Residuals⁴, filled with the powers of the residuals for rows 25 to 64.]

Now, we are ready to create our Jarque-Bera test template.

Insert a new worksheet by selecting the Insert Worksheet tab at the bottom of your screen.
Name it Jarque-Bera Tests.


Create the following template to perform Jarque-Bera tests:


    A                  B                    C
1   Data Input         N=                   ='Food Regression'!B8
2                      α=
3                      df or m=             2
4
5   Computed Values    σ-tilde=             =SQRT(SUM('Food Regression'!F25:F64)/C1)    (4.17)
6                      μ3-tilde=            =SUM('Food Regression'!G25:G64)/C1          (4.18)
7                      μ4-tilde=            =SUM('Food Regression'!H25:H64)/C1          (4.19)
8                      S=                   =C6/C5^3
9                      K=                   =C7/C5^4
10                     χ²-critical value=   =CHIINV(C2,C3)
11
12  Jarque-Bera Test   JB=                  =(C1/6)*(C8^2+((C9-3)^2)/4)                  (4.16)
13                     Conclusion           =IF(C12>=C10,"Reject the hypothesis of normally distributed errors","Do not reject the hypothesis of normally distributed errors")
14                     p-value=             =CHIDIST(C12,C3)
15                     Conclusion           =IF(C14<=C2,"Reject the hypothesis of normally distributed errors","Do not reject the hypothesis of normally distributed errors")

The χ²-critical value is computed with the CHIINV statistical function. For our purpose, the syntax of this function is:
=CHIINV(α,m)

where α is the level of significance of the Jarque-Bera test, and m is the degrees of freedom of the chi-squared distribution.

The p-value is computed with the CHIDIST statistical function. For our purpose, the syntax of this function is:
=CHIDIST(χ²-value,m)

where χ²-value is the value of the test statistic (here, the computed JB statistic) for which we are computing the p-value, and m is the degrees of freedom of the chi-squared distribution.
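For m = 2 degrees of freedom, the chi-squared right-tail probability has the closed form $P(\chi^2_{(2)} \ge x) = e^{-x/2}$. Both functions can therefore be reproduced without a statistics library. The sketch below applies only to m = 2 and does not handle other degrees of freedom. The function names are hypothetical.

```python
import math

# Valid only for m = 2 degrees of freedom, where P(chi2 >= x) = exp(-x / 2).
def chidist_df2(x):
    """Right-tail probability, the analogue of Excel's =CHIDIST(x, 2)."""
    return math.exp(-x / 2)

def chiinv_df2(alpha):
    """Value with right-tail probability alpha, the analogue of =CHIINV(alpha, 2)."""
    return -2 * math.log(alpha)

def jb_decision(jb, alpha=0.05):
    critical = chiinv_df2(alpha)
    p_value = chidist_df2(jb)
    reject = jb >= critical    # equivalent to p_value <= alpha
    return critical, p_value, reject
```

The two Conclusion rows of the template always agree, because the right-tail probability decreases as the statistic grows.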

At α = 0.05, the results of the Jarque-Bera test are (see p. 148 of Principles of Econometrics, 4e):
    A                  B                    C
1   Data Input         N=                   40
2                      α=                   0.05
3                      df or m=             2
4
5   Computed Values    σ-tilde=             87.250383
6                      μ3-tilde=            -64639.66
7                      μ4-tilde=            173220834
8                      S=                   -0.097319
9                      K=                   2.9890333
10                     χ²-critical value=   5.9914645
11
12  Jarque-Bera Test   JB=                  0.0633402
13                     Conclusion=          Do not reject the hypothesis of normally distributed errors
14                     p-value=             0.9688262
15                     Conclusion=          Do not reject the hypothesis of normally distributed errors

4.6.3 The Jarque-Bera Test for Normality for the Linear-Log Food
Expenditure Model

We first go back to our Log-Linear Food Model worksheet to perform intermediate calculations.

To the right of the residual output section, create the following table:

    F              G              H
24  residuals²     residuals³     residuals⁴
25  =C25^2         =C25^3         =C25^4

Copy cells F25:H25 to cells F26:H64. Your worksheet should now look like the one below (only partly shown):

[Screenshot: columns F:H of the Log-Linear Food Model worksheet, headed Residuals², Residuals³ and Residuals⁴, filled with the powers of the residuals for rows 25 to 64.]

Now, we are ready to modify a few cell references in our Jarque-Bera test template.

Go to the Jarque-Bera Tests worksheet.

Replace all references to the Food Regression worksheet with references to the Log-Linear Food Model worksheet. The modified template is shown below.

    A                  B                    C
1   Data Input         N=                   ='Log-Linear Food Model'!B8
2                      α=
3                      df or m=             2
4
5   Computed Values    σ-tilde=             =SQRT(SUM('Log-Linear Food Model'!F25:F64)/C1)
6                      μ3-tilde=            =SUM('Log-Linear Food Model'!G25:G64)/C1
7                      μ4-tilde=            =SUM('Log-Linear Food Model'!H25:H64)/C1
8                      S=                   =C6/C5^3
9                      K=                   =C7/C5^4
10                     χ²-critical value=   =CHIINV(C2,C3)
11
12  Jarque-Bera Test   JB=                  =(C1/6)*(C8^2+((C9-3)^2)/4)
13                     Conclusion           =IF(C12>=C10,"Reject the hypothesis of normally distributed errors","Do not reject the hypothesis of normally distributed errors")
14                     p-value=             =CHIDIST(C12,C3)
15                     Conclusion           =IF(C14<=C2,"Reject the hypothesis of normally distributed errors","Do not reject the hypothesis of normally distributed errors")

At α = 0.05, the results of the Jarque-Bera test are (see p. 149 of Principles of Econometrics, 4e):

    A                  B                    C
1   Data Input         N=                   40
2                      α=                   0.05
3                      df or m=             2
4
5   Computed Values    σ-tilde=             89.248579
6                      μ3-tilde=            99251.00
7                      μ4-tilde=            2.03335E+08
8                      S=                   0.1396147
9                      K=                   3.2048499
10                     χ²-critical value=   5.9914645
11
12  Jarque-Bera Test   JB=                  0.1998875
13                     Conclusion=          Do not reject the hypothesis of normally distributed errors
14                     p-value=             0.9048883
15                     Conclusion=          Do not reject the hypothesis of normally distributed errors

4.7 POLYNOMIAL MODELS: AN EMPIRICAL EXAMPLE

Open the Excel file wa-wheat. Excel opens the data set in Sheet 1 of a new Excel file. Since we
would like to save all our work from Chapter 4 in one file, create a new worksheet in your POE
Chapter 4 Excel file, name it wa-wheat data, and in it, copy the data set you just opened.


This data set gives average wheat yield for different regions of Australia for the period 1950-
1997. Time is measured using the values 1, 2, ..., 48 in column E. We would like to plot the
yield data for the Greenough Shire area, reported in column D.

4.7.1 Scatter Plot of Wheat Yield over Time

Select the Insert tab located next to the Home tab. Select D2:E49. In the Charts group of
commands select Scatter, and then Scatter with only Markers.


The result is:


.. ...... ..
60

50

# •
....• .
40
.,
;
•"·
..
'30
•Series1

20
-
- .••- \ I:

. ...
• #
10
·-�
.. - ...

0
�· .. .
0 0.5 ·1 1.5 2 �5

: - - ...... - - ..

You can see that our yield values are on the horizontal axis and our time values are on the vertical
axis; we would like to change that around as we did in Chapter 2 with our plot of food
expenditure data. Select the points on your plot, right-click and select Select Data.


A Select Data Source dialog box pops up. Select Edit.



In the Edit Series dialog box, highlight the text from the Series X values window. Press the
Delete key on your keyboard. Select E2:E49. Highlight and delete the text from the Series Y
values window. Select D2:D49. Select OK.


The Select Data Source dialog box reappears. Select OK again. You have just told Excel that the time values are the X values and the yield values are the Y values, not the other way around.

After editing your chart as you did in Sections 2.1.2a-2.1.2c, the result is (see also Figure 4.11 on p. 150 of Principles of Econometrics, 4e):

[Figure: Greenough Shire wheat yield (0.5 to 2.5) plotted against Time (0 to 50).]

4.7.2 The Linear Equation Model

4.7.2a Estimating the Model

We start by estimating the following linear equation model:

$$YIELD_t = \beta_1 + \beta_2 TIME_t + e_t \qquad (4.20)$$

In the Regression dialog box, the Input Y Range should be D2:D49, the Input X Range should
be E2:E49. Select New Worksheet Ply and name it Linear Equation Model; and do check the
box next to Residual Plots.


The results are shown below (only part of the residual output section is shown, and the residual plot is omitted):

1   SUMMARY OUTPUT

3   Regression Statistics
4   Multiple R             0.805849601
5   R Square               0.64939358
6   Adjusted R Square      0.641771101
7   Standard Error         0.218692234
8   Observations           48

10  ANOVA
11                 df   SS            MS            F             Significance F
12  Regression      1   4.074859899   4.074859899   85.20124832   4.87577E-12
13  Residual       46   2.200009496   0.047826293
14  Total          47   6.274869395

16                 Coefficients   Standard Error   t Stat        P-value       Lower 95%     Upper 95%
17  Intercept      0.637777837    0.064130508      9.944999006   4.6492E-13    0.508689822   0.766865852
18  X Variable 1   0.021031942    0.002278539      9.230452221   4.87577E-12   0.016445482   0.025618402

22  RESIDUAL OUTPUT

24  Observation   Predicted Y   Residuals
25  1             0.658809779   0.255290221
26  2             0.679841721   -0.007741721
27  3             0.700873663   0.018226337
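As a check on the RESIDUAL OUTPUT, each fitted value is simply $b_1 + b_2 TIME_t$ and each residual is the observed yield minus that fitted value. This short Python sketch recomputes the first three rows from the printed coefficients and the first three Greenough Shire yields.

```python
# Coefficients copied from the SUMMARY OUTPUT of the linear equation model (4.20)
b1, b2 = 0.637777837, 0.021031942
yields = [0.9141, 0.6721, 0.7191]    # first three Greenough Shire yields (column D)

rows = []
for t, y in enumerate(yields, start=1):   # TIME_t = 1, 2, 3
    fitted = b1 + b2 * t                  # Predicted Y
    rows.append((t, fitted, y - fitted))  # (Observation, Predicted Y, Residual)

for t, fitted, resid in rows:
    print(t, round(fitted, 9), round(resid, 9))
```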

The estimated linear equation model is (see also p. 150 of Principles of Econometrics, 4e):

$$\widehat{YIELD}_t = 0.638 + 0.021\,TIME_t \qquad (4.21)$$
(se)                  (0.064)   (0.002)

4.7.2b Residuals Plot

After editing the chart as we did in Section 2.1 or Section 2.3.4, the result is (see also Figure 4.12 on p. 150 of Principles of Econometrics, 4e):

[Figure: "Linear Yield Model Residuals Plot", residuals (-0.6 to 0.8) plotted against Time (0 to 50).]

Note: to draw the horizontal axis below all the points, select the vertical axis on your chart, right-click, and select Format Axis. In the Format Axis dialog box, under the Axis Options panel, set Horizontal axis crosses to the Axis value -0.6. To draw a horizontal line at level 0 of the residual values, select the plot of residuals on your chart, right-click and select Add Trendline. Choose the Linear option, and Close. Least squares residuals have mean zero and are uncorrelated with TIME, so the fitted linear trendline is the horizontal line at zero.


4.7.3 The Cubic Equation Model

4.7.3a Estimating the Model

We start by estimating the following cubic equation model:

$$YIELD_t = \beta_1 + \beta_2 TIME_t^3 + e_t \qquad (4.22)$$

Let $TIMECUBE_t = TIME_t^3/1{,}000{,}000$. Our explanatory variable is redefined as the original explanatory variable cubed, and it is also rescaled before the equation above is estimated.

3
Go back to your wa-wheat data worksheet. In Fl, enter the column label time . In cell F2, enter
the formula =(E2A3)/1000000; copy it to cells F3:F49. Here is how your table should look (only
the first five values are shown below):

   D           E      F
1  greenough   time   time3
2  0.9141      1      0.000001
3  0.6721      2      0.000008
4  0.7191      3      0.000027
5  0.7258      4      0.000064
6  0.7998      5      0.000125
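Dividing TIME³ by 1,000,000 only changes units: it keeps the values in column F small so that the estimated slope is not a tiny number. A one-function Python sketch of the column F formula (the function name is illustrative):

```python
def time_cubed_rescaled(time):
    """Python version of the Excel formula =(E2^3)/1000000 used in column F."""
    return time ** 3 / 1_000_000

# First five rows of column F
for t in range(1, 6):
    print(t, f"{time_cubed_rescaled(t):.6f}")
```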

We want to re-estimate our wheat yield model using our original y values and our re-defined and
re-scaled x values.

In the Regression dialog box, the Input Y Range should be D2:D49, and the Input X Range should be F2:F49. Select New Worksheet Ply and name it Cubic Equation Model; and do check the box next to Residual Plots.


The results are (only part of the residual output section is shown below):

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.866495734
R Square             0.750814858
Adjusted R Square    0.745397789
Standard Error       0.184367557
Observations         48

ANOVA
              df   SS            MS            F             Significance F
Regression     1   4.711265172   4.711265172   138.6016965   1.76803E-15
Residual      46   1.563604223   0.033991396
Total         47   6.274869395

                Coefficients   Standard Error   t Stat        P-value       Lower 95%     Upper 95%     Lower 95.0%   Upper 95.0%
Intercept       0.874116582    0.035630663      24.53270271   4.60223E-28   0.802395169   0.945837396   0.802395169   0.945837396
X Variable 1    9.68151584     0.822354527      11.77292217   1.76803E-15   8.026202058   11.33682962   8.026202058   11.33682962

RESIDUAL OUTPUT

Observation   Predicted Y    Residuals
1             0.874126264     0.039973736
2             0.874194034    -0.202094034
3             0.874377983    -0.155277983

The estimated cubic equation model is (see also p. 151 of Principles of Econometrics, 4e):

$$\underset{(\text{se})}{\widehat{YIELD}_t} = \underset{(0.036)}{0.874} + \underset{(0.822)}{9.68}\,TIMECUBE_t \qquad (4.23)$$
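Because TIMECUBE is TIME³ divided by 1,000,000, a coefficient of 9.68 on TIMECUBE describes the same fitted curve as a coefficient of 9.68/1,000,000 on TIME³. The sketch below uses the printed estimates to check that both parameterizations give identical predictions. It also reproduces the first Predicted Y values of the residual output.

```python
B1 = 0.874116582                        # intercept of the cubic model
B2_RESCALED = 9.68151584                # coefficient on TIMECUBE = TIME**3 / 1,000,000
B2_UNSCALED = B2_RESCALED / 1_000_000   # implied coefficient on TIME**3

def predict_rescaled(time):
    """Prediction using the rescaled regressor TIMECUBE."""
    return B1 + B2_RESCALED * (time ** 3 / 1_000_000)

def predict_unscaled(time):
    """Prediction using TIME**3 directly with the unscaled coefficient."""
    return B1 + B2_UNSCALED * time ** 3

for t in (1, 2, 3):
    print(t, round(predict_rescaled(t), 9), round(predict_unscaled(t), 9))
```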

4.7.3b Residuals Plot

Notice that when you choose the Residual Plots option in the Regression dialog box, Excel generates a plot of the residuals against the explanatory variable, which, in this case, is TIMECUBE. We would like to have a plot of residuals against time instead. Select the data points in your chart, right-click and select Select Data. A Select Data Source dialog box pops up. Select Series1 and then Edit. In the Edit Series dialog box, change the Series X values reference to E2:E49. Finally, select OK twice.


After editing the chart as we did in Section 2.1 or Section 2.3.4, the result is (see also Figure 4.13 on p. 151 of Principles of Econometrics, 4e):

[Figure: Cubic Yield Model Residuals Plot, residuals plotted against Time]

4.8 LOG-LINEAR MODELS

4.8.1 A Growth Model

We would like to estimate the following growth model:

$$\ln(YIELD_t) = \beta_1 + \beta_2 TIME_t + e_t \qquad (4.24)$$

where y_t = ln(YIELD_t), i.e., our dependent variable is redefined as the natural logarithm of our original dependent variable.

In your wa-wheat data worksheet, move your charts a little to the right if you like. In cell G1, enter the column label ln(greenough); resize the width of your column so it fits the new label. In cell G2, enter the formula =ln(D2); copy it to cells G3:G49. Here is how your table should look (only the first five values are shown below):

   D           E      F       G
1  greenough   time   time3   ln(greenough)
2  0.9141      1      1E-06   -0.089815304
3  0.6721      2      8E-06   -0.39734814
4  0.7191      3      3E-05   -0.329754849
5  0.7258      4      6E-05   -0.320480784
6  0.7998      5      1E-04   -0.223393583
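Excel's LN function is the natural logarithm, so column G can be checked with Python's math.log. A minimal sketch for the first five yields:

```python
import math

greenough = [0.9141, 0.6721, 0.7191, 0.7258, 0.7998]  # first five wheat yields
log_yield = [math.log(y) for y in greenough]          # same as =ln(D2), ..., =ln(D6)

for y, ly in zip(greenough, log_yield):
    print(f"{y:.4f}  {ly:.9f}")
```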

We want to re-estimate our wheat yield model using our original x values and our re-defined y
values.

In the Regression dialog box, the Input Y Range should be G2:G49, the Input X Range should
be E2:E49. Select New Worksheet Ply and name it Growth Model.


The result is:

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.785168587
R Square             0.61648971
Adjusted R Square    0.6081525
Standard Error       0.1991649
Observations         48

ANOVA
              df   SS         MS           F          Significance F
Regression     1   2.933135   2.933135     73.94463   3.93229E-11
Residual      46   1.824666   0.03966664
Total         47   4.757801

                Coefficients   Standard Error   t Stat         P-value       Lower 95%      Upper 95%      Lower 95.0%    Upper 95.0%
Intercept       -0.343366453   0.058404196      -5.879140034   4.39317E-07   -0.460928001   -0.225804905   -0.460928001   -0.225804905
X Variable 1    0.017843872    0.002075084      8.599106374    3.93229E-11   0.013666943    0.0220208      0.013666943    0.0220208

The estimated growth model is (see also p. 153 of Principles of Econometrics, 4e):

$$\underset{(\text{se})}{\widehat{\ln(YIELD_t)}} = \underset{(0.0584)}{-0.3434} + \underset{(0.0021)}{0.0178}\,TIME_t \qquad (4.25)$$

4.8.2 A Wage Equation

Open the Excel file cps4_small. Excel opens the data set in Sheet 1 of a new Excel file. Since we
would like to save all our work from Chapter 4 in one file, create a new worksheet in your POE
Chapter 4 Excel file, name it cps4_small data, and in it, copy the data set you just opened.


This data set gives information on hourly wages, years of education and other variables. Based on these data, we would like to estimate the following wage equation:

$$\ln(WAGE) = \beta_1 + \beta_2 EDUC + e \qquad (4.26)$$

where y_i = ln(WAGE_i), i.e., our dependent variable is defined as the natural logarithm of the variable WAGE.

In cell M1 of the cps4_small data worksheet, enter the column label ln(wage). In cell M2, enter the formula =ln(A2); copy it to cells M3:M1001. Here is how your table should look (only the first five values are shown below; columns C through L, which hold exper, hrswk, married, female, metro, midwest, south, west, black and asian, are omitted):

   A       B      ...   M
1  wage    educ   ...   ln(wage)
2  18.7    16     ...   2.92852352
3  11.5    12     ...   2.44234704
4  15.04   16     ...   2.71071332
5  25.95   14     ...   3.25617161
6  24.03   12     ...   3.17930305

We want to estimate our wage equation using our original x values and our re-defined y values.

In the Regression dialog box, the Input Y Range should be M2:M1001, the Input X Range
should be B2:B1001. Select New Worksheet Ply and name it Wage Equation.


The result is:



SUMMARY OUTPUT

Regression Statistics
Multiple R           0.422142751
R Square             0.178204502
Adjusted R Square    0.17738106
Standard Error       0.526611364
Observations         1000

ANOVA
              df    SS            MS            F          Significance F
Regression      1   60.01584269   60.01584269   216.4140   1.74559E-44
Residual      998   276.7648898   0.277319529
Total         999   336.7807325

                Coefficients   Standard Error   t Stat        P-value       Lower 95%     Upper 95%     Lower 95.0%   Upper 95.0%
Intercept       1.6094445      0.086422944      18.62288      1.14645E-66   1.4398529     1.7790360     1.4398529     1.7790360
X Variable 1    0.090408247    0.006145615      14.71101802   1.74559E-44   0.078348438   0.102468056   0.078348438   0.102468056

The estimated wage equation is (see also p. 153 of Principles of Econometrics, 4e):

$$\underset{(\text{se})}{\widehat{\ln(WAGE_i)}} = \underset{(0.0864)}{1.6094} + \underset{(0.0061)}{0.0904}\,EDUC_i \qquad (4.27)$$

4.8.3 Prediction

For the natural logarithm the antilog is the exponential function, so a natural choice for prediction
in a log-linear model is:

$$\hat{y}_n = \exp(b_1 + b_2 x) \qquad (4.28)$$

An alternative and corrected predictor is:

$$\hat{y}_c = \exp(b_1 + b_2 x + \hat{\sigma}^2/2) \qquad (4.29)$$

where b1 and b2 are the estimated intercept and slope coefficients of the log-linear model, and σ̂² is the estimate of the error variance, that is, the mean square residual (MS residual).
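Equations (4.28) and (4.29) differ only by the factor exp(σ̂²/2). A minimal Python sketch, assuming the rounded wage-equation values b1 = 1.609444, b2 = 0.090408 and MS residual = 0.27732 with x0 = 12 years of education (the function name is illustrative):

```python
import math

def log_linear_predictors(b1, b2, sigma2_hat, x0):
    """Natural (4.28) and corrected (4.29) predictions of y at x0 for ln(y) = b1 + b2*x + e."""
    natural = math.exp(b1 + b2 * x0)
    corrected = math.exp(b1 + b2 * x0 + sigma2_hat / 2.0)
    return natural, corrected

natural, corrected = log_linear_predictors(b1=1.609444, b2=0.090408, sigma2_hat=0.27732, x0=12)
print(f"natural = {natural:.4f}, corrected = {corrected:.4f}")
```

Since exp(σ̂²/2) > 1 whenever σ̂² > 0, the corrected predictor is always the larger of the two.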

Insert a new worksheet by selecting the Insert Worksheet tab at the lower left corner of your screen. Name it Prediction in Log-Linear Model.


Create the following template to make predictions (in the last column below you will find the numbers of the equations used in the template):

   A                 B                          C
1  Data Input        x0 =                       12
2                    b1 =                       ='Wage Equation'!B17
3                    b2 =                       ='Wage Equation'!B18
4                    MS residual =              ='Wage Equation'!D13
5
6  Computed Values   natural predicted y0 =     =EXP(C2+C3*C1)          (4.28)
7                    corrected predicted y0 =   =C6*EXP(C4/2)           (4.29)

Here are the results you should get (see also p. 154 of Principles of Econometrics, 4e):

   A                 B                          C
1  Data Input        x0 =                       12
2                    b1 =                       1.609444
3                    b2 =                       0.090408
4                    MS residual =              0.27732
5
6  Computed Values   natural predicted y0 =     14.7958
7                    corrected predicted y0 =   16.99643

Next, we want to show graphically how the correction affects our prediction. Go to your
cps4_small data worksheet. Here are the formulas and labels you should enter (in the last row of
each of the tables below, you will find the numbers of the equations used):

   N      O
1  educ   yhatn
2  0      =EXP('Wage Equation'!$B$17 + 'Wage Equation'!$B$18 * N2)
3  1      (4.28)

   P
1  yhatc
2  =O2*EXP('Wage Equation'!$D$13/2)
3  (4.29)

Select cells N2:N3, move your cursor to the lower right corner of your selection until it turns into a skinny cross, left-click, hold it and drag it down to cell N23.


Select O2:P2 and copy their content to O3:P23. Here is how your table should look (only the first five values are shown below):

   N      O        P
1  educ   yhatn    yhatc
2  0      5.0000   5.7437
3  1      5.4731   6.2872
4  2      5.9910   6.8821
5  3      6.5571   7.5323
6  4      7.1784   8.2461
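Columns N:P can be reproduced outside Excel as a check. The sketch below assumes the rounded estimates from the prediction template (b1 = 1.609444, b2 = 0.090408, MS residual = 0.27732). It shows that yhatc is always yhatn scaled by the same factor exp(σ̂²/2), whatever the level of education.

```python
import math

# Rounded estimates from the Wage Equation output (assumed inputs for this sketch)
B1 = 1.609444
B2 = 0.090408
MS_RESIDUAL = 0.27732  # estimate of the error variance

def prediction_columns(max_educ=21):
    """Return (educ, yhatn, yhatc) rows for educ = 0..max_educ, like columns N:P."""
    scale = math.exp(MS_RESIDUAL / 2.0)  # correction factor in equation (4.29)
    rows = []
    for educ in range(max_educ + 1):
        yhatn = math.exp(B1 + B2 * educ)  # natural predictor, equation (4.28)
        rows.append((educ, yhatn, yhatn * scale))
    return rows

for educ, yhatn, yhatc in prediction_columns()[:5]:
    print(f"{educ:2d}  {yhatn:.4f}  {yhatc:.4f}")
```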

Select the Insert tab located next to the Home tab. Select N1:P23. In the Charts group of commands select Scatter, and then Scatter with only Markers.


The result is:

[Chart: yhatn and yhatc plotted against educ]

Next, we would like to plot the actual values on the same chart. Select the points on your plot, right-click and select Select Data. A Select Data Source dialog box pops up. Select Add. In the Edit Series dialog box, specify earnings per hour for the Series name, select B2:B1001 for the Series X values and A2:A1001 for the Series Y values, all from the cps4_small data worksheet. Select OK, and then OK again in the Select Data Source dialog box.


After editing your chart as you did in Sections 2.1.2a-2.1.2c, the result is (see also Figure 4.14 on p. 155 of Principles of Econometrics, 4e):

[Figure: yhatn, yhatc and earnings per hour plotted against years of education]

4.8.4 A Generalized R² Measure


A generalized R² measure can be computed as the square of the sample correlation coefficient between y and ŷ_c, where ŷ_c are the corrected predicted y values:

$$R_g^2 = \left[\operatorname{corr}(y, \hat{y}_c)\right]^2 = r_{y\hat{y}_c}^2 \qquad (4.30)$$
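The Excel expression =(CORREL(...))^2 is the squared Pearson correlation. A small Python equivalent, written from the textbook definition rather than a library call, is sketched below; the function name and the toy inputs in the usage lines are made up for illustration.

```python
def generalized_r2(y, y_corrected):
    """Squared sample correlation between y and the corrected predictions, as in (4.30)."""
    n = len(y)
    if n != len(y_corrected) or n < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mean_y = sum(y) / n
    mean_c = sum(y_corrected) / n
    sxy = sum((a - mean_y) * (b - mean_c) for a, b in zip(y, y_corrected))
    sxx = sum((a - mean_y) ** 2 for a in y)
    syy = sum((b - mean_c) ** 2 for b in y_corrected)
    return sxy * sxy / (sxx * syy)

print(generalized_r2([1, 2, 3], [2, 4, 6]))  # perfectly linear toy data
print(generalized_r2([1, 2, 3], [1, 3, 2]))
```

Because the correlation is squared, a perfect negative relationship would also give 1.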

Make sure you are in your cps4_small data worksheet. We will compute the corrected predicted y values in column Q, and next to it, we will compute the generalized R².

Here are the formulas and labels you should enter (in the last row of each of the tables below, you
will find the numbers of the equations used):

   Q
1  corrected predicted y
2  =EXP('Wage Equation'!$B$17 + 'Wage Equation'!$B$18 * B2)*EXP('Wage Equation'!$D$13/2)
3  (4.29)

Copy the content of cell Q2 to cells Q3:Q1001.

   R
1  generalized R²
2  =(CORREL(A2:A1001,Q2:Q1001))^2
3  (4.30)

The result is (see also p. 155 of Principles of Econometrics, 4e):



   Q                       R
1  corrected predicted y   generalized R²
2  24.40129449             0.185930705
3  16.99642785
4  24.40129449
5  20.36503968
6  16.99642785

4.8.5 Prediction Intervals

The lower limit (LL) and upper limit (UL) of the prediction interval in a log-linear model are:

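$$LL = \exp\left(\widehat{\ln(y)} - t_c\,\mathrm{se}(f)\right) \qquad (4.31)$$

$$UL = \exp\left(\widehat{\ln(y)} + t_c\,\mathrm{se}(f)\right) \qquad (4.32)$$

where the predicted value of ln(y) at x0 is b1 + b2x0.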
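A minimal Python sketch of (4.31) and (4.32), using the forecast-error variance σ̂² + σ̂²/N + (x0 − x̄)²·var(b2). Because var(b2) = se(b2)², the standard error of b2 is squared in that last term. The inputs are assumptions: the rounded wage-equation estimates, N = 1000, x0 = 12, the template's reported sample mean of educ (13.85), and tc = 1.962344 for 998 degrees of freedom.

```python
import math

def log_linear_prediction_interval(b1, b2, se_b2, sigma2_hat, n, x0, x_bar, t_c):
    """Prediction interval (4.31)-(4.32) for y at x0 in the model ln(y) = b1 + b2*x + e."""
    ln_y0 = b1 + b2 * x0  # predicted value of ln(y)
    # Estimated forecast-error variance: sigma^2 + sigma^2/N + (x0 - x_bar)^2 * se(b2)^2
    var_f = sigma2_hat + sigma2_hat / n + (x0 - x_bar) ** 2 * se_b2 ** 2
    se_f = math.sqrt(var_f)
    return math.exp(ln_y0 - t_c * se_f), math.exp(ln_y0 + t_c * se_f)

lower, upper = log_linear_prediction_interval(
    b1=1.609444, b2=0.090408, se_b2=0.006146, sigma2_hat=0.27732,
    n=1000, x0=12, x_bar=13.85, t_c=1.962344)
print(f"[{lower:.4f}, {upper:.4f}]")
```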

Insert a new worksheet by selecting the Insert Worksheet tab at the lower left corner of your screen. Name it PI in Log-Linear Model.


Copy the template from the Prediction Interval worksheet (if you cannot see it, it is because it is hidden further to the left of your visible worksheets) to the PI in Log-Linear Model worksheet.

You just need to make a few modifications to it: (1) get your regression results from the Wage Equation worksheet instead of the Food Regression worksheet, (2) change x0 to 12, (3) compute x̄ from the cps4_small data worksheet instead of the food data worksheet, and (4) take the antilogs of the interval limits using the EXP function. Those modifications are outlined in the table below.

    A                 B                       C
1   Data Input        Sample Size =           ='Wage Equation'!B8
2                     Confidence Level =      95% (percentage, 0 decimal places)
3                     x0 =                    12
4                     b1 =                    ='Wage Equation'!B17
5                     b2 =                    ='Wage Equation'!B18
6                     se(b2) =                ='Wage Equation'!C18
7                     MS residual =           ='Wage Equation'!D13
8
9   Computed Values   α =                     =1-C2
10                    df or m =               =C1-2
11                    tc =                    =TINV(C9,C10)
12                    predicted y0 =          =C4+C5*C3                               (4.2)
13                    x-bar =                 =AVERAGE('cps4_small data'!B2:B1001)
14                    se(f) =                 =SQRT(C7+C7/C1+((C3-C13)^2)*C6^2)       (4.3)
15
16  Prediction        Lower Limit =           =EXP(C12-C11*C14)                       (4.31)
17  Interval          Upper Limit =           =EXP(C12+C11*C14)                       (4.32)

Because C6 holds the standard error of b2, it is squared in the se(f) formula, so that the last term is (x0 - x̄)² var(b2).

Here are the results you should get (see also p. 155 of Principles of Econometrics, 4e):

    A                 B                       C
1   Data Input        Sample Size =           1000
2                     Confidence Level =      95%
3                     x0 =                    12
4                     b1 =                    1.609444
5                     b2 =                    0.090408
6                     se(b2) =                0.006146
7                     MS residual =           0.27732
8
9   Computed Values   α =                     5%
10                    df or m =               998
11                    tc =                    1.962344
12                    predicted y0 =          2.694343
13                    x-bar =                 13.85
14                    se(f) =                 0.526997
15
16  Prediction        Lower Limit =           5.26033
17  Interval          Upper Limit =           41.6164

Note that the results above and the ones in your textbook might differ slightly due to rounding.

Next, we want to show graphically how our prediction interval changes over the range of years of education. Go to your cps4_small data worksheet. Here are the formulas and labels you should enter (in the last row of each of the tables below, you will find the numbers of the equations used in the template):

   S
1  lb_wage
2  =O2*EXP(-'PI in Log-Linear Model'!$C$11*'PI in Log-Linear Model'!$C$14)
3  (4.31)

   T
1  ub_wage
2  =O2*EXP('PI in Log-Linear Model'!$C$11*'PI in Log-Linear Model'!$C$14)
3  (4.32)

Select S2:T2 and copy their content to S3:T23. Here is how your table should look (only the first five values are shown below):

   S         T
1  lb_wage   ub_wage
2  1.7777    14.0637
3  1.9459    15.3944
4  2.1300    16.8510
5  2.3312    18.4432
6  2.5521    20.1908

Select the whole plot area you completed in Section 4.8.3, which compares the natural and corrected predictors of wage (replica of Figure 4.14 on p. 155 of Principles of Econometrics, 4e). Select Copy and then Paste. You should have two identical charts. Below we will work with one of them. On that chart, we want to remove the yhatc series and add the lb_wage and ub_wage series instead.

Select the points on the chart, right-click and select Select Data. A Select Data Source dialog box pops up. Select the yhatc series, and then Remove. Then select Add. In the Edit Series dialog box, specify lb_wage for the Series name, select N2:N23 for the Series X values and S2:S23 for the Series Y values, all from the cps4_small data worksheet. Select OK.


Select Add. In the Edit Series dialog box, specify ub_wage for the Series name, select N2:N23 for the Series X values and T2:T23 for the Series Y values, all from the cps4_small data worksheet. Select OK, and then OK again in the Select Data Source dialog box.


After editing your chart as you did in Sections 2.1.2a-2.1.2c, the result is (see also Figure 4.15 on p. 156 of Principles of Econometrics, 4e):

[Figure: yhatn, earnings per hour, lb_wage and ub_wage plotted against Years of Education]



4.9 A LOG-LOG MODEL: POULTRY DEMAND EQUATION

Open the Excel file newbroiler. Excel opens the data set in Sheet 1 of a new Excel file. Since we would like to save all our work from Chapter 4 in one file, create a new worksheet in your POE Chapter 4 Excel file, name it newbroiler data, and in it, copy the data set you just opened.


4.9.1 Estimating the Model

We estimate the following log-log model for poultry demand:

$$\ln(Q) = \beta_1 + \beta_2 \ln(P) + e \qquad (4.33)$$

where Q is the U.S. per capita consumption of chicken, in pounds, and P is the real price of chicken, for annual observations over the period 1950-2001.

In cells K1:L2 of your newbroiler data worksheet, enter the following column labels and formulas.

   K         L
1  ln(q)     ln(p)
2  =ln(B2)   =ln(D2)

Select K2:L2 and copy their content to K3:L53. Here is how your table should look (only the first five values are shown below):
first five values are shown below):

   K          L
1  ln(q)      ln(p)
2  2.66026    1.059116
3  2.714695   1.030993
4  2.727853   1.014683
5  2.721295   0.982232
6  2.76001    0.872986

In the Regression dialog box, the Input Y Range should be K2:K53, and the Input X Range
should be L2:L53. Select New Worksheet Ply and name it Log-Log Model. Finally select OK.


The result is (matching the one reported on p. 157 of Principles of Econometrics, 4e):
SUMMARY OUTPUT
...
                Coefficients   Standard Error   t Stat         P-value       Lower 95%      Upper 95%      Lower 95.0%    Upper 95.0%
Intercept       3.716943882    0.022359414      166.2362       2.94446E-70   3.672033677    3.761854086    3.672033677    3.761854086
X Variable 1    -1.121358001   0.048756431      -22.99918135   2.99987E-28   -1.219288174   -1.023427829   -1.219288174   -1.023427829

4.9.2 A Generalized R² Measure

Make sure you are in your newbroiler data worksheet. We will compute the corrected predicted y values in column M, and next to it, we will compute the generalized R².

Here are the formulas and labels you should enter (in the last row of each of the tables below, you
will find the numbers of the equations used):

   M
1  corrected predicted y
2  =EXP('Log-Log Model'!$B$17 + 'Log-Log Model'!$B$18 * L2)*EXP('Log-Log Model'!$D$13/2)
3  (4.29)

Copy the content of cell M2 to cells M3:M53.

   N
1  generalized R²
2  =(CORREL(B2:B53,M2:M53))^2
3  (4.30)

The result is (see also p. 157 of Principles of Econometrics, 4e):

   M                       N
1  corrected predicted y   generalized R²
2  12.63229707             0.8818
3  13.03700687
4  13.27763969
5  13.76970711
6  15.56421528

4.9.3 Scatter Plot of Data with Fitted Log-Log Relationship

Enter the following labels and formulas in your newbroiler data worksheet (in the last row of each of the tables below, you will find the numbers of the equations used):

   O     P
1  p     yhatc
2  0.9   =EXP('Log-Log Model'!$B$17 + 'Log-Log Model'!$B$18 * LN(O2))*EXP('Log-Log Model'!$D$13/2)
3  1.0   (4.29)

Select cells O2:O3, move your cursor to the lower right corner of your selection until it turns into a skinny cross, left-click, hold it and drag it down to cell O22.


Select P2 and copy its content to P3:P22. Here is how your table should look (only the first five
values are shown below):
   O     P
1  p     yhatc
2  0.9   46.62103
3  1.0   41.42584
4  1.1   37.22676
5  1.2   33.76609
6  1.3   30.8674
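In a log-log model the slope is an elasticity, so moving along the fitted curve from price p0 to p1 multiplies the corrected prediction by (p1/p0)^b2 whatever the value of σ̂². The sketch below checks this against two rows of the yhatc column. Here sigma2_hat is a hypothetical argument, because the MS residual of the Log-Log Model is not reproduced in this excerpt.

```python
import math

# Log-log poultry demand estimates from the Log-Log Model output
B1 = 3.716943882
B2 = -1.121358001  # estimated price elasticity of demand

def yhat_corrected(price, sigma2_hat):
    """Corrected prediction exp(b1 + b2*ln(p)) * exp(sigma2_hat/2), equation (4.29)."""
    return math.exp(B1 + B2 * math.log(price)) * math.exp(sigma2_hat / 2.0)

def price_change_ratio(p_from, p_to):
    """Ratio yhat(p_to)/yhat(p_from); in a log-log model it equals (p_to/p_from)**B2."""
    return (p_to / p_from) ** B2

# From the yhatc row at p = 1.1 to the row at p = 1.2
print(37.22676 * price_change_ratio(1.1, 1.2))
```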

Select the Insert tab located next to the Home tab. Select O1:P22. In the Charts group of commands select Scatter, and then Scatter with only Markers.


The result is:

[Chart: yhatc plotted against p]

Next, we would like to plot the actual values on the same chart. Select the points on your plot, right-click and select Select Data. A Select Data Source dialog box pops up. Select Add. In the Edit Series dialog box, specify actual values for the Series name, select D2:D53 for the Series X values and B2:B53 for the Series Y values, all from the newbroiler data worksheet. Select OK, and then OK again in the Select Data Source dialog box.

After editing your chart as you did in Sections 2.1.2a-2.1.2c, the result is (see also Figure 4.16 on p. 157 of Principles of Econometrics, 4e):

[Figure: yhatc and actual values plotted against Price of Chicken]
CHAPTER 5

The Multiple Linear Regression

CHAPTER OUTLINE
5.1 Least Squares Estimates Using the Hamburger Chain Data
5.2 Interval Estimation
5.3 Hypothesis Tests for a Single Coefficient
    5.3.1 Tests of Significance
    5.3.2 One-Tail Tests
        5.3.2a Left-Tail Test of Elastic Demand
        5.3.2b Right-Tail Test of Advertising Effectiveness
5.4 Polynomial Equations: Extending the Model for Burger Barn Sales
5.5 Interaction Variables
    5.5.1 Linear Models
    5.5.2 Log-Linear Models
5.6 Measuring Goodness-of-Fit

This chapter is a simple extension of the material covered in Chapters 2-4. Instead of only one
explanatory variable in the simple linear regression model, two or more explanatory variables will
be used in the multiple linear regression model.

5.1 LEAST SQUARES ESTIMATES USING THE HAMBURGER CHAIN DATA

Open the Excel file andy. Save your file as POE Chapter 5. Rename Sheet 1 data.

We would like to estimate the following multiple linear regression model for Big Andy's Burger Barn hamburger chain:

$$SALES = \beta_1 + \beta_2 PRICE + \beta_3 ADVERT + e \qquad (5.1)$$

where SALES represents monthly sales revenue in a given city (in $1000), PRICE represents a price index in that city (in $), and ADVERT is monthly advertising expenditure in that city (in $1000).


As we have done before, we will use the Excel Regression analysis tool. There are only two
things to note.

• First, because we have more than one explanatory variable, we will include the labels of
the variables in the input ranges we specify. Those labels will then be reported in the
summary output Excel produces, and we will be able to distinguish the different
estimated slope coefficients.
• Second, as long as the data on the explanatory variables are stored in adjacent columns,
all we have to do is select the whole range of data and Excel will recognize each column
of data as separate observations on separate explanatory variables.
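Behind the Regression tool, the estimates solve the least squares normal equations (X'X)b = X'y. Here X has a column of ones followed by one column per explanatory variable, which is what selecting adjacent columns as the Input X Range provides. The pure-Python sketch below is a hypothetical helper, not Excel's own algorithm; it solves those equations by Gaussian elimination. The toy data are invented and noise-free, so the known coefficients are recovered up to rounding.

```python
def ols_with_intercept(y, regressors):
    """Least squares estimates [b1, b2, ..., bK] for y on an intercept plus the regressors.

    regressors is a list of columns, one list per explanatory variable.
    """
    n = len(y)
    X = [[1.0] + [column[i] for column in regressors] for i in range(n)]
    k = len(X[0])
    # Augmented normal equations [X'X | X'y]
    a = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    # Gaussian elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, k):
            factor = a[r][col] / a[col][col]
            for c in range(col, k + 1):
                a[r][c] -= factor * a[col][c]
    # Back substitution
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (a[i][k] - sum(a[i][j] * b[j] for j in range(i + 1, k))) / a[i][i]
    return b

# Hypothetical, noise-free data generated from SALES = 100 - 8*PRICE + 2*ADVERT
price = [5.0, 5.5, 6.0, 5.2, 4.8, 6.3]
advert = [1.0, 2.0, 0.5, 1.8, 2.5, 1.2]
sales = [100 - 8 * p + 2 * a for p, a in zip(price, advert)]
print(ols_with_intercept(sales, [price, advert]))
```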

In the Regression dialog box, the Input Y Range should be A1:A76, and the Input X Range should be B1:C76; do check the box next to Labels. Finally, select New Worksheet Ply and name it Regression.


The result is (see also p. 175 in Principles of Econometrics, 4e):

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.66952055
R Square             0.448257766
Adjusted R Square    0.432931593
Standard Error       4.886124039
Observations         75

ANOVA
              df   SS            MS            F             Significance F
Regression     2   1396.538993   698.2694963   29.24785998   5.04086E-10
Residual      72   1718.942985   23.87420813
Total         74   3115.481978

             Coefficients   Standard Error   t Stat         P-value       Lower 95%      Upper 95%      Lower 95.0%    Upper 95.0%
Intercept    118.9136131    6.351637595      18.72172512    2.21429E-29   106.2518552    131.5753711    106.2518552    131.5753711
PRICE        -7.907854804   1.095993037      -7.215241826   4.42399E-10   -10.09267696   -5.723032645   -10.09267696   -5.723032645
ADVERT       1.862583787    0.683195483      2.726282349    0.008038199   0.500658501    3.224509073    0.500658501    3.224509073

5.2 INTERVAL ESTIMATION

Recall from Chapter 3 that the interval estimator of βk is defined as:

$$\left[\, b_k - t_c\,\mathrm{se}(b_k),\; b_k + t_c\,\mathrm{se}(b_k) \,\right] \qquad (5.2)$$

The one important thing to notice is that, in the case of the multiple linear regression model, the critical value tc is from a t-distribution with m = N - K degrees of freedom, where K is the number of parameters in the multiple linear regression model.
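As a sketch of (5.2), the PRICE interval can be recomputed from its estimate and standard error. The critical value 1.993464 is an assumption here: it is the 97.5th percentile of the t-distribution with 72 degrees of freedom implied by Excel's 95% limits.

```python
def interval_estimate(b_k, se_b_k, t_c):
    """Interval estimate b_k +/- t_c * se(b_k), equation (5.2)."""
    half_width = t_c * se_b_k
    return b_k - half_width, b_k + half_width

T_C = 1.993464  # assumed t critical value for a 95% interval with N - K = 72 df
lower, upper = interval_estimate(-7.907854804, 1.095993037, T_C)  # PRICE coefficient
print(round(lower, 5), round(upper, 5))
```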

To compute interval estimates, we could use the template we created in Chapter 3 and make sure we specify the degrees of freedom correctly.

Instead, we use the interval estimates Excel has already generated in the regression summary
output.

The results of interest to us, reported on pp. 182-183 of Principles of Econometrics, 4e, are in the Lower 95% and Upper 95% columns below:

    A           B              C                D              E             F              G
16              Coefficients   Standard Error   t Stat         P-value       Lower 95%      Upper 95%
17  Intercept   118.9136131    6.351637595      18.72172512    2.21429E-29   106.2518552    131.5753711
18  PRICE       -7.907854804   1.095993037      -7.215241826   4.42399E-10   -10.09267696   -5.723032645
19  ADVERT      1.862583787    0.683195483      2.726282349    0.008038199   0.500658501    3.224509073

Recall that to obtain interval estimates other than the 95% ones, all we have to do is specify a different Confidence Level in the Regression dialog box (see Section 3.1.3c).

5.3 HYPOTHESIS TESTS FOR A SINGLE COEFFICIENT

Similarly to the results from Chapter 3, we have the following: if the null hypothesis H0: βk = c is true, then the test statistic t = (bk - c)/se(bk) follows a t-distribution with m = N - K degrees of freedom:

$$t = \frac{b_k - c}{\mathrm{se}(b_k)} \sim t_{(N-K)} \qquad (5.3)$$

Again, note that in the case of the multiple linear regression model, the t-distribution of interest has m = N - K degrees of freedom, where K is the number of parameters in the multiple linear regression model.
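A minimal sketch of (5.3) for Big Andy's model, with N = 75 and K = 3. Setting c = 0 gives the test-of-significance statistics that appear in Excel's t Stat column.

```python
def t_statistic(b_k, se_b_k, c=0.0):
    """t = (b_k - c) / se(b_k), equation (5.3); c = 0 gives a test of significance."""
    return (b_k - c) / se_b_k

N, K = 75, 3   # observations and parameters in the hamburger chain model
df = N - K     # m = N - K degrees of freedom
t_advert = t_statistic(1.862583787, 0.683195483)  # ADVERT, H0: beta3 = 0
print(df, round(t_advert, 6))
```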

5.3.1 Tests of Significance

Recall that when the null hypothesis of a test is that the parameter is zero, the test is called a test of significance. Results of two-tail tests of significance are reported in the Excel summary output, in the t Stat and P-value columns below (see also pp. 185-186 of Principles of Econometrics, 4e):

    A           B              C                D              E             F              G
16              Coefficients   Standard Error   t Stat         P-value       Lower 95%      Upper 95%
17  Intercept   118.9136131    6.351637595      18.72172512    2.21429E-29   106.2518552    131.5753711
18  PRICE       -7.907854804   1.095993037      -7.215241826   4.42399E-10   -10.09267696   -5.723032645
19  ADVERT      1.862583787    0.683195483      2.726282349    0.008038199   0.500658501    3.224509073

Note: you could also have used the Two-Tail Tests template you created in Chapter 3.

5.3.2 One-Tail Tests

5.3.2a Left-Tail Test of Elastic Demand

Insert a new worksheet by selecting the Insert Worksheet tab at the lower left corner of your
screen, next to the data tab. Name it Left-Tail Tests.


Open your POE Chapter 3 Excel file and go to the Left-Tail Tests worksheet. Copy its content
to the Left-Tail Tests worksheet you just created in your POE Chapter 5 Excel file.

You will need to make just a few modifications to create the left-tail test template shown below. First, go back to each formula and delete the references to the POE Chapter 3 Excel file, [POE Chapter 3.xlsx]; this way the test will be computed based on the regression results of your current Excel file, POE Chapter 5. Next, insert a new row, underneath the first one, for K. Finally, modify the degrees of freedom formula. All needed changes are shown below:

    A                 B                 C
1   Data Input        N =               =Regression!B8
2                     K =               =Regression!B12+1
3                     bk =              =Regression!B18
4                     se(bk) =          =Regression!C18
5                     H0: βk =
6                     α =
7
8   Computed Values   df or m =         =C1-C2
9                     tc =              =-TINV(C6*2,C8)
10
11  Left-Tail Test    t-statistic =     =(C3-C5)/C4
12                    Conclusion:       =IF(C11<=C9,"Reject Ho","Do Not Reject Ho")
13                    p-value =         =TDIST(ABS(C11),C8,1)
14                    Conclusion:       =IF(C13<=C6,"Reject Ho","Do Not Reject Ho")

Let α = 0.05; H0: β2 ≥ 0 and H1: β2 < 0. The result is (p. 187 of Principles of Econometrics, 4e):

    A                 B                 C
1   Data Input        N =               75
2                     K =               3
3                     bk =              -7.907855
4                     se(bk) =          1.095993
5                     H0: βk =          0
6                     α =               0.05
7
8   Computed Values   df or m =         72
9                     tc =              -1.6662937
10
11  Left-Tail Test    t-statistic =     -7.2152418
12                    Conclusion:       Reject Ho
13                    p-value =         2.212E-10
14                    Conclusion:       Reject Ho

5.3.2b Right-Tail Test of Advertising Effectiveness

Insert a new worksheet by selecting the Insert Worksheet tab at the lower left corner of your
screen, next to the Left-Tail Tests tab. Name it Right-Tail Tests.


In your POE Chapter 3 Excel file, go to the Right-Tail Tests worksheet. Copy its content to the
Right-Tail Tests worksheet you just created in your POE Chapter 5 Excel file.

You will need to make just a few modifications to create the right-tail test template shown below.
First, go back to each formula and delete the references to the POE Chapter 3 Excel file, [POE
Chapter 3.xlsx]; this way the test will be computed from the regression results in your current
Excel file, POE Chapter 5. Next, change the references for bk and se(bk) to the ADVERT
coefficient estimates instead of the PRICE coefficient estimates. Also, insert a new row for K
underneath the first one. Finally, modify the degrees of freedom formula. All needed changes are
shown in the template below:

    A                  B                C
1   Data Input         N =              =Regression!B8
2                      K =              =Regression!B12+1
3                      bk =             =Regression!B19
4                      se(bk) =         =Regression!C19
5                      H0: βk =
6                      α =
7
8   Computed Values    df or m =        =C1-C2
9                      tc =             =TINV(C6*2,C8)
10
11  Right-Tail Test    t-statistic =    =(C3-C5)/C4
12                     Conclusion:      =IF(C11>=C9,"Reject Ho","Do Not Reject Ho")
13                     p-value =        =TDIST(ABS(C11),C8,1)
14                     Conclusion:      =IF(C13<=C6,"Reject Ho","Do Not Reject Ho")
148 Chapter 5

Let α = 0.05, H0: β3 ≤ 1 and H1: β3 > 1. The result is (see also p. 188 of Principles of
Econometrics, 4e):
    A                  B                C
1   Data Input         N =              75
2                      K =              3
3                      bk =             1.862583787
4                      se(bk) =         0.683195483
5                      H0: βk =         1
6                      α =              0.05
7
8   Computed Values    df or m =        72
9                      tc =             1.666293697
10
11  Right-Tail Test    t-statistic =    1.262572438
12                     Conclusion:      Do Not Reject Ho
13                     p-value =        0.105408444
14                     Conclusion:      Do Not Reject Ho
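For the right-tail case a lighter cross-check is possible because df = N - K = 72 is even. For even degrees of freedom the Student t distribution has a finite closed-form series for P(-|t| < T < |t|), namely sin(θ) times a sum of powers of cos²(θ) with θ = arctan(|t|/√df). The Python below is a hypothetical sketch, not part of the Excel workflow: the function names are illustrative and bisection plays the role of TINV.

```python
import math


def t_central_prob_even_df(t, df):
    """P(-|t| < T < |t|) for a Student t variable with an even number of degrees of freedom.

    Uses sin(theta) * sum_{k=0}^{df/2 - 1} c_k * cos(theta)**(2k), where
    theta = atan(|t| / sqrt(df)), c_0 = 1 and c_k = c_{k-1} * (2k - 1) / (2k).
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    theta = math.atan(abs(t) / math.sqrt(df))
    cos2 = math.cos(theta) ** 2
    term = total = 1.0
    for k in range(1, df // 2):
        term *= (2 * k - 1) / (2 * k) * cos2
        total += term
    return math.sin(theta) * total


def right_tail_p(t, df):
    """P(T >= t); for t >= 0 this is the quantity TDIST(t, df, 1) returns."""
    central = t_central_prob_even_df(t, df)
    return (1.0 - central) / 2.0 if t >= 0 else (1.0 + central) / 2.0


def t_right_critical(alpha, df):
    """tc with P(T >= tc) = alpha (what TINV(2*alpha, df) returns), found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if right_tail_p(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# ADVERT coefficient: H0: beta3 <= 1 against H1: beta3 > 1, with N = 75 and K = 3
df = 75 - 3
t_stat = (1.862583787 - 1.0) / 0.683195483
tc = t_right_critical(0.05, df)
p_value = right_tail_p(t_stat, df)
print(round(t_stat, 6), round(tc, 6), round(p_value, 6))
print("Reject Ho" if t_stat >= tc else "Do Not Reject Ho")
```

The series only covers even df; for odd df a general routine (for example one based on the incomplete beta function) would be needed.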

5.4 POLYNOMIAL EQUATIONS: EXTENDING THE MODEL FOR BURGER BARN SALES

We estimate the following extended model for Big Andy's Burger Barn hamburger chain:

$$SALES = \beta_1 + \beta_2 PRICE + \beta_3 ADVERT + \beta_4 ADVERT^2 + e \qquad (5.4)$$

Go back to your data worksheet. In D1, enter the column label ADVERT2. In cell D2, enter the
formula =C2^2; copy it to cells D3:D76. Here is how your table should look (only the first five
values are shown below):
    A       B       C        D
1   SALES   PRICE   ADVERT   ADVERT2
2   73.2    5.69    1.3      1.69
3   71.8    6.49    2.9      8.41
4   62.4    5.63    0.8      0.64
5   67.4    6.22    0.7      0.49
6   89.3    5.02    1.5      2.25
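Excel's Regression tool needs the regressors in one contiguous block of columns, which is why ADVERT² gets its own column next to PRICE and ADVERT. A minimal Python sketch of the same construction for the five rows shown (the variable names are illustrative only):

```python
# First five observations from the table above: (SALES, PRICE, ADVERT)
andy_rows = [
    (73.2, 5.69, 1.3),
    (71.8, 6.49, 2.9),
    (62.4, 5.63, 0.8),
    (67.4, 6.22, 0.7),
    (89.3, 5.02, 1.5),
]

# Regressor block B:D for each row: PRICE, ADVERT and ADVERT squared (the =C2^2 column)
x_rows = [(price, advert, advert ** 2) for _, price, advert in andy_rows]
# Dependent variable, column A
y = [sales for sales, _, _ in andy_rows]

for row in x_rows:
    print(row)
```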

In the Regression dialog box, the Input Y Range should be A1:A76, and the Input X Range
should be B1:D76. Check the box next to Labels. Select New Worksheet Ply and name it
Extended Model. Finally, select OK.


The result is (see also p. 193 in Principles of Econometrics, 4e):

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.712906125
R Square             0.508235142
Adjusted R Square    0.487456345
Standard Error       4.645283161
Observations         75

ANOVA
              df    SS            MS            F            Significance F
Regression     3    1583.397427   527.7991422   24.4593153   5.59997E-11
Residual      71    1532.084551   21.57865565
Total         74    3115.481978

              Coefficients   Standard Error   t Stat        P-value       Lower 95%      Upper 95%
Intercept     109.7190398    6.79904556       16.1374177    1.87031E-25   96.16212798    123.2759515
PRICE         -7.640000543   1.045938915      -7.304442     3.23648E-10   -9.725543479   -5.554457508
ADVERT        12.15123398    3.556164048      3.416949784   0.0010516     5.060444353    19.242024
ADVERT2       -2.767962762   0.940624059      -2.942687     0.004391267   -4.643513842   -0.892411683

5.5 INTERACTION VARIABLES

5.5.1 Linear Models


Consider the following life-cycle model:

$$PIZZA = \beta_1 + \beta_2 AGE + \beta_3 INCOME + e \qquad (5.5)$$

where PIZZA is annual expenditure on pizza, AGE is age, and INCOME is income, for a random
sample of 40 individuals aged 18 and older.

Open the Excel file pizza4. Excel opens the data set in Sheet 1 of a new Excel file. Since we
would like to save all our work from Chapter 5 in one file, create a new worksheet in your POE
Chapter 5 Excel file, rename it pizza4 data, and in it, copy the data set you just opened.


In the Regression dialog box, the Input Y Range should be A1:A41, and the Input X Range
should be F1:G41. Check the box next to Labels. Select New Worksheet Ply and name it Life-Cycle
Model 1. Finally, select OK.

The result is (see also p. 196 in Principles of Econometrics, 4e):

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.573803829
R Square             0.329250834
Adjusted R Square    0.292994123
Standard Error       131.070099
Observations         40

ANOVA
              df    SS            MS            F             Significance F
Regression     2    312015.1787   156007.5894   9.081100278   0.000618533
Residual      37    635636.7213   17179.37085
Total         39    947651.9

              Coefficients   Standard Error   t Stat         P-value       Lower 95%      Upper 95%
Intercept     342.8848279    72.3434          4.739682       3.14373E-05   196.3031373    489.4665184
income        1.832478934    0.464300741      3.946749963    0.000340943   0.891716278    2.773241589
age           -7.575555694   2.31699          -3.269571209   0.002332607   -12.27021864   -2.880893153

To account for an effect of income that depends on the age of the individual, we add the
interaction variable (AGE x INCOME) to the life-cycle model:

$$PIZZA = \beta_1 + \beta_2 AGE + \beta_3 INCOME + \beta_4 (AGE \times INCOME) + e \qquad (5.6)$$

Go back to your pizza4 data worksheet. In H1, enter the column label age x income. In cell H2,
enter the formula =F2*G2; copy it to cells H3:H41. Here is how your table should look (only the
first five values are shown below):
    H
1   age x income
2   487.5
3   1755
4   312
5   728
6   487.5

In the Regression dialog box, the Input Y Range should be A1:A41, and the Input X Range
should be F1:H41. Check the box next to Labels. Select New Worksheet Ply and name it Life-Cycle
Model 2. Finally, select OK.

The result is (see also p. 196 in Principles of Econometrics, 4e):

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.622349295
R Square             0.387318645
Adjusted R Square    0.336262
Standard Error       126.996134
Observations         40

ANOVA
              df    SS          MS            F             Significance F
Regression     3    367043.25   122347.75     7.586037514   0.000468085
Residual      36    580608.65   16128.01806
Total         39    947651.9

               Coefficients   Standard Error   t Stat         P-value       Lower 95%      Upper 95%
Intercept      161.465432     120.6634096      1.338147434    0.1893        -83.25130349   406.1821675
income         6.979905       2.82276761       2.47272        0.018266      1.255067       12.70474309
age            -2.977423365   3.352100814      -0.888226      0.380315589   -9.775798      3.820952139
age x income   -0.123239351   0.066718728      -1.847147792   0.072957528   -0.258551202   0.0120725
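Because of the interaction term, the effect of income is no longer a single number: differentiating (5.6) with respect to INCOME gives β3 + β4 AGE. A short sketch evaluating this with the estimates above; the ages 25 and 55 are chosen only for illustration, and the function name is hypothetical.

```python
# Estimates from the Life-Cycle Model 2 output above
b_income = 6.979905          # coefficient on INCOME (beta3)
b_age_income = -0.123239351  # coefficient on AGE x INCOME (beta4)


def income_effect(age):
    """Marginal effect of INCOME on expected PIZZA at a given AGE: beta3 + beta4 * AGE."""
    return b_income + b_age_income * age


for age in (25, 55):
    print(age, round(income_effect(age), 4))
```

With a negative β4, the estimated effect of income on pizza expenditure shrinks as age increases.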

5.5.2 Log-Linear Models

Open the Excel file cps4_small. Excel opens the data set in Sheet 1 of a new Excel file. Since we
would like to save all our work from Chapter 5 in one file, create a new worksheet in your POE
Chapter 5 Excel file, rename it cps4_small data, and in it, copy the data set you just opened.


Consider the following wage equation:

$$\ln(WAGE) = \beta_1 + \beta_2 EDUC + \beta_3 EXPER + \gamma (EDUC \times EXPER) + e \qquad (5.7)$$

Go back to your cps4_small data worksheet. In cells M1:P2, enter the following column labels
and formulas.

    M          N      O       P
1   ln(wage)   educ   exper   educ x exper
2   =ln(A2)    =B2    =C2     =N2*O2

Copy the content of cells M2:P2 to cells M3:P1001. Here is how your table should look (only the
first five values are shown below):

    M          N      O       P
1   ln(wage)   educ   exper   educ x exper
2   2.928524   16     39      624
3   2.442347   12     16      192
4   2.710713   16     13      208
5   3.25…      14     11      154
6   3.179303   12     51      612
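The interaction column must multiply educ by exper (columns N and O). A minimal Python sketch of the four regression columns for one observation; the function name is hypothetical, the wage value in the example call is an arbitrary illustrative number, and the (educ, exper) pairs are the five rows shown above.

```python
import math


def loglinear_columns(wage, educ, exper):
    """Columns M:P for one observation: ln(wage), educ, exper and educ x exper.

    The interaction is educ * exper (columns N and O), not ln(wage) * educ.
    """
    return (math.log(wage), educ, exper, educ * exper)


# (educ, exper) pairs read from the first five rows of the table above
pairs = [(16, 39), (12, 16), (16, 13), (14, 11), (12, 51)]
interactions = [educ * exper for educ, exper in pairs]
print(interactions)

# Hypothetical wage value, for illustration only
print(loglinear_columns(wage=20.0, educ=16, exper=39))
```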

In the Regression dialog box, the Input Y Range should be M1:M1001, and the Input X Range
should be N1:P1001. Check the box next to Labels. Select New Worksheet Ply and name it
Log-Linear Model w Interaction. Finally, select OK.


The result is (see also p. 197 of Principles of Econometrics, 4e):

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.44115987
R Square             0.194622031
Adjusted R Square    0.192196
Standard Error       0.521847758
Observations         1000

ANOVA
              df     SS            MS            F             Significance F
Regression      3    65.54495019   21.84831673   80.22880786   1.7205E-46
Residual      996    271.2357823   0.272325083
Total         999    336.7807325

               Coefficients   Standard Error   t Stat         P-value       Lower 95%      Upper 95%
Intercept      1.392317989    0.206645         6.737717608    2.7172E-11    0.986808989    1.797825988
educ           0.094938495    0.0146246        6.491712999    1.33643E-10   0.066239995    0.123636994
exper          0.006329514    0.00669851       0.944913664    0.344932      -0.006815298   0.019474326
educ x exper   -3.64453E-05   0.00048377       -0.0753363     0.93996       -0.000985766   0.000912875

5.6 MEASURING GOODNESS-OF-FIT

The coefficient of determination R² is reported in the Excel regression summary output. For the
Big Andy's Burger Barn multiple linear regression model of Section 5.1, it is the R Square entry
below:

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.66952055
R Square             0.4482578
Adjusted R Square    0.432931593
Standard Error       4.886124039
Observations         75
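Both goodness-of-fit measures can be recovered from other entries of the output, which makes a useful consistency check. The sketch assumes SST = 3115.481978, the total sum of squares of SALES reported in the Chapter 5 outputs (it does not depend on which regressors are used), and uses SSE = (N - K) × (Standard Error)². It then applies R² = 1 - SSE/SST and adjusted R² = 1 - [SSE/(N - K)]/[SST/(N - 1)].

```python
# Section 5.1 model (SALES on PRICE and ADVERT): N = 75 observations, K = 3 parameters
n, k = 75, 3
sst = 3115.481978            # total sum of squares of SALES (ANOVA "Total" SS)
se_regression = 4.886124039  # "Standard Error" in the summary output

sse = (n - k) * se_regression ** 2   # since Standard Error^2 = SSE / (N - K)
r2 = 1 - sse / sst
adj_r2 = 1 - (sse / (n - k)) / (sst / (n - 1))
print(round(r2, 6), round(adj_r2, 6))
```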
CHAPTER 6

Further Inference in the Multiple Regression Model

CHAPTER OUTLINE
6.1 Testing the Effect of Advertising: the F-test
    6.1.1 The Logic of the Test
    6.1.2 The Unrestricted and Restricted Models
    6.1.3 Test Template
6.2 Testing the Significance of the Model
    6.2.1 Null and Alternative Hypotheses
    6.2.2 Test Template
    6.2.3 Excel Regression Output
6.3 The Relationship between t- and F-Tests
6.4 Testing Some Economic Hypotheses
    6.4.1 The Optimal Level of Advertising
    6.4.2 The Optimal Level of Advertising and Price
6.5 The Use of Nonsample Information
6.6 Model Specification
    6.6.1 Omitted Variables
    6.6.2 Irrelevant Variables
    6.6.3 The RESET Test
6.7 Poor Data, Collinearity and Insignificance
    6.7.1 Correlation Matrix
    6.7.2 The Car Mileage Model Example

In this chapter we continue to work with the multiple linear regression model of Big Andy's
Burger Barn hamburger chain to illustrate the F-test procedure. We also work with additional
examples to address nonsample information, model specification and collinearity issues.

6.1 TESTING THE EFFECT OF ADVERTISING: THE F-TEST

6.1.1 The Logic of the Test

In Chapters 3 and 5 we worked with t-tests for null hypotheses consisting of a single restriction
on one parameter βk. An F-test is used when a null hypothesis consists of one or more
restrictions, which may involve two or more parameters.

Further Inference in the Multiple Regression Model 155

An F-test is based on a comparison of the sum of squared errors from the original, unrestricted
model with the sum of squared errors from the model in which the null hypothesis is assumed to
be true and the restriction(s) it implies have been imposed; this latter model is referred to as the
restricted model.

If the null hypothesis is true, then the following F-statistic follows an F-distribution with m1 = J
numerator degrees of freedom and m2 = N - K denominator degrees of freedom:

$$F = \frac{(SSE_R - SSE_U)/J}{SSE_U/(N-K)} \sim F_{(m_1 = J,\; m_2 = N-K)} \qquad (6.1)$$

where SSE_R is the sum of squared errors from the restricted model,

SSE_U is the sum of squared errors from the unrestricted model,

J is the number of restrictions in the null hypothesis,

N is the sample size,

and K is the number of parameters in the unrestricted model.

If the null hypothesis is not true, then the value of the computed F-statistic will tend to be
unusually large. We reject the null hypothesis if F ≥ Fc, where Fc is the critical value that leaves
probability α in the right tail of the F(J, N - K) distribution; how to obtain it in Excel is shown
below.
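Equation (6.1) is plain arithmetic once the two sums of squared errors are known. The sketch below uses the SSE values of the unrestricted and restricted regressions estimated in Section 6.1.2. For the special case J = 2 it also uses a closed form of the F(2, m2) upper tail, (1 + 2F/m2)^(-m2/2), which follows from the incomplete beta representation of the F distribution; other values of J need a general routine. Function names are hypothetical.

```python
def f_statistic(sse_r, sse_u, j, n, k):
    """Equation (6.1): F = [(SSE_R - SSE_U) / J] / [SSE_U / (N - K)]."""
    return ((sse_r - sse_u) / j) / (sse_u / (n - k))


def p_value_two_restrictions(f, m2):
    """Upper-tail probability of F(2, m2), which has the closed form (1 + 2F/m2)**(-m2/2)."""
    return (1.0 + 2.0 * f / m2) ** (-m2 / 2.0)


# Advertising test of Section 6.1: J = 2 restrictions, N = 75, K = 4
f = f_statistic(sse_r=1896.390837, sse_u=1532.084459, j=2, n=75, k=4)
p = p_value_two_restrictions(f, m2=75 - 4)
print(round(f, 5), round(p, 6))
```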

6.1.2 The Unrestricted and Restricted Models

We will use the Big Andy's Burger Barn model to illustrate the F-test procedure. We start by
specifying and estimating the unrestricted and restricted models.

Recall from Chapter 5, the following multiple linear regression model for Big Andy's Burger
Barn hamburger chain. This is the unrestricted model.

$$SALES = \beta_1 + \beta_2 PRICE + \beta_3 ADVERT + \beta_4 ADVERT^2 + e \qquad (6.2)$$

where SALES represents monthly sales revenue in a given city (in $1000), PRICE represents a
price index in that city (in $), and ADVERT is monthly advertising expenditure in that city (in
$1000).
156 Chapter 6

Suppose we wish to test the hypothesis that changes in advertising have no effect on sales revenue
against the alternative that changes in advertising do have an effect. The null and alternative
hypotheses are H0: β3 = 0, β4 = 0 and H1: β3 ≠ 0 or β4 ≠ 0 or both are nonzero. If we impose our
null hypothesis, or restriction, on equation (6.2), we obtain the following restricted model:

$$SALES = \beta_1 + \beta_2 PRICE + e \qquad (6.3)$$

We now estimate, in turn, the unrestricted model (6.2) and the restricted model (6.3). First, open
your Excel file andy. Save your file as POE Chapter 6. Rename Sheet 1 andy's hamburger chain
data.

In D1, enter the column label ADVERT2. In cell D2, enter the formula =C2^2; copy it to cells
D3:D76. Here is how your table should look (only the first five values are shown below):

    A       B       C        D
1   SALES   PRICE   ADVERT   ADVERT2
2   73.2    5.69    1.3      1.69
3   71.8    6.49    2.9      8.41
4   62.4    5.63    0.8      0.64
5   67.4    6.22    0.7      0.49
6   89.3    5.02    1.5      2.25

For the unrestricted model (6.2), the Input Y Range should be A1:A76, and the Input X Range
should be B1:D76. Check the box next to Labels. Select New Worksheet Ply and name it
Unrestricted Model. Finally, select OK.


The result is what you already obtained in Chapter 5:

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.7129061
R Square             0.50823515
Adjusted R Square    0.48745636
Standard Error       4.645283021
Observations         75

ANOVA
              df    SS            MS            F             Significance F
Regression     3    1583.397408   527.799136    24.45931648   5.6E-11
Residual      71    1532.084459   21.57865435
Total         74    3115.481867

              Coefficients   Standard Error   t Stat         P-value       Lower 95%      Upper 95%
Intercept     109.719036     6.799045455      16.13741763    1.87037E-25   96.16212457    123.2759474
price         -7.640000035   1.045938884      -7.304442117   3.23648E-10   -9.725542907   -5.554457162
advert        12.15123567    3.55616          3.41695036     0.001051598   5.060446253    19.24202509
advert2       -2.767963089   0.940624011      -2.942688043   0.004392655   -4.643514112   -0.892412065

Go back to your andy's hamburger chain data worksheet. For the restricted model (6.3), the
Input Y Range should be A1:A76, and the Input X Range should be only the PRICE data,
B1:B76. Check the box next to Labels. Select New Worksheet Ply and name it Restricted
Model. Finally, select OK.

The result is:
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.62554053
R Square             0.3913009
Adjusted R Square    0.3829626
Standard Error       5.096857529
Observations         75

ANOVA
              df    SS            MS            F             Significance F
Regression     1    1219.09103    1219.09103    46.92790295   1.97078E-09
Residual      73    1896.390837   25.97795667
Total         74    3115.481867

              Coefficients   Standard Error   t Stat         P-value       Lower 95%      Upper 95%
Intercept     121.9001736    6.526290698      18.67832421    1.5876E-29    108.8932951    134.907052
price         -7.829073515   1.142864644      -6.850394365   1.97078E-09   -10.10679943   -5.551347597

6.1.3 Test Template

Insert a new worksheet by selecting the Insert Worksheet tab at the lower left corner of your
screen. Name it F-test.


Create the F-test template as shown in the table below.

F-critical values are obtained in Excel by using the FINV function. The syntax of the FINV
function is as follows:

=FINV(α, m1, m2)

where α is the level of significance of the test, m1 is the numerator degrees of freedom and m2 is
the denominator degrees of freedom of the F-distribution.

p-values for F-statistics are obtained in Excel by using the FDIST function. For hypothesis testing
purposes, the syntax of the FDIST function is as follows:

=FDIST(F-statistic, m1, m2)

    A                  B                C
1   Data Input         J =
2                      N =              ='Unrestricted Model'!B8
3                      K =              ='Unrestricted Model'!B12+1
4                      SSEU =           ='Unrestricted Model'!C13
5                      SSER =           ='Restricted Model'!C13
6                      α =
7
8   Computed Values    m1 =             =C1
9                      m2 =             =C2-C3
10                     Fc =             =FINV(C6,C8,C9)
11
12  F-test             F-statistic =    =((C5-C4)/C8)/(C4/C9)
13                     Conclusion =     =IF(C12>=C10,"Reject Ho","Do Not Reject Ho")
14                     p-value =        =FDIST(C12,C8,C9)
15                     Conclusion =     =IF(C14<=C6,"Reject Ho","Do Not Reject Ho")

Note that the number of parameters K is equal to the Excel regression degrees of freedom plus
one (see cell C3 above).

With 2 restrictions in the null hypothesis H0: β3 = 0, β4 = 0, and α = 0.05, the results of the
F-test are (see also p. 225 of Principles of Econometrics, 4e):

    A                  B                C
1   Data Input         J =              2
2                      N =              75
3                      K =              4
4                      SSEU =           1532.084
5                      SSER =           1896.391
6                      α =              0.05
7
8   Computed Values    m1 =             2
9                      m2 =             71
10                     Fc =             3.125764
11
12  F-test             F-statistic =    8.44136
13                     Conclusion =     Reject Ho
14                     p-value =        0.000514
15                     Conclusion =     Reject Ho

6.2 TESTING THE SIGNIFICANCE OF THE MODEL

6.2.1 Null and Alternative Hypotheses

For a general unrestricted multiple regression model with K - 1 explanatory variables and K
unknown coefficients,

$$y_i = \beta_1 + \beta_2 x_{i2} + \beta_3 x_{i3} + \cdots + \beta_K x_{iK} + e_i$$

the null and alternative hypotheses of a test of significance of the model are:

H0: β2 = 0, β3 = 0, ..., βK = 0

H1: At least one of the βk is nonzero for k = 2, 3, ..., K

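Note that in this one case, in which we test the null hypothesis that all the model parameters
except the intercept are zero, the sum of squared errors from the restricted model is equal to the
total sum of squares from the unrestricted model: SSE_R = SST_U. The restricted model contains
only the intercept, whose least squares estimate is the sample mean of y, so its sum of squared
errors is the sum of squared deviations of y from its mean.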
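Because SSE_R = SST_U for this particular test, the F-statistic needs only the ANOVA table of the unrestricted model, with J = K - 1. A minimal sketch using the values for model (6.2) from the Unrestricted Model output above; the function name is hypothetical.

```python
def overall_significance_f(sst, sse, n, k):
    """F for H0: beta2 = ... = betaK = 0, using SSE_R = SST and J = K - 1 restrictions."""
    j = k - 1
    return ((sst - sse) / j) / (sse / (n - k))


# Model (6.2): Total SS and Residual SS from the ANOVA table, N = 75, K = 4
f_overall = overall_significance_f(sst=3115.481867, sse=1532.084459, n=75, k=4)
print(round(f_overall, 5))
```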

6.2.2 Test Template

Insert a new worksheet by selecting the Insert Worksheet tab at the bottom of your screen.
Name it Test of Significance of Model.

Copy the template from your F-test worksheet into your new worksheet. You only need to modify
the reference in cell C5, as shown below, to obtain a template for a test of the overall
significance of the regression model.

    A                  B                C
1   Data Input         J =
2                      N =              ='Unrestricted Model'!B8
3                      K =              ='Unrestricted Model'!B12+1
4                      SSEU =           ='Unrestricted Model'!C13
5                      SSER =           ='Unrestricted Model'!C14
6                      α =
7
8   Computed Values    m1 =             =C1
9                      m2 =             =C2-C3
10                     Fc =             =FINV(C6,C8,C9)
11
12  F-test             F-statistic =    =((C5-C4)/C8)/(C4/C9)
13                     Conclusion =     =IF(C12>=C10,"Reject Ho","Do Not Reject Ho")
14                     p-value =        =FDIST(C12,C8,C9)
15                     Conclusion =     =IF(C14<=C6,"Reject Ho","Do Not Reject Ho")

For the unrestricted model (6.2),

$$SALES_i = \beta_1 + \beta_2 PRICE_i + \beta_3 ADVERT_i + \beta_4 ADVERT_i^2 + e_i$$

the null and alternative hypotheses of a test of significance of the model are:

H0: β2 = 0, β3 = 0, β4 = 0

H1: At least one of β2, β3 or β4 is nonzero

The null hypothesis above contains three restrictions. With J = 3 and α = 0.05, the results of the
test of significance of model (6.2) are (see also pp. 226-227 of Principles of Econometrics,
4e):
    A                  B                C
1   Data Input         J =              3
2                      N =              75
3                      K =              4
4                      SSEU =           1532.084
5                      SSER =           3115.482
6                      α =              0.05
7
8   Computed Values    m1 =             3
9                      m2 =             71
10                     Fc =             2.733047
11
12  F-test             F-statistic =    24.45932
13                     Conclusion =     Reject Ho
14                     p-value =        5.6E-11
15                     Conclusion =     Reject Ho

6.2.3 Excel Regression Output

For the test of significance of a model, since SSE_R = SST_U, there is no need to estimate a
restricted model: all the information needed to compute the F-statistic is available from the
regression analysis of the unrestricted model. This is why the F-statistic of the test of significance
of a model and its p-value are reported in the ANOVA part of the Excel summary output (see
your Unrestricted Model worksheet):

     A            B     C             D            E             F
11                df    SS            MS           F             Significance F
12   Regression    3    1583.397408   527.799136   24.45931648   5.6E-11
13   Residual     71    1532.084459   21.57865435
14   Total        74    3115.481867

6.3 THE RELATIONSHIP BETWEEN t- AND F-TESTS

Reconsider the following multiple linear regression model for Big Andy's Burger Barn
hamburger chain. This is the unrestricted model.

$$SALES = \beta_1 + \beta_2 PRICE + \beta_3 ADVERT + \beta_4 ADVERT^2 + e \qquad (6.2)$$

Suppose we wish to test the hypothesis that changes in price have no effect on sales revenue
against the alternative that changes in price do have an effect. The null and alternative hypotheses
are H0: β2 = 0 and H1: β2 ≠ 0. If we impose our null hypothesis, or restriction, on equation (6.2),
we obtain the following restricted model:

$$SALES = \beta_1 + \beta_3 ADVERT + \beta_4 ADVERT^2 + e \qquad (6.4)$$

Go back to your andy's hamburger chain data worksheet. In the Regression dialog box, the
Input Y Range should be A1:A76, and the Input X Range should be C1:D76. Check the box
next to Labels. Select Output Range and specify it to be cell A1 in your Restricted Model
worksheet: you can place your cursor in the Output Range window and move it to that cell, or
type 'Restricted Model'!A1 in the Output Range window. Finally, select OK.


Excel informs you that the output range will overwrite existing data. Select OK to overwrite the
data in the specified range.


The result is:

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.372404526
R Square             0.138685131
Adjusted R Square    0.114759718
Standard Error       6.1048829
Observations         75

ANOVA
              df    SS            MS            F             Significance F
Regression     2    432.0710103   216.0355051   5.796561616   0.004632556
Residual      72    2683.410857   37.26959521
Total         74    3115.481867

              Coefficients   Standard Error   t Stat         P-value       Lower 95%     Upper 95%
Intercept     64.84148981    3.827012492      16.94311       7.87896E-27   57.21247994   72.47049968
advert        14.24915942    4.6582829        3.058886559    0.003118901   4.963042303   23.53527653
advert2       -3.365894266   1.231488631      -2.733191507   0.007887268   -5.82082195   -0.910966582

Go back to your F-test worksheet. With 1 restriction and α = 0.05, the result is (see also p. 227
in Principles of Econometrics, 4e):

    A                  B                C
1   Data Input         J =              1
2                      N =              75
3                      K =              4
4                      SSEU =           1532.084
5                      SSER =           2683.411
6                      α =              0.05
7
8   Computed Values    m1 =             1
9                      m2 =             71
10                     Fc =             3.97581
11
12  F-test             F-statistic =    53.35487
13                     Conclusion =     Reject Ho
14                     p-value =        3.24E-10
15                     Conclusion =     Reject Ho

Note that we used a t-test in Chapter 5 (Section 5.3.1) for this same test of significance of β2.
When testing a single "equality" null hypothesis (a single restriction) against a "not equal to"
alternative hypothesis, either a t-test or an F-test can be used and the test outcomes will be
identical.

If you go back to your Unrestricted Model worksheet and look at the p-value for b2, you should
find that it is exactly the same as the one computed in your F-test template. Both results are
shown below:

F-test worksheet:

    A          B                C
12  F-test     F-statistic =    53.35487
13             Conclusion =     Reject Ho
14             p-value =        3.24E-10
15             Conclusion =     Reject Ho

Unrestricted Model worksheet:

    A           B               C                D              E
16              Coefficients    Standard Error   t Stat         P-value
17  Intercept   109.719036      6.799045455      16.13741763    1.87037E-25
18  price       -7.640000035    1.045938884      -7.304442117   3.23648E-10
19  advert      12.15123567     3.55616          3.41695036     0.001051598
20  advert2     -2.767963089    0.940624011      -2.942688043   0.004392655


_

As explained on pp. 227-228 of Principles of Econometrics, 4e, note that F = 53.355 = t² =
(-7.3044)².
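A quick numerical check of F = t², using the t Stat for PRICE and the sums of squared errors of models (6.2) and (6.4) reported above (variable names are illustrative):

```python
t_price = -7.304442117                    # t Stat for PRICE in the unrestricted model (6.2)
sse_u, sse_r = 1532.084459, 2683.410857   # SSE of models (6.2) and (6.4)
n, k, j = 75, 4, 1                        # one restriction: beta2 = 0

f_stat = ((sse_r - sse_u) / j) / (sse_u / (n - k))
print(round(t_price ** 2, 5), round(f_stat, 5))
```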

6.4 TESTING SOME ECONOMIC HYPOTHESES

6.4.1 The Optimal Level of Advertising


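For this test, as explained on p. 229 of Principles of Econometrics, 4e, the null hypothesis is
H0: β3 + 3.8β4 = 1, and the restricted model is:

$$(SALES - ADVERT) = \beta_1 + \beta_2 PRICE + \beta_4 (ADVERT^2 - 3.8\,ADVERT) + e \qquad (6.5)$$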

Go back to your andy's hamburger chain data worksheet. Because the explanatory variables must
be in adjacent columns, insert a new column to the right of the PRICE data column. In C1, enter
the column label x*. In C2, enter the formula =E2-3.8*D2; copy it to cells C3:C76. In F1, enter
the column label y*. In F2, enter the formula =A2-D2; copy it to cells F3:F76.

Here is how your table should look (only the first five values are shown below):

    A       B       C       D        E         F
1   SALES   PRICE   x*      ADVERT   ADVERT2   y*
2   73.2    5.69    -3.25   1.3      1.69      71.9
3   71.8    6.49    -2.61   2.9      8.41      68.9
4   62.4    5.63    -2.4    0.8      0.64      61.6
5   67.4    6.22    -2.17   0.7      0.49      66.7
6   89.3    5.02    -3.45   1.5      2.25      87.8
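The transformed variables follow from substituting the restriction β3 = 1 - 3.8β4 into (6.2): the ADVERT term moves to the left-hand side (y* = SALES - ADVERT) and β4 multiplies x* = ADVERT² - 3.8 ADVERT. A minimal sketch reproducing the first five values of the new columns (the variable names are illustrative):

```python
# (SALES, PRICE, ADVERT) for the first five observations
rows = [
    (73.2, 5.69, 1.3),
    (71.8, 6.49, 2.9),
    (62.4, 5.63, 0.8),
    (67.4, 6.22, 0.7),
    (89.3, 5.02, 1.5),
]

y_star = [round(sales - advert, 6) for sales, _, advert in rows]          # =A2-D2
x_star = [round(advert ** 2 - 3.8 * advert, 6) for _, _, advert in rows]  # =E2-3.8*D2
print(y_star)
print(x_star)
```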

For the restricted model (6.5), the Input Y Range should be F1:F76, and the Input X Range
should be B1:C76. Check the box next to Labels. Select Output Range and specify it to be cell
A1 in your Restricted Model worksheet: you can place your cursor in the Output Range
window and move it to that cell, or type 'Restricted Model'!A1 in the Output Range
window. Finally, select OK.

Excel informs you that the output range will overwrite existing data. Press OK to overwrite
the data in the specified range. The result is:

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.693339057
R Square             0.480719048
Adjusted R Square    0.46629457
Standard Error       4.64322439
Observations         75

ANOVA
              df    SS            MS            F          Significance F
Regression     2    1437.013271   718.5066355   33.32665   5.682E-11
Residual      72    1552.286357   21.55953273
Total         74    2989.299628

              Coefficients   Standard Error   t Stat        P-value       Lower 95%      Upper 95%
Intercept     110.3589599    6.7638           16.31610996   6.84193E-26   96.87556446    123.8423554
PRICE         -7.60310422    1.04478          -7.27722771   3.396E-10     -9.685835675   -5.520373
x*            -2.8765149     0.933495         -3.08144457   0.002917717   -4.737404337   -1.015625

Go back to your F-test worksheet. With 1 restriction and α = 0.05, the result is (see also p. 229 in
Principles of Econometrics, 4e):

    A                  B                C
1   Data Input         J =              1
2                      N =              75
3                      K =              4
4                      SSEU =           1532.085
5                      SSER =           1552.286
6                      α =              0.05
7
8   Computed Values    m1 =             1
9                      m2 =             71
10                     Fc =             3.97581
11
12  F-test             F-statistic =    0.936194
13                     Conclusion =     Do Not Reject Ho
14                     p-value =        0.336543
15                     Conclusion =     Do Not Reject Ho

6.4.2 The Optimal Level of Advertising and Price

For this test, the restricted model is:

$$(SALES - ADVERT - 78.1)_i = \beta_2 (PRICE - 6)_i + \beta_4 (ADVERT^2 - 3.8\,ADVERT + 3.61)_i + e_i \qquad (6.6)$$

Go back to your andy's hamburger chain data worksheet. In cells G1:I2, enter the following
column labels and formulas.

    G               H        I
1   y**             x1**     x2**
2   =A2-D2-78.1     =B2-6    =E2-3.8*D2+3.61

Copy the content of cells G2:I2 to cells G3:I76. Here is how your table should look (only the
first five values are shown below):

    G       H       I
1   y**     x1**    x2**
2   -6.2    -0.31   0.36
3   -9.2    0.49    1
4   -16.5   -0.37   1.21
5   -11.4   0.22    1.44
6   9.7     -0.98   0.16
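The two restrictions behind (6.6) are β3 + 3.8β4 = 1 and β1 + 6β2 + 1.9β3 + 3.61β4 = 80; solving them for β3 and β1 and substituting into (6.2) moves the known constants to the left-hand side and leaves deviation-form regressors with no intercept. The sketch below recomputes y**, x1** and x2** for the five rows shown (the names are illustrative):

```python
# (SALES, PRICE, ADVERT) for the first five observations
rows = [
    (73.2, 5.69, 1.3),
    (71.8, 6.49, 2.9),
    (62.4, 5.63, 0.8),
    (67.4, 6.22, 0.7),
    (89.3, 5.02, 1.5),
]

y2 = [sales - advert - 78.1 for sales, _, advert in rows]        # =A2-D2-78.1
x1 = [price - 6 for _, price, _ in rows]                          # =B2-6
x2 = [advert ** 2 - 3.8 * advert + 3.61 for _, _, advert in rows]  # =E2-3.8*D2+3.61

for values in zip(y2, x1, x2):
    print([round(v, 6) for v in values])
```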

For the restricted model (6.6), notice that there is no intercept, so you will need to select the
Constant is Zero option in the Regression dialog box. The Input Y Range should be G1:G76,
and the Input X Range should be H1:I76. Check the boxes next to Labels and Constant is Zero.
Select Output Range and specify it to be cell A1 in your Restricted Model worksheet: you can
place your cursor in the Output Range window and move it to that cell, or type
'Restricted Model'!A1 in the Output Range window. Finally, select OK.
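Model (6.6) has no intercept, so it is fitted through the origin. As an illustration of what the Constant is Zero option estimates, here is a minimal least-squares sketch for two regressors without a constant that solves the 2 × 2 normal equations directly. The data are hypothetical, built so the exact answer is known, and the function name is illustrative. With no constant, the ANOVA Total row reports the uncentered sum of squares with N rather than N - 1 degrees of freedom, which is why Total df is 75 in the output below.

```python
def ols_through_origin(y, x1, x2):
    """Least squares for y = b1*x1 + b2*x2 + e with no constant term.

    Solves [s11 s12; s12 s22] [b1; b2] = [s1y; s2y] by Cramer's rule.
    """
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * v for a, v in zip(x1, y))
    s2y = sum(b * v for b, v in zip(x2, y))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("x1 and x2 are perfectly collinear")
    b1 = (s1y * s22 - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


# Hypothetical data generated exactly as y = 2*x1 - 3*x2, so the fit should recover (2, -3)
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [1.0, 0.0, 2.0, 1.0]
y = [2 * a - 3 * b for a, b in zip(x1, x2)]
print(ols_through_origin(y, x1, x2))
```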


Excel informs you that the output range will overwrite existing data. Press OK to overwrite
the data in the specified range. The result is:

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.699423441
R Square             0.489193159
Adjusted R Square    0.466497175
Standard Error       4.937778213
Observations         75

ANOVA
              df    SS            MS            F             Significance F
Regression     2    1704.549776   852.274888    34.95558173   2.46249E-11
Residual      73    1779.860719   24.38165368
Total         75    3484.410495

              Coefficients   Standard Error   t Stat         P-value     Lower 95%      Upper 95%
Intercept     0              #N/A             #N/A           #N/A        #N/A           #N/A
x1**          …              …                -6.12761579    …           …              …
x2**          -5.086167104   0.679983611      -7.47983769    1.30E-10    -6.441372408   -3.730962

Go back to your F-test worksheet. With 2 restrictions, at α = 0.05, the result is (see also p. 231
in Principles of Econometrics, 4e):
     A                  B               C
1    Data Input         J =             2
2                       N =             75
3                       K =             4
4                       SSEu =          1532.085
5                       SSER =          1779.861
6                       α =             0.05
7
8    Computed Values    m1 =            2
9                       m2 =            71
10                      Fc =            3.125764
11
12   F-test             F-statistic =   5.741233
13                      Conclusion =    Reject Ho
14                      p-value =       0.004885
15                      Conclusion =    Reject Ho
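The worksheet arithmetic can be reproduced outside Excel. The Python sketch below computes F = [(SSE_R - SSE_U)/J] / [SSE_U/(N - K)] from the Data Input values in the F-test worksheet. It rests on one assumption worth stating loudly: when J = 2, the F(2, m2) distribution has the closed-form right tail (1 + 2f/m2)^(-m2/2), so the p-value and the critical value need no statistics library. For any other J you would need a general routine, for example Excel's FDIST and FINV worksheet functions. The function name is hypothetical.

```python
def f_test_two_restrictions(sse_r, sse_u, n, k, alpha=0.05):
    """F-test of J = 2 linear restrictions from restricted and unrestricted SSEs."""
    j = 2                # the closed forms below are valid only for J = 2
    df2 = n - k          # m2 in the worksheet
    f_stat = ((sse_r - sse_u) / j) / (sse_u / df2)
    # Right tail of F(2, df2): P(F > f) = (1 + 2 f / df2) ** (-df2 / 2)
    p_value = (1 + 2 * f_stat / df2) ** (-df2 / 2)
    # Solving P(F > Fc) = alpha for Fc gives the critical value
    f_crit = (df2 / 2) * (alpha ** (-2 / df2) - 1)
    return f_stat, f_crit, p_value


f_stat, f_crit, p_value = f_test_two_restrictions(1779.861, 1532.085, 75, 4)
decision = "Reject Ho" if f_stat > f_crit else "Do not reject Ho"
print(f"F = {f_stat:.4f}, Fc = {f_crit:.4f}, p = {p_value:.6f}: {decision}")
```

With these inputs the sketch should agree with the worksheet up to rounding of the SSE values (F ≈ 5.741, Fc ≈ 3.126, p ≈ 0.0049).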
166 Chapter 6

6.5 THE USE OF NONSAMPLE INFORMATION

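Consider the following unrestricted demand model for beer:

$$\ln(Q) = \beta_1 + \beta_2 \ln(PB) + \beta_3 \ln(PL) + \beta_4 \ln(PR) + \beta_5 \ln(I) + e \qquad (6.7)$$
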
where Q is the quantity demanded, PB is the price of beer, PL is the price of liquor, PR is the
price of all other remaining goods and services, and I is income. All information for this model
has been collected over a period of 30 years from a randomly selected household.

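The assumption that economic agents do not suffer from "money illusion" can be imposed on the
demand model: if all prices and income change by the same proportion, the quantity demanded
should not change, which requires $\beta_2 + \beta_3 + \beta_4 + \beta_5 = 0$. Solving this restriction for
$\beta_4$ and substituting it into the unrestricted model leads to the following restricted demand
model for beer (see pp. 231-232 in Principles of Econometrics, 4e for more details):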

$$\ln(Q) = \beta_1 + \beta_2 \ln(PB/PR) + \beta_3 \ln(PL/PR) + \beta_5 \ln(I/PR) + e \qquad (6.8)$$

Below we estimate the restricted model (6.8).

Open the Excel file beer. Excel opens the data set in Sheet 1 of a new Excel file. Since we would
like to save all our work from Chapter 6 in one file, create a new worksheet in your POE
Chapter 6 Excel file, rename it beer data, and in it, copy the data set you just opened.


In cells F1:I2 of your beer data worksheet, enter the following column labels and formulas.

     F          G             H             I
1    y*         X1*           X2*           X3*
2    =ln(A2)    =ln(B2/D2)    =ln(C2/D2)    =ln(E2/D2)
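The worksheet formulas divide columns B, C and E by column D before taking logs. A small Python sketch (hypothetical function name and a made-up observation, not the beer data) mirrors the four formulas and checks the property that motivates them: multiplying all prices and income by the same factor leaves the transformed regressors unchanged. The assumption that column D holds PR follows the restricted model, not anything Excel enforces.

```python
import math


def no_money_illusion_vars(q, pb, pl, pr, income):
    """Mirror =ln(A2), =ln(B2/D2), =ln(C2/D2), =ln(E2/D2), assuming column D is PR."""
    y_star = math.log(q)
    x1_star = math.log(pb / pr)
    x2_star = math.log(pl / pr)
    x3_star = math.log(income / pr)
    return y_star, x1_star, x2_star, x3_star


# Hypothetical observation, chosen only for illustration
obs = dict(q=80.0, pb=2.0, pl=7.0, pr=1.0, income=25000.0)
base = no_money_illusion_vars(**obs)

# Scale every price and income by the same factor; quantity is left alone
lam = 3.0
scaled = no_money_illusion_vars(
    obs["q"], lam * obs["pb"], lam * obs["pl"], lam * obs["pr"], lam * obs["income"]
)
print(base)
print(scaled)
```

The two printed tuples should be identical, which is the sense in which the restricted model rules out money illusion.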

Copy the content of cells F2:I2 to cells F3:I31. Here is how your table should look (only the first
five values are shown):
     F           G           H           I
1    y*          X1*         X2*         X3*
2    4.403054    0.472253    1.834382    10.02578
3    4.041295    1.220257    2.391088    10.58768
4    4.160444    0.979322    2.126509    10.33316
5    4.180522    1.05315     2.258981    10.49711
6    4.160444    0.757095    1.951287    10.15131

In the Regression dialog box, the Input Y Range should be F1:F31, and the Input X Range
should be G1:I31. Check the box next to Labels. Select New Worksheet Ply and name it
Restricted Beer Demand Model. Finally, select OK.


The result is (see also p. 232 in Principles of Econometrics, 4e):

SUMMARY OUTPUT

Regression Statistics
Multiple R            0.898859761
R Square              0.80794887
Adjusted R Square     0.785789124
Standard Error        0.061675593
Observations          30

ANOVA
              df    SS             MS             F              Significance F
Regression    3     0.416070592    0.138690197    36.46020486    1.83399E-09
Residual      26    0.098900841    0.003803879
Total         29    0.514971439

              Coefficients    Standard Error   t Stat         P-value        Lower 95%       Upper 95%       Lower 95.0%     Upper 95.0%
Intercept     -4.797797376    3.71390504       -1.291847079   0.207775913    -12.43183844    2.836243691     -12.43183844    2.836243691
X1*           -1.299386484    0.165737623      -7.840021241   2.57799E-08    -1.640065044    -0.958707925    -1.640065044    -0.958707925
X2*           0.186815879     0.284383258      0.656915882    0.517008126    -0.397742275    0.771374032     -0.397742275    0.771374032
X3*           0.945828579     0.427046831      2.214812313    0.035742225    0.068021255     1.823635904     0.068021255     1.823635904

6.6 MODEL SPECIFICATION

6.6.1 Omitted Variables

Consider the following family income model:

$$FAMINC_i = \beta_1 + \beta_2 HEDU_i + \beta_3 WEDU_i + e_i \qquad (6.9)$$

where FAMINC is the annual family income of married couples in which both husband and wife
work, HEDU is the husband's years of education, and WEDU is the wife's years of
education.

If we incorrectly omit the relevant variable WEDU (wife's education) from the family income
model, it becomes:

$$FAMINC_i = \beta_1 + \beta_2 HEDU_i + e_i \qquad (6.10)$$

If we add the omitted relevant variable KL6 (number of children less than 6 years old) to the
family income model, it becomes:

$$FAMINC_i = \beta_1 + \beta_2 HEDU_i + \beta_3 WEDU_i + \beta_4 KL6_i + e_i \qquad (6.11)$$

You can estimate models (6.9)-(6.11) using the edu_inc data set. Below, we show you how
to obtain the correlation matrix shown in Table 6.1 of Principles of Econometrics, 4e (p. 235).

Open the Excel file edu_inc. Excel opens the data set in Sheet 1 of a new Excel file. Since we
would like to save all our work from Chapter 6 in one file, create a new worksheet in your POE
Chapter 6 Excel file, name it education and income data, and in it, copy the data set you just
opened.


Select the Data tab, in the middle of your tab list located on top of your screen. On the Analysis
group of commands, to the far right, select Data Analysis.

The Data Analysis dialog box pops up. In it, select Correlation (you might need to use the scroll
up and down bar to the right of the Analysis Tools window to find it), then select OK.


A Correlation dialog box pops up. Specify the Input Range to be A1:F429. Select Grouped By
Columns, as this is how the data on each variable are stored. Check the box next to Labels in
first row. Select New Worksheet Ply and name it Correlation Matrix. Finally, select OK.


The result is:


     A          B           C           D            E           F           G
1               FAMINC      HE          WE           KL6         XTRA_X5     XTRA_X6
2    FAMINC     1
3    HE         0.354684    1
4    WE         0.362328    0.594343    1
5    KL6        -0.07195    0.104877    [illegible]  1
6    XTRA_X5    0.289817    0.835468    0.517798     0.148742    1
7    XTRA_X6    0.351365    0.820563    [illegible]  0.159522    0.900206    1
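Each entry of the matrix is a Pearson correlation coefficient between two columns. As an illustrative sketch, the Python code below computes the same lower-triangular layout that the Correlation tool produces; the function name and the three toy columns are hypothetical, and this is not a description of how Excel computes the values internally.

```python
import math


def correlation_matrix(columns):
    """Lower-triangular Pearson correlations, keyed by (row name, column name)."""
    names = list(columns)
    n = len(columns[names[0]])

    def corr(x, y):
        mx, my = sum(x) / n, sum(y) / n
        sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sxx = sum((a - mx) ** 2 for a in x)
        syy = sum((b - my) ** 2 for b in y)
        return sxy / math.sqrt(sxx * syy)

    # Like the Excel output, fill only the diagonal and the cells below it
    return {
        (r, c): corr(columns[r], columns[c])
        for i, r in enumerate(names)
        for c in names[: i + 1]
    }


# Hypothetical toy columns (not the edu_inc data)
data = {"A": [1.0, 2.0, 3.0, 4.0], "B": [2.0, 4.0, 6.0, 8.0], "C": [4.0, 3.0, 2.0, 1.0]}
for (r, c), value in correlation_matrix(data).items():
    print(r, c, round(value, 6))
```

Column B is an exact positive multiple of A and column C falls as A rises, so their correlations with A should print as 1 and -1.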

6.6.2 Irrelevant Variables

To see the effect of irrelevant variables, we can add two artificially generated variables X5 and X6
to the family income model (6.11):

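$$FAMINC_i = \beta_1 + \beta_2 HEDU_i + \beta_3 WEDU_i + \beta_4 KL6_i + \beta_5 X_{5i} + \beta_6 X_{6i} + e_i \qquad (6.12)$$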

You can estimate model (6.12) using the edu_inc.xls data set. Below, we will show you how the
variables X5 and X6 were generated.

Variables X5 and X6 were constructed so that they are correlated with HEDU and WEDU, but
they are not expected to influence family income. Specifically, they were defined as follows:

$$X_{5i} = HEDU_i + 2N(0,1) \qquad (6.13)$$

$$X_{6i} = X_{5i} + WEDU_i + N(0,1) \qquad (6.14)$$

where N(0,1) are random numbers from a normal distribution with mean 0 and standard
deviation 1, generated the way we generated our random samples in Section 2.4.4 and Section
3.1.4.
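Equations (6.13) and (6.14) can also be sketched in Python, which may help clarify what the worksheet formulas do. This is an assumption-laden illustration: make_irrelevant_vars is a hypothetical name, the education values are only illustrative, and Python's random module will not reproduce Excel's Random Number Generation output, even with a fixed seed.

```python
import random


def make_irrelevant_vars(hedu, wedu, seed=None):
    """Sketch of (6.13)-(6.14): X5 = HEDU + 2*N(0,1), X6 = X5 + WEDU + N(0,1)."""
    rng = random.Random(seed)  # a seed makes a run reproducible within Python
    z5 = [rng.gauss(0.0, 1.0) for _ in hedu]  # plays the role of N(0,1) for X5
    z6 = [rng.gauss(0.0, 1.0) for _ in hedu]  # plays the role of N(0,1) for X6
    x5 = [h + 2.0 * e for h, e in zip(hedu, z5)]
    x6 = [a + w + e for a, w, e in zip(x5, wedu, z6)]
    return z5, z6, x5, x6


# Illustrative education values
hedu = [12, 9, 12, 10, 12]
wedu = [12, 12, 12, 12, 14]
z5, z6, x5, x6 = make_irrelevant_vars(hedu, wedu, seed=2011)
for row in zip(hedu, wedu, x5, x6):
    print(row)
```

Because X5 is built from HEDU and X6 from X5 and WEDU, both are correlated with education by construction, yet neither enters the true income equation.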

Go back to your education and income data worksheet. In cells H1:N2 enter the following
column labels and formulas. The last row of the table gives the numbers of the equations
used in the formulas.

     H                I                J       K       L      M            N
1    N(0,1) for X5    N(0,1) for X6    HEDU    WEDU    KL6    X5           X6
2                                      =B2     =C2     =D2    =J2+2*H2     =M2+K2+I2
                                                              (6.13)       (6.14)

Note that we copy the values of the HEDU, WEDU and KL6 variables into columns J-L, because
the Excel regression analysis tool requires the columns of explanatory variables to be next to
one another.

In columns H-I, we will generate samples of random numbers from a normal distribution with
mean 0 and standard deviation 1.

Select the Data tab, in the middle of your tab list located on top of your screen. On the Analysis
group of commands, to the far right, select Data Analysis.

The Data Analysis dialog box pops up. In it, select Random Number Generation (you might
need to use the scroll up and down bar to the right of the Analysis Tools window to find it), then
select OK.

A Random Number Generation dialog box pops up. We need to generate two sets of random
numbers, one for our X5 variable and one for our X6 variable, so we specify 2 in the Number of
Variables window. We would like to generate as many data points as we have in the data set we
are working with, so we specify 428 in the Number of Random Numbers window. We select
Normal in the Distribution window; the selected Parameters should be Mean equal to 0 and
Standard deviation equal to 1. Select Output Range and specify it to be H2:I429. Finally,
select OK.

After you copy the content of cells J2:N2 to cells J3:N429, your table should look like the one
below (only the first five values are shown):

     H               I               J     K     L     M           N
1    N(0,1) for X5   N(0,1) for X6   HE    WE    KL6   X5          X6
2    1.167550181     0.204715        12    12    1     14.3351     26.53982
3    0.241263933     0.08421011      9     12    0     9.482528    21.56674
4    -0.723794074    0.54994871      12    12    1     10.55241    23.10236
5    0.459443648     0.55153258      10    12    0     10.91889    23.47042
6    1.790540409     -0.5618233      12    14    1     15.58108    29.01926

In the Regression dialog box, the Input Y Range should be A1:A429, and the Input X Range
should be J1:N429. Check the box next to Labels. Select New Worksheet Ply and name it
Irrelevant Variable Model. Finally, select OK.


Note: our random samples differ from the ones recorded in the edu_inc data set, which is why
our estimated equation also differs from the one reported on p. 236 of Principles of
Econometrics, 4e. Your parameter estimates for equation (6.12) will also differ from ours,
because your random numbers will differ from those above.

Our regression analysis results are:

SUMMARY OUTPUT

Regression Statistics
Multiple R            0.421302759
R Square              0.177496015
Adjusted R Square     0.167750707
Standard Error        40247.24063
Observations          428

ANOVA
              df     SS            MS             F              Significance F
Regression    5      1.47515E+11   29502937711    18.21348455    [illegible]
Residual      422    6.83573E+11   1619840378
Total         427    8.31087E+11

              Coefficients    Standard Error   t Stat         P-value        Lower 95%       Upper 95%       Lower 95.0%     Upper 95.0%
Intercept     -7682.625152    11189.32823      -0.686602894   [illegible]    -29676.38312    14311.13281     -29676.38312    14311.13281
HE            3035.184489     1234.183829      2.45926451     0.014322508    609.2711683     5461.09781      609.2711683     5461.09781
WE            4097.602729     2248.859881      1.822080052    0.069150476    -322.7591451    8517.964604     -322.7591451    8517.964604
KL6           -14275.19941    5016.721707      -2.84552348    0.004649863    -24136.07405    -4414.324769    -24136.07405    -4414.324769
X5            -487.044034     2234.009772      -0.218013386   0.827524063    -4878.216514    3904.128447     -4878.216514    3904.128447
X6            665.267784      1945.482493      0.341955164    0.732554863    -3158.775105    4489.310673     -3158.775105    4489.310673

6.6.3 The RESET Test


Let (b1, b2, b3, b4) be the least squares estimates of the family income model (6.11); the
predicted values of family income are:

$$\widehat{FAMINC}_i = b_1 + b_2 HEDU_i + b_3 WEDU_i + b_4 KL6_i \qquad (6.15)$$

Consider further the following two artificial models and their associated tests for misspecification.
We will use an F-test for both, even though a t-test could be used for RESET test 1.

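$$FAMINC_i = \beta_1 + \beta_2 HEDU_i + \beta_3 WEDU_i + \beta_4 KL6_i + \gamma_1 \widehat{FAMINC}_i^2 + e_i \qquad (6.16)$$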

RESET test 1: $H_0: \gamma_1 = 0$, $H_1: \gamma_1 \neq 0$


Unrestricted model: equation (6.16)
Restricted model: equation (6.11)

$$FAMINC_i = \beta_1 + \beta_2 HEDU_i + \beta_3 WEDU_i + \beta_4 KL6_i + \gamma_1 \widehat{FAMINC}_i^2 + \gamma_2 \widehat{FAMINC}_i^3 + e_i \qquad (6.17)$$

RESET test 2: $H_0: \gamma_1 = \gamma_2 = 0$, $H_1: \gamma_1 \neq 0$ and/or $\gamma_2 \neq 0$


Unrestricted model: equation (6.17)
Restricted model: equation (6.11)
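Both RESET regressions simply add powers of the fitted values from (6.15) as extra regressors. The Python sketch below, with hypothetical helper names and made-up numbers, builds those augmented rows and computes the F-statistic from restricted and unrestricted SSEs. For RESET test 1 you would use J = 1 with K = 5 coefficients in (6.16); for RESET test 2, J = 2 with K = 6 in (6.17).

```python
def reset_rows(hedu, wedu, kl6, b):
    """Design rows for RESET model (6.17): 1, HEDU, WEDU, KL6, fitted^2, fitted^3.

    b = (b1, b2, b3, b4) are least squares estimates of model (6.11);
    the fitted value follows (6.15). Drop the last column for model (6.16).
    """
    b1, b2, b3, b4 = b
    rows = []
    for h, w, k in zip(hedu, wedu, kl6):
        fitted = b1 + b2 * h + b3 * w + b4 * k
        rows.append((1.0, h, w, k, fitted ** 2, fitted ** 3))
    return rows


def f_statistic(sse_r, sse_u, n, k_u, j):
    """F = [(SSE_R - SSE_U) / J] / [SSE_U / (N - K_U)]; K_U counts unrestricted coefficients."""
    return ((sse_r - sse_u) / j) / (sse_u / (n - k_u))


# Hypothetical numbers, only to show the mechanics
rows = reset_rows([1], [0], [0], (1.0, 2.0, 0.0, 0.0))
print(rows)  # fitted value is 3, so the last two entries are 9 and 27
print(f_statistic(sse_r=120.0, sse_u=100.0, n=50, k_u=6, j=2))
```

The design choice here is to keep the fitted-value powers as ordinary columns, so the unrestricted models are just regular least squares regressions with two or three extra regressors.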

Go back to your education and income data worksheet, from where we will first estimate the
restricted model (6.11). In the Regression dialog box, the Input Y Range should be A1:A429,
and the Input X Range should be B1:D429. Check the box next to Labels. Select Output Range
and specify it to be cell A1 in your Restricted Model worksheet: you can place your cursor in the
Output Range window and move it to that cell, or type 'Restricted Model'!A1 in the
Output Range window. Finally, select OK.


Excel informs you that the output range will overwrite existing data. Do press OK to overwrite
the data in the specified range. The result is:

SUMMARY OUTPUT

Regression Statistics
Multiple R            0.420919613
R Square              0.177173321
Adjusted R Square     0.171351434
Standard Error        40160.0814
Observations          428

ANOVA
              df     SS            MS             F              Significance F
Regression    3      1.47247E+11   49082167249    30.43228498    [illegible]
Residual      424    6.83841E+11   1612832138
Total