GENEVIEVE BRIAND
Washington State University
R. CARTER HILL
Louisiana State University
Copyright © 2010, 2011 John Wiley & Sons, Inc. All rights reserved. No part of this
publication may be reproduced, stored in a retrieval system or transmitted in any form or
by any means, electronic, mechanical, photocopying, recording, scanning or otherwise,
except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act,
without either the prior written permission of the Publisher, or authorization through
payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222
Rosewood Drive, Danvers, MA 01923, website www.copyright.com. Requests to the
Publisher for permission should be addressed to the Permissions Department, John Wiley
& Sons, Inc., 111 River Street, Hoboken, NJ 07030-5774, (201) 748-6011, fax (201) 748-6008,
website http://www.wiley.com/go/permissions.
ISBN-13: 9781118032107
10 9 8 7 6 5 4 3 2 1
Preface
This book is a supplement to Principles of Econometrics, 4th Edition by R. Carter Hill, William E.
Griffiths and Guay C. Lim (Wiley, 2011). This book is not a substitute for the textbook, nor is it a
stand-alone computer manual. It is a companion to the textbook, showing how to perform the
examples in the textbook using Excel 2007. This book will be useful to students taking
econometrics, as well as their instructors, and others who wish to use Excel for econometric
analysis.
In addition to this computer manual for Excel, there are similar manuals and support for the
software packages EViews, Gretl, Shazam, and Stata. All the data for Principles of
Econometrics, 4e, in various formats, including Excel, are available at
http://www.wiley.com/college/hill. Individual data files, as well as errata for this manual and the
textbook, can also be found at http://principlesofeconometrics.com.
The chapters in this book parallel the chapters in Principles of Econometrics, 4e. Thus, if you
seek help for the examples in Chapter 11 of the textbook, check Chapter 11 in this book.
However, within a chapter, the section numbers in Principles of Econometrics, 4e, do not
necessarily correspond to the section numbers in this Excel manual.
This work is a revision of Using Excel 2007 for Principles of Econometrics, 3rd Edition by
Genevieve Briand and R. Carter Hill (Wiley, 2010). Genevieve Briand is the corresponding
author.
Genevieve Briand
School of Economic Sciences
Washington State University
Pullman, WA 99164
gbriand@wsu.edu
R. Carter Hill
Economics Department
Louisiana State University
Baton Rouge, LA 70803
eohill@lsu.edu
Microsoft product screen shot(s) reprinted with permission from Microsoft Corporation. Our use does not directly or indirectly imply
Microsoft sponsorship, affiliation, or endorsement.
BRIEF CONTENTS
1. Introduction to Excel
8. Heteroskedasticity
Index
CONTENTS
CHAPTER 1 Introduction to Excel
1.1 Starting Excel
1.2 Entering Data
1.3 Using Excel for Calculations
1.3.1 Arithmetic Operations
1.3.2 Mathematical Functions
1.4 Editing your Data
1.5 Saving and Printing your Data
1.6 Importing Data into Excel
1.6.1 Resources for Economists on the Internet
1.6.2 Data Files for Principles of Econometrics
1.6.2a John Wiley & Sons Website
1.6.2b Principles of Econometrics Website
1.6.3 Importing ASCII Files
CHAPTER 2 The Simple Linear Regression Model
2.1 Plotting the Food Expenditure Data
2.1.1 Using Chart Tools
2.1.2 Editing the Graph
2.1.2a Editing the Vertical Axis
2.1.2b Axis Titles
2.1.2c Gridlines and Markers
2.1.2d Moving the Chart
2.2 Estimating a Simple Regression
2.2.1 Using Least Squares Estimators' Formulas
2.2.2 Using Excel Regression Analysis Routine
2.3 Plotting a Simple Regression
2.3.1 Using Two Points
2.3.2 Using Excel Built-in Feature
2.3.3 Using a Regression Option
2.3.4 Editing the Chart
2.4 Expected Values of b1 and b2
2.4.1 Model Assumptions
2.4.2 Random Number Generation
2.4.3 The LINEST Function
2.4.4 Repeated Sampling
2.5 Variance and Covariance of b1 and b2
2.6 Nonlinear Relationships
2.6.1 A Quadratic Model
2.6.1a Estimating the Model
2.6.1b Scatter Plot of Data with Fitted Quadratic Relationship
2.6.2 A Log-Linear Model
2.6.2a Histograms of PRICE and ln(PRICE)
2.6.2b Estimating the Model
2.6.2c Scatter Plot of Data with Fitted Log-Linear Relationship
2.7 Regression with Indicator Variables
2.7.1 Histograms of House Prices
2.7.2 Estimating the Model
CHAPTER 3 Interval Estimation and Hypothesis Testing
3.1 Interval Estimation
3.1.1 The t-Distribution
3.1.1a The t-Distribution versus Normal Distribution
3.1.1b t-Critical Values and Interval Estimates
3.1.1c Percentile Values
3.1.1d TINV Function
3.1.1e Appendix E: Table 2 in POE
3.1.2 Obtaining Interval Estimates
3.1.3 An Illustration
3.1.3a Using the Interval Estimator Formula
3.1.3b Excel Regression Default Output
3.1.3c Excel Regression Confidence Level Option
3.1.4 The Repeated Sampling Context (Advanced Material)
3.1.4a Model Assumptions
3.1.4b Repeated Random Sampling
3.1.4c The LINEST Function Revisited
3.1.4d The Simulation Template
3.1.4e The IF Function
3.1.4f The OR Function
3.1.4g The COUNTIF Function
3.2 Hypothesis Tests
3.2.1 One-Tail Tests with Alternative "Greater Than" (>)
3.2.2 One-Tail Tests with Alternative "Less Than" (<)
3.2.3 Two-Tail Tests with Alternative "Not Equal To" (≠)
3.3 Examples of Hypothesis Tests
3.3.1 Right-Tail Tests
3.3.1a One-Tail Test of Significance
3.3.1b One-Tail Test of an Economic Hypothesis
3.3.2 Left-Tail Tests
3.3.3 Two-Tail Tests
3.3.3a Two-Tail Test of an Economic Hypothesis
3.3.3b Two-Tail Test of Significance
3.4 The p-Value
3.4.1 The p-Value Rule
3.4.1a Definition of p-value
3.4.1b Justification for the p-Value Rule
3.4.2 The TDIST Function
3.4.3 Examples of Hypothesis Tests Revisited
3.4.3a Right-Tail Test from Section 3.3.1b
3.4.3b Left-Tail Test from Section 3.3.2
3.4.3c Two-Tail Test from Section 3.3.3a
3.4.3d Two-Tail Test from Section 3.3.3b
CHAPTER 4 Prediction, Goodness-of-Fit and Modeling Issues
4.1 Least Squares Prediction
4.2 Measuring Goodness-of-Fit
4.2.1 Coefficient of Determination or R²
4.2.2 Correlation Analysis and R²
4.2.3 The Food Expenditure Example and the CORREL Function
4.3 The Effects of Scaling the Data
4.3.1 Changing the Scale of x
4.3.2 Changing the Scale of y
4.3.3 Changing the Scale of x and y
4.4 A Linear-Log Food Expenditure Model
4.4.1 Estimating the Model
4.4.2 Scatter Plot of Data with Fitted Linear-Log Relationship
4.5 Using Diagnostic Residual Plots
4.5.1 Random Residual Pattern
4.5.2 Heteroskedastic Residual Pattern
4.5.3 Detecting Model Specification Errors
4.6 Are the Regression Errors Normally Distributed?
4.6.1 Histogram of the Residuals
4.6.2 The Jarque-Bera Test for Normality using the CHIINV and CHIDIST Functions
4.6.3 The Jarque-Bera Test for Normality for the Linear-Log Food Expenditure Model
4.7 Polynomial Models: An Empirical Example
4.7.1 Scatter Plot of Wheat Yield over Time
4.7.2 The Linear Equation Model
4.7.2a Estimating the Model
4.7.2b Residuals Plot
4.7.3 The Cubic Equation Model
4.7.3a Estimating the Model
4.7.3b Residuals Plot
4.8 Log-Linear Models
4.8.1 A Growth Model
4.8.2 A Wage Equation
4.8.3 Prediction
4.8.4 A Generalized R² Measure
4.8.5 Prediction Intervals
4.9 A Log-Log Model: Poultry Demand Equation
4.9.1 Estimating the Model
4.9.2 A Generalized R² Measure
4.9.3 Scatter Plot of Data with Fitted Log-Log Relationship
CHAPTER 5 The Multiple Linear Regression
5.1 Least Squares Estimates Using the Hamburger Chain Data
5.2 Interval Estimation
5.3 Hypothesis Tests for a Single Coefficient
5.3.1 Tests of Significance
5.3.2 One-Tail Tests
5.3.2a Left-Tail Test of Elastic Demand
5.3.2b Right-Tail Test of Advertising Effectiveness
5.4 Polynomial Equations: Extending the Model for Burger Barn Sales
5.5 Interaction Variables
5.5.1 Linear Models
5.5.2 Log-Linear Models
5.6 Measuring Goodness-of-Fit
CHAPTER 6 Further Inference in the Multiple Regression Model
6.1 Testing the Effect of Advertising: the F-test
6.1.1 The Logic of the Test
6.1.2 The Unrestricted and Restricted Models
6.1.3 Test Template
6.2 Testing the Significance of the Model
6.2.1 Null and Alternative Hypotheses
6.2.2 Test Template
6.2.3 Excel Regression Output
6.3 The Relationship between t- and F-Tests
6.4 Testing Some Economic Hypotheses
6.4.1 The Optimal Level of Advertising
6.4.2 The Optimal Level of Advertising and Price
6.5 The Use of Nonsample Information
6.6 Model Specification
6.6.1 Omitted Variables
6.6.2 Irrelevant Variables
6.6.3 The RESET Test
6.7 Poor Data, Collinearity and Insignificance
6.7.1 Correlation Matrix
6.7.2 The Car Mileage Model Example
CHAPTER 7 Using Indicator Variables
7.1 Indicator Variables: The University Effect on House Prices Example
7.2 Applying Indicator Variables
7.2.1 Interactions Between Qualitative Factors
7.2.2 Qualitative Factors with Several Categories
7.2.3 Testing the Equivalence of Two Regressions
7.3 Log-Linear Models: a Wage Equation Example
7.4 The Linear Probability Model: A Marketing Example
7.5 The Difference Estimator: The Project STAR Example
7.6 The Differences-in-Differences Estimator: The Effect of Minimum Wage Change Example
CHAPTER 8 Heteroskedasticity
8.1 The Nature of Heteroskedasticity
8.2 Detecting Heteroskedasticity
8.2.1 Residual Plots
8.2.2 Lagrange Multiplier Tests
8.2.2a Using the Lagrange Multiplier or Breusch-Pagan Test
8.2.2b Using the White Test
8.2.3 The Goldfeld-Quandt Test
8.2.3a The Logic of the Test
8.2.3b Test Template
8.2.3c Wage Equation Example
8.2.3d Food Expenditure Example
8.3 Heteroskedasticity-Consistent Standard Errors or the White Standard Errors
8.4 Generalized Least Squares: Known Form of Variance
8.4.1 Variance Proportional to x: Food Expenditure Example
8.4.2 Grouped Data: Wage Equation Example
8.4.2a Separate Wage Equations for Metropolitan and Rural Areas
8.4.2b GLS Wage Equation
8.5 Generalized Least Squares: Unknown Form of Variance
CHAPTER 9 Regressions with Time Series Data: Stationary Variables
9.1 Finite Distributed Lags
9.1.1 US Economic Time Series
9.1.2 An Example: The Okun's Law
9.2 Serial Correlation
9.2.1 Serial Correlation in Output Growth
9.2.1a Scatter Diagram for Gt and Gt-1
9.2.1b Correlogram for G
9.2.2 Serially Correlated Errors
9.2.2a Australian Economic Time Series
9.2.2b A Phillips Curve
9.2.2c Correlogram for Residuals
9.3 Lagrange Multiplier Tests for Serially Correlated Errors
9.3.1 t-Test Version
9.3.2 T × R² Version
9.4 Estimation with Serially Correlated Errors
9.4.1 Generalized Least Squares Estimation of an AR(1) Error Model
9.4.1a The Prais-Winsten Estimator
9.4.1b The Cochrane-Orcutt Estimator
9.4.2 Autoregressive Distributed Lag (ARDL) Model
9.5 Forecasting
9.5.1 Using an Autoregressive (AR) Model
9.5.2 Using an Exponential Smoothing Model
9.6 Multiplier Analysis
CHAPTER 10 Random Regressors and Moment-Based Estimation
10.1 OLS Estimation of a Wage Equation
10.2 Instrumental Variables Estimation of the Wage Equation
10.2.1 With a Single Instrument
10.2.1a First Stage Equation for EDUC
10.2.1b Stage 2 Least Squares Estimates
10.2.2 With a Surplus Instrument
10.2.2a First Stage Equation for EDUC
10.2.2b Stage 2 Least Squares Estimates
10.3 Specification Tests for the Wage Equation
10.3.1 The Hausman Test
10.3.2 Testing Surplus Moment Conditions
11.1.2a 2SLS Estimates for Truffle Demand
11.1.2b 2SLS Estimates for Truffle Supply
11.2 Supply and Demand Model for the Fulton Fish Market
11.2.1 The Reduced Form Equations
11.2.1a Reduced Form Equation for lnQ
11.2.1b Reduced Form Equation for lnP
11.2.2 The Structural Equations or Stage 2 Least Squares Estimates
11.2.2a 2SLS Estimates for Fulton Fish Demand
CHAPTER 12 Nonstationary Time-Series Data and Cointegration
12.1 Stationary and Nonstationary Variables
12.1.1 US Economic Time Series
12.1.2 Simulated Data
12.2 Spurious Regressions
12.3 Unit Root Tests for Stationarity
12.4 Cointegration
CHAPTER 14 Time-Varying Volatility and ARCH Models
14.1 Time-Varying Volatility
14.1.1 Returns Data
14.1.2 Simulated Data
14.2 Testing and Forecasting
14.2.1 Testing for ARCH Effects
14.2.1a Time Series and Histogram
14.2.1b Lagrange Multiplier Test
14.2.2 Forecasting Volatility
14.3 Extensions
14.3.1 The GARCH Model
14.3.2 The TGARCH Model
14.3.3 The GARCH-In-Mean Model
CHAPTER 15 Panel Data Models
15.1 Pooled Least Squares Estimates of Wage Equation
15.2 The Fixed Effects Model
15.2.1 Estimates of Wage Equation for Small N
15.2.1a The Least Squares Dummy Variable Estimator for Small N
15.4.3 Estimation: Different Coefficients, Different Error Variances
15.4.4 Seemingly Unrelated Regressions: Testing for Contemporaneous Correlation
CHAPTER 16 Qualitative and Limited Dependent Variable Models
16.1 Least Squares Fitted Linear Probability Model
16.2 Limited Dependent Variables
16.2.1 Censored Data
16.2.2 Simulated Data
APPENDIX A Mathematical Tools
A.1 Mathematical Operations
A.1.1 Exponents
A.1.2 Scientific Notation
A.1.3 Logarithm and the Number e
A.2 Percentages
APPENDIX B Review of Probability Concepts
B.1 Binomial Probabilities
B.3 Distributions Related to the Normal
B.3.1 The Chi-Square Distribution
B.3.2 The t-Distribution
B.3.3 The F-Distribution
C.4.2 Interval Estimation with the Hip Data
C.5 Hypothesis Tests About a Population Mean
C.5.1 An Example
C.5.2 The p-value
C.5.3 A Template for Hypothesis Tests
C.6 Other Useful Tests
C.6.1 Simulating Data
C.6.2 Testing a Population Variance
C.6.3 Testing Two Population Means
C.6.4 Testing Two Population Variances
C.7 Testing Population Normality
C.7.1 A Histogram
C.7.2 The Jarque-Bera Test
Index
CHAPTER 1
Introduction to Excel
CHAPTER OUTLINE
1.1 Starting Excel
1.2 Entering Data
1.3 Using Excel for Calculations
1.3.1 Arithmetic Operations
1.3.2 Mathematical Functions
1.4 Editing your Data
1.5 Saving and Printing your Data
1.6 Importing Data into Excel
1.6.1 Resources for Economists on the Internet
1.6.2 Data Files for Principles of Econometrics
1.6.2a John Wiley & Sons Website
1.6.2b Principles of Econometrics Website
1.6.3 Importing ASCII Files
Find the Excel shortcut on your desktop. Double-click it (left clicks) to start Excel.
Alternatively, left-click the Start menu in the bottom-left corner of your computer screen.
Slide your mouse over All Programs, Microsoft Office, and finally Microsoft Office Excel
2007. Left-click this last one to start Excel, or, better yet, if you would like to create a
shortcut, right-click it, slide your mouse over Send To, and then select (i.e., drag your mouse
over and left-click) Desktop (create shortcut). An Excel 2007 shortcut is created on your
desktop. If you right-click your shortcut and select Rename, you can also type in a shorter
name like Excel.
Excel opens to a new file, titled Book1. You can find the name of the open file at the very top of
the Excel window, on the Title bar. An Excel file like Book1 contains several sheets. By default,
Excel opens to Sheet1 of Book1. You can figure out which sheet is open by looking at the Sheet
tabs found in the lower-left corner of your Excel window.
There are lots of little bits that you will become more familiar with as we go along. The Active
cell is surrounded by a border and is in Column A and Row 1; its Cell reference is A1.
Below the Title bar is a Tab list. The Home tab is the one Excel opens to. Under each tab you
will find groups of commands. Under the Home tab, the first one is the Clipboard group of
commands, named after the tasks it relates to. The wide bar including the tab list and the groups
of commands is referred to as the Ribbon. The content of the Active cell shows up in the
Formula bar (right now, there is nothing in it). Perhaps most important of all, locate the Help
button in the upper-right corner of the Excel window. Finally, you can use the Scroll bars and
the arrows around them to navigate up-down and right-left in your worksheet.
And you have a long way to go: each worksheet in Microsoft Excel 2007 contains 1,048,576
rows and 16,384 columns!
Note that your Ribbon might look slightly different from the one described here. If your screen is
bigger, Excel will automatically display more of its available options. For example, in the Styles
group of commands, instead of the Cell Styles button, you might have a colorful display of cell
styles.
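Each cell reference is simply a column label followed by a row number. As a rough illustration (a Python sketch of our own, not something Excel exposes), the conversion from a column number to its letters works like counting in base 26 without a zero digit. With 16,384 = 2^14 columns and 1,048,576 = 2^20 rows, the bottom-right cell of an Excel 2007 worksheet comes out as XFD1048576. The function names below are hypothetical.

```python
def column_letters(n: int) -> str:
    """Convert a 1-based column number to Excel-style letters (1 -> A, 27 -> AA)."""
    if n < 1:
        raise ValueError("column numbers start at 1")
    letters = []
    while n > 0:
        # Shift to 0-based before dividing: there is no "zero" letter
        n, remainder = divmod(n - 1, 26)
        letters.append(chr(ord("A") + remainder))
    return "".join(reversed(letters))


def cell_reference(row: int, col: int) -> str:
    """Combine column letters and row number, e.g. (1, 1) -> A1."""
    return f"{column_letters(col)}{row}"


MAX_ROWS = 2 ** 20  # 1,048,576 rows in an Excel 2007 worksheet
MAX_COLS = 2 ** 14  # 16,384 columns

print(cell_reference(1, 1))                # A1, the active cell when Excel opens
print(cell_reference(MAX_ROWS, MAX_COLS))  # XFD1048576, the last cell
```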
We will use Excel to analyze data. To enter labels and data into an Excel worksheet, move the
cursor to a cell and type. First type X in cell A1. Press the Enter key on your keyboard to get to
cell A2, or navigate by moving the cursor with the mouse, or use the Arrow keys (to move right,
left, up or down). Fill in the rest so that cells A2 through A6 contain the values 1 to 5:
     A
1    X
2    1
3    2
4    3
5    4
6    5
What is Excel good for? Its primary usefulness is to carry out repeated calculations. We can add,
subtract, multiply and divide; and we can apply mathematical and statistical functions to the data
in our worksheet. To illustrate, we are going to compute the squares of the numbers we just
entered and then add them up. There are two main ways to perform calculations in Excel. One is
to write formulas using arithmetic operators; the other is to write formulas using mathematical
functions.
Select the Excel Help button in the upper-right corner of your screen. In the Excel Help dialog
box that pops up, type arithmetic operators and select Search. In the list of results, select
Calculation operators and precedence.
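Standard arithmetic operators are defined as follows:
Arithmetic operator     Meaning                    Example
+ (plus sign)           Addition                   3+3
- (minus sign)          Subtraction                3-1
                        Negation                   -1
* (asterisk)            Multiplication             3*3
/ (forward slash)       Division                   3/3
% (percent sign)        Percent                    20%
^ (caret)               Exponentiation             3^2
To close the Excel Help dialog box, select the X button in its upper-right corner.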
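If you also work in Python, it can help to see each operator next to its Python counterpart. The sketch below is our own comparison, not part of Excel. Two points are worth flagging: Python writes exponentiation as `**` (in Python, `^` is bitwise XOR), and precedence differs, because Excel applies negation before `^`, so `=-3^2` returns 9, whereas Python evaluates `-3 ** 2` as -9.

```python
# Python counterparts of Excel's standard arithmetic operators.
# Keys are the Excel formulas; values are computed with Python syntax.
results = {
    "=3+3": 3 + 3,      # addition
    "=3-1": 3 - 1,      # subtraction
    "=-1": -1,          # negation
    "=3*3": 3 * 3,      # multiplication
    "=3/3": 3 / 3,      # division (Python 3 returns the float 1.0)
    "=20%": 20 / 100,   # Excel's % divides by 100
    "=3^2": 3 ** 2,     # exponentiation: ** in Python, where ^ means XOR
}
for formula, value in results.items():
    print(f"{formula:<6} {value}")

# Precedence differs: Excel negates before applying ^, Python does not.
excel_style = (-3) ** 2   # 9, what Excel's =-3^2 returns
python_style = -3 ** 2    # -9, because ** binds tighter than unary minus
print(excel_style, python_style)
```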
Place your cursor in cell B1, and type X-squared. In cells B2 through B6 (henceforth
referred to as B2:B6), we are going to compute the squares of the corresponding values in cells
A2:A6. Let us emphasize that the trick to using Excel efficiently is NOT to retype values already
stored in the worksheet, but instead to use references to the cells where the values are stored. So, to
compute the square of 1, which is the value stored in cell A2, instead of using the formula =1*1,
you should use the formula =A2*A2 or =A2^2. Place your cursor in cell B2 and type the formula.
Then press Enter. Note that: (1) a formula always starts with an equal sign; this is how Excel
recognizes it as a formula, and (2) formulas are not case sensitive, so you could also have typed
=a2^2 instead. Now, we want to copy this formula to cells B3:B6. To do that, place your cursor
back in cell B2, and move it to the southeast (bottom-right) corner of the cell, until the fat cross
turns into a skinny one.
Left-click, hold, drag down over the next four cells, and release!
Excel has copied the formula you typed in cell B2 into the cells below. The way Excel
understands the instructions you gave in cell B2 is "square the value found at the address A2".
Now, it is important to understand how Excel interprets "address A2". To Excel, "address A2"
means "from where you are, go left by one cell", because this is where A2 is located relative to
B2. In other words, an address gives directions (left-right, up-down) and distances (number of
cells away), all in reference to the cell where the formula is entered. So, when we copied the
formula we entered in cell B2, which instructed Excel to collect the value stored one cell to its
left and then square it, those exact same instructions were given in cells B3:B6. If you place
your cursor back in B3 and look at the Formula bar, you can see that, in this cell, these same
instructions translate into "=A3^2".
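The relative-reference rule can be sketched in a few lines of Python. This is a simplified, hypothetical model of what Excel does when it copies a formula, not Excel's own code: the function `copy_formula` (our name) shifts every reference of the form A2 by the row and column offsets of the copy. It does not handle absolute or mixed references (those containing $).

```python
import re


def column_number(letters: str) -> int:
    """Convert column letters to a 1-based number: A -> 1, Z -> 26, AA -> 27."""
    n = 0
    for ch in letters:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n


def column_letters(n: int) -> str:
    """Inverse of column_number: 1 -> A, 27 -> AA."""
    letters = []
    while n > 0:
        n, remainder = divmod(n - 1, 26)
        letters.append(chr(ord("A") + remainder))
    return "".join(reversed(letters))


def copy_formula(formula: str, row_offset: int, col_offset: int) -> str:
    """Shift every relative reference (such as A2) by the copy offsets.

    Simplified model: references containing $ are not handled.
    """
    def shift(match):
        col = column_number(match.group(1)) + col_offset
        row = int(match.group(2)) + row_offset
        return f"{column_letters(col)}{row}"

    # Formulas are not case sensitive, so normalize to upper case first
    return re.sub(r"\b([A-Z]+)([0-9]+)\b", shift, formula.upper())


# Copying =A2^2 from B2 down into B3:B6 moves the copy down 1 to 4 rows.
for target_row in range(2, 7):
    print(f"B{target_row}: {copy_formula('=A2^2', target_row - 2, 0)}")

# Copying =SUM(A2:A6) from A7 one column to the right, into B7.
print("B7:", copy_formula("=SUM(A2:A6)", 0, 1))
```

Running it lists =A2^2 through =A6^2 for cells B2:B6, followed by =SUM(B2:B6) for B7, which is the same pattern you see in the Formula bar.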
There is a large number of mathematical functions. Again, you can find the list of functions
available in Excel by calling upon our good friend, the Help button, and typing Mathematical
functions. If you try it, you will see that the list is long. We will not copy it here.
We have computed the squares of our numbers. Now we will add them up: the numbers and the
squares of the numbers, separately. For that, we will use the SUM function.
We first need to select, or highlight, all the numbers in our table. There are several ways to
highlight cells. For this small area, the easiest way is to place your cursor in A2, hold down the
left mouse button and drag it across the area you wish to highlight, i.e., all the way to cell B6.
Here is how your worksheet should look:
     A    B
1    X    X-squared
2    1    1
3    2    4
4    3    9
5    4    16
6    5    25
Next, go to the Editing group of commands, found at the extreme right of the Home tab,
and select Σ AutoSum.
Excel sums the numbers from each column and places the sum in the bottom cell of each column.
The result is:

     A    B
1    X    X-squared
2    1    1
3    2    4
4    3    9
5    4    16
6    5    25
7    15   55
Notice that if you select the arrow to the right of Σ AutoSum, you can find a list of
additional calculations that Excel can automatically perform for you.
Alternatively, you could have placed your cursor in cell A7, typed =SUM(A2:A6), and pressed
the Enter key (and then copied this formula to cell B7).
Note that: (1) as soon as you type the first letter of your function, a list of all the available
functions that start with the same letter pops up. This can be very useful: if you left-click any
of them, Excel gives you its definition; if you double left-click any of them, Excel automatically
finishes typing the function name for you; and (2) once the function name and the opening
parenthesis are typed, Excel reminds you of the needed Arguments, i.e., what else you
need to specify in your function to use it properly.
Now, you could also have used the Insert function button, which you can find on the left side of
the Formula bar.
Once your cursor is placed in A7, select the Insert function button. An Insert Function dialog
box pops up. You can Select a function you need (highlight it, and select OK), or Search for a
function first (follow the instructions given in that window).
In the Function Arguments dialog box that pops up, you need to specify the cell references of
the values you want to add. If they are not already properly specified, you can type A2:A6 in the
Number1 box, or place your cursor in the box, delete whatever is in it, and then select
A2:A6. Select OK. Now that you have the formula in A7, copy it into B7.
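As a quick cross-check outside Excel, the same calculation can be written in Python. The sketch simply mirrors the worksheet: the values 1 to 5 in A2:A6, the copied formula =A2^2 in B2:B6, and =SUM in row 7. The list names are our own.

```python
# Mirror of the worksheet built in Section 1.3.
x = [1, 2, 3, 4, 5]                 # A2:A6, the values typed in
x_squared = [v ** 2 for v in x]     # B2:B6, the copied formula =A2^2

sum_x = sum(x)                      # A7: =SUM(A2:A6)
sum_x_squared = sum(x_squared)      # B7: =SUM(B2:B6)

print(x_squared)                    # [1, 4, 9, 16, 25]
print(sum_x, sum_x_squared)         # 15 55
```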
Before wrapping up, you will want to polish the presentation of your data. This has less to do
with appearance than with organization and communication: you want to make sure that anyone
can easily make sense of your table (your instructor, for example, or yourself, for that matter,
when you come back to it after letting it sit for a while).
We are going to add labels and color/shade to our table. Hold your cursor over the column A
header until it turns into a down arrow; left-click to select the whole column; and select Insert in
the Cells group of commands, found to the left of the Editing group of commands.
Excel adds a new column to the left of the one you selected. That is where we are going to write
our labels. In the new A1 cell, type Variables; in cell A2, type Values; in cell A7, type Sum.
Select column A again, make it Bold (Font group of commands, to the right of the Clipboard
group), and align it Left (Alignment group of commands, to the right of the Font group).
Select cells B1 and C1, and make them Bold. Repeat with cells B7 and C7. Better, but not there
yet. Select row 7 and make it Italic (next to Bold). Select column B, hold down the left mouse
button and drag over column C to select it too; select Center alignment (next to Left). Next,
select A2:A6; left-click the arrow next to Merge & Center (in the Alignment group of
commands), and select Merge Cells.
Immediately after, select Middle Align, which is found right above the Center alignment button.
Select A1:C7, left-click the arrow next to the Bottom Border button, and select All Borders.
Select A7:C7 (A7:C7, not A1:C7 this time), left-click the arrow next to the Fill Color button,
and select a grey color to fill in the cells with. Choose a different color for A1:C1.
Finally, move your cursor to the boundary between the column C and column D headers until it
turns into a left-right arrow.
Hold it there and double left-click so that the width of column C is resized to better
accommodate the length of the label "X-squared". The result is:
     A           B     C
1    Variables   X     X-squared
2                1     1
3                2     4
4    Values      3     9
5                4     16
6                5     25
7    Sum         15    55
(The Values label sits in the merged, middle-aligned cells A2:A6.)
Next, move your cursor over the Sheet1 tab, right-click, select Rename and type in a descriptive
name for your worksheet, like Excel for POE 1.2-1.4, for Using Excel for Principles of
Econometrics, 4e, sections 1.2 through 1.4. Press the Enter key on your keyboard or left-click
anywhere on your worksheet.
All you need to do now is save your Excel file. Select the Save button in the upper-left corner
of the Excel window.
A Save As dialog box pops up. Locate the folder you want to save your file in by using the
down arrow at the extreme right of the Save in box, or by browsing through the list of
folders displayed below it.
In the File name box, at the bottom of the Save As dialog box, the generic name Book1
should be highlighted. Type the descriptive name you would like to give your Excel file, like POE
Chapter 1. Finally, select Save.
If you need to create a new folder, use the Create New Folder button found to the right of the
Save in box.
A New Folder dialog box pops up, prompting you for the name you want to give your new
folder, Excel for POE for example. Type it in the Name box and select OK. Finally, select
Save.
If you would like to print your table, select the Office Button, next to the Save button; go to
Print, and select one of the print options.
The options include Print (select a printer, number of copies, and other printing options before
printing) and Quick Print (send the workbook directly to the default printer without making
changes).
For more print options, you might want to check out the Page Layout tab, on the upper left of
your screen, as well as the Page Layout button on the bottom right of your screen.
To close your file, select the X button in the upper-right corner of your screen.
In the next section, we show you how to import data into an Excel spreadsheet. Getting data for
economic research is much easier today than it was years ago. Before the Internet, hours would be
spent in libraries, looking for and copying data by hand. Now we have access to rich data sources
which are a few clicks away.
First we will illustrate how convenient sites that make data available in Excel format can be. Then
we illustrate how to import ASCII, or text, files into Excel.
Suppose you are interested in analyzing the GDP of the United States. The website Resources for
Economists contains a wide variety of data, and in particular the macro data we seek. Websites
are continually updated and improved. We guide you through an example, but be prepared for
differences from what we show here.
The Resources for Economists home page (ISSN 1081-4248; the version shown here is dated
May 2010) offers links including Introduction; Data; Dictionaries, Glossaries & Encyclopedias;
Economists, Departments & Universities; Forecasting & Consulting; and Jobs, Grants, Grad
School & Advice.
Select the Data link and then select U.S. Macro and Regional Data.
This will open up a range of data subcategories. For the example discussed here, select the
Bureau of Economic Analysis (BEA).
The result shows the point we are making. Many government and other websites make data
available in Excel format. Select Current-dollar and "real" GDP.
You have the option of saving the resulting Excel file to your computer or storage device, or
opening it right away, which we proceed to do next.
What opens is a workbook with headers explaining the variables it contains. We see that there is
a series of annual data and a quarterly series.
The worksheet is titled Current-Dollar and "Real" Gross Domestic Product. The annual series
begins in 1929 and the quarterly series begins in 1947q1; each lists current-dollar GDP followed
by real GDP.
The opened file is "Read Only" so you must save it under another name to work with it, graph,
run regressions and so on.
The book Principles of Econometrics, 4e, uses many examples with data. These data files have
been saved as workbooks and are available for you to download to your computer. There are
about 150 such files. The data files and other supplementary materials can be downloaded from
two web locations: the publisher website or the book website maintained by the authors.
Using your web browser, enter the address www.wiley.com/college/hill. Find, among the authors
named "Hill", the book Principles of Econometrics, 4e.
The listing reads: Principles of Econometrics, 4th Edition, R. Carter Hill (Louisiana State
University), William E. Griffiths (University of Melbourne, Australia), Guay C. Lim (University of
Melbourne, Australia), January 2011, ©2012.
Follow the link to Resources for Students, and then Student Companion Site. There, you will
find links to supplementary materials, including a link to Data Files that will allow you to download
all the data definition files and data files at once.
The address for the book website is www.principlesofeconometrics.com. There, you will find
links to the Data definition files, Excel spreadsheets, as well as an Errata list. You can download
the data definition files and the Excel files all at once or select individual files. The data definition
files contain variable names, variable definitions, and summary statistics. The Excel spreadsheets
contain data only; those files were created using Excel 2003.
[Screenshot of the book website, reading in part:
Instructor Resources from John Wiley & Sons: Data files, PowerPoint Slides, Instructor's Manual
Student Resources from John Wiley & Sons: Data files and Using Excel for Principles of Econometrics
Data files: POE includes 148 data files in various formats. Using the links below you can download all files in a *.ZIP format, or download individual files. The data definition files should be downloaded by all users.
Data definition files (*.def) are text files containing variable names, definitions and summary statistics.
ASCII data files (*.dat) are text files containing only data. Variable names are in *.def files.
Download all the *.dat files in (a) ZIP format or (b) a self-extracting EXE file (download and double-click).]
Right-click on the file name. Select Save Target As. A Save As dialog box pops up. Locate the
folder you want to save your file in by using the arrow-down button located at the extreme right of the
Save in window or by browsing through the list of folders displayed below it. Finally, select Save.
Once the download of the file is completed, a Download complete window pops up. Choose
Close.
Start Excel. Select the Office Button in the upper-left corner of the Excel window, then Open.
Navigate to the location of the data file. Make sure you have selected All Files in the Files of
Type window. Select your food.dat file and then select Open.
[Screenshot: Text Import Wizard, first step. "If this is correct, choose Next, or choose the data type that best describes your data." Original data type: Delimited (characters such as commas or tabs separate each field) or Fixed width (fields are aligned in columns with spaces between each field), with Fixed width selected. Preview of file C:\data\econ4630\food.dat, showing the rows 115.22 3.69, 135.98 4.39, 119.34 4.75, 114.96 6.03, 187.05 12.47]
In the next step the data are previewed. By clicking on the vertical black line you could adjust the
column width, but there is no need most of the time. For neatly arrayed data like ours, Excel can
determine where the columns end and begin. Select Next again.
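If you ever want to read the same file outside Excel, the sketch below shows the idea behind this kind of import: split each line on runs of spaces and convert the pieces to numbers. It is a minimal illustration, assuming food.dat holds exactly two numeric columns per line (food expenditure, then weekly income); the helper name parse_two_columns is hypothetical, and the sample text reuses only the first five rows of the file.

```python
# Minimal sketch: read a two-column, whitespace-separated text file such as food.dat.
# Assumption: column 1 is food expenditure ($), column 2 is weekly income ($100).
SAMPLE = """\
115.22 3.69
135.98 4.39
119.34 4.75
114.96 6.03
187.05 12.47
"""


def parse_two_columns(text):
    """Return two lists of floats, one per column; blank lines are skipped."""
    food_exp, income = [], []
    for line in text.splitlines():
        fields = line.split()  # split on any run of spaces or tabs
        if not fields:
            continue
        if len(fields) != 2:
            raise ValueError(f"expected 2 fields, got {len(fields)}: {line!r}")
        food_exp.append(float(fields[0]))
        income.append(float(fields[1]))
    return food_exp, income


food_exp, income = parse_two_columns(SAMPLE)
print(len(food_exp), food_exp[0], income[-1])  # 5 115.22 12.47
```

To read the actual file, something like `parse_two_columns(open("food.dat").read())` should work, assuming food.dat is in the current directory.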
[Screenshot: Text Import Wizard, second step, with a Data preview of the two columns and the buttons Cancel, < Back, Next >, and Finish]
In the third and final step Excel permits you to format each column, or in fact to skip a column. In
our case you can simply select Finish.
[Screenshot: Text Import Wizard, third step. Column data format options: General (selected), Text, Date, and Do not import column (skip). The wizard notes that "General" converts numeric values to numbers, date values to dates, and all remaining values to text.]
This step concludes the process and now the data is in a worksheet named food.
[Worksheet food]
    A        B
1   115.22    3.69
2   135.98    4.39
3   119.34    4.75
4   114.96    6.03
5   187.05   12.47
Next, you need to save your food data in Excel file format. To do that, select the Office
Button, Save As, and finally Excel Workbook.
[Screenshot: Save As menu, with Excel Workbook (save the file as an Excel Workbook) and Excel Macro-Enabled Workbook (save the workbook in the XML-based and macro-enabled file format)]
A Save As dialog box pops up. Locate the folder you want to save your file in by using the
arrow-down button located at the extreme right of the Save in window or by browsing through the list of
folders displayed below it.
Excel has automatically given a File name, food.xlsx, and specified the file format in the Save as
type window, Excel Workbook (*.xlsx). All you need to do is select Save.
This completes our introductory chapter. The rest of this manual is designed to supplement your
readings of Principles of Econometrics, 4e. We will walk you through the analysis of examples
found in the text, using Excel 2007. We would like to be able to replicate most of the plots of data
and tables of results found in your text.
CHAPTER 2
The Simple Linear Regression Model
CHAPTER OUTLINE
2.1 Plotting the Food Expenditure Data
2.1.1 Using Chart Tools
2.1.2 Editing the Graph
2.1.2a Editing the Vertical Axis
2.1.2b Axis Titles
2.1.2c Gridlines and Markers
2.1.2d Moving the Chart
2.2 Estimating a Simple Regression
2.2.1 Using Least Squares Estimators' Formulas
2.2.2 Using Excel Regression Analysis Routine
2.3 Plotting a Simple Regression
2.3.1 Using Two Points
2.3.2 Using Excel Built-in Feature
2.3.3 Using a Regression Option
2.3.4 Editing the Chart
2.4 Expected Values of b1 and b2
2.4.1 Model Assumptions
2.4.2 Random Number Generation
2.4.3 The LINEST Function
2.4.4 Repeated Sampling
2.5 Variance and Covariance of b1 and b2
2.6 Nonlinear Relationships
2.6.1 A Quadratic Model
2.6.1a Estimating the Model
2.6.1b Scatter Plot of Data with Fitted Quadratic Relationship
2.6.2 A Log-Linear Model
2.6.2a Histograms of PRICE and ln(PRICE)
2.6.2b Estimating the Model
2.6.2c Scatter Plot of Data with Fitted Log-Linear Relationship
2.7 Regression with Indicator Variables
2.7.1 Histograms of House Prices
2.7.2 Estimating the Model
In this chapter we estimate a simple linear regression model of weekly food expenditure. We also
illustrate the concept of unbiased estimation. In the first section, we start by plotting the food
expenditure data.
Compare the values you have in your worksheet to the ones found in Table 2.1, p. 49 of
Principles of Econometrics, 4e. The second part of Table 2.1 shows summary statistics. If you
would like, you can compute and check those by using the Excel mathematical functions
introduced in Chapter 1.
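As a rough cross-check outside Excel, Python's standard statistics module offers analogues of the usual summary functions (for example, Excel's AVERAGE, MEDIAN, MAX, MIN, and STDEV). The sketch below is illustrative only: it uses just the first five households of the food data, so its numbers will not match Table 2.1, which is based on all 40 households.

```python
import statistics

# Illustration only: the first five households of the food data (not all 40).
food_exp = [115.22, 135.98, 119.34, 114.96, 187.05]
income = [3.69, 4.39, 4.75, 6.03, 12.47]


def summary(values):
    """Rough Python analogues of Excel's AVERAGE, MEDIAN, MAX, MIN and STDEV."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "max": max(values),
        "min": min(values),
        # statistics.stdev divides by n - 1, like Excel's STDEV (sample standard deviation)
        "std_dev": statistics.stdev(values),
    }


for name, column in (("food_exp", food_exp), ("income", income)):
    stats = summary(column)
    print(name, {key: round(value, 4) for key, value in stats.items()})
```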
Select the Insert tab, located next to the Home tab. Select A2:B41. In the Charts group of
commands, select Scatter, and then Scatter with only Markers.
[Chart: scatter plot of the data, with the default legend "Series1"]
Each point on this Scatter chart represents one household for which we have recorded a pair of
values: weekly food expenditure and weekly income. This is very important. We chose a Scatter
chart because we wanted to keep track of those pairs of values. For example, the point
highlighted below represents the pair of values (187.05, 12.47) found in row 6 of your table.
[Chart: the same scatter plot, with the tooltip for the highlighted point reading Series 1 Point "187.050003" (187.050003, 12.47)]
When we select two columns of values to plot on a Scatter chart, Excel, by default, represents
values from the first column on the horizontal axis and values from the second column on the
vertical axis. So, in this case, the expenditure values are shown on the horizontal axis and the
income values on the vertical axis. Indeed, you can see that the scale of the values on the
horizontal axis corresponds to that of the food expenditure values in column A, and the scale
of the values on the vertical axis corresponds to that of the income values in column B.
We actually would like to show the food expenditure values on the vertical axis and the
income values on the horizontal axis, the opposite of what it is now. By convention, across
disciplines, the variable whose level we monitor (the dependent variable) is shown on the
vertical axis (Y-variable). And by convention, across disciplines, the variable that we think might
explain the level of the dependent variable is shown on the horizontal axis (X-variable).
In our case, we think that the variation of levels of income across households might explain the
variation of levels of food expenditure across those same households. That is why we would like
to illustrate the food expenditure values on the vertical axis and the income values on the
horizontal axis.
If you look up on your screen, to the right end of your tab list, you should notice that Chart Tools
are now displayed, adding the Design, Layout, and Format tabs to the list. The Design tab is
open. (If, at any time, the Chart Tools and its tabs seem to disappear, all you need to do is put
your cursor anywhere in your chart area and left-click, and they will be made available again.)
Go to the Data group of commands, to the left, and select the Select Data button.
[Screenshot: Select Data Source dialog box, with Series1 listed under Legend Entries (Series) and the Add, Edit, and Remove buttons]
In the Edit Series dialog box, highlight the text from the Series X values window. Press the
Delete key on your keyboard. Select B2:B41. Highlight and delete the text from the Series Y
values window. Select A2:A41. Select OK.
  
[Screenshot: Edit Series dialog box before and after the change; afterward, Series X values reads =Sheet1!$B$2:$B$41 = 3.69, 4.39, ...]
The Select Data Source dialog box reappears. Select OK again. You have just told Excel that the
income values are the x-values and the food expenditure values are the y-values, not the other way around.
[Chart: scatter plot with income on the horizontal axis (0 to 40) and food expenditure on the vertical axis (0 to 600)]
Now, we would like to do some editing. We do not need a Legend, since we have only one data
series. Our expenditure values do not go over 600, so we can restrict our vertical axis scale to
that. We definitely would like to label our axes. We might want to get rid of our Gridlines, and
change the Format of our data series. Finally, we would like to move our chart to a new
worksheet.
Select the Layout tab. On the Labels group of commands, select Legend and None to delete the
legend.
Select the Axes button on the Axes group of commands. Go to Primary Vertical Axis, and select
More Primary Vertical Axis Options.
A Format Axis dialog box pops up. Change the Maximum value shown on the axis from
Auto to Fixed, and specify 600.
Next select Alignment, and use the arrow-down button in the Text direction window to select Rotate
all text 270°.
[Screenshot: Format Axis dialog box, Alignment options, with Text direction set to Rotate all text 270°]
Place your cursor on the upper blue border of your Format Axis dialog box.
Left-click, hold, and drag the box over so you can see your chart; then release. Look at the vertical
axis of your chart.
The numbers are now displayed vertically instead of horizontally, but fewer of them are displayed
as well:
[Chart: vertical axis with its labels rotated, showing fewer labels than before]
Select Axis Options again. Change Major unit from Auto to Fixed, and specify 100. Select
Close.
Go back to the Labels group of commands; select Axis Titles, go to Primary Horizontal Axis
Title, and select Title Below Axis.
Select the generic Axis Title at the bottom of your chart and type x = weekly income in $100.
Go back to Axis Titles, then to Primary Vertical Axis Title this time. Select Rotated Title.
Select the generic Axis Title on the left of your chart and press Delete, or put your cursor on top
of the Axis Title box, left-click, and press the Backspace key to delete the generic Axis Title.
Type y = weekly food expenditure in $.
Go back to the Axes group of commands now. Select Gridlines. Go to Primary Horizontal
Gridlines, and select None.
Change the Current Selection (group of commands to the far left) to Series 1 (use the arrow-down
button to the right of the window to make that selection). Select Format Selection.
A Format Data Series dialog box pops up. Select Marker Options. Change the Marker Type
from Automatic to Built-in. Change the Type and the Size as shown below:
[Screenshot: Marker Type options Automatic, None, and Built-in (selected), with the Type and Size settings]
Next, select Marker Fill. Change it from Automatic to Solid fill. Color options pop up. Change
the Color to black. Select Marker Line Color, and change it from Automatic to No line. Select
Close.
The result is a replica of Figure 2.6, p. 50, in Principles of Econometrics, 4e. (If some of your
dots look like little flowers, first left-click anywhere on your screen.)
[Chart: replica of Figure 2.6, with y = weekly food expenditure in $ on the vertical axis and x = weekly income in $100 on the horizontal axis, from 0 to 40]
2.1.2d Moving the Chart
Go back to the Design tab. (Remember, if you don't see your Chart Tools tabs, place your cursor
in your chart area and left-click.) Select the Move Chart button in the Location group of commands,
at the far right of your screen.
A Move Chart dialog box pops up. Select New sheet and give it a name like Figure 2.6. Select
OK.
Rename Sheet1 as Data (if needed, see Section 1.4 of this manual on how to do that).
We have plotted our data and edited our chart. Next, we want to estimate the regression line that
best fits the data, and add this line to the chart.
In this section, we are going to use two different methods to obtain the least squares estimates of
the intercept and slope parameters β1 and β2. Method 1 consists of plugging values into the
b1 and b2 least squares estimators' formulas. Method 2 consists of making use of Excel's built-in
regression analysis routine.
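The least squares estimators are:
$$b_2 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \qquad (2.1)$$
$$b_1 = \bar{y} - b_2 \bar{x} \qquad (2.2)$$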
These formulas are telling us two things: (1) which values we need, and (2) how we need to
combine them to compute b1 and b2.
We need the (xi, yi) pairs of values; they appear explicitly in equation (2.1). We also need x̄
and ȳ, which are the sample means, or simple arithmetic averages, of the xi values and yi
values; those averages appear both in equation (2.1) and in equation (2.2). Note that the subscript i
in xi and yi keeps count of the x and y values. In other words, i denotes the ith value or ith pair
of values. Also, x̄ and ȳ are referred to as "x-bar" and "y-bar".
The numerator of (2.1) is a sum of products; Σ is the Greek capital letter "sigma", which denotes a sum.
The first term of each product is the deviation of an x value from its mean, (xi - x̄). The second
term of each product is the deviation of the corresponding y value from its mean, (yi - ȳ). The
products are computed for each (xi, yi) pair of values before they are added together.
The denominator is the sum of the squared deviations from the mean, for the x values only. In
other words, each x value's deviation from its mean is first squared, and then all those squared
deviations are summed.
Equation (2.2) tells us to multiply b2 by x̄, and then subtract this product from ȳ. Note that b2
must be computed first, before b1 can be computed.
There is actually no magic to this. We use the food expenditure and income values we have
collected from our random sample of 40 households, and perform simple arithmetic operations to
compute the estimates of the intercept and slope coefficient of our regression line.
As for the computation of b1 and b2 itself, there is only one trick: we need to make sure we
know which values are the x's and which ones are the y's. So, we are going to start by adding
labels to our columns of data.
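The two formulas translate almost line for line into code. The sketch below is a minimal Python version of equations (2.1) and (2.2), written to mirror the worksheet steps that follow (means, deviations, products, squares, sums); the function name least_squares is our own, and the toy data lie exactly on the line y = 1 + 2x so that the answer is easy to check by hand.

```python
def least_squares(x, y):
    """Return (b1, b2) computed from equations (2.1) and (2.2)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    x_bar = sum(x) / n  # sample mean of x (like AVERAGE)
    y_bar = sum(y) / n  # sample mean of y
    # Numerator of (2.1): sum of products of paired deviations
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Denominator of (2.1): sum of squared x deviations
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    b2 = sxy / sxx           # slope first ...
    b1 = y_bar - b2 * x_bar  # ... because (2.2) needs b2
    return b1, b2


print(least_squares([1, 2, 3, 4], [3, 5, 7, 9]))  # (1.0, 2.0)
```

Computing b2 before b1 inside the function follows the same ordering the text points out for equation (2.2).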
You should be in your Data worksheet. If not, you can go back to it by selecting its tab on the
bottom of your screen.
Select row 2 and insert a new row (see Section 1.4 of this manual if you need help on that). In the
new cell A2, type y, and in the new cell B2, type x. Right-align A1:B2.
    A          B
1   food_exp   income
2   y          x
Next, we need to lay out the frame of the table where we are going to store our intermediate and
final computations. Type x_bar= in cell D2, y_bar= in cell D3, b2 = in cell D6, and b1 = in cell
D7. In cells G2:J2, type x_deviation, y_deviation, (x_dev)(y_dev), and (x_deviation)2,
respectively. (Note that you can use your Tab key, instead of moving your cursor or using the
Arrow keys, to move to the next cell to your right.)
    D         E   F   G             H             I                J
2   x_bar=            x_deviation   y_deviation   (x_dev)(y_dev)   (x_deviation)2
3   y_bar=
6   b2 =
7   b1 =
Below x_deviation we are going to compute and store the deviations of the x values from their
mean. Below y_deviation, we are going to compute and store the deviations of the y values from
their mean. Below (x_dev)(y_dev), we are going to compute and store the products of the x
deviation and the y deviation for each pair of values. Finally, below (x_deviation)2 we are going
to compute and store the x deviations squared.
To show the 2 of (x_deviation)2 as a square, place your cursor in J2, if it is not already there.
Move to the Formula bar to select the 2, and select the arrow in the right corner of the Font
group of commands.
A Format Cells dialog box pops up. Select Superscript and then OK.
[Screenshot: Format Cells dialog box, Font tab (Arial, Regular, 10), with the Effects options Strikethrough, Superscript, and Subscript]
In cells D6 and D7, format the 2 and 1 of b2 and b1 as subscripts instead. Bold all
the labels you just typed, and right-align the ones in G2:J2. Finally, resize the width of
columns G:J to accommodate the width of their labels (see Section 1.4 of this manual if you need
help on that).
    D         E   F   G             H             I                J
2   x_bar=            x_deviation   y_deviation   (x_dev)(y_dev)   (x_deviation)²
3   y_bar=
6   b₂ =
7   b₁ =
We have computed averages before. The formula you should have in cell E2 is
=AVERAGE(B3:B42), and the one in cell E3 is =AVERAGE(A3:A42). Compare the averages
you get to the sample means of Table 2.1 in Principles of Econometrics, 4e (p. 49); they should
be the same.
    D         E          F   G             H             I                J
2   x_bar=    19.60475       x_deviation   y_deviation   (x_dev)(y_dev)   (x_deviation)²
3   y_bar=    283.5735
6   b₂ =
7   b₁ =
Next, we want to compute the deviations. Think about what you are trying to compute, and then
type the needed formulas in G3:J3.
You should type =B3-E2 in cell G3, =A3-E3 in cell H3, =G3*H3 in cell I3, and =G3^2 in
cell J3. Here are the values you should get:
    D         E          F   G             H             I                J
2   x_bar=    19.60475       x_deviation   y_deviation   (x_dev)(y_dev)   (x_deviation)²
3   y_bar=    283.5735       -15.9147501   -168.353498   2679.303845      253.2792692
6   b₂ =
7   b₁ =
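Each formula in G3:J3 computes one piece of equation (2.1) for a single observation. The short sketch below repeats the row 3 arithmetic in Python, assuming the means as displayed in E2 and E3 (19.60475 and 283.5735); because those are rounded display values, the last digits can differ slightly from what Excel shows.

```python
# Row 3 of columns G:J, using the means as displayed in E2 and E3 (rounded values).
x_bar, y_bar = 19.60475, 283.5735  # cells E2 and E3
x3, y3 = 3.69, 115.22              # cells B3 (income) and A3 (food_exp)

g3 = x3 - x_bar  # =B3-E2 : x deviation
h3 = y3 - y_bar  # =A3-E3 : y deviation
i3 = g3 * h3     # =G3*H3 : product of the two deviations
j3 = g3 ** 2     # =G3^2  : squared x deviation

print(g3, h3, i3, j3)
```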
Now, in cells G3 and H3, we gave the cell references E2 and E3, where the averages are stored. Note
that we will need to use those averages again, taken from these same exact locations, to compute
the deviations of the next 39 observations.
So, what we actually need to do is to transform these Relative cell references (E2 and E3) into
Absolute cell references ($E$2 and $E$3). This will allow us to copy the formula from G3:H3
down below without losing track of the fact that the values for the averages are stored in cells E2
and E3.
A Relative cell reference is made into an Absolute cell reference by preceding both the row and
column references with a dollar sign. Place your cursor back in cell G3 (i.e., move your mouse over
it and left-click); in the Formula bar, place your cursor before the E and insert a dollar sign (press
the Shift key and the $ key at the same time); move your cursor before the 2 and insert another
dollar sign; place your cursor at the end of the formula and press Enter.
Go to cell H3, and add the needed dollar signs there too. Now, you can select G3:J3. Select
Copy in the Clipboard group of commands. Select G4:J42, and select Paste (next to Copy). You
have just copied the formulas to compute the needed deviations for the rest of the (xi, yi) pairs.
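The difference between relative and absolute references shows up only when a formula is copied. The sketch below imitates, in a simplified way, what Excel does to A1-style references when you copy a formula down a column: row numbers without a $ shift with the copy, and row numbers with a $ stay fixed. The copy_down helper is hypothetical (it is not an Excel feature or API) and handles only vertical copies of simple references such as B3, E2, or $E$2.

```python
import re

# Hypothetical helper (not an Excel API): a simplified model of how Excel adjusts
# A1-style references when a formula is copied DOWN by `rows` rows.
# A "$" before the row number freezes the row; column letters never change on a vertical copy.
CELL_REF = re.compile(r"(?<![A-Za-z$])(\$?)([A-Z]{1,3})(\$?)(\d+)")


def copy_down(formula, rows):
    def shift(match):
        col_anchor, col, row_anchor, row = match.groups()
        new_row = int(row) if row_anchor else int(row) + rows
        return f"{col_anchor}{col}{row_anchor}{new_row}"

    return CELL_REF.sub(shift, formula)


print(copy_down("=B3-E2", 1))    # =B4-E3    the mean reference drifts to E3 (wrong)
print(copy_down("=B3-$E$2", 1))  # =B4-$E$2  the mean stays in E2 (right)
```

With the dollar signs in place, the copy pasted into row 42 still reads the averages from E2 and E3, which is exactly what the deviation columns need.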

[Screenshot: columns G:J filled in down the worksheet; row 3 shows -15.9147501, -168.353498, 2679.303845, and 253.2792692]
Place your cursor in cell E6, and again think about what you need to compute b2. Recall that the
least squares estimators are:
$$b_2 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \qquad (2.1)$$
$$b_1 = \bar{y} - b_2 \bar{x} \qquad (2.2)$$
If you refer back to equation (2.1), you can see that =SUM(I3:I42)/SUM(J3:J42) is the formula
you need in cell E6. The one you need in cell E7 is =E3-E6*E2, for equation (2.2).
        
[Screenshot of the completed worksheet: data in columns A:B, deviations in G:J, and x_bar = 19.60475, y_bar = 283.5735, b2 = 10.20964, b1 = 83.41601 in column E]
In the table above we obtain the same exact least squares estimates as those reported on p. 53 of
Principles of Econometrics, 4e.
That was Method 1 of obtaining the least squares estimates of the intercept and slope parameters
β1 and β2. For Method 2, we are going to use the Excel built-in regression analysis routine.
Select the Data tab, in the middle of your tab list. On the Analysis group of commands to the far
right of the ribbon, select Data Analysis.
If the Data Analysis tool does not appear on the ribbon, you need to load it first.
Select the Office Button in the upper-left corner of your screen, Excel Options at the bottom of
the Office Button tasks panel, Add-Ins in the Excel Options dialog box, Excel Add-ins in the
Manage window at the bottom of the Excel Options dialog box, and then Go.
In the Add-Ins dialog box, check the box in front of Analysis ToolPak. Select OK.
Now Data Analysis should be available in the Analysis group of commands. Select it.
A Data Analysis dialog box pops up. In it, select Regression (you might need to use the scroll bar
to the right of the Analysis Tools window to find it), then select OK.

[Screenshot: Data Analysis dialog box, with the Analysis Tools list (Histogram, Moving Average, Random Number Generation, Rank and Percentile, Regression, Sampling, and several t-Test and z-Test options)]
The Regression dialog box that pops up next is very similar to the Edit Series box we
encountered before (see Section 2.1.1). Place your cursor in the Input Y Range window, and
select A3:A42 to specify the y-values you are working with. Similarly, place your cursor in the
Input X Range window, and select B3:B42 to specify the x-values you are working with. Next,
place your cursor in the New Worksheet Ply window and type Regression; this is going to be
the name of the new worksheet where the Excel regression analysis results are stored.
Select OK.
[Screenshot: Regression dialog box, with Input Y Range $A$3:$A$42, Input X Range $B$3:$B$42, and New Worksheet Ply: Regression]
The Summary Output that Excel just generated should be highlighted as shown below:
[Screenshot: the Summary Output on the new Regression worksheet, before the columns are widened; several labels, such as Adjusted R Square and Standard Error, are cut off]
Select the Home tab. In the Cells group of commands, select Format, and AutoFit Column
Width; this is an alternative way to adjust the width of the selected columns to fit their contents.
[Summary Output after AutoFit Column Width; values rounded for display]
1   SUMMARY OUTPUT
2
3   Regression Statistics
4   Multiple R           0.620485
5   R Square             0.385002
6   Adjusted R Square    0.368818
7   Standard Error       89.51700
8   Observations         40
9
10  ANOVA
11                 df   SS            MS            F          Significance F
12  Regression     1    190626.9788   190626.9788   23.78884   1.94586E-05
13  Residual       38   304505.1742   8013.294058
14  Total          39   495132.153
15
16                 Coefficients   Standard Error   t Stat     P-value       Lower 95%   Upper 95%
17  Intercept      83.41601       43.41016         1.921578   0.062182      -4.46327    171.2953
18  X Variable 1   10.20964       2.093263         4.877381   1.94586E-05   5.972052    14.44723
(Excel repeats the last two columns as Lower 95.0% and Upper 95.0%.)
The least squares estimates are given under the Coefficients column in the last table of the
Summary Output. The estimate for the Intercept coefficient, or b1, is the first one, followed by
the estimate of the slope coefficient (X Variable 1 coefficient), or b2. The summary output
contains many other items that we will learn about shortly. For now, notice that the number of
observations or pairs of values, 40, is given in cell B8.
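Several of those other items are simple functions of one another. As a consistency check, the sketch below recomputes a few of them from the printed values, using standard relationships: R Square = SS(Regression)/SS(Total), Standard Error = √MS(Residual), F = MS(Regression)/MS(Residual), and t Stat = Coefficient/Standard Error. The inputs are simply copied from the Summary Output, so small rounding differences are expected.

```python
import math

# Values copied from the Summary Output (ANOVA and Coefficients tables).
ss_reg, ss_res, ss_tot = 190626.9788, 304505.1742, 495132.153
df_reg, df_res = 1, 38
b1, se_b1 = 83.41601, 43.41016
b2, se_b2 = 10.20964, 2.093263

ms_reg = ss_reg / df_reg            # MS column, Regression row
ms_res = ss_res / df_res            # MS column, Residual row
r_square = ss_reg / ss_tot          # R Square
std_error = math.sqrt(ms_res)       # Standard Error (of the regression)
f_stat = ms_reg / ms_res            # F
t_b1, t_b2 = b1 / se_b1, b2 / se_b2  # t Stat column

print(round(r_square, 6), round(std_error, 3), round(f_stat, 3))
print(round(t_b1, 3), round(t_b2, 3))
```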
A convenient way to report the values for b1 and b2 is to write out the equation of the estimated
regression line:
$$\hat{y}_i = 83.42 + 10.21 x_i \qquad (2.3)$$
Now that we have the equation of our straight line, we would like to graph it. This is what we are
doing in the next section.
There are different ways to draw a regression line. One way is to plot two points and draw the
line that passes through those two points; this is the method we are going to use first. Another
way is to plot many points, and then draw the line that passes through all those points; this is the
method that Excel uses in the built-in features we are going to look at next.
When we draw a line by hand, on a piece of paper, using a pen and a ruler, we can use any two
points. We can extend our line between the points, as well as beyond the points, up and down, or
right and left. Excel does not use a ruler. Instead, it uses the coordinates of two points to draw a
line, and it draws the line only between them. So, to have Excel draw a line that spans over the
whole range of data we have, we need to choose those two points a little bit more strategically
than usual.
If you look back at your scatter chart (Figure 2.6 worksheet) or back in your table (Data
worksheet), you can see that our x values range from about 0 to 35 (from 3.69 to 33.4 exactly).
So, we choose our first point to have an x value equal to 0, and our second point an x value of
35.
The point with an x value of zero is our y intercept. It is the point where the line crosses the
vertical axis. Its coordinates are x = 0 and y = b1 or (0, 83.42). This is our first point.
For our second point, we let x = 35; plug this x value into equation (2.3), and compute its
corresponding, or predicted, y value. We obtain:
$$\hat{y} = 83.42 + 10.21(35) = 440.77 \qquad (2.4)$$
Go back to your Data worksheet (if you are not already there). In cell L1, type Points to graph
regression line. In columns L and M we are going to record the coordinates of the two points we
are using to draw our regression line. In cell L2, type y; in cell M2, type x. In cell M3, type 0; in
cell M4, type 35. In cell L3, we actually want to record the value for our y intercept, or b1, which
we already have in cell E7. So, we are going to get it from there: in cell L3, type =E7, and press
Enter. In cell L4, we want to have the computed predicted y value from (2.4). So we type
=E7+E6*M4, and press Enter. Note that instead of typing all those cell references, you can just
move your cursor to the cells of interest, as if you were actually getting the needed values; this is
a very good way to avoid typing errors. So, you would type the equal sign, move your cursor to
E7 and left-click to select it, type the plus sign, move your cursor to cell E6 and left-click to
select it, type the asterisk, move your cursor to cell M4 and left-click to select it, and finally press
Enter. Once you have done all of that, your worksheet should look like this:
    L          M
1   Points to graph regression line
2   y          x
3   83.41601   0
4   440.7535   35
Note that the predicted y value we obtain in the worksheet for x = 35 is slightly different from
the one we just computed in equation (2.4) because of rounding.
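The rounding effect is easy to see by evaluating the line at both endpoints with two sets of estimates. The sketch below assumes the estimates as displayed in the worksheet (E7 = 83.41601, E6 = 10.20964) and the rounded ones from equation (2.3); the helper name predict is our own. Excel keeps even more digits than those displayed, so its 440.7535 can differ from this sketch in the last decimal place.

```python
# Endpoints of the plotted segment: x = 0 and x = 35 (weekly income in $100).
B1_SHEET, B2_SHEET = 83.41601, 10.20964  # estimates as displayed in E7 and E6
B1_ROUND, B2_ROUND = 83.42, 10.21        # rounded estimates used in equation (2.3)


def predict(x, b1, b2):
    """Predicted weekly food expenditure ($) at weekly income x ($100)."""
    return b1 + b2 * x


for x in (0, 35):
    print(x, round(predict(x, B1_SHEET, B2_SHEET), 4), round(predict(x, B1_ROUND, B2_ROUND), 2))
```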
Now, go back to your Figure 2.6 worksheet. The data we have plotted on the chart represent one
set or series of data. The two new pairs of values we want to add to this chart represent a second
set or series of data.
Select the Design tab, then the Select Data button in the Data group of commands.
In the Legend Entries (Series) window of the Select Data Source dialog box, select the Add
button.
Place your cursor in the Series X values window of the Edit Series dialog box, and select
M3:M4 in the Data worksheet. Place your cursor in the Series Y values window (delete
whatever is in there), and select L3:L4 in the Data worksheet. Select OK.
[Screenshot: Edit Series dialog box, with Series X values =Data!$M$3:$M$4 = 0, 35]
The Select Data Source dialog box reappears. A second data series, Series2, was created from the
selection you just specified. Select OK.
The two points from your new series are plotted on your chart (squares below):
[Figure: the food expenditure scatter plot with the two predicted points (squares) added at x = 0 and x = 35]
Now, we need to draw a line through those two points. Go to the Layout tab. Change the Current Selection (in the group of commands to the far left) to Series 2, using the arrow-down button to the right of the box. Then select Format Selection.
A Format data series dialog box pops up. Select Line color and change its selection from No
line to Solid line. Select Close.
[Figure: the food expenditure scatter plot with a solid line drawn through the two predicted points]
Note that while you need only two points to draw a straight line, you can use more than two. So we could have computed a predicted level of food expenditure for every level of income in our original data set, and used the 40 (x_i, ŷ_i) pairs of values as our data Series 2. This is essentially what Excel does when it adds a Linear Trendline to a Scatter chart, or produces Line Fit Plots as part of the Regression analysis routine.
We are going to delete the line and two points we just added to our graph and successively look at
these other two ways to plot our regression line.
In the Design tab, go back to the Data group of commands, and select the Select Data button. In
the Select Data Source dialog box, select Series2 and Remove. Finally select OK.
To add a Linear Trend Line, select the Layout tab. Go to the Analysis group of commands,
select Trendline, and then Linear Trendline.
Your chart should look like this (see also Figure 2.8, p. 54 in Principles of Econometrics, 4e):
[Figure: the food expenditure scatter plot with a linear trendline; horizontal axis x = weekly income in $100]
You can also have Excel add the line that best fits your data by choosing that option in the Regression dialog box.
Select the Data tab, located in the middle of your tab list. Select Data Analysis in the Analysis group of commands to the far right of the ribbon. Select Regression in the Data Analysis dialog box, and then OK.
In the Regression dialog box, proceed as you did before, except this time, name your worksheet
Regression and Line, and check the box in front of Line Fit Plots. Select OK.
In addition to the Summary Output you now have a Residual Output table and a Chart in your
new worksheet. The Residual Output table is only partially shown below, and shown after
AutoFitting the Column Width (see Section 2.2.2 for more details on that).
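The Predicted Y values, ŷ_i, have been computed for all the original observed x_i values, similarly to the way we computed ŷ for x = 35 (see Section 2.3.1). The Residuals are the differences between the observed and the predicted values:
$$\hat{e}_i = y_i - \hat{y}_i = y_i - b_1 - b_2 x_i \qquad (2.5)$$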
You can compare the Predicted Y and Residuals values reported in the Excel Residual Output
to the ones reported in Table 2.3 of Principles of Econometrics, 4e (p. 66). They should be the
same.
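The Predicted Y and Residuals columns can be reproduced with a few lines of code. The sketch below is a minimal Python/NumPy illustration, not what Excel runs internally; the seven (x, y) pairs are made-up numbers, not the food expenditure sample, so the results will not match Table 2.3. The function name fitted_and_residuals is our own.

```python
import numpy as np


def fitted_and_residuals(x, y):
    """Least squares fit of y = b1 + b2*x; return b1, b2, fitted values, residuals."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_dev = x - x.mean()
    b2 = np.sum(x_dev * (y - y.mean())) / np.sum(x_dev ** 2)
    b1 = y.mean() - b2 * x.mean()
    y_hat = b1 + b2 * x      # "Predicted Y" column of the Residual Output
    e_hat = y - y_hat        # "Residuals" column of the Residual Output
    return b1, b2, y_hat, e_hat


# Hypothetical data for illustration only
x = [3.7, 6.0, 11.2, 15.8, 20.1, 27.4, 33.4]
y = [115.2, 135.9, 220.0, 230.5, 320.8, 380.1, 375.7]
b1, b2, y_hat, e_hat = fitted_and_residuals(x, y)
for yh, eh in zip(y_hat, e_hat):
    print(f"{yh:10.4f} {eh:10.4f}")
```

Because the model includes an intercept, the residuals sum to zero (up to rounding error), which is a quick sanity check you can also apply to Excel's Residuals column.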
Now the chart needs a little editing. For one, it looks like a Column chart rather than a Scatter chart. The scales could also be changed. Finally, the Chart and Axis titles are not very helpful as they stand.
Place your cursor anywhere in the Chart area and left-click, so that the Chart Tools are available to you again. Select the Design tab. Go to the group of commands to the far left, Type, and select Change Chart Type. In the Change Chart Type dialog box, select X Y (Scatter), and then Scatter with only Markers. Finally, select OK.
[Figure: the X Variable 1 Line Fit Plot as a scatter chart, with Y and Predicted Y plotted against X Variable 1]
Now that we have the correct chart type, we would like to draw a line through all the Predicted Y
points. Actually, since we are using those points to draw our regression line, what we want to
show is only the line. So, we will use the points to draw the line, and then get rid of those big
square points. This way our chart won't be as busy.
On your chart, select the Predicted Y points with your cursor. Your cursor should turn into a fat cross.
Right-click and select Format Data Series. A Format Data Series dialog box pops up. Select Line Color and Solid line, and change the line color to something different from that of the Y points. Select Marker Options, and change the Marker Type from Automatic to None. Select Close.
On your chart, select the Legend with your cursor, right-click and select Delete.
Change the Chart and Axis titles as you see fit. Below, we show you how you can change the
Chart title. You can follow a similar process to change the Axis titles.
You can select any of the titles and change the Font size by going back to the Home tab. Select
what you need on the Font group of commands.
You can reformat the y-axis (and/or the x-axis) by selecting it with your cursor, right-clicking, and selecting Format Axis.
If you proceed as you did before to edit your vertical axis (see Section 2.1.2a), you should obtain
the following:
[Figure: the reformatted chart, titled Figure 2.8 The fitted regression]
To resize the whole Chart area, put your cursor over its lower border until it turns into a double-headed arrow. Hold it, and drag it down until you are satisfied with the way your chart looks.
[Figure: the resized fitted regression chart]
You can delete the Gridlines by first selecting them, right-clicking, and then selecting Delete.
You can also reformat the Data Series Y by selecting the points, right-clicking, and selecting Format Data Series. Then proceed as you did before to change your markers' options (see Section 2.1.2c).
Your result might be (see also Figure 2.8, p. 54 in Principles of Econometrics, 4e):
[Figure: the final fitted regression chart for the food expenditure data]
To show that, under the assumptions of the simple linear regression model, E(b1) = β1 and E(b2) = β2, we first put ourselves in a situation where we know our population and regression parameters (i.e., we know the truth). We then use the least squares regression technique to uncover the truth (which we already know). This allows us to check the validity of the least squares regression technique, and specifically the unbiasedness of the least squares estimators.
First, let us restate the assumptions of the simple linear regression model (see p. 45 of Principles of Econometrics, 4e):
• The mean value of y, for each value of x, is given by the linear regression function:
$$E(y|x) = \mu_{y|x} = \beta_1 + \beta_2 x \qquad (2.6)$$
• For each value of x, the values of y are distributed about their mean value, following probability distributions that all have the same variance:
$$\operatorname{var}(y|x) = \sigma^2 \qquad (2.7)$$
• The sample values of y are all uncorrelated and have zero covariance, implying that there is no linear association among them:
$$\operatorname{cov}(y_i, y_j) = 0 \qquad (2.8)$$
• The variable x is not random and must take at least two different values.
• (optional) The values of y are normally distributed about their mean for each value of x:
$$y \sim N(\beta_1 + \beta_2 x, \sigma^2) \qquad (2.9)$$
In the specific and simplified case we are considering in this section, half of our hypothetical population of three-person households has a weekly income of $1000 (x = 10), and half of it has a weekly income of $2000 (x = 20). Because we are all-knowing, we know the values of our population parameters, and consequently the values of our regression parameters. Let $\mu_{y|x=10} = 200$, $\mu_{y|x=20} = 300$, and $\operatorname{var}(y|x=10) = \operatorname{var}(y|x=20) = \sigma^2 = 2500$. This implies β1 = 100 and β2 = 10.
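The implied values of β1 and β2 come from evaluating the population regression function E(y|x) = β1 + β2x at the two income levels and solving the two resulting equations:

$$
\begin{aligned}
\beta_1 + 10\beta_2 &= 200 \\
\beta_1 + 20\beta_2 &= 300 \\
\Rightarrow \quad \beta_2 &= \frac{300 - 200}{20 - 10} = 10, \qquad \beta_1 = 200 - 10\beta_2 = 100.
\end{aligned}
$$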
The probability distribution functions of weekly food expenditure, y, given an income level x = 10 and an income level x = 20, are assumed to be Normal. They look like this:
[Figure: normal densities f(y|x = 10) and f(y|x = 20), centered at 200 and 300]
The linear relationship between weekly food expenditure and weekly income looks like the
following:
[Figure: the population regression line E(y|x) = 100 + 10x, equal to 200 at x = 10 and 300 at x = 20]
Let us emphasize the difference between this section and Chapter 2 in Principles of Econometrics, 4e. In this section, we do know the truth. In other words, we have information on weekly food expenditure and weekly income for all the three-person households that constitute our population. In Chapter 2 of Principles of Econometrics, 4e, as is the case in real life, you do not have that population information. You must thus rely solely on your random sample information to make inferences about your population.
Now, as an exercise, and as a way to illustrate the unbiasedness of the least squares estimators, we are going to use the least squares regression technique to uncover the truth.
Insert a new worksheet in your workbook by selecting the Insert Worksheet tab at the bottom of your screen (or press the Shift and F11 keys). Name it Simulation.
We are going to draw a random sample of 40 households from our population. Half of the sample
is drawn from the first type of households, with weekly income x = 10; and half of the sample is
drawn from the second type of households, with weekly income x = 20.
Let us keep a record of the level of weekly income for our 40 households in column A of our Simulation worksheet: in cell A1, type x and right-align it; in cells A2:A21, enter the value 10; in cells A22:A41, enter the value 20.
     A
1    x
2    10
3    10
...  ...
21   10
22   20
23   20
...  ...
41   20
We use the Random Number Generation analysis tool to draw our random sample of households. We record their weekly food expenditure in column B of our Simulation worksheet: type y in B1 and right-align it.
     A    B
1    x    y
Select the Data tab, in the middle of your tab list. On the Analysis group of commands to the far
right of the ribbon, select Data Analysis.
The Data Analysis dialog box pops up. In it, select Random Number Generation (you might need to use the scroll bar to the right of the Analysis Tools window to find it), then select OK.
A Random Number Generation dialog box pops up. Since we are drawing one random sample, we specify 1 in the Number of Variables window. We first draw a random sample of 20 from households with a weekly income of x = 10, so we specify the Number of Random Numbers to be 20. For simplicity, we assumed that weekly food expenditure in our population of households is normally distributed, so this is the distribution we choose. Once you have selected Normal in the Distribution window, you will be able to specify its Parameters: for x = 10, its Mean is $\mu_{y|x=10} = 200$ and its Standard deviation is $\sqrt{\operatorname{var}(y|x=10)} = \sigma = 50$. Select the Output Range in the Output options section, and specify it to be B2:B21 in your Simulation worksheet. Finally, select OK.
Repeat to draw a random sample of 20 from households with a weekly income of x = 20. Change the Mean to $\mu_{y|x=20} = 300$ and the Output Range to B22:B41.
Here is the random sample that we obtained. NOTE: you will obtain a different random sample,
due to the nature of random sampling.
     A    B
1    x    y
...
3    10   163.1711
...
5    10   294.1295
6    10   192.9407
...
18   10   116.1414
...
22   20   214.6751
23   20   336.5785
24   20   303.5467
25   20   216.4365
26   20   358.9562
27   20   278.1513
...
39   20   273.6785
Next, we use the LINEST function to obtain the least squares estimates for the intercept and
slope parameters, based on the random sample we just drew. The LINEST function is an
alternative to using the Least Squares Estimators' Formulas (see Section 2.2.1) or the Excel
Regression Analysis Routine (see Section 2.2.2). It allows us to quickly get the least squares
estimates for the intercept and slope parameters. For this purpose, the general syntax of the
LINEST function is as follows:
= LINEST(y's, x's)
The first argument of the LINEST function specifies the y values, and the second argument specifies the x values, on which the least squares estimates are based. In our case, we thus need to specify:
= LINEST(B2:B41,A2:A41)
The LINEST function creates a table where it stores the least squares estimates in Excel memory.
It first reports the slope coefficient estimate, and then the intercept coefficient estimate. So, if we
were to look into Excel memory, the estimates would be reported as shown below:
          column 1   column 2
row 1     b2         b1
We nest the LINEST function in the INDEX function to get the estimated coefficients one at a time. The INDEX function returns values from within a table. In the case of a table with only one row, the general syntax of the INDEX function is as follows:
= INDEX(table of results, column_num)
The first argument of the INDEX function specifies which table to get the results from. In our
case, this is the table of results generated by the LINEST function above. So, we replace "table of
results" by "LINEST(B2:B41,A2:A41)". The second argument indicates from which column of
the table to retrieve the result of interest to us. So, if we want to retrieve the estimate of the
intercept coefficient, b1, from the table above, we would indicate that it can be found in column 2
by replacing "column_num" by "2".
We are going to report our estimated coefficients at the bottom of our table. In cell A43, type b1 =; in cell A44, type b2 =. Bold those labels. In cells B43 and B44, type the following formulas, respectively:
     A      B
43   b1 =   =INDEX(LINEST(B2:B41,A2:A41),2)
44   b2 =   =INDEX(LINEST(B2:B41,A2:A41),1)

With our random sample, these formulas return:
43   b1 =   67.64114
44   b2 =   11.47325
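To see what the nested formula does, here is a rough Python analogue. The helpers linest and index are hypothetical names that only mimic the parts of Excel's LINEST and INDEX used here (a one-row coefficient table ordered slope first, and a 1-based column number); they are not Excel's implementation. The sample is simulated with NumPy under the same assumptions as above: 20 draws from N(200, 50²) for x = 10 and 20 draws from N(300, 50²) for x = 20.

```python
import numpy as np


def linest(known_ys, known_xs):
    """Coefficient row of a one-regressor least squares fit: [slope, intercept], like LINEST."""
    x = np.asarray(known_xs, dtype=float)
    y = np.asarray(known_ys, dtype=float)
    x_dev = x - x.mean()
    slope = np.sum(x_dev * (y - y.mean())) / np.sum(x_dev ** 2)
    intercept = y.mean() - slope * x.mean()
    return [slope, intercept]


def index(row, column_num):
    """Return an entry of a one-row table; column_num is 1-based, as in Excel's INDEX."""
    return row[column_num - 1]


rng = np.random.default_rng(seed=1)
x = np.repeat([10.0, 20.0], 20)                  # column A: A2:A21 = 10, A22:A41 = 20
y = np.concatenate([rng.normal(200, 50, 20),     # B2:B21, households with x = 10
                    rng.normal(300, 50, 20)])    # B22:B41, households with x = 20

b1 = index(linest(y, x), 2)   # like =INDEX(LINEST(B2:B41,A2:A41),2)
b2 = index(linest(y, x), 1)   # like =INDEX(LINEST(B2:B41,A2:A41),1)
print(f"b1 = {b1:.5f}, b2 = {b2:.5f}")
```

As in the spreadsheet, your numbers will differ from ours unless the same random draws are used.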
The estimates of the intercept and slope coefficients are based on one random sample. Our
random sample is different from yours, and each random sample yields different estimates, which
may or may not be close to the true parameter values. The property of unbiasedness is about the
average values of b1 and b2 if many samples of the same size are drawn from the same
population. In the next section, we are thus going to repeat our sampling and least squares
estimation exercise.
Go back to the Random Number Generation dialog box. We would like to draw 9 additional random samples, so we specify 9 in the Number of Variables window. Again, we first draw random samples of 20 from households with a weekly income of x = 10, so we specify the Number of Random Numbers to be 20. We also select Normal in the Distribution window and specify its Parameters: for x = 10, its Mean is $\mu_{y|x=10} = 200$ and its Standard Deviation is $\sqrt{\operatorname{var}(y|x=10)} = \sigma = 50$. Specify the Output Range to be C2:K21. Finally, select OK.
Repeat to draw random samples of 20 from households with a weekly income of x = 20. Change the Mean to $\mu_{y|x=20} = 300$ and the Output Range to C22:K41.
Next, before we copy the formulas that give our coefficient estimates, we need to change the relative cell references A2:A41 into absolute cell references $A$2:$A$41, since we will be using the same x values for our next 9 rounds of least squares estimates.
Copy the formulas from B43:B44 into C43:K44. In cells L43:L44 compute the AVERAGEs of
your estimates from your 10 samples. In cell L43, you should have =AVERAGE(B43:K43); in
cell L44, you should have =AVERAGE(B44:K44). The estimates and average values that we get
for our 10 samples are:
Column        b1 (row 43)   b2 (row 44)
B             67.64114      11.47326
C             65.92893      12.2687
D             …             8.813088
E             50.41892      11.73885
F             102.9383      10.11185
G             …             …
H             68.02508      11.5521
I             30.43498      10.8758
J             132.2953      8.048971
K             75.4688       11.33003
L (average)   89.14425      10.48296
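The whole sampling experiment can also be sketched in a few lines of Python, which makes it easy to go well beyond 10 samples. This is an illustrative NumPy simulation under the population assumptions of this section (β1 = 100, β2 = 10, σ = 50, 20 households at each income level), not a reproduction of our Excel draws; the function names are our own.

```python
import numpy as np

BETA1, BETA2, SIGMA = 100.0, 10.0, 50.0    # true population parameters of this section
X = np.repeat([10.0, 20.0], 20)            # 20 households at x = 10, 20 at x = 20


def least_squares(x, y):
    """Return (b1, b2) for a simple linear regression of y on x."""
    x_dev = x - x.mean()
    b2 = np.sum(x_dev * (y - y.mean())) / np.sum(x_dev ** 2)
    return y.mean() - b2 * x.mean(), b2


def average_estimates(n_samples, seed=0):
    """Draw n_samples samples of 40 households; return the averages of b1 and b2."""
    rng = np.random.default_rng(seed)
    estimates = np.empty((n_samples, 2))
    for s in range(n_samples):
        y = rng.normal(BETA1 + BETA2 * X, SIGMA)   # y | x ~ N(beta1 + beta2*x, sigma^2)
        estimates[s] = least_squares(X, y)
    return estimates.mean(axis=0)


for n in (10, 100, 1000):
    avg_b1, avg_b2 = average_estimates(n)
    print(f"{n:>5} samples: average b1 = {avg_b1:8.3f}, average b2 = {avg_b2:7.4f}")
```

As the number of samples grows, the averages should tend to settle near β1 = 100 and β2 = 10, which is what unbiasedness predicts, even though individual estimates vary a lot from sample to sample.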
If we took the averages of the estimates from many samples, these averages would approach the true parameter values β1 and β2. To show you that this is the case, we repeated the exercise. Here are the average values of b1 and b2 that we obtained as we increased the number of samples from 10 to 100, and finally to 1000:
The next section of this chapter is very short. It points out how you can compute estimates of the variances and covariance of the least squares estimators b1 and b2 using Excel. It also outlines other numbers you can recognize in the Excel summary output. Note that for this section we are returning to the food expenditure and income data of Sections 2.1-2.3, i.e., data from one sample of 40 households drawn from a population with unknown parameters.
You can compute estimates of the variances and covariance of the least squares estimators b1 and b2 the same way you computed b1 and b2: consider their algebraic expressions (see below or p. 65 of Principles of Econometrics, 4e), and perform the simple arithmetic operations needed. You might want to do that as an exercise; you will be able to check your work by comparing your estimates to the ones reported on pp. 66-67 of Principles of Econometrics, 4e.
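Estimates of the variances and covariance of the least squares estimators b1 and b2 are given by:
$$\widehat{\operatorname{var}}(b_1) = \hat{\sigma}^2 \left[ \frac{\sum x_i^2}{N \sum (x_i - \bar{x})^2} \right] \qquad (2.10)$$
$$\widehat{\operatorname{var}}(b_2) = \frac{\hat{\sigma}^2}{\sum (x_i - \bar{x})^2} \qquad (2.11)$$
$$\widehat{\operatorname{cov}}(b_1, b_2) = \hat{\sigma}^2 \left[ \frac{-\bar{x}}{\sum (x_i - \bar{x})^2} \right] \qquad (2.12)$$
and
$$\hat{\sigma}^2 = \frac{\sum \hat{e}_i^2}{N - 2} \qquad (2.13)$$
is an estimate of the error variance.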
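The square roots of the estimated variances are the standard errors of b1 and b2. They are denoted as se(b1) and se(b2):
$$\operatorname{se}(b_1) = \sqrt{\widehat{\operatorname{var}}(b_1)}, \qquad \operatorname{se}(b_2) = \sqrt{\widehat{\operatorname{var}}(b_2)} \qquad (2.14)$$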
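The variance and covariance formulas translate directly into code. Below is a minimal Python/NumPy sketch, with made-up data (not the food expenditure sample), that computes the error variance estimate, the estimated variances and covariance of b1 and b2, and their standard errors; the function name ls_variances is our own. The same numbers can be cross-checked against the standard matrix expression σ̂²(X′X)⁻¹, where X has a column of ones and a column of x values.

```python
import numpy as np


def ls_variances(x, y):
    """Estimated var(b1), var(b2), cov(b1, b2) and standard errors for y = b1 + b2*x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    x_bar = x.mean()
    sxx = np.sum((x - x_bar) ** 2)                      # sum of (x_i - x_bar)^2
    b2 = np.sum((x - x_bar) * (y - y.mean())) / sxx
    b1 = y.mean() - b2 * x_bar
    e_hat = y - b1 - b2 * x                             # least squares residuals
    sigma2_hat = np.sum(e_hat ** 2) / (n - 2)           # estimate of the error variance
    var_b1 = sigma2_hat * np.sum(x ** 2) / (n * sxx)
    var_b2 = sigma2_hat / sxx
    cov_b1_b2 = -sigma2_hat * x_bar / sxx
    return {"sigma2_hat": sigma2_hat, "var_b1": var_b1, "var_b2": var_b2,
            "cov_b1_b2": cov_b1_b2, "se_b1": np.sqrt(var_b1), "se_b2": np.sqrt(var_b2)}


# Hypothetical data for illustration only
x = [3.7, 6.0, 11.2, 15.8, 20.1, 27.4, 33.4]
y = [115.2, 135.9, 220.0, 230.5, 320.8, 380.1, 375.7]
res = ls_variances(x, y)
for name, value in res.items():
    print(f"{name:>10}: {value:.6f}")
```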
The Excel regression routine does not automatically generate estimates of the variances and covariance of the least squares estimators b1 and b2, but it does compute the standard errors of b1 and b2, as well as other intermediate results.
Specifically, the following estimates can be found in the Excel Summary Output you generated earlier:

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.620485472
R Square             0.385002221
Adjusted R Square    0.368818059
Standard Error       89.51700429
Observations         40

ANOVA
             df   SS            MS            F             Significance F
Regression   1    190626.9788   190626.9788   23.78884107   1.94585E-05
Note that $\sum \hat{e}_i^2$, the Sum of Squared Residuals (SS Residual), is also referred to as the Sum of Squared Errors, hence the abbreviation SSE used on p. 51 of Principles of Econometrics, 4e.
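Several entries in the Regression Statistics block are simple functions of one another. The short Python sketch below assumes K = 2 estimated coefficients (intercept and slope) and uses the R Square and Observations values shown above to recover Multiple R and Adjusted R Square; the function name regression_statistics is our own.

```python
import math


def regression_statistics(r_square, n_obs, k=2):
    """Multiple R and Adjusted R Square implied by R Square, N observations and K coefficients."""
    multiple_r = math.sqrt(r_square)                              # Multiple R is the square root of R^2
    adj_r_square = 1 - (1 - r_square) * (n_obs - 1) / (n_obs - k)
    return multiple_r, adj_r_square


multiple_r, adj_r_square = regression_statistics(0.385002221, n_obs=40)
print(f"Multiple R = {multiple_r:.9f}, Adjusted R Square = {adj_r_square:.9f}")
```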
Open the Excel file br. Excel opens the data set in Sheet 1 of a new Excel file. Since we would like to save all our work from Chapter 2 in one file, create a new worksheet in your POE Chapter 2 Excel file, name it br data, and copy into it the data set you just opened.
This data set contains data on 1080 houses sold in Baton Rouge, LA during mid-2005, which we are using to estimate the following quadratic model for house prices:
$$PRICE = \alpha_1 + \alpha_2 SQFT^2 + e \qquad (2.15)$$
In your br data worksheet, insert a column to the right of the sqft column B (see Section 1.4 for more details on how to do that). In your new cells C1:C2, enter the following column label and formula.
     C
1    sqft2
2    =B2^2
Copy the content of cell C2 to cells C3:C1081. Here is how your table should look (only the first five values are shown below):
     A        B      C
1    price    sqft   sqft2
2    66500    741    549081
3    66000    741    549081
4    68500    790    624100
5    102000   2783   7745089
6    54000    1165   1357225
In the Regression dialog box, the Input Y Range should be A2:A1081, and the Input X Range should be C2:C1081. Select New Worksheet Ply and name it Quadratic Model. Finally, select OK.
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.832075415
R Square             0.692349497
Adjusted R Square    0.692064107
Standard Error       68205.74032
Observations         1080

ANOVA
             df     SS           MS           F             Significance F
Regression   1      1.1286E+13   1.1286E+13   2425.976064   3.3748E-278
Residual     1078   …            …
Go back to your br data worksheet and select A2:B1081. Select the Insert tab located next to the
Home tab. In the Charts group of commands select Scatter, and then Scatter with only
Markers.
[Figure: the default scatter chart, with price on the horizontal axis and sqft on the vertical axis (legend: Series1)]
You can see that our house price values are on the horizontal axis and the square footage values are on the vertical axis; we would like to swap them and edit our chart as we did in Section 2.1 with our plot of the food expenditure data. The result is (see also Figure 2.14 on p. 70 in Principles of Econometrics, 4e):
[Figure: scatter plot of house price against square feet]
Finally, we add the fitted quadratic relationship to our scatter plot. In cells N1:N2 and O1:O3 of your br data worksheet, enter the following column labels, formula, and data.
     N                                                                 O
1    quadratic pricehat                                                sqft
2    ='Quadratic Model'!$B$17+'Quadratic Model'!$B$18*'br data'!O2^2   0
3                                                                      400
Select cells O2:O3 and move your cursor to the lower right corner of your selection until it turns into a skinny cross; left-click, hold, and drag it down to cell O22: Excel recognizes the series and automatically completes it for you. Next, copy the content of cell N2 to cells N3:N22. Here is how your table should look (only the first five values are shown below):
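The fitted quadratic relationship is α̂1 + α̂2·sqft² evaluated on the grid 0, 400, ..., 8000 held in O2:O22. The Python sketch below regresses price on the squared house size and evaluates the fitted curve on that grid. To keep the example self-contained, it fits the model to the five rows of br data shown earlier only, so its estimates are illustrative and are not the ones in your Quadratic Model worksheet; the function name fit_quadratic_price is our own.

```python
import numpy as np


def fit_quadratic_price(price, sqft):
    """Least squares fit of PRICE = alpha1 + alpha2 * SQFT^2 (price regressed on sqft squared)."""
    z = np.asarray(sqft, dtype=float) ** 2          # the sqft2 column
    p = np.asarray(price, dtype=float)
    z_dev = z - z.mean()
    alpha2 = np.sum(z_dev * (p - p.mean())) / np.sum(z_dev ** 2)
    alpha1 = p.mean() - alpha2 * z.mean()
    return alpha1, alpha2


# First five houses of br data shown earlier (illustration only, not all 1080 houses)
price = [66500, 66000, 68500, 102000, 54000]
sqft = [741, 741, 790, 2783, 1165]
alpha1, alpha2 = fit_quadratic_price(price, sqft)

grid = np.arange(0, 8001, 400)                          # column O: 0, 400, ..., 8000
pricehat = alpha1 + alpha2 * grid.astype(float) ** 2    # column N: fitted quadratic relationship
print(np.column_stack([grid, pricehat])[:5])
```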
Go back to your scatter plot and right-click in the middle of your chart area. Select Select Data. In the Legend Entries (Series) window of the Select Data Source dialog box, select the Add button. In the Series name window, type Fitted Quadratic Relationship. Select O2:O22 for the Series X values and N2:N22 for the Series Y values. Finally, select OK. The Fitted Quadratic Relationship series has been added to your graph.
Before you close the Select Data Source dialog box, select Series1 and Edit. Type the name Actual in the Series name window. Select OK. In the Select Data Source window that reappears, select OK again.
Make sure your chart is selected so that the Chart Tools are visible. In the Layout tab, go to the Labels group of commands. Select the Legend button and choose either one of the Overlay Legend options. Grab your legend with your cursor and move it to the upper left corner of your chart area.
Finally, we want to reformat our Fitted Quadratic Relationship series. Select the plotted series in your chart area, right-click and select Format Data Series. A Format Data Series dialog box pops up. Select Line Color and Solid line, and change the line color to something different from that of the Actual series points. Select Marker Options, and change the Marker Type from Automatic to None. Select Close.
[Figure: house prices (Actual) and the Fitted Quadratic Relationship plotted against square feet]
     D
1    ln(price)
2    =LN(A2)
Copy the content of cell D2 to cells D3:D1081. Here is how your table should look (only the first five values are shown below):
     A        B      C         D
1    price    sqft   sqft2     ln(price)
2    66500    741    549081    11.10496
3    66000    741    549081    11.09741
4    68500    790    624100    11.13459
5    102000   2783   7745089   11.53273
6    54000    1165   1357225   10.89674
Next, we specify BIN values. These values will determine the range of PRICE and ln(PRICE)
values for each column of the histogram. The bin values have to be given in ascending order.
Starting with the lowest bin value, a PRICE or ln(PRICE) value will be counted in a particular
bin if it is equal to or less than the bin value.
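The counting rule just described can be written out explicitly. The Python sketch below is our own illustration of that rule: each value goes to the first bin whose value is greater than or equal to it, and anything above the last bin goes to an extra category, which the Histogram tool reports as "More". The function name histogram_counts is hypothetical.

```python
from bisect import bisect_left


def histogram_counts(values, bins):
    """Frequency of values per bin; a value counts in the first bin whose value is >= it.

    bins must be in ascending order. The extra last count collects values above the
    largest bin (the "More" row of the Histogram tool output).
    """
    counts = [0] * (len(bins) + 1)
    for v in values:
        counts[bisect_left(bins, v)] += 1
    return counts


price_bins = list(range(0, 1_600_001, 50_000))   # S2:S34: 0, 50000, ..., 1600000
prices = [66500, 66000, 68500, 102000, 54000]    # first five prices in br data
counts = histogram_counts(prices, price_bins)
print({b: c for b, c in zip(price_bins, counts) if c})
```

A value exactly equal to a bin value, such as 50000, is counted in that bin, not the next one.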
In cells S1:T3 of your br data worksheet, enter the following column labels and data.
     S           T
1    price bin   lnprice bin
2    0           9
3    50000       9.2
Select cells S2:S3 and move your cursor to the lower right corner of your selection until it turns into a skinny cross; left-click, hold, and drag it down to cell S34: Excel recognizes the series and automatically completes it for you. Similarly, select cells T2:T3, move your cursor to the lower right corner of your selection until it turns into a skinny cross, then left-click, hold, and drag it down to cell T29. Here is how your table should look (only the first five values are shown below):
     S           T
1    price bin   lnprice bin
2    0           9
3    50000       9.2
4    100000      9.4
5    150000      9.6
6    200000      9.8
Select the Data tab, in the middle of your tab list. On the Analysis group of commands to the far
right of the ribbon, select Data Analysis.
The Data Analysis dialog box pops up. In it, select Histogram (you might need to use the scroll
up and down bar to the right of the Analysis Tools window to find it), then select OK.
A Histogram dialog box pops up. For the Input Range, specify A2:A1081; for the Bin Range, specify S2:S34. The Input Range indicates the data Excel will look at to determine how many values fall in each bin of the Bin Range. Check the New Worksheet Ply option and name it Price Histogram; check the box next to Chart Output. Finally, select OK.
Select the columns in your chart area, right-click and select Format Data Series. The Series Options tab of the Format Data Series dialog box should be open. Move the Gap Width slider to the far left, towards No Gap.
After editing our chart as we did in Section 2.1 with our plot of the food expenditure data, the result is (see also Figure 2.16(a) on p. 72 in Principles of Econometrics, 4e):
[Figure: histogram of house prices (absolute frequencies)]
Note that the frequencies given in the graph above are absolute ones, while the frequencies given in Figure 2.16(a) of Principles of Econometrics, 4e are relative ones.
Go back to your br data worksheet. In the Histogram dialog box, specify D2:D1081 for the
Input Range and T2:T29 for the Bin Range. Check the New Worksheet Ply option and name it
lnPrice Histogram; check the box next to Chart Output. Finally, select OK.
The final result is (see also Figure 2.16(b) on p. 72 in Principles of Econometrics, 4e):
[Figure: histogram of lnPrice (absolute frequencies)]
Again, note that the frequencies given in the graph above are absolute ones, while the frequencies given in Figure 2.16(b) of Principles of Econometrics, 4e are relative ones.
In the Regression dialog box, the Input Y Range should be D2:D1081, and the Input X Range should be B2:B1081. Select New Worksheet Ply and name it LogLinear Model. Finally, select OK.
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.790413619
R Square             0.624753689
Adjusted R Square    0.624405594
Standard Error       0.321465013
Observations         1080

ANOVA
             df     SS            MS            F             Significance F
Regression   1      185.4720974   185.4720974   1794.779738   …
Residual     1078   111.4002553   0.103339754
In cells Q1:Q2 of your br data worksheet, enter the following column label and formula.
     Q
1    loglinear pricehat
2    =EXP('LogLinear Model'!$B$17+'LogLinear Model'!$B$18*'br data'!P2)
Next, copy the content of cell Q2 to cells Q3:Q22. Here is how your table should look (only the first five values are shown below):
     Q
1    loglinear pricehat
2    50949.81045
3    60060.27135
4    70799.79617
5    83459.68183
6    98383.31279
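The Q column back-transforms the log-linear predictions with EXP. Here is a minimal Python sketch of the same steps, again fitted to the five br data rows shown earlier only (so the coefficients are illustrative, not those of your LogLinear Model worksheet): regress ln(price) on sqft, then evaluate exp(b1 + b2·sqft) on the grid 0, 400, ..., 8000. One property worth checking is that consecutive fitted values grow by the constant factor exp(400·b2), which is why the fitted log-linear curve rises ever more steeply.

```python
import numpy as np

# First five houses of br data shown earlier (illustration only, not all 1080 houses)
price = np.array([66500, 66000, 68500, 102000, 54000], dtype=float)
sqft = np.array([741, 741, 790, 2783, 1165], dtype=float)

ln_price = np.log(price)                              # like =LN(A2) in column D
s_dev = sqft - sqft.mean()
b2 = np.sum(s_dev * (ln_price - ln_price.mean())) / np.sum(s_dev ** 2)
b1 = ln_price.mean() - b2 * sqft.mean()

grid = np.arange(0, 8001, 400, dtype=float)           # square-footage grid (column P)
pricehat = np.exp(b1 + b2 * grid)                     # like =EXP(b1 + b2*P2) in column Q
growth = pricehat[1:] / pricehat[:-1]                 # ratio of consecutive fitted values
print(pricehat[:5])
print(growth[:3], np.exp(400 * b2))
```

With the full-sample estimates, the same constant ratio shows up in the Q column: each fitted value is about 1.1788 times the previous one.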
Select your scatter plot of actual data points and fitted quadratic relationship and make a copy of it. Right-click in the middle of the copy of your chart. Select Select Data. In the Legend Entries (Series) window of the Select Data Source dialog box, select the Fitted Quadratic Relationship series, and then the Edit button. In the Series name window, replace the old name with Fitted LogLinear Relationship. Select P2:P22 for the Series X values and Q2:Q22 for the Series Y values. Finally, select OK twice. The Fitted LogLinear Relationship series now replaces the quadratic fit in your copied chart.
[Figure: house prices (Actual) and the Fitted LogLinear Relationship plotted against square feet]
Open the Excel file utown. Excel opens the data set in Sheet 1 of a new Excel file. Since we
would like to save all our work from Chapter 2 in one file, create a new worksheet in your POE
Chapter 2 Excel file, name it utown data, and copy into it the data set you just opened.
This data file contains a sample of 1000 observations on house prices in two neighborhoods. One
neighborhood, called University Town, is near a major university. The other, a similar
neighborhood called Golden Oaks, is a few miles away from the university.
In cells H1:H3 of your utown data worksheet, enter the following column label and data.
H
1 bin
2 125
3 137.5
Select cells H2:H3 and move your cursor to the lower right corner of your selection until it turns into
a skinny cross; left-click, hold the button down, and drag down to cell H20. Here is how your
table should look (only the first five values are shown below):
64 Chapter 2
    H
1   bin
2   125
3   137.5
4   150
5   162.5
6   175
In the Histogram dialog box, specify A2:A482 for the Input Range and H2:H20 for the Bin
Range. Check the New Worksheet Ply option and name it Golden Oaks Prices Histogram;
check the box next to Chart Output. Finally, select OK.
The final result is (see also Figure 2.18 on p. 74 in Principles of Econometrics, 4e):
[Chart: histogram of Golden Oaks house prices, with bins from 125 to 350 on the horizontal axis and absolute frequency on the vertical axis]
Note that the frequencies given in the graph above are absolute ones, while the frequencies given
in Figure 2.18 of Principles of Econometrics, 4e are relative ones.
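As we understand the Histogram tool's rule, a price is counted in the first bin whose upper limit (a value in H2:H20) is greater than or equal to that price, and prices above the last limit go into an extra "More" bin. The Python sketch below reproduces that counting rule and converts absolute frequencies into relative ones, which is the difference between the Excel chart and Figure 2.18. The sample prices are made up, chosen only to exercise the bin boundaries.

```python
from bisect import bisect_left

def make_bins(start=125.0, step=12.5, count=19):
    """Bin upper limits like H2:H20: 125, 137.5, ..., 350."""
    return [start + step * k for k in range(count)]

def histogram_counts(values, bins):
    """Absolute frequencies: a value lands in the first bin whose upper limit
    is >= the value; values above the last limit go in a final 'More' slot.
    Returns len(bins) + 1 counts; bins must be sorted in ascending order."""
    counts = [0] * (len(bins) + 1)
    for v in values:
        counts[bisect_left(bins, v)] += 1
    return counts

def relative_frequencies(counts):
    """Convert absolute frequencies to relative frequencies that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]

bins = make_bins()
sample_prices = [120.0, 125.0, 130.0, 137.5, 149.9, 360.0]  # made-up prices
counts = histogram_counts(sample_prices, bins)
shares = relative_frequencies(counts)
```

A price exactly equal to a bin limit (125 or 137.5 here) is counted in that bin, not the next one, which is why `bisect_left` is used.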
Go back to your utown data worksheet. In the Histogram dialog box, specify A483:A1001 for
the Input Range and H2:H20 for the Bin Range. Check the New Worksheet Ply option and
name it U Town Prices Histogram; check the box next to Chart Output. Finally, select OK.
The Simple Linear Regression Model 65
The final result is (see also Figure 2.18 on p. 74 in Principles of Econometrics, 4e):
[Chart: histogram of University Town house prices, with bins from 125 to 350 on the horizontal axis and absolute frequency on the vertical axis]
In the Regression dialog box, the Input Y Range should be A2:A1001, and the Input X Range
should be D2:D1001. Select New Worksheet Ply and name it Indicator Variable Model.
Finally, select OK.
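With a single 0/1 indicator as the regressor, least squares has a simple interpretation: the intercept equals the sample mean of y for observations where the indicator is 0, and intercept plus slope equals the sample mean where it is 1. Assuming column D holds an indicator equal to 1 for University Town houses and 0 for Golden Oaks houses, b1 should therefore equal the average Golden Oaks price and b1 + b2 the average University Town price. The Python check below uses a handful of made-up prices, not the utown data.

```python
def ols_simple(y, x):
    """Least squares intercept b1 and slope b2 for y = b1 + b2*x."""
    n = len(y)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b2 = sxy / sxx
    b1 = y_bar - b2 * x_bar
    return b1, b2

def group_mean(y, d, value):
    """Mean of y over the observations whose indicator d equals value."""
    selected = [yi for yi, di in zip(y, d) if di == value]
    return sum(selected) / len(selected)

# Made-up prices and a 0/1 indicator (1 = University Town in this toy example).
price = [200.0, 210.0, 220.0, 270.0, 280.0]
utown = [0, 0, 0, 1, 1]

b1, b2 = ols_simple(price, utown)
mean_golden_oaks = group_mean(price, utown, 0)       # should equal b1
mean_university_town = group_mean(price, utown, 1)  # should equal b1 + b2
```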
The result is (matching the one reported on p. 75 in Principles of Econometrics,
4e):
    A                  B              C                D             E             F             G             H             I
1   SUMMARY OUTPUT
3   Regression Statistics
4   Multiple R         0.728744479
5   R Square           0.531068516
6   Adjusted R Square  0.530598645
7   Standard Error     28.90745008
8   Observations       1000
10  ANOVA
11                     df             SS               MS            F             Significance F
12  Regression         1              944476.7536      944476.7536   1130.242684   2.6479E-166
13  Residual           998            833969.3888      835.6406701
14  Total              999            1778446.142
16                     Coefficients   Standard Error   t Stat        P-value       Lower 95%     Upper 95%     Lower 95.0%   Upper 95.0%
17  Intercept          215.7324947    1.318066258      163.6734812   0             213.1459956   218.3189939   213.1459956   218.3189939
18  X Variable 1       61.5091067     1.829589113      33.61908214   2.6479E-166   57.9188238    65.09938951   57.9188238    65.09938951

This ends Chapter 2 of this manual. You might want to save your work before you close shop.
CHAPTER 3
Interval Estimation and Hypothesis Testing
CHAPTER OUTLINE
3.1 Interval Estimation
3.1.1 The t-Distribution
3.1.1a The t-Distribution versus Normal Distribution
3.1.1b t-Critical Values and Interval Estimates
3.1.1c Percentile Values
3.1.1d TINV Function
3.1.1e Appendix E: Table 2 in POE
3.1.2 Obtaining Interval Estimates
3.1.3 An Illustration
3.1.3a Using the Interval Estimator Formula
3.1.3b Excel Regression Default Output
3.1.3c Excel Regression Confidence Level Option
3.1.4 The Repeated Sampling Context (Advanced Material)
3.1.4a Model Assumptions
3.1.4b Repeated Random Sampling
3.1.4c The LINEST Function Revisited
3.1.4d The Simulation Template
3.1.4e The IF Function
3.1.4f The OR Function
3.1.4g The COUNTIF Function
3.2 Hypothesis Tests
3.2.1 One-Tail Tests with Alternative "Greater Than" (>)
3.2.2 One-Tail Tests with Alternative "Less Than" (<)
3.2.3 Two-Tail Tests with Alternative "Not Equal To" (≠)
3.3 Examples of Hypothesis Tests
3.3.1 Right-Tail Tests
3.3.1a One-Tail Test of Significance
3.3.1b One-Tail Test of an Economic Hypothesis
3.3.2 Left-Tail Tests
3.3.3 Two-Tail Tests
3.3.3a Two-Tail Test of an Economic Hypothesis
3.3.3b Two-Tail Test of Significance
3.4 The p-Value
3.4.1 The p-Value Rule
3.4.1a Definition of p-Value
3.4.1b Justification for the p-Value Rule
3.4.2 The TDIST Function
3.4.3 Examples of Hypothesis Tests Revisited
3.4.3a Right-Tail Test from Section 3.3.1b
3.4.3b Left-Tail Test from Section 3.3.2
3.4.3c Two-Tail Test from Section 3.3.3a
3.4.3d Two-Tail Test from Section 3.3.3b
68 Chapter 3
In this chapter we will use the t-distribution to construct interval estimates and perform
hypothesis tests. We continue to work with the simple linear regression model of weekly food
expenditure.
Rename Sheet 1 Data. Quickly re-estimate the regression parameters using Excel's regression
analysis routine as in Section 2.2.2. In the Regression dialog box, the Input Y Range should be
A2:A41, and the Input X Range should be B2:B41. Select New Worksheet Ply and name it
Regression; you do not need to check the box next to Line Fit Plots.
The t-distribution is a bell-shaped curve, centered on and symmetric around its mean of zero. It
looks like the standard normal distribution, except that it is more spread out, with a larger variance
and thicker tails. The exact shape of the t-distribution is controlled by a single parameter called
the degrees of freedom, often abbreviated as df. The notation t(m) is used to specify a
t-distribution with m degrees of freedom.
Below is a graph of the t-distribution with m = 3 degrees of freedom and the standard normal
distribution.
[Figure: probability density functions of the t(3) distribution and the standard normal N(0,1) distribution]
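To make the comparison in the graph concrete, the Python sketch below evaluates the t(m) density from its standard closed form (using log-gamma for numerical stability) next to the N(0,1) density and tabulates both on a grid from -6 to 6. The grid spacing is an arbitrary choice for illustration.

```python
import math

def t_pdf(t, m):
    """Density of the t-distribution with m degrees of freedom."""
    log_const = (math.lgamma((m + 1) / 2) - math.lgamma(m / 2)
                 - 0.5 * math.log(m * math.pi))
    return math.exp(log_const - (m + 1) / 2 * math.log1p(t * t / m))

def normal_pdf(z):
    """Density of the standard normal distribution N(0,1)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

# Tabulate both densities on a coarse grid, as one might do to chart them.
grid = [i / 2 for i in range(-12, 13)]  # -6.0, -5.5, ..., 6.0
table = [(z, t_pdf(z, 3), normal_pdf(z)) for z in grid]
```

At zero the t(3) density is about 0.368 against about 0.399 for N(0,1), while at 4 it is about 0.0092 against about 0.00013: lower in the middle, thicker in the tails.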
Interval Estimation and Hypothesis Testing 69
In order to construct interval estimates, we will need critical values of t-distributions with various
degrees of freedom. The abbreviation used for a critical value is tc. The values -tc and tc are the
endpoints of a closed interval around zero such that the probability of drawing a t-value in this
interval is (1 - α), and the probability is α that a value is either less than -tc or greater than tc.
Since the distribution is symmetric, the probability that a t-value is less than -tc is α/2, and
the probability that a t-value is greater than tc is α/2.
We are usually interested in the critical value tc such that the probability that a randomly drawn
t-value is within the closed interval [-tc, tc] is 0.95 or 0.99, which means that the probability of a
value outside the interval, in the tails of the distribution, is only 0.05 or 0.01.
Let α = 0.05. This leads to a closed interval [-tc, tc] such that the probability is (1 - α) = 0.95:
$$P(-t_c \le t_{(m)} \le t_c) = 1 - \alpha = 0.95$$
Since the probability is α/2 that a t-value is greater than tc, this also means that the probability
of drawing a t-value less than or equal to tc is (1 - α/2). The critical value tc is the 100(1 -
α/2) percentile of the t-distribution, denoted t(1-α/2, m).
We will use the TINV function to compute t-critical values. First, we create a new worksheet and
a table where we will store our computations.
Insert a new worksheet by selecting the Insert Worksheet tab at the lower left corner of your
screen, next to the Data tab. Name it t-critical value.
Select cell A1. Select the Insert tab, located next to the Home tab. In the Text group of commands,
select Symbol. In the Symbol dialog box, the Symbols tab should be open. Select α (you might
need to use the scroll bar to move up and down the window to find this symbol). Finally, select
Insert.
t-critical values are obtained in Excel by using the TINV function. The syntax of the TINV
function is as follows:
=TINV(α, m)
To find the t-critical value for α = 0.05 (the combined probability in the two tails) and m = 38,
given the way we organized our table above, we need to write the following formula in B3:
=TINV(B1,B2)
    A     B
1   α=    0.05
2   m=    38
3   tc=   2.024394
In cell B1, change α from 0.05 to 0.01. Here is how your table should look:
    A     B
1   α=    0.01
2   m=    38
3   tc=   2.711558
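Excel's TINV(α, m) returns the two-tailed critical value tc for which P(|t(m)| > tc) = α. To show what that calculation involves, here is a self-contained Python sketch that finds tc by bisection, computing the t(m) tail probability with Simpson's rule applied to the density. This is one possible numerical approach chosen for illustration, not how Excel computes TINV internally. The step count and iteration count are arbitrary choices that give ample accuracy for moderate m.

```python
import math

def t_pdf(t, m):
    """Density of the t-distribution with m degrees of freedom."""
    log_const = (math.lgamma((m + 1) / 2) - math.lgamma(m / 2)
                 - 0.5 * math.log(m * math.pi))
    return math.exp(log_const - (m + 1) / 2 * math.log1p(t * t / m))

def upper_tail(tc, m, steps=2000):
    """P(t(m) > tc) for tc >= 0: 0.5 minus the area from 0 to tc (Simpson's rule)."""
    h = tc / steps
    area = t_pdf(0.0, m) + t_pdf(tc, m)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, m)
    return 0.5 - area * h / 3

def tinv(alpha, m):
    """Two-tailed critical value tc with P(|t(m)| > tc) = alpha, like Excel's TINV."""
    lo, hi = 0.0, 1.0
    while 2 * upper_tail(hi, m) > alpha:   # widen until the root is bracketed
        hi *= 2
    for _ in range(50):                     # bisection on the two-tail probability
        mid = (lo + hi) / 2
        if 2 * upper_tail(mid, m) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

Because TINV splits α between the two tails, passing 2α yields a one-tail critical value; for example, tinv(0.10, 38) is approximately 1.686.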
Alternatively, we could have obtained those t-critical values from Table 2 at the end of Principles of
Econometrics, 4e. Recall that the critical value tc is also the 100(1 - α/2)th percentile of the
t-distribution, denoted t(1-α/2, m). For α = 0.05 and m = 38, the critical value tc is the 100(1 -
α/2) = 100(1 - 0.05/2) = 100(1 - 0.025) = 97.5th percentile of the t-distribution,
denoted t(.975,38). At the intersection of the column labeled "t(.975,df)" and the row for 38 degrees
of freedom (df), tc = 2.024.
For α = 0.01, holding m constant, the critical value tc is the 100(1 - α/2) = 100(1 -
0.01/2) = 100(1 - 0.005) = 99.5th percentile of the t-distribution, t(.995,38). Its value
is found at the intersection of the column labeled "t(.995,df)" and the row for 38 degrees of
freedom (df): tc = 2.712. Those t-critical values differ slightly from the ones we obtained
in Excel because Table 2 rounds them to three decimal places.
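$$P\left[b_k - t_c\,\mathrm{se}(b_k) \le \beta_k \le b_k + t_c\,\mathrm{se}(b_k)\right] = 1 - \alpha \tag{3.1}$$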
The interval bk ± tc se(bk) has probability (1 - α) of containing the true but unknown parameter
βk. Once we plug in values from our data, we say that we have a 100(1 - α)% interval estimate or
100(1 - α)% confidence interval.
We are usually interested in constructing either a 95% or a 99% confidence interval, so the
corresponding α values that we would use to get our t-critical values are α = 0.05 and α = 0.01.
To obtain the interval estimates, we use equation (3.1), replacing the least squares estimator bk
and its standard error se(bk) by their estimated values and using the critical t-value tc for the
chosen α. The lower limit (LL) and the upper limit (UL) of the interval will be:
$$LL = b_k - t_c\,\mathrm{se}(b_k) \tag{3.2}$$
$$UL = b_k + t_c\,\mathrm{se}(b_k) \tag{3.3}$$
3.1.3 An Illustration
In this section, we will first illustrate how to obtain an interval estimate by plugging values into
the interval estimator's formula. Next, we will go back to the Excel regression analysis tool and
look at the output we have already generated, as well as at the built-in option for generating
additional interval estimates.
We create a template to compute the interval estimates for the least squares regression parameters
of the food expenditure model.
Insert a new worksheet by selecting the Insert Worksheet tab at the lower left corner of your
screen, next to the t-critical value tab. Name it Interval Estimate.
    A                   B                       C
1   Data Input          Sample Size=            =Regression!B8
2                       Confidence Level=
3                       Estimated bk=           =Regression!B18
4                       Standard Error of bk=   =Regression!C18
5
6   Computed Values     α=                      =1-C2
7                       df or m=                =C1-2
8                       tc=                     =TINV(C6,C7)
9
10  Interval Estimate   Lower Limit=            =C3-C8*C4
11                      Upper Limit=            =C3+C8*C4
Note that we get the sample size, estimated coefficient and standard error from our Regression
worksheet. All you have to do in cells C1 and C3:C4 is, first, type the equal sign, and then
select the needed value in the Regression worksheet with your cursor. Finally, press Enter. We
are computing the interval estimate for β2, the slope parameter. Cell C2 is left blank for now.
Later, you will enter either 95 or 99 depending on whether you are constructing a 95% or a 99%
confidence interval, but you could also enter any other confidence level. In cell C6, the α level
is computed from the level of confidence entered in C2. In cell C7, the degrees of
freedom are set equal to N - 2, where N is the sample size, which we record in cell C1. Cell C8
is where the critical t-value is computed, as shown in Section 3.1.1d. Cells C10:C11 are where
the limits of the interval estimate are computed, using equations (3.2) and (3.3).
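The template's logic can be sketched in a few lines of Python, one line per computed cell. This is an illustration only: the critical value tc is passed in (the value 2.024394 is the TINV(0.05, 38) result reported in this chapter) rather than computed, and the coefficient and standard error are the slope estimates from the regression output shown in this chapter.

```python
def interval_estimate(n, confidence_pct, b_k, se_bk, tc):
    """Mirror of the Interval Estimate worksheet; returns (alpha, df, lower, upper).

    tc must be the two-tailed critical value TINV(alpha, n - 2); it is supplied
    by the caller rather than computed here.
    """
    alpha = 1 - confidence_pct / 100    # cell C6
    df = n - 2                          # cell C7
    lower = b_k - tc * se_bk            # cell C10, equation (3.2)
    upper = b_k + tc * se_bk            # cell C11, equation (3.3)
    return alpha, df, lower, upper

# Food expenditure slope, using values reported in this chapter.
alpha, df, ll, ul = interval_estimate(n=40, confidence_pct=95,
                                      b_k=10.2096425, se_bk=2.093263461,
                                      tc=2.024394)
```

The resulting limits, about 5.972 and 14.447, agree with the 95% interval Excel reports for β2.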
Before we specify our level of confidence, we would like to reformat C2 so that the level of
confidence can be displayed as a percentage. Right-click cell C2 and select Format Cells in
the menu that opens up. In the Format Cells dialog box, select Percentage in the Category
window and choose 0 decimal places (use the up and down arrows to the right of the Decimal
places window). Finally, select OK.
Here are the results you should get for a 95% confidence interval estimate for β2 (make sure you
type 95, and not 0.95, in C2):
    A                   B                       C
1   Data Input          Sample Size=            40
2                       Confidence Level=       95%
3                       Estimated bk=           10.20964
4                       Standard Error of bk=   2.093263
5
6   Computed Values     α=                      0.05
7                       df or m=                38
8                       tc=                     2.024394
9
10  Interval Estimate   Lower Limit=            5.972052
11                      Upper Limit=            14.44723
The lower limit and upper limit of the interval estimate above should be the same as those
reported on p. 98 of Principles of Econometrics, 4e.
We plugged values into the interval estimator formula and built a template to obtain interval
estimates. Next, we will go to our Regression worksheet and look at the interval estimates Excel
has already generated in the regression summary output.
Go to your Regression worksheet and look at the last table of the summary output. Columns F
and G of that table present the lower and upper limits of the interval estimates for the
intercept and slope parameters, β1 and β2 (cells F17:G18 below). Excel's regression analysis routine
automatically generates 95% confidence interval estimates.
In cell F18, you can find the lower limit of the interval estimate for β2. In cell G18, you can find
the upper limit of the interval estimate for β2. Those values are identical to the ones you
computed in your Interval Estimate worksheet.
    A                  B              C                D             E             F              G             H              I
1   SUMMARY OUTPUT
3   Regression Statistics
4   Multiple R         0.620485472
5   R Square           0.385002221
6   Adjusted R Square  0.368818069
7   Standard Error     89.51700429
8   Observations       40
10  ANOVA
11                     df             SS               MS            F              Significance F
12  Regression         1              190626.9788      190626.9788   23.78884107    1.94586E-05
13  Residual           38             304505.1742      8013.294058
14  Total              39             495132.153
16                     Coefficients   Standard Error   t Stat        P-value        Lower 95%      Upper 95%     Lower 95.0%    Upper 95.0%
17  Intercept          83.41600997    43.41016192      1.921577951   0.062182379    -4.463267721   171.2952877   -4.463267721   171.2952877
18  X Variable 1       10.2096425     2.093263461      4.877380554   1.94586E-05    5.972052202    14.4472328    5.972052202    14.4472328
Excel actually reports the interval estimate for β2 twice: in cells F18:G18, and again in cells
H18:I18. The table is set up this way so that Excel can also report confidence interval estimates
other than the 95% one, if you request them.
Go back to your Data worksheet. From there, select the Data tab, the Data Analysis button in
the Analysis group of commands, and Regression in the Analysis Tools window. In the
Regression dialog box, check the box next to Confidence Level and type in 99. Select New
Worksheet Ply and name it Regression and 99% CI (for Confidence Interval). Select OK.
Alongside the 95% interval estimates, Excel has now also generated 99% interval estimates for
β1 and β2 (cells H16:I18 below):
    A                  B              C                D             E             F              G             H              I
1   SUMMARY OUTPUT
3   Regression Statistics
4   Multiple R         0.620485472
5   R Square           0.385002221
6   Adjusted R Square  0.368818069
7   Standard Error     89.51700429
8   Observations       40
10  ANOVA
11                     df             SS               MS            F              Significance F
12  Regression         1              190626.9788      190626.9788   23.78884107    1.94586E-05
13  Residual           38             304505.1742      8013.294058
14  Total              39             495132.153
16                     Coefficients   Standard Error   t Stat        P-value        Lower 95%      Upper 95%     Lower 99.0%    Upper 99.0%
17  Intercept          83.41600997    43.41016192      1.921577951   0.062182379    -4.463267721   171.2952877   -34.29314438   201.1251643
18  X Variable 1       10.2096425     2.093263461      4.877380554   1.94586E-05    5.972052202    14.4472328    4.53364        15.88565
The interpretation of confidence intervals requires a great deal of care. The true meaning of being
95% or 99% confident about our interval estimates is that, if we were to repeat many times the
exercise of drawing a sample of size N = 40, estimating the least squares regression parameters,
and constructing interval estimates for those parameters, then 95% or 99% of all the interval
estimates constructed this way would contain the true parameter values. To illustrate this
concept, we return to our simulation exercise of Section 2.4.4.
In Section 2.4.4 we drew many random samples of size N = 40, and, based on each, estimated
the corresponding least squares regression parameters. We can repeat this exercise and extend it
to compute, for each sample, not only least squares estimates, but interval estimates as well.
Note that in Section 3.1.4 of Principles of Econometrics, 4e, 10 samples were randomly drawn
from a population with unknown parameters, while in this section we will draw 100 samples from
a population with known parameters.
In the simulation exercise we are considering in this section, half of our hypothetical population
of three-person households has a weekly income of $1000 (x = 10), and the other half has a weekly
income of $2000 (x = 20). Because we know the data generation process, we know the values of
the population parameters for the normal distribution, and consequently the values of our regression
parameters. Let μy|x=10 = 200, μy|x=20 = 300, and var(y|x = 10) = var(y|x = 20) = σ² =
2500. Since μy|x = β1 + β2x, we have 200 = β1 + 10β2 and 300 = β1 + 20β2, which implies
β1 = 100 and β2 = 10.
We will draw random samples of 40 households from our population. Half of each sample will be
drawn from the first type of households, with weekly income x = 10; and half of each sample
will be drawn from the second type of households, with weekly income x = 20.
First, insert a new worksheet in your workbook by selecting the Insert Worksheet tab at the
bottom of your screen, next to the Interval Estimate tab. Name it Simulation.
Let us keep records of the level of weekly income for our 40 households in column A of our
Simulation worksheet: in cell A1, type x and right-align it; in cells A2:A21, record the value
10; in cells A22:A41, record the value 20.
Next, use the Random Number Generation analysis tool to draw 100 random samples of
households.
Select the Data tab, in the middle of your tab list. In the Analysis group of commands, at the far
right of the ribbon, select Data Analysis.
The Data Analysis dialog box pops up. In it, select Random Number Generation (you might
need to use the scroll up and down bar to the right of the Analysis Tools window to find it), then
select OK.
A Random Number Generation dialog box pops up. Since we are drawing 100 random
samples, we specify 100 in the Number of Variables window. We first draw random samples of
20 from households with weekly income of x = 10, so we specify the Number of Random
Numbers to be 20. For simplicity we assume that our population of households is normally
distributed, so this is the distribution we choose. Once you have selected Normal in the
Distribution window, you will be able to specify its Parameters: for x = 10, its Mean is
μy|x=10 = 200 and its Standard Deviation is √var(y|x = 10) = σ = 50. Select the Output
Range in the Output options section, and specify it to be B2:CW21. Finally, select OK.
Repeat the procedure to draw a random sample of 20 from households with weekly income of x = 20. Change
the Mean to μy|x=20 = 300 and the Output Range to B22:CW41.
This time we use the LINEST function to obtain the least squares estimates and their standard
errors. The LINEST function can compute the latter if you ask it to return additional regression
statistics. For this purpose, the general syntax of the LINEST function is as follows:
=LINEST(y-values,x-values,,TRUE)
The first argument of the LINEST function specifies the y values; the second argument specifies the
x values; we skip the third argument by leaving nothing between the second and third commas, in
which case Excel estimates the intercept as usual; and the fourth argument, TRUE, indicates that we
would like LINEST to return additional regression statistics.
The LINEST function returns a table, kept in Excel's memory, that holds the least squares estimates
and their standard errors. The following illustration shows the order in which they are reported:
          column 1   column 2
row 1     b2         b1
row 2     se(b2)     se(b1)
We nest the LINEST function in the INDEX function to get the estimated coefficients and their
standard errors one at a time. The INDEX function returns the value found at a given position
within a table. The general syntax of the INDEX function is as follows:
=INDEX(table of results,row_num,column_num)
The first argument of the INDEX function specifies which table to get the results from. The
second argument and third argument indicate the intersection of a row and a column at which the
result of interest can be found.
b1: =INDEX(LINEST(y-values,x-values,,TRUE),1,2)
se(b1): =INDEX(LINEST(y-values,x-values,,TRUE),2,2)
b2: =INDEX(LINEST(y-values,x-values,,TRUE),1,1)
se(b2): =INDEX(LINEST(y-values,x-values,,TRUE),2,1)
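For readers who want to see which numbers LINEST places in that 2 × 2 block, the Python sketch below computes them with the usual simple-regression formulas (σ̂² = SSE/(N - 2), se(b2) = √(σ̂²/Σ(x - x̄)²), se(b1) = √(σ̂²(1/N + x̄²/Σ(x - x̄)²))) and arranges them in LINEST's order. The `index` helper imitates INDEX's 1-based row and column numbering. Both functions are illustrative stand-ins, not Excel's implementation.

```python
import math

def linest(y, x):
    """First two rows of LINEST(y, x, , TRUE): [[b2, b1], [se(b2), se(b1)]]."""
    n = len(y)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b1 = y_bar - b2 * x_bar
    sse = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))
    sigma2_hat = sse / (n - 2)                  # estimated error variance
    se_b2 = math.sqrt(sigma2_hat / sxx)
    se_b1 = math.sqrt(sigma2_hat * (1 / n + x_bar ** 2 / sxx))
    return [[b2, b1], [se_b2, se_b1]]

def index(table, row_num, column_num):
    """1-based lookup, like Excel's INDEX(table, row_num, column_num)."""
    return table[row_num - 1][column_num - 1]

# Small made-up data set: b2 = 1.9, b1 = 0, SSE = 0.7.
results = linest([2.0, 4.0, 5.0, 8.0], [1.0, 2.0, 3.0, 4.0])
b1 = index(results, 1, 2)
se_b1 = index(results, 2, 2)
b2 = index(results, 1, 1)
se_b2 = index(results, 2, 1)
```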
We will report our estimated coefficients and standard errors at the bottom of our table of random
samples. We will also compute our t-critical value and the limits of our interval estimates (Lower
Limit: LL and Upper Limit: UL). Finally, we would like to count how many of our 100 interval
estimates contain the true parameter values.
We will specify cells A42:B57 as shown below (we outlined some cells in different shades of
gray only to distinguish groups of similar or related cells which we comment on shortly):
    A          B
42  N=         40
43  α=         0.05
44  m=         =B42-2
45  tc=        =TINV(B43,B44)
46  b1=        =INDEX(LINEST(B2:B41,$A$2:$A$41,,TRUE),1,2)
47  se(b1)=    =INDEX(LINEST(B2:B41,$A$2:$A$41,,TRUE),2,2)
48  LL=        =B46-$B$45*B47
49  UL=        =B46+$B$45*B47
50  β1 in CI   =IF(OR(100<B48,100>B49),"No","Yes")
51  Yes's      =COUNTIF(B50:CW50,"Yes")
52  b2=        =INDEX(LINEST(B2:B41,$A$2:$A$41,,TRUE),1,1)
53  se(b2)=    =INDEX(LINEST(B2:B41,$A$2:$A$41,,TRUE),2,1)
54  LL=        =B52-$B$45*B53
55  UL=        =B52+$B$45*B53
56  β2 in CI   =IF(OR(10<B54,10>B55),"No","Yes")
57  Yes's      =COUNTIF(B56:CW56,"Yes")
In cells A42:B43, the N (sample size) and α values are specified so that m (degrees of freedom)
and tc (t-critical value) can be computed and reported in cells A44:B45. tc is computed as shown
in Section 3.1.1d.
Cells A46:B47 and A52:B53 are used to report and compute coefficient and standard error
estimates, as explained in Section 3.1.4c. The cell references to the x values are in Absolute
format, $A$2:$A$41, as opposed to Relative format, because we use the same x values for
all 100 repetitions.
Cells A48:B49 and A54:B55 are used to report and compute interval estimates, as explained in
Section 3.1.2. The value of tc is the same over all repetitions; its cell reference is thus in
Absolute format, $B$45, in the formulas for the interval limits.
We make use of the IF and OR logical functions to indicate, for each interval estimate, whether
or not it contains the true parameter value. The general syntax for the IF function is as follows:
IF(logical_test,value_if_true,value_if_false)
Logical_test is any value or expression that can be evaluated to TRUE or FALSE. In this
exercise we want to determine whether or not the true parameter value, βk, is within the estimated
interval [LL, UL], where LL = bk - tc se(bk) and UL = bk + tc se(bk). The logical expression
we use is: βk < LL or βk > UL. If βk is outside [LL, UL], then this expression is TRUE.
Otherwise, the expression is FALSE.
Value_if_true is the value that is returned if logical_test is TRUE. For example, if this argument
is the text string "No" and the logical_test argument is TRUE, then the IF function displays the
text "No".
Value_if_false is the value that is returned if logical_test is FALSE. For example, if this
argument is the text string "Yes" and the logical_test argument is FALSE, then the IF function
displays the text "Yes".
We use the OR function to write our logical test. The general syntax of the OR function is as
follows:
OR(argument_1,argument_2)
If the first logical expression, argument_1, or the second logical expression, argument_2, is
TRUE, then the OR function returns TRUE. It returns FALSE only if both arguments are
FALSE.
The general syntax for the OR function, nested in the IF function, is:
IF(OR(argument_1,argument_2),value_if_true,value_if_false)
Applied to our exercise, the nested function looks like this (which is what we have in cell B56):
=IF(OR(10<B54,10>B55),"No","Yes")
If βk is outside [LL, UL], then the logical test βk < LL or βk > UL is TRUE, and "No" is
returned to indicate that βk is not in the estimated confidence interval. Otherwise, the logical
expression is FALSE, and "Yes" is returned to indicate that βk is in the estimated confidence
interval.
Finally, we use the COUNTIF function to count the number of times βk is found within the
estimated interval [LL, UL].
The COUNTIF function is a statistical function that counts the number of cells within a range
that meet a given criterion. Its general syntax is:
COUNTIF(cell_range,criteria)
Cell_range is one or more cells to count. Criteria is the number, expression, cell reference, or
text that defines which cells will be counted. Since we are interested in counting how many of the
interval estimates we construct actually contain the true parameter value, we count the "Yes"
values generated by our logical test (this is what we do in cell B57):
COUNTIF(cell_range,"Yes")
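The IF/OR test and the COUNTIF tally can be sketched in Python as two small functions. This is an illustration only: `in_ci` returns "No" when the true value lies below LL or above UL and "Yes" otherwise (so an interval that touches the true value counts as containing it), and `countif` counts exact matches of a text criterion. The interval limits in the example are hypothetical.

```python
def in_ci(beta, lower, upper):
    """Like =IF(OR(beta<LL, beta>UL),"No","Yes"); the interval is closed."""
    return "No" if (beta < lower or beta > upper) else "Yes"

def countif(cells, criterion):
    """Like =COUNTIF(range, criterion) for an exact-match text criterion."""
    return sum(1 for cell in cells if cell == criterion)

# Hypothetical interval limits for beta2 = 10 from four simulated samples.
limits = [(5.97, 14.45), (10.5, 12.0), (8.0, 9.9), (10.0, 10.0)]
labels = [in_ci(10, ll, ul) for ll, ul in limits]
yes_count = countif(labels, "Yes")
```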
Once you have reviewed and understood the formulas and values in B42:B57, copy
the content of B46:B50 to C46:CW50 and the content of B52:B56 to C52:CW56.
Here is how our worksheet looks (only 10 of the 100 simulation results are shown below):
    A          B          C     D     E     F     G     H     I     J     K
42  N=         40
43  α=         0.05
44  m=         38
45  tc=        2.024394
50  β1 in CI   Yes        Yes   No    Yes   Yes   Yes   Yes   Yes   Yes   Yes
51  Yes's      98
56  β2 in CI   Yes        Yes   No    Yes   Yes   Yes   Yes   Yes   Yes   Yes
57  Yes's      98
We find that 98 of our 100 confidence intervals contain the true parameter value, both for the
intercept and for the slope coefficient. Note that you will draw different random
samples, obtain different interval estimates, and thus obtain a different number of intervals that
contain the true parameter values.
We first extended our repetitions to 1,000 samples and found that 959 of the 1,000 interval
estimates contained β1, and 962 of the 1,000 interval estimates contained β2. Finally, we
extended the repetitions to 10,000 samples and found that 95.08% of both the intercept and slope
interval estimates contained the true parameter values.
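The whole repeated-sampling experiment fits in a short Python sketch. Python's `random` module stands in for Excel's Random Number Generation tool, so the draws, and therefore the exact counts, will not match any worksheet. The seed is arbitrary, and tc is fixed at the TINV(0.05, 38) value reported in cell B45. Each sample uses the same design as the worksheet: 20 households with x = 10 and 20 with x = 20, and y|x drawn from N(100 + 10x, 50²).

```python
import math
import random

TC = 2.024394                      # TINV(0.05, 38), the value in cell B45
BETA1, BETA2, SIGMA = 100.0, 10.0, 50.0

def ols_with_se(y, x):
    """Least squares b1, b2 and their standard errors (the quantities LINEST reports)."""
    n = len(y)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b1 = y_bar - b2 * x_bar
    sigma2_hat = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    se_b1 = math.sqrt(sigma2_hat * (1 / n + x_bar ** 2 / sxx))
    se_b2 = math.sqrt(sigma2_hat / sxx)
    return b1, b2, se_b1, se_b2

def coverage(reps, seed=1):
    """Count how many 95% intervals contain beta1 and beta2 over `reps` samples."""
    rng = random.Random(seed)
    x = [10.0] * 20 + [20.0] * 20          # column A of the Simulation sheet
    hits_b1 = hits_b2 = 0
    for _ in range(reps):
        # y|x ~ N(beta1 + beta2*x, sigma^2): N(200, 50^2) at x = 10, N(300, 50^2) at x = 20
        y = [rng.gauss(BETA1 + BETA2 * xi, SIGMA) for xi in x]
        b1, b2, se1, se2 = ols_with_se(y, x)
        hits_b1 += int(b1 - TC * se1 <= BETA1 <= b1 + TC * se1)
        hits_b2 += int(b2 - TC * se2 <= BETA2 <= b2 + TC * se2)
    return hits_b1, hits_b2
```

With many repetitions, the share of intervals containing each true parameter should settle near 95%, in line with the 1,000- and 10,000-sample results described above.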
In the next section of this chapter, we perform hypothesis tests. To go over examples of
hypothesis tests, we return to our simple linear regression model of weekly food
expenditure.
If the null hypothesis H0: βk = c is true, then the test statistic t = (bk - c)/se(bk) follows a
t-distribution with m = N - 2 degrees of freedom:
$$t = \frac{b_k - c}{\mathrm{se}(b_k)} \sim t_{(N-2)} \tag{3.4}$$
When we reject H0, we accept a logical alternative hypothesis H1. There are three possible
alternative hypotheses to H0:
$$H_1\colon \beta_k > c \tag{3.5}$$
$$H_1\colon \beta_k < c \tag{3.6}$$
$$H_1\colon \beta_k \neq c \tag{3.7}$$
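[Figure: t(m) distribution for a right-tail test; reject H0: βk = c when the t-statistic falls in the right tail, at or above tc; do not reject H0 otherwise]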
Note that in this case the probability is α that a randomly drawn t-value is equal to or greater than
tc, where tc = t(1-α, N-2) is the lower limit of the right tail of the distribution shown in the graph
above.
If the alternative hypothesis (3.6) is true, then the value of the computed test statistic will tend to
be unusually small. We will reject H0 if the test statistic is in the left tail of the distribution.
[Figure: t(m) distribution for a left-tail test, with the rejection region in the left tail]
Note that in this case the probability is α that a randomly drawn t-value is equal to or less than tc,
where tc = t(α, N-2), a negative number, is the upper limit of the left tail of the distribution shown
in the graph above.
For the two-tail alternative (3.7), the probability is α that a randomly drawn t-value will fall in the tails of the
distribution, either equal to or less than t(α/2, N-2) or equal to or greater than t(1-α/2, N-2). Those
limits are shown in the graph above. (Note that those limits correspond to the values -tc and tc first
defined in Section 3.1.1b.)
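The three rejection rules can be collected in one Python function as a sketch. It assumes the caller supplies a `tinv(p, m)` function that behaves like Excel's TINV (the two-tailed critical value). Under that assumption, t(1-α, m) = TINV(2α, m), t(α, m) = -TINV(2α, m), and t(1-α/2, m) = TINV(α, m). The lookup table used in the example is a hypothetical stand-in that holds only two TINV values reported in this chapter for m = 38.

```python
def reject_h0(t_stat, alternative, alpha, m, tinv):
    """Rejection rule for H0: beta_k = c, given the t-statistic from (3.4).

    `tinv(p, m)` must return the two-tailed critical value tc with
    P(|t(m)| > tc) = p, like Excel's TINV.
    """
    if alternative == "greater":      # H1: beta_k > c, (3.5): right tail
        return t_stat >= tinv(2 * alpha, m)
    if alternative == "less":         # H1: beta_k < c, (3.6): left tail
        return t_stat <= -tinv(2 * alpha, m)
    if alternative == "not equal":    # H1: beta_k != c, (3.7): both tails
        return abs(t_stat) >= tinv(alpha, m)
    raise ValueError("alternative must be 'greater', 'less' or 'not equal'")

# Hypothetical stand-in for TINV holding two values reported in this chapter.
_TINV_38 = {0.10: 1.685954, 0.05: 2.024394}

def tinv_lookup(p, m):
    """Lookup-table TINV substitute; only covers m = 38 and p in {0.10, 0.05}."""
    if m != 38:
        raise ValueError("lookup table only covers m = 38")
    return _TINV_38[round(p, 10)]

decision = reject_h0(4.877381, "greater", 0.05, 38, tinv_lookup)
```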
We illustrate the mechanics of hypothesis testing using the food expenditure model. We give
examples of right-tail, left-tail, and two-tail tests. Note that when the null hypothesis of a test is
that the parameter is zero, the test is called a test of significance. We can have one-tail tests of
significance or two-tail tests of significance.
Recall our estimated regression model; below the estimated values of b1 and b2, we report their
estimated standard errors, se(b1) and se(b2):
$$\hat{y} = \underset{(43.41)}{83.42} + \underset{(2.09)}{10.21}\,x$$
Insert a new worksheet by selecting the Insert Worksheet tab at the bottom of your screen, next
to the Simulation tab. Name it Right-Tail Tests.
    A                 B                C
1   Data Input        N=               =Regression!B8
2                     bk=              =Regression!B18
3                     se(bk)=          =Regression!C18
4                     H0: βk=
5                     α=
6
7   Computed Values   df or m=         =C1-2
8                     tc=              =TINV(C5*2,C7)
9
10  Right-Tail Test   t-statistic=     =(C2-C4)/C3
11                    Conclusion:      =IF(C10>=C8,"Reject H0","Do Not Reject H0")
We get the sample size N, estimated coefficient b2, and standard error se(b2) from our
Regression worksheet. All you have to do in each of cells C1:C3 is, first, type the equal sign, and
then select the needed value in the Regression worksheet with your cursor. Next, press Enter.
We are performing hypothesis tests on the slope parameter, β2. Cells C4:C5 are left blank for
now. Later, you will specify the value you hypothesize β2 takes, as well as the level of
significance of your test (α). In cell C7, the degrees of freedom are set equal to N - 2, where N is
the sample size.
Cell C8 is where the critical value for the right-tail rejection region is computed. Recall that all
the probability α of rejecting H0 is in the right tail of the distribution, at or above tc.
The TINV function, on the other hand, returns a two-tail critical value: given a probability p, it
gives the tc value such that P(t(m) > tc) = p/2. So, to get the correct critical value for the
right-tail rejection region, we need to multiply the specified α value by 2 in the TINV function
(half of α × 2 is α, which is what we want).
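The effect of doubling α inside TINV can be checked outside Excel. The sketch below is a minimal Python illustration, assuming that TINV(p, m) returns the value x for which P(|t(m)| ≥ x) = p. The helper names (t_pdf, t_cdf, t_ppf, tinv) are hypothetical, and the t distribution function is approximated with Simpson's rule rather than whatever algorithm Excel uses internally.

```python
import math


def t_pdf(x, m):
    """Density of Student's t distribution with m degrees of freedom."""
    log_c = (math.lgamma((m + 1) / 2) - math.lgamma(m / 2)
             - 0.5 * math.log(m * math.pi))
    return math.exp(log_c - (m + 1) / 2 * math.log1p(x * x / m))


def t_cdf(x, m, n=2000):
    """P(t <= x): Simpson's rule on [0, |x|] plus symmetry about zero."""
    a = abs(x)
    h = a / n
    s = t_pdf(0.0, m) + t_pdf(a, m)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * t_pdf(i * h, m)
    area = s * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_ppf(q, m):
    """Value x with P(t <= x) = q, found by bisection on t_cdf."""
    lo, hi = -50.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, m) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def tinv(p, m):
    """Two-tail critical value in the TINV sense: P(|t| >= x) = p."""
    return t_ppf(1 - p / 2, m)


alpha, df = 0.05, 38
print(f"{tinv(2 * alpha, df):.6f}")  # right-tail tc (cell C8): 1.685954
print(f"{tinv(alpha, df):.6f}")      # two-tail tc: 2.024394
```

Passing 2α puts the whole probability α in the right tail, which is the critical value the template needs in cell C8; passing α unchanged splits it as α/2 per tail.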
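Cell C10 is where the test statistic t is computed. The test statistic is computed by plugging the
least squares estimate and its standard error into the equation for t in (3.4); in the template this
becomes t = (bk − c)/se(bk), where c is the hypothesized value entered in cell C4.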
84 Chapter 3
Finally, in cell C11, we use the IF function to determine whether or not our t-statistic falls into
the rejection region. If it does, we reject our null hypothesis; if it does not, we do not reject it (see
Section 3.1.4e for details on how the IF logical function works).
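The logic of cells C10 and C11 can be written as a small Python function. This is a sketch that takes the critical value tc as an input (the value the worksheet shows in cell C8) instead of computing it; the function name right_tail_test is hypothetical.

```python
def right_tail_test(b, se, c, tc):
    """Mirror cells C10:C11: t = (b - c)/se, and reject H0 when t >= tc."""
    t = (b - c) / se                                           # cell C10
    verdict = "Reject Ho" if t >= tc else "Do Not Reject Ho"   # cell C11
    return t, verdict


# H0: beta2 = 0 at alpha = 0.05 (tc = 1.685954 from cell C8)
t, verdict = right_tail_test(10.20964, 2.093263, 0.0, 1.685954)
print(round(t, 4), verdict)  # 4.8774 Reject Ho

# H0: beta2 <= 5.5 at alpha = 0.01 (tc = 2.428568)
t, verdict = right_tail_test(10.20964, 2.093263, 5.5, 2.428568)
print(round(t, 4), verdict)  # 2.2499 Do Not Reject Ho
```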
A B C
1 Data Input N= 40
2 bk= 10.20964
3 se(bk)= 2.093263
4 Ho: βk= 0
5 α= 0.05
6
7 Computed Values df or m= 38
8 tc= 1.685954
9
10 Right-Tail Test t-statistic= 4.877381
11 Conclusion: Reject Ho
Let α = 0.01; H0: β2 ≤ 5.5 and H1: β2 > 5.5.
Note that the hypothesis testing procedure for testing the null hypothesis H0: β2 ≤ 5.5
against the alternative hypothesis H1: β2 > 5.5 is exactly the same as testing H0: β2 = 5.5 against
the alternative hypothesis H1: β2 > 5.5.
A B C D
1 Data Input N= 40
2 bk= 10.20964
3 se(bk)= 2.093263
4 Ho: βk= 5.5
5 α= 0.01
6
7 Computed Values df or m= 38
8 tc= 2.428568
9
10 Right-Tail Test t-statistic= 2.249904
11 Conclusion: Do Not Reject Ho
Insert a new worksheet by selecting the Insert Worksheet tab at the bottom of your screen, next
to the Right-Tail Tests tab. Name it Left-Tail Tests.
The left-tail test template will be very similar to the right-tail test template. You can copy cells
A1:C11 from the Right-Tail Tests worksheet to cells A1:C11 in the Left-Tail Tests worksheet.
Alternatively, you can select the whole Right-Tail Tests worksheet by left-clicking on the upper-left
corner of the worksheet; your cursor should turn into a fat cross.
Select Copy. Left-click in cell A1 of the Left-Tail Tests worksheet, and select Paste.
You will need to make just a few modifications to create the following left-tail test template:
A B C
1 Data Input N= =Regression!B8
2 bk= =Regression!B18
3 se(bk)= =Regression!C18
4 Ho: βk=
5 α=
6
7 Computed Values df or m= =C1-2
8 tc= =-TINV(C5*2,C7)
9
10 Left-Tail Test t-statistic= =(C2-C4)/C3
11 Conclusion: =IF(C10<=C8,"Reject Ho","Do Not Reject Ho")
The rejection region for a left-tail test is the mirror image of the rejection region for a right-tail
test; it is in the left tail instead of the right tail of the distribution. The critical value for a left-tail
test is thus the negative of the critical value for a right-tail test: in cell C8, we precede the TINV
function by a minus sign to reflect that.
In a left-tail test, we reject our null hypothesis if our t-statistic is less than or equal to our critical
value, not greater than or equal to it as is the case in a right-tail test; we adjust
the formula in C11 accordingly.
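A Python sketch of the left-tail decision rule follows. It assumes the critical value is passed in already negative, just as cell C8 holds −TINV(2α, m); the function name left_tail_test is hypothetical.

```python
def left_tail_test(b, se, c, tc_left):
    """Left-tail test: tc_left is the (negative) critical value; reject H0 when t <= tc_left."""
    t = (b - c) / se                                                # cell C10
    verdict = "Reject Ho" if t <= tc_left else "Do Not Reject Ho"   # cell C11
    return t, verdict


# H0: beta2 >= 15 against H1: beta2 < 15 at alpha = 0.05 (tc = -1.685954)
t, verdict = left_tail_test(10.20964, 2.093263, 15.0, -1.685954)
print(round(t, 4), verdict)  # -2.2885 Reject Ho
```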
Note that the hypothesis testing procedure for testing the null hypothesis H0: β2 ≥ 15
against the alternative hypothesis H1: β2 < 15 is exactly the same as testing H0: β2 = 15 against
the alternative hypothesis H1: β2 < 15.
A B C
1 Data Input N= 40
2 bk= 10.20964
3 se(bk)= 2.093263
4 Ho: βk= 15
5 α= 0.05
6
7 Computed Values df or m= 38
8 tc= -1.685954
9
10 Left-Tail Test t-statistic= -2.288464
11 Conclusion: Reject Ho
Insert a new worksheet by selecting the Insert Worksheet tab at the bottom of your screen, next
to the Left-Tail Tests tab. Name it Two-Tail Tests.
The two-tail test template will also be very similar to the right-tail test template. You can copy
cells A1:C11 from the Right-Tail Tests worksheet to cells A1:C11 in the Two-Tail Tests
worksheet. Alternatively, you can select the whole Right-Tail Tests worksheet and copy it into the
Two-Tail Tests worksheet.
You will need to make just a few modifications to create the following two-tail test template:
A B C
1 Data Input N= =Regression!B8
2 bk= =Regression!B18
3 se(bk)= =Regression!C18
4 Ho: βk=
5 α=
6
7 Computed Values df or m= =C1-2
8 tc= =TINV(C5,C7)
9
10 Two-Tail Test t-statistic= =(C2-C4)/C3
11 Conclusion: =IF(OR(C10<=-C8,C10>=C8),"Reject Ho","Do Not Reject Ho")
The rejection region for a two-tail test is split in half between the left tail and the right tail of the
distribution: only α/2 of the probability is in each tail of the distribution. So we no longer need to
multiply α by 2 in the TINV function: delete *2 in cell C8.
In a two-tail test, we reject our null hypothesis if our t-statistic is less than or equal to the left-tail
critical value, −tc, or greater than or equal to the right-tail critical value, tc: we adjust the formula in
C11 to reflect that (see Section 3.1.4f for details on how the OR logical function works).
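The OR condition in cell C11 translates directly into Python. In this sketch tc is the positive two-tail critical value shown in cell C8, passed in rather than computed; the function name two_tail_test is hypothetical.

```python
def two_tail_test(b, se, c, tc):
    """Two-tail test: reject H0 when t <= -tc or t >= tc (the OR in cell C11)."""
    t = (b - c) / se                     # cell C10
    reject = t <= -tc or t >= tc
    return t, ("Reject Ho" if reject else "Do Not Reject Ho")


# H0: beta2 = 7.5 against H1: beta2 != 7.5 at alpha = 0.05 (tc = 2.024394)
t, verdict = two_tail_test(10.20964, 2.093263, 7.5, 2.024394)
print(round(t, 4), verdict)  # 1.2945 Do Not Reject Ho
```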
A B C D
1 Data Input N= 40
2 bk= 10.20964
3 se(bk)= 2.093263
4 Ho: βk= 7.5
5 α= 0.05
6
7 Computed Values df or m= 38
8 tc= 2.024394
9
10 Two-Tail Test t-statistic= 1.294458
A B C
1 Data Input N= 40
2 bk= 10.20964
3 se(bk)= 2.093263
4 Ho: βk= 0
5 α= 0.05
6
Note that the t-statistic in a two-tail test of significance is equal to the t-statistic in a one-tail test of
significance (compare the t-statistic value above to the one obtained in Section 3.3.1a). Also note
that this t-statistic value for tests of significance is reported in the regression summary output
generated by Excel.
Go back to your Regression worksheet. If you do not see your Regression tab, it is because it is
scrolled out of view. Use either one of the left arrows at the left corner of your screen so that the first
worksheets you were working with can be seen again.
Column D of the last table of the summary output presents the t-statistic values for tests of
significance of the intercept and slope parameters, β1 and β2 (shaded cells below).
A B C D
1 SUMMARY OUTPUT
3 Regression Statistics
4 Multiple R 0.620485472
16 Coefficients Standard Error t Stat
17 Intercept 83.41600997 43.41016192 1.921577951
18 X Variable 1 10.20964245 2.093263451 4.877380554
When reporting the outcome of statistical hypothesis tests, it has become standard practice to
report the p-value (an abbreviation for probability value) of the test. If we have the p-value of a
test, we can determine the outcome of the test by comparing p to the chosen level of significance,
α. This is an alternative to comparing the test statistic value to the critical value(s), or limit(s), of
the rejection region for a test.
In order to explain the p-value decision rule for hypothesis tests, we first give a definition of the
p-value.
How the p-value is computed depends on the alternative hypothesis of our test. If H1: βk > c, p
is the probability that a t-value is equal to or greater than the test statistic value t.
If H1: βk < c, p is the probability that a t-value is equal to or less than the test statistic value t.
If H1: βk ≠ c, p is the probability that a t-value is equal to or less than −|t| or equal to or greater
than |t|, where t is the test statistic value.
We can see that when the test statistic value t falls into the rejection region, its p-value is less
than or equal to the level of significance α.
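The three p-value definitions above can be summarized in one display, writing t(m) for a t random variable with m = N − 2 degrees of freedom and t for the computed test statistic (notation introduced here for compactness):

\[
p = \begin{cases}
P\big(t_{(m)} \ge t\big) & \text{if } H_1\colon \beta_k > c\\
P\big(t_{(m)} \le t\big) & \text{if } H_1\colon \beta_k < c\\
2\,P\big(t_{(m)} \ge |t|\big) & \text{if } H_1\colon \beta_k \ne c
\end{cases}
\]

The two-tail case doubles a single tail area because the t distribution is symmetric about zero, so P(t(m) ≤ −|t|) = P(t(m) ≥ |t|).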
For H1: βk > c: if t ≥ tc, t is in the rejection region and p ≤ α. The case illustrated below is
where t > tc and p < α. H0 is rejected.
[Figure: right-tail rejection region (reject H0) at or above tc = t(1−α, N−2).]
For H1: βk < c: if t ≤ tc, t is in the rejection region and p ≤ α. The case illustrated below is
where t < tc and p < α. H0 is rejected.
[Figure: left-tail rejection region (reject H0) at or below tc = t(α, N−2).]
For H1: βk ≠ c: if t ≤ −tc in the left tail of the distribution or t ≥ tc in the right tail of the
distribution, t is in the rejection region and p ≤ α.
The case illustrated below is where t > tc in the right tail of the distribution, and p < α. H0 is
rejected.
[Figure: two-tail rejection regions (reject H0), each with probability α/2, below −tc = t(α/2, N−2) and above tc = t(1−α/2, N−2).]
The case illustrated below is where t < −tc in the left tail of the distribution, and p < α. H0 is
rejected.
[Figure: two-tail rejection regions (reject H0) below −tc = t(α/2, N−2) and above tc = t(1−α/2, N−2), with area p/2 beyond the test statistic.]
We can thus compare the p-value of a test, p, to the chosen level of significance, α, and
determine the outcome of our hypothesis test: if p ≤ α, we reject H0 and accept H1; if p > α, we
do not reject H0. This is the p-value rule.
p-values are obtained in Excel by using the TDIST function. For hypothesis testing purposes, the
syntax of the TDIST function is as follows:
=TDIST(ABS(t),m,tails)
t is the value of the computed test statistic, ABS is a mathematical function that returns the
absolute value of t, m is the degrees of freedom, and tails specifies whether we are seeking the
p-value for a one-tail test or a two-tail test. Set tails to 1 for a one-tail test, and set tails to 2 for a
two-tail test.
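The sketch below mimics what TDIST computes, assuming TDIST(x, m, tails) returns tails × P(t(m) > x) for x ≥ 0 (Excel returns an error for a negative x, which is why the template wraps t in ABS). The helper names are hypothetical, and the t distribution function is approximated numerically with Simpson's rule.

```python
import math


def t_pdf(x, m):
    """Density of Student's t distribution with m degrees of freedom."""
    log_c = (math.lgamma((m + 1) / 2) - math.lgamma(m / 2)
             - 0.5 * math.log(m * math.pi))
    return math.exp(log_c - (m + 1) / 2 * math.log1p(x * x / m))


def t_cdf(x, m, n=2000):
    """P(t <= x): Simpson's rule on [0, |x|] plus symmetry about zero."""
    a = abs(x)
    h = a / n
    s = t_pdf(0.0, m) + t_pdf(a, m)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * t_pdf(i * h, m)
    area = s * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def tdist(x, m, tails):
    """TDIST-style p-value: tails * P(t > x), defined for x >= 0."""
    if x < 0:
        raise ValueError("x must be non-negative; wrap the t-statistic in abs()")
    return tails * (1 - t_cdf(x, m))


m = 38
print(round(tdist(abs(2.249904), m, 1), 4))    # right-tail, H0: beta2 <= 5.5: 0.0152
print(round(tdist(abs(-2.288464), m, 1), 4))   # left-tail, H0: beta2 >= 15: 0.0139
print(round(tdist(abs(1.294458), m, 2), 4))    # two-tail, H0: beta2 = 7.5: 0.2033
print(f"{tdist(abs(4.877381), m, 2):.2E}")     # two-tail, H0: beta2 = 0: 1.95E-05
```

With ABS and tails = 1, the result is the one-tail p-value only when t lies on the side of the alternative hypothesis (t > 0 in a right-tail test, t < 0 in a left-tail test). If t falls on the other side, the one-tail p-value is 1 minus this result, and H0 is not rejected at any conventional α.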
Go back to your Right-Tail Tests and Left-Tail Tests worksheets and add the following at the
bottom of each template:
A B C
12 p-value= =TDIST(ABS(C10),C7,1)
13 Conclusion: =IF(C12<=C5,"Reject Ho","Do Not Reject Ho")
Go back to your Two-Tail Tests worksheet and add the following at the bottom of its template:
A B C
12 p-value= =TDIST(ABS(C10),C7,2)
13 Conclusion: =IF(C12<=C5,"Reject Ho","Do Not Reject Ho")
Note that the hypothesis testing procedure for testing the null hypothesis H0: β2 ≤ 5.5
against the alternative hypothesis H1: β2 > 5.5 is exactly the same as testing H0: β2 = 5.5
against the alternative hypothesis H1: β2 > 5.5.
A B C D
1 Data Input N= 40
2 bk= 10.20964
3 se(bk)= 2.093263
4 Ho: βk= 5.5
5 α= 0.01
6
7 Computed Values df or m= 38
8 tc= 2.428568
9
10 Right-Tail Test t-statistic= 2.249904
11 Conclusion: Do Not Reject Ho
12 p-value= 0.015163
Let α = 0.05.
A B C
1 Data Input N= 40
2 bk= 10.20964
3 se(bk)= 2.093263
4 Ho: βk= 5.5
5 α= 0.05
6
7 Computed Values df or m= 38
8 tc= 1.685954
9
10 Right-Tail Test t-statistic= 2.249904
11 Conclusion: Reject Ho
12 p-value= 0.015163
Note that the hypothesis testing procedure for testing the null hypothesis H0: β2 ≥ 15 against
the alternative hypothesis H1: β2 < 15 is exactly the same as testing H0: β2 = 15 against the
alternative hypothesis H1: β2 < 15.
A B C D
1 Data Input N= 40
2 bk= 10.20964
3 se(bk)= 2.093263
4 Ho: βk= 15
5 α= 0.01
6
7 Computed Values df or m= 38
8 tc= -2.428568
9
10 Left-Tail Test t-statistic= -2.288464
11 Conclusion: Do Not Reject Ho
12 p-value= 0.013881
13 Conclusion: Do Not Reject Ho
Let α = 0.05.
A B C
1 Data Input N= 40
2 bk= 10.20964
3 se(bk)= 2.093263
4 Ho: βk= 15
5 α= 0.05
6
7 Computed Values df or m= 38
8 tc= -1.685954
9
10 Left-Tail Test t-statistic= -2.288464
11 Conclusion: Reject Ho
12 p-value= 0.013881
13 Conclusion: Reject Ho
A B C
1 Data Input N= 40
2 bk= 10.20964
3 se(bk)= 2.093263
4 Ho: βk= 7.5
5 α= 0.05
6
7 Computed Values df or m= 38
8 tc= 2.024394
9
10 Two-Tail Test t-statistic= 1.294458
11 Conclusion: Do Not Reject Ho
12 p-value= 0.203318
13 Conclusion: Do Not Reject Ho
A B C
1 Data Input N= 40
2 bk= 10.20964
3 se(bk)= 2.093263
4 Ho: βk= 0
5 α= 0.05
6
7 Computed Values df or m= 38
8 tc= 2.024394
9
10 Two-Tail Test t-statistic= 4.877381
11 Conclusion: Reject Ho
12 p-value= 1.95E-05
13 Conclusion: Reject Ho
Note that the p-value for this test is very small. "1.95E-05" is standard scientific notation, which
means "1.95 times 10 to the power −5":
\[ \text{1.95E-05} = 1.95 \times 10^{-5} = \frac{1.95}{10^{5}} = \frac{1.95}{100{,}000} = 0.0000195 \]
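Scientific notation of this kind is not specific to Excel; most programming languages read and write it directly. A short Python illustration, converting between the two forms:

```python
p = float("1.95E-05")        # parse Excel-style scientific notation
print(f"{p:.7f}")            # fixed-point form: 0.0000195
print(f"{0.0000195:.2E}")    # back to scientific form: 1.95E-05
```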
Also note that this p-value for the two-tail test of significance is reported in the regression
summary output generated by Excel.
Go back to your Regression worksheet. If you do not see your Regression tab, it is because it is
scrolled out of view. Use either one of the left arrows at the left corner of your screen so that the first
worksheets you were working with can be seen again.
Column E of the last table of the summary output presents the p-values for the two-tail
tests of significance of the intercept and slope parameters, β1 and β2 (shaded cells below).
A B C D E F G H I
1 SUMMARY OUTPUT
2
3 Regression Statistics
4 Multiple R 0.620485472
5 R Square 0.385002221
6 Adjusted R Square 0.368818069
7 Standard Error 89.51700429
8 Observations 40
9
10 ANOVA
11 df SS MS F Significance F
12 Regression 1 190626.9788 190626.9788 23.78884107 1.94586E-05
13 Residual 38 304505.1742 8013.294058
14 Total 39 495132.153
15
16 Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
17 Intercept 83.41600997 43.41016192 1.921577951 0.062182379 -4.463267721 171.2952877 -4.463267721 171.2952877
18 X Variable 1 10.20964245 2.093263451 4.877380554 1.94586E-05 5.972052202 14.44723271 5.972052202 14.44723271
CHAPTER 4
CHAPTER OUTLINE
4.1 Least Squares Prediction
4.2 Measuring Goodness-of-Fit
4.2.1 Coefficient of Determination or R2
4.2.2 Correlation Analysis and R2
4.2.3 The Food Expenditure Example and the CORREL Function
4.3 The Effects of Scaling the Data
4.3.1 Changing the Scale of x
4.3.2 Changing the Scale of y
4.3.3 Changing the Scale of x and y
4.4 A Linear-Log Food Expenditure Model
4.4.1 Estimating the Model
4.4.2 Scatter Plot of Data with Fitted Linear-Log Relationship
4.5 Using Diagnostic Residual Plots
4.5.1 Random Residual Pattern
4.5.2 Heteroskedastic Residual Pattern
4.5.3 Detecting Model Specification Errors
4.6 Are the Regression Errors Normally Distributed?
4.6.1 Histogram of the Residuals
4.6.2 The Jarque-Bera Test for Normality using the CHIINV and CHIDIST Functions
4.6.3 The Jarque-Bera Test for Normality for the Linear-Log Food Expenditure Model
4.7 Polynomial Models: An Empirical Example
4.7.1 Scatter Plot of Wheat Yield over Time
4.7.2 The Linear Equation Model
4.7.2a Estimating the Model
4.7.2b Residuals Plot
4.7.3 The Cubic Equation Model
4.7.3a Estimating the Model
4.7.3b Residuals Plot
4.8 Log-Linear Models
4.8.1 A Growth Model
4.8.2 A Wage Equation
4.8.3 Prediction
4.8.4 A Generalized R2 Measure
4.8.5 Prediction Intervals
4.9 A Log-Log Model: Poultry Demand Equation
4.9.1 Estimating the Model
4.9.2 A Generalized R2 Measure
4.9.3 Scatter Plot of Data with Fitted Log-Log Relationship
In this chapter we continue to work with the simple linear regression model of weekly food
expenditure to make predictions, compute goodness-of-fit measures, and address modeling issues.
We also work with additional examples.
96 Chapter 4
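A 100(1 − α)% prediction interval at value x0 of the explanatory variable is defined as:
\[ \hat{y}_0 \pm t_c\,\mathrm{se}(f) \tag{4.1} \]
where
\[ \hat{y}_0 = b_1 + b_2 x_0 \tag{4.2} \]
\[ \mathrm{se}(f) = \sqrt{\hat{\sigma}^2 + \frac{\hat{\sigma}^2}{N} + (x_0 - \bar{x})^2\,\mathrm{se}(b_2)^2} \tag{4.3} \]
and σ̂² is the estimate of the error variance or mean square residual (MS residual). The last term
under the square root uses se(b2)² = σ̂²/Σ(xi − x̄)².
The lower limit (LL) and upper limit (UL) of the prediction interval are:
\[ LL = \hat{y}_0 - t_c\,\mathrm{se}(f) \tag{4.4} \]
\[ UL = \hat{y}_0 + t_c\,\mathrm{se}(f) \tag{4.5} \]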
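Equations (4.2) to (4.5) can be sketched in a few lines of Python. The function name prediction_interval is hypothetical; the inputs are values from the Food Regression output, and x0 = 20 (a weekly income of $2,000) is chosen for illustration. The variance term squares se(b2), as the formula for se(f) requires.

```python
import math


def prediction_interval(x0, b1, b2, se_b2, ms_resid, n, xbar, tc):
    """Return (y0_hat, se_f, lower, upper) following equations (4.2)-(4.5).

    se_b2**2 stands in for sigma_hat^2 / sum((x_i - xbar)**2),
    because var(b2) = sigma_hat^2 / sum((x_i - xbar)**2).
    """
    y0_hat = b1 + b2 * x0                                  # (4.2)
    se_f = math.sqrt(ms_resid + ms_resid / n
                     + (x0 - xbar) ** 2 * se_b2 ** 2)      # (4.3)
    lower = y0_hat - tc * se_f                             # (4.4)
    upper = y0_hat + tc * se_f                             # (4.5)
    return y0_hat, se_f, lower, upper


y0, se_f, lo, hi = prediction_interval(x0=20, b1=83.41601, b2=10.20964,
                                       se_b2=2.093263, ms_resid=8013.294,
                                       n=40, xbar=19.60475, tc=2.024394)
print(round(y0, 2), round(se_f, 2), round(lo, 2), round(hi, 2))
# 287.61 90.63 104.13 471.09
```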
Before we create a template to compute prediction intervals, we quickly re-estimate the food
expenditure model; note that this time we also want to generate the residual output. We are
interested in the Predicted Y values generated in this output. Also, since we will use more than
one data set and run more than one regression in this chapter, we will give our data and
regression worksheets more explicit names.
Rename Sheet1 food data. Re-estimate the regression parameters using Excel's Regression
analysis routine as in Section 2.2.2. In the Regression dialog box, the Input Y Range should be
A2:A41, and the Input X Range should be B2:B41. Select New Worksheet Ply and name it
Food Regression, and check the box next to Residuals.
Prediction, GoodnessofFit, and Modeling Issues 97
Next, insert a new worksheet by selecting the Insert Worksheet tab at the lower left corner of
your screen. Name it Prediction Interval.
Create the following template to construct interval estimates. In the last column you will find the
numbers of the equations and the formatting options used, if any, in the template.
A B C
1 Data Input Sample Size= ='Food Regression'!B8
2 Confidence percentage
3 x0 =
4 b1 = ='Food Regression'!B17
5 b2 = ='Food Regression'!B18
6 se(b2) = ='Food Regression'!C18
7 MS residual= ='Food Regression'!D13
8
9 Computed Values α= =1-C2
10 df or m= =C1-2
11 tc= =TINV(C9,C10)
12 predicted y0= =C4+C5*C3 (4.2)
13 xbar= =AVERAGE('food data'!B2:B41)
14 se(f) = =SQRT(C7+C7/C1+((C3-C13)^2)*C6^2) (4.3)
15
16 Prediction Interval Lower Limit= =C12-C11*C14 (4.4)
17 Upper Limit= =C12+C11*C14 (4.5)
With x0 = 20 and a confidence percentage of 0.95, the template returns the following values (rows 5 to 17):
5 b2 = 10.20964
6 se(b2) = 2.093263
7 MS residual = 8013.294
8
9 Computed Values α = 0.05
10 df or m = 38
11 tc = 2.024394
12 predicted y0 = 287.6089
13 xbar = 19.60475
14 se(f) = 90.63284
15
16 Prediction Interval Lower Limit = 104.1323
17 Upper Limit = 471.0854
\[ R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST} \tag{4.6} \]
where SSR is the sum of squares due to the regression (SS Regression),
SSE is the sum of squared errors or sum of squared residuals (SS Residual), and SST is the total sum of squares (SS Total).
In the simple regression model, R2 is also equal to the square of the sample correlation coefficient between x and y:
\[ R^2 = r_{xy}^2 \tag{4.7} \]
R2 can also be computed as the square of the sample correlation coefficient between yi and
ŷi = b1 + b2xi. This result is valid not only in simple regression models but also in the multiple
regression models that will be introduced in Chapter 5.
\[ R^2 = r_{y\hat{y}}^2 \tag{4.8} \]
Insert a new worksheet by selecting the Insert Worksheet tab at the lower left corner of your
screen. Name it Correlation Analysis and R2.
Create the following template (in the last column, you will find the numbers of the equations used
in the template):
A B C
1 Data Input SS Residual= ='Food Regression'!C13
2 SS Total= ='Food Regression'!C14
3
4 Computed Values R2= =1-C1/C2 (4.6)
5 rxy= =CORREL('food data'!B2:B41,'food data'!A2:A41)
6 r2xy= =C5^2 (4.7)
7 ryyhat= =CORREL('food data'!A2:A41,'Food Regression'!B25:B64)
8 r2yyhat= =C7^2 (4.8)
The sample correlation coefficients in cells C5 and C7 are computed using the CORREL
statistical function. CORREL returns the correlation coefficient between two data sets. The
general syntax of this function is:
=CORREL(cell_range1, cell_range2)
In cell C5, we compute the correlation coefficient between x and y values, which we find in the
food data worksheet. In cell C7, we compute the correlation coefficient between y and ŷ values;
the latter are found in the Food Regression worksheet, under the column labeled "Predicted Y"
from the residual output.
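The equivalence of (4.6), (4.7), and (4.8) in simple regression can be checked with a small Python sketch. Here correl is a hypothetical helper implementing the Pearson sample correlation coefficient (the statistic CORREL returns), and the data are only the first five food data observations, used for illustration, so the numbers differ from the full-sample results. The last line repeats the full-sample calculation in cell C4 from the ANOVA sums of squares.

```python
import math


def correl(u, v):
    """Pearson sample correlation coefficient, the statistic CORREL returns."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


# First five observations of the food data worksheet (illustration only)
y = [115.22, 135.98, 119.34, 114.96, 187.05]   # food_exp
x = [3.69, 4.39, 4.75, 6.03, 12.47]            # income

# Least squares fit and fitted values y_hat
xbar, ybar = sum(x) / len(x), sum(y) / len(y)
b2 = (sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
      / sum((a - xbar) ** 2 for a in x))
b1 = ybar - b2 * xbar
y_hat = [b1 + b2 * a for a in x]

sse = sum((b - f) ** 2 for b, f in zip(y, y_hat))
sst = sum((b - ybar) ** 2 for b in y)
r2 = 1 - sse / sst                                  # (4.6)
print(round(r2, 6),
      round(correl(x, y) ** 2, 6),                  # (4.7)
      round(correl(y, y_hat) ** 2, 6))              # (4.8)

# Full sample: SS Residual and SS Total from the Food Regression output
print(round(1 - 304505.1742 / 495132.153, 6))       # 0.385002
```

The three numbers on the first printed line agree, which is the result the worksheet shows for the full sample.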
Here are the results you should get (see also p. 138 of Principles of Econometrics, 4e):
A B C
1 Data Input SS Residual= 304505.2
2 SS Total= 495132.2
3
4 Computed Values R2= 0.385002
5 rxy= 0.620485
6 r2xy= 0.385002
7 ryyhat= 0.620485
8 r2yyhat= 0.385002
Note that ryŷ and R2 are actually reported in the summary output of your regression analysis:
cells B4:B5, shaded below (ryŷ is labeled "Multiple R" and R2 is called by its familiar name "R
Square").
A B
1 SUMMARY OUTPUT
2
3 Regression Statistics
4 Multiple R 0.620485472
5 R Square 0.385002221
6 Adjusted R Square 0.368818069
7 Standard Error 89.51700429
8 Observations 40
In our food data worksheet, weekly food expenditure (y values) are recorded in dollars while
weekly income (x values) are recorded in units of $100.
Recall our estimated regression model. Below the estimated values for b1 and b2, we report their
estimated standard errors, se(b1) and se(b2):
\[ \underset{(\text{se})}{\hat{y}_i} = \underset{(43.41)}{83.42} + \underset{(2.09)}{10.21}\,x_i \tag{4.9} \]
Given the units of measurement of the data, the interpretation of the estimated slope coefficient is
as follows: as weekly income increases by 1 unit, i.e. $100, weekly food expenditure is expected
to increase by 10.21 units, i.e. $10.21. The interpretation of the estimated intercept coefficient is
as follows: weekly food expenditure for a household with zero income is estimated at $83.42.
Let x* = 100x. We change the scale of measurement of our x values so that weekly income is
now recorded in dollars.
Go back to your food data worksheet. In D1, enter the column label x*=100x. In cell D2, enter
the formula =100*B2; copy it to cells D3:D41. Here is how your table should look (only the first
five values are shown below):
A B C D
1 food_exp income x*=100x
2 115.22 3.69 369
3 135.98 4.39 439
4 119.34 4.75 475
5 114.96 6.03 603
6 187.05 12.47 1247
We want to reestimate the food expenditure model using our original y values and our rescaled
x* values.
In the Regression dialog box, the Input Y Range should be A2:A41, and the Input X Range
should be D2:D41. Select New Worksheet Ply and name it Food Regression 100x (you do not
need to select Residuals).
\[ \underset{(\text{se})}{\hat{y}_i} = \underset{(43.41)}{83.42} + \underset{(0.0209)}{0.1021}\,x_i^{*} \tag{4.10} \]
Given the units of measurement of the data, the interpretation of the estimated slope coefficient is
as follows: as weekly income increases by 1 unit, i.e. $1, weekly food expenditure is expected to
increase by 0.1021 units, i.e. $0.1021 or 10.21 cents. Note that this is equivalent to saying that
as weekly income increases by $100, weekly food expenditure is expected to increase by $10.21;
rescaling the data does not affect the measurement of the underlying relationship.
Go back to your food data worksheet. In E1, enter the column label y*=y/100. In cell E2, enter
the formula =A2/100; copy it to cells E3:E41. Here is how your table should look (only the first
five values are shown below):
A B C D E
1 food_exp income x*=100x y*=y/100
2 115.22 3.69 369 1.1522
3 135.98 4.39 439 1.3598
4 119.34 4.75 475 1.1934
5 114.96 6.03 603 1.1496
6 187.05 12.47 1247 1.8705
We want to reestimate the food expenditure model using our original x values and our rescaled
y* values.
In the Regression dialog box, the Input Y Range should be E2:E41, and the Input X Range
should be B2:B41. Select New Worksheet Ply and name it Food Regression divided by 100.
\[ \hat{y}_i^{*} = 0.8342 + 0.1021\,x_i \tag{4.11} \]
Given the units of measurement of the data, the interpretation of the estimated slope coefficient is
as follows: as weekly income increases by 1 unit, i.e. $100, weekly food expenditure is expected
to increase by 0.1021 of a $100 unit, i.e. $10.21. The interpretation of the estimated intercept
coefficient is as follows: weekly food expenditure for a household with zero income is estimated
at 0.8342 of a $100 unit, i.e. $83.42. Again, note that rescaling the data does not affect the
measurement of the underlying relationship.
Go back to your food data worksheet. In F1, enter the column label x*=4x. In G1, enter the
column label y*=4y. In cell F2, enter the formula =4*B2. In cell G2, enter the formula =4*A2.
Copy the content of cells F2:G2 to cells F3:G41. Here is how your table should look (only the
first five values are shown below):
A B C D E F G
1 food_exp income x*=100x y*=y/100 x*=4x y*=4y
2 115.22 3.69 369 1.1522 14.76 460.88
3 135.98 4.39 439 1.3598 17.56 543.92
4 119.34 4.75 475 1.1934 19 477.36
5 114.96 6.03 603 1.1496 24.12 459.84
6 187.05 12.47 1247 1.8705 49.88 748.2
We want to reestimate the food expenditure model using our newly rescaled x* and y* values.
In the Regression dialog box, the Input Y Range should be G2:G41, and the Input X Range
should be F2:F41. Select New Worksheet Ply and name it Regression 4x and 4y.
\[ \hat{y}_i^{*} = 333.66 + 10.21\,x_i^{*} \tag{4.12} \]
Given the units of measurement of the data, the interpretation of the estimated slope coefficient is
as follows: as monthly income increases by 1 unit, i.e. $100, monthly food expenditure is
expected to increase by 10.21 units, i.e. $10.21. The estimated monthly food expenditure for a
household with zero income is $333.66; this is 4 times the estimated weekly food expenditure for
a household with zero income (see Section 4.3.1). Again, rescaling the data did not affect the
measurement of the underlying relationship.
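The three rescaling results in (4.10) to (4.12) can be verified with a short Python sketch. The helper ols is hypothetical, and the data are only the first five food data observations (illustration only), but the coefficient ratios printed do not depend on which observations are used.

```python
def ols(x, y):
    """Simple least squares: return (intercept, slope)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    slope = (sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
             / sum((a - xbar) ** 2 for a in x))
    return ybar - slope * xbar, slope


# First five food data observations (illustration only)
y = [115.22, 135.98, 119.34, 114.96, 187.05]   # food_exp
x = [3.69, 4.39, 4.75, 6.03, 12.47]            # income

b1, b2 = ols(x, y)
rescaled = {
    "x* = 100x": ols([100 * a for a in x], y),
    "y* = y/100": ols(x, [b / 100 for b in y]),
    "x* = 4x, y* = 4y": ols([4 * a for a in x], [4 * b for b in y]),
}
# Ratios of the rescaled coefficients to the original ones
for label, (c1, c2) in rescaled.items():
    print(f"{label:18s} intercept ratio {c1 / b1:.4f}  slope ratio {c2 / b2:.4f}")
```

Multiplying x by 100 divides the slope by 100 and leaves the intercept unchanged; dividing y by 100 divides both coefficients by 100; multiplying both x and y by 4 leaves the slope unchanged and multiplies the intercept by 4.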
In your food data worksheet, insert a column to the right of the income column B (see Section
1.4 for more details on how to do that). In cells C1:C2, enter the following column label and
formula.
C
1 ln(income)
2 =LN(B2)
Copy the content of cell C2 to cells C3:C41. Here is how your table should look (only the first
five values are shown below):
A B C
1 food_exp income ln(income)
2 115.22 3.69 1.305626
3 135.98 4.39 1.479329
4 119.34 4.75 1.558145
5 114.96 6.03 1.796747
6 187.05 12.47 2.523326
In the Regression dialog box, the Input Y Range should be A2:A41, and the Input X Range
should be C2:C41. Select New Worksheet Ply, name it Log-Linear Food Model and check
the box next to Residuals. Finally, select OK.
The result is (matching the one reported on p. 144 of Principles of Econometrics, 4e):
A B C D E F G H I
1 SUMMARY OUTPUT
2
3 Regression Statistics
4 Multiple R 0.597084978
5 R Square 0.356510471
13 Residual 38 318612.3559 8384.535682
14 Total 39 495132.153
15
16 Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
17 Intercept -97.18641517 84.23744235 -1.153719919 0.255620028 -267.7162004 73.34337005 -267.7162004 73.34337005
18 X Variable 1 132.1658424 28.80461184 4.588357 4.75993E-05 73.85395477 190.47773 73.85395477 190.47773
Note that your summary output should be followed by a RESIDUAL OUTPUT table. This last
table contains a column of Predicted Y, or fitted, values and a column of Residuals. We use
the fitted values in the next section.
A B C
22 RESIDUAL OUTPUT
23
24 Observation Predicted Y Residuals
25 1 75.3728 39.8472
26 2 98.3304 37.6496
27 3 108.7471 10.5929
28 4 140.2822 -25.3222
29 5 236.3111 -49.2611
Go back to your food data worksheet and select A2:B41. Select the Insert tab located next to the
Home tab. In the Charts group of commands select Scatter, and then Scatter with only
Markers.
[Chart: default scatter plot of the selected data, plotted as Series1.]
You can see that our food expenditure values are on the horizontal axis and income values are on
the vertical axis; we would like to switch them around and edit our chart as we did in Section 2.1.
The result is (see also Figure 4.6 on p. 144 of Principles of Econometrics, 4e):
[Chart: scatter plot of weekly food expenditure against weekly income; horizontal axis from 0 to 40.]
Finally, we add the fitted linear-log relationship to our scatter plot. Right-click in the middle of
the chart area of your scatter plot and select Select Data. In the Legend Entries (Series) window
of the Select Data Source dialog box, select the Add button. In the Series name window, type
Fitted Linear-Log Relationship. Select B2:B41, from the food data worksheet, for the Series X
values; select B25:B64, from the Log-Linear Food Model worksheet (Predicted Y values), for the
Series Y values. Finally, select OK. The Fitted Linear-Log Relationship series has been added
to your graph.
Before you close the Select Data Source dialog box, select Series1 and Edit. Type the name
Actual in the Series name window. Select OK. In the Select Data Source window that
reappears, select OK again.
Make sure your chart is selected so that the Chart Tools are visible. In the Layout tab, go to the
Labels group of commands. Select the Legend button and choose either one of the Overlay
Legend options. Grab your legend with your cursor and move it to the upper left corner of your
chart area.
Finally, we want to reformat our Fitted Linear-Log Relationship series. Select the
plotted series in your chart area, right-click and select Format Data Series. A Format Data
Series dialog box pops up. Select Line Color and Solid line. Change the line color to something
different from the Actual series points. Select Marker Options, and change the Marker Type
from Automatic to None. Select Close.
The result is (see also Figure 4.6 on p. 144 of Principles of Econometrics, 4e):
[Chart: Actual food expenditure observations and the Fitted Linear-Log Relationship line; horizontal axis: weekly income in $100, from 0 to 40.]
\[ y = 1 + x + e \tag{4.14} \]
First, 300 pairs of xi and ei values are created using random number generators, similarly to the
way we artificially generated variables in Sections 2.4 and 3.1.4. The variable x is simulated,
using a random number generator, to be evenly, or uniformly, distributed between 0 and 10. The
error term e is simulated to be uncorrelated, homoskedastic, and drawn from a standard normal
distribution, e ~ N(0,1). We generate these simulated observations next.
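For readers who want to reproduce the experiment outside Excel, here is a Python sketch of the same data-generating process and least squares fit. It uses Python's random module (random.uniform and random.gauss) in place of Excel's Random Number Generation tool, so the draws differ from Excel's; the seed 12345 is an arbitrary choice.

```python
import random

random.seed(12345)   # arbitrary seed; Python's draws will not match Excel's
N = 300
x = [random.uniform(0, 10) for _ in range(N)]   # x ~ Uniform(0, 10)
e = [random.gauss(0, 1) for _ in range(N)]      # e ~ N(0, 1)
y = [1 + xi + ei for xi, ei in zip(x, e)]       # equation (4.14)

# Least squares estimates and residuals, as the Regression tool would compute
xbar, ybar = sum(x) / N, sum(y) / N
b2 = (sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
      / sum((a - xbar) ** 2 for a in x))
b1 = ybar - b2 * xbar
residuals = [b - (b1 + b2 * a) for a, b in zip(x, y)]
print(round(b1, 3), round(b2, 3))   # both should be close to 1
```

Plotting residuals against x should show no pattern, which is the random residual pattern this section illustrates.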
Insert a new worksheet by selecting the Insert Worksheet tab at the lower left corner of your
screen. Name it Random Residual.
In cells A1:B1 of your Random Residual worksheet, enter the following column labels.
A B
1 x e
Select the Data tab, in the middle of your tab list. In the Analysis group of commands, at the far
right of the ribbon, select Data Analysis.
The Data Analysis dialog box pops up. In it, select Random Number Generation (you might
need to use the scroll up and down bar to the right of the Analysis Tools window to find it), then
select OK.
A Random Number Generation dialog box pops up. The Number of Variables simulated is 1,
and the Number of Random Numbers generated is 300. The variable x is simulated to be
Uniformly distributed between 0 and 10. Select the Output Range in the Output options
section, and specify it to be A2:A301 in your Random Residual worksheet. Finally, select OK.
We repeat the procedure to draw a random sample of 300 error terms from a standard normal
distribution (Distribution: Normal, with Mean = 0 and Standard deviation = 1). Select the Output
Range in the Output options section, and specify it to be B2:B301 in your Random Residual
worksheet. Finally, select OK.

In cells C1:C2 of your Random Residual worksheet, enter the following column label and
formula.
     C
1    y
2    =1+A2+B2
Select cell C2 and copy it to cells C3:C301. Here is how our worksheet looks (only the first five
values are shown below):
     A          B           C
1    x          e           y
2    4.405957   0.998193    6.40415
3    9.518723   1.011883    11.53061
4    3.821223   -0.0083     4.812922
5    2.649922   -0.43201    3.217908
6    3.976562   0.25586     5.232422
Note that you will have drawn a different random sample and thus obtained different sample
values for y.
Next, we apply the least squares estimator to these simulated observations and compute the least
squares residuals.
In the Regression dialog box, the Input Y Range should be C2:C301, and the Input X Range
should be A2:A301. Select New Worksheet Ply, name it Simulated Model 1, and check the
box next to Residual Plots. Finally, select OK.
In addition to the Summary Output you now have a Residual Output table and a Residual Plot
in your new worksheet.
After editing the chart as we did in Section 2.1 or Section 2.3.4, the result is (see also Figure 4.7
on p. 146 of Principles of Econometrics, 4e):
[Figure: residuals from Simulated Model 1 plotted against x, from 0 to 10]
Go back to your food data worksheet, select your scatter plot of the food expenditure-income data
points and the fitted linear-log relationship, and make a copy of it. Right-click in the middle of the
copy of your chart and choose Select Data. In the Legend Entries (Series) window of the Select
Data Source dialog box, select the Fitted LinearLog Relationship series, and then the Remove
button.
Next, select the Actual series, and then the Edit button. In the Edit Series window, delete the old
Series name and respecify the Series Y values to be C25:C64 (the residuals) from the LogLinear
Food Model worksheet. Finally, select OK twice.
The result is (see also Figure 4.8 on p. 146 of Principles of Econometrics, 4e):
[Figure: residuals from the linear-log food expenditure model plotted against income in $100]
$$y_i = 15 - 4x_i^2 + e_i \qquad (4.15)$$
First, 51 pairs of x_i and e_i values are created, similarly to the way we artificially generated
variables in Sections 2.4, 3.1.4 and 4.5.1. This time x is not random: it takes the equally spaced
values -2.5, -2.4, ..., 2.5. The error term e is simulated, using a random number generator, to be
uncorrelated, homoskedastic, and from a normal distribution with mean 0 and variance 4, or
e ~ N(0,4). We generate these simulated observations next.
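A comparable Python sketch of this specification-error experiment follows, again assuming only the standard library (the seed and function names are hypothetical). The x grid mirrors the worksheet formula; because the fitted straight line omits the x² term, the residuals should show the inverted-U pattern of the residual plot later in this section.

```python
import random


def ols_simple(x, y):
    """Least squares estimates (b1, b2) for y = b1 + b2*x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b2 = sxy / sxx
    return ybar - b2 * xbar, b2


def simulate_model_2(seed=2011):
    """Simulate y = 15 - 4x^2 + e on x = -2.5, -2.4, ..., 2.5 and fit a straight line."""
    rng = random.Random(seed)
    x = [-2.5 + k / 10 for k in range(51)]          # like =-2.5+((A2-1)/10) for A = 1..51
    e = [rng.gauss(0, 2) for _ in x]                # standard deviation 2, i.e. variance 4
    y = [15 - 4 * xi ** 2 + ei for xi, ei in zip(x, e)]
    b1, b2 = ols_simple(x, y)                       # deliberately misspecified linear model
    residuals = [yi - (b1 + b2 * xi) for xi, yi in zip(x, y)]
    return x, residuals


x, res = simulate_model_2()
# Positive near x = 0 and negative at both ends: the curvature the linear model leaves out.
print(round(res[0], 2), round(res[25], 2), round(res[50], 2))
```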
Insert a new worksheet by selecting the Insert Worksheet tab at the lower left corner of your
screen. Name it Specification Error Residual.
In cells A1:C3 of your Specification Error Residual worksheet, enter the following column labels,
observation numbers and formula.
     A     B                      C
1          x                      e
2    1     =-2.5+((A2-1)/10)
3    2
Select cells A2:A3, move your cursor to the lower right corner of your selection until it turns into
a skinny cross, left-click, hold it and drag it down to cell A52.
Prediction, GoodnessofFit, and Modeling Issues 113
Copy cell B2 to cells B3:B52. Your table should look like the one below (only the first five values
are shown).
     A     B       C
1          x       e
2    1     -2.5
3    2     -2.4
4    3     -2.3
5    4     -2.2
6    5     -2.1
Select the Data tab, in the middle of your tab list. On the Analysis group of commands to the far
right of the ribbon, select Data Analysis.
The Data Analysis dialog box pops up. In it, select Random Number Generation (you might
need to use the scroll up and down bar to the right of the Analysis Tools window to find it), then
select OK.
In the Random Number Generation dialog box, draw 1 random sample of 51 error terms from a
Normal distribution with Mean 0 and Standard deviation 2. Select the Output Range in the Output
options section, and specify it to be C2:C52 in your Specification Error Residual worksheet.
Finally, select OK.
In cells D1:D2 of your Specification Error Residual worksheet, enter the following column
label and formula.
     D
1    y
2    =15-4*(B2^2)+C2
Select cell D2 and copy it to cells D3:D52. Here is how our worksheet looks (only the first five
values are shown below):
     A     B       C           D
1          x       e           y
2    1     -2.5    2.723068    -7.27693
3    2     -2.4    -0.50477    -8.54477
4    3     -2.3    1.115236    -5.04476
5    4     -2.2    2.916886    -1.44311
6    5     -2.1    2.982706    0.342706
Note that you will have drawn a different random sample and thus obtained different sample
values for y.
Next, we apply the least squares estimator to these simulated observations and compute the least
squares residuals.
In the Regression dialog box, the Input Y Range should be D2:D52, and the Input X Range
should be B2:B52. Select New Worksheet Ply, name it Simulated Model 2, and check the
box next to Residual Plots. Finally, select OK.
In addition to the Summary Output you now have a Residual Output table and a Residual Plot
in your new worksheet.
  
After editing the chart as we did in Section 2.1 or Section 2.3.4, the result is (see also Figure 4.9
on p. 147 of Principles of Econometrics, 4e):
[Figure: residuals from Simulated Model 2 plotted against x]
Our analysis of normality of the regression errors will include a histogram of the residuals and the
Jarque-Bera test for normality.
Go back to your Food Regression worksheet. If you do not see your Food Regression tab, it is
because it is hidden. Use either one of the left-arrow buttons at the lower left corner of your screen
until the first worksheets you were working with can be seen again. (If the worksheet you need to
go back to is a recently created one, use the right-arrow buttons.)
Next to the columns of Residuals in the residual output section of the worksheet, we will create a
BIN column. In cell D24, type BIN. The bin values will determine the range of residual values
for each column of the histogram. The bin values have to be given in ascending order. Starting
with the lowest bin value, a residual value will be counted in a particular bin if it is equal to or
less than the bin value.
Fill in the bin values as shown below. Note that all you need to do is enter the first two values,
-225 and -200, select cells D25:D26, move your cursor to the lower right corner of your selection
until it turns into a skinny cross, left-click, hold it and drag it down to cell D43:
Excel recognizes the series and automatically completes it for you.
     D
24   BIN
25   -225
26   -200
27   -175
28   -150
29   -125
30   -100
31   -75
32   -50
33   -25
34   0
35   25
36   50
37   75
38   100
39   125
40   150
41   175
42   200
43   225
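The bin-counting rule described above can be sketched in Python. This assumes, as the text states, that a value is counted in the first bin whose value is greater than or equal to it (and greater than the previous bin value), and that Excel adds a final "More" count for values above the last bin; the function name is hypothetical.

```python
from bisect import bisect_left


def excel_histogram(values, bins):
    """Count values per bin: each value goes to the first bin value >= it.

    bins must be in ascending order; the extra last count collects values above
    the largest bin (the row Excel labels "More").
    """
    counts = [0] * (len(bins) + 1)
    for v in values:
        counts[bisect_left(bins, v)] += 1
    return counts


bins = list(range(-225, 226, 25))     # -225, -200, ..., 225, as entered in D25:D43
print(len(bins), bins[0], bins[-1])   # 19 -225 225
print(excel_histogram([-2, -1, -0.5, 0, 0.5, 1, 2], [-1, 0, 1]))  # [2, 2, 2, 1]
```

bisect_left returns the position of the first bin that is not smaller than the value, which is exactly the "equal to or less than the bin value" rule.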
Select the Data tab, in the middle of your tab list. On the Analysis group of commands to the far
right of the ribbon, select Data Analysis.
The Data Analysis dialog box pops up. In it, select Histogram (you might need to use the scroll
up and down bar to the right of the Analysis Tools window to find it), then select OK.
A Histogram dialog box pops up. For the Input Range, specify C25:C64; for the Bin Range,
specify D25:D43. The Input Range indicates the data set Excel will look at to determine how
many values are counted in each bin of the Bin Range. Check the New Worksheet Ply option
and name it Residuals Histogram; check the box next to Chart Output. Finally, select OK.
Select the columns in your chart area, right-click and select Format Data Series. The Series
Options tab of the Format Data Series dialog box should be open. Select the Gap Width slider
and move it to the far left, towards No Gap.

Finally, delete the Legend, and increase the size of the Chart area (see Section 2.3.4 for more
details on that). The result should be very similar to Figure 4.10 on p. 148 of Principles of
Econometrics, 4e:
[Figure: Histogram of the residuals, frequency by bin]
4.6.2 The Jarque-Bera Test for Normality using the CHIINV and CHIDIST Functions
When the residuals are normally distributed, the Jarque-Bera statistic (JB) follows, approximately
in large samples, a chi-squared distribution with m = 2 degrees of freedom:
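$$JB = \frac{N}{6}\left(S^2 + \frac{(K-3)^2}{4}\right) \sim \chi^2_{(m=2)} \qquad (4.16)$$
where $S = \tilde{\mu}_3/\tilde{\sigma}^3$ is a measure of skewness and $K = \tilde{\mu}_4/\tilde{\sigma}^4$ is a measure of kurtosis,
where
$$\tilde{\sigma} = \sqrt{\frac{\sum_{i=1}^{N}(\hat{e}_i - \bar{\hat{e}})^2}{N}} \qquad (4.17)$$
$$\tilde{\mu}_3 = \frac{\sum_{i=1}^{N}(\hat{e}_i - \bar{\hat{e}})^3}{N} \qquad (4.18)$$
$$\tilde{\mu}_4 = \frac{\sum_{i=1}^{N}(\hat{e}_i - \bar{\hat{e}})^4}{N} \qquad (4.19)$$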
If the hypothesis of normally distributed residuals is true, there is a 100α percent chance that the
computed JB statistic is equal to or greater than the chi-square critical value $\chi^2_{(1-\alpha,m)}$. If the
computed JB statistic is equal to or greater than $\chi^2_{(1-\alpha,m)}$, this is evidence that our
hypothesis of normally distributed errors is false, and we
reject it.
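Equations (4.16)-(4.19) translate directly into code. The following is a minimal Python sketch (standard library only; the function name is an illustrative choice). It takes deviations from the mean, as in (4.17)-(4.19), so it also works for residuals that do not sum exactly to zero.

```python
import math


def jarque_bera(residuals):
    """Return (S, K, JB) computed as in equations (4.16)-(4.19)."""
    n = len(residuals)
    mean = sum(residuals) / n                         # zero for OLS residuals with an intercept
    dev = [r - mean for r in residuals]
    sigma = math.sqrt(sum(d ** 2 for d in dev) / n)   # (4.17)
    mu3 = sum(d ** 3 for d in dev) / n                # (4.18)
    mu4 = sum(d ** 4 for d in dev) / n                # (4.19)
    s = mu3 / sigma ** 3                              # skewness
    k = mu4 / sigma ** 4                              # kurtosis
    jb = n / 6 * (s ** 2 + (k - 3) ** 2 / 4)          # (4.16)
    return s, k, jb


print(jarque_bera([-2, -1, 0, 1, 2]))   # a symmetric sample, so S = 0
```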
[Figure: $\chi^2_{(m)}$ density; reject H0 when the statistic falls to the right of the critical value $\chi^2_{(1-\alpha,m)}$]
We will create a template for the Jarque-Bera test for normality. But before we do that, we need
to go back to our Food Regression worksheet to perform intermediate calculations.
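Before we compute the measure of skewness S and the measure of kurtosis K, note that since the
least squares residuals of a model with an intercept average to zero, $\bar{\hat{e}} = 0$, the numerators of
equations (4.17)-(4.19), $\sum(\hat{e}_i - \bar{\hat{e}})^2$, $\sum(\hat{e}_i - \bar{\hat{e}})^3$ and $\sum(\hat{e}_i - \bar{\hat{e}})^4$, simplify to
$\sum \hat{e}_i^2$, $\sum \hat{e}_i^3$ and $\sum \hat{e}_i^4$.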
To the right of the residual output section, create the following table:
     F              G              H
24   residuals^2    residuals^3    residuals^4
25   =C25^2         =C25^3         =C25^4
Copy cells F25:H25 to cells F26:H64. Your worksheet should now look like the one below (only
partly shown):
     F              G              H
24   residuals^2    residuals^3    residuals^4
25   34.45208433    -202.2219603   1186.945115
26   59.96419984    464.3421034    3595.705263
27   158.0505536    -1986.98245    24979.97749
28   901.2097207    -27054.4557    812178.9608
29   560.7541899    -13278.7988    314445.2614
Insert a new worksheet by selecting the Insert Worksheet tab at the bottom of your screen.
Name it Jarque-Bera Tests.
     A                  B                     C
1    Data Input         N =                   ='Food Regression'!B8
2                       α =
3                       df or m =             2
4
5    Computed Values    σtilde =              =SQRT(SUM('Food Regression'!F25:F64)/C1)     (4.17)
6                       µ3tilde =             =SUM('Food Regression'!G25:G64)/C1           (4.18)
7                       µ4tilde =             =SUM('Food Regression'!H25:H64)/C1           (4.19)
8                       S =                   =C6/C5^3
9                       K =                   =C7/C5^4
10                      χ2 critical value =   =CHIINV(C2,C3)
11
12   Jarque-Bera Test   JB =                  =(C1/6)*(C8^2+((C9-3)^2)/4)                  (4.16)
13                      Conclusion            =IF(C12>=C10,"Reject the hypothesis of normally distributed errors","Do not reject the hypothesis of normally distributed errors")
14                      p-value =             =CHIDIST(C12,C3)
15                      Conclusion            =IF(C14<=C2,"Reject the hypothesis of normally distributed errors","Do not reject the hypothesis of normally distributed errors")
The χ2 critical value is computed using the CHIINV statistical function. For our purpose, this
function syntax is:
=CHIINV(α,m)
where α is the level of significance of the Jarque-Bera test, and m is the degrees of freedom of the
chi-squared distribution.
The p-value is computed using the CHIDIST statistical function, which returns the right-tail
probability of the chi-squared distribution. For our purpose, this function syntax is:
=CHIDIST(x,m)
where x is the value of the test statistic (here, the computed JB statistic) for which we are
computing the p-value, and m is the degrees of freedom of the chi-squared distribution.
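For the m = 2 case used here, both functions have closed forms, which gives a quick cross-check on the worksheet: the chi-squared distribution with 2 degrees of freedom is an exponential distribution with mean 2, so its right-tail probability is exp(-x/2) and its critical value is -2 ln(α). The Python sketch below relies on that identity and is valid only for m = 2 (the function names are hypothetical); for other degrees of freedom a library routine would be needed, for example scipy.stats.chi2.sf and scipy.stats.chi2.isf if SciPy is available.

```python
import math


def chi2_critical_df2(alpha):
    """Right-tail critical value of chi-square(2), the analogue of =CHIINV(alpha, 2)."""
    return -2.0 * math.log(alpha)


def chi2_pvalue_df2(x):
    """Right-tail probability of chi-square(2), the analogue of =CHIDIST(x, 2)."""
    return math.exp(-x / 2.0)


print(chi2_critical_df2(0.05))
print(chi2_pvalue_df2(0.0633402))
```

With the values reported in this section, chi2_critical_df2(0.05) is about 5.9914645 and chi2_pvalue_df2(0.0633402) is about 0.9688262.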
At α = 0.05, the results of the Jarque-Bera test are (see p. 148 of Principles of Econometrics, 4e):
     A                  B                     C
1    Data Input         N =                   40
2                       α =                   0.05
3                       df or m =             2
4
5    Computed Values    σtilde =              87.250383
6                       µ3tilde =             -64639.66
7                       µ4tilde =             173220834
8                       S =                   -0.097319
9                       K =                   2.9890333
10                      χ2 critical value =   5.9914645
11
12   Jarque-Bera Test   JB =                  0.0633402
13                      Conclusion =          Do not reject the hypothesis of normally distributed errors
14                      p-value =             0.9688262
15                      Conclusion =          Do not reject the hypothesis of normally distributed errors
4.6.3 The Jarque-Bera Test for Normality for the Linear-Log Food
Expenditure Model
We first go back to our LogLinear Food Model worksheet to perform intermediate calculations.
To the right of the residual output section, create the following table:
     F              G              H
24   residuals^2    residuals^3    residuals^4
25   =C25^2         =C25^3         =C25^4
Copy cells F25:H25 to cells F26:H64. Your worksheet should now look like the one below (only
partly shown):
     F              G              H
24   residuals^2    residuals^3    residuals^4
25   1587.798991    63269.33684    2521105.635
26   1417.493715    53368.09651    2009268.432
Now we are ready to modify a few cell references in our Jarque-Bera test template.
Replace all references to the Food Regression worksheet with references to the LogLinear Food
Model worksheet, as shown below.
     A                  B                     C
1    Data Input         N =                   ='LogLinear Food Model'!B8
2                       α =
3                       df or m =             2
4
5    Computed Values    σtilde =              =SQRT(SUM('LogLinear Food Model'!F25:F64)/C1)
6                       µ3tilde =             =SUM('LogLinear Food Model'!G25:G64)/C1
7                       µ4tilde =             =SUM('LogLinear Food Model'!H25:H64)/C1
8                       S =                   =C6/C5^3
9                       K =                   =C7/C5^4
10                      χ2 critical value =   =CHIINV(C2,C3)
11
12   Jarque-Bera Test   JB =                  =(C1/6)*(C8^2+((C9-3)^2)/4)
13                      Conclusion            =IF(C12>=C10,"Reject the hypothesis of normally distributed errors","Do not reject the hypothesis of normally distributed errors")
14                      p-value =             =CHIDIST(C12,C3)
15                      Conclusion            =IF(C14<=C2,"Reject the hypothesis of normally distributed errors","Do not reject the hypothesis of normally distributed errors")
At α = 0.05, the results of the Jarque-Bera test are (see p. 149 of Principles of Econometrics, 4e):
     A                  B                     C
1    Data Input         N =                   40
2                       α =                   0.05
3                       df or m =             2
4
5    Computed Values    σtilde =              89.248579
6                       µ3tilde =             99251.00
7                       µ4tilde =             203335372
8                       S =                   0.1396145
9                       K =                   3.2048499
10                      χ2 critical value =   5.9914645
11
12   Jarque-Bera Test   JB =                  0.1998875
13                      Conclusion =          Do not reject the hypothesis of normally distributed errors
14                      p-value =             0.9048883
15                      Conclusion =          Do not reject the hypothesis of normally distributed errors
Open the Excel file wawheat. Excel opens the data set in Sheet 1 of a new Excel file. Since we
would like to save all our work from Chapter 4 in one file, create a new worksheet in your POE
Chapter 4 Excel file, name it wawheat data, and in it, copy the data set you just opened.
This data set gives average wheat yield for different regions of Australia for the period
1950-1997. Time is measured using the values 1, 2, ..., 48 in column E. We would like to plot the
yield data for the Greenough Shire area, reported in column D.
Select the Insert tab located next to the Home tab. Select D2:E49. In the Charts group of
commands select Scatter, and then Scatter with only Markers.
[Chart: Series1, with the yield values (0 to 2.5) on the horizontal axis and the time values (0 to 50) on the vertical axis]
You can see that our yield values are on the horizontal axis and our time values are on the vertical
axis; we would like to change that around as we did in Chapter 2 with our plot of food
expenditure data. Select the points on your plot, right-click and choose Select Data.
In the Select Data Source dialog box, select Series1 and then Edit. In the Edit Series dialog box,
highlight the text in the Series X values window and press the Delete key on your keyboard.
Select E2:E49. Highlight and delete the text in the Series Y values window. Select D2:D49.
Select OK.
The Select Data Source dialog box reappears. Select OK again. You have just told Excel that the
time values are the X values and the yield values are the Y values, not the other way around.
After editing your chart like you did in Sections 2.1.2a-2.1.2c, the result is (see also Figure 4.11
on p. 150 of Principles of Econometrics, 4e):
[Figure: Greenough Shire wheat yield plotted against time, 0 to 50]
In the Regression dialog box, the Input Y Range should be D2:D49, the Input X Range should
be E2:E49. Select New Worksheet Ply and name it Linear Equation Model; and do check the
box next to Residual Plots.
The results are (only part of the residual output section is shown below; the residual plot is not
shown at all):
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.805849601
R Square             0.64939358
Adjusted R Square    0.641771101
Standard Error       0.218692234
Observations         48

ANOVA
              df    SS             MS             F              Significance F
Regression    1     4.074859899    4.074859899    85.20124832    4.87577E-12
Residual      46    2.200009496    0.047826293
Total         47    6.274869395

              Coefficients   Standard Error   t Stat         P-value       Lower 95%     Upper 95%     Lower 95.0%   Upper 95.0%
Intercept     0.637777837    0.064130508      9.944999006    4.8492E-13    0.508689822   0.766865852   0.508689822   0.766865852
X Variable 1  0.021031942    0.002278539      9.230452221    4.87577E-12   0.016445482   0.025618402   0.016445482   0.025618402

RESIDUAL OUTPUT
Observation   Predicted Y    Residuals
1             0.658809779    0.255290221
2             0.679841721    -0.007741721
3             0.700873663    0.018226337
The estimated linear equation model is (see also p. 150 of Principles of Econometrics, 4e):
$$\widehat{YIELD}_t = 0.638 + 0.0210\,TIME_t$$
[Figure: residuals from the Linear Equation Model plotted against Time]
Note: to draw the horizontal axis below all the points, select the vertical axis on your chart,
right-click, and select Format Axis. In the Format Axis dialog box, under the Axis Options panel,
set Horizontal axis crosses at Axis value -0.6. To draw a horizontal line at level 0 of the residual
values, select the plot of residuals on your chart, right-click and select Add Trendline. Choose the
Linear option, and Close. (The fitted trendline is the zero line because least squares residuals sum
to zero and are uncorrelated with the explanatory variable.)
Let $TIMECUBE_t = TIME_t^3/1{,}000{,}000$ in the cubic equation $YIELD_t = \beta_1 + \beta_2 TIMECUBE_t + e_t$: our
explanatory variable is redefined as our original explanatory variable, cubed, and it is also rescaled
before the equation is estimated.
Go back to your wawheat data worksheet. In cell F1, enter the column label time3. In cell F2, enter
the formula =(E2^3)/1000000; copy it to cells F3:F49. Here is how your table should look (only
the first five values are shown below):
     D            E      F
1    greenough    time   time3
2    0.9141       1      0.000001
3    0.6721       2      0.000008
4    0.7191       3      0.000027
5    0.7258       4      0.000064
6    0.7998       5      0.000125
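Rescaling by 1,000,000 only changes the units of the slope: if a regressor is divided by a constant c, its least squares slope is multiplied by c, while the fitted values and R² are unchanged. The Python sketch below checks the slope part of that claim on a hypothetical yield series made up for illustration (it is not the wawheat data); the helper name is also illustrative.

```python
def ols_slope(x, y):
    """Least squares slope of y on x, with an intercept in the model."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    return sxy / sxx


time = list(range(1, 49))                        # 1, 2, ..., 48 as in column E
time3 = [t ** 3 for t in time]                   # unscaled cube, up to 110592
timecube = [t ** 3 / 1_000_000 for t in time]    # like =(E2^3)/1000000
# Hypothetical yield series, for illustration only.
y = [0.8 + 1e-6 * t ** 3 + 0.02 * ((t % 5) - 2) for t in time]

ratio = ols_slope(timecube, y) / ols_slope(time3, y)
print(ratio)   # 1e6, up to floating-point rounding
```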
We want to re-estimate our wheat yield model using our original y values and our redefined and
rescaled x values.
In the Regression dialog box, the Input Y Range should be D2:D49, the Input X Range should
be F2:F49. Select New Worksheet Ply and name it Cubic Equation Model, and check the
box next to Residual Plots.
The results are (only part of the residual output section is shown below):
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.866495734
R Square             0.750814858
Adjusted R Square    0.745397789
Standard Error       0.184367557
Observations         48

ANOVA
              df    SS             MS             F              Significance F
Regression    1     4.711265172    4.711265172    138.6016965    1.76303E-15
Residual      46    1.563604223    0.033991396
Total         47    6.274869395

              Coefficients   Standard Error   t Stat         P-value       Lower 95%     Upper 95%     Lower 95.0%   Upper 95.0%
Intercept     0.874116582    0.035630663      24.53270271    4.60223E-28   0.802395768   0.945837396   0.802395768   0.945837396
X Variable 1  9.68151584     0.822354527      11.77292217    1.76303E-15   8.026202058   11.33682962   8.026202058   11.33682962

RESIDUAL OUTPUT
Observation   Predicted Y    Residuals
1             0.874126264    0.039973736
2             0.874194034    -0.202094034
3             0.874377983    -0.155277983
The estimated cubic equation model is (see also p. 151 of Principles of Econometrics, 4e):
$$\widehat{YIELD}_t = 0.874 + 9.682\,TIMECUBE_t$$
Notice that when you choose the Residual Plots option in the Regression dialog box, Excel
generates a plot of the residuals against the explanatory variable, which, in this case, is
TIMECUBE. We would like to have a plot of residuals against time instead. Select the data points
in your chart, right-click and choose Select Data. A Select Data Source dialog box pops up. Select
Series1 and then Edit. In the Edit Series dialog box, change the Series X values references to
E2:E49 of the wawheat data worksheet. Finally, select OK twice.
After editing the chart as we did in Section 2.1 or Section 2.3.4, the result is (see also Figure 4.13
on p. 151 of Principles of Econometrics, 4e):
[Figure: residuals from the Cubic Equation Model plotted against Time]
Consider next the growth model $y_t = \beta_1 + \beta_2 t + e_t$, where $y_t = \ln(YIELD_t)$; i.e., our dependent
variable is redefined as the natural logarithm of our original dependent variable.
In your wawheat data worksheet, move your charts to the right a little bit if you would like. In
cell G1, enter the column label ln(greenough); resize the width of your column so it fits the new
label. In cell G2, enter the formula =LN(D2); copy it to cells G3:G49. Here is how your table
should look (only the first few values are shown below):
     D            E      F        G
1    greenough    time   time3    ln(greenough)
2    0.9141       1      1E-06    -0.089815304
3    0.6721       2      8E-06    -0.39734814
4    0.7191       3      3E-05    -0.329754849
We want to re-estimate our wheat yield model using our original x values and our redefined y
values.
In the Regression dialog box, the Input Y Range should be G2:G49, the Input X Range should
be E2:E49. Select New Worksheet Ply and name it Growth Model.
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.785168587
R Square             0.61648971
Adjusted R Square    0.60815253
Standard Error       0.199164869
Observations         48

ANOVA
              df    SS             MS             F              Significance F
Regression    1     2.93313542     2.93313542     73.94463042    3.93229E-11
Residual      46    1.824665679    0.039666645
Total         47    4.757801099

              Coefficients   Standard Error   t Stat         P-value       Lower 95%      Upper 95%      Lower 95.0%    Upper 95.0%
Intercept     -0.343366453   0.058404196      -5.879140084   4.39317E-07   -0.460928001   -0.225804905   -0.460928001   -0.225804905
X Variable 1  0.017843872    0.002075084      8.599106374    3.93229E-11   0.013666943    0.022020801    0.013666943    0.022020801
The estimated growth model is (see also p. 153 of Principles of Econometrics, 4e):
$$\widehat{\ln(YIELD_t)} = -0.3434 + 0.0178\,t$$
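If, as is common for growth models, the equation is motivated by YIELD_t = YIELD_0(1 + g)^t, then β2 = ln(1 + g) and the implied growth rate is g = exp(β2) - 1. The short Python sketch below applies that assumed relationship to the estimated slope; the function name is hypothetical.

```python
import math


def implied_growth_rate(b2):
    """Growth rate g implied by a slope b2 = ln(1 + g) in ln(YIELD_t) = b1 + b2*t."""
    return math.exp(b2) - 1


g = implied_growth_rate(0.017843872)   # X Variable 1 coefficient from the Growth Model output
print(f"estimated growth rate: {g:.2%} per year")
```

For a small slope, g is close to b2 itself, which is why the slope is often read directly as an approximate growth rate.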
Open the Excel file cps4_small. Excel opens the data set in Sheet 1 of a new Excel file. Since we
would like to save all our work from Chapter 4 in one file, create a new worksheet in your POE
Chapter 4 Excel file, name it cps4_small data, and in it, copy the data set you just opened.
This data set gives information on hourly wages, years of education and other variables. Based on
this data, we would like to estimate the following wage equation:
$$y_i = \beta_1 + \beta_2 EDUC_i + e_i \qquad (4.26)$$
where $y_i = \ln(WAGE_i)$; i.e., our dependent variable is defined as the natural logarithm of the
variable WAGE.
In cell M1 of the cps4_small data worksheet, enter the column label ln(wage). In cell M2, enter
the formula =LN(A2); copy it to cells M3:M1001. Here is how your table should look (only a few
values are shown below):
     A        B      ...    M
3    11.5     12     ...    2.44234704
4    15.04    16     ...    2.71071332
5    25.95    14     ...    3.25617151
6    24.03    12     ...    3.17930305
We want to estimate our wage equation using our original x values and our redefined y values.
In the Regression dialog box, the Input Y Range should be M2:M1001, the Input X Range
should be B2:B1001. Select New Worksheet Ply and name it Wage Equation.
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.422142751
R Square             0.178204502
Adjusted R Square    0.17738106
Standard Error       0.526611364
Observations         1000

ANOVA
              df     SS             MS             F              Significance F
Regression    1      60.01584269    60.01584269    216.4140512    1.74559E-44
Residual      998    276.7648898    0.277319529
Total         999    336.7807325

              Coefficients   Standard Error   t Stat        P-value       Lower 95%     Upper 95%     Lower 95.0%   Upper 95.0%
Intercept     1.609444466    0.086422944      18.622884     1.14645E-66   1.439852937   1.779035995   1.439852937   1.779035995
X Variable 1  0.090408247    0.006145615      14.71101802   1.74559E-44   0.078348438   0.102468056   0.078348438   0.102468056
The estimated wage equation is (see also p. 153 of Principles of Econometrics, 4e):
$$\widehat{\ln(WAGE)} = 1.6094 + 0.0904\,EDUC$$
4.8.3 Prediction
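For the natural logarithm the antilog is the exponential function, so a natural choice for prediction
in a log-linear model is:
$$\hat{y}_n = \exp(b_1 + b_2 x) \qquad (4.28)$$
Because, with normally distributed errors, $E(y) = \exp(\beta_1 + \beta_2 x + \sigma^2/2)$, a corrected predictor is:
$$\hat{y}_c = \exp(b_1 + b_2 x + \hat{\sigma}^2/2) = \hat{y}_n e^{\hat{\sigma}^2/2} \qquad (4.29)$$
where $b_1$ and $b_2$ are the estimated intercept and slope coefficients of the log-linear model, and $\hat{\sigma}^2$
is the estimate of the error variance, or mean square residual (MS residual).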
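Predictors (4.28) and (4.29) are easy to check by hand. This Python sketch (the function name is hypothetical) uses the rounded estimates b1 = 1.609444, b2 = 0.090408 and MS residual = 0.27732 from the Wage Equation results, so its output matches the worksheet only to about four decimals.

```python
import math


def loglinear_predictions(b1, b2, x0, sigma2_hat):
    """Natural (4.28) and corrected (4.29) predictions of y in ln(y) = b1 + b2*x."""
    y_n = math.exp(b1 + b2 * x0)            # natural predictor, like =EXP(C2+C3*C1)
    y_c = y_n * math.exp(sigma2_hat / 2)    # corrected predictor, like =C6*EXP(C4/2)
    return y_n, y_c


y_n, y_c = loglinear_predictions(1.609444, 0.090408, 12, 0.27732)
print(f"natural: {y_n:.4f}, corrected: {y_c:.4f}")
```

Since exp(σ̂²/2) > 1, the corrected predictor is always larger than the natural one.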
Insert a new worksheet by selecting the Insert Worksheet tab at the lower left corner of your
screen. Name it Prediction in LogLinear Model.
Create the following template to make predictions (in the last column below you will find the
numbers of the equations used in the template):
     A                  B                          C
1    Data Input         x0 =                       12
2                       b1 =                       ='Wage Equation'!B17
3                       b2 =                       ='Wage Equation'!B18
4                       MS residual =              ='Wage Equation'!D13
5
6    Computed Values    natural predicted y0 =     =EXP(C2+C3*C1)      (4.28)
7                       corrected predicted y0 =   =C6*EXP(C4/2)       (4.29)
Here are the results you should get (see also p. 154 of Principles of Econometrics, 4e):
     A                  B                          C
1    Data Input         x0 =                       12
2                       b1 =                       1.609444
3                       b2 =                       0.090408
4                       MS residual =              0.27732
5
6    Computed Values    natural predicted y0 =     14.7958
7                       corrected predicted y0 =   16.99643
Next, we want to show graphically how the correction affects our prediction. Go to your
cps4_small data worksheet. Here are the formulas and labels you should enter (in the last row of
each of the tables below, you will find the numbers of the equations used):
     N       O
1    educ    yhatn
2    0       =EXP('Wage Equation'!$B$17+'Wage Equation'!$B$18*N2)
3    1       (4.28)

     P
1    yhatc
2    =O2*EXP('Wage Equation'!$D$13/2)
3    (4.29)
Select cells N2:N3, move your cursor to the lower right corner of your selection until it turns into
a skinny cross, left-click, hold it and drag it down to cell N23.
Select O2:P2 and copy their content to O3:P23.
Select the Insert tab located next to the Home tab. Select N1:P23. In the Charts group of
commands select Scatter, and then Scatter with only Markers.
[Chart: yhatn and yhatc plotted against educ, 0 to 25]
Next, we would like to plot the actual values on the same chart. Select the points on your plot,
right-click and choose Select Data. A Select Data Source dialog box pops up. Select Add. In the
Edit Series dialog box, specify earnings per hour for the Series name, select B2:B1001 for the
Series X values and A2:A1001 for the Series Y values, all from the cps4_small data
worksheet. Select OK, and then OK again in the Select Data Source dialog box.
After editing your chart like you did in Sections 2.1.2a-2.1.2c, the result is (see also Figure 4.14
on p. 155 of Principles of Econometrics, 4e):
[Figure: earnings per hour, yhatn and yhatc plotted against years of education, 0 to 25]
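A generalized R² measure for the log-linear model is the squared correlation between the actual
values of y and the corrected predicted values:
$$R_g^2 = [\mathrm{corr}(y, \hat{y}_c)]^2 = r_{y\hat{y}_c}^2 \qquad (4.30)$$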
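Equation (4.30) is just a squared Pearson correlation, the same quantity Excel's CORREL function returns before squaring. A minimal Python sketch follows (the function name is hypothetical). Because ŷc is ŷn multiplied by the constant exp(σ̂²/2), the squared correlation is the same whichever of the two predictors is used.

```python
import math


def generalized_r2(y, y_hat):
    """Squared Pearson correlation between actual and predicted values, as in (4.30)."""
    n = len(y)
    ybar = sum(y) / n
    hbar = sum(y_hat) / n
    sxy = sum((a - ybar) * (b - hbar) for a, b in zip(y, y_hat))
    syy = sum((a - ybar) ** 2 for a in y)
    shh = sum((b - hbar) ** 2 for b in y_hat)
    r = sxy / math.sqrt(syy * shh)      # same value as =CORREL(y_range, y_hat_range)
    return r * r


print(generalized_r2([1, 2, 3], [1, 3, 2]))
```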
Make sure you are in your cps4_small data worksheet. We will compute the corrected predicted
y values in column Q, and next to it, we will compute the generalized R².
Here are the formulas and labels you should enter (in the last row of each of the tables below, you
will find the numbers of the equations used):
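     Q
1    corrected predicted y
2    =EXP('Wage Equation'!$B$17+'Wage Equation'!$B$18*B2)*EXP('Wage Equation'!$D$13/2)
3    (4.29)

     R
1    generalized R2
2    =(CORREL(A2:A1001,Q2:Q1001))^2
3    (4.30)
Copy cell Q2 to cells Q3:Q1001, so that the correlation in cell R2 uses the corrected predictions for
all 1,000 observations. Here is how your table should look (only the first five values of column Q
are shown below):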
     Q                        R
1    corrected predicted y    generalized R2
2    24.40129449              0.185930705
3    16.99642785
4    24.40129449
5    20.36503968
6    16.99642785
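The lower limit (LL) and upper limit (UL) of the prediction interval in a log-linear model are:
$$LL = \exp\left(\widehat{\ln(y)} - t_c\,\mathrm{se}(f)\right) \qquad (4.31)$$
$$UL = \exp\left(\widehat{\ln(y)} + t_c\,\mathrm{se}(f)\right) \qquad (4.32)$$
where $\widehat{\ln(y)} = b_1 + b_2 x_0$ is the predicted value of ln(y) and se(f) is the standard error of the
forecast from equation (4.3).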
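The steps of the prediction-interval template can be sketched in Python as below (standard library only; the function and parameter names are hypothetical). The forecast variance uses the squared standard error of b2, that is var(f) = σ̂² + σ̂²/N + (x0 - x̄)²[se(b2)]². The critical value tc is passed in, for example from Excel's =TINV(α, N-2), because the Python standard library has no inverse t function.

```python
import math


def loglinear_prediction_interval(b1, b2, x0, sigma2_hat, n, xbar, se_b2, tc):
    """Prediction interval for y when ln(y) = b1 + b2*x, following (4.3), (4.31) and (4.32)."""
    ln_y0 = b1 + b2 * x0                                      # predicted ln(y)
    var_f = sigma2_hat + sigma2_hat / n + (x0 - xbar) ** 2 * se_b2 ** 2
    se_f = math.sqrt(var_f)                                   # standard error of the forecast
    lower = math.exp(ln_y0 - tc * se_f)                       # (4.31)
    upper = math.exp(ln_y0 + tc * se_f)                       # (4.32)
    return lower, upper
```

Because the limits are exponentials of a symmetric interval for ln(y), the interval for y is not symmetric around the prediction: the upper limit lies much further from it than the lower limit does.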
Insert a new worksheet by selecting the Insert Worksheet tab at the lower left corner of your
screen. Name it PI in LogLinear Model.
Copy the template from the Prediction Interval worksheet (if you cannot see it, it is because it is
hidden further to the left of your visible worksheets) to the PI in LogLinear Model worksheet.
You just need to make a few modifications to it: (1) get your regression results from the Wage
Equation worksheet instead of the Food Regression worksheet, (2) change x0 to 12, (3) compute
xbar from the cps4_small data worksheet instead of the food data worksheet, and (4) take the
antilogs of the interval limits using the EXP function. Those modifications are shown in the table
below.
  | A | B | C
1 | Data Input | Sample Size = | ='Wage Equation'!B8
2 | | Confidence percentage |
3 | | x0 = | 12
4 | | b1 = | ='Wage Equation'!B17
5 | | b2 = | ='Wage Equation'!B18
6 | | se(b2) = | ='Wage Equation'!C18
7 | | MS residual = | ='Wage Equation'!D13
9 | Computed Values | α = | =1-C2
10 | | df or m = | =C1-2
11 | | tc = | =TINV(C9,C10)
12 | | predicted y0 = | =C4+C5*C3 (4.2)
13 | | xbar = | =AVERAGE('cps4_small data'!B2:B1001)
14 | | se(f) = | =SQRT(C7+C7/C1+((C3-C13)^2)*C6^2) (4.3)
16 | Prediction Interval | Lower Limit = | =EXP(C12-C11*C14) (4.31)
17 | | Upper Limit = | =EXP(C12+C11*C14) (4.32)
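The template can also be expressed as a short Python function, which may help make the order of the computations explicit. This is a sketch under stated assumptions: the function name is hypothetical, tc must be supplied by you (for example the value Excel returns for TINV(α, N − 2)), and equation (4.3) is taken to use the estimated variance of b2, that is, the square of se(b2).

```python
import math


def loglinear_prediction_interval(b1, b2, se_b2, sigma2, n, xbar, x0, tc):
    """Prediction interval for y when ln(y) = b1 + b2*x + e.

    tc is the t critical value with n - 2 degrees of freedom.
    """
    ln_y0 = b1 + b2 * x0                                      # predicted ln(y), eq. (4.2)
    var_b2 = se_b2 ** 2                                       # estimated variance of b2
    var_f = sigma2 + sigma2 / n + (x0 - xbar) ** 2 * var_b2   # var(f), eq. (4.3)
    se_f = math.sqrt(var_f)
    lower = math.exp(ln_y0 - tc * se_f)                       # eq. (4.31)
    upper = math.exp(ln_y0 + tc * se_f)                       # eq. (4.32)
    return lower, upper
```

Note that the interval is symmetric in logs, not in levels: the product of the two limits equals the square of exp(b1 + b2 x0).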
Here are the results you should get (see also p. 155 of Principles of Econometrics, 4e):
  | A | B | C
1 | Data Input | Sample Size = | 1000
2 | | Confidence Level | 95%
5 | | b2 = | 0.090408
6 | | se(b2) = | 0.006146
7 | | MS residual = | 0.27732
9 | Computed Values | α = | 5%
10 | | df or m = | 998
11 | | tc = | 1.962344
16 | Prediction Interval | Lower Limit = | 5.063106
17 | | Upper Limit = | 43.23744
Note that the results above and the ones from your textbook might differ slightly due to rounding differences.
Next, we want to show graphically how our prediction interval changes over the range of years of
education. Go to your cps4_small data worksheet. Here are the formulas and labels you should
enter (in the last row of each of the tables below, you will find the numbers of the equations used
in the template):
  | S
1 | lb_wage
2 | =O2*EXP(-'PI in LogLinear Model'!$C$11*'PI in LogLinear Model'!$C$14)
3 | (4.31)

  | T
1 | ub_wage
2 | =O2*EXP('PI in LogLinear Model'!$C$11*'PI in LogLinear Model'!$C$14)
3 | (4.32)
Select S2:T2 and copy their content to S3:T23. Here is how your table should look (only the first
five values are shown below):
  | S | T
1 | lb_wage | ub_wage
2 | 1.711005 | 14.61149
3 | 1.872902 | 15.99404
4 | 2.050118 | 17.50741
5 | 2.244103 | 19.16398
6 | 2.456442 | 20.9773
Select the whole plot area you completed in Section 4.8.3, which compares the natural and corrected predictors of wage (replica of Figure 4.14, p. 155 of Principles of Econometrics, 4e). Select Copy and then Paste. You should have two identical charts. Below we will work with one of them.
138 Chapter 4
On that chart, we want to remove the yhatc series and add the lb_wage and ub_wage series instead.
Select the points on the chart, right-click and select Select Data. A Select Data Source dialog box pops up. Select the yhatc series, and then Remove. Then select Add. In the Edit Series dialog box, specify lb_wage for the Series name, select N2:N23 for the Series X values and S2:S23 for the Series Y values, all from the cps4_small data worksheet. Select OK.
Select Add. In the Edit Series dialog box, specify ub_wage for the Series name, select N2:N23 for the Series X values and T2:T23 for the Series Y values, all from the cps4_small data worksheet. Select OK, and then OK again in the Select Data Source dialog box.
After editing your chart as you did in Sections 2.1.2a to 2.1.2c, the result is (see also Figure 4.15, p. 156 of Principles of Econometrics, 4e):
[Chart: wage predictions with lb_wage and ub_wage prediction interval bounds; x-axis: Years of Education]
Open the Excel file newbroiler. Excel opens the data set in Sheet 1 of a new Excel file. Since we would like to save all our work from Chapter 4 in one file, create a new worksheet in your POE Chapter 4 Excel file, name it newbroiler data, and in it, copy the data set you just opened.
We estimate the log-log model

$$\ln(Q_t) = \beta_1 + \beta_2 \ln(P_t) + e_t$$

where Q is the U.S. per capita consumption of chicken, in pounds, and P is the real price of chicken, for annual observations over the period 1950 to 2001.
In cells K1:L2 of your newbroiler data worksheet, enter the following column labels and formulas.

  | K | L
1 | ln(q) | ln(p)
2 | =LN(B2) | =LN(D2)
Select K2:L2 and copy their content to K3:L53. Here is how your table should look (only the first four values are shown below):

  | K | L
1 | ln(q) | ln(p)
2 | 2.66026 | 1.059116
3 | 2.714595 | 1.030993
4 | 2.727853 | 1.014683
5 | 2.721295 | 0.992232
In the Regression dialog box, the Input Y Range should be K2:K53, and the Input X Range
should be L2:L53. Select New Worksheet Ply and name it LogLog Model. Finally select OK.
The result is (matching the one reported on p. 157 of Principles of Econometrics, 4e):
  | A | B | C | D | E | F | G
1 | SUMMARY OUTPUT
16 | | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
17 | Intercept | 3.716943882 | 0.022359414 | 166.236191 | 2.94446E-70 | 3.672033677 | 3.761854086
18 | X Variable 1 | -1.121358001 | 0.048756431 | -22.99918135 | 2.99987E-28 | -1.219288174 | -1.023427829
Make sure you are in your newbroiler data worksheet. We will compute the corrected predicted
y values in column M, and next to it, we will compute the generalized R2.
Here are the formulas and labels you should enter (in the last row of each of the tables below, you
will find the numbers of the equations used):
  | M
1 | corrected predicted y
2 | =EXP('LogLog Model'!$B$17+'LogLog Model'!$B$18*L2)*EXP('LogLog Model'!$D$13/2)
3 | (4.29)

  | N
1 | generalized R2
2 | =(CORREL(B2:B53,M2:M53))^2
3 | (4.30)
Enter the following labels and formulas in your newbroiler data worksheet (in the last row of each of the tables below, you will find the numbers of the equations used):
  | O | P
1 | p | yhatc
2 | 0.9 | =EXP('LogLog Model'!$B$17+'LogLog Model'!$B$18*LN(O2))*EXP('LogLog Model'!$D$13/2)
3 | 1.0 | (4.29)
Select cells O2:O3, move your cursor to the lower right corner of your selection until it turns into a skinny cross, left-click, hold it and drag it down to cell O22.
Select P2 and copy its content to P3:P22. Here is how your table should look (only the first five values are shown below):

  | O | P
1 | p | yhatc
2 | 0.9 | 46.62103
3 | 1.0 | 41.42584
4 | 1.1 | 37.22676
5 | 1.2 | 33.76609
6 | 1.3 | 30.8674
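The price grid and the yhatc column can be reproduced in a few lines of Python, which also makes the constant-elasticity shape of the curve easy to check. A minimal sketch: the function names are hypothetical, b1 and b2 are taken from the LogLog Model output above, and sigma2 (cell D13 of that worksheet) is not shown in this section, so the usage below sets it to 0 purely for illustration.

```python
import math


def price_grid(start=0.9, step=0.1, count=21):
    """Prices 0.9, 1.0, ..., 2.9, like the values filled into O2:O22."""
    return [round(start + step * i, 1) for i in range(count)]


def loglog_corrected_curve(b1, b2, sigma2, prices):
    """Corrected predictions exp(b1 + b2*ln(p)) * exp(sigma2/2), eq. (4.29)."""
    scale = math.exp(sigma2 / 2.0)
    return [math.exp(b1 + b2 * math.log(p)) * scale for p in prices]


# b1 and b2 from the LogLog Model output; sigma2 = 0.0 is only a placeholder
curve = loglog_corrected_curve(3.716943882, -1.121358001, 0.0, price_grid())
```

The ratio of the predictions at two prices depends only on the ratio of the prices, (p2/p1) raised to the power b2, which is why b2 is read as a constant price elasticity.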
Select the Insert tab located next to the Home tab. Select O1:P22. In the Charts group of commands, select Scatter, and then Scatter with only Markers.
[Chart: scatter plot of yhatc]
Next, we would like to plot the actual values on the same chart. Select the points on your plot, right-click and select Select Data. A Select Data Source dialog box pops up. Select Add. In the Edit Series dialog box, specify actual values for the Series name, select D2:D53 for the Series X values and B2:B53 for the Series Y values, all from the newbroiler data worksheet. Select OK, and then OK again in the Select Data Source dialog box.
After editing your chart as you did in Sections 2.1.2a to 2.1.2c, the result is (see also Figure 4.16, p. 157 of Principles of Econometrics, 4e):
[Chart: yhatc and actual values of chicken consumption; x-axis: Price of Chicken]
CHAPTER 5
CHAPTER OUTLINE
5.1 Least Squares Estimates Using the Hamburger Chain Data
5.2 Interval Estimation
5.3 Hypothesis Tests for a Single Coefficient
5.3.1 Tests of Significance
5.3.2 One-Tail Tests
5.3.2a Left-Tail Test of Elastic Demand
5.3.2b Right-Tail Test of Advertising Effectiveness
5.4 Polynomial Equations: Extending the Model for Burger Barn Sales
5.5 Interaction Variables
5.5.1 Linear Models
5.5.2 Log-Linear Models
5.6 Measuring Goodness-of-Fit
This chapter is a simple extension of the material covered in Chapters 2 to 4. Instead of only one explanatory variable in the simple linear regression model, two or more explanatory variables will be used in the multiple linear regression model.
Open the Excel file andy. Save your file as POE Chapter 5. Rename Sheet 1 data.
We would like to estimate the following multiple linear regression model for Big Andy's Burger
Barn hamburger chain:
$$SALES = \beta_1 + \beta_2 PRICE + \beta_3 ADVERT + e \qquad (5.1)$$
where SALES represents monthly sales revenue in a given city (in $1000), PRICE represents a
price index in that city (in $), and ADVERT is monthly advertising expenditure in that city (in
$1000).
144 Chapter 5
As we have done before, we will use the Excel Regression analysis tool. There are only two
things to note.
• First, because we have more than one explanatory variable, we will include the labels of
the variables in the input ranges we specify. Those labels will then be reported in the
summary output Excel produces, and we will be able to distinguish the different
estimated slope coefficients.
• Second, as long as the data on the explanatory variables are stored in adjacent columns,
all we have to do is select the whole range of data and Excel will recognize each column
of data as separate observations on separate explanatory variables.
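To see what treating each column as a separate explanatory variable means computationally, here is a minimal Python sketch that builds the design matrix (a column of ones for the intercept followed by one column per explanatory variable) and solves the least squares normal equations (X'X)b = X'y. It is an illustration only, not a description of how Excel's Regression tool works internally, and the function name is hypothetical.

```python
from typing import List, Sequence


def ols(y: Sequence[float], columns: Sequence[Sequence[float]]) -> List[float]:
    """Least squares estimates [b1, b2, ..., bK] for y = b1 + b2*x2 + ... + bK*xK + e.

    Each entry of `columns` holds the observations on one explanatory
    variable, just as each adjacent Excel column does.
    """
    n = len(y)
    # Design matrix: a column of ones for the intercept, then each x column.
    X = [[1.0] + [float(col[i]) for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(k):
            if r != c:
                factor = A[r][c]
                A[r] = [a - factor * b for a, b in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]
```

For example, ols(sales, [price, advert]) would return the intercept and the two slope estimates in the same order as the columns; for real work, a numerical library is a better choice than hand-rolled elimination.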
In the Regression dialog box, the Input Y Range should be A1:A76, and the Input X Range should be B1:C76; do check the box next to Labels. Finally, select New Worksheet Ply and name it Regression.
  | A | B | C | D | E | F | G
1 | SUMMARY OUTPUT
3 | Regression Statistics
4 | Multiple R | 0.66952055
5 | R Square | 0.448257766
6 | Adjusted R Square | 0.432931593
7 | Standard Error | 4.886124039
8 | Observations | 75
10 | ANOVA
11 | | df | SS | MS | F | Significance F
12 | Regression | 2 | 1396.538993 | 698.2694963 | 29.24785998 | 5.04086E-10
13 | Residual | 72 | 1718.942985 | 23.87420813
14 | Total | 74 | 3115.481978
16 | | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
17 | Intercept | 118.9136131 | 6.351637595 | 18.72172512 | 2.21429E-29 | 106.2518552 | 131.5753711
18 | PRICE | -7.907854804 | 1.095993037 | -7.215241826 | 4.42399E-10 | -10.09267696 | -5.723032645
19 | ADVERT | 1.862583787 | 0.683195483 | 2.726282349 | 0.008038199 | 0.500658501 | 3.224509073
Multiple Linear Regression 145
Recall from Chapter 3 that the interval estimator of βk is defined as:

$$\left[b_k - t_c\,\mathrm{se}(b_k),\; b_k + t_c\,\mathrm{se}(b_k)\right] \qquad (5.2)$$
The one important thing to notice is that, in the case of the multiple linear regression model, the critical value tc is from a t-distribution with m = N − K degrees of freedom, where K is the number of parameters in the multiple linear regression model.
To compute interval estimates, we could use the template we created in Chapter 3, making sure we specify the degrees of freedom correctly.
Instead, we use the interval estimates Excel has already generated in the regression summary output.
The results of interest to us, reported on pp. 182-183 of Principles of Econometrics, 4e, are highlighted below:
  | A | B | C | D | E | F | G
16 | | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
17 | Intercept | 118.9136131 | 6.351637595 | 18.72172512 | 2.21429E-29 | 106.2518552 | 131.5753711
18 | PRICE | -7.907854804 | 1.095993037 | -7.215241826 | 4.42399E-10 | -10.09267696 | -5.723032645
19 | ADVERT | 1.862583787 | 0.683195483 | 2.726282349 | 0.008038199 | 0.500658501 | 3.224509073
Recall that to obtain interval estimates other than the 95% ones, all we have to do is to specify a different Confidence Level in the Regression dialog box (see Section 3.1.3c).
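The arithmetic behind the Lower 95% and Upper 95% columns can be checked by hand. A minimal Python sketch using the PRICE row of the output above; the function name is hypothetical, and the critical value of about 1.993464 is an assumption you can confirm with TINV(0.05,72) in Excel.

```python
def interval_estimate(b_k: float, se_b_k: float, t_c: float) -> tuple:
    """Interval estimate b_k -/+ t_c * se(b_k), equation (5.2)."""
    return b_k - t_c * se_b_k, b_k + t_c * se_b_k


N, K = 75, 3     # observations and parameters in model (5.1)
df = N - K       # degrees of freedom for t_c: 72
t_c = 1.993464   # approximately TINV(0.05, 72) in Excel (assumed value)
lower, upper = interval_estimate(-7.907854804, 1.095993037, t_c)  # PRICE row
```

The result agrees with Excel's -10.09267696 and -5.723032645 to about six decimal places; the small remaining difference comes from rounding t_c.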
Similarly to the results from Chapter 3, we have the following: if the null hypothesis H0: βk = c is true, then the test statistic t = (bk − c)/se(bk) follows a t-distribution with m = N − K degrees of freedom:

$$t = \frac{b_k - c}{\mathrm{se}(b_k)} \sim t_{(N-K)} \qquad (5.3)$$
Again, note that in the case of the multiple linear regression model, the t-distribution of interest has m = N − K degrees of freedom, where K is the number of parameters in the multiple linear regression model.
Recall that when the null hypothesis of a test is that the parameter is zero, the test is called a test of significance. Results of two-tail tests of significance are reported in the Excel summary output and highlighted below (see also pp. 185-186 of Principles of Econometrics, 4e):
  | A | B | C | D | E | F | G
16 | | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
17 | Intercept | 118.9136131 | 6.351637595 | 18.72172512 | 2.21429E-29 | 106.2518552 | 131.5753711
18 | PRICE | -7.907854804 | 1.095993037 | -7.215241826 | 4.42399E-10 | -10.09267696 | -5.723032645
19 | ADVERT | 1.862583787 | 0.683195483 | 2.726282349 | 0.008038199 | 0.500658501 | 3.224509073
Note: you could also have used the TwoTail Tests template you created in Chapter 3.
Insert a new worksheet by selecting the Insert Worksheet tab at the lower left corner of your
screen, next to the data tab. Name it LeftTail Tests.
Open your POE Chapter 3 Excel file and go to the LeftTail Tests worksheet. Copy its content
to the LeftTail Tests worksheet you just created in your POE Chapter 5 Excel file.
You will need to make just a few modifications to create the left-tail test template shown below. First, go back to each formula and delete the references to the POE Chapter 3 Excel file, [POE Chapter 3.xlsx]; this way the test will be computed based on the regression results of your current Excel file, POE Chapter 5. Next, insert a new row, underneath the first one, for K. Finally, modify the degrees of freedom formula. All needed changes are highlighted below:
  | A | B | C
1 | Data Input | N = | =Regression!B8
2 | | K = | =Regression!B12+1
3 | | bk = | =Regression!B18
4 | | se(bk) = | =Regression!C18
5 | | H0: βk = |
6 | | α = |
8 | Computed Values | df or m = | =C1-C2
9 | | tc = | =-TINV(C6*2,C8)
11 | Left-Tail Test | t-statistic = | =(C3-C5)/C4
12 | | Conclusion: | =IF(C11<=C9,"Reject Ho","Do Not Reject Ho")
13 | | p-value = | =TDIST(ABS(C11),C8,1)
14 | | Conclusion: | =IF(C13<=C6,"Reject Ho","Do Not Reject Ho")
Let α = 0.05; H0: β2 ≥ 0 and H1: β2 < 0. The result is (p. 187 of Principles of Econometrics, 4e):
  | A | B | C
1 | Data Input | N = | 75
2 | | K = | 3
3 | | bk = | -7.907855
4 | | se(bk) = | 1.095993
5 | | H0: βk = | 0
6 | | α = | 0.05
8 | Computed Values | df or m = | 72
9 | | tc = | -1.6662937
11 | Left-Tail Test | t-statistic = | -7.2152418
12 | | Conclusion: | Reject Ho
13 | | p-value = | 2.212E-10
14 | | Conclusion: | Reject Ho
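The decision rule of the left-tail template can be mirrored in Python as a check on the spreadsheet logic. A minimal sketch with hypothetical function names: t_c is passed as the positive value TINV(2α, N − K) and the rejection region is t ≤ −t_c, and the p-value part of the template (TDIST) is left out because the Python standard library has no Student t distribution function.

```python
def t_statistic(b_k: float, c: float, se_b_k: float) -> float:
    """t = (b_k - c) / se(b_k), equation (5.3)."""
    return (b_k - c) / se_b_k


def left_tail_test(b_k: float, c: float, se_b_k: float, t_c: float) -> tuple:
    """Test H0: beta_k >= c against H1: beta_k < c; reject when t <= -t_c."""
    t = t_statistic(b_k, c, se_b_k)
    decision = "Reject Ho" if t <= -t_c else "Do Not Reject Ho"
    return t, decision


# PRICE coefficient, H0: beta_2 >= 0, alpha = 0.05, 72 degrees of freedom
t, decision = left_tail_test(-7.907854804, 0.0, 1.095993037, 1.666293697)
```

With the values from the Regression worksheet, t is about -7.2152 and the decision is Reject Ho, as in the Excel results above.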
Insert a new worksheet by selecting the Insert Worksheet tab at the lower left corner of your screen, next to the LeftTail Tests tab. Name it RightTail Tests.
In your POE Chapter 3 Excel file, go to the RightTail Tests worksheet. Copy its content to the
RightTail Tests worksheet you just created in your POE Chapter 5 Excel file.
You will need to make just a few modifications to create the right-tail test template shown below. First, go back to each formula and delete the references to the POE Chapter 3 Excel file, [POE Chapter 3.xlsx]; this way the test will be computed based on the regression results of your current Excel file, POE Chapter 5. Next, change the references for bk and se(bk) to the ADVERT coefficient estimates instead of the PRICE coefficient estimates. Also, insert a new row, underneath the first one, for K. Finally, modify the degrees of freedom formula. All needed changes are highlighted below:
  | A | B | C
1 | Data Input | N = | =Regression!B8
2 | | K = | =Regression!B12+1
3 | | bk = | =Regression!B19
4 | | se(bk) = | =Regression!C19
5 | | H0: βk = |
6 | | α = |
Let α = 0.05; H0: β3 ≤ 1 and H1: β3 > 1. The result is (see also p. 188 of Principles of Econometrics, 4e):
  | A | B | C
1 | Data Input | N = | 75
2 | | K = | 3
3 | | bk = | 1.862583787
4 | | se(bk) = | 0.683195483
5 | | H0: βk = | 1
6 | | α = | 0.05
8 | Computed Values | df or m = | 72
9 | | tc = | 1.666293697
11 | Right-Tail Test | t-statistic = | 1.262572438
12 | | Conclusion: | Do Not Reject Ho
13 | | p-value = | 0.105408444
14 | | Conclusion: | Do Not Reject Ho
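The right-tail decision rule differs from the left-tail one only in the direction of the comparison. A minimal Python sketch (hypothetical function name; as before, the TDIST p-value is not reproduced) using the ADVERT estimates and the critical value shown above:

```python
def right_tail_test(b_k: float, c: float, se_b_k: float, t_c: float) -> tuple:
    """Test H0: beta_k <= c against H1: beta_k > c; reject when t >= t_c."""
    t = (b_k - c) / se_b_k  # equation (5.3)
    decision = "Reject Ho" if t >= t_c else "Do Not Reject Ho"
    return t, decision


# ADVERT coefficient, H0: beta_3 <= 1, alpha = 0.05, 72 degrees of freedom
t, decision = right_tail_test(1.862583787, 1.0, 0.683195483, 1.666293697)
```

Here t is about 1.2626, below the critical value 1.6663, so the null hypothesis is not rejected.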
We estimate the following extended model for Big Andy's Burger Barn hamburger chain.
$$SALES = \beta_1 + \beta_2 PRICE + \beta_3 ADVERT + \beta_4 ADVERT^2 + e \qquad (5.4)$$
Go back to your data worksheet. In D1, enter the column label ADVERT2. In cell D2, enter the formula =C2^2; copy it to cells D3:D76. Here is how your table should look (only the first five values are shown below):
  | A | B | C | D
1 | SALES | PRICE | ADVERT | ADVERT2
2 | 73.2 | 5.69 | 1.3 | 1.69
3 | 71.8 | 6.49 | 2.9 | 8.41
4 | 62.4 | 5.63 | 0.8 | 0.64
5 | 67.4 | 6.22 | 0.7 | 0.49
6 | 89.3 | 5.02 | 1.5 | 2.25
In the Regression dialog box, the Input Y Range should be A1:A76, and the Input X Range should be B1:D76. Check the box next to Labels. Select New Worksheet Ply and name it Extended Model. Finally, select OK.
  | A | B | C | D | E | F | G
1 | SUMMARY OUTPUT
3 | Regression Statistics
4 | Multiple R | 0.712906125
5 | R Square | 0.508235142
6 | Adjusted R Square | 0.487456345
16 | | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
18 | PRICE | -7.640000543 | 1.045938915 | -7.30444 | 3.23648E-10 | -9.725543479 | -5.554457508
19 | ADVERT | 12.15123398 | 3.556164048 | 3.416949784 | 0.0010516 | 5.060444353 | 19.24202236
20 | ADVERT2 | -2.767962762 | 0.940624059 | -2.942687607 | 0.004391267 | -4.643513842 | -0.892411683
$$PIZZA = \beta_1 + \beta_2 AGE + \beta_3 INCOME + e$$

where PIZZA is annual expenditure on pizza, AGE is age, and INCOME is income, for a random sample of 40 individuals, age 18 and older.
Open the Excel file pizza4. Excel opens the data set in Sheet 1 of a new Excel file. Since we
would like to save all our work from Chapter 5 in one file, create a new worksheet in your POE
Chapter 5 Excel file, rename it pizza4 data, and in it, copy the data set you just opened.
In the Regression dialog box, the Input Y Range should be A1:A41, and the Input X Range should be F1:G41. Check the box next to Labels. Select New Worksheet Ply and name it Life Cycle Model 1. Finally, select OK.
  | A | B | C | D | E | F | G
1 | SUMMARY OUTPUT
3 | Regression Statistics
4 | Multiple R | 0.573803829
5 | R Square | 0.329250834
6 | Adjusted R Square | 0.292994123
7 | Standard Error | 131.070099
8 | Observations | 40
10 | ANOVA
11 | | df | SS | MS | F | Significance F
12 | Regression | 2 | 312015.1787 | 156007.5894 | 9.081100278 | 0.000618533
13 | Residual | 37 | 635636.7213 | 17179.37085
14 | Total | 39 | 947651.9
16 | | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
17 | Intercept | 342.8848279 | 72.34341966 | 4.73968233 | 3.14373E-05 | 196.3031373 | 489.4665184
18 | income | 1.832478934 | 0.464300741 | 3.946749963 | 0.000340943 | 0.891716278 | 2.773241589
19 | age | -7.575555694 | 2.316988 | -3.269571209 | 0.002332607 | -12.27021864 | -2.880893153
To account for an effect of income that depends on the age of the individual, we add the interaction variable (AGE × INCOME) to the life-cycle model:

$$PIZZA = \beta_1 + \beta_2 AGE + \beta_3 INCOME + \beta_4 (AGE \times INCOME) + e$$
Go back to your pizza4 data worksheet. In H1, enter the column label age x income. In cell H2, enter the formula =F2*G2; copy it to cells H3:H41. Here is how your table should look (only the first five values are shown below):
  | H
1 | age x income
2 | 487.5
3 | 1755
4 | 312
5 | 728
6 | 487.5
In the Regression dialog box, the Input Y Range should be A1:A41, and the Input X Range should be F1:H41. Check the box next to Labels. Select New Worksheet Ply and name it Life Cycle Model 2. Finally, select OK.
  | A | B | C | D | E | F | G
1 | SUMMARY OUTPUT
3 | Regression Statistics
4 | Multiple R | 0.622349295
5 | R Square | 0.38731864
10 | ANOVA
11 | | df | SS | MS | F | Significance F
12 | Regression | 3 | 367043.25 | 122347.75 | 7.586037514 | 0.000468085
13 | Residual | 36 | 580608.65 | 16128.01806
14 | Total | 39 | 947651.9
16 | | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
17 | Intercept | 161.465432 | 120.6634096 | 1.338147434 | 0.1893 | -83.25130349 | 406.1821675
18 | income | 6.979905073 | 2.82276761 | 2.472717 | 0.01826628 | 1.255067055 | 12.70474309
19 | age | -2.977423365 | 3.352100814 | -0.888226 | 0.380315589 | -9.775798869 | 3.820952139
20 | age x income | -0.123239351 | 0.066718728 | -1.847147792 | 0.072957528 | -0.258551202 | 0.0120725
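With the interaction term, the effect of income on expected pizza expenditure is no longer a single number: differentiating the model with respect to INCOME gives β3 + β4 AGE. A minimal Python sketch evaluating that expression with the income and age x income coefficients from the Life Cycle Model 2 output above; the function and constant names are hypothetical, and the two ages are arbitrary illustration values.

```python
def marginal_effect_of_income(b_income: float, b_interaction: float, age: float) -> float:
    """dE(PIZZA)/dINCOME = beta_3 + beta_4 * AGE in the model with AGE x INCOME."""
    return b_income + b_interaction * age


B_INCOME = 6.979905073        # income coefficient, Life Cycle Model 2
B_INTERACTION = -0.123239351  # age x income coefficient, Life Cycle Model 2

for age in (30, 50):
    print(age, round(marginal_effect_of_income(B_INCOME, B_INTERACTION, age), 4))
# prints:
# 30 3.2827
# 50 0.8179
```

Because the interaction coefficient is negative, the estimated effect of additional income on pizza expenditure declines with age.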
Open the Excel file cps4_small. Excel opens the data set in Sheet 1 of a new Excel file. Since we
would like to save all our work from Chapter 5 in one file, create a new worksheet in your POE
Chapter 5 Excel file, rename it cps4_small data, and in it, copy the data set you just opened.
Go back to your cps4_small data worksheet. In cells M1:P2, enter the following column labels and formulas.

  | M | N | O | P
1 | ln(wage) | educ | exper | educ x exper
2 | =LN(A2) | =B2 | =C2 | =N2*O2
Copy the content of cells M2:P2 to cells M3:P1001. Here is how your table should look (only the first five values are shown below):

  | M | N | O | P
1 | ln(wage) | educ | exper | educ x exper
2 | 2.928524 | 16 | 39 | 624
3 | 2.442347 | 12 | 16 | 192
4 | 2.710713 | 16 | 13 | 208
5 | 3.25... | 14 | 11 | 154
6 | 3.179303 | 12 | 51 | 612
In the Regression dialog box, the Input Y Range should be M1:M1001, and the Input X Range should be N1:P1001. Check the box next to Labels. Select New Worksheet Ply and name it LogLinear Model w Interaction. Finally, select OK.
  | A | B | C | D | E | F | G
1 | SUMMARY OUTPUT
3 | Regression Statistics
4 | Multiple R | 0.44115987
5 | R Square | 0.194622031
6 | Adjusted R Square | 0.1921962
7 | Standard Error | 0.521847758
8 | Observations | 1000
10 | ANOVA
11 | | df | SS | MS | F | Significance F
12 | Regression | 3 | 65.54495019 | 21.84831673 | 80.22880786 | 1.7205E-46
13 | Residual | 996 | 271.2357823 | 0.27232508
14 | Total | 999 | 336.7807325
16 | | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
17 | Intercept | 1.392317989 | 0.206644 | 6.7377 | 2.7172E-11 | 0.986808989 | 1.797826989
18 | educ | 0.094938495 | 0.014624587 | 6.491712999 | 1.33643E-10 | 0.066239995 | 0.123636994
19 | exper | 0.006329514 | 0.00669851 | 0.944913664 | 0.344932118 | -0.006815298 | 0.019474326
20 | educ x exper | -3.64453E-05 | 0.00048377 | -0.07533629 | 0.93996227 | -0.000985766 | 0.000912875
The coefficient of determination R2 is reported in the Excel regression summary output. For the Big Andy's Burger Barn multiple linear regression model of Section 5.1, it is highlighted below:
  | A | B
1 | SUMMARY OUTPUT
3 | Regression Statistics
4 | Multiple R | 0.66952055
5 | R Square | 0.448257766
6 | Adjusted R Square | 0.432931593
7 | Standard Error | 4.886124039
8 | Observations | 75
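Both goodness-of-fit measures in the summary output can be recomputed from the ANOVA sums of squares, which is a quick way to see where they come from. A minimal Python sketch using the standard definitions R² = 1 − SSE/SST and adjusted R² = 1 − [SSE/(N − K)]/[SST/(N − 1)]; the function names are hypothetical, and SSE and SST are the Residual and Total SS of model (5.1).

```python
def r_squared(sse: float, sst: float) -> float:
    """R^2 = 1 - SSE/SST."""
    return 1.0 - sse / sst


def adjusted_r_squared(sse: float, sst: float, n: int, k: int) -> float:
    """Adjusted R^2 = 1 - [SSE/(N - K)] / [SST/(N - 1)]."""
    return 1.0 - (sse / (n - k)) / (sst / (n - 1))


# ANOVA values for model (5.1): Residual SS and Total SS
SSE, SST = 1718.942985, 3115.481978
print(round(r_squared(SSE, SST), 6), round(adjusted_r_squared(SSE, SST, 75, 3), 6))
```

The adjusted measure penalizes each additional parameter through the N − K divisor, so it is always below R2 when K > 1.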
CHAPTER 6
CHAPTER OUTLINE
6.1 Testing the Effect of Advertising: the F-test
6.1.1 The Logic of the Test
6.1.2 The Unrestricted and Restricted Models
6.1.3 Test Template
6.2 Testing the Significance of the Model
6.2.1 Null and Alternative Hypotheses
6.2.2 Test Template
6.2.3 Excel Regression Output
6.3 The Relationship between t- and F-Tests
6.4 Testing Some Economic Hypotheses
6.4.1 The Optimal Level of Advertising
6.4.2 The Optimal Level of Advertising and Price
6.5 The Use of Nonsample Information
6.6 Model Specification
6.6.1 Omitted Variables
6.6.2 Irrelevant Variables
6.6.3 The RESET Test
6.7 Poor Data, Collinearity and Insignificance
6.7.1 Correlation Matrix
6.7.2 The Car Mileage Model Example
In this chapter we continue to work with the multiple linear regression model of Big Andy's Burger Barn hamburger chain to illustrate the F-test procedure. We also work with additional examples to address nonsample information, model specification and collinearity issues.
In Chapters 3 and 5 we worked with t-tests for null hypotheses consisting of a single restriction on one parameter βk. An F-test can be used when a null hypothesis consists of one or more restrictions involving one or more parameters.
Further Inference in the Multiple Regression Model 155
An F-test is based on a comparison of the sum of squared errors from the original, unrestricted model with the sum of squared errors from the model in which the null hypothesis is assumed to be true and in which the restriction(s) it implies have been imposed; this latter model is referred to as the restricted model.
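If the null hypothesis is true, then the following F-statistic follows an F-distribution with m1 = J numerator degrees of freedom and m2 = N − K denominator degrees of freedom:

$$F = \frac{(SSE_R - SSE_U)/J}{SSE_U/(N-K)} \sim F_{(m_1 = J,\; m_2 = N-K)} \qquad (6.1)$$

where SSER is the sum of squared errors from the restricted model, SSEU is the sum of squared errors from the unrestricted model, and J is the number of restrictions in the null hypothesis.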
If the null hypothesis is not true, then the value of the computed F-statistic will tend to be unusually large. We reject the null hypothesis if F ≥ Fc, where Fc is the critical value that leaves an upper-tail probability of α in the F(J, N − K) distribution.
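Equation (6.1) and the rejection rule translate directly into code. A minimal Python sketch with hypothetical function names; the inputs are the restricted and unrestricted SSE values of the Big Andy's example worked out later in this chapter (J = 2, N = 75, K = 4), with SSEU taken at the rounded value shown in the test-of-significance results, and the critical value 3.125764 is the Fc from the F-test results.

```python
def f_statistic(sse_r: float, sse_u: float, j: int, n: int, k: int) -> float:
    """F = [(SSE_R - SSE_U)/J] / [SSE_U/(N - K)], equation (6.1)."""
    return ((sse_r - sse_u) / j) / (sse_u / (n - k))


def reject_h0(f: float, f_c: float) -> bool:
    """Reject H0 when F >= F_c."""
    return f >= f_c


F = f_statistic(sse_r=1896.390837, sse_u=1532.084, j=2, n=75, k=4)
print(round(F, 4), reject_h0(F, 3.125764))
```

The result, about 8.441, matches the F-statistic of the Excel template up to the rounding of SSEU.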
We will use the Big Andy's Burger Barn model to illustrate the F-test procedure. We start by specifying and estimating the unrestricted and restricted models.
Recall from Chapter 5, the following multiple linear regression model for Big Andy's Burger
Barn hamburger chain. This is the unrestricted model.
$$SALES = \beta_1 + \beta_2 PRICE + \beta_3 ADVERT + \beta_4 ADVERT^2 + e \qquad (6.2)$$
where SALES represents monthly sales revenue in a given city (in $1000), PRICE represents a
price index in that city (in $), and ADVERT is monthly advertising expenditure in that city (in
$1000).
156 Chapter 6
Suppose we wish to test the hypothesis that changes in advertising have no effect on sales revenue against the alternative that changes in advertising do have an effect. The null and alternative hypotheses are H0: β3 = 0, β4 = 0 and H1: β3 ≠ 0 or β4 ≠ 0 or both are nonzero. If we impose our null hypothesis or restriction on equation (6.2), we obtain the following restricted model:

$$SALES = \beta_1 + \beta_2 PRICE + e \qquad (6.3)$$
We would like to successively estimate the unrestricted model (6.2) and the restricted model
(6.3). First, open your Excel file andy. Save your file as POE Chapter 6. Rename Sheet 1 andy's
hamburger chain data.
In D1, enter the column label ADVERT2. In cell D2, enter the formula =C2^2; copy it to cells D3:D76. Here is how your table should look (only the first three values are shown below):

  | A | B | C | D
1 | SALES | PRICE | ADVERT | ADVERT2
2 | 73.2 | 5.69 | 1.3 | 1.69
3 | 71.8 | 6.49 | 2.9 | 8.41
4 | 62.4 | 5.63 | 0.8 | 0.64
For the unrestricted model (6.2), the Input Y Range should be A1:A76, and the Input X Range should be B1:D76. Check the box next to Labels. Select New Worksheet Ply and name it Unrestricted Model. Finally, select OK.
  | A | B | C | D | E | F | G
1 | SUMMARY OUTPUT
3 | Regression Statistics
4 | Multiple R | 0.712906125
5 | R Square | 0.508235142
6 | Adjusted R Square | 0.487456345
7 | Standard Error | 4.645283021
8 | Observations | 75
16 | | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
17 | Intercept | 109.719036 | 6.799045455 | 16.13741763 | 1.87037E-25 | 96.16212457 | 123.2759474
18 | price | -7.640000035 | 1.045938884 | -7.304442117 | 3.23648E-10 | -9.725542907 | -5.554457162
19 | advert | 12.15123567 | 3.556164 | 3.41695 | 0.001051598 | 5.060446253 | 19.24202509
20 | advert2 | -2.767963089 | 0.940624011 | -2.942688043 | 0.004392655 | -4.643514112 | -0.892412065
Go back to your andy's hamburger chain data worksheet. For the restricted model (6.3), the Input Y Range should be A1:A76, and the Input X Range should be only the PRICE data, B1:B76. Check the box next to Labels. Select New Worksheet Ply and name it Restricted Model. Finally, select OK.
The result is:
  | A | B | C | D | E | F | G
1 | SUMMARY OUTPUT
3 | Regression Statistics
4 | Multiple R | 0.62554053
5 | R Square | 0.391300955
6 | Adjusted R Square | 0.382962612
7 | Standard Error | 5.096857529
8 | Observations | 75
10 | ANOVA
11 | | df | SS | MS | F | Significance F
12 | Regression | 1 | 1219.09103 | 1219.09103 | 46.92790295 | 1.97078E-09
13 | Residual | 73 | 1896.390837 | 25.97795667
14 | Total | 74 | 3115.481867
16 | | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
17 | Intercept | 121.9001736 | 6.526290698 | 18.67832421 | 1.5876E-29 | 108.8932951 | 134.907052
18 | price | -7.829073515 | 1.142864644 | -6.850394365 | 1.97078E-09 | -10.10679943 | -5.551347597
Insert a new worksheet by selecting the Insert Worksheet tab at the lower left corner of your screen. Name it Ftest.
F-critical values are obtained in Excel by using the FINV function. The syntax of the FINV function is as follows:
=FINV(α, m1, m2)
where α is the level of significance of the test, m1 is the numerator degrees of freedom and m2 is the denominator degrees of freedom of the F-distribution.
p-values for F-statistics are obtained in Excel by using the FDIST function, which returns the probability to the right of the value you supply. For hypothesis-testing purposes, the syntax of the FDIST function is as follows:
=FDIST(F-statistic, m1, m2)
  | A | B | C
1 | Data Input | J = |
2 | | N = | ='Unrestricted Model'!B8
3 | | K = | ='Unrestricted Model'!B12+1
4 | | SSEu = | ='Unrestricted Model'!C13
5 | | SSER = | ='Restricted Model'!C13
6 | | α = |
8 | Computed Values | m1 = | =C1
9 | | m2 = | =C2-C3
10 | | Fc = | =FINV(C6,C8,C9)
12 | F-test | F-statistic = | =((C5-C4)/C8)/(C4/C9)
13 | | Conclusion = | =IF(C12>=C10,"Reject Ho","Do Not Reject Ho")
14 | | p-value = | =FDIST(C12,C8,C9)
15 | | Conclusion = | =IF(C14<=C6,"Reject Ho","Do Not Reject Ho")
Note that the number of parameters K is equal to the Excel regression degrees of freedom plus
one (see cell C3 above).
With 2 restrictions in the null hypothesis H0: β3 = 0, β4 = 0, and at α = 0.05, the results of the F-test are (see also p. 225 of Principles of Econometrics, 4e):
  | A | B | C
1 | Data Input | J = | 2
2 | | N = | 75
3 | | K = | 4
8 | Computed Values | m1 = | 2
9 | | m2 = | 71
10 | | Fc = | 3.125764
12 | F-test | F-statistic = | 8.44136
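For a general unrestricted multiple regression model with K − 1 explanatory variables and K unknown coefficients, yi = β1 + β2xi2 + β3xi3 + ... + βKxiK + ei, the null and alternative hypotheses of a test of significance of the model are:

$$H_0: \beta_2 = 0,\ \beta_3 = 0,\ \ldots,\ \beta_K = 0$$
$$H_1: \text{at least one of the } \beta_k \text{ is nonzero for } k = 2, 3, \ldots, K$$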
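Note that, in this one case, in which we are testing the null hypothesis that all the model parameters are zero except the intercept, the sum of squared errors from the restricted model is equal to the total sum of squares from the unrestricted model: SSER = SSTu. The restricted model is yi = β1 + ei, whose least squares estimate of β1 is the sample mean ȳ, so its sum of squared errors is Σ(yi − ȳ)², which is exactly the total sum of squares.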
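The identity SSER = SSTu, and the F-statistic it leads to, can be illustrated in a few lines of Python. A minimal sketch with hypothetical function names; the numbers in the usage line are the rounded SSEu and SSER values from the test-of-significance results for model (6.2), with N = 75 and K = 4.

```python
def sse_intercept_only(y):
    """SSE of the restricted model y_i = beta_1 + e_i.

    Its least squares estimate of beta_1 is the sample mean, so the SSE
    equals the total sum of squares, sum((y_i - ybar)^2).
    """
    ybar = sum(y) / len(y)
    return sum((v - ybar) ** 2 for v in y)


def overall_f(sst: float, sse_u: float, n: int, k: int) -> float:
    """F for H0: beta_2 = ... = beta_K = 0, using SSE_R = SST and J = K - 1."""
    j = k - 1
    return ((sst - sse_u) / j) / (sse_u / (n - k))


F_model = overall_f(sst=3115.482, sse_u=1532.084, n=75, k=4)
```

With these inputs the F-statistic is about 24.46, far above conventional critical values for 3 and 71 degrees of freedom, which is consistent with the Reject Ho conclusions in the results.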
Insert a new worksheet by selecting the Insert Worksheet tab at the bottom of your screen.
Name it Test of Significance of Model.
160 Chapter 6
Copy the template from your Ftest worksheet into your new worksheet. You only need to modify the reference in cell C5, so that SSER points to the total sum of squares in cell C14 of the Unrestricted Model worksheet, to obtain a template for a test of the overall significance of the regression model.
Row | A | B | C
1 | Data Input | J= |
2 | | N= | ='Unrestricted Model'!B8
3 | | K= | ='Unrestricted Model'!B12+1
4 | | SSEu= | ='Unrestricted Model'!C13
5 | | SSER= | ='Unrestricted Model'!C14
6 | | α= |
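For the unrestricted model (6.2),
$$SALES_i = \beta_1 + \beta_2 PRICE_i + \beta_3 ADVERT_i + \beta_4 ADVERT_i^2 + e_i,$$
the null and alternative hypotheses of a test of significance of the model are:
$$H_0: \beta_2 = 0, \beta_3 = 0, \beta_4 = 0$$
$$H_1: \text{at least one of } \beta_2, \beta_3, \beta_4 \text{ is nonzero}$$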
The null hypothesis above contains three restrictions. With 3 restrictions, at $\alpha = 0.05$, the results of the test of significance of model (6.2) are (see also pp. 226-227 of Principles of Econometrics, 4e):
Row | A | B | C
2 | | N= | 75
4 | | SSEu= | 1532.084
5 | | SSER= | 3115.482
6 | | α= | 0.05
8 | Computed Values | m1= | 3
9 | | m2= | 71
13 | | Conclusion= | Reject Ho
14 | | p-value= | 5.6E-11
15 | | Conclusion= | Reject Ho
For the test of significance of a model, since $SSE_R = SST_U$, there is no need to estimate a restricted model: all the information needed to compute the F-statistic is available from the regression analysis of the unrestricted model. This is why the F-statistic of the test of significance of a model and its p-value are reported in the Excel summary output (see your Unrestricted Model worksheet):
Row | A | B | C | D | E | F
11 | | df | SS | MS | F | Significance F
12 | Regression | 2 | 1396.538993 | 698.2694965 | 29.24785998 | 5.04E-10
13 | Residual | 72 | 1718.942985 | 23.87420813 | |
14 | Total | 74 | 3115.481978 | | |
Reconsider the following multiple linear regression model for Big Andy's Burger Barn hamburger chain. This is the unrestricted model:
$$SALES_i = \beta_1 + \beta_2 PRICE_i + \beta_3 ADVERT_i + \beta_4 ADVERT_i^2 + e_i \qquad (6.2)$$
Suppose we wish to test the hypothesis that changes in price have no effect on sales revenue against the alternative that changes in price do have an effect. The null and alternative hypotheses are $H_0: \beta_2 = 0$ and $H_1: \beta_2 \neq 0$. If we impose our null hypothesis, or restriction, on equation (6.2), we obtain the following restricted model:
$$SALES_i = \beta_1 + \beta_3 ADVERT_i + \beta_4 ADVERT_i^2 + e_i \qquad (6.4)$$
Go back to your andy's hamburger chain data worksheet. In the Regression dialog box, the Input Y Range should be A1:A76, and the Input X Range should be C1:D76. Check the box next to Labels. Select Output Range and specify it to be cell A1 in your Restricted Model worksheet: you can place your cursor in the Output Range window and move it to that cell, or type 'Restricted Model'!A1 in the Output Range window. Finally, select OK.
Excel informs you that the output range will overwrite existing data. Do select OK to overwrite
the data in the specified range.
Row | A | B | C | D | E | F | G
1 | SUMMARY OUTPUT
3 | Regression Statistics
7 | Standard Error | 6.1048829
8 | Observations | 75
10 | ANOVA
11 | | df | SS | MS | F | Significance F
12 | Regression | 2 | 432.0710103 | 216.0355051 | 5.796561616 | 0.004632556
14 | Total | 74 | 3115.481978
16 | | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
17 | Intercept | 64.84148981 | 3.827012492 | 16.94311 | 7.87896E-27 | 57.21247994 | 72.47049968
18 | advert | 14.24915942 | 4.658283 | 3.058886559 | 0.003118901 | 4.963042303 | 23.53527653
19 | advert2 | -3.365894266 | 1.231488631 | -2.73319 | 0.007887268 | -5.82082195 | -0.910966582
Go back to your Ftest worksheet. With 1 restriction, at $\alpha = 0.05$, the result is (see also p. 227 in Principles of Econometrics, 4e):
Row | A | B | C
1 | Data Input | J= | 1
2 | | N= | 75
3 | | K= | 4
4 | | SSEu= | 1532.084
5 | | SSER= | 2683.411
6 | | α= | 0.05
8 | Computed Values | m1= | 1
9 | | m2= | 71
10 | | Fc= | 3.97581
12 | F-test | F-statistic= | 53.35487
13 | | Conclusion= | Reject Ho
14 | | p-value= | 3.24E-10
15 | | Conclusion= | Reject Ho
Note that we used a t-test in Chapter 5 (Section 5.3.1) for this same test of significance of $\beta_2$. When testing a single "equality" null hypothesis (a single restriction) against a "not equal to" alternative hypothesis, either a t-test or an F-test can be used, and the test outcomes will be identical: the F-statistic equals the square of the t-statistic, and the F critical value equals the square of the two-tail t critical value.
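The equivalence can be checked with numbers reported in this section: squaring the t-statistic for b2 from the unrestricted output reproduces the F-statistic from the template, up to display rounding. A trivial Python check (the variable names are ours):

```python
# Values reported in this section for H0: beta_2 = 0 in model (6.2)
t_stat = -7.304442117    # t Stat for price in the unrestricted model output
f_stat = 53.35487        # F-statistic from the Ftest template

# With a single restriction, F equals t squared (up to display rounding)
print(round(t_stat ** 2, 4), f_stat)
```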
If you go back to your Unrestricted Model worksheet and look at the p-value for b2, you should find that it is the same as the one computed in your Ftest template (3.24E-10 is 3.23648E-10 rounded). Both results are shown below.
Ftest worksheet:
Row | A | B | C
12 | F-test | F-statistic= | 53.35487
13 | | Conclusion= | Reject Ho
14 | | p-value= | 3.24E-10
Unrestricted Model worksheet:
Row | A | B | C | D | E
16 | | Coefficients | Standard Error | t Stat | P-value
17 | Intercept | 109.719 | 6.799 | 16.137 | 1.87937E-25
18 | price | -7.640000035 | 1.045938884 | -7.304442117 | 3.23648E-10
19 | advert | 12.151 | 3.556 | 3.417 | 0.001051598
Go back to your andy's hamburger chain data worksheet. Because the explanatory variables must be in adjacent columns, insert a new column to the right of the PRICE data column. In C1, enter the column label x*. In C2, enter the formula =E2-3.8*D2; copy it to cells C3:C76. In F1, enter the column label y*. In F2, enter the formula =A2-D2; copy it to cells F3:F76.
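The formulas for x* and y* come from substituting a linear restriction into (6.2). Judging from the formulas (the hypothesis itself is stated in Principles of Econometrics, 4e), the restriction being imposed is $\beta_3 + 3.8\beta_4 = 1$; under that assumption the substitution runs as follows:

```latex
\begin{aligned}
\beta_3 &= 1 - 3.8\beta_4 && \text{(from } H_0: \beta_3 + 3.8\beta_4 = 1\text{)} \\
SALES_i &= \beta_1 + \beta_2 PRICE_i + (1 - 3.8\beta_4)\,ADVERT_i + \beta_4\,ADVERT_i^2 + e_i \\
\underbrace{SALES_i - ADVERT_i}_{y^*} &= \beta_1 + \beta_2 PRICE_i + \beta_4\,\underbrace{(ADVERT_i^2 - 3.8\,ADVERT_i)}_{x^*} + e_i
\end{aligned}
```

Regressing y* on PRICE and x* therefore estimates the restricted model; the coefficient of x* is the restricted estimate of $\beta_4$, and $\beta_3$ can be recovered as $1 - 3.8 b_4$.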
Here is how your table should look (only the first five values are shown below):
Row | A | B | C | D | E | F
1 | SALES | PRICE | x* | ADVERT | ADVERT2 | y*
2 | 73.2 | 5.69 | -3.25 | 1.3 | 1.69 | 71.9
3 | 71.8 | 6.49 | -2.61 | 2.9 | 8.41 | 68.9
4 | 62.4 | 5.63 | -2.4 | 0.8 | 0.64 | 61.6
5 | 67.4 | 6.22 | -2.17 | 0.7 | 0.49 | 66.7
6 | 89.3 | 5.02 | -3.45 | 1.5 | 2.25 | 87.8
For the restricted model (6.5), the Input Y Range should be F1:F76, and the Input X Range should be B1:C76. Check the box next to Labels. Select Output Range and specify it to be cell A1 in your Restricted Model worksheet: you can place your cursor in the Output Range window and move it to that cell, or type 'Restricted Model'!A1 in the Output Range window. Finally, select OK.
Excel informs you that the output range will overwrite existing data. Do press OK to overwrite
the data in the specified range. The result is:
Row | A | B | C | D | E | F | G
1 | SUMMARY OUTPUT
3 | Regression Statistics
4 | Multiple R | 0.693339057
5 | R Square | 0.480719048
6 | Adjusted R Square | 0.466294577
7 | Standard Error | 4.64322439
8 | Observations | 75
10 | ANOVA
11 | | df | SS | MS | F | Significance F
12 | Regression | 2 | 1437.013271 | 718.5066355 | 33.3266 | 5.68E-11
13 | Residual | 72 | 1552.286357 | 21.55953273
14 | Total | 74 | 2989.299628
16 | | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
17 | Intercept | 110.3589599 | 6.763803393 | 16.31610996 | 6.84193E-26 | 96.87556446 | 123.8423554
18 | PRICE | -7.60310422 | 1.044780309 | -7.27722771 | 3.396E-10 | -9.685835675 | -5.520372765
19 | x* | -2.87651491 | 0.933496 | -3.08144457 | 0.002917717 | -4.737404337 | -1.015625
Go back to your Ftest worksheet. With 1 restriction, at $\alpha = 0.05$, the result is (see also p. 229 in Principles of Econometrics, 4e):
Row | A | B | C
1 | Data Input | J= | 1
2 | | N= | 75
3 | | K= | 4
4 | | SSEu= | 1532.085
5 | | SSER= | 1552.286
6 | | α= | 0.05
8 | Computed Values | m1= | 1
9 | | m2= | 71
10 | | Fc= | 3.97581
12 | F-test | F-statistic= | 0.936194
13 | | Conclusion= | Do Not Reject Ho
14 | | p-value= | 0.336543
15 | | Conclusion= | Do Not Reject Ho
Go back to your andy's hamburger chain data worksheet. In cells G1:I2, enter the following column labels and formulas.
Row | G | H | I
1 | y** | x1** | x2**
2 | =A2-D2-78.1 | =B2-6 | =E2-3.8*D2+3.61
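The constants 78.1, 6 and 3.61 in these formulas result from imposing two restrictions on (6.2). Assuming, as the formulas suggest (see Principles of Econometrics, 4e, for the hypothesis as stated), that the restrictions are $\beta_3 + 3.8\beta_4 = 1$ and $\beta_1 + 6\beta_2 + 1.9\beta_3 + 3.61\beta_4 = 80$, solving for $\beta_3$ and $\beta_1$ and substituting gives:

```latex
\begin{aligned}
\beta_3 &= 1 - 3.8\beta_4 \\
\beta_1 &= 80 - 6\beta_2 - 1.9\beta_3 - 3.61\beta_4 = 78.1 - 6\beta_2 + 3.61\beta_4 \\
\underbrace{SALES_i - ADVERT_i - 78.1}_{y^{**}} &= \beta_2\,\underbrace{(PRICE_i - 6)}_{x_1^{**}} + \beta_4\,\underbrace{(ADVERT_i^2 - 3.8\,ADVERT_i + 3.61)}_{x_2^{**}} + e_i
\end{aligned}
```

The transformed model has no intercept, which is why the Constant is Zero option is selected when it is estimated.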
Copy the content of cells G2:I2 to cells G3:I76. Here is how your table should look (only the first five values are shown):
Row | G | H | I
1 | y** | x1** | x2**
2 | -6.2 | -0.31 | 0.36
3 | -9.2 | 0.49 | 1
4 | -16.5 | -0.37 | 1.21
5 | -11.4 | 0.22 | 1.44
6 | 9.7 | -0.98 | 0.16
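As a cross-check outside Excel, the worksheet formulas for y*, x*, y**, x1** and x2** can be applied to the first five observations shown in the tables above. A minimal Python sketch (the function name `transform` is our own); values are rounded to two decimals to match the displayed tables:

```python
# First five observations (SALES, PRICE, ADVERT) from the tables above
rows = [(73.2, 5.69, 1.3), (71.8, 6.49, 2.9), (62.4, 5.63, 0.8),
        (67.4, 6.22, 0.7), (89.3, 5.02, 1.5)]


def transform(sales, price, advert):
    """Apply the worksheet formulas for the restricted models (6.5) and (6.6)."""
    advert2 = advert ** 2                     # ADVERT2 column
    y_star = sales - advert                   # y*   : =A2-D2
    x_star = advert2 - 3.8 * advert           # x*   : =E2-3.8*D2
    y_2star = sales - advert - 78.1           # y**  : =A2-D2-78.1
    x1_2star = price - 6                      # x1** : =B2-6
    x2_2star = advert2 - 3.8 * advert + 3.61  # x2** : =E2-3.8*D2+3.61
    return tuple(round(v, 2) for v in (y_star, x_star, y_2star, x1_2star, x2_2star))


for row in rows:
    print(transform(*row))
```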
For the restricted model (6.6), notice that there is no intercept, so you will need to select the Constant is Zero option in the Regression dialog box. The Input Y Range should be G1:G76, and the Input X Range should be H1:I76. Check the boxes next to Labels and Constant is Zero. Select Output Range and specify it to be cell A1 in your Restricted Model worksheet: you can place your cursor in the Output Range window and move it to that cell, or type 'Restricted Model'!A1 in the Output Range window. Finally, select OK.
Excel informs you that the output range will overwrite existing data. Do press OK to overwrite
the data in the specified range. The result is:
Row | A | B | C | D | E
1 | SUMMARY OUTPUT
3 | Regression Statistics
4 | Multiple R | 0.699423441
5 | R Square | 0.489193159
6 | Adjusted R Square | 0.466497175
7 | Standard Error | 4.937778213
8 | Observations | 75
10 | ANOVA
11 | | df | SS | MS | F
12 | Regression | 2 | 1704.549776 | 852.274888 | 34.95558173
13 | Residual | 73 | 1779.860719 | 24.38165368
14 | Total | 75 | 3484.410495
16 | | Coefficients | Standard Error | t Stat | P-value
17 | Intercept | 0 | #N/A | #N/A | #N/A
Go back to your Ftest worksheet. With 2 restrictions, at $\alpha = 0.05$, the result is (see also p. 231 in Principles of Econometrics, 4e):
Row | A | B | C
1 | Data Input | J= | 2
2 | | N= | 75
3 | | K= | 4
4 | | SSEu= | 1532.085
5 | | SSER= | 1779.861
6 | | α= | 0.05
8 | Computed Values | m1= | 2
9 | | m2= | 71
10 | | Fc= | 3.125764
12 | F-test | F-statistic= | 5.741233
13 | | Conclusion= | Reject Ho
14 | | p-value= | 0.004885
15 | | Conclusion= | Reject Ho
$$\ln(Q_t) = \beta_1 + \beta_2 \ln(PB_t) + \beta_3 \ln(PL_t) + \beta_4 \ln(PR_t) + \beta_5 \ln(I_t) + e_t \qquad (6.7)$$
where Q is the quantity demanded, PB is the price of beer, PL is the price of liquor, PR is the price of all other remaining goods and services, and I is income. The data for this model were collected over a period of 30 years from a randomly selected household.
The assumption that economic agents do not suffer from "money illusion" can be imposed on the demand model. This leads to the following restricted demand model for beer (see pp. 231-232 in Principles of Econometrics, 4e for more details):
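$$\ln(Q_t) = \beta_1 + \beta_2 \ln\left(\frac{PB_t}{PR_t}\right) + \beta_3 \ln\left(\frac{PL_t}{PR_t}\right) + \beta_5 \ln\left(\frac{I_t}{PR_t}\right) + e_t \qquad (6.8)$$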
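The restricted model follows from the homogeneity restriction that rules out money illusion: multiplying all prices and income by the same constant should leave demand unchanged, which in the log-log demand model requires $\beta_2 + \beta_3 + \beta_4 + \beta_5 = 0$. A sketch of the substitution, assuming that form of the restriction:

```latex
\begin{aligned}
\beta_4 &= -\beta_2 - \beta_3 - \beta_5 \\
\ln(Q_t) &= \beta_1 + \beta_2 \ln(PB_t) + \beta_3 \ln(PL_t) - (\beta_2 + \beta_3 + \beta_5)\ln(PR_t) + \beta_5 \ln(I_t) + e_t \\
         &= \beta_1 + \beta_2 \ln(PB_t/PR_t) + \beta_3 \ln(PL_t/PR_t) + \beta_5 \ln(I_t/PR_t) + e_t
\end{aligned}
```

Once the restricted model is estimated, an estimate of $\beta_4$ can be recovered as $-b_2 - b_3 - b_5$.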
Open the Excel file beer. Excel opens the data set in Sheet 1 of a new Excel file. Since we would
like to save all our work from Chapter 6 in one file, create a new worksheet in your POE
Chapter 6 Excel file, rename it beer data, and in it, copy the data set you just opened.
In cells F1:I2 of your beer data worksheet, enter the following column labels and formulas.
Row | F | G | H | I
1 | y* | x1* | x2* | x3*
2 | =ln(A2) | =ln(B2/D2) | =ln(C2/D2) | =ln(E2/D2)
Copy the content of cells F2:I2 to cells F3:I31. Here is how your table should look (only the first five values are shown):
Row | F | G | H | I
1 | y* | x1* | x2* | x3*
2 | 4.403054 | 0.472253 | 1.834382 | 10.02578
3 | 4.041295 | 1.220257 | 2.391088 | 10.58768
4 | 4.160444 | 0.979322 | 2.126509 | 10.33316
5 | 4.180522 | 1.05315 | 2.258981 | 10.49711
6 | 4.160444 | 0.757095 | 1.951287 | 10.15131
In the Regression dialog box, the Input Y Range should be F1:F31, and the Input X Range should be G1:I31. Check the box next to Labels. Select New Worksheet Ply and name it Restricted Beer Demand Model. Finally, select OK.
Row | A | B | C | D | E | F | G
1 | SUMMARY OUTPUT
3 | Regression Statistics
4 | Multiple R | 0.898859761
5 | R Square | 0.80794887
6 | Adjusted R Square | 0.785789124
16 | | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
17 | Intercept | -4.797797376 | 3.713905 | -1.291847079 | 0.207775913 | -12.43183844 | 2.836243691
18 | x1* | -1.299386 | 0.165737623 | -7.840021241 | 2.57799E-08 | -1.640065044 | -0.958708
19 | x2* | 0.186815879 | 0.284383258 | 0.656915882 | 0.517008 | -0.397742275 | 0.771374032
20 | x3* | 0.945828579 | 0.427046831 | 2.214812313 | 0.035742 | 0.068021255 | 1.823635904
$$FAMINC_i = \beta_1 + \beta_2 HEDU_i + \beta_3 WEDU_i + e_i \qquad (6.9)$$
where FAMINC is the annual family income of married couples in which both husband and wife work, HEDU is the husband's years of education and WEDU is the wife's years of education.
If we incorrectly omit the relevant variable WEDU (wife's education) from the family income
model, it becomes:
$$FAMINC_i = \beta_1 + \beta_2 HEDU_i + e_i \qquad (6.10)$$
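Omitting WEDU biases the least squares estimator of $\beta_2$ whenever WEDU matters ($\beta_3 \neq 0$) and is correlated with HEDU. The standard omitted-variable result for one included and one omitted regressor, in our notation with $b_2^*$ denoting the estimator from (6.10), is:

```latex
\operatorname{bias}(b_2^*) = E(b_2^*) - \beta_2
  = \beta_3 \,\frac{\widehat{\operatorname{cov}}(HEDU, WEDU)}{\widehat{\operatorname{var}}(HEDU)}
```

With a positive $\beta_3$ and the positive sample correlation between husbands' and wives' education reported in the correlation matrix below, $b_2^*$ would tend to overstate the effect of HEDU on family income.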
If we add the omitted relevant variable KL6 (number of children less than 6 years old) to the
family income model, it becomes:
$$FAMINC_i = \beta_1 + \beta_2 HEDU_i + \beta_3 WEDU_i + \beta_4 KL6_i + e_i \qquad (6.11)$$
You can estimate models (6.9)-(6.11) using the edu_inc data set. Below, we show you how to obtain the correlation matrix shown in Table 6.1 of Principles of Econometrics, 4e (p. 235).
Open the Excel file edu_inc. Excel opens the data set in Sheet 1 of a new Excel file. Since we
would like to save all our work from Chapter 6 in one file, create a new worksheet in your POE
Chapter 6 Excel file, name it education and income data, and in it, copy the data set you just
opened.
Select the Data tab, in the middle of your tab list located on top of your screen. On the Analysis
group of commands, to the far right, select Data Analysis.
The Data Analysis dialog box pops up. In it, select Correlation (you might need to use the scroll
up and down bar to the right of the Analysis Tools window to find it), then select OK.
A Correlation dialog box pops up. Specify the Input Range to be A1:F429. Select Grouped By Columns, as this is the way the data on each variable are stored. Check the box next to Labels in First Row. Select New Worksheet Ply and name it Correlation Matrix. Finally, select OK.
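Correlation matrix (values rounded to three decimals):
Row | A | B | C | D | E | F | G
1 | | FAMINC | HE | WE | KL6 | XTRA_X5 | XTRA_X6
2 | FAMINC | 1
3 | HE | 0.355 | 1
4 | WE | 0.362 | 0.594 | 1
5 | KL6 | -0.072 | 0.105 | 0.129 | 1
6 | XTRA_X5 | 0.290 | 0.835 | 0.518 | 0.149 | 1
7 | XTRA_X6 | 0.351 | 0.821 | 0.799 | 0.160 | 0.900 | 1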
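Each entry in the matrix is a sample (Pearson) correlation coefficient. A minimal Python sketch of what the Correlation tool computes; the names `pearson` and `correlation_matrix` are ours, and the toy numbers are made up for illustration (they are not taken from edu_inc). In Python 3.10 and later, `statistics.correlation` computes the same Pearson coefficient.

```python
import math


def pearson(x, y):
    """Sample (Pearson) correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Lower triangle of the correlation matrix, laid out like Excel's output."""
    names = list(columns)
    return {
        (row, col): pearson(columns[row], columns[col])
        for i, row in enumerate(names)
        for col in names[: i + 1]
    }


# Hypothetical toy data, for illustration only
toy = {"A": [1.0, 2.0, 3.0, 4.0], "B": [2.0, 4.1, 5.9, 8.2], "C": [4.0, 3.0, 2.0, 1.0]}
for (row, col), r in correlation_matrix(toy).items():
    print(f"{row:>2} {col:>2} {r: .6f}")
```

Like the Excel output, only the lower triangle is produced, since the matrix is symmetric with ones on the diagonal.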
To see the effect of irrelevant variables, we can add two artificially generated variables X5 and X6
to the family income model (6.11):
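$$FAMINC_i = \beta_1 + \beta_2 HEDU_i + \beta_3 WEDU_i + \beta_4 KL6_i + \beta_5 X_{i5} + \beta_6 X_{i6} + e_i \qquad (6.12)$$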
You can estimate model (6.12) using the edu_inc.xls data set. Below, we will show you how the
variables X5 and X6 were generated.
Variables X5 and X6 were constructed so that they are correlated with HEDU and WEDU but are not expected to influence family income. Specifically, they were defined as follows:
$$X_5 = HEDU + 2\,N(0,1)$$
$$X_6 = X_5 + WEDU + N(0,1)$$
where N(0,1) denotes random numbers from a normal distribution with mean 0 and standard deviation 1, generated the way we generated our random samples in Section 2.4.4 and Section 3.1.4.
Go back to your education and income data worksheet. In cells H1:N2, enter the following column labels and formulas. The last row of the table gives the numbers of the equations used in the formulas.
Row | H | I | J | K | L | M | N
1 | N(0,1) for X5 | N(0,1) for X6 | HEDU | WEDU | KL6 | X5 | X6
2 | | | =B2 | =C2 | =D2 | =J2+2*H2 | =M2+K2+I2
Equation | | | | | | (6.14) | (6.15)
Note that we copy the values of the HEDU, WEDU and KL6 variables into columns J-L. We do this because the explanatory variables must be in adjacent columns to use the Excel regression analysis tool.
In columns H and I, we will generate samples of random numbers from a normal distribution with mean 0 and standard deviation 1.
Select the Data tab, in the middle of your tab list located on top of your screen. On the Analysis
group of commands, to the far right, select Data Analysis.
The Data Analysis dialog box pops up. In it, select Random Number Generation (you might
need to use the scroll up and down bar to the right of the Analysis Tools window to find it), then
select OK.
A Random Number Generation dialog box pops up. We need to generate two sets of random numbers, one for our X5 variable and one for our X6 variable, so we specify 2 in the Number of Variables window. We would like to generate as many data points as we have in the data set we are working with, so we specify 428 in the Number of Random Numbers window. We select Normal in the Distribution window; the Parameters should be Mean equal to 0 and Standard deviation equal to 1. Select Output Range and specify it to be H2:I429. Finally, select OK.
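For readers who want to reproduce the construction outside Excel, here is a minimal Python sketch of the same recipe: two independent N(0,1) draws per observation, then X5 = HEDU + 2·N(0,1) and X6 = X5 + WEDU + N(0,1), as in the worksheet formulas. The function name `make_irrelevant` and the seed are our own choices; as with Excel's generator, the draws (and hence any regression results) will not match the edu_inc file.

```python
import random


def make_irrelevant(hedu, wedu, seed=12345):
    """Return (z5, z6, x5, x6) built like columns H, I, M and N of the worksheet."""
    rng = random.Random(seed)                          # seeded, so runs are reproducible
    z5 = [rng.gauss(0.0, 1.0) for _ in hedu]           # column H: N(0,1) for X5
    z6 = [rng.gauss(0.0, 1.0) for _ in hedu]           # column I: N(0,1) for X6
    x5 = [h + 2.0 * z for h, z in zip(hedu, z5)]       # =J2+2*H2
    x6 = [v + w + z for v, w, z in zip(x5, wedu, z6)]  # =M2+K2+I2
    return z5, z6, x5, x6


# HEDU and WEDU for the first five observations shown in the table below
z5, z6, x5, x6 = make_irrelevant([12, 9, 12, 10, 12], [12, 12, 12, 12, 14])
for row in zip(z5, z6, x5, x6):
    print(["%.6f" % v for v in row])
```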
After you copy the content of cells J2:N2 to cells J3:N429, your table should look like the one below (only the first five values are shown; your random numbers will differ):
Row | H | I | J | K | L | M | N
1 | N(0,1) for X5 | N(0,1) for X6 | HE | WE | KL6 | X5 | X6
2 | 1.167550 | 0.204715 | 12 | 12 | 1 | 14.3351 | 26.53982
3 | 0.241264 | 0.084210 | 9 | 12 | 0 | 9.482528 | 21.56674
4 | -0.723794 | 0.549949 | 12 | 12 | 1 | 10.55241 | 23.10236
5 | 0.459444 | 0.551533 | 10 | 12 | 0 | 10.91889 | 23.47042
6 | 1.790540 | -0.561823 | 12 | 14 | 1 | 15.58108 | 29.01926
In the Regression dialog box, the Input Y Range should be A1:A429, and the Input X Range should be J1:N429. Check the box next to Labels. Select New Worksheet Ply and name it Irrelevant Variable Model. Finally, select OK.
Note: we obtained different random samples from the ones recorded in the edu_inc data set; this is why our estimated equation differs from the one reported on p. 236 of Principles of Econometrics, 4e. You will also obtain different parameter estimates for equation (6.12), because your random numbers will differ from those above.
Row | A | B | C | D | E | F | G
1 | SUMMARY OUTPUT
3 | Regression Statistics
4 | Multiple R | 0.421302759
5 | R Square | 0.177496
6 | Adjusted R Square | 0.167751
7 | Standard Error | 40247.24063
8 | Observations | 428
10 | ANOVA
11 | | df | SS | MS | F | Significance F
12 | Regression | 5 | 1.47515E+11 | 29502937711 | 18.21348455 | 2.2E-16
13 | Residual | 422 | 6.83573E+11 | 1619840378
14 | Total | 427 | 8.31087E+11
16 | | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
17 | Intercept | -7682.625 | 11189.33 | -0.686603 | 0.4927 | -29676.38312 | 14311.13281
$$FAMINC_i = \beta_1 + \beta_2 HEDU_i + \beta_3 WEDU_i + \beta_4 KL6_i + e_i \qquad (6.15)$$
Consider further the following two artificial models and their associated tests for misspecification (RESET). We will use an F-test for both, even though a t-test could be used for RESET test 1, which involves a single restriction.
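$$FAMINC_i = \beta_1 + \beta_2 HEDU_i + \beta_3 WEDU_i + \beta_4 KL6_i + \gamma_1 \widehat{FAMINC}_i^{\,2} + e_i \qquad (6.16)$$
$$FAMINC_i = \beta_1 + \beta_2 HEDU_i + \beta_3 WEDU_i + \beta_4 KL6_i + \gamma_1 \widehat{FAMINC}_i^{\,2} + \gamma_2 \widehat{FAMINC}_i^{\,3} + e_i \qquad (6.17)$$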
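For reference, the null hypotheses and degrees of freedom that go into the Ftest template for the two RESET variants are sketched below, assuming N = 428 observations (as in the output above) and counting in K the intercept, the three original slopes and the added $\gamma$ terms:

```latex
\begin{aligned}
&\text{(6.16): } H_0: \gamma_1 = 0, && J = 1,\ K = 5,\ m_2 = N - K = 423 \\
&\text{(6.17): } H_0: \gamma_1 = \gamma_2 = 0, && J = 2,\ K = 6,\ m_2 = N - K = 422 \\
&F = \frac{(SSE_R - SSE_U)/J}{SSE_U/(N - K)}
\end{aligned}
```

Here $SSE_R$ comes from the model without the fitted-value terms and $SSE_U$ from (6.16) or (6.17), respectively.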
Go back to your education and income data worksheet, from which we will first estimate the restricted model (6.15). In the Regression dialog box, the Input Y Range should be A1:A429, and the Input X Range should be B1:D429. Check the box next to Labels. Select Output Range and specify it to be cell A1 in your Restricted Model worksheet: you can place your cursor in the Output Range window and move it to that cell, or type 'Restricted Model'!A1 in the Output Range window. Finally, select OK.
Excel informs you that the output range will overwrite existing data. Do press OK to overwrite
the data in the specified range. The result is:
Row | A | B
1 | SUMMARY OUTPUT
3 | Regression Statistics
4 | Multiple R | 0.420919613