
Empirical Methods MW24.1

[Scatterplot of testscr (about 620 to 700) against str (14 to 26), with the fitted OLS regression line]
Call:
lm(formula = testscr ~ str)

Residuals:
    Min      1Q  Median      3Q     Max
-47.727 -14.251   0.483  12.822  48.540

Coefficients:
            Estimate Std. Error t value   Pr(>|t|)
(Intercept) 698.9330     9.4675  73.825    < 2e-16 ***
str          -2.2798     0.4798  -4.751 0.00000278 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 18.58 on 418 degrees of freedom
F-statistic: 22.58 on 1 and 418 DF,  p-value: 0.000002783

Oliver Kirchkamp

© Oliver Kirchkamp, 6 February 2015 09:49:38

This handout is a summary of the slides we use in the lecture. The handout is perhaps not very helpful unless you also attend the lecture. It is also not supposed to replace a book. The principal text for the lecture is Stock and Watson's book. All formulas we use in the lecture can be found there (with fewer mistakes). Please expect slides and small parts of the lecture to change from time to time, and print only the material you currently need.
Homepage: http://www.kirchkamp.de/oekonometrie/
Schedule: Lecture: Fri, 10:15-11:45, HS5
Exercise: Fri, 14:15-15:45, SR207
Mon, 16:15-17:45, HS4
Exam: Wed. 18.2., 8-10 (please check homepage!)
Literature:

* Stock and Watson; Introduction to Econometrics, Pearson, 2006
* Studenmund; Using Econometrics, Pearson, 2006
* Barreto and Howland; Introductory Econometrics, Cambridge, 2006

Software:

* R
  - free
  - wide range of applications
  - Helpful hints, links to the documentation, etc., can be found on the homepage.

  In the lecture we will illustrate many things with R. You should try these examples on your own computer. Use the online help to look up unknown commands.

* SAS, STATA, EViews, TSP, SPSS, ...
  - expensive
  - more specialised
  - more heterogeneous syntax

Contents

1 Introduction
   1.1 What is the purpose of economics
   1.2 Econometrics uses data to measure causal relationships
   1.3 Learning aims
   1.4 Example
   1.5 Plan
   1.6 Exercises

2 Statistical theory
   2.1 Population
   2.2 Sample
   2.3 Random variables and distributions
       2.3.1 Conditional expected value and conditional variance
   2.4 Samples of a population
   2.5 Estimations
       2.5.1 The distribution of Ȳ
       2.5.2 Characteristics of sampling distributions
       2.5.3 Why should we use Ȳ to estimate µY?
       2.5.4 Testing hypotheses
       2.5.5 Estimating the variance of Y
       2.5.6 Calculating the p-value with the help of an estimated σ²Y
       2.5.7 Relation between p-value and the level of significance
       2.5.8 What happened to the t table and the degrees of freedom?
       2.5.9 A comment
       2.5.10 Another problem
   2.6 Confidence intervals
   2.7 An alternative: The Bayesian Approach
   2.8 Summary
   2.9 Exercises

3 Linear regression with a single regressor
   3.1 Measures of determination
   3.2 OLS assumptions
       3.2.1 Digression: The Existence of Moments
   3.3 The distribution of the OLS estimator
   3.4 Distribution of β̂1
   3.5 Distribution of β̂0
   3.6 Hypothesis tests for β̂1
   3.7 Confidence intervals and p-values
   3.8 Bayesian Regression
   3.9 Reporting estimation results
   3.10 Continuous and nominal variables
   3.11 Heteroscedastic and homoscedastic error terms
       3.11.1 An example from labour market economics
       3.11.2 Back to Caschool
       3.11.3 What do we get from homoscedasticity?
       3.11.4 Summary: Homo-/Heteroskedasticity
   3.12 Extended OLS assumptions
   3.13 OLS problems
       3.13.1 Alternatives to OLS
       3.13.2 Robust regression
       3.13.3 A Bayesian approach to robust regression
   3.14 Exercises

4 Models with more than one independent variable (multiple regression)
   4.1 Matrix notation
       4.1.1 How to do calculations with matrices
       4.1.2 Calculations with matrices in R
   4.2 Deriving the OLS estimator in matrix notation
   4.3 Specification errors
       4.3.1 Examples
       4.3.2 Specification errors: generalization
   4.4 Assumptions for the multiple regression model
   4.5 The distribution of the OLS estimator in a multiple regression
   4.6 Multicollinearity
       4.6.1 Example 2
       4.6.2 Example 3
       4.6.3 Which regressor is responsible for the multicollinearity?
       4.6.4 Multicollinearity of dummy variables
   4.7 Specification Errors: Summary
       4.7.1 The variance of β̂
       4.7.2 Imperfect multicollinearity
       4.7.3 Hypothesis tests
       4.7.4 Digression: Multiplication
       4.7.5 Extending the estimation equation by adding expenditure per student
   4.8 Joint Hypotheses
       4.8.1 F statistic for two restrictions
       4.8.2 More than two restrictions
       4.8.3 Special cases
       4.8.4 Special case: Homoscedastic error terms
   4.9 Restrictions with more than one coefficient
   4.10 Model specification
       4.10.1 Measure R²
       4.10.2 Measure contribution to R²
       4.10.3 Information criteria
       4.10.4 t-statistic for individual coefficients
       4.10.5 Bayesian Model Comparison
       4.10.6 Comparing models
       4.10.7 Discussion
   4.11 Exercises

5 Non-linear regression functions
   5.1 Functional forms
       5.1.1 Polynomials
       5.1.2 Logarithmic Models
       5.1.3 Logarithmic Models: linear-log
       5.1.4 Logarithmic Models: log-linear
       5.1.5 Logarithmic Models: log-log
       5.1.6 Comparison of the three logarithmic models
       5.1.7 Generalization: Box-Cox
       5.1.8 Other non-linear functions
       5.1.9 Non-linear least squares
   5.2 Interactions
       5.2.1 Interactions between binary variables
       5.2.2 Interaction between a binary and a continuous variable
       5.2.3 Application: Gender gap
       5.2.4 Interaction between two continuous variables
   5.3 Non-linear interaction terms
       5.3.1 Non-linear interaction terms
       5.3.2 Summary
   5.4 Exercises

6 Evaluating multiple regressions
   6.1 Introduction
       6.1.1 Can we evaluate multiple regressions systematically?
       6.1.2 Internal and external validity
   6.2 Internal validity - Problems
       6.2.1 Omitted Variable Bias
       6.2.2 Incorrect specification of the functional form
       6.2.3 Errors in the variables
       6.2.4 Sample selection bias
       6.2.5 Simultaneous causality
       6.2.6 Heteroscedasticity and correlation of error terms
   6.3 OLS and prediction
   6.4 Comparison of Caschool and MCAS
       6.4.1 Internal validity
       6.4.2 External validity
       6.4.3 Result

1 Introduction
What is an interesting economic theory?

Claim: For each economic theory there is an alternative theory predicting the opposite.

Economic theories often suggest relationships, often with policy implications, but these relationships are rarely quantified.

* How large is the increase in the performance of students when courses are smaller?
* How large is the increase in your income when you study for another year?
* What is the elasticity of demand for cigarettes?
* By how much does the GDP increase if the ECB raises interest rates by 1%?

1.1 What is the purpose of economics

* Developing theories
* Testing theories
* Using theories for prediction

1.2 Econometrics uses data to measure causal relationships

Ideal approach: controlled experiment (control group / treatment group)

* By how much does the performance of students increase when courses become smaller?
* How much more do you earn if you decide to spend an additional year at university?
* What is the price elasticity of cigarettes?
* By how much does the GDP increase if the ECB reduces the interest rate by one percentage point?

Such experiments are hard to do.


Most of the data we have are from uncontrolled processes:

* Student test scores
* Incomes of alumni
* Time series data about monetary policy

Problems related to data from uncontrolled processes:

* Unobserved factors
* Simultaneous causalities
* Coincidental causality

1.3 Learning aims

* Quantifying causal effects using observational data from uncontrolled processes
* Extrapolating time series
* Evaluating the econometric work of others

1.4 Example

How does learning success change when class size is reduced by one student? What if class size is reduced by eight students? Can we answer this question without using data?

E.g. test scores from 420 school districts in California from 1998-1999:

* str = student-teacher ratio = number of students in the district / full-time equivalent teachers
* testscr = 5th-grade test score (Stanford-9 achievement test)

For our examples we use the statistical software R.

Some components of R are contained in so-called libraries. Together these libraries cover a huge functional range. For our introductory examples we use only a few of these libraries. Additional libraries can be loaded by using the library command. RSiteSearch and the R Site Search Extension for Firefox help us to determine which library offers a certain functionality. Here we are going to use Ecdat, which contains several econometric data sets, and the library car, which offers a number of handy econometric functions.

library(Ecdat)
library(car)


The command data enables access to the data set contained in a library.

data(Caschool)

We can now access this data set. summary displays an overview of the statistical characteristics of the data
set.

names(Caschool)
 [1] "distcod"  "county"   "district" "grspan"   "enrltot"  "teachers"
 [7] "calwpct"  "mealpct"  "computer" "testscr"  "compstu"  "expnstu"
[13] "str"      "avginc"   "elpct"    "mathscr"

summary(Caschool$str)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  14.00   18.58   19.72   19.64   20.87   25.80

It is quite cumbersome to write the name of a data set - here Caschool - time and time again. Whenever we intend to work with the same data set for a while we can use attach(Caschool). This will tell R to look at Caschool first, whenever we ask for a variable.

attach(Caschool)

Using summary is much easier now.

summary(str)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  14.00   18.58   19.72   19.64   20.87   25.80

hist draws a histogram.

hist(str)

[Histogram of str: frequencies of student-teacher ratios between 14 and 26]

scatterplot draws a scatterplot.

library(car)
scatterplot(testscr ~ str)

[Scatterplot of testscr against str]

Test results testscr seem to be getting worse as student-teacher ratios str are getting higher. Is it possible to show that districts with low student-teacher ratios str have higher test scores testscr?

* Compare average test scores in districts with small str to test scores in districts with high str (estimation)
* Test the null hypothesis that mean test scores are the same against the alternative hypothesis that they are not (hypothesis testing)
* Estimate an interval for the difference of the mean test scores (confidence interval)

Is the difference large enough for a school reform, i.e. large enough to convince parents and the school authority?

In the following example we want to split up the data set into two pieces: schools with a student-teacher ratio above and below 20. In other words, we will introduce a nominal variable. In R a nominal variable is called a factor, and factor converts a continuous variable (str) into a factor. t.test performs a Student-t test to compare mean values. We write testscr ~ large.
The variable to be tested is given before the tilde. The factor describing the two groups to be compared is
given after the tilde.
large <- str>20
t.test(testscr ~ large)

	Welch Two Sample t-test

data:  testscr by large
t = 3.9231, df = 393.721, p-value = 0.0001031
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
  3.584445 10.785813
sample estimates:
mean in group FALSE  mean in group TRUE
           657.1846            649.9994

This simple test tells us that there is a significant difference of the test scores
testscr between large and small school groups.
We can estimate the difference between the two groups, we can test a hypothesis,
and we can calculate a confidence interval.


1.5 Plan
You already know estimates, hypothesis tests, and confidence intervals.
We will generalize these concepts for regressions.
Before we do this, we will take a brief look at the underlying theory.

1.6 Exercises

1. Econometrics

   * What is econometrics?
   * Which kind of questions can be answered with econometric tools? Give some examples of questions that can possibly be answered with econometrics. Give examples from different fields, e.g. policy advice, science, marketing, ...

2. Data sources

   * What are typical data sources for econometric analysis?
   * Which problems might be related to the different data sources?

2 Statistical theory

* Population, random variable, distribution
* Moments of a distribution (mean value, variance, standard deviation, covariance, correlation)
* Conditional distribution, conditional mean values
* Distribution of a random sample

2.1 Population
The set of all entities which could theoretically be observed (e.g. all imaginable school districts at all points in time under all imaginable conditions)
Quite often we assume that the population is of infinite size (or at least very
large)
Usually we know something about A (our sample) and we want to say
something about B. We can do this, if we assume that both A and B are
drawn from the same population.


2.2 Sample
A part of the population that we observe (e.g. Californian school districts in
1998 (and under the conditions of this year))

2.3 Random variables and distributions

random variable (RV) = numerical summary of a random event

* discrete (categorial, factor) variable / continuous variable
* one-dimensional variable / multi-dimensional variable

Describing random variables using distributions:

* Probabilities of events P(x) (when dealing with discrete RVs)
* Cumulative distribution function F(x) (when dealing with one-dimensional RVs)
* Probability density function f(x) (when dealing with continuous RVs)

Properties of random variables:

* Expected value E(X) = µX, the (theoretical) mean value of X: the mean value of X for the entire population
* Variance E((X - µX)²) = σ²X: a measure of the mean of the squared deviation from the mean of the distribution
* Standard deviation: √variance = σX

Generally,

    X ~ F(θ1, θ2, ...)

where θ1, θ2, ... are parameters of the distribution. Some distributions (e.g. the normal distribution) are characterised by µ and σ²:

    X ~ N(µ, σ²)


Joint distribution of random variables

* Random variables X and Z have a joint distribution
* Covariance of X and Z: cov(X, Z) = E((X - µX)(Z - µZ)) = σXZ
* Covariance is a measure of the linear dependence between X and Z
* Positive covariance = positive dependence between X and Z
* If X and Z are independently distributed, then cov(X, Z) = 0 (but not vice versa!)
* The covariance between a random variable and itself is its variance:
  cov(X, X) = E((X - µX)(X - µX)) = E((X - µX)²) = σ²X

The correlation coefficient can be written as a fraction of covariances:

    cor(X, Z) = cov(X, Z) / √(var(X) · var(Z)) = σXZ / (σX σZ)

cov(str,testscr)
[1] -8.159324
cov(str,str)
[1] 3.578952
var(str)
[1] 3.578952
cor(str,testscr)
[1] -0.2263628

cor(X, Z) ∈ [-1, +1]

* cor(X, Z) = 1: perfectly positive linear dependence
* cor(X, Z) = -1: perfectly negative linear dependence
* cor(X, Z) = 0: no linear dependence
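The R calls above use the built-in cov, var and cor. As a cross-check of the formulas, here is a small Python sketch with hypothetical numbers (not the Caschool data) that computes the same quantities directly from their definitions:

```python
from statistics import mean, stdev

def cov(x, z):
    """Sample covariance with the n-1 denominator, as R's cov() uses."""
    mx, mz = mean(x), mean(z)
    return sum((a - mx) * (b - mz) for a, b in zip(x, z)) / (len(x) - 1)

def cor(x, z):
    """Correlation as a fraction of covariances: cov(X,Z) / (sd(X) sd(Z))."""
    return cov(x, z) / (stdev(x) * stdev(z))

# Hypothetical data, loosely mimicking str and testscr:
x = [14.0, 18.6, 19.7, 20.9, 25.8]
z = [700.0, 660.0, 655.0, 648.0, 620.0]

print(cov(x, x))   # the covariance of a variable with itself is its variance
print(cor(x, z))   # negative, since z falls as x rises
```

Note that, like R, statistics.stdev and the cov above use the n - 1 denominator, so cov(x, x) reproduces the sample variance exactly.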


Conditional distributions and conditional mean values

Conditional distributions: the distribution of Y given the value of another random variable X. E.g. the distribution of test scores testscr given that the student-teacher ratio satisfies str < 20.

library(lattice)
densityplot(testscr,group=large,auto.key=list(columns=2,cex=.7))

[Density plot of testscr (roughly 600 to 700), drawn separately for large = FALSE and large = TRUE]

E.g. wages of men and women:

data(Wages)

The data set Wages contains, among others, the following two variables:

* exp: years of full-time work experience
* lwage: logarithm of wage

xyplot(lwage ~ exp,group=sex,data=Wages,auto.key=list(corner=c(1,1)))

[Scatterplot of lwage (about 5.5 to 8.5) against exp (0 to 50), grouped by sex (female/male)]

2.3.1 Conditional expected value and conditional variance

Now we have two data sets in memory. There are several possibilities of telling R which data set we want to use when we type a command:

* Above, we have already used attach(Caschool). It tells R to first look for a variable in Caschool. We can use detach to remove this search directive and issue attach(Wages) to tell R to search in Wages from now on.
* Alternatively, we can use statements like Caschool$large to call the variable large in the data set Caschool, and Wages$exp to call the variable exp in the data set Wages.
* Furthermore, there is with(Wages,...), which tells R that we want to use Wages for everything we write in parentheses after with.

In this example we learn to use subset to select a part of a data set; for example, we select all schools which meet the condition str<20.

Conditional expected value: E(X|Y = y) (important notation)

Conditional variance: the variance of the conditional distribution

Examples:

E(testscr|str < 20) = expected mean value of the test scores of all districts with small group size

with(subset(Caschool,str<20),mean(testscr))
[1] 657.3513


with(subset(Caschool,str>=20),mean(testscr))
[1] 649.9788

Wage of female workers (X=wage, Y=sex):

with(subset(Wages,sex=="female"),mean(lwage))
[1] 6.255308
with(subset(Wages,sex=="male"),mean(lwage))
[1] 6.729774

Recovery rate of all patients who have received a certain drug (X=recovery, Y=drug).

If E(X|Y = y) = const for all values of y (i.e. E(X|Y) does not depend on y), then cor(X, Y) = 0 (but not vice versa!).

2.4 Samples of a population

Consider the sample Y1, ..., Yn of a population Y.

* Before the sample is drawn, the Y1, ..., Yn are random.
* After the sample is drawn, the values of Y1, ..., Yn are realised; they are fixed numbers and not random anymore.
* Y1, ..., Yn is the data set. Yi is the value of Y for observation i (person i, district i, etc.)

If we draw a sample randomly, it is true that:

* Two observations are drawn randomly, thus the value of Yi contains no information about the value of Yj. Yi and Yj are independently distributed.
* Yi and Yj were drawn from the same distribution. Thus, they are identically distributed.

We say that Yi and Yj are independently and identically distributed (= i.i.d.). More generally speaking: the Yi are i.i.d. for i = 1, ..., n.


2.5 Estimations

In econometrics we often estimate unknown quantities. Let's suppose we have a sample Y1, ..., Yn of a random variable Y. We start with a simple problem: how can we estimate the mean value µY of Y (not the mean value of Y1, ..., Yn)?

Ideas:

* We could simply use the mean value Ȳ of the sample Y1, ..., Yn
* We could simply use the first observation Y1
* We could use the median of the sample Y1, ..., Yn

2.5.1 The distribution of Ȳ

* The observations of the sample are drawn randomly.
* Thus, the values of Y1, ..., Yn are random.
* Thus, functions of Y1, ..., Yn are random (e.g. the mean value). If we had drawn a different sample, the function (e.g. the mean value) would have a different value.

We call the distribution of Ȳ over several possible samples the sampling distribution. The mean value and the variance of Ȳ are the mean value and the variance of the sampling distribution, E(Ȳ) and var(Ȳ).

2.5.2 Characteristics of sampling distributions

Expected value of Ȳ:

    E(Ȳ) = µY,  i.e. Ȳ is an unbiased estimator of µY

Variance of Ȳ: how does the variance depend on the size of the sample n?

    var(Ȳ) = σ²Y / n

Question: does Ȳ converge to µY if n is large?

Law of large numbers: Ȳ is a consistent estimator of µY. Formally: if Y1, ..., Yn are i.i.d. and σ²Y < ∞, then Ȳ is a consistent estimator of µY, i.e.

    Ȳ →p µY   as n → ∞

Central Limit Theorem: if Y1, ..., Yn are i.i.d. and 0 < σ²Y < ∞ and n is large, then the distribution of Ȳ approximates a normal distribution:

    Ȳ ~ N(µY, σ²Y/n)   (approximately)
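The relation var(Ȳ) = σ²Y/n can also be checked by simulation. The following Python sketch (my own illustration, not from the handout) draws many samples from a uniform distribution, whose variance is 1/12:

```python
import random
from statistics import mean, variance

random.seed(1)

def sample_means(n, reps=2000):
    """Draw `reps` samples of size n from Uniform(0,1) and return their means."""
    return [mean(random.random() for _ in range(n)) for _ in range(reps)]

# Uniform(0,1): mu = 0.5, sigma^2 = 1/12, so var(Ybar) should be about 1/(12 n).
for n in (10, 100):
    m = sample_means(n)
    print(n, round(mean(m), 3), round(variance(m) * n, 4))  # last number near 1/12
```

The rescaled variance of the sample means stays near 1/12 for both sample sizes, which is exactly the σ²Y/n scaling stated above.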
Of course, R knows distributions, too. In the following example we draw two density functions of a binomially distributed variable using dbinom.

x <- 1:10
plot(x/10, dbinom(x, size=10, prob=0.8))

x <- 750:850
plot(x/1000, dbinom(x, size=1000, prob=0.8))

[Left: dbinom(x, size=10, prob=0.8) plotted against x/10. Right: dbinom(x, size=1000, prob=0.8) plotted against x/1000]

The distribution on the left, which is based on a small sample size, does not quite look like a normal distribution. The distribution on the right is based on a much larger sample size (n = 1000) and has a lot more similarities with a normal distribution.
2.5.3 Why should we use Ȳ to estimate µY?

* Ȳ is unbiased: E(Ȳ) = µY
* Ȳ is consistent: Ȳ →p µY
* Ȳ is the least squares estimator of µY: Ȳ is the solution of

      min_x Σ_{i=1}^{n} (Yi - x)²

* Ȳ has a smaller variance than all other linear unbiased estimators: for any estimator µ̂Y = (1/n) Σ_{i=1}^{n} ai Yi with {ai} such that µ̂Y is unbiased, it is true that var(Ȳ) ≤ var(µ̂Y).

However, there are non-linear estimators, too...
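The least-squares property is easy to verify numerically. A small Python sketch with hypothetical numbers (not the lecture's data):

```python
from statistics import mean, median

def sse(y, x):
    """Sum of squared deviations of the sample from a candidate value x."""
    return sum((yi - x) ** 2 for yi in y)

y = [652.0, 648.5, 660.2, 655.1, 657.9, 649.3]

# The sample mean gives a (weakly) smaller sum of squares than any other
# candidate, for example the median or the first observation:
candidates = [median(y), y[0], 650.0]
print(all(sse(y, mean(y)) <= sse(y, c) for c in candidates))  # → True
```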
2.5.4 Testing hypotheses

Is 652 the mean of testscr?

t.test(Caschool$testscr, mu=652)

	One Sample t-test

data:  Caschool$testscr
t = 2.3196, df = 419, p-value = 0.02084
alternative hypothesis: true mean is not equal to 652
95 percent confidence interval:
 652.3291 655.9840
sample estimates:
mean of x
 654.1565

This is a two-sided test; one-sided tests are also possible.

Level of significance of a test = predefined probability of rejecting the null hypothesis, despite it being true.


p-value of a statistic (e.g. for Ȳ) = the probability of drawing a sample Y1, ..., Yn that is at least as adverse to the null hypothesis as our data, given that the null hypothesis is true. E.g. with Ȳ:

    p-value = Pr_H0( |Ȳ - µY,0| > |Ȳ_sample - µY,0| )

To calculate the p-value we have to know the sampling distribution of Ȳ. That is complicated if n is small. If n is large, we can use the normal distribution to approximate the sampling distribution of Ȳ (Central Limit Theorem).

    p-value = Pr_H0( |Ȳ - µY,0| > |Ȳ_sample - µY,0| )                                (1)
            = Pr_H0( |(Ȳ - µY,0) / (σY/√n)| > |(Ȳ_sample - µY,0) / (σY/√n)| )        (2)
            = Pr_H0( |(Ȳ - µY,0) / σȲ| > |(Ȳ_sample - µY,0) / σȲ| )                  (3)

[Sketch: standard normal density; the shaded tails beyond ±|g| have total probability 2·F(-|g|)]

If n is large:

    p-value ≈ the probability that an N(0,1)-distributed random variable
              is outside of ±|(Ȳ_sample - µY,0) / σȲ|

Statistic:

    g = (x̄ - µ0) / (σ/√n)

In practice σY is unknown; it must be estimated.


2.5.5 Estimating the variance of Y

    s²Y = 1/(n-1) · Σ_{i=1}^{n} (Yi - Ȳ)²  =  sample variance of Y

If Y1, ..., Yn are i.i.d. and E(Y⁴) < ∞, then s²Y →p σ²Y.

Why does the law of large numbers apply?

* s²Y is the mean value of a sample.
* We demand E(Y⁴) < ∞ because the mean value is not calculated from the Yi but from their squares.
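The n - 1 denominator is what makes s²Y unbiased. A quick Python simulation (my own sketch, with hypothetical normal data of variance 4) compares it with the naive n denominator:

```python
import random
from statistics import mean

random.seed(2)

def draw(n):
    return [random.gauss(0, 2) for _ in range(n)]  # true variance is 4

def var_n(y):
    """Biased variance estimator: divides by n."""
    m = mean(y)
    return sum((yi - m) ** 2 for yi in y) / len(y)

def var_n1(y):
    """Sample variance: divides by n - 1."""
    return var_n(y) * len(y) / (len(y) - 1)

reps = [draw(5) for _ in range(4000)]
print(round(mean(var_n1(s) for s in reps), 2))  # close to the true value 4
print(round(mean(var_n(s) for s in reps), 2))   # closer to 4 * (n-1)/n = 3.2
```

Averaged over many samples, the n - 1 version centres on the true variance, while the n version systematically underestimates it.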
2.5.6 Calculating the p-value with the help of an estimated σ²Y

    p-value = Pr_H0( |Ȳ - µY,0| > |Ȳ_sample - µY,0| )                                (4)
            = Pr_H0( |(Ȳ - µY,0) / (σY/√n)| > |(Ȳ_sample - µY,0) / (σY/√n)| )        (5)
            ≈ Pr_H0( |(Ȳ - µY,0) / (sY/√n)| > |(Ȳ_sample - µY,0) / (sY/√n)| )        (6)

In the last line the left-hand fraction is the t statistic and the right-hand fraction is its realised value, t_sample.

[Sketch: density of the statistic with the rejection area beyond ±|g|]

H0 is rejected if p < α.

2.5.7 Relation between p-value and the level of significance

The level of significance is given. E.g. if the given level of significance is 5%, ...

* ... the null hypothesis is rejected if |t| > 1.96,
* ... equivalently, the null hypothesis is rejected if p < 0.05.

The p-value is also called the marginal level of significance.


In many situations we will provide much more information to others by
telling them the p-value we calculated, than by telling them whether we
rejected the null hypothesis or not.
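For large n the p-value can be computed directly from the standard normal cdf. A Python sketch, with inputs loosely inspired by the testscr example (the numbers are rounded, so the result is only illustrative):

```python
from math import erf, sqrt

def p_value_two_sided(ybar, mu0, s, n):
    """Two-sided p-value from the large-n normal approximation:
    p = Pr(|N(0,1)| > |t|) with t = (ybar - mu0) / (s / sqrt(n))."""
    t = (ybar - mu0) / (s / sqrt(n))
    phi = (1 + erf(abs(t) / sqrt(2))) / 2   # Pr(N(0,1) <= |t|)
    return 2 * (1 - phi)

# Rounded values in the spirit of the one-sample test above (not exact):
p = p_value_two_sided(654.16, 652, 18.6, 420)
print(round(p, 4))
```

With a 5% level of significance, this p-value would lead us to reject H0: µY = 652, just as the t.test output above does.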

2.5.8 What happened to the t table and the degrees of freedom?

If Y1, ..., Yn are i.i.d. and normally distributed N(µY, σ²Y), the t-statistic follows the Student-t distribution with n - 1 degrees of freedom. The most important values of the t-distribution can be found in all old statistics books. The recipe is simple:

1. Calculate the t statistic.
2. Calculate the degrees of freedom, n - 1.
3. Look up the 5% critical value.
4. If the t statistic is greater (in absolute terms) than the critical value, reject the null hypothesis.

2.5.9 A comment
The theory of the t-distribution is a mathematically beautiful and interesting
result.
If Y is i.i.d. and normally distributed, we know the exact distribution of the
t statistic.
But
If the Y are not exactly normally distributed, this does not help us at all.

data(OFP, package="Ecdat")
hist(OFP[["faminc"]],breaks=40)

[Histogram of OFP[["faminc"]] with 40 bins]

However, this is not as bad as it seems: no matter how Y is distributed, if n is large enough, the distribution of Ȳ converges to the normal distribution anyway.
2.5.10 Another problem

When we want to compare two groups, we look at

    t = (ȲA - ȲB) / √( s²A/nA + s²B/nB )

This statistic only follows the t-distribution if Y is normally distributed and i.i.d., and the (population) variance is the same in both groups, σ²A = σ²B. The latter can be a heroic assumption (e.g. wages of men vs wages of women).
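R's t.test avoids the equal-variance assumption by default: it keeps the statistic above but replaces the degrees of freedom by the Welch-Satterthwaite value (hence "Welch Two Sample t-test" in the output earlier). A Python sketch with hypothetical data:

```python
from statistics import mean, variance
from math import sqrt

def welch(a, b):
    """t statistic and Welch-Satterthwaite degrees of freedom for two samples
    whose variances may differ (the statistic R's t.test uses by default)."""
    va, vb = variance(a), variance(b)          # sample variances (n-1 denominator)
    na, nb = len(a), len(b)
    se2 = va / na + vb / nb
    t = (mean(a) - mean(b)) / sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

# Hypothetical test scores for two groups of districts:
small = [661.0, 657.5, 659.2, 655.8, 660.1, 658.3]
large = [650.2, 652.9, 648.7, 651.5, 649.8]

t, df = welch(small, large)
print(t, df)  # df is fractional, between min(nA,nB)-1 and nA+nB-2
```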

2.6 Confidence intervals

A 95% confidence interval for µY is an interval that contains the true value of µY in 95% of all repeated samples.

    confidence interval for µY:  [ Ȳ - 1.96 · sY/√n ,  Ȳ + 1.96 · sY/√n ]

H0: µY = µ0 is rejected if µ0 is outside the confidence interval.

Note: The confidence interval is based on the random sample Y1, ..., Yn. Thus, the confidence interval is random itself. The parameter µY of the population is not random, but we do not know it.

confint(lm(testscr ~ 1))
               2.5 %  97.5 %
(Intercept) 652.3291 655.984
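A sketch of the same computation in Python, using the large-n critical value 1.96 and hypothetical data (confint above uses the exact t quantile, so results differ slightly for small n):

```python
from statistics import mean, stdev
from math import sqrt

def ci95(y):
    """Approximate 95% confidence interval for the mean,
    using the normal critical value 1.96 (a large-n approximation)."""
    se = stdev(y) / sqrt(len(y))
    return mean(y) - 1.96 * se, mean(y) + 1.96 * se

y = [652.0, 648.5, 660.2, 655.1, 657.9, 649.3]
lo, hi = ci95(y)
print(lo, hi)  # the sample mean lies in the middle of the interval
```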

2.7 An alternative: The Bayesian Approach

    P(θ|X) = P(θ) · P(X|θ) / P(X)

i.e. posterior = prior · likelihood / P(X).

Here we use a numerical approximation to calculate the Bayesian posterior distribution for the mean of testscr. We employ the Gibbs sampler jags (which is similar to Bugs).

The first lines specify the stochastic process (y[i] ~ dnorm(mu,tau)), the next lines specify the priors. Here we use uninformed priors; mu ~ dnorm(0,.0001) means that mu could take almost any value, since the precision of the normal distribution (0.0001) is very small.

library(runjags)
modelX <- "model {
  for (i in 1:length(y)) {
    y[i] ~ dnorm(mu,tau)
  }
  mu  ~ dnorm(0,.0001)
  tau ~ dgamma(.01,.01)
  sd <- sqrt(1/tau)
}"
bayesX <- run.jags(model=modelX,data=list(y=testscr),monitor=c("mu","sd"))

STATISTICAL THEORY

25

## Compiling rjags model and adapting for 1000 iterations...

Calling the simulation using the rjags method...
Burning in the model for 4000 iterations...
Running the model for 10000 iterations...
Simulation complete
Calculating the Gelman-Rubin statistic for 2 variables....
The Gelman-Rubin statistic is below 1.05 for all parameters
Finished running the simulation


## Digression: an uninformed prior for μ, mu ~ dnorm(0,.0001)

precision<-.0001
x<-seq(-10000,10000,500)
xyplot(pnorm(x,0,1/precision) ~ x,type="l",xlab="$\\mu$",ylab="$F(\\mu)$")

[Figure: CDF F(μ) of the prior, essentially flat between −10000 and 10000]

The prior distribution for μ (i.e. dnorm(0,.0001)) assigns (more or less) the same a priori probability to any reasonable value of μ.

## Digression: an uninformed prior for σ = 1/√τ, tau ~ dgamma(.01,.01)

s <- 10^seq(-1,4.5,.1)
x <- 1-pgamma(1/s^2,.01,.01)
xyplot(x ~ s, scales=list(x=list(log=T)),
       xscale.components = xscale.components.fractions,
       xlab="$\\sigma$", ylab="$F(\\sigma)$")


[Figure: CDF F(σ) of the prior, plotted on a log scale from 1/10 to 10000]

JAGS now gives us a posterior distribution for μ and for the standard deviation sd of testscr.

plot(bayesX,var="mu",type=c("trace","density"))

[Figure: trace and posterior density of mu, roughly between 651 and 658]

plot(bayesX,var="sd",type=c("trace","density"))

[Figure: trace and posterior density of sd, roughly between 17 and 22]
summary(bayesX)

Iterations = 5001:15000
Thinning interval = 1
Number of chains = 2
Sample size per chain = 10000

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

     Mean     SD Naive SE Time-series SE
mu 654.10 0.9329 0.006597       0.006387
sd  19.09 0.6630 0.004688       0.004688

2. Quantiles for each variable:

     2.5%    25%    50%    75%  97.5%
mu 652.28 653.46 654.11 654.73 655.93
sd  17.84  18.64  19.07  19.53  20.44

Comparison with the frequentist approach: the credible interval, which can be obtained from the last line of the summary, is very similar to the confidence interval from the frequentist approach.


Credible interval:
    2.5%    97.5%
652.2768 655.9267

Confidence interval:
               2.5 %  97.5 %
(Intercept) 652.3291 655.984

Also the estimated mean and its standard deviation are very similar to the mean and the standard error of the mean from the frequentist approach.
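A credible interval is read directly off the posterior draws as central quantiles. A minimal sketch in Python, with a handful of made-up draws instead of real JAGS output:

```python
# a few hypothetical posterior draws, sorted
draws = sorted([652.1, 653.0, 653.8, 654.2, 654.5, 655.0, 655.6, 656.2])

def quantile(xs, q):
    # simple nearest-rank quantile on a sorted list
    i = min(int(q * len(xs)), len(xs) - 1)
    return xs[i]

# 95% credible interval: the central 95% of the posterior draws
ci = (quantile(draws, 0.025), quantile(draws, 0.975))
```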
Priors: uninformed / mildly informed / informed. Are uninformed priors reasonable?

Example:
You measure the eye colour of your fellow students. You sample 5 students and
they all have blue eyes.
100% of your sample has blue eyes. You have no variance. How many of the
remaining students will have blue eyes? Can you give a confidence interval?
Informed priors. Above we used (similar to the frequentist approach) an uninformed prior. Here we will assume that we already know something. Actually, we will pretend that we already did a similar study. That study gave us results of similar precision but with a different mean. Here we pretend that our prior distribution for μ is dnorm(664,1). Everything else remains the same.
library(runjags)
modelI <- 'model {
  for (i in 1:length(y)) {
    y[i] ~ dnorm(mu,tau)
  }
  mu  ~ dnorm(664,1)
  tau ~ dgamma(.01,.01)
}'
bayesI<-run.jags(model=modelI,data=list(y=testscr),monitor="mu")
Compiling rjags model and adapting for 1000 iterations...
Calling the simulation using the rjags method...
Burning in the model for 4000 iterations...
Running the model for 10000 iterations...
Simulation complete
Calculating the Gelman-Rubin statistic for 1 variables....
The Gelman-Rubin statistic is below 1.05 for all parameters
Finished running the simulation


plot(bayesI,var="mu",type=c("trace","density"))

[Figure: trace and posterior density of mu, roughly between 657 and 661]

summary(bayesI)

Iterations = 5001:15000
Thinning interval = 1
Number of chains = 2
Sample size per chain = 10000

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

    Mean     SD Naive SE Time-series SE
mu 658.9 0.7124 0.005037       0.005344

2. Quantiles for each variable:

    2.5%   25%   50%   75% 97.5%
mu 657.5 658.4 658.9 659.4 660.3

We see that the informed prior has shifted the posterior away from the previous
results. The new results are now somewhere between the ones we got with an
uninformed prior and the new prior.
Comparison: frequentist versus Bayesian approach

Frequentist: null hypothesis significance testing (Ronald A. Fisher, Statistical Methods for Research Workers, 1925, p. 43)


X is random, θ is fixed.

- Confidence intervals and p-values are easy to calculate.
- Interpretation of confidence intervals and p-values is awkward.
- p-values depend on the intention of the researcher.
- We can test null hypotheses (but where do these null hypotheses come from?).
- Not good at accumulating knowledge.
- More restrictive modelling.

Bayesian: (Thomas Bayes, 1702-1761; Metropolis et al., Equation of State Calculations by Fast Computing Machines, Journal of Chemical Physics, 1953.)

X is fixed, θ is random.

- Requires more computational effort.
- Credible intervals are easier to interpret.
- Can work with uninformed priors (similar results as with frequentist statistics).
- Efficient at accumulating knowledge.
- Flexible modelling.
Most people are still used to the frequentist approach. Although the Bayesian approach may have clear advantages, it is important that we are able to understand research that is done in the context of the frequentist approach.
How the intention of the researcher affects p-values. Example: multiple testing (this is not the only example). Assume a researcher obtains the following confidence intervals for different groups (H0: μ = 0):


[Figure: 95% confidence intervals for groups 1 to 20; the interval of group 16 does not contain 0]
A researcher who a priori only suspects group 16 to have μ ≠ 0 will find (correctly) a significant effect. A researcher who does not have this a priori hypothesis, but who just carries out 20 independent tests, must correct for multiple testing and will find no significant effect. After all, it is not surprising to find in 5% of all samples a 95% confidence interval which does not include the null-hypothetical value.

## 2.8 Summary

Having started from these assumptions:

- single random samples of a population (Y1 , . . . , Yn are i.i.d.)
- E(Y⁴) < ∞
- the sample is large (n is large)

we can

- estimate (sampling distribution of Ȳ)
- test hypotheses (Ȳ is t-distributed and approximately normally distributed; this allows us to calculate the p-value)
- calculate confidence intervals

Do these assumptions make sense?


## 2.9 Exercises

1. Revision I

   In the following task we will refresh some basic concepts. You have the following data about children's age (a) and the pocket money (pm) they receive from their parents, for children in elementary school.

   age in years (a): 6, 7, 6, 7, 8, 8, 9, 10, 9, 10

  (Intercept)          str
6.569925e-242 2.783307e-06

## We find that the two values are slightly different.

The following two statements are equivalent:

- The 95% confidence interval does not contain zero.
- The hypothesis β1 = 0 is rejected at the 5% level of significance.

[Figure: densities strCIdist and strH0dist of β̂1, roughly between −4 and −1]

## 3.8 Bayesian Regression

Of course, we use the Bayesian approach also in the context of regressions. All we
have to do is adjust the model from section 2.7 slightly.


modelR <- 'model {
  for (i in 1:length(y)) {
    y[i] ~ dnorm(beta0 + beta1*x[i],tau)
  }
  beta0 ~ dunif(0,1200)
  beta1 ~ dnorm(0,.0001)
  tau   ~ dgamma(.01,.01)
}'
bayesR<-run.jags(model=modelR,data=list(y=testscr,x=str),
monitor=c("beta0","beta1"))
Compiling rjags model and adapting for 1000 iterations...
Calling the simulation using the rjags method...
Burning in the model for 4000 iterations...
Running the model for 10000 iterations...
Simulation complete
Calculating the Gelman-Rubin statistic for 2 variables....
Convergence may have failed for this run for 2 parameters after 10000
iterations (multi-variate psrf = 1.164)
Finished running the simulation

JAGS returns a distribution for β1:

plot(bayesR,var="beta1",type=c("trace","density"))

[Figure: trace and posterior density of beta1, roughly between −4 and −1]

summary(bayesR)

Iterations = 5001:15000
Thinning interval = 1
Number of chains = 2
Sample size per chain = 10000

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

         Mean     SD Naive SE Time-series SE
beta0 700.611 9.9421 0.070301        1.10111
beta1  -2.365 0.5038 0.003563        0.05573

2. Quantiles for each variable:

         2.5%     25%     50%     75%   97.5%
beta0 682.074 693.408 700.372 707.782 719.225
beta1  -3.305  -2.732  -2.353  -1.999  -1.423

## Comparison with frequentist approach:

coef(est1)
(Intercept)         str
 698.932952   -2.279808

sqrt(diag(vcov(est1)))
(Intercept)         str
  9.4674914   0.4798256

confint(est1)
                2.5 %     97.5 %
(Intercept) 680.32313 717.542779
str          -3.22298  -1.336637

As in section 2.7 we see that the credible intervals are similar to the frequentist confidence intervals. The interpretation, however, is quite different. The credible intervals make a direct statement about the probability that β1 is in a certain interval. Confidence intervals make a much more indirect statement, which is harder to interpret.

## 3.9 Reporting estimation results

The summary table is not very concise:


summary(lm(testscr ~ str))

Call:
lm(formula = testscr ~ str)

Residuals:
    Min      1Q  Median      3Q     Max
-47.727 -14.251   0.483  12.822  48.540

Coefficients:
            Estimate Std. Error t value   Pr(>|t|)
(Intercept) 698.9330     9.4675  73.825    < 2e-16 ***
str          -2.2798     0.4798  -4.751 0.00000278 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 18.58 on 418 degrees of freedom
F-statistic: 22.58 on 1 and 418 DF, p-value: 0.000002783

Standard errors are often shown in parentheses below the estimated coefficients:

testscr = 698.933 - 2.2798 · str
         (9.4675)   (0.4798)

R² = 0.05, SER = 18.58

- The estimated regression line is testscr = 698.933 - 2.2798 · str.
- The standard error of β̂0 is 9.4675; the standard error of β̂1 is 0.4798.
- R² = 0.05; the standard error of the residuals is SER = 18.58.

These are almost all of the numbers we need to perform a hypothesis test and calculate confidence intervals.

Continuous:
- gross domestic product
- income in Euro
- str

Nominal / discrete:
- sex
- profession
- sector of a firm
- income in categories

Binary variables (dummy variables) are a special case of nominal variables:
- sex: male/female
- income higher than 40 000 Euro per year: Yes/No
- unemployed: Yes/No
- university degree: Yes/No

In

testscr = β0 + β1 str + u

we used a continuous independent variable str. But what if we only had binary data for str?

large = 1 if str > 20, 0 else

Now estimate

testscr = β0 + β1 large + u

large <- Caschool$str>20
est <- lm(testscr ~ large)
summary(est)

Call:
lm(formula = testscr ~ large)

Residuals:
    Min      1Q  Median      3Q     Max
-50.435 -14.071  -0.285  12.778  49.565

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  657.185      1.202  546.62  < 2e-16 ***
largeTRUE     -7.185      1.852   -3.88 0.000121 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 18.74 on 418 degrees of freedom
Multiple R-squared: 0.03476, Adjusted R-squared: 0.03245
F-statistic: 15.05 on 1 and 418 DF, p-value: 0.0001215

[Figure: testscr plotted against the continuous regressor str (left) and against the binary regressor large (right)]

In general (when X is a binary / dummy variable):

$$Y_i = \beta_0 + \beta_1 X_i + u_i$$

Interpretation:

- If Xi = 0: Yi = β0 + ui. The mean value is Ȳ = β̂0, i.e. E(Yi | Xi = 0) = β0.
- If Xi = 1: Yi = β0 + β1 + ui. The mean value is Ȳ = β̂0 + β̂1, i.e. E(Yi | Xi = 1) = β0 + β1.

β1 = E(Yi | Xi = 1) − E(Yi | Xi = 0) is the difference between the mean values of the two groups of the population.

t.test performs a t-test to compare two mean values. tapply applies a function (here the mean value mean and the standard deviation sd) to individual groups of a data set. Here, these groups are described through the variable large.

est1 <- lm(testscr ~ large)
summary(est1)

Call:
lm(formula = testscr ~ large)

Residuals:
    Min      1Q  Median      3Q     Max
-50.435 -14.071  -0.285  12.778  49.565

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  657.185      1.202  546.62  < 2e-16 ***
largeTRUE     -7.185      1.852   -3.88 0.000121 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 18.74 on 418 degrees of freedom
Multiple R-squared: 0.03476, Adjusted R-squared: 0.03245
F-statistic: 15.05 on 1 and 418 DF, p-value: 0.0001215

confint(est1)
                 2.5 %     97.5 %
(Intercept) 654.82130 659.547833
largeTRUE   -10.82554  -3.544715

t.test(testscr ~ large)

        Welch Two Sample t-test

data:  testscr by large
t = 3.9231, df = 393.721, p-value = 0.0001031
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
  3.584445 10.785813
sample estimates:
mean in group FALSE  mean in group TRUE
           657.1846            649.9994

tapply(testscr,large,mean)
   FALSE     TRUE
657.1846 649.9994

tapply(testscr,large,sd)
   FALSE     TRUE
19.28629 17.96589

It does not matter whether we use a Student t-test to compare the mean values of two groups, or whether we calculate a regression with a single binary variable. A regression can be useful if we want to control for additional regressors.

## 3.11 Heteroscedastic and homoscedastic error terms

Recall the three OLS assumptions:

1. E(ui | Xi = x) = 0
2. (Xi , Yi ) are i.i.d.
3. Large outliers in X and Y are rare (the fourth moments of X and Y exist)

Now we add an additional assumption:

4. var(u | X = x) is constant, i.e. u is homoscedastic.

runif generates a vector of uniformly distributed (pseudo-) random numbers. Here, we need this vector to simulate an estimation model.
x <- runif(1000)
u <- rnorm(1000)
y <- 10 - 3.1*x + u
plot(y ~ x)

[Figure: scatter plot of y against x for 0 ≤ x ≤ 1]

est <- lm(y ~ x)
plot(est,which=1:2)

[Figure: "Residuals vs Fitted" and "Normal Q-Q" diagnostic plots for est]

In the following example the error terms are no longer independent of x:

u2 <- rnorm(1000)*x
y2 <- 10 - 3.1*x + u2
plot(y2 ~ x)

[Figure: scatter plot of y2 against x; the spread of y2 grows with x]

est2 <- lm(y2 ~ x)
plot(est2,which=1:2)

[Figure: "Residuals vs Fitted" and "Normal Q-Q" diagnostic plots for est2]

In both examples it is true that E(ui | Xi = x) = 0. In the first example u is homoscedastic, in the second example u is heteroscedastic.

## 3.11.1 An example from labour market economics

wage: weekly wages for US male workers from the Current Population Survey 1988
educ: years of education

data(uswages,package="faraway")
plot(wage ~ educ,data=uswages)
plot(lm(wage ~ educ,data=uswages),which=1:2)

[Figure: wage against educ, and the corresponding "Residuals vs Fitted" and "Normal Q-Q" plots]

## 3.11.2 Back to Caschool

data(Caschool,package="Ecdat")
attach(Caschool)
est <- lm(testscr ~ str)
plot(testscr ~ str)
plot(est,which=1:2)

[Figure: testscr against str, and the corresponding "Residuals vs Fitted" and "Normal Q-Q" plots]

## 3.11.3 What do we get from homoscedasticity?

OLS has the smallest variance of all estimators which are linear in Y (Gauss-Markov theorem). It is easier to calculate var(β̂1).

Recall: for independent X and Y,

$$\mathrm{var}(X\,Y) = (E(X))^2\,\mathrm{var}(Y) + (E(Y))^2\,\mathrm{var}(X) + \mathrm{var}(X)\,\mathrm{var}(Y)$$

Hence

$$\mathrm{var}(\hat\beta_1) = \frac{1}{n}\,\frac{\mathrm{var}\bigl((X_i-\mu_X)u_i\bigr)}{\sigma_X^4}
= \frac{1}{n}\,\frac{\overbrace{(E(X_i-\mu_X))^2\,\mathrm{var}(u_i)}^{0} + \overbrace{(E(u_i))^2\,\mathrm{var}(X_i-\mu_X)}^{0} + \mathrm{var}(X_i-\mu_X)\,\mathrm{var}(u_i)}{\sigma_X^4}
= \frac{1}{n}\,\frac{\sigma_X^2\,\sigma_u^2}{\sigma_X^4} = \frac{1}{n}\,\frac{\sigma_u^2}{\sigma_X^2}$$

We can see that var(β̂1) decreases when var(X) increases.

Above we assumed homoscedasticity:

$$\hat\sigma_{\hat\beta_1} = \sqrt{\frac{1}{n}\cdot\frac{\frac{1}{n-2}\sum_{i=1}^n \hat u_i^2}{\frac{1}{n}\sum_{i=1}^n (X_i-\bar X)^2}}$$

This formula for the standard deviation of β̂1 is the standard setting of statistical software and often the only choice we have in office software. But if we do not assume constant var(u | X = x), then we have:

$$\hat\sigma^2_{\hat\beta_1} = \frac{1}{n}\,\frac{\mathrm{var}\bigl((X_i-\bar X)\,\hat u_i\bigr)}{\hat\sigma_X^4}$$

What if (Xi − X̄) and var(ui) are not independent?

Homoscedasticity:

$$\hat\sigma_{\hat\beta_1} = \sqrt{\frac{1}{n}\cdot\frac{\frac{1}{n-2}\sum_{i=1}^n \hat u_i^2}{\frac{1}{n}\sum_{i=1}^n (X_i-\bar X)^2}}$$

Heteroscedasticity (always correct):

$$\hat\sigma_{\hat\beta_1} = \sqrt{\frac{1}{n}\cdot\frac{\frac{1}{n-2}\sum_{i=1}^n (X_i-\bar X)^2\,\hat u_i^2}{\left(\frac{1}{n}\sum_{i=1}^n (X_i-\bar X)^2\right)^2}}$$

The formula for the case of homoscedastic error terms is simpler, but it is only correct if the assumption of homoscedastic error terms is actually satisfied. Since the formulas are different, we usually get different results. Homoscedasticity is the standard setting of the software (if not the only possible setting). Typically, it will give us a smaller standard error for β̂1 than the setting for heteroscedasticity.

hccm calculates the variance-covariance matrix for β̂ under the assumption of heteroscedastic residuals.

est <- lm(testscr ~ str)
summary(est)

Call:
lm(formula = testscr ~ str)

Residuals:
    Min      1Q  Median      3Q     Max
-47.727 -14.251   0.483  12.822  48.540

Coefficients:
            Estimate Std. Error t value   Pr(>|t|)
(Intercept) 698.9330     9.4675  73.825    < 2e-16 ***
str          -2.2798     0.4798  -4.751 0.00000278 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 18.58 on 418 degrees of freedom
Multiple R-squared: 0.05124, Adjusted R-squared: 0.04897
F-statistic: 22.58 on 1 and 418 DF, p-value: 0.000002783

By default, R uses homoscedastic standard errors for p-values and confidence intervals. To obtain the summary with heteroscedasticity-robust standard errors, we use summaryR from the package tonymisc.

library(tonymisc)
summaryR(est)

Call:
lm(formula = testscr ~ str)

Residuals:
    Min      1Q  Median      3Q     Max
-47.727 -14.251   0.483  12.822  48.540

Coefficients:
            Estimate Std. Error t value  Pr(>|t|)
(Intercept) 698.9330    10.4605  66.816   < 2e-16 ***
str          -2.2798     0.5244  -4.348 0.0000173 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 18.58 on 418 degrees of freedom
Multiple R-squared: 0.05124, Adjusted R-squared: 0.04897
F-statistic: 18.9 on 1 and 418 DF, p-value: 0.00001729

R can extract the homoscedastic and the heteroscedastic standard errors:

sqrt(vcov(est))
            (Intercept)       str
(Intercept)    9.467491       NaN
str                 NaN 0.4798256

sqrt(hccm(est))
            (Intercept)       str
(Intercept)    10.46053       NaN
str                 NaN 0.5243585

## 3.11.4 Summary Homo-/Heteroscedasticity

If the data is homoscedastic and we assume heteroscedasticity, we are in the clear (if the data is homoscedastic then both formulas produce the same result for large n). If the data is heteroscedastic and we assume homoscedasticity, we get wrong standard errors. (The estimator is not consistent in this case.) We should therefore always use standard errors that are robust to heteroscedasticity.

What we know about OLS:

- OLS is unbiased
- OLS is consistent
- we can calculate confidence intervals for β̂
- we can test hypotheses about β̂

A large amount of econometric analysis is presented in the form of OLS. One reason for this is that many people understand how OLS works. Whenever we use a different estimator we run the risk of not being understood by others. Is that enough of an explanation to use OLS? Are there better estimators?
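The two standard-error formulas for β̂1 (the classical one and a White/HC0-style robust one, here sketched without the n/(n−2) small-sample correction) differ only in how the squared residuals enter. A pure-Python sketch on made-up data:

```python
import math

# hypothetical data, not from any real data set
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.7]
n = len(xs)
xbar = sum(xs) / n
ybar = sum(ys) / n
sxx = sum((x - xbar) ** 2 for x in xs)

# OLS estimates
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
b0 = ybar - b1 * xbar
resid = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# classical SE: assumes var(u|x) is constant
se_homo = math.sqrt((sum(e * e for e in resid) / (n - 2)) / sxx)

# robust (HC0) SE: weights squared residuals by (x - xbar)^2
se_hc0 = math.sqrt(sum(((x - xbar) * e) ** 2 for x, e in zip(xs, resid))) / sxx
```

On homoscedastic data the two numbers agree for large n; on heteroscedastic data only the robust version is reliable.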
Estimators with a lower variance? To answer this question we will make additional assumptions.

## 3.12 Extended OLS assumptions

1. E(ui | Xi = x) = 0
2. (Xi , Yi ) are i.i.d.
3. Large outliers in X and Y are rare (the fourth moments of X and Y exist)
4. var(u | X = x) is constant, u is homoscedastic
5. u is normally distributed, u ~ N(0, σ²)

Assumptions 4 and 5 are more restrictive; they are warranted less often.

Gauss-Markov: Assuming 1-4, β̂1 has the smallest variance of all linear estimators (of all estimators which are linear functions of Y).

Efficiency of OLS-II: Assuming 1-5, β̂1 has the smallest variance of all consistent estimators, as n → ∞ (regardless of whether the estimators are linear or non-linear).

## 3.13 OLS problems

Gauss-Markov:

- The assumptions of the Gauss-Markov theorem (homoscedasticity) are often not fulfilled.
- The result is only valid for linear estimators. But linear estimators represent only a small share of all possible estimators.

"Smallest variance of all consistent estimators" requires homoscedastic, normally distributed residuals; more often than not this is not plausible.

Outliers: OLS is more sensitive to outliers than many other estimators. Recall the discussion about estimating the mean value: the median is less sensitive to outliers than the sample mean. We can do similar things when we estimate linear equations:

$$\text{OLS:}\quad \min_{b_0,b_1} \sum_{i=1}^{n} \bigl(Y_i - (b_0 + b_1 X_i)\bigr)^2 \qquad\qquad \text{LAD:}\quad \min_{b_0,b_1} \sum_{i=1}^{n} \bigl|Y_i - (b_0 + b_1 X_i)\bigr|$$

However, OLS is used in most use cases; we will do the same thing here.

## 3.13.1 Alternatives to OLS

- Identifying and eliminating outliers
- Quantile regression
- Robust regression

The following dataset shows the relation between income and food expenditure.
library(quantreg)
data(engel)
attach(engel)

[Figure: foodexp against income; observation 138 stands out]

The estimation result depends on the inclusion or exclusion of observation 138:

lm(foodexp ~ income)

Call:
lm(formula = foodexp ~ income)

Coefficients:
(Intercept)       income
   147.4754       0.4852

lm(foodexp ~ income,data=engel[-138,])

Call:
lm(formula = foodexp ~ income, data = engel[-138, ])

Coefficients:
(Intercept)      income
    91.3330      0.5465

plot(foodexp ~ income)
text(engel[138,1],engel[138,2],138,pos=2)
est <- lm(foodexp ~ income)
abline(est)
abline(lm(foodexp ~ income,data=engel[-138,]),lty=2)
legend("bottomright",c("all","138 dropped"),lty=1:2,cex=.5)
plot(est,which=2)

[Figure: the regression lines with and without observation 138, and the "Normal Q-Q" plot of est]

Is observation 138 an outlier?

## 3.13.2 Robust regression

Until now we have been minimizing least squares:

$$\sum \bigl(y_i - (\beta_0 + \beta_1 x_i)\bigr)^2$$

More generally, we minimize the sum of any function ρ:

$$\sum \rho\bigl(y_i - (\beta_0 + \beta_1 x_i)\bigr)$$

where

1. ρ(x) = x² gives OLS,
2. ρ(x) = |x| gives LAD (quantile regression),
3. Huber's method:

$$\rho(x) = \begin{cases} x^2/2 & \text{if } |x| \leq c \\ c|x| - c^2/2 & \text{else} \end{cases}$$

where c is an estimated value for σu.

[Figure: the functions ρ(x) for OLS, LAD, and Huber]

rq performs a quantile regression, minimizing the sum of the absolute values of the residuals. rlm performs a robust regression.

LAD:

library(quantreg)
summary(rq(foodexp ~ income))

Call: rq(formula = foodexp ~ income)

tau: [1] 0.5

Coefficients:
            coefficients lower bd upper bd
(Intercept) 81.48225     53.25915 114.01156
income       0.56018      0.48702   0.60199

Huber:

library(MASS)
summary(rlm(foodexp ~ income))

Call: rlm(formula = foodexp ~ income)

Residuals:
     Min       1Q   Median       3Q      Max
-933.748  -54.995    4.768   53.714  418.020

Coefficients:
            Value   Std. Error t value
(Intercept) 99.4319 12.1244     8.2010
income       0.5368  0.0109    49.1797

Residual standard error: 81.45 on 233 degrees of freedom

plot(foodexp ~ income)
abline(lm(foodexp ~ income))
abline(rq(foodexp ~ income),lty=2)
abline(rlm(foodexp ~ income),lty=3)
legend("bottomright",c("OLS","LAD","Huber"),lty=1:3)

[Figure: foodexp against income with the OLS, LAD, and Huber regression lines]

## 3.13.3 A Bayesian approach to robust regression

Of course, there is also a Bayesian approach to outliers. The idea is as follows: usually, we assume that our dependent variable follows a normal distribution. Let us estimate this as follows (this is still the non-robust approach):

modelR <- 'model {
  for (i in 1:length(y)) {
    y[i] ~ dnorm(beta0 + beta1*x[i],tau)
  }
  beta0 ~ dnorm(0,.0001)
  beta1 ~ dnorm(0,.0001)
  tau   ~ dgamma(.01,.01)
}'
bayesR <- run.jags(model=modelR,data=list(y=foodexp,x=income),
                   monitor=c("beta0","beta1"))

Compiling rjags model and adapting for 1000 iterations...
Calling the simulation using the rjags method...
Burning in the model for 4000 iterations...
Running the model for 10000 iterations...
Simulation complete
Calculating the Gelman-Rubin statistic for 2 variables....
The Gelman-Rubin statistic is below 1.05 for all parameters
Finished running the simulation

plot(bayesR,var="beta1",type=c("trace","density"),newwindows=FALSE)

[Figure: trace and posterior density of beta1, roughly between 0.44 and 0.55]

summary(bayesR)

Iterations = 5001:15000
Thinning interval = 1
Number of chains = 2
Sample size per chain = 10000

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

         Mean       SD  Naive SE Time-series SE
beta0 143.872 15.91473 0.1125341      0.3198132
beta1   0.488  0.01429 0.0001011      0.0002814

2.
Quantiles for each variable:

          2.5%      25%     50%      75%    97.5%
beta0 112.6957 133.1380 143.882 154.6010 174.6390
beta1   0.4602   0.4785   0.488   0.4976   0.5164

So far our results are very similar to the (non-robust) OLS results from above.

The normal distribution does not put much weight on its tails. In other words, observations (outliers) which are several standard deviations away from the expected value are very unlikely. When we use the normal distribution above we implicitly shift the estimator closer to these observations, so that the distance between posterior estimate and observation becomes smaller.

A distribution which may (but need not) put more weight on its tails, and which still contains the normal distribution as a special case, is the t-distribution. If the degrees of freedom are large, the t-distribution is very close to the normal distribution. If the degrees of freedom are small, the t-distribution has very fat tails.

[Figure: densities of the t-distribution with 20 and with 1 degree of freedom]

Bayesian, k = 20. Let us start with a large value for the degrees of freedom:

modelRR20 <- 'model {
  for (i in 1:length(y)) {
    y[i] ~ dt(beta0 + beta1*x[i],tau,20)
  }
  beta0 ~ dnorm(0,.0001)
  beta1 ~ dnorm(0,.0001)
  tau   ~ dgamma(.01,.01)
}'
set.seed(123)
bayesRR20 <- run.jags(model=modelRR20,data=list(y=foodexp,x=income),
                      monitor=c("beta0","beta1"))

Compiling rjags model and adapting for 1000 iterations...
Calling the simulation using the rjags method...
Burning in the model for 4000 iterations...
Running the model for 10000 iterations...
Simulation complete
Calculating the Gelman-Rubin statistic for 2 variables....
The Gelman-Rubin statistic is below 1.05 for all parameters
Finished running the simulation

summary(bayesRR20)

Iterations = 5001:15000
Thinning interval = 1
Number of chains = 2
Sample size per chain = 10000

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

         Mean       SD  Naive SE Time-series SE
beta0 97.8757 15.84981 0.1120751      0.4986725
beta1  0.5389  0.01606 0.0001136      0.0004964

2. Quantiles for each variable:

         2.5%     25%    50%      75%    97.5%
beta0 66.5581 87.4478 97.828 108.3922 129.4566
beta1  0.5067  0.5284  0.539   0.5495   0.5705

Bayesian, k = 1. Let us compare this with a small value for the degrees of freedom:

modelRR1 <- 'model {
  for (i in 1:length(y)) {
    y[i] ~ dt(beta0 + beta1*x[i],tau,1)
  }
  beta0 ~ dnorm(0,.0001)
  beta1 ~ dnorm(0,.0001)
  tau   ~ dgamma(.01,.01)
}'
bayesRR1 <- run.jags(model=modelRR1,data=list(y=foodexp,x=income),
                     monitor=c("beta0","beta1"))

Compiling rjags model and adapting for 1000 iterations...
Calling the simulation using the rjags method...
Burning in the model for 4000 iterations...
Running the model for 10000 iterations...
Simulation complete
Calculating the Gelman-Rubin statistic for 2 variables....
The Gelman-Rubin statistic is below 1.05 for all parameters
Finished running the simulation

summary(bayesRR1)

Iterations = 5001:15000
Thinning interval = 1
Number of chains = 2
Sample size per chain = 10000

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

         Mean       SD  Naive SE Time-series SE
beta0 62.3640 14.47629 0.1023629      0.5464357
beta1  0.5892  0.01818 0.0001286      0.0006682

2. Quantiles for each variable:

         2.5%     25%     50%    75%   97.5%
beta0 35.2204 52.2198 62.0430 72.192 90.9461
beta1  0.5533  0.5767  0.5897  0.602  0.6234

Bayesian, more robust: let us next endogenise the degrees of freedom k:

modelRR <- 'model {
  for (i in 1:length(y)) {
    y[i] ~ dt(beta0 + beta1*x[i],tau,k)
  }
  beta0 ~ dnorm(0,.0001)
  beta1 ~ dnorm(0,.0001)
  tau   ~ dgamma(.01,.01)
  k     ~ dexp(1/30)
}'
bayesRR <- run.jags(model=modelRR,data=list(y=foodexp,x=income),
                    monitor=c("beta0","beta1","k"))

Compiling rjags model and adapting for 1000 iterations...
Calling the simulation using the rjags method...
Burning in the model for 4000 iterations...
Running the model for 10000 iterations...
Simulation complete
Calculating the Gelman-Rubin statistic for 3 variables....
The Gelman-Rubin statistic is below 1.05 for all parameters
Finished running the simulation

Digression: the exponential distribution:

k <- 10^(seq(-.5,2,.1))
xyplot(pexp(k,1/30) ~ k,type="l",scales=list(x=list(log=T)),
       xscale.components = xscale.components.fractions)

[Figure: CDF pexp(k,1/30), plotted on a log scale from about 0.3 to 100]

A t-distribution with 20 degrees of freedom is very similar to a normal distribution. If our prior for k follows dexp(1/30), then the probability for k > 20 is (slightly) larger than 1/2, i.e. we are giving the traditional model (of an almost normal distribution) a very good chance.

plot(bayesRR,var="k",type=c("trace","density"))

[Figure: trace and posterior density of k, roughly between 2 and 7]

plot(bayesRR,var="beta1",type=c("trace","density"))

[Figure: trace and posterior density of beta1, roughly between 0.50 and 0.62]

summary(bayesRR)

Iterations = 5001:15000
Thinning interval = 1
Number of chains = 2
Sample size per chain = 10000

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

         Mean       SD  Naive SE Time-series SE
beta0 79.4320 15.32741 0.1083812      0.5138090
beta1  0.5622  0.01785 0.0001262      0.0006043
k      3.5468  0.86702 0.0061308      0.0131400

2. Quantiles for each variable:

         2.5%     25%     50%    75%    97.5%
beta0 48.6728 69.2156 79.5799 89.814 108.5751
beta1  0.5284  0.5498  0.5619  0.574   0.5984
k      2.2482  2.9405  3.4172  4.001   5.6210

We see that our estimate for the slope (based on the t-distribution) is larger, similar to the LAD and Huber regressions.
Different from LAD and Huber, the regression itself has chosen the optimal way (i.e. the optimal value of k) to accommodate outliers.

## 3.14 Exercises

1. Regressions I
   - Define the following items: regression, independent variable, dependent variable.
   - Give the formula for a linear regression with a single regressor. What does β1 indicate? What does u indicate?

2. Regressions II: use the data set Crime of the library Ecdat in R.
   - How do you interpret positive (negative) coefficients in simple linear models?
   - What is the influence of the number of policemen per capita (polpc) on the crime rate in crimes committed per person (crmrte)? Interpret your result. Do you have an explanation for this result?
   - How can we visualize regressions? Draw the respective graph in R.
   - What is the correlation coefficient of the number of policemen per capita (polpc) and the crime rate (crmrte)? Interpret the result.
   - What do the standard errors tell you? What does R² indicate? What do the p-values indicate?

3. Regressions III: list and explain the assumptions that have to be fulfilled to be able to use OLS.

4. Classes of variables: explain and give examples of the following types of variables: continuous, discrete, binary.

5. Dummies: use the data set BudgetFood of the library Ecdat in R.
   - What is a dummy variable? Which values can it take?
   - Name some typical examples of variables which are often coded as dummy variables.
   - You have heard that older people care more about the quality of the food they eat and thus spend more on food than younger people. Test whether Spanish citizens from the age of 30 on spend a higher percentage of their available income on food (wfood) than younger ones do. Do you have an explanation for these findings?
   - Interpret the result of your regression. What does β1 specify in this case?
   - Could you apply a different test for the same question? Use this test in R and compare the results.
Check the relation of age (age) and percentage of income spent on food (wfood) with a graph. 6. Reading data into R Create a data file with the data on pocket money and age that we used in chapters 1 and 2. Use the headers age and pm and save it in the format .csv under the name pocketmoney. c Oliver Kirchkamp c Oliver Kirchkamp 78 ## 6 February 2015 09:49:38 Read the file pocketmoney into R. Draw a scatter plot with age on the x-axis and pm on the y-axis. Draw the same scatter plot, this time without box plots and without the lowess spline, but with a linear regression line. Label your scatter plot with age on the x-axis and pocket money on the y-axis. Give your graph the title Childrens pocket money. 7. Exam 28.7.2007, exercise 5a+d Your task is to work on a hypothetical data set in R. The variable names A, B, C, D, E, and Year are in the header of your data file file.csv. The data set contains 553 observations in the format .csv (comma separated values). Explain what the following commands do and choose the correct one (with explanation). First, read your data set into R. ## a) daten = read.csv(file.csv, header=YES, sep=;) ## b) daten = read.csv(file.csv, header=TRUE, sep=;) c) daten = read.table(file.csv) d) daten = read.table(file.csv, header=YES, sep=,) Further, you would like to know the correlation between the variables B, C, and D. How can you find this out? a) corr(B,C, D) b) corr(daten) c) cor(daten) d) corr(B,C, D) 8. Using and generating dummy variables Use the data set Fatality of the library Ecdat in R. What is the data set about? ## Which of the variables are nominal variables? Which of the variables are discrete variables? ## Which of the variables are continuous variables? Which of the variables are dummy variables? Create a dummy variable which takes the value 1 if the size of the sales floor is > 120 sqm and 0 otherwise. 
## LINEAR REGRESSION WITH A SINGLE REGRESSOR 79 Draw a graph with separate box plots for large and small sales floors on the sales per square meter. Measure the influence of the size of the sales floor on the sales per square meter. Do the same task as above, this time using your variable for large sales floors. 9. Heteroscedasticity What is heteroscedasticity? Give an example for data where residual variances of differ along the dimension of a second variable. What is homoscedasticity? ## Which advantage does homoscedasticity have for econometric analysis? Imagine you had data with heteroscedastic error terms. You perform a data analysis under the assumption of homoscedasticity. Is your estimator consistent? Now you have data with homoscedastic error terms and you perform a data analysis under the assumption of heteroscedasticity. Is your estimator consistent? How would the answers to the last two questions be if you had a large sample? 10. Advantages and disadvantages of OLS What are the advantages and disadvantages of OLS? ## What can you do to fix these problems in OLS estimations? 11. Prices for houses Use the data set Housing of the library Ecdat in R. Draw a scatter plot on the the lot size and the price of a house. Look at the graph. From your visual impression, would you say that the error terms are homo- or heteroscedastic? What does the data set contain? Look at the variables the data set contains. Formulate some sensible hypotheses and test them in R. 12. Wages Use the data set Wages1 of the library Ecdat in R. c Oliver Kirchkamp c Oliver Kirchkamp 80 ## 6 February 2015 09:49:38 What does the data set contain? Do you think that gender (sex) matters when it comes to wages (wage)? Check your assumption in R. Are years of education received (school) and gender correlated with each other? Do you think that experience (exper) or years of schooling matter more for wage? Check your assumption in R. Which tests could you use to test it? 
Do employees with a college education (more than 12 years of education) earn more than those without? Test this in R. Which type of variable do you use to answer this question? Do you think that our models above are well specified? How would you change the model if you could? ## 4 Models with more than one independent variable (multiple regression) testsrc = 1 str + 0 + u testscr test score str student / teacher ratio How can we include more than one factor at the same time? Keep one factor constant by only looking at a small group (e.g. all students with a very similar elpct (english learner percentage)) The subset option of the command lm limits the estimation to a certain part of the dataset. data(Caschool) attach(Caschool) summary(elpct) Min. 1st Qu. 0.000 1.941 Median 8.778 ## Mean 3rd Qu. 15.770 22.970 Max. 85.540 ## lm(testscr ~ str ,subset=(elpct<9)) Call: lm(formula = testscr ~ str, subset = (elpct < 9)) Coefficients: (Intercept) 680.252 str -0.835 81 ## lm(testscr ~ str ,subset=(elpct>=9 & elpct<23)) Call: lm(formula = testscr ~ str, subset = (elpct >= 9 & elpct < 23)) Coefficients: (Intercept) 696.445 str -2.231 ## lm(testscr ~ str ,subset=(elpct>=23)) Call: lm(formula = testscr ~ str, subset = (elpct >= 23)) Coefficients: (Intercept) 653.0746 str -0.8656 [0,9] (9,23] (23,100] 700 testscr 680 660 640 620 14 16 18 20 22 24 str ## depending on elpct the estimated relationships are very different. extend the regression model testsrc = 1 str + 2 elpct + 0 + u 26 c Oliver Kirchkamp ## 4 MODELS WITH MORE THAN ONE INDEPENDENT VARIABLE (MULTIPLE REGRESSION) c Oliver Kirchkamp 82 ## 6 February 2015 09:49:38 generally: y = 0 + 1 x1 + 2 x2 + + k xk + u for every observation: 0 + 1 x11 + 2 x12 + + k x1k + u1 y1 y2 y3 = = .. . 
## 0 + 1 x21 + 2 x22 + + k x2k + u2 0 + 1 x31 + 2 x32 + + k x3k + u3 yn ## 0 + 1 xn1 + 2 xn2 + + k xnk + un ## lm(testscr ~ str + elpct) Call: lm(formula = testscr ~ str + elpct) Coefficients: (Intercept) 686.0322 str -1.1013 elpct -0.6498 ## 4.1 Matrix notation 4.1.1 How to do calculations with matrices y = X + u y= y1 y2 y3 .. . yn X= 1 x11 1 x21 1 x31 .. . x12 x22 x32 1 xn1 xn2 x1k x2k x3k .. .. . . xnk ; = 0 1 2 .. . k ; u = u1 u2 u3 .. . un Addition ## a11 a12 a13 b11 b12 b13 a11 + b11 a12 + b12 a13 + b13 a21 a22 a23 + b21 b22 b23 = a21 + b21 a22 + b22 a23 + b23 b31 b32 b33 a31 + b31 a32 + b32 a33 + b33 a31 a32 a33 | {z } | {z } | {z } nm Multiplication nm nm a21 a22 {z nm b12 a23 b22 b32 } | {z mk ## 4.1.2 Calculations with matrices in R P m = i=1 a2i bi2 } | {z nk 83 We define vectors with c(...). We can then stack vectors horizontally or vertially with rbind(...) or cbind(...). A <- rbind(c(1,2,3),c(4,5,6)) [,1] [,2] [,3] [1,] 1 2 3 [2,] 4 5 6 B <- cbind(c(2,2,2),c(3,3,3)) [1,] [2,] [3,] [,1] [,2] 2 3 2 3 2 3 ## For the transpose of a matrix we use t(...) t(B) [1,] [2,] ## [,1] [,2] [,3] 2 2 2 3 3 3 + adds matrices elementwise. This requires that the matrices have the same rank. In this example we can not calculate A+B but we can calculate A+t(B). A + t(B) [1,] [2,] ## [,1] [,2] [,3] 3 4 5 7 8 9 * multiplies the elements of a matrix. This is not the usual matrix multiplication: A * t(B) [1,] [2,] ## [,1] [,2] [,3] 2 4 6 12 15 18 c Oliver Kirchkamp ## 4 MODELS WITH MORE THAN ONE INDEPENDENT VARIABLE (MULTIPLE REGRESSION) c Oliver Kirchkamp 84 ## 6 February 2015 09:49:38 A [,1] [,2] [,3] [1,] 1 2 3 [2,] 4 5 6 B [1,] [2,] [3,] [,1] [,2] 2 3 2 3 2 3 ## %*% performs the usual matrix multiplication: A %*% B [1,] [2,] [,1] [,2] 12 18 30 45 ## 4.2 Deriving the OLS estimator in matrix notation y1 y= y1 y2 y3 .. . = .. . X= ## 0 + 1 x11 + 2 x12 + + k x1k + u1 y = X + u 1 x11 1 x21 1 x31 .. . x12 x22 x32 yn 1 xn1 xn2 y = X + u: Now, the residuals are x1k x2k x3k .. .. 
. . xnk ; = 0 1 2 .. . k ; u = u = y X The sum of squares of the residuals S() = n X u2i = u u (y X) (y X) y y y X X y + X X y y 2 X y + X X i=1 u1 u2 u3 .. . un 85 (recall: (AB) = B A ) To minimize S(), we take the first derivative with respect to : S() ! = 2X y + 2X X = 0 ^ = X y X X Normal equations: (7) ## Now, X X is a (k + 1) (k + 1) matrix. If this matrix is nonsingular, we calculate its inverse. ^ (X X)1 X X ^ = = (X X)1 X y (X X)1 X y | {z } X+ X+ = (X X)1 X as an exercise: show that XX+ X = X X+ XX+ = X+ (XX+ ) = XX+ X+ X = I Call ^ ^ = X y ^ = yy ^ u orthogonality ^ = X y X X(X X)1 X y = 0 ^ = X (y y ^ ) = X y X X X u ^ X u ^ u ^= ^=0 y ^ = X y with ^ yields Multiplying the normal equation X X ^ X X ^= ^ X y c Oliver Kirchkamp ## 4 MODELS WITH MORE THAN ONE INDEPENDENT VARIABLE (MULTIPLE REGRESSION) c Oliver Kirchkamp 86 ## 6 February 2015 09:49:38 then ^ u ^ u = = = = = = ^ (y X) ^ (y X) ^ X y + ^ X X ^ y y 2 ^ X X ^+ ^ X X ^ y y 2 ^ X X ^ y y ^ X ^ y y (X) ^ y ^ y y y ## Quadratic analysis (variance analysis) ^ y ^+u ^ u ^ y y = y In case of an inhomogeneous regression: y = X + u y1 1 x11 x12 x1k y2 1 x21 x22 x2k y3 y= ; X = 1 x31 x32 x3k .. .. .. .. . . . . 1 xn1 xn2 xnk yn 1 1 1 ^ y ^ u ^+ u ^ y y= y n n n 1 1 1 ^ y ^ u ^ y 2 + u ^ y y y 2 = y n n n TSS 2 2 2 ESS sy = sy ^ + su ^ |{z} |{z} |{z} SSR SSR TSS ESS R = s2y ^ s2y ; = 0 1 2 .. . k ; u = ## total sum of squares explained sum of squares sum of squares of residuals s2u = 1 2^ sy ## 4.3 Sp ecification errors What can happen if we forget to include a variable into our model? Let us take another look at our simple estimation equation: testsrc = 1 str + 0 + u u1 u2 u3 .. . un 87 ## testscr test score str student / teacher ratio What else could have an influence on testscr? corr. with influence regressor on dep. var. 
str testscr percent of English learners x x time of day of the test x parking lot space per student x If we do not include a variable in our estimation equation, but this variable is correlated with the regressor and has an influence on the dependent variable our estimation for is biased (omitted variable bias). The assumption E(ui |Xi ) = 0 is no longer satisfied 4.3.1 Examples: Classical music intelligence of children (Rauscher, Shaw, Ky; Nature; 1993) (missing variable: income) French paradox: Red wine, foie gras less illnesses of the coronary blood vessels (Samuel Black, 1819) (missing variable: percentage of fish and sugar in the diet,. . . ) ## Storks in Lower Saxony birth rate (missing variable: industrialisation) c Oliver Kirchkamp ## 4 MODELS WITH MORE THAN ONE INDEPENDENT VARIABLE (MULTIPLE REGRESSION) ## 6 February 2015 09:49:38 18 1966 1965 1953 1963 1962 1961 1960 16 1967 1958 1954 1956 1957 1955 1959 1972 1968 1971 1969 1975 1973 1970 14 birth rate 1964 12 1974 1976 1977 50 100 150 200 ## freq. of nesting storks 10 ## Gabriel, K. R. and Odoroff, C. L. (1990) Biplots in biomedical research. Statistics in Medicine 9(5): pp. 469-485. UnitedStates Australia NewZealand Scotland Canada England Sweden Norway Ireland Belgium Netherlands Denmark Austria Germany Mortality Italy Switzerland c Oliver Kirchkamp 88 France 20 40 60 Wine [l/year] Mortality due to coronary heart disease (per 1000 men, 55 64 years). St. Leger A.S., Cochrane, A.L. and Moore, F. (1979). Factors Associated with Cardiac Mor- 89 ## tality in Developed Countries with Particular Reference to the Consumption of Wine, Lancet: 10171020. 4.3.2 Specification errors generalization Let the true model be y = X 1 1 + X2 2 + u what happens if we forget to include X2 into the specification of our model? 
^ = (X X)1 X y | {z } X+ b1 (X1 X1 )1 X1 y (X1 X1 )1 X1 (X1 1 + X2 2 + u) ## (X1 X1 )1 X1 X1 1 + (X1 X1 )1 X1 X2 2 + (X1 X1 )1 X1 u E(b1 ) = 1 + (X1 X1 )1 X1 X2 2 ## Hence, E(b1 ) = 1 only if 2 = 0 or X1 X2 = 0, i.e. X1 and X2 are orthogonal Specification errors another example ## The correct model: Y = 0 + 1 X1 + 2 X2 + u X2 is correlated with X1 , e.g.: X2 = X1 + Omitting X2 means: Y = 0 + 1 X1 + 2 (X1 + ) + u = 0 + (1 + 2 )X1 + 2 + u We overestimate 1 if 2 > 0 We underestimate 1 if 2 < 0 c Oliver Kirchkamp ## 4 MODELS WITH MORE THAN ONE INDEPENDENT VARIABLE (MULTIPLE REGRESSION) c Oliver Kirchkamp 90 ## 6 February 2015 09:49:38 ## 4.4 Assumptions for the multiple regression model 1. E(ui |Xi = x) = 0 2. (Xi , Yi ) are i.i.d. 3. Large outliers in X and Y are rare (the fourth moments of X and Y exist) 4. X has the same rank as its number of columns (no multicollinearity) 5. var(u|X = x) is constant, u is homoscedastic 6. u is normally distributed u N(0, 2 ) ## 4.5 The distribution of the OLS estimator in a multiple regression ^ 0 and ^ 1 are unbi model with a single regressor: the OLS estimators for ^ 0 and ^ 1 are normally ased and consistent estimators. For large samples distributed. multiple regression: under the assumptions 14 (given above) the OLS esti^ = (X X)1 X y is unbiased and consistent. For large samples ^ is mator jointly normally distributed. 4.6 Multicollinearity Example testscr = 1 str + 2 elpct + 0 Now, we extend the model by adding another variable: Ratio of English learners FracEL=elpct/100: testscr = 1 str + 2 elpct + 3 FracEL + 0 FracEL<-elpct/100 ## summary(lm(testscr ~ str + elpct)) Call: lm(formula = testscr ~ str + elpct) Residuals: Min 1Q -48.845 -10.240 Median -0.308 3Q 9.815 Max 43.461 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 686.03225 7.41131 92.566 < 2e-16 *** str -1.10130 0.38028 -2.896 0.00398 ** elpct -0.64978 0.03934 -16.516 < 2e-16 *** --Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 
0.1 1 Residual standard error: 14.46 on 417 degrees of freedom Multiple R-squared: 0.4264,Adjusted R-squared: 0.4237 F-statistic: 155 on 2 and 417 DF, p-value: < 2.2e-16 ## summary(lm(testscr ~ str + elpct + FracEL)) Call: lm(formula = testscr ~ str + elpct + FracEL) Residuals: Min 1Q -48.845 -10.240 Median -0.308 3Q 9.815 Max 43.461 ## Coefficients: (1 not defined because of singularities) Estimate Std. Error t value Pr(>|t|) (Intercept) 686.03225 7.41131 92.566 < 2e-16 *** str -1.10130 0.38028 -2.896 0.00398 ** elpct -0.64978 0.03934 -16.516 < 2e-16 *** FracEL NA NA NA NA --Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1 Residual standard error: 14.46 on 417 degrees of freedom Multiple R-squared: 0.4264,Adjusted R-squared: 0.4237 F-statistic: 155 on 2 and 417 DF, p-value: < 2.2e-16 FracEL<-elpct/100+rnorm(4)*.0000001 summary(lm(testscr ~ str + elpct + FracEL)) Call: lm(formula = testscr ~ str + elpct + FracEL) Residuals: Min 1Q -48.608 -10.063 Coefficients: Median -0.152 3Q 9.613 Max 43.857 91 c Oliver Kirchkamp ## 4 MODELS WITH MORE THAN ONE INDEPENDENT VARIABLE (MULTIPLE REGRESSION) c Oliver Kirchkamp 92 ## 6 February 2015 09:49:38 Estimate Std. Error t value Pr(>|t|) (Intercept) 685.9887 7.4172 92.486 < 2e-16 *** str -1.1040 0.3806 -2.901 0.00392 ** elpct -53520.1940 87264.2711 -0.613 0.54001 FracEL 5351954.3231 8726426.9534 0.613 0.54001 --Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1 Residual standard error: 14.48 on 416 degrees of freedom Multiple R-squared: 0.4269,Adjusted R-squared: 0.4228 F-statistic: 103.3 on 3 and 416 DF, p-value: < 2.2e-16 We notice that R detects the multicollinearity on its own. It simplifies the model accordingly. But this does not always work. We slightly perturb the variable. This can happen accidentally (e.g. through rounding errors). Multicollinearity between the variables is no longer perfect. We get the same result for all coefficients, but any (ever so slight) perturbation changes the result considerably. 
The standard errors get very large. elpct[1:5]/100 [1] 0.00000000 0.04583333 0.30000002 0.00000000 0.13857677 elpct[1:5]/100+rnorm(4)*.0000001 [1] [5] 0.00000001292877 0.13857678752594 0.04583350642929 0.30000006516511 -0.00000012650612 elpct[1:5]/100+rnorm(4)*.0000001 [1] -0.00000006868529 [5] 0.13857670591188 0.04583329035659 0.30000014148167 0.00000003598138 elpct[1:5]/100+rnorm(4)*.0000001 [1] 0.00000004007715 0.04583334599106 0.29999996348937 0.00000017869131 [5] 0.13857681467431 ## perturbedEstimate <- function (x) { FracEL <- elpct/100+rnorm(4)*.0000001 est <- lm(testscr ~ str + elpct + FracEL) coef(est)[3:4] } perturbedEstimate(1) elpct FracEL -53520.19 5351954.32 93 perturbedEstimate(1) elpct FracEL 56766.73 -5676738.56 perturbedEstimate(1) elpct FracEL -64347.34 6434669.12 ## estList <- sapply(1:100,perturbedEstimate) plot(t(estList),main="multicollinearity, estimated coefficients") -10000000 -30000000 FracEL 10000000 ## multicollinearity, estimated coefficients -200000 -100000 100000 200000 300000 elpct Large coefficients for elpct are balanced by small coefficients for FracEL. What happened? What is the true relationship? testscr = 686.0322 1.1013str 0.6498elpct FracEL = elpct/100 testscr = 686.0322 1.1013 str + (a 0.6498) elpct 100a elpct/100 testscr = 686.0322 1.1013 str + (a 0.6498) elpct 100a FracEL coefficients cannot be identified anymore. c Oliver Kirchkamp ## 4 MODELS WITH MORE THAN ONE INDEPENDENT VARIABLE (MULTIPLE REGRESSION) c Oliver Kirchkamp 94 ## 6 February 2015 09:49:38 4.6.1 Example 2 Dummy variable assumes the value 1 if str>12 (group is not very small) NVS <- str>12 lm(testscr ~ str + elpct + NVS) Call: lm(formula = testscr ~ str + elpct + NVS) Coefficients: (Intercept) 686.0322 str -1.1013 elpct -0.6498 NVSTRUE NA ## The coefficient of NVS cannot be estimated. Why? table(NVS) NVS TRUE 420 The new variable NVS is always TRUE and, hence, it is perfectly correlated with the constant term. Explanation: There are no groups with str < 12. 
Therefore, we cannot assess the effect of such a small group size. 4.6.2 Example 3 ESpct = 100 elpct ESpct <- 100 - elpct lm(testscr ~ str + elpct + ESpct) Call: lm(formula = testscr ~ str + elpct + ESpct) Coefficients: (Intercept) 686.0322 str -1.1013 elpct -0.6498 ESpct NA ## Again, R detects collinearity. We can perform a small perturbation to get a result. However, this result is not exactly helpful. set.seed(123) perturbedEstimate2 <- function (x) { ESpct <- 100 - elpct +rnorm(4)*.01 est <- lm(testscr ~ str + elpct + ESpct) 95 coef(est)[3:4] } estList <- sapply(1:100,perturbedEstimate2) plot(t(estList),main="multicollinearity 2, estimated coefficients") 100 -300 -100 ESpct 300 ## multicollinearity 2, estimated coefficients -300 -200 -100 100 200 300 elpct ## The true relationship is: testscr = 686.0322 1.1013str 0.6498elpct Now let ESpct = 100 elpct testscr = 686.0322 a 100 1.1013 str (0.6498 + a ) elpct + a (100 elpct) ## testscr = 686.0322 a 100 1.1013 str (0.6498 + a) elpct + a ESpct coefficients cannot be identified anymore. 4.6.3 Which regressor is responsible for the multicollinearity? We perform regressions using each of the k regressors as dependent variables and all other k 1 regressors as independent variables: xi = 0 + 1 x1 + . . . + i1 xi1 + i+1 xi+1 + k xk + u A large R2i is a sign of collinearity. c Oliver Kirchkamp ## 4 MODELS WITH MORE THAN ONE INDEPENDENT VARIABLE (MULTIPLE REGRESSION) c Oliver Kirchkamp 96 ## 6 February 2015 09:49:38 ## We consider the Variance Inflation Factor VIF = 1 1 R2i In the following example we build a(n) (almost) linearly dependent value elpct2. Additionally, we add an obviously pointless regressor to the equation: the number of the school district. 
set.seed(123) elpct2 <- elpct + rnorm(4) est <- lm (testscr ~ str + elpct2 + elpct + as.numeric(district)) summaryR(est) Call: lm(formula = testscr ~ str + elpct2 + elpct + as.numeric(district)) Residuals: Min 1Q -48.328 -10.212 Median -0.168 3Q 9.518 Max 43.872 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 685.454379 8.923996 76.810 <2e-16 *** str -1.096242 0.438310 -2.501 0.0128 * elpct2 0.544466 0.853474 0.638 0.5239 elpct -1.196071 0.857852 -1.394 0.1640 as.numeric(district) 0.001910 0.005937 0.322 0.7478 --Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1 Residual standard error: 14.49 on 415 degrees of freedom Multiple R-squared: 0.4271,Adjusted R-squared: 0.4216 F-statistic: 109 on 4 and 415 DF, p-value: < 2.2e-16 We notice that there are some factors with a high variance. We calculate the variance inflation factor to test for collinearity: library(car) elpct2 <- elpct + rnorm(4) est <- lm (testscr ~ str + elpct2 + elpct + as.numeric(district)) vif(est) str 1.040984 as.numeric(district) 1.008520 elpct2 512.718019 ## vif(lm(testscr ~ str + elpct + mealpct + calwpct)) elpct 512.761866 97 str elpct mealpct calwpct 1.044388 1.962265 3.870485 2.476509 We notice (at least we would if we had not know in advance) that the number of the school district is not significant, but neither is it collinear. The two versions of elpct are collinear. If we remove one, the variance of the other gets smaller. summaryR(lm (testscr ~ str + elpct + as.numeric(district))) group 2 group 3 group 1 constant ## 4.6.4 Multicollinearity of dummy variables 1 1 1 1 1 1 1 0 0 1 0 0 0 1 0 0 1 0 0 0 1 0 0 1 ## This matrix does not have the full column rank ## 4.7 Specification Errors: Summary underspecified model, a regressor 2 is missing: ^ is unbiased, only if 2 = 0 or X X2 = 0. 
1 ## overspecified model, regressors are collinear: ## ^ cannot be estimated (X X cannot be inverted) ## overspecified model, regressors are almost collinear: ^ can only be estimated inexactly ^ The distribution of c Oliver Kirchkamp ## 4 MODELS WITH MORE THAN ONE INDEPENDENT VARIABLE (MULTIPLE REGRESSION) c Oliver Kirchkamp 98 ## 6 February 2015 09:49:38 ^ 4.7.1 The variance of For the simple regression (as a reminder): ^ 2 ^ 1 = n ^ 2 ^ = 1 1 Pn ^ 2i i=1 u n2 1 Pn 2 i=1 (Xi X) n 1 Pn v2 i=1 ^ n2 Homoscedasticity 1  n 1 Pn ## Heteroscedasticity (always correct) 2 2 i=1 Xi X n  (mit ^v = Xi X u ^ i) ## For the multiple regression: ^ 2u (X X)1 ^ ^ = Homoscedasticity ## X Iu2 X(X X)1 ^ ^ = (X X) ## Heteroscedasticity (always correct) ## 4.7.2 Imperfect multicollinearity Where X is almost multicollinear, (X X)1 is very large and the estimation ^ is rather imprecise. for 4.7.3 Hypothesis tests Testing the hypothesis H0 : j = j,0 against H1 : j 6= j,0 : Determine the t statistic: ^ j j,0 t= ^ ^j  The p-value is p = Pr |t| > tsample = 2(|tsample |) F(|t|) |t| F(|t|) 0 |t| 99 ## est <- lm(testscr ~ str + elpct) diag( X) extracts the diagonal of X if X is a matrix. If x is a vector, diag(x) constructs the diagonal matrix. coef extracts the estimated coefficients fromr a model. ^ Homoscedastic standard deviation of ^ 2u (X X)1 ^ ^ = homoscedasticity (stddevh <-sqrt(diag(vcov(est)))) (Intercept) 7.41131248 str 0.38027832 coef(est) / stddevh (Intercept) 92.565554 str -2.896026 elpct 0.03934255 elpct -16.515879 round(2*pnorm(- abs(coef(est)) / (Intercept) 0.00000 str 0.00378 stddevh),5) elpct 0.00000 summary(est) Call: lm(formula = testscr ~ str + elpct) Residuals: Min 1Q -48.845 -10.240 Median -0.308 3Q 9.815 Max 43.461 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 686.03225 7.41131 92.566 < 2e-16 *** str -1.10130 0.38028 -2.896 0.00398 ** elpct -0.64978 0.03934 -16.516 < 2e-16 *** --Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 
0.1 1 Residual standard error: 14.46 on 417 degrees of freedom Multiple R-squared: 0.4264,Adjusted R-squared: 0.4237 F-statistic: 155 on 2 and 417 DF, p-value: < 2.2e-16 c Oliver Kirchkamp ## 4 MODELS WITH MORE THAN ONE INDEPENDENT VARIABLE (MULTIPLE REGRESSION) c Oliver Kirchkamp 100 ## 6 February 2015 09:49:38 summaryR(est) Call: lm(formula = testscr ~ str + elpct) Residuals: Min 1Q -48.845 -10.240 Median -0.308 3Q 9.815 Max 43.461 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 686.0322 8.8122 77.85 <2e-16 *** str -1.1013 0.4371 -2.52 0.0121 * elpct -0.6498 0.0313 -20.76 <2e-16 *** --Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1 Residual standard error: 14.46 on 417 degrees of freedom Multiple R-squared: 0.4264,Adjusted R-squared: 0.4237 F-statistic: 220.1 on 2 and 417 DF, p-value: < 2.2e-16 ^ Heteroscedastic variance-covariance matrix of Now we perform the same steps using heteroscedasticity-consistent standard errors: sqrt(diag(vcov(est))) (Intercept) 7.41131248 str 0.38027832 elpct 0.03934255 ## (stddev <- sqrt(diag(hccm(est)))) (Intercept) 8.81224085 str 0.43706612 coef(est) / stddev (Intercept) 77.849920 str -2.519747 elpct 0.03129693 elpct -20.761681 round(2*pnorm(- abs(coef(est)) / (Intercept) 0.00000 str 0.01174 stddev),5) elpct 0.00000 1 X Iu2 X(X X)1 ^ ^ = (X X) ## heteroscedasticity (always correct) 101 ## 4.7.4 Digression: Multiplication: Inner product of vectors using % % (a0 , a1 , a2 , , ak ) % % + Elementwise product * a1 a2 a3 .. . ak b1 b2 b3 .. . bk b0 b1 b2 .. . bk k X ai bi = i=0 a1 b1 a2 b2 a3 b3 .. . ak bk Outer product A%o%B a1 b0 a1 b1 a1 b2 a1 a2 b0 a2 b1 a2 b2 a2 a3 %o% (b0 , b1 , b2 , , bm ) = a3 b0 a3 b1 a3 b2 .. .. .. .. . . . . ak b0 ak b1 ak b2 ak a1 a1 a1 a2 a2 a2 a3 a3 a3 %o% (+1, 1) = .. .. .. . . . 
ak ak ak ^ confidence interval for qnorm(.975) [1] 1.959964 coef(est) + qnorm(.975) * stddev %o% c(-1,1) [,1] [,2] (Intercept) 668.7605740 703.3039234 str -1.9579298 -0.2446620 elpct -0.7111176 -0.5884359 ## What happens if group size is reduced? (e.g. by 2) a1 bm a2 bm a3 bm .. .. . . ak bm c Oliver Kirchkamp ## 4 MODELS WITH MORE THAN ONE INDEPENDENT VARIABLE (MULTIPLE REGRESSION) c Oliver Kirchkamp 102 ## 6 February 2015 09:49:38 ## -2* (coef(est) + qnorm(.975) * stddev %o% c(-1,1))["str",] ## [1] 3.9158595 0.4893241 ## 4.7.5 Extending the estimation equation by adding expenditure per student (est <- lm(testscr ~ str + elpct)) Call: lm(formula = testscr ~ str + elpct) Coefficients: (Intercept) 686.0322 str -1.1013 elpct -0.6498 ## (stddev <- sqrt(diag(hccm(est)))) (Intercept) 8.81224085 str 0.43706612 coef(est) / stddev (Intercept) 77.849920 str -2.519747 elpct 0.03129693 elpct -20.761681 round(2*pnorm(-abs(coef(est) / (Intercept) 0.00000 str 0.01174 stddev)),5) elpct 0.00000 ## (est <- lm(testscr ~ str + elpct + expnstu)) Call: lm(formula = testscr ~ str + elpct + expnstu) Coefficients: (Intercept) 649.577947 str -0.286399 elpct -0.656023 expnstu 0.003868 ## (stddev <- sqrt(diag(hccm(est)))) (Intercept) 15.668622170 coef(est) / str 0.487512918 stddev elpct 0.032114291 expnstu 0.001607407 (Intercept) 41.4572475 str elpct -0.5874701 -20.4277485 round(2*pnorm(-abs(coef(est) / (Intercept) 0.00000 str 0.55689 103 expnstu 2.4062993 stddev)),5) elpct 0.00000 expnstu 0.01612 Compare the standard error of the coefficient of str in the different estimation equations. sqrt(diag(hccm(lm(testscr ~ str ))))["str"] str 0.5243585 sqrt(diag(hccm(lm(testscr ~ str + elpct))))["str"] str 0.4370661 sqrt(diag(hccm(lm(testscr ~ str + elpct + expnstu))))["str"] str 0.4875129 ## In case of multicolliniearity: increases ^ In case of omitted variable: may decrease ^ ## 4.8 Joint Hypotheses e.g. 
str = 0 and expnstu = 0, or 1 = 0 and 2 = 0 Formally H0 : 1 = 1,0 2 = 2,0 versus H1 : 1 6= 1,0 2 6= 2,0 Idea: We could just test 1 = 0 and 2 = 0 independently of each other. t1 = ^ 1 1,0 ^ ^1 t2 = ^ 2 2,0 ^ ^2 ## In that case the null hypothesis H0 : 1 = 1,0 2 = 2,0 would be rejected if either 1 = 0 or 2 = 0 are rejected. We can easily see that this does not even work for uncorrelated s: c Oliver Kirchkamp ## 4 MODELS WITH MORE THAN ONE INDEPENDENT VARIABLE (MULTIPLE REGRESSION) c Oliver Kirchkamp 104 ## 6 February 2015 09:49:38 set.seed(100) N<-1000 p<-0.05 qcrit<- -qnorm(p/2) b1<-rnorm(N) mean(abs(b1)>qcrit)*100 [1] 5.9 b2<-rnorm(N) mean(abs(b2)>qcrit)*100 [1] 4.6 reject<-abs(b1)>qcrit | abs(b2)>qcrit mean(reject)*100 [1] 10.3 In the example 10.3 % of the values are rejected by the joint test, not 5%. This is not a coincidence. The next diagram shows that we are not only cutting off on the left and on the right, but also at the top and at the bottom. plot(b2 ~ b1,cex=.7) points(b2 ~ b1,subset=reject,col="red",pch=7,cex=.5) abline(v=c(qcrit,-qcrit),h=c(qcrit,-qcrit)) dataEllipse(b1,b2,levels=1-p,plot.points=FALSE) legend("topleft",c("naive rejection","95\\% region"),pch=c(7,NA),col="red",lty=c(NA,1),cex= 105 c Oliver Kirchkamp ## 4 MODELS WITH MORE THAN ONE INDEPENDENT VARIABLE (MULTIPLE REGRESSION) 0 -3 -2 -1 b2 naive rejection 95% region -3 -2 -1 b1 Additionally we can see that this nave approach only takes the maximum deviation of the variables into account. It would be more sensible to exclude all observations outside of the red circle. 
The second problem becomes even more annoying if the random variables are correlated: set.seed(100) b1<-rnorm(N) b2<-.3* rnorm(N) + .7*b1 reject<-abs(b1)>qcrit | abs(b2)>qcrit plot(b2 ~ b1,cex=.5) points(b2 ~ b1,subset=reject,col="red",pch=7,cex=.5) abline(v=c(qcrit,-qcrit),h=c(qcrit,-qcrit)) dataEllipse(b1,b2,levels=1-p,plot.points=FALSE) text(-1,1,"A") legend("topleft",c("naive rejection","95\\% region"),pch=c(7,NA),col="red",lty=c(NA,1),cex=.7) ## 6 February 2015 09:49:38 naive rejection 95% region -2 -1 b2 c Oliver Kirchkamp 106 -3 -2 -1 b1 For example, "A" in the diagram is clearly outside the confidence ellipse, but none of its single coordinates are conspicious. 4.8.1 F statistic for two restrictions t1 = ^ 1 1,0 ^ ^1 t2 = ^ 2 2,0 ^ ^2 t1 t2 t1 t2 1 t21 + t22 2^ F= 2 1 ^2t t 1 2 ## with ^t1 t2 being the estimated correlation between t1 and t2 . If ^t1 t2 = 0:  1 2 2 F= t + t2 2 1 Recall: N(0, 1) p tn 2n /n n X i=1 (N(0, 1)) 2n 2n1 /n1 Fn1 ,n2 2n2 /n2 ## 4.8.2 More than two restrictions Write restrictions as R = r e.g. (0, 1, 0, , 0) 0 1 0 0 0 1 1 0 1 0 1 0 0 1 2 0 1 2 .. . k =0 0 1 2 .. . k 0 1 2 .. . k   = 0 7 = 1  1 ^ ^ r) ^ (R F = (R r) R ^ ^R q with q being the number of restrictions. If assumptions 14 (see 4.4) are satisfied: p F Fq, 107 c Oliver Kirchkamp ## 4 MODELS WITH MORE THAN ONE INDEPENDENT VARIABLE (MULTIPLE REGRESSION) 1.0 ## 6 February 2015 09:49:38 0.6 0.4 0.0 0.2 density 0.8 q=2 q=3 q=5 q = 10 0.2 0.4 0.6 Fq, density ## p = Pr(F(q, ) > Fsample 0.0 c Oliver Kirchkamp 108 Fq, ## 4.8.3 Specials cases: The last line of a regression output 109 ## summaryR(lm(testscr ~ str + elpct + expnstu)) Call: lm(formula = testscr ~ str + elpct + expnstu) Residuals: Min 1Q -51.340 -10.111 Median 0.293 3Q 10.318 Max 43.181 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 649.577947 15.668622 41.457 <2e-16 *** str -0.286399 0.487513 -0.587 0.5572 elpct -0.656023 0.032114 -20.428 <2e-16 *** expnstu 0.003868 0.001607 2.406 0.0166 * --Signif. 
codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1 Residual standard error: 14.35 on 416 degrees of freedom Multiple R-squared: 0.4366,Adjusted R-squared: 0.4325 F-statistic: 144.3 on 3 and 416 DF, p-value: < 2.2e-16 ## Restrictions for the F-statistic of an estimation (0 is not being tested) 1 0 0 0 1 0 0 0 1 .. .. .. . . . 0 0 0 0 1 0 2 0 3 . . .. . .. .. 1 k 0 0 0 .. . 0 ## Testing a single coefficient: 1 0 0 t1 = ^ 1 1,0 ^ ^1 0 1 2 .. . k =0 X t(k) X2 F(k,k) c constructs a vector by joining the arguments together. rbind joins the arguments of the function (vectors, matrices) line-wise. cbind joins the arguments of the function (vectors, matrices) column-wise. linearHypothesis tests linear hypotheses. pf calculated the distribution function of the F-distribution, df c Oliver Kirchkamp ## 4 MODELS WITH MORE THAN ONE INDEPENDENT VARIABLE (MULTIPLE REGRESSION) c Oliver Kirchkamp 110 ## 6 February 2015 09:49:38 calculates the density function, qf calculates quantiles of the F-distribution, rf calculates an F-distributed random variable. ## H0 : str = 0 and expnstu = 0 est <- lm(testscr ~ str + elpct + expnstu) R <- rbind(c(0,1,0,0),c(0,0,0,1)) r <- c(0,0) 0     0 1 0 0 0 1 = 0 0 0 1 2 0 3 linearHypothesis(est, R, r) Linear hypothesis test Hypothesis: str = 0 expnstu = 0 Model 1: restricted model Model 2: testscr ~ str + elpct + expnstu Res.Df RSS Df Sum of Sq F Pr(>F) 1 418 89000 2 416 85700 2 3300.3 8.0101 0.000386 *** --Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1 testh<-linearHypothesis(est, R, r) pf(testh$F[2],2,Inf,lower.tail=FALSE)
[1] 0.0003320828
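The homoscedastic F statistic reported above can also be reproduced by hand from the residual sums of squares of the restricted and unrestricted models, following the formula F = ((SSR_restricted − SSR_unrestricted)/q) / (SSR_unrestricted/(n − k − 1)). A sketch, assuming the Caschool data are still attached as earlier in the chapter:

```r
## Testing H0: str = 0 and expnstu = 0 by comparing residual sums of squares.
restricted   <- lm(testscr ~ elpct)                  # model with both restrictions imposed
unrestricted <- lm(testscr ~ str + elpct + expnstu)
SSRr <- sum(resid(restricted)^2)
SSRu <- sum(resid(unrestricted)^2)
q <- 2                                               # number of restrictions
Fstat <- ((SSRr - SSRu) / q) / (SSRu / df.residual(unrestricted))
Fstat                                                # matches the 8.0101 reported above
pf(Fstat, q, df.residual(unrestricted), lower.tail = FALSE)
```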

linearHypothesis(est, R, r, vcov=hccm)
Linear hypothesis test
Hypothesis:
str = 0
expnstu = 0
Model 1: restricted model
Model 2: testscr ~ str + elpct + expnstu
Note: Coefficient covariance matrix supplied.

  Res.Df Df      F   Pr(>F)
1    418
2    416  2 5.2617 0.005537 **
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
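The robust statistic above can be computed directly from the matrix formula for the F statistic given in section 4.8.2, F = (Rβ̂ − r)′ (R Σ̂ R′)⁻¹ (Rβ̂ − r)/q. A sketch, assuming the car library (for hccm) is loaded and est, R, and r are defined as above:

```r
## Heteroscedasticity-robust Wald/F statistic for H0: R beta = r, by hand.
V  <- hccm(est)                 # robust variance-covariance matrix of beta-hat
Rb <- R %*% coef(est) - r       # deviation of the restrictions from zero
Fstat <- t(Rb) %*% solve(R %*% V %*% t(R)) %*% Rb / nrow(R)
drop(Fstat)                     # matches the 5.2617 reported by linearHypothesis
```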
test<-linearHypothesis(est, R, r, vcov=hccm)
pf(test$F[2],2,Inf,lower.tail=FALSE) [1] 0.005186642 linearHypothesis(est,c("str=0","expnstu=0"),vcov=hccm) Linear hypothesis test Hypothesis: str = 0 expnstu = 0 Model 1: restricted model Model 2: testscr ~ str + elpct + expnstu Note: Coefficient covariance matrix supplied. Res.Df Df F Pr(>F) 1 418 2 416 2 5.2617 0.005537 ** --Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1 ## 4.8.4 Special case: Homoscedastic error terms Recall: In the case of heteroscedasticity: 1  1 ^ ^ r) ^ F = (R r) R (R ^ ^R q in the case of homoscedasticity if we test 1 = 2 = . . . = k = 0: F= n k 1 SSRrestricted SSRunrestricted q SSRunrestricted 111 c Oliver Kirchkamp ## 4 MODELS WITH MORE THAN ONE INDEPENDENT VARIABLE (MULTIPLE REGRESSION) c Oliver Kirchkamp 112 ## 6 February 2015 09:49:38 Pn u ^ 2i of the restricted model Pi=1 n ^ 2i of the unrestricted model i=1 u number of regressors of the unrestricted model number of restrictions SSRrestricted SSRunrestricted k q Recall: R = s2y ^ s2y s2u SSR = 1 2^ = 1 TSS sy ## divide numerator and denominator of F by TSS: n k 1 R2unrestricted R2restricted F= q 1 R2unrestricted F is distributed according to Fq,nk1 ## 4.9 Restrictions with more than one coefficient e.g. in RetSchool we estimate wage76 ~ grade76 + age76 ## + black + daded + momed ## Hypothesis: daded = momed  1st approach (F-test): R = 0 0 0 0 1 1 , r = 0 2nd Approach (t-test, rearranging the equation) y = = 0 + 1 X1 + 2 X2 + u 0 + (1 2 )X1 + 2 (X2 + X 1) + u data(RetSchool,package="Ecdat") attach(RetSchool) summary(wage76) Min. 1st Qu. 0.000 1.377 Median 1.683 ## Mean 3rd Qu. 1.658 1.957 Max. 
3.180 NAs 2147 table(grade76) grade76 0 1 3 2 16 17 539 182 2 2 18 264 3 4 4 6 5 13 6 22 ## est <- lm(wage76 ~ grade76 + age76 7 42 8 90 9 92 10 148 11 12 194 1213 ## + black + daded + momed) 13 332 14 314 15 209 113 summaryR(est) Call: lm(formula = wage76 ~ grade76 + age76 + black + daded + momed) Residuals: Min 1Q -1.75969 -0.25153 Median 0.02054 3Q 0.25961 Max 1.36709 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 0.002560 0.078653 0.033 0.9740 grade76 0.039486 0.003060 12.902 <2e-16 *** age76 0.039229 0.002317 16.930 <2e-16 *** black -0.218286 0.017866 -12.218 <2e-16 *** daded 0.000465 0.002732 0.170 0.8648 momed 0.007247 0.003009 2.408 0.0161 * --Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1 Residual standard error: 0.3894 on 3053 degrees of freedom (2166 observations deleted due to missingness) Multiple R-squared: 0.2274,Adjusted R-squared: 0.2261 F-statistic: 177.9 on 5 and 3053 DF, p-value: < 2.2e-16 Although the coefficient of momed is significantly different from zero and the coefficient of daded is not, they are not significantly different from each other: linearHypothesis(est,c("daded=momed"),vcov=hccm) Linear hypothesis test Hypothesis: daded - momed = 0 Model 1: restricted model Model 2: wage76 ~ grade76 + age76 + black + daded + momed Note: Coefficient covariance matrix supplied. 1 2 Res.Df Df F Pr(>F) 3054 3053 1 1.9809 0.1594 alternatively: momdaded <- momed+daded est2<-lm(wage76 ~ grade76 + age76 + black + momed + momdaded) linearHypothesis(est2,"momed=0",vcov=hccm) Linear hypothesis test c Oliver Kirchkamp ## 4 MODELS WITH MORE THAN ONE INDEPENDENT VARIABLE (MULTIPLE REGRESSION) c Oliver Kirchkamp 114 ## 6 February 2015 09:49:38 Hypothesis: momed = 0 Model 1: restricted model Model 2: wage76 ~ grade76 + age76 + black + momed + momdaded Note: Coefficient covariance matrix supplied. 
1 2 Res.Df Df F Pr(>F) 3054 3053 1 1.9809 0.1594 or even simpler: summaryR(lm(wage76 ~ grade76 + age76 ## + black + momed + momdaded) Call: lm(formula = wage76 ~ grade76 + age76 + black + momed + momdaded) Residuals: Min 1Q -1.75969 -0.25153 Median 0.02054 3Q 0.25961 Max 1.36709 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 0.002560 0.078653 0.033 0.974 grade76 0.039486 0.003060 12.902 <2e-16 *** age76 0.039229 0.002317 16.930 <2e-16 *** black -0.218286 0.017866 -12.218 <2e-16 *** momed 0.006782 0.004818 1.407 0.159 momdaded 0.000465 0.002732 0.170 0.865 --Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1 Residual standard error: 0.3894 on 3053 degrees of freedom (2166 observations deleted due to missingness) Multiple R-squared: 0.2274,Adjusted R-squared: 0.2261 F-statistic: 177.9 on 5 and 3053 DF, p-value: < 2.2e-16 ## confidence.ellipse draws the confidence area for coefficients of a linear model. confidence.ellipse(est,c("daded","momed"),levels=c(.9,.95,.975,.99)) abline(v=0,h=0,a=0,b=1) 0.010 0.005 0.000 momed coefficient 0.015 -0.005 0.000 daded coefficient linearHypothesis(est,c("daded=0","momed=0"),vcov=hccm) Linear hypothesis test Hypothesis: daded = 0 momed = 0 Model 1: restricted model Model 2: wage76 ~ grade76 + age76 + black + daded + momed Note: Coefficient covariance matrix supplied. Res.Df Df F Pr(>F) 1 3055 2 3053 2 3.6955 0.02495 * --Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1 linearHypothesis(est,c("daded=0","momed=0.01"),vcov=hccm) Linear hypothesis test Hypothesis: daded = 0 momed = 0.01 0.005 115 c Oliver Kirchkamp ## 4 MODELS WITH MORE THAN ONE INDEPENDENT VARIABLE (MULTIPLE REGRESSION) c Oliver Kirchkamp 116 ## 6 February 2015 09:49:38 ## Model 1: restricted model Model 2: wage76 ~ grade76 + age76 + black + daded + momed Note: Coefficient covariance matrix supplied. 
1 2 Res.Df Df F Pr(>F) 3055 3053 2 0.4433 0.642 linearHypothesis(est,c("daded=0.01","momed=0"),vcov=hccm) Linear hypothesis test Hypothesis: daded = 0.01 momed = 0 Model 1: restricted model Model 2: wage76 ~ grade76 + age76 + black + daded + momed Note: Coefficient covariance matrix supplied. Res.Df Df F Pr(>F) 1 3055 2 3053 2 6.6741 0.001282 ** --Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1 ## Bayesian. . . One possibility to study the difference between the education of dads and moms on wages in a Bayesian framework would be to look at the difference of the two coefficients. In the following we estimate the same regression as above, but concentrate on the difference between momed and daded. mData<-with(est$model,list(wage=wage76,grade=grade76,age=age76,black=black,
modelR<-model {
  for (i in 1:length(wage)) {
    wage[i] ~ dnorm(beta0 + bGrade*grade[i] + bAge*age[i] + bBlack*black[i]
                    + bDad*daded[i] + bMom*momed[i], tau)
  }
  beta0 ~ dnorm (0,.0001)
  bGrade~ dnorm (0,.0001)
  bAge  ~ dnorm (0,.0001)
  bBlack~ dnorm (0,.0001)
  bDad  ~ dnorm (0,.0001)
  bMom  ~ dnorm (0,.0001)
  tau   ~ dgamma(.01,.01)
  MomMinusDad <- bMom - bDad
}
bayesR <- run.jags(model=modelR, data=mData, monitor=c("MomMinusDad"))
Compiling rjags model and adapting for 1000 iterations...
Calling the simulation using the rjags method...
Burning in the model for 4000 iterations...
Running the model for 10000 iterations...
Simulation complete
Calculating the Gelman-Rubin statistic for 1 variables....
The Gelman-Rubin statistic is below 1.05 for all parameters
Finished running the simulation

[Figure: posterior density and trace plot of MomMinusDad]
The credible interval of this difference contains zero.
summary(bayesR)

Iterations = 5001:15000
Thinning interval = 1
Number of chains = 2
Sample size per chain = 10000
1. Empirical mean and standard deviation for each variable,


## plus standard error of the mean:

Mean
SD
Naive SE Time-series SE
0.0001848
2. Quantiles for each variable:
                 2.5%      25%      50%      75%   97.5%
MomMinusDad -0.002547 0.003548 0.006701 0.009747 0.01569

100*mean(unlist(bayesR$mcmc)<0)
[1] 8.105
## How likely is it that the difference MomMinusDad is positive?
100*mean(unlist(bayesR$mcmc)>0)
[1] 91.895

(The first number is about half of the p-value we got for daded=momed above.
Linear hypothesis test
Hypothesis:
daded - momed = 0
Model 1: restricted model
Model 2: wage76 ~ grade76 + age76 + black + daded + momed
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1   3054 463.32
2   3053 463.01  1   0.31293 2.0634  0.151

This is to be expected since above we did a two-sided test, while here the alternative is one-sided.)
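The rearranged-equation approach can be illustrated with simulated data (all variable names here are illustrative, not from the RetSchool data): regressing on $X_1$ and $(X_1+X_2)$ makes the coefficient on $X_1$ equal to $\hat\beta_1 - \hat\beta_2$, so its ordinary t-test tests $\beta_1 = \beta_2$.

```r
## Sketch: testing beta1 = beta2 by rearranging the equation
## (simulated data; names are illustrative)
set.seed(1)
n  <- 500
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 1 + 0.5*x1 + 0.5*x2 + rnorm(n)   # true beta1 = beta2
estA <- lm(y ~ x1 + x2)                # original parametrisation
estB <- lm(y ~ x1 + I(x1 + x2))        # rearranged parametrisation
## the coefficient on x1 in estB is beta1hat - beta2hat from estA:
all.equal(unname(coef(estB)["x1"]),
          unname(coef(estA)["x1"] - coef(estA)["x2"]))  # TRUE
```

The t-statistic on x1 in summary(estB) then directly tests $\beta_1 = \beta_2$.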

## 4.10 Model specification

Model specification: which coefficients to include?
testscr ~ distcod + county + district + grspan + enrltot + teachers
+ calwpct + mealpct + computer + compstu + expnstu + str + avginc
+ elpct + readscr + mathscr (this model contains perhaps too many coefficients → multicollinearity)
testscr ~ str (this model contains perhaps too few coefficients → omitted variable bias)
Omitted variable bias
$$E(b_1) = \beta_1 + (X_1'X_1)^{-1}X_1'X_2\,\beta_2$$
Only when $X_1$ is orthogonal to $X_2$, or $\beta_2$ is zero, do we have no bias.
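The bias formula can be illustrated numerically. A minimal sketch with simulated data (all names illustrative): the coefficient of the short regression converges to $\beta_1 + (X_1'X_1)^{-1}X_1'X_2\,\beta_2$.

```r
## Sketch: omitted variable bias (simulated data; names illustrative)
set.seed(1)
n  <- 100000
x1 <- rnorm(n)
x2 <- 0.5*x1 + rnorm(n)          # x2 is correlated with x1
y  <- 1*x1 + 2*x2 + rnorm(n)     # true beta1 = 1, beta2 = 2
coef(lm(y ~ x1 + x2))[["x1"]]    # close to 1: no bias
coef(lm(y ~ x1))[["x1"]]         # close to 1 + 0.5*2 = 2: biased
```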
Overfitting (multicollinearity)
$$\hat\Sigma_{\hat\beta} = (X'X)^{-1}\,X'\,I\hat\sigma_u^2\,X\,(X'X)^{-1}$$
When $X$ is (almost) collinear, then $(X'X)^{-1}$ is large, and then $\hat\Sigma_{\hat\beta}$ is large, hence our estimates are not precise.
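This can be sketched with simulated data (names illustrative): a regressor that is almost collinear with $X_1$ blows up the standard error of $\hat\beta_1$, while an unrelated regressor leaves it essentially unchanged.

```r
## Sketch: near-collinearity inflates standard errors (simulated data)
set.seed(2)
n   <- 200
x1  <- rnorm(n)
x2a <- rnorm(n)                  # unrelated regressor
x2b <- x1 + 0.01*rnorm(n)        # almost collinear with x1
y   <- 1 + x1 + rnorm(n)
seX1 <- function(est) summary(est)$coefficients["x1", "Std. Error"]
seX1(lm(y ~ x1 + x2a))           # small
seX1(lm(y ~ x1 + x2b))           # roughly 100 times larger
```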

## When coefficients change in an alternative specification, this can be a sign of

omitted variable bias.

## Scaling coefficients: how should we add a new variable to the regression?
coef(lm(testscr ~ str + elpct + expnstu))
  (Intercept)           str         elpct       expnstu
649.577947257  -0.286399240  -0.656022660   0.003867902
elratio <- elpct/100
coef(lm(testscr ~ str + elratio + expnstu))
  (Intercept)           str       elratio       expnstu
649.577947257  -0.286399240 -65.602266008   0.003867902
expnstuTSD <- expnstu/1000
coef(lm(testscr ~ str + elpct + expnstuTSD))
(Intercept)         str       elpct  expnstuTSD
649.5779473  -0.2863992  -0.6560227   3.8679018
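Rescaling a regressor only rescales its coefficient; the fit itself is unchanged. A minimal sketch (simulated data; names illustrative):

```r
## Sketch: rescaling a regressor rescales its coefficient by the same factor
set.seed(42)
x <- rnorm(100, mean = 5000, sd = 1000)     # think: expenditure per student
y <- 650 + 0.004*x + rnorm(100, sd = 10)
b  <- coef(lm(y ~ x))[["x"]]
bK <- coef(lm(y ~ I(x/1000)))[["I(x/1000)"]]
all.equal(bK, 1000*b)                        # TRUE: coefficient scales by 1000
all.equal(fitted(lm(y ~ x)), fitted(lm(y ~ I(x/1000))))  # fit is identical
```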

## What is the benefit of adding another variable?

measure R2
measure contribution to R2


## look at p-value of the t-statistic

look at p-value of the variance analysis
measure AIC
Perhaps it is helpful to control for wealth in the school district. Which variables
could, in the Caschool example, be a good indicator for wealth?
plot(testscr ~ elpct,main="English learner percentage")
plot(testscr ~ mealpct,main="percentage qualifying for reduced price lunch")
plot(testscr ~ calwpct,main="percentage qualifying for income assistance")

[Figure: three scatter plots of testscr against elpct ("English learner percentage"), mealpct ("percentage qualifying for reduced price lunch"), and calwpct ("percentage qualifying for income assistance")]

4.10.1 Measure R2
$$R^2 = 1 - \frac{SSR}{TSS}$$
## $R^2$ only measures the fit of the regression
$R^2$ does not measure causality (e.g. parking lots → testscr)
## $R^2$ does not measure the absence of omitted variable bias
$R^2$ does not measure the correctness of the specification


## 4.10.2 Measure contribution to R2

There are different approaches to do this.
We can take a look at R2 , when the variable we are analysing is the only one in
the model, or when it is the last one, or we can look at all other feasible sequences.
library(relaimpo)
est <- lm(testscr ~ str + elpct + mealpct + calwpct)

calc.relimp(est,type=c("first","last","lmg","pmvd"),rela=TRUE)
Response variable: testscr
Total response variance: 363.0301
Analysis based on 420 observations
4 Regressors:
str elpct mealpct calwpct
Proportion of variance explained by model: 77.49%
Metrics are normalized to sum to 100% (rela=TRUE).
Relative importance metrics:
               lmg         pmvd        last      first
str     0.03119231 0.0148176134 0.059126952 0.03175031
elpct   0.22371548 0.0242703918 0.048159854 0.25708495
mealpct 0.53343971 0.9600101671 0.890678586 0.46768098
calwpct 0.21165250 0.0009018276 0.002034608 0.24348376

Average coefficients for different model sizes:
                1X        2Xs        3Xs         4Xs
str     -2.2798083 -1.4612232 -1.1371224 -1.01435328
elpct   -0.6711562 -0.4347537 -0.2510901 -0.12982189
mealpct -0.6102858 -0.5922408 -0.5645062 -0.52861908
calwpct -1.0426750 -0.5863541 -0.2639020 -0.04785371

## 4.10.3 Information criteria

Analysis of Variance: Instead of looking at the p-value of a coefficient, we compare the variance of the residuals of two different models: model 2 (with the coefficient) and model 1 (without).
$$F_{(k_2-k_1,\,n-k_2)} = \frac{(SSR_1 - SSR_2)/(k_2-k_1)}{SSR_2/(n-k_2)}$$
## We can, hence, use the F statistic to compare two models.

Let L be the log-likelihood of the estimated model.


## One can show that for linear models
$$2L = -n\,\log\frac{SSR}{n} + C$$
But then
$$2\,(L_2 - L_1) = n\,\log\frac{SSR_1}{n} - n\,\log\frac{SSR_2}{n} = n\,\log\frac{SSR_1}{SSR_2} \sim \chi^2_{k_2-k_1}$$
## We can, thus, also use the $\chi^2$ statistic to compare two models.
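For linear models the identity $2(L_2-L_1) = n\log(SSR_1/SSR_2)$ can be checked directly (simulated data; names illustrative):

```r
## Sketch: chi^2 model comparison via the likelihood ratio
set.seed(5)
n  <- 100
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 1 + x1 + 0.3*x2 + rnorm(n)
m1 <- lm(y ~ x1)            # restricted model
m2 <- lm(y ~ x1 + x2)       # unrestricted model
lr  <- as.numeric(2*(logLik(m2) - logLik(m1)))
alt <- n*log(deviance(m1)/deviance(m2))
all.equal(lr, alt)                         # TRUE
pchisq(alt, df = 1, lower.tail = FALSE)    # asymptotic chi^2 p-value
```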

est2 <- lm(testscr ~ str + elpct + mealpct + calwpct)
est1 <- lm(testscr ~ str + mealpct + calwpct)
sum(est2$residuals^2) [1] 34247.46 sum(est1$residuals^2)
[1] 35450.8

## An easier way to obtain the SSR is deviance:

deviance(est2)
[1] 34247.46
L2 <- logLik(est2)
L1 <- logLik(est1)
n <- length(est2$residuals)
k2 <- est2$rank
k1 <- est1$rank pchisq(2 *(L2 - L1),k2-k1,lower=FALSE) log Lik. 0.0001398651 (df=6) pchisq(n * log(RSS1 / RSS2),k2-k1,lower=FALSE) [1] 0.0001398651 anova(est1,est2,test="Chisq") Analysis of Variance Table Model 1: testscr ~ str + mealpct + calwpct Model 2: testscr ~ str + elpct + mealpct + calwpct Res.Df RSS Df Sum of Sq Pr(>Chi) 1 416 35451 2 415 34247 --Signif. codes: ## 1203.3 0.0001342 *** ## 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1 ## pf((RSS1 - RSS2)/RSS2 * (n-k2)/(k2-k1),k2-k1,n-k2,lower=FALSE) [1] 0.0001547027 anova(est1,est2) Analysis of Variance Table Model 1: testscr ~ str + mealpct + calwpct Model 2: testscr ~ str + elpct + mealpct + calwpct Res.Df RSS Df Sum of Sq F Pr(>F) 1 416 35451 2 415 34247 1 1203.3 14.582 0.0001547 *** --Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1 ## F-test for ANOVA summary(est2) Call: lm(formula = testscr ~ str + elpct + mealpct + calwpct) Residuals: Min 1Q -32.179 -5.239 Median -0.185 3Q 5.171 Max 31.308 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 700.39184 4.69797 149.084 < 2e-16 *** str -1.01435 0.23974 -4.231 0.0000286 *** elpct -0.12982 0.03400 -3.819 0.000155 *** mealpct -0.52862 0.03219 -16.422 < 2e-16 *** calwpct -0.04785 0.06097 -0.785 0.432974 123 c Oliver Kirchkamp ## 4 MODELS WITH MORE THAN ONE INDEPENDENT VARIABLE (MULTIPLE REGRESSION) c Oliver Kirchkamp 124 ## 6 February 2015 09:49:38 --Signif. codes: ## 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1 ## Residual standard error: 9.084 on 415 degrees of freedom Multiple R-squared: 0.7749,Adjusted R-squared: 0.7727 F-statistic: 357.1 on 4 and 415 DF, p-value: < 2.2e-16 ## Comparing two models is different from testing one coefficient Information criteria use a similar approach Goal: Find a model that explains the data well but has as few parameters as possible. (prevent overfitting) Let L be the log-likelihood of the estimated model. Hirotsugo Akaike (1971): An Information Criterion: AIC = 2 L + 2 k Gideon E. 
Schwarz (1978): Bayesian Information Criterion BIC = 2 L + k log n est <- lm(testscr ~ str + elpct + mealpct + calwpct + enrltot) extractAIC(est) [1] 6.000 1859.498 ## step now looks automatically for a model with a good AIC step(est) Start: AIC=1859.5 testscr ~ str + elpct + mealpct + calwpct + enrltot Df Sum of Sq - calwpct 1 70.1 - enrltot 1 78.9 <none> - elpct 1 1262.6 - str 1 1552.8 - mealpct 1 20702.3 RSS 34239 34247 34169 35431 35721 54871 AIC 1858.4 1858.5 1859.5 1872.7 1876.2 2056.4 Step: AIC=1858.36 testscr ~ str + elpct + mealpct + enrltot Df Sum of Sq RSS AIC - enrltot 1 60 34298 1857.1 <none> 34239 1858.4 - elpct 1 1208 35446 1870.9 - str - mealpct 1 1 ## 1496 35734 1874.3 51150 85388 2240.2 Step: AIC=1857.09 testscr ~ str + elpct + mealpct Df Sum of Sq <none> - elpct - str - mealpct 1 1 1 RSS 34298 1167 35465 1441 35740 52947 87245 AIC 1857.1 1869.1 1872.4 2247.2 Call: lm(formula = testscr ~ str + elpct + mealpct) Coefficients: (Intercept) 700.1500 str -0.9983 elpct -0.1216 mealpct -0.5473 ## Why does the AIC make sense within sample prediction ## including more variables always improve the likelihood ## out of sample prediction ## including more variables may decrease the likelihood set.seed(123) N<-nrow(Caschool) mySamp<-sample(1:N,N/2) CaIn<-Caschool[mySamp,] CaOut<-Caschool[-mySamp,] est <-lm(testscr ~ str + elpct + mealpct + calwpct + enrltot,data=CaIn) estSm<-lm(testscr ~ str + elpct + mealpct + enrltot,data=CaIn) deviance(est) [1] 15996.78 deviance(estSm) [1] 16128.62 ## within sample a smaller model must have a larger deviance. Now do the same out of sample: sum((CaOut$testscr-predict(est,newdata=CaOut))^2)
[1] 18658.43


sum((CaOut$testscr-predict(estSm,newdata=CaOut))^2) [1] 18513.59 ## Out of sample a smaller model can have a smaller deviance. 640 660 680 700 newdata<-list(avginc=5:55) plot(testscr ~ avginc) lines(predict(lm(testscr ~ poly(avginc,2)),newdata=newdata)~newdata$avginc,lty=1,lwd=4)
lines(predict(lm(testscr ~ poly(avginc,5)),newdata=newdata)~newdata$avginc,lty=2,lwd=4) lines(predict(lm(testscr ~ poly(avginc,15)),newdata=newdata)~newdata$avginc,lty=3,lwd=4)
legend("bottomright",c("$r=2$","$r=5$","$r=15$"),lty=1:3,lwd=4)

Of course, the above graph depends on the specific sample of CaIn and CaOut.
We could repeat this exercise for many samples.
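The within-/out-of-sample contrast can also be sketched with simulated data (names illustrative): within the estimation sample the larger model always has the smaller deviance, but out of sample it may predict worse.

```r
## Sketch: overfitting within vs. out of sample (simulated data)
set.seed(7)
x <- runif(200, 0, 10)
y <- 650 + 2*x + rnorm(200, sd = 5)       # the true relation is linear
inS  <- data.frame(x = x[1:100],   y = y[1:100])
outS <- data.frame(x = x[101:200], y = y[101:200])
small <- lm(y ~ x,           data = inS)
big   <- lm(y ~ poly(x, 10), data = inS)
deviance(big) < deviance(small)                    # TRUE, always
sum((outS$y - predict(small, newdata = outS))^2)   # out-of-sample SSR
sum((outS$y - predict(big,   newdata = outS))^2)   # often larger for 'big'
```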

## Within sample deviance:

plot(sapply(1:15,function(r) deviance(lm(testscr~poly(avginc,r),data=CaIn))),
xlab="degree of polynomial $r$",ylab="within sample deviance")

Out of sample deviance:

[Figure: out-of-sample deviance against the degree of the polynomial]

$$t_i = \frac{\hat\beta_i - \beta_{i,0}}{\hat\sigma_{\hat\beta_i}}$$

## est <- lm(testscr ~ str + elpct + mealpct + calwpct)

coef(est)/sqrt(diag(hccm(est)))
(Intercept)         str       elpct     mealpct     calwpct
124.7256881  -3.7184527  -3.5192705 -13.5610567  -0.7778974
round(2*pnorm(-abs(coef(est)/sqrt(diag(hccm(est))))),5)
(Intercept)     str   elpct mealpct calwpct
    0.00000 0.00020 0.00043 0.00000 0.43663

Instead of always calculating heteroscedasticity-consistent standard errors manually, as we did above, we can also use the function summaryR from the library
tonymisc.
summaryR(lm(testscr ~ str + elpct + mealpct + calwpct))

Call:
lm(formula = testscr ~ str + elpct + mealpct + calwpct)


Residuals:
    Min      1Q  Median      3Q     Max
-32.179  -5.239  -0.185   5.171  31.308
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 700.39184    5.61546 124.726  < 2e-16 ***
str          -1.01435    0.27279  -3.718 0.000228 ***
elpct        -0.12982    0.03689  -3.519 0.000481 ***
mealpct      -0.52862    0.03898 -13.561  < 2e-16 ***
calwpct      -0.04785    0.06152  -0.778 0.437073
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 9.084 on 415 degrees of freedom
F-statistic: 349.4 on 4 and 415 DF, p-value: < 2.2e-16

## 4.10.5 Bayesian Model Comparison

Idea: A binary process selects (randomly) among the two models we want
to compare.
1. testscr = $\beta_0 + \beta_1$ elpct + $\beta_2$ str + u
2. testscr = $\beta_0 + \beta_1$ elpct + u
Problem: While one of the two models is not selected, the parameters of this model can take any value (they do not reduce the likelihood) → convergence is slow!
## Solution: Pseudopriors (the binary process already has informed priors)

myData1<-list(str=str,testscr=testscr,elpct=elpct)
mod1 <- model {
for (i in 1:length(testscr)) {
testscr[i] ~ dnorm(beta[1]+beta[2]*elpct[i]+beta[3]*str[i],tau)
}
# prior:
for (j in 1:3) {
beta[j] ~ dnorm(0,.0001)
}
tau ~ dgamma(.1,.1)
}
mod1.jags <-run.jags(model=mod1,data=myData1,monitor=c("beta","tau"))

## Compiling rjags model and adapting for 1000 iterations...

Calling the simulation using the rjags method...
Burning in the model for 4000 iterations...
Running the model for 10000 iterations...
Simulation complete
Calculating the Gelman-Rubin statistic for 4 variables....
The Gelman-Rubin statistic is below 1.05 for all parameters
Finished running the simulation

mod1.sum<- summary(mod1.jags)[["statistics"]][,1:2]
                 Mean           SD
beta[1] 682.925223960 7.5409541182
beta[2]  -0.651991336 0.0397160356
beta[3]  -0.942712391 0.3874377114
tau       0.004783592 0.0003318526
myData2<-list(testscr=testscr,elpct=elpct)
mod2 <- model {
for (i in 1:length(testscr)) {
testscr[i] ~ dnorm(beta[1] + beta[2]*elpct[i],tau)
}
for (j in 1:2) {
beta[j] ~ dnorm(0,.0001)
}
tau ~ dgamma(.1,.1)
}
mod2.jags <-run.jags(model=mod2,data=myData2,monitor=c("beta","tau"))
Compiling rjags model and adapting for 1000 iterations...
Calling the simulation using the rjags method...
Burning in the model for 4000 iterations...
Running the model for 10000 iterations...
Simulation complete
Calculating the Gelman-Rubin statistic for 3 variables....
The Gelman-Rubin statistic is below 1.05 for all parameters
Finished running the simulation

mod2.sum<- summary(mod2.jags)[["statistics"]][,1:2]
                 Mean           SD
beta[1] 664.691032674 0.9445561702
beta[2]  -0.669906945 0.0389640021
tau       0.004698533 0.0003245831

## Remember: the uninformed priors we used above were as follows:
$\beta_i \sim$ dnorm(0, 0.0001)
$\tau \sim$ dgamma(.1, .1)
Now we want to give some (informed) help to the Bayesian model:
## Pseudoprior for $\beta$ is trivial.
Mean of $\beta$ can be taken from the estimate above.
Precision $\tau = 1/\sigma^2$, and we know $\sigma$ from the estimate above.
To obtain the pseudoprior for $\tau$ we use properties of the $\Gamma$-distribution:
If $\tau \sim \Gamma(\alpha, \beta)$ then $E(\tau) = \alpha/\beta$ and $\mathrm{var}(\tau) = \alpha/\beta^2$
## We estimate the parameters of the $\Gamma$-distribution as follows:
$$\hat\beta = E(\tau)/\mathrm{var}(\tau), \qquad \hat\alpha = E(\tau)^2/\mathrm{var}(\tau).$$
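The moment matching for the $\Gamma$ pseudoprior can be verified numerically (the numbers are the posterior mean and SD of $\tau$ from mod1.sum above):

```r
## Sketch: moment matching for the Gamma pseudoprior of tau
m <- 0.004783592; s <- 0.0003318526   # posterior mean and SD of tau (mod1.sum)
shape <- m^2/s^2                       # alpha-hat = E(tau)^2 / var(tau)
rate  <- m/s^2                         # beta-hat  = E(tau)   / var(tau)
c(mean = shape/rate, sd = sqrt(shape)/rate)  # reproduces m and s exactly
```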
myData12<-list(str=str,testscr=testscr,elpct=elpct,sum=mod1.sum,sumX=mod2.sum)
mod12 <- model {
for (i in 1:length(testscr)) {
testscr[i] ~ dnorm(ifelse(equals(mI,1),
beta[1]+beta[2]*elpct[i]+beta[3]*str[i],
betaX[1]+betaX[2]*elpct[i]),
ifelse(equals(mI,1),tau,tauX))
}
for (j in 1:3) {
beta[j] ~ dnorm(sum[j,1],1/sum[j,2]^2)
}
tau ~ dgamma(sum[4,1]^2/sum[4,2]^2,sum[4,1]/sum[4,2]^2)
for (j in 1:2) {
betaX[j] ~ dnorm(sumX[j,1],1/sumX[j,2]^2)
}
tauX ~ dgamma(sumX[3,1]^2/sumX[3,2]^2,sumX[3,1]/sumX[3,2]^2)
mI ~ dcat(mProb[])
mProb[1]<-.5
mProb[2]<-.5
}
mod12.jags <-run.jags(model=mod12,data=myData12,
monitor=c("beta","tau","betaX","tauX","mI"))
Compiling rjags model and adapting for 1000 iterations...
Calling the simulation using the rjags method...
Burning in the model for 4000 iterations...
Running the model for 10000 iterations...
Simulation complete
Calculating the Gelman-Rubin statistic for 8 variables....
The Gelman-Rubin statistic is below 1.05 for all parameters
Finished running the simulation

summary(mod12.jags)

Iterations = 5001:15000
Thinning interval = 1
Number of chains = 2
Sample size per chain = 10000
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
               Mean        SD    Naive SE Time-series SE
beta[1]  683.257424 4.8295929 0.034150379    0.245753589
beta[2]   -0.651894 0.0303106 0.000214328    0.000329866
beta[3]   -0.964436 0.2439432 0.001724939    0.012495793
betaX[1] 664.697289 0.9011985 0.006372436    0.008693537
betaX[2]  -0.669097 0.0374353 0.000264708    0.000340590
mI         1.169200 0.3749378 0.002651211    0.012153832
tau        0.004784 0.0002554 0.000001806    0.000002364
tauX       0.004700 0.0003110 0.000002199    0.000002849
2. Quantiles for each variable:
               2.5%        25%        50%        75%      97.5%
beta[1]  673.389990 680.253051 683.486755 686.347180 692.408889
beta[2]   -0.711881  -0.671751  -0.652033  -0.632004  -0.592018
beta[3]   -1.426208  -1.122639  -0.972609  -0.809592  -0.483269
betaX[1] 662.923675 664.096016 664.693130 665.295605 666.456273
betaX[2]  -0.742853  -0.693643  -0.669193  -0.644838  -0.594024
mI         1.000000   1.000000   1.000000   1.000000   2.000000
tau        0.004296   0.004615   0.004778   0.004950   0.005300
tauX       0.004106   0.004492   0.004693   0.004898   0.005332

Of particular interest is the result for mI. This variable takes the value 1 or 2 for model 1 or model 2. An average of 1.1692 means that in 0.8308 of all cases we have model 1 and in 0.1692 of all cases we have model 2.
In other words, model 1 is 4.91 times more likely than model 2.
Compare with p-values from the frequentist analysis:
summaryR(lm(testscr ~ elpct + str ))

Call:
lm(formula = testscr ~ elpct + str)
Residuals:
    Min      1Q  Median      3Q     Max
-48.845 -10.240  -0.308   9.815  43.461
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 686.0322     8.8122   77.85   <2e-16 ***
elpct        -0.6498     0.0313  -20.76   <2e-16 ***
str          -1.1013     0.4371   -2.52   0.0121 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


## Residual standard error: 14.46 on 417 degrees of freedom

F-statistic: 220.1 on 2 and 417 DF, p-value: < 2.2e-16

## Why are these numbers so different from p-values?

p-value=P(data|H0 )

## Here H0 = model 2 is true

p-value=P(data|model2) = 0.0121166

## In our Bayesian model comparison we have calculated

P(model1|data) = 0.8308
P(model2|data) = 0.1692.
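These posterior model probabilities are just arithmetic on the mI draws; with the posterior mean of mI from the summary above:

```r
## Sketch: posterior model probabilities from the mean of mI
mI_mean  <- 1.1692            # posterior mean of mI (from summary above)
p_model2 <- mI_mean - 1       # share of draws with mI == 2
p_model1 <- 2 - mI_mean       # share of draws with mI == 1
p_model1 / p_model2           # posterior odds, about 4.91
```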
Model comparison: Words of caution. Whenever we compare models, we have to ask: are the models we are comparing plausible?
Example: Returning from the lecture I find on my desk a book which I had long wanted.
Model 1: The book has been provided by Santa Claus.
Model 2: The book has been provided by the Tooth Fairy.
A careful model comparison finds Santa Claus 50 times more likely than the Tooth Fairy. Does this mean that Santa Claus, indeed, brought the book?
The same applies, of course, to interpreting p-values from hypothesis tests.
4.10.6 Comparing models
If we want to see a table of different models (but using heteroscedasticity-consistent
standard errors), we can adjust getSummary.lm. Mainly we use the version of
getSummary.lm from the memisc package, but we replace the standard error by
the heteroscedasticity-consistent standard error:
library(memisc)
getSummary.lm <- function (est,...) {
z <- memisc::getSummary.lm(est,...)
[1]

NA 16.98182

coef(est2) %*% c(0,1,2*10) * (1 + qnorm(.975)/sqrt(lhtest$F)[2])
         [,1]
[1,] 3.351629
coef(est2) %*% c(0,1,2*10) * (1 - qnorm(.975)/sqrt(lhtest$F)[2])
         [,1]
[1,] 2.658022

Procedure:
1. theoretical motivation for non-linear dependencies
2. specify the functional form
3. test whether a non-linear function is justified
4. visual check
5. marginal effects

## 5.1 Functional forms
5.1.1 Polynomials
$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \beta_3 X_i^3 + \dots + \beta_r X_i^r + u_i$$
($r = 2$: quadratic model, $r = 3$: cubic model, . . . )
Testing whether the regression function is linear:
$$H_0: \beta_2 = 0 \wedge \beta_3 = 0 \wedge \dots \wedge \beta_r = 0$$
versus
$$H_1: \text{at least one } \beta_j \neq 0,\; j \in \{2, \dots, r\}$$
What is the best $r$?
## large $r$: more flexibility, better fit
small $r$: more precise estimation of the individual coefficients
## Sequential hypothesis test in the case of polynomial models:
1. Choose the largest sensible value of $r$ and estimate a polynomial regression
2. Test $H_0: \beta_r = 0$. If $H_0$ is rejected, use a polynomial of the $r$-th degree
3. Else, reduce $r$ by 1. Continue with step 1.
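The sequential procedure can be sketched in base R (simulated data; names illustrative); here the true relation is quadratic, and the loop keeps dropping the highest power while its t-test does not reject:

```r
## Sketch: sequential choice of the polynomial degree (simulated data)
set.seed(3)
x <- runif(200, 0, 10)
y <- 1 + 2*x - 0.1*x^2 + rnorm(200)      # true degree: 2
r <- 5                                    # largest sensible degree
repeat {
  est <- lm(y ~ poly(x, r, raw = TRUE))
  p   <- summary(est)$coefficients[r + 1, 4]  # p-value of the highest power
  if (p < 0.05 || r == 1) break
  r <- r - 1
}
r   # the chosen degree
```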
avginc3 <- avginc*avginc*avginc
est3 <- lm(testscr ~ avginc + avginc2 + avginc3)
(lhtest <- linearHypothesis(est3,"avginc3",vcov=hccm))
Linear hypothesis test
Hypothesis:
avginc3 = 0
Model 1: restricted model
Model 2: testscr ~ avginc + avginc2 + avginc3

## Note: Coefficient covariance matrix supplied.

  Res.Df Df      F Pr(>F)
1    417
2    416  1 2.4615 0.1174

## est2 <- lm(testscr ~ avginc + avginc2)

(lhtest <- linearHypothesis(est2,"avginc2",vcov=hccm))
Linear hypothesis test
Hypothesis:
avginc2 = 0
Model 1: restricted model
Model 2: testscr ~ avginc + avginc2
Note: Coefficient covariance matrix supplied.
  Res.Df Df      F    Pr(>F)
1    418
2    417  1 75.136 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## summary(estp <- lm(testscr ~ poly(avginc,10,raw=TRUE)))

Call:
lm(formula = testscr ~ poly(avginc, 10, raw = TRUE))
Residuals:
    Min      1Q  Median      3Q     Max
-42.435  -9.159   0.424   8.764  33.066
Coefficients:
                                 Estimate Std. Error t value Pr(>|t|)
(Intercept)                     1.004e+03  5.773e+02   1.739   0.0828 .
poly(avginc, 10, raw = TRUE)1  -2.714e+02  3.245e+02  -0.837   0.4034
poly(avginc, 10, raw = TRUE)2   7.436e+01  7.721e+01   0.963   0.3361
poly(avginc, 10, raw = TRUE)3  -1.073e+01  1.026e+01  -1.046   0.2961
poly(avginc, 10, raw = TRUE)4   9.349e-01  8.449e-01   1.106   0.2692
poly(avginc, 10, raw = TRUE)5  -5.202e-02  4.519e-02  -1.151   0.2503
poly(avginc, 10, raw = TRUE)6   1.888e-03  1.594e-03   1.184   0.2370
poly(avginc, 10, raw = TRUE)7  -4.444e-05  3.678e-05  -1.208   0.2276
[ reached getOption("max.print") -- omitted 3 rows ]
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

149

c Oliver Kirchkamp

## Residual standard error: 12.67 on 409 degrees of freedom

F-statistic: 53.91 on 10 and 409 DF, p-value: < 2.2e-16

## t(sapply(1:10,function(i) extractAIC(lm(testscr ~ poly(avginc,i,raw=TRUE)))))

       [,1]     [,2]
 [1,]     2 2181.164
 [2,]     3 2139.508
 [3,]     4 2139.384
 [4,]     5 2136.448
 [5,]     6 2135.989
 [6,]     7 2137.467
 [7,]     8 2139.372
 [8,]     9 2141.179
 [9,]    10 2143.160
[10,]    11 2143.578

r<-2:50
plot(sapply(r,function(x) extractAIC(lm(testscr ~ poly(avginc,x,raw=TRUE)))[2]) ~ r,t="l",
ylab="AIC")


## NON-LINEAR REGRESSION FUNCTIONS


r<-1:7
plot(sapply(r,function(x) extractAIC(lm(testscr ~ poly(avginc,x,raw=TRUE)))[2]) ~ r,t="l",
ylab="AIC")

## plot(testscr ~ avginc,main="district average income")

abline(est1,col="blue",lwd=3)
lines(avginc[or],fitted(est2)[or],col="red",lwd=3)
lines(avginc[or],fitted(est3)[or],col="green",lwd=3)
smooth<-list(avginc=seq(5,70,.5))
lines(smooth$avginc,predict(estp,newdata=smooth),col="magenta",lwd=3) legend("bottomright",c("linear","quadratic","cubic","10th-deg"),lwd=3,col=c("blue","red","gree ## 6 February 2015 09:49:38 660 testscr 680 700 ## district average income 620 640 linear quadratic cubic 10th-deg 10 20 30 40 50 avginc ## 5.1.2 Logarithmic Models -4 -3 -2 -1 curve(log(x)) log(x) c Oliver Kirchkamp 152 0.0 0.2 0.4 0.6 0.8 1.0 ## NON-LINEAR REGRESSION FUNCTIONS Yi = 0 + 1 ln Xi + ui linear-log ln Yi = 0 + 1 Xi + ui log-linear ln Yi = 0 + 1 ln Xi + ui 153 c Oliver Kirchkamp log-log ## 5.1.3 Logarithmic Models: linear-log Yi = 0 + 1 ln Xi + ui marginal effects: 1 Yi = 1 Xi Xi Yi Xi 1 Xi ## if Xi changes by 1% (Xi = 0.01 Xi ) . . . Yi 0.01Xi 1 Xi . . . Yi changes by 0.01 1 ## (estL <- lm(testscr ~ log(avginc))) Call: lm(formula = testscr ~ log(avginc)) Coefficients: (Intercept) log(avginc) 557.83 36.42 plot(testscr ~ avginc,main="district average income") abline(est1,col="blue",lwd=3) lines(avginc[or],fitted(est2)[or],col="red",lwd=3) lines(avginc[or],fitted(estL)[or],col="green",lwd=3) legend("bottomright",c("linear","quadratic","linear-log"),lwd=3,col=c("blue","red","green")) ## 6 February 2015 09:49:38 640 660 680 700 ## district average income linear quadratic linear-log 620 testscr c Oliver Kirchkamp 154 10 20 30 40 50 avginc coef(estL)[2]/10 log(avginc) 3.641968 coef(estL)[2]/40 log(avginc) 0.9104921 ## 5.1.4 Logarithmic Models: log-linear ln Yi = 0 + 1 Xi + ui marginal effects ln Yi = 1 Xi ln Yi 1 Xi with ln Yi 1 Yi we have ln Yi 1 Xi Yi Yi Yi ## A change of Xi by one unit translates into a relative change of Yi by the share 1 ## NON-LINEAR REGRESSION FUNCTIONS 155 ## (estLL <- lm(log(testscr) ~ avginc)) c Oliver Kirchkamp Call: lm(formula = log(testscr) ~ avginc) Coefficients: (Intercept) 6.439362 avginc 0.002844 ## plot(testscr ~ avginc,main="district average income") abline(est1,col="blue",lwd=3) lines(avginc[or],fitted(est2)[or],col="red",lwd=3) 
lines(avginc[or],fitted(estL)[or],col="green",lwd=3) lines(avginc[or],exp(fitted(estLL))[or],col="black",lwd=3) legend("bottomright",c("linear","quadratic","linear-log","log-lin"),lwd=3,col=c("blue","red"," 660 640 linear quadratic linear-log log-lin 620 testscr 680 700 ## district average income 10 20 30 avginc coef(estLL)[2] avginc 0.00284407 ## Example: What is the effect of work experience on wages exp years of full-time work experience lwage logarithm of wage 40 50 c Oliver Kirchkamp 156 ## 6 February 2015 09:49:38 library(lattice) data(Wages, package="Ecdat") lm(lwage ~ exp,data=Wages) Call: lm(formula = lwage ~ exp, data = Wages) Coefficients: (Intercept) 6.50143 exp 0.00881 ## 5.1.5 Logarithmic Models: log-log ln Yi = 0 + 1 ln Xi + ui ui 1 Yi = e0 X i e marginal effect: Yi Y = e0 1 Xi1 1 = 1 i Xi Xi Yi Xi = 1 Xi Yi 1 is the elasticity of Yi with respect to Xi . (estLLL<- lm(log(testscr) ~ log(avginc))) Call: lm(formula = log(testscr) ~ log(avginc)) Coefficients: (Intercept) log(avginc) 6.33635 0.05542 (estLL <- lm(log(testscr) ~ avginc)) Call: lm(formula = log(testscr) ~ avginc) Coefficients: (Intercept) 6.439362 avginc 0.002844 ## NON-LINEAR REGRESSION FUNCTIONS 157 c Oliver Kirchkamp ## plot(testscr ~ avginc,main="district average income") abline(est1,col="blue",lwd=3) lines(avginc[or],fitted(est2)[or],col="red",lwd=3) lines(avginc[or],fitted(estL)[or],col="green",lwd=3) lines(avginc[or],exp(fitted(estLL))[or],col="black",lwd=3) lines(avginc[or],exp(fitted(estLLL))[or],col="orange",lwd=3) legend("bottomright",c("linear","quadratic","linear-log","log-lin","log-log"),lwd=3,col=c("blu 640 660 linear quadratic linear-log log-lin log-log 620 testscr 680 700 ## district average income 10 20 30 40 50 avginc ## 5.1.6 Comparison of the three logarithmic models X and/or Y are transformed The regression equation is linear in the transformed variables Hypothesis tests and confidence intervals can be calculated in the usual way The interpretation of is different in 
each case R2 and AIC can be used to compare log-log and log-linear R2 and AIC can be used to compare linear-log and a linear model Comparing ln Yi and Yi is impossible. We need economic theory to motivate one of the four specifications. ## 6 February 2015 09:49:38 ## 5.1.7 Generalization Box-Cox The logarithmic model log Y = X + u Now, take a look at g (Y) = X + u where g (Y) = Y 1 log Y if 6= 0 if = 0 ## is calculated using maximum likelihood. 355 660 640 620 360 365 testscr 370 680 375 700 library(MASS) est <- lm (testscr ~ avginc) plot(boxcox(est,lambda=seq(-2,10,by=.5),plotit=FALSE),t="l", xlab="$\\lambda$",ylab="log-Likelihood") est2 <- lm(testscr^8 ~ avginc) plot(testscr ~ avginc) lines(fitted(est2)[or]^(1/8) ~ avginc[or],col="blue") lines(exp(fitted(estLL)[or]) ~ avginc[or],col="orange") log-Likelihood c Oliver Kirchkamp 158 -2 10 10 20 30 avginc 40 50 ## NON-LINEAR REGRESSION FUNCTIONS 159 ## 5.1.8 Other non-linear functions Problems with the above models: Polynomial model: potentially not monotonic. linear-log: testscr rises monotonously in avginc, but is not bounded above. Is there a specification that satisfies both conditions: monotonicity and boundedness? Y = 0 e1 X (negative exponential growth curve) 0.8 0.7 1 - exp(-x) 0.9 1.0 curve(1-exp(-x),xlim=c(1,10)) x Estimate the parameters of Yi = 0 e1 Xi + ui or (using = 0 e2 ) 1 (Xi 2 ) Yi = 0 1 e + ui ## Compare this model to linear-log or polynomial model: Yi Yi 0 + 1 ln Xi + ui 0 + 1 Xi + 2 X2i + 3 X3i + ui 10 c Oliver Kirchkamp c Oliver Kirchkamp 160 ## 6 February 2015 09:49:38 Linearizing Yi = 0 1 e1 (Xi 2 ) ## + ui is not possible anymore. ## 5.1.9 Non-linear least squares Models which are linear in their parameters can be estimated using OLS. Models which are non-linear in one or more parameters can be estimated using non-linear methods (but cannot be estimated using OLS). 
min over β0, β1, β2 of  Σ_{i=1}^{n} ( Yi − β0 (1 − e^(−β1 (Xi − β2))) )²

(nest<-nls(testscr ~ b0 * (1 - exp(-1 * b1 * (avginc - b2))),
           start=c(b0=730,b1=0.1,b2=0),trace=TRUE))

7485378  : 730.0 0.1 0.0
1392009  : 695.41660291 0.09260118 -8.24400488
233046.7 : 696.82401675 0.07926959 -17.01056369
98541.84 : 699.44123343 0.06495656 -25.58741750
69931.36 : 702.08095456 0.05753086 -31.61313399
67013.75 : 703.05049452 0.05552973 -33.72483709
66988.4  : 703.20479789 0.05525916 -33.98665686
66988.4  : 703.22077199 0.05523597 -34.00235084
66988.4  : 703.22210309 0.05523406 -34.00353793
Nonlinear regression model
  model: testscr ~ b0 * (1 - exp(-1 * b1 * (avginc - b2)))
   data: parent.frame()
       b0        b1        b2
703.22210   0.05523 -34.00354
 residual sum-of-squares: 66988

Number of iterations to convergence: 8
Achieved convergence tolerance: 0.0000007268

summary(nest)

Formula: testscr ~ b0 * (1 - exp(-1 * b1 * (avginc - b2)))

Parameters:
     Estimate Std. Error t value      Pr(>|t|)
b0 703.222103   6.697451 104.998       < 2e-16 ***
b1   0.055234   0.009101   6.069 0.00000000289 ***
b2 -34.003538   5.676787  -5.990 0.00000000454 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1

Residual standard error: 12.67 on 417 degrees of freedom

Number of iterations to convergence: 8
Achieved convergence tolerance: 0.0000007268

plot(testscr ~ avginc,main="district average income")
lines(avginc[or],fitted(nest)[or],col="red",lwd=3)
lines(avginc[or],fitted(estL)[or],col="blue",lwd=3)
legend("bottomright",c("nls","log"),lwd=3,col=c("red","blue"))

[Figure: district average income. testscr against avginc with the nls fit (red) and the linear-log fit (blue)]

5.2 Interactions

Maybe the effect of group sizes on test scores depends on further circumstances. Maybe small group sizes have a particularly large effect if groups have a lot of foreign students.

∂testscr/∂str depends on elpct.

Generally: ∂Y/∂X1 depends on X2.

How can we include this interaction into a model?
First look at binary X, later consider continuous X.

Example 1: testscr = β1 str + β2 elpct + β0 + u. In this model the effect of str is independent of elpct.

Example 2: lwage = β1 ed + β0 + u

library(lattice)
attach(Wages)
summary(ed)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   4.00   12.00   12.00   12.85   16.00   17.00

lm(lwage ~ ed)

Call:
lm(formula = lwage ~ ed)

Coefficients:
(Intercept)           ed
     5.8388       0.0652

Example 3: lwage = β1 college + β2 sex + β0 + u, with college=ed>16

lm(lwage ~ college + sex)

Call:
lm(formula = lwage ~ college + sex)

Coefficients:
(Intercept)  collegeTRUE      sexmale
     6.2254       0.3340       0.4626

5.2.1 Interactions between binary variables

lwage = β0 + β1 college + β2 sex + β3 sex·college + u
        6.21     0.55        0.49      −0.24

(est<- lm(lwage ~ college + sex + sex:college))

Call:
lm(formula = lwage ~ college + sex + sex:college)

Coefficients:
        (Intercept)         collegeTRUE             sexmale  collegeTRUE:sexmale
             6.2057              0.5543              0.4850              -0.2412

Instead of regression coefficients we can calculate mean values for the individual categories:

mean(lwage[college==FALSE & sex=="female"])
[1] 6.205665
mean(lwage[college==TRUE & sex=="female"])
[1] 6.760007
mean(lwage[college==FALSE & sex=="male"])
[1] 6.690634
mean(lwage[college==TRUE & sex=="male"])
[1] 7.003751

mean(lwage):                 college
                    FALSE              TRUE
sex  female  6.21 = β0          6.76 = β0 + β1
     male    6.69 = β0 + β2     7.00 = β0 + β1 + β2 + β3

Effect of college education for women: β1
Effect of college education for men: β1 + β3

Histr<-str>=20
Hiel<-elpct>=10
table(Histr,Hiel)

       Hiel
Histr   FALSE TRUE
  FALSE   149   89
  TRUE     79  103

(est<- lm(testscr ~ Histr*Hiel))

Call:
lm(formula = testscr ~ Histr * Hiel)

Coefficients:
       (Intercept)           HistrTRUE            HielTRUE  HistrTRUE:HielTRUE
           664.143              -1.908             -18.163              -3.494

mean(testscr[Hiel==FALSE & Histr==FALSE])
[1] 664.1433
mean(testscr[Hiel==TRUE & Histr==FALSE])
[1] 645.9803
coef(est) %*%
c(1,0,1,0)
         [,1]
[1,] 645.9803
mean(testscr[Hiel==FALSE & Histr==TRUE])
[1] 662.2354
coef(est) %*% c(1,1,0,0)
         [,1]
[1,] 662.2354
mean(testscr[Hiel==TRUE & Histr==TRUE])
[1] 640.5782
coef(est) %*% c(1,1,1,1)
         [,1]
[1,] 640.5782

If we do not want to calculate mean values for the individual categories, we can leave that job to R:

library(memisc)
aggregate(mean(testscr)~Hiel+Histr)

   Hiel Histr mean(testscr)
1 FALSE FALSE      664.1433
3  TRUE FALSE      645.9803
2 FALSE  TRUE      662.2354
6  TRUE  TRUE      640.5782

testscr = β0 + β1 Hiel + β2 Histr + β3 Histr·Hiel + u
          664.14  −18.16    −1.91       −3.49

mean(testscr):                     Hiel
                        FALSE                  TRUE
Histr  FALSE  664.1433 = β0          645.9803 = β0 + β1
       TRUE   662.2354 = β0 + β2     640.5782 = β0 + β1 + β2 + β3

5.2.2 Interaction between a binary and a continuous variable

attach(Wages)
plot(lwage ~ ed)
abline(lm(lwage ~ ed,subset=(sex=="female")))
abline(lm(lwage ~ ed,subset=(sex=="male")),col="red",lty=2)
legend("topright",c("male","female"),lty=2:1,col=c("red","black"))

[Figure: lwage against ed with separate regression lines for males (red, dashed) and females (black)]

(lm(lwage ~ ed,subset=(sex=="female")))

Call:
lm(formula = lwage ~ ed, subset = (sex == "female"))

Coefficients:
(Intercept)           ed
    5.04207      0.09452

(lm(lwage ~ ed,subset=(sex=="male")))

Call:
lm(formula = lwage ~ ed, subset = (sex == "male"))

Coefficients:
(Intercept)           ed
    5.93060      0.06221

(est<- lm(lwage ~ sex*ed))

Call:
lm(formula = lwage ~ sex * ed)

Coefficients:
(Intercept)      sexmale           ed   sexmale:ed
    5.04207      0.88854      0.09452     -0.03231

detach(Wages)

lwage = β0 + β1 ed + β2 sex + β3 sex·ed + u

β3 = 0: Regression lines are parallel.
β2 = 0: Regression lines have the same axis intercept.

Does the effect of group size on test scores depend on the ratio of native speakers?
Hiel=elpct>=10
plot(testscr ~ str)
abline(lm(testscr ~ str,subset=(Hiel==FALSE)))
abline(lm(testscr ~ str,subset=(Hiel==TRUE)),col="red")
legend("topright",c("few el","many el"),lwd=3,col=c("black","red"))

[Figure: testscr against str with separate regression lines for districts with few (black) and many (red) English learners]

(lm(testscr ~ str,subset=(Hiel==FALSE)))

Call:
lm(formula = testscr ~ str, subset = (Hiel == FALSE))

Coefficients:
(Intercept)          str
   682.2458      -0.9685

(lm(testscr ~ str,subset=(Hiel==TRUE)))

Call:
lm(formula = testscr ~ str, subset = (Hiel == TRUE))

Coefficients:
(Intercept)          str
    687.885       -2.245

(est<- lm(testscr ~ str*Hiel))

Call:
lm(formula = testscr ~ str * Hiel)

Coefficients:
 (Intercept)           str      HielTRUE  str:HielTRUE
    682.2458       -0.9685        5.6391       -1.2766

testscr = β0 + β1 Hiel + β2 str + β3 str·Hiel + u
          682.2458  5.6391  −0.9685    −1.2766

Effect of a change in group size str:
- if elpct < 10: −0.9685
- if elpct ≥ 10: −0.9685 − 1.2766 = −2.245

Are the two lines parallel?

linearHypothesis(est,"str:HielTRUE=0",vcov=hccm)

Linear hypothesis test

Hypothesis:
str:HielTRUE = 0

Model 1: restricted model
Model 2: testscr ~ str * Hiel

Note: Coefficient covariance matrix supplied.

  Res.Df Df      F Pr(>F)
1    417
2    416  1 1.6778 0.1959

Are the two lines identical?

linearHypothesis(est,c("str:HielTRUE=0","HielTRUE=0"),vcov=hccm)

Linear hypothesis test

Hypothesis:
str:HielTRUE = 0
HielTRUE = 0

Model 1: restricted model
Model 2: testscr ~ str * Hiel

Note: Coefficient covariance matrix supplied.

  Res.Df Df      F    Pr(>F)
1    418
2    416  2 88.806 < 2.2e-16 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1

Do both lines have the same axis intercept?
linearHypothesis(est,"HielTRUE=0",vcov=hccm)

Linear hypothesis test

Hypothesis:
HielTRUE = 0

Model 1: restricted model
Model 2: testscr ~ str * Hiel

Note: Coefficient covariance matrix supplied.

  Res.Df Df      F Pr(>F)
1    417
2    416  1 0.0804 0.7769

5.2.3 Application: Gender gap

exp       years of full-time work experience
wks       weeks worked
bluecol   blue collar?
ind       works in a manufacturing industry?
south     resides in the south?
smsa      resides in a standard metropolitan statistical area?
married   married?
sex       a factor with levels (male, female)
union     individual's wage set by a union contract?
ed        years of education
black     is the individual black?
lwage     logarithm of wage

ifelse returns either the second or the third argument. Which argument is returned depends on the first argument. as.data.frame converts its argument (e.g. a matrix) into a data frame. Here, this is helpful, because the returned structure mixes numbers and strings. colnames provides access to column names.

attach(Wages)
est1 <- lm(lwage ~ ed)
est2 <- lm(lwage ~ ed + sex)
est3 <- lm(lwage ~ ed * sex)
est4 <- lm(lwage ~ ed * sex + exp + black + union + south + wks + married + smsa + ind)
mtable("(1)"=est1,"(2)"=est2,"(3)"=est3,"(4)"=est4,summary.stats=c("R-squared","N"))

                           (1)       (2)       (3)       (4)
(Intercept)              5.839     5.419     5.042     4.666
                       (0.032)   (0.034)   (0.087)   (0.107)
ed                       0.065     0.065     0.095     0.086
                       (0.002)   (0.002)   (0.007)   (0.006)
sex: male/female                   0.474     0.889     0.552
                                 (0.018)   (0.093)   (0.092)
ed x sex: male/female                       -0.032     0.016
                                           (0.007)   (0.007)
exp                                                    0.011
                                                     (0.001)
black: yes/no                                          0.168
                                                     (0.022)
union: yes/no                                          0.063
                                                     (0.012)
south: yes/no                                          0.055
                                                     (0.013)
wks                                                    0.005
                                                     (0.001)
married: yes/no                                        0.066
                                                     (0.022)
smsa: yes/no                                           0.161
                                                     (0.012)
ind                                                    0.043
                                                     (0.012)
R-squared                0.155     0.260     0.264     0.387
N                         4165      4165      4165      4165

5.2.4 Interaction between two continuous variables

Example: lwage = β0 + β1 ed + β2 exp + β3 ed·exp + u

est1 <- lm(lwage ~ ed + exp)
est2 <- lm(lwage ~ ed * exp)
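In R's formula language, ed * exp is shorthand for ed + exp + ed:exp, so est2 contains both main effects and the interaction. A quick sketch of the expansion (assuming the Wages data is still attached, as in the text):

```r
## both calls fit exactly the same four-parameter model
coef(lm(lwage ~ ed * exp))
coef(lm(lwage ~ ed + exp + ed:exp))
```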
mtable("(1)"=est1,"(2)"=est2,summary.stats=c("R-squared","N"))

                 (1)       (2)
(Intercept)    5.436     5.446
             (0.037)   (0.075)
ed             0.076     0.076
             (0.002)   (0.005)
exp            0.013     0.013
             (0.001)   (0.003)
ed x exp                 0.000
                       (0.000)
R-squared      0.247     0.247
N               4165      4165

Yi = β0 + β1 X1i + β2 X2i + β3 (X1i · X2i) + ui

Marginal effects:

∂Yi/∂X1 = β1 + β3 X2
∂Yi/∂X2 = β2 + β3 X1

What happens, if X1 changes by ΔX1 and X2 changes by ΔX2?

ΔY = β0 + β1 (X1+ΔX1) + β2 (X2+ΔX2) + β3 (X1+ΔX1)(X2+ΔX2)
     − [β0 + β1 X1 + β2 X2 + β3 X1 X2]
   = β1 ΔX1 + β2 ΔX2 + β3 (X1 X2 + X1 ΔX2 + X2 ΔX1 + ΔX1 ΔX2) − β3 X1 X2
   = β1 ΔX1 + β2 ΔX2 + β3 X1 ΔX2 + β3 X2 ΔX1 + β3 ΔX1 ΔX2
   = (β1 + β3 X2) ΔX1 + (β2 + β3 X1) ΔX2 + β3 ΔX1 ΔX2

testscr = β0 + β1 str + β2 elpct + β3 str·elpct + u
          686.34  −1.1170  −0.6729    0.001162

attach(Caschool)
summaryR(lm(testscr ~ str * elpct ))

Call:
lm(formula = testscr ~ str * elpct)

Residuals:
    Min      1Q  Median      3Q     Max
-48.836 -10.226  -0.343   9.796  43.447

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) 686.338525  11.937855  57.493   <2e-16 ***
str          -1.117018   0.596515  -1.873   0.0618 .
elpct        -0.672911   0.386538  -1.741   0.0824 .
str:elpct     0.001162   0.019158   0.061   0.9517
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1

Residual standard error: 14.48 on 416 degrees of freedom
Multiple R-squared: 0.4264, Adjusted R-squared: 0.4223
F-statistic: 150.3 on 3 and 416 DF, p-value: < 2.2e-16

What is the effect of the group size str for a group with median share of foreigners?

median calculates the median of a vector. quantile calculates quantiles of a vector. The smallest observation equals a quantile of 0, the largest observation equals a quantile of 1. mean calculates the arithmetic mean.
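The ΔY decomposition above can be verified with plain arithmetic; a sketch with made-up coefficients and changes (none of these numbers come from the data):

```r
b0 <- 686.34; b1 <- -1.117; b2 <- -0.673; b3 <- 0.00116  # illustrative values
Y  <- function(x1, x2) b0 + b1*x1 + b2*x2 + b3*x1*x2
x1 <- 20; x2 <- 10; d1 <- -2; d2 <- 5                    # made-up changes
## direct difference ...
Y(x1 + d1, x2 + d2) - Y(x1, x2)
## ... equals (b1 + b3*x2)*d1 + (b2 + b3*x1)*d2 + b3*d1*d2
(b1 + b3*x2)*d1 + (b2 + b3*x1)*d2 + b3*d1*d2
```

Both expressions print the same number, which is the point of the decomposition.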
median(elpct)
[1] 8.777634

est<- lm(testscr ~ str * elpct)
coef(est)

  (Intercept)           str         elpct     str:elpct
686.338524629  -1.117018345  -0.672911392   0.001161752

(eff1=coef(est)["str"] + coef(est)["str:elpct"] * median(elpct))

      str
-1.106821

What does this effect look like for a group with a share of foreigners in the 75% quantile?

quantile(elpct,.5)
     50%
8.777634
quantile(elpct,.75)
  75%
22.97

(eff2=coef(est)["str"] + coef(est)["str:elpct"] * quantile(elpct,.75))

      str
-1.090333

Is the interaction term significant?

linearHypothesis(est,"str:elpct=0",vcov=hccm)

Linear hypothesis test

Hypothesis:
str:elpct = 0

Model 1: restricted model
Model 2: testscr ~ str * elpct

Note: Coefficient covariance matrix supplied.

  Res.Df Df      F Pr(>F)
1    417
2    416  1 0.0037 0.9517

5.3 Non-linear interaction terms

Hiel=elpct>=10
est1 <- lm(testscr ~ str + elpct + mealpct)
est2 <- lm(testscr ~ str + elpct + mealpct + log(avginc))
est3 <- lm(testscr ~ str * Hiel)
est4 <- lm(testscr ~ str * Hiel + mealpct + log(avginc))
est5 <- lm(testscr ~ str + I(str^2) + I(str^3) + Hiel + mealpct + log(avginc))
est6 <- lm(testscr ~ (str + I(str^2) + I(str^3))*Hiel + mealpct + log(avginc))
est7 <- lm(testscr ~ str + I(str^2) + I(str^3) + elpct + mealpct + log(avginc))
mtable("(1)"=est1,"(2)"=est2,"(3)"=est3,"(4)"=est4,"(5)"=est5,"(6)"=est6,"(7)"=est7,
       summary.stats=c("R-squared","N"))

                  (1)        (2)        (3)        (4)        (5)
(Intercept)   700.150    658.552    682.246    653.666    252.051
              (5.641)    (8.749)   (12.071)   (10.053)  (179.724)
str            -0.998     -0.734     -0.968     -0.531     64.339
              (0.274)    (0.261)    (0.599)    (0.350)   (27.295)
elpct          -0.122     -0.176
              (0.033)    (0.034)
mealpct        -0.547     -0.398                -0.411     -0.420
              (0.024)    (0.034)               (0.029)    (0.029)
log(avginc)               11.569                12.124     11.748
                         (1.841)               (1.823)    (1.799)
Hiel                                  5.639      5.498     -5.474
                                   (19.889)   (10.012)    (1.046)
str x Hiel                           -1.277     -0.578
                                    (0.986)    (0.507)
str^2                                                      -3.424
                                                          (1.373)
str^3                                                       0.059
                                                          (0.023)
R-squared       0.775      0.796      0.310      0.797      0.801
N                 420        420        420        420        420

It looks like there is a non-linear effect of
str on testscr. Let's take another look at model 6 and estimate the marginal effect of str.

estC <- coef(est6)
mEffstr <- function (str,Hiel) {
  estC %*% c(0,1,2*str,3*str^2,0,0,0,Hiel,Hiel*2*str,Hiel*3*str^2)
}
mEffstr(20,0)

          [,1]
[1,] -1.622543

mEffstr(20,1)

           [,1]
[1,] -0.7771982

sapply applies a function to every element of a vector. Within a formula, I() prevents a term from being interpreted as an interaction.

noHitest<-sapply(str,function(x) {mEffstr(x,0)})
Hitest<-sapply(str,function(x) {mEffstr(x,1)})
plot(noHitest ~ str,ylim=c(-6,6))
points(Hitest ~ str,col="red")
abline(h=0)
legend("bottomright",c("noHitest","Hitest"),pch=1,col=c("black","red"))

[Figure: estimated marginal effect of str as a function of str, separately for Hiel=0 (black) and Hiel=1 (red)]

5.3.1 Non-linear interaction terms

- linear interaction between str and elpct or str and Hiel: no significant effect
- significant non-linear effect of str on testscr
- significant non-linear interaction of str and Hiel
- Effect of a change in group sizes: it depends.

5.3.2 Summary

- Non-linear transformations (log, polynomials) enable us to write non-linear models as multiple regressions.
- Estimations work in the same way as for OLS.
- We have to take the transformations into account when we interpret the coefficients.
- A large number of non-linear specifications is possible. Consider: What non-linear effects are we interested in? What makes sense with respect to the problem we are trying to solve?

5.4 Exercises

1. Polynomial Regression I
   Give the formula of a quadratic regression. How do you interpret the coefficients and the dependent variable of this quadratic regression? Draw a graph illustrating the impact of X1 on Y in a quadratic regression. Give some real world examples where the use of a quadratic regression function could be useful.

2. Polynomial Regression II
   You want to estimate the time that it takes the members of your running team to run 1km.
You have information on age, gender, and whether the members are generally in good physical condition. Suggest a model to estimate the time it takes your club members to run 1km. Which sign do you expect for each of the coefficients? Draw a graph that illustrates the relation between the time needed to run 1km and age.

3. Polynomial Regression III
   Use the data set Bwages of the library Ecdat in R on wages in Belgium. Estimate the effect of years of experience (exper) and level of education (educ) on hourly wages in Euro (wage). Do you think it makes sense to use a non-linear model? Why? Which regression model would you use? Note the regression function. Which signs do you expect the coefficients to take? Estimate the model in R. Interpret the output. How should the relationship between wage and experience look graphically? Verify this with a graph using R.

4. Logarithmic Regression I
   What is a logarithm? Where do logarithms occur in nature or science? Give some examples for the use of logarithmic functions in economic contexts.

5. Logarithmic Regression II
   Which different types of logarithmic regressions do you know? Give the formulas for each of them. How do you interpret the coefficients of the different logarithmic models? Give economic examples for each of them.

6. Logarithmic Regression III
   Use the data set Wages of the library Ecdat in R on wages in the United States. Estimate the effect of years of experience (exp), whether the employee has a blue collar job (bluecol), whether the employee lives in a standard metropolitan area (smsa), gender (sex), years of education (ed), and whether the employee is black (black) on the logarithm of wage (lwage). Do you think it makes sense to use this model? Would you rather suggest a different model? Which one would you suggest? Estimate both models in R. Interpret and compare the outputs.
Visualize the relationship of experience and wage with a graph. Does this graph support the choice of your model?

7. Polynomial Regression IV
   Use the data set Bwages of the library Ecdat in R on wages in Belgium. You want to estimate the wage increase per year of job experience (exper). You use the level of education (educ) as an additional control. You do not have information on wage increases, but only on absolute wages (wage). Solve this problem using R.

8. Linear and Non-linear Regressions I
   You want to estimate different models for the following problem sets of one dependent and one independent variable (assume that the models are otherwise specified correctly, i.e. all other important variables are included and there is no high correlation between two independent variables). Name the appropriate model (linear, quadratic, log-lin, lin-log, or log-log) for each of the problems and explain your choice (exercise adapted from Studenmund's "Using Econometrics", chapter 7, exercise 2).

   - Dependent variable: time it takes to walk from A to B; independent variable: distance from A to B
   - Dependent variable: total amount spent on food; independent variable: income
   - Dependent variable: monthly wage; independent variable: age
   - Dependent variable: number of ski lift tickets sold; independent variable: whether there is snow
   - Dependent variable: GDP growth rate; independent variable: years passed since the beginning of the transformation to an industrialized country
   - Dependent variable: CO2 emissions; independent variable: kilometers driven by car
   - Dependent variable: hourly wage; independent variable: number of years of job experience
   - Dependent variable: physical ability; independent variable: age

9. Linear and Non-linear Regressions II
   How do you decide which model (linear model or one of the non-linear models) to use?

10. Interaction terms I
    What is an interaction term? How do you construct an interaction term?
Write down a regression function including an interaction term. How do you interpret interaction terms? Why do you have to include not only the interaction of two variables in your regression function, but also each of the individual variables? What would happen if you did not include the individual variables? Give some examples of situations where you think that interactions play a role.

11. Exam BW 24.1, 26.5.2010, exercise 19
    A group of athletes prepares for a competition. You have the following information about the athletes: age (A), gender (G; 1 if female, 0 otherwise), daily training (T; 1 if true, 0 otherwise), healthy diet (E; 1 if true, 0 otherwise), and ranking list scores (R). Age and gender are not correlated with the other variables. You assume that athletes only do especially well in the ranking list if they practice daily and if they follow a healthy diet; daily training is only effective in combination with a healthy diet. What would be possible specifications of your model to test your assumption? (Here we don't ask for the "best" specification.)

    a) R = β0 + β1 T + β2 E + u
    b) R = β0 + β1 A + β2 G + β3 T + β4 E + β5 T·E + u
    c) R = β0 + β1 T + β2 E + β3 T·E + u
    d) T·E = β0 + β1 R + u
    e) R = β0 + β1 A + β2 G + β3 T + β4 E + u

12. Interaction terms II
    You are a teacher at a cross country skiing school. Each year you teach students who have never done cross country skiing before the free style technique (also called skating technique). You realize that your students differ in how fast they learn this new technique. You think that the two things that matter are whether a student knows how to ice skate and whether a student is familiar with downhill skiing.
You estimate the following model for the number of days it takes them to learn the new technique so well that they are able to do their first tours:

days^_i = 8 − 1·iceskating − 2.5·alpineskiing − 1.5·iceskating·alpineskiing

How many days does a person need to learn the skating technique with cross country skis, if that person...

- has never done any ice skating nor downhill skiing?
- has never done any ice skating, but some downhill skiing?
- knows how to ice skate, but has never done any downhill skiing?
- is familiar with both ice skating and downhill skiing?

13. Interaction terms in R I
    You would like to estimate students' school achievements measured in test scores (testscore) in a developing country. You think that gender (female) and educational background of the parents (eduparents; measured in years) have an impact. In particular, you think that poor people cannot afford that their children spend all their time on learning, because they also need their help to earn money. This might be especially true for girls, because their parents might think that education is less important for them. How would you test this assumption in R? Which of the following commands is correct (multiple correct answers possible)?

    a) summary(lm(testscore ~ female+eduparents+eduparents*female))
    b) summary(lm(testscore=female+eduparents+eduparents:female))
    c) summary(lm(testscore ~ female*eduparents))
    d) summary(lm(testscore ~ female+eduparents+eduparents:female))
    e) eduparentsfemale <- eduparents*female
       summary(lm(testscore ~ female+eduparents+eduparentsfemale))
    f) summary(lm(testscore <- eduparents*female))
    g) summary(lm(testscore ~ eduparents:female))

14. Interaction terms in R II
    Use the data set RetSchool of the library Ecdat in R on returns to schooling in the United States. You are interested whether people considered as "black" (black) and people living in the south (south76) earn less than others.
Further, you are interested whether Afro-Americans (black) who live in the south (south76) earn even less. You control for years of experience (exp76) and grades (grade76). Solve this problem using R.

15. Interaction terms in R III
    Use the data set DoctorContacts of the library Ecdat in R on contacts to medical doctors. How do gender (sex), age (age), income (linc), the education of the head of the household (educdec), health (health), physical limitations (physlim), and the number of chronic diseases (ndisease) affect the number of visits to a medical doctor? In which direction do you expect the effects to go? Is there an interaction between gender and physical limitations?

16. Non-linear functions
    Which non-linear functions do you know? List them. Note the formula for each of them. Give examples for each of them.

6 Evaluating multiple regressions

6.1 Introduction

Strengths of multiple regression models. Problems, e.g.: What can we really say about the effect of group sizes on test scores?

6.1.1 Can we evaluate multiple regressions systematically?

Advantages (in comparison to the simple regression model):

- Marginal effects ∂Y/∂X can be estimated.
- Omitted variable bias can sometimes be prevented (if the variable can be measured).
- Non-linear effects (which depend on X) can be analysed.

Still: OLS can be a biased estimator of the true effect.

6.1.2 Internal and external validity

Internal validity: statistical inferences about causal dependencies apply to the population / to the model we are studying.

- The estimator is unbiased and consistent.
- Hypothesis tests have the desired levels of significance and confidence intervals have the desired levels of confidence.

External validity: statistical inferences about causal dependencies can be carried over to other populations and other circumstances.
To what extent can our results regarding Californian schools be generalized?

Different populations:
- California 1998/99
- Massachusetts 1997/98
- Mexico 1997/98
- e.g. when testing drugs: generalization from rats to humans

Different circumstances:
- Legal circumstances of subsidy programs
- Different handling of bilingual education
- Different characteristics of teachers

Testing external validity by comparing different populations and circumstances.

6.2 Internal validity - Problems

- Omitted variable bias
- Incorrect functional form
- Errors in the variables
- Selection bias
- Simultaneous causality
- Heteroscedasticity and correlation of error terms

If E(ui|Xi) ≠ 0, OLS is biased and inconsistent.

6.2.1 Omitted Variable Bias

- A variable has an effect on Y.
- A variable is correlated with one of the explaining variables X.

E(b1) = β1 + (X1'X1)^(−1) X1'X2 β2

Remedy: If we can measure the variable, add it to the regression.

- Identifying the important coefficients (a priori)
- Searching actively for sources of omitted variable bias (a priori): base specification
- Extending the base specification by adding variables:
  - Test whether the estimated coefficients are zero.
  - Do the coefficients which we have estimated before change when we add another variable?
- Overview of the different estimated specifications.
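The bias formula above can be illustrated with simulated data; a minimal sketch (all numbers are made up, not from the lecture data):

```r
set.seed(123)
n  <- 10000
x2 <- rnorm(n)
x1 <- 0.8 * x2 + rnorm(n)          # x1 and x2 are correlated
y  <- 1 * x1 + 2 * x2 + rnorm(n)   # true coefficient of x1 is 1
coef(lm(y ~ x1))        # x2 omitted: the slope on x1 is biased upwards
coef(lm(y ~ x1 + x2))   # x2 included: the slope is close to 1
```

With these numbers the short regression converges to roughly 1 + 2 · cov(x1,x2)/var(x1) ≈ 1.98, which is what the bias formula predicts.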
data(Caschool)
attach(Caschool)
est1 <- lm(testscr ~ str)
est2 <- lm(testscr ~ str + elpct)
est3 <- lm(testscr ~ str + elpct + mealpct)
est4 <- lm(testscr ~ str + elpct + calwpct)
est5 <- lm(testscr ~ str + elpct + mealpct + calwpct)
mtable("(1)"=est1,"(2)"=est2,"(3)"=est3,"(4)"=est4,"(5)"=est5,
       summary.stats=c("R-squared","N"))

                 (1)        (2)        (3)        (4)        (5)
(Intercept)  698.933    686.032    700.150    697.999    700.392
            (10.461)    (8.812)    (5.641)    (7.006)    (5.615)
str           -2.280     -1.101     -0.998     -1.308     -1.014
             (0.524)    (0.437)    (0.274)    (0.343)    (0.273)
elpct                    -0.650     -0.122     -0.488     -0.130
                        (0.031)    (0.033)    (0.030)    (0.037)
mealpct                             -0.547                -0.529
                                   (0.024)               (0.039)
calwpct                                        -0.790     -0.048
                                              (0.070)    (0.062)
R-squared      0.051      0.426      0.775      0.629      0.775
N                420        420        420        420        420

If we cannot measure the variable:

- If the variable does not change over time: regression using panel data.
- If the variable is correlated with another variable which we can measure: regression using instruments.
- Randomized controlled experiment to eliminate the effect on average (if X is random, it is in particular independent of u, hence E(u|X = x) = 0).

6.2.2 Incorrect specification of the functional form

- Including interaction terms.
- Logarithmic/polynomial specification.
- In case of a discrete (e.g. binary) dependent variable: extending multiple regression models (probit, logit).

6.2.3 Errors in the variables

What, if we cannot measure our X precisely?

- Typos
- Imprecise recollections (When did you start working on this project?)
- Imprecise questions (What was your income last year?)
- Conscious lying (alcohol intake / sexual preferences)

Example: Let the true specification be

Yi = β0 + β1 Xi + ui

for this specification it holds that E(ui|Xi) = 0. Let Xi be the true value of X and X̃i the imprecisely measured value of X. We estimate

Yi = β0 + β1 Xi + ui
   = β0 + β1 X̃i + (β1 (Xi − X̃i) + ui)
   = β0 + β1 X̃i + vi

where vi = β1 (Xi − X̃i) + ui. If (Xi − X̃i) is correlated with X̃i, then X̃i is correlated with vi, and β̂1 is biased and inconsistent.
Example: Let X̃i = Xi + wi, where wi is a random variable with a mean value of zero and a variance of σ²_w, and wi is uncorrelated with Xi and ui. Then

vi = β1 (Xi − X̃i) + ui = β1 (Xi − Xi − wi) + ui = −β1 wi + ui

According to the assumption

cov(Xi, ui) = 0
cov(X̃i, wi) = cov(Xi + wi, wi) = σ²_w

hence

cov(X̃i, vi) = −β1 cov(X̃i, wi) + cov(X̃i, ui) = −β1 σ²_w

Recall

β̂1 = β1 + [ (1/n) Σ_{i=1}^{n} (Xi − X̄) ui ] / [ (1/n) Σ_{i=1}^{n} (Xi − X̄)² ]  →p  β1 + cov(ui, Xi)/σ²_X

In our example

β̂1  →p  β1 − β1 σ²_w/(σ²_X + σ²_w) = [σ²_X/(σ²_X + σ²_w)] β1

Since σ²_X/(σ²_X + σ²_w) < 1, β̂1 is biased towards zero.

Extreme case 1: wi is large enough, so that X̃i contains virtually no information: β̂1 →p 0.
Extreme case 2: wi = 0: β̂1 →p β1.

Remedy in case of errors in the variables:

- Measure X more precisely.
- We can estimate an instrumental variable regression, if there is a variable (instrument) which is correlated with Xi, but not with ui.
- Alternatively: develop a model of the measurement error and use it to correct the error (e.g. on the basis of σ²_w and σ²_X).

6.2.4 Sample selection bias

What, if the selection of the data (sampling) is affected by the dependent variable? Random sampling helps to avoid sampling bias.

Example: Explaining employees' wages by education. Unemployed individuals are not included in the sample.

Example: Performance of equity funds.

- Draw 100 equity funds today and observe their average performance over the past 10 years. Performance will be overestimated.
- Draw 100 equity funds ten years ago and observe their average performance over the past 10 years. Equity funds do not beat the market.

High performance in the past does not explain high performance in the future.

6.2.5 Simultaneous causality

Causality runs in both directions: from X to Y, but also from Y to X.
Example: What if the government gives additional funds to schools with low test scores, so that they can hire additional teachers? Then str affects testscr, but testscr also affects str:

Yi = β0 + β1 Xi + ui
Xi = γ0 + γ1 Yi + vi

Problem: Xi is correlated with the error term ui.

Remedy:
- Instrumental variable regressions
- Randomized controlled experiment

6.2.6 Heteroscedasticity and correlation of error terms

- Heteroscedasticity: use heteroscedasticity-consistent standard errors.
- Correlation of error terms across observations:
  - Does not occur, if observations are drawn randomly.
  - Does happen in panels (the same observational unit is drawn many times over time) and in time series: serial correlation, geographical effects.
  - OLS is still consistent, but the estimators for OLS standard errors are inconsistent.
  - Alternative formulae for standard errors of panel data, time series data and data which has correlated groups.

6.3 OLS and prediction

Unbiased estimation of β̂. Example: What happens to testscr, if str is reduced by two units?

Unbiased estimation of Ŷ. What is a likely value of testscr in a district with str=20?

testscr = 698.933 − 2.2798 str

We know that the coefficient of str is biased. However, for prediction purposes, this is irrelevant. Here, R² is important.

- Omitted variable bias is not a problem anymore.
- The interpretation of the coefficients is not important.
- What we care about is a good "fit".
- External validity is important: the model, which we estimated with data from the past, has to be valid for the future.

6.4 Comparison of Caschool and MCAS

Caschool (Source: California Department of Education):

distcod    district code
county     county
district   district
grspan     grade span of district
enrltot    total enrollment
teachers   number of teachers
calwpct    percent qualifying for CalWorks
mealpct    percent qualifying for reduced-price lunch
computer   number of computers
testscr    average test score (read.scr+math.scr)/2
compstu    computers per student
expnstu    expenditure per student
str        student teacher ratio
avginc     district average income
elpct      percent of English learners
readscr    average reading score
mathscr    average math score

MCAS (Source: Massachusetts Comprehensive Assessment System (MCAS), Massachusetts Department of Education, 1990 U.S. Census):

code       district code (numerical)
municipa   municipality (name)
district   district name
regday     spending per pupil, regular
specneed   spending per pupil, special needs
bilingua   spending per pupil, bilingual
occupday   spending per pupil, occupational
totday     spending per pupil, total
spc        students per computer
speced     special education students
lnchpct    eligible for free or reduced price lunch
tchratio   students per teacher
percap     per capita income
totsc4     4th grade score (math+english+science)
totsc8     8th grade score (math+english+science)
avgsalary  average teacher salary
pctel      percent english learners

Datensatz$variable denotes a variable (column) in a data set. Alternatively we can write Datensatz[,"variable"]
6.4 Comparison of Caschool and MCAS

Variables in Caschool (Source: California Department of Education):

distcod: district code
county: county
district: district name
grspan: grade span of district
enrltot: total enrollment
teachers: number of teachers
calwpct: percent qualifying for CalWorks
mealpct: percent qualifying for reduced-price lunch
computer: number of computers
testscr: average test score ((read.scr+math.scr)/2)
compstu: computers per student
expnstu: expenditure per student
str: student teacher ratio
avginc: district average income
elpct: percent of English learners
readscr: average reading score
mathscr: average math score

Variables in MCAS (Source: Massachusetts Comprehensive Assessment System (MCAS), Massachusetts Department of Education, 1990 U.S. Census):

code: district code (numerical)
municipa: municipality (name)
district: district name
regday: spending per pupil, regular
specneed: spending per pupil, special needs
bilingua: spending per pupil, bilingual
occupday: spending per pupil, occupational
totday: spending per pupil, total
spc: students per computer
speced: special education students
lnchpct: eligible for free or reduced price lunch
tchratio: students per teacher
percap: per capita income
totsc4: 4th grade score (math+english+science)
totsc8: 8th grade score (math+english+science)
avgsalary: average teacher salary
pctel: percent english learners

Datensatz$variable denotes a variable (column) in a data set. Alternatively we can write Datensatz[,"variable"], or Datensatz[,c("variable1","variable2")] for more than one column.
data(MCAS)
MCAS<-within(MCAS,{
type<-"MA"
str<-tchratio
testscr<-totsc4
elpct<-pctel
avginc<-percap
mealpct<-lnchpct})
Caschool$type<-"CA"

head(Caschool[,c("type","str","testscr","elpct","avginc","mealpct")])

  type      str testscr     elpct    avginc mealpct
1   CA 17.88991  690.80  0.000000 22.690001  2.0408
2   CA 21.52466  661.20  4.583333  9.824000 47.9167
3   CA 18.69723  643.60 30.000002  8.978000 76.3226
4   CA 17.35714  647.70  0.000000  8.978000 77.0492
5   CA 18.67133  640.85 13.857677  9.080333 78.4270
6   CA 21.40625  605.55 12.408759 10.415000 86.9565

head(MCAS[,c("type","str","testscr","elpct","avginc","mealpct")])

  type  str testscr     elpct avginc mealpct
1   MA 19.0     714 0.0000000 16.379    11.8
2   MA 22.6     731 1.2461059 25.792     2.5
3   MA 19.3     704 0.0000000 14.040    14.1
4   MA 17.9     704 0.3225806 16.111    12.1
5   MA 17.5     701 0.0000000 15.423    17.4
6   MA 15.7     714 3.9215686 11.144    26.8

merge merges two data sets. Let one data set contain the matriculation numbers and grades of students, and let another data set contain matriculation numbers and names. merge then matches the grades and names via the matriculation numbers. If the two data sets have nothing to match on, merge can also append one data set to another.

cama=merge(Caschool[,c("type","str","testscr","elpct","avginc",
  "mealpct")],MCAS[,c("type","str","testscr","elpct","avginc",
  "mealpct")],all=TRUE)

The new dataframe cama now contains data from both regions, CA and MA. First comes the CA data, followed by the MA data.
head(cama)

  type      str testscr    elpct avginc mealpct
1   CA 14.00000  635.60 0.000000 10.656 68.8235
2   CA 14.20176  656.50 0.000000 13.712 20.0000
3   CA 14.54214  695.30 3.765690 35.342  0.0000
4   CA 14.70588  666.85 2.500000 11.826 53.5032
5   CA 15.13898  698.25 2.807284 35.810  0.0000
6   CA 15.22436  646.40 0.000000 10.268 76.2774

tail(cama)

    type  str testscr     elpct avginc mealpct
635   MA 21.9     691  2.816901 15.905    27.1
636   MA 22.0     706  0.000000 14.471    18.3
637   MA 22.0     711  0.000000 15.603    12.4
638   MA 22.6     731  1.246106 25.792     2.5
639   MA 23.5     699  0.000000 16.189     6.8
640   MA 27.0     664 10.798017 15.581    70.0

aggregate can be used to split a data set into parts and apply a function to each of the parts.

aggregate(cama[,2:6],list(cama$type),mean)

  Group.1      str  testscr     elpct   avginc  mealpct
1      CA 19.64043 654.1565 15.768155 15.31659 44.70524
2      MA 17.34409 709.8273  1.117676 18.74676 15.31591

aggregate(cama[,2:6],list(cama$type),sd)

  Group.1      str  testscr    elpct   avginc  mealpct
1      CA 1.891812 19.05335 18.28593 7.225890 27.12338
2      MA 2.276666 15.12647  2.90094 5.807637 15.06007

aggregate(cama[,2:6],list(cama$type),length)

  Group.1 str testscr elpct avginc mealpct
1      CA 420     420   420    420     420
2      MA 220     220   220    220     220
subset selects a subset of a data set. If a function supports the parameter data, we can supply it with an appropriate subset. Many functions also have a parameter subset which selects a subset directly. ylim defines the scale of the y-axis, and pch defines which symbol is used to depict a point.

attach(cama)
plot(testscr ~ avginc,subset=(type=="MA"),col="blue",pch=3,ylim=c(600,750))
points(testscr ~ avginc,subset=(type=="CA"),col="red")
legend("bottomright",c("MASS","CA"),pch=c(3,1),col=c("blue","red"))

[Figure: testscr against avginc for MASS (blue crosses) and CA (red circles)]
We write a small function which produces a number of plots of this kind.

myPlot <- function (var) {
  plot(testscr ~ var,subset=(type=="MA"),ylim=c(600,750),
       col="blue",pch=3)
  points(testscr ~ var,subset=(type=="CA"),col="red",pch=1)
  legend("bottomright",c("MASS","CA"),pch=c(3,1),col=c("blue","red"))
}

myPlot(avginc)
myPlot(str)
myPlot(elpct)
myPlot(mealpct)

[Figure: four panels showing testscr against avginc, str, elpct and mealpct, each for MASS and CA]

Test scores and income in California and Massachusetts

(estC<- lm(testscr ~ avginc,data=subset(cama,type=="CA")))

Call:
lm(formula = testscr ~ avginc, data = subset(cama, type == "CA"))

Coefficients:
(Intercept)       avginc
    625.384        1.879

(estM<- lm(testscr ~ avginc,data=subset(cama,type=="MA")))

Call:
lm(formula = testscr ~ avginc, data = subset(cama, type == "MA"))

Coefficients:
(Intercept)       avginc
    679.387        1.624

myPlot(avginc)
abline(estC,col="red")
abline(estM,col="blue")

[Figure: testscr against avginc with the fitted regression lines for CA (red) and MASS (blue)]
If a function has the ... parameter in its definition, we can supply additional parameters when calling
the function. These additional parameters are substituted when we use ... within the function.
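A minimal sketch of this mechanism (the function shout and its argument mark are hypothetical, purely for illustration): everything collected in ... is forwarded unchanged to paste(), while named arguments of the function itself are kept out of ... .

```r
# All arguments in ... are passed on to paste(); mark is consumed by shout itself
shout <- function(..., mark="!") paste0(paste(...), mark)
shout("hello","world")   # "hello world!"
shout("hi", mark="?")    # "hi?"
```

In ePlot the same mechanism forwards e.g. col="red" from the call of ePlot to lines().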
ePlot <- function(model,data,...) {
  est<-lm(model,data)
  stdev<-sqrt(diag(hccm(est)))
  pvalue<-round(2*pnorm(-abs(coef(est)/stdev)),4)
  stars<-ifelse(pvalue<.001,"***",ifelse(pvalue<.01,"**",
         ifelse(pvalue<.05,"*",ifelse(pvalue<.1,".",""))))
  a<-as.data.frame(cbind(coef(est),stdev,pvalue))
  a$stars<-stars
  colnames(a)[1]<-"beta"
  print(a,digits=3)
  sum<-summary(est)
  cat("R2= ",round(sum$r.squared,2),"\n")
  or<-order(data$avginc)
  if (substr(model[2],1,4)=="log(") {
    lines(data$avginc[or],exp(fitted(est)[or]),...)
  } else
    lines(data$avginc[or],fitted(est)[or],...)
  est
}

myPlot(avginc)
est<-ePlot(testscr ~ avginc + I(avginc^2),data=subset(cama,type=="CA"))

                beta   stdev pvalue stars
(Intercept) 607.3017 2.92422      0   ***
avginc        3.8510 0.27110      0   ***
I(avginc^2)  -0.0423 0.00488      0   ***
R2=  0.56

est<-ePlot(testscr ~ avginc + I(avginc^2),data=subset(cama,type=="MA"))

                beta  stdev pvalue stars
(Intercept) 638.3711 8.0401      0   ***
avginc        5.4703 0.6893      0   ***
I(avginc^2)  -0.0808 0.0136      0   ***
R2=  0.48

est<-ePlot(testscr ~ avginc + I(avginc^2)+ I(avginc^3),
  data=subset(cama,type=="CA"),col="red")

                  beta    stdev pvalue stars
(Intercept) 600.078985 5.462310 0.0000   ***
avginc        5.018677 0.787290 0.0000   ***
I(avginc^2)  -0.095805 0.034052 0.0049    **
I(avginc^3)   0.000685 0.000437 0.1167
R2=  0.56

est<-ePlot(testscr ~ avginc + I(avginc^2)+ I(avginc^3),
  data=subset(cama,type=="MA"),col="red")

                 beta    stdev pvalue stars
(Intercept) 600.39853 26.96057 0.0000   ***
avginc       10.63538  3.63075 0.0034    **
I(avginc^2)  -0.29689  0.15614 0.0572     .
I(avginc^3)   0.00276  0.00214 0.1968
R2=  0.49

est<-ePlot(testscr ~ log(avginc),data=subset(cama,type=="CA"),
  col="blue")

             beta stdev pvalue stars
(Intercept) 557.8  3.86      0   ***
log(avginc)  36.4  1.41      0   ***
R2=  0.56

est<-ePlot(testscr ~ log(avginc),data=subset(cama,type=="MA"),
  col="blue")

             beta stdev pvalue stars
(Intercept) 600.8  9.36      0   ***
log(avginc)  37.7  3.17      0   ***
R2=  0.46

est<-ePlot(log(testscr) ~ log(avginc),data=subset(cama,type=="CA"),
  col="green")

              beta   stdev pvalue stars
(Intercept) 6.3363 0.00596      0   ***
log(avginc) 0.0554 0.00216      0   ***
R2=  0.56

est<-ePlot(log(testscr) ~ log(avginc),data=subset(cama,type=="MA"),
  col="green")

              beta   stdev pvalue stars
(Intercept) 6.4107 0.01343      0   ***
log(avginc) 0.0533 0.00454      0   ***
R2=  0.46

est<-ePlot(log(testscr) ~ avginc,data=subset(cama,type=="CA"),
  col="yellow")

               beta    stdev pvalue stars
(Intercept) 6.43936 0.002987      0   ***
avginc      0.00284 0.000183      0   ***
R2=  0.5

options(scipen=5)
est<-ePlot(log(testscr) ~ avginc,data=subset(cama,type=="MA"),
  col="yellow")

               beta    stdev pvalue stars
(Intercept) 6.52186 0.005388      0   ***
avginc      0.00229 0.000276      0   ***
R2=  0.38

[Figure: testscr against avginc with the different non-linear fits]

Multiple regression:

myPlot(avginc)
est<-ePlot(testscr ~ str,data=subset(cama,type=="CA"))

              beta  stdev pvalue stars
(Intercept) 698.93 10.461      0   ***
str          -2.28  0.524      0   ***
R2=  0.05

est<-ePlot(testscr ~ str,data=subset(cama,type=="MA"))

              beta stdev pvalue stars
(Intercept) 739.62 8.882 0.0000   ***
str          -1.72 0.516 0.0009   ***
R2=  0.07

myPlot(avginc)
est<-ePlot(testscr ~ str + elpct + mealpct + log(avginc),
  data=subset(cama,type=="CA"))

               beta  stdev pvalue stars
(Intercept) 658.552 8.7489 0.0000   ***
str          -0.734 0.2606 0.0048    **
elpct        -0.176 0.0342 0.0000   ***
mealpct      -0.398 0.0336 0.0000   ***
log(avginc)  11.569 1.8413 0.0000   ***
R2=  0.8

est<-ePlot(testscr ~ str + elpct + mealpct + log(avginc),
  data=subset(cama,type=="MA"))

               beta   stdev pvalue stars
(Intercept) 682.432 12.0943 0.0000   ***
str          -0.689  0.2779 0.0131     *
elpct        -0.411  0.3512 0.2422
mealpct      -0.521  0.0834 0.0000   ***
log(avginc)  16.529  3.3010 0.0000   ***
R2=  0.68

The effect of str is significant. Additional variables reduce the coefficient of str. The effect of avginc is significant.

myPlot(avginc)
est<-ePlot(testscr ~ str + elpct + mealpct + avginc + I(avginc^2) +
  I(avginc^3),data=subset(cama,type=="MA"))

                 beta    stdev pvalue stars
(Intercept) 744.02504 23.18585 0.0000   ***
str          -0.64091  0.27642 0.0204     *
elpct        -0.43712  0.35908 0.2235
mealpct      -0.58182  0.10781 0.0000   ***
avginc       -3.06669  2.53398 0.2262
I(avginc^2)   0.16369  0.09172 0.0743     .
I(avginc^3)  -0.00218  0.00104 0.0370     *
R2=  0.69
linearHypothesis(est,c("I(avginc^2)=0","I(avginc^3)=0"),vcov=hccm)

Linear hypothesis test

Hypothesis:
I(avginc^2) = 0
I(avginc^3) = 0

Model 1: restricted model
Model 2: testscr ~ str + elpct + mealpct + avginc + I(avginc^2) + I(avginc^3)

Note: Coefficient covariance matrix supplied.

  Res.Df Df     F   Pr(>F)
1    215
2    213  2 6.227 0.002354 **
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1

[Figure: testscr against avginc for MASS and CA with fitted curves]

The effect of str is significant.

myPlot(avginc)
est<-ePlot(testscr ~ str + I(str^2) + I(str^3) + elpct + mealpct +
  avginc + I(avginc^2) + I(avginc^3),data=subset(cama,type=="CA"))

                  beta      stdev pvalue stars
(Intercept) 330.079169 173.293815 0.0568     .
str          55.618325  26.486559 0.0357     *
I(str^2)     -2.914810   1.340721 0.0297     *
I(str^3)      0.049866   0.022437 0.0262     *
elpct        -0.196440   0.035054 0.0000   ***
mealpct      -0.411538   0.033874 0.0000   ***
avginc       -0.912858   0.587802 0.1204
I(avginc^2)   0.067430   0.022781 0.0031    **
I(avginc^3)  -0.000826   0.000262 0.0016    **
R2=  0.81

est<-ePlot(testscr ~ str + I(str^2) + I(str^3) + elpct + mealpct +
  avginc + I(avginc^2) + I(avginc^3),data=subset(cama,type=="MA"))

                 beta     stdev pvalue stars
(Intercept) 665.49605 116.07834 0.0000   ***
str          12.42598  20.27945 0.5401
I(str^2)     -0.68030   1.12956 0.5470
I(str^3)      0.01147   0.02081 0.5814
elpct        -0.43417   0.36722 0.2371
mealpct      -0.58722   0.11724 0.0000   ***
avginc       -3.38154   2.74013 0.2172
I(avginc^2)   0.17410   0.09819 0.0762     .
I(avginc^3)  -0.00229   0.00111 0.0398     *
R2=  0.69

linearHypothesis(est,c("str=0","I(str^2)","I(str^3)"),vcov=hccm)

Linear hypothesis test

Hypothesis:
str = 0
I(str^2) = 0
I(str^3) = 0

Model 1: restricted model
Model 2: testscr ~ str + I(str^2) + I(str^3) + elpct + mealpct + avginc +
    I(avginc^2) + I(avginc^3)

Note: Coefficient covariance matrix supplied.

  Res.Df Df      F  Pr(>F)
1    214
2    211  3 2.3364 0.07478 .
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
linearHypothesis(est,c("I(str^2)=0","I(str^3)=0"),vcov=hccm)

Linear hypothesis test

Hypothesis:
I(str^2) = 0
I(str^3) = 0

Model 1: restricted model
Model 2: testscr ~ str + I(str^2) + I(str^3) + elpct + mealpct + avginc +
    I(avginc^2) + I(avginc^3)

Note: Coefficient covariance matrix supplied.

  Res.Df Df      F Pr(>F)
1    213
2    211  2 0.3396 0.7124

linearHypothesis(est,c("I(avginc^2)=0","I(avginc^3)=0"),vcov=hccm)

Linear hypothesis test

Hypothesis:
I(avginc^2) = 0
I(avginc^3) = 0

Model 1: restricted model
Model 2: testscr ~ str + I(str^2) + I(str^3) + elpct + mealpct + avginc +
    I(avginc^2) + I(avginc^3)

Note: Coefficient covariance matrix supplied.

  Res.Df Df      F   Pr(>F)
1    213
2    211  2 5.7043 0.003866 **
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1

str has a significant effect in California, but not in Massachusetts.

aggregate(elpct,list(type),median)

  Group.1        x
1      CA 8.777634
2      MA 0.000000

cama$HiEL=cama$elpct>0
est<-ePlot(testscr ~ str + HiEL + HiEL:str + mealpct + avginc +
  I(avginc^2) + I(avginc^3),data=subset(cama,type=="CA"))

                  beta     stdev pvalue stars
(Intercept)  658.16110 16.404309 0.0000   ***
str            1.35471  0.810345 0.0946     .
HiELTRUE      36.11583 16.262280 0.0264     *
mealpct       -0.50670  0.027027 0.0000   ***
avginc        -0.90724  0.616350 0.1410
I(avginc^2)    0.05912  0.023531 0.0120     *
I(avginc^3)   -0.00068  0.000272 0.0123     *
str:HiELTRUE  -2.18763  0.864613 0.0114     *
R2=  0.8

est<-ePlot(testscr ~ str + HiEL + HiEL:str + mealpct + avginc +
  I(avginc^2) + I(avginc^3),data=subset(cama,type=="MA"))

                  beta    stdev pvalue stars
(Intercept)  759.91422 25.28938 0.0000   ***
str           -1.01768  0.38182 0.0077    **
HiELTRUE     -12.56073 10.22789 0.2194
mealpct       -0.70851  0.09894 0.0000   ***
avginc        -3.86651  2.71955 0.1551
I(avginc^2)    0.18412  0.09930 0.0637     .
I(avginc^3)   -0.00234  0.00115 0.0414     *
str:HiELTRUE   0.79861  0.58020 0.1687
R2=  0.69

linearHypothesis(est,c("str=0","str:HiELTRUE=0"),vcov=hccm)

Linear hypothesis test

Hypothesis:
str = 0
str:HiELTRUE = 0

Model 1: restricted model
Model 2: testscr ~ str + HiEL + HiEL:str + mealpct + avginc + I(avginc^2) +
    I(avginc^3)

Note: Coefficient covariance matrix supplied.

  Res.Df Df      F Pr(>F)
1    214
2    212  2 3.7663 0.0247 *
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1

linearHypothesis(est,c("I(avginc^2)=0","I(avginc^3)=0"),vcov=hccm)

Linear hypothesis test

Hypothesis:
I(avginc^2) = 0
I(avginc^3) = 0

Model 1: restricted model
Model 2: testscr ~ str + HiEL + HiEL:str + mealpct + avginc + I(avginc^2) +
    I(avginc^3)

Note: Coefficient covariance matrix supplied.

  Res.Df Df      F  Pr(>F)
1    214
2    212  2 3.2201 0.04191 *
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1

There is no noteworthy interaction between HiEL and str in Massachusetts, but there is one in California.

est<-ePlot(testscr ~ str + mealpct + avginc + I(avginc^2) +
  I(avginc^3),data=subset(cama,type=="MA"))

                 beta    stdev pvalue stars
(Intercept) 747.36389 21.67952 0.0000   ***
str          -0.67188  0.27679 0.0152     *
mealpct      -0.65308  0.07859 0.0000   ***
avginc       -3.21795  2.46635 0.1920
I(avginc^2)   0.16479  0.09113 0.0706     .
I(avginc^3)  -0.00216  0.00106 0.0415     *
R2=  0.68

linearHypothesis(est,c("I(avginc^2)=0","I(avginc^3)=0"),vcov=hccm)

Linear hypothesis test

Hypothesis:
I(avginc^2) = 0
I(avginc^3) = 0

Model 1: restricted model
Model 2: testscr ~ str + mealpct + avginc + I(avginc^2) + I(avginc^3)

Note: Coefficient covariance matrix supplied.

  Res.Df Df      F  Pr(>F)
1    216
2    214  2 4.2776 0.01508 *
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1

The effect of str is significant.

Discussion: The data set from California is larger, so it is easier to find significant results there. We compare the mean values and standard deviations of testscr in California and Massachusetts:

(mmean<-aggregate(cama$testscr,list(type),mean))
  Group.1        x
1      CA 654.1565
2      MA 709.8273

colnames(mmean)<-c("type","testmean")
(msd=aggregate(cama$testscr,list(type),sd))

  Group.1        x
1      CA 19.05335
2      MA 15.12647

colnames(msd)<-c("type","testsd")
cama2=merge(merge(cama,mmean),msd)
head(cama2)

  type      str testscr    elpct avginc mealpct  HiEL testmean   testsd
1   CA 14.00000  635.60 0.000000 10.656 68.8235 FALSE 654.1565 19.05335
2   CA 14.20176  656.50 0.000000 13.712 20.0000 FALSE 654.1565 19.05335
3   CA 14.54214  695.30 3.765690 35.342  0.0000  TRUE 654.1565 19.05335
4   CA 14.70588  666.85 2.500000 11.826 53.5032  TRUE 654.1565 19.05335
[ reached getOption("max.print") -- omitted 2 rows ]

tail(cama2)

    type  str testscr    elpct avginc mealpct  HiEL testmean   testsd
635   MA 21.9     691 2.816901 15.905    27.1  TRUE 709.8273 15.12647
636   MA 22.0     706 0.000000 14.471    18.3 FALSE 709.8273 15.12647
637   MA 22.0     711 0.000000 15.603    12.4 FALSE 709.8273 15.12647
638   MA 22.6     731 1.246106 25.792     2.5  TRUE 709.8273 15.12647
[ reached getOption("max.print") -- omitted 2 rows ]

cama2$testnorm=(cama2$testscr - cama2$testmean) / cama2$testsd
detach(cama)
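The group-wise standardization above (merge the group means and standard deviations back into the data, then form z-scores) can also be written more compactly with base R's ave(); a small sketch with made-up numbers:

```r
# z-scores within groups: subtract the group mean, divide by the group sd
x <- c(1, 2, 3, 10, 20, 30)
g <- rep(c("A", "B"), each=3)
z <- (x - ave(x, g)) / ave(x, g, FUN=sd)
z   # -1 0 1 within each group
```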
est<-ePlot(testnorm ~ str*type,data=cama2)

               beta  stdev pvalue stars
(Intercept) 2.35005 0.5490  0.000   ***
str        -0.11965 0.0275  0.000   ***
typeMA     -0.38040 0.8038  0.636
str:typeMA  0.00609 0.0438  0.889
R2=  0.06

est<-ePlot(testscr ~ str*type + mealpct + avginc + I(avginc^2) +
  I(avginc^3),data=cama2)

                  beta    stdev pvalue stars
(Intercept) 697.172818 7.309650 0.0000   ***
str          -0.747646 0.264140 0.0046    **
typeMA       38.005721 7.229002 0.0000   ***
mealpct      -0.542059 0.024022 0.0000   ***
avginc       -1.279304 0.587298 0.0294     *
I(avginc^2)   0.075675 0.022051 0.0006   ***
I(avginc^3)  -0.000909 0.000257 0.0004   ***
str:typeMA   -0.070309 0.378909 0.8528
R2=  0.92

[Figure: testscr against avginc for MASS and CA with fitted curves]
6.4.1 Internal validity

Omitted variables: we control for
- income
- some characteristics of the students (language)
What is missing? E.g. str could be correlated with
- quality of teachers
- extracurricular activities
- attention of parents
Alternative: an experiment in which pupils are randomly assigned to groups of different sizes.

Functional form: different non-linear specifications lead to similar results in the example. Non-linearities are not very large. ✓

Errors in variables: str is an average over a district, so the true variance of student-teacher ratios is underestimated by str. Hence the estimated coefficient of str will be underestimated as well. Ideally we would like to have data for individual students.

Selection bias: both data sets are based on a full census. ✓

Simultaneous causality: testscr could affect str (e.g. through compensatory measures).
- Massachusetts: no such measures. ✓
- California: compensatory funding, but independent of students' successes. ✓

Heteroscedasticity and correlation of error terms:
- Heteroscedasticity: heteroscedasticity-consistent variance-covariance matrices. ✓
- Correlation of error terms: no random drawing of observations.

6.4.2 External validity

Comparison of California and Massachusetts.

6.4.3 Result

Costs (classrooms, teachers' pay)