BOOK PREFACE
This book provides a detailed treatment of microeconometric analysis, the analysis of individual-level data on the economic behavior of individuals or firms. This usually entails regression methods
applied to cross-section and panel data.
The book aims to provide the practitioner with a comprehensive coverage of statistical methods
and their application in modern applied microeconometrics research. These methods include
nonlinear modelling, inference under minimal distributional assumptions, identifying and
measuring causation rather than mere association, and correcting for departures from simple
random sampling. Many of these features are of relevance to individual-level data analysis
throughout the social sciences.
The ambitious agenda has determined the characteristics of this book. First, although oriented to
the practitioner, the book is relatively advanced in places. A cookbook approach is inadequate because,
when two or more complications occur simultaneously, a common situation, the practitioner must
know enough to adapt available methods. Second, the book provides considerable
coverage of practical data problems; see especially the last three chapters. Third, the book includes
substantial empirical examples in many chapters, to illustrate some of the methods covered. Finally,
the book is unusually long. Despite this length we have been space-constrained. We had intended to
include even more empirical examples. And abbreviated presentations will at times fail to recognize
the accomplishments of researchers who have made substantive contributions.
The book assumes a basic understanding of the linear regression model with matrix algebra. It is
written at the mathematical level of the first-year economics Ph.D. sequence, comparable to Greene
(2003). We have two types of readers in mind. First, the book can be used as a course text for a
microeconometrics course, typically taught in the second year of the Ph.D., or for data-oriented
microeconomics field courses such as labor economics, public economics and industrial
organization. Second, the book can be used as a reference work for graduate students and applied
researchers who despite training in microeconometrics will inevitably have gaps that they wish to
fill.
For instructors using this book as an econometrics course text it is best to introduce the basic
nonlinear cross-section and linear panel data models as early as possible, initially skipping many of
the methods chapters. The key methods chapter (chapter 5) covers maximum likelihood and
nonlinear least squares estimation. ML and NLS provide adequate background for the most
commonly-used nonlinear cross-section models (chapters 14-17, 20), basic linear panel data models
(chapter 21) and treatment evaluation methods (chapter 25). Generalized method of moments
estimation (chapter 6) is needed especially for advanced linear panel data methods (chapter 22).
For readers using this book as a reference work, the chapters have been written to be as self-contained as possible. The notable exception is that some command of general estimation results in
chapter 5, and occasionally chapter 6, will be necessary. Most models chapters are structured to
begin with a discussion and example that is accessible to a wide audience.
The web-site www.econ.ucdavis.edu/faculty/cameron/mmabook provides all the data and
computer programs used in this book, and related materials useful for instructional purposes.
This project has been long and arduous, and at times seemingly without an end. Its completion
has been greatly aided by our colleagues, friends, and graduate students. We would like to thank
especially the following for reading and commenting on specific chapters: Bijan Borah, Kurt
Brännäs, Pian Chen, Tim Cogley, Partha Deb, David Drukker, Massimiliano De Santis, Jeff Gill,
Tue Gorgens, Shiferaw Gurmu, Lu Ji, Oscar Jorda, Roger Koenker, Chenghui Li, Tong Li, Doug
Miller, Murat Munkin, Jim Prieger, Ahmed Rahmen, Sunil Sapra, Haruki Seitani, Yacheng Sun,
Xiaoyong Zheng, and David Zimmer. We thank Rajeev Dehejia, Bronwyn Hall, Cathy Kling,
Jeffrey Kling, Will Manning, Brian McCall and Jim Ziliak for making their data available for
empirical illustrations. We thank our respective departments for facilitating our collaboration, and
for the production and distribution of the draft manuscript at various stages. We benefitted from the
comments of two anonymous reviewers. Guidance, advice and encouragement from our CUP
editor, Scott Parris, have been invaluable.
Our interest in econometrics owes much to the training and environments we encountered as
students and in the initial stages of our academic careers. The first author thanks The Australian
National University, Stanford University, especially Takeshi Amemiya and Tom MaCurdy, and The
Ohio State University. The second author thanks the London School of Economics and The
Australian National University.
Our interest in writing a book oriented to the practitioner owes much to our exposure to the
research of graduate students and colleagues at our respective institutions, UC-Davis and IU-Bloomington.
Finally, we would like to thank our families for their patience and understanding without which
completion of this project would not have been possible.
A. Colin Cameron
Davis, California
Pravin K. Trivedi
Bloomington, Indiana

TABLE OF CONTENTS
I: PRELIMINARIES
1. Overview
2. Causal and Noncausal Models
3. Microeconomic Data Structures
II: CORE METHODS
4. Linear Models
5. ML and NLS Estimation
6. GMM and Systems Estimation
7. Hypothesis Tests
8. Specification Tests and Model Selection
9. Semiparametric Methods
10. Numerical Optimization
III: SIMULATION-BASED METHODS
11. Bootstrap Methods
12. Simulation-based Methods
13. Bayesian Methods
IV: CROSS-SECTION DATA MODELS
14. Binary Outcome Models
15. Multinomial Models
16. Tobit and Selection Models
17. Transition Data: Survival Analysis
18. Mixture Models and Unobserved Heterogeneity
19. Models of Multiple Hazards
20. Count Data Models
V: PANEL DATA MODELS
21. Linear Panel Models: Basics
22. Linear Panel Models: Extensions
23. Nonlinear Panel Models
VI: FURTHER TOPICS
24. Stratified and Clustered Samples
25. Treatment Evaluation
26. Measurement Error Models
27. Missing Data and Imputation
APPENDICES
A. Asymptotic Theory
B. Making Pseudo-Random Draws

PART 1 (chapters 1-3)

Part 1 covers the essential components of microeconometric analysis -- an economic specification, a
statistical model and a data set.
Chapter 1 discusses the distinctive aspects of microeconometrics, and provides an outline of the
book. It emphasizes that discreteness of data, and nonlinearity and heterogeneity of behavioral
relationships are key aspects of disaggregated microeconometric models. It concludes by presenting
the notation and conventions used throughout the book.
Chapters 2 and 3 set the scene for the remainder of the book by introducing the reader to key model
and data concepts that shape the analyses of later chapters.
A key distinction in econometrics is between essentially descriptive models and data summaries at
various levels of statistical sophistication and models that go beyond associations and attempt to
estimate causal parameters. The classic definitions of causality in econometrics derive from the
Cowles Commission simultaneous equations models that draw sharp distinctions between
exogenous and endogenous variables, and between structure and reduced form parameters.
Although reduced form models are very useful for prediction, knowledge of structural or causal
parameters is essential for policy analyses. Identification of structural parameters within the
simultaneous equations framework poses numerous conceptual and practical difficulties. An
alternative approach, based on the potential outcome model, also attempts to identify causal
parameters, but it does so by posing limited questions within a more manageable framework.
Chapter 2 attempts to provide an overview of the fundamental issues that arise in these alternative
frameworks. Readers who initially find this material challenging should return to this chapter later
after gaining greater familiarity with specific models covered later in the book.
The empirical researcher's ability to identify causal parameters depends not only on the statistical
tools and models but also on the type of data available. An experimental framework provides a
standard for establishing causal connections. However, observational, not experimental, data form
the basis of much of econometric inference. Chapter 3 surveys the pros and cons of three main types
of data available: observational data, data from social experiments, and those from natural
experiments. The potential as well as the difficulties of conducting causal inference based on each
type of data are reviewed.

PART 2 (chapters 4-10)

Part 2 presents the core methods -- least squares, method of moments, and maximum likelihood -- of
estimation and inference in nonlinear regression models that are central in microeconometrics.
Both traditional topics and more modern topics such as quantile regression, sequential
estimation, empirical likelihood, bootstrap, and semi- and nonparametric regression are covered. In
general the discussion is at a level intended to provide enough background and detail to enable the
practitioner to read and comprehend articles in the leading econometrics journals. We presume
prior familiarity with linear regression analysis.
Chapter 4 begins with the linear regression model. It then covers at an introductory level quantile
regression, which models distributional features other than the conditional mean. It provides a
lengthy expository treatment of instrumental variables estimation, a major semiparametric method
of causal inference. Chapter 5 presents the most commonly-used estimation methods for nonlinear
models, beginning with the quite general topic of m-estimation, before specialization to maximum
likelihood and nonlinear least squares regression. Chapter 6 provides a comprehensive treatment of
generalized method of moments, which is a quite general estimation framework, applicable both in
linear and nonlinear, and single- and multi-equation settings. The chapter emphasizes the special
case of instrumental variables estimation.
Chapter 7 covers both the classical and bootstrap approaches to hypothesis testing, while Chapter 8
presents relatively more modern methods of model selection and specification analysis. Because of
their importance, the bootstrap methods also get a more detailed stand-alone treatment in Chapter
11. As much as possible testing methods are presented in a unified manner in these chapters, but
specific applications occur throughout the book.
Chapter 9 is a stand-alone chapter that presents nonparametric and semiparametric estimation
methods that place a flexible structure on the econometric model. Chapter 10 presents the
computational methods used to compute the nonlinear estimators presented in chapters 5 and 6.
This material becomes especially relevant to the practitioner if an estimator is not automatically
computed by an econometrics package.
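As a minimal illustration of the kind of iterative computation that Chapter 10 refers to, the sketch below applies the Newton-Raphson method to an intercept-only Poisson log-likelihood. This model is chosen here only because its maximizer has a known closed form (the log of the sample mean), so the iterations can be checked; it is not an example taken from the book, and the function name and the data are hypothetical.

```python
import math

def newton_raphson_poisson(y, theta0=0.0, tol=1e-10, max_iter=100):
    """Maximize an intercept-only Poisson log-likelihood by Newton-Raphson.

    Illustrative assumption: y_i ~ Poisson(exp(theta)).
    Log-likelihood up to a constant: sum(y) * theta - n * exp(theta).
    """
    n = len(y)
    total = sum(y)
    theta = theta0
    for _ in range(max_iter):
        gradient = total - n * math.exp(theta)  # first derivative in theta
        hessian = -n * math.exp(theta)          # second derivative, always negative
        step = gradient / hessian
        theta -= step                           # Newton-Raphson update
        if abs(step) < tol:                     # stop once the update is negligible
            break
    return theta

y = [0, 2, 1, 3, 0, 1, 4, 2, 1, 1]              # made-up count data
theta_hat = newton_raphson_poisson(y)
closed_form = math.log(sum(y) / len(y))         # known maximizer, used as a check
print(theta_hat, closed_form)
```

The two printed values should agree to many decimal places. In models without a closed-form maximizer the same update applies, with the gradient and Hessian replaced by those of the model at hand.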

PART 3 (chapters 11-13)

Part 1 emphasized that: (1) Microeconometric models are often nonlinear; (2) they are frequently
estimated using large and heterogeneous data sets; and (3) the data often come from surveys that
are complex and subject to a variety of sampling biases. A realistic depiction of the economic
phenomena in such settings often requires the use of models that are difficult to estimate and
analyze. Advances in computing hardware and software now make it feasible to tackle such tasks.
Part 3 presents modern, computer-intensive, simulation-based methods of inference that mitigate
some of these difficulties. The background required to cover this material varies somewhat with the
chapter but the essential base is least squares and maximum likelihood estimation.
Chapter 11 presents bootstrap methods for statistical inference. These methods have the attraction
of providing a simple way to obtain standard errors when the formulae from asymptotic theory are
complex, as is the case for some two-step estimators. Furthermore, if implemented appropriately, a
bootstrap can lead to a more refined asymptotic theory that may then lead to better statistical
inference in small samples.
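The idea of obtaining standard errors by resampling can be sketched in a few lines. The example below is a hedged illustration rather than one of the book's programs: it resamples simulated data with replacement, recomputes a statistic on each resample, and reports the standard deviation of the replicates. The sample mean is used because its standard error has a simple formula to compare against; the median is included as a statistic whose analytical standard error is less convenient. All names and settings (`bootstrap_se`, the number of replications, the seeds) are assumptions made for this sketch.

```python
import random
import statistics

def bootstrap_se(data, statistic, n_boot=2000, seed=12345):
    """Standard deviation of a statistic across resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(data)
    replicates = []
    for _ in range(n_boot):
        resample = [data[rng.randrange(n)] for _ in range(n)]  # same size as the data
        replicates.append(statistic(resample))
    return statistics.stdev(replicates)

data_rng = random.Random(0)
sample = [data_rng.gauss(0.0, 1.0) for _ in range(200)]     # simulated data

se_boot = bootstrap_se(sample, statistics.mean)             # bootstrap SE of the mean
se_formula = statistics.stdev(sample) / len(sample) ** 0.5  # textbook SE of the mean
se_median = bootstrap_se(sample, statistics.median)         # no simple formula needed
print(se_boot, se_formula, se_median)
```

For the mean, the bootstrap and formula-based standard errors should be close. The sketch covers only the standard error calculation, not the refinements mentioned above.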
Chapter 12 presents simulation-based estimation methods. These methods permit estimation in
situations where standard computational methods may not permit calculation of an estimator,
because of the presence of an integral over a probability distribution for which there is no
closed-form solution.
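A small sketch may make the role of simulation concrete. Suppose, purely as an illustrative assumption, that a choice probability is the expectation of a logistic function of a + sigma*u with u standard normal, an integral with no closed-form solution. The code below approximates it by averaging over random draws and compares the result with a simple trapezoid-rule benchmark. The function names, parameter values, and number of draws are hypothetical and are not taken from the book's programs.

```python
import math
import random

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def simulated_probability(a, sigma, n_draws=20000, seed=2024):
    """Monte Carlo estimate of E[logistic(a + sigma * u)] with u ~ N(0, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        total += logistic(a + sigma * rng.gauss(0.0, 1.0))
    return total / n_draws

def quadrature_probability(a, sigma, lower=-8.0, upper=8.0, n_steps=4000):
    """Trapezoid-rule benchmark for the same integral, used only as a check."""
    h = (upper - lower) / n_steps
    total = 0.0
    for i in range(n_steps + 1):
        u = lower + i * h
        density = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
        weight = 0.5 if i in (0, n_steps) else 1.0   # half weight at the endpoints
        total += weight * logistic(a + sigma * u) * density
    return total * h

p_sim = simulated_probability(a=0.5, sigma=1.0)
p_quad = quadrature_probability(a=0.5, sigma=1.0)
print(p_sim, p_quad)
```

With a one-dimensional integral, quadrature is a practical alternative. Simulation becomes more attractive as the dimension of the integral grows.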
Chapter 13 surveys Bayesian methods that provide an approach to estimation and inference that is
quite different from the classical approach used in other chapters of this book. Despite this different
approach, the Bayesian toolkit can also be adapted to permit classical estimation and inference for
problems that are otherwise intractable.

PART 4 (chapters 14-20)

Part 4, consisting of chapters 14 to 20, covers the core nonlinear limited dependent variable models
for cross-section data, defined by the range of values taken by the dependent variable. Topics
covered include models for binary and multinomial data, duration data and count data. The
complications of censoring, truncation and sample selection are also studied.
Chapters 14-15 cover models for binary and multinomial data that are standard in the analysis of
discrete choice and outcomes. Maximum likelihood methods are dominant. Different
parameterizations for the conditional probabilities in these models lead to different models, notably
logit and probit models, which are well-established. Recent literature has focused on less restrictive
modeling with more flexible functional forms for conditional probabilities and on accommodating
individual unobserved heterogeneity. These objectives motivate the use of semiparametric methods
and simulation-based estimation methods.
Censoring, truncation or sample selection generate several empirically important classes of models
that are analyzed in Chapter 16. The long-established Tobit model is central to this literature, but its
estimation and inference rely on strong distributional assumptions to permit consistent estimation.
We also examine the newer semiparametric methods that require weaker assumptions.
Chapters 17-19 consider duration models in which the focus is on either the determinants of spell
lengths, such as length of an unemployment spell, or on modeling the hazard rate of transitions from
one initial state to another. The relative importance of state dependence and unobserved
heterogeneity as determinants of the average length of spell is a central issue, whose resolution
raises fundamental questions about alternative modeling approaches. The analysis covers both
discrete and continuous time models, and both parametric and semiparametric formulations,
including the standard models like the exponential, the Weibull, and the proportional hazards
model. Chapter 18 covers formulation and interpretation of richer models that incorporate
unobserved heterogeneity. Chapter 19 deals with models with several types of events using the
competing risks formulation and models of multiple spells.
Chapter 20 covers the analysis of event counts of the kind very common in health economics. There
are many strong connections and parallels between count data models and duration models because
of their common foundation in stochastic processes. We analyze the widely-used Poisson and
negative binomial regression models, together with important variants such as the two-part or
hurdle model, zero-inflated models, latent class models, and endogenous regressor models, all of
which accommodate different facets of the event processes.

PART 5 (chapters 21-23)

Cross section models have certain inherent limitations. They are predominantly equilibrium models
that generally do not shed light on intertemporal dependence of events. They also cannot
satisfactorily resolve fundamental issues about the sources of persistence in behavior. Such
persistence may be behavioral, i.e. arising from true state dependence, or it may be spurious, being
an artifact of the inability to control for heterogeneous behavior in the population. Because panel
data, also called longitudinal data, contain periodically repeated observations of the same subjects,
they have a large potential for resolving issues that cross section models cannot satisfactorily
handle. Chapters 21 through 23 present methods for panel data. We progress systematically from
linear models for continuous data in Chapter 21 to nonlinear panel data models for limited
dependent variables in Chapter 23. Both fixed effects and random effects models are considered. A
persistent theme through these three chapters is the importance of using robust methods of
inference.
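To illustrate why panel data help with unobserved heterogeneity, the following sketch simulates a linear panel in which the regressor is correlated with an individual-specific effect, and then compares pooled OLS with the fixed effects (within) estimator computed from individual-demeaned data. This is a hedged illustration under assumed settings: the data generating process, sample sizes, and function names are invented for the sketch and are not the book's application.

```python
import random

def within_estimator(panel):
    """Slope from OLS on individual-demeaned data; panel maps id -> list of (x, y)."""
    sxy = 0.0
    sxx = 0.0
    for observations in panel.values():
        x_bar = sum(x for x, _ in observations) / len(observations)
        y_bar = sum(y for _, y in observations) / len(observations)
        for x, y in observations:
            sxy += (x - x_bar) * (y - y_bar)   # demeaning removes the individual effect
            sxx += (x - x_bar) ** 2
    return sxy / sxx

def pooled_ols(panel):
    """Slope from OLS on the pooled data, ignoring individual effects."""
    rows = [obs for observations in panel.values() for obs in observations]
    x_bar = sum(x for x, _ in rows) / len(rows)
    y_bar = sum(y for _, y in rows) / len(rows)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in rows)
    sxx = sum((x - x_bar) ** 2 for x, _ in rows)
    return sxy / sxx

rng = random.Random(7)
true_beta = 1.0                                 # assumed slope in the simulation
panel = {}
for i in range(500):                            # 500 individuals, 5 periods each
    alpha = rng.gauss(0.0, 1.0)                 # unobserved individual effect
    obs = []
    for t in range(5):
        x = alpha + rng.gauss(0.0, 1.0)         # regressor correlated with alpha
        y = alpha + true_beta * x + rng.gauss(0.0, 1.0)
        obs.append((x, y))
    panel[i] = obs

beta_fe = within_estimator(panel)
beta_pooled = pooled_ols(panel)
print(beta_fe, beta_pooled)
```

In this design the within estimate should be close to the assumed slope of 1, while pooled OLS is pushed upward because it attributes part of the individual effect to the regressor. The sketch reports point estimates only and does not address the robust inference emphasized above.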
Chapter 21, which reviews the key general results for linear panel data regression models, can be
read easily by those with a good grasp of linear regression; it does not require the material covered
in Parts 2 to 4. We recommend that even those who are interested in more advanced material should
quickly peruse the contents of this chapter first to gain familiarity with key concepts and
definitions.
Chapter 22 covers important extensions of Chapter 21, especially to dynamic panels which allow
for Markovian dependence structure of current variables. The analysis is in the GMM framework
that is currently favored by many practitioners in this area. The analysis here is at times intricate,
involving many issues of detail. A strong grasp of GMM will be helpful in absorbing the main
results of this chapter.
The results of Chapters 21 and 22 do not extend to nonlinear panel models of Chapter 23 in a
general and unified fashion. There are relatively fewer general results for limited dependent variable
panel models. Despite this, in Chapter 23 we begin by presenting an analysis of some general issues
and approaches. Later sections can be treated as panel data extensions of the counterpart cross
section models in Part 4. These analyze four categories of models for binary, count, censored, and
duration data, respectively. These should be accessible to a suitably prepared reader familiar with
the parallel cross section models.

PART 6 (chapters 24-27)

Frequently in empirical work data present not one but multiple complications that the analysis must
simultaneously deal with. Examples of such complications include departures from simple random
sampling, clustering of observations, measurement errors, and missing data. When they occur,
individually or jointly, and in the context of any of the models developed in Parts 4 and 5,
identification of parameters of interest will be compromised. Three chapters in Part 6 -- Chapters
24, 26, and 27 -- analyze the consequences of such complications and then present methods that
attempt to overcome the consequences. The methods are illustrated using examples taken from the
earlier parts of the book. This feature provides points of connection between Part 6 and the rest of the
book.
Chapter 24, which deals with features of data from complex surveys, complements various topics
covered in Chapters 3, 5, and 16. Chapter 26, which deals with measurement errors, complements
topics in Chapters 4, 14, and 20. Chapter 27 is a stand-alone chapter on missing data and multiple
imputation, but its use of the EM algorithm and Gibbs sampler also gives it points of contact with
Chapters 10 and 13, respectively.
Chapter 25 deals with the important topic of treatment evaluation. Treatment is a broad term that
refers to the impact of one variable, e.g. schooling, on some outcome variable, e.g. income.
Treatment variables may be exogenously assigned, or may be endogenously chosen. The topic of
treatment evaluation concerns the identifiability of the impact of treatment on outcome, as measured
by either the marginal effects or certain functions of marginal effects. A variety of methods are used,
including instrumental variables regression and propensity score matching. The problem of
treatment evaluation can arise in the context of any model considered in parts 4 and 5. This chapter
may also be read on its own, but it does presume familiarity with many other topics covered in the
book, including instrumental variables and selection models, which is why it is placed in the last
part.

GUIDE FOR INSTRUCTORS AND OTHER READERS

The book assumes a basic understanding of the linear regression model with matrix algebra. It is
written at the mathematical level of the first-year economics Ph.D. sequence, comparable to Greene
(2000).
While some of the material in this book is covered in a first-year sequence, most of the material in
this book appears in second-year econometrics Ph.D. courses or in data-oriented microeconomics
field courses such as labor economics, public economics or industrial organization. This book is
intended to be used both as an econometrics text and as an adjunct for such field courses. More
generally, the book is intended to be useful as a reference work for applied researchers in
economics, in related social sciences such as sociology and political science, and in epidemiology.
The models chapters have been written to be as self-contained as possible, to minimize the amount
of background material in the methods chapters that needs to be read. For the specific models
presented in parts four and five (chapters 14-23) it will generally be sufficient to read the relevant
chapter in isolation, except that some command of the general estimation results in chapter 5 and in
some cases chapter 6 will be necessary. Most chapters are structured to begin with a discussion and
example that is accessible to a wide audience.
For instructors using this book as a course text it is best to introduce the basic nonlinear cross-section and linear panel data models as early as possible, skipping many of the methods chapters.
The most commonly-used nonlinear cross-section models are presented in chapters 14-16, and
require knowledge of maximum likelihood and least squares estimation, presented in chapter five.
Chapter twenty-one on linear panel data models requires even less preparation, essentially just
chapter four.
Table 1.2 provides an outline for a one-quarter second-year graduate course taught at the University
of California - Davis, immediately following the required first-year statistics and econometrics
sequence. A quarter provides sufficient time to cover the basic results given in the first half of the
chapters in this outline. With additional time one can go into further detail or cover a subset of
chapters eleven to thirteen on computationally-intensive estimation methods (simulation-based
estimation, the bootstrap, which is also briefly presented in chapter seven, and Bayesian methods);
additional cross-section models (durations and counts) presented in chapters seventeen to twenty;
and additional panel data models (linear model extensions and nonlinear models) given in chapters
twenty-two and twenty-three.
Outline of a twenty-lecture ten-week course:
Lectures | Chapter | Topic
1-3 | 4 | Review of linear models and asymptotic theory
4-7 | 5 | Estimation: M-estimation, ML and NLS
8 | 10 | Estimation: Numerical Optimization
9-11 | 14,15 | Models: Binary and multinomial
12-14 | 16 | Models: Censored and Truncated
15 | 6 | Estimation: GMM
16 | 7 | Testing: Hypothesis Tests
17-19 | 21 | Models: Basic Linear Panel
20 | 9 | Estimation: Semiparametric
At Indiana University - Bloomington, a fifteen-week semester long field course in
microeconometrics is based on material in most of Parts 4 and 5 (chapters 14-23). The prerequisite
courses for this course cover material similar to the material in Part 2 (chapters 4-10).
Some exercises are provided at the end of each chapter after the first three introductory chapters.
These exercises are usually learning-by-doing exercises; some are purely methodological while
others entail analysis of generated or actual data. The level of difficulty of the questions is mostly
related to the level of difficulty of the topic.
Detailed programs and data for all the data applications (using either actual data or generated data)
will be made available at the book website.

ADVANCE REVIEWS
"This book presents an elegant and accessible treatment of the broad range of rapidly expanding
topics currently being studied by microeconometricians. Thoughtful, intuitive, and careful in laying
out central concepts of sophisticated econometric methodologies, it is not only an excellent
textbook for students, but also an invaluable reference text for practitioners and researchers."
- Cheng Hsiao, University of Southern California
"I wish "Microeconometrics" was available when I was a student! Here, in one place -- and in clear
and readable prose -- you can find all of the tools that are necessary to do cutting-edge applied
economic analysis, and with many helpful examples."
- Alan Krueger, Princeton University
"Cameron and Trivedi have written a remarkably thorough and up-to-date treatment of
microeconometric methods. This is not a superficial cookbook; the early chapters carefully lay the
theoretical foundations on which the authors build their discussion of methods for discrete and
limited dependent variables and for analysis of longitudinal data. A distinctive feature of the book
is its attention to cutting-edge topics like semiparametric regression, bootstrap methods, simulation-based estimation, and empirical likelihood estimation. A highly valuable book."
- Gary Solon, University of Michigan
"The empirical analysis of micro data is more widespread than ever before. The book by Cameron
and Trivedi contains a superb treatment of all the methods that economists like to apply to such
data. What is more, it fully integrates a number of exciting new methods that have become
applicable due to recent advances in computer technology. The text is in perfect balance between
econometric theory and empirical intuition, and it contains many insightful examples."
- Gerard J. van den Berg, Free University, Amsterdam, The Netherlands

PROGRAMS: I. INTRODUCTION (chapters 1-3)


No programs.

PROGRAMS: II. CORE METHODS (chapters 4-10)

Section | Pages | Example | Program and Output | Data [* means generated]
4.5.3 | 84-5 | Robust Standard Errors for OLS, WLS and GLS | mma04p1wls.do, mma04p1wls.txt | * mma04p1wls.asc
4.6.4 | 88-90 | Quantile and Median Regression | mma04p2qreg.do, mma04p2qreg.txt | qreg0902.dta or qreg0902.asc
4.8.8 | 102-3 | Instrumental Variables Regression | mma04p3iv.do, mma04p3iv.txt | * mma04p3iv.asc
4.9.6 | 110-2 | IV Application with Weak Instruments | mma04p4ivweak.do, mma04p4ivweak.txt | DATA66.dat and DATA66.dct
5.9.2-3 | 159-63 | Exponential: MLE using ml command | mma05p1mle.do, mma05p1mle.txt | * mma05data.asc
5.9.2-3 | 159-63 | Exponential: NLS using nl command | mma05p2nls.do, mma05p2nls.txt | * mma05data.asc
5.9.2-3 | 159-63 | Exponential: NLS using ml command | mma05p3nlsbyml.do, mma05p3nlsbyml.txt | * mma05data.asc
5.9.4 | 159-63 | Exponential: Computation of marginal effects | mma05p4margeffects.do, mma05p4margeffects.txt | * mma05data.asc
6.5.4 | 198-9 | Nonlinear 2SLS: Using Limdep | mma06p1nl2sls.lim, mma06p1nl2sls.out | * mma06p1nl2sls.asc
6.5.4 | 198-9 | Part of preceding using Stata | mma06p2twostage.do, mma06p2twostage.txt | * mma06p1nl2sls.asc
7.4 | 241-3 | Likelihood-based Hypothesis Tests | mma07p1mltests.do, mma07p1mltests.txt | * mma07p1mltests.asc
7.6.3 | 248-9 | Asymptotic Power of Wald Test | mma07p2power.do, mma07p2power.txt | No data
7.7.1-5 | 250-4 | Monte Carlo Simulation of Wald Test | mma07p3montecarlo.do, mma07p3montecarlo.txt | Data for many simulations not saved
7.8 | 254-6 | Bootstrap example | mma07p4boot.do, mma07p4boot.txt | * mma07p4boot.asc
8.2.9 | 269-71 | Conditional moment tests example | mma08p1cmtests.do, mma08p1cmtests.txt | * mma08p1cmtests.asc
8.5.5 | 283-4 | Nonnested models test example | mma08p2nonnested.do, mma08p2nonnested.txt | mma08p2nonnested.asc
8.7.3 | 290-1 | Model diagnostics example | mma08p3diagnostics.do, mma08p3diagnostics.txt | * mma08p3diagnostics.asc
9.2 | 295-7 | Nonparametric density estimation and regression: application | mma09p1np.do, mma09p1np.txt |
9.4-9.5 | 307-19 | Nonparametric regression: more | mma09p2npmore.do, mma09p2npmore.txt | * mma09p2npmore.asc
9.3.3 | 299-300 | Kernel functions plotted | mma09p3kernels.do, mma09p3kernels.txt | * mma09p3kernels.asc
10.2.5 | 338-9 | Gradient method example (Newton Raphson) | mma10p1gradient.do, mma10p1gradient.txt | No data

PROGRAMS: III. COMPUTATIONALLY-INTENSIVE METHODS (chapters 11-13)

Section | Pages | Example | Program and Output | Data
11.3 | 366-8 | Bootstrap example | mma11p1boot.do, mma11p1boot.txt | * mma11p1boot.asc
12.3.3 | 391-2 | Integral Computation Example | mma12p1integration.do, mma12p1integration.txt | No data
12.4.5, 12.5.6 | 397-7, 403-4 | Maximum Simulated Likelihood and Maximum Simulated Score Example | mma12p2mslmsm.do, mma12p2mslmsm.txt | * mma12p2mslmsm.asc
12.8.2 | 412-3 | Illustration of Methods to Draw Random Variates | mma12p3draws.do, mma12p3draws.txt | No data
13.2.2 | 424 | Bayes Theorem Illustration for Normal Distribution and Prior | mma13p1bayesthm.do, mma13p1bayesthm.txt | No data
13.6 | 452-4 | MCMC Example: Gibbs Sampler for SUR | mma13p2bayesgibbs.sas, mma13p2bayesgibbs.lst, mma13p2bayesgibbs.log | Program generated

PROGRAMS: IV. MODELS FOR CROSS-SECTION DATA (chapters 14-20)

Section | Pages | Example | Program and Output | Data
14.2 | 464-5 | Logit and Probit Application (fishing mode) | mma14p1binary.do, mma14p1binary.txt | Nldata.asc
14.7.5 | 486 | Maximum score estimator for binary outcome | mma14p2maxscore.lim, mma14p2maxscore.out | mma14p1binary.asc
15.2.1-3 | 491-5 | Multinomial Logit and Conditional Logit Application (fishing mode) | mma15p1mnl.do, mma15p1mnl.txt | Nldata.asc
15.6.3 | 511 | Nested Logit (or GEV) estimation | mma15p2gev.do, mma15p2gev.txt | Nldata.asc
15.2.2 | 493-4 | Limdep multinomial logit | mma15p3mnl.lim, mma15p3mnl.out | Nldata.asc
15.2.1-3 | 491-5 | Limdep and addon Nlogit for conditional and nested logit | mma15p4gev.lim, mma15p4gev.out | mma15p4gev.asc
16.2.1 | 530-1, 565 | Classic Tobit MLE and CLAD | mma16p1tobit.do, mma16p1tobit.txt | mma16p1tobit.asc
16.3.4 | 540 | Inverse Mills ratio plotted | mma16p2mills.do, mma16p2mills.txt | No data
16.6 | 553-5 | Selection Model Application (medical expenditures) | mma16p3selection.do, mma16p3selection.txt | randdata.dta or mma16p3selection.asc
17.2, 17.5.1 | 574-5, 581-3 | Nonparametric estimation (KM and NA) for survival data (strike duration) | mma17p1km.do, mma17p1km.txt | strkdur.dta or strkdur.asc
17.5.1 | 581-2 | Nonparametric estimation (KM and NA) for survival data (artificial) | mma17p2kmextra.do, mma17p2kmextra.txt | Data in program
17.6.1 | 584-6 | Weibull distribution functions plotted | mma17p3weib.do, mma17p3weib.txt | No data
17.11 | 603-8 | Duration regression models (unemployment duration) | mma17p4duration.do, mma17p4duration.txt | ema1996.dta or ema1996.asc
18.8 | 632-6 | Duration regression with unobserved heterogeneity (unemployment duration) | mma18p1heterogeneity.do, mma18p1heterogeneity.txt | ema1996.dta or ema1996.asc
19.5 | 658-3 | Competing risks model (unemployment duration) | mma19p1comprisks.do, mma19p1comprisks.txt | ema1996.dta or ema1996.asc
20.2, 20.7 | 671-4, 690 | Count regression (doctor contacts) | mma20p1count.do, mma20p1count.txt | randdata.dta or mma20p1count.asc

PROGRAMS: V. MODELS FOR PANEL DATA (chapters 21-23)

Section | Pages | Example | Program and Output | Data
21.3.1-3 | 708-13 | Linear Panel Fixed and Random Effects Application (hours and wages) | mma21p1panfeandre.do, mma21p1panfeandre.txt | MOM.dat
21.3.2, 21.3.4 | 710, 719 | Linear Panel Estimators manually obtained by OLS on transformed equation (hours and wages) | mma21p2panmanual.do, mma21p2panmanual.txt | MOM.dat
21.3.4 | 713-5 | Linear Panel Residual Analysis (hours and wages) | mma21p3panresiduals.do, mma21p3panresiduals.txt | MOM.dat
21.5.5 | 725 | Linear Panel pooled OLS and GLS estimation (hours and wages) | mma21p4pangls.do, mma21p4pangls.txt | MOM.dat
22.3 | 754-6 | Linear Panel GMM Application (hours and wages) | mma22p1gmmpanel.do, mma22p1gmmpanel.txt | MOMprecise.dat
23.3 | 792-5 | Nonlinear Panel Application (patents and R&D) | mma23p1pannonlin.do, mma23p1pannonlin.txt | patr7079.asc

PROGRAMS: VI. FURTHER METHODS (chapters 24-27)

Section | Pages | Example | Program and Output | Data
24.7 | 848-53 | Clustered Linear Regression (household medical expenditure clustered on commune) | mma24p1olscluster.do, mma24p1olscluster.txt | vietnam_ex1.dta or vietnam_ex1.asc
 | | Clustered Poisson Regression (individual pharmacy visits clustered on commune) | mma24p2poiscluster.do, mma24p2poiscluster.txt | vietnam_ex2.dta or vietnam_ex2.asc
25.8.1-4 | 889-93 | Treatment Evaluation: Simple calculations (training on earnings) | mma25p1treatment.do, mma25p1treatment.txt | nswpsid.da1 or nswpsid.dta
25.8.5 | 893-6 | Treatment Evaluation: Propensity score matching (training on earnings) | mma25p2matching.do, mma25p2matching.txt | nswpsid.da1 or nswpsid.dta
25.8 | 889-96 | Treatment Evaluation: Additional analysis not in book using additional data sets (NSW experimental controls and CPS controls) | mma25p3extra.do, mma25p3extra.txt | nswre74_treated.dta and nswre74_control.dta or nswre74_all.asc; propensity_cps.dta or propensity_cps.asc
26.5 | 919-20 | Measurement Error Bias Example | To come | Generated data
27.8 | 935-9 | Missing Data MCMC Imputation Example | To come | Generated data

25

DATA SETS

Data in fixed format text files have extension .asc or .dat [and if a Stata dictionary is used the extension is .dct].
Stata data files have extension .dta.
We thank Rajeev Dehejia, Bronwyn Hall, Cathy Kling, Jeffrey Kling, Will Manning, Brian McCall
and Jim Ziliak for making their data available for empirical illustrations. The relevant citations are
given below. For "Authors' extract" the citation is A. C. Cameron and P. K. Trivedi (2005),
"Microeconometrics: Methods and Applications," Cambridge University Press, New York.
Many more examples use generated data - see programs.

Pages | Topic | Data Source | Data
88-90 | Median and quantile regression | Vietnam World Bank Living Standards Survey. Authors' extract | qreg0902.dta or qreg0902.asc
110-2 | Instrumental variables with weak instruments | National Longitudinal Survey. J. R. Kling (2001), "Interpreting Instrumental Variables Estimates of the Return to Schooling," Journal of Business and Economic Statistics, 19, 358-364. | DATA66.dat and DATA66.dct
295-7, 300 | Nonparametric density estimation and regression | Panel Survey of Income Dynamics. Authors' extract | psidf3050.dat
463-6, 486, 491-5 | Binary and multinomial outcomes | Fishing-mode choice data. J. A. Herriges and C. L. Kling (1999), "Nonlinear Income Effects in Random Utility Models," Review of Economics and Statistics, 81, 62-72. | Nldata.asc or mma15p4gev.asc
553-6, 565 | Selection models | Rand Health Insurance Experiment. Authors' extract | randdata.dta or mma16p3selection.asc
574-5, 582 | Duration models | Strike duration data. J. Kennan (1985), "The Duration of Contract Strikes in U.S. Manufacturing," Journal of Econometrics, 28, 5-28. | strkdur.dta or strkdur.asc
603-8, 632-6, 658-62 | Duration models | Current Population Survey Displaced Workers Supplement. B. P. McCall (1996), "Unemployment Insurance Rules, Joblessness, and Part-time Work," Econometrica, 64, 647-682. | ema1996.dta or ema1996.asc
671-4, 692 | Count data models | Rand Health Insurance Experiment. P. Deb and P.K. Trivedi (2002), "The Structure of Demand for Medical Care: Latent Class versus Two-Part Models," Journal of Health Economics, 21, 601-625. | randdata.dta or mma20p1count.asc
708-15 | Linear panel models: basics | Panel Survey of Income Dynamics. J. Ziliak (1997), "Efficient Estimation With Panel Data when Instruments are Predetermined: An Empirical Comparison of Moment-Condition Estimators," Journal of Business and Economic Statistics, 15, 419-431. | MOM.dat
754-6 | Linear panel models: GMM | Panel Survey of Income Dynamics. J. Ziliak (1997) - see previous cite. | MOMprecise.dat
792-5 | Nonlinear panel models | Patents-R&D data. B. H. Hall, Z. Griliches and J. A. Hausman (1986), "Patents and R&D: Is There a Lag?", International Economic Review, 27, 265-283. | patr7079.asc
848-53 | Clustered data | Vietnam World Bank Living Standards Survey. Authors' extract: (1) Household data (2) Individual data | vietnam_ex1.dta or vietnam_ex1.asc; vietnam_ex2.dta or vietnam_ex2.asc
889-95 | Treatment evaluation [nswpsid: NSW treated vs PSID control used in text. The other data sets not used in text but used in mmap3extra.do] | National Supported Work demonstration project and controls. R.H. Dehejia and S. Wahba (1999), "Causal Effects in Nonexperimental Studies: Reevaluating the Evaluation of Training Programs," JASA, 1053-1062, and / or R.H. Dehejia and S. Wahba (2002), "Propensity-score Matching Methods for Nonexperimental Causal Studies," ReStat, 151-161. | nswpsid.da1 or nswpsid.dta; nswre74_treated.dta and nswre74_control.dta or nswre74_all.asc; propensity_cps.dta or propensity_cps.asc

EXPLANATION OF BOOK PROGRAMS


PROGRAMS USED:

Most programs are written for Stata version 8.0 and were executed on an MS Windows PC with Stata 8.2.
Stata 7 will usually be adequate. Exceptions where Stata 8 is needed include:
(1) The estimates command (for tabulating regression results) is not available in version 7.
Comment out occurrences of "estimates store ..." and "estimates table ...".
(2) Graphics commands (used to obtain the figures in the book) changed substantially from version 7 to 8.
This affects only the generation of figures. If graphs are important, it is best to upgrade to Stata 8, as its graphics are much better.
(3) In some places free Stata add-ons have been included. These are noted in the programs.
To download such an add-on, e.g. knnreg, give the command "search knnreg" in Stata and follow the directions.
The Stata programs vary from very problem-specific code to code that can potentially be adapted to
one's own needs.
Some programs use Limdep version 7.0 and Nlogit 2.0, executed on an MS Windows PC.
Some programs use SAS/IML. SAS version 8.0 was used on a Unix machine.
FILE NAMING CONVENTIONS:
For Stata: as an example for chapter 4.5.3 we provide:
- mma04p1wls.do Stata program
- mma04p1wls.txt Output from this program
- mma04p1wls.asc The generated data as fixed width ascii data set [permits analysis with programs other than Stata]
For Limdep: as an example for chapter 14.5.3 we provide:
- mma15p3mnl.lim Limdep program
- mma15p3mnl.out Output from this program
For SAS: as an example for chapter 13.6 we provide:
- mma13p2bayesgibbs.sas SAS program
- mma13p2bayesgibbs.lst SAS output
- mma13p2bayesgibbs.log SAS logfile
For data sets the extensions are:
- .dta for Stata data set
- .asc for ascii (text) data set that is usually both space delimited and fixed width
For descriptions of the data sets see the relevant program that uses the data set, and the associated
output.
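The naming pattern can be read mechanically: "mma", a two-digit chapter number, "p" and a program number, a short label, then the extension. The following is a minimal Python sketch of that reading. The helper name parse_program_name, the regular expression and the returned dictionary keys are assumptions made for illustration; they are not part of the book's materials.

```python
import re

# Assumed pattern: mma<2-digit chapter>p<program number><label>.<extension>
_NAME = re.compile(r"^mma(\d{2})p(\d+)([a-z][a-z0-9]*)\.([a-z0-9]+)$")

# Extension meanings as listed in the file naming conventions.
_KIND = {
    "do": "Stata program",
    "txt": "Stata output",
    "asc": "ascii data set",
    "lim": "Limdep program",
    "out": "Limdep output",
    "sas": "SAS program",
    "lst": "SAS output",
    "log": "SAS logfile",
}

def parse_program_name(filename):
    """Split a program file name into its parts, or return None if it
    does not follow the mmaXXpN<label>.<ext> pattern (hypothetical helper)."""
    match = _NAME.match(filename.lower())
    if match is None:
        return None
    chapter, number, label, ext = match.groups()
    return {
        "chapter": int(chapter),
        "program": int(number),
        "label": label,
        "extension": ext,
        "kind": _KIND.get(ext, "unknown"),
    }

print(parse_program_name("mma04p1wls.do"))
print(parse_program_name("mma15p3mnl.lim"))
print(parse_program_name("qreg0902.dta"))  # a data set name, not a program name
```

Data set names such as qreg0902.dta do not follow the program pattern, so the sketch returns None for them.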
PROGRAM CPU TIME
Programs generally take little time to run.
The exceptions are programs that entail simulation, including bootstrapping.
Such programs can be sped up by reducing the number of simulations / replications, though any final
analysis should use many simulations / replications.
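The trade-off between run time and the number of replications can be illustrated with a small bootstrap. The following Python sketch is not taken from the book's programs: the function bootstrap_se_mean, the seeds and the sample are all hypothetical. Run time grows roughly in proportion to reps, while the bootstrap standard error becomes less noisy.

```python
import random
import statistics

def bootstrap_se_mean(data, reps, seed=10101):
    """Bootstrap standard error of the sample mean using `reps` resamples
    drawn with replacement (hypothetical helper for illustration)."""
    rng = random.Random(seed)
    n = len(data)
    means = []
    for _ in range(reps):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        means.append(sum(resample) / n)
    return statistics.stdev(means)

# Artificial data: 200 draws from a standard normal.
rng = random.Random(10105)
sample = [rng.gauss(0.0, 1.0) for _ in range(200)]

quick = bootstrap_se_mean(sample, reps=50)     # fast, but a noisy estimate
final = bootstrap_se_mean(sample, reps=1000)   # slower, suitable for final results
analytic = statistics.stdev(sample) / len(sample) ** 0.5

print("reps=50  :", quick)
print("reps=1000:", final)
print("analytic :", analytic)
```

For the sample mean the analytic standard error is available, so it serves as a check on the bootstrap; for estimators such as quantile regression no such simple check exists, which is why many replications matter for final results.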


Chapter 4. Linear models

------------------------------------------------------------------------------------------------------
       log:  c:\Imbook\bwebpage\Section2\mma04p1wls.txt
  log type:  text
 opened on:  17 May 2005, 13:41:48
.
. ********** OVERVIEW OF MMA04P1WLS.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 4.5.3 pages 84-5
. * Robust Standard Errors for OLS, WLS and GLS
. * (1) Robust and nonrobust standard errors for OLS, WLS and GLS.
. * (2) Table 4.3
. * using generated data (see below)
.
. ********** SETUP **********
.
. set more off
. version 8
. set scheme s1mono /* Used for graphs */
.
. ********** GENERATE DATA and SUMMARIZE **********
.
. * Model is y = 1 + 1*x + u
. * where u = abs(x)*e
. *         x ~ N(0, 5^2)
. *         e ~ N(0, 2^2)
.
. * Errors are conditionally heteroskedastic with V[u|x]=4*x^2
. * OLS, WLS and GLS are consistent
. * but need to use robust standard errors for OLS and WLS.
.
. set seed 10105
. set obs 100
obs was 0, now 100
. gen x = 5*invnorm(uniform())

. gen e = 2*invnorm(uniform())
. gen u = abs(x)*e
. gen y = 1 + 1*x + u
.
. * Descriptive Statistics
. summarize
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           x |       100   -.1322828     4.64293  -11.05289   10.63336
           e |       100     .350339    2.033639  -3.776468   5.150759
           u |       100    1.215709    8.187081  -19.58098    32.6086
           y |       100    2.083426    9.364465  -27.63657   39.93944
.
. * Write data to a text (ascii) file so can use with programs other than Stata
. outfile y x e u using mma04p1wls.asc, replace
.
. ********** ESTIMATE THE MODELS **********
.
. ** (1) OLS - first column of Table 4.3
.
. * (1A) OLS with wrong standard errors
. regress y x
      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  1,    98) =   30.23
       Model |  2046.73901     1  2046.73901           Prob > F      =  0.0000
    Residual |  6634.88855    98  67.7029444           R-squared     =  0.2358
-------------+------------------------------           Adj R-squared =  0.2280
       Total |  8681.62755    99  87.6932076           Root MSE      =  8.2282

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |    .979313   .1781124     5.50   0.000     .6258548    1.332771
       _cons |   2.212973   .8231553     2.69   0.008     .5794478    3.846497
------------------------------------------------------------------------------
. estimates store olsusual
.
. * (1B) OLS with correct standard errors (robust sandwich)
. regress y x, robust


Regression with robust standard errors                 Number of obs =     100
                                                       F(  1,    98) =   12.68
                                                       Prob > F      =  0.0006
                                                       R-squared     =  0.2358
                                                       Root MSE      =  8.2282

------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |    .979313   .2750617     3.56   0.001     .4334621    1.525164
       _cons |   2.212973   .8198253     2.70   0.008      .586056    3.839889
------------------------------------------------------------------------------
. estimates store olsrobust
.
. ** (2) WLS - second column of Table 4.3
.
. * (2A) WLS with wrong standard errors
. * Use the aweight option (not clearly explained in Stata manual).
. * The aweight option MULTIPLIES y and x by sqrt(aweight).
. * Here we suppose V[u]=constant*|x|
. * So want to divide by sqrt(|x|), so let aweight=1/|x|
. gen absx = abs(x)
. regress y x [aweight=1/absx]
(sum of wgt is 5.7885e+02)
      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  1,    98) =   25.29
       Model |   56.759883     1   56.759883           Prob > F      =  0.0000
    Residual |  219.985987    98  2.24475497           R-squared     =  0.2051
-------------+------------------------------           Adj R-squared =  0.1970
       Total |   276.74587    99  2.79541283           Root MSE      =  1.4983

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .9569768   .1903115     5.03   0.000     .5793097    1.334644
       _cons |   1.060374   .1498265     7.08   0.000     .7630484      1.3577
------------------------------------------------------------------------------
. estimates store wlsusual
.
. * (2B) WLS with correct standard errors (robust sandwich)
. regress y x [aweight=1/absx], robust
(sum of wgt is 5.7885e+02)
Regression with robust standard errors                 Number of obs =     100
                                                       F(  1,    98) =   17.07
                                                       Prob > F      =  0.0001
                                                       R-squared     =  0.2051
                                                       Root MSE      =  1.4983

------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .9569768    .231612     4.13   0.000     .4973503    1.416603
       _cons |   1.060374    .050533    20.98   0.000     .9600931    1.160655
------------------------------------------------------------------------------
. estimates store wlsrobust
.
. ** (3) GLS - last column of Table 4.3
.
. * (3A) GLS with usual standard errors (correct)
. * Here we know V[u]=constant*x^2
. * So want to divide by x, so let aweight=1/(x^2)
. gen xsq = x*x
. regress y x [aweight=1/xsq]
(sum of wgt is 1.0314e+05)
      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  1,    98) =   20.70
       Model |  .086075004     1  .086075004           Prob > F      =  0.0000
    Residual |  .407542418    98  .004158596           R-squared     =  0.1744
-------------+------------------------------           Adj R-squared =  0.1660
       Total |  .493617422    99  .004986035           Root MSE      =  .06449

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .9516457   .2091752     4.55   0.000     .5365444    1.366747
       _cons |   .9964956   .0065131   153.00   0.000     .9835706    1.009421
------------------------------------------------------------------------------
. estimates store glsusual
.
. * (3B) GLS with standard errors (robust sandwich - unnecessary here)
. regress y x [aweight=1/xsq], robust
(sum of wgt is 1.0314e+05)
Regression with robust standard errors                 Number of obs =     100
                                                       F(  1,    98) =   20.89
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.1744
                                                       Root MSE      =  .06449

------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .9516457   .2082145     4.57   0.000     .5384508    1.364841
       _cons |   .9964956   .0078922   126.26   0.000     .9808337    1.012157
------------------------------------------------------------------------------
. estimates store glsrobust
.
. * (3C) Check that aweight works as expected.
. * Do GLS by OLS on data transformed by dividing by x.
. gen try = y/x
. gen trint = 1/x
. gen trx = x/x
. regress try trx trint, noconstant
      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  2,    98) =11850.15
       Model |  101659.545     2  50829.7726           Prob > F      =  0.0000
    Residual |  420.359033    98  4.28937789           R-squared     =  0.9959
-------------+------------------------------           Adj R-squared =  0.9958
       Total |  102079.904   100  1020.79904           Root MSE      =  2.0711

------------------------------------------------------------------------------
         try |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         trx |   .9516457   .2091752     4.55   0.000     .5365444    1.366747
       trint |   .9964956   .0065131   153.00   0.000     .9835706    1.009421
------------------------------------------------------------------------------
.
. ********** DISPLAY KEY RESULTS **********
.
. * Table 4.3
. estimates table olsusual olsrobust wlsusual wlsrobust glsusual glsrobust, /*
>    */ se stats(N r2) b(%7.3f) keep(_cons x)

--------------------------------------------------------------------------
    Variable | olsus~l   olsro~t   wlsus~l   wlsro~t   glsus~l   glsro~t
-------------+------------------------------------------------------------
       _cons |   2.213     2.213     1.060     1.060     0.996     0.996
             |   0.823     0.820     0.150     0.051     0.007     0.008
           x |   0.979     0.979     0.957     0.957     0.952     0.952
             |   0.178     0.275     0.190     0.232     0.209     0.208
-------------+------------------------------------------------------------
           N | 100.000   100.000   100.000   100.000   100.000   100.000
          r2 |   0.236     0.236     0.205     0.205     0.174     0.174
--------------------------------------------------------------------------
                                                              legend: b/se
.
. * Minor typo in Table 4.3:
. * for GLS Constant has robust s.e. of [0.008] not [0.006]
.
. ********** CLOSE OUTPUT **********
. log close
log: c:\Imbook\bwebpage\Section2\mma04p1wls.txt
log type: text
closed on: 17 May 2005, 13:41:48
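The gap between the usual and the robust OLS standard errors in this log can be reproduced outside Stata. The following Python sketch uses the same design (y = 1 + x + u, u = |x|*e) but a larger artificial sample and Python's own random draws, so the numbers will not match the log. The robust formula is the heteroskedasticity-consistent sandwich with an n/(n-2) degrees-of-freedom factor, which is assumed here to correspond to what the robust option reports for this simple model.

```python
import math
import random

def ols_slope_with_se(y, x):
    """Return (slope, usual s.e., robust s.e.) for OLS of y on a constant and x."""
    n = len(y)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # Usual s.e.: assumes homoskedastic errors, s^2 / sum (x - xbar)^2.
    s2 = sum(r * r for r in resid) / (n - 2)
    se_usual = math.sqrt(s2 / sxx)

    # Robust sandwich s.e. for the slope, with an n/(n-2) correction.
    meat = sum(((xi - xbar) ** 2) * r * r for xi, r in zip(x, resid))
    se_robust = math.sqrt(n / (n - 2) * meat / sxx ** 2)
    return slope, se_usual, se_robust

# Heteroskedastic design from the program: V[u|x] = 4*x^2.
rng = random.Random(10105)
x = [5 * rng.gauss(0, 1) for _ in range(2000)]
y = [1 + xi + abs(xi) * 2 * rng.gauss(0, 1) for xi in x]

slope, se_usual, se_robust = ols_slope_with_se(y, x)
print("slope     :", slope)
print("usual se  :", se_usual)
print("robust se :", se_robust)
```

Because the error variance rises with x^2, observations far from the mean of x are the noisiest, and the usual formula understates the standard error of the slope. This is the same direction as in Table 4.3, where the OLS slope has standard error 0.178 (usual) versus 0.275 (robust).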
------------------------------------------------------------------------------------------------------
       log:  c:\Imbook\bwebpage\Section2\mma04p2qreg.txt
  log type:  text
 opened on:  17 May 2005, 13:43:21
.
. ********** OVERVIEW OF MMA04P2QREG.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 4.6.4 pages 88-90
. * Quantile Regression analysis.
. * (1) Quantile regression estimates for different quantiles
. * (2) Figure 4.1: Quantile Slope Coefficient Estimates as Quantile Varies
. * (3) Figure 4.2: Quantile Regression Lines as Quantile Varies
.
. * To run this program you need data file
. * qreg0902.dta
. * or for programs other than Stata use qreg0902.asc
.
. * Step (3) takes a long time due to bootstrap to get standard errors.
. * To speed up the program reduce the number of repetitions in sqreg
. * But any final results should use a large number of bootstraps
.
. ********** SETUP **********
.
. set more off
. version 8.0
. set scheme s1mono /* Used for graphs */


.
. ********** DATA DESCRIPTION **********
.
. * The data from World Bank 1997 Vietnam Living Standards Survey
. * are described in chapter 4.6.4.
. * A larger sample from this survey is studied in Chapter 24.7
.
. ********** READ DATA, TRANSFORM and SAMPLE SELECTION **********
.
. use qreg0902
. describe
Contains data from qreg0902.dta
  obs:         5,999
 vars:             9                          19 Sep 2002 21:45
 size:       191,968 (98.1% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
sex             byte   %8.0g                  Gender of HH.head (1:M;2:F)
age             int    %8.0g                  Age of household head
educyr98        float  %9.0g                  schooling year of HH.head
farm            float  %9.0g       loaiho     Type of HH (1:farm; 0:nonfarm)
urban98         byte   %8.0g       urban      1:urban 98; 0:rural 98
hhsize          long   %12.0g                 Household size
lhhexp1         float  %9.0g
lhhex12m        float  %9.0g
lnrlfood        float  %9.0g
-------------------------------------------------------------------------------
Sorted by:
. summarize
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         sex |      5999    1.270712    .4443645          1          2
         age |      5999    48.01284     13.7702         16         95
    educyr98 |      5999    7.094419    4.416092          0         22
        farm |      5999    .5730955    .4946694          0          1
     urban98 |      5999    .2883814    .4530472          0          1
-------------+--------------------------------------------------------
      hhsize |      5999    4.752292    1.954292          1         19
     lhhexp1 |      5999    9.341561    .6877458   6.543108   12.20242
    lhhex12m |      5006    6.310585    1.593083          0   12.36325
    lnrlfood |      5999    8.679536    .5368118   6.356364   11.38385
.
. * Write data to a text (ascii) file so can use with programs other than Stata
. outfile sex age educyr98 farm urban98 hhsize lhhexp1 lhhex12m lnrlfood /*
>    */ using qreg0902.asc, replace
.
. * drop zero observations for medical expenditures
. drop if lhhex12m == .
(993 observations deleted)
.
. * lhhexp1 is natural logarithm of household total expenditure
. * lhhex12m is natural logarithm of household medical expenditure
. gen lntotal = lhhexp1
. gen lnmed = lhhex12m
. label variable lntotal "Log household total expenditure"
. label variable lnmed "Log household medical expenditure"
. describe
Contains data from qreg0902.dta
  obs:         5,006
 vars:            11                          19 Sep 2002 21:45
 size:       200,240 (98.0% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
sex             byte   %8.0g                  Gender of HH.head (1:M;2:F)
age             int    %8.0g                  Age of household head
educyr98        float  %9.0g                  schooling year of HH.head
farm            float  %9.0g       loaiho     Type of HH (1:farm; 0:nonfarm)
urban98         byte   %8.0g       urban      1:urban 98; 0:rural 98
hhsize          long   %12.0g                 Household size
lhhexp1         float  %9.0g
lhhex12m        float  %9.0g
lnrlfood        float  %9.0g
lntotal         float  %9.0g                  Log household total expenditure
lnmed           float  %9.0g                  Log household medical expenditure
-------------------------------------------------------------------------------
Sorted by:
     Note:  dataset has changed since last saved
. summarize
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         sex |      5006    1.269676     .443836          1          2
         age |      5006    48.06133    13.79974         18         95
    educyr98 |      5006    7.147956    4.333304          0         21
        farm |      5006    .5679185    .4954151          0          1
     urban98 |      5006    .2920495    .4547504          0          1
-------------+--------------------------------------------------------
      hhsize |      5006    4.832601     1.95257          1         19
     lhhexp1 |      5006    9.370402    .6726841   6.543108   12.20242
    lhhex12m |      5006    6.310585    1.593083          0   12.36325
    lnrlfood |      5006    8.697963    .5309517   6.356364   11.38385
     lntotal |      5006    9.370402    .6726841   6.543108   12.20242
-------------+--------------------------------------------------------
       lnmed |      5006    6.310585    1.593083          0   12.36325
.
. ********* ANALYSIS: QUANTILE REGRESSION **********
.
. * (0) OLS
. reg lnmed lntotal
      Source |       SS       df       MS              Number of obs =    5006
-------------+------------------------------           F(  1,  5004) =  311.91
       Model |  745.293239     1  745.293239           Prob > F      =  0.0000
    Residual |  11956.9671  5004  2.38948183           R-squared     =  0.0587
-------------+------------------------------           Adj R-squared =  0.0585
       Total |  12702.2603  5005  2.53791415           Root MSE      =  1.5458

------------------------------------------------------------------------------
       lnmed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lntotal |   .5736545   .0324817    17.66   0.000     .5099761    .6373328
       _cons |   .9352117   .3051496     3.06   0.002     .3369847    1.533439
------------------------------------------------------------------------------
. predict pols
(option xb assumed; fitted values)
. reg lnmed lntotal, robust
Regression with robust standard errors                 Number of obs =    5006
                                                       F(  1,  5004) =  318.05
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.0587
                                                       Root MSE      =  1.5458

------------------------------------------------------------------------------
             |               Robust
       lnmed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lntotal |   .5736545   .0321665    17.83   0.000      .510594     .636715
       _cons |   .9352117    .298119     3.14   0.002     .3507677    1.519656
------------------------------------------------------------------------------
. * Bootstrap standard errors for OLS
. set seed 10101


. * bs "reg lnmed lntotal" "_b[lntotal]", reps(100)
.
. * (1) Quantile and median regression for quantiles 0.1, 0.5 and 0.9
. * Save prediction to construct Figure 4.2.
. qreg lnmed lntotal, quant(.10)
Iteration 1: WLS sum of weighted deviations = 3554.0793
Iteration 1: sum of abs. weighted deviations = 3555.3279
Iteration 2: sum of abs. weighted deviations = 3344.1924
Iteration 3: sum of abs. weighted deviations = 3051.7353
Iteration 4: sum of abs. weighted deviations = 2942.1274
Iteration 5: sum of abs. weighted deviations = 2939.3979
Iteration 6: sum of abs. weighted deviations = 2935.9969
Iteration 7: sum of abs. weighted deviations = 2933.0493
Iteration 8: sum of abs. weighted deviations = 2932.7763
Iteration 9: sum of abs. weighted deviations = 2932.4432
Iteration 10: sum of abs. weighted deviations = 2932.4429
.1 Quantile regression                               Number of obs =      5006
  Raw sum of deviations 2936.097 (about 4.1743875)
  Min sum of deviations 2932.443                     Pseudo R2     =    0.0012

------------------------------------------------------------------------------
       lnmed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lntotal |   .1512009   .0552584     2.74   0.006     .0428702    .2595317
       _cons |   2.825072   .5194064     5.44   0.000     1.806808    3.843336
------------------------------------------------------------------------------
. predict pqreg10
(option xb assumed; fitted values)
. qreg lnmed lntotal, quant(.5)
Iteration 1: WLS sum of weighted deviations = 6112.8801
Iteration 1: sum of abs. weighted deviations = 6112.4546
Iteration 2: sum of abs. weighted deviations = 6098.5295
Iteration 3: sum of abs. weighted deviations = 6097.2178
Iteration 4: sum of abs. weighted deviations = 6097.1564
Median regression                                    Number of obs =      5006
  Raw sum of deviations 6324.265 (about 6.3716121)
  Min sum of deviations 6097.156                     Pseudo R2     =    0.0359

------------------------------------------------------------------------------
       lnmed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lntotal |   .6210917   .0388194    16.00   0.000     .5449886    .6971948
       _cons |   .5921626   .3646869     1.62   0.104    -.1227836    1.307109
------------------------------------------------------------------------------
. predict pqreg50
(option xb assumed; fitted values)
. qreg lnmed lntotal, quant(.90)
Iteration 1: WLS sum of weighted deviations = 3275.6073
Iteration 1: sum of abs. weighted deviations = 3279.5575
Iteration 2: sum of abs. weighted deviations = 2691.3839
Iteration 3: sum of abs. weighted deviations = 2521.5214
Iteration 4: sum of abs. weighted deviations = 2506.303
Iteration 5: sum of abs. weighted deviations = 2505.1952
Iteration 6: sum of abs. weighted deviations = 2505.1334
Iteration 7: sum of abs. weighted deviations = 2505.1314
Iteration 8: sum of abs. weighted deviations = 2505.1313
.9 Quantile regression                               Number of obs =      5006
  Raw sum of deviations 2687.692 (about 8.2789364)
  Min sum of deviations 2505.131                     Pseudo R2     =    0.0679

------------------------------------------------------------------------------
       lnmed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lntotal |   .8003569   .0517225    15.47   0.000     .6989581    .9017558
       _cons |   .6750967   .4857563     1.39   0.165    -.2771985    1.627392
------------------------------------------------------------------------------
. predict pqreg90
(option xb assumed; fitted values)
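The "sum of abs. weighted deviations" that qreg minimizes is a sum of asymmetrically weighted absolute residuals: positive residuals get weight q and negative residuals weight 1 - q. The following Python sketch shows the idea for the simplest case, a model with a constant only, where the minimizer is a sample quantile. The function names and the toy data are assumptions for illustration; this is not the algorithm Stata uses internally.

```python
def check_loss(residual, q):
    """Asymmetric absolute loss: q*|r| if r >= 0, (1-q)*|r| if r < 0."""
    return residual * (q - (1.0 if residual < 0 else 0.0))

def sum_of_deviations(values, c, q):
    """Total check loss when every observation is predicted by the constant c."""
    return sum(check_loss(v - c, q) for v in values)

def best_constant(values, q):
    """Search the observed values for the constant minimizing the check loss."""
    return min(values, key=lambda c: sum_of_deviations(values, c, q))

# Toy data: the integers 1, 2, ..., 101.
values = list(range(1, 102))

for q in (0.1, 0.5, 0.9):
    c = best_constant(values, q)
    print("q =", q, "minimizer =", c, "loss =", sum_of_deviations(values, c, q))
```

On this toy sample the minimizers are the 11th, 51st and 91st smallest values. With a regressor the same loss is minimized over an intercept and a slope, which is why the fitted lines for q = 0.1, 0.5 and 0.9 in the log have different slopes.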
.
. * (2) Create Figure 4.2 on page 90 first as this is easy
. graph twoway (scatter lnmed lntotal, msize(vsmall)) (lfit pqreg90 lntotal, clstyle(p2)) /*
> */ (lfit pqreg50 lntotal, clstyle(p1)) (lfit pqreg10 lntotal, clstyle(p3)), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Regression Lines as Quantile Varies") /*
> */ xtitle("Log Household Total Expenditure", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Log Household Medical Expenditure", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(11) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Actual Data") label(2 "90th percentile") /*
> */ label(3 "Median") label(4 "10th percentile"))
. graph export ch4fig2QR.wmf, replace
(file c:\Imbook\bwebpage\Section2\ch4fig2QR.wmf written in Windows Metafile format)
.
. * (3) Create Figure 4.1 second as this is more difficult
. * Simultaneous quantile regression for quantiles 0.05, 0.10, ..., 0.90, 0.95
. * with standard errors by bootstrap - here 200 replications
. set seed 10101
. sqreg lnmed lntotal, quant(.05,.1,.15,.2,.25,.3,.35,.4,.45,.5,.55,.6,.65,.7,.75,.8,.85,.9,.95) rep
> s(200)
(fitting base model)
(bootstrapping .....................................................................................
> ..................................................................................................
> .................)
Simultaneous quantile regression                     Number of obs =      5006
  bootstrap(200) SEs                                 .05 Pseudo R2 =    0.0015
                                                     .10 Pseudo R2 =    0.0012
                                                     .15 Pseudo R2 =    0.0058
                                                     .20 Pseudo R2 =    0.0106
                                                     .25 Pseudo R2 =    0.0149
                                                     .30 Pseudo R2 =    0.0183
                                                     .35 Pseudo R2 =    0.0242
                                                     .40 Pseudo R2 =    0.0274
                                                     .45 Pseudo R2 =    0.0326
                                                     .50 Pseudo R2 =    0.0359
                                                     .55 Pseudo R2 =    0.0408
                                                     .60 Pseudo R2 =    0.0464
                                                     .65 Pseudo R2 =    0.0500
                                                     .70 Pseudo R2 =    0.0520
                                                     .75 Pseudo R2 =    0.0563
                                                     .80 Pseudo R2 =    0.0603
                                                     .85 Pseudo R2 =    0.0630
                                                     .90 Pseudo R2 =    0.0679
                                                     .95 Pseudo R2 =    0.0795

------------------------------------------------------------------------------
             |              Bootstrap
       lnmed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
q5           |
     lntotal |   .1536332   .0791236     1.94   0.052    -.0014838    .3087501
       _cons |   2.095395   .7559016     2.77   0.006     .6134964    3.577293
-------------+----------------------------------------------------------------
q10          |
     lntotal |   .1512009    .085018     1.78   0.075    -.0154716    .3178734
       _cons |   2.825072   .7697613     3.67   0.000     1.316002    4.334141
-------------+----------------------------------------------------------------
q15          |
     lntotal |   .2695707   .0580757     4.64   0.000     .1557168    .3834245
       _cons |   2.231293   .5429047     4.11   0.000     1.166962    3.295624
-------------+----------------------------------------------------------------
q20          |
     lntotal |   .3552251   .0504688     7.04   0.000     .2562841    .4541662
       _cons |   1.740233   .4649551     3.74   0.000     .8287172    2.651749
-------------+----------------------------------------------------------------
q25          |
     lntotal |   .4034632   .0421514     9.57   0.000     .3208279    .4860984
       _cons |   1.567055   .3844967     4.08   0.000     .8132731    2.320837
-------------+----------------------------------------------------------------
q30          |
     lntotal |   .4797723   .0478081    10.04   0.000     .3860474    .5734972
       _cons |   1.097107   .4299363     2.55   0.011     .2542435     1.93997
-------------+----------------------------------------------------------------
q35          |
     lntotal |     .52179   .0440082    11.86   0.000     .4355147    .6080652
       _cons |   .9213684   .4064355     2.27   0.023     .1245768     1.71816
-------------+----------------------------------------------------------------
q40          |
     lntotal |   .5691746   .0412824    13.79   0.000     .4882429    .6501062
       _cons |   .6808693   .3754568     1.81   0.070    -.0551906    1.416929
-------------+----------------------------------------------------------------
q45          |
     lntotal |   .6123663   .0402805    15.20   0.000     .5333989    .6913337
       _cons |   .4890392    .373467     1.31   0.190    -.2431197    1.221198
-------------+----------------------------------------------------------------
q50          |
     lntotal |   .6210917   .0414602    14.98   0.000     .5398117    .7023718
       _cons |   .5921626   .3866997     1.53   0.126    -.1659383    1.350263
-------------+----------------------------------------------------------------
q55          |
     lntotal |   .6523013     .02904    22.46   0.000     .5953701    .7092324
       _cons |   .4913988    .264271     1.86   0.063    -.0266881    1.009486
-------------+----------------------------------------------------------------
q60          |
     lntotal |   .6531127   .0321585    20.31   0.000     .5900679    .7161575
       _cons |   .6631971   .2981433     2.22   0.026     .0787056    1.247689
-------------+----------------------------------------------------------------
q65          |
     lntotal |   .6843844     .03378    20.26   0.000     .6181608    .7506079
       _cons |   .5550968   .3162769     1.76   0.079    -.0649445    1.175138
-------------+----------------------------------------------------------------
q70          |
     lntotal |    .714783   .0330755    21.61   0.000     .6499406    .7796255
       _cons |   .4732288   .3028818     1.56   0.118    -.1205524     1.06701
-------------+----------------------------------------------------------------
q75          |
     lntotal |   .7416898   .0369607    20.07   0.000     .6692306     .814149
       _cons |   .4298887   .3416755     1.26   0.208     -.239945    1.099722
-------------+----------------------------------------------------------------
q80          |
     lntotal |   .7675658   .0443925    17.29   0.000      .680537    .8545946
       _cons |   .3966887   .4132223     0.96   0.337    -.4134081    1.206785
-------------+----------------------------------------------------------------
q85          |
     lntotal |   .8009016    .056703    14.12   0.000     .6897389    .9120642
       _cons |   .3649957   .5369325     0.68   0.497    -.6876273    1.417619
-------------+----------------------------------------------------------------
q90          |
     lntotal |   .8003569   .0473557    16.90   0.000     .7075189    .8931949
       _cons |   .6750967   .4450068     1.52   0.129    -.1973116    1.547505
-------------+----------------------------------------------------------------
q95          |
     lntotal |    .767308   .0507532    15.12   0.000     .6678094    .8668066
       _cons |   1.487137   .4739756     3.14   0.002     .5579371    2.416337
------------------------------------------------------------------------------
. * Test equality of slope coefficients for 25th and 75th quantiles
. test [q25]lntotal = [q75]lntotal
( 1) [q25]lntotal - [q75]lntotal = 0
F( 1, 5004) = 55.14
Prob > F = 0.0000
. * Create vectors of slope coefficients and estimated variances
. * Code here is specific to this problem:
. * with a single regressor, the slope coefficient is the 1st, 3rd, 5th, ... entry of e(b)
. matrix b = e(b)
. matrix bslopevector = b[1,1]\b[1,3]\b[1,5]\b[1,7]\b[1,9]\b[1,11]\b[1,13] /*
> */ \b[1,15]\b[1,17]\b[1,19]\b[1,21]\b[1,23]\b[1,25] /*
> */ \b[1,27]\b[1,29]\b[1,31]\b[1,33]\b[1,35]\b[1,37]
. matrix V = e(V)
. matrix Vslopevector = V[1,1]\V[3,3]\V[5,5]\V[7,7]\V[9,9]\V[11,11]\V[13,13] /*
> */ \V[15,15]\V[17,17]\V[19,19]\V[21,21]\V[23,23]\V[25,25] /*
> */ \V[27,27]\V[29,29]\V[31,31]\V[33,33]\V[35,35]\V[37,37]
. matrix q = e(q1)\e(q2)\e(q3)\e(q4)\e(q5)\e(q6)\e(q7)\e(q8)\e(q9)\e(q10) /*
> */ \e(q11)\e(q12)\e(q13)\e(q14)\e(q15)\e(q16)\e(q17)\e(q18)\e(q19)
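
The hard-coded index lists in this step can also be built in a loop. The following is a minimal sketch, not the authors' code: it assumes one regressor per quantile equation, so that slopes occupy the odd-numbered positions of e(b). The toy matrices b and V are illustrative stand-ins for e(b) and e(V), and the local macros nq and k are hypothetical names.

```stata
* Sketch only: toy stand-ins for e(b) and e(V) after sqreg with one regressor,
* so that slopes sit in the odd-numbered positions (1st, 3rd, 5th, ...)
matrix b = (0.15, 1.1, 0.30, 0.9, 0.45, 0.7)
matrix V = I(6)*0.01

* Number of quantile equations (two coefficients per equation)
local nq = colsof(b)/2

* Column vectors to hold the slopes and their estimated variances
matrix bslopevector = J(`nq', 1, .)
matrix Vslopevector = J(`nq', 1, .)

forvalues j = 1/`nq' {
    * Position of the j-th slope in e(b) and on the diagonal of e(V)
    local k = 2*`j' - 1
    matrix bslopevector[`j', 1] = b[1, `k']
    matrix Vslopevector[`j', 1] = V[`k', `k']
}
matrix list bslopevector
matrix list Vslopevector
```

With the real sqreg results, replacing the two toy matrices by e(b) and e(V) should give the same vectors as the explicit lists, provided the coefficient ordering assumption holds.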
. * Convert column vectors to variables as graph handles variables
. svmat bslopevector, name(bslope)
. svmat Vslopevector, name(Vslope)
. svmat q, name(quantiles)
. gen upper = bslope1 + 1.96*sqrt(Vslope1)
(4987 missing values generated)
. gen lower = bslope1 - 1.96*sqrt(Vslope1)
(4987 missing values generated)
. * Also include OLS slope coefficient
. quietly reg lnmed lntotal
. gen bols=_b[lntotal]
. sum upper bslope1 lower bols

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       upper |        19    .6564067    .1904354   .3087155   .9120393
     bslope1 |        19    .5641943     .209318   .1512009   .8009015
       lower |        19    .4719818    .2302585  -.0154343   .7075397
        bols |      5006    .5736545           0   .5736545   .5736545
.
. * Following produces Figure 4.1 on page 89
. graph twoway (line upper quantiles1, msize(vtiny) mstyle(p2) clstyle(p1) clcolor(gs12)) /*
> */ (line bslope1 quantiles1, msize(vtiny) mstyle(p1) clstyle(p1)) /*
> */ (line lower quantiles1, msize(vtiny) mstyle(p2) clstyle(p1) clcolor(gs12)) /*
> */ (line bols quantiles1, msize(vtiny) mstyle(p3) clstyle(p2)), /*
> */ scale(1.2) plotregion(style(none)) /*
> */ title("Slope Estimates as Quantile Varies") /*
> */ xtitle("Quantile", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Slope and confidence bands", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(4) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Upper 95% confidence band") label(2 "Quantile slope coefficient") /*
> */ label(3 "Lower 95% confidence band") label(4 "OLS slope coefficient") )
. graph export ch4fig1QR.wmf, replace
(file c:\Imbook\bwebpage\Section2\ch4fig1QR.wmf written in Windows Metafile format)
.
. ********** CLOSE OUTPUT **********
. log close
log: c:\Imbook\bwebpage\Section2\mma04p2qreg.txt
log type: text
closed on: 17 May 2005, 13:51:21
--------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section2\mma04p3iv.txt
log type: text
opened on: 17 May 2005, 13:44:29
.
. ********** OVERVIEW OF MMA04P3IV.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 4.8.8 pages 102-3
. * Instrumental variables analysis.
. * (1) IV Regression (with robust s.e.'s though not needed here for iid error).
. * (2) Table 4.4
. * using generated data (see below)
.
. ********** SETUP **********
.
. set more off
. version 8
.
. ********** GENERATE DATA and SUMMARIZE **********
.
. * Model is
. * y = b1 + b2*x + u
. * x = c1 + c2*z + v
. * z ~ N[2,1]
. * where b1=0, b2=0.5, c1=0 and c2=1
. * and u and v are joint normal (0,0,1,1,0.8)
.
. * OLS of y on x is inconsistent as x is correlated with u
. * Instead need to do IV with instrument z for x
. * Also try using
.
. set seed 10001
. set obs 10000
obs was 0, now 10000
. scalar b1 = 0
. scalar b2 = 0.5
. scalar c1 = 0
. scalar c2 = 1
.
. * Generate errors u and v
. * Use fact that u is N(0,1)
. * and v | u is N(0 + (.8/1)(u - 0), 1 - .8x.8/1 = 0.36)
. gen u = 1*invnorm(uniform())
. gen muvgivnu = 0.8*u
. gen v = 1*(muvgivnu+sqrt(0.36)*invnorm(uniform()))
.
. * Generate instrument z (which is purely random)
. gen z = 2 + 1*invnorm(uniform())
.
. * Generate regressor x which is correlated with z, and with u via v
. gen x = c1 + c2*z + v
.
. * Generate dependent variable y
. gen y = b1 + b2*x + u
.
. * Generate z-cubed. Used as an alternative instrument
. gen zcube = z*z*z
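
The same design can be simulated by drawing u and v jointly rather than through the conditional distribution of v given u. This is a minimal sketch under stated assumptions, not the program used for the book: it assumes Stata 10.1 or later (for rnormal() and ivregress), the matrix name C is illustrative, and the random draws will differ from those in this log.

```stata
* Sketch only: same design as the log, but (u, v) are drawn jointly
clear
set seed 10001
set obs 10000

* u and v: standard normal with correlation 0.8
matrix C = (1, 0.8 \ 0.8, 1)
drawnorm u v, corr(C)

* Instrument z ~ N[2,1], drawn independently of u and v
generate z = 2 + rnormal()

* Regressor x is correlated with z, and with u via v (c1 = 0, c2 = 1)
generate x = 0 + 1*z + v

* Dependent variable with b1 = 0 and b2 = 0.5
generate y = 0 + 0.5*x + u

correlate u v z

* OLS is inconsistent here; IV with instrument z targets b2 = 0.5
regress y x
ivregress 2sls y (x = z)
```

Under this design Var[x] = Var[z] + Var[v] = 2 and Cov[x,u] = Cov[v,u] = 0.8, so the OLS slope converges to 0.5 + 0.8/2 = 0.9 rather than 0.5, while the IV slope should be close to 0.5.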
.
. * Descriptive Statistics
. describe
Contains data
  obs:        10,000
 vars:             7
 size:       320,000 (96.9% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
u               float  %9.0g
muvgivnu        float  %9.0g
v               float  %9.0g
z               float  %9.0g
x               float  %9.0g
y               float  %9.0g
zcube           float  %9.0g
-------------------------------------------------------------------------------
Sorted by:
     Note:  dataset has changed since last saved
. summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           u |     10000     .003772    1.010726  -4.010302   4.267661
    muvgivnu |     10000    .0030176    .8085809  -3.208241   3.414129
           v |     10000    .0097031    1.005874  -3.992237    3.79261
           z |     10000    1.997786    1.013118  -1.895752    5.81496
           x |     10000    2.007489    1.436511  -3.139744   7.366555
-------------+--------------------------------------------------------
           y |     10000    1.007516    1.538611  -5.309155   7.794924
       zcube |     10000    14.14145    17.88016  -6.813095   196.6257
. correlate y x z u v
(obs=10000)

             |        y        x        z        u        v
-------------+---------------------------------------------
           y |   1.0000
           x |   0.8423   1.0000
           z |   0.3403   0.7140   1.0000
           u |   0.9237   0.5716   0.0107   1.0000
           v |   0.8601   0.7090   0.0124   0.8055   1.0000

. correlate y x z u v, cov
(obs=10000)

             |        y        x        z        u        v
-------------+---------------------------------------------
           y |  2.36732
           x |  1.86165  2.06356
           z |  .530456   1.0391  1.02641
           u |   1.4365  .829866  .010909  1.02157
           v |  1.33119  1.02447  .012687  .818958  1.01178

. graph matrix y x z u v
.
. * Write data to a text (ascii) file so can use with programs other than Stata
. outfile y x z u v using mma04p3iv.asc, replace
.
. ********** DO THE ANALYSIS: ESTIMATE MODELS **********
.
. * (1) OLS is inconsistent (first column of Table 4.4)
. regress y x

      Source |       SS       df       MS              Number of obs =   10000
-------------+------------------------------           F(  1,  9998) =24412.17
       Model |  16793.2198     1  16793.2198           Prob > F      =  0.0000
    Residual |  6877.65935  9998  .687903516           R-squared     =  0.7094
-------------+------------------------------           Adj R-squared =  0.7094
       Total |  23670.8791  9999  2.36732464           Root MSE      =   .8294

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .9021522    .005774   156.24   0.000      .890834    .9134704
       _cons |  -.8035441    .014253   -56.38   0.000    -.8314827   -.7756054
------------------------------------------------------------------------------
. regress y x, robust

Regression with robust standard errors                 Number of obs =   10000
                                                       F(  1,  9998) =24780.49
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.7094
                                                       Root MSE      =   .8294

------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .9021522   .0057309   157.42   0.000     .8909184    .9133859
       _cons |  -.8035441   .0141056   -56.97   0.000    -.8311939   -.7758942
------------------------------------------------------------------------------
. estimates store olswrong
.
. * (2) IV with instrument z is consistent and efficient (second column of Table 4.4)
. ivreg y (x = z)

Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =   10000
-------------+------------------------------           F(  1,  9998) = 2728.97
       Model |  13628.1781     1  13628.1781           Prob > F      =  0.0000
    Residual |   10042.701  9998    1.004471           R-squared     =  0.5757
-------------+------------------------------           Adj R-squared =  0.5757
       Total |  23670.8791  9999  2.36732464           Root MSE      =  1.0022

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .5104982   .0097723    52.24   0.000     .4913426    .5296538
       _cons |   -.017303   .0220296    -0.79   0.432    -.0604854    .0258793
------------------------------------------------------------------------------
Instrumented:  x
Instruments:   z
------------------------------------------------------------------------------
. ivreg y (x = z), robust

IV (2SLS) regression with robust standard errors       Number of obs =   10000
                                                       F(  1,  9998) = 2670.19
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.5757
                                                       Root MSE      =  1.0022

------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .5104982   .0098792    51.67   0.000     .4911329    .5298635
       _cons |   -.017303   .0220785    -0.78   0.433    -.0605813    .0259752
------------------------------------------------------------------------------
Instrumented:  x
Instruments:   z
------------------------------------------------------------------------------
. estimates store iv
.
. * (3) IV estimator in (2) can be computed from first principles:
. *     regress y on z gives dy/dz
. *     regress x on z gives dx/dz
. *     and divide the two
. regress y z

      Source |       SS       df       MS              Number of obs =   10000
-------------+------------------------------           F(  1,  9998) = 1309.44
       Model |  2741.16635     1  2741.16635           Prob > F      =  0.0000
    Residual |  20929.7128  9998  2.09338995           R-squared     =  0.1158
-------------+------------------------------           Adj R-squared =  0.1157
       Total |  23670.8791  9999  2.36732464           Root MSE      =  1.4469

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           z |    .516808   .0142819    36.19   0.000     .4888126    .5448035
       _cons |  -.0249553    .031991    -0.78   0.435    -.0876642    .0377535
------------------------------------------------------------------------------
. matrix byonz = e(b)
. regress x z

      Source |       SS       df       MS              Number of obs =   10000
-------------+------------------------------           F(  1,  9998) =10396.43
       Model |  10518.3341     1  10518.3341           Prob > F      =  0.0000
    Residual |  10115.2362  9998  1.01172597           R-squared     =  0.5098
-------------+------------------------------           Adj R-squared =  0.5097
       Total |  20633.5703  9999  2.06356339           Root MSE      =  1.0058

------------------------------------------------------------------------------
           x |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           z |    1.01236   .0099287   101.96   0.000     .9928979    1.031822
       _cons |  -.0149899     .02224    -0.67   0.500    -.0585847     .028605
------------------------------------------------------------------------------
. matrix bxonz = e(b)
. matrix ivfirstprinciples = byonz[1,1]/bxonz[1,1]
. matrix list byonz

byonz[1,2]
             z       _cons
y1   .51680804  -.02495533

. matrix list bxonz

bxonz[1,2]
             z       _cons
y1   1.0123602  -.01498985

. matrix list ivfirstprinciples

symmetric ivfirstprinciples[1,1]
           c1
r1   .5104982
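
The first-principles calculation in (3) is just the ratio of the two reduced-form slopes. As a minimal sketch, with the two slope values copied from the matrix listings in this log and the scalar names dydz, dxdz and biv chosen only for illustration:

```stata
* Sketch only: ratio of the slope from y on z to the slope from x on z
scalar dydz = .51680804
scalar dxdz = 1.0123602
scalar biv  = dydz/dxdz
display "IV slope = (dy/dz)/(dx/dz) = " %9.7f biv
```

The result should agree with the ivreg slope coefficient on x reported in (2), up to the precision of the displayed inputs.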
.
. * (4) IV can be computed as 2SLS, but wrong standard errors
. * (third column of Table 4.4)
. * (4A) OLS of x on z gives xhat
. regress x z

      Source |       SS       df       MS              Number of obs =   10000
-------------+------------------------------           F(  1,  9998) =10396.43
       Model |  10518.3341     1  10518.3341           Prob > F      =  0.0000
    Residual |  10115.2362  9998  1.01172597           R-squared     =  0.5098
-------------+------------------------------           Adj R-squared =  0.5097
       Total |  20633.5703  9999  2.06356339           Root MSE      =  1.0058

------------------------------------------------------------------------------
           x |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           z |    1.01236   .0099287   101.96   0.000     .9928979    1.031822
       _cons |  -.0149899     .02224    -0.67   0.500    -.0585847     .028605
------------------------------------------------------------------------------
. predict xhat, xb
. * (4B) OLS of y on xhat gives the IV estimate but wrong standard errors
. regress y xhat

      Source |       SS       df       MS              Number of obs =   10000
-------------+------------------------------           F(  1,  9998) = 1309.44
       Model |  2741.16636     1  2741.16636           Prob > F      =  0.0000
    Residual |  20929.7127  9998  2.09338995           R-squared     =  0.1158
-------------+------------------------------           Adj R-squared =  0.1157
       Total |  23670.8791  9999  2.36732464           Root MSE      =  1.4469

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        xhat |   .5104982   .0141075    36.19   0.000     .4828446    .5381518
       _cons |   -.017303   .0318026    -0.54   0.586    -.0796425    .0450364
------------------------------------------------------------------------------
. regress y xhat, robust

Regression with robust standard errors                 Number of obs =   10000
                                                       F(  1,  9998) = 1271.86
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.1158
                                                       Root MSE      =  1.4469

------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        xhat |   .5104982   .0143144    35.66   0.000      .482439    .5385574
       _cons |   -.017303   .0319207    -0.54   0.588    -.0798741     .045268
------------------------------------------------------------------------------
. estimates store twosls
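
The standard errors from the manual second stage are wrong because the residual variance is computed from y - b*xhat rather than from y - b*x. In the nonrobust case both variance estimates use the same cross-product term in xhat and the same degrees of freedom, so the correct standard error can be recovered by rescaling with the ratio of the residual mean squares. The following is a minimal sketch using figures copied from the output in this log; the scalar names are illustrative, and the rescaling is not claimed to apply to the robust standard errors.

```stata
* Sketch only: rescale the nonrobust second-stage s.e. of the slope
* se_wrong : s.e. of xhat from "regress y xhat"
* ms_iv    : residual mean square reported by "ivreg y (x = z)"
* ms_2nd   : residual mean square reported by "regress y xhat"
scalar se_wrong   = .0141075
scalar ms_iv      = 1.004471
scalar ms_2nd     = 2.09338995
scalar se_correct = se_wrong*sqrt(ms_iv/ms_2nd)
display "rescaled s.e. = " %9.7f se_correct
```

The rescaled value should be close to the nonrobust ivreg standard error for x, with small differences due to rounding of the displayed inputs.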
.
. * (5) IV with instrument zcube is consistent but inefficient
. * (last column of Table 4.4)
. ivreg y (x = zcube)

Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =   10000
-------------+------------------------------           F(  1,  9998) = 2001.31
       Model |  13598.1181     1  13598.1181           Prob > F      =  0.0000
    Residual |   10072.761  9998   1.0074776           R-squared     =  0.5745
-------------+------------------------------           Adj R-squared =  0.5744
       Total |  23670.8791  9999  2.36732464           Root MSE      =  1.0037

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .5086427   .0113699    44.74   0.000     .4863555    .5309299
       _cons |  -.0135782   .0249344    -0.54   0.586    -.0624546    .0352982
------------------------------------------------------------------------------
Instrumented:  x
Instruments:   zcube
------------------------------------------------------------------------------
. ivreg y (x = zcube), robust

IV (2SLS) regression with robust standard errors       Number of obs =   10000
                                                       F(  1,  9998) = 1894.15
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.5745
                                                       Root MSE      =  1.0037

------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .5086427   .0116871    43.52   0.000     .4857337    .5315517
       _cons |  -.0135782   .0253208    -0.54   0.592     -.063212    .0360556
------------------------------------------------------------------------------
Instrumented:  x
Instruments:   zcube
------------------------------------------------------------------------------
. estimates store ivineff
.
. ********** DISPLAY KEY RESULTS in Table 4.4 p.103 **********
.
. * Table 4.4 page 103
. estimates table olswrong iv twosls ivineff, se stats(N r2) b(%8.3f) keep(_cons x xhat)

----------------------------------------------------------
    Variable | olswrong       iv       twosls    ivineff
-------------+--------------------------------------------
       _cons |   -0.804     -0.017     -0.017     -0.014
             |    0.014      0.022      0.032      0.025
           x |    0.902      0.510                 0.509
             |    0.006      0.010                 0.012
        xhat |                          0.510
             |                          0.014
-------------+--------------------------------------------
           N |  1.0e+04    1.0e+04    1.0e+04    1.0e+04
          r2 |    0.709      0.576      0.116      0.574
----------------------------------------------------------
                                              legend: b/se
.
. ********** CLOSE OUTPUT
. log close
log: c:\Imbook\bwebpage\Section2\mma04p3iv.txt
log type: text
closed on: 17 May 2005, 13:44:41
--------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section2\mma04p4ivweak.txt
log type: text
opened on: 17 May 2005, 13:45:59
.
. ********** OVERVIEW OF MMA04P4IVWEAK.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 4.9.5 pages 110-2
. * IV regression with potentially weak instruments
. * (1) Compares OLS and IV estimation of log-wages on schooling regression
. * where schooling, experience and experience-squared are endogenous
. * and proximity to 4-year college, age and age-squared are instruments
. * so model is just-identified.
. * (2) Verifies that here can treat errors as homoskedastic
. * (3) Looks at weak instruments
. * (A) instrument relevance: Whether Shea's partial R-squared is low
. * (B) finite sample bias: whether first-stage partial F is low
. * (4) Provides Table 4.5
. * (5) Does more analysis than reported in the book
.
. * To run this program you need data and dictionary files
. * DATA66.dat ASCII data set
. * DATA66.dct Stata dictionary that labels variables
.
. ********** SETUP **********
.
. set more off
. version 8.0
. set memory 20m
(20480k)
. set linesize 150 /* Permits long inputline commands with delimit */
.
. ********** ORIGINAL DATA SOURCE **********
.
. * Program mma4p4ivweak.do based on Kling Analys66.do September 2003
. * written for Jeffrey R. Kling (2001) "Interpreting Instrumental Variables Estimates
. * of the Return to Schooling", Journal of Business and Economic Statistics,
. * July 2001, 19 (3), pp.358-364.
. * This program focuses on Columns (1) and (2) of Kling's Table 1 on p.359
. * in turn based on
. * David Card (1995), "Using Geographic Variation in College Proximity to
. * Estimate the Returns to Schooling", in
. * Aspects of Labor Market Behavior: Essays in Honor of John Vanderkamp,
. * eds. L.N. Christofides et al., Toronto: University of Toronto Press, pp.201-221.
.
. ********** READ IN DATA and SUMMARIZE **********


.
. infile using DATA66.dct, using(DATA66.dat)
dictionary using DATA66.dat {
_column(1)   id       %8f  "ID CODE (r0000100) n= 5225 mean= 2613.000 min= 1 max= 5225 "
_column(9)   black    %3f  "Race (r0002300) n= 5225 mean= 1.296 min= 1 max=3 "
_column(13)  imigrnt  %3f  "Was r's brthpl in the US? (r0038000) n=4965 mean=0.98 mn=0 mx=1 "
_column(17)  hhead    %8f  "Person R lived w/ @ age 14 (r0039700) n= 5213 mean=1.92 mn=1 mx=9"
_column(28)  mag_14   %10f "Were magznes avail at age 14 (r0039900) n=5167 mean=0.69 mn=0 mx=1 "
_column(40)  news_14  %10f "Were nwspaprs avail at age 14 (r0040000) n=5195 mean=0.85 mn=0 mx=1"
_column(52)  lib_14   %10f "Were lib-card avail at age14 (r0040100) n=5204 mean=0.66 mn=0 mx=1 "
_column(63)  num_sib  %8f  "Tot # sibs r 66 (r0056900) n=5168 mean=3.408 min=0 max=18"
_column(72)  fgrade   %8f  "Hgc by father, 66 (r0063100) n=3930 mean=9.937 min=0 max=18"
_column(81)  mgrade   %8f  "Hgc by mother, 66 (r0063300) n=4573 mean=10.25 min=0 max=18"
_column(90)  iq       %8f  "Iq_score (r0171100) n= 3369 mean=101.582 min=50 max=158 "
_column(99)  bdate    %8f  "Birthdate - STATA formatted "
_column(108) gfill76  %8f  "'76 Grade level, some values filled from prevs reports"
_column(117) wt76     %8f  "'76 Weight "
_column(126) grade76  %8f  "'76 Grade level"
_column(135) grade66  %8f  "'66 Grade level"
_column(144) age66    %8f  "Age reported by screener (r0002200) "
_column(153) smsa66   %8f  "If lived in SMSA in 1966 (r0002455=1,2)"
_column(162) region   %8f  "Census Region in 1966 (r0002900) "
_column(171) smsa76   %8f  "If lived in SMSA in 1976 (r0437515=1,2)"
_column(180) col4     %8f  "If any 4-year college nearby (r0004000!=4) "
_column(189) mcol4    %8f  "If male 4-year college nearby (r0004100=1,2) "
_column(198) col4pub  %8f  "If public 4-year college nearby (r0004000=2,3)"
_column(207) south76  %1f  "If lived in South in 1976 (r0437511=1) "
_column(209) wage76   %10f "'76 Wage"
_column(219) exp76    %8f  "'76 experience, (10 + age66) - grade76 - 6)"
_column(230) expsq76  %10f "'76 experience, exp76 ^2/100 "
_column(243) age76    %8f  "'76 age (age66 +10) "
_column(252) agesq76  %8f  "'76 age squared (age76^2) "
_column(261) reg1     %8f  "region==NE"
_column(270) reg2     %8f  "If lived in Region 2 (region= MidAtl)"
_column(279) reg3     %8f  "If lived in Region 3 (region= ENC) "
_column(288) reg4     %8f  "If lived in Region 4 (region= WNC) "
_column(297) reg5     %8f  "If lived in Region 5 (region= SA ) "
_column(306) reg6     %8f  "If lived in Region 6 (region= ESC) "
_column(315) reg7     %8f  "If lived in Region 7 (region= WSC) "
_column(324) reg8     %8f  "If lived in Region 8 (region= M ) "
_column(333) reg9     %8f  "If lived in Region 9 (region= P ) "
_column(342) momdad14 %8f  "If lived with both parents at age 14 "
_column(351) sinmom14 %8f  "If lived with mother only at age 14 "
_column(360) nodaded  %1f  "If father has no formal education "
_column(362) nomomed  %1f  "If mother has no formal education "
_column(365) daded    %10f "Mean grade level of father "
_column(377) momed    %10f "Mean grade level of mother "
_column(396) famed    %8f  "Father's and mother's education "
_column(405) famed1   %8f  "If mgrade> 12 & fgrade> 12 (famed=1) "
_column(414) famed2   %8f  "If mgrade>=12 & fgrade>=12 (famed=2) "
_column(423) famed3   %8f  "If mgrade==12 & fgrade==12 (famed=3) "
_column(432) famed4   %8f  "If mgrade>=12 & fgrade==-1 (famed=4) "
_column(441) famed5   %8f  "If fgrade>=12 (famed=5) "
_column(450) famed6   %8f  "If mgrade>=12 & fgrade> -1 (famed=6) "
_column(459) famed7   %8f  "If mgrade>=9 & fgrade>=9 (famed=7) "
_column(468) famed8   %8f  "If mgrade> -1 & fgrade> -1 (famed=8) "
_column(477) famed9   %8f  "If famed not in range (1-8)"
_column(486) int76    %8f  "If wt76 not missing "
_column(495) age1415  %8f  "If in age group =14-15"
_column(504) age1617  %8f  "If in age group =16-17"
_column(513) age1819  %8f  "If in age group =18-19"
_column(522) age2021  %8f  "If in age group =20-21"
_column(531) age2224  %8f  "If in age group =20-24"
_column(540) cage1415 %8f  "If in age group =14,15 and lived near college"
_column(549) cage1617 %8f  "If in age group =16,17 and lived near college"
_column(558) cage1819 %8f  "If in age group =18,19 and lived near college"
_column(567) cage2021 %8f  "If in age group =20,21 and lived near college"
_column(576) cage2224 %8f  "If in age group =20-24 and lived near college"
_column(585) cage66   %8f  "Age in 66 and whether lived near college "
_column(594) a1       %8f  "If age in 66 = 14 (age66= 14)"
_column(603) a2       %8f  "If age in 66 = 15 (age66= 15)"
_column(612) a3       %8f  "If age in 66 = 16 (age66= 16)"
_column(621) a4       %8f  "If age in 66 = 17 (age66= 17)"
_column(630) a5       %8f  "If age in 66 = 18 (age66= 18)"
_column(639) a6       %8f  "If age in 66 = 19 (age66= 19)"
_column(648) a7       %8f  "If age in 66 = 20 (age66= 20)"
_column(657) a8       %8f  "If age in 66 = 21 (age66= 21)"
_column(666) a9       %8f  "If age in 66 = 22 (age66= 22)"
_column(675) a10      %8f  "If age in 66 = 23 (age66= 23)"
_column(684) a11      %8f  "If age in 66 = 24 (age66= 24)"
_column(693) ca1      %8f  "Not lived near college in 66"
_column(702) ca2      %8f  "If age in 66 = 14 and lived near college"
_column(711) ca3      %8f  "If age in 66 = 15 and lived near college"
_column(720) ca4      %8f  "If age in 66 = 16 and lived near college"
_column(729) ca5      %8f  "If age in 66 = 17 and lived near college"
_column(738) ca6      %8f  "If age in 66 = 18 and lived near college"
_column(747) ca7      %8f  "If age in 66 = 19 and lived near college"
_column(756) ca8      %8f  "If age in 66 = 20 and lived near college"
_column(765) ca9      %8f  "If age in 66 = 21 and lived near college"
_column(774) ca10     %2f  "If age in 66 = 22 and lived near college"
_column(777) ca11     %2f  "If age in 66 = 23 and lived near college"
_column(780) ca12     %8f  "If age in 66 = 24 and lived near college"
_column(782) g25      %12f "Grade level when 25 years old "
_column(795) g25i     %12f "If =g25 and intrvwed in year used for determining g25 "
_column(819) intmo66  %8f  "Intvw month in 1966, used to identify cases incl by CARD"
_column(828) nlsflt   %8f  "Flag to identify if the case was used by CARD"
_column(837) nsib     %8f  "Number of siblings "
_column(846) ns1      %8f  "If number of siblings = 0 (nsib= 0)"
_column(855) ns2      %8f  "If number of siblings = 2 (nsib= 2)"
_column(864) ns3      %8f  "If number of siblings = 3 (nsib= 3)"
_column(873) ns4      %8f  "If number of siblings = 4 (nsib= 4)"
_column(882) ns5      %8f  "If number of siblings = 6 (nsib= 6)"
_column(891) ns6      %8f  "If number of siblings = 9 (nsib= 9)"
_column(900) ns7      %8f  "If number of siblings =18 (nsib=18)"
}
(5226 observations read)
. * save DATA66, replace
. desc
Contains data
  obs:         5,226
 vars:           101
 size:     2,132,208 (89.8% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
id              float  %9.0g                  ID CODE (r0000100) n= 5225 mean= 2613.000 min= 1 max= 5225
black           float  %9.0g                  Race (r0002300) n= 5225 mean= 1.296 min= 1 max=3
imigrnt         float  %9.0g                  Was r's brthpl in the US? (r0038000) n=4965 mean=0.98 mn=0 mx=1
hhead           float  %9.0g                  Person R lived w/ @ age 14 (r0039700) n= 5213 mean=1.92 mn=1 mx=9
mag_14          float  %9.0g                  Were magznes avail at age 14 (r0039900) n=5167 mean=0.69 mn=0 mx=1
news_14         float  %9.0g                  Were nwspaprs avail at age 14 (r0040000) n=5195 mean=0.85 mn=0 mx=1
lib_14          float  %9.0g                  Were lib-card avail at age14 (r0040100) n=5204 mean=0.66 mn=0 mx=1
num_sib         float  %9.0g                  Tot # sibs r 66 (r0056900) n=5168 mean=3.408 min=0 max=18
fgrade          float  %9.0g                  Hgc by father, 66 (r0063100) n=3930 mean=9.937 min=0 max=18
mgrade          float  %9.0g                  Hgc by mother, 66 (r0063300) n=4573 mean=10.25 min=0 max=18
iq              float  %9.0g                  Iq_score (r0171100) n= 3369 mean=101.582 min=50 max=158
bdate           float  %9.0g                  Birthdate - STATA formatted
gfill76         float  %9.0g                  '76 Grade level, some values filled from prevs reports
wt76            float  %9.0g                  '76 Weight
grade76         float  %9.0g                  '76 Grade level
grade66         float  %9.0g                  '66 Grade level
age66           float  %9.0g                  Age reported by screener (r0002200)
smsa66          float  %9.0g                  If lived in SMSA in 1966 (r0002455=1,2)
region          float  %9.0g                  Census Region in 1966 (r0002900)
smsa76          float  %9.0g                  If lived in SMSA in 1976 (r0437515=1,2)
col4            float  %9.0g                  If any 4-year college nearby (r0004000!=4)
mcol4           float  %9.0g                  If male 4-year college nearby (r0004100=1,2)
col4pub         float  %9.0g                  If public 4-year college nearby (r0004000=2,3)
south76         float  %9.0g                  If lived in South in 1976 (r0437511=1)
wage76          float  %9.0g                  '76 Wage
exp76           float  %9.0g                  '76 experience, (10 + age66) - grade76 - 6)
expsq76         float  %9.0g                  '76 experience, exp76 ^2/100
age76           float  %9.0g                  '76 age (age66 +10)
agesq76         float  %9.0g                  '76 age squared (age76^2)
reg1            float  %9.0g                  region==NE
reg2            float  %9.0g                  If lived in Region 2 (region= MidAtl)
reg3            float  %9.0g                  If lived in Region 3 (region= ENC)
reg4            float  %9.0g                  If lived in Region 4 (region= WNC)
reg5            float  %9.0g                  If lived in Region 5 (region= SA )
reg6            float  %9.0g                  If lived in Region 6 (region= ESC)
reg7            float  %9.0g                  If lived in Region 7 (region= WSC)
reg8            float  %9.0g                  If lived in Region 8 (region= M )
reg9            float  %9.0g                  If lived in Region 9 (region= P )
momdad14        float  %9.0g                  If lived with both parents at age 14
sinmom14        float  %9.0g                  If lived with mother only at age 14
nodaded         float  %9.0g                  If father has no formal education
nomomed         float  %9.0g                  If mother has no formal education
daded           float  %9.0g                  Mean grade level of father
momed           float  %9.0g                  Mean grade level of mother
famed           float  %9.0g                  Father's and mother's education
famed1          float  %9.0g                  If mgrade> 12 & fgrade> 12 (famed=1)
famed2          float  %9.0g                  If mgrade>=12 & fgrade>=12 (famed=2)
famed3          float  %9.0g                  If mgrade==12 & fgrade==12 (famed=3)
famed4          float  %9.0g                  If mgrade>=12 & fgrade==-1 (famed=4)
famed5          float  %9.0g                  If fgrade>=12 (famed=5)
famed6          float  %9.0g                  If mgrade>=12 & fgrade> -1 (famed=6)
famed7          float  %9.0g                  If mgrade>=9 & fgrade>=9 (famed=7)
famed8          float  %9.0g                  If mgrade> -1 & fgrade> -1 (famed=8)
famed9          float  %9.0g                  If famed not in range (1-8)
int76           float  %9.0g                  If wt76 not missing
age1415         float  %9.0g                  If in age group =14-15
age1617         float  %9.0g                  If in age group =16-17
age1819         float  %9.0g                  If in age group =18-19
age2021         float  %9.0g                  If in age group =20-21
age2224         float  %9.0g                  If in age group =20-24
cage1415        float  %9.0g                  If in age group =14,15 and lived near college
cage1617        float  %9.0g                  If in age group =16,17 and lived near college
cage1819        float  %9.0g                  If in age group =18,19 and lived near college
cage2021        float  %9.0g                  If in age group =20,21 and lived near college
cage2224        float  %9.0g                  If in age group =20-24 and lived near college
cage66          float  %9.0g                  Age in 66 and whether lived near college
a1              float  %9.0g                  If age in 66 = 14 (age66= 14)
a2              float  %9.0g                  If age in 66 = 15 (age66= 15)
a3              float  %9.0g                  If age in 66 = 16 (age66= 16)
a4              float  %9.0g                  If age in 66 = 17 (age66= 17)
a5              float  %9.0g                  If age in 66 = 18 (age66= 18)
a6              float  %9.0g                  If age in 66 = 19 (age66= 19)
a7              float  %9.0g                  If age in 66 = 20 (age66= 20)
a8              float  %9.0g                  If age in 66 = 21 (age66= 21)
a9              float  %9.0g                  If age in 66 = 22 (age66= 22)
a10             float  %9.0g                  If age in 66 = 23 (age66= 23)
a11             float  %9.0g                  If age in 66 = 24 (age66= 24)
ca1             float  %9.0g                  Not lived near college in 66
ca2             float  %9.0g                  If age in 66 = 14 and lived near college
ca3             float  %9.0g                  If age in 66 = 15 and lived near college
ca4             float  %9.0g                  If age in 66 = 16 and lived near college
ca5             float  %9.0g                  If age in 66 = 17 and lived near college
ca6             float  %9.0g                  If age in 66 = 18 and lived near college
ca7             float  %9.0g                  If age in 66 = 19 and lived near college
ca8             float  %9.0g                  If age in 66 = 20 and lived near college
ca9             float  %9.0g                  If age in 66 = 21 and lived near college
ca10            float  %9.0g                  If age in 66 = 22 and lived near college
ca11            float  %9.0g                  If age in 66 = 23 and lived near college
ca12            float  %9.0g                  If age in 66 = 24 and lived near college
g25             float  %9.0g                  Grade level when 25 years old
g25i            float  %9.0g                  If =g25 and intrvwed in year used for determining g25
intmo66         float  %9.0g                  Intvw month in 1966, used to identify cases incl by CARD
nlsflt          float  %9.0g                  Flag to identify if the case was used by CARD
nsib            float  %9.0g                  Number of siblings
ns1             float  %9.0g                  If number of siblings = 0 (nsib= 0)
ns2             float  %9.0g                  If number of siblings = 2 (nsib= 2)
ns3             float  %9.0g                  If number of siblings = 3 (nsib= 3)
ns4             float  %9.0g                  If number of siblings = 4 (nsib= 4)
ns5             float  %9.0g                  If number of siblings = 6 (nsib= 6)
ns6             float  %9.0g                  If number of siblings = 9 (nsib= 9)
ns7             float  %9.0g                  If number of siblings =18 (nsib=18)
-------------------------------------------------------------------------------
Sorted by:
     Note:  dataset has changed since last saved
. sum
    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
          id |      5225        2613    1508.472          1       5225
       black |      5225    .2752153    .4466655          0          1
     imigrnt |      5225    .0237321    .1522277          0          1
       hhead |      5225   -.3783732    47.95128       -999          9
      mag_14 |      5225    .6861566    .4616275          0          1
-------------+--------------------------------------------------------
     news_14 |      5225    .8483024    .3577176          0          1
      lib_14 |      5225     .658469    .4733619          0          1
     num_sib |      5168    3.407701    2.586307          0         18
      fgrade |      3930     9.93715    3.777654          0         18
      mgrade |      4573    10.25104     3.17986          0         18
-------------+--------------------------------------------------------
          iq |      3369    101.5818    15.93225         50        158
       bdate |      5204    472926.6    31765.04     360823     521224
     gfill76 |      5225    12.78718    2.802705          0         18
        wt76 |      3695    475512.5    265188.5      98617    2582192
     grade76 |      3671    13.23018    2.747627          0         18
-------------+--------------------------------------------------------
     grade66 |      5225    10.58431    2.433696          0         18
       age66 |      5225    18.09129    3.157657         14         24
      smsa66 |      5225    .6599043    .4737864          0          1
      region |      5225    4.721722    2.300767          1          9
      smsa76 |      5225     .491866    .4999817          0          1
-------------+--------------------------------------------------------
        col4 |      5225     .691866    .4617664          0          1
       mcol4 |      5225    .6874641    .4635713          0          1
     col4pub |      5225    .5129187    .4998809          0          1
     south76 |      3695    .3964817    .4892328          0          1
      wage76 |      3078    1.658013    .4430234          0     3.1797
-------------+--------------------------------------------------------
       exp76 |      3671    8.933533    4.212664          0         25
     expsq76 |      3671    .9754971    .8778352          0       6.25
       age76 |      5225    28.09129    3.157657         24         34
     agesq76 |      5225    799.0896    182.0539        576       1156
        reg1 |      5225         .04    .1959779          0          1
-------------+--------------------------------------------------------
        reg2 |      5225    .1617225    .3682313          0          1
        reg3 |      5225    .1900478    .3923763          0          1
        reg4 |      5225    .0639234    .2446399          0          1
        reg5 |      5225    .2126316    .4092083          0          1
        reg6 |      5225    .0895694    .2855912          0          1
-------------+--------------------------------------------------------
        reg7 |      5225    .1083254    .3108206          0          1
        reg8 |      5225    .0304306    .1717855          0          1
        reg9 |      5225    .1033493    .3044437          0          1
    momdad14 |      5225    .7680383    .4221251          0          1
    sinmom14 |      5225    .1182775    .3229673          0          1
-------------+--------------------------------------------------------
     nodaded |      5225    .2478469    .4318038          0          1
     nomomed |      5225    .1247847    .3305062          0          1
       daded |      5225    9.937162    3.276134          0         18
       momed |      5225    10.25103    2.974812          0         18
       famed |      5225     6.05933    2.643855          1          9
-------------+--------------------------------------------------------
      famed1 |      5225    .0610526    .2394497          0          1
      famed2 |      5225    .0742584     .262216          0          1
      famed3 |      5225    .1144498    .3183872          0          1
      famed4 |      5225    .0474641    .2126498          0          1
      famed5 |      5225     .077512    .2674276          0          1
-------------+--------------------------------------------------------
      famed6 |      5225    .1245933    .3302888          0          1
      famed7 |      5225    .0486124     .215077          0          1
      famed8 |      5225    .2273684    .4191726          0          1
      famed9 |      5225     .224689    .4174173          0          1
       int76 |      5225     .707177    .4551014          0          1
-------------+--------------------------------------------------------
     age1415 |      5225    .2595215    .4384141          0          1
     age1617 |      5225    .2482297    .4320271          0          1
     age1819 |      5225    .1751196    .3801058          0          1
     age2021 |      5225      .11311    .3167576          0          1
     age2224 |      5225    .2040191    .4030216          0          1
-------------+--------------------------------------------------------
    cage1415 |      5225    .1755024    .3804327          0          1
    cage1617 |      5225    .1680383    .3739361          0          1
    cage1819 |      5225    .1245933    .3302888          0          1
    cage2021 |      5225    .0796172    .2707256          0          1
    cage2224 |      5225    .1441148    .3512397          0          1
-------------+--------------------------------------------------------
      cage66 |      5225    12.56115    8.785895          0         24
          a1 |      5225    .1314833    .3379605          0          1
          a2 |      5225    .1280383    .3341644          0          1
          a3 |      5225    .1326316    .3392086          0          1
          a4 |      5225    .1155981    .3197729          0          1
-------------+--------------------------------------------------------
          a5 |      5225     .098756    .2983627          0          1
          a6 |      5225    .0763636    .2656045          0          1
          a7 |      5225    .0560766    .2300915          0          1
          a8 |      5225    .0570335    .2319288          0          1
          a9 |      5225    .0666029    .2493568          0          1
-------------+--------------------------------------------------------
         a10 |      5225    .0683254    .2523275          0          1
         a11 |      5225    .0690909    .2536329          0          1
         ca1 |      5225     .308134    .4617664          0          1
         ca2 |      5225    .0876555    .2828203          0          1
         ca3 |      5225    .0878469    .2830992          0          1
-------------+--------------------------------------------------------
         ca4 |      5225    .0870813    .2819812          0          1
         ca5 |      5225    .0809569    .2727951          0          1
         ca6 |      5225    .0708134    .2565374          0          1
         ca7 |      5225    .0537799    .2256044          0          1
         ca8 |      5225    .0390431     .193716          0          1
-------------+--------------------------------------------------------
         ca9 |      5225    .0405742    .1973204          0          1
        ca10 |      5225    .0465072    .2106009          0          1
        ca11 |      5225    .0484211    .2146748          0          1
        ca12 |      5225    12.52593    2.740455          0         18
         g25 |      5225    12.53923    2.749407          0         18
-------------+--------------------------------------------------------
        g25i |      4148    12.77929    2.740756          0         18
     intmo66 |      5225   -5.790239    128.4984       -999         12
      nlsflt |      5225    .9835407    .1272459          0          1
        nsib |      5225    2.818565    2.473752          0         18
         ns1 |      5225    .2547368    .4357549          0          1
-------------+--------------------------------------------------------
         ns2 |      5225    .3534928    .4780998          0          1
         ns3 |      5225    .0109091    .1038853          0          1
         ns4 |      5225    .1892823    .3917702          0          1
         ns5 |      5225     .135311    .3420882          0          1
         ns6 |      5225    .0558852    .2297218          0          1
-------------+--------------------------------------------------------
         ns7 |      5225    .0003828    .0195628          0          1
.
. * Define the exogenous regressors using the global macro exogregressors
. global exogregressors black south76 smsa76 reg2-reg9 /*
> */ smsa66 momdad14 sinmom14 nodaded nomomed daded momed famed1-famed8
.
. * Write data to a text (ascii) file so can use with programs other than stata
. outfile wage76 grade76 exp76 expsq76 col4 age76 agesq76 black south76 smsa76 reg2-reg9 /*
> */ smsa66 momdad14 sinmom14 nodaded nomomed daded momed famed1-famed8 /*
> */ using mma04p4ivweak.asc, replace
.
.
. ********** (1) OLS AND IV ESTIMATES: COLUMNS 1 AND 2 OF KLING TABLE 1
.
. * RETAIN cases for the analysis
. * Here drop if missing wages or missing schooling or not at first interview
. keep if wage76!=. & grade76!=. & nlsflt==1
(2216 observations deleted)
.
. * DESCRIBE dependent variable, regressors and instruments
. desc wage76 grade76 exp76 expsq76 col4 age76 agesq76 $exogregressors


              storage  display     value
variable name   type   format      label      variable label
------------------------------------------------------------------------------
wage76          float  %9.0g                  '76 Wage
grade76         float  %9.0g                  '76 Grade level
exp76           float  %9.0g                  '76 experience, (10 + age66) grade76 - 6)
expsq76         float  %9.0g                  '76 experience, exp76 ^2/100
col4            float  %9.0g                  If any 4-year college nearby
                                                (r0004000!=4)
age76           float  %9.0g                  '76 age (age66 +10)
agesq76         float  %9.0g                  '76 age squared (age76^2)
black           float  %9.0g                  Race (r0002300) n= 5225 mean=
                                                1.296 min= 1 max=3
south76         float  %9.0g                  If lived in South in 1976
                                                (r0437511=1)
smsa76          float  %9.0g                  If lived in SMSA in 1976
                                                (r0437515=1,2)
reg2            float  %9.0g                  If lived in Region 2 (region=
                                                MidAtl)
reg3            float  %9.0g                  If lived in Region 3 (region=
                                                ENC)
reg4            float  %9.0g                  If lived in Region 4 (region=
                                                WNC)
reg5            float  %9.0g                  If lived in Region 5 (region=
                                                SA )
reg6            float  %9.0g                  If lived in Region 6 (region=
                                                ESC)
reg7            float  %9.0g                  If lived in Region 7 (region=
                                                WSC)
reg8            float  %9.0g                  If lived in Region 8 (region= M
                                                )
reg9            float  %9.0g                  If lived in Region 9 (region= P
                                                )
smsa66          float  %9.0g                  If lived in SMSA in 1966
                                                (r0002455=1,2)
momdad14        float  %9.0g                  If lived with both parents at
                                                age 14
sinmom14        float  %9.0g                  If lived with mother only at
                                                age 14
nodaded         float  %9.0g                  If father has no formal
                                                education
nomomed         float  %9.0g                  If mother has no formal
                                                education
daded           float  %9.0g                  Mean grade level of father
momed           float  %9.0g                  Mean grade level of mother
famed1          float  %9.0g                  If mgrade> 12 & fgrade> 12
                                                (famed=1)
famed2          float  %9.0g                  If mgrade>=12 & fgrade>=12
                                                (famed=2)
famed3          float  %9.0g                  If mgrade==12 & fgrade==12
                                                (famed=3)
famed4          float  %9.0g                  If mgrade>=12 & fgrade==-1
                                                (famed=4)
famed5          float  %9.0g                  If fgrade>=12 (famed=5)
famed6          float  %9.0g                  If mgrade>=12 & fgrade> -1
                                                (famed=6)
famed7          float  %9.0g                  If mgrade>=9 & fgrade>=9
                                                (famed=7)
famed8          float  %9.0g                  If mgrade> -1 & fgrade> -1
                                                (famed=8)
.
. * SUMMARIZE dependent variable, regressors and instruments
. sum wage76 grade76 exp76 expsq76 col4 age76 agesq76 $exogregressors
    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
      wage76 |      3010    1.656664     .443798          0     3.1797
     grade76 |      3010    13.26346    2.676913          1         18
       exp76 |      3010    8.856146    4.141672          0         23
     expsq76 |      3010    .9557907    .8461831          0       5.29
        col4 |      3010    .6820598    .4657535          0          1
-------------+--------------------------------------------------------
       age76 |      3010     28.1196    3.137004         24         34
     agesq76 |      3010    800.5495    180.7484        576       1156
       black |      3010    .2335548    .4231624          0          1
     south76 |      3010    .4036545    .4907113          0          1
      smsa76 |      3010    .7129568    .4524571          0          1
-------------+--------------------------------------------------------
        reg2 |      3010    .1607973     .367405          0          1
        reg3 |      3010    .1956811      .39679          0          1
        reg4 |      3010    .0641196    .2450066          0          1
        reg5 |      3010    .2083056     .406164          0          1
        reg6 |      3010    .0960133    .2946584          0          1
-------------+--------------------------------------------------------
        reg7 |      3010    .1099668    .3129003          0          1
        reg8 |      3010    .0282392     .165683          0          1
        reg9 |      3010    .0903654    .2867522          0          1
      smsa66 |      3010    .6495017    .4772053          0          1
    momdad14 |      3010    .7893688    .4078247          0          1
-------------+--------------------------------------------------------
    sinmom14 |      3010    .1006645    .3009339          0          1
     nodaded |      3010    .2292359    .4204111          0          1
     nomomed |      3010    .1172757     .321802          0          1
       daded |      3010    9.988262    3.266511          0         18
       momed |      3010    10.33675    2.987507          0         18
-------------+--------------------------------------------------------
      famed1 |      3010    .0614618    .2402153          0          1
      famed2 |      3010    .0787375    .2693734          0          1
      famed3 |      3010    .1249169    .3306796          0          1
      famed4 |      3010    .0475083    .2127588          0          1
      famed5 |      3010    .0790698    .2698925          0          1
-------------+--------------------------------------------------------
      famed6 |      3010    .1328904    .3395126          0          1
      famed7 |      3010    .0504983    .2190073          0          1
      famed8 |      3010    .2202658    .4144947          0          1
.
. * OLS estimates of return to schooling.
. * This regression computes schooling coeff, se for Table1 col 1 p.359
. * based on all cases (age grp 14-24) reported highest grd cmpl 76
.
. reg wage76 grade76 exp76 expsq76 $exogregressors
      Source |       SS       df       MS              Number of obs =    3010
-------------+------------------------------           F( 29,  2980) =   44.94
       Model |  180.320527    29  6.21794919           Prob > F      =  0.0000
    Residual |   412.32209  2980  .138363117           R-squared     =  0.3043
-------------+------------------------------           Adj R-squared =  0.2975
       Total |  592.642616  3009  .196956669           Root MSE      =  .37197

------------------------------------------------------------------------------
      wage76 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     grade76 |    .072635   .0036984    19.64   0.000     .0653833    .0798868
       exp76 |   .0845293   .0066819    12.65   0.000     .0714277    .0976308
     expsq76 |  -.2289581   .0319499    -7.17   0.000    -.2916041   -.1663121
       black |  -.1894065   .0194462    -9.74   0.000    -.2275358   -.1512773
     south76 |  -.1464841   .0260345    -5.63   0.000    -.1975314   -.0954368
      smsa76 |   .1377121   .0201334     6.84   0.000     .0982353    .1771889
        reg2 |   .1023805   .0360137     2.84   0.005     .0317662    .1729947
        reg3 |   .1488958   .0352521     4.22   0.000     .0797748    .2180168
        reg4 |   .0601267   .0417556     1.44   0.150     -.021746    .1419994
        reg5 |   .1348504   .0419098     3.22   0.001     .0526752    .2170255
        reg6 |   .1452831   .0453155     3.21   0.001     .0564302    .2341359
        reg7 |   .1301968    .044965     2.90   0.004     .0420312    .2183624
        reg8 |  -.0444289   .0513937    -0.86   0.387    -.1451997    .0563419
        reg9 |   .1285658   .0389959     3.30   0.001     .0521042    .2050274
      smsa66 |   .0233775    .019544     1.20   0.232    -.0149436    .0616987
    momdad14 |   .0693317   .0263402     2.63   0.009      .017685    .1209785
    sinmom14 |   .0335387   .0354168     0.95   0.344    -.0359052    .1029825
     nodaded |  -.0390477   .0531089    -0.74   0.462    -.1431815    .0650862
     nomomed |   .0168143   .0348295     0.48   0.629     -.051478    .0851066
       daded |  -.0017839   .0043977    -0.41   0.685    -.0104068    .0068389
       momed |   .0081443   .0041513     1.96   0.050     4.64e-06    .0162839
      famed1 |  -.1166029   .0788125    -1.48   0.139    -.2711354    .0379296
      famed2 |   -.052544   .0712753    -0.74   0.461    -.1922977    .0872097
      famed3 |  -.0719675   .0654608    -1.10   0.272    -.2003205    .0563856
      famed4 |  -.0197095   .0437058    -0.45   0.652    -.1054062    .0659872
      famed5 |  -.0252185   .0643526    -0.39   0.695    -.1513985    .1009615
      famed6 |  -.0733887   .0621076    -1.18   0.237    -.1951667    .0483894
      famed7 |   -.059927   .0656929    -0.91   0.362     -.188735     .068881
      famed8 |  -.0738951   .0572428    -1.29   0.197    -.1861345    .0383444
       _cons |  -.0278815   .1005974    -0.28   0.782    -.2251288    .1693659
------------------------------------------------------------------------------
. estimates store ols
.
. * IV Instrumental variables estimates of return to schooling.
. * This regression computes schooling coeff and se for Table 1. col 2 p.359
. * Endogenous variables: schooling, experience, experience squared
. * Excl instruments: college in cnty, age age^2
. * based on all cases (age grp 14-24) reported highest grd cmpl 76 ***/
.
. ivreg wage76 $exogregressors /*
> */ (grade76 exp76 expsq76 = col4 age76 agesq76 $exogregressors)
Instrumental variables (2SLS) regression

      Source |       SS       df       MS              Number of obs =    3010
-------------+------------------------------           F( 29,  2980) =   34.56
       Model |  122.395448    29  4.22053269           Prob > F      =  0.0000
    Residual |  470.247169  2980  .157801063           R-squared     =  0.2065
-------------+------------------------------           Adj R-squared =  0.1988
       Total |  592.642616  3009  .196956669           Root MSE      =  .39724

------------------------------------------------------------------------------
      wage76 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     grade76 |   .1324485   .0493419     2.68   0.007     .0357009    .2291961
       exp76 |   .0632411   .0241061     2.62   0.009     .0159748    .1105074
     expsq76 |  -.1266694   .1184765    -1.07   0.285    -.3589735    .1056347
       black |  -.1643766   .0292248    -5.62   0.000    -.2216795   -.1070737
     south76 |  -.1400178   .0283887    -4.93   0.000    -.1956812   -.0843545
      smsa76 |   .0909867   .0441338     2.06   0.039     .0044509    .1775224
        reg2 |   .0753178   .0444167     1.70   0.090    -.0117726    .1624083
        reg3 |   .1231473   .0431763     2.85   0.004      .038489    .2078057
        reg4 |   .0241968   .0534911     0.45   0.651    -.0806865    .1290801
        reg5 |   .1247819   .0455148     2.74   0.006     .0355383    .2140255
        reg6 |    .135761   .0490304     2.77   0.006      .039624    .2318979
        reg7 |   .1063645   .0519274     2.05   0.041     .0045472    .2081817
        reg8 |  -.0850609    .064327    -1.32   0.186    -.2111907    .0410688
        reg9 |   .0916464   .0515551     1.78   0.076    -.0094409    .1927337
      smsa66 |   .0379821   .0241116     1.58   0.115    -.0092951    .0852592
    momdad14 |    .043168   .0354056     1.22   0.223    -.0262539      .11259
    sinmom14 |    .025849   .0383465     0.67   0.500    -.0493392    .1010373
     nodaded |  -.0462392   .0570684    -0.81   0.418    -.1581366    .0656583
     nomomed |   .0266252   .0383434     0.69   0.487     -.048557    .1018074
       daded |  -.0110565   .0089768    -1.23   0.218    -.0286579    .0065449
       momed |  -.0017539   .0093223    -0.19   0.851    -.0200326    .0165249
      famed1 |   -.213271   .1160049    -1.84   0.066    -.4407287    .0141867
      famed2 |  -.1567074   .1145696    -1.37   0.171    -.3813508    .0679361
      famed3 |  -.1354685   .0872725    -1.55   0.121    -.3065889     .035652
      famed4 |  -.0707323   .0627189    -1.13   0.260     -.193709    .0522444
      famed5 |  -.0699675    .077928    -0.90   0.369    -.2227656    .0828306
      famed6 |  -.1171712   .0754408    -1.55   0.120    -.2650926    .0307502
      famed7 |  -.0921498   .0749801    -1.23   0.219    -.2391679    .0548683
      famed8 |  -.1184618   .0713021    -1.66   0.097    -.2582681    .0213445
       _cons |  -.4311125   .3567904    -1.21   0.227    -1.130693    .2684678
------------------------------------------------------------------------------
Instrumented:  grade76 exp76 expsq76
Instruments:   black south76 smsa76 reg2 reg3 reg4 reg5 reg6 reg7 reg8 reg9
               smsa66 momdad14 sinmom14 nodaded nomomed daded momed famed1
               famed2 famed3 famed4 famed5 famed6 famed7 famed8 col4 age76
               agesq76
------------------------------------------------------------------------------
. estimates store iv
.
. ********** (2) NEW ANALYSIS: HETEROSKEDASTIC ROBUST STANDARD ERRORS **********
.
. * Allowing for heteroskedastic errors makes little difference here.
.
. quietly reg wage76 grade76 exp76 expsq76 $exogregressors
. hettest /* Shows that here there is no heteroskedasticity for OLS */
Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: fitted values of wage76

         chi2(1)      =     0.42
         Prob > chi2  =   0.5191
. quietly reg wage76 grade76 exp76 expsq76 $exogregressors, robust
. estimates store olshet
.
. quietly ivreg wage76 $exogregressors /*
> */ (grade76 exp76 expsq76 = col4 age76 agesq76 $exogregressors), robust
. estimates store ivhet
.
. **** DISPLAY RESULTS IN TABLE 4.5 p.111
.
. * Table 4.5 p.111: OLS and IV estimates, s.e.'s and R^2 in Table 4.5
.
. * Table reports only the coefficient and standard errors for grade76
. estimates table ols olshet iv ivhet, /*
> */ se stats(N ll r2 rss mss rmse df_r) b(%10.4f)

------------------------------------------------------------------
    Variable |     ols       olshet        iv        ivhet
-------------+----------------------------------------------------
     grade76 |     0.0726      0.0726      0.1324      0.1324
             |     0.0037      0.0039      0.0493      0.0488
       exp76 |     0.0845      0.0845      0.0632      0.0632
             |     0.0067      0.0068      0.0241      0.0241
     expsq76 |    -0.2290     -0.2290     -0.1267     -0.1267
             |     0.0319      0.0322      0.1185      0.1182
       black |    -0.1894     -0.1894     -0.1644     -0.1644
             |     0.0194      0.0198      0.0292      0.0285
     south76 |    -0.1465     -0.1465     -0.1400     -0.1400
             |     0.0260      0.0280      0.0284      0.0292
      smsa76 |     0.1377      0.1377      0.0910      0.0910
             |     0.0201      0.0193      0.0441      0.0440
        reg2 |     0.1024      0.1024      0.0753      0.0753
             |     0.0360      0.0350      0.0444      0.0432
        reg3 |     0.1489      0.1489      0.1231      0.1231
             |     0.0353      0.0338      0.0432      0.0418
        reg4 |     0.0601      0.0601      0.0242      0.0242
             |     0.0418      0.0412      0.0535      0.0531
        reg5 |     0.1349      0.1349      0.1248      0.1248
             |     0.0419      0.0428      0.0455      0.0459
        reg6 |     0.1453      0.1453      0.1358      0.1358
             |     0.0453      0.0452      0.0490      0.0483
        reg7 |     0.1302      0.1302      0.1064      0.1064
             |     0.0450      0.0457      0.0519      0.0516
        reg8 |    -0.0444     -0.0444     -0.0851     -0.0851
             |     0.0514      0.0509      0.0643      0.0619
        reg9 |     0.1286      0.1286      0.0916      0.0916
             |     0.0390      0.0388      0.0516      0.0504
      smsa66 |     0.0234      0.0234      0.0380      0.0380
             |     0.0195      0.0187      0.0241      0.0231
    momdad14 |     0.0693      0.0693      0.0432      0.0432
             |     0.0263      0.0257      0.0354      0.0352
    sinmom14 |     0.0335      0.0335      0.0258      0.0258
             |     0.0354      0.0359      0.0383      0.0384
     nodaded |    -0.0390     -0.0390     -0.0462     -0.0462
             |     0.0531      0.0511      0.0571      0.0550
     nomomed |     0.0168      0.0168      0.0266      0.0266
             |     0.0348      0.0344      0.0383      0.0375
       daded |    -0.0018     -0.0018     -0.0111     -0.0111
             |     0.0044      0.0044      0.0090      0.0089
       momed |     0.0081      0.0081     -0.0018     -0.0018
             |     0.0042      0.0042      0.0093      0.0093
      famed1 |    -0.1166     -0.1166     -0.2133     -0.2133
             |     0.0788      0.0792      0.1160      0.1160
      famed2 |    -0.0525     -0.0525     -0.1567     -0.1567
             |     0.0713      0.0698      0.1146      0.1132
      famed3 |    -0.0720     -0.0720     -0.1355     -0.1355
             |     0.0655      0.0644      0.0873      0.0865
      famed4 |    -0.0197     -0.0197     -0.0707     -0.0707
             |     0.0437      0.0416      0.0627      0.0601
      famed5 |    -0.0252     -0.0252     -0.0700     -0.0700
             |     0.0644      0.0625      0.0779      0.0763
      famed6 |    -0.0734     -0.0734     -0.1172     -0.1172
             |     0.0621      0.0601      0.0754      0.0735
      famed7 |    -0.0599     -0.0599     -0.0921     -0.0921
             |     0.0657      0.0640      0.0750      0.0730
      famed8 |    -0.0739     -0.0739     -0.1185     -0.1185
             |     0.0572      0.0545      0.0713      0.0682
       _cons |    -0.0279     -0.0279     -0.4311     -0.4311
             |     0.1006      0.0997      0.3568      0.3528
-------------+----------------------------------------------------
           N |  3010.0000   3010.0000   3010.0000   3010.0000
          ll | -1279.2297  -1279.2297
          r2 |     0.3043      0.3043      0.2065      0.2065
         rss |   412.3221    412.3221    470.2472    470.2472
         mss |   180.3205    180.3205    122.3954    122.3954
        rmse |     0.3720      0.3720      0.3972      0.3972
        df_r |  2980.0000   2980.0000   2980.0000   2980.0000
------------------------------------------------------------------
                                                      legend: b/se
.
. ********** (3) NEW ANALYSIS: CHECK FOR WEAK INSTRUMENTS **********
.
. * Model is y = b1*x1 + x2'b2 + u
. * where x1 is scalar endogenous (grade76)
. * where x2 is vector of regressors that includes
. *   exp76 and expsq76 which are also endogenous
. *   and $exogregressors which are exogenous
. * and the instruments Z are col4 age76 agesq76 $exogregressors
.
. * Check for weak instruments
. * Focus on grade76 but can also do this for the other two endogenous regressors.
. * In this example no problems for the other two:
. * as age and age-squared are good instruments for exp and exp-squared.
.
. **** (A) Simple analysis R-squared and F-test [Given in Table 4.5]
.
. * R2 from regress endogenous regressor on instruments
. * This is same as squared correlation between x1 and projection of x1 on Z
. quietly reg grade76 col4 age76 agesq76 $exogregressors
. di e(r2) " r2 of x1 on Z"
.29677588 r2 of x1 on Z
.
. * Do the partial F-test on the three instruments

. * This is the standard first-stage regression F-test


.
. **** DISPLAY RESULT IN TABLE 4.5 page 111
.
. * First-stage F statistic given in Table 4.5
. test col4 age76 agesq76
( 1) col4 = 0
( 2) age76 = 0
( 3) agesq76 = 0
F( 3, 2980) = 8.07
Prob > F = 0.0000
.
. * Compare this to R-squared when regress only on the exogenous regressors,
. * i.e. on Z with the three additional instruments (col4 age76 agesq76) dropped
. quietly reg grade76 $exogregressors
. di e(r2) " r2 of x1 on Z with the three additional instruments dropped"
.29106483 r2 of x1 on Z with the three additional instruments dropped
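
The first-stage F statistic for grade76 can be cross-checked from the two R-squared values printed in this log, because under the default (homoskedastic) OLS variance the partial F-test equals ((R2_u - R2_r)/q) / ((1 - R2_u)/df). The Python sketch below is not part of the original Stata program; the function name partial_f is hypothetical, and the inputs are simply the numbers shown above (q = 3 excluded instruments, 2980 residual degrees of freedom).

```python
def partial_f(r2_unrestricted, r2_restricted, q, df_resid):
    """F statistic for jointly dropping q regressors, computed from two R-squared values.

    Valid for the default (non-robust) OLS F-test when both regressions
    use the same dependent variable and the same sample.
    """
    numerator = (r2_unrestricted - r2_restricted) / q
    denominator = (1.0 - r2_unrestricted) / df_resid
    return numerator / denominator


# Values taken from the log: R2 of grade76 on all instruments, and R2 with
# col4, age76 and agesq76 dropped.
f_stat = partial_f(0.29677588, 0.29106483, q=3, df_resid=2980)
print(round(f_stat, 2))  # 8.07, matching the F(3, 2980) reported by test
```

The small gain in R-squared from adding the three instruments (about 0.0057) is what keeps this F statistic modest despite the large sample.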
.
. * Obtain first-stage F for the other two endogenous regressors
. quietly reg exp76 col4 age76 agesq76 $exogregressors
. test col4 age76 agesq76
( 1) col4 = 0
( 2) age76 = 0
( 3) agesq76 = 0
F( 3, 2980) = 1772.03
Prob > F = 0.0000
. quietly reg expsq76 col4 age76 agesq76 $exogregressors
. test col4 age76 agesq76
( 1) col4 = 0
( 2) age76 = 0
( 3) agesq76 = 0
F( 3, 2980) = 1542.36
Prob > F = 0.0000
.
. **** (B) Minimum eigenvalue of matrix analog of the first-stage F statistic
. * proposed by Stock et al (2002) and tables in Stock and Yogo (2003)
. * This test is not done here.
.
. **** (C) Bound et al (1995) partial R-squared

.
. * Not relevant here as more than one endogenous regressor
. * If only one endogenous regressor x1 Bound et al purge the effect of x2
. * by (1) get residual from regress x1 on x2
. * (2) get the residuals from regress z on x2
. * and then get the R-squared from regress (1) on (2).
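
The three steps of the Bound et al partial R-squared described in (C) can be sketched outside Stata as follows. This is only an illustration under simplifying assumptions that are not in the original program: x2 and z are each a single variable, the data are simulated, and the helper names residuals, r_squared and bound_partial_r2 are hypothetical.

```python
import random


def residuals(y, x):
    """Residuals from OLS of y on a constant and one regressor x."""
    n = len(y)
    ybar = sum(y) / n
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return [(yi - ybar) - slope * (xi - xbar) for xi, yi in zip(x, y)]


def r_squared(y, x):
    """R-squared from OLS of y on a constant and one regressor x."""
    ybar = sum(y) / len(y)
    tss = sum((yi - ybar) ** 2 for yi in y)
    rss = sum(e ** 2 for e in residuals(y, x))
    return 1.0 - rss / tss


def bound_partial_r2(x1, x2, z):
    """Partial R-squared: purge x2 from both x1 and z, then regress residual on residual."""
    e_x1 = residuals(x1, x2)   # step (1): x1 purged of x2
    e_z = residuals(z, x2)     # step (2): z purged of x2
    return r_squared(e_x1, e_z)


# Simulated example: x1 = z + x2 + noise with independent unit-variance parts,
# so the population partial R-squared is 1/2.
rng = random.Random(12345)
n = 20000
x2 = [rng.gauss(0, 1) for _ in range(n)]
z = [rng.gauss(0, 1) for _ in range(n)]
x1 = [zi + x2i + rng.gauss(0, 1) for zi, x2i in zip(z, x2)]
partial_r2 = bound_partial_r2(x1, x2, z)
print(round(partial_r2, 3))
```

With several exogenous regressors or several instruments the same steps apply, but each residual comes from a multiple regression rather than the one-regressor formula used here.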
.
. **** (D) Shea (1997) partial R-squared [Given in Table 4.5]
.
. * Here we have three endogenous regressors.
. * Focus on the endogenous schooling regressor.
. * For the other two just need to replace the first line of (1)
. * e.g. quietly reg exp76 grade76 expsq76 $exogregressors
. * and replace the first line of (2B)
. * e.g. quietly reg exp76hat grade76hat expsq76hat $exogregressors
.
. * (1) Form x1 - x1tilda: residual from regress x1 on other regressors
. quietly reg grade76 exp76 expsq76 $exogregressors
. predict x1minusx1tilda, resid
.
. * (2) Form x1hat - x1hattilda: residual from regress x1hat on fitted values of other regressors
. * (2A) First get the fitted values from regress endogenous on instruments
. quietly reg grade76 col4 age76 agesq76 $exogregressors
. predict grade76hat, xb
. di e(r2) " r2 from regress x1 on Z"
.29677588 r2 from regress x1 on Z
. quietly reg exp76 col4 age76 agesq76 $exogregressors
. predict exp76hat, xb
. di e(r2) " r2 from regress second endog regressor on Z"
.70622765 r2 from regress second endog regressor on Z
. quietly reg expsq76 col4 age76 agesq76 $exogregressors
. predict expsq76hat, xb
. di e(r2) " r2 from regress third endog regressor on Z"
.67573235 r2 from regress third endog regressor on Z
. * Fitted values for the exogenous from regress exogenous on instruments are the exogenous
. * (2B) Run the regression of x1hat on fitted values of other regressors
. quietly reg grade76hat exp76hat expsq76hat $exogregressors
. di e(r2) " r2 from regress prediction of x1 on predictions of x2"
.98987117 r2 from regress prediction of x1 on predictions of x2

. predict x1hatminusx1hattilda, resid


.
. * (3) Form the correlation between (1) and (2)
. corr x1minusx1tilda x1hatminusx1hattilda
(obs=3010)
             | x1minu~a x1hatm~a
-------------+------------------
x1minusx1t~a |   1.0000
x1hatminus~a |   0.0800   1.0000

.
. **** DISPLAY RESULT IN TABLE 4.5 page 111
.
. * Shea's Partial R^2 in Table 4.5
. di r(rho)^2 " Shea's partial R-squared measure"
.00640757 Shea's partial R-squared measure
.
. sum grade76 grade76hat exp76 exp76hat expsq76 expsq76hat grade76 x1minusx1tilda x1hatminusx1hattilda grade76hat

    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
     grade76 |      3010    13.26346    2.676913          1         18
  grade76hat |      3010    13.26346    1.458306   8.919074   17.42063
       exp76 |      3010    8.856146    4.141672          0         23
    exp76hat |      3010    8.856146    3.480551   1.329216   17.68953
     expsq76 |      3010    .9557907    .8461831          0       5.29
-------------+--------------------------------------------------------
  expsq76hat |      3010    .9557907    .6955874  -.3913698   2.917523
     grade76 |      3010    13.26346    2.676913          1         18
x1minusx1t~a |      3010   -8.71e-10    1.833502  -6.948598   5.661138
x1hatminus~a |      3010   -6.86e-11    .1467669  -.3732457   .3033035
  grade76hat |      3010    13.26346    1.458306   8.919074   17.42063
.
. **** (E) Poskitt-Skeels (2002) partial R-squared
. * Not done here
.
. **** (F) If model was over-identified then do test of over-identifying restrictions
. * Not done here as model is just-identified
.
. ********** CLOSE OUTPUT
. log close
log: c:\Imbook\bwebpage\Section2\mma04p4ivweak.txt
log type: text
closed on: 17 May 2005, 13:46:03
------------------------------------------------------------------------------

Chapter 5.9 pp.159-63

------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section2\mma05p1mle.txt
log type: text
opened on: 17 May 2005, 13:48:11
.
. ********** OVERVIEW OF MMA05P1MLE.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 5.9 pp.159-63
. * Maximum likelihood analysis.
.
. * Provides first two columns of Table 5.7
. * (1) OLS using Stata command regress
. * (2) MLE using Stata command streg, dist(exp) for exponential MLE
. * (3) MLE using Stata command ml for user-provided log-likelihood
. * using generated data (see below)
.
. * Related programs:
. * mma05p2nls.do         NLS, WNLS, FGNLS for same data using nl command
. * mma05p3nlsbyml.do     NLS, WNLS, FGNLS for same data using ml command
. * mma05p4margeffects.do Calculates marginal effects
.
. ********** SETUP **********
.
. set more off
. version 8
.
. ********** GENERATE DATA and SUMMARIZE **********
.
. * Model is y ~ exponential(exp(a + bx))
. *          x ~ N[mux, sigx^2]
. *       f(y) = exp(a + bx)*exp(-y*exp(a + bx))
. *     lnf(y) = (a + bx) - y*exp(a + bx)
. *       E[y] = exp(-(a + bx))   note sign reversal for the mean
. *       V[y] = exp(-2*(a + bx)) = E[y]^2
.
. * The dgp sets particular values of a, b, mux and sigx
. * Here a = 2, b = -1 and x ~ N[1, 1]
. scalar a = 2


. scalar b = -1
. scalar mux = 1
. scalar sigx = 1
.
. * Set the sample size. Table 5.7 uses N=10,000
. set obs 10000
obs was 0, now 10000
.
. * Generate x and y
. set seed 2003
. gen x = mux + sigx*invnorm(uniform())
. gen lamda = exp(a + b*x)
. gen Ey = 1/lamda
. * To generate exponential with mean mu=Ey use
. * Integral 0 to a of (1/mu)exp(-x/mu) dx by change of variables
. * = Integral 0 to a/mu of exp(-t)dt
. * = incomplete gamma function P(1,a/mu) in the terminology of Stata
. gen y = Ey*invgammap(1,uniform())
. gen lny = ln(y)
. gen lnfy = ln(lamda) - y*lamda
. * twoway scatter Ey x
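
The generation step relies on the inverse-c.d.f. method: with shape parameter 1 the incomplete gamma function is the unit exponential c.d.f. 1 - exp(-t), so invgammap(1,u) equals -ln(1-u) and Ey*invgammap(1,uniform()) is an exponential draw with mean Ey. A minimal Python sketch of the same idea follows; it is not the original program, the function name draw_exponential is hypothetical, and a constant mean of 0.5 is assumed purely for the check.

```python
import math
import random


def draw_exponential(mean, u):
    """Inverse-c.d.f. draw: solve 1 - exp(-y/mean) = u for y, with u in [0, 1)."""
    return -mean * math.log(1.0 - u)


rng = random.Random(2003)
mean = 0.5  # assumed constant mean, for illustration only
draws = [draw_exponential(mean, rng.random()) for _ in range(100000)]

sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / (len(draws) - 1)

# For the exponential the variance equals the squared mean, so these should be
# close to 0.5 and 0.25 respectively.
print(sample_mean, sample_var)
```

In the log the mean varies by observation (Ey = 1/lamda), which only changes the first argument passed for each draw.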
.
. * Descriptive Statistics
. describe
Contains data
  obs:        10,000
 vars:             6
 size:       280,000 (97.3% of memory free)
------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
------------------------------------------------------------------------------
x               float  %9.0g
lamda           float  %9.0g
Ey              float  %9.0g
y               float  %9.0g
lny             float  %9.0g
lnfy            float  %9.0g
------------------------------------------------------------------------------
Sorted by:
     Note:  dataset has changed since last saved
. summarize
    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
           x |     10000    1.014313    1.004905  -2.895741   4.994059
       lamda |     10000    4.457478    5.939084   .0500838   133.7191
          Ey |     10000    .6185677    .8294007   .0074784   19.96655
           y |     10000    .6194352    1.291416   .0000445   30.60636
         lny |     10000   -1.554348     1.62358  -10.02114   3.421208
-------------+--------------------------------------------------------
        lnfy |     10000   -.0209485    1.419595   -7.52596   4.402257
.
. ********** WRITE DATA TO A TEXT FILE **********
.
. * Write data to a text (ascii) file
. * used for programs mma05p2nlsbyml.do, mma05p3nlsbynl.do
. * and mma05p4margeffects.do
. * and can also use with programs other than Stata
. outfile y x using mma05data.asc, replace
.
. ********** DO THE ANALYSIS: OLS and MLE **********
.
. ** (1) OLS ESTIMATION
.
. * OLS is inconsistent in this example
. regress y x
      Source |       SS       df       MS              Number of obs =   10000
-------------+------------------------------           F(  1,  9998) = 3030.74
       Model |  3879.13606     1  3879.13606           Prob > F      =  0.0000
    Residual |  12796.7438  9998  1.27993037           R-squared     =  0.2326
-------------+------------------------------           Adj R-squared =  0.2325
       Total |  16675.8799  9999  1.66775476           Root MSE      =  1.1313

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .6198182   .0112587    55.05   0.000     .5977488    .6418876
       _cons |  -.0092545    .016075    -0.58   0.565    -.0407648    .0222558
------------------------------------------------------------------------------
. estimates store rols
. regress y x, robust
Regression with robust standard errors                 Number of obs =   10000
                                                       F(  1,  9998) =  596.30
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.2326
                                                       Root MSE      =  1.1313

------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .6198182   .0253823    24.42   0.000     .5700638    .6695725
       _cons |  -.0092545   .0171978    -0.54   0.591    -.0429655    .0244566
------------------------------------------------------------------------------
. estimates store rolsrobust
.
. ** (2) ML ESTIMATION USING STATA COMMAND FOR EXPONENTIAL MLE
.
. * The following uses Stata duration model commands.
. * First need to define the duration variable (here y)
. stset y
     failure event:  (assumed to fail at time=y)
obs. time interval:  (0, y]
 exit on or before:  failure

------------------------------------------------------------------------------
    10000  total obs.
        0  exclusions
------------------------------------------------------------------------------
    10000  obs. remaining, representing
    10000  failures in single record/single failure data
 6194.352  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =  30.60636
. streg x, dist(exp) nohr
         failure _d:  1 (meaning all fail)
   analysis time _t:  y

Iteration 0:   log likelihood = -20754.005
Iteration 1:   log likelihood = -17232.884
Iteration 2:   log likelihood = -15760.556
Iteration 3:   log likelihood = -15752.193
Iteration 4:   log likelihood =  -15752.19
Iteration 5:   log likelihood =  -15752.19

Exponential regression -- log relative-hazard form

No. of subjects =        10000                     Number of obs   =     10000
No. of failures =        10000
Time at risk    =  6194.352495
                                                   LR chi2(1)      =  10003.63
Log likelihood  =    -15752.19                     Prob > chi2     =    0.0000

------------------------------------------------------------------------------
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |  -.9896276   .0098692  -100.27   0.000    -1.008971   -.9702842
       _cons |   1.982921   .0141496   140.14   0.000     1.955188    2.010654
------------------------------------------------------------------------------
. estimates store rexp
. streg x, dist(exp) nohr robust
         failure _d:  1 (meaning all fail)
   analysis time _t:  y

Iteration 0:   log pseudo-likelihood = -20754.005
Iteration 1:   log pseudo-likelihood = -17232.884
Iteration 2:   log pseudo-likelihood = -15760.556
Iteration 3:   log pseudo-likelihood = -15752.193
Iteration 4:   log pseudo-likelihood =  -15752.19
Iteration 5:   log pseudo-likelihood =  -15752.19

Exponential regression -- log relative-hazard form

No. of subjects       =        10000               Number of obs   =     10000
No. of failures       =        10000
Time at risk          =  6194.352495
                                                   Wald chi2(1)    =   9914.62
Log pseudo-likelihood =    -15752.19               Prob > chi2     =    0.0000

------------------------------------------------------------------------------
             |               Robust
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |  -.9896276   .0099388   -99.57   0.000    -1.009107   -.9701479
       _cons |   1.982921   .0144307   137.41   0.000     1.954637    2.011205
------------------------------------------------------------------------------
. estimates store rexprobust
.
. ** (3) ML ESTIMATION USING STATA ML COMMAND
.
. * For MLE computation can use the following Stata commands
. * ml model lf   provide the log-density
. * ml model D0   provide the log-likelihood
. * ml model D1   provide the log-likelihood and gradient
. * ml model D2   provide the log-likelihood, gradient and hessian
.
. * At a minimum need to provide
. * (A) program define fcn where fcn is the function name
. *     defines the log-density (independent observations assumed)
. * (B) ml model lf fcn + some extras
. *     the extras give the dependent variable and regressors
. * (C) ml maximize
. *     obtains the mle
. * (D) ml model lf fcn + some extras, robust
. *     provides robust sandwich standard errors
.
. * Here we provide the log-density (ml model lf) as this is simplest,
. * and the Stata manual says that numerically only D2 is better.
.
. * (A) Define the log-density
. *     lnf(y) = (a+bx) - y*exp(a+bx) = theta - y*exp(theta) where theta = x'b
. program define mleexp0
  1. version 8.0
  2. args lnf theta    /* Must use lnf while could use name other than theta */
  3. quietly replace `lnf' = `theta' - $ML_y1*exp(`theta')
  4. end
.
. * (B) Say that dependent variable is y and regressors are x plus a constant
. ml model lf mleexp0 (y = x)
.
. * (C) Obtain the MLE
. ml search                 /* Optional - can provide better starting values */
initial:       log likelihood = -6194.3525
improve:       log likelihood = -6194.3525
alternative:   log likelihood = -5212.7607
rescale:       log likelihood = -5212.7607
. ml maximize
initial:       log likelihood = -5212.7607
rescale:       log likelihood = -5212.7607
Iteration 0:   log likelihood = -5212.7607
Iteration 1:   log likelihood = -1563.9176
Iteration 2:   log likelihood =  -217.6055
Iteration 3:   log likelihood = -208.73633
Iteration 4:   log likelihood = -208.71383
Iteration 5:   log likelihood = -208.71383

                                                  Number of obs   =      10000
                                                  Wald chi2(1)    =   10054.85
Log likelihood = -208.71383                       Prob > chi2     =     0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |  -.9896276   .0098692  -100.27   0.000    -1.008971   -.9702842
       _cons |   1.982921   .0141496   140.14   0.000     1.955188    2.010654
------------------------------------------------------------------------------
. estimates store rmle
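The "initial" value reported by ml search can be checked by hand: with all coefficients at zero, theta = 0 and lnf(y) = -y, so the log-likelihood is minus the sum of y. The Python sketch below illustrates this under stated assumptions; the function name `exp_loglik` and the three-observation sample are made up for illustration, while N and the mean of y are taken from the summary statistics in these logs.

```python
import math


def exp_loglik(a, b, ys, xs):
    """Sum of lnf(y) = theta - y*exp(theta), with theta = a + b*x."""
    return sum((a + b * x) - y * math.exp(a + b * x) for y, x in zip(ys, xs))


# Tiny made-up sample (not the book's data), only to exercise the function
ys = [0.5, 1.5, 1.0]
xs = [0.0, 1.0, 2.0]
ll_zero = exp_loglik(0.0, 0.0, ys, xs)  # equals -sum(ys) when theta = 0

# Same identity applied to the summary statistics reported in the log
n_obs, ybar = 10000, 0.6194352
ll_zero_log = -n_obs * ybar
print(ll_zero, ll_zero_log)
```

The second number agrees, up to rounding of the mean, with both the initial log likelihood and the "Time at risk" figure in the streg header.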
.
. * (D) Obtain robust standard errors
. ml model lf mleexp0 (y = x), robust
. ml search
initial:       log pseudo-likelihood = -6194.3525
improve:       log pseudo-likelihood = -6194.3525
alternative:   log pseudo-likelihood = -5212.7607
rescale:       log pseudo-likelihood = -5212.7607
. ml maximize
initial:       log pseudo-likelihood = -5212.7607
rescale:       log pseudo-likelihood = -5212.7607
Iteration 0:   log pseudo-likelihood = -5212.7607
Iteration 1:   log pseudo-likelihood = -1563.9176
Iteration 2:   log pseudo-likelihood =  -217.6055
Iteration 3:   log pseudo-likelihood = -208.73633
Iteration 4:   log pseudo-likelihood = -208.71383
Iteration 5:   log pseudo-likelihood = -208.71383

                                                  Number of obs   =      10000
                                                  Wald chi2(1)    =    9914.62
Log pseudo-likelihood = -208.71383                Prob > chi2     =     0.0000

------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |  -.9896276   .0099388   -99.57   0.000    -1.009107   -.9701479
       _cons |   1.982921   .0144307   137.41   0.000     1.954637    2.011205
------------------------------------------------------------------------------
. estimates store rmlerobust
.
. * (E) Calculate R-squared and log-likelihood at the ML estimates
. * lnL sums lnf(y) = ln(lamda) - y*lamda
. gen lamdaml = exp(_b[_cons] + _b[x]*x)
. gen lnfml = ln(lamdaml) - y*lamdaml
. quietly means lnfml
. scalar LLml = r(mean)*r(N)


. * R-squared = 1 - Sum_i(y_i - yhat_i)^2 / Sum_i(y_i - ybar)^2
. gen yhatml = 1/lamdaml
. egen ybar = mean(y)
. * quietly means y
. * scalar ybar = r(mean)
. gen y_yhatsqml = (y - yhatml)^2
. gen y_ybarsq = (y - ybar)^2
. quietly means y_yhatsqml
. scalar SSresidml = r(mean)
. quietly means y_ybarsq
. scalar SStotal = r(mean)
. scalar Rsqml = 1 - SSresidml/SStotal
. di LLml " " Rsqml
-208.71383 .39062307
.
. ********** DISPLAY RESULTS: First two columns of Table 5.7 p.161
.
. * (1) OLS - nonrobust and robust standard errors
. * Here OLS is inconsistent.
. * And expect sign reversal for slope as in true model mean E[y] = exp(-x'b)
. estimates table rols rolsrobust, b(%10.4f) se(%10.4f) t stats(N ll r2) keep(_cons x)

----------------------------------------
    Variable |    rols      rolsrobust
-------------+--------------------------
       _cons |    -0.0093      -0.0093
             |     0.0161       0.0172
             |      -0.58        -0.54
           x |     0.6198       0.6198
             |     0.0113       0.0254
             |      55.05        24.42
-------------+--------------------------
           N | 10000.0000   10000.0000
          ll | -1.542e+04   -1.542e+04
          r2 |     0.2326       0.2326
----------------------------------------
                          legend: b/se/t
.
. * (2) MLE by command ereg - nonrobust and robust standard errors
. estimates table rexp rexprobust, b(%10.4f) se(%10.4f) t stats(N ll) keep(_cons x)

----------------------------------------
    Variable |    rexp      rexprobust
-------------+--------------------------
       _cons |     1.9829       1.9829
             |     0.0141       0.0144
             |     140.14       137.41
           x |    -0.9896      -0.9896
             |     0.0099       0.0099
             |    -100.27       -99.57
-------------+--------------------------
           N | 10000.0000   10000.0000
          ll | -1.575e+04   -1.575e+04
----------------------------------------
                          legend: b/se/t
.
. * (3) MLE by command ml - nonrobust and robust standard errors
. estimates table rmle rmlerobust, b(%10.4f) se(%10.4f) t stats(N ll) keep(_cons x)

----------------------------------------
    Variable |    rmle      rmlerobust
-------------+--------------------------
       _cons |     1.9829       1.9829
             |     0.0141       0.0144
             |     140.14       137.41
           x |    -0.9896      -0.9896
             |     0.0099       0.0099
             |    -100.27       -99.57
-------------+--------------------------
           N | 10000.0000   10000.0000
          ll |  -208.7138    -208.7138
----------------------------------------
                          legend: b/se/t
. * And ML log-likelihood (check) and R-squared (needed to be computed)
. di "Log likelihood for ML: " LLml
Log likelihood for ML: -208.71383
. di "R-squared for MLE: " Rsqml
R-squared for MLE: .39062307
.
. ********** CLOSE OUTPUT **********
. log close
       log:  c:\Imbook\bwebpage\Section2\mma05p1mle.txt
  log type:  text
 closed on:  17 May 2005, 13:48:18
------------------------------------------------------------------------------
       log:  c:\Imbook\bwebpage\Section2\mma05p2nls.txt
  log type:  text
 opened on:  17 May 2005, 13:53:31
.
. ********** OVERVIEW OF MMA05P2NLS.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 5.9 pp.159-63
. * Nonlinear least squares
.
. * Provides last three columns of Table 5.7 results for
. * (1) NLS using Stata command nl (hard to get robust s.e.'s)
. * (2) FGNLS using Stata command nl (hard to get robust s.e.'s)
. * (3) WNLS using Stata command nl (hard to get robust s.e.'s)
. * using generated data set mma05data.asc
.
. * Note: Stata 8 does not give robust se's for nl
. *       But ml does - see program mma05p3nlsbyml.do
. *       New Stata 9 does have a robust se option (unlike Stata 8)
.
. * Related programs:
. *   mma05p1mle.do          OLS and MLE for the same data
. *   mma05p3nlsbyml.do      NLS using ml rather than nl
. *   mma05p4margeffects.do  Calculates marginal effects
.
. * To run this program you need data and dictionary files
. * mma05data.asc ASCII data set generated by mma05p1mle.do
.
. ********** SETUP **********
.
. set more off
. version 8
.
. ********** READ IN DATA and SUMMARIZE **********
.
. * Model is  y ~ exponential(exp(a + bx))
. *           x ~ N[mux, sigx^2]
. *        f(y) = exp(a + bx)*exp(-y*exp(a + bx))
. *      lnf(y) = (a + bx) - y*exp(a + bx)
. *        E[y] = exp(-(a + bx))    note sign reversal for the mean
. *        V[y] = exp(-2(a + bx)) = E[y]^2
. * Here a = 2, b = -1 and x ~ N[mux=1, sigx^2=1]
. * and Table 5.7 uses N=10,000
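To make the data generating process concrete, the following Python sketch draws from the same model and checks that y times its rate behaves like a standard exponential (mean 1, variance 1), which is what E[y] = 1/rate and V[y] = E[y]^2 imply. This is an illustration only: the seed, the sample size of 20000 and the function name `simulate` are assumptions, and the draws will not match the book's mma05data.asc.

```python
import math
import random


def simulate(n, a=2.0, b=-1.0, mux=1.0, sigx=1.0, seed=12345):
    """Draw (y, x) pairs with x ~ N[mux, sigx^2] and y ~ exponential(exp(a + b*x))."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.gauss(mux, sigx)
        lam = math.exp(a + b * x)      # rate parameter
        y = rng.expovariate(lam)       # exponential draw with mean 1/lam
        data.append((y, x))
    return data


data = simulate(20000)
# y multiplied by its rate exp(2 - x) is standard exponential: mean 1, variance 1
scaled = [y * math.exp(2.0 - x) for y, x in data]
mean_scaled = sum(scaled) / len(scaled)
var_scaled = sum((s - mean_scaled) ** 2 for s in scaled) / len(scaled)
print(mean_scaled, var_scaled)
```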


.
. * Data was generated by program mma05p1mle.do
. infile y x using mma05data.asc
(10000 observations read)
.
. * Descriptive Statistics
. describe

Contains data
  obs:        10,000
 vars:             2
 size:       120,000 (98.8% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
y               float  %9.0g
x               float  %9.0g
-------------------------------------------------------------------------------
Sorted by:
     Note:  dataset has changed since last saved

. summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           y |     10000    .6194352    1.291416   .0000445   30.60636
           x |     10000    1.014313    1.004905  -2.895741   4.994059
.
. ********** DO THE ANALYSIS: NLS, WNLS and FGNLS **********
.
. *** (1) NLS ESTIMATION USING STATA NL COMMAND (Nonlinear LS)
.
. * To do this in Stata
. *  (A) program define nlfcn      where fcn is the function name
. *      defines g(x_i'b) and says what the regressors x are
. *  (B) nl fcn y                  where fcn is the function name in (A)
. *      and y is the dependent variable
. *      does NLS of y on fcn defined in (A)
. *  (C) Heteroskedastic-consistent standard errors require extra coding
.
. * (1A) Define g(x'b)
. *      Note: Since E[y] = exp(-(a + bx)) there is sign reversal for the mean
. program define nlexpnls
  1.   version 7.0
  2.   if "`1'" == "?" {                  /* if query call ...  */
  3.     global S_1 "b1int b2x"           /* declare parameters */
  4.     global b1int=1                   /* initial values     */
  5.     global b2x=0
  6.     exit}
  7.   replace `1'=exp(-$b1int-$b2x*x)    /* calculate function */
  8. end
.
. * (1B) Do NLS of y on the function expnls defined in (1A)
. nl expnls y
(obs = 10000)

Iteration 0:  residual SS =  17308.68
Iteration 1:  residual SS =  10333.37
Iteration 2:  residual SS =  10150.66
Iteration 3:  residual SS =  10149.86
Iteration 4:  residual SS =  10149.86
Iteration 5:  residual SS =  10149.86

      Source |       SS       df       MS            Number of obs =     10000
-------------+------------------------------         F(  2,  9998) =   5103.98
       Model |  10363.0157     2  5181.50784         Prob > F      =    0.0000
    Residual |  10149.8633  9998  1.01518937         R-squared     =    0.5052
-------------+------------------------------         Adj R-squared =    0.5051
       Total |   20512.879 10000   2.0512879         Root MSE      =  1.007566
                                                     Res. dev.     =  28527.52
(expnls)
------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       b1int |   1.887563   .0306819    61.52   0.000      1.82742    1.947705
         b2x |  -.9574684   .0097419   -98.28   0.000    -.9765645   -.9383724
------------------------------------------------------------------------------
(SEs, P values, CIs, and correlations are asymptotic approximations)
. estimates store bnls
.
. * Complications now begin: getting standard errors. Easier to use (1) !!
.
. * (1C) Get sandwich heteroskedastic-robust standard errors for NLS
.
. * Note that robust option does not work for nl
. * So wrong standard errors are given for this problem as errors are heteroskedastic
.
. * To get robust standard errors is not straightforward
.
. * Obtain them by OLS regress y - g(x,b) on dg/db with robust option.
. * Explanation: OLS regress y - g(x,b) = (dg/db)'a + v
. * This is the Gauss-Newton step for the update of b
. * But a = 0 since iterations have converged, so v = y - g(x,b)
. * So nonrobust standard errors from this OLS regression yield
. *   V[a] = s^2 [Sum_i (dg_i/db)(dg_i/db)']^-1
. *   where s^2 = Sum_i (y_i - g(x_i,b))^2 / (N - K)
. * This is the nonrobust variance estimate for NLS
. * And robust option gives robust standard errors from this OLS regression.
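The two variance estimates described in these comments can be written out directly. The Python sketch below is an illustration for the two-parameter case, not the authors' code: the helper names `inv2`, `matmul2` and `nls_variances` are hypothetical, the four-observation input is made up so the result can be checked by hand, and the robust form omits the N/(N-K) degrees-of-freedom scaling that Stata's regress applies with the robust option.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as two rows of two numbers."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))


def matmul2(p, q):
    """Product of two 2x2 matrices."""
    return tuple(
        tuple(sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def nls_variances(D, u):
    """D: rows (d1, d2) of dg/db at the estimate; u: residuals y - g(x,b).

    Returns (nonrobust, robust) 2x2 variance matrices:
      nonrobust = s^2 (D'D)^-1 with s^2 = u'u / (N - K)
      robust    = (D'D)^-1 D'SD (D'D)^-1 with S = Diag(u_i^2)
    """
    n, k = len(u), 2
    dd = [[0.0, 0.0], [0.0, 0.0]]
    dsd = [[0.0, 0.0], [0.0, 0.0]]
    for d, ui in zip(D, u):
        for i in range(2):
            for j in range(2):
                dd[i][j] += d[i] * d[j]
                dsd[i][j] += ui * ui * d[i] * d[j]
    dd_inv = inv2(dd)
    s2 = sum(ui * ui for ui in u) / (n - k)
    v_nonrobust = tuple(tuple(s2 * e for e in row) for row in dd_inv)
    v_robust = matmul2(matmul2(dd_inv, dsd), dd_inv)
    return v_nonrobust, v_robust


# Made-up inputs chosen so that D'D = 2*I and the answer is easy to verify
D = [(1.0, 0.0), (0.0, 1.0), (1.0, 0.0), (0.0, 1.0)]
u = [1.0, -1.0, 2.0, -2.0]
v_nr, v_rob = nls_variances(D, u)
print(v_nr, v_rob)
```

In the log the same quantities are obtained indirectly, by regressing the NLS residual on d1 and d2 with and without the robust option.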
.
. * Obtain the derivatives dg/db
. * Here g = exp(-x'b) so dg/db = -exp(-x'b)*x = -yhat*x (sign is immaterial for the variance)
. quietly nl expnls y
. predict residnls, residuals
. predict yhatnls, yhat
. scalar snls = e(rmse)         /* Use in earlier code */
. gen d1 = yhatnls
. gen d2 = x*yhatnls
. * This OLS regression gives robust standard errors
. regress residnls d1 d2, noconstant robust

Regression with robust standard errors                 Number of obs =   10000
                                                       F(  2,  9998) =    0.00
                                                       Prob > F      =  1.0000
                                                       R-squared     =  0.0000
                                                       Root MSE      =  1.0076

------------------------------------------------------------------------------
             |               Robust
    residnls |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          d1 |   4.46e-07   .1420794     0.00   1.000    -.2785037    .2785046
          d2 |  -1.49e-07   .0611969    -0.00   1.000    -.1199583     .119958
------------------------------------------------------------------------------
. estimates store bnlsrobust
.
. * Check: Do OLS regression that gives nonrobust standard errors
. *        and verify that same results as in (1B)
. regress residnls d1 d2, noconstant

      Source |       SS       df       MS              Number of obs =   10000
-------------+------------------------------           F(  2,  9998) =    0.00
       Model |  2.6739e-10     2  1.3370e-10           Prob > F      =  1.0000
    Residual |  10149.8633  9998  1.01518937           R-squared     =  0.0000
-------------+------------------------------           Adj R-squared = -0.0002
       Total |  10149.8633 10000  1.01498633           Root MSE      =  1.0076

------------------------------------------------------------------------------
    residnls |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          d1 |   4.46e-07   .0306819     0.00   1.000    -.0601423    .0601432
          d2 |  -1.49e-07   .0097419    -0.00   1.000    -.0190961    .0190958
------------------------------------------------------------------------------
. estimates store bnlscheck
.
. * (1D) Alternative to (1C) robust NLS standard errors that are better.
. * These are sandwich form but use knowledge that V[u] = exp(-x'b)^2
. * which can be estimated by Vhat[u] = yhat^2
. * Now use this knowledge here in computing S in DSD.
. * Form DSDknown = D'SD with S = Diag(yhat^2)
. gen ds1known = yhatnls*yhatnls
. gen ds2known = x*yhatnls*yhatnls
. matrix accum DSDknown = ds1known ds2known, noconstant
(obs=10000)
. matrix accum DD2 = d1 d2, noconstant       /* DD commented above */
(obs=10000)
. * Form the robust variance matrix estimate
. matrix vnlsknown = syminv(DD2)*DSDknown*syminv(DD2)
. * Calculate the robust standard errors
. scalar seb1intnlsknown = sqrt(vnlsknown[1,1])
. scalar seb2xnlsknown = sqrt(vnlsknown[2,2])
. di "Robust standard errors of NLS estimates of b1int and b2x: "
Robust standard errors of NLS estimates of b1int and b2x:
. di "Using knowledge that Var[u] = exp(x'b)^2 estimated by yhat"
Using knowledge that Var[u] = exp(x'b)^2 estimated by yhat
. di seb1intnlsknown " " seb2xnlsknown
.21097066 .08798113
.
. * (1E) Calculate R-squared and log-likelihood at the NLS estimates
. * Note that Stata version 8 reports the wrong R-squared
. * as uses TSS = Sum_i y_i^2 and not Sum_i(y_i - ybar)^2
. * lnL sums lnf(y) = ln(lamda) - y*lamda
. gen lamdanls = 1 / yhatnls       /* yhatnls saved earlier */
. gen lnfnls = ln(lamdanls) - y*lamdanls
. quietly means lnfnls
. scalar LLnls = r(mean)*r(N)
. * R-squared = 1 - Sum_i(y_i - yhat_i)^2 / Sum_i(y_i - ybar)^2
. egen ybar = mean(y)
. * quietly means y
. * scalar ybar = r(mean)
. gen y_ybarsq = (y - ybar)^2
. quietly means y_ybarsq
. scalar SStotal = r(mean)
. gen y_yhatsqnls = (y - yhatnls)^2
. quietly means y_yhatsqnls
. scalar SSresidnls = r(mean)
. scalar Rsqnls = 1 - SSresidnls/SStotal       /* SStotal found earlier */
. di LLnls " " Rsqnls
-232.97524 .39134462
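The gap between the R-squared of 0.5052 printed by nl and the 0.3913 computed here comes entirely from the choice of total sum of squares. The Python sketch below reproduces both figures from numbers already printed in this log (residual SS, "Total" SS, N and the mean of y); the function name `r_squared` is hypothetical and the inputs carry only the precision shown in the log.

```python
def r_squared(rss, tss):
    """R-squared = 1 - residual SS / total SS."""
    return 1.0 - rss / tss


n_obs = 10000
ybar = 0.6194352              # mean of y from summarize
rss = 10149.8633              # residual SS from the nl output
tss_uncentered = 20512.879    # "Total" SS from the nl output, Sum_i y_i^2
# Sum_i (y_i - ybar)^2 = Sum_i y_i^2 - N*ybar^2
tss_centered = tss_uncentered - n_obs * ybar ** 2

r2_reported = r_squared(rss, tss_uncentered)  # what nl prints
r2_centered = r_squared(rss, tss_centered)    # what the code above computes
print(r2_reported, r2_centered)
```

Because SSresidnls and SStotal in the Stata code are both means over the same N observations, their ratio equals the ratio of the corresponding sums.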
.
. ** (2) FGNLS ESTIMATION USING STATA NL COMMAND
.
. * The following gives FGNLS in Table 5.7
. * To instead get the WNLS estimates in Table 5.7
. * replace gen wfgnls = (1/yhatnls)^2 below by gen wfgnls = 1/yhatnls
.
. * The Feasible generalized NLS estimator minimizes
. * SUM_i (y_i - g(x_i'b))^2 / s_i^2 where s_i^2 = estimate of sigma_i^2
. * This is y_i = g(x_i'b) + u_i where u_i ~ (0,s_i^2)
. * Can do NLS with weighting option [aweight = 1/(s_i^2)]
. * Here s_i^2 = [exp(-x_i'b)]^2 = yhatnls^2
.
. * The simplest way to proceed is to use the aweights option.
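The weighted objective described above differs between FGNLS and WNLS only in the weights. The Python sketch below writes out that objective for the exponential mean used in this program; the function names and the two-observation inputs are made up for illustration. Stata's aweights are, as far as I know, rescaled to sum to N, which changes the reported sums of squares but not the minimizing b.

```python
import math


def weighted_ssr(b1, b2, ys, xs, ws):
    """Sum_i w_i * (y_i - exp(-(b1 + b2*x_i)))^2."""
    return sum(
        w * (y - math.exp(-(b1 + b2 * x))) ** 2
        for y, x, w in zip(ys, xs, ws)
    )


def fgnls_weights(yhat):
    """Weights 1/s_i^2 with s_i^2 = yhat_i^2 (variance equals mean squared)."""
    return [1.0 / h ** 2 for h in yhat]


def wnls_weights(yhat):
    """Weights 1/yhat_i, the working variance used for the WNLS column."""
    return [1.0 / h for h in yhat]


# Made-up data: at b = (0, 0) the fitted mean is 1, so residuals are 1 and 2
ys, xs = [2.0, 3.0], [0.0, 0.0]
yhat_first_stage = [1.0, 2.0]   # hypothetical first-stage NLS fitted values
q_fgnls = weighted_ssr(0.0, 0.0, ys, xs, fgnls_weights(yhat_first_stage))
q_wnls = weighted_ssr(0.0, 0.0, ys, xs, wnls_weights(yhat_first_stage))
print(q_fgnls, q_wnls)
```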
.
. * (2A) nls program expnls already defined in (1A)
.
. * (2B) For FGNLS do this nls but now with weights
. gen wfgnls = (1/yhatnls)^2
. * gen wfgnls = 1/yhatnls
. nl expnls y [aweight=wfgnls]
(sum of wgt is 405584.32)

Iteration 0:  residual SS =  1127.256
Iteration 1:  residual SS =  363.8331
Iteration 2:  residual SS =  239.3399
Iteration 3:  residual SS =  220.6796
Iteration 4:  residual SS =  220.2856
Iteration 5:  residual SS =  220.2851
Iteration 6:  residual SS =  220.2851

      Source |       SS       df       MS            Number of obs =     10000
-------------+------------------------------         F(  2,  9998) =   4946.06
       Model |   217.95244     2   108.97622         Prob > F      =    0.0000
    Residual |  220.285065  9998  .022032913         R-squared     =    0.4973
-------------+------------------------------         Adj R-squared =    0.4972
       Total |  438.237505 10000  .043823751         Root MSE      =  .1484349
                                                     Res. dev.     =  8924.231
(expnls)
------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       b1int |   1.984035   .0147737   134.30   0.000     1.955075    2.012994
         b2x |   -.990691     .01001   -98.97   0.000    -1.010313   -.9710694
------------------------------------------------------------------------------
(SEs, P values, CIs, and correlations are asymptotic approximations)
. estimates store bfgnls
.
. * (2C) Robust standard errors
. * The standard errors given above are consistent
. * assuming correct model for heteroskedasticity.
. * To guard against misspecification use similar approach to nls case
. * Obtain the derivatives dg/db
. * Here g = exp(-x'b) so dg/db = -exp(-x'b)*x = -yhat*x (sign is immaterial for the variance)
. predict residoptnls, residuals
. predict yhatoptnls, yhat
. gen d1opt = yhatoptnls
. gen d2opt = x*yhatoptnls
. * This OLS regression gives robust standard errors
. regress residoptnls d1opt d2opt [aweight=wfgnls], noconstant robust
(sum of wgt is 4.0558e+05)

Regression with robust standard errors                 Number of obs =   10000
                                                       F(  2,  9998) =    0.00
                                                       Prob > F      =  1.0000
                                                       R-squared     =  0.0000
                                                       Root MSE      =  .14843

------------------------------------------------------------------------------
             |               Robust
 residoptnls |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       d1opt |  -9.85e-09   .0145803    -0.00   1.000    -.0285803    .0285802
       d2opt |   8.81e-09   .0101319     0.00   1.000    -.0198606    .0198606
------------------------------------------------------------------------------
. estimates store bfgnlsrobust
. * This OLS regression gives nonrobust standard errors
. * It is a check and should equal (2B)
. regress residoptnls d1opt d2opt [aweight=wfgnls], noconstant
(sum of wgt is 4.0558e+05)

      Source |       SS       df       MS              Number of obs =   10000
-------------+------------------------------           F(  2,  9998) =    0.00
       Model |  2.2737e-13     2  1.1369e-13           Prob > F      =  1.0000
    Residual |  220.285065  9998  .022032913           R-squared     =  0.0000
-------------+------------------------------           Adj R-squared = -0.0002
       Total |  220.285065 10000  .022028506           Root MSE      =  .14843

------------------------------------------------------------------------------
 residoptnls |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       d1opt |  -9.85e-09   .0147737    -0.00   1.000    -.0289594    .0289594
       d2opt |   8.81e-09     .01001     0.00   1.000    -.0196216    .0196216
------------------------------------------------------------------------------
. estimates store bfgnlscheck
.
. * (2D) Calculate R-squared and log-likelihood at the FGNLS estimates
. * Note that Stata version 8 reports the wrong R-squared
. * as uses TSS = Sum_i y_i^2 and not Sum_i(y_i - ybar)^2
. * lnL sums lnf(y) = ln(lamda) - y*lamda
. gen lamdafgnls = 1 / yhatoptnls       /* yhatoptnls saved earlier */
. gen lnffgnls = ln(lamdafgnls) - y*lamdafgnls
. quietly means lnffgnls
. scalar LLfgnls = r(mean)*r(N)
. * R-squared = 1 - Sum_i(y_i - yhat_i)^2 / Sum_i(y_i - ybar)^2
. gen y_yhatsqfgnls = (y - yhatoptnls)^2
. quietly means y_yhatsqfgnls
. scalar SSresidfgnls = r(mean)
. scalar Rsqfgnls = 1 - SSresidfgnls/SStotal       /* SStotal found earlier */
. di LLfgnls " " Rsqfgnls
-208.71965 .39056605
.
. ** (3) WNLS ESTIMATION USING STATA NL COMMAND
.
. * To get WNLS estimates in Table 5.7
. * replace gen wfgnls = (1/yhatnls)^2 in (2) FGNLS by gen wfgnls = 1/yhatnls
. * Code is shorter as all comments are dropped
.
. gen wwnls = 1/yhatnls
. nl expnls y [aweight=wwnls]
(sum of wgt is 39858.614)

Iteration 0:  residual SS =  2630.417
Iteration 1:  residual SS =  1694.802
Iteration 2:  residual SS =  1500.277
Iteration 3:  residual SS =  1494.658
Iteration 4:  residual SS =  1494.653

      Source |       SS       df       MS            Number of obs =     10000
-------------+------------------------------         F(  2,  9998) =   5073.75
       Model |  1517.00087     2  758.500436         Prob > F      =    0.0000
    Residual |   1494.6525  9998  .149495149         R-squared     =    0.5037
-------------+------------------------------         Adj R-squared =    0.5036
       Total |  3011.65337 10000  .301165337         Root MSE      =   .386646
                                                     Res. dev.     =  14035.49
(expnls)
------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       b1int |   1.990623   .0224903    88.51   0.000     1.946537    2.034708
         b2x |  -.9960671    .009777  -101.88   0.000    -1.015232   -.9769022
------------------------------------------------------------------------------
(SEs, P values, CIs, and correlations are asymptotic approximations)
. estimates store bwnls
. predict residwnls, residuals
. predict yhatwnls, yhat
. gen d1w = yhatwnls
. gen d2w = x*yhatwnls
. regress residwnls d1w d2w [aweight=wwnls], noconstant robust
(sum of wgt is 3.9859e+04)

Regression with robust standard errors                 Number of obs =   10000
                                                       F(  2,  9998) =    0.00
                                                       Prob > F      =  1.0000
                                                       R-squared     =  0.0000
                                                       Root MSE      =  .38665

------------------------------------------------------------------------------
             |               Robust
   residwnls |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         d1w |  -1.11e-07   .0358551    -0.00   1.000    -.0702833    .0702831
         d2w |   5.35e-08   .0224175     0.00   1.000    -.0439428     .043943
------------------------------------------------------------------------------
. estimates store bwnlsrobust
. regress residwnls d1w d2w [aweight=wwnls], noconstant
(sum of wgt is 3.9859e+04)

      Source |       SS       df       MS              Number of obs =   10000
-------------+------------------------------           F(  2,  9998) =    0.00
       Model |  1.8190e-12     2  9.0949e-13           Prob > F      =  1.0000
    Residual |   1494.6525  9998  .149495149           R-squared     =  0.0000
-------------+------------------------------           Adj R-squared = -0.0002
       Total |   1494.6525 10000   .14946525           Root MSE      =  .38665

------------------------------------------------------------------------------
   residwnls |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         d1w |  -1.11e-07   .0224903    -0.00   1.000    -.0440856    .0440853
         d2w |   5.35e-08    .009777     0.00   1.000    -.0191649     .019165
------------------------------------------------------------------------------
. estimates store bwnlscheck
. gen lamdawnls = 1 / yhatwnls       /* yhatwnls saved earlier */
. gen lnfwnls = ln(lamdawnls) - y*lamdawnls
. quietly means lnfwnls
. scalar LLwnls = r(mean)*r(N)
. gen y_yhatsqwnls = (y - yhatwnls)^2
. quietly means y_yhatsqwnls
. scalar SSresidwnls = r(mean)
. scalar Rsqwnls = 1 - SSresidwnls/SStotal       /* SStotal found earlier */
. di LLwnls " " Rsqwnls
-208.93381 .39017996
.
. ***** PRINT RESULTS: Last three columns of Table 5.7 page 161
.
. * (1) NLS using NL - nonrobust and robust standard errors
. * Here nonrobust differs from robust asymptotically
.
. * Table 5.7 NLS nonrobust standard errors
. estimates table bnls, b(%10.4f) se(%10.4f) t stats(N ll)

---------------------------
    Variable |    bnls
-------------+-------------
       b1int |     1.8876
             |     0.0307
             |      61.52
         b2x |    -0.9575
             |     0.0097
             |     -98.28
-------------+-------------
           N | 10000.0000
          ll |
---------------------------
             legend: b/se/t
. * Table 5.7 NLS robust standard errors
. estimates table bnlscheck bnlsrobust, b(%10.4f) se(%10.4f) t stats(N ll)

----------------------------------------
    Variable | bnlscheck    bnlsrobust
-------------+--------------------------
          d1 |     0.0000       0.0000
             |     0.0307       0.1421
             |       0.00         0.00
          d2 |    -0.0000      -0.0000
             |     0.0097       0.0612
             |      -0.00        -0.00
-------------+--------------------------
           N | 10000.0000   10000.0000
          ll | -1.426e+04   -1.426e+04
----------------------------------------
                          legend: b/se/t
.
. /*
> * Check: Nonrobust standard errors of NLS b1int and b2x:
> di seb1intnlsnr " " seb2xnlsnr
> * Robust standard errors of NLS estimates of b1int and b2x:
> di seb1intnls " " seb2xnls
> */
. * Alternative Robust standard errors of NLS estimates of b1int and b2x:
. * These use knowledge that Var[u] = exp(-x'b)^2
. di seb1intnlsknown " " seb2xnlsknown
.21097066 .08798113
.
. * (3) WNLS - nonrobust and robust standard errors
. * Here nonrobust = robust asymptotically as WNLS in LEF
. * Also should be same as MLE asymptotically
. * Table 5.7 WNLS nonrobust standard errors
. estimates table bwnls, b(%10.4f) se(%10.4f) t stats(N ll)

---------------------------
    Variable |    bwnls
-------------+-------------
       b1int |     1.9906
             |     0.0225
             |      88.51
         b2x |    -0.9961
             |     0.0098
             |    -101.88
-------------+-------------
           N | 10000.0000
          ll |
---------------------------
             legend: b/se/t
. * Table 5.7 WNLS robust standard errors
. estimates table bwnlscheck bwnlsrobust, b(%10.4f) se(%10.4f) t stats(N ll)

----------------------------------------
    Variable | bwnlscheck   bwnlsrob~t
-------------+--------------------------
         d1w |    -0.0000      -0.0000
             |     0.0225       0.0359
             |      -0.00        -0.00
         d2w |     0.0000       0.0000
             |     0.0098       0.0224
             |       0.00         0.00
-------------+--------------------------
           N | 10000.0000   10000.0000
          ll | -4685.9286   -4685.9286
----------------------------------------
                          legend: b/se/t
.
. * (2) FGNLS - nonrobust and robust standard errors
. * Here nonrobust = robust asymptotically as FGNLS in LEF
. * Also should be same as MLE asymptotically
. * Table 5.7 FGNLS nonrobust standard errors
. estimates table bfgnls, b(%10.4f) se(%10.4f) t stats(N ll)

---------------------------
    Variable |   bfgnls
-------------+-------------
       b1int |     1.9840
             |     0.0148
             |     134.30
         b2x |    -0.9907
             |     0.0100
             |     -98.97
-------------+-------------
           N | 10000.0000
          ll |
---------------------------
             legend: b/se/t
. * Table 5.7 FGNLS robust standard errors
. estimates table bfgnlscheck bfgnlsrobust, b(%10.4f) se(%10.4f) t stats(N ll)

----------------------------------------
    Variable | bfgnlsch~k   bfgnlsro~t
-------------+--------------------------
       d1opt |    -0.0000      -0.0000
             |     0.0148       0.0146
             |      -0.00        -0.00
       d2opt |     0.0000       0.0000
             |     0.0100       0.0101
             |       0.00         0.00
-------------+--------------------------
           N | 10000.0000   10000.0000
          ll |  4887.7042    4887.7042
----------------------------------------
                          legend: b/se/t
.
. * (4) Print the various log-likelihoods and R-squared
. * Log-likelihood for NLS and FGNLS
. di "LLnls: " LLnls " LLfgnls: " LLfgnls " LLwnls: " LLwnls
LLnls: -232.97524 LLfgnls: -208.71965 LLwnls: -208.93381
. * R-squared for MLE, NLS and FGNLS
. di "Rsqnls: " Rsqnls " Rsqfgnls: " Rsqfgnls " Rsqwnls: " Rsqwnls
Rsqnls: .39134462 Rsqfgnls: .39056605 Rsqwnls: .39017996
.
. ********** CLOSE OUTPUT **********
. log close
       log:  c:\Imbook\bwebpage\Section2\mma05p2nls.txt
  log type:  text
 closed on:  17 May 2005, 13:53:34
------------------------------------------------------------------------------
       log:  c:\Imbook\bwebpage\Section2\mma05p3nlsbyml.txt
  log type:  text
 opened on:  17 May 2005, 13:54:20
.
. ********** OVERVIEW OF MMA05P3NLSBYML.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 5.9 pp.159-63
. * Nonlinear Least Squares using Stata command ml
.
. * Provides third column of Table 5.7 for
. * (1) NLS using Stata ml command (easy to get robust s.e.'s)
. * using generated data set mma05data.asc
.
. * Note: Use ml rather than nl as then much easier to get robust s.e.'s
. *       Can instead use Stata command nl - see program mma05p2nls.do
.
. * Related programs:
. *   mma05p1mle.do          OLS and MLE for the same data
. *   mma05p2nls.do          NLS (and WNLS and FGNLS) using Stata command nl
. *   mma05p4margeffects.do  Calculates marginal effects
.
. * To run this program you need data and dictionary files
. * mma05data.asc ASCII data set generated by mma05p1mle.do
.
. ********** SETUP **********
.
. set more off
. version 8
.
. ********** READ IN DATA and SUMMARIZE **********
.
. * Model is  y ~ exponential(exp(a + bx))
. *           x ~ N[mux, sigx^2]
. *        f(y) = exp(a + bx)*exp(-y*exp(a + bx))
. *      lnf(y) = (a + bx) - y*exp(a + bx)
. *        E[y] = exp(-(a + bx))    note sign reversal for the mean
. *        V[y] = exp(-2(a + bx)) = E[y]^2
. * Here a = 2, b = -1 and x ~ N[mux=1, sigx^2=1]
. * and Table 5.7 uses N=10,000
.
. * Data was generated by program mma05p1mle.do
. infile y x using mma05data.asc
(10000 observations read)
.
. * Descriptive Statistics
. describe

Contains data
  obs:        10,000
 vars:             2
 size:       120,000 (98.8% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
y               float  %9.0g
x               float  %9.0g
-------------------------------------------------------------------------------
Sorted by:
     Note:  dataset has changed since last saved

. summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           y |     10000    .6194352    1.291416   .0000445   30.60636
           x |     10000    1.014313    1.004905  -2.895741   4.994059
.
. ********** DO THE ANALYSIS: NLS using STATA COMMAND ML **********
.
. * (1) NLS ESTIMATION USING STATA ML COMMAND (maximum likelihood)
.
. * Advantage: ml command has robust standard errors as an option
.
. * The NLS estimator minimizes SUM_i (y_i - g(x_i'b))^2.
. * Here let g(x'b) = exp(a + b*x) = exp(b1int + b2x*x) say.
. * In fact for this dgp E[y] = exp(-(a + bx)) so sign reversal for the mean.
.
. * To adjust this code to other NLS problems
. *  (a) If more regressors, say x1 x2 and x3, replace ml model line with
. *      ml model lf mlexp (y = x1 x2 x3) / sigma
. *  (b) If different functional form for mean, say g(x'b), redefine `res' as
. *      `res' = $ML_y1 - g(`theta')
. *  (c) If functional form for mean is not single-index then the program
. *      will become considerably more complicated with more args.
.
. * (1A) The program "mlexp" defines the objective function
. program define mlexp
  1.   version 8.0
  2.   args lnf theta sigma    /* theta contains b1int and b2x; sigma is st.dev.of error */
  3.   tempvar res             /* create to shorten expression for lnf */
  4.   quietly gen double `res' = $ML_y1 - exp(-`theta')
  5.   quietly replace `lnf' = -0.5*ln(2*_pi) - ln(`sigma') - 0.5*`res'^2/`sigma'^2
  6. end
.
. * (1B) The following command gives the dep variable (y) and regressors (x + intercept)
. ml model lf mlexp (y = x) / sigma
. ml search
initial:       log likelihood =     -<inf>  (could not be evaluated)
feasible:      log likelihood = -35613.002
improve:       log likelihood = -19164.648
rescale:       log likelihood = -16938.923
rescale eq:    log likelihood = -16938.923
. ml maximize
initial:       log likelihood = -16938.923
rescale:       log likelihood = -16938.923
rescale eq:    log likelihood = -16938.923
Iteration 0:   log likelihood = -16938.923  (not concave)
Iteration 1:   log likelihood = -15504.033
Iteration 2:   log likelihood = -14673.535
Iteration 3:   log likelihood = -14272.637
Iteration 4:   log likelihood = -14263.775
Iteration 5:   log likelihood = -14263.761
Iteration 6:   log likelihood = -14263.761

                                                  Number of obs   =      10000
                                                  Wald chi2(1)    =   10492.88
Log likelihood = -14263.761                       Prob > chi2     =     0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
eq1          |
           x |  -.9574683   .0093471  -102.43   0.000    -.9757883   -.9391483
       _cons |   1.887562   .0295701    63.83   0.000     1.829606    1.945519
-------------+----------------------------------------------------------------
sigma        |
       _cons |   1.007465   .0071239   141.42   0.000     .9935028    1.021428
------------------------------------------------------------------------------
. estimates store bnlsbymle
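Because mlexp is a normal log-density in the NLS residual, the maximized log likelihood and the sigma estimate are functions of the NLS residual sum of squares alone: sigma^2 = RSS/N and lnL = -(N/2)[ln(2 pi) + ln(sigma^2) + 1]. The Python sketch below checks this against the log, taking RSS = 10149.8633 from the nl output in mma05p2nls; the function name `gaussian_loglik_at_nls` is hypothetical.

```python
import math


def gaussian_loglik_at_nls(rss, n):
    """Concentrated normal log-likelihood and sigma implied by an NLS fit."""
    sigma2 = rss / n   # ML estimate of the error variance (divisor N, not N-K)
    loglik = -0.5 * n * (math.log(2.0 * math.pi) + math.log(sigma2) + 1.0)
    return loglik, math.sqrt(sigma2)


# Residual SS and N as printed by the nl command for the same model and data
ll, sigma = gaussian_loglik_at_nls(10149.8633, 10000)
print(round(ll, 3), round(sigma, 6))
```

This is also why the ml and nl coefficient estimates coincide: for any sigma, maximizing this likelihood over b is the same as minimizing the residual sum of squares.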
.
. * (1C) Adding ,robust gives Heteroskedastic robust standard errors
. ml model lf mlexp (y = x) / sigma, robust
. ml search
initial:       log pseudo-likelihood =     -<inf>  (could not be evaluated)
feasible:      log pseudo-likelihood = -35613.002
improve:       log pseudo-likelihood = -17310.807
rescale:       log pseudo-likelihood = -17310.807
rescale eq:    log pseudo-likelihood = -16777.282
. ml maximize
initial:       log pseudo-likelihood = -16777.282
rescale:       log pseudo-likelihood = -16777.282
rescale eq:    log pseudo-likelihood = -16777.282
Iteration 0:   log pseudo-likelihood = -16777.282  (not concave)
Iteration 1:   log pseudo-likelihood = -16097.359
Iteration 2:   log pseudo-likelihood = -16013.711
Iteration 3:   log pseudo-likelihood = -14412.885
Iteration 4:   log pseudo-likelihood = -14264.159
Iteration 5:   log pseudo-likelihood = -14263.761
Iteration 6:   log pseudo-likelihood = -14263.761

                                                  Number of obs   =      10000
                                                  Wald chi2(1)    =     288.75
Log pseudo-likelihood = -14263.761                Prob > chi2     =     0.0000

------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
eq1          |
           x |  -.9574683   .0563463   -16.99   0.000    -1.067905   -.8470317
       _cons |   1.887562    .127832    14.77   0.000     1.637016    2.138108
-------------+----------------------------------------------------------------
sigma        |
       _cons |   1.007465   .0561714    17.94   0.000     .8973713    1.117559
------------------------------------------------------------------------------
. estimates store bnlsbymlerobust
.
. ***** PRINT RESULTS: Third column of Table 5.7 p.111 **********
.
. * (1) NLS by ML - nonrobust and robust standard errors
. * The coefficient estimates are exactly the same as those using the nl command
. * The estimated standard errors are close - within 10% of those using the nl command
. * Table 5.7 reports the standard errors using the nl command
. estimates table bnlsbymle bnlsbymlerobust, b(%10.4f) se(%10.4f) t stats(N ll)
----------------------------------------
    Variable | bnlsbymle    bnlsbyml~t
-------------+--------------------------
eq1          |
           x |    -0.9575      -0.9575
             |     0.0093       0.0563
             |    -102.43       -16.99
       _cons |     1.8876       1.8876
             |     0.0296       0.1278
             |      63.83        14.77
-------------+--------------------------
sigma        |
       _cons |     1.0075       1.0075
             |     0.0071       0.0562
             |     141.42        17.94
-------------+--------------------------
Statistics   |
           N | 10000.0000   10000.0000
          ll | -1.426e+04   -1.426e+04
----------------------------------------
                        legend: b/se/t
.
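The two Wald chi2(1) figures in the headers above can be checked by hand, since with a single restriction the Wald statistic is the squared z-ratio of the slope. The following is a minimal Python sketch of that check, not part of the original Stata program; the numbers are copied from the log and the function name is made up for illustration.

```python
# Slope on x and its two standard errors, copied from the ml output above.
b_x = -0.9574683
se_default = 0.0093471   # default (nonrobust) standard error
se_robust = 0.0563463    # standard error with the robust option


def wald_chi2_single(coef, se):
    """Wald statistic for H0: coef = 0 with one restriction, i.e. (coef/se)^2."""
    return (coef / se) ** 2


wald_default = wald_chi2_single(b_x, se_default)
wald_robust = wald_chi2_single(b_x, se_robust)

print(f"Wald chi2(1), default SE: {wald_default:.2f}")
print(f"Wald chi2(1), robust SE:  {wald_robust:.2f}")
print(f"robust / default SE ratio: {se_robust / se_default:.2f}")
```

The point estimates are identical across the two runs; only the standard errors, and hence the test statistics, change.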
. ********** CLOSE OUTPUT **********
. log close
log: c:\Imbook\bwebpage\Section2\mma05p3nlsbyml.txt
log type: text
closed on: 17 May 2005, 13:54:27
-----------------------------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section2\mma05p4margeffects.txt
log type: text
opened on: 17 May 2005, 13:57:02
.
. ********** OVERVIEW OF MMA05P4MARGINALEFFECTS.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 5.9.4 pp.162-3
. * Marginal effects analysis for a nonlinear model (here exponential regression).
.
. * Provides
. * (1) Sample average marginal effect using derivative
. * (2) Sample average marginal effect using first difference
. * (3) Marginal effect evaluated at the sample mean
. * (4) Marginal effects (1)-(3) when model estimated by Stata ml command
. * using generated data (see below)
.
. * Related programs:
. *   mma05p1mle.do      OLS and MLE for the same data
. *   mma05p2nls.do      NLS, WNLS, FGNLS for same data using nl command
. *   mma05p3nlsbyml.do  NLS for same data using ml command
.
. * To run this program you need data and dictionary files
. *   mma05data.asc      ASCII data set generated by mma05p1mle.do
.
. ********** SETUP **********
.
. set more off
. version 8
.
. ********** READ IN DATA and SUMMARIZE **********
.
. * Model is y ~ exponential(exp(a + bx))
. *          x ~ N[mux, sigx^2]
. *       f(y) = exp(a + bx)*exp(-y*exp(a + bx))
. *     lnf(y) = (a + bx) - y*exp(a + bx)
. *       E[y] = exp(-(a + bx))    note sign reversal for the mean
. *       V[y] = exp(-2*(a + bx)) = (E[y])^2
. * Here a = 2, b = -1 and x ~ N[mux=1, sigx^2=1]
. * and Table 5.7 uses N=10,000
.
. * Data was generated by program mma05p1mle.do
. infile y x using mma05data.asc
(10000 observations read)
.
. * Descriptive Statistics
. describe
Contains data
  obs:        10,000
 vars:             2
 size:       120,000 (98.8% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
y               float  %9.0g
x               float  %9.0g
-------------------------------------------------------------------------------
Sorted by:
     Note:  dataset has changed since last saved
. summarize
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           y |     10000    .6194352    1.291416   .0000445   30.60636
           x |     10000    1.014313    1.004905  -2.895741   4.994059
.
. ********** MARGINAL EFFECTS for CHAPTER 5.9.4 **********


.
. ** (1) DERIVATIVE METHOD FOR SAMPLE AVERAGE MARGINAL EFFECT
.
. * (1A) METHOD A: Use analytical results
. * Since E[y] = exp(-(a + bx))    Note: here sign reversal for the mean !!
. *   dE[y]/dx = -b*exp(-(a + bx)) = -b*E[y]
.
. * Estimate the model
. * Exponential regression in Stata is unusual in that it uses an st (survival-time) command
. * Need to declare data to be st data with dependent variable y
. stset y
     failure event:  (assumed to fail at time=y)
obs. time interval:  (0, y]
 exit on or before:  failure

------------------------------------------------------------------------------
    10000  total obs.
        0  exclusions
------------------------------------------------------------------------------
    10000  obs. remaining, representing
    10000  failures in single record/single failure data
 6194.352  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =  30.60636
. quietly streg x, distribution(exponential) nohr
. gen dEydxanalyticalderivative = -_b[x]*exp(-_b[_cons] - _b[x]*x)
. * Alternative is to (1) predict the mean and (2) multiply by -_b[x]
. quietly sum dEydxanalyticalderivative
. scalar mesaad = r(mean)
. di "Sample average marginal effect by analytical derivative = " mesaad
Sample average marginal effect by analytical derivative = .60976598
.
. * (1B) METHOD B: Use numerical derivative (here one-sided)
. * This is same as first difference code, except have small change in x
. * Note: precision problems can arise with small changes in x
. * The following code tries to minimize such problems
. * Change in x will be 0.0001 times the standard deviation of x
. egen sdx = sd(x)
. quietly streg x, distribution(exponential) nohr
. * Need to tell streg to predict the mean as this is not the default.
. predict y0, mean time
. gen xoriginal = x
. replace x = x+0.0001*sdx
(10000 real changes made)
. predict y1, mean time
. gen dEydxnumericalderivative = (y1 - y0)/(0.0001*sdx)
. quietly sum dEydxnumericalderivative
. scalar mesand = r(mean)
. di "Sample average marginal effect by numerical derivative = " mesand
Sample average marginal effect by numerical derivative = .60949044
. replace x = xoriginal
(10000 real changes made)
. drop xoriginal sdx y0 y1
.
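The one-sided numerical derivative in (1B) can be illustrated outside Stata. The Python sketch below is an assumption-laden stand-in: it uses the streg coefficients reported later in this log, but draws its own x values from N[1, 1] rather than reading mma05data.asc, so its averages will not reproduce the log's figures. In double-precision arithmetic the forward difference of this convex mean function comes out slightly above the analytical derivative, whereas the log reports .60949044 against .60976598. One possible explanation, offered only as a conjecture, is that x and the predictions are stored in single precision.

```python
import math
import random

# streg estimates reported in this log (log relative-hazard form).
a_hat, b_hat = 1.982921, -0.9896276


def mean_y(x, a=a_hat, b=b_hat):
    """Conditional mean of the exponential model: E[y|x] = exp(-(a + b*x))."""
    return math.exp(-(a + b * x))


# Hypothetical regressor values standing in for the x column of mma05data.asc.
random.seed(12345)
xs = [random.gauss(1.0, 1.0) for _ in range(10000)]
n = len(xs)
xbar = sum(xs) / n
sdx = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))

# Step size follows the do-file: 0.0001 times the standard deviation of x.
h = 0.0001 * sdx

analytical_ame = sum(-b_hat * mean_y(x) for x in xs) / n
numerical_ame = sum((mean_y(x + h) - mean_y(x)) / h for x in xs) / n

print("analytical:", analytical_ame)
print("numerical: ", numerical_ame)
print("relative gap:", numerical_ame / analytical_ame - 1.0)
```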
. ** (2) FINITE DIFFERENCE METHOD FOR SAMPLE AVERAGE MARGINAL EFFECT
.
. streg x, distribution(exponential) nohr /* y is dependent variable */
         failure _d:  1 (meaning all fail)
   analysis time _t:  y

Iteration 0:   log likelihood = -20754.005
Iteration 1:   log likelihood = -17232.884
Iteration 2:   log likelihood = -15760.556
Iteration 3:   log likelihood = -15752.193
Iteration 4:   log likelihood =  -15752.19
Iteration 5:   log likelihood =  -15752.19

Exponential regression -- log relative-hazard form

No. of subjects =        10000                     Number of obs   =     10000
No. of failures =        10000
Time at risk    =  6194.352464
                                                   LR chi2(1)      =  10003.63
Log likelihood  =    -15752.19                     Prob > chi2     =    0.0000

------------------------------------------------------------------------------
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |  -.9896276   .0098692  -100.27   0.000    -1.008971   -.9702842
       _cons |   1.982921   .0141496   140.14   0.000     1.955188    2.010654
------------------------------------------------------------------------------
.
. * The following method can be used following many stata estimation commands
. * 1. Predict y using sample data.
. * Need to say predict the mean as this is not the streg default.
. predict y0, mean time
. * 2. Predict y with regressor of x increased by one
. gen xoriginal = x
. replace x = x+1
(10000 real changes made)
. predict y1, mean time
. replace x = xoriginal /* Put x back to initial value for later analysis */
(10000 real changes made)
. * 3. Calculate difference
. gen dEydxfinitedifference = y1 - y0
. quietly sum dEydxfinitedifference
. scalar mesafd = r(mean)
. di "Sample average marginal effect by first differences = " mesafd
Sample average marginal effect by first differences = 1.0414485
. drop xoriginal y0 y1
.
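The gap between the first-difference figure (1.0414485) and the derivative figure (.60976598) follows directly from the functional form. With E[y|x] = exp(-(a + bx)), a unit increase in x multiplies the mean by exp(-b), so each observation's first difference is (exp(-b) - 1)*E[y|x], while its derivative is -b*E[y|x]. The Python sketch below, which is not part of the original program, applies that fixed ratio to the averages printed in the log.

```python
import math

# Values copied from the log above.
b_hat = -0.9896276            # streg coefficient on x
ame_derivative = 0.60976598   # (1A) sample average marginal effect by derivative

# Per observation:
#   derivative        = -b * E[y|x]
#   unit first diff.  = (exp(-b) - 1) * E[y|x]
# so the ratio of the two does not depend on x and carries over to the averages.
fd_over_derivative = (math.exp(-b_hat) - 1.0) / (-b_hat)
ame_fd_implied = fd_over_derivative * ame_derivative

print("first difference / derivative ratio:", fd_over_derivative)
print("implied average first difference:   ", ame_fd_implied)
```

Because x changes by a full unit here (about one standard deviation of x), the first difference is a discrete effect and is not expected to approximate the derivative.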
. ** (3) DERIVATIVE METHOD FOR MARGINAL EFFECT AT SAMPLE MEAN
.
. * (3A) Use Stata command mfx
. quietly streg x, distribution(exponential) nohr
. * Need to tell mfx to predict the mean as this is not the streg default.
. mfx compute, dydx predict(mean time)
Marginal effects after ereg
      y  = predicted mean _t (predict, mean time)
         =  .37563828
------------------------------------------------------------------------------
variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]      X
---------+--------------------------------------------------------------------
       x |    .371742      .00525   70.81   0.000   .361452  .382032   1.01431
------------------------------------------------------------------------------
. di "Marginal effect by analytical derivative at mean of x using mfx: "
Marginal effect by analytical derivative at mean of x using mfx:
. matrix list e(Xmfx_dydx)

symmetric e(Xmfx_dydx)[1,1]
           x
r1   .371742
.
. * (3B) Write one's own code
. quietly streg x, distribution(exponential) nohr
. quietly sum x
. scalar meanx = r(mean)
. scalar dEydxatmeanx = -_b[x]*exp(-_b[_cons] - _b[x]*meanx)
. di "Marginal effect by analytical derivative at mean of x done manually: "
Marginal effect by analytical derivative at mean of x done manually:
. di dEydxatmeanx
.371742
.
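The manual calculation in (3B) is easy to reproduce from the printed estimates. This Python sketch, again outside the original Stata program, plugs the reported coefficients and the sample mean of x into -b*exp(-(a + b*xbar)); small differences in the last digit can arise because the log prints rounded values.

```python
import math

# Estimates and sample mean copied from the log above.
a_hat = 1.982921      # _cons from streg
b_hat = -0.9896276    # coefficient on x from streg
mean_x = 1.014313     # sample mean of x from summarize

# Predicted mean of y at the sample mean of x.
pred_at_mean = math.exp(-(a_hat + b_hat * mean_x))

# Marginal effect at the mean: dE[y|x]/dx = -b * E[y|x], evaluated at x = mean_x.
me_at_mean = -b_hat * pred_at_mean

print("predicted mean at xbar:", pred_at_mean)
print("marginal effect at xbar:", me_at_mean)
```

The marginal effect at the mean (about .372) is well below the sample average marginal effect (about .610) because the exponential mean function is convex in x.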
. ** (4) MARGINAL EFFECTS AFTER ML COMMAND
.
. * Preceding (1) - (3) presume there is a built-in command to get MLE.
. * Now consider ML estimation using Stata's ml command.
. * After ml command cannot use predict or mfx.
. * Need to be more manual, as follows.
.
. * Estimate model by ml: for details see mma05p1mle.do
. program define mleexp0
  1.   version 8.0
  2.   args lnf theta      /* Must use lnf while could use name other than theta */
  3.   quietly replace `lnf' = `theta' - $ML_y1*exp(`theta')
  4. end
. quietly ml model lf mleexp0 (y = x)
. quietly ml search
. quietly ml maximize
.
. * Note that here the mean is in fact exp(-a-b*x)
.
. * (1A) Sample average marginal effect by calculus methods
. gen mldEydxanalyticalderivative = -_b[x]*exp(-_b[_cons] - _b[x]*x)
. quietly sum mldEydxanalyticalderivative

. scalar mlmesaad = r(mean)


. di "Sample average marginal effect by analytical derivative = " mlmesaad
Sample average marginal effect by analytical derivative = .60976598
.
. * (1B) Sample average marginal effect by numerical derivative
. egen sdx = sd(x)
. gen y0 = exp(-_b[_cons] - _b[x]*x)
. gen xoriginal = x
. replace x = x+0.0001*sdx
(10000 real changes made)
. gen y1 = exp(-_b[_cons] - _b[x]*x)
. gen mldEydxnumericalderivative = (y1 - y0)/(0.0001*sdx)
. quietly sum mldEydxnumericalderivative
. scalar mlmesand = r(mean)
. di "ML sample average marginal effect by numerical derivative = " mlmesand
ML sample average marginal effect by numerical derivative = .60949063
. replace x = xoriginal
(10000 real changes made)
. drop xoriginal sdx y0 y1
.
. * (2) Sample average marginal effect by increase x by one unit (finite difference)
. gen mldEydxfinitedifference = exp(-_b[_cons]-_b[x]*(x+1)) - exp(-_b[_cons]-_b[x]*x)
. quietly sum mldEydxfinitedifference
. scalar mlmesafd = r(mean)
. di "Sample average marginal effect by first difference = " mlmesafd
Sample average marginal effect by first difference = 1.0414485
.
. * (3) Marginal effect estimated at the sample mean of x
. quietly sum x
. scalar meanx = r(mean)
. scalar mldEydxatmeanx = -_b[x]*exp(-_b[_cons] - _b[x]*meanx)

. di "ML marginal effect at mean of x by analytical derivative: "


ML marginal effect at mean of x by analytical derivative:
. di mldEydxatmeanx
.371742
.
. ********** DISPLAY RESULTS on p.162-3 **********
.
. di "Marginal Effects: (1A) Analytical deriv (1B) Numerical Deriv (2) First diff"
Marginal Effects: (1A) Analytical deriv (1B) Numerical Deriv (2) First diff
. sum dEydxfinitedifference dEydxanalyticalderivative dEydxnumericalderivative
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
dEydxfinit~e |     10000    1.041449    1.373144     .01325   32.59646
dEydxanaly~e |     10000     .609766    .8039727   .0077578   19.08516
dEydxnumer~e |     10000    .6094904    .8035654   .0077479   19.11325
.
. di "KEY RESULTS FOR CHAPTER 5.9.4 pp.162-3 FOLLOW"
KEY RESULTS FOR CHAPTER 5.9.4 pp.162-3 FOLLOW
. di "(1A) Sample average marginal effect by analytical derivative = " mesaad
(1A) Sample average marginal effect by analytical derivative = .60976598
. di "(1B) Sample average marginal effect by numerical derivative = " mesand
(1B) Sample average marginal effect by numerical derivative = .60949044
. di "(2) Sample average marginal effect by first differences = " mesafd
(2) Sample average marginal effect by first differences = 1.0414485
. di "(3) Marginal effect at mean of x by analytical derivative = " dEydxatmeanx
(3) Marginal effect at mean of x by analytical derivative = .371742
.
. ********** CLOSE OUTPUT **********
. log close
log: c:\Imbook\bwebpage\Section2\mma05p4margeffects.txt
log type: text
closed on: 17 May 2005, 13:57:06

-----------------------------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section2\mma06p2Theil.txt
log type: text
opened on: 18 May 2005, 17:45:50
.
. ********** OVERVIEW OF MMA06P2THEIL.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * NOTE: Stata does not have a NL2SLS command
.
. * Chapter 6.5.4 nonlinear 2SLS example.
. * Table 6.4 partial only
. * (1) OLS          inconsistent
. * (2) NL2SLS       consistent     NOT INCLUDED AS STATA DOES NOT DO
. * (3) Wrong 2SLS   inconsistent
.
. * To run this program you need data set
. *   mma06p1nl2sls.asc
. * generated by Limdep program MMA06P1NL2SLS.LIM
.
. * Some of the analysis is done in Limdep which (unlike Stata) has
. * an NL2SLS command
.
. ********** SETUP **********
.
. set more off
. version 8.0
.
. ********** READ DATA and SUMMARIZE **********
.
. * Model is y = 1*x^2 + u
. *          x = 1*z + v
. * where u and v are joint normal (0,0,1,1,0.8)
.
. infile y x xsq z zsq u v using mma06p1nl2sls.asc
(200 observations read)
.
. * Descriptive Statistics
. describe
Contains data
  obs:           200
 vars:             7
 size:         6,400 (99.9% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
y               float  %9.0g
x               float  %9.0g
xsq             float  %9.0g
z               float  %9.0g
zsq             float  %9.0g
u               float  %9.0g
v               float  %9.0g
-------------------------------------------------------------------------------
Sorted by:
     Note:  dataset has changed since last saved
. summarize
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           y |       200    1.632794    2.418096  -2.332656   9.354863
           x |       200    .9970513    .8330302  -1.908285   2.696363
         xsq |       200    1.684581    1.638509   .0000948   7.270374
           z |       200           1           0          1          1
         zsq |       200           1           0          1          1
-------------+--------------------------------------------------------
           u |       200   -.0517871    .9427286  -2.816687   2.202356
           v |       200   -.0029487    .8330302  -2.908285   1.696363
.
. ********** DO THE ANALYSIS: ESTIMATE MODELS **********
.
. * (1) OLS is inconsistent (first column of Table 6.4)
. regress y xsq, noconstant
      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  1,   199) = 2250.83
       Model |  1558.96322     1  1558.96322           Prob > F      =  0.0000
    Residual |   137.83055   199  .692615831           R-squared     =  0.9188
-------------+------------------------------           Adj R-squared =  0.9184
       Total |  1696.79377   200  8.48396883           Root MSE      =  .83224

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         xsq |   1.189495   .0250721    47.44   0.000     1.140054    1.238936
------------------------------------------------------------------------------
. estimates store olswrong
. regress y xsq, noconstant robust

Regression with robust standard errors                 Number of obs =     200
                                                       F(  1,   199) = 3850.71
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.9188
                                                       Root MSE      =  .83224

------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         xsq |   1.189495   .0191687    62.05   0.000     1.151695    1.227295
------------------------------------------------------------------------------
. estimates store olswrongrob
.
. * (2) NL2SLS command Stata does not have
. * See LIMDEP program MMA06P1NL2SLS.LIM
.
. * (3A) Theil's 2sls where first regress x on z is inconsistent
. regress x z, noconstant
      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  1,   199) =  286.51
       Model |  198.822258     1  198.822258           Prob > F      =  0.0000
    Residual |  138.093918   199  .693939288           R-squared     =  0.5901
-------------+------------------------------           Adj R-squared =  0.5881
       Total |  336.916176   200  1.68458088           Root MSE      =  .83303

------------------------------------------------------------------------------
           x |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           z |   .9970513   .0589041    16.93   0.000     .8808949    1.113208
------------------------------------------------------------------------------
. predict xhat
(option xb assumed; fitted values)
. gen xhatsq = xhat*xhat
. regress y xhatsq, noconstant
      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  1,   199) =   91.19
       Model |  533.203113     1  533.203113           Prob > F      =  0.0000
    Residual |  1163.59065   199  5.84718921           R-squared     =  0.3142
-------------+------------------------------           Adj R-squared =  0.3108
       Total |  1696.79377   200  8.48396883           Root MSE      =  2.4181

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      xhatsq |   1.642466   .1719981     9.55   0.000     1.303293    1.981638
------------------------------------------------------------------------------
. estimates store ivwrong
.
. ********** DISPLAY KEY RESULTS Table 6.4 p.199 **********
.
. * Table 6.4 p.199
. estimates table olswrong olswrongrob ivwrong, b(%8.3f) se stats(N r2) keep(xsq xhatsq)
-----------------------------------------------
    Variable | olswrong   olswro~b   ivwrong
-------------+---------------------------------
         xsq |    1.189      1.189
             |    0.025      0.019
      xhatsq |                          1.642
             |                          0.172
-------------+---------------------------------
           N |  200.000    200.000    200.000
          r2 |    0.919      0.919      0.314
-----------------------------------------------
                                 legend: b/se
.
. * (3B) IV with instrument z for xsq should work but Stata cannot do it.
. *      The command below instruments xsq with itself, which simply reproduces OLS.
. ivreg y (xsq = xsq), noconstant
Instrumental variables (2SLS) regression
      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  1,   199) =       .
       Model |  1558.96322     1  1558.96322           Prob > F      =       .
    Residual |   137.83055   199  .692615831           R-squared     =       .
-------------+------------------------------           Adj R-squared =       .
       Total |  1696.79377   200  8.48396883           Root MSE      =  .83224

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         xsq |   1.189495   .0250721    47.44   0.000     1.140054    1.238936
------------------------------------------------------------------------------
Instrumented:  xsq
Instruments:   xsq
------------------------------------------------------------------------------
. corr xsq xsq
(obs=200)

             |      xsq      xsq
-------------+------------------
         xsq |   1.0000
         xsq |   1.0000   1.0000

. corr xsq z
(obs=200)

             |      xsq        z
-------------+------------------
         xsq |   1.0000
           z |        .        .

. regress xsq z, noconstant


      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  1,   199) =  211.41
       Model |  567.562553     1  567.562553           Prob > F      =  0.0000
    Residual |  534.257348   199  2.68471029           R-squared     =  0.5151
-------------+------------------------------           Adj R-squared =  0.5127
       Total |   1101.8199   200  5.50909951           Root MSE      =  1.6385

------------------------------------------------------------------------------
         xsq |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           z |   1.684581   .1158601    14.54   0.000      1.45611    1.913052
------------------------------------------------------------------------------
. predict xsqhat
(option xb assumed; fitted values)
. regress y xsqhat, noconstant
      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  1,   199) =   91.19
       Model |  533.203113     1  533.203113           Prob > F      =  0.0000
    Residual |  1163.59065   199  5.84718921           R-squared     =  0.3142
-------------+------------------------------           Adj R-squared =  0.3108
       Total |  1696.79377   200  8.48396883           Root MSE      =  2.4181

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      xsqhat |   .9692582   .1015002     9.55   0.000     .7691043    1.169412
------------------------------------------------------------------------------
. * ivreg y (xsq = z), noconstant
.
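Because the instrument z equals 1 for every observation, each second-stage regression above is a no-constant regression on a constant regressor c, whose coefficient is simply mean(y)/c. The Python sketch below is not part of the original program; it uses only the sample means printed by summarize to recover the Theil two-stage figure (1.642) and the figure from regressing y on the fitted value of x^2 (0.969).

```python
# Sample means copied from the summarize output in this log.
ybar = 1.632794     # mean of y
xbar = 0.9970513    # mean of x, which is also the fitted value xhat when z = 1
xsqbar = 1.684581   # mean of x^2, which is also the fitted value xsqhat when z = 1


def noconstant_coef_on_constant(mean_y, c):
    """No-constant OLS of y on a regressor equal to c everywhere: sum(c*y)/sum(c*c)."""
    return mean_y / c


# Theil-style two-stage: square the fitted value of x, then regress y on it.
theil_two_stage = noconstant_coef_on_constant(ybar, xbar ** 2)

# Regress x^2 on z first, then regress y on the fitted value of x^2.
fitted_xsq_stage = noconstant_coef_on_constant(ybar, xsqbar)

print("y on xhat^2:", theil_two_stage)
print("y on xsqhat:", fitted_xsq_stage)
```

The two differ only in whether the squaring happens before or after averaging: mean(x)^2 versus mean(x^2). That is the source of the inconsistency of the first approach in this example.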
. gen one = 1
. regress y one, noconstant
      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  1,   199) =   91.19
       Model |  533.203113     1  533.203113           Prob > F      =  0.0000
    Residual |  1163.59065   199  5.84718921           R-squared     =  0.3142
-------------+------------------------------           Adj R-squared =  0.3108
       Total |  1696.79377   200  8.48396883           Root MSE      =  2.4181

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         one |   1.632794   .1709852     9.55   0.000     1.295618    1.969969
------------------------------------------------------------------------------
.
. ********** CLOSE OUTPUT **********
. log close
log: c:\Imbook\bwebpage\Section2\mma06p2Theil.txt
log type: text
closed on: 18 May 2005, 17:45:50
-----------------------------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section2\mma06p2twostage.txt
log type: text
opened on: 18 May 2005, 17:59:06
.
. ********** OVERVIEW OF MMA06P2TWOSTAGE.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * NOTE: Stata does not have a NL2SLS command
.
. * Chapter 6.5.4 nonlinear 2SLS example on pages 198-9.
.
. * Table 6.4 partial only
. * (1) OLS        inconsistent
. * (2) NL2SLS     consistent     NOT INCLUDED AS STATA DOES NOT DO
. * (3) Twostage   Here 2SLS using Theil's interpretation of 2SLS is inconsistent
.
. * To run this program you need data set
. *   mma06p1nl2sls.asc
. * generated by Limdep program MMA06P1NL2SLS.LIM
.
. * Some of the analysis is done in Limdep which (unlike Stata) has
. * an NL2SLS command
.
. ********** SETUP **********
.
. set more off
. version 8.0
.
. ********** READ DATA and SUMMARIZE **********
.
. * Model is y = 1*x^2 + u
. *          x = 1*z + v
. * where u and v are joint normal (0,0,1,1,0.8)
.
. infile y x xsq z zsq u v using mma06p1nl2sls.asc
(200 observations read)
.
. * Descriptive Statistics
. describe
Contains data
  obs:           200
 vars:             7
 size:         6,400 (99.9% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
y               float  %9.0g
x               float  %9.0g
xsq             float  %9.0g
z               float  %9.0g
zsq             float  %9.0g
u               float  %9.0g
v               float  %9.0g
-------------------------------------------------------------------------------
Sorted by:
     Note:  dataset has changed since last saved
. summarize
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           y |       200    1.632794    2.418096  -2.332656   9.354863
           x |       200    .9970513    .8330302  -1.908285   2.696363
         xsq |       200    1.684581    1.638509   .0000948   7.270374
           z |       200           1           0          1          1
         zsq |       200           1           0          1          1
-------------+--------------------------------------------------------
           u |       200   -.0517871    .9427286  -2.816687   2.202356
           v |       200   -.0029487    .8330302  -2.908285   1.696363

.
. ********** DO THE ANALYSIS: ESTIMATE MODELS **********
.
. * (1) OLS is inconsistent (first column of Table 6.4)
. regress y xsq, noconstant
      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  1,   199) = 2250.83
       Model |  1558.96322     1  1558.96322           Prob > F      =  0.0000
    Residual |   137.83055   199  .692615831           R-squared     =  0.9188
-------------+------------------------------           Adj R-squared =  0.9184
       Total |  1696.79377   200  8.48396883           Root MSE      =  .83224

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         xsq |   1.189495   .0250721    47.44   0.000     1.140054    1.238936
------------------------------------------------------------------------------
. estimates store olswrong
. regress y xsq, noconstant robust

Regression with robust standard errors                 Number of obs =     200
                                                       F(  1,   199) = 3850.71
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.9188
                                                       Root MSE      =  .83224

------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         xsq |   1.189495   .0191687    62.05   0.000     1.151695    1.227295
------------------------------------------------------------------------------
. estimates store olswrongrob
.
. * (2) NL2SLS command Stata does not have
. * See LIMDEP program MMA06P1NL2SLS.LIM
. * See also code further down
.
. * (3A) Theil's 2sls where first regress x on z
. *      and then use xhat^2 as instrument for x^2 is inconsistent
.
. regress x z, noconstant
      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  1,   199) =  286.51
       Model |  198.822258     1  198.822258           Prob > F      =  0.0000
    Residual |  138.093918   199  .693939288           R-squared     =  0.5901
-------------+------------------------------           Adj R-squared =  0.5881
       Total |  336.916176   200  1.68458088           Root MSE      =  .83303

------------------------------------------------------------------------------
           x |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           z |   .9970513   .0589041    16.93   0.000     .8808949    1.113208
------------------------------------------------------------------------------
. predict xhat
(option xb assumed; fitted values)
. gen xhatsq = xhat*xhat
. regress y xhatsq, noconstant
      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  1,   199) =   91.19
       Model |  533.203113     1  533.203113           Prob > F      =  0.0000
    Residual |  1163.59065   199  5.84718921           R-squared     =  0.3142
-------------+------------------------------           Adj R-squared =  0.3108
       Total |  1696.79377   200  8.48396883           Root MSE      =  2.4181

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      xhatsq |   1.642466   .1719981     9.55   0.000     1.303293    1.981638
------------------------------------------------------------------------------
. estimates store twostage
.
. ********** DISPLAY KEY RESULTS Table 6.4 p.199 **********
.
. * Table 6.4 p.199 first and third columns
. estimates table olswrong twostage, b(%8.3f) se stats(N r2) keep(xsq xhatsq)
------------------------------------
    Variable | olswrong   twostage
-------------+----------------------
         xsq |    1.189
             |    0.025
      xhatsq |               1.642
             |               0.172
-------------+----------------------
           N |  200.000    200.000
          r2 |    0.919      0.314
------------------------------------
                      legend: b/se
.
. ********** FURTHER ANALYSIS **********
.
. * For this particular example there are ways to get linear IV to work
. * as the problem is not very nonlinear
.
. * (2A) regress xsq on z giving xsqhat and then regress y on xsqhat
. *      Gives nl2sls estimator though not correct standard errors
.
. * Note we get estimator 0.969 which is correct - Table 6.4 had typo
. regress xsq z, noconstant
      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  1,   199) =  211.41
       Model |  567.562553     1  567.562553           Prob > F      =  0.0000
    Residual |  534.257348   199  2.68471029           R-squared     =  0.5151
-------------+------------------------------           Adj R-squared =  0.5127
       Total |   1101.8199   200  5.50909951           Root MSE      =  1.6385

------------------------------------------------------------------------------
         xsq |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           z |   1.684581   .1158601    14.54   0.000      1.45611    1.913052
------------------------------------------------------------------------------
. predict xsqhat
(option xb assumed; fitted values)
. regress y xsqhat, noconstant
      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  1,   199) =   91.19
       Model |  533.203113     1  533.203113           Prob > F      =  0.0000
    Residual |  1163.59065   199  5.84718921           R-squared     =  0.3142
-------------+------------------------------           Adj R-squared =  0.3108
       Total |  1696.79377   200  8.48396883           Root MSE      =  2.4181

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      xsqhat |   .9692582   .1015002     9.55   0.000     .7691043    1.169412
------------------------------------------------------------------------------
.
. * (2B) IV with instrument z for xsq should work, but Stata cannot do it
. *      because here z = 1, which has no variation
. ivreg y (xsq = z), noconstant
note: z dropped due to collinearity
equation not identified; must have at least as many instruments not in
the regression as there are instrumented variables
r(481);
end of do-file
r(481);
. exit, clear

-----------------------------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section2\mma07p1mltests.txt
log type: text
opened on: 17 May 2005, 13:59:20
.
. ********** OVERVIEW OF MMA07P1MLTESTS.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 7.4 pp.241-3
. * Likelihood-based hypothesis tests
.
. * Implements the three likelihood-based tests presented in Table 7.1:
. * Wald test
. * LR test
. * LM test direct
. * LM test via auxiliary regression
. * for a Poisson model with simulated data (see below).
.
. * NOTE: To implement this program requires:
. *   the free Stata add-on rndpoix
. * To obtain this, in Stata give command: search rndpoix
. * If you don't want to do this, instead use the data set
.
. ********** SETUP ***********
.
. version 8
. set more off
.
. ********** GENERATE DATA ***********
.
. * Model is
. * y ~ Poisson[exp(b1 + b2*x2 + b3*x3 + b4*x4)]
. * where
. * x2, x3 and x4 are iid ~ N[0,1]
. * and b1=0, b2=0.1, b3=0.1 and b4=0.1
.
. set seed 10001
. set obs 200
obs was 0, now 200
. scalar b1 = 0

. scalar b2 = 0.1
. scalar b3 = 0.1
. scalar b4 = 0.1
.
. * Generate regressors
. gen x2 = invnorm(uniform())
. gen x3 = invnorm(uniform())
. gen x4 = invnorm(uniform())
.
. * Generate y
. gen mupoiss = exp(b1+b2*x2+b3*x3+b4*x4)
. * The next requires Stata add-on. In Stata: search rndpoix
. rndpoix(mupoiss)
( Generating ....... )
Variable xp created.
. gen y = xp
.
. sum
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          x2 |       200   -.0091098    1.010072  -2.857666   2.149822
          x3 |       200   -.1459839    1.109521  -3.086754   3.111421
          x4 |       200   -.0325314    .9674748  -2.852186   2.379461
     mupoiss |       200    1.000447    .1993649   .6191922   1.903112
          xp |       200        .845     .951579          0          6
-------------+--------------------------------------------------------
           y |       200        .845     .951579          0          6
.
. * Write data to a text (ascii) file so can use with programs other than Stata
. outfile y x2 x3 x4 using mma07p1mltests.asc, replace
.
. ********** ANALYSIS: LIKELIHOOD-BASED HYPOTHESIS TESTS ***********
.
. * Hypotheses to test are
. * (A) Single exclusion: b3 = 0
. * (B) Multiple exclusion: b3 = 0, b4 = 0
. * (C) Linear: b3 = b4
. * (D) Nonlinear: b3/b4 = 1
.

. * Tests are Wald, LR, LM and LM (auxiliary)


.
. ****** (A) TEST H0: b3 = 0
.
. * First skip to (B) where many comments given.
.
. ****** (B) TEST H0: b3 = 0, b4 = 0.
.
. * (1) Wald test requires estimation of unrestricted model only
. poisson y x2 x3 x4
Iteration 0:   log likelihood = -238.77153
Iteration 1:   log likelihood = -238.77153
Poisson regression                                Number of obs   =        200
                                                  LR chi2(3)      =       8.30
                                                  Prob > chi2     =     0.0401
Log likelihood = -238.77153                       Pseudo R2       =     0.0171
------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x2 |  -.0275702   .0767909    -0.36   0.720    -.1780775    .1229371
          x3 |   .1630037   .0670848     2.43   0.015     .0315199    .2944874
          x4 |   .1026568   .0802139     1.28   0.201    -.0545595    .2598732
       _cons |  -.1653238   .0773479    -2.14   0.033     -.316923   -.0137246
------------------------------------------------------------------------------
.
. * (1A) Stata Wald test command
. test (x3=0) (x4=0)
( 1) [y]x3 = 0
( 2) [y]x4 = 0
chi2( 2) = 8.57
Prob > chi2 = 0.0138
.
. * (1B) Wald test done manually
. * Use h'[RVR']-inv*h.
. * Details below will change for each example.
. * In particular, for nonlinear restrictions more work in forming R
. * Note that Stata puts the intercept last, not first.
. * So here the second and third elements of b are set to zero.
. matrix bfull = e(b)                       /* 1xq row vector */
. matrix vfull = e(V)                       /* qxq matrix */
. matrix h = (bfull[1,2]\bfull[1,3])        /* hx1 vector */
. matrix R = (0,1,0,0\0,0,1,0)              /* h x q matrix */
. matrix Wald = h'*syminv(R*vfull*R')*h     /* scalar */


. matrix list h
h[2,1]
c1
r1 .16300365
r2 .10265681
. matrix list R
R[2,4]
c1 c2 c3 c4
r1 0 1 0 0
r2 0 0 1 0
. matrix list Wald
symmetric Wald[1,1]
c1
c1 8.5701855
. scalar WaldB = Wald[1,1]
.
. * (2) Likelihood ratio test requires estimating both models
.
. poisson y x2 x3 x4
Iteration 0:   log likelihood = -238.77153
Iteration 1:   log likelihood = -238.77153
Poisson regression                                Number of obs   =        200
                                                  LR chi2(3)      =       8.30
                                                  Prob > chi2     =     0.0401
Log likelihood = -238.77153                       Pseudo R2       =     0.0171
------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x2 |  -.0275702   .0767909    -0.36   0.720    -.1780775    .1229371
          x3 |   .1630037   .0670848     2.43   0.015     .0315199    .2944874
          x4 |   .1026568   .0802139     1.28   0.201    -.0545595    .2598732
       _cons |  -.1653238   .0773479    -2.14   0.033     -.316923   -.0137246
------------------------------------------------------------------------------
. estimates store unrestricted              /* Used for Stata lrtest */
. scalar llunrest = e(ll)                   /* Used for manual lrtest */

. poisson y x2
Iteration 0:   log likelihood = -242.92271
Iteration 1:   log likelihood = -242.92271  (backed up)
Poisson regression                                Number of obs   =        200
                                                  LR chi2(1)      =       0.00
                                                  Prob > chi2     =     0.9608
Log likelihood = -242.92271                       Pseudo R2       =     0.0000
------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x2 |  -.0037493   .0763386    -0.05   0.961    -.1533701    .1458716
       _cons |  -.1684599   .0769294    -2.19   0.029    -.3192388   -.0176811
------------------------------------------------------------------------------
. estimates store restrictedB               /* Used for Stata lrtest */
. scalar llrestB = e(ll)                    /* Used for manual lrtest */

.
. * (2A) Stata likelihood ratio test
. lrtest unrestricted restrictedB
likelihood-ratio test                                  LR chi2(2)  =      8.30
(Assumption: restrictedB nested in unrestricted)       Prob > chi2 =    0.0157

.
. * (2B) Likelihood test done manually
. scalar LRB = -2*(llrestB-llunrest)
. di "LR " LRB
LR 8.3023503
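As a cross-check outside Stata, the manual LR statistic in (2B) can be reproduced from the two log likelihoods printed above. This is a minimal Python sketch, not part of the original program: the function name `lr_statistic` is illustrative, and the p-value uses the closed form exp(-x/2), which holds only for a chi-square with 2 degrees of freedom.

```python
import math

def lr_statistic(ll_restricted, ll_unrestricted):
    """LR = -2 * (lnL_restricted - lnL_unrestricted)."""
    return -2.0 * (ll_restricted - ll_unrestricted)

# Log likelihoods copied from the Poisson output above
ll_unrest = -238.77153   # poisson y x2 x3 x4
ll_rest_b = -242.92271   # poisson y x2

lr_b = lr_statistic(ll_rest_b, ll_unrest)

# For 2 degrees of freedom only, Pr(chi2 > x) = exp(-x/2)
p_value = math.exp(-lr_b / 2.0)
print(f"LR = {lr_b:.5f}, p-value = {p_value:.5f}")
```

The result agrees with the Stata value up to the rounding of the printed log likelihoods.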
.
. * (3) LM test via direct computation requires estimating only the restricted model.
.
. * For exclusion restrictions in the Poisson, from 7.6.2
. * LM = dlnL/db * V[b]-inv * dlnL/db where b evaluated at restricted
. * = [Sum_i u_i*x_i]'[Sum_i exp(x_i'b)*x_i*x_i']-inv[Sum_i u_i*x_i]
. * First calculate Sum_i u_i*x_i' : a 1x4 row vector
.
. quietly poisson y x2
. predict yhatrest
(option n assumed; predicted number of events)
. gen u = y - yhatrest        /* yhatrest = exp(x_brest) calculated earlier */

. gen one = 1
. matrix vecaccum dlnL_db = u one x2 x3 x4, noconstant
. * Then calculate Sum_i exp(x_i'b)*x_i*x_i'
. gen trx1 = sqrt(yhatrest)
. gen trx2 = sqrt(yhatrest)*x2
. gen trx3 = sqrt(yhatrest)*x3
. gen trx4 = sqrt(yhatrest)*x4
. matrix accum Vb = trx1 trx2 trx3 trx4, noconstant
(obs=200)
. matrix LMdirect = dlnL_db*syminv(Vb)*dlnL_db'
. matrix list dlnL_db
dlnL_db[1,4]
          one          x2          x3          x4
u   1.192e-07  -4.632e-08   37.578639   19.933299
. matrix list Vb
symmetric Vb[4,4]
            trx1        trx2        trx3        trx4
trx1         169
trx2  -2.1828434   171.62608
trx3  -24.733563   16.929495   210.68156
trx4   -5.561359     17.0457   23.027167   157.58531
. matrix list LMdirect
symmetric LMdirect[1,1]
           u
u  8.5750886
. scalar LMdirectB = LMdirect[1,1]
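The quadratic form behind LMdirect can be recomputed from the two matrices listed above. The following Python sketch is an illustration under stated assumptions, not the authors' code: the helper `solve` is a hypothetical name for a plain Gaussian elimination routine, and the inputs are the printed (rounded) values of dlnL_db and Vb.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

# Score vector dlnL_db (order: one, x2, x3, x4), copied from the listing
score = [1.192e-07, -4.632e-08, 37.578639, 19.933299]

# Lower triangle of the symmetric matrix Vb, copied from the listing
lower = [
    [169.0],
    [-2.1828434, 171.62608],
    [-24.733563, 16.929495, 210.68156],
    [-5.561359, 17.0457, 23.027167, 157.58531],
]
# Fill in the full symmetric 4x4 matrix
vb = [[lower[max(i, j)][min(i, j)] for j in range(4)] for i in range(4)]

# LM = score * inverse(Vb) * score'
lm_direct = sum(s * z for s, z in zip(score, solve(vb, score)))
print(f"LM direct = {lm_direct:.4f}")
```

Solving the linear system rather than forming the inverse explicitly is a design choice of this sketch; Stata's `syminv` computes the inverse itself.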
.
. * (4) LM test via auxiliary regression
.
. * N uncentered Rsq from regress (noconstant) 1 on the scores
. * Begin by computing the unrestricted scores at the restricted estimates.
. * This varies from problem to problem.
. * In general could compute lnf(y) at current parameters
. * and then get numerical derivative when perturb beta a little.
. * Here use analytical derivative.
. * s_j = dlnf(y)/db_j = (y-exp(x'b))*x_j for the Poisson

.
. drop yhatrest
. quietly poisson y x2
. predict yhatrest
(option n assumed; predicted number of events)
. gen s1 = (y-yhatrest)*1
. gen s2 = (y-yhatrest)*x2
. gen s3 = (y-yhatrest)*x3
. gen s4 = (y-yhatrest)*x4
. regress one s1 s2 s3 s4, noconstant
      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  4,   196) =    2.36
       Model |  9.18577727     4  2.29644432           Prob > F      =  0.0549
    Residual |  190.814223   196  .973541953           R-squared     =  0.0459
-------------+------------------------------           Adj R-squared =  0.0265
       Total |         200   200           1           Root MSE      =  .98668
------------------------------------------------------------------------------
         one |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          s1 |  -.0265153   .0748092    -0.35   0.723    -.1740497     .121019
          s2 |  -.0102806   .0809418    -0.13   0.899    -.1699093    .1493481
          s3 |   .1794153   .0697359     2.57   0.011     .0418862    .3169444
          s4 |   .1225885   .0821671     1.49   0.137    -.0394566    .2846336
------------------------------------------------------------------------------
. * LM equals N times uncentered Rsq
. scalar LMauxB = e(N)*e(r2)
. * Check: LM equals explained sum of squares
. scalar LMauxB2 = e(mss)
. di "LMauxB " LMauxB " LMauxB2 " LMauxB2
LMauxB 9.1857773 LMauxB2 9.1857773
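The equality of the two auxiliary-regression forms follows because the dependent variable is a column of ones and no constant is included, so the uncentered total sum of squares is N. A small Python sketch using the sums of squares printed in the regression output above (the variable names are illustrative, not from the original program):

```python
# Values copied from the auxiliary regression output above
n_obs = 200
ss_model = 9.18577727      # explained (model) sum of squares
ss_residual = 190.814223   # residual sum of squares

# The dependent variable is a column of ones and there is no constant,
# so the uncentered total sum of squares is sum(1^2) = N.
ss_total = float(n_obs)
r2_uncentered = ss_model / ss_total

lm_aux = n_obs * r2_uncentered          # N times uncentered R-squared
lm_aux_check = ss_total - ss_residual   # N minus residual sum of squares
print(f"LM aux = {lm_aux:.6f}, check = {lm_aux_check:.6f}")
```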
.
. * (5) DISPLAY RESULTS
.
. estimates table unrestricted restrictedB, se stats(N ll r2) b(%8.3f)
----------------------------------------
    Variable |  unrest~d    restri~B
-------------+--------------------------
          x2 |    -0.028      -0.004
             |     0.077       0.076
          x3 |     0.163
             |     0.067
          x4 |     0.103
             |     0.080
       _cons |    -0.165      -0.168
             |     0.077       0.077
-------------+--------------------------
           N |   200.000     200.000
          ll |  -238.772    -242.923
          r2 |
----------------------------------------
                            legend: b/se
. * Wald test using stata default Poisson variance matrix
. di "WaldB " WaldB " p-value " chi2tail(2,WaldB)
WaldB 8.5701855 p-value .01377234
. * LR test using Poisson log-likelihoods
. di " LRB " LRB " p-value " chi2tail(2,LRB)
LRB 8.3023503 p-value .0157459
. * LM test direct
. di " LMdirectB " LMdirectB " p-value " chi2tail(2,LMdirectB)
LMdirectB 8.5750886 p-value .01373862
. * LM test by auxiliary regression
. di " LMauxB " LMauxB " p-value " chi2tail(2,LMauxB)
LMauxB 9.1857773 p-value .01012357
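The four p-values for hypothesis (B) can be verified without Stata, because the chi-square upper tail has a closed form for 1 and 2 degrees of freedom. The Python function below borrows the name `chi2tail` for readability only; it is a sketch that covers just these two cases and is not Stata's implementation.

```python
import math

def chi2tail(df, x):
    """Upper-tail chi-square probability for df = 1 or 2 (closed forms)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch covers only df = 1 or df = 2")

# Test statistics for hypothesis (B), copied from the output above
stats_b = {
    "WaldB": 8.5701855,
    "LRB": 8.3023503,
    "LMdirectB": 8.5750886,
    "LMauxB": 9.1857773,
}
p_values = {name: chi2tail(2, value) for name, value in stats_b.items()}
for name, p in p_values.items():
    print(f"{name:10s} p-value {p:.8f}")
```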
.
. ****** (A) TEST H0: b3 = 0
.
. * (1) Wald test
. quietly poisson y x2 x3 x4
. test (x3=0)
( 1) [y]x3 = 0
chi2( 1) = 5.90
Prob > chi2 = 0.0151
. scalar WaldA = r(chi2)
.
. * (2) LR test
. poisson y x2 x4
Iteration 0:   log likelihood = -241.64842
Iteration 1:   log likelihood = -241.64842
Poisson regression                                Number of obs   =        200
                                                  LR chi2(2)      =       2.55
                                                  Prob > chi2     =     0.2793
Log likelihood = -241.64842                       Pseudo R2       =     0.0053
------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x2 |  -.0163179   .0770381    -0.21   0.832    -.1673098     .134674
          x4 |   .1278017   .0800348     1.60   0.110    -.0290637     .284667
       _cons |  -.1719505   .0772389    -2.23   0.026    -.3233359   -.0205651
------------------------------------------------------------------------------
. estimates store restrictedA
. lrtest unrestricted          /* Uses estimates store unrestricted from earlier */
likelihood-ratio test                                  LR chi2(1)  =      5.75
(Assumption: restrictedA nested in unrestricted)       Prob > chi2 =    0.0165
. scalar LRA = r(chi2)


.
. * (3) LM test via direct computation requires estimating only the restricted model.
. * See (B) for more explanation
. drop one yhatrest u trx1 trx2 trx3 trx4
. matrix drop dlnL_db Vb LMdirect
. quietly poisson y x2 x4
. predict yhatrest
(option n assumed; predicted number of events)
. gen u = y - yhatrest        /* yhatrest = exp(x_brest) calculated earlier */

. gen one = 1
. matrix vecaccum dlnL_db = u one x2 x3 x4, noconstant
. gen trx1 = sqrt(yhatrest)
. gen trx2 = sqrt(yhatrest)*x2
. gen trx3 = sqrt(yhatrest)*x3
. gen trx4 = sqrt(yhatrest)*x4
. matrix accum Vb = trx1 trx2 trx3 trx4, noconstant
(obs=200)
. matrix LMdirect = dlnL_db*syminv(Vb)*dlnL_db'
. matrix list dlnL_db
dlnL_db[1,4]
          one          x2          x3          x4
u  -1.788e-07  -1.717e-07   34.832631  -3.179e-07
. matrix list Vb
symmetric Vb[4,4]
            trx1        trx2        trx3        trx4
trx1         169
trx2  -2.1828435   170.25918
trx3  -21.987555   15.647287    212.5673
trx4   14.371941    16.35821   22.067372   158.94405
. matrix list LMdirect
symmetric LMdirect[1,1]
           u
u  5.9159017
. scalar LMdirectA = LMdirect[1,1]
.
. * (4) LM test via auxiliary regression
. * See (B) for more explanation
. drop yhatrest s1 s2 s3 s4 one
. quietly poisson y x2 x4
. predict yhatrest
(option n assumed; predicted number of events)
. gen s1 = (y-yhatrest)*1
. gen s2 = (y-yhatrest)*x2
. gen s3 = (y-yhatrest)*x3
. gen s4 = (y-yhatrest)*x4
. gen one = 1
. regress one s1 s2 s3 s4, noconstant
      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  4,   196) =    1.57
       Model |  6.21794802     4    1.554487           Prob > F      =  0.1832
    Residual |  193.782052   196  .988683939           R-squared     =  0.0311
-------------+------------------------------           Adj R-squared =  0.0113
       Total |         200   200           1           Root MSE      =  .99433
------------------------------------------------------------------------------
         one |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          s1 |   -.021781   .0760166    -0.29   0.775    -.1716964    .1281344
          s2 |   .0237921    .082791     0.29   0.774    -.1394834    .1870675
          s3 |   .1785093   .0711813     2.51   0.013     .0381297    .3188889
          s4 |  -.0065009    .084884    -0.08   0.939    -.1739042    .1609024
------------------------------------------------------------------------------
. * LM equals N times uncentered Rsq
. scalar LMauxA = e(N)*e(r2)
. di "LMauxA " LMauxA
LMauxA 6.217948
.
. * (5) DISPLAY RESULTS in Table 7.1 page 242
.
. estimates table unrestricted restrictedA, se stats(N ll r2) b(%8.3f)
----------------------------------------
    Variable |  unrest~d    restri~A
-------------+--------------------------
          x2 |    -0.028      -0.016
             |     0.077       0.077
          x3 |     0.163
             |     0.067
          x4 |     0.103       0.128
             |     0.080       0.080
       _cons |    -0.165      -0.172
             |     0.077       0.077
-------------+--------------------------
           N |   200.000     200.000
          ll |  -238.772    -241.648
          r2 |
----------------------------------------
                            legend: b/se
. di "WaldA " WaldA " p-value " chi2tail(1,WaldA)
WaldA 5.9040087 p-value .01510647
. di " LRA " LRA " p-value " chi2tail(1,LRA)
LRA 5.7537678 p-value .01645333
. di " LMdirectA " LMdirectA " p-value " chi2tail(1,LMdirectA)
LMdirectA 5.9159017 p-value .01500482

. di " LMauxA " LMauxA " p-value " chi2tail(1,LMauxA)


LMauxA 6.217948 p-value .01264616
.
. ****** (C) TEST H0: b3 = b4
.
. * (1A) Wald test
. poisson y x2 x3 x4
Iteration 0:   log likelihood = -238.77153
Iteration 1:   log likelihood = -238.77153
Poisson regression                                Number of obs   =        200
                                                  LR chi2(3)      =       8.30
                                                  Prob > chi2     =     0.0401
Log likelihood = -238.77153                       Pseudo R2       =     0.0171
------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x2 |  -.0275702   .0767909    -0.36   0.720    -.1780775    .1229371
          x3 |   .1630037   .0670848     2.43   0.015     .0315199    .2944874
          x4 |   .1026568   .0802139     1.28   0.201    -.0545595    .2598732
       _cons |  -.1653238   .0773479    -2.14   0.033     -.316923   -.0137246
------------------------------------------------------------------------------
. test (x3=x4)
( 1) [y]x3 - [y]x4 = 0
chi2( 1) = 0.29
Prob > chi2 = 0.5883
.
. * (1B) Wald test done manually
. * Note that Stata puts the intercept last, not first.
. * So here the second and third elements of b are tested as equal.
. matrix drop h R Wald
. matrix bfull = e(b)                       /* 1xq row vector */
. matrix vfull = e(V)                       /* qxq matrix */
. matrix h = (bfull[1,2]-bfull[1,3])        /* hx1 vector */
. matrix R = (0,1,-1,0)                     /* h x q matrix */
. matrix Wald = h'*syminv(R*vfull*R')*h     /* scalar */


. matrix list h

symmetric h[1,1]
c1
r1 .06034684
. matrix list R
R[1,4]
c1 c2 c3 c4
r1 0 1 -1 0
. matrix list Wald
symmetric Wald[1,1]
c1
c1 .29301766
. scalar WaldC = Wald[1,1]
. di " WaldC " WaldC " p-value " chi2tail(1,WaldC)
WaldC .29301766 p-value .5882932
.
. * (2) LR Test
. * In general getting the restricted MLE requires constrained ML
. * Here simple as if b3=b4 then mean is exp(b1+b2*x2+b3*(x3+x4))
. gen x3plusx4 = x3+x4
. poisson y x2 x3plusx4
Iteration 0:   log likelihood = -238.91785
Iteration 1:   log likelihood = -238.91785
Poisson regression                                Number of obs   =        200
                                                  LR chi2(2)      =       8.01
                                                  Prob > chi2     =     0.0182
Log likelihood = -238.91785                       Pseudo R2       =     0.0165
------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x2 |  -.0287235   .0768651    -0.37   0.709    -.1793763    .1219293
    x3plusx4 |   .1374814   .0479519     2.87   0.004     .0434974    .2314653
       _cons |  -.1672262   .0773265    -2.16   0.031    -.3187832   -.0156691
------------------------------------------------------------------------------
. estimates store restrictedC
. lrtest unrestricted          /* Uses estimates store unrestricted from earlier */
likelihood-ratio test                                  LR chi2(1)  =      0.29
(Assumption: restrictedC nested in unrestricted)       Prob > chi2 =    0.5885
. scalar LRC = r(chi2)


.
. * (3) LM test direct
. * Can use same code as earlier. Just different restricted estimates.
. * Now from poisson y x2 x3plusx4
. drop one yhatrest u trx1 trx2 trx3 trx4
. matrix drop dlnL_db Vb
. quietly poisson y x2 x3plusx4
. predict yhatrest
(option n assumed; predicted number of events)
. gen u = y - yhatrest        /* yhatrest = exp(x_brest) calculated earlier */

. gen one = 1
. matrix vecaccum dlnL_db = u one x2 x3 x4, noconstant
. gen trx1 = sqrt(yhatrest)
. gen trx2 = sqrt(yhatrest)*x2
. gen trx3 = sqrt(yhatrest)*x3
. gen trx4 = sqrt(yhatrest)*x4
. matrix accum Vb = trx1 trx2 trx3 trx4, noconstant
(obs=200)
. matrix LMdirect = dlnL_db*syminv(Vb)*dlnL_db'
. matrix list dlnL_db
dlnL_db[1,4]
          one          x2          x3          x4
u   8.345e-07  -3.601e-07   4.8459933  -4.8459932
. matrix list Vb
symmetric Vb[4,4]
            trx1        trx2        trx3        trx4
trx1         169
trx2  -2.1828442   171.13986
trx3   7.9990827   13.105974   225.99023
trx4   19.217934    15.11254   28.153892   161.75506
. matrix list LMdirect
symmetric LMdirect[1,1]
           u
u  .29306257
. scalar LMdirectC = LMdirect[1,1]
.
. * (4) LM test via auxiliary regression
. drop yhatrest s1 s2 s3 s4 one
. quietly poisson y x2 x3plusx4
. predict yhatrest
(option n assumed; predicted number of events)
. gen s1 = (y-yhatrest)*1
. gen s2 = (y-yhatrest)*x2
. gen s3 = (y-yhatrest)*x3
. gen s4 = (y-yhatrest)*x4
. gen one = 1
. regress one s1 s2 s3 s4, noconstant
      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  4,   196) =    0.08
       Model |   .31510777     4  .078776943           Prob > F      =  0.9891
    Residual |  199.684892   196  1.01880047           R-squared     =  0.0016
-------------+------------------------------           Adj R-squared = -0.0188
       Total |         200   200           1           Root MSE      =  1.0094
------------------------------------------------------------------------------
         one |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          s1 |   -.000531    .077731    -0.01   0.995    -.1538275    .1527654
          s2 |    .012802   .0857027     0.15   0.881    -.1562159    .1818199
          s3 |   .0283145   .0761713     0.37   0.711     -.121906    .1785351
          s4 |  -.0367099   .0869889    -0.42   0.673    -.2082642    .1348445
------------------------------------------------------------------------------
. * LM equals N times uncentered Rsq
. scalar LMauxC = e(N)*e(r2)
. di "LMauxC " LMauxC
LMauxC .31510777


.
. * (5) DISPLAY RESULTS in Table 7.1 page 242
.
. estimates table unrestricted restrictedC, se stats(N ll r2) b(%8.3f)
----------------------------------------
    Variable |  unrest~d    restri~C
-------------+--------------------------
          x2 |    -0.028      -0.029
             |     0.077       0.077
          x3 |     0.163
             |     0.067
          x4 |     0.103
             |     0.080
    x3plusx4 |                 0.137
             |                 0.048
       _cons |    -0.165      -0.167
             |     0.077       0.077
-------------+--------------------------
           N |   200.000     200.000
          ll |  -238.772    -238.918
          r2 |
----------------------------------------
                            legend: b/se
. di "WaldC " WaldC " p-value " chi2tail(1,WaldC)
WaldC .29301766 p-value .5882932
. di " LRC " LRC " p-value " chi2tail(1,LRC)
LRC .29264001 p-value .5885337
. di " LMdirectC " LMdirectC " p-value " chi2tail(1,LMdirectC)
LMdirectC .29306257 p-value .58826462
. di " LMauxC " LMauxC " p-value " chi2tail(1,LMauxC)
LMauxC .31510777 p-value .57456264
.
. ****** (D) TEST H0: b3/b4 - 1 = 0
.
. * (1) Wald test of b3 /b4 - 1 = 0
. * Stata's test command does not do nonlinear hypotheses (testnl does).
. * Instead do 7.2.5 algebra.
. matrix drop h R Wald
. matrix h = (bfull[1,2]/bfull[1,3] - 1)
. matrix R = (0, 1/bfull[1,3], -bfull[1,2]/(bfull[1,3]^2), 0)
. matrix Wald = h'*syminv(R*vfull*R')*h

. matrix list h
symmetric h[1,1]
           c1
r1  .58785028
. matrix list R
R[1,4]
            c1          c2          c3          c4
r1           0   9.7411946  -15.467559           0
. matrix list Wald
symmetric Wald[1,1]
           c1
c1  .15768686
. scalar WaldD = Wald[1,1]
. di " WaldD " WaldD " p-value " chi2tail(1,WaldD)
WaldD .15768686 p-value .69129516
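The entries of h and R for the nonlinear restriction come from the delta method: R is the gradient of h(b) = b3/b4 - 1 with respect to the coefficient vector. The Python sketch below recomputes them from the unrestricted estimates of b3 and b4 listed in (B). It stops short of the Wald statistic itself, because the full matrix vfull is not printed in this log; the variable names are illustrative.

```python
# Unrestricted estimates of b3 and b4, copied from "matrix list h" in (B)
b3 = 0.16300365
b4 = 0.10265681

# Restriction h(b) = b3/b4 - 1
h = b3 / b4 - 1.0

# Gradient of h with respect to (b2, b3, b4, b1), Stata's coefficient order
# (slopes first, intercept last)
grad = [0.0, 1.0 / b4, -b3 / b4**2, 0.0]

print(f"h = {h:.6f}")
print("R =", [round(g, 5) for g in grad])
```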
.
. * (2) LR Test
. * This requires MLE subject to nonlinear constraints.
. * This is difficult so not done here.
. * But note that here will get same result as if
. * get MLE subject to b3 = b4 which was done in (C).
.
. * (3) LM test direct
. * Like (2) requires restricted MLE.
. * This is difficult so not done here.
. * But note that here will get same result as if
. * get MLE subject to b3 = b4 which was done in (C).
.
. * (4) LM test via auxiliary regression
. * Same as for (3)
.
. * (5) DISPLAY RESULTS
. di "WaldD " WaldD " p-value " chi2tail(1,WaldD)
WaldD .15768686 p-value .69129516
.
.
. *********** DISPLAY RESULTS GIVEN IN TABLE 7.1 on page 242 ***********
.
. estimates table unrestricted restrictedA restrictedB restrictedC, se stats(N ll r2) b(%8.3f)
--------------------------------------------------------------
    Variable |  unrest~d    restri~A    restri~B    restri~C
-------------+------------------------------------------------
          x2 |    -0.028      -0.016      -0.004      -0.029
             |     0.077       0.077       0.076       0.077
          x3 |     0.163
             |     0.067
          x4 |     0.103       0.128
             |     0.080       0.080
    x3plusx4 |                                         0.137
             |                                         0.048
       _cons |    -0.165      -0.172      -0.168      -0.167
             |     0.077       0.077       0.077       0.077
-------------+------------------------------------------------
           N |   200.000     200.000     200.000     200.000
          ll |  -238.772    -241.648    -242.923    -238.918
          r2 |
--------------------------------------------------------------
                                                  legend: b/se
. di "WaldA " WaldA " p-value " chi2tail(1,WaldA)
WaldA 5.9040087 p-value .01510647
.
. * Wald test statistics
. di "Wald A to D: (A) " %8.3f WaldA " (B) " %8.3f WaldB " (C) " %8.3f WaldC " (D) " %8.3f WaldD
Wald A to D: (A) 5.904 (B) 8.570 (C) 0.293 (D) 0.158
. di " p-values : (A) " %8.3f chi2tail(1,WaldA) " (B) " %8.3f chi2tail(2,WaldB) " (C) " %8.3f chi2t
> ail(1,WaldC) " (D) " %8.3f chi2tail(1,WaldD)
p-values : (A) 0.015 (B) 0.014 (C) 0.588 (D) 0.691
.
. * LR test statistics
. di "LR A to D: (A) " %8.3f LRA " (B) " %8.3f LRB " (C) " %8.3f LRC " (D) " %8.3f LRC
LR A to D: (A) 5.754 (B) 8.302 (C) 0.293 (D) 0.293
. di " p-values : (A) " %8.3f chi2tail(1,LRA) " (B) " %8.3f chi2tail(2,LRB) " (C) " %8.3f chi2tail(
> 1,LRC) " (D) " %8.3f chi2tail(1,LRC)
p-values : (A) 0.016 (B) 0.016 (C) 0.589 (D) 0.589
.
. * Direct LM test statistics
. di "LM A to D: (A) " %8.3f LMdirectA " (B) " %8.3f LMdirectB " (C) " %8.3f LMdirectC " (D) " %8.3f LMdirectC
LM A to D: (A) 5.916 (B) 8.575 (C) 0.293 (D) 0.293
. di " p-values: (A) " %8.3f chi2tail(1,LMdirectA) " (B) " %8.3f chi2tail(2,LMdirectB) " (C) " %8.
> 3f chi2tail(1,LMdirectC) " (D) " %8.3f chi2tail(1,LMdirectC)
p-values: (A) 0.015 (B) 0.014 (C) 0.588 (D) 0.588


.
. * Auxiliary Regression LM test statistics
. di "LM* A to D: (A) " %8.3f LMauxA " (B) " %8.3f LMauxB " (C) " %8.3f LMauxC " (D) " %8.3f LMauxC
LM* A to D: (A) 6.218 (B) 9.186 (C) 0.315 (D) 0.315
. di " p-values : (A) " %8.3f chi2tail(1,LMauxA) " (B) " %8.3f chi2tail(2,LMauxB) " (C) " %8.3f chi2tail(1,LMauxC) " (D) " %8.3f chi2tail(1,LMauxC)
p-values : (A) 0.013 (B) 0.010 (C) 0.575 (D) 0.575
.
. ********** CLOSE OUTPUT ***********
. log close
log: c:\Imbook\bwebpage\Section2\mma07p1mltests.txt
log type: text
closed on: 17 May 2005, 13:59:21
------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section2\mma07p2power.txt
log type: text
opened on: 17 May 2005, 14:00:49
.
. ********** OVERVIEW OF MMA07P2POWER.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 7.6.3 pages 248-9
. * Asymptotic Power of Wald test
.
. * (1) Chapter 7.6.3 obtains power for noncentral chisquare
. * (2) Figure 7.1 (ch7power.wmf) plots against the noncentrality parameter lamda
. * No data needed
.
. ********** SETUP **********
.
. set more off
. version 8.0
. set scheme s1mono /* Graphics scheme */
.
. ********** ANALYSIS **********
.
. * Obtain power of chi-square tests
. * with df degrees of freedom
. * and noncentrality parameter (ncp) lamda from 0 to 20
. * for size alpha = 0.01, 0.05 and 0.10
.
. set obs 201
obs was 0, now 201
. scalar df = 1                /* Degrees of freedom */
. gen lamda = 0.1*(_n-1)       /* Lamda = 0, 0.1, 0.2, ..., 19.9, 20.0 */


.
. * Obtain power
. * = Pr[W > critical value at size alpha | W ~ noncentral chi-square(df, lamda)]
. * for alpha = 0.01, 0.05 and 0.10
.
. * Critical value at size alpha uses central chisquare
. * invchi2tail gives cv such that Pr(Chi2 > cv) = alpha
. * Power is 1 minus cdf of noncentral chisquare
. * nchi2 gives the cdf of noncentral chisquare
.
. scalar alpha = 0.01
. scalar criticalvalue = invchi2tail(df,alpha)
. gen power01 = 1-nchi2(df,lamda,criticalvalue)
.
. scalar alpha = 0.05
. scalar criticalvalue = invchi2tail(df,alpha)
. gen power05 = 1-nchi2(df,lamda,criticalvalue)
.
. scalar alpha = 0.10
. scalar criticalvalue = invchi2tail(df,alpha)
. gen power10 = 1-nchi2(df,lamda,criticalvalue)
.
. sum
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       lamda |       201          10    5.816786          0         20
     power01 |       201    .6230651    .3095508        .01   .9710402
     power05 |       201    .7583101    .2717153        .05   .9940005
     power10 |       201    .8152767    .2396043         .1   .9976528

. * For lamda = 0 have size = power, here 0.01, 0.05 and 0.10
. list if lamda==0 | lamda==5 | lamda==10 | lamda==20
     +----------------------------------------+
     | lamda    power01    power05    power10 |
     |----------------------------------------|
  1. |     0        .01        .05         .1 |
 51. |     5   .3670189   .6087795   .7228636 |
101. |    10   .7212129   .8853791   .9354209 |
201. |    20   .9710402   .9940005   .9976528 |
     +----------------------------------------+
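For df = 1 the power calculation above can be reproduced with normal probabilities alone, since a noncentral chi-square(1) variable with noncentrality lamda is the square of a N[sqrt(lamda),1] variable. The Python sketch below relies on that identity; the function name `wald_power_df1` is illustrative, and the sketch does not generalize to df > 1.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1

def wald_power_df1(lamda, alpha):
    """Power of a size-alpha chi-square(1) test at noncentrality lamda.

    With 1 degree of freedom, W = (Z + sqrt(lamda))^2 for Z ~ N[0,1], so
    Pr[W > cv] = Pr[Z > c - d] + Pr[Z < -c - d]
    where c = sqrt(cv) and d = sqrt(lamda).
    """
    c = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)  # sqrt of central chi2(1) critical value
    d = math.sqrt(lamda)
    return (1.0 - STD_NORMAL.cdf(c - d)) + STD_NORMAL.cdf(-c - d)

# Same rows as the Stata listing: lamda = 0, 5, 10, 20
for lam in (0, 5, 10, 20):
    row = [wald_power_df1(lam, a) for a in (0.01, 0.05, 0.10)]
    print(lam, [round(p, 4) for p in row])
```

At lamda = 0 the function returns alpha itself, which is the statement in the log that size equals power when the noncentrality parameter is zero.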
.
. ********** FIGURE 7.1 (p.249): PLOT THE POWER FUNCTION **********
.
. graph twoway (line power10 lamda, clstyle(p1)) /*
> */ (line power05 lamda, clstyle(p2)) /*
> */ (line power01 lamda, clstyle(p3)), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Test Power as a function of the ncp") /*
> */ xtitle("Noncentrality parameter lamda", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Test Power", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(3) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Test size = 0.10") label(2 "Test size = 0.05") /*
> */ label(3 "Test size = 0.01"))
. graph export ch7power.wmf, replace
(file c:\Imbook\bwebpage\Section2\ch7power.wmf written in Windows Metafile format)
.
. ********** CLOSE OUTPUT **********
. log close
log: c:\Imbook\bwebpage\Section2\mma07p2power.txt
log type: text
closed on: 17 May 2005, 14:00:52
------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section2\mma07p3montecarlo.txt
log type: text
opened on: 18 May 2005, 11:28:58
.
. ********** OVERVIEW OF MMA07P3MONTECARLO.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 7.7.1-7.7.5 pp. 250-4

. * Size and power of the Wald test


.
. * (1) Figure 7.2 Density of Wald test statistic
. * (2) Table 7.2 Actual size of Wald test at various nominal sizes
. * (3) Table 7.2 Actual power of Wald test at various nominal sizes
. * (4) Table 7.2 Nominal power of Wald test at various nominal sizes
. * (5) Alternative way to simulate using postfile rather than simulate
.
. * on the slope coefficient for a Probit model with simulated data (see below).
.
. * NOTE: Because this is a simulation using many samples (here 10,000)
. * the generated data are not saved in a text file.
.
. * Problem can arise if in one of the simulations all of sample is y=0 or y=1
. * Then the probit model is not estimable.
. * Then need increase sample size, change dgp or reduce number of simulations.
. * Here used N=40 with S=10000 for size and for power
. * Another possible change is to have same regressors x across simulations
.
. ********** SETUP **********
.
. set more off
. version 8.0
. set scheme s1mono /* Graphics scheme */
.
. ********** MONTE CARLO OVERVIEW **********
.
. * The data generating process is
. * - Probit with Pr[y=1] = Phi(b1 + b2*x2)
. * - where b1 = 0 and b2 = 1
. * - and regressor x ~ N[0,1] is redrawn in each simulation
.
. * The sample size N set below in the global numobs
. * The number of simulations S is set below in the global numsims
. * A third option is to switch to same x in each sample. This needs to be done manually.
.
. * The simulation is done using stata command simulate
. * At the end of the program, an alternative using postfile is given
.
. * The program investigates both size and power
. * of the Wald test that b2 = 1.
. * For power the dgp instead uses b2 = 2.
.
. ********** INITIAL SIMULATION SET UP **********
.
. set seed 10101
. * Change the following for different sample size N

. global numobs "40"


. * Change the following for different number of simulations S
. global numsims "10000"
.
. ****** ANALYSIS: SIMULATION OF PROBIT MODEL SLOPE ESTIMATES AND WALD TEST
.
. * The program is rclass.
. * This means the results returned by the program are put into r( )
. * Here we return ymean, yvar, b2hat, seb2hat, ztestforb2eq1
.
. * The probit model is Pr[y=1] = Phi(b1 + b2*x2) where b1=0 and b2=1
. * For size calculations: b2 = 1
. * For power calculations: b2 = 1.5 (as an example)
. * So pass trueb2 as an argument.
.
. * The following three lines are only needed
. * if the regressors are constant across simulations,
. * as then need to generate once and put in a data file to be reused.
. * They are commented out here as here (x,y) both resampled.
. * Also simprobit and simprobit2 need one line changed if x is fixed.
. /*
> set obs numobs
> gen x = invnorm(uniform())
> save xforsim, replace
> */
. * This version of the program instead redraws both x and y in each simulation
.
. * The program has one argument
. * - trueb2 = value of b2 in the dgp
.
. program simprobit, rclass
1. version 8.0
2. /* define arguments. Here trueb2 = b2 in Phi(b1 + b2*x2) */
. args trueb2
3. /* Generate the data: here x and y */
. drop _all
4. set obs $numobs
5. gen x = invnorm(uniform())
6. /* If instead want same x in each simulation,
> replace above line with: use xforsim */
. gen y = 0
7. replace y = 1 if 0 + `trueb2'*x + invnorm(uniform()) > 0
8. /* Summarize the generated data as a check */
. summarize y
9. return scalar ymean=r(mean)
10. return scalar yvar=r(Var)
11. /* Do probit and store key results */
. probit y x
12. return scalar b2hat=_b[x]
13. return scalar seb2hat = _se[x]
14. return scalar ztestforb2eq1 = (_b[x]-1)/_se[x]
15. end
.
. ****** (1) DISTRIBUTION OF WALD TEST STATISTIC (Figure 7.2 p.253)
.
. * Now call the program simprobit where
. * - include values for each argument within the quotes " "
. * (here the argument is trueb2 and is set to 1 for size and 1.5 for power)
. * - make sure that ask for each of the returned results
.
. * For size calculations set trueb2 = 1
. simulate "simprobit 1" ymean=r(ymean) yvar=r(yvar) b2hat=r(b2hat) /*
> */ seb2hat=r(seb2hat) ztestforb2eq1=r(ztestforb2eq1), reps($numsims)
      command:  simprobit 1
   statistics:  ymean      = r(ymean)
                yvar       = r(yvar)
                b2hat      = r(b2hat)
                seb2hat    = r(seb2hat)
                ztestfor~1 = r(ztestforb2eq1)
.
. * Summary of the results returned by simulate
. * For Wald test key output is ztestforb2eq1
. describe
Contains data
  obs:        10,000                          simulate: simprobit 1
 vars:             5                          18 May 2005 11:29
 size:       240,000 (97.7% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
ymean           float  %9.0g                  r(ymean)
yvar            float  %9.0g                  r(yvar)
b2hat           float  %9.0g                  r(b2hat)
seb2hat         float  %9.0g                  r(seb2hat)
ztestforb2eq1   float  %9.0g                  r(ztestforb2eq1)
-------------------------------------------------------------------------------
Sorted by:
. summarize
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       ymean |     10000      .49946    .0794447       .225       .775
        yvar |     10000    .2499373    .0089917   .1788462   .2564103
       b2hat |     10000    1.133952    .4516738  -.0306482   9.389184
     seb2hat |     10000    .3589645    .1561059   .1902922   4.583915
ztestforb2~1 |     10000    .1141294    .9558451  -4.087344   2.278257
.
. * For b2hat there are two ways to estimate the standard deviation.
. * One is the average of seb2hat, the standard error of b2hat
. * The other is the standard deviation of b2hat.
. * These are equal asymptotically, but perhaps not in small samples due to bias.
. * Also aveseb2hat is used later in calculating asymptotic power.
. sum seb2hat
    Variable |     Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
     seb2hat |   10000    .3589645   .1561059   .1902922   4.583915
. scalar aveseb2hat = r(mean)
. sum b2hat
    Variable |     Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
       b2hat |   10000    1.133952   .4516738  -.0306482   9.389184
. scalar stdevb2hat = r(sd)
. di "Average standard error of b2hat: " aveseb2hat
Average standard error of b2hat: .3589645
. di "Standard deviation of b2hat:      " stdevb2hat
Standard deviation of b2hat:      .45167383
.
. * The Wald test statistic will be called Wald
. gen Wald = ztestforb2eq1
. label var Wald "Wald test statistic"
.
. * The mean and st.dev. should be 0 and 1 if Wald ~ N[0,1]
. sum Wald
    Variable |     Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
        Wald |   10000    .1141294   .9558451  -4.087344   2.278257
.
. * The 2.5 and 97.5 percentiles should be -1.96 and 1.96 if Wald ~ N[0,1]
. * They can be used to get size-adjusted Wald test at 5 percent.
. _pctile Wald, p(2.5,99.5)

. display "Wald: Lower 2.5 percentile = " r(r1) " Upper 2.5 percentile = " r(r2)
Wald: Lower 2.5 percentile = -1.904708 Upper 2.5 percentile = 2.0034728
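Note that the `_pctile` call above requests the 2.5th and 99.5th percentiles, so the second value displayed is the 99.5th percentile even though the label reads "Upper 2.5 percentile". A minimal sketch of the call that matches the accompanying comment, assuming the intent was the 97.5th percentile. The data here are hypothetical N[0,1] draws standing in for the simulated Wald statistics, so the numbers will not reproduce the log:

```stata
* Hypothetical stand-in for the simulated Wald statistics: 10,000 N[0,1] draws
clear
set seed 10101
set obs 10000
gen Wald = invnorm(uniform())

* Percentiles are given on a 0-100 scale, in ascending order
_pctile Wald, p(2.5,97.5)
display "Wald: 2.5th percentile = " r(r1) "   97.5th percentile = " r(r2)
```

With the simulated statistics from `simulate` in memory, only the `_pctile` and `display` lines would be needed.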
.
. * The density of the simulated values of the Wald test should be
. * a standard normal density if Wald ~ N[0,1]
. * The following plots kernel estimate of density of Wald and a N[0,1] density
. * Could also do Student[N-k] but this looks same as N[0,1] if N>=30.
. gen N01density = normden(Wald)
. sum Wald
Variable |
Obs
Mean Std. Dev.
Min
Max
-------------+-------------------------------------------------------Wald | 10000 .1141294 .9558451 -4.087344 2.278257
.
. graph twoway (kdensity Wald, range(-3 3) clstyle(p1)) /*
> */ (connect N01density Wald if Wald>-3 & Wald<3, clstyle(p2) sort(Wald) s(i)), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Monte Carlo Simulations of Wald Test") /*
> */ xtitle("Wald Test Statistic", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Density", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(11) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Monte Carlo") label(2 "Standard Normal") /*
> */
label(3 "Test size = 0.01"))
. graph export ch7montecarlo.wmf, replace
(file c:\Imbook\bwebpage\Section2\ch7montecarlo.wmf written in Windows Metafile format)
.
. ****** (2) ACTUAL SIZE OF THE WALD TEST STATISTIC (Table 7.2, p.253)
.
. * Obtain the size properties of a two-sided Wald test
. * That rejects if |Wald| > z_alpha/2 where alpha = .01, .05, .1, .2
.
. * Convert to two-sided test by taking absolute value
. gen absWald = abs(Wald)
.
. * Give key percentiles of |Wald|
. * Percentiles must be in ascending order for Stata
. _pctile absWald, p(0.80,0.90,0.95,0.99)
. display "I[Upper percentiles of |Wald|: " " 1 " r(r4) " 5 " r(r3) " 10 " r(r2) " 20 " r(r1)
I[Upper percentiles of |Wald|: 1 .0115847 5 .01074749 10 .00998338 20 .00923005
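The `percentiles()` option of `_pctile` is on a 0-100 scale, so `p(0.80,0.90,0.95,0.99)` above asks for the 0.8th to 0.99th percentiles of |Wald|, which is why the displayed values are close to 0.01 rather than to the usual critical values. A sketch of the intended comparison, assuming the upper 20, 10, 5 and 1 percent points were wanted. The data are hypothetical N[0,1] draws, not the simulation output:

```stata
* Hypothetical stand-in for |Wald|: absolute values of 10,000 N[0,1] draws
clear
set seed 10101
set obs 10000
gen absWald = abs(invnorm(uniform()))

* Upper 20, 10, 5 and 1 percent points of |Wald| (0-100 scale, ascending)
_pctile absWald, p(80,90,95,99)
display "Simulated: 1% " r(r4) "  5% " r(r3) "  10% " r(r2) "  20% " r(r1)

* Two-sided N[0,1] critical values for comparison
display "N[0,1]:    1% " invnorm(.995) "  5% " invnorm(.975) "  10% " invnorm(.95) "  20% " invnorm(.90)
```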
.
. * Program to calculate actual size given nominal size
. * Temporary variables and scalars are in quotes ` '
. program size, rclass
  1.   version 8.0
  2.   args nominalsize
  3.   tempvar reject
  4.   tempname normalcriticalvalue
  5.   quietly {
  6.     scalar `normalcriticalvalue' = invnorm(1-(`nominalsize'/2))
  7.     gen `reject' = 0
  8.     replace `reject' = 1 if absWald > `normalcriticalvalue'
  9.     summarize `reject'
 10.     return scalar actualsize = r(mean)
 11.   }
 12. end
.
. * Calculate actual size for nominal sizes 0.01, 0.05, 0.10 and 0.20
. size 0.01
. scalar actualsize01 = r(actualsize)
. size 0.05
. scalar actualsize05 = r(actualsize)
. size 0.10
. scalar actualsize10 = r(actualsize)
. size 0.20
. scalar actualsize20 = r(actualsize)
.
. * Following gives Actual Size column of Table 7.2 (p.253)
. * Nominal Sizes and Actual Sizes of Two-sided Wald Test
. di "0.01: " actualsize01 _new "0.05: " actualsize05 _new /*
> */ "0.10: " actualsize10 _new "0.20: " actualsize20
0.01: .0053
0.05: .0294
0.10: .0805
0.20: .1922
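To judge whether the gap between actual and nominal size is more than simulation noise, one can compare it with the Monte Carlo standard error of a rejection frequency, sqrt(p(1-p)/S). A minimal sketch for the 0.05 row, using the values reported above; the scalar names are illustrative:

```stata
scalar S       = 10000      /* number of simulations */
scalar nominal = 0.05       /* nominal size */
scalar actual  = .0294      /* actual size reported above */

* Standard error of a rejection frequency if the true size equals the nominal size
scalar mcse = sqrt(nominal*(1-nominal)/S)
display "Monte Carlo s.e.:  " mcse
display "Standardized gap:  " (actual - nominal)/mcse
```

A standardized gap that is large in absolute value relative to 2 suggests the difference is not due to simulation error alone.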
.
. ****** (3) ACTUAL POWER OF THE WALD TEST STATISTIC (Table 7.2, p.253)
.
. * Consider power when b2 = 2 rather than 1
.
. * Obtain the actual power by simulation
. * Use the same program simprobit as for size,
. * except the argument b2true is 2.0 rather than 1.0
.
. drop _all
.
. * For power calculations set trueb2 = 2
. simulate "simprobit 2" ymean=r(ymean) yvar=r(yvar) b2hat=r(b2hat) /*
> */ seb2hat=r(seb2hat) ztestforb2eq1=r(ztestforb2eq1), reps(10000)
command:      simprobit 2
statistics:   ymean      = r(ymean)
              yvar       = r(yvar)
              b2hat      = r(b2hat)
              seb2hat    = r(seb2hat)
              ztestfor~1 = r(ztestforb2eq1)
.
. * Calculate |Wald|
. gen Wald = ztestforb2eq1
(71 missing values generated)
. gen absWald = abs(Wald)
(71 missing values generated)
.
. summarize
    Variable |     Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
       ymean |    9929    .4998389   .0791531       .225       .825
        yvar |    9929     .249985   .0090933   .1480769   .2564103
       b2hat |    9929    2.581075    2.73046   .8547966   209.9805
     seb2hat |    9929    1.002628   5.799384   .2816004   540.1536
ztestforb2~1 |    9929    1.667773   .3853416  -.4042006    2.59991
-------------+--------------------------------------------------------
        Wald |    9929    1.667773   .3853416  -.4042006    2.59991
     absWald |    9929    1.668285    .383118   .0033462    2.59991
.
. * Calculate actual power for nominal sizes 0.01, 0.05, 0.10 and 0.20
. * This can use the earlier program size
. size 0.01
. scalar actualpower01 = r(actualsize)
. size 0.05
. scalar actualpower05 = r(actualsize)
. size 0.10
. scalar actualpower10 = r(actualsize)
. size 0.20
. scalar actualpower20 = r(actualsize)


.
. * Following gives Actual Power column of Table 7.2 (p.253)
. * Nominal Sizes and Actual Power of Two-sided Wald Test
. di "0.01: " actualpower01 _new "0.05: " actualpower05 _new /*
> */ "0.10: " actualpower10 _new "0.20: " actualpower20
0.01: .0073
0.05: .2257
0.10: .6077
0.20: .8583
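One caution on the power figures above, offered as a reading of the code rather than a rerun: Stata treats a missing value as larger than any number, so `absWald > cv` is true for the 71 draws where the statistic is missing, and the `size` program would then count them as rejections out of all 10,000 rows. The sketch below uses four hypothetical values (one missing) to show the difference; the scalar names are illustrative.

```stata
* Four hypothetical |Wald| values, one of them missing
clear
input absWald
0.5
2.1
.
1.0
end

scalar cv = invnorm(1 - 0.05/2)

* Naive count: the missing row also satisfies absWald > cv
count if absWald > cv
scalar naive = r(N)

* Count restricted to nonmissing statistics
count if absWald > cv & absWald < .
scalar nrej = r(N)
count if absWald < .
scalar nvalid = r(N)

display "naive rejections: " naive "   valid rejections: " nrej " of " nvalid
```

If this applies to the run above, the 0.0073 reported at nominal size 0.01 would correspond to 73 rejections, 71 of them missing draws, leaving 2 rejections among the 9,929 valid draws.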
.
. ****** (4) ASYMPTOTIC POWER OF THE WALD TEST STATISTIC (Table 7.2, p.253)
.
. * Consider power when b2 = 2 rather than 1
.
. * Calculate asymptotic theoretical power using noncentral chisquare
. * Asymptotic power = Pr[W > chi-square(alpha) | W ~ noncentral chi-square(df,ncp)]
. * The noncentrality parameter is 0.5*(delta^2)/(se[b2]^2)
. * Here size has b2 = 1 and power has b2 = 1+delta
. * So delta = b2true - 1.
. * Need to find the standard error of b2.
. * Use the average from earlier simulations.
.
. * Program to calculate asymptotic power given nominal size
. * Temporary variables and scalars and arguments are in quotes ` '
. * invchi2tail gives cv such that Pr(Chi2 > cv) = nominalsize
. * Power is 1 minus cdf of noncentral chisquare
. * nchi2 gives the cdf of noncentral chisquare
.
. drop _all
.
. * Arguments are alpha (size), lamda and df (degrees of freedom)
. program power, rclass
  1.   version 8.0
  2.   args alpha lamda df
  3.   tempname criticalvalue powervianoncentralchi
  4.   quietly {
  5.     scalar `criticalvalue' = invchi2tail(`df',`alpha')
  6.     scalar `powervianoncentralchi' = 1-nchi2(`df',`lamda',`criticalvalue')
  7.     return scalar asymppower = `powervianoncentralchi'
  8.   }
  9. end
.
. * scalar criticalvalue = invchi2tail(df,alpha)
. * replace power = 1-nchi2(df,lamda,criticalvalue)
.
. * Calculate df and lamda.


. * This uses an estimate of se[beta] obtained earlier
. scalar delta = 1 /* Here 2 - 1. Changes for different alternatives */
. scalar lamda = 0.5*(delta*delta)/(aveseb2hat*aveseb2hat)
. scalar df = 1
. di "delta: " delta " aveseb2hat: " aveseb2hat " lamda: " lamda " df: " df
delta: 1 aveseb2hat: .3589645 lamda: 3.8803151 df: 1
.
. * Calculate asymptotic power for nominal sizes 0.01, 0.05, 0.10 and 0.20
. power 0.01 lamda df
. scalar asymppower01 = r(asymppower)
. power 0.05 lamda df
. scalar asymppower05 = r(asymppower)
. power 0.10 lamda df
. scalar asymppower10 = r(asymppower)
. power 0.20 lamda df
. scalar asymppower20 = r(asymppower)
.
. * Following gives Asymptotic Power column of Table 7.2 (p.253)
. * Nominal Sizes and Asymptotic Power of Two-sided Wald Test
. di "0.01: " asymppower01 _new "0.05: " asymppower05 _new /*
> */ "0.10: " asymppower10 _new "0.20: " asymppower20
0.01: .2722675
0.05: .50398701
0.10: .62755902
0.20: .75494224
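With one degree of freedom the noncentral chi-square probability can be cross-checked using only the normal cdf, since a noncentral chi-square(1) variable with noncentrality lamda is the square of a N[sqrt(lamda),1] variable. A minimal sketch for the 0.05 row, taking lamda as displayed above. `norm()` and `invnorm()` are the Stata 8 function names used elsewhere in this log (later releases call them `normal()` and `invnormal()`); the scalar names are illustrative.

```stata
scalar lamda = 3.8803151              /* value displayed in the log above */
scalar c     = invnorm(1 - 0.05/2)    /* two-sided 5 percent critical value */

* Pr[ |Z + sqrt(lamda)| > c ] for Z ~ N[0,1]
scalar power05 = 1 - (norm(c - sqrt(lamda)) - norm(-c - sqrt(lamda)))
display "Asymptotic power at 0.05 via normal cdf: " power05
```

The result should agree with the 0.05 row of the `power` program output up to rounding.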
.
. ****** (5) ALTERNATIVE ANALYSIS: SIMULATION METHOD USING POSTFILE
.
. * This is an alternative, given for completeness.
. * This fails if the model is not estimable in any of the simulation samples.
. * By contrast, simulate just drops that simulation sample and continues simulating.
.
. * For each round of the simulation, the results posted to `sim' are sent
. * as a new line to the Stata data set probitsimresults.
. * The names of these variables are listed after `sim' in the postfile command.
. * Need as many names in postfile as expressions in post.
. * Then can analyze these using summarize etcetera
.
. * This program has two arguments
. * - numsims = desired number of simulations
. * - trueb2 = slope coefficient used to generate the data
.
. drop _all
.
. program simprobit2
  1.   version 8.0
  2.   args numsims trueb2
  3.   tempname sim
  4.   postfile `sim' meany vary beta sterror ztestforbeta using probitsimresults, replace
  5.   quietly {
  6.     forvalues i = 1/`numsims' {
  7.       drop _all
  8.       set obs $numobs      /* may need to change */
  9.       gen x = invnorm(uniform())
 10.       /* If instead want same x in each simulation
>             replace above line with: use xforsim */
  .       gen y = 0
 11.       /* Use b2 = 1.0 for size and 2.0 for power */
  .       replace y = 1 if 0+`trueb2'*x+invnorm(uniform()) > 0
 12.       summarize y
 13.       scalar meany=r(mean)
 14.       scalar vary=r(Var)
 15.       probit y x
 16.       scalar beta=_b[x]
 17.       scalar sterror = _se[x]
 18.       scalar ztestforbeta = (beta-1)/sterror
 19.       post `sim' (meany) (vary) (beta) (sterror) (ztestforbeta)
 20.     }
 21.   }
 22.   postclose `sim'
 23. end
.
. simprobit2 $numsims 1
. use probitsimresults, clear
.
. * Here we just summarize results for comparison with earlier
. * But could do the further analysis as above
. sum
    Variable |     Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
       meany |   10000    .4989575   .0791248       .225       .775
        vary |   10000    .2499885   .0090127   .1788462   .2564103
        beta |   10000    1.135003   .4315248   .0901358   7.205799
     sterror |   10000    .3583266    .133302   .1863547   3.360862
ztestforbeta |   10000    .1218973    .954814  -3.401833   2.299991
.
. ********** CLOSE OUTPUT **********
. log close
log: c:\Imbook\bwebpage\Section2\mma07p3montecarlo.txt
log type: text
closed on: 18 May 2005, 11:29:29
-------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section2\mma07p4boot.txt
log type: text
opened on: 18 May 2005, 21:36:29
.
. ********** OVERVIEW OF MMA07P4BOOT.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 7.8 pages 254-256
. * Bootstrap applied to probit model
. * Provides
. * (1) Bootstrap confidence intervals
. * (2) Bootstrap hypothesis test without refinement
. * (3) Bootstrap hypothesis test with refinement: percentile-t method
.
. * Note corrections to book
. * - sample size is N=40 not N=30
. * - use 999 bootstrap replications not 1000
. * - for asymptotic refinement p.256 the critical region
. *   is (-1.89, 1.80) not (-2.62, 1.83)
.
. * For more detail on bootstrap see
. * Chapter 11: Bootstrap Methods pages 355-383
. * and program mma11p1boot.do
.
. ********** SETUP **********
.
. set more off
. version 8
.
. ********** GENERATE DATA **********
.
. * DGP is Probit: Pr[y=1] = PHI(a + bx)
. * where x is N[0,1]
. * and a = 0 and b = 1
.
. * Change the following for different sample size N
. global numobs "40"
.
. * Probit example with slope coefficient equal to 1
. set seed 10105
. set obs $numobs
obs was 0, now 40
. gen x = invnorm(uniform())
. gen y = 0
. replace y = 1 if 0+1.0*x+invnorm(uniform()) > 0
(19 real changes made)
. save xyforsim, replace
file xyforsim.dta saved
. summarize
    Variable |     Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
           x |      40   -.0359197   .9203391  -2.210579    1.45199
           y |      40        .475   .5057363          0          1
. probit y x
Iteration 0:   log likelihood = -27.675866
Iteration 1:   log likelihood = -22.927488
Iteration 2:   log likelihood = -22.735204
Iteration 3:   log likelihood = -22.733966
Iteration 4:   log likelihood = -22.733966

Probit estimates                                  Number of obs   =         40
                                                  LR chi2(1)      =       9.88
                                                  Prob > chi2     =     0.0017
Log likelihood = -22.733966                       Pseudo R2       =     0.1786

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .8168831   .2942893     2.78   0.006     .2400867    1.393679
       _cons |  -.0725436   .2162576    -0.34   0.737    -.4964006    .3513135
------------------------------------------------------------------------------
. save mma07p4boot, replace
file mma07p4boot.dta saved
.
. * Write data to a text (ascii) file so can use with programs other than Stata
. outfile y x using mma07p4boot.asc, replace
.
. ********** (1) BOOTSTRAP CONFIDENCE INTERVALS **********
.
. * Stata produces four bootstrap 100*(1-alpha) confidence intervals
. * (1)-(2) have no asymptotic refinement
. * (3)-(4) have asymptotic refinement
.
. * (1) Regular asymptotic normal: bhat +/- t(S-1)_alpha/2*se(bhat)
. * except instead of using the initial se(bhat)
. * we use the standard deviation of bhat from the bootstrap reps
. * and use t(S-1) rather than z for critical value
. * where S = number of bootstrap reps
.
. * (2) Percentile method: which orders the bhat(s) from simulations and
. * goes from alpha/2 lowest bhat(s) to the alpha/2 highest bhat(s)
. * where (s) denotes the s-th bootstrap sample
.
. * (3) Bias-corrected. Same as (4) with a=0
.
. * (4) Bias-corrected and accelerated.
. * This works with the pivotal Wald statistic.
. * See the manual [R]bootstrap or a textbook.
. * e.g. Efron and Tibshirani (1993, pp.184-188) with a=0
. * This orders the bhat(s) from simulations and
. * goes from the p1 quantile to the p2 quantile
. * where p1 and p2 are bias-correction adjustments to alpha/2 and 1-alpha/2
. * Let p1 = Phi(2z0 - z_alpha/2)
. *     p2 = Phi(2z0 + z_alpha/2)
. *     z0 measures the median bias in bhat with
. *     z0 = Phi-inv(fraction of the bhat(s) < bhat)
. * And if z0=0 then p1 = alpha/2 and no correction
.
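The bias-correction formulas for p1 and p2 in the comments above can be evaluated directly. A minimal sketch, assuming a hypothetical fraction of 0.40 of the bootstrap estimates lying below the original estimate; `frac` and the other scalar names are illustrative and not part of the original program. `norm()` and `invnorm()` are the Stata 8 names used in this log.

```stata
scalar frac = 0.40                   /* hypothetical fraction of bhat(s) < bhat */
scalar z0   = invnorm(frac)          /* median-bias measure */
scalar za   = invnorm(1 - 0.05/2)    /* z_alpha/2 for alpha = 0.05 */

* Adjusted lower and upper percentile levels
scalar p1 = norm(2*z0 - za)
scalar p2 = norm(2*z0 + za)
display "z0 = " z0 "   p1 = " p1 "   p2 = " p2

* With no median bias (frac = 0.5) the levels revert to alpha/2 and 1-alpha/2
display "no bias: p1 = " norm(2*invnorm(0.5) - za) "   p2 = " norm(2*invnorm(0.5) + za)
```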
. * Change the following for different number of simulations S
. * From page 399, for testing better to use 999 than 1000
. global breps "999" /* The number of bootstrap reps used below */
.
. * (1A) Simplest bootstrap is of all the estimated coefficients
. set seed 10105
. bootstrap "probit y x" _b, reps($breps) bca
command:      probit y x
statistics:   b_x        = _b[x]
              b_cons     = _b[_cons]

Bootstrap statistics                              Number of obs    =        40
                                                  Replications     =       999

------------------------------------------------------------------------------
Variable     |  Reps  Observed      Bias  Std. Err. [95% Conf. Interval]
-------------+----------------------------------------------------------------
         b_x |   999  .8168831  .1017329  .3763803   .0782956   1.555471   (N)
             |                                       .3495505   1.878616   (P)
             |                                       .2808956   1.600026  (BC)
             |                                       .1552112   1.480223 (BCa)
      b_cons |   999 -.0725436 -.0176301  .2448404  -.5530047   .4079175   (N)
             |                                       -.596443   .4247662   (P)
             |                                      -.5528302   .4381396  (BC)
             |                                      -.5205303   .4445401 (BCa)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected
       BCa = bias-corrected and accelerated
.
. * (1B) This bootstrap is of MLE of b2 and the associated standard error
. * and additionally gives the bias-accelerated method of Efron
. set seed 10105
. bootstrap "probit y x" _b[x] _se[x], reps($breps) bca
command:      probit y x
statistics:   _bs_1      = _b[x]
              _bs_2      = _se[x]

Bootstrap statistics                              Number of obs    =        40
                                                  Replications     =       999

------------------------------------------------------------------------------
Variable     |  Reps  Observed      Bias  Std. Err. [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |   999  .8168831  .1017329  .3763803   .0782956   1.555471   (N)
             |                                       .3495505   1.878616   (P)
             |                                       .2808956   1.600026  (BC)
             |                                       .1552112   1.480223 (BCa)
       _bs_2 |   999  .2942893  .0422005  .0932673   .1112667   .4773118   (N)
             |                                       .2323841   .5831083   (P)
             |                                       .2214397   .4475662  (BC)
             |                                       .2162534   .4143377 (BCa)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected
       BCa = bias-corrected and accelerated
.
. * (1C) This bootstrap repeats (1B)
. * but will permit bootstrapping if Stata commands are more than one line
. use mma07p4boot, clear
. program define commandtobootstrap, rclass
1. version 8.0
2. quietly probit y x
3. return scalar b2hat=_b[x]
4. return scalar seb2hat=_se[x]
5. end
. set seed 10105
. bootstrap "commandtobootstrap" r(b2hat) r(seb2hat), reps($breps)
command:      commandtobootstrap
statistics:   _bs_1      = r(b2hat)
              _bs_2      = r(seb2hat)

Bootstrap statistics                              Number of obs    =        40
                                                  Replications     =       999

------------------------------------------------------------------------------
Variable     |  Reps  Observed      Bias  Std. Err. [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |   999  .8168831  .1017329  .3763803   .0782956   1.555471   (N)
             |                                       .3495505   1.878616   (P)
             |                                       .2808956   1.600026  (BC)
       _bs_2 |   999  .2942893  .0422005  .0932673   .1112667   .4773118   (N)
             |                                       .2323841   .5831083   (P)
             |                                       .2214397   .4475662  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected
.
. ********** (2) BOOTSTRAP HYPOTHESIS TESTS - NO REFINEMENT p.255 **********
.
. * We want to test H0: b2 = 1 against Ha: b2 not equal 1
.
. * For a simple test such as this we can just use
. * the bootstrap confidence intervals from (1)
. * and reject if bhat2 is not in the confidence interval
.
. * Here we instead present a common method without refinement
. * essentially (1) above, performing the usual Wald test,
. * except the standard error is estimated by bootstrap.
. * This is useful when hard to obtain standard error by other means.
. * Here W = (b2hat - b2_0) / seb2hat_boot where b2_0 = 1


. * and reject at level .05 if |W| > z_.025 = 1.96
.
. use mma07p4boot, clear
. * Save the estimate
. quietly probit y x
. scalar b2est = _b[x]
. * Obtain the bootstrap standard error
. set seed 10105
. bootstrap "probit y x" _b, reps($breps) bca
command:      probit y x
statistics:   b_x        = _b[x]
              b_cons     = _b[_cons]

Bootstrap statistics                              Number of obs    =        40
                                                  Replications     =       999

------------------------------------------------------------------------------
Variable     |  Reps  Observed      Bias  Std. Err. [95% Conf. Interval]
-------------+----------------------------------------------------------------
         b_x |   999  .8168831  .1017329  .3763803   .0782956   1.555471   (N)
             |                                       .3495505   1.878616   (P)
             |                                       .2808956   1.600026  (BC)
             |                                       .1552112   1.480223 (BCa)
      b_cons |   999 -.0725436 -.0176301  .2448404  -.5530047   .4079175   (N)
             |                                       -.596443   .4247662   (P)
             |                                      -.5528302   .4381396  (BC)
             |                                      -.5205303   .4445401 (BCa)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected
       BCa = bias-corrected and accelerated
. matrix sebboot = e(se)
. scalar seb2boot = sebboot[1,1] /* x is first then constant */
. * Calculate the test statistic
. scalar Wald = (b2est - 1)/seb2boot
.
. * DISPLAY RESULTS at bottom p.255
. * Note: Text had typo:
. * (1-0.817)/0.376 = -0.487 should be (0.817-1)/0.376 = -0.487
.
. di "Probit slope estimate is:          " b2est
Probit slope estimate is:          .8168831
. di "Bootstrap standard estimate is: " seb2boot
Bootstrap standard estimate is: .37638029
. di "Wald statistic (no refinement) is: " Wald
Wald statistic (no refinement) is: -.48652096
. di "Reject at level .05 if |Wald| > 1.96"
Reject at level .05 if |Wald| > 1.96
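The unrefined test can be reproduced by hand from the two numbers displayed above, and a two-sided asymptotic p-value added. A minimal sketch; the p-value line is an addition for illustration and is not part of the original program. `norm()` is the Stata 8 name for the standard normal cdf.

```stata
scalar b2est    = .8168831     /* probit slope estimate from the log */
scalar seb2boot = .37638029    /* bootstrap standard error from the log */

* Wald statistic for H0: b2 = 1 and its two-sided N[0,1] p-value
scalar Wald   = (b2est - 1)/seb2boot
scalar pvalue = 2*(1 - norm(abs(Wald)))
display "Wald = " Wald "   two-sided p-value = " pvalue
```

Since |Wald| is well below 1.96, the p-value is far above 0.05 and H0 is not rejected.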
.
. ********** (3) BOOTSTRAP HYPOTHESIS TESTS - PERCENTILE-T p.256 **********
.
. * Stata does not give this. For methods see
. * e.g. Efron and Tibshirani (1993, pp.160-162)
. * e.g. Cameron and Trivedi (2005) Chapter 11.2.6-11.2.7
. * For sample s compute t-test(s) = (bhat(s)-bhat) / se(s)
. * where bhat is initial estimate
. * and bhat(s) and se(s) are for sth round.
. * Order the t-test(s) statistics and choose the alpha/2 percentiles
. * which give the critical values for the t-test
.
. * Implementation requires saving the results from each bootstrap replication
. * in order to obtain critical values from percentiles of bootstrap distribution
.
. * (3A) Here bootstrap computes (b(s) - bhat) / se(s) s = 1,...,S
.
. use mma07p4boot, clear
. * Save the estimate and the Wald test statistic
. quietly probit y x
. scalar b2est = _b[x]
. scalar Wald = (_b[x] - 1)/_se[x]
. * Then bootstrap calculates (b(s) - bhat) / se(s)
. set seed 10105
. bootstrap "probit y x" ((_b[x]-b2est)/_se[x]), reps($breps) /*
> */ level(95) saving(mma07p4bootreps) replace
command:      probit y x
statistic:    _bs_1      = (_b[x]-b2est)/_se[x]

Bootstrap statistics                              Number of obs    =        40
                                                  Replications     =       999

------------------------------------------------------------------------------
Variable     |  Reps  Observed      Bias  Std. Err. [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |   999         0  .1003619  .9350234  -1.834837   1.834837   (N)
             |                                      -1.890602   1.801358   (P)
             |                                      -2.101316   1.565618  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected
. * Then get data sets with result from each bootstrap
. use mma07p4bootreps, clear
(bootstrap: probit y x)
. sum   /* Here just _bs_1 */

    Variable |     Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
       _bs_1 |     999    .1003619   .9350234  -3.032139   2.572848
. gen b2test = _bs_1 /* _bs_1 is the bootstrap result of interest */
. sum b2test, detail /* Gives percentiles but not 2.5% and 97.5% */
                           b2test
-------------------------------------------------------------
      Percentiles      Smallest
 1%    -2.188575      -3.032139
 5%    -1.540843      -2.605178
10%    -1.137846      -2.599248       Obs                 999
25%    -.4995352      -2.566578       Sum of Wgt.         999

50%     .1238111                      Mean           .1003619
                        Largest       Std. Dev.      .9350234
75%     .7789762        2.22565
90%     1.338348       2.359132       Variance       .8742688
95%     1.560646       2.377491       Skewness      -.2505319
99%     2.014282       2.572848       Kurtosis       2.853737

. _pctile b2test, p(2.5,97.5)


.
. * DISPLAY RESULTS on p.256
.
. * Note: Error on p.256 Here get (-1.89, 1.80) not (-2.62, 1.83)
. di "Lower 2.5 and upper 2.5 percentile of coeff b for z: " r(r1) " and " r(r2)
Lower 2.5 and upper 2.5 percentile of coeff b for z: -1.8906019 and 1.8013585
. di "Reject H0 if Wald = " Wald " lies outside " r(r1) " ," r(r2) ")"
Reject H0 if Wald = -.62223436 lies outside -1.8906019 ,1.8013585)
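The percentile-t decision rule stated in the display above can be written out explicitly. A minimal sketch using the three numbers from the log; the scalar names `lower`, `upper` and `reject` are illustrative.

```stata
scalar Wald  = -.62223436     /* (bhat - 1)/se(bhat) from the original sample */
scalar lower = -1.8906019     /* 2.5th percentile of bootstrap t-statistics */
scalar upper =  1.8013585     /* 97.5th percentile of bootstrap t-statistics */

* Reject H0: b2 = 1 at 5 percent if Wald falls outside (lower, upper)
scalar reject = (Wald < lower) | (Wald > upper)
display "Reject H0 at 5 percent (1 = yes, 0 = no): " reject
```

Here the statistic lies inside the bootstrap critical region, so H0 is not rejected, the same conclusion as the unrefined test.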
.
. * (3B) Equivalently bootstrap calculates b(s) and se(s) s = 1,...,S
. *      and then later calculate (b(s) - bhat) / se(s)
.
. use mma07p4boot, clear
. * Save the estimate and the Wald test statistic
. quietly probit y x
. scalar b2est = _b[x]
. scalar Wald = (_b[x] - 1)/_se[x]
. * Then bootstrap calculates b(s) and se(s)
. set seed 10105
. bootstrap "probit y x" _b[x] _se[x], reps($breps) /*
> */ level(95) saving(mma07p4bootreps) replace
command:      probit y x
statistics:   _bs_1      = _b[x]
              _bs_2      = _se[x]

Bootstrap statistics                              Number of obs    =        40
                                                  Replications     =       999

------------------------------------------------------------------------------
Variable     |  Reps  Observed      Bias  Std. Err. [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |   999  .8168831  .1017329  .3763803   .0782956   1.555471   (N)
             |                                       .3495505   1.878616   (P)
             |                                       .2808956   1.600026  (BC)
       _bs_2 |   999  .2942893  .0422005  .0932673   .1112667   .4773118   (N)
             |                                       .2323841   .5831083   (P)
             |                                       .2214397   .4475662  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected
. * Then get data sets with result from each bootstrap
. use mma07p4bootreps, clear
(bootstrap: probit y x)
. sum   /* Here _bs_1 and _bs_2 */

    Variable |     Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
       _bs_1 |     999     .918616   .3763803   .0030288   3.806198
       _bs_2 |     999    .3364898   .0932673   .2162534    1.34312
. gen b2test = (_bs_1 - b2est)/_bs_2


. _pctile b2test, p(2.5,97.5)
.
. * DISPLAY RESULTS on p.256
. * Note: Error on p.256 Here get (-1.89, 1.80) not (-2.62, 1.83)
. di "Lower 2.5 and upper 2.5 percentile of coeff b for z: " r(r1) " and " r(r2)
Lower 2.5 and upper 2.5 percentile of coeff b for z: -1.8906019 and 1.8013583
. di "Reject H0 if Wald = " Wald " lies outside " r(r1) " ," r(r2) ")"
Reject H0 if Wald = -.62223436 lies outside -1.8906019 ,1.8013583)
.
. ********** CLOSE OUTPUT
. log close
log: c:\Imbook\bwebpage\Section2\mma07p4boot.txt
log type: text
closed on: 18 May 2005, 21:36:36

-------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section2\mma08p1cmtests.txt
log type: text
opened on: 17 May 2005, 14:04:20
.
. ********** OVERVIEW OF MMA08P1CMTESTS.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 8.2.6 pages 269-71
. * Conditional moment tests example producing Table 8.1
.
. * (A) TEST OF THE CONDITIONAL MEAN
. * (B) TEST THAT CONDITIONAL VARIANCE = MEAN
. * (C) ALTERNATIVE TEST THAT CONDITIONAL VARIANCE = MEAN
. * (D) INFORMATION MATRIX TEST
. * (E) CHI-SQUARE GOODNESS OF FIT TEST
. * for a Poisson model with generated data (see below).
.
. * The data generation requires free Stata add-on command rndpoix
. * In Stata: search rndpoix
.
. ********** SETUP **********
.
. set more off
. version 8.0
. set scheme s1mono /* Used for graphs */
.
. ********** GENERATE DATA **********
.
. * Model is
. * y ~ Poisson[exp(b1 + b2*x2)]
. * where
. * x2 is iid ~ N[0,1]
. * and b1=0 and b2=1.
.
. set seed 10001
. set obs 200
obs was 0, now 200
. scalar b1 = 0

. scalar b2 = 1
.
. * Generate regressors
. gen x2 = invnorm(uniform())
.
. * Generate y
. gen mupoiss = exp(b1+b2*x2)
. * The next requires Stata add-on. In Stata: search rndpoix
. rndpoix(mupoiss)
( Generating ................ )
Variable xp created.
. gen y = xp
.
. * Write data to a text (ascii) file so can use with programs other than Stata
. outfile y x2 using mma08p1cmtests.asc, replace
.
. ********* POISSON REGRESSION **********
.
. poisson y x2
Iteration 0: log likelihood = -263.53818
Iteration 1: log likelihood = -263.5288
Iteration 2: log likelihood = -263.5288

Poisson regression                                Number of obs   =        200
                                                  LR chi2(1)      =     321.75
                                                  Prob > chi2     =     0.0000
Log likelihood = -263.5288                        Pseudo R2       =     0.3791

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x2 |    1.12402   .0687868    16.34   0.000     .9892006     1.25884
       _cons |  -.1652935    .089065    -1.86   0.063    -.3398578    .0092707
------------------------------------------------------------------------------
. * Obtain exp(x'b)
.
. * Obtain the scores to be used later
. predict yhat
(option n assumed; predicted number of events)
. * For the Poisson s = dlnf(y)/db = (y - exp(x'b))*x
. gen s1 = (y - yhat)

. gen s2 = (y - yhat)*x2
.
. * Summarize data
. * Should get s1 and s2 summing to zero
. sum
    Variable |     Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
          x2 |     200   -.0091098   1.010072  -2.857666   2.149822
     mupoiss |     200    1.599601   1.674071   .0574026    8.58333
          xp |     200       1.525   2.363749          0         15
           y |     200       1.525   2.363749          0         15
        yhat |     200       1.525   1.803242   .0341372   9.498652
-------------+--------------------------------------------------------
          s1 |     200    1.36e-09    1.36719  -3.148933   6.245292
          s2 |     200    6.69e-09   1.889198  -6.420406   12.97311
.
. ********** ANALYSIS: CONDITIONAL MOMENTS TESTS **********
.
. * The program is appropriate for MLE with density assumed to be correctly specified.
. * Let H0: E[m(y,x,theta)] = 0
. * Then CM = explained sum of squares or N times uncentered Rsq from
. * auxiliary regression of 1 on m and the components of s = dlnf(y)/dtheta
. * The test is chi-squared with dim(m) degrees of freedom.
.
. * Define the dependent variable one for the auxiliary regressions
. gen one = 1
.
. *** (A) TEST OF THE CONDITIONAL MEAN (Table 8.1 p.270 row 1)
.
. * Test H0: E[(y - exp(x'b))*z] = 0 where z = x2sq
.
. * A similar test is relevant for many nonlinear models
. * Just change the expression for the conditional mean.
. * Here we used E[y|x] = exp(x'b) for the Poisson
. * Also for the Poisson z cannot be x as this sums to zero by Poisson foc
. * For some other models (basically non-LEF models) z can be x
.
. gen z = x2*x2
. gen mA = (y - yhat)*z
. regress one mA s1 s2, noconstant
      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  3,   197) =    1.09
       Model |  3.27177115     3  1.09059038           Prob > F      =  0.3536
    Residual |  196.728229   197  .998620451           R-squared     =  0.0164
-------------+------------------------------           Adj R-squared =  0.0014
       Total |         200   200           1           Root MSE      =  .99931

------------------------------------------------------------------------------
         one |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          mA |   .1046155   .0577969     1.81   0.072    -.0093646    .2185956
          s1 |  -.0377486   .0822939    -0.46   0.647    -.2000387    .1245415
          s2 |  -.1544278   .1029465    -1.50   0.135    -.3574463    .0485908
------------------------------------------------------------------------------
. scalar CMA = e(N)*e(r2)
. di "CMA: " CMA " p-value: " chi2tail(1,CMA)
CMA: 3.2717711 p-value: .07048149
.
. * Check that three different ways give same answer.
. di "N times Uncentered R-squared: " e(N)*e(r2)
N times Uncentered R-squared: 3.2717711
. di "Explained Sum of Squares:         " e(mss)
Explained Sum of Squares:         3.2717711
. di "N minus Residual Sum of Squares: " e(N) - e(rss)
N minus Residual Sum of Squares: 3.2717711
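The equality of the three expressions holds because the dependent variable is identically one and the regression has no constant, so the uncentered total sum of squares equals N. A minimal self-contained sketch with a single hypothetical regressor `m` (standard normal draws, not the moment conditions used above):

```stata
* Hypothetical data: 200 observations, one arbitrary regressor
clear
set seed 10001
set obs 200
gen m = invnorm(uniform())
gen one = 1

* Auxiliary regression of 1 on m without a constant
quietly regress one m, noconstant

* All three expressions give the same statistic
display "N times uncentered R-squared:     " e(N)*e(r2)
display "Explained sum of squares:         " e(mss)
display "N minus residual sum of squares:  " e(N) - e(rss)
```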
.
. *** (B) TEST THAT CONDITIONAL VARIANCE = MEAN (Table 8.1 p.270 row 2)
.
. * Test H0: E[{(y - exp(x'b))^2 - exp(x'b)}*x] = 0
.
. * This test is peculiar to Poisson which restricts mean = variance
.
. * Here m has 2 terms
. gen mB1 = ((y - yhat)^2 - yhat)
. gen mB2 = ((y - yhat)^2 - yhat)*x2
. regress one mB1 mB2 s1 s2, noconstant
      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  4,   196) =    0.60
       Model |  2.43400011     4  .608500026           Prob > F      =  0.6604
    Residual |     197.566   196   1.0079898           R-squared     =  0.0122
-------------+------------------------------           Adj R-squared = -0.0080
       Total |         200   200           1           Root MSE      =   1.004

------------------------------------------------------------------------------
         one |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mB1 |   .0432045   .0542516     0.80   0.427    -.0637873    .1501963
         mB2 |  -.0052374   .0357193    -0.15   0.884    -.0756808     .065206
          s1 |  -.0399879   .1073712    -0.37   0.710     -.251739    .1717633
          s2 |   -.003196   .0852726    -0.04   0.970    -.1713655    .1649735
------------------------------------------------------------------------------
. scalar CMB = e(N)*e(r2)
. di "CMB: " CMB " p-value: " chi2tail(2,CMB)
CMB: 2.4340001 p-value: .29611717
.
. *** (C) ALTERNATIVE TEST THAT CONDITIONAL VARIANCE = MEAN (Table 8.1 p.270 row 3)
.
. * Test H0: E[{(y - exp(x'b))^2 - y}*x] = 0
.
. * This test is peculiar to Poisson which restricts mean = variance
. * This test is also peculiar as here dm/db = 0
.
. * Here m has 2 terms
. gen mC1 = ((y - yhat)^2 - y)
. gen mC2 = ((y - yhat)^2 - y)*x2
.
. * To be consistent with other tests include s1 and s2.
. regress one mC1 mC2 s1 s2, noconstant
      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  4,   196) =    0.60
       Model |  2.43400011     4  .608500027           Prob > F      =  0.6604
    Residual |     197.566   196   1.0079898           R-squared     =  0.0122
-------------+------------------------------           Adj R-squared = -0.0080
       Total |         200   200           1           Root MSE      =   1.004

------------------------------------------------------------------------------
         one |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mC1 |   .0432045   .0542516     0.80   0.427    -.0637873    .1501963
         mC2 |  -.0052374   .0357192    -0.15   0.884    -.0756808     .065206
          s1 |   .0032166   .0825345     0.04   0.969    -.1595531    .1659863
          s2 |  -.0084334   .0641096    -0.13   0.895    -.1348665    .1179997
------------------------------------------------------------------------------
. scalar CMC = e(N)*e(r2)
. di "CMC: " CMC " p-value: " chi2tail(2,CMC)
CMC: 2.4340001 p-value: .29611717
.
. * Since dm/db = 0 could just do the regression without the scores
. regress one mC1 mC2, noconstant
      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  2,   198) =    1.21
       Model |  2.40695177     2  1.20347588           Prob > F      =  0.3016
    Residual |  197.593048   198  .997944688           R-squared     =  0.0120
-------------+------------------------------           Adj R-squared =  0.0021
       Total |         200   200           1           Root MSE      =  .99897

------------------------------------------------------------------------------
         one |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mC1 |   .0458705   .0510111     0.90   0.370    -.0547243    .1464652
         mC2 |  -.0075807     .03212    -0.24   0.814    -.0709218    .0557605
------------------------------------------------------------------------------
. scalar CMCnoscores = e(N)*e(r2)
. di "CMCnoscores: " CMCnoscores " p-value: " chi2tail(2,CMCnoscores)
CMCnoscores: 2.4069518 p-value: .30014911
.
. *** (D) INFORMATION MATRIX TEST (Table 8.1 p.270 row 4)
.
. * Test H0: E[{(y - exp(x'b))^2 - y}*vech(xx')] = 0
.
. * A similar test is relevant for other parametric models
. * In general m = vech(d2lnf(y)/dbdb' + dlnf(y)/db x dlnf(y)/db')
. * and for Poisson this yields the above on replacing exp(x'b) by y in the last term
.
. * Here m is a 3x1 vector
. gen mD1 = ((y - yhat)^2 - y)
. gen mD2 = ((y - yhat)^2 - y)*x2
. gen mD3 = ((y - yhat)^2 - y)*x2*x2
.
. * To be consistent with other tests include s1 and s2.
. regress one mD1 mD2 mD3 s1 s2, noconstant
      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  5,   195) =    0.58
       Model |   2.9463051     5   .58926102           Prob > F      =  0.7129
    Residual |  197.053695   195  1.01053177           R-squared     =  0.0147
-------------+------------------------------           Adj R-squared = -0.0105
       Total |         200   200           1           Root MSE      =  1.0053

------------------------------------------------------------------------------
         one |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mD1 |   .0546342   .0566422     0.96   0.336    -.0570759    .1663442
         mD2 |  -.0712751   .0994042    -0.72   0.474    -.2673205    .1247703
         mD3 |   .0330527   .0464213     0.71   0.477    -.0584996     .124605
          s1 |  -.0098554   .0846533    -0.12   0.907     -.176809    .1570982
          s2 |  -.0146441   .0647803    -0.23   0.821    -.1424041    .1131158
------------------------------------------------------------------------------
. scalar CMD = e(N)*e(r2)
. di "CMD: " CMD " p-value: " chi2tail(3,CMD)
CMD: 2.9463051 p-value: .39997818
.
. * Since dm/db = 0 could just do the regression without the scores
. regress one mD1 mD2 mD3, noconstant
      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  3,   197) =    0.91
       Model |  2.73445751     3  .911485837           Prob > F      =  0.4370
    Residual |  197.265542   197  1.00134793           R-squared     =  0.0137
-------------+------------------------------           Adj R-squared = -0.0013
       Total |         200   200           1           Root MSE      =  1.0007

------------------------------------------------------------------------------
         one |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mD1 |    .056165    .054176     1.04   0.301    -.0506743    .1630043
         mD2 |   -.056325   .0911035    -0.62   0.537    -.2359884    .1233384
         mD3 |   .0233527   .0408339     0.57   0.568     -.057175    .1038805
------------------------------------------------------------------------------
. scalar CMDnoscores = e(N)*e(r2)
. di "CMDnoscores: " CMDnoscores " p-value: " chi2tail(3,CMDnoscores)
CMDnoscores: 2.7344575 p-value: .43440333
.
. *** (E) CHI-SQUARE GOODNESS OF FIT TEST (Table 8.1 p.270 row 5)
.
. * Test H0: E[d_j - Pr[y = j]] = 0
. * where d_j = 1 if y = j for j = 0, 1, 2, and 3 or more
. * and Pr[y = j] = exp(-lambda)*lambda^j/j! for lambda = exp(x'b)
. * Cells get too small if use more cells than 0, 1, 2, and 3 or more.
.
. * A similar test is relevant for other parametric models,
. * though a natural partitioning for y may be less obvious.
.
. * Here m has 4 terms
. gen d0 = 0
. replace d0 = 1 if y==0
(87 real changes made)
. gen d1 = 0
. replace d1 = 1 if y==1
(51 real changes made)
. gen d2 = 0
. replace d2 = 1 if y==2
(22 real changes made)
. gen p0 = exp(-yhat)
. gen p1 = exp(-yhat)*yhat
. gen p2 = exp(-yhat)*(yhat^2)/2
. gen mE1 = d0 - p0
. gen mE2 = d1 - p1
. gen mE3 = d2 - p2
. regress one mE1 mE2 mE3 s1 s2, noconstant
      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  5,   195) =    0.49
       Model |  2.50056717     5  .500113433           Prob > F      =  0.7807
    Residual |  197.499433   195   1.0128176           R-squared     =  0.0125
-------------+------------------------------           Adj R-squared = -0.0128
       Total |         200   200           1           Root MSE      =  1.0064

------------------------------------------------------------------------------
         one |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mE1 |   1.020078   .7290569     1.40   0.163    -.4177712    2.457927
         mE2 |   .7149016   .5053259     1.41   0.159    -.2817042    1.711507
         mE3 |   .2705081    .383646     0.71   0.482    -.4861201    1.027136
          s1 |   .2916116   .2217763     1.31   0.190    -.1457765    .7289997
          s2 |  -.1341565   .1125046    -1.19   0.235    -.3560384    .0877255
------------------------------------------------------------------------------
. scalar CME = e(N)*e(r2)
. di "CME: " CME " p-value: " chi2tail(3,CME)
CME: 2.5005672 p-value: .47518859
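Only three of the four cell moments enter the regression because the four sum to zero for every observation, so the fourth adds no information. The following is a minimal sketch on a small made-up data set: the values of y and mu are hypothetical (they are not the simulated data used above), and the fourth cell is assumed to be y >= 3. It begins with clear, so it is meant for a fresh session.

```stata
clear
input y mu
0 0.5
1 1.0
2 1.5
4 2.0
end

* Poisson cell probabilities for y = 0, 1, 2 and the complement for y >= 3
gen double p0 = exp(-mu)
gen double p1 = exp(-mu)*mu
gen double p2 = exp(-mu)*(mu^2)/2
gen double p3 = 1 - p0 - p1 - p2

* Cell indicators
gen double d0 = (y==0)
gen double d1 = (y==1)
gen double d2 = (y==2)
gen double d3 = (y>=3)

* The four moments d_j - p_j sum to zero observation by observation
gen double msum = (d0-p0) + (d1-p1) + (d2-p2) + (d3-p3)
assert abs(msum) < 1e-12
```

This is why the test statistic is compared with a chi-square with 3 rather than 4 degrees of freedom.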
.
. * An incorrect alternative is the traditional chi-square test
. quietly sum d0
. scalar sumd0 = r(sum)
. quietly sum d1
. scalar sumd1 = r(sum)
. quietly sum d2
. scalar sumd2 = r(sum)
. scalar sumd3 = 1 - sumd0 - sumd1 - sumd2
. quietly sum p0
. scalar sump0 = r(sum)
. quietly sum p1
. scalar sump1 = r(sum)
. quietly sum p2
. scalar sump2 = r(sum)
. scalar sump3 = 1 - sump0 - sump1 - sump2
. scalar chisq = (sumd0-sump0)^2/sump0 + (sumd1-sump1)^2/sump1 /*
>                */ + (sumd2-sump2)^2/sump2 + (sumd3-sump3)^2/sump3
. di "Wrong Traditional chi-square: " chisq " p = " chi2tail(3,chisq)
Wrong Traditional chi-square: .47431003 p = .92449803
.
.
. ********** DISPLAY RESULTS (Table 8.1 p.270) **********
.
. sum
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          x2 |       200   -.0091098    1.010072  -2.857666   2.149822
     mupoiss |       200    1.599601    1.674071   .0574026    8.58333
          xp |       200       1.525    2.363749          0         15
           y |       200       1.525    2.363749          0         15
        yhat |       200       1.525    1.803242   .0341372   9.498652
-------------+--------------------------------------------------------
          s1 |       200    1.36e-09     1.36719  -3.148933   6.245292
          s2 |       200    6.69e-09    1.889198  -6.420406   12.97311
         one |       200           1           0          1          1
           z |       200    1.015227    1.286795   .0000877   8.166255
          mA |       200    .1563713    3.403966  -13.52498   26.94856
-------------+--------------------------------------------------------
         mB1 |       200     .334863    3.470417  -6.436038   30.24896
         mB2 |       200      .43869    5.749749  -11.74974   62.83503
         mC1 |       200     .334863    3.077815  -6.838236   24.00367
         mC2 |       200      .43869    4.897291    -12.484   49.86192
         mD1 |       200     .334863    3.077815  -6.838236   24.00367
-------------+--------------------------------------------------------
         mD2 |       200      .43869    4.897291    -12.484   49.86192
         mD3 |       200    .8381842    9.190652    -22.791   103.5763
          d0 |       200        .435    .4970011          0          1
          d1 |       200        .255     .436955          0          1
          d2 |       200         .11    .3136749          0          1
-------------+--------------------------------------------------------
          p0 |       200     .429237    .2918348    .000075   .9664389
          p1 |       200    .2406035    .1137756    .000712    .367864
          p2 |       200    .1235594    .0894167   .0005631   .2706694
         mE1 |       200     .005763    .4287003  -.9289918   .9571021
         mE2 |       200    .0143965    .4210301   -.367864   .9315748
-------------+--------------------------------------------------------
         mE3 |       200   -.0135594    .3065698  -.2706694   .9688674
.
. * Gives Rows 1-5 of Table 8.1 (The CMxnoscores are not reported)
. di "CMA: " CMA " p-value: " chi2tail(1,CMA)
CMA: 3.2717711 p-value: .07048149
. di "CMB: " CMB " p-value: " chi2tail(2,CMB)
CMB: 2.4340001 p-value: .29611717
. di "CMC: " CMC " p-value: " chi2tail(2,CMC)
CMC: 2.4340001 p-value: .29611717
. di "CMD: " CMD " p-value: " chi2tail(3,CMD)
CMD: 2.9463051 p-value: .39997818
. di "CME: " CME " p-value: " chi2tail(3,CME)
CME: 2.5005672 p-value: .47518859
. di "CMCnoscores: " CMCnoscores " p-value: " chi2tail(2,CMCnoscores)
CMCnoscores: 2.4069518 p-value: .30014911
. di "CMDnoscores: " CMDnoscores " p-value: " chi2tail(3,CMDnoscores)
CMDnoscores: 2.7344575 p-value: .43440333
.
. ********** FURTHER ANALYSIS gives M** column in Table 8.1 **********
.
. * The following drops the scores from the regression. Provides lower bound.
. * Results are reported in last column in Table 8.1
. quietly regress one mA, noconstant
. di "CMA without scores:" e(N)*e(r2) " with p = " chi2tail(1,e(N)*e(r2))
CMA without scores:.42328231 with p = .51530376
. quietly regress one mB1 mB2, noconstant
. di "CMB without scores:" e(N)*e(r2) " with p = " chi2tail(2,e(N)*e(r2))
CMB without scores:1.8897296 with p = .38873213
. quietly regress one mC1 mC2, noconstant
. di "CMC without scores:" e(N)*e(r2) " with p = " chi2tail(2,e(N)*e(r2))
CMC without scores:2.4069518 with p = .30014911
. quietly regress one mD1 mD2 mD3, noconstant
. di "CMD without scores:" e(N)*e(r2) " with p = " chi2tail(3,e(N)*e(r2))
CMD without scores:2.7344575 with p = .43440333
. quietly regress one mE1 mE2 mE3, noconstant
. di "CME without scores:" e(N)*e(r2) " with p = " chi2tail(3,e(N)*e(r2))
CME without scores:.73842732 with p = .86413036
.
. ********** CLOSE OUTPUT
. log close
log: c:\Imbook\bwebpage\Section2\mma08p1cmtests.txt
log type: text
closed on: 17 May 2005, 14:04:20
-------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section2\mma08p2nonnested.txt
log type: text
opened on: 18 May 2005, 21:27:00
.
. ********** OVERVIEW OF MMA08P2NONNESTED.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 8.5.3 pages 283-4
. * Nonnested model comparison given in Table 8.2:
.
. * (A) AIC AND VARIATIONS
. * (B) VUONG TEST for Overlapping Models
. * for a Poisson model with simulated data (see below).
.
. * This example requires the free Stata add-on command rndpoix.
. * In Stata: search rndpoix
.
. ********** SETUP **********
.
. set more off
. version 8.0
. set scheme s1mono /* Used for graphs */
.
. ********** GENERATE DATA **********
.
. * Dgp is
. * y ~ Poisson[exp(b1 + b2*x2 + b3*x3)]
. * where
. * x2 and x3 are iid ~ N[0,1]
. * and b1=0.5 and b2=0.5 and b3=0.5.
.
. * The Models compared are
. * Poisson of y on x2
. * Poisson of y on x3 and x3^2
.
. set seed 10001
. set obs 100
obs was 0, now 100
. scalar b1 = 0.5
. scalar b2 = 0.5
. scalar b3 = 0.5
.
. * Generate regressors
. gen x2 = invnorm(uniform())
. gen x3 = invnorm(uniform())
. gen x2sq = x2*x2
. gen x3sq = x3*x3
.
. * Generate y
. gen mupoiss = exp(b1+b2*x2+b3*x3)
. * The next requires Stata add-on. In Stata: search rndpoix
. rndpoix(mupoiss)
( Generating ......... )
Variable xp created.
. gen y = xp
.
. * Write data to a text (ascii) file so can use with programs other than Stata
. outfile y x2 x3 x2sq x3sq using mma08p2nonnested.asc, replace
.
. ********* SETUP FOR THIS PROGRAM *********
.
. * Change this if want different regressors
. * Here both models differ from the dgp
. * The Vuong test below assumes that the two models are OVERLAPPING
. global XLISTMODEL1 x2
. global XLISTMODEL2 x3 x3sq
.
. ********* (A) AIC AND VARIATIONS *********
.
. * Stata output from Poisson saves much of this.
. * Also calculate manually.
.
. * The following code can be changed to different models than poisson
. * provided
. * ereturn list yields N = e(N); q = e(k); and LnL = e(ll)
. * We use AIC = -2lnL+2q; BIC = -2lnL+lnN*q; CAIC = -2lnL+(1+lnN)*q
.
. poisson y $XLISTMODEL1
Iteration 0: log likelihood = -183.43146
Iteration 1: log likelihood = -183.43146
Poisson regression                                Number of obs   =        100
                                                  LR chi2(1)      =      16.28
                                                  Prob > chi2     =     0.0001
Log likelihood = -183.43146                       Pseudo R2       =     0.0425

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x2 |    .291164    .072311     4.03   0.000     .1494371    .4328909
       _cons |   .6084331   .0752833     8.08   0.000     .4608806    .7559857
------------------------------------------------------------------------------
. estimates store model1
. scalar ll1 = e(ll)
. scalar q1 = e(k)
. scalar N1 = e(N)
. scalar aic1 = -2*ll1 + 2*q1
. scalar bic1 = -2*ll1 + ln(N1)*q1
. scalar caic1 = -2*ll1 + (1 + ln(N1))*q1
.
. poisson y $XLISTMODEL2
Iteration 0: log likelihood = -176.09611
Iteration 1: log likelihood = -176.09119
Iteration 2: log likelihood = -176.09119
Poisson regression                                Number of obs   =        100
                                                  LR chi2(2)      =      30.96
                                                  Prob > chi2     =     0.0000
Log likelihood = -176.09119                       Pseudo R2       =     0.0808

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x3 |   .3588412     .07035     5.10   0.000     .2209578    .4967245
        x3sq |   .0912999   .0514311     1.78   0.076    -.0095032    .1921029
       _cons |    .492656   .0958903     5.14   0.000     .3047144    .6805975
------------------------------------------------------------------------------
. estimates store model2
. scalar ll2 = e(ll)
. scalar q2 = e(k)
. scalar N2 = e(N)
. scalar aic2 = -2*ll2 + 2*q2
. scalar bic2 = -2*ll2 + ln(N2)*q2
. scalar caic2 = -2*ll2 + (1 + ln(N2))*q2
.
. * Display results given in first three rows of Table 8.2 page 284
.
. estimates table model1 model2, stats(N k ll aic bic)

----------------------------------------
    Variable |   model1       model2
-------------+--------------------------
          x2 |  .29116396
          x3 |               .35884118
        x3sq |               .09129986
       _cons |  .60843314    .49265596
-------------+--------------------------
           N |        100          100
           k |          2            3
          ll | -183.43146   -176.09119
         aic |  370.86292    358.18238
         bic |  376.07326    365.99789
----------------------------------------
.
. di "Model 1: " _n "lnL: " ll1 " q: " q1 _n " N: " N1
Model 1:
lnL: -183.43146 q: 2
N: 100
. di "-2lnL: " -2*ll1 _n "AIC: " aic1 _n " BIC: " bic1 _n "caic: " caic1
-2lnL: 366.86292
AIC: 370.86292
BIC: 376.07326
caic: 378.07326
.
. di "Model 2: " _n "lnL: " ll2 " q: " q2 _n " N: " N2
Model 2:
lnL: -176.09119 q: 3
N: 100
. di "-2lnL: " -2*ll2 _n "AIC: " aic2 _n " BIC: " bic2 _n "caic: " caic2
-2lnL: 352.18238
AIC: 358.18238
BIC: 365.99789
caic: 368.99789
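With the definitions used here, the model with the smaller criterion value is preferred, and the three criteria differ only in the penalty per parameter. The following is a minimal sketch, assuming the log-likelihoods and parameter counts are copied by hand from the results displayed above; the scalar names are hypothetical and are not part of the original program. Model 2 is preferred by a criterion whenever its gain in 2*lnL exceeds the extra penalty for its additional parameter.

```stata
* Values copied from the results displayed above
scalar ll_1 = -183.43146
scalar q_1  = 2
scalar ll_2 = -176.09119
scalar q_2  = 3
scalar n_ic = 100

* Gain in 2*lnL from moving to model 2
scalar gain = 2*(ll_2 - ll_1)

* Extra penalty for the additional parameter under each criterion
scalar pen_aic  = 2*(q_2 - q_1)
scalar pen_bic  = ln(n_ic)*(q_2 - q_1)
scalar pen_caic = (1 + ln(n_ic))*(q_2 - q_1)

di "gain in 2lnL: " gain
di "extra penalty  AIC: " pen_aic "  BIC: " pen_bic "  CAIC: " pen_caic
```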
.
. ********* (B) VUONG TEST FOR OVERLAPPING MODELS *********
.
. * The test has three variants
. * (1) Nested models: G is contained in F
. * (2) Strictly non-nested models: F intersection G equals null set
. * (3) Overlapping models: F intersection G does not equal null set
.
. * Need to compute lnf(y) for models 1 and 2,
. * where density f is model 1 and density g is model 2
.
. * The procedures will vary with model. Here use Poisson.
.
. * (0) COMPUTE THE LR TEST STATISTIC
.
. * This is LR = Sum_i [ ln (fy1_i / gy2_i) ]
. *            = Sum_i lnfy1_i - Sum_i lngy2_i
. *            = difference in log-likelihood for the two models
.
. * Easiest if program output gives logL
. * Otherwise need to generate manually
.
. quietly poisson y $XLISTMODEL1
. scalar llf = e(ll)
. quietly poisson y $XLISTMODEL2
. scalar llg = e(ll)
. scalar LR = llf - llg
. di "LR = " LR " and llf = " llf " llg = " llg
LR = -7.3402698 and llf = -183.43146 llg = -176.09119
.
. * (1) NESTED MODELS
.
. * Not done here as not relevant for the example of this application.
.
. * (1A) Usual LR test if assume densities correctly specified.
.
. * (1B) If instead want robustified version then need to compute W
. * and use the weighted chi-square test.
. * This is not the appropriate test here,
. * but in 3(A) below W is computed and a weighted chi-square test used.
. * This code could be easily adapted to here.
.
. * (2) STRICTLY NON-NESTED MODELS
.
. * Not done here as not relevant for the example of this application.
. * Test uses LR/what ~ normal where what is computed in 3(B) below.
.
. * (3) OVERLAPPING MODELS
.
. * This is the relevant test here
. * First test whether the models are overlapping (even though here we know that they are)
. * Then do the test
.
. * (3A-1) Compute what^2
.
. * Calculate what^2
. * = (1/N)*Sum_i[(ln(fy1_i/gy2_i))^2] - [(1/N)*Sum_i ln(fy1_i/gy2_i)]^2
. * = (1/N) * Sum_i [(ln(fy1_i) - ln(gy2_i))^2] - (LR/N)^2
.
. * For the Poisson
. *    f(y) = exp(-mu)*mu^y/y!
. * so lnf(y) = -mu + y*ln(mu) - ln(y!)
. quietly poisson y $XLISTMODEL1
. predict yhatf
(option n assumed; predicted number of events)
. * Poisson default predict gives yhat = exp(x'b)
. gen lnf = -yhatf + y*ln(yhatf) - lnfact(y)
. quietly poisson y $XLISTMODEL2
. predict yhatg
(option n assumed; predicted number of events)
. gen lng = -yhatg + y*ln(yhatg) - lnfact(y)
. gen lnratiosq = (lnf-lng)^2
. sum lnratiosq
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
   lnratiosq |       100    .6967792    1.816804   .0000331   13.85592
. scalar whatsq = r(sum)/_N - (LR/_N)^2
. scalar Nwhatsq = _N*whatsq
. di "First-stage test statistic whatsq - still need to find critical value"
First-stage test statistic whatsq - still need to find critical value
. di "N*omegahatsq = " Nwhatsq
N*omegahatsq = 69.139128
.
. * Aside: Check by recomputing LR this long way
. gen lnratio = (lnf-lng)
. sum lnratio
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
     lnratio |       100   -.0734027    .8356883  -3.722355   2.571382
. scalar LRcheck = r(sum)
.
. *** Display results given in second last row of Table 8.2 page 284
.
. di "LR = " LR " and LRcheck = " LRcheck
LR = -7.3402698 and LRcheck = -7.3402702
.
. * (3A-2) Find the critical value by first finding W, then its eigenvalues lamda, then simulating
.
. * Calculate estimate of the W matrix on page ?? of Vuong.
. * (a) Can estimate Af = E[d2lnf(y)/dbdb'] as inverse of usual ML variance matrix
. * (b) Since the robust ML variance matrix is V = Ainv*B*Ainv
. * can estimate Bf = -E[dlnf(y)/db x dlnf(y)/db'] by A*V*A where A is in (a)
. * (c) For Ag same as in part (a) except for model g
. * (d) For Bg same as in part (b) except for model g
. * (e) The only tricky bit is computation of Bfg
.
. gen one = 1
. * (a) Af
. quietly poisson y one $XLISTMODEL1, noconstant
. matrix Af = syminv(e(V))
. * (b) Bf
. quietly poisson y one $XLISTMODEL1, noconstant robust
. * robust gives Ainv*B*Ainv so pre and post multiply by A gives B
. * Also make adjustment as Stata divides by (_N-1). Here use _N.
. matrix Bf = Af*e(V)*Af*(_N-1)/_N
. * (c) Ag
. quietly poisson y one $XLISTMODEL2, noconstant
. matrix Ag = syminv(e(V))
. * (d) Bg
. quietly poisson y one $XLISTMODEL2, noconstant robust
. matrix Bg = Ag*e(V)*Ag*(_N-1)/_N
.
. * (e) Bfg requires more specialized code peculiar to this example
. * For Poisson dlnf(y)/db = Sum_i (y_i - mu_i)*x_i
. * so Bfg = (1/N)*Sum_i [(y_i - muf_i)*xf_i]*[(y_i - mug_i)*xg_i]'
. * For model 1 x is intercept and x2 (global XLISTMODEL1 x2)
. gen bf1 = (y - yhatf) /* yhatf saved earlier = y - muf */
. gen bf2 = (y - yhatf)*x2
. * For model 2 x is intercept, x3 and x3sq (global XLISTMODEL2 x3 x3sq)
. gen bg1 = (y - yhatg) /* yhatg saved earlier = y - mug */
. gen bg2 = (y - yhatg)*x3
. gen bg3 = (y - yhatg)*x3sq
. * Create Bfg
. matrix accum BfBg = bf1 bf2 bg1 bg2 bg3, noconstant
(obs=100)
. * and Bfg is the (1,2) submatrix: rows 1 to 2 and columns 3 to 5
. matrix Bfg = BfBg[1..2,3..5]
.
. * Form the matrix W
. * Note there is no need for minus sign as A has been defined as -A
. matrix W11 = Bf*syminv(Af)
. matrix W12 = Bfg*syminv(Ag)
. matrix W21 = Bfg'*syminv(Af)
. matrix W22 = Bg*syminv(Ag)
. matrix W = W11,W12\W21,W22
. matrix list W
W[5,5]
                y:          y:          y:          y:          y:
               one          x2         one          x3        x3sq
y:one    1.5571072   .01745302   1.3738479   .03868485   -.1702893
 y:x2    .05110494   1.4484966   .61074273   .07847014  -.15039712
  bg1    1.1488275    .1064062   1.6030095    .0647251  -.18944561
  bg2    .39558125   .08428705   .20709641   1.0650899  -.05677421
  bg3    1.1180355   -.0564763   .19914593   .07617139   .90718177

.
. * Calculate the eigenvalues of W
. matrix eigenvalues reigvalW ceigvalW = W
. * Real eigenvalues
. matrix list reigvalW
reigvalW[1,5]
              y:         y:         y:         y:         y:
             one         x2        one         x3       x3sq
real   2.7511946  .29082285  1.4750881  1.0021719  1.0616075
. * Complex eigenvalues - hopefully none
. matrix list ceigvalW

ceigvalW[1,5]
            y:     y:     y:     y:     y:
           one     x2    one     x3   x3sq
complex      0      0      0      0      0
.
. * This gives the vector lamda of eigenvalues of W
. matrix lamda = reigvalW
. scalar l1 = lamda[1,1]
. scalar l2 = lamda[1,2]
. scalar l3 = lamda[1,3]
. scalar l4 = lamda[1,4]
. scalar l5 = lamda[1,5]
.
. * Now obtain the p-value and critical value at level 0.05
. preserve
. * Obtain the 5 percent critical value by simulating 10000 draws from
. * M_p+q(lamda) = Sum_j lamda_j*z_j^2 where z_j are N[0,1] so z_j^2 are chi-square(1)
. set seed 10101
. set obs 10000
obs was 100, now 10000
. gen randomdraw = l1*invnorm(uniform())^2 + l2*invnorm(uniform())^2 + /*
> */ l3*invnorm(uniform())^2 + l4*invnorm(uniform())^2 + l5*invnorm(uniform())^2
. gen indicator = Nwhatsq >= randomdraw
. quietly sum indicator
. di "p-value for the Omegahatsq test = " 1-r(mean)
p-value for the Omegahatsq test = 0
. sum randomdraw, detail
                         randomdraw
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .6438425       .0756691
 5%     1.286375       .1250253
10%     1.850972       .1326376       Obs               10000
25%     3.137835       .1402145       Sum of Wgt.       10000

50%     5.359223                      Mean           6.614841
                        Largest       Std. Dev.       4.90562
75%     8.751276       38.32291
90%      12.8871       38.75208       Variance       24.06511
95%     16.10237       40.94431       Skewness       1.733549
99%     23.85304       44.08449       Kurtosis       7.514808

. di "Reject overlapping at level .05 if N*omegahatsq exceeds " r(p95)
Reject overlapping at level .05 if N*omegahatsq exceeds 16.102374
. restore
. di "where N*omegahatsq equals " Nwhatsq
where N*omegahatsq equals 69.139128
. di "If reject then continue to second step."
If reject then continue to second step.
. di "Otherwise stop as cannot determine whether models are overlapping."
Otherwise stop as cannot determine whether models are overlapping.
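As a rough check on the simulation in (3A-2), the first two moments of the weighted chi-square are available in closed form: each z_j^2 has mean 1 and variance 2, so the weighted sum has mean Sum_j lamda_j and variance 2*Sum_j lamda_j^2. The following is a minimal sketch, assuming the eigenvalues are copied by hand from the listing of reigvalW; the scalar names are hypothetical and are not part of the original program.

```stata
* Eigenvalues copied from the listing of reigvalW above
scalar ev1 = 2.7511946
scalar ev2 = .29082285
scalar ev3 = 1.4750881
scalar ev4 = 1.0021719
scalar ev5 = 1.0616075

* Mean and variance of Sum_j lamda_j*z_j^2 with z_j independent N[0,1]
scalar mean_M = ev1 + ev2 + ev3 + ev4 + ev5
scalar var_M  = 2*(ev1^2 + ev2^2 + ev3^2 + ev4^2 + ev5^2)
di "theoretical mean: " mean_M "   theoretical variance: " var_M
```

These two values can be compared with the Mean and Variance reported by sum randomdraw, detail; large discrepancies would suggest a problem with the eigenvalues or the draws.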
.
. * (3B) Do the second stage test if reject at (3A)
. gen TLR = (LR/sqrt(whatsq))/sqrt(_N)
.
. *** Display results given in second last row of Table 8.2 page 284
.
. di "TLR is N[0,1]. Here TLR = " TLR
TLR is N[0,1]. Here TLR = -.88277513
. di "Two-tailed test p-value: " chi2tail(1,TLR^2)
Two-tailed test p-value: .37735778
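The second-stage statistic can be reproduced from three numbers printed earlier in this log: LR, N, and the mean of lnratiosq. The following is a minimal sketch, assuming those values are copied by hand; the scalar names ending in _v are hypothetical, chosen so as not to clash with the variable TLR created above. Since LR is defined as llf minus llg, a negative value favors model g (model 2), though here not significantly.

```stata
* Values copied from the log above
scalar LR_v     = -7.3402698
scalar n_v      = 100
scalar meansq_v = .6967792     /* mean of lnratiosq */

* omegahat^2 and the statistic LR / (sqrt(N)*omegahat)
scalar whatsq_v = meansq_v - (LR_v/n_v)^2
scalar TLR_v    = LR_v/sqrt(n_v*whatsq_v)
di "TLR = " TLR_v "   two-sided p-value = " chi2tail(1,TLR_v^2)
```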
.
. ********** CLOSE OUTPUT **********
. log close
log: c:\Imbook\bwebpage\Section2\mma08p2nonnested.txt
log type: text
closed on: 18 May 2005, 21:27:00
-------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section2\mma08p3diagnostics.txt
log type: text
opened on: 17 May 2005, 14:10:13
.
. ********** OVERVIEW OF MMA08P3DIAGNOSTICS.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 8.7.3 pages 290-1
. * Model diagnostics example (Table 8.3)
.
. * (A) DIFFERENT R-SQUAREDS
. * (B) CALCULATION OF RESIDUALS
. * for a Poisson model with simulated data (see below).
.
. * The data generation requires free Stata add-on command rndpoix
. * In Stata: search rndpoix
.
. * This program gives results for model 2
. * For model 1 need to rerun with only x3 as regressor
.
. ********** SETUP **********
.
. set more off
. version 8.0
. set scheme s1mono /* Used for graphs */
.
. ********** GENERATE DATA **********
.
. * Model is
. * y ~ Poisson[exp(b1 + b2*x2 + b3*x3)]
. * where
. * x2 and x3 are iid ~ N[0,1]
. * and b1=0.5 and b2=0.5 and b3=0.5.
.
. * The Diagnostics below are from Poisson regression of y on x3 alone
. * or from Poisson regression of y on x3 and x3sq. [Note: x2 is omitted]
.
. set seed 10001
. set obs 100
obs was 0, now 100
. scalar b1 = 0.5
. scalar b2 = 0.5
. scalar b3 = 0.5
.
. * Generate regressors
. gen x2 = invnorm(uniform())
. gen x3 = invnorm(uniform())
.
. * Generate y
. gen mupoiss = exp(b1+b2*x2+b3*x3)
. * The next requires Stata add-on. In Stata: search rndpoix
. rndpoix(mupoiss)
( Generating ......... )
Variable xp created.
. gen y = xp
. sum
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          x2 |       100    .0053689    1.000686  -2.173506   2.106561
          x3 |       100   -.0235884    1.024207  -2.857666   2.149822
     mupoiss |       100    2.020511    1.400564   .3380426   7.029678
          xp |       100        1.92    1.835013          0          8
           y |       100        1.92    1.835013          0          8
.
. * Write data to a text (ascii) file so can use with programs other than Stata
. outfile y x2 x3 using mma08p3diagnostics.asc, replace
.
. ********* SETUP FOR THIS PROGRAM **********
.
. * Change this if want different regressors
. gen x3sq = x3*x3
. * global XLIST x3 /* Model 1 */
. global XLIST x3 x3sq /* Model 2 */
.
. ********* R-SQUARED (reported in Table 8.3 p.291) **********
.
. * The following code can be changed to different models than poisson
. * For RsqRES, RsqEXP and RsqCOR need
. *   y        dependent variable
. *   yhat     predicted value of dependent variable
. * For RsqWRSS additionally need
. *   sigmasq  predicted variance of dependent variable
. * For RsqRG need log density evaluated at values given below
.
. * Obtain exp(x'b) Will vary with the model
. poisson y $XLIST
Iteration 0: log likelihood = -176.09611
Iteration 1: log likelihood = -176.09119
Iteration 2: log likelihood = -176.09119
Poisson regression                                Number of obs   =        100
                                                  LR chi2(2)      =      30.96
                                                  Prob > chi2     =     0.0000
Log likelihood = -176.09119                       Pseudo R2       =     0.0808

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x3 |   .3588412     .07035     5.10   0.000     .2209578    .4967245
        x3sq |   .0912999   .0514311     1.78   0.076    -.0095032    .1921029
       _cons |    .492656   .0958903     5.14   0.000     .3047144    .6805975
------------------------------------------------------------------------------
. predict yhat
(option n assumed; predicted number of events)
. scalar dof = e(N)-e(k)
.
. * RsqRES and RsqEXP are R-squared from sums of squares
. * First get TSS, ESS and RSS
. egen ybar = mean(y)
. gen ylessybarsq = (y - ybar)^2
. quietly sum ylessybarsq
. scalar totalss = r(mean)
. gen yhatlessybarsq = (yhat - ybar)^2
. quietly sum yhatlessybarsq
. scalar explainedss = r(mean)
. gen residualsq = (y - yhat)^2
. quietly sum residualsq
. scalar residualss = r(mean)
. * Second compute the R-squareds
. scalar sereg = sqrt(residualss/dof)
. scalar RsqRES = 1 - residualss/totalss
. scalar RsqEXP = explainedss/totalss
.
. * RsqCOR uses sample correlation
. quietly correlate y yhat
. scalar RsqCOR = r(rho)^2
.
. di "standard error of regression: " sereg
standard error of regression: .16620308
. di "totalss: " totalss _n "explainedss: " explainedss _n "residualss: " residualss
totalss: 3.3336
explainedss: .69556676
residualss: 2.6794761
. di "RsqRES: " RsqRES _n "RsqEXP: " RsqEXP _n "RsqCOR: " RsqCOR
RsqRES: .19622149
RsqEXP: .20865333
RsqCOR: .19640666
.
. * RsqWRSS uses weighted sums of squares
. * First generate estimated variance of y
. * Here for Poisson use fact that variance = mean
. gen sigmasq = yhat
. gen weightedylessybarsq = ((y - ybar)^2) / sigmasq
. quietly sum weightedylessybarsq
. scalar weightedtotalss = r(mean)
. gen weightedresidualsq = ((y - yhat)^2) / sigmasq
. quietly sum weightedresidualsq
. scalar weightedresidualss = r(mean)
. scalar RsqWRSS = 1 - weightedresidualss/weightedtotalss
. di "RsqWRSS: " RsqWRSS
RsqWRSS: .16945018
.
. * RsqRG is from ML. Difficult to generalize beyond LEF models.
. * Need
. * lnL_fit log-likelihood at fitted values (the usual)
. * lnL_0 log-likelihood at intercept only
. * lnL_max log-likelihood at best fit
. quietly poisson y $XLIST
. scalar lnL_fit = e(ll)
. scalar lnL_0 = e(ll_0)
. * The following applies only for Poisson. Differs for other models.
. * lnf(y) = -mu + y*ln(mu) - ln(y!)
. * is maximized at mu = y
. * so compute lnL_max = sum of [-y + y*ln(y) - ln(y!)]
. * Following sets 0*ln0 = 0
. gen ylny = 0
. replace ylny = y*ln(y) if y > 0
(51 real changes made)
. gen lnfyatmax = -y + ylny - lnfact(y)
. quietly sum lnfyatmax
. scalar lnL_max = r(sum)
. scalar RsqRG = (lnL_fit - lnL_0) / (lnL_max - lnL_0)
.
. * RsqQ should only be used for binary and other discrete choice models
. * And definitely use only if lnL_fit < 0
. scalar RsqQ = 1 - lnL_fit/lnL_0
.
. di "lnL_0: " lnL_0 _n "lnL_fit: " lnL_fit _n "lnL_max: " lnL_max
lnL_0: -191.57162
lnL_fit: -176.09119
lnL_max: -101.12402
. di "RsqRG: " RsqRG _n "RsqQ: " RsqQ
RsqRG: .17115358
RsqQ: .08080754
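RsqRG can equivalently be written in terms of deviances, where the deviance of a model is twice the gap between the maximum achievable log-likelihood and the fitted log-likelihood. The following is a minimal sketch, assuming the three log-likelihood values are copied by hand from the display above; the scalar names are hypothetical and are not part of the original program.

```stata
* Values copied from the display above
scalar L0   = -191.57162
scalar Lfit = -176.09119
scalar Lmax = -101.12402

* Deviance of the fitted model and of the intercept-only model
scalar dev_fit = 2*(Lmax - Lfit)
scalar dev_0   = 2*(Lmax - L0)

* One minus the deviance ratio coincides algebraically with RsqRG
scalar Rsq_dev = 1 - dev_fit/dev_0
scalar Rsq_rg  = (Lfit - L0)/(Lmax - L0)
di "1 - D_fit/D_0 = " Rsq_dev "   RsqRG = " Rsq_rg
```

The deviance form makes the analogy with 1 - RSS/TSS in the linear model explicit.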
.
. * Check
. sum
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          x2 |       100    .0053689    1.000686  -2.173506   2.106561
          x3 |       100   -.0235884    1.024207  -2.857666   2.149822
     mupoiss |       100    2.020511    1.400564   .3380426   7.029678
          xp |       100        1.92    1.835013          0          8
           y |       100        1.92    1.835013          0          8
-------------+--------------------------------------------------------
        x3sq |       100    1.039067    1.446146   .0000877   8.166255
        yhat |       100        1.92     .838208   1.150405   5.398193
        ybar |       100        1.92           0       1.92       1.92
 ylessybarsq |       100      3.3336    5.966374      .0064    36.9664
yhatlessyb~q |       100    .6955668    1.572256   4.82e-06   12.09783
-------------+--------------------------------------------------------
  residualsq |       100    2.679476    4.830379   .0000825   36.93972
     sigmasq |       100        1.92     .838208   1.150405   5.398193
weightedyl~q |       100    1.681324    2.560112   .0018502   19.23135
weightedre~q |       100    1.396423    2.424518   .0000276   19.21747
        ylny |       100     2.15694     3.48234          0   16.63553
-------------+--------------------------------------------------------
   lnfyatmax |       100    -1.01124    .6233793  -1.969071          0
. poisson y $XLIST /* Stata Rsq = RsqQ */
Iteration 0: log likelihood = -176.09611
Iteration 1: log likelihood = -176.09119
Iteration 2: log likelihood = -176.09119
Poisson regression                                Number of obs   =        100
                                                  LR chi2(2)      =      30.96
                                                  Prob > chi2     =     0.0000
Log likelihood = -176.09119                       Pseudo R2       =     0.0808

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x3 |   .3588412     .07035     5.10   0.000     .2209578    .4967245
        x3sq |   .0912999   .0514311     1.78   0.076    -.0095032    .1921029
       _cons |    .492656   .0958903     5.14   0.000     .3047144    .6805975
------------------------------------------------------------------------------
.
. *** The following results are for Model 2 in Table 8.3 p.291
. *** For model 1 R-squareds need to rerun with only x3 as regressor
. di "standard error of regression: " sereg
standard error of regression: .16620308
. di "RsqRES: " RsqRES _n "RsqEXP: " RsqEXP _n "RsqCOR: " RsqCOR
RsqRES: .19622149
RsqEXP: .20865333
RsqCOR: .19640666
. di "RsqWRSS: " RsqWRSS _n "RsqRG: " RsqRG _n "RsqQ: " RsqQ
RsqWRSS: .16945018
RsqRG: .17115358
RsqQ: .08080754
.
. ********* RESIDUAL ANALYSIS (text bottom p.290 to top p.291) **********
.
. * Assume that from earlier have yhat
.
. * raw residual
. gen raw = y - yhat
. gen sigma = sqrt(yhat)
. gen Pearson = (y - yhat)/sigma
. * Note that earlier defined ylny = 0 if y=0 and = yln(y) otherwise
. gen deviance = sign(y-yhat)*sqrt(2*(-y+ylny)-2*(-yhat+y*ln(yhat)))
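The deviance residual is constructed so that its squares sum to the model deviance, 2*(lnL_max - lnL_fit). The following is a minimal sketch on a small made-up data set: the values of y and mu are hypothetical (they are not the simulated data used above), and all variable and scalar names are illustrative. It begins with clear, so it is meant for a fresh session.

```stata
clear
input y mu
0 0.8
1 1.2
3 2.0
5 4.1
end

* y*ln(y) with the convention 0*ln(0) = 0
gen double ylny = cond(y > 0, y*ln(y), 0)

* Pearson and deviance residuals for the Poisson
gen double pearson  = (y - mu)/sqrt(mu)
gen double deviance = sign(y - mu)*sqrt(2*(-y + ylny) - 2*(-mu + y*ln(mu)))

* Log-density contributions, omitting -ln(y!) which cancels in the difference
gen double lnf_max = -y + ylny
gen double lnf_fit = -mu + y*ln(mu)
gen double devsq   = deviance^2

quietly summarize devsq
scalar sumdevsq = r(sum)
quietly summarize lnf_max
scalar l_max = r(sum)
quietly summarize lnf_fit
scalar l_fit = r(sum)

* Sum of squared deviance residuals equals 2*(lnL_max - lnL_fit)
assert reldif(sumdevsq, 2*(l_max - l_fit)) < 1e-10
list y mu pearson deviance
```

The Pearson residual, by contrast, only standardizes the raw residual by the fitted standard deviation, so its squares sum to the Pearson statistic rather than the deviance.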
.
. *** The following are results reported in text bottom p.290 to top p.291
. sum raw Pearson deviance
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         raw |       100   -2.38e-09    1.645157  -2.993904   6.077806
     Pearson |       100   -.0014455    1.187656  -1.498094   4.383774
    deviance |       100   -.2103819    1.212345  -2.016939   3.264961
. corr raw Pearson deviance
(obs=100)
             |      raw  Pearson deviance
-------------+---------------------------
         raw |   1.0000
     Pearson |   0.9852   1.0000
    deviance |   0.9625   0.9818   1.0000

. * Example of use to find whether x3 belongs in the model


. * graph twoway scatter Pearson x3
.
. ********** CLOSE OUTPUT
. log close
log: c:\Imbook\bwebpage\Section2\mma08p3diagnostics.txt
log type: text
closed on: 17 May 2005, 14:10:13

-----------------------------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section2\mma09p1np.txt
log type: text
opened on: 17 May 2005, 14:16:51
.
. ********** OVERVIEW OF MMA09P1NP.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 9.2 p.295-297
. * Nonparametric density estimation and nonparametric regression using actual data.
.
. * (1) Histogram: Figure 9.1 in chapter 9.2.1 (ch9hist)
. * (2) Kernel density estimate as bandwidth varies: Figure 9.2 in chapter 9.2.1 (ch9kd1)
. * (3) Kernel density estimate as kernel varies: Figure 9.4 in chapter 9.3.4 (ch9kdensu1)
. * (4) Lowess regression: Figure 9.3 in chapter 9.4.3 (ch9ksm1)
. * (5) Extra: Nearest neighbours regression: using Lowess and using add-on knnreg
. * (6) Extra: Kernel regression: using add-on kernreg
.
. * using data on earnings and education (see below)
.
. * NOTE: This particular program uses version 8.2 rather than 8.0
. *       For kernel density Stata uses an alternative formulation of Epanechnikov
. *       To follow book and e.g. Hardle (1990) use epan2 rather than epan
. *       epan = epan2 if epan bandwidth is epan2 bandwidth divided by sqrt(5)
. *       where kernel epan2 is an update to Stata version 8.2
.
. * To run this program you need file
. * psidf3050.dat
. * in your directory
.
. * To do (5) and (6) you need Stata add-ons knnreg and kernreg
. * In Stata give command search knnreg and search kernreg
.
. * See also mma09p2npmore.do for more on nonparametric regression (Figures 9.5-9.7)
.
. ********** SETUP
.
. di "mma09p1np.do Cameron and Trivedi: Stata nonparametrics with wages and education"
mma09p1np.do Cameron and Trivedi: Stata nonparametrics with wages and education
. set more off
. version 8
. set scheme s1mono /* Graphics scheme */
.
. ********** DATA DESCRIPTION
. *
. * The original data are from the PSID Individual Level Final Release 1993 data
. * From www.isr.umich.edu/src/psid then choose Data Center
. * 4856 observations on 9 variables for Females 30 to 50 years
.
. * Fixed width data
. * intnum    1-4    V30001="1968 INTERVIEW NUMBER"
. * persnum   5-7    V30002="PERSON NUMBER"
. * age       8-9    V30809="AGE OF INDIVIDUAL 93"
. * educatn   10-11  V30820="G90 HIGHEST GRADE COMPLETED 93"
. * earnings  12-17  V30821="TOTAL LABOR INCOME 93"
. * hours     18-21  V30823="1992 ANNUAL WORK HOURS 93"
. * sex       22     V32000="SEX OF INDIVIDUAL"
. * kids      23-24  V32022="# LIVE BIRTHS TO THIS INDIVIDUAL"
. * [NOTE: DO NOT USE THE kids VARIABLE AS IT IS NUMBER OF BIRTHS
. *  NOT NUMBER OF KIDS CURRENTLY IN HOUSEHOLD]
. * married   25     V32049="LAST KNOWN MARITAL STATUS"
.
. ********** READ DATA **********
.
. * Data are fixed format so use infix
. infix intnum 1-4 persnum 5-7 age 8-9 educatn 10-11 earnings 12-17 /*
> */ hours 18-21 sex 22 kids 23-24 married 25 using psidf3050.dat
(4856 observations read)
. summarize
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      intnum |      4856    4598.101    2761.971          4       9306
     persnum |      4856    59.21355    79.74856          1        205
         age |      4856    38.46293    5.595116         30         50
     educatn |      4855    16.37714     18.4495          0         99
    earnings |      4856    14244.51    15985.45          0     240000
-------------+--------------------------------------------------------
       hours |      4856    1235.335    947.1758          0       5160
         sex |      4856           2           0          2          2
        kids |      4856     4.48126    14.88786          0         99
     married |      4856    1.920717    1.504848          1          9
.
. ********** MISSING VALUES, DATA TRANSFORMATIONS and SAMPLE SELECTION
.
. * For Highest grade codes the missing codes are 98 DK and 99 NA and 0 inappropriate
. * Here treat these as missing
. replace educatn = . if (educatn==0 | educatn==98 | educatn==99)
(290 real changes made, 290 to missing)

.
. * For marital status the codes are
. * 1 married; 2 Never married; 3 Widowed; 4 Divorced, annulment;
. * 5 Separated; 8 NA / DK; 9 No histories 85-93
. * Recode 2-5 as not married and treat 8 and 9 as missing
. replace married = . if (married==8 | married==9)
(52 real changes made, 52 to missing)
. replace married = 0 if married > 1
(1785 real changes made)
.
. * For kids the missing codes are 98 DK/NA and 99 no birth history
. replace kids = . if (kids==98 | kids==99)
(118 real changes made, 118 to missing)
. * But do not use these data as it is number of births
. * not number of kids currently in household
. * So I drop kids
. drop kids
.
. * Work with positive earnings only
. drop if earnings==0
(1204 observations deleted)
. * Topcode women with very high earnings
. replace earnings=100000 if earnings>100000
(11 real changes made)
. * Create log hourly wage
. gen hwage = earnings/hours
. gen lnhwage = ln(hwage)
.
. * Work with age 36 and nonmissing education data
. keep if age == 36
(3468 observations deleted)
. drop if educatn == .
(7 observations deleted)
. summarize
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      intnum |       177    4699.853    2765.081         14       9240
     persnum |       177    59.53672    79.73001          1        188
         age |       177          36           0         36         36
     educatn |       177    12.58757    2.841347          3         17
    earnings |       177    17470.55    13513.56         87      70000
-------------+--------------------------------------------------------
       hours |       177    1506.401    698.4145          8       3160
         sex |       177           2           0          2          2
     married |       177    .7457627    .4366669          0          1
       hwage |       177    12.71631    16.58889   .6837607        175
     lnhwage |       177    2.198163    .8281614  -.3801473   5.164786
.
. * Write data to a text (ascii) file so can use with programs other than Stata
. outfile intnum persnum age educatn earnings hours sex married hwage /*
> */ lnhwage using mma09p1np.asc, replace
.
. ********* ANALYSIS: (1)-(3) NONPARAMETRIC DENSITY ESTIMATES
.
. set scheme s1mono
.
. * Here give bin width for histogram and kdensity
.
. * Calculate Silverman's plug-in estimate of optimal bandwidth in (9.13)
. * with delta given in Table 9.1 for Epanechnikov kernel
. quietly sum lnhwage, detail
. global sadj = min(r(sd),(r(p75)-r(p25))/1.349)
. di "sadj: " $sadj " iqr/1349: " (r(p75)-r(p25))/1.349 " stdev: " r(sd)
sadj: .65488184 iqr/1349: .65488184 stdev: .82816143
. global bwepan2 = 1.3643*1.7188*$sadj/(r(N)^0.2)
. di "Bandwidth: " $bwepan2
Bandwidth: .54538542
.
. * HISTOGRAM ONLY - Figure 9.1
. graph twoway (histogram lnhwage, bin(20) bcolor(*.2)), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Histogram for Log Wage") /*
> */ xtitle("Log Hourly Wage", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Density", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(10) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Histogram") label(2 "Kernel"))
. graph save ch9hist, replace
(file ch9hist.gph saved)
. graph export ch9hist.wmf, replace
(file c:\Imbook\bwebpage\Section2\ch9hist.wmf written in Windows Metafile format)

.
. * COMBINED HISTOGRAM AND KERNEL DENSITY ESTIMATE
. graph twoway (histogram lnhwage, bin(20) bcolor(*.2)) /*
> */ (kdensity lnhwage, width($bwepan2) epan2 clstyle(p1)), /*
> */ title("Histogram and Kernel Density for Log Wage") /*
> */ caption("Note: Kernel is Epanechnikov with bandwidth 0.55")
.
. * KERNEL DENSITY ESTIMATE FOR 3 BANDWIDTHS - Figure 9.2
. global bwonehalf = 0.5*$bwepan2
. global btwotimes = 2*$bwepan2
. graph twoway (kdensity lnhwage, width($bwonehalf) epan2 clstyle(p2)) /*
> */ (kdensity lnhwage, width($bwepan2) epan2 clstyle(p1)) /*
> */ (kdensity lnhwage, width($btwotimes) epan2 clstyle(p3)), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Density Estimates as Bandwidth Varies") /*
> */ xtitle("Log Hourly Wage", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Kernel density estimates", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(1) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "One-half plug-in") label(2 "Plug-in") /*
> */ label(3 "Two times plug-in"))
. graph save ch9kd1, replace
(file ch9kd1.gph saved)
. graph export ch9kd1.wmf, replace
(file c:\Imbook\bwebpage\Section2\ch9kd1.wmf written in Windows Metafile format)
.
. * KERNEL DENSITY ESTIMATE FOR 4 DIFFERENT KERNELS - Figure 9.4
. * Calculate Silverman's plug-in optimal bandwidths using (9.13)
. * with delta given in Table 9.1 for the different kernels
.
. * Use sadj calculated earlier for Epanechnikov
. global bwgauss = 1.3643*0.7764*$sadj/(_N^0.2)
. global bwbiweight = 1.3643*2.0362*$sadj/(_N^0.2)
. global bwrectang = 0.5*1.3643*1.3510*$sadj/(_N^0.2)
. di "Usual Epanechnikov (epan2): " $bwepan2
Usual Epanechnikov (epan2): .54538542
. di "Gaussian: " $bwgauss
Gaussian: .24635632
. di "Quartic or biweight: " $bwbiweight
Quartic or biweight: .64609832
. di "Uniform or rectangular: " $bwrectang
Uniform or rectangular: .21434015
. graph twoway (kdensity lnhwage, width($bwepan2) epan2) /*
> */ (kdensity lnhwage, width($bwgauss) gauss) /*
> */ (kdensity lnhwage, width($bwbiweight) biweight) /*
> */ (kdensity lnhwage, width($bwrectang) rectangle), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Density Estimates as Kernel Varies") /*
> */ xtitle("Log Hourly Wage", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Kernel density estimates", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(3) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Epanechnikov (h=0.545)") label(2 "Gaussian (h=0.246)") /*
> */ label(3 "Quartic (h=0.646)") label(4 "Uniform (h=0.214)"))
. graph save ch9kdensu1, replace
(file ch9kdensu1.gph saved)
. graph export ch9kdensu1.wmf, replace
(file c:\Imbook\bwebpage\Section2\ch9kdensu1.wmf written in Windows Metafile format)
.
. * SHOW THAT STATA EPANECHNIKOV = USUAL EPANECHNIKOV
. * Once divide usual Epanechnikov bandwidth by sqrt(5).
. * (Pagan and Ullah (1999, p.28) have formulae.)
. global bwepan = $bwepan2/sqrt(5)
. graph twoway (kdensity lnhwage, width($bwepan2) epan2) /*
> */ (kdensity lnhwage, width($bwepan) epan), /*
> */ title("Epan = Epan2 if bandwidth adjusted") /*
> */ legend( label(1 "Usual Epanechnikov") label(2 "Stata Epanechnikov"))
.
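The claim that epan with bandwidth h/sqrt(5) reproduces epan2 with bandwidth h can also be checked numerically at a single point. The following is a minimal sketch, not part of the original program: it uses the two kernel formulas coded in mma09p3kernels.do, and the scalar names and the evaluation distance u = 0.3 are illustrative assumptions.

```stata
* Hypothetical check: kernel weight K(u/h)/h at distance u = x - x0
scalar h2 = 0.54538542          /* plug-in bandwidth for usual Epanechnikov (epan2) */
scalar h1 = scalar(h2)/sqrt(5)  /* rescaled bandwidth for Stata's epan              */
scalar u  = 0.3                 /* arbitrary distance with |u| < h2                 */
* epan2: K(z) = (3/4)*(1 - z^2) for |z| < 1
scalar w2 = (3/4)*(1 - (scalar(u)/scalar(h2))^2)/scalar(h2)
* epan:  K(z) = (3/4)*(1 - z^2/5)/sqrt(5) for |z| < sqrt(5)
scalar w1 = (3/4)*(1 - ((scalar(u)/scalar(h1))^2)/5)/sqrt(5)/scalar(h1)
di "epan2 weight: " scalar(w2) "   epan weight: " scalar(w1)
```

The two weights agree algebraically because (u/h1)^2/5 = (u/h2)^2 and sqrt(5)*h1 = h2, so any difference in the displayed values should be rounding error only.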
.
. ********* ANALYSIS: (4) LOWESS NONPARAMETRIC REGRESSION ESTIMATES
.
. * LOWESS WITH DEFAULT BANDWIDTH of 0.8
. lowess lnhwage educatn
.
. * LOWESS REGRESSION WITH BANDWIDTHS of 0.1, 0.4 and 0.8 - Figure 9.3
. graph twoway (scatter lnhwage educatn, msize(medsmall) msymbol(o)) /*
> */ (lowess lnhwage educatn, bwidth(0.8) clstyle(p2)) /*
> */ (lowess lnhwage educatn, bwidth(0.4) clstyle(p1)) /*
> */ (lowess lnhwage educatn, bwidth(0.1) clstyle(p3)), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Nonparametric Regression as Bandwidth Varies") /*
> */ xtitle("Years of Schooling", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Log Hourly Wage", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(12) ring(0) col(2)) legend(size(small)) /*
> */ legend( label(1 "Actual data") label(2 "Bandwidth h=0.8") /*
> */ label(3 "Bandwidth h=0.4") label(4 "Bandwidth h=0.1"))
. graph save ch9ksm1, replace
(file ch9ksm1.gph saved)
. graph export ch9ksm1.wmf, replace
(file c:\Imbook\bwebpage\Section2\ch9ksm1.wmf written in Windows Metafile format)
.
. ********* ANALYSIS: (5) EXTRA: K-NEAREST NEIGHBORS NONPARAMETRIC REGRESSION
.
. * NEAREST NEIGHBOURS REGRESSION USING LOWESS
. * Use lowess with mean and noweight options to give running means = centered kNN
. global knnbwidth = 0.3
. di "knn via Lowess uses following % of sample: " $knnbwidth
knn via Lowess uses following % of sample: .3
. lowess lnhwage educatn, bwidth($knnbwidth) mean noweight
.
. * LOWESS COMPARED TO NEAREST NEIGHBOURS
. graph twoway (lowess lnhwage educatn, bwidth(0.3) mean noweight) /*
> */ (lowess lnhwage educatn, bwidth(0.3)), /*
> */ title("Centered kNN versus Lowess") /*
> */ legend( label(1 "Centered kNN") label(2 "Lowess 0.3"))
.
. * NEAREST NEIGHBOURS REGRESSION USING KNNREG COMPARED TO USING LOWESS
. * knnreg is a Stata add-on (in Stata search knnreg to find and download)
. * Here we verify that same as lowess knn except knnreg drops endpoints
. global k = round($knnbwidth*_N)
. di "knnreg uses following number of neighbours: " $k
knnreg uses following number of neighbours: 53
. knnreg lnhwage educatn, k($k) gen(knnregpred) ylabel nograph
. lowess lnhwage educatn, bwidth($knnbwidth) gen(knnlowesspred) mean noweight nograph
. * Following shows that the same except knnreg drops endpoints and lowess does not
. sum knnlowesspred knnregpred
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
knnlowessp~d |       177    2.180308    .4522163   1.475512   2.954416
  knnregpred |       125    2.184309    .3412013   1.529874   2.802865
. corr knnlowesspred knnregpred
(obs=125)
             | knnlow~d knnreg~d
-------------+------------------
knnlowessp~d |   1.0000
  knnregpred |   1.0000   1.0000

.
. ********* ANALYSIS: (6) EXTRA: KERNEL NONPARAMETRIC REGRESSION
.
. * KERNEL REGRESSION
. * Kercode 1 = Uniform; 2 = Triangle; 3 = Epanechnikov; 4 = Quartic (Biweight);
. *         5 = Triweight; 6 = Gaussian; 7 = Cosinus
. * bwidth(#) defines width of the weight function window around each grid point.
. * npoint(#) specifies the number of equally spaced grid points over range of x.
. * Here bwidth(3) gives e.g. positive weight from x=4 to x=10 if current x0=7
. kernreg lnhwage educatn, bwidth(3) kercode(3) npoint(100) ylabel gen(kernregpred1 xkernreg)
. graph twoway (lowess lnhwage educatn, bwidth(0.5) clstyle(p2)) /*
> */ (line kernregpred xkernreg, clstyle(p1)), /*
> */ title("Lowess versus kernel regression") /*
> */ legend( label(1 "Lowess") label(2 "Kernreg"))
.
. ********** CLOSE OUTPUT
. log close
log: c:\Imbook\bwebpage\Section2\mma09p1np.txt
log type: text
closed on: 17 May 2005, 14:17:05
-----------------------------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section2\mma09p2npmore.txt
log type: text
opened on: 17 May 2005, 14:17:35
.
. ********** OVERVIEW OF MMA09P2NPMORE.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 9.4-9.5 (pages 307-19)
. * More on nonparametric regression, including Figures 9.5 - 9.7
.
. * It provides
. * (1) Nonparametric regression
. *     k-nearest neighbors regression: Figure 9.5 in chapter 9.4.2 (ch9ksmma)
. *     Lowess regression: Figure 9.6 in chapter 9.4.3 (ch9ksmlowess)
. *     Kernel regression (using Stata add-on kernreg)
. * (2) Nonparametric derivative estimation
. *     Figure 9.7 in chapter 9.5.5 (ch9kderiv)
. * (3) Cross-validation - still incomplete
. * using generated data (see below)
.
. * See also mma09p1np.do for nonparametric density estimation and regression
.
. * This program uses free Stata add-on command kernreg
. * To obtain in Stata give command search kernreg
.
. ********** SETUP **********
.
. di "mma09p2npmore.do Cameron and Trivedi: Stata nonparametrics with generated data"
mma09p2npmore.do Cameron and Trivedi: Stata nonparametrics with generated data
. set more off
. version 8.0
. set scheme s1mono /* Graphics scheme */
.
. ********** GENERATE DATA **********
.
. * Model is y = 150 + 6.5*x - 0.15*x^2 + 0.001*x^3 + u
. * where u ~ N[0, 25^2]
. *       x = 1, 2, 3, ... , 100
. *       e ~ N[0, 2^2]
.
. set seed 10101
. set obs 100
obs was 0, now 100
. gen u = 25*invnorm(uniform())
. gen x = _n
. gen y = 150 + 6.5*x - 0.15*x^2 + 0.001*x^3 + u
. sum
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           u |       100    2.809606    25.26291  -71.97334   73.59318
           x |       100        50.5    29.01149          1        100
           y |       100    228.5596    35.25377   132.2952   345.5873
.
. * Write data to a text (ascii) file so can use with programs other than Stata
. outfile y x using mma09p2npmore.asc, replace
.
. ******** PARAMETRIC REGRESSION **********
.
. * OLS regression on cubic polynomial
. gen xsquared = x^2
. gen xcubed = x^3
. reg y x xsquared xcubed
      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  3,    96) =   31.15
       Model |  60691.6801     3    20230.56           Prob > F      =  0.0000
    Residual |  62348.2994    96  649.461452           R-squared     =  0.4933
-------------+------------------------------           Adj R-squared =  0.4774
       Total |   123039.98    99  1242.82808           Root MSE      =  25.485

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   6.055295   .9033915     6.70   0.000     4.262077    7.848513
    xsquared |  -.1402283   .0207284    -6.77   0.000    -.1813738   -.0990828
      xcubed |   .0009492   .0001349     7.03   0.000     .0006814    .0012171
       _cons |   155.1521   10.58835    14.65   0.000     134.1344    176.1698
------------------------------------------------------------------------------
. predict ycubic
(option xb assumed; fitted values)
. summarize y ycubic
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           y |       100    228.5596    35.25377   132.2952   345.5873
      ycubic |       100    228.5596    24.75979   161.0681   307.6293
.
. ******** (1) NONPARAMETRIC REGRESSION **********
.
. * K-NEAREST NEIGHBORS REGRESSION - FIGURE 9.5
. * lowess with the mean and noweight options gives running mean = moving average = centered kNN
. * Here _N = 100 so bwidth = 0.05 gives 100*0.05 = 5 nearest neighbours
. graph twoway (scatter y x, msize(medsmall) msymbol(o)) /*
> */ (lowess y x, mean noweight bwidth(0.05) clstyle(p1)) /*
> */ (lfit y x, clstyle(p3)) /*
> */ (lowess y x, mean noweight bwidth(0.25) clstyle(p2)), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("k-Nearest Neighbours Regression as k Varies") /*
> */ xtitle("Regressor x", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Dependent variable y", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(12) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Actual Data") label(2 "kNN (k=5)") /*
> */ label(3 "Linear OLS") label(4 "kNN (k=25)"))
. graph save ch9ksmma, replace
(file ch9ksmma.gph saved)
. graph export ch9ksmma.wmf, replace
(file c:\Imbook\bwebpage\Section2\ch9ksmma.wmf written in Windows Metafile format)
.
. * VERIFY THAT kNN SAME AS MOVING AVERAGE
. * Do moving average by hand for k = 5
. gen yma5 = (y[_n-2] + y[_n-1] + y + y[_n+1] + y[_n+2])/5
(4 missing values generated)
. replace yma5 = (y[_n]+y[_n+1]+y[_n+2])/3 if _n==1
(1 real change made)
. replace yma5 = (y[_n-1]+y[_n]+y[_n+1]+y[_n+2])/4 if _n==2
(1 real change made)
. replace yma5 = (y[_n+1]+y[_n]+y[_n-1]+y[_n-2])/4 if _n==99
(1 real change made)
. replace yma5 = (y[_n]+y[_n-1]+y[_n-2])/3 if _n==100
(1 real change made)
. lowess y x, mean noweight bwidth(0.05) nogr gen(yknn5)
. sum yma5 yknn5
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        yma5 |       100    228.6037    26.63323   157.1434   297.4832
       yknn5 |       100    228.6037    26.63323   157.1434   297.4832
.
. * LOWESS REGRESSION - FIGURE 9.6
. graph twoway (scatter y x, msize(medsmall) msymbol(o)) /*
> */ (lowess y x, bwidth(0.25) clstyle(p1)) /*
> */ (line ycubic x, clstyle(p3)), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Lowess Nonparametric Regression") /*
> */ xtitle("Regressor x", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Dependent variable y", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(12) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Actual Data") label(2 "Lowess (k=25)") /*
> */ label(3 "OLS Cubic Regression") )
. graph save ch9ksmlowess, replace
(file ch9ksmlowess.gph saved)
. graph export ch9ksmlowess.wmf, replace
(file c:\Imbook\bwebpage\Section2\ch9ksmlowess.wmf written in Windows Metafile format)
.
. * KERNEL REGRESSION COMPARED TO k NEAREST NEIGHBORS REGRESSION
. * For this artificial example (with equally spaced x)
. * knn = kernel regression using uniform kernel
. * Kercode 1 = Uniform; 2 = Triangle; 3 = Epanechnikov; 4 = Quartic (Biweight);
. *         5 = Triweight; 6 = Gaussian; 7 = Cosinus
. * bwidth(#) defines width of the weight function window around each grid point.
. * npoint(#) specifies the number of equally spaced grid points over range of x.
. * Here bwidth(12) gives e.g. positive weight from x=15 to x=39 if current x=27
. kernreg y x, bwidth(12) kercode(1) npoint(100) ylabel gen(pykernreg xkernreg)
. lowess y x, mean noweight bwidth(0.25) gen(yknn25)
. sum pykernreg yknn25
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
   pykernreg |       100    228.6856    18.75275   181.1579   272.5488
      yknn25 |       100    228.6856    18.75275   181.1578   272.5488
.
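With equally spaced x, both estimators reduce to an unweighted mean of y over the window from x0 - 12 to x0 + 12. The following is a minimal sketch of that calculation at one interior point. It is not part of the original program; the evaluation point x0 = 27 is an arbitrary choice, and the code assumes the data are still in their original order so that observation 27 has x = 27.

```stata
* Hypothetical by-hand check at x0 = 27 (assumes x = 1,...,100 in observation order)
quietly summarize y if abs(x - 27) <= 12
di "Uniform-window mean at x0=27: " r(mean) "   points used: " r(N)
di "lowess running mean at x0=27: " yknn25[27]
```

If the two windows coincide, as the matching summary statistics above suggest, the two displayed means should agree up to rounding.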
. ******** (2) DERIVATIVE ESTIMATION **********
.
. * DERIVATIVE ESTIMATION - FIGURE 9.7
. * Here use Lowess regression
. lowess y x, xlab ylab bwidth(0.25) lowess nogr gen(yplowess)
. * Need to first sort data on regressor if data on regressor are not ordered
. sort x
. gen dydxlowess = (yplowess - yplowess[_n-1])/(x - x[_n-1])
(1 missing value generated)
. * And do the same for the earlier fitted cubic
. gen dydxcubic = (ycubic - ycubic[_n-1])/(x - x[_n-1])
(1 missing value generated)
. graph twoway (line dydxlowess x, clstyle(p1)) /*
> */ (line dydxcubic x, clstyle(p3)), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Nonparametric Derivative Estimation") /*
> */ xtitle("Regressor x", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Dependent variable y", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(12) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "From Lowess (k=25)") /*
> */ label(2 "From OLS Cubic Regression") )
. graph save ch9kderiv, replace
(file ch9kderiv.gph saved)
. graph export ch9kderiv.wmf, replace
(file c:\Imbook\bwebpage\Section2\ch9kderiv.wmf written in Windows Metafile format)
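For the cubic fit the derivative is also available analytically, which gives a reference for the finite-difference series dydxcubic. The following is a minimal sketch under the assumption that the variables y, x, xsquared and xcubed created earlier are still in memory; the variable name dydxanalytic is hypothetical, and the regression is re-run only so that the coefficients in _b[] are certain to be those of the cubic.

```stata
* Hypothetical: analytic slope of the fitted cubic, b1 + 2*b2*x + 3*b3*x^2
quietly regress y x xsquared xcubed
gen dydxanalytic = _b[x] + 2*_b[xsquared]*x + 3*_b[xcubed]*x^2
summarize dydxcubic dydxanalytic
```

The backward difference (ycubic - ycubic[_n-1])/(x - x[_n-1]) approximates the slope near the midpoint of each interval, so the two series should be close but not identical.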
.
. ******** (3) CROSS-VALIDATION [PRELIMINARY] **********
.
. /* The following does not work.
> I need to figure out use of macros */
.
. forvalues i = 5/25 {
2. scalar bd`i' = 0.01*`i'
3. global bw`i' = bd`i'
4. lowess y x, mean noweight bwidth($bw`i') gen(py`i') nogr
5. gen cv`i' = sum(3/2*(y-py`i')^2)
6. }
. sum
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           u |       100    2.809606    25.26291  -71.97334   73.59318
           x |       100        50.5    29.01149          1        100
           y |       100    228.5596    35.25377   132.2952   345.5873
    xsquared |       100      3383.5    3024.356          1      10000
      xcubed |       100      255025    289320.7          1    1000000
-------------+--------------------------------------------------------
      ycubic |       100    228.5596    24.75979   161.0681   307.6293
        yma5 |       100    228.6037    26.63323   157.1434   297.4832
       yknn5 |       100    228.6037    26.63323   157.1434   297.4832
   pykernreg |       100    228.6856    18.75275   181.1579   272.5488
    xkernreg |       100        50.5    29.01149          1        100
-------------+--------------------------------------------------------
      yknn25 |       100    228.6856    18.75275   181.1578   272.5488
    yplowess |       100    228.6494    25.46305   156.8217   302.5474
  dydxlowess |        99    1.471977     2.20262  -1.953159   6.964434
   dydxcubic |        99    1.480416    2.100452  -.8495026   6.342957
         py5 |       100    228.0408    8.046055   217.6967   243.0812
-------------+--------------------------------------------------------
         cv5 |       100    84655.13     34359.8   10940.13   162417.9
         py6 |       100    228.0408    8.046055   217.6967   243.0812
         cv6 |       100    84655.13     34359.8   10940.13   162417.9
         py7 |       100    228.0408    8.046055   217.6967   243.0812
         cv7 |       100    84655.13     34359.8   10940.13   162417.9
-------------+--------------------------------------------------------
         py8 |       100    228.0408    8.046055   217.6967   243.0812
         cv8 |       100    84655.13     34359.8   10940.13   162417.9
         py9 |       100    228.0408    8.046055   217.6967   243.0812
         cv9 |       100    84655.13     34359.8   10940.13   162417.9
        py10 |       100    228.0408    8.046055   217.6967   243.0812
-------------+--------------------------------------------------------
        cv10 |       100    84655.13     34359.8   10940.13   162417.9
        py11 |       100    228.0408    8.046055   217.6967   243.0812
        cv11 |       100    84655.13     34359.8   10940.13   162417.9
        py12 |       100    228.0408    8.046055   217.6967   243.0812
        cv12 |       100    84655.13     34359.8   10940.13   162417.9
-------------+--------------------------------------------------------
        py13 |       100    228.0408    8.046055   217.6967   243.0812
        cv13 |       100    84655.13     34359.8   10940.13   162417.9
        py14 |       100    228.0408    8.046055   217.6967   243.0812
        cv14 |       100    84655.13     34359.8   10940.13   162417.9
        py15 |       100    228.0408    8.046055   217.6967   243.0812
-------------+--------------------------------------------------------
        cv15 |       100    84655.13     34359.8   10940.13   162417.9
        py16 |       100    228.0408    8.046055   217.6967   243.0812
        cv16 |       100    84655.13     34359.8   10940.13   162417.9
        py17 |       100    228.0408    8.046055   217.6967   243.0812
        cv17 |       100    84655.13     34359.8   10940.13   162417.9
-------------+--------------------------------------------------------
        py18 |       100    228.0408    8.046055   217.6967   243.0812
        cv18 |       100    84655.13     34359.8   10940.13   162417.9
        py19 |       100    228.0408    8.046055   217.6967   243.0812
        cv19 |       100    84655.13     34359.8   10940.13   162417.9
        py20 |       100    228.0408    8.046055   217.6967   243.0812
-------------+--------------------------------------------------------
        cv20 |       100    84655.13     34359.8   10940.13   162417.9
        py21 |       100    228.0408    8.046055   217.6967   243.0812
        cv21 |       100    84655.13     34359.8   10940.13   162417.9
        py22 |       100    228.0408    8.046055   217.6967   243.0812
        cv22 |       100    84655.13     34359.8   10940.13   162417.9
-------------+--------------------------------------------------------
        py23 |       100    228.0408    8.046055   217.6967   243.0812
        cv23 |       100    84655.13     34359.8   10940.13   162417.9
        py24 |       100    228.0408    8.046055   217.6967   243.0812
        cv24 |       100    84655.13     34359.8   10940.13   162417.9
        py25 |       100    228.0408    8.046055   217.6967   243.0812
-------------+--------------------------------------------------------
        cv25 |       100    84655.13     34359.8   10940.13   162417.9
. * Then need to choose the `i' with minimum cv`i'
. * Problem here is that this gives e.g. $bw5 = 5 not 0.05
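One way around the macro problem is to hold the bandwidth in a local macro, or to use the brace form of the global so that the inner macro is expanded first. The sketch below takes a different route and is an assumption rather than the authors' intended method: it computes leave-one-out cross-validation for a centered running mean by hand, without lowess. It assumes x is equally spaced and the data can be sorted on x, as in this program. The names s, n, sqerr, cvmin and kbest and the range of half-widths 2 to 12 are hypothetical.

```stata
* Hypothetical sketch: leave-one-out CV for a centered running mean
* The half-width k uses up to k neighbours on each side, excluding the point itself
sort x
scalar cvmin = .
forvalues k = 2/12 {
    quietly gen double s = 0
    quietly gen n = 0
    forvalues j = 1/`k' {
        * add the neighbours j places to the left and right, where they exist
        quietly replace s = s + cond(_n-`j' >= 1, y[_n-`j'], 0)
        quietly replace s = s + cond(_n+`j' <= _N, y[_n+`j'], 0)
        quietly replace n = n + (_n-`j' >= 1) + (_n+`j' <= _N)
    }
    * squared error of the prediction that leaves out the observation itself
    quietly gen double sqerr = (y - s/n)^2
    quietly summarize sqerr
    di "half-width k = " `k' "   CV (mean squared error) = " r(mean)
    if r(mean) < scalar(cvmin) {
        scalar cvmin = r(mean)
        scalar kbest = `k'
    }
    drop s n sqerr
}
di "CV-minimizing half-width: " scalar(kbest)
```

Because each prediction excludes the observation being predicted, the criterion does not fall mechanically as the window shrinks, which is the point of cross-validation.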
.
. ********** CLOSE OUTPUT
. log close
log: c:\Imbook\bwebpage\Section2\mma09p2npmore.txt
log type: text
closed on: 17 May 2005, 14:17:43
-----------------------------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section2\mma09p3kernels.txt
log type: text
opened on: 18 May 2005, 21:31:55
.
. ********** OVERVIEW OF MMA09P3KERNELS.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * This program plots different kernel regression functions
. * This is not included in the book
. * There is no data
.
. * Results:
. * Epanstata is similar to Gaussian kernel. Less peaked than Epanechnikov.
. * Triangular, Quartic, Triweight and Tricubic are similar,
. * and are more peaked than Epanechnikov
. * The fourth order kernels can take negative values.
.
. * NOTE: For kernel density Stata uses an alternative formulation of Epanechnikov
. *       To follow book and e.g. Hardle (1990) use epan2
. *       (available in Stata version 8.2) rather than epan
.
. ********** SETUP **********
.
. di "mma09p3kernels.do Cameron and Trivedi: Stata Kernel Functions"
mma09p3kernels.do Cameron and Trivedi: Stata Kernel Functions
. set more off
. version 8.0
. set scheme s1mono /* Graphics scheme */
.
. ********** GENERATE DATA **********
.
. * Graphs will be for z = -2.5 to 2.5 in increments of 0.02
. set obs 251
obs was 0, now 251
. gen z = -2.52 + 0.02*_n
.
. ********** CALCULATE THE KERNELS **********
.
. * Indicator for |z| < 1
. gen abszltone = 1
. replace abszltone = 0 if abs(z)>=1
(152 real changes made)
.
. gen kuniform = 0.5*abszltone
.
. gen ktriangular = (1 - abs(z))*abszltone
.
. * Stata calls the usual Epanechnikov kernel epan2
. gen kepanechnikov = (3/4)*(1 - z^2)*abszltone
.
. * Stata uses alternative epanechnikov
. gen abszltsqrtfive = 1
. replace abszltsqrtfive = 0 if abs(z)>=sqrt(5)
(28 real changes made)
. gen kepanstata = (3/4)*(1 - (z^2)/5)/sqrt(5)*abszltsqrtfive
.
. gen kquartic = (15/16)*((1 - z^2)^2)*abszltone
.
. gen ktriweight = (35/32)*((1 - z^2)^3)*abszltone
.
. gen ktricubic = (70/81)*((1 - (abs(z))^3)^3)*abszltone
.
. gen kgaussian = normden(z)
.
. gen k4thordergauss = (1/2)*(3-(z^2))*normden(z)
.
. * This is the optimal 4th order - Pagan and Ullah p.57
. gen k4thorderquartic = (15/32)*(3 - 10*z^2 + 7*z^4)*abszltone
.
. sum
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           z |       251           0    1.452033       -2.5        2.5
   abszltone |       251    .3944223    .4897027          0          1
    kuniform |       251    .1972112    .2448514          0         .5
 ktriangular |       251    .1992032    .3058094          0          1
kepanechni~v |       251    .1991833    .2831384          0        .75
-------------+--------------------------------------------------------
abszltsqrt~e |       251    .8884462    .3154457          0          1
  kepanstata |       251     .199203    .1175801          0   .3354102
    kquartic |       251    .1992032    .3209618          0      .9375
  ktriweight |       251    .1992032     .351183          0    1.09375
   ktricubic |       251    .1992032    .3191548          0   .8641976
-------------+--------------------------------------------------------
   kgaussian |       251    .1967985    .1323354   .0175283   .3989423
k4thorderg~s |       251    .2053453    .2297148  -.0327459   .5984134
k4thorderq~c |       251     .199253    .4584096  -.2676096    1.40625
.
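A quick sanity check on the kernel formulas is that each should integrate to roughly one. The following is a minimal sketch, not part of the original program, that approximates each integral by a Riemann sum over the grid z = -2.5 to 2.5 with step 0.02. The local macro name kernels is a hypothetical helper.

```stata
* Hypothetical check: sum of kernel values times the grid step 0.02
local kernels kuniform ktriangular kepanechnikov kepanstata kquartic
local kernels `kernels' ktriweight ktricubic kgaussian k4thordergauss k4thorderquartic
foreach k of local kernels {
    quietly summarize `k'
    di "`k'" _col(20) "approx. integral = " %6.4f r(sum)*0.02
}
```

Values should be close to one, though not exactly one: the grid is coarse, and the Gaussian-based kernels are truncated at |z| = 2.5.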
. * Write data to a text (ascii) file so can use with programs other than Stata
. outfile z abszltone kuniform ktriangular kepanechnikov abszltsqrtfive /*
> */ kepanstata kquartic ktriweight ktricubic kgaussian /*
> */ k4thordergauss k4thorderquartic using mma09p3kernels.asc, replace
.
. ********** PLOT THE KERNEL FUNCTIONS **********
.
. * Epanstata is similar to Gaussian kernel. Less peaked than Epanechnikov
. graph twoway (line kuniform z) (line kepanechnikov z) (line kepanstata z) /*
> */ (line kgaussian z), title("Four standard kernel functions")
.
. * Triangular, Quartic, Triweight and Tricubic are similar
. * and are more peaked than Epanechnikov
. graph twoway (line ktriangular z) (line kquartic z) (line ktriweight z) /*
> */ (line ktricubic z), title("Four similar kernel functions")
.
. graph twoway (line k4thordergauss z) (line k4thorderquartic z), /*
> */ title("Two fourth order kernel functions")
.
. ********** CLOSE OUTPUT **********
. log close
log: c:\Imbook\bwebpage\Section2\mma09p3kernels.txt
log type: text
closed on: 18 May 2005, 21:32:00

-----------------------------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section2\mma10p1gradient.txt
log type: text
opened on: 17 May 2005, 14:21:11
.
. ********** OVERVIEW OF MMA10P1GRADIENT.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 10.2.4 page 338-9
. * Gradient Method Example (Newton-Raphson)
. * using artificial data
.
. ********** SETUP **********
.
. set more off
. version 8.0
. set scheme s1mono /* Used for graphs */
.
. ********** ANALYSIS: FIRST SIX ROUNDS OF NR **********
.
. * General Algorithm is
. *   b_s+1 = b_s + A_s*g_s
.
. * For the example in section 10.2.4
. *   Q(b) = -(1/2N) * Sum_i {(y_i-exp(b))^2}
. *        = -(1/2N) * Sum_i {(y_i)^2 - 2*y_i*exp(b) + exp(b)^2}
. *        = ymean*exp(b) - 0.5*(exp(b))^2 - (1/2N) * Sum_i {(y_i)^2}
.
. * so the gradient vector (here a scalar)
. *   g = dQ_s / db
. *     = (ymean - exp(b))*exp(b)
.
. * and using the Method of scoring variation of Newton-Raphson
. * the weighting matrix (here a scalar)
. *   A_s = Inv [ - E[d^2 Q_s / db^2 ] ]
. *       = Inv [ - E[(ymean - exp(b))*exp(b) - exp(b)*exp(b)] ]
. *       = Inv [ exp(2b) ] since E[ymean - exp(b)] = 0
. *       = exp(-2b)
.
. * Data
. scalar ymean = 2.0

.
. * Starting value
. scalar b_1 = 0.0
.
. * First round
. scalar g_1 = (ymean - exp(b_1))*exp(b_1)
. scalar A_1 = exp(-2*b_1)
. scalar b_2 = b_1 + A_1*g_1
.
. * Second round
. scalar g_2 = (ymean - exp(b_2))*exp(b_2)
. scalar A_2 = exp(-2*b_2)
. scalar b_3 = b_2 + A_2*g_2
.
. * Third round
. scalar g_3 = (ymean - exp(b_3))*exp(b_3)
. scalar A_3 = exp(-2*b_3)
. scalar b_4 = b_3 + A_3*g_3
.
. * Fourth round
. scalar g_4 = (ymean - exp(b_4))*exp(b_4)
. scalar A_4 = exp(-2*b_4)
. scalar b_5 = b_4 + A_4*g_4
.
. * Fifth round
. scalar g_5 = (ymean - exp(b_5))*exp(b_5)
. scalar A_5 = exp(-2*b_5)
. scalar b_6 = b_5 + A_5*g_5
.
. * Sixth round
. scalar g_6 = (ymean - exp(b_6))*exp(b_6)
. scalar A_6 = exp(-2*b_6)
.
. * We also calculate the objective function at each round
. * (ignoring the term - (1/2N) * Sum_i {(y_i)^2} which does not depend on b)
. scalar Q_1 = ymean*exp(b_1) - 0.5*(exp(b_1))^2
. scalar Q_2 = ymean*exp(b_2) - 0.5*(exp(b_2))^2
. scalar Q_3 = ymean*exp(b_3) - 0.5*(exp(b_3))^2
. scalar Q_4 = ymean*exp(b_4) - 0.5*(exp(b_4))^2
. scalar Q_5 = ymean*exp(b_5) - 0.5*(exp(b_5))^2
. scalar Q_6 = ymean*exp(b_6) - 0.5*(exp(b_6))^2
.
. * DISPLAY THE RESULTS GIVEN IN TABLE 10.1 page 339
. di "Round Estimate Gradient Weight Function"
Round Estimate Gradient Weight Function
. di " 1: " b_1 %8.6f " " g_1 %8.6f " " A_1 %8.6f " " Q_1 %8.6f
1: 0 1 1 1.5
. di " 2: " b_2 %8.6f " " g_2 %8.6f " " A_2 %8.6f " " Q_2 %8.6f
2: 1 -1.9524924 .13533528 1.7420356
. di " 3: " b_3 %8.6f " " g_3 %8.6f " " A_3 %8.6f " " Q_3 %8.6f
3: .73575888 -.18171081 .22957678 1.9962098
. di " 4: " b_4 %8.6f " " g_4 %8.6f " " A_4 %8.6f " " Q_4 %8.6f
4: .6940423 -.00358529 .24955284 1.9999984
. di " 5: " b_5 %8.6f " " g_5 %8.6f " " A_5 %8.6f " " Q_5 %8.6f
5: .69314758 -1.602e-06 .2499998 2
. di " 6: " b_6 %8.6f " " g_6 %8.6f " " A_6 %8.6f " " Q_6 %-8.6f
6: .69314718 -3.206e-13 .25 2
.
. ********** CLOSE OUTPUT **********
. log close
log: c:\Imbook\bwebpage\Section2\mma10p1gradient.txt
log type: text
closed on: 17 May 2005, 14:21:11
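
The six rounds above are written out one scalar at a time. As a cross-check, the same recursion b_s+1 = b_s + A_s*g_s can be run in a loop. The following is a minimal Python sketch, not part of the original Stata program; the function name `scoring_rounds` is an illustrative assumption, while the gradient, weight and objective expressions are taken from the comments in the log.

```python
import math

def scoring_rounds(ymean, b_start, n_rounds):
    """Iterate b_{s+1} = b_s + A_s*g_s for Q(b) = ymean*exp(b) - 0.5*exp(b)^2.

    Returns one (round, b, g, A, Q) tuple per round, mirroring the logged table.
    """
    rows = []
    b = b_start
    for s in range(1, n_rounds + 1):
        g = (ymean - math.exp(b)) * math.exp(b)            # gradient dQ/db
        A = math.exp(-2.0 * b)                             # method-of-scoring weight
        Q = ymean * math.exp(b) - 0.5 * math.exp(b) ** 2   # objective without the constant term
        rows.append((s, b, g, A, Q))
        b = b + A * g                                      # estimate used in the next round
    return rows

if __name__ == "__main__":
    for s, b, g, A, Q in scoring_rounds(ymean=2.0, b_start=0.0, n_rounds=6):
        print(f"{s}: {b:.8f} {g:.8f} {A:.8f} {Q:.8f}")
```

With ymean = 2 the iterates should approach log(2), which is where the gradient (ymean - exp(b))*exp(b) vanishes.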
-------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section3\mma11p1boot.txt
log type: text
opened on: 18 May 2005, 15:52:55
.
. ********** OVERVIEW OF MMA11P1BOOT.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 11.3 pages 366-368
. * Bootstrap applied to exponential regression model
. * Provides
. * (1) Bootstrap distribution of beta and t-statistic (Table 11.1)
. * (2) Various statistics from bootstrap (pages 366-8)
. * (3) Bootstrap density of the t-statistic (Figure 11.1)
. * using generated data (see below)
.
. * Note: To speed up program reduce breps - the number of bootstrap replications
. *       But final program should use many replications
.
. * Note: This program uses ereg which is an old Stata command
. *       superseded by streg, dist(exp)
.
. * Note: For bootstrap see also mma07p4boot.do
. *       which has additional commands / ways to bootstrap
.
. ********** SETUP **********
.
. set more off
. version 8
.
. ********** GENERATE DATA **********
.
. * Model is y ~ exponential(exp(a + bx + cz))
. * where x and z are joint normal (0.1,0.1,0.1,0.1,0.5)
. * i.e. means 0.1 and 0.1
. *      sd's 0.1 and 0.1 and correln 0.5 (so correln^2 = .25)
. *      variances 0.01 and 0.01 and covariance 0.005
.
. * Generate data from joint normal
. * Use fact that x is N(0.1, 0.01)
. *      and z | x is N(0.1 + (.005/.01)*(x - .1), .01*.75 = .0075)
. *      so that st dev = sqrt(0.0075) = 0.0866025
.
. set obs 50
obs was 0, now 50
. set seed 10001
. * Generate x and z bivariate normal
. scalar mu1=0.1
. scalar mu2=0.1
. scalar sig1=0.1
. scalar sig2=0.1
. scalar rho=0.5
. scalar sig12=rho*sig1*sig2
. gen x = mu1 + sig1*invnorm(uniform())
. gen muzgivx = mu2+(sig12/(sig2*sig2))*(x-mu1)
. gen sigzgivx = sqrt(sig2*sig2*(1-rho*rho))
. gen z = muzgivx + sigzgivx*invnorm(uniform())
. * To generate y exponential with mean mu=Ey use
. * Integral 0 to a of (1/mu)exp(-x/mu) dx by change of variables
. * = Integral 0 to a/mu of exp(-t)dt
. * = incomplete gamma function P(1,a/mu) in the terminology of Stata
. gen Ey = exp(-2.0+2*x+2*z)
. gen y = Ey*invgammap(1,uniform())
. gen logy = log(y)
.
. * Descriptive Statistics
. summarize
    Variable |     Obs        Mean   Std. Dev.        Min        Max
-------------+-------------------------------------------------------
           x |      50    .0935209   .1031485   -.1173506   .2778609
     muzgivx |      50    .0967604   .0515742   -.0086753   .1889304
    sigzgivx |      50    .0866025          0    .0866025   .0866025
           z |      50    .1033014   .0909297   -.0885447   .3137469
          Ey |      50    .2114837    .071719    .0945722   .4314067
-------------+-------------------------------------------------------
           y |      50    .2024206   .2237202    .0005293   .9601147
        logy |      50   -2.282336    1.45494   -7.543878  -.0407026
. ereg y x z
Iteration 0:   log likelihood = -84.246434
Iteration 1:   log likelihood = -80.068104
Iteration 2:   log likelihood = -79.871694
Iteration 3:   log likelihood = -79.871338
Iteration 4:   log likelihood = -79.871338

Exponential regression -- entry time 0
log expected-time form                          Number of obs   =         50
                                                LR chi2(2)      =       8.75
Log likelihood = -79.871338                     Prob > chi2     =     0.0126

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .2670543   1.417339     0.19   0.851    -2.510879    3.044988
           z |   4.663384   1.740712     2.68   0.007     1.251652    8.075117
       _cons |  -2.191619   .2328589    -9.41   0.000    -2.648014   -1.735224
------------------------------------------------------------------------------
.
. save mma11p1boot, replace
file mma11p1boot.dta saved
.
. * Write data to a text (ascii) file so can use with programs other than Stata
. outfile y x z using mma11p1boot.asc, replace
.
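
The data-generation step above combines two standard devices: drawing z from its conditional normal distribution given x, and drawing an exponential variate by inverting its cdf. The following is a minimal Python sketch of that recipe, offered as an illustration only. The function name `draw_sample` and its defaults are assumptions, and Python's generator will not reproduce the Stata draws even with the same seed. The conditional-mean slope is written in its general form sig12/sig1^2, which coincides with the log's sig12/(sig2*sig2) because sig1 = sig2 here.

```python
import math
import random

def draw_sample(n, seed=10001, mu1=0.1, mu2=0.1, sig1=0.1, sig2=0.1, rho=0.5):
    """Draw n triples (x, z, y).

    x and z are bivariate normal; y is exponential with mean exp(-2 + 2x + 2z).
    """
    rng = random.Random(seed)
    sig12 = rho * sig1 * sig2                       # covariance of x and z
    sd_z_given_x = math.sqrt(sig2 ** 2 * (1.0 - rho ** 2))
    data = []
    for _ in range(n):
        x = rng.gauss(mu1, sig1)
        # z | x is normal with mean mu2 + (sig12/sig1^2)*(x - mu1)
        mu_z_given_x = mu2 + (sig12 / sig1 ** 2) * (x - mu1)
        z = rng.gauss(mu_z_given_x, sd_z_given_x)
        # Inverse-cdf draw: F(y) = 1 - exp(-y/mean), so y = -mean*log(1 - u)
        mean_y = math.exp(-2.0 + 2.0 * x + 2.0 * z)
        y = -mean_y * math.log(1.0 - rng.random())
        data.append((x, z, y))
    return data

if __name__ == "__main__":
    sample = draw_sample(50)
    print(len(sample), sample[0])
```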
. ********** SIMPLE BOOTSTRAP **********
.
. * Stata produces four bootstrap 100*(1-alpha) confidence intervals
. * (N) and (P) have no asymptotic refinement
. * (BC)-(BCA) have asymptotic refinement
. * For details see program mma07p4boot.do
.
. * Change the following for different number of simulations S
. * From page 399, for testing better to use 999 than 1000
. global breps = 999 /* The number of bootstrap reps used below */
.
. set seed 20001
.
. * A simple and adequate bootstrap command for the slope coefficients is
. bs "ereg y x z" "_b[x] _b[z]", reps($breps) level(95)
command:      ereg y x z
statistics:   _bs_1      = _b[x]
              _bs_2      = _b[z]

Bootstrap statistics                              Number of obs    =        50
                                                  Replications     =       999

------------------------------------------------------------------------------
Variable     |  Reps  Observed      Bias  Std. Err. [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |   999  .2670543 -.1885509  1.420956   -2.52135   3.055458   (N)
             |                                        -2.9054   2.696445   (P)
             |                                      -2.590993   2.864327  (BC)
       _bs_2 |   999  4.663384  .0524786  1.939086   .8582302   8.468539   (N)
             |                                       .5006047   8.483892   (P)
             |                                        .231034   8.174835  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected
.
. ********** MORE DETAILED BOOTSTRAP **********
.
. * The following bootstrap also gives standard error at each replication
. * and saves data from replications for further analysis
.
. * In particular, want to use the percentile-t method,
. * which provides asymptotic refinement
.
. * Stata does not give this. For methods see
. * e.g. Efron and Tibshirani (1993, pp.160-162)
. * e.g. Cameron and Trivedi (2005) Chapter 11.2.6-11.2.7
. * For sample s compute t-test(s) = (bhat(s)-bhat) / se(s)
. * where bhat is initial estimate
. * and bhat(s) and se(s) are for sth round.
. * Order the t-test(s) statistics and choose the alpha/2 percentiles
. * which give the critical values for the t-test
.
. * Implementation requires saving the results from each bootstrap replication
. * in order to obtain critical values from percentiles of bootstrap distribution
.
. use mma11p1boot.dta, clear
.
. * Get and store coefficients (b)
. * for regressors in the original model and data before bootstrap
. quietly ereg y x z
. global bx=_b[x]
. global sex=_se[x]
. global bz=_b[z]
. global sez=_se[z]
. di " Coefficients bx: " $bx " and bz: " $bz
Coefficients bx: .26705432 and bz: 4.6633845
. di " Standard error sex: " $sex " and sez: " $sez
Standard error sex: 1.4173391 and sez: 1.7407119


.
. * Bootstrap and save coeff estimates and se's from each replication
. set seed 20001
. bs "ereg y x z" "_b[x] _b[z] _se[x] _se[z]", reps($breps) level(95) saving(mma11p1bootreps) repl
> ace
command:      ereg y x z
statistics:   _bs_1      = _b[x]
              _bs_2      = _b[z]
              _bs_3      = _se[x]
              _bs_4      = _se[z]

Bootstrap statistics                              Number of obs    =        50
                                                  Replications     =       999

------------------------------------------------------------------------------
Variable     |  Reps  Observed      Bias  Std. Err. [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |   999  .2670543 -.1885509  1.420956   -2.52135   3.055458   (N)
             |                                        -2.9054   2.696445   (P)
             |                                      -2.590993   2.864327  (BC)
       _bs_2 |   999  4.663384  .0524786  1.939086   .8582302   8.468539   (N)
             |                                       .5006047   8.483892   (P)
             |                                        .231034   8.174835  (BC)
       _bs_3 |   999  1.417339  .0644196  .1718393   1.080131   1.754547   (N)
             |                                       1.234399   1.902349   (P)
             |                                       1.196068   1.742845  (BC)
       _bs_4 |   999  1.740712  .0910103   .186631   1.374478   2.106946   (N)
             |                                       1.542322   2.257937   (P)
             |                                       1.453673   2.058318  (BC)
------------------------------------------------------------------------------
Note:  N   = normal
       P   = percentile
       BC  = bias-corrected
.
. * Now use the bootstrap estimates
. use mma11p1bootreps, clear
(bootstrap: ereg y x z)
. sum
    Variable |     Obs        Mean   Std. Dev.        Min        Max
-------------+-------------------------------------------------------
       _bs_1 |     999    .0785034   1.420956   -9.431229   4.278278
       _bs_2 |     999    4.715863   1.939086   -1.747643   12.09208
       _bs_3 |     999    1.481759   .1718393    1.145421   2.761842
       _bs_4 |     999    1.831722    .186631    1.387625   2.910449
. * Order comes from "_b[x] _b[z] _se[x] _se[z]" in earlier bs
. gen bxs = _bs_1
. gen bzs = _bs_2
. gen sexs = _bs_3
. gen sezs = _bs_4
. gen ttestxs = (bxs - $bx)/sexs
. gen ttestzs = (bzs - $bz)/sezs
.
. ********** (1) TABLE 11.1 (page 367)
.
. summarize bzs ttestzs, d
                             bzs
-------------------------------------------------------------
      Percentiles      Smallest
 1%    -.3361366      -1.747643
 5%     1.544816      -1.716207
10%     2.270323      -1.366866       Obs                 999
25%     3.570291      -1.205571       Sum of Wgt.         999

50%      4.77197                      Mean           4.715863
                        Largest       Std. Dev.      1.939086
75%     5.970802       10.10243
90%     7.100958       10.42623       Variance       3.760056
95%     7.810663       10.76733       Skewness      -.1344324
99%     9.426978       12.09208       Kurtosis       3.545415

                           ttestzs
-------------------------------------------------------------
      Percentiles      Smallest
 1%     -2.66391      -3.921595
 5%    -1.727528      -3.483456
10%     -1.32364      -3.201425       Obs                 999
25%    -.6209012      -2.975815       Sum of Wgt.         999

50%     .0618649                      Mean           .0261125
                        Largest       Std. Dev.      1.046855
75%     .7034938       2.693856
90%     1.323415       3.087892       Variance       1.095904
95%      1.70558        3.11692       Skewness      -.1596043
99%     2.529097       3.738328       Kurtosis       3.337749

.
. * Additionally need the 2.5 and 97.5 percentiles not given in summarize, d
.
. * Coefficient of z
. _pctile bzs, p(2.5,97.5)
. di " Lower 2.5 and upper 2.5 percentile of coeff b for z: " r(r1) " and " r(r2)
Lower 2.5 and upper 2.5 percentile of coeff b for z: .50060469 and 8.4838924
.
. * t-statistic for z
. _pctile ttestzs, p(2.5,97.5)
. di " Lower 2.5 and upper 2.5 percentile of ttest on z: " r(r1) " and " r(r2)
Lower 2.5 and upper 2.5 percentile of ttest on z: -2.1827998 and 2.0659592
.
. ********** (2) RESULTS IN TEXT PAGES 366-7 **********
.
. * (2A) Bootstrap standard error estimate (no refinement)
. * These are given earlier in bootstrap table output
. * Equivalently get the standard deviation of bzs
.
. quietly sum bzs
. scalar bzbootse = r(sd)
. di "Bootstrap estimate of standard error: " bzbootse
Bootstrap estimate of standard error: 1.9390864
.
. * (2B) Test b3 = 0 using percentile-t method (asymptotic refinement)
. * Use the 2.5% and 97.5% bootstrap critical values for t-statistic for z
.
. _pctile ttestzs, p(2.5,97.5)
. di " Lower 2.5 and upper 2.5 percentile of ttest on z: " r(r1) " and " r(r2)
Lower 2.5 and upper 2.5 percentile of ttest on z: -2.1827998 and 2.0659592
.
. * (2D) 95% confidence interval with asymptotic refinement
. * Use the preceding critical values
.
. scalar lbz = $bz + r(r1)*$sez /* Note the plus sign here */
. scalar ubz = $bz + r(r2)*$sez
. di " Percentile-t interval lower and upper bounds: (" lbz "," ubz ")"
Percentile-t interval lower and upper bounds: (.86375888,8.2596243)
.
. * (2B-Var) Variation for symmetric two-sided test on z
.
. gen absttestzs = abs(ttestzs)


. _pctile absttestzs, p(95)
. di " Upper 5 percentile of symmetric two-sided test on z: " r(r1)
Upper 5 percentile of symmetric two-sided test on z: 2.0775187
.
. * (2C) Test b3 = 0 without asymptotic refinement
. * Usual Wald test except use bootstrap estimate of standard error
.
. scalar Wald = ($bz - 0) / bzbootse
. di "Wald statistic using bootstrap standard error: " Wald
Wald statistic using bootstrap standard error: 2.404939
.
. * (2E) Bootstrap estimate of bias
. * This is given in the earlier bootstrap results table
. * and is explained in the text
.
. ********** (3) FIGURE 11.1 (p.368) PLOTS ESTIMATED DENSITY OF T-STATISTIC FOR Z
.
. set scheme s1mono
. label var ttestzs "Bootstrap t-statistic"
. kdensity ttestzs, normal /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Bootstrap Density of 't-Statistic'") /*
> */ xtitle("t-statistic from each bootstrap replication", size(medlarge)) xscale(titlegap(*5)) /*
>
> */ ytitle("Density", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(11) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Bootstrap Estimate") label(2 "Standard Normal"))
. graph save ch11boot, replace
(file ch11boot.gph saved)
.
. ********** CLOSE OUTPUT **********
. log close
log: c:\Imbook\bwebpage\Section3\mma11p1boot.txt
log type: text
closed on: 18 May 2005, 15:53:47
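
The percentile-t steps in this log reduce to simple arithmetic once the replications are saved: form t(s) = (bhat(s) - bhat)/se(s), take percentiles, and rescale by the original standard error. The following is a minimal Python sketch of that arithmetic, using the figures printed in the log as inputs. The function names are illustrative assumptions, and the interval follows the log's own convention of adding the lower and upper critical values to the estimate.

```python
def bootstrap_t_stats(b_reps, se_reps, bhat):
    """t(s) = (bhat(s) - bhat) / se(s) for each bootstrap replication s."""
    return [(b - bhat) / se for b, se in zip(b_reps, se_reps)]

def percentile_t_interval(bhat, se, t_lower, t_upper):
    """Interval (bhat + t_lower*se, bhat + t_upper*se), as computed in the log."""
    return bhat + t_lower * se, bhat + t_upper * se

def wald_statistic(bhat, boot_se, null_value=0.0):
    """Usual Wald statistic, but with the bootstrap standard error."""
    return (bhat - null_value) / boot_se

if __name__ == "__main__":
    bz, sez = 4.6633845, 1.7407119            # original estimate and standard error
    t_lo, t_hi = -2.1827998, 2.0659592        # 2.5 and 97.5 percentiles of ttestzs
    lower, upper = percentile_t_interval(bz, sez, t_lo, t_hi)
    print(f"Percentile-t interval: ({lower:.6f}, {upper:.6f})")
    print(f"Wald statistic: {wald_statistic(bz, 1.9390864):.6f}")
```

Because the bootstrap critical values (-2.18, 2.07) differ from the standard normal values of plus or minus 1.96, the resulting interval differs from the usual asymptotic one.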

-------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section3\mma12p1integration.txt
log type: text
opened on: 18 May 2005, 21:17:14
.
. ********** OVERVIEW OF MMA12P1INTEGRATION.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 12.3.3 pages 391-2
. * Computes integral numerically and by simulation
. * (1) Illustrate Midpoint Rule (page 392)
. * (2) Illustrate Monte Carlo integral (Table 12.1 page 392)
.*
. * for computing E[x] and E[exp(-exp(x))] for x ~ N[0,1]
.
. * No data need be read in.
.
. ********** SETUP **********
.
. set more off
. version 8.0
.
. ********** (1) NUMERICAL INTEGRATION USING MIDPOINT RULE **********
.
. * Midpoint rule for n evaluation points between a and b is
. * Integral = Sum (j=1 to n) [(b-a)/n]*f(xbar_j)
. * where xbar_j is midpoint between x_j-1 and x_j
.
. program midpointrule, rclass
1. version 8
2. /* define arguments: neval = number of evaluation points; a, b = integration limits */
. args neval a b
3. drop _all
4. scalar increment = (`b'-`a') / `neval'
5. set obs `neval'
6. /* Compute the function of interest */
. gen xbar = `a' - 0.5*increment + increment*_n
7. gen density = exp(-xbar*xbar/2)/sqrt(2*_pi)
8. * Following is contribution to E[x] when x ~ N[0,1]
. gen f1xbar = xbar*density
9. * Following is contribution to E[exp(-exp(x))] when x ~ N[0,1]
. gen f2xbar = exp(-exp(xbar))*density
10. /* Compute the averages */
. quietly sum f1xbar
11. scalar Ex = r(sum)*increment
12. quietly sum f2xbar
13. scalar Eexpminexpx = r(sum)*increment
14. /* Print results */
. di "Evaluation points: " `neval' " over range: (" `a' "," `b' ")"
15. di "Midpoint rule estimate of E[x] is: " Ex
16. di "Midpoint rule estimate of E[exp(-exp(x))] is: " Eexpminexpx
17. end
.
. midpointrule 20 -5 5
obs was 0, now 20
Evaluation points: 20 over range: (-5,5)
Midpoint rule estimate of E[x] is: 0
Midpoint rule estimate of E[exp(-exp(x))] is: .38175625
. midpointrule 200 -5 5
obs was 0, now 200
Evaluation points: 200 over range: (-5,5)
Midpoint rule estimate of E[x] is: 0
Midpoint rule estimate of E[exp(-exp(x))] is: .38175618
. midpointrule 2000 -5 5
obs was 0, now 2000
Evaluation points: 2000 over range: (-5,5)
Midpoint rule estimate of E[x] is: 0
Midpoint rule estimate of E[exp(-exp(x))] is: .38175618
.
. ********** (2) MONTE CARLO INTEGRATION USING DRAWS FROM DENSITY OF X **********
.
. * To get E[g(x)]
. * make draws from N[0,1], compute g(x), and average over draws
.
. program simintegration, rclass
1. version 8
2. /* define arguments: nsims = number of simulation draws */
. args nsims
3. /* Generate the data: here x */
. drop _all
4. set obs `nsims'
5. set seed 10101
6. gen x = invnorm(uniform())
7. /* Compute the function of interest */
. gen f1x = x /* For E[x] just need x */
8. gen f2x = exp(-exp(x)) /* For E[exp(-exp(x))] */
9. /* Compute the averages */
. quietly sum f1x
10. scalar Ex = r(mean)
11. quietly sum f2x


12. scalar Eexpminexpx = r(mean)
13. di "Number of simulations: " `nsims'
14. di "Monte Carlo estimate of E[x] is: " Ex
15. di "Monte Carlo estimate of E[exp(-exp(x))] is: " Eexpminexpx
16. end
.
. * Note a different program was used to obtain Table 12.1 on page 392
. * So results will differ somewhat from text, except for very high number of simulations
.
. simintegration 10
obs was 0, now 10
Number of simulations: 10
Monte Carlo estimate of E[x] is: -.10143571
Monte Carlo estimate of E[exp(-exp(x))] is: .42635197
. simintegration 25
obs was 0, now 25
Number of simulations: 25
Monte Carlo estimate of E[x] is: .17496346
Monte Carlo estimate of E[exp(-exp(x))] is: .35703296
. simintegration 50
obs was 0, now 50
Number of simulations: 50
Monte Carlo estimate of E[x] is: .0079132
Monte Carlo estimate of E[exp(-exp(x))] is: .37966293
. simintegration 100
obs was 0, now 100
Number of simulations: 100
Monte Carlo estimate of E[x] is: .11238423
Monte Carlo estimate of E[exp(-exp(x))] is: .3524417
. simintegration 500
obs was 0, now 500
Number of simulations: 500
Monte Carlo estimate of E[x] is: .06990338
Monte Carlo estimate of E[exp(-exp(x))] is: .36137551
. simintegration 1000
obs was 0, now 1000
Number of simulations: 1000
Monte Carlo estimate of E[x] is: .04309113
Monte Carlo estimate of E[exp(-exp(x))] is: .36945581
. simintegration 1000
obs was 0, now 1000
Number of simulations: 1000
Monte Carlo estimate of E[x] is: .04309113
Monte Carlo estimate of E[exp(-exp(x))] is: .36945581


. simintegration 100000
obs was 0, now 100000
Number of simulations: 100000
Monte Carlo estimate of E[x] is: -.00405425
Monte Carlo estimate of E[exp(-exp(x))] is: .38284684
. clear
. set mem 20m
(20480k)
. simintegration 1000000
obs was 0, now 1000000
Number of simulations: 1000000
Monte Carlo estimate of E[x] is: -.00085186
Monte Carlo estimate of E[exp(-exp(x))] is: .38192861
.
. ********** CLOSE OUTPUT **********
. log close
log: c:\Imbook\bwebpage\Section3\mma12p1integration.txt
log type: text
closed on: 18 May 2005, 21:17:16
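
The two approaches in this log, the midpoint rule and Monte Carlo averaging, can be compared side by side outside Stata. The following is a minimal Python sketch under the same setup (x ~ N[0,1], midpoints xbar_j = a - 0.5*h + h*j). The function names are illustrative assumptions, and the Monte Carlo draws will not match the Stata draws even with the same seed value.

```python
import math
import random

def std_normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def midpoint_rule(g, n_eval, a, b):
    """Approximate E[g(x)] for x ~ N[0,1] by the midpoint rule on (a, b)."""
    h = (b - a) / n_eval                      # width of each subinterval
    total = 0.0
    for j in range(1, n_eval + 1):
        xbar = a - 0.5 * h + h * j            # midpoint of the j-th subinterval
        total += g(xbar) * std_normal_pdf(xbar)
    return total * h

def monte_carlo(g, n_sims, seed=10101):
    """Approximate E[g(x)] by averaging g over n_sims draws from N[0,1]."""
    rng = random.Random(seed)
    return sum(g(rng.gauss(0.0, 1.0)) for _ in range(n_sims)) / n_sims

if __name__ == "__main__":
    g2 = lambda x: math.exp(-math.exp(x))
    print("Midpoint rule, 200 points:", midpoint_rule(g2, 200, -5.0, 5.0))
    print("Monte Carlo, 100000 draws:", monte_carlo(g2, 100000))
```

The midpoint rule settles after very few evaluation points for this smooth integrand, whereas the Monte Carlo error shrinks only at rate 1/sqrt(S), which is consistent with the pattern in the logged results.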
-------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section3\mma12p2mslmsm.txt
log type: text
opened on: 18 May 2005, 21:46:27
.
. ********** OVERVIEW OF MMA12P2MSLMSM.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 12.4.5 pages 397-8 and 12.5.5 pages 402-4
. * Computes simulation-based estimators
. * (1) Maximum Simulated likelihood Table 12.2
. * (2) Method of Simulated Moments Table 12.3
. * with application to generated data
.
. * The application is only illustrative.
. * This is not a template program for MSL or MSM.
.
. * Different number of simulations S lead to different estimators.
. * This program gives entries in Tables 12.2 and 12.3 for S = 100
. * For other values of S change the value of simreps
. * from the current global simreps 100


.
. ********** SETUP **********
.
. set more off
. version 8
.
. ********** DATA DESCRIPTION **********
.
. * Model is y = theta + u + e
. * where theta is a scalar parameter equal to 1
. *       u is extreme value type 1
. *       e is N(0,1)
. * n is set in global numobs
.
. ********** DEFINE GLOBALS **********
.
. global simreps 100 /* change this to change the number of simulations */
. global numobs 100 /* change this to change the number of observations */
.
.
. ********** (1) MAXIMUM SIMULATED LIKELIHOOD (Table 12.2 p.398) **********
.
. * This MSL program is inefficiently written computer code
. * as it requires drawing the same random variates at each iteration
.
. * Generate data
. clear
. set obs $numobs
obs was 0, now 100
. set seed 10101
. gen u = -log(-log(uniform()))
. gen e = invnorm(uniform())
. gen y = 1 + u + e
. summarize u e y
    Variable |     Obs        Mean   Std. Dev.        Min        Max
-------------+-------------------------------------------------------
           u |     100    .7236045   1.372637   -1.827296   6.423636
           e |     100    .0415449   .9472174   -2.906972   2.302204
           y |     100    1.765149   1.684177   -2.227185   8.143228
.
. * Write data to a text (ascii) file so can use with programs other than Stata
. outfile u e y using mma12p2mslmsm.asc, replace
.
. * Use the variant ml d0 as this gives the entire likelihood, not just one observation.
. * I want this so that seed is only reset for the entire data.
. * My program is inefficient as variates need to be redrawn at each iteration
. program define msl
1. version 6.0
2. args todo b lnf   /* Need to use the names todo b and lnf
>                       todo always contains 1 and may be ignored
>                       b is parameters and lnf is log-density */
3. tempvar theta1    /* create as needed to calculate lf, g, ... */
4. mleval `theta1' = `b', eq(1)   /* theta1 is theta1_i = x_i'b */
5. local y "$ML_y1"  /* create to make program more readable */
6. set seed 10101
7. tempvar denssim
8. global isim=1
9. quietly gen `denssim' = exp(-0.5*(`y'-`theta1'+log(-log(uniform())))^2)/sqrt(2*_pi)
10. while $isim < $simreps {
11. quietly replace `denssim' = `denssim' + exp(-0.5*(`y'-`theta1'+log(-log(uniform())))^2)/sqrt(2*_pi)
12. global isim=$isim+1
13. }
14. mlsum `lnf' = ln(`denssim'/$isim)
15. end
.
. gen one = 1
. ml model d0 msl (y = one, nocons )
. ml maximize
initial:       log likelihood = -216.68168
alternative:   log likelihood = -199.54479
rescale:       log likelihood = -191.09715
Iteration 0:   log likelihood = -191.09715
Iteration 1:   log likelihood =  -190.4391  (not concave)
Iteration 2:   log likelihood = -190.43885
Iteration 3:   log likelihood =  -190.4385
Iteration 4:   log likelihood =  -190.4385

                                                  Number of obs   =        100
                                                  Wald chi2(1)    =      65.72
Log likelihood =  -190.4385                       Prob > chi2     =     0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         one |   1.177456   .1452451     8.11   0.000     .8927806    1.462131
------------------------------------------------------------------------------
.
. *** Display MSL results in one column of Table 12.2 p.398
.
. di "For number of simulations S = " $simreps
For number of simulations S = 100
. di "MSL estimator: " _b[one]
MSL estimator: 1.1774557
. di "Standard error: " _se[one]
Standard error: .14524511
.
. ********** (2) METHOD OF SIMULATED MOMENTS (Table 12.3 p.404) **********
.
. clear
. set obs $numobs
obs was 0, now 100
. set seed 10101
. gen u = -log(-log(uniform()))
. gen e = invnorm(uniform())
. gen y = 1 + u + e
. summarize u e y
    Variable |     Obs        Mean   Std. Dev.        Min        Max
-------------+-------------------------------------------------------
           u |     100    .7236045   1.372637   -1.827296   6.423636
           e |     100    .0415449   .9472174   -2.906972   2.302204
           y |     100    1.765149   1.684177   -2.227185   8.143228
.
. global isim=1
. gen usim = -log(-log(uniform()))
. gen esim = invnorm(uniform())
. while $isim < $simreps {
2. quietly replace usim = usim-log(-log(uniform()))
3. quietly replace esim = esim+invnorm(uniform())
4. global isim=$isim+1
5. }
. gen usimbar = usim/$isim
. gen esimbar = esim/$isim
. gen theta = y - usimbar - esimbar
. summarize
    Variable |     Obs        Mean   Std. Dev.        Min        Max
-------------+-------------------------------------------------------
           u |     100    .7236045   1.372637   -1.827296   6.423636
           e |     100    .0415449   .9472174   -2.906972   2.302204
           y |     100    1.765149   1.684177   -2.227185   8.143228
        usim |     100    57.36345   13.16979    21.96637   90.07499
        esim |     100   -.9702956   11.38655   -26.38858   33.28406
-------------+-------------------------------------------------------
     usimbar |     100    .5736345   .1316979    .2196637   .9007499
     esimbar |     100    -.009703   .1138655   -.2638858   .3328406
       theta |     100    1.201218   1.681435   -2.757669    7.75245
.
. * Results for Table 12.3 on page 404
. * Here the st. error of theta_MSM is approximated by the st. dev. of theta
. * divided by the square root of S (the number of simulations)
. quietly sum theta
. scalar theta_MSM = r(mean)
. scalar approx_sterror = r(sd)/sqrt($simreps)
.
. * Display MSM results in one column of Table 12.3 p.404
. di "For number of simulations S = " $simreps
For number of simulations S = 100
. di "MSM estimator: " theta_MSM
MSM estimator: 1.2012178
. di "Approximate standard error: " approx_sterror
Approximate standard error: .16814348
.
. * As written this will not give the correct standard errors (see p.403).
. * Can get this by also computing the squared rv to get E[y^2]
.
. ********** CLOSE OUTPUT **********
. log close
log: c:\Imbook\bwebpage\Section3\mma12p2mslmsm.txt
log type: text
closed on: 18 May 2005, 21:46:28
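
The MSM step in this log amounts to replacing the unknown means of u and e by averages of S simulated draws for each observation and then averaging y - ubar - ebar over the sample. The following is a minimal Python sketch of that calculation, offered as an illustration only, not as a template for MSM. The function names are assumptions, and the random draws will not reproduce the Stata figures.

```python
import math
import random

def gumbel_draw(rng):
    """Extreme value type 1 draw by inverse cdf: -log(-log(U)), U uniform on (0,1)."""
    u = rng.random()
    while u == 0.0:            # random() can return 0.0; log(0) is undefined
        u = rng.random()
    return -math.log(-math.log(u))

def msm_estimate(y, n_sims, rng):
    """Average over observations of y_i minus simulated means of u and e."""
    thetas = []
    for yi in y:
        ubar = sum(gumbel_draw(rng) for _ in range(n_sims)) / n_sims
        ebar = sum(rng.gauss(0.0, 1.0) for _ in range(n_sims)) / n_sims
        thetas.append(yi - ubar - ebar)
    return sum(thetas) / len(thetas)

if __name__ == "__main__":
    rng = random.Random(10101)
    # Generated data: y = theta + u + e with theta = 1
    y = [1.0 + gumbel_draw(rng) + rng.gauss(0.0, 1.0) for _ in range(100)]
    print("MSM estimate of theta:", msm_estimate(y, 100, rng))
```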


-------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section3\mma12p3draws.txt
log type: text
opened on: 18 May 2005, 21:48:36
.
. ********** OVERVIEW OF MMA12P3DRAWS.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 12.8.2 pages 412-5
. * Draws figures that illustrate two common ways to draw random variates
.
. * (1) Illustrate Inverse Transformation method: Figure 12.2
. * (2) Illustrate Envelope method: Figure 12.3
.
. * No data need be read in.
.
. ********** SETUP **********
.
. set more off
. version 8
. set scheme s1mono
.
. ********** (1) INVERSE TRANSFORMATION - FIGURE 12.2 page 413 **********
.
. * Graph is for x = 0 to 5 in increments of 0.05
. set obs 100
obs was 0, now 100
. gen x = 0.05*_n
. * Unit Exponential cdf
. gen Fx = 1 - exp(-x)
. * Suppose uniform draw is 0.64
. gen uniformdraw = 0.64
.
. graph twoway (line Fx x, yline(0.64) xline(1.02)), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Inverse Transformation Method") /*
> */ xtitle("Random variable x", size(medlarge)) xscale(titlegap(*5)) /*


> */ ytitle("Cdf F(x)", size(medlarge)) yscale(titlegap(*5)) /*
> */ caption(" " "Draw of 0.64 (vertical axis) yields x = 1.02 (horizontal axis).")
. graph save ch12fig2invtransform, replace
(file ch12fig2invtransform.gph saved)
. graph export ch12fig2invtransform.wmf, replace
(file c:\Imbook\bwebpage\Section3\ch12fig2invtransform.wmf written in Windows Metafile format)
.
. ********** (2) ENVELOPE METHOD - FIGURE 12.3 **********
.
. * The following is a modification of the figure in the book
. * making clear that the envelope is a scaling up of g(x)
.
. clear
.
. * Graph is for x = 0 to 10 in increments of 0.1
. set obs 101
obs was 0, now 101
. gen x = -0.05 + 0.1*_n
. * Desired density f(x) is N(4,1); envelope is kg(x) = 1.5*f(x) + 0.005
. gen fx = normden(x-4)
. gen gx = 1.5*normden(x-4)+0.005
.
. graph twoway (line fx x, clstyle(p1)) /*
> */ (line gx x, clstyle(p1) clwidth(*2) clcolor(gs12)), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Accept-reject Method") /*
> */ xtitle("Random variable x", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("f(x) and kg(x)", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(1) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Desired density f(x)") label(2 "Envelope kg(x)") )
. graph save ch12fig3envelope, replace
(file ch12fig3envelope.gph saved)
. graph export ch12fig3envelope.wmf, replace
(file c:\Imbook\bwebpage\Section3\ch12fig3envelope.wmf written in Windows Metafile format)
.
. ********** CLOSE OUTPUT **********
. log close
log: c:\Imbook\bwebpage\Section3\mma12p3draws.txt
log type: text


closed on: 18 May 2005, 21:48:42
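
The inverse transformation figure above plots the unit exponential cdf F(x) = 1 - exp(-x) and reads off the x value for a uniform draw of 0.64. The following is a minimal Python sketch of the same inversion, with an illustrative function name that is not from the original program.

```python
import math

def unit_exponential_inverse_cdf(u):
    """Solve F(x) = 1 - exp(-x) = u for x, with 0 <= u < 1."""
    return -math.log(1.0 - u)

if __name__ == "__main__":
    x = unit_exponential_inverse_cdf(0.64)
    print(round(x, 2))   # 1.02, the value quoted in the figure caption
```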
-------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section3\mma13p1bayesthm.txt
log type: text
opened on: 24 May 2005, 11:04:08
.
. ********** OVERVIEW OF MMA13P1BAYESTHM.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 13.2.2 page 424
. * Create Figure 13.1
. * (1) Bayes Analysis illustrated using normal distribution and prior
.
. * No data are needed.
.
. ********** SETUP
.
. set more off
. version
version 8.2
. set scheme s1mono /* Graphics scheme */
.
. ********** DATA DESCRIPTION **********
.
. * Model is y ~ normal(theta, sigmasq) where sigmasq is known
. * and the prior is theta ~ normal(mu, tausq)
. * which gives a normal posterior
. * n is set below in set obs
.
. ********** CREATE DATA **********
.
. * The likelihood and prior are normal so the posterior is also normal
.
. * Will evaluate the densities at points between 0 and 15
. set obs 150
obs was 0, now 150
. gen xeval = 0.1*_n
.
. * Likelihood with sigmasq known


. scalar nobs = 50
. scalar ybar = 10
. scalar sigmasq = 100
. gen likelihood = normden(xeval,ybar,sqrt(sigmasq/nobs))
.
. * Prior
. scalar mu = 5
. scalar tausq = 3
. gen prior = normden(xeval,mu,sqrt(tausq))
.
. * Posterior given sample mean ybar
. scalar tau1sq=1/((nobs/sigmasq)+(1/tausq))
. scalar mu1 = tau1sq*((ybar*nobs/sigmasq)+(mu/tausq))
. gen posterior = normden(xeval,mu1,sqrt(tau1sq))
.
. scalar list
       mu1 =          8
    tau1sq =        1.2
     tausq =          3
        mu =          5
   sigmasq =        100
      ybar =         10
      nobs =         50

. summarize
    Variable |     Obs        Mean   Std. Dev.        Min        Max
-------------+-------------------------------------------------------
       xeval |     150        7.55   4.344537          .1         15
  likelihood |     150    .0666548   .0944174    6.44e-12   .2820948
       prior |     150    .0665247   .0804685    1.33e-08   .2303294
   posterior |     150    .0666667   .1131755    1.85e-12   .3641828
.
. graph twoway (line likelihood xeval, clstyle(p2)) /*
> */ (line prior xeval, clstyle(p3)) /*
> */ (line posterior xeval, clstyle(p1)), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Bayes: Likelihood, Prior and Posterior") /*
> */ xtitle("Evaluation point", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Density", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(10) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Likelihood N[10,2]") label(2 "Prior N[5,3]") /*
> */ label(3 "Posterior N[8,1.2]") )

. graph save Ch13_Bayes1, replace


(file Ch13_Bayes1.gph saved)
. graph export Ch13_Bayes1.wmf, replace
(file c:\Imbook\bwebpage\Section3\Ch13_Bayes1.wmf written in Windows Metafile format)
.
. ********** CLOSE OUTPUT **********
. log close
log: c:\Imbook\bwebpage\Section3\mma13p1bayesthm.txt
log type: text
closed on: 24 May 2005, 11:04:12
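
The posterior scalars listed in this log (mu1 = 8, tau1sq = 1.2) follow from the usual normal-normal updating formulas used in the program: the posterior precision is the sum of the data precision nobs/sigmasq and the prior precision 1/tausq, and the posterior mean is the precision-weighted average of ybar and mu. The following is a minimal Python sketch of that arithmetic, with an illustrative function name that is not from the original program.

```python
def normal_posterior(ybar, nobs, sigmasq, mu, tausq):
    """Posterior mean and variance of theta for a normal likelihood with known
    variance sigmasq and a normal prior theta ~ N(mu, tausq)."""
    tau1sq = 1.0 / (nobs / sigmasq + 1.0 / tausq)            # posterior variance
    mu1 = tau1sq * (ybar * nobs / sigmasq + mu / tausq)      # posterior mean
    return mu1, tau1sq

if __name__ == "__main__":
    mu1, tau1sq = normal_posterior(ybar=10.0, nobs=50, sigmasq=100.0, mu=5.0, tausq=3.0)
    print(f"Posterior mean: {mu1:.4f}  posterior variance: {tau1sq:.4f}")
```

With these inputs the likelihood variance is sigmasq/nobs = 2, so the posterior mean lies between the prior mean 5 and the sample mean 10, closer to the sample mean because the data precision (0.5) exceeds the prior precision (1/3).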
1                              The SAS System          08:50 Wednesday, May 25, 2005

NOTE: Copyright (c) 2002-2003 by SAS Institute Inc., Cary, NC, USA.
NOTE: SAS (r) 9.1 (TS1M2)
Licensed to UNIV OF CA/DAVIS, Site 0029107010.
NOTE: This session is executing on the SunOS 5.9 platform.

You are running SAS 9. Some SAS 8 files will be automatically converted
by the V9 engine; others are incompatible. Please see
http://support.sas.com/rnd/migration/planning/platform/64bit.html
PROC MIGRATE will preserve current SAS file attributes and is
recommended for converting all your SAS libraries from any
SAS 8 release to SAS 9. For details and examples, please see
http://support.sas.com/rnd/migration/index.html

This message is contained in the SAS news file, and is presented upon
initialization. Edit the file "news" in the "misc/base" directory to
display site-specific news and information in the program log.
The command line option "-nonews" will prevent this display.

NOTE: SAS initialization used:
      real time           0.11 seconds
      cpu time            0.10 seconds

* MMA13P2BAYES.SAS March 2005 for SAS version 8.2

********** OVERVIEW OF MMA13P2BAYES.SAS **********

* SAS Program
* copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
* used for "Microeconometrics: Methods and Applications"
* by A. Colin Cameron and Pravin K. Trivedi (2005)
* Cambridge University Press

* Chapter 13.6 p.452-4
* MCMC Example: Gibbs Sampler for 2 equation SUR

* Program creates the first column of Table 13.3
* (though differs somewhat due to use of different seed)

* For different columns of Table 13.3 change
*    nobs = Sample size N (1000 or 10000)
*    replics = Gibbs sample replications (50000 or 100000)
*    tau = 1, 10 or 0.1
* This program does first column: tau=10, nobs=1000, replics=50000

* Note that the program does not exactly replicate Table 13.3
* Table 13.3 used the computer clock for seed,
*    with third argument zero in rannor(j( , ,0))
* Here instead the seed is consecutively 10101, 20101, ... , 70101
*    so third argument is eg rannor(j( , ,10101))
*    to permit reproducibility by other users

* This program creates
*    MMA13P2BAYES.1ST SAS Output with one column of Table 13.3
*    MMA13P2BAYES.LOG SAS log file

* This program uses generated data - so no data set required
* This program uses a lot of memory - 1 gigabyte should do
* In Unix give command sas -MEMSIZE 1G mma13p2bayesgibbs.sas

*********************************************************************;
***** BIVARIATE NORMAL-BAYESIAN-ESTIMATION-BY-MCMC **************;
*********************************************************************;

OPTIONS LS=75;
options NOTES;

PROC IML;
NOTE: IML Ready
start main;

print "A. Colin Cameron and Pravin K. Trivedi (2005)";
print "Microeconometrics: Methods and Applications, CUP";
print "MCMC Example: Gibbs Sampler for SUR";

************* GENERATING DATA: 2 EQUATION SUR ****************;

nobs = 1000;
replics = 50000;
burn = 5000;
replics = replics + burn;

npar1 = 2;
npar2 = 2;

alpha1 ={1,1};
alpha2 ={1,1};

sigma = {1 -0.5,-0.5 1};
T = {0.15 2.18 0.725 0.45};
EPS = 1e-20;
IC = (1/2.506628275);

R1 = j(nobs,1,1)||rannor(j(nobs,1,10101));
R2 = j(nobs,1,1)||rannor(j(nobs,1,20101));

e = rannor(j(nobs,2,30101))*root(sigma);
e1 = e[,1];
e2 = e[,2];

Y1 = R1*alpha1 + e1;
Y2 = R2*alpha2 + e2;

************* SPECIFY PRIOR DISTRIBUTIONS ******************;

alpha01 = j(npar1,1,0);
alpha02 = j(npar2,1,0);

sigma = I(2);
p = 3;
df = 5;
tau = 10;

MUalpha = alpha01//alpha02;
OMalpha = tau*I(npar1+npar2);
OMphi = I(2);

************ ANALYSIS: GIBBS SAMPLING BEGINS HERE ***************;

do rep = 1 to replics;

************* GENERATE ALPHA1 ALPHA2 RHO *******************;

isigma = inv(sigma);

LL = ((isigma[1,1]*R1`*R1||isigma[1,2]*R1`*R2)//
      (isigma[2,1]*R2`*R1||isigma[2,2]*R2`*R2));

LisigY = ((isigma[1,1]*R1`*Y1+isigma[1,2]*R1`*Y2)//
          (isigma[2,1]*R2`*Y1+isigma[2,2]*R2`*Y2));

alpha = inv(inv(OMalpha)+ LL)*(LisigY + inv(OMalpha)*MUalpha)
        + root(inv(inv(OMalpha)+ LL))`*rannor(j(npar1+npar2,1,40101));

alpha1 = alpha[1:npar1];
alpha2 = alpha[npar1+1:npar1+npar2];

e1 = Y1 - R1*alpha1;
e2 = Y2 - R2*alpha2;

************* GENERATE SIGMA *******************;

mt = (sqrt((rannor(j(1,nobs+df,50101))##2)[,+])||0)//
     (rannor(j(1,1,60101))||sqrt((rannor(j(1,nobs+df-1,70101))##2)[,+]));
mv = mt*mt`;
e=(e1||e2);
ms = e`*e+inv(OMphi);
ml = root(inv(ms))`;
mg = ml*mv*ml`;
sigma = inv(mg);

free mt mv e ml;

************* WRITE TO OUTPUT FILE IF AFTER BURN-IN **************;

if rep <= burn then goto point300;
sigma3 = sigma[1,1]||sigma[1,2]||sigma[2,2];
out1 = alpha1`||alpha2`||sigma3;

output1=output1//out1;

point300:
end;

************* END OF GIBBS SAMPLING **************;

*********************************************************************;
***** RESULTS: COMPARE LAST HALF WITH ALL (AFTER BURN-IN) *******;
*********************************************************************;

replics = replics-burn;

out1 = output1[replics/2+1:replics,];
out = output1[1:replics,];

create exp from out1;
append from out1;
summary var _num_;
close exp;

create exp from out;
append from out;
summary var _num_;
close exp;

*********************************************************************;
****** RESULTS: POSTERIOR MEAN AND SD - TABLE 13.3 P.454 ********;
*********************************************************************;

xnames1 = {"CONSTANT"} || {"R1"};
xnames2 = {"CONSTANT"} || {"R2"};
parnames = concat({"d1"}," ",xnames1)||concat({"d2"}," ",xnames2)||{"SIGMA11"}||{"SIGMA12"}||{"SIGMA22"};

meanout = out[+,]/replics;
stderr = sqrt(((out-j(replics,1,1)*meanout)##2)[+,]/(replics-1));
parm = meanout;
stderr = stderr`;
tnpar = npar1 + npar2 + 3;

tstat = parm`/ stderr;
coeff = parm` || stderr || tstat;
info = tau // nobs // replics // burn // tnpar;
rowinfo={'TAU' '# OBSERVATIONS' '# REPLICATIONS' '# BURN-IN' '# PARAMETERS'};
estcol ={ 'ESTIMATE' 'STD ERR' 'T-STAT'};
mattrib info rowname=rowinfo label={" "};
mattrib coeff rowname=parnames colname=estcol label={" "};
print / "Results for Table 13.3 p.454";
print info;
print coeff;

*********************************************************************;
********** RESULTS: CONVERGENCE CHECK: SEE P.454 ***************;
*********************************************************************;

print / "Convergence check on p.454";

corr = j(20,7,0);

do i = 1 to 7;
cov = covlag(out[,i],20)`;
corr[,i] = cov/cov[1];
end;

covd1 = j(20,2,0);

do k = 1 to 3;
covd1 = corr[,2*k-1:2*k];
print covd1;
end;

covd1 = corr[,7];
print covd1;

finish main;
NOTE: Module MAIN defined.

run main;
NOTE: The data set WORK.EXP has 25000 observations and 7 variables.
NOTE: The data set WORK.EXP has 50000 observations and 7 variables.
NOTE: Exiting IML.
NOTE: 65925 workspace compresses.
NOTE: The PROCEDURE IML printed pages 1-6.
NOTE: PROCEDURE IML used (Total process time):
      real time           5:44.35
      cpu time            5:44.04

NOTE: SAS Institute Inc., SAS Campus Drive, Cary, NC USA 27513-2414
NOTE: The SAS System used:
      real time           5:45.48
      cpu time            5:45.15

------------------------------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section4\mma14p1binary.txt
log type: text
opened on: 19 May 2005, 09:01:28
.
. ********** OVERVIEW OF MMA14P1BINARY.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 14.2 (pages 464-6) Logit and probit models.
. * Provides
. * (1) Table 14.1: Data summary
. * (2) Table 14.2: Logit, Probit and OLS slope estimates
. * (3) Figure 14.1: Plot of Logit Probit and OLS predicted probabilities
.
. * To run this program you need data file
. * Nldata.asc
.
. ********** SETUP
.
. set more off
. version 8.0
. set scheme s1mono /* Graphics scheme */
.
. ********** DATA DESCRIPTION
.
. * Data Set comes from :
. * J. A. Herriges and C. L. Kling,
. * "Nonlinear Income Effects in Random Utility Models",
. * Review of Economics and Statistics, 81(1999): 62-72
.
. * The data are given as a combined observation with data on all 4 choices.
. * This will work for multinomial logit program.
. * For conditional logit will need to make a new data set which has
. * four separate entries for each observation as there are four alternatives.
.
. * Filename: NLDATA.ASC
. * Format: Ascii
. * Number of Observations: 1182
. * Each observation appears over 3 lines with 4 variables per line
. * so 4 x 1182 = 4728 observations
. * Variable Number and Description
. * 1 Recreation mode choice. = 1 if beach, = 2 if pier; = 3 if private boat; = 4 if charter
. * 2 Price for chosen alternative


. * 3 Catch rate for chosen alternative
. * 4 = 1 if beach mode chosen; = 0 otherwise
. * 5 = 1 if pier mode chosen; = 0 otherwise
. * 6 = 1 if private boat mode chosen; = 0 otherwise
. * 7 = 1 if charter boat mode chosen; = 0 otherwise
. * 8 = price for beach mode
. * 9 = price for pier mode
. * 10 = price for private boat mode
. * 11 = price for charter boat mode
. * 12 = catch rate for beach mode
. * 13 = catch rate for pier mode
. * 14 = catch rate for private boat mode
. * 15 = catch rate for charter boat mode
. * 16 = monthly income
.
. ********** READ IN DATA **********
.
. infile mode price crate dbeach dpier dprivate dcharter pbeach ppier /*
> */ pprivate pcharter qbeach qpier qprivate qcharter income /*
> */ using nldata.asc
(1182 observations read)
.
. * Divide income by 1000 so that results are easy to read
. gen ydiv1000 = income/1000
.
. label define modetype 1 "beach" 2 "pier" 3 "private" 4 "charter"
. label values mode modetype
. summarize
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        mode |      1182    3.005076    .9936162          1          4
       price |      1182    52.08197    53.82997       1.29     666.11
       crate |      1182    .3893684    .5605964      .0002     2.3101
      dbeach |      1182    .1133672    .3171753          0          1
       dpier |      1182    .1505922    .3578023          0          1
-------------+--------------------------------------------------------
    dprivate |      1182    .3536379    .4783008          0          1
    dcharter |      1182    .3824027    .4861799          0          1
      pbeach |      1182     103.422     103.641       1.29    843.186
       ppier |      1182     103.422     103.641       1.29    843.186
    pprivate |      1182    55.25657    62.71344       2.29     666.11
-------------+--------------------------------------------------------
    pcharter |      1182    84.37924    63.54465      27.29     691.11
      qbeach |      1182    .2410113    .1907524      .0678      .5333
       qpier |      1182    .1622237    .1603898      .0014      .4522
    qprivate |      1182    .1712146    .2097885      .0002      .7369
    qcharter |      1182    .6293679    .7061142      .0021     2.3101
-------------+--------------------------------------------------------
      income |      1182    4099.337    2461.964   416.6667      12500
    ydiv1000 |      1182    4.099337    2.461964   .4166667       12.5
.
. ********** CREATE BINARY DATA: CHARTER vs PIER **********
.
. * Binary logit of charter (mode = 4) versus pier (mode = 2)
. keep if mode == 2 | mode == 4
(552 observations deleted)
. * charter is 1 if fish from charter boat and 0 if fish from pier
. gen charter = 0
. replace charter = 1 if mode == 4
(452 real changes made)
.
. gen pratio = 100*ln(pcharter/ppier)
. gen lnrelp = ln(pchart/ppier)
.
. * Overall summary
. summarize
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        mode |       630    3.434921    .9011843          2          4
       price |       630    62.51669    52.31219       1.29    387.208
       crate |       630    .5533478    .6953035      .0014     2.3101
      dbeach |       630           0           0          0          0
       dpier |       630    .2825397    .4505921          0          1
-------------+--------------------------------------------------------
    dprivate |       630           0           0          0          0
    dcharter |       630    .7174603    .4505921          0          1
      pbeach |       630    95.19802    95.62037       1.29    578.048
       ppier |       630    95.19802    95.62037       1.29    578.048
    pprivate |       630    55.26221    59.99482       2.29    494.058
-------------+--------------------------------------------------------
    pcharter |       630    84.89158    60.79327      27.29    529.058
      qbeach |       630    .2546022    .1983357      .0678      .5333
       qpier |       630    .1716835    .1687288      .0014      .4522
    qprivate |       630    .1695303    .2033172      .0014      .7369
    qcharter |       630    .6368509     .688508      .0029     2.3101
-------------+--------------------------------------------------------
      income |       630    3741.402     2145.71   416.6667      12500
    ydiv1000 |       630    3.741402     2.14571   .4166667       12.5
     charter |       630    .7174603    .4505921          0          1
      pratio |       630    27.45581    126.2598  -215.3976   406.2712
      lnrelp |       630    .2745581    1.262598  -2.153976   4.062713


. * Summary by charter or by pier


. sort mode
. by mode: summarize
-------------------------------------------------------------------------------
-> mode = pier

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        mode |       178           2           0          2          2
       price |       178    30.57133    35.58442       1.29    224.296
       crate |       178    .2025348    .1702942      .0014      .4522
      dbeach |       178           0           0          0          0
       dpier |       178           1           0          1          1
-------------+--------------------------------------------------------
    dprivate |       178           0           0          0          0
    dcharter |       178           0           0          0          0
      pbeach |       178    30.57133    35.58442       1.29    224.296
       ppier |       178    30.57133    35.58442       1.29    224.296
    pprivate |       178    82.42908    69.30802       2.29    494.058
-------------+--------------------------------------------------------
    pcharter |       178    109.7633    72.37726      27.29    529.058
      qbeach |       178    .2614444    .1949684      .0678      .5333
       qpier |       178    .2025348    .1702942      .0014      .4522
    qprivate |       178    .1501489    .0968393      .0014      .2601
    qcharter |       178    .4980798    .3756255      .0029     1.0266
-------------+--------------------------------------------------------
      income |       178    3387.172    2340.324   416.6667      12500
    ydiv1000 |       178    3.387172    2.340324   .4166667       12.5
     charter |       178           0           0          0          0
      pratio |       178    164.2956    104.3052  -79.13918   406.2712
      lnrelp |       178    1.642956    1.043052  -.7913917   4.062713

-------------------------------------------------------------------------------
-> mode = charter

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        mode |       452           4           0          4          4
       price |       452    75.09694    52.51942      27.29    387.208
       crate |       452    .6914998    .7714728      .0029     2.3101
      dbeach |       452           0           0          0          0
       dpier |       452           0           0          0          0
-------------+--------------------------------------------------------
    dprivate |       452           0           0          0          0
    dcharter |       452           1           0          1          1
      pbeach |       452    120.6483    99.78664       4.29    578.048
       ppier |       452    120.6483    99.78664       4.29    578.048
    pprivate |       452    44.56376    52.23744       2.29    362.208
-------------+--------------------------------------------------------
    pcharter |       452    75.09694    52.51942      27.29    387.208
      qbeach |       452    .2519077    .1997956      .0678      .5333
       qpier |       452    .1595341    .1667353      .0014      .4522
    qprivate |       452    .1771628    .2318749      .0014      .7369
    qcharter |       452    .6914998    .7714728      .0029     2.3101
-------------+--------------------------------------------------------
      income |       452      3880.9    2050.028   416.6667      12500
    ydiv1000 |       452      3.8809    2.050028   .4166667       12.5
     charter |       452           1           0          1          1
      pratio |       452   -26.43243    87.53686  -215.3976   235.8242
      lnrelp |       452   -.2643243    .8753686  -2.153976   2.358242

.
. * Write final data to a text (ascii) file so can use with programs other than Stata
. outfile charter lnrelp using mma14p1binary.asc, replace
.
. ********** TABLE 14.1 - DATA SUMMARY BY OUTCOME AND OVERALL **********
.
. * Following gives Table 14.1 page 464
. summarize charter pcharter ppier lnrelp
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
     charter |       630    .7174603    .4505921          0          1
    pcharter |       630    84.89158    60.79327      27.29    529.058
       ppier |       630    95.19802    95.62037       1.29    578.048
      lnrelp |       630    .2745581    1.262598  -2.153976   4.062713
. sort mode
. by mode: summarize charter pcharter ppier lnrelp
-------------------------------------------------------------------------------
-> mode = pier

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
     charter |       178           0           0          0          0
    pcharter |       178    109.7633    72.37726      27.29    529.058
       ppier |       178    30.57133    35.58442       1.29    224.296
      lnrelp |       178    1.642956    1.043052  -.7913917   4.062713

-------------------------------------------------------------------------------
-> mode = charter

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
     charter |       452           1           0          1          1
    pcharter |       452    75.09694    52.51942      27.29    387.208
       ppier |       452    120.6483    99.78664       4.29    578.048
      lnrelp |       452   -.2643243    .8753686  -2.153976   2.358242

.
. ********** TABLE 14.2 - ESTIMATE LOGIT, PROBIT AND OLS MODELS
.
. logit charter lnrelp
Iteration 0:   log likelihood = -375.06167
Iteration 1:   log likelihood = -223.44527
Iteration 2:   log likelihood = -208.29369
Iteration 3:   log likelihood = -206.84942
Iteration 4:   log likelihood = -206.82698
Iteration 5:   log likelihood = -206.82697

Logit estimates                                   Number of obs   =        630
                                                  LR chi2(1)      =     336.47
                                                  Prob > chi2     =     0.0000
Log likelihood = -206.82697                       Pseudo R2       =     0.4486

------------------------------------------------------------------------------
     charter |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lnrelp |   -1.82253   .1445681   -12.61   0.000    -2.105879   -1.539182
       _cons |   2.053125   .1689307    12.15   0.000     1.722027    2.384223
------------------------------------------------------------------------------
. estimates store blogit
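
The LR chi2 and Pseudo R2 figures in the logit header can be reproduced from the two log-likelihood values in the iteration log. The Python sketch below is an illustration, not part of the original program. It assumes that iteration 0 is the constant-only model and that the pseudo R2 is one minus the ratio of the fitted to the constant-only log likelihood; the function name `lr_and_pseudo_r2` is hypothetical.

```python
def lr_and_pseudo_r2(ll_null, ll_fit):
    """Likelihood-ratio statistic and pseudo R-squared.

    ll_null: log likelihood of the constant-only model (iteration 0, assumed)
    ll_fit:  log likelihood at convergence
    """
    lr = 2.0 * (ll_fit - ll_null)
    pseudo_r2 = 1.0 - ll_fit / ll_null
    return lr, pseudo_r2

# Log-likelihood values copied from the logit and probit iteration logs.
lr_logit, r2_logit = lr_and_pseudo_r2(-375.06167, -206.82697)
lr_probit, r2_probit = lr_and_pseudo_r2(-375.06167, -204.41087)
print(round(lr_logit, 2), round(r2_logit, 4))
print(round(lr_probit, 2), round(r2_probit, 4))
```

Both models share the same constant-only log likelihood, so the probit's slightly higher fitted log likelihood gives it the slightly higher pseudo R2.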
.
. probit charter lnrelp
Iteration 0:   log likelihood = -375.06167
Iteration 1:   log likelihood = -221.55989
Iteration 2:   log likelihood = -205.42312
Iteration 3:   log likelihood = -204.41773
Iteration 4:   log likelihood = -204.41087

Probit estimates                                  Number of obs   =        630
                                                  LR chi2(1)      =     341.30
                                                  Prob > chi2     =     0.0000
Log likelihood = -204.41087                       Pseudo R2       =     0.4550

------------------------------------------------------------------------------
     charter |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lnrelp |  -1.055515   .0761117   -13.87   0.000    -1.204691   -.9063383
       _cons |    1.19436    .089504    13.34   0.000     1.018936    1.369785
------------------------------------------------------------------------------
. estimates store bprobit
.
. regress charter lnrelp
      Source |       SS       df       MS              Number of obs =     630
-------------+------------------------------           F(  1,   628) =  542.12
       Model |  59.1676598     1  59.1676598           Prob > F      =  0.0000
    Residual |  68.5402767   628  .109140568           R-squared     =  0.4633
-------------+------------------------------           Adj R-squared =  0.4624
       Total |  127.707937   629  .203033285           Root MSE      =  .33036

------------------------------------------------------------------------------
     charter |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lnrelp |  -.2429137   .0104328   -23.28   0.000    -.2634011   -.2224262
       _cons |   .7841542   .0134701    58.21   0.000     .7577023    .8106061
------------------------------------------------------------------------------
. estimates store bOLS
.
. * Heteroskedastic robust standard errors only needed for OLS
. * but given for other models for completeness
.
. logit charter lnrelp, robust
Iteration 0:   log pseudo-likelihood = -375.06167
Iteration 1:   log pseudo-likelihood = -223.44527
Iteration 2:   log pseudo-likelihood = -208.29369
Iteration 3:   log pseudo-likelihood = -206.84942
Iteration 4:   log pseudo-likelihood = -206.82698
Iteration 5:   log pseudo-likelihood = -206.82697

Logit estimates                                   Number of obs   =        630
                                                  Wald chi2(1)    =     194.28
                                                  Prob > chi2     =     0.0000
Log pseudo-likelihood = -206.82697                Pseudo R2       =     0.4486

------------------------------------------------------------------------------
             |               Robust
     charter |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lnrelp |   -1.82253   .1307556   -13.94   0.000    -2.078807   -1.566254
       _cons |   2.053125   .1473477    13.93   0.000     1.764329    2.341921
------------------------------------------------------------------------------
. estimates store bloghet
.
. probit charter lnrelp, robust
Iteration 0:   log pseudo-likelihood = -375.06167
Iteration 1:   log pseudo-likelihood = -221.55989
Iteration 2:   log pseudo-likelihood = -205.42312
Iteration 3:   log pseudo-likelihood = -204.41773
Iteration 4:   log pseudo-likelihood = -204.41087

Probit estimates                                  Number of obs   =        630
                                                  Wald chi2(1)    =     232.07
                                                  Prob > chi2     =     0.0000
Log pseudo-likelihood = -204.41087                Pseudo R2       =     0.4550

------------------------------------------------------------------------------
             |               Robust
     charter |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lnrelp |  -1.055515   .0692881   -15.23   0.000    -1.191317   -.9197122
       _cons |    1.19436   .0794429    15.03   0.000     1.038655    1.350066
------------------------------------------------------------------------------
. estimates store bprobhet
.
. regress charter lnrelp, robust
Regression with robust standard errors                 Number of obs =     630
                                                       F(  1,   628) =  792.44
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.4633
                                                       Root MSE      =  .33036

------------------------------------------------------------------------------
             |               Robust
     charter |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lnrelp |  -.2429137   .0086292   -28.15   0.000    -.2598592   -.2259681
       _cons |   .7841542   .0119566    65.58   0.000     .7606744    .8076341
------------------------------------------------------------------------------
. estimates store bOLShet
.
. * Following gives Table 14.2 page 465
. estimates table blogit bprobit bOLS bloghet bprobhet bOLShet, /*
> */ t stats(N ll r2 r2_p) b(%8.3f) keep(_cons lnrelp)
--------------------------------------------------------------------------------
    Variable |   blogit   bprobit      bOLS   bloghet  bprobhet   bOLShet
-------------+------------------------------------------------------------------
       _cons |    2.053     1.194     0.784     2.053     1.194     0.784
             |    12.15     13.34     58.21     13.93     15.03     65.58
      lnrelp |   -1.823    -1.056    -0.243    -1.823    -1.056    -0.243
             |   -12.61    -13.87    -23.28    -13.94    -15.23    -28.15
-------------+------------------------------------------------------------------
           N |  630.000   630.000   630.000   630.000   630.000   630.000
          ll | -206.827  -204.411  -195.167  -206.827  -204.411  -195.167
          r2 |                        0.463                         0.463
        r2_p |    0.449     0.455               0.449     0.455
--------------------------------------------------------------------------------
                                                                     legend: b/t
.
. ********** FIGURE 14.1 - PLOT PREDICTED PROBABILITY AGAINST X FOR MODELS
.
. quietly logit charter lnrelp
. predict plogit, p
.
. quietly probit charter lnrelp
. predict pprobit, p
.
. quietly regress charter lnrelp
. predict pOLS
(option xb assumed; fitted values)
.
. sum charter plogit pprobit pOLS
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
     charter |       630    .7174603    .4505921          0          1
      plogit |       630    .7174603    .3193077   .0047196   .9974746
     pprobit |       630      .72019    .3196164   .0009877   .9997377
        pOLS |       630    .7174603    .3067022  -.2027341   1.307384
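
The minimum and maximum predicted probabilities in the summary above can be checked directly from the reported coefficients and the range of lnrelp (-2.153976 to 4.062713). The Python sketch below is an illustration only, not part of the original program. The function names are hypothetical and the coefficients are copied from the logit, probit and OLS output earlier in this log.

```python
import math

def logit_prob(x, b0=2.053125, b1=-1.82253):
    """Logit predicted probability: Lambda(b0 + b1*x)."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def probit_prob(x, b0=1.19436, b1=-1.055515):
    """Probit predicted probability: Phi(b0 + b1*x), using the error function."""
    z = b0 + b1 * x
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ols_fit(x, b0=0.7841542, b1=-0.2429137):
    """OLS (linear probability model) fitted value; not bounded to [0, 1]."""
    return b0 + b1 * x

x_max = 4.062713   # largest lnrelp in the sample
x_min = -2.153976  # smallest lnrelp in the sample
for name, f in (("logit", logit_prob), ("probit", probit_prob), ("OLS", ols_fit)):
    print(name, f(x_max), f(x_min))
```

Because the slope on lnrelp is negative, the smallest prediction occurs at the largest lnrelp. The OLS fitted values fall outside the unit interval at both ends of the range, whereas the logit and probit predictions cannot.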
.
. sort lnrelp
.
. * Following gives Figure 14.1 page 466
. graph twoway (scatter charter lnrelp, msize(vsmall) jitter(3)) /*
> */ (line plogit lnrelp, clstyle(p1)) /*
> */ (line pprobit lnrelp, clstyle(p2)) /*
> */ (line pOLS lnrelp, clstyle(p3)), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Predicted Probabilities Across Models") /*
> */ xtitle("Log relative price (lnrelp)", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Predicted probability", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(1) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Actual Data (jittered)") label(2 "Logit") /*
> */         label(3 "Probit") label(4 "OLS"))

. graph export ch14binary.wmf, replace


(file c:\Imbook\bwebpage\Section4\ch14binary.wmf written in Windows Metafile format)
.
. ********** CLOSE OUTPUT **********
. log close
log: c:\Imbook\bwebpage\Section4\mma14p1binary.txt
log type: text
closed on: 19 May 2005, 09:01:31

------------------------------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section4\mma15p1mnl.txt
log type: text
opened on: 19 May 2005, 12:16:20
.
. ********** OVERVIEW OF MMA15P1MNL.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 15.2.1-3 pages 491-5
. * Multinomial and conditional logit models analysis.
. * It provides ....
. * (0) Data summary (Table 15.1)
. * (1A) Multinomial Logit estimates (Table 15.1)
. * (1B) Multinomial Logit marginal effects (text page 494)
. * (2A) Conditional Logit estimates (Table 15.2)
. * (2B) Conditional Logit marginal effects (Table 15.3)
. * (3) Multinomial estimates obtained using Conditional Logit
. * (4) "Mixed Model" estimates (Table 15.1)
.
. * Related programs are
. * mma15p2gev.do estimates a nested logit model using Stata
. * mma15p3mnl.lim estimates multinomial models using Limdep
. * mma15p4gev.lim estimates conditional and nested logit models using Limdep
.
. * To run this program you need data file
. * Nldata.asc
.
. /* Program summary:
>
> (1) Multinomial logit of mode on alternative-invariant regressor (income)
>       mlogit mode income
>
> (2) Conditional logit of mode on alternative-specific regressor (price, catch rate)
>       First reshape data so 4 observations per individual - one for each mode.
>       clogit mode p q
>
> (3) Conditional logit of mode on alternative-invariant regressor (income)
>       First reshape data so 4 observations per individual - one for each mode.
>       Then create dummy variables for each mode d2 d3 d4
>       clogit mode d2 d3 d4 d2y d3y d4y
>       This gives same results as (1)
>
> (4) Conditional logit of mode on alternative-invariant regressor (income)
>       and on alternative-specific regressor (price, catch rate)
>       First reshape data so 4 observations per individual - one for each mode.
>       Then create dummy variables for each mode d2 d3 d4
>       clogit mode d2 d3 d4 d2y d3y d4y p q
> */
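
The reshaping step described in the program summary (one wide record per individual turned into four rows, one per fishing mode, with mode dummies d2-d4 and income interactions d2y-d4y) can be sketched in Python as follows. This is an illustration under stated assumptions, not the Stata code used by the program. The function name `wide_to_long`, the `id`, `alt` and `chosen` fields, and the example values are hypothetical, while the p*/q* variable names and ydiv1000 follow the infile statement in this log.

```python
MODES = ("beach", "pier", "private", "charter")  # alternatives 1..4 as labelled in the log

def wide_to_long(person_id, record):
    """Return four rows (one per alternative) for one wide-format observation."""
    rows = []
    for alt, name in enumerate(MODES, start=1):
        row = {
            "id": person_id,
            "alt": alt,
            "chosen": 1 if record["mode"] == alt else 0,  # dependent variable
            "p": record["p" + name],                      # alternative-specific price
            "q": record["q" + name],                      # alternative-specific catch rate
        }
        # Dummies for modes 2-4 (beach is the base) and their income interactions.
        for k in (2, 3, 4):
            dummy = 1 if alt == k else 0
            row["d%d" % k] = dummy
            row["d%dy" % k] = dummy * record["ydiv1000"]
        rows.append(row)
    return rows

# Made-up example record (values are illustrative, not taken from the data file).
example = {
    "mode": 4, "ydiv1000": 7.0,
    "pbeach": 100.0, "ppier": 100.0, "pprivate": 150.0, "pcharter": 180.0,
    "qbeach": 0.07, "qpier": 0.05, "qprivate": 0.26, "qcharter": 0.54,
}
long_rows = wide_to_long(1, example)
for r in long_rows:
    print(r)
```

Interacting income with the mode dummies is what allows an alternative-invariant regressor to enter a conditional logit: without the interaction, income would be constant within each individual's four rows and would drop out.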
.
. ********** SETUP **********
.
. set more off
. version 8.0
. set scheme s1mono /* Graphics scheme */
.
. ********** DATA DESCRIPTION **********
.
. * Data Set comes from :
. * J. A. Herriges and C. L. Kling,
. * "Nonlinear Income Effects in Random Utility Models",
. * Review of Economics and Statistics, 81(1999): 62-72
.
. * The data are given as a combined observation with data on all 4 choices.
. * This will work for multinomial logit program.
. * For conditional logit will need to make a new data set which has
. * four separate entries for each observation as there are four alternatives.
.
. * Filename: NLDATA.ASC
. * Format: Ascii
. * Number of Observations: 1182
. * Each observation appears over 3 lines with 4 variables per line
. * so 4 x 1182 = 4728 observations
. * Variable Number and Description
. * 1 Recreation mode choice. = 1 if beach, = 2 if pier; = 3 if private boat; = 4 if charter
. * 2 Price for chosen alternative
. * 3 Catch rate for chosen alternative
. * 4 = 1 if beach mode chosen; = 0 otherwise
. * 5 = 1 if pier mode chosen; = 0 otherwise
. * 6 = 1 if private boat mode chosen; = 0 otherwise
. * 7 = 1 if charter boat mode chosen; = 0 otherwise
. * 8 = price for beach mode
. * 9 = price for pier mode
. * 10 = price for private boat mode
. * 11 = price for charter boat mode
. * 12 = catch rate for beach mode
. * 13 = catch rate for pier mode
. * 14 = catch rate for private boat mode
. * 15 = catch rate for charter boat mode
. * 16 = monthly income
.
. ********** READ IN DATA and SUMMARIZE (Table 15.1, p.492) **********
.
. * Method to read in depends on model used
.
. /* Data are on fishing mode: 1 beach, 2 pier, 3 private boat, 4 charter
> Data come as one observation having data for all 4 modes.
> Both alternative specific and alternative invariant regressors.
> */
.
. infile mode price crate dbeach dpier dprivate dcharter pbeach ppier /*
> */ pprivate pcharter qbeach qpier qprivate qcharter income /*
> */ using nldata.asc
(1182 observations read)
.
. gen ydiv1000 = income/1000
.
. * Look at data by alternative
. label define modetype 1 "beach" 2 "pier" 3 "private" 4 "charter"
. label values mode modetype
.
. summarize
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        mode |      1182    3.005076    .9936162          1          4
       price |      1182    52.08197    53.82997       1.29     666.11
       crate |      1182    .3893684    .5605964      .0002     2.3101
      dbeach |      1182    .1133672    .3171753          0          1
       dpier |      1182    .1505922    .3578023          0          1
-------------+--------------------------------------------------------
    dprivate |      1182    .3536379    .4783008          0          1
    dcharter |      1182    .3824027    .4861799          0          1
      pbeach |      1182     103.422     103.641       1.29    843.186
       ppier |      1182     103.422     103.641       1.29    843.186
    pprivate |      1182    55.25657    62.71344       2.29     666.11
-------------+--------------------------------------------------------
    pcharter |      1182    84.37924    63.54465      27.29     691.11
      qbeach |      1182    .2410113    .1907524      .0678      .5333
       qpier |      1182    .1622237    .1603898      .0014      .4522
    qprivate |      1182    .1712146    .2097885      .0002      .7369
    qcharter |      1182    .6293679    .7061142      .0021     2.3101
-------------+--------------------------------------------------------
      income |      1182    4099.337    2461.964   416.6667      12500
    ydiv1000 |      1182    4.099337    2.461964   .4166667       12.5
. sort mode
. by mode: summarize
-------------------------------------------------------------------------------
-> mode = beach

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        mode |       134           1           0          1          1
       price |       134    35.69949    43.09414       1.29     306.82
       crate |       134    .2791948    .1938734      .0678      .5333
      dbeach |       134           1           0          1          1
       dpier |       134           0           0          0          0
-------------+--------------------------------------------------------
    dprivate |       134           0           0          0          0
    dcharter |       134           0           0          0          0
      pbeach |       134    35.69949    43.09414       1.29     306.82
       ppier |       134    35.69949    43.09414       1.29     306.82
    pprivate |       134    97.80913    75.43844       2.29    392.946
-------------+--------------------------------------------------------
    pcharter |       134    125.0032    78.37641      27.29    427.946
      qbeach |       134    .2791948    .1938734      .0678      .5333
       qpier |       134    .2190015    .1677117      .0025      .4522
    qprivate |       134    .1593985    .0948855      .0008      .2601
    qcharter |       134    .5176089    .3629096      .0027     1.0266
-------------+--------------------------------------------------------
      income |       134    4051.617     2505.42   416.6667      12500
    ydiv1000 |       134    4.051617     2.50542   .4166667       12.5

-------------------------------------------------------------------------------
-> mode = pier

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        mode |       178           2           0          2          2
       price |       178    30.57133    35.58442       1.29    224.296
       crate |       178    .2025348    .1702942      .0014      .4522
      dbeach |       178           0           0          0          0
       dpier |       178           1           0          1          1
-------------+--------------------------------------------------------
    dprivate |       178           0           0          0          0
    dcharter |       178           0           0          0          0
      pbeach |       178    30.57133    35.58442       1.29    224.296
       ppier |       178    30.57133    35.58442       1.29    224.296
    pprivate |       178    82.42908    69.30802       2.29    494.058
-------------+--------------------------------------------------------
    pcharter |       178    109.7633    72.37726      27.29    529.058
      qbeach |       178    .2614444    .1949684      .0678      .5333
       qpier |       178    .2025348    .1702942      .0014      .4522
    qprivate |       178    .1501489    .0968393      .0014      .2601
    qcharter |       178    .4980798    .3756255      .0029     1.0266
-------------+--------------------------------------------------------
      income |       178    3387.172    2340.324   416.6667      12500
    ydiv1000 |       178    3.387172    2.340324   .4166667       12.5

-------------------------------------------------------------------------------
-> mode = private

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        mode |       418           3           0          3          3
       price |       418    41.60681    55.90806       2.29     666.11
       crate |       418    .1775411    .2435798      .0002      .7369
      dbeach |       418           0           0          0          0
       dpier |       418           0           0          0          0
-------------+--------------------------------------------------------
    dprivate |       418           1           0          1          1
    dcharter |       418           0           0          0          0
      pbeach |       418    137.5271    115.3058       2.29    843.186
       ppier |       418    137.5271    115.3058       2.29    843.186
    pprivate |       418    41.60681    55.90806       2.29     666.11
-------------+--------------------------------------------------------
    pcharter |       418    70.58409    56.39575      27.29     691.11
      qbeach |       418    .2082868    .1729351      .0678      .5333
       qpier |       418    .1297646    .1368029      .0025      .4522
    qprivate |       418    .1775411    .2435798      .0002      .7369
    qcharter |       418    .6539167    .8064379      .0021     2.3101
-------------+--------------------------------------------------------
      income |       418    4654.107    2777.898   416.6667      12500
    ydiv1000 |       418    4.654107    2.777898   .4166667       12.5
-------------------------------------------------------------------------------
-> mode = charter

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        mode |       452           4           0          4          4
       price |       452    75.09694    52.51942      27.29    387.208
       crate |       452    .6914998    .7714728      .0029     2.3101
      dbeach |       452           0           0          0          0
       dpier |       452           0           0          0          0
-------------+--------------------------------------------------------
    dprivate |       452           0           0          0          0
    dcharter |       452           1           0          1          1
      pbeach |       452    120.6483    99.78664       4.29    578.048
       ppier |       452    120.6483    99.78664       4.29    578.048
    pprivate |       452    44.56376    52.23744       2.29    362.208
-------------+--------------------------------------------------------
    pcharter |       452    75.09694    52.51942      27.29    387.208
      qbeach |       452    .2519077    .1997956      .0678      .5333
       qpier |       452    .1595341    .1667353      .0014      .4522
    qprivate |       452    .1771628    .2318749      .0014      .7369
    qcharter |       452    .6914998    .7714728      .0029     2.3101
-------------+--------------------------------------------------------
      income |       452      3880.9    2050.028   416.6667      12500
    ydiv1000 |       452      3.8809    2.050028   .4166667       12.5
.
. * Following commands give Table 15.1, p.492
. summarize ydiv1000 pbeach ppier pprivate pcharter qbeach qpier /*
> */ qprivate qcharter dbeach dpier dprivate dcharter

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
    ydiv1000 |      1182    4.099337    2.461964   .4166667       12.5
      pbeach |      1182     103.422     103.641       1.29    843.186
       ppier |      1182     103.422     103.641       1.29    843.186
    pprivate |      1182    55.25657    62.71344       2.29     666.11
    pcharter |      1182    84.37924    63.54465      27.29     691.11
-------------+--------------------------------------------------------
      qbeach |      1182    .2410113    .1907524      .0678      .5333
       qpier |      1182    .1622237    .1603898      .0014      .4522
    qprivate |      1182    .1712146    .2097885      .0002      .7369
    qcharter |      1182    .6293679    .7061142      .0021     2.3101
      dbeach |      1182    .1133672    .3171753          0          1
-------------+--------------------------------------------------------
       dpier |      1182    .1505922    .3578023          0          1
    dprivate |      1182    .3536379    .4783008          0          1
    dcharter |      1182    .3824027    .4861799          0          1
. sort mode
. by mode: summarize ydiv1000 pbeach ppier pprivate pcharter qbeach qpier /*
> */ qprivate qcharter dbeach dpier dprivate dcharter
-------------------------------------------------------------------------------
-> mode = beach

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
    ydiv1000 |       134    4.051617     2.50542   .4166667       12.5
      pbeach |       134    35.69949    43.09414       1.29     306.82
       ppier |       134    35.69949    43.09414       1.29     306.82
    pprivate |       134    97.80913    75.43844       2.29    392.946
    pcharter |       134    125.0032    78.37641      27.29    427.946
-------------+--------------------------------------------------------
      qbeach |       134    .2791948    .1938734      .0678      .5333
       qpier |       134    .2190015    .1677117      .0025      .4522
    qprivate |       134    .1593985    .0948855      .0008      .2601
    qcharter |       134    .5176089    .3629096      .0027     1.0266
      dbeach |       134           1           0          1          1
-------------+--------------------------------------------------------
       dpier |       134           0           0          0          0
    dprivate |       134           0           0          0          0
    dcharter |       134           0           0          0          0

-------------------------------------------------------------------------------
-> mode = pier

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
    ydiv1000 |       178    3.387172    2.340324   .4166667       12.5
      pbeach |       178    30.57133    35.58442       1.29    224.296
       ppier |       178    30.57133    35.58442       1.29    224.296
    pprivate |       178    82.42908    69.30802       2.29    494.058
    pcharter |       178    109.7633    72.37726      27.29    529.058
-------------+--------------------------------------------------------
      qbeach |       178    .2614444    .1949684      .0678      .5333
       qpier |       178    .2025348    .1702942      .0014      .4522
    qprivate |       178    .1501489    .0968393      .0014      .2601
    qcharter |       178    .4980798    .3756255      .0029     1.0266
      dbeach |       178           0           0          0          0
-------------+--------------------------------------------------------
       dpier |       178           1           0          1          1
    dprivate |       178           0           0          0          0
    dcharter |       178           0           0          0          0
-------------------------------------------------------------------------------
-> mode = private

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
    ydiv1000 |       418    4.654107    2.777898   .4166667       12.5
      pbeach |       418    137.5271    115.3058       2.29    843.186
       ppier |       418    137.5271    115.3058       2.29    843.186
    pprivate |       418    41.60681    55.90806       2.29     666.11
    pcharter |       418    70.58409    56.39575      27.29     691.11
-------------+--------------------------------------------------------
      qbeach |       418    .2082868    .1729351      .0678      .5333
       qpier |       418    .1297646    .1368029      .0025      .4522
    qprivate |       418    .1775411    .2435798      .0002      .7369
    qcharter |       418    .6539167    .8064379      .0021     2.3101
      dbeach |       418           0           0          0          0
-------------+--------------------------------------------------------
       dpier |       418           0           0          0          0
    dprivate |       418           1           0          1          1
    dcharter |       418           0           0          0          0
-------------------------------------------------------------------------------
-> mode = charter

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
    ydiv1000 |       452      3.8809    2.050028   .4166667       12.5
      pbeach |       452    120.6483    99.78664       4.29    578.048
       ppier |       452    120.6483    99.78664       4.29    578.048
    pprivate |       452    44.56376    52.23744       2.29    362.208
    pcharter |       452    75.09694    52.51942      27.29    387.208
-------------+--------------------------------------------------------
      qbeach |       452    .2519077    .1997956      .0678      .5333
       qpier |       452    .1595341    .1667353      .0014      .4522
    qprivate |       452    .1771628    .2318749      .0014      .7369
    qcharter |       452    .6914998    .7714728      .0029     2.3101
      dbeach |       452           0           0          0          0
-------------+--------------------------------------------------------
       dpier |       452           0           0          0          0
    dprivate |       452           0           0          0          0
    dcharter |       452           1           0          1          1

.
. ********** (1) MULTINOMIAL LOGIT: ALTERNATIVE-INVARIANT REGRESSOR *********
.
. *** (1A) Estimate the model
.
. * Data are already in form for mlogit
.
. * The following gives MNL column of Table 15.2, p.493
. mlogit mode ydiv1000, basecategory(1)

Iteration 0:   log likelihood = -1497.7229
Iteration 1:   log likelihood = -1477.5265
Iteration 2:   log likelihood = -1477.1514
Iteration 3:   log likelihood = -1477.1506

Multinomial logistic regression                   Number of obs   =       1182
                                                  LR chi2(3)      =      41.14
                                                  Prob > chi2     =     0.0000
Log likelihood = -1477.1506                       Pseudo R2       =     0.0137

------------------------------------------------------------------------------
        mode |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
pier         |
    ydiv1000 |  -.1434029   .0532882    -2.69   0.007    -.2478459     -.03896
       _cons |   .8141503   .2286316     3.56   0.000     .3660405     1.26226
-------------+----------------------------------------------------------------
private      |
    ydiv1000 |   .0919064   .0406638     2.26   0.024     .0122069    .1716059
       _cons |   .7389208   .1967309     3.76   0.000     .3533352    1.124506
-------------+----------------------------------------------------------------
charter      |
    ydiv1000 |  -.0316399   .0418463    -0.76   0.450    -.1136571    .0503774
       _cons |   1.341291   .1945167     6.90   0.000     .9600457    1.722537
------------------------------------------------------------------------------
(Outcome mode==beach is the comparison group)
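
As a cross-check on how the multinomial logit output maps into choice probabilities, the following Python sketch (not part of the original Stata program) plugs the reported coefficients into p_j = exp(a_j + b_j*y) / sum_k exp(a_k + b_k*y), with beach normalized to zero. The names COEF and mnl_probs are illustrative, and the evaluation point is the sample mean of ydiv1000 reported earlier in the log.

```python
import math

# Coefficients copied from the mlogit output above; beach is the base
# category, so its intercept and slope are normalized to zero.
COEF = {
    "beach":   (0.0, 0.0),
    "pier":    (0.8141503, -0.1434029),   # (_cons, ydiv1000)
    "private": (0.7389208, 0.0919064),
    "charter": (1.341291, -0.0316399),
}

def mnl_probs(y):
    """Return the four MNL choice probabilities at income/1000 equal to y."""
    expv = {alt: math.exp(a + b * y) for alt, (a, b) in COEF.items()}
    total = sum(expv.values())
    return {alt: v / total for alt, v in expv.items()}

# Evaluate at the sample mean of ydiv1000 from the summarize output.
probs_at_mean = mnl_probs(4.099337)
for alt, p in probs_at_mean.items():
    print(f"{alt:8s} {p:.4f}")
```

Up to rounding of the printed coefficients, these values should agree with the Pr(mode==j) figures that mfx reports at the sample mean.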

.
. *** (1B) Calculate the marginal effects
.
. quietly mlogit mode ydiv1000, basecategory(1)
. * Predict by default gives the probabilities
. predict p1 p2 p3 p4
(option p assumed; predicted probabilities)
.
. * As check compare predicted to actual probabilities
. summarize dbeach p1 dpier p2 dprivate p3 dcharter p4

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      dbeach |      1182    .1133672    .3171753          0          1
          p1 |      1182    .1133672    .0036716   .0947395   .1153659
       dpier |      1182    .1505922    .3578023          0          1
          p2 |      1182    .1505922    .0444575   .0356142   .2342903
    dprivate |      1182    .3536379    .4783008          0          1
-------------+--------------------------------------------------------
          p3 |      1182    .3536379    .0797714   .2396973    .625706
    dcharter |      1182    .3824027    .4861799          0          1
          p4 |      1182    .3824027    .0346281   .2439403   .4158273
.
. * Quick way to compute marginal effects (or semi-elasticities dp/dlnx or elasticities)
. * is to use the built-in Stata command mfx, which evaluates at the sample mean
. * Options: dydx, eyex, dyex or eydx
. mfx compute, dydx predict(outcome(1))

Marginal effects after mlogit
      y  = Pr(mode==1) (predict, outcome(1))
         =  .11541492
------------------------------------------------------------------------------
variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]      X
---------+--------------------------------------------------------------------
ydiv1000 |    .000075      .00393    0.02   0.985  -.007635  .007785   4.09934
------------------------------------------------------------------------------

. mfx compute, dydx predict(outcome(2))

Marginal effects after mlogit
      y  = Pr(mode==2) (predict, outcome(2))
         =  .14472379
------------------------------------------------------------------------------
variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]      X
---------+--------------------------------------------------------------------
ydiv1000 |  -.0206598      .00487   -4.24   0.000  -.030212 -.011108   4.09934
------------------------------------------------------------------------------

. mfx compute, dydx predict(outcome(3))

Marginal effects after mlogit
      y  = Pr(mode==3) (predict, outcome(3))
         =  .35220366
------------------------------------------------------------------------------
variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]      X
---------+--------------------------------------------------------------------
ydiv1000 |   .0325985      .00569    5.73   0.000   .021442  .043755   4.09934
------------------------------------------------------------------------------

. mfx compute, dydx predict(outcome(4))

Marginal effects after mlogit
      y  = Pr(mode==4) (predict, outcome(4))
         =  .38765763
------------------------------------------------------------------------------
variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]      X
---------+--------------------------------------------------------------------
ydiv1000 |  -.0120137      .00608   -1.98   0.048  -.023922 -.000106   4.09934
------------------------------------------------------------------------------
.
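
The dy/dx values from mfx can be checked against the standard calculus formula for the multinomial logit, dp_j/dy = p_j * (b_j - sum_k p_k*b_k), evaluated at the sample mean. The Python sketch below is an illustration only (not part of the original Stata program); CONS, BETA and mnl_marginal_effects are hypothetical names, and the coefficients are copied from the mlogit output in part (1A).

```python
import math

# Intercepts and income slopes in the order beach (base), pier, private, charter.
CONS = [0.0, 0.8141503, 0.7389208, 1.341291]
BETA = [0.0, -0.1434029, 0.0919064, -0.0316399]

def mnl_marginal_effects(y):
    """Return dp_j/dy for each alternative at income/1000 equal to y."""
    expv = [math.exp(a + b * y) for a, b in zip(CONS, BETA)]
    total = sum(expv)
    p = [v / total for v in expv]
    # Probability-weighted average of the slope coefficients.
    beta_bar = sum(pj * bj for pj, bj in zip(p, BETA))
    return [pj * (bj - beta_bar) for pj, bj in zip(p, BETA)]

# Evaluation point X = 4.09934 as reported in the mfx output.
me_at_mean = mnl_marginal_effects(4.09934)
print([round(m, 6) for m in me_at_mean])
```

Because the probabilities sum to one, the four marginal effects sum to zero; a gain in one mode's probability must be offset by losses elsewhere.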
. * Better is to evaluate marginal effect for each observation and average
. * The following calculates marginal effects using noncalculus methods
. * by comparing the predicted probability before and after change in x
. * Here consider small change of 0.0001 - then multiply by 10000
. * So should be similar to using calculus methods.
. replace ydiv1000 = ydiv1000 + 0.0001
(1182 real changes made)
. predict p1new p2new p3new p4new
(option p assumed; predicted probabilities)
. gen dp1dy = 10000*(p1new - p1)
. gen dp2dy = 10000*(p2new - p2)
. gen dp3dy = 10000*(p3new - p3)
. gen dp4dy = 10000*(p4new - p4)
.
. * The computed marginal effects follow.
. * These are close to those given in text page 494 (which were calculated using Limdep)
. sum dp1dy dp2dy dp3dy dp4dy

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       dp1dy |      1182    .0001549    .0015919  -.0042468   .0027567
       dp2dy |      1182   -.0207849    .0046004  -.0278652  -.0067055
       dp3dy |      1182    .0318045    .0014852   .0280142   .0336766
       dp4dy |      1182   -.0111929    .0041308  -.0190735  -.0026822

.
. * Note that here these are similar to the earlier values at means
. * This is because there is little variation in predicted probability across individuals here
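
The finite-difference approach used above (perturb the regressor, re-predict, difference, rescale, then average over observations) can be sketched in Python as follows. This is an illustration only: the income values are hypothetical and are NOT the actual sample, so the averages will not reproduce the dp1dy to dp4dy means; the coefficients are those from the mlogit output in part (1A).

```python
import math

# Intercepts and income slopes in the order beach (base), pier, private, charter.
CONS = [0.0, 0.8141503, 0.7389208, 1.341291]
BETA = [0.0, -0.1434029, 0.0919064, -0.0316399]

def probs(y):
    """MNL choice probabilities at income/1000 equal to y."""
    expv = [math.exp(a + b * y) for a, b in zip(CONS, BETA)]
    total = sum(expv)
    return [v / total for v in expv]

def average_fd_effects(incomes, h=0.0001):
    """Average over observations of (p(y + h) - p(y)) / h for each alternative."""
    sums = [0.0] * len(CONS)
    for y in incomes:
        before, after = probs(y), probs(y + h)
        for j in range(len(CONS)):
            sums[j] += (after[j] - before[j]) / h
    return [s / len(incomes) for s in sums]

# Hypothetical income/1000 values, for illustration only (not the real data).
incomes = [0.4166667, 2.0, 4.0, 6.0, 12.5]
avg_effects = average_fd_effects(incomes)
print([round(e, 6) for e in avg_effects])
```

Dividing by h = 0.0001 is the same as the multiplication by 10000 in the gen commands. The sketch works in double precision, so it avoids the rounding noise that differencing float-stored predictions can introduce.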
.
. * ASIDE: Binary logit will differ a little from MNL
. keep if mode == 1 | mode == 2
(870 observations deleted)
. mlogit mode ydiv1000

Iteration 0:   log likelihood = -213.14899
Iteration 1:   log likelihood = -210.28877
Iteration 2:   log likelihood = -210.28833

Multinomial logistic regression                   Number of obs   =        312
                                                  LR chi2(1)      =       5.72
                                                  Prob > chi2     =     0.0168
Log likelihood = -210.28833                       Pseudo R2       =     0.0134

------------------------------------------------------------------------------
        mode |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
beach        |
    ydiv1000 |   .1134757   .0481736     2.36   0.018     .0190571    .2078942
       _cons |  -.7037127   .2125851    -3.31   0.001    -1.120372   -.2870535
------------------------------------------------------------------------------
(Outcome mode==pier is the comparison group)
.
. ******* (2) CONDITIONAL LOGIT: ALTERNATIVE-SPECIFIC REGRESSOR *********
.
. *** (2A) Estimate the model
.
. * This requires reshaping the data
. clear
. infile mode price crate dbeach dpier dprivate dcharter pbeach ppier /*
> */ pprivate pcharter qbeach qpier qprivate qcharter income /*
> */ using nldata.asc
(1182 observations read)
.
. gen ydiv1000 = income/1000
.
. * Data are one entry per individual
. * Need to reshape to 4 observations per individual - one for each alternative
. * Use reshape to do this, which also creates the variable alterntv (see below):
. * alterntv = 1 if beach, = 2 if pier, = 3 if private boat, = 4 if charter
. gen id = _n
. gen d1 = dbeach
. gen p1 = pbeach
. gen q1 = qbeach
. gen d2 = dpier
. gen p2 = ppier
. gen q2 = qpier
. gen d3 = dprivate
. gen p3 = pprivate
. gen q3 = qprivate
. gen d4 = dcharter
. gen p4 = pcharter
. gen q4 = qcharter
. describe

Contains data
  obs:         1,182
 vars:            30
 size:       146,568 (98.6% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
mode            float  %9.0g
price           float  %9.0g
crate           float  %9.0g
dbeach          float  %9.0g
dpier           float  %9.0g
dprivate        float  %9.0g
dcharter        float  %9.0g
pbeach          float  %9.0g
ppier           float  %9.0g
pprivate        float  %9.0g
pcharter        float  %9.0g
qbeach          float  %9.0g
qpier           float  %9.0g
qprivate        float  %9.0g
qcharter        float  %9.0g
income          float  %9.0g
ydiv1000        float  %9.0g
id              float  %9.0g
d1              float  %9.0g
p1              float  %9.0g
q1              float  %9.0g
d2              float  %9.0g
p2              float  %9.0g
q2              float  %9.0g
d3              float  %9.0g
p3              float  %9.0g
q3              float  %9.0g
d4              float  %9.0g
p4              float  %9.0g
q4              float  %9.0g
-------------------------------------------------------------------------------
Sorted by:
     Note:  dataset has changed since last saved
. summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        mode |      1182    3.005076    .9936162          1          4
       price |      1182    52.08197    53.82997       1.29     666.11
       crate |      1182    .3893684    .5605964      .0002     2.3101
      dbeach |      1182    .1133672    .3171753          0          1
       dpier |      1182    .1505922    .3578023          0          1
-------------+--------------------------------------------------------
    dprivate |      1182    .3536379    .4783008          0          1
    dcharter |      1182    .3824027    .4861799          0          1
      pbeach |      1182     103.422     103.641       1.29    843.186
       ppier |      1182     103.422     103.641       1.29    843.186
    pprivate |      1182    55.25657    62.71344       2.29     666.11
-------------+--------------------------------------------------------
    pcharter |      1182    84.37924    63.54465      27.29     691.11
      qbeach |      1182    .2410113    .1907524      .0678      .5333
       qpier |      1182    .1622237    .1603898      .0014      .4522
    qprivate |      1182    .1712146    .2097885      .0002      .7369
    qcharter |      1182    .6293679    .7061142      .0021     2.3101
-------------+--------------------------------------------------------
      income |      1182    4099.337    2461.964   416.6667      12500
    ydiv1000 |      1182    4.099337    2.461964   .4166667       12.5
          id |      1182       591.5    341.3583          1       1182
          d1 |      1182    .1133672    .3171753          0          1
          p1 |      1182     103.422     103.641       1.29    843.186
-------------+--------------------------------------------------------
          q1 |      1182    .2410113    .1907524      .0678      .5333
          d2 |      1182    .1505922    .3578023          0          1
          p2 |      1182     103.422     103.641       1.29    843.186
          q2 |      1182    .1622237    .1603898      .0014      .4522
          d3 |      1182    .3536379    .4783008          0          1
-------------+--------------------------------------------------------
          p3 |      1182    55.25657    62.71344       2.29     666.11
          q3 |      1182    .1712146    .2097885      .0002      .7369
          d4 |      1182    .3824027    .4861799          0          1
          p4 |      1182    84.37924    63.54465      27.29     691.11
          q4 |      1182    .6293679    .7061142      .0021     2.3101
.
. reshape long d p q, i(id) j(alterntv)
(note: j = 1 2 3 4)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                     1182   ->    4728
Number of variables                  30   ->      22
j variable (4 values)                     ->   alterntv
xij variables:
                           d1 d2 ... d4   ->   d
                           p1 p2 ... p4   ->   p
                           q1 q2 ... q4   ->   q
-----------------------------------------------------------------------------
. * This automatically creates alterntv = 1 (beach), ... 4 (charter)
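
For readers less familiar with reshape, the wide-to-long step can be pictured with the following plain-Python sketch (not part of the original Stata program). It turns one record per individual with suffixed variables d1..d4, p1..p4, q1..q4 into four records per individual, adding an alterntv index. The function name reshape_long and the single example record are hypothetical.

```python
def reshape_long(wide_rows, stubs=("d", "p", "q"), alternatives=(1, 2, 3, 4)):
    """Stack each wide record into one record per alternative."""
    suffixed = {f"{s}{j}" for s in stubs for j in alternatives}
    long_rows = []
    for row in wide_rows:
        # Variables without an alternative suffix are copied to every long row.
        common = {k: v for k, v in row.items() if k not in suffixed}
        for j in alternatives:
            new_row = dict(common)
            new_row["alterntv"] = j
            for s in stubs:
                new_row[s] = row[f"{s}{j}"]
            long_rows.append(new_row)
    return long_rows

# One made-up individual who chose alternative 3 (private boat).
wide = [{
    "id": 1, "ydiv1000": 4.0,
    "d1": 0, "p1": 30.0, "q1": 0.07,
    "d2": 0, "p2": 30.0, "q2": 0.05,
    "d3": 1, "p3": 20.0, "q3": 0.20,
    "d4": 0, "p4": 50.0, "q4": 0.50,
}]
long_rows = reshape_long(wide)
for r in long_rows:
    print(r)
```

Each individual contributes exactly one row with d equal to 1, which is why the mean of d in the long data is .25.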
. describe

Contains data
  obs:         4,728
 vars:            22
 size:       420,792 (95.9% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
id              float  %9.0g
alterntv        byte   %9.0g
mode            float  %9.0g
price           float  %9.0g
crate           float  %9.0g
dbeach          float  %9.0g
dpier           float  %9.0g
dprivate        float  %9.0g
dcharter        float  %9.0g
pbeach          float  %9.0g
ppier           float  %9.0g
pprivate        float  %9.0g
pcharter        float  %9.0g
qbeach          float  %9.0g
qpier           float  %9.0g
qprivate        float  %9.0g
qcharter        float  %9.0g
income          float  %9.0g
ydiv1000        float  %9.0g
d               float  %9.0g
p               float  %9.0g
q               float  %9.0g
-------------------------------------------------------------------------------
Sorted by:  id  alterntv
     Note:  dataset has changed since last saved
. summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          id |      4728       591.5      341.25          1       1182
    alterntv |      4728         2.5    1.118152          1          4
        mode |      4728    3.005076    .9933008          1          4
       price |      4728    52.08197    53.81289       1.29     666.11
       crate |      4728    .3893684    .5604185      .0002     2.3101
-------------+--------------------------------------------------------
      dbeach |      4728    .1133672    .3170746          0          1
       dpier |      4728    .1505922    .3576888          0          1
    dprivate |      4728    .3536379     .478149          0          1
    dcharter |      4728    .3824027    .4860256          0          1
      pbeach |      4728     103.422    103.6081       1.29    843.186
-------------+--------------------------------------------------------
       ppier |      4728     103.422    103.6081       1.29    843.186
    pprivate |      4728    55.25657    62.69354       2.29     666.11
    pcharter |      4728    84.37924    63.52448      27.29     691.11
      qbeach |      4728    .2410113    .1906919      .0678      .5333
       qpier |      4728    .1622237    .1603389      .0014      .4522
-------------+--------------------------------------------------------
    qprivate |      4728    .1712146    .2097219      .0002      .7369
    qcharter |      4728    .6293679    .7058901      .0021     2.3101
      income |      4728    4099.337    2461.183   416.6667      12500
    ydiv1000 |      4728    4.099337    2.461183   .4166667       12.5
           d |      4728         .25    .4330585          0          1
-------------+--------------------------------------------------------
           p |      4728    86.61996    88.01813       1.29    843.186
           q |      4728    .3009544    .4335593      .0002     2.3101
.
. clogit d q, group(id)

Iteration 0:   log likelihood = -1627.3339
Iteration 1:   log likelihood = -1604.8049
Iteration 2:   log likelihood = -1604.6163
Iteration 3:   log likelihood = -1604.6163

Conditional (fixed-effects) logistic regression   Number of obs   =       4728
                                                  LR chi2(1)      =      67.97
                                                  Prob > chi2     =     0.0000
Log likelihood = -1604.6163                       Pseudo R2       =     0.0207

------------------------------------------------------------------------------
           d |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           q |   .6307908   .0757624     8.33   0.000     .4822993    .7792823
------------------------------------------------------------------------------

. clogit d p, group(id)

Iteration 0:   log likelihood = -1595.7652
Iteration 1:   log likelihood = -1411.4335
Iteration 2:   log likelihood = -1376.0224
Iteration 3:   log likelihood = -1372.9619
Iteration 4:   log likelihood = -1372.9332
Iteration 5:   log likelihood = -1372.9332

Conditional (fixed-effects) logistic regression   Number of obs   =       4728
                                                  LR chi2(1)      =     531.33
                                                  Prob > chi2     =     0.0000
Log likelihood = -1372.9332                       Pseudo R2       =     0.1621

------------------------------------------------------------------------------
           d |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           p |  -.0179501   .0010694   -16.79   0.000    -.0200461   -.0158542
------------------------------------------------------------------------------
.
. * The following gives CL column of Table 15.2
. clogit d p q, group(id)

Iteration 0:   log likelihood = -1581.9099
Iteration 1:   log likelihood = -1363.5718
Iteration 2:   log likelihood = -1317.8453
Iteration 3:   log likelihood = -1312.1013
Iteration 4:   log likelihood = -1311.9797
Iteration 5:   log likelihood = -1311.9796

Conditional (fixed-effects) logistic regression   Number of obs   =       4728
                                                  LR chi2(2)      =     653.24
                                                  Prob > chi2     =     0.0000
Log likelihood = -1311.9796                       Pseudo R2       =     0.1993

------------------------------------------------------------------------------
           d |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           p |  -.0204765   .0012231   -16.74   0.000    -.0228737   -.0180794
           q |   .9530985   .0894134    10.66   0.000     .7778514    1.128346
------------------------------------------------------------------------------

.
. *** (2B) Calculate the marginal effects
.
. quietly clogit d p q, group(id)
. predict pinitial
(option pc1 assumed; conditional probability for single outcome within group)
.
. * Now compute marginal effects
. * Consider in turn a change in each price and catch rate
. * Change price by 1 unit and then multiply by 100 as in Table 15.2
. * Change catch rate by 0.001 and then multiply by 1000
.
. * Change p1: price beach
. replace p = p + 1 if alterntv==1
(1182 real changes made)
. predict pnewp1
(option pc1 assumed; conditional probability for single outcome within group)
. gen mep1 = 100*(pnewp1 - pinitial)
. replace p = p - 1 if alterntv==1
(1182 real changes made)
.
. * Change p2: price pier
. replace p = p + 1 if alterntv==2
(1182 real changes made)
. predict pnewp2
(option pc1 assumed; conditional probability for single outcome within group)
. gen mep2 = 100*(pnewp2 - pinitial)
. replace p = p - 1 if alterntv==2
(1182 real changes made)
.
. * Change p3: price private boat
. replace p = p + 1 if alterntv==3
(1182 real changes made)
. predict pnewp3
(option pc1 assumed; conditional probability for single outcome within group)
. gen mep3 = 100*(pnewp3 - pinitial)
. replace p = p - 1 if alterntv==3
(1182 real changes made)


.
. * Change p4: price charter boat
. replace p = p + 1 if alterntv==4
(1182 real changes made)
. predict pnewp4
(option pc1 assumed; conditional probability for single outcome within group)
. gen mep4 = 100*(pnewp4 - pinitial)
. replace p = p - 1 if alterntv==4
(1182 real changes made)
.
. * Change q1: catch rate beach
. replace q = q + 0.001 if alterntv==1
(1182 real changes made)
. predict pnewq1
(option pc1 assumed; conditional probability for single outcome within group)
. gen meq1 = 1000*(pnewq1 - pinitial)
. replace q = q - 0.001 if alterntv==1
(1182 real changes made)
.
. * Change q2: catch rate pier
. replace q = q + 0.001 if alterntv==2
(1182 real changes made)
. predict pnewq2
(option pc1 assumed; conditional probability for single outcome within group)
. gen meq2 = 1000*(pnewq2 - pinitial)
. replace q = q - 0.001 if alterntv==2
(1182 real changes made)
.
. * Change q3: catch rate private boat
. replace q = q + 0.001 if alterntv==3
(1182 real changes made)
. predict pnewq3
(option pc1 assumed; conditional probability for single outcome within group)
. gen meq3 = 1000*(pnewq3 - pinitial)

. replace q = q - 0.001 if alterntv==3


(1182 real changes made)
.
. * Change q4: catch rate charter boat
. replace q = q + 0.001 if alterntv==4
(1182 real changes made)
. predict pnewq4
(option pc1 assumed; conditional probability for single outcome within group)
. gen meq4 = 1000*(pnewq4 - pinitial)
. replace q = q - 0.001 if alterntv==4
(1182 real changes made)
.
. * Following gives Table 15.3 on page 493
. sort alterntv
. by alterntv: sum pinitial mep1 mep2 mep3 mep4 meq1 meq2 meq3 meq4
-------------------------------------------------------------------------------
-> alterntv = 1

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
    pinitial |      1182    .1942074    .1545855   6.19e-08   .6159062
        mep1 |      1182   -.2703818    .1753241  -.5119085  -1.26e-07
        mep2 |      1182    .1183563    .1425011          0   .5107701
        mep3 |      1182    .0846517    .0561764   6.24e-08   .1818448
        mep4 |      1182    .0675326    .0398588   6.44e-08   .1960158
-------------+--------------------------------------------------------
        meq1 |      1182    .1264198    .0817316   5.91e-08   .2382994
        meq2 |      1182   -.0552685    .0664207  -.2378225          0
        meq3 |      1182   -.0395602    .0262581  -.0849366  -2.91e-08
        meq4 |      1182   -.0315872    .0186528  -.0915527  -3.00e-08
-------------------------------------------------------------------------------
-> alterntv = 2

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
    pinitial |      1182    .1832872    .1456892   5.73e-08    .484103
        mep1 |      1182    .1184102    .1425963          0   .5111754
        mep2 |      1182   -.2618934    .1742628  -.5112112  -1.16e-07
        mep3 |      1182    .0801368    .0543153   5.78e-08   .1729459
        mep4 |      1182    .0636229    .0381182   5.96e-08   .1775354
-------------+--------------------------------------------------------
        meq1 |      1182   -.0552672    .0664175  -.2378225          0
        meq2 |      1182    .1224849    .0812789   5.47e-08   .2380311
        meq3 |      1182   -.0374514    .0253908  -.0807345  -2.69e-08
        meq4 |      1182   -.0297604    .0178421  -.0829101  -2.78e-08

-------------------------------------------------------------------------------
-> alterntv = 3

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
    pinitial |      1182    .3298317     .173932   .0000756   .6739099
        mep1 |      1182     .084509    .0561326          0   .1815647
        mep2 |      1182    .0799891    .0542687          0    .172469
        mep3 |      1182   -.3897785    .1364849  -.5119085  -.0001532
        mep4 |      1182    .2248109    .1606873   1.24e-08   .5118489
-------------+--------------------------------------------------------
        meq1 |      1182   -.0395636      .02626  -.0849366          0
        meq2 |      1182   -.0374553    .0253917  -.0807345          0
        meq3 |      1182    .1818861    .0633881   .0000721   .2382994
        meq4 |      1182    -.104879    .0748259  -.2382398  -7.28e-09
-------------------------------------------------------------------------------
-> alterntv = 4

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
    pinitial |      1182    .2926737    .1807255    .000078   .7322331
        mep1 |      1182    .0674624    .0398696          0   .1958013
        mep2 |      1182    .0635479    .0381287          0   .1772434
        mep3 |      1182      .22499    .1608719   1.24e-08    .511682
        mep4 |      1182   -.3559665    .1370352  -.5119085  -.0001582
-------------+--------------------------------------------------------
        meq1 |      1182   -.0315891     .018653  -.0915825          0
        meq2 |      1182   -.0297618    .0178418  -.0829399          0
        meq3 |      1182   -.1048757    .0748219  -.2382398  -7.28e-09
        meq4 |      1182    .1662257    .0636901   .0000744   .2382994
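
The sign pattern in these tables (negative own-price effects, positive cross-price effects, and the reverse for catch rates) follows from the conditional logit formulas dp_j/dx_j = p_j*(1 - p_j)*b and dp_j/dx_k = -p_j*p_k*b for j different from k. The Python sketch below is an illustration only (not part of the original Stata program): the prices and catch rates describe one hypothetical individual, while BETA_P and BETA_Q are copied from the clogit d p q output.

```python
import math

BETA_P = -0.0204765   # price coefficient from clogit d p q
BETA_Q = 0.9530985    # catch-rate coefficient from clogit d p q

def cl_probs(prices, catch_rates):
    """Conditional logit probabilities over the alternatives of one individual."""
    v = [BETA_P * p + BETA_Q * q for p, q in zip(prices, catch_rates)]
    vmax = max(v)  # subtract the maximum for numerical stability
    expv = [math.exp(x - vmax) for x in v]
    total = sum(expv)
    return [e / total for e in expv]

def cl_effects(probs, beta):
    """effects[j][k] = dp_j/dx_k = p_j * (1[j == k] - p_k) * beta."""
    m = len(probs)
    return [[probs[j] * ((1.0 if j == k else 0.0) - probs[k]) * beta
             for k in range(m)] for j in range(m)]

# Hypothetical prices and catch rates (beach, pier, private, charter).
prices = [35.0, 35.0, 98.0, 125.0]
catch = [0.28, 0.22, 0.16, 0.52]

p = cl_probs(prices, catch)
price_effects = cl_effects(p, BETA_P)
catch_effects = cl_effects(p, BETA_Q)
for j, row in enumerate(price_effects):
    print(j + 1, [round(100 * e, 4) for e in row])  # scaled by 100 as in the log
```

Since p_j*(1 - p_j) is at most 0.25, an own effect cannot exceed |b|/4 in absolute value: about .512 for price after scaling by 100, and about .238 for catch rate. That is consistent with the extreme values of mep and meq reported above, which come from discrete rather than infinitesimal changes.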

.
. ******* (3) CONDITIONAL LOGIT: ALTERNATIVE-INVARIANT REGRESSOR *********
.
. * Here we get clogit to do something that is easier done by mlogit
.
. clear
. infile mode price crate dbeach dpier dprivate dcharter pbeach ppier /*
> */ pprivate pcharter qbeach qpier qprivate qcharter income /*
> */ using nldata.asc
(1182 observations read)
.
. gen ydiv1000 = income/1000

.
. * Data are one entry per individual
. * Need to reshape to 4 observations per individual - one for each alternative
. * Use reshape to do this but first create variable
. * Alternative = 1 if beach, = 2 if pier; = 3 if private boat; = 4 if charter
. gen id = _n
. gen d1 = dbeach
. gen d2 = dpier
. gen d3 = dprivate
. gen d4 = dcharter
. describe

Contains data
  obs:         1,182
 vars:            22
 size:       108,744 (98.9% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
mode            float  %9.0g
price           float  %9.0g
crate           float  %9.0g
dbeach          float  %9.0g
dpier           float  %9.0g
dprivate        float  %9.0g
dcharter        float  %9.0g
pbeach          float  %9.0g
ppier           float  %9.0g
pprivate        float  %9.0g
pcharter        float  %9.0g
qbeach          float  %9.0g
qpier           float  %9.0g
qprivate        float  %9.0g
qcharter        float  %9.0g
income          float  %9.0g
ydiv1000        float  %9.0g
id              float  %9.0g
d1              float  %9.0g
d2              float  %9.0g
d3              float  %9.0g
d4              float  %9.0g
-------------------------------------------------------------------------------
Sorted by:
     Note:  dataset has changed since last saved

. summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        mode |      1182    3.005076    .9936162          1          4
       price |      1182    52.08197    53.82997       1.29     666.11
       crate |      1182    .3893684    .5605964      .0002     2.3101
      dbeach |      1182    .1133672    .3171753          0          1
       dpier |      1182    .1505922    .3578023          0          1
-------------+--------------------------------------------------------
    dprivate |      1182    .3536379    .4783008          0          1
    dcharter |      1182    .3824027    .4861799          0          1
      pbeach |      1182     103.422     103.641       1.29    843.186
       ppier |      1182     103.422     103.641       1.29    843.186
    pprivate |      1182    55.25657    62.71344       2.29     666.11
-------------+--------------------------------------------------------
    pcharter |      1182    84.37924    63.54465      27.29     691.11
      qbeach |      1182    .2410113    .1907524      .0678      .5333
       qpier |      1182    .1622237    .1603898      .0014      .4522
    qprivate |      1182    .1712146    .2097885      .0002      .7369
    qcharter |      1182    .6293679    .7061142      .0021     2.3101
-------------+--------------------------------------------------------
      income |      1182    4099.337    2461.964   416.6667      12500
    ydiv1000 |      1182    4.099337    2.461964   .4166667       12.5
          id |      1182       591.5    341.3583          1       1182
          d1 |      1182    .1133672    .3171753          0          1
          d2 |      1182    .1505922    .3578023          0          1
-------------+--------------------------------------------------------
          d3 |      1182    .3536379    .4783008          0          1
          d4 |      1182    .3824027    .4861799          0          1
.
. reshape long d, i(id) j(alterntv)
(note: j = 1 2 3 4)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                     1182   ->    4728
Number of variables                  22   ->      20
j variable (4 values)                     ->   alterntv
xij variables:
                           d1 d2 ... d4   ->   d
-----------------------------------------------------------------------------
. describe

Contains data
  obs:         4,728
 vars:            20
 size:       382,968 (96.3% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
id              float  %9.0g
alterntv        byte   %9.0g
mode            float  %9.0g
price           float  %9.0g
crate           float  %9.0g
dbeach          float  %9.0g
dpier           float  %9.0g
dprivate        float  %9.0g
dcharter        float  %9.0g
pbeach          float  %9.0g
ppier           float  %9.0g
pprivate        float  %9.0g
pcharter        float  %9.0g
qbeach          float  %9.0g
qpier           float  %9.0g
qprivate        float  %9.0g
qcharter        float  %9.0g
income          float  %9.0g
ydiv1000        float  %9.0g
d               float  %9.0g
-------------------------------------------------------------------------------
Sorted by:  id  alterntv
     Note:  dataset has changed since last saved
. summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          id |      4728       591.5      341.25          1       1182
    alterntv |      4728         2.5    1.118152          1          4
        mode |      4728    3.005076    .9933008          1          4
       price |      4728    52.08197    53.81289       1.29     666.11
       crate |      4728    .3893684    .5604185      .0002     2.3101
-------------+--------------------------------------------------------
      dbeach |      4728    .1133672    .3170746          0          1
       dpier |      4728    .1505922    .3576888          0          1
    dprivate |      4728    .3536379     .478149          0          1
    dcharter |      4728    .3824027    .4860256          0          1
      pbeach |      4728     103.422    103.6081       1.29    843.186
-------------+--------------------------------------------------------
       ppier |      4728     103.422    103.6081       1.29    843.186
    pprivate |      4728    55.25657    62.69354       2.29     666.11
    pcharter |      4728    84.37924    63.52448      27.29     691.11
      qbeach |      4728    .2410113    .1906919      .0678      .5333
       qpier |      4728    .1622237    .1603389      .0014      .4522
-------------+--------------------------------------------------------
    qprivate |      4728    .1712146    .2097219      .0002      .7369
    qcharter |      4728    .6293679    .7058901      .0021     2.3101
      income |      4728    4099.337    2461.183   416.6667      12500
    ydiv1000 |      4728    4.099337    2.461183   .4166667       12.5
           d |      4728         .25    .4330585          0          1

.
. gen obsnum=_n
. gen d2 = 0
. replace d2 = 1 if mod(obsnum,4)==2
(1182 real changes made)
. gen d3 = 0
. replace d3 = 1 if mod(obsnum,4)==3
(1182 real changes made)
. gen d4 = 0
. replace d4 = 1 if mod(obsnum,4)==0
(1182 real changes made)
. gen d2y = 0
. replace d2y = d2*ydiv1000
(1182 real changes made)
. gen d3y = 0
. replace d3y = d3*ydiv1000
(1182 real changes made)
. gen d4y = 0
. replace d4y = d4*ydiv1000
(1182 real changes made)
. summarize
    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
          id |      4728       591.5      341.25          1       1182
    alterntv |      4728         2.5    1.118152          1          4
        mode |      4728    3.005076    .9933008          1          4
       price |      4728    52.08197    53.81289       1.29     666.11
       crate |      4728    .3893684    .5604185      .0002     2.3101
-------------+--------------------------------------------------------
      dbeach |      4728    .1133672    .3170746          0          1
       dpier |      4728    .1505922    .3576888          0          1
    dprivate |      4728    .3536379     .478149          0          1
    dcharter |      4728    .3824027    .4860256          0          1
      pbeach |      4728     103.422    103.6081       1.29    843.186
-------------+--------------------------------------------------------
       ppier |      4728     103.422    103.6081       1.29    843.186
    pprivate |      4728    55.25657    62.69354       2.29     666.11
    pcharter |      4728    84.37924    63.52448      27.29     691.11
      qbeach |      4728    .2410113    .1906919      .0678      .5333
       qpier |      4728    .1622237    .1603389      .0014      .4522
-------------+--------------------------------------------------------
    qprivate |      4728    .1712146    .2097219      .0002      .7369
    qcharter |      4728    .6293679    .7058901      .0021     2.3101
      income |      4728    4099.337    2461.183   416.6667      12500
    ydiv1000 |      4728    4.099337    2.461183   .4166667       12.5
           d |      4728         .25    .4330585          0          1
-------------+--------------------------------------------------------
      obsnum |      4728      2364.5        1365          1       4728
          d2 |      4728         .25    .4330585          0          1
          d3 |      4728         .25    .4330585          0          1
          d4 |      4728         .25    .4330585          0          1
         d2y |      4728    1.024834    2.160064          0       12.5
-------------+--------------------------------------------------------
         d3y |      4728    1.024834    2.160064          0       12.5
         d4y |      4728    1.024834    2.160064          0       12.5
.
. * The following gives MNL column of Table 15.2, p.493,
. * which was more easily obtained using mlogit earlier
. clogit d d2 d3 d4 d2y d3y d4y, group(id)
Iteration 0:   log likelihood = -1570.1863
Iteration 1:   log likelihood = -1479.3713
Iteration 2:   log likelihood =  -1477.159
Iteration 3:   log likelihood = -1477.1506
Iteration 4:   log likelihood = -1477.1506

Conditional (fixed-effects) logistic regression   Number of obs   =       4728
                                                  LR chi2(6)      =     322.90
                                                  Prob > chi2     =     0.0000
Log likelihood = -1477.1506                       Pseudo R2       =     0.0985

------------------------------------------------------------------------------
           d |      Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
          d2 |   .8141503     .228632     3.56   0.000    .3660399    1.262261
          d3 |   .7389208    .1967309     3.76   0.000    .3533352    1.124506
          d4 |   1.341291    .1945167     6.90   0.000    .9600457    1.722537
         d2y |  -.1434029    .0532884    -2.69   0.007   -.2478463   -.0389595
         d3y |   .0919064    .0406637     2.26   0.024    .0122069    .1716058
         d4y |  -.0316399    .0418463    -0.76   0.450   -.1136571    .0503774
------------------------------------------------------------------------------
.
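The LR chi2 and Pseudo R2 reported in the clogit header can be checked by hand. The sketch below is only an illustration in Python, not part of the Stata program. It assumes that the null model gives each of the four alternatives probability 1/4 in each of the 1182 choice sets, so that the null log likelihood is 1182 x ln(1/4). The function name clogit_fit_stats is hypothetical.

```python
import math

def clogit_fit_stats(loglik, n_groups, n_alternatives):
    """Return (LR chi2, pseudo R2) against an equal-probability null model.

    Assumption: with all coefficients zero, every alternative in a group
    has probability 1/n_alternatives, and one alternative is chosen per group.
    """
    loglik_null = n_groups * math.log(1.0 / n_alternatives)
    lr_chi2 = 2.0 * (loglik - loglik_null)
    pseudo_r2 = 1.0 - loglik / loglik_null
    return lr_chi2, pseudo_r2

# Fitted log likelihood taken from the clogit output above.
lr, r2 = clogit_fit_stats(-1477.1506, n_groups=1182, n_alternatives=4)
print(round(lr, 2), round(r2, 4))
```

Under that assumption the two numbers agree with the reported LR chi2(6) = 322.90 and Pseudo R2 = 0.0985 to the displayed precision.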
. ******* (4) "MIXED LOGIT" = CONDITIONAL LOGIT WITH BOTH
. *           ALTERNATIVE-SPECIFIC REGRESSOR
. *           AND ALTERNATIVE INVARIANT REGRESSOR *********
.
. clear
. infile mode price crate dbeach dpier dprivate dcharter pbeach ppier /*
> */ pprivate pcharter qbeach qpier qprivate qcharter income /*
> */ using nldata.asc
(1182 observations read)
.
. gen ydiv1000 = income/1000
.
. * Data are one entry per individual
. * Need to reshape to 4 observations per individual - one for each alternative
. * Use reshape to do this but first create variable
. * Alternative = 1 if beach, = 2 if pier; = 3 if private boat; = 4 if charter
. gen id = _n
. gen d1 = dbeach
. gen p1 = pbeach
. gen q1 = qbeach
. gen d2 = dpier
. gen p2 = ppier
. gen q2 = qpier
. gen d3 = dprivate
. gen p3 = pprivate
. gen q3 = qprivate
. gen d4 = dcharter
. gen p4 = pcharter
. gen q4 = qcharter
.
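The wide-to-long reshape performed next can be pictured with a small stand-alone sketch. It is written in Python purely for illustration and is not how Stata implements reshape. The helper reshape_long and the single record of data are hypothetical. The variable names follow the log: stubs d, p and q, with suffix 1 to 4 for beach, pier, private boat and charter.

```python
def reshape_long(wide_rows, stubs, n_alternatives, j_name="alterntv"):
    """Turn one record per individual into one record per (individual, alternative)."""
    wide_names = {f"{stub}{j}" for stub in stubs for j in range(1, n_alternatives + 1)}
    long_rows = []
    for row in wide_rows:
        # Variables without an alternative suffix (id, income, ...) are
        # copied unchanged onto every long row of the individual.
        constant = {name: value for name, value in row.items() if name not in wide_names}
        for j in range(1, n_alternatives + 1):
            long_row = dict(constant)
            long_row[j_name] = j
            for stub in stubs:
                long_row[stub] = row[f"{stub}{j}"]
            long_rows.append(long_row)
    return long_rows

# One made-up individual who chose alternative 3 (private boat).
wide = [{
    "id": 1, "ydiv1000": 5.0,
    "d1": 0, "d2": 0, "d3": 1, "d4": 0,
    "p1": 100.0, "p2": 100.0, "p3": 60.0, "p4": 90.0,
    "q1": 0.07, "q2": 0.05, "q3": 0.20, "q4": 0.50,
}]
long_rows = reshape_long(wide, stubs=["d", "p", "q"], n_alternatives=4)
for r in long_rows:
    print(r["id"], r["alterntv"], r["d"], r["p"], r["q"], r["ydiv1000"])
```

Each individual contributes four rows, which is why 1182 observations become 4728, and individual-level variables such as ydiv1000 are repeated on all four rows.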
. reshape long d p q, i(id) j(alterntv)
(note: j = 1 2 3 4)
Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                     1182   ->    4728
Number of variables                  30   ->      22
j variable (4 values)                     ->   alterntv
xij variables:
                           d1 d2 ... d4   ->   d
                           p1 p2 ... p4   ->   p
                           q1 q2 ... q4   ->   q
-----------------------------------------------------------------------------
. summarize
    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
          id |      4728       591.5      341.25          1       1182
    alterntv |      4728         2.5    1.118152          1          4
        mode |      4728    3.005076    .9933008          1          4
       price |      4728    52.08197    53.81289       1.29     666.11
       crate |      4728    .3893684    .5604185      .0002     2.3101
-------------+--------------------------------------------------------
      dbeach |      4728    .1133672    .3170746          0          1
       dpier |      4728    .1505922    .3576888          0          1
    dprivate |      4728    .3536379     .478149          0          1
    dcharter |      4728    .3824027    .4860256          0          1
      pbeach |      4728     103.422    103.6081       1.29    843.186
-------------+--------------------------------------------------------
       ppier |      4728     103.422    103.6081       1.29    843.186
    pprivate |      4728    55.25657    62.69354       2.29     666.11
    pcharter |      4728    84.37924    63.52448      27.29     691.11
      qbeach |      4728    .2410113    .1906919      .0678      .5333
       qpier |      4728    .1622237    .1603389      .0014      .4522
-------------+--------------------------------------------------------
    qprivate |      4728    .1712146    .2097219      .0002      .7369
    qcharter |      4728    .6293679    .7058901      .0021     2.3101
      income |      4728    4099.337    2461.183   416.6667      12500
    ydiv1000 |      4728    4.099337    2.461183   .4166667       12.5
           d |      4728         .25    .4330585          0          1
-------------+--------------------------------------------------------
           p |      4728    86.61996    88.01813       1.29    843.186
           q |      4728    .3009544    .4335593      .0002     2.3101
.
. * Bring in alternative specific dummies
. * Since d2-d4 already used instead call them dummy2 - dummy4
. gen obsnum=_n
. gen dummy1 = 0
. replace dummy1 = 1 if mod(obsnum,4)==1
(1182 real changes made)
. gen dummy2 = 0
. replace dummy2 = 1 if mod(obsnum,4)==2
(1182 real changes made)
. gen dummy3 = 0
. replace dummy3 = 1 if mod(obsnum,4)==3
(1182 real changes made)
. gen dummy4 = 0
. replace dummy4 = 1 if mod(obsnum,4)==0
(1182 real changes made)
. * And interact with income
. gen d1y = 0
. replace d1y = dummy1*ydiv1000
(1182 real changes made)
. gen d2y = 0
. replace d2y = dummy2*ydiv1000
(1182 real changes made)
. gen d3y = 0
. replace d3y = dummy3*ydiv1000
(1182 real changes made)
. gen d4y = 0
. replace d4y = dummy4*ydiv1000
(1182 real changes made)
.
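The mod(obsnum,4) construction above works only because the long data are sorted by id and then alterntv, so rows 1, 2, 3, 4 of each block of four are beach, pier, private boat and charter. The Python sketch below restates that logic for illustration. The function names and the income value of 5.0 are hypothetical.

```python
def alternative_of(obsnum, n_alternatives=4):
    """Map a 1-based row number to its alternative, assuming id-by-alternative sort order."""
    remainder = obsnum % n_alternatives  # 1, 2, 3, 0 within each block of four rows
    return remainder if remainder != 0 else n_alternatives

def dummies_and_interactions(obsnum, ydiv1000, n_alternatives=4):
    """Return ([dummy1..dummy4], [d1y..d4y]) for one long-format row."""
    alt = alternative_of(obsnum, n_alternatives)
    dummies = [1 if alt == j else 0 for j in range(1, n_alternatives + 1)]
    interactions = [dj * ydiv1000 for dj in dummies]
    return dummies, interactions

# Rows 5 to 8 belong to the second individual (hypothetical income of 5.0).
for obsnum in range(5, 9):
    print(obsnum, dummies_and_interactions(obsnum, ydiv1000=5.0))

# Each interaction is nonzero on one row in four, so its mean is one quarter
# of the mean of ydiv1000 reported by summarize (4.099337).
print(round(4.099337 / 4, 6))
```

The last line gives 1.024834, the mean that summarize reports for each income interaction.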
. summarize
    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
          id |      4728       591.5      341.25          1       1182
    alterntv |      4728         2.5    1.118152          1          4
        mode |      4728    3.005076    .9933008          1          4
       price |      4728    52.08197    53.81289       1.29     666.11
       crate |      4728    .3893684    .5604185      .0002     2.3101
-------------+--------------------------------------------------------
      dbeach |      4728    .1133672    .3170746          0          1
       dpier |      4728    .1505922    .3576888          0          1
    dprivate |      4728    .3536379     .478149          0          1
    dcharter |      4728    .3824027    .4860256          0          1
      pbeach |      4728     103.422    103.6081       1.29    843.186
-------------+--------------------------------------------------------
       ppier |      4728     103.422    103.6081       1.29    843.186
    pprivate |      4728    55.25657    62.69354       2.29     666.11
    pcharter |      4728    84.37924    63.52448      27.29     691.11
      qbeach |      4728    .2410113    .1906919      .0678      .5333
       qpier |      4728    .1622237    .1603389      .0014      .4522
-------------+--------------------------------------------------------
    qprivate |      4728    .1712146    .2097219      .0002      .7369
    qcharter |      4728    .6293679    .7058901      .0021     2.3101
      income |      4728    4099.337    2461.183   416.6667      12500
    ydiv1000 |      4728    4.099337    2.461183   .4166667       12.5
           d |      4728         .25    .4330585          0          1
-------------+--------------------------------------------------------
           p |      4728    86.61996    88.01813       1.29    843.186
           q |      4728    .3009544    .4335593      .0002     2.3101
      obsnum |      4728      2364.5        1365          1       4728
      dummy1 |      4728         .25    .4330585          0          1
      dummy2 |      4728         .25    .4330585          0          1
-------------+--------------------------------------------------------
      dummy3 |      4728         .25    .4330585          0          1
      dummy4 |      4728         .25    .4330585          0          1
         d1y |      4728    1.024834    2.160064          0       12.5
         d2y |      4728    1.024834    2.160064          0       12.5
         d3y |      4728    1.024834    2.160064          0       12.5
-------------+--------------------------------------------------------
         d4y |      4728    1.024834    2.160064          0       12.5
.
. clogit d dummy2 dummy3 dummy4 p q, group(id)
Iteration 0:   log likelihood = -1548.5161
Iteration 1:   log likelihood = -1311.3761
Iteration 2:   log likelihood = -1247.5777
Iteration 3:   log likelihood = -1232.1412
Iteration 4:   log likelihood = -1230.7975
Iteration 5:   log likelihood = -1230.7838
Iteration 6:   log likelihood = -1230.7838

Conditional (fixed-effects) logistic regression   Number of obs   =       4728
                                                  LR chi2(5)      =     815.63
                                                  Prob > chi2     =     0.0000
Log likelihood = -1230.7838                       Pseudo R2       =     0.2489

------------------------------------------------------------------------------
           d |      Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
      dummy2 |   .3070552    .1145738     2.68   0.007    .0824947    .5316158
      dummy3 |   .8713749    .1140428     7.64   0.000    .6478551    1.094895
      dummy4 |   1.498888    .1329328    11.28   0.000    1.238345    1.759432
           p |  -.0247896    .0017044   -14.54   0.000   -.0281301    -.021449
           q |   .3771689    .1099707     3.43   0.001    .1616303    .5927074
------------------------------------------------------------------------------
.
. * The following gives Mixed column of Table 15.2, p.493
. clogit d p q dummy2 dummy3 dummy4 d2y d3y d4y, group(id)
Iteration 0:   log likelihood =  -1538.389
Iteration 1:   log likelihood = -1297.4143
Iteration 2:   log likelihood = -1233.5431
Iteration 3:   log likelihood = -1216.8043
Iteration 4:   log likelihood = -1215.1582
Iteration 5:   log likelihood = -1215.1376
Iteration 6:   log likelihood = -1215.1376

Conditional (fixed-effects) logistic regression   Number of obs   =       4728
                                                  LR chi2(8)      =     846.92
                                                  Prob > chi2     =     0.0000
Log likelihood = -1215.1376                       Pseudo R2       =     0.2584

------------------------------------------------------------------------------
           d |      Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
           p |  -.0251166    .0017317   -14.50   0.000   -.0285106   -.0217225
           q |    .357782    .1097733     3.26   0.001    .1426302    .5729337
      dummy2 |   .7779594    .2204939     3.53   0.000    .3457992     1.21012
      dummy3 |   .5272788    .2227927     2.37   0.018    .0906131    .9639444
      dummy4 |   1.694366    .2240506     7.56   0.000    1.255235    2.133497
         d2y |  -.1275771    .0506395    -2.52   0.012   -.2268288   -.0283255
         d3y |   .0894398    .0500671     1.79   0.074   -.0086898    .1875695
         d4y |  -.0332917    .0503409    -0.66   0.508    -.131958    .0653746
------------------------------------------------------------------------------
.
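To see what the Mixed estimates imply, the following Python sketch (an illustration only, not part of the Stata program) evaluates the conditional logit choice probabilities exp(V_j) / sum_k exp(V_k) at the sample means of price, catch rate and ydiv1000 reported by summarize above. Evaluating at means is a choice made here for illustration. Beach is the base alternative, so its dummy and income interaction are zero. The dictionary and function names are hypothetical.

```python
import math

# Coefficients copied from the Mixed clogit output above.
BETA_P, BETA_Q = -0.0251166, 0.357782
INTERCEPT = {1: 0.0, 2: 0.7779594, 3: 0.5272788, 4: 1.694366}       # dummy2-dummy4
INCOME_SLOPE = {1: 0.0, 2: -0.1275771, 3: 0.0894398, 4: -0.0332917}  # d2y-d4y

def choice_probabilities(prices, catch_rates, ydiv1000):
    """Conditional logit probabilities for alternatives 1..4."""
    index = {
        j: BETA_P * prices[j] + BETA_Q * catch_rates[j]
           + INTERCEPT[j] + INCOME_SLOPE[j] * ydiv1000
        for j in prices
    }
    denominator = sum(math.exp(v) for v in index.values())
    return {j: math.exp(v) / denominator for j, v in index.items()}

# Sample means from the summarize output (1 beach, 2 pier, 3 private, 4 charter).
mean_price = {1: 103.422, 2: 103.422, 3: 55.25657, 4: 84.37924}
mean_catch = {1: 0.2410113, 2: 0.1622237, 3: 0.1712146, 4: 0.6293679}
probs = choice_probabilities(mean_price, mean_catch, ydiv1000=4.099337)
for j, pr in probs.items():
    print(j, round(pr, 3))
```

Probabilities evaluated at means need not equal the sample shares of each mode, since the logit probabilities are nonlinear in the regressors.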
. * Output data file for Read into Limdep program mma15p4gev.lim
. outfile id d p q ydiv1000 dummy2 dummy3 dummy4 d2y d3y d4y using mma15p4gev.asc, replace
.
. ********** CLOSE OUTPUT **********
. log close
log: c:\Imbook\bwebpage\Section4\mma15p1mnl.txt
log type: text
closed on: 19 May 2005, 12:16:24
-----------------------------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section4\mma15p2gev.txt
log type: text
opened on: 19 May 2005, 12:16:29
.
. ********** OVERVIEW OF MMA15P2GEV.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 15.6.3 page 511
. * Nested logit (GEV) model analysis.
. * (1) Set data up and reproduce Mixed estimates in Table 15.2 p.493
. * (2A) Nested logit model estimates (page 511)
. * (2B) Restricted nested logit model estimates (page 511)
. * (2C) Equivalent conditional logit model estimates (same as (2B))
.
. * Related programs are
. * mma15p1mnl.do multinomial and conditional logit using Stata
. * mma15p3mnl.lim multinomial logit using Limdep
. * mma15p4gev.lim conditional and nested logit using Limdep and Nlogit
.
. * To run this program you need data file
. * Nldata.asc
.
. * NOTE: The example here is deliberately simple and merely illustrative.
. *       with nesting structure
. *               /     \
. *             /  \   /  \
. * In this case with parameter rho_j differing across alternatives
. * Stata 8 estimates the earlier variant of the nested logit model
. * rather than the preferred variant given in the text.
. * See the discussion at bottom of page 511 and also Train (2003, p.88)
.
. ********** SETUP **********
.
. set more off
. version 8.0
. set scheme s1mono /* Graphics scheme */
.
. ********** DATA DESCRIPTION **********
.
. * Data Set comes from :
. * J. A. Herriges and C. L. Kling,
. * "Nonlinear Income Effects in Random Utility Models",
. * Review of Economics and Statistics, 81(1999): 62-72
.
. * The data are given as a combined observation with data on all 4 choices.
. * This will work for multinomial logit program.
. * For conditional logit will need to make a new data set which has
. * four separate entries for each observation as there are four alternatives.
.
. * Filename: NLDATA.ASC
. * Format: Ascii
. * Number of Observations: 1182
. * Each observation appears over 3 lines with 4 variables per line
. * so 4 x 1182 = 4728 observations
. * Variable Number and Description
. * 1 Recreation mode choice. = 1 if beach, = 2 if pier; = 3 if private boat; = 4 if charter
. * 2 Price for chosen alternative
. * 3 Catch rate for chosen alternative
. * 4 = 1 if beach mode chosen; = 0 otherwise
. * 5 = 1 if pier mode chosen; = 0 otherwise
. * 6 = 1 if private boat mode chosen; = 0 otherwise
. * 7 = 1 if charter boat mode chosen; = 0 otherwise
. * 8 = price for beach mode
. * 9 = price for pier mode
. * 10 = price for private boat mode
. * 11 = price for charter boat mode
. * 12 = catch rate for beach mode
. * 13 = catch rate for pier mode
. * 14 = catch rate for private boat mode
. * 15 = catch rate for charter boat mode
. * 16 = monthly income
.
. ******* (1) CONDITIONAL LOGIT MODEL (Table 15.2 p.493 Mixed column) *********
.
. infile mode price crate dbeach dpier dprivate dcharter pbeach ppier /*
> */ pprivate pcharter qbeach qpier qprivate qcharter income /*
> */ using nldata.asc
(1182 observations read)
.
. gen ydiv1000 = income/1000
.
. * Data are one entry per individual
. * Need to reshape to 4 observations per individual - one for each alternative
. * Use reshape to do this which also creates variable (see below)
. * alterntv = 1 if beach, = 2 if pier; = 3 if private boat; = 4 if charter
. gen id = _n
. gen d1 = dbeach
. gen p1 = pbeach
. gen q1 = qbeach
. gen d2 = dpier
. gen p2 = ppier
. gen q2 = qpier
. gen d3 = dprivate
. gen p3 = pprivate
. gen q3 = qprivate
. gen d4 = dcharter
. gen p4 = pcharter
. gen q4 = qcharter
. summarize
    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
        mode |      1182    3.005076    .9936162          1          4
       price |      1182    52.08197    53.82997       1.29     666.11
       crate |      1182    .3893684    .5605964      .0002     2.3101
      dbeach |      1182    .1133672    .3171753          0          1
       dpier |      1182    .1505922    .3578023          0          1
-------------+--------------------------------------------------------
    dprivate |      1182    .3536379    .4783008          0          1
    dcharter |      1182    .3824027    .4861799          0          1
      pbeach |      1182     103.422     103.641       1.29    843.186
       ppier |      1182     103.422     103.641       1.29    843.186
    pprivate |      1182    55.25657    62.71344       2.29     666.11
-------------+--------------------------------------------------------
    pcharter |      1182    84.37924    63.54465      27.29     691.11
      qbeach |      1182    .2410113    .1907524      .0678      .5333
       qpier |      1182    .1622237    .1603898      .0014      .4522
    qprivate |      1182    .1712146    .2097885      .0002      .7369
    qcharter |      1182    .6293679    .7061142      .0021     2.3101
-------------+--------------------------------------------------------
      income |      1182    4099.337    2461.964   416.6667      12500
    ydiv1000 |      1182    4.099337    2.461964   .4166667       12.5
          id |      1182       591.5    341.3583          1       1182
          d1 |      1182    .1133672    .3171753          0          1
          p1 |      1182     103.422     103.641       1.29    843.186
-------------+--------------------------------------------------------
          q1 |      1182    .2410113    .1907524      .0678      .5333
          d2 |      1182    .1505922    .3578023          0          1
          p2 |      1182     103.422     103.641       1.29    843.186
          q2 |      1182    .1622237    .1603898      .0014      .4522
          d3 |      1182    .3536379    .4783008          0          1
-------------+--------------------------------------------------------
          p3 |      1182    55.25657    62.71344       2.29     666.11
          q3 |      1182    .1712146    .2097885      .0002      .7369
          d4 |      1182    .3824027    .4861799          0          1
          p4 |      1182    84.37924    63.54465      27.29     691.11
          q4 |      1182    .6293679    .7061142      .0021     2.3101
.
. reshape long d p q, i(id) j(alterntv)
(note: j = 1 2 3 4)
Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                     1182   ->    4728
Number of variables                  30   ->      22
j variable (4 values)                     ->   alterntv
xij variables:
                           d1 d2 ... d4   ->   d
                           p1 p2 ... p4   ->   p
                           q1 q2 ... q4   ->   q
-----------------------------------------------------------------------------
. * This automatically creates alterntv = 1 (beach), ... 4 (charter)
. describe
Contains data
  obs:         4,728
 vars:            22
 size:       420,792 (95.9% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
id              float  %9.0g
alterntv        byte   %9.0g
mode            float  %9.0g
price           float  %9.0g
crate           float  %9.0g
dbeach          float  %9.0g
dpier           float  %9.0g
dprivate        float  %9.0g
dcharter        float  %9.0g
pbeach          float  %9.0g
ppier           float  %9.0g
pprivate        float  %9.0g
pcharter        float  %9.0g
qbeach          float  %9.0g
qpier           float  %9.0g
qprivate        float  %9.0g
qcharter        float  %9.0g
income          float  %9.0g
ydiv1000        float  %9.0g
d               float  %9.0g
p               float  %9.0g
q               float  %9.0g
-------------------------------------------------------------------------------
Sorted by:  id  alterntv
     Note:  dataset has changed since last saved
. summarize
    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
          id |      4728       591.5      341.25          1       1182
    alterntv |      4728         2.5    1.118152          1          4
        mode |      4728    3.005076    .9933008          1          4
       price |      4728    52.08197    53.81289       1.29     666.11
       crate |      4728    .3893684    .5604185      .0002     2.3101
-------------+--------------------------------------------------------
      dbeach |      4728    .1133672    .3170746          0          1
       dpier |      4728    .1505922    .3576888          0          1
    dprivate |      4728    .3536379     .478149          0          1
    dcharter |      4728    .3824027    .4860256          0          1
      pbeach |      4728     103.422    103.6081       1.29    843.186
-------------+--------------------------------------------------------
       ppier |      4728     103.422    103.6081       1.29    843.186
    pprivate |      4728    55.25657    62.69354       2.29     666.11
    pcharter |      4728    84.37924    63.52448      27.29     691.11
      qbeach |      4728    .2410113    .1906919      .0678      .5333
       qpier |      4728    .1622237    .1603389      .0014      .4522
-------------+--------------------------------------------------------
    qprivate |      4728    .1712146    .2097219      .0002      .7369
    qcharter |      4728    .6293679    .7058901      .0021     2.3101
      income |      4728    4099.337    2461.183   416.6667      12500
    ydiv1000 |      4728    4.099337    2.461183   .4166667       12.5
           d |      4728         .25    .4330585          0          1
-------------+--------------------------------------------------------
           p |      4728    86.61996    88.01813       1.29    843.186
           q |      4728    .3009544    .4335593      .0002     2.3101
.
. * Bring in alternative specific dummies
. * Since d2-d4 already used instead call them dummy2 - dummy4
. gen obsnum=_n
. gen dummy1 = (mod(obsnum,4)==1) * 1
. gen dummy2 = (mod(obsnum,4)==2) * 1
. gen dummy3 = (mod(obsnum,4)==3) * 1
. gen dummy4 = (mod(obsnum,4)==0) * 1
. gen d1y = (mod(obsnum,4)==1) * ydiv1000
. gen d2y = (mod(obsnum,4)==2) * ydiv1000
. gen d3y = (mod(obsnum,4)==3) * ydiv1000
. gen d4y = (mod(obsnum,4)==0) * ydiv1000
.
. summarize
    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
          id |      4728       591.5      341.25          1       1182
    alterntv |      4728         2.5    1.118152          1          4
        mode |      4728    3.005076    .9933008          1          4
       price |      4728    52.08197    53.81289       1.29     666.11
       crate |      4728    .3893684    .5604185      .0002     2.3101
-------------+--------------------------------------------------------
      dbeach |      4728    .1133672    .3170746          0          1
       dpier |      4728    .1505922    .3576888          0          1
    dprivate |      4728    .3536379     .478149          0          1
    dcharter |      4728    .3824027    .4860256          0          1
      pbeach |      4728     103.422    103.6081       1.29    843.186
-------------+--------------------------------------------------------
       ppier |      4728     103.422    103.6081       1.29    843.186
    pprivate |      4728    55.25657    62.69354       2.29     666.11
    pcharter |      4728    84.37924    63.52448      27.29     691.11
      qbeach |      4728    .2410113    .1906919      .0678      .5333
       qpier |      4728    .1622237    .1603389      .0014      .4522
-------------+--------------------------------------------------------
    qprivate |      4728    .1712146    .2097219      .0002      .7369
    qcharter |      4728    .6293679    .7058901      .0021     2.3101
      income |      4728    4099.337    2461.183   416.6667      12500
    ydiv1000 |      4728    4.099337    2.461183   .4166667       12.5
           d |      4728         .25    .4330585          0          1
-------------+--------------------------------------------------------
           p |      4728    86.61996    88.01813       1.29    843.186
           q |      4728    .3009544    .4335593      .0002     2.3101
      obsnum |      4728      2364.5        1365          1       4728
      dummy1 |      4728         .25    .4330585          0          1
      dummy2 |      4728         .25    .4330585          0          1
-------------+--------------------------------------------------------
      dummy3 |      4728         .25    .4330585          0          1
      dummy4 |      4728         .25    .4330585          0          1
         d1y |      4728    1.024834    2.160064          0       12.5
         d2y |      4728    1.024834    2.160064          0       12.5
         d3y |      4728    1.024834    2.160064          0       12.5
-------------+--------------------------------------------------------
         d4y |      4728    1.024834    2.160064          0       12.5
.
. * The following gives Mixed column of Table 15.2 p.493
. * Note that dummy1 and d1y are omitted to avoid the dummy variable trap
.
. clogit d dummy2 dummy3 dummy4 d2y d3y d4y p q, group(id)

Iteration 0:   log likelihood =  -1538.389
Iteration 1:   log likelihood = -1297.4143
Iteration 2:   log likelihood = -1233.5431
Iteration 3:   log likelihood = -1216.8043
Iteration 4:   log likelihood = -1215.1582
Iteration 5:   log likelihood = -1215.1376
Iteration 6:   log likelihood = -1215.1376

Conditional (fixed-effects) logistic regression   Number of obs   =       4728
                                                  LR chi2(8)      =     846.92
                                                  Prob > chi2     =     0.0000
Log likelihood = -1215.1376                       Pseudo R2       =     0.2584

------------------------------------------------------------------------------
           d |      Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
      dummy2 |   .7779594    .2204939     3.53   0.000    .3457992     1.21012
      dummy3 |   .5272788    .2227927     2.37   0.018    .0906131    .9639444
      dummy4 |   1.694366    .2240506     7.56   0.000    1.255235    2.133497
         d2y |  -.1275771    .0506395    -2.52   0.012   -.2268288   -.0283255
         d3y |   .0894398    .0500671     1.79   0.074   -.0086898    .1875695
         d4y |  -.0332917    .0503409    -0.66   0.508    -.131958    .0653746
           p |  -.0251166    .0017317   -14.50   0.000   -.0285106   -.0217225
           q |    .357782    .1097733     3.26   0.001    .1426302    .5729337
------------------------------------------------------------------------------
.
. ******* (2) NESTED LOGIT MODEL (p.511) *********
.
. * Define the Tree for Nested logit
. *       with nesting structure
. *               /     \
. *             /  \   /  \
. * In this case with parameter rho_j differing across alternatives
. * Stata 8 estimates the earlier variant of the nested logit model
. * rather than the preferred variant given in the text.
. * See the discussion at bottom of page 511 and also Train (2003, p.88)
.
. nlogitgen type = alterntv(shore: 1 | 2 , boat: 3 | 4)
new variable type is generated with 2 groups
label list lb_type
lb_type:
1 shore
2 boat
. nlogittree alterntv type
tree structure specified for the nested logit model

 top --> bottom

    type    alterntv
 --------------------
   shore           1
                   2
    boat           3
                   4
.
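The two-level tree just displayed groups beach and pier into a shore nest and private boat and charter into a boat nest. The regressors used at the top level in the next step (dshore and dshorey) are functions of that grouping and of income only. The Python sketch below restates the mapping for illustration. The names NESTS, nest_of and top_level_regressors are hypothetical, and the income value is made up.

```python
# Nest membership as specified in the nlogitgen command above.
NESTS = {"shore": (1, 2), "boat": (3, 4)}

def nest_of(alterntv):
    """Return the name of the nest containing alternative alterntv (1..4)."""
    for name, members in NESTS.items():
        if alterntv in members:
            return name
    raise ValueError(f"unknown alternative: {alterntv}")

def top_level_regressors(alterntv, ydiv1000):
    """Return (dshore, dshorey): a shore indicator and its interaction with income."""
    dshore = 1 if nest_of(alterntv) == "shore" else 0
    return dshore, dshore * ydiv1000

# The four long-format rows of one individual with a hypothetical income of 5.0.
for alt in (1, 2, 3, 4):
    print(alt, nest_of(alt), top_level_regressors(alt, ydiv1000=5.0))
```

Within a nest these regressors take the same value for both alternatives, which is what makes them suitable for the top level, whereas p and q vary across all four alternatives.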
. *** (2A) Estimate the nested logit model
. ***      This is the model on p.511 that has "higher log-likelihood"
.
. * For the top level we use regressors that do not vary at the lower level
. * So not p or q, but could be income or alternative dummy
. * Here use income and alternative dummy
. gen dshore = (type ==1) * 1
. gen dshorey = (type ==1) * ydiv1000
. nlogit d (alterntv = p q) (type = dshore dshorey), group(id)
tree structure specified for the nested logit model

 top --> bottom

    type    alterntv
 --------------------
   shore           1
                   2
    boat           3
                   4

initial:       log likelihood = -1256.8179
rescale:       log likelihood = -1256.8179
rescale eq:    log likelihood = -1228.6278
Iteration 0: log likelihood = -1228.6278
Iteration 1: log likelihood = -1227.407 (backed up)
Iteration 2: log likelihood = -1225.366 (backed up)
Iteration 3: log likelihood = -1216.5831 (backed up)
Iteration 4: log likelihood = -1210.9623
Iteration 5: log likelihood = -1210.323 (backed up)
Iteration 6: log likelihood = -1199.5959
Iteration 7: log likelihood = -1198.2166
Iteration 8: log likelihood = -1193.1834
Iteration 9: log likelihood = -1190.8805
Iteration 10: log likelihood = -1188.0112
Iteration 11: log likelihood = -1185.7944
Iteration 12: log likelihood = -1184.8715
Iteration 13: log likelihood = -1183.776
Iteration 14: log likelihood = -1182.6316
Iteration 15: log likelihood = -1182.1119
Iteration 16: log likelihood = -1181.8783
Iteration 17: log likelihood = -1181.323
Iteration 18: log likelihood = -1181.162
Iteration 19: log likelihood = -1180.912
Iteration 20: log likelihood = -1180.7877
Iteration 21: log likelihood = -1180.5545
Iteration 22: log likelihood = -1180.4177
Iteration 23: log likelihood = -1180.2966
BFGS stepping has contracted, resetting BFGS Hessian (0)
Iteration 24: log likelihood = -1180.2253
Iteration 25: log likelihood = -1180.2209 (backed up)
Iteration 26: log likelihood = -1180.2139 (backed up)
Iteration 27: log likelihood = -1180.2137 (backed up)
Iteration 28: log likelihood = -1180.2113
Iteration 29: log likelihood = -1180.2019
Iteration 30: log likelihood = -1180.1739
Iteration 31: log likelihood = -1180.1278
BFGS stepping has contracted, resetting BFGS Hessian (1)
Iteration 32: log likelihood = -1180.0852
Iteration 33: log likelihood = -1180.0773 (backed up)
Iteration 34: log likelihood = -1180.0762 (backed up)
Iteration 35: log likelihood = -1180.0762 (backed up)
Iteration 36: log likelihood = -1180.0758
Iteration 37: log likelihood = -1180.0694
Iteration 38: log likelihood = -1180.0671
Iteration 39: log likelihood = -1180.0664
BFGS stepping has contracted, resetting BFGS Hessian (2)
Iteration 40: log likelihood = -1180.058
Iteration 41: log likelihood = -1180.0576 (backed up)
Iteration 42: log likelihood = -1180.0575 (backed up)
Iteration 43: log likelihood = -1180.0575 (backed up)
Iteration 44: log likelihood = -1180.0573
Iteration 45: log likelihood = -1180.0466
Iteration 46: log likelihood = -1180.0434
BFGS stepping has contracted, resetting BFGS Hessian (3)
Iteration 47: log likelihood = -1180.043
Iteration 48: log likelihood = -1180.0427 (backed up)
Iteration 49: log likelihood = -1180.0427 (backed up)
Iteration 50: log likelihood = -1180.0427 (backed up)
Iteration 51: log likelihood = -1180.0427
Iteration 52: log likelihood = -1180.0422
BFGS stepping has contracted, resetting BFGS Hessian (4)
Iteration 53: log likelihood = -1180.0414
Iteration 54: log likelihood = -1180.0412 (backed up)
Iteration 55: log likelihood = -1180.0412 (backed up)
Iteration 56: log likelihood = -1180.0412 (backed up)
Iteration 57: log likelihood = -1180.0411
Iteration 58: log likelihood = -1180.0404
Iteration 59: log likelihood = -1180.0401
BFGS stepping has contracted, resetting BFGS Hessian (5)
Iteration 60: log likelihood = -1180.0381
Iteration 61: log likelihood = -1180.038 (backed up)
Iteration 62: log likelihood = -1180.0364 (backed up)
Iteration 63: log likelihood = -1180.0364 (backed up)
Iteration 64: log likelihood = -1180.0364
Iteration 65: log likelihood = -1180.0361
Iteration 66: log likelihood = -1180.0357
BFGS stepping has contracted, resetting BFGS Hessian (6)
Iteration 67: log likelihood = -1180.0348
Iteration 68: log likelihood = -1180.0348 (backed up)
Iteration 69: log likelihood = -1180.0348 (backed up)
Iteration 70: log likelihood = -1180.0348 (backed up)
Iteration 71: log likelihood = -1180.0348
Iteration 72: log likelihood = -1180.0331
Iteration 73: log likelihood = -1180.0328
BFGS stepping has contracted, resetting BFGS Hessian (7)
Iteration 74: log likelihood = -1180.0319
Iteration 75: log likelihood = -1180.0318 (backed up)
Iteration 76: log likelihood = -1180.0317 (backed up)
Iteration 77: log likelihood = -1180.0317 (backed up)
Iteration 78: log likelihood = -1180.0317 (backed up)
Iteration 79: log likelihood = -1180.0313
BFGS stepping has contracted, resetting BFGS Hessian (8)
Iteration 80: log likelihood = -1180.031
Iteration 81: log likelihood = -1180.031 (backed up)
Iteration 82: log likelihood = -1180.031 (backed up)
Iteration 83: log likelihood = -1180.031 (backed up)
Iteration 84: log likelihood = -1180.031 (backed up)
BFGS stepping has contracted, resetting BFGS Hessian (9)
Iteration 85: log likelihood = -1180.0305
Iteration 86: log likelihood = -1180.0304 (backed up)
Iteration 87: log likelihood = -1180.0304 (backed up)
Iteration 88: log likelihood = -1180.0304 (backed up)
Iteration 89: log likelihood = -1180.0304
Iteration 90: log likelihood = -1180.0303
Iteration 91: log likelihood = -1180.0301
BFGS stepping has contracted, resetting BFGS Hessian (10)
Iteration 92: log likelihood = -1180.0296
Iteration 93: log likelihood = -1180.0295 (backed up)
Iteration 94: log likelihood = -1180.0295 (backed up)
Iteration 95: log likelihood = -1180.0295 (backed up)
Iteration 96: log likelihood = -1180.0295
Iteration 97: log likelihood = -1180.0292
Iteration 98: log likelihood = -1180.029
BFGS stepping has contracted, resetting BFGS Hessian (11)
Iteration 99: log likelihood = -1180.0288
Iteration 100: log likelihood = -1180.0288 (backed up)
Iteration 101: log likelihood = -1180.0288 (backed up)
Iteration 102: log likelihood = -1180.0288 (backed up)
Iteration 103: log likelihood = -1180.0288 (backed up)
Iteration 104: log likelihood = -1180.0285
BFGS stepping has contracted, resetting BFGS Hessian (12)
Iteration 105: log likelihood = -1180.0283
Iteration 106: log likelihood = -1180.0283 (backed up)
Iteration 107: log likelihood = -1180.0283 (backed up)
Iteration 108: log likelihood = -1180.0283 (backed up)
Iteration 109: log likelihood = -1180.0283
Iteration 110: log likelihood = -1180.0282
Iteration 111: log likelihood = -1180.028
BFGS stepping has contracted, resetting BFGS Hessian (13)
Iteration 112: log likelihood = -1180.0274
Iteration 113: log likelihood = -1180.0274 (backed up)
Iteration 114: log likelihood = -1180.0274 (backed up)
Iteration 115: log likelihood = -1180.0274 (backed up)
Iteration 116: log likelihood = -1180.0274 (backed up)
Iteration 117: log likelihood = -1180.0266
BFGS stepping has contracted, resetting BFGS Hessian (14)
Iteration 118: log likelihood = -1180.0265
Iteration 119: log likelihood = -1180.0265 (backed up)
Iteration 120: log likelihood = -1180.0265 (backed up)
Iteration 121: log likelihood = -1180.0265 (backed up)
Iteration 122: log likelihood = -1180.0265 (backed up)
Iteration 123: log likelihood = -1180.0263
BFGS stepping has contracted, resetting BFGS Hessian (15)
Iteration 124: log likelihood = -1180.0261
Iteration 125: log likelihood = -1180.0261 (backed up)
Iteration 126: log likelihood = -1180.0261 (backed up)
Iteration 127: log likelihood = -1180.0261 (backed up)
Iteration 128: log likelihood = -1180.0261 (backed up)
BFGS stepping has contracted, resetting BFGS Hessian (16)
Iteration 129: log likelihood = -1180.026
Iteration 130: log likelihood = -1180.026 (backed up)
Iteration 131: log likelihood = -1180.026 (backed up)
Iteration 132: log likelihood = -1180.026 (backed up)
Iteration 133: log likelihood = -1180.026 (backed up)
Iteration 134: log likelihood = -1180.0259
BFGS stepping has contracted, resetting BFGS Hessian (17)
Iteration 135: log likelihood = -1180.0213
Iteration 136: log likelihood = -1180.0208 (backed up)
Iteration 137: log likelihood = -1180.0207 (backed up)
Iteration 138: log likelihood = -1180.0207 (backed up)
Iteration 139: log likelihood = -1180.0206
Iteration 140: log likelihood = -1180.0191
Iteration 141: log likelihood = -1180.0186
BFGS stepping has contracted, resetting BFGS Hessian (18)
Iteration 142: log likelihood = -1180.0185
Iteration 143: log likelihood = -1180.0185 (backed up)
Iteration 144: log likelihood = -1180.0185 (backed up)
Iteration 145: log likelihood = -1180.0185
Iteration 146: log likelihood = -1180.0185 (backed up)
BFGS stepping has contracted, resetting BFGS Hessian (19)
Iteration 147: log likelihood = -1180.0184
Iteration 148: log likelihood = -1180.0184 (backed up)
Iteration 149: log likelihood = -1180.0184 (backed up)
Iteration 150: log likelihood = -1180.0184 (backed up)
Iteration 151: log likelihood = -1180.0184 (backed up)
Iteration 152: log likelihood = -1180.0184
Iteration 153: log likelihood = -1180.0183
BFGS stepping has contracted, resetting BFGS Hessian (20)
Iteration 154: log likelihood = -1180.0177
Iteration 155: log likelihood = -1180.0176 (backed up)
Iteration 156: log likelihood = -1180.0176 (backed up)
Iteration 157: log likelihood = -1180.0176 (backed up)
Iteration 158: log likelihood = -1180.0176 (backed up)
Iteration 159: log likelihood = -1180.0172
Iteration 160: log likelihood = -1180.0171
BFGS stepping has contracted, resetting BFGS Hessian (21)
Iteration 161: log likelihood = -1180.017
Iteration 162: log likelihood = -1180.017 (backed up)
Iteration 163: log likelihood = -1180.017 (backed up)
Iteration 164: log likelihood = -1180.017 (backed up)
Iteration 165: log likelihood = -1180.017
Iteration 166: log likelihood = -1180.017
BFGS stepping has contracted, resetting BFGS Hessian (22)
Iteration 167: log likelihood = -1180.0169
Iteration 168: log likelihood = -1180.0169 (backed up)
Iteration 169: log likelihood = -1180.0169 (backed up)
Iteration 170: log likelihood = -1180.0169 (backed up)
Iteration 171: log likelihood = -1180.0169 (backed up)
Iteration 172: log likelihood = -1180.0169
Iteration 173: log likelihood = -1180.0169
BFGS stepping has contracted, resetting BFGS Hessian (23)
Iteration 174: log likelihood = -1180.0167
Iteration 175: log likelihood = -1180.0167 (backed up)
Iteration 176: log likelihood = -1180.0167 (backed up)
Iteration 177: log likelihood = -1180.0167 (backed up)
Iteration 178: log likelihood = -1180.0167 (backed up)
Iteration 179: log likelihood = -1180.0166
BFGS stepping has contracted, resetting BFGS Hessian (24)
Iteration 180: log likelihood = -1180.0165
Iteration 181: log likelihood = -1180.0165 (backed up)
Iteration 182: log likelihood = -1180.0165 (backed up)
Iteration 183: log likelihood = -1180.0165 (backed up)
Iteration 184: log likelihood = -1180.0165 (backed up)
BFGS stepping has contracted, resetting BFGS Hessian (25)
Iteration 185: log likelihood = -1180.0165
Iteration 186: log likelihood = -1180.0165 (backed up)
Iteration 187: log likelihood = -1180.0165 (backed up)
Iteration 188: log likelihood = -1180.0164 (backed up)
Iteration 189: log likelihood = -1180.0164 (backed up)
Iteration 190: log likelihood = -1180.0164
BFGS stepping has contracted, resetting BFGS Hessian (26)
Iteration 191: log likelihood = -1180.0164
Iteration 192: log likelihood = -1180.0164 (backed up)
Iteration 193: log likelihood = -1180.0164 (backed up)
Iteration 194: log likelihood = -1180.0164 (backed up)
Iteration 195: log likelihood = -1180.0164 (backed up)
Iteration 196: log likelihood = -1180.0164
BFGS stepping has contracted, resetting BFGS Hessian (27)
Iteration 197: log likelihood = -1180.0163
Iteration 198: log likelihood = -1180.0163 (backed up)
Iteration 199: log likelihood = -1180.0163 (backed up)
Iteration 200: log likelihood = -1180.0163 (backed up)
Iteration 201: log likelihood = -1180.0163 (backed up)
Iteration 202: log likelihood = -1180.0162
BFGS stepping has contracted, resetting BFGS Hessian (28)
Iteration 203: log likelihood = -1180.0162
Iteration 204: log likelihood = -1180.0162 (backed up)
Iteration 205: log likelihood = -1180.0162 (backed up)
Iteration 206: log likelihood = -1180.0162 (backed up)
Iteration 207: log likelihood = -1180.0162 (backed up)
BFGS stepping has contracted, resetting BFGS Hessian (29)
Iteration 208: log likelihood = -1180.0161
Iteration 209: log likelihood = -1180.0161 (backed up)
Iteration 210: log likelihood = -1180.0161 (backed up)
Iteration 211: log likelihood = -1180.0161 (backed up)
Iteration 212: log likelihood = -1180.0161
Iteration 213: log likelihood = -1180.0161
BFGS stepping has contracted, resetting BFGS Hessian (30)
Iteration 214: log likelihood = -1180.016
Iteration 215: log likelihood = -1180.016 (backed up)
Iteration 216: log likelihood = -1180.016 (backed up)
Iteration 217: log likelihood = -1180.016 (backed up)
Iteration 218: log likelihood = -1180.016 (backed up)
BFGS stepping has contracted, resetting BFGS Hessian (31)
Iteration 219: log likelihood = -1180.016
Iteration 220: log likelihood = -1180.016 (backed up)
Iteration 221: log likelihood = -1180.016 (backed up)
Iteration 222: log likelihood = -1180.016 (backed up)
Iteration 223: log likelihood = -1180.016 (backed up)
BFGS stepping has contracted, resetting BFGS Hessian (32)
Iteration 224: log likelihood = -1180.0159
Iteration 225: log likelihood = -1180.0159 (backed up)
Iteration 226: log likelihood = -1180.0159 (backed up)
Iteration 227: log likelihood = -1180.0159 (backed up)
Iteration 228: log likelihood = -1180.0159
Iteration 229: log likelihood = -1180.0159
Iteration 230: log likelihood = -1180.0159
BFGS stepping has contracted, resetting BFGS Hessian (33)
Iteration 231: log likelihood = -1180.0157
Iteration 232: log likelihood = -1180.0157 (backed up)
Iteration 233: log likelihood = -1180.0157 (backed up)
Iteration 234: log likelihood = -1180.0157 (backed up)
Iteration 235: log likelihood = -1180.0157 (backed up)
Iteration 236: log likelihood = -1180.0156
Nested logit estimates
Levels             =          2                 Number of obs   =       4728
Dependent variable =          d                 LR chi2(6)      =   917.1687
Log likelihood     = -1180.0156                 Prob > chi2     =     0.0000

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
alterntv     |
           p |  -.0013303    .001081    -1.23   0.218     -.003449    .0007883
           q |   .1284825   .1038986     1.24   0.216     -.075155      .33212
-------------+----------------------------------------------------------------
type         |
      dshore |  -11.40196    9.15307    -1.25   0.213    -29.34164    6.537733
     dshorey |   .1108341   .0531049     2.09   0.037     .0067505    .2149178
-------------+----------------------------------------------------------------
(incl. value |
 parameters) |
type         |
      /shore |   29.98591   24.40089     1.23   0.219    -17.83896    77.81078
       /boat |   14.06438   11.39886     1.23   0.217    -8.276971    36.40572
------------------------------------------------------------------------------
LR test of homoskedasticity (iv = 1): chi2(2)= 145.39   Prob > chi2 = 0.0000
------------------------------------------------------------------------------
. estimates store nlogitunrest
.
. *** (2B) Estimate the restricted nested logit model
. *** This is the model on p.511 that has log L = -1252
.
. * Set the inclusive value parameters to 1
. nlogit d (alterntv = p q) (type = dshore dshorey), group(id) ivc(shore=1, boat=1)
tree structure specified for the nested logit model

 top --> bottom

    type     alterntv
 ---------------------
   shore        1
                2
    boat        3
                4

User-defined constraint(s):

IV constraint(s):
 [shore]_cons = 1
 [boat]_cons = 1

initial:       log likelihood = -1256.8179
rescale:       log likelihood = -1256.8179
rescale eq:    log likelihood = -1228.6278
Iteration 0: log likelihood = -1264.4012
Iteration 1: log likelihood = -1264.1213 (backed up)
Iteration 2: log likelihood = -1256.9241 (backed up)
Iteration 3: log likelihood = -1255.0984 (backed up)
Iteration 4: log likelihood = -1254.4838
Iteration 5: log likelihood = -1252.7216
Iteration 6: log likelihood = -1252.7111
Iteration 7: log likelihood = -1252.711
Nested logit estimates
Levels             =          2                 Number of obs   =       4728
Dependent variable =          d                 LR chi2(4)      =   771.7778
Log likelihood     =  -1252.711                 Prob > chi2     =     0.0000

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
alterntv     |
           p |   -.020246   .0012832   -15.78   0.000     -.022761    -.017731
           q |   .7552644   .0918004     8.23   0.000      .575339    .9351899
-------------+----------------------------------------------------------------
type         |
      dshore |  -.5897435   .1565201    -3.77   0.000    -.8965172   -.2829697
     dshorey |  -.0790869   .0381453    -2.07   0.038    -.1538503   -.0043235
-------------+----------------------------------------------------------------
(incl. value |
 parameters) |
type         |
      /shore |          1          .        .       .            .           .
       /boat |          1          .        .       .            .           .
------------------------------------------------------------------------------
LR test of homoskedasticity (iv = 1): chi2(0)=   0.00   Prob > chi2 =      .
------------------------------------------------------------------------------
. estimates store nlogitrest
.
. * Perform a likelihood ratio test that inclusive parameters = 1
. lrtest nlogitunrest nlogitrest
likelihood-ratio test                                  LR chi2(2)  =    145.39
(Assumption: nlogitrest nested in nlogitunrest)        Prob > chi2 =    0.0000
.
. *** (2C) As a check, verify that this restricted nested logit = conditional logit
.
. clogit d p q dshore dshorey, group(id)
Iteration 0:   log likelihood = -1547.6028
Iteration 1:   log likelihood = -1317.5764
Iteration 2:   log likelihood = -1262.8183
Iteration 3:   log likelihood =  -1253.096
Iteration 4:   log likelihood = -1252.7117
Iteration 5:   log likelihood =  -1252.711

Conditional (fixed-effects) logistic regression   Number of obs   =       4728
                                                  LR chi2(4)      =     771.78
                                                  Prob > chi2     =     0.0000
Log likelihood =  -1252.711                       Pseudo R2       =     0.2355

------------------------------------------------------------------------------
           d |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           p |  -.0202461   .0012832   -15.78   0.000    -.0227611   -.0177311
           q |   .7552646   .0918003     8.23   0.000     .5753392    .9351899
      dshore |  -.5897442     .15652    -3.77   0.000    -.8965178   -.2829706
     dshorey |  -.0790866   .0381453    -2.07   0.038    -.1538499   -.0043232
------------------------------------------------------------------------------
.
. ********** CLOSE OUTPUT **********
. log close
log: c:\Imbook\bwebpage\Section4\mma15p2gev.txt
log type: text
closed on: 19 May 2005, 12:19:10

-----------------------------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section4\mma16p1tobit.txt
log type: text
opened on: 19 May 2005, 13:00:31
.
. ********** OVERVIEW OF MMA16P1TOBIT.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 16.2.1 pages 530-1 and 16.9.2 page 565
. * Classic Tobit model with generated data
. * Provides
. * (1) Graph of various conditional means Figure 16.1 (ch16condmeans.wmf)
. * (2) Tobit model estimation: various estimators not reported in book
. * (3) Tobit model estimation: CLAD estimation mentioned on page 565
. * using generated data (see below)
.
. ********** SETUP **********
.
. set more off
. version 8.0
. set scheme s1mono /* Used for graphs */
.
. ********** GENERATE DATA **********
.
. * Data generating process is
. * Regressor:            lnwage ~ N(2.75, 0.6^2)
. * Error term:           e ~ N(0, 1000^2)
. * Latent variable:      ystar = -2500 + 1000*lnwage + e
. * Truncated variable:   ytrunc = 1(ystar>0)*ystar
. * Censored variable:    ycens = 1(ystar<=0)*0 + 1(ystar>0)*ystar
. * Censoring Indicator:  dy = 1(ycens>0)
.
. set seed 10101
. set obs 200
obs was 0, now 200
. gen e = 1000*invnorm(uniform( ))
. gen lnwage = 2.75 + 0.6*invnorm(uniform( ))
. gen ystar = -2500 + 1000*lnwage + e
. gen ytrunc = ystar
. replace ytrunc = . if (ystar < 0)
(70 real changes made, 70 to missing)
. gen ycens = ystar
. replace ycens = 0 if (ystar < 0)
(70 real changes made)
. gen dy = ycens
. replace dy = 1 if (ycens>0)
(130 real changes made)
.
. summarize
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           e |       200    76.96455    977.5598  -2906.972   2943.727
      lnwage |       200    2.792559    .6249093   .9039821   4.373462
       ystar |       200    369.5237    1163.722  -2852.944   3105.383
      ytrunc |       130    1047.602    712.0859   17.88135   3105.383
       ycens |       200    680.9414    761.3346          0   3105.383
-------------+--------------------------------------------------------
          dy |       200         .65    .4781665          0          1
.
. * Save data as text (ascii) so that can use programs other than Stata
. outfile e lnwage ystar ytrunc ycens dy using mma16p1tobit.asc, replace
.
. ********** (1) PLOT THEORETICAL CONDITIONAL MEANS **********
.
. * Here we use the true parameter values used in the dgp
.
. * Compute the censored and truncated means
. gen xb = -2500 + 1000*lnwage
. gen sigma = 1000
. gen capphixb = normprob(xb/sigma)
. gen phixb = normd(xb/sigma)
. gen lamda = phixb/capphixb
. gen eytrunc = xb + sigma*lamda
. gen eycens = capphixb*eytrunc
.
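The truncated and censored means generated above follow the standard Tobit formulas E[y | y>0] = xb + sigma*lambda(xb/sigma) and E[y] = Phi(xb/sigma)*E[y | y>0], with lambda = phi/Phi. As a cross-check outside Stata, here is a minimal Python sketch using the true parameter values from the data generating process; the helper names norm_pdf, norm_cdf and tobit_means are hypothetical and not part of the original program.

```python
import math

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal cdf, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def tobit_means(lnwage, b0=-2500.0, b1=1000.0, sigma=1000.0):
    """Return (xb, truncated mean, censored mean) at a given lnwage."""
    xb = b0 + b1 * lnwage
    z = xb / sigma
    lam = norm_pdf(z) / norm_cdf(z)      # inverse Mills ratio
    ey_trunc = xb + sigma * lam          # E[y | y > 0]
    ey_cens = norm_cdf(z) * ey_trunc     # E[max(y*, 0)]
    return xb, ey_trunc, ey_cens

# Evaluate at the smallest lnwage in the sample (taken from the summarize output).
xb_min, et_min, ec_min = tobit_means(0.9039821)
print(xb_min, et_min, ec_min)
```

Evaluated at the sample minimum of lnwage, the three values should line up with the minima of xb, eytrunc and eycens in the descriptive statistics, up to the single-precision rounding of the stored Stata variables.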
. * Descriptive Statistics
. summarize
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           e |       200    76.96455    977.5598  -2906.972   2943.727
      lnwage |       200    2.792559    .6249093   .9039821   4.373462
       ystar |       200    369.5237    1163.722  -2852.944   3105.383
      ytrunc |       130    1047.602    712.0859   17.88135   3105.383
       ycens |       200    680.9414    761.3346          0   3105.383
-------------+--------------------------------------------------------
          dy |       200         .65    .4781665          0          1
          xb |       200    292.5592    624.9093  -1596.018   1873.462
       sigma |       200        1000           0       1000       1000
    capphixb |       200    .5983181    .2092614   .0552424   .9694977
       phixb |       200    .3271769    .0771531   .0689849   .3989196
-------------+--------------------------------------------------------
       lamda |       200    .6687834    .3533611   .0711553   2.020711
     eytrunc |       200    961.3426    283.2587    424.693   1944.617
      eycens |       200    631.3493    380.6074   23.46106   1885.302
.
. * Plot Figure 16.1 on page 531
. sort lnwage
. graph twoway (scatter ystar lnwage, msize(small)) /*
> */ (scatter eytrunc lnwage, c(l) msize(vtiny) clstyle(p3) clwidth(medthick)) /*
> */ (scatter eycens lnwage, c(l) msize(vtiny) clstyle(p2) clwidth(medthick)) /*
> */ (scatter xb lnwage, c(l) msize(vtiny) clstyle(p1) clwidth(medthick)), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Tobit: Censored and Truncated Means") /*
> */ xtitle("Natural Logarithm of Wage", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Different Conditional Means", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(5) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Actual Latent Variable") label(2 "Truncated Mean") /*
> */ label(3 "Censored Mean") label(4 "Uncensored Mean"))
. graph export ch16condmeans.wmf, replace
(file c:\Imbook\bwebpage\Section4\ch16condmeans.wmf written in Windows Metafile format)
.
. ********** (2) TOBIT MODEL ESTIMATION FOR THESE DATA **********
.
. * These are computations not reported in the book.
.
. * With only 200 observations the Heckman 2-step estimates given below
. * are very inefficient. To verify that they are consistent
. * increase the sample size e.g. set obs 20000
.
. * (2A) ESTIMATE THE VARIOUS MODELS
.
. *** UNCENSORED OLS REGRESSION
. * Possible here since for these generated data we actually know ystar
. * Yields consistent estimate. Expect slope = 1000 approximately.
. regress ystar lnwage, robust
Regression with robust standard errors                 Number of obs =     200
                                                       F(  1,   198) =   96.32
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.2944
                                                       Root MSE      =     980

------------------------------------------------------------------------------
             |               Robust
       ystar |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lnwage |    1010.39   102.9518     9.81   0.000     807.3673    1213.413
       _cons |   -2452.05   303.2432    -8.09   0.000    -3050.051   -1854.049
------------------------------------------------------------------------------
. estimates store ols
. predict ystarols
(option xb assumed; fitted values)
.
. *** CENSORED OLS REGRESSION
. * Yields inconsistent estimates
. * From subsection 16.3.6 for slope coefficient OLS converges to p times b
. * where p is fraction of sample with positive values. Here 0.65*1000 = 650.
. regress ycens lnwage, robust
Regression with robust standard errors                 Number of obs =     200
                                                       F(  1,   198) =   84.20
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.2522
                                                       Root MSE      =  660.04

------------------------------------------------------------------------------
             |               Robust
       ycens |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lnwage |   611.8108   66.67493     9.18   0.000     480.3267    743.2949
       _cons |  -1027.577   176.0776    -5.84   0.000    -1374.805   -680.3484
------------------------------------------------------------------------------
. estimates store censols
. predict ycensols
(option xb assumed; fitted values)
.
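The comment on the censored OLS regression says that the slope converges to p times b, where p is the fraction of the sample with positive values. A small simulation can illustrate this. The Python sketch below is not part of the original Stata program: it redraws data from the same data generating process with a much larger sample, so its random draws and its share of positive values will differ from the 200 observations used in the log. The function name censored_ols_slope and the sample size are assumptions chosen for illustration.

```python
import random

def censored_ols_slope(n=50000, seed=10101):
    """Simulate the Tobit dgp and return (OLS slope on censored y, share with y > 0)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        lnwage = rng.gauss(2.75, 0.6)
        ystar = -2500.0 + 1000.0 * lnwage + rng.gauss(0.0, 1000.0)
        xs.append(lnwage)
        ys.append(max(ystar, 0.0))   # censor the latent variable at zero
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    share_positive = sum(1 for y in ys if y > 0.0) / n
    return sxy / sxx, share_positive

slope, share = censored_ols_slope()
print(f"censored OLS slope = {slope:.1f}, share positive * 1000 = {1000 * share:.1f}")
```

In a large sample the two printed numbers should be close to each other, which is the p times b result in the comment; neither needs to match the 0.65 share or the 611.8 slope of the small sample above.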
. *** TRUNCATED OLS REGRESSION for POSITIVE WAGE
. * Yields inconsistent estimates
. * See subsection 16.3.6 for discussion.
. regress ytrunc lnwage, robust
Regression with robust standard errors                 Number of obs =     130
                                                       F(  1,   128) =   22.05
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.1261
                                                       Root MSE      =  668.28

------------------------------------------------------------------------------
             |               Robust
      ytrunc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lnwage |   442.6319   94.26938     4.70   0.000     256.1038      629.16
       _cons |  -282.4444   282.9091    -1.00   0.320    -842.2285    277.3396
------------------------------------------------------------------------------
. estimates store truncols
. predict ytrunols
(option xb assumed; fitted values)
.
. *** CENSORED TOBIT MLE REGRESSION for HWAGE
. * Yields consistent estimates
. tobit ycens lnwage, ll(0)
Tobit estimates                                   Number of obs   =        200
                                                  LR chi2(1)      =      65.64
                                                  Prob > chi2     =     0.0000
Log likelihood = -1118.3857                       Pseudo R2       =     0.0285

------------------------------------------------------------------------------
       ycens |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lnwage |   956.4877   116.8382     8.19   0.000     726.0879    1186.887
       _cons |  -2244.567   346.8778    -6.47   0.000    -2928.595   -1560.539
-------------+----------------------------------------------------------------
         _se |   896.6811   59.14988           (Ancillary parameter)
------------------------------------------------------------------------------

  Obs. summary:         70  left-censored observations at ycens<=0
                       130     uncensored observations

. estimates store censtobit
. predict ycenstob
(option xb assumed; fitted values)
.
. *** TRUNCATED TOBIT MLE REGRESSION for HWAGE
. * If done properly yields consistent estimates
. * Not sure how to do this in Stata
. * The obvious command is
. * tobit ytrunc lnwage, ll(0)
. * but this gives the same estimates as truncated OLS
.
. *** PROBIT REGRESSION for HWAGE
. * Yields consistent estimates for slope b/s = 1000/1000 = 1
. * but uses less information so expect less efficient than tobit
. probit dy lnwage
Iteration 0:   log likelihood = -129.48933
Iteration 1:   log likelihood = -106.07902
Iteration 2:   log likelihood = -105.30024
Iteration 3:   log likelihood = -105.29672

Probit estimates                                  Number of obs   =        200
                                                  LR chi2(1)      =      48.39
                                                  Prob > chi2     =     0.0000
Log likelihood = -105.29672                       Pseudo R2       =     0.1868

------------------------------------------------------------------------------
          dy |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lnwage |   1.173851   .1870053     6.28   0.000     .8073277    1.540375
       _cons |  -2.795715    .508104    -5.50   0.000     -3.79158   -1.799849
------------------------------------------------------------------------------
. estimates store probit
. predict yprobit
(option p assumed; Pr(dy))
.
. *** HECKMAN 2-STEP ESTIMATOR DONE MANUALLY
. * Yields consistent estimates but less efficient than censored tobit MLE
. * The second stage standard errors will be incorrect
. probit dy lnwage
Iteration 0:   log likelihood = -129.48933
Iteration 1:   log likelihood = -106.07902
Iteration 2:   log likelihood = -105.30024
Iteration 3:   log likelihood = -105.29672

Probit estimates                                  Number of obs   =        200
                                                  LR chi2(1)      =      48.39
                                                  Prob > chi2     =     0.0000
Log likelihood = -105.29672                       Pseudo R2       =     0.1868

------------------------------------------------------------------------------
          dy |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lnwage |   1.173851   .1870053     6.28   0.000     .8073277    1.540375
       _cons |  -2.795715    .508104    -5.50   0.000     -3.79158   -1.799849
------------------------------------------------------------------------------
. predict probity, xb
. gen invmills = normd(probity)/normprob(probity)
. summarize dy probity invmills
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          dy |       200         .65    .4781665          0          1
     probity |       200     .482335    .7335506  -1.734574    2.33808
    invmills |       200    .5867037    .3823083   .0261866   2.140342
. regress ytrunc lnwage invmills
      Source |       SS       df       MS              Number of obs =     130
-------------+------------------------------           F(  2,   127) =    9.41
       Model |  8440402.78     2  4220201.39           Prob > F      =  0.0002
    Residual |  56971158.9   127  448591.802           R-squared     =  0.1290
-------------+------------------------------           Adj R-squared =  0.1153
       Total |  65411561.6   129  507066.369           Root MSE      =  669.77

------------------------------------------------------------------------------
      ytrunc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lnwage |   176.6468   418.2392     0.42   0.673    -650.9731    1004.267
    invmills |  -498.9958   760.3525    -0.66   0.513    -2003.596    1005.604
       _cons |   745.3069   1597.558     0.47   0.642    -2415.972    3906.586
------------------------------------------------------------------------------
. estimates store heck2step
. correlate lnwage invmills
(obs=200)

             |   lnwage invmills
-------------+------------------
      lnwage |   1.0000
    invmills |  -0.9745   1.0000

. * And more robust standard errors may be found by
. regress ytrunc lnwage invmills, robust
Regression with robust standard errors                 Number of obs =     130
                                                       F(  2,   127) =   13.96
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.1290
                                                       Root MSE      =  669.77

------------------------------------------------------------------------------
             |               Robust
      ytrunc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lnwage |   176.6468   379.1739     0.47   0.642    -573.6699    926.9636
    invmills |  -498.9958   635.4917    -0.79   0.434    -1756.519    758.5276
       _cons |   745.3069   1431.149     0.52   0.603     -2086.68    3577.293
------------------------------------------------------------------------------
. estimates store heck2srobust
.
. *** HECKMAN 2-STEP ESTIMATOR DONE USING BUILT-IN HECKMAN COMMAND
. * Yields consistent estimates but less efficient than censored tobit MLE
. heckman ytrunc lnwage, select(lnwage) twostep
Heckman selection model -- two-step estimates   Number of obs      =       200
(regression model with sample selection)        Censored obs       =        70
                                                Uncensored obs     =       130

                                                Wald chi2(2)       =     39.57
                                                Prob > chi2        =    0.0000

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ytrunc       |
      lnwage |   176.6469   425.0025     0.42   0.678    -656.3428    1009.636
       _cons |   745.3067   1617.583     0.46   0.645    -2425.098    3915.711
-------------+----------------------------------------------------------------
select       |
      lnwage |   1.173851   .1870053     6.28   0.000     .8073277    1.540375
       _cons |  -2.795715    .508104    -5.50   0.000     -3.79158   -1.799849
-------------+----------------------------------------------------------------
mills        |
      lambda |  -498.9957   760.5005    -0.66   0.512    -1989.549    991.5578
-------------+----------------------------------------------------------------
         rho |   -0.67419
       sigma |   740.1433
      lambda | -498.99575   760.5005
------------------------------------------------------------------------------
. estimates store heckman
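In the two-step output the three ancillary quantities are linked: the coefficient on the inverse Mills ratio (lambda) equals rho times sigma, so rho can be recovered as lambda/sigma. A minimal Python check using the numbers printed in the log above (the variable names are hypothetical):

```python
# Values copied from the heckman two-step output in the log.
lambda_hat = -498.99575   # coefficient on the inverse Mills ratio
sigma_hat = 740.1433      # estimated standard deviation of the outcome error

# Implied correlation between selection and outcome errors.
rho_hat = lambda_hat / sigma_hat
print(f"rho = {rho_hat:.5f}")
```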
. predict ystarhec, xb
. predict ytrunhec, ycond
. predict ycenshec, yexpected
. predict yinvmill, mills
. predict yprobsel, psel
. correlate lnwage yinvmill
(obs=200)

             |   lnwage yinvmill
-------------+------------------
      lnwage |   1.0000
    yinvmill |  -0.9745   1.0000

.
. * (2B) DISPLAY COEFFICIENT ESTIMATES
.
. * OLS estimates True model is -2500 + 1000*lnwage
. estimates table ols censols truncols, b(%10.2f) se(%10.2f) t stats(N ll)
-----------------------------------------------------
    Variable |    ols       censols     truncols
-------------+---------------------------------------
      lnwage |    1010.39      611.81      442.63
             |     102.95       66.67       94.27
             |       9.81        9.18        4.70
       _cons |   -2452.05    -1027.58     -282.44
             |     303.24      176.08      282.91
             |      -8.09       -5.84       -1.00
-------------+---------------------------------------
           N |     200.00      200.00      130.00
          ll |   -1660.29    -1581.24    -1029.07
-----------------------------------------------------
                                       legend: b/se/t
.
. * Tobit estimates True model is -2500 + 1000*lnwage
. estimates table censtobit probit, b(%10.2f) se(%10.2f) t stats(N ll)
----------------------------------------
    Variable | censtobit      probit
-------------+--------------------------
      lnwage |     956.49        1.17
             |     116.84        0.19
             |       8.19        6.28
         _se |     896.68
             |      59.15
             |      15.16
       _cons |   -2244.57       -2.80
             |     346.88        0.51
             |      -6.47       -5.50
-------------+--------------------------
           N |     200.00      200.00
          ll |   -1118.39     -105.30
----------------------------------------
                          legend: b/se/t
.
. * Tobit estimates using Heckman manual True model is -2500 + 1000*lnwage
. estimates table heck2step heck2srobust, b(%10.2f) se(%10.2f) t stats(N ll)
----------------------------------------
    Variable | heck2step   heck2sro~t
-------------+--------------------------
      lnwage |     176.65      176.65
             |     418.24      379.17
             |       0.42        0.47
    invmills |    -499.00     -499.00
             |     760.35      635.49
             |      -0.66       -0.79
       _cons |     745.31      745.31
             |    1597.56     1431.15
             |       0.47        0.52
-------------+--------------------------
           N |     130.00      130.00
          ll |   -1028.85    -1028.85
----------------------------------------
                          legend: b/se/t
.
. * Tobit estimates using Heckman built-in True model is -2500 + 1000*lnwage
. estimates table heckman, b(%10.2f) se(%10.2f) t stats(N ll)
---------------------------
    Variable |  heckman
-------------+-------------
ytrunc       |
      lnwage |     176.65
             |     425.00
             |       0.42
       _cons |     745.31
             |    1617.58
             |       0.46
-------------+-------------
select       |
      lnwage |       1.17
             |       0.19
             |       6.28
       _cons |      -2.80
             |       0.51
             |      -5.50
-------------+-------------
mills        |
      lambda |    -499.00
             |     760.50
             |      -0.66
-------------+-------------
Statistics   |
           N |     200.00
          ll |
---------------------------
             legend: b/se/t
.
. ********** (3) CLAD ESTIMATION FOR THESE DATA page 565 **********
.
. * Compare tobit MLE with censored least absolute deviations (CLAD) estimator
. * Gives results at end of section 16.9.3 page 565
.
. tobit ycens lnwage, ll(0)
Tobit estimates                                   Number of obs   =        200
                                                  LR chi2(1)      =      65.64
                                                  Prob > chi2     =     0.0000
Log likelihood = -1118.3857                       Pseudo R2       =     0.0285

------------------------------------------------------------------------------
       ycens |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lnwage |   956.4877   116.8382     8.19   0.000     726.0879    1186.887
       _cons |  -2244.567   346.8778    -6.47   0.000    -2928.595   -1560.539
-------------+----------------------------------------------------------------
         _se |   896.6811   59.14988           (Ancillary parameter)
------------------------------------------------------------------------------

  Obs. summary:         70  left-censored observations at ycens<=0
                       130     uncensored observations

. clad ycens lnwage, reps(100) ll(0)


Initial sample size = 200
Final sample size = 159
Pseudo R2 = .12380382

Bootstrap statistics

Variable |   Reps   Observed       Bias   Std. Err.   [95% Conf. Interval]
---------+-------------------------------------------------------------------
  lnwage |    100   838.2366   59.09127   165.7476    509.3575   1167.116  (N)
         |                                            666.9485   1298.217  (P)
         |                                             664.528   1247.371 (BC)
---------+-------------------------------------------------------------------
   const |    100  -1897.847  -184.2656   529.6713    -2948.83  -846.8643  (N)
         |                                           -3406.233  -1435.466  (P)
         |                                           -3406.233  -1435.466 (BC)
-----------------------------------------------------------------------------
                              N = normal, P = percentile, BC = bias-corrected
.
. ********** CLOSE OUTPUT
. log close
log: c:\Imbook\bwebpage\Section4\mma16p1tobit.txt
log type: text
closed on: 19 May 2005, 13:00:37
-----------------------------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section4\mma16p2mills.txt
log type: text
opened on: 19 May 2005, 13:02:12
.
. ********** OVERVIEW OF MMA16P2MILLS.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 16.3.4 page 540
. * Presentation of Mills ratio
. * It provides
. * (1) Figure 16.2 (ch16millsratio.wmf)
. * This program requires no data
.
. ********** SETUP ***********
.
. set more off
. version 8
. set scheme s1mono /* Used for graphs */
.
. ********** GENERATE DATA AND FUNCTIONS
.
. * Create density cdf Mills ratio for N[0,1]
. set obs 100
obs was 0, now 100
. gen c = 4*(50-_n)/100
. gen PHIc = norm(c)
. gen phic = normden(c)
. gen lamdac = phic/(1-PHIc)
.
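The grid and the three functions generated above are easy to reproduce outside Stata. The following Python sketch rebuilds the cutoff grid c = 4*(50-n)/100 for n = 1,...,100 and the inverse Mills ratio phi(c)/(1-Phi(c)); the helper names norm_pdf and norm_cdf are hypothetical stand-ins for Stata's normden() and norm().

```python
import math

def norm_pdf(z):
    """Standard normal density (counterpart of normden)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal cdf (counterpart of norm), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Cutoff grid: runs from 1.96 (n = 1) down to -2 (n = 100).
grid = [4.0 * (50 - n) / 100.0 for n in range(1, 101)]

# Inverse Mills ratio for truncation from below at c.
mills = [norm_pdf(c) / (1.0 - norm_cdf(c)) for c in grid]

print(min(grid), max(grid), min(mills), max(mills))
```

The extremes of the ratio occur at the ends of the grid, so the printed minimum and maximum can be compared with the lamdac row of the descriptive statistics in this log.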
. * Descriptive statistics
. summarize
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           c |       100        -.02     1.16046         -2       1.96
        PHIc |       100    .4952275     .338039   .0227501   .9750021
        phic |       100    .2386177    .1157086    .053991   .3989423
      lamdac |       100    .9284788    .7023349   .0552479   2.337835
.
. *********** FIGURE 16.2 page 540 ***********
.
. * This graph shows Mills ratio and cdf and density
. graph twoway (scatter lamdac c, c(l) msize(vtiny) clstyle(p1) clwidth(medthick)) /*
> */ (scatter PHIc c, c(l) msize(vtiny) clstyle(p3) clwidth(medthick)) /*
> */ (scatter phic c, c(l) msize(vtiny) clstyle(p2) clwidth(medthick)), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Inverse Mills Ratio as Cutoff Varies") /*
> */ xtitle("Cutoff point c", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Inverse Mills, pdf and cdf", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(11) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Inverse Mills ratio") label(2 "N[0,1] Cdf") label(3 "N[0,1] Density"))
. graph export ch16millsratio.wmf, replace
(file c:\Imbook\bwebpage\Section4\ch16millsratio.wmf written in Windows Metafile format)
.
. ********** CLOSE OUTPUT ***********
. log close
log: c:\Imbook\bwebpage\Section4\mma16p2mills.txt
log type: text
closed on: 19 May 2005, 13:02:15
-----------------------------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section4\mma16p3selection.txt
log type: text
opened on: 19 May 2005, 13:04:33
.
. ********** OVERVIEW OF MMA16P3SELECTION.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 16.6 pages 553-5
. * Selection models example
. * It provides
. * (1) Two-part model estimation (Table 16.1)
. * (2) Selection model estimation
. * (2A) ML estimates (Table 16.1)
. * (2B) Heckman 2-step estimates (Table 16.1)
. * (2C) Check for possible collinearity problems in Heckman 2-Step
.
. * To use this program you need health expenditure data in Stata data set
. * randdata.dta
.
. ********** SETUP **********
.
. set more off
. version 8.0
. set scheme s1mono /* Used for graphs */
.
. ********** DATA DESCRIPTION **********
.
. * Essentially same data as in P. Deb and P.K. Trivedi (2002)
. * "The Structure of Demand for Medical Care: Latent Class versus
. * Two-Part Models", Journal of Health Economics, 21, 601-625
. * except that paper used different outcome (counts rather than $)
.
. * Each observation is for an individual over a year.
. * Individuals may appear in up to five years.
. * All available sample is used except only fee for service plans included.
. * In analysis here only year 2 is used so panel complications are avoided.
. * Clustering of individuals within household is ignored here.
.
. * Dependent variable is
. *   MED      meddol    Annual medical expenditures in constant dollars
. *                      excluding dental and outpatient mental
. *   LNMED    lnmeddol  Ln(Medical expenditures) given meddol > 0
. *                      Missing otherwise
. *   DMED     binexp    1 if medical expenditures > 0
.
. * Regressors are
. * - Health insurance measures
. *   LC       logc      log(coinsrate+1) where coinsurance rate is 0 to 100
. *   IDP      idp       1 if individual deductible plan
. *   LPI      lpi       log(annual participation incentive payment) or 0 if no payment
. *   FMDE     fmde      log(max(medical deductible expenditure)) if IDP=1 and MDE>1 or 0 otherwise
. * - Health status measures
. *   NDISEASE disea     number of chronic diseases
. *   PHYSLIM  physlm    1 if physical limitation
. *   HLTHG    hlthg     1 if good health
. *   HLTHF    hlthf     1 if fair health
. *   HLTHP    hlthp     1 if poor health (omitted is excellent)
. * - Socioeconomic characteristics
. *   LINC     linc      log of annual family income (in $)
. *   LFAM     lfam      log of family size
. *   EDUCDEC  educdec   years of schooling of decision maker
. *   AGE      xage      exact age
. *   BLACK    black     1 if black
. *   FEMALE   female    1 if female
. *   CHILD    child     1 if child
. *   FEMCHILD fchild    1 if female child
.
. * If panel data used then clustering is on
. *   zper     person id
.
. ********** READ DATA **********
.
. use randdata.dta, clear
. sum
    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
        plan |     20190    11.17553    3.976751          1         19
        site |     20190    3.298811     1.80382          1          6
       coins |     20190     26.3056    36.40386          0        100
    tookphys |     20190    .5974245    .4904288          0          1
        year |     20190    2.420109    1.217141          1          5
-------------+--------------------------------------------------------
        zper |     20190    357965.5    180868.1     125024     632167
       black |     20190    .1814983    .3827071          0          1
      income |     20190    8037.409    4058.371          0   29237.54
        xage |     20190    25.72233    16.76945          0   64.27515
      female |     20190    .5170381     .499722          0          1
-------------+--------------------------------------------------------
     educdec |     20186    11.96681    2.806255          0         25
        time |     20190    .9989561    .0259741   .0767123          1
     outpdol |     20190    51.12649    94.92627          0   2599.902
     drugdol |     20190     13.1687    33.76212          0   706.3979
     suppdol |     20190      6.8024    21.39346          0    1009.47
-------------+--------------------------------------------------------
     mentdol |     20190    6.870347    58.41298          0   1340.834
      inpdol |     20190    100.4694    655.6215          0   38649.81
      meddol |     20190    171.5679    698.2015          0   39182.02
      totadm |     20190    .1127291    .4111857          0          8
      inpmis |     20190    .0039624     .062824          0          1
-------------+--------------------------------------------------------
     mentvis |     20190    .4322437    3.430789          0         62
       mdvis |     20190    2.860426    4.504365          0         77
    notmdvis |     20190    .6855869    3.763543          0        109
         num |     20190    3.954235    1.853034          1         14
         mhi |     20190    76.55584    12.50224       12.2        100
-------------+--------------------------------------------------------
       disea |     20190    11.24449    6.741449          0       58.6
      physlm |     20190    .1235003    .3220164          0          1
      ghindx |     14967    73.09055    15.99371        3.7        100
      mdeoff |     20185    417.8422    384.1199          0       1000
       pioff |     20185     446.677     367.466          0    1291.68
-------------+--------------------------------------------------------
       child |     20190    .4013373    .4901812          0          1
      fchild |     20190    .1937098    .3952139          0          1
        lfam |     20190    1.248156     .539301          0   2.639057
         lpi |     20190    4.707894     2.69784          0   7.163699
         idp |     20190    .2599802    .4386343          0          1
-------------+--------------------------------------------------------
        logc |     20190    2.383342    2.041776          0   4.564348
        fmde |     20190    4.029524    3.471353          0   8.294049
       hlthg |     20190    .3620109    .4805938          0          1
       hlthf |     20190     .077266    .2670196          0          1
       hlthp |     20190    .0149579    .1213874          0          1
-------------+--------------------------------------------------------
     xghindx |     20190     73.2375     14.2332        3.7        100
        linc |     20190    8.708265    1.228309          0   10.28324
        lnum |     20190    1.248156     .539301          0   2.639057
    lnmeddol |     15737    4.109318    1.484654  -.8495329   10.57597
      binexp |     20190    .7794453     .414631          0          1
.
. /* Describe and summarize the original data.
> describe
> summarize
> * The original data are a panel.
> * The following summarizes panel features for completeness
> iis zper
> tis year
> xtdes
> xtsum meddol lnmeddol binexp
> */
.
. ********** DATA SELECTION AND TRANSFORMATIONS **********
.
. * Use only Year 2
. keep if year==2
(14615 observations deleted)
.
. * educdec is missing for one observation
. drop if educdec==.
(1 observation deleted)
.
. * rename variables
. rename meddol MED
. rename binexp DMED
. rename lnmeddol LNMED
. rename linc LINC
. rename lfam LFAM
. rename educdec EDUCDEC
. rename xage AGE
. rename female FEMALE
. rename child CHILD
. rename fchild FEMCHILD
. rename black BLACK
. rename disea NDISEASE
. rename physlm PHYSLIM
. rename hlthg HLTHG
. rename hlthf HLTHF
. rename hlthp HLTHP
. rename idp IDP
. rename logc LC
. rename lpi LPI
. rename fmde FMDE
.
. * Define the regressor list, which later commands can refer to as $XLIST
. global XLIST LC IDP LPI FMDE PHYSLIM NDISEASE HLTHG HLTHF HLTHP /*
>    */ LINC LFAM EDUCDEC AGE FEMALE CHILD FEMCHILD BLACK
.
. * Summarize the dependents and regressors
. sum MED DMED LNMED $XLIST
    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
         MED |      5574    169.7247    802.8303          0   39182.02
        DMED |      5574    .7680301    .4221277          0          1
       LNMED |      4281    4.069462    1.499372  -.5343859   10.57597
          LC |      5574    2.420739    2.043883          0   4.564348
         IDP |      5574     .261751    .4396272          0          1
-------------+--------------------------------------------------------
         LPI |      5574    4.726834    2.681354          0   7.163699
        FMDE |      5574    4.065015    3.450558          0   8.294049
     PHYSLIM |      5574    .1242463    .3233768          0          1
    NDISEASE |      5574    11.20526    6.788959          0       58.6
       HLTHG |      5574    .3649085    .4814477          0          1
-------------+--------------------------------------------------------
       HLTHF |      5574    .0782203     .268542          0          1
       HLTHP |      5574    .0156082     .123965          0          1
        LINC |      5574    8.696929    1.220592          0   10.28324
        LFAM |      5574    1.241407    .5403965          0   2.564949
     EDUCDEC |      5574     11.9466    2.837492          0         25
-------------+--------------------------------------------------------
         AGE |      5574    25.57613    16.73011   .0253251   63.27515
      FEMALE |      5574    .5184787    .4997032          0          1
       CHILD |      5574    .4050951    .4909545          0          1
    FEMCHILD |      5574    .1955508    .3966597          0          1
       BLACK |      5574    .1859852    .3860055          0          1
.
. * Detailed summary shows that MED>0 very skewed whereas LNMED is not
. sum MED LNMED if MED>0, detail
               medical exp excl outpatient men
-------------------------------------------------------------
      Percentiles      Smallest
 1%    2.109705       .5860291
 5%    5.752914       .6630728
10%    9.376465       .6770833       Obs                4281
25%    21.31435       .6770833       Sum of Wgt.        4281

50%    52.64357                      Mean            220.987
                        Largest      Std. Dev.      909.9021
75%    136.4518       12044.11
90%    453.8059       17465.98       Variance       827921.9
95%     904.328       18641.98       Skewness       24.00829
99%    2666.309       39182.02       Kurtosis        873.379

                            LNMED
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .746548      -.5343859
 5%    1.749707      -.4108706
10%    2.238203      -.3899609       Obs                4281
25%    3.059381      -.3899609       Sum of Wgt.        4281

50%    3.963544                      Mean           4.069462
                        Largest      Std. Dev.      1.499372
75%    4.915971       9.396331
90%     6.11767        9.76801       Variance       2.248116
95%    6.807192       9.833171       Skewness        .347695
99%    7.888451       10.57597       Kurtosis        3.28909

.
. * Write final data to a text (ascii) file so it can be used with programs other than Stata
. outfile DMED MED LNMED LC IDP LPI FMDE PHYSLIM NDISEASE HLTHG HLTHF HLTHP /*
>    */ LINC LFAM EDUCDEC AGE FEMALE CHILD FEMCHILD BLACK /*
>    */ using mma16p3selection.asc, replace
.
. ****************** CHAPTER 16.6 REGRESSION ANALYSIS **************
.
. * The analysis below models log expenditure (lny), not expenditure (y)
. * where here y = MED and lny = LNMED.
.
. * This makes regular tobit difficult as it is not clear
. * what the censoring/truncation point is since ln(0) = -infinity
. * Also note that some LNMED<0 as 0<MED<1 is possible.
. * So just do two-part model and sample selection model.
.
. * Interested in comparing MED not LNMED at end of day.
. * So use
. * If lny = xb + u, u ~ N[0, s^2] for y > 0
. * Then E[y] = exp(xb + (s^2)/2) for y > 0
. * and E[y] = Pr[y>0]*exp(xb + (s^2)/2) for all y
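The retransformation in the comments above can also be checked outside Stata. The following is a minimal Python sketch, assuming normal errors in the log equation as the comments state. The function names and the numeric inputs are hypothetical, chosen only for illustration; they do not come from the log.

```python
import math

def mean_given_positive(xb, s):
    """E[y | y > 0] = exp(xb + s^2/2) when ln y = xb + u, u ~ N[0, s^2]."""
    return math.exp(xb + s ** 2 / 2.0)

def mean_all(prob_positive, xb, s):
    """E[y] = Pr[y > 0] * E[y | y > 0]."""
    return prob_positive * mean_given_positive(xb, s)

# Hypothetical inputs: linear prediction 4.0, error s.d. 1.4, Pr[y > 0] = 0.75
print(mean_given_positive(4.0, 1.4))
print(mean_all(0.75, 4.0, 1.4))
```

Because the term s^2/2 is positive, the predicted level always exceeds exp(xb), which is why simply exponentiating the fitted log value would understate expenditures.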
.
. * The models estimated are
. * (1) Two-part model using
. * (a) probit for whether positive y
. * (b) regress with lny as dependent variable
. * (2) Sample selection model similar to (1)
. * except that inverse Mills ratio appears in (b), estimated by
. * (a) MLE
. * (b) Heckman 2-step
.
. * Additionally censored tobit and truncated tobit commands in levels
. * are given below for completeness.
.
. ************ (1) TWO-PART MODEL ************
.
. * Two-part model: binary probit and then lognormal for expenditures
.
. * First part: probit for MED > 0
. probit DMED $XLIST        /* global XLIST defined earlier */
Iteration 0:   log likelihood = -3019.1326
Iteration 1:   log likelihood =  -2698.302
Iteration 2:   log likelihood = -2690.6146
Iteration 3:   log likelihood = -2690.5768
Iteration 4:   log likelihood = -2690.5768

Probit estimates                                  Number of obs   =       5574
                                                  LR chi2(17)     =     657.11
                                                  Prob > chi2     =     0.0000
Log likelihood = -2690.5768                       Pseudo R2       =     0.1088
------------------------------------------------------------------------------
DMED | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
LC | -.118708 .0269005 -4.41 0.000 -.1714319 -.065984
IDP | -.1279483 .0522351 -2.45 0.014 -.2303272 -.0255693
LPI | .0283091 .0088793 3.19 0.001 .010906 .0457121
FMDE | .0075319 .0161584 0.47 0.641 -.024138 .0392018
PHYSLIM | .2732013 .0743761 3.67 0.000 .1274268 .4189758
NDISEASE | .0224861 .0035958 6.25 0.000 .0154384 .0295338
HLTHG | .0387516 .0438545 0.88 0.377 -.0472016 .1247049
HLTHF | .1920062 .0836688 2.29 0.022 .0280185 .355994
HLTHP | .6397294 .2126322 3.01 0.003 .222978 1.056481
LINC | .0518413 .0168128 3.08 0.002 .0188889 .0847938
LFAM | -.0335599 .041728 -0.80 0.421 -.1153452 .0482253
EDUCDEC | .036307 .0076536 4.74 0.000 .0213062 .0513078
AGE | .0002631 .0021606 0.12 0.903 -.0039715 .0044978
FEMALE | .4451035 .054292 8.20 0.000 .3386932 .5515138
CHILD | .111489 .0808338 1.38 0.168 -.0469424 .2699203
FEMCHILD | -.4512845 .0799219 -5.65 0.000 -.6079284 -.2946405
BLACK | -.6057367 .0523148 -11.58 0.000 -.7082718 -.5032017
_cons | -.271605 .1877345 -1.45 0.148 -.6395579 .0963478
------------------------------------------------------------------------------
. estimates store twoparta   /* version 8 command for later table */
. scalar llprobit = e(ll)    /* Log-likelihood */
. predict probsel2part, p    /* Pr[y>0] = PHI(x'b) */
. predict xbprobit, xb       /* x'b */
.
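The LR chi2 and Pseudo R2 reported in the probit header follow from the first and last log likelihoods in the iteration log. A minimal Python sketch, assuming iteration 0 is the constant-only model; the variable and function names are illustrative and not part of the Stata program:

```python
import math

ll_null = -3019.1326   # iteration 0 log likelihood (assumed constant-only model)
ll_full = -2690.5768   # converged log likelihood

def lr_statistic(ll0, ll1):
    """Likelihood-ratio statistic 2*(ll1 - ll0)."""
    return 2.0 * (ll1 - ll0)

def pseudo_r2(ll0, ll1):
    """McFadden pseudo R-squared, 1 - ll1/ll0."""
    return 1.0 - ll1 / ll0

lr = lr_statistic(ll_null, ll_full)
r2 = pseudo_r2(ll_null, ll_full)
print(round(lr, 2), round(r2, 4))
```

Both values agree with the header to the reported precision.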
. * Second part: OLS for log of positive values


. * Here LNMED where LNMED missing if MED = 0
. regress LNMED $XLIST
      Source |       SS       df       MS              Number of obs =    4281
-------------+------------------------------           F( 17,  4263) =   39.69
       Model |  1314.70352    17   77.335501           Prob > F      =  0.0000
    Residual |  8307.23358  4263  1.94868252           R-squared     =  0.1366
-------------+------------------------------           Adj R-squared =  0.1332
       Total |   9621.9371  4280  2.24811614           Root MSE      =   1.396
------------------------------------------------------------------------------
LNMED | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
LC | -.0164006 .0312495 -0.52 0.600 -.0776658 .0448647
IDP | -.0789998 .061796 -1.28 0.201 -.2001522 .0421526
LPI | .0027057 .0097138 0.28 0.781 -.0163383 .0217498
FMDE | -.0306123 .0180695 -1.69 0.090 -.0660379 .0048134
PHYSLIM | .2619829 .0687459 3.81 0.000 .1272052 .3967607
NDISEASE | .0198922 .0034441 5.78 0.000 .01314 .0266444
HLTHG | .1438008 .0483778 2.97 0.003 .0489553 .2386464
HLTHF | .3642649 .0881004 4.13 0.000 .1915422 .5369876
HLTHP | .7865099 .1700502 4.63 0.000 .453123 1.119897
LINC | .0931988 .0217849 4.28 0.000 .0504891 .1359085
LFAM | -.1408033 .046203 -3.05 0.002 -.2313852 -.0502214
EDUCDEC | -5.66e-06 .0082599 -0.00 0.999 -.0161993 .016188
AGE | .0055602 .002251 2.47 0.014 .0011471 .0099733
FEMALE | .3442509 .0571573 6.02 0.000 .2321929 .456309
CHILD | -.2677921 .0904307 -2.96 0.003 -.4450833 -.0905009
FEMCHILD | -.3512207 .0896517 -3.92 0.000 -.5269847 -.1754568
BLACK | -.1964412 .0677021 -2.90 0.004 -.3291725 -.0637099
_cons | 3.077182 .2213448 13.90 0.000 2.64323 3.511133
------------------------------------------------------------------------------
. estimates store twopartb
. scalar lllognormal = e(ll)   /* Log-likelihood */
. scalar sols = e(rmse)        /* Standard error of the regression */
. predict pLNMED, xb           /* Predicted mean from OLS */
. predict rLNMED, residuals
(1293 missing values generated)
.
. * Check for homoskedastic and normal errors
. hettest
Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: fitted values of LNMED

         chi2(1)      =    17.11
         Prob > chi2  =   0.0000
. * imtest
. sktest LNMED rLNMED
                   Skewness/Kurtosis tests for Normality
                                                 ------- joint ------
    Variable |  Pr(Skewness)   Pr(Kurtosis)  adj chi2(2)    Prob>chi2
-------------+-------------------------------------------------------
       LNMED |      0.000         0.001            .          0.0000
      rLNMED |      0.000         0.000            .          0.0000
.
. * Create two-part model log-likelihood
. scalar lltwopart = llprobit + lllognormal
. di "lltwopart = " lltwopart
lltwopart = -10184.076
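The lognormal part of this log-likelihood can be recovered from the residual sum of squares and sample size in the regression output above. A minimal Python sketch, assuming the usual Gaussian log-likelihood evaluated at sigma^2 = RSS/N (this is the likelihood of ln y, with no Jacobian term for y); the function name is illustrative:

```python
import math

def ols_loglik(rss, n):
    """Gaussian log-likelihood of a linear regression at sigma^2 = RSS/n."""
    return -0.5 * n * (math.log(2.0 * math.pi) + math.log(rss / n) + 1.0)

ll_probit = -2690.5768                        # from the probit output
ll_lognormal = ols_loglik(8307.23358, 4281)   # Residual SS and N from regress
ll_twopart = ll_probit + ll_lognormal
print(round(ll_lognormal, 3), round(ll_twopart, 3))
```

The sum reproduces the displayed value of lltwopart to within rounding of the inputs.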
.
. * Create predictions of level of expenditures not logs
. * E[y] = exp(pLNMED + (s^2)/2) for y > 0
. * and E[y] = Pr[y>0]*exp(xb + (s^2)/2) for all y
. gen pMEDpos2part = exp(pLNMED + (sols^2)/2)
. gen pMEDall2part = probsel2part*pMEDpos2part
.
. * Compare predictions to actual for MED > 0
. sum LNMED pLNMED MED pMEDpos2part if MED > 0
    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
       LNMED |      4281    4.069462    1.499372  -.5343859   10.57597
      pLNMED |      4281    4.069462    .5542326   2.298199   6.482164
         MED |      4281     220.987    909.9021   .5860291   39182.02
pMEDpos2part |      4281     183.462    126.0213   26.37827   1731.088
. corr LNMED pLNMED MED pMEDpos2part if MED > 0
(obs=4281)
             |    LNMED   pLNMED      MED pMEDpo~t
-------------+------------------------------------
       LNMED |   1.0000
      pLNMED |   0.3696   1.0000
         MED |   0.4560   0.1576   1.0000
pMEDpos2part |   0.3387   0.9204   0.1669   1.0000

.
. * Compare predictions to actual including zeroes
. sum MED pMEDall2part DMED probsel2part
    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
         MED |      5574    169.7247    802.8303          0   39182.02
pMEDall2part |      5574     140.966    120.2022   4.880651   1729.783
        DMED |      5574    .7680301    .4221277          0          1
probsel2part |      5574    .7678377    .1457464   .1526731    .999246
. corr MED pMEDall2part DMED probsel2part
(obs=5574)
             |      MED pMEDal~t     DMED probse~t
-------------+------------------------------------
         MED |   1.0000
pMEDall2part |   0.1772   1.0000
        DMED |   0.1162   0.2158   1.0000
probsel2part |   0.1031   0.6380   0.3467   1.0000

.
. ************ (2) SELECTION MODEL ************
.
. * Sample selection model for log expenditures
. * Selection equation:
. *   Observe y = y* if I = z'a + u > 0   u ~ N[0,1]
. * Regression equation:
. *   y* = x'b + v   v ~ N[0,s^2] and Corr[u,v]=rho
.
. * (2A) MLE for sample selection model
. heckman LNMED $XLIST, select (DMED = $XLIST)
Iteration 0: log likelihood = -10183.753 (not concave)
Iteration 1: log likelihood = -10183.676 (not concave)
Iteration 2: log likelihood = -10183.593 (not concave)
Iteration 3: log likelihood = -10183.525 (not concave)
Iteration 4: log likelihood = -10183.467 (not concave)
Iteration 5: log likelihood = -10183.408 (not concave)
Iteration 6: log likelihood = -10183.311 (not concave)
Iteration 7: log likelihood = -10183.21 (not concave)
Iteration 8: log likelihood = -10179.155
Iteration 9: log likelihood = -10176.799
Iteration 10: log likelihood = -10170.17
Iteration 11: log likelihood = -10170.11
Iteration 12: log likelihood = -10170.11
Heckman selection model                         Number of obs      =      5574
(regression model with sample selection)        Censored obs       =      1293
                                                Uncensored obs     =      4281

                                                Wald chi2(17)      =    805.17
Log likelihood = -10170.11                      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
 | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
LNMED |
LC | -.0760236 .0337456 -2.25 0.024 -.1421638 -.0098833
IDP | -.1497199 .0661379 -2.26 0.024 -.2793478 -.020092
LPI | .01493 .0105015 1.42 0.155 -.0056526 .0355127
FMDE | -.023522 .0194745 -1.21 0.227 -.0616913 .0146474
PHYSLIM | .3548628 .0755425 4.70 0.000 .2068023 .5029233
NDISEASE | .0286474 .0037972 7.54 0.000 .0212051 .0360897
HLTHG | .1559173 .0521775 2.99 0.003 .0536513 .2581834
HLTHF | .4451223 .0955263 4.66 0.000 .2578942 .6323505
HLTHP | .9986065 .1878791 5.32 0.000 .6303701 1.366843
LINC | .1214009 .0230845 5.26 0.000 .0761562 .1666457
LFAM | -.1583018 .0497464 -3.18 0.001 -.255803 -.0608005
EDUCDEC | .0175951 .0090183 1.95 0.051 -.0000805 .0352707
AGE | .0057376 .0024426 2.35 0.019 .0009501 .0105251
FEMALE | .5503441 .0633313 8.69 0.000 .4262171 .6744711
CHILD | -.1976875 .097398 -2.03 0.042 -.3885841 -.006791
FEMCHILD | -.5653227 .0975292 -5.80 0.000 -.7564765 -.374169
BLACK | -.5358684 .0749191 -7.15 0.000 -.6827072 -.3890296
_cons | 2.107745 .2442285 8.63 0.000 1.629066 2.586424
-------------+----------------------------------------------------------------
DMED |
LC | -.1068027 .0264766 -4.03 0.000 -.1586959 -.0549096
IDP | -.108769 .0509938 -2.13 0.033 -.2087149 -.0088231
LPI | .0294804 .0086214 3.42 0.001 .0125827 .0463781
FMDE | .0007403 .0158738 0.05 0.963 -.0303719 .0318524
PHYSLIM | .2848256 .0722656 3.94 0.000 .1431877 .4264635
NDISEASE | .0210805 .0034967 6.03 0.000 .0142271 .027934
HLTHG | .0576901 .042799 1.35 0.178 -.0261945 .1415747
HLTHF | .2237238 .0814547 2.75 0.006 .0640755 .3833721
HLTHP | .7984291 .2048087 3.90 0.000 .3970114 1.199847
LINC | .0553122 .0166179 3.33 0.001 .0227416 .0878827
LFAM | -.031201 .0402985 -0.77 0.439 -.1101846 .0477827
EDUCDEC | .031499 .0074987 4.20 0.000 .0168018 .0461961
AGE | -.0006072 .0021064 -0.29 0.773 -.0047357 .0035212
FEMALE | .4093059 .0532548 7.69 0.000 .3049283 .5136834
CHILD | .0530643 .0786326 0.67 0.500 -.1010527 .2071813
FEMCHILD | -.3953421 .0783811 -5.04 0.000 -.5489662 -.241718
BLACK | -.5831049 .0520534 -11.20 0.000 -.6851277 -.4810822
_cons | -.2141574 .1842169 -1.16 0.245 -.5752159 .146901
-------------+----------------------------------------------------------------
/athrho | .9408188 .0736303 12.78 0.000 .796506 1.085132
/lnsigma | .4511091 .0177227 25.45 0.000 .4163732 .485845
-------------+----------------------------------------------------------------
rho | .7355982 .0337886 .6620789 .7950943
sigma | 1.570053 .0278256 1.516452 1.625548
lambda | 1.154928 .0702985 1.017145 1.29271
------------------------------------------------------------------------------
LR test of indep. eqns. (rho = 0): chi2(1) = 27.93 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
. estimates store heckmle
. scalar llhecklogs = e(ll)   /* Log-likelihood */
. scalar shml = e(sigma)      /* s where Var[v]=s^2 */
.
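The ancillary parameters in the MLE output are estimated on transformed scales, and the reported rho, sigma and lambda are back-transformations of them. A minimal Python sketch of that arithmetic, assuming rho = tanh(athrho), sigma = exp(lnsigma) and lambda = rho*sigma, and assuming the LR test compares the selection-model log likelihood with the two-part log likelihood computed earlier in this log; the variable names are illustrative:

```python
import math

athrho, lnsigma = 0.9408188, 0.4511091   # from the heckman MLE output

rho = math.tanh(athrho)     # correlation between u and v
sigma = math.exp(lnsigma)   # s.d. of v
lam = rho * sigma           # coefficient on the inverse Mills ratio

ll_heckman = -10170.11      # selection model, rho unrestricted
ll_twopart = -10184.076     # probit plus OLS, i.e. rho = 0
lr_indep = 2.0 * (ll_heckman - ll_twopart)

print(round(rho, 6), round(sigma, 5), round(lam, 5), round(lr_indep, 2))
```

The results match the reported rho, sigma, lambda and chi2(1) to the precision of the inputs.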
. * Save the Stata predictions:
. * Distinguish between ystar=E[y*], ypos=E[y|I>0] and yall=E[y]
. predict ystarhml, xb        /* E[y*] = x'b */
. predict yposhml, ycond      /* E[y|I>0] = E[y*|I>0] = x'b+c*lambda(z'a) */
. predict invmillhml, mills   /* lambda(z'a) = phi(z'a)/PHI(z'a) */
. predict probselhml, psel    /* PHI(z'a) */
. * The following not appropriate here as it sets y=0 if I<0
. * whereas here data is in logs and y=ln(MED)=-infinity if I<0
. predict yallhml, yexpected  /* E[y] = PHI(z'a)*E[y|I>0] */
. sum ystarhml yposhml invmillhml probselhml yallhml
    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
    ystarhml |      5574    3.543161    .7462608   .9570364    6.92732
     yposhml |      5574    4.000607    .5482433    2.50515    6.92955
  invmillhml |      5574     .396082    .2165116   .0019309   1.476998
  probselhml |      5574    .7674107    .1404707   .1737047   .9994534
     yallhml |      5574    3.124032    .9125439   .4932862   6.925763
.
. * Create predictions of level of expenditures not logs
. * E[y] = exp(ypos + (s^2)/2) for y > 0 Var[v]=s^2
. * and E[y] = Pr[y>0]*exp(ypos + (s^2)/2) for all y
. gen pMEDposhml = exp(yposhml + (shml^2)/2)
. gen pMEDallhml = probselhml*pMEDposhml
.
. * Compare predictions to actual for MED > 0
. sum LNMED yposhml MED pMEDposhml if MED > 0
    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
       LNMED |      4281    4.069462    1.499372  -.5343859   10.57597
     yposhml |      4281    4.071295    .5573439    2.50515    6.92955
         MED |      4281     220.987    909.9021   .5860291   39182.02
  pMEDposhml |      4281    240.4096    185.0424   42.00053    3505.48
. corr LNMED yposhml MED pMEDpos2part if MED > 0
(obs=4281)
             |    LNMED  yposhml      MED pMEDpo~t
-------------+------------------------------------
       LNMED |   1.0000
     yposhml |   0.3690   1.0000
         MED |   0.4560   0.1592   1.0000
pMEDpos2part |   0.3387   0.9343   0.1669   1.0000

.
. * Compare predictions to actual including zeroes
. sum MED pMEDallhml DMED probselhml
    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
         MED |      5574    169.7247    802.8303          0   39182.02
  pMEDallhml |      5574    184.5571    174.1649   8.814864   3503.564
        DMED |      5574    .7680301    .4221277          0          1
  probselhml |      5574    .7674107    .1404707   .1737047   .9994534
. corr MED pMEDallhml DMED probselhml
(obs=5574)
             |      MED pMEDal~l     DMED probse~l
-------------+------------------------------------
         MED |   1.0000
  pMEDallhml |   0.1734   1.0000
        DMED |   0.1162   0.2015   1.0000
  probselhml |   0.1074   0.6092   0.3468   1.0000

.
. * (2B) Heckman 2 step for sample selection model
. * Same as MLE except add option twostep in heckman command
. heckman LNMED $XLIST, select (DMED = $XLIST) twostep
Heckman selection model -- two-step estimates   Number of obs      =      5574
(regression model with sample selection)        Censored obs       =      1293
                                                Uncensored obs     =      4281

                                                Wald chi2(34)      =    944.44
                                                Prob > chi2        =    0.0000

------------------------------------------------------------------------------
 | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
LNMED |
LC | -.0279209 .039754 -0.70 0.482 -.1058373 .0499955
IDP | -.0922898 .0680191 -1.36 0.175 -.2256048 .0410252
LPI | .0052225 .0111057 0.47 0.638 -.0165442 .0269893
FMDE | -.0295212 .0182427 -1.62 0.106 -.0652762 .0062339
PHYSLIM | .2814948 .0804535 3.50 0.000 .1238088 .4391808
NDISEASE | .021617 .0050395 4.29 0.000 .0117398 .0314943
HLTHG | .1474026 .0490497 3.01 0.003 .051267 .2435381
HLTHF | .3821683 .0961284 3.98 0.000 .19376 .5705765
HLTHP | .833294 .1974488 4.22 0.000 .4463015 1.220287
LINC | .0990973 .0251548 3.94 0.000 .0497948 .1483998
LFAM | -.1441358 .0468074 -3.08 0.002 -.2358766 -.052395
EDUCDEC | .0033639 .0109501 0.31 0.759 -.0180979 .0248257
AGE | .0055556 .0022549 2.46 0.014 .0011361 .0099751
FEMALE | .3846323 .1032799 3.72 0.000 .1822074 .5870573
CHILD | -.2565136 .0936771 -2.74 0.006 -.4401173 -.0729098
FEMCHILD | -.392146 .125089 -3.13 0.002 -.637316 -.146976
BLACK | -.2633649 .1577542 -1.67 0.095 -.5725574 .0458276
_cons | 2.882514 .4698969 6.13 0.000 1.961533 3.803495
-------------+----------------------------------------------------------------
DMED |
LC | -.118708 .0269005 -4.41 0.000 -.1714319 -.065984
IDP | -.1279483 .0522351 -2.45 0.014 -.2303272 -.0255693
LPI | .0283091 .0088793 3.19 0.001 .010906 .0457121
FMDE | .0075319 .0161584 0.47 0.641 -.024138 .0392018
PHYSLIM | .2732013 .0743761 3.67 0.000 .1274268 .4189758
NDISEASE | .0224861 .0035958 6.25 0.000 .0154384 .0295338
HLTHG | .0387516 .0438545 0.88 0.377 -.0472016 .1247049
HLTHF | .1920062 .0836688 2.29 0.022 .0280185 .355994
HLTHP | .6397294 .2126322 3.01 0.003 .222978 1.056481
LINC | .0518413 .0168128 3.08 0.002 .0188889 .0847938
LFAM | -.0335599 .041728 -0.80 0.421 -.1153452 .0482253
EDUCDEC | .036307 .0076536 4.74 0.000 .0213062 .0513078
AGE | .0002631 .0021606 0.12 0.903 -.0039715 .0044978
FEMALE | .4451035 .054292 8.20 0.000 .3386932 .5515138
CHILD | .111489 .0808338 1.38 0.168 -.0469424 .2699203
FEMCHILD | -.4512845 .0799219 -5.65 0.000 -.6079284 -.2946405
BLACK | -.6057367 .0523148 -11.58 0.000 -.7082718 -.5032017
_cons | -.271605 .1877345 -1.45 0.148 -.6395579 .0963478
-------------+----------------------------------------------------------------
mills |
lambda | .2358048 .5018117 0.47 0.638 -.7477282 1.219338
-------------+----------------------------------------------------------------
rho | 0.16833
sigma | 1.4008246
lambda | .23580476 .5018117
------------------------------------------------------------------------------
. estimates store heck2step
. scalar sh2s = e(sigma)   /* s where Var[v]=s^2 */
.
. * Save the Stata predictions:
. * Distinguish between ystar=E[y*], ypos=E[y|I>0] and yall=E[y]
. predict ystarh2s, xb        /* E[y*] = x'b */
. predict yposh2s, ycond      /* E[y|I>0] = E[y*|I>0] = x'b+c*lambda(z'a) */
. predict invmillh2s, mills   /* lambda(z'a) = phi(z'a)/PHI(z'a) */
. predict probselh2s, psel    /* PHI(z'a) */
. * The following not appropriate here as it sets y=0 if I<0
. * whereas here data is in logs and y=ln(MED)=-infinity if I<0
. predict yallh2s, yexpected  /* E[y] = PHI(z'a)*E[y|I>0] */
. sum ystarh2s yposh2s invmillh2s probselh2s yallh2s
    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
    ystarh2s |      5574    3.904371     .589474   2.005307   6.573941
     yposh2s |      5574    3.997637    .5516546   2.337985   6.574553
  invmillh2s |      5574    .3955256    .2253329    .002599   1.545223
  probselh2s |      5574    .7678377    .1457464   .1526731    .999246
     yallh2s |      5574    3.124344    .9213697   .4450346   6.569597
.
. * Create predictions of level of expenditures not logs
. * E[y] = exp(ypos + (s^2)/2) for y > 0 Var[v]=s^2
. * and E[y] = Pr[y>0]*exp(ypos + (s^2)/2) for all y
. gen pMEDposh2s = exp(yposh2s + (sh2s^2)/2)
. gen pMEDallh2s = probselh2s*pMEDposh2s
.
. * Compare predictions to actual for MED > 0
. sum LNMED yposh2s MED pMEDposh2s if MED > 0
    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
       LNMED |      4281    4.069462    1.499372  -.5343859   10.57597
     yposh2s |      4281    4.069462    .5543231   2.337985   6.574553
         MED |      4281     220.987    909.9021   .5860291   39182.02
  pMEDposh2s |      4281    184.9993    129.5432   27.63657   1911.624
. corr LNMED yposh2s MED pMEDpos2part if MED > 0
(obs=4281)
             |    LNMED  yposh2s      MED pMEDpo~t
-------------+------------------------------------
       LNMED |   1.0000
     yposh2s |   0.3697   1.0000
         MED |   0.4560   0.1584   1.0000
pMEDpos2part |   0.3387   0.9240   0.1669   1.0000

.
. * Compare predictions to actual including zeroes
. sum MED pMEDallh2s DMED probselh2s
    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
         MED |      5574    169.7247    802.8303          0   39182.02
  pMEDallh2s |      5574    142.1438    123.2964   5.272963   1910.182
        DMED |      5574    .7680301    .4221277          0          1
  probselh2s |      5574    .7678377    .1457464   .1526731    .999246
. corr MED pMEDallh2s DMED probselh2s
(obs=5574)
             |      MED pMEDa~2s     DMED probs~2s
-------------+------------------------------------
         MED |   1.0000
  pMEDallh2s |   0.1772   1.0000
        DMED |   0.1162   0.2132   1.0000
  probselh2s |   0.1031   0.6298   0.3467   1.0000

.
. * (2C) Check for possible collinearity problems in Heckman 2-Step
.
. * Check variation in inverse mills ratio and related measures
. gen zprimea = invnorm(probselh2s)
. gen zprimeasq = zprimea*zprimea
. sum invmillh2s probselh2s zprimea ystarh2s
    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
  invmillh2s |      5574    .3955256    .2253329    .002599   1.545223
  probselh2s |      5574    .7678377    .1457464   .1526731    .999246
     zprimea |      5574    .8217315    .5175712  -1.025036    3.17314
    ystarh2s |      5574    3.904371     .589474   2.005307   6.573941
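The inverse Mills ratio is a decreasing function of z'a, so the smallest index in the summary above pairs with the largest ratio and vice versa. A minimal Python sketch of lambda(z) = phi(z)/PHI(z) using only the standard library; the function names are illustrative and the two evaluation points are the reported minimum and maximum of zprimea:

```python
import math

def norm_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal cdf PHI(z), computed via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def inverse_mills(z):
    """lambda(z) = phi(z) / PHI(z)."""
    return norm_pdf(z) / norm_cdf(z)

print(inverse_mills(-1.025036))   # at the minimum of zprimea
print(inverse_mills(3.17314))     # at the maximum of zprimea
```

Evaluated at these two points, the function reproduces the reported maximum and minimum of invmillh2s up to rounding of the displayed values.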
. sum invmillh2s probselh2s zprimea ystarh2s, detail
                        Mills' ratio
-------------------------------------------------------------
      Percentiles      Smallest
 1%    .0443035        .002599
 5%    .1081773       .0065964
10%    .1479522       .0074306       Obs                5574
25%    .2404661       .0111331       Sum of Wgt.        5574

50%    .3522253                      Mean           .3955256
                        Largest      Std. Dev.      .2253329
75%    .5044507        1.42819
90%    .7088638        1.42819       Variance       .0507749
95%     .863094       1.466996       Skewness       1.105156
99%    1.080771       1.545223       Kurtosis       4.403004

                          Pr(DMED)
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .338421       .1526731
 5%    .4598847       .1769602
10%    .5570307       .1900167       Obs                5574
25%    .6946899       .1900167       Sum of Wgt.        5574

50%    .7984734                      Mean           .7678377
                        Largest      Std. Dev.      .1457464
75%    .8717066       .9962835
90%     .927941       .9976236       Variance        .021242
95%    .9502093       .9979156       Skewness      -1.048826
99%    .9823552        .999246       Kurtosis       3.903288

                           zprimea
-------------------------------------------------------------
      Percentiles      Smallest
 1%   -.4167765      -1.025036
 5%   -.1007243      -.9270119
10%    .1434453      -.8778346       Obs                5574
25%    .5091883      -.8778346       Sum of Wgt.        5574

50%    .8361809                      Mean           .8217315
                        Largest      Std. Dev.      .5175712
75%    1.134495       2.676793
90%    1.460626        2.82333       Variance       .2678799
95%    1.646887       2.865093       Skewness      -.0298741
99%    2.105021        3.17314       Kurtosis       3.462529

                      Linear prediction
-------------------------------------------------------------
      Percentiles      Smallest
 1%    2.770451       2.005307
 5%    3.096997       2.005307
10%    3.248734       2.066777       Obs                5574
25%    3.460358       2.093177       Sum of Wgt.        5574

50%    3.818303                      Mean           3.904371
                        Largest      Std. Dev.       .589474
75%    4.304362       6.054721
90%     4.68132       6.055911       Variance       .3474796
95%    4.946257       6.273092       Skewness       .5047628
99%    5.495563       6.573941       Kurtosis       3.235111

.
. * Check for Mills ratio linear in zprimea
. regress invmillh2s zprimea
      Source |       SS       df       MS              Number of obs =    5574
-------------+------------------------------           F(  1,  5572) =84783.34
       Model |  265.518552     1  265.518552           Prob > F      =  0.0000
    Residual |  17.4500012  5572   .00313173           R-squared     =  0.9383
-------------+------------------------------           Adj R-squared =  0.9383
       Total |  282.968553  5573  .050774906           Root MSE      =  .05596
------------------------------------------------------------------------------
  invmillh2s |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     zprimea |  -.4217284   .0014484  -291.18   0.000    -.4245677    -.418889
       _cons |   .7420731   .0014065   527.59   0.000     .7393158    .7448305
------------------------------------------------------------------------------
. regress invmillh2s zprimea zprimeasq
      Source |       SS       df       MS              Number of obs =    5574
-------------+------------------------------           F(  2,  5571) =       .
       Model |  282.919807     2  141.459904           Prob > F      =  0.0000
    Residual |   .04874607  5571  8.7500e-06           R-squared     =  0.9998
-------------+------------------------------           Adj R-squared =  0.9998
       Total |  282.968553  5573  .050774906           Root MSE      =  .00296
------------------------------------------------------------------------------
  invmillh2s |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     zprimea |  -.6381933   .0001715 -3720.60   0.000    -.6385296   -.6378571
   zprimeasq |   .1329635   .0000943  1410.22   0.000     .1327787    .1331484
       _cons |   .7945547   .0000831  9556.73   0.000     .7943917    .7947177
------------------------------------------------------------------------------
. * twoway scatter yinvmill probitxb
.
. * Check R-squared from regress yinvmill on other regressors
. regress invmillh2s $XLIST
      Source |       SS       df       MS              Number of obs =    5574
-------------+------------------------------           F( 17,  5556) = 7477.36
       Model |  271.118403    17  15.9481414           Prob > F      =  0.0000
    Residual |    11.85015  5556  .002132856           R-squared     =  0.9581
-------------+------------------------------           Adj R-squared =  0.9580
       Total |  282.968553  5573  .050774906           Root MSE      =  .04618
------------------------------------------------------------------------------
invmillh2s | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
LC | .0529008 .000877 60.32 0.000 .0511815 .0546202
IDP | .0590603 .0017037 34.67 0.000 .0557204 .0624003
LPI | -.0113774 .0002792 -40.75 0.000 -.0119247 -.01083
FMDE | -.0054681 .0005178 -10.56 0.000 -.0064831 -.004453
PHYSLIM | -.0864947 .0021028 -41.13 0.000 -.090617 -.0823724
NDISEASE | -.0077731 .0001032 -75.31 0.000 -.0079754 -.0075707
HLTHG | -.0155696 .0013947 -11.16 0.000 -.0183037 -.0128355
HLTHF | -.0844067 .0025693 -32.85 0.000 -.0894435 -.0793698
HLTHP | -.2164141 .0052914 -40.90 0.000 -.2267872 -.206041
LINC | -.0293205 .0005678 -51.64 0.000 -.0304337 -.0282074
LFAM | .0170455 .0013216 12.90 0.000 .0144545 .0196364
EDUCDEC | -.0152414 .0002405 -63.38 0.000 -.0157128 -.01477
AGE | .0001145 .0000665 1.72 0.085 -.0000158 .0002448
FEMALE | -.1792718 .0016754 -107.00 0.000 -.1825563 -.1759873
CHILD | -.0474152 .0025807 -18.37 0.000 -.0524744 -.042356
FEMCHILD | .1803783 .002565 70.32 0.000 .1753498 .1854067
BLACK | .3020816 .0017915 168.62 0.000 .2985695 .3055937
_cons | .875215 .0061051 143.36 0.000 .8632467 .8871833
-----------------------------------------------------------------------------.
. * Find the condition number with inverse mills ratio included
. matrix accum XX = invmillh2s $XLIST
(obs=5574)
. matrix XXScaled = corr(XX)
. matrix symeigen XXSeigvec XXSeigval = XXScaled
. scalar rowsXX = rowsof(XX)
. scalar condnum1 = sqrt(XXSeigval[1,1]/XXSeigval[1,rowsXX])
. scalar condnum2 = sqrt(XXSeigval[1,1]/XXSeigval[1,(rowsXX-1)])
.
. * Find the condition number without inverse mills ratio
. matrix accum ZZ = $XLIST
(obs=5574)
. matrix ZZScaled = corr(ZZ)
. matrix symeigen ZZSeigvec ZZSeigval = ZZScaled
. scalar rowsZZ = rowsof(ZZ)
. scalar condnumnoinvmills1 = sqrt(ZZSeigval[1,1]/ZZSeigval[1,rowsZZ])


. scalar condnumnoinvmills2 = sqrt(ZZSeigval[1,1]/ZZSeigval[1,(rowsZZ-1)])
.
. * Condition numbers between 30 and 100 indicate a strong near dependency
. scalar list condnum1 condnum2
condnum1 = 82.333696
condnum2 = 24.558474
. scalar list condnumnoinvmills1 condnumnoinvmills2
condnumnoinvmills1 = 36.660119
condnumnoinvmills2 = 20.990872
.
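The manual two-step in part (2D) builds the inverse Mills ratio, lambda(z) = phi(z)/Phi(z), from the probit index. As a small illustration, the sketch below evaluates the ratio at three index values. It is not part of the original do-file, and it assumes a Stata release in which the normal density and cdf are named normalden() and normal(); the log itself runs under version 8 and uses the older normden().

```stata
* Illustrative sketch, not part of the original program.
* Inverse Mills ratio lambda(z) = phi(z)/Phi(z) at three index values z.
* Assumes a Stata release that provides normalden() and normal().
foreach z of numlist -1 0 1 {
    display "z = " `z' "   lambda(z) = " %8.6f (normalden(`z')/normal(`z'))
}
```

The ratio falls as the index rises, which agrees in sign with the negative coefficient on zprimea in the auxiliary regressions above, on the assumption that zprimea is the probit index.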
. * (2D) Do Heckman 2 step manually (this is unnecessary)
. quietly probit DMED $XLIST           /* global XLIST defined earlier */
. predict pselmanual, p                /* Pr[y>0] = PHI(x'b) */
. predict xbmanual, xb                 /* x'b */
. gen invmillsmanual = normden(xbmanual)/pselmanual
. regress LNMED $XLIST invmillsmanual if MED > 0
      Source |       SS       df       MS              Number of obs =    4281
-------------+------------------------------           F( 18,  4262) =   37.49
       Model |  1315.13292    18    73.06294           Prob > F      =  0.0000
    Residual |  8306.80418  4262  1.94903899           R-squared     =  0.1367
-------------+------------------------------           Adj R-squared =  0.1330
       Total |   9621.9371  4280  2.24811614           Root MSE      =  1.3961

------------------------------------------------------------------------------
       LNMED |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          LC |  -.0279209   .0397381    -0.70   0.482    -.1058282    .0499864
         IDP |  -.0922898    .067979    -1.36   0.175     -.225564    .0409844
         LPI |   .0052225   .0110962     0.47   0.638    -.0165318    .0269769
        FMDE |  -.0295212     .01822    -1.62   0.105     -.065242    .0061996
     PHYSLIM |   .2814948   .0803424     3.50   0.000     .1239819    .4390076
    NDISEASE |   .0216171   .0050367     4.29   0.000     .0117426    .0314915
       HLTHG |   .1474026   .0489869     3.01   0.003     .0513627    .2434424
       HLTHF |   .3821683   .0960103     3.98   0.000     .1939381    .5703985
       HLTHP |    .833294   .1971219     4.23   0.000     .4468325    1.219756
        LINC |   .0990973   .0251514     3.94   0.000     .0497875    .1484071
        LFAM |  -.1441358   .0467495    -3.08   0.002    -.2357891   -.0524825
     EDUCDEC |   .0033639   .0109441     0.31   0.759    -.0180922    .0248201
         AGE |   .0055556   .0022512     2.47   0.014      .001142    .0099692
      FEMALE |   .3846324    .103291     3.72   0.000     .1821281    .5871366
       CHILD |  -.2565135   .0935766    -2.74   0.006    -.4399725   -.0730546
    FEMCHILD |   -.392146   .1250644    -3.14   0.002    -.6373374   -.1469547
       BLACK |  -.2633649   .1578399    -1.67   0.095    -.5728134    .0460835
invmillsma~l |    .235805   .5023784     0.47   0.639    -.7491182    1.220728
       _cons |   2.882514    .470116     6.13   0.000     1.960841    3.804186
------------------------------------------------------------------------------
. predict yposmanual, xb
. * Predictions here should equal those from heckman two-step earlier
. sum yposh2s yposmanual invmillh2s invmillsmanual probselh2s pselmanual
    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
     yposh2s |    5574    3.997637   .5516546   2.337985   6.574553
  yposmanual |    5574    3.997637   .5516546   2.337985   6.574553
  invmillh2s |    5574    .3955256   .2253329    .002599   1.545223
invmillsma~l |    5574    .3955256   .2253329    .002599   1.545223
  probselh2s |    5574    .7678377   .1457464   .1526731    .999246
-------------+-----------------------------------------------------
  pselmanual |    5574    .7678377   .1457464   .1526731    .999246
. * And put in squared invmills ratio
. gen invmillssq = invmillsmanual*invmillsmanual
. regress LNMED $XLIST invmillsmanual invmillssq if MED > 0
      Source |       SS       df       MS              Number of obs =    4281
-------------+------------------------------           F( 19,  4261) =   35.64
       Model |  1319.30272    19  69.4369854           Prob > F      =  0.0000
    Residual |  8302.63438  4261  1.94851781           R-squared     =  0.1371
-------------+------------------------------           Adj R-squared =  0.1333
       Total |   9621.9371  4280  2.24811614           Root MSE      =  1.3959

------------------------------------------------------------------------------
       LNMED |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          LC |  -.0793176   .0530386    -1.50   0.135    -.1833009    .0246658
         IDP |  -.1419148    .075965    -1.87   0.062    -.2908457    .0070161
         LPI |   .0174224   .0138796     1.26   0.209    -.0097888    .0446337
        FMDE |  -.0258495   .0183897    -1.41   0.160    -.0619029    .0102039
     PHYSLIM |   .3867535   .1078448     3.59   0.000     .1753217    .5981854
    NDISEASE |   .0305019   .0078898     3.87   0.000     .0150337    .0459701
       HLTHG |   .1652111   .0504705     3.27   0.001     .0662626    .2641596
       HLTHF |   .4576241   .1089774     4.20   0.000     .2439716    .6712766
       HLTHP |   1.056745   .2493566     4.24   0.000     .5678762    1.545614
        LINC |   .1169339    .027948     4.18   0.000     .0621414    .1717264
        LFAM |  -.1550441   .0473343    -3.28   0.001    -.2478439   -.0622443
     EDUCDEC |    .018452   .0150373     1.23   0.220     -.011029     .047933
         AGE |   .0057227   .0022538     2.54   0.011      .001304    .0101414
      FEMALE |   .5748999   .1660813     3.46   0.001     .2492941    .9005056
       CHILD |  -.2096856   .0988886    -2.12   0.034    -.4035587   -.0158125
    FEMCHILD |  -.5873068   .1828525    -3.21   0.001    -.9457929   -.2288207
       BLACK |  -.5010232   .2264954    -2.21   0.027    -.9450721   -.0569744
invmillsma~l |   2.159812   1.407886     1.53   0.125    -.6003768    4.920001
  invmillssq |  -1.043357   .7132265    -1.46   0.144    -2.441653    .3549381
       _cons |   1.909849   .8142753     2.35   0.019     .3134454    3.506253
------------------------------------------------------------------------------
.
. ************ (3) DISPLAY RESULTS FOR TABLE 16.1 (page 554) ************
.
. * Note for brevity the coefficients for only some of the regressors are reported
.
. * First two columns of Table 16.1 (page 554)
. * Two part estimates: probit for first part and lognormal for second
. estimates table twoparta twopartb, t stats(N ll rank aic bic) b(%10.3f)
----------------------------------------
    Variable |  twoparta     twopartb
-------------+--------------------------
          LC |    -0.119       -0.016
             |     -4.41        -0.52
         IDP |    -0.128       -0.079
             |     -2.45        -1.28
         LPI |     0.028        0.003
             |      3.19         0.28
        FMDE |     0.008       -0.031
             |      0.47        -1.69
     PHYSLIM |     0.273        0.262
             |      3.67         3.81
    NDISEASE |     0.022        0.020
             |      6.25         5.78
       HLTHG |     0.039        0.144
             |      0.88         2.97
       HLTHF |     0.192        0.364
             |      2.29         4.13
       HLTHP |     0.640        0.787
             |      3.01         4.63
        LINC |     0.052        0.093
             |      3.08         4.28
        LFAM |    -0.034       -0.141
             |     -0.80        -3.05
     EDUCDEC |     0.036       -0.000
             |      4.74        -0.00
         AGE |     0.000        0.006
             |      0.12         2.47
      FEMALE |     0.445        0.344
             |      8.20         6.02
       CHILD |     0.111       -0.268
             |      1.38        -2.96
    FEMCHILD |    -0.451       -0.351
             |     -5.65        -3.92
       BLACK |    -0.606       -0.196
             |    -11.58        -2.90
       _cons |    -0.272        3.077
             |     -1.45        13.90
-------------+--------------------------
           N |  5574.000     4281.000
          ll | -2690.577    -7493.499
        rank |    18.000       18.000
         aic |  5417.154    15022.998
         bic |  5536.419    15137.513
----------------------------------------
                            legend: b/t
. di "lltwopart = " lltwopart
lltwopart = -10184.076
.
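The value of lltwopart printed above matches the sum of the log-likelihoods of the two parts in the table. The scalar itself is defined earlier in the program, outside this excerpt, so reading it as that sum is an assumption. The sketch below rechecks the arithmetic from the reported numbers, together with AIC = -2*ll + 2*k and BIC = -2*ll + k*ln(N) for the probit part (k = 18, N = 5574). The scalar names are hypothetical.

```stata
* Illustrative check using the numbers reported in the estimates table above.
scalar llprobit  = -2690.577     /* ll for twoparta (probit), N = 5574    */
scalar lllognorm = -7493.499     /* ll for twopartb (lognormal), N = 4281 */
display "ll two-part = " %12.3f (llprobit + lllognorm)
display "AIC probit  = " %12.3f (-2*llprobit + 2*18)
display "BIC probit  = " %12.3f (-2*llprobit + 18*ln(5574))
```

Up to rounding of the inputs, these should reproduce -10184.076 and the aic and bic entries in the twoparta column.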
. * Last four columns of Table 16.1 (page 554)
. * Sample selection estimates: 2step and MLE estimates
. set matsize 60
. estimates table heck2step heckmle, t stats(N ll rank aic bic) b(%10.3f)
----------------------------------------
    Variable | heck2step     heckmle
-------------+--------------------------
LNMED        |
          LC |    -0.028       -0.076
             |     -0.70        -2.25
         IDP |    -0.092       -0.150
             |     -1.36        -2.26
         LPI |     0.005        0.015
             |      0.47         1.42
        FMDE |    -0.030       -0.024
             |     -1.62        -1.21
     PHYSLIM |     0.281        0.355
             |      3.50         4.70
    NDISEASE |     0.022        0.029
             |      4.29         7.54
       HLTHG |     0.147        0.156
             |      3.01         2.99
       HLTHF |     0.382        0.445
             |      3.98         4.66
       HLTHP |     0.833        0.999
             |      4.22         5.32
        LINC |     0.099        0.121
             |      3.94         5.26
        LFAM |    -0.144       -0.158
             |     -3.08        -3.18
     EDUCDEC |     0.003        0.018
             |      0.31         1.95
         AGE |     0.006        0.006
             |      2.46         2.35
      FEMALE |     0.385        0.550
             |      3.72         8.69
       CHILD |    -0.257       -0.198
             |     -2.74        -2.03
    FEMCHILD |    -0.392       -0.565
             |     -3.13        -5.80
       BLACK |    -0.263       -0.536
             |     -1.67        -7.15
       _cons |     2.883        2.108
             |      6.13         8.63
-------------+--------------------------
DMED         |
          LC |    -0.119       -0.107
             |     -4.41        -4.03
         IDP |    -0.128       -0.109
             |     -2.45        -2.13
         LPI |     0.028        0.029
             |      3.19         3.42
        FMDE |     0.008        0.001
             |      0.47         0.05
     PHYSLIM |     0.273        0.285
             |      3.67         3.94
    NDISEASE |     0.022        0.021
             |      6.25         6.03
       HLTHG |     0.039        0.058
             |      0.88         1.35
       HLTHF |     0.192        0.224
             |      2.29         2.75
       HLTHP |     0.640        0.798
             |      3.01         3.90
        LINC |     0.052        0.055
             |      3.08         3.33
        LFAM |    -0.034       -0.031
             |     -0.80        -0.77
     EDUCDEC |     0.036        0.031
             |      4.74         4.20
         AGE |     0.000       -0.001
             |      0.12        -0.29
      FEMALE |     0.445        0.409
             |      8.20         7.69
       CHILD |     0.111        0.053
             |      1.38         0.67
    FEMCHILD |    -0.451       -0.395
             |     -5.65        -5.04
       BLACK |    -0.606       -0.583
             |    -11.58       -11.20
       _cons |    -0.272       -0.214
             |     -1.45        -1.16
-------------+--------------------------
mills        |
      lambda |     0.236
             |      0.47
-------------+--------------------------
athrho       |
       _cons |                  0.941
             |                  12.78
-------------+--------------------------
lnsigma      |
       _cons |                  0.451
             |                  25.45
-------------+--------------------------
Statistics   |
           N |  5574.000     5574.000
          ll |             -10170.110
        rank |    37.000       38.000
         aic |         .    20416.221
         bic |         .    20668.004
----------------------------------------
                            legend: b/t
.
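The heckman ML column reports the ancillary parameters as athrho (the inverse hyperbolic tangent of rho) and lnsigma (the log of sigma). A minimal sketch of the back-transformation, using the rounded estimates 0.941 and 0.451 from the table above, follows. The scalar names are hypothetical, and rho is written out with exp() so that the sketch does not rely on a tanh() function being available.

```stata
* Illustrative back-transformation of the heckman MLE ancillary parameters.
* Inputs are the rounded table values, so results are approximate.
scalar athrho  = 0.941
scalar lnsigma = 0.451
scalar rho     = (exp(2*athrho) - 1)/(exp(2*athrho) + 1)   /* rho = tanh(athrho) */
scalar sigma   = exp(lnsigma)                              /* sigma = exp(lnsigma) */
scalar lambda  = rho*sigma      /* implied coefficient on the inverse Mills ratio */
display "rho = " %6.4f rho "  sigma = " %6.4f sigma "  lambda = " %6.4f lambda
```

The implied lambda is roughly 1.15, well above the two-step estimate of 0.236 shown in the mills row.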
. ************ (4) A LITTLE FURTHER ANALYSIS **********
.
. * Predictions
. * Compare predictions to actual for MED > 0
. sum MED pMEDpos2part pMEDposhml pMEDposh2s if MED > 0
    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
         MED |    4281     220.987   909.9021   .5860291   39182.02
pMEDpos2part |    4281     183.462   126.0213   26.37827   1731.088
  pMEDposhml |    4281    240.4096   185.0424   42.00053    3505.48
  pMEDposh2s |    4281    184.9993   129.5432   27.63657   1911.624
. corr MED pMEDpos2part pMEDposhml pMEDposh2s if MED > 0
(obs=4281)
             |      MED pMEDpo~t pMEDpo~l pMEDp~2s
-------------+------------------------------------
         MED |   1.0000
pMEDpos2part |   0.1669   1.0000
  pMEDposhml |   0.1617   0.9830   1.0000
  pMEDposh2s |   0.1669   0.9994   0.9887   1.0000

.
. * Compare predictions to actual including zeroes
. sum MED pMEDall2part pMEDallhml pMEDallh2s DMED probsel2part probselhml probselh2s
    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
         MED |    5574    169.7247   802.8303          0   39182.02
pMEDall2part |    5574     140.966   120.2022   4.880651   1729.783
  pMEDallhml |    5574    184.5571   174.1649   8.814864   3503.564
  pMEDallh2s |    5574    142.1438   123.2964   5.272963   1910.182
        DMED |    5574    .7680301   .4221277          0          1
-------------+-----------------------------------------------------
probsel2part |    5574    .7678377   .1457464   .1526731    .999246
  probselhml |    5574    .7674107   .1404707   .1737047   .9994534
  probselh2s |    5574    .7678377   .1457464   .1526731    .999246
. corr MED pMEDall2part pMEDallhml pMEDallh2s DMED probsel2part probselhml probselh2s
(obs=5574)
             |      MED pMEDal~t pMEDal~l pMEDa~2s     DMED probse~t probse~l probs~2s
-------------+------------------------------------------------------------------------
         MED |   1.0000
pMEDall2part |   0.1772   1.0000
  pMEDallhml |   0.1734   0.9861   1.0000
  pMEDallh2s |   0.1772   0.9995   0.9909   1.0000
        DMED |   0.1162   0.2158   0.2015   0.2132   1.0000
probsel2part |   0.1031   0.6380   0.5939   0.6298   0.3467   1.0000
  probselhml |   0.1074   0.6552   0.6092   0.6468   0.3468   0.9980   1.0000
  probselh2s |   0.1031   0.6380   0.5939   0.6298   0.3467   1.0000   0.9980   1.0000

.
. ********** CLOSE OUTPUT
. log close
log: c:\Imbook\bwebpage\Section4\mma16p3selection.txt
log type: text
closed on: 19 May 2005, 13:04:40

------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section4\mma17p1km.txt
log type: text
opened on: 19 May 2005, 13:19:55
.
. ********** OVERVIEW OF MMA17P1KM.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 17.2 (pages 574-5) and 17.5.1 (pages 581-3)
. * Nonparametric Duration Analysis
. * It provides
. * (1) Kaplan-Meier Survival Estimate Graph (Figure 17.1: kennanstrk.wmf)
. * (2) Nelson-Aalen Cumulative Hazard Estimate Graph
. * (3) Kaplan-Meier Survivor Function Estimates (Table 17.3)
. * (4) Shows that Cox regression on intercept gives same results
.
. * To run this program you need data file
. * strkdur.dta
.
. ********** SETUP **********
.
. set more off
. version 8
. set scheme s1mono /* Used for graphs */
.
. ********** DATA DESCRIPTION
.
. * The data are the same as those given in Table 1 of
. * J. Kennan, "The Duration of Contract Strikes in U.S. Manufacturing",
. * Journal of Econometrics, 1985, Vol. 28, pp. 5-28.
.
. * There are 566 observations from 1968-1976 with two variables
. * 1. dur is duration of the strike in days
. * 2. gdp is a measure of stage of business cycle
. *    (deviation of monthly log industrial production in manufacturing
. *    from prediction from OLS on time, time-squared and monthly dummies)
.
. * All observations are complete for these data. There is no censoring !!
. * For an example with censoring see mma17p2kmextra.do or mma17p4duration.do
.
. ********** READ DATA **********
.
. use strkdur.dta
. sum
    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
         dur |     566    43.62367   44.66641          1        235
         gdp |     566    .0060411   .0499072    -.13996     .08554
.
. * Create ASCII data set so that can use programs other than Stata
. outfile dur gdp using strkdur.asc, replace
.
. ********* ANALYSIS: NONPARAMETRIC SURVIVAL CURVE AND HAZARD FUNCTION **********
.
. * Stata st curves require defining the dependent variable
. stset dur
     failure event:  (assumed to fail at time=dur)
obs. time interval:  (0, dur]
 exit on or before:  failure
------------------------------------------------------------------------------
      566  total obs.
        0  exclusions
------------------------------------------------------------------------------
      566  obs. remaining, representing
      566  failures in single record/single failure data
    24691  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =       235
.
. * The data here are complete. If dur is instead right-censored,
. * then also need to define a censoring indicator. For example
. * stset dur, fail(censor=1)
. * where the variable censor=1 if data are right-censored and =0 otherwise
. * See mma17p4duration.do
.
. * (1) GRAPH KAPLAN-MEIER SURVIVAL CURVE
.
. * Minimal command that gives 95% confidence bands
. sts graph, gwood
failure _d: 1 (meaning all fail)
analysis time _t: dur
.
. * Longer command for Figure 17.1 (page 575)
. * Nicer graphs and also confidence bands are bolder and easier to read
. sts gen surv = s
. sts gen lbsurv = lb(s)
. sts gen ubsurv = ub(s)
. sort dur
. graph twoway (line ubsurv dur, msize(vtiny) mstyle(p2) c(J) clstyle(p1) clcolor(gs10)) /*
> */ (line surv dur, msize(vtiny) mstyle(p1) c(J) clstyle(p1)) /*
> */ (line lbsurv dur, msize(vtiny) mstyle(p2) c(J) clstyle(p1) clcolor(gs10)), /*
> */ scale(1.2) plotregion(style(none)) /*
> */ title("Kaplan-Meier Survival Function Estimate") /*
> */ xtitle("Strike duration in days", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Survival Probability", size(medlarge)) yscale(titlegap(*5)) /*
> */ ylabel(0.00(0.25)1.00,grid)/*
> */ legend(pos(3) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Upper 95% confidence band") label(2 "Survival Function") /*
> */ label(3 "Lower 95% confidence band") )
. graph export kennanstrk.wmf, replace
(file c:\Imbook\bwebpage\Section4\kennanstrk.wmf written in Windows Metafile format)
.
. * (2) GRAPH NELSON-AALEN CUMULATIVE HAZARD FUNCTION
.
. * Minimal command that gives 95% confidence bands
. sts graph, cna
failure _d: 1 (meaning all fail)
analysis time _t: dur
.
. * Longer command gives nicer figure
. sts graph, cna /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Nelson-Aalen Cumulative Hazard") /*
> */ xtitle("Strike duration in days", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Cumulative Hazard", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(12) ring(0) col(1)) legend(size(small)) /*
> */ legend(label(1 "95% confidence bands") label(2 "Cumulative Hazard"))
failure _d: 1 (meaning all fail)
analysis time _t: dur
.
. * (3) LIST SURVIVOR and NELSON-AALEN CUMULATIVE HAZARD ESTIMATES
.
. * Gives a lot of output
.
. * Table 17.3: Kaplan-Meier Survivor Function (page 583)
. sts list
failure _d: 1 (meaning all fail)
analysis time _t: dur
           Beg.          Net            Survivor      Std.
  Time    Total   Fail   Lost           Function     Error     [95% Conf. Int.]
-------------------------------------------------------------------------------
     1      566     10      0             0.9823    0.0055     0.9674    0.9905
     2      556     21      0             0.9452    0.0096     0.9230    0.9612
     3      535     16      0             0.9170    0.0116     0.8910    0.9369
     4      519     17      0             0.8869    0.0133     0.8578    0.9104
     5      502     18      0             0.8551    0.0148     0.8234    0.8816
     6      484      9      0             0.8392    0.0154     0.8063    0.8670
     7      475     12      0             0.8180    0.0162     0.7837    0.8474
     8      463     12      0             0.7968    0.0169     0.7613    0.8277
     9      451     13      0             0.7739    0.0176     0.7371    0.8061
    10      438      8      0             0.7597    0.0180     0.7223    0.7928
    11      430      9      0             0.7438    0.0183     0.7058    0.7777
    12      421     10      0             0.7261    0.0187     0.6874    0.7609
    13      411     11      0             0.7067    0.0191     0.6673    0.7424
    14      400     11      0             0.6873    0.0195     0.6473    0.7237
    15      389     12      0             0.6661    0.0198     0.6256    0.7033
    16      377      8      0             0.6519    0.0200     0.6111    0.6896
    17      369      6      0             0.6413    0.0202     0.6003    0.6793
    18      363      8      0             0.6272    0.0203     0.5860    0.6656
    19      355      7      0             0.6148    0.0205     0.5734    0.6535
    20      348      7      0             0.6025    0.0206     0.5609    0.6415
    21      341      5      0             0.5936    0.0206     0.5519    0.6328
    22      336     11      0             0.5742    0.0208     0.5324    0.6137
    23      325     10      0             0.5565    0.0209     0.5146    0.5964
    24      315      8      0             0.5424    0.0209     0.5004    0.5824
    25      307      4      0             0.5353    0.0210     0.4934    0.5754
    26      303      7      0             0.5230    0.0210     0.4810    0.5632
    27      296      6      0             0.5124    0.0210     0.4704    0.5527
    28      290      9      0             0.4965    0.0210     0.4546    0.5369
    29      281      5      0             0.4876    0.0210     0.4458    0.5281
    30      276      5      0             0.4788    0.0210     0.4371    0.5193
    31      271      8      0             0.4647    0.0210     0.4231    0.5051
    32      263      5      0             0.4558    0.0209     0.4144    0.4963
    33      258      6      0             0.4452    0.0209     0.4039    0.4857
    34      252      5      0             0.4364    0.0208     0.3952    0.4768
    35      247      4      0             0.4293    0.0208     0.3883    0.4697
    36      243      6      0             0.4187    0.0207     0.3779    0.4590
    37      237      6      0             0.4081    0.0207     0.3675    0.4483
    38      231      8      0             0.3940    0.0205     0.3537    0.4340
    39      223      3      0             0.3887    0.0205     0.3485    0.4287
    40      220      1      0             0.3869    0.0205     0.3468    0.4269
    41      219      4      0             0.3799    0.0204     0.3399    0.4197
    42      215      8      0             0.3657    0.0202     0.3261    0.4053
    43      207      4      0             0.3587    0.0202     0.3193    0.3981
    44      203      9      0             0.3428    0.0200     0.3039    0.3819
    45      194      3      0             0.3375    0.0199     0.2988    0.3765
    46      191      4      0             0.3304    0.0198     0.2919    0.3693
    47      187      5      0             0.3216    0.0196     0.2834    0.3602
    48      182      3      0             0.3163    0.0195     0.2783    0.3548
    49      179      5      0             0.3074    0.0194     0.2698    0.3457
    50      174      8      0             0.2933    0.0191     0.2563    0.3312
    51      166      1      0             0.2915    0.0191     0.2546    0.3293
    52      165      8      0             0.2774    0.0188     0.2411    0.3147
    53      157      6      0             0.2668    0.0186     0.2310    0.3037
    54      151      1      0             0.2650    0.0186     0.2294    0.3019
    55      150      2      0             0.2615    0.0185     0.2260    0.2982
    56      148      3      0             0.2562    0.0183     0.2210    0.2927
    57      145      3      0             0.2509    0.0182     0.2159    0.2872
    58      142      1      0             0.2491    0.0182     0.2143    0.2854
    59      141      4      0             0.2420    0.0180     0.2076    0.2780
    60      137      6      0             0.2314    0.0177     0.1976    0.2669
    61      131      5      0             0.2226    0.0175     0.1893    0.2577
    62      126      2      0             0.2191    0.0174     0.1860    0.2540
    63      124      2      0             0.2155    0.0173     0.1827    0.2503
    64      122      5      0             0.2067    0.0170     0.1744    0.2410
    65      117      3      0             0.2014    0.0169     0.1695    0.2354
    67      114      1      0             0.1996    0.0168     0.1678    0.2335
    68      113      1      0             0.1979    0.0167     0.1662    0.2317
    70      112      4      0             0.1908    0.0165     0.1596    0.2242
    71      108      1      0             0.1890    0.0165     0.1580    0.2223
    72      107      1      0             0.1873    0.0164     0.1563    0.2205
    74      106      1      0             0.1855    0.0163     0.1547    0.2186
    75      105      1      0             0.1837    0.0163     0.1530    0.2167
    77      104      3      0             0.1784    0.0161     0.1481    0.2111
    82      101      2      0             0.1749    0.0160     0.1449    0.2073
    83       99      1      0             0.1731    0.0159     0.1432    0.2055
    84       98      3      0             0.1678    0.0157     0.1384    0.1998
    85       95      2      0             0.1643    0.0156     0.1351    0.1960
    86       93      1      0             0.1625    0.0155     0.1335    0.1942
    87       92      1      0             0.1608    0.0154     0.1319    0.1923
    88       91      1      0             0.1590    0.0154     0.1302    0.1904
    90       90      1      0             0.1572    0.0153     0.1286    0.1885
    91       89      2      0             0.1537    0.0152     0.1254    0.1847
    92       87      1      0             0.1519    0.0151     0.1238    0.1828
    94       86      1      0             0.1502    0.0150     0.1222    0.1809
    98       85      3      0             0.1449    0.0148     0.1173    0.1752
    99       82      3      0             0.1396    0.0146     0.1125    0.1695
   100       79      2      0             0.1360    0.0144     0.1093    0.1657
   101       77      3      0             0.1307    0.0142     0.1045    0.1600
   102       74      2      0             0.1272    0.0140     0.1013    0.1561
   103       72      1      0             0.1254    0.0139     0.0997    0.1542
   104       71      3      0             0.1201    0.0137     0.0950    0.1485
   105       68      1      0             0.1184    0.0136     0.0934    0.1465
   106       67      2      0             0.1148    0.0134     0.0902    0.1427
   107       65      2      0             0.1113    0.0132     0.0871    0.1388
   108       63      2      0             0.1078    0.0130     0.0839    0.1349
   109       61      2      0             0.1042    0.0128     0.0808    0.1311
   111       59      1      0             0.1025    0.0127     0.0792    0.1291
   112       58      1      0             0.1007    0.0126     0.0777    0.1272
   114       57      1      0             0.0989    0.0126     0.0761    0.1252
   115       56      1      0             0.0972    0.0124     0.0745    0.1233
   116       55      1      0             0.0954    0.0123     0.0730    0.1213
   117       54      2      0             0.0919    0.0121     0.0699    0.1174
   118       52      1      0             0.0901    0.0120     0.0683    0.1155
   119       51      1      0             0.0883    0.0119     0.0668    0.1135
   122       50      3      0             0.0830    0.0116     0.0622    0.1076
   123       47      1      0             0.0813    0.0115     0.0606    0.1056
   124       46      1      0             0.0795    0.0114     0.0591    0.1037
   125       45      2      0             0.0760    0.0111     0.0561    0.0997
   126       43      1      0             0.0742    0.0110     0.0545    0.0977
   127       42      2      0             0.0707    0.0108     0.0515    0.0937
   130       40      2      0             0.0671    0.0105     0.0485    0.0897
   131       38      1      0             0.0654    0.0104     0.0470    0.0877
   133       37      1      0             0.0636    0.0103     0.0455    0.0857
   135       36      1      0             0.0618    0.0101     0.0440    0.0837
   136       35      2      0             0.0583    0.0098     0.0410    0.0797
   139       33      2      0             0.0548    0.0096     0.0381    0.0756
   140       31      1      0             0.0530    0.0094     0.0366    0.0736
   141       30      3      0             0.0477    0.0090     0.0323    0.0675
   142       27      1      0             0.0459    0.0088     0.0308    0.0654
   143       26      1      0             0.0442    0.0086     0.0294    0.0633
   146       25      2      0             0.0406    0.0083     0.0265    0.0592
   147       23      1      0             0.0389    0.0081     0.0251    0.0571
   148       22      2      0             0.0353    0.0078     0.0223    0.0529
   151       20      1      0             0.0336    0.0076     0.0209    0.0508
   152       19      1      0             0.0318    0.0074     0.0196    0.0487
   153       18      2      0             0.0283    0.0070     0.0169    0.0444
   154       16      1      0             0.0265    0.0068     0.0155    0.0423
   160       15      1      0             0.0247    0.0065     0.0142    0.0401
   163       14      2      0             0.0212    0.0061     0.0116    0.0357
   165       12      1      0             0.0194    0.0058     0.0103    0.0335
   168       11      1      0             0.0177    0.0055     0.0091    0.0312
   174       10      1      0             0.0159    0.0053     0.0079    0.0290
   175        9      1      0             0.0141    0.0050     0.0067    0.0267
   179        8      1      0             0.0124    0.0046     0.0055    0.0244
   191        7      1      0             0.0106    0.0043     0.0044    0.0220
   192        6      1      0             0.0088    0.0039     0.0034    0.0196
   205        5      1      0             0.0071    0.0035     0.0024    0.0171
   208        4      1      0             0.0053    0.0031     0.0015    0.0146
   216        3      1      0             0.0035    0.0025     0.0007    0.0121
   226        2      1      0             0.0018    0.0018     0.0002    0.0095
   235        1      1      0             0.0000         .          .         .
-------------------------------------------------------------------------------
.
. * And Nelson-Aalen Integrated Hazard


. * sts list, na
.
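With no censoring, the Kaplan-Meier estimate in the listing above is the running product of (1 - d_j/r_j), where d_j is the number of failures and r_j the number at risk at time t_j. The sketch below recomputes the first two rows by hand, together with the Greenwood standard error. It is an illustration under the assumption that the Std. Error column is the Greenwood estimate; the scalar names are hypothetical.

```stata
* Illustrative check of the first two rows of the sts list output above.
* K-M estimate: product over failure times of (1 - d_j/r_j).
scalar s1 = 1 - 10/566
scalar s2 = s1*(1 - 21/556)
* Greenwood standard error: S(t)*sqrt( sum of d_j/(r_j*(r_j - d_j)) ).
scalar se1 = s1*sqrt(10/(566*556))
scalar se2 = s2*sqrt(10/(566*556) + 21/(556*535))
display "S(1) = " %6.4f s1 "  se = " %6.4f se1
display "S(2) = " %6.4f s2 "  se = " %6.4f se2
```

These should agree with the listed values 0.9823 (0.0055) at time 1 and 0.9452 (0.0096) at time 2.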
. * (4) STCOX REGRESS ON INTERCEPT GIVES SAME RESULTS AS ABOVE
.
. * Cox Regression on an intercept
. gen one = 1
. stcox one, basesurv(coxbasesurv) basechazard(coxbasecumhaz) basehc(coxbasehaz)
failure _d: 1 (meaning all fail)
analysis time _t: dur
note: one dropped due to collinearity
Iteration 0: log likelihood = -3032.134
Refining estimates:
Iteration 0: log likelihood = -3032.134
Cox regression -- Breslow method for ties

No. of subjects =          566                     Number of obs   =       566
No. of failures =          566
Time at risk    =        24691
                                                   LR chi2(0)      =      0.00
Log likelihood  =    -3032.134                     Prob > chi2     =

------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
------------------------------------------------------------------------------
.
. * Instead use sts which analyzes dependent in isolation
. * sts gen surv = s
. sts gen cumhaz = na
. sts gen haz = h
.
. * Compare to verify that same answers
. sum surv coxbasesurv cumhaz coxbasecumhaz haz coxbasehaz
    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
        surv |     566     .493014   .2848417          0   .9823322
 coxbasesurv |     566     .493014   .2848417          0   .9823322
      cumhaz |     566           1   .9834583   .0176678   6.871446
coxbasecum~z |     566           1   .9834583   .0176678   6.871446
         haz |     566    .0345186   .0515235   .0045455          1
-------------+-----------------------------------------------------
  coxbasehaz |     566    .0345186   .0515235   .0045455          1
. corr surv coxbasesurv
(obs=566)
             |     surv coxbas~v
-------------+------------------
        surv |   1.0000
 coxbasesurv |   1.0000   1.0000

. corr cumhaz coxbasecumhaz
(obs=566)
             |   cumhaz cox~mhaz
-------------+------------------
      cumhaz |   1.0000
coxbasecum~z |   1.0000   1.0000

. corr haz coxbasehaz
(obs=566)
             |      haz cox~ehaz
-------------+------------------
         haz |   1.0000
  coxbasehaz |   1.0000   1.0000

.
. * (5) ESTIMATE HAZARD FUNCTION
.
. * sts graph does not give the true hazard function - it instead gives the
. * difference in the cumulative hazard (without division by time difference).
.
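One way to turn the cumulative-hazard increments described above into a hazard rate is to divide each increment by the elapsed time since the previous failure time. The sketch below is an illustration only, not the authors' method. It assumes the data are stset and that cumhaz was created by sts gen cumhaz = na as in part (4); the names dH, dt and hazrate are hypothetical.

```stata
* Illustrative sketch: hazard rate as the change in the Nelson-Aalen
* cumulative hazard divided by the change in time.
* Assumes stset data and the variable cumhaz from: sts gen cumhaz = na
preserve
keep _t cumhaz
duplicates drop                  /* keep one row per distinct failure time */
sort _t
gen double dH = cumhaz - cond(_n == 1, 0, cumhaz[_n-1])
gen double dt = _t - cond(_n == 1, 0, _t[_n-1])
gen double hazrate = dH/dt       /* hypothetical name for the hazard rate */
list _t dH dt hazrate in 1/5
restore
```

The increment dH should coincide with the variable haz created by sts gen haz = h, so hazrate differs from haz only where failure times are more than one day apart.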
. ********** CLOSE OUTPUT
. log close
log: c:\Imbook\bwebpage\Section4\mma17p1km.txt
log type: text
closed on: 19 May 2005, 13:20:01
------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section4\mma17p2kmextra.txt
log type: text
opened on: 19 May 2005, 13:24:01
.
. ********** OVERVIEW OF MMA17P2KMEXTRA.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press


.
. * Chapter 17.5.1 pages 581-2
. * Nonparametric Survival Analysis
. * Provides
. * (1) K-M Survivor Function and N-A Cum Hazard Estimates (Table 17.2)
. * using artificial data
.
. ********** SETUP **********
.
. set more off
. version 8.0
. set scheme s1mono /* Used for graphs */
.
. ********** GENERATE DATA **********
.
. * The time does not matter except for the hazard.
. * Here let failures occur at t = 10, 20, 30, 40 and 50,
. * with censoring at t = 15, 25, 35 and 45
. * 1. At t = 10 (time t1):  6 failures
. * 2. At t = 15:            4 censored (lost) between t1 and t2
. * 3. At t = 20 (time t2):  5 failures
. * 4. At t = 25:            3 censored (lost) between t2 and t3
. * 5. At t = 30 (time t3):  2 failures
. * 6. At t = 35:            1 censored (lost) between t3 and t4
. * 7. At t = 40 (time t4):  1 failure
. * 8. At t = 45:           32 censored (lost) between t4 and t5
. * 9. At t = 50 (time t5): 26 failures
.
. * Indicator failed = 1 if fail and 0 if censored
. input duration failed

      duration     failed
  1. 10 1
  2. 10 1
  3. 10 1
  4. 10 1
  5. 10 1
  6. 10 1
  7. 15 0
  8. 15 0
  9. 15 0
 10. 15 0
 11. 20 1
 12. 20 1
 13. 20 1
 14. 20 1
 15. 20 1
 16. 25 0
 17. 25 0
 18. 25 0
 19. 30 1
 20. 30 1
 21. 35 0
 22. 40 1
 23. 45 0
 24. 45 0
 25. 45 0
 26. 45 0
 27. 45 0
 28. 45 0
 29. 45 0
 30. 45 0
 31. 45 0
 32. 45 0
 33. 45 0
 34. 45 0
 35. 45 0
 36. 45 0
 37. 45 0
 38. 45 0
 39. 45 0
 40. 45 0
 41. 45 0
 42. 45 0
 43. 45 0
 44. 45 0
 45. 45 0
 46. 45 0
 47. 45 0
 48. 45 0
 49. 45 0
 50. 45 0
 51. 45 0
 52. 45 0
 53. 45 0
 54. 45 0
 55. 50 1
 56. 50 1
 57. 50 1
 58. 50 1
 59. 50 1
 60. 50 1
 61. 50 1
 62. 50 1
 63. 50 1
 64. 50 1
 65. 50 1
 66. 50 1
 67. 50 1
 68. 50 1
 69. 50 1
 70. 50 1
 71. 50 1
 72. 50 1
 73. 50 1
 74. 50 1
 75. 50 1
 76. 50 1
 77. 50 1
 78. 50 1
 79. 50 1
 80. 50 1
 81. end

.
. sum
    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
    duration |      80      39.625   13.40166         10         50
      failed |      80          .5   .5031546          0          1
.
. ***** COMPUTATION USING STATA **********
.
. * Stata st curves require defining the dependent variable
. stset duration, fail(failed=1)
     failure event:  failed == 1
obs. time interval:  (0, duration]
 exit on or before:  failure
------------------------------------------------------------------------------
       80  total obs.
        0  exclusions
------------------------------------------------------------------------------
       80  obs. remaining, representing
       40  failures in single record/single failure data
     3170  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =        50
. stsum
failure _d: failed == 1
analysis time _t: duration
         |               incidence       no. of    |------ Survival time -----|
         | time at risk     rate        subjects        25%       50%       75%
---------+---------------------------------------------------------------------
   total |         3170   .0126183            80         50        50        50

. stdes
failure _d: failed == 1
analysis time _t: duration
                                   |-------------- per subject --------------|
Category                   total        mean         min     median        max
------------------------------------------------------------------------------
no. of subjects               80
no. of records                80           1           1          1          1

(first) entry time                         0           0          0
(final) exit time                     39.625          10         45         50

subjects with gap              0
time on gap if gap             0
time at risk                3170      39.625          10         45         50

failures                      40          .5           0         .5          1
------------------------------------------------------------------------------
.
. * K-M survival graph
. * sts graph, gwood
.
. * N-A Cumulative Hazard
. * sts graph, cna
.
. * Kaplan-Meier Survivor Function listed (last column Table 17.2)
. sts list
failure _d: failed == 1
analysis time _t: duration
           Beg.          Net            Survivor      Std.
  Time    Total   Fail   Lost           Function     Error     [95% Conf. Int.]
-------------------------------------------------------------------------------
    10       80      6      0             0.9250    0.0294     0.8407    0.9656
    15       74      0      4             0.9250    0.0294     0.8407    0.9656
    20       70      5      0             0.8589    0.0395     0.7596    0.9193
    25       65      0      3             0.8589    0.0395     0.7596    0.9193
    30       62      2      0             0.8312    0.0428     0.7268    0.8984
    35       60      0      1             0.8312    0.0428     0.7268    0.8984
    40       59      1      0             0.8171    0.0443     0.7104    0.8875
    45       58      0     32             0.8171    0.0443     0.7104    0.8875
    50       26     26      0             0.0000         .          .         .
-------------------------------------------------------------------------------
.
. * Nelson-Aalen Cumulative Hazard Listed (second last column Table 17.2)


. sts list, na
failure _d: failed == 1
analysis time _t: duration
           Beg.          Net          Nelson-Aalen    Std.
  Time    Total   Fail   Lost           Cum. Haz.    Error     [95% Conf. Int.]
-------------------------------------------------------------------------------
    10       80      6      0             0.0750    0.0306     0.0337    0.1669
    15       74      0      4             0.0750    0.0306     0.0337    0.1669
    20       70      5      0             0.1464    0.0442     0.0810    0.2648
    25       65      0      3             0.1464    0.0442     0.0810    0.2648
    30       62      2      0             0.1787    0.0498     0.1035    0.3085
    35       60      0      1             0.1787    0.0498     0.1035    0.3085
    40       59      1      0             0.1956    0.0526     0.1155    0.3313
    45       58      0     32             0.1956    0.0526     0.1155    0.3313
    50       26     26      0             1.1956    0.2030     0.8571    1.6678
-------------------------------------------------------------------------------
.
. ***** MANUAL COMPUTATION AS IN TABLE 17.2 (page 582) **********
.
. scalar cumhaz1 = 6/80
. scalar cumhaz2 = 6/80 + 5/70
. scalar cumhaz3 = 6/80 + 5/70 + 2/62
. scalar surv1 = 1-6/80
. scalar surv2 = (1-6/80)*(1-5/70)
. scalar surv3 = (1-6/80)*(1-5/70)*(1-2/62)
. di "Cumulative hazard at t1: " cumhaz1 " at t2: " cumhaz2 " at t3: " cumhaz3
Cumulative hazard at t1: .075 at t2: .14642857 at t3: .17868664
. di "Survivor function at t1: " surv1 " at t2: " surv2 " at t3: " surv3
Survivor function at t1: .925 at t2: .85892857 at t3: .8312212
.
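The same manual computation can be carried through to t4 and t5. The sketch below is an illustrative extension, not part of the original program. It uses the at-risk counts from the sts list output above (59 at t4 after one observation is censored at 35, and 26 at t5 after 32 are censored at 45); the scalar names follow the pattern of the log but are otherwise hypothetical.

```stata
* Illustrative extension of the manual computation to t4 and t5.
scalar cumhaz4 = 6/80 + 5/70 + 2/62 + 1/59
scalar cumhaz5 = cumhaz4 + 26/26
scalar surv4 = (1-6/80)*(1-5/70)*(1-2/62)*(1-1/59)
scalar surv5 = surv4*(1-26/26)
di "Cumulative hazard at t4: " cumhaz4 " at t5: " cumhaz5
di "Survivor function at t4: " surv4 " at t5: " surv5
```

The results should match the listed estimates: cumulative hazard 0.1956 and 1.1956, survivor function 0.8171 and 0.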
. ********** CLOSE OUTPUT **********
. log close
log: c:\Imbook\bwebpage\Section4\mma17p2kmextra.txt
log type: text
closed on: 19 May 2005, 13:24:01
------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section4\mma17p3weib.txt
log type: text
opened on: 19 May 2005, 14:22:25
.
. ********** OVERVIEW OF MMA17P3WEIB.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 17.6.1 (pages 584-6)
. * Plot of Weibull density, survivor, hazard and cumulative hazard functions
. * Provides
. * (1) Figure 17.2 (ch17weibull.wmf)
.
. * This program requires no data
.
. ********** SETUP **********
.
. set more off
. version 8.0
. set scheme s1mono /* Used for graphs */
.
. ********** GENERATE DATA AND FUNCTIONS **********
.
. set obs 800
obs was 0, now 800
.
. gen t = 0.1*_n /* duration time */
.
. * Generate the survivor, hazard, cumulative hazard and density
. scalar g = 0.01 /* gamma */
. scalar a = 1.5 /* alpha */
. gen surv = exp(-g*(t^(a)))
. gen density = g*a*(t^(a-1))*exp(-g*(t^(a)))
. gen hazard = g*a*(t^(a-1))
. gen cumhaz = -ln(surv)
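
As a sanity check on the four `gen` commands above, the same Weibull functions can be evaluated in a few lines of Python. This is an illustrative sketch, not part of the original do-file; the function name `weibull` and the evaluation points are assumptions chosen for the example. It uses the identities f(t) = hazard(t) * S(t) and cumulative hazard = -ln S(t) = gamma * t^alpha.

```python
import math

g, a = 0.01, 1.5  # gamma and alpha, as set by the scalar commands above


def weibull(t):
    """Return (survivor, density, hazard, cumulative hazard) at duration t."""
    surv = math.exp(-g * t**a)
    hazard = g * a * t**(a - 1)
    density = hazard * surv      # f(t) = hazard(t) * S(t)
    cumhaz = -math.log(surv)     # equals g * t^a
    return surv, density, hazard, cumhaz


for t in (1.0, 10.0, 40.0, 80.0):
    s, f, lam, cum = weibull(t)
    print(f"t={t:5.1f}  S={s:.4f}  f={f:.5f}  hazard={lam:.5f}  cumhaz={cum:.4f}")
```

With alpha = 1.5 > 1 the hazard rises with duration, which is the shape shown in the hazard panel of Figure 17.2.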
.
. ********** DO THE FOUR SEPARATE GRAPHS FOR FIGURE 17.2 **********
.
. * Weibull density
. graph twoway (scatter density t, c(l) msize(vtiny) clwidth(medthick) clstyle(p1)), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ xtitle("Duration time", size(large)) xscale(titlegap(*5)) /*
> */ ytitle("Weibull density", size(large)) yscale(titlegap(*5)) /*
> */ xlabel(,labsize(medlarge)) ylabel(,labsize(medlarge))
. graph save ch17fig2a, replace
(file ch17fig2a.gph saved)
.
. * Weibull survivor
. graph twoway (scatter surv t, c(l) msize(vtiny) clwidth(medthick) clstyle(p1)), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ xtitle("Duration time", size(large)) xscale(titlegap(*5)) /*
> */ ytitle("Weibull survivor", size(large)) yscale(titlegap(*5)) /*
> */ xlabel(,labsize(medlarge)) ylabel(,labsize(medlarge))
. graph save ch17fig2b, replace
(file ch17fig2b.gph saved)
.
. * Weibull hazard
. graph twoway (scatter hazard t, c(l) msize(vtiny) clwidth(medthick) clstyle(p1)), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ xtitle("Duration time", size(large)) xscale(titlegap(*5)) /*
> */ ytitle("Weibull hazard", size(large)) yscale(titlegap(*5)) /*
> */ xlabel(,labsize(medlarge)) ylabel(,labsize(medlarge))
. graph save ch17fig2c, replace
(file ch17fig2c.gph saved)
.
. * Weibull cumulative hazard
. graph twoway (scatter cumhaz t, c(l) msize(vtiny) clwidth(medthick) clstyle(p1)), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ xtitle("Duration time", size(large)) xscale(titlegap(*5)) /*
> */ ytitle("Cumulative hazard", size(large)) yscale(titlegap(*5)) /*
> */ xlabel(,labsize(medlarge)) ylabel(,labsize(medlarge))
. graph save ch17fig2d, replace
(file ch17fig2d.gph saved)
.
. ********** COMBINE THE FOUR GRAPHS FOR FIGURE 17.2 (page 585) **********
.
. graph combine ch17fig2a.gph ch17fig2b.gph ch17fig2c.gph ch17fig2d.gph, /*
> */ title("Weibull Distribution", margin(b=2) size(vlarge))
. graph export ch17weibull.wmf, replace
(file c:\Imbook\bwebpage\Section4\ch17weibull.wmf written in Windows Metafile format)
.
. ********** CLOSE OUTPUT
. log close
log: c:\Imbook\bwebpage\Section4\mma17p3weib.txt
log type: text
closed on: 19 May 2005, 14:22:39
-----------------------------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section4\mma17p4duration.txt
log type: text
opened on: 19 May 2005, 15:25:00
.
. ********** OVERVIEW OF MMA17P4DURATION.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 17.11 (pages 603-8)
. * Duration regression with censored data example
. * Provides
. * (1) Data summary: Table 17.6
. * (2) List of Survivor Function and Cumulative Hazard Estimates: Table 17.7
. * (3) Various graphs describing the data
. *     (3A) K-M Survival Graph for all data (Figure 17.3: km_pt1.wmf)
. *     (3B) K-M Survival Graph by unemployment insurance (Figure 17.4: km_pt2.wmf)
. *     (3C) N-A Cumulative Hazard Graph for all data (Figure 17.5: na_pt1.wmf)
. *     (3D) N-A Cumulative Hazard Graph by unemployment insurance (Figure 17.6: na_pt2.wmf)
. * (4) Coefficient Estimates of Some Parametric Models (Table 17.8)
. * (5) Hazard Rate Estimates of Some Parametric Models (Table 17.9)
.
. * To run this program you need data file
. * ema1996.dta
.
. ********** SETUP **********
.
. set more off
. version 8.0
. set scheme s1mono /* Used for graphs */
. set matsize 100
.
. ********** DATA DESCRIPTION **********
.
. * The data is from
. * B.P. McCall (1996), "Unemployment Insurance Rules, Joblessness,
. *     and Part-time Work," Econometrica, 64, 647-682.
.
. * McCall's data set, named ema_1996_pt_lastweek.dta,
. * has been renamed ema1996.dta
.
. * There are 3343 observations from the CPS Displaced Worker Surveys
. * of 1986, 1988, 1990 and 1992
. * 1. spell is length of spell in number of two-week intervals
. * 2. CENSOR1 = 1 if re-employed at full-time job
. * 3. CENSOR2 = 1 if re-employed at part-time job
. * 4. CENSOR3 = 1 if re-employed but left job: pt-ft status unknown
. * 5. CENSOR4 = 1 if still jobless
. * 6. ui (UI) = 1 if filed UI claim
. * 7. reprate (RR) = eligible replacement rate
. * 8. disrate (DR) = eligible disregard rate
. * 9. tenure (TENURE) = years tenure in lost job
. * 10. logwage (LOGWAGE) = log weekly earnings in lost job (1985$)
. * 11.-43. other variables listed in McCall (1996) table 2 p.657
.
. ********** READ DATA **********
.
. use ema1996.dta
(Sample for 1996 EMA paper: part-time= worked part-time last week)
. sum
    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
       spell |      3343    6.247981    5.611271          1         28
     censor1 |      3343    .3209692    .4669188          0          1
     censor2 |      3343    .1014059    .3019106          0          1
     censor3 |      3343    .1717021    .3771777          0          1
     censor4 |      3343    .3754113    .4843014          0          1
-------------+--------------------------------------------------------
          ui |      3343    .5527969    .4972791          0          1
     reprate |      3343    .4544717    .1137918       .066      2.059
     logwage |      3343    5.692994    .5356591    2.70805   7.600402
      tenure |      3343    4.114867    5.862322          0         40
     disrate |      3343    .1094376    .0735274       .002       1.02
-------------+--------------------------------------------------------
       slack |      3343    .4884834    .4999421          0          1
     abolpos |      3343    .1456775    .3528354          0          1
     explose |      3343    .5025426    .5000683          0          1
     stateur |      3343      6.5516    1.803825        2.5         13
    houshead |      3343    .6120251    .4873617          0          1
-------------+--------------------------------------------------------
     married |      3343    .5860006    .4926221          0          1
      female |      3343    .3478911    .4763725          0          1
       child |      3343    .4501944    .4975876          0          1
      ychild |      3343    .1956327    .3967463          0          1
    nonwhite |      3343    .1390966    .3460991          0          1
-------------+--------------------------------------------------------
         age |      3343    35.44331     10.6402         20         61
     schlt12 |      3343    .2811846    .4496446          0          1
     schgt12 |      3343    .3356267    .4722797          0          1
        smsa |      3343    .7241998    .4469835          0          1
    bluecoll |      3343    .6036494     .489212          0          1
-------------+--------------------------------------------------------
      mining |      3343     .029315    .1687132          0          1
      constr |      3343    .1480706    .3552231          0          1
      transp |      3343    .0646126    .2458778          0          1
       trade |      3343    .1848639    .3882452          0          1
        fire |      3343    .0514508    .2209484          0          1
-------------+--------------------------------------------------------
    services |      3343    .1699073    .3756075          0          1
    pubadmin |      3343    .0095722     .097383          0          1
      year85 |      3343    .2677236     .442839          0          1
      year87 |      3343    .2174693    .4125862          0          1
      year89 |      3343    .1998205    .3999251          0          1
-------------+--------------------------------------------------------
      midatl |      3343    .1088842    .3115405          0          1
       encen |      3343    .1429853    .3501103          0          1
       wncen |      3343    .0643135    .2453472          0          1
    southatl |      3343    .2375112    .4256217          0          1
       escen |      3343    .0532456    .2245564          0          1
-------------+--------------------------------------------------------
       wscen |      3343    .1441819    .3513266          0          1
    mountain |      3343    .1079868    .3104102          0          1
     pacific |      3343    .0260245     .159232          0          1
.
. * The following gives variables in same order as Table 2 p.657 of McCall (1996)
. * which gives fuller names for the variables
. sum spell censor1 censor2 censor3 censor4 age /*
> */ ui reprate disrate logwage tenure slack abolpos explose bluecoll /*
> */ houshead married child ychild female schlt12 schgt12 nonwhite smsa /*
> */ midatl encen wncen southatl escen wscen mountain pacific /*
> */ mining constr transp trade fire services pubadmin /*
> */ year85 year87 year89
    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
       spell |      3343    6.247981    5.611271          1         28
     censor1 |      3343    .3209692    .4669188          0          1
     censor2 |      3343    .1014059    .3019106          0          1
     censor3 |      3343    .1717021    .3771777          0          1
     censor4 |      3343    .3754113    .4843014          0          1
-------------+--------------------------------------------------------
         age |      3343    35.44331     10.6402         20         61
          ui |      3343    .5527969    .4972791          0          1
     reprate |      3343    .4544717    .1137918       .066      2.059
     disrate |      3343    .1094376    .0735274       .002       1.02
     logwage |      3343    5.692994    .5356591    2.70805   7.600402
-------------+--------------------------------------------------------
      tenure |      3343    4.114867    5.862322          0         40
       slack |      3343    .4884834    .4999421          0          1
     abolpos |      3343    .1456775    .3528354          0          1
     explose |      3343    .5025426    .5000683          0          1
    bluecoll |      3343    .6036494     .489212          0          1
-------------+--------------------------------------------------------
    houshead |      3343    .6120251    .4873617          0          1
     married |      3343    .5860006    .4926221          0          1
       child |      3343    .4501944    .4975876          0          1
      ychild |      3343    .1956327    .3967463          0          1
      female |      3343    .3478911    .4763725          0          1
-------------+--------------------------------------------------------
     schlt12 |      3343    .2811846    .4496446          0          1
     schgt12 |      3343    .3356267    .4722797          0          1
    nonwhite |      3343    .1390966    .3460991          0          1
        smsa |      3343    .7241998    .4469835          0          1
      midatl |      3343    .1088842    .3115405          0          1
-------------+--------------------------------------------------------
       encen |      3343    .1429853    .3501103          0          1
       wncen |      3343    .0643135    .2453472          0          1
    southatl |      3343    .2375112    .4256217          0          1
       escen |      3343    .0532456    .2245564          0          1
       wscen |      3343    .1441819    .3513266          0          1
-------------+--------------------------------------------------------
    mountain |      3343    .1079868    .3104102          0          1
     pacific |      3343    .0260245     .159232          0          1
      mining |      3343     .029315    .1687132          0          1
      constr |      3343    .1480706    .3552231          0          1
      transp |      3343    .0646126    .2458778          0          1
-------------+--------------------------------------------------------
       trade |      3343    .1848639    .3882452          0          1
        fire |      3343    .0514508    .2209484          0          1
    services |      3343    .1699073    .3756075          0          1
    pubadmin |      3343    .0095722     .097383          0          1
      year85 |      3343    .2677236     .442839          0          1
-------------+--------------------------------------------------------
      year87 |      3343    .2174693    .4125862          0          1
      year89 |      3343    .1998205    .3999251          0          1
.
. * The following creates a space-delimited data set with
. * variables in same order as Table 2 p.657 of McCall (1996)
. * Permits use by programs other than Stata
. * Note that order has been changed a little from the original Stata data set
.
. outfile spell censor1 censor2 censor3 censor4 age /*
> */ ui reprate disrate logwage tenure slack abolpos explose bluecoll /*
> */ houshead married child ychild female schlt12 schgt12 nonwhite smsa /*
> */ midatl encen wncen southatl escen wscen mountain pacific /*
> */ mining constr transp trade fire services pubadmin /*
> */ year85 year87 year89 using ema1996.asc, replace

.
. ********* ANALYSIS: UNEMPLOYMENT DURATION **********
.
. * Stata st curves require defining the dependent variable
. * and the censoring variable if there is one
. stset spell, fail(censor1=1)
     failure event:  censor1 == 1
obs. time interval:  (0, spell]
 exit on or before:  failure

------------------------------------------------------------------------------
     3343  total obs.
        0  exclusions
------------------------------------------------------------------------------
     3343  obs. remaining, representing
     1073  failures in single record/single failure data
    20887  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =        28
. stdes
         failure _d:  censor1 == 1
   analysis time _t:  spell

                                   |-------------- per subject --------------|
Category                   total        mean         min     median        max
------------------------------------------------------------------------------
no. of subjects             3343
no. of records              3343           1           1          1          1

(first) entry time                         0           0          0          0
(final) exit time                   6.247981           1          5         28

subjects with gap              0
time on gap if gap             0
time at risk               20887    6.247981           1          5         28

failures                    1073    .3209692           0          0          1
------------------------------------------------------------------------------
.
. * (1) SUMMARIZE KEY VARIABLES (Table 17.6, p.603)
.
. sum spell censor1 censor2 censor3 censor4 ui reprate disrate tenure logwage
    Variable |       Obs        Mean   Std. Dev.        Min        Max
-------------+--------------------------------------------------------
       spell |      3343    6.247981    5.611271          1         28
     censor1 |      3343    .3209692    .4669188          0          1
     censor2 |      3343    .1014059    .3019106          0          1
     censor3 |      3343    .1717021    .3771777          0          1
     censor4 |      3343    .3754113    .4843014          0          1
-------------+--------------------------------------------------------
          ui |      3343    .5527969    .4972791          0          1
     reprate |      3343    .4544717    .1137918       .066      2.059
     disrate |      3343    .1094376    .0735274       .002       1.02
      tenure |      3343    4.114867    5.862322          0         40
     logwage |      3343    5.692994    .5356591    2.70805   7.600402
.
. * (2) LIST SURVIVAL CURVE AND CUMULATIVE HAZARD ESTIMATES (Table 17.7, p.605)
.
. * Kaplan-Meier Estimates of Survival Function
. sts list
         failure _d:  censor1 == 1
   analysis time _t:  spell

           Beg.          Net            Survivor      Std.
  Time    Total   Fail   Lost           Function     Error     [95% Conf. Int.]
-------------------------------------------------------------------------------
     1     3343    294    246             0.9121    0.0049     0.9019    0.9212
     2     2803    178    304             0.8541    0.0062     0.8415    0.8659
     3     2321    119    305             0.8103    0.0071     0.7960    0.8238
     4     1897     56    165             0.7864    0.0076     0.7712    0.8008
     5     1676    104    233             0.7376    0.0085     0.7206    0.7538
     6     1339     32    111             0.7200    0.0088     0.7023    0.7369
     7     1196     85    178             0.6688    0.0098     0.6492    0.6876
     8      933     15     70             0.6581    0.0100     0.6380    0.6773
     9      848     33     98             0.6325    0.0106     0.6113    0.6528
    10      717      3     55             0.6298    0.0106     0.6086    0.6503
    11      659     26     77             0.6050    0.0113     0.5825    0.6267
    12      556      7     40             0.5974    0.0115     0.5744    0.6195
    13      509     25     69             0.5680    0.0123     0.5434    0.5918
    14      415     30     74             0.5270    0.0135     0.5001    0.5531
    15      311     19     40             0.4948    0.0146     0.4658    0.5230
    16      252     10     41             0.4751    0.0153     0.4449    0.5047
    17      201      8     24             0.4562    0.0161     0.4245    0.4874
    18      169      7     13             0.4373    0.0169     0.4040    0.4702
    19      149      4     15             0.4256    0.0174     0.3912    0.4595
    20      130      3     18             0.4158    0.0179     0.3804    0.4507
    21      109      4     23             0.4005    0.0188     0.3635    0.4372
    22       82      4      9             0.3810    0.0203     0.3412    0.4206
    23       69      0      9             0.3810    0.0203     0.3412    0.4206
    24       60      0      2             0.3810    0.0203     0.3412    0.4206
    25       58      0     10             0.3810    0.0203     0.3412    0.4206
    26       48      2     13             0.3651    0.0223     0.3214    0.4088
    27       33      5     24             0.3098    0.0296     0.2528    0.3684
    28        4      0      4             0.3098    0.0296     0.2528    0.3684
-------------------------------------------------------------------------------
.
. * Nelson-Aalen Estimates of Cumulative Hazard
. sts list, na
         failure _d:  censor1 == 1
   analysis time _t:  spell

           Beg.          Net        Nelson-Aalen      Std.
  Time    Total   Fail   Lost          Cum. Haz.     Error     [95% Conf. Int.]
-------------------------------------------------------------------------------
     1     3343    294    246             0.0879    0.0051     0.0784    0.0986
     2     2803    178    304             0.1514    0.0070     0.1383    0.1658
     3     2321    119    305             0.2027    0.0084     0.1869    0.2199
     4     1897     56    165             0.2322    0.0093     0.2147    0.2512
     5     1676    104    233             0.2943    0.0111     0.2733    0.3169
     6     1339     32    111             0.3182    0.0119     0.2957    0.3424
     7     1196     85    178             0.3893    0.0142     0.3624    0.4181
     8      933     15     70             0.4053    0.0148     0.3774    0.4353
     9      848     33     98             0.4443    0.0162     0.4135    0.4773
    10      717      3     55             0.4484    0.0164     0.4174    0.4818
    11      659     26     77             0.4879    0.0182     0.4536    0.5248
    12      556      7     40             0.5005    0.0188     0.4650    0.5387
    13      509     25     69             0.5496    0.0212     0.5096    0.5927
    14      415     30     74             0.6219    0.0250     0.5748    0.6728
    15      311     19     40             0.6830    0.0286     0.6291    0.7415
    16      252     10     41             0.7227    0.0313     0.6639    0.7866
    17      201      8     24             0.7625    0.0343     0.6982    0.8327
    18      169      7     13             0.8039    0.0377     0.7333    0.8812
    19      149      4     15             0.8307    0.0400     0.7559    0.9130
    20      130      3     18             0.8538    0.0422     0.7750    0.9406
    21      109      4     23             0.8905    0.0460     0.8048    0.9853
    22       82      4      9             0.9393    0.0521     0.8426    1.0470
    23       69      0      9             0.9393    0.0521     0.8426    1.0470
    24       60      0      2             0.9393    0.0521     0.8426    1.0470
    25       58      0     10             0.9393    0.0521     0.8426    1.0470
    26       48      2     13             0.9809    0.0598     0.8705    1.1055
    27       33      5     24             1.1325    0.0904     0.9685    1.3242
    28        4      0      4             1.1325    0.0904     0.9685    1.3242
-------------------------------------------------------------------------------
.
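
The first rows of the two `sts list` tables can be reproduced by hand from the Beg. Total, Fail and Net Lost columns alone. The Python sketch below is an added illustration, not part of the original program; it assumes the first five rows exactly as listed, applies the Kaplan-Meier product and the Nelson-Aalen sum, and also checks that each risk set equals the previous one minus failures and net lost.

```python
# (time, beginning total at risk, failures, net lost) for the first five rows
rows = [
    (1, 3343, 294, 246),
    (2, 2803, 178, 304),
    (3, 2321, 119, 305),
    (4, 1897, 56, 165),
    (5, 1676, 104, 233),
]

km, na = [], []
s, h = 1.0, 0.0
for _, at_risk, fail, _ in rows:
    s *= 1 - fail / at_risk   # Kaplan-Meier survivor function
    h += fail / at_risk       # Nelson-Aalen cumulative hazard
    km.append(round(s, 4))
    na.append(round(h, 4))

# Risk-set bookkeeping: next total = current total - failures - net lost
consistent = all(
    nxt[1] == cur[1] - cur[2] - cur[3] for cur, nxt in zip(rows, rows[1:])
)

print("K-M:", km)
print("N-A:", na)
print("risk sets consistent:", consistent)
```

Rounded to four decimals, the two lists should match the Survivor Function and Cum. Haz. columns for times 1 to 5.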
. * (3) VARIOUS GRAPHS (Figures 17.3-17.6)
.
. * (3A) Figure 17.3: Overall Survival Function (page 604)
. * sts graph, gwood
. * Nicer graphs and also confidence bands are bolder and easier to read
. sts gen surv = s
. sts gen lbsurv = lb(s)
. sts gen ubsurv = ub(s)
. sort spell
. graph twoway (line ubsurv spell, msize(vtiny) mstyle(p2) c(J) clstyle(p1) clcolor(gs10)) /*
> */ (line surv spell, msize(vtiny) mstyle(p1) c(J) clstyle(p1)) /*
> */ (line lbsurv spell, msize(vtiny) mstyle(p2) c(J) clstyle(p1) clcolor(gs10)), /*
> */ scale(1.2) plotregion(style(none)) /*
> */ title("Overall Survival Function Estimate") /*
> */ xtitle("Unemployment Duration in 2-week intervals", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Survival Probability", size(medlarge)) yscale(titlegap(*5)) /*
> */ ylabel(0.00(0.25)1.00,grid)/*
> */ legend(pos(1) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Upper 95% confidence band") label(2 "Survival Estimate") /*
> */ label(3 "Lower 95% confidence band") )
. graph export km_pt1.wmf, replace
(file c:\Imbook\bwebpage\Section4\km_pt1.wmf written in Windows Metafile format)
.
. * (3B) Figure 17.4: Survival Function by Treatment (here ui) (p.605)
. * sts graph, by(ui)
. sts graph, by(ui) /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Survival Function Estimates by UI Status") /*
> */ xtitle("Unemployment Duration in 2-week intervals", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Survival Probability", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(1) ring(0) col(1)) legend(size(small)) /*
> */ legend(label(1 "No UI (UI = 0)") label(2 "Received UI (UI = 1)") )
failure _d: censor1 == 1
analysis time _t: spell
. graph export km_pt2.wmf, replace
(file c:\Imbook\bwebpage\Section4\km_pt2.wmf written in Windows Metafile format)
.
. * (3C) Figure 17.5: Overall Cumulative Hazard Function (p.606)
. * sts graph, cna
. * Nicer graphs and also confidence bands are bolder and easier to read
. sts gen cumhaz = na
. sts gen lbcumhaz = lb(na)
. sts gen ubcumhaz = ub(na)
. sort spell
. graph twoway (line ubcumhaz spell, msize(vtiny) mstyle(p2) c(J) clstyle(p1) clcolor(gs10)) /*
> */ (line cumhaz spell, msize(vtiny) mstyle(p1) c(J) clstyle(p1)) /*
> */ (line lbcumhaz spell, msize(vtiny) mstyle(p2) c(J) clstyle(p1) clcolor(gs10)), /*
> */ scale(1.2) plotregion(style(none)) /*
> */ title("Overall Cumulative Hazard Estimate") /*
> */ xtitle("Unemployment Duration in 2-week intervals", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Cumulative Hazard", size(medlarge)) yscale(titlegap(*5)) /*
> */ ylabel(0.00(0.50)1.50,grid)/*
> */ legend(pos(11) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Upper 95% confidence band") label(2 "Cumulative Hazard Estimate") /*
> */ label(3 "Lower 95% confidence band") )
. graph export na_pt1.wmf, replace
(file c:\Imbook\bwebpage\Section4\na_pt1.wmf written in Windows Metafile format)
.
. * (3D) Figure 17.6: Cumulative Hazard Function by Treatment (here ui) (p.606)
. * sts graph, na by(ui)
. sts graph, na by(ui) /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Cumulative Hazard Estimates by UI Status") /*
> */ xtitle("Unemployment Duration in 2-week intervals", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Cumulative Hazard", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(1) ring(0) col(1)) legend(size(small)) /*
> */ legend(label(1 "No UI (UI = 0)") label(2 "Received UI (UI = 1)") )
failure _d: censor1 == 1
analysis time _t: spell
. graph export na_pt2.wmf, replace
(file c:\Imbook\bwebpage\Section4\na_pt2.wmf written in Windows Metafile format)
.
. * (4) VARIOUS PARAMETRIC MODELS: COEFFICIENTS (Table 17.8)
.
. * streg default is to report hazard ratios rather than coefficients
. * streg with nohr option reports coefficients
.
. * Create regressors
. gen RR = reprate
. gen DR = disrate
. gen UI = ui
. gen RRUI = RR*UI
. gen DRUI = DR*UI
. gen LOGWAGE = logwage
.
. * Define $xlist = list of regressors used in subsequent regressions
. global xlist RR DR UI RRUI DRUI LOGWAGE /*
> */ tenure slack abolpos explose stateur houshead married /*
> */ female child ychild nonwhite age schlt12 schgt12 smsa bluecoll /*
> */ mining constr transp trade fire services pubadmin /*
> */ year85 year87 year89 midatl /*
> */ encen wncen southatl escen wscen mountain pacific
.
. * Exponential regression
. streg $xlist, nohr robust dist(exponential)
failure _d: censor1 == 1
analysis time _t: spell
Iteration 0: log pseudo-likelihood = -3012.4909
Iteration 1: log pseudo-likelihood = -2810.3791
Iteration 2: log pseudo-likelihood = -2701.8024
Iteration 3: log pseudo-likelihood = -2700.6911
Iteration 4: log pseudo-likelihood = -2700.6903
Iteration 5: log pseudo-likelihood = -2700.6903

Exponential regression -- log relative-hazard form

No. of subjects =         3343                     Number of obs   =      3343
No. of failures =         1073
Time at risk    =        20887
                                                   Wald chi2(40)   =    565.24
Log pseudo-likelihood =   -2700.6903               Prob > chi2     =    0.0000

------------------------------------------------------------------------------
             |               Robust
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
RR | .4720235 .6005534 0.79 0.432 -.7050396 1.649087
DR | -.5756396 .7624489 -0.75 0.450 -2.070012 .9187327
UI | -1.424561 .2493917 -5.71 0.000 -1.91336 -.9357622
RRUI | .9655904 .6118408 1.58 0.115 -.2335956 2.164776
DRUI | -.1990635 1.019118 -0.20 0.845 -2.196498 1.798371
LOGWAGE | .3508005 .115598 3.03 0.002 .1242327 .5773684
tenure | -.0001462 .0064637 -0.02 0.982 -.0128147 .0125224
slack | -.2593666 .0759363 -3.42 0.001 -.4081991 -.1105342
abolpos | -.1550897 .0953306 -1.63 0.104 -.3419342 .0317549
explose | .198458 .0648354 3.06 0.002 .071383 .3255331
stateur | -.064626 .0229903 -2.81 0.005 -.1096862 -.0195659
houshead | .3812208 .0836602 4.56 0.000 .2172499 .5451918
married | .369552 .0786145 4.70 0.000 .2154705 .5236335
female | .1164067 .0852986 1.36 0.172 -.0507754 .2835888
child | -.0333008 .0794577 -0.42 0.675 -.1890352 .1224335
ychild | -.1449722 .1022781 -1.42 0.156 -.3454336 .0554892
nonwhite | -.6692066 .1188272 -5.63 0.000 -.9021037 -.4363095
age | -.0220821 .0039256 -5.63 0.000 -.0297762 -.0143879
schlt12 | -.1231414 .0966102 -1.27 0.202 -.3124939 .066211
schgt12 | .1114395 .082945 1.34 0.179 -.0511297 .2740087
smsa | .1922291 .0799904 2.40 0.016 .0354508 .3490075
bluecoll | -.2033718 .085129 -2.39 0.017 -.3702215 -.036522
mining | -.1205818 .1973575 -0.61 0.541 -.5073955 .2662319
constr | -.04475 .1081519 -0.41 0.679 -.2567237 .1672238
transp | -.1786694 .156034 -1.15 0.252 -.4844906 .1271517
trade | -.0345159 .1019152 -0.34 0.735 -.234266 .1652341
fire | .1120549 .1386716 0.81 0.419 -.1597365 .3838462
services | .1840002 .0983911 1.87 0.061 -.0088428 .3768432
pubadmin | .1090606 .2954211 0.37 0.712 -.4699541 .6880752
year85 | .2147661 .0888664 2.42 0.016 .0405911 .388941
year87 | .3541162 .0948499 3.73 0.000 .1682139 .5400186
year89 | .467082 .1104355 4.23 0.000 .2506325 .6835316
midatl | .0264112 .1465647 0.18 0.857 -.2608503 .3136727
encen | .0043916 .1502813 0.03 0.977 -.2901544 .2989375
wncen | .1724311 .1607689 1.07 0.283 -.1426703 .4875324
southatl | .2638807 .1183726 2.23 0.026 .0318747 .4958867
escen | .35414 .19317 1.83 0.067 -.0244664 .7327463
wscen | .3385896 .1433308 2.36 0.018 .0576664 .6195128
mountain | .0063693 .1538821 0.04 0.967 -.2952341 .3079727
pacific | .0770202 .2393505 0.32 0.748 -.3920982 .5461385
_cons | -4.079107 .8767097 -4.65 0.000 -5.797426 -2.360788
------------------------------------------------------------------------------
. estimates store bexponential
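
The z statistic and confidence interval in each row follow directly from the coefficient and its robust standard error. As an added illustration (not part of the original program), the Python sketch below recomputes them for the UI row of the exponential regression, assuming the usual normal-based Wald interval: coefficient plus or minus the 0.975 standard normal quantile times the standard error.

```python
from statistics import NormalDist

# UI row of the exponential regression table above
coef, se = -1.424561, 0.2493917

std_normal = NormalDist()
z = coef / se
crit = std_normal.inv_cdf(0.975)              # about 1.96
ci_low, ci_high = coef - crit * se, coef + crit * se
p_value = 2 * (1 - std_normal.cdf(abs(z)))    # two-sided p-value

print(f"z = {z:.2f}, P>|z| = {p_value:.3f}")
print(f"95% CI: [{ci_low:.6f}, {ci_high:.7f}]")
```

The same arithmetic applies to every row of the exponential, Weibull, Gompertz and Cox tables, since all four are reported with robust standard errors and normal-based intervals.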
.
. * Weibull regression
. streg $xlist, nohr robust dist(weibull)
failure _d: censor1 == 1
analysis time _t: spell
Fitting constant-only model:
Iteration 0: log pseudo-likelihood = -3012.4909
Iteration 1: log pseudo-likelihood = -3012.3543
Iteration 2: log pseudo-likelihood = -3012.3543
Fitting full model:
Iteration 0: log pseudo-likelihood = -3012.3543
Iteration 1: log pseudo-likelihood = -2799.9064
Iteration 2: log pseudo-likelihood = -2688.7377
Iteration 3: log pseudo-likelihood = -2687.6004
Iteration 4: log pseudo-likelihood = -2687.5995
Iteration 5: log pseudo-likelihood = -2687.5995

Weibull regression -- log relative-hazard form

No. of subjects =         3343                     Number of obs   =      3343
No. of failures =         1073
Time at risk    =        20887
                                                   Wald chi2(40)   =    501.65
Log pseudo-likelihood =   -2687.5995               Prob > chi2     =    0.0000

------------------------------------------------------------------------------
             |               Robust
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
RR | .4481156 .6381895 0.70 0.483 -.8027127 1.698944
DR | -.4269187 .8086983 -0.53 0.598 -2.011938 1.158101
UI | -1.496066 .2639679 -5.67 0.000 -2.013434 -.9786984
RRUI | 1.015226 .6455611 1.57 0.116 -.2500501 2.280503
DRUI | -.2988417 1.065384 -0.28 0.779 -2.386956 1.789272
LOGWAGE | .3655253 .12212 2.99 0.003 .1261745 .6048761
tenure | -.0011127 .0068716 -0.16 0.871 -.0145809 .0123554
slack | -.2652154 .0803214 -3.30 0.001 -.4226424 -.1077883
abolpos | -.1604227 .1012942 -1.58 0.113 -.3589557 .0381103
explose | .2075085 .0684715 3.03 0.002 .0733068 .3417103
stateur | -.0708745 .0242117 -2.93 0.003 -.1183286 -.0234204
houshead | .3976626 .0887192 4.48 0.000 .2237762 .571549
married | .3786057 .0830317 4.56 0.000 .2158665 .541345
female | .1260829 .0896987 1.41 0.160 -.0497233 .301889
child | -.0336778 .0839956 -0.40 0.688 -.1983061 .1309505
ychild | -.1613066 .108947 -1.48 0.139 -.3748389 .0522256
nonwhite | -.7025504 .12426 -5.65 0.000 -.9460956 -.4590052
age | -.0235823 .0041922 -5.63 0.000 -.0317989 -.0153658
schlt12 | -.1226759 .1022762 -1.20 0.230 -.3231335 .0777816
schgt12 | .1162848 .0880692 1.32 0.187 -.0563278 .2888973
smsa | .1999567 .0841129 2.38 0.017 .0350985 .3648149
bluecoll | -.1994925 .0899354 -2.22 0.027 -.3757626 -.0232223
mining | -.1015676 .2036644 -0.50 0.618 -.5007425 .2976073
constr | -.0253737 .1135609 -0.22 0.823 -.247949 .1972016
transp | -.1981522 .1672141 -1.19 0.236 -.5258858 .1295814
trade | -.0311361 .1079502 -0.29 0.773 -.2427146 .1804423
fire | .1262153 .1492527 0.85 0.398 -.1663145 .4187452
services | .2031673 .1038945 1.96 0.051 -.0004622 .4067968
pubadmin | .1117728 .3087374 0.36 0.717 -.4933415 .716887
year85 | .2374972 .093387 2.54 0.011 .054462 .4205325
year87 | .3787397 .1011782 3.74 0.000 .1804341 .5770454
year89 | .4920278 .1180472 4.17 0.000 .2606596 .7233959
midatl | .02465 .1542139 0.16 0.873 -.2776037 .3269036
encen | -.0014111 .1579065 -0.01 0.993 -.3109023 .30808
wncen | .1844363 .1694444 1.09 0.276 -.1476687 .5165413
southatl | .2740974 .1250481 2.19 0.028 .0290076 .5191872
escen | .367742 .2024771 1.82 0.069 -.0291058 .7645899
wscen | .3440005 .1527804 2.25 0.024 .0445563 .6434446
mountain | .0159627 .1620188 0.10 0.922 -.3015883 .3335136
pacific | .0849532 .2504077 0.34 0.734 -.4058368 .5757432
_cons | -4.357886 .9196792 -4.74 0.000 -6.160424 -2.555347
-------------+----------------------------------------------------------------
/ln_p | .1215314 .0194374 6.25 0.000 .0834348 .1596281
-------------+----------------------------------------------------------------
p | 1.129225 .0219492 1.087014 1.173075
1/p | .8855632 .0172131 .8524608 .9199511
------------------------------------------------------------------------------
. estimates store bweibull
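
The rows for p and 1/p are transformations of the estimated /ln_p. The Python sketch below is an added illustration, not part of the original program. It assumes the standard errors come from the delta method and that the interval for p is obtained by exponentiating the endpoints of the /ln_p interval; those assumptions reproduce the logged figures.

```python
import math
from statistics import NormalDist

# /ln_p row of the Weibull regression table above
ln_p, se_ln_p = 0.1215314, 0.0194374

p = math.exp(ln_p)
inv_p = 1 / p
se_p = p * se_ln_p          # delta method: d exp(x)/dx = exp(x)
se_inv_p = inv_p * se_ln_p  # delta method: |d exp(-x)/dx| = exp(-x)

crit = NormalDist().inv_cdf(0.975)
ci_p = (math.exp(ln_p - crit * se_ln_p), math.exp(ln_p + crit * se_ln_p))
ci_inv_p = (1 / ci_p[1], 1 / ci_p[0])

print(f"p   = {p:.6f}  se = {se_p:.7f}  CI = [{ci_p[0]:.6f}, {ci_p[1]:.6f}]")
print(f"1/p = {inv_p:.7f} se = {se_inv_p:.7f}  CI = [{ci_inv_p[0]:.7f}, {ci_inv_p[1]:.7f}]")
```

Because the whole interval for p lies above 1, the fitted Weibull hazard increases with duration.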
.
. * Gompertz regression
. streg $xlist, nohr robust dist(gompertz)
failure _d: censor1 == 1
analysis time _t: spell
Fitting constant-only model:
Iteration 0: log pseudo-likelihood = -3012.4909
Iteration 1: log pseudo-likelihood = -3002.0916
Iteration 2: log pseudo-likelihood = -3002.026
Iteration 3: log pseudo-likelihood = -3002.026

Fitting full model:
Iteration 0: log pseudo-likelihood = -3002.026
Iteration 1: log pseudo-likelihood = -2796.0001
Iteration 2: log pseudo-likelihood = -2701.6693
Iteration 3: log pseudo-likelihood = -2700.6057
Iteration 4: log pseudo-likelihood = -2700.605
Iteration 5: log pseudo-likelihood = -2700.605

Gompertz regression -- log relative-hazard form

No. of subjects =         3343                     Number of obs   =      3343
No. of failures =         1073
Time at risk    =        20887
                                                   Wald chi2(40)   =    529.75
Log pseudo-likelihood =    -2700.605               Prob > chi2     =    0.0000

------------------------------------------------------------------------------
             |               Robust
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
RR | .472405 .6033813 0.78 0.434 -.7102005 1.655011
DR | -.5627894 .7646131 -0.74 0.462 -2.061404 .9358247
UI | -1.428355 .2508349 -5.69 0.000 -1.919982 -.9367272
RRUI | .9689413 .6144464 1.58 0.115 -.2353514 2.173234
DRUI | -.2112495 1.021112 -0.21 0.836 -2.212593 1.790094
LOGWAGE | .3524722 .1162698 3.03 0.002 .1245876 .5803567
tenure | -.0002233 .0065002 -0.03 0.973 -.0129635 .0125168
slack | -.2593933 .0762829 -3.40 0.001 -.4089051 -.1098815
abolpos | -.1552595 .0958002 -1.62 0.105 -.3430244 .0325053
explose | .1991286 .0650876 3.06 0.002 .0715592 .326698
stateur | -.065244 .0231645 -2.82 0.005 -.1106456 -.0198424
houshead | .3822818 .0841671 4.54 0.000 .2173173 .5472464
married | .3700141 .0789107 4.69 0.000 .215352 .5246762
female | .1170987 .0856236 1.37 0.171 -.0507206 .2849179
child | -.0331425 .0798246 -0.42 0.678 -.1895958 .1233108
ychild | -.1466596 .102884 -1.43 0.154 -.3483085 .0549893
nonwhite | -.6720521 .1197092 -5.61 0.000 -.9066778 -.4374264
age | -.0222175 .0039787 -5.58 0.000 -.0300157 -.0144193
schlt12 | -.1228615 .097015 -1.27 0.205 -.3130075 .0672845
schgt12 | .1121295 .0831976 1.35 0.178 -.0509348 .2751938
smsa | .1925807 .0803478 2.40 0.017 .0351019 .3500596
bluecoll | -.203405 .0854986 -2.38 0.017 -.3709791 -.0358309
mining | -.1183683 .1976441 -0.60 0.549 -.5057435 .269007
constr | -.0423947 .1082891 -0.39 0.695 -.2546375 .169848
transp | -.1799724 .1570001 -1.15 0.252 -.487687 .1277422
trade | -.0341793 .1023611 -0.33 0.738 -.2348034 .1664447
fire | .1143611 .1398161 0.82 0.413 -.1596734 .3883955
services | .1854033 .0987923 1.88 0.061 -.0082261 .3790327
pubadmin | .1089298 .2965867 0.37 0.713 -.4723694 .690229
year85 | .2172389 .0890506 2.44 0.015 .0427028 .3917749
year87 | .3564181 .095298 3.74 0.000 .1696374 .5431988
year89 | .4690752 .1114266 4.21 0.000 .250683 .6874674
midatl | .026766 .1471298 0.18 0.856 -.2616031 .3151351
encen | .0043808 .15089 0.03 0.977 -.2913581 .3001198
wncen | .1735986 .1614007 1.08 0.282 -.142741 .4899382
southatl | .2647448 .1188746 2.23 0.026 .031755 .4977347
escen | .3560917 .1938142 1.84 0.066 -.0237772 .7359606
wscen | .3393956 .1442438 2.35 0.019 .0566829 .6221082
mountain | .0076507 .1545162 0.05 0.961 -.2951954 .3104969
pacific | .0778885 .2400495 0.32 0.746 -.3925999 .5483769
_cons | -4.09733 .8802997 -4.65 0.000 -5.822686 -2.371975
-------------+----------------------------------------------------------------
gamma | .002658 .0067759 0.39 0.695 -.0106225 .0159386
------------------------------------------------------------------------------
. estimates store bgompertz
.
. * Cox regression
. stcox $xlist, nohr robust
failure _d: censor1 == 1
analysis time _t: spell
Iteration 0: log pseudo-likelihood = -7981.9304
Iteration 1: log pseudo-likelihood = -7731.2822
Iteration 2: log pseudo-likelihood = -7717.3198
Iteration 3: log pseudo-likelihood = -7717.2334
Iteration 4: log pseudo-likelihood = -7717.2334
Refining estimates:
Iteration 0: log pseudo-likelihood = -7717.2334

Cox regression -- Breslow method for ties

No. of subjects =         3343                     Number of obs   =      3343
No. of failures =         1073
Time at risk    =        20887
                                                   Wald chi2(40)   =    540.98
Log pseudo-likelihood =   -7717.2334               Prob > chi2     =    0.0000

------------------------------------------------------------------------------
             |               Robust
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
RR | .5222796 .5711698 0.91 0.361 -.5971926 1.641752
DR | -.752507 .72175 -1.04 0.297 -2.167111 .6620971
UI | -1.317719 .2372893 -5.55 0.000 -1.782798 -.8526409
RRUI | .8822462 .582115 1.52 0.130 -.2586783 2.023171
DRUI | -.0951357 .977774 -0.10 0.922 -2.011538 1.821266
LOGWAGE | .3352639 .1106483 3.03 0.002 .1183972 .5521306
tenure | .0008278 .0061286 0.14 0.893 -.0111841 .0128396
slack | -.247863 .0721173 -3.44 0.001 -.3892103 -.1065158
abolpos | -.1511638 .0905035 -1.67 0.095 -.3285475 .0262198
explose | .1865068 .0615742 3.03 0.002 .0658236 .30719
stateur | -.0590475 .022085 -2.67 0.008 -.1023334 -.0157616
houshead | .3601866 .0794827 4.53 0.000 .2044035 .5159698
married | .358819 .0746355 4.81 0.000 .2125362 .5051019
female | .1002758 .0813277 1.23 0.218 -.0591236 .2596753
child | -.0396054 .0755365 -0.52 0.600 -.1876542 .1084435
ychild | -.1276638 .0967856 -1.32 0.187 -.3173602 .0620325
nonwhite | -.6394475 .1151332 -5.55 0.000 -.8651043 -.4137906
age | -.0204623 .0037593 -5.44 0.000 -.0278305 -.0130942
schlt12 | -.1220585 .0920073 -1.33 0.185 -.3023895 .0582726
schgt12 | .1104817 .0783542 1.41 0.159 -.0430897 .2640531
smsa | .1864841 .0766075 2.43 0.015 .0363361 .3366321
bluecoll | -.2108023 .080867 -2.61 0.009 -.3692986 -.052306
mining | -.1238251 .1906352 -0.65 0.516 -.4974632 .249813
constr | -.054455 .1029488 -0.53 0.597 -.256231 .1473209
transp | -.1551657 .1466515 -1.06 0.290 -.4425973 .1322659
trade | -.0383252 .0968106 -0.40 0.692 -.2280706 .1514201
fire | .1097585 .1300779 0.84 0.399 -.1451895 .3647065
services | .1666262 .0939507 1.77 0.076 -.0175138 .3507662
pubadmin | .1022002 .2829817 0.36 0.718 -.4524336 .6568341
year85 | .204162 .084908 2.40 0.016 .0377454 .3705786
year87 | .3384229 .0899115 3.76 0.000 .1621997 .5146462
year89 | .4486559 .104937 4.28 0.000 .2429832 .6543286
midatl | .0342238 .140515 0.24 0.808 -.2411805 .3096282
encen | .0174597 .1438862 0.12 0.903 -.2645521 .2994716
wncen | .1650967 .1532559 1.08 0.281 -.1352795 .4654728
southatl | .2518023 .1127138 2.23 0.025 .0308874 .4727172
escen | .3450422 .1839818 1.88 0.061 -.0155554 .7056398
wscen | .3316752 .1359801 2.44 0.015 .0651591 .5981914
mountain | .009484 .1468626 0.06 0.949 -.2783613 .2973293
pacific | .0720292 .2263339 0.32 0.750 -.3715771 .5156355
------------------------------------------------------------------------------
. estimates store bcox
.
. * Display Results for Table 17.8 (page 607)
. estimates table bexponential bweibull bgompertz, t stats(N ll) b(%8.3f) /*
> */ keep(RR DR UI RRUI DRUI LOGWAGE _cons)
----------------------------------------------
    Variable | bexpon~l   bweibull   bgompe~z
-------------+--------------------------------
          RR |    0.472      0.448      0.472
             |     0.79       0.70       0.78
          DR |   -0.576     -0.427     -0.563
             |    -0.75      -0.53      -0.74
          UI |   -1.425     -1.496     -1.428
             |    -5.71      -5.67      -5.69
        RRUI |    0.966      1.015      0.969
             |     1.58       1.57       1.58
        DRUI |   -0.199     -0.299     -0.211
             |    -0.20      -0.28      -0.21
     LOGWAGE |    0.351      0.366      0.352
             |     3.03       2.99       3.03
       _cons |   -4.079     -4.358     -4.097
             |    -4.65      -4.74      -4.65
-------------+--------------------------------
           N | 3343.000   3343.000   3343.000
          ll | -2.7e+03   -2.7e+03   -2.7e+03
----------------------------------------------
                                  legend: b/t
. estimates table bcox, t stats(N ll) b(%8.3f) keep(RR DR UI RRUI DRUI LOGWAGE)
------------------------
    Variable |   bcox
-------------+----------
          RR |    0.522
             |     0.91
          DR |   -0.753
             |    -1.04
          UI |   -1.318
             |    -5.55
        RRUI |    0.882
             |     1.52
        DRUI |   -0.095
             |    -0.10
     LOGWAGE |    0.335
             |     3.03
-------------+----------
           N | 3343.000
          ll | -7.7e+03
------------------------
            legend: b/t
.
. * (5) VARIOUS PARAMETRIC MODELS: HAZARD RATIOS (Table 17.9, page 608)
.
. * streg default is to report hazard ratios rather than coefficients
. * streg with nohr option reports coefficients
.
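Note: each hazard ratio reported in this part of the log is the exponential of the corresponding coefficient from the earlier nohr runs. The reported standard error is consistent with the delta method (ratio times coefficient standard error), and the confidence limits are the exponentiated coefficient limits. The Python sketch below reproduces the RR row of the exponential model as a rough check outside Stata. The function name hazard_ratio and the hard-coded normal critical value are illustrative assumptions, not part of the original program.

```python
import math

def hazard_ratio(coef, se, z=1.959964):
    """Convert a log relative-hazard coefficient into hazard-ratio form.

    Returns (ratio, delta-method std. err., lower 95% limit, upper 95% limit).
    The limits are obtained by exponentiating the coefficient's own interval.
    """
    ratio = math.exp(coef)
    return ratio, ratio * se, math.exp(coef - z * se), math.exp(coef + z * se)

# RR row of the exponential model fitted with nohr: coefficient and robust std. err.
hr, hr_se, lo, hi = hazard_ratio(0.4720235, 0.6005534)
print(round(hr, 4), round(hr_se, 4), round(lo, 4), round(hi, 4))
```

Because the z statistic is computed from the coefficient and its standard error, it is identical in the coefficient and hazard-ratio displays of the same model.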
. * Exponential regression
. streg $xlist, robust dist(exponential)
failure _d: censor1 == 1
analysis time _t: spell
Iteration 0: log pseudo-likelihood = -3012.4909
Iteration 1: log pseudo-likelihood = -2810.3791
Iteration 2: log pseudo-likelihood = -2701.8024
Iteration 3: log pseudo-likelihood = -2700.6911
Iteration 4: log pseudo-likelihood = -2700.6903
Iteration 5: log pseudo-likelihood = -2700.6903

Exponential regression -- log relative-hazard form


No. of subjects =         3343                     Number of obs   =      3343
No. of failures =         1073
Time at risk    =        20887
                                                   Wald chi2(40)   =    565.24
Log pseudo-likelihood =   -2700.6903               Prob > chi2     =    0.0000

------------------------------------------------------------------------------
             |               Robust
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
RR | 1.603235 .9628283 0.79 0.432 .494089 5.202226
DR | .5623451 .4287594 -0.75 0.450 .1261843 2.506112
UI | .2406141 .0600072 -5.71 0.000 .1475837 .3922867
RRUI | 2.626338 1.606901 1.58 0.115 .7916819 8.712654
DRUI | .8194978 .8351649 -0.20 0.845 .1111919 6.039799
LOGWAGE | 1.420204 .1641727 3.03 0.002 1.132279 1.781344
tenure | .9998539 .0064627 -0.02 0.982 .9872671 1.012601
slack | .7715401 .0585879 -3.42 0.001 .6648465 .8953557
abolpos | .8563384 .0816353 -1.63 0.104 .7103949 1.032264
explose | 1.219521 .0790681 3.06 0.002 1.073992 1.384769
stateur | .937418 .0215515 -2.81 0.005 .8961153 .9806243
houshead | 1.464071 .1224844 4.56 0.000 1.242655 1.724939
married | 1.447086 .1137619 4.70 0.000 1.240445 1.68815
female | 1.123453 .0958289 1.36 0.172 .9504921 1.327887
child | .9672475 .0768553 -0.42 0.675 .8277574 1.130244
ychild | .8650463 .0884753 -1.42 0.156 .7079133 1.057058
nonwhite | .5121147 .0608532 -5.63 0.000 .4057153 .6464176
age | .9781599 .0038399 -5.63 0.000 .9706627 .9857151
schlt12 | .8841386 .0854168 -1.27 0.202 .7316201 1.068452
schgt12 | 1.117886 .0927231 1.34 0.179 .9501554 1.315226
smsa | 1.211948 .0969443 2.40 0.016 1.036087 1.41766
bluecoll | .8159748 .0694631 -2.39 0.017 .6905813 .9641369
mining | .8864046 .1749386 -0.61 0.541 .6020616 1.305038
constr | .9562365 .1034188 -0.41 0.679 .7735819 1.182019
transp | .8363823 .1305041 -1.15 0.252 .6160109 1.135589
trade | .966073 .0984575 -0.34 0.735 .7911514 1.179669
fire | 1.118574 .1551145 0.81 0.419 .8523684 1.46792
services | 1.202016 .1182677 1.87 0.061 .9911962 1.457676
pubadmin | 1.11523 .3294624 0.37 0.712 .625031 1.989882
year85 | 1.239572 .1101563 2.42 0.016 1.041426 1.475418
year87 | 1.424921 .1351536 3.73 0.000 1.18319 1.716039
year89 | 1.595332 .1761812 4.23 0.000 1.284838 1.980861
midatl | 1.026763 .1504872 0.18 0.857 .7703962 1.368442
encen | 1.004401 .1509427 0.03 0.977 .7481481 1.348425
wncen | 1.18819 .191024 1.07 0.283 .8670399 1.628293
southatl | 1.301973 .1541179 2.23 0.026 1.032388 1.641953
escen | 1.424955 .2752586 1.83 0.067 .9758305 2.080787
wscen | 1.402967 .2010884 2.36 0.018 1.059362 1.858023
mountain | 1.00639 .1548654 0.04 0.967 .7443573 1.360664
pacific | 1.080064 .2585138 0.32 0.748 .6756378 1.726573
------------------------------------------------------------------------------
.
. * Weibull regression
. streg $xlist, robust dist(weibull)
failure _d: censor1 == 1
analysis time _t: spell
Fitting constant-only model:
Iteration 0: log pseudo-likelihood = -3012.4909
Iteration 1: log pseudo-likelihood = -3012.3543
Iteration 2: log pseudo-likelihood = -3012.3543
Fitting full model:

Iteration 0: log pseudo-likelihood = -3012.3543
Iteration 1: log pseudo-likelihood = -2799.9064
Iteration 2: log pseudo-likelihood = -2688.7377
Iteration 3: log pseudo-likelihood = -2687.6004
Iteration 4: log pseudo-likelihood = -2687.5995
Iteration 5: log pseudo-likelihood = -2687.5995

Weibull regression -- log relative-hazard form


No. of subjects =         3343                     Number of obs   =      3343
No. of failures =         1073
Time at risk    =        20887
                                                   Wald chi2(40)   =    501.65
Log pseudo-likelihood =   -2687.5995               Prob > chi2     =    0.0000

------------------------------------------------------------------------------
             |               Robust
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
RR | 1.56536 .998996 0.70 0.483 .4481117 5.46817
DR | .6525166 .527689 -0.53 0.598 .1337292 3.183881
UI | .2240097 .0591314 -5.67 0.000 .1335294 .3757999
RRUI | 2.759988 1.781741 1.57 0.116 .7787618 9.781599
DRUI | .7416768 .7901705 -0.28 0.779 .0919091 5.985096
LOGWAGE | 1.441271 .176008 2.99 0.003 1.13448 1.831025
tenure | .9988879 .006864 -0.16 0.871 .9855249 1.012432
slack | .7670407 .0616098 -3.30 0.001 .6553129 .8978176
abolpos | .8517837 .0862808 -1.58 0.113 .6984053 1.038846
explose | 1.230608 .0842616 3.03 0.002 1.076061 1.407352
stateur | .9315788 .0225551 -2.93 0.003 .8884041 .9768517
houshead | 1.488342 .1320445 4.48 0.000 1.250791 1.771008
married | 1.460247 .1212469 4.56 0.000 1.240937 1.718316
female | 1.134376 .101752 1.41 0.160 .9514927 1.352411
child | .966883 .0812139 -0.40 0.688 .8201188 1.139911
ychild | .8510311 .0927173 -1.48 0.139 .6874 1.053613
nonwhite | .4953204 .0615485 -5.65 0.000 .388254 .6319119
age | .9766936 .0040945 -5.63 0.000 .9687014 .9847517
schlt12 | .8845503 .0904684 -1.20 0.230 .7238772 1.080887
schgt12 | 1.123316 .0989295 1.32 0.187 .9452293 1.334955
smsa | 1.22135 .1027313 2.38 0.017 1.035722 1.440247
bluecoll | .8191464 .0736702 -2.22 0.027 .6867654 .9770452
mining | .9034201 .1839945 -0.50 0.618 .6060805 1.346633
constr | .9749455 .1107157 -0.22 0.823 .7803997 1.21799
transp | .820245 .1371565 -1.19 0.236 .5910316 1.138352
trade | .9693436 .1046408 -0.29 0.773 .7844954 1.197747
fire | 1.134526 .1693311 0.85 0.398 .8467799 1.520053
services | 1.225277 .1272996 1.96 0.051 .9995379 1.501999
pubadmin | 1.118259 .3452483 0.36 0.717 .6105827 2.048048
year85 | 1.268072 .1184214 2.54 0.011 1.055972 1.522772
year87 | 1.460443 .147765 3.74 0.000 1.197737 1.780769
year89 | 1.63563 .1930814 4.17 0.000 1.297786 2.061422
midatl | 1.024956 .1580625 0.16 0.873 .757597 1.386668
encen | .9985899 .1576839 -0.01 0.993 .7327855 1.36081
wncen | 1.20254 .2037638 1.09 0.276 .8627169 1.67622
southatl | 1.315343 .1644812 2.19 0.028 1.029432 1.680661
escen | 1.444469 .292472 1.82 0.069 .9713137 2.148113
wscen | 1.410579 .2155089 2.25 0.024 1.045564 1.903025
mountain | 1.016091 .1646258 0.10 0.922 .7396425 1.395864
pacific | 1.088666 .2726104 0.34 0.734 .6664189 1.778452
-------------+----------------------------------------------------------------
/ln_p | .1215314 .0194374 6.25 0.000 .0834348 .1596281
-------------+----------------------------------------------------------------
p | 1.129225 .0219492 1.087014 1.173075
1/p | .8855632 .0172131 .8524608 .9199511
------------------------------------------------------------------------------
.
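Note: the Weibull shape estimate p = 1.129 exceeds one. In the proportional-hazards form h(t) = p exp(x'b) t^(p-1), this implies a hazard that rises with elapsed duration. A minimal Python sketch of the calculation follows. The helper weibull_hazard is hypothetical, and comparing t = 10 with t = 1 is an arbitrary choice made only for illustration.

```python
import math

def weibull_hazard(t, p, xb=0.0):
    """Weibull hazard in proportional-hazards form: p * exp(xb) * t**(p - 1)."""
    return p * math.exp(xb) * t ** (p - 1.0)

# Shape parameter recovered from the /ln_p row of the Weibull output
p_hat = math.exp(0.1215314)

# Hazard at t = 10 relative to t = 1, holding the regressors fixed
growth = weibull_hazard(10.0, p_hat) / weibull_hazard(1.0, p_hat)
print(round(p_hat, 6), round(growth, 3))
```

With p = 1 the same function returns a hazard that does not depend on t, which is the exponential special case.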
. * Gompertz regression
. streg $xlist, robust dist(gompertz)
failure _d: censor1 == 1
analysis time _t: spell
Fitting constant-only model:
Iteration 0: log pseudo-likelihood = -3012.4909
Iteration 1: log pseudo-likelihood = -3002.0916
Iteration 2: log pseudo-likelihood = -3002.026
Iteration 3: log pseudo-likelihood = -3002.026

Fitting full model:


Iteration 0: log pseudo-likelihood = -3002.026
Iteration 1: log pseudo-likelihood = -2796.0001
Iteration 2: log pseudo-likelihood = -2701.6693
Iteration 3: log pseudo-likelihood = -2700.6057
Iteration 4: log pseudo-likelihood = -2700.605
Iteration 5: log pseudo-likelihood = -2700.605

Gompertz regression -- log relative-hazard form


No. of subjects =         3343                     Number of obs   =      3343
No. of failures =         1073
Time at risk    =        20887
                                                   Wald chi2(40)   =    529.75
Log pseudo-likelihood =    -2700.605               Prob > chi2     =    0.0000

------------------------------------------------------------------------------
             |               Robust
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
RR | 1.603847 .9677311 0.78 0.434 .4915456 5.233135
DR | .5696179 .4355373 -0.74 0.462 .1272752 2.549315
UI | .239703 .0601259 -5.69 0.000 .1466096 .3919084
RRUI | 2.635153 1.61916 1.58 0.115 .7902931 8.786655
DRUI | .809572 .8266639 -0.21 0.836 .1094166 5.990014
LOGWAGE | 1.42258 .165403 3.03 0.002 1.132681 1.786676
tenure | .9997767 .0064987 -0.03 0.973 .9871202 1.012595
slack | .7715195 .0588538 -3.40 0.001 .6643773 .8959403
abolpos | .856193 .0820234 -1.62 0.105 .7096209 1.033039
explose | 1.220339 .079429 3.06 0.002 1.074182 1.386383
stateur | .9368388 .0217014 -2.82 0.005 .895256 .9803531
houshead | 1.465625 .1233575 4.54 0.000 1.242738 1.728487
married | 1.447755 .1142433 4.69 0.000 1.240298 1.689912
female | 1.12423 .0962607 1.37 0.171 .9505442 1.329653
child | .9674007 .0772224 -0.42 0.678 .8272934 1.131236
ychild | .8635879 .0888493 -1.43 0.154 .7058811 1.056529
nonwhite | .5106596 .0611307 -5.61 0.000 .4038637 .6456961
age | .9780275 .0038913 -5.58 0.000 .9704303 .9856841
schlt12 | .8843861 .0857988 -1.27 0.205 .7312444 1.0696
schgt12 | 1.118658 .0930697 1.35 0.178 .9503406 1.316786
smsa | 1.212374 .0974117 2.40 0.017 1.035725 1.419152
bluecoll | .8159478 .0697624 -2.38 0.017 .6900584 .9648035
mining | .8883688 .1755808 -0.60 0.549 .603057 1.308664
constr | .9584913 .1037942 -0.39 0.695 .7751974 1.185125
transp | .8352933 .1311411 -1.15 0.252 .614045 1.13626
trade | .9663982 .0989216 -0.33 0.738 .7907263 1.181098
fire | 1.121157 .1567557 0.82 0.413 .8524222 1.474613
services | 1.203704 .1189167 1.88 0.061 .9918076 1.460871
pubadmin | 1.115084 .3307191 0.37 0.713 .6235232 1.994172
year85 | 1.242641 .110658 2.44 0.015 1.043628 1.479605
year87 | 1.428205 .1361051 3.74 0.000 1.184875 1.721505
year89 | 1.598515 .1781172 4.21 0.000 1.284903 1.988673
midatl | 1.027127 .1511211 0.18 0.856 .7698165 1.370444
encen | 1.00439 .1515525 0.03 0.977 .747248 1.35002
wncen | 1.189578 .1919987 1.08 0.282 .8669786 1.632215
southatl | 1.303098 .1549053 2.23 0.026 1.032265 1.644991
escen | 1.427739 .276716 1.84 0.066 .9765033 2.087486
wscen | 1.404099 .2025325 2.35 0.019 1.05832 1.862851
mountain | 1.00768 .1557029 0.05 0.961 .7443861 1.364103
pacific | 1.081002 .2594941 0.32 0.746 .6752989 1.730442
-------------+----------------------------------------------------------------
gamma | .002658 .0067759 0.39 0.695 -.0106225 .0159386
------------------------------------------------------------------------------
.
. * Cox regression
. stcox $xlist, robust
failure _d: censor1 == 1
analysis time _t: spell
Iteration 0: log pseudo-likelihood = -7981.9304
Iteration 1: log pseudo-likelihood = -7731.2822
Iteration 2: log pseudo-likelihood = -7717.3198
Iteration 3: log pseudo-likelihood = -7717.2334
Iteration 4: log pseudo-likelihood = -7717.2334
Refining estimates:
Iteration 0: log pseudo-likelihood = -7717.2334
Cox regression -- Breslow method for ties
No. of subjects =         3343                     Number of obs   =      3343
No. of failures =         1073
Time at risk    =        20887
                                                   Wald chi2(40)   =    540.98
Log pseudo-likelihood =   -7717.2334               Prob > chi2     =    0.0000

------------------------------------------------------------------------------
             |               Robust
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
RR | 1.685866 .962916 0.91 0.361 .5503545 5.164209
DR | .4711838 .3400769 -1.04 0.297 .1145079 1.938854
UI | .2677452 .0635331 -5.55 0.000 .168167 .4262877
RRUI | 2.416321 1.406577 1.52 0.130 .7720714 7.562264
DRUI | .9092495 .8890406 -0.10 0.922 .1337828 6.179678
LOGWAGE | 1.398309 .1547206 3.03 0.002 1.125691 1.73695
tenure | 1.000828 .0061337 0.14 0.893 .9888782 1.012922
slack | .7804668 .0562851 -3.44 0.001 .6775918 .8989608
abolpos | .8597068 .0778065 -1.67 0.095 .7199688 1.026567
explose | 1.205033 .0741989 3.03 0.002 1.068038 1.359599
stateur | .942662 .0208187 -2.67 0.008 .9027285 .9843619
houshead | 1.433597 .1139461 4.53 0.000 1.226793 1.675262
married | 1.431638 .106851 4.81 0.000 1.236811 1.657154
female | 1.105476 .0899059 1.23 0.218 .9425903 1.296509
child | .9611687 .0726033 -0.52 0.600 .8289013 1.114542
ychild | .8801492 .0851858 -1.32 0.187 .7280685 1.063997
nonwhite | .5275839 .0607424 -5.55 0.000 .4210076 .6611394
age | .9797456 .0036832 -5.44 0.000 .9725532 .9869912
schlt12 | .8850966 .0814354 -1.33 0.185 .7390501 1.060004
schgt12 | 1.116816 .0875072 1.41 0.159 .9578255 1.302197
smsa | 1.205005 .0923125 2.43 0.015 1.037004 1.400224
bluecoll | .8099341 .0654969 -2.61 0.009 .6912189 .9490384
mining | .8835344 .1684327 -0.65 0.516 .6080713 1.283785
constr | .9470011 .0974926 -0.53 0.597 .7739632 1.158726
transp | .8562733 .1255737 -1.06 0.290 .6423659 1.141412
trade | .9623999 .0931706 -0.40 0.692 .796068 1.163485
fire | 1.116009 .1451681 0.84 0.399 .8648584 1.440091
services | 1.181313 .1109851 1.77 0.076 .9826387 1.420155
pubadmin | 1.107605 .313432 0.36 0.718 .6360783 1.928677
year85 | 1.226497 .1041394 2.40 0.016 1.038467 1.448572
year87 | 1.402734 .1261218 3.76 0.000 1.176095 1.673046
year89 | 1.566206 .1643529 4.28 0.000 1.275047 1.92385
midatl | 1.034816 .1454072 0.24 0.808 .7856998 1.362918
encen | 1.017613 .1464205 0.12 0.903 .7675496 1.349146
wncen | 1.179507 .1807665 1.08 0.281 .8734718 1.592767
southatl | 1.286342 .1449884 2.23 0.025 1.031369 1.604348
escen | 1.41205 .2597913 1.88 0.061 .984565 2.025142
wscen | 1.3933 .1894611 2.44 0.015 1.067329 1.818826
mountain | 1.009529 .148262 0.06 0.949 .7570232 1.346259
pacific | 1.074687 .243238 0.32 0.750 .6896459 1.674702
------------------------------------------------------------------------------
.
. * Display results for Table 17.9 page 608
. * Not possible here as estimates table gives coefficients not hazard ratios
. * Instead need to use output for each model
. * Not sure why t-statistics differ somewhat from those in Table 17.9
.
. ********** CLOSE OUTPUT **********
. log close
log: c:\Imbook\bwebpage\Section4\mma17p4duration.txt
log type: text
closed on: 19 May 2005, 15:25:17

------------------------------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section4\mma18p1heterogeneity.txt
log type: text
opened on: 19 May 2005, 17:58:22
.
. ********** OVERVIEW OF MMA18P1HETEROGENEITY.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 18.8 Pages 632-6
. * Unobserved Heterogeneity with Duration data Example
. * (1) Exponential with and without heterogeneity
. *     Residuals Plots: Figures 18.2 (exp.wmf) and 18.3 (exp_gamma.wmf)
. *     Tabulate Model Estimates: Table 18.1
. * (2) Weibull with and without heterogeneity: Generalized Residuals Plots
. *     Residuals Plots: Figures 18.4 (Weibul16.wmf) and 18.5 (Weibul16_IG.wmf)
. *     Tabulate model Estimates: Table 18.2
.
. * To run this program you need data file
. * ema1996.dta
.
. ********** SETUP **********
.
. set more off
. version 8.0
. set scheme s1mono /* Used for graphs */
. set matsize 100
.
. ********** DATA DESCRIPTION **********
.
. * The data is from
. * B.P. McCall (1996), "Unemployment Insurance Rules, Joblessness,
. *   and Part-time Work," Econometrica, 64, 647-682.
.
. * There are 3343 observations from the CPS Displaced Worker Surveys
. * of 1986, 1988, 1990 and 1992 on 33 variables including
. * spell = length of spell in number of two-week intervals
. * CENSOR1 = 1 if re-employed at full-time job
.
. * See program mma17p4duration.do for further description of the data set
.
. ********** READ DATA **********
.
. use ema1996.dta
(Sample for 1996 EMA paper: part-time= worked part-time last week)
.
. ********** CREATE ADDITIONAL VARIABLES **********
.
. gen RR = reprate
. gen DR = disrate
. gen UI = ui
. gen RRUI = RR*UI
. gen DRUI = DR*UI
. gen LOGWAGE = logwage
. sum

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       spell |      3343    6.247981    5.611271          1         28
     censor1 |      3343    .3209692    .4669188          0          1
     censor2 |      3343    .1014059    .3019106          0          1
     censor3 |      3343    .1717021    .3771777          0          1
     censor4 |      3343    .3754113    .4843014          0          1
-------------+--------------------------------------------------------
          ui |      3343    .5527969    .4972791          0          1
     reprate |      3343    .4544717    .1137918       .066      2.059
     logwage |      3343    5.692994    .5356591    2.70805   7.600402
      tenure |      3343    4.114867    5.862322          0         40
     disrate |      3343    .1094376    .0735274       .002       1.02
-------------+--------------------------------------------------------
       slack |      3343    .4884834    .4999421          0          1
     abolpos |      3343    .1456775    .3528354          0          1
     explose |      3343    .5025426    .5000683          0          1
     stateur |      3343      6.5516    1.803825        2.5         13
    houshead |      3343    .6120251    .4873617          0          1
-------------+--------------------------------------------------------
     married |      3343    .5860006    .4926221          0          1
      female |      3343    .3478911    .4763725          0          1
       child |      3343    .4501944    .4975876          0          1
      ychild |      3343    .1956327    .3967463          0          1
    nonwhite |      3343    .1390966    .3460991          0          1
-------------+--------------------------------------------------------
         age |      3343    35.44331     10.6402         20         61
     schlt12 |      3343    .2811846    .4496446          0          1
     schgt12 |      3343    .3356267    .4722797          0          1
        smsa |      3343    .7241998    .4469835          0          1
    bluecoll |      3343    .6036494     .489212          0          1
-------------+--------------------------------------------------------
      mining |      3343     .029315    .1687132          0          1
      constr |      3343    .1480706    .3552231          0          1
      transp |      3343    .0646126    .2458778          0          1
       trade |      3343    .1848639    .3882452          0          1
        fire |      3343    .0514508    .2209484          0          1
-------------+--------------------------------------------------------
    services |      3343    .1699073    .3756075          0          1
    pubadmin |      3343    .0095722     .097383          0          1
      year85 |      3343    .2677236     .442839          0          1
      year87 |      3343    .2174693    .4125862          0          1
      year89 |      3343    .1998205    .3999251          0          1
-------------+--------------------------------------------------------
      midatl |      3343    .1088842    .3115405          0          1
       encen |      3343    .1429853    .3501103          0          1
       wncen |      3343    .0643135    .2453472          0          1
    southatl |      3343    .2375112    .4256217          0          1
       escen |      3343    .0532456    .2245564          0          1
-------------+--------------------------------------------------------
       wscen |      3343    .1441819    .3513266          0          1
    mountain |      3343    .1079868    .3104102          0          1
     pacific |      3343    .0260245     .159232          0          1
          RR |      3343    .4544717    .1137918       .066      2.059
          DR |      3343    .1094376    .0735274       .002       1.02
-------------+--------------------------------------------------------
          UI |      3343    .5527969    .4972791          0          1
        RRUI |      3343    .2478687    .2380667          0      2.059
        DRUI |      3343    .0602776    .0754261          0       .824
     LOGWAGE |      3343    5.692994    .5356591    2.70805   7.600402
.
. ********* ANALYSIS: UNEMPLOYMENT DURATION **********
.
. * Stata st curves require defining the dependent variable
. * and the censoring variable if there is one
. stset spell, fail(censor1=1)
failure event: censor1 == 1
obs. time interval: (0, spell]
exit on or before: failure
------------------------------------------------------------------------------
     3343  total obs.
        0  exclusions
------------------------------------------------------------------------------
     3343  obs. remaining, representing
     1073  failures in single record/single failure data
    20887  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =        28

. stdes
failure _d: censor1 == 1
analysis time _t: spell
                                   |-------------- per subject --------------|
Category                   total        mean         min     median        max
------------------------------------------------------------------------------
no. of subjects             3343
no. of records              3343           1           1          1          1

(first) entry time                         0           0          0          0
(final) exit time                   6.247981           1          5         28

subjects with gap              0
time on gap if gap             0
time at risk               20887    6.247981           1          5         28

failures                    1073    .3209692           0          0          1
------------------------------------------------------------------------------
.
. * Define $xlist = list of regressors used in subsequent regressions
. global xlist RR DR UI RRUI DRUI LOGWAGE /*
> */ tenure slack abolpos explose stateur houshead married /*
> */ female child ychild nonwhite age schlt12 schgt12 smsa bluecoll /*
> */ mining constr transp trade fire services pubadmin /*
> */ year85 year87 year89 midatl /*
> */ encen wncen southatl escen wscen mountain pacific
.
. * (1) EXPONENTIAL REGRESSION
.
. * Estimate exponential without heterogeneity
. streg $xlist, nolog nohr dist(exponential) robust
failure _d: censor1 == 1
analysis time _t: spell
Exponential regression -- log relative-hazard form
No. of subjects =         3343                     Number of obs   =      3343
No. of failures =         1073
Time at risk    =        20887
                                                   Wald chi2(40)   =    565.24
Log pseudo-likelihood =   -2700.6903               Prob > chi2     =    0.0000

------------------------------------------------------------------------------
             |               Robust
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
RR | .4720235 .6005534 0.79 0.432 -.7050396 1.649087
DR | -.5756396 .7624489 -0.75 0.450 -2.070012 .9187327
UI | -1.424561 .2493917 -5.71 0.000 -1.91336 -.9357622
RRUI | .9655904 .6118408 1.58 0.115 -.2335956 2.164776
DRUI | -.1990635 1.019118 -0.20 0.845 -2.196498 1.798371
LOGWAGE | .3508005 .115598 3.03 0.002 .1242327 .5773684
tenure | -.0001462 .0064637 -0.02 0.982 -.0128147 .0125224
slack | -.2593666 .0759363 -3.42 0.001 -.4081991 -.1105342
abolpos | -.1550897 .0953306 -1.63 0.104 -.3419342 .0317549
explose | .198458 .0648354 3.06 0.002 .071383 .3255331
stateur | -.064626 .0229903 -2.81 0.005 -.1096862 -.0195659
houshead | .3812208 .0836602 4.56 0.000 .2172499 .5451918
married | .369552 .0786145 4.70 0.000 .2154705 .5236335
female | .1164067 .0852986 1.36 0.172 -.0507754 .2835888
child | -.0333008 .0794577 -0.42 0.675 -.1890352 .1224335
ychild | -.1449722 .1022781 -1.42 0.156 -.3454336 .0554892
nonwhite | -.6692066 .1188272 -5.63 0.000 -.9021037 -.4363095
age | -.0220821 .0039256 -5.63 0.000 -.0297762 -.0143879
schlt12 | -.1231414 .0966102 -1.27 0.202 -.3124939 .066211
schgt12 | .1114395 .082945 1.34 0.179 -.0511297 .2740087
smsa | .1922291 .0799904 2.40 0.016 .0354508 .3490075
bluecoll | -.2033718 .085129 -2.39 0.017 -.3702215 -.036522
mining | -.1205818 .1973575 -0.61 0.541 -.5073955 .2662319
constr | -.04475 .1081519 -0.41 0.679 -.2567237 .1672238
transp | -.1786694 .156034 -1.15 0.252 -.4844906 .1271517
trade | -.0345159 .1019152 -0.34 0.735 -.234266 .1652341
fire | .1120549 .1386716 0.81 0.419 -.1597365 .3838462
services | .1840002 .0983911 1.87 0.061 -.0088428 .3768432
pubadmin | .1090606 .2954211 0.37 0.712 -.4699541 .6880752
year85 | .2147661 .0888664 2.42 0.016 .0405911 .388941
year87 | .3541162 .0948499 3.73 0.000 .1682139 .5400186
year89 | .467082 .1104355 4.23 0.000 .2506325 .6835316
midatl | .0264112 .1465647 0.18 0.857 -.2608503 .3136727
encen | .0043916 .1502813 0.03 0.977 -.2901544 .2989375
wncen | .1724311 .1607689 1.07 0.283 -.1426703 .4875324
southatl | .2638807 .1183726 2.23 0.026 .0318747 .4958867
escen | .35414 .19317 1.83 0.067 -.0244664 .7327463
wscen | .3385896 .1433308 2.36 0.018 .0576664 .6195128
mountain | .0063693 .1538821 0.04 0.967 -.2952341 .3079727
pacific | .0770202 .2393505 0.32 0.748 -.3920982 .5461385
_cons | -4.079107 .8767097 -4.65 0.000 -5.797426 -2.360788
------------------------------------------------------------------------------
. estimates store bexp
.
. * Figure 18.2 (p.633) - Generalized (Cox-Snell) Residuals for Exponential
. predict resid, csnell
. stset resid, fail(censor1)
failure event: censor1 != 0 & censor1 < .
obs. time interval: (0, resid]
exit on or before: failure
------------------------------------------------------------------------------
     3343  total obs.
        0  exclusions
------------------------------------------------------------------------------
     3343  obs. remaining, representing
     1073  failures in single record/single failure data
     1073  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =  5.218098
. sts generate survivor=s
. generate cumhaz = -ln(survivor)
. sort resid
. graph twoway (scatter cumhaz resid, c(J) msymbol(i) msize(small) clstyle(p1)) /*
> */ (scatter resid resid, c(l) msymbol(i) msize(small) clstyle(p2)), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Exponential Model Residuals") /*
> */ xtitle("Generalized (Cox-Snell) Residual", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Cumulative Hazard", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(6) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Cumulative Hazard") label(2 "45 degree line"))
. graph export exp.wmf, replace
(file c:\Imbook\bwebpage\Section4\exp.wmf written in Windows Metafile format)
. drop resid survivor cumhaz
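Note: the residual plot rests on the fact that, if the fitted model is correctly specified, the Cox-Snell residuals behave like a censored sample from the unit exponential distribution. Their estimated cumulative hazard, minus the log of the Kaplan-Meier survivor function, should then lie close to the 45 degree line. The Python sketch below mirrors the stset / sts generate / -ln(survivor) steps on a tiny made-up sample. The function km_cumhaz and the four data points are assumptions for illustration only and are not taken from ema1996.dta.

```python
import math
from itertools import groupby

def km_cumhaz(times, events):
    """Kaplan-Meier survivor and -ln(survivor) at each distinct failure time.

    times  : residuals (or durations) treated as analysis time
    events : 1 if the spell ended in failure, 0 if censored
    Returns a list of (time, survivor, cumulative hazard) tuples.
    """
    at_risk = len(times)
    surv = 1.0
    steps = []
    for t, group in groupby(sorted(zip(times, events)), key=lambda pair: pair[0]):
        flags = [e for _, e in group]
        deaths = sum(flags)
        if deaths:
            surv *= 1.0 - deaths / at_risk
            steps.append((t, surv, -math.log(surv) if surv > 0 else math.inf))
        at_risk -= len(flags)  # failures and censored spells both leave the risk set
    return steps

# Made-up residuals: the second spell is censored
resid = [0.2, 0.5, 0.9, 1.4]
failed = [1, 0, 1, 1]
steps = km_cumhaz(resid, failed)
for t, s, h in steps:
    print(t, s, h)
```

Plotting the third element of each tuple against the first, together with the line y = x, gives the same kind of picture as Figure 18.2.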
.
. * Estimate exponential with gamma heterogeneity
. stset spell, fail(censor1)
failure event: censor1 != 0 & censor1 < .
obs. time interval: (0, spell]
exit on or before: failure
------------------------------------------------------------------------------
     3343  total obs.
        0  exclusions
------------------------------------------------------------------------------
     3343  obs. remaining, representing
     1073  failures in single record/single failure data
    20887  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =        28

. streg $xlist, nolog nohr dist(exponential) frailty(gamma) robust


failure _d: censor1
analysis time _t: spell
Exponential regression -- log relative-hazard form
Gamma frailty
No. of subjects =         3343                     Number of obs   =      3343
No. of failures =         1073
Time at risk    =        20887
                                                   Wald chi2(40)   =    576.86
Log pseudo-likelihood =   -2695.3518               Prob > chi2     =    0.0000

------------------------------------------------------------------------------
             |               Robust
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
RR | .5005828 .6187508 0.81 0.419 -.7121465 1.713312
DR | -.8824469 .7894395 -1.12 0.264 -2.42972 .664826
UI | -1.584537 .2622252 -6.04 0.000 -2.098489 -1.070586
RRUI | 1.091168 .6327026 1.72 0.085 -.1489067 2.331242
DRUI | .0574048 1.047123 0.05 0.956 -1.994919 2.109729
LOGWAGE | .3792805 .1191278 3.18 0.001 .1457944 .6127666
tenure | .0007938 .0065903 0.12 0.904 -.012123 .0137106
slack | -.2862928 .0770348 -3.72 0.000 -.4372782 -.1353074
abolpos | -.1842749 .0977213 -1.89 0.059 -.3758051 .0072552
explose | .2151452 .0663117 3.24 0.001 .0851767 .3451137
stateur | -.0650451 .023552 -2.76 0.006 -.1112061 -.0188841
houshead | .3960399 .0847153 4.67 0.000 .2300009 .5620789
married | .3961194 .0806744 4.91 0.000 .2380005 .5542384
female | .1102564 .0869256 1.27 0.205 -.0601147 .2806275
child | -.0464355 .0815869 -0.57 0.569 -.206343 .113472
ychild | -.1213622 .103309 -1.17 0.240 -.3238441 .0811196
nonwhite | -.6909793 .1217489 -5.68 0.000 -.9296027 -.4523559
age | -.0225342 .0040184 -5.61 0.000 -.0304101 -.0146582
schlt12 | -.1513782 .0968026 -1.56 0.118 -.3411079 .0383515
schgt12 | .1011742 .0834622 1.21 0.225 -.0624088 .2647572
smsa | .212363 .081774 2.60 0.009 .052089 .372637
bluecoll | -.220439 .0862751 -2.56 0.011 -.3895351 -.0513429
mining | -.1721823 .2051663 -0.84 0.401 -.5743008 .2299362
constr | -.0897602 .11034 -0.81 0.416 -.3060225 .1265022
transp | -.1572488 .1563607 -1.01 0.315 -.4637102 .1492126
trade | -.0451107 .1034986 -0.44 0.663 -.2479642 .1577428
fire | .0881685 .1386688 0.64 0.525 -.1836175 .3599544
services | .1682835 .1005405 1.67 0.094 -.0287723 .3653393
pubadmin | .0961407 .3092103 0.31 0.756 -.5099004 .7021817
year85 | .1940199 .0906564 2.14 0.032 .0163366 .3717031
year87 | .3564373 .0959014 3.72 0.000 .1684741 .5444005
year89 | .4924007 .1101907 4.47 0.000 .2764308 .7083705
midatl | .0156736 .1488094 0.11 0.916 -.2759874 .3073347
encen | .0089345 .1538505 0.06 0.954 -.2926069 .3104759
wncen | .1742124 .1634726 1.07 0.287 -.1461881 .4946129
southatl | .2676635 .1192515 2.24 0.025 .0339348 .5013922
escen | .3741169 .199389 1.88 0.061 -.0166783 .7649121
wscen | .361461 .1423856 2.54 0.011 .0823903 .6405316
mountain | -.00019 .1557385 -0.00 0.999 -.3054318 .3050519
pacific | .0800478 .2463547 0.32 0.745 -.4027986 .5628941
_cons | -4.095067 .9086039 -4.51 0.000 -5.875898 -2.314236
-------------+----------------------------------------------------------------
/ln_the | -1.462995 .31608 -4.63 0.000 -2.0825 -.8434894
-------------+----------------------------------------------------------------
theta | .2315418 .0731857 .1246183 .4302067
------------------------------------------------------------------------------
. estimates store bexpgamma
.
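Note: theta is the variance of the gamma frailty, which is normalized to have mean one. Integrating the frailty out of a model with cumulative hazard H(t) gives the unconditional survivor function S(t) = [1 + theta H(t)]^(-1/theta), which tends to exp(-H(t)) as theta goes to zero. The Python sketch below evaluates this at the estimated theta. The function name frailty_survivor and the choice H(t) = 1 are illustrative assumptions.

```python
import math

def frailty_survivor(cum_hazard, theta):
    """Unconditional survivor under mean-one gamma frailty with variance theta.

    Computes (1 + theta * H) ** (-1 / theta), using log1p for numerical stability.
    """
    return math.exp(-math.log1p(theta * cum_hazard) / theta)

# Frailty variance recovered from the /ln_the row of the output
theta_hat = math.exp(-1.462995)

# At H(t) = 1 the no-frailty survivor is exp(-1); frailty raises it
print(round(theta_hat, 6), frailty_survivor(1.0, theta_hat), math.exp(-1.0))
```

The unconditional survivor exceeds exp(-H(t)) for any theta > 0, because spells with low frailty are over-represented among those still in progress.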
. * Figure 18.3 (p.633) - Generalized (Cox-Snell) Residuals for Exponential-Gamma
. predict resid, csnell
(option unconditional assumed)
. stset resid, fail(censor1)
failure event: censor1 != 0 & censor1 < .
obs. time interval: (0, resid]
exit on or before: failure
------------------------------------------------------------------------------
     3343  total obs.
        0  exclusions
------------------------------------------------------------------------------
     3343  obs. remaining, representing
     1073  failures in single record/single failure data
     1073  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =  3.971096
. sts generate survivor=s
. generate cumhaz = -ln(survivor)
. sort resid
. graph twoway (scatter cumhaz resid, c(J) msymbol(i) msize(small) clstyle(p1)) /*
> */ (scatter resid resid, c(l) msymbol(i) msize(small) clstyle(p2)), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Exponential-Gamma Model Residuals") /*
> */ xtitle("Generalized (Cox-Snell) Residual", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Cumulative Hazard", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(6) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Cumulative Hazard") label(2 "45 degree line"))
. graph export exp_gamma.wmf, replace
(file c:\Imbook\bwebpage\Section4\exp_gamma.wmf written in Windows Metafile format)
. drop resid survivor cumhaz
.
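. * Added sketch (not part of the original run): why the 45 degree line is the benchmark.
. * If the fitted model is correctly specified, the Cox-Snell residuals behave like a
. * censored sample from a unit exponential, whose cumulative hazard is H(e) = e.
. * The variant below is an assumption, not the authors' code: it takes the
. * Nelson-Aalen cumulative hazard directly instead of -ln(Kaplan-Meier survivor).
. /*
> * assumes a streg model has just been fitted on the stset spell data
> predict resid, csnell
> stset resid, fail(censor1)
> sts generate cumhaz = na
> sort resid
> graph twoway (scatter cumhaz resid, c(J) msymbol(i)) /*
> */ (scatter resid resid, c(l) msymbol(i))
> drop resid cumhaz
> * re-declare the original duration before fitting the next model
> stset spell, fail(censor1=1)
> */
.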
. /*
> * Following did not work, even with starting values provided
> * Results in book obtained on different computer with different Stata version
> * Estimate exponential with IG heterogeneity
> stset spell, fail(censor1=1)
> quietly streg $xlist, nolog nohr dist(exponential) robust
> matrix theta = 1.6
> matrix bstart = e(b),theta
> streg $xlist, nohr dist(exponential) frailty(invgauss) robust from(bstart)
> * estimates store bexpIG
> */
.
. * Table 18.1 (p.634) - Display Parameter Estimates
. * Note that exponential-IG is missing
. estimates table bexp bexpgamma, t(%9.3f) stats(N ll) b(%9.3f) /*
> */ keep(RR DR UI RRUI DRUI LOGWAGE _cons)
--------------------------------------
    Variable |    bexp     bexpgamma
-------------+------------------------
          RR |     0.472       0.501
             |     0.786       0.809
          DR |    -0.576      -0.882
             |    -0.755      -1.118
          UI |    -1.425      -1.585
             |    -5.712      -6.043
        RRUI |     0.966       1.091
             |     1.578       1.725
        DRUI |    -0.199       0.057
             |    -0.195       0.055
     LOGWAGE |     0.351       0.379
             |     3.035       3.184
       _cons |    -4.079      -4.095
             |    -4.653      -4.507
-------------+------------------------
           N |  3343.000    3343.000
          ll | -2700.690   -2695.352
--------------------------------------
                          legend: b/t
.
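. * Added sketch (not part of the original run): two hand checks on Table 18.1.
. * It assumes the values printed above; display is used only as a calculator.
. /*
> * the gamma variance is estimated on the log scale, so theta = exp(/ln_the)
> display exp(-1.462995)
> * twice the gap in log pseudo-likelihoods, bexpgamma versus bexp
> display 2*(-2695.352 - (-2700.690))
> * Caution: with the robust option these are pseudo-likelihoods, and theta = 0
> * lies on the boundary of the parameter space, so a chi-squared(1) reference
> * distribution is at best a rough guide for the second number.
> */
.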
. * (2) WEIBULL REGRESSION
.
. * Estimate Weibull without heterogeneity
. stset spell, fail(censor1=1)
failure event: censor1 == 1
obs. time interval: (0, spell]
exit on or before: failure
------------------------------------------------------------------------------
     3343  total obs.
        0  exclusions
------------------------------------------------------------------------------
     3343  obs. remaining, representing
     1073  failures in single record/single failure data
    20887  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =        28
. streg $xlist, nolog nohr dist(weibull) robust
failure _d: censor1 == 1
analysis time _t: spell
Weibull regression -- log relative-hazard form
No. of subjects =         3343                     Number of obs   =      3343
No. of failures =         1073
Time at risk    =        20887
                                                   Wald chi2(40)   =    501.65
Log pseudo-likelihood =   -2687.5995               Prob > chi2     =    0.0000
------------------------------------------------------------------------------
             |               Robust
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
RR | .4481156 .6381895 0.70 0.483 -.8027127 1.698944
DR | -.4269187 .8086983 -0.53 0.598 -2.011938 1.158101
UI | -1.496066 .2639679 -5.67 0.000 -2.013434 -.9786984
RRUI | 1.015226 .6455611 1.57 0.116 -.2500501 2.280503
DRUI | -.2988417 1.065384 -0.28 0.779 -2.386956 1.789272
LOGWAGE | .3655253 .12212 2.99 0.003 .1261745 .6048761
tenure | -.0011127 .0068716 -0.16 0.871 -.0145809 .0123554
slack | -.2652154 .0803214 -3.30 0.001 -.4226424 -.1077883
abolpos | -.1604227 .1012942 -1.58 0.113 -.3589557 .0381103
explose | .2075085 .0684715 3.03 0.002 .0733068 .3417103
stateur | -.0708745 .0242117 -2.93 0.003 -.1183286 -.0234204
houshead | .3976626 .0887192 4.48 0.000 .2237762 .571549
married | .3786057 .0830317 4.56 0.000 .2158665 .541345
female | .1260829 .0896987 1.41 0.160 -.0497233 .301889
child | -.0336778 .0839956 -0.40 0.688 -.1983061 .1309505
ychild | -.1613066 .108947 -1.48 0.139 -.3748389 .0522256
nonwhite | -.7025504 .12426 -5.65 0.000 -.9460956 -.4590052
age | -.0235823 .0041922 -5.63 0.000 -.0317989 -.0153658
schlt12 | -.1226759 .1022762 -1.20 0.230 -.3231335 .0777816
schgt12 | .1162848 .0880692 1.32 0.187 -.0563278 .2888973
smsa | .1999567 .0841129 2.38 0.017 .0350985 .3648149
bluecoll | -.1994925 .0899354 -2.22 0.027 -.3757626 -.0232223
mining | -.1015676 .2036644 -0.50 0.618 -.5007425 .2976073
constr | -.0253737 .1135609 -0.22 0.823 -.247949 .1972016
transp | -.1981522 .1672141 -1.19 0.236 -.5258858 .1295814
trade | -.0311361 .1079502 -0.29 0.773 -.2427146 .1804423
fire | .1262153 .1492527 0.85 0.398 -.1663145 .4187452
services | .2031673 .1038945 1.96 0.051 -.0004622 .4067968
pubadmin | .1117728 .3087374 0.36 0.717 -.4933415 .716887
year85 | .2374972 .093387 2.54 0.011 .054462 .4205325
year87 | .3787397 .1011782 3.74 0.000 .1804341 .5770454
year89 | .4920278 .1180472 4.17 0.000 .2606596 .7233959
midatl | .02465 .1542139 0.16 0.873 -.2776037 .3269036
encen | -.0014111 .1579065 -0.01 0.993 -.3109023 .30808
wncen | .1844363 .1694444 1.09 0.276 -.1476687 .5165413
southatl | .2740974 .1250481 2.19 0.028 .0290076 .5191872
escen | .367742 .2024771 1.82 0.069 -.0291058 .7645899
wscen | .3440005 .1527804 2.25 0.024 .0445563 .6434446
mountain | .0159627 .1620188 0.10 0.922 -.3015883 .3335136
pacific | .0849532 .2504077 0.34 0.734 -.4058368 .5757432
_cons | -4.357886 .9196792 -4.74 0.000 -6.160424 -2.555347
-------------+----------------------------------------------------------------
/ln_p | .1215314 .0194374 6.25 0.000 .0834348 .1596281
-------------+----------------------------------------------------------------
p | 1.129225 .0219492 1.087014 1.173075
1/p | .8855632 .0172131 .8524608 .9199511
------------------------------------------------------------------------------
. estimates store bweib
.
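. * Added sketch (not part of the original run): reading the Weibull shape parameter.
. * Stata estimates /ln_p and reports p = exp(/ln_p). A value p > 1 implies a hazard
. * that rises with elapsed duration, p < 1 a falling hazard, and p = 1 reduces the
. * Weibull to the exponential. The line below only recomputes p from /ln_p above.
. /*
> display exp(.1215314)
> */
.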
. * Figure 18.4 (p.635) - Generalized (Cox-Snell) Residuals for Weibull
. predict resid, csnell
. stset resid, fail(censor1)
failure event: censor1 != 0 & censor1 < .
obs. time interval: (0, resid]
exit on or before: failure
------------------------------------------------------------------------------
     3343  total obs.
        0  exclusions
------------------------------------------------------------------------------
     3343  obs. remaining, representing
     1073  failures in single record/single failure data
     1073  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =  6.283261
. sts generate survivor=s
. generate cumhaz = -ln(survivor)
. sort resid
. graph twoway (scatter cumhaz resid, c(J) msymbol(i) msize(small) clstyle(p1)) /*
> */ (scatter resid resid, c(l) msymbol(i) msize(small) clstyle(p2)), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Weibull Model Residuals") /*
> */ xtitle("Generalized (Cox-Snell) Residual", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Cumulative Hazard", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(6) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Cumulative Hazard") label(2 "45 degree line"))
. graph export Weibul16.wmf, replace
(file c:\Imbook\bwebpage\Section4\Weibul16.wmf written in Windows Metafile format)
. drop resid survivor cumhaz
.
. * Estimate Weibull with IG (inverse-Gaussian) heterogeneity
. stset spell, fail(censor1=1)
failure event: censor1 == 1
obs. time interval: (0, spell]
exit on or before: failure
------------------------------------------------------------------------------
     3343  total obs.
        0  exclusions
------------------------------------------------------------------------------
     3343  obs. remaining, representing
     1073  failures in single record/single failure data
    20887  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =        28
. streg $xlist, nolog nohr dist(weibull) frailty(invgauss) robust
failure _d: censor1 == 1
analysis time _t: spell
Weibull regression -- log relative-hazard form
Inverse-Gaussian frailty
No. of subjects =         3343                     Number of obs   =      3343
No. of failures =         1073
Time at risk    =        20887
                                                   Wald chi2(40)   =    643.00
Log pseudo-likelihood =   -2616.3216               Prob > chi2     =    0.0000
------------------------------------------------------------------------------
             |               Robust
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
RR | .7356277 .9058181 0.81 0.417 -1.039743 2.510998
DR | -1.072566 1.149098 -0.93 0.351 -3.324758 1.179625
UI | -2.574752 .3843798 -6.70 0.000 -3.328123 -1.821381
RRUI | 1.733571 .9333928 1.86 0.063 -.0958458 3.562987
DRUI | -.060621 1.537813 -0.04 0.969 -3.07468 2.953438
LOGWAGE | .575656 .1766599 3.26 0.001 .2294089 .9219031
tenure | -.0009848 .0097472 -0.10 0.920 -.0200889 .0181194
slack | -.4416007 .1142976 -3.86 0.000 -.6656199 -.2175814
abolpos | -.2873066 .1465357 -1.96 0.050 -.5745113 -.0001019
explose | .3641943 .0976897 3.73 0.000 .1727259 .5556627
stateur | -.0981133 .0346763 -2.83 0.005 -.1660775 -.030149
houshead | .5924383 .1256739 4.71 0.000 .3461219 .8387546
married | .6083214 .1183487 5.14 0.000 .3763624 .8402805
female | .1788439 .1285074 1.39 0.164 -.0730259 .4307137
child | -.0914227 .121778 -0.75 0.453 -.3301031 .1472578
ychild | -.1805373 .1527477 -1.18 0.237 -.4799173 .1188426
nonwhite | -1.008517 .1725174 -5.85 0.000 -1.346645 -.6703894
age | -.0333776 .0059183 -5.64 0.000 -.0449772 -.0217779
schlt12 | -.2258621 .1439543 -1.57 0.117 -.5080075 .0562832
schgt12 | .1505129 .124469 1.21 0.227 -.0934418 .3944677
smsa | .3009952 .119907 2.51 0.012 .0659819 .5360086
bluecoll | -.3211857 .1253163 -2.56 0.010 -.5668012 -.0755702
mining | -.2319827 .3008491 -0.77 0.441 -.8216361 .3576708
constr | -.1260324 .1633669 -0.77 0.440 -.4462257 .1941609
transp | -.2763858 .225893 -1.22 0.221 -.7191279 .1663562
trade | -.0687616 .1518284 -0.45 0.651 -.3663399 .2288166
fire | .0668973 .2131814 0.31 0.754 -.3509306 .4847252
services | .231914 .1494712 1.55 0.121 -.0610441 .5248721
pubadmin | .0901949 .4579252 0.20 0.844 -.807322 .9877117
year85 | .2780139 .1339053 2.08 0.038 .0155644 .5404634
year87 | .5208783 .1415375 3.68 0.000 .2434699 .7982867
year89 | .7209598 .1655487 4.35 0.000 .3964903 1.045429
midatl | -.0192077 .2222646 -0.09 0.931 -.4548382 .4164228
encen | -.0297055 .2284931 -0.13 0.897 -.4775438 .4181328
wncen | .2460338 .24216 1.02 0.310 -.2285911 .7206586
southatl | .3563643 .1793284 1.99 0.047 .0048872 .7078415
escen | .5461543 .2910193 1.88 0.061 -.024233 1.116542
wscen | .4606814 .2140966 2.15 0.031 .0410598 .880303
mountain | .017581 .2293804 0.08 0.939 -.4319963 .4671584
pacific | .1379886 .3636985 0.38 0.704 -.5748475 .8508247
_cons | -5.303059 1.34133 -3.95 0.000 -7.932017 -2.6741
-------------+----------------------------------------------------------------
/ln_p | .5611667 .0225898 24.84 0.000 .5168915 .6054418
/ln_the | 1.852696 .0896755 20.66 0.000 1.676935 2.028457
-------------+----------------------------------------------------------------
p | 1.752716 .0395935 1.676807 1.832062
1/p | .570543 .0128884 .5458332 .5963715
theta | 6.376987 .5718595 5.349136 7.602343
------------------------------------------------------------------------------
. estimates store bweibIG
.
. * Figure 18.5 (p.636) - Generalized (Cox-Snell) Residuals for Weibull-IG
. predict resid, csnell
(option unconditional assumed)
. stset resid, fail(censor1)
failure event: censor1 != 0 & censor1 < .
obs. time interval: (0, resid]
exit on or before: failure
------------------------------------------------------------------------------
     3343  total obs.
        0  exclusions
------------------------------------------------------------------------------
     3343  obs. remaining, representing
     1073  failures in single record/single failure data
     1073  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =  5.044588
. sts generate survivor=s
. generate cumhaz = -ln(survivor)
. sort resid
. graph twoway (scatter cumhaz resid, c(J) msymbol(i) msize(small) clstyle(p1)) /*
> */ (scatter resid resid, c(l) msymbol(i) msize(small) clstyle(p2)), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Weibull-IG Model Residuals") /*
> */ xtitle("Generalized (Cox-Snell) Residual", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Cumulative Hazard", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(6) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Cumulative Hazard") label(2 "45 degree line"))
. graph export Weibul16_IG.wmf, replace
(file c:\Imbook\bwebpage\Section4\Weibul16_IG.wmf written in Windows Metafile format)
. drop resid survivor cumhaz
.
. * Table 18.2 (p.635) - Display Parameter Estimates
. estimates table bweibIG bweib, t(%9.3f) stats(N ll) b(%9.3f) /*
> */ keep(RR DR UI RRUI DRUI LOGWAGE _cons)
--------------------------------------
    Variable |  bweibIG      bweib
-------------+------------------------
          RR |     0.736       0.448
             |     0.812       0.702
          DR |    -1.073      -0.427
             |    -0.933      -0.528
          UI |    -2.575      -1.496
             |    -6.698      -5.668
        RRUI |     1.734       1.015
             |     1.857       1.573
        DRUI |    -0.061      -0.299
             |    -0.039      -0.281
     LOGWAGE |     0.576       0.366
             |     3.259       2.993
       _cons |    -5.303      -4.358
             |    -3.954      -4.738
-------------+------------------------
           N |  3343.000    3343.000
          ll | -2616.322   -2687.600
--------------------------------------
                          legend: b/t
.
. ********** CLOSE OUTPUT **********
. log close
log: c:\Imbook\bwebpage\Section4\mma18p1heterogeneity.txt
log type: text
closed on: 19 May 2005, 17:58:38

------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section4\mma19p1comprisks.txt
log type: text
opened on: 19 May 2005, 17:52:44
.
. ********** OVERVIEW OF MMA19P1COMPRISKS.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 19.5 pages 658-62
. * Competing Risks Example with a censoring mechanism for each of the three risks
. * (1A) Table 19.2 p.659 Exponential
. * (1B) Table 19.2 p.659 Exponential with IG frailty
. * (2A) Table 19.3 p.659 Weibull
. * (2B) Table 19.3 p.659 Weibull with IG frailty
. * (2C) Table 19.3 p.660 Cox model
. * (2D) Graph the resulting Cox baseline survival and cumulative hazards
. *      Figure 19.1: (combined_bsf.wmf) baseline survival functions
. *      Figure 19.2: (combined_cbh.wmf) baseline cumulative hazards
.
. * To run this program you need data file
. * ema1996.dta
.
. * NOTE: The IG Heterogeneity estimation was unsuccessful for exponential
. *       but successful for Weibull
.
. ********** SETUP **********
.
. set more off
. version 8
. set scheme s1mono /* Used for graphs */
. set matsize 80 /* Needed for this program */
.
. ********** DATA DESCRIPTION **********
.
. * The data is from
. * B.P. McCall (1996), "Unemployment Insurance Rules, Joblessness,
. *   and Part-time Work," Econometrica, 64, 647-682.
.
. * There are 3343 observations from the CPS Displaced Worker Surveys
. * of 1986, 1988, 1990 and 1992 on 33 variables including
. * spell = length of spell in number of two-week intervals
. * CENSOR1 = 1 if re-employed at full-time job
. * CENSOR2 = 1 if re-employed at part-time job
. * CENSOR3 = 1 if re-employed but left job: pt-ft status unknown
. * CENSOR4 = 1 if still jobless
.
. * See program mma17p4duration.do for further description of the data set
.
. ********** READ DATA and CREATE ADDITIONAL VARIABLES **********
.
. use ema1996.dta
(Sample for 1996 EMA paper: part-time= worked part-time last week)
.
. gen RR = reprate
. gen DR = disrate
. gen UI = ui
. gen RRUI = RR*UI
. gen DRUI = DR*UI
. gen LOGWAGE = logwage
. sum
    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+--------------------------------------------------------
       spell |      3343    6.247981    5.611271          1         28
     censor1 |      3343    .3209692    .4669188          0          1
     censor2 |      3343    .1014059    .3019106          0          1
     censor3 |      3343    .1717021    .3771777          0          1
     censor4 |      3343    .3754113    .4843014          0          1
-------------+--------------------------------------------------------
          ui |      3343    .5527969    .4972791          0          1
     reprate |      3343    .4544717    .1137918       .066      2.059
     logwage |      3343    5.692994    .5356591    2.70805   7.600402
      tenure |      3343    4.114867    5.862322          0         40
     disrate |      3343    .1094376    .0735274       .002       1.02
-------------+--------------------------------------------------------
       slack |      3343    .4884834    .4999421          0          1
     abolpos |      3343    .1456775    .3528354          0          1
     explose |      3343    .5025426    .5000683          0          1
     stateur |      3343      6.5516    1.803825        2.5         13
    houshead |      3343    .6120251    .4873617          0          1
-------------+--------------------------------------------------------
     married |      3343    .5860006    .4926221          0          1
      female |      3343    .3478911    .4763725          0          1
       child |      3343    .4501944    .4975876          0          1
      ychild |      3343    .1956327    .3967463          0          1
    nonwhite |      3343    .1390966    .3460991          0          1
-------------+--------------------------------------------------------
         age |      3343    35.44331     10.6402         20         61
     schlt12 |      3343    .2811846    .4496446          0          1
     schgt12 |      3343    .3356267    .4722797          0          1
        smsa |      3343    .7241998    .4469835          0          1
    bluecoll |      3343    .6036494     .489212          0          1
-------------+--------------------------------------------------------
      mining |      3343     .029315    .1687132          0          1
      constr |      3343    .1480706    .3552231          0          1
      transp |      3343    .0646126    .2458778          0          1
       trade |      3343    .1848639    .3882452          0          1
        fire |      3343    .0514508    .2209484          0          1
-------------+--------------------------------------------------------
    services |      3343    .1699073    .3756075          0          1
    pubadmin |      3343    .0095722     .097383          0          1
      year85 |      3343    .2677236     .442839          0          1
      year87 |      3343    .2174693    .4125862          0          1
      year89 |      3343    .1998205    .3999251          0          1
-------------+--------------------------------------------------------
      midatl |      3343    .1088842    .3115405          0          1
       encen |      3343    .1429853    .3501103          0          1
       wncen |      3343    .0643135    .2453472          0          1
    southatl |      3343    .2375112    .4256217          0          1
       escen |      3343    .0532456    .2245564          0          1
-------------+--------------------------------------------------------
       wscen |      3343    .1441819    .3513266          0          1
    mountain |      3343    .1079868    .3104102          0          1
     pacific |      3343    .0260245     .159232          0          1
          RR |      3343    .4544717    .1137918       .066      2.059
          DR |      3343    .1094376    .0735274       .002       1.02
-------------+--------------------------------------------------------
          UI |      3343    .5527969    .4972791          0          1
        RRUI |      3343    .2478687    .2380667          0      2.059
        DRUI |      3343    .0602776    .0754261          0       .824
     LOGWAGE |      3343    5.692994    .5356591    2.70805   7.600402
.
. ********* COMPETING RISKS FOR UNEMPLOYMENT DURATION **********
.
. * Stata analysis requires using stset to define the dependent variable
. * and the censoring variable if there is one
.
. * For the competing risks model there are three censoring variables
. * CENSOR1 = 1 if re-employed at full-time job
. * CENSOR2 = 1 if re-employed at part-time job
. * CENSOR3 = 1 if re-employed but left job: pt-ft status unknown
.
. * Define $xlist = list of regressors used in subsequent regressions
. global xlist RR DR UI RRUI DRUI LOGWAGE /*
> */ tenure slack abolpos explose stateur houshead married /*
> */ female child ychild nonwhite age schlt12 schgt12 smsa bluecoll /*
> */ mining constr transp trade fire services pubadmin /*
> */ year85 year87 year89 midatl /*
> */ encen wncen southatl escen wscen mountain pacific

.
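. * Added sketch (not part of the original run): the three risk-specific fits that
. * follow repeat one pattern. For risk j the failure indicator is censorj, and
. * exits to the other risks are treated as censored. Assuming $xlist is defined
. * as above, the same estimates could be produced in a loop:
. /*
> forvalues j = 1/3 {
>     stset spell, fail(censor`j'=1)
>     streg $xlist, nolog nohr robust dist(exponential)
>     estimates store bexpr`j'
> }
> */
.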
. *** (1A) EXPONENTIAL WITH NO HETEROGENEITY Table 19.2
.
. stset spell, fail(censor1=1)
failure event: censor1 == 1
obs. time interval: (0, spell]
exit on or before: failure
------------------------------------------------------------------------------
     3343  total obs.
        0  exclusions
------------------------------------------------------------------------------
     3343  obs. remaining, representing
     1073  failures in single record/single failure data
    20887  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =        28
. streg $xlist, nolog nohr robust dist(exponential)
failure _d: censor1 == 1
analysis time _t: spell
Exponential regression -- log relative-hazard form
No. of subjects =         3343                     Number of obs   =      3343
No. of failures =         1073
Time at risk    =        20887
                                                   Wald chi2(40)   =    565.24
Log pseudo-likelihood =   -2700.6903               Prob > chi2     =    0.0000
------------------------------------------------------------------------------
             |               Robust
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
RR | .4720235 .6005534 0.79 0.432 -.7050396 1.649087
DR | -.5756396 .7624489 -0.75 0.450 -2.070012 .9187327
UI | -1.424561 .2493917 -5.71 0.000 -1.91336 -.9357622
RRUI | .9655904 .6118408 1.58 0.115 -.2335956 2.164776
DRUI | -.1990635 1.019118 -0.20 0.845 -2.196498 1.798371
LOGWAGE | .3508005 .115598 3.03 0.002 .1242327 .5773684
tenure | -.0001462 .0064637 -0.02 0.982 -.0128147 .0125224
slack | -.2593666 .0759363 -3.42 0.001 -.4081991 -.1105342
abolpos | -.1550897 .0953306 -1.63 0.104 -.3419342 .0317549
explose | .198458 .0648354 3.06 0.002 .071383 .3255331
stateur | -.064626 .0229903 -2.81 0.005 -.1096862 -.0195659
houshead | .3812208 .0836602 4.56 0.000 .2172499 .5451918
married | .369552 .0786145 4.70 0.000 .2154705 .5236335
female | .1164067 .0852986 1.36 0.172 -.0507754 .2835888
child | -.0333008 .0794577 -0.42 0.675 -.1890352 .1224335
ychild | -.1449722 .1022781 -1.42 0.156 -.3454336 .0554892
nonwhite | -.6692066 .1188272 -5.63 0.000 -.9021037 -.4363095
age | -.0220821 .0039256 -5.63 0.000 -.0297762 -.0143879
schlt12 | -.1231414 .0966102 -1.27 0.202 -.3124939 .066211
schgt12 | .1114395 .082945 1.34 0.179 -.0511297 .2740087
smsa | .1922291 .0799904 2.40 0.016 .0354508 .3490075
bluecoll | -.2033718 .085129 -2.39 0.017 -.3702215 -.036522
mining | -.1205818 .1973575 -0.61 0.541 -.5073955 .2662319
constr | -.04475 .1081519 -0.41 0.679 -.2567237 .1672238
transp | -.1786694 .156034 -1.15 0.252 -.4844906 .1271517
trade | -.0345159 .1019152 -0.34 0.735 -.234266 .1652341
fire | .1120549 .1386716 0.81 0.419 -.1597365 .3838462
services | .1840002 .0983911 1.87 0.061 -.0088428 .3768432
pubadmin | .1090606 .2954211 0.37 0.712 -.4699541 .6880752
year85 | .2147661 .0888664 2.42 0.016 .0405911 .388941
year87 | .3541162 .0948499 3.73 0.000 .1682139 .5400186
year89 | .467082 .1104355 4.23 0.000 .2506325 .6835316
midatl | .0264112 .1465647 0.18 0.857 -.2608503 .3136727
encen | .0043916 .1502813 0.03 0.977 -.2901544 .2989375
wncen | .1724311 .1607689 1.07 0.283 -.1426703 .4875324
southatl | .2638807 .1183726 2.23 0.026 .0318747 .4958867
escen | .35414 .19317 1.83 0.067 -.0244664 .7327463
wscen | .3385896 .1433308 2.36 0.018 .0576664 .6195128
mountain | .0063693 .1538821 0.04 0.967 -.2952341 .3079727
pacific | .0770202 .2393505 0.32 0.748 -.3920982 .5461385
_cons | -4.079107 .8767097 -4.65 0.000 -5.797426 -2.360788
------------------------------------------------------------------------------
. estimates store bexpr1
.
. stset spell, fail(censor2=1)
failure event: censor2 == 1
obs. time interval: (0, spell]
exit on or before: failure
------------------------------------------------------------------------------
     3343  total obs.
        0  exclusions
------------------------------------------------------------------------------
     3343  obs. remaining, representing
      339  failures in single record/single failure data
    20887  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =        28
. streg $xlist, nolog nohr robust dist(exponential)
failure _d: censor2 == 1
analysis time _t: spell
Exponential regression -- log relative-hazard form
No. of subjects =         3343                     Number of obs   =      3343
No. of failures =          339
Time at risk    =        20887
                                                   Wald chi2(40)   =    227.08
Log pseudo-likelihood =   -1250.5446               Prob > chi2     =    0.0000
------------------------------------------------------------------------------
             |               Robust
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
RR | -.0928628 .9761428 -0.10 0.924 -2.006068 1.820342
DR | -.9600127 1.246692 -0.77 0.441 -3.403483 1.483458
UI | -1.047747 .5236826 -2.00 0.045 -2.074146 -.021348
RRUI | -.6698307 1.191869 -0.56 0.574 -3.005851 1.666189
DRUI | 1.987208 1.726509 1.15 0.250 -1.396688 5.371105
LOGWAGE | -.2577715 .1793075 -1.44 0.151 -.6092077 .0936646
tenure | .0053684 .0125538 0.43 0.669 -.0192366 .0299734
slack | -.2636908 .1311029 -2.01 0.044 -.5206477 -.0067339
abolpos | -.5626836 .202701 -2.78 0.006 -.9599703 -.1653969
explose | .0490271 .1130116 0.43 0.664 -.1724715 .2705258
stateur | -.1032439 .0406788 -2.54 0.011 -.182973 -.0235148
houshead | -.073544 .1343412 -0.55 0.584 -.3368479 .18976
married | -.0618813 .1339552 -0.46 0.644 -.3244287 .2006661
female | .4531912 .1384047 3.27 0.001 .181923 .7244594
child | -.2164986 .1452571 -1.49 0.136 -.5011973 .0682002
ychild | .149031 .1815684 0.82 0.412 -.2068365 .5048986
nonwhite | -.4563527 .1820135 -2.51 0.012 -.8130927 -.0996127
age | -.001781 .0064207 -0.28 0.781 -.0143653 .0108033
schlt12 | -.1803101 .1661528 -1.09 0.278 -.5059636 .1453433
schgt12 | -.0534463 .1462829 -0.37 0.715 -.3401555 .2332629
smsa | .1295376 .1384588 0.94 0.349 -.1418367 .400912
bluecoll | .0088207 .1510547 0.06 0.953 -.2872411 .3048825
mining | -.0141252 .4078632 -0.03 0.972 -.8135225 .785272
constr | .1867498 .1896106 0.98 0.325 -.1848802 .5583799
transp | -.402533 .2898061 -1.39 0.165 -.9705426 .1654766
trade | .1106678 .1735195 0.64 0.524 -.2294241 .4507598
fire | -.3396026 .3006096 -1.13 0.259 -.9287865 .2495813
services | .1619867 .1705571 0.95 0.342 -.172299 .4962724
pubadmin | .7445446 .5413463 1.38 0.169 -.3164746 1.805564
year85 | -.0548375 .149323 -0.37 0.713 -.3475052 .2378301
year87 | -.12113 .1616797 -0.75 0.454 -.4380164 .1957563
year89 | .1244437 .1950397 0.64 0.523 -.257827 .5067144
midatl | -.3969537 .2577568 -1.54 0.124 -.9021477 .1082403
encen | -.5115788 .2576815 -1.99 0.047 -1.016625 -.0065323
wncen | -.0674875 .257402 -0.26 0.793 -.5719862 .4370113
southatl | -.2719375 .1944647 -1.40 0.162 -.6530813 .1092062
escen | .065407 .3099463 0.21 0.833 -.5420766 .6728905
wscen | -.0941963 .2338712 -0.40 0.687 -.5525754 .3641827
mountain | .2287682 .2264905 1.01 0.312 -.215145 .6726814
pacific | -.2060074 .3970221 -0.52 0.604 -.9841563 .5721415
_cons | -.8636363 1.325425 -0.65 0.515 -3.461421 1.734148
------------------------------------------------------------------------------
. estimates store bexpr2
.
. stset spell, fail(censor3=1)
failure event: censor3 == 1
obs. time interval: (0, spell]
exit on or before: failure
------------------------------------------------------------------------------
     3343  total obs.
        0  exclusions
------------------------------------------------------------------------------
     3343  obs. remaining, representing
      574  failures in single record/single failure data
    20887  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =        28
. streg $xlist, nolog nohr robust dist(exponential)
failure _d: censor3 == 1
analysis time _t: spell
Exponential regression -- log relative-hazard form
No. of subjects =         3343                     Number of obs   =      3343
No. of failures =          574
Time at risk    =        20887
                                                   Wald chi2(40)   =    372.34
Log pseudo-likelihood =   -1742.3964               Prob > chi2     =    0.0000
------------------------------------------------------------------------------
             |               Robust
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
RR | -.6011551 .724665 -0.83 0.407 -2.021472 .8191621
DR | 1.121525 .9012528 1.24 0.213 -.6448975 2.887948
UI | -.9672682 .4486302 -2.16 0.031 -1.846567 -.0879691
RRUI | -.4326869 1.014413 -0.43 0.670 -2.4209 1.555526
DRUI | 2.102012 1.302564 1.61 0.107 -.450967 4.654991
LOGWAGE | .0029166 .1448149 0.02 0.984 -.2809153 .2867485
tenure | -.0479889 .0121403 -3.95 0.000 -.0717835 -.0241942
slack | -.4583215 .097709 -4.69 0.000 -.6498277 -.2668154
abolpos | -.2736409 .1396283 -1.96 0.050 -.5473073 .0000255
explose | .0246749 .0862551 0.29 0.775 -.144382 .1937319
stateur | -.1086692 .0319298 -3.40 0.001 -.1712504 -.046088
houshead | .5298135 .1054798 5.02 0.000 .3230769 .7365501
married | .0268657 .1062998 0.25 0.800 -.1814781 .2352095
female | .2590041 .109547 2.36 0.018 .0442959 .4737122
child | -.141802 .1114763 -1.27 0.203 -.3602915 .0766876
ychild | -.0885931 .136915 -0.65 0.518 -.3569416 .1797553
nonwhite | -.4668153 .143211 -3.26 0.001 -.7475036 -.186127
age | -.0247346 .0054431 -4.54 0.000 -.0354029 -.0140662
schlt12 | -.1034495 .1224893 -0.84 0.398 -.3435241 .1366251
schgt12 | .0952043 .1081669 0.88 0.379 -.1167988 .3072075
smsa | .0128711 .1021476 0.13 0.900 -.1873344 .2130767
bluecoll | .3098248 .1110841 2.79 0.005 .0921038 .5275457
mining | .2388579 .2604652 0.92 0.359 -.2716445 .7493603
constr | .0983356 .1419787 0.69 0.489 -.1799376 .3766088
transp | -.0783446 .1897853 -0.41 0.680 -.4503169 .2936278
trade | .1033278 .1292151 0.80 0.424 -.1499291 .3565847
fire | -.3607287 .2689374 -1.34 0.180 -.8878363 .166379
services | .0248212 .1323061 0.19 0.851 -.234494 .2841363
pubadmin | -1.770536 1.040329 -1.70 0.089 -3.809544 .2684714
year85 | .295673 .1143137 2.59 0.010 .0716222 .5197237
year87 | .4303606 .1198341 3.59 0.000 .1954901 .6652311
year89 | -.1373874 .1627204 -0.84 0.398 -.4563135 .1815386
midatl | -.5339921 .2188609 -2.44 0.015 -.9629516 -.1050326
encen | -.075022 .1998626 -0.38 0.707 -.4667454 .3167014
wncen | .1239805 .2095321 0.59 0.554 -.2866948 .5346559
southatl | .1522514 .1635982 0.93 0.352 -.1683951 .472898
escen | -.5123015 .3170723 -1.62 0.106 -1.133752 .1091488
wscen | .0198459 .1898764 0.10 0.917 -.3523051 .3919968
mountain | .1999108 .1869463 1.07 0.285 -.1664972 .5663188
pacific | .4481059 .2705097 1.66 0.098 -.0820833 .9782951
_cons | -1.620926 1.072666 -1.51 0.131 -3.723312 .4814595
------------------------------------------------------------------------------
. estimates store bexpr3
.
. * Table 19.2 (page 658) first three columns
. estimates table bexpr1 bexpr2 bexpr3, b(%10.3f) se(%10.3f) stats(N ll) /*
> */ keep(RR DR UI RRUI DRUI LOGWAGE tenure)
----------------------------------------------------
    Variable |   bexpr1       bexpr2       bexpr3
-------------+--------------------------------------
          RR |      0.472       -0.093       -0.601
             |      0.601        0.976        0.725
          DR |     -0.576       -0.960        1.122
             |      0.762        1.247        0.901
          UI |     -1.425       -1.048       -0.967
             |      0.249        0.524        0.449
        RRUI |      0.966       -0.670       -0.433
             |      0.612        1.192        1.014
        DRUI |     -0.199        1.987        2.102
             |      1.019        1.727        1.303
     LOGWAGE |      0.351       -0.258        0.003
             |      0.116        0.179        0.145
      tenure |     -0.000        0.005       -0.048
             |      0.006        0.013        0.012
-------------+--------------------------------------
           N |   3343.000     3343.000     3343.000
          ll |  -2700.690    -1250.545    -1742.396
----------------------------------------------------
                                       legend: b/se
.
. *** (1B) EXPONENTIAL WITH IG HETEROGENEITY Table 19.2
.
. /* Did not work even though Weibull with IG heterogeneity did
>
> stset spell, fail(censor1=1)
> streg $xlist, nohr robust dist(exponential) frailty(invgauss)
> estimates store bexpigr1
>
> stset spell, fail(censor2=1)
> streg $xlist, nolog nohr robust dist(exponential) frailty(invgauss)
> estimates store bexpigr2
>
> stset spell, fail(censor3=1)
> streg $xlist, nolog nohr robust dist(exponential) frailty(invgauss)
> estimates store bexpigr3
>
> * Table 19.2 (page 658) first three columns
> estimates table bexpigr1 bexpigr2 bexpigr3, b(%10.3f) se(%10.3f) stats(N ll) /*
> */ keep(RR DR UI RRUI DRUI LOGWAGE tenure)
>
> */
.
. *** (2A) WEIBULL WITH NO HETEROGENEITY Table 19.3
.
. stset spell, fail(censor1=1)
failure event: censor1 == 1
obs. time interval: (0, spell]
exit on or before: failure
------------------------------------------------------------------------------
     3343  total obs.
        0  exclusions
------------------------------------------------------------------------------
     3343  obs. remaining, representing
     1073  failures in single record/single failure data
    20887  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =        28
. streg $xlist, nolog nohr robust dist(weibull)
failure _d: censor1 == 1
analysis time _t: spell
Weibull regression -- log relative-hazard form
No. of subjects =         3343                     Number of obs   =      3343
No. of failures =         1073
Time at risk    =        20887
                                                   Wald chi2(40)   =    501.65
Log pseudo-likelihood =   -2687.5995               Prob > chi2     =    0.0000
------------------------------------------------------------------------------
             |               Robust
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
RR | .4481156 .6381895 0.70 0.483 -.8027127 1.698944
DR | -.4269187 .8086983 -0.53 0.598 -2.011938 1.158101
UI | -1.496066 .2639679 -5.67 0.000 -2.013434 -.9786984
RRUI | 1.015226 .6455611 1.57 0.116 -.2500501 2.280503
DRUI | -.2988417 1.065384 -0.28 0.779 -2.386956 1.789272
LOGWAGE | .3655253 .12212 2.99 0.003 .1261745 .6048761
tenure | -.0011127 .0068716 -0.16 0.871 -.0145809 .0123554
slack | -.2652154 .0803214 -3.30 0.001 -.4226424 -.1077883
abolpos | -.1604227 .1012942 -1.58 0.113 -.3589557 .0381103
explose | .2075085 .0684715 3.03 0.002 .0733068 .3417103
stateur | -.0708745 .0242117 -2.93 0.003 -.1183286 -.0234204
houshead | .3976626 .0887192 4.48 0.000 .2237762 .571549
married | .3786057 .0830317 4.56 0.000 .2158665 .541345
female | .1260829 .0896987 1.41 0.160 -.0497233 .301889
child | -.0336778 .0839956 -0.40 0.688 -.1983061 .1309505
ychild | -.1613066 .108947 -1.48 0.139 -.3748389 .0522256
nonwhite | -.7025504 .12426 -5.65 0.000 -.9460956 -.4590052
age | -.0235823 .0041922 -5.63 0.000 -.0317989 -.0153658
schlt12 | -.1226759 .1022762 -1.20 0.230 -.3231335 .0777816
schgt12 | .1162848 .0880692 1.32 0.187 -.0563278 .2888973
smsa | .1999567 .0841129 2.38 0.017 .0350985 .3648149
bluecoll | -.1994925 .0899354 -2.22 0.027 -.3757626 -.0232223
mining | -.1015676 .2036644 -0.50 0.618 -.5007425 .2976073
constr | -.0253737 .1135609 -0.22 0.823 -.247949 .1972016
transp | -.1981522 .1672141 -1.19 0.236 -.5258858 .1295814
trade | -.0311361 .1079502 -0.29 0.773 -.2427146 .1804423
fire | .1262153 .1492527 0.85 0.398 -.1663145 .4187452
services | .2031673 .1038945 1.96 0.051 -.0004622 .4067968
pubadmin | .1117728 .3087374 0.36 0.717 -.4933415 .716887
year85 | .2374972 .093387 2.54 0.011 .054462 .4205325
year87 | .3787397 .1011782 3.74 0.000 .1804341 .5770454
year89 | .4920278 .1180472 4.17 0.000 .2606596 .7233959
midatl | .02465 .1542139 0.16 0.873 -.2776037 .3269036
encen | -.0014111 .1579065 -0.01 0.993 -.3109023 .30808
wncen | .1844363 .1694444 1.09 0.276 -.1476687 .5165413
southatl | .2740974 .1250481 2.19 0.028 .0290076 .5191872
escen | .367742 .2024771 1.82 0.069 -.0291058 .7645899
wscen | .3440005 .1527804 2.25 0.024 .0445563 .6434446
mountain | .0159627 .1620188 0.10 0.922 -.3015883 .3335136
pacific | .0849532 .2504077 0.34 0.734 -.4058368 .5757432
_cons | -4.357886 .9196792 -4.74 0.000 -6.160424 -2.555347
-------------+----------------------------------------------------------------
/ln_p | .1215314 .0194374 6.25 0.000 .0834348 .1596281
-------------+----------------------------------------------------------------
p | 1.129225 .0219492 1.087014 1.173075
1/p | .8855632 .0172131 .8524608 .9199511
------------------------------------------------------------------------------
. estimates store bweibr1
.
. stset spell, fail(censor2=1)
failure event: censor2 == 1
obs. time interval: (0, spell]
exit on or before: failure
------------------------------------------------------------------------------
     3343  total obs.
        0  exclusions
------------------------------------------------------------------------------
     3343  obs. remaining, representing
      339  failures in single record/single failure data
    20887  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =        28
. streg $xlist, nolog nohr robust dist(weibull)
failure _d: censor2 == 1
analysis time _t: spell
Weibull regression -- log relative-hazard form
No. of subjects =         3343                     Number of obs   =      3343
No. of failures =          339
Time at risk    =        20887
                                                   Wald chi2(40)   =    222.95
Log pseudo-likelihood =   -1248.6859               Prob > chi2     =    0.0000

------------------------------------------------------------------------------
_t | Coef. Robust Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
RR | -.0855974 .9920715 -0.09 0.931 -2.030022 1.858827
DR | -.9387836 1.279111 -0.73 0.463 -3.445794 1.568227
UI | -1.110175 .5267037 -2.11 0.035 -2.142496 -.0778551
RRUI | -.6171912 1.203735 -0.51 0.608 -2.976469 1.742086
DRUI | 1.973269 1.756599 1.12 0.261 -1.469601 5.41614
LOGWAGE | -.2437885 .1833224 -1.33 0.184 -.6030938 .1155168
tenure | .0050643 .0127387 0.40 0.691 -.0199031 .0300317
slack | -.2689689 .133176 -2.02 0.043 -.529989 -.0079487
abolpos | -.5721689 .2059292 -2.78 0.005 -.9757826 -.1685551
explose | .0555267 .1147555 0.48 0.628 -.16939 .2804433
stateur | -.1087083 .0413647 -2.63 0.009 -.1897816 -.027635
houshead | -.0679894 .13661 -0.50 0.619 -.3357401 .1997613
married | -.060856 .1362403 -0.45 0.655 -.327882 .20617
female | .4583892 .1408831 3.25 0.001 .1822634 .734515
child | -.2228982 .147376 -1.51 0.130 -.5117499 .0659535
ychild | .1463598 .1844362 0.79 0.427 -.2151284 .507848
nonwhite | -.485664 .186033 -2.61 0.009 -.8502819 -.121046
age | -.0027009 .0065569 -0.41 0.680 -.0155521 .0101503
schlt12 | -.1837633 .1684487 -1.09 0.275 -.5139167 .1463901
schgt12 | -.0488958 .1485385 -0.33 0.742 -.340026 .2422343
smsa | .1380042 .1410747 0.98 0.328 -.1384971 .4145055
bluecoll | .0132584 .1537386 0.09 0.931 -.2880637 .3145805
mining | -.0138734 .4110202 -0.03 0.973 -.8194583 .7917115
constr | .1973771 .1920481 1.03 0.304 -.1790303 .5737845
transp | -.4116241 .2927848 -1.41 0.160 -.9854717 .1622234
trade | .1125741 .1765277 0.64 0.524 -.2334139 .4585621
fire | -.3378747 .3046641 -1.11 0.267 -.9350054 .2592561
services | .1700335 .1729565 0.98 0.326 -.1689551 .5090221
pubadmin | .7553679 .5487635 1.38 0.169 -.3201889 1.830925
year85 | -.0501695 .1515048 -0.33 0.741 -.3471135 .2467745
year87 | -.1116858 .1645254 -0.68 0.497 -.4341497 .2107781
year89 | .1344555 .1987084 0.68 0.499 -.2550059 .5239168
midatl | -.4039691 .2606153 -1.55 0.121 -.9147658 .1068276
encen | -.5105877 .2608364 -1.96 0.050 -1.021818 .0006423
wncen | -.0579723 .2607792 -0.22 0.824 -.5690902 .4531456
southatl | -.2682241 .1972983 -1.36 0.174 -.6549216 .1184733
escen | .079807 .3146812 0.25 0.800 -.5369568 .6965709
wscen | -.0854421 .2368638 -0.36 0.718 -.5496865 .3788024
mountain | .2441762 .2300886 1.06 0.289 -.2067892 .6951416
pacific | -.1999107 .4003467 -0.50 0.618 -.9845758 .5847544
_cons | -1.055211 1.353275 -0.78 0.436 -3.707582 1.597159
-------------+----------------------------------------------------------------
/ln_p | .0815649 .0308379 2.64 0.008 .0211236 .1420061
-------------+----------------------------------------------------------------
p | 1.084984 .0334587 1.021348 1.152584
1/p | .9216729 .0284225 .8676159 .9790979
------------------------------------------------------------------------------
. estimates store bweibr2
.
. stset spell, fail(censor3=1)
failure event: censor3 == 1
obs. time interval: (0, spell]
exit on or before: failure
------------------------------------------------------------------------------
     3343  total obs.
        0  exclusions
------------------------------------------------------------------------------
     3343  obs. remaining, representing
      574  failures in single record/single failure data
    20887  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =        28
. streg $xlist, nolog nohr robust dist(weibull)
failure _d: censor3 == 1
analysis time _t: spell
Weibull regression -- log relative-hazard form
No. of subjects =         3343                     Number of obs   =      3343
No. of failures =          574
Time at risk    =        20887
                                                   Wald chi2(40)   =    350.72
Log pseudo-likelihood =   -1729.8356               Prob > chi2     =    0.0000

------------------------------------------------------------------------------
_t | Coef. Robust Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
RR | -.6946399 .762754 -0.91 0.362 -2.18961 .8003305
DR | 1.361414 .9691375 1.40 0.160 -.5380611 3.260888
UI | -1.098453 .4595297 -2.39 0.017 -1.999115 -.1977918
RRUI | -.3055217 1.046769 -0.29 0.770 -2.357151 1.746107
DRUI | 1.990913 1.37004 1.45 0.146 -.6943156 4.676141
LOGWAGE | .0401096 .1526549 0.26 0.793 -.2590886 .3393078
tenure | -.0495153 .0126559 -3.91 0.000 -.0743204 -.0247103
slack | -.473113 .1025776 -4.61 0.000 -.6741614 -.2720647
abolpos | -.2910168 .1465355 -1.99 0.047 -.5782212 -.0038124
explose | .0315602 .0906338 0.35 0.728 -.1460787 .2091991
stateur | -.1199252 .0337488 -3.55 0.000 -.1860717 -.0537787
houshead | .5592843 .1107798 5.05 0.000 .3421598 .7764087
married | .032312 .1115613 0.29 0.772 -.1863442 .2509681
female | .2764899 .1147909 2.41 0.016 .0515039 .5014759
child | -.149619 .1167679 -1.28 0.200 -.3784799 .079242
ychild | -.1018703 .1436607 -0.71 0.478 -.3834401 .1796996
nonwhite | -.5164388 .1517355 -3.40 0.001 -.8138349 -.2190427
age | -.0275549 .0057648 -4.78 0.000 -.0388536 -.0162561
schlt12 | -.1115642 .1291366 -0.86 0.388 -.3646673 .1415389
schgt12 | .1015553 .1135108 0.89 0.371 -.1209217 .3240324
smsa | .0270168 .1078739 0.25 0.802 -.1844122 .2384459
bluecoll | .3229431 .1167884 2.77 0.006 .094042 .5518443
mining | .2437267 .2731206 0.89 0.372 -.2915799 .7790332
constr | .1307943 .1484399 0.88 0.378 -.1601425 .4217311
transp | -.1004424 .2004105 -0.50 0.616 -.4932397 .2923549
trade | .1181562 .136055 0.87 0.385 -.1485068 .3848192
fire | -.344603 .2792784 -1.23 0.217 -.8919787 .2027726
services | .0519644 .1386656 0.37 0.708 -.2198151 .3237438
pubadmin | -1.780582 1.049217 -1.70 0.090 -3.837009 .2758459
year85 | .311726 .1192592 2.61 0.009 .0779822 .5454698
year87 | .4514345 .126241 3.58 0.000 .2040067 .6988623
year89 | -.1180122 .1713414 -0.69 0.491 -.4538352 .2178108
midatl | -.5476552 .224463 -2.44 0.015 -.9875945 -.1077158
encen | -.084084 .20745 -0.41 0.685 -.4906786 .3225106
wncen | .1288938 .2191536 0.59 0.556 -.3006393 .5584268
southatl | .16223 .1702456 0.95 0.341 -.1714454 .4959053
escen | -.5110545 .3270884 -1.56 0.118 -1.152136 .130027
wscen | .0218047 .1978693 0.11 0.912 -.3660121 .4096214
mountain | .2045852 .1949939 1.05 0.294 -.1775957 .5867662
pacific | .4535074 .2840292 1.60 0.110 -.1031795 1.010194
_cons | -2.017592 1.123888 -1.80 0.073 -4.220372 .1851884
-------------+----------------------------------------------------------------
/ln_p | .163312 .0235045 6.95 0.000 .117244 .2093801
-------------+----------------------------------------------------------------
p | 1.177404 .0276744 1.124394 1.232914
1/p | .8493261 .019963 .8110869 .8893682
------------------------------------------------------------------------------
. estimates store bweibr3
.
. * Table 19.3 (page 659) first three columns
. estimates table bweibr1 bweibr2 bweibr3, b(%10.3f) se(%10.3f) stats(N ll) /*
> */ keep(RR DR UI RRUI DRUI LOGWAGE tenure)
----------------------------------------------------
    Variable |     bweibr1     bweibr2     bweibr3
-------------+--------------------------------------
          RR |       0.448      -0.086      -0.695
             |       0.638       0.992       0.763
          DR |      -0.427      -0.939       1.361
             |       0.809       1.279       0.969
          UI |      -1.496      -1.110      -1.098
             |       0.264       0.527       0.460
        RRUI |       1.015      -0.617      -0.306
             |       0.646       1.204       1.047
        DRUI |      -0.299       1.973       1.991
             |       1.065       1.757       1.370
     LOGWAGE |       0.366      -0.244       0.040
             |       0.122       0.183       0.153
      tenure |      -0.001       0.005      -0.050
             |       0.007       0.013       0.013
-------------+--------------------------------------
           N |    3343.000    3343.000    3343.000
          ll |   -2687.600   -1248.686   -1729.836
----------------------------------------------------
legend: b/se
.
. *** (2B) WEIBULL WITH IG HETEROGENEITY Table 19.3
.
. stset spell, fail(censor1=1)
failure event: censor1 == 1
obs. time interval: (0, spell]
exit on or before: failure
------------------------------------------------------------------------------
     3343  total obs.
        0  exclusions
------------------------------------------------------------------------------
     3343  obs. remaining, representing
     1073  failures in single record/single failure data
    20887  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =        28
. streg $xlist, nohr robust dist(weibull) frailty(invgauss)
failure _d: censor1 == 1
analysis time _t: spell
Fitting weibull model:

Fitting constant-only model:

Iteration 0:   log pseudo-likelihood = -3134.2376  (not concave)
Iteration 1:   log pseudo-likelihood =  -2998.472
Iteration 2:   log pseudo-likelihood = -2984.8299
Iteration 3:   log pseudo-likelihood = -2960.0446
Iteration 4:   log pseudo-likelihood = -2954.9102
Iteration 5:   log pseudo-likelihood = -2954.8838
Iteration 6:   log pseudo-likelihood = -2954.8838

Fitting full model:

Iteration 0:   log pseudo-likelihood = -2656.6306
Iteration 1:   log pseudo-likelihood =  -2632.196
Iteration 2:   log pseudo-likelihood = -2616.9139
Iteration 3:   log pseudo-likelihood = -2616.3231
Iteration 4:   log pseudo-likelihood = -2616.3216
Iteration 5:   log pseudo-likelihood = -2616.3216

Weibull regression -- log relative-hazard form
Inverse-Gaussian frailty

No. of subjects =         3343                     Number of obs   =      3343
No. of failures =         1073
Time at risk    =        20887
                                                   Wald chi2(40)   =    643.00
Log pseudo-likelihood =   -2616.3216               Prob > chi2     =    0.0000

------------------------------------------------------------------------------
_t | Coef. Robust Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
RR | .7356277 .9058181 0.81 0.417 -1.039743 2.510998
DR | -1.072566 1.149098 -0.93 0.351 -3.324758 1.179625
UI | -2.574752 .3843798 -6.70 0.000 -3.328123 -1.821381
RRUI | 1.733571 .9333928 1.86 0.063 -.0958458 3.562987
DRUI | -.060621 1.537813 -0.04 0.969 -3.07468 2.953438
LOGWAGE | .575656 .1766599 3.26 0.001 .2294089 .9219031
tenure | -.0009848 .0097472 -0.10 0.920 -.0200889 .0181194
slack | -.4416007 .1142976 -3.86 0.000 -.6656199 -.2175814
abolpos | -.2873066 .1465357 -1.96 0.050 -.5745113 -.0001019
explose | .3641943 .0976897 3.73 0.000 .1727259 .5556627
stateur | -.0981133 .0346763 -2.83 0.005 -.1660775 -.030149
houshead | .5924383 .1256739 4.71 0.000 .3461219 .8387546
married | .6083214 .1183487 5.14 0.000 .3763624 .8402805
female | .1788439 .1285074 1.39 0.164 -.0730259 .4307137
child | -.0914227 .121778 -0.75 0.453 -.3301031 .1472578
ychild | -.1805373 .1527477 -1.18 0.237 -.4799173 .1188426
nonwhite | -1.008517 .1725174 -5.85 0.000 -1.346645 -.6703894
age | -.0333776 .0059183 -5.64 0.000 -.0449772 -.0217779
schlt12 | -.2258621 .1439543 -1.57 0.117 -.5080075 .0562832
schgt12 | .1505129 .124469 1.21 0.227 -.0934418 .3944677
smsa | .3009952 .119907 2.51 0.012 .0659819 .5360086
bluecoll | -.3211857 .1253163 -2.56 0.010 -.5668012 -.0755702
mining | -.2319827 .3008491 -0.77 0.441 -.8216361 .3576708
constr | -.1260324 .1633669 -0.77 0.440 -.4462257 .1941609
transp | -.2763858 .225893 -1.22 0.221 -.7191279 .1663562
trade | -.0687616 .1518284 -0.45 0.651 -.3663399 .2288166
fire | .0668973 .2131814 0.31 0.754 -.3509306 .4847252
services | .231914 .1494712 1.55 0.121 -.0610441 .5248721
pubadmin | .0901949 .4579252 0.20 0.844 -.807322 .9877117
year85 | .2780139 .1339053 2.08 0.038 .0155644 .5404634
year87 | .5208783 .1415375 3.68 0.000 .2434699 .7982867
year89 | .7209598 .1655487 4.35 0.000 .3964903 1.045429
midatl | -.0192077 .2222646 -0.09 0.931 -.4548382 .4164228
encen | -.0297055 .2284931 -0.13 0.897 -.4775438 .4181328
wncen | .2460338 .24216 1.02 0.310 -.2285911 .7206586
southatl | .3563643 .1793284 1.99 0.047 .0048872 .7078415
escen | .5461543 .2910193 1.88 0.061 -.024233 1.116542
wscen | .4606814 .2140966 2.15 0.031 .0410598 .880303
mountain | .017581 .2293804 0.08 0.939 -.4319963 .4671584
pacific | .1379886 .3636985 0.38 0.704 -.5748475 .8508247
_cons | -5.303059 1.34133 -3.95 0.000 -7.932017 -2.6741
-------------+----------------------------------------------------------------
/ln_p | .5611667 .0225898 24.84 0.000 .5168915 .6054418
/ln_the | 1.852696 .0896755 20.66 0.000 1.676935 2.028457
-------------+----------------------------------------------------------------
p | 1.752716 .0395935 1.676807 1.832062
1/p | .570543 .0128884 .5458332 .5963715
theta | 6.376987 .5718595 5.349136 7.602343
------------------------------------------------------------------------------
. estimates store bweibigr1
.
. stset spell, fail(censor2=1)
failure event: censor2 == 1
obs. time interval: (0, spell]
exit on or before: failure
------------------------------------------------------------------------------
     3343  total obs.
        0  exclusions
------------------------------------------------------------------------------
     3343  obs. remaining, representing
      339  failures in single record/single failure data
    20887  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =        28
. streg $xlist, nolog nohr robust dist(weibull) frailty(invgauss)
failure _d: censor2 == 1
analysis time _t: spell
Weibull regression -- log relative-hazard form
Inverse-Gaussian frailty
No. of subjects =         3343                     Number of obs   =      3343
No. of failures =          339
Time at risk    =        20887
                                                   Wald chi2(40)   =    253.77
Log pseudo-likelihood =   -1230.1643               Prob > chi2     =    0.0000

------------------------------------------------------------------------------
_t | Coef. Robust Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
RR | -.3802006 1.452095 -0.26 0.793 -3.226255 2.465854
DR | -1.689504 1.779553 -0.95 0.342 -5.177363 1.798355
UI | -2.063963 .7469659 -2.76 0.006 -3.527989 -.5999369
RRUI | -.3019038 1.702153 -0.18 0.859 -3.638063 3.034255
DRUI | 3.263067 2.469908 1.32 0.186 -1.577863 8.103998
LOGWAGE | -.4954862 .2614747 -1.89 0.058 -1.007967 .0169948
tenure | .0174014 .0192239 0.91 0.365 -.0202768 .0550795
slack | -.3889861 .1911789 -2.03 0.042 -.7636898 -.0142824
abolpos | -.8027208 .2877528 -2.79 0.005 -1.366706 -.2387356
explose | .1187808 .1663987 0.71 0.475 -.2073546 .4449162
stateur | -.1753726 .059272 -2.96 0.003 -.2915437 -.0592015
houshead | -.0832153 .1944376 -0.43 0.669 -.464306 .2978754
married | -.0092249 .1945187 -0.05 0.962 -.3904747 .3720248
female | .6284921 .2064768 3.04 0.002 .223805 1.033179
child | -.389325 .2127697 -1.83 0.067 -.806346 .0276959
ychild | .3144939 .2663886 1.18 0.238 -.2076182 .836606
nonwhite | -.6691885 .2633831 -2.54 0.011 -1.18541 -.1529671
age | -.0034533 .0093696 -0.37 0.712 -.0218174 .0149108
schlt12 | -.3242365 .2380109 -1.36 0.173 -.7907293 .1422562
schgt12 | -.0745655 .2138285 -0.35 0.727 -.4936618 .3445307
smsa | .2107394 .2012744 1.05 0.295 -.1837512 .60523
bluecoll | -.0065426 .2175612 -0.03 0.976 -.4329548 .4198696
mining | .1293103 .6093175 0.21 0.832 -1.06493 1.323551
constr | .2870954 .2728176 1.05 0.293 -.2476172 .8218081
transp | -.6470251 .4118414 -1.57 0.116 -1.454219 .1601692
trade | .1901489 .2529975 0.75 0.452 -.3057172 .6860149
fire | -.4680763 .4488502 -1.04 0.297 -1.347807 .411654
services | .2462185 .2531429 0.97 0.331 -.2499325 .7423696
pubadmin | 1.351206 .7621665 1.77 0.076 -.1426127 2.845025
year85 | -.1501166 .2195046 -0.68 0.494 -.5803377 .2801044
year87 | -.2400145 .236954 -1.01 0.311 -.7044358 .2244069
year89 | .1828811 .2831188 0.65 0.518 -.3720216 .7377838
midatl | -.4074373 .3806192 -1.07 0.284 -1.153437 .3385627
encen | -.6525035 .381508 -1.71 0.087 -1.400245 .0952385
wncen | -.1300751 .3835973 -0.34 0.735 -.8819119 .6217617
southatl | -.3491396 .2954776 -1.18 0.237 -.928265 .2299859
escen | .2960895 .4558667 0.65 0.516 -.5973927 1.189572
wscen | -.0903554 .3527441 -0.26 0.798 -.7817212 .6010104
mountain | .3721587 .3457717 1.08 0.282 -.3055413 1.049859
pacific | -.1996218 .6042626 -0.33 0.741 -1.383955 .9847112
_cons | 1.157635 1.957298 0.59 0.554 -2.678599 4.993869
-------------+----------------------------------------------------------------
/ln_p | .5004283 .0361284 13.85 0.000 .429618 .5712386
/ln_the | 2.896807 .1749249 16.56 0.000 2.55396 3.239653
-------------+----------------------------------------------------------------
p | 1.649428 .0595911 1.53667 1.770459
1/p | .6062709 .0219036 .5648254 .6507577
theta | 18.11621 3.168976 12.85793 25.52487
------------------------------------------------------------------------------
. estimates store bweibigr2
.
. stset spell, fail(censor3=1)
failure event: censor3 == 1
obs. time interval: (0, spell]
exit on or before: failure
------------------------------------------------------------------------------
     3343  total obs.
        0  exclusions
------------------------------------------------------------------------------
     3343  obs. remaining, representing
      574  failures in single record/single failure data
    20887  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =        28
. streg $xlist, nolog nohr robust dist(weibull) frailty(invgauss)
failure _d: censor3 == 1
analysis time _t: spell
Weibull regression -- log relative-hazard form
Inverse-Gaussian frailty
No. of subjects =         3343                     Number of obs   =      3343
No. of failures =          574
Time at risk    =        20887
                                                   Wald chi2(40)   =    416.91
Log pseudo-likelihood =   -1696.8456               Prob > chi2     =    0.0000

------------------------------------------------------------------------------
_t | Coef. Robust Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
RR | -.4326716 1.111223 -0.39 0.697 -2.610628 1.745285
DR | 1.166629 1.377826 0.85 0.397 -1.533861 3.867119
UI | -1.761667 .623017 -2.83 0.005 -2.982758 -.5405758
RRUI | -.5160276 1.418361 -0.36 0.716 -3.295964 2.263909
DRUI | 3.668779 1.93489 1.90 0.058 -.1235355 7.461093
LOGWAGE | -.0069584 .2162461 -0.03 0.974 -.4307929 .4168762
tenure | -.0677151 .0174959 -3.87 0.000 -.1020065 -.0334237
slack | -.7093182 .145145 -4.89 0.000 -.9937971 -.4248392
abolpos | -.4327781 .2106818 -2.05 0.040 -.8457069 -.0198494
explose | .0930879 .1284587 0.72 0.469 -.1586864 .3448623
stateur | -.1684826 .0472936 -3.56 0.000 -.2611764 -.0757887
houshead | .7760519 .1555864 4.99 0.000 .4711081 1.080996
married | .0849334 .1585652 0.54 0.592 -.2258487 .3957154
female | .329107 .1637254 2.01 0.044 .0082111 .6500028
child | -.2734744 .1667453 -1.64 0.101 -.6002892 .0533403
ychild | -.101407 .2021952 -0.50 0.616 -.4977024 .2948883
nonwhite | -.7325977 .211777 -3.46 0.001 -1.147673 -.3175223
age | -.0354358 .007992 -4.43 0.000 -.0510998 -.0197719
schlt12 | -.1729163 .1803828 -0.96 0.338 -.5264602 .1806275
schgt12 | .0955174 .1615133 0.59 0.554 -.2210429 .4120777
smsa | .0225321 .1500451 0.15 0.881 -.2715509 .3166151
bluecoll | .4311626 .1651405 2.61 0.009 .1074931 .7548321
mining | .4464055 .3724328 1.20 0.231 -.2835495 1.17636
constr | .1875875 .2104018 0.89 0.373 -.2247926 .5999675
transp | -.0190191 .2877627 -0.07 0.947 -.5830237 .5449855
trade | .1708654 .1960546 0.87 0.383 -.2133945 .5551253
fire | -.3548846 .3851005 -0.92 0.357 -1.109668 .3998985
services | .0199891 .1978478 0.10 0.920 -.3677854 .4077636
pubadmin | -2.249289 1.450209 -1.55 0.121 -5.091646 .5930688
year85 | .3978277 .1726143 2.30 0.021 .0595099 .7361456
year87 | .6809662 .1807412 3.77 0.000 .32672 1.035212
year89 | -.1380237 .2307311 -0.60 0.550 -.5902485 .314201
midatl | -.7908245 .3280754 -2.41 0.016 -1.43384 -.1478085
encen | -.1035781 .2984816 -0.35 0.729 -.6885913 .4814351
wncen | .2578004 .3150731 0.82 0.413 -.3597316 .8753324
southatl | .2314723 .2430344 0.95 0.341 -.2448663 .7078109
escen | -.6777305 .4486486 -1.51 0.131 -1.557065 .2016045
wscen | .0308173 .2842933 0.11 0.914 -.5263874 .5880219
mountain | .2849032 .2816226 1.01 0.312 -.267067 .8368734
pacific | .7162217 .4103619 1.75 0.081 -.0880727 1.520516
_cons | -1.42279 1.617429 -0.88 0.379 -4.592894 1.747313
-------------+----------------------------------------------------------------
/ln_p | .5795747 .026888 21.56 0.000 .5268752 .6322742
/ln_the | 2.262575 .1322516 17.11 0.000 2.003367 2.521783
-------------+----------------------------------------------------------------
p | 1.785279 .0480026 1.693632 1.881886
1/p | .5601365 .0150609 .5313819 .5904471
theta | 9.607798 1.270647 7.413974 12.45078
------------------------------------------------------------------------------
. estimates store bweibigr3
.
. * Table 19.3 (page 659) first three columns
. estimates table bweibigr1 bweibigr2 bweibigr3, b(%10.3f) se(%10.3f) stats(N ll) /*
> */ keep(RR DR UI RRUI DRUI LOGWAGE tenure)
----------------------------------------------------
    Variable |   bweibigr1   bweibigr2   bweibigr3
-------------+--------------------------------------
          RR |       0.736      -0.380      -0.433
             |       0.906       1.452       1.111
          DR |      -1.073      -1.690       1.167
             |       1.149       1.780       1.378
          UI |      -2.575      -2.064      -1.762
             |       0.384       0.747       0.623
        RRUI |       1.734      -0.302      -0.516
             |       0.933       1.702       1.418
        DRUI |      -0.061       3.263       3.669
             |       1.538       2.470       1.935
     LOGWAGE |       0.576      -0.495      -0.007
             |       0.177       0.261       0.216
      tenure |      -0.001       0.017      -0.068
             |       0.010       0.019       0.017
-------------+--------------------------------------
           N |    3343.000    3343.000    3343.000
          ll |   -2616.322   -1230.164   -1696.846
----------------------------------------------------
legend: b/se
.
. *** (2C) ESTIMATE COX MODEL SPECIFICATION OF COMPETING RISKS
.
. stset spell, fail(censor1=1)
failure event: censor1 == 1
obs. time interval: (0, spell]
exit on or before: failure
------------------------------------------------------------------------------
     3343  total obs.
        0  exclusions
------------------------------------------------------------------------------
     3343  obs. remaining, representing
     1073  failures in single record/single failure data
    20887  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =        28
. stcox $xlist, nolog nohr robust basesurv(survrisk1) basechazard(chrisk1)
failure _d: censor1 == 1
analysis time _t: spell
Cox regression -- Breslow method for ties
No. of subjects =         3343                     Number of obs   =      3343
No. of failures =         1073
Time at risk    =        20887
                                                   Wald chi2(40)   =    540.98
Log pseudo-likelihood =   -7717.2334               Prob > chi2     =    0.0000

------------------------------------------------------------------------------
_t | Coef. Robust Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
RR | .5222796 .5711698 0.91 0.361 -.5971926 1.641752
DR | -.752507 .72175 -1.04 0.297 -2.167111 .6620971
UI | -1.317719 .2372893 -5.55 0.000 -1.782798 -.8526409
RRUI | .8822462 .582115 1.52 0.130 -.2586783 2.023171
DRUI | -.0951357 .977774 -0.10 0.922 -2.011538 1.821266
LOGWAGE | .3352639 .1106483 3.03 0.002 .1183972 .5521306
tenure | .0008278 .0061286 0.14 0.893 -.0111841 .0128396
slack | -.247863 .0721173 -3.44 0.001 -.3892103 -.1065158
abolpos | -.1511638 .0905035 -1.67 0.095 -.3285475 .0262198
explose | .1865068 .0615742 3.03 0.002 .0658236 .30719
stateur | -.0590475 .022085 -2.67 0.008 -.1023334 -.0157616
houshead | .3601866 .0794827 4.53 0.000 .2044035 .5159698
married | .358819 .0746355 4.81 0.000 .2125362 .5051019
female | .1002758 .0813277 1.23 0.218 -.0591236 .2596753
child | -.0396054 .0755365 -0.52 0.600 -.1876542 .1084435
ychild | -.1276638 .0967856 -1.32 0.187 -.3173602 .0620325
nonwhite | -.6394475 .1151332 -5.55 0.000 -.8651043 -.4137906
age | -.0204623 .0037593 -5.44 0.000 -.0278305 -.0130942
schlt12 | -.1220585 .0920073 -1.33 0.185 -.3023895 .0582726
schgt12 | .1104817 .0783542 1.41 0.159 -.0430897 .2640531
smsa | .1864841 .0766075 2.43 0.015 .0363361 .3366321
bluecoll | -.2108023 .080867 -2.61 0.009 -.3692986 -.052306
mining | -.1238251 .1906352 -0.65 0.516 -.4974632 .249813
constr | -.054455 .1029488 -0.53 0.597 -.256231 .1473209
transp | -.1551657 .1466515 -1.06 0.290 -.4425973 .1322659
trade | -.0383252 .0968106 -0.40 0.692 -.2280706 .1514201
fire | .1097585 .1300779 0.84 0.399 -.1451895 .3647065
services | .1666262 .0939507 1.77 0.076 -.0175138 .3507662
pubadmin | .1022002 .2829817 0.36 0.718 -.4524336 .6568341
year85 | .204162 .084908 2.40 0.016 .0377454 .3705786
year87 | .3384229 .0899115 3.76 0.000 .1621997 .5146462
year89 | .4486559 .104937 4.28 0.000 .2429832 .6543286
midatl | .0342238 .140515 0.24 0.808 -.2411805 .3096282
encen | .0174597 .1438862 0.12 0.903 -.2645521 .2994716
wncen | .1650967 .1532559 1.08 0.281 -.1352795 .4654728
southatl | .2518023 .1127138 2.23 0.025 .0308874 .4727172
escen | .3450422 .1839818 1.88 0.061 -.0155554 .7056398
wscen | .3316752 .1359801 2.44 0.015 .0651591 .5981914
mountain | .009484 .1468626 0.06 0.949 -.2783613 .2973293
pacific | .0720292 .2263339 0.32 0.750 -.3715771 .5156355
------------------------------------------------------------------------------
. estimates store bcoxrisk1
.
. stset spell, fail(censor2=1)
failure event: censor2 == 1
obs. time interval: (0, spell]
exit on or before: failure
------------------------------------------------------------------------------
     3343  total obs.
        0  exclusions
------------------------------------------------------------------------------
     3343  obs. remaining, representing
      339  failures in single record/single failure data
    20887  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =        28
. stcox $xlist, nolog nohr robust basesurv(survrisk2) basechazard(chrisk2)
failure _d: censor2 == 1
analysis time _t: spell
Cox regression -- Breslow method for ties
No. of subjects =         3343                     Number of obs   =      3343
No. of failures =          339
Time at risk    =        20887
                                                   Wald chi2(40)   =    211.82
Log pseudo-likelihood =    -2444.342               Prob > chi2     =    0.0000

------------------------------------------------------------------------------
_t | Coef. Robust Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
RR | -.0719673 .9513101 -0.08 0.940 -1.936501 1.792566
DR | -1.0236 1.193087 -0.86 0.391 -3.362007 1.314807
UI | -.906022 .5109396 -1.77 0.076 -1.907445 .0954013
RRUI | -.7818457 1.166182 -0.67 0.503 -3.06752 1.503829
DRUI | 2.031968 1.671862 1.22 0.224 -1.244821 5.308756
LOGWAGE | -.2800345 .1736454 -1.61 0.107 -.6203732 .0603043
tenure | .0059934 .0122664 0.49 0.625 -.0180483 .0300352
slack | -.2476685 .12775 -1.94 0.053 -.498054 .0027169
abolpos | -.5434923 .1976775 -2.75 0.006 -.9309331 -.1560516
explose | .0334802 .1101886 0.30 0.761 -.1824856 .2494459
stateur | -.0923228 .0393339 -2.35 0.019 -.1694157 -.0152299
houshead | -.0864111 .1303336 -0.66 0.507 -.3418602 .1690379
married | -.065464 .1298376 -0.50 0.614 -.3199409 .189013
female | .4386603 .1340263 3.27 0.001 .1759735 .7013471
child | -.2049337 .1413612 -1.45 0.147 -.4819966 .0721293
ychild | .1556684 .1766059 0.88 0.378 -.1904727 .5018095
nonwhite | -.3956483 .1761206 -2.25 0.025 -.7408382 -.0504583
age | .0001207 .0062519 0.02 0.985 -.0121327 .0123741
schlt12 | -.1723734 .1618354 -1.07 0.287 -.489565 .1448182
schgt12 | -.0583556 .142103 -0.41 0.681 -.3368724 .2201611
smsa | .1120279 .1334106 0.84 0.401 -.1494521 .3735079
bluecoll | -.0021333 .1460376 -0.01 0.988 -.2883617 .2840951
mining | -.0132972 .401138 -0.03 0.974 -.7995132 .7729188
constr | .1654229 .1852256 0.89 0.372 -.1976127 .5284584
transp | -.3818733 .2831048 -1.35 0.177 -.9367485 .1730019
trade | .1065755 .1677346 0.64 0.525 -.2221782 .4353293
fire | -.345295 .2945472 -1.17 0.241 -.9225969 .2320068
services | .1443583 .1664345 0.87 0.386 -.1818474 .470564
pubadmin | .7203208 .5238954 1.37 0.169 -.3064953 1.747137
year85 | -.0647735 .1460286 -0.44 0.657 -.3509844 .2214373
year87 | -.138436 .1574958 -0.88 0.379 -.4471221 .1702502
year89 | .100033 .1887671 0.53 0.596 -.2699437 .4700097
midatl | -.3838124 .2529706 -1.52 0.129 -.8796257 .1120009
encen | -.5058645 .2521219 -2.01 0.045 -1.000014 -.0117146
wncen | -.081463 .2512893 -0.32 0.746 -.5739811 .411055
southatl | -.2799968 .1891246 -1.48 0.139 -.6506742 .0906805
escen | .0372908 .2993588 0.12 0.901 -.5494417 .6240233
wscen | -.1157119 .2286912 -0.51 0.613 -.5639385 .3325146
mountain | .204597 .2206239 0.93 0.354 -.2278179 .6370119
pacific | -.2138749 .3899895 -0.55 0.583 -.9782404 .5504905
------------------------------------------------------------------------------
. estimates store bcoxrisk2
.
. stset spell, fail(censor3=1)
failure event: censor3 == 1
obs. time interval: (0, spell]
exit on or before: failure
------------------------------------------------------------------------------
     3343  total obs.
        0  exclusions
------------------------------------------------------------------------------
     3343  obs. remaining, representing
      574  failures in single record/single failure data
    20887  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =        28
. stcox $xlist, nolog nohr robust basesurv(survrisk3) basechazard(chrisk3)
failure _d: censor3 == 1
analysis time _t: spell
Cox regression -- Breslow method for ties

No. of subjects       =         3343               Number of obs   =      3343
No. of failures       =          574
Time at risk          =        20887
                                                   Wald chi2(40)   =    357.81
Log pseudo-likelihood =   -4094.2361               Prob > chi2     =    0.0000


------------------------------------------------------------------------------
 | Robust
_t | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
RR | -.4692082 .7157644 -0.66 0.512 -1.872081 .9336643
DR | .8759221 .8786992 1.00 0.319 -.8462967 2.598141
UI | -.9051384 .4449384 -2.03 0.042 -1.777202 -.0330753
RRUI | -.5392752 1.002388 -0.54 0.591 -2.503919 1.425369
DRUI | 2.293752 1.274021 1.80 0.072 -.2032836 4.790787
LOGWAGE | -.0140883 .1415912 -0.10 0.921 -.291602 .2634253
tenure | -.0465013 .0118142 -3.94 0.000 -.0696567 -.0233458
slack | -.4587556 .0952092 -4.82 0.000 -.6453621 -.2721491
abolpos | -.2743895 .136703 -2.01 0.045 -.5423223 -.0064566
explose | .0199625 .0843281 0.24 0.813 -.1453176 .1852426
stateur | -.1013309 .0311307 -3.26 0.001 -.1623459 -.0403159
houshead | .5154239 .1031203 5.00 0.000 .3133117 .717536
married | .0280002 .1037338 0.27 0.787 -.1753143 .2313148
female | .2477194 .1071841 2.31 0.021 .0376425 .4577962
child | -.1477253 .1086376 -1.36 0.174 -.3606511 .0652005
ychild | -.0702224 .1341067 -0.52 0.601 -.3330667 .1926219
nonwhite | -.4472066 .1401892 -3.19 0.001 -.7219723 -.1724409
age | -.0227849 .0053188 -4.28 0.000 -.0332096 -.0123602
schlt12 | -.1050265 .1191449 -0.88 0.378 -.3385462 .1284931
schgt12 | .0912594 .1057371 0.86 0.388 -.1159815 .2985004
smsa | .0078536 .0994133 0.08 0.937 -.1869928 .2027
bluecoll | .2916892 .1085873 2.69 0.007 .0788619 .5045165
mining | .2392902 .2514416 0.95 0.341 -.2535263 .7321067
constr | .0659352 .1393882 0.47 0.636 -.2072606 .339131
transp | -.0724276 .1845329 -0.39 0.695 -.4341054 .2892502
trade | .0824395 .1260009 0.65 0.513 -.1645178 .3293967
fire | -.3901171 .2648329 -1.47 0.141 -.90918 .1289458
services | .0007351 .1296195 0.01 0.995 -.2533144 .2547847
pubadmin | -1.749927 1.038715 -1.68 0.092 -3.785771 .2859182
year85 | .2810465 .1124259 2.50 0.012 .0606957 .5013973
year87 | .4139684 .117016 3.54 0.000 .1846212 .6433155
year89 | -.1485614 .1590621 -0.93 0.350 -.4603173 .1631946
midatl | -.5271828 .2165005 -2.44 0.015 -.9515159 -.1028497
encen | -.063171 .1962513 -0.32 0.748 -.4478166 .3214745
wncen | .134275 .2051501 0.65 0.513 -.2678118 .5363617
southatl | .1522905 .1610446 0.95 0.344 -.1633512 .4679321
escen | -.5030762 .3118938 -1.61 0.107 -1.114377 .1082245
wscen | .0116807 .1858946 0.06 0.950 -.352666 .3760273
mountain | .2043736 .1827277 1.12 0.263 -.1537662 .5625134
pacific | .4327009 .2661013 1.63 0.104 -.088848 .9542498
------------------------------------------------------------------------------

. estimates store bcoxrisk3


.
. * Table 19.3 (page 659) last three columns
. * NOTE: The results from this program differ a little from those
. *       given in text. Need to resolve this.
. estimates table bcoxrisk1 bcoxrisk2 bcoxrisk3, b(%10.3f) se(%10.3f) stats(N ll) /*
> */ keep(RR DR UI RRUI DRUI LOGWAGE tenure)
-----------------------------------------------------
    Variable |    bcoxrisk1    bcoxrisk2    bcoxrisk3
-------------+---------------------------------------
          RR |        0.522       -0.072       -0.469
             |        0.571        0.951        0.716
          DR |       -0.753       -1.024        0.876
             |        0.722        1.193        0.879
          UI |       -1.318       -0.906       -0.905
             |        0.237        0.511        0.445
        RRUI |        0.882       -0.782       -0.539
             |        0.582        1.166        1.002
        DRUI |       -0.095        2.032        2.294
             |        0.978        1.672        1.274
     LOGWAGE |        0.335       -0.280       -0.014
             |        0.111        0.174        0.142
      tenure |        0.001        0.006       -0.047
             |        0.006        0.012        0.012
-------------+---------------------------------------
           N |     3343.000     3343.000     3343.000
          ll |    -7717.233    -2444.342    -4094.236
-----------------------------------------------------
                                         legend: b/se
.
. *** (2D) GRAPHS FOR COX COMPETING RISKS MODEL
.
. * Figure 19.1 (page 661) - Plot the three baseline survival functions
. sort _t
. graph twoway (scatter survrisk1 _t, c(J) msymbol(i) msize(small) clstyle(p1)) /*
> */ (scatter survrisk2 _t, c(J) msymbol(i) msize(small) clstyle(p2)) /*
> */ (scatter survrisk3 _t, c(J) msymbol(i) msize(small) clstyle(p3)), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Baseline Survival Functions") /*
> */ xtitle("Unemployment Duration in 2-week intervals", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Baseline Survival Probability", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(3) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Risk 1 (full-time job)") label(2 "Risk 2 (part-time job)") label(3 "Risk 3 (
> unknown job)"))
. graph export combined_bsf.wmf, replace
(file c:\Imbook\bwebpage\Section4\combined_bsf.wmf written in Windows Metafile format)
.
. * Figure 19.2 (page 659) - Plot the three baseline cumulative hazards
. sort _t
. graph twoway (scatter chrisk1 _t, c(J) msymbol(i) msize(small) clstyle(p1)) /*
> */ (scatter chrisk2 _t, c(J) msymbol(i) msize(small) clstyle(p2)) /*
> */ (scatter chrisk3 _t, c(J) msymbol(i) msize(small) clstyle(p3)), /*
> */ scale (1.2) plotregion(style(none)) /*
> */ title("Baseline Cumulative Hazard Functions") /*
> */ xtitle("Unemployment Duration in 2-week intervals", size(medlarge)) xscale(titlegap(*5)) /*
> */ ytitle("Baseline Cumulative Hazard", size(medlarge)) yscale(titlegap(*5)) /*
> */ legend(pos(11) ring(0) col(1)) legend(size(small)) /*
> */ legend( label(1 "Risk 1 (full-time job)") label(2 "Risk 2 (part-time job)") label(3 "Risk 3 (
> unknown job)"))
. graph export combined_cbh.wmf, replace
(file c:\Imbook\bwebpage\Section4\combined_cbh.wmf written in Windows Metafile format)
.
. ********** CLOSE OUTPUT **********
. log close
log: c:\Imbook\bwebpage\Section4\mma19p1comprisks.txt
log type: text
closed on: 19 May 2005, 17:53:08

------------------------------------------------------------------------------
log: c:\Imbook\bwebpage\Section4\mma20p1count.txt
log type: text
opened on: 20 May 2005, 08:41:33
.
. ********* OVERVIEW OF MMA20P1COUNT.DO **********
.
. * STATA Program
. * copyright C 2005 by A. Colin Cameron and Pravin K. Trivedi
. * used for "Microeconometrics: Methods and Applications"
. * by A. Colin Cameron and Pravin K. Trivedi (2005)
. * Cambridge University Press
.
. * Chapter 20.3 pages 671-4 and 20.7 page 690
. * Count data regression example
. * It provides
. * (1) Frequency distribution for count (Table 20.3)
. * (2) Data summary (Table 20.4)
. * (3) Poisson regression with various standard errors (Table 20.5)
. * (4) Negative binomial regression with various standard errors (Table 20.5)
.
. * To use this program you need health expenditure data in Stata data set
. * randdata.dta
.
. ********** SETUP **********
.
. set more off
. version 8.0
. set scheme s1mono /* Used for graphs */
.
. ********** DATA DESCRIPTION **********
.
. * Essentially same data as in P. Deb and P.K. Trivedi (2002)
. * "The Structure of Demand for Medical Care: Latent Class versus
. * Two-Part Models", Journal of Health Economics, 21, 601-625
. * except that paper used different outcome (counts rather than $)
.
. * Each observation is for an individual over a year.
. * Individuals may appear in up to five years.
. * All available sample is used except only fee for service plans included.
. * In analysis here only year 2 is used so panel complications are avoided.
. * Clustering of individuals within household is ignored here.
.
. * Dependent variable is
. *   MED      med       Annual medical expenditures in constant dollars
. *                      excluding dental and outpatient mental
. *   LNMED    lnmeddol  Ln(Medical expenditures) given meddol > 0
. *                      Missing otherwise
. *   DMED     binexp    1 if medical expenditures > 0
.
. * Regressors are
. * - Health insurance measures
. *   LC       logc      log(coinsrate+1) where coinsurance rate is 0 to 100
. *   IDP      idp       1 if individual deductible plan
. *   LPI      lpi       log(annual participation incentive payment) or 0 if no payment
. *   FMDE     fmde      log(max(medical deductible expenditure)) if IDP=1 and MDE>1 or 0 otherwise.
. * - Health status measures
. *   NDISEASE disea     number of chronic diseases
. *   PHYSLIM  physlm    1 if physical limitation
. *   HLTHG    hlthg     1 if good health
. *   HLTHF    hlthf     1 if fair health
. *   HLTHP    hlthp     1 if poor health (omitted is excellent)
. * - Socioeconomic characteristics
. *   LINC     linc      log of annual family income (in $)
. *   LFAM     lfam      log of family size
. *   EDUCDEC  educdec   years of schooling of decision maker
. *   AGE      xage      exact age
. *   BLACK    black     1 if black
. *   FEMALE   female    1 if female
. *   CHILD    child     1 if child
. *   FEMCHILD fchild    1 if female child
.
. * If panel data used then clustering is on
. *   zper     person id
.
. ********** READ DATA, SELECT AND TRANSFORM ********