Chapter 6
To assess the relationship between a categorical variable and a quantitative dependent
variable, we introduce dummy (indicator) variables into the regression model.
A dummy variable assumes values 0 and 1, representing the absence and presence of a certain
characteristic, respectively.
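For example, for a gender variable, we can define a dummy variable either as
$$D = \begin{cases} 1 & \text{if female} \\ 0 & \text{if male} \end{cases} \qquad (6.1)$$
or as
$$D = \begin{cases} 1 & \text{if male} \\ 0 & \text{if female} \end{cases}$$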
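If we consider the following model
$$Y = \beta_0 + \beta_1 D + \varepsilon$$
where $Y$ is the annual expenditure on food and $D$ is the dummy variable as defined in (6.1), with $D = 1$ for females and $D = 0$ for males,
then the mean food expenditure for males is
$$E(Y \mid D = 0) = \beta_0$$
and that for females is
$$E(Y \mid D = 1) = \beta_0 + \beta_1$$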
The coefficient of the dummy variable therefore measures by how much the mean of Y (food
expenditure) for females differs from that for males.
It is often called the differential intercept coefficient.
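The interpretation of the differential intercept coefficient can be checked numerically. The following is a minimal sketch, assuming the model Y = b0 + b1·D + error with D = 1 for females and D = 0 for males; the data values and the names `b0`, `b1`, `mean_male` and `mean_female` are made up for illustration and do not come from the text.

```python
import numpy as np

# Hypothetical food expenditure data (made-up numbers, illustration only).
# Assumed coding: D = 1 for females, D = 0 for males.
D = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
Y = np.array([3.2, 3.6, 3.0, 3.4, 2.6, 2.9, 2.5, 2.8])

# Design matrix: a column of ones (intercept) and the dummy variable.
A = np.column_stack([np.ones_like(D), D])
b0, b1 = np.linalg.lstsq(A, Y, rcond=None)[0]

# Group means computed directly from the data.
mean_male = Y[D == 0].mean()
mean_female = Y[D == 1].mean()

print(f"b0 = {b0:.3f}   male mean            = {mean_male:.3f}")
print(f"b1 = {b1:.3f}   female - male means  = {mean_female - mean_male:.3f}")
```

With a single dummy regressor, the least-squares intercept equals the mean of the base group (males here) and the dummy coefficient equals the difference between the two group means.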
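Next, consider the following model
$$Y = \beta_0 + \beta_1 D + \beta_2 X + \varepsilon \qquad (6.2)$$
obtained by adding the quantitative variable $X$. Here $D = 1$ for females and $D = 0$ for males, so the mean food expenditure for males is
$$E(Y \mid D = 0, X) = \beta_0 + \beta_2 X$$
and that for females is
$$E(Y \mid D = 1, X) = (\beta_0 + \beta_1) + \beta_2 X$$
These two regression lines are parallel with different intercepts; this is known as the parallel regression
model. The difference in mean food expenditure, $\beta_1$, remains constant regardless of the value
of X.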
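The constant gap between the two parallel lines can be illustrated with a small simulation. This is a sketch under stated assumptions: the model is taken to be Y = b0 + b1·D + b2·X + error with D = 1 for females, and the data, the coefficient values used to generate them, and the helper `fitted` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 40

# Hypothetical data: D = 1 for females, 0 for males; X is a made-up covariate.
D = np.repeat([0.0, 1.0], n // 2)
X = rng.uniform(20, 80, size=n)
Y = 1.0 - 0.5 * D + 0.06 * X + rng.normal(0, 0.2, size=n)

# Fit the parallel regression model by least squares.
A = np.column_stack([np.ones(n), D, X])
b0, b1, b2 = np.linalg.lstsq(A, Y, rcond=None)[0]

def fitted(d, x):
    """Fitted mean of Y for dummy value d and covariate value x."""
    return b0 + b1 * d + b2 * x

# Difference between the female and male fitted lines at several X values.
x_grid = np.array([20.0, 50.0, 80.0])
gap = fitted(1, x_grid) - fitted(0, x_grid)
print("gap at X = 20, 50, 80:", gap)
print("b1:", b1)
```

The gap equals the estimated dummy coefficient at every value of X, which is what makes the two fitted lines parallel.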
If the main interest is to compare the means of different categories, then the quantitative explanatory
variables in the model are called covariates or control variables. Models with both categorical
variables and quantitative variables as explanatory variables are called analysis-of-covariance
(ANCOVA) models.
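Next, consider the following model
$$Y = \beta_0 + \beta_1 D + \beta_2 X + \beta_3 DX + \varepsilon \qquad (6.3)$$
obtained by adding the quantitative variable $X$ and the product $DX$. Here $D = 1$ for females and $D = 0$ for males, so the mean food expenditure for males is
$$E(Y \mid D = 0, X) = \beta_0 + \beta_2 X$$
and that for females is
$$E(Y \mid D = 1, X) = (\beta_0 + \beta_1) + (\beta_2 + \beta_3) X$$
These two regression lines have different intercepts unless $\beta_1 = 0$ and different slopes unless $\beta_3 = 0$. The product $DX$ is called
the interaction term between $D$ and $X$.
The difference in mean food expenditure, $\beta_1 + \beta_3 X$, depends on the value of X.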
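A short simulation can show how an interaction term lets both the intercept and the slope differ between groups. This sketch assumes the model Y = b0 + b1·D + b2·X + b3·D·X + error with D = 1 for females; the data and generating coefficients are hypothetical. It also compares the pooled fit with separate straight-line fits for each group.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40

# Hypothetical data: D = 1 for females, 0 for males; X is a made-up covariate.
D = np.repeat([0.0, 1.0], n // 2)
X = rng.uniform(20, 80, size=n)
Y = 1.0 - 0.5 * D + 0.06 * X - 0.02 * D * X + rng.normal(0, 0.2, size=n)

# Pooled fit with dummy, covariate and interaction term D*X.
A = np.column_stack([np.ones(n), D, X, D * X])
b0, b1, b2, b3 = np.linalg.lstsq(A, Y, rcond=None)[0]

# Separate straight-line fits per group (polyfit returns slope, then intercept).
slope_m, icpt_m = np.polyfit(X[D == 0], Y[D == 0], 1)
slope_f, icpt_f = np.polyfit(X[D == 1], Y[D == 1], 1)

print("male line:  ", b0, b2, "vs separate fit", icpt_m, slope_m)
print("female line:", b0 + b1, b2 + b3, "vs separate fit", icpt_f, slope_f)

# The female-minus-male gap now changes with X.
x_grid = np.array([20.0, 50.0, 80.0])
gap = b1 + b3 * x_grid
print("gap at X = 20, 50, 80:", gap)
```

Because the pooled model has a separate intercept shift and slope shift for the second group, its fitted lines agree with the two group-wise fits, and the gap between them varies with X.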
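In model (6.3),
if both the differential intercept coefficient (the coefficient of $D$) and the coefficient of the interaction term are zero, the two regression lines coincide, which
is called coincident regression;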
if
and
and
and
.
In this example, since no dummy variable is defined for the first quarter, the regression line
for the first quarter serves as the baseline, so that comparisons against the first quarter can be
made easily.
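The role of the omitted quarter as the baseline can be seen in a small numerical sketch. It assumes a model with an intercept and three dummies, one each for quarters 2, 3 and 4, and no dummy for quarter 1; the quarterly values below are made up for illustration.

```python
import numpy as np

# Three hypothetical years of quarterly observations (made-up numbers).
quarter = np.tile([1, 2, 3, 4], 3)
Y = np.array([10.0, 12.0, 15.0, 11.0,
              11.0, 13.0, 16.0, 12.0,
              12.0, 14.0, 17.0, 13.0])

# Dummies for quarters 2, 3 and 4 only; quarter 1 has no dummy (base category).
dummies = np.column_stack([(quarter == q).astype(float) for q in (2, 3, 4)])
A = np.column_stack([np.ones(len(Y)), dummies])
coef = np.linalg.lstsq(A, Y, rcond=None)[0]

# Quarter means computed directly, for comparison.
q_means = np.array([Y[quarter == q].mean() for q in (1, 2, 3, 4)])

print("intercept (quarter 1 mean):", coef[0])
print("dummy coefficients:        ", coef[1:])
print("quarter means minus Q1:    ", q_means[1:] - q_means[0])
```

The intercept reproduces the first-quarter mean, and each dummy coefficient is the difference between that quarter's mean and the first-quarter mean. Including a fourth dummy alongside the intercept would make the design matrix columns linearly dependent.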
Example:
Data in the Excel file named RealProp were collected as a part of a citywide study of real estate
property valuation. They are observations on 60 parcels that sold in a particular calendar year
and neighborhood. The variables in the first four columns are: Market for the selling price of the
parcel, Sq.ft for the square feet of living area, Grade for the type of construction (coded as
1 for high, 0 for medium and -1 for low), and Assessed for the most recent assessed value on the city
assessor's books. The variables in the remaining columns are defined where appropriate. The
goal is to develop a model to predict the selling price of a parcel based on the size, grade and
assessed value.
Define
Y: the selling price
(6.3)
and
(6.4)
where
.
1. Find the regression line for each grade for model (6.3).
2. Find the regression line for each grade for model (6.4).
3. Discuss the implication of the use of
Properties of the logit:
1. As the probability $P$ ranges from 0 to 1, the logit $L = \ln\left(\frac{P}{1-P}\right)$ ranges from $-\infty$ to $+\infty$.
2.
.
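3. If the logit is positive, then the odds that Y equals 1 increase as the value of the
explanatory variable(s) increases. If the logit is negative, the odds that Y equals 1 decrease
as the value of the explanatory variable(s) increases.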
The advantage of this regression function is that the estimated regression function is always
between 0 and 1 while the estimated linear regression function can be larger than 1 or less than 0.
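The contrast between the two functions can be sketched numerically. The example below assumes the standard logistic form P = 1 / (1 + exp(-(b0 + b1·x))), which the text above does not show explicitly; the coefficients `b0` and `b1` and the grid of x values are hypothetical.

```python
import numpy as np

# Hypothetical coefficients and covariate values (illustration only).
b0, b1 = -0.2, 0.15
x = np.linspace(-10, 20, 7)

# A linear function of x, read as a "probability", can leave the unit interval.
linear = b0 + b1 * x

# The logistic function of the same linear predictor stays strictly inside (0, 1).
logistic = 1.0 / (1.0 + np.exp(-(b0 + b1 * x)))

# The logit (log-odds) of the logistic probabilities recovers the linear predictor.
logit = np.log(logistic / (1.0 - logistic))

for xi, lin, p in zip(x, linear, logistic):
    print(f"x = {xi:6.1f}   linear = {lin:6.3f}   logistic = {p:.3f}")
```

On this grid the linear values fall below 0 at the low end and above 1 at the high end, while the logistic values remain between 0 and 1, and taking the logit of those probabilities returns the original linear predictor.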
If the estimated
and if the
estimated
.
As the regression function