Gabe Wigtil
6 June 2014
SOC 516 Final Project

Vacant residential housing rates in the United States have a number of important
implications for planners, policy makers, and the general public, especially in light of changing
patterns of human settlement within the nation and recent economic crises. Here, higher rates of
neighborhood level residential vacancy are hypothesized to be associated with higher rates of
neighborhood level unemployment. This hypothesis builds on existing theory related to
housing vacancy. Additional, previously tested determinants of housing vacancy are also
analyzed on a nationwide scale. Several OLS models are fitted with the variables, and the
assumptions of OLS are examined in detail. Results indicate that the positive association exists
and is robust to a number of controls and tests.
Vacant residential housing presents a number of concerns for the neighborhoods in which
it is located. In cities, vacant housing is associated with the idea of blight and can lead to
increased crime, lower property values, and increased fire risk (Kraut 1999). Many local
governments are tackling the issue of vacant and abandoned residential housing head-on by
condemning, demolishing, or revitalizing vacant spaces (Cohen 2001). These vacant spaces are
representative of the inner city decay that has been challenging urban environments for
decades as populations migrate to suburbs and exurban areas. This issue has been further
compounded by the financial crises of the last decade. During and after the Great Recession,
cities like Detroit have captured the nation's attention as shrinking cities, and also because of
shifting patterns of migration following deindustrialization. In fact, it has been suggested that
states like Michigan and Ohio have experienced especially unique changes in housing stock
(Goodman 2013). Outside of regional or spatial challenges to residential housing, the recent
economic crises have taken another toll on the way that populations interact with the housing
supply. It has been noted that as a result of the Great Recession, household formation has
declined dramatically (Dunne 2012). Notably, young adults are now living more frequently in
households headed by their parents. Those young adults who do form their own
households are entering the rental market more frequently and the home ownership
market less frequently.
For the purposes of this analysis, residential vacancies are units that are either listed for
sale or rent that have not yet been purchased or rented, sold or rented units that have not yet been
occupied, or those that have been abandoned (Goodman 2013). In the economics literature, it is
suggested that within localities, there may be natural vacancy rates, akin to natural
unemployment rates (Gabriel & Nothaft 1988, 2001). In their examination of rental markets
specifically, they argue that residential vacancy is determined by the incidence of vacancy (the
likelihood that any given property will become vacant) and the duration of vacancy (the length
of time that a vacant property remains in that state). They find that the proportions of the
population that are elderly or in poverty, along with configurations of racial diversity, are
determinants of vacancy rates.
Dow (2005) examined the determinants of vacancy rates in Los Angeles neighborhoods.
Poverty, race, ethnicity, and the proportion of multiple family units are associated with
residential vacancies. Dow supports the inclusion of poverty, income, race, and education
variables in regression models estimating impacts on vacancy rates, but cautions that checks for
multicollinearity be conducted on the pairs of poverty/income and on Hispanic/education. Baxter
and Lauria (2000) examined how the racial composition of neighborhoods affected vacancy
rates. They note that increased foreclosure rates increase vacancy rates, and that changes in the
racial composition of a neighborhood and the loss of employment were precursors to increased
foreclosure rates. Thus, this paper seeks to examine how unemployment rates affect vacancy
rates, both on their own and as a potential proxy for rates of foreclosures. Dow (2005) provides
many potential control variables for a close examination of vacancy rates. Baxter and Lauria
(2000) also suggest that racial demographics may interact with other economic variables such as
Variables in this analysis are calculated from estimates taken from the 2006-2010
American Community Survey for all census block groups (N=220,334) (Appendix 1). The
American Community Survey (ACS) is the replacement for the long-form version of the decennial
US Census and is conducted on a continuous basis, thus providing more timely, albeit less
precise, data (MacDonald 2006). Margins of error are provided for ACS estimates and could be used
for conducting uncertainty analysis, but are not used in this analysis. Not all estimates are
provided for each census block group, thus the number of observations reported for each variable
is inconsistent. Table 1 presents summary statistics of the variables. Pairwise correlations
between the variables do not suggest any immediate concerns regarding multicollinearity (no
correlation coefficient exceeds .8 in absolute value; results not shown), though this
will be examined in more detail further on.

Table 1. Summary statistics.

Variable           Obs       Mean  Std. Dev.       Min       Max
pervacant       218809    8.63804   8.915808         0       100
percvlun        218909   8.527372   7.656017         0       100
perblack        219144   13.09015   23.41831         0       100
perasian        219144   4.080123   9.118214         0       100
perhisp         219144    15.2349   23.47942         0       100
perpvty         218483   11.67829    14.2141         0       100
logpercap       219040   10.07861   .5123987  3.850147  12.76521
perrenter       218747   32.07616   25.74984         0       100
logmdgrent      190230   6.709981   .4425627   4.59512  7.601402
logmdhseval     213176   12.08548   .7583369   9.21024  13.81551
pered12les      219075   15.87862   13.75562         0       100
michoh          220334   .0791662   .2699986         0         1

The variables were fitted using nested OLS regressions, with groups of theoretically
related control variables sequentially added to the base model (Table 2). OLS was chosen
because the dependent variable, percent vacant housing, is expressed as a continuous variable
ranging between 0 and 100. All models have significant F-statistics (not shown). In all models,
all beta coefficients are significant at the 1% level (p < .01), likely because of the
large number of observations in the dataset (Lin, Lucas, Jr., & Shmueli 2013). In model 1, a
single-predictor model, percent unemployment is used as the only determinant of percent vacant
housing. This model accounts for roughly 5% of the variation observed in the dependent
variable, percent vacant housing. In this model, the beta coefficient on percent unemployment is
.263, meaning that a one percentage point increase in unemployment is associated with a
.263 percentage point increase in vacant housing.
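As a worked illustration of this interpretation, the model 1 fitted line can be evaluated directly from the intercept and slope reported in Table 2 (6.375 and .263). The helper name below is hypothetical, and the sketch ignores the sampling uncertainty around both estimates.

```python
# Model 1 estimates as reported in Table 2 (column m1).
INTERCEPT_M1 = 6.375  # _cons
SLOPE_M1 = 0.263      # coefficient on percvlun


def predict_vacancy_m1(pct_unemployed: float) -> float:
    """Predicted percent vacant housing under model 1 (hypothetical helper)."""
    return INTERCEPT_M1 + SLOPE_M1 * pct_unemployed


# Evaluate near the mean unemployment rate in Table 1, then one point higher.
at_mean = predict_vacancy_m1(8.53)
one_point_higher = predict_vacancy_m1(9.53)
difference = one_point_higher - at_mean

print(round(at_mean, 2), round(difference, 3))
```

The difference between the two predictions is the slope itself, which is what the one percentage point interpretation states.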
Table 2. OLS regressions on percent of housing units within a neighborhood that are vacant.

Variable                 m1           m2           m3           m4           m5           m6
percvlun           0.263***     0.145***     0.043***     0.058***     0.054***
perblack                        0.101***     0.258***     0.492***     0.485***     0.522***
perasian                       -0.066***    -0.042***    -0.009***    -0.011***    -0.011***
perhisp                         0.016***    -0.017***     0.005***    -0.003***    -0.003***
perpvty                                      0.075***     0.036***     0.033***     0.038***
logpercap                                   -2.482***     1.465***     1.879***     1.700***
blacklogpe~p                                -0.019***    -0.043***    -0.043***    -0.046***
perrenter                                                 0.060***     0.060***     0.060***
logmdgrent                                               -1.012***    -0.919***    -0.891***
logmdhseval                                              -2.477***    -2.429***    -2.403***
pered12les                                                             0.039***     0.041***
michoh                                                                 0.460***     0.606***
_cons              6.375***     6.099***    31.855***    26.809***    20.981***    22.611***

N                    218703       218703       218409       184923       184923       184926
r2                    0.052        0.120        0.160        0.201        0.203        0.202
r2_a                  0.052        0.120        0.160        0.201        0.203        0.201
aic             1561650.744  1545271.414  1530236.707  1277726.878  1277318.708  1277697.066
bic             1561671.335  1545322.892  1530319.060  1277838.283  1277450.368  1277818.599

legend: * p<.1; ** p<.05; *** p<.01

Given the theoretical discussion above, this model likely suffers from omitted variable
bias, thus additional models are fitted using the theoretically suggested control variables. This
suspicion is confirmed by examining the pairwise correlations between percent unemployment
and the other control variables, and by noting the significant beta coefficients on the control
variables in model 5. For example, given the positive correlation between percent unemployment
and percent black (Table 3) and the positive beta coefficient on percent black in the models in
which that variable is included, we can determine that the beta coefficient on percent
unemployed in model 1 is positively biased. A Ramsey
RESET test supports this suspicion, rejecting the null hypothesis
that the model has no omitted variables (reported p-value of 0.0000).
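The direction-of-bias argument above rests on a standard identity: the slope from the short regression equals the slope from the long regression plus the omitted variable's coefficient times the slope from regressing the omitted variable on the included one. The sketch below checks that identity on a small synthetic dataset (not ACS data); all variable names and values are illustrative assumptions.

```python
def centered_cross_sum(x, y):
    """Sum of (x - mean(x)) * (y - mean(y)) over paired observations."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))


# Synthetic data: x2 is positively correlated with x1 and both raise y,
# loosely mimicking unemployment (x1) and an omitted control (x2).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0]
noise = [0.1, -0.1, 0.2, -0.2, 0.1, -0.1, 0.2, -0.2]
y = [1.0 + 0.5 * a + 1.5 * b + e for a, b, e in zip(x1, x2, noise)]

s11 = centered_cross_sum(x1, x1)
s22 = centered_cross_sum(x2, x2)
s12 = centered_cross_sum(x1, x2)
s1y = centered_cross_sum(x1, y)
s2y = centered_cross_sum(x2, y)

# Short regression: y on x1 only.
short_b1 = s1y / s11

# Long regression: y on x1 and x2, solved from the two normal equations.
det = s11 * s22 - s12 ** 2
long_b1 = (s22 * s1y - s12 * s2y) / det
long_b2 = (s11 * s2y - s12 * s1y) / det

# Auxiliary regression: x2 on x1.
delta = s12 / s11

bias = long_b2 * delta
print(short_b1, long_b1 + bias)
```

Because both `long_b2` and `delta` are positive here, the short-regression slope overstates the long-regression slope, which mirrors the reasoning applied to percent black and percent unemployed.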
With the addition of groups of control variables, the
beta coefficient on percent unemployment is attenuated. In
model 5, the beta coefficient on percent unemployment is
.054. The R-squared and adjusted R-squared values increase
with the inclusion of additional, theoretically suggested
control variables. Model 5 accounts for roughly 20% of the
variation observed in percent vacant housing. Other goodness
of fit measurements provide evidence for selecting model 5 as
the preferred model, including the AIC and BIC statistics, which are lowest for model 5 among
those fitted. Given the attenuation seen in the beta coefficient on percent unemployed, a question
may arise as to its true value and significance. To partially address this question, model 6 is fitted
without this independent variable but with all of the other control variables. Here we see that the
coefficients on the remaining independent variables are biased in the expected direction as a result of the
omission of the percent unemployed variable. Additionally, a Ramsey RESET test confirms that
model 6 likely suffers from omitted variable bias.

Table 3. Pairwise correlations between percent unemployed and the control variables.
All correlations are significant at the 1% level (p < .01).

              percvlun  perblack  perasian   perhisp   perpvty  logper~p  perren~r
percvlun        1.0000
perblack        0.3444    1.0000
perasian       -0.0664   -0.0948    1.0000
perhisp         0.1425   -0.0853    0.0181    1.0000
perpvty         0.4467    0.3490   -0.0841    0.3198    1.0000
logpercap      -0.4465   -0.3257    0.1610   -0.3699   -0.6639    1.0000
perrenter       0.2653    0.3018    0.1084    0.2739    0.4910   -0.4183    1.0000
logmdgrent     -0.1820   -0.0909    0.3121    0.0331   -0.3386    0.4793   -0.0686
logmdhseval    -0.2496   -0.2235    0.3664    0.0121   -0.3876    0.6374   -0.0972
pered12les      0.3370    0.2157   -0.0698    0.5559    0.5398   -0.6644    0.3478
michoh          0.1047    0.0318   -0.0763   -0.1455    0.0221   -0.0447   -0.0503

Even though model 5 is the best specified of the models presented, it also likely suffers
from omitted variable bias, as is confirmed once again by a Ramsey RESET test. Other likely
determinants of vacancy rates within neighborhoods include: age distribution (Dunne 2012;
Gabriel & Nothaft 2001); distribution of housing unit types (Dow 2005); and change in
population (Gabriel & Nothaft 1988). These variables are not included in the current dataset and
are thus not included in the regression models.
Given that our sample is in fact the population of all census block groups in the United
States, there exists the distinct possibility that significant outliers exist. Predicting Studentized
residuals and re-fitting model 5 with observations of large residuals removed provides us the
opportunity to examine the impact of these outliers on our results (Appendix 2). The trend
overall, with regard to beta coefficient magnitudes, is that these coefficients retain their size and
direction. Notably, the beta coefficient on the Michigan-Ohio dummy variable is attenuated, and
its significance reduced with the most aggressive exclusion of outliers. Goodness of fit
measurements demonstrate that this model may be a better choice, though at the expense of
nearly 5% of the observations. This analysis is informative, but the original model 5 will be used
for the remainder of the analyses.
Previous examination of all pairwise correlations between variables did not cause
immediate concern regarding imperfect multicollinearity. Model 5 (without the percent black/log
per capita income interaction terms) is tested for imperfect multicollinearity using variance
inflation factors. The highest variance inflation factor is calculated for the logged per capita
income term at 3.68. This is a somewhat high value, but not an indication of severe imperfect
multicollinearity. The variance inflation factor for our variable of interest, percent unemployed,
is 1.40, providing additional evidence that this variable is useful in explaining some of the
variation observed in the dependent variable. Heteroskedasticity is observed in model 5 when
examining a plot of the residuals against the fitted values. A White test confirms that
heteroskedasticity is present, rejecting the null hypothesis of homoskedasticity (reported
p-value of 0.000). The beta coefficients all remain significant at the 1% level (p < .01) with the use of
robust standard errors (results not shown), again likely because of the high number of
observations in the dataset.
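To put the reported variance inflation factors in context, the standard definition VIF = 1 / (1 - R^2) can be inverted to recover the share of a predictor's variance explained by the other predictors. The sketch below applies that inversion to the two values quoted above; the function name is a hypothetical helper, and the implied R-squared values are derived quantities rather than results reported in this analysis.

```python
def implied_r_squared(vif: float) -> float:
    """R-squared from regressing one predictor on the others, given its VIF.

    Inverts the standard definition VIF = 1 / (1 - R^2).
    """
    if vif < 1.0:
        raise ValueError("a variance inflation factor cannot be below 1")
    return 1.0 - 1.0 / vif


# VIF values quoted in the text for model 5 without the interaction term.
reported_vifs = [("logpercap", 3.68), ("percvlun", 1.40)]

for name, vif in reported_vifs:
    print(name, round(implied_r_squared(vif), 3))
```

Under this definition, a VIF of 3.68 corresponds to roughly 73% of the variance in logged per capita income being shared with the other predictors, while a VIF of 1.40 corresponds to roughly 29% for percent unemployed.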
Regarding the magnitude of our results, model 5 was fitted with all variables standardized
(results not shown). Percent black is the largest influencer of the dependent variable. For a one
standard deviation increase in the percentage of the population that is black (a roughly 23
percentage point increase) we would predict a 1.34 standard deviation increase in the percentage
of vacant housing units (1.34 standard deviations * (8.9 percentage points)/(1 standard deviation)
= approximately a 12 percentage point increase in vacant units). For our variable of interest, a
one standard deviation increase in the percent of the population that is unemployed (a 7.6
percentage point increase) is associated with a .05 standard deviation increase in the percentage
of vacant housing units (.05 standard deviations * (8.9 percentage points)/(1 standard deviation)
= .4 percentage point increase). Compared to other standardized beta coefficients, this is close in
magnitude, though opposite in sign, to the impact associated with the logged value of median
gross rent (standardized beta coefficient = -.047). As median gross rent is a theoretically
supported determinant of vacancy rates (Gabriel and Nothaft 1988), this comparison provides
additional evidence for including unemployment rates as a relevant determinant of vacancy.
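The conversions in this paragraph can be reproduced with a few lines of arithmetic. The sketch below assumes the usual relationship between an unstandardized and a standardized coefficient (multiply by the predictor's standard deviation and divide by the outcome's) and uses the Table 1 standard deviations. Those are computed over all block groups with available data rather than the model 5 estimation sample, so the result is approximate; the function names are hypothetical.

```python
SD_PERVACANT = 8.915808  # Std. Dev. of pervacant, Table 1
SD_PERCVLUN = 7.656017   # Std. Dev. of percvlun, Table 1
B_PERCVLUN_M5 = 0.054    # model 5 coefficient on percvlun, Table 2


def standardized_beta(b: float, sd_x: float, sd_y: float) -> float:
    """Approximate standardized coefficient from an unstandardized one."""
    return b * sd_x / sd_y


def sd_units_to_points(beta_std: float, sd_y: float) -> float:
    """Convert an effect in outcome standard deviations to percentage points."""
    return beta_std * sd_y


beta_unemployed = standardized_beta(B_PERCVLUN_M5, SD_PERCVLUN, SD_PERVACANT)
points_unemployed = sd_units_to_points(0.05, 8.9)  # rounded values from the text
points_black = sd_units_to_points(1.34, 8.9)       # rounded values from the text

print(round(beta_unemployed, 2), points_unemployed, points_black)
```

The approximate standardized coefficient rounds to the .05 quoted above, and the two conversions reproduce the roughly .4 and 12 percentage point figures.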
Model 5 includes the interaction term blacklogpercap, which interacts the percent of the
population that is black with the logged value of per capita income (Baxter and Lauria 2000). For
purposes of graphical evaluation, a categorical variable (blackcat) representing percent of the
population that is black was generated (=1 if perblack={0,13.09}; =2 if perblack={13.09,36.51};
=3 if perblack={36.51,100}; these cutoff values [13.09 and 36.51] are the mean and one
standard deviation greater than the mean of the variable perblack). For values of logged per
capita income below its mean, there is a significant interaction between percent of the population
that is black, structured among the three aforementioned categories, and the logged value of per
capita income (Graph 1). This is confirmed by examining the significance of the interactive term
in the OLS regression.
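The construction of blackcat can be sketched as a simple binning rule on perblack using the two cutoffs given above. The text does not say which category a value exactly equal to a cutoff falls into, so the sketch assumes it is assigned to the higher category; that choice and the function name are assumptions.

```python
MEAN_PERBLACK = 13.09     # mean of perblack (Table 1, rounded)
MEAN_PLUS_ONE_SD = 36.51  # mean plus one standard deviation, as given in the text


def blackcat(perblack: float) -> int:
    """Assign a block group to category 1, 2, or 3 based on percent black.

    Assumption: a value exactly at a cutoff goes to the higher category.
    """
    if not 0.0 <= perblack <= 100.0:
        raise ValueError("perblack must be a percentage between 0 and 100")
    if perblack < MEAN_PERBLACK:
        return 1
    if perblack < MEAN_PLUS_ONE_SD:
        return 2
    return 3


examples = [5.0, 20.0, 50.0]
print([blackcat(value) for value in examples])
```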
Model 5 also includes the dummy variable michoh, which indicates whether or not the
census block group is located in either the state of Michigan or Ohio. Goodman (2013) notes that
these two states may have experienced especially unique changes in housing stock. The beta
coefficient on this variable is positive and significant. Thus, holding the other variables in
model 5 constant, census block groups in Michigan and Ohio have, on average, vacancy rates .46
percentage points higher than census block groups in all other locations. This has implications
for policy makers working to address the unique concerns facing these states.

Graph 1. Interaction of percent of the population that is black with logpercap (series plotted by blackcat category; figure not reproduced).

Much of the significance in model 5, and in fact all the others, can possibly be attributed
to the large number of observations in the dataset (Lin et al. 2013). Following a modified
version of the approach they propose for more closely examining the significance of independent
variables in large-N datasets, random sample subsets of increasing size are drawn (using .sample)
from this dataset to test the fit of model 5 (though without the Michigan-Ohio dummy variable).
This allows us to examine the sensitivity of our beta coefficients to the size of the drawn sample,
and also helps examine issues of uncertainty. On the whole, the results regarding the beta
coefficient on percent unemployed appear somewhat robust to variations in sample size, though
there are concerns regarding the significance of this coefficient for samples that include ten
percent of the observations or less (N<19,000) (Graph 2). See Appendix 3 for beta coefficient
estimates for all other variables across sample sizes. Percent unemployment is also a consistently
significant determinant of vacancy rates, at least when compared to some of the other theorized

Graph 2. Beta coefficients and 95% confidence intervals (lower and upper bounds) on percent unemployed for random sample sizes drawn from the observed population. Axis: sample size as a percentage of population size, on a logged scale (0.1 to 100). Figure not reproduced.

indicate that rates of unemployment are a significant determinant of vacancy rates within census
block groups. This has implications
for urban planners, policy makers, and the general public when considering the context in which
vacant and abandoned housing is generated. While there may indeed be natural rates of
residential vacancy (Gabriel and Nothaft 1988, 2001) as there are natural rates of unemployment,
social policies have been adopted to help address the issue of unemployment. Perhaps a fuller
understanding of the determinants of residential vacancy rates can help generate policies that will
address the complex social context in which these vacancies are occurring.
Baxter, V., & Lauria, M. (2000). Residential mortgage foreclosure and neighborhood change.
Housing Policy Debate, 11(3), 675-699.
Cohen, J. R. (2001). Abandoned housing: Exploring lessons from Baltimore. Housing Policy
Debate, 12(3), 415-448.
Dow, J. P., Jr. (2005). Neighborhood Factors Affecting Apartment Vacancy Rates in Los
Angeles. Southwestern Economic Review, 32(1), 35-44.
Dunne, T. (2012). Household Formation and the Great Recession. Federal Reserve Bank of
Cleveland, August. Retrieved from
Gabriel, S. A., & Nothaft, F. E. (2001). Rental Housing Markets, the Incidence and
Duration of Vacancy, and the Natural Vacancy Rate. Journal of Urban Economics, 49,
Gabriel, S. A., & Nothaft, F. E. (1988). Rental housing markets and the natural vacancy
rate. Journal of the American Real Estate and Urban Economics Association, 16(4), 419
Goodman, A. C. (2013). Is there an S in urban housing supply? or What on earth happened in
Detroit? Journal of Housing Economics, 22, 179-191.
Kraut, D. T. (1999). Hanging out the No Vacancy Sign: Eliminating the Blight of Vacant
Buildings from Urban Areas. New York University Law Review, 74, 1139-1177.
Lin, M., Lucas, H. C., Jr., & Shmueli, G. (2013). Research Commentary-Too Big to Fail: Large
Samples and the p-Value Problem. Information Systems Research, 24(4), 906-917.
MacDonald, H. (2006). The American Community Survey: Warmer (More Current), but Fuzzier
(Less Precise) than the Decennial Census. Journal of the American Planning Association,
72(4), 491-503.

Appendix 1. Variables.

Each entry lists the variable name, followed by its description and calculation.

pervacant
    Description: Percent of housing units that are vacant
    Calculation: 100*(vacant housing units [excludes seasonal housing units and migrant worker housing units])/(housing units)

percvlun
    Description: Percent of the civilian work force that is unemployed
    Calculation: 100*(unemployed civilian work force)/(total civilian work force)

perblack
    Description: Percent of the population that for race identifies as black or African American
    Calculation: 100*(race black or African American alone)/(total population)

perasian
    Description: Percent of the population that for race identifies as Asian, Native Hawaiian, or Other Pacific Islander alone
    Calculation: 100*(race Asian, Native Hawaiian, or Other Pacific Islander alone)/(total population)

perhisp
    Description: Percent of the population that for ethnicity identifies as Hispanic or Latino
    Calculation: 100*(Hispanic or Latino)/(total population)

perpvty
    Description: Percent of the population for whom poverty status is defined, with an income in the past 12 months below poverty level
    Calculation: 100*(income in the past 12 months below poverty level)/(total population for whom poverty status is determined)

logpercap
    Description: Logged value of per capita income in the past 12 months
    Calculation: ln(per capita income)

perrenter
    Description: Percent of the total population that are occupying housing units that are rented
    Calculation: 100*(renter occupied population)/(total population in occupied housing units)

logmdgrent
    Description: Logged value of the median gross rent
    Calculation: ln(median gross rent, renter-occupied housing units paying cash rent)

logmdhseval
    Description: Logged value of the median house value
    Calculation: ln(median value, owner-occupied housing units)

pered12les
    Description: Percent of the population 25 years of age or older with less than 12 years of education
    Calculation: 100*(those over 25 years of age with education less than 12 years)/(total population, those over 25 years of age)

michoh
    Description: Dummy variable indicating whether or not the census block group is located in either the state of Michigan or Ohio
    Calculation: Census block groups with a Federal Information Processing Standard state code of either 26 or 39 = 1; all else = 0

Appendix 2. Jack-knife analysis.

Variable                 m5         exr4         exr3         exr2
percvlun           0.054***     0.050***     0.047***     0.048***
perblack           0.485***     0.436***     0.387***     0.453***
perasian          -0.011***    -0.013***    -0.012***    -0.010***
perhisp           -0.003***    -0.004***    -0.004***    -0.005***
perpvty            0.033***     0.033***     0.034***     0.041***
logpercap          1.879***     1.793***     1.733***     1.869***
blacklogpe~p      -0.043***    -0.038***    -0.034***    -0.040***
perrenter          0.060***     0.058***     0.055***     0.050***
logmdgrent        -0.919***    -0.929***    -0.939***    -0.951***
logmdhseval       -2.429***    -2.362***    -2.275***    -2.187***
pered12les         0.039***     0.040***     0.039***     0.037***
michoh             0.460***     0.368***     0.226***       0.105*
_cons             20.981***    21.166***    20.758***    18.082***

N                    184923       184288       182702       176266
r2                    0.203        0.203        0.202        0.229
r2_a                  0.203        0.203        0.202        0.229
aic             1277318.708  1257376.853  1226429.854  1136948.508
bic             1277450.368  1257508.468  1226561.356  1137079.545

legend: * p<.1; ** p<.05; *** p<.01

Appendix 3. Coefficient, p-value, sample size chart.