DS101 Housing Project

California Estate Firm

By: Amrit Dhaliwal & Demi To


Introduction

In the state of California, one of the biggest issues is the increase in housing prices. In
certain areas, such as the Bay Area and the coastal counties of Southern California, housing
prices are rocketing, while other areas remain more affordable. Clearly, several factors drive
these differences. When looking to purchase a home, it is important to consider the factors that
affect home prices. For example, home buyers might be interested in the population, the median
income of the neighborhood, or even crime rates, since safety is a big concern. It is also
important to compare the housing market across counties, for example to see whether other
counties have significantly more expensive housing and what other factors might influence those
home prices. Taking time to consider all these factors when looking for a home matters, since
buying a house is a serious step.


Literature Review

For our research, we found useful insight in “Regulation and the High Cost of Housing in
California” by John M. Quigley and Steven Raphael. According to this article, housing costs are
driven by regulation and the supply of housing units. It also discusses higher population
density and the elasticity of housing prices, so we included population density as a variable.
The article suggests that people with various incomes and backgrounds play a dynamic role in the
housing market, which led us to consider income as a variable.

An article called “The Impact of Interest Rates, Income, and Employment Upon Regional Housing
Prices” provides important information on the different ways housing prices react to local and
economic factors such as mortgage rates, population shifts, and income trends. The study aims to
find how each factor affects housing prices in its own way. The author also emphasizes
demographic factors in the demand for housing. The article focuses on the U.S. housing market as
a whole; however, our problem statement focuses only on counties in California.


Problem Statement

With so many factors at play, why do some counties have significantly higher average housing
prices than other counties? After researching the subject intensively, we decided to incorporate
six independent variables into this project: median income, unemployment, population density,
crime rate, house foreclosures, and a coastal-county dummy. We test whether these variables are
statistically significant in explaining the variability of average housing prices across
California counties. The null hypotheses for our project are:

H0:B1=0
H0:B2=0
H0:B3=0
H0:B4=0
H0:B5=0
H0:B6=0

And the alternative hypotheses are:

H1:B1≠0
H1:B2≠0
H1:B3≠0
H1:B4≠0
H1:B5≠0
H1:B6≠0
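
The decision rule behind these hypotheses can be sketched in Python. This is a minimal illustration, not the Statgraphics procedure itself: it recomputes each t statistic as the estimate divided by its standard error, using the values reported in the Appendix for the first model, and applies the 0.05 cutoff to the reported P-values. The dictionary layout and the names `coefficients`, `t_statistic`, `rejected` and `ALPHA` are our own assumptions.

```python
# Estimates, standard errors and P-values copied from the first model's
# regression table in the Appendix. The data layout is illustrative only.
ALPHA = 0.05  # significance level matching the 95% confidence level

coefficients = {
    "Crime Rate in 2009":         (-8779.05, 22941.9, 0.7036),
    "house foreclosures in 2010": (-9.36419, 4.00311, 0.0233),
    "Median income in 2010":      (8.19665, 0.8544, 0.0000),
    "Population Density in 2010": (0.0348907, 0.0146009, 0.0206),
    "Unemployment in 2010":       (-2923.2, 4073.15, 0.4762),
    "Coastal(dummy variable)":    (115281.0, 27145.8, 0.0001),
}


def t_statistic(estimate, std_error):
    """t statistic for the null hypothesis that the coefficient is zero."""
    return estimate / std_error


def rejected(p_value, alpha=ALPHA):
    """True when the null hypothesis is rejected at the given level."""
    return p_value < alpha


results = {
    name: (t_statistic(est, se), rejected(p))
    for name, (est, se, p) in coefficients.items()
}

for name, (t, rej) in results.items():
    verdict = "reject H0" if rej else "fail to reject H0"
    print(f"{name:28s} t = {t:9.4f}  -> {verdict}")
```

Under this rule, four of the six null hypotheses are rejected; the crime rate and unemployment coefficients are the two exceptions.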

Methodology

In this project, we examine the variability in average housing prices across all 58 counties in
California in 2010. We started by gathering Excel data tables from RAND California. The website
provides data tables that are convenient to export into Excel and then load into Statgraphics.
The data we collected from RAND California are the crime rate, median income, population
density, and house foreclosures. Finding data for each variable was tricky because, depending on
the information needed, it came from various sources. We obtained the unemployment rate for each
county from the California Bureau of Labor Statistics map. By looking at a county map, we
identified which counties are coastal for the coastal variable. After gathering all the data, we
imported it into Statgraphics and made every column numeric. The first model was the simple
model, which includes one dependent variable and six independent variables. The six independent
variables are median income, unemployment, population density, crime rate, house foreclosures,
and the coastal dummy. Lastly, we developed a more complex model by adding a lag variable,
population density from two years before. After these models were developed, we copied and
pasted the output into a Word document.
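
The lag step can be illustrated with a short Python sketch. It assumes that the lag operator shifts a column down by two rows and leaves the first two rows without a value, which is consistent with the number of observations falling from 58 in the first model to 56 in the modified model (see the Appendix). The function name `lag` and the placeholder values are hypothetical; no real county data is used.

```python
def lag(values, k):
    """Shift a column down by k rows; the first k rows get no lagged value."""
    if k < 0:
        raise ValueError("k must be non-negative")
    n = len(values)
    k = min(k, n)
    return [None] * k + list(values[: n - k])


# Placeholder column with one entry per county (58 rows); not real data.
density = [float(i) for i in range(1, 59)]
lagged = lag(density, 2)

# Rows with a missing lagged value cannot enter the regression.
usable_rows = [i for i, value in enumerate(lagged) if value is not None]
print(len(density), len(usable_rows))
```

With 58 rows and a lag of 2, only 56 rows have a lagged value available.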


Analysis

The R-squared for the first multiple regression model is 86.4603 percent, which means the fitted
model explains about 86 percent of the variability in average house sales. The adjusted
R-squared is approximately 85 percent. The P-value for the model overall was 0.0000, which
implies the model is statistically significant at the 95% confidence level. The P-values for the
individual variables varied. Median income has a P-value of 0.0000, population density in 2010
has a P-value of 0.0206, unemployment has a P-value of 0.4762, the coastal dummy variable has a
P-value of 0.0001, the crime rate in 2009 has a P-value of 0.7036, and house foreclosures have a
P-value of 0.0233. Only unemployment and the crime rate have P-values higher than 0.05, so these
two variables are not statistically significant. Every other variable is statistically
significant at the 95% confidence level. The first model has a fitted equation of:

Avg house sales = -230865
    - 8779.05*Crime Rate in 2009
    - 9.36419*house foreclosures in 2010
    + 8.19665*Median income in 2010
    + 0.0348907*Population Density in 2010
    - 2923.2*Unemployment in 2010
    + 115281*Coastal(dummy variable)

The slopes can be interpreted as follows, holding the other variables constant. For every
one-unit increase in the crime rate, the average housing price decreases by $8,779.05. For every
one-unit increase in house foreclosures in 2010, the average housing price decreases by about
$9.36. For every one-unit increase in median income, the average housing price increases by
about $8.20. For every one-unit increase in population density in 2010, the average housing
price increases by about $0.03. For every one-percentage-point increase in unemployment, the
average housing price decreases by $2,923.20. Being a coastal county raises the average housing
price by $115,281, so coastal counties are significantly more expensive. The intercept means
that if all the variables in the equation equal zero, the predicted average housing price would
be -$230,865, a value with no practical meaning since a negative price is not possible.
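
To make the interpretation concrete, the first fitted equation can be written as a small Python function. This is a sketch: the coefficients are the ones reported for the first model, while the function name `predict_avg_price` and the input values for the example county are hypothetical and chosen only to show the arithmetic.

```python
def predict_avg_price(crime_rate, foreclosures, median_income,
                      pop_density, unemployment, coastal):
    """Predicted average house sales from the first fitted equation."""
    return (-230865.0
            - 8779.05 * crime_rate
            - 9.36419 * foreclosures
            + 8.19665 * median_income
            + 0.0348907 * pop_density
            - 2923.2 * unemployment
            + 115281.0 * coastal)


# Hypothetical inputs, not taken from the data set.
county = dict(crime_rate=3.0, foreclosures=1000, median_income=60000,
              pop_density=500.0, unemployment=10.0)

inland_price = predict_avg_price(coastal=0, **county)
coastal_price = predict_avg_price(coastal=1, **county)

print(round(inland_price, 2), round(coastal_price, 2))
print(round(coastal_price - inland_price, 2))
```

Whatever the other inputs are, the gap between the coastal and inland predictions equals the coastal coefficient, 115,281.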

For the modified model, a lag variable was included: population density lagged by two years.
When the lag variable was incorporated, the P-values changed. The P-value for crime rate in 2009
is 0.7378, house foreclosures in 2010 is 0.0174, median income in 2010 is 0.0000, population
density in 2010 is 0.0084, unemployment in 2010 is 0.2845, coastal is 0.0003, and the population
density lag is 0.0071. Unemployment in 2010 and crime rate in 2009 have P-values higher than
0.05, meaning they are not significant at the 95% confidence level. With an R-squared of 88.2401
percent, the modified model explains slightly more of the variability in average house sales
than the first model.
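
The fit statistics of both models can be checked against the Analysis of Variance tables in the Appendix. The sketch below uses the standard formulas R-squared = SS_model / SS_total, adjusted R-squared = 1 - (1 - R-squared)(n - 1)/(n - k - 1), and F = MS_model / MS_residual, where n is the number of observations and k the number of independent variables. The function name `fit_summary` is our own.

```python
def fit_summary(ss_model, ss_residual, n_obs, n_predictors):
    """Return (R-squared, adjusted R-squared, F-ratio) from ANOVA sums of squares."""
    ss_total = ss_model + ss_residual
    df_residual = n_obs - n_predictors - 1
    r2 = ss_model / ss_total
    adj_r2 = 1.0 - (1.0 - r2) * (n_obs - 1) / df_residual
    f_ratio = (ss_model / n_predictors) / (ss_residual / df_residual)
    return r2, adj_r2, f_ratio


# Sums of squares and observation counts as reported in the Appendix.
first_model = fit_summary(1.66653e12, 2.60979e11, n_obs=58, n_predictors=6)
modified_model = fit_summary(1.67506e12, 2.23238e11, n_obs=56, n_predictors=7)

for label, (r2, adj_r2, f_ratio) in [("first", first_model),
                                     ("modified", modified_model)]:
    print(f"{label:9s} R2 = {100 * r2:.2f}%  "
          f"adjusted R2 = {100 * adj_r2:.2f}%  F = {f_ratio:.2f}")
```

Adjusted R-squared rises from about 84.9 percent to about 86.5 percent, so the lag variable improves the fit even after accounting for the extra predictor.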

The fitted equation for the modified model is:

Avg house sales = -174373
    - 8191.75*Crime Rate in 2009
    - 9.75421*house foreclosures in 2010
    + 7.44849*Median income in 2010
    + 0.0393441*Population Density in 2010
    - 4303.53*Unemployment in 2010
    + 104498*Coastal(dummy variable)
    + 0.0187885*lag(population density in 2008,2)

The intercept means that if all the variables equal zero, the predicted average house sales
would be -$174,373. The slopes can be interpreted as follows, holding the other variables
constant. For every one-unit increase in the crime rate in 2009, the average housing price
decreases by $8,191.75. For every additional house foreclosure in 2010, the average housing
price decreases by about $9.75. For every one-unit increase in median income, the average
housing price increases by about $7.45. For every one-unit increase in population density in
2010, the average housing price increases by about $0.04. For every one-percentage-point
increase in unemployment in 2010, the average housing price decreases by $4,303.53. Being a
coastal county raises the average housing price by $104,498. For every one-unit increase in
lagged population density, the average housing price increases by about $0.02.

After running all the data through Statgraphics and interpreting the results, we noticed that
being a coastal county has a large impact on average housing prices. This result was expected,
since it makes sense for coastal homes to have higher prices than homes located inland. We
expected the crime rate to be statistically significant, but its P-value was higher than 0.05,
so at the 95% confidence level it is not.


Conclusion

Based on our findings, crime may not be a significant variable in the housing market, since
neither the crime rate in 2009 nor the unemployment rate in each county is statistically
significant. Aside from these two non-significant variables, the model is a good model: the
overall P-values for the two models indicate that each model as a whole is statistically
significant at the 95% confidence level.

References
RAND California Statistics. (n.d.). Retrieved May 17, 2015, from
http://ca.rand.org/stats/statistics.html

Reichert, Alan K. “The Impact of Interest Rates, Income, and Employment Upon Regional Housing
Prices.” The Journal of Real Estate Finance and Economics 3.4 (1990): n. pag. Crossref. Web.

Quigley, John M., and Steven Raphael. 2005. “Regulation and the High Cost of Housing in
California.” American Economic Review, 95 (2): 323-328.


Appendix

Multiple Regression - Avg house sales


Dependent variable: Avg house sales
Independent variables:
Crime Rate in 2009
house foreclosures in 2010
Median income in 2010
Population Density in 2010
Unemployment in 2010
Coastal(dummy variable)
Number of observations: 58

Avg house sales = -230865
    - 8779.05*Crime Rate in 2009
    - 9.36419*house foreclosures in 2010
    + 8.19665*Median income in 2010
    + 0.0348907*Population Density in 2010
    - 2923.2*Unemployment in 2010
    + 115281*Coastal(dummy variable)

Parameter                     Estimate     Standard Error   T Statistic   P-Value
CONSTANT                      -230865.     101416.          -2.27642      0.0271
Crime Rate in 2009            -8779.05     22941.9          -0.382664     0.7036
house foreclosures in 2010    -9.36419     4.00311          -2.33923      0.0233
Median income in 2010         8.19665      0.8544           9.59345       0.0000
Population Density in 2010    0.0348907    0.0146009        2.38962       0.0206
Unemployment in 2010          -2923.2      4073.15          -0.717675     0.4762
Coastal(dummy variable)       115281.      27145.8          4.24675       0.0001

Analysis of Variance
Source           Sum of Squares   Df   Mean Square   F-Ratio   P-Value
Model            1.66653E12       6    2.77754E11    54.28     0.0000
Residual         2.60979E11       51   5.11724E9
Total (Corr.)    1.92751E12       57

R-squared = 86.4603 percent
R-squared (adjusted for d.f.) = 84.8673 percent
Standard Error of Est. = 71534.9
Mean absolute error = 50210.1
Durbin-Watson statistic = 1.61536 (P=0.0703)
Lag 1 residual autocorrelation = 0.191338

[Figure: Plot of Avg house sales, observed versus predicted, both axes in units of 100,000]


Multiple Regression - Avg house sales

Dependent variable: Avg house sales
Independent variables:
Crime Rate in 2009
house foreclosures in 2010
Median income in 2010
Population Density in 2010
Unemployment in 2010
Coastal(dummy variable)
lag(population density in 2008,2)
Number of observations: 56

Avg house sales = -174373
    - 8191.75*Crime Rate in 2009
    - 9.75421*house foreclosures in 2010
    + 7.44849*Median income in 2010
    + 0.0393441*Population Density in 2010
    - 4303.53*Unemployment in 2010
    + 104498*Coastal(dummy variable)
    + 0.0187885*lag(population density in 2008,2)

Parameter                            Estimate     Standard Error   T Statistic   P-Value
CONSTANT                             -174373.     99400.7          -1.75424      0.0858
Crime Rate in 2009                   -8191.75     24326.6          -0.33674      0.7378
house foreclosures in 2010           -9.75421     3.95929          -2.46363      0.0174
Median income in 2010                7.44849      0.871245         8.54924       0.0000
Population Density in 2010           0.0393441    0.0143101        2.74939       0.0084
Unemployment in 2010                 -4303.53     3975.81          -1.08243      0.2845
Coastal(dummy variable)              104498.      26867.4          3.88941       0.0003
lag(population density in 2008,2)    0.0187885    0.00668595       2.81015       0.0071

Analysis of Variance
Source           Sum of Squares   Df   Mean Square   F-Ratio   P-Value
Model            1.67506E12       7    2.39294E11    51.45     0.0000
Residual         2.23238E11       48   4.65078E9
Total (Corr.)    1.89829E12       55

R-squared = 88.2401 percent
R-squared (adjusted for d.f.) = 86.5251 percent
Standard Error of Est. = 68196.7
Mean absolute error = 49710.1
Durbin-Watson statistic = 1.56434 (P=0.0552)
Lag 1 residual autocorrelation = 0.189108

[Figure: Plot of Avg house sales, observed versus predicted, both axes in units of 100,000]