[Figure: scatter plot of Sales Prices (vertical axis, $100,000 to $400,000) against Predicted Values (horizontal axis, $100,000 to $600,000)]
History of Regression
Francis Galton introduced regression analysis in 1885 when he was attempting to predict a person's height based on the height of his or her parents.
Galton found that children born to tall parents tended to be shorter than their parents, and children born to short parents tended to be taller than their parents. Both groups of children regressed toward the mean height of all children.
In 1922, a PhD student by the name of Casper G. Haas suggested using regression for farm land valuation.
What is truly remarkable is that Mr. Haas was using a technique that required a significant amount of calculation, calculation that today is done by sophisticated computer programs in seconds. Mr. Haas did these calculations by hand. In looking at this excerpt, it is remarkable how little the nomenclature and the statistical output have varied in more than 85 years.
Uses of Regression
Predicting the Weather
Predicting Election Results
Predicting Sales Prices
What is Regression?
When Regression Analysis is used to predict sales prices or establish assessments, it becomes an Automated Sales Comparison Approach.
Steps in Regression
1. Data Exploration and cleanup
[Figure: scatter plot of SALES PRICE against HEATED AREA]
Because of the potential for extreme values to influence the mean, modelers often remove or trim extreme values.
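As a minimal sketch of trimming, the Python below drops sale prices that lie more than 1.5 interquartile ranges beyond the quartiles. The 1.5 x IQR rule, the function name `trim_extremes`, and the sample prices are assumptions for illustration; the presentation does not say which trimming rule is used.

```python
import statistics


def trim_extremes(values, k=1.5):
    """Keep values within k interquartile ranges of the first and third quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


# Hypothetical sale prices with one extreme value.
prices = [150000, 165000, 172000, 180000, 195000,
          210000, 225000, 240000, 2500000]
trimmed = trim_extremes(prices)

print(statistics.mean(prices))   # mean pulled upward by the extreme sale
print(statistics.mean(trimmed))  # mean after trimming
```

Comparing the two means shows how strongly a single extreme sale can move the average.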
Model Specification
Specifying the model means picking the appropriate equation and the variables that will be used. Models can be:
Additive - Most common for residential properties
Multiplicative - Often used for land valuation
Hybrid - Most advanced
Regression Components
Dependent Variable: Sales Price
Independent Variables: Size, Age, Location, Condition, Lot size, Construction Quality, Amenities
Simple Regression
Simple Regression includes one Dependent Variable (sales price) and only one Independent Variable - such as Square Footage.
[Figure: scatter plot with SALES PRICE on the vertical axis (100000 to 500000)]
Simple Regression using only size as the independent variable will predict sales prices; however, it will treat all homes of the same size equally.
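The point can be illustrated with a small least-squares fit. This is a sketch only: the function name `fit_simple_regression` and the four sample sales are hypothetical, chosen so the fitted line is easy to check.

```python
def fit_simple_regression(x, y):
    """Ordinary least squares for one independent variable.

    Returns (intercept, slope) of the line y = intercept + slope * x.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical sales: square footage and sale price.
sizes = [1000, 1500, 2000, 2500]
prices = [85000, 122500, 160000, 197500]

intercept, slope = fit_simple_regression(sizes, prices)

# Two homes of the same size get the same prediction, whatever their
# age, quality, or location, because size is the only variable.
new_home = intercept + slope * 1800
old_home = intercept + slope * 1800
print(new_home == old_home)
```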
Multiple Regression
We know square footage is an important variable but what other variables should we include and how do we decide?
Effective Age
Correlation Analysis
Pearson's Correlation measures the strength of the linear relationship between two variables.

Correlations (N = 1367 for every pair)

                                SALEPRICE  BLDSIZE  BEDROOMS  DOCK
SALEPRICE  Pearson Correlation  1          0.855    0.557     0.142
           Sig. (2-tailed)      .          0        0         0
BLDSIZE    Pearson Correlation  0.855      1        0.659     0.062
           Sig. (2-tailed)      0          .        0         0.021
BEDROOMS   Pearson Correlation  0.557      0.659    1         0.037
           Sig. (2-tailed)      0          0        .         0.176
DOCK       Pearson Correlation  0.142      0.062    0.037     1
           Sig. (2-tailed)      0          0.021    0.176     .

Notice the high correlation between sales price and size (0.855).
Very little correlation between sales price and dock (0.142).

Correlation Analysis also helps identify Collinearity, which is a strong correlation between two independent variables. For example, the living area of a home is highly correlated with the number of bedrooms (0.659 in the table). It would only be necessary to have one of these variables in the model.
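A short sketch of both ideas follows: computing Pearson's correlation, and screening pairs of independent variables for collinearity. The function names and the 0.6 screening threshold are assumptions for illustration; the three correlations are the ones reported in the table.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def collinear_pairs(corr, variables, threshold=0.6):
    """Return pairs of independent variables whose |r| meets the threshold."""
    return [(a, b) for a, b in combinations(variables, 2)
            if abs(corr[(a, b)]) >= threshold]


# Correlations between the independent variables, from the table.
corr = {
    ("BLDSIZE", "BEDROOMS"): 0.659,
    ("BLDSIZE", "DOCK"): 0.062,
    ("BEDROOMS", "DOCK"): 0.037,
}
flagged = collinear_pairs(corr, ["BLDSIZE", "BEDROOMS", "DOCK"])
print(flagged)
```

With this threshold only the BLDSIZE and BEDROOMS pair is flagged, which matches the advice to keep just one of them.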
Regression Equations
Simple regression: $Y = mx + b$
Multiple regression: $Y = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_K X_K$
Running Regression
Statistical Software makes using Regression much easier, performing the necessary calculations quickly and accurately.
Regression Results
Model 1 (Adjusted R Square .731)
The adjusted R² statistic measures the proportion of the total variation in sales prices that is explained by the Regression Model. It generally ranges from 0.00 to 1.00, with values closer to 1.00 being better. A high number, say 0.910, means that approximately 91% of the variation in sales prices can be explained by the model.
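For reference, the usual adjustment penalizes R² for the number of independent variables. The sketch below uses the standard textbook formula; the function name is hypothetical, and the sample size of 1,367 is taken from the correlation table on the assumption that the same sales were used in every model.

```python
def adjusted_r2(r2, n_sales, n_variables):
    """Adjusted R-squared: 1 - (1 - R2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n_sales - 1) / (n_sales - n_variables - 1)


# Final model in this presentation: R Square .897 with 15 variables.
# Because the reported R Square is rounded, the result is approximate.
print(round(adjusted_r2(0.897, 1367, 15), 3))
```

With a large sample and few variables the adjustment is small, which is why R Square and Adjusted R Square are nearly identical throughout the output.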
Regression Results
The output includes the coefficient and the Constant.

Coefficients(a), Model 1
              B          Std. Error   Beta    t        Sig.
(Constant)    6838.585   2195.717             3.115    .002
BLDSIZE       75.068     1.231        .855    60.997   .000
B and Std. Error: Unstandardized Coefficients. Beta: Standardized Coefficients.
a Dependent Variable: SALEPRIC
The Constant (the intercept) represents the portion of value that is not explained by the variables included in the model.
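To make the roles of the Constant and the coefficient concrete, here is a small sketch that applies the Model 1 output to a single home. The function name and the 2,000 square foot example are hypothetical; the two numbers come from the coefficient output.

```python
# Model 1 output: (Constant) and the BLDSIZE coefficient.
CONSTANT = 6838.585
BLDSIZE_COEF = 75.068


def predict_model_1(building_size):
    """Predicted sale price = Constant + coefficient x building size."""
    return CONSTANT + BLDSIZE_COEF * building_size


print(round(predict_model_1(2000), 2))
```

Every additional square foot adds about $75 to the prediction, and the Constant is added once per home.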
Running Regression
Let's add another variable to the model, say Land Size, and see if the results improve.
Regression Results
Model 2
Our Adj. R² went up from .731 to .801!

Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .895(a)   .801       .801                21864.78975921

Coefficients (Unstandardized)
              B          Std. Error
(Constant)    6119.232   1889.914
BLDSIZE       72.660     1.065
LANDSF        .382       .017
Running Regression
Let's add Age to the model and see if Age is significant.
Regression Results
Model 3
R² went up from .801 to .832!

Model Summary
Model   R         R Square
1       .912(a)   .832

$22,855 + Bldsize x $67.28 + Landsf x $0.44
Running Regression
Let's add Building Quality to the model. We may have a problem. Let's run it and see.
Regression Results
Model 4

Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .924(a)   .854       .853                18784.15717760
Our Adj. R2 went up from .832 to .854 after adding quality, but
Regression Results
Coefficients(a), Model 4
              B            Std. Error   Beta    t         Sig.
(Constant)    -45723.503   5199.675             -8.794    .000
BLDSIZE       59.808       1.103        .681    54.234    .000
LANDSF        .445         .015         .309    28.831    .000
AGE           -605.886     37.388       -.182   -16.205   .000
QUAL          26110.420    1842.475     .171    14.171    .000
B and Std. Error: Unstandardized Coefficients. Beta: Standardized Coefficients.
a Dependent Variable: SALEPRIC

Resulting Adjustment (quality code x QUAL coefficient):
1 x $26,110 = $26,110
2 x $26,110 = $52,220
3 x $26,110 = $78,330
4 x $26,110 = $104,440
5 x $26,110 = $130,550

Treating Quality as a single numeric code forces every step up in quality to add the same $26,110.

Binary: Either the item is present or not. Examples: corner location, lakefront location.
Transformations
To solve the problem we need to convert the discrete variable Quality into individual binary variables, which allow Regression to distinguish each type.
[Figure: the single Quality variable becomes a set of binary (0/1) variables, one per quality class]
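The conversion can be sketched in a few lines of Python. The class names FAIR, GOOD, EXCEL, and SUPERIOR are the variable names that appear in the regression output; treating AVERAGE as the omitted base class and the function name `quality_dummies` are assumptions for illustration.

```python
QUALITY_CLASSES = ["FAIR", "AVERAGE", "GOOD", "EXCEL", "SUPERIOR"]


def quality_dummies(quality, base="AVERAGE"):
    """Convert one quality class into 0/1 variables, omitting the base class."""
    if quality not in QUALITY_CLASSES:
        raise ValueError(f"unknown quality class: {quality}")
    return {name: int(name == quality)
            for name in QUALITY_CLASSES if name != base}


print(quality_dummies("GOOD"))
print(quality_dummies("AVERAGE"))  # all zeros: the base class
```

Each home has at most one of the binary variables set to 1, so regression can estimate a separate adjustment for each class instead of one fixed step.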
Running Regression
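Now that we have transformed the variable Quality, we can put it back in the model. Notice we left Average out. Average serves as the base class, so each Quality coefficient is an adjustment relative to an Average-quality home.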
Regression Results
Model 5

Model Summary
Model   R         R Square   Adjusted R Square
1       .933(a)   .870       .869

Coefficients(a), Model 5
              B            Std. Error   Beta    t         Sig.
(Constant)    35633.753    1922.792             18.532    .000
BLDSIZE       58.537       1.045        .667    56.031    .000
LANDSF        .419         .016         .291    26.342    .000
AGE           -625.742     35.363       -.188   -17.695   .000
FAIR          -25511.289   8693.178     -.031   -2.935    .003
GOOD          21095.623    1838.228     .127    11.476    .000
EXCEL         75844.967    12720.934    .059    5.962     .000
SUPERIOR      305671.839   18494.059    .169    16.528    .000
B and Std. Error: Unstandardized Coefficients. Beta: Standardized Coefficients.
a Dependent Variable: SALEPRIC

Quality adjustments: the FAIR, GOOD, EXCEL, and SUPERIOR rows.
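As a sketch of how these coefficients turn into a value, the code below applies the Model 5 output to two otherwise identical homes, one Average quality and one Good quality. The home characteristics and the function name `predict` are hypothetical; the coefficients are the unstandardized B values from the Model 5 output.

```python
MODEL_5 = {
    "(Constant)": 35633.753,
    "BLDSIZE": 58.537,
    "LANDSF": 0.419,
    "AGE": -625.742,
    "FAIR": -25511.289,
    "GOOD": 21095.623,
    "EXCEL": 75844.967,
    "SUPERIOR": 305671.839,
}


def predict(coefs, home):
    """Constant plus the sum of coefficient x value for each variable given."""
    total = coefs["(Constant)"]
    for name, value in home.items():
        total += coefs[name] * value
    return total


# Hypothetical home; no quality variable set means Average quality.
average_home = {"BLDSIZE": 2000, "LANDSF": 10000, "AGE": 10}
good_home = dict(average_home, GOOD=1)

pred_average = predict(MODEL_5, average_home)
pred_good = predict(MODEL_5, good_home)
print(round(pred_good - pred_average, 3))  # the GOOD adjustment
```

The difference between the two predictions equals the GOOD coefficient, which is what "adjustment relative to Average" means in practice.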
Running Regression
Let's transform Neighborhood into binary variables and add them to the model.
Regression Results
Model 6
Our Adj. R² went up from .869 to .874.

Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .936(a)   .875       .874                17391.93018134
a Predictors: (Constant), NB211006, BLDSIZE, EXCEL, FAIR, SUPERIOR, NB211002, NB211001, NB211005, AGE, LANDSF, GOOD, NB211003

Coefficients(a), Model 6
              B            Std. Error   Beta    t         Sig.
(Constant)    40799.859    2299.668             17.742    .000
BLDSIZE       56.000       1.143        .638    48.980    .000
LANDSF        .423         .016         .294    25.753    .000
AGE           -671.493     37.221       -.201   -18.041   .000
FAIR          -33476.331   8602.963     -.041   -3.891    .000
GOOD          17371.495    2023.937     .105    8.583     .000
EXCEL         72617.618    12567.147    .057    5.778     .000
SUPERIOR      313444.055   18313.237    .173    17.116    .000
NB211001      14199.881    2321.457     .070    6.117     .000
NB211002      -3514.034    1657.862     -.025   -2.120    .034
NB211003      -1483.623    1244.877     -.015   -1.192    .234
NB211005      4044.357     2266.186     .021    1.785     .075
NB211006      1915.755     2601.773     .008    .736      .462
B and Std. Error: Unstandardized Coefficients. Beta: Standardized Coefficients.
a Dependent Variable: SALEPRIC

Neighborhood adjustments: the NB211001 through NB211006 rows.
Running Regression
Multiplicative Transformations combine two variables into one: Square Footage x Quality = SQFT1
This reflects the fact that quality may contribute greater value in larger homes and less value in smaller homes. In other words, without combining these variables, all Good Quality homes get the same adjustment regardless of their size. Let's add this new combined variable to the model. Because size and quality are now combined, the model no longer includes them as stand-alone variables.
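The sketch below builds the combined variables for one home and applies the SQFT coefficients from the Model 7 output that follows. The mapping of SQFT1 through SQFT5 to Fair through Superior is an assumption (the presentation does not state it, although the rising coefficients are consistent with it), and the function names are hypothetical.

```python
# Assumed mapping of quality class to combined variable (not stated in the source).
QUALITY_TO_SQFT = {
    "FAIR": "SQFT1",
    "AVERAGE": "SQFT2",
    "GOOD": "SQFT3",
    "EXCEL": "SQFT4",
    "SUPERIOR": "SQFT5",
}

# Unstandardized coefficients for the combined variables in Model 7.
MODEL_7_SQFT = {"SQFT1": 21.119, "SQFT2": 53.673, "SQFT3": 63.139,
                "SQFT4": 77.267, "SQFT5": 108.100}


def sqft_by_quality(sqft, quality):
    """Square footage x quality binary: sqft lands in exactly one SQFT variable."""
    target = QUALITY_TO_SQFT[quality]
    return {name: (sqft if name == target else 0)
            for name in QUALITY_TO_SQFT.values()}


def size_quality_value(sqft, quality):
    """Value contributed by the combined size and quality variables."""
    variables = sqft_by_quality(sqft, quality)
    return sum(MODEL_7_SQFT[name] * value for name, value in variables.items())


def good_premium(sqft):
    """Extra value of Good over Average quality at a given size."""
    return size_quality_value(sqft, "GOOD") - size_quality_value(sqft, "AVERAGE")


print(round(good_premium(1000), 2), round(good_premium(2000), 2))
```

Under these assumptions the premium for Good quality doubles when the home doubles in size, which is the behavior the combined variable is meant to capture.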
Regression Results
Model 7
Our Adj. R² went up from .874 to .879.

Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .938(a)   .880       .879                17065.96846831
a Predictors: (Constant), SQFT5, SQFT4, AGE, NB211002, SQFT2, SQFT1, NB211006, NB211001, NB211005, LANDSF, NB211003, SQFT3

Coefficients(a), Model 7 (Unstandardized)
              B           Std. Error
(Constant)    43999.158   2299.663
LANDSF        .418        .016
AGE           -660.473    36.505
NB211001      10975.273   2335.844
NB211002      -3611.418   1624.028
NB211003      -1250.573   1221.119
NB211005      6350.688    2243.206
NB211006      1923.311    2554.324
SQFT1         21.119      8.533
SQFT2         53.673      1.169
SQFT3         63.139      1.074
SQFT4         77.267      3.557
SQFT5         108.100     2.941
a Dependent Variable: SALEPRIC

t for NB211006: .753
Advanced Transformations
Exponential transformations - Raise a variable to a power: Land Size ^ .75 = LAND75
This reflects the principle of diminishing returns. The unit price of land tends to decrease as size increases. Without this transformation every square foot of land would get the same adjustment, regardless of lot size. Raising land size to the power of .75 reflects the curve shown below.
[Figure: SINGLE FAMILY LOT PRICES. PRICE PER SF (vertical axis, $2.40 to $2.85) plotted against LOT SIZE (horizontal axis, 5,000 to 30,000)]
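A brief sketch shows why the exponent produces a declining unit price. The function names are hypothetical, and the 12.233 coefficient is the LAND75 value from the Model 8 output later in the presentation; the lot sizes are taken from the chart's range.

```python
LAND75_COEF = 12.233  # LAND75 coefficient reported for Model 8


def land75(land_size):
    """Exponential transformation: land size raised to the power .75."""
    return land_size ** 0.75


def land_value_per_sf(land_size):
    """Land contribution divided by lot size, i.e. the implied unit value."""
    return LAND75_COEF * land75(land_size) / land_size


for lot in (5000, 10000, 30000):
    print(lot, round(land_value_per_sf(lot), 3))
```

The total land contribution still rises with lot size, but each additional square foot adds less than the one before it.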
Running Regression
Let's add our new transformed land variable to the model.
Regression Results
Model 8
Our Adj. R² went up from .879 to .881.

Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .939(a)   .882       .881                16919.04533480
a Predictors: (Constant), LAND75, NB211005, NB211001, SQFT4, NB211002, SQFT5, SQFT1, AGE, SQFT2, NB211006, NB211003, SQFT3

Coefficients(a), Model 8
              B           Std. Error   Beta    t         Sig.
(Constant)    40782.649   2277.915             17.903    .000
AGE           -731.178    36.549       -.219   -20.005   .000
NB211001      10061.900   2314.108     .050    4.348     .000
NB211002      -3196.888   1609.968     -.023   -1.986    .047
NB211003      -1646.847   1211.025     -.017   -1.360    .174
NB211005      6714.691    2224.018     .035    3.019     .003
NB211006      -5595.936   2625.622     -.024   -2.131    .033
SQFT1         30.298      8.324        .038    3.640     .000
SQFT2         51.834      1.167        .698    44.421    .000
SQFT3         60.732      1.081        .927    56.177    .000
SQFT4         71.516      3.559        .194    20.094    .000
SQFT5         104.644     2.937        .345    35.625    .000
LAND75        12.233      .459         .314    26.668    .000
B and Std. Error: Unstandardized Coefficients. Beta: Standardized Coefficients.
a Dependent Variable: SALEPRIC
Running Regression
Let's add garages, pools, and baths just to round out our model.
Regression Results
Model 9
Our Adj. R² went up from .881 to .895.

Model Summary(b)
Model   R         R Square
1       .947(a)   .897

Coefficients(a), Model 9
              B           Std. Error   Beta    t         Sig.
(Constant)    29680.695   2885.532             10.286    .000
AGE           -705.817    38.491       -.212   -18.337   .000
NB211001      12374.064   2176.815     .061    5.684     .000
NB211002      -1094.891   1527.977     -.008   -.717     .474
NB211003      -938.838    1136.671     -.010   -.826     .409
NB211005      12639.946   2139.489     .066    5.908     .000
NB211006      852.109     2535.266     .004    .336      .737
SQFT1         31.388      7.815        .039    4.016     .000
SQFT2         44.166      1.365        .595    32.349    .000
SQFT3         52.939      1.265        .808    41.857    .000
SQFT4         60.447      3.561        .164    16.974    .000
SQFT5         94.723      2.943        .312    32.186    .000
LAND75        11.788      .433         .303    27.240    .000
BATHS         7714.093    1338.204     .076    5.765     .000
POOL          13359.275   1184.469     .105    11.279    .000
GARAGE        10.750      3.137        .038    3.427     .001
B and Std. Error: Unstandardized Coefficients. Beta: Standardized Coefficients.
a Dependent Variable: SALEPRIC
Regression Results
Coefficients(a), Model 5
              B            Std. Error   Beta    t         Sig.
(Constant)    35633.753    1922.792             18.532    .000
BLDSIZE       58.537       1.045        .667    56.031    .000
LANDSF        .419         .016         .291    26.342    .000
AGE           -625.742     35.363       -.188   -17.695   .000
FAIR          -25511.289   8693.178     -.031   -2.935    .003
GOOD          21095.623    1838.228     .127    11.476    .000
EXCEL         75844.967    12720.934    .059    5.962     .000
SUPERIOR      305671.839   18494.059    .169    16.528    .000
B and Std. Error: Unstandardized Coefficients. Beta: Standardized Coefficients.
a Dependent Variable: SALEPRIC
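The Beta value (the Standardized Coefficients column) expresses each variable's effect in standard-deviation units, so the relative influence of variables measured in different units can be compared. It is related to, but not the same as, the partial correlation that stepwise regression uses in deciding which variable to add next.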
Regression Results
The significance of each variable to the model can be determined by looking at the t values. As a rule of thumb, the absolute t value should be 2.0 or greater.
In the Model 9 coefficients, three variables fall below that threshold:
NB211002   t = -.717   Sig. .474
NB211003   t = -.826   Sig. .409
NB211006   t = .336    Sig. .737
Regression Results
The t-statistic is calculated by dividing the coefficient of a variable by its standard error. For example: for the variable BLDSIZE, the t-statistic is calculated as follows: 58.537 / 1.045 = 56.0
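The same division can be scripted to screen a whole model. In this sketch the function names are hypothetical; the coefficient and standard error pairs are copied from the Model 5 (BLDSIZE) and Model 9 output, and the 2.0 cutoff is the rule of thumb given in the presentation.

```python
def t_statistic(coefficient, std_error):
    """t = coefficient divided by its standard error."""
    return coefficient / std_error


def weak_variables(rows, threshold=2.0):
    """Names of variables whose absolute t value falls below the threshold."""
    return [name for name, (coef, se) in rows.items()
            if abs(t_statistic(coef, se)) < threshold]


# BLDSIZE in Model 5, as in the worked example.
print(round(t_statistic(58.537, 1.045), 1))

# A few rows from Model 9: (B, Std. Error).
model_9_rows = {
    "AGE": (-705.817, 38.491),
    "NB211002": (-1094.891, 1527.977),
    "NB211003": (-938.838, 1136.671),
    "NB211006": (852.109, 2535.266),
    "BATHS": (7714.093, 1338.204),
}
weak = weak_variables(model_9_rows)
print(weak)
```

The absolute value matters: AGE has a large negative t value and is clearly significant, while the three neighborhood variables are not.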
Regression Results
Model Summary(b)
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .947(a)   .897       .895                15854.87728402
The Standard Error of the Estimate in the regression model tells us how much a sale estimate will typically vary from its actual value. This number alone is meaningless unless related to the average sales price in the sale sample. Dividing the Standard Error by the Average Sales Price produces the Coefficient of Variation (COV).
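The calculation is a single division, sketched below as a percentage. The function name is hypothetical, and the $150,000 average sales price is an assumed figure for illustration only, since the presentation does not report the sample average; the Standard Error is the one from the Model Summary.

```python
def coefficient_of_variation(std_error_of_estimate, average_sales_price):
    """COV as a percentage: Standard Error / Average Sales Price x 100."""
    return std_error_of_estimate / average_sales_price * 100


STD_ERROR = 15854.87728402       # from the Model Summary
ASSUMED_AVERAGE_PRICE = 150000   # hypothetical; not given in the presentation

cov = coefficient_of_variation(STD_ERROR, ASSUMED_AVERAGE_PRICE)
print(round(cov, 1))
```

The same Standard Error would be a much larger relative error in a market of $80,000 homes than in a market of $400,000 homes, which is why the ratio is more informative than the raw number.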
Regression Options
Enter is the default regression method in most statistical software programs. This method includes all variables entered by the modeler.
Stepwise multiple regression automatically eliminates redundant or insignificant variables.
Coefficients(a), Stepwise Model 4
              B           Std. Error   Beta    t
(Constant)    28624.283   2584.025             11.077
AGE           -697.862    37.689       -.209   -18.516
NB211001      12794.553   2071.093     .063    6.178
NB211005      13302.885   1969.163     .069    6.756
SQFT1         31.406      7.797        .039    4.028
SQFT2         44.305      1.354        .597    32.723
SQFT3         53.134      1.249        .811    42.525
SQFT4         60.544      3.557        .164    17.023
SQFT5         94.884      2.924        .313    32.446
LAND75        11.891      .393         .305    30.243
BATHS         7732.836    1332.987     .076    5.801
POOL          13317.394   1179.165     .105    11.294
GARAGE        10.586      3.047        .037    3.474
B and Std. Error: Unstandardized Coefficients. Beta: Standardized Coefficients.

Stepwise regression removed NB211002, NB211003, and NB211006, the variables with "low t-scores".
Conclusion
Predicting assessments using Regression requires the appraiser to:
1. Explore data to determine relationships and clean up outliers
2. Specify which model and variables will be used
3. Transform variables and run regression
4. Review results, modify or add variables
5. Create predicted assessments and review ratio statistics
6. Value the population using final coefficients
The End
[Figure: SALE PRICES (vertical axis, 100000 to 500000) plotted against Predicted Values]