
BiSAS Term Paper

Amandeep Kaur Osan

15A1HP032

Nandita Mehta

15A1HP086

Swarnima Singh Sengar

15A3HP624

Introduction
For this term paper, we use the data set House_data to run a multivariate multiple regression.
As the name implies, multivariate regression is a technique that estimates a single regression model
with multiple outcome variables and one or more predictor variables. Our objective is to find out how
house price relates to the other characteristics of a house. In the model as fitted (see the SAS code
in the Appendix), price is one of the predictors, together with sqft_living, sqft_lot, sqft_above and
sqft_basement, and the outcome variables are bedrooms, bathrooms, floors, view and condition.

Variables description
Variable         Description
price            price of the house
sqft_living      living area (sq ft)
sqft_lot         plot (lot) area (sq ft)
sqft_above       floor area above ground (sq ft)
sqft_basement    basement area (sq ft)
bedrooms         number of bedrooms
bathrooms        number of bathrooms
floors           number of floors
view             view from the house, rated on a five-point scale
condition        condition of the house, rated on a five-point scale

Objective
The objective of this analysis is to investigate whether the price of a
house increases as the number of bedrooms increases, and whether the
view, the condition of the house, its area and similar characteristics
also affect the price of the house.
Business Domain
Real Estate

Data Cleaning
In the first step, the water variable was removed because it was not
needed for the analysis. The data contained no missing values. Some other
variables that were not needed for the statistical analysis were also removed.
Data Exploration
1. In data exploration, the data are first imported into SAS 9.2.
2. A univariate analysis is run on the variables price, sqft_living, sqft_lot,
sqft_above and sqft_basement.

Price
The UNIVARIATE Procedure
Variable: price (price)

Moments
N                  5274          Sum Weights          5274
Mean               406001.838    Sum Observations     2141253691
Std Deviation      258577.971    Variance             6.68626E10
Skewness           4.15851016    Kurtosis             34.0894842
Uncorrected SS     1.22192E15    Corrected SS         3.52566E14
Coeff Variation    63.6888672    Std Error Mean       3560.58585

Basic Statistical Measures
Location                  Variability
Mean      406001.8        Std Deviation          258578
Median    350000.0        Variance               6.68626E10
Mode      250000.0        Range                  4412000
                          Interquartile Range    230000

Tests for Location: Mu0=0
Test           Statistic        p Value
Student's t    t   114.0267     Pr > |t|     <.0001
Sign           M   2637         Pr >= |M|    <.0001
Signed Rank    S   6955088      Pr >= |S|    <.0001

Tests for Normality
Test                  Statistic          p Value
Kolmogorov-Smirnov    D      0.145755    Pr > D       <0.0100
Cramer-von Mises      W-Sq   45.83572    Pr > W-Sq    <0.0050
Anderson-Darling      A-Sq   269.9525    Pr > A-Sq    <0.0050

Quantiles (Definition 5)
Quantile      Estimate
100% Max      4490000
99%           1360000
95%           822000
90%           652000
75% Q3        480000
50% Median    350000
25% Q1        250000
10%           200000
5%            170000
1%            118000
0% Min        78000

Extreme Observations
----Lowest----      -----Highest-----
Value     Obs       Value       Obs
78000     3960      2890000     1011
81000     4173      3100000     1881
82000     2149      3200000     5147
82500     553       3400000     2725
83000     4780      4490000     2249
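Several of the price statistics above follow directly from one another. The worked equations below reproduce them from the standard definitions, which we assume match those used by PROC UNIVARIATE; the symbols n_+ and n_- (number of values above and below Mu0 = 0), n_t (number of values different from Mu0) and r_i (rank of |x_i - Mu0| among those values) are our own notation.

```latex
\begin{align*}
\mathrm{CV} &= 100\,\frac{s}{\bar{x}} = 100 \times \frac{258577.971}{406001.838} \approx 63.689\\
\mathrm{SE}(\bar{x}) &= \frac{s}{\sqrt{n}} = \frac{258577.971}{\sqrt{5274}} \approx 3560.59\\
t &= \frac{\bar{x} - \mu_0}{\mathrm{SE}(\bar{x})} = \frac{406001.838 - 0}{3560.58585} \approx 114.03\\
M &= \frac{n_+ - n_-}{2} = \frac{5274 - 0}{2} = 2637\\
S &= \sum_{i:\,x_i > \mu_0} r_i - \frac{n_t(n_t+1)}{4}
   = \frac{5274 \cdot 5275}{2} - \frac{5274 \cdot 5275}{4} = 6955087.5 \approx 6955088\\
\text{Range} &= 4490000 - 78000 = 4412000\\
\text{IQR} &= Q_3 - Q_1 = 480000 - 250000 = 230000
\end{align*}
```

Because every price is positive, n_- = 0 and all ranks enter the sum in S. The same identities hold for the other variables; for sqft_living, for example, 100 x 705.801163 / 1480.67444 is approximately 47.668, the reported Coeff Variation.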

[Figure: histogram and box plot of price; each * represents up to 56 counts. The distribution is strongly right-skewed: most prices lie between 200,000 and 600,000, with a long upper tail reaching 4,490,000.]

[Figure: normal probability plot of price (horizontal axis: normal quantiles from -2 to +2; * = data, + = normal reference line). The points bend away from the reference line, most markedly in the upper tail.]

Sqft_living
The UNIVARIATE Procedure
Variable: sqft_living (sqft_living)

Moments
N                  5274          Sum Weights          5274
Mean               1480.67444    Sum Observations     7809077
Std Deviation      705.801163    Variance             498155.282
Skewness           2.11916203    Kurtosis             7.05883488
Uncorrected SS     1.41895E10    Corrected SS         2626772800
Coeff Variation    47.6675455    Std Error Mean       9.71879243

Basic Statistical Measures
Location                  Variability
Mean      1480.674        Std Deviation          705.80116
Median    1300.000        Variance               498155
Mode      1010.000        Range                  7460
                          Interquartile Range    720.00000

Tests for Location: Mu0=0
Test           Statistic        p Value
Student's t    t   152.3517     Pr > |t|     <.0001
Sign           M   2637         Pr >= |M|    <.0001
Signed Rank    S   6955088      Pr >= |S|    <.0001

Tests for Normality
Test                  Statistic          p Value
Kolmogorov-Smirnov    D      0.132984    Pr > D       <0.0100
Cramer-von Mises      W-Sq   36.02791    Pr > W-Sq    <0.0050
Anderson-Darling      A-Sq   211.6481    Pr > A-Sq    <0.0050

Quantiles (Definition 5)
Quantile      Estimate
100% Max      7850
99%           4168
95%           2850
90%           2340
75% Q3        1730
50% Median    1300
The SAS System
01:47 Wednesday, October 19, 2016

Quantile      Estimate
25% Q1        1010
10%           840
5%            750
1%            630
0% Min        390

Extreme Observations
----Lowest----      ----Highest---
Value     Obs       Value     Obs
390       5248      5780      1011
420       2972      5940      5210
460       4670      6330      1572
470       3944      6430      2249
480       2115      7850      5113

[Figure: histogram and box plot of sqft_living; each * represents up to 45 counts. The distribution is right-skewed: most houses have between 500 and 2,000 sq ft of living area, and a few exceed 6,000 sq ft.]

Sqft_lot
The UNIVARIATE Procedure
Variable: sqft_lot (sqft_lot)

Moments
N                  5274          Sum Weights          5274
Mean               13408.9863    Sum Observations     70718994
Std Deviation      45173.8626    Variance             2040677858
Skewness           18.9416156    Kurtosis             511.48021
Uncorrected SS     1.17088E13    Corrected SS         1.07605E13
Coeff Variation    336.892449    Std Error Mean       622.038354

Basic Statistical Measures
Location                  Variability
Mean      13408.99        Std Deviation          45174
Median    7351.00         Variance               2040677858
Mode      6000.00         Range                  1650759
                          Interquartile Range    4630

Tests for Location: Mu0=0
Test           Statistic        p Value
Student's t    t   21.55653     Pr > |t|     <.0001
Sign           M   2637         Pr >= |M|    <.0001
Signed Rank    S   6955088      Pr >= |S|    <.0001

Tests for Normality
Test                  Statistic          p Value
Kolmogorov-Smirnov    D      0.388485    Pr > D       <0.0100
Cramer-von Mises      W-Sq   302.3959    Pr > W-Sq    <0.0050
Anderson-Darling      A-Sq   1445.553    Pr > A-Sq    <0.0050

Quantiles (Definition 5)
Quantile      Estimate
100% Max      1651359
99%           188760
95%           32137
90%           16030
75% Q3        9730
50% Median    7351
25% Q1        5100
10%           3621
5%            2550
1%            1105
0% Min        600

Extreme Observations
----Lowest----      -----Highest-----
Value     Obs       Value       Obs
600       1950      843309      1214
635       5173      871200      2537
649       1153      982278      1179
649       143       1164794     4455
651       5098      1651359     441

[Figure: histogram and box plot of sqft_lot; each * represents up to 109 counts. Nearly all lots fall in the lowest bin (below 100,000 sq ft); a handful of outliers extend to 1,651,359 sq ft.]

Sqft_above
The UNIVARIATE Procedure
Variable: sqft_above (sqft_above)

Moments
N                  5274          Sum Weights          5274
Mean               1295.86974    Sum Observations     6834417
Std Deviation      590.95481     Variance             349227.587
Skewness           2.58433978    Kurtosis             10.8621274
Uncorrected SS     1.0698E10     Corrected SS         1841477068
Coeff Variation    45.6029485    Std Error Mean       8.13737273

Basic Statistical Measures
Location                  Variability
Mean      1295.870        Std Deviation          590.95481
Median    1140.000        Variance               349228
Mode      1010.000        Range                  7460
                          Interquartile Range    520.00000

Tests for Location: Mu0=0
Test           Statistic        p Value
Student's t    t   159.2492     Pr > |t|     <.0001
Sign           M   2637         Pr >= |M|    <.0001
Signed Rank    S   6955088      Pr >= |S|    <.0001

Tests for Normality
Test                  Statistic          p Value
Kolmogorov-Smirnov    D      0.155122    Pr > D       <0.0100
Cramer-von Mises      W-Sq   48.74819    Pr > W-Sq    <0.0050
Anderson-Darling      A-Sq   278.5273    Pr > A-Sq    <0.0050

Quantiles (Definition 5)
Quantile      Estimate
100% Max      7850
99%           3650
95%           2430
90%           1950
75% Q3        1460
50% Median    1140
25% Q1        940
10%           790
5%            720
1%            610
0% Min        390

Extreme Observations
----Lowest----      ----Highest---
Value     Obs       Value     Obs
390       5248      5000      2814
420       2972      5320      365
460       4670      5430      5073
470       3944      6430      2249
480       3643      7850      5113

[Figure: histogram and box plot of sqft_above; each * represents up to 49 counts. The distribution is right-skewed: most houses have between 500 and 2,000 sq ft above ground, and a few exceed 6,000 sq ft.]

Sqft_basement
The UNIVARIATE Procedure
Variable: sqft_basement (sqft_basement)

Moments
N                  5274          Sum Weights          5274
Mean               184.804702    Sum Observations     974660
Std Deviation      345.082535    Variance             119081.956
Skewness           2.03724345    Kurtosis             4.0259496
Uncorrected SS     808040906     Corrected SS         627919155
Coeff Variation    186.728222    Std Error Mean       4.75174271

Basic Statistical Measures
Location                  Variability
Mean      184.8047        Std Deviation          345.08254
Median    0.0000          Variance               119082
Mode      0.0000          Range                  2250
                          Interquartile Range    250.00000

Tests for Location: Mu0=0
Test           Statistic        p Value
Student's t    t   38.89198     Pr > |t|     <.0001
Sign           M   788          Pr >= |M|    <.0001
Signed Rank    S   621338       Pr >= |S|    <.0001

Tests for Normality
Test                  Statistic          p Value
Kolmogorov-Smirnov    D      0.405037    Pr > D       <0.0100
Cramer-von Mises      W-Sq   182.3533    Pr > W-Sq    <0.0050
Anderson-Darling      A-Sq   905.4814    Pr > A-Sq    <0.0050

Quantiles (Definition 5)
Quantile      Estimate
100% Max      2250
99%           1350
95%           960
90%           730
75% Q3        250
50% Median    0
25% Q1        0
10%           0
5%            0
1%            0
0% Min        0

Extreme Observations
----Lowest----      ----Highest---
Value     Obs       Value     Obs
0         5274      2060      3174
0         5273      2100      1168
0         5272      2170      1414
0         5271      2196      2851
0         5270      2250      3477
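The sign and signed rank statistics for sqft_basement (M = 788, S = 621338) are much smaller than for the other variables because many houses have no basement. Assuming, as is standard for these tests, that observations equal to Mu0 = 0 are excluded (n_+ and n_t are our notation for the number of positive and of nonzero values), the output implies how many houses have a basement:

```latex
\begin{align*}
M &= \frac{n_+ - n_-}{2} = \frac{n_+ - 0}{2} = 788 \;\Rightarrow\; n_+ = n_t = 1576\\
S &= \frac{n_t(n_t+1)}{4} = \frac{1576 \cdot 1577}{4} = 621338\\
\frac{n_t}{n} &= \frac{1576}{5274} \approx 0.30
\end{align*}
```

So about 30% of the houses (1,576 of 5,274) have a basement, which is consistent with the median and all lower quantiles being 0 while the 75% quantile is 250.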

[Figure: histogram and box plot of sqft_basement; each * represents up to 78 counts. By far the largest bar is the lowest bin (near 0 sq ft, mostly houses without a basement); the remaining values spread up to 2,250 sq ft.]

Test of Normality
[Figure: normal probability plot of sqft_living (horizontal axis: normal quantiles from -2 to +2; * = data, + = normal reference line). The points bend away from the reference line, most markedly in the upper tail.]

[Figure: normal probability plot of sqft_lot. Almost all points lie in a flat band at the bottom, and a few very large lots lie far above the reference line.]

[Figure: normal probability plot of sqft_above. The points bend away from the reference line, most markedly in the upper tail.]

[Figure: normal probability plot of sqft_basement. A long flat run of points at 0 is followed by a steep rise, far from the reference line.]

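The tests for normality reject normality for every variable (Kolmogorov-Smirnov p < 0.01; Cramer-von Mises and Anderson-Darling p < 0.005), and the skewness values (2.04 to 18.94), histograms and normal probability plots all show strong right skew. The variables are therefore not normally distributed, and since normal marginal distributions are necessary for multivariate normality, they are not multivariate normal either.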

Statistical Technique
The statistical technique used is multivariate regression, which estimates a single
regression model with more than one outcome variable. When a multivariate regression model has
more than one predictor variable, it is a multivariate multiple regression. The model is used to
assess whether an additional bedroom or bathroom, or a better view, is associated with a higher
house price, i.e., whether customers are willing to pay more for these features.
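In matrix form, and using the variables of the PROC REG MODEL statement in the Appendix, the fitted model can be sketched as follows. This is the standard multivariate linear model; the symbols Y, X, B, E, epsilon_i and Sigma are our notation, not SAS output.

```latex
\begin{align*}
\underbrace{Y}_{5274\times 5} &= \underbrace{X}_{5274\times 6}\,\underbrace{B}_{6\times 5}
  + \underbrace{E}_{5274\times 5},
  \qquad \varepsilon_i \overset{\text{iid}}{\sim} N_5(\mathbf{0}, \Sigma)\\
Y &= [\,\text{bedrooms},\ \text{bathrooms},\ \text{floors},\ \text{view},\ \text{condition}\,]\\
X &= [\,\mathbf{1},\ \text{price},\ \text{sqft\_living},\ \text{sqft\_lot},\ \text{sqft\_above},\ \text{sqft\_basement}\,]\\
\hat{B} &= (X^\top X)^{-} X^\top Y
\end{align*}
```

Here epsilon_i is the i-th row of E and (X'X)^- is a generalized inverse, which is needed because X is not of full rank. Each column of B-hat is the ordinary least-squares fit of one outcome on its own, which is why PROC REG prints five separate regressions; the five equations are combined in the multivariate test (MTEST), which works with the 5 x 5 hypothesis and error matrices.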

Data Analysis: Multivariate Regression
The REG Procedure
Model: MODEL1
Dependent Variable: bedrooms

Number of Observations Read    5274
Number of Observations Used    5274

Analysis of Variance
Source             DF     Sum of Squares    Mean Square    F Value    Pr > F
Model                 4       1030.10115      257.52529      746.53    <.0001
Error              5269       1817.60534        0.34496
Corrected Total    5273       2847.70648

Root MSE           0.58733     R-Square    0.3617
Dependent Mean     2.80319     Adj R-Sq    0.3612
Coeff Var         20.95240

NOTE: Model is not full rank. Least-squares solutions for the parameters are not
unique. Some statistics will be misleading. A reported DF of 0 or B means that the
estimate is biased.
NOTE: The following parameters have been set to 0, since the variables are a linear
combination of other variables as shown.
sqft_basement = sqft_living - sqft_above
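The summary statistics of the bedrooms model follow from its Analysis of Variance table with the usual formulas (n = 5274 observations, p = 4 model degrees of freedom). Model DF is 4 rather than 5, and error DF is n - rank(X) = 5274 - 5 = 5269, because one of the five predictors is redundant, as the NOTE states.

```latex
\begin{align*}
\mathrm{MS}_{\text{Model}} &= \frac{1030.10115}{4} \approx 257.52529,
  \qquad \mathrm{MSE} = \frac{1817.60534}{5269} \approx 0.34496\\
F &= \frac{257.52529}{0.34496} \approx 746.53\\
R^2 &= \frac{\mathrm{SS}_{\text{Model}}}{\mathrm{SS}_{\text{Total}}}
     = \frac{1030.10115}{2847.70648} \approx 0.3617\\
\bar{R}^2 &= 1 - (1 - R^2)\,\frac{n-1}{n-p-1}
           = 1 - (1 - 0.3617)\,\frac{5273}{5269} \approx 0.3612\\
\text{Root MSE} &= \sqrt{0.34496} \approx 0.58733\\
\text{Coeff Var} &= 100 \times \frac{0.58733}{2.80319} \approx 20.952
\end{align*}
```

The same arithmetic applies to the other four outcomes; for condition, for example, R-Square = 51.22687 / 2568.83959, approximately 0.0199, so the predictors account for only about 2% of the variation in condition.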

Parameter Estimates
Variable        DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept        1    1.94080               0.01983            97.88     <.0001
price            1    -6.46778E-7           4.169038E-8       -15.51     <.0001
sqft_living      B    0.00074449            0.00002542         29.29     <.0001
sqft_lot         1    -7.395E-7             1.820141E-7        -4.06     <.0001
sqft_above       B    0.00002512            0.00002819          0.89     0.3728
sqft_basement    0    0                     .                   .        .
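The NOTE that sqft_basement = sqft_living - sqft_above explains the B flags and the zero estimate in the table above. Writing beta_L, beta_A and beta_B for the sqft_living, sqft_above and sqft_basement coefficients (our notation), the three area terms always collapse onto two of them, and the univariate sums and means confirm the identity:

```latex
\begin{align*}
\beta_L\,\text{living} + \beta_A\,\text{above} + \beta_B\,\text{basement}
  &= (\beta_L + \beta_B)\,\text{living} + (\beta_A - \beta_B)\,\text{above}\\
\text{sums:}\quad 6834417 + 974660 &= 7809077\\
\text{means:}\quad 1295.86974 + 184.804702 &= 1480.674442\\
\text{per sq ft of basement:}\quad \hat{\beta}_L &= 0.00074449\\
\text{per sq ft above ground:}\quad \hat{\beta}_L + \hat{\beta}_A &= 0.00074449 + 0.00002512 = 0.00076961\\
t_{\text{price}} &= \frac{-6.46778\times 10^{-7}}{4.169038\times 10^{-8}} \approx -15.51
\end{align*}
```

Only beta_L + beta_B and beta_A - beta_B can be estimated, so SAS fixes the sqft_basement estimate at 0. Under that constraint the sqft_living estimate is the change in the outcome per extra square foot of basement, and the sqft_above estimate is how much more (or less) an extra square foot above ground adds; for bedrooms its p-value of 0.3728 gives no evidence that the two differ. Each t value is the estimate divided by its standard error, as in the last line.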

Model: MODEL1
Dependent Variable: bathrooms

Number of Observations Read    5274
Number of Observations Used    5274

Analysis of Variance
Source             DF     Sum of Squares    Mean Square    F Value    Pr > F
Model                 4       1518.67050      379.66763     1799.42    <.0001
Error              5269       1111.72919        0.21099
Corrected Total    5273       2630.39970

Root MSE           0.45934     R-Square    0.5774
Dependent Mean     1.50436     Adj R-Sq    0.5770
Coeff Var         30.53397

NOTE: Model is not full rank. Least-squares solutions for the parameters are not
unique. Some statistics will be misleading. A reported DF of 0 or B means that the
estimate is biased.
NOTE: The following parameters have been set to 0, since the variables are a linear
combination of other variables as shown.
sqft_basement = sqft_living - sqft_above

Parameter Estimates
Variable        DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept        1    0.34830               0.01551            22.46     <.0001
price            1    2.567253E-8           3.26051E-8          0.79     0.4311
sqft_living      B    0.00064208            0.00001988         32.30     <.0001
sqft_lot         1    -6.55906E-7           1.423491E-7        -4.61     <.0001
sqft_above       B    0.00015721            0.00002204          7.13     <.0001
sqft_basement    0    0                     .                   .        .

Model: MODEL1
Dependent Variable: floors

Number of Observations Read    5274
Number of Observations Used    5274

Analysis of Variance
Source             DF     Sum of Squares    Mean Square    F Value    Pr > F
Model                 4        253.90699       63.47675      487.30    <.0001
Error              5269        686.34728        0.13026
Corrected Total    5273        940.25427

Root MSE           0.36092     R-Square    0.2700
Dependent Mean     1.17349     Adj R-Sq    0.2695
Coeff Var         30.75583

NOTE: Model is not full rank. Least-squares solutions for the parameters are not
unique. Some statistics will be misleading. A reported DF of 0 or B means that the
estimate is biased.
NOTE: The following parameters have been set to 0, since the variables are a linear
combination of other variables as shown.
sqft_basement = sqft_living - sqft_above

Parameter Estimates
Variable        DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept        1    0.70070               0.01219            57.50     <.0001
price            1    1.745027E-7           2.561875E-8         6.81     <.0001
sqft_living      B    -0.00009904           0.00001562         -6.34     <.0001
sqft_lot         1    -5.14999E-7           1.118477E-7        -4.60     <.0001
sqft_above       B    0.00042866            0.00001732         24.75     <.0001
sqft_basement    0    0                     .                   .        .

Model: MODEL1
Dependent Variable: view

Number of Observations Read    5274
Number of Observations Used    5274

Analysis of Variance
Source             DF     Sum of Squares    Mean Square    F Value    Pr > F
Model                 4        237.10683       59.27671      184.47    <.0001
Error              5269       1693.15634        0.32134
Corrected Total    5273       1930.26318

Root MSE           0.56687     R-Square    0.1228
Dependent Mean     0.14941     Adj R-Sq    0.1222
Coeff Var        379.40089

NOTE: Model is not full rank. Least-squares solutions for the parameters are not
unique. Some statistics will be misleading. A reported DF of 0 or B means that the
estimate is biased.
NOTE: The following parameters have been set to 0, since the variables are a linear
combination of other variables as shown.
sqft_basement = sqft_living - sqft_above

Parameter Estimates
Variable        DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept        1    -0.17437              0.01914            -9.11     <.0001
price            1    7.47685E-7            4.023783E-8        18.58     <.0001
sqft_living      B    0.00011362            0.00002453          4.63     <.0001
sqft_lot         1    6.480955E-7           1.756725E-7         3.69     0.0002
sqft_above       B    -0.00012092           0.00002720         -4.45     <.0001
sqft_basement    0    0                     .                   .        .

Model: MODEL1
Dependent Variable: condition

Number of Observations Read    5274
Number of Observations Used    5274

Analysis of Variance
Source             DF     Sum of Squares    Mean Square    F Value    Pr > F
Model                 4         51.22687       12.80672       26.80    <.0001
Error              5269       2517.61272        0.47782
Corrected Total    5273       2568.83959

Root MSE           0.69124     R-Square    0.0199
Dependent Mean     3.46189     Adj R-Sq    0.0192
Coeff Var         19.96721

NOTE: Model is not full rank. Least-squares solutions for the parameters are not
unique. Some statistics will be misleading. A reported DF of 0 or B means that the
estimate is biased.
NOTE: The following parameters have been set to 0, since the variables are a linear
combination of other variables as shown.
sqft_basement = sqft_living - sqft_above

Parameter Estimates
Variable        DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept        1    3.50167               0.02334           150.05     <.0001
price            1    2.506291E-7           4.906598E-8         5.11     <.0001
sqft_living      B    0.00016466            0.00002992          5.50     <.0001
sqft_lot         1    -1.96459E-7           2.142149E-7        -0.92     0.3591
sqft_above       B    -0.00029533           0.00003317         -8.90     <.0001
sqft_basement    0    0                     .                   .        .

The REG Procedure
Model: MODEL1
Multivariate Test 1

L Ginv(X'X) L' (rows and columns: price, sqft_living, sqft_lot, sqft_above)
 5.038488E-15    -1.17092E-12     7.776209E-16    -7.63057E-14
-1.17092E-12      1.8732555E-9   -1.67642E-13     -1.651999E-9
 7.776209E-16    -1.67642E-13     9.603697E-14    -1.31872E-12
-7.63057E-14     -1.651999E-9    -1.31872E-12      2.3029018E-9

LB-cj (rows: price, sqft_living, sqft_lot, sqft_above; columns: bedrooms, bathrooms, floors, view, condition)
-6.467775E-7     2.5672529E-8     1.7450273E-7     7.4768504E-7     2.5062915E-7
 0.0007444866    0.0006420774    -0.000099036      0.0001136185     0.0001646605
-7.394999E-7    -6.559065E-7     -5.149987E-7      6.4809548E-7    -1.964595E-7
 0.0000251197    0.0001572081     0.0004286615    -0.00012092      -0.000295333

Inv(L Ginv(X'X) L') (rows and columns: price, sqft_living, sqft_lot, sqft_above)
3.5256632E14     635548754985     4.7124527E12     470294937757
635548754985     2626772800       25805732831      1920165356.3
4.7124527E12     25805732831      1.0760494E13     24829857583
470294937757     1920165356.3     24829857583      1841477067.5

Inv()(LB-cj) (rows: price, sqft_living, sqft_lot, sqft_above; columns: bedrooms, bathrooms, floors, view, condition)
253454358.33     487966005.24     197751985.68     282004486.05     53193572.838
1573689.0694     1987846.9879     660570.8868      558177.54077     19654.062571
8830446.8294     13535835.314     3368602.4915     10426828.758     -4016823.744
1153256.7884     1518177.496      668485.18942     363218.64619     -114682.6826

Error Matrix (E) (rows and columns: bedrooms, bathrooms, floors, view, condition)
1817.6053383     241.08037659     -12.10035207     -123.4849555     193.12284288
241.08037659     1111.7291944     306.4134363      -43.33131935     -56.21408864
-12.10035207     306.4134363      686.34727878     20.028916995     -155.8725535
-123.4849555     -43.33131935     20.028916995     1693.1563433     38.762558085
193.12284288     -56.21408864     -155.8725535     38.762558085     2517.61272

Hypothesis Matrix (H) (rows and columns: bedrooms, bathrooms, floors, view, condition)
1030.1011463     1192.4463583     378.1856763      234.57483031     -19.68256984
1192.4463583     1518.6705022     532.09623378     415.8948385      -1.40934708
378.1856763      532.09623378     253.90698743     144.25891008     -39.75543283
234.57483031     415.8948385      144.25891008     237.10683454     53.269296295
-19.68256984     -1.40934708      -39.75543283     53.269296295     51.226870419
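The E and H matrices printed by MTEST / PRINT can be tied back to the five separate regressions. In standard notation (ours, assuming the usual PROC REG definitions with cj = 0), L selects the four estimable slope parameters:

```latex
\begin{align*}
E &= (Y - X\hat{B})^\top (Y - X\hat{B}), \qquad
H = (L\hat{B})^\top \bigl(L\,(X^\top X)^{-} L^\top\bigr)^{-1} (L\hat{B}), \qquad
T = H + E\\
\operatorname{diag}(E) &= (1817.605,\ 1111.729,\ 686.347,\ 1693.156,\ 2517.613) && \text{error SS}\\
\operatorname{diag}(H) &= (1030.101,\ 1518.671,\ 253.907,\ 237.107,\ 51.227) && \text{model SS}\\
\operatorname{diag}(T) &= (2847.706,\ 2630.400,\ 940.254,\ 1930.263,\ 2568.840) && \text{corrected total SS}\\
E_{12} + H_{12} &= 241.080 + 1192.446 = 1433.527 = T_{12}
\end{align*}
```

The diagonals reproduce the Error, Model and Corrected Total sums of squares of the bedrooms, bathrooms, floors, view and condition regressions, and the off-diagonal entries are the matching cross-products. Likewise, the diagonal of Inv(L Ginv(X'X) L') equals the corrected sums of squares of price, sqft_living, sqft_lot and sqft_above from the univariate output (3.52566E14, 2626772800, 1.07605E13 and 1841477068).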

Hypothesis + Error Matrix (T) (rows and columns: bedrooms, bathrooms, floors, view, condition)
2847.7064846     1433.5267349     366.08532423     111.08987486     173.44027304
1433.5267349     2630.3996966     838.50967008     372.56351915     -57.62343572
366.08532423     838.50967008     940.25426621     164.28782708     -195.6279863
111.08987486     372.56351915     164.28782708     1930.2631779     92.03185438
173.44027304     -57.62343572     -195.6279863     92.03185438      2568.8395904

Eigenvectors (one row per eigenvalue; columns: bedrooms, bathrooms, floors, view, condition)    Eigenvalues
 0.006046     0.013853     0.002709     0.003560    -0.000054776                                  0.639941
-0.010945    -0.003438     0.027851     0.008803     0.001252                                     0.113344
-0.007015     0.009279    -0.022405     0.015117     0.006737                                     0.101557
-0.003435     0.002717     0.008322    -0.010636     0.018051                                     0.002117
 0.016544    -0.020334     0.012991     0.010215     0.005257                                     1.035931E-16

Multivariate Statistics and F Approximations
S=4    M=0    N=2631.5

Statistic                 Value         F Value    Num DF    Den DF    Pr > F
Wilks' Lambda             0.28621943     400.06        20     17463    <.0001
Pillai's Trace            0.85695903     287.27        20     21072    <.0001
Hotelling-Lawley Trace    2.02031257     531.74        20     11576    <.0001
Roy's Greatest Root       1.77732087    1872.59         5      5268    <.0001

NOTE: F Statistic for Roy's Greatest Root is an upper bound.
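All four multivariate statistics are functions of the eigenvalues printed above. The values are consistent with those eigenvalues being the roots theta_i of T^-1 H (that is, of (H + E)^-1 H); the corresponding roots of E^-1 H are lambda_i = theta_i / (1 - theta_i). With p = 5 dependent variables, q = 4 hypothesis degrees of freedom and nu = 5269 error degrees of freedom (our notation), the table can be reproduced as follows:

```latex
\begin{align*}
\Lambda_{\text{Wilks}} &= \prod_i (1 - \theta_i)
  = 0.360059 \times 0.886656 \times 0.898443 \times 0.997883 \approx 0.28622\\
V_{\text{Pillai}} &= \sum_i \theta_i = 0.639941 + 0.113344 + 0.101557 + 0.002117 \approx 0.85696\\
U_{\text{Hotelling-Lawley}} &= \sum_i \lambda_i \approx 1.77732 + 0.12783 + 0.11304 + 0.00212 \approx 2.02031\\
\text{Roy} &= \lambda_1 = \frac{0.639941}{1 - 0.639941} \approx 1.77732\\
S &= \min(p, q) = 4, \qquad M = \frac{|p - q| - 1}{2} = 0, \qquad
  N = \frac{\nu - p - 1}{2} = \frac{5269 - 5 - 1}{2} = 2631.5\\
F_{\text{Roy}} &= \lambda_1\,\frac{\nu - r + q}{r} = 1.77732 \times \frac{5268}{5} \approx 1872.59,
  \qquad r = \max(p, q) = 5
\end{align*}
```

Roy's F is referred to (r, nu - r + q) = (5, 5268) degrees of freedom, matching the Num DF and Den DF in the table, and, as the NOTE says, it is only an upper bound. All four tests reject the hypothesis that every slope is zero in all five equations (p < .0001).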

Appendix
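PROC IMPORT OUT= WORK.swarnanaman
            DATAFILE= "C:\Users\imt\Desktop\Nandita_SAS\housedataset.xlsx"
            DBMS=EXCEL REPLACE;
     RANGE="housedataset$";
     GETNAMES=YES;
     MIXED=NO;
     SCANTEXT=YES;
     USEDATE=YES;
     SCANTIME=YES;
RUN;

/* Univariate statistics, line-printer plots and normality tests */
PROC univariate data=swarnanaman plot normal;
var price sqft_living sqft_lot sqft_above sqft_basement;
run;

/* Working copy of the imported data for the regression */
data Reg;
set swarnanaman;
run;

/* Multivariate multiple regression: five outcomes on five predictors.
   sqft_basement = sqft_living - sqft_above, so its estimate is set to 0. */
proc reg data = Reg;
model bedrooms bathrooms floors view condition =
      price sqft_living sqft_lot sqft_above sqft_basement;
/* Joint test that all slopes are zero in all five equations */
mtest / details print;
run;
quit;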
