Rev 10
Agenda
1. Comparison of Treatments (ANOVA)
2. Multivariate Analysis of Variance
3. Regression Modeling
   Regression fundamentals
   Significance of model terms
   Confidence intervals
[Figure: measurements plotted against time order]
Comparison of Treatments
[Figure: Samples A, B, and C drawn from Populations A, B, and C]
Pool these (across different conditions) to get estimate of common within group variance:
If the treatments in fact have different means, then $s_T^2$ estimates something larger:
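The two variance estimates referred to here can be written out as follows. This is a sketch in standard one-way ANOVA notation, which is an assumption because the slide equations did not survive: $k$ treatments, $n_t$ observations $y_{ti}$ in treatment $t$, $N$ observations in total, treatment averages $\bar{y}_t$, grand average $\bar{y}$, and true treatment effects $\tau_t$. The symbol $s_R^2$ for the pooled within-group estimate is also assumed.

```latex
s_R^2 = \frac{\sum_{t=1}^{k} \sum_{i=1}^{n_t} \left( y_{ti} - \bar{y}_t \right)^2}{N - k},
\qquad
s_T^2 = \frac{\sum_{t=1}^{k} n_t \left( \bar{y}_t - \bar{y} \right)^2}{k - 1}

% If all treatments share one mean, both quantities estimate \sigma^2.
% Otherwise s_T^2 estimates the larger quantity
% \sigma^2 + \frac{\sum_{t=1}^{k} n_t \tau_t^2}{k - 1}
```

The ratio $F_0 = s_T^2 / s_R^2$ is then compared with an F distribution on $k - 1$ and $N - k$ degrees of freedom.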
ANOVA table columns: degrees of freedom, mean square, F0, Pr(F0)
Example: Anova
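Data:
Treatment  Observations
A          11  10  12
B          10   8   6
C          12  10  11

[Figure: observations plotted by treatment: A (t = 1), B (t = 2), C (t = 3)]

Summary:
Group  Count  Sum  Average  Variance
A      3      33   11       1
B      3      24    8       4
C      3      33   11       1

ANOVA:
Source of variation  SS  df  MS  F    P-value  F crit
Between groups       18  2   9   4.5  0.064    5.14
Within groups        12  6   2
Total                30  8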
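The ANOVA table for this example can be recomputed from the nine observations. The following is a minimal sketch using only the Python standard library; the function name `one_way_anova` and the dictionary layout are illustrative choices, not part of the original slides. The critical value 5.14 is taken from the example table rather than computed.

```python
# One-way ANOVA from first principles (standard library only).
data = {
    "A": [11, 10, 12],
    "B": [10, 8, 6],
    "C": [12, 10, 11],
}


def one_way_anova(groups):
    """Return sums of squares, degrees of freedom, mean squares and F0."""
    all_obs = [y for ys in groups.values() for y in ys]
    n_total = len(all_obs)
    k = len(groups)
    grand_mean = sum(all_obs) / n_total

    ss_between = 0.0  # variation of treatment averages about the grand average
    ss_within = 0.0   # pooled variation of observations about their own average
    for ys in groups.values():
        mean_t = sum(ys) / len(ys)
        ss_between += len(ys) * (mean_t - grand_mean) ** 2
        ss_within += sum((y - mean_t) ** 2 for y in ys)

    df_between = k - 1
    df_within = n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "ss_total": ss_between + ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "F0": ms_between / ms_within,
    }


result = one_way_anova(data)
F_CRIT = 5.14  # F crit from the example table (5% level, 2 and 6 df)
print(result)
print("significant at the 5% level:", result["F0"] > F_CRIT)
```

With F0 = 4.5 below the critical value 5.14, the treatment differences are not significant at the 5% level, which agrees with the P-value of 0.064 in the table.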
Model: $y_{ti} = \mu_t + \epsilon_{ti} = \mu + \tau_t + \epsilon_{ti}$
Where $\mu_t$ is the treatment mean (for treatment type t)
And $\tau_t$ is the treatment effect
With $\epsilon_{ti}$ being zero-mean normal residuals, $\epsilon_{ti} \sim N(0, \sigma_0^2)$
Checks
  Plot residuals against time order
  Examine distribution of residuals: should be IID, Normal
  Plot residuals vs. estimates
  Plot residuals vs. other variables of interest
Assumes that the effects from the two variables are additive
Effect Tests
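Source  Nparm  DF  Sum of Squares  F Ratio
Tube    1      1    150.00         10.71
Gas     2      2   1200.00         42.85

Decomposition of the data (rows: tube, columns: gas) into grand mean + tube effect + gas effect + residual:

Data          7   36    2
             13   44   18
Grand mean   20   20   20
             20   20   20
Tube effect  -5   -5   -5
              5    5    5
Gas effect  -10   20  -10
            -10   20  -10
Residual      2    1   -3
             -2   -1    3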
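The additive decomposition behind this Effect Tests table can be checked numerically. The sketch below assumes the six data values form a 2 x 3 layout with tube as rows and gas as columns, which is consistent with Nparm = 1 for Tube and Nparm = 2 for Gas; the variable names are illustrative.

```python
# Two-way additive decomposition: y = grand mean + tube effect + gas effect + residual.
# Assumed layout: rows are the 2 tube levels, columns are the 3 gas levels.
y = [
    [7, 36, 2],
    [13, 44, 18],
]

n_rows, n_cols = len(y), len(y[0])
grand = sum(sum(row) for row in y) / (n_rows * n_cols)

# Effects are row or column averages expressed as deviations from the grand mean.
tube_eff = [sum(row) / n_cols - grand for row in y]
gas_eff = [sum(y[i][j] for i in range(n_rows)) / n_rows - grand for j in range(n_cols)]

# Whatever the additive model does not explain is the residual.
resid = [
    [y[i][j] - grand - tube_eff[i] - gas_eff[j] for j in range(n_cols)]
    for i in range(n_rows)
]

ss_tube = n_cols * sum(e ** 2 for e in tube_eff)
ss_gas = n_rows * sum(e ** 2 for e in gas_eff)
ss_resid = sum(r ** 2 for row in resid for r in row)

df_tube, df_gas = n_rows - 1, n_cols - 1
df_resid = df_tube * df_gas
ms_resid = ss_resid / df_resid

f_tube = (ss_tube / df_tube) / ms_resid
f_gas = (ss_gas / df_gas) / ms_resid

print("tube effects:", tube_eff, "gas effects:", gas_eff)
print("residuals:", resid)
print("SS tube, gas, residual:", ss_tube, ss_gas, ss_resid)
print("F tube, F gas:", round(f_tube, 2), round(f_gas, 2))
```

The sums of squares 150 and 1200 match the table. The F ratio for gas evaluates to about 42.86, which the table reports as 42.85.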
$R^2$: think of this as the fraction of squared deviations (from the grand average) in the data that is captured by the model.
Adjusted $R^2$
For fair comparison between models with different numbers of coefficients, an alternative is often used.
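Both quantities can be computed directly from an ANOVA table. The sketch below uses the usual definitions, which are an assumption here because the slide equations did not survive: $R^2 = SS_{model} / SS_{total}$ and adjusted $R^2 = 1 - (SS_{resid} / df_{resid}) / (SS_{total} / df_{total})$, with sums of squares taken about the grand average. The input numbers come from the polynomial regression output later in these slides; the function names are illustrative.

```python
def r_squared(ss_model, ss_total):
    """Fraction of squared deviations about the grand average captured by the model."""
    return ss_model / ss_total


def adjusted_r_squared(ss_resid, ss_total, n_obs, n_predictors):
    """R^2 penalized for the number of fitted coefficients (excluding the mean)."""
    df_resid = n_obs - n_predictors - 1
    df_total = n_obs - 1
    return 1.0 - (ss_resid / df_resid) / (ss_total / df_total)


# Sums of squares from the quadratic fit reported later in the slides:
# regression 665.706, residual 45.194, total 710.9, with 10 observations.
r2 = r_squared(665.706, 710.9)
r2_adj = adjusted_r_squared(45.194, 710.9, n_obs=10, n_predictors=2)

print(round(r2, 3), round(r2_adj, 3))
```

The adjusted value is always at most the plain value, because each extra coefficient has to reduce the residual mean square, not just the residual sum of squares, to raise it.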
Regression Fundamentals
Use least-squares error as the measure of goodness to estimate coefficients in a model
One-parameter model:
  Model form
  Squared error
  Estimation using normal equations
  Estimate of experimental error
  Precision of estimate: variance in b
  Confidence interval for $\beta$
  Analysis of variance: significance of b
  Lack of fit vs. pure error
Polynomial regression
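The one-parameter steps in this outline can be summarized in a few lines. This is a sketch of the standard results for a line through the origin, stated under assumed notation since the slide equations were not preserved: $\beta$ is the true coefficient, $b$ its least-squares estimate, $n$ the number of observations, and $s^2$ the estimate of experimental error.

```latex
\begin{aligned}
\text{Model form:} \quad & y_i = \beta x_i + \epsilon_i, \qquad \epsilon_i \sim N(0, \sigma^2) \\
\text{Squared error:} \quad & S(b) = \sum_{i=1}^{n} (y_i - b x_i)^2 \\
\text{Normal equation:} \quad & \sum_{i=1}^{n} x_i (y_i - b x_i) = 0
  \;\Rightarrow\; b = \frac{\sum_i x_i y_i}{\sum_i x_i^2} \\
\text{Experimental error:} \quad & s^2 = \frac{S_R}{n - 1}, \qquad S_R = \sum_{i=1}^{n} (y_i - b x_i)^2 \\
\text{Variance in } b\text{:} \quad & \widehat{\mathrm{Var}}(b) = \frac{s^2}{\sum_i x_i^2} \\
\text{Confidence interval:} \quad & b \pm t_{\alpha/2,\, n-1} \, \frac{s}{\sqrt{\sum_i x_i^2}}
\end{aligned}
```

The residual has $n - 1$ degrees of freedom because a single coefficient is estimated and no mean is fitted.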
Why?
Analysis of variance
Test hypothesis $H_0: \beta = 0$. If the confidence interval for $\beta$ includes 0, then $\beta$ is not significant.
Example Regression
Age     8     22    35     40     57     73     78     87     98
Income  6.16  9.88  14.35  24.06  30.34  32.17  42.18  43.23  48.76

Whole Model
[Figure: leverage plot, income leverage residuals vs. age leverage, P<.0001]

Analysis of Variance
Source    DF  Sum of Squares  Mean Square  F Ratio   Prob > F
Model     1   8836.6440       8836.64      1093.146  <.0001
Error     8     64.6695          8.08
C. Total  9   8901.3135
Tested against reduced model: Y=0

Parameter Estimates
Term                Estimate  Std Error  t Ratio  Prob>|t|
Intercept (Zeroed)  0         0          .        .
age                 0.500983  0.015152   33.06    <.0001

Effect Tests
Source  Nparm  DF  Sum of Squares  F Ratio   Prob > F
age     1      1   8836.6440       1093.146  <.0001

Note that this simple model assumes an intercept of zero: the model must go through the origin.
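The zero-intercept fit in this example can be reproduced from the nine (age, income) pairs. The sketch below is standard-library Python with illustrative names; it reads the stray values 87, 98 and 43.23, 48.76 in the extracted slide as the last two data rows, an interpretation supported by the fact that the nine pairs reproduce the reported estimate and sums of squares. The t value 2.306 is the tabulated two-sided 95% point for 8 degrees of freedom and is not taken from the slides.

```python
import math

age = [8, 22, 35, 40, 57, 73, 78, 87, 98]
income = [6.16, 9.88, 14.35, 24.06, 30.34, 32.17, 42.18, 43.23, 48.76]


def fit_through_origin(x, y):
    """Least-squares fit of y = b * x, with ANOVA against the reduced model y = 0."""
    n = len(x)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    b = sxy / sxx                                   # solution of the normal equation
    ss_error = sum((yi - b * xi) ** 2 for xi, yi in zip(x, y))
    ss_total = sum(yi * yi for yi in y)             # uncorrected: reduced model is y = 0
    ss_model = ss_total - ss_error
    df_error = n - 1                                # one coefficient, no mean fitted
    ms_error = ss_error / df_error
    se_b = math.sqrt(ms_error / sxx)
    return {
        "b": b,
        "se_b": se_b,
        "ss_model": ss_model,
        "ss_error": ss_error,
        "ss_total": ss_total,
        "df_error": df_error,
        "F": ss_model / ms_error,                   # model has 1 degree of freedom
        "t": b / se_b,
    }


fit = fit_through_origin(age, income)
T_CRIT = 2.306  # tabulated t(0.025; 8 df), an assumption not shown in the slides
ci = (fit["b"] - T_CRIT * fit["se_b"], fit["b"] + T_CRIT * fit["se_b"])

print(fit)
print("95% confidence interval for the slope:", ci)
```

The interval excludes zero by a wide margin, consistent with the reported t ratio of 33.06. Note that the F ratio equals the square of the t ratio, as it must for a single-coefficient model.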
Our confidence interval on y widens as we get further from the center of our data!
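The widening can be seen from the standard error of the fitted mean. The sketch below assumes a straight-line model with an intercept, $y = b_0 + b_1 x$, for which the standard error at $x_0$ is $s \sqrt{1/n + (x_0 - \bar{x})^2 / S_{xx}}$; that formula is the textbook result and an assumption here, since the slide equations were lost. It reuses the age and income values from the regression example purely for illustration (the example itself fits a zero-intercept model), and the t value 2.365 for 7 degrees of freedom is a tabulated value, not one from the slides.

```python
import math

age = [8, 22, 35, 40, 57, 73, 78, 87, 98]
income = [6.16, 9.88, 14.35, 24.06, 30.34, 32.17, 42.18, 43.23, 48.76]


def fit_line(x, y):
    """Least-squares fit of y = b0 + b1 * x; returns what the interval needs."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    ss_resid = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(ss_resid / (n - 2))   # two coefficients estimated
    return {"b0": b0, "b1": b1, "s": s, "n": n, "x_bar": x_bar, "sxx": sxx}


def se_fitted_mean(fit, x0):
    """Standard error of the fitted mean response at x0."""
    return fit["s"] * math.sqrt(1.0 / fit["n"] + (x0 - fit["x_bar"]) ** 2 / fit["sxx"])


line = fit_line(age, income)
T_CRIT = 2.365  # tabulated t(0.025; 7 df), an assumption not shown in the slides

for x0 in (8, line["x_bar"], 98, 120):
    y_hat = line["b0"] + line["b1"] * x0
    half_width = T_CRIT * se_fitted_mean(line, x0)
    print(f"x0 = {x0:6.2f}  fitted = {y_hat:6.2f}  95% half-width = {half_width:5.2f}")
```

The half-width is smallest at the average age and grows with the squared distance from it, so extrapolating to an age of 120 gives the widest interval of the four.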
Polynomial Regression
We may believe that a higher-order model structure applies. Polynomial forms are also linear in the coefficients and can be fit with least squares.
Curvature is included through the $x^2$ term.
Source      Sum of squares               Degrees of freedom  Mean square
Model       SM = 67,428.6                2
              mean                       1                   67,404.1
              extra for linear           1                   24.5
Residual    SR = 686.4                   8                   85.8
              SL = 659.40 (lack of fit)  4
              SE = 27.0 (pure error)     4
Total       ST = 68,115.0                10
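The entries of this table can be rebuilt step by step: the sum of squares for the mean, the extra sum of squares for the linear term, and the split of the residual into pure error (from replicated x values) and lack of fit. The sketch below rests on an assumption that should be read loudly: the ten (x, y) pairs are not listed in the slides. They are the growth-rate data from Box, Hunter and Hunter, chosen because they reproduce ST = 68,115.0, SE = 27.0 and the other table entries. Variable names are illustrative.

```python
from collections import defaultdict

# ASSUMED data (not listed in the slides): reproduces the table totals.
x = [10, 10, 15, 20, 20, 25, 25, 25, 30, 35]
y = [73, 78, 85, 90, 91, 87, 86, 91, 75, 65]
n = len(y)

ss_total = sum(yi ** 2 for yi in y)        # ST, uncorrected total
ss_mean = sum(y) ** 2 / n                  # sum of squares due to the mean

x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_linear = sxy ** 2 / sxx                 # extra sum of squares for the linear term

ss_model = ss_mean + ss_linear             # SM
ss_resid = ss_total - ss_model             # SR
df_resid = n - 2

# Pure error: variation among replicate runs at the same x.
replicates = defaultdict(list)
for xi, yi in zip(x, y):
    replicates[xi].append(yi)
ss_pure = sum(
    sum((v - sum(vs) / len(vs)) ** 2 for v in vs) for vs in replicates.values()
)                                          # SE
df_pure = sum(len(vs) - 1 for vs in replicates.values())

ss_lof = ss_resid - ss_pure                # SL
df_lof = df_resid - df_pure

lof_ratio = (ss_lof / df_lof) / (ss_pure / df_pure)
print("mean, linear:", round(ss_mean, 1), round(ss_linear, 1))
print("SM, SR, ST:", round(ss_model, 1), round(ss_resid, 1), ss_total)
print("SL, SE:", round(ss_lof, 1), round(ss_pure, 1), "df:", df_lof, df_pure)
print("lack-of-fit mean square ratio:", round(lof_ratio, 1))
```

A lack-of-fit mean square that is many times the pure-error mean square, as here, indicates that the straight line is inadequate and motivates adding a quadratic term.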
Source      Sum of squares             Degrees of freedom  Mean square
Model       SM = 68,071.8              3
              mean                     1                   67,404.1
              extra for linear         1                   24.5
              extra for quadratic      1                   643.2
Residual    SR = 43.2                  7
              SL = 16.2 (lack of fit)  3
              SE = 27.0 (pure error)   4
Total       ST = 68,115.0              10
Regression Statistics
Multiple R         0.968
R Square           0.936
Adjusted R Square  0.918
Standard Error     2.541
Observations       10

ANOVA
            df  SS       MS       F       Significance F
Regression  2   665.706  332.853  51.555  6.48E-05
Residual    7    45.194    6.456
Total       9   710.9

           Standard Error  t Stat  P-value  Lower 95%  Upper 95%
Intercept  5.618            6.347  0.0004   22.373     48.942
x          0.558            9.431  3.1E-05   3.943      6.582
x^2        0.013           -9.966  2.2E-05  -0.158     -0.097
Polynomial Regression
Analysis of Variance
Source    DF
Model     2
Error     7
C. Total  9

Lack Of Fit
Source       DF  Sum of Squares  Mean Square
Lack Of Fit  3   18.193829       6.0646
Pure Error   4   27.000000       6.7500
Total Error  7   45.193829

Summary of Fit
RSquare
RSquare Adj
Root Mean Sq Error
Mean of Response

Parameter Estimates
Term       Estimate   Std Error  t Ratio  Prob>|t|
Intercept  35.657437  5.617927    6.35    0.0004
x          5.2628956  0.558022    9.43    <.0001
x*x        -0.127674  0.012811   -9.97    <.0001

Effect Tests
Source  Nparm  DF  Sum of Squares  F Ratio  Prob > F
x       1      1   574.28553       88.9502  <.0001
x*x     1      1   641.20451       99.3151  <.0001
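The quadratic fit and its lack-of-fit test can be reproduced with ordinary least squares, since the model is linear in its coefficients. The sketch below solves the normal equations $(X^T X) b = X^T y$ with a small Gaussian elimination routine so that it needs only the standard library. As a loudly stated assumption, the ten (x, y) pairs are not listed in the slides; they are the growth-rate data from Box, Hunter and Hunter, used because they reproduce the pure error of 27.0 and the reported estimates. The helper names `solve_linear_system` and `fit_polynomial` are hypothetical.

```python
from collections import defaultdict

# ASSUMED data (not listed in the slides): reproduces the reported estimates.
x = [10, 10, 15, 20, 20, 25, 25, 25, 30, 35]
y = [73, 78, 85, 90, 91, 87, 86, 91, 75, 65]


def solve_linear_system(a, b):
    """Solve a * z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):                      # back substitution
        tail = sum(m[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (m[i][n] - tail) / m[i][i]
    return z


def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial coefficients [b0, b1, ...] via the normal equations."""
    powers = range(degree + 1)
    xtx = [[sum(xi ** (p + q) for xi in xs) for q in powers] for p in powers]
    xty = [sum((xi ** p) * yi for xi, yi in zip(xs, ys)) for p in powers]
    return solve_linear_system(xtx, xty)


b0, b1, b2 = fit_polynomial(x, y, 2)
ss_resid = sum((yi - (b0 + b1 * xi + b2 * xi ** 2)) ** 2 for xi, yi in zip(x, y))

# Pure error from replicated x values; lack of fit is the remainder.
replicates = defaultdict(list)
for xi, yi in zip(x, y):
    replicates[xi].append(yi)
ss_pure = sum(
    sum((v - sum(vs) / len(vs)) ** 2 for v in vs) for vs in replicates.values()
)
df_pure = sum(len(vs) - 1 for vs in replicates.values())
df_resid = len(y) - 3
ss_lof = ss_resid - ss_pure
df_lof = df_resid - df_pure
lof_ratio = (ss_lof / df_lof) / (ss_pure / df_pure)

print("coefficients:", b0, b1, b2)
print("total error:", ss_resid, "lack of fit:", ss_lof, "pure error:", ss_pure)
print("lack-of-fit mean square ratio:", lof_ratio)
```

Here the lack-of-fit mean square is smaller than the pure-error mean square, so there is no evidence that the quadratic model is inadequate. Solving the normal equations directly is acceptable for a small, low-degree problem like this one; for higher degrees or wider x ranges a QR-based solver is numerically safer.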
Summary
Comparison of Treatments (ANOVA)
Multivariate Analysis of Variance
Regression Modeling
Next Time
Time Series Models
Forecasting
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.