By: Ayush Sharma (09), Mickey Haldia (19), Prerna Makhijani (29), Sanoj George (39), Sushant Jaggi (49), Nitish Dorle (59)
Example
Population on Farm (in millions) 32.1 30.5 24.4 23.0 19.1 15.6 12.5
Scatter Plot
[Figure: scatter plot of Population (in millions) against year, 1930 to 1970]
The correlation coefficient (r) is a measure of the strength of the linear relationship between two variables and is calculated using the following formula:
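One common way to write the sample correlation coefficient is sketched below. The symbols $\bar{x}$ and $\bar{y}$ (sample means) and $n$ (number of paired observations) are assumptions introduced here; algebraically equivalent computational forms also exist.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \; \sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The value of r always lies between -1 and +1. Values near either end indicate a strong linear relationship, and values near 0 indicate a weak one.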
Interpretation
Coefficient of Determination
Squaring the correlation coefficient (r) gives the proportion of the variation in the y-variable that is explained by the variation in the x-variable.
Regression
To relate x and y, the Regression Equation is calculated using the Least Squares technique.
Regression Equation: Y = a + bX
Slope of the regression line:
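For the regression equation Y = a + bX, the usual least-squares estimates of the slope and intercept can be written as follows. The symbols $\bar{X}$ and $\bar{Y}$ for the sample means are assumptions introduced for this sketch.

```latex
b = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2},
\qquad
a = \bar{Y} - b\,\bar{X}
```

These are the values of a and b that minimise the sum of squared vertical distances between the observed points and the line.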
[Figure: regression line plotted through the farm population data; x-axis: Year]
Interpretation
We conclude that 98.7% of the variation in farm population can be explained by the passage of time. Here, population is the dependent variable (y-axis) and time is the independent variable (x-axis).
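The 98.7% figure can be checked with a minimal Python sketch. The population values are the ones from the example, but the years are not listed with the data, so seven equally spaced years are assumed here purely for illustration. The value of r² does not depend on which equally spaced years are chosen, although the slope and intercept do.

```python
import math

# Farm population (in millions), taken from the example.
population = [32.1, 30.5, 24.4, 23.0, 19.1, 15.6, 12.5]
# ASSUMPTION: the years are not given with the data; these seven equally
# spaced years are hypothetical and used only for illustration.
years = [1935, 1940, 1945, 1950, 1955, 1960, 1965]


def least_squares(x, y):
    """Return (a, b, r): intercept and slope of Y = a + bX, and correlation r."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    b = s_xy / s_xx                      # slope of the regression line
    a = y_bar - b * x_bar                # intercept
    r = s_xy / math.sqrt(s_xx * s_yy)    # correlation coefficient
    return a, b, r


a, b, r = least_squares(years, population)
r_squared = r ** 2  # coefficient of determination
print(f"Y = {a:.2f} + ({b:.4f})X, r = {r:.4f}, r^2 = {r_squared:.3f}")
```

The negative slope reflects the steady fall in farm population, and r² rounds to 0.987, in line with the interpretation above.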
Error
MSE and the coefficient of determination (r²) do not provide a good measure of accuracy when the sample size is small. In this case, it is necessary to test the model for significance.
Linear model is given by
$Y = \beta_0 + \beta_1 X + \epsilon$
Null Hypothesis: $\beta_1 = 0$, i.e. there is no linear relationship between X and Y
Alternate Hypothesis: $\beta_1 \neq 0$, i.e. there is a linear relationship between X and Y
Steps in Hypothesis Test for a Significant Regression Model
1. Specify null and alternative hypotheses.
2. Select the level of significance (α). Common values are between 0.01 and 0.05.
3. Calculate the value of the test statistic using the formula: F = MSR/MSE
4. Make a decision using one of the following methods:
a) Reject the null hypothesis if F calculated > F table
b) Reject the null hypothesis if p-value < α
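The steps above can be sketched in Python for the farm population example. As before, the years are an assumption (they are not listed with the data), and the critical value 6.61 is the upper 5% point of the F distribution with 1 and 5 degrees of freedom as read from a standard F table. The function name `f_statistic` is hypothetical.

```python
# Farm population (in millions), taken from the example.
population = [32.1, 30.5, 24.4, 23.0, 19.1, 15.6, 12.5]
# ASSUMPTION: equally spaced years, used only for illustration.
years = [1935, 1940, 1945, 1950, 1955, 1960, 1965]

ALPHA = 0.05
# F table value for alpha = 0.05 with df1 = 1 and df2 = n - k - 1 = 5.
F_TABLE = 6.61


def f_statistic(x, y):
    """Return F = MSR / MSE for a simple (one-variable) linear regression."""
    n = len(x)
    k = 1  # number of independent variables
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    a = y_bar - b * x_bar
    predicted = [a + b * xi for xi in x]
    ssr = sum((p - y_bar) ** 2 for p in predicted)           # regression SS
    sse = sum((yi - p) ** 2 for yi, p in zip(y, predicted))  # error SS
    msr = ssr / k
    mse = sse / (n - k - 1)
    return msr / mse


f_calculated = f_statistic(years, population)
reject_null = f_calculated > F_TABLE  # decision rule 4(a)
print(f"F = {f_calculated:.1f}, reject null hypothesis: {reject_null}")
```

Here the calculated F is far larger than the table value, so the null hypothesis of no linear relationship would be rejected at the 5% level.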
Selling Price ($)   Square Footage   Age   Condition
95000               1926             30    Good
119000              2069             40    Excellent
124800              1720             30    Excellent
135000              1396             15    Good
142800              1706             32    Mint
145000              1847             38    Mint
159000              1950             27    Mint
165000              2323             30    Excellent
182000              2285             26    Mint
183000              3752             35    Good
200000              2300             18    Good
211000              2525             17    Good
215000              3800             40    Excellent
219000              1740             12    Mint
SUMMARY OUTPUT
[Regression output: Multiple R, R Square; ANOVA table (SS, MS); coefficient table with rows Intercept, SF, AGE and columns Coefficients, Standard Error, t Stat, P-value, Lower 95%, Upper 95%]
Significance F: 0.002178765
The p-values are used to test the individual variables for significance.
Indicator Variable
Assigned a value of 1 if a particular condition is met, 0 otherwise.
The number of dummy variables must equal one less than the number of categories of a qualitative variable.
The Jenny Wilson Realty example:
X3 = 1 for excellent condition, 0 otherwise
X4 = 1 for mint condition, 0 otherwise
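The encoding described above can be sketched in a few lines of Python. The condition values are those of the 14 houses in the example; the function name `encode_condition` is hypothetical. "Good" acts as the baseline category, so it needs no dummy variable of its own.

```python
# Condition of each of the 14 houses, in the order given in the example.
conditions = ["Good", "Excellent", "Excellent", "Good", "Mint", "Mint", "Mint",
              "Excellent", "Mint", "Good", "Good", "Good", "Excellent", "Mint"]


def encode_condition(condition):
    """Return (X3, X4) for one house; 'Good' is the baseline, encoded as (0, 0)."""
    c = condition.strip().lower()
    x3 = 1 if c == "excellent" else 0  # X3 = 1 for excellent condition
    x4 = 1 if c == "mint" else 0       # X4 = 1 for mint condition
    return x3, x4


x3 = [encode_condition(c)[0] for c in conditions]
x4 = [encode_condition(c)[1] for c in conditions]
print("X3:", x3)
print("X4:", x4)
```

Three categories (good, excellent, mint) therefore need only two dummy variables, one less than the number of categories.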
Selling Price ($)   Square Footage   Age   X3 (Exc.)   X4 (Mint)   Condition
95000               1926             30    0           0           Good
119000              2069             40    1           0           Excellent
124800              1720             30    1           0           Excellent
135000              1396             15    0           0           Good
142800              1706             32    0           1           Mint
145000              1847             38    0           1           Mint
159000              1950             27    0           1           Mint
165000              2323             30    1           0           Excellent
182000              2285             26    0           1           Mint
183000              3752             35    0           0           Good
200000              2300             18    0           0           Good
211000              2525             17    0           0           Good
215000              3800             40    1           0           Excellent
219000              1740             12    0           1           Mint

Regression Statistics
Multiple R          0.94762
R Square            0.89798
Adjusted R Square   0.85264
Standard Error      14987.6
Observations        14

ANOVA
             df   SS            MS      F         Significance F
Regression   4    17794427451   4E+09   19.8044   0.000174421
Residual     9    2021641120    2E+08
Total        13   19816068571

[Coefficient table: columns Coefficients, Standard Error, t Stat, P-value, Lower 95%, Upper 95%]
The coefficient of AGE is negative, indicating that the price decreases as a house gets older.
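The summary figures in the regression output are linked by simple arithmetic, which the following Python sketch reproduces from the sums of squares and degrees of freedom reported in the ANOVA table. The variable names are chosen for illustration only.

```python
import math

# Sums of squares and degrees of freedom as reported in the ANOVA table.
ss_regression = 17794427451
ss_residual = 2021641120
df_regression = 4   # k: SF, AGE, X3, X4
df_residual = 9     # n - k - 1, with n = 14 observations

ss_total = ss_regression + ss_residual
ms_regression = ss_regression / df_regression   # MSR
ms_residual = ss_residual / df_residual         # MSE
f_stat = ms_regression / ms_residual            # F = MSR / MSE
r_square = ss_regression / ss_total             # R Square
standard_error = math.sqrt(ms_residual)         # standard error of the estimate

print(f"SS total       = {ss_total}")
print(f"MSR            = {ms_regression:.3e}")
print(f"MSE            = {ms_residual:.3e}")
print(f"F              = {f_stat:.4f}")
print(f"R Square       = {r_square:.5f}")
print(f"Standard Error = {standard_error:.1f}")
```

The recomputed F, R Square and Standard Error agree with the reported 19.8044, 0.89798 and 14987.6 to the precision shown.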
Model Building
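The value of r² can never decrease when more variables are added to the model.
Adjusted r² is often used to determine if an additional independent variable is beneficial.
The adjusted r² is
$\text{Adjusted } r^2 = 1 - \dfrac{SSE/(n-k-1)}{SST/(n-1)}$
where n is the number of observations and k is the number of independent variables.
A variable should not be added to the model if it causes the adjusted r² to decrease.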
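As a quick check, the adjusted r² of the four-variable house price model can be recomputed from its reported R Square. The sketch below uses the equivalent form 1 - (1 - r²)(n - 1)/(n - k - 1), where n is the number of observations and k the number of independent variables. The function names and the second pair of values are hypothetical.

```python
def adjusted_r2(r2, n, k):
    """Adjusted r^2 for a model with k independent variables and n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def variable_is_beneficial(adj_r2_before, adj_r2_after):
    """A new variable is worth keeping only if adjusted r^2 does not decrease."""
    return adj_r2_after >= adj_r2_before


# Reported values for the model with SF, AGE, X3 and X4 (n = 14, k = 4).
adj = adjusted_r2(0.89798, n=14, k=4)
print(f"Adjusted r^2 = {adj:.5f}")

# HYPOTHETICAL illustration: adding a variable that lowers adjusted r^2.
print(variable_is_beneficial(adj_r2_before=0.85, adj_r2_after=0.84))
```

The result rounds to 0.85264, the Adjusted R Square reported in the regression statistics. Unlike r², this quantity can fall when a variable adds too little explanatory power to justify the lost degree of freedom.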
Multiple Regression
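Sales/Decision to buy = B0 + B1 * Price
Sales/Decision to buy = B0 + B1 * (Price)^3 + B2 * (Design)^2 + B3 * (Performance)
L = (Price)^3
M = (Design)^2
N = (Performance)
With these substitutions the model is linear in L, M and N: Sales/Decision to buy = B0 + B1 * L + B2 * M + B3 * N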
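The substitution can be sketched in Python: the raw inputs are first transformed into L, M and N, after which the prediction is an ordinary linear combination. The coefficient and input values below are hypothetical and chosen only to make the arithmetic easy to follow; they are not estimates from any data.

```python
def transform(price, design, performance):
    """Map raw inputs to the transformed variables L, M, N."""
    l_var = price ** 3     # L = (Price)^3
    m_var = design ** 2    # M = (Design)^2
    n_var = performance    # N = (Performance)
    return l_var, m_var, n_var


def predict(coefficients, price, design, performance):
    """Evaluate B0 + B1*L + B2*M + B3*N, which is linear in L, M and N."""
    b0, b1, b2, b3 = coefficients
    l_var, m_var, n_var = transform(price, design, performance)
    return b0 + b1 * l_var + b2 * m_var + b3 * n_var


# HYPOTHETICAL coefficients (B0, B1, B2, B3) and inputs, for illustration only.
coefficients = (10.0, -0.5, 2.0, 1.5)
y = predict(coefficients, price=2.0, design=3.0, performance=4.0)
print(y)  # 10 - 0.5*8 + 2*9 + 1.5*4 = 30.0
```

Because the model is linear in the transformed variables, the same least-squares machinery used for multiple regression applies once L, M and N have been computed.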
Pitfalls In Regression
A high correlation does not mean one variable is causing a change in another. (Some regressions have shown a significantly positive relation between individuals' college GPA and future salary.)
The regression equation should not be used with values of the independent variable that are above or below the ones from the sample.
The number of independent variables that should be used in the model is limited by the number of observations.