y: response (dependent) variable; x: explanatory (independent) variable; β0: intercept; β1: slope; ε: random error term
Simple Linear Regression: ŷ = b0 + b1x1, where ŷ is the predicted value, b0 is the point estimate of β0, and b1 is the point estimate of β1.
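As a minimal sketch, the prediction equation above can be coded directly. The point estimates b0 = 2.0 and b1 = 0.5 below are made-up numbers for illustration, not from any real data set:

```python
def predict(b0, b1, x):
    """Predicted value y-hat = b0 + b1 * x for simple linear regression."""
    return b0 + b1 * x

# Hypothetical point estimates (illustration only)
b0, b1 = 2.0, 0.5
print(predict(b0, b1, 10))  # y-hat = 2.0 + 0.5 * 10 = 7.0
```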
We must follow these steps to ensure a viable SLR:
1. Create a scatterplot and infer the slope: the relationship can be positive, negative, or absent. If there is no relationship, STOP! We picked the wrong x!
2. Correlation coefficient (from the correlation matrix): measures the strength of the linear relationship between x and y; −1 ≤ r ≤ +1, and we want |r| ≥ 0.70. A positive scatterplot gives a positive r; a negative one gives a negative r.
3. Coefficient of determination (R²): the percentage of the variability in y explained by x; 0 ≤ R² ≤ 1, and we want R² ≥ 0.70. ***WILL USE MULTIPLE R for multiple regression!
4. Standard error (SE) of the estimate: the "standard deviation" of the actual values about the regression line. SE should have at least one fewer digit than the mean of the y's (y-bar). Ex: y-bar = 20 and SE = 2, therefore OK!
5. Significance F: tests the validity of the regression model. H0: the regression model with all x's is not a valid model (don't use it). Reject H0 if the p-value is less than alpha. Ex: 0.01898 < 0.05 would reject H0.
6. P-value of each independent variable (individual x's): use the P-value column (Intercept row, then each X row). Focus on the x rows: each must be less than alpha. H0: β1 = 0 (no relationship); Ha: β1 ≠ 0 (relationship between x and y).
7. Set up a confidence interval:
CI: Predicted Value ± 2 × SE
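Steps 2–4 and 7 can be sketched end to end in plain Python. The data set below is hypothetical and exists only to show the calculations:

```python
# Minimal SLR fit on a small hypothetical data set (illustration only)
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 4.3, 6.2, 8.1, 9.8, 12.2]

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n

# Sums of squares and cross-products
sxx = sum((xi - xbar) ** 2 for xi in x)
syy = sum((yi - ybar) ** 2 for yi in y)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

b1 = sxy / sxx                       # point estimate of beta-1 (slope)
b0 = ybar - b1 * xbar                # point estimate of beta-0 (intercept)

r = sxy / (sxx * syy) ** 0.5         # correlation coefficient (step 2)
r2 = r ** 2                          # coefficient of determination (step 3)

# Standard error of the estimate (step 4): variability about the line
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
se = (sse / (n - 2)) ** 0.5

# Rough confidence interval for a prediction at x = 7 (step 7)
yhat = b0 + b1 * 7
ci = (yhat - 2 * se, yhat + 2 * se)

print(round(r2, 3), round(se, 3), ci)
```

With |r| and R² well above 0.70 and a small SE relative to y-bar, these toy numbers would pass the checks above; Significance F and the individual p-values (steps 5–6) come from the regression output itself rather than a hand calculation.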
Differences in Multiple Regression:
1. Examine Adjusted R squared instead of R squared.
2. Special cases:
Multicollinearity: high correlation between independent variables.
Heteroskedasticity: the variance of the errors (residuals) is not constant (e.g., it increases with the fitted values).
Autocorrelation: successive values of the dependent variable are correlated.
The regression model does not provide accurate predictions if we have these special cases.
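The adjusted R² adjustment and a quick multicollinearity check can both be sketched in a few lines. All numbers below are hypothetical, and the |r| > 0.8 cutoff is a common rule of thumb, not a fixed standard:

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - k - 1),
    which penalizes adding predictors (k) relative to sample size (n)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def pearson_r(a, b):
    """Pearson correlation between two variables."""
    n = len(a)
    abar, bbar = sum(a) / n, sum(b) / n
    saa = sum((v - abar) ** 2 for v in a)
    sbb = sum((v - bbar) ** 2 for v in b)
    sab = sum((u - abar) * (v - bbar) for u, v in zip(a, b))
    return sab / (saa * sbb) ** 0.5

# Hypothetical: R^2 = 0.85 with n = 30 observations and k = 3 predictors
print(round(adjusted_r2(0.85, 30, 3), 4))  # ≈ 0.8327, slightly below R^2

# Two hypothetical predictors that move almost in lockstep
x1 = [1, 2, 3, 4, 5]
x2 = [2.0, 4.1, 5.9, 8.2, 10.1]
if abs(pearson_r(x1, x2)) > 0.8:  # rule-of-thumb cutoff
    print("possible multicollinearity between x1 and x2")
```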