
DISCOVERING STATISTICS USING SPSS, THIRD EDITION, BY ANDY FIELD (from page 197)

If the squared differences in a regression analysis are large, the line is not representative of the data; if the differences are small, the line is representative.

The F-ratio is a measure of how much the model has improved the prediction of the outcome compared to the level of inaccuracy of the model. If a model is good (simple linear regression and multiple regression are both examples of models), then we expect the improvement in prediction due to the model to be large (so the model mean square will be large) and the difference between the model and the observed data to be small (so the residual mean square will be small). In short, a good model should have a large F-ratio (greater than 1 at least), because the numerator of the ratio will be bigger than the denominator.

If the standard error is very small, most samples are likely to have a b-value similar to the one in our sample (because there is little variation across samples). The t-test tells us whether the b-value is different from 0 relative to the variation in b-values across samples.

A parameter is a variable quantity that determines an outcome, or a fact or circumstance that restricts how something is done or what can be done.

In multiple regression, multiple R is the correlation between the observed values of Y and the values of Y predicted by the multiple regression model. Large values of multiple R therefore represent a large correlation between the predicted and observed values of the outcome, and a multiple R of 1 represents a situation in which the model perfectly predicts the observed data. As such, multiple R is a gauge of how well the model predicts the observed data. It follows that R squared can be interpreted in the same way as in simple regression: it is the amount of variation in the outcome variable that is accounted for by the model.

An outlier is a case that differs substantially from the main trend of the data.
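The F-ratio described above can be sketched numerically. This is a minimal illustration with made-up data (the variable names and values are assumptions, not from the book): the model sum of squares captures the improvement due to the model, the residual sum of squares captures the model's inaccuracy, and dividing each by its degrees of freedom gives the mean squares that form the F-ratio.

```python
import numpy as np

# Hypothetical predictor x and outcome y (illustrative values only).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.8, 8.2, 8.9])

n = len(y)
k = 1  # number of predictors

# Fit the least-squares line y_hat = b0 + b1 * x.
b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x

ss_model = np.sum((y_hat - y.mean()) ** 2)  # improvement due to the model
ss_resid = np.sum((y - y_hat) ** 2)         # inaccuracy of the model

ms_model = ss_model / k            # model mean square
ms_resid = ss_resid / (n - k - 1)  # residual mean square

f_ratio = ms_model / ms_resid
print(f_ratio)  # a good model gives an F-ratio well above 1
```

Because the toy data lie close to a straight line, the model mean square dwarfs the residual mean square and the F-ratio is large, matching the "large top, small bottom" description above.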
Outliers can bias your model because they affect the values of the estimated regression coefficients and thus change the regression model; an outlier can also affect the score of another predictor. To detect an outlier, look for the score that is most different from the rest of the group.

The differences between the values of the outcome predicted by the model and the values of the outcome observed in the sample are known as the residuals, and they represent the error present in the model. If a model fits the sample data well, all residuals will be small (if the model is a perfect fit of the sample data, all data points fall on the regression line and all residuals are zero). If a model is a poor fit of the sample, the residuals will be large. Any case that stands out as having a large residual could be an outlier (pages 215-216).

Standardized residuals are used to judge whether a residual is large or small: (1) a standardized residual with an absolute value of about 3.29 (say 3) is considered high; (2) if more than 1% of the sample cases have standardized residuals with an absolute value greater than 2.58 (say 2.5), there is evidence that the level of error within the model is unacceptable (the model is a fairly poor fit of the sample data); (3) if more than 5% of the cases have standardized residuals with an absolute value greater than 1.96 (say 2 for convenience), there is also evidence that the model is a poor representation of the actual data.
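The three rules of thumb for standardized residuals can be checked mechanically. Below is a rough sketch with invented residual values (the numbers are assumptions for illustration, not from the book); it standardizes the residuals by their standard deviation and reports what percentage of cases cross each cut-off.

```python
import numpy as np

# Hypothetical residuals from a fitted regression (illustrative values).
residuals = np.array([0.4, -0.8, 1.1, -0.3, 2.7, -0.6, 0.2, -1.4, 0.9, -0.5])

# Standardize: divide each residual by the residuals' standard deviation,
# so the values are on a z-score-like scale.
z = residuals / residuals.std(ddof=1)

# Percentage of cases beyond each rule-of-thumb cut-off.
pct_over_196 = np.mean(np.abs(z) > 1.96) * 100
pct_over_258 = np.mean(np.abs(z) > 2.58) * 100

print(pct_over_196, pct_over_258)
# Rules of thumb: more than 5% beyond 1.96, or more than 1% beyond 2.58,
# suggests a poor-fitting model; any |z| near 3 flags a possible outlier.
```

In SPSS itself these standardized residuals are produced by the regression Save options; the sketch only shows the arithmetic behind the cut-offs.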

It is possible to run the regression with a case included and then rerun the analysis with that case excluded. If we did this, there would undoubtedly be some differences between the b coefficients in the two regression equations; these differences tell us how much influence a particular case has on the parameters of the regression model.

CROSS-VALIDATION OF THE MODEL: Whereas R squared tells us how much of the variance in Y is accounted for by the regression model from our sample, the adjusted value tells us how much variance in Y would be accounted for if the model had been derived from the population from which the sample was taken.

ABOUT LISTWISE AND PAIRWISE (page 231): Listwise means that if a person has a missing value for any variable, they are excluded from the whole analysis. Pairwise means that if a participant has a score missing for a particular variable, their data are excluded only from calculations involving the variable for which they have no score. Listwise is preferred to pairwise.
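The listwise/pairwise distinction can be sketched with a small data frame containing missing values (the column names and values are hypothetical, invented for illustration). Listwise deletion drops any row with a missing score before the analysis; pairwise deletion lets each correlation use all the cases that are complete for that particular pair of variables.

```python
import numpy as np
import pandas as pd

# Hypothetical data with scattered missing values (illustrative only).
df = pd.DataFrame({
    "x1": [1.0, 2.0, np.nan, 4.0, 5.0],
    "x2": [2.0, np.nan, 3.0, 5.0, 6.0],
    "y":  [1.5, 2.5, 3.0, 4.5, np.nan],
})

# Listwise deletion: any row with a missing value is dropped entirely,
# so every calculation uses the same (smaller) set of complete cases.
listwise = df.dropna()
print(len(listwise))  # only the fully complete rows remain

# Pairwise deletion: pandas computes each correlation from the cases that
# are complete for that pair, so different cells can use different cases.
pairwise_corr = df.corr()
print(pairwise_corr)
```

The sketch also shows why listwise is the safer default, as the notes say: every statistic is computed on the same cases, whereas pairwise correlations can be based on different, non-comparable subsets.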
