
POLYNOMIAL AND MULTIPLE REGRESSION

Polynomial Regression

Polynomial regression is used to fit nonlinear (e.g. curvilinear) data with a least squares linear regression model. It is a form of linear regression that allows one to predict a single y variable by decomposing the x variable into an nth-order polynomial.

Generally has the form: $y = a + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \dots + \beta_k x^k$

In polynomial regression, different powers of the x variable are successively added to the equation to see if they increase $r^2$ significantly. Note that as successive powers of x are added to the equation, the best-fit line changes shape:

$y = a + \beta_1 x$ (a straight line)
$y = a + \beta_1 x + \beta_2 x^2$ (this quadratic equation produces a parabola)
$y = a + \beta_1 x + \beta_2 x^2 + \beta_3 x^3$ (this cubic equation produces an s-shaped curve)
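A rough sketch of these fits in Python with numpy (not part of the original handout; the x and y values below are made up purely for illustration):

import numpy as np

# made-up illustrative data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.7, 10.8, 11.4, 11.7])

# np.polyfit returns the least-squares coefficients, highest power first
linear    = np.polyfit(x, y, 1)   # y = a + b1*x
quadratic = np.polyfit(x, y, 2)   # y = a + b1*x + b2*x^2
cubic     = np.polyfit(x, y, 3)   # y = a + b1*x + b2*x^2 + b3*x^3

# fitted (predicted) values for, e.g., the quadratic model
y_hat = np.polyval(quadratic, x)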

An examination of a scatter plot of the data should help you decide what power to use. For a curve with a single bend, use the quadratic equation. If the curve has two bends, you'll need a cubic model. Curves with multiple kinks need even higher-order terms, though it is rare to go past a quadratic. Significance tests should be conducted to test whether the polynomial regression explains a significant amount of the variance. One conducts hypothesis tests at each step to see if the increase in $r^2$ is significant. Thus you test the difference in the $r^2$ value between the lower-power and next higher-power equations, asking the question: does the additional term significantly increase the $r^2$ value? Begin by testing whether the data are linear. If so, stop (no polynomial regression is necessary). If the linear regression is insignificant, compare the $r^2$ for the linear equation versus the $r^2$ for the quadratic equation. Continue adding powers and testing until significance is reached.

$H_0: r_i^2 = r_j^2$
$H_1: r_i^2 < r_j^2$


Where i refers to the lower power equation and j refers to the higher power equation. This test is done using the F test statistic where:

$$F = df_j \, \frac{r_j^2 - r_i^2}{1 - r_j^2}$$
F is distributed with j degrees of freedom in the numerator and n-j-1 degrees of freedom in the denominator where j is the power of the higher order polynomial.
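A sketch of this comparison in Python, taking the formula above at face value and assuming that $df_j$ denotes the denominator degrees of freedom $n - j - 1$; the function names and the data are illustrative only:

import numpy as np
from scipy import stats

def r_squared(x, y, degree):
    # r-squared for a least-squares polynomial fit of the given degree
    y_hat = np.polyval(np.polyfit(x, y, degree), x)
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - np.mean(y)) ** 2)

def increase_test(x, y, i, j):
    # F test for the increase in r-squared from order i to order j, following
    # the handout's F = df_j * (r_j^2 - r_i^2) / (1 - r_j^2), with df_j assumed
    # to be n - j - 1
    n = len(y)
    r2_i, r2_j = r_squared(x, y, i), r_squared(x, y, j)
    F = (n - j - 1) * (r2_j - r2_i) / (1.0 - r2_j)
    p = stats.f.sf(F, j, n - j - 1)   # j numerator df, n - j - 1 denominator df
    return F, p

# illustrative data only; stop adding powers once the increase is insignificant
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0])
y = np.array([2.0, 4.8, 6.5, 7.9, 8.8, 9.2, 9.0, 8.5, 7.3, 5.8])
for j in (2, 3):
    F, p = increase_test(x, y, j - 1, j)
    print("order", j, "F =", round(F, 2), "p =", round(p, 4))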

Warning. With the addition of successive powers of x, the $r^2$ will no doubt increase. As the power of x (k, the order of the polynomial equation) approaches $n - 1$, the equation will begin to give a "perfect fit" by connecting all the data points. This is not what one wants! It is generally best to stay within the quadratic ($x^2$) or cubic ($x^3$) orders.
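A small illustration of the "perfect fit" problem (made-up data, not from the handout): fitting an (n − 1)th-order polynomial to n points forces the curve through every point, so $r^2$ reaches 1.

import numpy as np

# five illustrative data points (n = 5)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.3, 2.1, 3.8, 3.5, 5.0])

for degree in (1, 2, 4):              # degree 4 = n - 1 interpolates every point
    y_hat = np.polyval(np.polyfit(x, y, degree), x)
    r2 = 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - np.mean(y)) ** 2)
    print(degree, round(r2, 4))       # r2 for degree 4 is 1.0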

Multiple Regression (Least Squares)

Multiple regression allows one to predict a single variable from a set of k additional variables, where k = m − 1.

Generally has the form: $y = a + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k$

a = initial intercept
$\beta$'s = partial regression coefficients for variables $x_1 \dots x_k$

One can get the vector of coefficients (the $\beta$'s) from the equation $B = A^{-1}C$, where
B = vector of regression coefficients
A = matrix of sums of squares and cross products (based on mean deviations)
C = a vector of sums of cross products of y with each x

$$\begin{bmatrix} \beta_1 \\ \vdots \\ \beta_k \end{bmatrix} = \begin{bmatrix} \sum x_1^2 & \cdots & \sum x_1 x_k \\ \vdots & \ddots & \vdots \\ \sum x_k x_1 & \cdots & \sum x_k^2 \end{bmatrix}^{-1} \begin{bmatrix} \sum x_1 y \\ \vdots \\ \sum x_k y \end{bmatrix}$$

To get the initial intercept (a), plug in the means for y and each x and solve for a:

$$a = \bar{y} - \beta_1 \bar{x}_1 - \beta_2 \bar{x}_2 - \dots - \beta_k \bar{x}_k$$

The value of each observation regressed onto the line is then:

$$y_i = a + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik} + e_i$$
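A minimal sketch of these calculations in Python with numpy, assuming mean-deviation sums of squares and cross products as described above; the function name and the example numbers are mine, not the handout's:

import numpy as np

def multiple_regression(X, y):
    # Least-squares multiple regression via B = A^-1 C, where A is the matrix of
    # sums of squares and cross products of the x's (mean deviations) and C is
    # the vector of cross products of the x's with y.  X is an n-by-k array of
    # predictors, y a length-n vector.
    Xd = X - X.mean(axis=0)            # mean deviations of each x
    yd = y - y.mean()                  # mean deviations of y
    A = Xd.T @ Xd                      # sums of squares and cross products
    C = Xd.T @ yd                      # cross products with y
    B = np.linalg.solve(A, C)          # partial regression coefficients (betas)
    a = y.mean() - B @ X.mean(axis=0)  # intercept from the means
    return a, B

# illustrative usage with made-up numbers
X = np.array([[1.0, 2.0], [2.0, 1.5], [3.0, 3.5], [4.0, 3.0], [5.0, 5.0]])
y = np.array([3.1, 3.9, 6.8, 7.2, 9.5])
a, B = multiple_regression(X, y)
y_hat = a + X @ B                      # fitted values
e = y - y_hat                          # residuals (error terms)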

The error term ($e_i$) for each $y_i$ can be found as $e_i = y_i - \hat{y}_i$. The $\beta$'s may be difficult to interpret (but may be thought of as weights or loadings).

In matrix form, $Y = XB + E$, where
Y is the vector of values of the dependent variable
X is the matrix of independent variables
B is the column vector of regression coefficients
E is the column vector of residuals (error terms)

B is determined so as to minimize the sum of squares of the residuals, which corresponds to finding the best-fitting k-dimensional hyperplane in m-dimensional space. The least-squares approach minimizes the sum of the squared distances parallel to the y-axis.

Significance tests on multiple regression:

$H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$
$H_1$: at least one of the coefficients $\neq 0$

The statistics are the same as with bivariate regression and analysis of variance. Partition the variance into three different sums of squares:

$$SS_R = \sum (\hat{y}_i - \bar{y})^2$$

$$SS_D = \sum (y_i - \hat{y}_i)^2$$

$$SS_T = SS_R + SS_D$$

Variance explained by regression: $S_R^2 = SS_R / k$

Unexplained variance: $S_D^2 = SS_D / (n - k - 1)$

Total variance: $S_T^2 = SS_T / (n - 1)$

Note: the ratio $SS_R / SS_T = R^2$, the coefficient of multiple determination, where R is the multiple correlation coefficient.

Use the F-test to see if the regression explains a significant proportion of the variance, i.e. to test whether the multiple regression coefficients are all 0. The observed value of F can be obtained from:

$$F = \frac{S_R^2}{S_D^2}$$

and should be tested against a critical value of F from a table for the desired $\alpha$, with k df in the numerator and $n - k - 1$ df in the denominator.

Each individual regression coefficient can also be tested with a t-test by calculating the standard error for each variable. For each coefficient, calculate the standard error ($se_b$) and then the t value. For coefficient j:

$$se_{b_j} = \sqrt{S_D^2 \, a_{jj}^{-1}}$$

where $a_{jj}^{-1}$ is the jth diagonal element of the inverse of the matrix of sums of squares and cross products ($A^{-1}$), and

$$t = \frac{\beta_j - 0}{se_{b_j}}$$

Test the observed t-value for each coefficient (which is distributed with $n - k - 1$ degrees of freedom).
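A sketch pulling these tests together in Python (scipy supplies the F and t reference distributions); the function name and layout are illustrative, not the handout's:

import numpy as np
from scipy import stats

def regression_tests(X, y):
    # Overall F-test and per-coefficient t-tests for a least-squares multiple
    # regression, following the partition of variance described above.
    n, k = X.shape
    Xd = X - X.mean(axis=0)
    A = Xd.T @ Xd                           # sums of squares and cross products
    A_inv = np.linalg.inv(A)
    B = A_inv @ (Xd.T @ (y - y.mean()))     # partial regression coefficients
    a = y.mean() - B @ X.mean(axis=0)       # intercept
    y_hat = a + X @ B

    SS_R = np.sum((y_hat - y.mean()) ** 2)  # explained by regression
    SS_D = np.sum((y - y_hat) ** 2)         # unexplained (residual)
    S2_R = SS_R / k
    S2_D = SS_D / (n - k - 1)
    F = S2_R / S2_D
    p_F = stats.f.sf(F, k, n - k - 1)

    se_b = np.sqrt(S2_D * np.diag(A_inv))   # standard error of each coefficient
    t = (B - 0.0) / se_b
    p_t = 2 * stats.t.sf(np.abs(t), n - k - 1)
    R2 = SS_R / (SS_R + SS_D)
    return {"R2": R2, "F": F, "p_F": p_F, "B": B, "t": t, "p_t": p_t}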

e.g. Variables measured on trilobites:


Sample    L (Y)    W (X1)   GSL (X2)   GA (X3)   TH (X4)
1         23       6.2      4.05       1.3       5.05
2         24.2     6.45     4.3        1.25      5.25
3         24.9     6.95     3.9        1.15      5.7
4         24.65    6.75     4.1        1.2       5.65
5         24.75    7        4          1.2       5.3
6         25.5     7.15     4.1        1.2       5.6
7         25.3     7.35     4.05       1.3       5.85
8         24.75    6.85     4.1        1.2       5.55
9         25.05    6.85     4.35       1.35      5.5
10        25.15    6.85     4.1        1.25      5.65
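The table can be entered directly and passed to the sketch functions defined earlier (a hypothetical usage, treating L as y and W, GSL, GA, and TH as x1 through x4):

import numpy as np

# trilobite measurements from the table above
L   = np.array([23, 24.2, 24.9, 24.65, 24.75, 25.5, 25.3, 24.75, 25.05, 25.15])
W   = np.array([6.2, 6.45, 6.95, 6.75, 7.0, 7.15, 7.35, 6.85, 6.85, 6.85])
GSL = np.array([4.05, 4.3, 3.9, 4.1, 4.0, 4.1, 4.05, 4.1, 4.35, 4.1])
GA  = np.array([1.3, 1.25, 1.15, 1.2, 1.2, 1.2, 1.3, 1.2, 1.35, 1.25])
TH  = np.array([5.05, 5.25, 5.7, 5.65, 5.3, 5.6, 5.85, 5.55, 5.5, 5.65])

X = np.column_stack([W, GSL, GA, TH])
a, B = multiple_regression(X, L)      # sketch function defined earlier
results = regression_tests(X, L)      # overall F and per-coefficient t-tests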
