
84456945.doc

Dependent variable: The variable that is being predicted or estimated.


Independent variable: The variable that provides the basis for estimation.

Scatter diagram: A diagrammatic way of representing bivariate data.

Correlation coefficient: Measures the strength of the linear relationship between two variables. The sample correlation coefficient for the n pairs $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ is

$$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}, \qquad -1 \le r \le 1.$$

$r = 1$ if and only if all $(x_i, y_i)$ pairs lie on a straight line with positive slope, and $r = -1$ if and only if all $(x_i, y_i)$ pairs lie on a straight line with negative slope.

Regression: Measures the probable movement of one variable in terms of the other.

General form of a linear equation: $y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$ (y is the dependent variable and the x's are the independent variables).

The regression equation for a set of n data points (with one independent variable) is $\hat{y} = a + bx$, where

$$b = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \quad \text{and} \quad a = \bar{y} - b\bar{x}.$$

Coefficient of determination: $r^2$ is the proportion of the total variation in the observed y-values that is explained by the regression, with $0 \le r^2 \le 1$. Values of $r^2$ near zero indicate that the regression equation is not very useful for making predictions, and values of $r^2$ near 1 indicate that the regression equation is extremely useful for making predictions. We add more relevant independent variables to the linear regression model when $r^2$ is low.

Multiple linear regression: $y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$

Example 1: For the data below, find the correlation coefficient and the regression equation, interpret them, and predict y when x = 3.3.

| x | y | $x_d = x - \bar{x}$ | $y_d = y - \bar{y}$ | $x_d^2$ | $y_d^2$ | $(x_d)(y_d)$ |
|---|---|---|---|---|---|---|
| 2 | 5 | -6 | -5 | 36 | 25 | 30 |
| 4 | 6 | -4 | -4 | 16 | 16 | 16 |
| 6 | 8 | -2 | -2 | 4 | 4 | 4 |
| 8 | 10 | 0 | 0 | 0 | 0 | 0 |
| 10 | 12 | 2 | 2 | 4 | 4 | 4 |
| 12 | 13 | 4 | 3 | 16 | 9 | 12 |
| 14 | 16 | 6 | 6 | 36 | 36 | 36 |
| Sum: 56 | 70 | | | 112 | 94 | 102 |

Here $\bar{x} = 56/7 = 8$ and $\bar{y} = 70/7 = 10$.

$$r = \frac{102}{\sqrt{(112)(94)}} = 0.994093$$

$$b = \frac{102}{112} = 0.910714$$

$$a = 10 - 0.910714 \times 8 = 2.714286$$

Estimated regression equation: $\hat{y} = 2.714286 + 0.910714x$

Interpretations: $a = 2.714286$: when x is zero, the estimated value of y is on average 2.714286 units. $b = 0.910714$: if x increases by 1 unit, then y increases on average by 0.910714 units. $r^2 = 0.988222$: 98.82% of the total variation in y is explained by the independent variable x (only 1.18% is left unexplained, which is due to other variables). The model fits very well.

Prediction: $\hat{y} = 2.714286 + 0.910714 \times 3.3 = 5.719643$ units when x = 3.3 units.
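As a cross-check on the hand calculation in Example 1, the following minimal Python sketch recomputes the column totals, r, b, a, and the prediction at x = 3.3 directly from the seven data pairs. The variable names and the `predict` helper are illustrative choices, not part of the notes. Note that the intercept uses the sample means $\bar{x} = 56/7 = 8$ and $\bar{y} = 70/7 = 10$.

```python
# Data from Example 1.
xs = [2, 4, 6, 8, 10, 12, 14]
ys = [5, 6, 8, 10, 12, 13, 16]

n = len(xs)
x_bar = sum(xs) / n  # 56 / 7 = 8
y_bar = sum(ys) / n  # 70 / 7 = 10

# Column totals of the deviation table.
sxx = sum((x - x_bar) ** 2 for x in xs)                       # 112
syy = sum((y - y_bar) ** 2 for y in ys)                       # 94
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))  # 102

r = sxy / (sxx * syy) ** 0.5  # correlation coefficient
b = sxy / sxx                 # slope
a = y_bar - b * x_bar         # intercept


def predict(x):
    """Estimated y for a given x, using the fitted line y = a + b*x."""
    return a + b * x


print(f"r = {r:.6f}, r^2 = {r * r:.6f}")
print(f"b = {b:.6f}, a = {a:.6f}")
print(f"prediction at x = 3.3: {predict(3.3):.6f}")
```

Working from the raw pairs rather than from rounded intermediate values keeps rounding error out of the intercept and the prediction.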



[Figure: Scatter diagram between x and y; horizontal axis: X-variable, vertical axis: Y-variable]

Calculation by Calculator (for exam):

[Figure: Example scatter diagrams plotted against X (independent variable), with panels captioned "Zero correlation, r = 0", "Correlation, r = .60", "Correlation, r = -.52", and "Scatter diagram (r = +1)"]

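Scale for the strength and direction of the correlation coefficient:

| Value of r | Interpretation |
|---|---|
| -1.00 | Perfect negative correlation |
| between -1.00 and -0.50 | Strong negative correlation |
| around -0.50 | Moderate negative correlation |
| between -0.50 and 0 | Weak negative correlation |
| 0 | No correlation |
| between 0 and 0.50 | Weak positive correlation |
| around 0.50 | Moderate positive correlation |
| between 0.50 and 1.00 | Strong positive correlation |
| 1.00 | Perfect positive correlation |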


Example 2: The owner of Maumee Motors wants to study the relationship between the age of a car and its selling price. Listed below is a random sample of 12 used cars sold at Maumee Motors during the last year.

a. If we want to estimate selling price based on the age of the car, which variable is the dependent variable and which is the independent variable?
b. Draw a scatter diagram.
c. Determine the coefficient of correlation.
d. Determine the coefficient of determination.
e. Interpret these statistical measures. Does it surprise you that the relationship is inverse?

18. a. Determine the regression equation.
b. Estimate the selling price of a 10-year-old car.
c. Interpret the regression equation.

| Car | Age X (yrs) | Selling price Y ($000) | Fitted $\hat{y} = 11.18 - 0.48x$ | Residual $e = y - \hat{y}$ | $e^2$ |
|---|---|---|---|---|---|
| 1 | 9 | 8.1 | 6.87 | 1.23 | 1.52 |
| 2 | 7 | 6 | 7.83 | -1.83 | 3.33 |
| 3 | 11 | 3.6 | 5.91 | -2.31 | 5.34 |
| 4 | 12 | 4 | 5.43 | -1.43 | 2.05 |
| 5 | 8 | 5 | 7.35 | -2.35 | 5.51 |
| 6 | 7 | 10 | 7.83 | 2.17 | 4.73 |
| 7 | 8 | 7.6 | 7.35 | 0.25 | 0.06 |
| 8 | 11 | 8 | 5.91 | 2.09 | 4.36 |
| 9 | 10 | 8 | 6.39 | 1.61 | 2.59 |
| 10 | 12 | 6 | 5.43 | 0.57 | 0.32 |
| 11 | 6 | 8.6 | 8.30 | 0.30 | 0.09 |
| 12 | 6 | 8 | 8.30 | -0.30 | 0.09 |
| Mean | 8.9167 | 6.9083 | | | |

$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}} = \frac{-26.292}{\sqrt{(54.917)(42.589)}} = -0.544$$

$$b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{-26.292}{54.917} = -0.479$$

$$r^2 = 0.2959$$

$$a = \bar{y} - b\bar{x} = 6.9083 + 0.479 \times 8.9167 = 11.18$$

Regression equation: $\hat{y} = 11.18 - 0.479x$
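The Maumee Motors calculation can be reproduced, and the 10-year-old-car question answered, with a short Python sketch like the one below. It assumes the data pairs are (age in years, selling price in $000) as listed in the example; the function name `least_squares` is a hypothetical helper introduced only for illustration.

```python
# (age in years, selling price in $000) for the 12 cars in Example 2.
ages = [9, 7, 11, 12, 8, 7, 8, 11, 10, 12, 6, 6]
prices = [8.1, 6, 3.6, 4, 5, 10, 7.6, 8, 8, 6, 8.6, 8]


def least_squares(xs, ys):
    """Return (a, b, r) for the least-squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    r = sxy / (sxx * syy) ** 0.5
    return a, b, r


a, b, r = least_squares(ages, prices)
price_at_10 = a + b * 10  # estimated price ($000) of a 10-year-old car

# Residuals e = y - y_hat for each car, as in the table.
residuals = [y - (a + b * x) for x, y in zip(ages, prices)]

print(f"a = {a:.3f}, b = {b:.3f}, r = {r:.3f}, r^2 = {r * r:.4f}")
print(f"estimated price of a 10-year-old car: {price_at_10:.2f} ($000)")
```

The negative slope means each additional year of age lowers the estimated selling price by roughly 0.48 thousand dollars, which is the inverse relationship the exercise asks about. Because the sketch works from unrounded values, its r-squared can differ from the hand-rounded figure in the fourth decimal place.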


Example 4. Associated with a job are two random variables: CPU time required (Y) and the number of disk I/O operations (X). Given the following data, compute the sample correlation coefficient.

| Number (X) | 398 | 390 | 410 | 502 | 590 | 305 | 210 | 252 | 398 | 392 |
|---|---|---|---|---|---|---|---|---|---|---|
| Time (Y) | 40 | 38 | 42 | 50 | 60 | 30 | 20 | 25 | 40 | 39 |

a. Draw a scatter diagram from these data. Does a linear fit seem reasonable? Assuming we wish to predict the CPU time requirement given an I/O request count, perform a linear regression:

$$y = a + bx$$

Compute point estimates of a and b as well as 90 percent confidence intervals.

b. Next suppose we want to predict a value of I/O request count, given a CPU time requirement. Thus perform a linear regression of X on Y. Calculate 90 percent confidence intervals for c and d with the regression line:

$$x = c + dy$$

In both cases compute the coefficients of determination.

[Figure: Scatter diagram of Time and Number; horizontal axis: Time, vertical axis: Number]

[Figure: Scatter diagram between Y and X; horizontal axis: Number (X), vertical axis: Time (Y)]

$$r_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}} = \frac{11593.2}{\sqrt{(111464.1)(1208.4)}} = 0.998919, \qquad r^2 = 0.99784$$

$$b_{yx} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{11593.2}{111464.1} = 0.104008$$

$$a = \bar{y} - b\bar{x} = 38.4 - 0.104008 \times 384.7 = -1.612$$

Thus, the estimated regression line is $\hat{y} = -1.612 + 0.104008x$.
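Both regressions for the CPU time and I/O count data (Y on X, and X on Y) can be checked with a sketch like the following. It is a minimal illustration using plain Python; the variable names are assumptions made here, and the confidence intervals asked for in the exercise are not computed. It also shows that the product of the two slopes equals the coefficient of determination, which is why $r^2$ is the same in both directions.

```python
# Number of disk I/O operations (X) and CPU time required (Y).
xs = [398, 390, 410, 502, 590, 305, 210, 252, 398, 392]
ys = [40, 38, 42, 50, 60, 30, 20, 25, 40, 39]

n = len(xs)
x_bar = sum(xs) / n  # 384.7
y_bar = sum(ys) / n  # 38.4

sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

# Regression of Y on X: y = a + b*x
b = sxy / sxx
a = y_bar - b * x_bar

# Regression of X on Y: x = c + d*y
d = sxy / syy
c = x_bar - d * y_bar

r = sxy / (sxx * syy) ** 0.5
r_squared = r * r  # equals b * d, so it is shared by both regressions

print(f"y on x: a = {a:.3f}, b = {b:.6f}")
print(f"x on y: c = {c:.5f}, d = {d:.5f}")
print(f"r = {r:.6f}, r^2 = {r_squared:.5f}, b*d = {b * d:.5f}")
```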


b. $x = c + dy$: the estimated line is $\hat{x} = 16.29643 + 9.59384y$.

--------------------------------------------------

Example 4: The failure rate of a certain electronic device is suspected to increase linearly with its temperature. Fit a least-squares line through the data in the following table.

Table: The Failure Rate versus Temperature

| Temp (F) | 55 | 65 | 75 | 85 | 95 | 105 | 55 | 65 | 75 | 85 | 95 | 105 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Failure rate | 1.90 | 1.93 | 1.97 | 2.00 | 2.01 | 2.01 | 1.94 | 1.95 | 1.97 | 2.02 | 2.02 | 2.04 |

1. Draw a scatter diagram.
2. Estimate a least squares line.
3. Comment on the line.
4. Determine the correlation coefficient and the coefficient of determination and comment on them.

[Figure: Temp (F) Line Fit Plot; horizontal axis: Temp (F), vertical axis: Failure rate]

SUMMARY OUTPUT

| Regression Statistics | |
|---|---|
| Multiple R | 0.93037806 |
| R Square | 0.86560333 |
| Adjusted R Square | 0.85216366 |
| Standard Error | 0.01663902 |
| Observations | 12 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 1 | 0.017831 | 0.017831 | 64.4066 | 1.145E-05 |
| Residual | 10 | 0.002769 | 0.000277 | | |
| Total | 11 | 0.0206 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 95.0% | Upper 95.0% |
|---|---|---|---|---|---|---|---|---|
| Intercept | 1.79942857 | 0.023007 | 78.21204 | 2.85E-15 | 1.7481657 | 1.85069149 | 1.748165654 | 1.85069149 |
| Temp (F) | 0.00225714 | 0.000281 | 8.025373 | 1.15E-05 | 0.0016305 | 0.00288381 | 0.001630477 | 0.00288381 |
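The main figures in the summary output for the failure-rate data (coefficients, R Square, Standard Error, and the ANOVA sums of squares and F) can be recomputed from the twelve observations with a sketch such as this one. It uses only the textbook formulas for simple linear regression; the variable names are illustrative, and p-values and confidence limits are left out because they need a t or F distribution function.

```python
# Temperature (F) and failure rate for the 12 observations.
temps = [55, 65, 75, 85, 95, 105, 55, 65, 75, 85, 95, 105]
rates = [1.90, 1.93, 1.97, 2.00, 2.01, 2.01,
         1.94, 1.95, 1.97, 2.02, 2.02, 2.04]

n = len(temps)
x_bar = sum(temps) / n
y_bar = sum(rates) / n

sxx = sum((x - x_bar) ** 2 for x in temps)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(temps, rates))

slope = sxy / sxx
intercept = y_bar - slope * x_bar

# ANOVA decomposition: total SS = regression SS + residual SS.
ss_total = sum((y - y_bar) ** 2 for y in rates)
ss_regression = slope * sxy
ss_residual = ss_total - ss_regression

r_squared = ss_regression / ss_total
ms_residual = ss_residual / (n - 2)    # residual mean square, df = n - 2
standard_error = ms_residual ** 0.5    # "Standard Error" of the regression
f_stat = ss_regression / ms_residual   # regression df = 1

print(f"intercept = {intercept:.8f}, slope = {slope:.8f}")
print(f"R Square = {r_squared:.8f}, Standard Error = {standard_error:.8f}")
print(f"SS: regression = {ss_regression:.6f}, residual = {ss_residual:.6f}, "
      f"total = {ss_total:.4f}; F = {f_stat:.4f}")
```

The positive slope, about 0.0023 per degree, supports the suspected increase of failure rate with temperature, and R Square of about 0.87 says the line explains most, though not all, of the variation.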

Example 5: An economist is interested in the relationship between the disposable income of a family and the amount of money spent annually on food. For a preliminary study, the economist takes a random sample of eight middle-income families of the same size (father, mother, two children). The results are as follows, where x denotes disposable income, in thousands of taka, and y denotes food expenditure, in hundreds of taka.

| x | 30 | 36 | 27 | 20 | 16 | 24 | 19 | 25 |
|---|---|---|---|---|---|---|---|---|
| y | 55 | 60 | 42 | 40 | 37 | 26 | 39 | 43 |

a. Determine the regression equation for the data.
b. Graph the regression equation and the data points.
c. Describe the apparent relationship between disposable income and annual food expenditure.
d. What does the slope of the regression line represent in terms of disposable income and annual food expenditure?
e. Use the regression equation to predict the annual food expenditure of a family with a disposable income of Tk 25,000.
f. Identify the predictor and response variables.
g. Discuss the graphical implications of the value of r.
h. Determine and interpret the value of r.

Example: A department store gives in-service training to its salesmen, which is followed by a test. It is considering whether it should terminate the services of any salesman who does not do well in the test. The following data give the test scores and the sales made by the salesmen during a certain period.

| Test Scores | 15 | 20 | 25 | 22 | 27 | 23 | 16 | 21 | 20 |
|---|---|---|---|---|---|---|---|---|---|
| Sales (Thousand Tk) | 32 | 37 | 49 | 38 | 51 | 46 | 33 | 41 | 39 |

i) Compute the correlation coefficient between the test scores and the sales.
ii) Does it indicate that terminating the services of salesmen with low test scores is justified?
iii) If the firm wants minimum sales of Taka 55,000, what is the minimum test score that will ensure continuation of service?
[Figure: Scatter diagram; horizontal axis: Test Scores, vertical axis: Sales (Thousand Taka)]
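One way to approach parts i) and iii) of the department store example is sketched below in Python. It assumes the flattened data read as (test score, sales in thousand Tk) pairs, regresses sales on test score, and then inverts the fitted line to find the score at which estimated sales reach 55 thousand Tk. The variable names are illustrative, and this is only one reasonable reading of the question, not a worked solution from the notes.

```python
# (test score, sales in thousand Tk) for the nine salesmen.
scores = [15, 20, 25, 22, 27, 23, 16, 21, 20]
sales = [32, 37, 49, 38, 51, 46, 33, 41, 39]

n = len(scores)
x_bar = sum(scores) / n
y_bar = sum(sales) / n

sxx = sum((x - x_bar) ** 2 for x in scores)
syy = sum((y - y_bar) ** 2 for y in sales)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(scores, sales))

# i) correlation coefficient between test scores and sales
r = sxy / (sxx * syy) ** 0.5

# Least-squares line: estimated sales = a + b * score
b = sxy / sxx
a = y_bar - b * x_bar

# iii) score at which estimated sales reach the target (thousand Tk)
target_sales = 55
min_score = (target_sales - a) / b

print(f"r = {r:.4f}")
print(f"estimated sales = {a:.4f} + {b:.4f} * score")
print(f"score needed for sales of {target_sales}: {min_score:.2f}")
```

The required score lies above the highest observed test score (27), so the answer to part iii) is an extrapolation beyond the data and should be read with that caution.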


Analysis by SPSS:

Open SPSS.

Creating Variable Names

In the "SPSS Data Editor" window, click on the "Variable View" tab at the bottom of the screen. In this view, variables and their definitions are listed as rows. Click in the first cell under the "Name" column. Type X in the cell. Press the Down Arrow to move to the next cell under "Name". Repeat the procedure for Y.

Entering the data

In the "SPSS Data Editor" window, click on the "Data View" tab at the bottom of the screen. In this view, the variables are the columns and the cases (subjects) are the rows. Click in the cell for the first variable (X) and the first case. Type 2. Press the Tab key to transfer the number into the cell and move to the next cell. Continue to enter data for all the cases.

Saving the data file

Click File. Click Save. In the "Save Data As" window, type a:yourname. Click Save. SPSS will add a .sav extension.

Analysis

Click Analyze. Click Correlate. Click Bivariate. Select the variables X and Y by highlighting them and clicking the Right Arrow. Click OK.
Plot:
