INFERENCES FOR BIVARIATE DATA AND REGRESSION COEFFICIENTS

CORRELATION COEFFICIENT (r) measures the degree of relationship between two (or more) sets of data.
- -1.00 ≤ r ≤ 1.00
- Positive r indicates a direct relationship; negative r indicates an inverse relationship.
- |r| = 1.00 means a perfect relationship
- r = 0.00 means no relationship

a) Pearson Product Moment Correlation Coefficient
- used for two independent sets of data in interval / ratio measurement.
- may also be used for dichotomous data (nominal).

$$r_p = \frac{N\sum XY - \sum X \sum Y}{\sqrt{\left[N\sum X^2 - \left(\sum X\right)^2\right]\left[N\sum Y^2 - \left(\sum Y\right)^2\right]}}$$

b) Spearman Correlation Coefficient
- used for two variables in interval / ratio, which are skewed.
- used for two independent sets of data in ordinal measurement.

$$r_s = 1 - \frac{6\sum D^2}{N(N^2 - 1)}$$
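The two coefficients can be computed directly from the raw-score formulas. The following is a minimal sketch, assuming D is the difference between the paired ranks; the function names and the small data set are made up for illustration and are not taken from the handout's data files.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson r from the raw-score (computational) formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

def ranks(values):
    """Rank values from 1 (smallest); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman_rs(x, y):
    """Spearman r_s = 1 - 6*sum(D^2) / (N(N^2 - 1)); exact only without ties."""
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Made-up illustrative data (not from sodiumbp.xls)
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 4, 6]
print("Pearson r: ", pearson_r(x, y))
print("Spearman rs:", spearman_rs(x, y))
```

For this toy data both coefficients come out at about 0.9. With tied values the D-based Spearman formula is only an approximation.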

- for inferences, a significant relationship between the two variables exists if the null hypothesis (r = 0.00) is rejected.
Ho: there is no significant relationship
Ha: there is a significant relationship
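The handout does not state the test statistic for this hypothesis. One common choice, assumed here, is t = r·sqrt((n − 2) / (1 − r²)) with df = n − 2. The sketch below uses made-up values for r and n, and the critical value is read from a standard t table rather than computed.

```python
from math import sqrt

def correlation_t(r, n):
    """t statistic for Ho: r = 0, with df = n - 2 (assumed test, see lead-in)."""
    return r * sqrt((n - 2) / (1 - r ** 2))

# Made-up illustrative values (not from the handout's data files)
r, n = 0.9, 5
t = correlation_t(r, n)

# Two-tailed critical value for alpha = 0.05 and df = 3, from a t table
t_critical = 3.182

print("t =", round(t, 3), "df =", n - 2)
if abs(t) > t_critical:
    print("Reject Ho: there is a significant relationship")
else:
    print("Do not reject Ho: no significant relationship")
```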

Note that the Pearson Product Moment Correlation Coefficient assumes normally distributed data.
If two variables in interval/ratio are skewed, use the Spearman correlation coefficient.
If two dichotomous data are skewed, use the Chi-square test of association.
For df = 1, use the Chi-square test of association with Yates correction, where

$$\chi^2 = \sum \frac{\left(|O - E| - 0.5\right)^2}{E}$$

or Fisher's exact test, where

$$p = \frac{(A + B)!\,(C + D)!\,(A + C)!\,(B + D)!}{A!\,B!\,C!\,D!\,N!}$$
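Both statistics can be computed from the four cells of a 2 x 2 table. This is a minimal sketch, assuming A and B form the first row and C and D the second row, with expected counts taken as (row total × column total) / N. The cell counts are made up for illustration.

```python
from math import factorial

def yates_chi_square(a, b, c, d):
    """Chi-square with Yates correction for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    observed = [a, b, c, d]
    # Expected count of each cell = row total * column total / N
    expected = [
        (a + b) * (a + c) / n,
        (a + b) * (b + d) / n,
        (c + d) * (a + c) / n,
        (c + d) * (b + d) / n,
    ]
    return sum((abs(o - e) - 0.5) ** 2 / e for o, e in zip(observed, expected))

def fisher_table_probability(a, b, c, d):
    """Probability of this exact table given its row and column totals."""
    n = a + b + c + d
    numerator = (factorial(a + b) * factorial(c + d)
                 * factorial(a + c) * factorial(b + d))
    denominator = (factorial(a) * factorial(b) * factorial(c)
                   * factorial(d) * factorial(n))
    return numerator / denominator

# Made-up illustrative counts (not from oc.xls)
print("Yates chi-square:", yates_chi_square(3, 1, 1, 3))
print("Fisher table probability:", fisher_table_probability(3, 1, 1, 3))
```

The Yates statistic is compared with the chi-square critical value for df = 1 (3.841 at the 5% level). The factorial formula gives the probability of the observed table only; the usual Fisher exact p-value adds the probabilities of the tables that are at least as extreme.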

SIMPLE LINEAR REGRESSION


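- used to estimate the dependent variable Y for a given value of the independent variable X.

$$Y = a + bX + \varepsilon \quad \text{or} \quad Y = \beta_0 + \beta_1 X + \varepsilon$$

where

$$\beta_0 = \bar{y} - \beta_1 \bar{x}, \qquad \beta_1 = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}$$

- Test of significance of $\beta_1$ may be performed to determine if $\beta_1 = 0$:

$$t = \frac{\beta_1 - 0}{\sqrt{s_{y \cdot x}^2 \big/ \sum (x - \bar{x})^2}}, \quad \text{with } df = n - 2$$

- a linear relationship (linearity) exists between Y and X if the p-value of $\beta_1$ (using t-test) < $\alpha$.
- $R^2$ is the proportion of the total variance ($s^2$) of Y that can be explained by the linear regression of Y on x.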
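The slope, intercept, t statistic and R-squared can be computed step by step from the sums. The following is a minimal sketch, assuming the residual variance of Y about the line is the residual sum of squares divided by n − 2. The function name and the data are made up for illustration.

```python
from math import sqrt

def simple_linear_regression(x, y):
    """Least-squares line y = b0 + b1*x with the t statistic for b1 and R^2."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)

    # Slope and intercept from the computational formulas
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    x_bar, y_bar = sum_x / n, sum_y / n
    b0 = y_bar - b1 * x_bar

    # Residual variance of y about the line (assumed divisor: n - 2)
    residual_ss = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    s2_yx = residual_ss / (n - 2)

    # t statistic for Ho: slope = 0, with df = n - 2
    sxx = sum((a - x_bar) ** 2 for a in x)
    t = (b1 - 0) / sqrt(s2_yx / sxx)

    # Proportion of the total variation of y explained by the line
    total_ss = sum((b - y_bar) ** 2 for b in y)
    r2 = 1 - residual_ss / total_ss
    return {"b0": b0, "b1": b1, "t": t, "df": n - 2, "r2": r2}

# Made-up illustrative data (not from sodiumbp.xls)
fit = simple_linear_regression([1, 2, 3, 4, 5], [2, 3, 5, 4, 6])
print(fit)
```

The t value is then compared with the t-table critical value for df = n − 2, or converted to a p-value with statistical software.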

MULTIPLE REGRESSION

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + \varepsilon$$

- a linear relationship (linearity) exists between Y and $X_k$ if the p-value of $\beta_k$ (using the individual t-tests of the ANOVA) < $\alpha$.
Diagnostic checking of the regression models may be applied by checking if:
- the residuals are normally distributed (by using the Chi-square test of Normality)
- the residuals have constant variance (by using Bartlett's test)
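The coefficients of a multiple regression can be estimated by least squares. The handout does not say how the estimates are obtained; the sketch below assumes the normal equations (X'X)b = X'y, solved by Gaussian elimination, and also returns the residuals used in the diagnostic checks. It does not perform the t-tests, the normality test or Bartlett's test. The function names and data are made up for illustration.

```python
def fit_multiple_regression(rows, y):
    """Least-squares coefficients [b0, b1, ..., bk] via the normal equations."""
    design = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    p = len(design[0])

    # Build X'X and X'y
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]

    # Forward elimination with partial pivoting on the augmented matrix
    aug = [xtx[i] + [xty[i]] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, p):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, p + 1):
                aug[r][c] -= factor * aug[col][c]

    # Back substitution
    beta = [0.0] * p
    for i in range(p - 1, -1, -1):
        known = sum(aug[i][j] * beta[j] for j in range(i + 1, p))
        beta[i] = (aug[i][p] - known) / aug[i][i]
    return beta

def residuals(rows, y, beta):
    """Observed y minus fitted y for each observation."""
    fitted = [beta[0] + sum(b * v for b, v in zip(beta[1:], r)) for r in rows]
    return [yi - fi for yi, fi in zip(y, fitted)]

# Made-up illustrative data: y was generated as 1 + 2*x1 + 3*x2
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
y = [9, 8, 19, 18, 26]
beta = fit_multiple_regression(rows, y)
print("coefficients:", beta)
print("residuals:", residuals(rows, y, beta))
```

Because the toy data lie exactly on a plane, the residuals are essentially zero. Real data leave nonzero residuals, and those are what the diagnostic checks examine.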

EXAMPLES:
1. sodiumbp.xls contains randomly selected individuals' daily sodium intake and their systolic blood pressure readings. Can the
researcher conclude, at the 5% level of significance, that there is a significant relationship between sodium intake and blood
pressure?
Hypotheses:
Ho: ________________________________________________________________________________
Ha: ________________________________________________________________________________
Statistical Test to Use: _____________________
Correlation coefficient: _____
Test Statistic: __________ Critical Value: __________
p-value: ___________
Decision: ___________
Conclusion: ___________________________________________________

2. Assuming that the data in sodiumbp.xls is not from randomly selected individuals, can the researcher conclude, at the 5% level
of significance, that there is a significant relationship between sodium intake and blood pressure?
Statistical Test to Use: _____________________
Correlation coefficient: _____
Test Statistic: __________ Critical Value: __________
p-value: ___________
Decision: ___________
Conclusion: ___________________________________________________

3. The oc.xls file shows the results of a survey on the use of oral contraceptives (1 = never used & 2 = used) and the
incidence of ovarian cancer (1 = no cancer & 2 = has cancer) in randomly selected patients. Is the incidence of ovarian
cancer related to the use of oral contraceptives? Test at $\alpha$ = 0.05.
Hypotheses:
Ho: _____________________________________________________________________
Ha: _____________________________________________________________________
Correlation coefficient: _____
p-value: ___________
Decision: ___________
Conclusion: ___________________________________

4. Using sodiumbp.xls, set up a regression model that best fits the data, with blood pressure as the dependent variable. Test at
$\alpha$ = 0.05 whether linearity exists between the two variables.
Hypotheses:
Ho: _____________________________________________________________________
Ha: _____________________________________________________________________
Correlation coefficient: _____
p-value: ___________
Decision: ___________
Conclusion: ___________________________________
Estimate, at the 95% confidence level, the blood pressure of a patient with a sodium intake = 7.7
Answer: ____________________

5. Suppose that a researcher wants to investigate the factors that determine heights. He gathered the heights of individuals and
their parents' heights as well. The results are in heights.xls. Do a regression analysis and put the necessary values below:
What is the value of $R^2$? _____________________
What does the value of $R^2$ imply? ___________________________________________________
Find the regression equation. ___________________________________________________
Testing at $\alpha$ = 0.05,
are the heights of the sons and the heights of their fathers linearly related (Yes / No)? _____
are the heights of the sons and the heights of their mothers linearly related (Yes / No)? _____
are the heights of the sons and the heights of their taller grandfathers linearly related (Yes / No)? _____