
1. Please fill in this table, following the pattern of the first and second rows.

Pearson (scale by scale)

Assumptions:
1. There is a linear relationship between X and Y.
2. Both variables are continuous random variables.
3. Both variables must be normally distributed.
4. The paired observations must be independent of each other.

Examples:
- Test score in science and test score in Math.
- Age and blood pressure.
- Amount of time spent studying for an exam (X), in hours, and the score that person makes on the exam (Y).

Comments:
1. The linear correlation coefficient is sometimes referred to as the Pearson product-moment correlation coefficient in honor of its developer, Karl Pearson.

Spearman's Rho (scale by ordinal)

Assumptions:
1. Variables are measured on an ordinal, interval or ratio scale.
2. Variables need NOT be normally distributed.
3. There is a monotonic relationship between the two variables, i.e. either the variables increase in value together, or as one variable's value increases the other's decreases.
4. This type of correlation is NOT very sensitive to outliers.

Example:
A teacher is interested in whether the students who do best at English (assessed by exam) are also the best performers in Maths. She records the scores of her 10 students as they performed in end-of-year examinations for both English and Maths.

Comments:
1. A nonparametric version of the Pearson product-moment correlation.
2. Spearman's is more widely used than Kendall's.

Kendall's tau_a (ordinal by ordinal)

The tau_a statistic tests the strength of association of cross tabulations. Both variables have to be ordinal. Tau_a makes no adjustment for ties.

Example:
A tutor's ranking of ten clinical psychology students as to their suitability for their career and their knowledge of psychology.

Comments:
1. A tau test is a nonparametric hypothesis test which uses the coefficient to test for statistical dependence.
2. Kendall's Tau is equivalent to Spearman's Rho with regard to the underlying assumptions, but Spearman's Rho and Kendall's Tau are not identical in magnitude, since their underlying logic and computational formulae are quite different.
3. If the agreement between the two rankings is perfect and the two rankings are the same, the coefficient has value 1.

Kendall's tau_b (ordinal by ordinal; 2×2 tables)

The tau_b statistic, unlike tau_a, makes adjustments for ties and is suitable for square tables (e.g., 2-by-2 tables).

Kendall's tau_c (ordinal by ordinal; 3×3 tables)

Tau_c differs from tau_b in being more suitable for rectangular tables than for square tables.

Eta (η), the Correlation Ratio

Assumptions:
1. The correlation ratio defines a perfect association as a curvilinear relationship, and the null relationship as statistical independence. Defining perfect association as curvilinear means that the correlation ratio is not affected by the order of the classes of the categorical variable.
2. The correlation ratio is asymmetric: unlike Pearson's correlation, the researcher will get different values for the coefficient depending on which variable is treated as independent and which as dependent.
3. The correlation ratio cannot prove causal direction like other types of correlations and associations; it measures only the strength of the association. It is for this reason that the correlation ratio does not have a sign and varies only from zero to one.

Example:
Measures of effect size in ANOVA are measures of the degree of association between an effect (e.g., a main effect, an interaction, a linear contrast) and the dependent variable. They can be thought of as the correlation between an effect and the dependent variable.

Comments:
1. This statistic is interpreted similarly to the Pearson, but cannot be negative.
2. The correlation ratio is a coefficient of non-linear association. In the case of a linear relationship, the correlation ratio, denoted by eta, becomes the correlation coefficient. In the case of a non-linear relationship, the value of the correlation ratio is greater, so the difference between the correlation ratio and the correlation coefficient reflects the degree of non-linearity of the relationship.
3. Eta-squared (η²) can be interpreted in exactly the same way as r². Therefore an η² = .367 means that 36.7% of the variability in the dependent variable can be explained, or accounted for, by the independent variable.

Contingency coefficient (nominal by nominal; R×C tables)

If we have N observations on two variables, where each observation can be classified into one of R mutually exclusive categories for variable one and one of C mutually exclusive categories for variable two, then a cross-tabulation of the data results in a two-way contingency table (also referred to as an R×C contingency table). The resulting contingency table has R rows and C columns.

Examples:
1. Correlation between number of study hours and grade point average in first-year seminary students.
2. A common question with regard to a two-way contingency table is whether we have independence. By independence, we mean that the row and column variables are unassociated (i.e., knowing the value of the row variable will not help us predict the value of the column variable).

Comment:
The rejection of H0, however, only establishes the existence of a statistical association; it does not measure its strength.

Phi (φ) (nominal by nominal; 2×2 tables)

- Measures the strength of relationship between two dichotomous variables.
- Any type of variable that can be classified 1 and 0 can use the phi coefficient.

Example:
A study of marital status and attrition rate in college might arbitrarily assign a 1 to "married" and 0 to "not married"; a 1 to "dropped out" and a 0 to "remaining in school".

Comment:
Phi is a special case of the contingency coefficient.

Biserial (nominal by scale)

Computed between one interval or ratio variable and one dichotomous variable (one that was interval and has been changed to dichotomous). The term "biserial" refers to the fact that there are two groups of persons (X = 0, 1) being observed on the continuous variable (Y).

Example:
Test correlations of attitude scores or test scores of subjects between high-anxious and low-anxious levels (using an interval measurement to test the anxiety level, then categorizing the students according to a certain criterion).

Point Biserial (nominal by scale)

Computed between one interval or ratio variable and one (truly) dichotomous variable.

Example:
Test correlations of attitude scores or test scores of subjects between "haves" and "have-nots": preschoolers who have had a specified early education program and those who haven't.

Rank Biserial (dichotomous by ordinal)

The rank biserial correlation coefficient is much like the point biserial, except that it uses an ordinal variable in place of an interval/ratio variable. The coefficient measures the degree of relationship between a dichotomous condition (1, 0) and a ranking.
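Several of the coefficients described above are implemented in SciPy's `scipy.stats` module. The sketch below (with invented data, loosely following the hours-studied/exam-score example) shows how they can be computed, including the correlation ratio, which for two groups equals the absolute value of the point biserial.

```python
from scipy import stats

# Hours spent studying (X) and exam score (Y) -- invented illustration data.
hours = [1, 2, 3, 4, 5, 6, 7, 8]
score = [52, 55, 61, 60, 68, 70, 75, 80]

r, _ = stats.pearsonr(hours, score)      # scale by scale, linear
rho, _ = stats.spearmanr(hours, score)   # rank-based, monotonic
tau, _ = stats.kendalltau(hours, score)  # tau-b (adjusts for ties)

# Point biserial: dichotomous group membership (0/1) against a scale variable.
group = [0, 0, 0, 0, 1, 1, 1, 1]
rpb, _ = stats.pointbiserialr(group, score)

# Correlation ratio (eta): sqrt(between-group SS / total SS).
grand = sum(score) / len(score)
ss_total = sum((y - grand) ** 2 for y in score)
by_group = {g: [y for gg, y in zip(group, score) if gg == g] for g in set(group)}
ss_between = sum(len(ys) * (sum(ys) / len(ys) - grand) ** 2
                 for ys in by_group.values())
eta = (ss_between / ss_total) ** 0.5

print(round(r, 3), round(rho, 3), round(tau, 3), round(rpb, 3), round(eta, 3))
```

With this strictly increasing X and one small dip in Y, Spearman's rho and Kendall's tau stay just below 1, illustrating how a single discordant pair is penalized differently by each rank statistic.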

Other coefficients:
Lambda
This is a measure of association for cross tabulations of nominal-level variables. It measures
the percentage improvement in predictability of the dependent variable (row variable or
column variable), given the value of the other variable (column variable or row variable).
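As a concrete sketch of that definition (the helper name `gk_lambda` is ours, not a library function), Lambda for predicting the row variable is the proportional reduction in prediction errors once the column variable is known:

```python
def gk_lambda(table):
    """Goodman-Kruskal Lambda for predicting the ROW variable of a
    cross-tabulation, given as a list of rows of cell counts.
    Assumes the modal row category does not cover all observations (E1 > 0)."""
    total = sum(sum(row) for row in table)
    # E1: errors made predicting the modal row category, ignoring columns.
    e1 = total - max(sum(row) for row in table)
    # E2: errors made predicting the modal row category within each column.
    n_cols = len(table[0])
    e2 = sum(sum(row[j] for row in table) - max(row[j] for row in table)
             for j in range(n_cols))
    return (e1 - e2) / e1

print(gk_lambda([[30, 10], [10, 30]]))  # knowing the column halves the errors
```

Here knowing the column variable cuts prediction errors from 40 to 20, so Lambda is 0.5; a table with no association (all cells equal) gives Lambda 0.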

Goodman-Kruskal Gamma

Another non-parametric measure of correlation is the Goodman-Kruskal Gamma (γ), which is based on the difference between concordant pairs (C) and discordant pairs (D). Gamma is computed as follows:

γ = (C − D) / (C + D)

Thus, Gamma is the surplus of concordant pairs over discordant pairs, as a percentage of all pairs, ignoring ties. Gamma defines perfect association as weak monotonicity. Under statistical independence, Gamma will be 0, but it can be 0 at other times as well (whenever concordant minus discordant pairs is 0).

Gamma is a symmetric measure: it computes the same coefficient value regardless of which variable is treated as the independent (column) variable. Its value ranges from −1 to +1.

In terms of the underlying assumptions, Gamma is equivalent to Spearman's Rho or Kendall's Tau; but in terms of its interpretation and computation, it is more similar to Kendall's Tau than to Spearman's Rho. The Gamma statistic is, however, preferable to Spearman's Rho and Kendall's Tau when the data contain many tied observations.
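The formula above can be computed directly from its definition by counting concordant and discordant pairs (the helper name `gk_gamma` is ours, not a library function):

```python
from itertools import combinations

def gk_gamma(x, y):
    """Goodman-Kruskal Gamma = (C - D) / (C + D), ignoring tied pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
        # pairs tied on either variable (s == 0) are ignored
    return (concordant - discordant) / (concordant + discordant)

print(gk_gamma([1, 2, 3, 4], [10, 20, 30, 40]))  # perfect concordance: 1.0
```

All pairs concordant gives +1, all discordant gives −1, matching the range stated above.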

Tetrachoric Correlation Coefficient

The tetrachoric correlation coefficient, r_tet, is used when both variables are dichotomous, like the phi, but we also need to be able to assume that both variables really are continuous and normally distributed. Thus it is applied to ordinal vs. ordinal data that has this characteristic. Ranks are discrete, so in this manner it differs from the Spearman. The formula involves a trigonometric function, the cosine. The cosine function, in its simplest form, is the ratio of two side lengths in a right triangle: specifically, the side adjacent to the reference angle divided by the length of the hypotenuse. The formula is:

r_tet = cos(180° / (1 + sqrt(AD/BC)))

where A, B, C and D are the cell counts of the 2×2 table, A and D being the "agreement" cells.
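The cosine formula is a one-liner; the sketch below (the function name `tetrachoric_cos` is ours) shows that equal products AD = BC give a 90° angle and hence r_tet = 0, i.e. no association:

```python
import math

def tetrachoric_cos(a, b, c, d):
    """Cosine approximation: r_tet = cos(180 deg / (1 + sqrt(AD/BC))).
    Assumes the off-diagonal cells b and c are nonzero."""
    return math.cos(math.radians(180.0 / (1.0 + math.sqrt((a * d) / (b * c)))))

print(round(tetrachoric_cos(10, 10, 10, 10), 6))  # no association -> 0.0
```

As AD grows relative to BC the angle shrinks toward 0° and r_tet approaches +1; as BC dominates, the angle approaches 180° and r_tet approaches −1.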
2. What is the coefficient of determination and when can you use it?

The coefficient of determination is a measure of how well the regression line represents the data. If the regression line passed exactly through every point on the scatter plot, it would be able to explain all of the variation. The further the line is away from the points, the less it is able to explain.

The coefficient of determination, r², is useful because it gives the proportion of the variance (fluctuation) of one variable that is predictable from the other variable. It is a measure that allows us to determine how certain one can be in making predictions from a certain model/graph. The coefficient of determination is the ratio of the explained variation to the total variation. The coefficient of determination is such that 0 ≤ r² ≤ 1, and denotes the strength of the linear association between x and y.

A more meaningful approach in determining the importance of a correlation coefficient is the coefficient of determination (r²). By squaring the correlation coefficient, one obtains a measure of the common variance between two variables: the proportion of variance in one of the variables accounted for, or explained by, the other. If the correlation between marital satisfaction and number of months married is 0.40, then 16% of the variance (0.40 × 0.40 = 0.16) of one variable is accounted for by the variance of the other. We could say that 16% of the variability in marital satisfaction and number-of-months-married overlaps. It follows that 84% of the variability is unaccounted for.

In an education study, the results show that the r² value for r = 0.07 is 0.0049, or 0.49%: one-half of one percent of variance accounted for. Ninety-nine and a half percent (99.51%) of the variance was unaccounted for. This was a statistically significant but practically meaningless finding, to be sure.
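The arithmetic in the two examples above is simply squaring r:

```python
# Squaring the correlation coefficient gives the proportion of shared variance.
r_marital = 0.40
r2_marital = r_marital ** 2      # about 0.16 -> 16% of variance shared
unexplained = 1 - r2_marital     # about 0.84 -> 84% unaccounted for

r_education = 0.07
r2_education = r_education ** 2  # about 0.0049 -> 0.49% of variance

print(round(r2_marital, 2), round(unexplained, 2), round(r2_education, 4))
```

Note how a correlation that sounds respectable (0.40) still leaves most of the variance unexplained, and a "significant" r of 0.07 explains almost none.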

3. Complete the following table

Variable 1 \ Variable 2 | Interval/Ratio | Ordinal       | Nominal                 | Dichotomous
Interval/Ratio          | Pearson        | Spearman*     | Point Biserial          | Point Biserial
Ordinal                 | Spearman*      | Spearman      | Rank Biserial           | Rank Biserial
Nominal                 | Point Biserial | Rank Biserial | Contingency coefficient | Contingency coefficient
Dichotomous             | Point Biserial | Rank Biserial | Contingency coefficient | Phi

*requires interval/ratio data to be ranked
