Chapter 24

Analysing Quantitative Data Correlation Using a Pearson r


Phillip Patman

What does correlation do?


Correlation analyses allow us to assess both the strength and the direction of a linear relationship between two variables. There are a number of different kinds of statistical techniques that will correlate variables, but the most commonly used correlation statistic is Pearson's r. Pearson's r is used when both variables hold data at the interval level or ratio level, but in the large data sets commonly used in social research, such as the AuSSA, ratio level data is relatively rare. Instead we tend to have a lot of data from Likert-type items and, although they do not strictly meet these data level criteria, it is common practice in the social sciences to treat data from Likert-type items as continuous variables in correlations (see chapter 6 for explanations of Likert-type items). If both of the variables used in the correlation are Likert-type items, what you are exploring is whether people have responded to one question in the same way as they responded to another question. That is, if they have selected a 5 on a 5-point Likert-type scale, does that response relate to how they answered another similar or dissimilar question? Correlation can let us know not only if there is a pattern of similarity within a respondent's answer choices, but also if the responses are patterned in a dissimilar way. The correlation matrix, outlined below, is the guide to how the variables differ.

Understanding correlation
To conduct correlation analysis you need two continuous (also known as scale, that is, interval or ratio) variables. You can also correlate one dichotomous (two-category) variable and one continuous (scale) variable. For example, sex (male/female) could be correlated with height. Correlation can also be used for ordinal (rank) data, using Spearman's rank correlation or Kendall's tau instead of the Pearson correlation. If you want to treat Likert-type scale data in a conservative manner you could also use Spearman's rank correlation instead of the Pearson correlation to test associations between scale items.

Part 5: Other Social Research Methods

Correlation explores the relationship between one variable and another but it does not explore or assume a causal relationship. The question correlation answers is: does variable A relate to, or associate with, variable B? An important caution regarding its use, then, is that there is no assumed cause and effect in correlation. Correlation does not suggest that variable A might cause the change in variable B or that variable B might cause a change in variable A. Other tests are needed to answer that question.
Positive relationship: as A increases, B increases (or as B increases, A increases)
Negative relationship: as A increases, B decreases (or as B increases, A decreases)

Figure 24.1: Correlative relationships between variables

The results can range from:
positive one (+1), which is a complete match between the variables: as one variable goes up, the other goes up at the same rate
negative one (-1), which is a complete mismatch between the variables: as one variable goes up, the other goes down at the same rate
zero (0), which indicates no relationship at all between the variables: as one variable goes up or down the other goes up or down randomly.
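The three reference points can be checked with a small calculation. The sketch below is illustrative only: it computes Pearson's r by hand in plain Python, and the function name `pearson_r` and the three made-up data sets are assumptions introduced for this example, not AuSSA data or SPSS output.

```python
import math

def pearson_r(x, y):
    """Pearson's r for two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the two means
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sum of squared deviations for each variable
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)

# Hypothetical scores, invented for illustration
a = [1, 2, 3, 4, 5]
same_direction = [2, 4, 6, 8, 10]      # rises whenever a rises
opposite_direction = [10, 8, 6, 4, 2]  # falls whenever a rises
no_linear_pattern = [3, 1, 2, 1, 3]    # neither rises nor falls with a

print(round(pearson_r(a, same_direction), 3))      # 1.0
print(round(pearson_r(a, opposite_direction), 3))  # -1.0
print(round(pearson_r(a, no_linear_pattern), 3))   # 0.0
```

Real survey data will almost never produce exactly +1, -1 or 0; results normally fall somewhere between these points.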

The direction of the slope of the line shows the direction of the relationship between the variables. The test assumes that the data have a linear relationship and a normal distribution, and that there are no problems with skewness, kurtosis or outliers. Outliers are scores or responses on items that fall at the outer edges of a distribution; they can create problems when a mean is used, as they can change the mean value drastically.

Social Research Methods

Skewness is seen when there is a tail in the distribution. A value of zero indicates a symmetrical distribution such as the normal distribution; a positive result is skewed (has a tail) to the right and a negative result is skewed (has a tail) to the left (see Pallant 2007; Tabachnick & Fidell 2007).

Figure 24.2: Skewed distributions (panels: negative skew, zero skew, positive skew)

Kurtosis is the peakedness of the distribution. Zero shows a normal distribution; a positive result shows the distribution as clustered in the centre (a high peak); and a negative result shows a flat distribution and may indicate extreme values (outliers). A scatterplot can also be used to look at the distribution of the two variables together, and the ideal is a roughly cigar-shaped plot.

Figure 24.3: Distributions showing kurtosis (panels: positive kurtosis, zero kurtosis, negative kurtosis)
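To make the signs of skewness and kurtosis concrete, the sketch below computes both from simple moment formulas in plain Python. It is an illustration under stated assumptions: the function name and the three small data sets are invented here, and statistical packages may apply small-sample adjustments, so their reported values can differ slightly from these.

```python
def skewness_and_kurtosis(values):
    """Moment-based skewness and excess kurtosis (zero for a normal curve)."""
    n = len(values)
    mean = sum(values) / n
    # Second, third and fourth central moments
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3
    return skew, excess_kurtosis

# Hypothetical scores, invented for illustration
symmetric = [1, 2, 3, 4, 5]                  # flat and symmetrical
right_tail = [1, 1, 1, 2, 2, 3, 10]          # one extreme high score
left_tail = [11 - v for v in right_tail]     # mirror image: extreme low score

for name, data in [("symmetric", symmetric),
                   ("right tail", right_tail),
                   ("left tail", left_tail)]:
    skew, kurt = skewness_and_kurtosis(data)
    print(name, "skew:", round(skew, 2), "kurtosis:", round(kurt, 2))
```

The data set with the tail to the right returns a positive skewness value, its mirror image returns a negative one, and the flat symmetrical set returns zero skewness with negative kurtosis.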

Other ways to check for normality in a distribution are to look at scatterplots, histograms and boxplots (see Pallant 2007; Tabachnick & Fidell 2007). Boxplots are useful when outliers appear in the data, as each outlier is identified with a case number and decisions about those cases are then easier to make. Outliers can distort any of the parametric tests, and create either higher or lower results depending on which end of the distribution the outliers lie. Most parametric tests are reasonably robust when sample sizes are large, so skewness and kurtosis are less of a problem when the number of respondents is large. Tabachnick and Fidell (2007:80) suggest that at a sample size of more than 200, the underestimates of variance associated with kurtosis disappear.

Conducting a correlation
Conducting a Pearson r correlation in SPSS is a straightforward process. The complexity is in ensuring that all the assumptions of Pearson r are met before you proceed, or at least before you attempt to interpret your output. SPSS will attempt to undertake any task you ask it to; just because output is generated does not mean that the statistics are valid.

Figure 24.4: The Analyze menu


One-tailed test (also known as a directional test): tests only one end of the bell curve or distribution and assumes the researcher knows which end of the distribution needs to be checked.
Two-tailed test: tests both ends of the distribution and is routinely used in most statistical tests.

The first step in conducting a Pearson r correlation between two variables is to select the Correlate option from the Analyze tool list. From this list select Bivariate; in this example, we are correlating two variables. This screen is shown in figure 24.4. As shown in figure 24.5, missing values are excluded from the analysis pairwise, which means that a respondent with a missing response about, for example, government spending on health will not be included in any correlations involving this variable.
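Pairwise exclusion can be illustrated outside SPSS. The sketch below is a plain Python approximation of the idea, not SPSS's own procedure: the function names, the use of `None` to mark a missing answer, and the six hypothetical respondents are all assumptions made for this example.

```python
import math

def pearson_r(x, y):
    """Pearson's r for two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)

def pairwise_r(x, y):
    """Drop respondents missing either answer (None), then correlate the rest."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    xs = [a for a, _ in pairs]
    ys = [b for _, b in pairs]
    return pearson_r(xs, ys), len(pairs)

# Hypothetical answers from six respondents; None marks a missing response
health = [1, 2, None, 4, 5, 3]
police = [2, 2, 3, None, 5, 3]
education = [1, 3, 2, 4, 5, 3]

r_hp, n_hp = pairwise_r(health, police)
r_he, n_he = pairwise_r(health, education)
print("health with police: N =", n_hp)     # 4 respondents answered both
print("health with education: N =", n_he)  # 5 respondents answered both
```

Because each pair of variables keeps every respondent who answered both questions, N can differ from one correlation to the next, which is why the N values in a correlation table are not all the same.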

On this screen, we then add the variables we want to correlate and select Pearson r from our options. In figure 24.5, four items relating to government spending are cross-analysed against each other, two at a time, and the results are laid out in the correlations output table. Using the arrows, transfer two or more variables into the Variables window and select Options if needed.

Pearson's r: used for scale (interval or ratio) data. If the data are not normally distributed use Spearman or Kendall's tau-b.
Kendall's tau-b: used for ordinal data.
Spearman: used for ordinal data and more often reported than Kendall's tau-b.
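One way to see why Spearman suits ordinal data is that it can be computed as a Pearson correlation on the ranks of the scores rather than on the scores themselves. The sketch below shows that idea in plain Python; the function names and the small data set are assumptions made for illustration, and tied scores are given the average of their ranks.

```python
import math

def pearson_r(x, y):
    """Pearson's r for two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)

def average_ranks(values):
    """Rank values from 1 upward; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r applied to the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical scores: y always rises with x, but not in a straight line
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 100]

print("Pearson: ", round(pearson_r(x, y), 3))
print("Spearman:", round(spearman_rho(x, y), 3))
```

In this example the order of the scores matches perfectly, so Spearman returns 1, while Pearson returns a lower value because the relationship is not linear.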

Figure 24.5: Bivariate Correlations screen

The default settings are shown in the Options window and other analyses can be selected if required, including means and standard deviations, which can be useful but can also be generated elsewhere within SPSS.

Now examine the correlation table generated from this analysis. The correlation output has each variable tested against all other selected variables, as shown in figure 24.6. The lines show that there is a mirror image above and below the grey area (not normally grey), and lines indicate this mirroring in some of the cells. The grey results are the questions correlated with themselves and therefore must be a perfect match, or 1.0.
Correlations

Columns: b = Govt spending b. Health; c = Govt spending c. The police and law enforcement; d = Govt spending d. Education; e = Govt spending e. The military and defence

                                                                      b        c        d        e
Govt spending b. Health                          Pearson Correlation  1.000    .373**   .420**   .126**
                                                 Sig. (2-tailed)               .000     .000     .000
                                                 N                    2701     2656     2677     2646
Govt spending c. The police and law enforcement  Pearson Correlation  .373**   1.000    .250**   .351**
                                                 Sig. (2-tailed)      .000              .000     .000
                                                 N                    2656     2669     2652     2626
Govt spending d. Education                       Pearson Correlation  .420**   .250**   1.000    .027
                                                 Sig. (2-tailed)      .000     .000              .164
                                                 N                    2677     2652     2687     2646
Govt spending e. The military and defence        Pearson Correlation  .126**   .351**   .027     1.000
                                                 Sig. (2-tailed)      .000     .000     .164
                                                 N                    2646     2646     2646     2655

** Correlation is significant at the 0.01 level (2-tailed)

Figure 24.6: Correlation table

Each cell in the table has a similar result where the first number is the correlation coefficient, the second the significance result and the last is the number of people. The number of people can vary from variable to variable depending on how many people have responded to the questions in the analysis.
Govt spending c. The police and law enforcement
.373**    Correlation result
.000      Significance result
2656      Number in the sample

Figure 24.7: Correlation result

There are many different interpretations but Pallant (2007:132) suggests that correlation results can be interpreted as follows:
r = 0.10 to 0.29: small correlation
r = 0.30 to 0.49: medium correlation
r = 0.50 to 1.00: high correlation.
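The bands above can be applied mechanically, as in the short Python sketch below. Two choices here are assumptions made for this example rather than part of the guideline as quoted: the bands are applied to the size of r regardless of its sign, and a result under 0.10 is simply labelled as falling below the smallest band. The function name is hypothetical.

```python
def describe_strength(r):
    """Label the strength of a correlation using the bands quoted above."""
    size = abs(r)  # assumption: strength is judged ignoring the sign
    if size >= 0.50:
        return "high"
    if size >= 0.30:
        return "medium"
    if size >= 0.10:
        return "small"
    return "below the smallest band"

# Coefficients taken from the correlation table in figure 24.6
for r in (0.420, 0.373, 0.126, 0.027):
    print(r, describe_strength(r))
```

The label describes only the strength of the relationship; whether the result is statistically significant is a separate question, answered by the significance value in the output.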

Notation at the bottom of the output can be interpreted as follows:
no star: result is not significant
one star (*): result is significant at the 0.05 level
two stars (**): result is significant at the 0.01 level
three stars (***): result is significant at the 0.001 level.

Interpreting Pearson r correlations


For the correlation in the table, the result of correlating responses on government spending on health and government spending on police and law enforcement indicates that there is a statistically significant correlation between the pattern of responses for these two variables. Using Pallant's interpretation we can say that for the government spending on health variable, there is a small to medium positive correlation with the government spending on police and law enforcement variable (r = 0.373, p < 0.001). That is, those who are more likely to support more government spending on health are also more likely to support increased spending on police and law enforcement, and the strength of this relationship is small to medium.
Table 24.1: Correlating responses on government spending

Correlation                       r        Multiplied by itself   Result   Multiplied by 100   Percentage of variance
Health to police and law          0.370    x 0.370                0.137    x 100               13.7
Health to education               0.420    x 0.420                0.176    x 100               17.6
Health to military and defence    0.126    x 0.126                0.016    x 100               1.6

Looking at the other correlations in the table, we can see that there is a medium positive correlation between spending on health and spending on education (r = 0.420, p < 0.001) and a low positive correlation with spending on military and defence (r = 0.126, p < 0.001), and all results are significant at the 0.001 level. From these results, a calculation of the amount of variance or overlap between the variables can be performed by multiplying each correlation by itself and then by 100, as set out in table 24.1. We can see that health has more overlap with police and law and with education than with military and defence. But how can this be interpreted? One explanation that makes intuitive sense would be that those who want more spending on health also want increased government spending in other areas, but are more likely to want more spending on education than on funding for the police and military.
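The percentage of variance figures in table 24.1 can be reproduced with a few lines of Python. This is a minimal sketch: the function name is hypothetical, and the three r values are the ones listed in the table.

```python
def percentage_of_variance(r):
    """Multiply r by itself, then by 100, to get the shared variance."""
    return r * r * 100

# r values as listed in table 24.1
correlations = {
    "Health to police and law": 0.370,
    "Health to education": 0.420,
    "Health to military and defence": 0.126,
}

for label, r in correlations.items():
    print(label, round(percentage_of_variance(r), 1))
# Health to police and law 13.7
# Health to education 17.6
# Health to military and defence 1.6
```

Squaring the coefficient makes weak correlations look weaker still: r = 0.126 is statistically significant in this large sample, yet the two variables share under 2 per cent of their variance.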

Exercise 24.1: Pearson r correlation with AuSSA data


Correlation analyses both the strength and the direction of a linear relationship between two variables using ratio or interval level data. The most common correlation used in social research is Pearson's r. For this exercise, we are interested in seeing if a respondent's level of education (as measured in the variable R: How many years of education have you completed?) is correlated with their attitudes on a range of items. First, conduct a Pearson r correlation task using these variables:

M3: R: How many years of education have you completed?
H1e: Level of Agreement with the statement: Aboriginal people who no longer follow traditional lifestyles are not really Aboriginal.

Question
Is there a statistically significant correlation between the number of years of education a respondent has and their attitudes towards Aboriginal identity and Aboriginal traditional lifestyles? How strong is this relationship and in what direction is the correlation? (Remember, to interpret this correlation matrix, you need to check on the variable screen what the direction of the level of agreement is on the Likert-type scale.)

What does this mean in words?

How might you explain or theorise the relationship between these two variables?

Run four other correlations you think might be interesting and report and interpret the results.

Now have a play with the data and see what you can find. Remember, this is nationally representative data so any results can be generalised to the Australian population overall.

References
Pallant, J. (2007). SPSS Survival Manual. Crows Nest: Allen & Unwin.
Tabachnick, B. G. and Fidell, L. S. (2007). Using Multivariate Statistics. Boston: Pearson.
