
PREST

Practitioner Research and Evaluation Skills Training in Open and Distance Learning
MODULE

Getting and analysing quantitative data

A3

The PREST training resources aim to help open and distance learning practitioners develop and extend their research and evaluation skills. They can be used on a self-study basis or by training providers. The resources consist of two sets of materials: a six-module foundation course in research and evaluation skills and six handbooks in specific research areas of ODL. There is an accompanying user guide. A full list appears on the back cover. The print-based materials are freely downloadable from the Commonwealth of Learning (COL) website (www.col.org/prest). Providers wishing to print and bind copies can apply for camera-ready copy which includes colour covers (info@col.org). They were developed by the International Research Foundation for Open Learning (www.irfol.ac.uk) on behalf of COL.

The PREST core team


Charlotte Creed (Programme coordinator)
Richard Freeman (Instructional designer, editor and author)
Bernadette Robinson (Academic editor and author)
Alan Woodley (Academic editor and author)

Additional members
Terry Allsop (Critical reviewer)
Alicia Fentiman (Basic education adviser)
Graham Hiles (Page layout)
Helen Lentell (Commonwealth of Learning Training Programme Manager)
Santosh Panda (External academic editor)
Reehana Raza (Higher education adviser)

Steering group
The PREST programme has been guided by a distinguished international steering group including: Peter Cookson, Raj Dhanarajan, Tony Dodds, Terry Evans, Olugbemiro Jegede, David Murphy, Evie Nonyongo, Santosh Panda and Hilary Perraton.

Acknowledgements
We are particularly grateful to Hilary Perraton and Raj Dhanarajan who originally conceived of the PREST programme and have supported the project throughout. Among those to whom we are indebted for support, information and ideas are Honor Carter, Kate Crofts, John Daniel, Nick Gao, Jenny Glennie, Keith Harry, Colin Latchem, Lydia Meister, Roger Mills, Sanjaya Mishra, Ros Morpeth, Rod Tyrer, Paul West and Dave Wilson. In developing the materials, we have drawn inspiration from the lead provided by Roger Mitton in his handbook, Mitton, R. 1982 Practical research in distance education, Cambridge: International Extension College.

Handbook A3: Getting and analysing quantitative data


Author: Alan Woodley
Critical reviewers: Richard Freeman, Santosh Panda and Bernadette Robinson.
2004 Commonwealth of Learning
ISBN 1-894975-13-8
Permission is granted for use by third parties on condition that attribution to COL is retained and that their use is strictly for non-commercial purposes and not for resale. Training providers wishing to version the materials must follow COL's rules on copyright matters.

Permissions
See the last page of the module.

Contents
Getting and analysing quantitative data
  Aims of the module
  Module objectives
  Module organisation
  Requirements
  Resources
Unit 1: Introduction
  Unit overview
  Learning outcomes
  The rules of quantitative methods and how to apply them: an introductory case study
  Summary
  Feedback to selected activities
Unit 2: What do we mean by quantitative methods?
  Unit overview
  Learning outcomes
  Introduction
  Which questions can be answered with a quantitative approach?
  Summary
  The rest of the module
Unit 3: Analysing other people's data
  Unit overview
  Learning outcomes
  Introduction
  The data
  Raw numbers
  Percentages
  Excel for beginners: calculating totals
  The open schools case study
  Conclusions
  Summary
  Feedback to selected activities
Unit 4: Quantitative institutional data
  Unit overview
  Learning outcomes
  Introduction
  Types of data
  Types of institutional data
  Dealing with quantitative data
  Summarising
  Averages
  Spread, dispersion and deviation
  Are we getting more young students?
  Patterns and trends
  Summary
  Feedback to selected activities
Unit 5: Doing institutional research from scratch
  Unit overview
  Learning outcomes
  Introduction
  Validity and reliability

Commonwealth of Learning


Module A3 Getting and analysing quantitative data

  The dimensions of data collection methods
  The range of quantitative research methods
  Designing good questions to ask people
  Guidelines for good question writing
  Forms of questions
  Designing good questionnaires
  Designing for disability
  Carrying out a survey
  Summary
  Feedback to selected activities
Unit 6: Analysing your research results
  Unit overview
  Learning outcomes
  Introduction
  Exploring relationships using correlation
  Looking back and looking forward
  Summary
  References
  Feedback to selected activities
Permissions


Practitioner research and evaluation skills training in open and distance learning

Getting and analysing quantitative data

MODULE

A3

Module overview
You have seen in the earlier modules that there are two broad approaches to collecting research data: qualitative methods and quantitative methods. This module looks at the latter type. Through studying the module you will gain an overview of some of the key issues in collecting and analysing quantitative data. You will also learn some of the common methods of statistical analysis of numerical data, although this is not a module on statistical methods in general (that is far too large a topic to treat here). Instead, I have concentrated on showing you how to use some of the basic analytical tools that you can find in Microsoft Excel.

During the module you will explore and learn about:
• deciding what data you need in order to answer given research questions
• interpreting quantitative data
• types of quantitative data
• methods of summarising quantitative data
• methods of describing patterns and trends in data
• methods of collecting quantitative data, including questionnaire design
• some methods of analysing quantitative data.

Aims of the module


The overall aim of this module is to introduce you to the concepts and techniques of quantitative research methods in the context of open and distance learning. A second aim is to demonstrate that 'facts' involving quantitative data are socially constructed and to examine the underlying processes involved. Our third aim is to enable you to analyse other people's data and to appreciate how and why secondary analysis of external quantitative data needs to be done with care if you are to extract meaningful information from such data. Our fourth aim is to enable you to analyse quantitative institutional data (e.g. data on students, their courses and their marks) that already exist in order to extract meaning and information from that data, using methods such as averages, measures of spread and trend analysis. Our final aim is to help you develop the skills of doing institutional research from scratch. You will look at how to decide what data to collect, which methods to use and how to design your data collection instruments so that they will yield valid and reliable results.

Module objectives
When you have worked through this module, you should be able to:
1 Identify the sorts of research questions that can be answered by a quantitative approach.
2 Calculate percentages as a means of comparing data.
3 Calculate averages as a means of summarising data.
4 Calculate some common measures of dispersion for data.
5 Explain the ideas of validity and reliability and identify methods of maximising these in your research.
6 Design effective instruments for collecting quantitative data.

Module organisation
The module is structured in seven parts: this introduction and six units, as follows.
This introduction (1 hr)
Unit 1: Introduction (2 hrs)
Unit 2: What do we mean by quantitative methods? (1 hr)
Unit 3: Analysing other people's data (10 hrs)
Unit 4: Quantitative institutional data (9 hrs)
Unit 5: Doing institutional research from scratch (9 hrs)
Unit 6: Analysing your research results (9 hrs)

Each unit is made up of the following components:
• an introductory paragraph or two that provide an overview of the unit, its focus and outcomes
• a range of activities for you to engage in, many based on Excel spreadsheets
• a unit summary
• feedback on your responses to the questions or problems posed in each activity.

Requirements
In order to make your learning more interactive, and hence more efficient, we have included lots of practical exercises. These use numerical data and we take you through all of the necessary calculations. We do not assume any great mathematical knowledge. If you understand the concepts of addition, subtraction, multiplication and division then you should be able to manage. However, the activities will use the general piece of office software called Microsoft Excel. This is normally built into desktop computers as standard nowadays. It will be possible to work through the module without Excel, but you are strongly advised to get access to it.

Excel
To carry the activities out exactly as laid down you will need access to Excel Version 4 or above. However, you should be able to open the data in earlier versions and still carry out the exercises. Some of the instructions might have to be interpreted and adapted. If you are fairly experienced in Excel, you will be able to carry out the tasks relatively quickly. For novice Excel users we have tried to spell out what you need to do. If you get stuck you can turn to the 'Model' worksheet in each workbook where we have carried out all of the stages. The letters in brackets, e.g. (W1), tell you where to look on the relevant Model sheet. In the Excel instructions, an arrow (→) is used as shorthand for 'click and drag'. So the instruction Edit → Paste means that you click and hold on the menu item Edit at the top of the screen, then drag or scroll down to Paste and then release the mouse button.


Resources
The following resources are used in this module:
Resource                      Name when referred to in our text   Location
Excel workbook Women          Women                               Resources File
Excel workbook M101           M101                                Resources File
Excel workbook Summarising    Summarising                         Resources File
Excel workbook Analysing 1    Analysing 1                         Resources File
Excel workbook Analysing 2    Analysing 2                         Resources File
Excel workbook Correlation    Correlation                         Resources File

Teaching boxes
We have used two types of teaching boxes in the text as follows:
Statistical note:

Provides additional explanations of some of the statistical terms and methods that I discuss.
Excel note:

Provides additional information and explanation on how to use Excel.

Unit 1: Introduction
Unit overview
We live in an age where it is claimed that policy decisions are based on 'facts' rather than hunches, prejudices and rumours. This is commonly referred to as 'evidence-based decision-making'. The field of education is no exception, where it is felt that this evidence is needed both to decide between policy options and also to evaluate the success or failure of past policies. Furthermore, this evidence tends to be 'facts and figures' or statistics. I think it is fairly uncontentious to say that managers, bosses, civil servants and politicians prefer quantitative data to qualitative. Faced with an argumentative audience they like to be armed with charts, spreadsheets and survey results.

This unit will introduce you to:
• some of the difficulties of deciding just what a 'fact' is
• some of the issues that arise when we try to describe a system or a situation using statistical data.

Learning outcomes
When you have worked through this unit, you should be able to:
1 Discuss the difficulties of deciding just what a 'fact' is.
2 Explain why all knowledge can be seen to be provisional.
3 Illustrate some of the difficulties of using statistical data to describe a situation such as course enrolments in an ODL institution.

I want you to begin with a fairly light-hearted activity to get you thinking about facts and figures.

Activity 1

10 mins

Which of these statements are facts and why?
1 Paris is the capital city of France.
2 The author of this module is 21 years old and 2 metres (approximately 6 ft 7 ins) tall.
3 Mount Everest is the world's tallest mountain at 8848 metres high.
4 In 2003 the Sukhothai Thammathirat Open University (STOU) in Thailand, with over 300,000 students, was the biggest university in the world.

The feedback to this activity is at the end of the unit.

One thing that we can conclude from this last activity is that good research should tell you something about precision levels and how much confidence you should place in the results.
Statistical note: Accuracy and rounding

Numbers are frequently rounded before being made public. For example, the number 7.98 may be rounded up to 8 and the number 632 may be rounded down to 600. In the case of Everest the height has been rounded to the nearest metre. The actual result would have been somewhere between 8847.50001 and 8848.49999. The measurement method may have been such that this was the greatest level of accuracy that could have been claimed. The actual result may have been 8848.234632 and this was then rounded down by the scientists because they knew that the instrument used was not really that accurate. For example, on a hotter day it may have given a slightly higher reading. In reality they have probably made lots of measurements and then calculated a middle or average value which they then published as their best estimate. (Statistical averages will be covered later in this module.) You will frequently see results being presented in social research that exaggerate the accuracy of the measurement tool. Do not be impressed if the results say that 10.526% of students disliked the course when in reality it refers to a survey of 100 students where 19 replied and only 2 gave this answer.
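If you would like to try the rounding ideas from this note outside Excel, the short Python sketch below may help. It is only an illustration: the helper name `rounding_interval` is my own invention, and the survey figures are simply the 2 answers out of 19 replies mentioned in the note.

```python
def rounding_interval(published, unit=1.0):
    """Return the range of true values that would round to `published`
    when rounding to the nearest `unit` (hypothetical helper)."""
    half = unit / 2
    return published - half, published + half


# Rounding to the nearest whole number and to the nearest hundred.
print(round(7.98))      # 8
print(round(632, -2))   # 600

# A height published as 8848 m (nearest metre) could lie anywhere in this range.
low, high = rounding_interval(8848, unit=1.0)
print(low, high)        # 8847.5 8848.5

# Spurious precision: 2 respondents out of the 19 who replied.
share = 100 * 2 / 19
print(round(share, 3))  # 10.526 (three decimal places from only 19 replies)
print(round(share))     # 11
```

With only 19 replies, 'about 11%' or simply '2 out of 19' is as much precision as the data can bear.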

The provisional nature of knowledge


Karl Popper, a 20th century philosopher (but one who repeated and expanded on the ideas of his predecessors), maintained that a theory could never be completely verified. Even if it has been rigorously tested over a long period of time, the most that one can say is that the theory has received a high measure of corroboration. It can be provisionally retained as the best available theory until it is finally falsified (if indeed it is ever falsified), and/or is superseded by a better theory. In my opinion, a strong case can be made for the view that all knowledge is provisional in the Popperian sense. However, I think it is undeniable that all knowledge, including 'facts and figures', is socially constructed. This has led to sayings such as the one popularised by Mark Twain, who attributed it to Benjamin Disraeli: 'there are lies, damned lies and statistics'. Many people in the general public share this feeling, namely that figures can be manipulated to prove anything and that they should not be trusted. I do not take such a negative, cynical view. My position is more a sceptical, questioning one:
• while all knowledge is provisional, some forms of knowledge can be relied upon more than others
• the trust one places in a piece of knowledge depends to a great extent upon whether appropriate research methods have been used and how clearly these have been documented
• arguments in favour of quantitative methods rather than qualitative methods, or vice versa, will never be conclusive
• in general, the truth about a social situation is best approached by a combination of methods, both quantitative and qualitative
• there are certain rules for constructing and presenting knowledge
• one must be cautious when interpreting knowledge.

Now let's move on to learning the rules of quantitative methods and how to apply them.

The rules of quantitative methods and how to apply them: an introductory case study
Rather than beginning with a long list of the 'whys', 'hows' and 'whens' of quantitative research, I am going to begin with a case study in open and distance learning that I hope you will find realistic. I thought that this would be more interesting for you, and more like the reality of practitioner research. At this stage I will not go into the details of the statistical techniques that I am using. The important things at the moment are the processes involved in doing this type of research.

Case studies
The term 'case study' is used in many areas and has several meanings. For example, it can mean the selection and in-depth study of a single case, perhaps one particular ODL institution. By looking intensely at one complex example, it is intended to give an insight into the context of a problem as well as illustrating the main point. Here we are using 'case study' in the sense of a student-centred activity based on a topic that demonstrates theoretical concepts in an applied setting.

In this case study we want you to imagine that you are Abida Quuyaam. As we explained in Module 1 and the User Guide, Abida is a researcher at Auranzeb Open University (AOU) and has been working there since she completed her degree in sociology six years ago. She works as a junior researcher in the university's Evaluation and Research Group (ERG). Besides herself, there is another junior researcher, a senior researcher and a director. Their mandate is everything and anything the Vice-chancellor deems necessary to be investigated, from compiling statistics for different government departments to evaluations of programmes and research projects that come from abroad.

The Vice-chancellor wants to present findings to the Minister of Education to show how successful AOU has been in enrolling students on its teacher training programme. He gives Abida the data on enrolments on the programme for the last three semesters and asks the ERG to produce a graph to illustrate the rise in numbers (Table 1).
Table 1 Enrolments on the teacher training programme at AOU (Semesters 8 to 10)

                       Semester 8   Semester 9   Semester 10
Number of enrolments   94           197          320

Abida goes ahead and constructs the graph shown in Figure 1 and submits it to the Vice-chancellor. He, being an astute person, suggests that it be re-drawn as in Figure 2 because he feels that this shows the growth in numbers better. The second graph is then sent off to the Ministry.
Figure 1 Enrolments on the teacher training programme at AOU (Semesters 8-10)
[Graph: y-axis from 0 to 350; x-axis showing Semester 8, Semester 9 and Semester 10]


Figure 2 Enrolments on the teacher training programme at AOU (Semesters 8-10)
[Graph: y-axis from 0 to 350; x-axis showing Semester 8, Semester 9 and Semester 10]

Activity 2

10 mins

Imagine that you are the Minister of Education.
1 What would you make of Abida's graph (Figure 2)?
2 Would you be impressed by AOU's performance in teacher training?

The feedback to this activity is at the end of the unit.


Statistical note: Graphs

Graphs (or 'charts' as they tend to be called in Excel) are important and powerful tools. The horizontal line is called the 'x-axis' and tends to be used for the categories being looked at such as age, gender, semester, etc. The vertical line is called the 'y-axis' and tends to be used for the numbers in each category. The plural of axis is 'axes'. There are generally accepted rules for the shape of graphs. The default shape produced by Excel is usually in line with these rules.
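To make the roles of the two axes concrete, here is a rough Python sketch that draws the Table 1 figures as a text chart, with the semesters along the x-axis and the enrolments up the y-axis. It is not how Excel draws charts; the function name `column_chart` and the 50-enrolment step are assumptions of mine, chosen to match the axis labels in the figures.

```python
def column_chart(categories, values, step=50):
    """Return a text column chart: categories on the x-axis, numbers on the y-axis.

    A column is drawn at a given level only if the value reaches that level,
    so column heights are rounded down to the nearest `step`.
    """
    top = -(-max(values) // step) * step  # round the axis maximum up to a multiple of step
    width = max(len(label) for label in categories) + 2
    lines = []
    for level in range(top, 0, -step):
        cells = ["#".center(width) if value >= level else " " * width
                 for value in values]
        lines.append(f"{level:>4} |" + "".join(cells))
    lines.append("   0 +" + "-" * (width * len(values)))
    lines.append(" " * 6 + "".join(label.center(width) for label in categories))
    return "\n".join(lines)


chart = column_chart(["Semester 8", "Semester 9", "Semester 10"], [94, 197, 320])
print(chart)
```

Changing `step`, or the range of the y-axis, changes how steep the rise looks even though the underlying numbers stay the same.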


The role of the researcher in presenting data


Now, what about Abida? Did she behave appropriately? Well, as a junior researcher she is probably not in a very powerful position. To some extent she has to do exactly what she is told. However, I believe that it is part of Abida's professional role to raise questions. She might only be able to raise them with her line manager, who might block them or produce convincing counter-arguments. On the other hand, her ideas might go further up the chain and actually affect what goes to the Minister.

Let's proceed by suggesting some of the questions Abida might have asked. What should have been her concerns? Just like the Minister, she should have been concerned about constructing a graph based on just three data points. Let's imagine that in this case she was able to access the information for the previous seven semesters and that the full data set is in Table 2.
Table 2 Enrolments on the teacher training programme at AOU (Semesters 1 to 10)

Semester     1    2     3     4     5     6     7     8    9     10
Enrolments   97   122   285   133   137   144   122   94   197   320

The numbers from Table 2 are plotted in Figure 3.


Figure 3 Enrolments on the teacher training programme at AOU (Semesters 1-10)
[Graph: y-axis from 0 to 350; x-axis showing Semesters 1 to 10]

Activity 3

3 mins

What does Figure 3 tell you about the growth of enrolments over the ten semesters?

The feedback to this activity is at the end of the unit.


In order to discover underlying trends Abida could use statistical techniques to 'smooth out' this graph. These tools attempt to remove random fluctuations in the data. This topic is covered in more detail later in this module. Here we just illustrate the concept in Figure 4, where we have added the linear trend line, that is, the straight line that best represents all of the data mathematically.
Figure 4 Enrolments on the teacher training programme at AOU with trend line added (Semesters 1-10)
[Graph: y-axis up to 350; enrolments by semester with a straight line labelled 'Linear trend']
Activity 4

3 mins

What does the trend line in Figure 4 tell you about the growth of enrolments over the ten semesters?

The feedback to this activity is at the end of the unit.
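If you are curious about where a linear trend line comes from, the following Python sketch fits a straight line to the Table 2 figures using the ordinary least-squares formulas. I am assuming that this is the kind of 'best fit' straight line shown in Figure 4; the function name `linear_trend` is my own and is not part of Excel.

```python
def linear_trend(xs, ys):
    """Fit the least-squares straight line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


semesters = list(range(1, 11))
enrolments = [97, 122, 285, 133, 137, 144, 122, 94, 197, 320]  # Table 2

slope, intercept = linear_trend(semesters, enrolments)
print(f"slope: {slope:.1f} enrolments per semester")  # slope: 9.4 enrolments per semester
print(f"intercept: {intercept:.1f}")                  # intercept: 113.4

# Fitted (smoothed) value for each semester.
trend = [intercept + slope * x for x in semesters]
```

On these figures the fitted line runs from about 123 enrolments in Semester 1 to about 207 in Semester 10.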

Other factors behind the enrolment pattern


Abida might also have the time, the intellectual curiosity and the necessary data to investigate other factors that lie behind the enrolment patterns. Let's say that the programme consisted of three courses. Course A, Curriculum Design, ran in semesters 1, 2, 3, 9 and 10. Course B, Developmental Psychology, ran through semesters 3 to 10. Course C, Teaching Methods, only started in semester 10. The enrolments for each course in each semester are shown in Table 3.


Table 3 Enrolments on the teacher training programme at AOU (Semesters 1 to 10)

Semester   Course A   Course B   Course C   Total   Average enrolments per course
1          97         -          -          97      97
2          122        -          -          122     122
3          127        158        -          285     143
4          -          133        -          133     133
5          -          137        -          137     137
6          -          144        -          144     144
7          -          122        -          122     122
8          -          94         -          94      94
9          97         100        -          197     99
10         98         112        110        320     107

Now, if we graph these numbers (Figure 5) we see that each course is different.
Figure 5 Enrolments on the teacher training programme at AOU (Semesters 1-10)
[Graph: y-axis from 0 to 180; x-axis showing Semesters 1 to 10; separate series for Course A, Course B and Course C]

Activity 5

5 mins

Describe the different enrolment patterns of the three courses, as shown in Figure 5.

The feedback to this activity is at the end of the unit.

Looked at in this way, we can see that the peak in enrolments in Semester 3 was because Course B started then with a lot of students. The peak in Semester 10 occurred because it was the only semester in which all three courses were running. In fact, if you calculate the average number of enrolments per course (these numbers are shown in Table 3) then you can see that there were six semesters that had a higher number of enrolments per course than Semester 10. Growth at AOU does not seem to be due to greater student demand for each course, but merely to the provision of more courses.
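If you would like to check the averages in Table 3 for yourself, the calculation can be sketched in a few lines of Python. This is only an illustration: the variable and function names are our own, and the figures are simply those printed in Table 3.

```python
# Enrolments per semester, transcribed from Table 3 as (Course A, Course B, Course C).
# None means that the course did not run in that semester.
enrolments = {
    1: (97, None, None),
    2: (122, None, None),
    3: (127, 158, None),
    4: (None, 133, None),
    5: (None, 137, None),
    6: (None, 144, None),
    7: (None, 122, None),
    8: (None, 94, None),
    9: (97, 100, None),
    10: (98, 112, 110),
}


def average_per_course(row):
    """Mean enrolment across the courses that actually ran in a semester."""
    running = [n for n in row if n is not None]
    return sum(running) / len(running)


averages = {sem: average_per_course(row) for sem, row in enrolments.items()}

# Semesters whose average per course is higher than that of Semester 10.
higher_than_10 = [sem for sem, avg in averages.items() if avg > averages[10]]

for sem, avg in averages.items():
    print(f"Semester {sem:2d}: {avg:6.1f} enrolments per course")
print("Semesters above Semester 10:", higher_than_10)
```

The last line lists Semesters 2 to 7, which are the six semesters referred to above.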

Summary
In the case study we have been looking at quantitative data. We have not infringed any methodological rules (except, perhaps, when we squeezed Figure 2) but we have come up with some different stories and explanations. This shows, firstly, that little if anything is self-evident from quantitative data. It has to be assembled, arranged, presented and interpreted by people; in other words, it is socially constructed. Secondly, how it is done can depend a great deal on the skills and interests of the researcher. Thirdly, the context of the data is important. Abida has shown that by adding in data from more semesters and breaking it down into courses, we gain a richer understanding and a more sceptical view of enrolment growth. However, if the Vice-chancellor knows that there are more courses coming on stream, and that they seem to attract about one hundred students each, then his optimism may well be justified. Finally, there are clearly issues of power involved when it comes to decisions about what questions are asked, what data is collected, and how it is analysed, presented and interpreted.

Feedback to selected activities

Feedback to Activity 1
1 Paris is the capital city of France.

This seems like a fact to me because that is what my book on France says. (Pedantically you could say that it is a partial fact. There is a town in the USA called Paris that is not the capital of France.) So facts don't necessarily contain figures.
2 Alan Woodley is 21 years old and 2 metres (approximately 6 ft 7 ins) tall.

You would only have to meet me to see that this is not true. So the obvious point is that figures do not necessarily constitute facts. For them to be facts they have to be correct.

3 Mount Everest is the world's tallest mountain at 8848 metres high.

Well, I looked this up on my computer using Google, the web-based search engine. In the first reference that I went to, it said that Everest was 8848 metres high, so I was fairly sure that this was a fact, but:

It is not a very precise fact. All we know here is that, to the nearest metre, the height is 8848. You could measure a pile of books with your ruler more precisely than that! However, this apparent lack of precision does not necessarily mean bad research. The scientists may have much more precise results but have chosen to present them in a simplified fashion. They should not be presenting results to several decimal places if the measurement techniques do not justify it.

Strictly speaking, even if this fact was accurate to several decimal places, it should have a date attached to it! Apparently Mount Everest is very slowly getting taller at the rate of two inches (5 centimetres) per year as the geological movements that created the mountain range continue to force it upwards.

I have trusted what I have found on my computer but people can put up any facts they want to on a website. You need to know whether the source is reliable, or you should check several sources. In this instance I went to a second website and it said that the latest estimate using satellite technology was 8872 metres.

A case could be made that the tallest mountain isn't Everest but Hawaii's Mauna Kea, which rises to a height of 9500 metres from the seabed. It just happens that most of it is under water and mountains are traditionally measured from sea level. (Of course, this also raises the question of sea level: how do we measure it and is it a constant?) A third contender is Chimborazo in Ecuador. Because of the equatorial bulge, its peak is actually the furthest from the centre of the earth. My point is that you need to know the assumptions and definitions that lie behind the facts.

There are also cultural aspects to this fact. When asked to name the tallest mountain you may get different answers depending upon where you are. In Tibet the local name for the mountain that most of us know as Everest is Chomolungma. In Nepal it is called Sagarmatha. Early British surveyors labelled it Peak XV and in 1856 Surveyor General Andrew Waugh, unaware of local names, named the mountain after his predecessor, George Everest.

4 In 2003 the Sukhothai Thammathirat Open University (STOU) in Thailand, with over 300,000 students, was the biggest university in the world.

I think that this is probably a fact. I looked this up on the ICDL database, which is a reputable source that I trust (http://www-icdl.open.ac.uk/).

However, given the variety of ways that students are counted in different institutions, I would like to know what definition of "student" they were using. For example, if a person is registered simultaneously on four courses, do they count as one or four students? Is this the number of students studying at one point in time, or the number who studied over a given time period such as a year?

To compare STOU with other universities one needs a standard form of measurement. One such system is full-time equivalent students (FTEs). Many of STOU's students are studying part-time so it can be argued that they should only count as a fraction of a student. This fraction would depend upon what proportion of a full-time load they were carrying.

I would also want to check out which other universities had been included and how. For example, if the Chinese Central Radio and Television University and its Regional Television Universities were considered to be one university then it might be bigger than STOU. Then there is the growing number of virtual universities, such as the University of Phoenix, that would need to be looked at.

Feedback to Activity 2
Well, as a busy Minister, I would be pleased to receive the information in the form of a graph. Most people find them easier to digest than tables of figures. They enable you to see patterns and trends in the data. However, I don't think that I would be convinced by a graph that only has three points on it. I would not be confident that you could extend that line into the future to predict a similar rate of growth. I would also be suspicious of the shape of the graph. It looks as though the lower axis has been shortened to make the line on the graph steeper. I would like to see some comparative figures from other institutions. Even if I was impressed by the growth in enrolments, I would like to see some figures on student progress as well.

Feedback to Activity 3
You can see immediately that the graph does not show a pattern of continuous growth over the ten semesters. In fact, there was also a peak in the third semester that was almost as big as that in the tenth semester.

Feedback to Activity 4
This trend line still indicates a general increase in enrolments, but a much smaller one than suggested by just the three most recent semesters.
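To see the size of the difference, here is a minimal sketch of a straight-line trend fitted by ordinary least squares. It rests on two assumptions that the text does not confirm: that Figure 4 plots the semester totals given in Table 3, and that its trend line is a least-squares line. The function name is our own.

```python
# Total enrolments for Semesters 1 to 10 (the Total column of Table 3).
totals = [97, 122, 285, 133, 137, 144, 122, 94, 197, 320]


def least_squares_slope(ys, first_x=1):
    """Slope of the least-squares line through the points (x, y),
    where x counts up from first_x in steps of one semester."""
    xs = range(first_x, first_x + len(ys))
    n = len(ys)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


slope_all = least_squares_slope(totals)                  # all ten semesters
slope_last3 = least_squares_slope(totals[-3:], first_x=8)  # Semesters 8 to 10 only

print(f"Trend over ten semesters:  {slope_all:.1f} extra enrolments per semester")
print(f"Trend over last three:     {slope_last3:.1f} extra enrolments per semester")
```

On these figures the ten-semester line rises by roughly 9 enrolments per semester, against more than 100 per semester if only the last three semesters are used.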

Feedback to Activity 5
The enrolments for Course A increased over the first three semesters but were quite low when it resumed in Semester 9. Course B showed a general decline over time, and Course C had data for only one semester, the most recent one.

Unit 2: What do we mean by quantitative methods?

Unit overview
This unit introduces you to the idea of quantitative research methods by considering the types of questions such methods could be used to answer.

Learning outcomes
When you have worked through this unit, you should be able to:
1 Describe the basic difference between quantitative and qualitative research in terms of their outcomes.
2 List the types of questions that are suited to the quantitative approach.

Introduction
As you have seen earlier in this series, the stages of a research project are typically:
1 Design the research question.
2 Identify the population to be studied.
3 Select the research tools for data collection.
4 Collect the research data.
5 Analyse the research data.
6 Interpret the results.
7 Present the results.
These stages are the same whatever the size of the project, and regardless of whether the project is essentially qualitative or quantitative.

The essential difference between the qualitative and quantitative approaches is in their outputs. Put at its simplest, quantitative research is about measuring things in a way that can give meaningful numerical results. It is what researchers in the physical sciences do all of the time. Qualitative research aims for a subjective understanding of a situation using non-numerical results.

However, while just about any research question in ODL could be formulated in a way that could be answered by using quantitative or qualitative methods, there are certain types of question that lend themselves more to a quantitative approach.

Which questions can be answered with a quantitative approach?


As we outlined in Module 1, the range of topics and areas of inquiry that can involve institutional or practitioner research is huge. It is also the case that just about any research question in ODL could be formulated in a way that could be answered by using quantitative or qualitative approaches, or a combination of the two. However, there are certain types of question that lend themselves more to a quantitative approach and there are certainly situations where numerical answers are expected and are more appropriate.

To give you some idea of the types of question that quantitative methods can be used to answer, we list some examples below. To keep things simple, they all relate to the subject of student age. For each one, we will describe how a quantitative researcher might set about the task and we will introduce some of the technical language involved. (These are the words in bold. Don't worry if you do not understand some of them. Their meaning will become clear later in the module.)

Descriptive questions
Example: How old are our students this year?
Purpose: To provide descriptive data.
Source: Probably our institutional database.
Pre-collection issues: We will need to establish working definitions. For example, do we mean all students registered at a particular date? Is it age on January 1st?
Post-collection issues: We will probably need to group the data into age bands. We will use descriptive statistics such as frequencies. The data will be tabulated in order to condense and summarise the information. We will probably draw a graph or chart based on raw numbers or percentages. We may summarise the age distribution using a measure of central tendency and a measure of dispersion.
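To give a flavour of what these post-collection steps involve, here is a minimal sketch in Python. The ages are invented purely for illustration, and the age bands are our own assumption; a real study would take both from the institutional database and its working definitions.

```python
from collections import Counter
from statistics import mean, median, stdev

# Hypothetical ages, invented purely for illustration.
ages = [19, 22, 24, 25, 27, 31, 33, 34, 38, 41, 45, 52]

BANDS = ("under 25", "25-34", "35-44", "45 and over")


def age_band(age):
    """Assign an age to a band; the band edges are an assumption."""
    if age < 25:
        return "under 25"
    if age < 35:
        return "25-34"
    if age < 45:
        return "35-44"
    return "45 and over"


# Frequencies: how many students fall into each age band.
frequencies = Counter(age_band(a) for a in ages)

for band in BANDS:
    count = frequencies[band]
    print(f"{band:12s} {count:3d} {100 * count / len(ages):5.1f}%")

# A measure of central tendency and a measure of dispersion.
print(f"Mean age:           {mean(ages):.1f}")
print(f"Median age:         {median(ages):.1f}")
print(f"Standard deviation: {stdev(ages):.1f}")
```

The table of counts and percentages is the frequency distribution; the mean and median are measures of central tendency, and the standard deviation is one common measure of dispersion.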

Comparative questions
Example: How is this age distribution different from other institutions like us?
Purpose: To compare, e.g. in this case to compare a number of different student populations.
Source: Comparative data may be gained from government publications, from published research, or from direct collaboration with other institutions both inside and outside one's own country.
Pre-collection issues: We will need to decide which institutions are appropriate for comparisons.
Post-collection issues: Further data manipulation may be necessary if, for example, other institutions have used different age bands. If necessary, any differences that are found between age distributions in the different institutions can be tested for statistical significance.

Trend questions
Example: Is the age distribution of our students changing over time?
Purpose: To identify the long-term direction in which the data is moving, e.g. is the average age growing or declining?
Source: Historical data will need to be extracted from the institutional database.
Pre-collection issues: Which should be our base year?
Post-collection issues: This question requires trend analysis. Regression or other curve-fitting techniques will be used.

Relationship questions
Example: Do young people perform as well as older students?
Purpose: To find out whether one factor (e.g. performance) seems to be linked to another factor (e.g. age). It is important to note that, even when we can show that a link exists, that does not necessarily mean that there is a causal relationship between the two factors.
Source: Additional institutional data will be required on student performance. This might be whether the student dropped out or not (a nominal scale variable), what position in the class they came (an ordinal scale variable) or what exam score they gained (an interval scale variable). (Age is a ratio scale variable.)
Pre-collection issues: Do we need to control for other variables, such as previous educational qualifications, which may disguise the true relationship?
Post-collection issues: Depending upon the type of performance variable used, an appropriate correlation or contingency table technique would be selected.

Explaining questions
Example: Why do so many young people drop out of our courses?
Purpose: To look for the reasons for an effect that we have already observed, e.g. differential drop-out rates.
Source: A postal survey using a self-completion questionnaire might be made of young students who had dropped out, asking them for their reasons.
Pre-collection issues: A random or stratified sample of young students who had dropped out would be drawn from the institutional database. A control group of older students who had dropped out might also be used for comparison purposes.
Post-collection issues: The data would be coded, edited, then keyed into a computer and cleaned. The data would be analysed using frequency tables and cross-tabulations. Statistical analysis of the institutional database might be carried out to help answer this question. Various forms of multi-variate analysis could see whether this is true across the curriculum and whether age appears to be the causal factor.

Attitude questions
Example: Do young people like studying our courses?
Purpose: To find out how people feel about a particular issue.
Source: A course feedback survey across the age range could be carried out as described in the previous question.
Pre-collection issues: What is meant by "like"? We need to operationalize our terms so that people can answer in numerical terms.
Post-collection issues: Cross-tabulations and correlational techniques could be used to see whether there is a relationship between age and attitudes to some or all of the courses.

Predictive questions
Example: What will happen if we become more attractive to young people?
Purpose: What are the implications for the institution and its various subsystems if this happens?
Source: The institutional database.
Pre-collection issues: None.
Post-collection issues: Statistical modelling techniques can be used to predict effects on course numbers in different areas, drop-out rates, the demand on financial assistance funds, etc.

Summary
In this unit, you have explored some of the types of question that can be answered using quantitative techniques.

The rest of the module


This is the end of the introductory part of the module. The rest of the module has been structured in a particular way that we hope will improve your learning and retain your interest. It is in three broad sections:
1 The first is about you carrying out secondary analysis of external data.
2 The second chiefly concerns institutional data that is based on information collected on a regular basis.
3 The third and last section is where you collect data for a specific research purpose. It will help you to plan your own study in which you wish to collect quantitative data.
We introduce particular research methods and statistical techniques as we go along, whenever they become relevant.

The approach is meant to be very interactive. You will be guided through a set of activities using real data in a spreadsheet software package called Excel. If you do not have access to Excel, you should be able to carry out the calculations by hand, or by using a calculator. If you use a statistical software package such as SPSS, you may wish to use that instead.

Unit 3: Analysing other people's data

Unit overview
This unit is designed to introduce you to:
- the basics of using Excel
- some simple methods of analysing student data using Excel.

Learning outcomes
When you have worked through this unit, you should be able to:
1 Calculate percentages with and without Excel.
2 Calculate totals using Excel.
3 Copy formulae in Excel by dragging.
4 Copy and paste values in Excel.
5 Copy and paste formats in Excel.
6 Sort data in Excel.
7 Apply these methods to a case study on enrolments and exam passes.

Introduction
While most books on research methods concentrate on how to collect and analyse your own data, a large part of your work is likely to involve dealing with data that has been collected and compiled by others. In this section we are going to look at ways to interpret and further analyse such information. We begin with a basic example for you to work through. Please remember that the actual numbers are not important here. It is more important to concentrate on the thought processes involved and the techniques that you will be able to use elsewhere.

The data
One of our pen portrait learners in Module 1 and the User Guide was Abida Quuyaam, a researcher at Auranzeb Open University (AOU). She is particularly interested in improving access for women and so she is pleased when the Vice-chancellor asks for a short report on how well AOU is doing in this area compared to the rest of higher education. Abida begins by consulting a table of figures published by the Ministry of Education for 2003 that covers the country's five universities, four teacher training colleges and two open universities. This is shown below as Table 4.
Table 4 Students registered on higher education courses by institution and gender (2003)

Registered students            Men (n)   Women (n)
University 1                     2387       1432
University 2                     1683        654
University 3                     1004        175
University 4                     1287        776
University 5                     3567       1678
Teacher Training College 1       2231       2487
Teacher Training College 2       1444       1987
Teacher Training College 3        176        776
Teacher Training College 4        856       1453
AOU                              2879       3556
BOU                              1236        984
Totals                          18750      15958

Raw numbers
We will start by looking at the raw numbers, that is, numbers that have not been processed in any way. Table 4 contains raw numbers. This is signified by the letter n at the top of each column. Sometimes numbers will contain commas to indicate thousands, millions, etc. For example, 2357854 could be written as 2,357,854. If numbers are very big they may be given in other units such as thousands. If the column heading had been n (000s), the first number would have been written as 2.387.
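If you ever need to produce these two presentations yourself outside Excel, they take one line each in Python. This is just a small sketch; the variable names are our own.

```python
# A large raw number written with commas to mark thousands and millions.
big_number = 2357854
with_commas = f"{big_number:,}"
print(with_commas)

# The first figure in Table 4 (men at University 1) expressed in thousands,
# as it would appear under a column headed "n (000s)".
men_university_1 = 2387
in_thousands = men_university_1 / 1000
print(in_thousands)
```

The first line printed is 2,357,854 and the second is 2.387, matching the examples in the text.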

Activity 1

1 min

How many women were registered at AOU in 2003?

The feedback to this activity is at the end of the unit

Percentages
I think you will agree that Table 4 is a very compact way of displaying a lot of data. It tells you about the gender of thousands of students spread across eleven institutions. As the activity showed, it is easy to look up certain kinds of data. However, it is difficult to get an overall picture of what is really going on by just looking at the figures. We are going to make things much easier by using percentages. Percentages are probably the most used statistical technique in institutional and practitioner research, so you must feel comfortable with them.

Percentages enable us to make comparisons between groups. Let's take the example of a group of 25 people, 5 of whom are women and 20 of whom are men (Table 5). The second column of the table shows us the raw numbers. In the third column, we have expressed the raw numbers as fractions of the total, e.g. there are 5 women out of 25 people. However, the third column still expresses the data in terms of the raw numbers. What we need is a way of showing the proportion of women in a standard way so that we can make comparisons with other groups. We do this using percentages.

Fractions remain the same if you multiply the top figure and the bottom figure by the same number, or divide the top and bottom figure by the same number. In the fourth column, we have multiplied both the top and the bottom number by four. This figure (20/100) is called 20 per cent (or 20%). 20% tells us the proportion of the group that is women, and it tells us in a way that gets us away from the raw data. It says that 20 in every 100 are women.
Percentages

Why are percentages so good? Because they allow us to make instant comparisons. Imagine that you had two groups of people. In the first there were 18 women out of a total of 72. In the second there were 123 women out of 492. Which group contains a higher proportion of women? With percentages you can say that the proportion is the same in each group, i.e. 25%.
Table 5 Expressing a group as percentages

Group     n     Fraction of total   Fraction of 100
Women     5     5 out of 25         20 out of 100 = 20%
Men       20    20 out of 25        80 out of 100 = 80%
Totals    25    25 out of 25        100 out of 100 = 100%

Formula for percentages


A general formula to calculate a percentage is as follows.

Percentage = (Number in group ÷ Total number) × 100

So, for our example, we have:

Percentage of women = (5 ÷ 25) × 100 = 20%
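The same formula can be written as a short Python function if you prefer to check your working that way. This is only a sketch, and the function name is our own. It multiplies by 100 before dividing, which gives the same answer as the formula above.

```python
def percentage(number_in_group, total_number):
    """Number in group as a percentage of the total number."""
    return 100 * number_in_group / total_number


# The group of 25 people from Table 5.
print(percentage(5, 25))     # women: 20.0
print(percentage(20, 25))    # men: 80.0

# The two groups compared in the text: the proportion of women is the same.
print(percentage(18, 72))    # 25.0
print(percentage(123, 492))  # 25.0
```

Note how the last two results make the comparison instant, even though the raw numbers (18 out of 72 and 123 out of 492) look quite different.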

We are now going to use the data from Table 4 in a series of activities that will simulate the sorts of processes that Abida might go through when compiling her report on the situation of women at AOU.

Excel for beginners: Calculating totals


In this section you will work through a number of activities that will introduce you to some of the basic techniques for using Excel.

Activity 2

5 mins

Creating a copy of a worksheet
1 Open the Excel workbook Women (see the Resources File). It should open on Sheet 1, which contains Table 4. If it does not, just click the Sheet 1 tab at the bottom of the screen.
2 Copy and paste Table 4 into Sheet 2 by following these instructions:
- highlight the whole of Sheet 1 by clicking on the cell in the top left-hand corner (that is, the cell above 1 and to the left of A)
- from the menu at the top of the screen select Edit > Copy (i.e. first select Edit and then select Copy)
- click on Sheet 2 at the bottom to open it
- click on cell A1
- from the menu select Edit > Paste.
You will now have a copy of the data on Sheet 2. If you corrupt it in any way you can always go back to Sheet 1 and copy and paste another copy.

There is no feedback to this activity

Excel note: Checking screen settings
You will need to check your Excel screen settings in order to make sure that your screen is displaying both the Formula bar and the Standard toolbar.

Formula bar
Select View from the menu items at the top of the screen. When the options drop down there should be a tick by the side of Formula bar. If there is no tick, scroll down to Formula bar then release the mouse button. This will add a tick.

Standard toolbar
Select View from the menu items at the top of the screen. When the options drop down, scroll to Toolbars. This will produce another menu where there should be a tick by the side of both Standard and Formatting. If not, drag to each in turn and release the mouse button.

Adding numbers in rows and columns


To start, we are going to calculate the number of students at each institution.

Activity 3

5 mins

Adding two columns of numbers
In this activity you will insert a formula to calculate the total number of students at each of the institutions in Table 4. The process is in two steps.

Step 1: Create the formula in one cell
1 Make sure that you are in Sheet 2, where you have your working copy of Sheet 1.
2 Click on cell D6 and type in the formula = B6 + C6. (This will appear in the Formula bar above the worksheet as you type.)
3 Click the green arrow to the left of the Formula bar. Cell D6 should now contain 10804. (W1) (Check your answers as you go against those on the Model worksheet. These answers are numbered W1, W2, etc.)

Step 2: Copy the formula to all the other cells
4 Click and drag from D6 to D17 so as to highlight the cells to be filled. (W1a)
5 Then from the menu bar use Edit > Fill > Down. This will calculate the total for all the institutions. (W2)

There is no feedback to this activity

Excel note: Writing your own formulae and using the sigma symbol

Writing your own formulae
There are quicker ways to add cells together. You can write your own formula, like =SUM(C3:C45), that will produce a total for the array of cells specified.

The sigma symbol
Or, you can use the sigma symbol, which appears as Σ in the Standard toolbar. In our example, if you had clicked on cell D6 then clicked on the sigma symbol, Excel would have guessed and entered =SUM(B6:C6). (If it guesses wrong, you just alter it manually.) Click a second time and Excel will complete the sum.

You can copy a formula in a given cell to a range of other cells as follows:
- grab the cell containing the formula by putting the cursor over the bottom right-hand corner of the cell (a square with arrow heads in opposite corners will appear)
- drag the square over the cells that you wish to fill with the same formula.
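If you do not have Excel to hand, the row and column totals can also be worked out with a short Python script. The sketch below uses the figures as printed in Table 4; the variable names are our own, and the values in your workbook may differ from the printed table.

```python
# Registered students by institution, transcribed from Table 4 as (men, women).
table_4 = {
    "University 1": (2387, 1432),
    "University 2": (1683, 654),
    "University 3": (1004, 175),
    "University 4": (1287, 776),
    "University 5": (3567, 1678),
    "Teacher Training College 1": (2231, 2487),
    "Teacher Training College 2": (1444, 1987),
    "Teacher Training College 3": (176, 776),
    "Teacher Training College 4": (856, 1453),
    "AOU": (2879, 3556),
    "BOU": (1236, 984),
}

# Row totals: the equivalent of adding the Men and Women cells
# in each row and filling the formula down the column.
row_totals = {name: men + women for name, (men, women) in table_4.items()}

# Column totals: the equivalent of a SUM over each column.
total_men = sum(men for men, _ in table_4.values())
total_women = sum(women for _, women in table_4.values())

for name, total in row_totals.items():
    print(f"{name:28s} {total:6d}")
print(f"{'Totals':28s} {total_men + total_women:6d}")
```

The column totals should agree with the Totals row of Table 4, which is a useful check that the figures have been copied correctly.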

Calculating percentages
We are now going to calculate the percentage of women at each institution.

Activity 4

2 mins

Calculating a percentage
We will start by calculating the percentage of women at University 1.
1 Click on cell E6 and type in = C6/D6*100. Press Enter. (W2a and W3)
2 You should see the result (19.72) in cell E6.

There is no feedback to this activity

Excel note: / and *
You should have recognised the formula that you typed in from our earlier explanation of percentages. In Excel:
- / means divide
- * means multiply.
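Python uses the same two symbols, so the Excel formula carries over almost unchanged. The sketch below applies it to the figures printed in Table 4 for University 1 and AOU. The function name is our own, and the results come from the printed table, so they will not necessarily match the values held in your workbook.

```python
def percent_women(men, women):
    """Women as a percentage of all students: women / total * 100."""
    total = men + women
    return women / total * 100


# Figures as printed in Table 4.
university_1 = percent_women(men=2387, women=1432)
aou = percent_women(men=2879, women=3556)

print(f"University 1: {university_1:.2f}%")
print(f"AOU:          {aou:.2f}%")
```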

Activity 5

5 mins

Copying your percentage formula

Copying the formula
1 Enter the results for all of the other institutions by using Edit > Fill > Down as you did before.
2 You now have the percentages but you probably have some very long figures in the cells, since Excel does not know the degree of precision that you wish to see displayed. (W3a and W4)

Specifying the degree of precision
3 We will display these percentages as whole numbers.
4 Highlight cells E6 to E17 (W4a) then use Format > Cells > Number > Decimal places > Zero. This will simplify the column to whole numbers. (W5)

There is no feedback to this activity

Excel note: How Excel stores and displays numbers
Excel stores numbers to about 15 significant digits, however few decimal places you choose to display. Even when you display a number to zero decimal places, that number is still stored in its original form.
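The same distinction between a stored value and a displayed value applies in most software. Here is a minimal Python sketch, using the AOU figures printed in Table 4; the variable names are our own.

```python
# Percentage of women at AOU, using the figures printed in Table 4.
stored = 3556 / (2879 + 3556) * 100

# Formatting to zero decimal places changes only what is shown.
displayed = f"{stored:.0f}"

print(displayed)  # the rounded figure you would put in a report
print(stored)     # the full value that later calculations still use
```

Rounding for display is therefore safe to do at any point: calculations that use the stored value are not affected by it.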

Conclusion
As a result of this analysis, Abida is now in a position to offer preliminary findings to her Vice-chancellor. In a short note, she might say something like this:

"Slightly over a half (55%) of AOU students are women. This compares favourably with the overall figure of 40% for higher education generally in our country."

What more could she tell the Vice-chancellor? Vice-chancellors always like to know how well they are doing as an individual institution. League tables are now a very common method of ranking institutions according to a variety of performance indicators. So, we will now try to make some comparisons between the various institutions.

Activity 6

10 mins

Pasting a copy of a block of cells
When you copy a block of cells, you can choose which characteristics you wish to be copied, e.g. the values in the cells, the formulae, the formatting of the cells. In this activity you are going to copy rows 4 to 16 to a position lower down on your worksheet. In doing this, you will copy the values and the format, but no other cell characteristics. Your version of Excel may allow you to do this in one step. The instructions below are for the two-step process.

Step 1: Copy rows 4-16 with their values
1 Highlight the whole of rows 4 through 16 by clicking and dragging from the 4 of Row 4 to the 16 of Row 16 (W5a), then select Edit > Copy.
2 Click on cell A20. This is where you are going to paste the rows.
3 Click on Edit > Paste special and select Values.
4 Click OK.

Step 2: Copy the formats
5 Repeat steps 2-4, but this time select Formats.
You now have a copy of rows 4-16, but without their formulae.

There is no feedback to this activity

Excel note: Paste special
This is the first time that we have used Paste special. We used it because an ordinary Paste would have pasted in the formulae that were in the copied cells. The cells that those formulae refer to would then have changed, and so would the results. Try Copy and Paste in a spare part of the worksheet to the right and you will see the difference. Here we just want the values (the results produced by the formulae) and the formats from the original cells.

Sorting a list of numbers


We now wish to re-order the list of institutions by their percentages of women students. The next activity shows you how to do this.

Activity 7

10 mins

Sorting the institutions

Remove redundant data
1 First we will get rid of the raw data. Click/Hold on the centre of cell B20 then drag the cursor to cell D32 to highlight a rectangle of cells. (W6a)
2 Use Edit > Delete to delete the data in the cells. At the same time, use the option Shift cells left. (W7)

Sort the remaining data
3 Now Click/Hold on the centre of A22 then drag the cursor to cell B32 to highlight a rectangle of cells. (W7a)
4 Now use Data > Sort to arrange the institutions in order by % women students. You can choose ascending or descending order. (W8)

By examining the sorted data you can see that AOU was the fourth most successful institution in attracting women and it outperformed all of the other universities.

There is no feedback to this activity
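For readers who want to see the same ranking produced outside Excel, here is a minimal Python sketch. It uses the figures as printed in Table 4 and our own variable names, so treat it as an illustration of the sorting step rather than a replacement for the workbook.

```python
# Registered students by institution, transcribed from Table 4 as (men, women).
table_4 = {
    "University 1": (2387, 1432),
    "University 2": (1683, 654),
    "University 3": (1004, 175),
    "University 4": (1287, 776),
    "University 5": (3567, 1678),
    "Teacher Training College 1": (2231, 2487),
    "Teacher Training College 2": (1444, 1987),
    "Teacher Training College 3": (176, 776),
    "Teacher Training College 4": (856, 1453),
    "AOU": (2879, 3556),
    "BOU": (1236, 984),
}

# Percentage of women at each institution.
percent_women = {
    name: 100 * women / (men + women) for name, (men, women) in table_4.items()
}

# Sort in descending order of percentage, highest first.
ranking = sorted(percent_women.items(), key=lambda item: item[1], reverse=True)

for position, (name, pct) in enumerate(ranking, start=1):
    print(f"{position:2d} {name:28s} {pct:3.0f}%")

# Position of AOU in the league table (1 = highest percentage of women).
aou_position = [name for name, _ in ranking].index("AOU") + 1
print("AOU position:", aou_position)
```

Changing reverse=True to reverse=False gives ascending order, just like the choice offered by the Sort dialogue in Excel.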

Activity 8

15 mins

Sorting by raw data
In this activity you will again sort the institutions, but this time by number of women rather than by the percentage of women.
1 Sort the data using cells C6 to C16.
2 What do you conclude from the new rank ordering?

The feedback to this activity is at the end of the unit

More practice with percentages


Since it is important to be very comfortable with percentages, I have added an extra activity here to give you more practice.

Activity 9

15 mins

More percentages
The Vice-chancellor is fairly happy with AOU's numbers of women students, but she wants three more figures. She wants to know the percentage of women students among:
1 All conventional university students (Universities 1-5).
2 All teacher training college students (Teacher Training Colleges 1-4).
3 All open university students (AOU and BOU).
Calculate these by hand, with a calculator or with Excel.

The feedback to this activity is at the end of the unit.

We are now going to move on to another example using a different set of data. You may remember from Module 1 that one of our pen portraits was Agatha, who has been asked by the Minister of Education of the Republic of Nuime to look at the case for using ODL methods to provide open schooling rather than traditional classrooms in order to achieve universal schooling. The Minister wants to know the extent of the needs and whether open schooling would be an effective and cost-effective way of addressing this issue.

Table 6 is a made-up, but pretty realistic, set of figures that might have been produced by an education ministry. It concerns ten open schools that we have labelled from A to J and gives the numbers of registrations and passes on the basic maths course M101 over a five-year period. (Let us imagine that M101 is a very standardised course using ODL material. No matter which open school the pupils attended, they would use exactly the same learning materials.)

Module A3 Getting and analysing quantitative data

Table 6 Students in ten open schools on the basic maths course M101 (1998-2002)

Registrations
Open school    1998    1999    2000    2001    2002    Total
A               286     272     298     324     355     1535
B               354     322     377     381     385     1818
C               122     140     123     124     125      635
D               435     432     422     411     276     1976
E               143     156     167     169     170      805
F               298     304     325     329     332     1588
G               467     488     476     506     523     2460
H               432     411     439     409     485     2176
I               306     230     210     208     195     1149
J               192     222     216     218     220     1068
Total          3035    2977    3053    3078    3066    15209

Passes
Open school    1998    1999    2000    2001    2002    Total
A               174     183     194     198     208      956
B               276     302     321     324     335     1558
C                54      47      62      59      74      296
D               289     294     311     325     251     1470
E               111     128     136     137     140      652
F               213     224     237     242     256     1172
G               376     376     392     423     431     1998
H               243     303     322     325     398     1591
I               233     225     206     200     190     1054
J               154     162     175     173     177      840
Total          2123    2244    2355    2405    2460    11587

Activity 10

1 mins

Reading data from tables
This activity is just to help you to check that you can read data accurately from tables. Use Table 6 to answer these two questions:
1 How many registrations were there on M101 at open school F in 2001?
2 How many passes on M101 were there in total in 1999?

The feedback to this activity is at the end of the unit


The open schools case study


Exploring the open school data
Firstly, I want you to put yourself in the position of Agatha. Imagine that as part of her research she has been given Table 6 and has been asked to summarise the results for the Minister. He wants to know whether the M101 course has been a success or not.

Well, she could of course simply say that almost twelve thousand pupils have passed M101 in the last five years through the open school programme. That is a summary and in some circumstances this might be sufficient. However, I don't think that the Minister would be satisfied with this summary. A thorough practitioner researcher would do some further analysis, even if it were never actually used by the Minister. She would ask herself questions such as the following:
• Are registrations going up over the five years?
• Are pass rates going up?
• Are some of the schools doing better than others?
You will explore these questions in the next few activities.

Activity 11

15 mins

Prepare the M101 workbook
In this activity you will prepare the M101 workbook, ready for the following activities.
1 Open the Excel workbook M101. It will open on Sheet 1, which contains Table 6. If it does not open at Sheet 1, click the Sheet 1 tab at the bottom of the worksheet.
2 Copy and paste the table into Sheet 2, a new worksheet, just as you did at the start of the activities on the Women worksheet. It is important to copy the tables so that they are in exactly the same cells as they were in Sheet 1, e.g. Registrations should appear in cell A5. (If you put the data into different cells, then you will not be able to follow the instructions for the next few activities.)
3 Check your answers as you go against those on the model worksheet. These answers are numbered M1, M2, etc.
4 The feedback to this activity is M1 on the model worksheet.
5 Do all of your own calculations on Sheet 2.

There is no feedback to this activity


Are registrations going up over the five years?


Firstly we are going to look at whether registrations on M101 have been going up or not. You can of course just look at the total figures in each of the five years in Row 18, but it is hard to get a mental image of what is happening with such big raw numbers. To make it easier we are going to index the numbers.

Statistical note: Indexing
Indexing is very similar to calculating percentages. We are going to select 1998 as our base year and give it the value of 100. Then for each of the other four years we are going to calculate their totals as a percentage of the 1998 total. We could have picked any year as our base year and we could have used some other figure than 100 as the starting point.

Activity 12

10 mins

Indexing the registrations (base year 1998)
1 First of all, type in 100 in cell B19. We are going to use this as our base.
2 Then in cell C19, type the formula =(C18/$B$18*100) and press Enter. The result should be 98. (M1)
3 We now want to apply this formula to the other cells, so click and drag from C19 to F19 to highlight this block of cells. Then use Edit > Fill > Right. (M2)

The feedback to this activity is at the end of the unit


Statistical note: Choosing a baseline year It is clearly critical which year you pick as your baseline year. Watch out for cases where people have picked a particular baseline point to exaggerate growth, or lack of it.

Excel note: the $ sign
You can use dollar signs ($) to make a formula's reference to a cell into a constant, i.e. to stop it changing as you drag a formula. This means that in our example, when we are filling right, the numerator will change from C18 to D18 to E18 etc., but the divisor will be cell B18 in all cases.


Is the number of passes going up?


We now want to explore what is happening to the number of passes.

Activity 13

10 mins

Apply the same procedure that you used in the Indexing the registrations activity, but this time apply it to the number of passes.

The feedback to this activity is at the end of the unit

What about pass rates?


Statistical note: Pass rates
A pass rate is just another percentage. It converts a given number of passes into a number of passes per 100 students. The formula to calculate a pass rate is:

$$\text{Pass rate} = \frac{\text{Number of passes}}{\text{Number of registrations}} \times 100$$

For example, open school A had 174 passes from 286 registrations in 1998 (Table 6), so:

$$\text{Pass rate} = \frac{174}{286} \times 100 \approx 61\%$$

Now you can practise some pass rates.

Activity 14

10 mins

M101 pass rates
You are now going to calculate the overall pass rate for M101 in each of the years.
1 In cell B38 type the formula =B34/B18*100.
2 Now apply this formula to the other cells, so click and drag from B38 to F38 to highlight this block of cells. Then use Edit > Fill > Right. (M3)

The feedback to this activity is at the end of the unit

Are some of the schools doing better than others?


Now we are going to look at whether some open schools are doing better than others. Firstly we must decide on what we mean by "doing better". I can think of at least six ways in which we can interpret this question, even with such a basic set of data. In the activities that follow we are going to look at all six.

In real life somebody senior to you might have strong views as to which of these measures, if any, is the most appropriate to use. You, on the other hand, as a conscientious researcher, should be aware of the other possibilities. You should be prepared to make the other possibilities known and to argue or negotiate over their various merits.

Three of the proposed measures are related to size, or the number of registrations:
• Total registrations over the five-year period (TotReg). Which school has offered the most opportunities for M101 students over the period?
• Registrations in the most recent year (RecReg). Which school is currently teaching the most M101 students?
• Growth in registrations over the five-year period (GroReg). Which school seems to be growing fastest in terms of M101 registrations?

The other three refer to performance, or pass rates:
• The average pass rate over the five-year period (AvePass). Which school has averaged the highest M101 pass rate over the period?
• Pass rate in the most recent year (RecPass). Which school currently has the highest M101 pass rate?
• Changes in pass rates over the five-year period (ChaPass). In which school is the M101 pass rate growing fastest?
Statistical note: Variable names
You will see that we have given abbreviated titles for each of the measures, e.g. AvePass. These are variable names and they are a useful shorthand in Excel and packages such as SPSS.
The variable names can be anything, but it is best to choose a name that will remind you what it stands for and that makes sense to the next person to use your data. Some types of software do not like spaces in variable names, so it is best to avoid them, or to use the underscore, for example Ave_Pass. Also the software may be case sensitive. If this is so, and you ask it to search for ave_pass, it will not find Ave_Pass. You might want to avoid this by consistently using upper OR lower case (e.g. AVE_PASS or ave_pass).
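To see why consistent case matters, here is a minimal sketch in Python. Python is my own choice for illustration (the course itself works in Excel), and the names `measures` and `normalise_names` are hypothetical. The two figures are School A's values from Tables 13 and 14.

```python
# Hypothetical store of two measures for School A (values from Tables 13 and 14).
measures = {"Ave_Pass": 62, "Rec_Pass": 59}

# Python dictionary keys are case sensitive, so this lookup does not succeed.
found_directly = "ave_pass" in measures


def normalise_names(table):
    """Return a copy of the table with every variable name in lower case."""
    return {name.lower(): value for name, value in table.items()}


tidy = normalise_names(measures)
found_after_tidying = "ave_pass" in tidy

print(found_directly, found_after_tidying)  # False True
```

Converting every name to one case before searching is only one possible convention; what matters is that you apply the same one throughout.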

Total registrations
We are now going to calculate the values of these six measures, beginning with total registrations.


Activity 15

15 mins

Calculating total registrations

Step 1: Delete the cells that you don't need
1 Click and drag to highlight the whole of rows 7 to 17, then Edit > Copy.
2 Click on cell A40, then Edit > Paste. (M4)
3 Click and drag from B40 to F50 to highlight a block of cells (M4a). Then use Edit > Delete and OK the Shift cells left option. (M5)

Step 2: Sort the remaining cells
4 Highlight all the remaining cells by clicking and dragging from A40 to B50. (M5a)
5 Go to Data > Sort. In the Options box select Sort by total (rather than by open school) and select Descending. Ensure that My list has header row is checked. Then click OK. (M6)

Step 3: Show the ranking
6 Type in the numbers 1 to 10 in cells C41 to C50. (M7) This has now given you a rank ordering. School G had the most registrations and is ranked 1, School H is next and ranked 2, etc.
7 Type in TotReg in cell C40 to keep track of what these figures are.

Step 4: Put the data back into school order
8 For convenience we will now put them back in school order. Highlight the block from A40 to C50, and then go to Data > Sort. In the Options box select Sort by open school and Ascending. Ensure that My list has header row is checked. Click OK. (M8)
9 So we end up with Table 7 below. This shows, for example, that School H had the second highest number of M101 registrations over the period.

There is no feedback to this activity


Table 7 Open School M101 total registrations 1998-2002 with rank ordering
Open School    Total    Rank
A               1535       6
B               1818       4
C                635      10
D               1976       3
E                805       9
F               1588       5
G               2460       1
H               2176       2
I               1149       7
J               1068       8
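The sort, number and re-sort routine used to build Table 7 can also be sketched in a few lines of Python. This is an illustration of the same idea rather than part of the Excel activity; the function name `rank_descending` is my own, and the sketch assumes there are no tied totals (true for Table 7).

```python
# Total M101 registrations 1998-2002 for each open school, as given in Table 7.
total_registrations = {
    "A": 1535, "B": 1818, "C": 635, "D": 1976, "E": 805,
    "F": 1588, "G": 2460, "H": 2176, "I": 1149, "J": 1068,
}


def rank_descending(values):
    """Give rank 1 to the largest value; assumes no two values are equal."""
    # Sort the schools from largest to smallest figure (the Data > Sort step).
    ordered = sorted(values, key=values.get, reverse=True)
    # Number them 1, 2, 3, ... (typing 1 to 10 alongside the sorted list).
    ranks = {school: position for position, school in enumerate(ordered, start=1)}
    # Put the results back into school order for convenience.
    return {school: ranks[school] for school in sorted(values)}


tot_reg = rank_descending(total_registrations)
print(tot_reg)
```

The three comments inside the function correspond to Steps 2, 3 and 4 of the activity.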

Excel note: Typing in numerical sequences
There is a useful way of getting Excel to type in numerical sequences for you. For example, when you needed to type in 1, 2, 3, ... 10 for the TotReg activity, you could have done as follows:
• Type 1 in cell C41 and 2 in cell C42.
• Select the two cells.
• Drag from the bottom right corner of C42 to C50. Excel will complete the sequence.
The method will also work for sequences with fixed gaps, e.g. 2, 4, 6, 8, 10. It will not work for letters.

Registrations in 2002
Next we need to do exactly the same but for registrations in 2002 (RecReg), rather than total registrations.

Activity 16

15 mins

Registrations in 2002
Repeat the process that you used in the Calculating total registrations activity, but this time do it for registrations in 2002, i.e. produce a table to show the schools in rank order by registrations in 2002.

The feedback to this activity is at the end of the unit


Growth in registrations
The third measure concerns growth in registrations. We have already seen that there has been very little overall change but now we want to look at the change in individual schools.

Activity 17

15 mins

Growth in registrations Use Excel to produce indexed figures for each school similar to those produced in M2 for the total figures.

The feedback to this activity is at the end of the unit

Interpreting the figures

The figures in the table (Table 11 in the feedback to Activity 17) are very interesting. We had already noted that the overall situation had changed little, but there had been a lot of fluctuation in certain individual schools. You might be able to look at Table 11 and see the patterns. However, most people prefer pictures to numbers, so we are going to construct a graph. The following activity will take you through the steps.

Activity 18

15 mins

Constructing a graph of growth in registrations
1 First, we need to tell Excel that the year numbers (1998, 1999, ...) are to be treated as labels and not as numbers. To do this, put an apostrophe before each year. So 1998 becomes '1998, 1999 becomes '1999, etc.
2 Highlight the whole block of cells from Open School to 220 (cells A7 to F17).
3 Click Insert > Chart. Excel will now take you through the stages of Chart wizard. These are straightforward but the details may vary between different versions of Excel.
4 Under Chart type select Line. Select the first of the Chart sub-types. Click Next.
5 Under Chart source data select Series in Rows. Click Next.
6 Under Chart options: Chart title: type in Open School M101 registrations 1998-2002 (Base = 1998 = 100).
7 Under Category (X) axis type in Year. Click Next.
8 Under Chart location click on As new sheet and type in a name for your chart. Click Finish.


You should now have produced a graph like Figure 6. (If not, see M101 and Figure 6 that was produced from it and saved as a new worksheet.) The figure is a bit complicated because it has data for five years for ten schools. However, look at it for a few moments and see what patterns emerge.

Figure 6 Open school M101 registrations 1998-2002 (Base: 1998 = 100)
[Line chart: indexed registrations (vertical axis, 0 to 140) plotted against Year (1998 to 2002), with one line for each open school.]

(Note: a larger version of this chart appears in Workbook M101, Figure 6.)
Commentary

While we have seen little change in the overall registration figures, no individual school seems to have behaved in this way. School C comes the closest but it actually grew in 1999 when the total went down. Eight out of ten schools show growth in registrations over the period. Schools D and I show a marked decline in registrations. With D the major decline took place in 2002 but for I it was in 1999.

Growth patterns using a single figure


Now we want to summarise these growth patterns using just a single figure. Here we are going to use the 2002 index figure GroReg as our measure. So for open school A the value will be 124 (the 2002 figure is 24% greater than the 1998 figure) and for open school D it will be 63 (the 2002 figure is 63% of the 1998 figure).
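The two GroReg values quoted above can be checked with a short Python sketch. The function names are my own invention, and the registration figures are those given for schools A and D in Table 6. Subtracting 100 from the index turns it into a percentage change.

```python
# 1998 and 2002 registrations for schools A and D, taken from Table 6.
registrations_1998 = {"A": 286, "D": 435}
registrations_2002 = {"A": 355, "D": 276}


def growth_index(first_year, last_year):
    """Index the last year against the first year (first year = 100)."""
    return round(last_year / first_year * 100)


def percentage_change(index):
    """Read an index figure as a percentage change from the base year."""
    return index - 100


gro_reg = {
    school: growth_index(registrations_1998[school], registrations_2002[school])
    for school in registrations_1998
}
change = {school: percentage_change(index) for school, index in gro_reg.items()}

print(gro_reg)  # {'A': 124, 'D': 63}
print(change)   # {'A': 24, 'D': -37}
```

So an index of 124 is a rise of 24%, and an index of 63 is a fall of 37%.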


Activity 19

10 mins

Ranking by growth Sort and rank the ten schools in terms of their growth (GroReg).

The feedback to this activity is at the end of the unit

Pass rates
We turn now to pass rates and we begin with each school's average pass rate for M101 (AvePass). This is defined as the total number of passes divided by the total number of registrations, multiplied by 100.

Activity 20

10 mins

Pass rates
You should be able to calculate the average pass rate (AvePass) for each school and then rank the schools, as we did for total registrations (if not, see M13a-c).

The feedback to this activity is at the end of the unit

Activity 21

15 mins

Recent pass rates Now do the same for the most recent pass rates (RecPass).

The feedback to this activity is at the end of the unit

The sixth and last measure concerns changes in pass rates. We have already seen that there has been a year-on-year improvement in the overall pass rate for M101, but now we consider individual schools.

Activity 22

15 mins

Pass rates
1 Calculate indexed pass rates as we did for registration numbers.
2 Produce a line graph similar to Figure 6, but this time for indexed pass rates.
3 Study the new graph and prepare a short commentary on it.

The feedback to this activity is at the end of the unit


Changes in pass rates using a single figure


We are going to use the 2002 index figure as our measure of ChaPass. This is our single figure summary of the changes in pass rates that have been happening in individual schools. For example, the indexed pass rate for Open School A in 2002 was 96. This means that the pass rate in 2002 was 96% of that in 1998.
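As a cross-check on how such a figure arises, the sketch below works out the 2002 indexed pass rate for two schools in Python. It is an illustration only: the layout of `table_6_extract` and the function names are hypothetical, while the numbers are the 1998 and 2002 registrations and passes for schools A and H from Table 6.

```python
# (1998 figure, 2002 figure) pairs for two schools, copied from Table 6.
table_6_extract = {
    "A": {"registrations": (286, 355), "passes": (174, 208)},
    "H": {"registrations": (432, 485), "passes": (243, 398)},
}


def pass_rate(passes, registrations):
    """Passes per 100 registered students."""
    return passes / registrations * 100


def indexed_pass_rate(school):
    """2002 pass rate expressed as an index with the 1998 pass rate = 100."""
    first_reg, last_reg = school["registrations"]
    first_pass, last_pass = school["passes"]
    base_rate = pass_rate(first_pass, first_reg)
    recent_rate = pass_rate(last_pass, last_reg)
    return round(recent_rate / base_rate * 100)


cha_pass = {name: indexed_pass_rate(figures) for name, figures in table_6_extract.items()}
print(cha_pass)  # {'A': 96, 'H': 146}
```

Note that the index is calculated from the two pass rates, not from the raw numbers of passes.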

Activity 23

10 mins

Changes in pass rates
1 Take the 2002 indexed figures for pass rates for each school, similar to those produced for GroReg above.
2 Then rank order them.

The feedback to this activity is at the end of the unit

Combining the measures


Now we are going to put all six of the measures together. In each case we are just going to use the rankings that we have calculated rather than the actual figures.

Activity 24

15 mins

Combining the measures
1 Cut and paste special the six sets of results to create a composite table as in Table 8. (M17)
2 Faced with the question "Which school is doing best with M101?", what would be your answer? Use the results in Table 8 and other results that might be useful.

The feedback to this activity is at the end of the unit


Table 8 The rankings for 10 open schools on six performance indicators for M101
               Total           Recent          Growth in       Average      Recent       Change in
               registrations   registrations   registrations   pass rate    pass rate    pass rates
Open school    (TotReg)        (RecReg)        (GroReg)        (AvePass)    (RecPass)    (ChaPass)
A                  6               4               1               9           10           10
B                  4               3               7               2            3            5
C                 10              10               8              10            9            3
D                  3               6              10               6            2            2
E                  9               9               2               4            5            7
F                  5               5               6               7            8            6
G                  1               1               5               3            4            8
H                  2               2               4               8            6            1
I                  7               8               9               1            1            4
J                  8               7               3               5            7            9
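One way to explore a composite table like Table 8 is to ask, for each school, which measure or measures show it in the best light. The Python sketch below does this; it is my own illustration, and the names `MEASURES`, `rankings` and `best_measures` are hypothetical. The ranks are copied from Table 8.

```python
# Column order used in Table 8.
MEASURES = ["TotReg", "RecReg", "GroReg", "AvePass", "RecPass", "ChaPass"]

# Ranks for each open school on the six measures, copied from Table 8.
rankings = {
    "A": [6, 4, 1, 9, 10, 10],
    "B": [4, 3, 7, 2, 3, 5],
    "C": [10, 10, 8, 10, 9, 3],
    "D": [3, 6, 10, 6, 2, 2],
    "E": [9, 9, 2, 4, 5, 7],
    "F": [5, 5, 6, 7, 8, 6],
    "G": [1, 1, 5, 3, 4, 8],
    "H": [2, 2, 4, 8, 6, 1],
    "I": [7, 8, 9, 1, 1, 4],
    "J": [8, 7, 3, 5, 7, 9],
}


def best_measures(ranks):
    """Return the measure(s) on which a school achieves its best (lowest) rank."""
    best = min(ranks)
    return [measure for measure, rank in zip(MEASURES, ranks) if rank == best]


flattering = {school: best_measures(ranks) for school, ranks in rankings.items()}
for school, measures in flattering.items():
    print(school, measures)
```

The output shows that different schools would choose different measures, which is worth bearing in mind when you decide which school is doing best.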

Conclusions
In Table 8 we reduced the data to a composite table that placed the ten open schools in rank order in terms of six different performance measures. What have we learned from this league table approach?

Well, the exercise would have been much simpler if we had used a single measure rather than six. But which one should it have been? If you were an Education Minister you might say that the best school was the one that offered the most learning opportunities, i.e. the most registrations, and that was school G. Or you might go for output, and again school G had produced the most passes. However, you might also know that school G had been far more generously funded than other schools and was geographically situated in an area that made recruitment for M101 very easy. That is, it should have performed better than the other schools. You might also be concerned that school G had not increased its registration numbers or pass rates as much as many other schools over the same period.

If you were the head teacher at one of the open schools, I am sure that you would select the measure that showed your school in the best light. School A would point out that it had the highest growth in registrations and would avoid discussions about its pass rates.

A statistical lesson that you should have learned is that there are problems with index figures when you start from a small numerical base. To take an extreme example, if a school had two pupils one year and four the next, this would give an index figure of 200! Secondly, growth or change rates can be misleading. A pass rate can go from a low 10% to a still low 30% but give an index figure of 300.

I think we have seen that the rank ordering technique that we have used is simple to understand but can distort the true picture. To illustrate this, look at Table 9, where ten students have taken two exams. Their scores are shown and their ranks based on these scores.

On the first exam there were two star pupils (A and B) and the rest did fairly badly, with scores ranging from 36 to 33. However, C came third equal and J came 10th. With rank ordering, the gaps between the rankings are considered to be equal. C goes home happy although he scored 56 marks lower than the second student. J is miserable although he only scored 3 fewer points than C. On the second exam everybody did well, with scores between 99 and 96. However, J is bottom of the class even though he scored nearly as many marks as the top student. With rank ordering somebody always has to come bottom.
Table 9 Exam grades scored by 10 students
Student    Exam 1    Rank 1    Exam 2    Rank 2
A            94        1         99        1
B            92        2         98        2
C            36        3=        98        3
D            36        3=        97        4=
E            35        5=        97        4=
F            35        5=        97        4=
G            34        7=        97        4=
H            34        7=        97        4=
I            34        7=        97        4=
J            33       10         96       10
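The Exam 1 ranks in Table 9 can be reproduced with the following Python sketch, which also shows how much information the ranks hide. It is an illustration under my own assumptions: the function name `competition_rank` is hypothetical, and tied scores are given the same rank marked with "=", the convention used for Exam 1 in the table.

```python
# Exam 1 scores for the ten students in Table 9.
exam_1 = {"A": 94, "B": 92, "C": 36, "D": 36, "E": 35,
          "F": 35, "G": 34, "H": 34, "I": 34, "J": 33}


def competition_rank(scores):
    """Rank 1 = highest score; students with equal scores share a rank marked '='."""
    all_scores = list(scores.values())
    ranks = {}
    for student, score in scores.items():
        # A student's rank is one more than the number of strictly higher scores.
        rank = 1 + sum(other > score for other in all_scores)
        tied = all_scores.count(score) > 1
        ranks[student] = f"{rank}=" if tied else str(rank)
    return ranks


ranks_1 = competition_rank(exam_1)
print(ranks_1)

# The ranks treat every step as equal, but the underlying gaps are not.
gap_b_to_c = exam_1["B"] - exam_1["C"]  # between rank 2 and rank 3=
gap_c_to_j = exam_1["C"] - exam_1["J"]  # between rank 3= and rank 10
print(gap_b_to_c, gap_c_to_j)  # 56 3
```

One step in the ranking (2 to 3=) covers 56 marks, while seven steps (3= to 10) cover only 3 marks.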

In this exercise we have been using ordinal scales. We have taken actual numbers on registrations and we have reduced them to rank orders for the purposes of simplicity. But some accuracy and power have been lost. In the next section we return to the question of different types of scale.

Before leaving this section I want you to go back to the basic data that we started with: Table 4 and Table 6. We gave you these figures and, reasonably enough, you accepted them as accurate and comprehensive. However, in real life you will be drawing this type of data from government publications, from journal articles, from books, from websites, etc. To a great extent you have to take other people's data on trust, but it is up to you to question it as much as possible. This data has been constructed like any other. Here are some considerations:


• What actually is the data? Table 4 was titled "Students registered on higher education courses analysed by institution and gender (2003)", which seems pretty straightforward. But does this include undergraduate and postgraduate courses? What about extra-mural, continuing education or other non-credit-bearing courses? Is it a student-based or a course-based table? Because there are figures there for AOU and BOU one can assume that part-time and ODL students are included, but often they are explicitly excluded. You have to read introductions, footnotes and technical appendices very carefully to make sure the figures are what you think they are.
• Where are the figures from? Are they collected by the institution, by the Ministry, or by a researcher? Are they based on samples or censuses? Are they collected using standard, reliable methods?
• Are the figures internally consistent? Do the columns add up to the totals given? Do the percentages add up to one hundred? The statistics are compiled by humans and errors can always occur.
• Are there any extreme values that seem to be out of place? These outliers may be due to errors.

Finally, one needs to know the context of various figures before jumping to conclusions. For example, a college with very few women may only offer courses in engineering or some other subject that is historically unattractive or unwelcoming to women. A major drop in registrations in a particular district in a certain year may be due to a harvest failure or some other local phenomenon.
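The question "Do the columns add up to the totals given?" is easy to automate. The Python sketch below is one possible approach, not a prescribed method: the function name `check_column_totals` is my own, the figures are the 1998 and 1999 registration columns of Table 6, and the "mistyped" copy is a deliberately invented error used only to show what a failed check looks like.

```python
# 1998 and 1999 registrations for each open school, copied from Table 6.
registrations = {
    "A": {1998: 286, 1999: 272}, "B": {1998: 354, 1999: 322},
    "C": {1998: 122, 1999: 140}, "D": {1998: 435, 1999: 432},
    "E": {1998: 143, 1999: 156}, "F": {1998: 298, 1999: 304},
    "G": {1998: 467, 1999: 488}, "H": {1998: 432, 1999: 411},
    "I": {1998: 306, 1999: 230}, "J": {1998: 192, 1999: 222},
}
stated_totals = {1998: 3035, 1999: 2977}  # the Total row of Table 6


def check_column_totals(rows, totals):
    """Return {column: (computed sum, stated total)} for every column that disagrees."""
    problems = {}
    for column, stated in totals.items():
        computed = sum(row[column] for row in rows.values())
        if computed != stated:
            problems[column] = (computed, stated)
    return problems


print(check_column_totals(registrations, stated_totals))  # {} means no discrepancies

# Hypothetical typing error, invented for illustration: 304 entered as 340.
mistyped = {school: dict(row) for school, row in registrations.items()}
mistyped["F"][1999] = 340
print(check_column_totals(mistyped, stated_totals))  # {1999: (3013, 2977)}
```

A check like this tells you that something is wrong in a column, but not which figure is at fault; for that you have to go back to the source.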

Summary
In this unit you have looked at how Excel can help you begin analysing data that already exist. In particular, you have seen that you can use Excel to:
• calculate percentages
• find totals of rows or columns of data
• copy formulae from one cell to a range of cells
• paste either data or formulae, or both
• sort data.


Feedback to selected activities

Feedback to Activity 1
The number of women was 3556 (or 3,556). This should have been straightforward. You just look along the row that begins with AOU and down the column for Women. In the cell where the row and column intersect you find the number 3556.

Feedback to Activity 8
Well, if you have carried out your calculations correctly, you will have seen that a very different pattern emerges (W9). AOU is now the top institution because it has the most women students. University 1 has risen from bottom to third.

Neither of the rank orderings is wrong. The BOU is twice as attractive to women as University 1 in relative terms (44% compared to 20%), but University 1 is so big that it still offers more places to women than does the BOU.

The important lesson is that one has to understand the rationale and the value system underlying league tables. In this case a government might consider setting up more open universities in order to attract more women because they seem relatively more attractive to women. However, it would be counter-productive to consider closing down existing conventional universities because they are actually teaching large numbers of women.

Feedback to Activity 9
The answers you should have come up with are:
1 25%
2 59%
3 52%
So, open universities were more than twice as successful at enrolling women as conventional universities were (52% is more than 2 × 25%), but not quite as successful as teacher training colleges (59% is greater than 52%).

There are many ways to end up with these answers. My method is shown on the model sheet W10. The calculations could have been carried out anywhere on the worksheet but I chose to do them within the table. I did this by highlighting cells (W10a) then using Insert > Cells > Shift cells down > OK. This makes space for the total figures for All Universities (W10b). You can then type in the formulae you need in the appropriate cells (W10c).

Excel note: For the total men I selected the empty cell where I wanted the answer, then double clicked on the Sigma (Σ) symbol on the Toolbar. I then grabbed the cell and dragged across the Women and Total columns. Finally I grabbed the Women % cell for University 5 (cell X163) and dragged it down one cell.

Feedback to Activity 10
Your answers should have been:
1 329
2 2,244
If you got either of the answers wrong, do not carry on until you can see where you went wrong. This is quite a complex table and you have to pay great attention to detail. In (1) you have to look up registrations, which are given in the first part of the table. Then you need the answer for the right school (F) and in the right year (2001). You do this by running your eye or finger across the F row and down the 2001 column. Where they intersect, there is your answer: 329.

Table 6 contains even more data than Table 5. It tells you what happened on a particular course to 15,209 students spread over five years and ten schools. Once again, as the activity showed, it is easy to look up certain kinds of data. However, it is likely that the question that you have cannot be answered without some further work on your part.

Feedback to Activity 12
The cells from B19 to F19 should now read:
100 98 101 101 101

By looking at these figures we can easily see that little has changed between 1998 and 2002. After a slight dip in 1999, registrations have remained pretty constant at a level 1% above the baseline figure.
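If you would like to confirm these index figures outside Excel, the Python sketch below repeats the calculation. It is an illustration only; the function name `index_series` and its parameters are my own. The yearly totals are taken from the Total row for registrations in Table 6.

```python
# Total M101 registrations in each year, from the Total row of Table 6.
registrations = {1998: 3035, 1999: 2977, 2000: 3053, 2001: 3078, 2002: 3066}


def index_series(values, base_year, base_value=100):
    """Express each year's figure relative to the base year (base year = base_value)."""
    base = values[base_year]
    return {year: round(figure / base * base_value) for year, figure in values.items()}


indexed = index_series(registrations, base_year=1998)
print(indexed)  # {1998: 100, 1999: 98, 2000: 101, 2001: 101, 2002: 101}
```

Dividing every year by the same base figure is what the $B$18 reference does in the Excel formula. Changing `base_year` to 1999 would give different index values from the same data, which is why the choice of baseline matters.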

Feedback to Activity 13
My results are shown in M2. They are:
1998    1999    2000    2001    2002
 100     106     111     113     116

They show that there has been a healthy year-on-year growth in the number of passes on M101. However, these are pass numbers not pass rates. Pass numbers can go up if the number of registrations is going up but the pass rate remains the same, or even declines.

Feedback to Activity 14
The cells from B38 to F38 should now read (once you have converted them to whole numbers):
70 75 77 78 80

So, the good news is that the pass rates have been going up continuously over the five-year period. They have risen from 70% to 80%.

Excel note: Tidying up your results
You might want to tidy up your results a little as I have done in M3. It looks better, but more importantly it labels your results so that you know what they are.
• I have copied and pasted the years.
• I have typed in Pass rates in cell A38.
• I have got rid of decimal places.
• I have centred the results using Align center in the Formatting toolbar.
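The yearly pass rates can be checked in the same way with a few lines of Python. This is a sketch for illustration; the function name `pass_rate` is my own, and the figures are the Total rows for registrations and passes in Table 6.

```python
# Yearly totals from Table 6.
registrations = {1998: 3035, 1999: 2977, 2000: 3053, 2001: 3078, 2002: 3066}
passes = {1998: 2123, 1999: 2244, 2000: 2355, 2001: 2405, 2002: 2460}


def pass_rate(n_passes, n_registrations):
    """Number of passes per 100 registrations."""
    return n_passes / n_registrations * 100


# Round to whole numbers, as in the tidied-up worksheet.
rates = {year: round(pass_rate(passes[year], registrations[year])) for year in registrations}
print(rates)  # {1998: 70, 1999: 75, 2000: 77, 2001: 78, 2002: 80}
```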

Feedback to Activity 16
You should end up with the figures shown in Table 10, but you can go to M9a-c if you get stuck.
Table 10 Open School M101 registrations in 2002 with rank ordering
Open School    2002    RecReg
A               355       4
B               385       3
C               125      10
D               276       6
E               170       9
F               332       5
G               523       1
H               485       2
I               195       8
J               220       7


Feedback to Activity 17
You should end up with the figures shown in Table 11, but you can go to M10a-b if you get stuck.
Table 11 Open School M101 registrations 1998-2002 (Base = 1998 = 100)
Open School    1998    1999    2000    2001    2002
A               100      95     104     113     124
B               100      91     106     108     109
C               100     115     101     102     103
D               100      99      97      94      63
E               100     109     117     118     119
F               100     102     109     110     111
G               100     104     102     108     112
H               100      95     102      95     112
I               100      75      69      68      64
J               100     116     112     113     115
Total           100      98     101     101     101

Feedback to Activity 19
You should arrive at a table like Table 12 below (if not see M12a-c).
Table 12 Open school M101 registrations growth rates 1998-2002 (Base: 1998 = 100)
Open School    Growth in registrations    Rank
A                      124                   1
B                      109                   7
C                      103                   8
D                       63                  10
E                      119                   2
F                      111                   6
G                      112                   5
H                      112                   4
I                       64                   9
J                      115                   3

Feedback to Activity 20
The results should be as in Table 13. (They show that, for example, School A had the ninth highest average pass rate.)


Table 13 Average open school M101 pass rates 1998-2002
Open School    AvePass    Rank
A                 62         9
B                 86         2
C                 47        10
D                 74         6
E                 81         4
F                 74         7
G                 81         3
H                 73         8
I                 92         1
J                 79         5
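In Table 13 some schools share the same rounded pass rate (D and F both show 74, E and G both show 81) yet have different ranks. The Python sketch below, which is my own illustration with hypothetical variable names, shows why: the ranking is done on the unrounded pass rates. The totals are the five-year Total columns for registrations and passes in Table 6.

```python
# Five-year totals for each open school, from the Total columns of Table 6.
total_registrations = {"A": 1535, "B": 1818, "C": 635, "D": 1976, "E": 805,
                       "F": 1588, "G": 2460, "H": 2176, "I": 1149, "J": 1068}
total_passes = {"A": 956, "B": 1558, "C": 296, "D": 1470, "E": 652,
                "F": 1172, "G": 1998, "H": 1591, "I": 1054, "J": 840}

# Average pass rate: total passes divided by total registrations, times 100.
ave_pass = {school: total_passes[school] / total_registrations[school] * 100
            for school in total_registrations}

# Rank on the unrounded rates so that near-ties are still separated.
ordered = sorted(ave_pass, key=ave_pass.get, reverse=True)
rank = {school: ordered.index(school) + 1 for school in ave_pass}

# Round only for display.
displayed = {school: round(rate) for school, rate in ave_pass.items()}
print(displayed)
print(rank)
```

If you rounded first and ranked afterwards, D and F (and E and G) would appear tied.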

Feedback to Activity 21
The results should be as in Table 14 and show that, for example, School I had the highest recent pass rate. (If you had any difficulty with this, see M14a-c.)
Table 14 Open school M101 recent pass rates
Open School    Recent pass rate    Rank
A                    59              10
B                    87               3
C                    59               9
D                    91               2
E                    82               5
F                    77               8
G                    82               4
H                    82               6
I                    97               1
J                    80               7

Feedback to Activity 22
You can find the working for this in M15 and Figure 7.


Figure 7 Open school M101 indexed pass rates, 1998-2002 (Base: 1998 = 100)
[Line chart: indexed pass rates (vertical axis, 0 to 160) plotted against Year (1998 to 2002), with one line for each open school.]

(Note: a larger version of this chart appears in Workbook M101, Figure 7.)


Commentary

Out of the ten schools only one, school D, showed a continuous improvement in M101 pass rates over the period. However, only one school had a lower pass rate in 2002 than 1998. That was school A. Something very serious seems to have happened at school C in 1999. There was either a 24% drop in the pass rate or there may be an error in the data. School H has shown the greatest sustained growth. However, there were also marked improvements in the case of schools C (despite the 1999 figure), D and I.

Feedback to Activity 23
You should end up with the figures shown in Table 15. If you had any problems with this, look at M16. This shows that, for example, School H had the highest growth in pass rates at 46%. The pass rate at School A had actually fallen slightly.


Table 15 Open school M101 most recent indexed pass rates (Base: 1998 = 100)
Open School    Most recent indexed pass rate    Rank
A                        96                       10
B                       112                        5
C                       133                        3
D                       137                        2
E                       106                        7
F                       108                        6
G                       102                        8
H                       146                        1
I                       128                        4
J                       100                        9

Feedback to Activity 24
After data collection and analysis comes interpretation. A difficult task in this case! No school comes top on all measures, so there is no obvious winner. Lets take it school by school: School A: This school had the fourth largest number of registrations in 2002 and had grown more than any other school over the time period. In 2002 there were 25% more students than in 1998. However, its pass rates were disappointing.They were relatively bad to start with and they have not improved. School B: This school performed fairly well on six of the seven measures. It has not done very well on increasing registrations. School C: School C does not seem to be doing very well on our measures. It has increased its pass rates from 44% to 59% but they are still well below average. School D: This school has performed well, but has not managed to increase its registrations as much as other schools. School E: This is a very small school but is getting relatively bigger. Pass rates are average. School F: A pretty average school. It had the 8th worst pass rate in 2002, but it was only 3% below the total figure. School G: This seems to be a good school in terms of size and pass rate. But, relatively, it hasnt improved. School H: Is a good school in terms of size but not so good on pass rates. It achieved its high status on ChaPass by going from a very low pass rate in 1998 to an average pass rate in 2002.


School I: Is a relatively small school but good on pass rates.
School J: This school achieved a relatively high growth in registrations but remains a small school. It maintained a constant pass rate but fell down the rankings because most of the other schools improved theirs.


Unit 4: Quantitative institutional data
Unit overview
This unit is designed to introduce you to some of the basic processing that can be used to analyse data that already exists, in particular data from your own institution. You will look at:
- types of data: nominal, ordinal, interval and ratio
- ranking data
- methods of measuring the spread of data
- averages
- trends in data.

Learning outcomes
When you have worked through this unit, you should be able to:
1 Describe the four types of data and identify to which type a given piece of data belongs.
2 Rank data.
3 Describe the main characteristics of the Normal distribution.
4 Calculate a mean using Excel.
5 Calculate a mode.
6 Calculate a median.
7 Identify the effects of using weighted and unweighted averages.
8 Calculate a standard deviation.
9 Calculate a range.
10 Calculate quartiles.
11 Calculate the interquartile range.
12 Smooth curves.
13 Fit a straight line to a curve.


Introduction
In this section we are going to be looking at data that is gathered by institutions on a regular basis, held on a database and analysed in order to monitor and evaluate their own performance. Handbook B1, Using programme monitoring in research and evaluation, focuses on the use of such databases for research purposes. The type and size of the institution can vary greatly, from a small open school where records are kept on paper, and where institutional research involves hand-counting these records, to a huge open university with many staff assigned full-time to these tasks. Typically the database will contain most or all of the following:
- student contact details: name, address, phone numbers, email address, etc
- student characteristics: age, gender, previous education, occupation, etc (collectively these are generally referred to as 'demographics')
- financial information: payments and dates, financial assistance given by the institution, external sponsorship
- student progress: courses taken, assignment grades, qualifications awarded, etc.
Other data may also be available. For example, in large institutions there may be comparable data for full-time and part-time staff, and data may also be generated concerning the system itself, e.g. turn-around times for assignment marking, or the impact on registrations of particular advertising schemes. However, we are going to restrict ourselves to student data because this will give us sufficient variety to introduce all of the relevant research methods.

Types of data
All of the types of information listed above are called 'variables'. A variable simply means a characteristic or condition for which each case or subject (here each student) has any of a number of pre-determined values. We will explore variables with the aid of four pieces of data on each of two students, as in Table 16. The table gives one example of each of the four types of variable that represent different levels of measurement, namely nominal, ordinal, interval and ratio. This is not categorising for its own sake. The level of the variable that you are using will determine what type of statistical calculation is permissible.


Table 16 Data on students A and B

Variable     Value for student A   Value for student B
Gender       M                     F
Exam grade   B                     C
Exam score   75                    68
Age          23                    45

Nominal level data


Nominal level data describes something. Examples of nominal data are:
- gender
- ethnicity
- region.
In the data in Table 16, gender is our example of a nominal level variable. A nominal variable describes a fixed number of categories. In the case of gender, there are two categories: M and F. The only information that the data item contains is whether an individual belongs to a given category or not. Nominal level variables can have numerical values, e.g. you could name regions as 1, 2, 3, and so on. However, these numbers would still be names, standing for Region 1, Region 2, and so on.

Ordinal level data


Ordinal level data puts things into a rank order. Examples of ordinal level data are:
- 3rd out of 50 in a class of students
- grade C in an exam.
In Table 16, exam grade is an ordinal variable. The student with grade B has done better than the student with grade C but we don't know how much better. Ordinal or ranked data indicates the position or order that an individual holds. However, it tells you nothing about the distance between the ranks.


We have already encountered the limitations of this type of variable in Table 9, where we had tied ranks such as '3='. In Table 16 grade B might include all those students who got exam scores from 70 to 90 and grade C those from 60 to 69. We do not know how much better grade B is than grade C. We do not know if moving from a grade C to a grade B is the same as moving from a grade D to a grade C. We just know that in both cases it is an improvement.

Interval level data


At interval level there are equal distances (or intervals) between each of the measures on the scales. So with exam score the difference between 75 and 68 is taken to be the same as the difference between 60 and 53. However, there are complications with interval level data when there is no absolute zero point. For example, you could obtain a score of zero on the exam but it does not mean that you know absolutely nothing about the subject. Neither would it make statistical sense to say that somebody with an exam score of 60 knows twice as much about the subject as somebody with a score of 30.

Ratio level data


Ratio level data is like interval level data but with an absolute zero. This means that you can draw conclusions about relative size. That is, you can calculate meaningful ratios. Using the ages in Table 16, you can say that Student B is 45/23, or roughly twice, as old as Student A, because age starts from the absolute zero of birth and is measured in the regular units of time.

Activity 1

10 mins

Practice in classifying variables
So far this seems to be quite a neat typology, but that is because I have chosen straightforward examples. I want you to try quickly to classify the following variables. (Don't spend too long because I have not given you enough information to make really good decisions.)
1 Student's occupation.
2 Student's disability.
3 Student's intelligence or IQ score.
4 Student's income.
5 Student's previous educational qualifications.
6 Student's rating of the level of difficulty of a course.
7 Student's fee payments.


8 Number of students registered on a course.

The feedback to this activity is at the end of the unit

Some other points about scales


In the case of interval and ratio scales the data can be either discrete or continuous. Discrete data only contains whole numbers, e.g. 3 children, 7 exam passes. Continuous data can take values between the whole numbers. For example, an age of 45.75 years makes sense. For continuous variables you have to decide on accuracy levels. You could say that you were 45.739842 years old but it would not be true for very long! For each variable there will be values that are permissible and those that are not. For age, nothing under zero makes sense and values over 100 become increasingly suspicious. (It is quite common for people to give you their year of birth by mistake, rather than their age.) When data is missing for a student on a given variable, you need to assign a value that implies that it is missing and you have to make sure that it does not form part of any statistical calculation.
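The two checks described above (permissible values and missing data) can be sketched in a few lines of Python. This is an illustration only and not part of the PREST materials: the marker used for missing data, the limits of 0 and 100, the function name and the sample records are all assumptions made for the example.

```python
from statistics import mean

MISSING = None  # assumed marker meaning "no age was recorded"

def clean_ages(raw_ages, lowest=0, highest=100):
    """Return only the ages that are present and within the permitted limits."""
    return [
        age for age in raw_ages
        if age is not MISSING and lowest <= age <= highest
    ]

# Made-up records: one missing value, one year of birth given
# instead of an age, and one impossible negative age.
raw_ages = [29, 25, MISSING, 1975, 31, -4, 22]

valid_ages = clean_ages(raw_ages)
print(valid_ages)
print(mean(valid_ages))  # the mean uses the valid ages only
```

In practice you would usually inspect the rejected values rather than silently drop them, since an entry such as 1975 can often be corrected.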

Types of institutional data


Institutional data can be divided into three types:
1 necessary: there are certain basic types of data required to actually deliver the learning. These generally include the name and contact details of learners, choice of course, funding, grades gained, etc.
2 required: the government or funding body will lay down what information they require, what questions are to be used and what form the answers must be presented in.
3 chosen: your own institution may choose to collect extra data on a regular basis. You may be involved in collecting this data and designing the questions. Advice on how to go about this is given later in this module.
In the following section we look at analysing quantitative institutional data that has been collected in any of the three categories, but which can be used to assist institutional decision making.

Dealing with quantitative data


Earlier in this module we listed a number of questions as examples of the type of question that quantitative methods can be used to answer. Here we are going to address some of these questions, plus some extra ones, and take you through the processes and techniques of answering them. We have described what quantitative data is and the next concern is how to deal with it. How can we make it manageable? How can we present it to others so they can grasp what is going on? Many books on statistics and research methods use relatively small datasets in order to simplify calculations, but in ODL you are likely to be confronted by large, or even huge, datasets. This can bring with it different requirements and different procedures. For this reason we are going to use a dataset that contains 500 cases. These are students enrolled with a particular institution. Our task is to extract and describe the information that this database contains.

Summarising
We will begin with the apparently straightforward question posed by the head of the institution: 'How old are our students this year?' We could of course simply give her a list of the 500 students with each one's age attached, but this would not be very helpful. She would be looking for some type of summary.

Activity 2

5 mins

Summarising ages
For this section you will be using the Excel workbook Summarising, which you will find in your Resources File.
1 Open the workbook Summarising. If it does not open at Sheet 1, click the Sheet 1 tab at the foot of the workbook.
2 The sheet shows the ages of the students, e.g. student number 1 is aged 29, student number 2 is aged 25, etc.
3 Copy and paste all of the data into Sheet 2, as you have done in previous activities, so that you have a set of figures to work on.
4 What can you conclude about the age distribution from looking at the data in Columns A and B?

The feedback to this activity is at the end of the unit


Statistical note: Measuring age
We have already said that age is a continuous ratio scale. However, for our purposes it is sufficient to measure age in whole years. There has to be a baseline date, so age should really read as something like 'Age in years as at January 1st, 2004'.

Rank ordering
To overcome the problem found in the last activity, I would suggest rank ordering the ages. That means arranging the students from the one with the highest age to the one with the lowest. This is something that you have already done in several activities in the previous section.

Activity 3

5 mins

Rank ordering the students' ages
1 Copy and paste the whole of columns A and B into columns E and F.
2 Highlight the whole of columns E and F.
3 Use Data > Sort with 'Sort by age', 'descending' and 'My list has header row'. This should rank the students by age, starting with the oldest.
4 What, if anything, can you conclude?

The feedback to this activity is at the end of the unit


If you get stuck look at S1 on the model worksheet.
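If you prefer to see the same idea in code, the sort can be sketched in Python as follows. This is an illustration rather than part of the PREST materials: the records are made up, apart from the first two, which follow the ages quoted in Activity 2.

```python
# Made-up (student number, age) records standing in for columns A and B.
students = [(1, 29), (2, 25), (3, 31), (4, 19), (5, 25)]

# Sort by age with the oldest first, as Data > Sort does
# when "descending" is selected.
by_age = sorted(students, key=lambda record: record[1], reverse=True)

print(by_age)
print("Oldest:", by_age[0][1], "Youngest:", by_age[-1][1])
```

Once the list is ordered, the highest and lowest ages can be read off the two ends, which is exactly what the sorted worksheet lets you do.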

Showing how data are distributed


Showing how data are distributed is one of the main uses of statistics. Given an array of data (ages in our example), how can we accurately describe it using just one or two measures? If statisticians are allowed just one, they will probably select some measure of central tendency or average. Given two, they would pick a measure of central tendency plus some measure of dispersion. In other words, what is a 'middle' sort of value and how much are the others spread out around this value? However, we are going to learn the lesson from our case study and begin by drawing a graph. To assist you in this we have already counted how many students are aged 15, 16, 17, etc and this data is in the Model worksheet as S3 (go to cell D502 to see this). This is an example of grouped data that is arranged in a frequency table. It tells you how frequently something occurs. In this case there was one student aged 35, two aged 34, etc and you can see that the total is 500.
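A frequency table of this kind is simply a count of how many times each value occurs. The following Python sketch shows the idea on a small made-up list of ages; it is not part of the PREST materials and does not use the workbook data.

```python
from collections import Counter

ages = [29, 25, 24, 25, 26, 25, 27, 24, 23, 25]  # made-up sample

# Counter maps each age to the number of students of that age.
frequency = Counter(ages)

for age in sorted(frequency):
    print(age, frequency[age])

# The counts must add up to the number of students.
total = sum(frequency.values())
print("Total:", total)
```

Checking that the counts add up to the number of cases is the same check as confirming that the total in the worksheet is 500.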

Activity 4

10 mins

Showing an age distribution
In this activity you will produce a graphical display of the distribution of the students' ages. To do this, you will need to use the grouped version of the data, which is in the model worksheet as S4.
1 Highlight the block of cells from H1 to H22. This selects the data and the header row.
2 Go to Insert > Chart. Under Chart Wizard select Line as your chart type.
3 In Data range select columns.
4 Under Chart options type in the title 'The age distribution of current students'; then 'Age' for the Category (X) axis and 'Number of students' for the Value (Y) axis.
5 Under Legend, unclick the box so that it will not be shown.
6 For Chart location type in 'Age' and select New sheet. Click on Finish.
You should have the same chart as in 'Age model' in the workbook. It is also shown below as Figure 8.

There is no feedback to this activity


Figure 8 The age distribution of current students
[Line chart. Horizontal axis: Age, 15 to 35. Vertical axis: Number of students, 0 to 70.]

The Normal distribution


In Figure 8 we can see that ages range from 15 to 35, which we knew already, and also that most students were aged between 21 and 29. In fact what we have here is a classic shape that is known as a Normal distribution. The Normal distribution is important because it possesses a number of useful characteristics and because many statistical tests are based on the assumption of such a distribution.

Statistical note: The Normal distribution
It is sometimes called a 'bell curve' because it resembles a cross-section through a bell. It is called 'Normal' because such distributions occur among a lot of natural phenomena.

Again the institution head might find this chart interesting but not terribly useful. She will want to know what the average age of the students is. As we will see, there are several types of average but she probably wants the arithmetic mean and so we will calculate that to start with.


We have already come across the arithmetic mean in Table 3, when we looked at average enrolments per course. There we took the number of students in a given semester and divided it by the number of courses on offer. In this example we have to add up all the ages of the students and then divide the total by the number of students.

Activity 5

10 mins

Finding an average using Excel
1 Go back to Summarising, Sheet 2.
2 Type in 'Total' in cell A502.
3 In cell B502 calculate the total of all the students' ages by typing in the formula =SUM(B2:B501) or by double-clicking on the Σ (AutoSum) button.
4 This gives the result 12500.
5 In cell A503 type in 'Total/500'.
6 In cell B503 type in the formula =B502/500 and press Enter. This gives the value 25 and this is the arithmetic mean, or the average of all the ages.
The calculations are all in S3, at the bottom of the figures in Columns E and F. To locate them you might find it quicker to use Edit > Find. Type in 'S3' and click on Find next.

There is no feedback to this activity

Excel note: The average function
Excel contains a large number of functions. These are small program routines that will carry out common calculations for you. One of these built-in functions is the average function. To use it in the above activity, you would type in:
=AVERAGE(B2:B501)
(See S3.) In effect this says to Excel 'carry out an average calculation on the data in cells B2 to B501'.
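The two routes to the mean (total divided by the number of cases, or a built-in average function) look like this in Python. This is a minimal sketch with a made-up sample of five ages rather than the 500 in the workbook, and it is not part of the PREST materials.

```python
from statistics import mean

ages = [29, 25, 21, 27, 23]  # made-up sample; the workbook holds 500 ages

total = sum(ages)                   # plays the role of =SUM(B2:B501)
mean_by_hand = total / len(ages)    # plays the role of =B502/500
mean_builtin = mean(ages)           # plays the role of =AVERAGE(B2:B501)

print(total, mean_by_hand, mean_builtin)
```

Both routes give the same answer; the built-in function just saves you from counting the cases yourself.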

You will see that the arithmetic mean of 25 is the same as the mid-point on our age distribution graph. This is what you would expect given the symmetry of the curve. Here the average is a good summary of the situation. However, the arithmetic mean does not always fit so neatly with the curve. Figure 9 shows another set of age figures. The shape is clearly not that of a Normal distribution and is said to be a skewed distribution.


Figure 9 A skewed age distribution
[Line chart. Horizontal axis: Age, 15 to 61. Vertical axis: 0 to 140.]

Activity 6

15 mins

The average of a skewed distribution
It is very difficult to judge the average of a skewed distribution just by visual inspection of the graph. You might like to try and guess the average for the distribution in Figure 9, but you are also going to calculate it.
1 The data for the skewed distribution is in Columns B and C of worksheet Skew of the Summarising workbook.
2 Note that this data is already grouped. It tells us that there are 80 students aged 15, 85 aged 16, etc. So calculating the total age is slightly different, since each row now represents a different number of students.
3 In D2 type in =B2*C2 and press Enter. This gives the value 1200 and it is the total age of all the 15 year-olds.
4 Use Fill Down to D49 to calculate similar totals for each age.
5 Calculate the totals for columns C and D in cells C50 and D50 using the Σ (AutoSum) button.
6 Calculate the arithmetic mean in cell D52 using the formula =D50/C50.

The feedback to this activity is at the end of the unit
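The calculation in Activity 6 (multiply each age by its count, total the products, then divide by the total count) can be sketched in Python. This is an illustration only: the first two rows follow the activity text, the remaining rows are made up, and the result is therefore not the answer to the activity.

```python
# Grouped data: age -> number of students.
# The first two rows follow the activity text (80 aged 15, 85 aged 16);
# the other rows are made up for illustration.
frequency = {15: 80, 16: 85, 17: 60, 40: 10}

# Total number of students (the role of cell C50).
total_students = sum(frequency.values())

# Sum of age x count over all rows (the role of cell D50).
total_age = sum(age * count for age, count in frequency.items())

# Arithmetic mean of the grouped data (the role of =D50/C50).
grouped_mean = total_age / total_students

print(total_students, total_age, round(grouped_mean, 2))
```

Notice how the ten 40 year-olds pull the mean above 16 even though most students are aged 15 or 16.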


Statistical note: Grouping data
Sometimes you will find that the data has been grouped even more. For example, you may find that age has been grouped into Under 21, 21-30, 31-40, 41-50, 51-60, 61-70, 71 and over.


It is customary to use the mid-point of each group for your calculations. The 21-30 group contains people aged 21.0 to 30.999 so the mid-point is (21.0 + 30.999)/2 = approximately 26. The two extreme groups are problematic. The standard approach is to assume they are like the next group. So 'Under 21' would be treated as 11-20 and '71 and over' as 71-80.
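The mid-point approach can be sketched in Python as follows. This is a hedged illustration, not part of the PREST materials: the counts and the choice of bands are made up, and the helper name midpoint is hypothetical.

```python
# (lowest age, highest age, number of students); the counts are made up.
# "Under 21" is treated as 11-20 and "71 and over" as 71-80.
bands = [(11, 20, 40), (21, 30, 120), (31, 40, 90), (71, 80, 5)]

def midpoint(low, high):
    # A band such as 21-30 runs from 21.0 up to just under 31,
    # so its mid-point is (21 + 31) / 2 = 26.
    return (low + high + 1) / 2

total_students = sum(count for _, _, count in bands)
total_age = sum(midpoint(low, high) * count for low, high, count in bands)
estimated_mean = total_age / total_students

print(round(estimated_mean, 1))
```

The result is only an estimate, because every student in a band is treated as if they were exactly at its mid-point.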

Averages
Up until now we have used the arithmetic mean as our measure of an average value for a distribution. However, there are other measures we might want to consider.

The mode
Perhaps you were a bit surprised by Activity 6, where we found that the arithmetic mean is also 25, as in Figure 8. Looking at Figure 9, you might suppose that the average was much lower. In the case of such a skewed distribution you might want to use another type of average: the mode. The mode is simply the value that has the most people in it and in this case it is 19. The arithmetic mean has been heavily influenced by the few people with extremely high values and consequently it does not appear to be typical.

The median
Look now at a different distribution in Figure 10. Here the arithmetic mean is 32 and the mode is 16 and neither seems very satisfactory. Another measure of central tendency to consider is the median. This is very simple to calculate because it is just the value that divides the distribution in half. If you have the median then you know that 50% of the population are below it and 50% are above it. In this particular case the median is 30 and it seems a better measure of central tendency than either the mean or the mode. (The data and calculations are in S6.)


Figure 10 Another age distribution
[Line chart. Horizontal axis: Age, 15 to 59. Vertical axis: 0 to 120.]
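To see how the three averages can pull apart, here is a short Python sketch using a made-up skewed sample. It is not the data behind Figures 9 or 10 and is not part of the PREST materials.

```python
from statistics import mean, median, mode

# Made-up ages: most students are young, a few are much older.
ages = [16, 16, 16, 17, 18, 19, 20, 22, 30, 45, 60]

sample_mean = mean(ages)      # pulled upwards by the few older students
sample_mode = mode(ages)      # the most common age
sample_median = median(ages)  # the middle value of the ordered list

print("mean:", round(sample_mean, 1))
print("mode:", sample_mode)
print("median:", sample_median)
```

With a long tail of older students the mean sits above the median, and the median sits above the mode, which is the pattern described in the text.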

Averages in bimodal distributions


Finally I want you to look at the distribution in Figure 11. Here the arithmetic mean is 25 again. The median is also 25, but there are two modes. One is 19 and the other is 31. Not surprisingly this is called a bimodal distribution. In common sense terms there really is no central tendency. (The data and calculations are in S7.)


Figure 11 A bimodal distribution
[Line chart. Horizontal axis: Age, 15 to 35. Vertical axis: 0 to 60.]
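A bimodal case can be checked in code as well. The sketch below uses a tiny made-up sample chosen so that, as in the example above, the mean and median are both 25 while the modes are 19 and 31; it is not the S7 data. The multimode function is part of Python's statistics module from version 3.8.

```python
from statistics import mean, median, multimode

# Made-up sample with two equally common ages.
ages = [19, 19, 19, 22, 25, 28, 31, 31, 31]

modes = multimode(ages)  # every value that ties for most frequent

print("mean:", mean(ages))
print("median:", median(ages))
print("modes:", modes)
```

Although the mean and median are both 25, only one student in the sample is actually that age, which is why neither is a helpful summary here.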

Averages: a summary
So what have we learned so far about averages?
- when people talk about 'the average' they are generally thinking of the arithmetic mean
- there are other averages such as the mode and the median
- the arithmetic mean is a reasonable measure of central tendency when the distribution is not skewed
- with skewed distributions it is often more appropriate to use the mode or the median
- with a symmetrical distribution (e.g. the Normal distribution) the arithmetic mean, the mode and the median actually assume the same value
- in some circumstances no measure of central tendency is appropriate, e.g. bimodal distributions.
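Statistical note: Means with different types of data
Strictly speaking, the arithmetic mean is for use with ratio and interval level variables only. However, it is fairly common for it to be used with ordinal level variables. The median needs data that can at least be ranked (ordinal level or above), while the mode can be found for data at any level, including nominal.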

Weighted and unweighted averages


A researcher is given the figures shown in Table 17 and he uses the arithmetic mean to conclude that the average percentage of women at university is 29%. He calculated his answer by adding the percentage of women at each university, then dividing by 5, the number of universities:

\[ \text{Average} = \frac{38 + 28 + 7 + 38 + 32}{5} = \frac{143}{5} \approx 29 \]

The number of students in each university is not taken into account. University 3 is a very small university but it is considered equal to the other four in this calculation.
Table 17 Comparing averages

               Men     Women   Total   % Women (weighted by university)   % Women (unweighted)
University 1    4678    2873    7551   38                                 38
University 2    1683     654    2337   28                                 28
University 3     680      54     734    7                                  7
University 4    1287     776    2063   38                                 38
University 5    3547    1678    5225   32                                 32
All            11875    6035   17910   29                                 34

A second researcher feels that this is inappropriate. She feels that every student should be considered or weighted equally. She calculates her result by dividing the total number of women by the total number of students and multiplying by 100 and comes up with the answer of 34%:

\[ \text{Average} = \frac{6035}{17910} \times 100 \approx 34 \]

Where does this difference of 5 percentage points come from? Which answer is correct? Essentially both researchers were answering correctly but they were answering two different questions. The first was answering 'What is the average percentage of women at the five universities?'. For the second the question was 'What percentage of university students were women?'. Essentially we are talking about weighted and unweighted averages. The average of 34% could be termed unweighted because all students are equal, or weighted where each student has the weight of 1. The average of 29% came about because universities were weighted equally. It could have been by subject or by some other factor.
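Both researchers' calculations can be reproduced from the raw numbers in Table 17. The Python sketch below is an illustration and not part of the PREST materials; the variable names are made up, and the percentages are computed from the unrounded figures, so the intermediate values differ slightly from the rounded ones in the table.

```python
# (men, women) enrolments for each university, copied from Table 17.
enrolments = {
    "University 1": (4678, 2873),
    "University 2": (1683, 654),
    "University 3": (680, 54),
    "University 4": (1287, 776),
    "University 5": (3547, 1678),
}

# Percentage of women at each university.
pct_women = {
    name: 100 * women / (men + women)
    for name, (men, women) in enrolments.items()
}

# First researcher: every university counts equally.
by_university = sum(pct_women.values()) / len(pct_women)

# Second researcher: every student counts equally.
total_women = sum(women for _, women in enrolments.values())
total_students = sum(men + women for men, women in enrolments.values())
by_student = 100 * total_women / total_students

print(round(by_university), round(by_student))
```

The small University 3, with only 7% women, drags the first average down because it carries as much weight as the much larger universities.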

Spread, dispersion and deviation


So far we have looked at measures of central tendency and averages. Now look at Figure 12 below. It contains two distributions and they both have the same central tendency. They have the same mean, mode and median of 25, but they are quite different distributions.


Figure 12 Distributions with the same central tendency but with different spreads
[Line chart showing two distributions. Horizontal axis: Age, 20 to 30. Vertical axis: 0 to 25.]

Visually, one distribution has a sharp peak and the other is more rounded. This means that the first distribution is less spread out around the central value. When describing the distribution we would like to give a numerical value to the amount of this spread. When the distribution is more or less Normal a good measure of this spread is the standard deviation. This can be calculated by using the following equation.

\[ \text{Standard deviation} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \]

In words this means 'the square root of the sum of the squared deviations from the mean divided by the number of cases minus one'.

Statistical note: x-bar
The symbol \(\bar{x}\) is called 'x-bar'. It stands for the arithmetic mean of the x values.

Example: Calculating standard deviations

The slow way
1 Open the worksheet SD that is in the Summarising workbook.
2 Example A shows the individual ages (the x values) of the 56 students whose ages formed the pointed distribution in Figure 12. I will now take you through the steps to calculate the standard deviation (SD).
3 Click on cell B63. You will see that it contains the function to calculate the average age of the students, =AVERAGE(B6:B61), which is 25.
4 This average, or \(\bar{x}\), was entered in cells C6:C61.
5 The next column takes the average away from each age, giving \(x - \bar{x}\).
6 In the next column (E6:E61) the results from the previous column are squared (that is, multiplied by themselves).
7 The values in E6:E61 are totalled in E62. This gives us \(\sum (x - \bar{x})^2\).
8 The total in E62 is divided by 55 (since n - 1 = 55) in cell E63.
9 The square root of E63 goes in E64 using the formula =SQRT(E63). (The square root of a number is the number that when multiplied by itself gives you the first number. So, the square root of 9 is 3 because 3 × 3 = 9.)
10 The result in E64 is the standard deviation and the value of 1.6 has been transferred into E65.

The fast way
A much quicker way to get the standard deviation is to use the Excel function =STDEV(B6:B61) and this has been done in E67. (If you adjust the number of decimal places on view, you will see that the results are exactly the same.) However, by going through the stages yourself, you should get an idea of what a standard deviation actually is.
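The slow way and the fast way can both be sketched in Python. This is an illustration under stated assumptions: the six ages are made up (they are not the 56 ages in Example A), and Python's statistics.stdev is used as the counterpart of Excel's STDEV because both divide by n - 1.

```python
from math import sqrt
from statistics import mean, stdev

ages = [23, 24, 25, 25, 26, 27]  # made-up sample, not the worksheet data

# The slow way, one step at a time.
x_bar = mean(ages)                            # the mean, x-bar
deviations = [x - x_bar for x in ages]        # x minus x-bar for each age
squared = [d ** 2 for d in deviations]        # each deviation squared
sum_of_squares = sum(squared)                 # total of the squared deviations
variance = sum_of_squares / (len(ages) - 1)   # divide by n - 1
sd_slow = sqrt(variance)                      # take the square root

# The fast way: a built-in function that also divides by n - 1.
sd_fast = stdev(ages)

print(round(sd_slow, 4), round(sd_fast, 4))
```

Writing out the intermediate lists makes it easier to see that the standard deviation is, roughly speaking, a typical distance of the values from their mean.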

In the next activity you will calculate the standard deviations of the two distributions in Figure 12, using Excel.

Activity 7

30 mins

Calculating standard deviations
For this activity you need Example B on worksheet SD of the Summarising workbook. This is the raw data from the students who formed the more rounded distribution in Figure 12. You should now be able to calculate the standard deviation by repeating the steps we went through in the example above. Use both the slow way and the fast way.

The feedback to this activity is at the end of the unit

Standard deviation
So what do these figures mean? Well, if the distribution is approximately Normal then about 68% of the cases should fall within plus or minus one standard deviation of the mean and about 95% should fall within two standard deviations of the mean. In our first example, the pointed curve, about 68% of the students should be aged between:
25 + 1.6 = 26.6 years old
and
25 - 1.6 = 23.4 years old.
Similarly, about 95% of them should be aged between:
25 + (2 × 1.6) = 28.2 years old
and
25 - (2 × 1.6) = 21.8 years old.

This gives you some standards. For example, you could choose to describe students as being 'young' because their age is more than two standard deviations below the population average.
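The limits above are simple arithmetic on the mean and the standard deviation, and can be wrapped in a small helper. The Python sketch below uses the figures from the text (a mean of 25 and a standard deviation of 1.6); the function names band and is_young are hypothetical and chosen only for this illustration.

```python
mean_age = 25   # mean from the pointed distribution in the text
sd = 1.6        # its standard deviation

def band(mean_value, sd_value, k):
    """Return (lower, upper) limits k standard deviations either side of the mean."""
    lower = round(mean_value - k * sd_value, 1)
    upper = round(mean_value + k * sd_value, 1)
    return (lower, upper)

one_sd = band(mean_age, sd, 1)   # limits one standard deviation from the mean
two_sd = band(mean_age, sd, 2)   # limits two standard deviations from the mean

def is_young(age):
    # "Young" here means more than two standard deviations below the mean.
    return age < two_sd[0]

print(one_sd, two_sd, is_young(21), is_young(23))
```

The same helper could be reused for Example B once you have its mean and standard deviation.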

Activity 8

5 mins

More standard deviations
1 Calculate the corresponding figures for the other group of students in Example B.
2 Are the results different from A in ways that you might have expected?

The feedback to this activity is at the end of the unit

Other measures of dispersion


The standard deviation does not work very well with skewed distributions. Clearly more of the cases are on one side of the mean than the other. There are other measures of dispersion that do not rely upon symmetrical distributions.

The range
We have already come across the range. This simply records the highest and lowest values in a distribution. In Example A on Worksheet SD the range is 21-29.

Activity 9

2 mins

What is the range in Example B?

The feedback to this activity is at the end of the unit

In our examples, the range is quite useful because, taken together with the average, it gives us a reasonable view of the distribution. However, imagine that one 50 year-old had sneaked into the class in Example A. The average would remain at 25 (when rounded to zero decimal places) but the range would now be 21-50. Single extreme values clearly have large and distorting impacts when one is using the range.
Statistical note: Extreme values
A single extreme value can also affect means and standard deviations. Try going back to Worksheet SD and changing the last student's age from 29 to 59. Note the changes to both the mean and the standard deviation.
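The effect of one extreme value is easy to demonstrate in code. The Python sketch below uses a made-up class of eight students rather than the worksheet data, so the numbers will differ from those you see in Worksheet SD.

```python
from statistics import mean, stdev

ages = [21, 23, 24, 25, 25, 26, 27, 29]   # made-up class
with_outlier = ages[:-1] + [59]           # last age changed from 29 to 59

# Mean, standard deviation and range before the change.
print(mean(ages), round(stdev(ages), 1), (min(ages), max(ages)))

# The same three measures after the change.
print(mean(with_outlier), round(stdev(with_outlier), 1),
      (min(with_outlier), max(with_outlier)))
```

In a small sample like this a single unusual value shifts all three measures noticeably; in a class of 56 the mean moves less, but the range is still fully exposed to the extreme value.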

Percentiles
Another way of measuring dispersion, and one that is not disturbed greatly by extreme values, is to use percentiles. Percentiles locate individuals in terms of where they come in a distribution. The percentile tells you what percentage of the population is above them and what percentage is below. If a person is in the 56th percentile by age, it means that 55% are younger than them and 44% are older. Rather than individual percentiles, cases are usually banded into bigger units such as deciles (10% bands). We are going to work through an example that calculates the quartiles for a distribution. These are just extensions of the median concept. The median divides the population in half; quartiles divide it into quarters.

Activity 10

15 mins

Calculating quartiles
1 Open the Worksheet Quartiles in the Summarising workbook. This contains a dataset representing 100 students and their ages. To keep things simple, each of their ages is different and they are already rank ordered by age. (Q1)
2 Clearly the youngest 25% of students are the first 25 cases and they are aged from 20.11 to 23.22 years old.
3 To obtain a value so that we can say that 25% of students are below this age, we have calculated the arithmetic mean of the 25th and 26th case. This gives us our first quartile value of 23.67.
4 Now calculate the second and third quartiles.

The feedback to this activity is at the end of the unit


Excel note: Using the Excel quartile function
You can use the quartile function in Excel. For example:
=QUARTILE(A1:A50, 1)
will give you the top value of the lowest quartile for the values in the range A1:A50. Similarly:
=QUARTILE(A1:A50, 2)
will give you the second quartile, etc. You do not need to sort the values before doing this. Please note, though, that Excel calculates the quartiles in a slightly different way. With the data in our example, it takes the 25th and 26th value, then calculates Q1 as being the value of the 25th case plus 75% of the difference between the two cases. Similarly, Q3 is the value of the 76th case minus 75% of the difference between the 75th and 76th cases. Q2 is simply the arithmetic mean of cases 50 and 51.
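The difference between the two methods can be seen side by side in Python. This is a sketch under stated assumptions: the 100 ages are made up (20.0, 20.1, and so on), the helper midpoint_quartiles is a hypothetical name for the method used in Activity 10, and Python's statistics.quantiles with method="inclusive" is used because it interpolates in the way the note describes for Excel; it is not Excel itself.

```python
from statistics import quantiles

# 100 made-up ages, all different and already in rank order.
ages = [20 + i / 10 for i in range(100)]

def midpoint_quartiles(ordered):
    """Quartiles as the mean of the two cases either side of each boundary.

    Assumes the data is sorted and the number of cases is a multiple
    of four, as in the activity.
    """
    n = len(ordered)
    cuts = (n // 4, n // 2, 3 * n // 4)
    return [(ordered[c - 1] + ordered[c]) / 2 for c in cuts]

module_q = midpoint_quartiles(ages)

# Interpolated quartiles: for 100 cases, Q1 is the 25th case plus
# 75% of the gap between the 25th and 26th cases.
interpolated_q = quantiles(ages, n=4, method="inclusive")

for label, a, b in zip(("Q1", "Q2", "Q3"), module_q, interpolated_q):
    print(label, round(a, 3), round(b, 3))
```

The two methods agree on the median and differ only slightly on Q1 and Q3, so for large datasets the choice rarely changes the interpretation.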

Boundary problems

There are problems with quartiles at the boundaries. People with very similar scores can end up in different quartiles. However, that is the nature of boundaries. Also, as we shall see later, quartiles do not have the useful statistical properties that standard deviations have. However, quartiles are good when there are one or two extreme values that would have a disproportionate effect on the arithmetic mean, or for skewed and non-normal distributions in general.

The interquartile range


A further useful measure is the interquartile range, i.e. the distance between the 1st and 3rd quartiles. In our example:
Interquartile range = 23.67 to 27.63
We can say that 50% of all students fall within that range. You can then compare your interquartile range with that of other populations.

Are we getting more young students?


This was one of our original questions. I am going to answer it using data from my own institution and you will probably begin to see why this is not always a straightforward question to answer. Table 18 shows the number of enrolments for new students on undergraduate courses from 1997 to 2002, arranged into age bands.


Table 18 New student enrolments at the UKOU, analysed by age, 1997-2002 (raw numbers)

Age band    1997    1998    1999    2000    2001    2002
18-24       5743    7006    7018   10300   10369   12140
25-29       9552   10691    9523   12939   11677   12414
30-39      15797   17706   17172   23728   21927   23416
40-49       7727    8740    8484   11660   10792   11586
50-59       2929    3566    3310    4689    4035    4405
60-64        727     822     845    1002     814     867
65+          709     831     797     980     782     976
Total      43184   49362   47149   65298   60396   65804

Answer 1
Clearly there are lots more young students. Students aged between 18 and 24 went from 5,743 in 1997 to 12,140 in 2002. 'Ah, yes,' says the Vice-chancellor, 'but the overall number of students went up during this period as well. What I really meant was, are there more young students in a relative sense?'

Answer 2
To answer this question we indexed the figures just as we did earlier. The results are shown in Table 19. From these figures we can say that the numbers of young students during this period grew at a relatively fast rate. Their numbers more than doubled compared to an overall increase of 52%. This increase was greater than that for any other age band.

Table 19 New student enrolments at the UKOU, analysed by age, 1997-2002 (Base: 1997 = 100)

Age band    1997    1998    1999    2000    2001    2002
18-24        100     122     122     179     181     211
25-29        100     112     100     135     122     130
30-39        100     112     109     150     139     148
40-49        100     113     110     151     140     150
50-59        100     122     113     160     138     150
60-64        100     113     116     138     112     119
65+          100     117     112     138     110     138
Totals       100     114     109     151     140     152
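The indexing behind Table 19 can be sketched in a few lines of Python. The figures are the raw numbers from Table 18; the variable and function names are my own choices for illustration, and the rounding to whole numbers is an assumption about how the table was prepared.

```python
years = [1997, 1998, 1999, 2000, 2001, 2002]

# Raw enrolments from Table 18, one list per age band, in year order
enrolments = {
    "18-24": [5743, 7006, 7018, 10300, 10369, 12140],
    "25-29": [9552, 10691, 9523, 12939, 11677, 12414],
    "30-39": [15797, 17706, 17172, 23728, 21927, 23416],
    "40-49": [7727, 8740, 8484, 11660, 10792, 11586],
    "50-59": [2929, 3566, 3310, 4689, 4035, 4405],
    "60-64": [727, 822, 845, 1002, 814, 867],
    "65+": [709, 831, 797, 980, 782, 976],
    "Total": [43184, 49362, 47149, 65298, 60396, 65804],
}


def index_series(values, base_position=0):
    """Express every value as a percentage of the base year's value."""
    base = values[base_position]
    return [round(100 * value / base) for value in values]


indexed = {band: index_series(counts) for band, counts in enrolments.items()}

print("Band  ", *[f"{year:>6}" for year in years])
for band, row in indexed.items():
    print(f"{band:<6}", *[f"{value:>6}" for value in row])
```

Changing base_position lets you index against a different year, which is worth trying if the base year itself looks unusual.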


Answer 3
'So,' says the Vice-chancellor, 'this must mean that our classes are now bulging with young students.' Well, not really. Look at Table 20, where we use the same figures but display them as vertical percentages. The 18-24 group increased from 13% to 18%. This means that in a tutorial group of 20 students one might have expected 3 young students in 1997 (20 × 13 ÷ 100 = 2.6) and possibly 4 in 2002 (20 × 18 ÷ 100 = 3.6).

Table 20 New student enrolments at the UKOU, analysed by age, 1997-2002 (Vertical percentages)

Age band    1997    1998    1999    2000    2001    2002
               %       %       %       %       %       %
18-24         13      14      15      16      17      18
25-29         22      22      20      20      19      19
30-39         37      36      36      36      36      36
40-49         18      18      18      18      18      18
50-59          7       7       7       7       7       7
60-64          2       2       2       2       1       1
65+            2       2       2       2       1       1
Totals       100     100     100     100     100     100
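As a rough check on the vertical percentages for the youngest band, and on the tutorial group arithmetic, here is a short Python sketch. It uses the 18-24 and Total rows of Table 18; the function names are hypothetical, and it follows the text in working from the rounded percentage.

```python
years = [1997, 1998, 1999, 2000, 2001, 2002]
young = [5743, 7006, 7018, 10300, 10369, 12140]        # 18-24 row of Table 18
totals = [43184, 49362, 47149, 65298, 60396, 65804]    # Total row of Table 18


def vertical_percentage(part, total):
    """Share of the column total, rounded to a whole percentage as in Table 20."""
    return round(100 * part / total)


def expected_in_group(percentage, group_size=20):
    """Expected number of such students in a tutorial group of the given size."""
    return group_size * percentage / 100


shares = [vertical_percentage(y, t) for y, t in zip(young, totals)]
for year, share in zip(years, shares):
    print(year, f"{share}%", f"about {expected_in_group(share):.1f} in a group of 20")
```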

Answer 4
'Well,' says the Vice-chancellor, 'what about the average age of our students? Surely that has come down?' You can see from Table 20 that the age distribution is skewed (there are more very young students than very old ones), so the median is the most appropriate average to use. If you have access to the whole institutional database then you can use students' dates of birth to calculate this very accurately, but here we use it as an exercise to calculate the median from grouped data.

Example: Finding the median
Before you do an activity on finding the median, I will first work through an example for you. I will be using the data from Table 18, which you can find on the worksheet Group Median in the Summarising workbook. I will do the calculations for 1997. The median is the value that divides the distribution in two. So if we divide the total by 2 we get the cumulative frequency at which the median will be found.

1 Find the total
Total = 43184 (GM1)

2 Find the frequency at which the median lies
Median lies at cumulative frequency = 43184 ÷ 2 = 21592 (GM1)

3 Find the band in which the 21,592nd case lies
In cells B17 to B23 we have calculated the cumulative numbers. So 5743 students were aged 18-24, and 15,295 (5743 + 9552 = 15295) were aged 18-29, etc. So if 15,295 were aged 18-29 and 31,092 were aged 18-39, then the 21,592nd case lies somewhere in the 30-39 band. (GM2)

4 Find the point in the band where the median lies
The 21,592nd case lies 6297 above the lowest case in this band (21592 - 15295 = 6297) and 9500 below the highest case (31092 - 21592 = 9500). The band is ten years wide and contains 15,797 cases, so the median can be calculated as:
Median = 30 + (6297 ÷ 15797) × 10 = 34.0 (GM2)
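The four steps above can be sketched as a small Python function. Treat it as an illustration rather than a replacement for the worksheet: the function name and the way the bands are written down are my own assumptions, and the width given to the open-ended 65+ band is arbitrary (it does not affect the result here because the median falls well below it).

```python
def grouped_median(bands, counts):
    """Estimate the median from grouped data.

    bands  -- list of (lower_bound, width_in_years), youngest band first
    counts -- number of cases in each band, in the same order
    """
    halfway = sum(counts) / 2          # steps 1 and 2: total, then halve it
    cumulative = 0
    for (lower, width), count in zip(bands, counts):
        if cumulative + count >= halfway:          # step 3: the band holding the median
            cases_into_band = halfway - cumulative
            return lower + cases_into_band / count * width   # step 4
        cumulative += count
    raise ValueError("no cases to summarise")


# Age bands from Table 18: 18-24 runs from 18 up to (not including) 25, and so on.
# The width of 15 for the 65+ band is an arbitrary assumption.
bands = [(18, 7), (25, 5), (30, 10), (40, 10), (50, 10), (60, 5), (65, 15)]
counts_1997 = [5743, 9552, 15797, 7727, 2929, 727, 709]
counts_2002 = [12140, 12414, 23416, 11586, 4405, 867, 976]

print(round(grouped_median(bands, counts_1997), 1))
print(round(grouped_median(bands, counts_2002), 1))
```

The method assumes that cases are spread evenly within each band, which is the same assumption made in the worked example.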

Activity 11

15 mins

Calculating medians with grouped data
Now it's your turn. Calculate the medians for the years 1998 to 2002 using the same procedures. If you find this difficult, you will find the answers in GM3.

There is no feedback to this activity

Your results, and those in GM3, show that the median age changed very little over the time period. It was 34.0 in 1997 and 33.6 in 2002.

So what is going on here? Faced with a simple question and one set of data, we have come up with four very different answers. In the case of the first answer, it arose because the question was not specified well enough. It is generally up to the researcher to negotiate what is really required and what is appropriate. The difference between the other three answers arose because the 1997 figures for the 18-24 group were so low. Despite a doubling in their numbers, young students still formed only a relatively small proportion of the total and consequently had only a minimal effect on the median and on their visibility in class.

Statistically, a good case could be made out that nothing much has happened to the age distribution over the time period in question. However, my Vice-chancellor is very interested in the first row of figures in Table 18. She wants to know what is happening, why it is happening and whether it is going to continue.


The context is that more of the cost of conventional full-time United Kingdom higher education is being borne by the student. This in turn might be making part-time distance education study more attractive to school leavers. They will have to pay fees but they can earn a full-time salary while they study. Further detailed analysis indeed confirmed that the growth was greater among the 18-19 year-olds than for the 23-24 year-olds.

But will this growth in the number of younger students continue? Here we move from describing what has happened to predicting what might happen. Well, as we and all gamblers know, the future is not ours to see. However, some phenomena are much more predictable than others. For example, if a food parcel is dropped from an aeroplane flying at a given height and at a given speed, then it is very predictable where the food parcel will land. The results are predictable because lots of measurements have been made in the past, the results have been consistent, and because the ceteris paribus rule is unlikely to be infringed. The latter rule means 'everything else being equal' or there being no major change in the conditions. With our food parcel example it would require something like a major change in the laws of physics or in the speed of rotation of the earth!

In the social sciences such conditions are rare. We are unlikely to have hundreds of consistent previous measurements, and ceteris paribus is a dangerous assumption. In the case of younger students we only have a few years of data and the figures do not show a simple linear increase. Nor can we assume the social conditions will remain constant. If conventional universities start to lose their students to distance education, then they are likely to change aspects of their systems to recapture them.

This is a good point to move on to looking at patterns and trends.

Patterns and trends


If a strong and distinct pattern can be identified in repeated measures over time then we can attempt to predict future measurements. I want you to look at some graphs and look for patterns.

I produced Figure 13 by generating 25 random numbers between 0 and 99 in Excel using the =RAND() function. The chart represents 25 measures taken at fixed time intervals over a period of time. It could be 25 years, 25 semesters, 25 weeks etc. As you can see, there is no overall pattern in the figure. The next measurement could be any number from 0 to 99 and each one is equally likely. But note that if you took only the last 4 measurements you might think that there was an overall downward trend. The simple lesson here is that the more points you have on your graph the more chance you have of seeing the real pattern, or lack of a pattern.


Figure 13 Twenty five random numbers
[Line chart. Vertical axis: 0 to 100. Horizontal axis: Time, 1 to 25.]
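If you do not have Excel to hand, a similar series can be generated in Python. This is a sketch, not the method used for Figure 13: the seed is an arbitrary choice that simply makes the run repeatable, and the helper function is a hypothetical, deliberately crude way of asking whether a short run of numbers looks like a trend.

```python
import random

rng = random.Random(13)  # arbitrary fixed seed so the run can be repeated

# 25 whole numbers from 0 to 99, each equally likely on every draw
measurements = [rng.randint(0, 99) for _ in range(25)]
print(measurements)
print("Last four only:", measurements[-4:])


def looks_like_trend(values):
    """True if the values rise at every step or fall at every step."""
    steps = [later - earlier for earlier, later in zip(values, values[1:])]
    return all(step > 0 for step in steps) or all(step < 0 for step in steps)


# How often do four random measurements in a row look like a steady trend?
trials = 10_000
hits = sum(
    looks_like_trend([rng.randint(0, 99) for _ in range(4)])
    for _ in range(trials)
)
print(f"{hits / trials:.1%} of runs of four random numbers rise or fall at every step")
```

The last line makes the point in the text: a handful of points can look like a trend purely by chance, so it pays to look at as long a run of measurements as you can.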

Statistical note: Random numbers
Selecting random numbers is like drawing numbers out of a hat. You pick one number out, note it down, return it to the hat, mix the numbers up, then pull out the second number, etc. Each number has an equal chance of being picked each time. (In everyday life, when you draw lots, the numbers that are drawn are not put back in the hat. In statistics, they must be put back before the next number is drawn. This is to ensure that all numbers have the same chance of being drawn for every draw.) This does not mean that you will actually get what most people think a random sample should look like. There is a small chance of you picking the same number each time, or the numbers 1, 2, 3, in the right order. Each sequence of 25 numbers has the same chance of being selected as any other. Fortunately for me, my selection came out in no particular order or pattern.

Now look at Figure 14. The upper line is a perfect straight line. In our sense this is a pattern and the longer this straight line becomes, the more confident we will be in predicting no change for the future. Actual straight lines are rare in the social sciences.

The lower line in Figure 14 could be termed a straight line with small random fluctuations. In other words there is an underlying straight line pattern, but the actual measurements are affected, both upwards and downwards, by small random events. The overall average measurement is 2.5 and one would predict that measurements will remain at between 2 and 3 and will average out at around 2.5.


Figure 14 Straight lines
[Line chart with two lines. Vertical axis: 0 to 4.5. Horizontal axis: Time, 1 to 25.]

Figure 15 shows a similar situation to Figure 14, but with regular curves. The upper line is a perfectly cyclical curve. The period of the cycle is 8. This means a complete cycle takes place every 8 measurements. Future measurements will be easy to predict providing that you know where in the cycle you are.

Figure 15 No change with curves
[Line chart with two curves. Vertical axis: -8 to 12. Horizontal axis: Time, 1 to 97.]

Some things occur in education in cyclical patterns. For example, applications to enrol may peak and trough each year according to the seasons. Withdrawals may peak around the examination time. However, the patterns are unlikely to be as neat and symmetrical as our first curve. The lower curve shows another symmetrical curve with random fluctuations. Here it is a slower cycle with a peak every 40 measurements. Again future measurements are largely predictable given one's position in the cycle and the average level of error caused by the random fluctuations.


Finally, I want to talk about trends. A trend is a general tendency to move up or down over a period of time. The charts that we have looked at so far do not have trends. Their average values remain constant.

Activity 12

10 mins

Predicting from straight lines
Look at Figure 16 below. It shows two straight line trends.
1 What differences do you see between the two?
2 How would you predict future measurements?
3 Which is likely to be the better predictor?

The feedback to this activity is at the end of the unit


Figure 16 Two straight line trends
[Line chart with two lines. Vertical axis: 0 to 30. Horizontal axis: Time, 1 to 25.]

Activity 13

10 mins

Predicting with curved trend lines
Figure 17 shows two curved trends.
1 What differences do you see between the two?
2 How would you have predicted future measurements if you had only got the first 5 values in each case?
3 Which is likely to be the better predictor?

The feedback to this activity is at the end of the unit


Figure 17 Two curved trends
[Line chart with two curves. Vertical axis: -1000 to 19000. Horizontal axis: Time, 1 to 25.]

Curves
Faced with graphs that do not display perfect straight lines, there are a number of strategies we can use to tame them.
Smoothing curves

Given that many curves are subject to random fluctuations, one strategy is to use moving averages. In Figure 18 we have taken the data from the lower curve in Figure 15 and put it into a separate graph. We have clicked on the curve and used:
Chart > Add trend line > Moving average > Period = 10
This produced the relatively smooth curve that has been superimposed in Figure 18. With a period of 10, the average of 10 consecutive measurements is taken and plotted, then the first measurement is dropped and the average of the next 10 consecutive measurements is taken, and so on. This smooths out the 'noise' in the system and the more general pattern can be seen.


Figure 18 Smoothing a curve with moving averages
[Line chart showing the original curve with the moving average superimposed. Vertical axis: -8 to 8. Horizontal axis: Time, 1 to 97.]
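The moving average calculation described above is simple enough to sketch in Python. The function name is my own, and the noisy series is hypothetical: it imitates a slow cycle with random fluctuations, but it is not the data plotted in Figure 18.

```python
import math
import random


def moving_average(values, period):
    """Average each run of `period` consecutive values, stepping on one at a time."""
    if not 1 <= period <= len(values):
        raise ValueError("period must be between 1 and the number of values")
    return [
        sum(values[start:start + period]) / period
        for start in range(len(values) - period + 1)
    ]


# Hypothetical data: a slow cycle (peak every 40 measurements) plus random noise
rng = random.Random(18)  # arbitrary seed, only to make the run repeatable
noisy = [5 * math.sin(2 * math.pi * t / 40) + rng.uniform(-2, 2) for t in range(1, 101)]

smooth = moving_average(noisy, period=10)
print(len(noisy), "measurements give", len(smooth), "moving averages")
```

Note that you lose period - 1 points in the process, so a long period on a short series leaves very little to plot.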

Fitting straight lines

The ultimate smoothing can be achieved by fitting a straight line to a curve. This was done in our opening case study and is shown again in Figure 19. This is achieved by clicking on the curve and then using:
Chart > Add trend line > Linear
As can be seen, the straight line is a good fit with the data. But beware. Excel will fit a straight line to any data, even when it is not appropriate.
Figure 19 Adding a straight line trend
[Line chart showing the data with a straight trend line added. Vertical axis: 0 to 30. Horizontal axis: Time, 1 to 20.]
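For readers who want to see what fitting a straight line involves, here is a minimal least-squares sketch in Python. It is an illustration under stated assumptions: the function name is hypothetical, and the data are the 18-24 enrolments from Table 18, chosen by me because they are to hand, not the series plotted in Figure 19.

```python
def fit_line(xs, ys):
    """Least-squares straight line through the points: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    co_spread = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = co_spread / spread_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# 18-24 enrolments from Table 18, with time counted in years since 1997
years_since_1997 = [0, 1, 2, 3, 4, 5]
young_students = [5743, 7006, 7018, 10300, 10369, 12140]

slope, intercept = fit_line(years_since_1997, young_students)
print(f"Fitted line: enrolments = {intercept:.0f} + {slope:.0f} x years since 1997")

# Extending the line one year ahead. With only six uneven points this is a
# demonstration of the arithmetic, not a forecast to rely on.
print(f"Line extended to 2003: {intercept + slope * 6:.0f}")
```

Like Excel, this code will fit a line to anything you give it, so always plot the data and the line together before trusting the result.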

Changing curves to straight lines

Some curves that you create in graphs will conform, more or less, to curves that can be created by mathematical equations. If you are sufficiently mathematical you will be able to transform these curves into straight lines and then extrapolate the straight lines for prediction purposes.
Fitting curves to curves

An alternative is to attempt to fit mathematical curves to your own graph. Look at the curve in Figure 20. Here there is almost no change in the early period followed by a period of explosive growth. You could fit a straight line to either of these periods but it would be impossible for a single straight line to do justice to the whole curve.

Figure 20 The curve to be fitted
[Line chart. Vertical axis: -10 000 to 110 000. Horizontal axis: Time, 1 to 25.]

In Figure 21 we have used Excel to fit a particular type of curve called a second order polynomial, using:
Chart > Add trend line > Polynomial > Order = 2

Figure 21 Curve fitted using a second order polynomial function
[Line chart showing the original curve with the fitted polynomial. Vertical axis: -10 000 to 110 000. Horizontal axis: Time, 1 to 25.]


You don't need to understand the maths, just examine the fit. Clearly it is a better fit than a straight line but it does not really match the original curve very well. Now look at Figure 22 where we have fitted an exponential curve. Again the maths is not important but you can see that the new curve fits the original one very well. We can now use the formula that underlies the exponential curve to predict the measurement on the y-axis for any value of time on the x-axis.

Figure 22 Curve fitted with an exponential function
[Line chart showing the original curve with the fitted exponential curve. Vertical axis: -10 000 to 110 000. Horizontal axis: Time, 1 to 25.]
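If you are curious about how an exponential curve can be fitted and then used for prediction, one common approach is sketched below in Python. It turns the curve into a straight line by taking logarithms, fits the line, and then converts back. The text does not say how Excel does this internally, so treat the method, the function names and the made-up data as assumptions for illustration only.

```python
import math


def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) by fitting a straight line to (x, ln y).

    All y values must be greater than zero.
    """
    logs = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_log = sum(logs) / n
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    co_spread = sum((x - mean_x) * (lg - mean_log) for x, lg in zip(xs, logs))
    b = co_spread / spread_x
    a = math.exp(mean_log - b * mean_x)
    return a, b


def predict(a, b, x):
    """Value of the fitted curve at time x."""
    return a * math.exp(b * x)


# Hypothetical measurements showing slow early change then explosive growth
times = list(range(1, 26))
values = [2 * math.exp(0.45 * t) for t in times]

a, b = fit_exponential(times, values)
print(f"a = {a:.3f}, b = {b:.3f}")
print(f"Predicted value at time 26: {predict(a, b, 26):.0f}")
```

Because exponential curves climb so quickly, predictions even a few steps ahead can be very large, so check them against what is plausible in your context.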

Summary
In this unit, you have looked at a wide range of data summarising techniques including:
- identifying to which type a given piece of data belongs
- ranking data
- using the Normal distribution
- calculating a mean using Excel
- calculating a mode
- calculating a median
- using weighted and unweighted averages
- calculating a standard deviation
- calculating a range
- calculating quartiles
- calculating the interquartile range
- smoothing curves
- fitting a straight line to a curve.

Feedback to selected activities

Feedback to Activity 1
I think that you will find that most of my answers are of the 'it depends!' kind.
1 Student's occupation

If your list of occupations is something like farmer, miner, mechanic, etc then occupation is clearly a nominal level variable. However, there are also carefully designed occupational scales where those at the top have more skills, more income, more social prestige or whatever, than those at the bottom. In such cases occupation would become an ordinal scale.
2 Student's disability

This is fairly similar to occupation. If the question is 'Does the student have a disability or not?', then it is a nominal scale. An ordinal scale could also be constructed in which the extent of the student's disability could be calculated from 'very slight' to 'extremely severe'. However, in the cases of certain disabilities one could go further. It could be argued that a hearing loss for one ear can be calculated from zero to 100 as a ratio scale, where zero meant no hearing loss and 100 meant complete hearing loss. However, this zero is not really absolute as it relies on a human definition of what is normal hearing. So it is really an interval scale.
3 Student's IQ score

An IQ score cannot be ratio data, because there is no absolute zero: nobody can have zero intelligence. Nor can it really be interval data. While one can use sensitive equipment to measure percentage hearing loss, one cannot guarantee that the difference between scores of 98 and 99 is the same as the difference between 136 and 137. So it is best described as an ordinal scale. However, most researchers treat it as though it were an interval scale.
4 Student's income

This is a ratio scale. The units are equal, zero really is zero and the fact that income might be a negative amount does not invalidate the argument.
5 Student's previous educational qualifications

This could be a nominal or an ordinal scale. For the latter you would need to rank order all of the possible qualifications held. It could also be a ratio scale if for example you were just measuring the number of passes in a school certificate or some other named examination. In that case you could say that someone with six passes had twice as many as the person with three.
6 Student's rating of the level of difficulty of a course

Strictly speaking these are ordinal scales. However, in practice rating scales have numbers attached to them and they are often treated as though they are interval scales. This implies that the spaces on the scale are equidistant, but this is rarely demonstrated.
7 Fee payments

This is a ratio scale.


8 Number of students registered on a course

Another ratio scale.

Feedback to Activity 2
I think you will agree that it is very difficult to get any feeling of what the distribution of ages is. So, where to begin? I will answer that question in the next part of the text.

Feedback to Activity 3
The first thing that you should have spotted is that student number 225 appears to be 221 years old. Clearly something has gone wrong here. Let's say that the student's real age was 21 and that somebody made a mistake when recording the age. For us it is now simple to correct student 225's score. Change the age from 221 to 21 and sort the data again. The first student should now be number 252 aged 35. (S2 on the Model worksheet)

This is an example of data cleaning that is covered in more detail later in the module. Of course there may be other students whose ages have been incorrectly entered, but here we can only spot ages that are outside the possible range.

Apart from that, about all we can say is that the students' ages ranged from a minimum of 15 to a maximum of 35. This will tell the head of the institution a little but she will probably need more than that. Clearly we need to know something about the pattern of scores: how are they distributed? What is a middle or average score? She might want to know how many students were very young or very old.


Feedback to Activity 6
Your answer should be 25 if you calculate it with zero decimal places. My calculations are in S5 in the Model worksheet.

Feedback to Activity 7
Your answer should be 2.1. If you had any problems, look at my calculations in C that starts in column N on the SD worksheet.

Feedback to Activity 8
You should have got:
68% will be between 27.1 and 22.9
95% will be between 29.2 and 20.8
We can make direct comparisons between the two groups, even though they have different numbers of cases. Because the second group has a bigger standard deviation than the first group, we know that their ages are more spread out, even though they have the same average. This is what we should have expected given the shape of the graph in Figure 12.

Feedback to Activity 9
The range in this case is 20-30.

Feedback to Activity 10
You should have found that the second quartile is 25.21 and the third quartile is 27.63. (See calculations in Q2)

Feedback to Activity 12
1 The longer line shows an upwards trend over time. The shorter line shows a downward trend over time. The downward trend is steeper than the upward trend.
2 Given that these are straight lines and we have no other contextual information, the simplest way to predict future measurements is to place a ruler on each line and simply extend the line towards the right hand side.
3 The longer line is likely (but not certain) to be the better predictor because it is based on a longer run of measurements. But you should have noticed that the shorter line is already down to zero. Whether or not it is meaningful to extend this line depends upon the context. While institutional budgets can go into deficit, if we are talking about student enrolments then they cannot go below zero.


Feedback to Activity 13
1 Line A shows a declining trend. It starts off quite steeply and then flattens. Line B represents a rising trend. It starts off as almost a flat line and then increases dramatically until the line is almost vertical.
2 If we had only got the first five measurement points, with line A we would have predicted a much steeper decline than actually occurred. With line B we would have predicted almost no increase at all and we would have been spectacularly wrong.
3 In both cases we have 25 measurement points and a clearly defined curve. Both should produce good predictions. However, we know nothing about the context of these figures so it is difficult to evaluate them. Subjectively I would feel that the slow decline is more sustainable. The very rapid increase of line B is heading towards infinity and I feel that something will have to give!


Unit 5: Doing institutional research from scratch

Unit overview
This unit is designed to help you plan how you will collect data for a particular research project. You will look at:
- the ideas of reliability and validity
- the dimensions of data collection methods
- the range of quantitative research methods
- designing good questions to ask people
- borrowing good questions
- forms of questions
- designing a good questionnaire
- which people to study
- carrying out a survey
- analysing.

Learning outcomes
When you have worked through this unit, you should be able to:
1 Identify factors that affect the validity of data.
2 Identify factors that affect the reliability of data.
3 Identify dimensions of data collection methods which might affect the collection in a given situation.
4 Describe the main quantitative research methods.
5 Identify poorly designed questions and rectify their faults.
6 Implement the guidelines for good question writing.
7 Use open and closed questions appropriately.
8 Design an effective questionnaire.
9 Decide on the most suitable sampling strategy for your survey.

Introduction
In the previous sections we have looked at two sorts of institutional research. First there was the research that involved examining the research of others, and then there was the research where you looked at institutional databases. In this section we are going to look at the type of research where you, the researcher, make virtually all of the decisions, such as what data to collect, how to collect it, how to analyse it, etc.

I found it difficult to come up with a title for this section. I have used 'from scratch' to indicate that nothing is given: lots of decisions have to be made. I could have said 'ad hoc', a Latin expression that means that it is done for a single special purpose, or improvised. In the physical sciences this is virtually the only type of research that is done and so no extra label is required. In the case of institutional research a standard label for this third strand has yet to emerge.

The key distinguishing characteristic of this type of research is that it involves collecting new data. This data has to be collected by a research method. But which one? Our choice should be guided by our aims.

Validity and reliability


Whatever quantitative methods are selected, our aim is to collect data that is both valid and reliable. These terms are fairly straightforward but are often misused.

Imagine that you have an old pair of weighing scales and that you are trying to weigh a bunch of bananas. You arrange the weights until the scales balance and you decide that the bananas weigh 9 kilos. To check your calculations, you ask a few friends to repeat the procedure and they all agree that the answer is 9 kilos. Therefore you have a reliable measure, i.e. repeated measures produce the same results. However, it turns out that the weights are very worn and the bananas actually weigh 10 kilos. So while your measure is reliable it is not valid, i.e. it is not actually measuring what you have set out to measure. So in this case the research method is reliable but not valid.

Activity 1

15 mins

Valid but not reliable
Using the weighing example, try to think of a situation where the results would be valid but not reliable.

The feedback to this activity is at the end of the unit


Unit 5: Doing institutional research from scratch

So in principle, we should use a research method that we believe to be valid (it measures what we want to measure) and reliable (the results are consistent). However, the phenomena that we are studying are highly complex and so the position of us, the authors of all these training modules, is that whenever possible one should use an array of methods. In this way one gets a variety of perspectives on the phenomenon in question. By a process of triangulation (a concept taken from map-makers), we hope to establish where reality lies.

The dimensions of data collection methods


The great majority of this type of research involves learners providing information by filling in forms or questionnaires. We too will concentrate on this method when exploring design and analysis techniques later. However, quantitative information or data can be collected in a great variety of ways. Let's consider some of the important dimensions of this variety.

Obtrusiveness
Is the student aware of what you are doing and why? At one extreme, data can be collected without the people being aware that it is happening. For example, you could secretly tape record a tutorial or class then later measure the amount of time that the tutor spoke, or the time that the students spoke. Or you could ask for copies of student assignments and score them for understanding of certain concepts. At the other extreme, ODL researchers have been known to attach equipment to students' heads in order to measure eye movement when they were learning from ODL printed course material.

In general, the more obtrusive the research method, the more likely the phenomenon is to be affected and distorted by your measurement techniques. Thus your results are less likely to be valid. On the other hand, such close control of the research situation can improve reliability.

The research setting


This is related to the first topic. In what setting is the research carried out? On the one hand you could strive for the most naturalistic setting possible. Teaching and learning is allowed to proceed in the normal way, at home or in classrooms, and measures are taken unobtrusively or afterwards.

However, you may want to research something in isolation from the whole programme. You might want to try out two versions of a radio programme and decide that the best way is to randomly divide a group of volunteer learners into two groups. They listen separately to one of the two versions, then you measure their reactions and compare the results.
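Randomly dividing volunteers into two groups is easy to do by drawing lots, but if your list of learners is on a computer it can also be sketched in Python. The function name and the learner labels below are hypothetical, and the seed is only there so that the same split can be reproduced and checked later.

```python
import random


def allocate_two_groups(learners, seed=None):
    """Shuffle the learners and split them into two groups of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(learners)      # copy, so the original list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical list of 20 volunteer learners
volunteers = [f"Learner {number}" for number in range(1, 21)]

version_a, version_b = allocate_two_groups(volunteers, seed=2024)
print("Listen to version A:", version_a)
print("Listen to version B:", version_b)
```

Letting chance decide who hears which version helps to ensure that any difference in reactions is due to the programme and not to how the groups were chosen.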


The more naturalistic the research setting, the more complex the social situation will be and the more difficult it will be to isolate the effects of a single factor such as a single radio programme.

The level of control


Clearly you, the researcher, are more in control if you have brought the research subjects together in a single place and can tell them exactly what to do. With surveys the amount of control is more variable. Traditionally in market research and government social surveys the questions are read out by an interviewer who then records the person's answers. However, in ODL there has been a huge use of self-completion questionnaires. In the main these have been mailed out to learners who complete them and then mail them back.

Interviewers are usually trained to be 'talking tape-recorders'. They are meant to read the questions out and then record the answers in a machine-like manner. However, human touches may occur! Respondents may ask them to explain what a question means. When coding answers into categories (e.g. if the person says they are an engineer, should that be recorded as a professional engineer or a skilled manual worker?), human judgement and variability come into play.

The more control you exert, the more standardised the data collection can be.

Involvement of the learner


Some research involves no contact at all with the people being researched. For example, if policy-makers want to know the answer to the question, "If you add more tutorial sessions to an ODL course, do retention rates go up?", this can be researched by comparing retention rates on similar courses that have different numbers of tutorials. On the other hand, if the question is "If you add more tutorial sessions to an ODL course, do students enjoy the courses more?", then the learners are more involved. They become the subjects rather than the objects of your study. They will have their own ideas of why you are doing the study; they may well have a vested interest in the outcomes of the study; but they may not have thought about a particular topic until you ask them for their opinion.

A further step is to involve the learner in the research design itself. At one level this can be to involve learners in groups to discuss questionnaire wording. However, it can be extended to finding out what they feel the research issues should be, all the way to collaboration over data analysis and interpretation.


Unit 5: Doing institutional research from scratch

Experimental manipulation
Experiments are a fundamental technique of the natural sciences. In chemistry you may want to answer the question, "If you add chemical A to chemical B at different temperatures, is chemical C produced?" You conduct an experiment in a laboratory under carefully controlled conditions and conclude that chemical C is only produced when the temperature is above 50°C. In a similar way one might imagine setting up a social experiment and concluding that "If you add more tutorial sessions to an ODL course, retention rates go up, but only among students aged under 25." In practice, such results are very hard to achieve.

One could attempt to set up a real laboratory experiment. Here the learners would be randomly allocated to different experimental groups and some groups would be given more tutorial support than others. The more one makes it a laboratory situation, the more control one has over the situation and the more one can control for other factors that might influence the results (for example, whether or not learners can get to tutorials). However, the laboratory is a very unnatural setting. Results obtained there may not generalise to real life.

One could attempt a quasi-experiment where learners in some tutorial groups or regions are given more tutorial sessions than others. Here one is trying out a policy decision in a real-life situation. However, there are many factors at work other than the number of tutorials and these may confuse or obliterate the effects that you are trying to measure.

Activity 2

10 mins

Your research project

If you have been designing your own research proposal as you work through the modules, spend a few minutes considering how it fits in along these five dimensions. Think about what you may have lost or gained in terms of validity and reliability by the decisions you have taken. If your research proposal involves examining the research of others or looking at institutional databases, then the activity is fairly simple. If you will be researching from scratch the activity will be more challenging.

There is no feedback to this activity


The range of quantitative research methods


There are many types of quantitative social research methods (Table 21). Each has its strengths and weaknesses and each is more or less suitable in different situations. Here we uncritically describe the range in terms of how the data is collected just to make you aware of the variety.
Table 21 Types of quantitative social research methods

Category: Directly asking people to provide information
Examples and comments:
Face-to-face:
- you, the researcher, go to the researched in their home or workplace
- they come to you
- you meet in a third place, e.g. a study centre
By telephone
By mail
By email or other electronic methods
Other, e.g. students keep diaries or make audio tapes

Category: Observational
Examples and comments: Much ODL involves learners studying at home on their own and observational studies are virtually impossible. However, when learners come together for classes, tutorials, residential schools or whatever, they can be observed in action, so to speak.

Category: Records
Examples and comments: This often shades into our second type of research which draws upon institutional databases. Records of assignment grades, tutorial attendance, the grades given by a particular tutor, etc. can be drawn upon for research purposes. While tutorial attendance records may have been kept for the purposes of deciding where to geographically locate study centres, you may want to use them to look at the relationship between tutorial attendance and student progress.

Category: Other methods
Examples and comments: There are many possibilities of indirect methods that do not fit easily into the first three categories. For example:
- interviewing tutors about the experiences of their students
- obtaining measures of energy use after teaching a course on reducing the consumption of domestic energy
- noting which newspaper adverts generate the most enquiries
- looking at measures of economic activity in areas where an ODL scheme has taken place.

All of these methods will generate quantitative data that need to be statistically analysed and we will go through analysis procedures in the last section of this module. In the following sections we will focus on the methods of collecting data from people by asking them questions.

Designing good questions to ask people


In real life it does not matter so much if you ask an ambiguous question. Take this simple example.

You: What day is it today?
Me: Thursday.
You: No, I meant what is the date.
Me: Oh, it is the 22nd of March.

With a self-completion questionnaire you have to get it right first time. If the questionnaire is being administered by interviewers, they can try to clear up ambiguities but there is no guarantee that they will do this consistently and in line with your intentions.

Activity 3

15 mins

What makes a good questionnaire question?

Consider the two questions below. Which do you think is the better question to put in a self-completion questionnaire, and why?

Q1 Where are you from? (Please specify)

Q2 In which country were you born? (Tick one only)
A-land . . . . . . . . . . . . . . . . . . . . 1
B-land . . . . . . . . . . . . . . . . . . . . 2
C-land . . . . . . . . . . . . . . . . . . . . 3
D-land . . . . . . . . . . . . . . . . . . . . 4
Other (Please specify below) . . . . . . . . . 5
_______________________________________________

The feedback to this activity is at the end of the unit

In summary I would say that our aim is to ask unambiguous questions that people find easy to answer, in a way that lets us record and count the answers, and that will produce valid and reliable data. Let's unpack each of those requirements in turn.

Unambiguous
We want to ask questions that people can understand, and understand in the same way as everybody else. Consider these questions:
Q1 Have you been satisfied with this course?

What do we mean by "course"? Do we mean a whole programme or a module? Do we mean the written ODL material or everything, including tutoring, assessment, administration, etc.?


Q2 Do you like your tutor?

What do we mean by "like": like as a person, as a teacher, as a grader of assignments?


Q3 What languages do you speak?

Do we mean speak fluently, or speak at all? What about writing skills in each language?

Activity 4

5 mins

What makes a question ambiguous?
1 Consider the question "How many children do you have?"
2 Can you see why it might be ambiguous?
3 Can you suggest an alternative?

The feedback to this activity is at the end of the unit

Easy to answer
We want to ask people questions that they can answer, possibly with a little thought, but not requiring a lot of research of their own. Consider these two questions:
Q1 How many kilograms of grain did you harvest last year?

Is harder to answer than:


Q2 Was last year a good harvest?

Or another two:
Q1 Please list all of the exams that you took at school, giving dates and grades.

Is harder to answer than:


Q2 Did you stay on at school after the minimum leaving age?

Activity 5

5 mins

Making a question easy to answer

You want to know how big a student's village is. How could you make the question "How many people live in your village?" easier to answer?

The feedback to this activity is at the end of the unit


Easy to record and count


The answers need to be recorded in ways that are easy for the researcher to add together. This will normally be done on a computer. In our examples in the "What makes a good questionnaire question?" activity it was clear that the second question was much better in this respect.

Valid and reliable


As we described earlier, the question must measure what you want to measure, and get the same data if repeated. It must produce valid and reliable data. Before framing a question you must be clear what you want to know. Imagine that your students tend to be people who work in the city during the week but return to their villages for weekends or holidays. A question such as "Where do you live?" will not produce valid and reliable data. The question should be re-phrased, perhaps as one of the following:
Where were you staying last Wednesday night?
Where is your family based?
Where are you registered to vote?

It depends upon what it is that you actually want to know.

Guidelines for good question writing


Writing good questions is probably more an art than a science but there are certain basic guidelines that should be borne in mind. Eleven are offered below:

1 Use the right language


This seems simple enough. Surely you just design the questionnaire in the national language of the country where you are doing the survey. Well, not always. When I was in South Africa I was reminded that for many students their first language was their tribal one, while Afrikaans was their second language and English their third. Furthermore, they were generally more fluent in Afrikaans but chose to be taught in English. In Mauritius a survey was carried out both in the official language and in the local patois, a variant of French. In the United Kingdom some survey documents also have to be produced in Welsh. I do not know your situation, but it is quite likely that you will have to design your questions in two or more languages. This brings with it problems of strict comparability. Experts should be used to translate variants of questions backwards and forwards to see whether they really match up.

2 Use the right tone


I feel that a questionnaire should be somewhere between a formal government form, such as a census or income tax form, and a letter to a friend: not too formal and not too chatty or colloquial. It should be friendly and use plain language. It should not be patronising or condescending: your respondents should be treated as equals.

3 Avoid technical terms


Try not to use technical terms that might be in common use between you and your colleagues but might not be known by your respondents. (This is a problem I have had to face when writing this module.) Acronyms should be avoided whenever possible. You may refer to tutor-marked assignments as TMAs but not everybody does.

4 Use short questions


Short questions are easier to grasp than long ones. To take an exaggerated example, it is better to say "How old are you?" than "If I were to ask you your age at this moment, to the nearest year, what would be your reply?" However, simple questions may require some explanatory notes alongside the short question. For example, we might have to say something like:
What is your gross personal income, approximately?

By this we mean all that you earn, including overtime, bonuses and benefits (e.g. company car). Plus all unearned income from savings and investments, etc. Before any deductions for income tax, national insurance, etc.

5 Ask one question at a time


It is a classic mistake in surveys to ask two questions in one. You have to be extremely vigilant and usually the simplest solution is to turn it into two questions.

One question: bad
Did you find this course interesting and enjoyable?

Two questions: good
Did you find the course interesting?
plus
Did you find the course enjoyable?

One question: bad
If you did not like the course, what should we change?

Two questions: good
Q1 Did you like the course? Yes/No
Q2 If no, what should we change?

6 Do not ask invasive questions


You do not want to put people off answering your questionnaire by asking them questions that they feel are rude or embarrassing. If such questions are necessary, they should be placed towards the end of the questionnaire and people should be offered the chance to say that they prefer not to answer the question. What constitutes an invasive question will vary from culture to culture. It may relate to religion, politics, family arrangements, etc. In Britain it was considered that asking a person their income was the height of bad manners, but things seem to have changed recently.

7 Do not use leading questions


You should not ask questions that guide people towards a particular answer. Again to use an extreme example, if you say "Most people think that our courses are wonderful. What do you think?" you are already suggesting what answer you expect. Usually it is more subtle than that. Apparently people like to say "Yes" to things, a phenomenon known as "yea-saying", so you should mix up positive and negative statements when presenting respondents with a list of statements to agree or disagree with. Also, people will say "Yes" to everything if you ask them whether they want lower fees, more tutorial support, more radio programmes, etc. You need to set up your questions so that they have to choose what they would accept less of in order to fund the improvements.

8 Do they have the information to answer the question?


This comes in several forms. If you asked me my height in metres, I don't know. I know it in feet and inches. I guess that I could look up a conversion table but I don't have one to hand. If you ask me my secondary school grades, I don't know. I have forgotten them and I have no written records. If you ask me my feelings about a particular part of last year's course, I don't know. I simply can't remember. If you ask me for my opinion about whether the course should be on a CD-ROM, I don't know. I have not thought sufficiently about the topic to have an opinion. I do not have the information needed to form an opinion.


9 Is there an option for everybody?


If you are asking people questions where they have to tick one or more of the answers that you have given them, it is very important that everybody can find their own answer in the list. For example, if you are asking the learners to categorise their father's occupation, the options should include:
- I do not know my father's occupation
- My father has no occupation
- My father is no longer alive

The option "Other" can also be used, but it is a sign of a badly designed question if more than 5% of your respondents actually select it. Pre-testing of the questionnaire should have thrown up most of the main possible answers.
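The 5% rule of thumb for the "Other" option is easy to check once pre-test answers have been keyed in. The sketch below assumes each answer is stored as the label of the option ticked; the occupation labels and counts are made up for illustration.

```python
from collections import Counter

def other_share(answers, other_label="Other"):
    """Return the fraction of respondents who chose the 'Other' option."""
    if not answers:
        raise ValueError("no answers to analyse")
    counts = Counter(answers)
    return counts[other_label] / len(answers)

# Hypothetical pre-test answers to a tick-one occupation question
pretest = ["Farmer"] * 12 + ["Teacher"] * 5 + ["Other"] * 3
share = other_share(pretest)
needs_review = share > 0.05  # the 5% rule of thumb from the text
print(f"{share:.0%} chose Other; review the options: {needs_review}")
```

If the share is too high, the written-in "Other" answers from the pre-test are the obvious source of new options to add to the list.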

10 Is it relevant to them?
While a topic may be of great relevance to you as a researcher and to policymakers within your institution, it may seem irrelevant to the respondent. If so, they may abandon the survey. You should make great efforts to explain why it is relevant or consider dropping the topic.

11 Can others answer it?


Always try out your questions on colleagues, family, friends, students, etc. You will be amazed how often something that seems straightforward and easy to you is confusing to others. A final critical test of any question is whether you could answer it, or would be prepared to answer it, yourself. Try it!

Activity 6

10 mins

Improving poorly worded questions
1 Based on the points listed above and on your own experience, what if anything is wrong with the questions A, B and C below?
2 In each case, write a better version of the question.

A Did you find the SAQs useful and stimulating?
Yes . . . . . . . . . . . . . . . . . . . .
No . . . . . . . . . . . . . . . . . . . . .

B What newspaper do you usually read? (Please write in)
..........................................................................................

C Do you own a computer?
Yes . . . . . . . . . . . . . . . . . . . .
No . . . . . . . . . . . . . . . . . . . . .

The feedback to this activity is at the end of the unit

Borrowing good questions


So far we have just thought about designing your own questions for your own purposes. However, researchers have been designing questions for years and it will not always be necessary for you to start from scratch. If you consult the research literature you should be able to find many questions that will meet your needs. Apart from saving the energy involved in designing new questions, there should be two other big advantages if you borrow from elsewhere.

Firstly, the previous researcher should have tested out the question to ensure that it provided valid and reliable results. To take a simple example, you may have been considering using the question:
What is your age? _________

But then you see in the literature that it is more common to say:
What year were you born? 19 _ _

(Example 19 4 7)

Secondly, if you ask the same question and analyse the answers in the same way, there should be results that you can compare your data with. For example, if you take the questions about occupation from your national census and code the answers into the standard occupational groups, you will be able to compare your student figures with regional and national data. Then you will be able to see where you are being more or less successful in attracting certain occupational groups.

This borrowing of questions reaches its peak when researchers use whole questionnaires, often in the form of inventories or psychometric tests. Inventories tend to be lists of statements for the respondent to tick or not, to agree with or not. An example is Kember's Distance Education Student Progress (DESP) Inventory (1995). He took the original Approaches to Studying Inventory (Entwistle and Ramsden, 1983) and adapted it for distance education students. An extract from this inventory can be seen in Table 22. The four items shown are about intrinsic and extrinsic motivation for studying. When the whole inventory has been completed the student can be scored on various factors such as deep and surface learning, academic integration, etc.
Table 22 Extract from Kember's Distance Education Student Progress (DESP) Inventory (1995)

Response scale: 1 = Strongly agree, 2 = Agree, 3 = Neither agree nor disagree, 4 = Disagree, 5 = Strongly disagree

My main reason for doing this course is so that I can learn more about subjects which really interest me    1  2  3  4  5
I suppose I am more interested in the qualification I'll get than in the course I'm taking    1  2  3  4  5
I find that studying academic topics can often be really exciting    1  2  3  4  5
I chose the present course mainly to give me a chance of a really good job afterwards    1  2  3  4  5

Psychometric tests attempt to measure certain psychological characteristics or aptitudes. These may be things like intelligence, motivation, introversion, etc. Such tests will have "norms". These are scores obtained by the general population, or by specified sub-sections of it, and they will provide comparisons for your own results.

When you are thinking about using other people's questions there are five things to bear in mind:

Suitability
The tests and inventories will have been designed with certain age-groups and literacy-levels in mind. Do your learners fit in with the target group? It is just as important to establish whether the cultural norms of your target population match those of the original population. Many intelligence tests are considered to be culturally biased, even within a single country. I came across an example in Alaska where children were shown a picture of a car tyre and then pictures of a car, a boat, a shopping trolley and a bicycle. The question was which picture the car tyre goes with. The local Inuit children picked the boat because car tyres were used as fenders on their boats. They were marked as being wrong.

Permissions
It is good manners, and generally a legal requirement, to ask the original author for permission to use their questions. In the case of tests and inventories there may well be licence fees involved.


Proficiency
In many cases you will have to hold a certain professional psychological qualification before you are allowed to use the tests.

Conditions
Several inventories, including the DESP mentioned above, have been administered at a distance. However, most psychometric tests have to be carried out under carefully controlled conditions. This means that self-completion at home will not be acceptable.

Quality
Just because questions have been asked before does not mean that they are good ones. When you contact the author, they may well suggest improvements. You should also read around in the research literature to get the opinions of other researchers on the questions.

Forms of questions
By now you will have come across various forms of questions, either in this module or elsewhere. Here we present the types that are in most common use and offer a short commentary on each.

1 Open-ended questions
Here the respondent is asked a question and then given space to answer.
Example:
What is your job title?
What do you feel about the new arrangements for local tutorials?

In terms of quantitative research, such answers have to be put into a numerical format so that they can be subjected to statistical analysis. This is what we refer to as "coding". With the job title question it may well involve you and your team of coders working to a precise set of rules set down in a government or international classification scheme. It is a tedious, time-consuming job that requires both intelligence and consistency to achieve valid and reliable results. It is also not helped by respondents who, when asked to give their precise job title, simply answer "police" or "engineer". You may well want to consider whether it is better to offer a list of job categories, with examples, and ask respondents to code themselves into one of them.

With the second example, where people are asked to write an answer at length, you are faced with two choices:
1 you can read through 50 or more answers and draw up a list of the ten or so most common themes. Then you code everybody's answers using this "coding frame" and analyse the data as a quantitative measure
2 you can use selective quotations from the answers to illustrate findings from other numerical data.

There are several drawbacks with the first method:
- it is costly in time and money
- you get variability between coders
- you often end up with a large "other" category
- a thought that is only mentioned by a few respondents might well have been mentioned by many if it had been given as an option in the original questionnaire.
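Variability between coders can be put into numbers if two coders independently code the same batch of answers. The module does not prescribe a method; the sketch below uses two common measures, simple percentage agreement and Cohen's kappa (agreement corrected for chance), and the theme labels and codes are invented for illustration.

```python
from collections import Counter

def percent_agreement(coder_a, coder_b):
    """Share of answers that both coders placed in the same category."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must code the same non-empty batch")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)

def cohens_kappa(coder_a, coder_b):
    """Agreement between two coders, corrected for chance agreement."""
    n = len(coder_a)
    observed = percent_agreement(coder_a, coder_b)
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: probability both pick the same category at random
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # convention used here: everything coded into one category
    return (observed - expected) / (1 - expected)

# Hypothetical codes given by two coders to the same eight answers
coder_a = ["fees", "tutor", "tutor", "other", "fees", "tutor", "fees", "other"]
coder_b = ["fees", "tutor", "fees", "other", "fees", "tutor", "tutor", "other"]
print(percent_agreement(coder_a, coder_b), round(cohens_kappa(coder_a, coder_b), 2))
```

Low agreement usually points to a coding frame whose categories overlap or are poorly defined, rather than to careless coders.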

Activity 7

5 mins

Open numerical questions

Numerical questions can also be asked in an open way. Compare these two questions. Which do you think is better and why?

A How old are you? (Please write in age)
I am _ _ years old

B How old are you? (Please circle one number only)
Under 18 . . . . . . . . . . 1
18-20 . . . . . . . . . . . . . 2
21-25 . . . . . . . . . . . . . 3
26-30 . . . . . . . . . . . . . 4
31-40 . . . . . . . . . . . . . 5
41-50 . . . . . . . . . . . . . 6
51 and over . . . . . . . . 7

The feedback to this activity is at the end of the unit

If you are faced with a big survey and you want quantitative data, I would recommend that you use closed questions as far as possible, i.e. those that can be instantly turned into numbers. We will continue with a list of typical closed-ended question types.

2 Select one or several from a list


These are straightforward types of question and we have already come across many examples in this module. Here is a question that combines both types:


Q8 a Which tutorials did you manage to get to? (Tick all that apply in Column A)
   b Which tutorial did you get the most educational benefit from? (Tick ONE box only in Column B)

Tutorial number            A Attended               B Most educational benefit
                           (Tick all that apply)    (Tick ONE only)
1. Held on April 23rd      [ ]                      [ ]
2. Held on May 17th        [ ]                      [ ]
3. Held on June 12th       [ ]                      [ ]
4. Held on July 25th       [ ]                      [ ]
Did not attend any         [ ]

3 Scales
With this type of question respondents are asked to rate something on a scale, i.e. an array of numbers running from low to high.
Likert scales

The most commonly used is the Likert technique, named after its inventor. We have already seen an example from Kember's DESP, above, where respondents are presented with a set of attitude statements and are asked to express their agreement or disagreement on a five-point scale. Such scales are sometimes seven- rather than five-point, and they can have different headings. For example:
Q42 Use the following list of characteristics to describe yourself. (Circle a number from 1 to 7 to indicate how true of you each characteristic is.)

Scale: 1 = Never true, 2 = Usually not true, 3 = Sometimes true, 4 = Occasionally true, 5 = Often true, 6 = Usually true, 7 = Always true

Logical      1  2  3  4  5  6  7
Confident    1  2  3  4  5  6  7
Nervous      1  2  3  4  5  6  7
Patient      1  2  3  4  5  6  7
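When positively and negatively worded items are mixed in one scale, the negative items are normally reverse-scored before items are combined, so that a high number always points the same way. The sketch below shows the arithmetic for a 1 to 7 scale. Treating "Nervous" as the reversed item, and averaging the three items into one score, are assumptions made purely for illustration and are not the scoring rules of any published inventory.

```python
def reverse_score(score, low=1, high=7):
    """Flip a rating, so that 2 on a 1-7 scale becomes 6."""
    if not low <= score <= high:
        raise ValueError(f"score {score} is outside {low}-{high}")
    return low + high - score

def scale_mean(responses, reversed_items, low=1, high=7):
    """Average one respondent's ratings after flipping the reversed items."""
    adjusted = []
    for item, score in responses.items():
        if not low <= score <= high:
            raise ValueError(f"{item}: score {score} is outside {low}-{high}")
        if item in reversed_items:
            score = reverse_score(score, low, high)
        adjusted.append(score)
    return sum(adjusted) / len(adjusted)

# Hypothetical answers from one respondent to three Q42 items
respondent = {"Confident": 6, "Nervous": 2, "Patient": 5}
score = scale_mean(respondent, reversed_items={"Nervous"})
print(round(score, 2))
```

Averaging treats the gaps between scale points as equal, which is itself an assumption that some researchers would question.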

Semantic differential scales

Another type of scale draws on the work of Osgood and semantic differentials. Here one places words or statements with opposite meaning at each end of a scale. Respondents have to indicate their own position between the two extremes:


Q23 How would you describe the course you took this year? (Please look at each pair of statements and indicate your own feelings by circling a number from 1 to 5)

Good value for money    1  2  3  4  5    Bad value for money
Well taught             1  2  3  4  5    Badly taught
Boring                  1  2  3  4  5    Interesting
Easy                    1  2  3  4  5    Difficult

Note that with this type of scale you avoid the need to find appropriate labels for each of the values.

4 Ranking
Respondents can be asked to put various items in order. For example, you can ask students to rank order the various modules of a course in terms of a selected characteristic.
Q17 This course contained 5 modules. We would like you to rank order them by putting a 1 against the module that you found the most interesting, a 2 by the second most interesting, etc. down to 5 for the least interesting.

            Your rank order    Example
Module A    ___                4
Module B    ___                3
Module C    ___                5
Module D    ___                1
Module E    ___                2

Although it seems a good idea to get students to make comparisons in this way, my experience is that ranking does not work well for the following reasons:
- no matter how well you explain the idea, many people will not answer in the way you want them to
- many like to say that two or more items cannot be divided and they will write in "=" signs
- how do they rank order a module that they have not studied?
- statistical analysis is difficult. For example, you cannot assume that the distance between ranks 1 and 2 is the same as that between 4 and 5.
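Because many respondents do not rank in the intended way, it helps to screen each returned ranking before analysis. The following sketch flags the problems listed above (missing ranks, ties, out-of-range numbers). It assumes answers are keyed in as a mapping from module to rank, with None where nothing was written; the function name and messages are hypothetical.

```python
def check_ranking(ranks, n_items=5):
    """Return a list of problems found in one respondent's rank order."""
    problems = []
    values = list(ranks.values())
    if len(ranks) != n_items or any(v is None for v in values):
        problems.append("not every module was ranked")
    given = [v for v in values if v is not None]
    if len(set(given)) != len(given):
        problems.append("tied ranks")
    if any(v not in range(1, n_items + 1) for v in given):
        problems.append("rank outside 1 to n")
    return problems

# The worked example from Q17, plus a hypothetical problem case
example = {"A": 4, "B": 3, "C": 5, "D": 1, "E": 2}
tied = {"A": 1, "B": 1, "C": 2, "D": None, "E": 3}
print(check_ranking(example))
print(check_ranking(tied))
```

Counting how many questionnaires fail such a check during pre-testing is one way to judge whether a ranking question is worth keeping.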

Other question formats


Questionnaires are very flexible and people have used them in many imaginative ways. For example, some have included pictures and diagrams, both to remind students of certain documents and concepts and to test out new ideas and designs. The limits are only your own creativity and the ability of respondents to work out what is required of them. We will leave this section with three further examples:

Example 1

Here we asked undergraduates to assess just how "part-time" their course was. As some would measure it in points and others would not, we offered alternative labels for the same categories.

Q1 What is your current workload on this course? (Please cross one box only)

CATS/Credit points    Approximately equivalent to:
Fewer than 60         Less than half-time
60-69                 Half-time
70-79                 Two-thirds
80-99                 Three-quarters
100-119               Almost full-time
120 or more           Full-time

Example 2

Here we assumed that undergraduates would be familiar with the concept of percentages and asked them to break down how their course fees were paid so that they added to 100%.

Q2 How were these course fees paid? (Please fill in the boxes so that they add to 100%)

                                                 Your answer    Example
Myself                                           ___            050
Family, friends, bank loans, etc                 ___            010
Fee waiver/financial assistance scheme           ___            010
My employer                                      ___            020
Other source e.g. Trade Union, educational
trust (Please specify below)                     ___            010 (Trade Union)
Total                                            100%           100%
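Answers to a question like Q2 need a simple consistency check at data entry, since not every respondent will make the boxes add up. A minimal sketch follows, assuming percentages are keyed in as whole numbers per source; the function name and the optional tolerance are illustrative choices.

```python
def fees_breakdown_ok(breakdown, tolerance=0):
    """True if every share is between 0 and 100 and the shares total 100."""
    values = list(breakdown.values())
    if any(v < 0 or v > 100 for v in values):
        return False
    return abs(sum(values) - 100) <= tolerance

# The worked example printed on the questionnaire
example = {
    "Myself": 50,
    "Family, friends, bank loans, etc": 10,
    "Fee waiver/financial assistance scheme": 10,
    "My employer": 20,
    "Other source": 10,
}
print(fees_breakdown_ok(example))
print(fees_breakdown_ok({"Myself": 60, "My employer": 30}))
```

Setting a small tolerance (say 1 or 2) would accept answers from respondents who rounded their figures.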


Example 3

This example was an attempt to get undergraduates to use the concept of trade-offs. If they wanted more of something, then they would have to be prepared to give up something else.

Q3 Your course consists of the components listed below. If we were to try to improve the course, which would you like more of, which less of and which about the same? (We have no extra resources so, for each item that you would like more of, please tick at least one item that you would not mind less of.)

                         Would like    The same    Would not mind
                         more of       amount      less of
Written material         [ ]           [ ]         [ ]
Assignments              [ ]           [ ]         [ ]
TV programmes            [ ]           [ ]         [ ]
Radio programmes         [ ]           [ ]         [ ]
Face to face tutorials   [ ]           [ ]         [ ]
Telephone tutorials      [ ]           [ ]         [ ]
Residential school       [ ]           [ ]         [ ]
Exams                    [ ]           [ ]         [ ]

Designing good questionnaires


Questionnaires are now very widespread and most of you are likely to have come across them. You may have been stopped on the street or phoned up and asked a series of questions about some commercial product or government service. You may have been asked to answer one when checking out of a hotel. Or your favourite magazine may include one that offers to tell you what sort of person you are. So what are they? What basic characteristics do they share?

My dictionary says that a questionnaire is: "A prepared set of written questions, for purposes of statistical compilation or comparison of the information gathered."

This seems an adequate definition for our purposes and we will assume that the questions have been prepared in line with the design principles outlined above. However, designing a good questionnaire requires more than putting a list of good questions together. We will consider some of the decisions to be made.

Length
Colleagues often ask me how long a questionnaire should be, but there is no simple answer. It is certainly not the case that longer questionnaires inevitably lead to lower response rates. I have had a 75% response to a 28 page questionnaire and a 5% response to one that was only one page long. It seems to depend upon how relevant the questionnaire is seen to be and how committed the respondents are to the institution running the survey. I would argue that a one page questionnaire might be seen to be trivial or inconsequential and thus be ignored.

Order
It is better to start with easier questions while respondents get used to how to fill in answers. These might be simple details about the course that they are studying. Some people start with demographics such as age and gender as these are easy, but it is more common to put these at the end.

Anonymous or not?
Whether or not to make the survey anonymous is a big decision. There are arguments in favour:
- it may encourage people to respond
- it may encourage them to be more open and honest with their replies.

But there are a lot of counter-arguments:
- it means that you have to ask them for information that you already have on record, e.g. age, gender, course details
- you do not know who has replied, so if you want to use reminders they have to go to everybody
- you cannot do longitudinal studies where you match up a given respondent's answers over two or more surveys
- it is difficult to calculate response bias.

On balance, I prefer to know who a questionnaire has come from. In my surveys I put the student's ID number on the questionnaire and explain why it is there. Using an ID number rather than a name also means that if the questionnaire does fall into the wrong hands the identity of the respondent will not be revealed.

Rewards
Some surveyors, especially those in the market research area, offer small rewards for completing questionnaires or a big prize based on a lottery. There is little evidence that this improves the amount or quality of data received and it is seldom used in the education field. When rewards are offered in ODL they tend to be in the shape of sharing the survey results with the respondents.


Routeing
In most surveys people are not required to answer all of the questions. They need to be routed around the questionnaire using clear instructions like those below.

Yes .... 1
No ..... 2   Now go to Question 8

or

Questions 10 to 12 are for everybody to answer.

Variety
Try to use a variety of question types. It is very boring for people to have to work through lots of almost identical questions. It has also been found that this encourages people to get into mind sets where they answer in particular ways and patterns rather than responding to the actual questions.

Page breaks
Try to avoid questions spreading over two pages. If forced to, you should repeat the question, and any column headings, so that it is not separated from the answers.

Lay-out
I think that those of you who have seen some self-completion questionnaires would agree that the lay-out (the physical appearance) is important. Some questionnaires are just not very appealing and this can be for a number of reasons.

Size of font
The writing should be easy to read. Many people have poor eyesight so I try not to use a font smaller than 12 point.

Font styles
Different font styles can be used for different functions. I use bold for the actual questions, italic for instructions and regular for the answer options.
Example:
Q1 How many wheels does a bicycle have? (Circle one answer only)
One ............. 1
Two ............. 2
Three or more ... 3
I don't know .... 4


Colour of paper
There is some evidence that certain paper colours achieve higher response rates. I am more concerned about whether the people who will key punch the data like the colour and whether having different colours for different versions of the questionnaire will make organising the survey easier.

Space
It is usually a false economy to try to crush your questions into the minimum number of pages. Space is good because:
- it is more friendly to a respondent's eye
- fewer errors are made in recording answers
- fewer errors are made in key-punching.

Introducing the questionnaire


The envelope containing the questionnaire should also contain an introductory letter. This could be a separate sheet or an integral part of the questionnaire itself. (Be warned! If you make it part of the questionnaire, the person might tear it off before returning the questionnaire, so don't print any questions on the back of that sheet.) This letter is your opportunity to influence the person to complete the questionnaire, so the letter should be both persuasive and short. We give an example below:


Institutional headed notepaper

Dear Alan

I am writing to ask you to spare a few minutes of your time to complete a questionnaire.

The University is reviewing the likely costs of the proposed new student support system. To do this we need to know the costs of your studies and how you are managing to pay them.

In order to gather accurate information our researchers have drawn up a carefully selected sample across a range of courses. For the results to be reliable and valid it is important that they get responses from as many people as possible within this sample.

The information that you provide will be STRICTLY CONFIDENTIAL and your answers will only be identifiable by members of the research team. They will not form part of your student record and any published results will be in the form of grouped data.

The questionnaire is enclosed. As you will see it is quite short and should only take you about 15 minutes to complete. We hope that you can find the time to complete it and return it in the reply-paid envelope provided.

Parts of the survey may seem quite intrusive. There are questions about your income, dependent children, etc. Unfortunately this level of detail is needed in order to calculate the impact of the proposed new student support schemes. If there are questions that you prefer not to answer, please skip past them and answer the rest.

If you have any problems or concerns about this survey, please get in touch with me at the address given above.

Thank you for your help

Yours sincerely

The Vice-chancellor

The letter has many important features, all of which help to increase the response rate:
- the survey is confirmed as official by the headed notepaper and as important by the seniority of the person signing the letter
- the student is addressed by name
- the first line of the letter gets straight to the point: 'This is what we want you to do'
- it explains why the information is needed
- the person is told the basis on which they were selected and the need for a high response rate
- the confidentiality level of their answers is explained
- they are told how long the task will take and how to return the completed questionnaire
- they are told what to do if they do not want to answer some of the questions
- they are told who to contact if they have any problems
- they are thanked.

Instructions
It is customary to include a few general instructions at the start of the questionnaire that tell people how to fill it in. The precise instructions will depend upon the type of questionnaire, but here are two examples.

The first is for a paper questionnaire that will be key-punched.

Instructions
Please circle the number corresponding to your answer like this or write in your answer in the space provided.

The second is for a questionnaire that will be scanned in by a computer.

Instructions
Please use a ball-point pen to complete the questionnaire. Do not use a fountain or felt tip pen as the ink may be visible on the other side of the page. The questionnaire will be read by a computer scanner, so please fill it in as follows. Place an X in the appropriate box, keeping within the boundary, for example: X. If you make a mistake and cross the wrong box, please block out your answer and then cross the correct box. For example: X X

Finishing off
At the end of the questionnaire you should do three things:
- thank the person for filling in the questionnaire
- tell them how and where to return their completed questionnaire
- give them a contact phone number or name and address in case they have problems or queries.

Designing for disability


Some people have disabilities that affect their ability to read and to complete questionnaire surveys. As most ODL material is distributed as print we assume that the learners can at least access text in some way, so we usually add a line to the letter asking them to get in touch if they would like to complete the questionnaire in a different format, e.g. large print, Braille, audiotape.


Activity 8

45 mins

Critiquing a questionnaire
I have designed a short imaginary questionnaire below. I want you to go through it carefully and to critique it in terms of questionnaire design. Do not worry about the wording of the actual questions. Concentrate instead on the structure and lay-out.

The feedback to this activity is at the end of the unit


Which people to study?


Well, the most obvious category is the students and learners on ODL courses, but clearly there are others.

Activity 9

10 mins

Which people to study?
List other categories of people that you might want to collect data from as part of your practitioner-based research.

The feedback to this activity is at the end of the unit

All or some?
If you decide to collect data from all of the people in your chosen category (the population), and you have a full list of such people (a sampling frame), then you are carrying out a census. Any inaccuracies in your estimates of the population characteristics or parameters will then be due to measurement error. (For example, a government might try to measure how many people live in the country but actually under-estimate the figure because the researchers failed to include homeless people.) However, if the population that you are dealing with is large (e.g. 10,000 students or 1000 tutors) then it is expensive and usually unnecessary to involve everybody. If you select a fraction of the population (a sample) and study it, you should be able to estimate the population parameters. The estimate is likely to be inaccurate because there will now be both measurement error and sampling error. The latter arises when the sample is not completely typical of the population.

Activity 10

10 mins

Ways of sampling
Look at the following three examples of sampling.
1 In each case, decide how satisfactory the method is.
2 If the method is not satisfactory, what is wrong with the sampling method?

Example 1
A cook tastes her soup and decides that it does not have enough salt in it.

Example 2
An educational researcher wants to study a class of students to see how well a particular teaching innovation is working. The responsible administrator offers her a class to observe that actually has the best tutor and the best students.

Example 3
A farmer picks a few ears of corn, examines them and concludes that the crop is ready to be harvested. In fact he picked ears from a few different parts of the field because he knew that some parts were wetter than others, some had different types of soil and some got more sunshine.

The feedback to this activity is at the end of the unit

Types of sample
Whatever the sample size, it is unlikely to give a totally accurate estimate of the population. This is because you now have both sampling error and measurement error. However, in some cases a sample study will actually give a more accurate estimate of the population parameter than would a census. If a census requires a large team of data-collectors then lack of close supervision may lead to greater measurement error. The trick is how to maximise accuracy while minimising costs, and, if possible, to be able to estimate how accurate your results from a sample are likely to be. There are two types of sample: probabilistic and non-probabilistic. With the former you can calculate the likely error in your estimate of the population values or parameters.
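
To make 'calculating the likely error' concrete, here is a minimal sketch in Python. It is not taken from this unit: it uses the standard textbook formula for the approximate 95% margin of error of a proportion from a simple random sample, with a finite population correction. The function name, the 60% result and the sample and population sizes are all invented for illustration.

```python
import math

def proportion_margin_of_error(p_hat, n, population_size=None, z=1.96):
    """Approximate margin of error for a proportion from a simple random sample.

    p_hat: proportion observed in the sample (0 to 1)
    n: number of people in the sample
    population_size: if given, apply the finite population correction
    z: 1.96 corresponds to roughly 95% confidence
    """
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    if population_size is not None:
        # Sampling a large share of a small population reduces the error
        standard_error *= math.sqrt((population_size - n) / (population_size - 1))
    return z * standard_error

# Invented example: 60% of a sample of 400 students, drawn from 10,000, say 'Yes'
moe = proportion_margin_of_error(0.60, 400, population_size=10_000)
print(f"Estimate: 60%, plus or minus {moe * 100:.1f} percentage points")
```

This only covers sampling error. Measurement error, such as badly worded questions or keying mistakes, is not captured by any formula of this kind.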
Simple random sampling

The fundamental method of probability sampling is simple random sampling. This means that every member of the population has an equal and independent chance of being selected. To achieve this your list, or sampling frame, must consist of all members of the population and you must select your sample from it randomly. (This can be done by using the computerised random number generator in Excel, as we will see later.)
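
The unit itself uses Excel for this. As an alternative illustration, a simple random sample can be drawn in a few lines of Python, assuming you have the sampling frame as a list. The student ID format and the sample size below are hypothetical.

```python
import random

def simple_random_sample(sampling_frame, sample_size, seed=None):
    """Draw a sample, without replacement, in which every member of the
    sampling frame has the same chance of being selected."""
    rng = random.Random(seed)  # a fixed seed lets you reproduce the same draw
    return rng.sample(sampling_frame, sample_size)

# Hypothetical sampling frame: ID numbers for 10,000 students
frame = [f"S{number:05d}" for number in range(1, 10_001)]
sample = simple_random_sample(frame, 500, seed=42)
print(sample[:5])
```

Recording the seed alongside the sample list means that the draw can be repeated and checked later.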
Systematic random sampling

A commonly used variant of this is systematic random sampling. This is where you pick the first person at random and then take, say, every tenth person after that. This increases your sampling error slightly and can be dangerous if your sampling frame is arranged in a particular order such as 'boy, girl, boy, girl'. However, it is simpler to use if you are instructing field workers which students to pick in a class.
Stratified random sampling

Imagine that your country is divided into 12 different regions and that after your survey you want to be able to describe the situation in each region. A simple random sample might work but it is also possible that some regions might have very few cases. Here one should consider using a stratified random sample. The population is divided into non-overlapping groups or strata (in this case the strata are regions) and individuals are selected at random from within each stratum. If you use proportionate stratified random sampling, the sampling fraction is the same within each stratum and the sample will match the population. However, imagine that some of the regions are very small. Then you should use disproportionate stratified random sampling. The sampling fraction would be larger in the smaller regions. The sample would not match the population but estimates of the population parameters can be made by weighting the results from the different strata.
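
The systematic and stratified designs described above can be sketched in Python as follows. This is an illustration under stated assumptions, not a prescribed procedure: the region names, population sizes, sampling fractions and sample means are all invented, and the weighting shown is the simplest kind, where each stratum's result is weighted by that stratum's size in the population.

```python
import random

def systematic_sample(frame, interval, rng):
    """Pick a random starting point, then take every `interval`-th member."""
    start = rng.randrange(interval)
    return frame[start::interval]

def stratified_sample(strata, fractions, rng):
    """Draw a simple random sample within each stratum.

    strata: stratum name -> list of members
    fractions: stratum name -> sampling fraction for that stratum
    """
    return {
        name: rng.sample(members, round(len(members) * fractions[name]))
        for name, members in strata.items()
    }

def weighted_mean(stratum_means, stratum_sizes):
    """Combine stratum results, weighting each by its population size."""
    total = sum(stratum_sizes.values())
    return sum(stratum_means[name] * stratum_sizes[name] for name in stratum_means) / total

rng = random.Random(1)  # fixed seed so the draw can be repeated

# Systematic sampling: every tenth student from a class list of 100
class_list = [f"student_{i}" for i in range(1, 101)]
every_tenth = systematic_sample(class_list, 10, rng)

# Disproportionate stratified sampling: the small region is over-sampled
strata = {
    "Mainland": [f"M{i}" for i in range(2000)],
    "Island": [f"I{i}" for i in range(100)],
}
fractions = {"Mainland": 0.05, "Island": 0.50}
sample = stratified_sample(strata, fractions, rng)

# Invented survey results (mean score in each stratum's sample)
sample_means = {"Mainland": 60.0, "Island": 40.0}
sizes = {name: len(members) for name, members in strata.items()}
estimate = weighted_mean(sample_means, sizes)
print(f"Weighted estimate for the whole population: {estimate:.1f}")
```

With these invented figures the weighted estimate is about 59.0. Simply averaging all 150 sampled people would give about 53.3, because the Island region makes up a third of the sample but under 5% of the population.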
Cluster sampling

Cluster sampling is another form of probability sampling. Here you select units randomly, then take all elements of those units as your sample. For example, you might pick certain of your classes at random then study all of the learners in those classes. This can cut down on the costs of a survey but it does increase sampling error.
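
A minimal Python sketch of the classes example follows. The class names and learner names are invented, and the function is illustrative only.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Pick whole clusters at random, then include every member of each one.

    clusters: cluster name -> list of members
    Returns the chosen cluster names and the combined list of members.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    members = [member for name in chosen for member in clusters[name]]
    return chosen, members

# Hypothetical classes and learners
classes = {
    "Class A": ["Amina", "Ben", "Chen"],
    "Class B": ["Dev", "Esi"],
    "Class C": ["Farah", "Gita", "Hugo", "Ines"],
    "Class D": ["Jon", "Kofi", "Lea"],
}
chosen_classes, learners = cluster_sample(classes, 2, seed=7)
print(chosen_classes, learners)
```

Note that the random selection happens at the level of the class, not the learner, which is why the sample size depends on which classes happen to be chosen.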
Quota sampling

Quota sampling is the most widely-used form of non-probability sampling, particularly within the field of market research. Interviewers are given targets or quotas to achieve such as '10 women aged 40-50' or '20 middle class men aged over 35'.
Other techniques

There are also other techniques such as opportunity sampling, where one studies whoever is available, and purposive sampling where one deliberately selects individuals who are not typical. When carrying out your own studies your choice of sampling technique will depend upon your research aims. In general you will be trying to maximise accuracy while minimising costs, and, if possible, to estimate how accurate your results are likely to be.
Carrying out a survey

Carrying out a survey properly is quite a complicated logistical exercise. It requires planning, it requires teamwork and it requires attention to detail. You should begin by drawing up a chart of activities and dates similar to the one shown in Figure 23. This is known as a Gantt chart and it is easy to draw up in Excel.

Figure 23 A Gantt chart for a proposed survey
[Chart: the activities Questionnaire design, Sampling, Stationery, Printing, Packaging, Mail out 1, Logging in, Reminder 1, Reminder 2, Editing/coding, Data entry, Analysis and Report writing are listed down the side, against Weeks 1 to 16 across Months 1 to 4, with an X marking each week in which an activity takes place.]

Clearly I cannot specify precisely how long each of these processes is going to take for your survey. I don't know how long the questionnaire is, your sample size, delivery times for stationery, etc. However, it is evident that some processes come before others (you need envelopes before you can mail your questionnaire) and certain processes will overlap (you will probably begin data entry before the last questionnaire is returned). It does not matter if your Gantt chart is not exact but it will help you clarify many things. For example, it may show when you will need to recruit extra staff to help pack the questionnaires, or make you realise that you are due to mail out the survey just as a major public holiday begins. In what follows I will take each process and offer some general thoughts and advice.

Questionnaire design
As well as the design considerations mentioned above, this stage will involve consultations with interested parties and graphic design. It will also probably require permissions from different parts of your institution. An important part of the design process involves pre-testing or piloting.
Piloting

The design of your questionnaire, and in fact the design of your whole survey, can be improved by piloting or pre-testing. How much you do and what form it takes will depend upon many factors including the scale of your survey, available time and the costs involved. I will sketch out two extremes.
Comprehensive piloting

- Draft questionnaires are sent for comments to experts in the field, to those commissioning the survey and to key individuals such as community leaders and student representatives.
- A pilot questionnaire is designed and completed by small groups typical of the whole population. Their reactions to individual questions and to the whole questionnaire are closely monitored.
- New versions of the questionnaire are designed and re-piloted.
- A revised version of the questionnaire is mailed out to several hundred people drawn from the actual population to be studied.
- Analysis of the results leads to layout change, revision/deletion/addition of questions.
- A final version of the questionnaire is prepared.
- The sampling and mailing strategy is revised in light of despatch problems, key-punching difficulties, who responded, overall response rates, etc.
Minimal piloting

- Ask one or two people to complete your draft questionnaire.
- Pick people who are most like your target population.
- If possible, sit with them as they do it and get their immediate thoughts and feelings.
- See how long it takes them to complete it.
- If somebody is commissioning the research, check with them that the survey is what they want.

Consultation does have its limits. The more you do, the more conflicting advice and suggestions you will receive, and the longer the questionnaire will get. So you need to agree a cut-off date for going to print. However, my general advice would be to do as much piloting as you can. Spotting a design error early on can often avoid you collecting worthless data.

Sampling
- If you are relying on an administrator or a computer programmer to draw your sample, give them clear instructions about the nature of the sample required.
- Give them plenty of warning about when you need the sample.
- Draw the sample as close to the despatch day as possible. This ensures that addresses and course status are up to date.
- Ask for a paper copy of the sample list plus as many sets of labels as you will need for the survey.
- Ideally the labels should be of the sticky, peel-off variety.

Stationery
- Order in good time.
- It will save time if the despatch envelope is big enough to slide in the questionnaire and letter without folding them.
- If possible use a despatch envelope that does not look like junk mail.
- The reply-paid envelope should hold a questionnaire that has been folded once.
- Check what arrangements you need to make to get reply-paid envelopes printed. Here in the UK, I would have to have a licence number from the Post Office.
- Check that the reply-paid system will work for all areas/countries that your sample live in.

Printing
Tell your printers when you need to have your questionnaires and letters. Negotiate a hand-over date for the master copies. Specify your requirements exactly: as well as the number of copies, you must specify the size and colour of paper, whether pages are backed-up (printed on both sides) or not, and the number and position of staples.

Packing
Unless you have access to very expensive machinery, this involves the very boring and lengthy job of stuffing questionnaires, letters and reply-paid envelopes into despatch envelopes. If you are going to put a label onto the despatch envelope and another onto the questionnaire for identification, it is critical that these two jobs are synchronised.

Mail out 1
If it is a big survey, ensure that your institution can do the franking or put the stamps on that many envelopes. Alert your internal and external mailing system so that they are expecting your bulk mailing. You may want to time the mailing so that it arrives at the end of the week when the respondents might have more spare time.


Logging in
Set up a system for opening and recording all returned questionnaires. This can be done on the paper copy of your original sample list, or on an Excel spreadsheet. Table 23 suggests a list of logging categories and necessary actions.

Table 23 Logging categories and necessary actions

Logged as                                                  Action
Gone away/not known at this address                        Inform student records
Reply from another person saying the respondent is dead    Inform student records
Refusal to complete/blank questionnaire                    Possibly remove from future survey samples
Partially completed questionnaire                          Decide whether to include in the analysis
Completed questionnaire                                    Proceed to editing/coding

There will also be a variety of other responses to deal with: 'I've lost the questionnaire, please send another', 'Could you send me details of next year's courses?', etc.

Reminders 1 and 2
By the end of two weeks or so, the responses will be dropping off and you will have to decide whether to send out a reminder. A standard method would be to send out a reminder card or letter, then, after a further two weeks, another questionnaire and a reply-paid envelope. Of course, if the questionnaires are completely anonymous, any reminder would have to go to everybody. Whether to send out reminders and of what type will depend upon costs, initial response rates and, ultimately, what overall response rate you decide is acceptable.

Editing/coding
Some preparatory work is usually necessary before the data from the questionnaire can be entered into a computer. Editing involves tidying up the respondents' answers before data entry. Some respondents ignore the instructions to tick boxes or circle numbers but you can deduce their answers from what they have written in. These answers can then be ticked/circled appropriately. Some will have selected more than one answer where only one answer is allowed. You can deal with this in a number of ways:
- erase all answers
- erase all answers except one, using a random system
- create a new code for 'multiple answers'
- create new codes for the most common multiple answers.

(If lots of respondents have given multiple answers to a particular question, the question was probably badly-designed.)

Some blank answers can be imputed. For example, if a person has ticked the '4 children' category, one could impute that they should have ticked 'Yes' when asked 'Do you have any children?'. Similarly, certain illogical combinations of answers can be edited out. For example, if a person has said that they did not attend any tutorials they could not select their favourite one. Coding, as we have noted earlier with regard to open-ended questions, is the process of converting written-in answers to numbers.
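
If the editing is done on a computer rather than clerically, rules like these can be written down explicitly. The sketch below, in Python, encodes three of the examples above: a code for multiple answers, the 'children' imputation and the 'tutorials' consistency check. The field names and the code 8 for 'multiple answers' are hypothetical choices, not part of any standard.

```python
MULTIPLE_ANSWERS = 8  # hypothetical code reserved for 'multiple answers'

def resolve_single_choice(ticked):
    """Turn the list of ticked options for a single-answer question into one code."""
    if not ticked:
        return None              # blank answer
    if len(ticked) == 1:
        return ticked[0]         # normal case
    return MULTIPLE_ANSWERS      # respondent ticked more than one option

def edit_record(record):
    """Return an edited copy of one respondent's answers."""
    edited = dict(record)  # leave the original untouched for audit purposes

    # Imputation: someone who reports a number of children must have children
    if edited.get("has_children") is None and (edited.get("number_of_children") or 0) > 0:
        edited["has_children"] = "Yes"

    # Illogical combination: no tutorials attended, so no favourite tutorial
    if edited.get("attended_tutorials") == "No":
        edited["favourite_tutorial"] = None

    return edited

raw = {
    "has_children": None,
    "number_of_children": 4,
    "attended_tutorials": "No",
    "favourite_tutorial": "Tutorial 3",
}
clean = edit_record(raw)
print(clean)
```

Keeping the raw record unchanged and writing the edits to a copy makes it possible to check later exactly what was altered.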

Data entry
If you are carrying out a small survey, you will probably be transferring the responses yourself to an Excel spreadsheet. You will set up a system that suits you and that is based on your own questionnaire. However, it should resemble the one outlined in Table 24. This is a rectangular dataset. Each row (going from left to right) represents a respondent. Each column (going from top to bottom) represents the answers to a particular question. So, in this example respondent 2 answered '2' to Question 1a, '27' to Question 1b, etc. I have chosen to use the value '999' when the person has not answered. It could be any value, but I use this because it is obviously not a real answer and avoids confusion between blanks and zeroes.

Table 24 Example of data inputted to a rectangular dataset

Respondent   Q1a   Q1b   Q2   Q3     etc
1            1     55    M    25.3
2            2     27    F    23.2
3            2     267   F    22.4
4            1     434   M    25.7
5            1     23    M    999
etc
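
One practical consequence of using a code such as 999 is that it must be excluded before any arithmetic is done. The following Python sketch holds the five rows of Table 24 as a list of records and shows the difference. Treating Q3 as a numeric measure to be averaged is an assumption made for the illustration.

```python
from collections import Counter
from statistics import mean

MISSING = 999  # the 'no answer' code used in Table 24

# The five respondents from Table 24, one record per row
rows = [
    {"respondent": 1, "Q1a": 1, "Q1b": 55, "Q2": "M", "Q3": 25.3},
    {"respondent": 2, "Q1a": 2, "Q1b": 27, "Q2": "F", "Q3": 23.2},
    {"respondent": 3, "Q1a": 2, "Q1b": 267, "Q2": "F", "Q3": 22.4},
    {"respondent": 4, "Q1a": 1, "Q1b": 434, "Q2": "M", "Q3": 25.7},
    {"respondent": 5, "Q1a": 1, "Q1b": 23, "Q2": "M", "Q3": MISSING},
]

def valid_values(rows, question):
    """Return the answers to one question, leaving out the missing-value code."""
    return [row[question] for row in rows if row[question] != MISSING]

q3_answers = valid_values(rows, "Q3")
q3_mean = mean(q3_answers)                      # mean of the four real answers
naive_mean = mean(row["Q3"] for row in rows)    # wrongly treats 999 as an answer
q2_counts = Counter(row["Q2"] for row in rows)  # frequency count for Q2

print(f"Q3 mean, missing excluded: {q3_mean:.2f} (n={len(q3_answers)})")
print(f"Q3 mean, 999 left in by mistake: {naive_mean:.2f}")
print(dict(q2_counts))
```

Leaving the 999 in pushes the Q3 mean from about 24 to over 200, which is the kind of error that is easy to miss in a large dataset.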

With large surveys you may choose to use a professional agency to enter the data. In the early days of computing, data was transferred via a keyboard to punched cards, hence the term 'key-punching'. Nowadays it will probably go straight to an electronic file and can be sent to you as an Excel file or in whatever electronic format you want. It is important to liaise with the agency over your questionnaire design. They will want it laid out and labelled in a way that makes it easy for the key-punchers to work quickly and accurately. In general, they prefer all of the tick boxes to be lined up on the right-hand side of the page and each box to have a number indicating where the data is to go. For example:
Example:

Here the key-punchers will know that they should key in a '1' in Card 2 Column 12 and also a '1' in Card 2 Column 14. (Although punch cards are rarely used now, it is still customary to refer to them. It is convenient because cards have 80 columns and they can be displayed on a normal computer screen.) You will see that there are two extra boxes (16) and (17). These have been put on the questionnaire in case one or two fruits emerge from the written-in answers. For example, (16) might become 'melon' and (17) 'grapefruit'.

Scanning
Many questionnaires are now scanned directly into a computer. There are two main types of scanning:
Optical mark recognition (OMR)

With OMR the scanner reads marks such as answers that have been underlined or boxes that have been blocked in. It can also record pictures of hand-written answers.
Optical character recognition (OCR)

With OCR the scanner attempts to record numbers and letters that have been written in by the respondent such as:

S E P T E M B E R   2 7

In both cases the questionnaires have to be laid out in very specific ways, so you must consult with the person who is going to do the scanning.

Electronic questionnaires
A third form of data entry occurs when the respondents fill in questionnaires electronically. In effect they enter their own data. This is a growing area and has many advantages in terms of speed and costs. However, it requires ease of access to computers and networks so we are not going to concentrate on it in this course.

Data cleaning
Under the editing process we talked about clerically eliminating illegal values and imputing missing values before the data is entered. However, once you have your own Excel file you can carry out these activities quickly and efficiently on-screen.
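
The unit assumes that this on-screen cleaning is done in Excel. The same idea can be illustrated in Python as a check that lists every illegal value so that it can be looked up against the paper questionnaire. The legal values given for each question are invented for the example, as is the use of 999 as the 'no answer' code.

```python
MISSING = 999  # 'no answer' code, assumed to be the one used at data entry

# Hypothetical rules: for each question, a test that a legal answer must pass
legal = {
    "Q1a": lambda value: value in (1, 2),
    "Q2": lambda value: value in ("M", "F"),
    "Q3": lambda value: 0 <= value <= 100,
}

def find_illegal_values(rows, legal):
    """List (respondent, question, value) for every answer that breaks a rule."""
    problems = []
    for row in rows:
        for question, is_legal in legal.items():
            value = row[question]
            if value != MISSING and not is_legal(value):
                problems.append((row["respondent"], question, value))
    return problems

rows = [
    {"respondent": 1, "Q1a": 1, "Q2": "M", "Q3": 25.3},
    {"respondent": 2, "Q1a": 7, "Q2": "F", "Q3": 23.2},     # 7 is not a legal Q1a code
    {"respondent": 3, "Q1a": 2, "Q2": "f", "Q3": MISSING},  # lower-case 'f' keyed in
]
problems = find_illegal_values(rows, legal)
for respondent, question, value in problems:
    print(f"Respondent {respondent}: check {question} = {value!r}")
```

The check only reports problems. Deciding whether to correct, impute or blank each value is still a judgement for the researcher.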

Analysis
We cover the analysis procedure in great detail in the next unit. At this stage I would just say that you should not wait until the last questionnaire is in before you begin thinking about analysis. You can carry out preliminary analysis on the early respondents and so work out appropriate analysis procedures.

Report writing
How to write a report is covered in Module A6, Reporting on research and evaluation to support or influence change. Again we just note here that you can start writing it fairly early on. You can write introductions and methodological sections before all of the responses are in.

Summary
In this unit you have looked at a wide range of methods for collecting data. These should enable you to:
- collect data that is reliable and valid
- take account of the dimensions of data collection methods when planning your own survey
- choose appropriate quantitative research methods for your survey
- design effective questions for your survey
- use borrowed questions effectively and ethically
- write well-designed questions
- design effective questionnaires
- choose your sample in the most appropriate way for your survey
- carry out your survey.

Feedback to selected activities

Feedback to Activity 1
I thought of a complex weighing machine that was extremely accurate but was very difficult to use. If a group of people used it to weigh the bananas, then the average answer would probably be 10 kilos but there would be some variation.

Feedback to Activity 3
I hope it is clear that Q2 is a much better question than Q1.
- In answer to Q1 people might say a town, a region or a country. They might refer to where their parents or grandparents came from. Q2 states exactly what you mean. It is the country where you were born.
- In Q1 the person has to write down their answer whereas in Q2 they just have to tick a box. This makes it much quicker for the respondent to answer and much easier to enter the data into a computer.
- Q2 guides the respondent by showing them what are considered to be acceptable answers.
- The 'Other' category in Q2 makes the question comprehensive. There is an option for everybody.

Feedback to Activity 4
In my experience, people will answer this in different ways. In an interview the person might ask me:
- 'Does that include adopted children, foster children, step-children, etc?'
- 'Do you mean living with me?'
- 'What about my 24 year-old? Do I count him as a child?'

So your question should unambiguously ask what you need to know. It might be something like this:
How many children do you have who are aged under 18 and are living at home with you? (Please include adopted, foster and step-children.)


Feedback to Activity 5
Well, you don't want the student to have to count all the villagers, and an exact answer is probably not necessary. So you could make it easier to answer simply by giving some options such as:
- fewer than 50 people
- 50-100 people
- more than 100 people.

Feedback to Activity 6
A

- SAQ stands for self assessment question but the person answering might not know that. (Guideline 3)
- What do we mean by 'useful'? Do we mean useful for completing the assignment, useful when revising, or what? (i.e. there are problems of validity and reliability here.)
- This is two questions: 'useful' and 'stimulating'. (Guideline 5)
- 'Yes' or 'No' is rather blunt.

A better question might be something like:

Did you find that doing the self assessment questions (SAQs) helped you to learn what was being taught in the course materials? (Tick one box only)
Yes, a great deal ........ [ ]
Yes, quite a lot ......... [ ]
In between/it varied ..... [ ]
No, not very much ........ [ ]
No, not at all ........... [ ]
I did not do any SAQs .... [ ]
B

This can be a useful question in many cultures.The newspaper can indicate other factors such as literacy levels, social class and political allegiance. As it stands it is a leading question. It implies that everybody does read a newspaper. (Guideline 7) Usually is not defined, so there are problems of validity and reliability.

Commonwealth of Learning

127

Module A3 Getting and analysing quantitative data

The open-ended nature of the question will make coding more expensive, and more prone to errors. It would be better, if possible, to list the most popular newspapers. A better question might be: What newspaper did you read yesterday? Newspaper A . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Newspaper B . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Newspaper C. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Newspaper D. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Other paper (Write in below) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Did not read a paper yesterday. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
C

This question is fine, as far as it goes. It is short and clear. However, is this the information that you want? If you want to know whether a learner can have the use of a computer for study purposes, you may need a question such as:
Do you have regular easy access to a computer for study purposes?

(This may be at home, at work, at a public library, etc) You may want to know what type of computer it is its speed, its memory, whether it has Iinternet access, etc.This will require further questions. Access to a computer tells you nothing about a persons skills.You may want to know how experienced they are with computers, how familiar they are with various applications, whether they can programme, etc.

Feedback to Activity 7
B is less intrusive and therefore might be answered by more people, but it has several disadvantages:
- the data is in a cruder form than in A. Averages can be calculated but they will be less accurate. The categories "Under 18" and "51 and over" create particular problems
- somebody who is 31 might not like to group themselves in with 40 year-olds
- you can never go back and un-group the data, so it is important that you actually use the groupings that are needed
- the question takes up more space.


However, groupings might be better when the person is unlikely to know the answer in detail (e.g. number of hours spent studying the course each week) or is unwilling to be too specific (e.g. income).

Feedback to Activity 8
Instruction box

Basically a good idea. You might have added "Please use a blue biro or felt-tip pen". This shows up better than pencil or black ink for the key-punchers.

Q1

- There is no need to put the answers so far away from the numbers. "Yes 1" would be better.
- The instruction "(Go to Q7)" would be better after the number to circle.

Q2

- There should be an option "Did not study this Unit" for each of the four.
- The font size in the box is too small.
- The unit titles should be written in to remind the learners what each unit is about.

Q3

- This seems to be out of place logically. Demographic information should be collected at the end or the beginning.
- Students might wonder why you are asking this as it will be on their record.

Q4

This question has been compressed to save space. It would be much better if all the answer options were in a single column.

Q5

A variety of font styles are used for no apparent reason. It is best to use just one font for the whole questionnaire.

Q6

This could be improved by adding a completed example. But, as with Q3, do you need to ask it?

Q7

It is probably a good idea to ask the students to update their name and address at this point.

Thank you box

It is best to give an address as well, in case they lose the reply-paid envelope.

General

- The questions are all in bold and the instructions in italic, which is good.
- At the moment there are no instructions on the questionnaire for the key-punchers.
- It is always a good idea to have a box at the end for people to write in any other comments that they would like to make.
- You may want to take the opportunity at the end of the questionnaire to ask students whether they are willing to be interviewed at a later stage of the survey.

Feedback to Activity 9
Here are some other groups that you might have thought of:
- the general public: have they heard of your programme?
- a particular section of the public who might be interested in courses on certain topics, e.g. farmers, pregnant women, those aged over 60
- the writers of your courses
- the teachers of your courses
- the people who dropped out of your courses
- the employers of your students
- the students some time after the end of the course.

You might have listed others. The point is that there are many groups who might be studied.

Feedback on Activity 10
Example 1

Although the cook is doing this subconsciously, she has decided that her sample is typical of the whole population. This is because she has been stirring the soup and can safely conclude that her spoonful is typical of the whole. The more homogeneous the population, the smaller the sample needs to be. If the soup has really been well stirred, there is an even consistency and any spoonful will taste like any other spoonful and they will all be truly representative of the whole population. To be completely sure of the soup's goodness the cook would have had to taste all of it!


Example 2

How the sample is selected is important. The educational researcher has been given a group to study that is in no way typical. S/he has been given a biased sample.
Example 3

Prior knowledge can improve sample selection. The farmer knows that it is better to sample in different places. He has drawn a stratified sample.

So, it is clear that sampling can save time, money and soup! All other things being equal, the bigger the sample you take, the more accurate your estimate will be for the whole population. However, in an homogeneous population the sample can be quite small.

References
Entwistle, N. and Ramsden, P. 1983 Understanding student learning, London: Croom Helm
Kember, D. 1995 Open learning courses for adults: a model of student progress, Englewood Cliffs, NJ: Educational Technology Publications


Unit 6: Analysing your research results

Unit overview
This unit is designed to introduce you to some of the commonest methods of statistical analysis. All the methods will be illustrated using Excel. You will look at:
- a data set for 427 students on a range of courses
- how to use pivot tables in Excel to produce counts and crosstabulations
- the meaning of standard error and how to calculate it
- the meaning of a confidence interval and how to calculate one
- how different sampling methods can affect the representativeness of a sample
- the meaning of statistical significance
- how to use a t-test to decide whether the difference between two means is significant
- the chi-square test for comparing observed and expected data values
- the idea of correlation
- Pearson's correlation coefficient.

Learning outcomes
When you have worked through this unit, you should be able to:
1 Use pivot tables in Excel to produce counts and crosstabulations.
2 Explain the meaning of standard error and calculate it.
3 Explain the meaning of a confidence interval and calculate one.
4 Choose sampling methods so as to maximise the chances of their producing a representative sample.
5 Explain the meaning of statistical significance.
6 Use a t-test to decide whether the difference between two means is significant.
7 Use a chi-square test to compare observed and expected data values.
8 Explain the meaning of linear correlation.
9 Calculate Pearson's correlation coefficient for given data.

Introduction
Having designed and carried out your study (collected, coded, edited, entered and cleaned your data), you are now left with the most exciting and the most complex stage: analysing the results and interpreting what they mean. Again, as you have now probably come to expect, there is no precise set of instructions as to how to go about this, but there are certain good practices and procedures that you should adopt.

What we are going to do is present you with a dataset from a survey carried out on four particular distance education courses and we are going to carry out certain exploratory analyses of it. You will find the data in the worksheet labelled Data in the Excel workbook entitled Analysing 1. The details of the dataset are as follows:
- the worksheet contains the responses to a survey from 427 students
- for each student there are values for 13 variables, as shown in Table 25.
Table 25 The variables in the dataset

| Variable | Label | Values | Description |
|---|---|---|---|
| 1 | CASE | 1-427 | Student number |
| 2 | COUNT | 1 | Set to 1 for all cases |
| 3 | GENDER | M, F | M = Male, F = Female |
| 4 | URBAN | N, Y | N = No, Y = Yes |
| 5 | AGE | 27-78 | Age in years |
| 6 | QUALS | 1-4 | Previous educational qualifications where 1 = Very low, 2 = Low, 3 = Medium, 4 = High |
| 7 | DESCRIP | 1-5 | The course description in the prospectus was accurate |
| 8 | CONFUSING | 1-5 | The learning materials are presented in a confusing way |
| 9 | TUTORHELP | 1-5 | The tutor's comments on my assignments have helped me to study |
| 10 | SATIS | 1-5 | Considering the course as a whole, I am satisfied with it |
| 11 | EXAM | 0-90 | Exam score |
| 12 | PASS | P, F | P = Pass, F = Fail |
| 13 | COURSE | C1-C4 | C1 = Course 1, etc |


Comments on the variables


- Variable 1, CASE, is the case number. Each respondent has been given a number just for ease of reference. In practice this may be the student's unique identification number given by the institution. Then you can use it to link survey data to information held on administrative records.
- Strictly speaking COUNT is not a variable because each respondent has the value 1. I created it because it helps in certain Excel procedures.
- GENDER, URBAN, AGE and QUALS are examples of demographic variables. They could have been asked for in the survey, but they are probably already held on your institutional database.
- GENDER and URBAN are alpha-numeric variables. This means that they use letters and symbols other than numbers. They could have been recorded as numbers but they remain nominal level variables and you still cannot use statistical procedures such as means on them.
- URBAN, AGE and QUALS are examples of derived variables. Whether a person is urban or not has been derived from their address or post-code. Their age has been derived from their date of birth. Their qualifications have been derived from a longer list of possible qualifications.
- DESCRIP, CONFUSING, TUTORHELP and SATIS represent statements about the courses the students were taking. They were asked in the survey how much they agreed with them. In each case 1 = strongly disagree, 2 = disagree, 3 = neutral, 4 = agree and 5 = strongly agree. Strictly speaking these four plus QUALS are ordinal level variables but they and others like them are often treated by researchers as interval level.
- EXAM, PASS and COURSE were taken from administrative records.

What do we want to know from this data?


The producers of the courses were anxious to find out what the students felt about their courses. Were some courses doing better than others? Did some groups of students particularly like or dislike the course that they had taken? Did their attitudes to their courses affect their academic progress? We will try to answer some of these questions by looking at our database. (Your dataset will generally contain many more variables than this, but we have enough here for teaching purposes.)
Excel note: The limitations and possibilities of Excel
Faced with such a database, most social researchers would use a statistical software package such as SPSS or SAS. They are tremendously powerful, very sophisticated and, with drop-down menus, fairly easy to use. However, such packages are very expensive and so we decided to restrict ourselves to what can be achieved using Excel. So writing this module has been a learning experience for me! I have had to find ways to do things in Excel that I have never done before. I succeeded, but maybe there are easier ways to achieve the same results. Please let me know if you find them.

I am going to take you through the stages of analysis in the order that I would do them and using the procedures that I consider to be appropriate. I will also add a running commentary to explain my thinking.

Question 1 Who has responded to the survey?


Response rates to surveys vary a lot. The lower the response rate, the more I worry that those who have responded may not be typical of the survey population and so bias the results. So the first thing that I do is look at the characteristics of the respondents and check whether they are different from those of the population.

We will begin by looking at gender. What we want to do is to create a table like Table 26. To do this in Excel, you use a method called pivot tables. These tables are a bit odd to get used to, but extremely useful. In the next activity I will show you how to create the gender pivot table.

Table 26 Count of gender in Excel pivot table format

| Sum of Count | Gender: F | M | Grand Total |
|---|---|---|---|
| Total | 188 | 239 | 427 |

Activity 1

10 mins

Creating a pivot table for gender

Copy the data to a new worksheet
- Open Excel Analysing 1 and go to the worksheet DATA. This is the total dataset with 427 cases and 13 variables. The data is contained in a rectangle that runs from cell A1 to cell M428.
- Highlight the whole worksheet by clicking on the diamond in the top left hand corner.
- Select Edit > Copy, then select Insert > Worksheet. A blank worksheet will appear named Sheet 1.
- Insert a copy of the data by selecting Edit > Paste.
- Right click on the Sheet 1 tab and select Rename. Call the new sheet DATA2.

This step was not really necessary for this activity but it gets you into the habit of creating backup versions of your work whenever possible. In fact you should create backups of the whole workbook at frequent intervals.

Open the worksheet with the data that you wish to analyse
- Open the worksheet Gender. This contains all 427 cases but for simplicity I have just taken three variables: CASE NUMBER, COUNT and GENDER. We want to know how many men and women there are among the respondents.

Tell Excel that you want to create a pivot table framework
- Click on Cell E1 then select Data > Pivot table report. This takes you into the pivot table wizard, which has 3 steps:
  Step 1 Accept the default: you are using an Excel list or database. Click Next.
  Step 2 Type in the range of your data, i.e. A1:C428. Click Next.
  Step 3 Make sure that "Existing worksheet" is selected. Click Finish.

This produces the framework of a table and a floating pivot table selection box that can be moved round to a convenient place on the screen.

Drag the relevant variables and data into the framework
- Click and drag the word GENDER from the selection box and drop it onto "Drop column fields here".
- Click and drag the word COUNT from the selection box and drop it onto "Drop data items here".

You should now have a table that looks like Table 26. This is a simple frequency table.

Finishing off
- Take these figures and use Excel to show that 44% of the respondents are women and 56% are men. My calculations are shown in Columns L-P (AN1).

There is no feedback to this activity
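For readers who want to check the pivot table result outside Excel, the same frequency count and percentages can be sketched in a few lines of Python. This is only an illustration: the list `gender` is a hypothetical stand-in for the GENDER column, built to match the totals in Table 26 rather than read from the workbook.

```python
from collections import Counter

# Hypothetical stand-in for the GENDER column: 188 "F" and 239 "M",
# matching the totals shown in Table 26 (not read from the workbook).
gender = ["F"] * 188 + ["M"] * 239

counts = Counter(gender)          # frequency of each value, like the pivot table
total = sum(counts.values())      # number of respondents

# Percentage of respondents in each category, rounded to whole numbers
percentages = {value: round(100 * n / total) for value, n in counts.items()}

for value in sorted(counts):
    print(value, counts[value], f"{percentages[value]}%")
print("Grand Total", total)
```

The counting step plays the role of dragging GENDER and COUNT into the pivot table framework; the percentage step corresponds to the "Finishing off" calculation.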


Excel note: Workbooks and worksheets
An Excel workbook is indeed like a book, having separate pages or sheets. As you have just seen, you can insert new sheets. You can also get rid of them by selecting Edit > Delete Sheet. You can move sheets within a workbook by clicking and dragging the name tabs at the bottom of the screen. You can also move sheets to other workbooks by selecting Edit > Move or Copy Sheet.

If the actual population was somewhere between 40 and 50% women, then I would not worry too much. My sample of respondents is fairly close. However, if, say, only one in ten of the students in the population were women, I might be concerned that my results might be somewhat biased, especially if I had reason to believe that women gave different answers to men. We will return to this later. First we will consider age.
Average age

We know that the average age of the student population is 37. How do our respondents compare?

Activity 2

5 mins

Average age You have already learnt how to find averages using Excel. Open the worksheet Age and find the average age of our respondents.

The feedback to this activity is at the end of the unit

Qualifications

Next, we will look at the previous educational qualifications of our respondents and compare them to those of the population. The population distribution of qualifications is shown in Table 27.

Table 27 Educational qualifications of the population

| Educational qualifications | % |
|---|---|
| Very low | 14 |
| Low | 28 |
| Medium | 25 |
| High | 33 |
| Total | 100 |

Activity 3

10 mins

Qualifications

In this activity you are going to create a pivot table of educational qualifications.
1 Open the worksheet Edquals.
2 Use the pivot table method to calculate the distribution of educational qualifications for our respondents.

The feedback to this activity is at the end of the unit


Question 2 What should we do about possible response bias?


There are several options:
Option 1: Ignore it

This is the most common approach. Some researchers do not perform the calculations shown above, others do not report them. Some simply say that their respondents "came from all sections of the population" or "were broadly representative of the population". In some cases they merely give general warnings such as "women are over-represented". Ignoring possible response bias can lead to distortions in your results. It can also invalidate statistical tests.
Option 2: Control for bias by analysis

If there is known bias among your survey respondents you can control for it by using cross-tabulations. For example, if men and women are known to answer differently to a given question, then it is best to look at their results separately. Otherwise, if there are too many women among the respondents, they will have a disproportionate effect on the total figures.
Option 3: Control for bias by weighting

Statistical packages such as SPSS allow you to weight your respondents. For example, imagine that you had 300 female respondents and 100 males, whereas you really wanted equal numbers to match the overall population. You would weight the women down by a factor of 0.67 (200 ÷ 300) and weight the men up by a factor of 2.0 (200 ÷ 100). You would still have 400 cases, but they would be weighted cases: 200 women and 200 men.
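The arithmetic behind such weights can be sketched as follows. This is a minimal illustration rather than a description of how SPSS does it: the function name `weights_for` and the dictionaries are hypothetical, and each weight is simply the number of cases you wanted in a group divided by the number you actually got.

```python
def weights_for(actual_counts, target_shares):
    """Return a weight per group so that weighted counts match the target shares.

    actual_counts: respondents actually obtained in each group
    target_shares: proportion of the population in each group (sums to 1)
    """
    total = sum(actual_counts.values())
    return {
        group: target_shares[group] * total / n
        for group, n in actual_counts.items()
    }

# Figures from the example: 300 women and 100 men, equal numbers wanted
respondents = {"female": 300, "male": 100}
target = {"female": 0.5, "male": 0.5}

weights = weights_for(respondents, target)
weighted_counts = {g: respondents[g] * weights[g] for g in respondents}

for g in respondents:
    print(g, round(weights[g], 2), round(weighted_counts[g]))
```

With these figures each group ends up with a weighted count of 200, so the weighted total remains 400 cases.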
Option 4: Control for bias by removing cases

Weighting is difficult with Excel. However, one could use a primitive form of weighting by removing cases. For example, if there were proportionately too many women among your respondents, you could randomly remove some of them until the proportions were correct. You would of course be throwing away some of your data that you had worked hard to get.
Option 5: Control for bias by looking at late respondents

There is some evidence to suggest that the people who respond to a questionnaire after several reminders are more like those people who do not respond at all. If this is true, late-respondents can be positively weighted in order to represent the non-respondents.

The effects of ignoring or controlling for bias are difficult to calculate. Personally I tend to use weighting on those factors that I hypothesise will have a large effect on the topic of the survey. In our database I would be particularly anxious to see whether we had the correct proportions of successes and failures on the courses in question. Students who fail are less likely to complete questionnaires and also less likely to rate courses positively!

Question 3 Are the students generally happy with the courses?
Students were asked to respond to the statement "Considering the course as a whole, I am satisfied with it". Possible answers ranged from 1 meaning "strongly disagree" to 5 meaning "strongly agree". We want to look at how students on each of the four courses answered this question. The simplest way is to use pivot tables in Excel to construct a crosstabulation of the two variables COURSE and SATIS. This simply means a rectangular table where one variable runs across the top and the other variable down the side. The values in the table are the number of cases in each square or cell.
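To make the idea of a crosstabulation concrete, here is a small Python sketch that counts the cases in each cell. The eight (COURSE, SATIS) pairs in `records` are invented for illustration and are not taken from the survey dataset.

```python
from collections import Counter

# Hypothetical (COURSE, SATIS) pairs; the real data are in the Excel workbook
records = [
    ("C1", 4), ("C1", 2), ("C2", 3), ("C2", 3),
    ("C3", 5), ("C1", 4), ("C4", 1), ("C2", 2),
]

cells = Counter(records)                      # number of cases in each cell
courses = sorted({course for course, _ in records})
ratings = sorted({satis for _, satis in records})

# One variable runs across the top (COURSE), the other down the side (SATIS)
print("SATIS", *courses, "Total", sep="\t")
for satis in ratings:
    row = [cells[(course, satis)] for course in courses]
    print(satis, *row, sum(row), sep="\t")

column_totals = [sum(cells[(course, s)] for s in ratings) for course in courses]
print("Total", *column_totals, sum(column_totals), sep="\t")
```

Each printed row corresponds to one satisfaction rating and each column to one course, with row and column totals playing the part of the pivot table's "Grand Total".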

Activity 4

5 mins

Crosstabulation of satisfaction by course

In this activity you are going to produce a pivot table to show satisfaction by course. It will have the following format.

| Sum of count | Course: C1 | C2 | C3 | C4 | Grand Total |
|---|---|---|---|---|---|
| Satis 0 | | | | | |
| 1 | | | | | |
| 2 | | | | | |
| 3 | | | | | |
| 4 | | | | | |
| 5 | | | | | |
| Grand Total | | | | | |

1 Open Worksheet SATIS.
2 Select all the data. (Cells A1 to D428)
3 Activate the pivot table wizard.
4 You will need to drag:
- COURSE to the top of the table
- SATIS to the left-hand column of the table
- COUNT to the body of the table.

If you get stuck, a version is provided on the SATIS worksheet in Column L (AN4).

The feedback to this activity is at the end of the unit

Activity 5

10 mins

Satisfaction percentages

Now that you have the raw numbers for the crosstabulation of satisfaction by course, you can produce the percentages in a format such as the following:

| | Course 1 % | Course 2 % | Course 3 % | Course 4 % | Total % |
|---|---|---|---|---|---|
| 1 Strongly Disagree | | | | | |
| 2 Disagree | | | | | |
| 3 Neutral | | | | | |
| 4 Agree | | | | | |
| 5 Strongly Agree | | | | | |
| Total | | | | | |
| Average rating | | | | | |

Use Excel to produce the table above with the correct figures in the cells.

Notes:
1 The first thing to notice is that there were only 424 actual replies to the satisfaction question. That is because three respondents did not answer the satisfaction question and therefore recorded an answer of zero. I decided to leave them out of the final table.
2 The percentages should add up to 100 in each column (if you have rounded the results to zero decimal places, they may not quite add up to 100).

The feedback to this activity is at the end of the unit

What does Table 37 tell us? Here we are back to interpretation, and that in turn depends upon your standpoint. An outsider might start with the averages and say that overall the students are pretty uncommitted about the courses. The average score is 2.9, which is very close to the neutral middle score of 3. To go beyond that the person might want to know if that is an improvement over previous years, or whether scores for other courses, within the institution or elsewhere, are similar.


As a course producer you might be concerned that over a third of the students disagreed with the statement (27 + 10 = 37%) and one in ten (10%) disagreed strongly. Is this degree of dissatisfaction acceptable?

A legitimate question might also be "How sure are you about these results, and hence these interpretations?" Remember that the survey was sent to a sample of students on the courses and not all of them replied. If we did the survey again with a new sample it is unlikely that we would obtain exactly the same results. Some of this inaccuracy is due to sampling error. Even with random samples you cannot guarantee that the sample will be truly representative of the population. If you take a large number of samples from a population and measure the mean value of each sample, these means will not be identical. So when we calculate means from survey results we should attempt to say how accurate we think the results are. To do this we calculate the standard error.

Standard error
Statistical note: Central limit theorem
The central limit theorem (one of the most important theorems of statistics) states that if large, random samples are drawn repeatedly from a population, then the means of those samples will be approximately normally distributed. This is true even if the population itself is not Normal. Also the mean of the sample means will be approximately the same as the population mean. Using known characteristics of normal distributions we can then make statements such as "We are 95% confident that the mean of the population falls within plus or minus 1.96 standard errors of the sample mean". Strictly speaking the standard error should be calculated from the population standard deviation. However, as this is not known we use the standard deviation of the sample instead.
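The theorem can be seen at work in a small simulation. The sketch below is an illustration only: the population is an invented, deliberately lopsided set of ratings, and the sample size of 50 and the 2,000 repetitions are arbitrary choices.

```python
import random
import statistics

random.seed(1)  # fixed seed so the illustration is repeatable

# Invented, clearly non-Normal population of ratings (mostly 1s, a few 5s)
population_values = [1, 1, 1, 2, 5]
pop_mean = statistics.mean(population_values)
pop_sd = statistics.pstdev(population_values)

sample_size = 50
# Draw 2,000 random samples (with replacement) and record each sample mean
sample_means = [
    statistics.mean(random.choices(population_values, k=sample_size))
    for _ in range(2000)
]

mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.stdev(sample_means)
expected_se = pop_sd / sample_size ** 0.5   # population SD / square root of N

print("Population mean:", pop_mean)
print("Mean of sample means:", round(mean_of_means, 2))
print("Spread of sample means:", round(sd_of_means, 2))
print("Population SD / sqrt(N):", round(expected_se, 2))
```

The mean of the sample means should land close to the population mean, and the spread of the sample means should be close to the population standard deviation divided by the square root of the sample size, which is the quantity called the standard error.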

We are going to calculate the standard error of the mean using the formula:

$$\text{Standard error of the mean} = \frac{\text{Standard deviation of the sample}}{\sqrt{N}}$$

The standard deviation of the variable Satis is 1.1. (You can check this yourself.)

N = 427, so $\sqrt{427} = 20.7$

And hence,

$$\text{Standard error of the mean} = \frac{1.1}{20.7} = 0.05$$


Interpreting the standard error

We can now take this figure and say that the population mean has a 95% chance of lying between:

$$\text{Sample mean} + 1.96 \times \text{Standard error of the mean}$$

and

$$\text{Sample mean} - 1.96 \times \text{Standard error of the mean}$$

i.e. the population mean lies between $2.9 + 1.96 \times 0.05$ and $2.9 - 1.96 \times 0.05$

Or between 3.01 and 2.79

These values are the 95% confidence limits for our estimate of the population mean. (Note: We are only talking about sampling error here. As mentioned earlier, there may also be response bias and measurement error.)

Now I want you to look at what the results would look like if the sample size were much smaller.
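The same calculation can be sketched in Python, which also makes it easy to see the effect of a smaller sample. The function name `confidence_limits` is hypothetical, and the inputs are the rounded figures quoted in the text (mean 2.9, standard deviation 1.1), so the limits may differ slightly in the second decimal place from those obtained with unrounded values.

```python
import math

def confidence_limits(mean, sd, n, z=1.96):
    """Return (standard error, lower limit, upper limit) for a sample mean.

    z = 1.96 gives 95% confidence limits under the Normal approximation.
    """
    standard_error = sd / math.sqrt(n)
    return standard_error, mean - z * standard_error, mean + z * standard_error

# Full set of respondents, using the rounded figures from the text
se_full, lower_full, upper_full = confidence_limits(mean=2.9, sd=1.1, n=427)
print(round(se_full, 2), round(lower_full, 2), round(upper_full, 2))

# The same mean and standard deviation, but assuming only 25 cases
se_small, lower_small, upper_small = confidence_limits(mean=2.9, sd=1.1, n=25)
print(round(se_small, 2), round(lower_small, 2), round(upper_small, 2))
```

Because the standard error is divided by the square root of N, cutting the sample from 427 to 25 cases makes the confidence interval roughly four times wider.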

Activity 6

20 mins

Comparing different methods of random sampling

In this activity you will look at three different methods of selecting 25 cases from our sample of students. Then you will compare the confidence intervals for the mean satisfaction rating in each case.

The three random samples
You will find the three random samples on the worksheet Random. They were prepared as follows:

Method 1 The first 25
I assumed the cases were in no particular order so I just picked the first 25 cases. This method can be dangerous since cases might be in the order that the questionnaires were returned, so the first 25 might be from the keenest students or from those who lived closest; in other words, not random.

Method 2 A systematic random sample
This time I just took every 17th case (this yields a sample of 25). I first drew a random number between 1 and 17 to decide where to start. I came up with 7, so I first picked the 7th case, and then picked every 17th case, i.e. case numbers 24, 41, 58, ...

Method 3 A simple random sample
The safest method is to use random numbers. I allocated a random number to each case, then sorted by this random number (I could have selected Ascending or Descending) and selected the first 25 cases.

Your tasks
For each set of 25 cases (they are in bold in the worksheet):
1 Calculate the mean, the standard deviation, the standard error and the 95% confidence limits.
2 Compare your results with those obtained from the full dataset.
3 What can you conclude?

The feedback to this activity is at the end of the unit
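The three selection methods in this activity can be sketched in Python using case numbers only. This is an illustration under stated assumptions: the cases are simply numbered 1 to 427, the systematic sample starts at case 7 as in the text (rather than drawing a fresh random start), and `random.sample` stands in for the "allocate a random number, sort, take the first 25" procedure.

```python
import random

random.seed(42)  # fixed seed so the illustration is repeatable

case_numbers = list(range(1, 428))   # cases 1 to 427

# Method 1: the first 25 cases (risky if the file order is not random)
first_25 = case_numbers[:25]

# Method 2: systematic sample, starting at case 7 and taking every 17th case
start, step = 7, 17
systematic = case_numbers[start - 1::step]

# Method 3: simple random sample of 25 cases, drawn without replacement
simple_random = sorted(random.sample(case_numbers, 25))

print(first_25)
print(systematic)
print(simple_random)
```

Starting at case 7 with a step of 17 gives the case numbers 7, 24, 41, 58 and so on, ending at case 415, which is 25 cases in all.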

The problem of respondent bias

A low response rate can mean that the non-respondents differed in some way from the respondents. This can give us problems when drawing conclusions from the respondents. As an illustration of this I want to take you through an imaginary extreme example:

A survey is mailed out to 200 students and 100 reply. 80 said they enjoyed the course, 20 did not. The researchers say that they are 95% confident that the population figure falls between about 72% and 88%.

However, it could be that all of the 100 who did not respond did not enjoy the course. If they had responded the true population satisfaction figure would be 40% (80 ÷ 200 × 100).

If the 100 who did not respond did enjoy the course the true population satisfaction figure would be 90% ((80 + 100) ÷ 200 × 100).

If you suspect that the non-responders are not like the responders then the most cautious estimate would be that the population figure falls somewhere between 40% and 90%.
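The worst-case and best-case figures in this example follow from simple arithmetic, sketched below. The function name `satisfaction_bounds` is hypothetical; it assumes that every non-respondent is either dissatisfied (worst case) or satisfied (best case).

```python
def satisfaction_bounds(mailed, satisfied, dissatisfied):
    """Return (worst case %, best case %) for satisfaction in the whole population.

    Worst case: every non-respondent was dissatisfied.
    Best case: every non-respondent was satisfied.
    """
    non_respondents = mailed - satisfied - dissatisfied
    worst = 100 * satisfied / mailed
    best = 100 * (satisfied + non_respondents) / mailed
    return worst, best

# Figures from the example: 200 mailed, 80 satisfied, 20 dissatisfied
worst, best = satisfaction_bounds(mailed=200, satisfied=80, dissatisfied=20)
print(worst, best)
```

The lower the response rate, the wider the gap between the two bounds becomes.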

Question 4 Are the students more satisfied with Course 1 than with Course 2?
Course 1 and Course 2 are both introductory courses to the same discipline. It appears that students find Course 1 more satisfactory than Course 2. (The average rating for Course 1 was 3.1 compared to 2.6 for Course 2.The % dissatisfied were 30% and 45% respectively.) But is this difference real?


Course 1 results are based on 155 students and Course 2 on 157. If you carried out the survey again on groups of the same size but made up of different students, you would almost certainly come up with slightly different results. It might even be the case that Course 2 would come out top in a second survey.

Imagine that Course 1 and Course 2 were in reality equally popular. This means that if you accurately measured the satisfaction of every student on both courses you would find the average satisfaction rates were identical. Now imagine that you randomly picked out two students, one from each course. The student on Course 1 agreed strongly with the statement and the student on Course 2 disagreed strongly. I am sure that you would not conclude that Course 1 was better on such flimsy evidence. You would demand a bigger sample.

Our task is to decide whether the difference between 3.1 and 2.6 is big enough to be meaningful. It may be that the overall figures for the two courses are the same. What are the chances of our results occurring if that is indeed the case? This depends upon the size of the difference in the means, the variation in scores within each group, and the size of the two groups. We are going to use a test of statistical significance called a t-test to measure those chances, or the probability of our results occurring by chance.
Statistical note: Null hypothesis
The method we are using (and it is widely used in statistics) is to say "Let's assume that there is no real difference between the two sets of figures". Then we look to see if there is enough evidence for us to reject this assumption. The assumption "there is no difference" is called a null hypothesis.

Statistical note: Tests of significance
A test of significance is used to measure whether a value (e.g. a sample mean) differs significantly from some other value (e.g. a population mean). If the difference is too large to have occurred by chance then we say that the difference is statistically significant. You are about to meet your first test of significance: the t-test.

Statistical note: The t-test The t-test works as follows. The formula used is:

$$t = \frac{M_1 - M_2}{\sqrt{\dfrac{SD_1^2}{N_1} + \dfrac{SD_2^2}{N_2}}}$$

Where
M1 = Mean of sample 1
M2 = Mean of sample 2
SD1 = Standard Deviation of sample 1
SD2 = Standard Deviation of sample 2
N1 = number in sample 1
N2 = number in sample 2

The greater the resulting t value is, the higher the statistical significance is. The t-test also involves another statistical idea: degrees of freedom. This is a number based on the sample size. The smaller the sample, the fewer are the degrees of freedom and the greater the t value has to be to achieve significance. If you were calculating the test manually you would work out the value of t then look up in a statistical table whether it exceeded a critical value, given the degrees of freedom that you had.
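For readers who want to see the arithmetic outside Excel, the t formula in the statistical note above can be sketched in a few lines of Python. This is only an illustration: the two lists of ratings are invented, the function name is hypothetical, and it assumes that SD means the sample standard deviation (the n - 1 version), which the note does not specify.

```python
import math
from statistics import mean, stdev

def t_statistic(sample1, sample2):
    """t = (M1 - M2) / sqrt(SD1^2/N1 + SD2^2/N2), following the note above."""
    m1, m2 = mean(sample1), mean(sample2)
    sd1, sd2 = stdev(sample1), stdev(sample2)  # assumption: sample SD (n - 1)
    n1, n2 = len(sample1), len(sample2)
    return (m1 - m2) / math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)

# Hypothetical five-point satisfaction ratings, invented for illustration only
course_a = [4, 3, 5, 4, 3, 4]
course_b = [2, 3, 2, 3, 1, 3]

t = t_statistic(course_a, course_b)
print(round(t, 2))
```

The sketch stops at the t value. Turning it into a probability still needs the degrees of freedom and a statistical table (or Excel's TTEST function, as in the next activity).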

Activity 7

10 mins

t-test for the difference between two means
In this activity you are going to use Excel to test whether the mean satisfaction of students on Courses 1 and 2 is significantly different.
1 Open worksheet ttest. Here you will see that I have taken the students who were taking Course 1 and those who were taking Course 2 and placed the datasets alongside each other. With Course 1, I have excluded the two students who did not answer the satisfaction question, so N = 155. For Course 2, N = 157.
2 Click on cell H2. Select Insert, Function, Function category: Statistical, TTEST.
3 In the TTEST dialogue box:
For Array 1, type in B2:B156
For Array 2, type in F2:F158
For Tails, type in 2
For Type, type in 2
4 Click OK.
The result, to nine decimal places, should be 0.000148277. To four decimal places it is 0.0001.

There is no feedback to this activity


Unit 6: Analysing your research results

Statistical note: One- or two-tailed? Statistical tests can be one-tailed or two-tailed. You use a one-tailed test when you are looking for a difference in a particular direction, e.g. Is Course 1 better designed than Course 2? You use a two-tailed test when you are looking for a difference in either direction, e.g. Is there a difference between Course 1 and Course 2?

Statistical note: Type 1 or Type 2? Type 1 is used when you have paired scores for the same people, e.g. their scores in maths and their scores in English. Type 2 is used when you have two different groups and both groups have approximately the same variance.

Interpretation of the t-test result

A t-test value of 0.1 would have meant that, if the two samples came from the same population, there would be a one in ten chance of finding a difference at least as large as the one we observed. A t-test value of 0.05 would have meant that there was only a five in one hundred chance of this. In these circumstances, we are inclined to say the samples probably come from different populations. This is called rejecting the null hypothesis at the 95% confidence level (also described as the 5% significance level). This is the level that is the general standard for measuring significance.
Our case

Now look at our case. Our t-test result was a probability of 0.000148277. This is very much smaller than 0.05, so we can confidently say that the result is statistically significant. Now, look at what that means in terms of our hypothesis:
• we chose a null hypothesis of 'satisfaction on Course 1 = satisfaction on Course 2'
• we calculated a t value on the assumption that this hypothesis was true
• we found that the probability of obtaining our t value if the null hypothesis were true was 0.000148277
• this is much lower than 0.05 (the 95% confidence value) so we conclude that our null hypothesis is false
• so, we believe that there is a difference in satisfaction levels on the two courses.
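The decision rule used above can be written down very compactly. The following Python sketch simply compares a probability with the 0.05 threshold. The function name is hypothetical and the second value (0.1) is the illustrative figure from the interpretation section, not a result from the survey.

```python
ALPHA = 0.05  # the conventional threshold used in this unit

def is_significant(p_value, alpha=ALPHA):
    """Return True when the null hypothesis would be rejected at this level."""
    return p_value < alpha

p_course_1_vs_2 = 0.000148277  # TTEST result from Activity 7

print(is_significant(p_course_1_vs_2))  # reject the null hypothesis
print(is_significant(0.1))              # not enough evidence to reject it
```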


Activity 8

15 mins

Is there a real difference in satisfaction between Course 3 and Course 4? One course has a satisfaction rating of 3.0, the other of 2.9. 1 Test whether this difference is significant by using a t-test. 2 Remember to exclude missing values.

The feedback to this activity is at the end of the unit

Activity 9

5 mins

Interpreting probabilities in statistical significance
The following probabilities were found in carrying out statistical tests. What do these tell you about the significance of the result in each case?
1 p = 0.06
2 p = 0.01
3 p = 0.002
4 p = 0.95

The feedback to this activity is at the end of the unit

Question 5 Are some categories of students more satisfied than others with the courses?
It would be interesting to see if any of the demographic factors are related to study satisfaction. Here we are going to look at gender and urban/rural location. The necessary data is in worksheet Satis Gender.
Gender

Were women more satisfied with the courses than men? We can answer this by:
• producing a crosstabulation of satisfaction against gender (Table 28)
• finding the means for each gender (Table 29)
• and then doing a t-test.
The calculations are in the Satis Gender worksheet (AN8). Note how I sorted the data in Columns L and M to make the t-test simpler. I sorted by GENDER and then by SATIS in one go. This enabled me to specify the t-test between men and women, but excluding missing data.


Table 28 Satisfaction × gender: Excel pivot table

Sum of Count    Gender
Satis           F       M       Grand Total
0               1       2       3
1               14      28      42
2               55      58      113
3               67      84      151
4               33      47      80
5               18      20      38
Grand Total     188     239     427

Table 29 Calculation of mean for each gender

                     Women (n = 187)   Men (n = 237)   All (n = 424)
                     %                 %               %
Strongly disagree    7                 12              10
Disagree             29                24              27
Neutral              36                35              36
Agree                18                20              19
Strongly agree       10                8               9
Total                100               100             100
Mean                 2.9               2.9             2.9

The distribution of ratings for women and men showed only small differences. The mean scores were actually the same when measured to one decimal place and, not surprisingly, the t-test gives a value of 0.66 which indicates no significant difference.
Location

We can also ask whether location affects satisfaction. Are the students who live in an urban area more satisfied with the courses than those in a rural setting? You can find this out for yourself in the next activity.

Activity 10

15 mins

Location and satisfaction
You will need to use the data in Satis Loc for this activity. Carry out the following steps to establish whether the variations in satisfaction by location are statistically significant. (These are the same steps that I have just demonstrated for gender.)
1 The data you require is in worksheet Satis Loc, columns B to E.
2 Create a pivot table to give you the crosstabulation of satisfaction × location (SATIS × URBAN).
3 Calculate the mean satisfaction for each location.
4 Use a t-test to test the null hypothesis 'Urban satisfaction = Rural satisfaction'.

The feedback to this activity is at the end of the unit

So course satisfaction does not seem to be related to gender but urban students were more satisfied with the courses than were rural students. What does this mean? Well, I think that we can probably rule out the possibility that courses affect where people live, but what is the causal relationship, if any? It might be that urban students find it easier to get to study centres and to mix with other students. Perhaps the curriculum is more suited to urban dwellers. To use that classic phrase, 'more research is needed'.

Question 6 Are the attitudinal or demographic variables related to each other?


The calculations that we made in answer to Question 5 were based on assumptions that the variable SATIS is an interval scale and that the population data is normally distributed. These are the requirements for what are known as parametric tests. Parametric tests include tests such as correlations and t-tests. (Purists would say that SATIS is an ordinal scale and so t-tests and correlations are not appropriate.) However, there is another set of non-parametric statistical tests that make far fewer assumptions about population data. They can be used when the data are nominal or ordinal. We want to introduce you now to the most commonly used non-parametric test, the chi (pronounced 'kai') square test.
Statistical note: Power of tests Parametric tests are said to be more powerful than non-parametric tests. The power of a test refers to the probability of rejecting the null hypothesis when it is in fact false. This means that, for a given sample size, the parametric test is more likely to reject a false hypothesis.

We are going to begin by looking at the relationship between gender and course. Do men and women appear to choose different courses to study? The required data has been placed in worksheet Chisq, which you will find in workbook Analysing 2. It is also summarised in Table 30, using numbers and percentages.


Table 30 Gender × course

          C1 (n)   C2 (n)   C3 (n)   C4 (n)   Total (n)
Women     58       60       46       24       188
Men       99       97       24       19       239
Totals    157      157      70       43       427

          C1 %     C2 %     C3 %     C4 %     Total %
Women     37       38       66       56       44
Men       63       62       34       44       56
Totals    100      100      100      100      100

It should strike you straight away that women seem to prefer courses C3 and C4 to courses C1 and C2. But is the difference important? After all, the numbers on courses C3 and C4 are relatively small.

With chi-square you compare the number actually in each cell (the observed) with the number that you would expect to be there if there was no relationship (the expected). For example, the observed number of women taking Course 1 is 58. What is the expected? Well, women form 188/427ths of the population so you would expect them to form 188/427ths of the 157 students taking Course 1. 157 × 188/427 = 69. So, all other things being equal, we would have expected 69 women to have taken Course 1. (In the following calculations I have rounded to whole numbers for the sake of simplicity.)

Chi-square is a measure of how far the observed figures differ from the expected figures. The formula to calculate chi-square, added up over all the cells in the table, is:

$$\text{Chi-square} = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$$

or, using the symbol for chi-square:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

It is very straightforward to calculate and you will do it in the next activity.
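The expected figure for any cell is its row total multiplied by its column total, divided by the grand total. As a rough cross-check on the spreadsheet, the calculation could be sketched in Python as below, using the observed counts from Table 30. The function and variable names are hypothetical, and the expected figures are left unrounded until they are printed.

```python
# Observed counts from Table 30 (columns are Courses 1 to 4)
observed = {
    "Women": [58, 60, 46, 24],
    "Men": [99, 97, 24, 19],
}

def expected_counts(table):
    """Expected count for each cell = row total x column total / grand total."""
    row_totals = {group: sum(counts) for group, counts in table.items()}
    col_totals = [sum(col) for col in zip(*table.values())]
    grand_total = sum(row_totals.values())
    return {
        group: [row_totals[group] * col / grand_total for col in col_totals]
        for group in table
    }

expected = expected_counts(observed)
print(round(expected["Women"][0]))  # women on Course 1, rounded to a whole number
```

A useful check, as in the activity that follows, is that the expected figures in each row still add up to the original row total.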

Activity 11

15 mins

Calculating chi-square
Step 1: Find the expected values
1 Open worksheet Chisq in workbook Analysing 2.
2 You will see that to the right of the data I have already entered the observed figures. Then there is a table for the expected figures containing the figure 69 for women on course 1 as we calculated above. You should now be able to calculate the remaining expected figures. (If you get stuck, refer to my calculations on the right of the worksheet, AN10.) When you have calculated the figures, check that they still add up to the original totals. For each cell we now have an observed and an expected figure.
Step 2: Find (O - E)²/E for each cell
I have placed all of the observed figures in a column labelled O, and the first expected figure (69) alongside the first observed figure. The calculations for this first pair are as follows:
(O - E) = 58 - 69 = -11
(O - E)² = (-11)² = 121
(O - E)²/E = 121/69 = 1.8 (to 2 significant figures)
3 Calculate the values for the rest of the cells in this table.
Step 3: Find the chi-square total
4 Then calculate the total of all the values in the last column. You should come up with a figure for chi-square of 21.1 (to one decimal place). Check with my calculations if you get a different figure.
The bigger the chi-square result is, the more likely it is that the results have not occurred by chance. However, you also have to take into account the number of cells in your table. The more there are, the bigger the chi-square value has to be to achieve significance.

There is no feedback to this activity


Statistical note: Degrees of freedom When looking up the significance of a chi-square value you have to know the degrees of freedom. The number of degrees of freedom in an m × n table is usually (m - 1)(n - 1). In our case: Degrees of freedom = (2 - 1)(4 - 1) = 3

You can find a table of chi-square critical values for a range of degrees of freedom at http://www.qualityadvisor.com/sqc/formulas/chi-square-f.htm Chi-square tables are also published in many statistical textbooks.

When I look up my chi-square tables I see that the critical values when there are 3 degrees of freedom are 7.8 at the 5% level and 11.3 at the 1% level. Our result of 21.1 is well above these critical values so it is highly significant. The figures are extremely unlikely to have occurred by chance.
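The whole chi-square procedure (expected figures, the total of (O - E)²/E, degrees of freedom and the comparison with the critical values) can be sketched in Python as a cross-check on the worksheet. The code below uses the observed counts from Table 30 and the two critical values quoted above. The function name is hypothetical, and the expected figures are kept unrounded.

```python
# Observed counts from Table 30: rows are women and men, columns are Courses 1 to 4
observed = [
    [58, 60, 46, 24],
    [99, 97, 24, 19],
]

def chi_square(table):
    """Return (chi-square, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / grand_total
            total += (obs - exp) ** 2 / exp
    degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)
    return total, degrees_of_freedom

# Critical values for 3 degrees of freedom, as quoted in the text
CRITICAL_5_PERCENT = 7.8
CRITICAL_1_PERCENT = 11.3

chi2, df = chi_square(observed)
print(round(chi2, 1), df)
print(chi2 > CRITICAL_5_PERCENT, chi2 > CRITICAL_1_PERCENT)
```

Note that the critical values are typed in by hand here. For a table with a different number of rows or columns you would need to look up the values for the new degrees of freedom.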


Another use of chi-square

When we looked at the urban/rural issue we concluded that urban students were much more satisfied with the courses than were rural students. In this activity we will use chi-square to explore a possible reason for this. Students were asked whether they agreed with the statement 'The tutor's comments on my assignments have helped me to study' (the variable TUTORHELP). The results are shown in Table 31, split by location.

Table 31 Tutor comments × location

Response             Rural %   Urban %
Strongly disagree    15        19
Disagree             41        54
Neutral              11        8
Agree                24        10
Strongly agree       10        8
Totals               100       100

It seemed possible that rural students were less satisfied than urban students because they were getting less study support. However, the figures in the table suggest that the rural students were getting more helpful comments from their tutors than were urban students (34% agreed with the statement compared to 18%). Is this difference significant? You will explore this in the next activity.

Activity 12

10 mins

Another use of chi-square
You will need to use the worksheet Chisq 2 in Analysis 2 for this activity. You will see that I have laid out the relevant data in columns A to D and that I have excluded cases with missing values. We are going to use similar procedures to those in the previous activity but we are going to use an Excel function to calculate the significance level of chi-square as follows:
1 I have used a pivot table to calculate the Observed values.
2 Calculate the Expected values in the box below, as we did in the previous exercise.
3 Click on the empty cell I29 then select Insert, Function, Statistical, CHITEST, OK.
4 In the dialogue box that appears, type in the details of the Actual_range of figures: G8:H12.
5 Type in the details of the Expected_range of figures: G20:H24.
6 Click OK. This will give you an answer of 0.00101164 (or 0.001 to 3 decimal places).
7 If you did not get this result, see my calculations in AN11.
This means that, if there were no relationship between the two variables, there would be only a one in a thousand chance of finding differences as large as these. So it would be safe to conclude that urban and rural students differ in their views about the helpfulness of tutor comments. Furthermore, from looking at the distribution of answers it seems that rural students found the comments more helpful.

Question 7 What appears to determine exam performance?


I think that we can assume that people with an exam score of zero did not actually turn up for the exam. Apart from that, exam scores varied from 15 to 90. What else can we say that will shed some light on exam performance? Well, the average exam score for those who attended was 56. (Calculations in Analysis 2, Worksheet Exam1.) Which groups scored higher or lower? If you look at previous educational qualifications you will see that these are related to exam performance. The average exam score for each group is shown in Table 32. You can clearly see that the higher the previous level of education, the better was the exam performance. (You can see how I calculated these averages in worksheet Exam2.)

Table 32 Average exam score by previous qualification

Previous educational qualifications   Average exam score
Very low                              45
Low                                   51
Medium                                55
High                                  60

Activity 13

10 mins

Urban versus rural exam scores
Do urban students perform better in exams than rural students? Use the data in Exam3. Use a t-test to answer this question.

The feedback to this activity is at the end of the unit


Exploring relationships using correlation


In this section we are going to show you how to calculate and interpret correlation coefficients. If you have not studied statistics before that might sound a bit scary, but don't worry! You do correlations in your head all the time anyway. Have you noticed that the more you eat, the more you weigh? Or that the longer you run, the faster your heart rate gets? Well, you are performing an informal type of correlation calculation: you are noticing how changes in one variable relate to changes in another. You are thinking about how they are co-related. The technique that we are going to use is formally called Pearson's Product Moment Correlation. Pearson should only be used when the variables you are looking at are either interval or ratio scales.
Statistical note: Correlation coefficients with scales Some researchers are more purist about this than others. Some will argue that you can calculate correlation coefficients using five-point rating scales while others insist that these are only ordinal scales.

With spreadsheets and statistical packages it is all too easy to produce huge numbers of coefficients and other statistics without knowing what they mean. The first golden rule is to draw a graph so that you can see what data you have actually got!
Example: This example can be found in the Correlation workbook on the worksheet Example 1. The data is reproduced here in Table 33.

Table 33 Correlation Example 1: Students on a course in chemistry

Student number   Age   Exam score
1                15    64
2                17    55
3                24    54
4                36    35
5                18    59
6                22    45
7                39    32
8                27    38
9                16    65
10               16    70


This is some imaginary data on ten students on an ODL course in chemistry. There has been a wide range of exam scores and a teacher has suggested that performance might be related to age. She thinks that older students have been struggling. For each student we show their age and their exam score. (For example, student number 6 is aged 22 and has scored 45 on the exam.) It is impossible to tell from this raw data what, if anything, is going on so the best thing is to draw a graph.

Activity 14

10 mins

Drawing a graph of the chemistry student data
Use Chart wizard to produce a graph that shows Age on the x-axis and Exam score on the y-axis.
1 Use Chart wizard to draw a graph of the data in cells C7 to D16 in the Worksheet Example 1.
2 Under Chart type click on XY (Scatter). Various Chart sub-types will appear on the right. Just click on Next at the bottom. This selects the default chart sub-type, which is what we want.
3 Chart wizard step 2 of 4 will appear. This just checks that you have selected the data range that you wanted. It should say Example 1 !$C$7:$D$16. If so, click on Next. (If not, click on Cancel and start again.)
4 Chart wizard step 3 of 4 will appear. In the box under Chart title, type in Chart 1. For Value (X) axis type in Age and for Value (Y) axis, type in ExamScore. Then click on Next.
5 With Chart wizard step 4 of 4, just click on Finish. Your new graph called Chart 1 will appear on the worksheet. (You may need to click and drag it if it obscures the figures in the table.) It should look like Figure 24.

There is no feedback to this activity


Figure 24 Graph of chemistry scores against age
[Scatter plot: Age on the x-axis (0 to 50), Exam score on the y-axis (0 to 80).]

Looking at the graph, there is clearly a pattern. If you drew a line to enclose all of the data-points, it would resemble a cigar leaning down from left to right. There does indeed appear to be a relationship between age and exam score. The good scores were gained by the young students and the poorer scores by the older students. How can we quantify this relationship? Pearson's Product Moment Correlation provides one way to do this and it can be calculated easily using Excel. Before doing it, here are a few important background pieces of information:
• Pearson's Product Moment Correlation is a measure of linear correlation. This means that we are calculating how well the points on the graph can be represented by a straight line
• the calculated coefficient will fall between -1 and +1. If the coefficient is zero then there is no linear correlation. The closer the coefficient is to -1 or +1, the higher is the level of correlation
• the letter r is generally used as shorthand for the Pearson Correlation Coefficient
• this technique is so common that it is rarely spelled out in research papers. If the text says 'the correlation was 0.6' it will be the Pearson Correlation that is meant
• correlation does not mean causation! Just because two variables are correlated it does not mean that one variable is the cause of the other.
Bearing these points in mind, let's do the calculation. This is in the next activity.

Activity 15

10 mins

Calculating the correlation coefficient for the chemistry students
1 Go back to the Excel workbook Correlation, worksheet Example 1.
2 Click on cell C21. This is where we are going to calculate the correlation coefficient r, by pasting in a function.
3 Select Insert from the Toolbar then scroll down to Function.
4 In the pop-up window, click on Statistical, then under Function name, select Correl. Then click on OK.
5 You will see in the next pop-up window that you have to specify Array1 and Array2. These are the two sets of data to be correlated.
6 Type in C7:C16 in the box for Array1 (Age) and D7:D16 in the box for Array2 (Exam score). Then click OK.
In box C21 you will see that it now says -0.9. (Your answer might have more decimal places but I have restricted it to just one decimal place.) If you have a problem, my answer is shown as AN12.

There is no feedback to this activity

Congratulations, you have calculated a correlation coefficient! But what does it mean? The minus sign means that the two variables are negatively correlated. That means that as the values for one variable go up, the values for the other tend to go down. The size of the coefficient (-0.9) means that the relationship is very strong. Remember that the largest it could be, ignoring the sign, is 1. So it seems that the older the chemistry student is, the lower their exam scores tend to be.
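If you would like to see what lies behind Excel's Correl function, the same coefficient can be calculated from the ages and exam scores in Table 33 with a short Python sketch. The function name is hypothetical. It uses the usual textbook definition of Pearson's r (the sum of the products of the deviations from the two means, divided by the square root of the product of the two sums of squared deviations), which is stated here as background rather than taken from the module.

```python
import math

def pearson_r(xs, ys):
    """Pearson's product moment correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

# Data from Table 33 (ten chemistry students)
ages = [15, 17, 24, 36, 18, 22, 39, 27, 16, 16]
scores = [64, 55, 54, 35, 59, 45, 32, 38, 65, 70]

r = pearson_r(ages, scores)
print(round(r, 1))  # compare with the value in cell C21
```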

A second example
This second example uses the data in Table 34. The full data can be found in the worksheet Example 2 of the Correlation workbook.


Table 34 Data on course enjoyment and number of radio programmes

Course number   % Who enjoyed the course   Number of radio programmes in the course
1               22                         7
2               24                         8
3               27                         11
4               34                         12
5               36                         11
6               39                         15
7               42                         19
8               45                         21
9               55                         23
10              57                         22

This is another set of imaginary data, but this one concerns courses rather than students. Some courses use more radio programmes than others and, because these are expensive to make, the institution wants to know whether they have a beneficial effect. The students have been surveyed and the first column of data shows the percentage who said that they enjoyed each course. The second column shows the number of radio programmes on each course. If the programmes were beneficial one would expect a positive correlation between the two sets of figures. Note that the courses have been arranged in such a way that the enjoyment percentages have been ranked from low to high. Just looking at the radio figures, you should be able to see that these figures are also roughly in ascending order. However, it is still worth drawing a graph.

Activity 16

10 mins

Drawing the graph for the radio programmes
1 Draw the graph for the data in Table 34. (The data is in worksheet Example 2.)
2 From the shape of the graph, what can you predict about the correlation coefficient?
3 Calculate the correlation coefficient in Cell C21. Note that when you go to Insert Function, the pop-up box will probably have highlighted Most recently used on the left and Correl on the right. Just click on OK.

The feedback to this activity is at the end of the unit


What do these examples tell us?


General interpretation

In the first example there seems to be a strong relationship between age and chemistry performance. What might be going on? Well, we can certainly rule out the idea that chemistry marks cause age! Getting old happens to everybody and the process seems to be unaffected by any variable. It is possible that age causes chemistry marks but correlation does not prove this. Further investigations would be needed to see whether this age effect holds up in different situations and, if it does, to see what aspect of age is involved. For example, it might be actually harder to learn chemistry as the human brain gets older, or it might be that the younger students have been taught chemistry better at school, or just more recently.

In the second example it is clear that the students' enjoyment levels did not cause the number of radio programmes. It may be that increases in the number of radio programmes produce greater levels of enjoyment, but there may be other factors at work. For example, the courses with more radio programmes might also have more TV programmes as well, or they may be teaching more enjoyable subjects.

How do you interpret the actual correlation coefficients? So far we have said that the higher the coefficient, the higher the correlation, and the greater the explanatory power. However, this is only part of the story. Imagine that you did two studies. One correlated dropout rates on courses with the average age of students and the other dropout rates with the average tutorial attendance rate. The calculated coefficients were 0.4 and 0.8 respectively. It would be tempting to say that tutorial attendance is twice as good at explaining dropout as age. Tempting, but wrong.

Let's go back to the data in Example 1. Imagine that you had to guess the exam grade of each of the ten students. Well, the average exam grade was 52%, so, statistically it would be best if you guessed that each of them got 52%. You would actually be wrong in all cases because nobody actually scored 52%. You would be pretty close in the case of Student 3 who scored 54%, but you would be wildly out for Student 10 who got 70%. A measure of the total amount of error that you would make is called the variance.

Now imagine that you were told each student's age before you had to guess their exam grade. This would improve your guesses because you know that older students gain lower exam scores. But how much would it improve them? If you square the correlation coefficient (multiply it by itself) you get r-squared. This is a measure of how much of the variance you have explained. In Example 1 our coefficient of -0.9 gives an r-squared of (-0.9) × (-0.9) = 0.81. This means that age explains 81% of the variance in exam grades, which is very good.

With our dropout example age explains 16% of the variance (0.4 × 0.4 = 0.16) and tutorial attendance explains 64% (0.8 × 0.8 = 0.64). So you could say that tutorial attendance is four times better than age in explaining dropout rates.
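The comparison can be checked with a couple of lines of Python. This is just the arithmetic from the dropout example above, and the function and variable names are hypothetical.

```python
def variance_explained(r):
    """r-squared: the proportion of variance explained by a correlation of r."""
    return r * r

age_r = 0.4          # dropout rate against average age
attendance_r = 0.8   # dropout rate against tutorial attendance

ratio = variance_explained(attendance_r) / variance_explained(age_r)
print(round(variance_explained(age_r), 2))
print(round(variance_explained(attendance_r), 2))
print(round(ratio, 1))
```

Because the coefficient is squared, its sign does not matter: a correlation of -0.9 explains the same share of the variance as one of +0.9.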

How important are the correlation coefficients?


Imagine that your data consists of three points on a graph and they fall in a straight line. This would produce a coefficient of 1.0, but it would be very silly to advance a theory based on so few cases. Whether a coefficient is statistically significant depends upon both its size and the number of cases that it is based on. You can find a calculator for the significance of r at http://faculty.vassar.edu/lowry/VassarStats.html
Example 3

So far we have used carefully constructed data that give high correlation coefficients. Let's move on to our third example. A tutor on an ODL course thinks that students who live a long way away from the exam centres are disadvantaged. To test his hypothesis he takes the exam grades of ten students and then for each of them he measures the distance from their home to the exam centre. The data are shown in Table 35.
Table 35 Data on exam grades and distance from exam centres

Student number   Distance   Exam grade
1                22         78
2                34         54
3                120        20
4                78         44
5                45         50
6                23         35
7                58         27
8                66         78
9                88         21
10               29         21

The data, when plotted, are shown in Figure 25. You will see that there is no obvious pattern and this is reflected in the correlation coefficient of 0.0. The tutor's hypothesis seems to be wrong.


Figure 25 Plot of data on exam grades and distance from exam centres
[Scatter plot: Distance on the x-axis (0 to 100), Exam score on the y-axis (0 to 90).]

However, just imagine that he had made one small error when keying in his data. Student 3 actually lived 90 kilometres away and not 20. When this error is corrected in the data you can see that the correlation coefficient has jumped dramatically to 0.4! Perhaps there is something in his hypothesis! (You can try this for yourself in worksheet Example 3. When you alter the 20 to 90 you will see the correlation coefficient change automatically. Experiment by changing one or two other figures.)

The lesson to draw from this exercise is that the bigger the number of cases that you look at, the more trustworthy your results will be. The freak effects of one error, or indeed of one case that is true but goes against the general pattern (an outlier), will be minimised.

On the other hand, if you have thousands of cases on which to base your correlation, then you might find that you obtain a correlation coefficient of 0.2 that is highly significant, but only explains 4% of the variance (0.2 × 0.2 = 0.04). So statistical significance does not guarantee practical importance.
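To see why the same coefficient can be non-significant with a handful of cases and highly significant with thousands, it may help to look at how a correlation is usually tested. The standard approach, which is not covered in this module and is given here only as background, converts r into a t value with n - 2 degrees of freedom using t = r × √(n - 2) / √(1 - r²). The sketch below applies it to r = 0.2. The sample sizes of 10 and 2000 are invented for illustration and the function name is hypothetical.

```python
import math

def t_for_r(r, n):
    """t value for testing whether a correlation r, based on n cases, differs from zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# Hypothetical sample sizes, chosen only to show the effect of n
t_small = t_for_r(0.2, 10)
t_large = t_for_r(0.2, 2000)

print(round(t_small, 2))  # well below the critical values found in t tables
print(round(t_large, 2))  # far above them
```

In both cases r-squared is still only 0.04. The larger sample changes how sure we can be that the relationship is real, not how much of the variance it explains.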

Two last safety warnings about correlations


Safety warning 1: A correlation of zero does not mean that the two variables are unrelated

Go to worksheet Example 4 in the Correlation workbook where there is data for ten ODL courses. For each course we have the number of students who were taking the course and the pass rate for that course. We have already graphed the data and you will see in Chart 4 a beautifully symmetric curve. It seems that very small courses have poor pass rates (possibly because there is little student interaction) and very big courses suffer too (possibly because they become too impersonal).


However, if you calculate the correlation coefficient you will see that it is zero. We said earlier that Pearson is a measure of linear correlation. Because the relationship in Chart 4 is curvilinear it is not detected by Pearson.
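You can convince yourself of this with a tiny made-up example. The figures below are invented for illustration and are not the data in worksheet Example 4: five hypothetical courses whose pass rates rise and then fall symmetrically as course size increases. The function name is also hypothetical, and the sketch uses the usual textbook definition of Pearson's r.

```python
import math

def pearson_r(xs, ys):
    """Pearson's product moment correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

# Hypothetical data: pass rates peak for medium-sized courses
course_sizes = [10, 20, 30, 40, 50]
pass_rates = [40, 60, 70, 60, 40]

r = pearson_r(course_sizes, pass_rates)
print(round(r, 2))
```

The pass rate clearly depends on course size here, yet the rising half of the curve and the falling half cancel each other out in the calculation.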
Safety warning 2: A correlation can conceal as well as reveal

In Example 1 we took an imaginary chemistry course and showed that there was a strong negative correlation between age and exam score. Now let's add the data from a second imaginary chemistry course as shown in worksheet Example 5. The data for Course 2 is actually taken from Example 2 where we showed that there was a strong positive correlation between the two variables. This is very clear from the graph in Chart 5. If we had treated them separately, we would have concluded from Course 1 that older students fared worse in chemistry, but from Course 2 we would have concluded the opposite. However, when they are put together, the correlation coefficient comes out as fairly strongly negative (-0.6), thus concealing the very real differences between the two courses. Never rush into statistical calculations. Always look at the data in graphical form.

Activity 17

15 mins

Difficulty and confusing course description

Is there any evidence from our survey data that students are finding the courses difficult because the course description was inaccurate? (Hint: Is there any relationship between the variables CONFUSING and DESCRIP? You will need to get the relevant data from workbook Analysis 1, worksheet data.)

The feedback to this activity is at the end of the unit.

You might like to practise by calculating other correlations among the attitudinal data from the survey. You will find that in many cases the coefficients are almost zero and in some cases the relationship is not in the direction you might have predicted.

Looking back and looking forward


By now you should be beginning to grasp some of the basic statistical techniques used by educational researchers. Used and interpreted sensibly they will get you a long way when carrying out your own research. If you want to consolidate and expand on this learning I would suggest that you read an introductory statistics textbook such as:

Statistics for people who (think they) hate statistics by Neil J. Salkind. Sage Publications Inc, 2000, Thousand Oaks, California. ISBN 0-7619-1622-9. (This book contains printed versions of all the data sets that are used in the examples and they can also be downloaded from a website.)

For detailed analysis of large datasets you really need packages like SPSS. If you can gain access to this software, I would recommend that you read the following book because it will teach you statistics and SPSS at the same time:

Discovering statistics using SPSS for Windows by Andy Field. Sage Publications Inc, 2000, Thousand Oaks, California. ISBN 0-7619-5755-3. (This book contains a CD-ROM with data sets.)

However, as you have seen, statistics only deal with the numbers that have already been produced. What figures are produced and how they are produced are just as important, if not more so. If you ask the wrong questions in the first place, no amount of statistical manipulation will salvage the results. A good basic introduction to the whole process is:

Data Collection and Analysis, edited by Roger Sapsford and Victor Jupp. Sage Publications Inc, 1996, Thousand Oaks, California. ISBN 0-7619-5046-X.

You will also have to read the research of others. Here you will only see chosen results that are presented and interpreted in ways chosen by the author. I hope that this module has given you some clues about what to look for, but the following book might help you to evaluate such work:

Reading Statistics and Research, 2nd Edition, by Schuyler W. Huck and William H. Cormier. Harper Collins College Publishers, 1996, New York. ISBN 0-06-500606-2.

The computer began to revolutionise quantitative social research over thirty years ago with its ability to process huge amounts of data at astonishing speed. A second digital revolution is now taking place in terms of communication. More and more people now have access to computers and to the Internet and this brings possibilities for social surveys, especially in ODL situations. Researchers can now collaborate with colleagues around the world: seeking guidance, accessing datasets, reading the literature, carrying out collaborative projects, etc. The possibilities are almost endless. However, the underlying processes of knowledge construction and interpretation remain the same. Remember that quantitative facts are socially constructed and provisional and that they should be dealt with critically. In the words of an old computing saying: 'Garbage in, garbage out!'

Summary
You have now looked at a range of basic statistical methods and you should be able to:
• use pivot tables to produce counts and crosstabulations
• explain the meaning of standard error and calculate it
• explain the meaning of a confidence interval and how to calculate one
• choose sampling methods so as to maximise the chances of their producing a representative sample
• explain the meaning of statistical significance
• use a t-test to decide whether the difference between two means is significant
• use a chi-square test to compare observed and expected data values
• explain the meaning of linear correlation
• calculate Pearson's correlation coefficient for given data.

References
A useful web site that goes into detail on the methods introduced here (and many others) is the VassarStats web site at http://faculty.vassar.edu/lowry/VassarStats.html. This site includes many interactive windows that will carry out statistical calculations for you.

Field, A. 2000 Discovering statistics using SPSS for Windows, Thousand Oaks, California: Sage Publications Inc. ISBN 0-7619-5755-3. (This book contains a CD-ROM with data sets.)

Huck, S. and Cormier, W. 1996 Reading statistics and research, 2nd Edition, New York: Harper Collins College Publishers. ISBN 0-06-500606-2.

Salkind, N. 2000 Statistics for people who (think they) hate statistics, Thousand Oaks, California: Sage Publications Inc. ISBN 0-7619-1622-9. (This book contains printed versions of all the data sets that are used in the examples and they can also be downloaded from a website.)

Sapsford, R. and Jupp, V. (Eds) 1996 Data collection and analysis, Thousand Oaks, California: Sage Publications Inc. ISBN 0-7619-5046-X.

Feedback to selected activities

Feedback to Activity 2
The simplest way is to click on cell B429. Then go to Insert > Function > Average. This gives the answer 39. (AN2) So our respondents are a little older than the population.

Feedback to Activity 3
Your pivot table should look like this:
Sum of COUNT
QUALS          Total
1                 16
2                 99
3                113
4                199
Grand Total      427

From this table, you should have been able to produce the percentages of respondents with each level of qualifications and then compare them with the population as in Table 36. (My calculations are in the Worksheet AN3.) You will see that our survey respondents are quite different from the population. Those with high previous qualifications were much more likely to complete the questionnaire.
Table 36 Comparison of qualifications of respondents against those of the population

Educational qualifications   Population %   Survey %
Very low                               14          4
Low                                    28         23
Medium                                 25         26
High                                   33         47
Totals                                100        100
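
The Survey % column can be checked from the pivot table counts with a few lines of Python. This is a sketch rather than the worksheet's own method; it assumes that QUALS codes 1 to 4 correspond to Very low, Low, Medium and High, which is consistent with the percentages in Table 36.

```python
# Counts from the Activity 3 pivot table, keyed by QUALS code
counts = {1: 16, 2: 99, 3: 113, 4: 199}
# Population percentages from Table 36 (assumed to map to codes 1 to 4)
population_pct = {1: 14, 2: 28, 3: 25, 4: 33}
labels = {1: "Very low", 2: "Low", 3: "Medium", 4: "High"}

total = sum(counts.values())
# Percentage of respondents at each level, rounded to whole numbers
survey_pct = {code: round(100 * n / total) for code, n in counts.items()}
# Positive values mean the group is over-represented among respondents
difference = {code: survey_pct[code] - population_pct[code] for code in counts}

for code in counts:
    print(f"{labels[code]:<9} population {population_pct[code]:>3}%  "
          f"survey {survey_pct[code]:>3}%  difference {difference[code]:+d}")
```

The difference column makes the response bias easy to see: the highest qualification group is over-represented and the lowest is under-represented.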

Feedback to Activity 4
Your table should look like the following:

Sum of Count   Course
Satis            C1     C2     C3     C4   Grand Total
0                 2             1                    3
1                 8     25      6      3            42
2                38     46     18     11           113
3                54     56     25     16           151
4                37     19     12     12            80
5                18     11      8      1            38
Grand Total     157    157     70     43           427

Feedback to Activity 5
Your table should look like Table 37. My calculations are shown in the SATI Worksheet (AN5).
Table 37 Percentages for satisfaction by course

                      Course 1 %   Course 2 %   Course 3 %   Course 4 %   Total %
1 Strongly Disagree            5           16            9            7        10
2 Disagree                    25           29           26           26        27
3 Neutral                     35           36           36           37        36
4 Agree                       24           12           17           28        19
5 Strongly Agree              12            7           12            2         9
Totals                       100          100          100          100       100
Average ratings              3.1          2.6          3.0          2.9       2.9
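
The percentages and average ratings in Table 37 can be reproduced from the Activity 4 counts. The Python sketch below is an illustration, not a copy of the worksheet: it assumes that a rating of 0 means a missing answer and is left out of the calculation, and `summarise` is a hypothetical helper name.

```python
# Counts from the Activity 4 pivot table: rating -> number of students
counts = {
    "C1": {0: 2, 1: 8, 2: 38, 3: 54, 4: 37, 5: 18},
    "C2": {1: 25, 2: 46, 3: 56, 4: 19, 5: 11},
    "C3": {0: 1, 1: 6, 2: 18, 3: 25, 4: 12, 5: 8},
    "C4": {1: 3, 2: 11, 3: 16, 4: 12, 5: 1},
}

def summarise(course_counts):
    """Return (percentages by rating, average rating), ignoring rating 0."""
    valid = {rating: n for rating, n in course_counts.items() if rating != 0}
    total = sum(valid.values())
    pct = {rating: round(100 * n / total) for rating, n in valid.items()}
    average = sum(rating * n for rating, n in valid.items()) / total
    return pct, round(average, 1)

# Add the courses together to get the Total column
overall = {}
for course_counts in counts.values():
    for rating, n in course_counts.items():
        overall[rating] = overall.get(rating, 0) + n

for name, course_counts in list(counts.items()) + [("Total", overall)]:
    pct, average = summarise(course_counts)
    print(name, pct, "average rating", average)
```

Because each percentage is rounded separately, a column can add up to 99 or 101 rather than exactly 100.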

Feedback to Activity 6
You should have come up with the following figures. The calculations are to the right of the worksheet. (AN6)

Table 38 Comparison of three methods of sampling

                               Method 1   Method 2   Method 3   Whole group
Mean                                2.6        2.9        2.8           2.9
SD                                  1.2        1.2        1.0           1.1
Standard error                     0.24       0.23       0.20          0.05
Upper confidence limit (95%)       3.12       3.37       3.19          3.01
Lower confidence limit (95%)       2.16       2.47       2.41          2.79

What you should have noted is that the means for Methods 1, 2 and 3 are all fairly similar to that obtained for the whole group. This should boost your faith in the value of sampling. However, because the number of cases for each method is much smaller, our confidence limits are always likely to be much wider. You can see that in our example the standard error is indeed much smaller for the whole group than for the three samples and hence the confidence limits are narrower. The results using Method 1, i.e. selecting the first 25 cases, were the worst. The mean gave a low estimate and the confidence limits were the widest. In general, the bigger the number of respondents, and the more random the selection procedure, the more accurate one would expect the estimate of the mean to be and the narrower the confidence limits. However, you should also remember that these confidence limits are based on the assumption that the respondents are a random selection from the whole population. If the response rate for your survey is low and there is evidence of response bias, these confidence limits might be artificially narrow.
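
The link between sample size and the width of the confidence limits can be sketched in Python. This is a minimal illustration using the rounded means and SDs from Table 38; it assumes 25 cases for Method 1 (as stated above) and 427 cases for the whole group (the survey total), and `confidence_limits` is a hypothetical helper name.

```python
from math import sqrt

def confidence_limits(mean, sd, n, z=1.96):
    """Return (standard error, lower limit, upper limit) for a sample mean.

    z = 1.96 gives approximate 95% confidence limits.
    """
    se = sd / sqrt(n)
    return se, mean - z * se, mean + z * se

# Method 1: the first 25 cases
se_1, low_1, high_1 = confidence_limits(mean=2.6, sd=1.2, n=25)
# Whole group: assumed to be all 427 respondents
se_all, low_all, high_all = confidence_limits(mean=2.9, sd=1.1, n=427)

print(f"Method 1:    SE = {se_1:.2f}, limits {low_1:.2f} to {high_1:.2f}")
print(f"Whole group: SE = {se_all:.2f}, limits {low_all:.2f} to {high_all:.2f}")
```

The limits printed here will differ slightly from Table 38, presumably because the worksheet works from unrounded means and SDs. The standard errors agree with the table.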

Feedback to Activity 8
In this case the null hypothesis is: Satisfaction on Course 3 = Satisfaction on Course 4. The t-test probability is 0.9938, which is approximately 1 or 100%. (See worksheet t-test2 for my calculations.) In other words, the support for the null hypothesis is overwhelming: even if the satisfaction was the same on both courses, there would be an almost 100% chance of observing a difference as large as the one in our two samples. On this basis, we have no evidence to reject the null hypothesis, so it stands, i.e. we conclude: Satisfaction on Course 3 = Satisfaction on Course 4. Were you surprised? Well, it should not be too surprising as the sample sizes are much smaller and the difference between the two means is less.

Feedback to Activity 9
1 This one just fails to be significant at the 95% level.
2 This one is significant at the 99% level. The 99% standard is a more rigorous one than the 95% one. It is often used in medicine, where errors in interpreting results have to be kept very low.
3 This one is highly significant.
4 This one is not significant.

Feedback to Activity 10
Students living in a rural location were much less satisfied with the courses than were their urban counterparts. This can be seen from the two distributions: 58% of rural students disagreed with the statement compared to 14% of urban students. Furthermore, the t-test showed a highly significant difference between the mean scores. The result was so close to zero that Excel prints it as 1.26106E-27 (that means 1.26106 × .000000000000000000000000001, a 1 after 26 zeroes). This is a mathematical technique for writing the number accurately using fewer digits; otherwise there would have to be a very long string of zeroes. My calculations are shown in columns O to U. (AN9)
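
If the E notation is unfamiliar, the short Python sketch below may help. It reads the two probabilities reported in this feedback section (1.26106E-27 here and 4.60698E-05 in the feedback to Activity 13) and prints each one both rounded to three decimal places and in scientific notation. The variable names are invented for illustration.

```python
from math import isclose

# Probabilities as Excel displays them in scientific (E) notation
p_activity_10 = float("1.26106E-27")
p_activity_13 = float("4.60698E-05")

# E-27 means "multiply by 10 to the power of -27"
assert isclose(p_activity_10, 1.26106 * 10 ** -27)

for label, p in [("Activity 10", p_activity_10), ("Activity 13", p_activity_13)]:
    as_decimal = f"{p:.3f}"      # three decimal places, like a Number format
    as_scientific = f"{p:.5e}"   # scientific notation with five decimals
    significant = p < 0.01       # below 0.01 means significant at the 99% level
    print(label, as_decimal, as_scientific, "significant at 99%:", significant)
```

Both values are shown as 0.000 when rounded to three decimal places, which is why the scientific form is needed to see how small they really are.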

Feedback to Activity 12
You should have come up with a value of 18.4.

Feedback to Activity 13
Urban students had an average exam score of 59. For rural students it was 53. A t-test gives a value of p = 4.60698E-05, or 0.000 in Number format. This difference was highly significant. Details of my calculations are in worksheet Exam 4.

Feedback to Activity 16
Your graph should look like Figure 26. You would expect a strong correlation because once again the points on the graph resemble a thin cigar. The correlation is positive. Excel only displays 0.9, but it is more correct to write it as +0.9. Generally speaking, as the value of one variable increases, so does the value of the other variable. Put in simple terms, the cigar slopes up from left to right.

Figure 26 Graph of the relationship between course enjoyment and number of radio programmes
[Scatter plot. Axis labels as printed: Exam score (vertical, 0 to 90) and Distance (horizontal, 0 to 100)]

Feedback to Activity 17
My calculations are in workbook Correlation, worksheet Descrip. I began by removing cases where either of the answers was missing (zeroes). I calculated the correlation coefficient to be -0.2629, or approximately -0.3.

This result is highly statistically significant, but with this number of cases even a result of r = 0.09 would be significant at the 5% level. It is best to describe it as a moderate negative correlation. A negative correlation means that the values of one variable go up as the values of another variable go down. (I did not draw a graph because there are too many data points.) In this case it means that the students who agreed that 'The learning materials are presented in a confusing way' tended to disagree with the statement that 'The course description in the prospectus was accurate'. The implication is that those who felt misled by the prospectus tended to find the course confusing. Changes in the prospectus might be called for.
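
The claim that a coefficient of about 0.09 would already be significant can be checked with a rough Python sketch. It rearranges the usual t-test for a correlation, t = r × √(n − 2) / √(1 − r²), to find the smallest r that reaches the 5% level. Two assumptions are made: the critical value of t is taken as 1.96 (reasonable for large samples), and the number of cases is taken as 427, although the exact number left after removing missing answers is not stated here. `critical_r` is a hypothetical helper name.

```python
from math import sqrt

def critical_r(n, t_crit=1.96):
    """Smallest absolute r that is significant at the two-tailed 5% level.

    Large-sample approximation: rearranges t = r * sqrt(n - 2) / sqrt(1 - r**2).
    """
    return t_crit / sqrt(t_crit ** 2 + n - 2)

n_cases = 427          # assumption: the survey total, before removing missing answers
r_observed = -0.2629   # negative, as described in the feedback above

threshold = critical_r(n_cases)
variance_explained = r_observed ** 2

print(f"Smallest significant r with {n_cases} cases: about {threshold:.2f}")
print(f"Observed r = {r_observed}, explaining {variance_explained:.1%} of the variance")
```

The observed coefficient is well beyond the threshold, so it is significant, yet it accounts for only a small share of the variance. This is the distinction between statistical significance and practical importance made earlier in the unit.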

Permissions
The publishers, editors and authors of this handbook are very grateful to the following copyright holders and authors for permission to include extracts from their work. We are particularly indebted to those publishers and individuals who supported the project by waiving copyright fees. We have made every effort to track down copyright holders. If you consider we have used material which is your copyright without acknowledgement, please accept our apologies and let COL know so the correction can be put into the next edition.

Dr Richard Lowry of Vassar College, New York for permission to use a link to http://faculty.vassar.edu/lowry/VassarStats.html

PQ Systems, Inc., in Dayton, Ohio (http://www.pqsystems.com) for permission to link to their Quality Advisor site at http://www.qualityadvisor.com/sqc/formulas/chi-square-f.htm

Professor David Kember and Educational Technology Publications for permission to use an extract from Kember, D. 1995 Open learning courses for adults: a model of student progress, Englewood Cliffs, NJ: Educational Technology Publications

Professor Noel Entwhistle of the University of Edinburgh for permission to use ideas from Entwhistle, N. and Tait, H. 1983 Understanding student learning, London: Croom Helm
