
The effect of testing on student achievement: 1910-2010

Richard P. PHELPS
2012, Richard P PHELPS International Test Commission, 8th Conference, Amsterdam, July, 2012 1

Meta-analysis
A method for summarizing a large research literature, with a single, comparable measure.


The effect of testing on student achievement


A 12-year-long study
Analyzed close to 700 separate studies and more than 1,600 separate effects
2,000 other studies were reviewed and found incomplete or inappropriate
Lacking sufficient time and money, hundreds of other studies will not be reviewed

Looking for studies to include in the meta-analyses

1. Included only those studies that found an effect from testing on student achievement or on teacher instruction

Studies included in the meta-analyses

2. when:
   a test is newly introduced, or newly removed
   quantity of testing is increased or reduced
   test stakes are introduced or increased, or removed or reduced

Studies included in the meta-analyses


3. plus previous research summaries (e.g.):
   Kulik, Kulik, Bangert-Drowns, & Schwalb (1983-1991) on mastery testing, frequency of testing, and programs for high-risk university students
   Basol & Johanson (2009) on testing frequency
   Jaekyung Lee (2007) on cross-state studies
   W.J. Haynie (2007) in career-tech ed

Number of studies of effects, by methodology type


Methodology type                                 Number of studies   Number of effects
Quantitative                                     177                 640
Surveys and public opinion polls (US & Canada)   247                 813
Qualitative                                      245                 245
TOTAL                                            669                 1698

Effect size: Cohen's d

$d = (Y_E - Y_C) / S_{pooled}$

$Y_E$ = mean, experimental group
$Y_C$ = mean, control group
$S_{pooled}$ = pooled standard deviation
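The definition of Cohen's d can be sketched in a few lines of Python. This is a minimal illustration, not the study's own code: the slide only says the denominator is a pooled standard deviation, so the usual degrees-of-freedom-weighted pooling of the two sample variances is an assumption here, and the scores are hypothetical.

```python
import math
from statistics import mean, variance


def pooled_sd(experimental, control):
    """Pooled standard deviation of two samples.

    Assumption: the two sample variances are pooled with (n - 1) weights,
    the common textbook choice; the slides do not spell this out.
    """
    n_e, n_c = len(experimental), len(control)
    pooled_var = ((n_e - 1) * variance(experimental)
                  + (n_c - 1) * variance(control)) / (n_e + n_c - 2)
    return math.sqrt(pooled_var)


def cohens_d(experimental, control):
    """d = (mean of experimental group - mean of control group) / pooled SD."""
    return (mean(experimental) - mean(control)) / pooled_sd(experimental, control)


# Hypothetical test scores, for illustration only.
tested = [78, 85, 90, 72, 88]
untested = [70, 75, 80, 68, 77]
print(round(cohens_d(tested, untested), 2))
```

A positive d means the experimental (tested) group scored higher than the control group.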

Effect size: Other formulae

$d = t \sqrt{(n_1 + n_2) / (n_1 n_2)}$
$d = 2r / \sqrt{1 - r^2}$
$d = (Y_{E,post} - Y_{E,pre} - Y_{C,post} + Y_{C,pre}) / S_{pooled,post}$
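When a study reports a t statistic, a correlation, or pre/post means instead of two group means, the effect size can still be recovered. The sketch below uses the standard textbook forms of these conversions: d = t * sqrt((n1 + n2) / (n1 * n2)), d = 2r / sqrt(1 - r^2), and the difference in gains (posttest minus pretest, experimental minus control) divided by the pooled posttest standard deviation. The function and parameter names are illustrative, not taken from the study.

```python
import math


def d_from_t(t, n1, n2):
    """Effect size from an independent-samples t statistic and group sizes."""
    return t * math.sqrt((n1 + n2) / (n1 * n2))


def d_from_r(r):
    """Effect size from a correlation r, valid for -1 < r < 1."""
    return 2 * r / math.sqrt(1 - r ** 2)


def d_from_gains(e_pre, e_post, c_pre, c_post, s_pooled_post):
    """Effect size from pre/post means: gain of the experimental group
    minus gain of the control group, over the pooled posttest SD."""
    return ((e_post - e_pre) - (c_post - c_pre)) / s_pooled_post


# Hypothetical inputs, for illustration only.
print(round(d_from_t(2.0, 50, 50), 2))
print(round(d_from_r(0.6), 2))
print(round(d_from_gains(50, 60, 50, 54, 12), 2))
```

With this sign convention, a larger gain in the experimental group gives a positive d, which agrees with the direction of Cohen's d.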

Effect size: Interpretation

d between 0.25 and 0.50: weak effect
d between 0.50 and 0.75: medium effect
d more than 0.75: strong effect
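The interpretation bands can be written as a small lookup. This is a sketch under assumptions the slide does not settle: values exactly at 0.50 and 0.75 are assigned to the medium band, values under 0.25 get a label of their own, and the magnitude is taken as the absolute value of d.

```python
def interpret_effect_size(d):
    """Label an effect size using the bands from the slide.

    Assumptions (not stated in the source): boundary values 0.50 and 0.75
    count as "medium", and anything under 0.25 is "below the weak band".
    """
    magnitude = abs(d)
    if magnitude > 0.75:
        return "strong"
    if magnitude >= 0.50:
        return "medium"
    if magnitude >= 0.25:
        return "weak"
    return "below the weak band"


for value in (0.30, 0.55, 0.71, 0.88):
    print(value, interpret_effect_size(value))
```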

Quantitative studies
(population coverage ≈ 7 million persons)


Quantitative studies: Effect size


Bare bones calculation: d ≈ +0.55, a medium effect
Bare bones effect size adjusted for measurement error: d ≈ +0.71, a stronger effect
Using same-study-author aggregation: d ≈ +0.88, a strong effect
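The slides do not give the formulas behind these three figures. As a rough sketch, assuming "bare bones" is meant in the usual meta-analytic sense (a sample-size-weighted mean effect size with no corrections for study artifacts) and that the adjustment for measurement error divides d by the square root of the outcome measure's reliability, the first two steps could look like this. The study data and the reliability value are hypothetical.

```python
import math


def bare_bones_mean_d(effects):
    """Sample-size-weighted mean of (d, n) pairs, with no artifact corrections."""
    total_n = sum(n for _, n in effects)
    return sum(d * n for d, n in effects) / total_n


def correct_for_unreliability(d, reliability):
    """Adjust d upward for measurement error in the outcome measure.

    Assumption: the standard disattenuation formula d / sqrt(reliability);
    the source does not state which correction was applied.
    """
    return d / math.sqrt(reliability)


# Hypothetical (effect size, sample size) pairs; not data from the study.
studies = [(0.40, 200), (0.70, 100), (0.55, 100)]
mean_d = bare_bones_mean_d(studies)
print(round(mean_d, 4))
print(round(correct_for_unreliability(mean_d, reliability=0.80), 4))
```

Because reliability is at most 1, the corrected value is never smaller in magnitude than the bare bones value, which is consistent with the adjusted figure being the larger of the two.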

Which predictors matter?

Treatment group ...                                       Mean effect size
is made aware of performance, and control group is not    +0.98
receives targeted instruction (e.g., remediation)         +0.96
is tested with higher stakes than control group           +0.87
is tested more frequently than control group              +0.85

More Moderators - Source of Test

Source of test          Number of studies   Mean effect size
Researcher or Teacher   87                  0.93
National                24                  0.87
Commercial              38                  0.82
State or District       11                  0.72
Total                   160

More Moderators - Sponsor of Test

Sponsor of test   Number of studies   Mean effect size
International     5                   1.02
Local             99                  0.93
National          45                  0.81
State             11                  0.64
Total             160

More Moderators - Study Design


Study design                   Number of studies   Mean effect size
Pre-post                       12                  0.97
Experiment, Quasi-experiment   107                 0.94
Multivariate                   26                  0.80
Experiment, posttest only      7                   0.60
Pre-post (with shadow test)    8                   0.58
Total                          160

More Moderators - Scale of Analysis

Scale of analysis   Number of studies   Mean effect size
Aggregated          9                   1.60
Small-scale         118                 0.91
Large-scale         33                  0.57
Total               160

More Moderators - Scale of Administration

Scale of administration   Number of studies   Mean effect size
Classroom                 115                 0.95
Mid-scale                 6                   0.72
Large-scale               39                  0.71
Total                     160
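As an arithmetic check on the moderator tables, the subgroup means can be recombined into one overall mean. The sketch below weights each subgroup mean by its number of studies; that weighting is an assumption, since the slides do not say how the subgroup means relate to the overall figure. The counts and means are the ones reported in the five moderator tables.

```python
# (number of studies, mean effect size) per subgroup, as reported in the slides.
MODERATORS = {
    "Source of test": [(87, 0.93), (24, 0.87), (38, 0.82), (11, 0.72)],
    "Sponsor of test": [(5, 1.02), (99, 0.93), (45, 0.81), (11, 0.64)],
    "Study design": [(12, 0.97), (107, 0.94), (26, 0.80), (7, 0.60), (8, 0.58)],
    "Scale of analysis": [(9, 1.60), (118, 0.91), (33, 0.57)],
    "Scale of administration": [(115, 0.95), (6, 0.72), (39, 0.71)],
}


def weighted_mean_effect(rows):
    """Mean effect size across subgroups, weighted by number of studies.

    Assumption: study-count weighting; the source does not state the weights.
    """
    total_studies = sum(n for n, _ in rows)
    return sum(n * d for n, d in rows) / total_studies


for name, rows in MODERATORS.items():
    studies = sum(n for n, _ in rows)
    print(f"{name}: {studies} studies, weighted mean d = {weighted_mean_effect(rows):.2f}")
```

Each table sums to 160 studies, and under this weighting every table's recombined mean falls within 0.01 of the +0.88 reported for same-study-author aggregation.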

Surveys and opinion polls


Percentage of survey items, by respondent group and type of survey


[Bar chart: percent of survey items (vertical axis, 0 to 50) from education providers and education consumers, shown for public opinion polls and program evaluation surveys*]

Number and percent of survey items, by test stakes and target group

Test stakes   Number   %
High          507      62
Medium        184      23
Low           33       4
Unknown       89       11
TOTAL         813

Target group   Number   %
Students       393      46
Schools        281      33
Teachers       116      14
No stakes      64       7
TOTAL          854

Opinion polls, by year


244 between 1958 and 2008, in the U.S. & Canada
813 unique question-response combinations
close to 700,000 individual respondents

[Chart: opinion polls by year; horizontal axis: Year, 1960 to 2005; vertical axis ticks from 20 to 120]

Surveys and opinion polls: Regular standardized tests, performance tests

Respondent opinion                         Regular tests (N ≈ 125), d   Performance tests (N ≈ 50), d
Achievement is increased                   1.2                          1.0
  weighted by size of study population     1.9                          0.5
Instruction is improved                    1.0                          1.4
  weighted by size of study population     0.9                          0.9
Tests help align instruction               1.0                          1.0
  weighted by size of study population     0.5                          0.9

Qualitative studies: Summary


(One cannot calculate an effect size.)


Qualitative studies, by methodology type

Methodology                                      Number of studies   %
Case study                                       120                 43
Experiment or pre-post study                     21                  7
Interviews (individual or group)                 75                  27
Journal                                          2                   1
Review of official records, documents, reports   33                  12
Research review                                  8                   3
Survey                                           22                  8
TOTAL                                            281                 100

Qualitative studies: Effect on student achievement


244 studies conducted in the past century in over 30 countries

Direction of effect   Number of studies   Percent of studies   Percent without the inferred
Positive              204                 84                   93
Positive inferred     24                  10
Mixed                 5                   2                    2
No change             8                   3                    4
Negative              3                   1                    1
TOTAL                 244                 100                  100
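The two percentage columns follow directly from the study counts. A short sketch of that arithmetic, using the counts reported on the slide (the function name is illustrative): the second column simply drops the "Positive inferred" studies from both numerator and denominator.

```python
# Study counts by direction of effect, as reported on the slide.
COUNTS = {
    "Positive": 204,
    "Positive inferred": 24,
    "Mixed": 5,
    "No change": 8,
    "Negative": 3,
}


def percentages(counts, exclude=()):
    """Whole-number percentages of the total, after dropping excluded categories."""
    kept = {name: n for name, n in counts.items() if name not in exclude}
    total = sum(kept.values())
    return {name: round(100 * n / total) for name, n in kept.items()}


print(percentages(COUNTS))
print(percentages(COUNTS, exclude=("Positive inferred",)))
```

Dropping the 24 inferred studies leaves 220, of which the 204 clearly positive studies are about 93 percent.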

Qualitative studies: Testing improves student achievement and teacher instruction


Achievement is improved   Number of studies   %
Yes                       200                 95
Mixed results             1                   <1
No                        10                  5
TOTAL                     211                 100

Instruction is improved   Number of studies   %
Yes                       158                 96
No                        7                   4
TOTAL                     165                 100

Qualitative studies: Variation by rigor and test stakes


Level of rigor
Direction of effect   high   medium   low   Total
Positive              95     67       42    204
Positive inferred     10     8        6     24
Mixed                 3      1        1     5
No change             4      3        1     8
Negative              1      1        1     3
TOTAL                 113    80       51    244

Stakes
Direction of effect   high   medium   low   unknown   Total
Positive              133    27       38    6         204
Positive inferred     12     5        7               24
Mixed                 4               1               5
No change             2      1        5               8
Negative              3                               3
TOTAL                 154    33       51    6         244

Qualitative studies: Regular standardized tests and performance tests

Study results                            Regular tests (N = 176), %   Performance tests (N = 69), %
Generally positive                       93                           95
High-stakes tests                        71                           42
High level of study rigor                46                           48
Student attitudes toward test positive   60                           71
Teacher attitudes toward test positive   55                           80
Student achievement improved             95                           95
Instruction improved                     92                           100
Large-scale testing                      86                           68

An enormous research literature


But assertions that it does not exist at all are common
Some claims are made by those who oppose standardized testing, and may be wishful thinking
Others are firstness claims

Dismissive research reviews

With a dismissive research literature review, a researcher assures all that no other researcher has studied the same topic


Firstness claims

With a firstness claim, a researcher insists that he or she is the first to ever study a topic


Social costs are enormous


Research conducted by those without power or celebrity is dismissed: ignored and lost
Public policies are skewed, based exclusively on the research results of those with power or celebrity
Society pays again and again for research that has already been done

