
Research Design

and Statistical Analysis


Third Edition

Jerome L. Myers
University of Massachusetts at Amherst
Arnold D. Well
University of Massachusetts at Amherst
Robert F. Lorch, Jr.
University of Kentucky
Published in 2010
by Routledge
711 Third Avenue
New York, NY 10017
www.psypress.com
www.researchmethodsarena.com
Published in Great Britain
by Routledge
27 Church Road
Hove, East Sussex BN3 2FA
Copyright © 2010 by Routledge
Routledge is an imprint of the Taylor & Francis Group, an Informa business
Typeset in Times by RefineCatch Limited, Bungay, Suffolk, UK

Cover design by Design Deluxe

All rights reserved. No part of this book may be reprinted or reproduced or utilized in any form or
by any electronic, mechanical, or other means, now known or hereafter invented, including
photocopying and recording, or in any information storage or retrieval system, without permission
in writing from the publishers.
Library of Congress Cataloging-in-Publication Data
Myers, Jerome L.
Research design and statistical analysis / Jerome L. Myers, Arnold D. Well. –3rd ed. / Robert F.
Lorch, Jr.
p. cm.
Includes bibliographical references.
1. Experimental design. 2. Mathematical statistics. I. Well, A. (Arnold). II. Lorch, Robert
Frederick, 1952– . III. Title.
QA279.M933 2010
519.5—dc22 2009032606

ISBN: 978–0–8058–6431–1 (hbk)


To Nancy, Susan, and Betty
Contents

Preface xv

PART 1:
Foundations of Research Design and Data Analysis 1
CHAPTER 1 PLANNING THE RESEARCH 3
1.1 Overview 3
1.2 The Independent Variable 5
1.3 The Dependent Variable 6
1.4 The Subject Population 7
1.5 Nuisance Variables 9
1.6 Research Design 10
1.7 Statistical Analyses 15
1.8 Generalizing Conclusions 16
1.9 Summary 17
CHAPTER 2 EXPLORING THE DATA 19
2.1 Overview 19
2.2 Plots of Data Distributions 20
2.3 Measures of Location and Spread 27
2.4 Standardized (z) Scores 35
2.5 Measures of the Shape of a Distribution 36
2.6 Comparing Two Data Sets 38
2.7 Relationships Among Quantitative Variables 40
2.8 Summary 43


CHAPTER 3 BASIC CONCEPTS IN PROBABILITY 47


3.1 Overview 47
3.2 Basic Concepts for Analyzing the Structure of Events 48
3.3 Computing Probabilities 52
3.4 Probability Distributions 58
3.5 Connecting Probability Theory to Data 59
3.6 Summary 60
CHAPTER 4 DEVELOPING THE FUNDAMENTALS
OF HYPOTHESIS TESTING USING
THE BINOMIAL DISTRIBUTION 65
4.1 Overview 65
4.2 What Do We Need to Know to Test a Hypothesis? 66
4.3 The Binomial Distribution 70
4.4 Hypothesis Testing 74
4.5 The Power of a Statistical Test 78
4.6 When Assumptions Fail 84
4.7 Summary 86
CHAPTER 5 FURTHER DEVELOPMENT OF THE FOUNDATIONS
OF STATISTICAL INFERENCE 91
5.1 Overview 91
5.2 Using Sample Statistics to Estimate Population Parameters 93
5.3 The Sampling Distribution of the Sample Mean 98
5.4 The Normal Distribution 101
5.5 Inferences About Population Means 102
5.6 The Power of the z Test 109
5.7 Validity of Assumptions 112
5.8 Relationships Between the Normal and Other Distributions 114
5.9 Summary 116
CHAPTER 6 THE t DISTRIBUTION AND ITS
APPLICATIONS 124
6.1 Overview 124
6.2 Design Considerations: Independent Groups or Correlated
Scores? 125
6.3 The t Distribution 127
6.4 Data Analyses in the Independent-Groups Design 128
6.5 Data Analyses in the Correlated-Scores Design 132
6.6 Assumptions Underlying the Application of the t Distribution 134
6.7 Measuring the Standardized Effect Size: Cohen’s d 139
6.8 Deciding on Sample Size 142
6.9 Post Hoc Power 147
6.10 Summary 148

CHAPTER 7 INTEGRATED ANALYSIS I 154


7.1 Overview 154
7.2 Introduction to the Research 155
7.3 Method 155
7.4 Exploring the Data 156
7.5 Confidence Intervals and Hypothesis Tests 159
7.6 The Standardized Effect Size (Cohen’s d ) 160
7.7 Reanalysis: The Trimmed t Test 160
7.8 Discussion of the Results 161
7.9 Summary 162

PART 2:
Between-Subjects Designs 167
CHAPTER 8 BETWEEN-SUBJECTS DESIGNS: ONE
FACTOR 169
8.1 Overview 169
8.2 An Example of the Design 170
8.3 The Structural Model 172
8.4 The Analysis of Variance (ANOVA) 173
8.5 Measures of Importance 179
8.6 When Group Sizes Are Not Equal 182
8.7 Deciding on Sample Size: Power Analysis in the Between-Subjects
Design 185
8.8 Assumptions Underlying the F Test 187
8.9 Summary 195

CHAPTER 9 MULTI-FACTOR BETWEEN-SUBJECTS
DESIGNS 200
9.1 Overview 200
9.2 The Two-Factor Design: The Structural Model 201
9.3 Two-Factor Designs: The Analysis of Variance 205
9.4 Three-Factor Between-Subjects Designs 211
9.5 More Than Three Independent Variables 220
9.6 Measures of Effect Size 220
9.7 A Priori Power Calculations 224
9.8 Unequal Cell Frequencies 224
9.9 Pooling in Factorial Designs 229
9.10 Advantages and Disadvantages of Between-Subjects Designs 230
9.11 Summary 231

CHAPTER 10 CONTRASTING MEANS IN BETWEEN-SUBJECTS
DESIGNS 238
10.1 Overview 238
10.2 Definitions and Examples of Contrasts 239

10.3 Calculations for Hypothesis Tests and Confidence Intervals on
Contrasts 240
10.4 Extending Cohen’s d to Contrasts 246
10.5 The Proper Unit for the Control of Type 1 Error 246
10.6 Controlling the FWE for Families of K Planned Contrasts Using
Methods Based on the Bonferroni Inequality 248
10.7 Testing All Pairwise Contrasts 252
10.8 Comparing a – 1 Treatment Means with a Control: Dunnett’s Test 257
10.9 Controlling the Familywise Error Rate for Post Hoc Contrasts 258
10.10 Controlling the Familywise Error Rate in Multi-Factor Designs 260
10.11 Summary 264
CHAPTER 11 TREND ANALYSIS IN BETWEEN-SUBJECTS
DESIGNS 271
11.1 Overview 271
11.2 Some Preliminary Concepts 272
11.3 Trend Analysis in a One-Factor Design 275
11.4 Plotting the Estimated Population Function 283
11.5 Trend Analysis in Multi-Factor Designs 284
11.6 Some Cautions in Applying and Interpreting Trend Analysis 289
11.7 Summary 290
CHAPTER 12 INTEGRATED ANALYSIS II 295
12.1 Overview 295
12.2 Introduction to the Experiment 295
12.3 Method 296
12.4 Results and Discussion 297
12.5 Summary 304

PART 3:
Repeated-Measures Designs 307
CHAPTER 13 COMPARING EXPERIMENTAL DESIGNS
AND ANALYSES 309
13.1 Overview 309
13.2 Factors Influencing the Choice Among Designs 310
13.3 The Treatments × Blocks Design 312
13.4 The Analysis of Covariance 316
13.5 Repeated-Measures (RM ) Designs 319
13.6 The Latin Square Design 323
13.7 Summary 326
CHAPTER 14 ONE-FACTOR REPEATED-MEASURES
DESIGNS 332
14.1 Overview 332
14.2 The Additive Model in the One-Factor Repeated-Measures
Design 333

14.3 Fixed and Random Effects 335


14.4 The ANOVA and Expected Mean Squares for the Additive Model 335
14.5 The Nonadditive Model for the S × A Design 337
14.6 The Sphericity Assumption 341
14.7 Dealing with Nonsphericity 343
14.8 Measures of Effect Size 346
14.9 Deciding on Sample Size: Power Analysis in the Repeated-Measures
Design 349
14.10 Testing Single df Contrasts 352
14.11 The Problem of Missing Data in Repeated-Measures Designs 353
14.12 Nonparametric Procedures for Repeated-Measures Designs 355
14.13 Summary 358
CHAPTER 15 MULTI-FACTOR REPEATED-MEASURES AND
MIXED DESIGNS 362
15.1 Overview 362
15.2 The S × A × B Design with A and B Fixed 363
15.3 Mixed Designs with A and B Fixed 366
15.4 Designs with More Than One Random-Effects Factor: The Fixed- vs
Random-Effects Distinction Again 373
15.5 Rules for Generating Expected Mean Squares 374
15.6 Constructing Unbiased F Tests in Designs with Two Random
Factors 377
15.7 Fixed or Random Effects? 382
15.8 Understanding the Pattern of Means in Repeated-Measures
Designs 383
15.9 Effect Size 388
15.10 A Priori Power Calculations 390
15.11 Summary 391
CHAPTER 16 NESTED AND COUNTERBALANCED VARIABLES
IN REPEATED-MEASURES DESIGNS 397
16.1 Overview 397
16.2 Nesting Stimuli Within Factor Levels 398
16.3 Adding a Between-Subjects Variable to the Within-Subjects
Hierarchical Design 402
16.4 The Replicated Latin Square Design 404
16.5 Including Between-Subjects Variables in the Replicated Square
Design 410
16.6 Balancing Carry-Over Effects 412
16.7 Greco-Latin Squares 414
16.8 Summary 415
CHAPTER 17 INTEGRATED ANALYSIS III 419
17.1 Overview 419
17.2 Introduction to the Experiment 419
17.3 Method 420

17.4 Results and Discussion 420


17.5 An Alternative Design: The Latin Square 425
17.6 Summary 430

PART 4:
Correlation and Regression 433
CHAPTER 18 AN INTRODUCTION TO CORRELATION
AND REGRESSION 435
18.1 Introduction to the Correlation and Regression Chapters 435
18.2 Overview of Chapter 18 436
18.3 Some Examples of Bivariate Relationships 437
18.4 Linear Relationships 441
18.5 Introducing Correlation and Regression Using z Scores 443
18.6 Least-Squares Linear Regression for Raw Scores 447
18.7 More About Interpreting the Pearson Correlation Coefficient 452
18.8 What About Nonlinear Relationships? 460
18.9 Concluding Remarks 460
18.10 Summary 461
CHAPTER 19 MORE ABOUT CORRELATION 467
19.1 Overview 467
19.2 Inference About Correlation 467
19.3 Partial and Semipartial (or Part) Correlations 479
19.4 Missing Data in Correlation 483
19.5 Other Measures of Correlation 483
19.6 Summary 488
CHAPTER 20 MORE ABOUT BIVARIATE REGRESSION 493
20.1 Overview 493
20.2 Inference in Linear Regression 494
20.3 Using Regression to Make Predictions 502
20.4 Regression Analysis in Nonexperimental Research 504
20.5 Consequences of Measurement Error in Bivariate Regression 505
20.6 Unstandardized vs Standardized Regression Coefficients 507
20.7 Checking for Violations of Assumptions 508
20.8 Locating Outliers and Influential Data Points 515
20.9 Robust Regression 521
20.10 Repeated-Measures Designs and Hierarchical Regression 522
20.11 Summary 523
CHAPTER 21 INTRODUCTION TO MULTIPLE REGRESSION 528
21.1 Overview 528
21.2 An Example of Regression with Several Predictors 529
21.3 The Meaning of the Regression Coefficients 537
21.4 The Partitioning of Variability in Multiple Regression 540

21.5 Suppression Effects in Multiple Regression 546


21.6 Summary 547
CHAPTER 22 INFERENCE, ASSUMPTIONS, AND POWER
IN MULTIPLE REGRESSION 551
22.1 Overview 551
22.2 Inference Models and Assumptions 551
22.3 Testing Different Hypotheses About Coefficients in Multiple
Regression 552
22.4 Controlling Type 1 Error in Multiple Regression 555
22.5 Inferences About the Predictions of Y 556
22.6 Confidence Intervals for the Squared Multiple Correlation
Coefficient 557
22.7 Power Calculations in Multiple Regression 559
22.8 Testing Assumptions and Checking for Outliers and Influential Data
Points 562
22.9 Automated Procedures for Developing Prediction Equations 568
22.10 Summary 571
CHAPTER 23 ADDITIONAL TOPICS IN MULTIPLE
REGRESSION 573
23.1 Overview 573
23.2 Specification Errors and Their Consequences 574
23.3 Measurement Error in Multiple Regression 576
23.4 Missing Data in Multiple Regression 577
23.5 Multicollinearity 578
23.6 Unstandardized vs Standardized Coefficients in Multiple
Regression 580
23.7 Regression with Direct and Mediated Effects 582
23.8 Testing for Curvilinearity in Regression 583
23.9 Including Interaction Terms in Multiple Regression 587
23.10 Logistic Regression 594
23.11 Dealing with Hierarchical Data Structures in Regression, Including
Repeated-Measures Designs 595
23.12 Summary 597
CHAPTER 24 REGRESSION WITH QUALITATIVE AND
QUANTITATIVE VARIABLES 602
24.1 Overview 602
24.2 One-Factor Designs 603
24.3 Regression Analyses and Factorial ANOVA Designs 609
24.4 Testing Homogeneity of Regression Slopes Using Multiple
Regression 618
24.5 Coding Designs with Within-Subjects Factors 620
24.6 Summary 623

CHAPTER 25 ANCOVA AS A SPECIAL CASE OF MULTIPLE
REGRESSION 627
25.1 Overview 627
25.2 Rationale and Computation 627
25.3 Adjusting the Group Means in Y for Differences in X and Testing
Contrasts 633
25.4 Assumptions and Interpretation in ANCOVA 635
25.5 Using the Covariate to Assign Subjects to Groups 641
25.6 Estimating Power in ANCOVA 641
25.7 Extensions of ANCOVA 642
25.8 Summary 643
CHAPTER 26 INTEGRATED ANALYSIS IV 647
26.1 Overview 647
26.2 Introduction to the Study 647
26.3 Method 648
26.4 Procedure 649
26.5 Results and Discussion 649
26.6 A Hypothetical Experimental Test of the Effects of Leisure Activity on
Depression 652
26.7 Summary and More Discussion 655

PART 5:
Epilogue 659
CHAPTER 27 SOME FINAL THOUGHTS: TWENTY
SUGGESTIONS AND CAUTIONS 661

APPENDICES
Appendix A Notation and Summation Operations 669
Appendix B Expected Values and Their Applications 678
Appendix C Statistical Tables 682
Answers to Selected Exercises 715
References 776
Author Index 791
Subject Index 797
Preface

Like the previous editions, this third edition of Research Design and Statistical Analysis is intended
as a resource for researchers and a textbook for graduate and advanced undergraduate students.
The guiding philosophy of the book is to provide a strong conceptual foundation so that readers are
able to generalize concepts to new situations they will encounter in their research, including new
developments in data analysis and more advanced methods that are beyond the scope of this
book. Toward this end, we continue to emphasize basic concepts such as sampling distributions,
design efficiency, and expected mean squares, and we relate the research designs and data analyses
to the statistical models that underlie the analyses. We discuss the advantages and disadvantages
of various designs and analyses. We pay particular attention to the assumptions involved, the
consequences of violating the assumptions, and alternative analyses in the event that assumptions
are seriously violated.
As in previous editions, an important goal is to provide coverage that is broad and deep enough
so that the book can serve as a textbook for a two-semester sequence. Such sequences are common;
typically, one semester focuses on experimental design and the analysis of data from such experi-
ments, and the other semester focuses on observational studies and regression analyses of the data.
Incorporating the analyses of both experimental and observational data within a single textbook
provides continuity of concepts and notation in the typical two-semester sequence and facilitates
developing relationships between analysis of variance and regression analysis. At the same time, it
provides a resource that should be helpful to researchers in many different areas, whether analyzing
experimental or observational data.

CONTENT OVERVIEW

Also like the previous editions, this edition can be viewed as consisting of four parts:

1. Data exploration and basic concepts such as sampling distributions, elementary prob-
ability, principles of hypothesis testing, measures of effect size, properties of estimators,


and confidence intervals on both differences among means and on standardized effect
sizes.
2. Between-subjects designs; these are designs with one or more factors in which each
subject provides a single score. Key elements in the coverage are the statistical models
underlying the analysis of variance for these designs, the role of expected mean squares in
justifying hypothesis tests and in estimating effects of variables, the interpretation of inter-
actions, procedures for testing contrasts and for controlling Type 1 error rates for such
tests, and trend analysis—the analysis and comparison of functions of quantitative
variables.
3. Extension of these analyses to repeated-measures designs; these are designs in which sub-
jects contribute several scores. We discuss nesting and counterbalancing of variables in
research designs, present quasi-F ratios that provide approximate tests of hypotheses, and
consider the advantages and disadvantages of different repeated-measures and mixed
designs.
4. The fourth section provides a comprehensive introduction to correlation and regression,
with the goal of developing a general framework for analysis that incorporates both
categorical and quantitative variables. The basic ideas of regression are developed first for
one predictor, and then extended to multiple regression. The expanded section on multiple
regression discusses both its usefulness as a tool for prediction and its role in developing
explanatory models. Throughout, there is an emphasis on interpretation and on identifying
common errors in interpretation and usage.

NEW TO THIS EDITION

Although the third edition shares the overall goals of the previous editions, there are many modifi-
cations and additions. These include: (1) revisions of all of the chapters from the second edition; (2)
seven new chapters; (3) more examples of the use of SPSS to conduct statistical analyses; (4) added
emphasis on power analyses to determine sample size, with examples of the use of G*Power (Faul,
Erdfelder, Lang, & Buchner, 2007) to do this; and (5) new exercises. In addition to the modifications
of the text, there is a substantial amount of additional material at the website, www.psypress.com/
research-design. The website contains the following: (1) SPSS syntax files to perform analyses from
a wide range of designs and a hotlink to the G*Power program; (2) all of the data files used in the
text and in the exercises in SPSS and Excel formats; (3) technical notes containing derivations of
some formulas presented in the book; (4) extra material on multiple regression and on logistic regression; and
(5) a solutions manual and the text’s figures and tables for instructors only.

Additional chapters
There are seven new chapters in the book. Chapters 1, 13, and 27 were added to provide more
emphasis on the connections between design decisions, statistical analyses, and the interpretation of
results from a study. In addition, a chapter has been added at the end of each of the four sections
noted above to provide integrated examples of applications of the principles and procedures
covered in the sections.

Planning the Research. The first chapter provides a schema for thinking about the major
steps involved in planning a study, executing it, and analyzing and interpreting the results. The

emphasis is on the implications of decisions made in planning the study for subsequent analyses
and interpretation of results. The chapter establishes a critical theme for the rest of the book;
namely, that design and statistical analyses go hand-in-hand.

Comparing Experimental Designs. Chapter 13, the first chapter in the third section on
repeated-measures, provides a bridge between the second and third sections. It introduces blocking
in research designs, the analysis of covariance, and repeated-measures and Latin square designs.
Advantages and disadvantages of these designs and analyses are discussed. In addition, the import-
ant concept of the relative efficiency of designs is introduced, and illustrated with data and the
results of computer sampling studies. This chapter reinforces the theme of the intimate connection
between design and statistical analysis.

Review of Important Points and Cautions About Common Errors. Chapter 27 is
intended to remind readers of points discussed in the book—points we believe to be important but
sometimes overlooked—and to warn against common errors in analyzing and interpreting results.
For example, the chapter reminds readers of the importance of carefully choosing a research design
and the need for a priori power calculations. As another example, the chapter again emphasizes the
distinction between statistical and practical, or theoretical, significance.

Integrated Analysis Chapters. Each chapter in the book covers a lot of conceptual and
procedural territory, so the integrated analysis chapters provide opportunities to see how the con-
cepts and analyses come together in the context of a research problem. In these chapters, we
consider the design of a study and the analysis of the resulting data. The presentation includes
discussion of the pros and cons of possible alternative designs, and takes the analysis through
exploration of the data to inferential procedures such as hypothesis tests, including, where
applicable, tests of contrasts, estimates of effect size, and alternatives to the standard analyses in
consideration of possible violations of assumptions. These chapters also serve as a review of the
preceding chapters in the section and, in some cases, are used to introduce additional methods.

Use of Statistical Software


We assume that most readers will have access to some statistical software package. Although we
have used SPSS for most of our examples, the analyses we illustrate are available in most packages.
At several points, we have indicated the relevant SPSS menu options and dialog box choices needed
to carry out an analysis. In cases where certain analyses are not readily available in the menus of
current versions of SPSS (and possibly other statistical packages), we have provided references
to Internet sites that permit free, or inexpensive, downloads of relevant software; for example,
programs for obtaining confidence intervals for various effect size measures. Also, although their
use is not required in the book, we have provided a number of SPSS syntax files available at the
book’s website (see below) that can be used to perform analyses for a wide range of designs. We note
that the syntax files were written using SPSS 17 and that the analyses reported in the book are also
based on that version. Future versions may provide slightly different output, other options for
analysis, or somewhat different syntax.
We have used G*Power 3 for power analyses in many of the chapters and in some exercises.
This very versatile software provides both a priori and post hoc analyses for many designs
and analyses, as well as figures showing the central and noncentral distributions for the test and

parameters under consideration. The software and its use are described by Faul, Erdfelder, Lang,
and Buchner (2007). G*Power 3 can be freely downloaded from the website (http://www.psycho.
uni-duesseldorf.de/abteilungen/aap/gpower3/). Readers should register there in order to download
the software and to be notified of any further updates. Description of the use of G*Power 3 and
illustrations in the current book are based on Version 3.0.9.
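For readers who prefer to script such calculations, the same kind of a priori power analysis can be sketched in Python. The statsmodels package assumed here is not used in this book, and the effect size and error rates below are purely illustrative:

```python
from statsmodels.stats.power import TTestIndPower

# A priori power analysis for an independent-groups t test:
# solve for the sample size per group needed to detect a
# standardized effect size of d = 0.5 with alpha = .05 and
# power = .80, two-tailed.
analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05,
                                   power=0.80, alternative='two-sided')
print(round(n_per_group))  # approximately 64 subjects per group
```

G*Power performs the same computation interactively, and also supports post hoc analyses and many other designs.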

Exercises
As in previous editions, each chapter ends with a set of exercises. Answers to odd-numbered
exercises are provided at the back of the book; all answers are available in the password-protected
Instructor’s Solution Manual available at the book’s website. There are more than 40 exercises
in the four new integrated-analysis chapters to serve as a further review of the material in the
preceding chapters.

The Book Website


For the third edition, a variety of materials are available on the website, www.psypress.com/
research-design. These include the following.

Data Files
A number of data sets can be accessed from the book’s website. These include data sets used
in analyses presented in the chapters, so that these analyses can be re-created. Also, there are
additional data sets used in the exercises. All data files are available both in SPSS and Excel format,
in order to make them easily accessible. Some of these data sets have been included in order to
provide instructors with an additional source of classroom illustrations and exercises. For example,
we may have used one of the tables in the book to illustrate analysis of variance, but the file can also
be used to illustrate tests of contrasts that could follow the omnibus analysis. A listing of the data
files and descriptions of them are available on the website.

SPSS Syntax Files


As mentioned above, a number of optional syntax files are provided on the website that can be used
to perform analyses for a wide range of designs. These include analyses involving nesting of vari-
ables having random effects; tests involving a variety of contrasts, including comparisons of con-
trasts and of trend components; and a varied set of regression analyses. These files, together with a
Readme Syntax file that describes their use, are available at the website.

Technical Notes
For the sake of completeness, we wanted to present derivations for some of the expressions used in
the book—for example, standard errors of regression coefficients. Because these derivations are not
necessary to understand the chapters, and may be intimidating to some readers, we have made them
available as optional technical notes on the website.

Additional Chapters
Here we present two supplementary chapters in pdf format that go beyond the scope of the book.
One is a brief introduction to regression analysis using matrix algebra. The other is an introduction
to logistic regression that is more comprehensive than the brief section that we included in Chapter
23. Other material will be added at various times.

Teaching Tools
There is information on the website for instructors only. Specifically, there is a solutions manual for
all the exercises in the book and electronic files of the figures in the book.

Errata
Despite our best intentions, some errors may have crept into the book. We will maintain an up-to-
date listing of corrections.

ACKNOWLEDGMENTS

We wish to express our gratitude to Axel Buchner for helpful discussions about G*Power 3; to
J. Michael Royer for permission to use the data from his 1999 study; to Jennifer Wiley and James F.
Voss for permission to use the Wiley-Voss data; to Melinda Novak and Corrine Lutz for permission
to use the self-injurious behavior data set; and to Ira Ockene for permission to use the Seasons
data. The Seasons research was supported by National Institutes of Health, National Heart, Lung,
and Blood Institute Grant HL52745 awarded to University of Massachusetts Medical School,
Worcester, Massachusetts.
Special thanks go to those individuals who reviewed early chapters of the book and made many
useful suggestions that improved the final product: Jay Parkes, University of New Mexico; John
Colombo, University of Kansas; William Levine, University of Arkansas; Lawrence E. Melamed,
Kent State University; and one anonymous reviewer. We also wish to thank our colleagues,
Alexander Pollatsek and Caren Rotello, and the many graduate assistants who, over the years, have
contributed to our thinking about the teaching of statistics.
We also wish to express our gratitude to several individuals at Routledge. We have been
greatly helped in the development and publication of this book by Debra Riegert, Senior Editor,
and her Editorial Assistant, Erin Flaherty, as well as by Nicola Ravenscroft, the Project Editor for
this book, and Joseph Garver, who was responsible for the final technical editing. We also wish to
thank the American Statistical Association, the Biometric Society, and the Biometrika Trustees for
their permission to reproduce statistical tables.
Finally, as always, we are indebted to our wives, Nancy A. Myers, Susan Well, and Elizabeth
Lorch, for their encouragement and patience during this long process.
PART 1
Foundations of
Research Design
and Data Analysis
Chapter 1
Planning the Research

1.1 OVERVIEW

There are three essential stages in carrying out an effective research project. In the first stage, the
research is planned: objectives are stated; decisions are made about the treatments to be included,
the measures to be obtained, and the type and number of subjects; the research design is deter-
mined; and possible patterns of results and their implications are contemplated. This stage is
reflected primarily in the Method section of the final research report. In the second stage, the
data are collected and analyzed: descriptive statistics are calculated; population parameters are
estimated; and inferential tests are performed to determine whether any obtained effects are larger
than those we would expect to occur due to chance. The outcomes of the analyses are presented
in words, tables, and graphs in the Results section of the final report. The final stage is the
interpretation of the results, which typically is presented in the Discussion section of the research
report: What do the results tell us about the answers to the questions we initially asked? Answering
the questions that motivated the research is, of course, our ultimate goal; however, correct con-
clusions about what our results mean are totally dependent upon the first two stages. If the study
is designed in such a way that factors other than the independent variable may have influenced
the results, we may be led to incorrect conclusions. Even if the design of the study is sound,
our statistical tests may fail to reveal effects that are present in the population from which we
have sampled if our measures are very variable, or if we collect too few data. Finally, despite a
sound design and adequate procedures, data analyses that violate assumptions underlying the
statistical procedures may lead us to incorrect inferences. We may think of the planning and
analysis stages as providing input to the inferential stage and, as an old adage states, “garbage in,
garbage out.”
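As a minimal illustration of the second stage, the descriptive and inferential steps might look like this in Python (the SciPy package is assumed, and the scores are hypothetical):

```python
from statistics import mean, stdev
from scipy import stats

# Hypothetical scores from two independent groups (illustrative only).
control   = [52, 48, 55, 50, 47, 53, 49, 51]
treatment = [58, 54, 60, 55, 52, 57, 56, 59]

# Descriptive statistics for each group.
for label, scores in (("control", control), ("treatment", treatment)):
    print(f"{label}: M = {mean(scores):.2f}, SD = {stdev(scores):.2f}")

# Inferential test: is the obtained mean difference larger than we
# would expect due to chance?  An independent-groups t test.
t, p = stats.ttest_ind(treatment, control)
print(f"t = {t:.2f}, p = {p:.4f}")
```

The descriptive statistics would appear in the Results section's tables and graphs; the t test addresses whether the difference between the group means is reliable.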
In this chapter, we focus on the initial stage of planning the research. We will present an
overview of the decisions that confront the researcher at the onset of a new project, and of the
factors that should influence those decisions. In subsequent chapters, we will return to many of
the issues raised in this chapter, explicitly linking the decisions made in the planning stage to aspects
of the data analysis. Indeed, the decisions made in planning the research are the major factors


influencing the results of statistical analyses, and subsequently influencing the conclusions that are
drawn.
Chapter 1 is organized by the major decisions that must be made in planning a study:

• The independent variable. What is the question being asked in the research study? The
question must be translated into an independent variable to be manipulated in an
experiment or a predictor variable to be measured in an observational study.
• The dependent variable. What measure or measures should we use? In any study, there
is a choice of measures. Different measures will provide different information and have
different psychometric properties. Some measures may be better windows than others on
the phenomenon we are studying. Some measures may be more sensitive than others to
variation in the behaviors that are of interest. Some measures may be more reliable than
others as indicators of some aspect of ability or performance. How should we balance
these considerations in deciding among alternative measures?
• The subject population. Who is the target of the research question? Are we interested in
healthy, elderly adults? Do we wish to study brain development in rats? And how should
observations be sampled from the relevant population? Shall we sample from a diverse
population, perhaps varying widely in attributes such as age, intelligence, and ethnicity? Or
should our sampling process be more tightly constrained? What are the implications of our
decisions about sampling for conclusions that we hope to be able to make?
• Nuisance variables. In addition to decisions about the independent and dependent
variables, the researcher must carefully consider the possible influences of other variables in
the research study. What other variables may plausibly influence the dependent variable?
These other variables are potentially a nuisance in two very important respects. First, if
they are not taken into account, they may confound the independent variable. In that
case, it may be impossible to determine whether a difference in the mean scores across
various conditions is due to an effect of the independent variable, or to effects of the
nuisance variables that are correlated with the independent variable. Second, even if
steps are taken to make certain that the independent variable is not confounded with
other variables, nuisance variables contribute random variability to the data. This
“error variance” is a concern because it can result in our failing to detect effects of the
independent variable.
• The research design. In some circumstances, the research question demands an obser-
vational study; in other circumstances, the research question can be addressed by an
experiment; in all circumstances, there are many options in designing the final study. The
choice among the options must be informed by all of the questions considered to this
point: What is the independent variable? What is the population to be studied? What
measure has been chosen? What are the potential nuisance variables, and how—and how
much—are they expected to influence the dependent variable?
• The statistical analyses. The statistical analyses should be planned before collecting
the data for two reasons. First, planning the analyses has the healthy effect of forcing the
researcher to specify the questions to be addressed by the data. Second, it enables the
researcher to be sure that, given the planned research design, the targeted questions can be
answered by a statistical analysis.

The decisions made in the planning stage are interrelated; therefore, any sequencing of those
decisions is a bit arbitrary and may be somewhat misleading. Nevertheless, for expository purposes,
we will sequence the six categories of considerations by the order in which the decisions are typically
first confronted by the researcher.

1.2 THE INDEPENDENT VARIABLE

Research begins with a question. What is the best way to teach the logic of a simple experiment to
fourth-graders? Do people’s attitudes toward gay marriage affect the likelihood that they will vote
in a presidential election? Which dosage schedule minimizes side effects of a drug? The answer to
the question begins with the identification of a variable to be manipulated in an experiment or
observed in a natural setting. In an experiment, we call that variable an “independent variable”; in
an observational setting, we often refer to the variable as a “predictor variable”; for the present, we
will use the term “independent variable” to apply to both situations. The translation of a research
question into a relevant independent variable is guided, in part, by the researcher’s assumptions
about plausible answers to the question. In the question about science teaching, for example, the
choice of teaching interventions might be based on current theory about science teaching. In
the drug example, the selection of different dosages for comparison will likely be based on existing
knowledge of the drug.
Beyond the general process of translating a question into a relevant independent variable, it is
useful to distinguish between quantitative and qualitative variables. Quantitative variables are
defined by the amount of a variable. The research question about drug dosage provides an example
of a quantitative variable. The conditions of an experiment to relate drug dosage to the occurrence
of side effects will compare conditions that differ solely in the amount of drug administered to
subjects. Qualitative variables involve the type of treatment. The research question about science
teaching provides an example of a qualitative variable. The relevant experiment will compare two or
more teaching interventions to determine which is the most effective. In this example, the different
teaching methods would constitute the levels of the independent variable.

1.2.1 Quantitative Independent Variables


In the case of a quantitative independent variable, the experimenter is generally interested in ques-
tions relating to the form of the function relating the independent variable, X, to the dependent
variable, Y. What is the rate of change in the incidence of side effects as drug dosage is increased? At
what point on the dose effect function are side effects at a minimum? Does the function that relates
side effects to drug dosage differ for two alternative drug treatments? Given the focus on the form of
the function, the levels of the independent variable should be chosen to cover a wide enough range
to detect any behavioral change within that range. Further, the number and spacing of levels across
that range should be sufficient to clearly define the shape of the function, including its maximum or
minimum (e.g., the drug dosage that minimizes side effects).
Theoretical considerations may also dictate the choice of levels of a quantitative independent
variable. Suppose the investigator is trying to decide between two theories, one of which predicts
a stepwise function and one of which predicts a gradual change in Y as a function of X. In this
case, examination of a broad range of levels may be sacrificed in order to concentrate on more levels
within a narrow range, presumably permitting a clearer view of the shape of the function within
that range.

1.2.2 Qualitative Independent Variables


In studies investigating a quantitative independent variable, the specific numerical levels are of
less interest in themselves than in the information they provide about a function relating the
independent and dependent variables. In our drug dosage example, it is probably not critical
whether the levels of drug dosage are 10, 20, and 40 mg, or 15, 25, and 50 mg. In contrast, a
qualitative independent variable is one whose levels have typically been chosen specifically
because they are of direct interest to the researcher. In our science teaching example, the teaching
interventions to be compared in such an experiment will be specifically chosen. The choice may be
based on theory, or on previous research findings, or on observations of behavior in naturally
occurring circumstances. For example, two teaching interventions might be constructed to
represent opposing teaching philosophies. Alternatively, an educational researcher may observe
that a novel method of teaching science has produced impressive scores relative to national norms,
so it is decided to compare the novel method with a more conventional method in a controlled
experiment.

1.3 THE DEPENDENT VARIABLE

In addition to the independent variable, a researcher must also select a dependent variable, or
measure of performance. The choice of the dependent variable should be based on several con-
siderations. First, the dependent variable should be a valid measure of the behavior of interest. By
valid, we mean that the measure should reflect the behavior of interest in the research. It is often a
straightforward matter to select a valid measure. For example, if a researcher wishes to study how
the organization of information in a text influences recall of the text, the dependent variable will be
recall. If the incidence of side effects of a drug is of interest, the researcher will probably have a list
of side effects to check and will simply count the number of instances of side effects for each subject
in each condition. However, there are many situations where the selection of a valid measure is less
straightforward. If a researcher wants to study how the organization of information in a text
influences comprehension of the text, the use of recall as a measure of comprehension is less
convincing because recall is influenced by factors other than comprehension. A researcher wishing
to assess teaching effectiveness might use student ratings of the overall quality of instruction.
However, if by “effectiveness” we mean how well students learned in a course, student ratings may
not be a good measure of their learning.
The choice of a valid measure should be guided by the goals of the research and what the
researcher knows about the domain. This includes both theory and the empirical literature. For
example, if a clinical researcher hypothesizes specific factors that underlie depressive behavior,
scales designed to measure those factors may be included in a study of depression. Similarly, if an
experimenter has a theory that predicts effects on response time as well as accuracy in studies of
memory, then both types of measures should be recorded. Further, as a measure is used by many
researchers over time, a knowledge base develops that should inform future use of the measure. For
example, recall measures were common in early studies of text comprehension, but have fallen into
disfavor as it was discovered that recall performance is influenced by memory factors that have little
to do with what most would define as comprehension.
Practical constraints are frequently an important influence on choice of measures. A per-
sonality researcher may feel that the ideal way to measure some clinical state is in an intensive
interview, but if the research goal requires a large number of subjects, a paper-and-pencil scale
may be used because it is easily administered to many individuals, and can also be scored by
machine.
Within the constraints imposed by the researcher’s goals and by practical limits of time in
administration and scoring, statistical considerations are relevant to selection of the dependent
variable. All other factors being equal, the most reliable measure—the one that is least variable
within conditions—should be chosen. In some instances, high reliability is easily attained. For
example, response times may be recorded consistently and accurately. However, in other cases, reliability
is less easily assured. For example, reliability is a major consideration in research in which the
dependent variable is an attitude scale or other measurement instrument. Many articles and books
have been written on the topic of measurement, and we cannot do justice to the issues here.
Interested readers should consult some of the many sources available (e.g., Downing & Haladyna,
2006; Martin & Bateson, 2007; Nunnally & Bernstein, 1994).
The sensitivity of the measure is also an important concern when selecting a dependent
variable. All else being equal, a measure that is capable of detecting small differences in per-
formance is preferable to a measure that can detect only gross differences. In our example of a study
of drug side effects, the ultimate side effect would be death of a subject; however, if that was the
only category measured, the resulting failure to record less drastic side effects would seriously
compromise our ability to identify an optimal dosage to minimize side effects. Perhaps less
obviously, even measures that are relatively sensitive across many ranges of behavior will not be able to
reveal differences among experimental conditions if performance is close to some maximum or
minimum value. This would be the case, for example, if two methods of teaching are evaluated by
scores on a test so easy that the scores are high regardless of the instructional method. In this
example, simply increasing the difficulty of the test should improve its sensitivity.
Another factor that may affect the outcome of the statistical analysis is the distribution of the
dependent variable. Many common statistical procedures rest on the assumption that the data are
normally distributed; that is, that the distribution curve has, at least approximately, a bell shape.
This is but one of several assumptions that underlie various methods for estimating and testing
effects. We will have much to say throughout this book about the assumptions that underlie the
methods we discuss—what these assumptions are, the consequences when they are violated, and the
alternative methods appropriate when the consequences may invalidate our conclusions.
Finally, we do not want to leave the impression that there is a single “best” dependent variable
in a given experiment. Different measures provide different information, so it is often advisable to
use multiple measures of behavior in the same experiment. The result will be a richer understanding
of behavior.

1.4 THE SUBJECT POPULATION

With the independent variable identified and the dependent variable(s) selected, who will par-
ticipate in the study? The answer to this question has immediate implications for our ability to
generalize our research conclusions. Several considerations shape the choice of subjects in most
research.
Practical considerations often play a major role in defining the subject population. In clinical
research, for example, financial and time limitations may force a researcher to draw a sample of
subjects from the local Veterans Administration hospital. In social and cognitive research, the
easy accessibility of students in introductory psychology courses makes them a nearly irresistible
source of study subjects. These practical considerations are understandable, but it is important
to recognize that the constraints on sampling that they represent serve to define the subject popula-
tion. The sampling process, in turn, has implications for how our research conclusions may be
generalized. In some situations, practical constraints on sampling may not seriously constrain
conclusions. If the study involves basic sensory or perceptual processes, the results from a sample
of college students can probably be generalized to individuals of a similar age who do not attend
college. On the other hand, if a study of problem-solving strategies is conducted on students from a
highly selective college, the conclusions may not be applicable to individuals who do not attend
college or, for that matter, to students at less selective colleges.
Beyond the reality of practical limitations, the purpose of the research should be an important
factor in the sampling of subjects. If the investigator wants to investigate the cognitive abilities of
older adults, then that is the population to be represented in the study. This seems obvious but
matters usually are more complicated. For example, what range of older ages will be included in the
study? The actual sample of subjects may be a function of several considerations beyond the very
general goal.
Theoretical considerations may influence the choice of subjects. Suppose the investigator who
is studying cognitive functioning of older adults hypothesizes that speed and accuracy in memory
and problem-solving tasks are a direct function of the degree to which people continue to engage in
intellectual pursuits. Of course, this depends on the definition of intellectual activities, but the
researcher might administer a questionnaire to ensure that the sample includes a range of individuals with
respect to whether they participate in activities such as doing puzzles, reading and discussing those
readings, or learning new things such as a foreign language.
The subjects’ previous histories will be an important constraint on the sample. If the investi-
gator wants older individuals with no obvious cognitive impairments, individuals who may be
suffering from such impairments should be excluded from the study. If part of the research goal is
to generalize to a population of older individuals representing a broad spectrum of experience,
some attempt should be made to include people from various social and economic classes, ethnic
backgrounds, and work experiences.
Although perhaps less obvious than the preceding considerations, the control of error variance
is an issue in selecting research subjects. The investigator may want to include a wide range of
attributes in the study so that there is a firm statistical basis for generalizing to a broad population.
However, the more diversified the sample, the more variable the data will be and, therefore, the
less clear will be the effects of independent variables on the behavior of interest. For example, if
we want to evaluate some method of improving memory in a sample of older adults, we are likely
to have a better estimate of the effect of the method in a sample whose members are similar in
attributes such as age, experience, and intelligence than in one with wide variation in these
attributes.
The tradeoff in sampling from a more narrowly defined population is that, although the data
are less variable, inferences about individuals outside the range studied are more speculative. For
example, we may assume that if a method of improving memory is less effective for the older
members of our sample, it will be even less effective with individuals older than those in our sample.
However, that is a hypothesis that is only indirectly supported by our data. Ideally, investigators
should think about the population to which they wish to generalize, consider the implications for
the possible variability of the data, and then come to a decision about the attributes to be controlled
in recruiting subjects. Often, we can compensate for a heterogeneous sample by having a larger
sample (see Section 1.6), by the choice of the research design (Chapter 13), and by statistical means
(Chapters 13 and 25).

1.5 NUISANCE VARIABLES

Many of the decisions facing the researcher reflect the fact that scores are influenced by variables
other than the independent variables of interest. Such variables have been labeled “irrelevant” (to
the researcher’s interest; Myers, 1979) or “nuisance” variables (Kirk, 1995). Subjects in research
differ in ability, prior experience, attitudes, age, gender, and many other attributes that may
influence their responses in a study. Even a single individual may make different responses to the
same stimulus at different points in time because of factors such as fatigue or practice, a change in
calibration of equipment, a change in room temperature, or variation in items on a test or rating
instrument.
The presence of nuisance variables is a threat to the success of a study in two potential ways.
First, care must be taken to ensure that nuisance variables are not confounded with the independent
variable; otherwise, differences among experimental conditions may not be unambiguously
attributed to the independent variable. Second, the presence of nuisance variables may make it
more difficult to decide whether the independent variable has had an effect, or whether the effect is
due to chance.

1.5.1 Confounding
Suppose we observe that students taught science by a lab-based curriculum show higher achieve-
ment than students taught science by a text-based curriculum. Can we be sure that the difference
in achievement is due to the difference in the curricula? Are the students taught by the lab-based
approach more likely to come from an environment that provides more parental support or more
resources (e.g., lab equipment)? Are the teachers using the lab-based curriculum better trained in
science or more experienced? These are just some of the potential confounds that may compromise
our hypothetical comparison of curricula. This is but one illustration of how differences between
the means of various conditions may be attributed to the effects of the independent variable when
those differences are in part, or entirely, due to effects of one or more nuisance variables. Such
incorrect attribution of effects may occur whenever variables that are not of primary interest are
related to the independent variable.
Issues of public policy provide an important example because they often involve separating the
contributions of many variables from those of the variable of interest. For example, a relevant
policy issue is whether giving vouchers to poor children to enable them to attend private schools
results in more effective education of those children. However, differences in academic performance
between students who obtain vouchers and those who remain in public schools cannot simply be
attributed to differences in the schools. Children who obtain vouchers and transfer to private
schools may be more able, or have more motivated parents, than those public school students who
do not opt out of the public school system.
Many other examples of possible confounds due to nuisance variables could be presented.
Relations among variables cause problems of interpretation for true experiments, for observational
studies, and for correlational studies. Campbell and Stanley (1963, 1966) present an excellent dis-
cussion of several classes of nuisance variables, together with consideration of the advantages and
disadvantages of relevant research designs. Although primarily concerned with educational
research, their discussion is relevant to any area.

Fig. 1.1 Two data sets with a line connecting mean scores.

1.5.2 Error Variance


Error variance consists of those chance fluctuations in scores that are attributable to the effects of
nuisance variables. Consider Fig. 1.1. The two points on the x-axis represent two levels of an
independent variable, perhaps two methods of teaching science. In that example, the individual
points on the graph represent the scores of individual students on a science achievement test, and
the points connected by the line would represent the average scores of individuals in the two groups.
The average score at Method 2 is higher than that at Method 1, suggesting that the second method is
superior. However, error variance weakens that conclusion. Note that several scores at Method 2 are
lower than some of the scores at Method 1, even lower than the mean score at Method 1. Can we be
sure that Method 2 represents the better method? Or might the difference in mean performance be
due to chance differences between the groups in average ability or motivation, or some other factor?
Note that if there were no variability within a condition, all the points would fall at the means, and
the advantage of Method 2 would be clear. Unfortunately, error variance is always present, albeit to
varying degrees, in behavioral research.
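The situation Fig. 1.1 depicts can be reproduced with a brief simulation (a Python sketch; the means, standard deviation, and group sizes are invented). Even when Method 2 is truly better on average, within-group variability produces considerable overlap between the two sets of scores.

```python
import random
import statistics

random.seed(42)

# Hypothetical truth: Method 2 is better by 5 points on average, but individual
# scores vary around each group mean (the error variance).
method1 = [random.gauss(70, 10) for _ in range(100)]
method2 = [random.gauss(75, 10) for _ in range(100)]

mean1, mean2 = statistics.mean(method1), statistics.mean(method2)
print(f"Mean, Method 1: {mean1:.1f}   Mean, Method 2: {mean2:.1f}")

# Overlap: count Method 2 scores that fall below the Method 1 mean.
below = sum(score < mean1 for score in method2)
print(f"{below} of {len(method2)} Method 2 scores fall below the Method 1 mean")
```

Despite a genuine 5-point advantage built into the simulation, a substantial fraction of Method 2 scores fall below the Method 1 mean, which is exactly why a mean difference alone does not settle which method is better.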

1.6 RESEARCH DESIGN

In an ideal world, a well-articulated research question would be rewarded by definitive empirical
results. But an ideal world would not include nuisance variables, so even well-articulated research
questions are sometimes answered in a misleading way or not at all. The study of research design
and inferential statistics deals to a large extent with the problems posed by nuisance variables. We
have already encountered the two main problems: nuisance variables can introduce confounds that
make it difficult or impossible to know the true effect of our independent variables; and nuisance
variables produce error variance that hinders a researcher’s ability to discern the effects of
independent variables. In this section, we elaborate these two problems and discuss some strategies
for coping with them.

1.6.1 Observation Versus Experimentation


Not all research questions may be approached in the same way. The nature of the approach taken by
a researcher has important implications for the researcher’s ability to cope with potential confounds
and with error variance.
Some questions readily lend themselves to experimentation. Our example of determining the
relationship between drug dosage and the incidence of side effects is a good case in point. Typically,
an animal model would be used to investigate this question in a laboratory setting. The
experimenter has precise control over the administration of drug dosages and extensive control over
the conditions under which the drug is administered (e.g., animal weight, feeding and watering, time
of day). As we will see, the control associated with experiments makes experimentation eminently
suited to dealing with the threats of confounds and error variance.
However, many important questions necessitate a loss of experimental control. For example, if
we want to know the effects of a traumatic event on depression, a true experiment is not possible
because manipulating the independent variable would be unethical. Instead, a researcher might
collect relevant information through interviews or questionnaires administered to individuals
who have undergone some trauma. In such circumstances, a researcher cannot prevent nuisance
variables from confounding variables of interest. For example, if the researcher is specifically inter-
ested in the effects of a traumatic combat experience on depression, a comparison of individuals
who have experienced combat trauma with individuals who have not experienced combat trauma
could easily be compromised by other differences between the groups. Some of the potential con-
founds might be avoided by careful selection of a comparison group (e.g., a comparison group
comprised of soldiers not involved in combat would be preferable to a comparison group comprised
of college students). Still, it is often impossible to identify comparison groups that are matched in
all respects to the group of interest. Similarly, the loss of control inherent to observational studies is
frequently associated with increased error variance.
Finally, there are many examples of questions that may be approached through either
experimentation or observation. Our example of comparing science curricula is relevant. Many
educational researchers would approach this question with observational methods, comparing
the achievement levels of schools that use a lab-based curriculum with those of schools using a
text-based curriculum. One argument for adopting this approach is that an experimental approach
disrupts the natural educational environment, with the consequence that the findings of an experi-
ment will ultimately not generalize to the classroom. Another argument is that the barriers to a
true experimental approach are so great as to render an experiment impractical. However, many
educational researchers counter that an experimental approach can be situated in the classroom
without seriously compromising the natural educational environment. Although conceding that the
barriers to such an approach are significant, they argue that the effort to surmount the barriers is
rewarded by a clearer understanding of what instructional methods are effective and why. There
are valid points on both sides of the argument, and there is a place for both observational and
experimental research. Thus, it is important to have tools to address the effects of nuisance variables
in both types of research.

1.6.2 Dealing with Threats to Valid Inferences


In order to draw a valid conclusion about the effect of an independent variable, a researcher must
be able to rule out the possibility that nuisance variables could explain the apparent effect of the
independent variable.

Control of Nuisance Variables by Random Assignment. The most effective means
for ruling out nuisance variables as an alternative explanation of an observed effect is to guarantee
that the independent variable is not confounded with any relevant nuisance variables. This is
possible only in a true experiment. When the independent variable is manipulated, we have control
over which individuals are assigned to which treatments. In such cases, the potential effects of many
nuisance variables can be countered by random assignment of individuals to conditions. For
example, in a study comparing effects of several instructional methods on problem solving, we
would not want to select subjects in such a way as to increase the likelihood that more motivated
individuals would be assigned to a particular method. However, that might happen if the
first volunteers, possibly the more motivated individuals, were assigned to one method. One
solution is to randomly assign subjects to conditions, using a method that ensures that each subject
has an equal chance of being assigned to each instructional method. Randomization does
not ensure that the various instructional groups are perfectly matched on those nuisance variables
that might influence learning. However, randomization does ensure that over many replications
of the experiment, no treatment will have an advantage. In any one experiment, one condition
might have an advantage (e.g., include more motivated, or more intelligent, subjects) but statistical
analyses that are based on the assumption of randomization take these chance differences into
account.
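Random assignment of this kind takes only a few lines of code (a Python sketch; the subject labels and condition names are invented). The key step is shuffling the entire pool before dealing subjects into groups, so that arrival order, such as eager early volunteers, cannot bias the assignment.

```python
import random

random.seed(0)

subjects = [f"S{i:02d}" for i in range(1, 25)]    # 24 hypothetical volunteers
conditions = ["method_A", "method_B", "method_C"]

# Shuffle the whole pool, then split it into equal-sized groups; every subject
# has the same chance of landing in each instructional method.
pool = subjects[:]
random.shuffle(pool)
group_size = len(pool) // len(conditions)
assignment = {
    cond: pool[i * group_size:(i + 1) * group_size]
    for i, cond in enumerate(conditions)
}

for cond, members in assignment.items():
    print(cond, members)
```

Note that this procedure yields equal group sizes by design, which is usually desirable; assigning each subject independently by a coin flip would also be a valid randomization but would generally produce unequal groups.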
Randomization does not apply only to the assignment of subjects to conditions. In many
studies, the subject is tested on several different items in each condition. For example, a researcher
might be interested in whether there is a difference in ease of understanding sentences as a function
of whether they are in the active or passive voice. Accordingly, the researcher might create an
active and a passive version of each member of a set of sentences. The time each subject needed
to read each sentence would be recorded; half of the sentences would be randomly selected for
presentation in the active voice, and the other half of the sentences would be in the passive voice. A
different random assignment could be used for each subject. In addition, the order of presentation
of the sentences could be randomly sequenced for each subject. Random assignment of sentences to
the active and passive voices addresses the possibility that some sentences are inherently more
difficult to read by providing an equal chance of their being presented in either condition to each
subject. Similarly, random sequencing of the presentation of the sentences addresses concerns
about the influences on reading time of fatigue, boredom, practice, or other time-related effects that
influence reading time.
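The within-subject randomization in the sentence example can be sketched the same way (a Python illustration with a hypothetical item set): for each subject, randomly select half the sentences to appear in the active voice, then randomly order the resulting trial list.

```python
import random

random.seed(7)

sentences = [f"sentence_{i}" for i in range(1, 13)]   # 12 hypothetical sentences

def trial_list(items):
    """Build one subject's randomized trial list."""
    shuffled = items[:]
    random.shuffle(shuffled)
    half = len(shuffled) // 2
    # A random half gets the active version, the other half the passive version...
    trials = [(s, "active") for s in shuffled[:half]] + \
             [(s, "passive") for s in shuffled[half:]]
    random.shuffle(trials)    # ...and the presentation order is randomized too
    return trials

subject1 = trial_list(sentences)
for sentence, voice in subject1:
    print(sentence, voice)
```

Calling `trial_list` anew for each subject gives every subject a different random voice assignment and presentation order, which is the arrangement described in the text.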
Randomization guards against attributing effects to an independent variable when other
variables are responsible for the effects. However, it is not a cure-all. For example, we might wish to
study the effects on depression of a newly developed pill. We could have interviewers rate the state
of depression before and after treatment. However, just the knowledge of being treated might
influence the subjects’ responses. One solution would be to have a control condition, a second group
of depressed patients who are given a placebo (e.g., a sugar pill). Subjects would be randomly
assigned to either the experimental or the control condition. However, if the interviewer is aware of
the condition of each individual being interviewed, this could (presumably unintentionally) bias the
results. The response to this concern is to run what is termed a double-blind experiment, in which
neither interviewer nor subject is aware of the condition. Kirk (1995, pp. 22–24) discusses several
other methods that supplement randomization.

Measurement of Nuisance Variables. In the absence of experimental control, a
researcher can never be completely certain that a nuisance variable might not be responsible, at least
in part, for an apparent effect of an independent variable. The researcher’s best strategy is to
identify and measure relevant nuisance variables, and then hope to demonstrate that those variables
cannot account for the observed results. For example, consider a study that attempts to determine
the effect of combat trauma on depression. A comparison of soldiers who have experienced trauma
with those who have not might be compromised by, say, differences in the gender composition or average
education levels of the two groups. Men may be more likely to encounter traumatic combat
experiences than women because of different assignments. Similarly, better-educated soldiers may be more
likely to receive assignments that make them less likely to experience combat. If gender and educa-
tion level are recorded and included in the analyses of the data, it may be possible to demonstrate
that neither variable is related to depression and therefore cannot explain an observed relation
between combat trauma and depression. Or gender and/or education level might be related to
depression, but not in a way that can explain an observed relation between trauma and depression.
Of course, it is possible that the nature of the relations between gender, education, and trauma
make it impossible to clearly discern the nature of the relation between trauma and depression.
Measurement is the best a researcher can do in this situation, but it does not substitute for the
ability to exert control over variables.

1.6.3 Dealing with Error Variance


If there is an effect of our independent variable in the population, we hope that the effect will be
clearly evident in our sample. Unfortunately, that doesn’t always happen. The reason is error
variance. An analogy is useful here. Think of error variance as noise and the effect of the inde-
pendent variable as a signal embedded in that noise. If the signal is sufficiently strong relative to the
noise, it will be detected; however, if the signal is too weak, it will be missed. A well-designed and
conducted study will establish circumstances in which the signal-to-noise ratio is high and thus the
probability of detecting an effect of the independent variable is high. This detection probability is
the concept of statistical power: Power is the probability of detecting an effect of an independent
variable in an experiment when an effect exists in the population. We seek to conduct studies with
high power; our primary means to accomplish this goal is to minimize error variance. Several
strategies are available.
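The signal-to-noise analogy can be made concrete with a short simulation. The sketch below is not from the text and all numbers are hypothetical; it estimates power as the proportion of simulated two-group experiments in which an approximate two-tailed z-test detects a true effect. A stronger signal relative to the same noise yields higher power:

```python
import random
import statistics

def estimate_power(effect, noise_sd, n_per_group, n_sims=2000, crit=1.96):
    """Monte Carlo estimate of power: the proportion of simulated
    two-group experiments in which the true effect is detected by an
    approximate two-tailed z-test (crit = 1.96 for alpha = .05)."""
    random.seed(1)  # fixed seed so the estimate is reproducible
    detections = 0
    for _ in range(n_sims):
        control = [random.gauss(0.0, noise_sd) for _ in range(n_per_group)]
        treated = [random.gauss(effect, noise_sd) for _ in range(n_per_group)]
        diff = statistics.mean(treated) - statistics.mean(control)
        se = (statistics.variance(control) / n_per_group
              + statistics.variance(treated) / n_per_group) ** 0.5
        if abs(diff / se) > crit:
            detections += 1
    return detections / n_sims

# A stronger signal relative to the same noise yields higher power.
weak_signal = estimate_power(effect=0.2, noise_sd=1.0, n_per_group=25)
strong_signal = estimate_power(effect=0.8, noise_sd=1.0, n_per_group=25)
```

Because error variance (the noise) sits in the denominator of the test statistic, anything that reduces it raises the detection rate without changing the effect itself.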

Control by Uniform Conditions. Random assignment of subjects or items to treatments provides protection against wrongly attributing effects to the independent variable. However,
randomization does not reduce error variance. One way to reduce error variance is to hold the
conditions of the study constant except for variation in the levels of the independent variables. For
example, all subjects should receive the same instructions in the same environment, and all animal
subjects should receive the same handling, feeding, and housing. To the extent that sources of
variance other than the independent variable are held constant, error variance will be reduced and
effects of independent variables will be more easily detected.
It is possible and desirable to seek uniformity of conditions in observational studies, as well
as experiments. A good example is found in a study by Räikkönen et al. (1999). They studied
the relation between ambulatory blood pressure (BP) and personality characteristics, taking BP
measurements from all subjects in the same time period on the same three days of the week, two
work days and one nonwork day. In this way, they reduced possible effects of the time at which
measurements were taken. Similarly, researchers have the option of selecting the subjects of
observational studies in such a way as to match subjects on attributes (e.g., age and intelligence) that
may influence the behavior of interest (e.g., learning in a study comparing teaching interventions),
but which are not of interest to the researcher.

Control by Design. The design of the research is a plan for data collection. We can minimize
the effects of error variance by choosing a design that permits us to calculate the contribution of
one or more nuisance variables. In the data analysis, that error variance is subtracted from the total
variability in the data set. Many of the design variations also enable the researcher to equate
treatment conditions for one or more nuisance variables, thus ensuring that those variables are not
responsible for any differences that are found between conditions.
One procedure that is often used in experiments is blocking, sometimes also referred to as
stratification, or matching. Typically, we divide the pool of subjects into blocks on the basis of some
variable whose effects are not of primary interest to us, such as gender or ability level. Then we
randomly assign subjects within each block to the different levels of the independent variable. As an
example of the blocking design, suppose we wish to carry out a study of memory in which subjects
are taught to use one of three possible mnemonic strategies. We might measure the memory span of
subjects before the start of the research. We could then randomly distribute the three subjects with
the highest memory span score among the three conditions, then do the same with the three next
highest scoring subjects, and so on, until all three conditions are filled. In this design, the groups are
roughly matched on initial memory ability, a variable that might influence their memory of
materials in the experiment. Thus, any possible differences between means of the strategy groups
cannot be attributed to initial memory span. As an added result of the matching, a statistical
analysis that treats the level of initial memory span as a second independent variable can now
remove the variability due to that variable. The blocking design is said to be more efficient than
a simpler design that randomly assigns subjects to conditions without regard to memory span.
(Chapter 9 presents the analysis of data when a blocking design has been used.)
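The blocking procedure just described can be sketched in a few lines of Python. The function and subject labels below are hypothetical, not from the text: subjects are ranked on the blocking variable, divided into blocks whose size equals the number of conditions, and randomly assigned to conditions within each block:

```python
import random

def blocked_assignment(subjects, scores, n_conditions=3, seed=0):
    """Rank subjects on a blocking variable (e.g., memory span), split
    them into blocks of size n_conditions, and randomly assign the
    members of each block to the conditions.  Returns a dict mapping
    each subject to a condition number."""
    rng = random.Random(seed)
    ranked = sorted(subjects, key=lambda s: scores[s], reverse=True)
    assignment = {}
    for start in range(0, len(ranked), n_conditions):
        block = ranked[start:start + n_conditions]
        conditions = list(range(n_conditions))
        rng.shuffle(conditions)  # random assignment within the block
        for subject, condition in zip(block, conditions):
            assignment[subject] = condition
    return assignment

# Nine hypothetical subjects with pretest memory-span scores
scores = {"s1": 9, "s2": 7, "s3": 7, "s4": 6, "s5": 5,
          "s6": 5, "s7": 4, "s8": 4, "s9": 3}
groups = blocked_assignment(list(scores), scores)
```

Each block contributes exactly one subject to each condition, so the three groups end up roughly matched on memory span while assignment within blocks remains random.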
Blocking can also be used in some studies that are not true experiments. If we wish to study
attitudes in different ethnic groups about some issue, we can attempt to recruit subjects of different
ethnicities who are matched on variables which might influence responses, such as age, income, and
education.
For some independent variables, even greater efficiency can be achieved if we test the same
subject at each level of the independent variable. This variant of blocking is known as a repeated-
measures design. A common example occurs in studies of forgetting in which measures are obtained
at various times after the material was studied. The advantage of repeated-measures designs is that
subjects can be treated as another independent variable in the statistical analysis, thus removing
from the data variability due to differences among subjects. Ideally, the remaining error variance
will be due to errors of measurement caused by random fluctuations over time in factors such as the
subject’s attentiveness, the difficulty of the stimuli, and the temperature in the room. Thus, the
repeated-measures design is a particularly effective way to minimize error variance, although it is
not without its drawbacks. The major drawbacks of the design are the possibility of systematic
biases due to time-related effects (e.g., practice and fatigue) and the possibility of carry-over effects
(i.e., influences of participation in earlier conditions on performance in subsequent conditions).
(A full treatment of this design is presented in Chapters 14–17.)
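A small simulation (hypothetical numbers, not from the text) illustrates why testing the same subjects in every condition reduces error variance: stable differences among subjects inflate the variability of raw scores but cancel out of within-subject difference scores:

```python
import random
import statistics

random.seed(2)
# Hypothetical data: subjects differ substantially from one another
# (sd = 3), the condition effect is small (0.5), and measurement
# noise is modest (sd = 1).
n_subjects, effect = 30, 0.5
subject_level = [random.gauss(0.0, 3.0) for _ in range(n_subjects)]
cond_a = [s + random.gauss(0.0, 1.0) for s in subject_level]
cond_b = [s + effect + random.gauss(0.0, 1.0) for s in subject_level]

# Raw scores mix subject differences with measurement noise ...
between_subject_noise = statistics.stdev(cond_a)
# ... but within-subject difference scores cancel the subject term,
# leaving only measurement error.
within_subject_noise = statistics.stdev(
    [b - a for a, b in zip(cond_a, cond_b)])
```

The remaining variability in the difference scores reflects only the random measurement fluctuations, which is exactly the sense in which the repeated-measures design removes subject variability from the error term.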

Control by Measurement. Often, blocking is not practical. Morrow and Young (1997)
studied the effects of exposure to literature on the reading scores of third-graders. Although reading
scores were obtained before the start of the school year (pretest scores), the composition of the
third-grade classes was established by school administrators prior to the study. Therefore, the
blocking design was not a possibility. However, the pretest score could still be used to reduce error
variance by removing that portion of the variability in the dependent variable (i.e., the posttest
score) that was predictable from the pretest score. This adjustment, called analysis of covariance, is
thus a statistical means to reduce error variance, as opposed to the design approach of blocking.
(Analysis of covariance is discussed in Chapters 13 and 25.)
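The logic of the covariance adjustment can be sketched with hypothetical pretest and posttest scores (not the Morrow and Young data): fitting a least-squares line of posttest on pretest and working with the residuals removes the portion of posttest variability that the pretest predicts:

```python
import random
import statistics

random.seed(3)
# Hypothetical pretest/posttest reading scores: the posttest is
# partly predictable from the pretest.
n = 40
pretest = [random.gauss(50.0, 10.0) for _ in range(n)]
posttest = [0.8 * x + random.gauss(10.0, 5.0) for x in pretest]

# Least-squares slope and intercept of posttest on pretest
mean_x = statistics.mean(pretest)
mean_y = statistics.mean(posttest)
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(pretest, posttest))
         / sum((x - mean_x) ** 2 for x in pretest))
intercept = mean_y - slope * mean_x

# Removing the predictable portion leaves a smaller error variance.
residuals = [y - (intercept + slope * x) for x, y in zip(pretest, posttest)]
raw_variance = statistics.variance(posttest)
adjusted_variance = statistics.variance(residuals)
```

This is only the variance-reduction step of the analysis of covariance; the full procedure also adjusts the treatment comparison itself and rests on additional assumptions, as discussed in Chapters 13 and 25.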
Usually, the greater efficiency that comes with more complicated designs and analyses has a
cost. For example, additional information is required for both blocking and the analysis of co-
variance. Furthermore, the appropriate statistical analysis associated with more efficient approaches
is usually based on more assumptions about the nature of the data. In view of this, a major theme
of this book is that there are many possible designs and analyses, and many considerations in
choosing among them. We would like to select our design and method of data analysis with the
goals of minimizing potential confounds and reducing error variance as much as possible. However,
our decisions in these matters may be constrained by the resources that are available and by the
assumptions that must be made about the data. Ideally, the researcher should be aware of the pros
and cons of the different designs and analyses, and the tradeoffs that must be considered in making
the best choice.

Control by Sample Size. The final strategy for dealing with error variance is through the
choice of sample size. The size of the sample does not affect the error variance of individual observations; however, it does influence the amount of error in estimates of the effects of the
independent variables. As sample size increases, estimates of effects become more precise; as a
consequence, the power to detect the effects improves. Thus, in general, increasing sample size
increases statistical power in any design. The relationship between sample size and power is well
defined for a given design, so we can—and should—use that relationship in planning how many
observations to make in a given experiment or observational study. Sometimes, we may find that the
sample size required to achieve a certain level of power is impractical. However, it is better to know
that before we begin a study so that we can modify our plans by revising our research design, or
choosing other, less variable, measures.
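The relation between sample size and precision follows a simple rule: the standard error of a mean is the standard deviation divided by the square root of n, so quadrupling the sample size halves the error of estimation. A two-line illustration with a hypothetical standard deviation:

```python
import math

# Standard error of a mean: sd / sqrt(n).  Quadrupling the sample
# size halves the error of estimation, which is why power rises
# with n for a fixed effect size.
sd = 10.0  # hypothetical population standard deviation
standard_errors = {n: sd / math.sqrt(n) for n in (10, 40, 160)}
```

The diminishing returns implied by the square root are one reason power planning matters: doubling precision requires four times as many observations.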

1.7 STATISTICAL ANALYSES

Statistical analyses are performed after the data are collected, of course. However, the analyses
should be planned before the data are collected, as part of the process of evaluating the adequacy of
the research design. The analysis of data broadly consists of two phases: (1) an exploratory phase, in
which measures of central tendency (e.g., means, medians), variability, and shape of distributions
should be calculated and graphed; and (2) an inferential phase, in which population parameters are
estimated and hypotheses about them are tested.
Too often, researchers neglect the first, exploratory phase. However, the descriptive statistics
are the results of an experiment and they therefore deserve particular attention. That attention
begins during the planning phase. The research plans should include decisions about what descrip-
tive statistics will be calculated and what plots of the data will be useful to view. The researcher
should imagine, for example, possible patterns of means across conditions, and what different
patterns would reveal about the behavior being studied. We will present the reasons for, and
examples of, exploratory analyses in the next chapter. For now, we note that such analyses may
suggest additional hypotheses and also inform us about potential problems for the statistical
analyses to be carried out in the next phase.
The goals of the research will often dictate very specific hypotheses or questions. For example,
if there are several levels of the independent variable, are there comparisons between certain levels
that are of particular interest? Or, if the independent variable is quantitative, is it of interest to
examine certain features of the function relating the dependent and independent variables, such as
whether it is best represented by a straight line or a curve? These rather specific sorts of questions
should be considered before the data are collected for two reasons. First, considering specific
questions before the data are collected ensures that the design allows us to answer those questions.
Second, inferences based on tests of hypotheses formulated before the results are viewed have less
likelihood of yielding erroneous conclusions than tests of hypotheses suggested by the observed
results.
The power of the statistical test should be an important consideration in planning the research.
Estimates of error variance obtained from related research or pilot studies, together with a desired
level of power to detect effects of a certain size, should determine the amount of data collected. Just
how this is done is considered in several chapters of this book, and computer software for carrying out the calculations is provided on the book's website.
Finally, it is useful to specify the planned inferential tests in detail to make certain that the
experimental design has been thoroughly evaluated and all potential sources of variability have
been identified. This practice will help guard against oversights in planning the design. For example,
an educational researcher planning an experiment to compare two methods of teaching science
might assign two classes of students at random to a lab-based teaching method and two other
classes to a text-based method. However, this plan is flawed because it neglects an important
potential influence on performance; namely, individual classes may differ in performance regardless
of their assignment to different teaching methods. An implication of this observation is that the
research plan should include many more than four classes so that variability due to classes may be
adequately evaluated.

1.8 GENERALIZING CONCLUSIONS

A final planning consideration is to identify issues concerning the generalizability of any findings
to emerge from the research. Campbell and Stanley (1963) distinguished between internal and
external validity of inferences. Internal validity refers to the question of whether observed effects
can validly be attributed to the independent variable. External validity refers to the extent to which
the inferences that are drawn can be generalized to other conditions or populations. In much of this
chapter, we have been concerned with internal validity; in particular, we have discussed the role
of randomization and research design in protecting against potential confounds. We touched
briefly on the subject of external validity when we cautioned that researchers should consider the
population to which their inferences apply. For example, if a treatment of depression is successful in
a study with all female subjects, or in one with individuals in a certain age range, we should not
conclude that the treatment would also reduce depression in males, or in older subjects, without
some firm basis in theory or data. Perhaps the most common setting in which we need to be careful
about generalizing beyond the population sampled is the academic setting, in which the subjects are
usually college students within a limited range of age and intelligence, or the children of faculty and
students.
Similar issues arise with respect to generalizing conclusions based on a particular method
or measure. For example, recall and recognition scores have been shown to reflect very different
memory processes. And different results have been obtained in studies of reading in which reading
times are calculated as differences between button presses that bring on successive words in a text
and studies in which reading times are recorded by tracking eye movements.

Studies in which the independent variable is quantitative confront the researcher with other
problems of generalization. There are practical limits to the levels that can be included. Conclusions
about the shape of the function relating dependent and independent variables require interpolation
between, and extrapolation beyond, the levels observed.
Still another common problem for external validity involves generalization to a population
of items. When emotional states or attitudes are inferred from responses to a set of pictures,
we have to consider whether the results are specific to those particular pictures or have more
general implications. Similar inferential issues confront researchers in the area of language
comprehension, whose stimuli are sentences, or words that are usually constrained as to length or
familiarity.
One approach to the issue of external validity is to systematically vary the populations sampled
or the measures obtained. This can be done within a study or over a series of studies. For example, a
study of the effects of some treatment might include gender of the subjects as a second variable. Or,
as has often been done, both recall and recognition measures might be recorded in studies of
memory. The inclusion of variables that enable tests of the generalizability of results should be a
consideration in planning the research. Nevertheless, it is important to understand that results can
never be generalized on every dimension. As Campbell and Stanley (1963) have pointed out, the
very act of conducting an experiment raises the question of whether the results would apply in a
less controlled, real-world setting. Whereas we can provide a logical basis for arguing that certain
methods, such as randomization, increase internal validity, we can only argue for external validity
by extrapolating beyond the methods and results of the current study. We should take great care
before making such extrapolations.

1.9 SUMMARY

Careful planning of a study may save much wasted effort. Such plans should have several
goals:

• The research plan should include steps to ensure that results attributed to the independent
variable are not due to other uncontrolled variables. In experiments, random assignment
is the key procedure for guarding against systematic biases due to nuisance variables. In
less controlled research environments, the researcher must identify nuisance variables
of interest and measure them so that their influences may be assessed.
• Error variance should be controlled as much as possible to ensure maximal power to
detect effects of the independent variable. Several strategies address control of error
variance, including ensuring uniformity of conditions, selecting reliable dependent
variables, choice of research design, and measurement of nuisance variables for purposes
of statistical adjustment.
• Decisions about the number of observations to collect in a study should be informed by
power considerations.
• Statistical analyses, including descriptive statistics, should be planned as part of the process
of planning the research design.
• Issues pertaining to the bases for generalizing conclusions should be considered during the
planning phase.

EXERCISES

1.1 A researcher requested volunteers for a study comparing several methods to reduce weight.
Subjects were told that if they were willing to be in the study, they would be assigned randomly
to one of three methods. Thirty individuals agreed to this condition and participated in
the study.
(a) Is this an experiment or an observational study?
(b) Is the sample random? If so, characterize the likely population.
(c) Describe and discuss an alternative research design.
1.2 A study of computer-assisted learning of arithmetic in third-grade students was carried out in
a private school in a wealthy suburb of a major city.
(a) Characterize the population that this sample represents. In particular, consider whether
the results permit generalizations about computer-assisted instruction (CAI) for the broad
population of third-grade students. Present your reasoning.
(b) This study was done by assigning one class to CAI and one to a traditional method.
Discuss some potential sources of error variance in this design.
1.3 Investigators who conducted an observational study reported that children who spent con-
siderable time in day care were more likely than other children to exhibit aggressive behavior in
kindergarten (Stolberg, 2001). Although this suggests that placement in day care may cause
aggressive behavior—either because of the day-care environment or because of the time away
from parents—other factors may be involved.
(a) What factors other than time spent in day care might affect aggressive behavior in the
study cited by Stolberg?
(b) If you were carrying out such an observational study, what might be done to attempt to
understand the effects upon aggression of factors other than day care?
(c) An alternative approach to the effects of day care upon aggressive behavior would be to
conduct an experiment. How would you do this and what are the pros and cons of this
approach?
1.4 It is well known that the incidence of lung cancer in individuals who smoke cigarettes is higher
than in the general population.
(a) Is this evidence that smoking causes lung cancer?
(b) If you were a researcher investigating this question, what further lines of evidence would
you seek?
1.5 In the Seasons study (the data are in the Seasons file in the Seasons folder on the CD
accompanying this book), we found that the average depression score was higher for males
with only a high-school education than for those with at least some college education. Discuss
the implications of this finding. In particular, consider whether the data demonstrate that
providing a college education will reduce depression.
1.6 In a 20-year study of cigarette smoking and lung cancer, researchers recorded the incidence of
lung cancer in a random sample of smokers and nonsmokers, none of whom had cancer at the
start of the study.
(a) What are the independent and dependent variables?
(b) For each, state whether the variable is discrete or continuous.
(c) What variables other than these might be recorded in such a study? Which of these are
discrete or continuous?
References

Achen, C. H. (1977). Measuring representation: Perils of the correlation coefficient. American Journal of
Political Science, 41, 805–815.
Achen, C. H. (1982). Interpreting and using regression. Beverly Hills, CA: Sage.
Aguinis, H., Beaty, J. C., Boik, R. J., & Pierce, C. A. (2005). Effect size and power in assessing the moderating
effects of categorical variables using multiple regression: A 30-year review. Journal of Applied Psychology,
90, 94–107.
Aiken, L. S., & West, S. G. (1991). Multiple regression: Testing and interpreting interactions. Newbury Park, CA:
Sage.
Akritas, M. G. (1990). The rank transform method in some two-factor designs. Journal of the American
Statistical Association, 85, 73–78.
Alexander, R. A., & Govern, D. M. (1994). A new and simpler approach to ANOVA under variance hetero-
geneity. Journal of Educational Statistics, 19, 91–101.
Alf, E. F., & Graf, R. G. (1999). Asymptotic confidence limits for the difference between two squared multiple
correlations: A simplified approach. Psychological Methods, 4, 70–75.
Algina, J., & Keselman, H. J. (1997). Detecting repeated-measures effects with univariate and multivariate
statistics. Psychological Methods, 2, 208–218.
Algina, J., & Keselman, H. J. (1999). Comparing squared multiple correlation coefficients: Examination of an
interval and a test of significance. Psychological Methods, 4, 76–83.
Algina, J., Keselman, H. J., & Penfield, R. D. (2005a). An alternative to Cohen’s standardized mean difference
effect size. Psychological Methods, 10, 317–328.
Algina, J., Keselman, H. J., & Penfield, R. D. (2005b). Effect sizes and their intervals: The two-levels repeated-
measures case. Educational and Psychological Measurement, 65, 241–258.
Algina, J., Keselman, H. J., & Penfield, R. D. (2006). Confidence interval coverage for Cohen’s effect size
statistic. Educational and Psychological Measurement, 66, 945–960.
Allison, P. D. (2002). Missing data. Newbury Park, CA: Sage.
American Psychological Association. (2001). Publication manual of the American Psychological Association (5th
ed.). Washington, DC: Author.


Anderson, L. R., & Ager, J. W. (1978). Analysis of variance in small group research. Personality and Social
Psychology Bulletin, 4, 341–345.
Anscombe, F. J. (1973). Graphs in statistical analysis. American Statistician, 27, 17–21.
Anscombe, F. J., & Tukey, J. W. (1963). The examination and analysis of residuals. Technometrics, 5, 141–160.
Atiqullah, M. (1964). The robustness of the covariance analysis of a one-way classification. Biometrika, 51,
365–373.
Baayen, R. H., Davidson, D. J., & Bates, D. M. (2008). Mixed-effects modeling with crossed random effects for
subjects and items. Journal of Memory and Language, 59, 390–412.
Balanda, K. P., & MacGillivray, H. L. (1988). Kurtosis: A critical review. American Statistician, 42, 111–119.
Baron, R. M., & Kenny, D. A. (1986). The moderator–mediator variable distinction in social psychological
research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psych-
ology, 51, 1173–1182.
Bartlett, M. S. (1937). Properties of sufficiency and statistical tests. Proceedings of the Royal Society, 160,
268–282.
Beck, A. T., & Steer, R. A. (1984). Beck Depression Inventory: Manual. San Antonio, TX: Psychological
Corporation and Harcourt Brace Jovanovich.
Belsley, D. A., Kuh, E., & Welsch, R. E. (1980). Regression diagnostics. New York: Wiley.
Berk, R. A. (2004). Regression analysis: A constructive critique. Thousand Oaks, CA: Sage.
Bevan, M. F., Denton, J. Q., & Myers, J. L. (1974). The robustness of the F test to violations of continuity and
form of treatment populations. British Journal of Mathematical and Statistical Psychology, 27, 199–204.
Birnbaum, M. H. (1973). The devil rides again: Correlation as an index of fit. Psychological Bulletin, 79,
239–242.
Bishop, Y. M. M., Fienberg, S. E., & Holland, P. W. (1975). Discrete multivariate analysis: Theory and practice.
Cambridge, MA: MIT Press.
Blair, R. C., & Higgins, J. J. (1980). A comparison of the power of Wilcoxon’s rank-sum statistic to that of
Student’s t statistic under various non-normal distributions. Journal of Educational Statistics, 5, 309–335.
Blair, R. C., & Higgins, J. J. (1985). A comparison of the paired samples t test to that of Wilcoxon’s signed-rank
test under various population shapes. Psychological Bulletin, 97, 119–128.
Bless, H., Bohner, G., Schwarz, N., & Strack, F. (1990). Mood and persuasion: A cognitive response analysis.
Personality and Social Psychology Bulletin, 16, 331–345.
Bloom, B. S. (1964). Stability and change in human characteristics. New York: Wiley.
Boik, R. J. (1981). A priori tests in repeated-measures designs: Effects of nonsphericity. Psychometrika, 46,
241–255.
Boneau, C. A. (1962). A comparison of the power of the U and t tests. Psychological Review, 69, 246–256.
Box, G. E. P. (1953). Nonnormality and tests on variances. Biometrika, 40, 318–335.
Box, G. E. P. (1954). Some theorems on quadratic forms in the study of analysis of variance problems: Effect of
inequality of variance in the one-way classification. Annals of Mathematical Statistics, 25, 290–302.
Box, G. E. P. (1979). Robustness in the strategy of scientific model building. In R. L. Launer & G. N. Wilkinson
(Eds.), Robustness in statistics (pp. 201–236). New York: Academic Press.
Bozivich, H., Bancroft, T. A., & Hartley, H. O. (1956). Power of analysis of variance test procedures for certain
incompletely specified models. Annals of Mathematical Statistics, 27, 1017–1043.
Bradley, J. V. (1968). Distribution-free statistical tests. Englewood Cliffs, NJ: Prentice-Hall.
Brown, M. B., & Forsythe, A. B. (1974a). Robust tests for the equality of variances. Journal of the American
Statistical Association, 69, 364–367.
Brown, M. B., & Forsythe, A. B. (1974b). The ANOVA and multiple comparisons for data with heterogeneous
variances. Biometrics, 30, 719–724.
Bryant, J. L., & Paulson, A. S. (1976). An extension of Tukey’s method of multiple comparisons to experi-
mental designs with random concomitant variables. Biometrika, 63, 631–638.
Brysbaert, M. (2007). The language-as-fixed-effect fallacy: Some simple SPSS solutions to a complex problem.
Paper presented at the Second Training Meeting of the EU Marie Curie Research Training Network:
Language and Brain (Oviedo, Spain, April 2007).
Busemeyer, J. R., & Jones, L. E. (1983). Analysis of multiplicative combination rules when the causal variables
are measured with error. Psychological Bulletin, 93, 549–562.
Campbell, D. T., & Kenny, D. A. (1999). A primer on regression artifacts. New York: Guilford.
Campbell, D. T., & Stanley, J. C. (1963). Experimental and quasi-experimental designs for research. Boston:
Houghton-Mifflin.
Carlson, J. E., & Timm, N. H. (1974). Analysis of nonorthogonal fixed-effect designs. Psychological Bulletin,
81, 563–570.
Cheung, M. W.-L., & Chan, W. (2004). Testing dependent correlation coefficients via structural equation
modeling. Organizational Research Methods, 7, 206–222.
Clark, H. H. (1973). The language-as-fixed-effect fallacy: A critique of language statistics in psychological
research. Journal of Verbal Learning and Verbal Behavior, 12, 335–359.
Cleveland, W. S. (1979). Robust locally weighted regression and smoothing scatterplots. Journal of the American Statistical Association, 74, 829–836.
Cleveland, W. S., Devlin, S. J., & Grosse, E. H. (1988). Regression by local fitting: Methods, properties, and
computational algorithms. Journal of Econometrics, 37, 87–114.
Clinch, J. J., & Keselman, H. J. (1982). Parametric alternatives to the analysis of variance. Journal of Edu-
cational Statistics, 7, 207–214.
Cobb, J. A., & Hops, H. (1973). Effects of academic survival skill training on low achieving first graders. Journal
of Educational Research, 67, 108–113.
Cochran, W. G. (1941). The distribution of the largest of a set of estimated variances as a fraction of their total. Annals of Eugenics, 11, 47–52.
Cochran, W. G. (1950). The comparison of percentages in matched samples. Biometrika, 37, 256–266.
Cochran, W. G., & Cox, G. M. (1957). Experimental designs (2nd ed.). New York: Wiley.
Cohen, J. (1962). The statistical power of abnormal-social psychological research. Journal of Abnormal and
Social Psychology, 65, 145–153.
Cohen, J. (1973). Eta-squared and partial eta-squared in fixed factor ANOVA designs. Educational and Psycho-
logical Measurement, 33, 107–112.
Cohen, J. (1977). Statistical power analysis for the behavioral sciences (rev. ed.). New York: Academic Press.
Cohen, J. (1978). Partialled products are interactions and partialled powers are curve components. Psycho-
logical Bulletin, 85, 858–866.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
Cohen, J. (1990). Things I have learned (so far). American Psychologist, 45, 1304–1312.
Cohen, J., & Cohen, P. (1983). Applied multiple regression/correlation analysis for the behavioral sciences (2nd
ed.). Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied multiple regression/ correlation analysis for the
behavioral sciences (3rd ed.). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Conover, W. J., & Iman, R. L. (1981). Rank transformations as a bridge between parametric and nonparametric
statistics. American Statistician, 35, 124–129.
Cook, R. D. (1977). Detection of influential observations in linear regression. Technometrics, 19, 15–18.
Cook, R. D., & Weisberg, S. (1982). Residuals and influence in regression. New York: Chapman & Hall.
Cook, R. D., & Weisberg, S. (1999). Applied regression including computing and graphics. New York: Wiley.
Coombs, W. T., Algina, J., & Oltman, D. O. (1996). Univariate and multivariate omnibus hypothesis tests
selected to control Type I error rates when population variances are not necessarily equal. Review of
Educational Research, 66, 137–179.
Cramer, E. M., & Appelbaum, M. I. (1980). Nonorthogonal analysis of variance—once again. Psychological
Bulletin, 87, 51–57.
Crespi, L. P. (1944). Amount of reinforcement and level of performance. Psychological Review, 51, 341–357.
Cronbach, L. J., & Furby, L. (1970). How should we measure “change”—or should we? Psychological Bulletin,
74, 68–80.
Cumming, G., & Finch, S. (2001). A primer on the understanding, use, and calculation of confidence intervals
that are based on central and noncentral distributions. Educational and Psychological Measurement, 61,
532–574.
Dalton, S., & Overall, J. E. (1977). Nonrandom assignment in ANCOVA: The alternative ranks design. Journal
of Experimental Education, 46, 58–62.
Dar, R., Serlin, R. C., & Omer, H. (1994). Misuse of statistical tests in three decades of psychotherapy research.
Journal of Consulting and Clinical Psychology, 62, 75–82.
Davenport, J. M., & Webster, J. T. (1973). A comparison of some approximate F tests. Technometrics, 15, 779–
789.
Davidson, M. L. (1972). Univariate vs. multivariate tests in repeated-measures experiments. Psychological
Bulletin, 77, 446–452.
Dawes, R. M. (1971). A case study of graduate admissions: Application of three principles of human decision
making. American Psychologist, 26, 180–188.
DeCarlo, L. T. (1997). On the meaning and uses of kurtosis. Psychological Methods, 2, 292–307.
DeShon, R. P., & Alexander, R. A. (1996). Alternative procedures for testing regression slope homogeneity
when group error variances are unequal. Psychological Methods, 1, 261–277.
Donaldson, T. S. (1968). Robustness of the F test to errors of both kinds and the correlation between the
numerator and denominator of the F ratio. Journal of the American Statistical Association, 63, 660–676.
Downing, S. M., & Haladyna, T. M. (2006). Handbook of test development. Mahwah, NJ: Lawrence Erlbaum
Associates, Inc.
Draper, N. R., & Smith, H. (1998). Applied regression analysis (3rd ed.). New York: Wiley.
Duncan, D. B. (1955). Multiple range and multiple F tests. Biometrics, 11, 1–42.
Dunn, O. J. (1961). Multiple comparisons among means. Journal of the American Statistical Association, 56,
52–64.
Dunn, O. J., & Clark, V. A. (1969). Correlation coefficients measured on the same individuals. Journal of the
American Statistical Association, 64, 366–377.
Dunnett, C. W. (1955). A multiple comparison procedure for comparing several treatments with a control.
Journal of the American Statistical Association, 50, 1096–1121.
Dunnett, C. W. (1964). New tables for multiple comparisons with a control. Biometrics, 20, 482–491.
Dunnett, C. W. (1980). Pairwise multiple comparisons in the unequal variance case. Journal of the American
Statistical Association, 75, 796–800.
Edgell, S. E., & Noon, S. N. (1984). Effect of the violation of normality on the t test of the correlation
coefficient. Psychological Bulletin, 95, 576–583.
Efron, B. (1982). The jackknife, the bootstrap, and other resampling plans. Philadelphia: Society for Industrial
and Applied Mathematics.
Efron, B. (1988). Bootstrap confidence intervals: Good or bad? Psychological Bulletin, 104, 293–296.
Efron, B., & Diaconis, P. (1983). Computer-intensive methods in statistics. Scientific American, 248, 115–130.
Emerson, J. D., & Stoto, M. A. (1983). Transforming data. In D. C. Hoaglin, F. Mosteller, & J. F. Tukey (Eds.),
Understanding robust and exploratory data analysis (pp. 97–128). New York: Wiley.
Erdfelder, E., Faul, F., & Buchner, A. (1996). GPOWER: A general power analysis program. Behavior
Research Methods, Instruments, and Computers, 28, 1–11.
Faul, F., Erdfelder, E., Lang, A. G., & Buchner, A. (2007). G*Power 3: A flexible statistical power analysis
program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39, 175–191.
Feldt, L. S. (1958). A comparison of the precision of three experimental designs employing a concomitant
variable. Psychometrika, 23, 335–353.
Fenz, W., & Epstein, S. (1967). Gradients of physiological arousal of experienced and novice parachutists as a
function of an approaching jump. Psychosomatic Medicine, 29, 33–51.
Fienberg, S. E. (1977). The analysis of cross-classified data. Cambridge, MA: MIT Press.
Fisher, R. A. (1935). The design of experiments. Edinburgh, Scotland: Oliver & Boyd.
Fisher, R. A. (1952). Statistical methods for research workers (12th ed.). London: Oliver & Boyd.
Fitzsimons, G. J. (2008). Editorial: Death to dichotomizing. Journal of Consumer Research, 35, 5–8.
Fleiss, J. L. (1973). Statistical methods for rates and proportions. New York: Wiley.
Forster, K. I., & Dickinson, R. G. (1976). More on the language-as-fixed-effect fallacy: Monte Carlo estimates
of error rates for F1, F2, F′, and min F ′. Journal of Verbal Learning and Verbal Behavior, 15, 135–142.
Fox, J. (1997). Applied regression analysis, linear models, and related methods. Thousand Oaks, CA: Sage.
Fredrickson, B. L., & Kahneman, D. (1993). Duration neglect in retrospective evaluations of affective episodes.
Journal of Personality and Social Psychology, 65, 45–55.
Friedenreich, C. M., Courneya, K. S., & Bryant, H. E. (1998). The lifetime total physical activity questionnaire:
Development and reliability. Medicine and Science in Sports and Exercise, 30, 266–274.
Friedman, M. (1937). The use of ranks to avoid the assumption of normality implicit in the analysis of
variance. Journal of the American Statistical Association, 32, 675–701.
Games, P. A. (1973). Type IV errors revisited. Psychological Bulletin, 80, 304–307.
Games, P. A., & Howell, J. F. (1976). Pairwise multiple comparison procedures with unequal n’s and/or
variances: A Monte Carlo study. Journal of Educational Statistics, 1, 113–125.
Games, P. A., Keselman, H. J., & Clinch, J. J. (1979). Tests for homogeneity of variance in factorial designs.
Psychological Bulletin, 86, 978–984.
Games, P. A., Keselman, H. J., & Rogan, J. C. (1981). Simultaneous pairwise multiple comparison procedures
when sample sizes are unequal. Psychological Bulletin, 90, 594–598.
Ganzach, Y. (1997). Misleading interaction and curvilinear terms. Psychological Methods, 2, 235–247.
Gary, H. E., Jr. (1981). The effects of departures from circularity on Type I error rates and power for randomized
block factorial experimental designs. Unpublished doctoral dissertation, Baylor University, Waco, TX.
Gatsonis, C., & Sampson, A. R. (1989). Multiple correlation: Exact power and sample size calculations.
Psychological Bulletin, 106, 516–524.
Gibbons, J. D. (1993). Nonparametric statistics: An introduction. Newbury, CA: Sage.
Glass, G. V. (1976). Primary, secondary, and meta-analysis of research. Educational Researcher, 5, 3–8.
Goldstein, H. (1995). Multilevel statistical models. London: Edward Arnold.
Greenhouse, S. W., & Geisser, S. (1959). On methods in the analysis of profile data. Psychometrika, 24, 95–112.
Greenwald, A. G. (1993). Consequences of prejudice against the null hypothesis. In G. Keren & C. Lewis (Eds.),
A handbook for data analysis in the behavioral sciences: Methodological issues (pp. 419–448). Hillsdale, NJ:
Lawrence Erlbaum Associates, Inc.
Grissom, R. J. (2000). Heterogeneity of variance in clinical data. Journal of Consulting and Clinical Psychology,
68, 155–165.
Grissom, R. J., & Kim, J. J. (2001). Review of assumptions and problems in the appropriate conceptualization
of effect size. Psychological Methods, 6, 135–146.
Harlow, L. L. (1997). Significance testing introduction and overview. In L. L. Harlow, S. A. Mulaik, & J. H.
Steiger (Eds.), What if there were no significance tests? (pp. 1–17). Mahwah, NJ: Lawrence Erlbaum
Associates, Inc.
Harris, R. J. (1985). A primer of multivariate statistics (2nd ed.). New York: Academic Press.
Harris, R. J. (2001). A primer of multivariate statistics (3rd ed.). Mahwah, NJ: Lawrence Erlbaum Associates,
Inc.
Harter, H. L., Clemm, D. S., & Guthrie, E. H. (1959). The probability integrals of the range and of the Studen-
tized range (WADC Tech. Rep. 58–484). Wright-Patterson Air Force Base, Ohio: Wright Air Development
Center.
Hartley, H. O. (1950). The maximum F-ratio as a short-cut test for heterogeneity of variance. Biometrika, 37,
308–312.
Havlicek, L. L., & Peterson, N. L. (1977). Effect of the violations of assumptions upon significance levels for
the Pearson r. Psychological Bulletin, 84, 373–377.
Hays, W. L. (1981). Statistics (3rd ed.). New York: Holt, Rinehart & Winston.
Hays, W. L. (1988). Statistics (4th ed.). New York: Holt, Rinehart & Winston.
Hays, W. L. (1994). Statistics (5th ed.). New York: Holt, Rinehart & Winston.
Hayter, A. J. (1986). The maximum familywise error rate of Fisher’s least significant difference test. Journal of
the American Statistical Association, 81, 1000–1004.
Hedges, L. V. (1981). Distributional theory for Glass’s estimator of effect size and related estimators. Journal of
Educational Statistics, 6, 107–128.
Hedges, L. V., & Olkin, I. (1985). Statistical methods for meta-analysis. New York: Academic Press.
Herzberg, P. A. (1969). The parameters of cross validation. Psychometrika (Monograph supplement, No. 16).
Hill, M., & Dixon, W. J. (1982). Robustness in real life: A study of clinical laboratory data. Biometrics, 38, 377–
396.
Hildebrand, D. K. (1986). Statistical thinking for behavioral scientists. Boston: Duxbury.
Hoaglin, D. C., Mosteller, F., & Tukey, J. W. (1983). Understanding robust and exploratory data analysis. New
York: Wiley.
Hoaglin, D. C., Mosteller, F., & Tukey, J. W. (1985). Exploring data tables, trends and shapes. New York: Wiley.
Hoaglin, D. C., Mosteller, F., & Tukey, J. W. (1991). Fundamentals of exploratory analysis of variance. New
York: Wiley.
Hoaglin, D. C., & Welsch, R. (1978). The hat matrix in regression and ANOVA. American Statistician, 32, 17–
22.
Hochberg, Y. (1988). A sharper Bonferroni procedure for multiple tests of significance. Biometrika, 75, 800–
803.
Hocking, R. R. (1983). Developments in linear regression methodology: 1959–1982. Technometrics, 25, 219–
245.
Hoenig, J. M., & Heisey, D. M. (2001). The abuse of power: The pervasive fallacy of power calculations in data
analysis. American Statistician, 55, 19–24.
Hogg, R. V. (1974). Adaptive robust procedures: A partial review and some suggestions for future applications
and theory. Journal of the American Statistical Association, 69, 909–927.
Hogg, R. V. (1979). Statistical robustness: One view of its use in application today. American Statistician, 33,
108–115.
Hogg, R. V., Fisher, D. M., & Randles, R. K. (1975). A two-sample adaptive distribution-free test. Journal of
the American Statistical Association, 70, 656–667.
Hollander, M., & Wolfe, D. A. (1999). Nonparametric statistical methods (2nd ed.). Hoboken, NJ: Wiley.
Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6,
65–70.
Hommel, G. (1988). A stepwise rejective multiple test procedure based on a modified Bonferroni test.
Biometrika, 75, 383–386.
Hora, S. C., & Conover, W. J. (1984). The F statistic in the two-way layout with rank-score transformed data.
Journal of the American Statistical Association, 79, 668–673.
Hora, S. C., & Iman, R. L. (1988). Asymptotic relative efficiencies of the rank-transformation procedure in
randomized complete-block designs. Journal of the American Statistical Association, 83, 462–470.
Hosmer, D. W., & Lemeshow, S. (2001). Applied logistic regression (2nd ed.). New York: Wiley.
Hotelling, H. (1931). The generalization of Student’s ratio. Annals of Mathematical Statistics, 2, 360–378.
Hox, J. (2002). Multilevel analysis: Techniques and applications. Mahwah, NJ: Lawrence Erlbaum Associates,
Inc.
Hsu, T. C., & Feldt, L. S. (1969). The effect of limitations on the number of criterion score values on the
significance of the F test. American Educational Research Journal, 6, 515–527.
Huck, S. W., & McLean, R. A. (1975). Using a repeated-measures ANOVA to analyze the data from a pretest–
posttest design: A potentially confusing task. Psychological Bulletin, 82, 511–518.
Hudson, J. D., & Krutchkoff, R. C. (1968). A Monte Carlo investigation of the size and power of tests
employing Satterthwaite’s synthetic mean squares. Biometrika, 55, 431–433.
Huitema, B. E. (1980). The analysis of covariance and alternatives. New York: Wiley.
Hunka, S., & Leighton, J. (1997). Defining Johnson–Neyman regions of significance in the three-covariate
ANCOVA using Mathematica. Journal of Educational and Behavioral Statistics, 22, 361–387.
Hunter, J. E., & Schmidt, F. L. (1990). Methods of meta-analysis: Correcting error and bias in research studies.
Newbury Park, CA: Sage.
Huynh, H. (1982). A comparison of four approaches to robust regression. Psychological Bulletin, 92, 505–512.
Huynh, H., & Feldt, L. S. (1976). Estimation of the Box correction for degrees of freedom from sample data in
randomized block and split-plot designs. Journal of Educational Statistics, 1, 69–82.
Iman, R. L., & Conover, W. J. (1981). Rank transformations as a bridge between parametric and nonparametric
statistics. American Statistician, 35, 124–129.
Iman, R. L., Hora, S. C., & Conover, W. J. (1984). Comparison of asymptotically distribution-free procedures
for the analysis of complete blocks. Journal of the American Statistical Association, 79, 674–685.
Irwin, J. R., & McClelland, G. H. (2001). Misleading heuristics and moderated regression models. Journal of
Marketing Research, 38, 100–109.
ISSP (International Society of Sport Psychology). (1992). Physical activity and psychological benefits: A pos-
ition statement. The Sport Psychologist, 6, 199–203.
Jaccard, J., Turrisi, R., & Wan, C. K. (1990). Interaction effects in multiple regression. Newbury Park, CA: Sage.
James, G. S. (1951). The comparison of several groups of observations when the ratios of the population
variances are unknown. Biometrika, 38, 324–329.
James, G. S. (1954). Tests of linear hypotheses in univariate and multivariate analysis when the ratios of the
population variances are unknown. Biometrika, 41, 19–43.
Janky, D. G. (2000). Sometimes pooling for analysis of variance hypothesis tests: A review and study of a split-
plot model. American Statistician, 54, 269–279.
Jennings, E. (1988). Models for pretest–posttest data: Repeated-measures ANOVA revisited. Journal of
Educational Statistics, 13, 273–280.
Johnson, P. O., & Neyman, J. (1936). Tests of certain linear hypotheses and their application to some edu-
cational problems. Statistical Research Memoirs, 1, 57–93.
Kelley, K. (2005). The effects of nonnormal distributions on confidence intervals around the standardized
mean difference: Bootstrap and parametric confidence intervals. Educational and Psychological Measure-
ment, 65, 51–69.
Kepner, J. L., & Robinson, D. H. (1988). Nonparametric methods for detecting treatment effects in repeated-
measures designs. Journal of the American Statistical Association, 83, 456–461.
Keren, G. (1993). Between- or within-subjects design: A methodological dilemma. In G. Keren & C. Lewis
(Eds.), A handbook for data analysis in the behavioral sciences: Methodological issues (pp. 257–272).
Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
Keselman, H. J., Rogan, J. C., Mendoza, J. L., & Breen, L. J. (1980). Testing the validity conditions of repeated-
measures F tests. Psychological Bulletin, 87, 479–481.
Keselman, H. J., Wilcox, R. R., Othman, A. R., & Fradette, K. (2002). Trimming, transforming statistics, and
bootstrapping: Circumventing the biasing effects of heteroscedasticity and nonnormality. Journal of Mod-
ern Applied Statistical Methods, 1, 288–309.
Keuls, M. (1952). The use of the Studentized range in connection with an analysis of variance. Euphytica, 1,
112–122.
King, G. (1991). Truth is stranger than prediction, more questionable than inference. American Journal of
Political Science, 35, 1047–1053.
Kirk, R. E. (1995). Experimental design: Procedures for the behavioral sciences (3rd ed.). Belmont, CA: Brooks/
Cole.
Kirk, R. E. (1996). Practical significance: A concept whose time has come. Educational and Psychological
Measurement, 56, 746–759.
Koele, P. (1982). Calculating power in analysis of variance. Psychological Bulletin, 92, 513–516.
Kraemer, H. C. (2005). A simple effect size indicator for two-group comparisons? A comment on r_equivalent.
Psychological Methods, 10, 413–419.
Kraemer, H. C., & Thiemann, S. (1987). How many subjects? Statistical power analysis in research. Beverly Hills,
CA: Sage.
Kramer, C. Y. (1956). Extension of multiple range tests to group means with unequal numbers of replications.
Biometrics, 12, 307–310.
Kreft, I., & de Leeuw, J. (1998). Introducing multilevel modeling. London: Sage.
Kruskal, W. H., & Wallis, W. A. (1952). Use of ranks in one-criterion variance analysis. Journal of the American
Statistical Association, 47, 583–621.
Lee, W.-C., & Rodgers, J. L. (1998). Bootstrapping correlation coefficients using univariate and bivariate sam-
pling. Psychological Methods, 3, 91–103.
Lee, Y. S. (1972). Tables of the upper percentage points of the multiple correlation coefficient. Biometrika, 59,
175–189.
Lehmann, E. L. (1975). Nonparametrics. San Francisco: Holden-Day.
Lenth, R. V. (2001). Some practical guidelines for effective sample-size determination. American Statistician,
55, 187–193.
Levene, H. (1960). Robust tests for equality of variances. In I. Olkin (Ed.), Contributions to probability and
statistics: Essays in honor of Harold Hotelling (pp. 278–292). Stanford, CA: Stanford University Press.
Levine, D. W., & Dunlap, W. P. (1982). Power of the F test with skewed data: Should one transform or not?
Psychological Bulletin, 92, 272–280.
Lin, L., Halgin, R. P., Well, A. D., & Ockene, I. (2008). The relationship between depression and occupational,
household, and leisure-time physical activity. Journal of Clinical and Sport Psychology, 2, 95–107.
Lindquist, E. F. (1953). Design and analysis of experiments in education and psychology. Boston: Houghton-
Mifflin.
Linn, R. L., & Slinde, J. A. (1977). The determination of the significance of change between pre- and posttest-
ing periods. Review of Educational Research, 47, 121–150.
Little, R. J. A. (1988). Missing data in large surveys. Journal of Business and Economic Statistics, 6, 287–301.
Little, R. J. A., & Rubin, D. B. (1987). Statistical analysis with missing data. New York: Wiley.
Lix, L. M., & Keselman, H. J. (1998). To trim or not to trim: Tests of location equality under heteroscedasticity
and nonnormality. Educational and Psychological Measurement, 58, 409–429.
Lix, L. M., Keselman, J. C., & Keselman, H. J. (1996). Consequences of assumption violations revisited: A
quantitative review of alternatives to the one-way analysis of variance F test. Review of Educational
Research, 66, 579–620.
Lorch, R. F., & Myers, J. L. (1990). Regression analyses of repeated-measures data: A comparison of three
different methods. Journal of Experimental Psychology: Learning, Memory, and Cognition, 16, 149–157.
Luke, D. A. (2004). Multilevel modeling. Thousand Oaks, CA: Sage.
Lunney, G. H. (1970). Using analysis of variance with a dichotomous variable: An empirical study. Journal of
Educational Measurement, 7, 263–269.
Lutz, C., Well, A. D., & Novak, M. (2003). Stereotypic and self-injurious behavior in rhesus macaques: A
survey and retrospective analysis of environment and early experience. American Journal of Primatology,
60, 1–15.
MacCallum, R. C., & Mar, C. M. (1995). Distinguishing between moderator and quadratic effects in multiple
regression. Psychological Bulletin, 118, 405–421.
MacKinnon, D. P., Lockwood, C. M., Hoffman, J. M., West, S. G., & Sheets, V. (2002). A comparison of
models to test mediation and other intervening variable effects. Psychological Methods, 7, 83–104.
Marascuilo, L. A., & Levin, J. R. (1970). Appropriate post hoc comparisons for interaction and nested hypoth-
eses in analysis of variance designs: The elimination of type IV errors. American Educational Research
Journal, 7, 397–421.
Marascuilo, L. A., & Levin, J. R. (1972). Type IV errors and interactions. Psychological Bulletin, 78, 368–374.
Marascuilo, L. A., & Levin, J. R. (1973). Type IV errors and Games. Psychological Bulletin, 80, 308–309.
Martin, P., & Bateson, P. (2007). Measuring behaviour: An introductory guide. Cambridge: Cambridge Uni-
versity Press.
Massey, F. J. (1951). The Kolmogorov–Smirnov test for goodness of fit. Journal of the American Statistical
Association, 46, 68–78.
Mauchly, J. W. (1940). Significance test for sphericity of a normal n-variate distribution. Annals of Mathemat-
ical Statistics, 11, 204–209.
Maxwell, S. E. (1980). Pairwise multiple comparisons in repeated-measures designs. Journal of Educational
Statistics, 5, 269–287.
Maxwell, S. E. (2000). Sample size and multiple regression analysis. Psychological Methods, 5, 434–458.
Maxwell, S. E., & Bray, J. H. (1986). Robustness of the quasi F statistic to violations of sphericity. Psychological
Bulletin, 99, 416–421.
Maxwell, S. E., Camp, C. J., & Arvey, R. D. (1981). Measures of strength of association. Journal of Applied
Psychology, 66, 525–534.
Maxwell, S. E., & Delaney, H. D. (1993). Bivariate median splits and spurious statistical significance. Psycho-
logical Bulletin, 113, 181–190.
Maxwell, S. E., & Delaney, H. D. (2004). Designing experiments and analyzing data: A model comparison
perspective (2nd ed.). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Maxwell, S. E., Delaney, H. D., & Dill, C. A. (1984). Another look at ANCOVA versus blocking. Psychological
Bulletin, 95, 136–147.
Maxwell, S. E., & Howard, G. S. (1981). Change scores—necessarily anathema? Educational and Psychological
Measurement, 41, 747–756.
McGraw, K. O., & Wong, S. P. (1992). A common language effect size statistic. Psychological Bulletin, 111, 361–365.
Mead, R., Bancroft, T. A., & Han, C. (1975). Power of analysis of variance test procedures for incompletely
specified fixed models. Annals of Statistics, 3, 797–808.
Menard, S. (2002). Applied logistic regression (2nd ed.). Thousand Oaks, CA: Sage.
Meng, X.-L., Rosenthal, R., & Rubin, D. B. (1992). Comparing correlated correlation coefficients. Psycho-
logical Bulletin, 111, 172–175.
Merriam, P. A., Ockene, I. S., Hebert, J. R., Rosal, M. C., & Matthews, C. E. (1999). Seasonal variation of
blood cholesterol levels. Journal of Biological Rhythms, 14, 330–339.
Micceri, T. (1989). The unicorn, the normal curve, and other improbable creatures. Psychological Bulletin, 105,
156–166.
Miller, R. G. (1974). The jackknife—a review. Biometrika, 61, 1–17.
Miller, R. G., Jr. (1981). Simultaneous statistical inference (2nd ed.). New York: Springer-Verlag.
Mittlehammer, R. C., Judge, G. C., & Miller, D. J. (2000). Econometric foundations. Cambridge: Cambridge
University Press.
Morrison, D. F. (1990). Multivariate statistical methods (3rd ed.). New York: McGraw-Hill.
Morrison, D. F. (2004). Multivariate statistical methods. Pacific Grove, CA: Duxbury Press.
Morrow, L. M., & Young, J. (1997). A family literacy program connecting school and home: Effects on attitude,
motivation, and literacy achievement. Journal of Educational Psychology, 89, 736–742.
Murray, J. E., Yong, E., & Rhodes, G. (2000). Revisiting the perception of upside-down faces. Psychological
Science, 11, 492–496.
Myers, J. L. (1959). On the interaction of two scaled variables. Psychological Bulletin, 56, 385–391.
Myers, J. L. (1979). Fundamentals of experimental design (3rd ed.). Boston: Allyn & Bacon.
Myers, J. L., DiCecco, J. V., & Lorch, R. F. (1981). Group dynamics and individual performances: Pseudogroup
and quasi-F analyses. Journal of Personality and Social Psychology, 40, 86–98.
Myers, J. L., DiCecco, J. V., White, J. B., & Borden, V. M. (1982). Repeated measurements on dichotomous
variables: Q and F tests. Psychological Bulletin, 92, 517–525.
Myers, J. L., Hansen, R. S., Robson, R. R., & McCann, J. (1983). The role of explanation in learning elem-
entary probability. Journal of Educational Psychology, 75, 374–381.
Myers, J. L., Pezdek, K., & Coulson, D. (1973). Effects of prose organization on free recall. Journal of Edu-
cational Psychology, 65, 313–320.
Myers, J. L., & Well, A. D. (1995). Research design and statistical analysis. Hillsdale, NJ: Lawrence Erlbaum
Associates, Inc.
Myers, J. L., & Well, A. D. (2003). Research design and statistical analysis (2nd ed.). Mahwah, NJ: Lawrence
Erlbaum Associates, Inc.
Namboodiri, N. K. (1972). Experimental designs in which each subject is used repeatedly. Psychological Bul-
letin, 77, 54–64.
Neter, J., Kutner, M. H., Nachtsheim, C. J., & Wasserman, W. (1996). Applied linear statistical models (4th ed.).
Boston: WCB McGraw-Hill.
Newman, D. (1939). The distribution of range in samples from a normal population, expressed in terms of
independent estimate of a standard deviation. Biometrika, 31, 20–30.
Nunnally, J., & Bernstein, I. (1994). Psychometric theory (3rd ed.). New York: McGraw-Hill.
Ockene, I. S., Chiriboga, D. E., Stanek, E. J., Harmatz, M. G., Nicolosi, R., Saperia, G., et al. (2004). Seasonal
variation in serum cholesterol levels. Archives of Internal Medicine, 164, 863–870.
Olejnik, S., & Algina, J. (2000). Measures of effect size for comparative studies: Applications, interpretations,
and limitations. Contemporary Educational Psychology, 25, 241–286.
Olejnik, S., & Algina, J. (2003). Generalized eta and omega squared statistics: Measures of effect size for some
common research designs. Psychological Methods, 8, 434–447.
Olkin, I., & Finn, J. (1990). Testing correlated correlations. Psychological Bulletin, 108, 330–333.
Olkin, I., & Finn, J. (1995). Correlations redux. Psychological Bulletin, 118, 155–164.
Oshima, T. C., & Algina, J. (1992). Type I error rates for James’ second order test and Wilcox’s Hm test under
heteroscedasticity and non-normality. British Journal of Mathematical and Statistical Psychology, 45, 225–
263.
Overall, J. E., Lee, D. M., & Hornick, C. W. (1981). Comparison of two strategies for analysis of variance in
nonorthogonal designs. Psychological Bulletin, 90, 367–375.
Overall, J. E., & Spiegel, D. K. (1969). Concerning least squares analysis of experimental data. Psychological
Bulletin, 72, 311–322.
Overton, R. C. (2001). Moderated multiple regression for interactions involving categorical variables: A stat-
istical control for heterogeneous variance across two groups. Psychological Methods, 6, 218–233.
Pearson, E. S., & Hartley, H. O. (1954). Biometrika tables for statisticians. London: Cambridge University Press.
Pedhazur, E. J. (1997). Multiple regression in behavioral research: Explanation and prediction. Fort Worth, TX:
Harcourt Brace.
Peritz, E. (1970). A note on multiple comparisons. Unpublished manuscript, Hebrew University, Jerusalem,
Israel.
Perlmutter, J., & Myers, J. L. (1973). A comparison of two procedures for testing multiple contrasts. Psycho-
logical Bulletin, 79, 181–184.
Piccinelli, M., & Wilkinson, G. (2000). Gender differences in depression: A critical review. British Journal of
Psychiatry, 177, 486–492.
Pollatsek, A., & Well, A. D. (1995). On the use of counterbalanced designs in cognitive research: A suggestion
for a better and more powerful analysis. Journal of Experimental Psychology: Learning, Memory, and
Cognition, 21, 783–794.
Quené, H., & van den Bergh, H. (2004). On multi-level modeling of data from repeated-measures designs: A
tutorial. Speech Communication, 43, 103–121.
Raaijmakers, J. G. W., Schrijnemakers, J. M. C., & Gremmen, F. (1999). How to deal with “The language-as-
fixed-effect fallacy”: Common misconceptions and alternative solutions. Journal of Memory and Lan-
guage, 41, 416–426.
Rogosa, D. (1980). Comparing nonparallel regression lines. Psychological Bulletin, 88, 307–321.
Rogosa, D. (1995). Myths and methods: “Myths about longitudinal research,” plus supplemental questions. In
J. M. Gottman (Ed.), The analysis of change (pp. 3–66). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Räikkönen, K., Matthews, K. A., Flory, J. D., Owens, J. F., & Gump, B. B. (1999). Effects of optimism,
pessimism, and trait anxiety on ambulatory blood pressure and mood during everyday life. Journal of
Personality and Social Psychology, 76, 104–113.
Ramsey, P. H. (1978). Power differences between pairwise multiple comparisons. Journal of the American
Statistical Association, 73, 479–485.
Ramsey, P. H. (1981). Power of univariate pairwise multiple comparison procedures. Psychological Bulletin, 90,
352–366.
Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods.
Thousand Oaks, CA: Sage.
Raudenbush, S., Bryk, A., & Congdon, R. (2000). Hierarchical linear and nonlinear modeling. Chicago: Scien-
tific Software International.
Rencher, A. C., & Pun, F. C. (1982). Inflation of R-squared in best subset regression. Technometrics, 22, 49–54.
Robinson, W. S. (1950). Ecological correlations and the behavior of individuals. American Sociological Review,
15, 351–357.
Rodgers, J. L., & Nicewander, W. A. (1988). Thirteen ways to look at the correlation coefficient. American
Statistician, 42, 59–66.
Roediger, H. L., III, Meade, M. L., & Bergman, E. T. (2001). Social contagion of memory. Psychonomic
Bulletin and Review, 8, 365–371.
Rogan, J. C., Keselman, H. J., & Mendoza, J. L. (1979). Analysis of repeated measurements. British Journal of
Mathematical and Statistical Psychology, 32, 269–286.
Rom, D. M. (1990). A sequentially rejective test procedure based on a modified Bonferroni inequality.
Biometrika, 77, 663–665.
Rosenberger, J. L., & Gasko, M. (1983). Comparing location estimators: Trimmed means, medians, and trime-
ans. In D. C. Hoaglin, F. Mosteller, & J. F. Tukey (Eds.), Understanding robust and exploratory data analysis
(pp. 297–328). New York: Wiley.
Rosenthal, R. (1991). Meta-analytic procedures for social research (rev. ed.). Newbury Park, CA: Sage.
Rosenthal, R. (1995). Writing meta-analytic reviews. Psychological Bulletin, 118, 183–192.
Rosenthal, R., & Rubin, D. B. (1983). Ensemble-adjusted p values. Psychological Bulletin, 94, 540–541.
Rosenthal, R., & Rubin, D. B. (1994). The counternull value of an effect size: A new statistic. Psychological
Science, 5, 329–334.
Rosenthal, R., & Rubin, D. B. (2000). Contrasts and correlations in effect-size estimation. Psychological
Science, 11, 446–453.
Rosenthal, R., & Rubin, D. B. (2003). r_equivalent: A simple effect size indicator. Psychological Methods, 8, 492–496.
Rouanet, H., & Lepine, D. (1970). Comparisons between treatments in a repeated-measurement design:
ANOVA and multivariate methods. British Journal of Mathematical and Statistical Psychology, 23, 147–
163.
Rousseeuw, P. J., & Leroy, A. M. (1987). Robust regression and outlier detection. New York: Wiley.
Roy, S. N., & Bose, R. C. (1953). Simultaneous confidence interval estimation. Annals of Mathematical Statistics, 24, 513–536.
Royer, J. M., Tronsky, L. M., & Chan, Y. (1999). Math-fact retrieval as the cognitive mechanism underlying
gender differences in math test performance. Contemporary Educational Psychology, 24, 181–266.
Rozeboom, W. W. (1979). Ridge regression: Bonanza or beguilement? Psychological Bulletin, 86, 242–249.
Sandvik, L., & Olsson, B. (1982). A nearly distribution-free test for comparing dispersion in paired samples.
Biometrika, 69, 484–485.
Satterthwaite, F. E. (1946). An approximate distribution of variance components. Biometrics Bulletin, 2, 110–
114.
Schafer, J. L. (1997). Analysis of incomplete multivariate data. London: Chapman & Hall.
Schafer, J. L. (1999). Multiple imputation: A primer. Statistical Methods in Medical Research, 8, 3–15.
Scheffé, H. (1959). The analysis of variance. New York: Wiley.
Schnorr, J. A., Lipkin, S. G., & Myers, J. L. (1966). Level of risk in probability learning: Within- and between-
subjects designs. Journal of Experimental Psychology, 72, 497–500.
Schrier, A. M. (1958). Comparison of two methods of investigating the effect of amount of reward on per-
formance. Journal of Comparative and Physiological Psychology, 51, 725–731.
Seaman, M. A., Levin, J. R., & Serlin, R. C. (1991). New developments in pairwise multiple comparisons: Some
powerful and practicable procedures. Psychological Bulletin, 110, 577–586.
Shaffer, J. P. (1979). Comparison of means: An F test followed by a modified multiple range procedure. Journal
of Educational Statistics, 4, 14–23.
Shaffer, J. P. (1986). Modified sequentially rejective multiple test procedures. Journal of the American Statistical
Association, 81, 826–831.
Shaffer, J. P. (1995). Multiple hypothesis testing. Annual Review of Psychology, 46, 561–584.
Shapiro, S. S., & Wilk, M. B. (1965). An analysis of variance test for normality (complete samples). Biometrika,
52, 591–611.
Shepperd, J. A. (1991). Cautions in assessing spurious “moderator effects”. Psychological Bulletin, 110, 315–
317.
Šidák, Z. (1967). Rectangular confidence regions for the means of multivariate normal distributions. Journal of
the American Statistical Association, 62, 626–633.
Siegel, S., & Castellan, N. J. (1988). Nonparametric statistics for the behavioral sciences (2nd ed.). New York:
McGraw-Hill.
Singh, S., & Ernst, E. (2008). Trick or treatment: The undeniable facts about alternative medicine. New York:
Norton.
Smith, J. F. K. (1976). Data transformations in analysis of variance. Journal of Verbal Learning and Verbal
Behavior, 15, 339–346.
Smithson, M. (2001). Correct confidence intervals for various regression effect sizes and parameters: The
importance of noncentral distributions in computing intervals. Educational and Psychological Measure-
ment, 61, 605–632.
Steiger, J. H. (1979). Multicorr: A computer program for fast, accurate, small-sample tests of correlational
pattern hypotheses. Educational and Psychological Measurement, 39, 677–680.
Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87, 245–251.
Steiger, J. H., & Fouladi, R. T. (1992). R2: A computer program for interval estimation, power calculation, and hypothesis testing for the squared multiple correlation. Behavior Research Methods, Instruments, and Computers, 24, 581–582.
Steiger, J. H., & Fouladi, R. T. (1997). Noncentrality interval estimation and the evaluation of statistical models. In L. L. Harlow, S. A. Mulaik, & J. H. Steiger (Eds.), What if there were no significance tests? (pp. 221–257). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Stevens, J. (1986). Applied multivariate statistics for the social sciences. Hillsdale, NJ: Lawrence Erlbaum
Associates, Inc.
Stolberg, A. L. (2001). Understanding school- and community-based groups for children and adolescents.
PsycCRITIQUES, 46, 55–56.
Thompson, W. F., Schellenberg, E. G., & Husain, G. (2001). Arousal, mood, and the Mozart effect. Psycho-
logical Science, 12, 248–251.
Tomarken, A. J., & Serlin, R. C. (1986). Comparison of ANOVA alternatives under variance heterogeneity and
specific noncentrality structures. Psychological Bulletin, 99, 90–99.
Toothaker, L. E. (1993). Multiple comparison procedures. Newbury Park, CA: Sage.
Tukey, J. W. (1949). One degree of freedom for nonadditivity. Biometrics, 5, 232–242.
Tukey, J. W. (1955). A test for nonadditivity in the Latin square. Biometrics, 11, 111–113.
Tukey, J. W. (1953). The problem of multiple comparisons. Unpublished manuscript, Princeton University.
Tukey, J. W. (1969). Analyzing data: Sanctification or detective work? American Psychologist, 24, 83–91.
Tukey, J. W. (1977). Exploratory data analysis. Reading, MA: Addison-Wesley.
Tukey, J. W. (1991). The philosophy of multiple comparisons. Statistical Science, 6, 100–116.
Tukey, J. W., & McLaughlin, D. H. (1963). Less vulnerable confidence and significance procedures for location
based on a single sample: Trimming/Winsorization. I. Sankhyā: Indian Journal of Statistics, Series A, 25,
331–352.
Vargha, A., & Delaney, H. D. (1998). The Kruskal–Wallis test and stochastic homogeneity. Journal of Edu-
cational and Behavioral Statistics, 23, 170–192.
Velleman, P. F., & Welsch, R. E. (1981). Efficient computing of regression diagnostics. American Statistician, 35, 234–242.
Welch, B. L. (1938). The significance of the difference between two means when the population variances are unequal. Biometrika, 29, 350–362.
Welch, B. L. (1947). The generalization of Student’s problem when several different population variances are
involved. Biometrika, 34, 28–35.
Welch, B. L. (1951). On the comparison of several mean values: An alternative approach. Biometrika, 38, 330–
336.
Welsch, R. E. (1977). Stepwise multiple comparison procedures. Journal of the American Statistical Association,
72, 566–575.
Wherry, R. J. (1931). A new formula for predicting the shrinkage of the coefficient of multiple correlation.
Annals of Mathematical Statistics, 2, 440–457.
Wilcox, R. R. (1987). New designs in analysis of variance. Annual Review of Psychology, 38, 29–60.
Wilcox, R. R. (1989). Comparing the variances of dependent groups. Psychometrika, 54, 305–315.
Wilcox, R. R. (1997). Introduction to robust estimation and hypothesis testing. San Diego, CA: Academic Press.
Wilcoxon, F. (1949). Some rapid approximate statistical procedures. New York: American Cyanamid Company.
Wiley, J., & Voss, J. F. (1999). Constructing arguments from multiple sources: Tasks that promote understand-
ing and not just memory for texts. Journal of Educational Psychology, 91, 301–311.
Wilkinson, L. (1979). Tests of significance in stepwise regression. Psychological Bulletin, 86, 168–174.
Wilkinson, L. (1998). SYSTAT 8 statistics manual. Chicago, IL: SPSS, Inc.
Wilkinson, L., & Dallal, G. E. (1982). Tests of significance in forward selection regression with an F-to-enter
stopping rule. Technometrics, 24, 25–28.
Wilkinson, L., & the Task Force on Statistical Inference (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54, 594–604.
Williams, E. J. (1949). Experimental designs balanced for the estimation of residual effects of treatments.
Australian Journal of Scientific Research, Series A: Physical Sciences, 2, 149–168.
Witvliet, C. V., Ludwig, T. E., & Vander Laan, K. L. (2001). Granting forgiveness or harboring grudges:
Implications for emotion, physiology, and health. Psychological Science, 12, 117–123.
Yates, F. (1934). The analysis of multiple classifications with unequal numbers in the different classes. Journal of the American Statistical Association, 29, 51–66.
Yuen, K. K. (1974). The two-sample trimmed t for unequal population variances. Biometrika, 61, 165–170.
Zar, J. H. (1972). Significance testing of the Spearman rank correlation coefficient. Journal of the American
Statistical Association, 67, 578–580.
Zeaman, D. (1949). Response latency as a function of the amount of reinforcement. Journal of Experimental Psychology, 39, 466–483.
Zimmerman, D. W., & Zumbo, B. D. (1993). The relative power of parametric and nonparametric statistical
methods. In G. Keren & C. Lewis (Eds.), A handbook for data analysis in the behavioral sciences: Method-
ological issues (pp. 481–517). Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
Zimmerman, D. W., Zumbo, B. D., & Williams, R. H. (2003). Bias in estimation and hypothesis testing of
correlation. Psicológica, 24, 133–158.
Zou, G. Y. (2007). Toward using confidence intervals to compare correlations. Psychological Methods, 12,
399–413.
Zwick, R. (1993). Pairwise multiple comparison procedures for one-way analysis of variance designs. In
G. Keren & C. Lewis (Eds.), A handbook for data analysis in the behavioral sciences: Statistical issues
(pp. 43–71). Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.