
CS 5014
Research Methods in CS
Prof. Frakes

Research Designs

© W. Frakes 2003 1

Experimentation

Experiment - A procedure for determining the effect of one set of
variables on another.

Independent Variables (Treatment Variables) - The variables that are
under the control of the experimenter.

Dependent Variables - Variables affected by changes to the independent
variables.

Nuisance Variables - Undesired sources of variation in an experiment
that may affect the dependent variables.
© W. Frakes 2003 2

Scale of Experiments

Small Scale Experiments - involve a few subjects, usually working alone
on a relatively simple task that can be completed in a few hours or less.

Micro Model - A theory supported by small scale experiments.
- better internal validity
- worse external validity

Macro Model - A theory supported by large scale experiments.
- better external validity
- worse internal validity
© W. Frakes 2003 3

Threats to Experiments

Internal Validity - Are the observed differences in the dependent
variable caused by the independent variables?
* nuisance variables weaken internal validity

External Validity - Generalizability - do the observed results apply to
the population of interest?
© W. Frakes 2003 4

Research Design
• Any study needs a structure or plan which defines
- the number and type of variables to be studied
- the relationships among the variables
Such a plan is called a design.
• Experimental vs. non-experimental designs
- The distinction is based on the degree of control the experimenter has
over subjects and conditions. The most important difference is whether
or not we can randomly assign subjects to levels of the independent
variables.
- Many of the same principles apply to both experimental and
non-experimental designs.

© W. Frakes 2003 5

Research Design -
the plan, structure, and strategy of investigation
• Purposes
1. To provide answers to research problems.
2. To control variability.
• Research problems can be stated as hypotheses.
• Research design sets up the framework for adequate tests of the
relationships among variables.

© W. Frakes 2003 6

Research Design (Continued)
• Basic concepts (you should already know these):
- Variables
- Measurement - Measurement Error
- Reliability - Validity
- Control
- Randomization - random assignment of subjects to treatment conditions
- Confounding (Third) Variable
- Generalizability of Results

© W. Frakes 2003 7

Experimental Design Notation

X = exposure of a group to a treatment
O = observation or measurement
R = random assignment to a group

X's and O's in a given row are applied to the same subjects.
X's and O's vertical to one another are simultaneous.
- - - - separates groups not equated via random assignment.
© W. Frakes 2003 8

Threats to Internal Validity
1. History - Specific events occurring between the first and second
measurement in addition to the experimental variable.

O1 X O2
-----------
  History

If X is use of a new tool and O is a measure of productivity, history
might involve a strike, education, new mgmt practices, etc.

© W. Frakes 2003 9

Threats to Internal Validity
2. Maturation

Processes within respondents operating as a function of the passage of
time per se (not particular to specific events),
e.g. getting older, hungrier, more tired, etc.

We know that engineers' goals change as they get older. If the DV
depends on this, it may change merely because of subjects' aging.
e.g. abilities of subjects change, e.g. because of the classes they take.
© W. Frakes 2003 10

3. Testing

• The effects of taking a test upon the scores of a second testing.
• Knowing that one is being tested may affect performance.
- Hawthorne Effect

© W. Frakes 2003 11

4. Instrumentation

Changes in the calibration of a measuring instrument, or changes in the
observers or measurements.

O1              X   O2
Metrics Tool        Metrics Tool' (modified tool)
© W. Frakes 2003 12

5. Statistical Regression

- Operates where subjects have been selected on the basis of extreme
scores.
e.g. Select the best (or worst) programmers and then retest; scores will
tend towards the mean on the retest.
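A minimal simulation sketch of this effect, using hypothetical scores and assuming numpy is available:

# Sketch: regression to the mean when subjects are selected on extreme scores.
import numpy as np

rng = np.random.default_rng(0)
true_skill = rng.normal(50, 10, size=1000)        # stable ability per programmer
test1 = true_skill + rng.normal(0, 5, size=1000)  # observed score = ability + noise
test2 = true_skill + rng.normal(0, 5, size=1000)  # independent noise on the retest

best = test1 >= np.percentile(test1, 90)          # select the "best" on test 1
print("selected group, test 1 mean:", test1[best].mean())
print("selected group, test 2 mean:", test2[best].mean())  # drifts toward the overall mean
print("overall mean:", test1.mean())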

© W. Frakes 2003 13

6. Differential Selection Bias

Biases resulting in differential selection of respondents for the
comparison groups (e.g. non-random selection).

© W. Frakes 2003 14

7. Experimental Mortality

Differential loss of respondents from the comparison groups.

© W. Frakes 2003 15

9. Reactive or Interaction Effect of Testing

- In which a pretest changes the respondents' sensitivity to the
experimental variables and thus makes the results obtained on the
pretested subjects unrepresentative of the population.

© W. Frakes 2003 16

12. Multiple Treatment Interference

Likely to occur wherever multiple treatments are applied to the same
respondents, because the effects of prior treatments are not usually
erasable.

O1 X1 X2 X3 O2

© W. Frakes 2003 17

Pre-Experimental Designs
Quasi-experiments
(Campbell and Stanley)

- not scientifically valid

- may be used as pilot studies

© W. Frakes 2003 18

One Shot Case Study

X O

• Implicit comparison with a baseline situation
e.g. Started using C++ and measured faults/KNCSL sometime afterwards.
• Very common in software engineering
• Very weak design

© W. Frakes 2003 19

One Shot Case Study Example

X O

e.g.
X = use Ada

O = person-months of effort

© W. Frakes 2003 20

One Group Pretest-Posttest Design

O1 X O2
O1, O2: Productivity measure    X: Use of C++

Measured, started using C++, measured again.

© W. Frakes 2003 21

One Group Pretest-Posttest Design

O1 X O2

X = learn Ada
O1 = time to solve problems in another language (e.g. C)
O2 = time to solve problems in Ada

Does not control for history, maturation, etc.
© W. Frakes 2003 22

Static Group Comparison

X O1
-------
   O2

• e.g. Comparison of companies that use C++ with those that do not.

• Treatment groups don't have random assignment, so all sorts of
selection biases can enter.

• You can't fix this with subject matching, e.g. matching on certain
characteristics such as programming experience.
© W. Frakes 2003 23

The Time Series Experiment

O1 O2 O3 O4 X O5 O6 O7 O8

- Used a lot in 19th-century biological and physical experimentation.
e.g. If I have a bar of iron (unchanged in weight for many months) and
then dip it in nitric acid, the loss in weight of the iron bar would
follow this experimental logic.
The logic is that a discontinuity in the measurement series will be
caused by the treatment X.
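A minimal sketch of checking for such a discontinuity, with hypothetical measurements and assuming scipy is available (a proper interrupted time-series analysis would also account for trend and autocorrelation):

# Sketch: compare the observations before and after the treatment X.
from scipy import stats

pre  = [12.1, 11.8, 12.3, 12.0]   # O1..O4, before X
post = [ 9.2,  9.5,  9.1,  9.4]   # O5..O8, after X

t, p = stats.ttest_ind(pre, post)
print(f"t = {t:.2f}, p = {p:.4f}")  # a small p suggests a shift at X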

© W. Frakes 2003 24

The Time Series Experiment - Example

X = Introduce higher production norms for code

[Plot: Faults/KNCSL over Time]
© W. Frakes 2003 25

Non-Equivalent Control Group Design

O X O
------------
O     O

* Similar in structure to design 4, but without random assignment.
The experimental and control groups do not have pre-experimental
sampling equivalence; rather, the groups are naturally occurring
collectives (e.g. existing sites or teams).
e.g. A company has two programming sites (e.g. Palo Alto and Fairfax).
A new tool is introduced in Palo Alto. Both sites are measured on the
same DV, before and after the treatment.

© W. Frakes 2003 26

Correlational Design
• Correlational Design - Purely observational - The investigator does
not intervene in any way, or expose subjects to a manipulation.
- Rather, measures are taken on something and relationships are
determined among the measures.
- These measures can be taken by
- Direct observation
- Questionnaires
- Existing records

© W. Frakes 2003 27

Cross Sectional Design

• Cross Sectional Design - All measurements are taken at one point in
time.
The experimental design is:
O
where O represents all observations on all variables.
+ Attractive because of low expense, simplicity, ease of administration
+ Useful for determining if 2 or more variables are related
- use a correlation coefficient for 2 vars (see the sketch below)
- may want to use factor analysis to reduce the number of vars
- may want to use regression analysis for > 2 vars
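A minimal sketch of the two-variable case, with hypothetical measures and assuming scipy is available:

# Sketch: correlation between two variables measured at one point in time.
from scipy import stats

experience_years = [1, 3, 5, 7, 10, 12, 15]
faults_per_kncsl = [9.0, 7.5, 6.1, 6.4, 5.0, 4.2, 3.9]

r, p = stats.pearsonr(experience_years, faults_per_kncsl)
print(f"r = {r:.2f}, p = {p:.4f}")  # association only; correlation is not causation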

© W. Frakes 2003 28

Cross Sectional Design (Cont'd)
• One way to strengthen the claim of cross sectional data to causality
would be to retake the same measure at a later point in time.
Such a design would be represented by
O O'
where O is the causes measure and O' is the effects measure.

© W. Frakes 2003 29

Quasi-experiment Exercise
• Break into groups and design a quasi-experiment to test the
effect of the WWW on teaching
• Which biases does your design handle?
• Which biases does your design not handle?

© W. Frakes 2003 30

Random Assignment

• Crucial to a true experimental design.
• Random assignment turns biases into noise, and we can use statistics
to deal with this noise.
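A minimal sketch of random assignment, with hypothetical subject IDs, using only the Python standard library:

# Sketch: randomly assign subjects to treatment conditions.
import random

subjects = ["s01", "s02", "s03", "s04", "s05", "s06", "s07", "s08"]
random.shuffle(subjects)                      # randomize the order
groups = {"treatment": subjects[:4],          # first half -> treatment
          "control":   subjects[4:]}          # second half -> control
print(groups)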

© W. Frakes 2003 31

A True Experiment
Posttest-Only Design

R1   X1   O
R2   X2   O
R3   X3   O
 .    .   .
 .    .   .
Rn   Xn   O
© W. Frakes 2003 32

True Experimental Designs
Pretest-Posttest Control Group Design

R O1 X O2
R O3   O4

Eight programmers are randomly assigned to one of the two groups. One
group uses a coverage analyzer; the control group does not. The DV is
the number of faults discovered.
How might you statistically analyze these data?
Take gain scores for each group,
O2 - O1
O4 - O3
and do a t-test (or non-parametric equivalent) on them.
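A minimal sketch of this gain-score analysis, with hypothetical fault counts and assuming scipy is available:

# Sketch: gain scores per group, then an independent-samples t-test.
from scipy import stats

# (pretest, posttest) faults discovered per programmer
treatment = [(3, 9), (4, 11), (2, 8), (5, 12)]    # used the coverage analyzer
control   = [(3, 5), (4, 7), (2, 4), (5, 6)]      # did not

gain_t = [post - pre for pre, post in treatment]  # O2 - O1
gain_c = [post - pre for pre, post in control]    # O4 - O3

t, p = stats.ttest_ind(gain_t, gain_c)
print(f"t = {t:.2f}, p = {p:.4f}")
# Non-parametric alternative: stats.mannwhitneyu(gain_t, gain_c)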

© W. Frakes 2003 33

Experiment Example

Given a certain design, implement in PL-1, C, and Ada.

     Treatment   Measure
R    Ada         LOC
R    PL-1        LOC
R    C           LOC

© W. Frakes 2003 34

Solomon 4 Group Design

R O1 X O2
R O3   O4
R    X O5
R      O6
• Allows us to estimate external validity factors. Design 4 (O1-O4) is
paralleled with experimental and control groups lacking the pretest.
• This allows the effect of testing and the interaction of testing and X
to be determined.
• The effect of X is replicated in 4 ways:
O2 > O1, O2 > O4, O5 > O6, O5 > O3

© W. Frakes 2003 35

Solomon 4 Group Design - Analysis

- Can be analyzed with a 2x2 ANOVA design (see the sketch below):

              No X    X
Pretested      O4     O2
No Pretest     O6     O5

Column means give the main effects of X; row means give the main
effects of pretesting.
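A minimal sketch of this 2x2 ANOVA on the posttest scores (O2, O4, O5, O6), with hypothetical scores and assuming pandas and statsmodels are available:

# Sketch: factors are treatment (X / no X) and pretesting (yes / no).
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

data = pd.DataFrame({
    "score":     [12, 13, 11, 9, 8, 9, 11, 12, 12, 8, 7, 9],
    "treated":   ["X", "X", "X", "noX", "noX", "noX",
                  "X", "X", "X", "noX", "noX", "noX"],
    "pretested": ["yes"] * 6 + ["no"] * 6,
})

model = ols("score ~ C(treated) * C(pretested)", data=data).fit()
print(sm.stats.anova_lm(model, typ=2))  # main effects of X and pretesting, plus interaction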

© W. Frakes 2003 36

Posttest-Only Control Group Design

R X O1
R   O2
• We allow the randomization to take care of the equivalence of the
groups before the treatment.
• Controls for testing as a main effect, but does not measure it.
Statistical tests:
- t-test (or non-parametric equivalent)
- ANOVA - 2 group

© W. Frakes 2003 37

Factorial Design
• Most real experiments involve several IVs and are meant to determine
their combined effect on the DV.
e.g. 2x2 Factorial Design
2 IVs with 2 levels each; DV = Faults/NCSL

          Real Time    MIS
   C
   C++

With this design you examine 2 main effects, C vs C++ and Real Time vs
MIS, and also an interaction of the 2 IVs.
This can be analyzed with ANOVA (see the sketch below).
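A minimal sketch computing the cell means, the two main effects, and the interaction by hand, with hypothetical faults/NCSL values (a full analysis would use a 2x2 ANOVA as above):

# Sketch: 2x2 factorial, IVs = language (C vs C++) and application type (Real Time vs MIS).
from statistics import mean

cells = {  # (language, domain) -> observed faults/NCSL per project
    ("C",   "RealTime"): [6.1, 5.8, 6.4],
    ("C",   "MIS"):      [4.9, 5.2, 5.0],
    ("C++", "RealTime"): [4.2, 4.5, 4.0],
    ("C++", "MIS"):      [4.1, 4.3, 3.9],
}
m = {k: mean(v) for k, v in cells.items()}

lang_effect = (mean([m[("C", d)] for d in ("RealTime", "MIS")])
               - mean([m[("C++", d)] for d in ("RealTime", "MIS")]))
domain_effect = (mean([m[(lang, "RealTime")] for lang in ("C", "C++")])
                 - mean([m[(lang, "MIS")] for lang in ("C", "C++")]))
interaction = ((m[("C", "RealTime")] - m[("C", "MIS")])
               - (m[("C++", "RealTime")] - m[("C++", "MIS")]))

print(f"main effect of language:  {lang_effect:.2f}")
print(f"main effect of app type:  {domain_effect:.2f}")
print(f"interaction:              {interaction:.2f}")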

© W. Frakes 2003 38

True Experiment Exercise
• Break into groups and design a true experiment to test the
effect of the WWW on teaching
• Which biases does your design handle?
• Which biases does your design not handle?

© W. Frakes 2003 39
