
CS 5014
Research Methods in CS
Prof. Frakes

Research Designs

© W. Frakes 2003 1

Experimentation

Experiment - A procedure for determining the effect of one set of
variables on another.

Independent Variables (Treatment Variables) - The variables that are
under the control of the experimenter.

Dependent Variables - Variables affected by changes to the independent
variables.

Nuisance Variables - Undesired sources of variation in an experiment
that may affect the dependent variables.
© W. Frakes 2003 2

Scale of Experiments

Small Scale Experiments - involve a few subjects, usually working alone
on a relatively simple task that can be completed in a few hours or less.

Micro Model - A theory supported by small scale experiments.
- better internal validity
- worse external validity

Macro Model - A theory supported by large scale experiments.
- better external validity
- worse internal validity
© W. Frakes 2003 3

Threats to Experiments

Internal Validity - Are the observed differences in the dependent
variable caused by the independent variables?
* nuisance variables weaken internal validity

External Validity - Generalizability - do the observed results apply to
the population of interest?
© W. Frakes 2003 4

Research Design
• Any study needs a structure or plan which defines
- the number and type of variables to be studied
- the relationships among the variables
Such a plan is called a design.
• Experimental vs. non-experimental designs
- The distinction is based on the degree of control the experimenter has
over subjects and conditions. The most important difference is whether
or not we can randomly assign subjects to levels of the independent
variables.
- Many of the same principles apply to both experimental and
non-experimental designs.

© W. Frakes 2003 5

Research Design -
the plan, structure, and strategy of investigation
• Purposes
1. To provide answers to research problems.
2. To control variability.
• Research problems can be stated as hypotheses.
• Research design sets up the framework for adequate tests of the
relationships among variables.

© W. Frakes 2003 6

Research Design (Continued)
• Basic concepts (you should already know these):
- Variables
- Measurement - Measurement Error
- Reliability - Validity
- Control
- Randomization - random assignment of subjects to treatment conditions
- Confounding (Third) Variable
- Generalizability of Results

© W. Frakes 2003 7

Experimental Design Notation

X = exposure of a group to a treatment
O = observation or measurement
R = random assignment to a group

X's and O's in a given row are applied to the same subjects.
X's and O's vertical to one another are simultaneous.
- - - - separates groups not equated via random assignment.
© W. Frakes 2003 8

Threats to Internal Validity
1. History - Specific events occurring between the first and second
measurement in addition to the experimental variable.

O1 X O2
-----------
  History

If X is use of a new tool and O is a measure of productivity, history
might involve a strike, education, new mgmt practices, etc.

© W. Frakes 2003 9

Threats to Internal Validity
2. Maturation

Processes within respondents operating as a function of the passage of
time per se (not particular to specific events),
e.g. getting older, hungrier, more tired, etc.

We know that engineers' goals change as they get older. If the DV
depends on this, it may change merely because of subjects' aging.
e.g. abilities of subjects change, e.g. because of the classes they take.
© W. Frakes 2003 10

3. Testing

• The effects of taking a test upon the scores of a second testing.
• Knowing that one is being tested may affect performance.
- Hawthorne Effect

© W. Frakes 2003 11

4. Instrumentation

Changes in the calibration of a measuring instrument, or changes in the
observers or measurements.

O1              X   O2
Metrics Tool        Metrics Tool' (modified tool)
© W. Frakes 2003 12

5. Statistical Regression

- Operates where subjects have been selected on the basis of extreme
scores.
e.g. Select the best (or worst) programmers and then retest; scores will
tend towards the mean on the retest.
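A minimal simulation sketch of this effect, using hypothetical scores and assuming numpy is available:

# Sketch: regression to the mean when subjects are selected on extreme scores.
import numpy as np

rng = np.random.default_rng(0)
true_skill = rng.normal(50, 10, size=1000)        # stable ability per programmer
test1 = true_skill + rng.normal(0, 5, size=1000)  # observed score = ability + noise
test2 = true_skill + rng.normal(0, 5, size=1000)  # independent noise on the retest

best = test1 >= np.percentile(test1, 90)          # select the "best" on test 1
print("selected group, test 1 mean:", test1[best].mean())
print("selected group, test 2 mean:", test2[best].mean())  # drifts toward the overall mean
print("overall mean:", test1.mean())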

© W. Frakes 2003 13

6. Differential Selection Bias

Biases resulting in differential selection of respondents for the
comparison groups (e.g. non-random selection).

© W. Frakes 2003 14

7. Experimental Mortality

Differential loss of respondents from the comparison groups.

© W. Frakes 2003 15

9. Reactive or Interaction Effect of Testing

- In which a pretest changes the respondents' sensitivity to the
experimental variables and thus makes the results obtained on the
pretested subjects unrepresentative of the population.

© W. Frakes 2003 16

12. Multiple Treatment Interference

Likely to occur wherever multiple treatments are applied to the same
respondents, because the effects of prior treatments are not usually
erasable.

O1 X1 X2 X3 O2

© W. Frakes 2003 17

Pre-Experimental Designs
Quasi-experiments
(Campbell and Stanley)

- not scientifically valid

- may be used as pilot studies

© W. Frakes 2003 18

One Shot Case Study

X O

• Implicit comparison with a baseline situation
e.g. Started using C++ and measured faults/KNCSL sometime afterwards.
• Very common in software engineering
• Very weak design

© W. Frakes 2003 19

One Shot Case Study Example

X O

e.g.
X = use Ada

O = person-months of effort

© W. Frakes 2003 20

One Group Pretest-Posttest Design

O1 X O2
O1, O2: Productivity measure    X: Use of C++

Measured, started using C++, measured again.

© W. Frakes 2003 21

One Group Pretest-Posttest Design

O1 X O2

X = learn Ada
O1 = time to solve problems in another language (e.g. C)
O2 = time to solve problems in Ada

Does not control for history, maturation, etc.
© W. Frakes 2003 22

Static Group Comparison

X O1
-------
   O2

• e.g. Comparison of companies that use C++ with those that do not.

• Treatment groups don't have random assignment, so all sorts of
selection biases can enter.

• You can't fix this with subject matching, e.g. matching on certain
characteristics such as programming experience.
© W. Frakes 2003 23

The Time Series Experiment

O1 O2 O3 O4 X O5 O6 O7 O8

- Used a lot in 19th-century biological and physical experimentation.
e.g. If I have a bar of iron (unchanged in weight for many months) and
then dip it in nitric acid, the loss in weight of the iron bar would
follow this experimental logic.
The logic is that a discontinuity in the measurement series will be
caused by the treatment X.
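A minimal sketch of checking for such a discontinuity, with hypothetical measurements and assuming scipy is available (a proper interrupted time-series analysis would also account for trend and autocorrelation):

# Sketch: compare the observations before and after the treatment X.
from scipy import stats

pre  = [12.1, 11.8, 12.3, 12.0]   # O1..O4, before X
post = [ 9.2,  9.5,  9.1,  9.4]   # O5..O8, after X

t, p = stats.ttest_ind(pre, post)
print(f"t = {t:.2f}, p = {p:.4f}")  # a small p suggests a shift at X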

© W. Frakes 2003 24

The Time Series Experiment - Example

X = Introduce higher production norms for code

[Plot: Faults/KNCSL over Time]
© W. Frakes 2003 25

Non-Equivalent Control Group Design

O X O
------------
O     O

* Similar in structure to design 4, but without random assignment.
The experimental and control groups do not have pre-experimental
sampling equivalence; rather, the groups are naturally occurring
collectives (e.g. existing sites or teams).
e.g. A company has two programming sites (e.g. Palo Alto and Fairfax).
A new tool is introduced in Palo Alto. Both sites are measured on the
same DV, before and after the treatment.

© W. Frakes 2003 26

Correlational Design
• Correlational Design - Purely observational - The investigator does
not intervene in any way, or expose subjects to a manipulation.
- Rather, measures are taken on something and relationships are
determined among the measures.
- These measures can be taken by
- Direct observation
- Questionnaires
- Existing records

© W. Frakes 2003 27

Cross Sectional Design

• Cross Sectional Design - All measurements are taken at one point in
time.
The experimental design is:
O
where O represents all observations on all variables.
+ Attractive because of low expense, simplicity, ease of administration
+ Useful for determining if 2 or more variables are related
- use a correlation coefficient for 2 vars (see the sketch below)
- may want to use factor analysis to reduce the number of vars
- may want to use regression analysis for > 2 vars
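A minimal sketch of the two-variable case, with hypothetical measures and assuming scipy is available:

# Sketch: correlation between two variables measured at one point in time.
from scipy import stats

experience_years = [1, 3, 5, 7, 10, 12, 15]
faults_per_kncsl = [9.0, 7.5, 6.1, 6.4, 5.0, 4.2, 3.9]

r, p = stats.pearsonr(experience_years, faults_per_kncsl)
print(f"r = {r:.2f}, p = {p:.4f}")  # association only; correlation is not causation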

© W. Frakes 2003 28

Cross Sectional Design (Cont'd)
• One way to strengthen the claim of cross sectional data to causality
would be to retake the same measure at a later point in time.
Such a design would be represented by
O O'
where O is the causes measure and O' is the effects measure.

© W. Frakes 2003 29

Quasi-experiment Exercise
• Break into groups and design a quasi-experiment to test the
effect of the WWW on teaching
• Which biases does your design handle?
• Which biases does your design not handle?

© W. Frakes 2003 30

Random Assignment

• Crucial to a true experimental design.
• Random assignment turns biases into noise, and we can use statistics
to deal with this noise.
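A minimal sketch of random assignment, with hypothetical subject IDs, using only the Python standard library:

# Sketch: randomly assign subjects to treatment conditions.
import random

subjects = ["s01", "s02", "s03", "s04", "s05", "s06", "s07", "s08"]
random.shuffle(subjects)                      # randomize the order
groups = {"treatment": subjects[:4],          # first half -> treatment
          "control":   subjects[4:]}          # second half -> control
print(groups)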

© W. Frakes 2003 31

A True Experiment
Posttest-Only Design

R1   X1   O
R2   X2   O
R3   X3   O
 .    .   .
 .    .   .
Rn   Xn   O
© W. Frakes 2003 32

True Experimental Designs
Pretest-Posttest Control Group Design

R O1 X O2
R O3   O4

Eight programmers are randomly assigned to one of the two groups. One
group uses a coverage analyzer; the control group does not. The DV is
the number of faults discovered.
How might you statistically analyze these data?
Take gain scores for each group,
O2 - O1
O4 - O3
and do a t-test (or non-parametric equivalent) on them.
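A minimal sketch of this gain-score analysis, with hypothetical fault counts and assuming scipy is available:

# Sketch: gain scores per group, then an independent-samples t-test.
from scipy import stats

# (pretest, posttest) faults discovered per programmer
treatment = [(3, 9), (4, 11), (2, 8), (5, 12)]    # used the coverage analyzer
control   = [(3, 5), (4, 7), (2, 4), (5, 6)]      # did not

gain_t = [post - pre for pre, post in treatment]  # O2 - O1
gain_c = [post - pre for pre, post in control]    # O4 - O3

t, p = stats.ttest_ind(gain_t, gain_c)
print(f"t = {t:.2f}, p = {p:.4f}")
# Non-parametric alternative: stats.mannwhitneyu(gain_t, gain_c)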

© W. Frakes 2003 33

Experiment Example

Given a certain design, implement in PL-1, C, and Ada.

     Treatment   Measure
R    Ada         LOC
R    PL-1        LOC
R    C           LOC

© W. Frakes 2003 34

Solomon 4 Group Design

R O1 X O2
R O3   O4
R    X O5
R      O6
• Allows us to estimate external validity factors. Design 4 (O1-O4) is
paralleled with experimental and control groups lacking the pretest.
• This allows the effect of testing and the interaction of testing and X
to be determined.
• The effect of X is replicated in 4 ways:
O2 > O1, O2 > O4, O5 > O6, O5 > O3

© W. Frakes 2003 35

Solomon 4 Group Design - Analysis

- Can be analyzed with a 2x2 ANOVA design (see the sketch below):

              No X    X
Pretested      O4     O2
No Pretest     O6     O5

Column means give the main effects of X; row means give the main
effects of pretesting.
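A minimal sketch of this 2x2 ANOVA on the posttest scores (O2, O4, O5, O6), with hypothetical scores and assuming pandas and statsmodels are available:

# Sketch: factors are treatment (X / no X) and pretesting (yes / no).
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

data = pd.DataFrame({
    "score":     [12, 13, 11, 9, 8, 9, 11, 12, 12, 8, 7, 9],
    "treated":   ["X", "X", "X", "noX", "noX", "noX",
                  "X", "X", "X", "noX", "noX", "noX"],
    "pretested": ["yes"] * 6 + ["no"] * 6,
})

model = ols("score ~ C(treated) * C(pretested)", data=data).fit()
print(sm.stats.anova_lm(model, typ=2))  # main effects of X and pretesting, plus interaction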

© W. Frakes 2003 36

Posttest-Only Control Group Design

R X O1
R   O2
• We allow the randomization to take care of the equivalence of the
groups before the treatment.
• Controls for testing as a main effect, but does not measure it.
Statistical tests:
- t-test (or non-parametric equivalent)
- ANOVA - 2 group

© W. Frakes 2003 37

Factorial Design
• Most real experiments involve several IVs and are meant to determine
their combined effect on the DV.
e.g. 2x2 Factorial Design
2 IVs with 2 levels each; DV = Faults/NCSL

          Real Time    MIS
   C
   C++

With this design you examine 2 main effects, C vs C++ and Real Time vs
MIS, and also an interaction of the 2 IVs.
This can be analyzed with ANOVA (see the sketch below).
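A minimal sketch computing the cell means, the two main effects, and the interaction by hand, with hypothetical faults/NCSL values (a full analysis would use a 2x2 ANOVA as above):

# Sketch: 2x2 factorial, IVs = language (C vs C++) and application type (Real Time vs MIS).
from statistics import mean

cells = {  # (language, domain) -> observed faults/NCSL per project
    ("C",   "RealTime"): [6.1, 5.8, 6.4],
    ("C",   "MIS"):      [4.9, 5.2, 5.0],
    ("C++", "RealTime"): [4.2, 4.5, 4.0],
    ("C++", "MIS"):      [4.1, 4.3, 3.9],
}
m = {k: mean(v) for k, v in cells.items()}

lang_effect = (mean([m[("C", d)] for d in ("RealTime", "MIS")])
               - mean([m[("C++", d)] for d in ("RealTime", "MIS")]))
domain_effect = (mean([m[(lang, "RealTime")] for lang in ("C", "C++")])
                 - mean([m[(lang, "MIS")] for lang in ("C", "C++")]))
interaction = ((m[("C", "RealTime")] - m[("C", "MIS")])
               - (m[("C++", "RealTime")] - m[("C++", "MIS")]))

print(f"main effect of language:  {lang_effect:.2f}")
print(f"main effect of app type:  {domain_effect:.2f}")
print(f"interaction:              {interaction:.2f}")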

© W. Frakes 2003 38

True Experiment Exercise
• Break into groups and design a true experiment to test the
effect of the WWW on teaching
• Which biases does your design handle?
• Which biases does your design not handle?

© W. Frakes 2003 39
