Establishing The Internal and External Validity of Experimental Studies

MARION K. SLACK AND JOLAINE R. DRAUGALIS
Abstract: The information needed to determine the internal and external validity of an experimental study is discussed. Internal validity is the degree to which a study establishes the cause-and-effect relationship between the treatment and the observed outcome. Establishing the internal validity of a study is based on a logical process. For a research report, the logical framework is provided by the report's structure. The methods section describes what procedures were followed to minimize threats to internal validity, the results section reports the relevant data, and the discussion section assesses the influence of bias. Eight threats to internal validity have been defined: history, maturation, testing, instrumentation, regression, selection, experimental mortality, and an interaction of threats. A cognitive map may be used to guide investigators when addressing validity in a research report. The map is based on the premise that information in the report evolves from one section to the next to provide a complete logical description of each internal-validity problem. The map addresses experimental mortality, randomization, blinding, placebo effects, and adherence to the study protocol. Threats to internal validity may be a source of extraneous variance when the findings are not significant. External validity is addressed by delineating inclusion and exclusion criteria, describing subjects in terms of relevant variables, and assessing generalizability. By using a cognitive map, investigators reporting an experimental study can systematically address internal and external validity so that the effects of the treatment are accurately portrayed and generalization of the findings is appropriate.

Index terms: Clinical studies; Control, quality; Methodology; Research
Am J Health-Syst Pharm. 2001; 58:2173-84

MARION K. SLACK, PH.D., is Research Scientist/Teaching Associate and JOLAINE R. DRAUGALIS, PH.D., is Professor and Assistant Dean, College of Pharmacy, The University of Arizona, Tucson.
Address correspondence to Dr. Slack at the College of Pharmacy, The University of Arizona, P.O. Box 210207, Tucson, AZ 85721-0207 (slack@pharmacy.arizona.edu).
Presented at the ASHP Midyear Clinical Meeting, Orlando, FL, December 8, 1999.
This is article 204-000-01-012-H04 in the ASHP Continuing Education System; it qualifies for 1.0 hour of continuing-education credit. See page 2182 or http://ce.ashp.org for the learning objectives, test questions, and answer sheet.
Copyright 2001, American Society of Health-System Pharmacists, Inc. All rights reserved. 1079-2082/01/1102-2173$06.00.
Am J Health-Syst Pharm, Vol 58, Nov 15, 2001

The Primer section covers basic information in various fields of knowledge of interest to pharmacists who practice in health systems. Within the scope of the section are reviews of fundamental concepts in, for example, pharmacy, pharmaceutics, pharmacology, physiology, therapeutics, and health care technology. Also covered are topics somewhat out of the mainstream of pharmacy (e.g., advances in nondrug health care technology) but nevertheless of interest to practitioners.

The effects of investigational treatments are established by statistically testing the findings to determine if any differences are likely to be due to chance alone and by examining the study's design and execution to rule out alternative causes of the observed effects. The process of ruling out alternative causes is referred to as assessing or establishing internal validity. Internal validity is the degree to which a study establishes the cause-and-effect relationship between the treatment and the observed outcome; conversely, it refers to the degree to which the absence of a relationship implies the absence of cause.1,2 Internal validity is the sine qua non of research; without it, a study is meaningless.1 In a study that lacks internal validity, the results are probably attributable to a cause other than the treatment. Consequently, one could not expect to observe similar effects if the study was duplicated, nor could the results be generalized to similar populations.

Internal validity, as defined by Campbell and Stanley,1 is a logical rather than statistical issue. Statistical tests establish the likelihood that the study results are due to chance variation rather than to the treatment or some other cause. When the result is not likely attributable to chance (i.e., the value of p is 0.05 or less), then the design and execution of the study are assessed to judge whether the effect resulted from the treatment or from another factor. Even if the statistical test does not indicate significance (i.e., p is greater than 0.05), the design and execution of the study can still be assessed to determine if extraneous factors obscured the treatment's impact. To effectively establish that a treatment produced the outcome, the investigator must show that extraneous factors were unlikely to have influenced the results.

The types of extraneous factors that can influence the outcome of a study depend on the research design.1,3 That is, the extraneous factors that can affect the outcome of a true experimental study are different from those that can influence a study involving a pretest-posttest design and a single group of subjects. A true experimental design is one that has at least two independent, parallel groups; randomly assigns subjects to the groups; and assesses treatments prospectively.

Studies evaluating experimental research designs have shown that poor execution of specific study procedures can bias results. Schulz et al.4 examined the association between treatment effects and procedures such as allocation concealment, sequence generation, withdrawals (dropouts), and blinding. They reported that the effects of treatment were 30% greater in studies with inadequate allocation concealment than in studies with adequate concealment. Similar results were observed in studies that lacked appropriate blinding. Less bias was attributed to sequence-generation procedures or to dropouts.

This article delineates the methodological issues associated with experimental research designs, shows how they differ from those associated with other designs, and provides a cognitive map for investigators to use to ensure that they address pertinent methodological issues when reporting the results of an experimental study and for readers to use when determining if a report adequately addresses internal validity. In addition, we describe the implications of methodology for study outcomes when the results are not significantly different among study groups and discuss external validity, or generalizability.

Theoretical framework

The following discussion is based on the framework of Campbell and Stanley,1 in that specific threats to establishing a cause-and-effect relationship (i.e., internal validity) are associated with the particular research design and with how the study procedures are executed. Therefore, the investigator needs to know which threats to internal validity are associated with which research designs and the sources of bias associated with particular aspects of study execution.

We also follow Campbell and Stanley's contention that establishing the internal validity of a study or assessing bias is based on a logical process. Hence, the information needed to assess internal validity must be presented so that the reader has the critical information available in a logical sequence. For a research report, the logical framework is provided by the report's structure. The methods section describes how the study was designed and what procedures were followed to reduce or eliminate specific threats to internal validity. The results section reports the data relevant to establishing internal validity, and the discussion section provides the investigators' assessment of the influence of bias. For a specific threat to internal validity or source of bias, a logical thread of information should be readily identifiable that progresses from the methods section to the results section to the discussion section.

This approach differs from a scale or checklist method, in which the reader scores a research report for a number of methodological factors. Scales and checklists are usually concerned with overall study quality and have been criticized for not having a theoretical foundation.5 They lack logical order and list methodological factors that are not specific to the research design. For example, carry-over effects, a consideration in a crossover design, may be included on the list, even though the study has a randomized design with parallel groups. Also, checklists typically assess multiple types of validity; they contain items related to statistical conclusion validity and external validity, as well as items related to internal validity.6

Threats to internal validity

Internal validity is concerned with the rigor (and thus the degree of control) of the study design. The degree of control exerted over potential extraneous variables determines the level of internal validity. Controlling for potentially confounding variables minimizes the potential for an alternative explanation for treatment effects and provides more confidence that effects are due to the independent variable. Eight threats to internal validity have been defined: history, maturation, testing, instrumentation, regression, selection, experimental mortality, and an interaction of threats.1,2

History. History becomes a threat when other factors external to the subjects (in addition to the treatment variable) occur by virtue of the passage of time. For example, the reported effect of a year-long, institution-specific program to improve medical resident prescribing and order-writing practices may have been confounded by a self-directed continuing-education series on medication errors provided to residents by a pharmaceutical firm's medical education liaison.

Maturation. The maturation threat can operate when biological or psychological changes occur within subjects and these changes may account in part or in total for effects discerned in the study. For example, a reported decrease in emergency-room visits in a long-term study of pediatric patients with asthma may be due to outgrowing childhood asthma rather than to any treatment regimen imposed. Both history and maturation are more of a concern in longitudinal studies.

Testing. The testing threat may occur when changes in test scores occur not because of the intervention but rather because of repeated testing. This is of particular concern when researchers administer identical pretests and posttests. For example, a reported improvement in medical resident prescribing behaviors and order-writing practices in the study previously described may have been due to repeated administration of the same short quiz. That is, the residents simply learned to provide the right answers rather than truly achieving improved prescribing habits.

Instrumentation. When study results are due to changes in instrument calibration or observer changes rather than to a true treatment effect, the instrumentation threat is in operation. For example, in a communications course, evaluator 1 observes pharmacy students counsel a patient at week 3 of the semester, and evaluator 2 observes the students at the conclusion of the course. If the evaluators are dissimilar enough in their approach, perhaps because of lack of training, this difference may contribute to measurement error in trying to determine how much learning occurred over the semester.

Regression. The regression threat can occur when subjects have been selected on the basis of extreme scores, because extreme (low and high) scores in a distribution tend to move closer to the mean (i.e., regress) in repeated testing. For example, if a group of subjects was recruited on the basis of extremely high stress scores and an educational intervention was conducted, any postintervention improvement noted could be due partly, if not entirely, to regression rather than to the coping techniques presented in the educational program.

Differential selection. The selection threat is of utmost concern when subjects cannot be randomly assigned to treatment groups, particularly if groups are unequal in relevant variables before treatment intervention. For example, one obstetrics and gynecology clinic's patients receive a pharmacy-based educational intervention and another clinic's patients receive a mailed pamphlet; both methods are designed to encourage calcium supplementation. When the outcome is measured at the end of the study, it may be confounded by the fact that the groups were not equal with respect to relevant variables (e.g., age, currently provided educational materials, hysterectomy status, menopausal status) before the educational program was implemented.

Experimental mortality. Experimental mortality is also known as attrition, withdrawals, or dropouts and is problematic when there is a differential loss of subjects from comparison groups subsequent to randomization, resulting in unequal groups at the study's end. One example is a study designed to compare the effects of an intranasal corticosteroid spray with placebo in alleviating symptoms of allergic rhinitis. If subjects with the most severe symptoms preferentially dropped out of the active treatment group, the treatment may appear more effective than it really is.

Selection interactions. The final threat to internal validity is an interaction of the selection threat with any of the other threats. The selection interaction most commonly confronted involves maturation. The selection-maturation interaction concerns the differential assignment of subjects to groups in a way that relates to the subjects' maturation. For example, two groups of diabetic patients may have similar disease indicators at the start of a study, yet a treatment effect could result if a larger percentage of patients in whom an effect of maturation (e.g., progressive worsening of disease) is more prevalent are assigned to one group.7

The research design chosen (e.g., experimental, quasi-experimental, one-group pretest-posttest) and operational procedures used (e.g., randomization techniques, adherence standards) determine the level of confidence in the internal validity. Knowledge of the potential threats and the ability to discern to what degree they may be operating in a study enable one to better analyze the results.

Random assignment to parallel groups, the hallmark of an experimental study, effectively controls all threats to internal validity except experimental mortality. Differential selection is controlled because random assignment creates groups that are equivalent with respect to known and unknown variables so that differences in outcomes cannot be caused by differences among groups. Other threats, for example maturation, are ruled out by the presence of one or more parallel groups. Because maturation should occur equally in all the groups, any difference in response should be due to the treatment. No other research design can control for so many threats at once. This is why experimental studies are considered the standard of research design.

Cognitive map for establishing the effects of treatment

Cognitive maps are plans or procedures for completing a task or accomplishing a goal.8 A cognitive map provides a skeleton for directing the analytical process and guiding the
logic of the writing; it also provides rules for organizing the final product and facilitates systematic examination of issues.9 Such a tool is believed to be important to analytical thinking. We developed a cognitive map to guide investigators when addressing validity issues in a research report.

The cognitive map shown in Table 1 is based on the premise that each section of a research report provides specific information related to establishing the effects of a treatment and that the information evolves from one section to the next to provide a complete logical description of each internal-validity problem. In the table, the components proceed from left to right; information evolves from a description in the methods section of study procedures intended to prevent or limit design or methodological problems to a report in the results section of findings relevant to establishing internal validity to an assessment in the discussion section of the impact of any internal-validity problems on study outcomes. Throughout this part of the discussion, we assume that the findings were statistically significant, that is, that differences among groups are probably not due to chance variation. The introduction and conclusion sections do not provide direct information on internal validity and are not included.

Table 1.
Cognitive Map for Establishing Internal Validity of Experimental Studies
Information in Section of Research Report(a), by Internal-Validity Factor

Related to Study Design

Experimental mortality
  Methods section: Description of data analysis for study dropouts, or use of intention-to-treat analysis or appropriate statistical analysis
  Results section: Demographics and clinical outcomes tables: statistical tests used to compare baseline characteristics and dependent variable between groups consistent with intention-to-treat analysis, or analysis with and without data from dropouts
  Discussion section(b): Reasons for withdrawal reported. If intention-to-treat analysis not used, discusses impact of dropouts on data interpretation and dependent variable

Related to Study Procedures

Randomization
  Methods section: Description of randomization method, baseline data collected, and statistical analysis of baseline data
  Results section: Demographics table: statistically compares study groups in terms of relevant demographic data
  Discussion section(b): Differences between groups and their impact on results discussed

Blinding
  Methods section: Description of blinding procedures; if no blinding, discussion of methods used to prevent bias
  Results section: Effectiveness of blinding reported; if no blinding, data showing treatment equivalence (except with respect to independent variable) reported
  Discussion section(b): Issues related to blinding and their impact on results discussed

Placebo
  Methods section: Description of matching placebo, discussion of effects related to placebo
  Results section: Assessment of subjects' and providers' knowledge of treatment
  Discussion section(b): Issues related to placebo and their impact on results discussed

Adherence to protocol
  Methods section: Description of methods used to assess adherence and of adherence standards
  Results section: Protocol adherence for all treatment groups reported
  Discussion section(b): Compliance issues and their impact on findings discussed

(a) Introduction and conclusion sections are not included, since they do not provide direct information on establishing the effect of treatment.
(b) In general, threats to internal validity are not addressed in the discussion section if the methods and results sections establish that the threat is unlikely to play a role in the study.

From a practical perspective, the central issue in demonstrating internal validity and establishing the effects of a treatment is ensuring that the comparison groups (the treatment and control groups) are equivalent in all variables except the independent (treatment) variable. In other words, the groups are similar demographically and do not differ in severity and type of disease, prognosis, or comorbidities and in how they were handled during the study, except for the experimental treatment.

Experimental mortality. The first internal-validity factor listed in Table 1 is experimental mortality. To reiterate, experimental mortality involves any subject who has been enrolled in a study and randomly assigned to a group but not included in the analysis for any reason.10 Participants may be excluded from the analysis for a number of reasons, including ineligibility (subjects admitted to study because of clerical or diagnostic errors), nonadherence to the study protocol (by either subjects or researchers), poor or missing data, and competing events.

Because the value of random assignment is lost if subjects are dropped from the analysis (the groups can no longer be considered equivalent in terms of known and unknown factors), the preferred procedure for preventing bias is an intention-to-treat analysis, in which all subjects randomized are included in the analysis.3,11 Although the exact reasons for withdrawal from the study do not affect an intention-to-treat analysis, they may be informative for future studies or when using the treatment in practice.

In a simple intention-to-treat analysis, all subjects are retained in the denominator if the dependent variable is a proportion (e.g., the proportion of patients who improved) and the last obtained measurement is used for a continuous variable (e.g., blood pressure). Investigators should state whether they used intention-to-treat analysis in the methods section.

If an intention-to-treat analysis is not used, then the analysis that was used must be described and the investigators must verify that no bias was present as a result of withdrawals. If there was bias, the investigators must discuss its impact on the estimate of treatment effect. In general, establishing that withdrawals did not bias the findings is much more onerous than using an intention-to-treat analysis. The investigator must show that the analysis was not biased and that subjects did not withdraw differentially from the study groups.10 Although information on the relative number of dropouts from each group and the reasons for withdrawal may provide insight into the causes of experimental mortality, such information does not establish equivalence for unknown factors, nor does it rule out the possibility that dropouts are related to treatment. Thus, alternative methods of analysis are always less desirable, and the results more tentative, than if intention-to-treat had been used.

In the results section, investigators establish that an intention-to-treat analysis was indeed used by showing that the number of subjects randomized to study groups was the same as the number of subjects for whom baseline data and outcomes data were reported. The baseline demographic data and the outcomes data are typically presented in two separate tables. In such cases, the total number of subjects in each table should match the total number of patients randomized. For example, if the authors state that 309 women were enrolled in the study, the total number of patients in the demographics table must equal that in the outcomes table (n = 309).

The next four internal-validity factors listed in Table 1 are related to the implementation of study procedures. Procedures such as random assignment, double-blinding, using a placebo, and using protocols should prevent bias from influencing measures of the dependent (outcome) variable. However, they must be implemented correctly; carelessly executed procedures are common sources of bias. We now describe what information is needed to determine if the study procedures were implemented in a manner that did not introduce bias.

Randomization. Randomization is the first study procedure outlined in Table 1. Note that randomization, or random assignment, is a different process with a different objective than random selection. Random assignment uses a random process, such as a coin toss, a table of random numbers, or computer-generated random numbers to determine the type of treatment (e.g., drug or placebo) that each study participant receives. Random selection uses a random process to identify study participants from the population. Because random assignment is related to internal validity and random selection to external validity, the two procedures should not be confused.

Randomization is the best method available to produce study groups that are equivalent with respect to known and unknown variables.3,10 However, the randomization procedure must be executed in a manner that does not introduce bias into the study. Recommendations include identifying the method of randomization used, the method used to conceal the assignment schedule until recruitment is complete, who generated and who executed the allocation scheme, and relevant baseline data showing that the study groups are equivalent in terms of known variables.12

Concealing the allocation sequence from providers who enter subjects into a study appears particularly important. That is, the provider should not know which treatment the next subject would receive if admitted into the study. Concealment prevents bias from entering into the process of determining subject eligibility and assigning treatment. Studies without concealment of the allocation sequence find effects 30% larger than studies with concealment.4,13

Statistical tests are used to compare the baseline variables of all treatment groups. This establishes that the random-assignment procedure indeed resulted in groups that were similar for measured variables and that bias resulting from the randomization process was unlikely. Authors may report p values when comparing baseline variables among study groups; however, a p value indicates if the randomization was fair, not whether the groups were equivalent. Therefore, the prognostic strength of the variables and the magnitude of the difference also need to be considered.14 If the groups are not equivalent for all variables, the differences should be addressed in the discussion section and the impact of the differences on the reported outcomes judged. In one study, analysis of baseline characteristics revealed differences between groups in exposure to smoke, fat intake, and alcohol consumption.15 The investigators then used multivariate logistic regression to assess the impact of the unequal groups on the results of the study. The regression analysis supported their contention that differences between the groups were not responsible for the findings.

Blinding. If a study is blinded, the procedures used to blind patients and providers to treatment assignments should be described in the methods section, any data on the effectiveness of the blinding should be reported in the results section, and any relevant issues should be addressed in the discussion section. A study that evaluated physicians' interpretation of blinding found substantial variability between readers' interpretations and textbook definitions of the terms "single blind," "double blind," and "triple blind."16 Therefore, it was suggested that authors specifically state the blinding status of everyone involved in a study. Providing data on the effectiveness of blinding is particularly important if characteristics of the treatment allow subjects to identify whether they are receiving the drug or placebo. For example, in a study comparing zinc and placebo lozenges, the investigators asked subjects to guess their study assignment.17 They reported the findings and concluded that blinding had been effective.

In studies that are not blinded, the investigators must discuss the methods used to prevent bias. All relevant data must be presented if available, and the matter must be addressed in the discussion section (Table 1). The allocation sequence can be concealed even if the study is not blinded. That is, the person actually assigning the patient to a particular treatment does not know the order in which patients are to receive treatment, so bias from differential assignment (e.g., assigning sicker patients to the new treatment because the new treatment is believed to be better) need not occur even in a study that is not blinded. An effective method of concealing allocation is to require the person who actually assigns the patient to treatment groups to contact a research coordinator to obtain the assignment. That way, the person assigning the treatment does not have access to the allocation sequence.

Adverse effects from placebo administration. Closely related to blinding are adverse effects from placebo administration when placebos need to match certain characteristics (e.g., taste) of the test drug. In the study comparing the zinc lozenges with placebo, the placebo lozenges needed to be very similar to the zinc lozenges to maintain blinding.17 The study authors described the placebo in the methods section and addressed the issue of adverse effects from the placebo (which would make the zinc lozenges appear effective) in the discussion section.

Adherence to the protocol. Adherence to the study protocol, the final internal-validity factor described in Table 1, can have a major impact on the interpretation of the findings. Consider the extreme, hypothetical case in which a significant difference is found to favor the treatment but the subjects in the treatment group do not take any of the medication. The observed effect could not be caused by the treatment if no one took it. Hence, adherence information is important to establishing the effects of treatment and is considered an ethical imperative by some.18

Investigators need to be alert to all types of protocol violations. Both providers and patients may violate protocols. While the failure of patients to adhere to the protocol likely reduces the effect of the treatment, violations by providers and researchers may bias the study in either direction, depending on the particular violation.10 For example, if data (e.g., serum glucose concentrations after an insulin dose) are collected at times different from those specified in the protocol, patients may display a different response than if the data were collected when they should have been. In that case, the effect of the treatment would appear more or less powerful than it really is.

Like study withdrawals and experimental mortality, nonadherence does not occur randomly. Factors that affect adherence, such as severity of illness, level of education, and socioeconomic status, may be independently related to treatment outcomes, so that responses in the adherent group are biased and not representative of the entire sample. Like other factors that may affect internal validity, adherence to the protocol and the standards of adherence used in the study are described in the methods section. When describing the findings relevant to adherence in the results section, the results are presented by treatment group so that any between-group differences in adherence are readily apparent. The results should include adherence problems associated with protocol violations by providers or researchers. If differences were identified, then the implications for interpreting the study findings should be discussed.

Scientific misconduct. Yet another problem related to establishing the effects of treatment is scientific misconduct. Fabrication of data and manipulation of data (such as discarding data that do not support the hypothesis) result, of course, in a study that has no internal validity. The findings cannot be replicated by other investigators, nor can there be generalization.

Methodological problems and statistical significance

Although problems with internal validity are typically associated with studies reporting statistically significant differences, methodological problems may introduce extraneous variance into the study that obscures the real differences and produces findings that are not significant.19 (In the real world of research, controlling extraneous variance so that a real difference can be identified is the larger problem.) Below, we compare the implications of methodological problems both for results that are significant and for those that are not.

Experimental mortality. Experimental mortality may favor either
the treatment or the control group. If patients who are likely to improve anyway predominate in the treatment group through differential withdrawals and only the data from these patients are included in the analysis, then that group will appear to have better outcomes. If a similar scenario occurs for the control group, then the difference between the two groups may not appear to be significant. In addition, excessive withdrawals may reduce the sample size so that the power is no longer sufficient to detect a significant difference. In studies with high dropout rates in both treatment and control groups, both problems are likely operating, and the findings cannot be interpreted with any degree of confidence.

Randomization process. Significance may be affected by the randomization process if randomization results in unequal groups. Small groups (i.e., those with 100 or fewer subjects3) are especially vulnerable to unequal randomization effects. If the inequality favors the control group, then the difference between the groups may not be significant. In addition, bias may be introduced into the randomization procedure in certain circumstances. Bias may be a particular problem if the person interacting with the patient also makes the assignment and is not blinded to the allocation sequence. If the experimental treatment is seen as highly desirable or beneficial, the assignment may be biased so that the sickest patients are assigned to the treatment group. In that case, the control group may appear to have a better outcome.

Blinding. Lack of blinding can reduce the apparent effect of a treatment and result in statistically nonsignificant results. If subjects know that they are not receiving the treatment under study, they may make every effort to achieve the outcome anyway. Another potential problem arises when the control group is contaminated (i.e., receives at least some of the treatment). Studies comparing service options or general prevention programs may be particularly vulnerable to this problem because they cannot be blinded. For example, studies involving a reduction in smoking or a change in dietary habits often do not find differences between groups because the control group has adopted many of the behavioral changes that constituted the treatment.20 In contrast, significance might be spuriously increased if persons collecting outcomes data are aware of treatment assignment, since they may rate the outcome for the treatment group more favorably.

Adherence to the protocol. Poor adherence to treatment protocols by participants can reduce the treatment impact and lead to differences that are not significant. Inadequate compliance reduces the power of a study so that larger sample sizes are required to identify significant differences. In some cases, the sample size may need to be increased by 50% to counteract a 20% reduction in drug adherence.10 Treatment effects may also appear nonsignificant when subjects who are not likely to benefit from the therapy are included in the study. This again reduces the power of the trial so that a larger sample is required.

Nonadherence may also have implications for the applicability of the treatment: If subjects cannot adhere to the treatment regimen, then its usefulness is reduced. In a study of dietary fiber supplements for preventing colorectal adenomas, the authors discussed the possibility that subjects were unwilling to comply with the high-fiber regimen; the regimen may not have been a useful intervention.15

Establishing generalizability

When investigators think of generalizability, they typically think of extrapolating the results to other patient populations, depending on whether patients were selected for the study by means of random sampling techniques. Study results based on random samples are considered generalizable, while study results based on other methods of identifying patients are not. However, clinical studies rarely use random sampling techniques, because the identity of every eligible patient in the targeted population must be known at the beginning of the study for a random sample to be taken from it. Since clinicians cannot identify patients who will have a myocardial infarction, attempt suicide, or experience other clinical events that determine eligibility before the trial begins, random sampling of a population cannot be used. Also, random sampling does not guarantee generalizability. If the targeted population is a small subpopulation within a larger population, the results may not be generalizable to the larger population because it may not be adequately represented in the random sample. Other information is needed to establish generalizability.

Information for determining external validity is provided in the methods and results sections of a research report. In the methods section, inclusion and exclusion criteria help identify the population to which the results might apply. Additional information on generalizability is found in the data on demographic characteristics, diseases, and other characteristics of the study participants. By examining the characteristics of the study participants, readers can estimate if they are likely to obtain similar outcomes in their own patient population. For example, the results of a study that evaluates the efficacy of a specific treatment in elderly Caucasian men with coronary heart disease cannot be extrapolated to Hispanic women.

The report may include a statement describing the authors' assessment of the population to which the results can be generalized. For example, the authors of one study wrote, The study population . . . was representative of patients 75 years of age or
younger who were not receiving long-term aspirin treatment and who had not recently undergone angioplasty or bypass surgery.21 Alternatively, the population to which the results can (or cannot) be generalized may be described in the discussion of study limitations. For example, the investigators may state that the study was conducted in a primarily Hispanic population at a single practice site in the Southwest and that generalizability to other populations is unknown.

Steps in establishing internal and external validity

The three-step process shown in Table 2 can be used to assess the validity of a study's findings and determine if they are relevant to readers' practices. The first step in establishing validity is to assess the statistical conclusion. Only if the conclusion is valid is internal validity assessed; similarly, external validity is assessed only if internal validity is established. This is the decision process recommended by Campbell and Stanley1 and Cook and Campbell.2 If there is no significant difference among groups or the reader concludes the difference is not valid, there is no treatment effect and no cause-and-effect relationship to assess. One may want to examine threats to internal validity to determine if they may have introduced extraneous variance, but then the purpose of the assessment is no longer to determine if the findings are relevant to one's practice.

A similar logic exists with respect to external validity; if there is no internal validity, then there is no treatment effect to generalize. Hence, the question of generalizability becomes moot.

Discussion

The cognitive map presented offers a guide to addressing specific problems with the internal validity of experimental studies. This guide will help investigators structure the information required to establish a cause-and-effect relationship and will steer readers toward the same information as they assess validity. The clear delineation of specific threats to internal validity and of the relationships among sections differentiates the cognitive map from checklists and from more general structural approaches. Checklists are inventories of items that should be included in a research report.12,22 Typically, they include many items addressing a broad range of issues, only some of which are specifically related to internal or external validity. In addition, the relationships among sections and the role of information are not readily apparent with checklists. The cognitive map described closely resembles the structure suggested in the Consolidated Standards of Reporting Trials (CONSORT) statement, which does focus on the key pieces of information needed to evaluate internal and external validity.23,24 However, again, the cognitive map highlights the relationships and roles of information in the report, not just the content of each section.

Because checklists address a broad range of issues involved in reporting research, the cognitive map described should be seen as a supplement to checklists, not a replacement. Also, the map should be differentiated
Table 2. Steps for Assessing Validity of an Experimental Study (a)

Step | Assessment Process | Decision
1. Validity of statistical conclusion | Assess statistical significance (i.e., p value is ≤ 0.05 and statistical results are valid). | Difference is real and is not likely due to chance variation; proceed to next step. OR Difference is likely due to chance variation; stop here. (b)
2. Internal validity | Assess internal validity on basis of research design and operational procedures. | Difference is most likely due to the treatment; proceed to next step. OR Difference is probably due to the effects of confounding factors or bias; stop here.
3. External validity | Examine inclusion and exclusion criteria and characteristics of study participants. | Study participants are similar to patients the report reader sees; the treatment should be useful. OR Study participants are very different from patients the report reader sees; the treatment may or may not be useful.

(a) Use these steps when determining if research findings are applicable to a particular practice situation.
(b) Internal validity may be assessed if the purpose is to determine if threats to internal validity may be producing extraneous variance that has obscured the treatment effect.
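
The decision sequence in Table 2 can be sketched as a short function. This is only an illustration, not part of the published method: the class name, field names, and return strings are hypothetical, the three yes/no judgments stand in for what is in practice a careful reading of the report, and the 0.05 threshold is taken from the table with "at or below" assumed as the comparison.

```python
from dataclasses import dataclass


@dataclass
class StudyAssessment:
    """A reader's judgments about one study (all field names are hypothetical)."""
    p_value: float              # reported p value for the main comparison
    statistics_valid: bool      # reader judges the statistical results valid
    internally_valid: bool      # design and procedures rule out confounding and bias
    participants_similar: bool  # participants resemble the reader's own patients


def assess_validity(study: StudyAssessment, alpha: float = 0.05) -> str:
    """Walk the three steps in order, stopping at the first step that fails."""
    # Step 1: validity of the statistical conclusion.
    if not (study.statistics_valid and study.p_value <= alpha):
        return "Stop at step 1: no valid statistically significant difference."
    # Step 2: internal validity, assessed only if step 1 passed.
    if not study.internally_valid:
        return "Stop at step 2: difference is probably due to confounding or bias."
    # Step 3: external validity, assessed only if step 2 passed.
    if study.participants_similar:
        return "Step 3: participants are similar; the treatment should be useful."
    return "Step 3: participants are very different; the treatment may or may not be useful."


if __name__ == "__main__":
    example = StudyAssessment(p_value=0.03, statistics_valid=True,
                              internally_valid=True, participants_similar=False)
    print(assess_validity(example))
```

The early returns mirror the "stop here" decisions in the table: a later step is never evaluated unless every earlier step has passed.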

from checklists and scales used to assess the overall quality of a study. The purpose of checklists is to assess research that has been reported and not necessarily to assist investigators in structuring a report.25

The cognitive map is limited in that some knowledge of research design is required to adapt it to specific research situations. Also, while some aspects of the map, such as threats related to withdrawals, protocol adherence, and placebo use, can be adapted to other research designs, other designs have additional problems that must be considered.7 Knowledge of statistical techniques, such as multivariate logistic regression, may be necessary to adequately address some questions about internal validity.

The cognitive map should improve pharmacists' ability to effectively communicate their research findings. Pharmacists who have conducted high-quality research can more accurately represent study quality in their report. Improvement of study reporting is a need that has been recognized in both pharmacy and medicine,21,26 and structured approaches to writing are believed to help authors attend to essential details.27 Indeed, an evaluation of the impact of the CONSORT statement found that journal articles were more likely to include checklist items after journals began using it.28

Conclusion

By using a cognitive map, investigators reporting an experimental study can systematically address internal and external validity so that the effects of the treatment are accurately portrayed and generalization of the findings is appropriate.

References
1. Campbell DT, Stanley JC. Experimental and quasi-experimental designs for research. Boston: Houghton Mifflin; 1963.
2. Cook TD, Campbell DT. Quasi-experimentation: design and analysis issues for field settings. Boston: Houghton Mifflin; 1979.
3. Elwood M. Critical appraisal of epidemiological studies and clinical trials. 2nd ed. Oxford, England: Oxford Univ. Press; 1998.
4. Schulz KF, Chalmers I, Hayes RJ et al. Empirical evidence of bias: dimensions of methodological quality associated with estimates of treatment effects in controlled trials. JAMA. 1995; 273:408-12.
5. Moher D, Jadad AR, Nichol G et al. Assessing the quality of randomized controlled trials: an annotated bibliography of scales and checklists. Control Clin Trials. 1995; 16:62-73.
6. Chalmers TC, Smith H Jr, Blackburn B et al. A method for assessing the quality of a randomized control trial. Control Clin Trials. 1981; 2:31-49.
7. Harrison DL, Draugalis JR. Critically evaluating research methods: an introduction. Manag Care Med. 1996; 3:23-7.
8. Rabow J, Charness MA, Kipperman JK et al. Learning through discussion. Thousand Oaks, CA: Sage; 1994.
9. Rosenwasser D, Stephen J. Writing analytically. Fort Worth, TX: Harcourt College; 1997.
10. Friedman LM, Furberg CD, DeMets DL. Fundamentals of clinical trials. 3rd ed. St. Louis: Mosby; 1996:204-22.
11. Everitt BS, Pickles A. Statistical aspects of the design and analysis of clinical trials. London: Imperial College Press; 1999.
12. Standards of Reporting Trials Group. A proposal for structured reporting of randomized controlled trials. JAMA. 1994; 272:1926-31.
13. Schulz KF, Chalmers I, Grimes DA et al. Assessing the quality of randomization from reports of controlled trials published in obstetrics and gynecology journals. JAMA. 1994; 272:125-8.
14. Altman DG, Dore CJ. Randomisation and baseline comparisons in clinical trials. Lancet. 1990; 335:149-53.
15. Alberts DS, Martinez ME, Roe DJ et al. Lack of effect of a high-fiber cereal supplement on the recurrence of colorectal adenomas. N Engl J Med. 2000; 342:1156-62.
16. Devereaux PJ, Manns BJ, Ghali WA et al. Physician interpretations and textbook definitions of blinding terminology in randomized controlled trials. JAMA. 2001; 285:2000-3.
17. Mossad SB, Macknin ML, Medendorp SV et al. Zinc gluconate lozenges for treating the common cold. Ann Intern Med. 1996; 125:81-8.
18. Efron B. Foreword in special issue on analyzing non-compliance in clinical trials. Stat Med. 1998; 17:249-50.
19. Polk RE, Hepler CD. Controversies in antimicrobial therapy: critical analysis of clinical trials. Am J Hosp Pharm. 1986; 43:630-40.
20. Beresford SA, Curry SJ, Kristal AR et al. A dietary intervention in primary care practice: the eating patterns study. Am J Public Health. 1997; 87:610-6.
21. Theroux P, Ouimet H, McCans J et al. Aspirin, heparin, or both to treat acute unstable angina. N Engl J Med. 1988; 319:1105-11.
22. Asilomar Working Group. Checklist of information for inclusion in reports of clinical trials. Ann Intern Med. 1996; 124:741-3.
23. Begg C, Cho M, Eastwood S et al. Improving the quality of reporting of randomized controlled trials. JAMA. 1996; 276:637-9.
24. Moher D, Schulz KF, Altman D. The CONSORT statement: revised recommendations for improving the quality of reports of parallel-group randomised trials. JAMA. 2001; 285:1987-91.
25. Moher D, Jadad AR, Tugwell P. Assessing the quality of randomized controlled trials. Int J Technol Assess Health Care. 1996; 12:195-208.
26. Ferrill MJ, Norton LL, Blalock SJ. Determining the statistical knowledge of pharmacy practitioners: a survey and review of the literature. Am J Pharm Educ. 1999; 63:371-6.
27. Hartley J. From structured abstracts to structured articles: a modest proposal. J Tech Writ Commun. 1999; 29:255-70.
28. Moher D, Jones A, Lepage L. Use of the CONSORT statement and quality of reports of randomized trials. JAMA. 2001; 285:1992-5.

Am J Health-Syst Pharm, Vol 58, Nov 15, 2001, pp. 2173-2181
