Abstract
Background: One of the determinants of the medical student’s behaviour is the medical school learning environment.
Aim: The aim of this research was to identify the instruments used to measure the educational environment in health professions
education and to assess their validity and reliability.
Med Teach Downloaded from informahealthcare.com by University of Melbourne on 11/26/10
Methods: We performed an electronic search in the medical literature analysis and retrieval system online (MEDLINE) and Timelit
(Topics in medical education) databases through to October 2008. The non-electronic search (hand searching) was conducted
through reviewing the references of the retrieved studies and identifying the relevant ones. Two independent authors read, rated
and selected studies for the review according to the pre-specified criteria. Inter-rater agreement was measured with the
Kappa coefficient.
Results: Seventy-nine studies were included, with a Kappa coefficient of 0.79 indicating a reliable selection process, and 31
instruments were extracted. The Dundee Ready Education Environment Measure, Postgraduate Hospital Educational Environment
Measure, Clinical Learning Environment and Supervision and Dental Student Learning Environment Survey are likely to be the
most suitable instruments for undergraduate medicine, postgraduate medicine, nursing and dental education, respectively.
Conclusions: As a valid and reliable instrument is available for each educational setting, a study to assess the educational
environment should become part of an institution's good educational practice. Further studies employing a wider range of
databases and more elaborate search strategies would increase the comprehensiveness of the systematic review.
Correspondence: Diantha Soemantri Department of Medical Education, Faculty of Medicine, Universitas Indonesia, Jakarta Pusat 10430, Indonesia.
Tel/Fax: 62 21 3901814; email: dianthasoemantri@yahoo.com; diantha.soemantri@ui.ac.id
ISSN 0142–159X print/ISSN 1466–187X online/10/120947–6 © 2010 Informa UK Ltd.
DOI: 10.3109/01421591003686229
D. Soemantri et al.
should evaluate those areas that are fostered and encouraged in the institution.

The terms environment and climate are often used interchangeably in the educational literature. Genn (2001b) defines the term climate as the educational environment of an institution as perceived by students. Roff and McAleer (2001) use the term educational climate as a synonym of educational environment. In the study conducted by Rothman & Ayoade (1970), the learning environment is defined as what the students perceive, and thus no differentiation is made between the terms environment and climate. Therefore, in this article the educational climate will be regarded as equivalent to the educational environment.

However, it is the students' perceptions of the environment, rather than the environment itself, that determine their behaviour (Hutchins 1961; Rothman & Ayoade 1970; Genn 2001a; Konings et al. 2005). Students' perceptions of the classroom […]

[…] instrument or the suitability of that instrument to measure the educational environment. These qualities, often referred to as psychometric features, usually come under two main headings: validity and reliability. A valid and reliable learning environment tool allows a meaningful measure of the learning environment of an institution, so that appropriate measures to improve the environment can be taken. The differences among various educational settings may require a different educational environment inventory, one which suits the specific situation of a particular institution.

In any measurement process, it is necessary to ascertain whether the instrument or inventory measures what it is supposed to measure; validity deals with this specific question. Content validity refers to the extent to which the instrument or inventory measures the intended subject matter content (Gronlund 1976a). Criterion-related validity refers to the extent to which the measurement results can be used to predict a future outcome, and also the extent to which they correlate with current results (performance) obtained by other valid measures (Gronlund 1976a). Construct validity may be defined as the extent to which the test can be used to measure certain psychological constructs and the results interpreted in terms of those constructs (Gronlund 1976a).

Reliability refers to the reproducibility of the measurement or assessment results. A result may be reliable over different periods of time, over different raters or over different samples of questions (Gronlund 1976b). An unreliable or inconsistent measurement result cannot permit valid interpretation of the result. A high correlation between the scores of individual items, in other words a high internal consistency, would indicate that the scores measure a single construct. The degree of reliability adequate for a particular measurement depends on the purpose of the measurement, the use of the results, the importance of the decision that will be made and, ultimately, the consequences resulting from the measurement. A reliability coefficient of 0.60 is considered acceptable for questionnaires (Nunnally 1978).

Objectives

The objectives of this systematic review were to respond to the following two research questions:

(1) Which instruments or inventories are used to measure the educational environment in health professions education?
(2) How suitable is each instrument for measuring the educational environment in health professions education?

Methods

[…] quality assessment of included studies and data extraction, synthesis of study results, interpretation of results and report writing (Pai et al. 2004). These steps should result in a summary of evidence related to a specific research question.

Search strategy

We performed an electronic search in the MEDLINE and Timelit (Topics in Medical Education) databases through to October 2008. The non-electronic search (hand searching) was conducted by reviewing the references of the retrieved studies and identifying the relevant ones. The MEDLINE search strategy combined the available and appropriate medical subject headings (MeSH), namely (exp) educational measurement/, (exp) education/ and (exp) learning/, with the free keywords 'education* environment', 'climate', 'learning environment' and 'education* climate'. In the Timelit database, in which a search using subject headings is not possible, a search using the relevant keywords ('educational climate', 'educational environment', 'education environment' and 'learning environment') was conducted.

Inclusion and exclusion criteria

We included studies in the area of health professions education (undergraduate or postgraduate) that measured the educational or learning environment with a quantitative method (i.e. employed an instrument or inventory). A study was excluded if it was a review article, a study in non-health professions education, did not measure the overall educational or learning environment of the institution, or employed a qualitative research method.
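Two reviewers applied these criteria independently, and the reliability of their inclusion/exclusion decisions was summarized with a percent agreement and a Kappa coefficient. As an illustration only (the decisions below are invented, not the review's actual data), Cohen's kappa for two raters' include/exclude calls can be computed like this:

```python
from collections import Counter

def cohen_kappa(rater1, rater2):
    """Cohen's kappa for two raters' categorical decisions on the same items."""
    assert rater1 and len(rater1) == len(rater2)
    n = len(rater1)
    # Observed proportion of agreement.
    p_o = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Chance agreement, from each rater's marginal category frequencies.
    c1, c2 = Counter(rater1), Counter(rater2)
    p_e = sum(c1[cat] * c2[cat] for cat in c1) / n ** 2
    return (p_o - p_e) / (1 - p_e)

# Hypothetical include (1) / exclude (0) decisions for ten candidate studies.
reviewer_a = [1, 1, 1, 1, 0, 0, 1, 0, 1, 1]
reviewer_b = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]
kappa = cohen_kappa(reviewer_a, reviewer_b)  # ≈ 0.78
```

Kappa corrects the raw percent agreement for the agreement expected by chance, which is why it is preferred over percent agreement alone for reporting a selection process.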
BEME Rapid Review
[…] in different fields of health professions education was identified. The information concerning the educational environment measurement instruments was extracted from the included studies: the health professions education setting in which the instrument was developed and/or used; the general information about the instrument, including its subscales; the validity of the results (content validity, criterion-related validity, construct validity); and the reliability of the results.

[…] (undergraduate and postgraduate medicine, nursing, dental, chiropractic education). Table 1 summarizes the identified instruments along with the settings in which the instruments were used.

Table 1. Undergraduate and postgraduate health profession educational environment measurement instruments identified in the systematic review.

Data regarding the use of the identified instruments in each article were extracted and synthesized. They include the psychometric properties presented in the articles as the results of the use of the instruments in the educational environment measurement processes. The synthesis of the data was presented in the form of a table for each area of health professions study (Tables 2–5, available at www.medicalteacher.org). The rows list the educational environment measurement instruments, whereas the columns list the psychometric properties of each instrument: content validity/content evidence, criterion validity/relation to other variables, construct validity and reliability/internal structure.

Discussion

Specific databases which contain publications in medical education are of limited availability, which makes it more challenging to conduct a comprehensive search: relevant papers are more likely to be missed and large numbers of false hits occur. Moreover, finding the appropriate subject headings may become a problem because the database (MEDLINE) is not specific to medical education and thus the availability of subject headings relevant to medical education is limited. It was therefore necessary to combine the search with relevant keywords ('education environment', 'education climate', 'learning environment' and 'climate') to increase the specificity. Notwithstanding, our search strategy was likely to miss relevant studies due to the lack of specific search tools for the medical education area. Consequently, the use of only two databases and a low-sensitivity search strategy are limitations of this study, caused by the limited resources on the researchers' side. For a systematic and comprehensive search to be conducted successfully, adequate resources (human resources, educational material, time and cost) are essential. In order to improve the comprehensiveness of the search, hand searching of the references cited in the retrieved articles was conducted. This process yielded a substantial number of additional articles which were not retrieved through the database search.

The decision to include or exclude each study was based on the criteria visualized in the flowchart (Figure 1). Although the flowchart was clear and the reviewers understood the criteria beforehand, some disagreements occurred during the process. The disagreements mostly emerged when making decisions on studies that only measured a certain aspect of the educational environment; they resulted from differences in interpretation about those particular studies. A high percentage of agreement (90%) between the two independent reviewers and a Kappa coefficient of 0.79 show a fairly reliable inclusion/exclusion (review) process. The systematic search yielded a substantial number of instruments for measuring the educational environment in health professions education. The measurement of the educational environment has received considerable attention from educationalists, highlighted by the availability of various instruments for use in different educational settings.

Analysis of the validity and reliability of educational environment measurement instruments

Medicine (undergraduate). There were 12 educational environment measurement instruments identified in this study.
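Many of the reliability figures cited in the analyses that follow are internal-consistency coefficients (Cronbach's alpha). As a minimal sketch of the computation, using invented item scores rather than data from any reviewed instrument:

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """Cronbach's alpha; `scores` is a list of rows (respondents) over k items."""
    k = len(scores[0])
    # Variance of each item's column of scores.
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    # Variance of each respondent's total score.
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical 5-point responses from four students to a 3-item subscale.
responses = [
    [3, 4, 3],
    [4, 4, 5],
    [2, 3, 2],
    [5, 5, 4],
]
alpha = cronbach_alpha(responses)  # 0.90 for this toy data
```

Judged against the 0.60 threshold cited earlier for questionnaires (Nunnally 1978), such a subscale would count as acceptably reliable.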
The summary of the psychometric qualities of the instruments is presented in Table 2, available at www.medicalteacher.org. Several instruments (the questionnaires from Parry et al. (2002), Pololi & Price (2000), Patel and Dauphinee (1985) and Robins et al. (1997), the MSEI (Hutchins 1961) and the MSEQ (Wakeford 1981, 1984)) had considerable weaknesses in their validity and reliability. The Learning Climate Measure was an instrument developed specifically to measure the educational environment of Thai medical education (Wangsaturaka 2005). Therefore, it was a nation-specific instrument and its use was limited to that specific context. The STEEM was originally developed for the postgraduate surgical learning setting. When applied to the undergraduate setting, it demonstrated some weaknesses in terms of its validity (Nagraj et al. 2006).

The LEQ (Rothman 1970; Rothman & Ayoade 1970; Levy et al. 1973; Kaufman & Mann 1996; Schwartz & Loten 2003, 2004a, b), MSLES (Feletti & Clarke 1981a, b; Clarke et al. 1984; Lancaster et al. 1997) and DREEM (Roff et al. 1997, 2001; Al-Qahtani 1999; Al-Zidgali 1999; Baozhi 2003; Bassaw et al. 2003; Vieira et al. 2003; Zaini 2003; Al-Hazimi et al. 2004a, b; Mayya & Roff 2004; Till 2004, 2005; Jiffry 2005; Varma et al. 2005; Dunne et al. 2006) demonstrated robustness in terms of their psychometric qualities. Their content validity was established and they also had high internal consistency. The construct validity of the MSLES and DREEM was indicated by their ability to significantly differentiate students' perceptions of the learning environment between medical schools with a traditional curriculum and those with a more innovative curriculum (Feletti & Clarke 1981b; Al-Qahtani 1999; Zaini 2003; Al-Hazimi et al. 2004a). The DREEM showed an additional strength because it could be applied in medical schools in different countries, cultures and contexts and still demonstrated consistent reliability. In relation to criterion validity, there was a relationship between the DREEM scores, which reflected the students' perceptions of the learning environment, and academic achievement (Baozhi 2003; Mayya & Roff 2004).

According to the data presented in the related articles, the DREEM is likely to be the most suitable instrument for use in undergraduate medical education settings. Its validity has been established, and the instrument demonstrated highly reliable results consistently throughout its administration in different contexts. The results also indicated the ability of the DREEM to differentiate between the learning environments of traditional and more innovative medical schools.

Medicine (postgraduate). Most of the identified educational environment measurement instruments in postgraduate medical education were developed for use in specific postgraduate specialty areas. The psychometric qualities of the identified instruments are summarized in Table 3, available at www.medicalteacher.org. Postgraduate medical education programmes consist of many different specialties and levels of training. Each specialty has its own uniqueness in terms of the educational environment. This situation often requires a specific instrument which can address certain aspects of the educational environment in the particular setting.

Four instruments, the STEEM (Cassar 2004), the OREEM (Kanashiro et al. 2006), the ATEEM (Holt & Roff 2004) and the practice-based educational environment measure (Mulrooney 2005), were developed for use in the surgical/operating theatre learning setting, the anaesthetic theatre learning setting and the practice-based component of general practice (GP) training, respectively. The VA Learners' Perceptions Survey was designed for residents from different specialties within the United States VA clinical training setting (Keitz et al. 2003). In the process of deciding the most suitable instrument for use in postgraduate medical education, it was appropriate to exclude those instruments. The decision was not based on the quality of those instruments, since most of them demonstrated good validity and reliability, but rather on the specific content of the instruments, which limited their use in a more general postgraduate education setting.

The content validity of the LEA was not established, although it showed fairly high internal consistency (Roth et al. 2006). The questionnaire from Rotem et al. (1995) and the PHEEM established content validity through a clear description of the development process of the instruments (Roff et al. 2005). The strengths of the DREEM were its high internal consistency and concurrent validity. However, its content and construct validities were not established, since factor analysis demonstrated that some modifications were necessary for the instrument to be applied in the postgraduate setting (Bassaw et al. 2003; de Oliveira Filho & Schonhorst 2005; de Oliveira Filho et al. 2005a, b).

The PHEEM consists of 40 items divided into three subscales (perceptions of role autonomy, perceptions of teaching and perceptions of social support). The reliability of the PHEEM was better than that of the Rotem, Godwin and Du questionnaire. It has also been shown that, by using the PHEEM, reliable results can be obtained with a feasible sample size. In addition, the PHEEM had been administered to several different sample groups, such as senior house officers and specialist registrars from different specialties, and demonstrated almost similar reliability coefficients in those groups (Jayashree 2004; Roff et al. 2005; Aspegren et al. 2007; Boor et al. 2007; Clapham et al. 2007). Several researchers have conducted factor analyses of the PHEEM and demonstrated different results regarding the factors which the instrument measures (Aspegren et al. 2007; Boor et al. 2007; Clapham et al. 2007). The construct validity of the Rotem, Godwin and Du questionnaire was not clearly established. Therefore, the PHEEM is likely to be the most suitable instrument for use in postgraduate medical education because of its content validity, high reliability and its applicability in different postgraduate settings.

Nursing. Most of the identified instruments were used to assess the clinical learning environment in the nursing education setting. A summary of the psychometric qualities of the instruments is presented in Table 4, available at www.medicalteacher.org. All instruments, except the CUCEI (Fisher & Parkinson 1998), provided moderate to high reliability coefficients. Several instruments, such as the CPCLES (Letizia & Jennrich 1998), the questionnaire from Hart and Rotem (1995) and the DREEM (Pimparyon et al. 2000; Al-Sketty 2003;
O'Brien et al. 2008), demonstrated some weaknesses in regard to the construct validity of the instruments. Almost all instruments demonstrated a high degree of content validity. The PREQ covered several aspects of an educational environment (supervision, intellectual climate, clarity, infrastructure, skills development and thesis examination process). However, according to the validity analysis, some subscales or statements in the instrument were too specific for a particular nursing education setting (Drennan 2008), which then limits its use in other nursing education settings.

The content and construct validities of the CLES have been established. It was administered to groups of nursing students from different countries (Finland and the United Kingdom) and it demonstrated marginal to high reliability coefficients (internal consistency) in those studies (Saarikoski & Leino-Kilpi 1999, 2002; Saarikoski et al. 2002). The CLEI had also been administered in several contexts of nursing education. However, its internal consistency was fairly poor compared to that of the other instruments (Chan 2001a, b, 2002, 2003; Ip & Chan 2005; Henderson et al. 2006; Midgley 2006; Chan & Ip 2007). The reliability coefficients of the CLE Scale (Dunn & Burnett 1995; Dunn & Hansford 1997) were lower than those of the CLES, although both instruments' construct validity has been confirmed. Therefore, based on the available information and analysis, the CLES is likely to be the most suitable instrument to assess nursing students' perceptions of their clinical learning environment.

Dentistry. There were five instruments for dental educational environment measurement identified in this study. The validity and reliability of those instruments varied and are summarized in Table 5, available at www.medicalteacher.org.

Only one instrument was originally developed for the dental educational environment (Gerzina et al. 2005). The others were modifications or identical versions of instruments originally designed for use in medical schools. The content validity of those instruments was questionable, since there were no processes of validating them before they were applied in dental education settings, although the original versions may have proven to be valid. In addition, two instruments (the LES and DSLES) were developed from the MSLES, an instrument first developed in the 1970s, which might not encompass current trends and changes in educational practice. The DSLES was more similar to the MSLES than the LES was, because the LES was a shortened version of the MSLES whereas the changes in the DSLES were only minor.

The LES (Stewart et al. 2006) demonstrated the highest overall reliability coefficient (Cronbach's alpha of 0.97). The reliability of the ClinEd IQ (Henzi et al. 2006), the DREEM (Zamzuri et al. 2004) and the questionnaire from Gerzina et al. (2005) was not tested. The DSLES also demonstrated good reliability (Cronbach's alpha of 0.91), which was comparable to that of the MSLES (Henzi et al. 2005).

Based on the information provided in the studies and the analysis conducted, the DSLES is likely to be the most suitable instrument for measuring the educational environment in the dental education setting. It demonstrated good reliability and, since the DSLES was more similar to the MSLES than the LES, it was more likely to have better content validity.

Conclusion

The systematic search yielded 178 studies which were considered eligible for further review. The pre-specified criteria were applied to those studies and finally 79 studies, which were the original/primary studies measuring the overall educational/learning environment in health professions education using specific instruments, were included. A substantial number of educational environment measurement instruments (31 instruments) were extracted from the included studies. The use of a wider range of databases with more elaborate search strategies would increase the comprehensiveness of the systematic review.

The DREEM, PHEEM, CLES and DSLES are likely to be the most suitable instruments for undergraduate medicine, postgraduate medicine, nursing and dental education, respectively. Their content validity was established through elaborate description of the development process of the instruments. Some instruments have added value through the establishment of their construct validity. These educational environment measurement instruments also demonstrated consistency throughout their applications in different contexts/settings.

Further analysis will be useful to explore the ability of each instrument as a predictor variable for a particular educational achievement. Furthermore, a study on students' perceptions of the educational environment should become part of the good educational practice of an institution, as a suitable, valid and reliable instrument for each educational context is available.

Acknowledgements

The authors thank Sue Roff and Sean McAleer for their comments and support in completing the project and finalizing the manuscript. This work is partially supported by grants of the National Commission for Scientific and Technological Research (CONICYT), FONDECYT no. 11004336 to A.R.

Declaration of interest: The authors report no conflicts of interest. The authors alone are responsible for the content and writing of the article.

Notes on contributors

DIANTHA SOEMANTRI, MD, MMedEd, is a lecturer in medical education in the Department of Medical Education, Faculty of Medicine, Universitas Indonesia.

CRISTIAN HERRERA is a member of the Evidence Based Medicine Unit and the Health Policy and Systems Research Unit of the School of Medicine, Pontificia Universidad Católica de Chile.

ARNOLDO RIQUELME, MD, MMedEd, is an undergraduate and postgraduate clinical tutor in Internal Medicine and consultant in the Department of Gastroenterology, Pontificia Universidad Católica de Chile School of Medicine, Chile.

References

The references for this article can be viewed at www.medicalteacher.org.