EMPIRICAL RESEARCH
confirmed the three-factor structure but called into question the inclusion of
personal accomplishment in the conceptualization/measurement of burnout.
However, they also corroborated the existence of a “wording effect” that blurs
the “true” relationships between the burnout constructs. Thus, the development
of a new version of the MBI-HSS using bipolar scales is recommended.
Finally, these analyses suggest the removal of two to five items; a 17-item
version appeared to be the most satisfactory.
Keywords: Burnout measurement, Factor structure, Wording effect, Cross-
validation, Nomological validity, Item removal.
I. Introduction
currently the most widely used scale to measure burnout among human
services workers and has been translated into many languages, such as French,
Greek, Italian, Dutch, German, Japanese, Arabic, Spanish, Finnish, Swedish
and Norwegian. According to Schaufeli and Enzmann (1998), 90% of all
studies examining burnout have used the MBI.
I.3. Is the difference between PA and the other two constructs
(EE and DP) due to a “wording effect”?
(2007, see also Bresó, Salanova, & Schaufeli, 2007). Considering the factor
structure of the Dutch version of the MBI-General Survey, they hypothesized
that using positively worded items to measure a negative
phenomenon could explain the weak correlations
between PA and the other two constructs (EE and DP). They considered
that “it would not make sense to assess lack of efficacy with reversed efficacy
items” and that negatively worded items are a better way to
measure such a lack. Hence, they used both kinds of items in four samples
(total N = 1099) and observed that positively worded items (measuring
efficacy) correlated more strongly with work engagement, while negatively
worded items (measuring inefficacy) correlated more strongly with burnout
(emotional exhaustion and cynicism).
However, they did not directly assess the possibility of a “wording effect”
in Confirmatory Factor Analysis (CFA). This effect occurs when positively
and negatively worded items of the same scale do not elicit symmetrically
opposed answers from respondents. In other words, agreeing with a
positively worded item is not equivalent to rejecting a negatively worded one,
and vice versa. Thus, these two types of items tend to constitute two
correlated but distinct sets of items. The possibility of a “wording effect” has
been tested in CFA, for instance, on the Rosenberg self-esteem scale by
Horan, DiStefano and Motl (2003). In particular, they hypothesized that
the observed distinction between positive and negative items, which
theoretically reflect the same latent factor, may be due to a personality trait or
© Presses Universitaires de France | Téléchargé le 09/12/2022 sur www.cairn.info (IP: 188.255.174.2)
To summarize, the empirical study reported here aimed (1) to test with
CFA the three-factor structure of the French version of the MBI-HSS,
(2) to estimate whether the PA construct should be included in the
definition and the measurement of burnout, (3) to examine the possibility that
a “wording effect” explains the difference between this construct and the
other two (EE and DP) and (4) to consider the removal of items that
caused problems in other validation studies (items 6, 12, 16 and 22).
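What aim (3) targets can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' analysis: the data-generating model, the names `pearson` and `simulate_wording_effect`, and all parameter values are hypothetical. Each simulated respondent has a latent burnout level plus a personal reaction to negatively worded items. When that reaction varies across respondents, two negative items correlate more strongly with each other than with a reverse-scored positive item, although all three measure the same trait.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_wording_effect(n=5000, method_sd=0.8, seed=1):
    """Return (r of two negative items, r of a negative and a positive item).

    Hypothetical model: every item = trait + noise; negatively worded items
    additionally share a respondent-specific "method" component.
    """
    rng = random.Random(seed)
    neg_a, neg_b, pos_reversed = [], [], []
    for _ in range(n):
        trait = rng.gauss(0, 1)           # latent burnout level
        method = rng.gauss(0, method_sd)  # reaction to negative wording
        neg_a.append(trait + method + rng.gauss(0, 1))
        neg_b.append(trait + method + rng.gauss(0, 1))
        # Positive item after reverse scoring: high values = more burnout.
        pos_reversed.append(trait + rng.gauss(0, 1))
    return pearson(neg_a, neg_b), pearson(neg_a, pos_reversed)


# With a method component, same-wording items correlate more strongly.
r_same_wording, r_mixed_wording = simulate_wording_effect()
# Without it (method_sd = 0), the two correlations are about equal.
r_same_no_effect, r_mixed_no_effect = simulate_wording_effect(method_sd=0.0)
```

In a CFA, this shared component is what an additional "negative wording" latent factor is meant to absorb.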
II. Method
II.1. Participants
the three MBI-HSS sub-scales. If, as expected, the correlation between the
GHQ and the PA scores is lower in magnitude than the correlations between the
GHQ and the EE and DP scores, this would constitute additional evidence of
the distinctiveness of the PA construct.
were tested on the calibration sample; (2) then, the modification indices
were examined in order to identify a possible optimization of the fit of the
best model (note: these modifications must be theoretically interpretable);
(3) finally, the models previously estimated on the calibration sample were
tested on the validation sample, and the invariance of the models across
the two sub-samples was estimated via the χ² difference test (Δχ²). This
test was preferred to Cudeck and Browne’s (1983) Cross-Validation Index
(CVI) because the models were nested (Bollen, 1989). This two-sample
cross-validation technique was preferred to the Expected Cross-Validation
Index (ECVI; Browne & Cudeck, 1989) over the entire sample because the
use of two sub-samples eliminates bias in the a posteriori modification of
the models tested (e.g., on the basis of the so-called modification indices)
(Browne, 2000).
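The χ² difference test for nested models can be sketched as follows. This is a generic illustration using only the Python standard library, not the software the authors used. The function names and the example fit values are hypothetical. The upper-tail probability uses the closed-form series for integer degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: normal tail plus a finite series in sqrt(x).
    root = math.sqrt(x)
    term, total = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    normal_pdf = math.exp(-half) / math.sqrt(2 * math.pi)
    return math.erfc(root / math.sqrt(2)) + 2 * normal_pdf * total


def chi2_difference_test(chi2_restricted, df_restricted, chi2_free, df_free):
    """Return (delta chi2, delta df, p-value) for two nested models."""
    d_chi2 = chi2_restricted - chi2_free
    d_df = df_restricted - df_free
    return d_chi2, d_df, chi2_sf(d_chi2, d_df)


# Hypothetical fit values: a model with parameters constrained equal across
# the two sub-samples versus the same model without those constraints.
d_chi2, d_df, p_value = chi2_difference_test(750.0, 230, 711.0, 210)
```

A non-significant Δχ² indicates that the constraints do not worsen fit, which is the sense in which a model is judged invariant across sub-samples.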
The four theoretical models were first estimated on the complete
22-item scale. Models M3 and M4 were also tested on shortened versions
of the scale (i.e., 20-item, 18-item and 17-item versions) in order to
estimate the gain from item removal. Overall, eleven CFA models were applied to
both sub-samples. Next, two CFA models were also implemented to assess
the invariance of models across the two sub-samples.
Finally, the internal consistency of the sub-scales was estimated with
Cronbach’s alpha, and the nomological validity was assessed by computing
correlations between the MBI dimensions and the GHQ-12 score.
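For reference, Cronbach's alpha for a sub-scale of k items is k/(k − 1) × (1 − Σ item variances / variance of the total score). A minimal Python sketch follows. The function name `cronbach_alpha` and the example responses are hypothetical and are not taken from the study data.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha; rows holds one list of item scores per respondent."""
    k = len(rows[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of the summed score.
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical answers of five respondents to a four-item sub-scale (0-6).
example_rows = [
    [1, 2, 1, 0],
    [3, 3, 4, 2],
    [5, 4, 5, 6],
    [2, 1, 2, 3],
    [6, 5, 6, 5],
]
alpha = cronbach_alpha(example_rows)
```

Removing items changes k and both variance terms, so alpha has to be recomputed for each shortened version of a sub-scale.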
The structure matrix, which reports the correlations between each item and
the latent factors, illustrates that most items reflected the
expected factor more than the other two, except item 6 (“Working with
people all day is really a strain for me”) and item 16 (“Working directly
with people puts too much stress on me”). Such results suggest the
exclusion of all these items in order to optimize scaling (i.e., to retain only
18 items).
Given that the residual variance of M3a was fairly high (SRMR > .08),
some problematic items were removed. The modification indices suggested
relating item 12 to EE (associated decrease in χ² = 268.4) and to DP (minus
68.2 in χ²). They also suggested adding an error covariance between items
6 and 16 (minus 168.1 in χ²), a result indicating that a non-negligible part
of their common variance reflected another factor absent from M3. Then,
considering both the EFA and CFA results, the three-factor model was
also estimated on a 20-item version of the scale (i.e., without items 12 and
16) (M3b). As a result, all indices were improved. In particular, the SRMR
was lower than the recommended value of .08.
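The way modification indices are read in the paragraph above can be sketched in Python. The three values come from the text. The 3.84 cutoff is the conventional χ² critical value for 1 degree of freedom at α = .05 and is an assumption here, since the article does not state which threshold was applied. The names are hypothetical.

```python
CRITICAL_CHI2_1DF = 3.84  # chi-square critical value, 1 df, alpha = .05

# Expected decreases in chi-square reported in the text for model M3a.
modification_indices = {
    "item 12 loading on EE": 268.4,
    "item 12 loading on DP": 68.2,
    "error covariance between items 6 and 16": 168.1,
}


def rank_modifications(indices, threshold=CRITICAL_CHI2_1DF):
    """Keep indices above the threshold, largest expected chi-square drop first."""
    flagged = [(label, mi) for label, mi in indices.items() if mi > threshold]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


ranked = rank_modifications(modification_indices)
```

A large index only signals local misfit. Whether the parameter is freed or the item is dropped remains a theoretical decision.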
Table 3. Comparison of model fit indices of the MBI-HSS in confirmatory factor analysis.
[Table 3, fragment: only the last rows were recovered and the column headers are missing; each row gives two blocks of values, separated by "|" below.]
Row group: [...] increase model fit with a shortened scale?
[...] factor (EE/DP/PA) minus items 6, 12, 16, 22 and one negative wording factor (remaining EE+DP items): [...] = 3.76 [.045/.055] (M3c) | [...] = 2.98 [.037/.047] (M3c)
M4d: Three correlated factors (EE/DP/PA) minus items 6, 12, 13, 16, 22 and one negative wording factor (remaining EE+DP items): 400/105 = 3.81, .050 [.045/.055], .97, .98, .042, 496, 86* (M3d) | 311/105 = 2.96, .042 [.037/.047], .98, .98, .039, 407, 168* (M3d)
Figure 1. Path diagram with standardized coefficients (all are significant) for the three-factor
model of the complete version (22-item) of the MBI-HSS-Fr for both sub-samples (M3a).
- © PUF -
31 mai 2017 04:03 - Le travail humain n°2/2017 - Collectif - Le travail humain - 155 x 240 - page 175 / 240
Considering the EFA and CFA results, the validity of an 18-item version
of the MBI-HSS, minus items 6, 12, 16 and 22 (M3c), was also estimated.
Concerning item 22, the modification indices suggested adding a
structural link with EE (minus 61.8 in χ²), which corresponded to the
cross-loading observed in the EFA. Moreover, it loaded insufficiently on the DP
factor, with a very high error variance of .88/.85. Item 6 cross-loaded in the
EFA, and the error covariance with item 16 suggested by
the modification indices also argued in favor of its removal. In addition, its
error variance was equal to .75/.72. Removing these items slightly
decreased the SRMR and χ²/df and moderately decreased the AIC
(which can be used to compare non-nested models).
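As a worked check of how χ²/df and the AIC relate, the sketch below uses the only complete row that survives in Table 3 (M4d, 17 items), not M3c. It rests on two assumptions that the text does not state: that the model has no mean structure, so the number of free parameters is p(p + 1)/2 − df, and that the AIC is computed as χ² + 2q. Under these assumptions the reported values are reproduced. The function names are hypothetical.

```python
def free_parameters(n_items, df):
    """Free parameters implied by df for a covariance-structure model."""
    return n_items * (n_items + 1) // 2 - df


def chi2_df_ratio(chi2, df):
    """Normed chi-square."""
    return chi2 / df


def aic(chi2, n_free):
    """AIC under the chi2 + 2q convention (an assumption, see lead-in)."""
    return chi2 + 2 * n_free


# M4d: 17 items, df = 105, chi-square of 400 and 311 in the two value blocks.
q = free_parameters(17, 105)             # 153 - 105 = 48
ratios = [round(chi2_df_ratio(c, 105), 2) for c in (400, 311)]
aics = [aic(c, q) for c in (400, 311)]   # matches the 496 and 407 in the row
```

Because the penalty term 2q depends on the number of free parameters, the AIC can rank models that are not nested, which is why it is used here to compare versions with different item sets.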
Figure 2. Path diagram with standardized coefficients for the three-factor model plus
a negatively worded latent factor of the complete version (22-item) of the MBI-HSS-Fr
for both sub-samples (M4a). EE = emotional exhaustion, DP = depersonalization,
PA = personal accomplishment and NW = negative wording.
IV. Discussion
Overall, our analyses showed that the French MBI-HSS assessed the
same three dimensions as the original measure. In the EFA, Velicer’s (1976)
MAP test corroborated the three-factor structure and most item loadings
appeared to be satisfactory (with some exceptions). In the CFAs, the
three-factor model had a good fit with the complete 22-item version (M3a)
and an excellent fit with the shortened versions (M3b, M3c and M3d).
Moreover, this three-factor model outperformed the alternative one-factor
and two-factor models (Δχ²) and was found to be invariant across
sub-samples in cross-validation analysis (Byrne, 1991). Thus, the conformity
of the MBI-HSS to Maslach and Jackson’s (1981) theoretical model was
corroborated in our large sample of French healthcare providers.
Table 4. Descriptive statistics and correlation matrix for the different versions of the three sub-scales of the MBI-HSS
and for the GHQ-12. *p < .01; EE = emotional exhaustion, DP = depersonalization, PA = personal accomplishment
and GHQ = General Health Questionnaire. All correlations with the GHQ were obtained on a sub-sample (n = 1824);
all other correlations were obtained on the entire sample (N = 2357). All means (M) and standard deviations (SD) for
the MBI-HSS were computed on the same scale [0 ≤ M ≤ 6] in order to facilitate the comparison between the
different versions of the sub-scales. The GHQ score was scaled from 1 to 4. α refers to Cronbach’s reliability coefficient.
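Putting every version of a sub-scale on the 0 to 6 response scale, as the caption of Table 4 describes, amounts to averaging item scores rather than summing them. A minimal Python sketch follows. The respondent's answers are invented, the helper name `subscale_mean` is hypothetical, and the EE item numbers follow the usual MBI-HSS scoring key, which should be checked against the scale manual.

```python
# Item numbers usually keyed to emotional exhaustion (assumption, see lead-in).
EE_ITEMS = [1, 2, 3, 6, 8, 13, 14, 16, 20]


def subscale_mean(responses, items):
    """Mean item score, which stays on the 0-6 response scale."""
    return sum(responses[i] for i in items) / len(items)


# Hypothetical answers of one respondent, keyed by item number.
respondent = {1: 4, 2: 5, 3: 3, 6: 2, 8: 4, 13: 3, 14: 5, 16: 1, 20: 3}

ee_full = subscale_mean(respondent, EE_ITEMS)
# Shortened EE sub-scale without items 6, 13 and 16.
ee_short = subscale_mean(respondent, [i for i in EE_ITEMS if i not in (6, 13, 16)])
```

Summed scores would shrink mechanically as items are removed, whereas item means remain directly comparable across the 22-, 20-, 18- and 17-item versions.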
(Maslach, Schaufeli, & Leiter, 2001), such a result calls into question the
conceptual validity of the three-factor definition of burnout or,
alternatively, suggests that answers collected by the MBI-HSS are moderately
biased by the obvious contrast between negative and positive items, thus
challenging its measurement precision.
Thus, because of such a bias, it is almost impossible to draw a conclusion
about the relevance of including the (reduced) PA construct
in the definition and measurement of burnout when using the MBI-HSS
in its current form. Consequently, alternative procedures and materials
must be implemented to study this “wording effect” and to reduce its
impact. At first glance, using two types of items for EE, DP and PA (in line
with Salanova & Schaufeli, 2007; Bresó et al., 2007) appears to be a
suitable option. However, it is likely to inflate this valence-driven bias in item
processing, a bias that could explain the two negatively correlated
second-order latent factors observed by Salanova and Schaufeli (2007) in CFA,
namely burnout (assessed with negative items) and engagement (assessed
with positive items).
Instead, it might be more relevant to use one set of bipolar items, with
response options that reflect the highest levels of burnout and symmetrical
options that reflect the lowest levels of burnout (e.g., “Most of the time,
when I get up in the morning and have to face another day on the job I
feel… [from no fatigue at all to extreme fatigue]”). At the very least, when using
two sets of items (positive and negative), one could examine the magnitude of this
dropping items 6 and 22 also improved model fit slightly, especially the
AIC value (see M3c). Moreover, our results showed that they were poor
indicators of their theoretical factors. The error variance of item 22 was
very high and, additionally, it was found to reflect the “negative
wording” latent factor more than DP (see Figure 2). Similar observations were
made concerning items 6, 16 and 13, which reflected the “negative wording”
factor more than their theoretical factor (EE) (M4a) (see Figure 2)
and had a high error variance when this fourth factor was absent from the
model (M3a). Furthermore, removing these items diminished the difference
(Δχ²) between Maslach and Jackson’s (1981) three-factor model (EE,
DP and PA) and this alternative model (see the comparison between M3c
and M4c, and between M3d and M4d).
Items 6, 13 and 22 have been regularly questioned in past research.
Various empirical studies have shown that one or several of them either fail
to load on any factor or cross-load (Abu-Hilal, 1995; Abu-Hilal
& Salameh, 1992; Densten, 2001; Golembiewski et al., 1983; Kanste
et al., 2006; Koeske & Koeske, 1989; Mor & Laliberte, 1984;
Olivares-Faúndez, Mena-Miranda, Jélvez-Wilke, & Macía-Sepúlveda, 2014; Pierce
& Molloy, 1989; Sabbah et al., 2012; Schaufeli & Van Dierendonck, 1993).
Concerning item 13, its content validity is also open to doubt.
Clearly, being frustrated is not the same as being exhausted. A worker could be
both frustrated and bursting with energy. Thus, item 13 probably reflects
a general negative attitude toward the job (dissatisfaction) more than EE.
V. Conclusion
References
Seng Kam, C. C., & Meyer, J. P. (2015). Implications of item keying and item
valence for the investigation of construct dimensionality. Multivariate Behavioral
Research, 50(4), 457–469. doi: 10.1080/00273171.2015.1022640
Shirom, A., & Melamed, S. (2005). Does burnout affect physical health? A review
of the evidence. In Research Companion to Organizational Health Psychology
(pp. 599–622). Northampton, MA, USA: Edward Elgar Publishing.
Shirom, A., & Melamed, S. (2006). A comparison of the construct validity of two
burnout measures in two groups of professionals. International Journal of Stress
Management, 13(2), 176–200. doi: 10.1037/1072-5245.13.2.176
Truchot, D. (2009). Le burn-out des médecins généralistes : influence de
l’iniquité perçue et de l’orientation communautaire. Annales Médico-Psychologiques,
167(6), 422–428. doi: 10.1016/j.amp.2009.03.018
Vandenberghe, C., Stordeur, S., & d’Hoore, W. (2009). Une analyse des effets de la
latitude de décision, de l’épuisement émotionnel et de la satisfaction au travail
sur l’absentéisme au sein des unités de soins infirmiers. Le Travail Humain,
72(3), 209. doi: 10.3917/th.723.0209
Vanheule, S., Rosseel, Y., & Vlerick, P. (2007). The factorial validity and
measurement invariance of the Maslach Burnout Inventory for human services. Stress
and Health, 23(2), 87–91. doi: 10.1002/smi.1124
Velicer, W. F. (1976). Determining the number of components from the matrix of
partial correlations. Psychometrika, 41(3), 321–327.
Walkey, F. H., & Green, D. E. (1992). An exhaustive examination of the replicable
factor structure of the Maslach Burnout Inventory. Educational and Psychological
Measurement, 52, 309–323.
Wheeler, D. L., Vassar, M., Worley, J. A., & Barnes, L. L. B. (2011). A reliability
generalization meta-analysis of coefficients alpha for the Maslach Burnout