We present an analysis of the dimensionality of the scales that assess the nine team roles
contained in the Team Role Self-Perception Inventory. Using a data set of over 14,000
respondents, reasonable fit to seven-item unidimensional factor models was obtained for
all scales except Implementer and Specialist. Two-factor structures for all scales showed
improvements in model fit although for all roles a small and unreliable second factor was
found. Bi-dimensional structures reflect the separate loading of negatively worded items
and/or different item content areas. Five-item scales provide a more economical version of
the inventory and areas for further development of the instrument are identified.
number of measures, in this case nine (Dunlap & Cornwell, 1994). In addition, due to scoring restraints, ipsative instruments should only be used for intra-individual comparison. This arises as ipsative scoring shows only a person's relative position among the several dimensions being scored. We do not know the absolute strength of one person's preference for a particular team role relative to another person, for instance, and for this reason inter-individual comparisons should not be undertaken.

Problems associated with analysing and interpreting ipsative data have long been recognized. The fundamental problem is that scores given by respondents to items retain a level of interdependence and are not independent, as is the case in a normative instrument. Because of the average negative correlations that arise, conventional factor analytic approaches are discouraged (e.g., see Dunlap & Cornwell, 1994, for a review). Dunlap and Cornwell deduced that principal components analysis "will produce artifactual bipolar factors that result, not from the true underlying relationships between the variables, but from relationships induced solely by the ipsative nature of the measure" (p. 122). The risk of identifying an incorrect factor structure instead of the true relationships is too great. Problems also arise when computing reliability estimates, as item scores are not measures of the construct that they are intended to measure plus error, as Classical Test Theory requires, but "a response to an item within the context of an item set and the properties of the other items therein" (Meade, 2004, p. 537).

The TRSPI, as a forced-choice instrument, is therefore open to the issues outlined above. However, it is clear that the way forced-choice instruments are designed does influence the scope available to analyse them. The most extreme case occurs where an instrument represents two factors with two items in each item set where only one item can be scored. Saville and Willson (1991) showed that where the number of factors is large, the correlation between normative and ipsative scale scores is high and reliability estimates are not overestimated. Increasing the number of factors decreases interdependence among them.

The grouping of items also influences the level of ipsativity. The most desirable situation is where an item from one scale occurs with an item from all other scales in an item set. It is also desirable that traits are measured with the same number of items (Meade, 2004, p. 538). Large item sets will also help to reduce interdependence. The TRSPI assesses nine team roles using seven items per role. The items are divided into nine item sets with one item per role in each set, so scores given to items in one set are not dependent on the scores given to any other set. In addition, scores given to the seven items within a scale are not dependent upon each other, although they are influenced by the scores given to other scale items.

The TRSPI produces a ranking of a person's preferences towards nine team roles, with the highest ranked (natural) roles reflecting the scales given the highest scores. If only scores that are given to the highest ranked roles are analysed then the level of interdependence within these items will be at most very small. Furthermore, the deletion of the scores given to any "dross" scale items helps to increase the level of variability in total scores, which will also be assisted by large sample sizes. In light of these features we suggest that normative statistical approaches can be usefully applied to TRSPI data for certain analyses (see Hicks, 1970). While the instrument's overall factor structure should not be examined from TRSPI data alone, it seems clear that scale items can be investigated for reliability, along with the factor structure of individual scales.

Previous Psychometric Evaluation

The earliest attempts at evaluation were marked by small sample sizes. This led to researchers assigning a "score" of zero to the items that respondents did not distribute points to, in order to produce scales with seven scored items for analysis, even though a large proportion of the items were "scored" in this way. We have reservations about the appropriateness of this method because, as most items are unscored, the covariance matrices on which reliability estimates were based relied heavily on "scores" (zeros) not provided by respondents. Reliability estimates obtained following this procedure were low (Broucek & Randell, 1996; Furnham et al., 1993a).

While the initial studies were quite critical of the TRSPI, more recent work using large data sets, and which utilizes only the items scored by respondents, has indicated that scale reliability is much better than previously estimated (Swailes & McIntyre-Bhatty, 2002). Theoretically, the scales are unidimensional and, as internal-consistency reliability estimates assume unidimensionality, it is important to understand more about the structures of the scales in the instrument. A review of research into the nature of the TRSPI and the relationships between team roles and other variables suggested that discriminant validity between the nine roles was lacking (Aritzeta et al., in press). Previous work on the factor structure of the inventory has demonstrated bipolar structures (Beck, Fisch, & Bergander, 1999; Dulewicz, 1995; Furnham et al., 1993a; Senior, 1998), whether using the TRSPI or personality measures to construct team roles, and some authors have explained factor bidimensionality in terms of the ipsative format of the TRSPI. However, while acknowledging that such an effect is present to some extent, it is necessary to explore further whether the internal structure of the TRSPI's scales could be a reason for the lack of discriminant validity and whether their structure can explain factor structures found by previous authors. If scale structures are incoherent or depart strongly from unidimensionality then this could be an explanation for weak discrimination between team roles.
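The interdependence that ipsative scoring induces can be illustrated with a small simulation. This is a hypothetical sketch, not the TRSPI's actual scoring scheme: every simulated respondent simply spreads a fixed points budget across nine scales, and the resulting scale totals correlate negatively on average, as Dunlap and Cornwell (1994) describe.

```python
import numpy as np

rng = np.random.default_rng(0)
n_respondents, n_scales = 5000, 9

# Hypothetical ipsative scoring: each respondent distributes a fixed
# budget of 70 points across nine scales, so every row sums to 70.
raw = rng.random((n_respondents, n_scales))
scores = raw / raw.sum(axis=1, keepdims=True) * 70

# Average off-diagonal correlation between scale totals.
corr = np.corrcoef(scores, rowvar=False)
off_diag = corr[~np.eye(n_scales, dtype=bool)]
print(f"mean inter-scale correlation: {off_diag.mean():.3f}")
# For k equal-variance ipsative scales the expected average is -1/(k-1).
```

With nine scales the average inter-scale correlation converges on -1/8, which is exactly the kind of built-in negative correlation that makes conventional factor analysis of the full instrument hazardous.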
Only one previous study has looked at this issue, and confirmatory factor analysis revealed that five scales showed very good fit to a unidimensional structure as indicated by p values above .05 (Swailes & McIntyre-Bhatty, 2003). These scales were Coordinator, Monitor Evaluator, Plant, Specialist, and Teamworker. The other four scales showed less good fit: Completer Finisher C/df = 1.86, comparative fit index (CFI) = .91; Implementer C/df = 3.32, CFI = .85; Resource Investigator C/df = 1.80, CFI = .95; and Shaper C/df = 2.63, CFI = .95. In addition, there were indications that the Completer Finisher, Implementer and Shaper scales were bi-dimensional, and the sample sizes were less than 100 for two roles and between 100 and 200 for five roles. This paper presents a more detailed analysis of scale structures on a larger sample in an attempt to provide a more exhaustive exploration of the instrument's properties and the stability of its scales.

Method

Sample

The study used data from 14,311 respondents to the English version of the nine-role TRSPI. The dataset was provided by the test publisher and respondents are drawn from a wide range of occupations, management roles and seniority. Forty per cent of respondents were female.

Data Analysis

Data were analysed with AMOS version 5.0, and exploration of the data suggested that asymptotically distribution-free (ADF) estimation should be used in light of the sharp departures from normality observed for most variables. This was confirmed by running 1000 bootstrap samples to compare estimation criteria: the smallest mean discrepancy was obtained with ADF estimation. One-factor models in which all seven items loaded onto a single factor were fitted to the data. Subsequently, the specification search facility in AMOS was used to examine the fit of all possible two-factor structures. In this approach, any measured variable (in this case the scored items) can depend upon any factor (Arbuckle, 2003). With seven items loading onto two factors there are potentially 16,384 (2^14) possible models. Most of these models are discarded as unidentified or as inadmissible or because of poor fit indices. To help choose from the surviving two-factor models, scree plots and best-fit plots were examined for each of the model fitting attempts and these suggested evaluation of models with 15 parameters.

Assessing Fit

There are many ways of assessing the fit of structural models, although much judgement remains in reaching decisions about how well data fit a model; see Medsker, Williams, and Holahan (1994) for a review. A non-significant p value for the discrepancy function, C, is an indicator that there is no significant difference between the data and the model and is thus an indicator of very good fit. Small differences between the model and the data will, however, yield significant p values for C when sample sizes become large (typically over 100), and so other fit indices have been developed. We used the ratio of C to the number of degrees of freedom in the model (C/df), which is ideally below three (Medsker et al., 1994); the root mean square error of approximation (RMSEA), which below .08 represents "reasonable" fit and below .05 represents "close" fit (Browne & Cudeck, 1993); the lower and upper limits of a 90% confidence interval on the population value of RMSEA; and a p value (PCLOSE) for testing the null hypothesis that RMSEA is no greater than .05. Also reported are the normed fit index (Bentler & Bonnett, 1980), which if less than .9 usually indicates substantial scope for improvement; the incremental fit index (Bollen, 1989), which when close to unity indicates very good fit; the goodness of fit index, which also approaches unity indicating very good fit; and the CFI, which is 1.0 at perfect fit and which is ideally .95 or above (Hu & Bentler, 1999). Reliability was assessed using a formula for composite reliability (Bagozzi, 1994) along with Cronbach's α for comparison.

Results

Thirty-seven items had a modal score of two and 26 had a modal score of one. Mean item scores ranged from 1.58 to 2.69. The item scored least often (a Monitor Evaluator item) was scored 3149 times and the item scored most often (a Teamworker item) was scored 11,498 times. Across the whole data set the numbers of respondents that had scored all seven items in a scale ranged from 257 for the Plant scale to 1229 for the Shaper scale.

Single Factor Models

For the seven-item scales, reasonable fit as judged through non-significant p values for C was obtained for Completer Finisher, Co-ordinator, Monitor Evaluator and Teamworker. Using C/df and RMSEA as a guide, Plant, Resource Investigator and Shaper also fitted reasonably well, although Resource Investigator items 5 and 6 had non-significant loadings (p > .05) onto the latent factor. Implementer and Specialist scale data appeared to fit least well, and for the Specialist scale item five also had a non-significant loading. α's were .7 or above for the Co-ordinator, Plant, Resource Investigator, Shaper and Teamworker scales.

Two-Factor Models

All two-factor models showed improved fit over one-factor models, with all but the Implementer, Shaper and Specialist scales yielding non-significant p values for C. However, the Implementer scale showed other good indicators (CFI .91, RMSEA .03). The intercorrelation, r, between the two factors in each scale was mostly moderate to strong. The lowest intercorrelation, .31, was for Completer Finisher, whereas others ranged from .41 (Implementer) to .89 (Resource Investigator).

Factor loadings revealed a consistent pattern such that for Completer Finisher, Co-ordinator, Implementer, Shaper and Teamworker, the second and seventh items in each scale loaded onto the second factor. Two-item second factors were also observed for Monitor Evaluator (items 2 and 4), Plant (items 5 and 7) and Specialist (items 2 and 5). The only exception to this pattern, Resource Investigator, split into a four-item factor (items 1, 2, 3, 7) and a three-item factor for the best fitting model. The composite reliability of the smaller factors, however, was low and ranged from .35 to .59 except for Resource Investigator (.69) and Shaper (.72). Composite reliability of the remaining five-item factors (four items in the case of Resource Investigator) was .7 or above for five scales and .69 for Shaper, .61 for Implementer and Specialist, and .58 for Monitor Evaluator.

Parsimonious One-Factor Scales

In light of the weak properties shown by the small two-item factors, further tests were conducted to find more parsimonious scales. With the small factors discarded, five-item unidimensional models were tested and this produced improvements for all scales, although Implementer, Monitor Evaluator and Specialist continued to show poor reliability (α and composite reliability less than about .7). Although the data from the Resource Investigator scale split into a four-item and a three-item factor, the fit of a five-item model obtained by dropping items 5 and 6 was still good (C = 5.9, df = 5, p = .31, α = .73) (Table 1).
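The RMSEA values relied on throughout can be recovered from a model's discrepancy statistic and degrees of freedom. Below is a minimal sketch of one common formulation; the sample size used is an illustrative placeholder, not a figure from this study.

```python
import math

def rmsea(chisq, df, n):
    """Root mean square error of approximation from a model chi-square
    (here, the discrepancy C), its degrees of freedom, and the sample
    size, using the sqrt(max(chisq - df, 0) / (df * (n - 1))) form."""
    return math.sqrt(max(chisq - df, 0.0) / (df * (n - 1)))

# Reported five-item Resource Investigator fit: C = 5.9 on df = 5.
# n = 1000 is a placeholder sample size for illustration only.
print(f"C/df = {5.9 / 5:.2f}")               # ideally below three
print(f"RMSEA = {rmsea(5.9, 5, 1000):.4f}")  # below .05 indicates close fit
```

Note how the formula shrinks towards zero as n grows for a fixed C - df, which is why a barely misfitting model can still achieve "close" RMSEA fit in a very large sample even when the p value for C is significant.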
Notes: CF 1F7, seven-item, one-factor model for Completer Finisher scale; CF 2F, seven-item, two-factor model; CF 1F5, five-item, one-factor model; α, Cronbach's α; r, factor intercorrelation for the two-factor models; cr, composite reliability. RMSEA L–H gives the lower and higher limits of a 90% confidence interval on the population value of RMSEA.
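The α and cr quantities defined in the notes can be sketched as follows. The composite reliability function uses the standard loadings-based formula (attributed here to Bagozzi, 1994); the loadings and score matrix shown are hypothetical examples, not values from Table 1.

```python
import numpy as np

def composite_reliability(loadings):
    """Composite reliability from standardized factor loadings,
    assuming uncorrelated errors with variance 1 - loading**2."""
    s = sum(loadings)
    error = sum(1 - l * l for l in loadings)
    return s * s / (s * s + error)

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

# Hypothetical standardized loadings for a seven-item scale.
lam = [0.7, 0.65, 0.6, 0.55, 0.5, 0.45, 0.3]
print(round(composite_reliability(lam), 3))  # -> 0.742
```

Unlike α, which is computed directly from observed item scores and assumes essentially equal item quality, composite reliability weights each item by its estimated loading, which is why the two can diverge when a scale contains one or two weak items.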
this takes place, research studies should treat findings relating to the Specialist and Implementer roles with particular caution.

This paper helps to form a more complete picture of the TRSPI and its properties. The results of this study are somewhat mixed in that, while they do not demonstrate nine robust scales, it is clear that a majority of the scales show acceptable properties.

References

Arbuckle, J.L. (2003) Amos 5.0 update to the Amos user's guide. Chicago: SmallWaters Corporation.
Aritzeta, A., Swailes, S. and Senior, B. (in press) The team role self-perception inventory: Development, validity and applications for team building. Journal of Management Studies.
Bagozzi, R.P. (1994) Structural equation models in marketing research: Basic principles. In R.P. Bagozzi (Ed.), Principles of marketing research (pp. 317–385). Oxford: Blackwell.
Beck, D., Fisch, R. and Bergander, W. (1999) Functional roles in work groups – an empirical approach to the study of group role diversity. Psychologische Beiträge, 41, 288–307.
Belbin, M. (1981) Management teams, why they succeed or fail. London: Heinemann.
Belbin, M. (1993) Team roles at work. Oxford: Butterworth-Heinemann.
Belbin, M., Aston, R. and Mottram, D. (1976) Building effective management teams. Journal of General Management, 3, 23–29.
Bentler, P.M. and Bonnett, D.G. (1980) Significance tests and goodness of fit in the analysis of covariance structures. Psychological Bulletin, 88, 588–606.
Bollen, K.A. (1989) A new incremental fit index for general structural equation models. Sociological Methods and Research, 17, 303–316.
Broucek, W.G. and Randell, G. (1996) An assessment of the construct validity of the Belbin Self-Perception Inventory and Observer's Assessment from the perspective of the five-factor model. Journal of Occupational and Organizational Psychology, 69, 389–405.
Browne, M.W. and Cudeck, R. (1993) Alternative ways of assessing model fit. In K.A. Bollen and J.S. Long (Eds), Testing structural equation models. Newbury Park, CA: Sage.
Davis, J., Millburn, P., Murphy, T. and Woodhouse, M. (1992) Successful team building: How to create teams that really work. London: Kogan Page.
Dulewicz, V. (1995) A validation of Belbin's team roles from 16PF and OPQ using bosses' ratings of competence. Journal of Occupational and Organizational Psychology, 68, 81–99.
Dunlap, W.P. and Cornwell, J.M. (1994) Factor analysis of ipsative measures. Multivariate Behavioural Research, 29, 115–126.
Furnham, A., Steele, H. and Pendleton, D. (1993a) A psychometric assessment of the Belbin Team-Role Self-Perception Inventory. Journal of Occupational and Organizational Psychology, 66, 245–257.
Furnham, A., Steele, H. and Pendleton, D. (1993b) A response to Dr Belbin's reply. Journal of Occupational and Organizational Psychology, 66, 261.
Hicks, L.E. (1970) Some properties of ipsative, normative and forced-choice normative measures. Psychological Bulletin, 74, 167–184.
Hu, L. and Bentler, P.M. (1999) Cutoff criteria for fit indices in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modelling, 6, 1–55.
Margerison, C. and McCann, D. (1990) Team management. London: W.H. Allen.
Meade, A.W. (2004) Psychometric problems and issues involved with creating and using ipsative measures for selection. Journal of Occupational and Organizational Psychology, 77, 531–552.
Medsker, G.J., Williams, L.J. and Holahan, P.J. (1994) A review of current practices for evaluating causal models in organizational behaviour and human resources management research. Journal of Management, 20, 439–464.
O'Doherty, D.M. (2005) Working as part of a balanced team. International Journal of Engineering Education, 21, 113–120.
Parker, G.M. (1990) Team players and teamwork: The new competitive business strategy. Oxford: Josey-Bass.
Saville, P. and Willson, E. (1991) The reliability and validity of normative and ipsative approaches in the measurement of personality. Journal of Occupational Psychology, 64, 219–238.
Senior, B. (1997) Team roles and team performance: Is there 'really' a link? Journal of Occupational and Organizational Psychology, 70, 241–258.
Senior, B. (1998) An empirically-based assessment of Belbin's team roles. Human Resource Management Journal, 8, 54–60.
Senior, B. and Swailes, S. (1998) A comparison of the Belbin self perception inventory and observer's assessment sheet as measures of an individual's team roles. International Journal of Selection and Assessment, 6, 1–8.
Spencer, J. and Pruss, A. (1992) Managing your team. London: Piatkus.
Swailes, S. and McIntyre-Bhatty, T. (2002) The "Belbin" team role inventory: Reinterpreting reliability estimates. Journal of Managerial Psychology, 17, 529–536.
Swailes, S. and McIntyre-Bhatty, T. (2003) Scale structure of the team role self perception inventory. Journal of Occupational and Organizational Psychology, 76, 525–529.
Appendix A

Table A1. Team role descriptors, strengths and allowed weaknesses

Completer finisher (CF)
  Descriptors: Anxious, conscientious, introvert, self-controlled, self-disciplined, submissive and worrisome
  Strengths: Painstaking, conscientious, searches out errors and omissions, delivers on time
  Allowed weaknesses: Inclined to worry unduly. Reluctant to delegate

Implementer (IMP)
  Descriptors: Conservative, controlled, disciplined, efficient, inflexible, methodical, sincere, stable and systematic
  Strengths: Disciplined, reliable, conservative and efficient, turns ideas into practical actions
  Allowed weaknesses: Somewhat inflexible. Slow to respond to new possibilities

Teamworker (TW)
  Descriptors: Extrovert, likeable, loyal, stable, submissive, supportive, unassertive, and uncompetitive
  Strengths: Co-operative, mild, perceptive and diplomatic, listens, builds, averts friction, calms the waters
  Allowed weaknesses: Indecisive in crunch situations

Specialist (SP)
  Descriptors: Expert, defendant, not interested in others,
  Strengths: Single-minded, self-starting, dedicated;
  Allowed weaknesses: Contributes on a narrow front only.