
A psychometric assessment of the Belbin Team-Role Self-Perception Inventory.

This paper set out to examine the psychometric properties of the extensively used, but little tested, Belbin (1981) Team-Role Self-Perception Inventory, which examines how people behave in teams. The original 56-item inventory was given to over 100 people from a variety of backgrounds in a non-ipsative Likert-scaled form. The alpha coefficients for the eight roles were not impressive and factor analysis did not provide clear evidence of the proposed structure. The second experiment attempted a similar non-ipsative analysis with the more recent 70-item version of the questionnaire, which has nine team roles. Again the alpha coefficients were modest and the factor analysis suggested a simpler solution than proposed. Team-role scores did not correlate significantly with a large number of demographic factors any more than may be expected by chance. A third experiment used the original scale with the original ipsative scoring system, using full-time managers as subjects. Again alpha levels were low, intercorrelations were not as predicted and the factor structure was unclear. Implications of these findings are discussed.

There is a considerable literature on the creation of occupational teams, the way in which they function and the consequences of their make-up. Much of the work in this field is descriptive and based on various case studies (Adair, 1986; Handy, 1985). However, there appears to be little empirical evidence to support various theories because of the extreme difficulty in measuring salient, ecologically valid and reliable, team-dependent outcome variables in order to establish some criterion of team success. Equally importantly, there is a lack of psychometrically valid measures of how people behave in teams.

This study aimed to examine, quite specifically, the psychometric properties of one such measure: the Belbin Team-Role Self-Perception Inventory (BTRSPI). This measure is used extensively in applied settings, especially in selecting, counselling and developing management teams (Hogg, 1990), but has received comparatively little psychometric assessment or validation. Belbin's, however, is not the only measure attempting to assess team-role behaviour. McCann & Margerison (1989) have developed a team-role measure which also has eight types and appears to be heavily influenced by the Jungian theories developed in the Myers-Briggs Type Indicator (Myers & McCaulley, 1985). While McCann & Margerison's questionnaire has norms and evidence of internal reliability and concurrent validity, there appears to be little or no evidence of the factorial structure of the questionnaire (to confirm the classificatory/taxonomic scheme), nor any evidence of the predictive or construct validity of the test. More importantly, and ironically, it provides no evidence that any one mix of 'team types' is any more efficient or effective than any other.

Belbin's (1981) BTRSPI first appeared in his frequently reprinted book Management Teams. Although the eight identified team roles are discussed extensively, the questionnaire is not, and it appears to be just one of a number of different ways to assess team-role preferences. Belbin does not discuss how the BTRSPI was developed, nor does he clarify the theory behind the structure of the questionnaire. Belbin (1988) does propose that the questionnaire may be used to design a team. Furthermore, he also suggests the eight roles can be further classified:

'Into the ark the managers went two by two. There were two types of negotiator [resource investigator and team worker (RI and TW)], manager-worker [company worker and completer finisher (CW and CF)], intellectual [monitor evaluator and plant (ME and PL)], and team leader [chairman and shaper (CH and SH)]. These ark members in all the various combinations of characteristics provide the basic material for populating the world of management teams. The RI is the creative negotiator; the TW, the internal facilitator; the CW, the effective organizer; the CF, the one who guarantees delivery; the ME, the analyser of problems; the PL, the source of original solutions; the CH, the team controller; and the SH, the slave-driver (where something stronger than control is needed). Management teams thrive on having members who are good examples of these types' (p. 123).

Belbin says relatively little about the meaning of the scores derived from the inventory though he does doubt some of the labels used. Yet he does note:

'The highest score on team role will indicate how best the respondent can make his or her mark in a management or project team. The next highest scores can denote back-up team roles towards which the individual should shift if for some reason there is less group need for a primary team role. The two lowest scores in team role imply possible areas of weakness. But rather than attempting to reform in this area the manager may be better advised to seek a colleague with complementary strengths' (p. 156).

Although norms based on a very limited sample (N = 78) were provided, little evidence of the psychometric properties of the test is offered. Yet it has become a very popular counselling and developmental tool for business consultants, particularly in Britain. Thus we know little of the test's reliability (test-retest, split-half, internal), validity (concurrent, content, predictive, construct), nor of its dimensionality. The same appears to be true of the second, extended version (Belbin Associates, 1988). The BTRSPI questionnaire is unusual and problematic for a number of reasons. Firstly, it is an ipsative test where subjects are required to read seven hypothetical situations and then rate either eight (version 1) or 10 (version 2) behavioural statements relating to each situation and 'distribute a total of 10 points among the sentences which you think most accurately describe your behaviour'.

This test is then quite clearly ipsative. Recently, Johnson, Wood & Blinkhorn (1988) pointed out five 'uncontroversial' drawbacks of such ipsative tests:

'1. They cannot be used for comparing individuals on a scale-by-scale basis;
2. Correlations amongst ipsative scales cannot legitimately be factor analysed in the usual way;
3. Reliabilities of ipsative tests overestimate, sometimes severely, the actual reliability of the scales: in fact, the whole idea of error is problematical;
4. For the same reason, and others, validities of ipsative tests overestimate their utility;
5. Means, standard deviations and correlations derived from ipsative test scales are not independent and cannot be interpreted and further utilized in the usual way' (p. 154).
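The fifth point is easy to demonstrate: because each respondent distributes a fixed number of points (10 per section, 70 in all for the original BTRSPI), every scale's covariances with the remaining scales must sum to exactly minus its own variance. A minimal simulation in Python (randomly generated data, not Belbin's) makes the constant-sum constraint visible:

```python
import numpy as np

rng = np.random.default_rng(42)
n_subjects, n_sections, n_roles = 100, 7, 8

# Each respondent spreads 10 points over the 8 statements in each section
sections = rng.multinomial(10, np.ones(n_roles) / n_roles,
                           size=(n_subjects, n_sections))
role_scores = sections.sum(axis=1)            # (100, 8) team-role totals

print((role_scores.sum(axis=1) == 70).all())  # constant total: True
cov = np.cov(role_scores, rowvar=False)
print(np.allclose(cov.sum(axis=1), 0))        # each row sums to ~0: True
```

Since every row of the covariance matrix sums to zero, the scales are forced into spurious negative correlation with one another, which is why standard factor analysis of ipsative scores is illegitimate.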

They are highly critical of ipsative tests in general, particularly those used in occupational settings. There are many reasons, usually concerned with social desirability and reliability, why test constructors prefer ipsative measures, but if a test has a firm theoretical base there is no reason to assume it could not easily be used in either an ipsative or a non-ipsative version. Saville & Willson (1991) have defended ipsative tests, arguing that they score as well as normative tests, though this issue remains highly debatable. Most psychometricians remain, quite rightly, unconvinced of any benefit of ipsative tests, pointing only to their drawbacks. For instance, in her celebrated work, Anastasi (1988) noted: 'In conclusion, it appears that the forced-choice technique has not proved as effective as had been anticipated in controlling faking or social desirability response sets. At the same time, the forced-choice item format, particularly when it yields ipsative scores, introduces other technical difficulties and eliminates information about absolute strength of individual characteristics that may be of prime importance in some testing situations' (p. 553).

A second problem with the Belbin BTRSPI concerns the way in which the questions are asked. Both first and second versions are arranged in such a way that for each of the seven sections subjects are required to specify their typical behaviour. Thus, for instance, one reads: 'When involved in a project with other people ...' or 'I gain satisfaction in a job because ...'. These situations are vague and inconsistent, and do nothing to let the subject know crucial aspects of the nature of their group/team. The precise nature of the task is not set out, nor in all instances do the 'situations' specifically mention teams. This could easily lead to poor reliability (Argyle, Furnham & Graham, 1981). Presumably, any questionnaire which seeks to detect a person's consistency in role taking will grow progressively more unreliable as the respondent is asked to report upon team roles taken in more and more diverse settings. The prospects for high reliability are best in tests which either repeatedly focus upon a single situation or which assess traits of such generality that they are expressed very widely. Belbin (1981) appears unclear on this point.

A third problem concerns the fact that the measure is neither theoretically nor empirically derived. As he explained in his book, Belbin (1981) used standard psychometrically validated measures like the 16PF and the EPI but developed his typology by observational and inductive, rather than theoretically deductive, means. Whilst this is not unusual in psychology, a problem with this approach lies in the fact that previously well-documented and theoretically important traits, like neuroticism, tend to get overlooked. How, for instance, can we be sure that all the team roles are included in the inventory? Frequently, poorly psychometrized tests marketed for human-resource training appear to neglect 'negative' personality traits like neuroticism, widely recognized as a major dimension of personality (Furnham, 1992).

The first and perhaps most important questions about this measure concern the reliability, validity and dimensionality of the eight team-role scales. The latter is particularly important when a measure purports to assess eight different (though possibly related) team-role responses. It seems from his writing that Belbin (1981) anticipates four factors (team leaders, intellectuals, negotiators, manager-workers), though he never explained how these four factors are themselves interrelated. This paper reports on three experiments testing the hypothesized relationships among the scales, the first examining the original BTRSPI and the second the revised scale. In Expt 1 a group of mainly student subjects completed the measure; in Expt 2, subjects were working adults. Both experiments were primarily concerned with dimensional structure and internal reliability.

Experiment 1
This first experiment aimed to examine the internal reliability and structure of the original BTRSPI, using the Likert scaling approach.

Method

Subjects
A total of 102 subjects took part in this study, of whom 50 were male and 52 female. They ranged from 19 to 53 years (M = 23.19) and came from a university subject panel. All were unpaid volunteers and native English speakers, and 39 per cent were in full-time employment. About 25 were students of geography and the remainder business students. As a matter of course, various standard demographic details were collected on the sample.

Questionnaire
The questionnaire was slightly adapted from that of Belbin (1981). In the original questionnaire there are seven sections, each with eight items, totalling 56 questions. Belbin (1981) required that subjects distribute 10 points in any way they wished between the eight alternatives in each section. In the scoring, each item refers to a team role, so a total team-role score is made up of the sum of the seven instances of each team-role preference. This unusual scoring method makes a factor analysis of the 56 items rather meaningless. In the questionnaire used in this study, the only difference introduced was that subjects responded to each item in each section on a nine-point scale (agree = 9, disagree = 1), indicating the extent to which they behaved like that in those seven situations. The current measure is therefore not ipsative, unlike the original measure, which indicated how subjects preferred to behave in those situations. Subjects completed the questionnaire at their place of work or study. The task took about 15 minutes and where possible subjects were debriefed.

Results
In order to establish the internal reliability of each team-role preference score, alpha coefficients were calculated. Table 1 shows the alphas, which range from .34 to .71. Only one is above .70 (recognized as barely acceptable) while three are .50 or below. This suggests the test has seriously inadequate internal reliability, though of course there remains considerable debate over the desirability of internal consistency (Boyle, 1991). Secondly, the relationship between the various team roles was considered. Table 1 also shows the correlations between the eight scales. Of the 28 correlations, all but five are significant and positive, suggesting poor discriminant validity. Belbin suggested the highest correlations should be between RI and TW (r = .26), CW and CF (r = .54), ME and PL (r = .24) and CH and SH (r = .40), yet in fact CH and ME correlated r = .58 and SH and RI r = .57.
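For readers unfamiliar with the statistic, coefficient alpha is computed from the item variances and the variance of the scale total. A minimal sketch follows, using simulated data rather than the study's: 200 hypothetical subjects rating one team role across the seven sections on the 9-point scale described above.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Coefficient alpha for an (n_subjects, k_items) score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)        # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)    # variance of the scale total
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Illustrative only: latent role preference plus per-section noise,
# rounded onto the 9-point (disagree = 1, agree = 9) response scale
rng = np.random.default_rng(0)
trait = rng.normal(5.0, 1.5, size=(200, 1))
ratings = np.clip(np.rint(trait + rng.normal(0, 1.5, size=(200, 7))), 1, 9)
print(round(cronbach_alpha(ratings), 2))
```

With seven sections per role, an alpha near the .70 criterion requires the section responses to share a substantial proportion of their variance; the low values in Table 1 imply they do not.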


A factor analysis (orthogonal and oblique) of the eight role scores was computed to determine the factor structure. As Gorsuch (1974) has noted, the varimax orthogonal solution is most often recommended. He also noted: 'Comparing orthogonal and oblique solutions encourages selection of the simple uncorrelated factor model if that is actually relevant' (p. 191). As the difference between the two rotations was not considerable, the varimax solution is reported. As can be seen from Table 1, two clear factors emerged accounting for 60 per cent of the variance.

The first factor which accounted for nearly 45 per cent of the variance contained both creative-thinker roles, while the second factor contained mainly company-worker roles and accounted for just over 20 per cent of the variance. The team-leader and creative-thinker roles (with one negotiator role) loaded on the first factor and company-worker roles (with one negotiator and one team-leader role) loaded on the second.
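Varimax rotation itself can be sketched in a few lines of NumPy. The function below follows the standard SVD-based iteration and is an illustration only, not the routine the authors used:

```python
import numpy as np

def varimax(loadings: np.ndarray, max_iter: int = 100,
            tol: float = 1e-6) -> np.ndarray:
    """Orthogonal (varimax) rotation of a (variables x factors)
    loading matrix, maximizing the variance of squared loadings."""
    p, k = loadings.shape
    rotation = np.eye(k)
    d_old = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient of the varimax criterion at the current rotation
        grad = loadings.T @ (rotated ** 3
                             - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p)
        u, s, vt = np.linalg.svd(grad)
        rotation = u @ vt
        d_new = s.sum()
        if d_new < d_old * (1 + tol):   # criterion stopped improving
            break
        d_old = d_new
    return loadings @ rotation
```

Because the rotation matrix is orthogonal, each variable's communality (the row sum of squared loadings) is unchanged; only the distribution of variance across factors is simplified, which is why rotated and unrotated solutions explain the same total variance.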

Discussion
This first experiment provides little psychometric validation of the reliability and factor structure of the measure. Internal consistency (coefficient alpha) is usually recognized to be the most important and efficient way of measuring reliability (Kline, 1986), yet the BTRSPI seems to fall short of an acceptable standard. The factor structure of the questionnaire did not support Belbin's four-factor solution. In fact, two interpretable orthogonal factors emerged. They are not dissimilar to the old task vs. socio-emotional leader distinction.

The first experiment had at least two limitations. Firstly, it was an analysis of the original measure, which has been superseded and may in fact have been used for purposes for which it was not designed; it never purported to be a psychometrized measure yet was clearly meant to be used in organizational settings. Secondly, the subjects were in large part undergraduates who may have had very limited experience of working in teams, which may or may not be important. Hence it was decided to conduct a second experiment using the revised (and possibly improved) Belbin measure to ascertain its psychometric properties, and to use working adults as subjects. The second version of the scale is not unlike the first, although the changes in items approached 50 per cent. In essence the analysis of the second version was to be the same as the first, focusing on internal consistency and factor structure.

Experiment 2

Method

Subjects
One hundred and ten subjects attending various non-management training courses took part in this study. They were aged 17 to 61 years (M = 28.14); 37 subjects were male and 73 female. About two-thirds were employed and, of these, salaries ranged from under £5000 to well over £25 000. All reported that they had worked in teams.

Questionnaire
The questionnaire was similar to the original version; indeed, over 50 per cent of the items were identical, though many were new. It differed in having nine rather than eight roles, plus one 'filler' item (high in social desirability but not measuring a team role). Hence there were the same seven sections, but now with 10 items in each. The scoring method was similar. Once again, since the original measure is ipsative, a nine-point agree-disagree scale was introduced, similar to that used in Expt 1. Subjects completed the questionnaire anonymously at their place of work. It took about 15 minutes and where possible they were debriefed.

Results
The same basic questions were asked as in the first study and the analysis was similar. Table 2 shows the nine different team roles, some almost identical to those in the original version. Once again the alphas are unacceptably low: three are under .40, and only one is over .70. Comparing Tables 1 and 2, four of the team roles approach satisfactory alpha levels, namely monitor-evaluator, completer-finisher, resource-investigator and shaper. However, the two tables are not strictly comparable because the items making up the two scales are not identical.

Table 2 also shows the correlation matrix. Once again most of the team roles are positively and significantly correlated. Plant and specialist seemed to correlate least with other roles, while implementer correlated with seven of the eight other scales and completer correlated positively with eight other roles, three with r > .50.

The various team-role measures were then correlated with 14 different variables including age, sex, education, salary, occupational and marital status, religious affiliation, religious belief, years since promotion and self-ratings on five variables concerning managerial ability. Fewer correlations reached significance than would be expected by chance, indicating that none of the individual difference variables measured in this study correlated with any of the nine team roles.

Table 2 also shows the varimax factor analysis results, which were not unlike those resulting from the oblique rotation. Again the total role scores, as opposed to the individual items, were factored, so the subject-to-variable ratio was 12.5 to 1. Here three clear factors emerged accounting for well over two-thirds of the variance, yet some roles seemed to load on more than one factor (e.g. resource investigator on factors 1 and 3; completer on factors 1 and 2). While it is possible to make some sense of the factor analysis, it certainly does not confirm the structure set out by Belbin in either the first or second version of the inventory. The first factor has five roles loading on it, which seem to be the more serious, hard-working roles in teams. The second factor has two roles loading > .50, namely team-worker and specialist, both sober, dedicated roles, while the third factor seems to involve the two more extroverted, divergent-thinking roles.


Experiment 3
Expts 1 and 2 could be criticized for not using the scale in its original ipsative form, and Expt 1 for having some 'non-working' subjects, i.e. those who may never have had the experience of working in teams. Hence a third experiment was conducted using exclusively working adults and ipsative scoring. Results from this experiment allow comparison of at least the internal reliability of the normative and ipsative approaches to measurement.


Method

Subjects
One hundred subjects took part in the study. They were aged 29 to 60+ years and over 80 per cent were male. All were participants of small developmental workshops run by a psychological management consultancy. They were all full-time employees of large private and public organizations and were used to working in teams.

Questionnaire
The original questionnaire with standard instructions was used. Subjects completed the questionnaire as part of an exercise on team work.

Results
Table 3 shows that only one of the eight alphas (for shaper) reached the generally accepted > .70 criterion. Unlike the normative version, many of the correlations between the scales were negative; thus plant and company worker correlate r = -.51, which is in line with Belbin's original theory. Indeed, the correlational analyses presented in Tables 1 and 3 are dramatically different, suggesting the effect of the different scoring methods. Most correlations were under .50 and lower than for the normative version. Furthermore, both the factor analysis (of the total scores per role) and the correlational analysis proved less clearly in line with Belbin's scheme than the results from the normative analysis. However, as Belbin suggested, four factors did emerge, even though the loadings did not confirm his original classification. The factor analysis in fact suggested that the factors were bipolar, with the roles 'plant' and 'company worker' at the poles of one factor and 'shaper' and 'team worker' at the poles of another. Though these are perfectly understandable, they do not seem to fit the theoretical system proposed by Belbin (1981).


General discussion
The results of these three experiments provide little psychometric support for the structure of Belbin's inventories. Neither the internal reliability nor the factor structure of either inventory (original and revised) gives confidence that they could have predictive or construct validity. However, it should be pointed out that the relatively small samples in the experiments mean that the factor structure may well be unstable, and there was little overlap between these analyses. Indeed, the ipsative and normative methods yielded strikingly different results, contrary to the findings of Saville & Willson (1991).

It is clear from both analyses that the 4 (team leaders, creative thinkers, negotiators and company workers) × 2 taxonomy first proposed by Belbin could not be supported using data derived from normative or ipsative scaling. Yet some of the team roles are clearly closely related, like shaper and monitor-evaluator, both of which had consistently high internal reliability. These results do not necessarily invalidate Belbin's (1981) 'theory' or classification, but they do suggest that the test as it is designed does not yield reliable, internally consistent scores that are related in the way the theory suggests. If the team-role questionnaire had been theory- as well as empirically driven, relying on established psychological processes in correlational and experimental psychology, it is possible that its factor structure would have been clearer.

As to the issue of the necessary inferiority of ipsative scoring, this study does not suggest, as Saville & Willson (1991) have noted, that ipsative scoring overestimates reliabilities. In five of the eight team roles the alphas were higher (by between .12 and .27) for the normative version. However, it should be pointed out that whichever method was used, all the team-role scales had unacceptably low alpha coefficients, provided, of course, alphas are not seen as merely an index of redundancy (Boyle, 1991).

Supporters and/or users of this inventory will no doubt question the degree to which the results from the non-ipsative form can be applied to the original ipsative form. There are potential problems in converting an ipsatized scale to a non-ipsatized scale and then carrying out reliability and factor analyses. Suffice it to say that there is no theoretical (or indeed empirical) reason why the inventory should be ipsative and there is almost no published evaluative work on the inventory using the ipsative scoring. Ipsative scoring methods constrain subjects in their responses more than the usual Likert method, and because subjects find them difficult, they tend to make more errors.

These results do have implications for team-role profile interpretation. Bearing in mind the low internal consistencies of the scales and the high intercorrelations (positive for normative scoring, negative for ipsative) between them, the reliability of differences between scale scores is likely to be negligible. This is important, as profile interpretations implicitly, if not explicitly, involve the analysis of differences between scale scores. For example, the reliability of the difference between CO (coordinator) (alpha = .55) and TW (team worker) (alpha = .34), given their intercorrelation of .41, would be only .06 (assuming equal variances).
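That figure of .06 follows from the standard formula for the reliability of a difference score between two equally variable scales; a quick check using the values quoted above:

```python
def difference_reliability(r_xx: float, r_yy: float, r_xy: float) -> float:
    """Reliability of the difference X - Y for two scales with equal
    variances: ((r_xx + r_yy) / 2 - r_xy) / (1 - r_xy)."""
    return ((r_xx + r_yy) / 2 - r_xy) / (1 - r_xy)

# CO (alpha = .55) vs. TW (alpha = .34), intercorrelated r = .41
print(round(difference_reliability(0.55, 0.34, 0.41), 2))  # → 0.06
```

The formula makes the problem explicit: the more highly two scales correlate, the less reliable the difference between them, so low alphas combined with high intercorrelations leave almost nothing dependable in a profile comparison.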

It seems from Belbin's (1981) extensive observations of team functioning that he has identified a variety of roles which individuals characteristically choose when working in teams. Furthermore, the particular range of roles in a team may well affect the functioning of that team. Yet there remains some doubt, from a psychometric point of view, whether he has been able to provide a reliable measure of these role preferences in either the first or second version of this scale.

This inventory has attracted a good deal of interest and support from management trainers and consultants but little from psychometricians. If either version is to be used in the process of selection, it is important that the psychometric properties of the scale are investigated. As reliability sets the upper limit on validity, it seems that the first task of the test constructors is to improve the internal reliability. Secondly, the possible response sets causing high positive intercorrelations between all the subscales need to be tackled. Finally, more theoretical and empirical attention needs to be paid to the higher-order structure of the eight team roles.

References
Adair, J. (1986). Training for Leadership. London: Macdonald.
Anastasi, A. (1988). Psychological Testing. New York: Macmillan.
Argyle, M., Furnham, A. & Graham, J. (1981). Social Situations. Cambridge: Cambridge University Press.
Belbin, M. (1981). Management Teams. London: Heinemann.
Belbin Associates (1988). Interplace: Matching People to Jobs. Cambridge: Belbin Associates.
Boyle, G. (1991). Does item homogeneity indicate internal consistency or item redundancy in psychometric scales? Personality and Individual Differences, 12, 291-294.
Furnham, A. (1992). Personality at Work: The Role of Individual Differences in the Work Place. London: Routledge.
Gorsuch, R. (1974). Factor Analysis. London: Saunders.
Handy, C. (1985). Understanding Organisations. Harmondsworth: Penguin.
Hogg, C. (1990). Team building. Personnel Management Factsheet.
Johnson, C., Wood, R. & Blinkhorn, S. (1988). Spuriouser and spuriouser: The use of ipsative personality tests. Journal of Occupational Psychology, 61, 153-161.
Kline, P. (1986). A Handbook of Test Construction. London: Methuen.
McCann, R. & Margerison, C. (1989). Managing high-performing teams. Training and Development Journal, 11, 53-60.
Myers, I. & McCaulley, M. (1985). Manual: A Guide to the Development and Use of the Myers-Briggs Type Indicator. Palo Alto, CA: Consulting Psychologists Press.
Saville, P. & Willson, E. (1991). The reliability and validity of normative and ipsative approaches to the measurement of personality. Journal of Occupational Psychology, 64, 219-230.
COPYRIGHT 1993 British Psychological Society

Author: Furnham, Adrian; Steele, Howard; Pendleton, David
Publication: Journal of Occupational and Organizational Psychology
Date: Sep 1, 1993
