# Vocational qualifications in Britain and Europe: theory and practice (1)

This Note considers three questions bearing on the reform of vocational qualifications in Britain, against the background of changes being introduced by the National Council for Vocational Qualifications. First, in what important respects did Britain need a reformed and centrally-standardised system of vocational qualifications? Secondly, what are the proper criteria for choosing between alternative methods of awarding qualifications? Much that is at issue hinges on the relative importance of externally-marked written tests as compared with practical tasks assessed by an instructor; the discussion and conclusions reached here in relation to vocational testing apply in large measure also to current debates in other contexts, such as the proper role of teacher-assessed coursework in school examinations at 16+ (GCSE) and the official teacher-assessment of pupils at age 7 (SATs) currently being administered in British schools for the first time. Our third question is: in what significant ways do Continental systems of awarding qualifications differ from those now proposed for Britain? (2)

Need for a standardised system

It is now accepted on all sides that Britain needs more of its workforce to be vocationally trained to intermediate levels; that is to say, to craft or technician standards as represented, for example, by City and Guilds examinations (at part 2) or BTEC National Certificates and Diplomas. In engineering, building and related trades there has for long been a system for the award of qualifications that has worked more or less satisfactorily; indeed, the City and Guilds system established at the end of the last century was in many ways an internationally admired pioneer, and its syllabuses and examinations were followed, and are still followed, in many parts of the world. In other occupations, such as office work or retailing, a variety of qualifying bodies grew up in Britain - such as the Royal Society of Arts, the London Chamber of Commerce and Industry, Pitman's, the Institute of Drapers - which developed (what has been called) a 'jungle' of qualifications at a variety of unco-ordinated levels. In many other occupations in Britain there was no system of qualifications at all.

On the other hand, in Germany - but also, for example, in France, Austria, Switzerland and the Netherlands - vocational qualifications and associated part-time or full-time courses were developed which covered virtually the whole range of occupations in the economy. The qualifications awarded usually at ages 18-20 at the end of these vocational courses - the Berufsschulabschluss in Germany and the Certificat d'aptitude professionnelle in France - are as widely understood as, say, O-level passes were recently in Britain (the narrower and clearer range of attainments encompassed by an O-level pass makes it a more appropriate standard of comparison than the new GCSE, with the very wide range of attainments spanned by its awards).

What was essentially wrong in Britain with engineering and building qualifications was that too few people took them - but I believe there was nothing fundamentally wrong with the qualification-procedure itself. For the rest of the economy there was a serious need (a) to make the system coherent, so that equivalent levels could more easily be recognised; and (b) to expand the occupational coverage. These two objectives - greater recognisability and expansion of coverage - are of course to some extent linked. Greater recognisability should lead to greater marketability, reduced transaction costs in the labour market, and to greater demand for qualifications and skills both by employers and by trainees. The benefits to be expected are similar to those ensuing from 'hallmarking'. There are also economies of scale in organising training programmes, and in specifying standards and certification-procedures for a limited number of defined training-occupations at defined levels. Something is of course lost in standardising and restricting the number of training-occupations and levels: just as something is lost in not having a suit made to measure; but, it hardly needs saying, manufacturing to standardised sizes enables many more to buy a decent suit.

Criteria for vocational qualifications

There has always been debate on the relative roles of theory and practice in general education; that debate has been at least as vigorous in relation to vocational education and the award of vocational qualifications. The unsatisfactory extremes of relying solely on 'time-serving' or solely on 'pencil-and-paper' tests have often been contrasted as the basis for the award of vocational qualifications.

In general, it is clear that all procedures for the award of qualifications can provide no more than imperfect indicators of future capability. Before describing how qualifications are awarded in practice, let us for a moment consider the issues in an entirely theoretical way, with the aid of some basic statistical mathematics. Suppose we wish to estimate the capability of a person, not simply in relation to what he has done so far, but in relation to what he is likely to be able to do in the future under similar, but not identical, circumstances to those encountered in the past; for example, the quality of materials may alter, designs may alter, the type of person under whom (or with whom) he will be working may alter. Let $\xi$ denote his true, but yet unobserved, capability in the future; and let $x$ denote his performance as measured by some test-procedure based on his past performance in specimen tasks. Without affecting the argument, these can both be considered as multi-dimensional - relating, for example, to speed of work, accuracy of work, cleanliness, etc. In choosing amongst alternative test-procedures we have to accept, as said, that none will be wholly accurate; and we have also to accept that testing is an expensive process, and only limited resources can be devoted to it.

The expected total discrepancy between test and actual performance can be divided into two components. In statisticians' terms they correspond to bias and variance; in educationists' terms they correspond, respectively, to Validity and Reliability (3). In detail - when choosing between alternative estimators, or between alternative test-procedures, we wish to minimise:-

(a) the bias: that is, in a sufficiently large number of repeated applications we would like the expected value to correspond to the true value; that is, we wish to minimise $E\{x\} - \xi$; and

(b) the variance: as between alternative test-procedures which were equally satisfactory from the point of view of their bias, we choose the one that has the minimum variability in repeated applications (whether by different examiners, or on different samples of questions or tasks); that is, we choose the alternative which yields the minimum value of $E\{(x - E\{x\})^2\}$.

These two components contribute to the total discrepancy between test and actual performance as follows:

$$E\{(x - \xi)^2\} = E\{(x - E\{x\})^2\} + (E\{x\} - \xi)^2$$

i.e. Total mean-square-error = Variance + (Bias)$^2$ = Reliability + (Validity)$^2$.
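The decomposition can be checked numerically. The sketch below is an illustration only: the function name `mse_decomposition`, the assumed true capability of 60, and both sets of marks are hypothetical, chosen to contrast a consistent but slightly biased procedure with an unbiased but erratic one. Each listed mark is assumed to be equally likely.

```python
from statistics import fmean


def mse_decomposition(scores, true_value):
    """Return (mse, variance, bias), treating each score as equally likely."""
    mean_score = fmean(scores)
    bias = mean_score - true_value
    variance = fmean([(x - mean_score) ** 2 for x in scores])
    mse = fmean([(x - true_value) ** 2 for x in scores])
    return mse, variance, bias


TRUE_CAPABILITY = 60  # hypothetical true value (xi in the text)

# Hypothetical marks from repeated applications of two test-procedures.
consistent_biased = [62, 64, 64, 66]   # small spread, mean above the true value
unbiased_erratic = [40, 50, 70, 80]    # mean equals the true value, wide spread

for label, scores in [("consistent, biased", consistent_biased),
                      ("unbiased, erratic", unbiased_erratic)]:
    mse, variance, bias = mse_decomposition(scores, TRUE_CAPABILITY)
    # Total mean-square-error equals variance plus squared bias.
    print(f"{label}: mse={mse:.1f} variance={variance:.1f} bias={bias:.1f}")
```

On these invented figures the biased but consistent procedure has the smaller total error (18 against 250), which is the sense in which neither component can be judged on its own.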

In contrasting written and practical testing of vocational capability, it is widely agreed that written tests have greater Reliability in the sense that different external examiners would give much the same marks if they independently examined a group of candidates. On the other hand, it is argued by those of 'modern' views that written tests have a lower Validity, since they are applied under 'artificial' examination conditions and do not test what the candidate actually does in the course of his work; on that view, the greatest Validity attaches to the assessment of practical tasks carried out by the candidate in a workplace environment, preferably in the course of his normal work and assessed by his normal workplace supervisor. Any lower Reliability of such procedures, whether resulting from the supervisor knowing his own trainee or from any other cause, is of no consequence, it has sometimes incautiously been suggested (4).

The view in favour of giving great weight to written testing can perhaps be summarised as follows. First, any argument that bases itself on the notion that Validity (i.e. lack of bias) is all that matters is essentially wrong. We need to be concerned with the total expected error associated with a qualification-procedure (i.e. Validity plus Reliability); we are likely to be misled if we focus on only one component. Secondly, if in reality there is a relation such that procedures of high Reliability have low Validity, and vice versa, then that relation needs careful empirical research. The relation is likely to vary from one occupation to another; for example, it is likely to depend on the relative importance in each occupation of applied craft tasks and of planning tasks. Thirdly, we have to take into account the costs of testing. One simple rule seems to hold very widely, namely, that pencil-and-paper tests are quicker and cheaper to administer than assessing practical tasks; consequently, the written-test method can examine a much wider range of activities per unit of resources devoted to certification.

A simple example may not be out of place. A carpenter or mechanical fitter needs to know which type of metal screw to choose for each job; screws come in a myriad of different lengths, diameters, threads, heads (flat, round, Phillips, etc.); and in different metals (brass, steel, chrome, etc.). If a final assessment of capabilities had to wait till the candidate had used each type in the course of his normal work in front of his supervisor, and had done so properly on a sufficient proportion of repeated occasions, it would take a very, very long time for him to be judged as qualified. On the other hand, a few specimen written questions, such as: -

Which kind of screw would you use for fitting a mirror to a bathroom wall, and why?

would only take a few minutes. By testing that the candidate knows why he is doing what he is doing, and not merely observing that he is doing it correctly, we attain greater confidence that he can operate under the variety of different circumstances that arise in practice. Notice also that he needs to be tested not only on which is the right type of screw, but, if that type is not readily available, he needs to know which of the available alternatives are acceptable, even if less than ideal; and he also needs to know which are not acceptable, even if the customer would not immediately twig. The case for testing knowledge, and not merely observing practice, is thus a strong one even for the simplest of tasks.

But let us return to the costs of testing. Inevitably only a sample of relevant knowledge and skills can be tested. Otherwise not only would the direct costs of examination become excessive, but so would the indirect costs; as HMI recently noted, the new NVQ assessment procedures have already 'encroached on the time available for teaching and learning'. This was said in relation to engineering qualifications; but HMI were probably also influenced by the example they quoted of the hairdressing NVQ which involves a '1000 task checklist' (5).

Clearly, the greater the number of test-items, the greater can be our confidence in the final verdict - but also the greater is the cost. By the familiar statistical rule, a doubling in the required precision requires a quadrupling in the number of test-items. This rule applies if the observations are independent; if they are correlated and, so to speak, to some extent test the same capability in another way, then more than a quadrupling will be necessary (it may not even be possible to double the precision after a certain point). In other words, the way the first ten questions or ten tasks are dealt with by the candidate tells us a great deal about him; if we aim to double our confidence in our judgment we are likely to require many more than forty questions or tasks. Further, because practical tests are so very much more expensive than written tests, both in administration and marking, it is efficient when working within a limited budget to allocate more questions to written tests than to practical tests. A complex balancing exercise is thus involved in the economic design of test-procedures; it is not surprising that in reality they are developed only slowly over the years, often with step-by-step experimentation, and require an intimate knowledge of the details of each occupation. To illustrate the complexity of what is involved, the mathematical principles governing the optimal mix of theoretical and practical tests are developed in the Appendix below; to derive orders of magnitude, some examples are worked out there under plausible assumptions regarding relative costs, precision, and intra-correlations. It appears, for example, that even in craft-occupations in which practical aspects might account for, say, three-quarters of the total required skills (and theoretical aspects account for one quarter), it may be efficient to allocate only one quarter of the test-questions to practical tests and three-quarters to written theoretical tests. Other assumptions may yield different mixes; but it is clear that the cost factor provides strong rational ground for carrying out so much vocational testing in a written form.
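The point that correlated test-items may need far more than a quadrupling can be made concrete. The sketch below assumes the equal-correlation model used in the Appendix, under which the variance of an average of n marks is proportional to rho + (1 - rho)/n. The function name `items_needed` and the specimen correlations (0, 0.02 and 0.05) are illustrative assumptions, not values taken from the text.

```python
import math
from fractions import Fraction


def items_needed(n0, rho, factor=2):
    """Smallest number of items giving `factor` times the precision of n0 items.

    Assumes the variance of the average mark is proportional to
    rho + (1 - rho) / n, where rho is the common correlation between
    the errors on any two items. Returns None when the target variance
    lies at or below the floor set by rho, so that no number of extra
    items can reach it.
    """
    rho = Fraction(rho)
    variance_now = rho + (1 - rho) / n0
    target = variance_now / factor ** 2  # doubling precision quarters the variance
    if target <= rho:
        return None
    return math.ceil((1 - rho) / (target - rho))


# Starting from ten items, how many are needed to double the precision?
for rho in (Fraction(0), Fraction(1, 50), Fraction(1, 20)):
    print(f"rho={float(rho):.2f}: items needed = {items_needed(10, rho)}")
```

Exact fractions are used only to avoid rounding at the boundary cases. With independent items the familiar forty are enough; on this model a correlation of only 0.02 raises the requirement to 104, and at 0.05 the target cannot be reached at all.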

European practice

We are now ready to outline how vocational qualifications are awarded in Continental Europe, concentrating on those aspects which differ from Britain under the principles promoted by NCVQ. The number of accredited training-occupations, and the number of associated vocational qualifications, is limited to just under 400 in Germany, France and the Netherlands; in Britain, under NCVQ arrangements which give priority to the views of employers, much larger numbers of approved qualifications are likely to emerge - perhaps running into very many thousands. The Continental approach yields a smaller total because breadth is demanded on all approved training courses in the interests of nationwide standards and the transferability of skills; the British approach emphasises the tailoring of qualifications to meet as far as possible the varying needs of employers. For example, to gain an NVQ in engineering (at level 3) it is proposed that the candidate should be able to choose any six so-called Segments out of an available menu of some 250. If there were no restriction on choice, this would yield a theoretical maximum number of combinations running to hundreds of billions (roughly $3 \times 10^{11}$ for a menu of 250)! In practice a limited number of favoured combinations will no doubt be settled upon, though larger than under the previous arrangements developed by the Engineering Industry Training Board (6). While closely-tailored specialisation may encourage British employers to provide training facilities, and encourage them to contribute to the finance of training, it is not obvious that carrying the process to such an extreme promotes an adaptable economy, nor is it in the long-term interest of employees who need to be able to change their type of work.
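The size of the menu of possible Segment combinations is easily computed. The sketch below takes the menu to contain exactly 250 Segments, which is an assumption (the text says only 'some 250'), and counts selections of six both without and with regard to order.

```python
import math

MENU_SIZE = 250       # assumed exact; the text gives only "some 250"
SEGMENTS_CHOSEN = 6

# Unordered selections: which six Segments are taken, regardless of sequence.
unordered = math.comb(MENU_SIZE, SEGMENTS_CHOSEN)

# Ordered selections: the same six taken in a different sequence count separately.
ordered = math.perm(MENU_SIZE, SEGMENTS_CHOSEN)

print(f"unordered: {unordered:,} (~{unordered:.2e})")
print(f"ordered:   {ordered:,} (~{ordered:.2e})")
```

For a menu of exactly 250 this gives about 3.2 x 10^11 unordered selections and about 2.3 x 10^14 ordered ones; either way the number is vastly greater than the few hundred training-occupations recognised on the Continent.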

Let us now consider the length of course and qualification procedures; for the sake of brevity we focus on the German system. The main vocational qualification in Germany is usually awarded following three years of apprenticeship, combined with day-release at college; that is, at about age 18 or 19. The qualification is not based on length of college attendance (this has to be emphasised, as there has been some misunderstanding), but on success in final examinations; the length of college attendance is reduced for those who have higher initial qualifications, or extended for those who have failed their examinations (there is a limit to the number of times that an examination can be repeated - usually only once or twice). Examinations are both theoretical (written) and practical; they are taken at the end of the apprenticeship period, and also at an interim stage. The written examinations cover vocational and general subjects (mathematics, language, social studies, etc.); they are externally set and externally marked. About half a dozen papers are usually involved; the inclusion of general subjects indicates clearly that vocational education is intended to be a form of continuing education (comparable to the philosophy of the previous 'Continuation Colleges' in England). The practical examination may extend for more than a whole day (for example in building work) and usually includes an oral test; it is also externally marked, usually by three examiners, none of whom is permitted to know the examinee.

An independent jury (used in French to mean a board of examiners) is also regarded as an essential feature of the French vocational qualification system, to preserve the 'authenticity of the procedures' and to avoid suspicions that diplomas are handed out for improper reasons - 'pour faire plaisir aux gens' (to please people), as pointed out in a recent article on the French system (7).

In addition to attaining passes in both written and practical examinations, the German trainee has to produce a satisfactory record of completion of the centrally-specified list of tasks to be carried out as part of his apprenticeship.

It is this last requirement which has become the over-riding element in the NVQ approach. A vast, highly expensive and (in my view) largely unnecessary re-specification of existing qualifications has been demanded by NCVQ in order to re-express them in terms of basic 'can do' tasks, to be carried out as far as possible in front of a supervisor at work; this has affected established and highly experienced organisations, such as City and Guilds, BTEC, EITB, etc. I have no doubt that the satisfactory execution of a sample of practical tasks is an important part of a sensible qualification process. But it is, I hope, evident from what has been said here that the German and French requirements and safeguards, as listed above, are sensible and efficient; and that they are particularly important in ensuring reliability of the qualification - so promoting a more efficient market mechanism for the allocation of scarce skills, and thereby promoting the acquisition of yet higher skill-levels.

What has actually been put into effect by those responsible for vocational qualifications in other successful economies, as well as in successful training sectors here, should - in my view - provide a better guide to what is generally required in this country today than reliance on newly-formulated theoretical principles - at least in the present state of knowledge, and until much detailed empirical research has been carried out on the relevant parameters (of the type defined in the Appendix). Perhaps, in the light of experience, the authorities in this country will yet re-consider what are the right principles governing a national system of vocational qualifications, and European experience will be judged to provide important and relevant lessons.

Mathematical Appendix

ON THE OPTIMUM MIX OF PRACTICAL AND THEORETICAL TESTS

This appendix is concerned with the principles governing the optimum mix of different types of tests in the award of a qualification. The problem arises because, for example, written ('theoretical') tests and practical tests can be administered and marked at typically very different unit-costs, and marked with different degrees of precision. For a given required degree of precision in the combined final mark, and for a given total of resources devoted to testing, it is consequently advantageous to over-represent those types of test that have a lower unit-cost, after allowing for differences in their relative precision; the marks on the constituent tests have of course to be weighted to reflect the a priori required relative importance of the different types of skill represented by the different types of test (8).

For simplicity we need consider just two types of test-questions, theoretical and practical, denoted by subscripts $t$ and $p$ respectively. Suppose there are $n_t$ theoretical questions on which a candidate obtains an average mark of $M_t$; and that the mark $x_{it}$ for each theoretical question has an equal measure of uncertainty attached to it (because of errors of measurement, etc.), denoted by its sampling error $\sigma_t$. The sampling error of the average $M_t$ is then, in the usual way, $\sigma_t/\sqrt{n_t}$. Corresponding symbols ($n_p$, $x_{ip}$, $M_p$, ...) relate to the practical questions.

The average mark M awarded on the combined tests is a weighted average

$$M = w_t M_t + w_p M_p, \qquad (1)$$

where $M_t = \sum x_{it}/n_t$, $M_p = \sum x_{ip}/n_p$, and $w_t + w_p = 1$. Practical skills are clearly more important in craft occupations and, roughly speaking, we may suppose that $w_p$ might be 3/4 and $w_t$ might be 1/4; for technician occupations those weights might be reversed. Let us next assume that a practical question costs $k$ times as much to administer and mark as a theoretical question; it may not be unreasonable to suppose that $k$ lies in the range 10-100, the higher ratio applying if comparison is between practical and written multiple-choice questions (perhaps higher still if the multiple-choice questions are marked mechanically). The total budget available for testing, measured in terms of unit-costs for theoretical questions, is thus

$$B = n_t + k n_p. \qquad (2)$$

Let us also allow for the possibility that the degree of uncertainty attached to the marking of a practical question differs from that attached to a theoretical question by a factor s, that is

$$\sigma_p = s\,\sigma_t; \qquad (3)$$

because of the element of judgment in assessing practical tasks, s is probably greater than 1, but perhaps not greater than 2.

The variance of the total mark, $M$, can then be derived in the usual way, allowing for (3), to give

$$V\{M\} = \sigma_t^2 \left[ \frac{w_t^2}{n_t} + \frac{s^2 w_p^2}{n_p} \right]. \qquad (4)$$

Our problem is to choose $n_t$ and $n_p$ so as to minimise this variance subject to the budget constraint (2). Following the method of Lagrange multipliers, we minimise

$$V\{M\} - \lambda [B - n_t - k n_p] \qquad (5)$$

by partial differentiation with respect to $n_t$ and $n_p$; this yields

$$-\frac{\sigma_t^2 w_t^2}{n_t^2} + \lambda = 0$$

and

$$-\frac{\sigma_t^2 s^2 w_p^2}{n_p^2} + \lambda k = 0.$$

Combining those last two equations and eliminating $\lambda$, we find

$$\frac{n_t}{n_p} = \frac{w_t}{w_p} \cdot \frac{\sqrt{k}}{s}; \qquad (6)$$

that is, the ratio of theoretical to practical test-items should reflect their relative importance in the final marking criteria, but modified by the factor $\sqrt{k}/s$ which reflects the relatively greater cost of practical tests and their different reliability.
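The optimality condition can be checked by brute force. The sketch below assumes independent questions, a variance proportional to w_t^2/n_t + s^2 w_p^2/n_p, and the condition n_t/n_p = (w_t/w_p) x sqrt(k)/s. The function names and the budget of 1000 cost-units are hypothetical choices made for the illustration.

```python
import math


def variance_factor(n_t, n_p, w_p, s):
    """V{M} divided by sigma_t squared, for independent questions."""
    w_t = 1.0 - w_p
    return w_t ** 2 / n_t + (s * w_p) ** 2 / n_p


def best_split_by_search(budget, k, w_p, s):
    """Try every whole number of practical questions the budget allows."""
    best = None
    for n_p in range(1, budget // k + 1):
        n_t = budget - k * n_p  # remaining budget buys theoretical questions
        if n_t < 1:
            continue
        v = variance_factor(n_t, n_p, w_p, s)
        if best is None or v < best[0]:
            best = (v, n_t, n_p)
    return best[1], best[2]


def closed_form_split(budget, k, w_p, s):
    """Split implied by n_t / n_p = (w_t / w_p) * sqrt(k) / s and the budget."""
    ratio = ((1.0 - w_p) / w_p) * math.sqrt(k) / s
    n_p = budget / (k + ratio)
    return ratio * n_p, n_p


print(best_split_by_search(1000, 10, 0.75, 1.0))
print(closed_form_split(1000, 10, 0.75, 1.0))
```

With these specimen values the exhaustive search and the closed form agree to within rounding to whole questions: roughly equal numbers of theoretical and practical questions, even though practical skills carry three-quarters of the weight.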

By way of example we take the specimen values for the parameters mentioned above. For craft-type occupations, we assume that practical skills form three-quarters of the total ($w_p = 0.75$), that a question on a practical test costs ten times as much to administer and mark as a theoretical question ($k = 10$), and that it is marked with equal precision ($s = 1$); the optimum combination is then approximately equal numbers of practical and theoretical questions (rather than 3:1, as might be suggested by the occupational skill-mix). If practical questions cost 100 times more to mark than theoretical questions, and were two-thirds as precise in their marking ($k = 100$, $s = 1.5$), then practical questions should optimally form only 31 per cent of the total. For technician-type occupations, we assume the converse proportions (i.e. $w_p = 0.25$), and practical questions should account for only 10 or 5 per cent of the total number on the alternative assumptions mentioned.
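The specimen figures in the preceding paragraph can be reproduced in a few lines. The sketch assumes the optimality condition takes the form n_t/n_p = (w_t/w_p) x sqrt(k)/s, which is the form that yields the percentages quoted; the function name `practical_share` is an illustrative choice.

```python
import math


def practical_share(w_p, k, s):
    """Optimal share of practical questions in the total number of questions.

    w_p: weight of practical skills in the final mark (w_t = 1 - w_p)
    k:   cost of a practical question relative to a theoretical one
    s:   ratio of marking uncertainty, practical to theoretical
    """
    w_t = 1.0 - w_p
    ratio = (w_t / w_p) * math.sqrt(k) / s  # n_t / n_p
    return 1.0 / (1.0 + ratio)              # n_p / (n_t + n_p)


cases = [
    ("craft,      k=10,  s=1.0", 0.75, 10, 1.0),
    ("craft,      k=100, s=1.5", 0.75, 100, 1.5),
    ("technician, k=10,  s=1.0", 0.25, 10, 1.0),
    ("technician, k=100, s=1.5", 0.25, 100, 1.5),
]
for label, w_p, k, s in cases:
    print(f"{label}: practical share = {practical_share(w_p, k, s):.0%}")
```

The four cases give about 49, 31, 10 and 5 per cent respectively, in line with the figures in the text.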

Lack of independence in marking

We now move to more complex matters. So far we have treated each of a series of test-questions as providing independent information on a candidate's capabilities. In practice a positive correlation is to be expected amongst the 'errors of measurement' in the marks for different questions awarded to any candidate. It arises, for example, because a candidate is examined on a particular but perhaps unrepresentative day; or because he is marked by a particular examiner, with particular views, or particular personal sympathies or antipathies to that candidate. (There is also a formally analogous problem that arises if we are interested in estimating the average mark for a whole class of pupils; in that case the correlation amongst the marks - rather than amongst the errors in the marks - is relevant.) This correlation has consequences for the extent to which an increase in the length of a test (adding to the number of questions) improves the precision of our knowledge of a candidate's capabilities; and for the optimum mix of theoretical and practical tests.

Because of the complexity of the issues, it is best to begin again with the simpler case of one type of test, and to set out assumptions more explicitly. A candidate has to answer $n$ questions on each of which he is awarded a mark $x_i$ ($i = 1, 2, \ldots, n$). His final mark for the test is the average of those marks,

$$M = \sum x_i / n. \qquad (7)$$

We suppose that a great many such questions are available (say, from a computerised data-bank of questions), that any number can be chosen at random, and that the marks are subject to errors which are not independent. We are interested in the rate at which the precision of the average mark rises as we increase the number of questions, that is, the rate at which the variance of M falls as n increases.

We suppose for analytical simplicity that the uncertainty attached to the mark for each question is the same, so that the variance of each mark is

$$V\{x_1\} = V\{x_2\} = \ldots = \sigma^2; \qquad (8)$$

we also assume that the correlation (9) between the errors attached to the marks on any two questions is the same, $\rho$, so that

$$C\{x_i, x_j\} = \rho\sigma^2. \qquad (9)$$

To derive the variance of $M$ we square (7), measuring each mark as a deviation from its expected value:

$$M^2 = \frac{1}{n^2} \left[ \sum_i x_i^2 + \sum_{i \neq j} x_i x_j \right],$$

where the first summation extends over $n$ squared terms, and the second (double-summation) relates to $n(n - 1)$ cross-products. Taking expectations yields

$$V\{M\} = \frac{n\sigma^2 + n(n - 1)\rho\sigma^2}{n^2} = \sigma^2 \left[ \frac{1}{n} + \rho \left( 1 - \frac{1}{n} \right) \right]; \qquad (10)$$

the standard error of the average mark is thus

$$\sqrt{V\{M\}} = \sigma \sqrt{\frac{1}{n} + \rho \left( 1 - \frac{1}{n} \right)}. \qquad (11)$$

There are two familiar extreme cases. If the observations are truly independent, that is if $\rho = 0$, we derive the usual formula for the sampling error of an average, $\sigma/\sqrt{n}$; in this case an increase in the number of questions leads ultimately to complete precision. The other extreme is given by $\rho = 1$, for example if each candidate obtains the same score on every question (but not the same score as other candidates). In such a situation more questions will not change the average mark for any candidate; equation (11) thus shows correctly that the standard error of the average mark does not vary with $n$. The notable feature of the general case (where $\rho$ lies between 0 and 1) is that an indefinite increase in the number of questions does not lead to ultimate complete precision, but only to a finite asymptotic level $\sigma\sqrt{\rho}$. The adjacent table shows that a correlation as low as 0.1 implies negligible improvement in precision after, say, the first forty questions. The reason may be put intuitively as follows. Each additional question may have only a low correlation with any single previous question; but, considered in relation to the whole succession of previous questions - say, all marked by the same examiner - it adds little fresh knowledge. To improve precision radically, it may be better to add independent examiners rather than additional questions.
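The behaviour described here can be tabulated directly from equation (10). The sketch below is not the article's own table; it simply evaluates the standard error of the average mark, as a multiple of the standard error of a single mark, for a few test lengths and correlations chosen for illustration. The function name `relative_standard_error` is a hypothetical label.

```python
import math


def relative_standard_error(n, rho):
    """Standard error of the average of n marks, as a multiple of sigma.

    Uses V{M} = sigma^2 * [1/n + rho * (1 - 1/n)], with rho the common
    correlation between the errors on any two questions.
    """
    return math.sqrt(1.0 / n + rho * (1.0 - 1.0 / n))


test_lengths = (1, 10, 40, 160)
print("rho   " + "  ".join(f"n={n:<4d}" for n in test_lengths) + "  limit")
for rho in (0.0, 0.1, 0.5):
    row = "  ".join(f"{relative_standard_error(n, rho):.3f} " for n in test_lengths)
    # As n grows without limit the standard error tends to sigma * sqrt(rho).
    print(f"{rho:.1f}   {row}  {math.sqrt(rho):.3f}")
```

With a correlation of 0.1 the relative standard error is 0.350 at forty questions and 0.325 at 160, against a floor of about 0.316, so a fourfold lengthening of the test buys very little.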

We are now ready to return to our central problem, namely, how to allocate a limited budget between practical and theoretical questions, taking account of the possibility of correlation between questions. We must allow for three types of correlation: between one theoretical question and another theoretical question; between one practical question and another practical question; and between a theoretical and a practical question. These are denoted by $\rho_t$, $\rho_p$ and $\rho_{pt}$. Proceeding as previously, applying the result of (10) to (4) and including the covariance term (10), we derive the variance of the combined mark on both theoretical and practical questions as

$$V\{M\} = \sigma_t^2 \left\{ w_t^2 \left[ \frac{1}{n_t} + \rho_t \left( 1 - \frac{1}{n_t} \right) \right] + s^2 w_p^2 \left[ \frac{1}{n_p} + \rho_p \left( 1 - \frac{1}{n_p} \right) \right] + 2 s\, w_t w_p \rho_{pt} \right\}.$$

Taking the budget constraint into account as in (5), we eventually derive a modified optimal condition

$$\frac{n_t}{n_p} = \frac{w_t}{w_p} \cdot \frac{\sqrt{k}}{s} \cdot \sqrt{\frac{1 - \rho_t}{1 - \rho_p}};$$

this differs from (6) by the factor $\sqrt{(1 - \rho_t)/(1 - \rho_p)}$. Notice that $\rho_{pt}$ does not affect the optimal mix.

To illustrate the impact of this factor, let us suppose that theoretical questions can be formulated and marked in a moderately independent way, so that their correlation is as low as $\rho_t = 0.1$; but that practical questions have a considerably greater correlation - say, because of the inevitably closer contact between examiner and candidate - so that $\rho_p = 0.5$. Taking the assumptions for craft-type occupations made previously ($w_p = 0.75$, $k = 100$, $s = 1.5$), this leads to an optimal requirement for practical questions to form only 25 per cent of the total number of test-questions, instead of the 31 per cent if the intraclass correlations are zero, or the 75 per cent that practical requirements have in the posited total mix of skills.
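The effect of the correlations on the optimal mix can be checked in the same way. The sketch assumes the modified condition n_t/n_p = (w_t/w_p) x (sqrt(k)/s) x sqrt((1 - rho_t)/(1 - rho_p)), which is the form that reproduces the 25 and 31 per cent quoted; the function name is a hypothetical label.

```python
import math


def practical_share_correlated(w_p, k, s, rho_t, rho_p):
    """Optimal share of practical questions when question errors are correlated.

    rho_t, rho_p: common correlation between errors on any two theoretical
    (respectively practical) questions. The cross-correlation between a
    theoretical and a practical question does not enter the condition.
    """
    w_t = 1.0 - w_p
    ratio = (w_t / w_p) * (math.sqrt(k) / s) * math.sqrt((1.0 - rho_t) / (1.0 - rho_p))
    return 1.0 / (1.0 + ratio)  # n_p / (n_t + n_p)


with_correlation = practical_share_correlated(0.75, 100, 1.5, rho_t=0.1, rho_p=0.5)
without_correlation = practical_share_correlated(0.75, 100, 1.5, rho_t=0.0, rho_p=0.0)
print(f"rho_t=0.1, rho_p=0.5: {with_correlation:.0%}")
print(f"no correlation:       {without_correlation:.0%}")
```

The greater correlation assumed for practical questions lowers their optimal share because each additional practical question adds less independent information than an additional theoretical one.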

There is space here to do no more than mention two further issues that arise in testing and which might benefit from analysis on the above approach. First, the optimum bunching of questions around specified levels in 'criterion referenced' tests (for example, can the candidate drive a car satisfactorily?). Secondly, the appropriate margin of safety in balancing the risks of failing those who ought to be passed (because the assessment was carried out on a 'bad day'), against the risks attached to passing someone who ought to fail (he was assessed on a 'good day', but he suffers from an unusually large number of 'bad days').

NOTES

(1) Originally presented at a seminar at the University of Warwick on 19 March 1991; revised with the benefit of discussion there, and subsequently with my colleagues at the National Institute. It develops ideas previously put forward in a Note in this Review, August 1989. The underlying research forms part of a wider programme of international comparisons of training, education and productivity supported by the Economic and Social Research Council and the Gatsby Foundation, to which bodies my thanks are due. Responsibility for errors remains my own.

(2) Strictly speaking NCVQ applies only to England and Wales, and Scotland comes under a separate body (Scotvec); the issues are however much the same, and nothing of substance is sacrificed if, for convenience of exposition, we refer throughout simply to 'Britain'.

(3) Capitals are attached to these words to indicate their technical connotation here.

(4) For an extreme statement ('we should just forget altogether') by NCVQ's Director of Research, see G Jessup, Outcomes: NVQs and the Emerging Model of Education and Training (Falmer, 1991), p. 191. Similar views are to be detected in earlier publications from the Training Agency of the Department of Employment in their Guidance Notes for the Development of Assessable Standards for National Certification (Sheffield, 1989); see, for example, the remark on the merits (sic) of oral questioning: 'it does not require candidates to be able to read or write' (Guidance Note 5, p. 7). Of course someone may be considered a capable carpenter for many purposes without being able to write; the Continental view would be that an employer who wished to employ him as a carpenter is permitted to do so, but he should not be awarded a Vocational Qualification. On the other hand, NCVQ would be prepared to award a Qualification. One of the dangers of the latter approach is that vocational qualifications will acquire a cumulatively lower status in Britain ('suitable for illiterates'), whereas on the Continent great pains have been taken to enhance their esteem.

(5) HMI, National Vocational Qualifications in Further Education 1989-1990, DES, 1991, pp. 6, 8.

(6) Progress seems to require ever finer grinding. Previously, two Modules were the requirement for a craft qualification in engineering; a Module was then divided into three Segments. On the latest development each Segment is to be divided into an average of four Elements (yielding a total of about a thousand Elements). At the time of writing it seems that extensive negotiations are in progress with NCVQ in several occupational areas. 'Conditional Accreditation' has been granted by NCVQ for some existing qualifications, so that government training subsidies may immediately be received by the industry concerned pending agreement on the longer-term re-structuring of their qualification-procedures in accordance with NCVQ's principles.

(7) E Kirsch, Formation Emploi, Oct-Dec 1990, p. 13.

(8) From a formal point of view the mathematical development that follows in this Appendix is, in essence, no more than an application of the standard theory of stratified and clustered sampling; but I am not aware that it has previously been applied in this context (the algebra here concentrates on the essentials required in the present application and is, I hope, simpler to follow than provided in general texts on sampling theory).

(9) This corresponds to the intraclass correlation which arises 'mainly in biological studies' (G U Yule and M G Kendall, An Introduction to the Theory of Statistics, Griffin, London, 14th edition, 1950, p. 272; the charming application to variations in the length of cuckoos' eggs according to nest of foster parent - Robin, Wren or Hedge Sparrow - will bring joy to many a scientific heart: ibid., p. 280, based on a study in Biometrika, 1905). For its application in cluster sampling see, for example, M H Hansen, W N Hurwitz and W G Madow, Sample Survey Methods and Theory (Wiley, 1953), vol. II, ch. 6.

(10) There are $2 n_t n_p$ covariance terms of the type $x_{it} x_{jp}$ which, on taking expected values, reduce to the simple final term in (12).
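
The reduction described in note (10) can be sketched as follows. This is a reconstruction from the Appendix's stated definitions and not a reproduction of the omitted equation (12). It assumes that the combined mark is $M = w_t M_t + w_p M_p$, that $\sigma_p = s\,\sigma_t$, and that every pairing of a theoretical mark with a practical mark carries the same correlation $\rho_{pt}$.

```latex
\begin{align*}
C\{M_t, M_p\}
  &= \frac{1}{n_t n_p} \sum_{i=1}^{n_t} \sum_{j=1}^{n_p} C\{x_{it}, x_{jp}\}
   = \rho_{pt}\,\sigma_t\,\sigma_p
   = s\,\rho_{pt}\,\sigma_t^{2}, \\
2\,w_t w_p\, C\{M_t, M_p\}
  &= 2\,w_t w_p\, s\,\rho_{pt}\,\sigma_t^{2}.
\end{align*}
```

On these assumptions the cross-product term contains neither $n_t$ nor $n_p$, so it vanishes on differentiation with respect to either; this agrees with the Appendix's remark that $\rho_{pt}$ does not affect the optimal mix.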

This Note considers three questions bearing on the reform of vocational qualifications in Britain, against the background of changes being introduced by the National Council for Vocational Qualifications. First, in what important respects did Britain need a reformed and centrally-standardised system of vocational qualifications? Secondly, what are the proper criteria for choosing between alternative methods of awarding qualifications? Much that is at issue hinges on the relative importance of externally-marked written tests as compared with practical tasks assessed by an instructor; the discussion and conclusions reached here in relation to vocational testing apply in large measure also to current debates in other contexts, such as the proper role of teacher-assessed coursework in school examinations at 16+ (GCSE) and the official teacher-assessment of pupils at age 7 (SATs) currently being administered in British schools for the first time. Our third question is: in what significant ways do Continental systems of awarding qualifications differ from those now proposed for Britain? (2)

Need for a standardised system

It is now accepted on all sides that Britain needs more of its workforce to be vocationally trained to intermediate levels; that is to say, to craft or technician standards as represented, for example, by City and Guilds examinations (at part 2) or BTEC National Certificates and Diplomas. In engineering, building and related trades there has for long been a system for the award of qualifications that has worked more or less satisfactorily; indeed, the City and Guilds system established at the end of the last century was in many ways an internationally admired pioneer, and its syllabuses and examinations were followed, and are still followed, in many parts of the world. In other occupations, such as office work or retailing, a variety of qualifying bodies grew up in Britain - such as the Royal Society of Arts, the London Chamber of Commerce and Industry, Pitman's, the Institute of Drapers - which developed (what has been called) a |jungle' of qualifications at a variety of unco-ordinated levels. In many other occupations in Britain there was no system of qualifications at all.

On the other hand, in Germany - but also, for example, in France, Austria, Switzerland and the Netherlands - vocational qualifications and associated part-time or full-time courses were developed which covered virtually the whole range of occupations in the economy. The qualifications awarded usually at ages 18-20 at the end of these vocational courses - the Berufsschulabschluss in Germany and the Certificat d'aptitude professionnelle in France - are as widely understood as, say, O-level passes were recently in Britain (the narrower and clearer range of attainments encompassed by an O-level pass make it a more appropriate standard of comparison than the new GSCE, with the very wide range of attainments spanned by its awards).

What was essentially wrong in Britain with engineering and building qualifications was that too few people took them - but I believe there was nothing fundamentally wrong with the qualification-procedure itself. For the rest of the economy there was a serious need (a) to make the system coherent, so that equivalent levels could more easily be recognised; and (b) to expand the occupational coverage. These two objectives - greater recognisability and expansion of coverage - are of course to some extent linked. Greater recognisability should lead to greater marketability, reduced transaction costs in the labour market, and to greater demand for qualifications and skills both by employers and by trainees. The benefits to be expected are similar to those ensuing from |hallmarking'. There are also economies of scale in organising training programmes, and in specifying standards and certification-procedures for a limited number of defined training-occupations at defined levels. Something is of course lost in standardising and restricting the number of training-occupations and levels: just as something is lost in not having a suit made to measure; but, it hardly needs saying, manufacturing to standardised sizes enables many more to buy a decent suit.

Criteria for vocational qualifications

There has always been debate on the relative roles of theory and practice in general education; that debate has been at least as vigorous in relation to vocational education and the award of vocational qualifications. The unsatisfactory extremes of relying solely on |time-serving' or solely on |pencil-and-paper' tests have often been contrasted as the basis for the award of vocational qualifications.

In general, it is clear that all procedures for the award of qualifications can provide no more than imperfect indicators of future capability. Before describing how qualifications are awarded in practice, let us for a moment consider the issues in an entirely theoretical way, with the aid of some basic statistical mathematics. Suppose we wish to estimate the capability of a person, not simply in relation to what he has done so far, but in relation to what he is likely to be able to do in the future under similar, but not identical, circumstances to those encountered in the past; for example, the quality of materials may alter, designs may alter, the type of person under whom (or with whom) he will be working may alter. Let [xi] denote his true, but yet unobserved, capability in the future; and let x denote his performance as measured by some test-procedure based on his past performance in specimen tasks. Without affecting the argument, these can both be considered as multi-dimensional - relating, for example, to speed of work, accuracy of work, cleanliness, etc. In choosing amongst alternative test-procedures we have to accept, as said, that none will be wholly accurate; and we have also to accept that testing is an expensive process, and only limited resources can be devoted to it.

The expected total discrepancy between test and actual performance can be divided into two components. In statisticians' terms they correspond to bias and variance; in educationists' terms they correspond, respectively, to Validity and Reliability (3). In detail - when choosing between alternative estimators, or between alternative test-procedures, we wish to minimise:-

(a) the bias: that is, in a sufficiently large number

of repeated applications we would like the

expected value to correspond to the true value;

that is, we wish to minimise

E {x} - [xi]; and

(b) the variance: as between alternative test-procedures

which were equally satisfactory from

the point of view of their bias, we choose the one

that has the minimum variability in repeated

applications (whether by different examiners, or

on different samples of questions or tasks); that

is, we choose the alternative which yields the

minimum value of

E[{(x - E{x})}.sup.2].

These two components contribute to the total discrepancy between test and actual performance as follows: -

E{[(x - [xi]).sup.2} = E{[(x - E{x}).sup.2]} + [[E{x} - [xi]].sup.2]

i.e. Total mean- square-error = Variance + [(Bias).sup.2]

= Reliability + [(Validity).sup.2].

In contrasting written and practical testing of vocational capability, it is widely agreed that written tests have greater Reliability in the sense that different external examiners would give much the same marks if they independently examined a group of candidates. On the other hand, it is argued by those of |modern' views, written tests have a lower Validity since they are applied under |artificial' examination conditions, and do not test what the candidate actually does in the course of his work; on that view, the greatest Validity attaches to the assessment of practical tasks carried out by the candidate in a workplace environment, preferably in the course of his normal work and assessed by his normal workplace supervisor. Any lower Reliability of such procedures resulting from the supervisor knowing his own trainee or for any other reason, it has sometimes incautiously been suggested, is of no consequence (4).

The view in favour of giving great weight to written testing can perhaps be summarised as follows. First, any argument that bases itself on the notion that Validity (ie lack of bias) is all that matters, is essentially wrong. We need to be concerned with the total expected error associated with a qualification-procedure (ie Validity plus Reliability); we are likely to be misled if we focus on only one component. Secondly, if in reality there was a relation such that procedures of high Reliability had low Validity, and vice versa, then that relation needs careful empirical research. The relation is likely to vary from one occupation to another; for example, it is likely to depend on the relative importance in each occupation of applied craft tasks and of planning tasks. Thirdly, we have to take into account the costs of testing. One simple rule seems to hold very widely, namely, that pencil-and-paper tests are quicker and cheaper to administer than assessing practical tasks; consequently, written tests method can examine a much wider range of activities per unit of resources devoted to certification.

A simple example may not be out of place. A carpenter or mechanical fitter needs to know which type of metal screw to choose for each job; screws come in a myriad of different lengths, diameters, threads, heads (flat, round, Phillips, etc.); and in different metals (brass, steel, chrome, etc.). If a final assessment of capabilities had to wait till the candidate had used each type in the course of his normal work in front of his supervisor, and had done so properly on a sufficient proportion of repeated occasions, it would take a very, very long time for him to be judged as qualified. On the other hand, a few specimen written questions, such as: -

Which kind of screw would you use for fitting a

mirror to a bathroom wall, and why?

would only take a few minutes. By testing that the candidate knows why he is doing what he is doing, and not merely observing that he is doing it correctly, we attain greater confidence that he can operate under the variety of different circumstances that arise in practice. Notice also that he needs to be tested not only on which is the right type of screw, but, if that type is not readily available, he needs to know which of the available alternatives are acceptable, even if less than ideal; and he also needs to know which are not acceptable, even if the customer would not immediately twig. The case for testing knowledge, and not merely observing practice, is thus a strong one even for the simplest of tasks.

But let us return to the costs of testing. Inevitably only a sample of relevant knowledge and skills can be tested. Otherwise not only would the direct costs of examination become excessive, but so would the indirect costs; as HMI recently noted, the new NVQ assessment procedures have already |encroached on the time available for teaching and learning'. This was said in relation to engineering qualifications; but HMI were probably also influenced by the example they quoted of the hairdressing NVQ which involves a |1000 task checklist'.(5)

Clearly, the greater the number of test-items, the greater can be our confidence in the final verdict - but also the greater is the cost. By the familiar statistical rule, a doubling in the required precision requires a quadrupling in the number of test-items. This rule applies if the observations are independent; if they are correlated and, so to speak, to some extent they test the same capability in another way, then more than a quadrupling will be necessary (it may not even be possible to double the precision after a certain point). In other words, the way the first ten questions or ten tasks are dealt with by the candidate tells us a great deal about him; if we aimed to double our confidence in our judgment we are likely to require many more than forty questions or tasks. Further, because practical tests are so very much more expensive than written tests, both in administration and marking, it is efficient when working within a limited budget to allocate more questions to written tests than to practical tests. A complex balancing exercise is thus involved in the economic design of test-procedures; it is not surprising that in reality they are developed only slowly over the years, often with step-by-step experimentation, and require an intimate knowledge of the details of each occupation. To illustrate the complexity of what is involved, the mathematical principles governing the optimal mix of theoretical and practical tests are developed in the Appendix below; to derive orders of magnitude, some examples are worked out there under plausible assumptions regarding relative costs, precision, and intra-correlations. It appears, for example, that even in craft-occupations in which practical aspects might account for, say, three-quarters of the total required skills (and theoretical aspects account for one quarter), it may be efficient to allocate only one quarter of the total budget to practical tests and three-quarters to written theoretical tests. 
Other assumptions may yield different mixes; but it is clear that the cost factor provides strong rational ground for carrying out so much vocational testing in a written form.

European practice

We are now ready to outline how vocational qualifications are awarded in Continental Europe, concentrating on those aspects which differ from Britain under the principles promoted by NCVQ. The number of accredited training-occupations, and the number of associated vocational qualifications, is limited to just under 400 in Germany, France and the Netherlands; in Britain, under NCVQ arrangements which give priority to the views of employers, much larger numbers of approved qualifications are likely to emerge - perhaps running into very many thousands. The Continental approach yields a smaller total because breadth is demanded on all approved training courses in the interests of nationwide standards and the transferability of skills; the British approach emphasises the tailoring of qualifications to meet as far as possible the varying needs of employers. For example, to gain an NVQ in engineering (at level 3) it is proposed that the candidate should be able to choose any six so-called Segments out of an available menu of some 250. If there were no restriction on choice, this would yield a theoretical maximum number of combinations in excess of a thousand billion ([10.sup.12])! In practice a limited number of favoured combinations will no doubt be settled upon, though larger than under the previous arrangements developed by the Engineering Industry Training Board (6). While closely-tailored specialisation may encourage British employers to provide training facilities, and encourage them to contribute to the finance of training, it is not obvious that carrying the process to such an extreme promotes an adaptable economy, nor is it in the long-term interest of employees who need to be able to change their type of work.

Let us now consider the length of course and qualification procedures; for the sake of brevity we focus on the German system. The main vocational qualification in Germany is usually awarded following three years' of apprenticeship, combined with day-release at college; that is, at about age 18 or 19. The qualification is not based on length of college attendance (this has to be emphasised, as there has been some misunderstanding), but on success in final examinations; the length of college attendance is reduced for those who have higher initial qualifications, or extended for those who have failed their examinations (there is a limit to the number of times that an examination can be repeated - usually only once or twice.) Examinations are both theoretical (written) and practical; they are taken at the end of the apprenticeship period, and also at an interim stage. The written examinations cover vocational and general subjects (mathematics, language, social studies, etc.); they are externally set and externally marked. About half a dozen papers are usually involved; the inclusion of general subjects indicates clearly that vocational education is intended to be a form of continuing education (comparable to the philosophy of the previous |Continuation Colleges' in England). The practical examination may extend for more than a whole day (for example in building work) and usually includes an oral test; it is also externally marked, usually by three examiners, none of whom is permitted to know the examinee.

An independent jury (used in French to mean a board of examiners) is also regarded as an essential feature of the French vocational qualification system, to preserve the |authenticity of the procedures' and to avoid suspicions that diplomas are handed out for improper reasons - |pour faire plaisir aux gens' (as pointed out in a recent article on the French system) (7).

In addition to attaining passes in both written and practical examinations, the German trainee has to produce a satisfactory record of completion of the centrally-specified list of tasks to be carried out as part of his apprenticeship.

It is this last requirement which has become the over-riding element in the NVQ approach. A vast, highly expensive and (in my view) largely unnecessary re-specification of existing qualifications has been demanded by NCVQ in order to re-express them in term of basic |can do' tasks, to be carried out as far as possible in front of a supervisor at work; this has affected established and highly experienced organisations, such as City and Guilds, BTEC, EITB, etc. I have no doubt that the satisfactory execution of a sample of practical tasks is an important part of a sensible qualification process. But it is, I hope, evident from what has been said here that the German and French requirements and safeguards, as listed above, are sensible and efficient; and that they are particularly important in ensuring reliability of the qualification - so promoting a more efficient market mechanism for the allocation of scarce skills, and thereby promoting the acquisition of yet higher skill-levels.

What has actually been put into effect by those responsible for vocational qualifications in other successful economies, as well as in successful training sectors here, should - in my view - provide a better guide to what is generally required in this country today than reliance on newly-formulated theoretical principles - at least in the present state of knowledge, and until much detailed empirical research has been carried out on the relevant parameters (of the type defined in the Appendix). Perhaps in the light of experience, the authorities in this country will yet re-consider what are the right principles governing a national system of vocational qualifications; and that European experience will be judged to provide important and relevant lessons.

Mathematical Appendix

ON THE OPTIMUM MIX OF PRACTICAL AND THEORETICAL TESTS

This appendix is concerned with the principles governing the optimum mix of different types of tests in the award of a qualification. The problem arises because, for example, written (|theoretical') tests and practical tests can be administered and marked at typically very different unit-costs, and marked with different degrees of precision. For a given required degree of precision in the combined final mark, and for a given total of resources devoted to testing, it is consequently advantageous to over-represent those types of test that have a lower unit-cost, after allowing for differences in their relative precision; the marks on the constituent tests have of course to be weighted to reflect the a priori required relative importance of the different types of skill represented by the different types of test (8).

For simplicity we need consider just two types of test-questions, theoretical and practical, denoted by subscripts t and p respectively. Suppose there are [n.sub.t] theoretical questions on which a candidate obtains an average mark of [M.sub.t]; and that mark [x.sub.it] for each theoretical question has an equal measure of uncertainty attached to it (because of errors of measurement, etc.), denoted by its sampling error [sigma.sub.t]. The sampling error of the average [M.sub.t] is then, in the usual way, [Mathematical Expression Omitted]. Corresponding symbols ([n.sub.p],[x.sub.ip], [M.sub.p], ...) relate to the practical questions.

The average mark M awarded on the combined tests is a weighted average

M = [w.sub.t][M.sub.t] + [w.sub.p][M.sub.p], (1)

where [M.sub.t] = [epsilon] [x.sub.it]/[n.sub.t], [M.sub.p] = [epsilon [x.sub.ip]/[n.sub.p], and [w.sub.t] + [w.sub.p] = 1. Practical skills are clearly more important in craft occupations and, roughly speaking, we may suppose that [w.sub.p] might be 3/4 and [w.sub.t] might be 1/4; for technician occupations those weights might be reversed. Let us next assume that a practical question costs k times as much to administer and mark as a theoretical question; it may not be unreasonable to suppose that k lies in the range 10-100, the higher ratio applying if comparison is between practical and written multiple-choice questions (perhaps higher still if the multiple-choice questions are marked mechanically). The total budget available for testing, measured in terms of unit-costs for theoretical questions, is thus

B = [n.sub.t] + [kn.sub.p]. (2)

Let us also allow for the possibility that the degree of uncertainty attached to the marking of a practical question differs from that attached to a theoretical question by a factor s, that is

[[sigma.sub.p] = s [[sigma.sub.t]; (3)

because of the element of judgment in assessing practical tasks, s is probably greater than 1, but perhaps not greater than 2.

The variance of the total mark, M can then be derived in the usual way, allowing for (3), to give

[Mathematical Expression Omitted]

Our problem is to choose [n.sub.t] and [n.sub.p] so as to minimise this variance subject to the budget constraint (2). Following the method of Lagrange multipliers, we minimise

V{M} - [lambda][B - [n.sub.t] - [kn.sub.p]] (5)

by partial differentiation with respect to [n.sub.t] and [n.sub.p]; this yields

[Mathematical Expression Omitted]

and

[Mathematical Expression Omitted]

Combining those last two equations and eliminating [lambda], we find

[Mathematical Expression Omitted]

that is, the ratio of theoretical to practical test-items should reflect their relative importance in the final marking criteria, but modified by the factor [Mathematical Expression Omitted] which reflects the relatively greater cost of practical tests and their different reliability.

By way of example we take the specimen values for the parameters mentioned above. For craft-type occupations, we assume that practical skills form three-quarters of the total ([w.sub.p] = 0.75), that a question on a practical test costs ten times as much to administer and mark as a theoretical question (k = 10), and that is marked with equal precision (s = 1); the optimum combination is then approximately equal numbers of practical and theoretical questions (rather than 3:1, as might be suggested by the occupational skill-mix). If practical questions cost 100 times more to mark than theoretical questions, and were two-thirds as precise in their marking (k = 100, s = 1.5), then practical questions should optimally form only 31 per cent of the total. For technician-type occupations, we assume the converse proportions (ie, [w.sub.p = 0.25), and practical questions should account for only 10 or 5 per cent of the total number on the alternative assumptions mentioned.

Lack of independence in marking

We now move to more complex matters. So far we have treated each of a series of test-questions as providing independent information on a candidate's capabilities. In practice a positive correlation is to be expected amongst the |errors of measurement' in the marks for different questions awarded to any candidate. It arises, for example, because a candidate is examined on a particular but perhaps unrepresentative day; or because he is marked by a particular examiner, with particular views, or particular personal sympathies or antipathies to that candidate. (There is also a formally analogous problem that arises if we are interested in estimating the average mark for a whole class of pupils; in that case the correlation amongst the marks - rather than amongst the errors in the marks - is relevant.) This correlation has consequences for the extent to which an increase in the length of a test (adding to the number of questions) improves the precision of our knowledge of a candidate's capabilities; and for the optimum mix of theoretical and practical tests.

Because of the complexity of the issues, it is best to begin again with the simpler case of one type of test, and to set out assumptions more explicitly. A candidate has to answer n questions on each of which he is awarded a mark [x.sub.i] (i = 1, 2,..., n). His final mark for the test is the average of those marks.

M = [epsilon] [x.sub.i]/n. (7)

We suppose that a great many such questions are available (say, from a computerised data-bank of questions), that any number can be chosen at random, and that the marks are subject to errors which are not independent. We are interested in the rate at which the precision of the average mark rises as we increase the number of questions, that is, the rate at which the variance of M falls as n increases.

We suppose for analytical simplicity that the uncertainty attached to the mark for each question is the same, so that the covariance between any two marks is

V{[x.sub.1]} = V{[x.sub.2]} ... = [o.sup.2]; (8)

we also assume that the correlation (9) between the errors attached to the marks on any two questions is the same, p, so that

C{[x.sub.i], [x.sub.j]} = [p[sigma].sup.2] (9)

To derive the variance of M we square (7)

[Mathematical Expression Omitted]

where the first summation extends over n squared terms, and the second (double-summation) relates to n(n - 1) cross-products. Taking expectations yields

V{M} + [[no.sup.2] + n(n - 1)[[rho][sigma].sup.2]]/[n.sup.2]

= [[sigma].sup.2][(1/n) + [rho] (1 - 1/n]; (10)

the standard error of the average mark is thus

[Mathematical Expression Omitted]

There are two familiar extreme cases. If the observations are truly independent, that is if [rho] = 0, we derive the usual formula for the sampling error of an average, [Mathematical Expression Omitted]; in this case an increase in the number of questions leads ultimately to complete precision. The other extreme is given by [rho] = 1, for example if each candidate obtains the same score on every question (but not the same score as other candidates). In such a situation more questions will not change the average mark for any candidate; equation (11) thus shows correctly that the standard error of the average mark does not vary with n. The notable feature of the general case (where [rho] lies between 0 and 1) is that an indefinite increase in the number of questions does not lead to ultimate complete precision, but only to a finite asymptotic level [Mathematical Expression Omitted]. The adjacent table shows that a correlation as low as 0.1 implies negligible improvement in precision after, say, the first forty questions. The reason may be put intuitively as follows. Each additional question may have only a low correlation with any single previous question; but, considered in relation to the whole succession of previous questions - say, all marked by the same examiner - it adds little fresh knowledge. To improve precision radically, it may be better to add independent examiners rather than additional questions.

We are now ready to return to our central problem, namely, how to allocate a limited budget between practical and theoretical questions, taking account of the possibility of correlation between questions. We must allow for three types of correlation: between one theoretical question and another theoretical question; between one practical question and another practical question; and between a theoretical and a practical question. These are devoted by [[rho].sub.t], [[rho].sub.p] and [[rho].sub.pt]. Proceeding as previously, applying the result of (10) to (4) and including the covariance term (10), we derive the variance of the combined mark on both theoretical and practical questions as

[Mathematical Expression Omitted]

Taking the budget constraint into account as in (5), we eventually derive a modified optimal condition

[Mathematical Expression Omitted]

this differs from (6) by the factor [Mathematical Expression Omitted]. Notice that [[rho].sub.pt] does not affect the optimal mix.

To illustrate the impact of this factor, let us suppose that theoretical questions can be formulated and marked in a moderately independent way, so that their correlation is as low as [[rho].sub.t] = 0.1; but that practical questions have a considerably greater correlation - say, because of the inevitably closer contact between examiner and candidate - so that [[rho].sub.p] = 0.5. Taking the assumptions for craft-type occupations made previously ([w.sub.p] = 0.75, k = 100, s = 1.5), this leads to an optimal requirement for practical questions to form only 25 per cent of the total testing-budget, instead of the 31 per cent if the interclass correlations are zero, or the 75 per cent that practical requirements have in the posited total mix of skills.

There is space here to do no more than mention two further issues that arise in testing and which might benefit from analysis on the above approach. First, the optimum bunching of questions around specified levels in 'criterion referenced' tests (for example, can the candidate drive a car satisfactorily?). Secondly, the appropriate margin of safety in balancing the risks of failing those who ought to be passed (because the assessment was carried out on a 'bad day'), against the risks attached to passing someone who ought to fail (he was assessed on a 'good day', but he suffers from an unusually large number of 'bad days').

NOTES

(1) Originally presented at a seminar at the University of Warwick on 19 March 1991; revised with the benefit of discussion there, and subsequently with my colleagues at the National Institute. It develops ideas previously put forward in a Note in this Review, August 1989. The underlying research forms part of a wider programme of international comparisons of training, education and productivity supported by the Economic and Social Research Council and the Gatsby Foundation, to which bodies my thanks are due. Responsibility for errors remains my own.

(2) Strictly speaking NCVQ applies only to England and Wales, and Scotland comes under a separate body (Scotvec); the issues are however much the same, and nothing of substance is sacrificed if, for convenience of exposition, we refer throughout simply to 'Britain'.

(3) Capitals are attached to these words to indicate their technical connotation here.

(4) For an extreme statement ('we should just forget altogether') by NCVQ's Director of Research, see G Jessup, Outcomes: NVQs and the Emerging Model of Education and Training (Falmer, 1991), p. 191. Similar views are to be detected in earlier publications from the Training Agency of the Department of Employment in their Guidance Notes for the Development of Assessable Standards for National Certification (Sheffield, 1989); see, for example, the remark on the merits (sic) of oral questioning: 'it does not require candidates to be able to read or write' (Guidance Note 5, p.7). Of course someone may be considered a capable carpenter for many purposes without being able to write; the Continental view would be that an employer who wished to employ him as a carpenter is permitted to do so, but he should not be awarded a Vocational Qualification. On the other hand, NCVQ would be prepared to award a Qualification. One of the dangers of the latter approach is that vocational qualifications will acquire a cumulatively lower status in Britain ('suitable for illiterates'), whereas on the Continent great pains have been taken to enhance their esteem.

(5) HMI, National Vocational Qualifications in Further Education 1989-1990, DES, 1991, pp. 6, 8.

(6) Progress seems to require ever finer grinding. Previously, two Modules were the requirement for a craft qualification in engineering; a Module was then divided into three Segments. On the latest development each Segment is to be divided into an average of four Elements (yielding a total of about a thousand Elements). At the time of writing it seems that extensive negotiations are in progress with NCVQ in several occupational areas. 'Conditional Accreditation' has been granted by NCVQ for some existing qualifications, so that government training subsidies may immediately be received by the industry concerned pending agreement on the longer-term re-structuring of their qualification-procedures in accordance with NCVQ's principles.

(7) E Kirsch, Formation Emploi, Oct-Dec 1990, p. 13.

(8) From a formal point of view the mathematical development that follows in this Appendix is, in essence, no more than an application of the standard theory of stratified and clustered sampling; but I am not aware that it has previously been applied in this context (the algebra here concentrates on the essentials required in the present application and is, I hope, simpler to follow than provided in general texts on sampling theory).

(9) This corresponds to the intraclass correlation which arises 'mainly in biological studies' (G U Yule and M G Kendall, An Introduction to the Theory of Statistics, Griffin, London, 14th edition, 1950, p.272; the charming application to variations in the length of cuckoos' eggs according to nest of foster parent - Robin, Wren or Hedge Sparrow - will bring joy to many a scientific heart: ibid., p.280, based on a study in Biometrika, 1905). For its application in cluster sampling see, for example, M H Hansen, W N Hurwitz and W G Madow, Sample Survey Methods and Theory (Wiley, 1953), vol. II, ch. 6.

(10) There are 2 n_t n_p covariance terms of the type x_it x_jp which, on taking expected values, reduce to the simple final term in (12).


Author: | Prais, S.J. |
---|---|

Publication: | National Institute Economic Review |

Date: | May 1, 1991 |

Words: | 5593 |
