17 JULY 1936, Page 24

The Misuse of Marks

The Marks of Examiners. By Sir Philip Hartog, E. C. Rhodes and Cyril Burt. (Macmillan. 8s. 6d.)

NUMERICAL marking is a necessary evil : it simplifies the work of recruiting-sergeants, business-men, Civil Service Commissioners, and all who have to select human beings capable of performing specific mechanical tasks. It is therefore desirable that it should be as efficient as possible, within its proper limits, and that these limits should be clearly recognised. The International Institute Examinations Enquiry Committee, thanks to a grant from the Carnegie Corporation, has been able to make a long series of tests of examiners and examinations, and they have shown clearly that the marks given, whether in written examinations or in interviews, are thoroughly unreliable. A candidate given a mark of distinction by one examiner may be failed by another, and an examiner may give widely differing marks on re-marking the same paper after an interval. The latest publication of the Committee covers the same ground as the Examination of Examinations which appeared last year. It repeats and reinforces the arguments of that pamphlet, and it contains a reply to some of the criticisms which were urged against it. Detailed accounts of the conditions and results of the experiments are given : they appear to have been remarkably thorough. Thus, many of the sets of papers were marked by as many as fifteen examiners, and both candidates and examiners worked as far as possible under normal conditions. The examiners were fully qualified men, paid at the usual rates, and all candidates who underwent special examinations were offered some strong inducement to do their best work.

The book is concerned solely with discrepancies of marking ; it is not concerned with the fact that one paper may be a better test of (say) chemical knowledge than another, nor is it concerned with the variations in the work done by a candidate on different days. Apart from personal influences and differing conceptions of the purpose of the examinations, the discrepancies of two examiners may be due, as Professor Burt shows in an appendix, to differing standards of severity (i.e., the average mark given by the two men may not be the same), or to differing spacing of the marks about this average (i.e., the standard deviation and the curve of distribution may not be the same). These influences are fairly easily detected and put right ; and every examiner and examining body should learn to cope with them. Professor Burt has a gay time expounding as much, and rather more, of the theory of statistics as might be useful : the probable error due to rough-and-ready methods of correction will, in all practical cases, be less than the error due to uncontrollable causes. Professor Burt is at his happiest in his mathematical analysis of the random errors of examiners (including those which he ascribes to unconscious prejudices). He outlines a method for finding the " weight " or coefficient of reliability of each examiner in a set of six (with a smaller group, the large probable error would make the method useless). Thus it seems that every candidate for the post of examiner should be compared with not less than five or six other examiners so that his " weight " may be calculated ; but before embarking on this tiresome and expensive course, it might be worth while for the employers to employ someone to work out the relative " weights " of a set of examiners several times, basing their calculations on three or four different sets of papers, to see whether an examiner's " weight " remains constant. Professor Burt has left this undone, and no doubt a heavy crop of Ph.D.'s is waiting.
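The correction for severity and spacing described above can be illustrated with a rough modern sketch. It is not Professor Burt's procedure, which the review does not set out : it merely assumes that each examiner's marks are shifted and stretched to a common average and standard deviation. The marks, the function name `rescale` and the choice of the pooled marks as the common standard are all invented for the illustration.

```python
from statistics import mean, pstdev


def rescale(marks, target_mean, target_sd):
    """Linearly map marks onto a chosen mean and standard deviation.

    Hypothetical helper: corrects only for an examiner's severity (mean)
    and spacing (standard deviation), not for random disagreement.
    """
    m = mean(marks)
    s = pstdev(marks)
    if s == 0:
        # An examiner who gives every paper the same mark carries no spacing.
        return [float(target_mean)] * len(marks)
    return [target_mean + (x - m) * target_sd / s for x in marks]


# Invented marks for the same five papers from two examiners.
severe = [40, 45, 50, 55, 60]    # low average, marks bunched together
lenient = [50, 60, 70, 80, 90]   # high average, marks spread out

# Assumption: the pooled marks supply the common standard.
pooled = severe + lenient
target_mean = mean(pooled)
target_sd = pstdev(pooled)

adj_severe = rescale(severe, target_mean, target_sd)
adj_lenient = rescale(lenient, target_mean, target_sd)

for a, b in zip(adj_severe, adj_lenient):
    print(round(a, 2), round(b, 2))
```

In this invented case the two examiners rank the papers identically and differ only in severity and spacing, so the adjusted marks agree paper by paper. Disagreement about the order of merit would survive the adjustment, and that residue is what the " weights " are meant to measure.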

Of course, an examiner may be a good scholar although his " weight " is small, and he may be a good examiner, too, if we admit that some of the " unconscious prejudices " may be justified by the subsequent career of the candidates. The statistical method is valid only on the assumption that everything can be marked and numbered : to admit intuitions which can only be verified by subsequent events would upset the whole giddy pyramid. The widest discrepancies were shown in the test of the Civil Service type of interview. The candidate who headed the list given by one panel of examiners was more than half-way down the list given by the other, and vice-versa.

It is a pity that the tests do not appear to have been applied to " examinations " in painting, poetry or music : the results might have alarmed even the Committee and shaken their confidence that everything can be said in numbers if only you juggle long enough. In a footnote, Professor Burt asks :

" Could Bacon's Essays be compared with Macaulay's, on such a basis ? Are not mental elements simply aspects of something which is alive and growing, fragments that are integrated organically into a pattern which can never be treated as the sum of the separate parts ? To my mind such questions form a warning rather than an objection : by the choice of some more complex mathematical function and the use of imaginary quantities, we could in theory, I presume, deal with the problems of psychology as accurately as with those of any other concrete science."

God bless us ! The hardest problem of criticism is that of disentangling the " elements " in a piece of writing, and if the writing is of any importance at all it always has elements which were never known before and which therefore cannot be made known, either in quantity or quality, by an act of comparison. You can only mark things which are dilutions of things already known to you and already analysed : but every personality is as difficult to criticise as any poem. What is the standard of comparison, the perfect candidate ? If it is a person who never makes a mistake in arithmetic, never misses the target, or never frowns at a customer, all well and good : marking is possible. But if education is intended to produce men, and not merely robots for finance and industry, the problem is not so easy : marking (even with the aid of Complex Variables or, say, the Theory of Groups) may not be possible, and it is certainly not practicable. There is a real danger that a more efficient marking system will merely serve to strengthen the stranglehold of examinations on our schools. Let the business-men and their professors of industrial psychology run their examinations as efficiently and as mechanically as they like ; but if good taste, good manners and good morals have any place in education, the less talk about marks the better.

MICHAEL ROBERTS.