Refereeing essays

June 29, 2022 Andy Grayson

Marking is about making complex judgments

This post was published by Wonkhe

Who'd be a football referee? I mean, seriously, who'd do that job? You have to make difficult decisions about complex matters, under time constraints. On occasion, mistakes are made. Sometimes, even when multiple experts replay those complex matters over and over, there is still no consensus on whether a particular outcome is 'correct'. And literally millions of people might be watching.

Thankfully, in Higher Education, we don't have to deal with this. Not, at least, with the 'millions of people watching' bit. However, the rest is familiar. We too have to make difficult decisions about complex matters, under time constraints. Take for example the business of marking an essay.

Bill Shankly

Of course, I'm not pretending that the outcome of marking a student's work is comparable in importance to the outcome of judging whether a defender's arm is in an unnatural position when struck by the ball in the area during a crucial Premiership match. Those two scenarios are not at all equivalent. No, the judgment about the student's piece of work is considerably more important. Because life isn't a game.

So, what can a referee guarantee the players of each team? That the outcomes of all their decisions will be correct? That every other referee will agree with their decisions? I think not. And we can't make those guarantees to students either.

Marking a piece of assessed work at university involves the exercise of academic judgments. We shouldn't pretend to students that the same academic judgments would always be made by any given marker. And we shouldn't pretend that they will always be 'correct' in some sort of objective sense. Furthermore, we should be prepared to have grown-up conversations with students to this effect, in the same way that Jürgen Klopp and Mark Clattenburg might talk once the heat of the match has receded.

After the final whistle

It's long been a concern of mine, how we handle conversations with students who are not happy about the grade that they have received for a piece of assessed work. It's my experience that the more honest we can be with them, the more satisfied they are with the outcome of such conversations.

That might sound paradoxical given that part of that honesty entails saying, "yes, it is the case that a different marker might have given your work a different grade." How on earth does that 'admission' lead to greater student satisfaction?

The answer lies in the notion of empowerment. If we are open and honest with students, we empower them to be confident in their ability to evaluate their own work. If we hide behind the shield of, "we know best, and this is the only grade that could be awarded" then we take away their capacity to disagree. University life should be all about enabling disagreement.

Contracts

This is clearly not to say that 'anything goes'. So, what I want to do in this article is to map out the parameters of what we can (and cannot) promise to students, assuming that we are competent and sufficiently expert to mark their work in the first place. The type of assessed work that I am talking about in this piece is the essay (or 'report' or 'dissertation' or any of those longer pieces that require a marker to come to an overarching, holistic judgment).

In my view, when assessing student work of this type, we are in a position to promise students three things. We can guarantee to mark their work:

Carefully
In line with published criteria
In good faith

That's it. Nothing more, and crucially, nothing less. And that is more or less what a referee can promise. Except their criteria are generally referred to as 'rules'.

Having guaranteed students these things, we have to be honest about the fact that the conclusion we come to about their work is thereafter a matter of academic judgment. And that's the point at which different markers can legitimately, and fairly, come to different conclusions. In the context of the specialised, complex grading decisions that have to be made at degree level, we have to look for fairness in the process of assessment, more than in the outcomes of it.

VAR

The alternative is to find ways of assessing work that are less judgment dependent and more algorithmic. For some of the time, and for some types of subject matter, that's fine. We can use short answer questions and multiple-choice tests (for example) for which there can be much higher levels of inter-marker agreement. The outcomes of such tests can be 'fairer'. This is rather like the process of drawing horizontal lines across a screen and measuring whether the striker's toe is nearer the goal than the defender's heel. A kind of academic VAR.

However, just as VAR cannot be used to determine the outcome of every decision made on a football field (no matter how much money you spend on it), our more 'objective' tests are not suited to every kind of assessment that we need to set. Relying too much on this approach to measuring student performance, in order perhaps to achieve demonstrably higher inter-marker agreement, turns out to be unfair in other ways. It privileges those, for example, who are able to marshal a lot of facts and set-piece answers, to the detriment of those who are stronger at the higher-level skills of working with ideas inventively, synthesising, thinking critically, and so forth.

Publication of criteria is crucial to this implicit contract between marker and marked. The student has to have a way of understanding what is going on in the head of the person who is charged with assessing their work. And there has to be the possibility of discussion ('dialogue') of the grading of the piece of work, framed by reference to those criteria.

In what ways does this empower students to evaluate their own work? Well, if we simply exercise the power inherent in our role, and assert (in effect), "I am right, you are wrong, and all my colleagues would agree on that" then the student who disagrees with us has nowhere to go with their own view of what they have done.

If, on the other hand, they are permitted to see that disagreement can exist with regard to these judgments, then they are in a position to make informed decisions about how to tackle subsequent tasks. They can be encouraged to be confident in working on their approach to these matters. And confidence in studying is an important commodity.

Red or yellow card?

Bear in mind, the types of judgments we are talking about are inherently complex. They often revolve around deciding whether someone has performed 'excellently' on a criterion, or merely 'very well'. The overall grade for a piece of work depends on such fine-cut judgments on multiple criteria, many of which can arguably (and fairly) fall one way or another.

And that is not dissimilar to the way in which the results of a football match can depend on whether a player is sent off for 'using excessive force' instead of merely being cautioned for a 'reckless' challenge, or whether the forward who was standing in an offside position when the winning goal went in was adjudged to have been interfering with play or not. The fact that these sorts of judgments are made in HE when not so very many people are watching makes it extra important to be open and honest about them.