Assessment Through The Looking Glass
The old 5-14 assessments used to give robust, consistent answers to specific questions. The questions were of the form "what percentage of P7 students have scored at least 15 marks in one of a specific set of nationally generated assessment instruments". We thought that the questions were of the form "what percentage of P7 students have passed level D mathematics", but of course that isn't what we were measuring. We were measuring performance in tests. Mistaking test performance for evidence of "passing a level" was an error, but we were right in thinking that we were comparing like with like when we looked at performance data from across Scotland.
Now we have abandoned national tests, and have moved towards measuring "passing a level" based on a combination of a rich bundle of assessments and teacher professional judgement. But what does it actually mean to have passed level 3 in numeracy? CfE documentation talks about assessing breadth of learning, level of challenge and ability to apply learning in unfamiliar contexts. But just how broad? How much challenge? How unfamiliar? None of these things is made clear (nor could they be), but this problem is supposedly addressed by the magic bullet of "moderation". But moderation is not a magic bullet. No amount of moderation will allow us to reach a consensus on where the fundamentally vague boundary between levels lies, let alone on how to measure which side of the boundary any student sits on at any given time. England stopped reporting on levels for good reason. You can read the report from the Commission on Assessment without Levels (published in September 2015) here. It says:
Despite being intended only for use in statutory national assessments, too frequently levels also came to be used for in-school assessment between key stages in order to monitor whether pupils were on track to achieve expected levels at the end of key stages. This distorted the purpose of in-school assessment, particularly day-to-day formative assessment. The Commission believes that this has had a profoundly negative impact on teaching... Levels also used a ‘best fit’ model, which meant that a pupil could have serious gaps in their knowledge and understanding, but still be placed within the level. This meant it wasn’t always clear exactly which areas of the curriculum the child was secure in and where the gaps were.
And yet we persist, and find ourselves in the Alice-in-Wonderland situation where schools are expected to provide the Government with data on the levels achieved by students in Reading, Writing, Listening-and-Talking and Numeracy, and where this data is supposed to be robust enough to allow us to measure progress over time.
Of course we know where this is heading. Standardised tests are coming, and these will be used to determine the level achieved by pupils. The rhetoric is that we will use a combination of "teacher professional judgement" and the results of the tests. But it is categorically impossible for the former to generate anything resembling robust or consistent data. And so, over time, we will quietly drop the teacher judgement element and return to where we started in 5-14, measuring progress through the proxy measure of performance in national tests. I don't see anything particularly bad about this end-point, but it depresses me how many thousands of teacher-hours will have been wasted along the way.