Over the last few weeks and months I have been thinking a lot about assessment within secondary schools: what assessment can and cannot do, how we can make assessment more effective for students and teachers, and how we should report attainment and progress to parents. Before going any further, I would like to thank @bennewmark, @daisychristo, @dylanwiliam, @JamieScott_Edu, @LeadingLearner and especially @teacherhead and @HuntResearchSch for helping to shape my thoughts on this topic.
It is only through assessment that we can find out whether students have learnt what their teachers intended. As Dylan Wiliam says, “assessment is the bridge between teaching and learning”. The core purpose of assessment is to support learning, not to track school performance metrics. Assessment should enable teachers to develop a strong understanding of their students’ strengths and areas for development, and to plan the next steps in learning, in terms of both student support and additional challenge.
Assessment is a complex subject, but any assessment should always consider the following four aspects:
- Purpose – be clear about what you want to assess.
- Validity – is the assessment able to measure what you want effectively? Consider a maths test with a high reading demand: what will and won’t this tell you about the mathematical ability of students with poor literacy skills?
- Reliability – if a different examiner marked the assessment, or the student sat the assessment again with a few different questions, or at a different time of day, how reliable would the assessment be in reporting the same final grade?
- Value – all summative assessments take away from student learning and teacher planning time. Is the cost-benefit analysis worth it?
At this point I would like to talk a little more about reliability. No assessment is perfectly reliable, and not enough parents, teachers and SLT realise this. Were you aware that research found that around 15% of the National Curriculum levels awarded to Year 6 students (from their SATs) in English, and around 10% in Maths, were misclassified? And that the proportion of pupils who will receive an unreliable grade for their English Language GCSE is set to rise from 30% to 45% with the introduction of the new 9-1 GCSE grades?
“Even though [examiners] are doing their best to get things right, inevitably more will fall the wrong side of the boundary. Near every grade boundary there are going to be candidates that have the wrong grade. If there are more boundaries there will be more children getting the wrong grade” – Neil Sheldon, the Royal Statistical Society.
[Note that there will be 17 possible grade combinations in the new GCSE Combined Science qualification!]
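To make Neil Sheldon's point concrete, here is a minimal Python sketch of why more grade boundaries mean more misclassified students. Everything in it is an assumption for illustration only: the 0-100 mark scale, the evenly spaced boundaries, the size of the marking error (`error_sd`) and the function name `misclassification_rate` are all hypothetical, and the output is not meant to reproduce the 30% or 45% figures quoted above.

```python
import bisect
import random


def misclassification_rate(n_boundaries, n_students=20000, error_sd=4.0, seed=1):
    """Share of simulated students whose awarded grade differs from their 'true' grade."""
    rng = random.Random(seed)
    # Assumption: boundaries are evenly spaced across a 0-100 mark scale.
    step = 100 / (n_boundaries + 1)
    boundaries = [step * (i + 1) for i in range(n_boundaries)]
    wrong = 0
    for _ in range(n_students):
        true_mark = rng.uniform(0, 100)                 # the mark the student "deserves"
        observed = true_mark + rng.gauss(0, error_sd)   # the mark after measurement noise
        true_grade = bisect.bisect_right(boundaries, true_mark)
        awarded_grade = bisect.bisect_right(boundaries, observed)
        if awarded_grade != true_grade:
            wrong += 1
    return wrong / n_students


# Same students and same noise each time (fixed seed); only the number of boundaries changes.
rates = {n: misclassification_rate(n) for n in (4, 8, 9)}
for n, rate in rates.items():
    print(f"{n} boundaries: {rate:.1%} of simulated students get the 'wrong' grade")
```

The noise is identical in every run, so any increase in the error rate comes purely from squeezing more boundaries into the same mark scale.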
But marks awarded by examiners aren’t the only source of unreliability. Statutory assessments, such as those sat by Year 11 students at the end of Key Stage 4, need to provide a shared understanding of the general level a student is working at, across large domains of content. To assess everything a pupil knows could take days, and since time and money are in short supply, GCSEs ‘sample’ students’ knowledge, testing certain areas of knowledge and skill in order to make inferences about their broader abilities. This means that if the student sat the test again, with a slightly different ‘sample’ of questions, it is entirely possible they could end up with a different summative grade.
Finally, the students themselves are also a source of unreliability. Students can perform differently depending on illness, time of day, the temperature of the room and whether or not they have eaten breakfast.
It follows then that…
“Grades should be seen not as absolute values but as statistical approximations bounded by clear confidence limits.” – Dr Kevin Stannard
Ok … why bother with assessment then?
Assessment of students is still useful to bridge the teaching-learning gap, but as a staff body we need to realise that assessment is not absolute but fuzzy. The good news is that the larger our group of students, the less fuzzy the assessment figures become (think of a school’s overall Progress 8 figure, with small-ish confidence intervals, versus the Progress 8 figures of its various subgroups, with much larger confidence intervals).
“I think that value-added is a really powerful tool. I think it’s probably most helpful to look at it at the cohort level and not at the individual level” – Daisy Christodoulou
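The link between group size and fuzziness can be sketched with the textbook formula for the confidence interval of a mean, whose half-width shrinks with the square root of the number of students. This is a simplified illustration, not the official Progress 8 methodology; the function name `ci_half_width` and the pupil-level spread `pupil_sd=1.0` (in grade units) are assumptions.

```python
import math


def ci_half_width(n_students, pupil_sd=1.0, z=1.96):
    """Approximate 95% confidence half-width for the mean score of a group.

    pupil_sd is an assumed spread of individual scores, in grade units.
    """
    return z * pupil_sd / math.sqrt(n_students)


# From a single student up to a whole year group (hypothetical sizes).
for n in (1, 10, 30, 200):
    print(f"{n:>3} students: group average known to within about ±{ci_half_width(n):.2f}")
```

Under these assumptions, quadrupling the size of a group only halves the width of its interval, which is why small subgroups stay fuzzy even when the whole-school figure looks precise.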
Assessment in schools is usually split into two distinct categories:
- Assessment of learning aka summative assessment
- Assessment for learning aka formative assessment
Summative assessments should be taken infrequently (certainly not more than twice a year) and usually produce a grade (or scaled score) with a shared meaning of student performance. In order to produce this shared meaning, summative assessments should be valid, reliable, taken in standard conditions, sample from a large domain of content and be able to distinguish between pupils. Summative assessments are the only real way of telling us, not about what a pupil can do at a particular point in time, but about what students have truly learnt and what has changed in their long-term memories, not just in the current term but across previous school terms and Key Stages.
Formative assessments should be done frequently (ideally every lesson) and be recorded as raw marks that have meaning only to the teacher and student. They should be domain specific, with very precise questions that identify student misconceptions and produce feedback for the student and teacher to act upon. They should be repetitive (since practice and repetition improve learning) and, if the aim is for teachers to teach for mastery, students should be expected to get 90-100% of the questions right.
So summative assessments provide a shared and consistent understanding of pupils’ achievements, while formative assessments include the day-to-day practices that help teachers and pupils understand what has and has not been learnt, and put in place actions to address this. An analogy for this is that summative assessment is like a security floodlight that has a wide coverage but is less focused; formative assessment, on the other hand, is like a stage spotlight that has narrow coverage but is very focused.
@teacherhead gave a fantastic talk on assessment at the 2018 Headteachers Roundtable Summit (reproduced on his blog here). One of the slides that really got me thinking was on what an idealised summative/formative assessment regime might look like, reproduced and adapted below:
Do data drops inform teaching and learning?
All teachers will be familiar with the [half] termly summative data drops that the school’s data manager uploads into third-party software (SISRA, 4Matrix etc.) so that SLT can monitor the performance of the KS3, KS4 and KS5 cohorts at their school, but does it actually need to be this way? If the entire dataset was wiped overnight, how much difference would it make to teaching and learning in your school? Answer: probably very little.
While termly data drops might tell SLT where a cohort is at a given point in time, they do nothing to take the cohort further. They do not inform the teaching and learning that needs to take place to deepen students’ understanding of photosynthesis in variegated plants, to find the roots of quadratic equations, or to explore what role Castro played in the Cuban missile crisis. The fact of the matter is that summative macro data does not drive up standards. It is the messier, less clinical, teacher-owned formative data, gathered by teachers every lesson and used to adjust their teaching accordingly under the name of “responsive teaching”, that ultimately drives up standards.
“As long as we set our end goal as mastery of a particular domain, then exams will be valid measures of that goal” – Daisy Christodoulou 
Alas, summative assessments are needed – not just for tracking the progress of cohorts, but also for statutory reporting purposes. Both students and their parents need an indication of where the student is in each subject at particular points in the school year. To this end it makes sense to assess students’ knowledge, and application of that knowledge, using summative assessment techniques. It is important that the summative assessments don’t just examine the most recent term’s work, since this encourages a very shallow approach to learning (and teaching) – students will learn that they only need to remember the material for a few weeks and can then forget it again, so there is little to no incentive for the student to gain the deeper understanding required for long-term recall. Instead, summative assessments should be cumulative in nature, i.e. test not just the current topic, but the entire term’s material, and previous terms’ too. This will often give different student outcomes to what a teacher might expect, but will give a truer picture of likely performance in the final external examination at the end of a Key Stage*. As mentioned previously, summative assessments should be used infrequently: to allow sufficient time to pass within which progress can be measured, to minimise workload for the teacher, and to minimise missed learning opportunities/lesson time for the students involved.
*While exam boards often make available secure mock exams for KS4 and KS5, compiling summative assessments for KS3 may be trickier. To this end, many schools use standardised assessments, such as those provided by GL Assessment, to test their cohorts in the core subjects at specific points during the Key Stage. Using standardised tests can: a) provide schools with an indication of how their pupils compare with their peers nationally; b) reduce the burden for teachers and middle leaders; and c) reduce teaching to the test, as many standardised packages randomise the questions they ask.
Formative assessment, on the other hand, is the bread and butter of teaching and learning – it is formative assessment every single lesson that makes a difference to pupil outcomes (and thus school league table performance metrics). The most important assessment happens minute-by-minute and day-by-day in every classroom, and that is where an investment of time and resources will have the greatest impact on learning. By this I mean low stakes quizzing that allows ALL students to succeed. We should measure what we value, not value what we can measure – these daily low stakes quizzes not only reinforce synaptic connections in the brain, but also send a message to students that we value the time they put into their revision and into transferring that knowledge from their limited working memory into their limitless long-term memory. Remember that it is success that leads to motivation, not necessarily the other way around. Combining the above with spaced practice, an interleaved curriculum, knowledge of Cognitive Load Theory and more evidence based techniques will always lead to improved attainment and progress in all pupils. Please check out the “Ten principles of instruction” by Barak Rosenshine, based on classroom practices of those teachers whose students show the highest gains. By delivering quality teaching & learning in our classrooms, all students and subgroups of students will benefit. As the aphorism goes, “a rising tide lifts all boats”.
In summary, do more of the left hand side and less of the right hand side:
Simplifying Student Reports
To the uninitiated parent, school reports can look like the output from a random number generator, with many of the fields being subjective and open to interpretation – see the example report card by @teacherhead below:
So how do we improve this to make it less subjective and more useful to teachers, parents and ultimately the students?
- Ditch target grades for students (but still make them visible to teachers)
Studies by psychologists such as Locke and Latham (2006) concluded that so long as a person is committed to the goal, has the requisite ability to attain it, and does not have conflicting goals, then there is a positive linear relationship between goal difficulty and task performance. Ben Newmark has written extensively not only about the history of target grades, but also about why students frequently don’t meet Locke and Latham’s criteria, resulting in detrimental outcomes for all (I wholeheartedly recommend reading Ben’s blog on this). One extreme example of the target grade culture in schools can be found here.
Schools lecture students about Carol Dweck’s growth mindset, but tend to give students a target that is effectively ‘fixed’ over the course of their Key Stage. Very high target grades demotivate students (especially girls) and low target grades also demotivate students (if they have a target less than a 4, most students think that they have “failed” already). Even middle prior attaining students tend to stop trying once they have reached their target grade rather than pushing themselves further. Why not have a culture where every student is trying to better themselves, believing that they can achieve to the best of their ability (without putting a ceiling on it)? The idea of ditching students’ target grades is gaining traction in many research schools across the country. Assistant Headteacher Garry Littlewood of Huntington School, York, feels that the above approach is “encouraging more aspiration amongst the students.” That said, I do believe target grades should still be made visible to teaching staff and SLT so that they are aware of each student’s potential. They are also needed in relation to measuring student progress, see later.
- Replace working-at-grades with predicted grades at the end of the Key Stage
GCSE grade descriptors 9 to 1 are designed to summatively capture learning and achievements at the end of a course of study, rather than provide a sequence for learning wherein schools cover the content for grade ‘4’ before moving onto grade ‘5’, which is what ‘working at grades’ imply. Instead, use teacher predicted grades, i.e. the grade a student is likely to get at the end of the Key Stage if they carry on working as they have been. The problem with that, of course, is that it is very difficult to predict how well students will do in the future, so it may be useful to include a range of predicted grades for a given subject, e.g. grade 6±1. Note, parents should be well informed on the meaning of the new GCSE 9-1 numerical grading system, see the adapted graphic from @teacherhead below which also depicts the uncertainty (mentioned previously) around summative assessment grades for GCSE.
- Student Progress
All too often students have report cards that imply that they are on track in all subjects, and not until the last couple of report points before their GCSEs do the reports start to flag up that a given student is underperforming. Why is this? I believe that many school reporting systems are too binary in nature and teachers are very reluctant to report students as ‘off track’ until much too late in the game. Schools need to ensure that a warning flag is put up sooner to both students and parents so that it can be acted upon. To this end I would suggest academic progress in each subject be assigned one of four possible bands: ‘exceeding expected progress’ [PURPLE], ‘meeting expected progress’ [GREEN], ‘working towards expected progress’ [AMBER], or ‘underperforming against expected progress’ [RED] – adapted from Huntington School’s assessment policy. Note that progress can be directly calculated by comparing the end of Key Stage predicted grade against the end of Key Stage target grade (which is only known to staff).
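As a rough sketch of how that comparison could be automated, the Python below maps the gap between predicted and target grade onto the four bands. The band names and colours come from the text above, but the cut-offs (one grade above, on target, one grade below, two or more below) are purely my illustrative assumption, as is the function name `progress_band`; each school would need to agree its own thresholds.

```python
def progress_band(predicted_grade, target_grade):
    """Return (band, colour) from the gap between predicted and target grade.

    The thresholds below are illustrative assumptions, not a published policy.
    """
    gap = predicted_grade - target_grade
    if gap >= 1:
        return ("exceeding expected progress", "PURPLE")
    if gap == 0:
        return ("meeting expected progress", "GREEN")
    if gap == -1:
        return ("working towards expected progress", "AMBER")
    return ("underperforming against expected progress", "RED")


# Hypothetical student with a staff-only target of 5 in every subject.
predicted = {"Maths": 6, "English": 5, "Science": 4, "History": 2}
for subject, grade in predicted.items():
    band, colour = progress_band(grade, target_grade=5)
    print(f"{subject}: predicted {grade} -> {band} [{colour}]")
```

Because the target only appears as an input to the calculation, the report itself can show the band and colour without ever revealing the target grade to the student.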
- Attitude to Learning (AtL)
Attitude to Learning should be reported as either ‘Excellent’, ‘Good’, ‘Insufficient’ (meaning the pupil is coasting), or ‘Poor’ (meaning the pupil takes little or no responsibility for his or her progress), see  for more information. The science behind questionnaires says that you should only ever have an even number of options to force people not to sit on the fence. All too often students see different AtL scores in different subjects, even though they have the same attitude towards their learning in each. A particularly academic student may be seen to be an eager learner in all their subjects and a ‘naughty’ student may be disruptive in the majority of their lessons. Either way, the inconsistencies in their AtL scores across their subjects are not necessarily down to the student but down to the teachers having different expectations. It would therefore make more sense to average the AtL scores across all subjects and report back to the parents the average Attitude to Learning outcome.
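One possible way of averaging is sketched below in Python. It assumes the four labels can be treated as equally spaced scores from 4 (‘Excellent’) down to 1 (‘Poor’), and that an average falling exactly halfway between two labels is rounded up; both choices, and the function name `average_atl`, are assumptions for illustration rather than an established method.

```python
import math
from statistics import mean

# Assumed numeric scores for the four labels (equal spacing is an assumption).
ATL_SCORES = {"Excellent": 4, "Good": 3, "Insufficient": 2, "Poor": 1}
ATL_LABELS = {score: label for label, score in ATL_SCORES.items()}


def average_atl(subject_judgements):
    """Average a student's per-subject AtL labels into one overall label."""
    average = mean(ATL_SCORES[label] for label in subject_judgements)
    # Round to the nearest label, with exact halves rounded up (an assumption).
    return ATL_LABELS[math.floor(average + 0.5)]


# Hypothetical judgements from four different subject teachers.
print(average_atl(["Excellent", "Good", "Good", "Insufficient"]))
```

A school might equally prefer the most common judgement, or the median, if it feels that a single very strict or very generous teacher should not move the overall outcome.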
- Timing of reports and next steps
The timing of reports is always a contentious issue. The later in the academic year it can be done, the more summative information a school will have on a given student, but giving the report at the end of the year is almost useless if the primary purpose is to support learning. It is for each school to decide their own particular schedule for issuing reports (making this known to parents on the school calendar), but I would advise it is done soon after a large summative assessment (and associated data drop) and therefore relatively infrequently – certainly no more than three times a year, and preferably twice a year. Summative grades can only be awarded when students are tested over a wide domain of content over a long period of time. A GCSE summative grade not only tests what has been learnt but also what has been forgotten – we tend to ignore this last part when we look at end-of-topic tests etc. when filling in reports, hence tend to be over-optimistic at times. It is important for SLT to eliminate unnecessary workload associated with data management. This way teachers are more likely to take more time and care over their data entry and therefore make more accurate judgements about their students. Finally, when distributing a student’s report, we need to be clear about what we expect parents and guardians to do with this information. If the effect of the report is to persuade students that effort is more important than talent as a determinant of success, then it is likely to be helpful.
I show below two exemplar reports based on the aforementioned report criteria. The first example is the report of a KS4 student who has FFT20 targets of a 3 in all his/her subjects (targets not visible to the student for reasons given above). This student clearly left Year 6 with relatively low SATs scores; however, he/she is meeting expected progress in the majority of subjects and exceeding expected progress in Maths and Science. So, even though this student is getting relatively low predicted grades, he/she is clearly on track for his/her given starting point. RS is a slight concern, but the tutor’s subject-specific comments will highlight points for improvement to get back on track in this subject (the subject teacher will also have had a conversation with this student before the report is issued).
The second example shows a student with FFT20 targets of a 7 in all his/her subjects (targets not shown), i.e. a student who achieved high SATs results at the end of Year 6. As you can immediately see from the amount of amber and red, this student has poor attendance and an Insufficient attitude to learning, and this is having an impact on almost all his/her subjects. In this instance a parental meeting would be advisable to discuss these issues and put in place supporting measures for this student both inside school and within their home environment.
Finally, all the reports would explain the colour coding on the back, including brief descriptors for Progress and Attitude to Learning e.g. from :
 Wiliam D, Redesigning Schooling – 8: Principled assessment design. The Schools, Students and Teachers network; July 2014. Link here.
 Assessment Review Group, Redressing the balance. NAHT; January 2017. Link here.
 Lord Bew, Independent Review of Key Stage 2 testing, assessment and accountability. June 2011. Link here.
 Christodoulou D, Making Good Progress?: The future of Assessment for Learning. OUP Oxford; February 2017. Link here.
 Millard W, Small I, & Menzies L, Testing the Water: How assessment can underpin, not undermine, great teaching. Pearson; November 2017. Link here.
 Hendrick C & Macpherson R, What Does This Look Like In The Classroom: Bridging The Gap Between Research And Practice. John Catt Educational; September 2017. Link here.
 Huntington School’s Assessment, Recording and Reporting policy, March 2017. Link here.
 Report of the Independent Teacher Workload Review Group, Eliminating unnecessary workload associated with data management. March 2016. Link here.
Rosenshine B, Principles of Instruction; Research-based strategies that all teachers should know. Excellent. Link here.
Durrington Research School, Insights into assessment from ‘what does this look like in the classroom’. Link here.
The Effortful Educator, Less is More: Simple Formative Assessment Strategies in the Classroom. Link here.