The tyranny of school metrics

How do we know if a school is getting better? Or worse? How do we know if the strategies that we choose to improve our schools are working? Or not?

Cause cannot be reliably attributed to effect

Schools are such complex places, with multiple influences on outcomes, that it is almost impossible to directly link cause and effect. Consider the example of improved attainment in reading at the end of KS2 compared to the previous academic year. There are a number of possible contributing factors.

  • Was it because of the now ‘research informed’ reading strategy?
  • Was it because of better attendance?
  • Was it because the children read better books?
  • Was it because behaviour was better?
  • Was it because there were different teachers in Year 6 compared to last year?
  • Was it because of the relative starting points of the two cohorts?

It could have been any of these, all of them or none of them. In truth, it will likely have been a combination of these conditions and more, each playing minor and major roles at different points in time.

To confidently attribute cause to effect is to underestimate complexity. Nevertheless, seeking to make sense of what we see is a natural attempt to impose order when faced with complexity. The illusion of explanatory depth, where people think they understand more than they actually do, explains such attribution of cause to effect, and it is a trap that everyone is susceptible to. Knowing how complex systems such as schools work, and knowing that one is vulnerable to such bias, exemplifies the interplay between types of knowledge that makes up expertise, which is vital to effective decision making and problem solving.

Comparing then and now is more useful than disentangling cause and effect

Despite the futility of attributing cause to effect, evaluating impact is still a necessary part of school improvement. Evaluation helps leaders to zero in on what needs their attention, so they need ways of fully understanding the strengths and areas for improvement in various aspects of school life now, compared with a baseline from a point in the past. That baseline needs a time reference: for example, the beginning of a term or academic year, the month of appointment into a new role, or the publication date of an Ofsted report. Baseline information will be made up of quantitative or qualitative information by which comparisons between then and now can be made. Quantitative information could include attendance figures, statutory assessment figures, test scores or survey results. Qualitative information could include the observations that leaders make from their day-to-day interactions, anecdotes from various stakeholders or examples of children’s work.
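
As a minimal sketch of what a then-and-now comparison might look like on the quantitative side, the snippet below compares a few indicators against a baseline captured at the start of the academic year. The indicator names and figures are hypothetical placeholders, not data from any real school.

```python
# A minimal sketch of a then-and-now comparison against a baseline.
# All indicator names and figures are hypothetical placeholders.

baseline = {  # captured at the chosen time reference, e.g. start of the year
    "attendance_pct": 94.1,
    "ks2_reading_expected_pct": 68.0,
    "staff_survey_wellbeing_avg": 3.4,  # mean response on a 1-5 scale
}

now = {  # the same indicators, gathered this term
    "attendance_pct": 95.3,
    "ks2_reading_expected_pct": 73.0,
    "staff_survey_wellbeing_avg": 3.2,
}

# Compare each indicator with its baseline: this shows *whether* things
# have moved, not *why* they moved.
for indicator, then_value in baseline.items():
    change = now[indicator] - then_value
    print(f"{indicator}: {then_value} -> {now[indicator]} ({change:+.1f})")
```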

Measurement is not an alternative to judgement

Whether it is quantitative or qualitative information that is gathered to evaluate the impact of school improvement efforts, leaders need to beware of the pitfalls in measuring performance or impact. Muller warns of these pitfalls in The Tyranny of Metrics.

Muller’s warnings should resonate with school leaders in an education system where judgements about what children understand and about the effectiveness of schools are reduced to grades. Grades are a mechanism to measure and an attempt to bring order to complexity but, as Muller concludes, measurement demands judgement:

  • Whether to measure
  • What to measure
  • How to evaluate the significance of what’s been measured
  • Whether rewards or penalties will be attached to the results
  • To whom to make the measurements available

Avoid replacing judgement with numerical indicators

The argument for metrics is that reducing complex information to points on a scale or to numerical values makes comparisons within and between schools easier; the results are more easily understood and are therefore useful for holding leaders to account. Such metrics have been a staple in schools for years: GCSE grades, SATs scaled scores, progress measures, teacher assessments and Ofsted grades. Although this simplification makes information comparable, what is lost is history, context and meaning:

Caveats and ambiguities are peeled away, giving the illusion of certainty and transparency.

Jerry Muller

The illusion of certainty is a problem for school leaders because it oversimplifies complex problems. Complexity cannot be measured. Just as a mean average hides the spread of the numbers it summarises, any metric that leaders select masks what is actually useful: the context and nuance that the metric stands in for.
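
A worked example makes this concrete. In the sketch below, two hypothetical cohorts of scaled scores share exactly the same mean, yet the distributions behind that single headline figure are very different; the scores are invented for illustration.

```python
from statistics import mean, stdev

# Two hypothetical cohorts of KS2 scaled scores with the same mean.
cohort_a = [100, 101, 99, 100, 102, 98, 100]   # tightly clustered
cohort_b = [80, 120, 95, 105, 110, 90, 100]    # widely spread

print(mean(cohort_a), mean(cohort_b))    # 100 and 100: identical headline figure
print(stdev(cohort_a), stdev(cohort_b))  # very different spread, hidden by the mean
```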

Not everything that matters is measurable

Schools exist so that children can learn but there are differing opinions about what schools are for. Summarising various authors on the justification for mass public education, Wiliam categorises those justifications into four areas:

  • Personal empowerment
  • Cultural transmission
  • Preparation for citizenship
  • Preparation for work

There are other important outcomes of school improvement that, if a measurement rubric is applied, lose their nuance. Wellbeing is one such outcome. Whether it is the wellbeing of staff or children, leaders can measure proxies of wellbeing, such as responses to survey questions, sickness absence or staff retention data. These proxies and any others that aim to measure wellbeing can only represent a sample of the entire domain.

Further, the very act of measuring wellbeing can have a negative influence on it. Consider a staff survey containing questions designed to evaluate how staff feel about working practices. Wellbeing and workload are two sides of the same coin, so additional work, like a long survey to complete, can have a negative effect on staff wellbeing. The complexity only increases from there. Perceptions of workload matter more than actual time spent working: if staff knew that their survey contributions were genuinely listened to and resulted in improvement, the time spent completing the survey would feel worthwhile. Conversely, if staff considered the survey a token gesture that would never result in change, that time would feel wasted. And when completing a survey, some staff may feel reluctant to express negative opinions, even if it were anonymous, for fear of repercussions, further affecting wellbeing.

Trust is vital for effective leadership and, because of its importance, it would be tempting to measure it. Leaders could quite easily ask staff, either through conversation or through a survey, about the levels of trust they feel are given or received. But trust is such a complex concept that leaders would have to prepare a combination of very carefully planned questions, along with additional methods of gathering information such as observing interactions, all underpinned by a sound understanding of what actually contributes to high levels of relational trust. Bryk and Schneider identified four attributes needed to build trusting relationships:

  • Respect
  • Competence
  • Personal regard for others
  • Integrity

It would be difficult to write survey questions that yield responses that get to the root of why levels of relational trust are what they are, so a more fluid conversation might be a better way of evaluating an outcome such as this. Such evaluation relies on the knowledge of the person doing it: if they do not fully understand how relational trust is built and maintained, they will not be able to find out relevant information effectively, hence the importance of building leaders’ knowledge of evaluation mechanisms. If an outcome matters, there ought to be some form of evaluation that maintains the quality of the information gathered without oversimplifying it to obtain a measurement.

The vicious circle between metrics and trust

There is a reasonable assumption to be made that school leaders who deliberately plan for the difference they aim to make, and who evaluate the extent to which it has been achieved, stand a good chance of bringing about school improvement. However, the way in which that evaluation is carried out has consequences throughout the school system. As the demand for measured accountability waxes, trust wanes. A lack of trust leads to more metrics, and more metrics lead to less reliance on judgement and to lower trust still: a vicious circle.

Not everything that can be measured matters

The path of least resistance is to collect information on things that are easy to measure. Attendance, for example, is absolute: children are either in school or they are not, and a school’s management information system makes it easy to see the percentage of children present for any given period. Of course it matters that children come to school, but attendance is merely a prerequisite for effective learning. Just because attendance is easy to measure does not mean it is the most useful thing to measure.
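
Attendance illustrates the point: because each session is simply present or absent, the headline figure falls out of a one-line calculation. The marks below are hypothetical placeholders, not output from any real management information system.

```python
# Hypothetical sketch: attendance reduces to a trivial calculation because
# each session is simply present (True) or absent (False).
sessions = [True, True, False, True, True, True, True, False, True, True]  # placeholder marks

attendance_pct = 100 * sum(sessions) / len(sessions)
print(f"Attendance: {attendance_pct:.1f}%")  # easy to compute; says nothing about learning
```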

Teacher assessments of children’s attainment might not be easy to make reliably, but data from them can be generated and measured easily. Most schools have some sort of tracking system into which teachers input, using a graded scale, a representation of children’s learning at different points in time. The tracking system calculates and graphs this information, conjuring the illusion of certainty about what children have learned. The problem is that the information is flawed in the first place: teacher assessments are unreliable because of inherent teacher bias, and any inferences made from unreliable assessment data have low validity.
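
To see how precise-looking numbers can emerge from imprecise judgements, here is a hypothetical sketch of what a tracking system typically does: ordinal teacher judgements are coded as numbers and averaged. The grade labels and the numeric coding are assumptions for illustration, not any particular product’s scheme.

```python
# Hypothetical sketch: ordinal teacher judgements coded as numbers and
# averaged, as a typical tracking system might do. The labels and the
# numeric coding are assumptions, not any particular product's scheme.

GRADE_CODES = {"working towards": 1, "expected": 2, "greater depth": 3}

autumn = ["expected", "working towards", "expected", "greater depth"]
summer = ["expected", "expected", "greater depth", "greater depth"]

def average_code(judgements):
    """Convert graded judgements to numbers and take the mean."""
    return sum(GRADE_CODES[j] for j in judgements) / len(judgements)

# Reporting to two decimal places implies a precision that the underlying
# ordinal judgements (and any teacher bias within them) cannot support.
print(f"Autumn: {average_code(autumn):.2f}  Summer: {average_code(summer):.2f}")
```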

The more a metric is used to make decisions, the more it will be gamed

A significant risk of reducing complex information to a metric lies in how that information is used by leaders. If high-stakes decisions are based on metrics, there is always the chance that the metrics will be manipulated in order to gain reward or avoid negative consequences. Once-noble goals can easily be displaced as effort is diverted to what gets measured. Curriculum narrowing towards the end of KS2 is the prime example: SATs assess elements of English and maths while other subjects are not assessed at all. When so much rides on good results, it is obvious, if disagreeable, why some leaders prioritise English and maths at the expense of a broad and balanced curriculum.

The flaw of measuring inputs not outcomes

There’s one particular misconception around impact: that it is the same as the actions that leaders take. It is not. Consider these common examples of the conflation of actions with the difference that results from those actions:

A maths leader is asked what impact they have had on maths across the school. Their reply includes a description of the training that they organised for all staff in the Autumn term on concrete, pictorial and abstract representations. Organising that training might well be a necessary step in the improvement of maths teaching across the school, but ultimately it is still only an action. The maths leader’s response gives no consideration to the extent to which staff now understand the CPA model, to the extent to which they are making use of it in maths lessons, or to whether children are any better at maths than they were before the training.

Here’s another example:

A Headteacher is sitting across the table from governors, their MAT line manager or an HMI and is asked what progress the school has made towards the areas for improvement published in the most recent Ofsted report. The Headteacher naturally knows exactly what those areas for improvement are. Regarding the one that referenced a need to improve behaviour, particularly around low-level disruption, they respond by telling their interviewer that senior staff have attended behaviour training provided by the local authority. As in the first example, the Headteacher has given no consideration to how well staff have implemented any strategies developed through the work with the local authority’s behaviour adviser, or indeed to whether behaviour is actually any better than it was at the time of the previous inspection.

In each example, leaders emphasised what they had done. These actions might well have been important in improving the quality of maths teaching or behaviour but the leaders did not articulate the difference that their actions led to.


In summary:

  • The complexity of schools means that cause cannot be reliably attributed to effect, so it is more beneficial for leaders to compare then and now to determine the success of school improvement initiatives.
  • Evaluating impact requires more than measurement because not all that is important can be measured.
  • Reducing complex information to simple data not only removes nuance and context but can also erode trust and invite gaming.
