Decisions, decisions - can research help identify the best phonics programme?

‘The question for teachers is no longer “look and say” or phonics. Instead, the question is which phonics programmes are most effective’, stated Nick Gibb in 2019 (Hazell, 2019). It would therefore seem an appropriate time for all DfE-approved phonics programmes to be rigorously assessed through randomised controlled trials, and for the most efficacious one to be selected and standardised across all schools in England.

There may be trouble ahead…

Firstly, there is the notorious complexity of educational research, which David Berliner (2006) described as ‘the hardest science of all’ because humans in schools are embedded in complex and changing networks of social interaction. Pupil behaviour constantly interacts with teacher characteristics, so the number of variables becomes almost intolerable. Teacher training, subject knowledge, conceptions of learning, beliefs about assessment, curriculum and personal happiness interact with pupils’ socio-economic background, IQ, motivation, class make-up, mobility, anxiety, friendships and even the weather, so researchers are often unsure in which direction the influences work. This is why research into school reform movements has such difficulty replicating effects from site to site. The Follow Through study by House et al. (1978) into a multitude of instructional models offers a stark warning for comparative programme research: the variance in student achievement was larger within programmes than between programmes. And even when the results seem emphatic, as in the Clackmannanshire phonics study (Watson and Johnston, 2004), the research design was heavily criticised (Wyse and Goswami, 2008) for the variety of curricula in use across the study and for other potentially confounding factors such as home instruction, length of study sessions and teacher efficacy.

The next problem would be the means of assessment. How would the effectiveness of the phonics programmes be evaluated? Surely Phonics Screening Check outcomes would provide an effective evaluation? Firstly, the check, although vital, does not assess the whole alphabetic code, so a programme on which pupils score well in the check (even 100%) may omit large parts of the code, storing up problems for the future. Secondly, the threshold for the check is a bare minimum. A programme that aims only to get pupils over the threshold would be inadequate, yet it might score well in a research study. Add to that the fact that the check includes real words and is not purely a pseudoword test, so it may not assess phonic decoding alone, and the problems start to mount. Then there is the issue of polysyllabic word decoding. Some programmes do not explicitly teach this level, yet it is arguably an inherent part of deciphering the alphabetic code.

Or would a words-read-per-minute test be used? This was the problem with much reading research throughout the twentieth century. Because early reading assessment focused on the amount read rather than on word identification, teaching techniques favoured whole-word recognition over decoding; hence the proliferation of flashcards and incidental phonics after the Gates study (1927). As extracting meaning from text is the ultimate point of reading, a comprehension test could be viable, but effective reading comprehension is a combination of word reading, fluency, content knowledge, life experience, text experience and imagination (Kamhi, 2006). That’s a lot to conflate in a single assessment.

Then there is the issue of teacher input. No programme can fully even out differences in teachers’ subject knowledge, experience and expertise; pupils taught by more expert teachers are likely to perform better whatever the programme. Some programmes seek to minimise the effects of teacher expertise through heavy reliance on resources and structure, whereas others place the emphasis on teacher training, theory, rationale and subject knowledge. Teacher expertise is possibly the most difficult variable to control for in any study, and it consistently confounds educational research. It is nonetheless possible to detect significant differences in cohort performance across a study; it is not, however, possible to account for every variable. Ultimately, any study will measure the effectiveness of a programme as it is delivered, and that includes the teachers who deliver it.

Finally, there is the question of the monetisation of programmes. Many are now significant businesses in which schools have made substantial investments. Switching to an alternative programme carries risks for schools: retraining costs, wasted resources and the question of what to do about year groups midway through a programme. Additionally, try convincing a school that its phonics programme is inadequate and will not promote fluent reading for all pupils when its Phonics Screening Check outcomes are near 100%.

There is perhaps a solution: analyse all of the phonics programmes according to their theoretical frameworks. Do they encourage teachers to have an excellent knowledge of the alphabetic code, its structure, complexity and logic? Do they teach the whole alphabetic code, including the polysyllabic level? Do they privilege sounds or letters? Do they undermine the logic of the code by including ‘rules’? Do they destabilise the lucidity of the code in pupils’ eyes by including ‘sight words’ and ‘tricky words’? Do they draw on cognitive science by building in retrieval practice and spacing? Do they have decodable texts available for every stage of code knowledge? Are they flexible enough to accommodate pupils who grasp the code more slowly and pupils who may require more intensive, slower-paced instruction?

Or perhaps there should be a Year Three phonics pseudoword check that assesses the whole alphabetic code, including the polysyllabic level, with a threshold set at Bloom’s (1968) mastery expectation of 90%. And just in case we’re worried about over-testing, do this instead of KS1 SATs. Then we might know which phonics programmes deliver.