Visible Learning attempts to be both encyclopedia and synthesis. The book categorizes and describes over 800 meta-analyses of educational research (altogether, those 800 meta-analyses comprise over 50,000 separate studies), and it puts the results of those meta-analyses onto a single scale, so that we can compare the effectiveness of very different approaches. After categorizing the meta-analyses into, for instance, "Vocabulary Programs," "Exposure to Reading," "Outdoor Programs," or "Use of Calculators," Hattie then determines the average effect that the constituent meta-analyses show for that educational approach. By these measures, exposure to reading seems to make more of a difference than the use of calculators, but less of a difference than outdoor programs, and much less of a difference than vocabulary programs. (There are some odd results: "Direct Instruction," according to Hattie's rank-ordering, makes more of a difference than "Socioeconomic Status.")
Like other teaching gurus and meta-meta-analyzers (for instance, Robert Marzano, whose 2000 monograph, A New Era of School Reform, makes the case very explicitly), Hattie believes that good teaching can be codified and taught (that sounds partly true to me), that good teaching involves having very clear and specific learning objectives (I'm somewhat doubtful about that), and that good teaching can overcome, at the school level, the effects of poverty and inequality (I don't believe that). Hattie uses a fair amount of data to back up his argument, but the data and his use of it are somewhat problematic.
First, questions about the statistical competence of Hattie in particular
I am not sure whether we can trust education research, and I am not alone. John Hattie seems to be a leading figure in the field, and while he seems to be a decent fellow, and while most of his recommendations seem reasonable enough, his magnum opus, Visible Learning, has such significant problems that a friend of mine who is a professional statistician concluded, after reading my copy of the book, that Hattie is incompetent.
The most blatant errors in Hattie's book have to do with something called "CLE" (Common Language Effect size), which is the probability that a random kid in a "treatment group" will outperform a random kid in a control group. The CLEs in Hattie's book are wrong pretty much throughout. He seems to have written a computer program to calculate them, and the program was buggy. That alone might be understandable (all programs have bugs), and it would not by itself mean that Hattie was statistically incompetent, except that the CLEs Hattie reports are not subtly wrong but absurd on their face. For instance, the CLE for homework, which Hattie uses prominently (page 9) as the example to explain what CLE means, is given as .21. Taken at face value, that would mean a student who did homework would outperform a student who did not only 21 percent of the time--in other words, that it was much more likely that a student without homework would do well than a student with it. This is ridiculous, and Hattie should have noticed it. Even more egregious, Hattie reports CLEs that are less than zero. He has defined the CLE as a probability, and a probability cannot be less than zero: there cannot be a less-than-zero chance of something happening (except perhaps in the language of hyperbolic seventh graders).
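For the curious, the standard definition of CLE (due to McGraw and Wong) assumes two normal distributions with equal variance and computes the probability that a random draw from the treatment group beats a random draw from the control group. Here is a minimal sketch of what a correct calculation looks like; the homework figure of d = .29 is from Hattie's book, and everything else is my own arithmetic:

```python
from math import erf, sqrt

def cle(d: float) -> float:
    """Common Language Effect size for Cohen's d, assuming two normal
    distributions with equal variance (McGraw & Wong's definition).
    Returns P(random treatment score > random control score)."""
    # This is the standard normal CDF evaluated at d / sqrt(2),
    # which simplifies to 0.5 * (1 + erf(d / 2)).
    return 0.5 * (1.0 + erf((d / sqrt(2)) / sqrt(2)))

print(cle(0.29))  # ~0.58: a kid with homework beats a kid without about 58% of the time
print(cle(0.0))   # 0.5: no effect means a coin flip
# For any real d the result is strictly between 0 and 1 -- a CLE of .21
# for a positive effect size, let alone a negative CLE, is impossible.
```

Note that the function cannot return a negative number no matter what effect size you feed it, which is exactly why negative CLEs in print should have set off alarm bells.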
As my statistician friend wrote me in an email, "People who think probabilities can be negative shouldn't write books about statistics."
Second, doubts about the trustworthiness of educational researchers in general
My statistician friend is not the first to have noticed the probabilities of less than zero. A year and a half ago a Norwegian researcher published an article called "Can We Trust the Use of Statistics in Educational Research?" in which he raised questions about Hattie's statistical competence, and in follow-up correspondence with Hattie the Norwegian was not reassured. (Hattie seems, understandably, not to want to admit that his errors were anything more than minor technical details. In an exchange of comments on an earlier post on this blog, as well, Hattie seems to ignore the CLE/negative-probability problem.)
For me, the really interesting thing about Hattie's exchange with the Norwegians was that he seemed genuinely surprised, two years after his book had come out, that his calculations of CLE were wrong. In his correspondence with the Norwegians, Hattie wrote, "Thanks for Arne Kåre Topphol for noting this error and it will be corrected in any update of Visible Learning." This seems to imply that Hattie hadn't realized there was any error in his calculations of CLE until it was pointed out by the Norwegians--which means, if I'm right, that no one in the world of education research noticed the CLE errors between 2009 and 2011.
If it is true that the most prominent book on education to use statistical analysis (when I google "book meta-analysis education", Hattie's book is the first three results) was in print for two years, and not a single education researcher looked at it closely enough, with enough basic statistical sense, to notice that a prominent example on page 9 didn't make sense, or that the book was apparently proposing negative probabilities, then education research is in a sorry state.

Hattie suggests that the "devil" in education is the "average" teacher, who has "no idea of the damage he or she is doing," and he approvingly quotes someone who calls teaching "an immature profession, one that lacks a solid scientific base and has less respect for evidence than for opinions and ideology" (258). He essentially blames teachers for the fact that teaching is not more evidence-based, implying that if we hidebound practitioners would only do what data gurus like him suggest, then schools could educate all students to a very high standard. There is no doubt that there is room for improvement in the practice of many teachers, as there is in the practice of just about everyone, but it is pretty galling to get preachy advice about science from a guy, and a field, that can't get their own house in order.
Another potential problem with Hattie's data
Aside from the CLE issue, I am troubled by the way Hattie presents his data. He uses a "barometer" that is supposed to show how effective the curricular program or pedagogical practice under consideration is. This is the central graphic tool in Hattie's book, the gauge by which he measures every curricular program, pedagogical practice, and administrative shift: a semicircular dial whose zones run from "reverse effects" (below zero) through "developmental effects" and "teacher effects" up to a "zone of desired effects" (above .40), with an arrow indicating the average effect size of the practice in question.
Note that developmental and teacher effects are both above zero. What this implies is that the effect size represented by the arrow is not the effect as compared to a control group of students that got traditional schooling, nor even the effect size as compared to students who got no schooling but simply grew their brains over the course of the study, but the effect size as compared to the same students before the study began.
This would imply that offering homework, with a reported effect size of .29, is actually worse than having students just do normal school, or that multi-grade classes, with an effect size of .04, make kids learn nothing.
Now, that is obviously not what Hattie means. The truth is that Hattie sometimes uses "effect size" to mean "as compared to a control group" and other times uses it to mean "as compared to the same students before the study started." He seems comfortable with this ambiguity; I am not. Not only is the "barometer" very confusing in cases like homework and multi-grade classrooms, where the graphic seems clearly to imply that those practices are less effective than just doing the regular thing (especially confusing in the case of homework, which is the regular thing), but the confusion also makes me very, very skeptical of the way Hattie compares these different effect sizes. That comparison is absolutely central to the book. Comparing effect sizes (he rank-orders them in an appendix) is simply not acceptable if the effects are being measured against dramatically different baselines.
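To make the ambiguity concrete, here is a toy calculation. The only number taken from Hattie is the homework effect size of .29; the assumed developmental gain of .15 SD per year is my own illustrative guess, not a figure from the book:

```python
# Hypothetical illustration: suppose students gain 0.15 SD over a year from
# ordinary schooling and maturation alone (an assumed number, not Hattie's),
# and students assigned homework show a pre/post gain of 0.29 SD.
developmental_gain = 0.15   # assumed gain with no intervention at all
homework_gain = 0.29        # Hattie's reported effect size for homework

# Reading 1: "effect size" = how much the treated students grew over the year
effect_as_growth = homework_gain                        # 0.29

# Reading 2: "effect size" = treatment vs. a control group that also grew
effect_vs_control = homework_gain - developmental_gain  # 0.14

print(effect_as_growth, effect_vs_control)
```

The same classroom data yield two very different "effect sizes" depending on which comparison you mean, which is why rank-ordering numbers computed under different readings mixes incommensurable quantities.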
Hattie, in a comment on an earlier post in which I expressed annoyance at this confusion, suggested that we should think of effect sizes as "yardsticks"--but in the same comment he defines an effect size by comparison to two different things. In his words: "An effect size of 0 means that the experimental group didn't learn more than the control group and that neither group learned anything." Now, I am an English teacher, so I know that words can mean different things in different contexts. But that is exactly what a yardstick is not supposed to do!
Of course, it is possible that many of Hattie's conclusions are correct. Some of them (like the idea that if you explicitly teach something and have kids practice it under your close observation, then they will get better at it more quickly than if you just ask them to try it out for themselves) are pretty obvious. But it is very hard to have much confidence in the book as a whole as a "solid scientific base" when it contains so much slipperiness, confusion and error.
Beyond these broad issues with Hattie's work, I also have some deep qualms about the way he handles reading in particular. Maybe one day I'll address those in another post.